Multi-agent Systems
M. V. Nagendra Prasad and Victor R. Lesser
Department of Computer Science
University of Massachusetts, Amherst, MA 01002.
nagendra,lesser@cs.umass.edu
Abstract
Achieving effective cooperation in a multi-agent system is a difficult problem for a number of reasons, such as limited and possibly out-dated views of the activities of other agents and uncertainty about the outcomes of interacting non-local tasks. In this paper, we present a learning system called COLLAGE that endows the agents with the capability to learn how to choose the most appropriate coordination strategy from a set of available coordination strategies. COLLAGE relies on meta-level information about agents' problem solving situations to guide them towards a suitable choice for a coordination strategy. We present empirical results that strongly indicate the effectiveness of the learning algorithm.
Keywords: Multi-agent Systems, Coordination, Learning
1 Introduction

Coordination is the process of effectively managing interdependencies between activities distributed across agents so as to derive maximum benefit from them [21, 6]. Based on the structure and uncertainty in their environment, agents have to choose and temporally order their activities to mitigate the effects of harmful interdependencies and exploit the beneficial interdependencies among them. Researchers in both human and computational organizational theories have pointed out that there is no single organization or coordination protocol that is the best for all environments. In human organizations, environmental factors such as dynamism and task uncertainty have a strong effect on what coordinated actions are and how organizationally acceptable outcomes arise [18, 9, 33]. These effects have been observed in purely computational organizations as well [8, 7, 6, 23, 26].

Achieving effective coordination in a multi-agent system (MAS) is a difficult problem for a number of reasons. An agent's local control decisions about what activity to do next, what information to communicate and to whom, or what information to ask of others may be inappropriate or suboptimal due to its limited view of the interactions between its own activities and those of the other agents. In order to make more informed control decisions, the agents have to acquire a view of the task structures of other agents. To the extent that this resolves agents' uncertainty about the non-local problem solving activities, they can act coherently. However, an agent has to expend computational resources in acquiring and exploiting such non-local views of other agents' activities. This involves communication delays and the computational cost of providing this information and assimilating the information from other agents. Given the inherent uncertainty in agents' activities and the cost of meta-level processing, relying on sophisticated coordination strategies to acquire non-local views of task structures may not be worthwhile for all problem-solving situations. In certain situations, coordination protocols that permit some level of non-coherent activity and avoid the additional overhead for coordination may lead to better performance [7, 6, 23, 34]. For example, when the agents are under severe time pressure and the load of the activities at the agents is high, sophisticated agent coordination strategies do not generally pay off. Agents may not have the time to benefit from the increased awareness they derive through coordination in such situations.

In this paper, we deal with how agents can learn to dynamically choose the appropriate coordination strategy in different coordination problem instances. We empirically demonstrate that even for a narrow class of agent activities, learning to choose the appropriate coordination strategy based on a meta-level characterization of the global problem solving state outperforms using any single coordination strategy across all problem instances.
In order to accomplish learning, we break the coordination problem into two phases. In the first phase, the agents exchange meta-level information not directly used for coordination. This information is used by the agents to derive a prediction of the effectiveness of various coordination mechanisms in the present problem solving episode. These mechanisms differ in the amount of non-local information they acquire and use, and in the complexity of analysis of interactions between activities at the agents. Agents choose an appropriate subset of the coordination mechanisms (or a coordination strategy) based on the meta-level information and enter Phase II. In this phase, the coordination strategy selected in Phase I decides the types of information to be exchanged and the kind of reasoning about local and non-local activities the agents need to perform to achieve coherent behavior. We call the meta-level information a situation and the two-phase process situation-specific coordination. Learning situation-specific coordination involves associating appropriate views of the global situation with the knowledge learned about the effectiveness of the coordination mechanisms.
The agents in a cooperative multi-agent system can learn to coordinate by proactively resolving some of their uncertainty about the global problem solving state and situating their learning in more global views.

The rest of the paper is organized as follows. We first place our work in context and discuss its relationship to the existing work on learning in multi-agent systems. We then briefly review the TÆMS task structure representation for coordination problems and the GPGP model for designing coordination strategies. We subsequently describe our learning algorithm that learns how to choose among coordination strategies of different levels of sophistication. We then present some of our experimental results and conclude.
2 Related Work

Previous work related to learning in multi-agent systems is limited, and much of this work relies on techniques derived from reinforcement learning [1, 35], genetic algorithms [16] and classifier systems [17]. In Sen, Sekaran and Hale [29] the agents use reinforcement learning to evolve complementary policies in a box pushing task, and in Crites and Barto [3] a team of reinforcement learning agents optimizes elevator dispatching performance. In both these works, the agents do not communicate with one another and any agent treats the other agents as a part of the environment. Weiss [37] uses a variant of Holland's [17] bucket brigade algorithm for learning hierarchical organization structuring relationships in a block world domain via strengthening promising chains of actions through bid-reward cycles. Tan [36] deals with a predator-prey domain in a grid world. The agents share perception information to overcome perceptual limitations or communicate policy functions learned through reinforcement learning. Grefenstette [14] uses genetic algorithms to learn reactive decision rules for agents in a predator-prey domain similar to that in [36]. Sandholm and Crites [28] study the emergence of cooperation in the Iterated Prisoner's Dilemma problem, where the agents are self-interested, and an agent is not free to ask for any kind of information from the other agents. Haynes and Sen [15] present studies in learning cases to resolve conflicts among agents in a predator-prey domain similar to that used by Tan [36]. There is no communication between the agents, and the cases result from the agent's perception of the problem solving state.

The multi-agent systems that this paper deals with consist of complex cooperative agents (each agent is a sophisticated problem solver), and an agent's local problem solving control interacts with that of the other agents in intricate ways. In our work, rather than treating other agents as a part of the environment and learning in the presence of increased uncertainty, agents communicate meta-level information to resolve the uncertainty to the extent possible (there is still the environmental uncertainty that the agents cannot do much about). Agents sharing perceptual information as in Tan [36] and Grefenstette [14] or bidding information as in Weiss [37] do not make explicit the notion of situating the local control knowledge in a more global, abstract situation. The information shared is weak, and the studies were conducted in domains such as predator-prey [36, 14] or blocks world [37] where the need for sharing meta-level information and situating learning in this information is not apparent.
Our previous work explored learning situation-specific organizational roles in a heterogeneous multi-agent system [26]. An organizational role represents a set of tasks an agent can perform on a composite solution. Agents use abstractions of the problem solving state to learn to play the roles they are best suited for in a multi-agent search process for cooperatively constructing an overall solution. The system was tested in a parametric design domain and the learning agents produced designs that, on average, were better than those produced by a system with agents playing roles hand-coded by a human expert.
Sugawara and Lesser [34] also recognize the need for situation specificity in learning coordination, though they do not have the notion of two-phase coordination. They are concerned with learning to make the situations more discriminating to avoid using an inappropriate coordination strategy in the domain of distributed network diagnosis. Their learning mechanisms rely on deep domain knowledge and agent homogeneity assumptions in order to progressively refine situations based on failure-driven explanation and comparative analysis of problem solving traces. They test the theory on a very limited number of coordination situations, and the evidence was anecdotal. It is not clear how such knowledge-intensive learning can be generalized to other instances without significant knowledge engineering and the development of more sophisticated explanation-based learning techniques. Despite these limitations, combining their work on learning situation representations with the learning presented here on situation-based choice of coordination could have interesting implications for situation-specific learning.
Garland and Alterman [10] discuss issues in knowledge reuse in multi-agent systems. Agents working in a simulated MOVERS-WORLD domain have to collaborate to move objects too heavy for a single agent to move. Agents use their past experience to anticipate coordination rather than dynamically communicate to establish it. The motivation for this work is different from ours. In their work, agents exploit their past experience to reduce the control effort needed to achieve coordination in future problem instances. A similar use of past experience has been studied in Nagendra Prasad, Lander and Lesser [25].
Certain multi-agent learning systems in the literature deal with a different task from that presented in this paper. Systems like ILS [32] and MALE [31] use multi-agent techniques to build hybrid learners from multiple learning agents. On the other hand, we deal with learning problem-solving control for multi-agent systems.
3 Coordination in Multi-agent Systems

In this paper, we show the effectiveness of learning situation-specific coordination by extending a domain-independent coordination framework called Generalized Partial Global Planning (GPGP) [6, 4]. GPGP is a flexible and modular method that recognizes the need for creating tailored coordination strategies in response to the characteristics of a particular task environment. It is structured as an extensible set of modular coordination mechanisms so that any subset of mechanisms can be used. Each mechanism can be invoked in response to the detection of certain features of the environment and interdependencies between problem solving activities of agents. The coordination mechanisms can be configured (or parameterized) in different ways to derive different coordination strategies. In GPGP (without the learning extensions), once a set of mechanisms is configured (by a human expert) to derive a coordination strategy, that strategy is used across all problem instances in the environment. Experimental results have already verified that for some environments a subset of the mechanisms is more effective than using the entire set of mechanisms [6, 23]. In this work, we present a learning extension, called COLLAGE, that endows the agents with the capability to choose a suitable subset of the coordination mechanisms based on the present problem solving situation, instead of having a fixed subset across all problem instances in an environment.
GPGP coordination mechanisms rely on the coordination problem instance being represented in a framework called TÆMS [4, 5]. The TÆMS framework (Task Analysis, Environment Modeling, and Simulation) [4, 5] represents coordination problems in a formal, domain-independent way. It captures the essence of the coordination problem instances, independent of the details of the domain, in terms of trade-offs between multiple ways of accomplishing a goal, progress towards the achievement of goals, various interactions between the activities in the problem instance, and the effect of these activities on the performance of the system as a whole. TÆMS can model aspects of coordination in complex worth-oriented domains [27] where states have functions that rate their acceptability (not necessarily a binary rating). There are deadlines associated with the tasks, and some of the subtasks may be interdependent, that is, they cannot be solved independently in isolation. There may be multiple ways to accomplish a task, and these trade off the quality of a result produced for the time to produce that result. Quality is used as a catch-all term representing acceptability characteristics like certainty, precision, and completeness, other than temporal characteristics. A set of related tasks is called a task group or a task structure. A task structure is the elucidation of the structure of the problem an agent is trying to solve. The quality of a task is a function of the quality of its subtasks. The leaf tasks are denoted as methods with base-level quality and duration. Methods represent domain actions, like executing a blackboard knowledge source, running an instantiated plan, or executing a piece of code with its data. A task may have multiple ways to accomplish it, represented by multiple methods, that trade off the time to produce a result for the quality of the result. Figure 1 is an example of a simple task structure. Besides task/subtask relationships, there can be other interdependencies, called coordination interrelationships, between tasks in a task group [5]. In this paper, we will be dealing with two such interrelationships:

facilitates relationship or soft interrelationship: Task A facilitates Task B if the execution of Task A produces a result that affects an optional parameter of Task B. Task B could have been executed without the result from Task A, but the availability of the result provides constraints on the execution of Task B, leading to improved quality and reduced time requirements.

enables relationship or hard interrelationship: If Task A enables Task B, then the execution of Task A produces a result that is a required input parameter for Task B. The results of execution of Task A must be communicated to Task B before it can be executed.
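Such a task structure can be sketched directly in code. The fragment below is only an illustrative data-structure sketch under assumed quality accumulation functions (max for OR, sum for AND); it is not the actual TÆMS implementation, and the example structure is hypothetical rather than a reproduction of Figure 1.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Method:
    """Leaf-level domain action with base-level quality and duration."""
    name: str
    quality: float
    duration: int

@dataclass
class Task:
    """Task whose quality is a function of its children's qualities."""
    name: str
    qaf: str                                              # 'OR' or 'AND' accumulation (assumed)
    children: List[object] = field(default_factory=list)
    enables: List[str] = field(default_factory=list)      # hard interrelationships
    facilitates: List[str] = field(default_factory=list)  # soft interrelationships

    def quality_of(self) -> float:
        qs = [c.quality if isinstance(c, Method) else c.quality_of()
              for c in self.children]
        return max(qs) if self.qaf == 'OR' else sum(qs)

# A small hypothetical task group: T1 offers two alternative methods trading
# quality for duration; T2's method enables a (non-local) task at another agent.
t = Task('T', 'AND', children=[
        Task('T1', 'OR', children=[Method('M11', quality=8.0, duration=3),
                                   Method('M12', quality=5.0, duration=1)]),
        Task('T2', 'OR', children=[Method('M21', quality=6.0, duration=2)],
             enables=['T3@agent2'])])
print(t.quality_of())   # 14.0 under the assumed accumulation functions
```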
In GPGP, each agent can be viewed as having three components: a local scheduler, a coordination module and a belief database. The belief database of an agent represents its subjective view of the present coordination problem instance, comprising the agent's knowledge about the task structures in the environment. The local scheduler of an agent uses the information in the belief database to build schedules of method execution actions in order to maximize utility. In this work we use a design-to-time scheduler [12] that can take into account the constraints provided by the coordination module and provide schedules that maximize the global utility measure. The coordination module modulates the local scheduler by noticing the task structures in the belief database and doing a number of activities like gathering information about new task structures in the environment, communicating information about local beliefs to other agents or receiving information from them, and making or retracting commitments. A commitment represents a contract to achieve a particular quality by a specified deadline. The coordination mechanisms are parameterized independently so that each combination of the parameter settings leads to a coordination strategy.
Figure 1: Example of a Task Structure
Different mechanisms require different types of information to operate. Thus, by choosing different sets of coordination mechanisms, the amount of communication and other overheads associated with coordination can be varied. The interface between the coordination mechanisms and the local scheduler is bidirectional and negotiation-based, permitting the coordination module to ask "what-if" questions [11]. This is a crucial ability whose utility becomes obvious later, when we describe our learning algorithm in more detail.

In GPGP, a coordination strategy can be derived by activating a subset of the coordination mechanisms. Given that these mechanisms can be parameterized independently, there are a large number of possible coordination strategies. Learning to choose the best strategy from among such a large set can be quite cumbersome. We suggest two ways to prune this search space, leading to a more manageable set of distinct coordination strategies:
Use domain knowledge to prune down the set of possible coordination strategies to a small number of interesting distinct strategies. In the experiments to be described later, we will be dealing with the distributed data processing domain. In this domain, we investigate three strategies:
– balanced (or dynamic-scheduling): This is the same as previously described. Agents coordinate their actions by dynamically forming commitments. Relevant results are generated by specific times and communicated to the agents to whom corresponding commitments are made. Agents schedule their local tasks trying to maximize the accrual of quality based on the commitments made to them by the other agents, while ensuring that commitments to other agents are satisfied. The agents have the relevant non-local view of the coordination problem, detect coordination relationships, form commitments and communicate the committed results.

– data-flow strategy: An agent communicates the result of performing a task to all the agents, and the other agents can exploit these results if they still can. This represents the other extreme, where there are no commitments from any agent to any other agent.

– rough coordination: This is similar to balanced, but commitments do not arise out of communication between agents; they are known a priori. Each agent has an approximate idea of when the other agents complete their tasks and communicate results, based on its past experience. The agents have the relevant non-local view of the coordination problem and detect coordination relationships, but use rough commitments and communicate the committed results. "Rough commitments" are a form of tacit social contract between agents about the completion times of their tasks. It might be possible to view rough commitments as precompiled social laws [30].
The latter two coordination strategies are the alternatives normally used in the distributed data processing domain [23]. Decker and Lesser [6] proposed balanced as a sophisticated strategy that exploits a number of mechanisms to achieve coordination. In our experiments, agents learn to choose among these three coordination strategies in the domain of distributed data processing. In this domain, three types of task groups arise: routine task groups, crisis task groups, and low priority task groups (see Section 5 for details). The latter two task groups arise only occasionally (i.e., with some probability) and hence it is unrealistic to expect commitments on crisis tasks and low priority tasks to follow such tacit a priori rough commitments. So the rough coordination strategy uses rough commitments for routine tasks but behaves just like data-flow for the non-routine crisis and low priority tasks.
Cluster the entire set of possible strategies on the performance space and choose a prototypical instance from each cluster. The agent performance can be characterized along a number of dimensions like total quality, number of methods executed, number of communications, and termination time. Decker and Lesser [6] explore the performance space of the coordination strategies based on these four performance measures and identify five prototypical coordination strategies:

– balanced, with all mechanisms for detecting soft and hard coordination interrelationships and forming commitments on them turned on. Only relevant partial views are exchanged between agents for detecting these interrelationships, and results that satisfy commitments are exchanged.

– simple, with no hard or soft coordination mechanisms or exchange of relevant partial non-local views. All results obtained due to task executions are broadcast to all agents.

– myopic, with all commitment mechanisms on but no non-local view.

– tough, with no soft coordination but otherwise the same as balanced.

– mute, with no hard or soft coordination mechanisms, no non-local views and no communication whatsoever.
These five coordination strategies will be used in our experiments later on some synthetic domains.
4 COLLAGE: Learning Coordination

4.1 Learning Coordination

Our learning algorithm, called COLLAGE, uses abstract meta-level information about coordination problem instances to learn to choose, for the given problem instance, the appropriate coordination strategy from the three strategies described previously. Learning in COLLAGE (COordination Learner for muLtiple AGEnt systems) falls into the category of Instance-Based Learning algorithms [2] originally proposed for supervised classification learning. We, however, use the IBL paradigm for unsupervised learning of decision-theoretic choice. We would like to note that we could as well have used some other method like neural nets or decision trees. Our interest in this work is not in studying the comparative performances of various algorithms.
Learning involves running the multi-agent system on a large number of training coordination problem instances and observing the performance of different coordination strategies on these instances. When a new task group arises in the environment, each of the agents has its own partial local view of the task group. Based on its local view, each agent forms a local situation vector. A local situation represents an agent's assessment of the utility of reacting to various characteristics of the environment. Such an assessment can potentially indicate the need for activating the various GPGP mechanisms and consequently has a direct bearing on the type of coordination strategy that is best for the given coordination episode. The agents then exchange their local situation vectors and each of the agents composes all the local situation vectors into a global situation vector. All agents agree on a choice of the coordination strategy, and the choice depends on the learning mode of the agents:
Mode 1: In this mode, the agents run all the available coordination strategies and note their relative performances for each of the coordination problem instances. Thus, for example, agents run each of data-flow, rough, and balanced for a coordination episode and store their performances for each strategy.
Mode 2: In this mode, the agents choose one of the coordination strategies for a given coordination episode and observe and store the performance only for that coordination strategy. They choose the coordination strategy that is represented the least number of times in the neighborhood of a small radius around the present global situation (see the sketch following this discussion). This is done to obtain a balanced representation for all the coordination strategies across the space of possible global situations.
Mode 2 is a quasi-online algorithm. In the initial stages it just explores and in the later stages it just exploits the learned information. Studying COLLAGE in a setup that is more typical of online learning algorithms, where exploration and exploitation are interleaved, is high on our agenda of future work.
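The Mode 2 exploration rule can be made concrete with a small sketch. The Python fragment below is illustrative only and rests on assumed representations: the instance-base is taken to be a list of (situation, strategy, performance) triples, situation vectors are plain lists of numbers, and the radius value is arbitrary.

```python
import math

def least_represented_strategy(situation, instance_base, strategies, radius=0.1):
    """Mode 2 exploration (a sketch): pick the strategy that appears fewest
    times among stored cases whose situation lies within `radius` of the
    present global situation vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    counts = {s: 0 for s in strategies}
    for case_situation, strategy, _performance in instance_base:
        if strategy in counts and dist(situation, case_situation) <= radius:
            counts[strategy] += 1
    # Ties are broken arbitrarily (by the order of `strategies`) here.
    return min(strategies, key=lambda s: counts[s])
```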
At the end of each run of the coordination episode with a particular coordination strategy, the performance of the system is registered. This is represented as a vector of performance measures such as total quality, number of communications, and termination time. Learning involves simply adding the new instance, formed by the performance of the coordination strategy along with the associated problem solving situation, to the "instance-base". Thus, the training phase builds a set of such instances.
The fifth component represents the time pressure. Under time pressure, an agent tries to satisfy the deadline requirements as far as possible but with a sacrifice in quality: lower quality, lower duration methods are preferred over higher quality, higher duration methods for achieving a particular task. In order to get an estimate of the time pressure, an agent generates virtual task structures from its local task structures by setting the deadlines of the task groups, tasks and methods to a very large number (effectively removing the deadline) and scheduling these virtual task structures. The agents schedule again with local task structures set to the actual deadline. Time pressure is obtained as the ratio of the schedule quality with the actual deadlines and the schedule quality with large deadlines.

The sixth component represents the load. It is formed using methods similar to those discussed above for time pressure, but using the duration of the schedules formed with the virtual task structures. The time taken by the schedule when it has no time pressure (deadline set to a very large number) represents the amount of work the agent would have done under ideal conditions. However, time pressure makes the agent work almost right up to the deadline. It may not work all the way to the deadline, as it may not find a method that can be fitted into the last available chunk of time. Thus, load is obtained as the ratio of the execution time under the actual deadline and the execution time under no time pressure.
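A minimal sketch of these two features follows. The scheduler interface assumed here, `schedule_fn(task_structures, deadline)` returning a (quality, finish_time) pair, is a hypothetical stand-in for the design-to-time scheduler; low values of either feature indicate high time pressure or high load.

```python
def time_pressure_and_load(schedule_fn, task_structures, deadline, horizon=10**9):
    """Estimate the fifth and sixth situation components (a sketch).
    `schedule_fn` is an assumed stand-in for the design-to-time scheduler."""
    # Schedule against the real deadline, and against a "virtual" deadline so
    # large that it effectively removes time pressure.
    q_actual, t_actual = schedule_fn(task_structures, deadline)
    q_virtual, t_virtual = schedule_fn(task_structures, horizon)

    # Time pressure: quality achievable under the real deadline relative to the
    # quality achievable with no deadline pressure (lower ratio = more pressure).
    time_pressure = q_actual / q_virtual if q_virtual else 1.0

    # Load: execution time under the real deadline relative to the execution
    # time of the unpressured schedule (lower ratio = higher load).
    load = t_actual / t_virtual if t_virtual else 1.0
    return time_pressure, load
```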
4.1.2 Forming a Global Situation Vector

Each agent communicates its local situation vector to all other agents. An agent composes all the local situation vectors, its own and those it received from others, to form a global situation vector. A number of composition functions are possible. The one we used in the experiments reported here is simple: a component-wise average of the local situation vectors. Thus the global situation vector has six components, where each component is the average of the corresponding components of the local situation vectors.
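The composition function is straightforward; the fragment below is a direct sketch of the component-wise average described above. The three local vectors are made-up values, chosen so that their average equals the example global situation vector discussed next.

```python
def compose_global_situation(local_vectors):
    """Component-wise average of the agents' local situation vectors."""
    n = len(local_vectors)
    return [sum(components) / n for components in zip(*local_vectors)]

# Example with three agents and six-component local situations (values are illustrative):
g = compose_global_situation([[0.80, 0.75, 0.60, 0.90, 1.00, 0.85],
                              [0.85, 0.80, 0.70, 0.88, 1.00, 0.90],
                              [0.81, 0.76, 0.68, 0.89, 1.00, 0.86]])
print([round(x, 2) for x in g])   # [0.82, 0.77, 0.66, 0.89, 1.0, 0.87]
```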
An example global situation vector looks as follows: (0.82, 0.77, 0.66, 0.89, 1.0, 0.87). Here the low value of the third component represents large quality gains from detecting and coordinating on hard interrelationships. Thus two of the more sophisticated coordination strategies, called balanced and tough [6], are found to be better performers in this situation. On the other hand, in a global situation vector such as (0.80, 0.90, 0.88, 0.80, 0.61, 0.69), the low values of the fifth and sixth components indicate high time pressure and load in the present problem solving episode. Even if the agents use sophisticated strategies to coordinate, they may not have the time to benefit from them. Hence, relatively simple coordination strategies like simple or mute [6] do better in this scenario.

Note, however, that in most situation vectors, these trade-offs are subtle and not as obvious as in the above examples. It is difficult for a human to look at the situations and easily predict which strategy is the best performer. The trade-offs may be very specific to the kinds of task structures that occur in the domain. Hence, hand-coding the strategies by a designer is not a practical alternative. Once the entire instance-base is formed, each of the features for each instance is normalized by the range of the maximum and minimum for that feature within the entire instance-base. This is done in order to avoid biasing the similarity metric in favor of any particular feature.
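The normalization step can also be sketched directly. The fragment below assumes the same (situation, strategy, performance) case representation as in the earlier sketches; it rescales each feature by its min-max range over the whole instance-base so that no single feature dominates the similarity metric.

```python
def normalize_instance_base(instance_base):
    """Min-max normalize each situation feature over the entire instance-base."""
    situations = [case[0] for case in instance_base]
    lows = [min(col) for col in zip(*situations)]
    highs = [max(col) for col in zip(*situations)]

    def norm(vector):
        return [(x - lo) / (hi - lo) if hi > lo else 0.0
                for x, lo, hi in zip(vector, lows, highs)]

    return [(norm(q), a, r) for (q, a, r) in instance_base]
```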
4.2 Choosing a Coordination Strategy

COLLAGE chooses a coordination strategy based on how the set of available strategies performed in similar past cases. We adopt the notation from Gilboa and Schmeidler [13]. Each case is a triplet

$(q, a, r) \in Q \times A \times R$

where $Q$ is the set of situations representing abstract characterizations of coordination problems, $A$ is the set of coordination choices available, and $R$ is the set of results from running the coordination strategies.

Decisions about coordination strategy choice are made based on similar past cases. Outcomes decide the desirability of the strategies. We define a similarity function and a utility function as follows:

$s : Q \times Q \to [0, 1] \qquad u : R \to \mathbb{R}$

In the experiments presented later, we use the Euclidean metric for similarity.

The desirability of a coordination strategy is determined by a similarity-weighted sum of the utility it yielded in the similar past cases in a small neighborhood around the present situation vector. We observed that such an averaging process in a neighborhood around the present situation vector was more robust than taking the nearest neighbor, possibly because the averaging process was less sensitive to noise. Let $M(p)$ be the set of past cases similar to the present problem $p$ (similarity greater than a threshold). For $a \in A$, let $M_a(p) = \{(q, a, r) \in M(p)\}$. The utility of $a$ is defined as

$U_p(a) = \sum_{(q, a, r) \in M_a(p)} s(p, q)\, u(r)$
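This decision rule translates almost directly into code. The sketch below is illustrative only: the mapping from Euclidean distance to a similarity in [0, 1] and the threshold value are assumptions rather than the exact choices used in our experiments, and cases are again represented as (situation, strategy, utility) triples.

```python
import math

def choose_strategy(p, cases, threshold=0.8):
    """Similarity-weighted strategy choice (a sketch of the formula above)."""
    def similarity(q1, q2):
        d = math.sqrt(sum((x - y) ** 2 for x, y in zip(q1, q2)))
        return 1.0 / (1.0 + d)          # one possible monotone mapping to [0, 1]

    scores = {}
    for q, a, u in cases:
        s = similarity(p, q)
        if s >= threshold:              # keep only the neighborhood around p
            scores[a] = scores.get(a, 0.0) + s * u
    if not scores:                      # no sufficiently similar past case
        return None
    return max(scores, key=scores.get)
```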
Gilboa and Schmeidler [13] describe case-based decision theory as a normative theory of human behavior during decision making. Even though we adopt their notation, there are crucial differences in the motivations and structure of the two works. Gilboa and Schmeidler are primarily concerned with a descriptive theory of human decision making, while our work, developed independently, is concerned primarily with building computational systems based on case-based decision theory.
of interrelationships in task structures. This often gives rise to a wide range of task structures and a huge variance in the types of capabilities needed by the system to effectively handle them. Accordingly, the learning algorithms showed at best modest gains in performance [24]. More importantly, it is unlikely that most real applications involve an infinite variety of task structures. The domain semantics dictate and limit the morphology of the task structures. While there is bound to be some randomness in these structures, it is highly unlikely that the only regularity that can be modeled in the task structure representations of a coordination problem instance is a few parameters like its mean depth or branching factor. Nagendra Prasad et al. [23] developed a graph-grammar-based stochastic task structure description language and generation tool for modeling task structures arising in a domain. Such a graph-grammar-based task structure specification language is powerful enough to model the topological relationships occurring in task structures representing many real-life applications. For the experiments below, we use this tool to model a distributed data processing domain as in [23]. We now briefly introduce this domain and refer the interested reader to [23] for further details.
The distributed data processing domain consists of a number of geographically dispersed data processing centers (agents). Each center is responsible for conducting certain types of analysis tasks on streams of satellite data arriving at its site: "routine analysis" that needs to be performed on data coming in at regular intervals during the day, "crisis analysis" that needs to be performed on the incoming data but with a certain probability, and "low priority analysis", the need for which arises at the beginning of the day with a certain probability. Low priority analysis involves performing specialized analysis on specific archival data. Different types of analysis tasks have different priorities. A center should first attend to the "crisis analysis tasks" and then perform "routine tasks" on the data. Time permitting, it can handle the low-priority tasks. The processing centers have limited resources to conduct their analysis on the incoming data and they have to do this within certain deadlines. Results of processing data at a center may need to be communicated to other centers due to the interrelationships between the tasks at these centers. In [23], we give details of how a stochastic graph-grammar can model task structures arising in a domain such as this. In the same work, we also present the results of empirical explorations of the effects of varying deadlines and crisis task group arrival probability. Based on the experiments, we noted the need for different coordination strategies in different situations to achieve good performance. In this section, we intend to demonstrate the power of COLLAGE in choosing the most appropriate coordination strategy in a given situation.
We performed two sets of experiments varying the probability of the centers seeing crisis tasks. In the first set of experiments, the crisis task group arrival probability was 0.25 and in the second set it was 1.0. For both sets of experiments, low priority tasks arrived with a probability of 0.5 and the routine tasks were always seen at the time of new arrivals. A day consisted of a time slice of 140 time units and hence the deadline for the task structures was fixed at 140 time units. In the experiments described here, utility is the primary performance measure. Each message an agent communicates to another agent penalizes the overall utility by a factor called the communication cost.
5.1.1 Results

For the experiments where the crisis task group arrival probability was 0.25, COLLAGE was trained on 4500 instances in Mode 1 and on 10000 instances in Mode 2. For the case where the crisis task group arrival probability was 1.0, it was trained on 2500 instances in Mode 1 and on 12500 instances in Mode 2. In our experiments, the size of the case-base for Mode 1 learning was determined by plotting the performance of the learner, averaged over 500 instances randomly generated by the grammar, for various sizes of the instance-base, at a communication cost of 0. The learning was stopped when the performance improvement tapered off. Mode 2 was trained till it achieved the performance of the Mode 1 learner at a communication cost of 0. Figure 2 shows the average quality over 100 runs for different coordination strategies at various communication costs. The curves for both Mode 1 and Mode 2 learning algorithms lie above those for all the other coordination strategies for the most part in both experiments. We performed a Wilcoxon matched-pair signed ranks analysis to test for significant differences (at significance level 0.05) between average performances of the strategies across communication costs up to 1.0 (as opposed to pairwise tests at each communication cost). This test revealed significant differences between the learning algorithms (both Mode 1 and Mode 2) and the other three coordination strategies, indicating that we can assert with a high degree of confidence that the performance of the learning algorithms across various communication costs is better than statically using any one of the family of coordination strategies. As the communication costs go up, the mean performance of the coordination strategies goes down. For a crisis task group arrival probability of 0.25, the balanced coordination strategy performs better than the learning algorithms at very high communication costs because the learning algorithms use additional units of communication to form the global situation vectors. At very high communication costs, even the three additional meta-level messages for local situation communication (one for each agent) led to large penalties on the utility of the system. At a communication cost of 1.0, the Mode 1 learner and the Mode 2 learner average at 77.72 and 79.98 respectively, whereas choosing balanced always produces an average performance of 80.48. Similar behavior was exhibited at very high communication costs when the crisis task group arrival probability was 1.0. Figure 3 gives the number of coordination strategies of each type chosen in the 100 test runs for Mode 1 learning when the crisis task group probability was 1.0.
5.2 Experiments with Synthetic Domains

In order to test COLLAGE on interesting scenarios with a range of characteristics, we created a number of "synthetic domain theories" using graph grammar formalisms [23]. A synthetic grammar represents a domain theory that is not grounded in any real-life application. We tested COLLAGE on three different synthetic environments generated by their corresponding grammars. In these experiments, there were four agents in the system and each agent has only a partial view of a new task group in the environment. Upon seeing a task group, a COLLAGE agent forms a local situation vector and communicates it to other agents to enable them to get a more global view of the problem solving process. Each agent forms a global situation vector (the same for all agents) and indexes into its instance-base of past experience to choose a good coordination strategy for the present situation from among the balanced, mute, myopic, simple, and tough coordination strategies.
Figure 2: Average Quality versus Communication Cost: (a) Crisis TG 0.25, (b) Crisis TG 1.0

Figure 3: Strategies chosen by COLLAGE M1 for Crisis TG Probability 1.0 at various communication costs
5.2.1 Grammar 1 & Grammar 2

For the domain represented by G1, COLLAGE was trained on 2500 instances in Mode 1 and 18000 instances in Mode 2. Figure 4a shows the average quality over the same 100 runs for different coordination strategies at various communication costs. The curves for both Mode 1 and Mode 2 learning algorithms lie above those for all the other coordination strategies. We performed a Wilcoxon matched-pair signed ranks analysis to test for significant differences (at significance level 0.05) between average performances of the strategies across communication costs. This test revealed significant differences between the learning algorithms (both Mode 1 and Mode 2) and the other five coordination strategies, indicating that we can assert with a high degree of confidence that the performance of the learning algorithms across various communication costs is better than statically using any one of the family of coordination strategies. As with the DDP domain, when the communication costs go up, the mean performance of the learning algorithms goes down. At some high value of communication cost, the performance of the learning algorithms falls below that of mute because learning agents use four units of communication to publish their local situation vectors. The cost of communicating these local situation vectors can overwhelm the benefits derived from a better choice of the coordination strategy. When the communication cost was as high as 0.25, the performances of the Mode 1 and Mode 2 learners were 12.495 and 12.88 respectively. On the other hand, mute coordination gave a mean performance of 13.55.
Figure 4: Average Quality versus Communication Cost: (a) Grammar 1, (b) Grammar 2
Grammar 2 is similar to Grammar 1 in structure, but the distribution of quality and duration of the leaf methods is different from those in Grammar 1. Figure 4b shows the average quality for different coordination strategies at various communication costs. The performance behavior of the learning algorithms is similar to that in the case of Grammar 1. For COLLAGE Mode 2, the average performance in fact falls below that of mute at communication cost 0.075. When the communication cost was as high as 0.25, the performance of COLLAGE Mode 1 was 10.762, which is still higher than mute at 10.17. However, COLLAGE Mode 2 fell to 7.735. A closer examination of Figure 5 and Figure 6, which give the number of coordination strategies of each type chosen in the 100 test runs by COLLAGE in Mode 1 and Mode 2, revealed the reasons for this. As communication costs increase, mute becomes the overwhelming favorite, but even at communication costs as high as 0.08 there are still problems where mute is not the best choice. COLLAGE Mode 1 chose mute in most of the instances (74%) except a few where mute is not a good coordination strategy. This led to a better performance that compensated for the increased communication cost. On the other hand, COLLAGE Mode 2 chose the other communication-intensive coordination strategies more often (39%), leading to poorer performances. We believe that more training in Mode 2 may be able to rectify this situation. Note that we stopped the training of Mode 2 when its performance was around that of the Mode 1 learner at communication cost 0. The divergence between their performances occurred at non-zero communication costs, and hence was not detected during learning.
Figure 5: Coordination Strategies chosen by COLLAGE M1 for G2
5.2.2 When not to Learn

Synthetic grammar "G3" is an interesting case study. We trained COLLAGE on the G3 domain in both Mode 1 and Mode 2 and tested them on 100 runs for different coordination strategies at various communication costs. We found that the tough coordination strategy performs slightly better than COLLAGE (see Figure 7). Upon closer examination of the problem instances, it was noted that tough was the best performer in 81% of the instances and other coordination strategies did better in the remaining 19%. COLLAGE learns to choose the right coordination strategy in all the 100 instances. However, the agents require additional units of communication of meta-level information to form the global situation vector and decide that tough is the strategy of choice (in most cases). The lesson we learn from this grammar is that, if there is an overwhelming favorite for best performance in the family of strategies, then it may not pay to use COLLAGE to determine the best performer through additional situation communication.
Figure 6: Coordination Strategies chosen by COLLAGE M2 for G2
Sticking to the favorite without awareness of the non-local situation may yield as good a performance. However, if the few cases that warrant the choice of another strategy give far superior performance, then the gains from choosing a strategy can more than compensate for the additional communication. This, however, was not the case in environments produced by grammar G3.
Figure 7: Average Quality versus Communication Cost for Grammar G3
5.3 Discussion

COLLAGE chooses an appropriate coordination strategy by projecting decisions from past similar experience into the newly perceived situation. COLLAGE agents performed better than those using a single coordination strategy across all the 100 instances in all domains we experimented with, except G3. In these domains, the cost incurred by additional communication for detecting the global situation is offset by the benefits of choosing a coordination strategy based on globally grounded learned knowledge. Domain G3, however, is distinguished by the fact that there is little variance in the choice of the best coordination strategy and it was almost always tough. This highlights the fact that learning is especially beneficial in more dynamic environments.
6 Conclusion

Many researchers have shown that no single coordination mechanism is good for all situations. However, there is little in the literature that deals with how to dynamically choose a coordination strategy based on the problem solving situation. In this paper, we presented the COLLAGE system, which uses meta-level information in the form of an abstract characterization of the coordination problem instance to learn to choose the appropriate coordination strategy from among a class of strategies. In the first phase of coordination, agents exchange local situations that are abstractions of their views of the problem instances and form a global situation. They use this global situation to guide their learning and the subsequent exploitation of the learned coordination knowledge. Our experiments provide strong empirical evidence of the benefits of learning situation-specific coordination.
However, the scope of situation-specific learning goes beyond learning coordination. The ideas presented here are relevant to the more general problem of learning cooperative control in a multi-agent system. Effective cooperation in a multi-agent system requires that the global problem-solving state influence the local control decisions made by an agent. We call such an influence cooperative control [20]. Coordination strategies are a form of cooperative control. An agent with a purely local view of the problem solving state cannot learn to make effective problem solving control decisions about coordination because these decisions may have global implications. We thus advocate the need for learning globally-situated local cooperative control knowledge. An agent associates appropriate aspects of the global problem solving state with the cooperative control knowledge that it learns. In cooperative systems, an agent can "ask" other agents for any information that it deems relevant to appropriately situate its local control knowledge. However, as mentioned previously, an agent cannot utilize the entire global problem solving state even if other agents are willing to communicate all the necessary information, because of the limitations on its computational resources, representational heterogeneity and phenomena like distraction [19]. This leads us to the importance of communicating only relevant information at suitable abstractions to other agents. We call such meta-level information a situation. An agent thus needs to associate appropriate views of the global situation with the knowledge learned about effective control decisions. We call this form of knowledge situation-specific control. We have looked at the relevance of the theory of situation-specific learning in the context of learning other forms of problem solving control in cooperative multi-agent systems [25, 26].
Moreover, COLLAGE demonstrates the utility and viability of learning coordination in complex, realistic multi-agent systems. The domains in this work are realistic and the coordination strategies are complex. Some of the work in multi-agent robotic systems [22] also deals with realistic and complex systems, but that work is primarily concerned with homogeneous agents. The work in this paper deals with heterogeneous agents. They need to speak a common language only to the extent of being able to understand TÆMS representations. They can organize their local computations and scheduling in any manner they desire.
The work here is exciting to us as much for the kinds of questions and challenges it uncovers as it is for the results obtained so far. Some of the important questions that it raises are "What is a suitable level of abstraction for a situation and how can one determine it?" and "How specific is the instance-base to a particular domain and how much of it can be transferred across domains?".
One important concern about COLLAGE is its scalability. As the number of coordination alternatives becomes large, the learning phase could become computationally very intensive and the instance-base size could increase enormously with respect to Mode 2. We are looking at how to integrate methods for progressively refining situation vectors such as those in [34], ways to organize the instance-base to access and detect regions where there is insufficient learning, and also ways to do more directed experimentation during learning rather than randomly sampling the problem space.
In COLLAGE, all the agents form identical instance-bases. We could just as well have had one designated agent forming the instance-base and choosing the coordination strategy. However, our configuration was set up with a more general scheme in mind. Instead of all agents choosing the same coordination strategy, they can choose in a pairwise or a group-wise manner so that a subset of the agents coordinate to choose the same strategy. This will lead to different case-bases at different agents, and an agent may have more than one case-base if it is a part of more than one group. This leads us to another scalability issue: the number of agents. If there are a large number of agents, then common situation vectors may lose "too many" details about the situations. Pairwise or group-wise coordination may be a better option. However, we have to deal with issues such as inconsistent and conflicting knowledge among the case-bases, formation of appropriate groups, and different amounts of learning for different groups.
References

[1] A. Barto, R. Sutton, and C. Watkins. Learning and sequential decision making. In M. Gabriel and J. W. Moore, editors, Learning and Computational Neuroscience, Cambridge, MA, 1990. MIT Press.

[2] D. W. Aha, D. Kibler, and M. K. Albert. Instance-based learning algorithms. Machine Learning, 6:37–66, 1991.

[3] Robert H. Crites and Andrew G. Barto. Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems 8, MIT Press, Cambridge, MA, 1996.

[4] Keith S. Decker. Environment Centered Analysis and Design of Coordination Mechanisms. PhD thesis, University of Massachusetts, 1995.

[5] Keith S. Decker and Victor R. Lesser. Quantitative modeling of complex computational task environments. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 217–224, Washington, July 1993.

[6] Keith S. Decker and Victor R. Lesser. Designing a family of coordination algorithms. In Proceedings of the First International Conference on Multi-Agent Systems, pages 73–80, San Francisco, CA, June 1995. AAAI Press.

[7] E. Durfee and V. Lesser. Predictability vs. responsiveness: Coordinating problem solvers in dynamic domains. In Proceedings of the Seventh National Conference on Artificial Intelligence, pages 66–71, St. Paul, Minnesota, August 1988.

[8] Mark S. Fox. An organizational view of distributed systems. IEEE Transactions on Systems, Man, and Cybernetics, 11(1):70–80, January 1981.

[9] J. Galbraith. Organizational Design. Addison-Wesley, Reading, MA, 1977.

[10] A. Garland and R. Alterman. Multi-agent learning through collective memory. In Proceedings of the 1996 AAAI Spring Symposium on Adaptation, Co-evolution and Learning in Multiagent Systems, Stanford, CA, 1996.

[11] Alan Garvey, Keith Decker, and Victor Lesser. A negotiation-based interface between a real-time scheduler and a decision-maker. CS Technical Report 94-08, University of Massachusetts, 1994.

[12] Alan Garvey and Victor Lesser. Design-to-time real-time scheduling. IEEE Transactions on Systems, Man and Cybernetics, 23(6):1491–1502, 1993.

[13] Itzhak Gilboa and David Schmeidler. Case-based decision theory. The Quarterly Journal of Economics, pages 605–639, August 1995.

[14] John J. Grefenstette. The evolution of strategies for multi-agent environments. Adaptive Behavior, 1(1):65–89, 1992.

[15] Thomas Haynes and Sandip Sen. Learning cases to resolve conflicts and improve group behavior. Submitted, 1996.

[16] J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, Michigan, 1975.

[17] J. H. Holland. Properties of the bucket brigade algorithm. In First International Conference on Genetic Algorithms and their Applications, pages 1–7, Pittsburgh, PA, 1985.

[18] Paul Lawrence and Jay Lorsch. Organization and Environment. Harvard University Press, Cambridge, MA, 1967.

[19] V. R. Lesser and L. D. Erman. Distributed interpretation: A model and an experiment. IEEE Transactions on Computers, C-29(12):1144–1163, December 1980.

[20] Victor R. Lesser. A retrospective view of FA/C distributed problem solving. IEEE Transactions on Systems, Man, and Cybernetics, 21(6):1347–1362, 1991.

[21] Thomas Malone and Kevin Crowston. Toward an interdisciplinary theory of coordination. Center for Coordination Science Technical Report 120, MIT Sloan School of Management, 1991.

[22] M. J. Mataric. Learning to behave socially. In Proceedings of the Third International Conference on Simulation of Adaptive Behavior (SAB-94), 1994.

[23] M. V. Nagendra Prasad, Keith S. Decker, Alan Garvey, and Victor R. Lesser. Exploring organizational designs with TAEMS: A case study of distributed data processing. In Proceedings of the Second International Conference on Multi-Agent Systems, Kyoto, Japan, December 1996. AAAI Press.

[24] M. V. Nagendra Prasad and V. R. Lesser. Learning situation-specific coordination in Generalized Partial Global Planning. In 1996 AAAI Spring Symposium on Adaptation, Co-evolution and Learning in Multi-agent Systems, Stanford, CA, 1996. AAAI Press.

[25] M. V. Nagendra Prasad, V. R. Lesser, and S. E. Lander. Cooperative learning over composite search spaces: Experiences with a multi-agent design system. In Thirteenth National Conference on Artificial Intelligence (AAAI-96), Portland, Oregon, August 1996. AAAI Press.

[26] M. V. Nagendra Prasad, V. R. Lesser, and S. E. Lander. Learning organizational roles in a heterogeneous multi-agent system. In Proceedings of the Second International Conference on Multi-Agent Systems, Kyoto, Japan, December 1996. AAAI Press.

[27] J. S. Rosenschein and G. Zlotkin. Designing conventions for automated negotiation. AI Magazine, pages 29–46, Fall 1994.

[28] T. Sandholm and R. Crites. Multi-agent reinforcement learning in the repeated prisoner's dilemma. To appear in Biosystems, 1995.

[29] Sandip Sen, M. Sekaran, and J. Hale. Learning to coordinate without sharing information. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 426–431, Seattle, WA, July 1994. AAAI.

[30] Y. Shoham and M. Tennenholtz. On the synthesis of useful social laws for artificial agent societies (preliminary report). In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 276–281, San Jose, July 1992.

[31] S. S. Sian. Extending learning to multiple agents: Issues and a model for multi-agent machine learning. In Proceedings of Machine Learning - EWSL 91, pages 440–456, Springer-Verlag, 1991.

[32] B. Silver, W. Frawley, G. Iba, J. Vittal, and K. Bradford. A framework for multi-paradigmatic learning. In Proceedings of the Seventh International Conference on Machine Learning, pages 348–358, 1990.

[33] Arthur L. Stinchcombe. Information and Organizations. University of California Press, Berkeley, CA, 1990.

[34] T. Sugawara and V. R. Lesser. On-line learning of coordination plans. In Proceedings of the Twelfth International Workshop on Distributed AI, Hidden Valley, PA, May 1993.

[35] R. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44, 1988.

[36] M. Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, pages 330–337, 1993.

[37] G. Weiss. Some studies in distributed machine learning and organizational design. Technical Report FKI-189-94, Institut für Informatik, TU München, 1994.