
Learning situation-specific coordination in cooperative multi-agent systems


M V Nagendra Prasad and Victor R Lesser

Department of Computer Science

University of Massachusetts, Amherst, MA 01002.

nagendra, lesser@cs.umass.edu

Abstract

Achieving effective cooperation in a multi-agent system is a difficult problem for a number of reasons, such as limited and possibly outdated views of the activities of other agents and uncertainty about the outcomes of interacting non-local tasks. In this paper, we present a learning system called COLLAGE that endows the agents with the capability to learn how to choose the most appropriate coordination strategy from a set of available coordination strategies. COLLAGE relies on meta-level information about agents' problem solving situations to guide them towards a suitable choice for a coordination strategy. We present empirical results that strongly indicate the effectiveness of the learning algorithm.

Keywords: Multi-agent Systems, Coordination, Learning


1 Introduction

Coordination is the process of effectively managing interdependencies between activities distributed across agents so as to derive maximum benefit from them [21, 6]. Based on the structure and uncertainty in their environment, agents have to choose and temporally order their activities to mitigate the effects of harmful interdependencies and exploit the beneficial interdependencies among them. Researchers in both human and computational organizational theories have pointed out that there is no single organization or coordination protocol that is the best for all environments. In human organizations, environmental factors such as dynamism and task uncertainty have a strong effect on what coordinated actions are and how organizationally acceptable outcomes arise [18, 9, 33]. These effects have been observed in purely computational organizations as well [8, 7, 6, 23, 26].

Achieving effective coordination in a multi-agent system (MAS) is a difficult problem for a number of reasons. An agent's local control decisions about what activity to do next, what information to communicate and to whom, or what information to ask of others may be inappropriate or suboptimal due to its limited view of the interactions between its own activities and those of the other agents. In order to make more informed control decisions, the agents have to acquire a view of the task structures of other agents. To the extent that this resolves agents' uncertainty about the non-local problem solving activities, they can act coherently. However, an agent has to expend computational resources in acquiring and exploiting such non-local views of other agents' activities. This involves communication delays and the computational cost of providing this information and assimilating the information from other agents. Given the inherent uncertainty in agents' activities and the cost of meta-level processing, relying on sophisticated coordination strategies to acquire non-local views of task structures may not be worthwhile for all problem-solving situations. In certain situations, coordination protocols that permit some level of non-coherent activity and avoid the additional overhead for coordination may lead to better performance [7, 6, 23, 34]. For example, when the agents are under severe time pressure and the load of the activities at the agents is high, sophisticated agent coordination strategies do not generally pay off. Agents may not have the time to benefit from the increased awareness they derive through coordination in such situations.

In this paper, we deal with how agents can learn to dynamically choose the appropriate coordination strategy in different coordination problem instances. We empirically demonstrate that even for a narrow class of agent activities, learning to choose the appropriate coordination strategy based on a meta-level characterization of the global problem solving state outperforms using any single coordination strategy across all problem instances.

In order to accomplish learning, we break the coordination problem into two phases. In the first phase, the agents exchange meta-level information not directly used for coordination. This information is used by the agents to derive a prediction of the effectiveness of various coordination mechanisms in the present problem solving episode. These mechanisms differ in the amount of non-local information they acquire and use, and in the complexity of analysis of interactions between activities at the agents. Agents choose an appropriate subset of the coordination mechanisms (or a coordination strategy) based on the meta-level information and enter Phase II. In this phase, the coordination strategy selected in Phase I decides the types of information to be exchanged and the kind of reasoning about local and non-local activities the agents need to perform to achieve coherent behavior. We call the meta-level information a situation and the two-phase process situation-specific coordination. Learning situation-specific coordination involves associating appropriate views of the global situation with the knowledge learned about the effectiveness of the coordination mechanisms.
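To make the two-phase flow concrete, here is a minimal Python sketch of one coordination episode. All names (Agent, choose_strategy, predict_utility, run_strategy) are illustrative placeholders rather than part of COLLAGE or GPGP; how the situation is composed and how the strategy is chosen are elaborated in Section 4.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Agent:
    """One cooperative agent; the fields here are illustrative placeholders."""
    name: str
    local_situation: Callable[[], List[float]]   # abstraction of this agent's local task view
    run_strategy: Callable[[str], None]          # executes Phase II under a chosen strategy

def choose_strategy(global_situation: List[float],
                    predict_utility: Callable[[List[float], str], float],
                    strategies: Sequence[str]) -> str:
    """Pick the strategy predicted to do best in this situation."""
    return max(strategies, key=lambda s: predict_utility(global_situation, s))

def coordination_episode(agents: List[Agent],
                         predict_utility: Callable[[List[float], str], float],
                         strategies: Sequence[str]) -> str:
    # Phase I: exchange meta-level information (local situation vectors) and
    # compose a common global situation (component-wise average, see Section 4.1.2).
    local_vectors = [agent.local_situation() for agent in agents]
    dims = len(local_vectors[0])
    global_situation = [sum(v[j] for v in local_vectors) / len(local_vectors)
                        for j in range(dims)]

    # All agents agree on one coordination strategy for this episode.
    strategy = choose_strategy(global_situation, predict_utility, strategies)

    # Phase II: the chosen strategy dictates what gets communicated and what
    # reasoning about local and non-local activities each agent performs.
    for agent in agents:
        agent.run_strategy(strategy)
    return strategy
```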


The agents in a cooperative multi-agent system can learn to coordinate by proactively resolving some of their uncertainty about the global problem solving state and situating their learning in more global views.

The rest of the paper is organized as follows. We first place our work in context and discuss its relationship to the existing work on learning in multi-agent systems. We then briefly review the TÆMS task structure representation for coordination problems and the GPGP model for designing coordination strategies. We subsequently describe our learning algorithm that learns how to choose among coordination strategies of different levels of sophistication. We then present some of our experimental results and conclude.

2 Related Work

Previous work related to learning in multi-agent systems is limited and much of this work relies on techniques derived from reinforcement learning [1, 35], genetic algorithms [16] and classifier systems [17]. In Sen, Sekaran and Hale [29] the agents use reinforcement learning to evolve complementary policies in a box pushing task, and in Crites and Barto [3] a team of reinforcement learning agents optimizes elevator dispatching performance. In both these works, the agents do not communicate with one another and any agent treats the other agents as a part of the environment. Weiss [37] uses a variant of Holland's [17] bucket brigade algorithm for learning hierarchical organization structuring relationships in a block world domain via strengthening promising chains of actions through bid-reward cycles. Tan [36] deals with a predator-prey domain in a grid world. The agents share perception information to overcome perceptual limitations or communicate policy functions learned through reinforcement learning. Grefenstette [14] uses genetic algorithms to learn reactive decision rules for agents in a predator-prey domain similar to that in [36]. Sandholm and Crites [28] study the emergence of cooperation in the Iterated Prisoner's Dilemma problem, where the agents are self-interested, and an agent is not free to ask for any kind of information from the other agents. Haynes and Sen [15] present studies in learning cases to resolve conflicts among agents in a predator-prey domain similar to that used by Tan [36]. There is no communication between the agents, and the cases result from the agent's perception of the problem solving state.

The multi-agent systems that this paper deals with consist of complex cooperative agents (each agent is a sophisticated problem solver), and an agent's local problem solving control interacts with that of the other agents in intricate ways. In our work, rather than treating other agents as a part of the environment and learning in the presence of increased uncertainty, agents communicate meta-level information to resolve the uncertainty to the extent possible (there is still the environmental uncertainty that the agents cannot do much about). Agents sharing perceptual information as in Tan [36] and Grefenstette [14] or bidding information as in Weiss [37] do not make explicit the notion of situating the local control knowledge in a more global, abstract situation. The information shared is weak, and the studies were conducted in domains such as predator-prey [36, 14] or blocks world [37] where the need for sharing meta-level information and situating learning in this information is not apparent.

Our previous work explored learning situation-specific organizational roles in a heterogeneous multi-agent system [26]. An organizational role represents a set of tasks an agent can perform on a composite solution. Agents use abstractions of the problem solving state to learn to play the roles they are best suited for in a multi-agent search process for cooperatively constructing an overall solution. The system was tested in a parametric design domain and the learning agents produced designs that, on average, were better than those produced by a system with agents playing roles hand-coded by a human expert.

Sugawara and Lesser [34] also recognize the need for situation specificity in learning coordination, though they do not have the notion of two-phase coordination. They are concerned with learning to make the situations more discriminating to avoid using an inappropriate coordination strategy in the domain of distributed network diagnosis. Their learning mechanisms rely on deep domain knowledge and agent homogeneity assumptions in order to progressively refine situations based on failure-driven explanation and comparative analysis of problem solving traces. They test the theory on a very limited number of coordination situations, and the evidence was anecdotal. It is not clear how such knowledge-intensive learning can be generalized to other instances without significant knowledge engineering and the development of more sophisticated explanation-based learning techniques. Despite these limitations, combining their work on learning situation representations with the learning presented here on situation-based choice of coordination could have interesting implications for situation-specific learning.

Garland and Alterman [10] discuss issues in knowledge reuse in multi-agent systems. Agents working in a simulated MOVERS-WORLD domain have to collaborate to move objects too heavy for a single agent to move. Agents use their past experience to anticipate coordination rather than dynamically communicate to establish it. The motivation for this work is different from ours. In their work, agents exploit their past experience to reduce the control effort needed to achieve coordination in future problem instances. Similar use of past experience has been studied in Nagendra Prasad, Lander and Lesser [25].

Certain multi-agent learning systems in the literature deal with a different task from that presented in this paper. Systems like ILS [32] and MALE [31] use multi-agent techniques to build hybrid learners from multiple learning agents. On the other hand, we deal with learning problem-solving control for multi-agent systems.

3 Coordination in Multi-agent Systems

In this paper, we show the effectiveness of learning situation-specific coordination by extending a domain-independent coordination framework called Generalized Partial Global Planning (GPGP) [6, 4]. GPGP is a flexible and modular method that recognizes the need for creating tailored coordination strategies in response to the characteristics of a particular task environment. It is structured as an extensible set of modular coordination mechanisms so that any subset of mechanisms can be used. Each mechanism can be invoked in response to the detection of certain features of the environment and interdependencies between problem solving activities of agents. The coordination mechanisms can be configured (or parameterized) in different ways to derive different coordination strategies. In GPGP (without the learning extensions), once a set of mechanisms is configured (by a human expert) to derive a coordination strategy, that strategy is used across all problem instances in the environment. Experimental results have already verified that for some environments a subset of the mechanisms is more effective than using the entire set of mechanisms [6, 23]. In this work, we present a learning extension, called COLLAGE, that endows the agents with the capability to choose a suitable subset of the coordination mechanisms based on the present problem solving situation, instead of having a fixed subset across all problem instances in an environment.

GPGP coordination mechanisms rely on the coordination problem instance being represented in a framework called TÆMS [4, 5]. The TÆMS framework (Task Analysis, Environment Modeling, and Simulation) represents coordination problems in a formal, domain-independent way. It captures the essence of the coordination problem instances, independent of the details of the domain, in terms of trade-offs between multiple ways of accomplishing a goal, progress towards the achievement of goals, various interactions between the activities in the problem instance and the effect of these activities on the performance of the system as a whole. TÆMS can model aspects of coordination in complex worth-oriented domains [27] where states have functions that rate their acceptability (not necessarily a binary rating). There are deadlines associated with the tasks and some of the subtasks may be interdependent, that is, they cannot be solved independently in isolation. There may be multiple ways to accomplish a task and these trade off the quality of a result produced for the time to produce that result. Quality is used as a catch-all term representing acceptability characteristics like certainty, precision, and completeness, other than temporal characteristics. A set of related tasks is called a task group or a task structure. A task structure is the elucidation of the structure of the problem an agent is trying to solve. The quality of a task is a function of the quality of its subtasks. The leaf tasks are denoted as methods with base-level quality and duration. Methods represent domain actions, like executing a blackboard knowledge source, running an instantiated plan, or executing a piece of code with its data. A task may have multiple ways to accomplish it, represented by multiple methods, that trade off the time to produce a result for the quality of the result. Figure 1 is an example of a simple task structure. Besides task/subtask relationships, there can be other interdependencies, called coordination interrelationships, between tasks in a task group [5]. In this paper, we will be dealing with two such interrelationships:

– facilitates relationship or soft interrelationship: Task A facilitates Task B if the execution of Task A produces a result that affects an optional parameter of Task B. Task B could have been executed without the result from Task A, but the availability of the result provides constraints on the execution of Task B, leading to improved quality and reduced time requirements.

– enables relationship or hard interrelationship: If Task A enables Task B, then the execution of Task A produces a result that is a required input parameter for Task B. The results of the execution of Task A must be communicated to Task B before it can be executed.

(A minimal data-structure sketch of a task structure with both kinds of interrelationships is given below.)
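The sketch below, in Python, assumes hypothetical class names (Task, Method, Interrelationship) that are ours rather than the TÆMS implementation's; the quality and duration numbers are arbitrary placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Method:
    """Leaf task (domain action) with base-level quality and duration."""
    name: str
    quality: float
    duration: float

@dataclass
class Task:
    """Task whose quality is a function of the quality of its subtasks."""
    name: str
    subtasks: List["Task"] = field(default_factory=list)
    methods: List[Method] = field(default_factory=list)   # alternative ways to achieve the task
    deadline: Optional[float] = None

@dataclass
class Interrelationship:
    """Coordination interrelationship between two tasks in a task group."""
    source: Task
    target: Task
    kind: str   # "enables" (hard) or "facilitates" (soft)

# A toy task group: two subtasks, one with two alternative methods that trade
# quality against duration, plus a soft interrelationship between the subtasks.
m212 = Method("M212", quality=5.0, duration=3.0)
m213 = Method("M213", quality=8.0, duration=6.0)
t21 = Task("T21", methods=[m212, m213])
t22 = Task("T22", methods=[Method("M22", quality=4.0, duration=2.0)])
t2 = Task("T2", subtasks=[t21, t22], deadline=140.0)
relationships = [Interrelationship(t22, t21, "facilitates")]
```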

In GPGP, each agent can be viewed as having three components: a local scheduler, a coordination module and a belief database. The belief database of an agent represents its subjective view of the present coordination problem instance, comprising the agent's knowledge about the task structures in the environment. The local scheduler of an agent uses the information in the belief database to build schedules of method execution actions in order to maximize utility. In this work we use a design-to-time scheduler [12] that can take into account the constraints provided by the coordination module and provide schedules that maximize the global utility measure. The coordination module modulates the local scheduler by noticing the task structures in the belief database and doing a number of activities like gathering information about new task structures in the environment, communicating information about local beliefs to other agents or receiving information from them, and making or retracting commitments. A commitment represents a contract to achieve a particular quality by a specified deadline. The coordination mechanisms are parameterized independently so that each combination of the parameter settings leads to a coordination strategy. Different mechanisms require different types of information to operate. Thus, by choosing different sets of coordination mechanisms, the amount of communication and other overheads associated with coordination can be varied. The interface between the coordination mechanisms and the local scheduler is bidirectional and negotiation-based, permitting the coordination module to ask "what-if" questions [11]. This is a crucial ability whose utility becomes obvious later, when we describe our learning algorithm in more detail.

Figure 1: Example of a Task Structure
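As a rough illustration of the scheduler/coordination-module interface described above, the following sketch shows a commitment record and a hypothetical "what-if" query; the names and signatures are ours and do not reflect the actual GPGP or design-to-time scheduler API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Commitment:
    """A contract to achieve a particular quality on a task by a specified deadline."""
    task: str
    quality: float
    deadline: float

class LocalScheduler:
    """Stand-in for a design-to-time scheduler; only the shape of the interface is shown."""

    def schedule(self, belief_db, commitments: List[Commitment]):
        """Build a schedule of method executions that maximizes utility while
        honoring the constraints (commitments) supplied by the coordination module."""
        raise NotImplementedError

    def what_if(self, belief_db, candidate: Commitment) -> float:
        """Hypothetical query from the coordination module: what schedule quality
        could be achieved if this additional commitment were taken on?  The answer
        lets the module decide whether to make or retract the commitment."""
        raise NotImplementedError
```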

In GPGP, a coordination strategy can be derived by activating a subset of the coordination mechanisms. Given that these mechanisms can be parameterized independently, there are a large number of possible coordination strategies. Learning to choose the best strategy from among such a large set can be quite cumbersome. We suggest two ways to prune this search space, leading to a more manageable set of distinct coordination strategies:

Use domain knowledge to prune down the set of possible coordination strategies to a small number of interesting distinct strategies. In the experiments to be described later, we will be dealing with the distributed data processing domain. In this domain, we investigate three strategies:

– balanced (or dynamic-scheduling): It is the same as previously described. Agents coordinate their actions by dynamically forming commitments. Relevant results are generated by specific times and communicated to the agents to whom corresponding commitments are made. Agents schedule their local tasks trying to maximize the accrual of quality based on the commitments made to them by the other agents, while ensuring that commitments to other agents are satisfied. The agents have the relevant non-local view of the coordination problem, detect coordination relationships, form commitments and communicate the committed results.

– data-flow strategy: An agent communicates the result of performing a task to all the agents, and the other agents can exploit these results if they still can. This represents the other extreme, where there are no commitments from any agent to any other agent.

– rough coordination: This is similar to balanced, but commitments do not arise out of communication between agents; they are known a priori. Each agent has an approximate idea of when the other agents complete their tasks and communicate results, based on its past experience. The agents have the relevant non-local view of the coordination problem and detect coordination relationships, but use rough commitments and communicate the committed results. "Rough commitments" are a form of tacit social contract between agents about the completion times of their tasks. It might be possible to view rough commitments as precompiled social laws [30].

The latter two coordination strategies are the alternatives normally used in the distributed data processing domain [23]. Decker and Lesser [6] proposed balanced as a sophisticated strategy that exploits a number of mechanisms to achieve coordination. In our experiments, agents learn to choose among these three coordination strategies in the domain of distributed data processing. In this domain, three types of task groups arise: routine task groups, crisis task groups, and low priority task groups (see Section 5 for details). The latter two task groups arise only occasionally (i.e., with a certain probability) and hence it is unrealistic to expect commitments on crisis tasks and low priority tasks to follow such tacit a priori rough commitments. So this coordination type uses rough commitments for routine tasks but behaves just like data-flow for the non-routine crisis and low priority tasks.

Cluster the entire set of possible strategies on the performance space and choose a prototypical instance from each cluster. The agent performance can be characterized along a number of dimensions like total quality, number of methods executed, number of communications, and termination time. Decker and Lesser [6] explore the performance space of the coordination strategies based on these four performance measures and identify five prototypical coordination strategies:

– balanced, with all mechanisms for detecting soft and hard coordination interrelationships and forming commitments on them turned on. Only relevant partial views are exchanged between agents for detecting these interrelationships, and results that satisfy commitments are exchanged.

– simple, with no hard or soft coordination mechanisms or exchange of relevant partial non-local views. All results obtained due to task executions are broadcast to all agents.

– myopic, with all commitment mechanisms on but no non-local view.

– tough, with no soft coordination but otherwise the same as balanced.

– mute, with no hard or soft coordination mechanisms, no non-local views, and no communication whatsoever.

These five coordination strategies will be used in our experiments later on some synthetic domains.
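One way to picture a coordination strategy as a configuration of mechanisms is a simple table of on/off flags. The flag names below are our own shorthand for the mechanisms described above, not the actual GPGP parameter names.

```python
# Illustrative only: each strategy is a subset of coordination mechanisms turned on.
COORDINATION_STRATEGIES = {
    #            partial non-local view | hard commits | soft commits | broadcast all results
    "balanced": {"nonlocal_view": True,  "hard": True,  "soft": True,  "broadcast": False},
    "tough":    {"nonlocal_view": True,  "hard": True,  "soft": False, "broadcast": False},
    "myopic":   {"nonlocal_view": False, "hard": True,  "soft": True,  "broadcast": False},
    "simple":   {"nonlocal_view": False, "hard": False, "soft": False, "broadcast": True},
    "mute":     {"nonlocal_view": False, "hard": False, "soft": False, "broadcast": False},
}
```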

4 COLLAGE: Learning Coordination

4.1 Learning Coordination

Our learning algorithm, called COLLAGE, uses abstract meta-level information about coordination problem instances to learn to choose, for the given problem instance, the appropriate coordination strategy from the three strategies described previously. Learning in COLLAGE (COordination Learner for muLtiple AGEnt systems) falls into the category of Instance-Based Learning algorithms [2], originally proposed for supervised classification learning. We, however, use the IBL paradigm for unsupervised learning of decision-theoretic choice. We would like to note that we could as well have used some other method like neural nets or decision trees. Our interest in this work is not in studying the comparative performances of various algorithms.

Learning involves running the multi-agent system on a large number of training coordination problem instances and observing the performance of different coordination strategies on these instances. When a new task group arises in the environment, each of the agents has its own partial local view of the task group. Based on its local view, each agent forms a local situation vector. A local situation represents an agent's assessment of the utility of reacting to various characteristics of the environment. Such an assessment can potentially indicate the need for activating the various GPGP mechanisms and consequently has a direct bearing on the type of coordination strategy that is best for the given coordination episode. The agents then exchange their local situation vectors and each of the agents composes all the local situation vectors into a global situation vector. All agents agree on a choice of the coordination strategy and the choice depends on the learning mode of the agents:

Mode 1: In this mode, the agents run all the available coordination strategies and note their relative performances for each of the coordination problem instances. Thus, for example, agents run each of data-flow, rough, and balanced for a coordination episode and store their performances for each strategy.

Mode 2: In this mode, the agents choose one of the coordination strategies for a given coordination episode and observe and store the performance only for that coordination strategy. They choose the coordination strategy that is represented the least number of times in the neighborhood of a small radius around the present global situation. This is done to obtain a balanced representation for all the coordination strategies across the space of possible global situations.

Mode 2 is a quasi-online algorithm. In the initial stages it just explores and in the later stages it just exploits the learned information. Studying COLLAGE in a setup that is more typical of online learning algorithms, where exploration and exploitation are interleaved, is high on our agenda of future work.
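A compact Python sketch of the two training modes follows. Here run_and_score stands in for actually running a coordination episode under a given strategy and measuring its utility, and the radius value is an arbitrary placeholder; none of these names come from the COLLAGE implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Case = Tuple[List[float], str, float]   # (global situation, strategy, observed utility)

def least_represented(situation: List[float], cases: List[Case],
                      strategies: Sequence[str], radius: float) -> str:
    """Mode 2 exploration: pick the strategy with the fewest stored cases in a
    small neighborhood around the present global situation."""
    counts = {s: 0 for s in strategies}
    for stored_situation, strategy, _ in cases:
        if math.dist(situation, stored_situation) <= radius:
            counts[strategy] += 1
    return min(strategies, key=lambda s: counts[s])

def train(episodes: Sequence[List[float]],
          run_and_score: Callable[[List[float], str], float],
          strategies: Sequence[str], mode: int, radius: float = 0.1) -> List[Case]:
    """Build the instance-base over a set of training episodes (global situations)."""
    cases: List[Case] = []
    for situation in episodes:
        if mode == 1:
            chosen = list(strategies)   # Mode 1: run every strategy and record each outcome
        else:
            chosen = [least_represented(situation, cases, strategies, radius)]   # Mode 2
        for strategy in chosen:
            utility = run_and_score(situation, strategy)   # run the episode under this strategy
            cases.append((situation, strategy, utility))   # grow the instance-base
    return cases
```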

At the end of each run of the coordination episode with a particular coordination strategy, the performance of the system is registered. This is represented as a vector of performance measures such as total quality, number of communications, and termination time. Learning involves simply adding the new instance, formed by the performance of the coordination strategy along with the associated problem solving situation, to the "instance-base". Thus, the training phase builds a set of such instances.

requirements as far as possible but with a sacrifice in quality. Under time pressure, lower quality, lower duration methods are preferred over higher quality, higher duration methods for achieving a particular task. In order to get an estimate of the time pressure, an agent generates virtual task structures from its local task structures by setting the deadlines of the task groups, tasks and methods to ∞ (a large number) and scheduling these virtual task structures. The agents schedule again with local task structures set to the actual deadline. Time pressure is obtained as the ratio of the schedule quality with the actual deadlines and the schedule quality with large deadlines.

The sixth component represents the load. It is formed using methods similar to those discussed above for time pressure, but using the duration of the schedules formed with the virtual task structures. The time taken by the schedule when it has no time pressure (deadline is ∞) represents the amount of work the agent would have done under ideal conditions. However, time pressure makes the agent work almost right up to the deadline. It may not work all the way to the deadline as it may not find a method that can be fitted into the last available chunk of time. Thus, load is obtained as the ratio of the execution time under the actual deadline and the execution time under no time pressure.
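In formula form (our notation, reconstructed from the two ratios described above, with $Q$ for schedule quality and $T$ for schedule duration):

$$\text{time pressure} = \frac{Q(\text{actual deadline})}{Q(\text{deadline} = \infty)}, \qquad \text{load} = \frac{T(\text{actual deadline})}{T(\text{deadline} = \infty)}.$$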

4.1.2 Forming a Global Situation Vector

Each agent communicates its local situation vector to all other agents. An agent composes all the local situation vectors, its own and those it received from others, to form a global situation vector. A number of composition functions are possible. The one we used in the experiments reported here is simple: a component-wise average of the local situation vectors. Thus the global situation vector has six components where each component is the average of the corresponding components of the local situation vectors.
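With $s^{(a)}$ denoting agent $a$'s local situation vector and $n$ the number of agents (symbols introduced here only for illustration), the composition used is simply

$$g_j = \frac{1}{n} \sum_{a=1}^{n} s^{(a)}_j, \qquad j = 1, \ldots, 6.$$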

An example global situation vector looks as follows: (0.82 0.77 0.66 0.89 1.0 0.87). Here the low value of the third component represents large quality gains by detecting and coordinating on hard interrelationships. Thus two of the more sophisticated coordination strategies, called balanced and tough [6], are found to be better performers in this situation. On the other hand, in a global situation vector such as (0.80 0.90 0.88 0.80 0.61 0.69), the low values of the fifth and sixth components indicate high time pressure and load in the present problem solving episode. Even if the agents use sophisticated strategies to coordinate, they may not have the time to benefit from it. Hence, relatively simple coordination strategies like simple or mute [6] do better in this scenario.

Note, however, that in most situation vectors, these trade-offs are subtle and not as obvious as in the above examples. It is difficult for a human to look at the situations and easily predict which strategy is the best performer. The trade-offs may be very specific to the kinds of task structures that occur in the domain. Hence, hand-coding the strategies by a designer is not a practical alternative. Once the entire instance-base is formed, each of the features for each instance is normalized by the range of the maximum and minimum for that feature within the entire instance-base. This is done in order to avoid biasing the similarity metric in favor of any particular feature.
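Writing $x_{ij}$ for feature $j$ of instance $i$ (notation ours), one reading of this normalization is the usual min-max scaling; the wording "normalized by the range" could also mean dividing by the range alone, so take the exact form below as an assumption:

$$\hat{x}_{ij} = \frac{x_{ij} - \min_k x_{kj}}{\max_k x_{kj} - \min_k x_{kj}}.$$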


4.2 Choosing a Coordination Strategy

COLLAGE chooses a coordination strategy based on how the set of available strategies performed in similar past cases. We adopt the notation from Gilboa and Schmeidler [13]. Each case is a triplet $(q, a, r) \in Q \times A \times R$, where $Q$ is the set of situations representing abstract characterizations of coordination problems, $A$ is the set of coordination choices available, and $R$ is the set of results from running the coordination strategies.

Decisions about coordination strategy choice are made based on similar past cases. Outcomes decide the desirability of the strategies. We define a similarity function and a utility function as follows:

$$s : Q \times Q \rightarrow [0, 1], \qquad u : R \rightarrow \mathbb{R}.$$

In the experiments presented later, we use the Euclidean metric for similarity.

The desirability of a coordination strategy is determined by a similarity-weighted sum of the utility it yielded in the similar past cases in a small neighborhood around the present situation vector. We observed that such an averaging process in a neighborhood around the present situation vector was more robust than taking the nearest neighbor, possibly because the averaging process was less sensitive to noise. Let $M(p)$ be the set of past cases similar to problem $p$ (similarity greater than a threshold similarity). For $a \in A$, let $M_a(p) = \{(q, a, r) \in M(p)\}$. The utility of $a$ is defined as

$$U(a) = \sum_{(q, a, r) \in M_a(p)} s(p, q)\, u(r).$$

Gilboa and Schmeidler [13] describe case-based decision theory as a normative theory of human behavior during decision making. Even though we adopt their notation, there are crucial differences in the motivations and structure of the two works. Gilboa and Schmeidler are primarily concerned with a descriptive theory of human decision making while our work, developed independently, is concerned primarily with building computational systems based on case-based decision theory.


of interrelationships in task structures. This often gives rise to a wide range of task structures and a huge variance in the types of capabilities needed by the system to effectively handle them. Accordingly, the learning algorithms showed at best modest gains in performance [24]. More importantly, it is unlikely that most real applications involve an infinite variety of task structures. The domain semantics dictate and limit the morphology of the task structures. While there is bound to be some randomness in these structures, it is highly unlikely that the only regularity that can be modeled in the task structure representations of a coordination problem instance is a few parameters like its mean depth or branching factor. Nagendra Prasad et al. [23] developed a graph-grammar-based stochastic task structure description language and generation tool for modeling task structures arising in a domain. Such a graph-grammar-based task structure specification language is powerful enough to model the topological relationships occurring in task structures representing many real-life applications. For the experiments below, we use this tool to model a distributed data processing domain as in [23]. We now briefly introduce this domain and refer the interested reader to [23] for further details.

The distributed data processing domain consists of a number of geographically dispersed data processing centers (agents). Each center is responsible for conducting certain types of analysis tasks on streams of satellite data arriving at its site: "routine analysis" that needs to be performed on data coming in at regular intervals during the day, "crisis analysis" that needs to be performed on the incoming data but with a certain probability, and "low priority analysis", the need for which arises at the beginning of the day with a certain probability. Low priority analysis involves performing specialized analysis on specific archival data. Different types of analysis tasks have different priorities. A center should first attend to the "crisis analysis tasks" and then perform "routine tasks" on the data. Time permitting, it can handle the low-priority tasks. The processing centers have limited resources to conduct their analysis on the incoming data and they have to do this within certain deadlines. Results of processing data at a center may need to be communicated to other centers due to the interrelationships between the tasks at these centers. In [23], we give details of how a stochastic graph-grammar can model task structures arising in a domain such as this. In the same work, we also present the results of empirical explorations of the effects of varying deadlines and crisis task group arrival probability. Based on the experiments, we noted the need for different coordination strategies in different situations to achieve good performance. In this section, we intend to demonstrate the power of COLLAGE in choosing the most appropriate coordination strategy in a given situation.

We performed two sets of experiments varying the probability of the centers seeing crisis tasks. In the first set of experiments, the crisis task group arrival probability was 0.25 and in the second set it was 1.0. For both sets of experiments, low priority tasks arrived with a probability of 0.5 and the routine tasks were always seen at the time of new arrivals. A day consisted of a time slice of 140 time units and hence the deadline for the task structures was fixed at 140 time units. In the experiments described here, utility is the primary performance measure. Each message an agent communicates to another agent penalizes the overall utility by a factor called communication cost.

5.1.1 Results

For the experiments where the crisis task group arrival probability was 0.25, COLLAGE was trained on 4500 instances in Mode 1 and on 10000 instances in Mode 2. For the case where the crisis task group arrival probability was 1.0, it was trained on 2500 instances in Mode 1 and on 12500 instances in Mode 2. In our experiments, the size of the case-base for Mode 1 learning was determined by plotting the performance of the learner, averaged over 500 instances randomly generated by the grammar, for various sizes of the instance-base, at a communication cost of 0. The learning was stopped when the performance improvement tapered off. Mode 2 was trained till it achieved the performance of the Mode 1 learner at a communication cost of 0. Figure 2 shows the average quality over 100 runs for different coordination strategies at various communication costs. The curves for both Mode 1 and Mode 2 learning algorithms lie above those for all the other coordination strategies for the most part in both experiments. We performed a Wilcoxon matched-pair signed-ranks analysis to test for significant differences (at significance level 0.05) between average performances of the strategies across communication costs up to 1.0 (as opposed to pairwise tests at each communication cost). This test revealed significant differences between the learning algorithms (both Mode 1 and Mode 2) and the other three coordination strategies, indicating that we can assert with a high degree of confidence that the performance of the learning algorithms across various communication costs is better than statically using any one of the family of coordination strategies. As the communication costs go up, the mean performance of the coordination strategies goes down. For a crisis task group arrival probability of 0.25, the balanced coordination strategy performs better than the learning algorithms at very high communication costs because the learning algorithms use additional units of communication to form the global situation vectors. At very high communication costs, even the three additional meta-level messages for local situation communication (one for each agent) led to large penalties on the utility of the system. At a communication cost of 1.0, the Mode 1 learner and Mode 2 learner average 77.72 and 79.98 respectively, whereas choosing balanced always produces an average performance of 80.48. Similar behavior was exhibited at very high communication costs when the crisis task group arrival probability was 1.0. Figure 3 gives the number of coordination strategies of each type chosen in the 100 test runs for Mode 1 learning when the crisis task group probability was 1.0.
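For reference, a matched-pairs test of this kind can be run with SciPy; the helper below is only a sketch of the analysis described above, with the paired per-communication-cost mean utilities supplied by the caller.

```python
from scipy.stats import wilcoxon

def compare_strategies(utilities_a, utilities_b, alpha=0.05):
    """Wilcoxon matched-pairs signed-ranks test on two paired sequences of mean
    utilities (one entry per communication cost).  Returns the test statistic,
    the p-value, and whether the difference is significant at level alpha."""
    statistic, p_value = wilcoxon(utilities_a, utilities_b)
    return statistic, p_value, p_value < alpha
```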

Figure 2: Average Quality versus Communication Cost. (a) Crisis TG 0.25; (b) Crisis TG 1.0. (Curves: Collage M1, Collage M2, dataflow, balanced, rough.)

Figure 3: Strategies chosen by COLLAGE M1 for Crisis TG Probability 1.0 at various communication costs.

5.2 Experiments with Synthetic Domains

In order to test COLLAGE on interesting scenarios with a range of characteristics, we created a number of "synthetic domain theories" using graph grammar formalisms [23]. A synthetic grammar represents a domain theory that is not grounded in any real-life application. We tested COLLAGE on three different synthetic environments generated by their corresponding grammars. In these experiments, there were four agents in the system and each agent has only a partial view of a new task group in the environment. Upon seeing a task group, a COLLAGE agent forms a local situation vector and communicates it to other agents to enable them to get a more global view of the problem solving process. Each agent forms a global situation vector (the same for all agents) and indexes into its instance base of past experience to choose a good coordination strategy for the present situation from among the balanced, mute, myopic, simple, and tough coordination strategies.

5.2.1 Grammar 1 & Grammar 2

For the domain represented by G1, COLLAGE was trained on 2500 instances in Mode 1 and 18000 instances in Mode 2. Figure 4a shows the average quality over the same 100 runs for different coordination strategies at various communication costs. The curves for both Mode 1 and Mode 2 learning algorithms lie above those for all the other coordination strategies. We performed a Wilcoxon matched-pair signed-ranks analysis to test for significant differences (at significance level 0.05) between average performances of the strategies across communication costs. This test revealed significant differences between the learning algorithms (both Mode 1 and Mode 2) and the other five coordination strategies, indicating that we can assert with a high degree of confidence that the performance of the learning algorithms across various communication costs is better than statically using any one of the family of coordination strategies. As with the DDP domain, when the communication costs go up, the mean performance of the learning algorithms goes down. At some high value of communication cost, the performance of the learning algorithm falls below that of mute because learning agents use four units of communication to publish their local situation vectors. The cost of communicating these local situation vectors can overwhelm the benefits derived from a better choice of the coordination strategy. When the communication cost was as high as 0.25, the performances of the Mode 1 and Mode 2 learners were 12.495 and 12.88 respectively. On the other hand, mute coordination gave a mean performance of 13.55.

Figure 4: Average Quality versus Communication Cost. (a) Grammar 1; (b) Grammar 2.

Grammar 2 is similar to Grammar 1 in structure but the distribution of quality and duration of the leaf methods is different from those in Grammar 1. Figure 4b shows the average quality for different coordination strategies at various communication costs. The performance behavior of the learning algorithms is similar to that in the case of Grammar 1. For COLLAGE Mode 2, the average performance in fact falls below that of mute at communication cost 0.075. When the communication cost was as high as 0.25, the performance of COLLAGE Mode 1 was 10.762, which is still higher than mute at 10.17. However, COLLAGE Mode 2 fell to 7.735. A closer examination of Figure 5 and Figure 6, which give the number of coordination strategies of each type chosen in the 100 test runs by COLLAGE in Mode 1 and Mode 2, revealed the reasons for this. As communication costs increase, mute becomes the overwhelming favorite, but even at communication costs as high as 0.08 there are still problems where mute is not the best choice. COLLAGE Mode 1 chose mute in most of the instances (74%), except a few where mute is not a good coordination strategy. This led to a better performance that compensated for the increased communication cost. On the other hand, COLLAGE Mode 2 chose more of the other communication-intensive coordination strategies (39%), leading to poorer performances. We believe that more training in Mode 2 may be able to rectify this situation. Note that we stopped the training of Mode 2 when its performance was around that of the Mode 1 learner at communication cost 0. The divergence between their performances occurred at non-zero communication costs, and hence was not detected during learning.

Figure 5: Coordination Strategies chosen by COLLAGE M1 for G2

Figure 6: Coordination Strategies chosen by COLLAGE M2 for G2

5.2.2 When not to Learn

Synthetic grammar "G3" is an interesting case study. We trained COLLAGE on the G3 domain in both Mode 1 and Mode 2 and tested them on 100 runs for different coordination strategies at various communication costs. We found that the tough coordination strategy performs slightly better than COLLAGE (see Figure 7). Upon closer examination of the problem instances, it was noted that tough was the best performer in 81% of the instances and other coordination strategies did better in the remaining 19%. COLLAGE learns to choose the right coordination strategy in all the 100 instances. However, the agents require additional units of communication of meta-level information to form the global situation vector and decide that tough is the strategy of choice (in most cases). The lesson we learn from this grammar is that, if there is an overwhelming favorite for best performance in the family of strategies, then it may not pay to use COLLAGE to determine the best performer through additional situation communication. Sticking to the favorite without awareness of the non-local situation may yield as good a performance. However, if the few cases that warrant the choice of another strategy give far superior performance, then the gains from choosing a strategy can more than compensate for the additional communication. This, however, was not the case in the environments produced by grammar G3.

Figure 7: Average Quality versus Communication Cost for Grammar G3. (Curves: ibl-offline, ibl-online, mute, balanced, myopic, simple, tough.)

5.3 Discussion

COLLAGE chooses an appropriate coordination strategy by projecting decisions from past similar experience into the newly perceived situation. COLLAGE agents performed better than those using a single coordination strategy across all the 100 instances in all domains we experimented with, except G3. In these domains, the cost incurred by additional communication for detecting the global situation is offset by the benefits of choosing a coordination strategy based on globally grounded learned knowledge. Domain G3, however, is distinguished by the fact that there is little variance in the choice of the best coordination strategy and it was almost always tough. This highlights the fact that learning is especially beneficial in more dynamic environments.

6 Conclusion

Many researchers have shown that no single coordination mechanism is good for all situations. However, there is little in the literature that deals with how to dynamically choose a coordination strategy based on the problem solving situation. In this paper, we presented the COLLAGE system, which uses meta-level information in the form of an abstract characterization of the coordination problem instance to learn to choose the appropriate coordination strategy from among a class of strategies. In the first phase of coordination, agents exchange local situations that are abstractions of their views of the problem instances and form a global situation. They use this global situation to guide their learning and the subsequent exploitation of the learned coordination knowledge. Our experiments provide strong empirical evidence of the benefits of learning situation-specific coordination.

However, the scope of situation-specific learning goes beyond learning coordination. The ideas presented here are relevant to the more general problem of learning cooperative control in a multi-agent system. Effective cooperation in a multi-agent system requires that the global problem-solving state influence the local control decisions made by an agent. We call such an influence cooperative control [20]. Coordination strategies are a form of cooperative control. An agent with a purely local view of the problem solving state cannot learn to make effective problem solving control decisions about coordination because these decisions may have global implications. We thus advocate the need for learning globally-situated local cooperative control knowledge. An agent associates appropriate aspects of the global problem solving state with the cooperative control knowledge that it learns. In cooperative systems, an agent can "ask" other agents for any information that it deems relevant to appropriately situate its local control knowledge. However, as mentioned previously, an agent cannot utilize the entire global problem solving state, even if other agents are willing to communicate all the necessary information, because of the limitations on its computational resources, representational heterogeneity and phenomena like distraction [19]. This leads us to the importance of communicating only relevant information, at suitable abstractions, to other agents. We call such meta-level information a situation. An agent thus needs to associate appropriate views of the global situation with the knowledge learned about effective control decisions. We call this form of knowledge situation-specific control. We have looked at the relevance of the theory of situation-specific learning in the context of learning other forms of problem solving control in cooperative multi-agent systems [25, 26].

Moreover, COLLAGE demonstrates the utility and viability of learning coordination in complex, realistic multi-agent systems. The domains in this work are realistic and the coordination strategies are complex. Some of the work in multi-agent robotic systems [22] also deals with realistic and complex systems but that work is primarily concerned with homogeneous agents. The work in this paper deals with heterogeneous agents. They need to speak a common language to the extent of being able to understand TÆMS representations. They can organize their local computations and scheduling in any manner they desire.

The work here is exciting to us as much for the kinds of questions and challenges it uncovers as it is for the results obtained so far. Some of the important questions that it raises are: "What is a suitable level of abstraction for a situation and how can one determine it?", and "How specific is the instance base to a particular domain and how much of it can be transferred across domains?"

One important concern about COLLAGE is its scalability. As the number of coordination alternatives becomes large, the learning phase could become computationally very intensive and the instance-base size could increase enormously with respect to Mode 2. We are looking at how to integrate methods for progressively refining situation vectors such as those in [34], ways to organize the instance-base to access and detect regions where there is insufficient learning, and also ways to do more directed experimentation during learning rather than randomly sampling the problem space.

In COLLAGE, all the agents form identical instance-bases. We could as well have done with one designated agent forming the instance-base and choosing the coordination strategy. However, our configuration was set up with a more general scheme in mind. Instead of all agents choosing the same coordination strategy, they can choose in a pairwise or a group-wise manner so that a subset of the agents coordinate to choose the same strategy. This will lead to different case-bases at different agents and an agent may have more than one case-base if it is a part of more than one group. This leads us to another scalability issue: the number of agents. If there are a large number of agents, then common situation vectors may lose "too many" details about the situations. Pairwise or group-wise coordination may be a better option. However, we have to deal with issues such as inconsistent and conflicting knowledge among the case-bases, formation of appropriate groups, and different amounts of learning for different groups.

References

[1] R. Sutton, A. Barto, and C. Watkins. Learning and sequential decision making. In M. Gabriel and J. W. Moore, editors, Learning and Computational Neuroscience, Cambridge, MA, 1990. MIT Press.

[2] D. W. Aha, D. Kibler, and M. K. Albert. Instance-based learning algorithms. Machine Learning, 6:37–66, 1991.

[3] Robert H. Crites and Andrew G. Barto. Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems 8, MIT Press, Cambridge, MA, 1996.

[4] Keith S. Decker. Environment Centered Analysis and Design of Coordination Mechanisms. PhD thesis, University of Massachusetts, 1995.

[5] Keith S. Decker and Victor R. Lesser. Quantitative modeling of complex computational task environments. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 217–224, Washington, July 1993.

[6] Keith S. Decker and Victor R. Lesser. Designing a family of coordination algorithms. In Proceedings of the First International Conference on Multi-Agent Systems, pages 73–80, San Francisco, CA, June 1995. AAAI Press.

[7] E. Durfee and V. Lesser. Predictability vs. responsiveness: Coordinating problem solvers in dynamic domains. In Proceedings of the Seventh National Conference on Artificial Intelligence, pages 66–71, St. Paul, Minnesota, August 1988.

[8] Mark S. Fox. An organizational view of distributed systems. IEEE Transactions on Systems, Man, and Cybernetics, 11(1):70–80, January 1981.

[9] J. Galbraith. Organizational Design. Addison-Wesley, Reading, MA, 1977.

[10] A. Garland and R. Alterman. Multi-agent learning through collective memory. In Proceedings of the 1996 AAAI Spring Symposium on Adaptation, Co-evolution and Learning in Multiagent Systems, Stanford, CA, 1996.

[11] Alan Garvey, Keith Decker, and Victor Lesser. A negotiation-based interface between a real-time scheduler and a decision-maker. CS Technical Report 94-08, University of Massachusetts, 1994.

[12] Alan Garvey and Victor Lesser. Design-to-time real-time scheduling. IEEE Transactions on Systems, Man and Cybernetics, 23(6):1491–1502, 1993.

[13] Itzhak Gilboa and David Schmeidler. Case-based decision theory. The Quarterly Journal of Economics, pages 605–639, August 1995.

[14] John J. Grefenstette. The evolution of strategies for multi-agent environments. Adaptive Behavior, 1(1):65–89, 1992.

[15] Thomas Haynes and Sandip Sen. Learning cases to resolve conflicts and improve group behavior. Submitted, 1996.

[16] J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, Michigan, 1975.

[17] J. H. Holland. Properties of the bucket brigade algorithm. In First International Conference on Genetic Algorithms and their Applications, pages 1–7, Pittsburgh, PA, 1985.

[18] Paul Lawrence and Jay Lorsch. Organization and Environment. Harvard University Press, Cambridge, MA, 1967.

[19] V. R. Lesser and L. D. Erman. Distributed interpretation: A model and an experiment. IEEE Transactions on Computers, C-29(12):1144–1163, December 1980.

[20] Victor R. Lesser. A retrospective view of FA/C distributed problem solving. IEEE Transactions on Systems, Man, and Cybernetics, 21(6):1347–1362, 1991.

[21] Thomas Malone and Kevin Crowston. Toward an interdisciplinary theory of coordination. Center for Coordination Science Technical Report 120, MIT Sloan School of Management, 1991.

[22] M. J. Mataric. Learning to behave socially. In Proceedings of the Third International Conference on Simulation of Adaptive Behavior (SAB-94), 1994.

[23] M. V. Nagendra Prasad, Keith S. Decker, Alan Garvey, and Victor R. Lesser. Exploring organizational designs with TAEMS: A case study of distributed data processing. In Proceedings of the Second International Conference on Multi-Agent Systems, Kyoto, Japan, December 1996. AAAI Press.

[24] M. V. Nagendra Prasad and V. R. Lesser. Learning situation-specific coordination in Generalized Partial Global Planning. In 1996 AAAI Spring Symposium on Adaptation, Co-evolution and Learning in Multi-agent Systems, Stanford, CA, 1996. AAAI Press.

[25] M. V. Nagendra Prasad, V. R. Lesser, and S. E. Lander. Cooperative learning over composite search spaces: Experiences with a multi-agent design system. In Thirteenth National Conference on Artificial Intelligence (AAAI-96), Portland, Oregon, August 1996. AAAI Press.

[26] M. V. Nagendra Prasad, V. R. Lesser, and S. E. Lander. Learning organizational roles in a heterogeneous multi-agent system. In Proceedings of the Second International Conference on Multi-Agent Systems, Kyoto, Japan, December 1996. AAAI Press.

[27] J. S. Rosenschein and G. Zlotkin. Designing conventions for automated negotiation. AI Magazine, pages 29–46, Fall 1994.

[28] T. Sandholm and R. Crites. Multi-agent reinforcement learning in the repeated prisoner's dilemma. To appear in Biosystems, 1995.

[29] Sandip Sen, M. Sekaran, and J. Hale. Learning to coordinate without sharing information. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 426–431, Seattle, WA, July 1994. AAAI.

[30] Y. Shoham and M. Tennenholtz. On the synthesis of useful social laws for artificial agent societies (preliminary report). In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 276–281, San Jose, July 1992.

[31] S. S. Sian. Extending learning to multiple agents: Issues and a model for multi-agent machine learning. In Proceedings of Machine Learning - EWSL 91, pages 440–456. Springer-Verlag, 1991.

[32] B. Silver, W. Frawley, G. Iba, J. Vittal, and K. Bradford. A framework for multi-paradigmatic learning. In Proceedings of the Seventh International Conference on Machine Learning, pages 348–358, 1990.

[33] Arthur L. Stinchcombe. Information and Organizations. University of California Press, Berkeley, CA, 1990.

[34] T. Sugawara and V. R. Lesser. On-line learning of coordination plans. In Proceedings of the Twelfth International Workshop on Distributed AI, Hidden Valley, PA, May 1993.

[35] R. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44, 1988.

[36] M. Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, pages 330–337, 1993.

[37] G. Weiss. Some studies in distributed machine learning and organizational design. Technical Report FKI-189-94, Institut für Informatik, TU München, 1994.

