您的当前位置：首页 Online Stochastic Optimization Without Distributions

Online Stochastic Optimization Without Distributions

来源：好走旅游网

OnlineStochasticOptimizationWithoutDistributions

RussellBentandPascalVanHentenryck

BrownUniversityProvidence,RI02912{rbent,pvh}@cs.brown.edu

Abstract

Thispaperconsidersonlinestochasticschedulingprob-lemswheretimeconstraintsseverelylimitthenumberofofﬂineoptimizationswhichcanbeperformedatde-cisiontimeand/orinbetweendecisions.Priorresearchhasdemonstratedthat,wheneveradistributionoftheinputsisavailableforsampling,onlinestochaticalgo-rithmsmayproducesigniﬁcantimprovementsinsolu-tionqualityoverobliviousapproaches.However,theavailabilityofaninputdistribution,althoughreason-ableinmanycontexts,istoostrongarequirementinavarietyofapplications.Thispaperbroadenstheap-plicabilityofonlinestochasticalgorithmsbyrelaxingthisrequirementandusingmachinelearningtechniquesorhistoricaldatainstead.Inparticular,itshowsthatmachinelearningtechniquescanbeengineeredtolearnthedistributiononline,whenitsunderlyingstructureisavailable.Moreover,thepaperpresentstheideaofhistoricalsamplingwhichprovideasimpleandeffec-tivewaytoleveragehistoricaldataincontinuousandperiodiconlineoptimization.Experimentalresultsonpacketschedulingandvehicleroutingindicatethepo-tentialofmachinelearningandhistoricalsamplingforonlinescheduling.

Introduction

Onlineschedulingandroutingproblemsarisenaturallyinmanyapplicationareasandhavereceivedincreasingatten-tioninrecentyears.Contrarytoofﬂineoptimization,thedataisnotavailableaprioriinonlineoptimization.Ratheritisincrementallyrevealedduringalgorithmexecution.Inmanyonlineoptimizationproblems,thedataisasetofre-quests(e.g.,packetsinnetworkschedulingorcustomersinvehiclerouting)whicharerevealedovertimeandthealgo-rithmmustdecidewhichrequesttoprocessnext.Inaddi-tion,timeconstraintstypicallyrestrictthenumberofofﬂineoptimizationsthatcanbeperformedatdecisiontimeand/orinbetweendecisions.Onlineproblemsofthiskindariseinmanyapplications,includingvehiclerouting,taxidispach-ing,packetscheduling,andonlinedeliveries.Itisalsousefultodistinguishtwoclassesofsuchproblems:continuousandperiodicoptimization.Incontinuousoptimation,theonline

initiallyproposedin(Chang,Givan,&Chong2000),thedistributionsarespeciﬁedintermsofMarkovModels(MM)while,invehiclerouting,onlyhistoricaldatafromprevi-ousdaysisavailable.Theexperimentalresultsarepartic-ularlyinteresting.Onpacketschedulingproblems,theMLapproachcanbeengineeredtobesufﬁcientlyfastandpre-cisesothatonlinealgorithmsbecomeessentiallycompara-blewhenusingtheexactorthelearneddistributions.More-over,historicalsampling,whichisamazinglysimpletoim-plement,isalsoshowntobecomparablewiththeexactdis-tribution.Thevalueofhistoricalsamplingisalsodemon-stratedonvehiclerouting,conﬁrmingitseffectivenessofbothcontinuousandperiodicapplications.Interestlingly,historicalsamplingremainseffectiveevenifthehistoricaldataisrelativelysmall.

Asaconsequence,thepaperdemonstratesthatonlinestochasticoptimizationalgorithmscanbeeffectiveevenwithoutdistributions,broadeningtheirapplicabilitysigniﬁ-cantlythankstotheuseofmachinelearningortheexploita-tionofhistoricaldata.

Therestofthispaperisorganizedasfollows.Theﬁrstthreesectionspresenttheonlinestochasticframework,theonlinestochasticalgorithms,thesamplingalgorithmwhenthedistributionisknown.Thenextthreesectionsconsti-tutethecoreofthepaper.TheydescribetheMLandHDapproachesandtheirexperimentalresults.Thelastsectionconcludesthepaper.

TheOnlineStochasticFramework

Thissectionpresentstheonlinestochasticframeworkinitssimplestformtocristallizetheideas.Itsgeneralizationsaredescribedindetailin(Bent&VanHentenryck2004a).TheOfﬂineProblemTheframeworkassumesadiscretemodeloftimeandtheofﬂineproblemconsidersatimehori-zonH=[HH]andanumberofrequestsR.Eachrequestrisassociatedwithaweightw(r)whichrepresentsthegainiftherequestisserved.Asolutiontotheofﬂineproblemservesarequestrateachtimet∈HandcanbeviewedasafunctionH→R.Solutionsmustsatisfytheproblem-speciﬁcconstraintswhichareleftunspeciﬁedintheframe-work.󰀁Thegoalistoﬁndafeasiblesolutionσmaximiz-ingt∈Hw(σ(t)).Intheonlineversion,therequestsarenotavailableinitiallyandbecomeprogressivelyavailableateachtimestep.

TheOnlineProblemTheonlinealgorithmshaveattheirdisposalaproceduretosolve,orapproximate,theofﬂineproblem,aswellasthedistributionoffuturerequests.Thedistributionisconsideredasablack-boxwhichisonlyavail-ableforsampling.Since,oncontinuousapplications,itisnotpracticaltosamplethedistributionfortheentiretimehorizon,thedesiredsizeofthesamplesisasamplingargu-ment.

TimeConstraintsPracticalapplicationsoftenincludese-veretimeconstraintsonthedecisiontimeand/oronthetime

ONLINEOPTIMIZATION(H)

1R←∅;2w←0;3fort∈H

4doΘ(t)←NEWREQUESTS(t);

5R←AVAILABLEREQUESTS(R,t)∪Θ(t);6r←CHOOSEREQUEST(R,t);7SERVEREQUEST(r,t);8w←w+w(r);9R←R\\{r};

Figure1:TheGenericOnlineAlgorithm

betweendecisions.Tomodelthisrequirement,thealgo-rithmsmayonlyusetheofﬂineprocedureOtimesateachtimestep.

PropertiesoftheFrameworkTheframeworkisgeneralenoughtomodelavarietyofpracticalapplications,yetithassomefundamentalcomputationaladvantagescomparedtoothermodels.Thekeyobservationisthat,inmanypracti-calapplications,theuncertaintydoesnotdependonthede-cisions.Thereisnoneedtoexploresequencesofdecisionsand/ortreesofscenarios:thedistributioncanbesampledtoprovidescenariosofthefuturewithoutconsideringthede-cisions.Asaconsequence,theframeworkprovidessignif-icantcomputationaladvantagesovermoregeneralmodelssuchasmulti-stagestochasticprogramming(Birge&Lou-veaux1997),whichisanofﬂineframework,andpartiallyobservableMarkovdecisionprocesses(POMDPs)(Kael-bling,Littman,&Cassandra1998),wheretheuncertaintycandependonthedecisionsAsmentionedearlier,theframe-workcanbenaturallyextendedtoincorporateservicecom-mitments,multipledecisions,andimmediatedecisionmak-ing(Bent&VanHentenryck2004a).

TheGenericOnlineAlgorithmThealgorithmsinthispapersharethesameonlineoptimizationschemadepictedinFigure1.Theydifferonlyinthewaytheyimplementfunc-tionCHOOSEREQUEST.Theonlineoptimizationschemasimplyconsidersthesetofavailableandnewrequestsateachtimestepandchoosesarequestrwhichisthenservedandremovedfromthesetofavailablerequests.FunctionAVAILABLEREQUEST(R,t)returnsthesetofrequestsavail-ableforserviceattimetandfunctionSERVEREQUEST(r,t)simplyservesrattimet(σ(t)←r).ToimplementfunctionCHOOSEREQUEST,thealgorithmshaveattheirdisposaltwoblack-boxes:

1.afunctionOPTSOL(R,t)that,givenasetRofrequestsandatimet,returnsanoptimalsolutionforRover[t,∞];2.afunctionGETSAMPLE([t,∆])thatreturnsasetofre-questsovertheinterval[t,t+∆]bysamplingthearrivaldistribution.Theimplementationofthisblackboxisthefocusoftheworkpresentedhere.Toillustratetheframework,wespecifytwoobliviousalgo-rithmsasinstantiationsofthegenericalgorithm.Theseal-gorithmswillserveasabasisforcomparison.

Greedy(G):Thisalgorithmservestheavailablerequestwithhighestweight.Itcanbespeciﬁedformallyas

CHOOSEREQUEST-G(R,t)1A←READY(R,t);

2returnargmax(r∈A)w(r);

LocalOptimal(LO):Thisalgorithmchoosesthenextrequesttoserveattimetbyﬁndingtheoptimalsolutionfortheavailablerequestsatt.Itcanbespeciﬁedas

CHOOSEREQUEST-LO(R,t)1σ←OPTSOL(R,t);

2returnσ(t);

OnlineStochasticAlgorithms

Thissectionreviewsthevariousonlinestochasticalgorithmsconsideredinthispaper.Itstartswiththeexpectational-gorithm,proposedforinstancein(Chang,Givan,&Chong2000),andshowshowitcanbeadaptedtoincorporatetimeconstraints.

Expectation(E):AlgorithmEchoosestheactionmaxi-mizingexpectationateachtimestep.Informallyspeaking,themethodgeneratesfuturerequestsbysamplingandevaluateseachavailablerequestagainstthatsample.Asimpleimplementationcanbespeciﬁedasfollows:

CHOOSEREQUEST-E(R,t)1A←READY(R,t);

2forr∈A3dof(r)←0;4fori←1...O/|A|5doS←R∪GETSAMPLE([t+1,∆]);6forr∈A7doσ←OPTSOL(S\\{r},t+1);8f(r)←f(r)+w(r)+w(σ);9returnargmax(r∈A)f(r);

Line1computestherequestswhichcanbeservedattimet.Lines2-3initializetheevaluationfunctionf(j)foreachre-questr.Thealgorithmthengeneratesanumberofsamplesforfuturerequests(line4).Foreachsuchsample,itcom-putesthesetRofallavailableandsampledrequestsattimet(line5).Thealgorithmthenconsiderseachavailablerequestrsuccessively(line6),itimplicitlyschedulesrattimet,andappliestheoptimalofﬂinealgorithmusingS\\{r}andthetimehorizon(line7).Theevaluationofrequestrisupdatedinline8byincrementingitwithitsweightandthescoreofthecorrespondingoptimalofﬂinesolution.Allscenariosareevaluatedforallavailablerequestsandthealgorithmthenreturnstherequestr∈Awiththehighestevaluation.Ob-serveLine4ofAlgorithmEwhichdistributestheavailableofﬂineoptimizationsOacrossallavailablerequestsA.WhenOissmall(duetothetimeconstraints),eachre-questisonlyevaluatedwithrespecttoasmallnumberofsamplesandthealgorithmEdonotyieldmuchinforma-tion.Thisispreciselywhyonlinevehicleroutingalgorithms(Bent&VanHentenryck2004c)cannotuseE,sincethenumberofrequestsisverylarge(about50to100),thetime

betweendecisionsisrelativelyshort,andoptimizationsarecomputationallydemanding.Tocopewithtimeconstraints,twoapproximationsofalgorithmE,consensusandregret,havebeenproposed.

Consensus(C):TheconsensusalgorithmCwasintro-ducedin(Bent&VanHentenryck2004d)asanabstractionofthesamplingmethodusedinonlinevehiclerouting(Bent&VanHentenryck2004c).ItskeyideaistosolveeachscenarioonceandthustoexamineOscenariosinsteadofO/|A|.Moreprecisely,insteadofevaluatingeachpossiblerequestattimetwithrespecttoeachsample,algorithmCexecutestheofﬂinealgorithmontheavailableandsampledrequestsoncepersample.Therequestscheduledattimetinoptimalsolutionσiscreditedw(σ)andallotherrequestsreceivenocredit.AlgorithmCcanbespeciﬁedasfollows:

CHOOSEREQUEST-C(R,t)

1forr∈R2dof(r)←0;3fori←1...O4doS←R∪GETSAMPLE([t+1,∆]);5σ←OPTSOL(S,t);6f(σ(t))←f(σ(t))+w(σ);7returnargmax(r∈R)f(r);

Observeline5whichcallstheofﬂinealgorithmwithallavailableandsampledrequestsandatimehorizonstartingattandline6whichincrementsthenumberoftimesrequestσ(t)isscheduledﬁrst.Line7simplyreturnstherequestwiththelargestscore.ThemainappealofAlgorithmCisitsabil-itytoavoidpartitioningtheavailablesamplesbetweentherequests,whichisasigniﬁcantadvantagewhenthenumberofsamplesissmalland/orwhenthenumberofrequestsislarge.Itsmainlimitationisitselitism.Onlythebestrequestisgivensomecreditforagivensample,whileotherrequestsaresimplyignored.Itignoresthefactthatseveralrequestsmaybeessentiallysimilarwithrespecttoagivensample.Moreover,itdoesnotrecognizethatarequestmayneverbethebestforanysample,butmaystillbeextremelyrobustoverall.1

Regret(R):Theregretalgorithmshowshowtogatherthatkindofinformationfromthesamplesolutionswithoutsolv-ingadditionaloptimizationproblems.Itskeyideaistoap-proximatethedeviationofarequest,i.e.,thecostofschedul-ingasuboptimalrequestattimet.

Deﬁnition1(Deviation)LetRbethesetofrequestsattimetandr∈R.ThedeviationofrwrtRandt,denotedbyDEVIATION(r,R,t),isdeﬁnedas

|w(OPTSOL(R,t))−(w(r)+w(OPTSOL(R\\{r},t+1)))|.

AlgorithmRistherecognitionthat,inmanyapplications,itispossibletoestimatethedeviationofarequestrattimetquickly.Inotherwords,oncetheoptimalsolutionσofascenarioiscomputed,itiseasytocomputethedeviationofalltherequests,thusapproximatingEwithoneoptimiza-tion.Thisintuitioncanbeformalizedusingtheconceptofregret.

Deﬁnition2(Regret)Givenarequestr,asetR(r∈R),atimet,,andanoptimalsolutionσ=OPTSOL(R,t),aregretisafunctionthatover-approximatesthedeviationofrwrtRandt,i.e.,

REGRET(r,R,t,σ)≥DEVIATION(r,R,t).

Moreover,thereexisttwofunctionsfoandfrsuchthat•OPTSOL(R,t)runsintimeO(fo(R,∆));•REGRET(r,R,t,σ)runsintimeO(fr(R,∆));•|A|fr(R,∆)isO(fo(R,∆)).

Intuitively,thecomplexityrequirementstatesthatthecomputationofthe|A|regretsdoesnottakemoretimethantheoptimization.Regretstypicallyexistinpracticalapplications.Inanonlinefacilitylocationproblem,theregretofopeningafacilityfcanbeestimatedbyevaluatingthecostofclosingtheselectedfacilityσ(t)andopeningf.Invehiclerouting,theregretofservingacustomercnextcanevaluatedbyswappingcwiththeﬁrstcustomeronthevehicleservingc.Inpacketscheduling,theregretofservingapacketpcanbeestimatedbyswappingand/orservingaconstantnumberofpackets.Inallcases,thecostofcomputingtheregretissmallcomparedtothecostoftheofﬂineoptimizationandsatisfytheaboverequirements.Notethatthereisaninterestingconnectiontolocalsearch,sincecomputingtheregretmaybeviewedasevaluatingthecostofalocalmovefortheapplicationathand.WearenowreadytopresenttheregretalgorithmR:

CHOOSEREQUEST-R(R,t)

1forr∈R2dof(r)←0;3fori←1...O4doS←R∪GETSAMPLE([t+1,∆]);5σ←OPTSOL(S,t);6f(σ(t))←f(σ(t))+w(σ);7forr∈READY(R,t)\\{σ(t)}8dof(r)←f(r)+(w(σ)−REGRET(σ,r,R,t));9returnargmax(r∈R)f(r);

ItsbasicorganizationfollowsalgorithmC.However,insteadofassigningsomecreditonlytotherequestselectedattimetforagivensamples,algorithmR(lines7-8)usestheregretstocompute,foreachavailablerequestr,anapproximationofthebestsolutionofsservingrattimet,i.e.,w(σ)−REGRET(σ,r,R,t).Henceeveryavailablerequestisgivenanevaluationforeverysampleattimetforthecostofasingleofﬂineoptimization(asymptotically).ObservethatalgorithmRperformsOofﬂineoptimizationsattimet.

Sampling

Todescribethestochasticalgorithmsentirely,itisneces-sarytospecifyfunctionGETSAMPLE,i.e.,howtogener-atesamplesfromthedistribution.Tomakethepresentation

a: .95b: .012b: .30a: .02a: .05b: .02b: .1513a: .10a: .50b: .60a: .05b: .25Figure2:ExampleMarkovModel

concrete,thispaperassumesthattheunderlyingdistributionisaMarkovmodel(Feller1957),althoughtheprinciplescaneasilybeadaptedtoavarietyofdistributions.Inad-dition,thepaperassumesforsimplicitythatthestructureoftheMarkovmodeldirectlyreﬂectstheuncertaintypresentintheonlinestochasticoptimizationframework.Morepre-cisely,eachstates∈SintheMarkovmodelrepresentsadifferentarrivalprobabilityfortherequestsandthetransi-tionsT(s1,s2)betweenstatess1ands2representchangesinthearrivalratesoftherequests.Finally,thetransitionfunctionTisaugmentedbyathirdparameterspecifyingthesetofrequestsgeneratedbyatransition.ThusT(s1,s2,w)representstheprobabilityofgoingfroms1tosw.Observethatatransition(possibly2andpro-ducingoutputaself-transition)istakenateverytimesteptinordertogeneratetherequests.Figure2providesanexampleofsuchaMarkovmodel,withtwopossibleoutputs,aandb.

Giventheseassumptions,itiseasytospecifyfunctionGETSAMPLEforfullyobservabledistributions:

GETSAMPLE-FO(t,∆)

1returnRANDOMWALK(St,∆);

Thealgorithmsimplyperformsarandomwalkof∆stepsintheMarkovmodelbeginningatthestateobservedattimet.Therandomwalkisspeciﬁedas

RANDOMWALK(s,∆)

1R←∅;

2fori←1...∆3do←SELECTRANDOMTRANSITION(s,T);

4R←R∪w;5s←s󰀂;6returnR;

wherelines2-5walkthroughtheMarkovmodelfor∆steps.Line3choosesatransitionatrandombasedonthecurrentstateandthetransitionfunction.Line4storestherequestsgeneratedonthattransition.Therequestsgeneratedduringtherandomwalkarereturnedinline6.

LearningtheDistributionOnline

Itisunrealisticinpracticetoassumeafullyobservabledis-tributionasinputtotheonlineoptimizationalgorithm.Asa

consequence,thissectioninvestigateshowtorelaxthisas-sumptionbyusingmachinelearningtechniques.Itstartsbyrelaxingfullobservabilitybeforeconsideringthecasewherethedistributionparametersarealsounknown.

PartiallyObservableDistributionsEvenifaprecisedis-tributionisavailablefortherequests,itisunreasonabletoassumethatfunctionGETSAMPLEknowsthecurrentstateoftheMarkovmodel.Partialobservabilityisamuchmorere-alisticassumptionadoptedinmanyuncertaintyframeworks(e.g.,(Kaelbling,Littman,&Cassandra1998)).Underpar-tialobservability,thestateStisunknownandcanonlybeinferredthroughobservations,thatisthroughthesetofre-questsgeneratedateachtimestep.

Toinferthecurrentstate,thesamplingalgorithmmakesuseofbeliefstates,afundamentalideafromPOMDPre-search.Moreprecisely,thealgorithmmaintains,foreachstates∈S,aprobabilityBt(s)ofbeinginstatesattimetandupdatestheseprobabilitiesaftereveryobservation.Animplementationofsamplingunderpartialobservabilityisasfollows:

GETSAMPLE-PO(t,∆)

1fors∈S2doBt(s)←PR(s|Θ(t),Bt−1);3s←RANDOM(S,Bt);

4returnRANDOMWALK(s,∆);

whereRANDOM(S,B)selectsarequestsinSwithproba-bilityB(s).Line2updatesthebeliefprobabilitiesbycon-ditioningtheprobabilityofbeinginstatesattimetonthebeliefstatesattimet−1andtherequestsobservedattimet.Line3choosesaninitialstaterandomlyusingthebeliefprobabilitiesandline4returnsasetofrequestsbasedonarandomwalkbeginningatstates.

StructuralModeloftheDistributionInmanycircum-stances,evenpartialobservabilityistoostrongarequire-ment.Inparticular,thestructureofthedistributionmaybeknownbutnotitsparameters.Toinfertheseparame-ters,theimplementationofGETSAMPLEmayresorttotheBaum-Welschalgorithm(Baum1972;Charniak1993)forlearningtheparametersofaMarkovmodelfromasequenceofrequests.Thislearningalgorithmstartsbyinitializingthetransitionprobabilitiesarbitrarily.Itthenusestheexistingprobabilitiesandthetrainingsequencetoreﬁnetheproba-bilities.Theprocessisrepeatedwiththenewprobabilitiesuntilthealgorithmconverges.Thealgorithm

GETSAMPLE-PO-TRAIN(t,∆)

1whileConverging2dofors∈S3doαs(0)=Bt−λ(s);4fori←1...λ5doαs(i)←󰀁s󰀃∈Sαs

󰀃(i−1)T(s󰀃,s,Θ(t−λ+i));

6βs(λ)=Bt(s);7fori←(λ−1)8doβs(i)←󰀁

...0s󰀃∈S

βs󰀃(i+1)T(s,s󰀃,Θ(t−i));9

fors∈10doC

󰀂S,s󰀃∈S,w∈(s,s󰀃

,w)←󰀁Ωλαs(t)T(s,s󰀃,w)βs󰀃(t+11

fors∈S,s󰀃∈S,w∈Ω

t=1

1)12

doT(s,s󰀃,w)←

C󰀂

(s,s󰀃,w)

6fori←1...∆λ

;

7doR←R∪RANDOM(P,Ω)8returnR;

countsthenumberoftimeseachpossibleoutputisobservedinthelastλtimesteps(line3)andusesthesenumberstocomputeaprobabilityforthenextoutput(line5).Line7generatesrandomrequestsbasedontheprobabilitiesofthepossibleoutputs.

HistoricalSamplingOneofthedrawbacksofaveragingisthatitlosesthesequencestructure(e.g.,changesinre-questarrivalrates).Historicalsamplingaimsatovercomingthisdrawbackbyusingpastsubsequencesassamples.Itsimplementation

GETS󰀂AMPLE-S(t,∆)

1t←RANDOM([0,t−∆]);

2R←∅;

3fori←t󰀂...t󰀂+∆4doR←R∪Θ(i)5returnR;

isamazinglysimple:Itrandomlyselectsapositioninthepastsequenceofrequestsandgeneratesasampleofsize∆fromthatstartingposition.

Inadditiontoitsconceptualandimplementationsimplic-ity,historicalsamplingisappeallingforavarietyofreasons.First,andcontrarytohistoricalaveraging,itcapturesstruc-turalinformationonthesequence.Second,whenevertheunderlyingdistributionisaMarkovmodel,theresultingse-quencecorrespondstoarandomwalkinthemodelfromarandomstate.Asaconsequence,historicalsamplingim-plicitlyimplementsaversionofthepartialobservabilityal-gorithmwherethebeliefstatesareapproximatedbythestatefrequencies.Third,inthecaseofperiodicoptimization,his-toricalsamplingsimplyamountstousingpastinstancesassamplesofthe(unknowndistribution)withtheadditionalbeneﬁtthatthestartingstateisknowninthiscase.Asaconsequence,historicalsamplingisasimpleandeffectivetechniquetoexploithistoricaldataincontinuousandperi-odiconlineoptimization.

PacketScheduling

Thissectionreportsexperimentalresultsontheonlinepacketschedulingproblemﬁrststudiedin(Chang,Givan,&Chong2000).Thisnetworkingapplicationisofinterestexperimentallysince(1)thenumberofrequeststoconsiderateachtimetissmalland(2)theofﬂinealgorithmcanbesolvedinpolynomialtime.Asaresult,itispossibletoeval-uateallthealgorithmsexperimentally,contrarytovehicleroutingapplicationswherethisisnotpractical.ThepacketschedulingisalsointerestingasitfeaturesacomplexarrivaldistributionforthepacketsbasedonMarkovmodels.TheOfﬂineProblemWearegivenasetJobsofjobsparti-tionedintoasetofclassesC.Eachjobjischaracterizedbyitsweightw(j),itsarrivaldatea(j),anditsclassc(j).Jobsinthesameclasshavethesameweight(butdifferentarrivaltimes).WearealsogivenaschedulehorizonH=[HH]duringwhichjobsmustbescheduled.Eachjobjrequiresasingletimeunittoprocessandmustbescheduledinitstimewindow[a(j),a(j)+d],wheredisthesameconstantfor

100OG95LOEC90RssoL deth85gieW eg80arevA757065020406080100120140160180200Maximum Number of Offline OptimizationsFigure3:FullyObservableSampling

alljobs(i.e.,drepresentsthetimeajobremainsavailabletoschedule).Inaddition,notwojobscanbescheduledatthesametimeandjobsthatcannotbeservedintheirtimewin-dowsaredropped.Thegoalistoﬁndascheduleofmax-imalweight,i.e.,aschedulewhichmaximizesthesumoftheweightsofallscheduledjobs.Thisisequivalenttomin-imizingweightedpacketloss.Moreformally,assume,forsimplicityandwithoutlossofgenerality,thatthereisajobscheduledateachtimestepoftheschedulehorizon.Underthisassumption,ascheduleisafunctionσ:H→Jobswhichassignsajobtoeachtimeintheschedulehorizon.Ascheduleσisfeasibleif

∀t1,t2∈H:t1=t2→σ(t1)=σ(t2)∀t∈H:a(σ(t))≤t≤a(σ(t))+d.

Theweight󰀁ofascheduleσ,denotedbyw(σ),isgivenbyw(σ)=t∈Hw(σ(t)).Thegoalistoﬁndafeasiblesched-uleσmaximizingw(σ).ThisofﬂineproblemcanbesolvedinquadratictimeO(|J||H|).

TheOnlineProblemTheexperimentalevaluationisbasedontheproblemsof(Chang,Givan,&Chong2000;Bent&VanHentenryck2004d),whereallthedetailscanbefound.Intheseproblems,thearrivaldistributionsarespec-iﬁedbyindependentMMs,oneforeachjobclasssimilarinformasseeninFigure2.Thus,eachclasscanbethoughtofashavingitsownblackboxforgeneratingrequestsamples.Theresultsaregivenforthereference7-classproblemsandforanonlinescheduleconsistingof200,000timesteps.Be-causeitisimpracticaltosamplethefutureforsomanysteps,thealgorithmsuseasamplinghorizonof50,whichseemstobeanexcellentcompromisebetweentimeandquality.Theregretfunctionisgivenin(Bent&VanHentenryck2004b)andconsistsofswappingaconstantnumberofpacketsintheoptimalschedule.

SamplingwiththeExactDistributionFigure3depictstheaveragepacketlossasafunctionofthenumberofavail-ableoptimizationsOforthevariousalgorithmsonavariety

100OG95LOE−FOC−FO90R−FOssE−POoLC−PO de85R−POthgieW eg80arevA757065020406080100120140160180200Maximum Number of Offline Optimizations

Figure4:PartiallyObservableSampling

of7-classproblemswhentheMarkovModelisfullyob-servable.Italsogivestheoptimal,aposteriori,packetlossO.TheresultsindicatethevalueofstochasticinformationasalgorithmEsigniﬁcantlyoutperformstheobliviousalgo-rithmsGandLOandbridgemuchofthegapbetweenthesealgorithmsandtheoptimalsolution.NotethatLOisworsethanG,illustratingthe(frequent)pathologicalbehaviorofover-optimizing.

ConsensusisdominatedbyEwhenthenumberofavail-ableoptimizationsincreases,althoughitstillproducessig-niﬁcantimprovementsovertheobliviousalgorithms.Thisisofcoursepertinent,sinceEisnotpracticalformanyprob-lemswithtimeconstraints.Finally,thebeneﬁtsoftheregretalgorithmareclearlyapparent.AlgorithmRindeeddom-inatesalltheotheralgorithms,includingconsensuswhenthereareveryfewofﬂineoptimizations(strongtimecon-straints)andexpectationevenwhenthereareareasonablylargenumberofthem(weaktimeconstraints).Reference(Bent&VanHentenryck2004b)alsoshowsthistobethecaseforcomplexonlinevehicleroutingwithtimewindows.PartialObservabilityFigure4considersthecasewhentheMarkovmodelispartiallyobservableandcomparesittothefullyobservablecase.Itisinterestingtonotethatthereisnosigniﬁcantdegradationinsolutionquality,indicatingthatmaintainingthebeliefprobabilitiesissufﬁcient,onthisproblem,toobtaingoodresults.

StructuralModelFigure5depictstheexperimentalre-sultswhentheMMparametersarelearned.Inordertogetreasonableresults,λissetto560.Theresultsarecompa-rabletotheresultsiftheMMisfullyobservable.However,theseresultsaremisleading,astheydonottakeintoaccounttimespentintraining.Figure6showstheroughamountoftimeittakestotrainonsequencesofacertainsizeintermsoftheequivalentnumberofoptimizations.Ascanbeseenfromthisplot,trainingonasequenceoflength560isroughlyequivalenttoperforming230optimizationseverytimestep!Ifthatistakenintoaccount,theresultslooklikeFigure7.Toacertainextent,thislimitationcanbeover-

100OG95LOE−FOC−FO90R−FOssE−PO−TRAINoLC−PO−TRAIN de85R−PO−TRAINthgieW eg80arevA757065020406080100120140160180200Maximum Number of Offline OptimizationsFigure5:ExperimentalResultsforPacketSchedulingafterLearningtheMMParameters600Optimization500ez400iS elpmaS300 gniniarT2001000050100150200250Equivalent Number of OptimizationsFigure6:Trainingvs.OptimizationTimeTradeoffcomebyreducingthesizeofthetrainingsequence,i.e.,140timesteps,asdepictedinFigure8,albeitatalossofover-allsolutionquality.Amorepromisingdirectionistoapplythelearningalgorithmperiodicallyinsteadofateachtimestepsinordertoamortizethecostoftraining.ThisisshowninFigure9,whichindicatesthefeasibilityofsophisticatedlearningtechniquesinanonlinesetting.Indeed,thelossinsolutionqualitycomparedtotheexactdistributionissmallandtheimprovementwithrespecttoobliviousalgorithmsissigniﬁcant.

HistoricalAveragingFigure10depictstheexperimentalresultsforhistoricalaveraging.Thesearetheweakestre-sultspresentedhere,yetthissimplemethodstillproducessigniﬁcantimprovementsovertheobliviousalgorithms.Theresultsaregivenforλ=500,whichachievesthebestresultsexperimentally.

HistoricalSamplingFinally,Figure11depictstheexper-imentalresultsforhistoricalsamplingwhichareveryinter-esting.Indeed,historicalsamplingproducesthebestresults

100OG95LOE−FOC−FO90R−FOssE−PO−TRAINoLC−PO−TRAIN deR−PO−TRAINth85gieW eg80arevA757065050100150200250300350400Maximum Number of Offline Optimizations

Figure7:ExperimentalResultsforPacketScheduling:In-cludingtheLearningTime100O95GLOE−FO90C−FOssoR−FOL dE−PO−TRAINeth85C−PO−TRAINgieR−PO−TRAINW eg80arevA757065050100150200Maximum Number of Offline OptimizationsFigure8:ExperimentalResultsforPacketScheduling:In-cludingtheLearningTimeonSorterSequences

forthemostadvancedalgorithm:theregretalgorithmR.(ItisalsosuperiorforalgorithmE,althoughthisisnotprac-ticallyrelevant).Inotherwords,withoutanyknowledgeofthedistributionandwithoutinducinganytimeoverhead,historicalsamplingisaseffectiveasthepartialobservabilitymodelthatmaintainsbeliefstatesbasedonobservations.

VehicleRouting

Wenowevaluatehistoricalsamplingonasigniﬁcantlymorecomplicatedproblem:onlinemultiplevehicleroutingwithtimewindows.Thisproblemwasstudiedinitiallyin(Bent&VanHentenryck2004c)toshowthevalueofstochasticinformationinvehiclerouting.Itisaperiodiconlineopti-mizationproblemswherehistorialdataistypicallyavailable.ProblemFormulationThevehicleroutingproblemsarespeciﬁedformallyin(Bent&VanHentenryck2004c)whereallthedetailscanbefound.Eachproblemcontainsadepot,anumberofcustomerregionsandanumberofcustomerser-vicerequestsfromtheregions.Eachrequesthasademand,

100OG95LOE−FOC−FO90R−FOssE−PO−TRAINoLC−PO−TRAIN deR−PO−TRAINth85gieW eg80arevA757065020406080100120140160180200Maximum Number of Offline OptimizationsFigure9:ExperimentalResultsforPacketSchedulingwhenMMParametersareUnknown:BalancingOptimizationandTraining105O100GLOE−LO95C−LOR−LOssE−AoLC−A 90deR−AthgieW85 egare80vA757065020406080100120140160180200Maximum Number of Offline OptimizationsFigure10:ExperimentalResultsonPacketSchedulingforHistoricalAveraging

aservicetime,andatimewindowspeciﬁedbyaninterval[e,l],whichrepresentstheearliestandlatestpossiblearrivaltimesrespectively.Thereareanumberofidenticalvehi-clesavailableforuse,eachwithcapacityQ.Avehicleroutestartsatthedepot,servessomecustomersatmostonce,andreturnstothedepot.Thedemandofarouteisthesumma-tionofthedemandofitscustomers.Aroutingplanisasetofroutesservicingeachcustomerexactlyonce.Aso-lutiontotheofﬂineVRPTWisaroutingplanthatsatisﬁesthecapacityconstraintsonthevehicleandthetimewindowconstraintsoftherequests.Theobjectiveistoﬁndaso-lutionmaximizingthenumberofservedcustomers.InthedynamicVRPTW,customerrequestsarenotknowninad-vanceandbecomeavailableduringthecourseoftheday.Ingeneral,anumberofrequestsareavailableinitially.

TheonlineVRPTWismuchmorecomplicatedthantheonlinepacketschedulinganddoesnotﬁtdirectlyinthestochasticframeworkpresentedearlier.First,decisionsaretriggeredbytwotypesofevents:newcustomerrequestsandthearrivalofavehicleatacustomersite.Second,ade-

100OG95LOE−FOC−FO90R−FOssE−SoLC−S de85R−SthgieW eg80arevA757065020406080100120140160180200Maximum Number of Offline Optimizations

Figure11:ExperimentalResultsonPacketSchedulingforHistoricalSampling

cisionconsistsofchoosingwhichcustomerstoservenextonagivenvehicle.Third,theonlinealgorithmmustde-cidewhethertoacceptorrejectcustomerrequestsimmedi-atelyandmustservicetheacceptedrequests.Finally,theVRPTWisahardNP-completeproblemwhoseinstancesareextremelydifﬁculttosolveoptimally.Only2to10of-ﬂineoptimizationscanbesolvedinbetweentwoeventsandthenumberofeventsislarge(e.g.,50differentrequests).Hence,theexpectationmethodisnotpracticalatall.ExperimentalSettingTheexperimentalresultsarebasedontheﬁvehardestclass-4problemsfrom(Bent&VanHen-tenryck2004c),wherealldetailscanbefound.TheyarederivedfromtheSolomonbenchmarkswhichareverychal-lengingandinvolve100customers.Theinstancesexhibitvariousdegreesofdynamism(i.e.,theratiobetweenknownanddynamiccustomers),differentdistributionsofearlyandlaterequests,aswellastimewindowsofverydifferentsizes.Hencetheycoverawidespectrumofpossibilitiesandstruc-tures.Thenumberofvehiclesavailableforthedynamicalgorithmswasdeterminedbysolvingtheofﬂineproblemsandaddingtwovehicles.

Historicalsamplinghaseither10or1000pastsamplesatitsdisposaldependingontheexperiments.Foragiventimet,theimplementationofmethodGETSAMPLEinthisperi-odicapplicationconsistsofchoosingoneofthesepastsam-plesandreturnsalltherequestsoccurringaftertimet.Com-paredtoearlieralgorithmswhichassumedperfectknowl-edgeofthedistribution,historicalsamplinglosestwokindsofinformation.Ontheonehand,itworksfromalimitedpoolofsamplesand,ontheotherhand,itlosessomede-pendencyinformation.Indeed,withaperfectdistribution,thearrivalofactualrequestsaffectsthearrivalprobabilityoffuturerequests,i.e.thearrivalofarequestinaregionandtimeperiodeliminatesfuturerequestsfromthatregioninthesametimeperiod(Bent&VanHentenryck2004c).ExperimentalResultsTable1reportsomepreliminaryresultstoevaluatethepotentialofhistoricalsampling.It

depictsthenumberofmissedcustomersfortheconsensus(C)andregret(R)algorithmsusingtheactualdistribution,aswellasthesameresultswhenhistoricalsamplingisusedinsteadoftheactualdistribution(HS(C)andHS(R)).Eachentryinthetableistheaverageof20independentrunswith5optimizationsbeingallowedinbetweenevents.Theal-gorithmallimplementtheleastcommitmentapproachde-scribedin(Bent&VanHentenryck2004a).Historicalsam-plingwasevaluatedwithahistoricaldatacontaining10and1000datapoints.Thesepreliminaryresultsindicatethathistoricalsamplingperformsverywellontheseproblems.Whenthehistoricaldata(1000datapoints)islarge,nosig-niﬁcantdifferencecanbeobservedbetweenhistoricalsam-plingandtheactualdistribution.Whenthehistoricaldataislimited(10datapoints),thequalityofthesolutionsseemstodecreaseslighlyforconsensus,whilethereisnorealdiffer-enceforregret.

Conclusion

Thispaperfocusedonlinestochasticoptimizationproblemswheretimeconstraintsseverelylimitthenumberofof-ﬂineoptimizationswhichcanbeperformedatdecisiontimeand/orinbetweendecisions.Thelastcoupleofyearswit-nessedsigniﬁcantprogressinthisarea,includingthede-signofnovelalgorithmsexploitingstochasticinformationtomakebetterschedulingandroutingdecisions.Experi-mentalresultshaveshownthat,wheneveradistributionoffuturerequestsisavailableforsampling,thesealgorithmsmayproducesigniﬁcantimprovementsinsolutionquality,whilerunningwithinthetimeconstraints.

Thispaperreconsidersthefundamentalassumptionun-derlyingthesealgorithms:thatadistributionoffuturere-quests,oranapproximationthereof,isavailableforsam-pling.Itrelaxedthatassumptionandstudiedwhethertheun-derlyingdistributioncanbelearned(whenpartofitsstruc-tureisknown)orwhetherhistoricaldatacanexploitedforsamplingpurposes.Bothcontinuousandperiodiconlineop-timizationwereconsidered.

Asfarasthemachinelearningapproachisconcerned,thepapershowedthattechniquessuchasbeliefstatesandtheBaum-WelschalgorithmforlearningMMscanbeengi-neeredtoworkinanonlinesetting.Theexperimentalresultsonpacketschedulingareverypromising:theyindicatethattheapproachpreservesmostofthebeneﬁtsoftheonlineal-gorithmsworkingontheactualdistribution.

Asfarashistoricaldataisconcerned,thepaperproposedtheideaofhistoricalsampling,whichconsistsofexploit-inghistoricaldataforsampling,i.e.,earliersequencesofrequestsinthecaseofcontinuousonlineoptimizationandpastinstancesinthecaseofperiodiconlineoptimization.Theexperimentalresults,bothonpacketscheduling(contin-uousoptimization)andvehiclerouting(periodicoptimiza-tion),showsthathistoricalsamplingisextremelyeffectiveandproducesthesamesolutionqualityastheactualdistri-bution.

Theseresultsclearlyboostthepotentialofonlinestochas-ticoptimization,sincetheysigniﬁcantlyweakentheirmainassumption.Inadditiontocaseswherepredictivemodelsareavailable,theyalsoshowthatonlinestochasticalgoritmscan

Problem

13.5014.5511.4514.0012.80

HS(C)10

14.3013.6011.6015.4016.00

10.4511.959.009.359.70

HIS(R)1000

因篇幅问题不能全部显示，请点此查看更多更全内容

查看全文