RussellBentandPascalVanHentenryck
BrownUniversityProvidence,RI02912{rbent,pvh}@cs.brown.edu
Abstract
Thispaperconsidersonlinestochasticschedulingprob-lemswheretimeconstraintsseverelylimitthenumberofofflineoptimizationswhichcanbeperformedatde-cisiontimeand/orinbetweendecisions.Priorresearchhasdemonstratedthat,wheneveradistributionoftheinputsisavailableforsampling,onlinestochaticalgo-rithmsmayproducesignificantimprovementsinsolu-tionqualityoverobliviousapproaches.However,theavailabilityofaninputdistribution,althoughreason-ableinmanycontexts,istoostrongarequirementinavarietyofapplications.Thispaperbroadenstheap-plicabilityofonlinestochasticalgorithmsbyrelaxingthisrequirementandusingmachinelearningtechniquesorhistoricaldatainstead.Inparticular,itshowsthatmachinelearningtechniquescanbeengineeredtolearnthedistributiononline,whenitsunderlyingstructureisavailable.Moreover,thepaperpresentstheideaofhistoricalsamplingwhichprovideasimpleandeffec-tivewaytoleveragehistoricaldataincontinuousandperiodiconlineoptimization.Experimentalresultsonpacketschedulingandvehicleroutingindicatethepo-tentialofmachinelearningandhistoricalsamplingforonlinescheduling.
Introduction
Onlineschedulingandroutingproblemsarisenaturallyinmanyapplicationareasandhavereceivedincreasingatten-tioninrecentyears.Contrarytoofflineoptimization,thedataisnotavailableaprioriinonlineoptimization.Ratheritisincrementallyrevealedduringalgorithmexecution.Inmanyonlineoptimizationproblems,thedataisasetofre-quests(e.g.,packetsinnetworkschedulingorcustomersinvehiclerouting)whicharerevealedovertimeandthealgo-rithmmustdecidewhichrequesttoprocessnext.Inaddi-tion,timeconstraintstypicallyrestrictthenumberofofflineoptimizationsthatcanbeperformedatdecisiontimeand/orinbetweendecisions.Onlineproblemsofthiskindariseinmanyapplications,includingvehiclerouting,taxidispach-ing,packetscheduling,andonlinedeliveries.Itisalsousefultodistinguishtwoclassesofsuchproblems:continuousandperiodicoptimization.Incontinuousoptimation,theonline
initiallyproposedin(Chang,Givan,&Chong2000),thedistributionsarespecifiedintermsofMarkovModels(MM)while,invehiclerouting,onlyhistoricaldatafromprevi-ousdaysisavailable.Theexperimentalresultsarepartic-ularlyinteresting.Onpacketschedulingproblems,theMLapproachcanbeengineeredtobesufficientlyfastandpre-cisesothatonlinealgorithmsbecomeessentiallycompara-blewhenusingtheexactorthelearneddistributions.More-over,historicalsampling,whichisamazinglysimpletoim-plement,isalsoshowntobecomparablewiththeexactdis-tribution.Thevalueofhistoricalsamplingisalsodemon-stratedonvehiclerouting,confirmingitseffectivenessofbothcontinuousandperiodicapplications.Interestlingly,historicalsamplingremainseffectiveevenifthehistoricaldataisrelativelysmall.
Asaconsequence,thepaperdemonstratesthatonlinestochasticoptimizationalgorithmscanbeeffectiveevenwithoutdistributions,broadeningtheirapplicabilitysignifi-cantlythankstotheuseofmachinelearningortheexploita-tionofhistoricaldata.
Therestofthispaperisorganizedasfollows.Thefirstthreesectionspresenttheonlinestochasticframework,theonlinestochasticalgorithms,thesamplingalgorithmwhenthedistributionisknown.Thenextthreesectionsconsti-tutethecoreofthepaper.TheydescribetheMLandHDapproachesandtheirexperimentalresults.Thelastsectionconcludesthepaper.
TheOnlineStochasticFramework
Thissectionpresentstheonlinestochasticframeworkinitssimplestformtocristallizetheideas.Itsgeneralizationsaredescribedindetailin(Bent&VanHentenryck2004a).TheOfflineProblemTheframeworkassumesadiscretemodeloftimeandtheofflineproblemconsidersatimehori-zonH=[HH]andanumberofrequestsR.Eachrequestrisassociatedwithaweightw(r)whichrepresentsthegainiftherequestisserved.Asolutiontotheofflineproblemservesarequestrateachtimet∈HandcanbeviewedasafunctionH→R.Solutionsmustsatisfytheproblem-specificconstraintswhichareleftunspecifiedintheframe-work.Thegoalistofindafeasiblesolutionσmaximiz-ingt∈Hw(σ(t)).Intheonlineversion,therequestsarenotavailableinitiallyandbecomeprogressivelyavailableateachtimestep.
TheOnlineProblemTheonlinealgorithmshaveattheirdisposalaproceduretosolve,orapproximate,theofflineproblem,aswellasthedistributionoffuturerequests.Thedistributionisconsideredasablack-boxwhichisonlyavail-ableforsampling.Since,oncontinuousapplications,itisnotpracticaltosamplethedistributionfortheentiretimehorizon,thedesiredsizeofthesamplesisasamplingargu-ment.
TimeConstraintsPracticalapplicationsoftenincludese-veretimeconstraintsonthedecisiontimeand/oronthetime
ONLINEOPTIMIZATION(H)
1R←∅;2w←0;3fort∈H
4doΘ(t)←NEWREQUESTS(t);
5R←AVAILABLEREQUESTS(R,t)∪Θ(t);6r←CHOOSEREQUEST(R,t);7SERVEREQUEST(r,t);8w←w+w(r);9R←R\\{r};
Figure1:TheGenericOnlineAlgorithm
betweendecisions.Tomodelthisrequirement,thealgo-rithmsmayonlyusetheofflineprocedureOtimesateachtimestep.
PropertiesoftheFrameworkTheframeworkisgeneralenoughtomodelavarietyofpracticalapplications,yetithassomefundamentalcomputationaladvantagescomparedtoothermodels.Thekeyobservationisthat,inmanypracti-calapplications,theuncertaintydoesnotdependonthede-cisions.Thereisnoneedtoexploresequencesofdecisionsand/ortreesofscenarios:thedistributioncanbesampledtoprovidescenariosofthefuturewithoutconsideringthede-cisions.Asaconsequence,theframeworkprovidessignif-icantcomputationaladvantagesovermoregeneralmodelssuchasmulti-stagestochasticprogramming(Birge&Lou-veaux1997),whichisanofflineframework,andpartiallyobservableMarkovdecisionprocesses(POMDPs)(Kael-bling,Littman,&Cassandra1998),wheretheuncertaintycandependonthedecisionsAsmentionedearlier,theframe-workcanbenaturallyextendedtoincorporateservicecom-mitments,multipledecisions,andimmediatedecisionmak-ing(Bent&VanHentenryck2004a).
TheGenericOnlineAlgorithmThealgorithmsinthispapersharethesameonlineoptimizationschemadepictedinFigure1.Theydifferonlyinthewaytheyimplementfunc-tionCHOOSEREQUEST.Theonlineoptimizationschemasimplyconsidersthesetofavailableandnewrequestsateachtimestepandchoosesarequestrwhichisthenservedandremovedfromthesetofavailablerequests.FunctionAVAILABLEREQUEST(R,t)returnsthesetofrequestsavail-ableforserviceattimetandfunctionSERVEREQUEST(r,t)simplyservesrattimet(σ(t)←r).ToimplementfunctionCHOOSEREQUEST,thealgorithmshaveattheirdisposaltwoblack-boxes:
1.afunctionOPTSOL(R,t)that,givenasetRofrequestsandatimet,returnsanoptimalsolutionforRover[t,∞];2.afunctionGETSAMPLE([t,∆])thatreturnsasetofre-questsovertheinterval[t,t+∆]bysamplingthearrivaldistribution.Theimplementationofthisblackboxisthefocusoftheworkpresentedhere.Toillustratetheframework,wespecifytwoobliviousalgo-rithmsasinstantiationsofthegenericalgorithm.Theseal-gorithmswillserveasabasisforcomparison.
Greedy(G):Thisalgorithmservestheavailablerequestwithhighestweight.Itcanbespecifiedformallyas
CHOOSEREQUEST-G(R,t)1A←READY(R,t);
2returnargmax(r∈A)w(r);
LocalOptimal(LO):Thisalgorithmchoosesthenextrequesttoserveattimetbyfindingtheoptimalsolutionfortheavailablerequestsatt.Itcanbespecifiedas
CHOOSEREQUEST-LO(R,t)1σ←OPTSOL(R,t);
2returnσ(t);
OnlineStochasticAlgorithms
Thissectionreviewsthevariousonlinestochasticalgorithmsconsideredinthispaper.Itstartswiththeexpectational-gorithm,proposedforinstancein(Chang,Givan,&Chong2000),andshowshowitcanbeadaptedtoincorporatetimeconstraints.
Expectation(E):AlgorithmEchoosestheactionmaxi-mizingexpectationateachtimestep.Informallyspeaking,themethodgeneratesfuturerequestsbysamplingandevaluateseachavailablerequestagainstthatsample.Asimpleimplementationcanbespecifiedasfollows:
CHOOSEREQUEST-E(R,t)1A←READY(R,t);
2forr∈A3dof(r)←0;4fori←1...O/|A|5doS←R∪GETSAMPLE([t+1,∆]);6forr∈A7doσ←OPTSOL(S\\{r},t+1);8f(r)←f(r)+w(r)+w(σ);9returnargmax(r∈A)f(r);
Line1computestherequestswhichcanbeservedattimet.Lines2-3initializetheevaluationfunctionf(j)foreachre-questr.Thealgorithmthengeneratesanumberofsamplesforfuturerequests(line4).Foreachsuchsample,itcom-putesthesetRofallavailableandsampledrequestsattimet(line5).Thealgorithmthenconsiderseachavailablerequestrsuccessively(line6),itimplicitlyschedulesrattimet,andappliestheoptimalofflinealgorithmusingS\\{r}andthetimehorizon(line7).Theevaluationofrequestrisupdatedinline8byincrementingitwithitsweightandthescoreofthecorrespondingoptimalofflinesolution.Allscenariosareevaluatedforallavailablerequestsandthealgorithmthenreturnstherequestr∈Awiththehighestevaluation.Ob-serveLine4ofAlgorithmEwhichdistributestheavailableofflineoptimizationsOacrossallavailablerequestsA.WhenOissmall(duetothetimeconstraints),eachre-questisonlyevaluatedwithrespecttoasmallnumberofsamplesandthealgorithmEdonotyieldmuchinforma-tion.Thisispreciselywhyonlinevehicleroutingalgorithms(Bent&VanHentenryck2004c)cannotuseE,sincethenumberofrequestsisverylarge(about50to100),thetime
betweendecisionsisrelativelyshort,andoptimizationsarecomputationallydemanding.Tocopewithtimeconstraints,twoapproximationsofalgorithmE,consensusandregret,havebeenproposed.
Consensus(C):TheconsensusalgorithmCwasintro-ducedin(Bent&VanHentenryck2004d)asanabstractionofthesamplingmethodusedinonlinevehiclerouting(Bent&VanHentenryck2004c).ItskeyideaistosolveeachscenarioonceandthustoexamineOscenariosinsteadofO/|A|.Moreprecisely,insteadofevaluatingeachpossiblerequestattimetwithrespecttoeachsample,algorithmCexecutestheofflinealgorithmontheavailableandsampledrequestsoncepersample.Therequestscheduledattimetinoptimalsolutionσiscreditedw(σ)andallotherrequestsreceivenocredit.AlgorithmCcanbespecifiedasfollows:
CHOOSEREQUEST-C(R,t)
1forr∈R2dof(r)←0;3fori←1...O4doS←R∪GETSAMPLE([t+1,∆]);5σ←OPTSOL(S,t);6f(σ(t))←f(σ(t))+w(σ);7returnargmax(r∈R)f(r);
Observeline5whichcallstheofflinealgorithmwithallavailableandsampledrequestsandatimehorizonstartingattandline6whichincrementsthenumberoftimesrequestσ(t)isscheduledfirst.Line7simplyreturnstherequestwiththelargestscore.ThemainappealofAlgorithmCisitsabil-itytoavoidpartitioningtheavailablesamplesbetweentherequests,whichisasignificantadvantagewhenthenumberofsamplesissmalland/orwhenthenumberofrequestsislarge.Itsmainlimitationisitselitism.Onlythebestrequestisgivensomecreditforagivensample,whileotherrequestsaresimplyignored.Itignoresthefactthatseveralrequestsmaybeessentiallysimilarwithrespecttoagivensample.Moreover,itdoesnotrecognizethatarequestmayneverbethebestforanysample,butmaystillbeextremelyrobustoverall.1
Regret(R):Theregretalgorithmshowshowtogatherthatkindofinformationfromthesamplesolutionswithoutsolv-ingadditionaloptimizationproblems.Itskeyideaistoap-proximatethedeviationofarequest,i.e.,thecostofschedul-ingasuboptimalrequestattimet.
Definition1(Deviation)LetRbethesetofrequestsattimetandr∈R.ThedeviationofrwrtRandt,denotedbyDEVIATION(r,R,t),isdefinedas
|w(OPTSOL(R,t))−(w(r)+w(OPTSOL(R\\{r},t+1)))|.
AlgorithmRistherecognitionthat,inmanyapplications,itispossibletoestimatethedeviationofarequestrattimetquickly.Inotherwords,oncetheoptimalsolutionσofascenarioiscomputed,itiseasytocomputethedeviationofalltherequests,thusapproximatingEwithoneoptimiza-tion.Thisintuitioncanbeformalizedusingtheconceptofregret.
Definition2(Regret)Givenarequestr,asetR(r∈R),atimet,,andanoptimalsolutionσ=OPTSOL(R,t),aregretisafunctionthatover-approximatesthedeviationofrwrtRandt,i.e.,
REGRET(r,R,t,σ)≥DEVIATION(r,R,t).
Moreover,thereexisttwofunctionsfoandfrsuchthat•OPTSOL(R,t)runsintimeO(fo(R,∆));•REGRET(r,R,t,σ)runsintimeO(fr(R,∆));•|A|fr(R,∆)isO(fo(R,∆)).
Intuitively,thecomplexityrequirementstatesthatthecomputationofthe|A|regretsdoesnottakemoretimethantheoptimization.Regretstypicallyexistinpracticalapplications.Inanonlinefacilitylocationproblem,theregretofopeningafacilityfcanbeestimatedbyevaluatingthecostofclosingtheselectedfacilityσ(t)andopeningf.Invehiclerouting,theregretofservingacustomercnextcanevaluatedbyswappingcwiththefirstcustomeronthevehicleservingc.Inpacketscheduling,theregretofservingapacketpcanbeestimatedbyswappingand/orservingaconstantnumberofpackets.Inallcases,thecostofcomputingtheregretissmallcomparedtothecostoftheofflineoptimizationandsatisfytheaboverequirements.Notethatthereisaninterestingconnectiontolocalsearch,sincecomputingtheregretmaybeviewedasevaluatingthecostofalocalmovefortheapplicationathand.WearenowreadytopresenttheregretalgorithmR:
CHOOSEREQUEST-R(R,t)
1forr∈R2dof(r)←0;3fori←1...O4doS←R∪GETSAMPLE([t+1,∆]);5σ←OPTSOL(S,t);6f(σ(t))←f(σ(t))+w(σ);7forr∈READY(R,t)\\{σ(t)}8dof(r)←f(r)+(w(σ)−REGRET(σ,r,R,t));9returnargmax(r∈R)f(r);
ItsbasicorganizationfollowsalgorithmC.However,insteadofassigningsomecreditonlytotherequestselectedattimetforagivensamples,algorithmR(lines7-8)usestheregretstocompute,foreachavailablerequestr,anapproximationofthebestsolutionofsservingrattimet,i.e.,w(σ)−REGRET(σ,r,R,t).Henceeveryavailablerequestisgivenanevaluationforeverysampleattimetforthecostofasingleofflineoptimization(asymptotically).ObservethatalgorithmRperformsOofflineoptimizationsattimet.
Sampling
Todescribethestochasticalgorithmsentirely,itisneces-sarytospecifyfunctionGETSAMPLE,i.e.,howtogener-atesamplesfromthedistribution.Tomakethepresentation
a: .95b: .012b: .30a: .02a: .05b: .02b: .1513a: .10a: .50b: .60a: .05b: .25Figure2:ExampleMarkovModel
concrete,thispaperassumesthattheunderlyingdistributionisaMarkovmodel(Feller1957),althoughtheprinciplescaneasilybeadaptedtoavarietyofdistributions.Inad-dition,thepaperassumesforsimplicitythatthestructureoftheMarkovmodeldirectlyreflectstheuncertaintypresentintheonlinestochasticoptimizationframework.Morepre-cisely,eachstates∈SintheMarkovmodelrepresentsadifferentarrivalprobabilityfortherequestsandthetransi-tionsT(s1,s2)betweenstatess1ands2representchangesinthearrivalratesoftherequests.Finally,thetransitionfunctionTisaugmentedbyathirdparameterspecifyingthesetofrequestsgeneratedbyatransition.ThusT(s1,s2,w)representstheprobabilityofgoingfroms1tosw.Observethatatransition(possibly2andpro-ducingoutputaself-transition)istakenateverytimesteptinordertogeneratetherequests.Figure2providesanexampleofsuchaMarkovmodel,withtwopossibleoutputs,aandb.
Giventheseassumptions,itiseasytospecifyfunctionGETSAMPLEforfullyobservabledistributions:
GETSAMPLE-FO(t,∆)
1returnRANDOMWALK(St,∆);
Thealgorithmsimplyperformsarandomwalkof∆stepsintheMarkovmodelbeginningatthestateobservedattimet.Therandomwalkisspecifiedas
RANDOMWALK(s,∆)
1R←∅;
2fori←1...∆3do←SELECTRANDOMTRANSITION(s,T);
4R←R∪w;5s←s;6returnR;
wherelines2-5walkthroughtheMarkovmodelfor∆steps.Line3choosesatransitionatrandombasedonthecurrentstateandthetransitionfunction.Line4storestherequestsgeneratedonthattransition.Therequestsgeneratedduringtherandomwalkarereturnedinline6.
LearningtheDistributionOnline
Itisunrealisticinpracticetoassumeafullyobservabledis-tributionasinputtotheonlineoptimizationalgorithm.Asa
consequence,thissectioninvestigateshowtorelaxthisas-sumptionbyusingmachinelearningtechniques.Itstartsbyrelaxingfullobservabilitybeforeconsideringthecasewherethedistributionparametersarealsounknown.
PartiallyObservableDistributionsEvenifaprecisedis-tributionisavailablefortherequests,itisunreasonabletoassumethatfunctionGETSAMPLEknowsthecurrentstateoftheMarkovmodel.Partialobservabilityisamuchmorere-alisticassumptionadoptedinmanyuncertaintyframeworks(e.g.,(Kaelbling,Littman,&Cassandra1998)).Underpar-tialobservability,thestateStisunknownandcanonlybeinferredthroughobservations,thatisthroughthesetofre-questsgeneratedateachtimestep.
Toinferthecurrentstate,thesamplingalgorithmmakesuseofbeliefstates,afundamentalideafromPOMDPre-search.Moreprecisely,thealgorithmmaintains,foreachstates∈S,aprobabilityBt(s)ofbeinginstatesattimetandupdatestheseprobabilitiesaftereveryobservation.Animplementationofsamplingunderpartialobservabilityisasfollows:
GETSAMPLE-PO(t,∆)
1fors∈S2doBt(s)←PR(s|Θ(t),Bt−1);3s←RANDOM(S,Bt);
4returnRANDOMWALK(s,∆);
whereRANDOM(S,B)selectsarequestsinSwithproba-bilityB(s).Line2updatesthebeliefprobabilitiesbycon-ditioningtheprobabilityofbeinginstatesattimetonthebeliefstatesattimet−1andtherequestsobservedattimet.Line3choosesaninitialstaterandomlyusingthebeliefprobabilitiesandline4returnsasetofrequestsbasedonarandomwalkbeginningatstates.
StructuralModeloftheDistributionInmanycircum-stances,evenpartialobservabilityistoostrongarequire-ment.Inparticular,thestructureofthedistributionmaybeknownbutnotitsparameters.Toinfertheseparame-ters,theimplementationofGETSAMPLEmayresorttotheBaum-Welschalgorithm(Baum1972;Charniak1993)forlearningtheparametersofaMarkovmodelfromasequenceofrequests.Thislearningalgorithmstartsbyinitializingthetransitionprobabilitiesarbitrarily.Itthenusestheexistingprobabilitiesandthetrainingsequencetorefinetheproba-bilities.Theprocessisrepeatedwiththenewprobabilitiesuntilthealgorithmconverges.Thealgorithm
GETSAMPLE-PO-TRAIN(t,∆)
1whileConverging2dofors∈S3doαs(0)=Bt−λ(s);4fori←1...λ5doαs(i)←s∈Sαs
(i−1)T(s,s,Θ(t−λ+i));
6βs(λ)=Bt(s);7fori←(λ−1)8doβs(i)←
...0s∈S
βs(i+1)T(s,s,Θ(t−i));9
fors∈10doC
S,s∈S,w∈(s,s
,w)←Ωλαs(t)T(s,s,w)βs(t+11
fors∈S,s∈S,w∈Ω
t=1
1)12
doT(s,s,w)←
C
(s,s,w)
6fori←1...∆λ
;
7doR←R∪RANDOM(P,Ω)8returnR;
countsthenumberoftimeseachpossibleoutputisobservedinthelastλtimesteps(line3)andusesthesenumberstocomputeaprobabilityforthenextoutput(line5).Line7generatesrandomrequestsbasedontheprobabilitiesofthepossibleoutputs.
HistoricalSamplingOneofthedrawbacksofaveragingisthatitlosesthesequencestructure(e.g.,changesinre-questarrivalrates).Historicalsamplingaimsatovercomingthisdrawbackbyusingpastsubsequencesassamples.Itsimplementation
GETSAMPLE-S(t,∆)
1t←RANDOM([0,t−∆]);
2R←∅;
3fori←t...t+∆4doR←R∪Θ(i)5returnR;
isamazinglysimple:Itrandomlyselectsapositioninthepastsequenceofrequestsandgeneratesasampleofsize∆fromthatstartingposition.
Inadditiontoitsconceptualandimplementationsimplic-ity,historicalsamplingisappeallingforavarietyofreasons.First,andcontrarytohistoricalaveraging,itcapturesstruc-turalinformationonthesequence.Second,whenevertheunderlyingdistributionisaMarkovmodel,theresultingse-quencecorrespondstoarandomwalkinthemodelfromarandomstate.Asaconsequence,historicalsamplingim-plicitlyimplementsaversionofthepartialobservabilityal-gorithmwherethebeliefstatesareapproximatedbythestatefrequencies.Third,inthecaseofperiodicoptimization,his-toricalsamplingsimplyamountstousingpastinstancesassamplesofthe(unknowndistribution)withtheadditionalbenefitthatthestartingstateisknowninthiscase.Asaconsequence,historicalsamplingisasimpleandeffectivetechniquetoexploithistoricaldataincontinuousandperi-odiconlineoptimization.
PacketScheduling
Thissectionreportsexperimentalresultsontheonlinepacketschedulingproblemfirststudiedin(Chang,Givan,&Chong2000).Thisnetworkingapplicationisofinterestexperimentallysince(1)thenumberofrequeststoconsiderateachtimetissmalland(2)theofflinealgorithmcanbesolvedinpolynomialtime.Asaresult,itispossibletoeval-uateallthealgorithmsexperimentally,contrarytovehicleroutingapplicationswherethisisnotpractical.ThepacketschedulingisalsointerestingasitfeaturesacomplexarrivaldistributionforthepacketsbasedonMarkovmodels.TheOfflineProblemWearegivenasetJobsofjobsparti-tionedintoasetofclassesC.Eachjobjischaracterizedbyitsweightw(j),itsarrivaldatea(j),anditsclassc(j).Jobsinthesameclasshavethesameweight(butdifferentarrivaltimes).WearealsogivenaschedulehorizonH=[HH]duringwhichjobsmustbescheduled.Eachjobjrequiresasingletimeunittoprocessandmustbescheduledinitstimewindow[a(j),a(j)+d],wheredisthesameconstantfor
100OG95LOEC90RssoL deth85gieW eg80arevA757065020406080100120140160180200Maximum Number of Offline OptimizationsFigure3:FullyObservableSampling
alljobs(i.e.,drepresentsthetimeajobremainsavailabletoschedule).Inaddition,notwojobscanbescheduledatthesametimeandjobsthatcannotbeservedintheirtimewin-dowsaredropped.Thegoalistofindascheduleofmax-imalweight,i.e.,aschedulewhichmaximizesthesumoftheweightsofallscheduledjobs.Thisisequivalenttomin-imizingweightedpacketloss.Moreformally,assume,forsimplicityandwithoutlossofgenerality,thatthereisajobscheduledateachtimestepoftheschedulehorizon.Underthisassumption,ascheduleisafunctionσ:H→Jobswhichassignsajobtoeachtimeintheschedulehorizon.Ascheduleσisfeasibleif
∀t1,t2∈H:t1=t2→σ(t1)=σ(t2)∀t∈H:a(σ(t))≤t≤a(σ(t))+d.
Theweightofascheduleσ,denotedbyw(σ),isgivenbyw(σ)=t∈Hw(σ(t)).Thegoalistofindafeasiblesched-uleσmaximizingw(σ).ThisofflineproblemcanbesolvedinquadratictimeO(|J||H|).
TheOnlineProblemTheexperimentalevaluationisbasedontheproblemsof(Chang,Givan,&Chong2000;Bent&VanHentenryck2004d),whereallthedetailscanbefound.Intheseproblems,thearrivaldistributionsarespec-ifiedbyindependentMMs,oneforeachjobclasssimilarinformasseeninFigure2.Thus,eachclasscanbethoughtofashavingitsownblackboxforgeneratingrequestsamples.Theresultsaregivenforthereference7-classproblemsandforanonlinescheduleconsistingof200,000timesteps.Be-causeitisimpracticaltosamplethefutureforsomanysteps,thealgorithmsuseasamplinghorizonof50,whichseemstobeanexcellentcompromisebetweentimeandquality.Theregretfunctionisgivenin(Bent&VanHentenryck2004b)andconsistsofswappingaconstantnumberofpacketsintheoptimalschedule.
SamplingwiththeExactDistributionFigure3depictstheaveragepacketlossasafunctionofthenumberofavail-ableoptimizationsOforthevariousalgorithmsonavariety
100OG95LOE−FOC−FO90R−FOssE−POoLC−PO de85R−POthgieW eg80arevA757065020406080100120140160180200Maximum Number of Offline Optimizations
Figure4:PartiallyObservableSampling
of7-classproblemswhentheMarkovModelisfullyob-servable.Italsogivestheoptimal,aposteriori,packetlossO.TheresultsindicatethevalueofstochasticinformationasalgorithmEsignificantlyoutperformstheobliviousalgo-rithmsGandLOandbridgemuchofthegapbetweenthesealgorithmsandtheoptimalsolution.NotethatLOisworsethanG,illustratingthe(frequent)pathologicalbehaviorofover-optimizing.
ConsensusisdominatedbyEwhenthenumberofavail-ableoptimizationsincreases,althoughitstillproducessig-nificantimprovementsovertheobliviousalgorithms.Thisisofcoursepertinent,sinceEisnotpracticalformanyprob-lemswithtimeconstraints.Finally,thebenefitsoftheregretalgorithmareclearlyapparent.AlgorithmRindeeddom-inatesalltheotheralgorithms,includingconsensuswhenthereareveryfewofflineoptimizations(strongtimecon-straints)andexpectationevenwhenthereareareasonablylargenumberofthem(weaktimeconstraints).Reference(Bent&VanHentenryck2004b)alsoshowsthistobethecaseforcomplexonlinevehicleroutingwithtimewindows.PartialObservabilityFigure4considersthecasewhentheMarkovmodelispartiallyobservableandcomparesittothefullyobservablecase.Itisinterestingtonotethatthereisnosignificantdegradationinsolutionquality,indicatingthatmaintainingthebeliefprobabilitiesissufficient,onthisproblem,toobtaingoodresults.
StructuralModelFigure5depictstheexperimentalre-sultswhentheMMparametersarelearned.Inordertogetreasonableresults,λissetto560.Theresultsarecompa-rabletotheresultsiftheMMisfullyobservable.However,theseresultsaremisleading,astheydonottakeintoaccounttimespentintraining.Figure6showstheroughamountoftimeittakestotrainonsequencesofacertainsizeintermsoftheequivalentnumberofoptimizations.Ascanbeseenfromthisplot,trainingonasequenceoflength560isroughlyequivalenttoperforming230optimizationseverytimestep!Ifthatistakenintoaccount,theresultslooklikeFigure7.Toacertainextent,thislimitationcanbeover-
100OG95LOE−FOC−FO90R−FOssE−PO−TRAINoLC−PO−TRAIN de85R−PO−TRAINthgieW eg80arevA757065020406080100120140160180200Maximum Number of Offline OptimizationsFigure5:ExperimentalResultsforPacketSchedulingafterLearningtheMMParameters600Optimization500ez400iS elpmaS300 gniniarT2001000050100150200250Equivalent Number of OptimizationsFigure6:Trainingvs.OptimizationTimeTradeoffcomebyreducingthesizeofthetrainingsequence,i.e.,140timesteps,asdepictedinFigure8,albeitatalossofover-allsolutionquality.Amorepromisingdirectionistoapplythelearningalgorithmperiodicallyinsteadofateachtimestepsinordertoamortizethecostoftraining.ThisisshowninFigure9,whichindicatesthefeasibilityofsophisticatedlearningtechniquesinanonlinesetting.Indeed,thelossinsolutionqualitycomparedtotheexactdistributionissmallandtheimprovementwithrespecttoobliviousalgorithmsissignificant.
HistoricalAveragingFigure10depictstheexperimentalresultsforhistoricalaveraging.Thesearetheweakestre-sultspresentedhere,yetthissimplemethodstillproducessignificantimprovementsovertheobliviousalgorithms.Theresultsaregivenforλ=500,whichachievesthebestresultsexperimentally.
HistoricalSamplingFinally,Figure11depictstheexper-imentalresultsforhistoricalsamplingwhichareveryinter-esting.Indeed,historicalsamplingproducesthebestresults
100OG95LOE−FOC−FO90R−FOssE−PO−TRAINoLC−PO−TRAIN deR−PO−TRAINth85gieW eg80arevA757065050100150200250300350400Maximum Number of Offline Optimizations
Figure7:ExperimentalResultsforPacketScheduling:In-cludingtheLearningTime100O95GLOE−FO90C−FOssoR−FOL dE−PO−TRAINeth85C−PO−TRAINgieR−PO−TRAINW eg80arevA757065050100150200Maximum Number of Offline OptimizationsFigure8:ExperimentalResultsforPacketScheduling:In-cludingtheLearningTimeonSorterSequences
forthemostadvancedalgorithm:theregretalgorithmR.(ItisalsosuperiorforalgorithmE,althoughthisisnotprac-ticallyrelevant).Inotherwords,withoutanyknowledgeofthedistributionandwithoutinducinganytimeoverhead,historicalsamplingisaseffectiveasthepartialobservabilitymodelthatmaintainsbeliefstatesbasedonobservations.
VehicleRouting
Wenowevaluatehistoricalsamplingonasignificantlymorecomplicatedproblem:onlinemultiplevehicleroutingwithtimewindows.Thisproblemwasstudiedinitiallyin(Bent&VanHentenryck2004c)toshowthevalueofstochasticinformationinvehiclerouting.Itisaperiodiconlineopti-mizationproblemswherehistorialdataistypicallyavailable.ProblemFormulationThevehicleroutingproblemsarespecifiedformallyin(Bent&VanHentenryck2004c)whereallthedetailscanbefound.Eachproblemcontainsadepot,anumberofcustomerregionsandanumberofcustomerser-vicerequestsfromtheregions.Eachrequesthasademand,
100OG95LOE−FOC−FO90R−FOssE−PO−TRAINoLC−PO−TRAIN deR−PO−TRAINth85gieW eg80arevA757065020406080100120140160180200Maximum Number of Offline OptimizationsFigure9:ExperimentalResultsforPacketSchedulingwhenMMParametersareUnknown:BalancingOptimizationandTraining105O100GLOE−LO95C−LOR−LOssE−AoLC−A 90deR−AthgieW85 egare80vA757065020406080100120140160180200Maximum Number of Offline OptimizationsFigure10:ExperimentalResultsonPacketSchedulingforHistoricalAveraging
aservicetime,andatimewindowspecifiedbyaninterval[e,l],whichrepresentstheearliestandlatestpossiblearrivaltimesrespectively.Thereareanumberofidenticalvehi-clesavailableforuse,eachwithcapacityQ.Avehicleroutestartsatthedepot,servessomecustomersatmostonce,andreturnstothedepot.Thedemandofarouteisthesumma-tionofthedemandofitscustomers.Aroutingplanisasetofroutesservicingeachcustomerexactlyonce.Aso-lutiontotheofflineVRPTWisaroutingplanthatsatisfiesthecapacityconstraintsonthevehicleandthetimewindowconstraintsoftherequests.Theobjectiveistofindaso-lutionmaximizingthenumberofservedcustomers.InthedynamicVRPTW,customerrequestsarenotknowninad-vanceandbecomeavailableduringthecourseoftheday.Ingeneral,anumberofrequestsareavailableinitially.
TheonlineVRPTWismuchmorecomplicatedthantheonlinepacketschedulinganddoesnotfitdirectlyinthestochasticframeworkpresentedearlier.First,decisionsaretriggeredbytwotypesofevents:newcustomerrequestsandthearrivalofavehicleatacustomersite.Second,ade-
100OG95LOE−FOC−FO90R−FOssE−SoLC−S de85R−SthgieW eg80arevA757065020406080100120140160180200Maximum Number of Offline Optimizations
Figure11:ExperimentalResultsonPacketSchedulingforHistoricalSampling
cisionconsistsofchoosingwhichcustomerstoservenextonagivenvehicle.Third,theonlinealgorithmmustde-cidewhethertoacceptorrejectcustomerrequestsimmedi-atelyandmustservicetheacceptedrequests.Finally,theVRPTWisahardNP-completeproblemwhoseinstancesareextremelydifficulttosolveoptimally.Only2to10of-flineoptimizationscanbesolvedinbetweentwoeventsandthenumberofeventsislarge(e.g.,50differentrequests).Hence,theexpectationmethodisnotpracticalatall.ExperimentalSettingTheexperimentalresultsarebasedonthefivehardestclass-4problemsfrom(Bent&VanHen-tenryck2004c),wherealldetailscanbefound.TheyarederivedfromtheSolomonbenchmarkswhichareverychal-lengingandinvolve100customers.Theinstancesexhibitvariousdegreesofdynamism(i.e.,theratiobetweenknownanddynamiccustomers),differentdistributionsofearlyandlaterequests,aswellastimewindowsofverydifferentsizes.Hencetheycoverawidespectrumofpossibilitiesandstruc-tures.Thenumberofvehiclesavailableforthedynamicalgorithmswasdeterminedbysolvingtheofflineproblemsandaddingtwovehicles.
Historicalsamplinghaseither10or1000pastsamplesatitsdisposaldependingontheexperiments.Foragiventimet,theimplementationofmethodGETSAMPLEinthisperi-odicapplicationconsistsofchoosingoneofthesepastsam-plesandreturnsalltherequestsoccurringaftertimet.Com-paredtoearlieralgorithmswhichassumedperfectknowl-edgeofthedistribution,historicalsamplinglosestwokindsofinformation.Ontheonehand,itworksfromalimitedpoolofsamplesand,ontheotherhand,itlosessomede-pendencyinformation.Indeed,withaperfectdistribution,thearrivalofactualrequestsaffectsthearrivalprobabilityoffuturerequests,i.e.thearrivalofarequestinaregionandtimeperiodeliminatesfuturerequestsfromthatregioninthesametimeperiod(Bent&VanHentenryck2004c).ExperimentalResultsTable1reportsomepreliminaryresultstoevaluatethepotentialofhistoricalsampling.It
depictsthenumberofmissedcustomersfortheconsensus(C)andregret(R)algorithmsusingtheactualdistribution,aswellasthesameresultswhenhistoricalsamplingisusedinsteadoftheactualdistribution(HS(C)andHS(R)).Eachentryinthetableistheaverageof20independentrunswith5optimizationsbeingallowedinbetweenevents.Theal-gorithmallimplementtheleastcommitmentapproachde-scribedin(Bent&VanHentenryck2004a).Historicalsam-plingwasevaluatedwithahistoricaldatacontaining10and1000datapoints.Thesepreliminaryresultsindicatethathistoricalsamplingperformsverywellontheseproblems.Whenthehistoricaldata(1000datapoints)islarge,nosig-nificantdifferencecanbeobservedbetweenhistoricalsam-plingandtheactualdistribution.Whenthehistoricaldataislimited(10datapoints),thequalityofthesolutionsseemstodecreaseslighlyforconsensus,whilethereisnorealdiffer-enceforregret.
Conclusion
Thispaperfocusedonlinestochasticoptimizationproblemswheretimeconstraintsseverelylimitthenumberofof-flineoptimizationswhichcanbeperformedatdecisiontimeand/orinbetweendecisions.Thelastcoupleofyearswit-nessedsignificantprogressinthisarea,includingthede-signofnovelalgorithmsexploitingstochasticinformationtomakebetterschedulingandroutingdecisions.Experi-mentalresultshaveshownthat,wheneveradistributionoffuturerequestsisavailableforsampling,thesealgorithmsmayproducesignificantimprovementsinsolutionquality,whilerunningwithinthetimeconstraints.
Thispaperreconsidersthefundamentalassumptionun-derlyingthesealgorithms:thatadistributionoffuturere-quests,oranapproximationthereof,isavailableforsam-pling.Itrelaxedthatassumptionandstudiedwhethertheun-derlyingdistributioncanbelearned(whenpartofitsstruc-tureisknown)orwhetherhistoricaldatacanexploitedforsamplingpurposes.Bothcontinuousandperiodiconlineop-timizationwereconsidered.
Asfarasthemachinelearningapproachisconcerned,thepapershowedthattechniquessuchasbeliefstatesandtheBaum-WelschalgorithmforlearningMMscanbeengi-neeredtoworkinanonlinesetting.Theexperimentalresultsonpacketschedulingareverypromising:theyindicatethattheapproachpreservesmostofthebenefitsoftheonlineal-gorithmsworkingontheactualdistribution.
Asfarashistoricaldataisconcerned,thepaperproposedtheideaofhistoricalsampling,whichconsistsofexploit-inghistoricaldataforsampling,i.e.,earliersequencesofrequestsinthecaseofcontinuousonlineoptimizationandpastinstancesinthecaseofperiodiconlineoptimization.Theexperimentalresults,bothonpacketscheduling(contin-uousoptimization)andvehiclerouting(periodicoptimiza-tion),showsthathistoricalsamplingisextremelyeffectiveandproducesthesamesolutionqualityastheactualdistri-bution.
Theseresultsclearlyboostthepotentialofonlinestochas-ticoptimization,sincetheysignificantlyweakentheirmainassumption.Inadditiontocaseswherepredictivemodelsareavailable,theyalsoshowthatonlinestochasticalgoritmscan
Problem
13.5014.5511.4514.0012.80
HS(C)10
14.3013.6011.6015.4016.00
R
10.4511.959.009.359.70
HIS(R)1000
因篇幅问题不能全部显示,请点此查看更多更全内容
Copyright © 2019- haog.cn 版权所有 赣ICP备2024042798号-2
违法及侵权请联系:TEL:199 1889 7713 E-MAIL:2724546146@qq.com
本站由北京市万商天勤律师事务所王兴未律师提供法律服务