Online Ranking Combination
Erzsébet Frigó
frigo.erzsebet@sztaki.hu
Institute for Computer Science and Control (MTA SZTAKI), Budapest, Hungary
ABSTRACT
As a task of high importance for recommender systems, we consider the problem of learning the convex combination of ranking algorithms by online machine learning. In the case of two base rankers, we show that the exponentially weighted combination achieves near optimal performance. However, the number of required points to be evaluated may be prohibitive with more base models in a real application. We propose a gradient based stochastic optimization algorithm that uses finite differences. Our new algorithm achieves similar empirical performance for two base rankers, while scaling well with an increased number of models. In our experiments with five real-world recommendation data sets, we show that the combination offers significant improvement over previously known stochastic optimization techniques. Our algorithm is the first effective stochastic optimization method for combining ranked recommendation lists by online machine learning.
CCS CONCEPTS
• Information systems → Collaborative filtering; • Theory of computation → Online learning algorithms.

KEYWORDS
ranking; combination; RFDSA

ACM Reference Format:
Erzsébet Frigó and Levente Kocsis. 2019. Online Ranking Combination. In Thirteenth ACM Conference on Recommender Systems (RecSys '19), September 16–20, 2019, Copenhagen, Denmark. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3298689.3346993
Levente Kocsis
kocsis@sztaki.hu
Institute for Computer Science and Control (MTA SZTAKI), Budapest, Hungary

RecSys '19, September 16–20, 2019, Copenhagen, Denmark
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6243-6/19/09. https://doi.org/10.1145/3298689.3346993

1 INTRODUCTION
A milestone in the research of recommendation algorithms, the Netflix Prize Competition [4] had high impact on research directions. The target of the contest was based on the one to five star ratings given by users, with one part of the data used for model training and the other for evaluation. As an impact of the competition, tasks now termed batch rating prediction were dominating research results. However, real systems differ not just in that the user feedback is implicit, but also in that they process data streams where users request one or a few items at a time and get exposed to new information that may change their needs and taste when they return to the service next time. Furthermore, an online trained model may change and return completely different lists for the same user even for interactions very close in time.
The difficulty of evaluating streaming recommenders was first mentioned in [18], although the authors evaluated models by an offline training and testing split. Ideas for online evaluation metrics first appeared in [21, 22, 29]. In online or prequential evaluation [9], which has grown in popularity, the ranking measure is computed from a sequence of examples. For each example in the sequence, the recommender system provides a top-k list of items to the active user. The list is evaluated against, typically, a single relevant item that the user interacted with. Then, the user-item interaction is added to the previously available data, and the recommender system is able to update its model.

Recommender systems often rely on an ensemble of base ranking algorithms. For instance, in the Netflix prize competition, considerable effort went into choosing the algorithms for the blend and combining them [28]. In an online scenario, the environment for a combination algorithm is non-stationary: not only the user preferences and item popularities, but also the base ranking models change in time. Therefore, the combination of the base algorithms also needs to be updated. While it is infeasible to update the parameters of the combination with the computationally intensive blending approaches used in batch settings, a convex combination of the base models often leads to satisfactory results. In summary, we consider online convex combination algorithms for (implicit feedback) recommenders under prequential evaluation.
From the machine learning point of view, the main difficulty of combining ranked recommendation lists is that the typical ranking measures, such as NDCG [12], are not continuous, making their optimization a difficult task. In this paper, we compare and identify the numerical issues of two strategies to optimize for non-continuous rewards. The first approach uses exponentially weighted forecasters, which explore the weight space globally and do not rely on the existence of a gradient of the reward function. The second class of methods uses gradient descent to maximize the reward.

Exponentially weighted algorithms (EWA) [7] optimize ranking combination weights by exploring the weight space globally. EWA was shown to be close to optimal for Lipschitz-continuous environments [19]. We will show that EWA is able to optimize ranking combinations as well, under a certain assumption (see Proposition 4.1). However, the number of combinations that needs to be evaluated to fulfill the assumption grows exponentially with the number of base rankers. Therefore, it is not practical in a real application if more base rankers are employed.
To be able to handle a larger number of base rankers, we turn our attention to the second approach, local optimization by gradient based methods. In particular, we start with the Resilient Simultaneous Perturbation Stochastic Approximation (RSPSA) algorithm [15], which was used for optimizing model parameters in games. While RSPSA was shown to cope with non-continuous rewards, it is non-trivial whether it can cope with ranking functions as well. Indeed, we observe empirically that RSPSA does not scale well for ranking prediction. The reason for this is that ranking functions have many flat regions with respect to individual combination weights.

Our method, Resilient Finite Difference Stochastic Approximation (RFDSA+), is the first effective stochastic optimization method for combining ranked lists. To improve the scalability properties of RSPSA, we switch from simultaneous perturbation to finite differences to identify flat regions with respect to a given weight. In this way, we eliminate the noise of the perturbation of the other weights and always concentrate on optimizing a single weight at a time. We show empirically that RFDSA+ achieves near optimal performance when two base rankers are combined, and scales well with the number of base rankers.

The article is organized as follows: after discussing the related research in Section 2, we formalize our framework in Section 3. Exponentially weighted algorithms are discussed in Section 4, where in Proposition 4.1 we show the theoretical guarantee of EWA. Gradient based algorithms are discussed in Section 5. Our proposed algorithm, RFDSA+, is described in Section 6. Empirical evaluation highlighting the strength of RFDSA+ is provided in Section 7. Some conclusions and a discussion of future research close the paper in Section 8.
2 RELATED RESEARCH
Research on incremental recommender algorithms in the prequential evaluation scenario has gained popularity in recent years. There are several papers that use prequential evaluation [3, 5, 13, 21, 22, 31], however, only [22] considers the issue of combining multiple base rankers. The latter will be discussed in more detail in Section 5.1, and evaluated empirically in Section 7.

Ranking combination received considerable attention during the Netflix prize competition, when the approach of [28] was essential for the winning entry. In the batch setting, one of the later approaches that can be adapted naturally to an online scenario is [6]. The authors use an exponentially weighted forecaster, and use the cumulative loss of each base algorithm to compute its score in the convex combination. One can notice that any arbitrary linear shift of the scores of a base algorithm would leave its cumulative loss unchanged, but it would affect the base algorithm's contribution to the mix. Therefore, the algorithm seems somewhat less sound; nevertheless, it may still perform reasonably well on some practical instances. We will describe the algorithm more formally in Section 4.3, and evaluate it (for implicit feedback problems) empirically in Section 7.

In the online setting, ranking combination was proposed by [25, 30] using dueling bandits. Their approach assumes that the loss functions are convex and stationary. Neither assumption seems reasonable for most ranking measures in a real application. There are several algorithms in the literature of online learning that can be considered for combining ranking models. [2] considered a two-point approximation of the gradient for convex functions. The ranking measures are not convex; nevertheless, the algorithm is similar to SPSA [27], which has been applied to optimizing non-convex functions as well. We will discuss the algorithm in Section 5.2.
The exponentially weighted algorithm was applied to optimize (non-convex) Lipschitz-continuous functions [19] and it has O(√T) guarantees in the full-information setting, where T is the length of the episode. The full information setting would imply, however, evaluating a prohibitively large number of points when the number of base rankers is slightly larger. There are bandit variants as well [14] that evaluate only one point per iteration; however, they scale badly with the dimension: the regret bound for continuum-armed bandits is O(T^((N+1)/(N+2))) [14], where N is the dimensionality of the problem. The exponentially weighted algorithms will be considered in Section 4.

Finally, there are a large number of stochastic approximation algorithms that can, in principle, be applied to online ranking combination. Unfortunately, none of them is straightforward to use for ranking functions such as NDCG that are not continuous, and even a smoothed cumulative ranking reward function can be non-convex as well. In most games, the reward is also non-continuous, for example, 1/0 for win/loss, or a discrete number of points (or money) that can be won in a card game. The algorithm RSPSA [15] was proposed for (offline) optimization of some parameters of a poker playing program. Our proposed algorithm, RFDSA+, builds on the idea of RSPSA but considers one weight at a time, to remedy the problems of past algorithms in handling flat areas in ranking functions.
3 PROBLEM SETUP
We consider the online combination of the ranked lists of multiple base recommender algorithms. As soon as the base algorithms give a prediction, we have to apply and potentially re-learn the combination weights on the fly. In contrast to typical batch learning tasks, where we can, for example, perform grid search by using a large amount of past training data, in our task we process the recommendation requests and the feedback as a sequence in time, which is closer to a real recommender system operation. Compared to batch learning, the advantage of online methods is that they can adapt faster to concept drifts [8] that can rearrange the relative strength of the different base models.

Both batch and prequential evaluation rely on a set of recorded user-item interactions. For batch evaluation, one splits the data into a training and a test set, trains the algorithms on the former and tests on the latter. Conversely, for prequential evaluation, we test algorithms sequentially on each data point, and potentially use all preceding data points for training. Since often the user selects a single item only, we will consider implicit feedback evaluation metrics with only one relevant item, but the evaluation can easily be generalized to the case when the user makes multiple choices or when the feedback is explicit. Prequential evaluation is closer to a real application, since in practice, user interaction occurs sequentially. Algorithms can also exploit the most recent data. It is true for both evaluation methods that the recommendation is made before revealing the choice of the user, that is, a given user-item interaction is processed independently of what the system recommends.

Given a chronologically ordered data set with T records, prequential evaluation is an episode with T rounds. In each round t, we take the following steps.
(1) We observe the next user-item pair from the data set, and set the active user accordingly.
(2) We query the recommender system for a top-K recommendation for the active user.
(3) We evaluate the output recommendation list against the single relevant item j_t that the user interacted with.
(4) Finally, we reveal the relevant item j_t to the recommender system, and allow it to update the model using the additional user-item pair.
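The four steps above can be sketched as a simple evaluation loop. This is a minimal illustration, not the paper's implementation; the `recommend`, `reward`, and `update` hooks are hypothetical placeholders for the recommender system, the ranking measure, and the model update.

```python
def prequential_evaluation(events, recommend, reward, update, k=100):
    """Run prequential evaluation over a chronological list of
    (user, item) events and return the cumulative reward."""
    cumulative = 0.0
    for user, item in events:                 # step 1: observe the next pair
        toplist = recommend(user, k)          # step 2: query a top-K list
        cumulative += reward(toplist, item)   # step 3: evaluate against j_t
        update(user, item)                    # step 4: reveal and retrain
    return cumulative
```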
In the context of convex combination algorithms, we consider N base ranking algorithms, and the i-th base algorithm is denoted by A_i. In each round t = 1, ..., T, first, each base algorithm A_i assigns a score x_tij to each item j. After that, the convex combination algorithm assigns the weight θ_ti to each algorithm A_i. The weights form an N-dimensional vector θ_t = (θ_t1, ..., θ_tN). The parameter space is θ_t ∈ Θ = R^N_{0+}. The combined score of item j in round t is

    x_tj = Σ_{i=1}^{N} θ_ti x_tij.

The top lists are generated by sorting the items by the combined scores in descending order.
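The combined scoring and sorting step can be sketched as follows; this is an illustrative helper under the paper's definitions, with made-up array shapes (`base_scores` holding one row of item scores per base ranker).

```python
import numpy as np

def combined_toplist(theta, base_scores, k):
    """theta: (N,) combination weights; base_scores: (N, M) matrix of
    per-ranker item scores. Returns the indices of the top-k items by
    the convex combination x_j = sum_i theta_i * x_ij."""
    x = theta @ base_scores       # combined score for each of the M items
    order = np.argsort(-x)        # sort items by score, descending
    return order[:k]
```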
After the active user's preferred item is revealed, the combination algorithm collects the reward r_t, which depends on the top list generated and on the user's choice. With an abuse of notation, we will denote the reward of a base ranker A_i by r_ti, the reward corresponding to a weight assignment θ by r_t(θ), and the reward obtained by a combination algorithm C by r_t(C). The cumulative reward collected up to round t is

    R_t = Σ_{τ=1}^{t} r_τ.

We let R_ti, R_t(θ), and R_t(C) denote the cumulative reward corresponding to a base ranker, a weight vector, and a combination algorithm.
There are several choices of ranking measures. A popular choice, which we use in our experiments, is NDCG@K [12]. In prequential evaluation, we assume the worst scenario, that there is only one item with a non-zero label in each round t, namely j_t. The NDCG@K of a permutation π_t of the items reduces to

    r_t = NDCG@K(π_t) = 1 / log_2(rank_{π_t}(j_t) + 1)   if rank_{π_t}(j_t) ≤ K,
    r_t = 0                                              otherwise,

as there is always exactly one relevant item and hence the ideal DCG is equal to one.
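The single-relevant-item NDCG@K above is straightforward to compute; a minimal sketch:

```python
import math

def ndcg_at_k(toplist, relevant_item, k=100):
    """NDCG@K with a single relevant item: 1/log2(rank + 1) if the
    item appears within the top k positions, else 0 (the ideal DCG
    is 1, so no further normalization is needed)."""
    try:
        rank = toplist.index(relevant_item) + 1   # 1-based rank
    except ValueError:
        return 0.0                                # item not in the list
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0
```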
4 EXPONENTIALLY WEIGHTED BASELINE ALGORITHMS

The first set of baseline rank combination algorithms in this section rely on the exponentially weighted forecaster [7]. They explore the weight space globally, without relying on the existence of a gradient of the reward function. In Proposition 4.1, we will also show that in the case of two base recommenders, the exponentially weighted combination achieves near optimal performance.
4.1 ExpA

The simplest choice to deal with multiple base rankers is to use the exponentially weighted forecaster on the rankers. Accordingly, the combination algorithm, denoted by ExpA, selects base ranker A_i in round t with probability

    p_ti = exp(η_t Σ_{τ=1}^{t−1} r_τi) / Σ_{j=1}^{N} exp(η_t Σ_{τ=1}^{t−1} r_τj).   (1)

Selecting base ranker A_i in round t means setting θ_ti = 1 and θ_tj = 0 for j ≠ i. The algorithm is guaranteed to achieve a cumulative reward that is not worse than the cumulative reward of the best base ranker by an additive O(√T) term in expectation [7].
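Equation (1) is a softmax over cumulative rewards; a minimal sketch of the probability computation (written in the reward-maximizing form, with the usual max-subtraction for numerical stability):

```python
import numpy as np

def expa_probabilities(cum_rewards, eta):
    """ExpA selection probabilities: p_i proportional to
    exp(eta * cumulative reward of base ranker i)."""
    z = eta * np.asarray(cum_rewards, dtype=float)
    z -= z.max()              # shift-invariant, avoids overflow
    w = np.exp(z)
    return w / w.sum()
```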
4.2 ExpW

While ExpA can locate the best base ranker, we can hope that a convex combination of the rankers can achieve a better performance than any single ranker. We choose a finite set of points P ⊂ Θ and apply the exponentially weighted forecaster to P to choose the weight combinations θ_t ∈ P to play. If an appropriately large number of points are chosen, and the cumulative reward function R_T(θ) (as a function of θ) is sufficiently smooth, then the algorithm, denoted by ExpW, will achieve a cumulative reward that is close to that of the optimal convex combination. The following proposition formalizes this statement.

Proposition 4.1. Let P ⊂ Θ be a finite set such that

    E[ max_{θ∈Θ} R_T(θ) − max_{p∈P} R_T(p) ] ≤ √T.   (2)

Then the regret of the exponentially weighted forecaster applied on P is bounded by

    E[ max_{θ∈Θ} R_T(θ) − R_T(ExpW) ] ≤ Õ(√T).   (3)

The proof follows by putting together the regret bound of the exponentially weighted forecaster and inequality (2).

For a sufficiently large T, the function R_T(θ) is fairly smooth in practice, as observed in Section 7.3. For two base rankers, the parameter space can be represented by a one dimensional simplex, i.e. a section. Then, a uniform grid with O(√T) grid points can be sufficient, if the cumulative function acts like a Lipschitz function on the grid points. This latter condition will often be true (see also Figure 1). However, with more base rankers, the number of points required for a Lipschitz-like cumulative function is Ω(T^((N−1)/2)). Since ExpW needs to evaluate the reward in each point, the number of evaluations scales exponentially with the number of base rankers.
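For two base rankers, ExpW reduces to an exponentially weighted forecaster over a uniform grid on the one-dimensional simplex. A minimal sketch (the seeded generator and the 100-point grid are illustrative choices, the latter matching the grid used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: any seeded generator works

def expw_round(grid_cum_rewards, eta):
    """Sample one grid-point index with probability proportional to
    exp(eta * cumulative reward of that grid point)."""
    z = eta * np.asarray(grid_cum_rewards, dtype=float)
    z -= z.max()                  # stabilize the softmax
    p = np.exp(z)
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

# For two base rankers, P is a uniform grid on the 1-D simplex:
grid = [(theta, 1.0 - theta) for theta in np.linspace(0.0, 1.0, 100)]
```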
4.3 ExpAW

In [6], the authors proposed an algorithm that can be regarded as a mix of ExpA and ExpW. The algorithm, denoted here by ExpAW, relies on the cumulative performance of the base rankers (as ExpA), but uses it as the weight of the base ranker, instead of using it as a selection probability. The weight of base ranker A_i in round t is

    θ_ti = exp(η_t Σ_{τ=1}^{t−1} r_τi) / Σ_{j=1}^{N} exp(η_t Σ_{τ=1}^{t−1} r_τj).   (4)

It is easy to see that the reward of a base ranker does not change if its scores are scaled by some factor. However, the scaling will affect the reward of the combination algorithm in an arbitrary way. Nevertheless, with a reasonable normalization, the algorithm may still lead to a decent performance, and it is less likely to be affected by an increase in the number of base rankers.
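The scale-sensitivity criticism can be demonstrated in a few lines. The scores below are made up for illustration: rescaling one ranker's scores leaves its own ranking (and thus its own reward) unchanged, yet changes which item the fixed-weight combination puts on top.

```python
import numpy as np

def softmax_weights(cum_rewards, eta):
    """ExpAW turns cumulative rewards into combination weights theta
    (the same softmax as ExpA, but used as weights, not probabilities)."""
    z = eta * np.asarray(cum_rewards, dtype=float)
    z -= z.max()
    w = np.exp(z)
    return w / w.sum()

theta = softmax_weights([1.0, 1.0], eta=1.0)   # equal cumulative rewards
scores = np.array([[10.0, 20.0],               # ranker 0: scores of 2 items
                   [3.0, 1.0]])                # ranker 1
scaled = scores * np.array([[0.1], [1.0]])     # shrink ranker 0's scores
top_before = int(np.argmax(theta @ scores))    # combination picks item 1
top_after = int(np.argmax(theta @ scaled))     # now picks item 0
```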
5 GRADIENT BASELINE ALGORITHMS

In this section, we present the second set of baseline rank combination methods, which are guided by the gradient of the reward function. In the first subsection, we describe an algorithm (SGD) that computes the gradient of a surrogate to the reward. The next two algorithms (SPSA and RSPSA) are stochastic approximation algorithms that approximate the gradient by finite differences. We mention that our new algorithm RFDSA+ builds on RSPSA, extending it to deal with the difficulties of flat regions in the ranking functions.
5.1 SGD
We call our first combination algorithm SGD, since it uses stochastic gradient descent for the mean squared error (MSE) as a surrogate to the reward. The target for the current item is set to 1, and the targets for a set of randomly sampled negative items are set to 0. After seeing a user-item pair, a stochastic gradient step is taken to minimize the MSE between the ranking score (x_tj) and the target value. The algorithm was used by [23] for matrix factorization and by [22] for online combination.

We do not expect the algorithm to have difficulty with a large number of base rankers. However, minimizing the surrogate loss may not result in a sufficiently good optimization of the original reward function.
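One round of the surrogate update can be sketched as follows. This is an illustrative implementation of the described idea, not the code of [22]; the learning rate and the squared-error gradient form are assumptions.

```python
import numpy as np

def sgd_combination_step(theta, x_pos, x_negs, lr=0.01):
    """One SGD step on the MSE surrogate: target 1 for the consumed
    item, 0 for the sampled negatives. x_pos: (N,) base-ranker scores
    of the positive item; x_negs: (num_neg, N) scores of negatives."""
    grad = 2.0 * (theta @ x_pos - 1.0) * x_pos        # positive item term
    for x in x_negs:
        grad += 2.0 * (theta @ x - 0.0) * x           # negative item terms
    return theta - lr * grad
```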
5.2 SPSA
The gradient of most ranking functions with respect to the combination weights is typically zero in most points where it exists. However, if we average over more time steps, it starts to 'smooth out'. It still cannot be computed in closed form, but it can be approximated by finite differences. For online optimization of convex functions, [2] suggested that the gradient be approximated by simultaneous perturbation, with an online gradient step taken in the approximated direction. For non-convex optimization, a similar algorithm is known as Simultaneous Perturbation Stochastic Approximation (SPSA) [27]. The approximated gradient g_ti is given by

    g_ti = (r_t(θ_t + c_t Δ_t) − r_t(θ_t − c_t Δ_t)) / (2 c_t Δ_ti),

where c_t is an appropriately decreasing sequence, Δ_t = (Δ_t1, ..., Δ_tN), and the Δ_ti are ±1 valued unbiased Bernoulli random variables.
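The simultaneous perturbation estimate costs only two reward evaluations regardless of N; a minimal sketch (the seeded generator is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)  # assumption: any seeded generator works

def spsa_gradient(reward, theta, c):
    """Two-sided SPSA estimate of the gradient of `reward` at `theta`
    from a single pair of evaluations with one random +/-1 perturbation
    vector shared by all coordinates."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    r_plus = reward(theta + c * delta)
    r_minus = reward(theta - c * delta)
    return (r_plus - r_minus) / (2.0 * c * delta)
```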
5.3 RSPSA
In SPSA, especially with non-smooth functions, the difficulty lies in choosing the appropriate perturbation. The sum of ranking reward functions is a step function. If the perturbation size is too small, we might get stuck on a plateau and cannot find the right direction. If the perturbation is too large, we miss local optima. The appropriate perturbation step size might differ depending on the coordinate and time.

The RSPSA algorithm was proposed in [15] for games, which also have a discrete reward (e.g., 1 for win, 0 for loss). The algorithm combines the simultaneous perturbation approximation with the Resilient Backpropagation (RPROP) [11] update rule. In RPROP, we assign a distinct step size to each weight. Informally, if the direction of the gradient changes, then the step size is decreased. Otherwise, the step size is increased. The weight update depends only on the sign of the gradient, and the step size determines how much the weight changes. In RSPSA, the perturbation size for each weight is connected to the step size, solving the above mentioned difficulty. The RPROP update rule is designed for batch update, and therefore, in our setting, we use minibatches to collect the gradients before an update.
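The per-weight step-size adaptation of RPROP can be sketched in isolation; this is an informal rendering of the rule described above (the η+ = 1.1 and η− = 0.85 defaults are the values quoted later for noisy functions):

```python
def rprop_step_size(delta, grad_now, grad_prev, eta_plus=1.1, eta_minus=0.85):
    """RPROP-style adaptation of one weight's step size: grow it while
    the gradient sign is stable, shrink it when the sign flips."""
    if grad_now * grad_prev > 0:
        return delta * eta_plus     # same direction: accelerate
    if grad_now * grad_prev < 0:
        return delta * eta_minus    # direction flipped: back off
    return delta                    # one of the gradients is zero
```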
6 OUR METHOD: RFDSA+
We designed our new method by observing the behavior of RSPSA for ranking combination. One of the strengths of the RPROP update rule is that it increases the update steps on a large plateau, and takes larger steps in the direction of the gradient. Ranking functions, as functions of a combination weight, consist of constant intervals. However, if the perturbation is sufficiently large, the averaged gradient estimate will be non-zero. If the step size for a weight is small in a flat area, then it should be increased in order to escape the flat area, but also in order to be able to estimate the right direction. In other words, the weight needs a sufficiently large perturbation to be able to influence the ranking function.

However, we observed that in RSPSA, the estimated direction changes often in the flat area, and the step size in fact decreases. To illustrate the problem, consider a ranking function that is completely flat in the direction of some but not all coordinates in the neighborhood of the current θ_t. In this case, the direction of the estimated gradient of the 'flat' coordinates becomes an unbiased Bernoulli variable: even if the ranking function is completely flat with respect to coordinate i, the numerator of g_ti will still be non-zero because of the non-flat coordinates. However, the numerator will be independent of the randomly chosen direction of Δ_ti, and hence g_ti will simply mirror the random variable Δ_ti.

To remedy the problem of flat regions, we switch from simultaneous perturbation to finite differences in order to identify when the ranking function is flat with respect to the weight in question. Note that by perturbing just one weight, we eliminate the noise coming from the perturbation of the other weights. If we detect a flat region, then we increase the step size.
The pseudocode of RFDSA+ is provided in Algorithm 1. The key differences to RSPSA are switching from simultaneous perturbation to finite differences (lines 7–8), and handling the flat regions (lines 22–23). The RPROP update is given by lines 11–28.

The algorithm has four parameters: the minibatch size B, the initial step size δ_0, and the step size adjustment variables η+ and η−. For noisy functions, typical values are η+ = 1.1 and η− = 0.85 [15]. The initial value of the step size has minimal influence, since it is quickly adjusted; it is set to δ_0 = 0.1. The size of the minibatch will be chosen as 1,000 in the experiments, the same as for SPSA and RSPSA. The length of an episode T, and the number of base rankers N, are determined by the problem.

The key variables of the RFDSA+ algorithm are the step sizes δ_i, corresponding to each weight θ_i. The auxiliary variables s_i store the previous weight update and are used for identifying a change in the direction of the partial derivatives. During a minibatch, the partial derivatives are collected in the variables g_i.

The RFDSA+ algorithm starts with an initialization phase in lines 1–4. After every user interaction, at time t, the partial derivatives are computed as follows. For each base ranker i, we perturb its weight by twice the corresponding step size (line 7). The coupling factor 2 is standard for RSPSA [15], but slightly different values can be used as well. We use a one-sided positive perturbation in the description of the algorithm. Using one-sided perturbation halves the number of evaluations needed. With one-sided perturbation, it would be more natural to choose the direction randomly, as a ±2δ_i valued Bernoulli random variable; the current description was chosen for brevity. The partial derivatives g_i are updated in line 8, using the finite difference estimator.

At the end of each minibatch, the weights θ_i and the step sizes δ_i are updated according to the RPROP rule [11] in lines 11–28, independently for each component i. The auxiliary variable h detects a change of direction in the partial derivative. If there is no change (lines 13–15), the step size is increased, and the weight θ_i will be updated in the direction of the derivative with the amount determined by the step size. If there is a change in the direction, then the step size is decreased, and the weight is left unchanged; the weight will be updated after the next minibatch (line 20). The key modification that deals with flat regions in the partial derivatives is shown in lines 22–23. Accordingly, the step size is increased if the partial derivative is 0 during the minibatch. Detecting the flat region is made possible by using finite difference estimation instead of simultaneous perturbation. The actual weight update is shown in line 25.
7 EXPERIMENTS
In this section, first we empirically investigate how well the combination algorithms perform for two base rankers, compared to the optimal (static) combination. Then we analyze how the combination algorithms scale when a larger number of base rankers are available.
7.1 Data sets
All data sets consist of a time-ordered sequence of user-item pairs. Only the first occurrence of a user-item pair is included. The task at a certain point of time is to rank the available items for the current user. After a top list is provided by a particular algorithm, a reward is obtained using NDCG@100 as the ranking measure (see Section 3). In our case, there is only one item with a non-zero label (the one from the current user-item pair). Following the evaluation step, the item is revealed to the base rankers and the combination algorithm, allowing them to update their models.

In these experiments, we use three data sets from the Amazon collection (CDs and Vinyl; Movies and TV; Electronics [20]), the 10M MovieLens data set¹, and a Twitter data set where the items are defined by the hashtags used in tweets.

¹ http://grouplens.org/datasets/movielens/
1  for i = 1 to N do
2      θ_i ← 1/N; g_i ← 0
3      s_i ← 0; δ_i ← δ_0
4  end
5  for t = 1 to T do
6      for i = 1 to N do
7          θ⁺ ← θ; θ⁺_i ← θ_i + 2δ_i
8          g_i ← g_i + (r_t(θ⁺) − r_t(θ)) / (2δ_i)
9      end
10     if t mod B = 0 then
11         for i = 1 to N do
12             h ← g_i s_i
13             if h > 0 then
14                 δ_i ← η⁺ δ_i
15                 s_i ← sign(g_i) δ_i
16             else if h < 0 then
17                 δ_i ← η⁻ δ_i
18                 s_i ← 0
19             else
20                 s_i ← sign(g_i) δ_i
21             end
22             if g_i = 0 then
23                 δ_i ← η⁺ δ_i
24             else
25                 θ_i ← θ_i + s_i
26             end
27             g_i ← 0
28         end
29     end
30 end
Algorithm 1: RFDSA+
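Algorithm 1 can be transcribed into a compact class. This is a sketch under the paper's pseudocode, not the authors' code; `reward_fn` is a placeholder for r_t(θ), and each `observe` call costs N + 1 reward evaluations.

```python
import numpy as np

class RFDSAPlus:
    """Sketch of Algorithm 1 (RFDSA+): one-sided finite-difference
    gradient estimates per weight, RPROP-style step sizes, and step-size
    growth on flat regions (zero accumulated gradient in a minibatch)."""

    def __init__(self, n, batch=1000, delta0=0.1, eta_p=1.1, eta_m=0.85):
        self.theta = np.full(n, 1.0 / n)   # lines 1-4: initialization
        self.g = np.zeros(n)               # accumulated partial derivatives
        self.s = np.zeros(n)               # previous weight updates
        self.delta = np.full(n, delta0)    # per-weight step sizes
        self.batch, self.eta_p, self.eta_m = batch, eta_p, eta_m
        self.t = 0

    def observe(self, reward_fn):
        """One round: finite differences for every weight (lines 6-9)."""
        r0 = reward_fn(self.theta)
        for i in range(len(self.theta)):
            th = self.theta.copy()
            th[i] += 2.0 * self.delta[i]                       # line 7
            self.g[i] += (reward_fn(th) - r0) / (2.0 * self.delta[i])
        self.t += 1
        if self.t % self.batch == 0:                           # line 10
            self._rprop_update()

    def _rprop_update(self):
        """Lines 11-28: RPROP update with flat-region handling."""
        for i in range(len(self.theta)):
            h = self.g[i] * self.s[i]
            if h > 0:                      # stable direction: accelerate
                self.delta[i] *= self.eta_p
                self.s[i] = np.sign(self.g[i]) * self.delta[i]
            elif h < 0:                    # direction flipped: back off
                self.delta[i] *= self.eta_m
                self.s[i] = 0.0
            else:
                self.s[i] = np.sign(self.g[i]) * self.delta[i]
            if self.g[i] == 0.0:           # lines 22-23: flat region
                self.delta[i] *= self.eta_p
            else:                          # line 25: weight update
                self.theta[i] += self.s[i]
            self.g[i] = 0.0
```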
7.2 Base rankers
We rely on two basic classes of collaborative filtering models: item based nearest neighbor (item2item) [26] and matrix factorization [1]. These two classes of methods represent the most successful and most popular collaborative filtering algorithms² [17, 24]. In addition to the two techniques, we also include temporal popularity (denoted Pop), which records how many times an item was visited in the preceding time window.

For item2item, we use a time-decayed item-to-item similarity function, the model being updated every day. When computing the score for an item, we consider the similarity to all items previously visited by the user. Thus, this algorithm also incorporates the recent history.

We include four matrix factorization variants: online matrix factorization (OMF) [23], online asymmetric matrix factorization (OAMF) [16], batch matrix factorization (MF), and (batch) implicit alternating least squares (iALS) [10]. All variants use latent factors with ten dimensions. The online variants update once after

² For particular data sets, there may be superior algorithms, especially in batch settings. The two main base rankers considered are representatives of two main approaches to collaborative filtering, and have natural incremental versions. None of the combination algorithms exploit the particular base rankers, thus replacing the base rankers is straightforward.
Figure 1: Reward with various combination coefficients (θ) for the combination of OMF and item2item on the Amazon-CD set. In the figure, θ denotes the normalized weight of the OMF base ranker; the normalized weight for item2item is 1 − θ. R_T(0) = 0.03434.
every user-item pair. The batch variants retrain their models after every 100,000 time steps, using a required number of iterations.

We use stochastic gradient descent for OMF, OAMF and MF, with the current item from the data set designated as the positive item, and additional negative items sampled randomly [23].

The parameters of the base rankers are optimized for each data set. In the combination, the scores of the base rankers are normalized by their standard deviation.
7.3 Combination of two models
We show our results for combining the two base models OMF and item2item. We let θ denote the weight of OMF in the convex combination. The average cumulative reward, depending on θ, is shown for the Amazon-CD data set in Figure 1. Interestingly, the optimum is reached for a combination that puts heavy weight on item2item, even though OMF alone performs better than item2item.

The average cumulative reward of the combination algorithms is shown in Figure 2. The peculiar shape in the first three years is due to the low amount of data collected and the more significant changes in the data distribution. We observe that the relative order of the base algorithms changes over time: at first OMF is better, then item2item, and then OMF again. This shows that selecting an algorithm on partial data, and using only that algorithm later, is a poor choice. ExpA follows the better base algorithm, being slightly worse than that due to exploration. ExpW³ achieves a performance that equals the best static convex combination (cf. Figure 1). ExpAW is on par with ExpW in the beginning, but its performance deteriorates later. This is natural, since it chooses a larger weight for OMF due to the superior performance of OMF, despite the fact that the actual optimum is to assign a large weight to item2item, as seen in Figure 1. SGD has similar performance to ExpW, also giving a larger weight to OMF. This is possibly because SGD and OMF optimize the same surrogate loss function.

³ For ExpW, the set of points P consisted of a uniform grid with 100 points.
Figure 2: Average cumulative NDCG of the ranking algorithms (item2item, OMF, ExpA, ExpAW, ExpW, SGD, SPSA, RSPSA, RFDSA+) on the Amazon-CD set.
Figure 3: The weight assignment of the ranking algorithms on the Amazon-CD set. OptG100+ corresponds to the optimal weight assignment over 100 uniform grid points, with a few additional points chosen near the presumed optimum. In the figure, θ denotes the normalized weight of the OMF base ranker; the normalized weight for item2item is 1 − θ.
The weight assignment of the combination algorithms is shown in Figure 3. The figure additionally includes an optimal static weight assignment, i.e. θ_t = argmax_{θ∈P} R_t(θ). By analyzing the weight assignment of the three combination algorithms that optimize NDCG directly (SPSA, RSPSA and RFDSA+), we observe that all give item2item a large weight, although the weights for SPSA are further away from the optimum. Consequently, we notice in Figure 2 that the three algorithms perform well, with RSPSA and RFDSA+ matching the optimal performance of ExpW.
7.4 Scaling
We analyze the scaling of the combination algorithms in two ways: (1) by including an increasing number of OMF base rankers (differing only in the random initialization) next to item2item, and (2) by including all six base rankers in the mix.
Table 1: Combination of six base rankers on five data sets. The average NDCG of the base rankers is shown at the top of the table, and the average NDCG of the combination algorithms at the bottom.

Algorithm   Amazon-CD  Amazon-Movies  Amazon-Electro  MovieLens  Twitter
item2item   0.0343     0.0350         0.0156          0.1445     0.0221
OMF         0.0389     0.0440         0.0222          0.1357     0.3528
Pop         0.0628     0.0663         0.0347          0.0857     0.3486
OAMF        0.0318     0.0320         0.0160          0.1717     0.3118
MF          0.0052     0.0086         0.0056          0.0051     0.0055
iALS        0.0046     0.0075         0.0060          0.0053     0.0054
ExpA        0.0628     0.0663         0.0347          0.1717     0.3486
ExpAW       0.0628     0.0664         0.0347          0.1717     0.3486
SGD         0.0640     0.0674         0.0353          0.1568     0.3563
SPSA        0.0696     0.0692         0.0349          0.1678     0.3683
RSPSA       0.0640     0.0670         0.0396          0.1435     0.4468
RFDSA+      0.0880     0.0882         0.0452          0.1879     0.4601
Figure 4: Average NDCG of the ranking algorithms on the Amazon-CD set with a varying number of OMFs. The combination includes one item2item and one to ten OMF base rankers. In the case of xOMF there is only one OMF, but the dimension of the latent factors is increased from 10 to the range of 10–100.
In the first case, assuming that the various OMF models achieve similar performance, one expects the optimal weight for item2item to stay relatively the same, with the weight of the single OMF from the previous section divided among the multiple instances. The difficulty here is that the proper weight assignment (for item2item) needs to be found in a space with larger dimensionality. For larger dimensions, placing grid points that cover the parameter space sufficiently would require an exponential number of evaluations, thus we do not include ExpW in this experiment. The performance of the other combination algorithms is shown in Figure 4.

We observe that the ranking performance of RFDSA+ does not drop as the number of OMFs increases. It is even able to use the slight variation in the OMFs to increase the performance slightly. The performance of SPSA and RSPSA deteriorates significantly as more OMFs are included in the mix. ExpA, ExpAW and SGD all cope well with the increased dimension, but their performance is much weaker overall than that of RFDSA+. The relative invariance of ExpA underlines that the individual OMF rankers achieve similar performance (we checked that the variance of their NDCG scores is indeed very small). We added two further baselines to the figure: xOMF, a variant of OMF with increased latent vector dimension, and item2item+xOMF, a combination using RFDSA+. We observe that the individual performance of an online factor model increases with the dimension of the latent vectors. However, in combination with item2item, it is better to use many smaller models than one big one, assuming that they are combined with an algorithm such as RFDSA+ that scales well. Results on the other data sets are similar and omitted due to space limitations.

Next, we show the performance of the combination algorithms when all six base rankers are used in Table 1. First, we notice that the individual performance of the batch base rankers (MF and iALS) is poor for all data sets. The performance of the other base rankers varies, depending on the data set. Regarding the performance of the combination algorithms, we can draw a somewhat similar conclusion as for Figure 4: RFDSA+ has significantly better performance for all data sets compared to the other combination algorithms. We also note that the improvement in performance over the best individual base ranker is considerable for all data sets. ExpA achieves approximately the performance of the best individual ranker. ExpAW and SGD cope reasonably well with more base rankers, but their performance does not exceed by much the performance of the best base ranker. SPSA and RSPSA (which were performing well for two base rankers) do not perform particularly well when a larger number of models are included in the mix.
8 CONCLUSIONS
In this paper, we have considered the task of learning the online convex combination of base recommender algorithms by stochastic optimization. For the case of two base rankers, we have shown that the class of exponentially weighted algorithms attains close to optimal performance. However, this algorithm cannot be applied in real applications with a larger number of base rankers because of the exponential number of evaluations needed. To remedy the scaling problem, we have proposed a new algorithm, RFDSA+. The algorithm uses finite differences to estimate the gradient of the ranking reward, and the RPROP update rule to adjust the combination weights. The update rule was modified in order to deal with flat regions that often appear in ranking functions. The new algorithm is shown empirically to perform close to the optimum for two base rankers, and to scale well as the number of models is increased, whether with homogeneous base rankers or varied ones. We observed that by applying the RFDSA+ combination algorithm, a considerable improvement in ranking performance can be obtained over the base rankers.
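The core update of RFDSA+ can be sketched as follows. This is an illustrative simplification: the renormalization onto the simplex and the handling of flat regions (zero finite differences) are cruder here than the modified RPROP rule of the paper, and `reward` stands for any (noisy) ranking reward, such as NDCG on the next batch of events.

```python
import numpy as np

def rfdsa_step(reward, w, delta, step, prev_sign,
               eta_plus=1.2, eta_minus=0.5):
    """One illustrative finite-difference / RPROP-style ascent step.

    reward: callable mapping a weight vector to a (noisy) ranking reward.
    w: current combination weights; delta: finite-difference perturbation.
    step, prev_sign: per-coordinate RPROP state; updated values returned.
    """
    grad_sign = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = delta
        # Two-sided finite-difference estimate of the partial derivative.
        grad_sign[i] = np.sign(reward(w + e) - reward(w - e))
    # RPROP: grow the step where the sign is stable, shrink where it flips.
    step = np.where(grad_sign * prev_sign > 0, step * eta_plus,
                    np.where(grad_sign * prev_sign < 0, step * eta_minus,
                             step))
    w = w + grad_sign * step            # ascend the estimated reward
    w = np.clip(w, 0.0, None)           # keep weights nonnegative
    w = w / w.sum() if w.sum() > 0 else np.full_like(w, 1.0 / len(w))
    return w, step, grad_sign
```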
REFERENCES
[1] Jacob Abernethy, Kevin Canini, John Langford, and Alex Simma. 2007. Online collaborative filtering. University of California at Berkeley, Tech. Rep. (2007).
[2] Alekh Agarwal, Ofer Dekel, and Lin Xiao. 2010. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback. In COLT. Citeseer, 28–40.
[3] Marie Al-Ghossein, Pierre-Alexandre Murena, Talel Abdessalem, Anthony Barré, and Antoine Cornuéjols. 2018. Adaptive collaborative topic modeling for online recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 338–346.
[4] James Bennett, Stan Lanning, et al. 2007. The Netflix prize. In Proceedings of KDD Cup and Workshop, Vol. 2007. New York, NY, USA, 35.
[5] Robin Burke. 2010. Evaluating the dynamic properties of recommendation algorithms. In Proceedings of the Fourth ACM Conference on Recommender Systems. ACM, 225–228.
[6] Róbert Busa-Fekete, Balázs Kégl, Tamás Éltető, and György Szarvas. 2011. Ranking by calibrated AdaBoost. In Proceedings of the Learning to Rank Challenge. 37–48.
[7] Nicolo Cesa-Bianchi and Gábor Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press.
[8] Joao Gama, Pedro Medas, Gladys Castillo, and Pedro Rodrigues. 2004. Learning with drift detection. In Brazilian Symposium on Artificial Intelligence. Springer, 286–295.
[9] João Gama, Raquel Sebastião, and Pedro Pereira Rodrigues. 2009. Issues in evaluation of stream learning algorithms. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 329–338.
[10] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In ICDM, Vol. 8. Citeseer, 263–272.
[11] Christian Igel and Michael Hüsken. 2000. Improving the Rprop Learning Algorithm. In Proceedings of the Second International ICSC Symposium on Neural Computation (NC 2000), H. Bothe and R. Rojas (Eds.). ICSC Academic Press, 115–121. citeseer.ist.psu.edu/igel00improving.html
[12] Kalervo Järvelin and Jaana Kekäläinen. 2000. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 41–48.
[13] Michael Jugovac, Dietmar Jannach, and Mozhgan Karimi. 2018. Streamingrec: a framework for benchmarking stream-based news recommenders. In Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 269–273.
[14] Robert D Kleinberg. 2005. Nearly tight bounds for the continuum-armed bandit problem. In Advances in Neural Information Processing Systems. 697–704.
[15] Levente Kocsis and Csaba Szepesvári. 2006. Universal parameter optimisation in games based on SPSA. Machine Learning 63, 3 (2006), 249–286.
[16] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 426–434.
[17] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009).
[18] Neal Lathia, Stephen Hailes, and Licia Capra. 2009. Temporal collaborative filtering with adaptive neighbourhoods. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 796–797.
[19] Odalric-Ambrym Maillard and Rémi Munos. 2010. Online learning in adversarial Lipschitz environments. Machine Learning and Knowledge Discovery in Databases (2010), 305–320.
[20] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. 2015. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 43–52.
[21] Róbert Pálovics and András A Benczúr. 2015. Temporal influence over the Last.fm social network. Social Network Analysis and Mining 5, 1 (2015), 4.
[22] Róbert Pálovics, András A Benczúr, Levente Kocsis, Tamás Kiss, and Erzsébet Frigó. 2014. Exploiting temporal influence in online recommendation. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 273–280.
[23] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-class collaborative filtering. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE, 502–511.
[24] I Pilászy, A Serény, G Dózsa, B Hidasi, A Sári, and J Gub. 2015. Neighbor methods vs. matrix factorization: case studies of real-life recommendations. In LSRS Workshop at ACM RecSys.
[25] Filip Radlinski, Robert Kleinberg, and Thorsten Joachims. 2008. Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th International Conference on Machine Learning. ACM, 784–791.
[26] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web. ACM, 285–295.
[27] J. C. Spall. 1992. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Automat. Control 37 (1992), 332–341.
[28] Andreas Töscher, Michael Jahrer, and Robert M Bell. 2009. The BigChaos solution to the Netflix grand prize. Netflix Prize Documentation (2009), 1–52.
[29] João Vinagre, Alípio Mário Jorge, and João Gama. 2014. Evaluation of recommender systems in streaming environments. In Workshop on 'Recommender Systems Evaluation: Dimensions and Design' (REDD 2014), held in conjunction with RecSys 2014.
[30] Yisong Yue and Thorsten Joachims. 2009. Interactively optimizing information retrieval systems as a dueling bandits problem. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 1201–1208.
[31] Daniel Zoller, Stephan Doerfel, Christian Pölitz, and Andreas Hotho. 2017. Leveraging User-Interactions for Time-Aware Tag Recommendations. In RecTemp@RecSys. 9–15.