Online Ranking Combination
Erzsébet Frigó
frigo.erzsebet@sztaki.hu
Institute for Computer Science and Control (MTA SZTAKI), Budapest, Hungary
ABSTRACT
As a task of high importance for recommender systems, we consider the problem of learning the convex combination of ranking algorithms by online machine learning. In the case of two base rankers, we show that the exponentially weighted combination achieves near optimal performance. However, the number of required points to be evaluated may be prohibitive with more base models in a real application. We propose a gradient based stochastic optimization algorithm that uses finite differences. Our new algorithm achieves similar empirical performance for two base rankers, while scaling well with an increased number of models. In our experiments with five real-world recommendation data sets, we show that the combination offers significant improvement over previously known stochastic optimization techniques. Our algorithm is the first effective stochastic optimization method for combining ranked recommendation lists by online machine learning.
CCS CONCEPTS
• Information systems → Collaborative filtering; • Theory of computation → Online learning algorithms.

KEYWORDS
ranking; combination; RFDSA

ACM Reference Format:
Erzsébet Frigó and Levente Kocsis. 2019. Online Ranking Combination. In Thirteenth ACM Conference on Recommender Systems (RecSys '19), September 16–20, 2019, Copenhagen, Denmark. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3298689.3346993
Levente Kocsis
kocsis@sztaki.hu
Institute for Computer Science and Control (MTA SZTAKI), Budapest, Hungary

RecSys '19, September 16–20, 2019, Copenhagen, Denmark
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6243-6/19/09. https://doi.org/10.1145/3298689.3346993

1 INTRODUCTION
A milestone in the research of recommendation algorithms, the Netflix Prize Competition [4] had high impact on research directions. The target of the contest was based on the one to five star ratings given by users, with one part of the data used for model training and the other for evaluation. As an impact of the competition, tasks now termed batch rating prediction were dominating research results. However, real systems differ not just in that the user feedback is implicit, but also in that they process data streams where users request one or a few items at a time and get exposed to new information that may change their needs and taste when they return to the service next time. Furthermore, an online trained model may change and return completely different lists for the same user even for interactions very close in time.
The difficulty of evaluating streaming recommenders was first mentioned in [18], although the authors evaluated models by an offline training and testing split. Ideas for online evaluation metrics first appeared in [21, 22, 29]. In online or prequential evaluation [9], which has grown in popularity, the ranking measure is computed from a sequence of examples. For each example in the sequence, the recommender system provides a top-k list of items to the active user. The list is evaluated against, typically, a single relevant item that the user interacted with. Then, the user-item interaction is added to the previously available data, and the recommender system is able to update its model.

Recommender systems often rely on an ensemble of base ranking algorithms. For instance, in the Netflix prize competition, considerable effort went into choosing the algorithms for the blend and combining them [28]. In an online scenario, the environment for a combination algorithm is non-stationary: not only the user preferences and item popularities, but also the base ranking models change in time. Therefore, the combination of the base algorithms also needs to be updated. While it is infeasible to update the parameters of the combination with the computationally intensive blending approaches used in batch settings, a convex combination of the base models often leads to satisfactory results. In summary, we consider online convex combination algorithms for (implicit feedback) recommenders under prequential evaluation.
From the machine learning point of view, the main difficulty of combining ranked recommendation lists is that the typical ranking measures, such as NDCG [12], are not continuous, making their optimization a difficult task. In this paper, we compare and identify the numerical issues of two strategies to optimize for non-continuous rewards. The first approach uses exponentially weighted forecasters, which explore the weight space globally and do not rely on the existence of a gradient of the reward function. The second class of methods uses gradient descent to maximize the reward.

Exponentially weighted algorithms (EWA) [7] optimize ranking combination weights by exploring the weight space globally. EWA was shown to be close to optimal for Lipschitz-continuous environments [19]. We will show that EWA is able to optimize ranking combinations as well, under a certain assumption (see Proposition 4.1). However, the number of combinations that needs to be evaluated to fulfill the assumption grows exponentially with the number of base rankers. Therefore, it is not practical in a real application if more base rankers are employed.
To be able to handle a larger number of base rankers, we turn our attention to the second approach, local optimization by gradient based methods. In particular, we start with the Resilient Simultaneous Perturbation Stochastic Approximation (RSPSA) algorithm [15], which was used for optimizing model parameters in games. While RSPSA was shown to cope with non-continuous rewards, it is non-trivial whether it can cope with ranking functions as well. Indeed, we observe empirically that RSPSA does not scale well for ranking prediction. The reason for this is that ranking functions have many flat regions with respect to individual combination weights.

Our method, Resilient Finite Difference Stochastic Approximation (RFDSA+), is the first effective stochastic optimization method for combining ranked lists. To improve the scalability properties of RSPSA, we switch from simultaneous perturbation to finite differences to identify flat regions with respect to a given weight. In this way, we eliminate the noise of the perturbation of the other weights and always concentrate on optimizing a single weight at a time. We show empirically that RFDSA+ achieves near optimal performance when two base rankers are combined, and scales well with the number of base rankers.

The article is organized as follows: after discussing the related research in Section 2, we formalize our framework in Section 3. Exponentially weighted algorithms are discussed in Section 4, where in Proposition 4.1 we show the theoretical guarantee of EWA. Gradient based algorithms are discussed in Section 5. Our proposed algorithm, RFDSA+, is described in Section 6. Empirical evaluation highlighting the strength of RFDSA+ is provided in Section 7. Some conclusions and a discussion of future research close the paper in Section 8.
2 RELATED RESEARCH
Research on incremental recommender algorithms in the prequential evaluation scenario has gained popularity in recent years. There are several papers that use prequential evaluation [3, 5, 13, 21, 22, 31], however, only [22] considers the issue of combining multiple base rankers. The latter will be discussed in more detail in Section 5.1, and evaluated empirically in Section 7.

Ranking combination received considerable attention during the Netflix prize competition, when the approach of [28] was essential for the winning entry. In the batch setting, one of the later approaches that can be adapted naturally to an online scenario is [6]. The authors use an exponentially weighted forecaster, and use the cumulative loss of each base algorithm to compute its score in the convex combination. One can notice that any arbitrary linear shift of the scores of a base algorithm would leave its cumulative loss unchanged, but it would affect the base algorithm's contribution to the mix. Therefore, the algorithm seems somewhat less sound; nevertheless, it may still perform reasonably well on some practical instances. We will describe the algorithm more formally in Section 4.3, and evaluate it (for implicit feedback problems) empirically in Section 7.

In the online setting, ranking combination was proposed by [25, 30] using dueling bandits. Their approach assumes that the loss functions are convex and stationary. Neither assumption seems reasonable for most ranking measures in a real application. There are several algorithms in the literature of online learning that can be considered for combining ranking models. [2] considered a two-point approximation of the gradient for convex functions. The ranking measures are not convex; nevertheless, the algorithm is similar to SPSA [27], which has been applied to optimizing non-convex functions as well. We will discuss the algorithm in Section 5.2.
The exponentially weighted algorithm was applied to optimize (non-convex) Lipschitz-continuous functions [19] and it has O(√T) guarantees in the full-information setting, where T is the length of the episode. The full information setting would imply, however, evaluating a prohibitively large number of points when the number of base rankers is slightly larger. There are bandit variants as well [14] that evaluate only one point per iteration; however, they scale badly with the dimension: the regret bound for continuum-armed bandits is O(T^((N+1)/(N+2))) [14], where N is the dimensionality of the problem. The exponentially weighted algorithms will be considered in Section 4.

Finally, there are a large number of stochastic approximation algorithms that can, in principle, be applied to online ranking combination. Unfortunately, none of them is straightforward to use for ranking functions such as NDCG that are not continuous, and even a smoothed cumulative ranking reward function can be non-convex as well. In most games, the reward is also non-continuous, for example, 1/0 for win/loss, or a discrete number of points (or money) that can be won in a card game. The algorithm RSPSA [15] was proposed for (offline) optimization of some parameters of a poker playing program. Our proposed algorithm, RFDSA+, builds on the idea of RSPSA but considers one weight at a time, to remedy the problems of past algorithms in handling flat areas in ranking functions.
3 PROBLEM SETUP
We consider the online combination of the ranked lists of multiple base recommender algorithms. As soon as the base algorithms give a prediction, we have to apply and potentially re-learn the combination weights on the fly. In contrast to typical batch learning tasks, where we can, for example, perform grid search by using a large amount of past training data, in our task we process the recommendation requests and the feedback as a sequence in time, which is closer to a real recommender system operation. Compared to batch learning, the advantage of online methods is that they can adapt faster to concept drifts [8] that can rearrange the relative strength of the different base models.

Both batch and prequential evaluation rely on a set of recorded user-item interactions. For batch evaluation, one splits the data into a training and a test set, trains the algorithms on the former and tests on the latter. Conversely, for prequential evaluation, we test algorithms sequentially on each data point, and potentially use all preceding data points for training. Since often the user selects a single item only, we will consider implicit feedback evaluation metrics with only one relevant item, but the evaluation can easily be generalized to the case when the user makes multiple choices or when the feedback is explicit. Prequential evaluation is closer to a real application, since in practice, user interaction occurs sequentially. Algorithms can also exploit the most recent data. It is true for both evaluation methods that the recommendation is made before revealing the choice of the user, that is, a given user-item interaction is processed independently of what the system recommends.

Given a chronologically ordered data set with T records, prequential evaluation is an episode with T rounds. In each round t, we take the following steps.
(1) We observe the next user-item pair from the data set, and set the active user accordingly.
(2) We query the recommender system for a top-K recommendation for the active user.
(3) We evaluate the output recommendation list against the single relevant item j_t that the user interacted with.
(4) Finally, we reveal the relevant item j_t to the recommender system, and allow it to update the model using the additional user-item pair.
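The four steps above can be sketched as a simple evaluation loop. This is a minimal illustration, not the paper's implementation; the `recommend`, `reward`, and `update` hooks are hypothetical placeholders for the recommender system, the ranking measure, and the model update.

```python
def prequential_evaluation(events, recommend, reward, update, k=100):
    """Run prequential evaluation over a chronological list of
    (user, item) events and return the cumulative reward."""
    cumulative = 0.0
    for user, item in events:                 # step 1: observe the next pair
        toplist = recommend(user, k)          # step 2: query a top-K list
        cumulative += reward(toplist, item)   # step 3: evaluate against j_t
        update(user, item)                    # step 4: reveal and retrain
    return cumulative
```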
In the context of convex combination algorithms, we consider N base ranking algorithms, and the i-th base algorithm is denoted by A_i. In each round t = 1, ..., T, first, each base algorithm A_i assigns a score x_tij to each item j. After that, the convex combination algorithm assigns the weight θ_ti to each algorithm A_i. The weights form an N-dimensional vector θ_t = (θ_t1, ..., θ_tN). The parameter space is θ_t ∈ Θ = R^N_{0+}. The combined score of item j in round t is

    x_tj = Σ_{i=1}^{N} θ_ti x_tij.

The top lists are generated by sorting the items by the combined scores in descending order.
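The combined scoring and sorting step can be sketched as follows; this is an illustrative helper under the paper's definitions, with made-up array shapes (`base_scores` holding one row of item scores per base ranker).

```python
import numpy as np

def combined_toplist(theta, base_scores, k):
    """theta: (N,) combination weights; base_scores: (N, M) matrix of
    per-ranker item scores. Returns the indices of the top-k items by
    the convex combination x_j = sum_i theta_i * x_ij."""
    x = theta @ base_scores       # combined score for each of the M items
    order = np.argsort(-x)        # sort items by score, descending
    return order[:k]
```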
After the active user's preferred item is revealed, the combination algorithm collects the reward r_t, which depends on the top list generated and on the user's choice. With an abuse of notation, we will denote the reward of a base ranker A_i by r_ti, the reward corresponding to a weight assignment θ by r_t(θ), and the reward obtained by a combination algorithm C by r_t(C). The cumulative reward collected up to round t is

    R_t = Σ_{τ=1}^{t} r_τ.

We let R_ti, R_t(θ), and R_t(C) denote the cumulative reward corresponding to a base ranker, a weight vector, and a combination algorithm.
There are several choices of ranking measures. A popular choice, which we use in our experiments, is NDCG@K [12]. In prequential evaluation, we assume the worst scenario, that there is only one item with a non-zero label in each round t, namely j_t. The NDCG@K of a permutation π_t of the items reduces to

    r_t = NDCG@K(π_t) = 1 / log_2(rank_{π_t}(j_t) + 1)   if rank_{π_t}(j_t) ≤ K,
    r_t = 0                                              otherwise,

as there is always exactly one relevant item and hence the ideal DCG is equal to one.
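The single-relevant-item NDCG@K above is straightforward to compute; a minimal sketch:

```python
import math

def ndcg_at_k(toplist, relevant_item, k=100):
    """NDCG@K with a single relevant item: 1/log2(rank + 1) if the
    item appears within the top k positions, else 0 (the ideal DCG
    is 1, so no further normalization is needed)."""
    try:
        rank = toplist.index(relevant_item) + 1   # 1-based rank
    except ValueError:
        return 0.0                                # item not in the list
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0
```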
4 EXPONENTIALLY WEIGHTED BASELINE ALGORITHMS

The first set of baseline rank combination algorithms in this section rely on the exponentially weighted forecaster [7]. They explore the weight space globally, without relying on the existence of a gradient of the reward function. In Proposition 4.1, we will also show that in the case of two base recommenders, the exponentially weighted combination achieves near optimal performance.
4.1 ExpA

The simplest choice to deal with multiple base rankers is to use the exponentially weighted forecaster on the rankers. Accordingly, the combination algorithm, denoted by ExpA, selects base ranker A_i in round t with probability

    p_ti = exp(η_t Σ_{τ=1}^{t−1} r_τi) / Σ_{j=1}^{N} exp(η_t Σ_{τ=1}^{t−1} r_τj).   (1)

Selecting base ranker A_i in round t means setting θ_ti = 1 and θ_tj = 0 for j ≠ i. The algorithm is guaranteed to achieve a cumulative reward that is not worse than the cumulative reward of the best base ranker by an additive O(√T) term in expectation [7].
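Equation (1) is a softmax over cumulative rewards; a minimal sketch of the probability computation (written in the reward-maximizing form, with the usual max-subtraction for numerical stability):

```python
import numpy as np

def expa_probabilities(cum_rewards, eta):
    """ExpA selection probabilities: p_i proportional to
    exp(eta * cumulative reward of base ranker i)."""
    z = eta * np.asarray(cum_rewards, dtype=float)
    z -= z.max()              # shift-invariant, avoids overflow
    w = np.exp(z)
    return w / w.sum()
```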
4.2 ExpW

While ExpA can locate the best base ranker, we can hope that a convex combination of the rankers can achieve a better performance than any single ranker. We choose a finite set of points P ⊂ Θ and apply the exponentially weighted forecaster to P to choose the weight combinations θ_t ∈ P to play. If an appropriately large number of points are chosen, and the cumulative reward function R_T(θ) (as a function of θ) is sufficiently smooth, then the algorithm, denoted by ExpW, will achieve a cumulative reward that is close to that of the optimal convex combination. The following proposition formalizes this statement.

Proposition 4.1. Let P ⊂ Θ be a finite set such that

    E[ max_{θ∈Θ} R_T(θ) − max_{p∈P} R_T(p) ] ≤ √T.   (2)

Then the regret of the exponentially weighted forecaster applied on P is bounded by

    E[ max_{θ∈Θ} R_T(θ) − R_T(ExpW) ] ≤ Õ(√T).   (3)

The proof follows by putting together the regret bound of the exponentially weighted forecaster and inequality (2).

For a sufficiently large T, the function R_T(θ) is fairly smooth in practice, as observed in Section 7.3. For two base rankers, the parameter space can be represented by a one dimensional simplex, i.e. a section. Then, a uniform grid with O(√T) grid points can be sufficient, if the cumulative function acts like a Lipschitz function on the grid points. This latter condition will often be true (see also Figure 1). However, with more base rankers, the number of points required for a Lipschitz-like cumulative function is Ω(T^((N−1)/2)). Since ExpW needs to evaluate the reward in each point, the number of evaluations scales exponentially with the number of base rankers.
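For two base rankers, ExpW reduces to an exponentially weighted forecaster over a uniform grid on the one-dimensional simplex. A minimal sketch (the seeded generator and the 100-point grid are illustrative choices, the latter matching the grid used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: any seeded generator works

def expw_round(grid_cum_rewards, eta):
    """Sample one grid-point index with probability proportional to
    exp(eta * cumulative reward of that grid point)."""
    z = eta * np.asarray(grid_cum_rewards, dtype=float)
    z -= z.max()                  # stabilize the softmax
    p = np.exp(z)
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

# For two base rankers, P is a uniform grid on the 1-D simplex:
grid = [(theta, 1.0 - theta) for theta in np.linspace(0.0, 1.0, 100)]
```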
4.3 ExpAW

In [6], the authors proposed an algorithm that can be regarded as a mix of ExpA and ExpW. The algorithm, denoted here by ExpAW, relies on the cumulative performance of the base rankers (as ExpA), but uses it as the weight of the base ranker, instead of using it as a selection probability. The weight of base ranker A_i in round t is

    θ_ti = exp(η_t Σ_{τ=1}^{t−1} r_τi) / Σ_{j=1}^{N} exp(η_t Σ_{τ=1}^{t−1} r_τj).   (4)

It is easy to see that the reward of a base ranker does not change if its scores are scaled by some factor. However, the scaling will affect the reward of the combination algorithm in an arbitrary way. Nevertheless, with a reasonable normalization, the algorithm may still lead to a decent performance, and it is less likely to be affected by an increase in the number of base rankers.
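The scale-sensitivity criticism can be demonstrated in a few lines. The scores below are made up for illustration: rescaling one ranker's scores leaves its own ranking (and thus its own reward) unchanged, yet changes which item the fixed-weight combination puts on top.

```python
import numpy as np

def softmax_weights(cum_rewards, eta):
    """ExpAW turns cumulative rewards into combination weights theta
    (the same softmax as ExpA, but used as weights, not probabilities)."""
    z = eta * np.asarray(cum_rewards, dtype=float)
    z -= z.max()
    w = np.exp(z)
    return w / w.sum()

theta = softmax_weights([1.0, 1.0], eta=1.0)   # equal cumulative rewards
scores = np.array([[10.0, 20.0],               # ranker 0: scores of 2 items
                   [3.0, 1.0]])                # ranker 1
scaled = scores * np.array([[0.1], [1.0]])     # shrink ranker 0's scores
top_before = int(np.argmax(theta @ scores))    # combination picks item 1
top_after = int(np.argmax(theta @ scaled))     # now picks item 0
```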
5 GRADIENT BASELINE ALGORITHMS

In this section, we present the second set of baseline rank combination methods, which are guided by the gradient of the reward function. In the first subsection, we describe an algorithm (SGD) that computes the gradient of a surrogate to the reward. The next two algorithms (SPSA and RSPSA) are stochastic approximation algorithms that approximate the gradient by finite differences. We mention that our new algorithm RFDSA+ builds on RSPSA, extending it to deal with the difficulties of flat regions in the ranking functions.
5.1 SGD
We call our first combination algorithm SGD, since it uses stochastic gradient descent for the mean squared error (MSE) as a surrogate to the reward. The target for the current item is set to 1, and the targets for a set of randomly sampled negative items are set to 0. After seeing a user-item pair, a stochastic gradient step is taken to minimize the MSE between the ranking score (x_tj) and the target value. The algorithm was used by [23] for matrix factorization and by [22] for online combination.

We do not expect the algorithm to have difficulty with a large number of base rankers. However, minimizing the surrogate loss may not result in a sufficiently good optimization of the original reward function.
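One round of the surrogate update can be sketched as follows. This is an illustrative implementation of the described idea, not the code of [22]; the learning rate and the squared-error gradient form are assumptions.

```python
import numpy as np

def sgd_combination_step(theta, x_pos, x_negs, lr=0.01):
    """One SGD step on the MSE surrogate: target 1 for the consumed
    item, 0 for the sampled negatives. x_pos: (N,) base-ranker scores
    of the positive item; x_negs: (num_neg, N) scores of negatives."""
    grad = 2.0 * (theta @ x_pos - 1.0) * x_pos        # positive item term
    for x in x_negs:
        grad += 2.0 * (theta @ x - 0.0) * x           # negative item terms
    return theta - lr * grad
```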
5.2 SPSA
The gradient of most ranking functions with respect to the combination weights is typically zero in most points where it exists. However, if we average over more time steps, it starts to 'smooth out'. It still cannot be computed in closed form, but it can be approximated by finite differences. For online optimization of convex functions, [2] suggested that the gradient be approximated by simultaneous perturbation, with an online gradient step taken in the approximated direction. For non-convex optimization, a similar algorithm is known as Simultaneous Perturbation Stochastic Approximation (SPSA) [27]. The approximated gradient g_ti is given by

    g_ti = (r_t(θ_t + c_t Δ_t) − r_t(θ_t − c_t Δ_t)) / (2 c_t Δ_ti),

where c_t is an appropriately decreasing sequence, Δ_t = (Δ_t1, ..., Δ_tN), and the Δ_ti are ±1 valued unbiased Bernoulli random variables.
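The simultaneous perturbation estimate costs only two reward evaluations regardless of N; a minimal sketch (the seeded generator is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)  # assumption: any seeded generator works

def spsa_gradient(reward, theta, c):
    """Two-sided SPSA estimate of the gradient of `reward` at `theta`
    from a single pair of evaluations with one random +/-1 perturbation
    vector shared by all coordinates."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    r_plus = reward(theta + c * delta)
    r_minus = reward(theta - c * delta)
    return (r_plus - r_minus) / (2.0 * c * delta)
```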
5.3 RSPSA
In SPSA, especially with non-smooth functions, the difficulty lies in choosing the appropriate perturbation. The sum of ranking reward functions is a step function. If the perturbation size is too small, we might get stuck on a plateau and cannot find the right direction. If the perturbation is too large, we miss local optima. The appropriate perturbation step size might differ depending on the coordinate and time.

The RSPSA algorithm was proposed in [15] for games, which also have a discrete reward (e.g., 1 for win, 0 for loss). The algorithm combines the simultaneous perturbation approximation with the Resilient Backpropagation (RPROP) [11] update rule. In RPROP, we assign a distinct step size to each weight. Informally, if the direction of the gradient changes, then the step size is decreased. Otherwise, the step size is increased. The weight update depends only on the sign of the gradient, and the step size determines how much the weight changes. In RSPSA, the perturbation size for each weight is connected to the step size, solving the above mentioned difficulty. The RPROP update rule is designed for batch update, and therefore, in our setting, we use minibatches to collect the gradients before an update.
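The per-weight step-size adaptation of RPROP can be sketched in isolation; this is an informal rendering of the rule described above (the η+ = 1.1 and η− = 0.85 defaults are the values quoted later for noisy functions):

```python
def rprop_step_size(delta, grad_now, grad_prev, eta_plus=1.1, eta_minus=0.85):
    """RPROP-style adaptation of one weight's step size: grow it while
    the gradient sign is stable, shrink it when the sign flips."""
    if grad_now * grad_prev > 0:
        return delta * eta_plus     # same direction: accelerate
    if grad_now * grad_prev < 0:
        return delta * eta_minus    # direction flipped: back off
    return delta                    # one of the gradients is zero
```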
6 OUR METHOD: RFDSA+
We designed our new method by observing the behavior of RSPSA for ranking combination. One of the strengths of the RPROP update rule is that it increases the update steps on a large plateau, and takes larger steps in the direction of the gradient. Ranking functions, as functions of a combination weight, consist of constant intervals. However, if the perturbation is sufficiently large, the averaged gradient estimate will be non-zero. If the step size for a weight is small in a flat area, then it should be increased in order to escape the flat area, but also in order to be able to estimate the right direction. In other words, the weight needs a sufficiently large perturbation to be able to influence the ranking function.

However, we observed that in RSPSA, the estimated direction changes often in the flat area, and the step size in fact decreases. To illustrate the problem, consider a ranking function that is completely flat in the direction of some but not all coordinates in the neighborhood of the current θ_t. In this case, the direction of the estimated gradient of the 'flat' coordinates becomes an unbiased Bernoulli variable: even if the ranking function is completely flat with respect to coordinate i, the numerator of g_ti will still be non-zero because of the non-flat coordinates. However, the numerator will be independent of the randomly chosen direction of Δ_ti, and hence g_ti will simply mirror the random variable Δ_ti.

To remedy the problem of flat regions, we switch from simultaneous perturbation to finite differences in order to identify when the ranking function is flat with respect to the weight in question. Note that by perturbing just one weight, we eliminate the noise coming from the perturbation of the other weights. If we detect a flat region, then we increase the step size.
The pseudocode of RFDSA+ is provided in Algorithm 1. The key differences to RSPSA are switching from simultaneous perturbation to finite differences (lines 7–8), and handling the flat regions (lines 22–23). The RPROP update is given by lines 11–28.

The algorithm has four parameters: the minibatch size B, the initial step size δ_0, and the step size adjustment variables η+ and η−. For noisy functions, typical values are η+ = 1.1 and η− = 0.85 [15]. The initial value of the step size has minimal influence, since it is quickly adjusted; it is set to δ_0 = 0.1. The size of the minibatch will be chosen as 1,000 in the experiments, the same as for SPSA and RSPSA. The length of an episode T, and the number of base rankers N, are determined by the problem.

The key variables of the RFDSA+ algorithm are the step sizes δ_i, corresponding to each weight θ_i. The auxiliary variables s_i store the previous weight update and are used for identifying a change in the direction of the partial derivatives. During a minibatch, the partial derivatives are collected in the variables g_i.

The RFDSA+ algorithm starts with an initialization phase in lines 1–4. After every user interaction, at time t, the partial derivatives are computed as follows. For each base ranker i, we perturb its weight by twice the corresponding step size (line 7). The coupling factor 2 is standard for RSPSA [15], but slightly different values can be used as well. We use a one-sided positive perturbation in the description of the algorithm. Using one-sided perturbation halves the number of evaluations needed. With one-sided perturbation, it would be more natural to choose the direction randomly, as a ±2δ_i valued Bernoulli random variable; the current description was chosen for brevity. The partial derivatives g_i are updated in line 8, using the finite difference estimator.

At the end of each minibatch, the weights θ_i and the step sizes δ_i are updated according to the RPROP rule [11] in lines 11–28, independently for each component i. The auxiliary variable h detects a change of direction in the partial derivative. If there is no change (lines 13–15), the step size is increased, and the weight θ_i will be updated in the direction of the derivative with the amount determined by the step size. If there is a change in the direction, then the step size is decreased, and the weight is left unchanged; the weight will be updated after the next minibatch (line 20). The key modification that deals with flat regions in the partial derivatives is shown in lines 22–23. Accordingly, the step size is increased if the partial derivative is 0 during the minibatch. Detecting the flat region is made possible by using finite difference estimation instead of simultaneous perturbation. The actual weight update is shown in line 25.
7 EXPERIMENTS
In this section, first we empirically investigate how well the combination algorithms perform for two base rankers, compared to the optimal (static) combination. Then we analyze how the combination algorithms scale when a larger number of base rankers are available.
7.1 Data sets
All data sets consist of a time-ordered sequence of user-item pairs. Only the first occurrence of a user-item pair is included. The task at a certain point of time is to rank the available items for the current user. After a top list is provided by a particular algorithm, a reward is obtained using NDCG@100 as the ranking measure (see Section 3). In our case, there is only one item with a non-zero label (the one from the current user-item pair). Following the evaluation step, the item is revealed to the base rankers and the combination algorithm, allowing them to update their models.

In these experiments, we use three data sets from the Amazon collection (CDs and Vinyl; Movies and TV; Electronics [20]), the 10M MovieLens data set¹, and a Twitter data set where the items are defined by the hashtags used in tweets.

¹ http://grouplens.org/datasets/movielens/
1  for i = 1 to N do
2      θ_i ← 1/N; g_i ← 0
3      s_i ← 0; δ_i ← δ_0
4  end
5  for t = 1 to T do
6      for i = 1 to N do
7          θ⁺ ← θ; θ⁺_i ← θ_i + 2δ_i
8          g_i ← g_i + (r_t(θ⁺) − r_t(θ)) / (2δ_i)
9      end
10     if t mod B = 0 then
11         for i = 1 to N do
12             h ← g_i s_i
13             if h > 0 then
14                 δ_i ← η⁺ δ_i
15                 s_i ← sign(g_i) δ_i
16             else if h < 0 then
17                 δ_i ← η⁻ δ_i
18                 s_i ← 0
19             else
20                 s_i ← sign(g_i) δ_i
21             end
22             if g_i = 0 then
23                 δ_i ← η⁺ δ_i
24             else
25                 θ_i ← θ_i + s_i
26             end
27             g_i ← 0
28         end
29     end
30 end
Algorithm 1: RFDSA+
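Algorithm 1 can be transcribed into a compact class. This is a sketch under the paper's pseudocode, not the authors' code; `reward_fn` is a placeholder for r_t(θ), and each `observe` call costs N + 1 reward evaluations.

```python
import numpy as np

class RFDSAPlus:
    """Sketch of Algorithm 1 (RFDSA+): one-sided finite-difference
    gradient estimates per weight, RPROP-style step sizes, and step-size
    growth on flat regions (zero accumulated gradient in a minibatch)."""

    def __init__(self, n, batch=1000, delta0=0.1, eta_p=1.1, eta_m=0.85):
        self.theta = np.full(n, 1.0 / n)   # lines 1-4: initialization
        self.g = np.zeros(n)               # accumulated partial derivatives
        self.s = np.zeros(n)               # previous weight updates
        self.delta = np.full(n, delta0)    # per-weight step sizes
        self.batch, self.eta_p, self.eta_m = batch, eta_p, eta_m
        self.t = 0

    def observe(self, reward_fn):
        """One round: finite differences for every weight (lines 6-9)."""
        r0 = reward_fn(self.theta)
        for i in range(len(self.theta)):
            th = self.theta.copy()
            th[i] += 2.0 * self.delta[i]                       # line 7
            self.g[i] += (reward_fn(th) - r0) / (2.0 * self.delta[i])
        self.t += 1
        if self.t % self.batch == 0:                           # line 10
            self._rprop_update()

    def _rprop_update(self):
        """Lines 11-28: RPROP update with flat-region handling."""
        for i in range(len(self.theta)):
            h = self.g[i] * self.s[i]
            if h > 0:                      # stable direction: accelerate
                self.delta[i] *= self.eta_p
                self.s[i] = np.sign(self.g[i]) * self.delta[i]
            elif h < 0:                    # direction flipped: back off
                self.delta[i] *= self.eta_m
                self.s[i] = 0.0
            else:
                self.s[i] = np.sign(self.g[i]) * self.delta[i]
            if self.g[i] == 0.0:           # lines 22-23: flat region
                self.delta[i] *= self.eta_p
            else:                          # line 25: weight update
                self.theta[i] += self.s[i]
            self.g[i] = 0.0
```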
7.2 Base rankers
We rely on two basic classes of collaborative filtering models: item based nearest neighbor (item2item) [26] and matrix factorization [1]. These two classes of methods represent the most successful and most popular collaborative filtering algorithms² [17, 24]. In addition to the two techniques, we also include temporal popularity (denoted Pop), which records how many times an item was visited in the preceding time window.

For item2item, we use a time-decayed item-to-item similarity function, the model being updated every day. When computing the score for an item, we consider the similarity to all items previously visited by the user. Thus, this algorithm also incorporates the recent history.

We include four matrix factorization variants: online matrix factorization (OMF) [23], online asymmetric matrix factorization (OAMF) [16], batch matrix factorization (MF), and (batch) implicit alternating least squares (iALS) [10]. All variants use latent factors with ten dimensions. The online variants update once after

² For particular data sets, there may be superior algorithms, especially in batch settings. The two main base rankers considered are representatives of two main approaches to collaborative filtering, and have natural incremental versions. None of the combination algorithms exploit the particular base rankers, thus replacing the base rankers is straightforward.
Figure 1: Reward with various combination coefficients (θ) for the combination of OMF and item2item on the Amazon-CD set. In the figure, θ denotes the normalized weight of the OMF base ranker; the normalized weight for item2item is 1 − θ. R_T(0) = 0.03434.
every user-item pair. The batch variants retrain their models after every 100,000 time steps, using a required number of iterations.

We use stochastic gradient descent for OMF, OAMF and MF, with the current item from the data set designated as the positive item, and additional negative items sampled randomly [23].

The parameters of the base rankers are optimized for each data set. In the combination, the scores of the base rankers are normalized by their standard deviation.
7.3 Combination of two models
We show our results for combining the two base models OMF and item2item. We let θ denote the weight of OMF in the convex combination. The average cumulative reward, depending on θ, is shown for the Amazon-CD data set in Figure 1. Interestingly, the optimum is reached for a combination that puts heavy weight on item2item, even though OMF alone performs better than item2item.

The average cumulative reward of the combination algorithms is shown in Figure 2. The peculiar shape in the first three years is due to the low amount of data collected and the more significant changes in the data distribution. We observe that the relative order of the base algorithms changes over time: at first OMF is better, then item2item, and then OMF again. This shows that selecting an algorithm on partial data, and using only that algorithm later, is a poor choice. ExpA follows the better base algorithm, being slightly worse than that due to exploration. ExpW³ achieves a performance that equals the best static convex combination (cf. Figure 1). ExpAW is on par with ExpW in the beginning, but its performance deteriorates later. This is natural, since it chooses a larger weight for OMF due to the superior performance of OMF, despite the fact that the actual optimum is to assign a large weight to item2item, as seen in Figure 1. SGD has similar performance to ExpW, also giving a larger weight to OMF. This is possibly because SGD and OMF optimize the same surrogate loss function.

³ For ExpW, the set of points P consisted of a uniform grid with 100 points.
Figure 2: Average cumulative NDCG of the ranking algorithms (item2item, OMF, ExpA, ExpAW, ExpW, SGD, SPSA, RSPSA, RFDSA+) on the Amazon-CD set.
Figure 3: The weight assignment of the ranking algorithms on the Amazon-CD set. OptG100+ corresponds to the optimal weight assignment over 100 uniform grid points, with a few additional points chosen near the presumed optimum. In the figure, θ denotes the normalized weight of the OMF base ranker; the normalized weight for item2item is 1 − θ.
The weight assignment of the combination algorithms is shown in Figure 3. The figure additionally includes an optimal static weight assignment, i.e. θ_t = argmax_{θ∈P} R_t(θ). By analyzing the weight assignment of the three combination algorithms that optimize NDCG directly (SPSA, RSPSA and RFDSA+), we observe that all give item2item a large weight, although the weights for SPSA are further away from the optimum. Consequently, we notice in Figure 2 that the three algorithms perform well, with RSPSA and RFDSA+ matching the optimal performance of ExpW.
7.4 Scaling
We analyze the scaling of the combination algorithms in two ways: (1) by including an increasing number of OMF base rankers (differing only in the random initialization) next to item2item, and (2) by including all six base rankers in the mix.
Table 1: Combination of six base rankers on five data sets. The average NDCG of the base rankers is shown at the top of the table, and the average NDCG of the combination algorithms at the bottom.

Algorithm   Amazon-CD  Amazon-Movies  Amazon-Electro  MovieLens  Twitter
item2item   0.0343     0.0350         0.0156          0.1445     0.0221
OMF         0.0389     0.0440         0.0222          0.1357     0.3528
Pop         0.0628     0.0663         0.0347          0.0857     0.3486
OAMF        0.0318     0.0320         0.0160          0.1717     0.3118
MF          0.0052     0.0086         0.0056          0.0051     0.0055
iALS        0.0046     0.0075         0.0060          0.0053     0.0054
ExpA        0.0628     0.0663         0.0347          0.1717     0.3486
ExpAW       0.0628     0.0664         0.0347          0.1717     0.3486
SGD         0.0640     0.0674         0.0353          0.1568     0.3563
SPSA        0.0696     0.0692         0.0349          0.1678     0.3683
RSPSA       0.0640     0.0670         0.0396          0.1435     0.4468
RFDSA+      0.0880     0.0882         0.0452          0.1879     0.4601
Figure 4: Average NDCG of the ranking algorithms on the Amazon-CD set with a varying number of OMFs. The combination includes one item2item and one to ten OMF base rankers. In the case of xOMF there is only one OMF, but the dimension of the latent factors is increased from 10 to the range of 10–100.
In the first case, assuming that the various OMF models achieve similar performance, one expects the optimal weight for item2item to stay relatively the same, with the weight of the single OMF from the previous section divided among the multiple instances. The difficulty here is that the proper weight assignment (for item2item) needs to be found in a space with larger dimensionality. For larger dimensions, placing grid points that cover the parameter space sufficiently would require an exponential number of evaluations, thus we do not include ExpW in this experiment. The performance of the other combination algorithms is shown in Figure 4.

We observe that the ranking performance of RFDSA+ does not drop as the number of OMFs increases. It is even able to use the slight variation in the OMFs to increase the performance slightly. The performance of SPSA and RSPSA deteriorates significantly as more OMFs are included in the mix. ExpA, ExpAW and SGD all cope well with the increased dimension, but their performance is much weaker overall than that of RFDSA+. The relative invariance of ExpA underlines that the individual OMF rankers achieve similar performance (we checked that the variance of their NDCG scores is indeed very small). We added two further baselines to the figure: xOMF, a variant of OMF with increased latent vector dimension, and item2item+xOMF, a combination using RFDSA+. We observe that the individual performance of an online factor model increases with the dimension of the latent vectors. However, in combination with item2item, it is better to use many smaller models than one big one, assuming that they are combined with an algorithm such as RFDSA+ that scales well. Results on the other data sets are similar and omitted due to space limitations.

Next, we show the performance of the combination algorithms when all six base rankers are used in Table 1. First, we notice that the individual performance of the batch base rankers (MF and iALS) is poor for all data sets. The performance of the other base rankers varies, depending on the data set. Regarding the performance of the combination algorithms, we can draw a somewhat similar conclusion as for Figure 4: RFDSA+ has significantly better performance for all data sets compared to the other combination algorithms. We also note that the improvement in performance over the best individual base ranker is considerable for all data sets. ExpA achieves approximately the performance of the best individual ranker. ExpAW and SGD cope reasonably well with more base rankers, but their performance does not exceed by much the performance of the best base ranker. SPSA and RSPSA (which were performing well for two base rankers) do not perform particularly well when a larger number of models are included in the mix.
8 CONCLUSIONS
In this paper, we have considered the task of learning the online convex combination of base recommender algorithms by stochastic optimization. For the case of two base rankers, we have shown that the class of exponentially weighted algorithms attains close to optimal performance. However, this algorithm cannot be applied in real applications with a larger number of base rankers because of the exponential number of evaluations needed. To remedy the scaling problem, we have proposed a new algorithm, RFDSA+. The algorithm uses finite differences to estimate the gradient of the ranking reward, and the RPROP update rule to adjust the combination weights. The update rule was modified in order to deal with flat regions that often appear in ranking functions. The new algorithm is shown empirically to perform close to the optimum for two base rankers, and to scale well as the number of models is increased, whether with homogeneous base rankers or varied ones. We observed that by applying the RFDSA+ combination algorithm, a considerable improvement in ranking performance can be obtained over the base rankers.
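The core update of RFDSA+ can be sketched as follows. This is an illustrative simplification: the renormalization onto the simplex and the handling of flat regions (zero finite differences) are cruder here than the modified RPROP rule of the paper, and `reward` stands for any (noisy) ranking reward, such as NDCG on the next batch of events.

```python
import numpy as np

def rfdsa_step(reward, w, delta, step, prev_sign,
               eta_plus=1.2, eta_minus=0.5):
    """One illustrative finite-difference / RPROP-style ascent step.

    reward: callable mapping a weight vector to a (noisy) ranking reward.
    w: current combination weights; delta: finite-difference perturbation.
    step, prev_sign: per-coordinate RPROP state; updated values returned.
    """
    grad_sign = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = delta
        # Two-sided finite-difference estimate of the partial derivative.
        grad_sign[i] = np.sign(reward(w + e) - reward(w - e))
    # RPROP: grow the step where the sign is stable, shrink where it flips.
    step = np.where(grad_sign * prev_sign > 0, step * eta_plus,
                    np.where(grad_sign * prev_sign < 0, step * eta_minus,
                             step))
    w = w + grad_sign * step            # ascend the estimated reward
    w = np.clip(w, 0.0, None)           # keep weights nonnegative
    w = w / w.sum() if w.sum() > 0 else np.full_like(w, 1.0 / len(w))
    return w, step, grad_sign
```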
REFERENCES
[1] Jacob Abernethy, Kevin Canini, John Langford, and Alex Simma. 2007. Online collaborative filtering. University of California at Berkeley, Tech. Rep. (2007).
[2] Alekh Agarwal, Ofer Dekel, and Lin Xiao. 2010. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback. In COLT. Citeseer, 28–40.
[3] Marie Al-Ghossein, Pierre-Alexandre Murena, Talel Abdessalem, Anthony Barré, and Antoine Cornuéjols. 2018. Adaptive collaborative topic modeling for online recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 338–346.
[4] James Bennett, Stan Lanning, et al. 2007. The Netflix prize. In Proceedings of KDD Cup and Workshop, Vol. 2007. New York, NY, USA, 35.
[5] Robin Burke. 2010. Evaluating the dynamic properties of recommendation algorithms. In Proceedings of the Fourth ACM Conference on Recommender Systems. ACM, 225–228.
[6] Róbert Busa-Fekete, Balázs Kégl, Tamás Éltető, and György Szarvas. 2011. Ranking by calibrated AdaBoost. In Proceedings of the Learning to Rank Challenge. 37–48.
[7] Nicolo Cesa-Bianchi and Gábor Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press.
[8] Joao Gama, Pedro Medas, Gladys Castillo, and Pedro Rodrigues. 2004. Learning with drift detection. In Brazilian Symposium on Artificial Intelligence. Springer, 286–295.
[9] João Gama, Raquel Sebastião, and Pedro Pereira Rodrigues. 2009. Issues in evaluation of stream learning algorithms. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 329–338.
[10] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In ICDM, Vol. 8. Citeseer, 263–272.
[11] Christian Igel and Michael Hüsken. 2000. Improving the Rprop Learning Algorithm. In Proceedings of the Second International ICSC Symposium on Neural Computation (NC 2000), H. Bothe and R. Rojas (Eds.). ICSC Academic Press, 115–121. citeseer.ist.psu.edu/igel00improving.html
[12] Kalervo Järvelin and Jaana Kekäläinen. 2000. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 41–48.
[13] Michael Jugovac, Dietmar Jannach, and Mozhgan Karimi. 2018. Streamingrec: a framework for benchmarking stream-based news recommenders. In Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 269–273.
[14] Robert D Kleinberg. 2005. Nearly tight bounds for the continuum-armed bandit problem. In Advances in Neural Information Processing Systems. 697–704.
[15] Levente Kocsis and Csaba Szepesvári. 2006. Universal parameter optimisation in games based on SPSA. Machine Learning 63, 3 (2006), 249–286.
[16] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 426–434.
[17] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009).
[18] Neal Lathia, Stephen Hailes, and Licia Capra. 2009. Temporal collaborative filtering with adaptive neighbourhoods. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 796–797.
[19] Odalric-Ambrym Maillard and Rémi Munos. 2010. Online learning in adversarial Lipschitz environments. Machine Learning and Knowledge Discovery in Databases (2010), 305–320.
[20] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. 2015. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 43–52.
[21] Róbert Pálovics and András A Benczúr. 2015. Temporal influence over the Last.fm social network. Social Network Analysis and Mining 5, 1 (2015), 4.
[22] Róbert Pálovics, András A Benczúr, Levente Kocsis, Tamás Kiss, and Erzsébet Frigó. 2014. Exploiting temporal influence in online recommendation. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 273–280.
[23] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-class collaborative filtering. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE, 502–511.
[24] I Pilászy, A Serény, G Dózsa, B Hidasi, A Sári, and J Gub. 2015. Neighbor methods vs. matrix factorization: case studies of real-life recommendations. In LSRS Workshop at ACM RecSys.
[25] Filip Radlinski, Robert Kleinberg, and Thorsten Joachims. 2008. Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th International Conference on Machine Learning. ACM, 784–791.
[26] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web. ACM, 285–295.
[27] J. C. Spall. 1992. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Automat. Control 37 (1992), 332–341.
[28] Andreas Töscher, Michael Jahrer, and Robert M Bell. 2009. The BigChaos solution to the Netflix grand prize. Netflix Prize Documentation (2009), 1–52.
[29] João Vinagre, Alípio Mário Jorge, and João Gama. 2014. Evaluation of recommender systems in streaming environments. In Workshop on 'Recommender Systems Evaluation: Dimensions and Design' (REDD 2014), held in conjunction with RecSys 2014.
[30] Yisong Yue and Thorsten Joachims. 2009. Interactively optimizing information retrieval systems as a dueling bandits problem. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 1201–1208.
[31] Daniel Zoller, Stephan Doerfel, Christian Pölitz, and Andreas Hotho. 2017. Leveraging User-Interactions for Time-Aware Tag Recommendations. In RecTemp@RecSys. 9–15.