Online Ranking Combination

Erzsébet Frigó

frigo.erzsebet@sztaki.hu

Institute for Computer Science and Control (MTA SZTAKI), Budapest, Hungary

ABSTRACT

As a task of high importance for recommender systems, we consider the problem of learning the convex combination of ranking algorithms by online machine learning. In the case of two base rankers, we show that the exponentially weighted combination achieves near-optimal performance. However, the number of points that must be evaluated may be prohibitive with more base models in a real application. We propose a gradient-based stochastic optimization algorithm that uses finite differences. Our new algorithm achieves similar empirical performance for two base rankers, while scaling well with an increased number of models. In our experiments with five real-world recommendation datasets, we show that the combination offers significant improvement over previously known stochastic optimization techniques. Our algorithm is the first effective stochastic optimization method for combining ranked recommendation lists by online machine learning.

CCS CONCEPTS

• Information systems → Collaborative filtering; • Theory of computation → Online learning algorithms.

KEYWORDS

ranking; combination; RFDSA

ACM Reference Format:

Erzsébet Frigó and Levente Kocsis. 2019. Online Ranking Combination. In Thirteenth ACM Conference on Recommender Systems (RecSys '19), September 16–20, 2019, Copenhagen, Denmark. ACM, New York, NY, USA, 8 pages.

https://doi.org/10.1145/3298689.3346993

1 INTRODUCTION

A milestone in the research of recommendation algorithms, the Netflix Prize Competition [4] had a high impact on research directions. The target of the contest was based on the one-to-five-star ratings given by users, with one part of the data used for model training and the other for evaluation. As an impact of the competition, tasks now termed batch rating prediction came to dominate research results. However, real systems differ not just in that the user feedback is implicit, but also in that they process data streams where users request one or a few items at a time and get exposed

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

RecSys ’19, September 16–20, 2019, Copenhagen, Denmark

© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.

ACM ISBN 978-1-4503-6243-6/19/09 . . . $15.00
https://doi.org/10.1145/3298689.3346993

Levente Kocsis

kocsis@sztaki.hu

Institute for Computer Science and Control (MTA SZTAKI), Budapest, Hungary

to new information that may change their needs and taste when they return to the service next time. Furthermore, an online trained model may change and return completely different lists for the same user even for interactions very close in time.

The difficulty of evaluating streaming recommenders was first mentioned in [18], although the authors evaluated models by an offline training and testing split. Ideas for online evaluation metrics appeared first in [21, 22, 29]. In online or prequential evaluation [9], which has grown in popularity, the ranking measure is computed from a sequence of examples. For each example in the sequence, the recommender system provides a top-k list of items to the active user.

The list is evaluated typically against a single relevant item that the user interacted with. Then, the user-item interaction is added to the previously available data, and the recommender system is able to update its model.

Recommender systems often rely on an ensemble of base ranking algorithms. For instance, in the Netflix Prize competition, considerable effort went into choosing the algorithms for the blend and combining them [28]. In an online scenario, the environment for a combination algorithm is non-stationary: not only the user preferences and item popularities, but also the base ranking models change in time. Therefore, the combination of the base algorithms also needs to be updated. While it is infeasible to update the parameters of the combination with the computationally intensive blending approaches used in batch settings, a convex combination of the base models often leads to satisfying results. In summary, we consider online convex combination algorithms for (implicit feedback) recommenders under prequential evaluation.

From the machine learning point of view, the main difficulty of combining ranked recommendation lists is that the typical ranking measures, such as NDCG [12], are not continuous, making their optimization a difficult task. In this paper, we compare and identify the numerical issues of two strategies for optimizing non-continuous rewards. The first approach uses exponentially weighted forecasters, which explore the weight space globally and do not rely on the existence of a gradient of the reward function. The second class of methods uses gradient descent to maximize the reward.

Exponentially weighted algorithms (EWA) [7] optimize ranking combination weights by exploring the weight space globally. EWA was shown to be close to optimal for Lipschitz-continuous environments [19]. We will show that EWA is able to optimize ranking combinations as well, under a certain assumption (see Proposition 4.1). However, the number of combinations that need to be evaluated to fulfill the assumption grows exponentially with the number of base rankers. Therefore, it is not practical in a real application if more base rankers are employed.

To be able to handle a larger number of base rankers, we turn our attention to the second approach, local optimization by gradient-based methods. In particular, we start with the Resilient Simultaneous Perturbation Stochastic Approximation (RSPSA) algorithm [15], which was used for optimizing model parameters in games. While RSPSA was shown to cope with non-continuous rewards, it is non-trivial whether it can cope with ranking functions as well. Indeed, we observe empirically that RSPSA does not scale well for ranking prediction. The reason for this is that ranking functions have many flat regions with respect to individual combination weights.

Our method, Resilient Finite Difference Stochastic Approximation (RFDSA+), is the first effective stochastic optimization method for combining ranked lists. To improve the scalability properties of RSPSA, we switch from simultaneous perturbation to finite differences to identify flat regions with respect to a given weight. In this way, we eliminate the noise of the perturbation of the other weights, and always concentrate on optimizing a single weight at a time. We show empirically that RFDSA+ achieves near-optimal performance when two base rankers are combined, and scales well with the number of base rankers.

The article is organized as follows: after discussing the related research in Section 2, we formalize our framework in Section 3. Exponentially weighted algorithms are discussed in Section 4, where in Proposition 4.1 we show the theoretical guarantee of EWA. Gradient-based algorithms are discussed in Section 5. Our proposed algorithm RFDSA+ is described in Section 6. Empirical evaluation highlighting the strength of RFDSA+ is provided in Section 7. Some conclusions and a discussion of future research close the paper in Section 8.

2 RELATED RESEARCH

Research on incremental recommender algorithms with a prequential evaluation scenario has gained popularity in recent years. There are several papers that use prequential evaluation [3, 5, 13, 21, 22, 31]; however, only [22] considers the issue of combining multiple base rankers. The latter will be discussed in more detail in Section 5.1, and evaluated empirically in Section 7.

Ranking combination received considerable attention during the Netflix Prize competition, when the approach of [28] was essential for the winning entry. In the batch setting, one of the later approaches that can be adapted naturally to an online scenario is [6]. The authors use an exponentially weighted forecaster, and use the cumulative loss of each base algorithm to compute its score in the convex combination. One can notice that any arbitrary linear shift of the scores of a base algorithm would leave its cumulative loss unchanged, but it would affect the base algorithm's contribution to the mix. Therefore, the algorithm seems somewhat less sound; nevertheless, it may still perform reasonably well on some practical instances. We will describe the algorithm more formally in Section 4.3, and evaluate it (for implicit feedback problems) empirically in Section 7.

In the online setting, ranking combination was proposed by [25, 30] using dueling bandits. Their approach assumes that the loss functions are convex and stationary. Neither assumption seems reasonable for most ranking measures in a real application. There are several algorithms in the literature of online learning that can be considered for combining ranking models. [2] considered a two-point approximation of the gradient for convex functions. The ranking measures are not convex; nevertheless, the algorithm is similar to SPSA [27], which has been applied to optimizing non-convex functions as well. We will discuss the algorithm in Section 5.2.

The exponentially weighted algorithm was applied to optimize (non-convex) Lipschitz-continuous functions [19], and it has $O(\sqrt{T})$ guarantees in the full-information setting, where $T$ is the length of the episode. The full-information setting would imply, however, evaluating a prohibitively large number of points when the number of base rankers is slightly larger. There are bandit variants as well [14] that evaluate only one point per iteration; however, they scale badly in error. The regret bound for the continuum-armed bandits is $O(T^{(N+1)/(N+2)})$ [14], where $N$ is the dimensionality of the problem. The exponentially weighted algorithms will be considered in Section 4.

Finally, there are a large number of stochastic approximation algorithms that can, in principle, be applied to online ranking combination. Unfortunately, none of them is straightforward to use for ranking functions such as NDCG that are not continuous, and even a smoothed cumulative ranking reward function can be non-convex as well. In most games, the reward is also non-continuous, for example, 1/0 for win/loss, or a discrete number of points (or money) that can be won in a card game. The algorithm RSPSA [15] was proposed for (offline) optimization of some parameters of a poker playing program. Our proposed algorithm, RFDSA+, builds on the idea of RSPSA but considers one weight at a time, to remedy the problems of past algorithms in handling flat areas in ranking functions.

3 PROBLEM SETUP

We consider the online combination of the ranked lists of multiple base recommender algorithms. As soon as the base algorithms give a prediction, we have to apply and potentially re-learn the combination weights on the fly. In contrast to typical batch learning tasks where we can, for example, perform grid search using a large amount of past training data, in our task, closer to a real recommender system operation, we process the recommendation requests and the feedback as a sequence in time. Compared to batch learning, the advantage of online methods is that they can adapt faster to concept drifts [8] that can rearrange the relative strength of the different base models.

Both batch and prequential evaluation rely on a set of recorded user-item interactions. For batch evaluation, one splits the data into a training and a test set, trains the algorithms on the former and tests on the latter. Conversely, for prequential evaluation, we test algorithms sequentially on each data point, and potentially use all preceding data points for training. Since the user often selects a single item only, we will consider implicit feedback evaluation metrics with only one relevant item, but the evaluation can easily be generalized to the case when the user makes multiple choices or when the feedback is explicit. Prequential evaluation is closer to a real application, since in practice, user interactions occur sequentially. Algorithms can also exploit the most recent data. It is true for both evaluation methods that the recommendation is made before revealing the choice of the user, that is, a given user-item interaction is processed independently of what the system recommends.

Given a chronologically ordered dataset with $T$ records, prequential evaluation is an episode with $T$ rounds. In each round $t$, we take the following steps.

(1) We observe the next user-item pair from the dataset, and set the active user accordingly.

(2) We query the recommender system for a top-K recommendation for the active user.

(3) We evaluate the output recommendation list against the single relevant item $j_t$ that the user interacted with.

(4) Finally, we reveal the relevant item $j_t$ to the recommender system, and allow it to update the model using the additional user-item pair.

In the context of convex combination algorithms, we consider $N$ base ranking algorithms, and the $i$th base algorithm is denoted by $A_i$. In each round $t = 1, \ldots, T$, first, each base algorithm $A_i$ assigns a score $x_{tij}$ to each item $j$. After that, the convex combination algorithm assigns the weight $\theta_{ti}$ to each algorithm $A_i$. The weights form an $N$-dimensional vector $\theta_t = (\theta_{t1}, \ldots, \theta_{tN})$. The parameter space is $\theta_t \in \Theta = \mathbb{R}^N_{0+}$. The combined score of item $j$ in round $t$ is

$$x_{tj} = \sum_{i=1}^{N} \theta_{ti} x_{tij}.$$

The top lists are generated by sorting the items by the combined scores in descending order.
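For concreteness, the scoring and top-list generation step can be written in a few lines. This is a minimal sketch; the array names (`scores`, `theta`) and the function name are illustrative choices, not taken from the paper's implementation.

```python
import numpy as np

def combined_toplist(scores, theta, K=100):
    """scores: (N, n_items) array of base-ranker scores x_{tij};
    theta: (N,) nonnegative combination weights; returns the top-K item ids."""
    combined = theta @ scores          # x_{tj} = sum_i theta_{ti} * x_{tij}
    return np.argsort(-combined)[:K]   # sort by combined score, descending
```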

After the active user's preferred item is revealed, the combination algorithm collects the reward $r_t$, which depends on the generated top list and on the user's choice. With an abuse of notation, we will denote the reward of a base ranker $A_i$ by $r_{ti}$, the reward corresponding to a weight assignment $\theta$ by $r_t(\theta)$, and the reward obtained by a combination algorithm $C$ by $r_t(C)$. The cumulative reward collected up to round $t$ is $R_t = \sum_{\tau=1}^{t} r_\tau$. We let $R_{ti}$, $R_t(\theta)$, and $R_t(C)$ denote the cumulative reward corresponding to a base ranker, a weight vector, and a combination algorithm, respectively.

There are several choices of ranking measures. A popular choice, which we use in our experiments, is NDCG@K [12]. In prequential evaluation, we assume the worst-case scenario that there is only one item with a non-zero label in each round $t$, namely $j_t$. The NDCG@K of a permutation $\pi_t$ of the items reduces to

$$r_t = \mathrm{NDCG@K}(\pi_t) = \begin{cases} 1/\log_2(\mathrm{rank}_{\pi_t}(j_t) + 1) & \text{if } \mathrm{rank}_{\pi_t}(j_t) \le K, \\ 0 & \text{otherwise}, \end{cases}$$

as there is always exactly one relevant item and hence the ideal DCG is equal to one.
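This reduced form of NDCG@K is straightforward to compute. A minimal sketch follows, assuming the top list is given as a Python list of item ids; the function name is illustrative.

```python
import math

def ndcg_reward(toplist, relevant_item, K=100):
    """Reward r_t for a single relevant item: 1/log2(rank + 1) if the item
    is within the top K (rank is 1-based), 0 otherwise."""
    if relevant_item not in toplist:
        return 0.0
    rank = toplist.index(relevant_item) + 1
    return 1.0 / math.log2(rank + 1) if rank <= K else 0.0
```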

4 EXPONENTIALLY WEIGHTED BASELINE ALGORITHMS

The first set of baseline rank combination algorithms in this section rely on the exponentially weighted forecaster [7]. They explore the weight space globally, without relying on the existence of a gradient of the reward function. In Proposition 4.1, we will also show that in the case of two base recommenders, the exponentially weighted combination achieves near-optimal performance.

4.1 ExpA

The simplest choice for dealing with multiple base rankers is to use the exponentially weighted forecaster over the rankers. Accordingly, the combination algorithm, denoted by ExpA, selects base ranker $A_i$ in round $t$ with probability

$$p_{ti} = \frac{e^{\eta_t \sum_{\tau=1}^{t-1} r_{\tau i}}}{\sum_{j=1}^{N} e^{\eta_t \sum_{\tau=1}^{t-1} r_{\tau j}}}. \qquad (1)$$

Selecting base ranker $A_i$ in round $t$ means setting $\theta_{ti} = 1$ and $\theta_{tj} = 0$ for $j \neq i$. The algorithm is guaranteed to achieve a cumulative reward that is not worse than the cumulative reward of the best base ranker by an additive $O(\sqrt{T})$ term in expectation [7].
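A minimal sketch of the ExpA step, assuming the per-ranker cumulative rewards are kept in an array `R`; the max-subtraction is a standard numerical-stability detail, not part of Eq. (1).

```python
import numpy as np

def expa_select(R, eta, rng=np.random.default_rng()):
    """R: (N,) cumulative rewards of the base rankers; returns the index
    of the ranker selected for this round, sampled according to Eq. (1)."""
    z = np.exp(eta * (R - R.max()))   # subtract max for numerical stability
    p = z / z.sum()
    return rng.choice(len(R), p=p)    # theta is 1 for the chosen ranker, 0 otherwise
```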

4.2 ExpW

While ExpA can locate the best base ranker, we can hope that a convex combination of the rankers can achieve better performance than any single ranker. We choose a finite set of points $P \subset \Theta$ and apply the exponentially weighted forecaster to $P$ to choose the weight combination $\theta_t \in P$ to play. If an appropriately large number of points are chosen, and the cumulative reward function $R_T(\theta)$ (as a function of $\theta$) is sufficiently smooth, then the algorithm, denoted by ExpW, will achieve a cumulative reward that is close to that of the optimal convex combination. The following proposition formalizes this statement.

Proposition 4.1. Let $P \subset \Theta$ be a finite set such that

$$\mathbb{E}\left[\max_{\theta \in \Theta} R_T(\theta) - \max_{p \in P} R_T(p)\right] \le \sqrt{T}. \qquad (2)$$

Then the regret of the exponentially weighted forecaster applied on $P$ is bounded by

$$\mathbb{E}\left[\max_{\theta \in \Theta} R_T(\theta) - R_T(\mathrm{ExpW})\right] \le \tilde{O}(\sqrt{T}). \qquad (3)$$

The proof follows by putting together the regret bound of the exponentially weighted forecaster and inequality (2).

For a sufficiently large $T$, the function $R_T(\theta)$ is fairly smooth in practice, as observed in Section 7.3. For two base rankers, the parameter space can be represented by a one-dimensional simplex, i.e., a section. Then, a uniform grid with $O(\sqrt{T})$ grid points can be sufficient, if the cumulative function acts like a Lipschitz function on the grid points. This latter condition will often be true (see also Figure 1). However, with more base rankers, the number of points required for a Lipschitz-like cumulative function is $\Omega(T^{(N-1)/2})$. Since ExpW needs to evaluate the reward at each point, the number of evaluations scales exponentially with the number of base rankers.
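For two base rankers, ExpW therefore amounts to running the forecaster over a uniform grid on [0, 1]. The sketch below assumes the full-information setting, i.e., that the caller evaluates the round reward at every grid point; the grid size, learning rate, and class name are illustrative.

```python
import numpy as np

class ExpW:
    """Exponentially weighted forecaster over a uniform grid of weight points."""
    def __init__(self, n_points=100, eta=0.1, seed=0):
        self.P = np.linspace(0.0, 1.0, n_points)  # weight of ranker 1; ranker 2 gets 1 - theta
        self.R = np.zeros(n_points)               # cumulative reward per grid point
        self.eta = eta
        self.rng = np.random.default_rng(seed)

    def choose(self):
        # sample a grid point with probability proportional to exp(eta * R)
        z = np.exp(self.eta * (self.R - self.R.max()))
        return self.P[self.rng.choice(len(self.P), p=z / z.sum())]

    def update(self, grid_rewards):
        # full-information setting: r_t(p) has been evaluated for every p in P
        self.R += grid_rewards
```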

4.3 ExpAW

In [6], the authors proposed an algorithm that can be regarded as a mix of ExpA and ExpW. The algorithm, denoted here by ExpAW, relies on the cumulative performance of the base rankers (as ExpA does), but uses it as the weight of the base ranker, instead of using it as a selection probability. The weight of base ranker $A_i$ in round $t$ is

$$\theta_{ti} = \frac{e^{\eta_t \sum_{\tau=1}^{t-1} r_{\tau i}}}{\sum_{j=1}^{N} e^{\eta_t \sum_{\tau=1}^{t-1} r_{\tau j}}}. \qquad (4)$$

It is easy to see that the reward of a base ranker does not change if the scores of the rankers are scaled by some factor. However, the scaling will affect the reward of the combination algorithm in an arbitrary way. Nevertheless, with a reasonable normalization, the algorithm may still lead to decent performance, and it is less likely to be affected by an increase in the number of base rankers.
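ExpAW differs from ExpA only in how the exponential scores are used; a sketch, with the same assumed array `R` of cumulative rewards as above:

```python
import numpy as np

def expaw_weights(R, eta):
    """Eq. (4): the exponential scores are used directly as combination
    weights theta_t (a point on the simplex), not as selection probabilities."""
    z = np.exp(eta * (R - R.max()))  # max subtracted for numerical stability
    return z / z.sum()
```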

5 GRADIENT BASELINE ALGORITHMS

In this section, we present the second set of baseline rank combination methods, which are guided by the gradient of the reward function. In the first subsection, we describe an algorithm (SGD) that computes the gradient of a surrogate to the reward. The next two algorithms (SPSA and RSPSA) are stochastic approximation algorithms that approximate the gradient by finite differences. We mention that our new algorithm RFDSA+ builds on RSPSA, extending it to deal with the difficulties of flat regions in the ranking functions.

5.1 SGD

We call our first combination algorithm SGD, since it uses stochastic gradient descent on the mean squared error (MSE) as a surrogate to the reward. The target for the current item is set to 1, and the targets for a set of randomly sampled negative items are set to 0.

After seeing a user-item pair, a stochastic gradient step is taken to minimize the MSE between the ranking score ($x_{tj}$) and the target value. The algorithm was used by [23] for matrix factorization and by [22] for online combination.

We do not expect the algorithm to have difficulty with a large number of base rankers. However, minimizing the surrogate loss may not result in a sufficiently good optimization of the original reward function.
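A sketch of one SGD step on the surrogate loss; the learning rate, the number of sampled negatives, and the uniform negative sampling are illustrative assumptions.

```python
import numpy as np

def sgd_step(theta, scores, pos_item, n_items, lr=0.01, n_neg=20,
             rng=np.random.default_rng()):
    """scores: (N, n_items) base-ranker scores; theta: (N,) weights.
    One gradient step on the MSE between combined scores and 1/0 targets
    (the positive item vs. randomly sampled negatives)."""
    items = np.concatenate(([pos_item], rng.integers(0, n_items, size=n_neg)))
    targets = np.zeros(len(items))
    targets[0] = 1.0
    X = scores[:, items]                    # (N, 1 + n_neg)
    residual = theta @ X - targets          # combined score minus target
    grad = 2.0 * X @ residual / len(items)  # d(MSE)/d(theta)
    return theta - lr * grad
```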

5.2 SPSA

The gradient of most ranking functions with respect to the combination weights is typically zero at most points where it exists. However, if we average over more time steps, it starts to 'smooth out'. It still cannot be computed in a closed form, but it can be approximated by finite differences. For online optimization of convex functions, [2] suggested approximating the gradient by simultaneous perturbation, with an online gradient step taken in the approximated direction. For non-convex optimization, a similar algorithm is known as Simultaneous Perturbation Stochastic Approximation (SPSA) [27]. The approximated gradient $g_{ti}$ is given by

$$g_{ti} = \frac{r_t(\theta_t + c_t \Delta_t) - r_t(\theta_t - c_t \Delta_t)}{2 c_t \Delta_{ti}},$$

where $c_t$ is an appropriately decreasing sequence, $\Delta_t = (\Delta_{t1}, \ldots, \Delta_{tN})$, and the $\Delta_{ti}$ are $\pm 1$-valued unbiased Bernoulli random variables.
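A sketch of the SPSA estimate, with `reward_fn` standing in for an evaluation of $r_t$ at a perturbed weight vector (an assumption made for illustration):

```python
import numpy as np

def spsa_gradient(theta, reward_fn, c_t, rng=np.random.default_rng()):
    """Two-sided simultaneous perturbation estimate of the gradient of r_t."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # +-1 Bernoulli directions
    r_plus = reward_fn(theta + c_t * delta)
    r_minus = reward_fn(theta - c_t * delta)
    return (r_plus - r_minus) / (2.0 * c_t * delta)    # g_{ti}, per coordinate
```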

5.3 RSPSA

In SPSA, especially with non-smooth functions, the difficulty lies in choosing the appropriate perturbation size. The sum of ranking reward functions is a step function. If the perturbation size is too small, we might get stuck on a plateau and cannot find the right direction. If the perturbation is too large, we miss local optima. The appropriate perturbation step size might differ depending on the coordinate and time.

The RSPSA algorithm was proposed in [15] for games, which also have a discrete reward (e.g., 1 for a win, 0 for a loss). The algorithm combines the simultaneous perturbation approximation with the Resilient Backpropagation (RPROP) [11] update rule. In RPROP, we assign a distinct step size to each weight. Informally, if the direction of the gradient changes, then the step size is decreased. Otherwise, the step size is increased. The weight update depends only on the sign of the gradient, and the step size determines how much the weight changes. In RSPSA, the perturbation size for each weight is connected to the step size, solving the above-mentioned difficulty. The RPROP update rule is designed for batch updates, and therefore, in our setting, we use minibatches to collect the gradients before an update.

6 OUR METHOD: RFDSA+

We designed our new method by observing the behavior of RSPSA for ranking combination. One of the strengths of the RPROP update rule is that it increases the update steps on a large plateau, taking larger steps in the direction of the gradient. Ranking functions, as functions of a combination weight, consist of constant intervals. However, if the perturbation is sufficiently large, the averaged gradient estimate will be non-zero. If the step size for a weight is small in a flat area, then it should be increased in order to escape the flat area, but also in order to be able to estimate the right direction. In other words, the weight needs a sufficiently large perturbation to be able to influence the ranking function.

However, we observed that in RSPSA, the estimated direction changes often in the flat area, and the step size in fact decreases. To illustrate the problem, consider a ranking function that is completely flat in the direction of some but not all coordinates in the neighborhood of the current $\theta_t$. In this case, the direction of the estimated gradient of the 'flat' coordinates becomes an unbiased Bernoulli variable: even if the ranking function is completely flat with respect to coordinate $i$, the numerator of $g_{ti}$ will still be non-zero because of the non-flat coordinates. However, the numerator will be independent of the randomly chosen direction of $\Delta_{ti}$, and hence $g_{ti}$ will simply mirror the random variable $\Delta_{ti}$.

To remedy the problem of flat regions, we switch from simultaneous perturbation to finite differences in order to identify that the ranking function is flat with respect to the weight in question. Note that by perturbing just one weight, we eliminate the noise coming from the perturbation of the other weights. If we detect a flat region, then we increase the step size.

The pseudocode of RFDSA+ is provided in Algorithm 1. The key differences to RSPSA are switching from simultaneous perturbation to finite differences (lines 7–8), and handling the flat regions (lines 22–23). The RPROP update is given by lines 11–28.

The algorithm has four parameters: the mini-batch size $B$, the initial step size $\delta_0$, and the step size adjustment variables $\eta^+$ and $\eta^-$. For noisy functions, typical values are $\eta^+ = 1.1$ and $\eta^- = 0.85$ [15]. The initial value of the step size has minimal influence, since it is quickly adjusted; it is set to $\delta_0 = 0.1$. The size of the minibatch will be chosen as 1,000 in the experiments, the same as for SPSA and RSPSA. The length of an episode $T$ and the number of base rankers $N$ are determined by the problem.


The key variables of the RFDSA+ algorithm are the step sizes $\delta_i$, corresponding to each weight $\theta_i$. The auxiliary variables $s_i$ store the previous weight update and are used for identifying a change in the direction of the partial derivatives. During a minibatch, the negative partial derivatives are collected in the variables $g_i$.

The RFDSA+ algorithm starts with an initialization phase in lines 1–4. After every user interaction, at time $t$, the partial derivatives are computed as follows. For each base ranker $i$, we perturb its weight by twice the corresponding step size (line 7). The coupling factor 2 is standard for RSPSA [15], but slightly different values can be used as well. We use a one-sided positive perturbation in the description of the algorithm. Using one-sided perturbation halves the number of evaluations needed. With one-sided perturbation, it is more natural to choose the direction randomly, with a $\pm 2\delta_i$-valued Bernoulli random variable; the current description was chosen for brevity. The partial derivatives $g_i$ are updated in line 8, using the finite difference estimator.

At the end of each minibatch, the weights $\theta_i$ and the step sizes $\delta_i$ are updated according to the RPROP rule [11] in lines 11–28, independently for each component $i$. The auxiliary variable $h$ detects a change of direction in the partial derivative. If there is no change (lines 13–15), the step size is increased, and the weight $\theta_i$ will be updated in the direction of the derivative with the amount determined by the step size. If there is a change in the direction, then the step size is decreased, and the weight is left unchanged. The weight will be updated after the next minibatch (line 20). The key modification that deals with flat regions in the partial derivatives is shown in lines 22–23. Accordingly, the step size is increased if the partial derivative is 0 during the minibatch. Detecting the flat region is made possible by using finite difference estimation instead of simultaneous perturbation. The actual weight update is shown in line 25.

7 EXPERIMENTS

In this section, first we empirically investigate how well the combination algorithms perform for two base rankers, compared to the optimal (static) combination. Then we analyze how the combination algorithms scale when a larger number of base rankers is available.

7.1 Data sets

All datasets consist of a time-ordered sequence of user-item pairs. Only the first occurrence of a user-item pair is included. The task at a certain point of time is to rank the available items for the current user. After a top list is provided by a particular algorithm, a reward is obtained using NDCG@100 as the ranking measure (see Section 3). In our case, there is only one item with a non-zero label (the one from the current user-item pair). Following the evaluation step, the item is revealed to the base rankers and the combination algorithm, allowing them to update their models.

In these experiments, we use three datasets from the Amazon collection (CDs and Vinyl; Movies and TV; Electronics [20]), the 10M MovieLens dataset¹, and a Twitter dataset where the items are defined by the hashtags used in tweets.

¹ http://grouplens.org/datasets/movielens/

 1  for i = 1 to N do
 2      θi ← 1/N ; gi ← 0
 3      si ← 0 ; δi ← δ0
 4  end
 5  for t = 1 to T do
 6      for i = 1 to N do
 7          θ+ ← θ ; θ+i ← θi + 2δi
 8          gi ← gi + (rt(θ+) − rt(θ)) / (2δi)
 9      end
10      if t mod B = 0 then
11          for i = 1 to N do
12              h ← gi · si
13              if h > 0 then
14                  δi ← η+ δi
15                  si ← sign(gi) δi
16              else if h < 0 then
17                  δi ← η− δi
18                  si ← 0
19              else
20                  si ← sign(gi) δi
21              end
22              if gi = 0 then
23                  δi ← η+ δi
24              else
25                  θi ← θi + si
26              end
27              gi ← 0
28          end
29      end
30  end

Algorithm 1: RFDSA+
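A Python transcription of Algorithm 1 may be helpful; it follows the pseudocode line by line. The callable `reward_fn(theta)`, standing in for an evaluation of the round reward $r_t$ at a given weight vector, is an assumption of this sketch.

```python
import numpy as np

class RFDSAPlus:
    def __init__(self, n_rankers, batch=1000, delta0=0.1,
                 eta_plus=1.1, eta_minus=0.85):
        # lines 1-4: initialization
        self.theta = np.full(n_rankers, 1.0 / n_rankers)
        self.g = np.zeros(n_rankers)     # accumulated finite-difference estimates
        self.s = np.zeros(n_rankers)     # previous update, detects direction changes
        self.delta = np.full(n_rankers, delta0)
        self.B, self.ep, self.em = batch, eta_plus, eta_minus
        self.t = 0

    def observe(self, reward_fn):
        """Called once per round t with a callable evaluating r_t(theta)."""
        self.t += 1
        r = reward_fn(self.theta)
        for i in range(len(self.theta)):  # lines 6-9
            theta_plus = self.theta.copy()
            theta_plus[i] += 2.0 * self.delta[i]                              # line 7
            self.g[i] += (reward_fn(theta_plus) - r) / (2.0 * self.delta[i])  # line 8
        if self.t % self.B == 0:          # line 10: end of a minibatch
            self._rprop_update()

    def _rprop_update(self):              # lines 11-28: RPROP rule
        for i in range(len(self.theta)):
            h = self.g[i] * self.s[i]     # line 12
            if h > 0:                     # lines 13-15: direction kept
                self.delta[i] *= self.ep
                self.s[i] = np.sign(self.g[i]) * self.delta[i]
            elif h < 0:                   # lines 16-18: direction changed
                self.delta[i] *= self.em
                self.s[i] = 0.0
            else:                         # line 20
                self.s[i] = np.sign(self.g[i]) * self.delta[i]
            if self.g[i] == 0.0:          # lines 22-23: flat region detected
                self.delta[i] *= self.ep
            else:
                self.theta[i] += self.s[i]  # line 25: weight update
            self.g[i] = 0.0               # line 27
```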

7.2 Base rankers

We rely on two basic classes of collaborative filtering models: item-based nearest neighbor (item2item) [26] and matrix factorization [1]. These two classes of methods represent the most successful and most popular collaborative filtering algorithms² [17, 24]. In addition to the two techniques, we also include temporal popularity (denoted Pop), which records how many times an item was visited in the preceding time window.

For item2item, we use a time-decayed item-to-item similarity function, with the model being updated every day. When computing the score for an item, we consider its similarity to all items previously visited by the user. Thus, this algorithm also incorporates the recent history.
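A sketch of this time-decayed scoring follows. The paper states only that similarities are time-decayed and that the model is updated daily; the exponential decay form, the half-life, and all names here are assumptions made for illustration.

```python
def item2item_score(candidate, user_history, similarity, now,
                    half_life_days=30.0):
    """user_history: list of (item, timestamp) pairs, timestamps in seconds;
    similarity(a, b) -> float. Sums decayed similarities to the user's past items."""
    score = 0.0
    for item, ts in user_history:
        age_days = (now - ts) / 86400.0                # seconds to days
        decay = 0.5 ** (age_days / half_life_days)     # assumed exponential decay
        score += decay * similarity(candidate, item)
    return score
```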

We include four matrix factorization variants: online matrix factorization (OMF) [23], online asymmetric matrix factorization (OAMF) [16], batch matrix factorization (MF), and (batch) implicit alternating least squares (iALS) [10]. All variants use latent factors with ten dimensions. The online variants update once after every user-item pair.

² For particular datasets, there may be superior algorithms, especially in batch settings. The two main base rankers considered are representatives of two main approaches to collaborative filtering, and have natural incremental versions. None of the combination algorithms exploits the particular base rankers, thus replacing the base rankers is straightforward.


[Figure: R_T(θ) (NDCG) plotted against θ on a logarithmic scale.]

Figure 1: Reward with various combination coefficients (θ) for the combination of OMF and item2item on the Amazon-CD set. In the figure, θ denotes the normalized weight of the OMF base ranker. The normalized weight for item2item is 1 − θ. R_T(0) = 0.03434.

The batch variants retrain their models after every 100,000 time steps, using a required number of iterations.

We use stochastic gradient descent for OMF, OAMF and MF, with the current item from the dataset designated as the positive item, and additional negative items sampled randomly [23].

The parameters of the base rankers are optimized for each dataset. In the combination, the scores of the base rankers are normalized by their standard deviation.
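This normalization step is a one-liner; a per-round variant is sketched here, though a running estimate over the stream would also be a plausible reading.

```python
import numpy as np

def normalize_scores(scores, eps=1e-12):
    """scores: (N, n_items); scale each ranker's scores to unit standard
    deviation before combination (eps guards against constant scores)."""
    return scores / (scores.std(axis=1, keepdims=True) + eps)
```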

7.3 Combination of two models

We show our results for combining the two base models OMF and item2item. We let θ denote the weight of OMF in the convex combination. The average cumulative reward, as a function of θ, is shown for the Amazon-CD dataset in Figure 1. Interestingly, the optimum is reached for a combination that puts heavy weight on item2item, even though OMF alone performs better than item2item.

The average cumulative reward of the combination algorithms is shown in Figure 2. The peculiar shape in the first three years is due to the low amount of data collected and the more significant changes in the data distribution. We observe that the relative order of the base algorithms changes over time: at first OMF is better, then item2item, and then OMF again. This shows that selecting an algorithm on partial data, and using only that algorithm later, is a poor choice. ExpA follows the better base algorithm, being slightly worse than it due to exploration. ExpW³ achieves a performance that equals the best static convex combination (cf. Figure 1). ExpAW is on par with ExpW in the beginning, but its performance deteriorates later. This is natural, since it assigns a larger weight to OMF due to OMF's superior individual performance, despite the fact that the actual optimum is to assign a large weight to item2item, as seen in Figure 1. SGD has performance similar to ExpW, also giving a larger weight to OMF. This is possibly because SGD and OMF optimize the same surrogate loss function.

³ For ExpW, the set of points P consisted of a uniform grid with 100 points.

[Figure: average cumulative NDCG over days (0–7000) for item2item, OMF, ExpA, ExpAW, ExpW, SGD, SPSA, RSPSA, and RFDSA+.]

Figure 2: Average cumulative NDCG of the ranking algorithms on the Amazon-CD set.

[Figure: weight θ over days (0–7000), logarithmic scale, for OptG100+, ExpAW, SGD, SPSA, RSPSA, and RFDSA+.]

Figure 3: The weight assignment of the ranking algorithms on the Amazon-CD set. OptG100+ corresponds to the optimal weight assignment over 100 uniform grid points, with a few additional points chosen near the presumed optimum. In the figure, θ denotes the normalized weight of the OMF base ranker. The normalized weight for item2item is 1 − θ.

The weight assignment of the combination algorithms is shown in Figure 3. The figure additionally includes an optimal static weight assignment, i.e., $\theta_t = \arg\max_{\theta \in P} R_t(\theta)$. By analyzing the weight assignment of the three combination algorithms that optimize NDCG directly (SPSA, RSPSA and RFDSA+), we observe that all give item2item a large weight, although the weights for SPSA are further away from the optimum. Consequently, we notice in Figure 2 that the three algorithms perform well, with RSPSA and RFDSA+ matching the optimal performance of ExpW.

7.4 Scaling

We analyze the scaling of the combination algorithms in two ways: (1) by including an increasing number of OMF base rankers (differing only in the random initialization) next to item2item, and (2) by including all six base rankers in the mix.


Table 1: Combination of six base rankers on five data sets. The average NDCG of the base rankers is shown at the top of the table, and the average NDCG of the combination algorithms at the bottom.

Algorithm    Amazon-CD   Amazon-Movies   Amazon-Electro   MovieLens   Twitter
item2item    0.0343      0.0350          0.0156           0.1445      0.0221
OMF          0.0389      0.0440          0.0222           0.1357      0.3528
Pop          0.0628      0.0663          0.0347           0.0857      0.3486
OAMF         0.0318      0.0320          0.0160           0.1717      0.3118
MF           0.0052      0.0086          0.0056           0.0051      0.0055
iALS         0.0046      0.0075          0.0060           0.0053      0.0054
ExpA         0.0628      0.0663          0.0347           0.1717      0.3486
ExpAW        0.0628      0.0664          0.0347           0.1717      0.3486
SGD          0.0640      0.0674          0.0353           0.1568      0.3563
SPSA         0.0696      0.0692          0.0349           0.1678      0.3683
RSPSA        0.0640      0.0670          0.0396           0.1435      0.4468
RFDSA+       0.0880      0.0882          0.0452           0.1879      0.4601

[Figure: average NDCG against the number of OMF base rankers (1–10) for ExpA, ExpAW, SGD, SPSA, RSPSA, RFDSA+, xOMF, and item2item + xOMF.]

Figure 4: Average NDCG of the ranking algorithms on the Amazon-CD set with a varying number of OMFs. The combination includes one item2item and one to ten OMF base rankers. In the case of xOMF there is only one OMF, but the dimension of the latent factors is increased from 10 to the range of 10–100.

In the first case, assuming that the various OMF models achieve similar performance, one expects that the optimal weight for item2item stays relatively the same, with the weight of the single OMF from the previous section divided among the multiple instances. The difficulty here is that the proper weight assignment (for item2item) needs to be found in a space with larger dimensionality. For larger dimensions, placing grid points that cover the parameter space sufficiently would require an exponential number of evaluations, thus we do not include ExpW in this experiment. The performance of the other combination algorithms is shown in Figure 4.

We observe that the ranking performance of RFDSA+ does not drop as the number of OMFs increases. It is even able to use the slight variation among the OMFs to increase its performance slightly. The performance of SPSA and RSPSA deteriorates significantly as more OMFs are included in the mix. ExpA, ExpAW and SGD all cope well with the increased dimension, but their overall performance is much weaker than that of RFDSA+. The relative invariance of ExpA underlines that the individual OMF rankers achieve similar performance (we checked that the variance of their NDCG scores is indeed very small). We added two further baselines to the figure: xOMF, a variant of OMF with increased latent vector dimension, and item2item+xOMF, a combination using RFDSA+. We observe that the individual performance of an online factor model increases with the dimension of the latent vectors. However, in combination with item2item, it is better to use many smaller models than one big one, assuming that they are combined with an algorithm such as RFDSA+ that scales well. Results on the other datasets are similar and omitted due to space limitations.

Next, in Table 1 we show the performance of the combination algorithms when all six base rankers are used. First, we notice that the individual performance of the batch base rankers (MF and iALS) is poor on all datasets. The performance of the other base rankers varies, depending on the dataset. Regarding the performance of the combination algorithms, we can draw a conclusion somewhat similar to that of Figure 4: RFDSA+ has significantly better performance on all datasets compared to the other combination algorithms. We also note that the improvement in performance over the best individual base ranker is considerable for all datasets. ExpA achieves approximately the performance of the best individual ranker. ExpAW and SGD cope reasonably well with more base rankers, but their performance does not exceed that of the best base ranker by much. SPSA and RSPSA (which performed well for two base rankers) do not perform particularly well when a larger number of models are included in the mix.

8 CONCLUSIONS

In this paper, we have considered the task of learning the online convex combination of base recommender algorithms by stochastic optimization. For the case of two base rankers, we have shown that the class of exponentially weighted algorithms attains close to optimal performance. However, the algorithm cannot be applied in a real application with a larger number of base rankers, because of the exponential number of evaluations needed. To remedy the scaling problem, we have proposed a new algorithm, RFDSA+. The algorithm uses finite differences to estimate the gradient of the ranking reward, and the RPROP update rule to adjust the combination weights. The update rule was modified in order to deal with flat regions that often appear in ranking functions. The new algorithm is shown empirically to perform close to the optimum for two base rankers, and to scale well if the number of models is increased, whether by homogeneous base rankers or varied ones. We observed that by applying the RFDSA+ combination algorithm, a considerable improvement in ranking performance can be obtained over the base rankers.

REFERENCES

[1] Jacob Abernethy, Kevin Canini, John Langford, and Alex Simma. 2007. Online collaborative filtering. University of California at Berkeley, Tech. Rep. (2007).

[2] Alekh Agarwal, Ofer Dekel, and Lin Xiao. 2010. Optimal algorithms for online convex optimization with multi-point bandit feedback. In COLT. Citeseer, 28–40.

[3] Marie Al-Ghossein, Pierre-Alexandre Murena, Talel Abdessalem, Anthony Barré, and Antoine Cornuéjols. 2018. Adaptive collaborative topic modeling for online recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 338–346.

[4] James Bennett, Stan Lanning, et al. 2007. The Netflix Prize. In Proceedings of KDD Cup and Workshop, Vol. 2007. New York, NY, USA, 35.

[5] Robin Burke. 2010. Evaluating the dynamic properties of recommendation algorithms. In Proceedings of the Fourth ACM Conference on Recommender Systems. ACM, 225–228.

[6] Róbert Busa-Fekete, Balázs Kégl, Tamás Éltető, and György Szarvas. 2011. Ranking by calibrated AdaBoost. In Proceedings of the Learning to Rank Challenge. 37–48.

[7] Nicolo Cesa-Bianchi and Gábor Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press.

[8] Joao Gama, Pedro Medas, Gladys Castillo, and Pedro Rodrigues. 2004. Learning with drift detection. In Brazilian Symposium on Artificial Intelligence. Springer, 286–295.

[9] João Gama, Raquel Sebastião, and Pedro Pereira Rodrigues. 2009. Issues in evaluation of stream learning algorithms. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 329–338.

[10] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In ICDM, Vol. 8. Citeseer, 263–272.

[11] Christian Igel and Michael Hüsken. 2000. Improving the Rprop learning algorithm. In Proceedings of the Second International ICSC Symposium on Neural Computation (NC 2000), H. Bothe and R. Rojas (Eds.). ICSC Academic Press, 115–121. citeseer.ist.psu.edu/igel00improving.html

[12] Kalervo Järvelin and Jaana Kekäläinen. 2000. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 41–48.

[13] Michael Jugovac, Dietmar Jannach, and Mozhgan Karimi. 2018. Streamingrec: a framework for benchmarking stream-based news recommenders. In Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 269–273.

[14] Robert D Kleinberg. 2005. Nearly tight bounds for the continuum-armed bandit problem. In Advances in Neural Information Processing Systems. 697–704.

[15] Levente Kocsis and Csaba Szepesvári. 2006. Universal parameter optimisation in games based on SPSA. Machine Learning 63, 3 (2006), 249–286.

[16] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 426–434.

[17] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009).

[18] Neal Lathia, Stephen Hailes, and Licia Capra. 2009. Temporal collaborative filtering with adaptive neighbourhoods. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 796–797.

[19] Odalric-Ambrym Maillard and Rémi Munos. 2010. Online learning in adversarial Lipschitz environments. Machine Learning and Knowledge Discovery in Databases (2010), 305–320.

[20] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. 2015. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 43–52.

[21] Róbert Pálovics and András A Benczúr. 2015. Temporal influence over the Last.fm social network. Social Network Analysis and Mining 5, 1 (2015), 4.

[22] Róbert Pálovics, András A Benczúr, Levente Kocsis, Tamás Kiss, and Erzsébet Frigó. 2014. Exploiting temporal influence in online recommendation. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 273–280.

[23] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-class collaborative filtering. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE, 502–511.

[24] I Pilászy, A Serény, G Dózsa, B Hidasi, A Sári, and J Gub. 2015. Neighbor methods vs. matrix factorization: case studies of real-life recommendations. In LSRS Workshop at ACM RecSys.

[25] Filip Radlinski, Robert Kleinberg, and Thorsten Joachims. 2008. Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th International Conference on Machine Learning. ACM, 784–791.

[26] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web. ACM, 285–295.

[27] J. C. Spall. 1992. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Automat. Control 37 (1992), 332–341.

[28] Andreas Töscher, Michael Jahrer, and Robert M Bell. 2009. The BigChaos solution to the Netflix Grand Prize. Netflix Prize Documentation (2009), 1–52.

[29] João Vinagre, Alípio Mário Jorge, and João Gama. 2014. Evaluation of recommender systems in streaming environments. In Workshop on 'Recommender Systems Evaluation: Dimensions and Design' (REDD 2014), held in conjunction with RecSys 2014.

[30] Yisong Yue and Thorsten Joachims. 2009. Interactively optimizing information retrieval systems as a dueling bandits problem. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 1201–1208.

[31] Daniel Zoller, Stephan Doerfel, Christian Pölitz, and Andreas Hotho. 2017. Leveraging user-interactions for time-aware tag recommendations. In RecTemp@RecSys. 9–15.
