Journal of Computer and System Sciences
On the combinatorial design of data centre network topologies ✩ , ✩✩
Iain A. Stewart
School of Engineering and Computing Sciences, Durham University, Science Labs, South Road, Durham DH1 3LE, UK
Article info
Article history:
Received 28 April 2016
Received in revised form 26 January 2017
Accepted 29 May 2017
Available online 13 June 2017
Keywords:
Data centre networks
Switch-centric data centre networks
Fat-Trees
Combinatorial designs
Bipartite graphs
Path diversity
Abstract
The theory of combinatorial designs has recently been used in order to build switch-centric data centre networks incorporating a large number of servers, in comparison with the popular Fat-Tree data centre network. We clarify and extend these results and prove that in these data centre networks: there are pairwise link-disjoint paths joining all the servers adjacent to some switch with all the servers adjacent to any other switch; and there are pairwise link-disjoint paths from all the servers adjacent to some switch to any identically-sized collection of target servers where these target servers need not be adjacent to the same switch. In both cases, we always control the path lengths. Our constructions and analysis are undertaken on bipartite graphs with the applications to data centre networks being easily derived. Our results show the potential of the application of results and methodologies from combinatorics to data centre network design.
© 2017 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
1. Introduction
1.1. The data centre network context
Data centres are expanding both in terms of their size and their importance as computational platforms for cloud computing, web search, social networking, and so on. There is an increasing demand that data centres incorporate more and more servers but so that overall computational efficiency is not compromised through excessive traffic. A key factor as to the eventual performance of a data centre is the data centre network (DCN); that is, the interconnection fabric of the servers and switches within the data centre. As we strive to incorporate more and more servers, new topologies are being developed so as to cope with the increase in scale and best utilize the additional computational power. It is with topological aspects of DCNs that we are concerned in this paper.
The traditional design of a DCN is switch-centric so that the routing intelligence resides amongst the switches, with the servers behaving only as computational nodes. In switch-centric DCNs, there are no direct server-to-server links; only server-to-switch and switch-to-switch links. Switch-centric DCNs are traditionally tree-like with servers located at the ‘leaves’ of the tree-like structure. Examples include ElasticTree [1], VL2 [2], HyperX [3], Portland [4], and Flattened Butterfly [5], although the dominating switch-centric DCN is Fat-Tree [6]. Whilst it is generally acknowledged that tree-like, switch-centric
✩ This work was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) grant ‘Interconnection Networks: Practice unites with Theory (INPUT)’ [grant number EP/K015680/1].
✩✩ A preliminary version of this paper appeared as an extended abstract in the Proceedings of 20th International Symposium on Fundamentals of Computation Theory (A. Kosowski, I. Walukiewicz, eds.), Gdansk, Poland, August 17–19 2015, Lecture Notes in Computer Science, Volume 9210, Springer, 2015, 283–295.
E-mail address: i.a.stewart@durham.ac.uk.
http://dx.doi.org/10.1016/j.jcss.2017.05.015
concerned here.
It is extremely difficult to design computationally efficient (switch-centric) DCNs so as to incorporate large numbers of servers as there are many additional considerations to take into account. For example, switches and (especially) servers in data centres have a limited number of ports with a consequence being that the more servers there are, the greater the average or worst-case link-count between two distinct servers; hence, there is a packet latency overhead to be borne. Also, so as to better support routing, fault-tolerance, and load-balancing, we would prefer that there are numerous alternative paths within the DCN joining any two distinct servers; that is, that there is path diversity. Irrespective of the DCN paradigm within which one works, there are many other design parameters to bear in mind relating to, for example, (incremental) scalability, throughput, cost, oversubscription, energy consumption, latency, and security (see, for example, [8,9] for an overview). The upshot is that the DCN designer has to simultaneously secure a number of performance characteristics, some of which are competing against each other; this makes the DCN design space complex and difficult to work in.
1.2. Using combinatorial designs to build DCNs
A recent proposal in [10] advocated the use of combinatorial design theory in order to design switch-centric DCNs; these DCNs have beneficial properties as regards incorporating more servers and possessing path diversity yet it is possible to limit the worst-case link-length of server-to-server shortest paths (and so, ultimately, achieve better control over packet latency in a DCN). The use of combinatorial designs within the study of general interconnection networks is not new and originated in [11] where the targeted networks involved processors communicating via buses (the reader is referred to [12] for a range of applications of combinatorial design theory within computer science). A hypergraph framework was developed in [11] where the hypergraph nodes represent the processors and the hyperedges the buses. Likewise, an analogous framework was developed in [10] but where the hypergraph nodes and edges both represent switches so that the pendant servers ‘hang off’ some of the switches (we present a detailed description of this framework in Section 3.3). In [10], the ubiquitous switch-centric Fat-Tree DCN from [6] was used as a yardstick against which to compare the new DCN designs developed in [10] under the normalization that all DCNs are to have the same worst-case link-length of server-to-server shortest paths, namely 6, as this equals the worst-case link-length of server-to-server shortest paths in the Fat-Tree DCN. It was shown that more servers can be incorporated within the new DCNs yet, crucially, the resulting DCNs have good path diversity. It is the algebraic properties (relating to symmetry and balance) possessed by transversal designs that enable the constructions and analysis as described in [10]. One slight difficulty with the original and novel approach taken in [10] is that some of the path diversity results derived there are incorrect (as we explain later in Section 4.1). Not only has combinatorial design theory featured as regards the design of interconnection networks but other aspects of algebra have too; indeed, there has been recent work on the relevance of Cayley graphs, Hamming graphs, and hyperbolicity to DCN design (see, e.g., [13–15]).
1.3. Our contribution
In this paper we return to the framework of [10] and formulate and prove path diversity results for the switch-centric DCNs constructed using the methods of that paper. As our concern is entirely with topological properties of DCNs, henceforth we abstract our DCNs as undirected graphs where the nodes are to represent servers and switches and the edges point-to-point links. The crux of the construction in [10] is (essentially) to build a bipartite graph using a systematic method, called the 3-step method, involving a different ‘base’ bipartite graph and a transversal design, and to convert the resulting bipartite graph into switch-centric DCNs (in a variety of ways). After explaining how hypergraphs and transversal designs can all be considered as bipartite graphs in Section 2, in Section 3 we provide a detailed description of the 3-step framework from [10] and explain how the bipartite graphs constructed are converted into switch-centric DCNs. Next, we revisit the results from [10]. In particular, in Section 4 we correct and extend the analysis in [10] and affirm that using the 3-step method from [10], we can build switch-centric DCNs: with many more servers than the Fat-Tree DCN yet so that, like the Fat-Tree, every server-to-server shortest path has length at most 6; and so that (assuming some numeric conditions on the base bipartite graph and the transversal design) we can find pairwise link-disjoint paths from all of the servers adjacent to a particular switch to all of the servers adjacent to any other switch. Moreover, we provide an upper bound on the lengths of the paths constructed in terms of the diameter of the base bipartite graph (see Theorem 4). We also deal with a scenario missing from [10] (see part (b) of Theorem 4). As we explain, the general situation is more subtle than was assumed in [10].
The DCN path diversity, as we have described it above, comes about from building bipartite graphs (which are subsequently converted to DCNs) so that given any two distinct nodes, there are numerous node-disjoint paths joining these two nodes; that is, these bipartite graphs have one-to-one path diversity. In Section 5, we go on to show that we can actually build numerous edge-disjoint paths from a source node to different destination nodes in our bipartite graphs; that is, we have one-to-many path diversity (one-to-one and one-to-many path diversity are defined in Section 2.1). The DCNs obtained from these bipartite graphs are such that (assuming some numeric conditions on the base bipartite graph and the transversal design) we can find pairwise link-disjoint paths from all of the servers adjacent to some switch to any identically-sized collection of servers (irrespective of which switch they are adjacent to). Consequently, we show that our DCNs provide support for additional communication patterns that are prevalent within data centre networks. It should be noted that one-to-many and many-to-many communication patterns are commonplace in data centres; for example, in ‘big data’ processing applications such as MapReduce, Hadoop, Spark, and Storm (see, e.g., the survey [16]).
This paper is unashamedly theoretical. However, we demonstrate that not only is there interesting combinatorics within the practical world of DCN design but that combinatorial mathematics can potentially contribute to the DCN design space on a practical level. We feel that the mathematical aspects of DCNs have so far remained almost completely unexamined and we advocate a closer theoretical scrutiny of DCNs both as a model of computation and in relation to the vast swathes of research on general interconnection networks. We mention some practical considerations and directions for further research in the Conclusion.
2. Basic concepts
We begin by briefly reviewing some architectural aspects of switch-centric DCNs that are pertinent to our subsequent research. We then move on to the discrete structures featuring in [10,11], namely hypergraphs, bipartite graphs, and transversal designs. So that we might fully describe and understand the constructions in [10,11], as well as our own upcoming analysis of switch-centric DCNs, we eventually amalgamate hypergraphs, bipartite graphs, and transversal designs so that by the end of this section, we will have developed an encompassing bipartite graph framework for the design of switch-centric DCNs. The reader should be aware that it will not be until Section 3.3 that we transform the bipartite graphs that we have been constructing up until then into switch-centric DCNs. As a hint as to this transformation (and so that the reader does not lose sight of our eventual goal), roughly speaking we shall regard the nodes of one of our constructed bipartite graphs as switch-nodes and attach to some of these switch-nodes additional server-nodes in order to get our switch-centric DCN. General graph-theoretic concepts can be obtained in [17].
2.1. Switch-centric DCNs
A switch-centric DCN is abstracted as a graph (which we also refer to as a DCN) where the nodes are partitioned into two sets: there are server-nodes; and there are switch-nodes. Of course, the server-nodes correspond to servers in the DCN and the switch-nodes to switches; note that immediately there are practical design limitations imposed by the number of ports in a real switch and the number of NIC ports in a real server (we sometimes refer to the number of ports of a switch-node rather than its degree). Furthermore, in switch-centric DCNs there are no links joining one server-node directly to another server-node (because all routing within a switch-centric DCN falls within the purview of the switches). Of concern to us in this paper will be incorporating a comparatively large number of server-nodes within our DCNs but so that the maximum length of a shortest path joining any two server-nodes, that is, the diameter of the DCN, is kept within a given bound, where the length of such a path is the number of distinct links on the path. Essentially, we will be comparing DCNs as to how many server-nodes they incorporate but when their diameters are normalized.
However, DCNs must also possess other properties to make them usable within a data centre context. For example, they also need to: be scalable and incrementally scalable (that is, have the capacity to cope with increases in components and data); have low message latency; provide for high overall throughput (under a range of traffic patterns); be able to tolerate (a limited number of) faults; be energy efficient; be both economically and physically viable; and support virtualization (that is, the partitioning of the DCN into virtual networks on a dynamic basis), amongst many other things. Support for some of these properties can be measured using graph theory; for example, the diameter of the DCN gives guidance as regards the expected message latency. Of particular interest to us will be path diversity which we define (somewhat informally) as the capacity to send data without inducing additional congestion or so as to cope with existing congestion or faults. There are two contexts of interest to us: the one-to-one (or unicast) context, when a source server-node wishes to send data to a destination server-node by the utilization of independent paths (we will return to what we mean by ‘independent’ soon); and the one-to-many (or multicast) context, when a source server-node wishes to send data to a number of destination server-nodes so that the different transmissions do not induce congestion. Path diversity is highly relevant to a number of the above properties such as latency and scalability, where different paths are used to split and balance loads, and fault tolerance, where different paths provide alternative means of transit in the case of faults. Path diversity is important in both the one-to-one and one-to-many contexts, with this importance accentuated in the latter context when a data centre needs to support data replication and applications like MapReduce [18]. An additional dimension is added with respect to virtualization when we have virtual machines embedded within a data centre that share the same resources but require traffic to be routed via different routes. As we shall soon see, just as with latency, the independence of paths can be considered graph-theoretically.
number of hyperedges containing it and the rank of a hyperedge is its size as a subset of $V$. A hypergraph is regular (resp. uniform) if every node has the same degree (resp. every hyperedge has the same rank) with this degree (resp. rank) being the degree (resp. rank) of the hypergraph. Every graph $G = (V, E)$ has a natural representation as a hypergraph: the nodes of the hypergraph are $V$; and the hyperedges are $E$, where the hyperedge $e$ consists of the pair of nodes incident with the edge $e$ of $G$.
2.3. Hypergraphs and bipartite graphs
We can represent any hypergraph $H = (V, E)$ as a bipartite graph: the node set of the bipartite graph is $V \cup E$; and there is an edge $(v, e)$, for $v \in V$ and $e \in E$, in the bipartite graph if, and only if, $v \in e$ in the hypergraph. It is clear that this yields a one-to-one correspondence between hypergraphs and bipartite graphs (without isolated nodes) that come complete with a partition of the elements into a ‘left-hand side’, which will correspond to the nodes of the hypergraph, and a ‘right-hand side’, which will correspond to the hyperedges of the hypergraph (remember that in a hypergraph, every node is in at least one hyperedge and every hyperedge contains at least one node, so we cannot have isolated nodes in our bipartite graphs). We assume (henceforth) that every bipartite graph comes equipped with such a partition and for clarity from now on we refer to the nodes on the left-hand side as nodes and the nodes on the right-hand side as blocks (this is in keeping with our upcoming realisation of transversal designs as bipartite graphs). Likewise, we refer to the degree of a node as its degree and the degree of a block as its rank. A bipartite graph corresponding to a regular, uniform hypergraph of degree $d$ and rank $\Gamma$ is called a $(d, \Gamma)$-bipartite graph. Every bipartite graph (and so every hypergraph) also describes its dual bipartite graph (or alternatively its dual hypergraph) where the roles of the nodes on the left-hand side and the blocks on the right-hand side of the partition are reversed in the definition of the bipartite graph; so, for example, the dual bipartite graph of a $(d, \Gamma)$-bipartite graph is regular of degree $\Gamma$ and uniform of rank $d$.
Note that if $G$ is a bipartite graph then it corresponds to a hypergraph via our representation above and it also corresponds to a hypergraph via the natural representation highlighted in Section 2.2. The two hypergraphs corresponding to the same bipartite graph are different and we are never interested in the representation of a bipartite graph as a hypergraph via the natural representation of Section 2.2.
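The correspondence of Section 2.3 is easy to make concrete. The following is a minimal sketch, assuming a hypergraph is supplied as a dictionary from block labels to sets of node labels; the function names and the small example hypergraph are our own illustrative choices and do not come from [10,11].

```python
from collections import defaultdict

def hypergraph_to_bipartite(hyperedges):
    """Build the incidence bipartite graph of a hypergraph.

    hyperedges: dict mapping a block label to the set of node labels it contains.
    Returns (adj_nodes, adj_blocks): node -> blocks and block -> nodes.
    """
    adj_nodes = defaultdict(set)
    adj_blocks = {}
    for block, members in hyperedges.items():
        adj_blocks[block] = set(members)
        for v in members:
            adj_nodes[v].add(block)  # edge (v, block) iff v lies in the hyperedge
    return dict(adj_nodes), adj_blocks

def dual(adj_nodes, adj_blocks):
    """The dual bipartite graph: nodes and blocks swap roles."""
    return adj_blocks, adj_nodes

def degree_and_rank(adj_nodes, adj_blocks):
    """Return (degree, rank) if the graph is regular and uniform, else None."""
    degrees = {len(blocks) for blocks in adj_nodes.values()}
    ranks = {len(members) for members in adj_blocks.values()}
    if len(degrees) == 1 and len(ranks) == 1:
        return degrees.pop(), ranks.pop()
    return None

# Hypothetical example: 6 nodes, 4 hyperedges, every node in 2 hyperedges of size 3.
example = {"e1": {1, 2, 3}, "e2": {4, 5, 6}, "e3": {1, 2, 4}, "e4": {3, 5, 6}}
nodes, blocks = hypergraph_to_bipartite(example)
print(degree_and_rank(nodes, blocks))         # degree and rank of the graph
print(degree_and_rank(*dual(nodes, blocks)))  # the dual swaps the two values
```

The example is a (2, 3)-bipartite graph in the terminology above, and its dual is regular of degree 3 and uniform of rank 2.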
2.4. Paths in hypergraphs
A path in some hypergraph $H = (V, E)$ (or the corresponding bipartite graph) is an alternating sequence of nodes and hyperedges so that all nodes are distinct, all hyperedges are distinct, and a node $v \in V$ follows or precedes a hyperedge $e \in E$ in the sequence only if $v \in e$ in the hypergraph (or $(v, e)$ is an edge in the corresponding bipartite graph). The first element of some path is the source and the final element the destination. The length of any path is its length in the bipartite graph corresponding to the hypergraph, and the distance between two distinct elements of $V \cup E$ is the length of a shortest path joining these two elements in the corresponding bipartite graph. The diameter of $H$ is the maximum of the distances between every pair of distinct nodes of $V$, and the line-diameter of $H$ is the maximum of the distances between every pair of distinct hyperedges of $E$.
We have two remarks. First, we have traditional notions of diameter and line-diameter in any bipartite graph. Note that our notion of diameter in a bipartite graph, which is the longest shortest node-to-node path (and so ignores node-to-block and block-to-block paths), is different from the usual graph-theoretic notion of diameter in a bipartite graph (the same comment can be made as regards line-diameter). When we talk of the diameter or line-diameter of a bipartite graph, we mean with respect to our notion of diameter or line-diameter, respectively; if we need to talk of the traditional notion of graph diameter then we will make this clear. Second, our notion of path length in a hypergraph differs from that in [10] where the length is the number of nodes (resp. hyperedges) in a hyperedge-to-hyperedge (resp. node-to-node) path. There is no real consequence to this difference; essentially, our notion of path length is double that in [10]. However, we shall soon move to an exclusively bipartite graph-theoretic formulation in which our notion of length is the natural one to adopt.
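To make the distinction between our diameter, our line-diameter, and the traditional graph diameter tangible, here is a small sketch using breadth-first search. It assumes a connected bipartite graph given as an adjacency dictionary; the helper names and the toy graph are hypothetical and serve only as illustration.

```python
from collections import deque

def distances_from(source, adj):
    """Breadth-first search distances in a graph given as {vertex: set(neighbours)}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def restricted_diameter(adj, side):
    """Maximum distance between two vertices of `side` (graph assumed connected)."""
    side = list(side)
    best = 0
    for s in side:
        dist = distances_from(s, adj)
        best = max(best, max(dist[t] for t in side))
    return best

# Hypothetical toy graph: nodes a, b and blocks P = {a, b}, Q = {a}, R = {b}.
adj = {"a": {"P", "Q"}, "b": {"P", "R"},
       "P": {"a", "b"}, "Q": {"a"}, "R": {"b"}}
diameter = restricted_diameter(adj, ["a", "b"])            # node-to-node only
line_diameter = restricted_diameter(adj, ["P", "Q", "R"])  # block-to-block only
traditional = restricted_diameter(adj, adj.keys())         # all pairs of vertices
```

In this toy graph the node-to-node diameter is 2 (via the block P), whereas the line-diameter and the traditional diameter are both 4 (the path Q, a, P, b, R).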
We shall be interested in building sets of paths in some hypergraph $H$ so that the paths might have the same sources or destinations; moreover, we shall require that these paths do not ‘interfere’ with one another (or are ‘independent’ as we mentioned earlier). We say that a set $P$ of paths in $H$ is:
• pairwise internally-disjoint if any source or destination of some path of $P$ only appears as a source or destination on any path of $P$, and any node or hyperedge that is not a source or destination appears on at most one path of $P$
• pairwise edge-disjoint if every pair $(v, e) \in V \times E$ is such that $v$ follows or precedes $e$ on some path at most once across all paths from the set $P$.
2.5. Hypergraphs as switch-centric DCNs
Given some hypergraph $H = (V, E)$, our intention is to ultimately transform this hypergraph into a DCN by considering both the nodes and the hyperedges as switch-nodes so that the switch-nodes corresponding to the nodes (which we shall later call the level-1 switch-nodes, with the switch-nodes corresponding to the hyperedges the level-2 switch-nodes) also have adjacent server-nodes, which we have yet to define (this intention is best appreciated by working with the corresponding bipartite graph rather than the hypergraph; the upcoming Fig. 5 provides a visualization of what we mean). Consequently, we can regard a hypergraph $H$ as modelling a switch-centric DCN $N$ where there are two levels of switch-nodes. Suppose that we have a set $P$ of pairwise internally-disjoint paths from a node $u$ of $H$ to another node $v$ of $H$. This translates to a set $P$ of pairwise internally-disjoint paths in $N$ from the corresponding level-1 switch-node $u$ to the corresponding level-1 switch-node $v$. We can use the paths of $P$ for the simultaneous transfer of data from server-nodes adjacent to the level-1 switch-node $u$ to server-nodes adjacent to the level-1 switch-node $v$ (see Fig. 5). In order to facilitate this data transfer we require that level-1 switch-nodes are non-blocking whereas the level-2 switch-nodes can be blocking; recall that a switch-node is non-blocking when no contention arises when simultaneously sending data through the switch-node on two distinct input links and out on two distinct output links, and blocking otherwise. This is because we need to be able to simultaneously move data from all servers adjacent to the level-1 switch-node $u$ in $N$ across the switch-node and out along different links (the same can be said for $v$). If our paths in $H$ are only pairwise edge-disjoint then we require that level-1 and level-2 switch-nodes of $N$ are non-blocking (as we might have switch-nodes appearing on more than one path of $P$, even though no link does).
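The two notions of independence from Section 2.4, on which the blocking requirements above depend, can be checked mechanically. The sketch below assumes each path is a list of labels alternating between nodes and hyperedges, with node labels and hyperedge labels drawn from disjoint sets; the function names and example paths are ours.

```python
from collections import Counter

def pairwise_internally_disjoint(paths):
    """Endpoints occur only as endpoints; other elements occur on at most one path."""
    endpoints = {p[0] for p in paths} | {p[-1] for p in paths}
    seen_internal = set()
    for p in paths:
        for x in p[1:-1]:
            if x in endpoints or x in seen_internal:
                return False
            seen_internal.add(x)
    return True

def pairwise_edge_disjoint(paths):
    """No (node, hyperedge) incidence is traversed more than once over all paths."""
    used = Counter()
    for p in paths:
        for x, y in zip(p, p[1:]):
            used[frozenset((x, y))] += 1
    return all(count == 1 for count in used.values())

# Hypothetical paths from node u to node v through hyperedges e1..e4 and node w.
good = [["u", "e1", "v"], ["u", "e2", "w", "e3", "v"]]
shared_node = [["u", "e1", "w", "e2", "v"], ["u", "e3", "w", "e4", "v"]]
shared_link = [["u", "e1", "v"], ["u", "e1", "w", "e2", "v"]]
```

Here `good` is both internally-disjoint and edge-disjoint, `shared_node` is edge-disjoint only (the node w lies on both paths, so every switch-node would need to be non-blocking), and `shared_link` is neither.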
2.6. Transversal designs
The notion of a transversal design is crucial to what follows.
Definition 1. Let $k, \Gamma \geq 2$. A $[\Gamma, k]$-transversal design $T$ is a triple $(X, D, U)$ where: $|X| = k\Gamma$; $D = (D_1, D_2, \ldots, D_\Gamma)$ is a partition of $X$ into $\Gamma$ equal-sized groups (each of size $k$); and $U = \{U_j : j = 1, 2, \ldots, k^2\}$ is a family of $k^2$ subsets of $X$, each of size $\Gamma$ and called a block, so that
• $|D_i \cap U_j| = 1$, for $i = 1, 2, \ldots, \Gamma$, $j = 1, 2, \ldots, k^2$
• each pair of elements $\{x_i, x_j\}$, where $x_i \in D_i$, $x_j \in D_j$ and $i \neq j$, is contained in exactly 1 block.
We adopt a graph-theoretic perspective on transversal designs as defined in Definition 1: we think of the $[\Gamma, k]$-transversal design $T$ as a bipartite graph where the elements of $X$ (resp. $U$) lie on the left-hand side (resp. right-hand side) of the partition, and so are called nodes (resp. blocks) within the bipartite graph, and so that in this bipartite graph there is an edge $(p, Q)$, for $p \in X$ and $Q \in U$, if, and only if, in the transversal design the element $p$ is in the block $Q$. Note that the bipartite graph corresponding to the transversal design from Definition 1 is a $(k, \Gamma)$-bipartite graph. Henceforth, we adopt our bipartite graph framework and regard both hypergraphs and transversal designs as bipartite graphs (unless we state otherwise).
There is an intimate relationship involving transversal designs, orthogonal arrays and mutually orthogonal latin squares, although there is no need to give definitions here. However, it is well known: that there are $\Gamma$ mutually orthogonal latin squares of order $k$ if, and only if, there is a $[\Gamma + 2, k]$-orthogonal array if, and only if, there is a $[\Gamma + 2, k]$-transversal design; and that there are at most $k - 1$ mutually orthogonal latin squares of order $k$ (see, for example, [19]). Hence, if we have a $[\Gamma, k]$-transversal design then $\Gamma \leq k + 1$. Also, if $k$ is a prime power then a $[\Gamma, k]$-transversal design exists whenever $2 \leq \Gamma \leq k + 1$ (again, see [19]). We shall use these facts later on. The study of the existence of $[\Gamma, k]$-transversal designs, for various $\Gamma$ and $k$, is a long-standing area of research.
We require one final bit of notation. If $T$ is some transversal design, as in Definition 1, and $x$ and $y$ are nodes in distinct groups then we refer to the unique block adjacent to both $x$ and $y$ as the block generated by $x$ and $y$.
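As an illustration of Definition 1, the following sketch builds a transversal design in the restricted case where $k$ is prime and the number of groups is at most $k$, using the standard 'lines over the integers modulo $k$' construction, and then checks both defining properties by brute force. The function names, the point labelling (group index, position), and the restriction to prime $k$ are assumptions of this sketch; the general prime-power case and the case of $k + 1$ groups are not covered.

```python
from itertools import combinations, product

def transversal_design(gamma, k):
    """Groups and blocks of a [gamma, k]-transversal design, k prime, 2 <= gamma <= k.

    A point is a pair (i, x): group index i and position x modulo k.
    Block (a, b) is the 'line' {(i, (a*i + b) mod k) : i = 0, ..., gamma-1}.
    """
    if not (2 <= gamma <= k) or any(k % q == 0 for q in range(2, k)):
        raise ValueError("this sketch needs k prime and 2 <= gamma <= k")
    groups = [[(i, x) for x in range(k)] for i in range(gamma)]
    blocks = [frozenset((i, (a * i + b) % k) for i in range(gamma))
              for a, b in product(range(k), repeat=2)]
    return groups, blocks

def is_transversal_design(groups, blocks):
    """Brute-force check of the two conditions of Definition 1."""
    k = len(groups[0])
    if len(blocks) != k * k:
        return False
    # every block meets every group in exactly one point
    if any(len(block & set(g)) != 1 for block in blocks for g in groups):
        return False
    # every pair of points from distinct groups lies in exactly one block
    for g1, g2 in combinations(groups, 2):
        for x, y in product(g1, g2):
            if sum(1 for block in blocks if x in block and y in block) != 1:
                return False
    return True

groups, blocks = transversal_design(3, 5)
print(is_transversal_design(groups, blocks))
```

The second condition holds because two points in distinct groups $i \neq j$ determine the slope $a$ and offset $b$ of a line uniquely when $k$ is prime, which is exactly the 'block generated by $x$ and $y$' of the text.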
3. The 3-step construction and its extensions
We now describe the 3-step construction for building bipartite graphs (or, equivalently, hypergraphs) by using a ‘base’ bipartite graph and a transversal design (which we think of as a bipartite graph). This construction originated in [11] and was used in [10]. We then explain how this construction was subsequently extended in [10] both by iteration and by composition so as to yield switch-centric DCNs.
Fig. 1. A $(d, \Gamma)$-bipartite graph $H_0$.
Fig. 2. A $[\Gamma, k]$-transversal design $T$.
3.1. The 3-step construction
The 3-step construction proceeds as follows.
Step 1: Let $H_0$ be a connected $(d, \Gamma)$-bipartite graph so that there are $n$ nodes (on the left-hand side of the partition, each of degree $d$) and $e$ blocks (on the right-hand side, each of rank $\Gamma$). Such an $H_0$ can be visualized as in Fig. 1 (ordinarily, we represent nodes as circles and blocks as squares).
Step 2: Let $T$ be a $[\Gamma, k]$-transversal design. In particular, there are $\Gamma$ groups of $k$ nodes (on the left-hand side) as well as $k^2$ blocks (on the right-hand side). Such a $T$ can be visualized as in Fig. 2. Build the bipartite graph $H$ as follows. For every node $p$ of $H_0$, introduce a group $G_p$ of $k$ nodes of $H$; we say that the group of nodes $G_p$ of $H$ is associated with the node $p$ of $H_0$. For every block $Q$ of $H_0$, adjacent to the nodes $p_1, p_2, \ldots, p_\Gamma$ in $H_0$, introduce a copy of $T$, denoted $T_Q$, rooted on the groups of nodes $G_{p_1}, G_{p_2}, \ldots, G_{p_\Gamma}$; so, associated with the block $Q$ of $H_0$, we have a set $B_Q$ of $k^2$ blocks in $H$. We refer to the groups of nodes $G_{p_1}, G_{p_2}, \ldots, G_{p_\Gamma}$ as the roots of the copy $T_Q$ of $T$ in $H$. Such a bipartite graph $H$ can be visualized as in Fig. 3 where two of the copies of $T$ are partially shown (note that they might have some roots in common but their respective sets of blocks are always disjoint as are their sets of edges). The bipartite graph $H_0$ provides a template as to how we introduce copies of $T$ to form $H$. Note that:
• each node of $H$ can be indexed as $a_{p,j}$, where $p \in \{p_i : i = 1, 2, \ldots, n\}$ and $j \in \{1, 2, \ldots, k\}$, so that $p$ is the node of $H_0$ to which the group $G_p$ in which $a_{p,j}$ sits is associated and $j$ is the index of the node $a_{p,j}$ in this group
• each block of $H$ can be indexed as $B_{Q,U}$, where $Q \in \{Q_i : i = 1, 2, \ldots, e\}$ and $U \in \{1, 2, \ldots, k^2\}$, so that $Q$ is the block of $H_0$ to which the set of blocks $B_Q$ in which $B_{Q,U}$ sits is associated and $U$ is the block of $T$ to which $B_{Q,U}$ corresponds.
In addition, each node of $T$ can be indexed $u_{i,j}$, where $i \in \{1, 2, \ldots, \Gamma\}$ and $j \in \{1, 2, \ldots, k\}$, so that $D_i$ is the group of nodes in which $u_{i,j}$ sits and $j$ is the index of $u_{i,j}$ in that group.
Fig. 3. Amalgamating $H_0$ and $T$ to get $H$.
Step 3: Let $H^*$ be the bipartite graph obtained from the bipartite graph $H$ by reversing the roles of nodes and blocks (so, $H^*$ is the dual bipartite graph of $H$). Note that the bipartite graph $H^*$ is regular of degree $\Gamma$ and uniform of rank $dk$.
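Steps 1 to 3 can be sketched in a few lines. The sketch below assumes that the base graph is given as a dictionary from each block $Q$ to the ordered list of its adjacent nodes, and that the transversal design is the prime-order 'lines' design (one of many possible choices); the names `lines_design`, `two_step` and `dualise`, and the tiny base graph, are hypothetical.

```python
from itertools import product

def lines_design(gamma, k):
    """Blocks of a [gamma, k]-transversal design for prime k and gamma <= k.

    Block (a, b) is recorded as the tuple of positions it selects,
    namely position (a*i + b) mod k in group i.
    """
    return {(a, b): tuple((a * i + b) % k for i in range(gamma))
            for a, b in product(range(k), repeat=2)}

def two_step(h0_blocks, design):
    """Steps 1 and 2: one copy of the design per block of the base graph.

    h0_blocks: {Q: [p_1, ..., p_gamma]}, the nodes adjacent to each block Q.
    Returns the blocks of H as {(Q, U): set of nodes (p, j)}.
    """
    h = {}
    for q, roots in h0_blocks.items():
        for u, positions in design.items():
            # block B_{Q,U} picks node a_{p_i, j_i} from each root group G_{p_i}
            h[(q, u)] = {(p, j) for p, j in zip(roots, positions)}
    return h

def dualise(blocks):
    """Step 3: swap the roles of nodes and blocks."""
    dual = {}
    for b, members in blocks.items():
        for v in members:
            dual.setdefault(v, set()).add(b)
    return dual

# Hypothetical base graph: a triangle viewed as a (2, 2)-bipartite graph, n = e = 3.
h0 = {"Q0": ["p0", "p1"], "Q1": ["p1", "p2"], "Q2": ["p2", "p0"]}
h = two_step(h0, lines_design(2, 3))   # blocks of H, with k = 3
h_star_blocks = dualise(h)             # blocks of H* are the nodes of H
```

With $n = e = 3$, $d = \Gamma = 2$ and $k = 3$, the graph $H$ has $ek^2 = 27$ blocks of rank 2 and $nk = 9$ nodes of degree $dk = 6$, so $H^*$ is regular of degree 2 and uniform of rank 6.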
We refer to the $(dk, \Gamma)$-bipartite graph $H$ (resp. the $(\Gamma, dk)$-bipartite graph $H^*$) constructed above as having been constructed by the 2-step (resp. 3-step) method using the $(d, \Gamma)$-bipartite graph $H_0$ and the $[\Gamma, k]$-transversal design $T$. Note that $H$ (resp. $H^*$) has $nk$ nodes (resp. $ek^2$ nodes) and $ek^2$ blocks (resp. $nk$ blocks).
Our intention with our constructions is to ultimately design switch-centric DCNs with beneficial properties (as we outlined in Section 2). Whilst there are many properties we would like our DCNs to have, it is important that DCNs can integrate a large number of server-nodes so that the server-node-to-server-node distances are short and so that there is redundancy as to which (short) server-node-to-server-node routes we choose to use. In our framework of bipartite graphs, this translates as building bipartite graphs with a large number of nodes and with redundant (short) node-to-node paths. As a first step, the following result was proven in [11] (it is actually derivable from the proofs of our upcoming results) and allows us control over the length of shortest block-to-block paths in 2-step constructions (and so shortest node-to-node paths in 3-step constructions).
Theorem 2 ([11]). Suppose that the $(dk, \Gamma)$-bipartite graph $H$ has been constructed using the 2-step method using the $(d, \Gamma)$-bipartite graph $H_0$ and the $[\Gamma, k]$-transversal design $T$. If $H_0$ has line-diameter $\lambda \geq 4$ then $H$ has line-diameter $\lambda$.
Of course, if $H^*$ is the dual bipartite graph of $H$ in Theorem 2 then it has diameter $\lambda$. We reiterate that our notion of diameter and line-diameter differs from that in [11,10] (where the length of a block-to-block path is the number of nodes on that path; so, in [11,10] the bound $\lambda \geq 4$ in our Theorem 2 appears as $\lambda \geq 2$).
3.2. Iteration
We can iterate the 3-step construction (as was done in [10]). Note that if $H_0$ is a $(d, \Gamma)$-bipartite graph of line-diameter $\lambda \geq 4$, with $n$ nodes and $e$ blocks, then the bipartite graph $H_1$ resulting from the 2-step construction (using $H_0$ and some $[\Gamma, k]$-transversal design $T$) is a $(dk, \Gamma)$-bipartite graph of line-diameter $\lambda$. So, repeating the 2-step construction but with $H_1$ replacing $H_0$ (we keep the same $T$, although we do not have to) yields a $(dk^2, \Gamma)$-bipartite graph $H_2$ of line-diameter $\lambda$. By iterating this construction, we can clearly obtain a $(dk^i, \Gamma)$-bipartite graph $H_i$ of line-diameter $\lambda$. Converting $H_i$ into $H_i^*$ results in a bipartite graph with $ek^{2i}$ nodes, with $nk^i$ blocks, with diameter $\lambda$, and that is regular of degree $\Gamma$ and uniform of rank $dk^i$.
3.3. Composition
We are now in a position to transform our bipartite graphs into switch-centric DCNs. As well as the constructions, and their associated proofs, that were presented in [10], new methods of composing bipartite graphs (built according to the 3-step construction) so as to obtain switch-centric DCNs were also derived. In [10], 4 such methods were given: Methods M1, M2 and M3 are different cases of Method A, below; and Method M4 is Method B.
Fig. 4. Building a switch-centric DCN via Method A when $c > 1$.
Fig. 5. Building a switch-centric DCN via Method A when $c = 1$.
Fig. 6. Building a switch-centric DCN via Method B.
In what follows, let $H$ be a $(\Gamma, \delta)$-bipartite graph where $\Gamma < \delta$ and where there are $n$ nodes and $e$ blocks.
Method A: We take $c$ copies of $H$ where $\delta - c\Gamma > 0$ and $c \geq 1$. For each node $u$ of $H$: we remove the corresponding node in each of the $c$ copies of $H$ and introduce a new switch-node (common to all copies of $H$); we make all of the $c\Gamma$ edges incident with the $c$ original nodes incident with this new switch-node; and we attach $\rho = \delta - c\Gamma$ pendant server-nodes to the new switch-node. All blocks of $H$ are considered as switch-nodes. We follow [10] and call the new switch-nodes level-1 switch-nodes, and the original switch-nodes level-2 switch-nodes. The construction of the switch-centric DCN $N(H)$ from $H$ via this method can be visualised as in Fig. 4, where we only show the construction for the $c$ nodes corresponding to one node of $H$. Note that every switch-node of $N(H)$ has $\delta$ ports. Also, there is some choice as regards the parameter $c$ (so that choosing different values for $c$ yields different values for $\rho$). We illustrate the special case when $c = 1$ in Fig. 5, where $H$ is a $(3, 5)$-bipartite graph. The general case when $c \geq 1$ corresponds to Method M2 of [10]; the special case when $c = 1$ corresponds to Method M1; and the special case when $c = \lfloor \delta / (2\Gamma) \rfloor$ corresponds to Method M3. In this latter case, the aim is to ensure that every level-1 switch-node is adjacent to roughly the same number of level-2 switch-nodes as it is server-nodes. Note that: the number of server-nodes in $N(H)$ is $n(\delta - c\Gamma)$; the number of level-1 switch-nodes is $n$; and the number of level-2 switch-nodes is $ce$.
Method B: We now work with a switch-centric DCN as constructed by Method A. Let every level-1 switch-node have $\rho$ adjacent server-nodes. Suppose that there is an even number of level-1 switch-nodes. Partition the set of level-1 switch-nodes into pairs. For each pair of switch-nodes $(S, S')$: remove $\lfloor \rho/2 \rfloor$ server-nodes that are adjacent to $S$ and remove $\lceil \rho/2 \rceil$ server-nodes that are adjacent to $S'$; and make every server-node that is adjacent to the switch-node $S$ or the switch-node $S'$ also adjacent to the other switch-node. Note that the number of ports of any switch-node has not changed but that every server-node is now adjacent to 2 switch-nodes. The philosophy behind this construction is to better tolerate the failure of a level-1 switch-node. The construction can be visualized as in Fig. 6 where paired level-1 switch-nodes have the same shade of grey and where $\rho = 3$.
Table 1
Comparing switch-centric DCNs built with switch-nodes with 64 ports.
Network                # switch ports   Diameter   # server-nodes   # switch-nodes
Fat-Tree               64               6          65,536           5,120
$H^*$                  64               4          54,720           6,840
$N_A^1(H^*)$           64               6          3,064,320        61,560
$N_A^2(H^*)$           64               6          437,760          102,600
$N_A^3(H^*)$           64               6          1,751,040        82,080
$N_B(H^*)$             64               6          1,532,160        61,560
$\bar{H}^*$            64               4          20,480           1,280
$N_A^1(\bar{H}^*)$     64               6          1,228,800        21,760
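The entries of Table 1 for the composed DCNs follow from the counts stated for Methods A and B, and recomputing them is a useful sanity check. In the sketch below the function and argument names are ours: `n` and `e` are the numbers of nodes and blocks of the bipartite graph being composed, `gamma` and `delta` its degree and rank, and `c` the number of copies taken by Method A.

```python
def method_a_counts(n, e, gamma, delta, c):
    """Server-node and switch-node counts for Method A with c copies."""
    rho = delta - c * gamma          # pendant server-nodes per level-1 switch-node
    if c < 1 or rho <= 0:
        raise ValueError("need c >= 1 and delta - c*gamma > 0")
    return {"servers": n * rho, "switches": n + c * e}

def method_b_counts(n, e, gamma, delta, c=1):
    """Counts after pairing level-1 switch-nodes as in Method B (n must be even)."""
    if n % 2:
        raise ValueError("Method B needs an even number of level-1 switch-nodes")
    rho = delta - c * gamma
    if c < 1 or rho <= 0:
        raise ValueError("need c >= 1 and delta - c*gamma > 0")
    # each pair starts with 2*rho server-nodes and keeps rho of them
    return {"servers": (n // 2) * rho, "switches": n + c * e}

# The first dual graph of Table 1: 54,720 nodes, 6,840 blocks, degree 8, rank 64.
n1a = method_a_counts(54720, 6840, 8, 64, 1)
n2a = method_a_counts(54720, 6840, 8, 64, 7)
n3a = method_a_counts(54720, 6840, 8, 64, 4)
nb = method_b_counts(54720, 6840, 8, 64, 1)
# The second dual graph of Table 1: 20,480 nodes, 1,280 blocks, degree 4, rank 64.
n1a_bar = method_a_counts(20480, 1280, 4, 64, 1)
```

These calls reproduce the five composed rows of Table 1; for instance, with $c = 1$ each level-1 switch-node of the first network keeps $64 - 8 = 56$ ports for server-nodes.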
3.4. Some illustrations of DCNs
In [10], switch-centric DCNs constructed using the 3-step method allied with Methods A and B were favourably compared with the 3-level Fat-Tree DCN from [6] with regard to the number of server-nodes therein when the diameter and the number of ports of a switch-node are held constant. The reader is referred to [6,10] for full details as regards the topology of Fat-Tree and to Tables 2–4 in [10] for the complete comparison; however, we include a replicated table here purely for illustrative purposes. In Table 1 (which is Table 2 from [10]): the number of ports of any switch-node is forced to be 64; the diameters of the DCNs resulting from using the 3-step method, iteration and composition are forced to be (at most) 6 (like that of Fat-Tree); and the numbers of server-nodes and switch-nodes in the resulting DCNs are as given (note that the length of a server-node-to-server-node path as defined in [10] is the number of switch-nodes on it, which is one less than our notion of length which is the number of links on the path).
• The bipartite graph $H^*$ is obtained using the 3-step method starting with an $(8,8)$-bipartite graph $H_0$ that has 855 nodes, 855 blocks, and diameter and line-diameter 4 (such a bipartite graph $H_0$ exists; see [20]), and an $[8,8]$-transversal design $T$. The DCN $H^*$ in Table 1 is the DCN obtained by simply regarding every node of the bipartite graph $H^*$ as a server-node (note that in this DCN we require that every server-node has 8 NIC ports); the DCN $N^A_1(H^*)$ (resp. $N^A_2(H^*)$, $N^A_3(H^*)$) is obtained by employing Method A with $c=1$ (resp. $c=7$, $c=4$); and the DCN $N^B(H^*)$ is obtained by employing Method B with $N^A_1(H^*)$ (note that the number of switch-nodes entry in Table 2 in [10] is incorrect).
• The bipartite graph $\bar{H}$ is obtained using the 3-step method iterated twice, starting with a $(4,4)$-bipartite graph $\bar{H}_0$ that has 80 nodes, 80 blocks, and diameter and line-diameter 4 (such a bipartite graph $\bar{H}_0$ exists; see [20]), and a $[4,4]$-transversal design $\bar{T}$ (actually, in [10] this transversal design is not mentioned; it does, however, exist). The DCN $\bar{H}^*$ in Table 1 is the DCN obtained by simply regarding every node of the bipartite graph $\bar{H}^*$ as a server-node (note that the number of server-nodes entry in Table 2 in [10] is incorrect, though the correct number is stated in the text); and the DCN $N^A_1(H^*)$ is obtained by employing Method A with $c=1$ (note that the numbers of server-nodes and of switch-nodes entries in Table 2 in [10] are incorrect).

It is clear from Table 1 (and from [10]) that we can build much bigger server-centric DCNs using the 3-step method and the subsequent iterations and compositions than Fat-Tree but without increasing the diameter (which is a proxy for latency); of course, we would wish the new DCNs to have other properties that make them attractive within a data centre context. Establishing such properties was essentially the whole point of [10] and we continue with this line of research in what follows. Note that we provide additional illustrations of our constructions of switch-centric DCNs, in tandem with our upcoming results, in Section 4.3.
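For orientation, the size of the Fat-Tree baseline used in this comparison can be computed directly. The sketch below assumes the standard 3-level Fat-Tree built from switches with an even number of ports (which we take to be the construction of [6]); the function name `fat_tree_3level` is our own, and the figures it returns are derived from that standard construction rather than copied from Table 1.

```python
def fat_tree_3level(ports: int) -> dict:
    """Sizes of the standard 3-level Fat-Tree built from `ports`-port switches.

    Assumption (not taken from the paper's Table 1): there are `ports` pods,
    each with ports/2 edge switches and ports/2 aggregation switches, plus
    (ports/2)^2 core switches; every edge switch hosts ports/2 servers.
    """
    if ports < 2 or ports % 2:
        raise ValueError("ports must be a positive even number")
    half = ports // 2
    servers = ports * half * half            # ports^3 / 4
    pod_switches = ports * (2 * half)        # edge + aggregation, all pods
    core_switches = half * half
    return {
        "servers": servers,
        "switches": pod_switches + core_switches,   # 5 * ports^2 / 4
        # server, edge, aggregation, core, aggregation, edge, server
        "diameter_links": 6,
    }


if __name__ == "__main__":
    print(fat_tree_3level(64))
```

With 64-port switches this gives the server and switch counts against which any DCN of diameter at most 6 is being measured.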
Before we move to our main results, let us comment on using the 2-step method as opposed to the 3-step method when building our switch-centric DCNs (the same comment was made in [10]). Note that when one uses the (iterated) 2-step method, whilst the rank of the resulting bipartite graph stays the same, the degree grows. Were we to attach server-nodes to the switch-nodes that replace the nodes of the 2-step bipartite graph $H$, rather than the 3-step bipartite graph $H^*$, the number of ports of the level-2 switch-nodes (which would be $\Delta$) would be much less than the number of ports of the level-1 switch-nodes. Hence, it makes more sense to proceed as we have done above.
4. One-to-one path diversity
So far, we have set the scene from [10] and described a method by which we can build bipartite graphs (the 3-step method) which can then be transformed into switch-centric DCNs with many more servers than Fat-Tree whilst maintaining the diameter of Fat-Tree, i.e., 6. However, as we mentioned earlier, there are many more aspects to the design of DCNs, an important one being path diversity. In what follows, we highlight some problems with the proofs of one-to-one path diversity in [10] for bipartite graphs built using the 3-step method. We then not only provide correct proofs as regards one-to-one path diversity but also extend and improve the analysis in [10] with new results. We end the section by applying our constructions so as to build DCNs with good one-to-one path diversity properties.
are claimed in the situation when the two blocks $B_{Q,U}$ and $B_{Q',U'}$ are such that $Q \neq Q'$ (recall our method of indexing in Section 3.1, which we adopt here). However, there are serious flaws in the proof of Theorem 3 of [10], so much so that the theorem is untrue. In short, Theorem 3 of [10] claims that if there are $\omega$ pairwise internally-disjoint paths in $H_0$ from $Q$ to $Q'$ then there are $\min\{\omega\Delta, k\omega\}$ pairwise internally-disjoint paths in $H$ from $B_{Q,U}$ to $B_{Q',U'}$. This does not make sense: the maximum number of pairwise internally-disjoint paths in $H$ from $B_{Q,U}$ to $B_{Q',U'}$ is $\Delta$ (as the bipartite graph $H$ has rank $\Delta$) and so we must have that $\min\{\omega\Delta, k\omega\} \leq \Delta$. For instance, in Example 1 of [10], the bipartite graph $H_0$ is the cycle of length 10 ($H_0$ is derived from the cycle of length 5 using its natural representation as a hypergraph; see Section 2.2), so that $d = \Delta = 2$, $n = e = 5$, and there are 2 internally-disjoint paths from any block of $H_0$ to any other block of $H_0$. A $[2,3]$-transversal design $T$ is used and the bipartite graph $H^*$ built by the 3-step method has rank 6 and degree 2. However, if Theorem 3 of [10] were true then there would be 4 pairwise disjoint paths from $B_{Q,U}$ to $B_{Q',U'}$ in $H^*$, which clearly cannot be the case.
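The arithmetic behind this counterexample can be checked in a few lines. The sketch below is only an illustration: it assumes the bound of Theorem 3 of [10] reads $\min\{\omega\Delta, k\omega\}$ (our reading of the claim), and the function names are hypothetical.

```python
def claimed_paths_theorem3(omega: int, delta: int, k: int) -> int:
    """Path count min{omega*Delta, k*omega}, as Theorem 3 of [10] is read here."""
    return min(omega * delta, k * omega)


def max_disjoint_paths_from_block(delta: int) -> int:
    """Upper bound on internally-disjoint paths leaving a block.

    A block of a rank-Delta bipartite graph has exactly Delta neighbours,
    and internally-disjoint paths must leave it through distinct neighbours.
    """
    return delta


# Example 1 of [10]: H0 is the 10-cycle, so omega = 2 and Delta = 2,
# and a [2,3]-transversal design is used, so k = 3.
omega, delta, k = 2, 2, 3
claimed = claimed_paths_theorem3(omega, delta, k)
bound = max_disjoint_paths_from_block(delta)
print(claimed, bound, claimed <= bound)  # prints: 4 2 False
```

The claimed count of 4 exceeds the bound of 2 imposed by the number of neighbours, which is the contradiction described above.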
4.2. The one-to-one scenario
We now resurrect (some of) the proofs of the main results from [10] and extend the results claimed in that paper. The following lemma proves most useful.
Lemma 3. Let $T$ be some $[\Delta,k]$-transversal design with groups of nodes $\{D_1, D_2, \ldots, D_\Delta\}$. Let $U$ be some block of $T$. For each $i \in \{1, 2, \ldots, \Delta\}$, let $r_i \in D_i$ be the unique node of $D_i$ that is adjacent to $U$. Set $R = \{r_i : i = 1, 2, \ldots, \Delta\}$. Let $P$ be a set of distinct pairs of nodes so that: exactly one node of any pair in $P$ is in $R$ and no node of $R$ is in more than one pair of $P$; and no pair in $P$ is such that both nodes lie in the same group. The blocks generated by the pairs in $P$ are all distinct and different from $U$.

Proof. Suppose that $\{r_i, x\} \in P$, where $x \in D_l \setminus R$ with $l \neq i$ and where $i \in \{1, 2, \ldots, \Delta\}$. Let $U_{r_i,x}$ be the block generated by $r_i$ and $x$. If $U_{r_i,x} = U$ then $U$ is adjacent to the distinct nodes $r_l$ and $x$ in $D_l$, which yields a contradiction. Suppose that $\{r_j, y\} \in P \setminus \{\{r_i, x\}\}$, where $j \in \{1, 2, \ldots, \Delta\}$. Let $U_{r_j,y}$ be the block generated by $r_j$ and $y$. Suppose that $U_{r_i,x} = U_{r_j,y}$; hence, $U_{r_i,x}$ is adjacent to both $r_i$ and $r_j$ with $i \neq j$. As any two nodes lying in distinct groups in $T$ are adjacent to a unique block of $T$, we must have that $U_{r_i,x} = U_{r_j,y} = U$; but this yields a contradiction as above. Hence, the blocks generated by the pairs in $P$ are all distinct and all different from $U$. □

We use this lemma throughout, both explicitly and implicitly.
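Lemma 3 can be sanity-checked by brute force on a small transversal design. The sketch below is an illustration under stated assumptions, not part of the paper's argument: it uses the familiar modular construction (nodes $(i,x)$ with $x \in \mathbb{Z}_k$, blocks $\{(i, ai+b \bmod k)\}$), which only yields a $[\Delta,k]$-transversal design when $k$ is prime and $\Delta \leq k$, and all function names are our own.

```python
from itertools import product


def transversal_design(delta, k):
    """Blocks of a [delta, k]-transversal design (assumed: k prime, delta <= k).

    Nodes are pairs (group, value); block (a, b) meets group i in the single
    node (i, (a*i + b) % k).
    """
    if k < 2 or any(k % p == 0 for p in range(2, k)) or not 2 <= delta <= k:
        raise ValueError("this sketch needs k prime and 2 <= delta <= k")
    return {(a, b): frozenset((i, (a * i + b) % k) for i in range(delta))
            for a, b in product(range(k), repeat=2)}


def block_generated_by(blocks, u, v):
    """The unique block containing nodes u and v from distinct groups."""
    hits = [name for name, members in blocks.items()
            if u in members and v in members]
    if len(hits) != 1:
        raise AssertionError("not a transversal design")
    return hits[0]


def lemma3_holds(delta, k):
    """Check Lemma 3 for every block U and every admissible pair set P."""
    blocks = transversal_design(delta, k)
    for U, members in blocks.items():
        R = sorted(members)  # R[i] is r_i, the node of group i on U
        # Each r_i is either unpaired (None) or paired with a node x that is
        # outside R and lies in a group other than group i.
        choices = [[None] + [(l, x) for l in range(delta) for x in range(k)
                             if l != i and (l, x) not in members]
                   for i, _ in R]
        for partners in product(*choices):
            generated = [block_generated_by(blocks, r, x)
                         for r, x in zip(R, partners) if x is not None]
            if U in generated or len(set(generated)) != len(generated):
                return False
    return True


if __name__ == "__main__":
    print(lemma3_holds(3, 3))
```

The check enumerates every admissible $P$, so it is only practical for very small $\Delta$ and $k$; it is meant to make the statement concrete rather than to replace the proof.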
Our main result in the one-to-one context is concerned with building as many pairwise internally-disjoint paths as we can from any block to any other block in the bipartite graph built using the 2-step method (or, equivalently, from any node to any other node in the bipartite graph built using the 3-step method). We explain the impact of the existence of these paths on the path diversity of subsequently built DCNs presently. One added and significant complication in the proof of the following result comes about when the transversal design $T$ is a $[k+1,k]$-transversal design (so, there is the potential for $\Delta = k+1 > k$ paths).

Theorem 4. Let $k, \Delta, d \geq 2$. Let $H$ be built by the 2-step method from the $(d,\Delta)$-bipartite graph $H_0$ using the $[\Delta,k]$-transversal design $T$.
(a) Let $Q$ and $Q'$ be distinct blocks of $H_0$ so that there are $\lambda \geq 1$ pairwise internally-disjoint paths in $H_0$ from $Q$ to $Q'$, each of length at most $\mu$. There are $\min\{\Delta,k\}$ pairwise internally-disjoint paths from any block $B_{Q,V}$ of $H$ to any other block $B_{Q',V'}$ of $H$. Furthermore, if $\lambda \geq 2$ then there are $\Delta$ pairwise internally-disjoint paths from any block $B_{Q,V}$ of $H$ to any other block $B_{Q',V'}$ of $H$. All paths have length at most $\mu+4$.
(b) If $B_{Q,V}$ and $B_{Q,V'}$ are distinct blocks of $H$ then there are $\Delta$ pairwise internally-disjoint paths from $B_{Q,V}$ to $B_{Q,V'}$, each of length at most 6 and lying entirely within $T_Q$.
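The guarantees of Theorem 4 can be tabulated with a small helper. This is a hedged sketch of how the statement reads, with hypothetical names (`theorem4_guarantee`, `same_copy`); it encodes only the counts and length bounds stated in parts (a) and (b) and builds no paths.

```python
def theorem4_guarantee(delta, k, lam=None, mu=None, same_copy=False):
    """Return (number of disjoint paths, maximum path length) per Theorem 4.

    delta, k  : parameters of the [delta, k]-transversal design T
    lam, mu   : number and maximum length of the internally-disjoint paths
                from Q to Q' in H0 (used for part (a) only)
    same_copy : True for part (b), where both blocks lie within T_Q
    """
    if delta < 2 or k < 2 or delta > k + 1:
        raise ValueError("need k, delta >= 2 and delta <= k + 1")
    if same_copy:                      # part (b)
        return delta, 6
    if lam is None or mu is None or lam < 1:
        raise ValueError("part (a) needs lam >= 1 and a length bound mu")
    paths = delta if lam >= 2 else min(delta, k)
    return paths, mu + 4               # part (a)


if __name__ == "__main__":
    # With a [k+1, k]-transversal design the count depends on lam.
    print(theorem4_guarantee(4, 3, lam=1, mu=2))
    print(theorem4_guarantee(4, 3, lam=2, mu=2))
```

The two calls differ only in `lam`, which shows that the case $\Delta = k+1$ is the only one in which a second path in $H_0$ changes the number of guaranteed paths.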
Proof. Recall that we mentioned in Section 2.6 that necessarily $\Delta \leq k+1$.
Case (a)(i): Suppose that: $\Delta = k+1$; $\lambda \geq 2$; and the distinct nodes $p_1$ and $p_2$ are common neighbours in $H_0$ of $Q$ and $Q'$. We 'batch' the groups of nodes of $T_Q$ and $T_{Q'}$ together so that in each of $T_Q$ and $T_{Q'}$, the $k+1$ groups of nodes form 1 batch of $k$ groups and 1 batch of 1 group as follows:
• for $i \in \{1,2\}$, define $G^i_0 = G_{p_i} = H^i_0$
• the remaining $k-1$ groups within $T_Q$ are $G^1_1, G^1_2, \ldots, G^1_{k-1}$ and the remaining $k-1$ groups within $T_{Q'}$ are $H^1_1, H^1_2, \ldots, H^1_{k-1}$ so that:
– any group of the form $G^1_j$, where $j > 0$, is associated with some node $p \notin \{p_1, p_2\}$ of $H_0$ that is adjacent to both $Q$ and $Q'$ if, and only if, the group $H^1_j$ is associated with the same node $p$ of $H_0$ (so, if $G^1_j$ and $H^1_j$ are associated with the same node $p \notin \{p_1, p_2\}$ of $H_0$ then they are the same group in $H$).

Fig. 7. The basic set-up in Case (a)(i).

For each $j \in \{0, 1, \ldots, k-1\}$, let $r^1_j \in G^1_j$ (resp. $s^1_j \in H^1_j$) be the unique node of $G^1_j$ (resp. $H^1_j$) that is adjacent to $B_{Q,V}$ (resp. $B_{Q',V'}$) in $H$. Note that the pair $r^1_j$ and $s^1_j$ lie in the same group of nodes in $H$ if, and only if, both $G^1_j$ and $H^1_j$ are associated with the same node $p$ of $H_0$ and this node $p$ is adjacent to both $Q$ and $Q'$ in $H_0$. The situation can be visualized as in Fig. 7 (where in this case $Q$ and $Q'$ have $a+2 \geq 2$ common neighbours in $H_0$ and where, for example, $r^1_1 = s^1_1$ but $r^1_a \neq s^1_a$).

Let $G^1_0 = \{r^1_0, t_1, \ldots, t_{k-1}\}$ and $H^1_0 = \{s^1_0, w_1, \ldots, w_{k-1}\}$ so that:
• if $r^1_0 = s^1_0$ then $t_j = w_j$, for $j = 1, 2, \ldots, k-1$
• if $r^1_0 \neq s^1_0$ then $r^1_0 = w_1$, $s^1_0 = t_1$ and $t_j = w_j$, for $j = 2, 3, \ldots, k-1$.

We are now ready to generate some blocks within $T_Q$ and $T_{Q'}$ in $H$. For each $j \in \{1, 2, \ldots, k-1\}$:
• let $B_{r^1_j,t_j}$ be the unique block of $T_Q$ in $H$ generated by the nodes $r^1_j \in G^1_j$ and $t_j \in G^1_0$
• let $B_{s^1_j,w_j}$ be the unique block of $T_{Q'}$ in $H$ generated by the nodes $s^1_j \in H^1_j$ and $w_j \in H^1_0$.

So, we have generated $k-1$ blocks in $T_Q$ and $k-1$ blocks in $T_{Q'}$. Note that any block of $T_Q$ is necessarily distinct from any block of $T_{Q'}$. By Lemma 3 applied twice, to both $T_Q$ and $T_{Q'}$, all blocks of $\{B_{r^1_j,t_j} : j = 1, 2, \ldots, k-1\}$ are distinct and different from $B_{Q,V}$, and all blocks of $\{B_{s^1_j,w_j} : j = 1, 2, \ldots, k-1\}$ are distinct and different from $B_{Q',V'}$. Call these two sets of blocks our working sets of blocks.

We are now in a position to build some paths from $B_{Q,V}$ to $B_{Q',V'}$ in $H$. If $r^1_0 = s^1_0$ then define the paths:
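• $\pi^1_0$ as $B_{Q,V}, r^1_0, B_{Q',V'}$
• $\pi^1_1$ as $B_{Q,V}, r^1_1, B_{Q',V'}$, if $r^1_1 = s^1_1$, and as $B_{Q,V}, r^1_1, B_{r^1_1,t_1}, t_1, B_{s^1_1,w_1}, s^1_1, B_{Q',V'}$, if $r^1_1 \neq s^1_1$ (note that $t_1 = w_1$).

If $r^1_0 \neq s^1_0$ then define the paths:
• $\pi^1_0$ as $B_{Q,V}, r^1_0, B_{s^1_1,w_1}, s^1_1, B_{Q',V'}$ (note that $w_1 = r^1_0$)
• $\pi^1_1$ as $B_{Q,V}, r^1_1, B_{r^1_1,t_1}, s^1_0, B_{Q',V'}$ (note that $t_1 = s^1_0$).

We'll now build paths from $B_{Q,V}$ to $B_{Q',V'}$ using nodes from the groups $\{G^1_0\} \cup \{G^1_j, H^1_j : j = 2, 3, \ldots, k-1\}$. For each $j \in \{2, 3, \ldots, k-1\}$:
• if $r^1_j \neq s^1_j$ then define the path:
– $\pi^1_j$ as $B_{Q,V}, r^1_j, B_{r^1_j,t_j}, t_j, B_{s^1_j,w_j}, s^1_j, B_{Q',V'}$ (note that $t_j = w_j$)
• if $r^1_j = s^1_j$ then define the path: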
–