
Missing Link Prediction and Fuzzy Communities

Ph.D. thesis booklet

Tamás Nepusz

Advisors:

Dr. Fülöp Bazsó (MTA KFKI-RMKI)

Dr. György Strausz (BME MIT)

Budapest University of Technology and Economics,

Department of Measurement and Information Systems

2008


This dissertation is about networks: complex systems consisting of unique elements connected by binary relations arranged in seemingly random but intrinsically structured patterns. Network theory has been successfully applied to model various real-world phenomena, ranging from the interactions of proteins in living organisms to the large-scale organization of human society or the structure of man-made technological networks like the Internet. Since networks are built from binary relations among entities, they can naturally be transformed into graphs, allowing one to study network properties with the tools of a well-established field of mathematics, namely graph theory.

My dissertation consists of two major parts, and this is also reflected in the contents of this thesis booklet. In the first part, I describe and examine a stochastic graph model where vertices of the graph are assigned to vertex types, and the connection probability of any two vertices depends solely on the types of the vertices involved. My primary aim was to apply this model to the problem of predicting unknown connections in a network whose connectional structure is known only partially. The second part of my dissertation investigates a method that finds dense subgraphs (modules, communities, clusters) in a network under the assumption that these network modules are not well separated and vertices of the network may belong to multiple communities at the same time.

Predicting the probability of unknown connections in complex networks

Introduction

Most of the state-of-the-art tools in network science assume that the connections of the network being studied are either completely known, or even if they are not, the uncertainties in the [...] in many cases, especially when our knowledge about the connections stems from experiments. This is a common scenario in biology and sociology. A peculiar example is the graph model of neural connections in the cortex, since the existence of a given connection between two cortical areas can only be proved or disproved by expensive and complicated experiments with sometimes ambiguous results. It is therefore of primary importance to estimate the probability of the existence of yet uncharted neural connections based on the known ones, in order to help experimenters concentrate on those that are likely to exist.

There are multiple ways to tackle the link prediction problem, which can roughly be classified as follows:

Methods based on local similarity indices. These methods calculate a similarity measure between all pairs of vertices in the graph based on simple local properties of the vertices (and possibly some additional a priori information). The common underlying assumption of these methods is that uncharted connections are likely to exist between vertex pairs with high similarity scores. These scores are usually derived from the sets of neighbours of the vertices and the amount of overlap between these sets. Examples of these methods are the Jaccard similarity score and the inverse log-weighted similarity index of Adamic and Adar [1].

Methods based on paths and random walks. These methods assess the similarity of two given vertices from the set of paths or random walks between them. Probably one of the oldest methods from this family is the Katz similarity score [10], which considers all the possible paths between vertex pairs. Since in general there are infinitely many paths between any two of the vertices (allowing vertex repetitions in paths), the weight of longer paths is damped exponentially to keep the sum finite. When $P_{x,y}^{\langle k \rangle}$ denotes the set of paths of length $k$ between vertices $x$ and $y$, and $0 < \beta < 1$ is the damping factor, the Katz similarity of $x$ and $y$ is given by:

\[ \sum_{k=1}^{\infty} \beta^k \left| P_{x,y}^{\langle k \rangle} \right| \tag{1} \]

Another frequently used path-based similarity measure is SimRank [8]. Similarly to the well-known Google PageRank measure, SimRank is defined by a self-consistent recursive equation. The basic idea is that two vertices are similar if they have incoming edges from similar vertices. Let $\Gamma^+(x)$ be the set of predecessor vertices of $x$ (i.e., for every vertex in $\Gamma^+(x)$, there exists at least a single edge that originates from this vertex and terminates in $x$), and let $0 < \gamma < 1$ be an appropriate damping factor (a common choice is 0.8 [8]). The definition of SimRank is then as follows:

\[ \mathrm{SimRank}(x, x) = 1, \qquad \mathrm{SimRank}(x, y) = \gamma \, \frac{\sum_{a \in \Gamma^+(x)} \sum_{b \in \Gamma^+(y)} \mathrm{SimRank}(a, b)}{|\Gamma^+(x)| \, |\Gamma^+(y)|} \tag{2} \]

It can be shown that the solution of the equations above is unique and can be determined iteratively [8]. At the same time, SimRank is the expected value of $\gamma^L$, where $L$ is a random variable describing the number of steps required for two random walks started from $x$ and $y$ to meet for the first time.
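
The two path-based measures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the dissertation: the graph is given as a dense 0/1 adjacency matrix `adj` (a list of lists, where `adj[a][x] == 1` means an edge from `a` to `x`), the Katz series is truncated after `max_length` terms, paths may repeat vertices (so the number of paths of length k is an entry of the k-th power of the adjacency matrix), and SimRank is taken to be zero when either vertex has no predecessors. The function and parameter names are hypothetical. Note that the Katz series converges only if β is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix.

```python
def katz_similarity(adj, beta, max_length=50):
    """Truncated Katz score: sum over k of beta^k * (number of length-k paths)."""
    n = len(adj)
    # walks[x][y] is the number of paths of the current length from x to y;
    # it starts as the identity matrix (paths of length 0).
    walks = [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]
    scores = [[0.0] * n for _ in range(n)]
    weight = 1.0
    for _ in range(max_length):
        # Extend every path by one edge: walks <- walks * adj.
        walks = [[sum(walks[x][m] * adj[m][y] for m in range(n))
                  for y in range(n)] for x in range(n)]
        weight *= beta
        for x in range(n):
            for y in range(n):
                scores[x][y] += weight * walks[x][y]
    return scores


def simrank(adj, gamma=0.8, iterations=20):
    """Fixed-point iteration of the recursive SimRank definition."""
    n = len(adj)
    # predecessors[x] lists the vertices with an edge pointing to x.
    predecessors = [[a for a in range(n) if adj[a][x]] for x in range(n)]
    sim = [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]
    for _ in range(iterations):
        new_sim = [[0.0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                if x == y:
                    new_sim[x][y] = 1.0
                elif predecessors[x] and predecessors[y]:
                    total = sum(sim[a][b]
                                for a in predecessors[x]
                                for b in predecessors[y])
                    new_sim[x][y] = gamma * total / (
                        len(predecessors[x]) * len(predecessors[y]))
        sim = new_sim
    return sim


# Undirected triangle: its largest eigenvalue is 2, so beta must stay below 0.5.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
katz = katz_similarity(triangle, beta=0.25)

# Vertex 0 points to vertices 1 and 2, which therefore share their only predecessor.
fork = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
sim = simrank(fork)
```

In the `fork` example, vertices 1 and 2 have the same single predecessor, so their SimRank equals the damping factor γ, while vertex 0 has no predecessors and is similar only to itself.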

A common disadvantage of the methods described above is that unknown or uncertain connections are treated as nonexistent, thus they are more appropriate for prediction problems where one has to extrapolate to the future behaviour of the network [...] completely [11].

There is also a third, relatively new approach to the link prediction problem: let us first construct an appropriate random graph model that is sophisticated enough to describe the network being studied, then find the parameterisation of the model that reproduces the given network with the highest probability, and use the probability of the existence of our unknown connections in the model to estimate their probability in the real network. If the random graph model is able to handle known nonexistent and unknown connections differently, one can expect more accurate predictions from these methods than from the ones that do not distinguish between them. The method described in my dissertation utilises these ideas.

Research goals

My goal was to conceive a method that is able to estimate the probability of unknown connections in a given static network under the following assumptions:

1. All the vertices in the network are known; the possibility of adding new vertices or merging existing ones should not be considered.

2. Vertex pairs are either connected (and confirmed), disconnected (and confirmed), or uncertain (their connectedness is unknown, but we may have an associated a priori degree of belief regarding its existence).

3. Vertex pairs are ordered; in other words, the connections are directed, and the existence of a connection from $A$ to $B$ does not imply the existence of a connection in the opposite direction.

I defined a stochastic graph model (called the preference model), where the connection probabilities of vertex pairs are governed by vertex types assigned to the vertices involved. The model uses $k$ types, and every vertex has two types at the same time. Let us call these types in-types and out-types. The in- and out-types of the vertices are encoded by integers between 1 and $k$ in vectors $\vec{u} = [u_i]$ and $\vec{v} = [v_i]$. The model also contains a preference matrix $P = [p_{ij}]$ of size $k \times k$, where $p_{ij}$ describes the probability that a vertex with out-type $i$ connects to a vertex with in-type $j$. Therefore, the regularity in the structure of the network is described by the type assignments of the vertices and the preference matrix. Fitting this stochastic graph model to the network being studied is the basis of the probability estimation outlined in my research goals.

Since the network can contain uncertain connections, I could not use traditional graph descriptions such as adjacency matrices or adjacency lists. I had to find a description that extends one of these in a way that enables us to take into consideration the additional information regarding the uncertainty and the degree of belief associated with each possible connection. The network being studied is described by a belief value $b_{ij}$ for every ordered vertex pair $(i, j)$: $b_{ij} = 1$ in the case of a confirmed existing connection, $b_{ij} = 0$ for a confirmed missing connection and $0 < b_{ij} < 1$ for uncertain connections, with higher values corresponding to higher degrees of belief, in other words, to higher a priori probabilities based on our additional domain-specific knowledge. The matrix $B = [b_{ij}]$ equals the adjacency matrix of the original graph if it is completely known, which demonstrates that the adjacency matrix representation is simply a special case of the belief matrix. The belief matrix can be stored efficiently as a sparse matrix if the network is sparse and most missing connections are confirmed.
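
The sparse storage mentioned above can be illustrated with a small sketch. It assumes a dictionary keyed by ordered vertex pairs, in which confirmed missing connections ($b_{ij} = 0$) are left implicit. The helper names `build_belief` and `belief_value` are hypothetical and do not come from the dissertation.

```python
def build_belief(confirmed_edges, uncertain_pairs):
    """Store only the nonzero entries of the belief matrix B.

    confirmed_edges: iterable of ordered pairs (i, j) with b_ij = 1.
    uncertain_pairs: dict mapping ordered pairs (i, j) to a prior in (0, 1).
    Every pair that is not stored is a confirmed missing connection (b_ij = 0).
    """
    belief = {pair: 1.0 for pair in confirmed_edges}
    for pair, prior in uncertain_pairs.items():
        if not 0.0 < prior < 1.0:
            raise ValueError("uncertain pairs need a belief strictly between 0 and 1")
        if pair in belief:
            raise ValueError("a pair cannot be both confirmed and uncertain")
        belief[pair] = prior
    return belief


def belief_value(belief, i, j):
    """Look up b_ij, falling back to 0 for confirmed missing connections."""
    return belief.get((i, j), 0.0)


# One confirmed edge 0 -> 1 and one uncertain pair 1 -> 0 with prior belief 0.3.
B = build_belief([(0, 1)], {(1, 0): 0.3})
```

With this layout the memory footprint grows with the number of confirmed edges plus the number of uncertain pairs, which is why the representation pays off only when most missing connections are confirmed.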

The preference model can be fitted to an arbitrary network by [...] likelihood, given the belief matrix $B$ of the network. The estimated a posteriori probabilities can then be determined by the fitted vertex types and their corresponding entries in the calculated optimal preference matrix.

Let $G_0$ denote the graph being analysed and let $n$ be the number of vertices in $G_0$. The likelihood of a given parameterisation $\theta = (k, \vec{u}, \vec{v}, P)$ is then as follows:

\[ L(\theta \mid G_0) = \prod_{i=1}^{n} \prod_{\substack{j=1 \\ j \neq i}}^{n} \left[ b_{ij} \, p_{u_i, v_j} + (1 - b_{ij}) (1 - p_{u_i, v_j}) \right] \tag{3} \]
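
Equation (3) translates directly into code. The sketch below evaluates its logarithm under a few assumptions that are not part of the original text: the belief matrix is a dense list of lists, types are numbered from 0 instead of 1, and `u[i]` is read as the type of the source vertex while `v[j]` is the type of the target vertex, following the index order $p_{u_i, v_j}$ in the equation.

```python
import math


def log_likelihood(belief, u, v, pref):
    """Logarithm of equation (3), summed over all ordered pairs i != j."""
    n = len(belief)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            p = pref[u[i]][v[j]]
            b = belief[i][j]
            term = b * p + (1.0 - b) * (1.0 - p)
            if term <= 0.0:
                # The parameterisation contradicts a confirmed observation.
                return -math.inf
            total += math.log(term)
    return total


# Two vertices of a single type: a confirmed edge 0 -> 1 and an uncertain pair 1 -> 0.
example = log_likelihood([[0.0, 1.0], [0.5, 0.0]], [0, 0], [0, 0], [[0.25]])
```

In the example, the confirmed edge contributes a factor of 0.25 and the uncertain pair contributes 0.5 · 0.25 + 0.5 · 0.75 = 0.5, so the likelihood is 0.125.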

Practical applications use the logarithm of the likelihood in order to avoid rounding errors and numerical instabilities caused by floating-point calculations involving very small probabilities. The log-likelihood can be maximised by one of the following methods or their combination:

Expectation-maximization (EM) method. Starting from a random type assignment, one can find a local optimum by repeatedly applying two steps. One of them is the E-step (named after the first letter of expectation), while the other one is called the M-step (denoting maximisation). By temporarily assuming $\vec{u}$ and $\vec{v}$ (the type assignments) to be constant, the E-step determines $E(\log L(\theta \mid G_0))$ and then estimates $E\,p_{ij}$ for every possible type pair. The M-step uses the estimated $P$ matrix resulting from the E-step to modify the vertex types in a way that maximises the local contribution of every vertex to the log-likelihood under the assumption that no other vertices will change their group assignments. The algorithm stops when no modification was performed in the M-step (since this implies that nothing will change in the E-step either).

Markov chain Monte Carlo (MCMC) method. This algorithm performs a random walk in the space of all possible [...] the random walk involves changing the type assignment of a single vertex chosen randomly. Elements of the preference matrix $P$ are then re-estimated based on the new configuration similarly to the E-step in the EM algorithm. The new state is accepted unconditionally as the next state in the random walk if its likelihood is higher than the likelihood of the old state. If the new likelihood is lower than the old one, the ratio of the new and the old likelihoods gives the probability of acceptance. This scheme is the application of the Metropolis-Hastings algorithm [7] to this specific problem, therefore the state probabilities in the stationary distribution of the resulting Markov chain are proportional to their likelihoods. By taking a large number of samples from the chain after a sufficiently long burn-in period (which lets the residual effects of the starting state diminish), we can find a parameterisation with high likelihood.
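
The acceptance rule of the MCMC method can be sketched as follows. This is a simplified illustration, not the dissertation's implementation. It assumes a dense belief matrix and types numbered from 0. It also assumes that the preference matrix is re-estimated as the mean belief of each type pair, which is one plausible reading of the re-estimation step. The sketch keeps the best state seen rather than collecting samples after a burn-in period. All function names are hypothetical.

```python
import math
import random


def estimate_preferences(belief, out_types, in_types, k):
    """Assumed estimator: mean belief over the ordered pairs of each type pair."""
    n = len(belief)
    sums = [[0.0] * k for _ in range(k)]
    counts = [[0] * k for _ in range(k)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sums[out_types[i]][in_types[j]] += belief[i][j]
                counts[out_types[i]][in_types[j]] += 1
    # Type pairs that cover no vertex pair get an uninformative 0.5.
    return [[sums[a][b] / counts[a][b] if counts[a][b] else 0.5
             for b in range(k)] for a in range(k)]


def log_likelihood(belief, out_types, in_types, pref):
    """Logarithm of equation (3)."""
    n = len(belief)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            p = pref[out_types[i]][in_types[j]]
            term = belief[i][j] * p + (1.0 - belief[i][j]) * (1.0 - p)
            if term <= 0.0:
                return -math.inf
            total += math.log(term)
    return total


def acceptance_probability(old_ll, new_ll):
    """Always accept improvements; otherwise use the ratio of the likelihoods."""
    if new_ll >= old_ll:
        return 1.0
    return math.exp(new_ll - old_ll)


def mcmc_fit(belief, k, steps=1000, seed=0):
    """Random walk over type assignments; returns (log-likelihood, out_types, in_types)."""
    rng = random.Random(seed)
    n = len(belief)
    out_types = [rng.randrange(k) for _ in range(n)]
    in_types = [rng.randrange(k) for _ in range(n)]
    pref = estimate_preferences(belief, out_types, in_types, k)
    current = log_likelihood(belief, out_types, in_types, pref)
    best = (current, list(out_types), list(in_types))
    for _ in range(steps):
        # Propose a new state by changing one type of a randomly chosen vertex.
        new_out, new_in = list(out_types), list(in_types)
        vertex = rng.randrange(n)
        if rng.random() < 0.5:
            new_out[vertex] = rng.randrange(k)
        else:
            new_in[vertex] = rng.randrange(k)
        new_pref = estimate_preferences(belief, new_out, new_in, k)
        new_ll = log_likelihood(belief, new_out, new_in, new_pref)
        if rng.random() < acceptance_probability(current, new_ll):
            out_types, in_types, current = new_out, new_in, new_ll
            if current > best[0]:
                best = (current, list(out_types), list(in_types))
    return best
```

Working with log-likelihoods turns the likelihood ratio into `exp(new_ll - old_ll)`, which avoids underflow when the likelihoods themselves are very small.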

The shortcoming of the EM algorithm is that it can get stuck in a local maximum, but this is counterbalanced by the fact that it converges fast. The MCMC algorithm is free from this shortcoming, since we occasionally allow steps towards worse states as well. In practice, the advantages of the methods can be combined by replacing the burn-in stage of the MCMC process with EM iterations. The Markov chain is then started from the local maximum found by the EM process.

Results

T 1/1. I showed that vertex degrees in the networks generated by the preference model are described by the weighted sum of Poisson-distributed random variables. I also proved a sufficient condition for the existence of a giant component in these networks.

T 1/2. [...] parameters of the model to a given network, taking into account the degrees of belief associated with the possible connections in the network. I tested the validity of these algorithms on computer-generated test graphs.

T 1/3. I showed that the Akaike information criterion [2] is able to choose the most appropriate number of vertex groups of the model in an unsupervised manner.

Publications related to theses T 1/1, T 1/2, T 1/3:

• Nepusz T., Bazsó F.: Likelihood-based clustering of directed graphs. In: IEEE Proceedings of the 3rd International Symposium on Computational Intelligence and Intelligent Informatics, Agadir, Morocco, 28-30 March 2007, pp. 189-194.

• Nepusz T., Bazsó F.: Maximum-likelihood methods for data mining in datasets represented by graphs. In: IEEE Proceedings of the 5th International Symposium on Intelligent Systems and Informatics, Subotica, Serbia, 24-25 August 2007, pp. 161-165.

• Nepusz T., Négyessy L., Tusnády G., Bazsó F.: Reconstructing cortical networks: case of directed graphs with high level of reciprocity. To appear in: Handbook of Large-Scale Random Networks, editors: Béla Bollobás, Róbert Kozma, Dezső Miklós. Springer, 2008. ISBN 978-3-540-69394-9.

Applications

The applicability of the model and the algorithms is demonstrated in the field of biology, since network datasets in biology usually originate from experiments, therefore they frequently [...] of the visual and sensorimotor cortices of the macaque monkey as described in [12]. This network incorporated 45 brain areas and 463 confirmed existing neural connections between them. 360 pairs of areas were known to be disconnected, and no information was available regarding the remaining 1157 pairs. Such uncertainty poses a challenge to even the state-of-the-art link prediction approaches.

The preference model was able to reconstruct the known part of the cortical network with high confidence (92.7% of known existent and 83.1% of known nonexistent connections were predicted correctly). Results pertaining to the visual areas of the network describe the most exact reconstruction published in the literature so far, and the predictions regarding the unknown connections also seem plausible and in line with earlier reconstruction attempts [5, 9]. ROC curves were also used to compare the method to other generic link prediction methods (see Fig. 1).
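
For readers unfamiliar with ROC-based comparison, the area under the ROC curve can be computed without drawing the curve. It equals the probability that a randomly chosen existing connection receives a higher predicted score than a randomly chosen nonexistent one, with ties counted as one half. The sketch below is a generic illustration with made-up scores. It does not reproduce the evaluation code or the data of the dissertation, and the function name `auc` is hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical predicted probabilities for known existing and known missing pairs.
score = auc([0.9, 0.8], [0.1, 0.85])
```

In this example three of the four (existing, missing) pairs are ranked correctly, so the area is 0.75. A perfect reconstruction gives 1.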

Conclusion

I presented a method that is able to estimate the probability of unknown connections in a static, directed complex network, taking domain-specific information into account by means of a priori connection probabilities (belief values), and I have shown the applicability of the method on a real prediction problem. The method is relevant not only in the field of biology, but in all problems where researchers are confronted with networks which are known to be incomplete.

Figure 1: [...] link prediction methods on the cortical network dataset. AUC = area under curve, attaining its maximum at 1 when the reconstruction is perfect.

Fuzzy communities in undirected networks

Introduction

A common feature of networks modeling natural phenomena is sparseness: the vast majority of possible connections are missing, thus the number of actual edges grows linearly in the number of vertices as the network size tends to infinity. In spite of their sparsity, these networks frequently contain dense submodules, which tend to coincide with larger functional units of the network. For instance, dense subgraphs of a social network usually correspond to circles of friends, groups of coworkers and so on. One of the most studied problems of network theory is the efficient identification of such dense subgraphs [4], also called modules, communities or clusters. It can also be demonstrated that these communities can overlap with each other [14], but most community detection algorithms assume that every vertex belongs to one and only one of the communities. The difference between the overlapping and the nonoverlapping approach is illustrated in Fig. 2.

Figure 2: [...] example graph. Left: nonoverlapping clustering with two clusters according to the algorithm of Clauset et al. [4]. Right: overlapping clustering with two clusters. The bridge-like position of the central vertex is not revealed by the nonoverlapping approach.

Research goals

Research on algorithms that are able to detect overlapping communities is a relatively new area in network science. At the time when I started my own investigations, there was no algorithm that was able to quantify how much a given vertex belongs to a given community; the algorithms available were only able to decide whether a given vertex belongs to a given community or not. Therefore, my aim was to develop an algorithm that is able to identify overlapping communities in complex networks and characterise the membership degrees of the vertices with respect to the detected communities. I assumed that edges in the network are undirected and the network topology is known exactly.

My research was inspired by fuzzy $c$-means clustering [3, 6]. Fuzzy $c$-means clustering has proven to be useful and efficient in problems where the points to be clustered are embedded in an $n$-dimensional space with an appropriate distance function. However, there is no single straightforward embedding and distance function for graphs, so I had to take a different approach.

Formally, the output of a fuzzy clustering is a fuzzy partition matrix, denoted by $U = [u_{ki}]$ from now on. $u_{ki}$ denotes the membership degree of vertex $i$ in cluster $k$. The following constraints are imposed on the elements of the matrix:

1. $0 \le u_{ki} \le 1$ for all $i$ and $k$.

2. $0 < \sum_{i=1}^{n} u_{ki} < n$ for all $k$, where $n$ is the number of vertices in the graph. Informally: clusters cannot be empty and no cluster can contain all the vertices to the greatest possible extent.

3. $\sum_{k=1}^{c} u_{ki} = 1$ for all $i$, where $c$ is the number of clusters. Informally: the sum of all membership degrees pertaining to a given vertex is 1, therefore we are not interested in outlier vertices that do not belong to any of the clusters.
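
The three constraints can be checked mechanically. The sketch below assumes that the partition matrix is stored as a list of `c` rows (one per cluster), each holding `n` membership degrees, and that a small numerical tolerance `tol` is acceptable. The function name and the tolerance are illustrative choices that do not appear in the original text.

```python
def is_fuzzy_partition(u, tol=1e-9):
    """Check the three constraints on a c x n fuzzy partition matrix."""
    c = len(u)
    n = len(u[0])
    for row in u:
        # Constraint 1: every membership degree lies in [0, 1].
        if any(value < -tol or value > 1.0 + tol for value in row):
            return False
        # Constraint 2: no cluster is empty and none holds every vertex completely.
        if not tol < sum(row) < n - tol:
            return False
    for i in range(n):
        # Constraint 3: the membership degrees of each vertex sum to 1.
        if abs(sum(u[k][i] for k in range(c)) - 1.0) > tol:
            return False
    return True


# Three vertices, two clusters; the middle vertex is shared equally.
valid = is_fuzzy_partition([[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]])
```

The example describes two clusters whose only overlap is the middle vertex, which belongs to both with degree 0.5, so all three constraints hold.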

My algorithm is based on a similarity function defined over pairs of vertices. If we think about the membership degrees as probabilities ($u_{ki}$ is the probability of the event that vertex $i$ is in cluster $k$), the probability of the event that vertices $i$ and $j$ are in the same cluster equals the dot product of their respective membership vectors: $s_{ij} = \sum_{k=1}^{c} u_{ki} u_{kj}$. It follows that the similarity matrix $S = [s_{ij}]$ based on these probabilities is simply $S = U^T U$. The key assumption of the algorithm is that the presence of an edge between two vertices relates to their similarity, while the absence of an edge implies dissimilarity. Therefore, one should try to find a matrix $U$ that makes connected vertices similar and disconnected vertices dissimilar. We can define a goal function that quantifies the goodness of fit for a given $U$ based on the sum of squared differences between the expected and the actual similarity:

\[ f(U) = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (\tilde{s}_{ij} - s_{ij})^2, \tag{4} \]

where $w_{ij}$ is an arbitrary weighting term and $\tilde{s}_{ij} = 1$ if and only if $i$ and $j$ are connected or $i = j$, zero otherwise. This goal function has to be optimised with respect to the constraints defined above. The constraint on the sum of membership degrees of a given vertex can be incorporated into the goal function by Lagrange multipliers, leading to a constrained nonlinear optimisation problem where the individual variables ($u_{ki}$) can take values from the range $[0; 1]$. Starting from a random configuration, the value of the goal function can be optimised by standard gradient-based optimisation methods (e.g., steepest descent or the method of conjugate gradients).
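
Evaluating the goal function for a candidate partition is straightforward. The sketch below computes $S = U^T U$ and $f(U)$ from equation (4). It assumes a dense 0/1 adjacency matrix, a partition matrix stored as `c` rows of `n` membership degrees, and uniform weights $w_{ij} = 1$ when no weights are given. The optimisation itself (gradient steps under the constraints) is not shown, and the function names are hypothetical.

```python
def similarity_matrix(u):
    """S = U^T U: s_ij is the dot product of the membership vectors of i and j."""
    c = len(u)
    n = len(u[0])
    return [[sum(u[k][i] * u[k][j] for k in range(c)) for j in range(n)]
            for i in range(n)]


def goal_function(u, adj, weights=None):
    """Weighted sum of squared differences between expected and actual similarity."""
    n = len(adj)
    s = similarity_matrix(u)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Expected similarity: 1 for connected pairs and on the diagonal.
            expected = 1.0 if i == j or adj[i][j] else 0.0
            w = 1.0 if weights is None else weights[i][j]
            total += w * (expected - s[i][j]) ** 2
    return total


# Two disconnected vertices: a crisp split fits perfectly, an even split does not.
no_edges = [[0, 0], [0, 0]]
crisp = goal_function([[1.0, 0.0], [0.0, 1.0]], no_edges)
even = goal_function([[0.5, 0.5], [0.5, 0.5]], no_edges)
```

For the two disconnected vertices, placing each vertex in its own cluster reproduces the expected similarities exactly, so `crisp` is 0. Splitting both vertices evenly gives every pair a similarity of 0.5, so each of the four terms contributes 0.25 and `even` is 1.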

One of the advantages of fuzzy clustering compared to nonoverlapping clustering is that it is able to quantify the sharedness of a vertex between groups. I introduced several measures to achieve that goal:

Bridgeness. Intuitively, a vertex is a bridge between communities to the greatest possible extent if it belongs to all the clusters with the same membership degree. This state is characterised by a membership vector whose coordinates are all $1/c$. The other extreme is when the vertex belongs to only one of the communities, resulting in a membership vector with a single element of 1 (all other elements are zeros). Note that the variance of the vector components is zero in the former case and maximal in the latter case, where the squared deviations from the mean $1/c$ sum to $(c - 1)/c$. The bridgeness measure can therefore be derived from the variance of the [...]:

\[ b_i = 1 - \sqrt{\frac{c}{c - 1} \sum_{j=1}^{c} \left( u_{ji} - \frac{1}{c} \right)^2} \tag{5} \]

Bridgeness can also be weighted by the centrality of the vertex, allowing one to filter vertices that are outliers (having large bridgeness with small centrality).

Exponentiated entropy. Another possible approach is to consider the membership vector of vertex $i$ as the probability mass function of a discrete random variable $U_i$ and calculate the entropy of the variable. The entropy of $U_i$ will be lower if vertex $i$ is a nonoverlapping vertex and higher if $i$ is a significant overlap. By using the base-2 logarithm in $H(U_i)$ (the entropy of $U_i$), the number of significant communities can be obtained by $\chi_i = 2^{H(U_i)}$:

\[ \chi_i = 2^{-\sum_{k=1}^{c} u_{ki} \log_2 u_{ki}} = \prod_{k=1}^{c} u_{ki}^{-u_{ki}} \tag{6} \]
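
Both measures depend only on the membership vector of a single vertex, so they are easy to compute once a fuzzy partition is available. The sketch below follows equations (5) and (6). It assumes at least two clusters and uses the usual convention that a zero membership degree contributes a factor of 1 to the product in equation (6). The function names are illustrative.

```python
import math


def bridgeness(membership):
    """Equation (5): 1 for a uniform membership vector, 0 for a crisp one."""
    c = len(membership)
    squared = sum((u - 1.0 / c) ** 2 for u in membership)
    return 1.0 - math.sqrt(c / (c - 1) * squared)


def exponentiated_entropy(membership):
    """Equation (6): the product of u^(-u) over all clusters."""
    chi = 1.0
    for u in membership:
        if u > 0.0:  # convention: 0^0 = 1
            chi *= u ** (-u)
    return chi


shared = [0.5, 0.5]  # vertex split evenly between two clusters
crisp = [1.0, 0.0]   # vertex belonging to a single cluster
```

A vertex shared evenly between two clusters has bridgeness 1 and an exponentiated entropy of 2 (two significant communities), while a vertex belonging to a single cluster has bridgeness 0 and an exponentiated entropy of 1.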

Results

T 2/1. I devised and implemented an algorithm to find fuzzy communities in undirected networks. The algorithm is based on the optimisation of a global goal function derived from vertex similarities. I tested the validity of the algorithm on computer-generated test graphs.

T 2/2. I extended the modularity measure of Newman [13] to account for the fuzziness of the obtained partitions. I showed how one can employ the fuzzified modularity to choose the optimal number of communities.

T 2/3. [...] communities by introducing the bridgeness, the weighted bridgeness and the exponentiated entropy measures of the membership vectors.

Publications related to theses T 2/1, T 2/2, T 2/3:

• Nepusz T., Petróczi A., Négyessy L., Bazsó F.: Fuzzy communities and the concept of bridgeness in complex networks. Phys Rev E, 77(1):016107, 2008.

• Nepusz T., Bazsó F., Strausz Gy.: Algorithmic identification of bridge vertices in complex networks. In: Proceedings of the 15th PhD Mini-symposium, Budapest University of Technology and Economics, pp. 78-81, 2008.

Applications

The applicability of the method is demonstrated again on the cortical network dataset described in the previous part. Vertices of the network can be classified as brain areas related to either visual or tactile input processing. Visual areas can also be subdivided based on anatomical considerations. My expectations were that fuzzy clustering should be able to find the bisection between visual and tactile input processing areas and should identify the areas related to the integration of visual and tactile information as bridges (since the integration task requires strong connections to both clusters).

The most appropriate fuzzy clustering of the cortex was obtained with four clusters. Two of these four clusters included mostly visual areas, the remaining two contained mostly tactile input processing areas. Only two areas were misclassified and the known anatomical subdivision of the visual cortex was also recognisable. There were five areas that were identified as bridges based on the centrality-weighted bridgeness measure; scores for these areas were significantly higher than the average [...] as higher level integratory areas according to our present understanding of the visual and sensorimotor cortices. The two misclassified areas were also among these five bridges, suggesting that the classification error is caused by the bridge-like position of these areas.

Conclusion

The methodology described in this part of the dissertation is suitable for detecting communities in complex networks even when these communities overlap or their boundaries are not well defined. Some more illustrations of the results of the method on real datasets are also provided. The detected bridges deserve further attention, since these vertices may play a crucial role in the system modeled by the network structure. Further research directions include (but are not limited to) the extension of the method to directed and weighted graphs, outlier vertices and alternative similarity measures.

List of publications

Publications related to the Ph.D. theses

1. Nepusz T., Négyessy L., Tusnády G., Bazsó F.: Reconstructing cortical networks: case of directed graphs with high level of reciprocity. To appear in: Handbook of Large-Scale Random Networks, editors: Béla Bollobás, Róbert Kozma, Dezső Miklós. Springer, 2008. ISBN 978-3-540-69394-9.

2. Nepusz T., Bazsó F., Strausz Gy.: Algorithmic identification of bridge vertices in complex networks. In: Proceedings of the 15th PhD Mini-symposium, Budapest University of Technology and Economics, pp. 78-81, 2008.

3. Nepusz T., Petróczi A., Négyessy L., Bazsó F.: Fuzzy communities and the concept of bridgeness in complex networks. Phys Rev E, 77(1):016107, 2008.

4. Nepusz T., Bazsó F.: Maximum-likelihood methods for data mining in datasets represented by graphs. In: IEEE Proceedings of the 5th International Symposium on Intelligent Systems and Informatics, Subotica, Serbia, 24-25 August 2007, pp. 161-165.

5. Nepusz T., Bazsó F.: Likelihood-based clustering of directed graphs. In: IEEE Proceedings of the 3rd International Symposium on Computational Intelligence and Intelligent Informatics, Agadir, Morocco, 28-30 March 2007, pp. 189-194.

Further related publications

6. Négyessy L., Nepusz T., Zalányi L., Bazsó F.: Convergence and divergence are mostly reciprocated properties of the connections in the network of cortical areas. Proc Roy Soc London B, accepted as is, 2008.

7. Nepusz T., Bazsó F., Strausz Gy.: A new approach for the clustering of directed graphs. In: Proceedings of the 14th PhD Mini-symposium, Budapest University of Technology and Economics, pp. 34-37, 2007.

8. Csárdi G., Nepusz T.: The igraph software package for complex network research. InterJournal of Complex Systems 1695, 2006.

9. Petróczi A., Nepusz T., Bazsó F.: Measuring tie-strength in virtual social networks. Connections, 27(2):49-57, 2006.

10. Négyessy L., Nepusz T., Kocsis L., Bazsó F.: Prediction of the main cortical areas and connections involved in the tactile function of the visual cortex by network analysis. Eur J Neurosci, 23(7):1919-1930, 2006.

11. [...] for the analysis of large datasets represented by graphs. In: Proceedings of the 13th PhD Mini-symposium, Budapest University of Technology and Economics, pp. 52-53, 2005.

Other publications

12. Petróczi A., Aidman E. V., Nepusz T.: Capturing doping attitudes by self-report declarations and implicit assessment: a methodology study. Substance Abuse Treatment, Prevention and Policy, 3:9, 2008.

13. Kovács Zs., Nepusz T., Hamar G.: Egyszerű mozgásanalizátor neurológiai használatra (in Hungarian). Kórház- és orvostechnika, 43(1):3-8, 2005.

14. Herczeg E., Baumgartner I., Fazekas G., Kovács Zs., Nepusz T.: Mozgásanalizátor alkalmazása féloldali bénult betegek kézfunkciójának felmérésére (in Hungarian). Rehabilitáció 14(4):31-34, 2004.

Bibliography

[1] L. A. Adamic and E. Adar. Friends and neighbors on the Web. Social Networks, 25(3):211-230, 2003.

[2] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716-723, 1974.

[3] J. C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York, USA, 1981. ISBN 978-0306406713.

[4] A. Clauset, M. E. J. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70:066111, 2004.

[5] L. da Fontoura Costa, M. Kaiser, and C. C. Hilgetag. Predicting the connectivity of primate cortical networks from [...]ology, 1(16), 2007.

[6] J. C. Dunn. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3:32-57, 1973.

[7] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97-109, 1970. doi:10.2307/2334940.

[8] G. Jeh and J. Widom. SimRank: A measure of structural-context similarity. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 538-543, New York, 2002. Association of Computing Machinery.

[9] B. Jouve, P. Rosenstiehl, and M. Imbert. A mathematical approach to the connectivity between the cortical visual areas of the macaque monkey. Cerebral Cortex, 8(1):28-39, 1998.

[10] L. Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39-43, 1953.

[11] D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In Proc. 12th International Conference on Information and Knowledge Management, 2003.

[12] L. Négyessy, T. Nepusz, L. Kocsis, and F. Bazsó. Prediction of the main cortical areas and connections involved in the tactile function of the visual cortex by network analysis. European Journal of Neuroscience, 23(7):1919-1930, 2006.

[13] M. E. J. Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 69:066133, 2004.

[14] G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814-818, 2005.

DaaiigiC

P x,y hki

k

x

y

0 < β < 1

x

y

∞

X

k=1

β k |P x,y hki |

Γ + (x)

x

Γ + (x)

x

0 < γ < 1

(x, x) = 1

(x, y) = γ P

a∈Γ + (x)

P

b∈Γ + (y)

(a, b)

|Γ + (x)| |Γ + (y)|

γ L

L

x

y

A

B

k

k

~u = [u i ]

~v = [v i ]

P = [p ij ]

k ×k

p ij

i

j

b ij

(i, j)

b ij = 1

b ij = 0

0 < b ij < 1

B = [b ij ]

B

G 0

n

G 0

θ = (k, ~u, ~v, P )

L(θ|G 0 ) =

n

Y

i=1 n

Y

j=1 j6=i

b ij p u i ,v j + (1 − b ij )(1 − p u i ,v j )

~u

~v

E (log L(θ|G 0 ))

E p ij

P

P

•

•

•

c

c

n

U = [u ki ]

u ki

i

k

0 ≤ u ki ≤ 1

i

k

0 < P n

i=1 u ki < n

k

n

DaaiigiC

P x,y ^hki

β ^k |P _x,y ^hki |

Γ ⁺ (x)

Γ ⁺ (x)

a∈Γ ⁺ (x)

b∈Γ ⁺ (y)

|Γ ⁺ (x)| |Γ ⁺ (y)|

γ ^L

b ij p u _i ,v _j + (1 − b ij )(1 − p u _i ,v _j )

S = U ^T U

w ij (˜ s ij − s ij ) ² ,

χ i = 2 ^H(U ⁱ ⁾

χ i = 2 ⁻ ^P ^c ^k=1 ^u ^ki ^log ² ^u ^ki =

u ^−u _ki ^ki