4.3 Multivalency in Protein-Protein Interaction Networks

There are several instances where proteins bind multivalently in vivo, and this binding mode is vastly underrepresented in protein-protein interaction databases. Multivalency can drastically change the binding strength between two proteins, and the measurement technique used determines whether the results represent the multivalent interaction, the monovalent interaction, or a mixture of the two.

In screening, some methods extract complexes from the system, as in co-immunoprecipitation or affinity purification coupled to mass spectrometry, and therefore provide information only about multivalent binding. In other methods, the gene of a protein is tagged, as in the yeast two-hybrid assay [82] or the nucleic acid programmable protein array, so the measurements are made on modified protein structures. These modifications can change the complex-forming ability of the proteins, resulting in overrepresented monovalent interactions.

Furthermore, all in vitro binding strength measurements with synthesized proteins depend on the molecular design, namely on whether multimerization or complex formation is allowed prior to the measurement.

Therefore, knowing two key pieces of information lets us improve the quality of the available data: (1) whether multivalency is relevant for a protein and (2) whether the method used overrepresents monovalent or multivalent interactions.

4.3.1 Protein Multivalency Annotation

There are several ways multivalency can arise, as presented in section 1.3.3. For our purpose, the three most relevant cases are protein complexes, protein repeats, and surface or membrane proteins. Data on the status of proteins are available for each of these cases and can be incorporated into the proteins' semantic information.

4.3.1.1 Protein Complexes and Multimers

A protein complex is a group of two or more associated polypeptide chains. Multienzyme complexes (where multiple catalytic domains are present in a single polypeptide chain) are distinct from protein complexes but are likewise relevant for multivalent interactions.

Proteins in complexes are linked by non-covalent binding and form a functional unit.

These complexes can be transient, permanent, or fuzzy, depending on their stability and on the presence of their components at any given time or place.

Available databases either list protein complexes and their subunits, as in Complex Portal [83], or present them hierarchically, as in Gene Ontology [84].

Proteins participating in complex formation are potentially multivalent binders; therefore, they can be treated accordingly.

4.3.1.2 Surface Proteins or Membrane Rafts

Membrane proteins form multivalent systems that are the basis for, among others, cell adhesion, signaling, and compartmentalization.

Membrane proteins can be distributed randomly, in ordered structures, or in dynamic rafts, forming multivalent systems with different properties.

Data about membrane rafts are available, for example in the RaftProt database [85], as are recent data about sialic acid-mediated protein networks based on glycan–protein cross-linking mass spectrometry [86]. These sources can be used to annotate the proteins they contain as potentially multivalent partners.

4.3.1.3 Protein Repeats

An additional category of potentially multivalent proteins comprises those encoded by genes with repeated, sufficiently large peptide elements. Such data are available for several domain repeats [87, 88], such as the transmembrane domains. Furthermore, techniques have been developed to find repeats in the protein sequence [89, 90].

This information about the repeated domains can form the basis for labeling proteins as potential multivalent binders.
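
The three annotation sources described in this section could be combined as in the following minimal sketch. It assumes that each source has already been reduced to a plain set of protein identifiers; the function name, the class name, the evidence labels, and the identifiers in the usage example are all hypothetical and do not correspond to the actual export format of any of the cited databases.

```python
from dataclasses import dataclass, field


@dataclass
class MultivalencyAnnotation:
    """Evidence that a protein may bind multivalently (hypothetical structure)."""
    protein_id: str
    sources: set = field(default_factory=set)

    @property
    def potentially_multivalent(self) -> bool:
        # A protein is flagged as soon as one source lists it.
        return bool(self.sources)


def annotate(protein_ids, complex_members, raft_members, repeat_proteins):
    """Label each protein with the evidence categories that list it.

    All three membership arguments are assumed to be collections of
    protein identifiers prepared beforehand from the respective sources.
    """
    evidence = {
        "complex": set(complex_members),
        "membrane_raft": set(raft_members),
        "repeat": set(repeat_proteins),
    }
    annotations = {}
    for pid in protein_ids:
        sources = {label for label, members in evidence.items() if pid in members}
        annotations[pid] = MultivalencyAnnotation(pid, sources)
    return annotations


# Usage with placeholder identifiers (not real accessions).
annotations = annotate(
    protein_ids=["P1", "P2", "P3"],
    complex_members={"P1"},
    raft_members={"P1", "P2"},
    repeat_proteins=set(),
)
```

Keeping the evidence categories separate, rather than storing a single flag, leaves room to weight the sources differently later.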

4.3.2 Semantic Databases

Semantic protein-protein interaction (PPI) data are available, but they usually carry limited information about the interaction partners' binding strength or clarity. In these cases, monovalent and multivalent binding are intermixed, and separating the two would benefit most uses of PPI networks.

To support the databases, the relevance of multivalency for any given protein can be predicted and annotated. A further annotation can be added to the PPI detection methods, noting whether monovalent or multivalent interactions are representative of the technique.
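
A method-level annotation of this kind could be as simple as a lookup table. The sketch below encodes only the four techniques discussed at the beginning of this section; the method keys, the bias labels, and the fallback value are hypothetical naming choices, and any other technique would need to be classified separately.

```python
# Bias of each detection method, following the discussion in this section:
# complex-extracting methods report multivalent binding, whereas methods
# that work on tagged or modified proteins overrepresent monovalent binding.
METHOD_BIAS = {
    "co-immunoprecipitation": "multivalent",
    "affinity purification-mass spectrometry": "multivalent",
    "yeast two-hybrid": "monovalent",
    "nucleic acid programmable protein array": "monovalent",
}


def method_bias(method_name: str) -> str:
    """Return the annotated bias of a method, or 'unknown' if not annotated."""
    return METHOD_BIAS.get(method_name.strip().lower(), "unknown")
```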

4.3.2.1 Underrepresented Multivalent Proteins

Where the interaction screening or the binding strength measurements were carried out using techniques that overrepresent monovalent binding, the data can be adjusted.

In general, it is safe to increase the binding strength of proteins with multiple binding domains by a factor equal to the number of binding domains, especially for borderline cases measured with methods that have high false-negative rates. This applies in particular when we have no knowledge about the binding domains and their distances.
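
As an illustration, the adjustment could be sketched as follows. The sketch assumes that the binding strength is stored as a score in which larger values mean stronger binding (not as a dissociation constant) and that the method bias is available as a label; the function name and the label values are hypothetical.

```python
def adjust_binding_strength(strength: float, n_binding_domains: int,
                            method_bias: str) -> float:
    """Scale a measured binding strength by the number of binding domains.

    The correction is applied only to measurements from methods annotated
    as overrepresenting monovalent binding; other values are returned as is.
    Assumes larger values mean stronger binding.
    """
    if n_binding_domains < 1:
        raise ValueError("n_binding_domains must be at least 1")
    if method_bias != "monovalent":
        return strength
    return strength * n_binding_domains
```

For example, a score of 2.0 for a protein with three binding domains becomes 6.0 if the method is annotated as monovalent and stays at 2.0 otherwise.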

Furthermore, knowing the distance between the binding domains, we can determine whether the partners are in-register or not. For in-register partners, the predicted multivalent binding strength can be increased significantly, or, in the case of binary information, hits below the threshold can be treated as positive findings. For these cases, the model can be used to assign a crude estimate of the binding strength based on the average distances between the binding domains.
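
For binary data, the rule could be sketched as below. The in-register test used here, comparing the average spacing of the binding domains on the two partners against a tolerance, is a simplifying assumption made for illustration and is not the model referred to above; the function names, the unit, and the default tolerance are likewise hypothetical.

```python
def is_in_register(spacing_a_nm: float, spacing_b_nm: float,
                   tolerance_nm: float = 0.5) -> bool:
    """Crude in-register test: the average spacings agree within a tolerance."""
    return abs(spacing_a_nm - spacing_b_nm) <= tolerance_nm


def call_hit(score: float, threshold: float,
             both_multivalent: bool, in_register: bool) -> bool:
    """Decide whether a screening result counts as a positive finding.

    Scores at or above the threshold are always hits. Scores below it are
    promoted only when both partners are annotated as multivalent and
    their binding domains are in-register.
    """
    if score >= threshold:
        return True
    return both_multivalent and in_register
```

In practice a lower bound on the promoted scores may also be needed, so that results far below the threshold are not turned into hits.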

4.3.2.2 Network-based Analysis

Several techniques have been developed on PPI networks to improve the quality of the findings or to find potential proteins based on data with limited scope [91]. These techniques usually place scores on the nodes, which represent proteins, and on the edges, which represent interactions. These scores can be adjusted using multivalency information. The node scores of multivalent proteins can be increased by a constant, for example by a factor equal to the multiplicity of the protein to represent the stronger interaction. Furthermore, if two nodes with a multivalent annotation share an edge, the score of that edge could be increased drastically.
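
A minimal sketch of such a score adjustment is given below. It assumes that the network is stored as two dictionaries, one for node scores and one for edge scores keyed by protein pairs, and that the multiplicity of each annotated protein is known; the function name and the value of `pair_boost` are hypothetical and would have to be chosen for the data at hand.

```python
def adjust_scores(node_scores, edge_scores, multiplicity, pair_boost=2.0):
    """Return node and edge scores corrected for multivalency.

    node_scores:  {protein: score}
    edge_scores:  {(protein_a, protein_b): score}
    multiplicity: {protein: number of binding sites}; proteins that are
                  missing from this mapping are treated as monovalent.
    pair_boost:   factor applied to edges joining two multivalent proteins
                  (an arbitrary illustrative value).
    """
    new_nodes = {
        protein: score * multiplicity.get(protein, 1)
        for protein, score in node_scores.items()
    }
    new_edges = {}
    for (a, b), score in edge_scores.items():
        both_multivalent = multiplicity.get(a, 1) > 1 and multiplicity.get(b, 1) > 1
        new_edges[(a, b)] = score * pair_boost if both_multivalent else score
    return new_nodes, new_edges


# Usage with placeholder proteins A, B (multivalent) and C (monovalent).
nodes, edges = adjust_scores(
    node_scores={"A": 1.0, "B": 0.5, "C": 2.0},
    edge_scores={("A", "B"): 0.4, ("A", "C"): 0.4},
    multiplicity={"A": 3, "B": 2},
)
```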

With this correction, techniques such as network propagation could yield better results owing to the improved quality of the network data.
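
To illustrate how corrected edge scores enter such a technique, the sketch below implements a generic random walk with restart, one common form of network propagation. It is not the specific method of [91]; the function name, the restart probability, the iteration count, and the toy network are all assumptions made for the example.

```python
def propagate(edge_weights, seeds, restart=0.5, n_iter=50):
    """Random walk with restart on an undirected weighted network.

    edge_weights: {(node_a, node_b): weight}, treated as undirected
    seeds:        {node: initial score}; unlisted nodes start at 0
    At every step each node keeps `restart` times its seed score and
    passes the rest of its current score to its neighbours in proportion
    to the edge weights.
    """
    neighbours = {}
    for (a, b), weight in edge_weights.items():
        neighbours.setdefault(a, {})[b] = weight
        neighbours.setdefault(b, {})[a] = weight
    for node in seeds:
        neighbours.setdefault(node, {})

    strength = {node: sum(nb.values()) for node, nb in neighbours.items()}
    start = {node: seeds.get(node, 0.0) for node in neighbours}
    scores = dict(start)
    for _ in range(n_iter):
        updated = {node: restart * start[node] for node in neighbours}
        for node, nb in neighbours.items():
            if strength[node] == 0:
                continue  # isolated node: nothing to pass on
            for other, weight in nb.items():
                updated[other] += (1 - restart) * scores[node] * weight / strength[node]
        scores = updated
    return scores


# Toy chain A - B - C with A as the seed; in the second network the A-B
# edge has been boosted as an interaction between two multivalent proteins.
baseline = propagate({("A", "B"): 1.0, ("B", "C"): 1.0}, {"A": 1.0})
boosted = propagate({("A", "B"): 4.0, ("B", "C"): 1.0}, {"A": 1.0})
```

In this toy chain, boosting the A-B edge directs more of the score leaving B back to A and less to C, which shows how the multivalency correction changes the propagated ranking.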