Application for Parkinson's disease therapy

Parkinson’s disease (PD) is one of the most well-studied neurodegenerative diseases, characterized by the progressive loss of dopamine-producing neurons in the substantia nigra.

As the research group at the Department of Organic Chemistry is interested in Parkinson's disease therapies, we applied the data fusion based methodology to prioritize repositioning candidates for PD [14]. Nevertheless, the methodology can be applied to a wide range of repositioning projects.

According to our current knowledge, all neurodegenerative diseases share, with different levels of importance, the following underlying mechanisms: oxidative stress, neuroinflammation, mitochondrial dysfunction, protein misfolding and aggregation, glutamate excitotoxicity, proteasomal dysfunction, disrupted intracellular transport and neurofilament network, microglial activation and abnormal apoptotic behaviour [14, 16].

Applying the methodology to practical pharmacology problems involves a common set of steps [14]. These steps are the following:

1. Definition of the broader prioritization goal

The prioritization can be a single run or a sequential process. The goal can be to find drugs for an indication, or to find an indication for a drug. In our specific application, the goal was to find FDA-approved drugs with good repositioning potential as a PD therapy.

2. Construction of the candidate list

We used the entire set of approved drugs as the candidate set. Any other set could be used, such as a proprietary chemical library or a subset of approved drugs. For example, because we are searching for candidates for a central nervous system (CNS) related indication, we could pre-filter the candidates based on their ability to penetrate the blood-brain barrier. Other options are filtering based on intellectual property considerations, toxicity-related substructures or unwanted biological effects.
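The pre-filtering options mentioned above can be sketched as a simple predicate over the candidate list. This is only an illustration: the `Candidate` record, its two boolean flags and the placeholder compound names are assumptions introduced here, and how such flags would be obtained (for example from a blood-brain barrier model or a substructure search) is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate record; both flags are assumed to be precomputed."""
    name: str
    bbb_penetrant: bool    # assumed result of some blood-brain barrier model
    has_toxic_alert: bool  # assumed result of a toxicity substructure search


def prefilter(candidates, require_bbb=True, exclude_toxic_alerts=True):
    """Return the candidates that pass the selected pre-filters, in input order."""
    kept = []
    for c in candidates:
        if require_bbb and not c.bbb_penetrant:
            continue
        if exclude_toxic_alerts and c.has_toxic_alert:
            continue
        kept.append(c)
    return kept


# Placeholder compounds, not real drugs.
example = [
    Candidate("drug_A", bbb_penetrant=True, has_toxic_alert=False),
    Candidate("drug_B", bbb_penetrant=False, has_toxic_alert=False),
    Candidate("drug_C", bbb_penetrant=True, has_toxic_alert=True),
]
filtered_names = [c.name for c in prefilter(example)]
unfiltered = prefilter(example, require_bbb=False, exclude_toxic_alerts=False)
```

Keeping each filter behind its own switch makes it easy to compare the prioritization with and without a given restriction.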

3. Construction of special kernels

As discussed earlier, the similarity of compounds can be assessed outside the classic chemical representation space. One approach is side-effect-based similarity, first applied by Campillos et al. and further discussed in one of our publications [10, 52]. Other rich sources of information can be constructed from chemically perturbed gene expression profiles, such as those in the CMAP or LINCS datasets [42, 87]. A use case for CMAP profiles is discussed in detail in our work [12]. Another promising option is to incorporate disease-specific information sources and expert knowledge through kernels. An interesting new possibility is to construct data sources from High Content Imaging (HCI) screens [88].

4. Design and construction of the query

An important property of the query is its heterogeneity, as also discussed for group fusion [26]. Up to a point, heterogeneity is desirable because it increases the probability of non-trivial hits; too much heterogeneity, on the other hand, can lead to anomalous behaviour. We constructed four queries representing four subcategories or mechanisms of action: neuroprotective agents, dopaminergic agents, muscarinic antagonists, and NMDA antagonists (see Table 9). Designing a query for prioritization is equivalent to setting the focus of the in silico study: intuitively, we describe the indication of interest with a set of compounds. According to our studies, the optimal query size is around 3-10 compounds, although it can deviate significantly from this range in special cases.

Table 9 – The four Parkinson’s disease related queries with their descriptions.

5. Running the method and evaluating the performance

Several diagnostic steps can be taken to rule out meaningless results. The first is to check the query heterogeneity (e.g. the ISS/UAS value) before the run.

Checking the positions of the query compounds in the output list is also informative. If a few candidate compounds are ranked higher than some query compounds, this can signal strong hits, but if too many candidates are ranked ahead of the query compounds, it is a strong sign of extreme query heterogeneity.
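The rank check described above can be sketched in a few lines. The function name, the returned keys and the placeholder compound names are illustrative assumptions; what counts as "too many" candidates ahead of the query is left to the user, since the text does not fix a threshold.

```python
def query_rank_diagnostics(ranking, query):
    """Locate the query compounds in a ranked list (best first).

    Returns the 1-based rank of each query compound, the candidates ranked
    above the worst-ranked query compound, and the number of candidates
    ranked above the best-ranked query compound.
    """
    position = {name: i + 1 for i, name in enumerate(ranking)}
    query_ranks = {q: position[q] for q in query if q in position}
    if not query_ranks:
        raise ValueError("no query compound appears in the ranking")
    best = min(query_ranks.values())
    worst = max(query_ranks.values())
    above_worst = [n for n in ranking[:worst - 1] if n not in query_ranks]
    above_best = [n for n in ranking[:best - 1] if n not in query_ranks]
    return {
        "query_ranks": query_ranks,
        "candidates_above_worst_query": above_worst,
        "n_candidates_above_best_query": len(above_best),
    }


# Placeholder ranking: q* are query compounds, c* are candidates.
ranking = ["q1", "c1", "q2", "c2", "c3", "q3", "c4"]
diag = query_rank_diagnostics(ranking, {"q1", "q2", "q3"})
```

In this toy ranking three candidates sit above the worst-ranked query compound and none above the best-ranked one, which would be read as a few potential hits rather than as a sign of excessive heterogeneity.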

Query | Description
amantadine, pramipexole, rasagiline | Neuroprotective agents: agents with a disease-modifying effect and the ability to slow or reverse disease progression.
[...] | [...] dopamine receptors, replacing the effect of the missing endogenous ligand.
[...] | Muscarinic antagonists: agents used to reduce the relative cholinergic hyperactivity in the central nervous system caused by dopamine deficiency, restoring the striatal dopaminergic-cholinergic balance.

6. Extracting knowledge from the ordering

There are several ways to extract information from the resulting ordering, in addition to investigating the top hits. One option is to apply filters to reduce the number of compounds that need to be investigated. These filters can be based on chemical structure or on text mining. We applied a PubMed-based filter that removed compounds with no co-occurrences with the terms “PD”, “Parkinson” or “Parkinson’s disease” in PubMed abstracts. Other options are filtering based on physicochemical properties or on functional group occurrences [90].
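A co-occurrence filter of this kind can be sketched as follows. The sketch assumes the abstracts have already been retrieved and grouped by compound in a dictionary; the retrieval step, the function names, the matching rules and the toy abstracts are all illustrative assumptions, not the procedure actually used in the study.

```python
import re

# "PD" is matched as a whole, case-sensitive word to avoid hits such as "Pd";
# "Parkinson" is matched case-insensitively and also covers "Parkinson's disease".
PD_ABBREV = re.compile(r"\bPD\b")
PARKINSON = re.compile(r"parkinson", re.IGNORECASE)


def mentions_pd(abstract):
    """True if the abstract contains one of the PD-related terms."""
    return bool(PD_ABBREV.search(abstract) or PARKINSON.search(abstract))


def cooccurrence_filter(ranking, abstracts_by_compound):
    """Keep, in ranked order, compounds with at least one PD-related abstract."""
    return [
        compound
        for compound in ranking
        if any(mentions_pd(a) for a in abstracts_by_compound.get(compound, []))
    ]


# Invented toy abstracts for placeholder compounds.
abstracts = {
    "compound_A": ["compound_A was evaluated in a Parkinson's disease model."],
    "compound_B": ["compound_B lowers blood pressure."],
    "compound_C": ["compound_C improved motor scores in PD patients."],
}
ranked = ["compound_A", "compound_B", "compound_C", "compound_D"]
kept = cooccurrence_filter(ranked, abstracts)
```

Because the filter preserves the input order, the surviving compounds can be read directly as a shortened prioritization list.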

In addition to filtering, we can use enrichment analysis to test whether a property is enriched at the top of the list. The application of compound set enrichment analysis (CSEA) is discussed in our publication [12]. A compound set here means a set of compounds sharing a property of interest, such as a common mode of action, target, indication or side effect [12]. The idea originates from gene set enrichment analysis (GSEA) [91].

We used the SaddleSum algorithm for enrichment analysis [92]. The intuition behind the algorithm is rather simple. We are interested in the enrichment of certain annotations at the top of our prioritization list. We have a vocabulary V of compound sets; the members of each set share the same annotation. In our examples we use the ATC Level 4 classes as annotations. We obtain a weight for every annotation by adding up the inverse ranks or the scores of all compounds sharing that annotation.
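The inverse-rank variant of this weighting can be sketched as below. The function name and the placeholder annotation labels are assumptions made for illustration; in the setting described above the labels would be ATC Level 4 classes.

```python
from collections import defaultdict


def annotation_weights(ranking, annotations):
    """Sum the inverse ranks (1/rank, rank starting at 1) per annotation.

    `ranking` is the prioritized compound list, best first; `annotations`
    maps a compound to the set of labels it carries.
    """
    totals = defaultdict(float)
    for rank, compound in enumerate(ranking, start=1):
        for label in annotations.get(compound, ()):
            totals[label] += 1.0 / rank
    return dict(totals)


# Placeholder compounds and labels.
ranking = ["c1", "c2", "c3", "c4"]
annotations = {
    "c1": {"class_X"},
    "c2": {"class_X", "class_Y"},
    "c4": {"class_Y"},
}
weights = annotation_weights(ranking, annotations)
```

Here `class_X` collects 1/1 + 1/2 and `class_Y` collects 1/2 + 1/4, so annotations concentrated near the top of the list receive larger sums.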

To answer the question 'Is the given annotation significantly enriched at the top of the list?' we have to ask how likely it is that, if we randomly pick the same number of entities, the sum of their weights is at least S, the sum observed for the annotation. This probability is our p-value. If it is low, the observed enrichment is unlikely to be due to chance.
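The question posed above can be illustrated with a plain Monte Carlo estimate. This is not the SaddleSum algorithm, which, as we understand it, estimates the same tail probability analytically with a saddlepoint approximation; the sampling version below only makes the null model explicit. The function name, the number of draws and the toy weights are assumptions.

```python
import random


def empirical_enrichment_p(weights, set_members, n_draws=2000, seed=0):
    """Estimate P(sum of weights of m random compounds >= observed sum S).

    `weights` maps every ranked compound to its weight (e.g. inverse rank);
    `set_members` are the compounds sharing the annotation; m is the number
    of set members present in `weights`.
    """
    rng = random.Random(seed)
    names = list(weights)
    members = [c for c in set_members if c in weights]
    m = len(members)
    if m == 0:
        raise ValueError("no member of the compound set has a weight")
    observed = sum(weights[c] for c in members)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(names, m)  # m compounds drawn without replacement
        if sum(weights[c] for c in sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)


# Toy list of 50 placeholder compounds weighted by inverse rank.
toy_weights = {f"c{i}": 1.0 / i for i in range(1, 51)}
p_top = empirical_enrichment_p(toy_weights, ["c1", "c2", "c3"])
p_bottom = empirical_enrichment_p(toy_weights, ["c48", "c49", "c50"])
```

A set made of the three top-ranked compounds gets a very small estimate, whereas a set made of the three bottom-ranked compounds gets an estimate close to 1. Sampling becomes expensive when very small p-values must be resolved, which is one reason an analytic approximation is attractive.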

In document: Prediction of biological activity using heterogeneous information sources (pages 73-76)