InterCluster-A Tool to Cluster Protein-Protein Interactions : Datamining of Protein Interactions in Primary Open Angle Glaucoma

A growing number of diseases seem to be associated with protein aggregation, and each disease involves several proteins. To obtain a better understanding of these diseases, the proteins involved in them and their primary interaction partners were collected and clustered. A tool was developed to aid in clustering these proteins and all their primary interactors. The tool was used to cluster the proteins involved in Primary Open Angle Glaucoma and their primary interactors. A cluster was selected for analysis based on the availability of experimental evidence in the literature, and the localization of the proteins in the chosen cluster was collected. On analysis, four of the proteins in the cluster were found to be associated with heparin binding. Primary open angle glaucoma is known to be associated with loss of retinal vasculature. The tool helped in finding a cluster of protein interactions supported by more experimental data, and in identifying, from about 10500 proteins, the four disease-associated proteins that are involved in heparin binding. This would not have been feasible manually. Further study of the role of these four proteins, based on heparin binding and loss of vasculature in primary open angle glaucoma, would give a better understanding of the disease and the molecular mechanism involved in it.

Corresponding authors: Pandaranayaka, E.P.J. Center of Excellence in Bioinformatics, School of Biotechnology, Madurai Kamaraj University, Madurai – 625021, India. E-mail: eswari@mkustrbioinfo.com, eswaripj@gmail.com

Received Date: May 25, 2015
Accepted Date: July 14, 2015
Published Date: July 16, 2015

Citation: Dhanya, S., et al. InterCluster-A Tool to Cluster Protein-Protein Interactions: Datamining of Protein Interactions in Primary Open Angle Glaucoma. (2015) J Bioinfo Proteomics Rev 1(1): 15-20. www.ommegaonline.com


Localization and function
The localization information of the proteins involved in the disease was collected using the CoPub Portal (www.copub.org). The functions of the proteins were obtained from UniProt and confirmed by a literature search.

Tool development
A tool was developed to cluster the proteins involved in the disease and their primary interactors. Python was used as the back end, and the web interface was designed using Hypertext Preprocessor (PHP) and Hyper Text Markup Language (HTML). The web interface allows the user to upload the lists of primary interactors in a compressed format. The input to the tool is a compressed folder containing one text file per protein; each file is named after the corresponding protein and holds the list of its primary interactors. The tool produces clusters at all levels. The five output files generated are explained below.
Occurrence of primary interactor among the proteins: The occurrence of primary interactor among the proteins is given in the file "Name.out". This includes the list of primary interactors along with the protein with which they interact.
Interaction number of the primary interactors: The number of interactions for each primary interactor is given in the file "Num.out".
Clusters within interaction group: The cluster information at all levels is in the file "Cluster.out". Under each "Interaction number" group there are various sub-groups, and the clusters in each of them are shown. X>Y implies that there are X primary interactors common to Y proteins involved in the disease.
Consolidated list of all clusters: The consolidated list of all clusters is presented in "List.out". The clusters are obtained in the following format:
Detailed information about the clusters: The detailed information about the clusters is given in "Output.out". This file gives elaborate details about each cluster and how the cluster is formed.
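The grouping described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the InterCluster source: the function names (`invert`, `interaction_numbers`, `cluster`), the in-memory dictionary input, and the rule that a cluster consists of interactors sharing exactly the same set of disease proteins are all assumptions made for the example, and the layout of the real output files may differ.

```python
from collections import defaultdict


def invert(interactors_by_protein):
    """Map each primary interactor to the set of disease proteins it interacts with."""
    proteins_by_interactor = defaultdict(set)
    for protein, interactors in interactors_by_protein.items():
        for interactor in interactors:
            proteins_by_interactor[interactor].add(protein)
    return dict(proteins_by_interactor)


def interaction_numbers(proteins_by_interactor):
    """Interaction number: how many disease proteins an interactor is linked to."""
    return {i: len(ps) for i, ps in proteins_by_interactor.items()}


def cluster(proteins_by_interactor):
    """Group interactors that share the same set of disease proteins (assumed rule).

    Returns (label, interactors, proteins) tuples, where label is "X>Y":
    X primary interactors common to Y disease proteins.
    """
    groups = defaultdict(list)
    for interactor, proteins in proteins_by_interactor.items():
        groups[frozenset(proteins)].append(interactor)
    clusters = []
    for proteins, interactors in groups.items():
        label = f"{len(interactors)}>{len(proteins)}"
        clusters.append((label, sorted(interactors), sorted(proteins)))
    # Highest interaction number first, then by label, for stable output.
    clusters.sort(key=lambda c: (-len(c[2]), c[0], c[1]))
    return clusters


# Hypothetical toy input: disease protein -> list of primary interactors.
toy = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["C"],
}
inverted = invert(toy)
for label, interactors, proteins in cluster(inverted):
    print(label, interactors, proteins)
```

In this toy input, interactors A and B are both shared by P1 and P2, so they fall into a single 2>2 cluster, while C, shared by P1 and P3, forms a 1>2 cluster of its own.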

Validation of the tool
To validate the tool, a dataset was created based on the T-cell receptor pathway. The proteins on the membrane were considered equivalent to the proteins involved in the disease, and the proteins directly interacting with these membrane proteins were considered the primary interactors. The resulting files were then given as input to the tool.
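For illustration, an input archive of this kind could be assembled and read back as sketched below. The ZIP format, the `<protein>.txt` naming with one interactor per line, and the helper names `write_archive` and `read_archive` are assumptions; the text only states that each file is named after its protein and that the folder is compressed. The example data use the membrane proteins PD-1 and CTLA4 and their shared interactor SHP1 from the T-cell receptor pathway.

```python
import io
import zipfile
from pathlib import PurePosixPath


def write_archive(interactors_by_protein):
    """Pack one text file per protein (one interactor per line) into an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for protein, interactors in interactors_by_protein.items():
            archive.writestr(f"{protein}.txt", "\n".join(interactors) + "\n")
    buffer.seek(0)
    return buffer


def read_archive(source):
    """Return {protein: [interactors]} from a ZIP of per-protein text files."""
    result = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            if name.endswith("/") or path.suffix != ".txt":
                continue  # skip directory entries and non-text files
            lines = archive.read(name).decode("utf-8").splitlines()
            result[path.stem] = [line.strip() for line in lines if line.strip()]
    return result


# Membrane proteins stand in for the disease proteins; SHP1 is their primary interactor.
validation = {"PD-1": ["SHP1"], "CTLA4": ["SHP1"]}
loaded = read_archive(write_archive(validation))
print(loaded)
```

Keeping the archive in memory avoids touching the file system; passing a path to an uploaded file to `read_archive` would work the same way, since `zipfile.ZipFile` accepts both.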

Proteins and their primary interactors
Primary open angle glaucoma (POAG) is a major cause of blindness. It is characterized by progressive degeneration of the optic nerve and is usually associated with elevated intraocular pressure (IOP). This results in loss of retinal ganglion cell axons, along with supporting glia and vasculature. Reducing the intraocular pressure prevents progression of the disease at all stages [11,12].
Thirty-one proteins are involved in POAG, and fifteen of them have a known three-dimensional structure (Table 1). The number of primary interactors per protein varied from tens to five hundred. The total number of primary interactors for all the proteins together was around 10500.

InterCluster
InterCluster is a web-based server for clustering proteins and their primary interactors. It is freely accessible at http://bicmku.in/InterCluster (Figure 1). The total number of proteins involved in POAG and their primary interactors from the STRING database is around 10500. InterCluster clustered these proteins into 16 groups and 110 clusters. The cluster with the maximum interaction number, [7 > 11], was selected as Cluster 1. The [9 > 3] sub-cluster in the [11:9] cluster was taken as Cluster 2, as it has the most experimental evidence.

Validation of the tool
The tool was validated using primary interactions in the T-cell receptor pathway. It was able to group the primary interactors and the proteins into clusters that match their involvement in the pathway (Figure 2). For example, two membrane proteins in the pathway, PD-1 and CTLA4, interact with the protein SHP1. Here, SHP1 is considered a primary interactor. Hence the cluster obtained is: SHP1 > PD-1, CTLA4

Cluster 1 has the maximum interaction number. Eleven proteins involved in POAG have seven primary interactors (Table 2) in common. The functions of the proteins involved in this cluster were quite diverse. From the localization data acquired, some of the POAG proteins and the primary interactors were found to be co-localized. For example, MYOC (UniProt ID: Q99972) and HRAS (UniProt ID: P01112) are co-localized in mitochondria [13,14]. Studies also show that both MYOC and HRAS play a role in cell proliferation and survival [15,16]. Further analysis was not carried out due to lack of experimental data in the literature.

Cluster 2 was chosen for analysis as it had the most experimental evidence. Three proteins involved in POAG have nine primary interactors (Table 3) in common. In this cluster, four proteins, MYOC (UniProt ID: Q99972), APOE (UniProt ID: P02649), FN1 (UniProt ID: P02751) and CYP1B1 (UniProt ID: Q16678), are proteins involved in POAG. Though MYOC is a protein involved in POAG, in this cluster it occurs only as a primary interactor. It is known that mutations in MYOC and APOE cause POAG [17]. Accumulation of FN1 in the trabecular meshwork is associated with POAG [18]. Mutant CYP1B1 causes MYOC upregulation, which in turn contributes to POAG pathogenesis [19]. Clinical studies show that POAG is significantly more prevalent in a group of people with optic cup-retinal venous occlusion (OC-RVO). RVO is the blockage of the small veins that carry blood away from the retina. The mean IOP is significantly higher in OC-RVO than in other types of RVO [20].
POAG is also associated with an increase in IOP and loss of vasculature [11]. From this analysis, four proteins in this cluster, APOE, MYOC, ELANE (UniProt ID: P08246) and KNG1 (UniProt ID: P01042), were found to be involved in heparin binding [21][22][23][24]. Further, based on the localization of the proteins [25][26][27][28][29][30][31] 4 in Cluster 2 (Figure 3), it is most probable that STAT3 (UniProt ID: P40763) interacts with the POAG proteins APOE and CYP1B1 in the cytoplasm. MYOC might also interact with the POAG proteins FN1 and CYP1B1 in the endoplasmic reticulum. The localization information for these proteins, gathered from the literature, is based on experimental analysis.

Discussion
POAG is associated with elevated IOP, degeneration of the optic nerve and loss of vasculature [11]. InterCluster, a tool developed to cluster protein-protein interactions, has helped in datamining the proteins involved in POAG and their primary interactions. Among the roughly 10500 proteins that were primary interactors of the 31 proteins involved in POAG, we identified four proteins, APOE, MYOC, ELANE and KNG1, in one cluster (Cluster 2) that have been shown experimentally to be involved in heparin binding [21][22][23][24]. This would not have been feasible without the aid of the tool. These proteins may be involved in the pathogenesis of POAG by playing a major role in RVO, leading to an increase in IOP and loss of vasculature, which is one of the reasons for neuronal death in POAG [11]. Understanding the protein-protein interactions involved in POAG and other protein aggregation diseases would also help in elucidating protein-drug interactions, as in the case of clozapine-induced agranulocytosis [32], which in turn would help in finding appropriate drugs for these currently incurable diseases.

The pathogenesis of POAG has remained a mystery despite decades of research, as only 2-3% of cases are associated with mutations. With increasing research, more proteins are known to be involved in the disease based on mutation analysis, but the underlying cause is not yet understood. Our study offers a new way to look at the pathogenesis of POAG and motivates further detailed research in this direction.