Discovery and Identification of Serum Potential Biomarkers for Colorectal Cancer Using TMT Quantitative Proteomics

Colorectal cancer (CRC) is a leading cause of cancer deaths worldwide. Nevertheless, the exact mechanism of CRC occurrence remains unclear, and biomarkers for early detection are lacking. In this study, we employed a TMT-based quantitative proteomic approach to analyze the proteomic changes in sera collected from early-stage colorectal cancer patients and healthy volunteers. Among the 282 proteins identified, a total of 76 proteins were found to be differentially expressed in the patients. Bioinformatics analysis revealed that the differentially expressed proteins were related to focal adhesion and the p53 signaling pathway. The differential expression of S-adenosylmethionine synthase (MAT2A) and ATP-binding cassette sub-family B member 9 (ABCB9) was further confirmed by ELISA analysis. The area under the receiver operating characteristic curve was 0.908 for MAT2A and 0.980 for ABCB9. In conclusion, MAT2A and ABCB9 may be potential protein biomarkers of CRC. Our research may shed light on the early diagnosis of CRC.

ma [5]. If not removed, these polyps may invade nearby tissues and grow into and beyond the wall of the colon and rectum. This malignant growth may provide access to the lymphatic and circulatory systems, thereby promoting the spread of cancer cells to distant organs [6,7]. The traditional pathway observed in most sporadic cancers is characterized by a cascade of accumulating mutations [8]. Typically, the first mutation occurs in the APC gene, which affects cell division. Subsequent mutations then develop in the KRAS oncogene, which has downstream effects on cell growth, differentiation and survival. Over time, these mutations can cause a loss of function of the p53 gene, a master regulator of transcription and apoptosis, thus impacting a wide range of cellular functions and resulting in carcinogenesis [9,10].
To achieve consensus on a globally recognized standard for classifying the extent of cancer spread, the TNM system was developed by the UICC. T represents the size and extension of the primary tumor, N represents lymph node involvement, and M represents distant metastasis [11]. Metastasis to regional lymph nodes is one of the most reliable prognostic criteria for CRC patients and has been adopted globally in the TNM staging system [12]. Patients with stage I/II disease have a better prognosis than those with stage III/IV disease. It is therefore urgent to explore efficient criteria for early diagnosis [13]. Currently, several biomarkers, such as carcinoembryonic antigen (CEA) and CA19-9, have been translated into clinical practice, although they are not highly promising diagnostic targets for personalized medicine [14]. There is thus a critical need for highly sensitive and specific biomarkers of the earliest disease stage [15].
Proteomic approaches have been widely used to identify new biomarkers of CRC and to elucidate its molecular mechanisms [16]. This work has led to the identification of many proteins that could potentially be employed as biomarkers. In this study, we enrolled node-negative patients with early-stage CRC and healthy volunteers to search for potential CRC biomarkers using proteomics.

Reagents
The BCA Protein Assay Kit, Micro BCA Protein Assay Kit, and TMT Mass Tagging Kits and Reagents were purchased from Thermo Fisher Scientific. Trypsin was provided by Gibco, and sequencing-grade trypsin was from Promega. Acetonitrile (ACN) was obtained from J. T. Baker. All chemicals and solvents were of analytical grade or LC-MS grade.

Serum samples
Written informed consent was obtained from all subjects. This study included patients with CRC (n = 54) who underwent surgical treatment at Beijing Chao-Yang Hospital. The initial diagnosis of colorectal cancer was made by ultrasonography and dynamic computed tomography (CT) or magnetic resonance imaging (MRI), and was confirmed by pathological examination of the resected tissues after surgery. No patients had a biopsy prior to the operation or received preoperative treatment. Blood samples were drawn within one week of the CT or MRI diagnosis and prior to surgical treatment. Healthy volunteers (n = 30) were recruited at the Peking University Hospital for blood collection in accordance with approved Institutional Review Board (IRB) protocols. Serum separated from whole blood was stored at −80 °C within 4 h of collection and divided into multiple aliquots to avoid repeated freeze-thaw cycles. For proteomic profiling, two microliters of serum from each subject in a group were pooled on ice.

Protein digestion
Total protein concentrations were measured with the BCA Protein Assay Kit. Equal amounts of protein from each sample were then digested in solution with sequencing-grade trypsin (Promega) at 1% (w/w) overnight at 37 °C. After digestion, the peptides were quantified with the Micro BCA Protein Assay Kit.

Tandem Mass Tag (TMT) Labeling
Fifty micrograms of peptides from each sample group were used for TMT labeling. The labeling experiment was performed essentially according to the manufacturer's protocol. The peptides from each sample were redissolved in dissolution buffer and then incubated with a specific TMT tag for 1 h at room temperature. The healthy control group sample was labeled with reporter tag 126, while the CRC patient group sample was labeled with reporter tag 127. To quench the reaction, 8 μL of 5% hydroxylamine was added to each sample and incubated for 20 min. Samples with different tags were then mixed in equal amounts for peptide purification. Subsequently, the labeled peptides were desalted with Sep-Pak Vac C18 cartridges (Waters, Milford, MA, USA). The final peptide samples were dried using a rotary vacuum concentrator [17].

Protein identification and relative quantification
The mass spectrometric analyses of the HILIC fractions were performed on an LTQ-Orbitrap XL (Thermo Scientific) instrument coupled to an EASY-nLC using a nanoelectrospray ion source (Thermo Scientific, Proxeon Biosystems). The LC separation of the peptides was carried out using a homemade reversed-phase column packed with ReproSil-Pur 120 C18-AQ 3 μm resin (Dr. Maisch GmbH). The peptides were eluted in a 4-hour linear gradient from solvent A (0.1% (v/v) formic acid) to solvent B (0.1% (v/v) formic acid, 95% (v/v) acetonitrile) at a constant flow of 300 nL/min. The MS/MS spectra of the 3 most intense ions were acquired in centroid mode by higher-energy C-trap dissociation (HCD) and collision-induced dissociation (CID), allowing the detection of the low-mass TMT reporter ions. All LTQ-Orbitrap raw data files were processed and quantified using Proteome Discoverer version 1.4 (Thermo Scientific). Data were considered reliable when the p-value was less than 0.05 and the error factor was less than 2. A fold-change ratio > 1.5 (upregulated) or < 0.67 (downregulated) was selected as the cutoff to designate significant changes in protein expression.
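The reliability filter and fold-change cutoffs described above can be sketched in a few lines of Python. This is a minimal illustration, not the Proteome Discoverer workflow itself: the function name, the argument names, and the example values are hypothetical, and the ratio is assumed to be the CRC reporter intensity (tag 127) divided by the control reporter intensity (tag 126).

```python
def classify_protein(ratio, p_value, error_factor,
                     up_cutoff=1.5, down_cutoff=0.67,
                     alpha=0.05, max_error_factor=2.0):
    """Label one protein from its CRC/control TMT ratio (assumed 127/126)."""
    # Reliability filter from the text: p < 0.05 and error factor < 2.
    if p_value >= alpha or error_factor >= max_error_factor:
        return "not reliable"
    # Fold-change cutoffs from the text: > 1.5 up, < 0.67 down.
    if ratio > up_cutoff:
        return "upregulated"
    if ratio < down_cutoff:
        return "downregulated"
    return "unchanged"


# Hypothetical values for illustration only; these are not data from the study.
examples = {
    "protein_A": (2.10, 0.010, 1.2),
    "protein_B": (0.50, 0.020, 1.4),
    "protein_C": (1.10, 0.001, 1.1),
    "protein_D": (3.00, 0.200, 1.3),
}
for name, (ratio, p_value, error_factor) in examples.items():
    print(name, classify_protein(ratio, p_value, error_factor))
```

Because 0.67 is approximately 1/1.5, the two cutoffs are roughly symmetric on a logarithmic scale.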

Bioinformatics analysis
Gene Ontology (GO) terms (cellular component, biological process, and molecular function) and KEGG pathways were annotated with Genecodis3 (http://genecodis.cnb.csic.es./). The protein-protein interaction network was analyzed with the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) software (http://string.embl.de/).

ELISA analysis
A human MAT2A ELISA kit (EIAab Science, Wuhan, Hubei, China; detection limit 5 ng/mL) and a human ABCB9 ELISA kit (EIAab Science, Wuhan, Hubei, China; detection limit 10 ng/mL) were used to measure the concentrations of the two proteins in each serum sample of the validation set (n = 56, 16 healthy controls and 40 CRC patients). Measurements were performed in duplicate according to the manufacturer's instructions. Briefly, diluted samples were incubated for 2 h in 96-well plates coated with antibodies against the target proteins. After incubation and washing, the biotinylated detection antibodies were added to the wells for 1 h at room temperature with gentle shaking. After a second incubation and washing, the prepared HRP-Streptavidin solution was added to each well for 1 h. The colorimetric TMB substrate reagent was then added to the wells at room temperature for 20 min, and sulfuric acid was added to stop the enzyme reaction. Finally, absorbance was read on a microplate spectrophotometer (Bio-Rad) at a wavelength of 450 nm. The protein concentration was estimated from a double logarithmic plot of the measured standard curve.
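Reading a concentration off a double logarithmic standard curve can be sketched as follows. The kit's exact curve-fitting procedure is not described here, so this sketch assumes a straight-line least-squares fit of log10(absorbance) against log10(concentration); the function names, the standard concentrations, and the absorbance values are hypothetical, and absorbances are assumed to be blank-corrected already.

```python
import math


def fit_loglog(concentrations, absorbances):
    """Least-squares line through (log10 concentration, log10 absorbance)."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(a) for a in absorbances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def absorbance_to_concentration(absorbance, slope, intercept, dilution=1.0):
    """Invert the fitted line and correct for the sample dilution factor."""
    log_conc = (math.log10(absorbance) - intercept) / slope
    return dilution * 10 ** log_conc


# Hypothetical standards (pg/mL) and absorbances at 450 nm; not study data.
standards = [31.25, 62.5, 125.0, 250.0, 500.0, 1000.0]
readings = [0.08, 0.15, 0.29, 0.55, 1.02, 1.90]

slope, intercept = fit_loglog(standards, readings)
# Duplicate wells are averaged before the curve is inverted.
sample_mean = (0.41 + 0.43) / 2
print(absorbance_to_concentration(sample_mean, slope, intercept, dilution=2.0))
```

Samples whose absorbance falls outside the range of the standards should be re-diluted rather than extrapolated, since the log-log line is only supported between the lowest and highest standard.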

Construction of the diagnostic model
The serum levels of MAT2A and ABCB9 were used to construct a diagnostic model. During model construction, the diagnostic score of a healthy control was set to "0," while that of a CRC patient was set to "1." The forward stepwise method was used to determine which proteins should be included in or excluded from the diagnostic model [18]. ROC curves were then constructed.

Statistical analysis
The numerical data were analyzed with SPSS software, version 19.0 (SPSS, Chicago, IL, USA). Experimental data were presented as mean ± SD and p < 0.05 was considered as statistically significant. Comparison of two groups was tested using the

Experimental design
An overview of the clinical samples and the study design is shown in Figure 1. In the initial screening and discovery phase, serum samples from CRC patients and from healthy controls were pooled separately. After digestion, TMT-coupled LC-MS/MS analysis was used to screen candidate protein biomarkers. In the subsequent verification stage, a panel of candidate biomarkers was verified by ELISA. Finally, receiver operating characteristic (ROC) curves were generated. In the discovery phase, candidate serum protein biomarkers were identified and quantified using the TMT-coupled LC-MS/MS method. In the validation phase, two proteins were further analyzed using ELISA, and diagnostic models were evaluated by ROC curve analysis.

Protein identification and relative quantification
Serum samples from fourteen colorectal cancer patients (Table S1) and the same number of healthy controls were pooled separately. After TMT-coupled LC-MS/MS analysis, proteins were identified and quantified according to the criteria for protein quantification. A total of 282 proteins were identified and 263 proteins were quantified. In the serum of CRC patients, a total of 76 differentially expressed proteins were found. Of these, 38 proteins were upregulated (Supporting Information Table S2) and 38 proteins were downregulated (Supporting Information Table S3).

Bioinformatics analysis
GO cellular component analysis revealed that the differentially expressed proteins were mostly located in the extracellular region, implying that most of them were secretory proteins (Figure 2A). Biological process analysis suggested that most of these differentially expressed proteins have a role in blood coagulation, platelet activation, immune response, platelet degranulation and complement activation (classical pathway), suggesting that the diverse regulation of signaling pathways is dependent upon expression of these proteins (Figure 2B). In addition, molecular function analysis indicated that the differential proteins mainly clustered to protein binding, protein homodimerization activity, eukaryotic surface binding and antigen binding. The most common functional annotation was binding activity (Figure 2C). The most significant KEGG pathways, mapped using Genecodis3, clustered to complement and coagulation cascades, chemokine signaling pathway, focal adhesion, ECM-receptor interaction, p53 signaling pathway, and ubiquinone and other terpenoid-quinone biosynthesis. These pathways are closely related to tumorigenesis and metastasis, and to the immune response (Figure 2D). The presence of these classical cancer-related signaling pathways supports the reliability of the mass spectrometry data and lays the foundation for selecting potential biomarkers. Additionally, STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) database analysis showed that most of the differentially expressed proteins were involved in physical or functional interactions that constitute a network (Figure 2E). Based on the bioinformatics analysis, two potential candidate biomarkers (MAT2A and ABCB9) were selected for sandwich ELISA.

Verification of the candidate biomarkers by ELISA
In the validation phase, the serum concentrations of the two candidate biomarkers, MAT2A and ABCB9, were further measured by ELISA in 56 subjects (16 healthy controls and 40 CRC patients). The patients in the validation cohort were chosen independently of those in the discovery cohort to increase the sample size and to confirm the consistency and generality of the findings. The serum concentrations of MAT2A in CRC patients and healthy controls were 767.72 ± 394.88 and 353.48 ± 264.89 pg/mL, respectively, while the serum concentrations of ABCB9 in CRC patients and healthy controls were 118.13 ± 157.07 and 5.31 ± 3.06 pg/mL, respectively. The serum concentrations of MAT2A (p < 0.0001; Figure 3A) and ABCB9 (p < 0.0001; Figure 3B) were significantly higher in the CRC patient group than in the healthy control group. These results were consistent with the TMT labeling proteomics data.
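As a rough plausibility check on the reported group differences, a test statistic can be computed directly from the summary values above (mean ± SD, with 40 CRC patients and 16 healthy controls). The sketch below assumes Welch's unequal-variance t-test, which is an assumption made only for illustration and may not be the test used in the study; the function name is hypothetical. It returns the t statistic and the Welch-Satterthwaite degrees of freedom, and leaves the p-value to a statistics package.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    var_term1 = sd1 ** 2 / n1
    var_term2 = sd2 ** 2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(var_term1 + var_term2)
    dof = (var_term1 + var_term2) ** 2 / (
        var_term1 ** 2 / (n1 - 1) + var_term2 ** 2 / (n2 - 1)
    )
    return t_stat, dof


# Summary statistics as reported in the text (pg/mL): CRC patients vs controls.
mat2a_t, mat2a_dof = welch_t_from_summary(767.72, 394.88, 40, 353.48, 264.89, 16)
abcb9_t, abcb9_dof = welch_t_from_summary(118.13, 157.07, 40, 5.31, 3.06, 16)

print("MAT2A: t = %.2f, df = %.1f" % (mat2a_t, mat2a_dof))
print("ABCB9: t = %.2f, df = %.1f" % (abcb9_t, abcb9_dof))
```

For ABCB9 the standard deviation in the patient group exceeds the mean, which suggests a skewed distribution; a rank-based test on the individual measurements may be more appropriate than any t-test in that case.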

Diagnostic value of the model
The diagnostic values of serum MAT2A and ABCB9 were evaluated by ROC curve analysis. The areas under the curve (AUCs) were 0.908 (P < 0.001) and 0.980 (P < 0.001) for MAT2A and ABCB9, respectively, relative to the healthy volunteers (Figure 4A and 4B).
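The AUC has a simple interpretation that can be computed without drawing the curve: it equals the probability that a randomly chosen patient has a higher marker concentration than a randomly chosen control, with ties counted as one half. The sketch below illustrates this rank-based (Mann-Whitney) formulation; it is not the SPSS procedure used in the study, and the function name and the concentrations are hypothetical.

```python
def roc_auc(patient_values, control_values):
    """AUC as the fraction of patient/control pairs ranked correctly."""
    score = 0.0
    for patient in patient_values:
        for control in control_values:
            if patient > control:
                score += 1.0   # pair ranked correctly
            elif patient == control:
                score += 0.5   # ties count as half
    return score / (len(patient_values) * len(control_values))


# Hypothetical serum concentrations (pg/mL) for illustration; not study data.
patients = [820.0, 640.0, 310.0, 1150.0, 705.0]
controls = [290.0, 410.0, 180.0, 350.0]

print(roc_auc(patients, controls))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.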

Discussion
Serum is an easily accessible sample for identifying potential protein biomarkers. Moreover, serum can reflect physiological and pathological conditions [19]. In addition, the serum proteomics method used here, TMT-coupled LC-MS/MS, has been shown to be sensitive and reliable for high-throughput protein identification and for relative quantification of up to six groups simultaneously.
In this study, 263 proteins were quantified, of which 76 proteins were differentially expressed, with fold-change ratios above 1.5 or below 0.67. Bioinformatics analysis showed that most of the differentially expressed proteins were involved in tumorigenesis and metastasis, and in the immune response.
ELISA analysis indicated significant differences in serum MAT2A and ABCB9 between the CRC patients and healthy volunteers, suggesting that these proteins may serve as potential serum biomarkers for colorectal cancer. Previous reports revealed that MAT2A expression is increased in human colon cancer and in colon cancer cells treated with mitogens, whereas silencing MAT2A resulted in apoptosis [20]. MAT2A is also up-regulated in liver cancer and is a potentially important drug target [21]. Similarly, our study found that the serum MAT2A concentration was significantly increased in CRC patients, indicating that it may be involved in the occurrence and development of CRC. ATP-binding cassette (ABC) transporters play a crucial role in the development of drug resistance by effluxing anticancer agents out of cancer cells [22]. Existing research showed that microRNA-24 increased the sensitivity to paclitaxel in drug-resistant breast carcinoma cell lines by targeting ABCB9 [23], and that microRNA-31 inhibits cisplatin-induced apoptosis in non-small cell lung cancer cells by regulating the drug transporter ABCB9 [24]. Our findings indicated that the serum concentration of ABCB9 was significantly higher in the CRC patient group, consistent with the TMT labeling proteomics data. In conclusion, MAT2A and ABCB9 are potential CRC biomarkers.
Moreover, the levels of MAD2L2, PRPF8 and PF4 differed more than tenfold between CRC patients and healthy volunteers. MAD2L2 inhibits the activation of the anaphase-promoting complex (APC), thereby regulating progression through the cell cycle [25]. PRPF8 functions as a scaffold that mediates the ordered assembly of spliceosomal proteins, and defects in it may cause missplicing in myeloid malignancies [26]. PF4 is released during platelet aggregation and inhibits endothelial cell proliferation [27]. These proteins merit further, more detailed study.
AUC was employed to show the accuracy of the biomarkers. The results showed that the AUCs of MAT2A and ABCB9 were 0.908 and 0.980, respectively, indicating high sensitivity and specificity of the individual biomarkers for CRC.

Conclusion
CRC remains a major clinical challenge, and effective noninvasive diagnostic techniques are lacking. It is therefore urgent and important to find methods for early diagnosis in the clinic. To address this need, we conducted a study to identify CRC biomarkers in serum by TMT-coupled LC-MS/MS and ELISA. MAT2A and ABCB9 are regarded as potential serum protein biomarkers for CRC. The diagnostic model, composed of these two proteins, may provide meaningful data for the diagnosis of CRC. We expect that our study will provide valuable clues for both discovering clinical biomarkers and elucidating previously unknown mechanisms of CRC pathogenesis.