Identification of target genes in cardiomyopathy with fibrosis and cardiac remodeling

Background To identify genes probably associated with chronic heart failure and to predict potential target genes for dilated cardiomyopathy using bioinformatics analyses. Methods Gene expression profiles (series GSE3585 and GSE42955) of cardiomyopathy patients and healthy controls were downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) between the two groups, comprising a total of 14 cardiomyopathy patients and 10 healthy controls, were identified with the limma package of R. The Database for Annotation, Visualization and Integrated Discovery (DAVID) was used to analyze enriched biological processes, and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) was used to analyze the protein-protein interaction (PPI) network. Potential drugs were predicted from the preliminarily identified genes using the Connectivity Map (CMap). Results Eighty-nine DEGs were identified (57 up-regulated and 32 down-regulated). The most enriched Gene Ontology (GO) terms (P < 0.05) contained genes involved in the extracellular matrix (ECM) and biological adhesion, and the enriched pathways (P < 0.05, ES > 1.5) included ECM-receptor interaction, focal adhesion and transforming growth factor beta (TGF-β) signaling. Fifty-one DEGs were found to encode interacting proteins. Eleven key genes, including CTGF, POSTN, CORIN and FIGF, were identified along with related transcription factors. Conclusion Bioinformatics-based analyses reveal target genes probably associated with cardiomyopathy, providing clues for pharmacological therapies aimed at these targets.


Background
Patients with dilated cardiomyopathy (DCM) often present with progressive congestive heart failure, arrhythmia and thromboembolic disease in the form of left ventricular mural thrombus and/or stroke [1]. Apart from coronary heart disease and hypertension, DCM is a major cause of heart failure [2]. Heart failure is a progressive chronic disease with a 5-year survival rate of less than 50% [3]. Histologically confirmed diagnoses indicate that the prevalence of DCM is 14% to 52% among patients with a proven history of myocarditis [4]. The high morbidity and mortality underscore the need for deeper investigation of the mechanisms responsible for the development of heart failure in DCM [5]. B-type natriuretic peptide (BNP) is currently a commonly used biomarker for the diagnosis of DCM. However, it is not very specific: elevated levels can occur in a variety of cardiovascular diseases of heterogeneous etiology and cannot be explained by impaired left ventricular function alone [6]. The major goals of DCM treatment are to reduce mortality and morbidity, to relieve symptoms, and to prevent or, to some extent, reverse ventricular remodeling [7]. Fett et al. emphasized the need for polymerase chain reaction (PCR) testing prior to immunosuppressive therapy [8]. Hamshere et al. described a therapy combining granulocyte colony-stimulating factor (G-CSF) with intracoronary autologous bone marrow-derived cells (BMCs) to improve left ventricular systolic function in patients with DCM [9]. Beta-blockers and the pacemaker current inhibitor ivabradine have been shown to help reverse cardiac remodeling. However, responses to beta-blockers or ivabradine vary between patients: in some, the left ventricular ejection fraction improves significantly as a result of reversed or attenuated remodeling, while in others it remains unchanged [10].
More studies are needed on treatments that improve the outcome of patients with DCM and on biomarker screening that allows a precise diagnosis of the disease. Such studies could improve prognosis by lowering the risk of heart failure and related complications. It is therefore crucial to understand the underlying mechanisms and to find biomarkers with good specificity and sensitivity.
Gene chip (microarray) databases are openly accessible, and the gene chip technique is widely used to analyze gene function in the post-genome era [11]. It is a high-throughput technique with high specificity and sensitivity. Previous studies have partially elucidated the mechanisms underlying DCM with this approach [12,13]. In this study, data from two platforms (GPL96 and GPL6244) are combined to analyze differentially expressed genes, enrichment of GO terms and pathways, and protein-protein interactions in DCM, and to predict potential targets and drugs for better treatment of the disease.

Microarray gene expression
Gene expression profiles of series GSE3585 (platform GPL96) and series GSE42955 (platform GPL6244) were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). Data from seven cardiomyopathy patients and five healthy controls were randomly selected from each series, giving a total of 14 patients and 10 healthy controls. The samples had been hybridized on the Affymetrix Human Genome U133A array (GPL96) or the Affymetrix Human Gene 1.0 ST array (GPL6244) (Affymetrix, Santa Clara, CA, USA).

Identification of Differentially Expressed Genes (DEGs)
The limma package of R/Bioconductor was used to screen for DEGs (thresholds: P < 0.05 and |log2 fold change| ≥ 1). Fifty-seven up-regulated (gene set A) and 32 down-regulated (gene set B) genes were identified. Hierarchical clustering and heat-map visualization were performed with a heat-map package in R.
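The threshold rule above can be sketched as follows. This is a minimal illustration, not the limma workflow itself: it assumes a per-gene table of log2 fold changes and P values has already been produced (for example, from limma's `topTable` output), and the `GeneStat` record, the `split_degs` function and the gene names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    symbol: str
    log2_fc: float  # log2 fold change, DCM vs. control (e.g. limma's logFC column)
    p_value: float  # P value of the comparison


def split_degs(stats, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into up- and down-regulated lists using P < p_cutoff and |log2FC| >= lfc_cutoff."""
    up, down = [], []
    for g in stats:
        if g.p_value >= p_cutoff:
            continue  # not significant
        if g.log2_fc >= lfc_cutoff:
            up.append(g.symbol)
        elif g.log2_fc <= -lfc_cutoff:
            down.append(g.symbol)
        # significant genes with |log2FC| < lfc_cutoff fall through and are excluded
    return up, down


if __name__ == "__main__":
    # Hypothetical values, not data from this study.
    stats = [
        GeneStat("GENE_A", 2.1, 0.001),
        GeneStat("GENE_B", -1.4, 0.02),
        GeneStat("GENE_C", 0.5, 0.0001),  # significant but below the fold-change cutoff
        GeneStat("GENE_D", 3.0, 0.2),     # large change but not significant
    ]
    up, down = split_degs(stats)
    print(up, down)  # ['GENE_A'] ['GENE_B']
```

Both filters must pass: a gene with a tiny P value but a small fold change, or a large fold change with a non-significant P value, ends up in neither gene set.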

Enrichment analysis of significant modules
The Database for Annotation, Visualization and Integrated Discovery (DAVID) [14] provides an online, comprehensive set of functional annotation tools for the biological interpretation of large gene lists. Here, DAVID was used to group the DEGs into functional modules, to identify enriched biological processes and cellular components, and to identify the pathways associated with the DEGs in the most significant modules. Function and pathway terms were retrieved from the Gene Ontology (P < 0.05, Benjamini-adjusted P < 0.01) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (P < 0.05, Benjamini-adjusted P < 0.01) databases, respectively.
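The "Benjamini" threshold presumably refers to the Benjamini-Hochberg false discovery rate correction. A minimal sketch of that adjustment, combined with the two cutoffs stated above, is shown below; the function names and term P values are illustrative, and DAVID's internal implementation is not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the same order as the input."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_terms(terms, p_cutoff=0.05, adj_cutoff=0.01):
    """Keep names of (term, P) pairs with raw P < p_cutoff and BH-adjusted P < adj_cutoff."""
    adjusted = benjamini_hochberg([p for _, p in terms])
    return [name for (name, p), q in zip(terms, adjusted) if p < p_cutoff and q < adj_cutoff]


if __name__ == "__main__":
    # Hypothetical enrichment results for four annotation terms.
    terms = [("term_1", 0.0001), ("term_2", 0.004), ("term_3", 0.03), ("term_4", 0.6)]
    print(significant_terms(terms))  # ['term_1', 'term_2']
```

In this toy example term_3 passes the raw P < 0.05 cutoff but its adjusted value (0.04) does not meet the stricter adjusted cutoff, which is the practical effect of requiring both.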

Analysis of protein-protein interaction network
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) 10.0 (http://string-db.org/) [15] was used for online analysis of interactions among the DEG-encoded proteins. The interaction network was then imported into Cytoscape [16], and its topological properties were analyzed with the NetworkAnalyzer tool; a degree ≥ 5 was used to select hub proteins at the center of the network.
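The degree filter can be illustrated on a plain edge list. The sketch below assumes the STRING interactions have been exported as protein pairs (the edge list and protein names are hypothetical) and counts each protein's distinct interaction partners, which equals node degree in an undirected network without duplicate edges; it is not Cytoscape's NetworkAnalyzer code.

```python
from collections import defaultdict


def hub_proteins(edges, min_degree=5):
    """Return {protein: degree} for proteins with at least min_degree distinct partners."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        # Undirected network: record the interaction from both ends.
        partners[a].add(b)
        partners[b].add(a)
    return {p: len(n) for p, n in partners.items() if len(n) >= min_degree}


if __name__ == "__main__":
    # Hypothetical network: HUB interacts with five proteins, P1 also with P2.
    edges = [("HUB", f"P{i}") for i in range(1, 6)] + [("P1", "P2"), ("P1", "HUB")]
    print(hub_proteins(edges))  # {'HUB': 5}
```

Using sets means an interaction listed twice (here P1-HUB in both directions) does not inflate the degree.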

Analysis of transcription factors
PASTAA [17] was used to predict transcription factors associated with the DCM-related genes; P values calculated from the hypergeometric distribution were used to evaluate the association between these genes and each transcription factor. TRAP [18] was also used to predict these associations. Gene sets C and D were uploaded to the database, and JASPAR (version 2016) [19] was used to predict the DNA binding sites.
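The hypergeometric P value mentioned above can be illustrated with a simple over-representation test. This is a simplified sketch, not PASTAA's full procedure (which works on ranked affinity and expression lists): it assumes a background of N genes, K of which carry a predicted binding site for a given transcription factor, and asks how likely it is that at least k of n DCM-associated genes carry the site by chance. The function name and numbers are illustrative.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K with the site, n genes drawn)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    upper = min(K, n)
    # Sum P(X = i) for i = k..upper; comb() returns 0 for impossible combinations.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1)) / total


if __name__ == "__main__":
    # Illustrative numbers only: 20,000 background genes, 500 with a site,
    # 89 query genes, 8 of which carry the site.
    print(hypergeom_upper_tail(8, 20000, 500, 89))
```

A small value means the transcription factor's binding site is over-represented among the query genes relative to the background.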

Acquisition of target genes of drugs
The Connectivity Map (CMap) [20] is an online tool that provides thousands of genome-wide transcriptional expression profiles showing how cultured mammalian cells respond to treatment with 1,309 bioactive small molecules. CMap was used here to identify potential, previously unknown effects of existing drugs. The up-regulated (gene set A) and down-regulated (gene set B) gene sets were uploaded to CMap, P values were calculated with EBAYES, and molecules with P < 0.05 were considered significant and potential candidates for treating DCM. The half maximal inhibitory concentration (IC50) is a measure of the effectiveness of a substance in inhibiting a specific biological or biochemical function, and the effectiveness of many drugs has been reported in previous studies. The CMap results were therefore queried against the NCBI PubChem database for verification.
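To make the CMap ranking concrete, the sketch below implements the unnormalized connectivity score as described in the original CMap publication (Lamb et al., 2006): a Kolmogorov-Smirnov-like statistic is computed for the up- and down-regulated tag sets against one treatment instance's ranked gene list, and the two are combined. It illustrates that scoring idea only; the CMap service additionally normalizes scores across instances and computes its own P values, which are not reproduced here, and the ranks used are hypothetical.

```python
def ks_statistic(tag_ranks, n):
    """Signed KS-like statistic for one tag set.

    tag_ranks: 1-based positions of the tag genes in an instance's list of n genes,
    ranked from most up-regulated to most down-regulated by the treatment.
    """
    v = sorted(tag_ranks)
    t = len(v)
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity_score(up_ranks, down_ranks, n):
    """Unnormalized connectivity score; 0 when both statistics have the same sign."""
    ks_up = ks_statistic(up_ranks, n)
    ks_down = ks_statistic(down_ranks, n)
    if (ks_up >= 0) == (ks_down >= 0):
        return 0.0
    return ks_up - ks_down


if __name__ == "__main__":
    # Hypothetical instance with 10 ranked genes: query-up genes sit at the top and
    # query-down genes at the bottom, so the instance mimics the query signature.
    print(connectivity_score([1, 2], [9, 10], n=10))
```

A positive score means the instance mirrors the query signature; a negative score means the compound tends to reverse it, which is why compounds with strongly negative scores are usually the ones considered as therapeutic candidates.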

Identification of DEGs
Compared with the control group, 89 DEGs were identified in the DCM group (57 up-regulated and 32 down-regulated); the hierarchical clustering heat-map is shown in Fig. 1.
The results of functional annotation of KEGG pathways are shown in Fig. 2. COL1A1, COL1A2, COMP, THBS2, THBS4 and RELN participate in the ECM-receptor interaction and focal adhesion pathways. COMP, THBS2 and THBS4 are significantly enriched in all three pathways (ECM-receptor interaction, focal adhesion and TGF-β signaling).

Topological analysis of PPI networks
Fifty-one significantly enriched genes were uploaded to STRING to construct the PPI network, which was then imported into Cytoscape to construct sub-networks. In sub-network 1, CTGF and COL1A1 (the proteins with the highest degree, 8) and their adjacent proteins (COL1A2, THBS2, THBS4, POSTN, COMP, FIGF and PRELP) are highlighted. In sub-network 2, RPS4Y1 occupies the central position. Sub-network 3 contains the three proteins with the highest logFC among all sub-networks. All sub-networks are shown in Fig. 3. Table 1 lists the fold changes and corresponding P values of the 11 key genes, and Fig. 4 shows their expression levels in control and DCM samples. Of the 11 genes, nine are up-regulated and two are down-regulated.
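If the sub-networks are taken to be the connected components of the PPI network (an assumption; the text does not state how they were delimited in Cytoscape), finding each component and its highest-degree proteins can be sketched as below. The function names and edge list are hypothetical.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return (list of node sets, adjacency dict) for an undirected edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in list(adj):
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components, adj


def top_by_degree(component, adj):
    """Return (sorted proteins with the maximum degree, that degree) within one component."""
    best = max(len(adj[p]) for p in component)
    return sorted(p for p in component if len(adj[p]) == best), best


if __name__ == "__main__":
    # Hypothetical network with two separate sub-networks.
    edges = [("A", "B"), ("A", "C"), ("A", "D"), ("X", "Y")]
    components, adj = connected_components(edges)
    for comp in components:
        print(sorted(comp), top_by_degree(comp, adj))
```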

Analysis of transcription factors
Transcription factors predicted by PASTAA to modulate gene expression in DCM are shown in Table 2. As shown in Fig. 5a, the Tef-1 and TFIIA transcription factor families are predicted to play an important role in the up-regulation of gene expression. TBP and TFIIA appear to be constitutively expressed, consistent with their broad involvement in transcription. By contrast, as shown in Fig. 5b, the Myb and Hnf-4 families are predicted to play an important role in the down-regulation of CORIN and FIGF. Figure 6 shows the transcription factor-binding sites predicted by JASPAR.

Acquisition of target genes of drugs
Gene sets A and B were uploaded to CMap to search for potential drugs; Table 3 shows the predicted drug molecules that may induce or inhibit DCM-associated gene expression. The predicted molecules in Table 3 were then queried in the NCBI PubChem database to search for their target genes. Nineteen target genes are significantly correlated with DCM, as shown in Table 4, where numbers in brackets indicate the number of drugs targeting each gene. CTGF, POSTN, CORIN and FIGF are targeted by the largest numbers of drugs, and the collagen family, THBS2 and THBS4 also rank highly.
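The bracketed counts in Table 4 amount to inverting a drug-to-target mapping. A minimal sketch, assuming the PubChem target annotations have already been collected into a dictionary (the drug and gene names here are placeholders, not results from this study):

```python
from collections import Counter


def drugs_per_gene(drug_targets, genes_of_interest):
    """Count, for each gene of interest, how many drugs list it as a target."""
    counts = Counter()
    for targets in drug_targets.values():
        # set() so a gene listed twice for the same drug is counted once.
        for gene in set(targets) & genes_of_interest:
            counts[gene] += 1
    return counts


if __name__ == "__main__":
    mapping = {
        "drug_1": ["GENE_A", "GENE_B"],
        "drug_2": ["GENE_A", "GENE_A"],
        "drug_3": ["GENE_C"],
    }
    print(drugs_per_gene(mapping, {"GENE_A", "GENE_B"}).most_common())
    # [('GENE_A', 2), ('GENE_B', 1)]
```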

Discussion
Microarray analysis, an effective approach to identifying differentially expressed genes, can help establish an early diagnosis and lower the misdiagnosis rate [21]. So far, the application of microarray analysis has revealed many biological processes associated with DCM. Barth et al. [5] predicted and constructed a set of 27 differentially expressed genes, 25 of which were later confirmed. Micaela et al. [22] analyzed the differential expression of ion channel-associated genes and reported that genes such as SCN2B, KCNJ5 and CLIC2 play an important role in related biological processes. The present study focuses on the regulation of differential gene expression by transcription factors, and the relevance of the target gene set was confirmed by reference to existing databases.
Fig. 1 The heat-map of differentially expressed genes (P < 0.05, |logFC| > 1). NF: control group; DCM: DCM group. Up-regulated genes are shown in red and down-regulated genes in blue.
By comparing differentially expressed genes in DCM samples with those from healthy controls, we predict that CORIN, FIGF, CTGF, COL1A1, COL1A2, THBS2, THBS4, POSTN, COMP, PRELP and RPS4Y1 may play a role in the development of DCM. Notably, CORIN, whose product converts pro-ANP to bioactive ANP, is already a known marker gene; the purpose of this study is to explore the mechanism underlying the differential expression levels.
Cardiac remodeling is characterized by collagen deposition in the extracellular matrix [23] and by myocardial fibrosis, which leads to heart failure [24,25]. Changes in the expression of genes encoding proteins associated with 1) ECM-receptor interaction, 2) focal adhesion and 3) the TGF-beta signaling pathway may play an integral role in the development of systolic dysfunction. Up-regulation of CTGF in the TGF-beta signaling pathway can increase the deposition of type III collagen (COL3A1) and type I collagen (COL1A1 and COL1A2) [26,27], and excessive collagen may in turn cause myocardial fibrosis and heart failure. THBS2 and THBS4, which are present in all three pathways and encode thrombospondins, are also modulated by CTGF. A previous study showed that up-regulation of THBS1 and THBS4 may lead to matricellular protein deposition, subsequently resulting in DCM, heart failure or death. FIGF (also known as vascular endothelial growth factor-D; VEGF-D), which participates in ECM-receptor interaction, may also be involved in the pathology of DCM. Gong et al. [28] reported that VEGF preserves cardiac function after intramyocardial transplantation in a mouse model of DCM by reducing cellular apoptosis and myocardial fibrosis in addition to enhancing angiogenesis, suggesting that FIGF may influence cardiac function and myocardial cell survival in DCM. Differentially expressed genes show up-regulation in the GO terms extracellular matrix and focal adhesion [29], and dysregulation of the extracellular matrix is associated with the progression of cardiac remodeling and heart failure. COMP encodes a non-collagenous ECM protein, and previous studies have shown that abnormal COMP expression results in myocardial cell apoptosis and loss of myofilaments [30]. COMP, COL1A1, COL1A2 and PRELP, which participate in ECM-receptor interaction and focal adhesion, are up-regulated, which may alter the composition of the extracellular matrix [31].
POSTN binds heparin and, in the PPI network, is co-expressed with COL1A1, COL1A2, THBS4 and CTGF. High expression of POSTN in myocardial fibroblasts contributes to cardiac remodeling [32]. The high co-expression of POSTN and CTGF (which regulate heparin in fibroblasts) may therefore be associated with antagonism between them. However, it is unknown whether POSTN is regulated by CTGF. Co-expression of RPS4Y1, DDX3Y, EIF1AY and RPL13 was also observed. Heidecker et al. [33] showed that these genes are associated with idiopathic DCM, but their expression levels differ considerably and the mechanism remains unclear. Transcription factors may also affect the differentially expressed genes in DCM. Tef-1 (also known as TEAD1 or TCF-13), a transcriptional enhancer factor, promotes the expression of troponin, myosin and actin [34,35]. TFIIA, a general transcription factor, is ubiquitously expressed and, as a nuclear protein, participates in the formation of the transcription initiation complex [36]; it may trigger the up-regulation of differentially expressed genes. MYB [37], a transcriptional activator, encodes a member of the Myb family of transcription factors. MYB has 1) an N-terminal DNA-binding domain, 2) a central transactivation domain [38] and 3) a C-terminal domain associated with inhibition of transcription. The C-terminal domain may be responsible for the down-regulation of FIGF and CORIN. The Hnf-4 transcription factor family is mainly expressed in the kidney and liver [39]. Given that Hnf-4 is a transcriptional activator, its inhibition may down-regulate the expression of CORIN and FIGF and, subsequently, lower ANP levels.
Fig. 6 Transcription factor-binding sites predicted by JASPAR: Hnf-4a (Fig. 6a), Myb (Fig. 6b) and Tef-1 (Fig. 6c).
Drug repositioning is a cost-effective approach, and CMap enables efficient prediction of the potential targets of existing drugs. Tridihexethyl, fluoxetine and pirlindole can inactivate G-protein-coupled receptors. Fluoxetine, methyldopa and sulfadiazine can inactivate the cystic fibrosis transmembrane conductance regulator and subsequently down-regulate the expression of collagens, THBS, COMP and PRELP. Methyldopa, sulfadiazine and khellin may further regulate gene expression at the translational level. In addition, the up-regulated CTGF and POSTN and the down-regulated CORIN and FIGF are the major targets in the CMap prediction, suggesting that a combination of multiple drugs may achieve better treatment.
However, up- or down-regulation of specific genes may not be responsible for the pathogenesis of the relevant diseases [40,41]; such changes may be merely downstream reactions or byproducts, as with NPPB (logFC > 4.2) or NPPA (logFC > 2.5) in DCM. Age, gender, multi-drug therapy, etiology and individual variation may also cause differential gene expression, which is a limitation of the present study. In addition, mutations in myofibrillar genes (proteins) are known to be associated with the development of various cardiomyopathies, including DCM. Future analyses should therefore account for the above-mentioned factors and for myofibrillar protein mutations. Moreover, the differentially expressed genes obtained from the same expression profile vary with the filtering criteria used. These factors were also taken into consideration when drawing conclusions.

Conclusions
Bioinformatics-based analyses reveal target genes probably associated with cardiomyopathy, providing clues for pharmacological therapies aimed at these targets. Further studies that account for other factors, such as age, gender, multi-drug therapy, etiology and individual variation, should be carried out, and bench experiments should be performed to verify the relationship between the DEGs and the development of DCM.
Mean: the arithmetic mean of the connectivity scores for the corresponding instances; N: the number of instances; Enrichment: the degree of enrichment of the given instances among all instances; P-value: an estimate of the likelihood that the enrichment of a set of predicted potential drugs among all CMap drugs would be observed by chance.
Table 4 The genes targeted by the 20 potential drugs from CMap