The concept of “systems biology” was raised by Hood in 1999. It means studying all components of a system from a systematic viewpoint. Systems biomedicine is the application of systems biology in medicine. It studies all components of a whole system and aims to reveal the patho-physiologic mechanisms of disease. In recent years, with the development of both theory and technology, systems biomedicine has become feasible and popular. In this review, we discuss applications of omics methods in systems biomedicine, including genomics, metabolomics (proteomics, lipidomics, glycomics), and epigenomics. We pay particular attention to microbiomics and the omics of common diseases, two fields that have developed rapidly in recent years. We also describe bioinformatics methods and databases used in systems biomedicine. Finally, we give some examples that illustrate the whole biological system, and discuss the development of systems biomedicine in China and the prospects for the field.
INTRODUCTION
The concept of “systems biology” was raised by Hood in 1999, when the human genome draft was nearly completed. It means studying all components of a system from a systematic viewpoint. Systems biomedicine is the application of systems biology in medicine. Its aim is to reveal the patho-physiologic mechanisms of disease. It studies all components in a biological system holistically (including the DNA, mRNA, proteins, and small biological molecules in a cell, tissue or body) under defined conditions, and reveals the complex interactions between these components [
1].
Systems biomedicine requires a multidisciplinary approach, with cooperation from experts in different fields including life sciences, information science, mathematics, computer science and other disciplines. Actually, this idea of multidisciplinary cooperation was proposed by Kamada [
2] and Zeng [
3] in 1992. Because of the lack of high-throughput biologic research tools, systems biomedicine research was hindered for many years. In recent years, with the development of both theory and technology, such as omics and bioinformatics, plus cooperation between different fields, including biology, mathematics and computer science, systems biomedicine has received renewed interest.
In this review, we discuss applications of omics methods in systems biomedicine, including genomics, metabolomics (proteomics, lipidomics, glycomics), and epigenomics. We pay particular attention to microbiomics and the omics of common diseases, two fields that have developed rapidly. Bioinformatics is a necessary and important tool for systems biomedicine, so we also describe some related methods and databases. Finally, some examples that illustrate the whole biological system at the integrative level are discussed.
APPLICATIONS OF OMICS
Genomics
Genomics is a broad field. It focuses on the study of genomes, and includes whole-genome DNA sequencing and genetic mapping of organisms as well as studies of multiple genomic phenomena, such as gene expression and gene-gene interaction [4]. Rapid advances in genetic sequencing techniques have revolutionized genetic analysis, and DNA sequencing at the whole-genome level is now possible. Despite the completion of the Human Genome Project in 2003, which revolutionized human genetics, linkages between genomic data and phenotypic information are still limited. For this reason, further approaches are being used to investigate the relationships between genotype and phenotype in organisms.
GWAS
Genome-wide association study (GWAS) is a method to study the association between phenotype and single nucleotide polymorphisms (SNPs), which are polymorphic markers found quite evenly throughout the genome. GWAS is a revolutionary approach because it can estimate the genetic association between the entire genome and a disease in numerous unrelated individuals at high resolution, and is not affected by previous hypotheses about genetic associations with disease [
5]. As early as 1996, Risch and Merikangas suggested using the GWAS approach to study the complex causes of human disease [
6]. GWAS results were published in 2005 [
7] and 2006 [
8], and the technique was given a major kick-start by the Wellcome Trust Case Control Consortium (WTCCC) [
9].
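The kind of per-SNP test that underlies a GWAS can be illustrated with a minimal sketch. The text above does not specify which statistic the cited studies used; here we assume a simple allelic test, in which a Pearson chi-square test with one degree of freedom is applied to a 2×2 table of allele counts in cases and controls. The function name and the example counts are hypothetical and serve only as an illustration.

```python
import math


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are minor and major allele counts.
    Returns (chi2, p_value, odds_ratio).
    """
    table = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of no association
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    # Odds of carrying the minor allele in cases relative to controls
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 1000 case alleles and 1000 control alleles at one SNP
chi2, p_value, odds_ratio = allelic_association(300, 700, 200, 800)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, OR={odds_ratio:.2f}")
```

In a real study this test would be repeated for every genotyped SNP, usually with adjustment for population structure and other covariates, which this sketch omits.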
Monogenic and oligogenic diseases
Monogenic and oligogenic diseases are generally thought of as simple Mendelian diseases, such as sickle cell disease (SCD) and β-thalassemia. In 2008, a major GWAS showed that rs11886868, located in the BCL11A gene, strongly affected fetal hemoglobin levels in 4305 Sardinians and in a large number of sickle cell patients [
10]. Another report in 2008 identified an SNP located in the BCL11A gene that was associated with fetal hemoglobin levels and pain crises in sickle cell disease [
11]. These findings suggested that BCL11A plays an important role in regulation of fetal globin expression. As expected, further research has identified BCL11A as a specific regulator of the expression of human hemoglobin [
12,
13].
Although exons occupy only about 1% of the genome, they are estimated to harbor about 85% of the mutations that cause Mendelian disorders. In recent years, whole-exome sequencing (WES) has been a powerful research tool to reveal novel exon mutations in Mendelian disorders with obscure etiologies [
14−
16]. In 2010, a WES study found a link between Miller syndrome and mutations in DHODH [
17]. Recent WES studies have revealed that
de novo germline SNPs in single genes are the major cause of rare sporadic malformation syndromes such as Schinzel-Giedion syndrome, Kabuki syndrome and Bohring-Opitz syndrome [
18−
20]. Therefore, the widespread availability of exome sequencing should promote the study of diseases, especially monogenic and oligogenic diseases.
Complex diseases
Coronary artery disease (CAD) and myocardial infarction (MI) are the leading causes of disability and mortality worldwide [
21,
22]. Recently, GWAS was used to identify loci for CAD and discover its risk factors in humans, and the results were recently reviewed [
23]. Several studies have identified chromosome 9p21 as a high-risk locus for CAD [
9,
24−
29]. Further studies have confirmed that the 9p21 risk locus can regulate the expression of CDKN2A/B in humans [
30−
32]. Recently, a GWAS discovered four novel risk loci, 2q24.1, 4p32.1, 6p21.32 and 12q21.33, in cohorts of 33000 Han Chinese individuals. These findings could provide new insights into the pathways that contribute to CAD susceptibility in the Han Chinese population [
33]. As one of the most important risk factors of CAD, hypertension is also widely studied. There were two major studies of hypertension using the GWAS approach in 2007 [
9,
34]. However, neither study identified markers associated with hypertension at a genome-wide level of significance. Nevertheless, two subsequent studies identified 8 genetic regions associated with systolic or diastolic blood pressure and 10 SNPs associated with hypertension in subjects of European ancestry [
35,
36]. GWASs have concluded that the 12q21 locus is associated with systolic blood pressure in non-European ethnic groups [
37,
38]. Moreover, the strengths and weaknesses of GWAS on hypertension research were reviewed recently [
39].
During the past few years, GWAS has identified numerous strong associations between genetic loci and different types of cancer. Some loci, such as 8q24, were identified as cancer-susceptibility regions for many unrelated cancers, including prostate cancer [
40], glioma [
41], breast cancer [
42], colorectal cancer [
43], bladder cancer [
44], ovarian cancer [
45] and pancreatic cancer [
46]. Therefore, an investigation into those loci may reveal new mechanisms of carcinogenesis.
Type 2 diabetes (T2D) is one of the most prevalent metabolic diseases. Before the GWAS era, only one locus, located in the glucokinase gene (GCK), had been strongly associated with fasting glucose levels [
47]. GWASs have identified about 50 risk loci of T2D to date [
48−
51].
GWASs have also identified a number of SNPs associated with other complex diseases, such as auto-immune disease and psychiatric disorders.
In recent years, many important biologic discoveries have been made via GWAS. This approach has revealed associations between genomic variants and complex or monogenic diseases in large populations, especially the linkages between SNPs and common complex diseases, such as diabetes, cardiovascular disease (CVD), auto-immune disease, and psychiatric disorders. To date, more than 2000 genetic loci have been significantly associated with one or more complex traits [
52]. Although associations between risk loci and diseases are well documented, the mechanisms of these associations require further research. Furthermore, GWAS may also discover new drug targets for complex diseases [
53−
55]. Based on GWAS results, deeper sequencing and analysis of the risk loci, and integration of the results with metabolomics or even cellular pathways may reveal some new, potential avenues for making targeted drugs and clinical interventions [
56].
ENCODE
The aim of the Encyclopedia of DNA Elements (ENCODE) Project is to determine all the functional elements in the human DNA sequence. The whole project is divided into three phases. The initial pilot phase ran from 2003 to 2007 and focused on identifying sequence elements and their biological functions in approximately 1% of the human genome sequence [
57]. The second phase covered the whole genome, including 70000 promoter regions and 400000 enhancer regions [
58]. Results of the second phase were published in September 2012, including 6 papers in
Nature, 18 papers in
Genome Research and 6 papers in
Genome Biology. ENCODE’s analysis revealed that about 80% of the human genome has a “biochemical function.” It also revealed some new aspects of gene expression and regulation, and the organization of related information [
58]. The last phase, which is still under way, will finish the annotation of all human DNA elements, aiding our understanding of normal life processes, and of mechanisms of disease.
Metabolomics
Metabolomics, one of the most important fields in systems biomedicine, is the study of biochemical processes through their metabolites. The metabolome consists of many metabolites, which are the small molecules produced by enzymes in cells, tissues, organs or organisms [59]. Thanks to recent technological advancements, thousands of metabolites can now be measured rapidly and quantitatively. Biomarker discovery by metabolomics can help patients and doctors make better drug choices.
Proteomics
Proteomics is the study of the properties of proteins, such as their structures, expression levels, post-translational modifications, and interactions [
60]. In 1989, Fields and Song devised the yeast two-hybrid (Y2H) method to probe for protein–protein interactions [
61]. High-throughput Y2H maps have since been generated for many different species.
An alternative approach, high-throughput co-affinity purification followed by mass spectrometry (AP/MS), can also be used to detect protein–protein interactions. In 2008, Yu et al. produced a high-quality binary interactome network (a protein interaction map) in yeast supported by literature-curated protein interaction data sets [
62]. In 2010, a genetic interaction map of the entire
Saccharomyces cerevisiae genome revealed more details of protein–protein interactions in cells, and showed that extensive and precise mapping of the genetic landscape could help to explain genetic interactions and to identify drug targets [
63]. Several studies have highlighted proteins that interact directly with proteins already known to be implicated in pathogenesis [
64−
67]. Therefore, a thorough understanding of the function and structure of biological networks can reveal how diseases arise and progress.
Lipidomics
Cellular lipids, which are generated and metabolized by enzymes, are small molecules with great chemical diversity. All biological membranes contain amphiphilic lipids, including glycerophospholipids, sterols, and sphingolipids. Cellular organelles usually have different, organelle-specific lipids. Mitochondria, for instance, are enriched with cardiolipin. Some diffusible and soluble lipids, such as arachidonic acid, lipoxin B4, prostaglandin H2 and platelet-activating factor, are usually considered signal molecules. In contrast, highly non-polar lipids, which are usually synthesized in the endoplasmic reticulum (ER), are stored in lipid bodies as energy stores.
In recent years, lipidomics has benefited from novel analytical approaches, particularly liquid chromatography (LC) and mass spectrometry (MS) [
68]. MS is often coupled with LC, which can separate lipids before their introduction to the ionization source of the mass spectrometer. High-resolution hybrid systems combine the advantages of several mass analyzers in a single instrument and allow for highly accurate mass measurements of ion species [
69,
70]. LC/MS has revealed over 500 different types of lipids in human plasma alone, such as fatty acyls, glycerolipids, and glycerophospholipids [
71].
Clinically, the most important plasma lipids are cholesterol and triglyceride (TG). Due to the non-polar character of cholesterol and TG, lipoproteins (spheroidal macromolecules) are required for their secretion into the plasma. Studies have shown that levels of these plasma lipids and lipoproteins are linked to susceptibility to CVD [
72,
73]. A comprehensive analysis of these lipids shows that the compositions of lipids in atherosclerotic plaques are different from normal plasma lipids [
74]. Further studies have shown that plasma lipid profiling could be used to test for risk of unstable CAD [
75]. In obese (ob/ob) mice, lipid profiles in liver and their correlation networks were significantly different when compared to the control mice [
76]. Statins are the most commonly prescribed drugs for the prevention of CVD, and work by reducing the level of low density lipoprotein (LDL) cholesterol in the plasma. Individuals with different lipid metabolisms respond differently to statin treatment, and lipidomics could be used to guide the dosage [
77].
For lipid profiling, chromatography, MS and nuclear magnetic resonance (NMR) provide compositional identification of lipid samples. Proton magnetic resonance spectroscopy permits the study of the biochemistry and metabolism of lipids in a living cell, and thus provides a specific index to detect the abnormalities and progression of diseases
in vivo [
78]. Recently, a novel approach named single-cell laser-trapping Raman spectroscopy has been described that may directly quantify lipid profiles in single cells
in vivo. This new method could be very useful for a diverse range of applications in lipidomics [
79].
Glycomics
Glycomics is the quantification of the glycome of a cell, tissue or organism [
80]. Most cells, from prokaryotic bacteria to mammals, are coated with a glycocalyx (or ‘sugar coat’). In eukaryotes, glycans bond covalently with proteins and lipids to form the dynamic, structurally diverse family of glycoconjugates. Glycoproteins and glycolipids are involved in a wide range of intercellular and intracellular biological processes [
81].
In recent years, glycomic studies have revealed some potential serum biomarkers for multiple sclerosis [
82], cancer [
83] and inflammation-related diseases [
84] in serum. Callewaert et al. showed that glycome profiles vary significantly in different stages of fibrosis [
85].
Importantly, glycoproteins are also used in clinical cancer diagnosis. Therefore, discovering the biomarkers using glycomics is very beneficial for the early detection of cancer [
86]. Recently, fucosylated α-fetoprotein has been used as a diagnostic marker of primary hepatocarcinoma [
81]. In the future, discovering biomarkers using glycomics will not only provide a new paradigm for understanding the role of the glycome in many biologic areas, but will also aid the process of clinical disease diagnosis.
Epigenomics
Maunakea et al. define epigenomes as “the combination of entire genome-wide chromatin modifications in any given cell types that directs its unique gene expression pattern” [
87]. Unlike the genome, epigenomes are highly dynamic in different cell types. Epigenomes, characterized by DNA methylation, histone modifications, post-transcriptional regulation via miRNAs, and post-translational regulation via protein modification, establish and maintain cell type-specific gene expression states [
88]. Moreover, studies have shown that the epigenotype plays a critical role in different types of diseases.
DNA methylation
Methylation of cytosine at position C5 in CpG dinucleotides is the major form of DNA modification in mammals [
89]. DNA methylation is involved in the regulation of cellular processes, such as embryonic development, and genomic imprinting [
90]. As early as 1983, a linkage between DNA hypomethylation and cancer was discovered [
91]. Recent studies have suggested that loss of genomic methylation is an early event in cancer [
92], and it is known that specific and global DNA methylation is usually disordered during carcinogenesis. A recent study showed that the change of DNA methylation in p15 (INK4b) is strongly associated with expression of ANRIL on chromosome 9p21 and CAD [
93]. Further studies of methylation could provide new insights in disease and novel strategies of diagnosis and therapy.
Histone modifications
In 2008, Wang et al. analyzed 39 histone modifications in human CD4+ T cells [
94]. Histone modifications play a key role in the regulation of gene expression. For this reason, histones are increasingly being recognized as dynamic regulators of gene activity, which are controlled by several post-translational chemical modifications, such as acetylation, methylation, phosphorylation, ubiquitylation and sumoylation [
95]. Studies have shown that altered histone modifications can lead to many human diseases, mostly cancers. For instance, the global loss of acetylation at H4K16 and of trimethylation at H4K20 has been associated with breast and liver cancer [
96,
97]. Recently, a novel therapeutic strategy for aggressive B cell lymphomas through altered histone modifications was identified [
98]. In the development of diabetes, the key genes Pdx1 and Glut4 were shown to be regulated by DNA methylation and histone modifications [
99,
100]. Further studies of histone modifications will be useful for understanding both cell development and disease therapy.
RNA editing and miRNA
Post-transcriptional modification of mRNA plays a key role in the regulation of gene expression and in cell development. As early as 1991, RNA editing was identified as a determinant of ion flow in glutamate-gated channels in the brain [101]. Recently, common, tissue-specific methylation of the N6 position of adenosine (m6A) in mammalian mRNA was detected, and m6A sites were found to be specifically enriched near stop codons and in 3′UTRs [102]. Moreover, RNA editing is also involved in the progression of some diseases. Studies of 2C-subtype serotonin receptor RNA editing patterns in psychiatric disorders were recently reviewed [
103].
Noncoding RNAs include several types, such as microRNA, siRNA, piRNA and long noncoding RNA. These noncoding RNAs function in chromatin modification, transcription and post-transcriptional modification. Studies have demonstrated that noncoding RNAs play a role in disease and could be diagnostic markers or therapeutic targets [
104,
105]. ANRIL, a long noncoding RNA, was found to be associated with CVD by GWAS. Research demonstrated that ANRIL can regulate the expression of the CDKN2A/CDKN2B locus and thus influence the proliferation of vascular smooth muscle cells and coronary heart disease [
31,
106]. BACE1-AS levels were found to be elevated in patients with Alzheimer’s disease, and BACE1-AS can regulate the expression of BACE1 mRNA, which is involved in Alzheimer’s disease pathology [
107].
miRNAs relevant to epileptogenesis contain m6A sites and could be regulated by RNA epigenetic modification [
108]. Mutation and dysfunction of miRNA may lead to various diseases [
109]. Because circulating miRNAs are informative, many are used as biomarkers for tumors. Different miRNA biomarkers may reflect the presence of CVD or of tumors in specific tissues, as well as their state of differentiation [
110,
111]. One recent study showed that miRNA-21 could affect myocardial disease by regulating MAP kinase signaling pathway in fibroblasts [
112].
Protein modification
Post-translational modifications of proteins regulate protein activation and other cell processes. One recent study identified nearly 200000 protein post-translational modification sites across 11 eukaryotic species [113]. Many studies have suggested that mutations of post-translational modification sites are directly or indirectly involved in disease. For instance, abnormal post-translational modifications of the prion protein were shown to be associated with autosomal dominant spongiform encephalopathy [
114]. The study of post-translational modifications will improve our fundamental understanding of the mechanisms of disease.
Mitochondrial genome
Human mitochondria contain a compact circular genome [
115]. The regulation and expression of the human mitochondrial genome is unique. The first complete map of the human mitochondrial transcriptome was published in 2011, and this map could aid research on disease-associated variants [116]. Mutations in the mitochondrial genome have been linked with many common diseases [
117]. Studies showed that dysfunction of mitochondrial DNA (mtDNA) was involved in diabetes [
118,
119], and a recent study identified that mtDNA mutation C3256T in white blood cells is involved in atherosclerosis and CAD [
120]. Although there is a clear association between mtDNA mutations and some diseases, the true relationship between mtDNA mutations and human health needs further investigation.
With the breakthrough of technology, especially next-generation sequencing technology, epigenomics is entering a new era. It is now possible to map dynamic epigenetic information with precision and speed. Epigenomic research may be highly beneficial to our understanding of the disease pathology in the future.
Human Microbiome Project
Animals and microorganisms have existed together for hundreds of millions of years. The harmonious relationship between animals and microorganisms is closely related to human health. Researchers gradually realized the importance of the microbiome after the completion of the Human Genome Project, and the Human Microbiome Project was launched by the US National Institutes of Health (NIH) in 2007. Metagenomics of the Human Intestinal Tract (MetaHIT) was then initiated in 2008 [
121], which aims to sequence 3000 bacterial genomes found in the human gut [
122]. NCBI and DACC have published 800 bacterial genomes so far [
123].
The human body contains a variety of microbiotas, and each microbiota contains various microbial cells [124]. Researchers have found that these microbiome genomes have a great influence on human health.
Microbiome diversity and similarity
Intestinal microbiota are dynamic and unstable. Almost 99.9% of the human genome is identical between individuals; however, there are extreme differences in the human microbiome [
124]. Since some diseases are associated with the microbiome, these differences offer profound insights in the field of personalized medicine.
One of the most important features of the microbiome is its diversity. Microbiomes vary with the geography, age, and lifestyle of their human hosts [
125,
126]. Older people have a more diverse microbiome. A person’s microbiome also differs between regions of the body. Rapidly developing technologies such as 16S rRNA analysis, stool transplantation [124] and DNA sequencing are very powerful. Ever-decreasing sequencing costs provide large amounts of microbiomic data, which will aid in our understanding of the diversity of the microbiome.
Despite the many differences among human microbiomes, several metabolic pathways are shared [127]. Stable metabolic pathways are a distinguishing feature of healthy people and could be used to diagnose diseases: if the metabolic pathways deviate from normal, poor health results. This suggests that the metabolic pathways of microorganisms may be important targets for treating diseases.
Microbiome and diseases
Pathogenesis research at the genome level alone is insufficient. From early observations in mouse models to more recent studies in human volunteers, a body of evidence has shown that intestinal microbes play a critical role in diseases.
A complex interaction between bacteria and intestinal tissue could play an important role in pathogenesis [
128]. Germ-free mice receiving microbiota from conventional mice developed obesity, and this was attributed to energy deposition into host adipocytes promoted by the microbiota [
129]. The gut microbiome is a marker for diabetes. Research has indicated that microorganisms in type 2 diabetes (T2D) patients are more abundant than in controls [
130]. Metabolic disease is affected by a multitude of factors, but diet plays the biggest role [
131]. Lipopolysaccharides (LPS) and other bacterial fragments can lead to the development of metabolic and cardiovascular diseases [
132]. Commensal microorganisms have also been linked to asthma. Disrupting the microbiota of babies has been shown to increase the incidence of asthma [
133]. Wang et al. found that gut flora metabolism of phosphatidylcholine promoted cardiovascular diseases [
134].
New model for diseases and prospects
The gut microbiome should be considered a risk factor for the development of metabolic diseases. A new model has therefore been proposed to describe the factors that influence disease [135]. The new model emphasizes the importance of the interaction between gut microbiota and the host, whereas the old model is based only on environmental and genetic factors. We should recognize the importance of diet, lifestyle and physiologic characteristics to health and take the microbiome more seriously. We hope that this will allow human health to be understood more systematically and diseases to be treated more effectively.
Common diseases
One of the aims of systems biomedicine is to reveal the patho-physiologic mechanisms of diseases. Genetic diseases are categorized into three types: single-gene disorders, common complex diseases, and oligogenic diseases. Common complex traits are influenced by many factors, each of which has only a modest effect. GWAS, epigenomics, metabolomics and microbiomics are playing important roles in studying the mechanisms of common diseases.
Genome-wide association studies (GWAS)
Genome-wide association studies (GWAS) could be a powerful instrument to understand the network of common diseases, and an increasingly large body of evidence supports this view. A majority of disease-associated variants lie within noncoding sequences or within introns, which are often marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs) [
136]. The WTCCC [
9] pointed out that many previously identified genetic loci were connected in a network of seven common diseases, and some loci were linked to more than one disease. Torkamani et al. suggested that due to the genetic links among diseases, a new way to categorize disease should be explored [
137]. Harold et al. found that Alzheimer’s disease was associated with the CLU and PICALM genes in a two-stage GWAS [
138]. Chen et al. found that rs3803662 and rs10941679 were associated with breast cancer [
139]. Yeager et al. identified that in 10286 cases and 9135 controls of European ancestry, chromosome 8q24a was associated with prostate cancer [
140]. Schnabel et al. reported that thanks to the development of novel technologies in GWAS, we could understand the complexity of CVD more systematically [
141].
The GWAS approach is very promising. It could be further improved by integrating other omics and technologies to develop a new strategy to study the network of common diseases [
142].
Epigenomics
A large proportion of the mechanisms underlying common diseases remain elusive when studied by GWAS alone [143]. Epigenetic modifications are heritable alterations that change gene expression without changing the DNA sequence. They play pivotal roles in the networks of common diseases, and due to the development of high-throughput arrays, the high-throughput study of epigenetic modifications is now possible [
144].
Toyota et al. reported that DNA methylation regulated diverse functions such as imprinting, genomic stability, and gene transcription, and it might even be associated with human tumors [
145]. Ivanova et al. reported that BMP4 promoter methylation levels were directly correlated with tumors and with poor prognosis [
146]. Epigenetic modifications may also have a role in increasing CVD risk [
147].
Epigenome-wide association studies (EWASs) will provide new opportunities, but will also present challenges [
148].
Metabolomics
Metabolic disturbances can lead to increased morbidity of common diseases [
149]. GWAS could identify the differences between healthy and unhealthy individuals.
Quinones and Kaddurah-Daouk identified that metabolic pathways such as fatty acids, oxidative stress and mitochondrial function were highly associated with central nervous system (CNS) disorders [
150].
Metabolomics is a new strategy to study metabolites in urine, blood and tissue, and the metabolome can be affected by the microbiome [
151]. Metabolic differences between healthy and unhealthy people can lead to a systematic understanding of the network of common diseases.
BIOINFORMATICS: COMPUTING SYSTEMS BIOMEDICINE
Recent years have seen rapid advances in genome sequencing technology, which have boosted research on the identification of genes that potentially lead to such “complex diseases” as hypertension, cardiovascular disease, asthma, and cancer [
152,
153]. These complex diseases are thought to be caused by a combination of multiple genes and a multitude of environmental factors. Genetically, their complexity lies in multiple aspects related to genetic polymorphisms (e.g., single-nucleotide polymorphisms or SNPs), gene expression, biological pathways, epigenetic modifications, noncoding RNA and gene-environment interactions. Nowadays, by adopting high-throughput methods, scientists can obtain much information about SNP genetic spectra, gene expression profiles, and protein spectra. Accordingly, bioinformatics methods, tools, and databases have been developed and used to aid research into these complex diseases at the systemic level.
Research in this field focuses on the following questions: (1) How can genetic susceptibility markers be identified? (2) How can relationships between epigenetic markers and diseases be identified? (3) How can chromatin modification, RNA expression and protein expression data be combined to analyze the mechanisms of disease? (4) How can functional modules and biological pathways be elucidated? (5) How can all the data in a whole intact system, such as a cell, an organ, or a whole body be integrated? (6) What databases are available to guide our research in systems biomedicine?
1. How can genetic susceptibility markers be identified?
Since the completion of the HapMap project, SNP markers have been widely applied, and GWASs have been adopted to test the association between SNPs and a specific disease. A GWAS often involves 300000 or more SNP markers spread evenly across the genome [154]. GWAS uses powerful statistical tests to locate many common disease genes by constructing and analyzing high-density SNP maps. By analyzing the WTCCC study [
9] and German MI (Myocardial Infarction Family) Study [
155–
157], Samani et al. identified SNPs with the strongest associations with coronary artery disease [
24]. Lu et al. applied meta-analysis to search for susceptibility loci for coronary artery disease in the Han Chinese population and identified four new loci [
33]. These findings provide new insights into pathways contributing to susceptibility to coronary artery disease in the Han Chinese population.
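As a minimal illustration of the single-SNP association test that underlies a GWAS, the following Python sketch applies a Pearson chi-square test to a 2×2 table of allele counts in cases and controls and computes a Bonferroni-corrected significance threshold. The function names and the allele counts are hypothetical and are not taken from the studies cited above; real analyses also involve quality control, correction for population stratification and other steps that are omitted here.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value, odds_ratio) for allele A in cases versus controls.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_totals = (case_a + case_b, ctrl_a + ctrl_b)
    col_totals = (case_a + ctrl_a, case_b + ctrl_b)
    observed = ((case_a, case_b), (ctrl_a, ctrl_b))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio


def bonferroni_threshold(alpha, n_tests):
    """Per-SNP significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical allele counts for one SNP (allele A / allele B in cases, then controls).
chi2, p_value, odds_ratio = allelic_chi_square(60, 40, 40, 60)
threshold = bonferroni_threshold(0.05, 300000)
print(chi2, p_value, odds_ratio, p_value < threshold)
```

With several hundred thousand SNPs tested, the corrected threshold is very strict, which is one reason why GWASs require large sample sizes.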
2. How can relationships between epigenetic markers and diseases be identified?
Gene expression must take place accurately at the right time and place. Normal epigenetic modifications of DNA and histones regulate gene expression. Systematic study of the important roles of DNA methylation in the human epigenome, embryonic development, gene imprinting, allele inactivation and tumorigenesis has become a new research hot spot.
DNA methylation is a crucial epigenetic modification of the genome. DNA methylation and RNA-Seq data have indicated correlations between DNA methylation and transcriptional levels in both the kidney and the liver [
158]. Robertson et al. showed that many human diseases are associated with aberrant DNA methylation patterns [
90]. Yi et al. described the relationship between tumor occurrence and abnormal methylation [
159]. Uhlmann et al. found that tumor marker genes of different pathological types of glioma cells have different levels of methylation [
160]. Recently, a large number of high-throughput data sets on methylation and histone modification in normal and diseased tissues have become available thanks to next-generation sequencing technologies, and have been collected in databases such as The Cancer Genome Atlas (TCGA) [
161], Human Histone Modification Database (HHMD) [
162], DiseaseMeth [
163] and MethDB [
164–
167]. Accordingly, novel computational tools, including Batman [
168], methBLAST [
169], MethTools [
168] and QUMA [
171] to perform methylation analysis have been developed [
172]. Notably, CpG_MPs, a comprehensive tool, has been applied to identify and analyze the methylation patterns of genomic regions from bisulfite sequencing data [
173], whereas QDMR (quantitative differentially-methylated regions) implements an approach that identifies DMRs (differentially-methylated regions) from genome-wide methylation profiles by adapting Shannon entropy theory [
174].
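To make the entropy idea concrete, the sketch below computes the Shannon entropy of a region's methylation levels across samples: a region that is methylated in only one or a few samples has low entropy, whereas a uniformly methylated region has high entropy. This is a simplified illustration and does not reproduce QDMR's actual formulation; the function names, region labels and methylation values are hypothetical.

```python
import math


def methylation_entropy(levels):
    """Shannon entropy (in bits) of one region's methylation levels across samples.

    The levels are rescaled to sum to 1 so that they can be treated as a
    probability distribution over samples.
    """
    total = sum(levels)
    if total == 0:
        # No methylation in any sample: treat as uniform (maximal entropy).
        return math.log2(len(levels))
    probs = [x / total for x in levels]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def rank_regions(regions):
    """Return region names ordered from lowest to highest entropy."""
    return sorted(regions, key=lambda name: methylation_entropy(regions[name]))


# Hypothetical methylation levels (fraction methylated) in four samples.
regions = {
    "region_uniform": [0.5, 0.5, 0.5, 0.5],
    "region_specific": [0.9, 0.0, 0.0, 0.1],
    "region_mixed": [0.8, 0.6, 0.2, 0.1],
}
for name in rank_regions(regions):
    print(name, round(methylation_entropy(regions[name]), 3))
```

Regions at the top of the ranking are the candidates for sample-specific differential methylation under this simplified criterion.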
3. How are chromatin modification, RNA expression and protein expression data combined to analyze mechanisms of disease?
DNA microarray technology has made it possible to monitor the activities of thousands of genes simultaneously. To identify disease-risk biomarkers, multiple data mining approaches have been applied to analyze gene expression profiles. According to the type of learning algorithm, current analytical strategies can be classified into two groups: unsupervised and supervised learning methods [
175].
Unsupervised learning methods, such as K-means clustering, hierarchical clustering and principal component analysis, can be used for gene clustering or sample clustering. For example, gene clustering is often performed to highlight functionally related gene groups and predict the function of those genes [
176], while sample clustering is performed to find different disease subtypes [
177]. In 2000, using a hierarchical clustering algorithm on DNA microarray data, Alizadeh et al. identified two molecularly distinct subtypes of diffuse large B cell lymphoma [
178].
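The sample-clustering idea can be sketched in a few lines of Python. The example below implements average-linkage agglomerative (hierarchical) clustering with Euclidean distance on a toy expression matrix; it is a minimal sketch, not the procedure used in the cited studies, and the function names and expression values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    The distance between two clusters is the mean pairwise distance between
    their members (average linkage). Returns lists of sample indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [euclidean(profiles[a], profiles[b])
                         for a in clusters[i] for b in clusters[j]]
                mean_dist = sum(dists) / len(dists)
                if best is None or mean_dist < best[0]:
                    best = (mean_dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical expression profiles: one row per sample, one column per gene.
samples = [
    [1.0, 1.1, 0.9],
    [1.1, 1.0, 1.0],
    [5.0, 5.2, 4.9],
    [5.1, 5.0, 5.0],
]
print(average_linkage_clusters(samples, 2))
```

In practice the number of clusters is not known in advance and is usually chosen by inspecting the full merge tree (dendrogram) rather than fixed beforehand as it is here.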
Supervised learning methods, including linear discriminant analysis, decision trees, artificial neural networks and support vector machines (SVMs), can be used to identify characteristic genes and sites [
179]. In 2004, Li et al. developed a novel tree-based ensemble decision approach, analyzed two publicly available data sets [
175] and identified 20 genes significantly related to colon cancer and 23 genes that are molecular signatures of the acute leukemia phenotype. Furthermore, in 2012, Chen et al. used learning algorithms to combine genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles collected over a 14-month period from a single individual into an integrative personal omics profile (iPOP), which revealed various disease risks [
180]. As omics profiles accumulate from larger numbers of individuals, more sophisticated supervised learning methods will be needed to aid our understanding of the mechanisms of disease.
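A very simple supervised approach can illustrate how labeled samples are used to rank candidate genes. The sketch below fits a one-threshold classifier (a decision stump, the simplest form of decision tree) to each gene and ranks genes by how well they separate two classes. It is not the tree-based ensemble method of the cited work; the gene names, expression values and labels are hypothetical, and the accuracies are computed on the training data, so a real analysis would need cross-validation to avoid overfitting.

```python
def best_stump(values, labels):
    """Find the single threshold on one gene that best separates two classes.

    `labels` holds 0/1 class labels. Returns (accuracy, threshold); either
    direction of the split (high = class 1 or high = class 0) is allowed.
    """
    distinct = sorted(set(values))
    best_acc, best_thr = 0.0, None
    for low, high in zip(distinct, distinct[1:]):
        thr = (low + high) / 2
        hits = sum((v > thr) == bool(y) for v, y in zip(values, labels))
        acc = max(hits, len(values) - hits) / len(values)
        if acc > best_acc:
            best_acc, best_thr = acc, thr
    return best_acc, best_thr


def rank_genes(expression, labels):
    """Rank genes by the training accuracy of their best stump (best first)."""
    scored = [(best_stump(vals, labels)[0], gene) for gene, vals in expression.items()]
    return [gene for acc, gene in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical data: expression of two genes in four samples (0 = normal, 1 = tumor).
expression = {
    "geneA": [1.0, 1.2, 3.0, 3.1],
    "geneB": [2.0, 1.0, 2.1, 1.1],
}
labels = [0, 0, 1, 1]
print(rank_genes(expression, labels))
print(best_stump(expression["geneA"], labels))
```

Ensemble methods extend this idea by combining many such trees built on different subsets of genes or samples.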
4. How can functional modules and biological pathways be elucidated?
The development of complex diseases is initiated by multiple etiologies. Analysis of modules and pathways reflects the trend in systems biology toward integration and interpretation beyond individual genes. However, identifying common functional modules and biological pathways associated with complex diseases remains a challenging task.
Statistical methods and software implementing them have been developed to predict gene functions and have accelerated the study of genes and their products for clinical purposes. In the iPOP analysis, Chen et al. integrated clustering analysis and pathway enrichment to analyze various omics data sets, identified differentially expressed components and elucidated several important pathways [
180]. By analyzing the contribution of genetic factors and the biological network of pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG), Chen et al. proposed an approach to prioritize risk pathways by fusing SNPs and pathways for common diseases. This approach was applied to five common diseases, and the results revealed that the five diseases not only share common risk pathways, but also have their own specific risk pathways [
181].
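Pathway enrichment is commonly assessed with a hypergeometric (one-sided Fisher) test, which asks whether a pathway contains more of the selected genes than expected by chance. The sketch below is a generic illustration of that test rather than the specific method of the studies cited above; the function name and the gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_size, selected, overlap):
    """Probability of seeing at least `overlap` pathway genes among `selected`
    genes drawn without replacement from `population` genes, of which
    `pathway_size` belong to the pathway.
    """
    upper = min(pathway_size, selected)
    total = comb(population, selected)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 genes in total, 100 in the pathway,
# 200 differentially expressed genes, 8 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20000, 100, 200, 8)
print(p_value)
```

Because many pathways are tested at once, the resulting p-values are normally adjusted for multiple testing before pathways are reported as enriched.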
5. How can all the data in a whole intact system, such as a cell, an organ, or a whole body, be integrated?
To explore human genetic disorders and the relationship among multiple genes at a higher level of cellular and organismal organization, Goh et al. constructed the Human Disease Network and Disease Gene Network [
65]. To explore regulatory network models, Califano et al. proposed an integrative approach requiring the simultaneous reconstruction of context-specific gene regulatory networks based on GWAS data sets [
182]. By integrating the disease states of miRNAs, Li et al. constructed a bipartite network of miRNAs and sub-pathways [
183]. Ye et al. constructed an miRNA-TF co-regulatory network specifically for T cell acute lymphoblastic leukemia (T-ALL), inferred hub regulators, genes, and their regulation in the network, and identified important miRNAs and regulatory modules in T-ALL [
184].
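Hub nodes in such networks are often identified by their degree, that is, the number of partners a node is connected to. The following sketch builds a small bipartite miRNA/sub-pathway network from an edge list and reports the most highly connected nodes. It is a minimal illustration only; the node labels are placeholders, not real miRNAs or pathways, and the cited studies use more elaborate criteria.

```python
from collections import defaultdict


def degree_by_node(edges):
    """Count the number of distinct partners of each node in an edge list."""
    neighbors = defaultdict(set)
    for mirna, pathway in edges:
        neighbors[mirna].add(pathway)
        neighbors[pathway].add(mirna)
    return {node: len(partners) for node, partners in neighbors.items()}


def top_hubs(edges, k):
    """Return the k nodes with the highest degree (ties broken by name)."""
    degrees = degree_by_node(edges)
    return sorted(degrees, key=lambda node: (-degrees[node], node))[:k]


# Hypothetical bipartite edges: (miRNA, sub-pathway).
edges = [
    ("miR-A", "path-1"),
    ("miR-A", "path-2"),
    ("miR-A", "path-3"),
    ("miR-B", "path-1"),
    ("miR-C", "path-1"),
]
print(degree_by_node(edges))
print(top_hubs(edges, 2))
```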
In summary, bioinformatic studies have begun to explore systems biomedicine and have produced some promising results. With the help of bioinformatic approaches and tools, we can establish biological models, construct biological control networks, and better understand the context-specific nature of biological process regulation, thereby gaining new insight into normal cell physiology and its dysregulation in disease. A biological system is dynamic, and as systems biomedicine develops, we will be able to observe the dynamics of biological systems in greater depth. We expect that the reconstruction of realistic biological models will be one of the most critical challenges of quantitative biology.
6. What databases are available to guide our research in systems biomedicine?
To date, numerous biological databases have been developed that may help researchers in systems biomedicine. Here, we introduce some databases related to complex diseases.
OMIM (Online Mendelian Inheritance in Man). OMIM is the most authoritative database of human genetic diseases. It classifies genetic diseases and provides credible, detailed information about disease heredity and related disease gene loci. (http://www.ncbi.nlm.nih.gov/omim, http://omim.org/).
GAD (Genetic Association Database). GAD contains gene and polymorphism information for human complex diseases. This information is collected and curated from previously published association analyses, which makes it convenient for researchers to quickly identify disease-related polymorphisms from a large body of data. Moreover, this database allows users to review submission records.
CGAP (Cancer Genome Anatomy Project). The aim of CGAP is to generate information by deciphering molecular structure and to establish a series of analytical technologies for mining tumor-related genes, proteins, and other biological markers. Moreover, it provides information resources and technological methods for the study of tumors. This database contains 7 related modules for sharing data, bioinformatics analytical tools and related biological resources.
GeneCards. GeneCards is an integrated database of human genes, which provides genomic, proteomic, transcriptomic, and functional information on known and predicted human genes. This database covers diseases, mutations and SNPs, gene expression, gene functions, pathways and protein interactions. It emphasizes overall information and offers fewer details about human diseases. It is a powerful functional genomics database and contains external links to related databases, which can provide chromosomal localization, gene expression information, homologous genes and corresponding proteins for human diseases. (http://www.genecards.org/).
KEGG DISEASE. KEGG DISEASE is an online database that stores information about the genes, pathways, drugs and diagnostic markers of diseases. It includes information about genetic and environmental perturbations. In this database, each disease entry is identified by an H number. Diseases in KEGG DISEASE are organized according to lists of known genetic factors, environmental factors, diagnostic markers, and therapeutic drugs. (http://www.genome.jp/kegg/disease/).
miR2Disease. miR2Disease is a manually curated database that provides information about miRNA deregulation in various human diseases. Each entry gives users detailed information on one miRNA-disease relationship (http://www.mir2disease.org/) [
185].
HEP (Human Epigenome Project-Data). The goals of HEP are to define, record, and explain the DNA methylation patterns of the whole human genome in the major tissues. To date, investigators can freely obtain DNA methylation profile data for human chromosomes 6, 20, and 22. (http://www.epigenome.org/).
HHMD (Human Histone Modification Database). HHMD is a comprehensive and systematic database built on a variety of experiments on histone modifications in the human genome. It contains high-throughput experimental data for 43 human histone modifications, and it also provides histone modification information for 9 cancers collected from the literature.
MethyCancer. This database integrates multiple types of information, including DNA methylation, mutations, cancer-related genes, cancer information and CpG islands. Its aim is to support the study of the interactions among DNA methylation, gene expression and cancer.
INTEGRATION: RESEARCH FROM SYSTEMIC VIEW
As mentioned above, large quantities of omics data are accumulating. The ultimate aim of systems biomedicine is to integrate all these components and to reconstruct a system. Understanding how complex phenotypes originate from a given system is a difficult problem. Fortunately, some integrative studies have emerged in recent years.
One report used a computational model to simulate the cell division process of the human pathogen Mycoplasma genitalium [
186]. It first defined 4 types of substances (metabolites, RNA, protein, and DNA) and 28 submodels. Multiple external variables affected the cell, including its geometry and mass, the stimuli it received, and the type of host it lived in. These variables influenced the 4 types of substances and the 28 submodels of the cell, and thereby determined whether the cell would divide. This study demonstrated the application of systematic and quantitative methods to predict a cell’s status.
The integrative personal omics profile (iPOP) study used multiple omics methods, including genomics, transcriptomics, proteomics, and metabolomics, to study a single individual [
180]. It analyzed risk factors for several diseases, such as diabetes and coronary artery disease. It also profiled transcription, proteins, and RNA editing at the systemic level in different states. By comparing changes between healthy and diseased states, it revealed systemic signatures associated with disease. This study is a good example of the application of systems biomedicine to the human body.
DEVELOPMENT IN CHINA AND PROSPECT OF SYSTEMS BIOMEDICINE
Systems biomedicine is an emerging field. In China, systems biomedicine has developed rapidly in recent years, and several centers for systems biomedicine have been established. Examples include the Peking University Institute of Systems Biomedicine and the Shanghai Center for Systems Biomedicine. These centers pursue interdisciplinary research, focus on basic research from a systems perspective, and encourage applications in clinical medicine.
In addition to the rapid recent development of systems biomedicine, the philosophy of traditional Chinese medicine (TCM), which has a long history, resembles systems biomedicine in many respects. TCM focuses on health maintenance, diagnosis and treatment of diseases at a systemic level, whereas systems biomedicine aims to integrate individual molecules and their interactions to understand how complex phenotypes arise. Systems biomedicine bridges TCM and western medicine. In the future, the systems approach will combine the principles of TCM with those of western medicine, and pave the way for predictive, preventive and personalized medical practice.
Systems biomedicine will greatly facilitate our understanding of disease pathogenesis in the future. It helps to uncover disease mechanisms and to prevent, diagnose and treat diseases in the clinic. Diseases are caused by multiple genetic and environmental factors, and in any given person a disease arises from a particular set of these factors. With detailed knowledge of a person’s genetic background and environmental factors from a systems perspective, we could evaluate disease susceptibility, offer appropriate prevention, and apply precise treatment. The development of systems biomedicine makes it possible to realize “3P” medicine, that is, predictive, preventive and personalized medicine.
Conflict of interest
The authors Zhuqin Zhang, Zhiguo Zhao, Bing Liu, Dongguo Li, Dandan Zhang, Houzao Chen and Depei Liu declare that they have no conflict of interests.
Higher Education Press and Springer-Verlag Berlin Heidelberg