1 INTRODUCTION
The ultimate goal of modern human biology is to understand completely how genetic make-up results in specific phenotypic properties, namely how human bodies function and how to prevent their deterioration. Although we often assume that the human genome is fixed, individual differences drive phenotypic variation in traits such as skin pigmentation, weight, height, and blood pressure. A large portion of these traits can be modulated by environmental factors, for instance diet, climate, sunlight exposure, and microbial exposure. Such diverse interacting sources have led to an unprecedented collection of physical, environmental, biological, and medical threats faced by human beings.
The past two decades have witnessed revolutionary advances in the field of human genomics. Following the announcement of the completion of the draft human genome sequence in 2000 [1, 2], genomic technologies, along with the analytical and digital revolution, have progressed at a tremendous pace: the inauguration of the International HapMap Project [3–5], followed by the launch of whole-genome genotyping arrays and, subsequently, genome-wide association studies (GWAS) on numerous Mendelian and complex diseases [6]. In addition, there has been a surge in next-generation sequencing technologies [7–9] and the launch of the 1000 Genomes Project [10, 11]. Today, we witness the completion of various ‘mega-scaled’ whole-genome sequencing initiatives that involve thousands to tens of thousands of samples: the UK10K Project [12]; the SG10K Study [13]; the GenomeAsia 100K [14]; the ChinaMap initiative [15]; the Korean Genome Project [16]; the All of Us Research Programme; and the recently launched Genome Sequencing of 1000 Indians, to name a few.
However, scientists realize that the power of genomics has reached a bottleneck, and that the understanding of genotype-phenotype interactions has been hampered by the complexity of biological systems that interplay with multiple environmental factors [17]. Although GWAS has shifted the paradigm for genetic investigations and for understanding the genetic architecture of many complex traits, the majority of findings cannot be reliably replicated, and our understanding of genotype-phenotype relationships remains limited [18–20]. To overcome this challenge, a comprehensive catalogue of genetic variation for both mainstream and marginalized populations [21] must be coupled with multi-dimensional phenotypes and harmonized measures of environmental influence; hence the emergence of human phenomics. Marginalized populations such as indigenous populations, defined here as populations that reside within geographically distinct traditional habitats and ancestral territories and rely on the natural resources of these habitats and territories for survival, may be an ideal natural model for understanding human genomic-phenomic interactions. They maintain their unique cultural identities, as well as social, economic, cultural, and political practices, separate from mainstream societies and norms. Indigenous populations usually have a long population history and are genetically more homogeneous than mainstream populations. In addition, these communities are minimally exposed to modernization and reside within a relatively ‘controllable’ habitat, which together ease the measurement of phenotypic changes. This review provides an overview of the potential use of indigenous population research in cataloguing the human phenome, and of the challenges ahead.
2 LESSONS LEARNT FROM GWAS
Since the first study was published in 2002 [22], genome-wide association studies (GWAS) have revealed well-supported associations and successfully mapped thousands of novel variants associated with complex traits, leading to the discovery of new disease physiology (https://www.ebi.ac.uk/gwas/).
The underlying principle of GWAS is the ‘common disease-common variants’ hypothesis, which assumes that the major contributors to the genetic susceptibility of common diseases are genetic variants that are common within a population. These variants often have low penetrance and small additive genetic effects, which explains why the vast majority of these loci have modest effect sizes and explain only a small proportion of the phenotypic variance. In the case of human height, a total of 180 stringently validated loci collectively explain only ~10% of the genetic variation [23]. Thus, the problem of ‘missing heritability’ remains debated [19]. Some speculate that ‘missing heritability’ could be due to rarer variants with high penetrance that contribute to disease susceptibility [19]. Proposed remedies include applying stringent statistical tests to minimize false-positive findings [24], or simply increasing the sample size and the genotyping density [25–28].
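To make the arithmetic behind small additive effects concrete, the following minimal sketch sums the variance contributed by a set of loci under a standard additive model, in which a biallelic variant with effect-allele frequency p and per-allele effect beta (in phenotypic standard-deviation units) contributes 2p(1-p)beta². The allele frequencies and effect sizes are hypothetical values chosen for illustration; they are not taken from [23], and the model assumes independent loci in Hardy-Weinberg equilibrium.

```python
# Illustrative sketch only: the loci below are made-up numbers, not real data.

def variance_explained(loci):
    """Sum the per-locus additive variance 2 * p * (1 - p) * beta**2.

    Each locus is a (p, beta) pair: p is the effect-allele frequency and
    beta is the per-allele effect in phenotypic standard-deviation units,
    so the result is a fraction of total trait variance (assuming
    independent loci and Hardy-Weinberg equilibrium).
    """
    return sum(2.0 * p * (1.0 - p) * beta ** 2 for p, beta in loci)


# 180 hypothetical common variants, each with the same small effect.
hypothetical_loci = [(0.3, 0.035)] * 180
fraction = variance_explained(hypothetical_loci)
print(f"Trait variance explained by 180 hypothetical loci: {fraction:.1%}")
```

Under these assumed values, even 180 common variants together account for only a modest share of trait variance, which is the pattern the ‘missing heritability’ discussion refers to.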
A striking example is hypertension. In its earlier stages, GWAS failed to identify the genes responsible for genetic susceptibility to hypertension [6, 29]. Subsequently, a large-scale meta-GWAS identified several hundred associated loci. Yet these explain only ~3%–5% of hypertension heritability [30], and many of the association signals are not replicable. A plausible reason is that hypertension comprises several intermediate phenotypes with distinct physiological mechanisms, and analyzing them together as a single trait potentially dilutes the statistical effects. Therefore, we believe that one way to address genotype-phenotype interactions is deep phenotyping.
3 WHAT IS THE HUMAN PHENOME?
Phenotype refers to a set of properties or characteristics that arise from the interaction between an individual’s genetic make-up and the environment, while phenome refers to the complete set of all human characteristics, determined by complex interactions between genes and the environment. The human phenome includes a comprehensive collection of all phenotypes, ranging from macro- to micro-scales, from external appearance to internal mechanisms, from biochemical characteristics to microbiota and psychological behavior, from population to individual levels, and from system to tissue and cellular characteristics [31].
Human phenomics integrates and models multiple ‘-omics’ parameters together with other biological metadata and their impact on disease risk at individual and population levels. It is the key driver towards elucidating the ultimate gene-environment interactions that underpin the differential risks, prevalence, and emergence of disease phenotypes.
4 HUMAN PHENOME AND PRECISION MEDICINE
Precision medicine is a holistic approach that customizes healthcare for an individual patient or a particular group of patients [32]. Although genetic make-up is no doubt the key element in implementing precision medicine, individual history, environment, and lifestyle, as well as heterogeneous disease phenotypes and manifestations, must be taken into consideration. Therefore, in addition to physical and biochemical parameters, numerous ‘-omics’ technologies, including transcriptomics, proteomics, metabolomics, lipidomics, metagenomics, epigenomics, and microbiomics, have been proposed as a way to overcome the challenges facing precision medicine [33]. In other words, the prerequisite for the success of precision medicine is the comprehensive cataloguing of cell-, individual-, and population-based phenomes (Fig. 1). Precision medicine in hypertension is a typical model that reflects the importance of phenomics [34].
Hypertension, defined as a persistent elevation in blood pressure (BP) above 140/90 mmHg, affects 1.2 billion people worldwide [35]. It has been increasingly recognized as a syndrome rather than a disease [36]. In addition, only ~30% of hypertensive individuals have their blood pressure well controlled, despite being prescribed various anti-hypertensive medications [37].
Hypertension is categorized into primary (or essential) and secondary hypertension. Approximately 90% of hypertension appears with no identified cause and is hence defined as primary hypertension, while the rest is due to endocrine or renal pathologies and is hence termed secondary hypertension. Primary hypertension is further categorized into intermediate phenotypes, including salt-resistant (~40% of primary hypertension) and salt-sensitive hypertension, the latter of which can be further sub-classified into low-renin (~55% of salt-sensitive cases) and normal-renin (non-modulation) forms [36]. Other suggested intermediate phenotypes include obesity-related hypertension [38] and deoxycorticosterone acetate (DOCA) salt hypertension, which exhibits salt-dependent mineralocorticoid excess [39, 40]. Complicating matters further, numerous factors, including age, sex, circadian rhythm, individual lifestyle, and physical activity, have been proposed to explain the increased risk of hypertension and the differential individual response to anti-hypertensive medication. In addition, average blood pressure levels have been shown to vary between populations [41], and hypertension susceptibility is correlated with geographical latitude [42–44].
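The nested proportions quoted above can be combined into rough shares of all hypertension cases. The sketch below is a back-of-the-envelope calculation only: the complementary fractions (salt-sensitive as the remainder of primary hypertension, normal-renin as the remainder of salt-sensitive cases) are assumptions not stated in the text, and the other suggested intermediate phenotypes are ignored.

```python
# Approximate proportions quoted in the text. Values marked "assumed" are
# not stated in the source and are taken as simple remainders.
PRIMARY = 0.90                          # share of all hypertension
SALT_RESISTANT = 0.40                   # share of primary hypertension
SALT_SENSITIVE = 1.0 - SALT_RESISTANT   # assumed: remainder of primary
LOW_RENIN = 0.55                        # share of salt-sensitive cases
NORMAL_RENIN = 1.0 - LOW_RENIN          # assumed: remainder of salt-sensitive

shares = {
    "secondary": 1.0 - PRIMARY,
    "primary, salt-resistant": PRIMARY * SALT_RESISTANT,
    "primary, salt-sensitive, low-renin": PRIMARY * SALT_SENSITIVE * LOW_RENIN,
    "primary, salt-sensitive, normal-renin": PRIMARY * SALT_SENSITIVE * NORMAL_RENIN,
}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions no single subgroup forms a majority of cases, which illustrates why a study that treats hypertension as one homogeneous trait pools several physiologically distinct groups.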
Collectively, these findings suggest that the realization of precision medicine for hypertension relies on comprehensive phenotyping and on understanding the complex, dynamic interactions between “genotype-environment-phenotypes” [34, 36]. The holistic contribution of human phenomics to precision medicine is illustrated in Fig. 1.
5 INDIGENOUS POPULATIONS AS THE MODEL POPULATION FOR PHENOMIC STUDY
The idea of phenomics was first proposed by Nelson Freimer and Chiara Sabatti in 2003 [31]. However, owing to technological limitations, the idea did not initially succeed. Subsequently, a handful of phenome-wide association studies (PheWAS) were published. Some examined phenotypic data derived from electronic health records across large population cohorts and associated them with genome-wide genotyping data [45–48], while others focused on a selected list of phenotypes [49, 50]. These studies identified a handful of single-nucleotide polymorphisms (SNPs) with pleiotropic effects, further advancing our understanding of genotype-phenotype interactions.
Phenome research has progressed to animal models, with several recently initiated projects [51, 52]. Although animal models simplify the analytical framework, the practical limitations of bench studies in a laboratory setting often restrict the findings, which do not reflect the larger picture of the systems biology of human traits [53]. For instance, non-human primates can sustain viral replication in relevant cell types and develop a robust immune response without manifesting the disease itself. A similar phenomenon can be observed in the murine model, where the dengue virus generally does not cause pathology in wild-type mice.
As Sydney Brenner aptly put it, “We don’t have to search for a model organism anymore. Because we are the model organisms.” In this spirit, indigenous populations serve as a natural model to deepen our understanding of genomic-environmental-phenomic interactions.
Take the indigenous populations of Peninsular Malaysia (locally known as Orang Asli) as an example. These populations are classified into three major tribes, namely the Negrito, Senoi, and Proto Malay, each of which is further categorized into six sub-tribes. Archaeological and population genetic studies suggest that these populations may have inhabited Southeast Asia for more than 50,000 years [54, 55]. Although genome sequencing of these populations has been reported, such research has been rather modest [54, 56] compared to many genome sequencing initiatives worldwide.
Traditionally, the Orang Asli inhabit remote areas neighboring the tropical rainforest. Because of this, the majority of these tribes were nomadic hunter-gatherers or swidden agriculturalists who practiced egalitarianism. Among them, the Proto Malays were more settled and demonstrated advanced farming practices. Owing to the nature of the tropical rainforest, the Orang Asli were exposed to numerous environmental stressors, including poor personal hygiene, parasites and pathogens, and inadequate dietary nutrition. However, in the last several decades, certain Orang Asli communities have been relocated to semi-urban areas owing to unavoidable modernization and the government’s initiative to alleviate poverty [57]. Consequently, the prevalence of metabolic syndrome has increased among the Orang Asli, especially among those resettled near urban areas. Yet the reported incidence rates are still rising more slowly than those of the mainstream populations [58–60].
Given such ‘natural’ exposure to the environmental stressors of the tropical rainforest and a fairly ‘controllable’ lifestyle and diet, any observed phenotypic variation could be attributed more narrowly to the variability of intrinsic factors. Understanding the resilience of these populations would therefore be biologically meaningful.
6 PHENOTYPIC STUDIES OF INDIGENOUS POPULATIONS
Indigenous populations worldwide have been largely underrepresented in genomic and phenomic studies. As of 2019, indigenous populations accounted for only ~0.02% of all GWAS conducted [61, 62]. Indeed, most global large-scale genomic research studies lack representation of indigenous populations (to name a few, the HapMap Project; the 1000 Genomes Project; the Genome Aggregation Database (gnomAD); and the Simons Genome Diversity Project). Unfortunately, only a handful of studies have reported their respective genomic structures and population histories [56, 63–70]. Studies that correlate their genomic structure with phenotypes have focused primarily on identifying signatures of positive selection [71–74]. Research on phenome-wide association studies (PheWAS) or phenomics of indigenous populations has been scarce, as evidenced by a PubMed search (18 December 2020) that revealed no publications related to this topic.
Phenotypic research on indigenous populations has largely been related to disease traits, with noncommunicable diseases attracting interest in recent years [58–60, 72, 75–77]. Numerous studies have reported high prevalence rates of infectious diseases (e.g., Plasmodium, soil-transmitted helminths, viral infections) among indigenous populations [78–83], yet many of those infected did not show severe manifestations. Intriguingly, some indigenous populations exhibit unique phenotypic characteristics that cannot be explained merely by environmental exposure. A striking example is the Negrito populations of Peninsular Malaysia, who predominantly inhabit the remote areas of Northern Peninsular Malaysia and demonstrate a higher average blood pressure than other populations, despite minimal exposure to hypertension risk factors [84]. In addition, studies of inflammatory and endothelial activation biomarker levels suggest that genetic factors may, in part, explain the lower cardiovascular disease risk among the Negrito populations compared to other populations [58]. In contrast, studying hypertension and cardiovascular diseases in mainstream populations is challenging, since these populations are exposed to various risk factors (e.g., sedentary lifestyle, stress stimuli, excess salt intake, poor diet, pollution, poor health awareness) that are difficult to eliminate.
Another example is the Greenlandic Inuit population, who live in extreme arctic climates. Consuming a low-carbohydrate diet, they rely primarily on fatty acids and ketone bodies as their main source of energy. The Inuit have lower levels of cardiometabolic risk factors than Europeans at the same body mass index (BMI) or waist circumference (WC) [85], but have shown a significant increase in type 2 diabetes mellitus over the last 30 years [86].
7 CHALLENGES AHEAD
While investigating the phenomics of indigenous populations sounds promising, several challenges and trade-offs relative to mainstream populations must be acknowledged, as follows:
(i) Large-scale genome sequencing initiatives on diverse indigenous populations, in particular those within Southeast Asia, are essential. This is because Southeast Asia houses approximately 70% of the global human genetic diversity. Indeed, the fact that the 1000 Genomes Project data does not sufficiently cover the global human genetic diversity [87] highlights the need for global collaborative initiatives to complete the catalogue of human genetic variations.
(ii) Comprehensive and accurate phenotyping is crucial to the improvement and advancement of modern precision medicine. However, it is practically difficult to collect complete sets of phenotypic data and biological samples (e.g., blood, urine, saliva, tissue, imaging data) from the same individual. Such data fragmentation poses challenges to downstream analysis [88]. Substantial efforts are required to engage with indigenous communities and to establish trust and mutual respect, in order to ease the process of sample and metadata collection.
(iii) Most indigenous populations are small, so the genetic variation they carry may be too limited to (a) detect mutations with large effects and (b) explain variants commonly observed in other populations. Of particular importance, the statistical power to attribute phenotypic variability to genetics or environmental exposures may be restricted [53]. This is even more challenging when phenotypic data collection is prospective and follow-ups are required to measure changes or trends. Alternative analysis pipelines are required to overcome these constraints without obscuring the biological effects.
(iv) Extending from (iii), replicability within indigenous populations may be an issue because small sample sizes lower statistical power. In addition, the power of a study may be confounded by a high inbreeding rate in indigenous populations. However, given the relatively homogeneous genotypes and environmental exposures, a proper study design may mitigate the potential error, for example, by comparing individuals in the top percentile with those in the bottom percentile of a quantitative trait within an indigenous cohort.
(v) The revolutionary change in the life sciences and the materialization of precision medicine (Fig. 1) will eventually face a ‘data tsunami’. Scientists will be overwhelmed by huge, diversified, and complicated metadata. Therefore, harmonized protocols and analysis pipelines that integrate such diversified data are crucial. However, this approach will undoubtedly lead to drastically higher costs for experimental procedures, infrastructure, and analysis pipelines. Consequently, it may not be affordable to many developing countries, especially those in Southeast Asia. To this end, establishing an integrated genomic-phenomic database for indigenous populations would be meaningful [89] to systematically organize the resources for future reference.
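The extreme-percentile design mentioned in point (iv) can be sketched as follows. This is a minimal illustration under stated assumptions: the cohort is simulated, the trait values (systolic blood pressure in mmHg) are randomly generated rather than measured, and the function name `select_extremes` and the 10% cut-off are hypothetical choices, not a recommendation from the source.

```python
import random
import statistics


def select_extremes(trait_by_id, fraction=0.10):
    """Return (bottom_ids, top_ids): the individuals in the lowest and
    highest `fraction` of the cohort when ranked by a quantitative trait."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(trait_by_id, key=trait_by_id.get)  # ascending by trait
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


# Simulated cohort of 200 individuals with hypothetical trait values.
rng = random.Random(0)
cohort = {f"ID{i:03d}": rng.gauss(125, 15) for i in range(200)}

low, high = select_extremes(cohort, fraction=0.10)
print("selected per group:", len(low), len(high))
print("mean of bottom group:", statistics.mean(cohort[i] for i in low))
print("mean of top group:", statistics.mean(cohort[i] for i in high))
```

The two selected groups would then be genotyped or deeply phenotyped and compared; concentrating on the tails of the trait distribution is one way to use a small cohort economically, although the choice of cut-off and the downstream test remain study-specific decisions.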
In contrast, despite their heterogeneous genetic make-up and environmental exposure, mainstream populations offer several advantages for study. Due to their large population sizes:
(i) Collection of genotypic and phenotypic data is relatively less burdensome. In addition, the availability of reference panels (e.g., the 1000 Genomes Project and the gnomAD database) allows for more accurate genotype imputation than in indigenous populations, which lack a representative reference panel. Furthermore, inbreeding is less likely within mainstream populations than within indigenous populations.
(ii) The statistical power to detect mutations with greater genetic effects and genetic variation responsible for phenotypic variability would be higher. In addition, the issue of replicability could be mitigated.
For the ease of comparison, the pros and cons of researching indigenous and mainstream populations are listed in Table 1.
8 CONCLUDING THOUGHTS
The completion of the human genome sequence in 2000 brought a profound paradigm shift to genomic medicine research. Since then, significant technological advancements have drastically reduced the cost of genome sequencing within a relatively short period of time, while increasing both the quality and the quantity of its output. The establishment of a human genome-phenome reference is anticipated to provide a more complete understanding of the origins and diversity of various human traits and diseases and, ultimately, to create a new translational medicine paradigm for optimizing and improving healthcare and patient management. Provided that the aforementioned challenges can be overcome, indigenous populations serve as an ideal model population to deepen our understanding of genomic-environmental-phenomic interactions.
The Author(s) 2021. Published by Higher Education Press