Author summary: Direct-to-consumer genetic testing in China has exploded over the past five years. Chinese DTC-GT users are overwhelmingly willing to participate in research initiated by service providers. As most of these users are non-Caucasian, we evaluated the reliability of polygenic disease reports derived from GWAS of predominantly European-ancestry populations and found that predictive power increased as new GWAS loci were integrated. In assessing the outcomes of different GWAS, replicability varied among studies with different ethnic backgrounds and sample sizes. We speculate that Chinese DTC-GT databases represent valuable biobanks for genetic studies and clinical applications.
1 INTRODUCTION
In the framework of basic medical services, genetic tests are rarely offered by medical professionals other than genetic counselors and justice services; from a medical perspective, such tests are focused on the prevention of severe Mendelian disorders or birth defects, or tumor genotyping or paternity tests. In recent decades, however, technological innovation has enabled large-scale genetic screening, including population genomics and genome-wide association studies (GWAS), significantly expanding our knowledge as to the genetic underpinnings of common diseases and traits [1, 2]. In light of identified associations between genomic variants and polygenic traits and the decreased cost of high-throughput genotyping, several direct-to-consumer genetic testing (DTC-GT) companies now offer genetic reports without requiring a medical professional intermediary.
In 2007, 23andMe, a company based in the United States, became the first DTC-GT service to provide personal genomics services, sending saliva sampling kits to users who returned them for analysis against a discrete subset of several thousand genetic markers. According to an estimate from MIT Technology Review [3], by January 2019, 26 million U.S. consumers, or roughly 8% of the country’s population, had tested their DNA via one or more such DTC-GT services. DTC-GT users’ willingness to participate in research has substantially expanded biobank-scale genotype and phenotype databases; starting in 2012, these companies became active participants in population genomics and health studies. From 2013 to 2018, Chinese DTC-GT services delivered around one million DNA tests to Chinese users [4]. Considering the population size of East Asia alongside its non-European background, the Chinese DTC-GT market represents a potentially valuable contribution to the academic community with room for development.
DTC-GT services have also aroused concerns and fomented debates regarding risk assessment reliability, clinical utility, consumer perceptions, and ethical issues [5–8]. Some investigations evaluated the reproducibility of DTC-GT assessments across different DTC-GT companies, reporting high concordance rates for SNP data but inconsistent disease risk predictions [9–12]. The predictive power of some reports has also been evaluated by case studies, and Graves’ disease, type 2 diabetes, lupus, Alzheimer’s disease, restless leg syndrome, Crohn’s disease, age-related macular degeneration, and celiac disease were found to have the highest predictive power [7, 13]. To the best of our knowledge, large-scale systematic evaluations of DTC-GT genetic assessments, in particular from the service providers themselves, have yet to be published. Currently, updates to health reports intended to integrate new GWAS outcomes may result in reclassification of a user’s predicted risk level long after DNA results were delivered, leading to user confusion and complicating clinical practice. Investigations of this issue suggest reclassification rates ranging from 16.3% to 24.4% [14]. A reassessment of the reports’ predictive power, followed by risk level adjustment, has yet to be undertaken, which undercuts the reliability of different DTC-GT services.
Compounding this issue, the applicability of GWAS results and corresponding polygenic risk assessment to non-Caucasian populations ranges from uncertain to downright misleading [15–17], since nearly all GWAS data come from studies performed on predominantly European-ancestry (white) populations [18]. This situation presents risks and opportunities for DTC-GT services for non-European populations, such as the Chinese. While current knowledge may not align with local users, large cohort studies of Chinese populations are on the rise [19, 20]. This may enable future GWAS outcomes from local research with relatively small sample sizes to be validated by biobank-scale datasets. Increasing the size and diversity of the global DTC-GT database could essentially improve GWAS replication efficacy and new association discovery.
This investigation analyzes user growth, rates of user participation in research, and academic outcomes of investigations conducted by major DTC-GT service providers in the U.S. and China. GWAS-derived DTC-GT reports were systematically evaluated for reliability through analysis of the distribution of polygenic risk scores (PRS) for multiple WeGene polygenic disease reports and tracing of risk level reclassifications over time. Results were directed towards assessing purported increases in predictive power from these companies. We also evaluated the reproducibility of several GWAS outcomes from the WeGene Biobank, including trans-ethnic or cross-ethnic studies, and of investigations with relatively small sample sizes.
2 RESULTS
2.1 Rapid growth of Chinese DTC-GT market
Founded in 2014 and 2015, respectively, WeGene and 23Mofang are leading the genetic testing upsurge in China, providing microarray-based, high-throughput genotyping products to customers similar to analyses provided by 23andMe and AncestryDNA in Western nations. Five-year user growth patterns for Chinese DTC-GT providers (Fig. 1B) are comparable to those for U.S. pioneers (Fig. 1A). The first wave of rapid growth in DTC-GT in China emerged in 2017 and 2018 and benefited from cost reductions in genotyping and a dynamic capital market in medical and health fields. In 2017, WeGene was the first to offer a whole-genome sequencing (WGS) service directly to consumers, and 23Mofang followed with a whole-Y chromosome sequencing and analysis service. In 2019, Hong Kong S.A.R.-based CircleDNA entered the mainland China market, offering a whole-exome sequencing (WES) service. Recent research estimates that around one million Chinese had availed themselves of a DTC-GT service as of 2018 [4].
2.2 Published studies from DTC-GT service providers
DTC-GT databases have also emerged as a valuable resource for population genomics and genotype-phenotype association studies. 23andMe has been an active participant in the academic community since 2010 [23] (Fig. 2A), with research focused on human health, traits, and behaviors. 23andMe is also heavily involved in ethical, legal and social implications (ELSI) topics surrounding genetic testing. Chinese service providers followed suit beginning in 2017. Like their U.S. counterparts, Chinese service providers have published studies on biogeographic ancestry and population genomics [24–29], and on method development [28, 30] (Fig. 2B); these assessments, however, do not rely on the ethically problematic large-scale collection of phenotypic data. To the best of our knowledge, the only published GWAS of a phenotypic trait in the Chinese population was on photic sneeze reflex [31], with no health-related studies having been published as of 2019.
2.3 User participation in genetic research
By October 2019, 98.0% of WeGene profiles (Note: a single user account may include multiple genetic profiles) were accompanied by consent to use the genetic and phenotypic data for research purposes. This is much higher than the 80% participation rate for 23andMe users [23]. Among WeGene users, 97.5% of the company’s profiles allowed access for genealogy matching. Of these, 56.6% have participated in at least one of 662 third-party trait reports (PocketDNA), contributed by WeGene customers via the open API platform. User activation and retention are high, with 86.0% of customers having visited at least one WeGene online platform (website, mobile app, WeChat platform) over the past six months, and 77.1% over the past three months. Averages for 23andMe are approximately 60% over three months [32].
Basic information sharing is also high, with 99.6% of consented profiles reporting biological sex and 94.9% reporting date of birth (Fig. 3A). Current residence, ancestral home (defined as the birthplace of the participant’s father’s father which is recorded in the citizen residence registry in China), ethnic group, and surname were provided by 46.6% to 47.0% of profiles (Fig. 3A). In phenotype collection, the most provided traits are height and weight, with over 41,000 reports submitted (Fig. 3B). Among the 33 research projects on the WeGene open platform, 15 were self-conducted investigations, and 18 were collaborative studies. Among consented profiles, 48.4% have participated in at least one research project. Over 10,000 responses were collected from questionnaires about stress and mental traits, eyelid type, sleep pattern, color recognition, and blood type (Fig. 3B).
2.4 Reclassification of risk level for polygenic health reports
Three polygenic diseases, Alzheimer’s disease (AD) [33–66], type 2 diabetes (T2D) [67–92], and schizophrenia (SP) [20, 93–106], were selected to assess how user risk levels were reclassified as user numbers grew and as reports were updated to incorporate additional GWAS outcomes. Normalized PRS and risk levels were calculated for the first 100, 300, 1,000, 3,000, and 10,000 users using the first version of the WeGene reports from 2015, and then for each key update through October 2019 (Supplementary Table S1, Data S1). Increasing the number of users did not change the overall PRS distribution (Pearson’s correlation between user number and PRS interquartile range (IQR), p = 0.66) but did make it smoother. A two-tailed F-test for the variances of the PRS at two adjacent sampling points resulted in p>0.05, with the exception of a single significant case in which user numbers increased from 1,000 to 3,000 for the AD report (p = 0.004; Fig. 4A). As the number of loci increased with report updates, the PRS distribution broadened concomitantly (Pearson’s coefficient: 0.74, p = 0.0035; two-tailed F-test: p<0.0001).
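As an illustration of the spread comparison described above, the following minimal sketch computes the IQR of a PRS sample and the variance-ratio statistic underlying an F-test for two adjacent sampling points. The function names and toy values are hypothetical and are not taken from the WeGene pipeline; converting the statistic into a two-tailed p-value additionally requires the F distribution (for example, from a statistics library), which is omitted here.

```python
import statistics

def iqr(values):
    """Interquartile range (Q3 - Q1) using inclusive quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def variance_ratio(sample_a, sample_b):
    """F statistic for equality of variances.

    The larger sample variance is placed in the numerator, so F >= 1.
    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical PRS values at two adjacent sampling points (toy data).
prs_v1 = [-1.0, -0.5, 0.0, 0.5, 1.0]
prs_v2 = [-2.0, -1.0, 0.0, 1.0, 2.0]

print(iqr(prs_v1), iqr(prs_v2))
print(variance_ratio(prs_v1, prs_v2))
```

A broader distribution after a report update would show up as a larger IQR and a variance ratio well above 1.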
Observed risk level reclassification after report updates occurred on a reasonable scale (Fig. 5). Between two adjacent report versions, 76.2%±12.0% of users’ risk levels remained unchanged, 22.1%±10.7% were reclassified to an adjacent level, and extreme changes (high to low or medium-low, or low to high or medium-high) were observed in only 0.010%±0.019% of cases.
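The tabulation behind such reclassification rates can be sketched as follows. This is a minimal illustration under assumed inputs (two parallel lists of risk level labels for the same users in adjacent report versions); the function name and the "other" category for changes that are neither adjacent nor extreme are our own labels, not terms from the report system.

```python
from collections import Counter

LEVELS = ["low", "medium-low", "medium", "medium-high", "high"]
RANK = {name: i for i, name in enumerate(LEVELS)}

# Extreme changes as defined in the text: high to low or medium-low,
# or low to high or medium-high.
EXTREME = {
    ("high", "low"), ("high", "medium-low"),
    ("low", "high"), ("low", "medium-high"),
}

def reclassification_summary(old_levels, new_levels):
    """Fraction of users per change category between two report versions."""
    counts = Counter()
    for old, new in zip(old_levels, new_levels):
        if old == new:
            counts["unchanged"] += 1
        elif abs(RANK[old] - RANK[new]) == 1:
            counts["adjacent"] += 1
        elif (old, new) in EXTREME:
            counts["extreme"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    return {k: counts[k] / total
            for k in ("unchanged", "adjacent", "extreme", "other")}

# Toy example with five hypothetical users.
old = ["low", "medium", "high", "high", "medium-low"]
new = ["low", "medium-high", "low", "medium", "high"]
print(reclassification_summary(old, new))
```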
As most GWAS data were derived from European populations, cross-ethnic replication of GWAS loci and applicability of the GWAS-based PRS models remain unclear. We evaluated the predictive power of polygenic disease reports by analyzing user feedback on AD family history. In total, 10,435 individuals provided feedback on AD family history, of whom 1,763 (16.9%) reported at least one positive AD case in a parent or grandparent. The predictive power of correlation testing between disease risk levels and family history is expected to improve with increasing numbers of users and the integration of new GWAS outcomes, as shown by a trend towards increased odds ratios and narrower confidence intervals (CIs) (Fig. 6).
2.5 Replicability of GWAS results
Seven association studies were selected for validation in the WeGene Biobank, including single nucleotide polymorphisms (SNPs) originally identified in European populations and studies with relatively small sample sizes.
Studies in Caucasian populations indicated that apolipoprotein E (ApoE) genotypes are associated with late-onset AD [107, 108]. Among the 10,435 WeGene profiles that participated in AD report feedback, both the risk allele type ε4 (OR: 1.43, 95% CI: 1.27 to 1.61, two-tailed Fisher’s, p = 2.6 × 10⁻⁹) and the protective allele type ε2 (OR: 0.76, 95% CI: 0.64 to 0.98, two-tailed Fisher’s, p = 7.9 × 10⁻⁴) were significantly associated with AD family history. We also tried to replicate per-locus associations for the 84 loci used in the WeGene AD report (Supplementary Table S2) and found 35 loci with low minor allele frequency (MAF<0.05) among WeGene users; among the remaining 49 loci, 12 could be replicated at nominal significance (one-tailed Fisher’s, p<0.05), led by rs7412 (p = 0.002), an ApoE-determining SNP.
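For readers unfamiliar with these statistics, the sketch below shows how an odds ratio, an approximate 95% CI, and a two-tailed Fisher’s exact p-value can be obtained from a 2 × 2 table of carrier status against family history. The Wald (log-OR) interval is an assumption made for illustration, since the interval method is not specified above, and the counts are invented toy values rather than WeGene data.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for the 2x2 table [[a, b], [c, d]].

    Rows: carriers / non-carriers; columns: positive / negative family history.
    Assumes all four cells are non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = math.comb(total, col1)

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in (prob(x) for x in range(x_min, x_max + 1))
               if p <= p_obs * (1 + 1e-7))

# Toy counts (hypothetical).
print(odds_ratio_wald_ci(20, 10, 10, 20))
print(fisher_exact_two_tailed(3, 1, 1, 3))
```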
Similarly, rs9939609, a SNP in the fat mass and obesity-associated protein (FTO) gene reported to be associated with body mass index (BMI) [109], was also replicated by a correlation test between body weight categories (overweight: BMI>28, lean: BMI<18.5) and genotypes (Chi-square, p = 0.011), and rs9939609-AA carriers had significantly higher BMIs than individuals with rs9939609-GG (23.3 vs. 22.4, one-tailed t-test, p = 3.4 × 10⁻¹¹). In another case, among the top 20 loci (ranked by OR) associated with male pattern baldness (Supplementary Table S3), six of the eight loci with MAF≥5% were significantly associated with self-reported hair loss levels (Chi-square for all genotypes, one-tailed Fisher’s for high-risk genotypes, p<0.05 for both) in the WeGene Biobank (Supplementary Table S2).
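A Chi-square test of independence on a 3 × 2 genotype-by-phenotype table, as used in the replications above, can be sketched in a few lines. The implementation below is illustrative only; it relies on the fact that a 3 × 2 table has two degrees of freedom, for which the Chi-square upper tail probability has the closed form exp(−x/2). The counts are invented toy values.

```python
import math

def chi_square_3x2(table):
    """Pearson Chi-square test of independence for a 3x2 table.

    table: three rows (genotypes) by two columns (phenotype categories).
    Returns (statistic, p_value). With df = 2 the survival function of the
    Chi-square distribution is exactly exp(-statistic / 2).
    """
    if len(table) != 3 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 3x2 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic, math.exp(-statistic / 2)

# Toy genotype counts (rows: AA, AG, GG; columns: overweight, lean).
toy_table = [[30, 10], [20, 20], [10, 30]]
print(chi_square_3x2(toy_table))
```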
Conversely, SNPs reportedly associated with cilantro dislike and soap taste in European populations [110] were not replicated in WeGene users: rs72921001 was non-significant (Chi-square test, p = 0.060), and no rs78503206 polymorphism was found in the database. Similarly, none of the four SNPs associated with handedness in a recent study of UK Biobank participants [111] were replicated in a Chinese sample of 7,644 individuals (Chi-square, p>0.05).
For certain small-sample GWAS on East Asians, the WeGene Biobank could be valuable as a dataset for GWAS discovery and validation. In a study of 96 Han Chinese individuals [112], three loci were identified as significantly or inconsistently significantly associated with eyelid traits, although none of these associations was replicated among the 13,715 participants reporting eyelid type in the WeGene Biobank (Chi-square, p>0.05). Conversely, a study of 2,980 Han Chinese did not turn up any significant markers for petaloid toenails [113] despite the fact that it is a signature trait among Han Chinese. Similar genotyping methods in over 8,000 individuals reporting fifth toenail types in the WeGene Biobank, however, uncovered 32 SNPs that met the genome-wide significance threshold (p<5 × 10⁻⁸) (unpublished data).
3 DISCUSSION
The years 2017 and 2018 saw rapid growth in the DTC-GT market in the U.S. and China. Although less than 0.1% of the Chinese population has performed a self-assisted DNA test to date, as home to the world’s largest population, China has the potential to be a powerful force in the emerging DTC-GT market.
3.1 User composition limits research diversity and value of current data
As is to be expected with novel technology, acceptance and popularity of DTC-GT in China was originally heavily skewed towards young people. WeGene users have an average age of 31, and approximately 50% of them are aged 26 to 38. Over 80% of these users live in first-tier metropolitan areas. Biased age and residence compositions have limited clinical research opportunities and applications of Chinese DTC-GT biobanks, as well as cohort recruitment and GWAS for less-common diseases, because the dataset of diagnosed users is insufficient to support these measures. We encountered data limitations in our evaluation of the PRS model and replication of GWAS loci for late-onset diseases, such as AD, and were unable to directly link positive AD cases to particular genetic profiles. We instead had to rely on user family history as an alternative, somewhat limiting the replication, extension, and reliability of our findings. As such, similar to early 23andMe research endeavors, publications from the Chinese DTC-GT service providers remain focused on population genomics and tools [24–30].
3.2 Heavy user activity and research initiatives promote future outcomes
The skewed age composition also confers a benefit in terms of the openness and willingness to engage in data sharing among current users, including feedback reporting, third-party report participation, and phenotype collection. DTC-GT companies use comprehensively connected web-based platforms that include an official website, mobile app, and social media profiles on WeChat API and other official media. This likely contributes to robust user activation and retention and thereby promotes phenotype collection for research purposes. These advantages suggest promising academic contributions from Chinese DTC-GT companies in the future. Chinese companies interested in replicating 23andMe’s model could yield Chinese GWAS on human health and other traits within three to five years. As user numbers increase, continued phenotype collection and shifts in user composition will render Chinese DTC-GT-derived biobanks more valuable, particularly for studies on disease.
3.3 Report reliability and optimization
The core mission of DTC-GT service providers is to provide increasingly accurate and understandable genetic reports to customers. Current DTC-GT reports for polygenic diseases and traits are predominantly generated by frontloading GWAS outcomes in the absence of systematic examination and validation [6–8]. Our examination of the predictive power of multiple polygenic disease reports using normalized PRS distribution indicates a trend towards increasing prediction accuracy alongside concomitant user growth and the synthesis of new GWAS results. In the meantime, the scale of risk level reclassification was shown to be within normal parameters and is unlikely to cause distress in customers lacking professional knowledge of genetics.
It is important to note, however, that a large number of reported SNP associations from other GWAS could not be replicated in the WeGene user database, likely due to the application of European-based GWAS to a non-Caucasian population. We also found that the overall reproducibility of loci included in such reports remains unclear, confounding verifiable evaluations. We therefore propose that significant improvements could be made to phenotype and disease risk predictive models, and that necessary follow-up tasks should include an evaluation of the reproducibility of different GWAS, an implementation of new GWAS to identify new loci, new SNP ranking and weighting, predictive model selection and adjustment, precise covariate selection (such as biological sex, age, and family history) in a complex predictive model, and ethnicity-specific modeling.
3.4 Opportunities for Chinese DTC-GT biobanks
Biobanks are an important data resource for human genetic research projects, particularly medical cohort studies and GWAS discovery and replication. The UK Biobank, with more than 500,000 genotyped participants, is the largest biobank that is publicly accessible [114], and has been mined for genetic research across the globe. Among UK Biobank samples, around 150,000 individuals were genotyped with microarrays from Affymetrix that search for 600,000 to 800,000 SNPs and indels. UK Biobank investigations have produced 944 scientific papers [114]. Apart from government-financed biobanks, commercial biobanks like 23andMe have also become valuable resources for large-scale studies. All 23andMe users were genotyped with high-throughput arrays from Illumina and Affymetrix, covering from 500,000 to 900,000 SNPs and indels across versions, similar to the arrays used by WeGene. 23andMe datasets have been mined for 130 scientific publications [23], and 23andMe’s biobank also possesses commercial value via data purchase and trading with the pharmaceutical industry [115]. Currently, whole-genome genotyping (WGG) is used by most biobanks to balance costs, sample size, scientific interest, and cross-biobank compatibility.
Our trans-ethnic GWAS replication analyses recapitulated previous studies demonstrating that population background is a crucial factor influencing the reproducibility of GWAS outcomes [15–17]. A biobank with WGG data from a majority Chinese population is in high demand for health-related studies and commercial purposes such as drug development. The most famous open-to-public human biobank in East Asia is BioBank Japan; no UK Biobank-like dataset currently exists for Chinese or even East Asian populations generally. In the absence of an official biobank, and in light of rapid demand growth for commercial and research datasets alongside robust user study participation, a Chinese DTC-GT-based biobank shows strong potential in both academic and industrial contexts.
4 MATERIALS AND METHODS
4.1 Research participants
Participants in the genome-wide association study (GWAS) validation and health risk level analyses were drawn from consenting WeGene customers from Shenzhen Zaozhidao Technology Co. Ltd., a direct-to-consumer genetic testing service provider. User statistics, genotypes, and phenotypes were collected in October 2019.
4.2 Ethical approval
Informed consent for online research was obtained from all individual participants included in the study. The study was approved by the Ethical Committee of Shenzhen WeGene Clinical Laboratory. The study was conducted in accordance with the human and ethical research principles of The Ministry of Science and Technology of the People’s Republic of China (Regulation of the Administration of Human Genetic Resources, July 1, 2019).
4.3 DNA sampling and genotyping assay
Saliva samples for DNA extraction were collected and stored with an Oragene DNA Sample Collection Kit (OG-250 or OG-510, DNA Genotek, Canada). DNA isolation and purification were performed with the Magnetic Saliva Fast DNA kit DP703-73A (Tiangen, China). Samples were genotyped at WeGene Clinical Laboratory on one of two custom arrays: Affymetrix WeGene V1 Array (596,744 SNPs) by Affymetrix GeneTitan MC Instrument, and Illumina WeGene V2 Array (742,762 SNPs) by Illumina iScan System.
4.4 Quality control of genotype data
Quality control (QC) was performed with PLINK V1.9 [116]. Individuals and SNPs with an overall genotype call rate lower than 98.5% were excluded. In polygenic risk score (PRS) distribution and health risk level reclassification analyses, individuals with AD, T2D, or SP, and SNPs with a genotype call rate lower than 80.0% were excluded.
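The call-rate filter can be illustrated with a small sketch. This is not how PLINK is invoked or implemented; it is a simplified stand-in that assumes a genotype matrix with `None` marking missing calls, computes both call rates on the unfiltered matrix, and applies the thresholds simultaneously (the order in which the two filters were applied is not stated above). All names are hypothetical.

```python
def call_rates(matrix):
    """Per-individual and per-SNP call rates.

    matrix: list of rows (individuals), each a list of genotype calls
    (one per SNP), with None marking a missing call.
    """
    n_individuals, n_snps = len(matrix), len(matrix[0])
    individual_rates = [
        sum(call is not None for call in row) / n_snps for row in matrix
    ]
    snp_rates = [
        sum(row[j] is not None for row in matrix) / n_individuals
        for j in range(n_snps)
    ]
    return individual_rates, snp_rates

def passing_indices(matrix, threshold=0.985):
    """Indices of individuals and SNPs whose call rate is at least threshold."""
    individual_rates, snp_rates = call_rates(matrix)
    keep_individuals = [i for i, r in enumerate(individual_rates) if r >= threshold]
    keep_snps = [j for j, r in enumerate(snp_rates) if r >= threshold]
    return keep_individuals, keep_snps

# Toy matrix: 3 individuals x 3 SNPs, genotypes coded as allele counts.
toy_matrix = [
    [0, 1, None],
    [1, None, None],
    [2, 2, 1],
]
print(passing_indices(toy_matrix, threshold=0.5))
```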
4.5 Phenotype and family disease history
Self-reported phenotypes and family histories were provided by participants via web-based questionnaires. Customers who did not fill out these questionnaires were eliminated from the dataset used for statistical analysis of the target disease or phenotype.
Body mass index (BMI). Individuals’ BMIs were calculated from self-reported height and weight using the following formula:
$$\mathrm{BMI} = \frac{\text{weight (kg)}}{\left[\text{height (m)}\right]^{2}}$$
Only participants aged 18 to 65 with BMI values between the 5th and 95th percentiles were included in the statistics.
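The BMI calculation and the percentile-based exclusion can be sketched as follows. The input units (kilograms and centimeters), the function names, and the "normal" label for intermediate values are assumptions made for illustration; the overweight and lean cutoffs follow the categories used in the FTO replication in Section 2.5.

```python
import statistics

def bmi(weight_kg, height_cm):
    """Body mass index: weight in kilograms divided by squared height in meters."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2

def weight_category(bmi_value):
    """Categories used in the replication analysis: overweight > 28, lean < 18.5."""
    if bmi_value > 28:
        return "overweight"
    if bmi_value < 18.5:
        return "lean"
    return "normal"  # label assumed for illustration

def within_percentiles(values, low=5, high=95):
    """Boolean mask marking values between the low-th and high-th percentiles."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lower, upper = cuts[low - 1], cuts[high - 1]
    return [lower <= v <= upper for v in values]

print(round(bmi(70, 175), 2))
print(weight_category(bmi(95, 175)))
```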
Hair loss. Respondents were asked if they were bald, and example images for different levels were given for selecting one of four responses: “no,” “slight,” “medium,” or “severe”; these were used to classify the respondent’s phenotype. They were then asked if their father and mother were bald, with the same four options plus a fifth for “not sure” for each. Respondents were then asked to provide dates of birth for themselves and their parents. In GWAS replication analysis, “slight” and “medium” were classified as “hair loss” and “severe” was classified as “bald.”
Family disease history. Respondents were asked whether they have any family members diagnosed with a specific disease. The family members include the respondent, the respondent’s father, the respondent’s mother, and the respondent’s grandfathers and grandmothers. A participant was marked as positive for disease family history if any of these family members was reported as a diagnosed case; otherwise, the respondent was marked as negative for the disease history. The disease family histories of T2D, AD, and SP were used in this study.
Cilantro preference. Respondents were asked whether they did or did not like cilantro, and whether or not they thought it had a pleasant or soap-like taste or aroma; “not sure” was also provided as an option. These answers were used to classify phenotypes for GWAS replication.
Handedness. Participants were asked to provide their biological sex and whether or not they were a twin with options for identical, fraternal same-sex, and fraternal opposite-sex. They were then asked about handedness with options for “right-handed,” “left-handed,” “ambidextrous,” and “not sure”; these were used to classify GWAS phenotype. Subsequently, they were asked for their preferred hand in multiple behaviors, including writing, drawing, throwing, using scissors, tooth-brushing, using a knife, using a spoon, using chopsticks, using a hand broom, and unscrewing caps with the following five options provided for each: “right hand only,” “right hand mostly,” “no preference,” “left hand mostly,” and “left hand only.” Finally, they were asked about each parent’s handedness with options for “right-handed,” “left-handed,” “ambidextrous,” and “not sure.”
Eyelids. Participants were asked to classify single- or double-fold eyelids for each eye, with an additional option of “difficult to classify” for both; these responses were used for GWAS phenotype classification. The participants were also asked to classify the eyelid types for the right and left eye of each parent with the added option of “not sure.”
Petaloid toenail. Participants were asked whether their fifth pedal digit (“little toe”) had a petaloid toenail for each foot. Phenotypes were classified as Petaloid_E (petaloid toenail on one foot) and Petaloid_D (petaloid toenail on both feet) in accordance with established standards [113]. GWAS for Petaloid_E and Petaloid_D were performed separately.
4.6 Odds ratio (OR) normalization, PRS and risk level
All participants were included in OR and PRS calculations before the participant volume reached 10,000. Participants for version 2015-01 were acquired from users up to the first key update (September or October 2017). The impact of report updates was evaluated by randomly selecting 10,000 more participants plus all those who provided a corresponding family disease history and received genetic testing before the first report update. Risk level reclassification was assessed using 10,000 randomly selected subjects genotyped with the WeGene V2 Array at all time points.
Allele ORs were converted to genotype ORs before PRS calculations. If a biallelic OR was not specified in the original literature, the allele OR was assigned to the heterozygous genotype, and the squared allele OR was assigned to the genotype homozygous for the risk/protective allele. Each SNP’s OR distribution was log2-transformed and adjusted to be zero-centered in the population using the following formula:
$$\mathrm{adjOR}_{j,a} = \log_2 \mathrm{OR}_{j,a} - \frac{1}{N}\sum_{n=1}^{N} \log_2 \mathrm{OR}_{j,n}$$
where $\mathrm{adjOR}_{j,a}$ is the adjusted OR for genotype $a$ of locus $j$; $\mathrm{OR}_{j,n}$ is the OR of locus $j$ for individual $n$; $\mathrm{OR}_{j,a}$ is the original OR of genotype $a$ of locus $j$; and $N$ is the number of individuals in the population.
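The conversion from an allele OR to genotype ORs, followed by the log2 transformation and zero-centering in the population, can be sketched for a single locus as below. This is a minimal illustration rather than the production implementation: genotypes are assumed to be coded as the number of copies of the effect allele (0, 1, or 2), the non-carrier genotype is assumed to have OR = 1, and all names are hypothetical.

```python
import math

def genotype_log2_ors(allele_or):
    """log2 genotype ORs keyed by effect-allele count.

    Non-carriers get OR = 1, heterozygotes the allele OR, and homozygotes
    the squared allele OR, as described in the text.
    """
    return {
        0: 0.0,
        1: math.log2(allele_or),
        2: math.log2(allele_or ** 2),
    }

def zero_center(log2_ors, population_genotypes):
    """Subtract the population mean log2 OR so the locus averages to zero."""
    mean_log2_or = (
        sum(log2_ors[g] for g in population_genotypes) / len(population_genotypes)
    )
    return {g: value - mean_log2_or for g, value in log2_ors.items()}

# Toy locus: allele OR of 2 and four hypothetical individuals.
toy_genotypes = [0, 1, 1, 2]
adjusted = zero_center(genotype_log2_ors(2.0), toy_genotypes)
print(adjusted)
```

With these toy values the adjusted scores are −1, 0, and 1 for the three genotypes, and their average over the four individuals is zero, which is the intended effect of the centering.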
For a single health risk report, the PRS incorporating all risk loci for individual $n$ was:
$$\mathrm{PRS}_n = \sum_{j=1}^{J} \mathrm{adjOR}_{j,n}$$
where $\mathrm{adjOR}_{j,n}$ is the adjusted OR of locus $j$ for the genotype carried by individual $n$, and $J$ is the number of risk loci in the report.
Participant PRS values were classified into five risk level categories by percentile: Low: PRS < 10th; Medium-low: 10th ≤ PRS < 25th; Medium: 25th ≤ PRS ≤ 75th; Medium-high: 75th < PRS ≤ 90th; High: PRS > 90th. Participant PRS-based health risk levels were subject to change with increased numbers of users, OR adjustments, and health report updates following incorporation of new GWAS-identified SNPs.
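The percentile-based classification can be sketched as follows, together with a simple additive score. The additive form (summing per-locus adjusted log2 ORs) is an assumption made for illustration, as are the function names and the use of inclusive percentiles computed on the same population being classified.

```python
import statistics

def polygenic_risk_score(adjusted_ors_by_locus, genotypes):
    """Sum of adjusted log2 ORs over loci for one individual (assumed additive).

    adjusted_ors_by_locus: list of dicts mapping genotype -> adjusted OR.
    genotypes: the individual's genotype at each locus, in the same order.
    """
    return sum(
        locus_ors[genotype]
        for locus_ors, genotype in zip(adjusted_ors_by_locus, genotypes)
    )

def risk_levels(scores):
    """Assign each score one of five risk levels using population percentiles."""
    cuts = statistics.quantiles(scores, n=100, method="inclusive")
    p10, p25, p75, p90 = cuts[9], cuts[24], cuts[74], cuts[89]

    def level(score):
        if score < p10:
            return "Low"
        if score < p25:
            return "Medium-low"
        if score <= p75:
            return "Medium"
        if score <= p90:
            return "Medium-high"
        return "High"

    return [level(s) for s in scores]

# Toy example: two loci, one individual heterozygous at both.
toy_loci = [{0: -1.0, 1: 0.0, 2: 1.0}, {0: -0.5, 1: 0.5, 2: 1.5}]
print(polygenic_risk_score(toy_loci, [1, 1]))
print(risk_levels(list(range(101)))[:12])
```

Because the thresholds are percentiles of the current population, a user’s level can shift when the population grows or when a report update changes the scores, which is the reclassification examined in Section 2.4.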
4.7 Genome-wide association study
Initial genome-wide association analyses on ordinal or binary phenotypes were performed with PLINK 1.9 [116] using multiple linear regression models of additive allelic effects with sex and an appropriate number of genetic principal components (PCs) as covariates. Detailed methods will be released when the corresponding GWAS is published.
4.8 Statistics and visualization
Statistical analyses were conducted in Python and R with packages including scipy and numpy. Data visualization was performed with R and corresponding packages, including ggplot2, RColorBrewer, ggalluvial, and qqman. Fisher’s exact test (2 × 2 table) or Chi-square tests (3 × 2 genotype table) were performed to assess independence. A t-test was used to compare mean values of parametric data. Pearson’s correlation was used to evaluate correlations between parametric data. P-value correction for multiple testing was performed with a Bonferroni adjustment. The significance threshold was set to p<0.05 and false discovery rate (FDR)<0.05. During GWAS discovery, the genome-wide significance threshold was set to p<5 × 10⁻⁸ for SNPs. In GWAS replication, p<0.05 was used as the threshold for statistical significance.
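For reference, the Bonferroni adjustment multiplies each p-value by the number of tests and caps the result at 1. The sketch below is a generic illustration with hypothetical names and toy p-values, not the code used in the analyses.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjusted p-values and significance flags.

    Each p-value is multiplied by the number of tests (capped at 1.0) and
    compared with alpha.
    """
    n_tests = len(p_values)
    adjusted = [min(1.0, p * n_tests) for p in p_values]
    significant = [p < alpha for p in adjusted]
    return adjusted, significant

# Toy p-values from three hypothetical tests.
adjusted, significant = bonferroni([0.01, 0.04, 0.5])
print(adjusted, significant)
```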
4.9 Data availability
In light of our commitment to customer privacy and privacy regulations from the Administration of Human Genetic Resource of China, we will not be publishing user health reports or detailed genotype or phenotype distributions. For questions about the analyses in this research or academic collaboration opportunities with WeGene, please contact the WeGene Research Team by email (research@wegene.com).
Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature