Hepatitis B virus (HBV) causes acute and chronic liver infections and has affected hundreds of millions of people worldwide, including in China [1, 2]. Approximately one million deaths per year are caused by chronic hepatitis, hepatocellular carcinoma (HCC), and cirrhosis [3]. HCC is among the most prevalent tumors worldwide, and HBV infection is a well-known risk factor for it [4]. The mechanism by which HBV causes liver cancer is not well understood, but it may involve both the viral biology itself and the cellular responses that follow infection (e.g., inflammation). Although vaccination programs have lowered the rate of HBV infection in China, chronic infection remains a major concern [5].
HBV is a small, non-cytopathic virus with a 3.2-kb DNA genome containing four overlapping genes: (i) PreS/S, which encodes the hepatitis B surface antigen (HBsAg); (ii) PreC/C, which encodes the nucleocapsid and the e-antigen (HBeAg); (iii) P, which encodes the polymerase; and (iv) X [6]. The S domain of the PreS/S gene plays a key role in morphogenesis and facilitates viral entry into cells, and studies have shown that antibodies raised against the S domain can neutralize HBV infection [7-9]. An “antigenic domain” (residues 100-170) is considered a key region for the occurrence of genotypic variants in HBV-positive patients [10] (Figure 1). HBV DNA is replicated through a reverse transcription intermediate, and because the viral polymerase lacks proofreading activity, this step generates genetic variation [11-13]. The result is a pool of quasispecies in the host, from which mutants emerge under selection pressure, especially pressure exerted by interventions such as vaccination and nucleoside analogs that inhibit the viral polymerase [12].
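To make the link between the residue numbering of the antigenic domain and nucleotide coordinates concrete, the following Python sketch converts S-protein residue numbers to positions on the reference genome. It assumes that the S open reading frame begins at nucleotide 155 of the reference, as in conventional HBV numbering; this start coordinate (`S_ORF_START`) is an assumption of the sketch, not a value stated in this article, and should be checked against the annotation of the reference actually used.

```python
# Hypothetical constant: first nucleotide (1-based) of the S ORF start codon.
# Assumed from conventional HBV numbering; verify against the reference annotation.
S_ORF_START = 155


def residue_to_nt_span(residue: int, orf_start: int = S_ORF_START) -> tuple[int, int]:
    """Return the 1-based genome coordinates (first, last) of the codon for a residue."""
    if residue < 1:
        raise ValueError("residue numbers are 1-based")
    first = orf_start + 3 * (residue - 1)
    return first, first + 2


def domain_nt_span(first_residue: int, last_residue: int) -> tuple[int, int]:
    """Return the genome span covering all codons from first_residue to last_residue."""
    start, _ = residue_to_nt_span(first_residue)
    _, end = residue_to_nt_span(last_residue)
    return start, end


if __name__ == "__main__":
    start, end = domain_nt_span(100, 170)
    print(f"residues 100-170 -> nt {start}-{end} ({end - start + 1} nt)")
```

Under that assumption, residues 100-170 correspond to nucleotides 452-664 (213 nt), a span comparable in size to the 220 bp region amplified in this study.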
The methods currently used to confirm the presence of mutant strains in patients include quantitative polymerase chain reaction (qPCR) and restriction fragment length polymorphism (RFLP) analysis [14]. However, these assays have limited sensitivity and specificity; for example, a given assay can detect only one specific mutant [15, 16]. It is therefore important to develop a reliable method that identifies multiple HBV mutants at once and eliminates the need to run multiple assays [17]. Here, based on selective region PCR and deep sequencing, we developed a method for the rapid identification of multi-strain HBV infection in patients. The procedure takes approximately 1.5 days to complete and could potentially be used clinically to identify multi-strain HBV infection and help improve treatment regimens.
To demonstrate the method, we obtained serum samples from a patient with late-stage liver cancer and clinically suspected multiple HBV re-infections, enrolled at Peking Union Medical College Hospital, Beijing, China. Approximately 5 mL of whole blood was collected, and the serum was separated by centrifugation at 1,000 × g for 10 min. The serum was divided into 500 µL aliquots and immediately stored at –80°C until use. The patient was informed of the purpose of the study, and written consent was obtained before sample collection. DNA was extracted from the serum using the EZNA Viral DNA Kit (Omega Biotek, USA). In brief, 250 µL of serum was lysed with protease and acrylamide, treated with RNase A, washed with ethanol, and eluted. DNA was quantified using a NanoDrop 2000c (Thermo Scientific, USA). Serum separation and DNA extraction were performed in a BSL-2 laboratory.

For PCR, primers were designed to amplify the 220 bp antigenic domain region. The forward (5′-CTGGACTACCAAGGTATGT-3′) and reverse (5′-GTAAACTGAGCCAGGAGA-3′) primers were synthesized by Genewiz, China. The cycling conditions were an initial denaturation at 94°C for 2 min; 30 cycles of denaturation at 94°C for 30 s, annealing at 55°C for 30 s, and extension at 72°C for 1 min; and a final extension at 72°C for 10 min. Reactions were performed in a Veriti thermal cycler (Life Technologies, USA) in a total volume of 50 µL containing 5 ng of template DNA, 0.3 µM of each primer, 200 µM of each deoxynucleoside triphosphate (dNTP), 1 U of Taq polymerase, and 2× PCR buffer (supplied with the polymerase). The amplified DNA was electrophoresed on a 1.5% agarose gel for about 30 min at 90 V, after which the desired fragment was excised and purified using the EZNA Gel Extraction Kit (Omega Biotek, USA) according to the manufacturer’s instructions.
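Two routine checks on the PCR setup can be sketched in Python: basic primer properties and the length of the cycling program. The Wallace rule used here (Tm ≈ 2(A+T) + 4(G+C) °C) is only a rough estimate for short oligonucleotides and is our choice for illustration; the article does not state how the primers were designed or how their melting temperatures were determined. The duration calculation counts hold times only and ignores ramping between temperatures.

```python
# Primer sequences as given in the methods (5'->3').
FORWARD = "CTGGACTACCAAGGTATGT"
REVERSE = "GTAAACTGAGCCAGGAGA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in °C by the Wallace rule: 2(A+T) + 4(G+C)."""
    gc = sum(base in "GC" for base in seq)
    at = sum(base in "AT" for base in seq)
    return 2 * at + 4 * gc


# Cycling program from the methods as (temperature in °C, hold time in s).
INITIAL = [(94, 120)]
CYCLE = [(94, 30), (55, 30), (72, 60)]
N_CYCLES = 30
FINAL = [(72, 600)]


def hold_time_minutes(initial, cycle, n_cycles, final) -> float:
    """Total hold time in minutes; ramping between temperatures is ignored."""
    seconds = (
        sum(t for _, t in initial)
        + n_cycles * sum(t for _, t in cycle)
        + sum(t for _, t in final)
    )
    return seconds / 60


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(f"{name}: {len(primer)} nt, GC {gc_fraction(primer):.1%}, Tm ~{wallace_tm(primer)} °C")
    # Sequence matched by the reverse primer on the opposite strand, in forward orientation.
    print("reverse primer site (forward orientation):", reverse_complement(REVERSE))
    print(f"total hold time: {hold_time_minutes(INITIAL, CYCLE, N_CYCLES, FINAL):.0f} min")
```

Under these simplifications the primers have estimated Tm values of 56°C (forward) and 54°C (reverse), close to the 55°C annealing temperature, and the program holds for 72 min in total.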
The purified PCR product was then used for library preparation and high-throughput sequencing. Briefly, the sample was adapter-ligated, amplified, size-selected, and quantified, and the library was sequenced on an Illumina MiSeq system. The barcoded sequencing data consisted of 1,298 paired-end reads with a read length of 150 bp. Data were analyzed with CLC Genomics Workbench 7: adapter contamination was removed from the reads, and low-quality read ends were trimmed at a threshold of Q20. The reads were then aligned against the HBV reference genome (GenBank accession number NC_003977) using a mapping quality score of 30, duplicate sequences were excluded from further analysis, and single-nucleotide polymorphisms (SNPs) were called from the alignment data, also in CLC Genomics Workbench 7. Per-base quality scores showed that a high percentage of the sequences passed the PHRED Q20 and Q30 thresholds, indicating a low sequencing error rate, and any low-quality sequence was trimmed from the reads to improve the overall quality of the alignment. The average sequence coverage was ~200×, which provided sufficient depth for reliable SNP calling across the 220 bp antigenic domain (Figure 1).
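The quality-trimming and coverage steps can be illustrated with a small Python sketch. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means a 1% chance that a base call is wrong and Q30 means 0.1%. The trimming function below simply removes bases from both ends of a read until it reaches a base at or above the threshold; it is a simplified stand-in, not the trimming algorithm implemented in CLC Genomics Workbench, and it assumes Phred+33 quality encoding. The depth function computes mean per-base coverage over a region from aligned read intervals. The reads, quality strings, and coordinates in the example are invented for illustration.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII-encoded quality string (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def trim_ends(seq: str, quality: str, threshold: int = 20) -> tuple[str, str]:
    """Trim bases scoring below `threshold` from both ends of a read."""
    scores = phred_scores(quality)
    start, end = 0, len(scores)
    while start < end and scores[start] < threshold:
        start += 1
    while end > start and scores[end - 1] < threshold:
        end -= 1
    return seq[start:end], quality[start:end]


def mean_depth(intervals: list[tuple[int, int]], region_start: int, region_end: int) -> float:
    """Mean per-base depth over [region_start, region_end] (1-based, inclusive),
    given aligned reads as inclusive (start, end) intervals."""
    length = region_end - region_start + 1
    depth = [0] * length
    for read_start, read_end in intervals:
        lo = max(read_start, region_start)
        hi = min(read_end, region_end)
        for pos in range(lo, hi + 1):  # empty range if the read misses the region
            depth[pos - region_start] += 1
    return sum(depth) / length


if __name__ == "__main__":
    # Invented read and Phred+33 quality string, for illustration only.
    print(trim_ends("ACGTACGT", "+5??I?+!"))
    print(error_probability(20), error_probability(30))
    # Two invented 150 bp alignments over a 220 bp region.
    print(f"{mean_depth([(1, 150), (71, 220)], 1, 220):.2f}x")
```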
The analysis revealed four SNPs: two C-to-T substitutions at positions 533 and 592, and two T-to-A substitutions at positions 562 and 581 (Figure 2). All four were high-frequency polymorphisms, each supported by at least 50% of the reads at that position, and the variant at position 592 was supported by more than 99% of the reads. Several other variants were also observed in this region, but they were supported by too few reads to be considered reliable (Table 1).
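The distinction between high-frequency variants and poorly supported ones can be expressed as a simple filter on per-position read counts. In the Python sketch below, a variant is labeled high-frequency when at least 50% of the reads at that position carry it, matching the cut-off described above. The minimum depth of 20 reads (`min_depth`) is a hypothetical parameter, only two alleles per site are considered, and the positions and read counts in the example are invented; they are not this patient's data, which are summarized in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCounts:
    position: int   # 1-based position on the reference
    ref: str        # reference base
    alt: str        # alternative base
    ref_reads: int  # reads supporting the reference base
    alt_reads: int  # reads supporting the alternative base

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def alt_fraction(self) -> float:
        return self.alt_reads / self.depth if self.depth else 0.0


def classify(site: SiteCounts, min_fraction: float = 0.5, min_depth: int = 20) -> str:
    """Label a site 'high-frequency', 'low-support', or 'insufficient depth'."""
    if site.depth < min_depth:
        return "insufficient depth"
    if site.alt_fraction >= min_fraction:
        return "high-frequency"
    return "low-support"


if __name__ == "__main__":
    # Invented example sites, for illustration only.
    sites = [
        SiteCounts(10, "C", "T", ref_reads=90, alt_reads=110),
        SiteCounts(20, "T", "A", ref_reads=180, alt_reads=20),
        SiteCounts(30, "G", "A", ref_reads=5, alt_reads=3),
    ]
    for site in sites:
        print(f"{site.ref}{site.position}{site.alt}: {site.alt_fraction:.0%} -> {classify(site)}")
```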
It is also noteworthy that the C533T mutation substitutes serine for proline, a non-polar to polar change that may alter the structure of the antigenic domain. The T581A variant replaces serine with threonine, a substitution of one polar residue for another. The remaining two SNPs, T562A and C592T, are synonymous: they leave the encoded serine and asparagine residues, respectively, unchanged. Because each of the four sites can carry either the reference or the variant base, the SNPs can combine into at most 2^4 = 16 distinct variants of the antigenic domain. Taking the SNP data and these possible combinations into account, it is reasonable to consider that the patient may have carried up to 16 different strains of the virus, arising either through multiple infections or through de novo mutation. Such a multi-strain infection would be harder for the patient’s immune system and treatment regimens to control, allowing the virus to persist and cause other diseases.
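The reasoning in this paragraph (mapping each SNP to a codon, deciding whether it changes the encoded amino acid, and counting how many combinations four variable sites allow) can be sketched in Python. Two assumptions should be kept in mind: the S open reading frame is taken to start at nucleotide 155 of the reference (`S_ORF_START`, conventional HBV numbering, not stated in this article), and the reference codons in `SNPS` are hypothetical codons chosen only to match the residues named in the text (proline, serine, serine, asparagine); they are not read from NC_003977. The codon table is the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

S_ORF_START = 155  # Assumed S ORF start (1-based); verify against the reference annotation.


def codon_of(position: int, orf_start: int = S_ORF_START) -> tuple[int, int]:
    """Return (codon number, position within codon), both 1-based."""
    offset = position - orf_start
    return offset // 3 + 1, offset % 3 + 1


def substitute(ref_codon: str, codon_pos: int, alt: str) -> tuple[str, str, str]:
    """Apply a base change and return (ref amino acid, alt amino acid, effect)."""
    alt_codon = ref_codon[: codon_pos - 1] + alt + ref_codon[codon_pos:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    effect = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, effect


# (genome position, ref base, alt base, HYPOTHETICAL reference codon).
SNPS = [
    (533, "C", "T", "CCT"),
    (562, "T", "A", "TCT"),
    (581, "T", "A", "TCC"),
    (592, "C", "T", "AAC"),
]

if __name__ == "__main__":
    for position, ref, alt, ref_codon in SNPS:
        codon_no, codon_pos = codon_of(position)
        if ref_codon[codon_pos - 1] != ref:
            raise ValueError(f"reference base mismatch at {position}")
        ref_aa, alt_aa, effect = substitute(ref_codon, codon_pos, alt)
        print(f"{ref}{position}{alt}: codon {codon_no}, {ref_aa}->{alt_aa} ({effect})")

    # Each biallelic site is either reference or variant: 2**4 combinations.
    haplotypes = list(product(*[(ref, alt) for _, ref, alt, _ in SNPS]))
    print(len(haplotypes), "possible combinations")
```

Under these assumptions the sketch reproduces the effects described above (P to S and S to T are non-synonymous, the other two are synonymous) and the 16-combination upper bound. The count is only an upper bound: which combinations are actually present cannot be read off the per-site frequencies alone.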
Here, we developed a method for the rapid identification of multi-strain HBV infection and applied it to a suspected multi-strain HBV infection in a patient with late-stage liver cancer. We amplified and sequenced a small genomic region of the PreS/S viral gene, and the results suggested the presence of genotypic viral variants in the patient. The four SNPs we detected point to a possible multi-strain infection. These SNPs could have arisen for several reasons, for example from re-infection or from the selective pressure exerted by therapeutic regimens. To establish a persistent infection, the virus must evade the host immune system; variants arise continually through error-prone replication, and selection favors those that complete successful replication cycles. Amplifying only a small region of the viral genome may be one reason we observed only four SNPs. Sequencing other regions of the viral genome may reveal more SNPs, which would still be consistent with the presence of multiple viral strains in the patient. We nevertheless restricted sequencing to a small region of the PreS/S gene to keep the method simple and feasible. The method reported here is fast and cost-effective, and it may help clinicians rapidly identify multi-strain HBV infection and develop appropriate treatments for patients with chronic HBV infection.
Higher Education Press and Springer-Verlag Berlin Heidelberg