Introduction
Insect mitochondrial DNA (mtDNA) is a circular molecule of 14−20 kb that carries 13 protein-coding genes (PCGs), two rRNA genes, 22 tRNA genes, and one A+T-rich region, which contains the initiation sites for transcription and replication (
Clayton, 1992;
Wolstenholme, 1992;
Boore, 1999). In recent years, mitochondrial genomes have been widely used in studies of phylogenetics, comparative and evolutionary genomics, population genetics, and molecular evolution. The advantages of mitochondrial over nuclear DNA include its stable maternal inheritance, limited recombination and relatively rapid rate of nucleotide substitution.
Lepidoptera is the second largest order of insects, containing 47 superfamilies, 126 families and 250 subfamilies (
Kristensen and Skalski, 1999). Thus far, complete mitochondrial genome (mitogenome) sequences have been determined in more than 60 species of Lepidoptera. Although many studies have focused on the phylogeny of Lepidoptera, there are still many relationships within the order that remain to be elucidated (
Lavrov et al., 2000). The unanswered questions include the superfamilial relationships, particularly those within the Macrolepidoptera (Bombycoidea, Geometroidea, Noctuoidea, Papilionoidea, Lasiocampoidea, Mimallonoidea, Axioidea, Calliduloidea, Hedyloidea, Hesperoidea, and Drepanoidea). Minet (
1991) and Nielsen (
1989) proposed a sister relationship between Geometroidea and Papilionoidea, but the relationship of this group to other macrolepidopteran superfamilies remains uncertain, resulting in a trichotomy among the most speciose macrolepidopteran superfamilies. Some studies that combined a comprehensive sampling of lepidopteran species with multiple genes, representing a substantial length of sequence, recovered several unconventional relationships, including one that was incompatible with a monophyletic Macrolepidoptera (
Regier et al., 2009;
Mutanen et al., 2010). More recently, Kim et al. (
2011) formulated a refined hypothesis of superfamilial relationships, (((((Bombycoidea + Geometroidea) + Noctuoidea) + Pyraloidea) + Papilionoidea) + Tortricoidea), which indicates a lack of support for the traditionally defined Macrolepidoptera.
Another major area of controversy concerns the familial relationships within the true butterflies. Butterflies are commonly recognized as comprising between four and 14 families (
Kristensen, 1976;
Smart, 1989). The relationships (((Lycaenidae + Nymphalidae) + Pieridae) + Papilionidae) were recently revisited by studies utilizing a combination of morphological and molecular characters (
Weller and Pashley, 1995;
Wahlberg et al., 2005) or molecular data alone (
Kim et al., 2010). However, more recent and extensive molecular phylogenetic studies of the Lepidoptera revealed a non-monophyletic Papilionoidea, in which Hesperiidae, together with another macrolepidopteran family, Hedylidae, was grouped within the superfamily (
Regier et al., 2009;
Mutanen et al., 2010).
In the present study, we aimed to determine the complete mitogenome sequence of an additional papilionid, Lamproptera curia, and to elucidate the organization of its genome. We used the concatenated amino acid and nucleotide sequences of the 13 protein-coding genes (PCGs) for phylogenetic analyses. The newly obtained sequences of L. curia, together with the corresponding sequences of 45 other lepidopteran species generated in previous studies, were used to infer relationships within the order and to test relationships within Papilionoidea.
Materials and methods
Specimen collection and DNA extraction
An adult specimen of
Lamproptera curia was collected from Lingui, Guangxi, China, in August 2012, preserved in 100% ethanol and stored at 4°C until DNA extraction. Total genomic DNA was isolated from thoracic or leg muscle using a routine phenol/chloroform method (
Zhou et al., 2007).
Primer design, PCR, and sequencing
The primers used to amplify the complete mitogenome in this study were based on the sequences listed in Table 1. A few universal PCR primers for short fragment amplifications of the
cox1,
cox2, nad5 and
cytb genes were synthesized based on sequences from a previous study (
Simon et al., 1994). The remaining primers were designed based on the sequence alignment of the available complete lepidopteran mitogenomes using Primer Premier 5.0 software (
Singh et al., 1998). The entire mitogenome of
L. curia was amplified in six fragments (
cox1-
cox3,
cox3-
nad5,
nad5-
nad4,
nad4-
cob,
cob-
rrnL,
rrnL-
cox1) using long-PCR techniques with TaKaRa LA Taq polymerase under the following cycling conditions: initial denaturation at 95°C for 5 min, followed by 30 cycles of 95°C for 50 s, 45−50°C for 50 s and 68°C for 2 min 30 s, and a final extension step at 68°C for 10 min. The PCR products were visualized by electrophoresis on a 1.0% agarose gel, purified using a PCR purification kit (QIAGEN, Germany) and sequenced directly on an ABI 3730 DNA Analyzer using a BigDye chemistry kit (Applied Biosystems, Inc., Carlsbad, CA, USA) with the same primers as used for PCR. All PCR products were sequenced from both strands. The resultant mitogenome sequence data were deposited in the GenBank database under the accession number KJ141168.
Sequence analysis and annotation
Sequence annotation was performed using the BLAST tools on the NCBI website (http://blast.ncbi.nlm.nih.gov/Blast) and the DNAStar package (DNAStar Inc., Madison, USA). The tRNA genes and their secondary structures were predicted using tRNAscan-SE software v.1.21 (
Lowe and Eddy, 1997). The PCGs and rRNAs were confirmed by sequence comparison with ClustalX 1.8 software and the NCBI BLAST search function (
Altschul et al., 1990). Nucleotide composition and codon usage were calculated with DAMBE software (
Xia and Xie, 2001).
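As an illustration of the two statistics named above, the following minimal Python sketch computes A+T content and in-frame codon counts. It is not the DAMBE implementation; the function names `at_content` and `codon_usage` and the toy sequence are assumptions introduced here for illustration only.

```python
from collections import Counter


def at_content(seq: str) -> float:
    """Percentage of A and T among the unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (counts["A"] + counts["T"]) / total


def codon_usage(cds: str) -> Counter:
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Toy sequence for illustration only (not L. curia data).
toy = "ATGATTATTTTAT"
print(f"A+T content: {at_content(toy):.1f}%")
print(codon_usage(toy).most_common())
```

Ambiguous bases such as N are excluded from the denominator here; a different convention would change the percentages slightly.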
Phylogenetic analysis
To reconstruct the phylogenetic relationships among lepidopteran insects, the complete mitogenomes of 46 lepidopteran species (including that of L. curia) were obtained from the GenBank database (Table 2). These species represent seven superfamilies of Lepidoptera. The mitogenomes of three dipteran species were used as outgroups. The concatenated nucleotide and amino acid sequences of the 13 protein-coding genes were used to construct the phylogenetic trees.
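The concatenation step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure actually used in the study: it assumes each gene has already been aligned separately, and the function name `concatenate`, the taxon labels and the toy sequences are hypothetical.

```python
def concatenate(alignments, gene_order):
    """Join per-gene alignments into one supermatrix.

    alignments: {gene: {taxon: aligned sequence}}
    Returns (supermatrix, partitions), where partitions lists the
    1-based (gene, start, end) column range of each gene.
    """
    taxa = sorted({t for gene in gene_order for t in alignments[gene]})
    parts = {t: [] for t in taxa}
    partitions = []
    start = 1
    for gene in gene_order:
        aln = alignments[gene]
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences differ in aligned length")
        length = lengths.pop()
        for taxon in taxa:
            # A taxon missing from this gene is padded with gaps.
            parts[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {t: "".join(p) for t, p in parts.items()}
    return supermatrix, partitions


# Toy alignments for illustration only.
toy = {
    "cox1": {"taxon_A": "ATGAAA", "taxon_B": "ATGAAG"},
    "cox2": {"taxon_A": "ATGTTT"},
}
matrix, parts = concatenate(toy, ["cox1", "cox2"])
print(matrix)
print(parts)
```

Recording the column range of each gene keeps the option of assigning separate substitution models to each partition later.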
The phylogenetic trees were constructed using maximum likelihood (ML) (
Abascal et al., 2007) and Bayesian inference (BI) (
Yang and Rannala, 1997) methods. The ML analyses were conducted using PHYML (
Guindon et al., 2005) under the following conditions: the proportion of invariable sites as “estimated,” number of substitution rate categories as four, gamma distribution parameter as “estimated,” and the starting tree as a BIONJ distance-based tree. The confidence values of the ML tree were evaluated via the bootstrap test with 500 iterations. The Bayesian analyses were performed using MrBayes 3.1.2 (
Ronquist and Huelsenbeck, 2003) with a partitioned strategy. The best-fitting substitution model was selected as in the ML analysis. The MCMC analyses (with random starting trees) were run with one cold and three heated chains simultaneously for 1,000,000 generations, sampled every 100 generations. Bayesian posterior probabilities were calculated from the samples taken after the MCMC chains had started to converge.
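To make the posterior probability calculation concrete, the sketch below shows how a clade's support can be obtained as the fraction of post-burn-in samples that contain it. This is a simplified Python illustration, not MrBayes code. The burn-in fraction of 25% is an assumption (the text does not state the value used), and `clade_posterior` and the toy samples are hypothetical.

```python
def clade_posterior(samples, clade, burnin_fraction=0.25):
    """Fraction of post-burn-in tree samples that contain `clade`.

    samples: list of sampled trees, each represented as a set of clades
             (a clade is a frozenset of taxon names).
    burnin_fraction: assumed value; not taken from the study.
    """
    n_burn = int(len(samples) * burnin_fraction)
    kept = samples[n_burn:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    return sum(clade in tree for tree in kept) / len(kept)


# 1,000,000 generations sampled every 100 generations gives
# 10,000 sampled generations per run (excluding the starting state).
n_samples = 1_000_000 // 100

# Toy chain of eight samples for illustration only.
ny_ly = frozenset({"Nymphalidae", "Lycaenidae"})
ly_pi = frozenset({"Lycaenidae", "Pieridae"})
chain = [{ly_pi}] * 2 + [{ny_ly}] * 5 + [{ly_pi}]
print(n_samples, clade_posterior(chain, ny_ly))
```

In the toy chain the first two samples are discarded as burn-in, so the support is computed from the remaining six.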
Results and discussion
Genome organization
The complete mtDNA sequence of L. curia was 15277 bp in length (Table 3) and consisted of two rRNA genes, 22 tRNA genes, 13 PCGs and one major non-coding A+T-rich region. As shown in Fig. 1, and consistent with many other insect mitogenomes, the major strand encoded more genes (9 PCGs and 14 tRNAs) than the minor strand (4 PCGs, 8 tRNAs and 2 rRNA genes).
All PCGs in the
L. curia mitogenome were initiated by typical ATN codons (seven with ATG, four with ATT, one with ATA), except the
cox1 gene, which was tentatively assigned the CGA codon (Table 1). Generally, the trinucleotide TTG has been assumed to be the
cox1 start codon for some invertebrate taxa including insect species, such as
Pyrocoelia rufa (
Bae et al., 2004),
Caligula boisduvalii (
Hong et al., 2008), and
Acraea issoria (
Hu et al., 2010).
Three kinds of stop codon were found among the 13 protein-coding genes of
L. curia: TAA (
ND2, ATPase8, ATPase6, COIII, ND4L, ND6, Cytb); TAG (
ND1, ND3); and the incomplete stop codon T (
COI, COII, ND4, ND5). Incomplete termination codons are frequently observed in insect mitogenomes and, in fact, all the lepidopteran mitogenomes sequenced to date contain such codons (Kim et al., 2009). The incomplete codon is commonly completed by post-transcriptional polyadenylation, in which two A residues are added to create the TAA terminator (
Anderson et al., 1981;
Ojala et al., 1981).
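The completion of an incomplete stop codon can be illustrated with a short Python sketch. It only mimics the outcome of polyadenylation on the coding sequence and is not an annotation tool; the function name `complete_stop_codon` and the toy sequences are assumptions made for illustration. TAA and TAG are the stop codons of the invertebrate mitochondrial genetic code.

```python
STOP_CODONS = {"TAA", "TAG"}  # invertebrate mitochondrial code


def complete_stop_codon(cds: str) -> str:
    """Append A residues, as polyadenylation would, to finish a
    partial terminal codon (T or TA) into TAA."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        return cds  # already ends on a complete codon
    completed = cds + "A" * (3 - remainder)
    if completed[-3:] not in STOP_CODONS:
        raise ValueError("added A residues do not create a stop codon")
    return completed


# Toy coding sequence ending in a single T (illustration only).
print(complete_stop_codon("ATGTTTT"))
```

A sequence whose partial terminal codon does not begin with T raises an error, since adding A residues cannot turn it into a terminator.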
tRNA and rRNA genes
The 22 tRNAs varied from 61 [tRNA
Ser(AGN)] to 69 bp (tRNA
Met, tRNA
Ile, tRNA
Gln, tRNA
Lys) in size, and presented the typical clover-leaf structure, with the sole exception of tRNA
Ser (AGN), which lacked the dihydrouridine (DHU) stem (Fig. 2). The
L. curia tRNAs harbored a total of 24 mismatched base pairs in their stems: six in the DHU stems, three in the amino acid acceptor stems, eight in the TΨC stems and seven in the anticodon stems. Of these 24 mismatches, 16 were G-U pairs, which form weak bonds in the secondary structure; the remaining eight were atypical pairings: one mismatch in the tRNAHis (C-U), one mismatch in the tRNA
Lys (C-U), one mismatch in the tRNA
Ser(AGY) (U-C), one mismatch in the tRNA
Ile (U-U), two in the tRNA
Phe (one U-A and one C-G), and two in the tRNA
Ser (UCN) (two U-U) (Fig. 2). The number of mismatches in the
L. curia tRNAs found by the present study was well within the range reported from previous studies for other lepidopteran insect tRNAs (
Liu et al., 2008;
Jiang et al., 2009;
Kim et al., 2010). These tRNA mismatches can be corrected through RNA editing mechanisms, which are well known for arthropod mtDNA (
Lavrov et al., 2000).
As in all other insect mitogenome sequences, two rRNA genes (
rrnL and
rrnS) were detected in
L. curia. The lrRNA and srRNA genes of the
L. curia mitogenome were 1334 and 785 bp in length, respectively. They were located between tRNA
Leu (CUN) and tRNA
Val and between tRNA
Val and the A+T-rich region, respectively (Fig. 1). The length of the lrRNA gene fell well within the range observed in other sequenced insects, from 470 bp in
Bemisia tabaci (
Thao et al., 2004) to 1426 bp in
Hyphantria cunea (
Liao et al., 2010). Similarly, the length of the srRNA gene fell well within the range observed in other completely sequenced insects, from 434 bp in
Ostrinia nubilalis (
Clary and Wolstenholme, 1985) to 827 bp in
Locusta migratoria (
Flook et al., 1995).
A+T-rich region
The A+T-rich region of L. curia was located between the srRNA and tRNAMet genes. This 469 bp region had the highest A+T content (89.8%) of any region of the L. curia mitogenome. A conserved sequence consisting of two repeats of the unit 5′-ATAGATTTTTTTTTTTTTTTT-3′ was found at the 5′-end of the A+T-rich region. In addition, short microsatellite-like repeats were observed throughout the A+T-rich region, but no noticeable macro-repeats.
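A simple way to locate tandem copies of a known repeat unit is sketched below in Python. This is an illustrative search only, not the method used in the study; the function name `tandem_runs` and the flanking bases in the toy sequence are assumptions, while the repeat unit is the one quoted above.

```python
import re


def tandem_runs(seq: str, unit: str, min_copies: int = 2):
    """Return (start, copies) for each run of at least `min_copies`
    back-to-back copies of `unit`; start is a 0-based position."""
    unit = unit.upper()
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (m.start(), len(m.group()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]


UNIT = "ATAGATTTTTTTTTTTTTTTT"  # repeat unit quoted in the text

# Toy sequence: two copies of the unit between made-up flanks.
toy = "GCGC" + UNIT * 2 + "GCGC"
print(tandem_runs(toy, UNIT))
```

Exact matching is used here; imperfect repeats, which are common in A+T-rich regions, would need an approximate matching approach.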
Phylogenetic analysis
The newly obtained mitogenome of L. curia was used for phylogenetic analyses together with the mitogenomes of 45 other lepidopterans, representing seven lepidopteran superfamilies (Papilionoidea, Hesperioidea, Bombycoidea, Geometroidea, Noctuoidea, Pyraloidea and Tortricoidea). The analyses were carried out using Bayesian inference (BI) and maximum likelihood (ML) on the concatenated nucleotide and amino acid sequences of the 13 protein-coding genes. Four phylogenetic trees were constructed, and all produced very similar topologies, in which the seven superfamilies clustered into three clades (Figs. 3−6). The first clade comprised the superfamilies Papilionoidea and Hesperioidea. The second comprised Bombycoidea, Geometroidea, Noctuoidea and Pyraloidea. Within the second clade, Bombycoidea and Geometroidea first clustered together as (Bombycoidea + Geometroidea); this branch then grouped with Noctuoidea, and the resulting group in turn clustered with Pyraloidea. The Tortricoidea species constituted the third clade.
Papilionoidea: Kristensen (
1976) suggested, on the basis of morphological characters alone, that the Nymphalidae and Lycaenidae are closely related to the Pieridae. The proposal was further supported by subsequent studies using a combination of morphological and molecular characters (
Wahlberg et al., 2005;
Kim et al., 2011).
In the present study, when the Hesperiidae was excluded, the ML tree based on the amino acid sequence data recovered the relationships among the true butterfly families (Nymphalidae, Lycaenidae, Pieridae and Papilionidae) as (((Lycaenidae + Pieridae) + Nymphalidae) + Papilionidae), but the sister-group relationship between Lycaenidae and Pieridae had low nodal support (43%; Fig. 4). The three remaining trees, however, all showed the same topology, (((Nymphalidae + Lycaenidae) + Pieridae) + Papilionidae) (Figs. 3, 5 and 6). This relationship agrees well with the findings of previous studies (
Kristensen, 1976;
Wahlberg et al., 2005;
Kim et al., 2010). Additionally, the sister-group relationship between Nymphalidae and Lycaenidae received high nodal support (ML: 82%; BI: 0.98 and 1.00).
Hesperioidea: Morphological classification currently divides the true butterflies into two superfamilies, Papilionoidea and Hesperioidea. The Hesperioidea contains only one family, the Hesperiidae (
Harvey, 1991;
Ackery et al., 1999). However, this morphological classification has yet to be supported by molecular phylogenetic studies. In the study of Wahlberg et al. (
2005), Hesperioidea was placed within Papilionoidea by Bayesian inference (BI), as sister to (Pieridae + Nymphalidae + (Riodinidae + Lycaenidae)). Similar conclusions were reached by Regier et al. (
2008) and Mutanen et al. (
2010), in whose studies Hesperioidea fell within Papilionoidea as a sister branch of ((Pieridae + Lycaenidae) + Nymphalidae). Likewise, the phylogenetic trees constructed with both the ML and BI methods in the present study differed from the results of morphological studies: again, Hesperioidea was found to lie within the superfamily Papilionoidea. Together, these results suggest that further molecular phylogenetic studies are warranted to determine the evolutionary position of Hesperioidea.
Bombycoidea: All four phylogenetic trees generated in the present study support the relationship ((Saturniidae + Sphingidae) + Bombycidae). This relationship agrees fully with previous findings based on either morphological studies (
Minet, 1991) or molecular phylogenetic analyses (
Kawahara et al., 2009).
Noctuoidea: The findings of the present study reinforce the notion that, within the Noctuoidea, Lymantriidae groups first with Arctiidae, then with Noctuidae and lastly with Notodontidae. This topology, (((Lymantriidae + Arctiidae) + Noctuidae) + Notodontidae), was recovered in the trees obtained by both the ML and BI methods with 100% nodal support. Furthermore, our results agree with those of morphological studies (
Yin et al., 2008).
Pyraloidea: Currently, the proposal of Munroe and Solis (
1999) for the morphological classification of Pyraloidea, in which the superfamily comprises Crambidae and Pyralidae, is widely accepted in the field. In the ML and BI trees generated in the present study, Crambidae and Pyralidae were sister groups, providing further evidence for this classification.
Tortricoidea: Morphological classification treats the superfamily Tortricoidea as a single family (Tortricidae) consisting of three subfamilies (
Razowski, 1976;
Horak, 1999). In our analyses, the three species of Tortricoidea converged on a single branch in both the ML and BI trees. Two species from the subfamily Olethreutinae,
Grapholita molesta and
Spilonota lechriaspis, grouped together first. This branch was then joined by
Adoxophyes honmai from the subfamily Tortricinae. The nodal support for these groupings was high, ranging from 96 to 100% in the ML trees (Figs. 3 and 4) and reaching 100% in the BI trees (Figs. 5 and 6). Our results thus support the findings of previous morphological studies.
Geometroidea: The phylogenetic trees showed that Geometroidea is more closely related to Bombycoidea than to any other superfamily.
In recent years, mitogenome-based molecular phylogenetic analysis has become an active area of research on the classification of lepidopteran insects, and good progress has been made. Lee et al. (
2006) used seven protein-coding genes of lepidopteran mitogenomes for their phylogenetic analyses, and all of their findings supported the monophyly of Lepidoptera. The internal relationships among the groups within Lepidoptera, (Apoditrysia(Obtectomera(Macrolepidoptera))), were found by the same study to be consistent with the traditional classification. Feng et al. (
2010) carried out a similar study of the phylogenetic relationships among the Lepidoptera based on 12 PCGs. Their study supported the current view from morphological studies that the superfamilies Bombycoidea, Pyraloidea and Papilionoidea are each monophyletic.
The findings of the present study support the monophyly of each of Bombycoidea, Noctuoidea, Geometroidea, Pyraloidea and Tortricoidea. The six superfamilies other than Tortricoidea converged on a single branch, the Obtectomera. This finding agrees with the traditional classification.
In conclusion, we have sequenced the complete mitogenome of L. curia. Phylogenetic analyses principally yielded the relationships (((((Bombycoidea + Geometroidea) + Noctuoidea) + Pyraloidea) + (Papilionoidea + Hesperioidea)) + Tortricoidea). Among the true butterfly families, the relationships (Nymphalidae + Lycaenidae + Pieridae + Hesperiidae + Papilionidae) were supported by the majority of data sets. Hesperioidea was found to lie within the superfamily Papilionoidea. To evaluate further the phylogenetic position of Hesperioidea among the true butterflies, complete mitogenome sequences from a larger number of Hesperioidea species will be required.
Author contributions
Qin Xin-min conceived the project. Guan Qing-xin performed the mitochondrial genome sequencing. Li Hui-min prepared the genomic DNA samples. Zhang Yu, Liu Yu-ji and Guo Dan-ni analyzed the data.
Higher Education Press and Springer-Verlag Berlin Heidelberg