Conserved gene arrangement in the mitochondrial genomes of barklouse families Stenopsocidae and Psocidae

Substantial variation in gene organization and arrangement has been reported for sequenced mitochondrial (mt) genomes from the suborders of the insect order Psocoptera. In this study we sequenced the complete mt genome of Stenopsocus immaculatus, the first representative of the family Stenopsocidae from the suborder Psocomorpha. Relative to the ancestral pattern, rearrangements of a protein-coding gene (nad3) and five tRNA genes (trnQ, trnC, trnN, trnS1, trnE) were found. This pattern was similar to that of two barklice from the family Psocidae, with the exception of the translocation of trnS1, trnE and trnI. Based on comparisons of pairwise breakpoint distances of gene rearrangements, gene number and chromosome number, it was concluded that mt genomes of Stenopsocidae and Psocidae share a relatively conserved pattern of gene rearrangements; mt genomes within the Psocomorpha have been generally stable over long evolutionary history; and mt gene rearrangement has been substantially faster in the booklice (suborder Troctomorpha) than in the barklice (suborders Trogiomorpha and Psocomorpha). It is speculated that the change of life history and persistence of unusual reproductive systems with maternal inheritance contributed to the contrasting rates in mt genome evolution between the barklice and booklice.


Introduction
The animal mitochondrial (mt) genome is important in cell metabolism, apoptosis, disease and aging [1] . The mt genome is the most extensively studied genomic system in insects, with sequenced representatives from all insect orders available in GenBank [2] . Sequence data from the mt genome has been used widely for phylogenetic analyses across broad taxonomic levels [3][4][5][6][7][8] due to the abundance of mt genomes in tissues, their greater rate of evolution than the nuclear genome and their evolutionarily conserved transcription products [2,9,10] The insect mt genome is a compact circular molecule and typically encodes 13 protein-coding genes (PCGs), two rRNA (rRNA) genes and 22 tRNA (tRNA) genes. In addition to this set of 37 genes, there are a variety of noncoding regions, the largest of which is usually termed the control region (CR) as it contains both an origin of replication and transcription [11] . In general, the arrangement of genes in insect mt genomes is highly conserved [2,12,13] , with notable exceptions in the Hymenoptera, Thysanoptera and Psocodea (Phthiraptera and Psocoptera) [14][15][16][17][18][19] .
The insect order Psocoptera contains 39 extant families and more than 5000 described species in three suborders: Trogiomorpha (barklice), Psocomorpha (barklice) and Troctomorpha (booklice) [20] . To date, complete mt genomes have been published for three barklice and five booklice [18,19,[21][22][23][24] . The mt genomes of the five booklice (all from the genus Liposcelis) differed significantly in gene arrangement from the generally-accepted ancestral insect mt genome, and even from each other [18] , e.g., no commonalities were observed between the ancestral gene arrangement and the arrangement in Liposcelis entomophila [21] , and only one commonality (atp8-atp6) was seen between the ancestral gene arrangement and the other four Liposcelis species [22] . Liposcelis decolor and Liposcelis sculptilis, like most other insects, have the typical singlechromosome mt genome; the other three Liposcelis species, however, possess multipartite mt genomes comprising two chromosomes [22,23] . The mt genomes of the barklice were much less rearranged than those of the booklice. The genome of the lepidopsocid (suborder Trogiomorpha) had rearrangements of eight genes, including a PCG [24] . Two genomes of the other two barklice, Psococerastis albimaculata and Longivalvus hyalospilus (family Psocidae, suborder Psocomorpha), had the least rearrangements, with the transposition of one PCG and five tRNA genes, explained by the mechanism of tandem duplication-random loss (TDRL) [19] . The variation observed within the currently-available barklice and booklice mt genomes greatly limits their use in reliably inferring evolutionary patterns within the Psocoptera, and additional mt genomic data from other families are needed.
To further understand the evolution of the mt genome organization in the Psocoptera, we sequenced the complete mt genome of Stenopsocus immaculatus, the first representative of the family Stenopsocidae. The conserved pattern of gene arrangement in the mt genomes of two barklice families, Stenopsocidae and Psocidae suggested that mt genomes within the suborder Psocomorpha have been stable over long evolutionary timescales after some early gene rearrangements.

Samples collection and DNA extraction
Specimens of the narrow barklouse, S. immaculatus, were collected from Ashoro Washippu, Hokkaido, Japan in 2013 and kept in 100% ethanol before being transferred to -20°C for long-term preservation at the Entomological Museum of China Agricultural University, Beijing, China. Genomic DNA was extracted from six adult whole bodies excluding the abdomen with a DNeasy DNA Extraction kit (Qiagen, Valencia, CA, USA).

Mitochondrial genome sequencing, assembly and annotation
Two fragments of mtDNA (cox1 and rrnS) were amplified using PCR and Sanger sequencing following Li et al. [25] (Table S1). A library was prepared from the genomic DNA with an insert size of 450 bp and was sequenced on the Illumina Hiseq 2500 platform at Berry Genomics, Beijing. 4 Gb of clean data (250 bp paired-end reads) were used to assemble the mt genome using the method map to reference in Geneious v9.0.4 [26] with the sequences of cox1 and rrnS as the reference. The assembly parameters were: minimum overlap identity 98%, no gaps, maximum mismatches per read 2%, maximum ambiguity 2, and minimum overlap 100 bp. PCGs and rRNAs were identified using BLAST searches in GenBank and subsequently by alignment with sequences of other barklice [19,24] . tRNA genes were identified by tRNAscan-SE Search Server v1.21 [27] and checked manually. tRNA genes that could not be determined by tRNAscan-SE were determined in the unannotated regions by sequence similarity to tRNAs of other barklice. The annotated entire mt genome sequences of S. immaculatus have been deposited in GenBank under accession number KX187004.

Nucleotide composition, substitution rate and gene rearrangement analyses
The nucleotide composition of the mt genomes was calculated using MEGA 6.0 [28] . The nonsynonymous (Ka) and synonymous (Ks) substitution rate was calculated using DnaSP v.5.0 [29] . To compare the rates in mt gene rearrangement among major lineages of Psocoptera, pairwise breakpoint distances were calculated with the CREx web server [30] .

Sequence alignment and phylogenetic analyses
Four barklice and five booklice from Psocoptera were included in phylogenetic analyses ( Table 1). The true bug, Halyomorpha halys, was used as an outgroup. Sequences of 12 PCGs (without nad4L) along with two rRNA genes were used for the phylogenetic analyses. Each PCG was aligned separately based on multiple codon alignment with the MAFFT algorithm in the TranslatorX online platform [32] . The two rRNA genes were aligned with the MAFFT 7.0 online server [33] .
The individual gene alignments were assembled into a concatenated data set which was partitioned by genes using both Bayesian inference (BI) and maximum likelihood (ML) analyses with MrBayes 3.23 [34] and RAxML-HPC2 8.1.11 [35] . The best-fitting model (GTR + I + G) for the nucleotide data set was determined with jModelTest 0.1.1 [36] . For the ML analysis, the reliability of the inferred topology was assessed by performing 1000 rapid bootstrap replicates. For Bayesian analysis, two simultaneous runs of 10 million generations were performed for the data set and trees were sampled every 1000 generations, with the first 25% discarded as burn-in. The analysis was considered to have reached stationarity when the average standard deviation of split frequencies decreased to 0.01 [37] . The mt genome of S. immaculatus was determined to be a double-strand circular DNA molecule 16991 bp in length, including the 37 typical coding genes (13 PCGs, 22 tRNAs and two rRNAs) and a 1720 bp CR (Fig. 1). Twenty-three genes are encoded on the major strand (J-strand) and other 14 genes are encoded on the minor strand (N-strand). In  addition to the CR, another 14 non-coding regions were observed with the longest one (332 bp) flanked by trnI and trnM (Table S2). There were also 13 overlaps identified, ranging from 1 to 7 bp in length. The nucleotide composition of the S. immaculatus mt genome had a bias toward A and T, with an A + T content of 78.3%; the highest A + T content was found in the third codon position of PCGs (88.9%) and the CR (84.8%) ( Table 2). The whole mt genome was found to have a slight T skew and C skew, and the PCGs had an obvious T skew and moderate G skew. The codon usage of PCGs also contributed to the genome-wide bias toward AT and there was a strong bias to A or T at the third codon position of PCGs (Table S3). All of the PCGs had the standard ATN start codon. Eleven PCGs had the complete stop codon, either TAG or TAA, whereas cox3 and nad5 have an incomplete stop codon T, which is presumed to be subsequently completed by the post-transcriptional polyadenylation process [38] .

Results and discussion
The typical 22 tRNAs found in insect mt genomes were found in S. immaculatus, with lengths varying from 61 to 69 bp. Most of the tRNAs could be folded as classic cloverleaf structures, with the exception of trnS1, in which its dihydrouridine arm simply forms a loop, instead of a stemloop (Fig. S1). Based on the secondary structure, a total of 22 unmatched base pairs were found in the tRNAs. Eighteen of them were G-U pairs, which form a weak bond. The remaining four pairs included one A-G, one C-U, and two A-C mismatches.
The putative CR (1720 bp) was flanked by rrnS and trnI and is highly AT-rich (84.8%). Two tandem mirror repeats, unit I (683 bp) and unit II (686 bp), were identified in this region (Fig. 2). The sequence of mirror-repeat unit II was reversed with respect to unit I and showed high sequence similarity (99.4%) to unit I, with only four point mutations. Mirror repeats are known to form triplex H-DNA which are intrinsically mutagenic in mammalian cells [39] . They are known to induce large scale mutations such as deletions and/or rearrangements at a higher frequency [40] . Most of the repetitions found in the CRs of insect mt genomes, to date, are short repeat units similar to minisatellites with high copy numbers [41] . The occurrence of a large mirror repeats, with few mutations, found in the CR of the S. immaculatus mt genome is a rare event deserving the attention of future studies.

Gene rearrangements in the mitochondrial genome of Stenopsocus immaculatus
Compared with the generally-accepted ancestral mt gene arrangement of insects, rearrangements of a PCG (nad3) and five tRNAs (trnM, trnC, trnN, trnS1 and trnE) were found in two regions of the S. immaculatus mt genome: (1) between cox3 and trnH, and (2) between CR and trnY (Fig. 3). These two regions are considered as the active regions for mt gene rearrangement in hymenopterans [15] , flat bugs [3] and barklice from the family Psocidae [19] .
Three TDRL events could explain the gene rearrangements observed in the mt genome of S. immaculatus. In the first hypothesized TDRL event, the gene cluster of trnI-trnQ-trnM-nad2-trnW-trnC-trnY was duplicated in tandem and one copy of each duplicated gene was randomly deleted, leading to the new gene arrangement trnI-trnM-trnC-trnQ-nad2-trnW-trnY. The gene rearrangement in the second active region could be derived from another hypothesized TDRL event (Fig. 4). The mt gene rearrangements in S. immaculatus is slightly more complex than those in two barklice form the Psocidae (suborder Psocomorpha) with a third hypothesized TDRL event changing trnS1-trnE to trnE-trnS1.

Evolution of mitochondrial genome organization in the Psocoptera
To investigate the evolution of mt genome organization in the Psocoptera, we inferred phylogenetic relationships between the four barklice and five booklice, representing all three suborders, for which complete mt genome sequences are available. From ML and BI analyses trees were inferred with an identical topology with strong support for most of the clades (Fig. 5). Within the suborder Psocomorpha, S. immaculatus (Stenopsocidae) was recovered as the sister to a clade comprising P. albimaculata and L. hyalospilus (Psocidae). The close relationship between these two Psocomorpha families was also supported by their highly similar gene arrangement, sharing the same gene boundaries, with the exception of the translocation of trnS1 and trnE and trnI. This result indicated a relatively conserved pattern of gene rearrangements in the mt genomes of Stenopsocidae and Psocidae. A phylogenetic study had previously suggested that these two families (Stenopsocidae and Psocidae) represent the most widely divergent clades within the suborder Psocomorpha [42] . This suggests that mt genomes within Psocomorpha have been stable over a long period of evolutionary history following some early gene rearrangements.
To compare the evolutionary rates of mt gene rearrangement among major lineages of the Psocoptera, the pairwise breakpoint distances of gene rearrangement, gene number and chromosome number, and nonsynonymous (Ka) and synonymous (Ks) substitution rates were calculated, and are given adjacent to the phylogenetic tree (Fig. 5). The mt genomes of the Psocoptera showed accelerated evolutionary rates with breakpoint distances of gene rearrangements, relative to the ancestral insect genome, ranging from 12 to 36. There were substantial differences found between the three suborders of the Psocoptera. The five booklice from the Troctomorpha have had much faster evolutionary rates than the three barklice from the other Note: AT-skew = (A% -T%)/(A% + T%); GC-skew = (G% -C%)/(G% + C%).  two suborders, with notably higher breakpoint distances and nucleotide substitution rates. The five booklice also showed more variation in terms of missing/copied genes and genome fragmentation (i.e., two chromosomes) (Fig. 5). Differences in the mt genomes among the three suborders may be attributable to the obviously different ecologies of barklice and booklice. Barklice are entirely free-living insects, whereas booklice are closely associated with the nests of birds and mammals as well as human households. Our previous study indicated that ecological changes in booklice and parasitic lice (Phthiraptera) appears to be associated with an increased rate of mt gene rearrangement [19] . This finding is supported by a recent study which associated an extraordinarily divergent mitochondrial karyotype with maternally transmitted sex ratio distortion in Liposcelis bostrychophila [43] . When considering that parthenogenetic reproduction is common in booklice, we speculate that the change of life history and persistence of unusual reproductive systems with maternal inheritance may have contributed to the contrasting rates of evolution in mt genomes between the barklice and booklice.

Conclusions
We present the complete mt genome of Stenopsocus immaculatus, the first representative of the family Stenopsocidae from the suborder Psocomorpha. Comparative mt genome analyses reveal conserved pattern of gene rearrangements in Stenopsocidae and Psocidae, and mt gene rearrangement has been substantially faster in the booklice (suborder Troctomorpha) than in the barklice (suborders Trogiomorpha and Psocomorpha). The change of life history and persistence of unusual reproductive systems with maternal inheritance is likely to contribute to the contrasting rates in mt genome evolution between the barklice and booklice.