Feature extraction of hyperspectral images for detecting immature green citrus fruit

Yongjun DING, Won Suk LEE, Minzan LI

Front. Agr. Sci. Eng. ›› 2018, Vol. 5 ›› Issue (4) : 475-484.

Front. Agr. Sci. Eng. ›› 2018, Vol. 5 ›› Issue (4) : 475-484. DOI: 10.15302/J-FASE-2018241
RESEARCH ARTICLE


Abstract

A hyperspectral camera covering 369–1042 nm was used to acquire 30 hyperspectral images at an early immature growth stage of citrus, in order to detect immature green fruit within citrus trees under natural illumination conditions. First, the successive projections algorithm (SPA) was implemented to select the 677, 804, 563, 962, and 405 nm wavebands and to construct multispectral images from the original hyperspectral images for further processing. Then, histogram threshold segmentation using the NDVI of 804 and 677 nm was implemented to remove image backgrounds. Three slope parameters, calculated from the band pairs 405 and 563 nm, 563 and 677 nm, and 804 and 962 nm, were used to construct a classifier to identify potential citrus fruit. Then, a marker-controlled watershed segmentation based on wavelet transform was applied to obtain potential fruit areas. Finally, a green fruit detection model was constructed according to Grey Level Co-occurrence Matrix (GLCM) texture features of the independent areas. Three supervised classifiers, logistic regression, random forest and support vector machine (SVM), were developed using the texture features. The detection accuracies were 79%, 75%, and 86% for the logistic regression, random forest, and SVM models, respectively. The developed algorithm showed great potential for identifying immature green citrus for early yield estimation.
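
The NDVI and slope features named in the abstract can be illustrated with a minimal Python sketch for a single pixel. It assumes the usual NDVI definition, (NIR − red)/(NIR + red), with 804 nm as the near-infrared band and 677 nm as the red band, and takes each slope as the reflectance difference divided by the wavelength difference. The reflectance values and the fixed threshold of 0.5 are hypothetical; the abstract states only that the threshold was derived from a histogram.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def slope(r1, r2, wl1, wl2):
    """Slope of the reflectance spectrum between two wavebands (per nm)."""
    return (r2 - r1) / (wl2 - wl1)


# Hypothetical reflectance values for one pixel, keyed by wavelength in nm.
pixel = {405: 0.05, 563: 0.12, 677: 0.06, 804: 0.48, 962: 0.44}

pixel_ndvi = ndvi(pixel[804], pixel[677])

# The three band pairs listed in the abstract.
band_pairs = [(405, 563), (563, 677), (804, 962)]
slopes = [slope(pixel[a], pixel[b], a, b) for a, b in band_pairs]

# Hypothetical fixed threshold standing in for the histogram-derived one.
is_vegetation = pixel_ndvi > 0.5
```

Applied to every pixel, the NDVI test would separate canopy from background, and the three slopes would form the feature vector passed to a classifier.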

Keywords

hyperspectral / green citrus / image processing / fruit detection / precision agriculture / yield mapping

Cite this article

Yongjun DING, Won Suk LEE, Minzan LI. Feature extraction of hyperspectral images for detecting immature green citrus fruit. Front. Agr. Sci. Eng., 2018, 5(4): 475‒484 https://doi.org/10.15302/J-FASE-2018241

1 Introduction

Genome sequencing and assembly for European domestic Duroc pigs has greatly improved the genetic resources available for this important livestock species[1], and has enhanced the potential of this pig as a model organism for biomedical studies. Biological adaptability has enabled development of over 730 current pig breeds or lines that are distributed globally across a wide range of environments[2]. Modern domestic pigs have undergone strong genetic selection in specialized commercial populations; this has led to remarkable phenotypic changes and genetic adaptation, which makes these breeds an important world heritage and scientific resource for comparative genomic studies[3,4]. Recently, a list of ‘domestication genes’ has been compiled for silkworms[5], chickens[6], pigeons[7], rabbits[8], dogs[9], cattle[10] and pigs[11–14] by genomic sequencing. Rongchang pigs, a Chinese indigenous breed raised only in Southwest China with a center of production in the Sichuan basin, have been intensively selected for efficient accumulation of muscle and highly prized pork traits (i.e., juiciness, flavor, tenderness, pink hue and heavy marbling). The phenotypes of Rongchang pigs, characterized by their average-size head, concave and wrinkled face, well-developed limbs, concave back, tilted haunch and a big belly, are remarkably different from wild boars.
To identify genomic selection signatures in Rongchang pigs, we performed whole-genome resequencing of six female Rongchang pigs (about 170 Gb in total) and evaluated the genomic regions under selection.

2 Materials and methods

2.1 Animals and genome sequencing

Genomic DNA was extracted from the blood of six female Rongchang pigs from a nucleus herd in Chongqing Municipal Breeding Pig Farm, with no direct and collateral blood relationship within the last three generations among the individuals selected. Sequencing was performed on a HiSeq 2000 platform (Illumina, San Diego, CA, USA). In addition, we downloaded genomic data of 101 pigs worldwide from the EMBL-EBI database (http://www.ebi.ac.uk/) under accession number ERR173 and the NCBI sequence read archive (SRA) under accession number SRA065461, which included 30 European domestic pigs, 20 Chinese domestic pigs, 30 Tibetan wild pigs from China, 10 Asian wild boars, six European wild boars, four other species in the genus Sus, and an African warthog[1,11,15,16].

2.2 SNP calling

We first filtered low-quality paired reads, which mainly resulted from base-calling duplicates and adapter contamination. The qualified paired-end reads were mapped to the pig reference genome assembly (Sscrofa10.2)[1] using BWA software[17]. After alignment, we performed SNP calling on a population-scale for two groups (56 domestic pigs and 51 other individuals, including the African warthog, as detailed above) using a Bayesian approach implemented in SAMtools[18]. The genotype likelihoods from reads for each individual at each genomic location were calculated, and the allele frequencies were estimated. Only the high-quality SNPs (coverage depth ≥4 and ≤1000, RMS (root mean square) mapping quality ≥20, distance between adjacent SNPs ≥5 bp and missing ratio of samples within each group <50%) were kept for the subsequent analysis.
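
The quality criteria above can be expressed as a simple predicate. The following Python sketch is illustrative only: the `SnpSite` record and its field names are hypothetical, and it assumes that when two SNPs lie closer than 5 bp the later one is dropped (the text does not say whether one or both are removed).

```python
from dataclasses import dataclass


@dataclass
class SnpSite:
    """Hypothetical per-site summary for one sample group."""
    pos: int              # position on the chromosome (bp)
    depth: int            # coverage depth
    rms_mapq: float       # RMS mapping quality
    missing_ratio: float  # fraction of samples without a genotype call


def passes_site_filters(site):
    """Depth, mapping-quality and missingness thresholds from the text."""
    return (
        4 <= site.depth <= 1000
        and site.rms_mapq >= 20
        and site.missing_ratio < 0.5
    )


def filter_snps(sites, min_distance=5):
    """Filter position-sorted sites from a single chromosome.

    Assumption: a SNP closer than min_distance to the previously kept
    SNP is discarded, and the earlier SNP is retained.
    """
    kept = []
    for site in sites:
        if not passes_site_filters(site):
            continue
        if kept and site.pos - kept[-1].pos < min_distance:
            continue
        kept.append(site)
    return kept
```

For example, `filter_snps([SnpSite(100, 10, 30.0, 0.1), SnpSite(102, 10, 30.0, 0.1)])` keeps only the site at position 100 under this assumption.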

2.3 Functional enrichment analysis

Functional enrichment of Gene Ontology (GO) terms, pathways, and InterPro domains was analyzed using the DAVID web server[19]. Genes were mapped to their respective human orthologs, and the lists were submitted to DAVID to test for significant overrepresentation of GO biological process (GO-BP) and molecular function (GO-MF) terms, KEGG pathways and InterPro categories. In all tests, all known genes were assigned as the background, and P values (i.e., EASE scores), which indicate the significance of overlap between gene sets, were calculated using a Benjamini-corrected modified Fisher’s exact test. Only terms with P<0.05 were considered significant.
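
To make the test statistic concrete, the sketch below computes an EASE-style score and a Benjamini-Hochberg adjustment in plain Python. It rests on two assumptions that the text does not spell out: that the EASE score is a one-sided Fisher’s exact test on a 2×2 table (list versus background, in term versus not in term) with one gene removed from the list hits, and that "Benjamini-corrected" refers to the Benjamini-Hochberg procedure. The function names are hypothetical and this is not DAVID's own code.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact P(X >= a) for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    favorable = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favorable / comb(n, row1)


def ease_score(list_hits, list_total, pop_hits, pop_total):
    """Modified Fisher's exact test: one gene is removed from the list hits."""
    a = max(list_hits - 1, 0)
    b = list_total - list_hits
    c = pop_hits
    d = pop_total - pop_hits
    return fisher_right_tail(a, b, c, d)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Removing one hit makes the score more conservative than the plain Fisher P value, which penalizes terms supported by only a few genes.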

2.4 Phylogenetic analyses

Phylogenetic relationships were inferred using the package TreeBeST (http://treesoft.sourceforge.net/treebest.shtml) under the p-distance model using SNPs at a population-scale. We performed principal component analysis (PCA) with population-scale SNPs using EIGENSOFT4.2[20]. The significance level of eigenvectors was determined using the Tracy-Widom test[20].
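
As an illustration of the p-distance model, the following Python sketch computes pairwise distances from SNP genotypes. The encoding is an assumption, not the TreeBeST implementation: each genotype is the count of alternative alleles (0, 1 or 2, with `None` for a missing call), a site contributes half the absolute difference in allele count, and the distance is averaged over sites called in both individuals.

```python
def p_distance(g1, g2):
    """Mean per-site allele difference between two genotype vectors."""
    diffs = [
        abs(a - b) / 2
        for a, b in zip(g1, g2)
        if a is not None and b is not None
    ]
    if not diffs:
        raise ValueError("no sites called in both individuals")
    return sum(diffs) / len(diffs)


def distance_matrix(genotypes):
    """Pairwise p-distances keyed by (name, name) for a dict of genotype vectors."""
    names = list(genotypes)
    return {
        (x, y): p_distance(genotypes[x], genotypes[y])
        for x in names
        for y in names
    }


# Hypothetical genotypes for three individuals at four SNP sites.
example = {
    "A": [0, 0, 2, 1],
    "B": [0, 2, 2, None],
    "C": [2, 2, 0, 1],
}
matrix = distance_matrix(example)
```

A matrix of this kind is the input a neighbor-joining algorithm needs to build the tree.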

2.5 Identification of selected regions

A sliding window approach (100-kb windows with 10-kb steps) was applied to quantify the pooled heterozygosity (Hp), genetic differentiation (FST), and selection statistics (Tajima’s D, which is a measure of selection in the genome) between Rongchang pigs and Asian wild boars. To detect regions with significant selective sweep signatures, we Z-transformed the resultant distributions of Hp scores and FST values, and simultaneously selected windows with low Z(Hp) (<−2) in Rongchang pigs and high Z(FST) (>2) as genomic regions with strong selective sweep signals that could harbor genes under selection.
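
The window scan and threshold step can be sketched in Python as follows. Several details are assumptions rather than statements from the text: pooled heterozygosity is computed with the commonly used definition Hp = 2·ΣnMAJ·ΣnMIN / (ΣnMAJ + ΣnMIN)², where the sums run over the major and minor allele counts of all SNPs in a window; the Z-transformation uses the population standard deviation; and per-window FST values are taken as given input.

```python
from statistics import mean, pstdev


def window_starts(chrom_length, size=100_000, step=10_000):
    """Start coordinates of full-size sliding windows along one chromosome."""
    return list(range(0, max(chrom_length - size, 0) + 1, step))


def pooled_heterozygosity(allele_counts):
    """Hp for one window from (count, count) pairs, one pair per SNP."""
    n_maj = sum(max(a, b) for a, b in allele_counts)
    n_min = sum(min(a, b) for a, b in allele_counts)
    total = n_maj + n_min
    return 2 * n_maj * n_min / total ** 2


def z_transform(values):
    """Standardize values to zero mean and unit (population) variance."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def select_sweep_windows(hp, fst, hp_cut=-2.0, fst_cut=2.0):
    """Indices of windows with Z(Hp) < hp_cut and Z(FST) > fst_cut."""
    z_hp = z_transform(hp)
    z_fst = z_transform(fst)
    return [
        i
        for i, (h, f) in enumerate(zip(z_hp, z_fst))
        if h < hp_cut and f > fst_cut
    ]
```

Requiring both conditions at once selects windows that have low diversity within Rongchang pigs and are also strongly differentiated from Asian wild boars.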

3 Results and discussion

3.1 Sequencing and SNP calling

We generated a total of about 170 Gb of paired-end reads, of which 85% (144.43 Gb) of high-quality reads were mapped to the pig reference genome assembly (Sscrofa10.2) with about 6.4-fold coverage for each individual (Table S1). In addition, we downloaded about 1037 Gb of genomic data from 101 publicly available pig genomes in the EMBL-EBI database and about 659 Gb in the NCBI SRA database[1,11,15,16] (Table S2).
We performed SNP calling on a population-scale and identified 10.13 M SNPs from 107 individuals (Table S3). We then separately pooled and obtained SNP sets for each of two groups, which included 6.74 M from the 56 domestic pigs and 7.76 M from the 51 other individuals. We identified 6.50 M SNPs from six Rongchang pigs, of which 47205 were coding SNPs, including 15662 nonsynonymous nucleotide substitutions (15540 missense, 102 stop-gain and 20 stop-loss mutations) that were detected in 6910 genes (Table S4). These nonsynonymous SNP-containing genes in Rongchang pigs were mainly related to the G-protein coupled receptor protein signaling pathway (97 genes, P = 2.14 × 10⁻¹⁰), especially sensory perception of chemical stimulus (68 genes, P = 1.02 × 10⁻⁹), smell (61 genes, P = 1.02 × 10⁻⁸), olfactory transduction (56 genes, P = 1.92 × 10⁻⁸) and olfactory receptor (59 genes, P = 5.73 × 10⁻⁸). Pigs have one of the largest repertoires of functional olfactory receptor genes that encode the G-protein coupled receptor superfamily[21]. In previous reports[1], similar rapid evolution of olfactory-related genes with extensive nucleotide variation has been found, reflecting the importance of smell in this scavenging animal and of other odor-driven behaviors, such as individual recognition and mating preferences[22,23].

3.2 Genome-wide selective sweep signals

It has been well documented that European and Chinese pigs diverged from each other about 1000000 years ago and originated independently from different subspecies of ancestral wild boars around 10000 years ago[1,16,24]. To examine relatedness between Rongchang pigs and other pigs, we constructed a neighbor-joining tree (Fig. 1a), and conducted PCA (Fig. 1b; Table S5) using genomic SNPs, both of which revealed a deep phylogenetic split between European and Asian pigs. To accurately detect genomic footprints left by selection in Rongchang pigs and avoid genetic differences that resulted from geographic isolation of Europe from China, we specifically measured the genome-wide variations and frequency spectrum based on 8.32 M SNPs between six Rongchang pigs and 10 Asian wild boars (Table S6).
Fig.1 Phylogenetic relationship of Rongchang pigs. (a) Neighbor-joining phylogenetic tree of pig breeds. The scale bar represents p-distance; (b) two-way principal component plot of pig breeds. The fractions of the variance explained are 12.2% and 5.74% for eigenvectors 1 and 2, respectively, with a Tracy-Widom P value <10⁻⁷⁸ (Table S5).


In total, 229772 100-kb windows (10-kb steps) across the pig genome contained ≥100 SNPs each; these windows covered 84.4% of the genome and were used to identify the regions that may have been affected by selection during domestication. We empirically set the thresholds at Z(Hp)Rongchang <−2 and Z(FST) >2, because they represent the extreme tails of the distributions and are hence likely enriched for strong selective sweep signals along the genome, which could harbor genes that underwent a selective sweep. From this we identified a total of 44.86 Mb of genomic regions (1.61% of the genome, containing 449 genes) with strong selective sweep signals in Rongchang pigs (Fig. 2a), which also exhibited significant differences (P<10⁻¹⁶, Mann–Whitney U test) based on Z(Hp), Z(FST), and Tajima’s D when compared with the genomic background (Fig. 2b).
We also constructed a phylogenetic neighbor-joining tree (Fig. 2c) and performed PCA (Fig. 2d) exclusively using the SNPs in regions with strong selective sweep signals. Although Rongchang pigs and Asian wild boars are genetically close, based on the 8.32 M SNPs across the whole genome (Fig. 1), they form two distinct clusters with respect to these SNPs (0.62 M, 7.45%), which are potentially under adaptive evolution resulting from industrial agriculture (Fig. 2c, Fig. 2d).
Fig.2 Genomic regions with strong selective sweep signals in Rongchang pigs. (a) Genome-wide distribution of pooled heterozygosity values (Hp), genetic differentiation (FST), and the corresponding Z transformations (Z(Hp) and Z(FST)), which were calculated in 100-kb windows with 10-kb steps (n = 229772, containing ≥100 SNPs). Data points located to the right of the vertical line (where Z(FST) is 2) and below the horizontal line (where Z(Hp) is −2) were identified as selected regions in Rongchang pigs (red points). m, mean; s, standard deviation; (b) violin plot of Z(Hp)Rongchang, Z(FST), and |Tajima’s DRongchang pigs – Tajima’s DAsian wild boars| in genomic regions with strong selective sweep signals for Rongchang pigs compared with the whole genome. Out of 229772 100-kb windows that contained ≥100 SNPs with 10-kb steps across the pig reference genome (gray violin), 1852 windows were picked out as regions with strong selective sweep signals (green violin). The width of each violin depicts a 90°-rotated kernel density trace and its reflection. Vertical black boxes denote the interquartile range between the first and third quartiles (25th and 75th percentiles, respectively) and the white point inside denotes the median. Vertical black lines denote the lowest and highest values within 1.5 times the interquartile range from the first and third quartiles, respectively. The statistical significance was calculated by the Mann–Whitney U test; (c) phylogenetic tree (scale bar represents p-distance); (d) two-way principal component plot of Rongchang pigs (n = 6) and Asian wild boars (n = 10) based on SNPs in regions with strong selective sweep signals, with 25.0% of variance explained for eigenvector 1 (P = 0.030, Tracy-Widom test) and 13.7% for eigenvector 2 (P = 0.277, Tracy-Widom test).


The 449 genes embedded in selected regions were analyzed using DAVID to examine whether these domestic genes were enriched for specific functional gene categories (Table 1). Our findings coincide with previous reports on genes related to pig domestication[1–4,11–15]: genes related to growth and hormone binding, which included seven terms, were observed to be under strong selective sweep in Rongchang pigs, and may have contributed to the rapid growth and enhanced muscle development of domestic pigs.
Tab.1 Functional gene categories enriched for genes affected by selection in Rongchang pigs
| Category | Term description | Involved gene number | P value |
| --- | --- | --- | --- |
| GO-BP:0010648 | Negative regulation of cell communication | 13 | 0.007 |
| GO-BP:0007242 | Intracellular signaling cascade | 40 | 0.011 |
| GO-BP:0048009 | Insulin-like growth factor receptor signaling pathway | 3 | 0.015 |
| GO-MF:0017046 | Peptide hormone binding | 4 | 0.018 |
| GO-MF:0042562 | Hormone binding | 5 | 0.019 |
| GO-MF:0005158 | Insulin receptor binding | 4 | 0.020 |
| GO-BP:0051960 | Regulation of nervous system development | 10 | 0.022 |
| GO-BP:0032868 | Response to insulin stimulus | 6 | 0.033 |
| GO-BP:0050769 | Positive regulation of neurogenesis | 5 | 0.037 |
| GO-BP:0050767 | Regulation of neurogenesis | 8 | 0.040 |
| GO-BP:0045664 | Regulation of neuron differentiation | 7 | 0.041 |
| GO-BP:0010975 | Regulation of neuron projection development | 5 | 0.041 |
| KEGG-Pathway:00983 | Drug metabolism | 4 | 0.041 |
| GO-BP:0006396 | RNA processing | 19 | 0.046 |
| GO-BP:0010720 | Positive regulation of cell development | 5 | 0.047 |
| GO-MF:0019899 | Enzyme binding | 18 | 0.049 |
| GO-BP:0009725 | Response to hormone stimulus | 14 | 0.049 |

Note: P values (i.e., EASE scores), which indicate significance of the overlap between various gene sets, were calculated using a Benjamini-corrected modified Fisher’s exact test. Only Gene Ontology (GO) biological process (GO-BP), GO-molecular function (GO-MF) and KEGG pathway terms with P<0.05 were considered significant and listed.

Of note, 10 genes putatively under selection that are related predominantly to nervous system development, most with a single allele in Rongchang pigs, include CNTN4, DLL3, GHSR, LHX5, MAP1B, MBP, METRN, NUMBL, TNFRSF12A and REST, several of which affect brain development, neuronal functions, and behavior (Fig. 3). This result supports the view that altered behavior (such as reduced fear, higher levels of adult play, and tameness or aggression toward humans), in addition to the obvious dramatic changes in appearance and physiology, was also important during domestication, and that mutations affecting developmental genes may underlie these changes[25–29]. In addition, four genes (CYP2A6, GMPS, UPB1 and UPP2) in Rongchang pigs exhibited strong selective sweep signatures enriched for drug metabolism (Fig. S1). Given that alterations in these genes are associated with a variety of drug metabolism-associated diseases[30], their positive selection may be attributed to constant exposure of domestic pigs in modern industry to much higher dosages of chemicals/drugs and an increased number of environmental xenobiotics, which could have accelerated evolution of drug metabolism.
Fig.3 Genes related to nervous system development that show selective sweep signatures in Rongchang pigs. (a) Z(Hp), Z(FST), and Tajima’s D values are plotted using a 10-kb sliding window. Genomic regions located above the upper horizontal dashed red line (Z(FST) = 2) and below the lower horizontal dashed black line (Z(Hp) = −2) were considered regions with strong selective sweep signals for Rongchang pigs (beige regions). Genome annotations are shown at the bottom (black bar: coding sequences, blue bar: genes). The boundaries of genes related to nervous system development are marked in red; (b) the gene trees for 10 genes related to nervous system development of 10 Asian wild boars and six Rongchang pigs.


4 Conclusions

This study examined the genetic relationships among Chinese Rongchang and other pigs, and uncovered genetic footprints of domestication and selection that provide an important resource for further improvements of this important livestock species. We envision that the data presented here will provide a representative example on which to base future deciphering of genomic footprints left by livestock domestication and selection.

References

[1] Bulanon D M, Burks T F, Alchanatis V, Noguchi N. A multispectral imaging analysis for enhancing citrus fruit detection. Environment Control in Biology, 2010, 48(2): 81–91
[2] Bulanon D M, Burks T F, Alchanatis V. Study on temporal variation in citrus canopy using thermal imaging for citrus fruit detection. Biosystems Engineering, 2008, 101(2): 161–171
[3] Xu H R, Ye Z Z, Ying Y B. Identification of citrus fruit in a tree canopy using color information. Transactions of the Chinese Society of Agricultural Engineering, 2005, 21(5): 98–101 (in Chinese)
[4] Cai J R, Zhou X J, Li Y L, Fan J. Recognition of mature oranges in natural scene based on machine vision. Transactions of the Chinese Society of Agricultural Engineering, 2008, 24(1): 175–178 (in Chinese)
[5] Kane K E, Lee W S. Multispectral imaging for in-field green citrus identification. In: ASAE Annual Meeting 2007, Minneapolis. St. Joseph: American Society of Agricultural and Biological Engineers, 2007, 1–11
[6] Kurtulmus F, Lee W S, Vardar A. Green citrus detection using ‘eigenfruit’ color and circular Gabor texture features under natural outdoor conditions. Computers and Electronics in Agriculture, 2011, 78(2): 140–149
[7] Sengupta S, Lee W S. Identification and determination of the number of immature green citrus fruit in a canopy under different ambient light conditions. Biosystems Engineering, 2014, 117(1): 51–61
[8] Zhao C, Lee W S, He D. Immature green citrus detection based on colour feature and sum of absolute transformed difference (SATD) using colour images in the citrus grove. Computers and Electronics in Agriculture, 2016, 124(C): 243–253
[9] Li H, Lee W S, Wang K. Immature green citrus fruit detection and counting based on fast normalized cross correlation (FNCC) using natural outdoor colour images. Precision Agriculture, 2016, 17(6): 678–697
[10] Okamoto H, Lee W S. Green citrus detection using hyperspectral imaging. Computers and Electronics in Agriculture, 2009, 66(2): 201–208
[11] Kim M S, Chen Y R, Mehl P M. Hyperspectral reflectance and fluorescence imaging system for food quality and safety. Transactions of the ASAE (American Society of Agricultural Engineers), 2001, 44(3): 721–729
[12] Araújo M C U, Saldanha T C B, Galvao R K H, Yoneyama T, Chame H C, Visani V. The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometrics and Intelligent Laboratory Systems, 2001, 57(2): 65–73
[13] Wu P C, Chen L G. An efficient architecture for two-dimensional discrete wavelet transform. IEEE Transactions on Circuits and Systems for Video Technology, 2001, 11(4): 536–545
[14] Kim J B, Kim H J. Multiresolution-based watersheds for efficient image segmentation. Pattern Recognition Letters, 2003, 24(1): 473–488
[15] Yang X, Li H, Zhou X. Nuclei segmentation using marker-controlled watershed, tracking using mean-shift, and Kalman filter in time-lapse microscopy. IEEE Transactions on Circuits and Systems. I, Regular Papers, 2006, 53(11): 2405–2414
[16] Yazdi M, Gheysari K. A new approach for the fingerprint classification based on gray-level co-occurrence matrix. International Journal of Computer and Information Science and Engineering, 2008, 2(3): 171–174
[17] Yan J, Lee J. Degradation assessment and fault modes classification using logistic regression. Journal of Manufacturing Science and Engineering, 2005, 127(4): 912–914
[18] Guo L, Ma Y, Cukic B, Singh H. Robust prediction of fault-proneness by random forests. In: 15th International Symposium on Software Reliability Engineering 2004, Bretagne. USA: IEEE, 2004, 417–428
[19] Cherkassky V, Ma Y. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 2004, 17(1): 113–126
[20] Brereton R G, Lloyd G R. Support vector machines for classification and regression. Analyst, 2010, 135(2): 230–267

Acknowledgements

This work was funded by the National Natural Science Foundation of China (31360291), China Scholarship Council (201408625069) and University of Florida.

Compliance with ethics guidelines

Yongjun Ding, Won Suk Lee, and Minzan Li declare that they have no conflicts of interest or financial conflicts to disclose.
This article does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

The Author(s) 2018. Published by Higher Education Press. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0)

Supplementary files

FASE-17161-OF-CL_suppl_1 (297 KB)
