An unsupervised learning approach to study synchroneity of past events in the South China Sea

Kevin C. Tse, Hon-Chim Chiu, Man-Yin Tsang, Yiliang Li, Edmund Y. Lam

Front. Earth Sci. ›› 2019, Vol. 13 ›› Issue (3) : 628-640.

Front. Earth Sci. ›› 2019, Vol. 13 ›› Issue (3) : 628-640. DOI: 10.1007/s11707-019-0748-x
RESEARCH ARTICLE

Abstract

Unsupervised machine learning methods were applied to multivariate geophysical and geochemical datasets of ocean floor sediment cores collected from the South China Sea. The well-preserved, continuous core samples contain high-resolution Cenozoic sediment records that enable detailed paleoenvironmental studies. Bayesian age-depth chronological models, constructed from biostratigraphic control points for the drilling sites, were applied to the cluster boundaries generated by two popular unsupervised learning methods: K-means and random forest. The methods tested produced compact and unambiguous clusters, indicating that previously unknown data patterns can be revealed when all variables in the datasets are considered simultaneously. The synchroneity of past events represented by the cluster boundaries was studied across geographically separated ocean drilling sites by converting the fixed depths of the cluster boundaries into chronological ranges, represented as Gaussian density plots, and comparing these with known past events in the region. A Gaussian density peak at around 7.2 Ma was identified in the results from all three sites, and it is suggested to coincide with the initiation of the East Asian monsoon. In contrast to traditional statistical approaches, unsupervised learning requires no a priori assumptions, and the clustering results serve as a novel data-driven proxy for studying the complex and dynamic processes of the paleoenvironment surrounding the ocean sediment. This work is a pioneering approach to extracting valuable information about regional events and opens up a systematic and objective way to study the vast global ocean sediment datasets.
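The workflow summarized in the abstract (cluster the downcore measurements, locate the cluster boundaries, convert boundary depths to ages, then build a Gaussian density over those ages) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the core data are synthetic, the function names are hypothetical, the control points are invented, and plain linear interpolation with a fixed age uncertainty `sigma` stands in for the Bayesian age-depth model used in the article. It is not the authors' implementation.

```python
import math


def kmeans(rows, k, iters=100):
    """Lloyd's k-means on depth-ordered feature tuples.

    Assumption: initial centers are taken at evenly spaced positions
    along the core, which keeps this sketch deterministic.
    """
    n = len(rows)
    centers = [rows[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(r, centers[j]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def cluster_boundaries(depths, labels):
    """Midpoint depths at which the cluster label changes downcore."""
    return [(depths[i] + depths[i + 1]) / 2
            for i in range(len(labels) - 1) if labels[i] != labels[i + 1]]


def depth_to_age(depth, control_points):
    """Linear interpolation between (depth, age) control points.

    Stand-in for a Bayesian age-depth model; it returns a single age
    rather than a posterior distribution.
    """
    for (d0, a0), (d1, a1) in zip(control_points, control_points[1:]):
        if d0 <= depth <= d1:
            return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)
    raise ValueError("depth outside the range of the control points")


def gaussian_density(ages, sigma, grid):
    """Sum of Gaussian kernels centred on each boundary age."""
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return [sum(norm * math.exp(-0.5 * ((t - a) / sigma) ** 2) for a in ages)
            for t in grid]


# Synthetic core: 300 samples at 1 m spacing, two hypothetical variables
# whose mean levels shift at 100 m and 200 m, plus small deterministic noise.
depths = [float(i) for i in range(300)]
levels = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0)]
rows = [(levels[i // 100][0] + 0.05 * math.sin(i),
         levels[i // 100][1] + 0.05 * math.cos(i)) for i in range(300)]

labels = kmeans(rows, k=3)
boundaries = cluster_boundaries(depths, labels)

# Invented (depth in m, age in Ma) control points for the synthetic core.
control_points = [(0.0, 0.0), (150.0, 5.0), (300.0, 12.0)]
ages = [depth_to_age(d, control_points) for d in boundaries]

# Density of boundary ages on a 0-12 Ma grid with 0.1 Myr spacing.
grid = [i * 0.1 for i in range(121)]
density = gaussian_density(ages, sigma=0.3, grid=grid)
```

In a multi-site comparison, the boundary ages from every site would be pooled (or their densities overlaid) so that peaks shared across sites stand out. With real measurements, the variables would normally be standardized before clustering so that no single variable dominates the distance calculation.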

Keywords

machine learning / ocean sediments / unsupervised classification

Cite this article

Kevin C. Tse, Hon-Chim Chiu, Man-Yin Tsang, Yiliang Li, Edmund Y. Lam. An unsupervised learning approach to study synchroneity of past events in the South China Sea. Front. Earth Sci., 2019, 13(3): 628–640. https://doi.org/10.1007/s11707-019-0748-x

RIGHTS & PERMISSIONS

2019 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature