A study on specialist or special disease clinics based on big data

Zhuyuan Fang , Xiaowei Fan , Gong Chen

Front. Med. 2014, Vol. 8, Issue 3: 376–381. DOI: 10.1007/s11684-014-0356-9

LETTER TO FRONTIERS OF MEDICINE

Abstract

Correlation analysis and processing of massive medical information can be implemented through big data technology to find the relevance of different factors in the life cycle of a disease and to provide a basis for scientific research and clinical practice. This paper explores the concept of constructing a big medical data platform and introduces clinical model construction. Medical data can be collected and consolidated by distributed computing technology, and a medical model can be built with analysis techniques such as artificial neural networks and gray models. Big data analysis frameworks, such as Hadoop, can be used to construct early prediction and intervention models as well as clinical decision-making models for specialist and special disease clinics. This approach establishes a new model for common clinical research in specialist and special disease clinics.

Keywords

big data / correlation analysis / medical information / integration / data analysis / clinical model

Cite this article

Zhuyuan Fang, Xiaowei Fan, Gong Chen. A study on specialist or special disease clinics based on big data. Front. Med., 2014, 8(3): 376–381. DOI: 10.1007/s11684-014-0356-9


1 Background and current situation

Along with the rapid growth and popularization of the Internet, the Internet of Things, and cloud computing, data are accumulating faster than in any previous period, data volumes are growing larger, and data types are becoming more diverse and complex. The big data era emerged unannounced, penetrating all aspects of society. Big data refer to data sets generated from diverse sources, with multiple, large, and complex types and potential values, that are difficult to process and analyze in a short time [1].

Big data are a phenomenon derived from the rapid development of information technology, and the corresponding technologies developed alongside the concept. Big data technology has been considered an important strategic direction for health-sector informatization in many developed countries. Practice across different healthcare systems and market environments in developed countries shows that health information sharing and disease analysis using big data technology can improve the efficiency, quality, accessibility, and affordability of healthcare services. The application of big data in the medical industry has been regarded as the future of medical industry informatization [2,3].

Under the traditional model, the management of patients is either doctor centered or disease centered. Patient information is therefore dispersed, and patients cannot be managed holistically. Many diseases have a long course and are related to multiple comorbid conditions, so disease-related information must be integrated and managed as a whole [4].

An effective data integration model enables big data to meet various patient needs, such as personalized medicine, coordination and communication, patient support and empowerment, and accessibility; to provide a superior technology platform; and to promote the transformation of the medical model in medical research, clinical decision making, disease management, patient participation, and healthcare decision making.

In China’s health and medical field, the application and development of big data technology lag behind those of developed countries. Many medical institutions remain at the level of traditional business systems, although some have carried out a certain degree of exploration and even made progress. The application of big data analysis platforms in China’s health care is minimal compared with other countries; current development is only the beginning.

In clinical diagnosis and treatment, health information technologies for clinical data acquisition, storage, management, and application are developing fast, and several medical institutions are gradually adapting to this trend. For example, the University of Texas MD Anderson Cancer Center has used terabyte-scale data to conduct research on tumor pathology, epidemiology, accurate prediction of pathogenesis, and related models.

Analyses of development trends, the prevalent areas of diseases, and symptom types in different populations are typical clinical applications of big data. Big data can also be used to determine the causal relationship between risk factors and diseases. Under the new medical model, healthcare industries that use big data expand rapidly, whereas the traditional healthcare industry becomes even more informatized, and the evidence-based reference database expands to provide patients with prevention and treatment programs.

A disease management model based on big data can monitor clinical indicators, forecast changes in condition, and implement early treatment through data collection and analysis. For example, changes in body weight and pulmonary artery pressure monitored by a built-in device in chronic heart failure patients can remind physicians to treat in time and prevent disease progression, thus shortening the length of hospital stay, reducing emergency visits, and decreasing medical expenses.

Research on chronic diseases, such as cardiovascular diseases, has shown a trend toward information mining based on vast amounts of complex data, that is, building on all current information systems. A data center based on big data has been built to effectively integrate healthcare business application systems into a data warehouse, establish a decision support system, and demonstrate achievements using visualization techniques. Application systems designed and developed in this framework need to support this integration to meet the analytic needs of big data on heterologous, heterogeneous, massive data.
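To make the monitoring idea above concrete, the following is a minimal, hypothetical Python sketch of an early-warning rule for weight gain in chronic heart failure; the 2 kg/3-day threshold is an illustrative assumption, not a value from this article.

```python
from datetime import date

# Illustrative thresholds (assumed for this sketch, not from the article):
# a gain of ~2 kg within 3 days is a commonly cited warning sign of fluid
# retention in heart failure.
WEIGHT_GAIN_KG = 2.0
WINDOW_DAYS = 3

def weight_alert(readings):
    """readings: list of (date, weight_kg) tuples, oldest first.
    Returns True if weight rose by >= WEIGHT_GAIN_KG within WINDOW_DAYS."""
    for i, (d1, w1) in enumerate(readings):
        for d2, w2 in readings[i + 1:]:
            if (d2 - d1).days <= WINDOW_DAYS and w2 - w1 >= WEIGHT_GAIN_KG:
                return True
    return False

readings = [(date(2014, 3, 1), 71.0), (date(2014, 3, 2), 71.8),
            (date(2014, 3, 3), 73.2)]
print(weight_alert(readings))  # True -> remind the physician to intervene
```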

Big data penetrate all aspects of basic research, drug development, clinical diagnosis and treatment, and health management. The arrival of the big data era has been accelerated by the wide spread of mobile health management and medical applications. If personalized health services and health networks can be integrated with underlying data, research data, and clinical data, the value of information mining and analysis will be incalculable and may support personalized medical treatment and medication. Therefore, the era of “precise medical treatment” may soon emerge. Big data remain both an opportunity and a challenge for clinical medicine. Only by combining information technology with clinical medicine can huge amounts of data be fully used to address the challenges of diseases such as cardiovascular diseases. To meet these challenges, we must update our concepts and transform our mode of thinking.

2 Constructing a big medical data platform

In accordance with the heterogeneity and complexity of data sources for clinical diseases (e.g., structured and unstructured data, images, and charts), the following should be conducted: integrate the multiple heterogeneous systems involved in clinical disease processing and analysis; set up a basic support environment for a big medical data center; study standardized data elements and data structures of diseases; establish a data analysis environment by restructuring, expanding, and reusing data; and study security and privacy in big data processing of diseases to formulate relevant standards and specifications. The research results provide extension services for a variety of applications in the medical industry. Based on existing huge amounts of clinical data from integrative Chinese and western medicine, the standardized data structures of common diseases are studied to create a platform-based, centralized big data-gathering method. Moreover, the following should be implemented: examine unified data standards and the storage model; develop visualized big data pre-processing tools for diseases; realize the functions of data classification, cleaning, statistics, and basic analysis; integrate various heterogeneous information system resources; and build a clinical information database of Chinese and western medicine. Multi-dimensional data indexes are also established. Taking the value of data mining as the target, a disease data warehouse is developed to support the establishment of prediction and intervention models for diseases. Fig. 1 shows the overall deployment model.

2.1 Construction of a big data platform infrastructure

2.1.1 Overall topology

The basic network construction of the data platform integrates patient-related clinical information systems to ensure that the data acquisition channel is efficient and complete. The platform involves many subsystems and users and contains a large number of different applications, computing resources, and network communications equipment, of which the platform itself is the central node. The platform serves as a data exchange center and also plays a connecting role.

2.1.2 Cloud deployment model

The entire data platform adopts a distributed deployment mode, in which server (cloud) terminals, business services, and data storage are built and laid out. Because the distributed structure is deployed logically, future popularization and application can proceed according to the actual situation.

2.1.3 Data center construction

A data center is built on the existing medical information systems. Uniform standards are formulated, and the healthcare business application systems are effectively integrated to form an interconnected medical and health business collaboration network. The data center not only hosts the platform for core medical application systems, providing computing environment support for all kinds of application systems, but also serves as a core application platform based on the medical data warehouse. Moreover, it carries out centralized data safety management for the healthcare industry by standardizing industrial network application databases and network hardware interfaces, and it provides a wide range of applications in the medical industry based on shared data integration.

2.1.4 Construction of a data computing environment

Based on the characteristics of the data, the computing environment is improved in the aspects of collection, storage, and speed to enhance the data production capacity of the computing platform. An institution-level platform is built to ensure effective data collection and to standardize data processing, setting up a complete data acquisition environment. Improvement of the data center focuses on building storage capacity and a security environment. Moreover, big data technology is deployed in a cloud computing environment to ensure computation, storage, and flexible resource deployment through distributed processing, distributed databases, cloud storage, and virtualization technologies. The subsequent Hadoop architecture [5], the HDFS file system, HBase high-performance data access, and Hive high-speed retrieval and stream computation all rest on this cloud computing hardware and software environment.
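As a minimal sketch of how an application might touch such an environment, the following Python fragment lists an HDFS directory through Hadoop’s WebHDFS REST interface; the namenode address and the HDFS path are placeholder assumptions.

```python
import requests

# Placeholder namenode address (port 50070 was the default HTTP port of the
# Hadoop namenode in that era); adjust to the actual cluster.
NAMENODE = "http://namenode.example.org:50070"

def list_hdfs_dir(path):
    """List the entries of an HDFS directory via the WebHDFS LISTSTATUS op."""
    resp = requests.get(f"{NAMENODE}/webhdfs/v1{path}",
                        params={"op": "LISTSTATUS"})
    resp.raise_for_status()
    return [f["pathSuffix"] for f in resp.json()["FileStatuses"]["FileStatus"]]

print(list_hdfs_dir("/clinical/raw"))  # hypothetical path of ingested data
```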

2.2 Big data integration for heterologous and heterogeneous data

In the collection of massive heterologous and heterogeneous data, the data sources are rich and the data types diverse. The amount of data to be stored, analyzed, and mined is tremendous, the requirements for data display are high, and high efficiency and availability of data processing are essential.

Traditional data collection and storage have only one source; the amount of data to be stored, managed, and analyzed is relatively small; and most of it can be processed by relational databases and parallel data warehouses. In improving data processing speed through parallel computing, traditional parallel database technology pursues high consistency and fault tolerance; according to the CAP theorem, it is then difficult to also ensure availability and scalability. Moreover, the traditional data processing method is processor-centered, whereas a data-centered mode is required under the big data environment to reduce the cost of data movement. Therefore, the traditional data processing method cannot meet the requirements of big data.

The basic processing flow of big data does not differ greatly from that of traditional data. The main difference is that methods such as MapReduce can be applied at each processing step to carry out parallel processing, which is necessary because of the huge amount of unstructured data [5,6].
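The following self-contained Python sketch simulates the map-shuffle-reduce flow on a single machine to illustrate the programming model; on a real Hadoop cluster each phase runs in parallel across nodes, and the diagnosis records here are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical diagnosis records extracted from clinical systems.
records = ["hypertension", "diabetes", "hypertension", "heart failure"]

def map_phase(record):
    yield (record, 1)                      # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)             # group values by key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)                # aggregate per key

pairs = chain.from_iterable(map_phase(r) for r in records)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'hypertension': 2, 'diabetes': 1, 'heart failure': 1}
```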

A parallel processing technique is adopted to increase the data processing speed for big data. The method performs parallel processing of big data on a large number of low-cost servers; its outstanding advantages are scalability and availability, and it is especially suitable for processing massive structured, semi-structured, and unstructured data.

Distributed processing decomposes traditional data query and analysis tasks and assigns them to different processing nodes, so parallel processing attains a stronger capability. As a simplified programming model for parallel processing, a framework such as MapReduce lowers the threshold for developing parallel applications.

The traditional data reporting mode is replaced by a clinical business support platform system, and flexible definitions of business data content and format are achieved using extract-transform-load (ETL) technology and configuration [7,8]. Relevant information is extracted in real time and on a regular basis from various medical information systems to ensure the authenticity of the information and reduce the burden on healthcare institutions. Data analysis and presentation systems provide real-time query, analysis, and statistics functions and offer high efficiency in medical service supervision.
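A minimal ETL sketch in Python, assuming a hypothetical CSV export from a clinical system and an SQLite target, illustrates the extract-transform-load steps:

```python
import csv, io, sqlite3

# Hypothetical CSV export from a source clinical system (weights in grams).
raw = io.StringIO("patient_id,weight_g\nP001,71500\nP002,68200\n")

rows = list(csv.DictReader(raw))                         # Extract
transformed = [(r["patient_id"], int(r["weight_g"]) / 1000.0)
               for r in rows]                            # Transform: g -> kg

conn = sqlite3.connect(":memory:")                       # Load into target
conn.execute("CREATE TABLE weights (patient_id TEXT, weight_kg REAL)")
conn.executemany("INSERT INTO weights VALUES (?, ?)", transformed)
print(conn.execute("SELECT * FROM weights").fetchall())
```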

2.3 Standardization of business processes

In recent years, countries such as the United Kingdom, the United States, Canada, and Australia have invested heavily in medical and health information construction at the state and local levels based on big data. However, even in the United States, where information technology and medical science are extremely developed, data sharing and analysis between different systems and institutions still face problems. Standardized digitalization is a prerequisite for realizing regional healthcare and for achieving success with big data.

Standardization is a difficult aspect of hospital information construction. The standard systems managed and maintained by the big data platform are constructed and released to the environments of the connected application systems through the service, communication, and sharing channels provided by the platform. A channel mode for the generation, management, and release of standard systems is established with appropriate technologies, reaching users through the service points and business systems of the platform. This mode can fully meet the needs of the widespread use of complex standard systems.

A standardized system for clinical and business data exchange provides a channel for mutual communication. A sound standard system for clinical data exchange, including a “Clinical Medical Business Data Set” and a “Data Integration Platform Access Specification,” is established. It can improve the degree of standardization of clinical data, lay the foundation for integrating clinical resources, and promote clinical research. The outpatient (emergency) business process is specified to integrate the separate clinical medical business processes originally distributed across various clinical medical business systems, forming a new mode of medical services and providing better clinical medical service for patients.

The standards of a basic data set (i.e., clinical medical documents, data management and exchange specifications, business processes for system interoperability, and functional standards) provide specifications that ensure the interaction, integration, and sharing of information among the various health information systems within a region and promote the standardized application of health information resources.
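As an illustration of how such an exchange specification might be enforced at a platform access point, the following Python sketch validates a record against a hypothetical fragment of a basic data set schema; the field names are assumptions for this sketch, not the official standard.

```python
from jsonschema import validate, ValidationError

# Hypothetical fragment of a "Clinical Medical Business Data Set" expressed
# as a JSON Schema; field names are illustrative only.
record_schema = {
    "type": "object",
    "required": ["patient_id", "visit_date", "diagnosis_code"],
    "properties": {
        "patient_id": {"type": "string"},
        "visit_date": {"type": "string", "pattern": r"^\d{4}-\d{2}-\d{2}$"},
        "diagnosis_code": {"type": "string"},
    },
}

record = {"patient_id": "P001", "visit_date": "2014-03-01",
          "diagnosis_code": "I10"}
try:
    validate(instance=record, schema=record_schema)
    print("record conforms to the exchange standard")
except ValidationError as e:
    print("rejected:", e.message)
```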

Fig.2 shows the medical information processing flow.

2.4 Construction of the big data warehouse

The analysis platform for specialist and special disease clinics can be built by a business intelligence method based on the data warehouse. It includes five aspects: data sources, data warehouse, data analysis, visualization, and data applications.

Data sources: Data are acquired from existing patient-related information systems, such as HIS, LIS, PACS, EMR, ECG, and ultrasound, and from systems to be built, such as patient follow-up management systems. Heterogeneous and heterologous data are transformed into standardized form using big data acquisition technology, and data standardization and consistency are achieved with ETL tools.

Data warehouse: Multidimensional cubes are established according to the demands of the business models of specialist and special disease clinics to process big data (a minimal sketch follows this list).

Data analysis: The cross analysis of relational and multidimensional data is supported.

Visualization: Display formats are rich, common chart types and data drilling are supported, and results can be traced back to the original data.

Data applications: According to different business needs, the analysis results are used for clinical decision support, management decision support, and benefit analysis.
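The multidimensional cube mentioned under “Data warehouse” can be illustrated with a minimal Python sketch using a pivot table over hypothetical visit data:

```python
import pandas as pd

# Hypothetical visit counts along two dimensions (department x quarter).
visits = pd.DataFrame({
    "department": ["cardiology", "cardiology", "neurology", "neurology"],
    "quarter":    ["2014Q1", "2014Q2", "2014Q1", "2014Q2"],
    "visits":     [120, 135, 80, 95],
})

# A pivot table is a simple two-dimensional "cube": slice it to drill down
# or aggregate it further to roll up.
cube = pd.pivot_table(visits, values="visits", index="department",
                      columns="quarter", aggfunc="sum")
print(cube)
```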

2.5 Data analysis and processing

The analysis of big data in the medical field requires a variety of analytical methods because of the complexity of the data.

In consideration of the complexity of big data, reasonable clustering is carried out using a cluster analysis algorithm [9], and the different categories are described explicitly. The data are aggregated into categories such that similarity between categories is minimal and intra-category similarity is large, which simplifies the analytical model. Using a principal component analysis algorithm, under the principle of minimal information loss, the best overall simplification of multi-variable data is obtained by reducing the dimensionality of the high-dimensional variable space.
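A minimal Python sketch of these two steps, using k-means clustering and principal component analysis on hypothetical patient indicator data, is shown below; the cluster count and component number are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))           # 100 patients x 10 indicators (hypothetical)

# Cluster analysis: aggregate patients so intra-cluster similarity is high
# and similarity between clusters is low.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# PCA: simplify the 10-dimensional variable space with minimal information loss.
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)
print(labels[:10], pca.explained_variance_ratio_)
```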

In studying big data, effective variable selection is performed on the clinical data. A combination of the gray model [10] and an artificial neural network [11] is constructed to establish disease prediction and to serve as an intervention or forecast model. The two models adopt different algorithms to predict future data from historical data, ensuring better prediction accuracy, and each has its own advantages. The gray model is used to fit the historical data. The differences between the historical data and the fitted data constitute a residual sequence, which is revised through the artificial neural network model. The base prediction from the improved gray model is then combined with the revised residual sequence to obtain the final output of the combined model.
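The following is a minimal, self-contained Python sketch of this combination: a GM(1,1) gray model is fitted, and its residual sequence is revised with a small neural network; the input series and network size are hypothetical assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical short series of annual case counts.
x0 = np.array([102.0, 108.0, 117.0, 123.0, 131.0, 140.0])

# GM(1,1): estimate parameters a, b on the accumulated (cumulative-sum) series.
x1 = np.cumsum(x0)
z1 = 0.5 * (x1[1:] + x1[:-1])                      # background values
B = np.column_stack([-z1, np.ones_like(z1)])
a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]

k = np.arange(len(x0) + 1)                         # indices 0..n (n = one step ahead)
x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a  # fitted accumulated series
x0_hat = np.diff(x1_hat, prepend=0.0)              # restore by differencing

# ANN residual correction: learn the fitting residuals over time and use the
# predicted next residual to revise the gray-model forecast.
resid = x0 - x0_hat[: len(x0)]
t = np.arange(len(x0)).reshape(-1, 1)
ann = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                   random_state=0).fit(t, resid)
forecast = x0_hat[-1] + ann.predict([[len(x0)]])[0]
print(round(float(forecast), 1))                   # combined one-step forecast
```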

Based on the results of the cluster analysis, gray model, and artificial neural network, an auxiliary decision-making model for clinical diseases is constructed using a decision tree algorithm. Building the tree from the data both establishes the classifier and provides a way to visualize the rules. Through analysis of classical tree pruning algorithms and in consideration of user needs, the descriptive parameters of the decision tree are adapted to different data mining sets, finally yielding an ideal decision tree model.
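A minimal Python sketch of the decision tree step follows, on hypothetical indicators and labels, with a depth limit standing in for pruning:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training data: (age, systolic blood pressure) -> risk label.
X = [[45, 118], [62, 150], [38, 110], [70, 165], [55, 140], [30, 105]]
y = [0, 1, 0, 1, 1, 0]

# max_depth limits tree growth, a simple stand-in for classical pruning.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "systolic_bp"]))  # visualize rules
print(tree.predict([[58, 145]]))   # support a decision for a new patient
```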

3 Clinical model construction

Based on the existing massive clinical data, the standardized information structures of common diseases are established, a platform-based and centralized big data-gathering method is formulated, and visual preprocessing tools are provided for data from specialist and special disease clinics, including functions for data classification, cleaning, statistics, and basic analysis. On this basis, big data analysis techniques, such as Hadoop, can be used to construct early prediction and intervention models as well as a clinical decision-making model for specialist and special disease clinics.

Research on evaluation methods for traditional Chinese medicine (TCM) has focused on “disease-symptom-prescription-efficacy” and on the difficulty of developing the theoretical system of TCM. For a long time, the index system of western medicine has been adopted to measure the efficacy of TCM in order to illustrate the effectiveness and scientific nature of TCM clinics and research. However, characteristic features of the theoretical system, such as the holistic view of Chinese medicine and syndrome differentiation and treatment, have been ignored. Unlike western clinical studies that emphasize cause and effect, TCM research is more inclined toward correlation analysis. Theoretical research on traditional TCM features insufficient acquisition of correlation factors, a predominance of qualitative indicators, and individualization, and it differs greatly from the methods of evidence-based medicine.

Big data analysis techniques are suitable for collecting and processing massive complex data and for compensating for insufficient data accuracy by determining data correlations. This research model can carry out cross-correlation and correlation analysis with a data-driven method, create a business model with a demand-driven method, and conduct data validation. Therefore, it can develop a new model for common clinical research on specialist and special disease clinics.

Through the establishment of this model, business and data are linked to form a system platform whose results and conclusions reflect clinical and scientific research needs. Through data-driven and demand-driven methods, information such as living habits, clinical symptoms, medical histories, physical examinations in Chinese and western medicine, auxiliary examination indexes, western medicine diagnoses, and TCM syndromes can be collected to establish data association relationships, conduct data analysis, and build clinical prediction, clinical intervention, and clinical decision support models.

Currently, studies are being carried out to analyze the association between climate factors and diseases using big data methods and techniques. Outpatient attendance in various departments may be associated with seasonal variation, so specialist staff and outpatient clinics should be allocated accordingly.
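Such an association study can be sketched minimally in Python by correlating two hypothetical monthly series, mean temperature and respiratory outpatient visits:

```python
import pandas as pd

# Hypothetical monthly series: mean temperature and respiratory outpatient
# visits for one year (both invented for illustration).
df = pd.DataFrame({
    "mean_temp_c": [2, 4, 10, 16, 22, 27, 30, 29, 24, 17, 9, 3],
    "resp_visits": [980, 940, 860, 700, 620, 560, 540, 555, 640, 760, 890, 960],
})
print(df["mean_temp_c"].corr(df["resp_visits"]))  # Pearson r, near -1 here
```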

4 Summary

Overall, a big data-based processing and analysis platform for specialist and special disease clinics combines big data technologies with relevant standards, such as treatment and intervention guidelines for various diseases; considers clinical users’ needs through the practice of the platform; addresses the problems of scattered medical information, insufficient information mining, and the loose coupling of clinical services with information technology; improves the treatment of diseases; and enhances the utilization efficiency of information resources.

This platform for specialist and special disease clinics is designed to maximize the use of existing information and mine its value, thus finding the relevance of different factors in the life cycle of a disease, providing a basis for scientific research and clinical practice, ensuring medical quality and safety, enhancing the overall quality of medical services, improving healthcare access, reducing healthcare costs, and decreasing medical risks.

References

[1] Douglas L. The Importance of “Big Data”: A Definition. Gartner. Retrieved 2012-06-21

[2] Liu BY. A navigational chart for contemporary traditional Chinese medicine to be drawn based on big data. China News Tradit Chin Med (Zhongguo Zhong Yi Yao Bao) 2013-5-3 (in Chinese)

[3] Liu L. Biomedicine of the big data age. Commun CCF (Zhongguo Ji Suan Ji Xue Hui Tong Xun) 2013; 91(9): 16–19 (in Chinese)

[4] Wang WQ, Krishnan E. Big data and clinicians: a review on the state of the science. JMIR Med Inform 2014; 2(1): 1–11

[5] Borthakur D. The Hadoop Distributed File System: Architecture and Design. Last published 2008-10-17

[6] Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM 2008; 51(1): 107–113

[7] Gong QY, Li X, Liu D, Fan XE. Application research of ETL technology in public health data sharing. Chin J Public Health Eng (Zhongguo Wei Sheng Gong Cheng Xue) 2009; 8(1): 54–56 (in Chinese)

[8] Wang KL, Wang L, Wang PL, Song B. Research on ETL technology in data warehouses and its practice. Comput Appl Softw (Ji Suan Ji Ying Yong Yu Ruan Jian) 2005; 22(11): 30–31, 78 (in Chinese)

[9] Xia HS. Data Warehouse and Data Mining Technology. Beijing: Science Press, 2004: 165 (in Chinese)

[10] Deng JL. Grey Theory. Wuhan: Huazhong University of Science and Technology Press, 2002 (in Chinese)

[11] Han LQ. Artificial Neural Network Tutorial. Beijing: Beijing University of Posts and Telecommunications Press, 2007 (in Chinese)

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg
