A review of systematic evaluation and improvement in the big data environment

Feng YANG, Manman WANG

REVIEW ARTICLE

Front. Eng, 2020, 7(1): 27–46. DOI: 10.1007/s42524-020-0092-6

Abstract

The era of big data brings unprecedented opportunities and challenges to management research. As one of the important functions of management decision-making, evaluation has been given more functions and a broader application space. Exploring evaluation methods applicable to the big data environment has therefore become an important research subject. The purpose of this paper is to provide an overview and discussion of systematic evaluation and improvement in the big data environment. We first review the evaluation methods based on the main analytic techniques of big data, such as data mining, statistical methods, optimization and simulation, and deep learning. Focusing on the characteristics of big data (association features, data loss, data noise, and visualization), we then present the relevant evaluation methods. Furthermore, we explore studies of systematic improvement and their application fields. Finally, we analyze the new application areas of evaluation methods and outline future directions for evaluation method research in a big data environment from six aspects. We hope our research provides meaningful insights for subsequent work.

Keywords

big data / evaluation methods / systematic improvement / big data analytic techniques / data mining

Cite this article

Feng YANG, Manman WANG. A review of systematic evaluation and improvement in the big data environment. Front. Eng, 2020, 7(1): 27–46. DOI: 10.1007/s42524-020-0092-6


Introduction

Background

With the development of the Internet of Things (IoTs), cloud computing, wearable devices, and social media, big data has become ubiquitous. At present, the global data volume is growing exponentially. Advancements in technologies such as cluster computing and cloud computing have made the storage, analysis, sharing, and distribution of big data easier and cheaper (Wani and Ashtankar, 2017). Many companies have realized the huge business value contained in big data and combined their respective advantages to use it. For example, Amazon analyzes customers’ previous orders, shopping cart information, browsing history, store collections, and other information to predict whether a customer will order, so that goods can be shipped in advance to areas near potential customers and shipping time is reduced. ZARA promptly feeds a large amount of customer feedback collected from its online store, together with experience information from customers of its offline brick-and-mortar stores, back to its designers and production departments to adjust product styles and output. Using big data technology to support management decisions has become a hot topic in current management research.

As one of the most important functions of management, evaluation is a process of measuring the attributes of things and drawing reliable conclusions through certain data and methods, based on given goals and standards. In a big data environment, data have the distinctive features of being “massive, high dimensional, heterogeneous, unstructured, incomplete, noisy, and erroneous”. Traditional evaluation methods are suited to limited static sample data and have difficulty adapting to big data with high-frequency dynamic changes. Nevertheless, evaluation retains positive value. First, evaluation can directly support management decisions, such as organizational performance evaluation, employee performance evaluation, asset value evaluation, academic evaluation, qualification evaluation, and credit rating. Second, evaluation can provide information support for other methods and indirectly assist in complex management decisions, such as risk assessment for insurance pricing decisions, reliability assessment for equipment management, customer perceived value assessment for marketing decisions, and health assessment for diagnosis and treatment decisions. Therefore, evaluation methods in a big data environment urgently need to be studied.

Furthermore, evaluation is widely used in the improvement of various systems, for example, the performance evaluation of supply chain systems, the strategic analysis of inventory systems, the management of risk systems, sales forecasting in marketing systems, ranking and recommendation on online platforms, assessments of disease diagnosis and treatment in medical systems, road prediction in transportation systems, and safety assessment of fire protection systems. Combining big data analysis technology allows a more scientific understanding of system evaluation, thereby improving the system and creating greater benefits. There are still difficulties and challenges in applying system evaluation and decision making in the big data environment. Taking advantage of the development opportunities brought by the big data environment, studying system evaluation work in depth, and exploring system evaluation and improvement methods in the big data environment therefore have important theoretical and practical significance.

At present, comprehensive analysis and research on evaluation methods and systematic improvement in a big data environment are lacking. This article reviews systematic evaluation methods and improvement research in the big data environment from four aspects. First, we explore and review the evaluation methods based on big data analytics techniques. Second, focusing on different data characteristics, we review the evaluation methods based on the characteristics of big data. Third, we focus on the improvement methods of different systems in the big data environment. Fourth, we present the application areas of assessment activities in the big data environment. Finally, we summarize the new application areas of evaluation methods in the big data environment and the future research directions of evaluation methods. Through the research in this article, we hope to break through the limitations of traditional evaluation, gain a clear understanding of the current state of system evaluation and improvement research in the big data environment, and provide a reference for future researchers.

Literature review search strategies

A search within the timeframe ranging from 2011 to 2019 was considered to represent the period covering the emergence of “big data”. We first focus on searching via the following databases: Informs, Elsevier, Taylor and Francis, Wiley, and IEEE Xplore. We use major keywords such as “big data, data-driven, data analytic techniques, evaluation methods, assess, assessment, systematic improvement, data mining, statistics, optimization, simulation, deep learning, association rules, missing data, data loss, data noise, visualization, systematic assessment” to search related literature, supplemented by secondary keywords such as “clustering, classification, operations, supply chain management, inventory management, risk assessment, risk analysis, marketing, forecast, system”. Based on our experience in the field and related research in the literature, we also include some additional papers. We focus on a list of journals that are considered to be the leading journals in the operations and management fields: Management Science, Journal of Operations Management, Operations Research, Production and Operations Management, Decision Sciences, Manufacturing and Service Operations Management, Marketing Science, and Information Systems Research. The search results show that evaluation methods are widely used in business value, retail forecasting, risk analysis, inventory management, transportation route planning, consumer preferences, product categories, and disease forecasting and assessment.

Our search started on November 13, 2019, and ended on December 25, 2019. Regarding the literature retrieval process, we take the literature review in Section 2.1 as an example. Specifically, we first focused on searching via Informs advanced search using major keywords such as “big data, data mining”, supplemented by secondary keywords like “clustering, classification, association rule”. For example, when we input the primary keyword “data mining” and the timeframe ranging from 2011 to 2019, the initial search returned 32 articles. These references, including the abstracts of all articles, were downloaded into EndNote, a reference management software package, for further analysis. A co-author then screened the abstract of each article to assess its relevance to our research goals. We obtained the related literature (Lutu and Engelbrecht, 2013; Bai et al., 2015; Das et al., 2016; Liu et al., 2016b). Afterward, the secondary keywords “clustering, classification, association rule” were input, and several additional papers were retained (Jagabathula et al., 2018; Kopcso and Pachamanova, 2018; Roy et al., 2019). Subsequently, supplemented by Google Scholar searches, we found further relevant references (Hastie et al., 2005; Choi et al., 2018; Sato et al., 2019). At the end of this process, 11 articles were deemed relevant for evaluation methods based on data mining. The literature search process in other parts is similar; the only differences are the search platform and the journals of interest. For example, Section 3 used Google Scholar more, and in Sections 4 and 5, more articles were retrieved from Taylor and Francis and Wiley.

Evaluation methods based on big data analytics techniques

Big data processing techniques differ when it comes to data from different sources. For example, data can come from mobile devices, the web, social media, and cloud platforms, and their formats can be text, graphics, images, and videos. Therefore, there are terms such as text analytics, web analytics, social analytics, and multimedia analytics (Hu et al., 2014). Big data techniques involve a number of disciplines, including statistics, data mining, machine learning, neural networks, optimization methods, and visualization approaches (Chen and Zhang, 2014). There are many specific techniques in these disciplines, and they overlap with each other to some extent. Traditionally, data mining and deep learning techniques are widely used in forecasting, revenue management, marketing, and risk analysis. Statistical methods are powerful tools for business intelligence analytics (Sivarajah et al., 2017). Optimization and simulation are widely accepted tools to improve system performance, applied to inventory management and supply chain analytics (Wang et al., 2016). We focus on the application of big data analytic techniques in business intelligence, supply chain management, and operations management. Following Choi et al. (2018), we review the evaluation methods in the big data environment based on four analytic techniques: data mining, statistics, optimization and simulation, and deep learning.

Evaluation methods based on data mining

Data mining extracts information from a data set and transforms it into an understandable structure for further use. Commonly used data mining methods include classification, clustering, association rule analysis, estimation, and prediction (Hastie et al., 2005). Data mining plays an important role in business intelligence and big data analytics (Choi et al., 2018).
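
As a rough illustration of the clustering and classification techniques named above, the sketch below applies scikit-learn to synthetic data; the feature meanings, cluster count, and model settings are our own illustrative assumptions rather than anything used in the cited studies.

```python
# Minimal sketch (synthetic data, assumed parameters): clustering customers by
# spending-style features, then training a classifier to predict a churn-like label.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))              # e.g., spend, frequency, recency (illustrative)
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # synthetic binary label

# Clustering: group observations into k segments.
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Classification: predict the label from the features.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("segments found:", np.unique(segments).size)
print("held-out accuracy:", clf.score(X_te, y_te))
```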

Focusing on the problem of product assortment, Bai et al. (2015) present a method for assortment planning and optimization across multiple stores that can identify the optimal product classification for each store and allow analyses of classification efficiency in all existing stores. Their methodology offers solutions on product assortment for complements versus substitutes and conducts sales efficiency evaluation and assortment optimization. Considering the potential values from online social platforms, such as Google Trends, Twitter assessments, IMDb (Internet Movie Database) reviews, Wikipedia views, and Huffington Post news, Liu et al. (2016b) conduct massive analyses of nearly 2 billion Tweets and 400 billion Wikipedia pages and conclude that extracting and sorting information from online platforms can reflect consumer intent in a timely manner, which has critical implications for forecasting purchases. Kopcso and Pachamanova (2018) design an example in which predictive analytics is used to determine the input to a customer service specification model. They then illustrate how to calculate business value for business stakeholders. They evaluate the level of an organization’s maturity by using a predictive and prescriptive analytics model. Jagabathula et al. (2018) develop a method based on the embedding technique that takes a customer’s observations and the probability classes generating the observations as inputs and outputs the embedded results for each customer. They show that this method outperforms empirical Bayesian, standard latent class, and demographic-based techniques.

In addition, the evaluation methods based on data mining include quantitative attributes based on text mining (Das et al., 2016), positive-versus-negative (pVn) classification (Lutu and Engelbrecht, 2013), a method for evaluating different platforms (Roy et al., 2019), and knowledge discovery tool (Sato et al., 2019). The relevant evaluation methods in this section are reviewed in Table 1.

Evaluation methods based on statistics

Statistics, as the most fundamental technique for data analysis, exists in almost all subject areas of research. The statistical methods commonly used in management evaluation include regression analysis, maximum likelihood estimation, Bayesian estimation, and Markov stochastic processes. With the maturing of big data analysis technology, evaluation work based on statistical methods is also increasing.

To improve the accuracy of sales predictions in the tire industry, Sagaert et al. (2018) propose a forecasting method that can automate the identification of key leading indicators, helping to generate accurate forecasts. In their case study, the accuracy of the proposed method improves by 16.1%. Furthermore, this method can also handle external indicators of short-term and long-term dynamics. In risk analysis, Jiang et al. (2019) propose a logistic regression model that uses data generated in past simulation experiments to estimate portfolio risk and classify portfolio risk levels in real time. They show that the simulation analytics idea is viable and promising in the field of financial risk management. Chehrazi and Weber (2015) construct a dynamic collectability score (DCS) that can be used to estimate the repayment probability of delinquent credit-card accounts. The DCS framework is applied to a large set of account-level repayment data. Compared to standard bank-internal scoring methods, the DCS framework achieves significant improvements in classification and prediction performance. Ansari et al. (2018) use a new stochastic variational Bayesian (SVB) approach to estimate movie ratings and semantic tags from a large data set; the approach is very useful in actual recommendation contexts. Other evaluation methods are used in transport risk management (Shang et al., 2017), ranking and selection (Salemi et al., 2019), audit quality (DeFond et al., 2017), project portfolio optimization (Yang et al., 2015), and the stock-keeping unit (SKU)-clustering problem (Park et al., 2017). The relevant evaluation methods are shown in Table 2.
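
As a minimal sketch of the regression-based risk classification idea discussed above (not the actual model of Jiang et al. (2019)), the following assumes synthetic portfolio features and a binary high/low risk label.

```python
# Minimal sketch (assumed synthetic features): a logistic regression that maps
# portfolio summary statistics to a high/low risk label, in the spirit of the
# regression-based risk classification discussed above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
features = rng.normal(size=(1000, 4))        # e.g., volatility, exposure, leverage, past losses
risk_label = (features @ np.array([1.2, 0.8, 0.5, 0.3])
              + rng.normal(size=1000) > 0).astype(int)

model = LogisticRegression().fit(features, risk_label)
new_portfolio = rng.normal(size=(1, 4))
print("P(high risk) =", model.predict_proba(new_portfolio)[0, 1])
```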

Evaluation methods based on optimization and simulation

The evaluation of algorithms is mainly considered in terms of time complexity and space complexity. The optimization algorithms commonly used in management evaluation mainly include the gradient descent method, the simulated annealing method, the Newton method, and quasi-Newton methods (Simon, 2013).
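
To make the gradient descent method concrete, the following minimal sketch minimizes a least-squares objective on synthetic data; the step size and iteration count are illustrative assumptions.

```python
# Minimal sketch: plain gradient descent minimizing a least-squares objective
# f(w) = ||Xw - y||^2 / n; the data, step size, and iteration count are illustrative.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)
step = 0.1
for _ in range(500):
    grad = 2.0 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
    w -= step * grad

print("estimated weights:", np.round(w, 3))
```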

Considering the problem of travel time and routing plans, Bertsimas et al. (2019a) leverage a simple approach to solve the travel time estimation and route planning problem in a real-world setting. Given travel times for any number of origin-destination pairs, the method can estimate the travel time and provide a sensible path associated with it. Their algorithm is robust against high input uncertainty and can successfully exploit noisy data to provide accurate results. Based on flow procedures, Hochbaum (2018) proposes a combinatorial method that solves the classification problem efficiently as a network flow problem on a graph, achieving higher accuracy and shorter running times in pattern recognition, image segmentation, and general data mining. Focusing on the design of mechanisms for a sequencing problem, Hoeksma and Uetz (2016) combine an exponential-size linear program with a convex decomposition algorithm to find optimal linear programming solutions. Increasing the integration of local generators is a challenge in the planning, design, and operation of distribution systems. Naghdi et al. (2018) present a quasi-Newton trust-region algorithm to evaluate the planning, design, and operation of the distribution system. Two networks were used for testing, and the obtained results revealed the accuracy and validity of the proposed method. Huang et al. (2019) develop a novel two-stage data-analytic method that can serve as a template for modeling customer-firm interactions. The application of the new method can improve decision making in real time. Their paper is one of the first studies to examine the evolution of player participation based on motivational factors using observational data.

Other evaluation methods based on optimization algorithms and simulation include stochastic annealing (Ball et al., 2018), reserving relief supplies for earthquake (Yang et al., 2016b), the routing optimization algorithm (Bertsimas et al., 2019b), and evaluation of recommender systems (Adomavicius and Zhang, 2016). The relevant evaluation methods are reviewed in Table 3.

Evaluation methods based on deep learning

Deep learning is an algorithm family within machine learning based on representation learning of data. Typical deep learning models include the convolutional neural network (CNN), the deep neural network (DNN), the long short-term memory (LSTM) network, and others. Deep learning also brings many development opportunities for evaluation work.
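
As a hedged illustration of the LSTM models mentioned above, the sketch below fits a small network to synthetic daily-sales-style sequences using tf.keras; the window length, layer sizes, and data are assumptions made purely for illustration.

```python
# Minimal sketch (synthetic data, assumed architecture): an LSTM mapping a short
# window of past values to a next-step forecast, illustrating the sequence models
# referenced above. Window length and layer sizes are illustrative only.
import numpy as np
import tensorflow as tf

window, n_features = 14, 1
X = np.random.rand(256, window, n_features).astype("float32")  # synthetic sequences
y = X[:, -1, 0] * 0.8 + 0.1                                     # synthetic target

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window, n_features)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:1]))
```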

To explore the nature of intertemporal cross-product patterns, Xia et al. (2019) propose a conditional restricted Boltzmann machines (CRBM)-based model for an enormous consumer purchase data set. By using the proposed model, retailers can potentially capture and predict each consumer’s complex shopping patterns with greater accuracy for personalized marketing. Focusing on the quality assessment of Wikipedia, Wang and Li (2019) select state-of-the-art deep-learning models to conduct quality evaluation in terms of classification performance and training performance and validate the effectiveness of the proposed model.

Considering financial risk, Borovkova and Tsiamas (2019) propose a long short-term memory (LSTM) neural network for intraday stock predictions using many technical analysis indicators. They evaluate the predictive power of their model on several US large-cap stocks and find that the proposed model performs better than the benchmark models and equally weighted ensembles. Accurate prediction of forex rates is an essential element of an effective response to hedging or speculation strategies in the forex market. Galeshchuk and Mukherjee (2017) explore the ability of deep convolutional neural networks to forecast the direction of forex rate changes, finding that trained deep networks can achieve satisfactory prediction accuracy. In addition, a hybrid architecture based on deep learning (Amorin et al., 2019), the impact of personality similarity on subsequent purchases (Adamopoulos et al., 2018), and a deep-learning approach to identify customers’ needs (Timoshenko and Hauser, 2019) are studied. The relevant evaluation methods are shown in Table 4.

Evaluation methods based on big data characteristics

Big data are huge and complex and are difficult to process using traditional data-warehousing tools (Kalbandi and Anuradha, 2015). Organizations often collect data from internal and external sources. Internal sources usually provide data related to internal operations and business processes, while external sources provide data from suppliers, retailers, customers, and market information (Geczy, 2014). Data from different sources are often interconnected. In this paper, the characteristics of big data that we focus on include the association features between data, data quality (data loss and data noise), and visualization.

Evaluation methods based on data association features

Association rules are a way to discover hidden relationships between variables in large databases (Agrawal et al., 1993). Traditionally, input data sets can come from mobile devices, the web, social media, and cloud platforms, and their formats can be text, graphics, images, and videos. The premise of systematic evaluation and improvement in a big data environment is to understand the features of the input data sets. If the features of the input data set are obvious, they can be modeled and analyzed by traditional statistical and econometric methods (Chen and Zhang, 2014). However, it is often not obvious which features should be used as inputs. There are many methods for data feature selection (Bennasar et al., 2015; Abedinia et al., 2017; Ambusaidi et al., 2016). Association rules are widely used to discover hidden patterns from large databases and find interesting knowledge and information, which is useful for handling nominal features and constructing full-fledged models. Cang and Yu (2012) develop a fitness function based on association rules, which has been shown to be effective for input feature selection and systematically improves the generalization ability of the evolved model. Based on data association features, the management field has also carried out a large number of evaluation activities.
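
To make the association rule idea concrete, the following sketch mines frequent itemsets and rules from a toy transaction table; the mlxtend package and the support and confidence thresholds are our own choices, not tools used by the cited studies.

```python
# Minimal sketch (toy transactions, assumed thresholds): mining association rules
# with the Apriori algorithm via the mlxtend package.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded transactions: each row is a basket, each column an item.
baskets = pd.DataFrame(
    [[1, 1, 0, 1],
     [1, 1, 1, 0],
     [0, 1, 1, 0],
     [1, 1, 0, 1],
     [1, 0, 1, 1]],
    columns=["milk", "bread", "butter", "eggs"],
).astype(bool)

itemsets = apriori(baskets, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```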

Focusing on the customer churn problem in retail sales, Aung et al. (2019) apply the FP (Frequent Pattern)-Growth method to a customer churn data set. They develop a customer churn prediction model to help retail companies make decisions on estimating the loss of clients or planning promotion activities. Zhang et al. (2019) propose an improved method based on association rules to evaluate energy efficiency, and they show that the proposed approach is effective in outlier identification and data transformation. Considering the large amount of associated data in the audit business, Parkinson et al. (2016) develop a novel method of modeling file system permissions to evaluate auditing efficiency. Their method can correctly identify irregularities with an average accuracy of 91%, minimizing the reliance on expert knowledge.

Association rule-based evaluation is also used in many other fields, including risk management (Bhatia, 2019), anomaly detection in electric vehicle data (Wang and Wu, 2019), and bioinformatics (Boudellioua et al., 2016). Besides, some studies focus on developing new mining methods for association rules (Feng et al., 2016; Czibula et al., 2019). The relevant evaluation methods are shown in Table 5.

Evaluation methods considering data loss

Due to technical, human, and user privacy reasons, there are often large amounts of missing data. The lack of data hinders the subsequent analysis of big data and then affects decision-making. Soley-Bori (2013) presents basic concepts and methods for dealing with missing data. After explaining missing data mechanisms and missing patterns, the author reviews some main conventional methods in data analytics, including imputation methods, multiple imputation, maximum likelihood, Bayesian methods, and listwise deletion. Advantages and limitations are listed so that the reader can identify the main trade-offs when using each method. Likewise, Graham et al. (2012) review the methods for handling missing data from the view of psychology. Little and Rubin (2019) review some methods for handling missing data, including imputation, multiple imputation, and maximum likelihood.
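
A minimal sketch of two of the conventional strategies listed above, mean imputation and nearest-neighbor imputation, alongside listwise deletion, on a toy matrix with missing entries; the data and parameter choices are illustrative only.

```python
# Minimal sketch (toy matrix with missing values): conventional imputation
# strategies and listwise deletion, using scikit-learn imputers.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [5.0, 4.0, 9.0],
              [np.nan, 8.0, 12.0]])

mean_filled = SimpleImputer(strategy="mean").fit_transform(X)   # mean imputation
knn_filled = KNNImputer(n_neighbors=2).fit_transform(X)         # nearest-neighbor imputation
complete_rows = X[~np.isnan(X).any(axis=1)]                     # listwise deletion

print(mean_filled)
print(knn_filled)
print(complete_rows)
```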

Clinical data sets often suffer from high missingness, which seriously impacts the diagnosis and prediction of disease. Imputing missing values provides an opportunity to resolve the issue. Based on imputation methods, Wu et al. (2019b) propose a machine learning method that can improve the quality of breast cancer datasets. The results reveal that the proposed method gains strong robustness and discriminant power even when the data set has a high missing rate (>50%). Relevant information is often lost when high-frequency traffic collision data are aggregated to a lower frequency. Li et al. (2019) propose a vector auto-regression (VAR) approach to evaluate traffic collisions and show that the proposed VAR demonstrates better performance than other missing value imputation techniques. In evaluating the impacts of products and processes, Moreau et al. (2012) develop a statistical approach to carry out a life cycle assessment, showing how to handle missing data on material and energy flows when evaluating hydropower plants. Jia and Wu (2019) use Monte Carlo simulation to assess five methods for dealing with data loss and show that robust full information maximum likelihood (RFIML) and MI-LV (multiple imputation-latent variable) combined with cat-DWLS (diagonally weighted least squares) appear to be the best methods. In addition, evaluation activities based on missing data are also used in many other fields, including clinical endpoint bioequivalence (Lou et al., 2019), estimating men’s fertility (Dudel and Klüsener, 2018), and power systems (Yang et al., 2020). The relevant evaluation methods are shown in Table 6.

Evaluation methods considering data noise

Various interference factors often arise in the data collection process, which introduces noise into the original data we obtain (Ilow and Hatzinakos, 1998). Noise can affect data analysis to varying degrees. Many big data analysis techniques use algorithmic iteration to obtain the optimal solution. If the data set contains a lot of noisy data, the convergence speed of the algorithms and the accuracy of the data analysis will be greatly affected.

Recent evaluation work based on noisy data has also made some progress. Principal component analysis (PCA) is one of the most powerful dimension reduction techniques, but the principal components remain contaminated by noise in the data. Rezghi and Obulkasim (2014) propose a noise-free PCA (NFPCA) method that introduces regularization to mitigate the effect of noise, and they show that NFPCA produces more informative results than ordinary PCA at a lower computational cost. Leveraging the encoder-decoder framework for neural machine translation, Zoph et al. (2016) propose a transfer learning method to assess the performance of machine translation. The results show that the transfer learning model can improve syntax-based machine translation by an average of 1.3 BLEU (Bilingual Evaluation Understudy) points. Van Vliet and Salmelin (2020) present a framework that decomposes the weight matrix of a fitted linear model into three subcomponents and develop a post-hoc modification of linear models. They show that the decoding accuracy of two example linear models can be boosted by incorporating this information. Furthermore, the relevant evaluation methods include a self-organizing incremental neural network approach (Wiwatcharakoses and Berrar, 2019), a new modeling and two-dimensional mapping approach (Ball et al., 2018), a Bayesian approach to online robust parameter design (Huang et al., 2017), and an evaluation of typical flow cytometry (Cao and Grima, 2019). The relevant evaluation methods are reviewed in Table 7.
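
The sketch below shows ordinary PCA used as a simple denoiser, projecting noisy observations onto the leading components and reconstructing them; this is plain PCA on synthetic data, not the NFPCA method of Rezghi and Obulkasim (2014).

```python
# Minimal sketch (synthetic data): ordinary PCA used for denoising by projecting
# onto the leading components and reconstructing. This is plain PCA, not NFPCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
latent = rng.normal(size=(300, 2))                  # low-dimensional signal
loadings = rng.normal(size=(2, 10))
clean = latent @ loadings                           # 10-dimensional observations
noisy = clean + 0.5 * rng.normal(size=clean.shape)  # additive noise

pca = PCA(n_components=2).fit(noisy)
denoised = pca.inverse_transform(pca.transform(noisy))
print("RMSE before:", np.sqrt(np.mean((noisy - clean) ** 2)))
print("RMSE after: ", np.sqrt(np.mean((denoised - clean) ** 2)))
```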

Evaluation methods based on visualization (convert big data into small data)

The term “big data” is related to machines, while “small data” is related to people. Small data are data that are accessible and processable in capacity and format, contain useful information, and are understandable by humans (Kitchin and Lauriault, 2015). A common way to transform big data into small data is to visualize them, for example with histograms, violin plots, heat-maps, and scatter plots (Gatto et al., 2015).
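
As a small illustration of turning big data into small, human-readable data, the sketch below condenses a million synthetic points into a histogram and a two-dimensional density heat-map with matplotlib; the data and figure settings are assumptions for illustration.

```python
# Minimal sketch (synthetic data): summarizing a large sample into small,
# human-readable visual forms, a histogram and a 2-D density heat-map.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)
y = 0.6 * x + 0.8 * rng.normal(size=1_000_000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.hist(x, bins=50)                      # distribution summary
ax1.set_title("Histogram of x")
ax2.hist2d(x, y, bins=100)                # density heat-map of a million points
ax2.set_title("2-D density of (x, y)")
plt.tight_layout()
plt.savefig("summary.png")                # small, shareable artifact
```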

With the improvement of data availability, using big data to help managers make scientific decisions has become a trend. France and Ghose (2016) introduce a statistical likelihood method to evaluate and visualize submarkets in product categories. A series of experiments shows that their method is better at identifying market structure than the other methods described. Likewise, Ringel and Skiera (2016) also focus on visualizing competitive structure in large markets. They integrate large-scale data into a new modeling and two-dimensional mapping method, enabling users to visualize asymmetric competition in large markets and identify different submarkets. An empirical application to the LED-TV market with 1124 products and 56 brands yielded valid and useful insights and showed that their method outperforms traditional models. Aiming to establish a new approach to identifying cultivars of Chrysanthemi Flos (CF), Nie et al. (2019a) develop a multimodal quantitative method combining principal component analysis (PCA) and a similarity evaluation system (SES) to identify four cultivars of CF. The results show that the comprehensive method is effective. Furthermore, other evaluation methods based on data visualization can be seen in Nie et al. (2019b) and Rajwan et al. (2013). The relevant evaluation methods are shown in Table 8.

Systematic improvement methods in the big data environment

With the rapid development of big data technology, massive data information can be tracked, collected, and utilized. The analytics techniques can provide new knowledge on their own without human intervention, helping decision-makers understand and predict consumer behavior (Dhar, 2013). Studying the improvement methods of different application systems is of great significance to promote system evaluation and optimization. In this section, we focus on systematic improvement methods under big data processing techniques and information characteristics and review some hot operations management problems.

Systematic improvement methods based on big data processing

With the widespread application of smart service systems in the home, transportation, energy, and healthcare sectors, Lim and Maglio (2018) combine metrics and machine learning algorithms to preprocess and analyze text data from smart service systems. Based on an analysis of 5378 scientific articles and 1234 news articles, they establish a common evaluation ground for understanding modern service systems. Noting that current ranking algorithms on social media platforms ignore consumers’ multidimensional preferences for products, Ghose et al. (2012) propose a new method to improve hotel ranking systems. Using a real data set from a website, they show that qualitative comments are the first step in text mining. Liu et al. (2016b) conduct an extensive analysis of nearly 2 billion Tweets and 400 billion Wikipedia pages, concluding that extracting and sorting the information on online platforms can provide a timely representation of consumer intentions, which has important implications for forecasting purchases.

Statistics is a mature area whose purpose is to provide a scientific framework for data collection, analysis, and conclusion drawing (Choi et al., 2018). Distelhorst et al. (2017) analyze an intervention by Nike to promote the adoption of lean manufacturing in its apparel supply chain across 11 developing countries. Estimating from a panel of more than 300 factories, they find that lean manufacturing and high-involvement work practices can improve social performance. Ramasubbu and Kemerer (2016) analyze the impact of technical debt on system reliability by utilizing a large-scale longitudinal data set. Their empirical results illustrate how firms could evaluate business risk exposure due to technical debt accumulation, and they also assess the estimated net effects. Bai et al. (2012) provide an effective and viable means for managing the risk associated with data quality in accounting information systems. Compared to previous approaches to data quality risks, their methodology is more cost effective and easier to implement.

As standard analytical approaches, optimization and simulation can obtain optimal (or near-optimal) solutions in quantitative decision-making problems. Naghdi et al. (2018) present a quasi-Newton trust-region algorithm to evaluate the planning, design, and operation of the distribution system. Two networks were used for testing, and the obtained results revealed the accuracy and validity of the proposed method. Ansari et al. (2018) develop a new stochastic variational Bayesian (SVB) approach for scalable estimation and use it to estimate movie ratings and semantic tags from a large data set. Their approach is very useful in actual recommendation contexts. Focusing on the transportation system, Buijs et al. (2016) propose an evaluation method to structure and improve Fritom’s existing collaborative transport planning process.

Deep learning models mainly include neural networks. Sun and Vasarhelyi (2018) develop a deep neural network (DNN) that can assess the risk of credit card defaults based on personality characteristics and spending behaviors. Compared with other machine-learning algorithms, the DNN has a higher F score and better overall predictive performance. Flight delays are another common problem in the transportation field, seriously affecting the travel experience. Chung et al. (2017) use a large data set from a major Hong Kong airline to analyze flight delays at 112 airports around the world. They leverage a cascading neural network to improve flight schedule prediction and then apply it to the crew optimization problem. The new method improves the accuracy of flight delay prediction, which greatly improves crew matching performance.

Systematic improvement based on data processing technology usually involves the secondary processing of large amounts of data, which offers management insights. The improvement of systems based on big data processing technology requires more in-depth discussion to dig out more valuable information. The relevant systematic improvement research is shown in Table 9.

Systematic improvement method based on information characteristics

As an abstract and intangible special resource, information has a corresponding use-value, and it can meet people’s needs in some aspects. Reasonable extraction of information can effectively improve decision-making (Adnan and Akbar, 2019). In a big data environment, information sharing, disclosure, and security issues are particularly important. In this part, we focus on the systematic improvement research under the characteristics of information sharing, information disclosure, and information privacy.

Li and Gu (2019) propose an integrated approach for the support system that allows users to query data simultaneously from both relational Structured Query Language (SQL) systems and NoSQL (not only SQL) systems. The proposed approach can effectively reduce development complexity and improve development efficiency. Kishore et al. (2020) link call detail records (CDR) with an influenza-like illness (ILI) registry and evaluate the role that international travelers played in the introduction of epidemics (A/H1N1). Their method carries out a similar assessment of domestic airports, and the system efficiency is improved significantly. To explore how individuals’ valuations change in the presence of multiple privacy factors, Buckman et al. (2019) use an incentive-compatible mechanism to capture individuals’ willingness to accept disclosure. The results show that participants’ privacy valuations are largely unaffected by requiring the disclosure of personal identifying information, the information context, and the intended secondary use of the disclosed information.

In addition, to examine trends in academic research on personal information privacy, Choi et al. (2017a) extract 2356 documents published between 1972 and 2015 (up to August) from the Scopus database to carry out topic modeling. They show that the topics of algorithms, online social networks, and Facebook privacy have become promising. Moreover, the top journals pay more attention to both e-business and healthcare. The system operations based on information characteristics are shown in Table 10.

Supply chain management and operations

With the improvement of enterprise resource planning (ERP) software, capturing and storing data at different levels of operations has become easier. Companies also hope to achieve more efficient processes by analyzing these data. Big data analytics techniques are widely used in supply chain management and operations to make smarter decisions (Wamba et al., 2015). Key areas of operations management in a big data environment include supply chain and logistics management, inventory management, retail forecasting, and risk assessment.

Big data has a significant impact on supply chain management. Researchers have studied capacity sharing contracts where the demand is uncertain (Yang et al., 2017a). Wu et al. (2019a) examine the relationship between data analytics capabilities and innovation. The results suggest data analytics capabilities are most strongly associated with innovation. Firms might receive the most benefits from using data analytics if they historically focused on specific types of innovation. Newman et al. (2014) propose a parameter estimation routine for multinomial logit discrete choice models. Their method is computationally efficient and can easily incorporate price and other product attributes. They simulate the hotel industry data and demonstrate their method has superior computational performance over alternative estimation methods that are capable of estimating price effects. Focused on the sustainability of supply chain systems, Badiezadeh et al. (2018) propose a “double frontier network data envelopment analysis (DEA)” to assess the sustainability of supply chain systems in a big data environment. Their method can rank the sustainability scores of supply chains. Considering strategic customers, Yang et al. (2019c) investigate the impact of selling effort on the pricing decisions of the supply chain.

Inventory management is an important topic in operations management (OM). Considering the inventory control problem, Huang and van Mieghem (2014) develop a decision support model that can reduce inventory holding cost by 3% and back-ordering cost by 5%. Likewise, Bertsimas et al. (2016) explore the inventory control issue with a conditional stochastic optimization method in a big data environment. They analyze four years of inventory data from a retail network and obtain the optimal inventory management scheme for a retail company with multiple inventory and retail locations.

Retail forecasting has been another key area of research for OM, especially in multi-channel retail (Mehra et al., 2018). In traditional retail research, forecasts depend more on historical data, expert advice, and market information. In a big data environment, however, increasingly available information sources can potentially enhance prediction performance. In recommendation systems, some firms reduce consumer search workloads by using big data technologies (Dutta et al., 2017). Cui et al. (2018) propose a comprehensive system that predicts the total daily sales of an online men’s exclusive retailer through feature extraction and machine learning, finding that the system can improve forecast accuracy by utilizing social media inputs. They analyze data posted on Facebook (recording the company’s interactions with customers) to supplement operational data in generating sales forecasts. Furthermore, the service optimization of after-sales operations is also influenced by big data analytics. Boone et al. (2018) add customer search data to time series models to reduce out-of-sample forecast errors, finding that models augmented with data from Google Trends can improve sales forecast accuracy for an online specialty food and cookware retailer, especially across multiple products. Yang et al. (2014) use a DEA method to forecast the production abilities of recycling systems and show that the reuse level of water in China is still low.

Risk assessment, for both business operations and non-profit organizations, benefits from the advancements of big data techniques (Choi et al., 2017b). To assess the risk of rail failures, Jamshidi et al. (2017) develop a novel method to explore rail surface defects. They collect big data on track surface defects through intelligent image processing, including measurable lengths of these defects. Finally, they conduct a practical case study on Dutch railways and find that the railway fault assessment system they propose performs well. Considering the risk assessment of procuring infrastructure mega-projects, Chan et al. (2018) develop a fuzzy evaluation model and demonstrate the practicality of the risk evaluation model by analyzing the Hong Kong–Zhuhai–Macao Bridge project. Supply disruption affects the efficiency of the supply chain. Yang et al. (2019b) find that licensing strategies can effectively reduce the negative effects caused by the risk of interruption, improving the performance of the supply chain. The evaluation of environmental impact indicators for sustainable maritime transportation systems has also been studied (Lizzette et al., 2019). The relevant supply chain management and operations studies are shown in Table 11.

Application research of system evaluation in the big data environment

In this section, we discuss the applications of system evaluation in the big data environment in the medical industry, finance, business, information systems, transportation, and other areas. Figure 1 summarizes the key areas and future research directions.

Medical industry

Since big data can deal with massive data volume and variety at high velocity, it has the potential to create significant value in healthcare by improving outcomes while lowering costs. Data analytics plays an important role in improving the quality of care, increasing the efficiency of operational processes, predicting and planning responses to disease epidemics, and optimizing healthcare spending at all levels (Nambisan et al., 2015). Hence, we explore the applications of system evaluation on the medical industry from medical image informatics, healthcare management, and privacy concerns.

Medical image informatics can help doctors better diagnose disease. Insights derived from the analysis of patient data can help healthcare professionals better identify disease symptoms and predict the cause and incidence of disease, ultimately improving the overall quality of care (Genta and Sonnenberg, 2014). Nambisan et al. (2015) focus on social media communications to identify individuals suffering from depression. In the field of healthcare management, Hydari et al. (2018) analyze a new patient data set from the Pennsylvania Patient Safety Authority (PSA), finding that advanced electronic medical records (EMRs) can reduce patient safety events by 17.5% through reductions in medication errors, falls, and complication errors. Senot et al. (2016) examine the impact of combining conformance quality and experiential quality on hospitals’ readmissions and cost performance. They analyze six years of data from 3474 US acute care hospitals and show that combining the two decreases the likelihood of readmission for a patient.

A major problem facing information systems today is the protection of consumer privacy. Building Healthcare Information Exchange (HIE) frameworks containing security and privacy principles is necessary in the big data environment. Considering the privacy issues in the smart healthcare framework, Adjerid et al. (2016) explore the impact of different forms of privacy regulations and policies on HIE efforts. They find that privacy regulation alone negatively affects HIE efforts, while privacy regulations with incentives have positive impacts on HIE efforts. Some researchers investigate privacy-preserving data mining (PPDM). PPDM is a data mining approach that does not leak any data containing clients’ sensitive information (Xu et al., 2014).

Finance

Finance research in a big data environment has mainly focused on risk analysis. Major financial institutions have millions of credit card accounts. Because of the large scale of the data, these loan pools and the securities backed by them are computationally difficult to analyze. Allodi and Massacci (2017) develop a quantitative scheme to assess cybersecurity risk. Their scheme can give quantitative probability estimates to help deal with untargeted cyber-attacks against the organization. They use a real data set from a financial institution to verify the new scheme, finding that the new risk assessment scheme is effective. Focusing on a broad class of dynamic, discrete-time models of loan-level risk, Sirignano and Giesecke (2018) develop a numerical method for the analysis of large pools of loans as well as asset-backed securities backed by such pools. Applying their approach to a data set of over 25 million mortgages, they show that the accuracy and speed of the approximation are improved for a variety of pools with different risk profiles. Furthermore, big data analytics is also employed in various other topics such as the relationship between ratings of new issuances and the number of rating analysts (Jiang et al., 2018) and predictors of US real economic activity (Faccini et al., 2018).

Business

Akter and Wamba (2016) provide a systematic review of e-commerce in a big data environment. They show that data analytics can transform data into insights through dynamic processes and technology, providing more value for robust decision-making and business problem solutions for e-commerce companies. Growth areas of e-commerce research include advertising strategies and recommendation systems for online firms (Ghoshal et al., 2015). Web and mobile advertising have also been interesting areas of research. Mookerjee et al. (2017) set up a model to predict visitors’ clicks on online advertisements and propose an approach to managing online advertisements. The results show that both the click rate and the firm’s revenue increase. Yang et al. (2019a) develop a winner-take-all model to improve the total value of corporations. Their approach assesses a firm’s competitiveness and development potential, helping managers effectively avoid overcapacity.

Information systems (IS)

Information systems (IS) has become an interdisciplinary research area that combines computing techniques with big data from business practice (Agarwal and Dhar, 2014). Most research in the IS area focuses on improving the efficiency of business operations. Focusing on IS-marketing research, Ruths and Pfeffer (2014) analyze data from online social media to explore consumer behaviors and forecast events. In this direction, Qiu and Kumar (2017) design a randomized field experiment to examine the performance of prediction markets, finding that increasing the size of audiences and online endorsement often produce more precise predictions. The results suggest that predictions are more accurate when targeting people with intermediate abilities. Considering the pitfalls of social media, Kumar et al. (2018) summarize several collective behaviors of users on social media and describe them uniformly. They then propose a new hierarchical supervised-learning approach that can assess the likelihood of finding anomalies in online reviews. The results suggest that it is difficult to detect dishonest online reviews owing to complex interactions.

Transportation

With the development of intelligent mobility, the transportation industry has also entered the era of big data. The analysis of big data can significantly improve traffic conditions. Liu et al. (2016a) use big data containing truck driver behavior information before accidents and geographic location information to study the direct and indirect relationships between truck traffic safety and related factors, and they propose a methodology to improve truck safety at railway crossings. Considering pedestrian crashes, Xie et al. (2017) explore the possibility of accidents in the logistics network through big data analytics. They build a new model based on a grid-based unit structure framework that can simultaneously utilize large data sets from transit turnstiles, taxis, and even social media. They believe that big data analysis can more accurately estimate the relevant risk factors, which can help identify hot spots of traffic accidents so that proactive measures can be taken. Focusing on traffic flow prediction, Lv et al. (2015) derive a novel method considering both temporal and spatial correlations and then develop a stacked auto-encoder that is trained by a greedy layer-by-layer method to learn traffic flow characteristics. Shang et al. (2017) use a Bayesian statistics method to explore cargo logistics risk (CLR). The authors flexibly estimate the conditional density function of CLR from a large air cargo data set. Their findings help logistics companies distinguish whether the source of CLR is recurrent or disruptive.

Other areas

Focusing on the assessment of weather risk, Biffis and Chavez (2017) develop a data-driven risk transfer scheme and demonstrate how to use weather data on rainfall and temperature to create a risk profile. They conduct a real case study on Mozambique’s maize production, and the results show that their proposed framework can save 30% of insurance cost. Considering the management of e-government systems, Joseph and Johnson (2013) demonstrate how big data improve operational efficiency and process effectiveness in the US government. Furthermore, systematic evaluation in a big data environment is also employed in other fields such as environmental systems, recycling economy evaluation index systems, and logistics systems. Interested readers can refer to Bi et al. (2014), Yang et al. (2016a), and Zhou et al. (2016) for more discussions.

Way ahead: Potential applications and challenges

Potential applications of evaluation methods

The Internet of Things has created a world of interconnected sensing devices that can collect and store information from their respective real environments (Hashem et al., 2016). The combination of the Internet of Things and the application of big data analysis may bring breakthrough changes to various industries and academic research. For example, smart healthcare is a promising area for future development. The Internet of Things and cloud-based big data analytics can more accurately detect and treat diseases while requiring lower healthcare costs (Varshney and Chang, 2016). The Internet of Things and block-chain ensure the authenticity and traceability of all processes, reducing the transaction risk of various entities. The risk assessment combined with block-chain technology is also worth studying. In addition, the IoTs and related big data applications can play key roles in advancing environmental sustainability (Bibri, 2018). Making a reasonable assessment of the sustainable economy and proposing improvement methods is also of great significance today in response to environmental degradation.

Logistics services are becoming more intelligent and can be tracked in real-time (Yang et al., 2017b). The development of the Internet of Things and big data has also promoted the development of smart logistics. According to the China Smart Logistics Development Report released in 2018, the industry value of smart logistics will exceed one trillion by 2025. Focusing on the evaluation of smart logistics will be a promising topic based on the Internet of Things and big data analysis technology. Furthermore, with the rapid development of social platforms and short video Apps, evaluating consumer’s online behavior is also a trend (Zheng et al., 2019).

The development of new evaluation methods

Big data puts forward new requirements on the behavioral cognition of evaluation, evaluation modes, and evaluation functions. First, the behavioral cognition of evaluation. The evaluator shows a series of new behavioral characteristics in the context of big data. For example, complete evaluation accuracy is impossible to achieve in the context of big data, so the evaluator’s requirements for evaluation accuracy may need to be relaxed. Specifically, the accuracy of evaluation depends on the cost of extracting information; after all, the acquisition of big data requires huge economic costs. Word-of-mouth service on the Internet is a representative type of public comment, and evaluation in the context of public comment may cause behavioral deviations (such as group polarization) that also need to be understood. Second, changes in evaluation modes. In the context of big data, online evaluation models will emerge. Online evaluation has the characteristics of high-speed computing and instant response. In addition, the big data context also emphasizes the dependence on distributed human-computer interaction, which current evaluation models have not incorporated deeply. Third, a new expansion of the evaluation function. Collecting and processing big data through a large number of mobile, distributed terminals and processors serves decision-makers’ need to mine the value of data, which provides opportunities for new evaluation functions to emerge. For example, public safety evaluation based on mobile social network data, energy efficiency evaluation based on smart meter data, residential credit evaluation based on smart payment data, and building security evaluation based on IoTs data are all new evaluations in the context of big data.

Furthermore, the age of big data brings new opportunities to evaluation methods, which are mainly manifested as follows: (1) Evaluation of data quality. Although big data contain sufficient information, they also contain much garbage (such as repeated, redundant, disturbing, and distorted information), which undermines the reliability of the evaluation results. Therefore, evaluating data quality has become a top priority. (2) New data analysis techniques to reduce computational difficulty. Evaluation methods based on statistical models are difficult to implement, while evaluation methods based on optimization models often perform poorly at large data scales. How to reduce the computational difficulty while ensuring the completeness and rationality of data information is a basic requirement for new evaluation methods and their implementation technology. (3) Robust evaluation methods. Most current evaluation methods deal with single or several discrete static data sets and cannot perform continuous dynamic evaluation throughout the process. In the big data environment, the status of the evaluation object is updated in real time, yet decision-makers want robust assessment results. Therefore, it is necessary and valuable to develop robust evaluation methods that can adapt to dynamically changing data. (4) Evaluation methods for unstructured data. In the big data environment, much data are unstructured or semi-structured, whereas most current evaluation methods can only deal with structured data. Due to the huge amount of data, the diversity of information dimensions, the personalization of expressions, and the complex relationships between data, it is very difficult to transform unstructured data into structured data. Developing novel evaluation methods for unstructured data will greatly enrich evaluation theory. (5) Sampling and population contain different amounts of information. For a large amount of data in the traditional sense, it is assumed that the sample and the population have the same amount of information, so the performance of the overall data can be indirectly measured by random sampling. In a real big data environment, a sample cannot contain the total amount of information in the population. Hence, the big data environment poses severe challenges to the traditional random sampling method. (6) Evaluation methods for new features. The use of big data is a conscious behavior of decision-makers. Collecting and processing big data through a large number of mobile and distributed terminals and processors serves decision-makers’ need to mine the value of data, which makes new evaluation features possible, for example, mobile social network data used for public safety assessment, smart meter data for energy efficiency assessment, and smart payment data for residents’ credit assessment.

Conclusions

This article reviews research on systematic evaluation and improvement in the big data environment. We first focus on the development of evaluation methods under different big data analytic techniques and data characteristics, and then review systematic improvement research under different data analysis techniques and information characteristics. We also summarize applied research on systematic evaluation in six areas: the medical industry, finance, business, information systems, transportation, and other fields, and discuss the future development of evaluation methods in the context of big data. Research on evaluation methods and systematic improvement in the big data environment is a major innovation over traditional evaluation theories and methods and will help make evaluation more scientific. The study of evaluation theory and methods in the context of big data is a new field of big data research, and its results can serve as an important module of management decision-making. Systematic evaluation and improvement also have broad research and application prospects.
