Recent progress and trends in predictive visual analytics
Junhua LU, Wei CHEN, Yuxin MA, Junming KE, Zongzhuang LI, Fan ZHANG, Ross MACIEJEWSKI
A wide variety of predictive analytics techniques have been developed in statistics, machine learning, and data mining; however, many of these algorithms take a black-box approach in which data is input and predictions are output with no insight into what happens in between. Such a closed-system approach often leaves little room for injecting domain expertise and can frustrate analysts when results seem spurious or confusing. To allow for more human-centric approaches, the visualization community has begun developing methods that enable users to incorporate expert knowledge into the prediction process at all stages, including data cleaning, feature selection, model building, and model validation. This paper surveys current progress and trends in predictive visual analytics, identifies the common framework in which predictive visual analytics systems operate, and summarizes the predictive analytics workflow.
predictive visual analytics / visualization / visual analytics / data mining / predictive analysis