1 Introduction
In the era of big data, data analysis faces challenges posed by large-scale data, complicated analysis tasks, etc. Visualization is widely studied and applied to tackle intricate data-related tasks in a variety of domains, like medical treatments [
1,
2], social behaviors [
3,
4], and business services [
5]. Because of intuitive data representations, visual analysis approaches can improve the efficiency of knowledge generation by speeding up the process of data comprehension [
6]. However, humans still need extra assistance for analyzing complex data (e.g., high-dimensional data [
7], graph data [
8], heterogeneous data [
4], etc.) or performing sophisticated analysis tasks (e.g., anomaly detection [
9]).
Meanwhile, artificial intelligence (AI) models have been shown to achieve higher accuracy than humans on simple tasks [
10]. Taking advantage of automation, AI is leveraged to lower labor costs and avoid human error. However, AI approaches provide limited assistance for analysis tasks with unclear goals. For example, advanced AI cannot produce high-quality designs without guidance from humans [
11]. It is necessary to allow close and efficient communication between humans and AI, which can be supported by visual approaches.
As widely applied data analysis approaches, visual analysis and AI approaches have complementary advantages. On one hand, AI approaches can improve visual analysis from multiple aspects, including data processing, generation of visual designs, and understanding of user intentions [
12,
13]. On the other hand, visual analysis can optimize AI approaches through participating in all stages of model building [
14]. AI approaches for visualization (AI4VIS) [
12] and visual analysis for AI approaches (VIS4AI) [
14] have been surveyed separately. These surveys explain how visualization can benefit from high-quality AI and how visualization can help construct high-quality AI.
However, visualization and AI can be integrated further to support data analysis. On one hand, AI may face tasks that were not pre-defined when humans come up with new analysis requirements. To perform such tasks, AI needs to collect training data, learn analysis processes under the guidance of human intelligence, verify its output based on human feedback, and prepare for the next usage by iteratively interacting with humans through visualization. On the other hand, visual analysis approaches should support human users in exploring data flexibly and efficiently. Close connections with AI allow visual analysis systems to provide human users with customized and comprehensive assistance. Therefore, data analysts need an integrated framework of visualization and AI when developing visual systems. Unfortunately, descriptions of such an iterative process are missing from existing studies.
To fill this gap, we introduce visualization with artificial intelligence (VIS+AI), which integrates visual analysis and AI approaches to analyze data. AI techniques for human-centered visual knowledge extraction and visualization approaches for AI-centered data mining lay the foundation of VIS+AI. The framework of VIS+AI allows bidirectional communication between humans and AI. Human intelligence can therefore coordinate with artificial intelligence in the process of data analysis via visualization. We also discuss existing tools and describe the research prospects of VIS+AI.
2 Levels of integration
To summarize related studies, we have surveyed 113 papers, including 4 high-quality survey papers [
14–
17]. We first generalize processes of data analysis as knowledge extraction from data. The form of the extracted knowledge can be human knowledge, artificial intelligence, or both. For example, through machine learning, AI-based data analysis can yield reusable intelligence for classification, prediction, etc. We further observe that the integration of visualization and AI in data analysis has become closer in recent years. To describe this trend, we categorize the corresponding studies and define three levels of integration (see Fig.1).
Visualization and AI were first used separately, corresponding to data analysis approaches at level 0: independent processes. At this level, humans can take advantage of visualization or AI, but not both (see Section 3). As the technologies matured, their application scenarios expanded, and visualization and AI began to be applied to assist each other. Related approaches are known as VIS4AI and AI4VIS, which correspond to level 1: one-way assistance. This initial integration remedies some shortcomings (see Section 4). However, one-way assistance cannot support feedback: approaches at level 1 have no chance to assess or optimize the effect of the provided assistance. Further improvement requires two-way assistance, which is level 2: deep integration. We introduce the approaches at level 2 as VIS+AI (see Section 5).
3 Level 0: independent processes
In this section, we introduce the visualization-based knowledge generation process and the AI-based intelligence generation process, respectively.
3.1 Knowledge generation process in visual analysis
The knowledge generation model (see Fig.2) presented by Sacha et al. [
17] unfolds the visual analysis pipeline proposed by Keim et al. [
18] into three loops: an exploration loop, a verification loop, and a knowledge generation loop. By iterating through these three loops, humans can leverage visual analysis to extract knowledge from data.
In the exploration loop, humans observe visual analysis systems and communicate with them directly. Visual analysis systems represent data with multiple visual encodings (e.g., color, length, position). By observing visual encodings, humans can perceive graphical patterns (i.e., findings [
17]). Interesting findings can inspire subsequent exploration processes: humans can take actions to start a new exploration loop and obtain further findings. Due to their limited capacity for information perception, humans can hardly comprehend complicated data at a glance. To ensure perceptual efficiency, visual analysis systems visualize only part of the information at a time. Usually, visual interfaces first provide a data overview, and then show data from certain perspectives [
19]. The exploration loop requires massive user labor to learn about the data and obtain various findings, and not all findings are valuable or significant. When humans have insufficient knowledge of the data, they may need navigation to explore it efficiently.
In the second loop, the verification loop, humans attempt to verify hypotheses and yield insights. Findings can be interpreted and decoded into insights. For example, humans first observe graphic patterns (i.e., findings) in a scatterplot visualizing a joint distribution. Using their knowledge, they can further learn about the correlation between two data dimensions and form corresponding insights (e.g., there is a strong positive correlation between the two dimensions). Then, humans come up with a hypothesis based on their knowledge or on insights yielded during the process of visual analysis.
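The step from a scatterplot finding to a correlation insight can be sketched in code. This is an illustrative example (not from the surveyed systems): the `correlation_insight` helper and its threshold are hypothetical, and a Pearson coefficient stands in for the human judgment described above.

```python
import numpy as np

def correlation_insight(x, y, threshold=0.7):
    """Turn a finding (a pattern in a scatterplot) into a candidate insight."""
    r = float(np.corrcoef(x, y)[0, 1])  # Pearson correlation coefficient
    if r >= threshold:
        return f"strong positive correlation (r={r:.2f})"
    if r <= -threshold:
        return f"strong negative correlation (r={r:.2f})"
    return f"no strong linear correlation (r={r:.2f})"

# Two dimensions with a clear linear relationship.
x = np.arange(50, dtype=float)
y = 2.0 * x + 1.0
print(correlation_insight(x, y))
```

A visual analysis system could attach such a textual insight to the scatterplot so that the user's hypothesis ("these dimensions are correlated") is verified quantitatively.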
With the accumulation of insights, humans understand data features and generate knowledge about them. Note that visual analysis processes can continue after knowledge is gained, because humans can come up with new hypotheses based on the acquired knowledge and re-enter the verification loop to gain more knowledge. Currently, the final loop cannot be assisted by AI models.
3.2 Machine learning pipeline
AI-based data analysis approaches generate knowledge in the form of machine intelligence, which can be stored as model parameters and applied by running models. Unlike the knowledge generation process of humans, the generation process of machine intelligence needs an explicit target (usually defined by humans). According to the target, AI can extract high-quality intelligence from data through a three-stage pipeline of machine learning [
14].
The first stage is to prepare data for feature extraction. Excluding task-irrelevant features can not only improve model performance but also decrease computational costs. Machine learning models have various architectures, which specialize in different tasks; in the second stage, a model architecture is specified for model training. During the training process, machine learning models optimize themselves by updating parameters. Note that low performance can also be caused by inappropriate training data or model architectures. In the final stage, the trained model is deployed after evaluation. It is necessary to be aware of model vulnerabilities before deployment, because performance metrics can hardly provide a comprehensive description and detailed explanation of the model's judgment process and data characteristics.
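The three-stage pipeline can be illustrated with a minimal scikit-learn sketch. The dataset, feature selector, and model here are arbitrary stand-ins chosen for brevity, not the pipeline of any surveyed system.

```python
# Stage 1: prepare data -- keep only the most task-relevant features.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Stage 2: specify an architecture and train (parameters are optimized here).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=200).fit(X_tr, y_tr)

# Stage 3: evaluate before deployment.
acc = accuracy_score(y_te, model.predict(X_te))
print(f"test accuracy: {acc:.2f}")
```

As the text notes, the accuracy number alone says little about why the model judges as it does, which is where the visual approaches of Section 4.1 come in.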
4 Level 1: one-way assistance
Visualization and AI can provide assistance for each other. Data analysis approaches at level 1 involve one-way assistance and are called visualization for AI (VIS4AI) and AI for visualization (AI4VIS).
4.1 Visualization for AI
AI extracts knowledge from data in a different way than humans do. Visual analysis approaches build a bridge for humans to access the entire learning process. At level 1, visualization approaches for AI are proposed to support visual understanding.
AI models follow a problem-solving pipeline different from that of humans. To entrust AI with tasks, humans have to understand what is learned by AI and how AI performs tasks. Visualization facilitates this comprehension by explaining the training data, the input, and the data generated by models, including model architectures, outputs, and performance indicators.
AI models learn from training data; thus, low-quality training data leads to poor model performance. Visual interfaces allow humans to check various data quality issues, like missing values [
20], plausibility issues [
21], noisy data [
22], data redundancy [
23,
24], and missing labels [
25].
It is necessary to understand model mechanics for model selection. For example, a linear model can be demonstrated by superimposing the regression line or classification boundary on a scatterplot of the training data [
26]. Deep learning models consist of intricate layers and data flows, which can be represented as graphs. Graph layout optimization techniques, like edge bundling, can be applied to summarize the hierarchical structure of deep learning models [
27]. To further illustrate related mathematics operations, CNN Explainer [
28] unfolds convolutional layers with visual representations. Besides, interactive exploration can help machine learning beginners build understanding. TensorFlow Playground [
29] allows users to interactively specify the training data, hyperparameters, and architecture of deep learning models. For example, users can modify the number of hidden layers and the number of neurons in each layer. The output of the modified model can be inspected to elicit a better understanding of the model architecture. In particular, instance-level examination [
30–
32] can be applied to illustrate the specific judgment process or verify humans’ hypotheses [
33,
34], like “misjudgments caused by specific features exist.” After understanding model mechanics, humans can employ model judgments to analyze data more efficiently [
35].
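The simplest model-mechanics explanation mentioned above, superimposing a fitted line on the training scatterplot, amounts to exposing the fitted coefficients. A minimal sketch (synthetic data, not from the cited work):

```python
import numpy as np

# Fit a line y = a*x + b to noisy training data; the fitted (a, b) define
# the line a visual explanation would superimpose on the scatterplot.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, 100)

a, b = np.polyfit(x, y, deg=1)  # least-squares linear fit
print(f"fitted line: y = {a:.2f}x + {b:.2f}")
```

Rendering this line over the raw points lets users judge at a glance whether the model has captured the trend in the data.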
Humans can also monitor the learning process and assess model effectiveness at any time. Performance fluctuation indicates whether the training process can be terminated. Further inspections include bias analysis [
36] and vulnerability detection [
37].
4.2 AI for visualization
The knowledge generation process in visual analysis can benefit from AI. Existing studies employ AI approaches mainly in three parts: findings, actions, and insights.
4.2.1 Findings
Currently, human beings lack efficient means to perceive big data from multiple perspectives (e.g., different dimensions) and to identify significant findings among the large number of charts visualizing big data. Fortunately, AI approaches can support batch evaluation of a large number of targets and speed up the finding process, mainly by two means: 1) recommending appropriate visual representations to facilitate finding identification, and 2) recommending findings by simulating the finding identification behaviors of humans.
1) Recommendation of visual representations: Besides data characteristics, visual representations (e.g., time series [
38], node-link diagrams [
39–
41], projection [
42]) can affect finding identification. AI approaches can recommend visual representations to improve finding efficiency. Existing studies inspect visual representations through overall evaluation, which yields a quantified evaluation for single objects [
43–
45] or makes a comparison between multiple objects [
46]. To assess the readability of single targets efficiently, Haleem et al. [
47] train a CNN model to reproduce traditional metrics (e.g., node dispersion). The input of the CNN model is the image of a graph layout, instead of the coordinates of corresponding nodes or edges. Thus, the running time is not affected by the number of nodes or edges, which allows high-speed evaluation and layout recommendation.
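The traditional metrics that such a CNN reproduces can themselves be computed directly from layout coordinates. As a hypothetical baseline (the `node_dispersion` function below is illustrative, not the cited metric definition), node dispersion can be taken as the mean pairwise distance between nodes:

```python
import numpy as np

def node_dispersion(positions):
    """Mean pairwise distance between nodes: one simple readability metric.
    (A rough stand-in for the traditional metrics a CNN would reproduce.)"""
    pos = np.asarray(positions, dtype=float)
    diffs = pos[:, None, :] - pos[None, :, :]          # all pairwise offsets
    dists = np.sqrt((diffs ** 2).sum(-1))              # pairwise distances
    n = len(pos)
    return dists.sum() / (n * (n - 1))                 # mean over ordered pairs

clustered = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1)]  # cramped layout
spread = [(0, 0), (10, 0), (0, 10), (10, 10)]         # well-separated layout
print(node_dispersion(clustered) < node_dispersion(spread))
```

The coordinate-based computation above scales with the number of nodes, which is exactly the cost the image-based CNN evaluation avoids.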
When multiple objects are analyzed, maintaining the consistency of visualizations across the analysis process can reduce humans’ cognitive cost. To generate a series of projections for temporal data, incremental PCA [
48] projects the same records to similar positions. GraphScape [
49] calculates the cost of chart transitions by identifying the overlaps and non-overlaps between the data records in two charts. The results from GraphScape are used to recommend chart sequences that improve humans’ analysis efficiency. Similarity-based approaches can also be used to yield visual representations that satisfy human preferences. DeepDrawing [
50] learns the characteristics of humans’ favorite layouts and generates new layouts based on the learned characteristics. Besides, AI models can generate descriptions of temporal changes based on comparisons between data recorded at different time stamps. GenerativeMap [
51] and TSR-TVD [
52] represent dynamics through intermediate results calculated by generative learning models.
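The positional stability that incremental PCA provides can be sketched with scikit-learn's `IncrementalPCA`. This is a simplified illustration of the idea (shared projection model across time steps), not a reproduction of the cited method; the data and drift magnitude are synthetic.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Project successive time steps with one shared model so that records that
# barely change keep stable 2D positions across the series of projections.
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 10))

ipca = IncrementalPCA(n_components=2)
p0 = ipca.fit_transform(base)                     # projection at time t0
drifted = base + rng.normal(scale=0.01, size=base.shape)
p1 = ipca.transform(drifted)                      # time t1, same model

shift = np.linalg.norm(p1 - p0, axis=1).mean()
print(f"mean positional shift: {shift:.3f}")      # small => stable positions
```

Refitting a fresh PCA at every time step would instead allow arbitrary rotations and flips of the projection, which is precisely the jitter that raises cognitive cost.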
2) Recommendation of findings: To simulate the finding behaviors of humans, AI approaches need to learn how humans perceive information from visual analysis systems and how humans judge whether the perceived information can be regarded as a finding.
Humans perceive visual representations through observation behaviors, like fixations and saccades. For example, humans often spend a relatively long period browsing something interesting but confusing; thus, a fixation may imply that humans have findings. Such behaviors can be captured by eye tracking [53–55]. Seeking further intention determination, VA2 [56] collects data describing various human behaviors, including eye movements, interactions, and thoughts. Patterns identified from VA2 can illustrate how humans arrive at findings.
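Detecting fixations in a gaze trace is a standard first step in such analyses. The sketch below follows the general dispersion-threshold idea (a run of gaze samples staying within a small spatial window counts as a fixation); the thresholds and the trace are illustrative, not from the cited systems.

```python
def detect_fixations(gaze, max_dispersion=1.0, min_samples=3):
    """Dispersion-based fixation detection (I-DT style sketch):
    return (start, end) index pairs of runs that stay within a small window."""
    fixations, start = [], 0
    for end in range(len(gaze)):
        window = gaze[start:end + 1]
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
            # Window broke apart: record it as a fixation if long enough.
            if end - start >= min_samples:
                fixations.append((start, end - 1))
            start = end
    if len(gaze) - start >= min_samples:
        fixations.append((start, len(gaze) - 1))
    return fixations

# Gaze dwells near (5, 5), then a saccade jumps to a dwell near (50, 50).
trace = [(5, 5), (5.1, 5), (5, 5.2), (5.1, 5.1),
         (50, 50), (50.2, 50.1), (50, 50.2), (50.1, 50)]
print(detect_fixations(trace))
```

Long fixations found this way are the candidate "humans may have a finding here" moments that an AI assistant could react to.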
According to analysis results of human behaviors, metrics are designed to simulate humans’ judgment. Nevertheless, existing experiment results [
57,
58] indicate that scagnostics-based metrics cannot represent human perception directly. Besides, multiple metrics need to be integrated into a single simulation, because humans may consider multiple factors simultaneously.
4.2.2 Actions
Actions allow humans to take the initiative in the process of visual analysis. However, it is difficult for humans to perform accurate actions in complicated analysis contexts. AI can detect human errors and provide correct feedback. For example, Fan and Hauser [
59] employ a CNN to comprehend the data distribution and correct the set of points selected by brushing according to density.
4.2.3 Insights
The verification loop follows a deductive reasoning process, which starts from hypotheses. However, AI models mainly leverage inductive reasoning. Although AI models can be used to verify a hypothesis based on data, few studies focus on the automatic generation of hypotheses. Therefore, we mainly summarize studies on how AI models contribute to insights, which fall into two directions: 1) generating insights by comprehending visual representations, and 2) expressing insights.
1) Insight generation: AI models generate insights by comprehending the data depicted in diagrams. Visual analysis introduces sophisticated insight generation processes. For instance, insights could be comparison results or a summary of partial data records [
60]. To generate insights about visual charts, AI models need to not only identify visual elements, but also decode the data encoded by elements. To figure out visual encodings, AI models can inspect legends [
61,
62] or embedded codes [
63,
64]. Then, AI models can learn from the decoded data and generate insights according to human needs. For example, Temporal Summary Images (TSIs) [
65] extract insights from charts of time series data according to three types of points of interest (POIs). These insights can be further ranked by the significance of related attributes and POI types, and the top ones are recommended to humans.
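Extracting POIs from a time series can be sketched simply. The POI types below (local maxima, local minima, largest single-step change) are illustrative stand-ins, not necessarily the three types used by TSIs:

```python
def extract_pois(series):
    """Sketch of POI extraction from a time series: local peaks and valleys,
    plus the index of the largest single-step change."""
    pois = []
    for i in range(1, len(series) - 1):
        if series[i] > series[i - 1] and series[i] > series[i + 1]:
            pois.append(("peak", i))
        elif series[i] < series[i - 1] and series[i] < series[i + 1]:
            pois.append(("valley", i))
    jumps = [abs(series[i + 1] - series[i]) for i in range(len(series) - 1)]
    pois.append(("largest_change", jumps.index(max(jumps))))
    return pois

print(extract_pois([1, 3, 2, 5, 1, 1.5]))
```

Each extracted POI can then be scored (e.g., by the magnitude of the peak or jump) so that only the top-ranked insights are surfaced to the user.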
2) Insight expression: AI models need to express what they learn from diagrams to humans. To support user-friendly communication, AI models have to learn to express insights in natural language. Natural language generation (NLG) [
66,
67] can be used to translate automatically generated insights into natural language narratives.
5 Level 2: deep integration (VIS+AI)
At level 2, VIS+AI aims at barrier-free communication between human intelligence and artificial intelligence in the scenario of visual analysis. Such close bilateral communication is not supported by approaches at level 1. In this context, we propose the framework of VIS+AI (see Fig.3) to fully open up the channel between AI and visualization, which further links human intelligence. As shown on the left of the framework, the knowledge generation model is inherited from the previous level to inject human intelligence. As shown on the right, the channel between AI and visualization consists of three iterative loops: an interaction loop, an execution loop, and an intelligence optimization loop. Through these three loops, AI can adapt to dynamic data analysis processes and therefore be deeply involved in the data analysis processes guided by humans.
5.1 Interaction loop
In the first loop, AI communicates with humans directly via visual analysis systems. Through communication, humans can convey instructions to AI, and AI can provide feedback to humans according to those instructions.
As mentioned in Section 3.2, a goal is necessary to activate the usage of AI. Therefore, AI first needs to receive an executable instruction. In the framework of VIS+AI, instructions can be either explicit or implicit. An explicit instruction is an instruction in a predefined format, which can be issued by specific actions in visual analysis systems. For example, selecting a set of data records and clicking a “Query” button generates an instruction that requests information related to the selected data records. An implicit instruction can be issued without a clear instruction expression. For instance, the query interaction indicates that humans are interested in the selected data records; a corresponding implicit instruction could be “recommend other data records with similar features.”
Although executing implicit instructions can further broaden humans’ horizons and improve the efficiency of data analysis, it is challenging to capture implicit instructions and understand human needs correctly, because human needs vary across individuals and analysis scenarios. To mine such correlations, AI models need to collect existing analysis provenance. Explicit instructions can be regarded as ground-truth labels and used to generate a user profile. Descriptions of analysis scenarios could include the analysis progress (e.g., acquired findings, components on the current interface) and data characteristics (e.g., size, distribution) [
68,
69], which involve a sophisticated data space. AI can leverage feature extraction approaches to identify significant features/variables for recording humans’ analysis provenance. Based on the details of analysis provenance, hidden Markov models [
70] and long short-term memory (LSTM) models [
71] can be used to understand implicit instructions and predict future needs.
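As a much lighter stand-in for such sequence models, next-action prediction from provenance can be sketched with a first-order Markov model over action bigrams. The `ActionPredictor` class and the session data are hypothetical illustrations:

```python
from collections import Counter, defaultdict

class ActionPredictor:
    """First-order Markov sketch of implicit-instruction prediction:
    learn action bigrams from provenance, predict the likeliest next action."""
    def __init__(self):
        self.bigrams = defaultdict(Counter)

    def train(self, sessions):
        for session in sessions:
            for a, b in zip(session, session[1:]):
                self.bigrams[a][b] += 1  # count "b follows a"

    def predict(self, last_action):
        counts = self.bigrams[last_action]
        return counts.most_common(1)[0][0] if counts else None

p = ActionPredictor()
p.train([["select", "query", "inspect"],
         ["select", "query", "export"],
         ["filter", "select", "query"]])
print(p.predict("select"))
```

A hidden Markov or LSTM model generalizes this idea by conditioning on longer histories and latent analysis states rather than only the last action.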
The execution results of instructions are returned to humans as feedback. When AI is unable to execute a human instruction, it can seek assistance from humans by feeding related questions back and recording the answers. For example, AI cannot accurately execute instructions like “Show me the most interesting pattern” at the beginning, because it has no idea what an “interesting pattern” is. To execute the instruction, AI can first ask humans to select a pattern from a group of alternatives, and then provide the details of the selected pattern. To support visual understanding, feedback can be encoded as visual representations and integrated into visual analysis systems. Humans can then carry out the exploration loop (see Section 3.1) based on the updated visual interfaces.
5.2 Execution loop
The second loop requires AI to execute human instructions automatically.
AI needs to leverage what it has learned to execute instructions. If the existing AI is not able to deal with an instruction, it needs to activate extra learning processes. For example, when AI cannot understand an implicit instruction, it has to learn the relationship between the meaning of the instruction and the dependent variables collected in the interaction loop. A learning process can be implemented through the machine learning pipeline. Humans can control learning processes via instructions, which specify model architectures, hyperparameters, etc. Besides, a single instruction could activate a series of learning processes, because splitting a complex instruction into sub-problems can simplify learning and improve performance.
The outcome of learning processes includes model output, intermediate results, performance evaluations, etc. When adequate outcome is yielded, feedback can be refined from it and conveyed to humans through visual analysis systems. After analyzing the model outcome, humans can comprehensively assess AI models and optimize them by issuing new instructions (e.g., parameter adjustments).
5.3 Intelligence optimization loop
A function may be called repeatedly in the process of visual analysis. Therefore, what is learned by AI needs to be retained for future use. Fortunately, models can be preserved by storing the necessary model parameters.
The AI employed in visual analysis systems is not always perfect. Fortunately, AI models can not only learn from each other through techniques like transfer learning [72], but also respond to human instructions. There are two ways to optimize AI. The first is passive optimization, which seeks explicit navigation from humans. The second is active optimization, which seeks self-optimization through learning, e.g., model updates. As shown on the right of Fig.3, the upper flow, AI-outcome-feedback-visual analysis system, supports humans in gaining visual understanding and generating knowledge of data. Then, the lower flow, visual analysis system-instruction-learning-AI, allows humans to integrate human knowledge and provide guidance [
73].
5.3.1 Passive optimization
Passive optimization is triggered when humans issue explicit instructions to improve AI models. For supervised learning, humans can improve model performance or speed up learning processes. When misjudgments are identified through visual understanding, humans can make interactive adjustments according to the diagnosis results. For instance, when out-of-distribution samples are detected, humans can issue explicit instructions to add certain samples to the training set [
74]. Unsupervised learning can also follow guidance from human-issued instructions. For example, humans can navigate clustering models by labeling individuals [
75–
77]. To avoid excessive labor, humans can leverage an iterative labeling process in which significant individuals are prioritized.
5.3.2 Active optimization
Humans can immerse themselves in data analysis if there is no need to issue explicit instructions on model adjustment. Aiming at a smoother user experience, active optimization requires AI to optimize itself by actively collecting data and learning from it. We summarize the corresponding studies according to the type of assistance provided by AI: findings, actions, and insights.
1) Findings: As mentioned in Section 4.2.1, humans identify interesting findings based on multiple considerations. AI can refine a corresponding metric from each consideration. To learn the metric combination, Sedlmair et al. [
78] train a classifier with existing metrics to learn humans’ judgment of how well a class is separated from others in color-coded scatterplots. The parameters of the classifier can reflect how much each metric affects the final judgment. Following this idea, ScatterNet [
79] explains similarity perception of scatterplots based on a deep learning model trained with semantic features.
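The metric-combination idea can be sketched with a logistic classifier: train on human labels over several candidate metrics, then read each metric's influence off the learned weights. The metrics, labels, and data below are synthetic illustrations, not the cited study's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two candidate metrics; the simulated "human labels" depend only on metric_a.
rng = np.random.default_rng(2)
metric_a = rng.uniform(0, 1, 300)            # informative metric
metric_b = rng.uniform(0, 1, 300)            # uninformative metric
labels = (metric_a > 0.5).astype(int)        # human judgment to imitate

X = np.column_stack([metric_a, metric_b])
clf = LogisticRegression().fit(X, labels)

# The learned weights reveal how much each metric drives the judgment.
w_a, w_b = clf.coef_[0]
print(f"weight(metric_a)={w_a:.2f}, weight(metric_b)={w_b:.2f}")
```

In this toy setup the weight on `metric_a` dominates, mirroring how classifier parameters expose which perceptual metrics actually matter to human judgments.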
Besides metrics, existing studies design AI models to simulate human judgments. Abbas et al. [
80] select the model that best matches human judgment from a set of models. The selected model can be used to rank patterns by complexity. DeepEye [
81] extracts features to train a model that simulates human recognition and provides a binary judgment for a visualization. The judgment from AI can be used to filter user-interested patterns for finding generation [
82]. In addition, automatic evaluation can also be employed to generate visual representations, like projection [
83].
2) Actions: The provenance of task-oriented actions can be used to extract human intent [
68]. For example, the targets selected by humans may indicate points of interest that they may revisit in following actions. Classification models trained on mouse interaction logs can successfully predict the visual analysis tasks of cancer genomics experts [
84]. Similarly, Gotz et al. [
85] find interaction patterns and infer humans’ intended analysis tasks by applying a rule-based approach to analysis behavior data. According to humans’ intent, AI can assist humans in conducting actions by two means: responding to human actions correctly, and guiding following actions efficiently.
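A rule-based intent inference of this kind can be sketched as a small pattern matcher over the recent interaction sequence. Both the rules and the task labels below are invented for illustration; they are not the rules of the cited work.

```python
def infer_task(actions):
    """Illustrative rule-based sketch: map recent interaction patterns
    to a hypothesized analysis task."""
    recent = actions[-3:]
    if recent.count("filter") >= 2:
        return "drill-down"                    # repeated filtering narrows scope
    if "sort" in recent and "select" in recent:
        return "ranking comparison"
    if recent and all(a in ("pan", "zoom") for a in recent):
        return "overview navigation"
    return "unknown"

print(infer_task(["select", "filter", "filter"]))
print(infer_task(["zoom", "pan", "zoom"]))
```

Once a task is hypothesized, the system can respond appropriately (e.g., preloading data for a drill-down) or suggest the likely next action.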
To improve the accuracy of intent prediction, AI models should integrate action provenance with the current analysis context [
86]. For example, LassoNet [
87] infers which points in a 3D space humans intend to select by lasso, according to the perspectives set by humans. Guidance for following actions can reduce the learning cost of complex systems and improve system performance through preloading. REACT [
86] suggests next actions by searching similar analysis contexts from the tree that records analysis provenance. To convey guidance, ModelSpace [
88] summarizes the explored analysis contexts by projection and recommends that humans explore the analysis contexts in blank areas.
3) Insights: Active optimization allows AI not only to “show” the insights it generates but also to “discuss” the insights with humans. To respond to humans’ questions, FigureQA [
89] and Data Visualization Question Answering (DVQA) [
90] provide corpora of question-answer pairs corresponding to massive chart images. Based on such corpora for question answering on visual charts, the semantic information in related questions can be comprehended by computers [
91]. However, the meaning of certain words may be affected by usage scenarios. For instance, a visual element can be specified by its ID or by a unique visual encoding in the chart. To understand questions, the applied corpora have to be updated according to the existing visual encodings [
92]. FigureSeer [
61] classifies figures in scholarly articles by identifying text contents, like IDs in legends and axis ticks. Given a question, the identified “keywords” can be used to select the relevant figures and provide an answer.
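The keyword-based retrieval step can be sketched as simple set overlap between question words and the text extracted from each figure. The `answer_from_figures` helper and the figure records are hypothetical, and real systems add OCR, normalization, and semantic matching on top of this:

```python
def answer_from_figures(question, figures):
    """Keyword-matching sketch: pick the figure whose extracted text (legend
    IDs, axis ticks) overlaps most with the question's words."""
    q_words = set(question.lower().split())
    best = max(figures, key=lambda f: len(q_words & set(f["text"])))
    return best["id"]

figures = [
    {"id": "fig1", "text": {"accuracy", "epoch", "resnet"}},
    {"id": "fig2", "text": {"loss", "epoch", "vgg"}},
]
print(answer_from_figures("How does the loss of VGG change per epoch?", figures))
```

Having selected the relevant figure, a QA model can then read off the answer from the decoded chart data.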
Humans sometimes do not trust automatic insights without verification, because insights generated by automatic approaches could be incorrect. To make automatic insights reliable and easy to understand, AI needs to learn which insights could be questioned by humans and augment those insights with auxiliary explanations [
93]. Similar to insights, auxiliary explanations can be provided in the form of text. Kim et al. [
92] conduct a user study to learn the insight generation processes of humans, which consist of raising questions, seeking solutions, and answering the questions. The output of the model employed by Kim et al. [
92] can not only answer a question, but also explain in natural language how the answer is derived.
Besides, visual analysis leverages visual representations to explain data, which can also be used to explain insights. Narrative descriptions can be linked to visual elements in charts [
94,
95]. When humans are checking an insight, a visual explanation can be displayed on the related visual elements. For insights extracted from data instead of visual charts, DataShot [
96] generates infographics to illustrate the insights.
5.4 Differences from Level 1
To discuss the differences between level 2 (i.e., VIS+AI) and level 1 (i.e., VIS4AI and AI4VIS), we summarize the features of existing tools at level 1 from two aspects: 1) assistance provision: whether AI-based assistance for humans’ knowledge generation processes is provided in visual analysis systems, and 2) user accessibility: whether humans are allowed to access AI modules directly. Each feature is marked as supported or missing (see Tab.1). For the three user accessibility features, we further distinguish between “fully” and “partially” supported as follows:
● Instruction: Whether the tool attempts to respond to implicit instructions, or to explicit instructions only.
● Feedback: Whether AI can express its own questions, or only deliver execution results.
● Learning: Whether the tool can learn from human behavior (i.e., use human behavior as training data), or only accept instructions on model adjustments (e.g., parameter settings).
As shown in Tab.1, the framework of VIS+AI has not been fully activated in existing tools. Each category of tools supports a single direction of communication between humans and AI but fails in the other direction. AI4VIS tools, like Power BI, follow the path of AI-outcome-feedback-visual analysis system: they can answer humans’ questions but support neither implicit instructions nor explicit instructions on model adjustments. On the opposite side, VIS4AI tools, like TensorBoard, follow the path of visual analysis system-instruction-learning-AI: they allow humans to adjust models but cannot respond to implicit instructions.
5.5 Application scenarios
Various application scenarios can benefit from VIS+AI. We describe two of them to demonstrate its effectiveness.
● Real-time decision-making In scenarios like stock trading and disaster rescue, delays may cause huge losses. VIS+AI supports real-time intelligence communication between humans and AI, which guarantees a swift decision-making process. Moreover, either humans or AI can play the role of decision-maker and assess each other’s decisions through two-way communication.
● Exploration of heterogeneous data Heterogeneous data is widely used in scenarios like urban planning and criminal investigation. Exploring heterogeneous data requires various models, which excel at different tasks. VIS+AI allows humans to communicate with multiple models simultaneously and receive comprehensive assistance.
6 Future directions
We summarize three future directions to develop VIS+AI.
6.1 Understanding analysis behaviors
The framework of VIS+AI requires AI to learn how to respond to implicit instruction and satisfy complex requirements. High-quality training data describing the analysis behavior of humans is necessary to support AI in learning from humans. Although VIS+AI can collect human behavior data through communication with humans, the framework itself presupposes intelligence that enables AI to initiate communication on visual analysis. Therefore, sufficient behavior data still needs to be collected in advance.
Recent studies have made the first step. For example, Zhao et al. [97] attempt to understand how humans respond to different graph visualizations through user studies. The results can guide AI to improve the generation of graph representations.
6.1.1 Collecting analysis provenance
Existing studies introduce numerous visual analysis systems and their applications (e.g., case studies), which not only assist researchers in defining design spaces and constructing metrics but also provide samples for AI approaches to learn system usage. However, comprehensive provenance of system usage, including analysis goals, visual representations, and interactions in each step [98], is far from sufficient.
Case study sections mainly aim at proving the effectiveness of visual analysis systems. Such sections may emphasize system performance and functions instead of user experiences. In most articles, only successful analysis stories are introduced. Failed attempts always exist but are omitted. Even though we can extract positive samples from textual descriptions, the lack of negative samples leads to an extremely skewed distribution of training data. Besides, it is challenging to restore analysis provenance from the textual descriptions and figures in articles. Writing techniques can facilitate reading but increase the difficulty of information extraction. For example, visual representations shown in figures may be superimposed with annotations that highlight certain details. Annotations could be embedded in visual systems or added by users, and the two cases are hard to distinguish automatically.
Collecting analysis provenance from published studies requires efforts from both sides. On the one hand, we encourage researchers of visual analysis systems to provide comprehensive analysis provenance when publishing studies. On the other hand, researchers dedicated to developing AI-assisted approaches need to specify the information necessary for model training, which will guide provenance collection [99].
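As a purely hypothetical illustration of what machine-readable analysis provenance could look like, the sketch below logs each analysis step with its goal, active visual representation, interaction, and outcome; note that recording failed attempts as well as successes addresses the skewed-sample problem discussed above. The field names and values are assumptions for illustration, not an established schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class ProvenanceStep:
    """One step of an analysis session: what the analyst saw and did."""
    goal: str                 # stated or inferred analysis goal
    view: str                 # active visual representation, e.g., "scatterplot"
    interaction: str          # e.g., "brush", "filter", "zoom"
    params: dict = field(default_factory=dict)
    outcome: str = "unknown"  # "success", "failure", or "unknown"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def export_session(steps):
    """Serialize a session so positive AND negative samples survive."""
    return json.dumps([asdict(s) for s in steps], indent=2)

# A tiny illustrative session: a failed attempt followed by a success.
steps = [
    ProvenanceStep("find outliers", "scatterplot", "brush",
                   {"x": "price", "y": "volume"}, outcome="failure"),
    ProvenanceStep("find outliers", "boxplot", "filter",
                   {"column": "price"}, outcome="success"),
]
print(export_session(steps))
```

Structured records of this kind could serve directly as training samples for AI learning system usage, whereas textual case-study narratives must first be reverse-engineered.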
6.1.2 Ensuring data security
AI models are at risk of active attacks. On the one hand, malicious adversaries can poison models with crafted updates and mislead them toward specific results [100]. To identify stealthy attacks [101, 102], AI models have to be transparent and explainable. Besides efficient detection models, visualization can facilitate anomaly detection and attack identification through intuitive patterns [37].
On the other hand, user information, like interaction provenance and user status, is considered private. The use of such data comes with the risk of privacy leakage, which arouses users’ concern. However, comprehensive and accurate personal data is indispensable for providing intelligent assistance. Currently, a majority of enterprises (e.g., Facebook) have developed privacy policies to obtain permission for personal data use. In addition, decentralized AI approaches, like federated learning [103], are proposed to train models with data locally and synthesize models globally, which allows users to benefit from a shared AI model without uploading personal data. However, adversaries still have chances to infer training data from model parameters or model output [104]. To preserve privacy, techniques for active defense, like differential privacy [105], can be integrated into AI models.
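To make the differential-privacy idea concrete, the following minimal sketch shows the classic Laplace mechanism applied to a count query: calibrated noise is added to the query result so that the presence or absence of any single record cannot be inferred. The query and parameters are illustrative, not a production-ready defense.

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Sample from Laplace(0, scale) via inverse transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon):
    """Differentially private count query.

    A count changes by at most 1 when one record is added or removed
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    Smaller epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Illustrative query: how many users are older than 30?
random.seed(0)  # seeded only to make the sketch reproducible
ages = [23, 35, 41, 29, 52, 38]
noisy = private_count(ages, lambda a: a > 30, epsilon=1.0)
```

In a federated setting, such noise can be applied to local updates before they leave the device, limiting what adversaries can infer from shared parameters.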
6.2 Application of visual knowledge
The concept of visual knowledge [106] is proposed to promote research progress in the era of AI 2.0. To leverage visual knowledge in intelligence communication, researchers should pay attention to the expression and comprehension of visual knowledge.
6.2.1 Visual knowledge expression
Visual knowledge expression requires general expression rules, i.e., visual encodings. Based on visualizations designed by humans, AI has learned to select appropriate visual encodings for data in simple formats. Common charts (e.g., bar charts, line charts) can be generated automatically according to data characteristics and user preferences [96, 107].
Humans can generate visual knowledge by drawing or coding. Nevertheless, not all humans have the advanced skills to express their thoughts flexibly. It is common to create visual representations interactively in visual analysis systems [108]. For instance, Data Prophecy [109] allows users to express their hypotheses through interactions, like sketching the contour of scatters. However, simple interaction designs cannot support detailed expression; for example, descriptions of density are not allowed.
Existing studies still leave a gap in how humans or AI can express complex knowledge visually. To lift communication restrictions, general expression rules for complex knowledge need to be studied.
6.2.2 Visual knowledge comprehension
Visualization is designed to facilitate humans in data analysis. As for AI, some visual representations are not “intuitive.” For example, it is challenging for AI to comprehend visual encodings from legends, where text and icons may be mixed.
VisCode [110] and Chartem [111] avoid this issue by embedding data information into the pictures of visual charts. However, it is still important for AI to develop the ability to comprehend visual knowledge. Visualization as Intermediate Representations (VLAIR) [112] employs visualization as the input of a CNN. VLAIR achieves better performance than classification models that take the original data as input, because visual representations prompt AI to focus on significant features.
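The core idea of using visualization as an intermediate representation, i.e., rasterizing data into an image that a model then consumes, can be sketched without any deep-learning library. The rendering choices below are illustrative assumptions, not VLAIR's actual pipeline.

```python
def rasterize_scatter(points, width=32, height=32):
    """Render 2D points into a density grid: the "visualization" a CNN would see.

    points: iterable of (x, y) pairs with x, y in [0, 1].
    Returns a height x width grid of point counts per cell.
    """
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        col = min(int(x * width), width - 1)
        row = min(int(y * height), height - 1)
        grid[row][col] += 1
    return grid

# Two clusters become two bright regions in the rasterized image; such
# spatial patterns are easier for a CNN to exploit than raw coordinate lists.
cluster_a = [(0.1, 0.1), (0.12, 0.11), (0.09, 0.13)]
cluster_b = [(0.8, 0.85), (0.82, 0.84)]
image = rasterize_scatter(cluster_a + cluster_b)
```

The resulting grid can be fed to a standard image classifier; the spatial encoding itself acts as an inductive bias that highlights significant features.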
6.3 Reliable AI
AI models are allowed to take full responsibility for simple missions (e.g., face recognition for access control systems). Should researchers on visual analytics pursue high-level automation and shift their research focus to advanced AI technologies? The answer is not a definite yes. Users need not only tools with low labor costs, but also analysis results, which have to be reliable and trustworthy [113]. Excessive automation in visual analysis may lead to significant facts being ignored. When users rely on detection algorithms excessively, they may miss important anomalies, or miss opportunities to explore the unknown because of incomplete detection criteria.
To avoid such situations, high-level automation for visual analysis should be explainable and under human control. Humans can hardly trust AI output before understanding the mechanisms of the applied AI models, especially when there are conflicts between human judgment and AI output. As the interface between humans and AI, visual analysis systems should provide users with the necessary explanations and evidence. However, how much explanation or evidence is enough for human users with different levels of knowledge of AI models? Related studies are needed to answer this question and guide future work.
7 Conclusion
In this paper, we introduce VIS+AI, a framework that facilitates direct communication between human intelligence and artificial intelligence through visualization. We categorize existing attempts from two aspects: 1) AI technologies, which assist the usage of visual analysis systems, from the perspective of knowledge extraction processes, and 2) visualization approaches for AI. Potential directions for VIS+AI are discussed to inspire new studies. We believe that data applications will greatly benefit from VIS+AI.
The Author(s) 2023. This article is published with open access at link.springer.com and journal.hep.com.cn