Artificial intelligence (AI) is intrinsically data-driven. It calls for the application of statistical concepts through human-machine collaboration during the generation of data, the development of algorithms, and the evaluation of results. This paper discusses how such human-machine collaboration can be approached through the statistical concepts of population, question of interest, representativeness of training data, and scrutiny of results (PQRS). The PQRS workflow provides a conceptual framework for integrating statistical ideas with human input into AI products and research. These ideas include the experimental design principles of randomization and local control, as well as the principle of stability, to achieve reproducibility and interpretability of algorithms and data results. We discuss the use of these principles in the contexts of self-driving cars, automated medical diagnoses, and examples from the authors’ collaborative research.
Conversational systems have come a long way since their inception in the 1960s. After decades of research and development, we have seen progress from Eliza and Parry in the 1960s and 1970s, to task-completion systems as in the Defense Advanced Research Projects Agency (DARPA) communicator program in the 2000s, to intelligent personal assistants such as Siri in the 2010s, to today’s social chatbots like XiaoIce. Social chatbots’ appeal lies not only in their ability to respond to users’ diverse requests, but also in being able to establish an emotional connection with users. The latter is done by satisfying users’ need for communication, affection, and social belonging. To further the advancement and adoption of social chatbots, their design must focus on user engagement and take both intellectual quotient (IQ) and emotional quotient (EQ) into account. Users should want to engage with a social chatbot; as such, we define the success metric for social chatbots as conversation-turns per session (CPS). Using XiaoIce as an illustrative example, we discuss key technologies in building social chatbots, from core chat to visual awareness to skills. We also show how XiaoIce can dynamically recognize emotion and engage the user throughout long conversations with appropriate interpersonal responses. As we become the first generation of humans ever living with artificial intelligence (AI), we have a responsibility to design social chatbots to be both useful and empathetic, so they will become ubiquitous and help society as a whole.
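The CPS metric described above is simple to compute from session logs. The following is a minimal sketch (the function name and log layout are illustrative assumptions, not the paper's implementation): each session is recorded as a list of user-bot turn pairs, and CPS is the average number of such turns per session.

```python
def cps(session_logs):
    """Conversation-turns per session: the average number of
    user-bot turn pairs across all recorded sessions."""
    return sum(len(turns) for turns in session_logs) / len(session_logs)

# Three toy sessions containing 10, 30, and 20 conversation turns.
logs = [[("hi", "hello")] * 10,
        [("how are you?", "great!")] * 30,
        [("bye", "see you")] * 20]
# cps(logs) == 20.0
```

A higher CPS indicates that users stay engaged in longer conversations, which is exactly the behavior the design of a social chatbot aims to encourage.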
This paper reviews recent studies in understanding neural-network representations and learning neural networks with interpretable/disentangled middle-layer representations. Although deep neural networks have exhibited superior performance in various tasks, interpretability has always been the Achilles’ heel of deep neural networks. At present, deep neural networks obtain high discrimination power at the cost of the low interpretability of their black-box representations. We believe that high model interpretability may help people break several bottlenecks of deep learning, e.g., learning from a few annotations, learning via human–computer communications at the semantic level, and semantically debugging network representations. We focus on convolutional neural networks (CNNs), and revisit the visualization of CNN representations, methods of diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends in explainable artificial intelligence.
The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition (ASR) systems. In this overview paper, we review the techniques proposed in the last two decades in attacking this problem. We focus our discussions on the speech separation problem given its central role in the cocktail party environment, and describe the conventional single-channel techniques such as computational auditory scene analysis (CASA), non-negative matrix factorization (NMF), and generative models, the conventional multi-channel techniques such as beamforming and multi-channel blind source separation, and the newly developed deep learning-based techniques, such as deep clustering (DPCL), the deep attractor network (DANet), and permutation invariant training (PIT). We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment. We argue that effectively exploiting information in the microphone array, the acoustic training set, and the language itself, using more powerful models with better optimization objectives and techniques, will be the approach to solving the cocktail party problem.
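The core idea behind permutation invariant training (PIT), mentioned above, is that a separation network's outputs carry no inherent speaker order, so the loss is computed over every possible assignment of outputs to reference speakers and the best one is kept. A minimal utterance-level sketch (with MSE as a stand-in objective; real systems typically operate on spectrogram masks):

```python
from itertools import permutations

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pit_loss(estimates, references):
    """Utterance-level PIT: score every assignment of estimated streams
    to reference speakers and keep the lowest-error permutation, so the
    loss does not depend on the order in which outputs are emitted."""
    n = len(references)
    return min(
        sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        for perm in permutations(range(n))
    )

# The separator emits the two speakers in swapped order; PIT still
# matches them correctly and reports zero error.
refs = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
ests = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
# pit_loss(ests, refs) == 0.0
```

The exhaustive search over permutations is factorial in the number of speakers, which is why PIT is usually applied with a small, fixed number of output streams.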
Deep neural networks have evolved remarkably over the past few years and they are currently the fundamental tools of many intelligent systems. At the same time, the computational complexity and resource consumption of these networks continue to increase. This poses a significant challenge to the deployment of such networks, especially in real-time applications or on resource-limited devices. Thus, network acceleration has become a hot topic within the deep learning community. As for hardware implementation of deep neural networks, a batch of accelerators based on a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC) have been proposed in recent years. In this paper, we provide a comprehensive survey of recent advances in network acceleration, compression, and accelerator design from both algorithm and hardware points of view. Specifically, we provide a thorough analysis of each of the following topics: network pruning, low-rank approximation, network quantization, teacher–student networks, compact network design, and hardware accelerators. Finally, we introduce and discuss a few possible future directions.
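Of the compression techniques surveyed above, network pruning is the most direct to illustrate. The sketch below shows unstructured magnitude pruning, a common baseline (the function name and list-of-lists weight layout are illustrative assumptions, not any specific surveyed method): the smallest-magnitude fraction of weights is zeroed, trading a little accuracy for sparsity that accelerators can exploit.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights
    (unstructured magnitude pruning)."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)          # number of weights to remove
    threshold = flat[k - 1] if k > 0 else -1.0
    return [[0.0 if abs(w) <= threshold else w for w in row]
            for row in weights]

w = [[0.9, -0.05, 0.4],
     [-0.01, 0.7, 0.02]]
# At 50% sparsity the three smallest-magnitude weights
# (-0.05, -0.01, 0.02) are zeroed:
# magnitude_prune(w, 0.5) == [[0.9, 0.0, 0.4], [0.0, 0.7, 0.0]]
```

In practice, pruning is followed by fine-tuning to recover accuracy, and structured variants (removing whole channels or filters) are preferred when the target hardware cannot exploit irregular sparsity.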
A powerful digital brain platform is proposed that uses crowd wisdom for brain research, based on the computational artificial intelligence model of synthesis reasoning and multi-source analogical generation. The design of the platform aims to make it a comprehensive brain database, a brain phantom generator, a brain knowledge base, and an intelligent assistant for research on neurological and psychiatric diseases and brain development. Using big data, crowd wisdom, and high-performance computers may significantly enhance the capability of the platform. Preliminary achievements along this track are reported.
Deep neural networks have been successfully applied to numerous machine learning tasks because of their impressive feature abstraction capabilities. However, conventional deep networks assume that the training and test data are sampled from the same distribution, and this assumption is often violated in real-world scenarios. To address the domain shift or data bias problems, we introduce layer-wise domain correction (LDC), a new unsupervised domain adaptation algorithm which adapts an existing deep network through additive correction layers spaced throughout the network. Through the additive layers, the representations of source and target domains can be perfectly aligned. The correction layers, which are trained via maximum mean discrepancy (MMD), adapt to the target domain while increasing the representational capacity of the network. LDC requires no target labels, achieves state-of-the-art performance across several adaptation benchmarks, and requires significantly less training time than existing adaptation methods.
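The ideas above can be illustrated on a toy scale: MMD measures the distance between two feature distributions under a kernel, and an additive correction is chosen to shrink that distance. The sketch below is a minimal illustration, not LDC itself; it uses a grid search over a constant shift where LDC trains correction layers by gradient descent, and the function names are assumptions made for the example.

```python
import math

def rbf(u, v, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy
    between two samples of feature vectors."""
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / len(X) ** 2
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy

def fit_additive_correction(source, target, candidate_shifts):
    """Pick the constant additive correction (from a candidate grid)
    that minimizes the MMD between source and corrected target features."""
    def corrected(c):
        return [[t + c for t in row] for row in target]
    return min(candidate_shifts, key=lambda c: mmd2(source, corrected(c)))

# Toy example: target features are the source features shifted by +1.0,
# so the best additive correction is -1.0, which undoes the domain shift.
source = [[0.0], [0.1], [0.2], [0.3]]
target = [[1.0], [1.1], [1.2], [1.3]]
best = fit_additive_correction(source, target, [x / 10 for x in range(-20, 21)])
# best == -1.0
```

In LDC proper, the corrections are full layers inserted at several depths and trained jointly, but the objective at each depth has this same flavor: reduce the MMD between source and corrected target representations.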
Question answering is an important problem that aims to deliver specific answers to questions posed by humans in natural language. How to efficiently identify the exact answer with respect to a given question has become an active line of research. Previous approaches in factoid question answering tasks typically focus on modeling the semantic relevance or syntactic relationship between a given question and its corresponding answer. Most of these models suffer when a question contains very little content that is indicative of the answer. In this paper, we devise an architecture named the temporality-enhanced knowledge memory network (TE-KMN) and apply the model to a factoid question answering dataset from a trivia competition called quiz bowl. Unlike most of the existing approaches, our model encodes not only the content of questions and answers, but also the temporal cues in a sequence of ordered sentences which gradually reveal the answer. Moreover, our model collaboratively uses external knowledge for a better understanding of a given question. The experimental results demonstrate that our method achieves better performance than several state-of-the-art methods.
The generative adversarial network (GAN) is among the most exciting machine learning breakthroughs of recent years; it trains the learning model by finding the Nash equilibrium of a two-player zero-sum game. A GAN is composed of a generator and a discriminator, both trained with the adversarial learning mechanism. In this paper, we introduce and investigate the use of GANs for novelty detection. In training, the GAN learns from ordinary data. Then, on previously unseen data, the generator and the discriminator with the designed decision boundaries can both be used to separate novel patterns from ordinary patterns. The proposed GAN-based novelty detection method demonstrates competitive performance on the MNIST digit database and the Tennessee Eastman (TE) benchmark process compared with PCA-based novelty detection methods using Hotelling’s T² and squared prediction error statistics.
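The discriminator side of the scheme above reduces to a simple decision rule once training is done: samples the discriminator rates as unlikely to be "ordinary" are flagged as novel. The following is a minimal sketch of that rule only (training the GAN itself is omitted, and the stand-in discriminator below is a hypothetical toy, not a trained network):

```python
import math

def novelty_score(x, discriminator):
    """Higher score means less like the ordinary training data."""
    return 1.0 - discriminator(x)

def detect_novel(samples, discriminator, threshold=0.5):
    """Flag samples whose novelty score exceeds the decision boundary."""
    return [novelty_score(x, discriminator) > threshold for x in samples]

# Stand-in for a discriminator trained only on ordinary data near 0:
# it outputs high confidence for such points and low confidence otherwise.
toy_discriminator = lambda x: math.exp(-x * x)

flags = detect_novel([0.1, 0.2, 3.0], toy_discriminator)
# flags == [False, False, True]
```

The generator can be used analogously: a sample that the generator cannot reconstruct well (large reconstruction error) is likewise treated as novel, mirroring the role of reconstruction error in the PCA-based baselines.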
Design intelligence, namely, artificial intelligence to solve creative problems and produce creative ideas, has improved rapidly with the new generation of artificial intelligence. However, existing methods are more skillful in learning from data and have limitations in creating original ideas different from the training data. Crowdsourcing offers a promising method to produce creative designs by combining human inspiration and machines’ computational ability. We propose a crowdsourcing intelligent design method called ‘flexible crowdsourcing design’. Design ideas produced through conventional crowdsourcing design can be unreliable and inconsistent because they rely solely on selection among participants’ submissions of ideas. In contrast, the flexible crowdsourcing design method employs a cultivation procedure that integrates the ideas from crowd participants and cultivates these ideas to improve design quality at the same time. We introduce a series of studies to show how flexible crowdsourcing design can produce original design ideas consistently. Specifically, we describe the typical procedure of flexible crowdsourcing design, the refined crowdsourcing tasks, the factors that affect the idea development process, the method for calculating idea development potential, and two applications of the flexible crowdsourcing design method. Finally, we summarize the design capabilities enabled by crowdsourcing intelligent design. This method enhances the performance of crowdsourcing design and supports the development of design intelligence.
Human information processing depends mainly on billions of neurons which constitute a complex neural network, and the information is transmitted in the form of neural spikes. In this paper, we propose a spiking neural network (SNN), named MD-SNN, with three key features: (1) using receptive fields to encode spike trains from images; (2) randomly selecting partial spikes as inputs for each neuron to model the absolute refractory period of the neuron; (3) using groups of neurons to make decisions. We test MD-SNN on the MNIST data set of handwritten digits, and the results demonstrate that: (1) Different sizes of receptive fields influence classification results significantly. (2) Considering the neuronal refractory period in the SNN model, increasing the number of neurons in the learning layer could greatly reduce the training time, effectively reduce the probability of over-fitting, and improve the accuracy by 8.77%. (3) Compared with other SNN methods, MD-SNN achieves better classification performance; compared with the convolutional neural network (CNN), MD-SNN maintains flip and rotation invariance (the accuracy can remain at 90.44% on the test set), and it is more suitable for small-sample learning (the accuracy can reach 80.15% for 1000 training samples, which is 7.8 times that of the CNN).
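Feature (1) above, encoding images into spike trains through receptive fields, can be illustrated with a simple latency code. The sketch below is an assumption-laden toy, not the MD-SNN encoder: each non-overlapping patch plays the role of a receptive field, its mean intensity is computed, and brighter patches are assigned earlier spike times.

```python
def receptive_field_encode(image, size=2, t_max=100):
    """Average each size x size patch (a toy receptive field) and
    convert intensity to a spike time: brighter patches fire earlier
    (latency coding). `image` holds intensities in [0, 1]."""
    h, w = len(image), len(image[0])
    spike_times = []
    for i in range(0, h - size + 1, size):
        row = []
        for j in range(0, w - size + 1, size):
            patch = [image[i + di][j + dj]
                     for di in range(size) for dj in range(size)]
            intensity = sum(patch) / len(patch)
            row.append(round(t_max * (1.0 - intensity)))
        spike_times.append(row)
    return spike_times

img = [[1.0, 1.0, 0.0, 0.0],
       [1.0, 1.0, 0.0, 0.0],
       [0.5, 0.5, 0.0, 0.0],
       [0.5, 0.5, 0.0, 0.0]]
# The bright top-left field fires at t=0, the half-bright field at t=50,
# and the dark fields at t=100:
# receptive_field_encode(img) == [[0, 100], [50, 100]]
```

Real SNN encoders typically use overlapping, center-surround receptive fields and emit full spike trains rather than a single first-spike time, but the intensity-to-latency mapping shown here is the common underlying idea.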