Line drawings, as a concise visual form, can be recognized by infants and even chimpanzees. Recently, how the visual system processes line drawings has attracted increasing attention from psychology, cognitive science, and computer science. Neuroscientific studies have revealed that line drawings elicit neural activity similar to that evoked by color photographs, which offers insights into how to process big media data efficiently. In this paper, we present a comprehensive survey of line drawing studies, covering the cognitive mechanisms of visual perception, computational models in computer vision, and intelligent processing in diverse media applications. Major debates, challenges, and solutions that have been addressed over the years are discussed. Finally, some of the ensuing challenges in line drawing studies are outlined.
Coalition logic (CL) enables us to model strategic abilities and to specify what a group of agents can achieve regardless of what the other agents do. However, some rational mental attitudes of agents are beyond the scope of CL, such as the well-known beliefs, desires, and intentions (BDI), an important and useful epistemic notion that has spawned a substantial amount of work in multi-agent systems. In this paper, we introduce a first-order coalition BDI (FCBDI) logic for multi-agent systems, which provides a semantic glue that allows the formal embedding and interaction of BDI, coalition, and temporal operators in a first-order language. We further introduce a semantic model based on the interpreted system model and present an axiomatic system that is proven sound and complete with respect to the semantics. Finally, it is shown that the model-checking problem for FCBDI over finite structures is PSPACE-complete.
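To illustrate the kind of sentence such a language admits, here is an invented formula (not taken from the paper) that nests a coalition operator, a temporal next operator, a belief operator, and first-order quantification:

```latex
% Illustrative only: "coalition C can ensure that, at the next moment,
% agent i believes that every request x satisfying \varphi is granted".
\[
  \langle\!\langle C \rangle\!\rangle \,\bigcirc\,
  \mathrm{B}_i \,\forall x \,\bigl(\varphi(x) \rightarrow \mathit{granted}(x)\bigr)
\]
```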
In differential evolution (DE), the salient feature lies in its mutation mechanism, which distinguishes it from other evolutionary algorithms. Generally, in most DE algorithms, the parents for mutation are chosen randomly from the current population. Hence, all vectors in the population have an equal chance of being selected as parents, with no selective pressure at all. In this way, the information in the population cannot be fully exploited to guide the search. To alleviate this drawback and improve the performance of DE, we present a new parent selection method that chooses individuals for mutation by utilizing the population information effectively. The proposed method, referred to as fitness-and-position based selection (FPS), combines the fitness and position information of the population to select parents for mutation in DE. To evaluate the effectiveness of FPS, it is applied to the original DE algorithms, as well as several DE variants, for numerical optimization. Experimental results on a suite of benchmark functions indicate that FPS is able to enhance the performance of most of the DE algorithms studied. Compared with other selection methods, FPS is also shown to be more effective at utilizing population information to guide the search of DE.
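The abstract does not give the exact FPS rule, so the following Python sketch is only an illustration of the idea, assuming that fitness ranks and position ranks (distance to the current best vector) are summed into selection weights; the function name and weighting scheme are hypothetical:

```python
import numpy as np

def fps_select(pop, fitness, n_parents, rng=None):
    """Illustrative fitness-and-position based parent selection (FPS).

    Assumed combination rule: better fitness and greater distance from
    the current best vector (more positional diversity) both raise the
    selection weight. Minimization is assumed.
    """
    rng = rng or np.random.default_rng()
    n = len(pop)
    best = pop[np.argmin(fitness)]
    fit_rank = np.argsort(np.argsort(fitness))      # 0 = best fitness
    dist = np.linalg.norm(pop - best, axis=1)
    pos_rank = np.argsort(np.argsort(-dist))        # 0 = farthest from best
    weight = (n - fit_rank) + (n - pos_rank)        # higher = more likely
    prob = weight / weight.sum()
    return rng.choice(n, size=n_parents, replace=False, p=prob)
```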
The rapid development of online services and the accompanying information overload have inspired the fast development of recommender systems, among which collaborative filtering algorithms and model-based recommendation approaches are widely exploited. For instance, matrix factorization (MF) has demonstrated notable success in assisting internet users in finding information of interest. These existing models focus on predicting users' ratings on unknown items, and performance is usually evaluated by the root mean square error (RMSE) metric. However, good performance in terms of RMSE does not always guarantee good ranking performance. Therefore, in this paper, we advocate treating recommendation as a ranking problem. Normalized discounted cumulative gain (NDCG) is chosen as the optimization target when evaluating ranking accuracy. Specifically, we present three ranking-oriented recommender algorithms: NSMF, AdaMF, and AdaNSMF. NSMF builds an NDCG-approximated loss function for matrix factorization. AdaMF adaptively combines component MF recommenders using a boosting method. To combine the advantages of both algorithms, we propose AdaNSMF, a hybrid of NSMF and AdaMF, and show its superiority in both ranking accuracy and model generalization. In addition, we compare our proposed approaches with state-of-the-art recommendation algorithms. The comparison studies confirm the advantage of our proposed approaches.
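For reference, one common formulation of NDCG@k (exponential gain with a logarithmic position discount) can be computed as follows; this is the standard metric definition, not the paper's loss approximation:

```python
import numpy as np

def ndcg_at_k(relevances, k):
    """NDCG@k for one user.

    `relevances` lists the true relevance of the items in the order
    the recommender ranked them; the ideal ordering sorts them in
    descending order of relevance.
    """
    rel = np.asarray(relevances, dtype=float)[:k]
    discount = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = np.sum((2.0 ** rel - 1.0) * discount)
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1.0) * discount)
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=5))   # ~0.876
```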
Users of the internet often wish to follow certain news events, and the interests of these users often overlap. General search engines (GSEs) cannot be used to achieve this task due to incomplete coverage and lack of freshness. Instead, a broker is used to regularly query the built-in search engines (BSEs) of news and social media sites. Each user defines an event profile consisting of a set of query rules called event rules (ERs). To ensure that queries match the semantics of the BSEs, ERs are transformed into disjunctive normal form and separated into conjunctive clauses (atomic event rules, AERs). Processing all AERs on the BSEs is slow and can violate query submission rate limits. Accordingly, the set of AERs is reduced by eliminating AERs that are duplicates of, or logically contained by, other AERs. Five types of event are selected for experimental comparison and analysis: natural disasters, accident disasters, public health events, social security events, and negative events of public servants. Using 12 BSEs, 85 ERs for the five types of events are defined by five users. The experimental comparison covers three aspects: the event rule reduction ratio, the number of collected events, and the number of related events. The experimental results show that event rule reduction effectively enhances the efficiency of crawling.
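As an illustration of the containment reduction, suppose each AER is modeled as a set of keywords whose conjunction must match (a simplifying assumption for this sketch). A clause whose keyword set strictly contains another clause's is more specific, so its results are already covered by the more general clause and it can be dropped:

```python
def reduce_aers(aers):
    """Drop duplicate AERs and AERs logically contained by another.

    Each AER is a frozenset of keywords that must all occur (a
    conjunction). If clause b's keywords are a proper subset of
    clause a's, then b is more general and its query already covers
    a's results, so a is redundant.
    """
    unique = {frozenset(clause) for clause in aers}   # removes duplicates
    return [a for a in unique if not any(b < a for b in unique)]

rules = [{"flood"}, {"flood", "rescue"}, {"earthquake"}, {"flood"}]
print(reduce_aers(rules))   # keeps {'flood'} and {'earthquake'}
```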
Because the labor needed to manually label a huge training sample set is usually not available, hyperspectral image classification often suffers from a lack of labeled training samples. At the same time, hyperspectral data, represented in a large number of bands, are usually highly correlated. In this paper, to overcome the small-sample problem in hyperspectral image classification, the correlation among spectral bands is fully utilized to generate multiple new sub-samples from each original sample. The number of labeled training samples is thus increased several-fold. Experimental results demonstrate that the proposed method has a clear advantage when the number of labeled samples is small.
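The exact sub-sampling scheme is not spelled out in the abstract; one plausible sketch, assuming bands are split by interleaving so that each subset still spans the full spectrum (neighboring bands being highly correlated), is:

```python
import numpy as np

def make_subsamples(pixel, n_groups):
    """Split one labeled pixel (a 1-D vector of band reflectances)
    into n_groups sub-samples by taking every n_groups-th band.
    Because neighboring bands are highly correlated, each decimated
    sub-sample approximates the full spectrum and inherits the
    original label."""
    return [pixel[g::n_groups] for g in range(n_groups)]

pixel = np.random.rand(200)          # one sample with 200 bands
subs = make_subsamples(pixel, 4)     # four 50-band sub-samples, same label
```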
Non-negative matrix factorization (NMF) is a popular feature encoding method for image understanding due to its non-negative representation, but the learnt basis images are not always local because its objective lacks explicit constraints. Various algebraic or geometric local constraints have hence been proposed to shape the behaviour of the original NMF. Such constraints are usually rigid in the sense that they have to be specified beforehand instead of being learned from the data. In this paper, we propose a flexible spatial constraint method for NMF learning based on factor analysis. In particular, to learn the local spatial structure of the images, we apply a series of transformations, such as orthogonal rotation and thresholding, to the factor loading matrix obtained through factor analysis. We then map the transformed loading matrix into a Laplacian matrix and incorporate it into a max-margin non-negative matrix factorization framework as a penalty term, aiming to learn a representation space that is non-negative, discriminative, and local-structure-preserving. We verify the feasibility and effectiveness of the proposed method on several real-world datasets with encouraging results.
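A minimal sketch of the constraint construction, assuming scikit-learn's FactorAnalysis with varimax rotation, hard thresholding of the loadings, and a simple affinity-based mapping to a graph Laplacian (the paper's exact mapping is not given in the abstract):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def spatial_laplacian(X, n_factors, thresh=0.3):
    """X: (n_images, n_pixels). Returns a pixel-level graph Laplacian
    built from the rotated, thresholded factor loading matrix; pixels
    that load on the same factors become strongly connected."""
    fa = FactorAnalysis(n_components=n_factors, rotation="varimax").fit(X)
    load = fa.components_.T                         # (n_pixels, n_factors)
    load = np.where(np.abs(load) >= thresh, load, 0.0)   # thresholding
    W = np.abs(load @ load.T)                       # affinity between pixels
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W               # L = D - W
```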
Graphs have been widely used for complex data representation in many real applications, such as social networks, bioinformatics, and computer vision. Therefore, graph similarity join has become imperative for integrating noisy and inconsistent data from multiple data sources. The edit distance is commonly used to measure the similarity between graphs, and the graph similarity join problem studied in this paper is based on graph edit distance constraints. To accelerate the similarity join based on graph edit distance, in this paper, we use a preprocessing strategy to remove mismatching graph pairs with significant differences. Then a novel method of building an index for each graph is proposed: the nodes reachable within k hops of each key node are grouped with structure conservation, yielding the k-hop tree based indexing method. For each candidate pair, we propose a similarity computation algorithm with boundary filtering, which achieves good efficiency and effectiveness. Experiments on real and synthetic graph databases confirm that our method achieves good join quality in graph similarity join. Moreover, the join process can be finished in polynomial time.
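As an example of the kind of cheap pre-filter used to discard clearly mismatching pairs, the node-label multiset difference gives a well-known lower bound on graph edit distance; this standard bound stands in for the paper's own filter, whose details the abstract omits:

```python
from collections import Counter

def label_lower_bound_filter(labels1, labels2, tau):
    """Prune a graph pair when the node-label lower bound on the edit
    distance already exceeds the threshold tau. Every relabeling fixes
    one surplus label on each side, and every node insertion/deletion
    fixes one, so the larger of the two one-sided multiset differences
    is a lower bound on the graph edit distance."""
    c1, c2 = Counter(labels1), Counter(labels2)
    lb = max(sum((c1 - c2).values()), sum((c2 - c1).values()))
    return lb <= tau        # True: keep as candidate; False: prune

print(label_lower_bound_filter("AAB", "ABC", tau=1))   # True (bound = 1)
```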
The skyline-join operator, as an important variant of the skyline, plays an important role in multi-criteria decision-making problems. However, as the data scale increases, previous methods for skyline-join queries cannot be applied to new applications. Therefore, in this paper, we make the first attempt to propose a scalable method for processing skyline-join queries in distributed databases. First, a tailored distributed framework is presented to facilitate the computation of skyline-join queries. Second, the distributed skyline-join query algorithm (DSJQ) is designed to process skyline-join queries. DSJQ contains two phases. In the first phase, two filtering strategies are used to filter out unpromising tuples from the original tables. The remaining tuples are transmitted to the corresponding data nodes according to a partition function, which guarantees that tuples with the same join value are transferred to the same node. In the second phase, we design a scheduling plan based on rotations to calculate the final skyline-join result. The scheduling plan ensures that calculations are assigned equally to all the data nodes and that the calculations on each data node can be processed in parallel without creating a bottleneck node. Finally, the effectiveness of DSJQ is evaluated through a series of experiments.
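The two coordination mechanisms can be sketched as follows; the abstract fixes only their guarantees (co-location of equal join values, even load without a bottleneck node), so the hash scheme and the exact rotation pairing below are assumptions:

```python
def partition_node(join_value, n_nodes):
    """Phase-one partition function: equal join values hash to the
    same node, so every joinable pair of tuples meets on one node."""
    return hash(join_value) % n_nodes

def rotation_schedule(n_nodes):
    """Phase-two rotation plan (illustrative): in round r, node i
    compares its local results against the partition originally held
    by node (i + r) % n_nodes. Each round gives every node exactly one
    task, so the load is even and no node becomes a bottleneck."""
    return [[(i, (i + r) % n_nodes) for i in range(n_nodes)]
            for r in range(n_nodes)]

for r, tasks in enumerate(rotation_schedule(3)):
    print(f"round {r}: {tasks}")
# round 0: [(0, 0), (1, 1), (2, 2)]
# round 1: [(0, 1), (1, 2), (2, 0)]
# round 2: [(0, 2), (1, 0), (2, 1)]
```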
Together with the big data movement, many organizations collect their own big data and build distinctive applications. In order to provide smart services upon big data, massive, variable data should be well linked and organized to form a Data Ocean, which specifically emphasizes the deep exploration of the relationships among unstructured data to support smart services. Currently, almost all of these applications have to deal with unstructured data by integrating various analysis and search techniques upon massive storage and processing infrastructure at the application level, which greatly increases the difficulty and cost of application development.
This paper presents D-Ocean, an unstructured data management system for the data ocean environment. D-Ocean has an open and scalable architecture consisting of a core platform, pluggable components, and auxiliary tools. It exploits a unified storage framework to store data in different kinds of data stores, integrates batch and incremental processing mechanisms to process unstructured data, and provides a combined search engine to conduct compound queries. Furthermore, a so-called RAISE process model is proposed to support the whole process of Repository, Analysis, Index, Search, and Environment modeling, which can greatly simplify application development. The experiments and use cases in production demonstrate the efficiency and usability of D-Ocean.
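The D-Ocean interfaces are not spelled out in the abstract, so the following is only a toy sketch of how RAISE-style stages could chain together; all names are hypothetical, and Environment modeling is omitted:

```python
from typing import Callable, Dict, List, Set

class RaisePipeline:
    """Toy sketch of RAISE stages: Repository (raw storage), Analysis
    (pluggable extractors), Index (inverted index), and Search."""

    def __init__(self, analyzers: List[Callable[[bytes], List[str]]]):
        self.repository: Dict[str, bytes] = {}   # Repository
        self.analyzers = analyzers               # Analysis plugins
        self.index: Dict[str, Set[str]] = {}     # Index

    def ingest(self, doc_id: str, blob: bytes) -> None:
        self.repository[doc_id] = blob
        for analyze in self.analyzers:           # batch or incremental
            for term in analyze(blob):
                self.index.setdefault(term, set()).add(doc_id)

    def search(self, term: str) -> Set[str]:     # Search
        return self.index.get(term, set())

p = RaisePipeline([lambda b: b.decode().lower().split()])
p.ingest("d1", b"Data Ocean links unstructured data")
print(p.search("unstructured"))   # {'d1'}
```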
Privacy preservation has recently received considerable attention in location-based mobile services. Many location cloaking approaches focus on identity and location protection, but few algorithms pay attention to preventing sensitive information disclosure through query semantics. Under personalized privacy requirements, all the queries in a cloaking set may, from some user's point of view, be sensitive, and such users regard their privacy as breached. This attack is called the sensitivity homogeneity attack. We show that none of the existing location cloaking approaches can effectively resolve this problem over road networks. We propose a (K, L, P)-anonymity model and a personalized privacy-protection cloaking algorithm over road networks, aiming to protect the identity, location, and sensitive information of each user. The main idea of our method is to first partition users into groups according to their anonymity requirements. Unsafe groups are then adjusted by inserting relaxed conservative users in view of the sensitivity requirements. Finally, the segments covered by each group are published to protect location information. The efficiency and effectiveness of the method are validated by a series of carefully designed experiments. The experimental results also show that the price paid for defending against sensitivity homogeneity attacks is small.
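The precise semantics of K, L, and P are defined in the paper; as a hedged illustration only, one plausible reading (all parameter roles below are assumptions) is: a cloaking group is safe if it contains at least K users, covers at least L road segments, and, for every member, at most a fraction P of the group's queries are sensitive from that member's point of view:

```python
def group_is_safe(queries, sensitive_topics, K, L, P):
    """queries: list of dicts with 'user', 'segment', 'topic'.
    sensitive_topics: per-user set of topics that user deems sensitive.
    Returns True if the cloaking group satisfies the assumed
    (K, L, P) reading described above."""
    users = {q["user"] for q in queries}
    segments = {q["segment"] for q in queries}
    if len(users) < K or len(segments) < L:
        return False
    for u in users:
        bad = sum(q["topic"] in sensitive_topics.get(u, set())
                  for q in queries)
        if bad / len(queries) > P:    # too homogeneous for user u
            return False
    return True
```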