Collections

Natural Language Processing
A selection of quality articles in the field of Natural Language Processing
  • RESEARCH ARTICLE
    Jingjing WEI, Xiangwen LIAO, Houdong ZHENG, Guolong CHEN, Xueqi CHENG
    Frontiers of Computer Science, 2018, 12(4): 714-724. https://doi.org/10.1007/s11704-016-6163-5

    This study addresses the problem of Chinese microblog opinion retrieval, which aims to retrieve opinionated Chinese microblog posts relevant to a target specified by a user query. Existing lexicon-based approaches use publicly available online sentiment resources to rank sentiment words based on document features. However, these approaches cannot be applied effectively to microblogs, which are typical user-generated content with valuable contextual information: “user–user” interpersonal interactions and “user–post/comment” intrapersonal interactions. This contextual information is very helpful in estimating the strength of sentiment words more accurately. In this study, we integrate the social contextual relationships among users, posts/comments, and sentiment words into a mutual reinforcement model and propose a unified three-layer heterogeneous graph, on which a random walk sentiment word weighting algorithm is presented to measure the opinion strength of the sentiment words. Furthermore, the weights of sentiment words are incorporated into a lexicon-based model for Chinese microblog opinion retrieval. Comparative experiments are conducted on a Chinese microblog corpus, and the results show that our proposed mutual reinforcement model achieves significant improvement over previous methods.

  • RESEARCH ARTICLE
    Weimin WANG, Dan ZHOU
    Frontiers of Computer Science, 2018, 12(1): 135-145. https://doi.org/10.1007/s11704-016-5415-8

    The problem of spam short message (SMS) recognition involves many aspects of natural language processing. A good solution can not only improve people's mobile experience but also advance the analysis of short texts in current mobile applications, such as Webchat and microblogs. As spam SMSes are characterized by sparsity, transformation, and a real-time nature, we propose three methods at different levels, i.e., recognition based on symbolic features, recognition based on text similarity, and recognition based on pattern matching. By combining these methods, we obtain a multi-level approach to spam SMS recognition. To enrich the pattern base while reducing manual labor and time, we propose a quasi-pattern learning method, which utilizes quasi-pattern matching results in the pattern matching process. The method can learn many interesting and new patterns from the SMS corpus. Finally, a comprehensive analysis indicates that our spam SMS recognition approach achieves a precision rate as high as 95.18% and a recall rate of 95.51%.

  • RESEARCH ARTICLE
    Zhongqing WANG, Shoushan LI, Guodong ZHOU
    Frontiers of Computer Science, 2017, 11(6): 1085-1097. https://doi.org/10.1007/s11704-016-5088-3

    Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack consistent organization given the large amount of available information. Therefore, it is always a challenge for people to quickly find desired information in them. In this paper, we address the task of personal profile summarization by leveraging both textual information and social connection information in social networks under both unsupervised and supervised learning paradigms. Here, using social connection information is motivated by the intuition that people with similar academic, business, or social backgrounds (e.g., co-major, co-university, and co-corporation) tend to have similar experiences and should have similar summaries. For unsupervised learning, we propose a collective ranking approach, called SocialRank, to combine textual information in an individual profile and social context information from relevant profiles in generating a personal profile summary. For supervised learning, we propose a collective factor graph model, called CoFG, to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large dataset from LinkedIn.com demonstrates the usefulness of social connection information in personal profile summarization and the effectiveness of our proposed unsupervised and supervised learning approaches.

  • REVIEW ARTICLE
    Yiqun LIU, Chao WANG, Min ZHANG, Shaoping MA
    Frontiers of Computer Science, https://doi.org/10.1007/s11704-017-6518-6

    Modern search engines record user interactions and use them to improve search quality. In particular, user click-through has been successfully used to improve click-through rate (CTR), Web search ranking, and query recommendations and suggestions. Although click-through logs can provide implicit feedback on users’ click preferences, deriving accurate absolute relevance judgments is difficult because of click noise and behavior biases. Previous studies showed that user clicking behaviors are biased by many factors such as “position” (a user’s attention decreases from top to bottom) and “trust” (Web site reputations affect a user’s judgment). To address these problems, researchers have proposed several behavior models (usually referred to as click models) to describe users’ practical browsing behaviors and to obtain an unbiased estimation of result relevance. In this study, we review recent efforts to construct click models for better search ranking and propose a novel convolutional neural network architecture for building click models. Compared to traditional click models, our model not only considers user behavior assumptions as input signals but also uses the content and context information of search engine result pages. In addition, our model uses parameters from traditional click models to restrict the meaning of some outputs in our model’s hidden layer. Experimental results show that the proposed model can achieve considerable improvement over state-of-the-art click models based on the evaluation metric of click perplexity.
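
For readers unfamiliar with the click perplexity metric named in the abstract above, the following is a minimal sketch of how it is commonly computed for a single rank position. It is not taken from the article: the function name `click_perplexity` and its arguments are hypothetical, and the clamping constant `eps` is an assumption added only to avoid taking the logarithm of zero. A value of 1.0 corresponds to perfect prediction, and a model that always predicts 0.5 scores 2.0.

```python
import math


def click_perplexity(predicted_probs, clicks, eps=1e-12):
    """Perplexity of predicted click probabilities at one rank position.

    predicted_probs: predicted click probability for each session (hypothetical input)
    clicks: observed clicks for the same sessions, 1 = clicked, 0 = not clicked
    eps: assumed clamping constant that keeps log2 away from zero
    """
    if len(predicted_probs) != len(clicks) or not clicks:
        raise ValueError("inputs must be non-empty and of equal length")

    log_likelihood = 0.0
    for q, c in zip(predicted_probs, clicks):
        q = min(max(q, eps), 1.0 - eps)  # clamp into the open interval (0, 1)
        # Add log2 of the probability the model assigned to the observed outcome.
        log_likelihood += math.log2(q) if c else math.log2(1.0 - q)

    # Perplexity is 2 raised to the negative mean log-likelihood.
    return 2 ** (-log_likelihood / len(clicks))


if __name__ == "__main__":
    # A model that is always unsure scores higher (worse) than a confident, correct one.
    print(click_perplexity([0.5, 0.5], [1, 0]))
    print(click_perplexity([0.9, 0.1], [1, 0]))
```

Lower values are better. An overall score is usually reported by averaging the per-position perplexities across rank positions.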

  • RESEARCH ARTICLE
    Wayne Xin ZHAO, Chen LIU, Ji-Rong WEN, Xiaoming LI
    Frontiers of Computer Science, 2017, 11(5): 852-862. https://doi.org/10.1007/s11704-016-5144-z

    Detecting and using bursty patterns to analyze text streams has been one of the fundamental approaches in many temporal text mining applications. So far, most existing studies have focused on developing methods to detect bursty features based purely on term frequency changes. Few have taken the semantic contexts of bursty features into consideration, and as a result the detected bursty features may not always be interesting and can be hard to interpret. In this article, we propose to model the contexts of bursty features using a language modeling approach. We propose two methods to estimate the context language models based on sentence-level context and document-level context. We then propose a novel topic diversity-based metric using the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of news articles, we quantitatively show that the proposed context language models for bursty features can effectively help rank bursty features based on their newsworthiness and assign meaningful tags to annotate them. We also use two example text mining applications to qualitatively demonstrate the usefulness of bursty feature ranking and tagging.