Entity attribute discovery and clustering from online reviews

Qingliang MIAO, Qiudan LI, Daniel ZENG, Yao MENG, Shu ZHANG, Hao YU

Front. Comput. Sci. ›› 2014, Vol. 8 ›› Issue (2) : 279-288. DOI: 10.1007/s11704-014-3043-8
RESEARCH ARTICLE

Abstract

The rapid growth of user-generated content (UGC) provides a rich source for reputation management of entities, products, and services. Taking online product reviews as a concrete example, customers usually give opinions on multiple attributes of a product, so the challenge is to automatically extract and cluster the attributes that are mentioned. In this paper, we investigate efficient attribute extraction models using a semi-supervised approach. Specifically, we formulate attribute extraction as a sequence labeling task and design a bootstrapped scheme that trains the extraction models by leveraging a small quantity of labeled reviews and a larger number of unlabeled reviews. In addition, we propose a clustering by committee (CBC) approach to cluster attributes according to their semantic similarity. Experimental results on real-world datasets show that the proposed approach is effective.
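
As an illustrative sketch only (the abstract does not specify the labeling scheme), the sequence labeling formulation can be pictured with a BIO-style tag set. The tag names `B-ATTR`, `I-ATTR`, and `O`, the function `extract_attributes`, and the example sentence are all assumptions introduced here for illustration; they are not taken from the paper. The sketch shows how per-token labels, once predicted by some extraction model, would be turned back into attribute phrases.

```python
from typing import List


def extract_attributes(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect attribute phrases from hypothetical BIO tags.

    B-ATTR starts an attribute phrase, I-ATTR continues it, O is outside.
    An I-ATTR with no open phrase is ignored.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    attributes: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-ATTR":
            # A new phrase begins; close any phrase that is still open.
            if current:
                attributes.append(" ".join(current))
            current = [token]
        elif tag == "I-ATTR" and current:
            # Extend the phrase that is currently open.
            current.append(token)
        else:
            # Outside any attribute (or a stray I-ATTR): close the open phrase.
            if current:
                attributes.append(" ".join(current))
            current = []
    if current:
        attributes.append(" ".join(current))
    return attributes


# Hand-labeled toy sentence (not data from the paper).
tokens = ["The", "battery", "life", "is", "great", "but", "the", "screen", "is", "dim"]
tags = ["O", "B-ATTR", "I-ATTR", "O", "O", "O", "O", "B-ATTR", "O", "O"]
print(extract_attributes(tokens, tags))
```

In a bootstrapped setting such as the one the abstract describes, tags of this kind would come from a model trained on the small labeled set and then applied to unlabeled reviews; the decoding step above would stay the same regardless of which model produces the tags.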

Keywords

opinion mining / attribute extraction / attribute clustering

Cite this article

Qingliang MIAO, Qiudan LI, Daniel ZENG, Yao MENG, Shu ZHANG, Hao YU. Entity attribute discovery and clustering from online reviews. Front. Comput. Sci., 2014, 8(2): 279-288. https://doi.org/10.1007/s11704-014-3043-8

RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg