Entity attribute discovery and clustering from online reviews
Qingliang MIAO, Qiudan LI, Daniel ZENG, Yao MENG, Shu ZHANG, Hao YU
The rapidly growing volume of user-generated content (UGC) is a rich source for reputation management of entities, products, and services. Taking online product reviews as a concrete example, customers usually give opinions on multiple attributes of a product, so the challenge is to automatically extract and cluster the attributes that are mentioned. In this paper, we investigate efficient attribute extraction models using a semi-supervised approach. Specifically, we formulate attribute extraction as a sequence labeling task and design a bootstrapping scheme that trains the extraction models from a small quantity of labeled reviews and a larger number of unlabeled reviews. In addition, we propose a clustering by committee (CBC) approach to cluster attributes according to their semantic similarity. Experimental results on real-world datasets show that the proposed approach is effective.
opinion mining / attribute extraction / attribute clustering
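To make the sequence labeling formulation concrete, the following minimal sketch tags each review token as the beginning of, inside of, or outside of an attribute mention. The BIO tagging scheme, the `ATTR` label, the function names, and the example sentence are illustrative assumptions and are not taken from the paper, which does not specify its label set in this abstract.

```python
from typing import List, Tuple


def encode_bio(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Turn half-open [start, end) attribute spans into one tag per token.

    Assumed label set: B-ATTR (first token of an attribute),
    I-ATTR (continuation), O (not part of an attribute).
    """
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B-ATTR"
        for i in range(start + 1, end):
            tags[i] = "I-ATTR"
    return tags


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Recover attribute phrases from a tag sequence (e.g. a tagger's output)."""
    attributes: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-ATTR":
            # A new attribute starts; flush any attribute still open.
            if current:
                attributes.append(" ".join(current))
            current = [token]
        elif tag == "I-ATTR" and current:
            current.append(token)
        else:
            # "O" (or a stray I-ATTR with nothing open) closes the attribute.
            if current:
                attributes.append(" ".join(current))
            current = []
    if current:
        attributes.append(" ".join(current))
    return attributes


# Hypothetical review sentence with two attribute mentions.
tokens = "The battery life is great but the screen scratches easily".split()
tags = encode_bio(tokens, [(1, 3), (7, 8)])
print(list(zip(tokens, tags)))
print(decode_bio(tokens, tags))
```

Under this framing, a sequence model trained on the labeled reviews predicts one tag per token, and `decode_bio` converts its predictions on unlabeled reviews back into candidate attribute phrases, which is the kind of output a bootstrapping loop could feed back into training.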