WordNet-based lexical semantic classification for text corpus analysis

Jun Long, Lu-da Wang, Zu-de Li, Zu-ping Zhang, Liu Yang

Journal of Central South University, 2015, Vol. 22, Issue (5): 1833-1840.

DOI: 10.1007/s11771-015-2702-8
Article


Abstract

Many text classification methods rely on statistical term measures for document representation. Such representations ignore the lexical semantic content of terms and the mutual information distilled from it, which leads to classification errors. This work proposes a document representation method, the WordNet-based lexical semantic VSM, to address the problem. Using WordNet, the method constructs a data structure of semantic-element information to characterize lexical semantic content and adjusts EM modeling to disambiguate word stems. Then, in the lexical-semantic space of the corpus, a lexical-semantic eigenvector is built for each document by calculating the weight of each synset, and it is applied to the widely recognized NWKNN algorithm. On the Reuters-21578 text corpus and a version of it adjusted by lexical replacement, the experimental results show that the lexical-semantic eigenvector outperforms the TF-IDF-based term-statistic eigenvector in both F1 measure and dimension scale. Because document representations take the form of eigenvectors, the method has wide prospects for classification applications in text corpus analysis.
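
As a rough illustration of why a synset-keyed representation can be more robust to lexical replacement than a term-keyed one, the following minimal sketch compares the two on a pair of toy documents. It is not the authors' implementation: the lexicon `TOY_SYNSETS`, the synset labels, the helper names, and the use of raw counts as weights are all assumptions made for this example, and no word sense disambiguation is performed.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration): each word maps
# to a single synset label. A real system would query WordNet and
# disambiguate word senses before assigning a synset.
TOY_SYNSETS = {
    "car": "car.n.01",
    "automobile": "car.n.01",
    "buy": "buy.v.01",
    "purchase": "buy.v.01",
    "price": "price.n.01",
    "cost": "price.n.01",
}


def term_vector(tokens):
    """Bag-of-words counts keyed by surface term."""
    return Counter(tokens)


def synset_vector(tokens, lexicon=TOY_SYNSETS):
    """Counts keyed by synset label; unknown words keep their surface form.

    Raw counts stand in for the synset weights here (a simplification).
    """
    return Counter(lexicon.get(token, token) for token in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# doc_b is doc_a with every word replaced by a synonym.
doc_a = "buy car price".split()
doc_b = "purchase automobile cost".split()

term_sim = cosine(term_vector(doc_a), term_vector(doc_b))
synset_sim = cosine(synset_vector(doc_a), synset_vector(doc_b))

# Number of dimensions needed to represent both documents together.
term_dims = len(term_vector(doc_a + doc_b))
synset_dims = len(synset_vector(doc_a + doc_b))

print(f"term-based similarity:   {term_sim:.3f} over {term_dims} dimensions")
print(f"synset-based similarity: {synset_sim:.3f} over {synset_dims} dimensions")
```

In this toy case the two documents share no surface terms, so the term-keyed vectors are orthogonal, while the synset-keyed vectors coincide and use fewer dimensions. Real corpora are less clear-cut, since most words belong to several synsets and must be disambiguated first.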

Keywords

document representation / lexical semantic content / classification / eigenvector

Cite this article

Jun Long, Lu-da Wang, Zu-de Li, Zu-ping Zhang, Liu Yang. WordNet-based lexical semantic classification for text corpus analysis. Journal of Central South University, 2015, 22(5): 1833-1840. DOI: 10.1007/s11771-015-2702-8


