Automatic patent document summarization for collaborative knowledge systems and services

Amy J.C. Trappey, Charles V. Trappey, Chun-Yi Wu

Journal of Systems Science and Systems Engineering, 2009, Vol. 18, Issue 1, pp. 71-94. DOI: 10.1007/s11518-009-5100-7

Abstract

Engineering and research teams often develop new products and technologies by referring to inventions described in patent databases. Efficient patent analysis builds R&D knowledge, shortens new product development time, increases market success, and reduces the risk of patent infringement. It is therefore beneficial to extract information from patent documents automatically and systematically, so as to improve knowledge sharing and collaboration among R&D team members. In this research, patents are summarized using an approach that combines a domain ontology with TF-IDF-based concept clustering. The ontology captures the general knowledge and core meaning of patents in a given domain. The proposed methodology then extracts, clusters, and integrates the content of a patent to derive a summary and a cluster tree diagram of its key terms. Patents from International Patent Classification (IPC) codes B25C, B25D, and B25F (power hand tools) and B24B, C09G, and H01L (chemical mechanical polishing) serve as case studies. The summaries are evaluated statistically on three measures: the compression ratio of the generated summary, the retention ratio of the ontology-based keyword extraction, and the accuracy of classifying the summaries. The results show that the ontology-based approach yields about the same compression ratio as previous non-ontology-based research, but improves the retention ratio by 11% and classification accuracy by 14% on average.
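The keyword-extraction and evaluation steps named above can be illustrated with a much-simplified sketch. It is not the authors' method. It omits the domain ontology and the concept clustering entirely. It uses plain TF-IDF weights to pick key terms and a simple key-term count to pick summary sentences. The two ratios follow common definitions from the summarization-evaluation literature, which this sketch assumes:

* compression ratio is summary length over original length, in words;
* retention ratio is the share of the document's key terms that survive in the summary.

The paper's exact formulas may differ. All function names, the stoplist, and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy stoplist; a real patent system would use a domain-tuned list.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "into", "after", "each", "in", "by", "with"}


def words(text):
    """All lowercase word tokens (used for length-based measures)."""
    return re.findall(r"[a-z]+", text.lower())


def content_terms(text):
    """Word tokens with stopwords removed (used for term weighting)."""
    return [w for w in words(text) if w not in STOPWORDS]


def tf_idf(doc_terms, corpus_term_sets):
    """TF-IDF weight of each term of one document: (count / doc length) * log(N / df).

    The document itself must be in the corpus so that every df is at least 1.
    """
    n_docs = len(corpus_term_sets)
    weights = {}
    for term, count in Counter(doc_terms).items():
        df = sum(1 for terms in corpus_term_sets if term in terms)
        weights[term] = (count / len(doc_terms)) * math.log(n_docs / df)
    return weights


def key_terms(weights, k):
    """The k highest-weighted terms, ties broken alphabetically; zero weights dropped."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, w in ranked[:k] if w > 0]


def summarize(text, keys, n_sentences):
    """Keep the n_sentences sentences containing the most key-term occurrences,
    returned in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    keyset = set(keys)
    scores = [sum(1 for t in content_terms(s) if t in keyset) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:n_sentences]
    return " ".join(sentences[i] for i in sorted(top))


def compression_ratio(summary, original):
    """Summary length divided by original length, in words."""
    return len(words(summary)) / len(words(original))


def retention_ratio(summary, keys):
    """Fraction of the original's key terms that still appear in the summary."""
    present = set(content_terms(summary))
    return sum(1 for k in keys if k in present) / len(keys)


# Hypothetical toy corpus of three patent-like snippets; doc is the one summarized.
doc = ("A stapling device drives staples into a workpiece. "
       "The magazine feeds staples to the driver. "
       "A spring returns the driver after each stroke. "
       "The housing is made of plastic.")
corpus = [doc,
          "A polishing pad planarizes a wafer. Slurry is supplied to the pad.",
          "A hammer drill rotates a bit. The housing holds a motor."]

corpus_term_sets = [set(content_terms(d)) for d in corpus]
keys = key_terms(tf_idf(content_terms(doc), corpus_term_sets), k=6)
summary = summarize(doc, keys, n_sentences=2)

print(keys)
print(summary)
print(f"compression={compression_ratio(summary, doc):.3f} "
      f"retention={retention_ratio(summary, keys):.3f}")
```

On this toy input the sketch keeps the first two sentences, which is 15 of the 29 words. It retains five of the six key terms. The one it drops, "made", is a low-value word that plain TF-IDF ranked as a key term. This shows why TF-IDF alone benefits from a domain-tuned stoplist or from domain knowledge such as an ontology.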

Keywords

Semantic knowledge service / key phrase extraction / document summarization / text mining / patent document analysis

Cite this article

Amy J.C. Trappey, Charles V. Trappey, Chun-Yi Wu. Automatic patent document summarization for collaborative knowledge systems and services. Journal of Systems Science and Systems Engineering, 2009, 18(1): 71-94. DOI: 10.1007/s11518-009-5100-7

