Automatic patent document summarization for collaborative knowledge systems and services
Amy J.C. Trappey, Charles V. Trappey, Chun-Yi Wu
Journal of Systems Science and Systems Engineering, 2009, Vol. 18, Issue 1, pp. 71–94.
Engineering and research teams often develop new products and technologies by referring to inventions described in patent databases. Efficient patent analysis builds R&D knowledge, shortens new product development time, increases market success, and reduces the risk of patent infringement. It is therefore beneficial to extract information from patent documents automatically and systematically in order to improve knowledge sharing and collaboration among R&D team members. In this research, patents are summarized using an approach that combines ontology-based and TF-IDF concept clustering. The ontology captures the general knowledge and core meaning of patents in a given domain. The proposed methodology then extracts, clusters, and integrates the content of a patent to derive a summary and a cluster tree diagram of key terms. Patents from the International Patent Classification (IPC) codes B25C, B25D, and B25F (categories for power hand tools) and B24B, C09G, and H01L (categories for chemical mechanical polishing) are used as case studies. The evaluation reports the compression ratio of the generated summaries, the retention ratio of the ontology-based keyword extraction, and the classification accuracy of the summaries. The results show that the ontology-based approach yields about the same compression ratio as previous non-ontology-based research, but yields average improvements of 11% in retention ratio and 14% in classification accuracy.
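The abstract names the building blocks (TF-IDF term weighting informed by a domain ontology, extractive summarization, and the compression and retention ratios) without giving formulas. The following is a minimal Python sketch under stated assumptions, not the authors' method. The ontology is a hypothetical flat set of terms (`TOY_ONTOLOGY`) whose TF-IDF weights get an assumed multiplier (`ONTOLOGY_BOOST`). Sentences are scored by summing the weights of the top-k keywords they contain. Compression ratio is taken as summary words over document words, and retention ratio as the share of the document's top keywords that survive in the summary. The paper's own definitions may differ, and its concept clustering and cluster tree steps are omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical toy "ontology": a flat set of domain concept terms for power hand
# tools. The paper's ontology is richer; this only shows boosting domain terms.
TOY_ONTOLOGY = {"staple", "magazine", "driver", "trigger", "housing"}
ONTOLOGY_BOOST = 2.0  # assumed multiplier applied to ontology terms

# Small illustrative stop-word list (an assumption, not from the paper).
STOP_WORDS = {"a", "an", "and", "be", "by", "each", "for", "in", "is",
              "may", "of", "out", "the", "to", "when", "with"}


def tokenize(text):
    """Lowercase word tokens (all words, including stop words)."""
    return re.findall(r"[a-z]+", text.lower())


def content_terms(text):
    """Word tokens with stop words removed."""
    return [t for t in tokenize(text) if t not in STOP_WORDS]


def split_sentences(text):
    """Naive sentence splitter on terminal punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_scores(doc_terms, corpus_term_sets):
    """TF-IDF weight of each term in one document, with ontology terms boosted."""
    tf = Counter(doc_terms)
    n_docs = len(corpus_term_sets)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for terms in corpus_term_sets if term in terms)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        weight = (count / len(doc_terms)) * idf
        if term in TOY_ONTOLOGY:
            weight *= ONTOLOGY_BOOST
        scores[term] = weight
    return scores


def summarize(doc, corpus, k_keywords=5, n_sentences=2):
    """Extractive summary: keep the sentences that carry the most keyword weight."""
    scores = tfidf_scores(content_terms(doc),
                          [set(content_terms(d)) for d in corpus])
    # Top-k keywords by weight; ties broken alphabetically for determinism.
    keywords = sorted(scores, key=lambda t: (-scores[t], t))[:k_keywords]
    sentences = split_sentences(doc)

    def sentence_score(sentence):
        words = set(tokenize(sentence))
        return sum(scores[k] for k in keywords if k in words)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: -sentence_score(sentences[i]))
    chosen = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen), keywords


def compression_ratio(summary, doc):
    """Assumed definition: summary word count / original word count."""
    return len(tokenize(summary)) / len(tokenize(doc))


def retention_ratio(summary, keywords):
    """Assumed definition: share of the document's top keywords kept in the summary."""
    words = set(tokenize(summary))
    return sum(1 for k in keywords if k in words) / len(keywords)


# Toy example; these texts are invented for illustration, not patent data.
doc = ("A stapling device includes a housing and a magazine. "
       "The magazine holds a strip of staples. "
       "A driver blade pushes each staple out of the magazine when the trigger is pulled. "
       "The device may be painted in various colors.")
corpus = [
    doc,
    "A polishing pad rotates against a wafer surface. A slurry is applied to the pad.",
    "A hammer drill includes a motor and a housing. The motor drives a chuck.",
]

summary, keywords = summarize(doc, corpus)
print(keywords)
print(summary)
print(round(compression_ratio(summary, doc), 3), retention_ratio(summary, keywords))
```

On this toy input the sentences about the staple strip and the paint color are dropped, giving a compression ratio of 24/39 (about 0.62) with all five top keywords retained. A retention ratio that high mostly reflects how small the example is. Replacing the flat term set with a structured domain ontology would change which terms are boosted and therefore which sentences survive.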
Semantic knowledge service / key phrase extraction / document summarization / text mining / patent document analysis