Patent expanded retrieval via word embedding under composite-domain perspectives

Fei WANG, Tieyun QIAN, Bin LIU, Zhiyong PENG

Front. Comput. Sci., 2019, Vol. 13, Issue 5: 1048-1061. DOI: 10.1007/s11704-018-7056-6
RESEARCH ARTICLE


Abstract

Patent prior art search uses dispersed information to retrieve all relevant documents, under strong ambiguity, from a massive patent database. This challenging task consists of patent reduction and patent expansion. Existing studies on patent reduction ignore the relevance between technical characteristics and technical domains, which results in ambiguous queries. Work on patent expansion expands terms from external resources by selecting words with either similar distribution or similar semantics; this separates the distribution of a term from its semantics. Moreover, common repositories can hardly meet the requirements of patent expansion for uncommon semantics and unusual terms. To solve these problems, we first present a novel composite-domain perspective model, which converts the technical characteristic of a query patent into a specific composite classified domain and generates aspect queries. We then implement patent expansion with double consistency by combining distribution and semantics simultaneously. We also propose training semantic vector spaces via word embedding under the specific classified domains, so as to provide domain-aware expansion resources. Finally, multiple retrieval results for the same topic are merged based on perspective weight and rank in the results. Our experimental results on CLEF-IP 2010 demonstrate that our method is very effective: it achieves about a 5.43% improvement in recall and nearly a 12.38% improvement in PRES over the state of the art. Our work also achieves the best performance balance in terms of recall, MAP, and PRES.
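
The abstract says that results from several perspectives are merged using perspective weight and rank, but it does not give the formula. As a rough illustration only, the sketch below uses weighted reciprocal-rank fusion, which is a swapped-in technique and not necessarily the authors' method. The function name `merge_perspective_results`, the constant `k`, and the example weights are all assumptions introduced here.

```python
from collections import defaultdict


def merge_perspective_results(results, weights, k=60):
    """Fuse one ranked list per perspective into a single ranking.

    results: dict mapping perspective name -> list of doc ids, best first
    weights: dict mapping perspective name -> non-negative weight
    k: smoothing constant borrowed from reciprocal-rank fusion
       (an assumption; the abstract does not specify any such value)
    """
    scores = defaultdict(float)
    for perspective, ranked_docs in results.items():
        w = weights.get(perspective, 0.0)
        for rank, doc_id in enumerate(ranked_docs, start=1):
            # A document earns more from a heavily weighted perspective
            # and from a position near the top of that perspective's list.
            scores[doc_id] += w / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical example: two perspectives with made-up patent ids and weights.
example_results = {
    "perspective_a": ["p1", "p2", "p3"],
    "perspective_b": ["p2", "p4"],
}
example_weights = {"perspective_a": 0.6, "perspective_b": 0.4}
merged = merge_perspective_results(example_results, example_weights)
print(merged)
```

In this toy input, the document returned by both perspectives accumulates score from each list, so it rises above documents that only one perspective retrieved. Any monotone function of rank could replace the reciprocal term; the choice here is purely for brevity.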

Keywords

patent retrieval / composite-domain perspective / double-consistency expansion / word embedding

Cite this article

Fei WANG, Tieyun QIAN, Bin LIU, Zhiyong PENG. Patent expanded retrieval via word embedding under composite-domain perspectives. Front. Comput. Sci., 2019, 13(5): 1048‒1061 https://doi.org/10.1007/s11704-018-7056-6


RIGHTS & PERMISSIONS

2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature