Enhanced multi-tuple extraction for materials: integrating pointer networks and augmented attention
Mengzhe Hei , Zhouran Zhang , Qingbao Liu , Yan Pan , Xiang Zhao , Yongqian Peng , Yicong Ye , Xin Zhang , Shuxin Bai
Journal of Materials Informatics, 2026, Vol. 6, Issue 1, 15.
Extracting reliable, tuple-level information from materials texts is essential for data-driven materials design, yet multi-tuple sentences remain difficult because of intertwined semantics, syntactic complexity, and sparse supervision in higher-density cases. In this study, we address these challenges by formulating information extraction as an integrated process that couples entity extraction with tuple allocation. The framework combines two modules: an entity extraction module that pairs MatSciBERT, a materials-science variant of bidirectional encoder representations from transformers (BERT), with pointer networks, and an allocation module that models inter- and intra-entity attention to enforce tuple coherence. Using the mechanical properties of multi-principal element alloys as a case study, we define the target schema and evaluate exact-match tuple accuracy. Our experiments demonstrate F1 scores of 0.96, 0.95, 0.85, and 0.75 on datasets containing one, two, three, and four tuples per sentence, respectively, and 0.85 on a randomly curated set. Ablation studies show that the allocation module is the most critical component, with inter-entity attention contributing more than intra-entity attention. Error analysis attributes the density-related performance decline mainly to semantic overlap and syntactic complexity: upstream extraction errors are more prominent under sparse supervision, while allocation errors are concentrated in structurally complex templates. This approach delivers precise, structured outputs suitable for downstream analysis and offers a domain-adaptable alternative to prompt-based large models when strict correctness is required.
AI for materials / multi-tuple extraction / MatSciBERT / attention mechanism
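The abstract reports F1 under exact-match tuple evaluation but does not spell out the computation. The following is a minimal sketch of one plausible way to score it, not the authors' evaluation code. It assumes micro-averaging over sentences and a hypothetical four-field schema (material, property, value, unit); the function name `tuple_f1` and the example values are illustrative only.

```python
from collections import Counter


def tuple_f1(predicted, gold):
    """Micro-averaged exact-match precision, recall, and F1 over tuples.

    `predicted` and `gold` are parallel lists with one entry per sentence;
    each entry is a list of tuples. A predicted tuple counts as correct
    only if every field matches a gold tuple from the same sentence.
    (Assumption: micro-averaging; the source does not state the scheme.)
    """
    tp = fp = fn = 0
    for pred_tuples, gold_tuples in zip(predicted, gold):
        pred_counts = Counter(pred_tuples)
        gold_counts = Counter(gold_tuples)
        # Multiset intersection: each gold tuple can be matched at most once.
        matched = sum((pred_counts & gold_counts).values())
        tp += matched
        fp += sum(pred_counts.values()) - matched
        fn += sum(gold_counts.values()) - matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical two-tuple sentence: one tuple is extracted correctly,
# the other has a wrong value and therefore does not count as a match.
gold = [[("alloy A", "yield strength", "1005", "MPa"),
         ("alloy A", "elongation", "20", "%")]]
pred = [[("alloy A", "yield strength", "1005", "MPa"),
         ("alloy A", "elongation", "25", "%")]]
precision, recall, f1 = tuple_f1(pred, gold)
```

Under this scoring, a tuple with a single wrong field counts as both a false positive and a missed gold tuple. That strictness is one reason scores can fall as the number of tuples per sentence grows: every additional tuple is another chance for one misallocated entity to invalidate a whole match.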