Large language models for generative information extraction: a survey

Derong XU; Wei CHEN; Wenjun PENG; Chao ZHANG; Tong XU; Xiangyu ZHAO; Xian WU; Yefeng ZHENG; Yang WANG; Enhong CHEN

doi:10.1007/s11704-024-40555-y

PDF(3347 KB)

Front. Comput. Sci. ›› 2024, Vol. 18 ›› Issue (6) : 186357. DOI: 10.1007/s11704-024-40555-y

Excellent Young Computer Scientists Forum

REVIEW ARTICLE

Large language models for generative information extraction: a survey

Derong XU¹^,² ,
Wei CHEN¹ ,
Wenjun PENG¹ ,
Chao ZHANG¹^,² ,
Tong XU¹ ,
Xiangyu ZHAO² ,
Xian WU³ ,
Yefeng ZHENG³ ,
Yang WANG⁴ ,
Enhong CHEN¹

Author information +

History +

Abstract

Information Extraction (IE) aims to extract structural knowledge from plain natural language texts. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation. As a result, numerous works have been proposed to integrate LLMs for IE tasks based on a generative paradigm. To conduct a comprehensive systematic review and exploration of LLM efforts for IE tasks, in this study, we survey the most recent advancements in this field. We first present an extensive overview by categorizing these works in terms of various IE subtasks and techniques, and then we empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs. Based on a thorough review conducted, we identify several insights in technique and promising research directions that deserve further exploration in future studies. We maintain a public repository and consistently update related works and resources on GitHub (LLM4IE repository).

Graphical abstract

Keywords

information extraction / large language models / review

Cite this article

EndNote

Ris (Procite)

Bibtex

Download citation ▾

Derong XU, Wei CHEN, Wenjun PENG, Chao ZHANG, Tong XU, Xiangyu ZHAO, Xian WU, Yefeng ZHENG, Yang WANG, Enhong CHEN. Large language models for generative information extraction: a survey. Front. Comput. Sci., 2024, 18(6): 186357 https://doi.org/10.1007/s11704-024-40555-y

1 Introduction

Information Extraction (IE) is a crucial domain in natural language processing (NLP) that converts plain text into structured knowledge (e.g., entities, relations, and events), and serves as a foundational requirement for a wide range of downstream tasks, such as knowledge graph construction [1], knowledge reasoning [2], and question answering [3]. Typical IE tasks consist of Named Entity Recognition (NER), Relation Extraction (RE) and Event Extraction (EE) [4-7]. However, performing IE tasks is inherently challenging. This is because IE involves extracting information from various sources and dealing with complex and ever-changing domain requirements [8]. Unlike traditional NLP tasks, IE encompasses a broad spectrum of objectives such as entity extraction, relationship extraction, and more. In IE, the extraction targets exhibit intricate structures where entities are presented as span structures (string structures) and relationships are represented as triple structures [4]. Additionally, in order to effectively handle different information extraction tasks, it is necessary to employ multiple independent models. These models are trained separately for each specific task, without sharing any resources. However, this approach comes with a drawback: managing a large number of information extraction models becomes costly in terms of the resources needed for construction and training, like annotated corpora.

The emergence of large language models (LLMs), such as GPT-4 [9], has significantly advanced the field of NLP, due to their extraordinary capabilities in text understanding and generation [10-12]. Pretraining LLMs using auto-regressive prediction allows them to capture the inherent patterns and semantic knowledge within text corpora [13-19]. This enhances LLMs with the capability to perform zero-shot and few-shot learning, enabling them to model various tasks consistently and serving as tools for data augmentation [20-22]. Furthermore, LLMs can serve as intelligent agents for complex task planning and execution, utilizing memory retrieval and various tools to enhance efficiency and successfully accomplish tasks [23-27]. Therefore, there has been a recent surge of interest in generative IE methods [28] that adopt LLMs to generate structural information rather than extracting structural information from plain text. These methods have been proven to be more practical in real-world scenarios compared to discriminated methods [29,30], as they can handle schemas containing millions of entities without significant performance degradation [31].

On the one hand, LLMs have attracted significant attention from researchers in exploring their potentials for various scenarios and tasks of IE. In addition to excelling in individual IE tasks, LLMs possess a remarkable ability to effectively model all IE tasks in a universal format. This is conducted by capturing inter-task dependencies with instructive prompts, and achieves consistent performance [4,5,32-36]. On the other hand, recent works have shown the outstanding generalization of LLMs to not only learn from IE training data through fine-tuning [33,37-39], but also extract information in few-shot and even zero-shot scenarios relying solely on in-context examples or instructions [40-44].

However, existing surveys [8,45,46] do not provide a comprehensive exploration of these areas for the above two groups of research works: 1) universal frameworks that cater to multiple tasks and 2) cutting-edge learning techniques for scenarios with limited training data. The community urgently needs a more in-depth analysis of how LLM can be more appropriately applied to IE tasks to improve the performance of the IE field. This is because there are still challenges and issues in applying LLM to IE in terms of learning and understanding knowledge [47]. These challenges include the misalignment between natural language output and structured form [6], hallucination problem in LLMs [48], contextual dependence, high computational resource requirements [49], difficulties in updating internal knowledge [50], etc.

In this survey, we provide a comprehensive exploration of LLMs for generative IE, as illustrated in Fig.1. To achieve this, we categorize existing methods mainly using two taxonomies: (1) a taxonomy of numerous IE subtasks, which aims to classify different types of information that can be extracted individually or uniformly, and (2) a taxonomy of IE techniques, which categorizes various novel approaches that utilize LLMs for generative IE, particularly on low-resource scenarios. In addition, we present a comprehensive review of studies that specifically focus on the application of IE techniques in various domains. And we discuss studies that aim to evaluate and analyze the performance of LLMs for IE. According to the above division, we construct a taxonomy of related studies as shown in Fig.2. We also compare several representative methods to gain deeper understanding of their potentials and limitations, and provide insightful analysis on future directions. To the best of our knowledge, this is the first survey on generative IE with LLMs.

Fig.1 LLMs have been extensively explored for generative IE. These studies encompass various IE techniques, specialized frameworks designed for a single subtask, and universal frameworks capable of addressing multiple subtasks simultaneously

Full size|PPT slide

Fig.2 Taxonomy of research in generative IE using LLMs. Some papers have been omitted due to space limitations

Full size|PPT slide

The remaining part of this survey is organized as follows: First, we introduce the definition of generative IE and the targets of all subtasks in Section 2. In Section 3, we introduce representative models for each task and universal IE, and compare their performance. Next, in Section 4, we summarize different learning techniques of LLMs for IE. Additionally, we introduce works proposed for special domains in Section 5, and present recent studies that evaluate and analyze the abilities of LLMs on IE tasks in Section 6. Moreover, we propose potential research directions for future studies in Section 7. In Section 8, we provide a comprehensive summary of the most commonly used LLMs and dataset statistics as a reference for researchers. Finally, we conclude in Section 9.

2 Preliminaries of generative IE

In this section, we provide a formal definition of discriminative and generative IE and summarize the IE subtasks, as outlined in [46]. This survey focuses primarily on the tasks of Named Entity Recognition (NER), Relation Extraction (RE), and Event Extraction (EE) [5,32], as these are tasks that receive the most attention in IE papers. Examples are shown in Fig.3.

Fig.3 Examples of different IE tasks

Full size|PPT slide

(1) For a discriminative model, the objective is to maximize the likelihood of the data. This involves considering an annotated sentence

x

and a collection of potentially overlapping triples.

t_{j} = (s, r, o)

(1)

p_{c l s} (t | x) = \prod_{(s, r, o) \in t_{j}} p ((s, r, o) | x_{j}) .

Another method of discrimination involves generating tags using sequential tagging for each position

i

. For a sentence

x

consisting of n words, n different tag sequences are annotated based on the “BIESO” (Begin, Inside, End, Single, Outside) notation schema. During the training of the model, the objective is to maximize the log-likelihood of the target tag sequence by utilizing the hidden vector

h_{i}

at each position

i

(2)

p_{t a g} (y | x) = \frac{\exp (h_{i}, y_{i})}{\exp (\exp (h_{i}, y_{i}^{'}))} .

(2) The three types of IE tasks can be formulated in a generative manner. Given an input text (e.g., sentence or document) with a sequence of

n

tokens

X = [x_{1}, . . ., x_{n}]

, a prompt

P

, and the target extraction sequence

Y = [y_{1}, . . ., y_{m}]

, the objective is to maximize the conditional probability in an auto-regressive formulation:

(3)

p_{θ} (Y | X, P) = \prod_{i = 1}^{m} p_{θ} (y_{i} | X, P, y_{< i}),

where

θ

donates the parameters of LLMs, which can be frozen or trainable. In the era of LLMs, several works have proposed appending extra prompts or instructions

P

X

to enhance the comprehensibility of the task for LLMs [5]. Even though the input text

X

remains the same, the target sequence varies for each task:

● Named entity recognition (NER) includes two tasks: Entity identification and entity typing. The former task is concerned with identifying spans of entities, and the latter task focuses on assigning types to these identified entities.

● Relation extraction (RE) may have different settings in different works. We categorize it using three terms following the literature [4,5]: (1) Relation classification refers to classifying the relation type between two given entities; (2) Relation triplet refers to identifying the relation type and the corresponding head and tail entity spans; (3) Relation strict refers to giving the correct relation type, the span, and the type of head and tail entity.

● Event extraction (EE) can be divided into two subtasks [151]: (1) Event detection (also known as Event Trigger Extraction in some works) aims to identify and classify the trigger word and type that most clearly represents the occurrence of an event; (2) Event Arguments Extraction aims to identify and classify arguments with specific roles in the events from the sentences.

3 LLMs for different information extraction tasks

In this section, we first present a introduction to the relevant LLM technologies for IE subtasks, including NER (Section 3.1), RE (Section 3.2), and EE (Section 3.3). We also conduct experimental analysis to evaluate the performance of various methods on representative datasets for three subtasks. Furthermore, we categorize universal IE frameworks into two categories: natural language (NL-LLMs) and code language (Code-LLMs), to discuss how they model the three distinct tasks using a unified paradigm (Section 3.4).

3.1 Named entity recognition

NER is a crucial component of IE and can be seen as a predecessor or subtask of RE and EE. It is also a fundamental task in other NLP tasks, thus attracting significant attention from researchers to explore new possibilities in the era of LLMs [47,90-95,108,122,124-126,164,165,172,173,177,179,280]. Considering the gap between the sequence labeling and generation models, GPT-NER [42] transformed NER into a generative task and proposed a self-verification strategy to rectify the mislabeling of NULL inputs as entities. Xie et al. [63] proposed a training-free self-improving framework that uses LLM to predict on the unlabeled corpus to obtain pseudo demonstrations, thereby enhancing the performance of LLM on zero-shot NER.

Tab.1 shows the comparison of NER on five main datasets, which are obtained from their original papers. We can observe that:

Tab.1 Comparison of LLMs for named entity recognition (identification & typing) with the Micro-F1 metric (%). $^{†}$ indicates that the model is discriminative. We demonstrate some universal and discriminative models for comparison. IE techniques include Cross-Domain Learning (CDL), Zero-Shot Prompting (ZS Pr), In-Context Learning (ICL), Supervised Fine-Tuning (SFT), Data Augmentation (DA). Uni. denotes whether the model is universal. Onto. 5 denotes the OntoNotes 5.0. Details of datasets and backbones are presented in Section 8. The settings for all subsequent tables are consistent with this format

Representative model	Paradigm	Uni.	Backbone	ACE04	ACE05	CoNLL03	Onto. 5	GENIA
DEEPSTRUCT [151]	CDL		GLM-10B	−	28.1	44.4	42.5	47.2
Xie et al. [63]	ZS Pr		GPT-3.5-turbo	−	32.27	74.51	−	52.06
CODEIE [36]	ICL	$\sqrt$	Code-davinci-002	55.29	54.82	82.32	−	−
Code4UIE [6]	ICL	$\sqrt$	Text-davinci-003	60.1	60.9	83.6	−	−
PromptNER [43]	ICL		GPT-4	−	−	83.48	−	58.44
Xie et al. [63]	ICL		GPT-3.5-turbo	−	55.54	84.51	−	58.72
GPT-NER [42]	ICL		Text-davinci-003	74.2	73.59	90.91	82.2	64.42
TANL [33]	SFT	$\sqrt$	T5-base	−	84.9	91.7	89.8	76.4
Cui et al. [53]	SFT		BART	−	−	92.55	−	−
Yan et al. [37]	SFT		BART-large	86.84	84.74	93.24	90.38	79.23
UIE [4]	SFT	$\sqrt$	T5-large	86.89	85.78	92.99	−	−
DEEPSTRUCT [151]	SFT	$\sqrt$	GLM-10B	−	86.9	93.0	87.8	80.8
Xia et al. [56]	SFT		BART-large	87.63	86.22	93.48	90.63	79.49
InstructUIE [192]	SFT	$\sqrt$	Flan-T5-11B	−	86.66	92.94	90.19	74.71
UniNER [64]	SFT		LLaMA-7B	87.5	87.6	−	89.1	80.6
GoLLIE [32]	SFT	$\sqrt$	Code-LLaMA-34B	−	89.6	93.1	84.6	−
EnTDA [58]	DA		T5-base	88.21	87.56	93.88	91.34	82.25
YAYI-UIE [155]	SFT	$\sqrt$	Baichuan2-13B	−	81.78	96.77	87.04	75.21
ToNER [88]	SFT		Flan-T5-3B	88.09	86.68	93.59	91.30	−
KnowCoder [160]	SFT	$\sqrt$	LLaMA2-7B	86.2	86.1	95.1	88.2	76.7
GNER [67]	SFT		Flan-T5-11B	−	−	93.28	91.83	−
USM $^{†}$ [30]	SFT	$\sqrt$	RoBERTa-large	87.62	87.14	93.16	−	−
RexUIE $^{†}$ [197]	SFT	$\sqrt$	DeBERTa-v3-large	87.25	87.23	93.67	−	−
Mirror $^{†}$ [198]	SFT	$\sqrt$	DeBERTa-v3-large	87.16	85.34	92.73	−	−

● 1) the models in few-shot and zero-shot settings still have a huge performance gap behind the SFT and DA.

● 2) Even though there is little difference between backbones, the performance varies greatly between methods under the ICL paradigm. For example, GPT-NER opens up at least a 6% F1 value gap with other methods on each dataset, and up to about 19% higher.

● 3) Compared to ICL, there are only minor differences in performance between different models after SFT, even though the parameters in their backbones can differ by up to a few hundred times.

● 4) The performance of models trained with the SFT paradigm exhibits greater variability across datasets, particularly for universal models. For instance, YAYI-UIE [155] and KnowCoder [160] outperform other models by at least 2.89% and 1.22% respectively on CoNLL03, while experiencing a decrease of 7.04% and 5.55% respectively compared to the best model on GENIA. We hypothesize that this discrepancy may arise from these models being trained on diverse datasets primarily sourced from news and social media domains, whereas GENIA represents a smaller fraction in the training set as it belongs to the biomedical domain; thus resulting in significant distribution gaps between different fields that ultimately impact performance outcomes. Furthermore, universal models necessitate simultaneous training across multiple subtasks, which inevitably exacerbates this distribution gap.

● 5) The EnTDA [58], on the contrary, exhibits exceptional stability and outperforms other methods on all datasets, thereby substantiating the robustness of the DA paradigm in addressing specific tasks.

3.2 Relation extraction

RE also plays an important role in IE, which usually has different setups in different studies as mentioned in Section 2. To address the poor performance of LLMs on RE tasks due to the low incidence of RE in instruction-tuning datasets, as indicated by [187], QA4RE [98] introduced a framework that enhances LLMs’ performance by aligning RE tasks with QA tasks. GPT-RE [99] incorporates task-aware representations and enriching demonstrations with reasoning logic to improve the low relevance between entity and relation and the inability to explain input-label mappings. Due to the large number of predefined relation types and uncontrolled LLMs, Li et al. [161] proposed to integrate LLM with a natural language inference module to generate relation triples, enhancing document-level relation datasets.

As shown in Tab.2 and Tab.3, we statistically found that universal IE models are generally better solving harder Relation Strict problems due to learning the dependencies between multi-tasks [4,33], while the task-specific methods solve simpler RE subtasks (e.g. relation classification). Moreover, when compared to NER, it becomes evident that the performance disparities among models in RE are more pronounced, thereby highlighting the potential of LLM in addressing RE tasks.

Tab.2 Comparison of LLMs for relation extraction with the “relation strict” [4] Micro-F1 metric (%). $^{†}$ indicates that the model is discriminative

Representative model	Technique	Uni.	Backbone	NYT	ACE05	ADE	CoNLL04	SciERC
CodeKGC [159]	ZS Pr	$\sqrt$	Text-davinci-003	−	−	42.8	35.9	15.3
CODEIE [36]	ICL	$\sqrt$	Code-davinci-002	32.17	14.02	−	53.1	7.74
CodeKGC [159]	ICL	$\sqrt$	Text-davinci-003	−	−	64.6	49.8	24.0
Code4UIE [6]	ICL	$\sqrt$	Text-davinci-002	54.4	17.5	58.6	54.4	−
REBEL [39]	SFT		BART-large	91.96	−	82.21	75.35	−
UIE [4]	SFT	$\sqrt$	T5-large	−	66.06	−	75.0	36.53
InstructUIE [5]	SFT	$\sqrt$	Flan-T5-11B	90.47	−	82.31	78.48	45.15
GoLLIE [32]	SFT	$\sqrt$	Code-LLaMA-34B	−	70.1	−	−	−
YAYI-UIE [155]	SFT	$\sqrt$	Baichuan2-13B	89.97	−	84.41	79.73	40.94
KnowCoder [160]	SFT	$\sqrt$	LLaMA2-7B	93.7	64.5	84.8	73.3	40.0
USM $^{†}$ [30]	SFT	$\sqrt$	RoBERTa-large	−	67.88	−	78.84	37.36
RexUIE $^{†}$ [197]	SFT	$\sqrt$	DeBERTa-v3-large	−	64.87	−	78.39	38.37

Tab.3 Comparison of LLMs for relation classification with the Micro-F1 metric (%)

Representative model	Technique	Backbone	TACRED	Re-TACRED	TACREV	SemEval
QA4RE [98]	ZS Pr	Text-davinci-003	59.4	61.2	59.4	43.3
SUMASK [96]	ZS Pr	GPT-3.5-turbo-0301	79.6	73.8	75.1	−
GPT-RE [99]	ICL	Text-davinci-003	72.15	−	−	91.9
Xu et al. [44]	ICL	Text-davinci-003	31.0	51.8	31.9	−
REBEL [39]	SFT	BART-large	−	90.36	−	−
Xu et al. [44]	DA	Text-davinci-003	37.4	66.2	41.0	−

3.3 Event extraction

Events can be defined as specific occurrences or incidents that happen in a given context. Recently, many studies [135,138] aim to understand events and capture their correlations by extracting event triggers and arguments using LLMs, which is essential for various reasoning tasks [199]. For example, Code4Struct [41] leveraged LLMs to translate text into code to tackle structured prediction tasks, using programming language features to introduce external knowledge and constraints through alignment between structure and code. Considering the interrelation between different arguments in the extended context, PGAD [137] employed a text diffusion model to create a variety of context-aware prompt representations, enhancing both sentence-level and document-level event argument extraction by identifying multiple role-specific argument span queries and coordinating them with context.

As can be seen from results of recent studies in Tab.4, vast majority of current methods are based on SFT paradigm, and only a few methods that use LLMs for either zero-shot or few-shot learning. In addition, generative methods outperform discriminative ones by a wide margin, especially in metric for argument classification task, indicating the great potential of generative LLMs for EE.

Tab.4 Comparison of Micro-F1 Values for Event Extraction on ACE05. Evaluation tasks include: Trigger Identification (Trg-I), Trigger Classification (Trg-C), Argument Identification (Arg-I), and Argument Classification (Arg-C). $^{†}$ indicates that the model is discriminative

Representative model	Technique	Uni.	Backbone	Trg-I	Trg-C	Arg-I	Arg-C
Code4Struct [41]	ZS Pr		Code-davinci-002	−	−	50.6	36.0
Code4UIE [6]	ICL	$\sqrt$	GPT-3.5-turbo-16k	−	37.4	−	21.3
Code4Struct [41]	ICL		Code-davinci-002	−	−	62.1	58.5
TANL [33]	SFT	$\sqrt$	T5-base	72.9	68.4	50.1	47.6
Text2Event [131]	SFT		T5-large	−	71.9	−	53.8
BART-Gen [130]	SFT		BART-large	−	−	69.9	66.7
UIE [4]	SFT	$\sqrt$	T5-large	−	73.36	−	54.79
GTEE-DYNPREF [135]	SFT		BART-large	−	72.6	−	55.8
DEEPSTRUCT [151]	SFT	$\sqrt$	GLM-10B	73.5	69.8	59.4	56.2
PAIE [134]	SFT		BART-large	−	−	75.7	72.7
PGAD [137]	SFT		BART-base	−	−	74.1	70.5
QGA-EE [138]	SFT		T5-large	−	−	75.0	72.8
InstructUIE [5]	SFT	$\sqrt$	Flan-T5-11B	−	77.13	−	72.94
GoLLIE [32]	SFT	$\sqrt$	Code-LLaMA-34B	−	71.9	−	68.6
YAYI-UIE [155]	SFT	$\sqrt$	Baichuan2-13B	−	65.0	−	62.71
KnowCoder [160]	SFT	$\sqrt$	LLaMA2-7B	−	74.2	−	70.3
USM $^{†}$ [30]	SFT	$\sqrt$	RoBERTa-large	−	72.41	−	55.83
RexUIE $^{†}$ [197]	SFT	$\sqrt$	DeBERTa-v3-large	−	75.17	−	59.15
Mirror $^{†}$ [198]	SFT	$\sqrt$	DeBERTa-v3-large	−	74.44	−	55.88

3.4 Universal information extraction

Different IE tasks vary a lot, with different optimization objectives and task-specific schemas, requiring separate models to handle the complexity of different IE tasks, settings, and scenarios [4]. As shown in Fig.2, many works solely focus on a subtask of IE. However, recent advancements in LLMs have led to the proposal of a unified generative framework in several studies [5,32]. This framework aims to model all IE tasks, capturing the common abilities of IE and learning the dependencies across multiple tasks. The prompt format for Uni-IE can typically be divided into natural language-based LLMs (NL-LLMs) and code-based LLMs (Code-LLMs), as illustrated in Fig.4.

Fig.4 The comparison of prompts of NL-LLMs and Code-LLMs for universal IE. Both NL-based and code-based methods attempt to construct a universal schema, but they differ in terms of prompt format and the way they utilize the generation capabilities of LLMs. This figure is adopted from [5] and [6]

Full size|PPT slide

NL-LLMs. NL-based methods unify all IE tasks in a universal natural language schema. For instance, UIE [4] proposed a unified text-to-structure generation framework that encodes extraction structures, and captured common IE abilities through a structured extraction language. InstructUIE [5] enhanced UIE by constructing expert-written instructions for fine-tuning LLMs to consistently model different IE tasks and capture the inter-task dependency. Additionally, ChatIE [40] explored the use of LLMs like ChatGPT [200] in zero-shot prompting, transforming the task into a multi-turn question-answering problem.

Code-LLMs. On the other hand, code-based methods unify IE tasks by generating code with a universal programming schema [41]. Code4UIE [6] proposed a universal retrieval-augmented code generation framework, which leverages Python classes to define schemas and uses in-context learning to generate codes that extract structural knowledge from texts. Besides, CodeKGC [159] leveraged the structural knowledge inherent in code and employed the schema-aware prompts and rationale-enhanced generation to improve performance. To enable LLMs to adhere to guidelines out-of-the-box, GoLLIE [32] enhanced zero-shot ability on unseen IE tasks by aligning with annotation guidelines.

In general, NL-LLMs are trained on a wide range of text and can understand and generate human language, which allows the prompts and instructions to be conciser and easier to design. However, NL-LLMs may produce unnatural outputs due to the distinct syntax and structure of IE tasks [159], which differ from the training data. Code, being a formalized language, possesses the inherent capability to accurately represent knowledge across diverse schemas, which makes it more suitable for structural prediction [6]. But code-based methods often require a substantial amount of text to define a Python class (see Fig.4), which in turn limits the sample size of the context. Through experimental comparison in Tab.1, Tab.2, and Tab.4, we can observe that Uni-IE models after SFT outperform task-specific models in the NER, RE, and EE tasks for most datasets.

3.5 Summaries of tasks

In this section, we explored the three primary tasks within IE and their associated sub-tasks, as well as frameworks that unify these tasks [4]. A key observation is the increased application of generative LLMs to NER [67,178], which has seen significant advancements and remains a highly active area of research within IE. In contrast, tasks such as relation extraction and event extraction have seen relatively less application, particularly for strict relation extraction [39] and detection-only event extraction [128]. This discrepancy may be attributed to the critical importance of NER, its utility in various downstream tasks, and its relatively simpler structured outputs, which facilitate large-scale supervised fine-tuning [1].

Additionally, a notable trend is the emergence of unified models for IE tasks, leveraging the general text understanding capabilities of modern large models [4,6,156]. Several studies have proposed unified generative frameworks that capture the common abilities across IE tasks and learn the dependencies between them. These unified approaches can be broadly categorized into natural language-based and code-based methods, each with distinct advantages and limitations.

Experimental results summarized in Tab.1, Tab.2, Tab.3, and Tab.4 reveal that universal IE models generally perform better on more complex strict relation extraction tasks due to their ability to learn dependencies across multiple tasks. Furthermore, generative methods significantly outperform discriminative ones in event extraction tasks, particularly in argument classification, highlighting the substantial potential of generative LLMs in this domain.

4 Techniques of LLMs for generative IE

In this section, we categorize recent methods based on their techniques, including Data Augmentation (Section 4.1, refers to enhancing information by applying various transformations to the existing data using LLMs), Prompt Design (Section 4.2, refers to the use of task-specific instructions or prompts, to direct the behavior of a model.), Zero-shot Learning (Section 4.3, refers to generating answer without any training examples for the specific IE tasks), Constrained Decoding Generation (Section 4.4, refers to the process of generating text while adhering to specific constraints or rule), Few-shot Learning (Section 4.5, refers to the generalization from a small number of labeled examples by training or in-context learning), Supervised Fine-tuning (Section 4.6, refers to further training LLMs on IE tasks using labeled data), to highlight the commonly used approaches for adapting LLMs to IE.

4.1 Data augmentation

Data augmentation involves generating meaningful and diverse data to effectively enhance the training examples, while avoiding the introduction of unrealistic, misleading, and offset patterns. Recent powerful LLMs also demonstrate remarkable performance in data generation tasks [201-205], which has attracted the attention of many researchers using LLMs to generate synthetic data for IE [44,61,101,127,161-163]. It can be roughly divided into four strategies according to their techniques, as shown in Fig.5.

Fig.5 Comparison of data augmentation methods

Full size|PPT slide

Data annotation. This strategy directly generates labeled structural data using LLMs. For instance, Zhang et al. [61] proposed LLMaAA to improve accuracy and data efficiency by employing LLMs as annotators within an active learning loop, thereby optimizing both the annotation and training processes. AugURE [101] employed within-sentence pairs augmentation and cross-sentence pairs extraction to enhance the diversity of positive pairs for unsupervised RE, and introduced margin loss for sentence pairs. Li et al. [161] addressed the challenge of document-level relation extraction from a long context, and proposed an automated annotation method for DocRE that combines a LLM with a natural language inference module to generate relation triples.

Knowledge retrieval. This strategy effectively retrieves related information from LLMs for IE, which is similar to retrieval augmentation generation (RAG) [206]. PGIM [167] presented a two-stage framework for multimodal NER, which leveraged ChatGPT as an implicit knowledge base to heuristically retrieve auxiliary knowledge for more efficient entity prediction. Amalvy et al. [59] proposed to improve NER on long documents by generating a synthetic context retrieval training dataset, and training a neural context retriever. Chen et al. [166] focused on the task of multimodal NER and RE, and showcased their approach to enhancing commonsense reasoning skills by employing a range of CoT prompts that encompass different aspects, including nouns, sentences, and multimodal inputs. Additionally, they employed data augmentation techniques such as style, entity, and image manipulation to further improve the performance.

Inverse generation. This strategy encourages LLMs to generate natural text or questions by utilizing the structural data provided as input, which aligns with the training paradigm of LLMs. For example, SynthIE [168] showed that LLMs can create high-quality synthetic data for complex tasks by reversing the task direction, and train new models that outperformed previous benchmarks. Rather than relying on ground-truth targets, which limits generalizability and scalability, STAR [100] generated structures from valid triggers and arguments, then generates passages with LLMs by designing fine-grained instructions, error identification, and iterative revision. In order to address the challenge of maintaining text coherence while preserving entities, EnTDA [58] proposed a method that involves manipulating the entity list of the original text. This manipulation includes adding, deleting, replacing, or swapping entities. And it further introduced a diversity beam search to enhance diversity in the Entity-to-text generation process.

Synthetic datasets for fine-tuning. This strategy involves generating some data for instruction-tuning by querying LLMs. Typically, this data is generated by a more powerful model for fine-tuning instructions in dialogues, and then distilled onto a smaller model, enabling it to also acquire stronger zero-shot capabilities [64,67,84]. For instance, UniversalNER [64] explored targeted distillation with mission-focused instruction tuning to train student models that excel in open NER, which used ChatGPT as the teacher model and distilled it into smaller UniversalNER model. GNER [67] proposed the integration of negative instances to enhance existing methods by introducing contextual information and improving label boundaries. The authors trained their model using Pile-NER, a dataset that includes approximately 240K entities across 13K distinct entity categories, which are sampled from the Pile Corpus [207] and processed using ChatGPT to generate the entities. The results demonstrate improved zero-shot performance across unseen entity domains.

4.2 Prompt design

Prompt engineering is a technique employed to enhance the capabilities of LLMs without altering their network parameters [49,208-212]. It entails utilizing task-specific instructions, known as prompts, to guide the behavior of model [13,213,214]. The practice of prompt design has proven successful in various applications [215-218]. Undoubtedly, effective prompt design also plays a crucial role in improving the performance of LLMs on IE tasks. In this section, we categorize prompt design approaches based on different strategies and provide a detailed explanation of the underlying motivations behind these techniques:

Question answer (QA). LLMs are instruction-tuned using a dialogue-based method [219,220], which creates a gap when compared to the structured prediction requirements of the IE task. Consequently, recent efforts have been made to employ a QA prompt approach to enhance LLMs and facilitate the generation of desired results more seamlessly [40,90,96,98,108]. For example, QA4RE [98] found that LLMs tend to perform poorly on RE because the instruction-tuning datasets used to train them have a low incidence of RE tasks, and thus proposes reformulating RE as multiple-choice QA to take advantage of the higher prevalence of QA tasks in instruction-tuning datasets. Li et al. [96] analyzed the limitations of existing RE prompts and proposed a new approach called summarize-and-ask prompting, which transforms zero-shot RE inputs into effective QA format using LLMs recursively. It also showed promise in extracting overlapping relations and effectively handling the challenge of none-of-the-above relations. ChatIE [40] proposed a two-stage framework to transform the zero-shot IE task into a multi-turn QA problem. The framework initially identified the different types of elements, and then a sequential IE process is executed for each identified element type. Each stage utilized a multi-turn QA process, where prompts are constructed using templates and previously extracted information.

Chain-of-thought (CoT). CoT [221] is a prompting strategy used with LLMs to enhance their performance, by providing a step-wise and coherent reasoning chain as a prompt to guide the model’s response generation. CoT prompting has gained attention in recent years [222], and there is ongoing research exploring its effectiveness on IE tasks [43,91,166,169-171]. PromptNER [43] combined LLMs with prompt-based heuristics and entity definitions. It prompted a LLM to generate a list of potential entities and their explanations based on provided entity type definitions. Bian et al. [171] proposed a two-step approach to improve Biomedical NER using LLMs. Their approach involved leveraging CoT to enable the LLM to tackle the Biomedical NER task in a step-by-step manner, breaking it down into entity span extraction and entity type determination. Yuan et al. [170] also proposed CoT prompt as a two-stage approach to guide ChatGPT in performing temporal relation reasoning for temporal RE task.

Self-improvement. While COT technology can partially elicit the reasoning ability of LLM, it is unavoidable that LLM will still generate factual errors. As a result, there have been efforts [63,73,144] to employ LLMs for iterative self-verification and self-improvement, aiming to rectify the results. For instance, Xie et al. [63] proposed a training-free self-improving framework, which consists of three main steps. First, LLMs made predictions on unlabeled corpus, generating self-annotated dataset through self-consistency. Second, the authors explored different strategies to select reliable annotations. Finally, during inference, demonstrations from reliable self-annotated dataset were retrieved for in-context learning. ProgGen [73] involved guiding LLMs to engage in self-reflection within specific domains, resulting in the generation of domain-relevant attributes that contribute to the creation of training data enriched with attributes. Additionally, ProgGen employd a proactive strategy by generating entity terms in advance and constructing NER context data around these entities, thereby circumventing the challenges LLMs face when dealing with intricate structures.

4.3 Zero-shot learning

The primary challenges in zero-shot learning involve ensuring that the model can effectively generalize to tasks and domains it has not been trained on, while also aligning the pre-training paradigm of LLMs to these novel tasks. Due to the large amount of knowledge embedded within, LLMs show impressive abilities in zero-shot scenarios of unseen tasks [40,223]. To achieve zero-shot cross-domain generalization of LLMs in IE tasks, several works have been proposed [5,32,64]. These works offered a universal framework for modeling various IE tasks and domains, and introduced innovative training prompts, e.g., instruction [5] and guidelines [32], for learning and capturing the inter-task dependencies of known tasks and generalizing them to unseen tasks and domains. For cross-type generalization, BART-Gen [130] introduced a document-level neural model that frames the EE task as conditional generation, which led to improved performance and strong portability to unseen event types.

On the other hand, in order to improve the ability of LLMs under zero-shot prompts (no need for fine-tuning), QA4RE [98] and ChatIE [40] proposed to transform IE into a multi-turn question-answering problem for aligning it with QA task, which is a predominant task in instruction-tuning datasets. Li et al. [96] integrated the chain-of-thought approach and proposed the summarize-and-ask prompting to solve the challenge of ensuring the reliability of outputs from black box LLMs [62].

4.4 Constrained decoding generation

LLMs are pretrained models that are initially trained on the task of predicting the next token in a sequence. This pretraining allows researchers to leverage the advantages of these models for various NLP tasks [8,224]. However, LLMs are primarily designed for generating free-form text and may not perform well on structured prediction tasks where only a limited set of outputs are valid.

To address this challenge, researchers have explored the use of constrained generation for better decoding [4,123,174,175]. Constrained decoding generation in autoregressive LLMs refers to the process of generating text while adhering to specific constraints or rules [225-227]. For example, Geng et al. [174] proposed using grammar-constrained decoding as a solution to control the generation of LMs, ensuring that the output follows a given structure. The authors introduced input-dependent grammars to enhance flexibility, allowing the grammar to depend on the input and generate different output structures for different inputs. Unlike previous methods, which generate information token by token, Zaratiana et al. [123] introduced a new approach for extracting entities and relations by generating a linearized graph with nodes representing text spans and edges representing relation triplets. They used a transformer encoder-decoder architecture with a pointing mechanism and a dynamic vocabulary of spans and relation types, to capture the structural characteristics and boundaries while grounding the output in the original text.

4.5 Few-shot learning

Few-shot learning has access to only a limited number of labeled examples, leading to challenges like overfitting and difficulty in capturing complex relationships [228]. Fortunately, scaling up the parameters of LLMs gives them amazing generalization capabilities compared to small pre-trained models, allowing them to achieve excellent performance in few-shot settings [43,91]. Paolini et al. [33] proposed the Translation between Augmented Natural Languages framework; Lu et al. [4] proposed a text-to-structure generation framework; and Chen et al. [60] proposed collaborative domain-prefix tuning for NER. These methods have achieved state-of-the-art performance and demonstrated effectiveness in few-shot setting. Despite the success of LLMs, they face challenges in training-free IE because of the difference between sequence labeling and text-generation models [187]. To overcome these limitations, GPT-NER [42] introduced a self-verification strategy, while GPT-RE [99] enhanced task-aware representations and incorporates reasoning logic into enriched demonstrations. These methods effectively showcase how GPT can be leveraged for in-context learning. CODEIE [36] and CodeKGC [159] showed that converting IE tasks into code generation tasks with code-style prompts and in-context examples leads to superior performance compared to NL-LLMs. This is because code-style prompts provide a more effective representation of structured output, enabling them to effectively handle the complex dependencies in natural language.

4.6 Supervised fine-tuning

Using all training data to fine-tune LLMs is the most common and promising method [88,110,111,113,129,141,143,229-233], which allows the model to capture the underlying structural patterns in the data, and generalize well to unseen samples. For example, DEEPSTRUCT [151] introduced structure pre-training on a collection of task-agnostic corpora to enhance the structural understanding of language models. UniNER [64] explored targeted distillation and mission-focused instruction tuning to train student models for broad applications, such as NER. GIELLM [34] fine-tuned LLMs using mixed datasets, which are collected to utilize the mutual reinforcement effect to enhance performance on multiple tasks.

4.7 Summaries of techniques

Data augmentation [61,101] is a widely explored direction due to its potential in enhancing model performance. LLMs possess extensive implicit knowledge and strong text generation capabilities, making them well-suited for data annotation tasks [222]. However, while data augmentation can expand the training dataset and improve model generalization, they may also introduce noise. For example, knowledge retrieval methods can supply additional context about entities and relationships, enriching the extraction process. Nevertheless, the noise can detract from the overall quality of the extracted information [94,167].

On the other hand, designing effective prompts remains a significant challenge for leveraging LLMs like GPT-4 [9]. Although approaches such as QA dialogue and CoT [104] strategies can enhance LLMs’ IE capabilities, purely prompt-based methods still lag behind supervised fine-tuning with smaller models. Supervised fine-tuning [5,64,67], including cross-domain and few-shot learning, generally yields better performance, which suggests that combining large-scale LLMs for data annotation with supervised fine-tuning using additional data can optimize performance and reduce manual annotation costs [68,95,164].

In summary, while various techniques for IE using LLMs offer distinct advantages, they also come with challenges. Thoughtfully combining these strategies can yield significant enhancements in IE tasks.

5 Applications on specific domains

It is non-ignorable that LLMs have tremendous potential for extracting information from some specific domains, such as mulitmodal [57,94,166,167], multilingual [83,133,163], medical [85,91,162,163,171,172,179,183-188,234,235], scientific [47,80,180-182], astronomical [164,173], historical [126,189], and legal [78,89,115]. Additionally, we present statistical data in Tab.5.

Tab.5 The statistics of research in specific domain

Domain	Method	Task	Paradigm	Backbone
Multimodal	Cai et al. [57]	NER	ICL	GPT-3.5
	PGIM [167]	NER	DA	BLIP2, GPT-3.5
	RiVEG [94]	NER	DA	Vicuna, LLaMA2, GPT-3.5
	Chen et al. [166]	NER, RE	DA	BLIP2, GPT-3.5, GPT-4
Multilingual	Meoni et al. [163]	NER	DA	Text-davinci-003
	Naguib et al. [83]	NER	ICL	−
	Huang et al. [133]	EE	CDL	mBART, mT5
Medical	Bian et al. [171]	NER	DA	GPT-3.5
	Hu et al. [184]	NER	ZS Pr	GPT-3.5, GPT-4
	Meoni et al. [163]	NER	DA	Text-davinci-003
	Naguib et al. [83]	NER	ICL	−
	VANER [188]	NER	SFT	LLaMA2
	RT [91]	NER	ICL	GPT-4
	Munnangi et al. [85]	NER	ZS Pr, ICL, FS FT	GPT-3.5, GPT-4, Claude-2, LLaMA2
	Monajatipoor et al. [179]	NER	SFT, ICL	−
	Hu et al. [172]	NER	ZS Pr, ICL	GPT-3.5, GPT-4
	Gutiérrez et al. [187]	NER, RE	ICL	GPT-3
	GPT3+R [185]	NER, RE	−	Text-davinci-002
	Labrak et al. [186]	NER, RE	−	GPT-3.5, Flan-UL2, Tk-Insturct, Alpaca
	Tang et al. [162]	NER, RE	DA	GPT-3.5
	DICE [183]	EE	SFT	T5-Large
Scientific	Bölücü et al. [80]	NER	ICL	GPT-3.5
	Dunn et al. [180]	NER, RE	SFT	GPT-3
	PolyIE [181]	NER, RE	ICL	GPT-3.5, GPT-4
	Foppiano et al. [47]	NER, RE	ZS Pr, ICL, SFT	GPT-3.5, GPT-4
	Dagdelen et al. [182]	NER, RE	SFT	GPT-3, LLaMA2
Astronomical	Shao et al. [173]	NER	ZS Pr	GPT-3.5, GPT-4, Claude-2, LLaMA2
Astronomical	Evans et al. [164]	NER	DA	GPT-3.5, GPT-4
Historical	González-Gallardo et al. [189]	NER	ZS Pr	GPT-3.5
Historical	CHisIEC [126]	NER, RE	SFT, ICL	ChatGLM2, Alpaca2, GPT-3.5
Legal	Nunes et al. [89]	NER	ICL	Sabia
	Oliveira et al. [78]	NER	DA	GPT-3
	Kwak et al. [115]	RE, EE	ICL	GPT-4

For instance, Chen et al. [166] introduced a conditional prompt distillation method that enhances a model’s reasoning ability by combining text-image pairs with chain-of-thought knowledge from LLMs, significantly improving performance in multimodal NER and multimodal RE. Tang et al. [162] explored the potential of LLMs in the field of clinical text mining and proposed a novel training approach, which leverages synthetic data to enhance performance and address privacy issues. Dunn et al. [180] presented a sequence-to-sequence approach by using GPT-3 for joint NER and RE from complex scientific text, demonstrating its effectiveness in extracting complex scientific knowledge in material chemistry. Shao et al. [173] explored the use of LLMs to extract astronomical knowledge entities from astrophysical journal articles. Conventional approaches encounter difficulties such as manual labor and limited generalizability. To address these issues, the authors proposed a prompting strategy that incorporates five prompt elements and eight combination prompt, aiming to specifically target celestial object identifiers and telescope names as the experimental objects of interest. González et al. [189] examined the performance of ChatGPT in the NER task specifically on historical texts. The research not only compared ChatGPT with other state-of-the-art language model-based systems but also delved into the challenges encountered in this zero-shot setting. The findings shed light on the limitations of entity identification in historical texts, encompassing concerns related to annotation guidelines, entity complexity, code-switching, and the specificity of prompts.

6 Evaluation & analysis

Despite the great success of LLMs in various natural language processing tasks [236,237], their performance in the field of information extraction still have room for improvement [193]. To alleviate this problem, recent research has explored the capabilities of LLMs with respect to the major subtasks of IE, i.e., NER [83,190], RE [169,170], and EE [191]. Considering the superior reasoning capabilities of LLMs, Xie et al. [190] proposed four reasoning strategies for NER, which are designed to simulate ChatGPT’s potential on zero-shot NER. Wadhwa et al. [169] explored the use of LLMs for RE and found that few-shot prompting with GPT-3 achieves near SOTA performance, while Flan-T5 can be improved with chain-of-thought style explanations generated via GPT-3. For EE task, Gao et al. [191] showed that ChatGPT still struggles with it due to the need for complex instructions and a lack of robustness.

Along this line, some researchers performed a more comprehensive analysis of LLMs by evaluating multiple IE subtasks simultaneously. Li et al. [195] evaluated ChatGPT’s overall ability on IE, including performance, explainability, calibration, and faithfulness. They found that ChatGPT mostly performs worse than BERT-based models in the standard IE setting, but excellently in the OpenIE setting. Furthermore, Han et al. [193] introduced a soft-matching strategy for a more precise evaluation and identified "unannotated spans" as the predominant error type, highlighting potential issues with data annotation quality.

7 Future directions

The development of LLMs for generative IE is still in its early stages, and there are numerous opportunities for improvement.

Universal IE. Previous generative IE methods and benchmarks are often tailored for specific domains or tasks, limiting their generalizability [51]. Although some unified methods [4] using LLMs have been proposed recently, they still suffer from certain limitations (e.g., long context input, and misalignment of structured output). Therefore, further development of universal IE frameworks that can adapt flexibly to different domains and tasks is a promising research direction (such as integrating the insights of task-specific models to assist in constructing universal models).

Low-resource IE. The generative IE system with LLMs still encounters challenges in resource-limited scenarios [195]. There is a need for further exploration of in-context learning of LLMs, particularly in terms of improving the selection of examples. Future research should prioritize the development of robust cross-domain learning techniques [5], such as domain adaptation or multi-task learning, to leverage knowledge from resource-rich domains. Additionally, efficient data annotation strategies with LLMs should also be explored.

Prompt design for IE. Designing effective instructions is considered to have a significant impact on the performance of LLMs [224,238]. One aspect of prompt design is to build input and output pairs that can better align with pre-training stage of LLMs (e.g., code generation) [6]. Another aspect is optimizing the prompt for better model understanding and reasoning (e.g., Chain-of-Thought) [96], by encouraging LLMs to make logical inferences or explainable generation. Additionally, researchers can explore interactive prompt design (such as multi-turn QA) [98], where LLMs can iteratively refine or provide feedback on the generated extractions automatically.

Open IE. Open IE settings present greater challenges for IE models, as they do not provide a candidate label set and rely solely on the models’ ability to comprehend the task. LLMs, with their knowledge and understanding abilities, have significant advantages in some Open IE tasks [64]. However, there are still instances of poor performance in more challenging tasks [28], which require further exploration by researchers.

8 Benchmarks & backbones

8.1 Representative datasets

In this section, we introduce representative datasets of NER, RE and EE respectively, and show brief summary of each dataset in the Tab.6 to help readers better understand these tasks.

Tab.6 A summary of some representative IE datasets

Dataset	Summary
CoNLL03 [239]	Dataset scope: NER;1,393 English news articles from Reuters;909 German news articles;4 annotated entity types.
CoNLL04 [240]	Dataset scope: RE;entity-relation triples from news sentences;4 entity types;5 relation types.
ACE05 [241]	Dataset scope: NER, RE and EE;various text types and genres;7 entity types; 7 relation types;33 event types and 22 argument roles.

CoNLL03. The CoNLL03 [239] is a representative dataset for NER, including 1,393 news articles in English and 909 news articles in German. The English portion of the corpus was sourced from the Shared Task dataset curated by Reuters. This dataset encompasses annotations for four distinct entity types: PER (person), LOC (location), ORG (organization), and MISC (including all other types of entities).

CoNLL04. The CoNLL04 [240] is a well-known benchmark dataset for RE tasks, comprising sentences extracted from news articles that each contain at least one entity-relation triple. It has four kinds of entities (PER, ORG, LOC, OTH) and five kinds of relations (Kill, Work For, Live In, OrgBased In, Located In).

ACE05. Automatic Content Extraction 05 [241] is widely recognized and utilized for IE tasks. It serves as a valuable resource to assess the efficacy of automated systems in extracting structured information from diverse textual sources, encompassing news articles, interviews, reports, etc. Moreover, this dataset covers a broad range of genres including politics, economics, sports, among others. Specifically for the EE task within ACE05, it comprises 599 news documents that encapsulate 33 distinct event types and 22 argument roles.

8.2 Benchmarks

As shown in Tab.7, we compiled a comprehensive collection of benchmarks covering various domains and tasks, to provide researchers with a valuable resource that they can query and reference as needed. Moreover, we also summarized the download links for each dataset in our open source repository (LLM4IE repository).

Tab.7 Statistics of common datasets for information extraction. $*$ denotes the dataset is multimodal. # refers to the number of categories or sentences. The data in the table is partially referenced from InstructUIE [192]

Task	Dataset	Domain	#Class	#Train	#Val	#Test
NER	ACE04 [242]	News	7	6,202	745	812
	ACE05 [241]	News	7	7,299	971	1,060
	BC5CDR [243]	Biomedical	2	4,560	4,581	4,797
	Broad Twitter Corpus [244]	Social Media	3	6,338	1,001	2,000
	CADEC [245]	Biomedical	1	5,340	1,097	1,160
	CoNLL03 [239]	News	4	14,041	3,250	3,453
	CoNLLpp [246]	News	4	14,041	3,250	3,453
	CrossNER-AI [247]	Artificial Intelligence	14	100	350	431
	CrossNER-Literature [247]	Literary	12	100	400	416
	CrossNER-Music [247]	Musical	13	100	380	465
	CrossNER-Politics [247]	Political	9	199	540	650
	CrossNER-Science [247]	Scientific	17	200	450	543
	FabNER [248]	Scientific	12	9,435	2,182	2,064
	Few-NERD [249]	General	66	131,767	18,824	37,468
	FindVehicle [250]	Traffic	21	21,565	20,777	20,777
	GENIA [251]	Biomedical	5	15,023	1,669	1,854
	HarveyNER [252]	Social Media	4	3,967	1,301	1,303
	MIT-Movie [253]	Social Media	12	9,774	2,442	2,442
	MIT-Restaurant [253]	Social Media	8	7,659	1,520	1,520
	MultiNERD [254]	Wikipedia	16	134,144	10,000	10,000
	NCBI [255]	Biomedical	4	5,432	923	940
	OntoNotes 5.0 [256]	General	18	59,924	8,528	8,262
	ShARe13 [257]	Biomedical	1	8,508	12,050	9,009
	ShARe14 [258]	Biomedical	1	17,404	1,360	15,850
	SNAP^* [259]	Social Media	4	4,290	1,432	1,459
	TTC [260]	Social Meida	3	10,000	500	1,500
	Tweebank-NER [261]	Social Media	4	1,639	710	1,201
	Twitter2015^* [262]	Social Media	4	4,000	1,000	3,357
	Twitter2017^* [259]	Social Media	4	3,373	723	723
	TwitterNER7 [263]	Social Media	7	7,111	886	576
	WikiDiverse^* [264]	News	13	6,312	755	757
	WNUT2017 [265]	Social Media	6	3,394	1,009	1,287
RE	ACE05 [241]	News	7	10,051	2,420	2,050
	ADE [266]	Biomedical	1	3,417	427	428
	CoNLL04 [240]	News	5	922	231	288
	DocRED [267]	Wikipedia	96	3,008	300	700
	MNRE^* [268]	Social Media	23	12,247	1,624	1,614
	NYT [269]	News	24	56,196	5,000	5,000
	Re-TACRED [270]	News	40	58,465	19,584	13,418
	SciERC [271]	Scientific	7	1,366	187	397
	SemEval2010 [272]	General	19	6,507	1,493	2,717
	TACRED [273]	News	42	68,124	22,631	15,509
	TACREV [274]	News	42	68,124	22,631	15,509
EE	ACE05 [241]	News	33/22	17,172	923	832
	CASIE [275]	Cybersecurity	5/26	11,189	1,778	3,208
	GENIA11 [276]	Biomedical	9/11	8,730	1,091	1,092
	GENIA13 [277]	Biomedical	13/7	4,000	500	500
	PHEE [278]	Biomedical	2/16	2,898	961	968
	RAMS [279]	News	139/65	7,329	924	871
	WikiEvents [130]	Wikipedia	50/59	5,262	378	492

8.3 Backbones

We briefly describe some backbones that are commonly used in the field of generative information extraction, which is shown in Tab.8.

Tab.8 The common backbones for generative information extraction. We mark the commonly used base and large versions for better reference

Series	Model	Size	Base model	Open source	Instruction tuning	RLHF
BART	BART [281]	140M (base), 400M (large)	−	$\sqrt$	−	−
T5	T5 [282]	60M, 220M (base), 770M (large), 3B, 11B	−	$\sqrt$	−	−
	mT5 [283]	300M, 580M (base), 1.2B (large), 3.7B, 13B	−	$\sqrt$	−	−
	Flan-T5 [284]	80M, 250M (base), 780M (large), 3B, 11B	T5	$\sqrt$	$\sqrt$	−
GLM	GLM [285]	110M (base), 335M (large), 410M, 515M, 2B, 10B	−	$\sqrt$	−	−
GLM	ChatGLM series	6B	GLM	$\sqrt$	$\sqrt$	$\sqrt$
LLaMA	LLaMA [286]	7B, 13B, 33B, 65B	−	$\sqrt$	−	−
	Alpaca [287]	7B, 13B	LLaMA	$\sqrt$	$\sqrt$	−
	Vicuna [288]	7B, 13B	LLaMA	$\sqrt$	$\sqrt$	−
	LLaMA2 [289]	7B, 13B, 70B	−	$\sqrt$	−	−
	LLaMA2-chat [289]	7B, 13B, 70B	LLaMA2	$\sqrt$	$\sqrt$	$\sqrt$
	Code-LLaMA [290]	7B, 13B, 34B	LLaMA2	$\sqrt$	−	−
	LLaMA3 series	8B, 70B, 405B	−	$\sqrt$	$\sqrt$	$\sqrt$
GPT	GPT-2 [291]	117M, 345M, 762M, 1.5B	−	$\sqrt$	−	−
	GPT-3 [292]	175B	−	−	−	−
	GPT-J [293]	6B	GPT-3	$\sqrt$	−	−
	Code-davinci-002 [294]	−	GPT-3	−	$\sqrt$	−
	Text-davinci-002 [294]	−	GPT-3	−	$\sqrt$	−
	Text-davinci-003 [294]	−	GPT-3	−	$\sqrt$	$\sqrt$
	GPT-3.5-turbo series [200]	−	−	−	$\sqrt$	$\sqrt$
	GPT-4 series [9]	−	−	−	$\sqrt$	$\sqrt$

9 Conclusion

In this survey, We first introduced the subtasks of IE and discussed some universal frameworks aiming to unify all IE tasks with LLMs. Additional theoretical and experimental analysis provided insightful exploration for these methods. Then we delved into different IE techniques that apply LLMs for IE and demonstrate their potential for extracting information in specific domains. Finally, we analyzed the current challenges and presented potential future directions. We hope this survey can provide a valuable resource for researchers to explore more efficient utilization of LLMs for IE.

Derong Xu is currently a joint PhD student at University of Science and Technology of China and City University of Hong Kong, China. His research interests focus on Multimodal Knowledge graph and large language models

Wei Chen is now a doctoral student at the University of Science and Technology of China, China. His research interests include data mining, information extraction, and large language models

Wenjun Peng received his Master’s degree in the School of Computer Science and Technology at University of Science and Technology of China (USTC), China. He obtained his BE degree from Sichuan University, China in 2021. His main research interests include data mining, multimodal learning, and person re-ID

Chao Zhang received the BE degree in software engineering from Shandong University, China in 2022. He is currently pursuing a joint PhD degree at the University of Science and Technology of China and City University of Hong Kong, China. His research interests include data mining, multimodal learning, and large language models

Tong Xu is currently working as a Professor at University of Science and Technology of China (USTC), Hefei, China. He has authored more than 100 top-tier journal and conference papers in related fields, including TKDE, TMC, TMM, TOMM, KDD, SIGIR, WWW, ACM MM, etc. He was the recipient of the Best Paper Award of KSEM 2020 and the Area Chair Award for NLP Application Track of ACL 2023

Xiangyu Zhao is an assistant professor of the school of data science at City University of Hong Kong (CityU), China. His current research interests include data mining and machine learning. He has published more than 100 papers in top conferences and journals. His research has been awarded ICDM’22 and ICDM’21 Best-ranked Papers, Global Top 100 Chinese New Stars in AI, Huawei Innovation Research Program, CCF-Tencent Open Fund (twice), CCF-Ant Research Fund, Ant Group Research Fund, Tencent Focused Research Fund, and nomination for Joint AAAI/ACM SIGAI Doctoral Dissertation Award. He serves as top data science conference (senior) program committee members and session chairs, and journal guest editors and reviewers

Xian Wu is now a Principal Researcher in Tencent. Before joining Tencent, he worked as a Senior Scientist Manager and a Staff Researcher in Microsoft and IBM Research. Xian Wu received his PhD degree from Shanghai Jiao Tong University, China. His research interests includes Medical AI, Natural Language Processing and Multi-Modal modeling. Xian Wu has published papers in Nature Computational Science, NPJ digital medicine, T-PAMI, CVPR, NeurIPS, ACL, WWW, KDD, AAAI, IJCAI, etc. He also served as PC member of BMJ, T-PAMI, TKDE, TKDD, TOIS, TIST, CVPR, ICCV, AAAI, etc

Yefeng Zheng received BE and M.E. degrees from Tsinghua University, China in 1998 and 2001, respectively, and a PhD degree from University of Maryland, College Park, USA in 2005. After graduation, he worked at Siemens Corporate Research in Princeton, New Jersey, USA on medical image analysis before joining Tencent in Shenzhen, China in 2018. He is now Distinguished Scientist and Director of Tencent Jarvis Research Center, leading the company’s initiative on medical artificial intelligence. He has published 300+ papers and invented 80+ US patents. His work has been cited more than 22,000 times with h-index of 74. He is a fellow of IEEE, a fellow of AIMBE, and an Associate Editor of IEEE Transactions on Medical Imaging

Yang Wang is currently working as an Engineer at Anhui Conch Information Technology Engineering Co., Ltd., China. He has more than 10 years of IT project implementation experience in building materials industry, applied for 11 invention patents, published 2 technological papers, and participated in 3 large-scale national science and technology projects

Enhong Chen (CCF Fellow, IEEE Fellow) is a professor of University of Science and Technology of China (USTC), China. His general area of research includes data mining and machine learning, social network analysis, and recommender systems. He has published more than 300 papers in refereed conferences and journals, including Nature Communications, IEEE/ACM Transactions, KDD, NIPS, IJCAI, AAAI, etc. He was on program committees of numerous conferences including KDD, ICDM, and SDM. He received the Best Application Paper Award on KDD-2008, the Best Research Paper Award on ICDM-2011, and the Best of SDM-2015. His research is supported by the National Science Foundation for Distinguished Young Scholars of China

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Zhong L, Wu J, Li Q, Peng H, Wu X. A comprehensive survey on automatic knowledge graph construction. ACM Computing Surveys, 2024, 56( 4): 94

[2]	Fu C, Chen T, Qu M, Jin W, Ren X. Collaborative policy learning for open knowledge graph reasoning. In: Proceedings of 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019, 2672−2681

[3]	Srihari R K, Li W. Information extraction supported question answering. In: Proceedings of the 8th Text REtrieval Conference. 1999

[4]	Lu Y, Liu Q, Dai D, Xiao X, Lin H, Han X, Sun L, Wu H. Unified structure generation for universal information extraction. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022, 5755−5772

[5]	Wang X, Zhou W, Zu C, Xia H, Chen T, Zhang Y, Zheng R, Ye J, Zhang Q, Gui T, Kang J, Yang J, Li S, Du C. InstructUIE: multi-task instruction tuning for unified information extraction. 2023, arXiv preprint arXiv: 2304.08085

[6]	Guo Y, Li Z, Jin X, Liu Y, Zeng Y, Liu W, Li X, Yang P, Bai L, Guo J, Cheng X. Retrieval-augmented code generation for universal information extraction. 2023, arXiv preprint arXiv: 2311.02962

[7]	Zhong Y, Xu T, Luo P. Contextualized hybrid prompt-tuning for generation-based event extraction. In: Proceedings of the 16th International Conference on Knowledge Science, Engineering and Management. 2023, 374−386

[8]	Zhou S, Yu B, Sun A, Long C, Li J, Sun J. A survey on neural open information extraction: current status and future directions. In: Proceedings of the 31st International Joint Conference on Artificial Intelligence. 2022, 5694−5701

[9]	OpenAI, Achiam J, Adler S, Agarwal S, Ahmad L, , . GPT-4 technical report. 2023, arXiv preprint arXiv: 2303.08774

[10]	Liu Q, He Y, Lian D, Zheng Z, Xu T, Liu C, Chen E. UniMEL: a unified framework for multimodal entity linking with large language models. 2024, arXiv preprint arXiv: 2407.16160

[11]	Peng W, Li G, Jiang Y, Wang Z, Ou D, Zeng X, Xu D, Xu T, Chen E. Large language model based long-tail query rewriting in Taobao search. In: Companion Proceedings of the ACM Web Conference 2024. 2024, 20−28

[12]	Zhang C, Zhang H, Wu S, Wu D, Xu T, Gao Y, Hu Y, Chen E. NoteLLM-2: multimodal large representation models for recommendation. 2024, arXiv preprint arXiv: 2405.16789

[13]	Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 2023, 55( 9): 195

[14]	Lyu Y, Li Z, Niu S, Xiong F, Tang B, Wang W, Wu H, Liu H, Xu T, Chen E. CRUD-RAG: a comprehensive Chinese benchmark for retrieval-augmented generation of large language models. 2024, arXiv preprint arXiv: 2401.17043

[15]	Lyu Y, Niu Z, Xie Z, Zhang C, Xu T, Wang Y, Chen E. Retrieve-plan-generation: an iterative planning and answering framework for knowledge-intensive LLM generation. 2024, arXiv preprint arXiv: 2406.14979

[16]	Jia P, Liu Y, Zhao X, Li X, Hao C, Wang S, Yin D. MILL: mutual verification with large language models for zero-shot query expansion. In: Proceedings of 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2024, 2498−2518

[17]	Wang M, Zhao Y, Liu J, Chen J, Zhuang C, Gu J, Guo R, Zhao X. Large multimodal model compression via iterative efficient pruning and distillation. In: Companion Proceedings of the ACM Web Conference 2024. 2024, 235−244

[18]	Fu Z, Li X, Wu C, Wang Y, Dong K, Zhao X, Zhao M, Guo H, Tang R. A unified framework for multi-domain CTR prediction via large language models. 2023, arXiv preprint arXiv: 2312.10743

[19]	Jia P, Liu Y, Li X, Zhao X, Wang Y, Du Y, Han X, Wei X, Wang S, Yin D. G3: an effective and adaptive framework for worldwide geolocalization using large multi-modality models. 2024, arXiv preprint arXiv: 2405.14702

[20]	Zhang C, Wu S, Zhang H, Xu T, Gao Y, Hu Y, Chen E. NoteLLM: a retrievable large language model for note recommendation. In: Companion Proceedings of the ACM Web Conference 2024. 2024, 170−179

[21]	Wang X, Chen Z, Xie Z, Xu T, He Y, Chen E. In-context former: lightning-fast compressing context for large language model. 2024, arXiv preprint arXiv: 2406.13618

[22]	Zhu J, Liu S, Yu Y, Tang B, Yan Y, Li Z, Xiong F, Xu T, Blaschko M B. FastMem: fast memorization of prompt improves context awareness of large language models. 2024, arXiv preprint arXiv: 2406.16069

[23]	Wang L, Ma C, Feng X, Zhang Z, Yang H, Zhang J, Chen Z, Tang J, Chen X, Lin Y, Zhao W X, Wei Z, Wen J. A survey on large language model based autonomous agents. Frontiers of Computer Science, 2024, 18( 6): 186345

[24]	Guan Z, Wu L, Zhao H, He M, Fan J. Enhancing collaborative semantics of language model-driven recommendations via graph-aware learning. 2024, arXiv preprint arXiv: 2406.13235

[25]	Huang J, She Q, Jiang W, Wu H, Hao Y, Xu T, Wu F. QDMR-based planning-and-solving prompting for complex reasoning tasks. In: Proceedings of 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024, 13395−13406

[26]	Fu C, Dai Y, Luo Y, Li L, Ren S, Zhang R, Wang Z, Zhou C, Shen Y, Zhang M, Chen P, Li Y, Lin S, Zhao S, Li K, Xu T, Zheng X, Chen E, Ji R, Sun X. Video-MME: the first-ever comprehensive evaluation benchmark of multi-modal LLMs in video analysis. 2024, arXiv preprint arXiv: 2405.21075

[27]	Li X, Su L, Jia P, Zhao X, Cheng S, Wang J, Yin D. Agent4Ranking: semantic robust ranking via personalized query rewriting using multi-agent LLM. 2023, arXiv preprint arXiv: 2312.15450

[28]	Qi J, Zhang C, Wang X, Zeng K, Yu J, Liu J, Hou L, Li J, Bin X. Preserving knowledge invariance: rethinking robustness evaluation of open information extraction. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 5876−5890

[29]	Chen W, Zhao L, Luo P, Xu T, Zheng Y, Chen E. HEProto: a hierarchical enhancing ProtoNet based on multi-task learning for few-shot named entity recognition. In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023, 296−305

[30]	Lou J, Lu Y, Dai D, Jia W, Lin H, Han X, Sun L, Wu H. Universal information extraction as unified semantic matching. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. 2023, 13318−13326

[31]	Josifoski M, De Cao N, Peyrard M, Petroni F, West R. GenIE: generative information extraction. In: Proceedings of 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022, 4626−4643

[32]	Sainz O, García-Ferrero I, Agerri R, de Lacalle O L, Rigau G, Agirre E. GoLLIE: annotation guidelines improve zero-shot information-extraction. In: Proceedings of the ICLR 2024. 2024

[33]	Paolini G, Athiwaratkun B, Krone J, Ma J, Achille A, Anubhai R, dos Santos C N, Xiang B, Soatto S. Structured prediction as translation between augmented natural languages. In: Proceedings of the 9th International Conference on Learning Representations. 2021

[34]	Gan C, Zhang Q, Mori T. GIELLM: Japanese general information extraction large language model utilizing mutual reinforcement effect. 2023, arXiv preprint arXiv: 2311.06838

[35]	Fei H, Wu S, Li J, Li B, Li F, Qin L, Zhang M, Zhang M, Chua T S. LasUIE: unifying information extraction with latent adaptive structure-aware generative language model. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1125

[36]	Li P, Sun T, Tang Q, Yan H, Wu Y, Huang X, Qiu X. CodeIE: large code generation models are better few-shot information extractors. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 15339−15353

[37]	Yan H, Gui T, Dai J, Guo Q, Zhang Z, Qiu X. A unified generative framework for various NER subtasks. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021, 5808−5822

[38]	Huang K H, Tang S, Peng N. Document-level entity-based extraction as template generation. In: Proceedings of 2021 Conference on Empirical Methods in Natural Language Processing. 2021, 5257−5269

[39]	Cabot P L H, Navigli R. REBEL: relation extraction by end-to-end language generation. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021. 2021, 2370−2381

[40]	Wei X, Cui X, Cheng N, Wang X, Zhang X, Huang S, Xie P, Xu J, Chen Y, Zhang M, Jiang Y, Han W. ChatIE: zero-shot information extraction via chatting with ChatGPT. 2023, arXiv preprint arXiv: 2302.10205

[41]	Wang X, Li S, Ji H. Code4Struct: code generation for few-shot event structure prediction. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 3640−3663

[42]	Wang S, Sun X, Li X, Ouyang R, Wu F, Zhang T, Li J, Wang G. GPT-NER: named entity recognition via large language models. 2023, arXiv preprint arXiv: 2304.10428

[43]	Ashok D, Lipton Z C. PromptNER: prompting for named entity recognition. 2023, arXiv preprint arXiv: 2305.15444

[44]	Xu X, Zhu Y, Wang X, Zhang N. How to unleash the power of large language models for few-shot relation extraction? In: Proceedings of the 4th Workshop on Simple and Efficient Natural Language Processing. 2023, 190−200

[45]	Nasar Z, Jaffry S W, Malik M K. Named entity recognition and relation extraction: state-of-the-art. ACM Computing Surveys, 2022, 54( 1): 20

[46]	Ye H, Zhang N, Chen H, Chen H. Generative knowledge graph construction: a review. In: Proceedings of 2022 Conference on Empirical Methods in Natural Language Processing. 2022, 1−17

[47]	Foppiano L, Lambard G, Amagasa T, Ishii M. Mining experimental data from materials science literature with large language models: an evaluation study. Science and Technology of Advanced Materials: Methods, 2024, 4( 1): 2356506

[48]	Liu H, Xue W, Chen Y, Chen D, Zhao X, Wang K, Hou L, Li R, Peng W. A survey on hallucination in large vision-language models. 2024, arXiv preprint arXiv: 2402.00253

[49]	Sahoo P, Singh A K, Saha S, Jain V, Mondal S, Chadha A. A systematic survey of prompt engineering in large language models: techniques and applications. 2024, arXiv preprint arXiv: 2402.07927

[50]	Xu D, Zhang Z, Zhu Z, Lin Z, Liu Q, Wu X, Xu T, Wang W, Ye Y, Zhao X, Chen E, Zheng Y. Editing factual knowledge and explanatory ability of medical large language models. 2024, arXiv preprint arXiv: 2402.18099

[51]	Yuan S, Yang D, Liang J, Li Z, Liu J, Huang J, Xiao Y. Generative entity typing with curriculum learning. In: Proceedings of 2022 Conference on Empirical Methods in Natural Language Processing. 2022, 3061−3073

[52]	Feng Y, Pratapa A, Mortensen D. Calibrated seq2seq models for efficient and generalizable ultra-fine entity typing. In: Proceedings of the Findings of the Association for Computational Linguistics. 2023, 15550−15560

[53]	Cui L, Wu Y, Liu J, Yang S, Zhang Y. Template-based named entity recognition using BART. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021, 1835−1845

[54]	Zhang S, Shen Y, Tan Z, Wu Y, Lu W. De-bias for generative extraction in unified NER task. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022, 808−818

[55]	Wang L, Li R, Yan Y, Yan Y, Wang S, Wu W, Xu W. InstructionNER: a multi-task instruction-based generative framework for few-shot NER. 2022, arXiv preprint arXiv: 2203.03903

[56]	Xia Y, Zhao Y, Wu W, Li S. Debiasing generative named entity recognition by calibrating sequence likelihood. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 1137−1148

[57]	Cai C, Wang Q, Liang B, Qin B, Yang M, Wong K F, Xu R. In-context learning for few-shot multimodal named entity recognition. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 2969−2979

[58]	Hu X, Jiang Y, Liu A, Huang Z, Xie P, Huang F, Wen L, Yu P S. Entity-to-text based data augmentation for various named entity recognition tasks. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023. 2023, 9072−9087

[59]	Amalvy A, Labatut V, Dufour R. Learning to rank context for named entity recognition using a synthetic dataset. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 10372−10382

[60]	Chen X, Li L, Qiao S, Zhang N, Tan C, Jiang Y, Huang F, Chen H. One model for all domains: collaborative domain-prefix tuning for cross-domain NER. In: Proceedings of the 32nd International Joint Conference on Artificial Intelligence. 2023, 559

[61]	Zhang R, Li Y, Ma Y, Zhou M, Zou L. LLMaAA: making large language models as active annotators. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 13088−13103

[62]	Ma Y, Cao Y, Hong Y, Sun A. Large language model is not a good few-shot information extractor, but a good reranker for hard samples! In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 10572−10601

[63]	Xie T, Li Q, Zhang Y, Liu Z, Wang H. Self-improving for zero-shot named entity recognition with large language models. In: Proceedings of 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2024, 583−593

[64]	Zhou W, Zhang S, Gu Y, Chen M, Poon H. UniversalNER: targeted distillation from large language models for open named entity recognition. In: Proceedings of the 12th International Conference on Learning Representations. 2024

[65]	Zhang X, Tan M, Zhang J, Zhu W. NAG-NER: a unified non-autoregressive generation framework for various NER tasks. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 676−686

[66]	Su J, Yu H. Unified named entity recognition as multi-label sequence generation. In: Proceedings of 2023 International Joint Conference on Neural Networks. 2023, 1−8

[67]	Ding Y, Li J, Wang P, Tang Z, Bowen Y, Zhang M. Rethinking negative instances for generative named entity recognition. In: Proceedings of the Findings of the Association for Computational Linguistics ACL 2024. 2024, 3461−3475

[68]	Bogdanov S, Constantin A, Bernard T, Crabbé B, Bernard E. NuNER: entity recognition encoder pre-training via LLM-annotated data. 2024, arXiv preprint arXiv: 2402.15343

[69]	Chen J, Lu Y, Lin H, Lou J, Jia W, Dai D, Wu H, Cao B, Han X, Sun L. Learning in-context learning for named entity recognition. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 13661−13675

[70]	Zhang Z, Zhao Y, Gao H, Hu M. LinkNER: linking local named entity recognition models to large language models using uncertainty. In: Proceedings of the ACM Web Conference 2024. 2024, 4047−4058

[71]	Tang X, Wang J, Su Q. Small language model is a good guide for large language model in Chinese entity relation extraction. 2024, arXiv preprint arXiv: 2402.14373

[72]	Popovič N, Färber M. Embedded named entity recognition using probing classifiers. 2024, arXiv preprint arXiv: 2403.11747

[73]	Heng Y, Deng C, Li Y, Yu Y, Li Y, Zhang R, Zhang C. ProgGen: generating named entity recognition datasets step-by-step with self-reflexive large language models. In: Proceedings of the Findings of the Association for Computational Linguistics ACL 2024. 2024, 15992−16030

[74]	Mo Y, Yang J, Liu J, Zhang S, Wang J, Li Z. C-ICL: contrastive in-context learning for information extraction. 2024, arXiv preprint arXiv: 2402.11254

[75]	Keloth V K, Hu Y, Xie Q, Peng X, Wang Y, Zheng A, Selek M, Raja K, Wei C H, Jin Q, Lu Z, Chen Q, Xu H. Advancing entity recognition in biomedicine via instruction tuning of large language models. Bioinformatics, 2024, 40( 4): btae163

[76]	Kim S, Seo K, Chae H, Yeo J, Lee D. VerifiNER: verification-augmented NER via knowledge-grounded reasoning with large language models. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 2441−2461

[77]	Li Y, Ramprasad R, Zhang C. A simple but effective approach to improve structured language model output for information extraction. 2024, arXiv preprint arXiv: 2402.13364

[78]	Oliveira V, Nogueira G, Faleiros T, Marcacini R. Combining prompt-based language models and weak supervision for labeling named entity recognition on legal documents. Artificial Intelligence and Law, 2024: 1-21

[79]	Lu J, Yang Z, Wang Y, Liu X, Namee B M, Huang C. PaDeLLM-NER: parallel decoding in large language models for named entity recognition. 2024, arXiv preprint arXiv: 2402.04838

[80]	Bölücü N, Rybinski M, Wan S. Impact of sample selection on in-context learning for entity extraction from scientific writing. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 5090−5107

[81]	Liu J, Wang J, Huang H, Zhang R, Yang M, Zhao T. Improving LLM-based health information extraction with in-context learning. In: Proceedings of the 9th China Health Information Processing Conference. 2024, 49−59

[82]	Wu C, Ke W, Wang P, Luo Z, Li G, Chen W. ConsistNER: towards instructive NER demonstrations for LLMs with the consistency of ontology and context. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence. 2024, 19234−19242

[83]	Naguib M, Tannier X, Névéol A. Few-shot clinical entity recognition in English, French and Spanish: masked language models outperform generative model prompting. 2024, arXiv preprint arXiv: 2402.12801

[84]	Zaratiana U, Tomeh N, Holat P, Charnois T. GliNER: generalist model for named entity recognition using bidirectional transformer. In: Proceedings of 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2024, 5364−5376

[85]	Munnangi M, Feldman S, Wallace B, Amir S, Hope T, Naik A. On-the-fly definition augmentation of LLMs for biomedical NER. In: Proceedings of 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2024, 3833−3854

[86]	Zhang M, Wang B, Fei H, Zhang M. In-context learning for few-shot nested named entity recognition. 2024, arXiv preprint arXiv: 2402.01182

[87]	Yan F, Yu P, Chen X. LTNER: Large language model tagging for named entity recognition with contextualized entity marking. 2024, arXiv preprint arXiv: 2404.05624

[88]	Jiang G, Luo Z, Shi Y, Wang D, Liang J, Yang D. ToNER: type-oriented named entity recognition with generative language model. In: Proceedings of 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024, 16251−16262

[89]	Nunes R O, Spritzer A S, Dal Sasso Freitas C, Balreira D S. Out of sesame street: a study of portuguese legal named entity recognition through in-context learning. In: Proceedings of the 26th International Conference on Enterprise Information Systems. 2024

[90]	Hou W, Zhao W, Liu X, Guo W. Knowledge-enriched prompt for low-resource named entity recognition. ACM Transactions on Asian and Low-Resource Language Information Processing, 2024, 23( 5): 72

[91]	Li M, Zhou H, Yang H, Zhang R. RT: a retrieving and chain-of-thought framework for few-shot medical named entity recognition. Journal of the American Medical Informatics Association, 2024, 13( 9): 1929–1938

[92]	Jiang G, Ding Z, Shi Y, Yang D. P-ICL: point in-context learning for named entity recognition with large language models. 2024, arXiv preprint arXiv: 2405.04960

[93]	Xie T, Zhang J, Zhang Y, Liang Y, Li Q, Wang H. Retrieval augmented instruction tuning for open ner with large language models. 2024, arXiv preprint arXiv:2406.17305

[94]	Li J, Li H, Sun D, Wang J, Zhang W, Wang Z, Pan G. LLMs as bridges: reformulating grounded multimodal named entity recognition. In: Proceedings of the Findings of the Association for Computational Linguistics ACL 2024. 2024, 1302−1318

[95]	Ye J, Xu N, Wang Y, Zhou J, Zhang Q, Gui T, Huang X. LLM-DA: data augmentation via large language models for few-shot named entity recognition. 2024, arXiv preprint arXiv: 2402.14568

[96]	Li G, Wang P, Ke W. Revisiting large language models as zero-shot relation extractors. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 6877−6892

[97]	Pang C, Cao Y, Ding Q, Luo P. Guideline learning for in-context information extraction. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 15372−15389

[98]	Zhang K, Gutierrez B J, Su Y. Aligning instruction tasks unlocks large language models as zero-shot relation extractors. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023. 2023, 794−812

[99]	Wan Z, Cheng F, Mao Z, Liu Q, Song H, Li J, Kurohashi S. GPT-RE: in-context learning for relation extraction using large language models. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 3534−3547

[100]

Ma M D, Wang X, Kung P N, Brantingham P J, Peng N, Wang W. STAR: boosting low-resource information extraction by structure-to-text data generation with large language models. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2024, 18751−18759

[101]

Wang Q, Zhou K, Qiao Q, Li Y, Li Q. Improving unsupervised relation extraction by augmenting diverse sentence pairs. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 12136−12147

[102]

Li B, Yu D, Ye W, Zhang J, Zhang S. Sequence generation with label augmentation for relation extraction. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. 2023, 13043−13050

[103]

Guo Q, Yang Y, Yan H, Qiu X, Zhang Z. DORE: document ordered relation extraction based on generative framework. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022. 2022, 3463−3474

[104]

Ma X, Li J, Zhang M. Chain of thought with explicit evidence reasoning for few-shot relation extraction. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 2334−2352

[105]

Zhou S, Meng Y, Jin B, Han J. Grasping the essentials: tailoring large language models for zero-shot relation extraction. 2024, arXiv preprint arXiv: 2402.11142

[106]

Qi J, Ji K, Wang X, Yu J, Zeng K, Hou L, Li J, Xu B. Mastering the task of open information extraction with large language models and consistent reasoning environment. 2023, arXiv preprint arXiv: 2310.10590

[107]

Li G, Wang P, Liu J, Guo Y, Ji K, Shang Z, Xu Z. Meta in-context learning makes large language models better zero and few-shot relation extractors. In: Proceedings of the 33rd International Joint Conference on Artificial Intelligence. 2024

[108]

Otto W, Upadhyaya S, Dietze S. Enhancing software-related information extraction via single-choice question answering with large language models. 2024, arXiv preprint arXiv: 2404.05587

[109]

Shi Z, Luo H. CRE-LLM: a domain-specific Chinese relation extraction framework with fine-tuned large language model. 2024, arXiv preprint arXiv: 2404.18085

[110]

Li G, Wang P, Ke W, Guo Y, Ji K, Shang Z, Liu J, Xu Z. Recall, retrieve and reason: towards better in-context relation extraction. In: Proceedings of the 33rd International Joint Conference on Artificial Intelligence. 2024

[111]

Li G, Xu Z, Shang Z, Liu J, Ji K, Guo Y. Empirical analysis of dialogue relation extraction with large language models. In: Proceedings of the 33rd International Joint Conference on Artificial Intelligence. 2024

[112]

Efeoglu S, Paschke A. Retrieval-augmented generation-based relation extraction. 2024, arXiv preprint arXiv: 2404.13397

[113]

Li Y, Peng X, Li J, Zuo X, Peng S, Pei D, Tao C, Xu H, Hong N. Relation extraction using large language models: a case study on acupuncture point locations. Journal of the American Medical Informatics Association, 2024: ocae233

[114]

Fan Z, He S. Efficient data learning for open information extraction with pre-trained language models. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 13056−13063

[115]

Kwak A S, Jeong C, Forte G, Bambauer D, Morrison C, Surdeanu M. Information extraction from legal wills: how well does GPT-4 do? In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 4336−4353

[116]

Sun Q, Huang K, Yang X, Tong R, Zhang K, Poria S. Consistency guided knowledge retrieval and denoising in LLMs for zero-shot document-level relation triplet extraction. In: Proceedings of the ACM Web Conference 2024. 2024, 4407−4416

[117]

Ozyurt Y, Feuerriegel S, Zhang C. In-context few-shot relation extraction via pre-trained language models. 2023, arXiv preprint arXiv: 2310.11085

[118]

Xue L, Zhang D, Dong Y, Tang J. AutoRE: document-level relation extraction with large language models. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 211−220

[119]

Liu Y, Peng X, Du T, Yin J, Liu W, Zhang X. ERA-CoT: improving chain-of-thought through entity relationship analysis. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 8780−8794

[120]

Li G, Ke W, Wang P, Xu Z, Ji K, Liu J, Shang Z, Luo Q. Unlocking instructive in-context learning with tabular prompting for relational triple extraction. In: Proceedings of 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024

[121]

Ding Z, Huang W, Liang J, Xiao Y, Yang D. Improving recall of large language models: a model collaboration approach for relational triple extraction. In: Proceedings of 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024, 8890−8901

[122]

Ni X, Li P, Li H. Unified text structuralization with instruction-tuned language models. 2023, arXiv preprint arXiv: 2303.14956

[123]

Zaratiana U, Tomeh N, Holat P, Charnois T. An autoregressive text-to-graph framework for joint entity and relation extraction. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence. 2024, 19477−19487

[124]

Peng L, Wang Z, Yao F, Wang Z, Shang J. MetaIE: distilling a meta model from LLM for all kinds of information extraction tasks. 2024, arXiv preprint arXiv: 2404.00457

[125]

Atuhurra J, Dujohn S C, Kamigaito H, Shindo H, Watanabe T. Distilling named entity recognition models for endangered species from large language models. 2024, arXiv preprint arXiv: 2403.15430

[126]

Tang X, Su Q, Wang J, Deng Z. CHisIEC: an information extraction corpus for ancient Chinese history. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024, 3192−3202

[127]

Ben Veyseh A P, Lai V, Dernoncourt F, Nguyen T H. Unleash GPT-2 power for event detection. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021, 6271−6282

[128]

Xia N, Yu H, Wang Y, Xuan J, Luo X. DAFS: a domain aware few shot generative model for event detection. Machine Learning, 2023, 112( 3): 1011–1031

[129]

Cai Z, Kung P N, Suvarna A, Ma M, Bansal H, Chang B, Brantingham P J, Wang W, Peng N. Improving event definition following for zero-shot event detection. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024

[130]

Li S, Ji H, Han J. Document-level event argument extraction by conditional generation. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021, 894−908

[131]

Lu Y, Lin H, Xu J, Han X, Tang J, Li A, Sun L, Liao M, Chen S. Text2Event: controllable sequence-to-structure generation for end-to-end event extraction. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021, 2795−2806

[132]

Zhou Y, Shen T, Geng X, Long G, Jiang D. ClarET: pre-training a correlation-aware context-to-event transformer for event-centric generation and classification. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022, 2559−2575

[133]

Huang K H, Hsu I, Natarajan P, Chang K W, Peng N. Multilingual generative language models for zero-shot cross-lingual event argument extraction. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022, 4633−4646

[134]

Ma Y, Wang Z, Cao Y, Li M, Chen M, Wang K, Shao J. Prompt for extraction? PAIE: prompting argument interaction for event argument extraction. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022, 6759−6774

[135]

Liu X, Huang H, Shi G, Wang B. Dynamic prefix-tuning for generative template-based event extraction. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022, 5216−5228

[136]

Cai E, O’Connor B. A Monte Carlo language model pipeline for zero-shot sociopolitical event extraction. In: Proceedings of the NeurIPS 2023. 2023

[137]

Luo L, Xu Y. Context-aware prompt for generation-based event argument extraction with diffusion models. In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023, 1717−1725

[138]

Lu D, Ran S, Tetreault J, Jaimes A. Event extraction as question generation and answering. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 1666−1688

[139]

van Nguyen C, Man H, Nguyen T H. Contextualized soft prompts for extraction of event arguments. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023. 2023, 4352−4361

[140]

Hsu I H, Xie Z, Huang K, Natarajan P, Peng N. AMPERE: AMR-aware prefix for generation-based event argument extraction model. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 10976−10993

[141]

Duan J, Liao X, An Y, Wang J. KeyEE: enhancing low-resource generative event extraction with auxiliary keyword sub-prompt. Big Data Mining and Analytics, 2024, 7( 2): 547–560

[142]

Lin Z, Zhang H, Song Y. Global constraints with prompting for zero-shot event argument classification. In: Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023. 2023, 2482−2493

[143]

Liu W, Zhou L, Zeng D, Xiao Y, Cheng S, Zhang C, Lee G, Zhang M, Chen W. Beyond single-event extraction: towards efficient document-level multi-event argument extraction. In: Proceedings of the Findings of the Association for Computational Linguistics ACL 2024. 2024, 9470−9487

[144]

Zhang X F, Blum C, Choji T, Shah S, Vempala A. ULTRA: unleash LLMs’ potential for event argument extraction through hierarchical modeling and pair-wise self-refinement. In: Proceedings of the Findings of the Association for Computational Linguistics ACL 2024. 2024

[145]

Sun Z, Pergola G, Wallace B, He Y. Leveraging ChatGPT in pharmacovigilance event extraction: an empirical study. In: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics. 2024, 344−357

[146]

Zhou H, Qian J, Feng Z, Hui L, Zhu Z, Mao K. LLMs learn task heuristics from demonstrations: a heuristic-driven prompting strategy for document-level event argument extraction. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 11972−11990

[147]

Hsu I H, Huang K H, Boschee E, Miller S, Natarajan P, Chang K W, Peng N. DEGREE: a data-efficient generation-based event extraction model. In: Proceedings of 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022, 1890−1908

[148]

Zhao G, Gong X, Yang X, Dong G, Lu S, Li S. DemoSG: demonstration-enhanced schema-guided generation for low-resource event extraction. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 1805−1816

[149]

Gao J, Zhao H, Wang W, Yu C, Xu R. EventRL: enhancing event extraction with outcome supervision for large language models. 2024, arXiv preprint arXiv: 2402.11430

[150]

Huang K H, Hsu I H, Parekh T, Xie Z, Zhang Z, Natarajan P, Chang K W, Peng N, Ji H. TextEE: benchmark, reevaluation, reflections, and future challenges in event extraction. In: Proceedings of the Findings of the Association for Computational Linguistics ACL 2024. 2024, 12804−12825

[151]

Wang C, Liu X, Chen Z, Hong H, Tang J, Song D. DeepStruct: pretraining of language models for structure prediction. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022. 2022, 803−823

[152]

Li J, Zhang Y, Liang B, Wong K F, Xu R. Set learning for generative information extraction. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 13043−13052

[153]

Wei X, Chen Y, Cheng N, Cui X, Xu J, Han W. CollabKG: a learnable human-machine-cooperative information extraction toolkit for (event) knowledge graph construction. In: Proceedings of 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024

[154]

Wang J, Chang Y, Li Z, An N, Ma Q, Hei L, Luo H, Lu Y, Ren F. TechGPT-2.0: a large language model project to solve the task of knowledge graph construction. 2024, arXiv preprint arXiv: 2401.04507

[155]

Xiao X, Wang Y, Xu N, Wang Y, Yang H, Wang M, Luo Y, Wang L, Mao W, Zeng D. YAYI-UIE: a chat-enhanced instruction tuning framework for universal information extraction. 2023, arXiv preprint arXiv: 2312.15548

[156]

Xu J, Sun M, Zhang Z, Zhou J. ChatUIE: exploring chat-based unified information extraction using large language models. In: Proceedings of 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024, 3146−3152

[157]

Gui H, Yuan L, Ye H, Zhang N, Sun M, Liang L, Chen H. IEPile: unearthing large scale schema-conditioned information extraction corpus. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 127−146

[158]

Guo Q, Guo Y, Zhao J. Diluie: constructing diverse demonstrations of in-context learning with large language model for unified information extraction. Neural Computing and Applications, 2024, 36( 22): 13491–13512

[159]

Bi Z, Chen J, Jiang Y, Xiong F, Guo W, Chen H, Zhang N. CodeKGC: code language model for generative knowledge graph construction. ACM Transactions on Asian and Low-Resource Language Information Processing, 2024, 23( 3): 45

[160]

Li Z, Zeng Y, Zuo Y, Ren W, Liu W, Su M, Guo Y, Liu Y, Lixiang L, Hu Z, Bai L, Li W, Liu Y, Yang P, Jin X, Guo J, Cheng X. KnowCoder: coding structured knowledge into LLMs for universal information extraction. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 8758−8779

[161]

Li J, Jia Z, Zheng Z. Semi-automatic data enhancement for document-level relation extraction with distant supervision from large language models. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 5495−5505

[162]

Tang R, Han X, Jiang X, Hu X. Does synthetic data generation of LLMs help clinical text mining? 2023, arXiv preprint arXiv: 2303.04360

[163]

Meoni S, De la Clergerie E, Ryffel T. Large language models as instructors: a study on multilingual clinical entity extraction. In: Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks. 2023, 178−190

[164]

Evans J, Sadruddin S, D’Souza J. Astro-NER–astronomy named entity recognition: is GPT a good domain expert annotator? 2024, arXiv preprint arXiv: 2405.02602

[165]

Naraki Y, Yamaki R, Ikeda Y, Horie T, Naganuma H. Augmenting NER datasets with LLMs: towards automated and refined annotation. 2024, arXiv preprint arXiv: 2404.01334

[166]

Chen F, Feng Y. Chain-of-thought prompt distillation for multimodal named entity recognition and multimodal relation extraction. 2023, arXiv preprint arXiv: 2306.14122

[167]

Li J, Li H, Pan Z, Sun D, Wang J, Zhang W, Pan G. Prompting ChatGPT in MNER: enhanced multimodal named entity recognition with auxiliary refined knowledge. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 2787−2802

[168]

Josifoski M, Sakota M, Peyrard M, West R. Exploiting asymmetry for synthetic training data generation: synthIE and the case of information extraction. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 1555−1574

[169]

Wadhwa S, Amir S, Wallace B. Revisiting relation extraction in the era of large language models. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 15566−15589

[170]

Yuan C, Xie Q, Ananiadou S. Zero-shot temporal relation extraction with ChatGPT. In: Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks. 2023, 92−102

[171]

Bian J, Zheng J, Zhang Y, Zhu S. Inspire the large language model by external knowledge on biomedical named entity recognition. 2023, arXiv preprint arXiv: 2309.12278

[172]

Hu Y, Chen Q, Du J, Peng X, Keloth V K, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, Roberts K, Xu H. Improving large language models for clinical named entity recognition via prompt engineering. Journal of the American Medical Informatics Association, 2024, 31( 9): 1812–1820

[173]

Shao W, Zhang R, Ji P, Fan D, Hu Y, Yan X, Cui C, Tao Y, Mi L, Chen L. Astronomical knowledge entity extraction in astrophysics journal articles via large language models. Research in Astronomy and Astrophysics, 2024, 24( 6): 065012

[174]

Geng S, Josifosky M, Peyrard M, West R. Flexible grammar-based constrained decoding for language models. 2023, arXiv preprint arXiv: 2305.13971

[175]

Liu T, Jiang Y E, Monath N, Cotterell R, Sachan M. Autoregressive structured prediction with language models. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022. 2022, 993−1005

[176]

Chen X, Li L, Deng S, Tan C, Xu C, Huang F, Si L, Chen H, Zhang N. LightNER: a lightweight tuning paradigm for low-resource NER via pluggable prompting. In: Proceedings of the 29th International Conference on Computational Linguistics. 2022, 2374−2387

[177]

Nie B, Shao Y, Wang Y. Know-adapter: towards knowledge-aware parameter-efficient transfer learning for few-shot named entity recognition. In: Proceedings of 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024, 9777−9786

[178]

Zhang J, Liu X, Lai X, Gao Y, Wang S, Hu Y, Lin Y. 2INER: instructive and in-context learning on few-shot named entity recognition. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 3940−3951

[179]

Monajatipoor M, Yang J, Stremmel J, Emami M, Mohaghegh F, Rouhsedaghat M, Chang K W. LLMs in biomedicine: a study on clinical named entity recognition. 2024, arXiv preprint arXiv: 2404.07376

[180]

Dunn A, Dagdelen J, Walker N, Lee S, Rosen A S, Ceder G, Persson K, Jain A. Structured information extraction from complex scientific text with fine-tuned large language models. 2022, arXiv preprint arXiv: 2212.05238

[181]

Cheung J, Zhuang Y, Li Y, Shetty P, Zhao W, Grampurohit S, Ramprasad R, Zhang C. POLYIE: a dataset of information extraction from polymer material scientific literature. In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2024

[182]

Dagdelen J, Dunn A, Lee S, Walker N, Rosen A S, Ceder G, Persson K A, Jain A. Structured information extraction from scientific text with large language models. Nature Communications, 2024, 15( 1): 1418

[183]

Ma M D, Taylor A, Wang W, Peng N. DICE: data-efficient clinical event extraction with generative models. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 15898−15917

[184]

Hu Y, Ameer I, Zuo X, Peng X, Zhou Y, Li Z, Li Y, Li J, Jiang X, Xu H. Zero-shot clinical entity recognition using ChatGPT. 2023, arXiv preprint arXiv: 2303.16416

[185]

Agrawal M, Hegselmann S, Lang H, Kim Y, Sontag D. Large language models are few-shot clinical information extractors. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022, 1998−2022

[186]

Labrak Y, Rouvier M, Dufour R. A zero-shot and few-shot study of instruction-finetuned large language models applied to clinical and biomedical tasks. In: Proceedings of 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024

[187]

Gutiérrez B J, McNeal N, Washington C, Chen Y, Li L, Sun H, Su Y. Thinking about GPT-3 in-context learning for biomedical IE? Think again. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022. 2022, 4497−4512

[188]

Biana J, Zhai W, Huang X, Zheng J, Zhu S. VANER: leveraging large language model for versatile and adaptive biomedical named entity recognition. 2024, arXiv preprint arXiv: 2404.17835

[189]

González-Gallardo C E, Boros E, Girdhar N, Hamdi A, Moreno J G, Doucet A. yes but.. can ChatGPT identify entities in historical documents? In: Proceedings of 2023 ACM/IEEE Joint Conference on Digital Libraries. 2023, 184−189

[190]

Xie T, Li Q, Zhang J, Zhang Y, Liu Z, Wang H. Empirical study of zero-shot NER with ChatGPT. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 7935−7956

[191]

Gao J, Zhao H, Yu C, Xu R. Exploring the feasibility of ChatGPT for event extraction. 2023, arXiv preprint arXiv: 2303.03836

[192]

Gui H, Zhang J, Ye H, Zhang N. InstructIE: a Chinese instruction-based information extraction dataset. 2023, arXiv preprint arXiv: 2305.11527

[193]

Han R, Peng T, Yang C, Wang B, Liu L, Wan X. Is information extraction solved by ChatGPT? an analysis of performance, evaluation criteria, robustness and errors. 2023, arXiv preprint arXiv: 2305.14450

[194]

Katz U, Vetzler M, Cohen A, Goldberg Y. NERetrieve: dataset for next generation named entity recognition and retrieval. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 3340−3354

[195]

Li B, Fang G, Yang Y, Wang Q, Ye W, Zhao W, Zhang S. Evaluating ChatGPT’s information extraction capabilities: an assessment of performance, explainability, calibration, and faithfulness. 2023, arXiv preprint arXiv: 2304.11633

[196]

Fei H, Zhang M, Zhang M, Chua T S. XNLP: an interactive demonstration system for universal structured NLP. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024

[197]

Liu C, Zhao F, Kang Y, Zhang J, Zhou X, Sun C, Kuang K, Wu F. RexUIE: a recursive method with explicit schema instructor for universal information extraction. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 15342−15359

[198]

Zhu T, Ren J, Yu Z, Wu M, Zhang G, Qu X, Chen W, Wang Z, Huai B, Zhang M. Mirror: a universal framework for various information extraction tasks. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 8861−8876

[199]

Bhagavatula C, Le Bras R, Malaviya C, Sakaguchi K, Holtzman A, Rashkin H, Downey D, Yih S W T, Choi Y. Abductive commonsense reasoning. In: Proceedings of the 8th International Conference on Learning Representations. 2020

[200]

OpenAI. Introduce ChatGPT. See openai.com/index/chatgpt/ website, 2023

[201]

Whitehouse C, Choudhury M, Aji A F. LLM-powered data augmentation for enhanced cross-lingual performance. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023

[202]

Wu L, Zheng Z, Qiu Z, Wang H, Gu H, Shen T, Qin C, Zhu C, Zhu H, Liu Q, Xiong H, Chen E. A survey on large language models for recommendation. World Wide Web, 2024, 27( 5): 60

[203]

Chen Y, Wang Q, Wu S, Gao Y, Xu T, Hu Y. TOMGPT: reliable text-only training approach for cost-effective multi-modal large language model. ACM Transactions on Knowledge Discovery from Data, 2024, 18( 7): 171

[204]

Luo P, Xu T, Liu C, Zhang S, Xu L, Li M, Chen E. Bridging gaps in content and knowledge for multimodal entity linking. In: Proceedings of the ACM Multimedia 2024. 2024

[205]

Yang H, Zhao X, Huang S, Li Q, Xu G. LATEX-GCL: large language models (LLMs)-based data augmentation for text-attributed graph contrastive learning. 2024, arXiv preprint arXiv: 2409.01145

[206]

Gao Y, Xiong Y, Gao X, Jia K, Pan J, Bi Y, Dai Y, Sun J, Wang M, Wang H. Retrieval-augmented generation for large language models: a survey. 2023, arXiv preprint arXiv: 2312.10997

[207]

Gao L, Biderman S, Black S, Golding L, Hoppe T, Foster C, Phang J, He H, Thite A, Nabeshima N, Presser S, Leahy C. The pile: an 800GB dataset of diverse text for language modeling. 2020, arXiv preprint arXiv: 2101.00027

[208]

Marvin G, Hellen N, Jjingo D, Nakatumba-Nabende J. Prompt engineering in large language models. In: Jacob I J, Piramuthu S, Falkowski-Gilski P. Data Intelligence and Cognitive Informatics. Singapore: Springer, 2024, 387−402

[209]

Zhao H, Zheng S, Wu L, Yu B, Wang J. LANE: logic alignment of non-tuning large language models and online recommendation systems for explainable reason generation. 2024, arXiv preprint arXiv: 2407.02833

[210]

Zheng Z, Qiu Z, Hu X, Wu L, Zhu H, Xiong H. Generative job recommendations with large language model. 2023, arXiv preprint arXiv: 2307.02157

[211]

Wu L, Qiu Z, Zheng Z, Zhu H, Chen E. Exploring large language model for graph data understanding in online job recommendations. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence. 2024, 9178−9186

[212]

Zheng Z, Chao W, Qiu Z, Zhu H, Xiong H. Harnessing large language models for text-rich sequential recommendation. In: Proceedings of the ACM Web Conference 2024. 2024, 3207−3216

[213]

Chen B, Zhang Z, Langrené N, Zhu S. Unleashing the potential of prompt engineering in large language models: a comprehensive review. 2023, arXiv preprint arXiv: 2310.14735

[214]

Zhao Z, Lin F, Zhu X, Zheng Z, Xu T, Shen S, Li X, Yin Z, Chen E. DynLLM: when large language models meet dynamic graph recommendation. 2024, arXiv preprint arXiv: 2405.07580

[215]

Wang J, Shi E, Yu S, Wu Z, Ma C, Dai H, Yang Q, Kang Y, Wu J, Hu H, Yue C, Zhang H, Liu Y, Pan Y, Liu Z, Sun L, Li X, Ge B, Jiang X, Zhu D, Yuan Y, Shen D, Liu T, Zhang S. Prompt engineering for healthcare: methodologies and applications. 2023, arXiv preprint arXiv: 2304.14670

[216]

Xu D, Zhang Z, Lin Z, Wu X, Zhu Z, Xu T, Zhao X, Zheng Y, Chen E. Multi-perspective improvement of knowledge graph completion with large language models. In: Proceedings of 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024, 11956−11968

[217]

Li X, Zhou J, Chen W, Xu D, Xu T, Chen E. Visualization recommendation with prompt-based reprogramming of large language models. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 13250−13262

[218]

Liu C, Xie Z, Zhao S, Zhou J, Xu T, Li M, Chen E. Speak from heart: an emotion-guided LLM-based multimodal method for emotional dialogue generation. In: Proceedings of 2024 International Conference on Multimedia Retrieval. 2024, 533−542

[219]

Peng W, Xu D, Xu T, Zhang J, Chen E. Are GPT embeddings useful for ads and recommendation? In: Proceedings of the 16th International Conference on Knowledge Science, Engineering and Management. 2023, 151−162

[220]

Peng W, Yi J, Wu F, Wu S, Zhu B B, Lyu L, Jiao B, Xu T, Sun G, Xie X. Are you copying my model? Protecting the copyright of large language models for EaaS via backdoor watermark. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 7653−7668

[221]

Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, Chi E H, Le Q V, Zhou D. Chain-of-thought prompting elicits reasoning in large language models. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1800

[222]

Chu Z, Chen J, Chen Q, Yu W, He T, Wang H, Peng W, Liu M, Qin B, Liu T. A survey of chain of thought reasoning: advances, frontiers and future. 2023, arXiv preprint arXiv: 2309.15402

[223]

Kojima T, Gu S S, Reid M, Matsuo Y, IwasawaY. Large language models are zero-shot reasoners. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 22199−22213

[224]

Yin S, Fu C, Zhao S, Li K, Sun X, Xu T, Chen E. A survey on multimodal large language models. 2023, arXiv preprint arXiv: 2306.13549

[225]

Willard B T, Louf R. Efficient guided generation for large language models. 2023, arXiv preprint arXiv: 2307.09702

[226]

Beurer-Kellner L, Müller M N, Fischer M, Vechev M. Prompt sketching for large language models. 2023, arXiv preprint arXiv: 2311.04954

[227]

Zheng L, Yin L, Xie Z, Huang J, Sun C, Yu C H, Cao S, Kozyrakis C, Stoica I, Gonzalez J E, Barrett C, Sheng Y. Efficiently programming large language models using SGLang. 2023, arXiv preprint arXiv: 2312.07104

[228]

Huang J, Li C, Subudhi K, Jose D, Balakrishnan S, Chen W, Peng B, Gao J, Han J. Few-shot named entity recognition: an empirical baseline study. In: Proceedings of 2021 Conference on Empirical Methods in Natural Language Processing. 2021, 10408−10423

[229]

Liu Z, Wu L, He M, Guan Z, Zhao H, Feng N. Dr.E bridges graphs with large language models through words. 2024, arXiv preprint arXiv: 2406.15504

[230]

Guan Z, Zhao H, Wu L, He M, Fan J. LangTopo: aligning language descriptions of graphs with tokenized topological modeling. 2024, arXiv preprint arXiv: 2406.13250

[231]

Zha R, Zhang L, Li S, Zhou J, Xu T, Xiong H, Chen E. Scaling up multivariate time series pre-training with decoupled spatial-temporal representations. In: Proceedings of the 40th IEEE International Conference on Data Engineering. 2024, 667−678

[232]

Zhao L, Liu Q, Yue L, Chen W, Chen L, Sun R, Song C. COMI: COrrect and mitigate shortcut learning behavior in deep neural networks. In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2024, 218−228

[233]

Lin F, Zhao Z, Zhu X, Zhang D, Shen S, Li X, Xu T, Zhang S, Chen E. When box meets graph neural network in tag-aware recommendation. In: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2024, 1770−1780

[234]

Liu Q, Wu X, Zhao X, Zhu Y, Xu D, Tian F, Zheng Y. When MOE meets LLMs: parameter efficient fine-tuning for multi-task medical applications. In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2024, 1104−1114

[235]

Liu Q, Wu X, Zhao X, Zhu Y, Zhang Z, Tian F, Zheng Y. Large language model distilling medication recommendation model. 2024, arXiv preprint arXiv: 2402.02803

[236]

Wang Y, Wang Y, Fu Z, Li X, Zhao X, Guo H, Tang R. LLM4MSR: an LLM-enhanced paradigm for multi-scenario recommendation. 2024, arXiv preprint arXiv: 2406.12529

[237]

Zhao Z, Fan W, Li J, Liu Y, Mei X, Wang Y Q. Recommender systems in the era of large language models (LLMs). IEEE Transactions on Knowledge and Data Engineering, 2024, 36( 11): 6889–6907

[238]

Qiao S, Ou Y, Zhang N, Chen X, Yao Y, Deng S, Tan C, Huang F, Chen H. Reasoning with language model prompting: a survey. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 5368−5393

[239]

Sang E F T K, De Meulder F. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the 7th Conference on Natural Language Learning. 2003, 142−147

[240]

Roth D, Yih W T. A linear programming formulation for global inference in natural language tasks. In: Proceedings of the 8th Conference on Computational Natural Language Learning. 2004, 1−8

[241]

Walker C, Strassel S, Medero J, Maeda K. Ace 2005 multilingual training corpus-linguistic data consortium. See catalog.ldc.upenn.edu/LDC2006T06 website, 2005

[242]

Doddington G R, Mitchell A, Przybocki M A, Ramshaw L A, Strassel S M, Weischedel R M. The automatic content extraction (ACE) program - tasks, data, and evaluation. In: Proceedings of the 4th International Conference on Language Resources and Evaluation. 2004, 837−840

[243]

Li J, Sun Y, Johnson R J, Sciaky D, Wei C H, Leaman R, Davis A P, Mattingly C J, Wiegers T C, Lu Z. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database, 2016, 2016: baw068

[244]

Derczynski L, Bontcheva K, Roberts I. Broad twitter corpus: a diverse named entity recognition resource. In: Proceedings of the 26th International Conference on Computational Linguistics. 2016, 1169−1179

[245]

Karimi S, Metke-Jimenez A, Kemp M, Wang C. C_ADEC: a corpus of adverse drug event annotations. Journal of Biomedical Informatics, 2015, 55: 73–81

[246]

Wang Z, Shang J, Liu L, Lu L, Liu J, Han J. CrossWeigh: training named entity tagger from imperfect annotations. In: Proceedings of 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019, 5153−5162

[247]

Liu Z, Xu Y, Yu T, Dai W, Ji Z, Cahyawijaya S, Madotto A, Fung P. CrossNER: evaluating cross-domain named entity recognition. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence. 2021, 13452−13460

[248]

Kumar A, Starly B. ”FabNER”: information extraction from manufacturing process science domain literature using named entity recognition. Journal of Intelligent Manufacturing, 2022, 33( 8): 2393–2407

[249]

Ding N, Xu G, Chen Y, Wang X, Han X, Xie P, Zheng H, Liu Z. Few-NERD: a few-shot named entity recognition dataset. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021, 3198−3213

[250]

Guan R, Man K L, Chen F, Yao S, Hu R, Zhu X, Smith J, Lim E G, Yue Y. FindVehicle and VehicleFinder: a NER dataset for natural language-based vehicle retrieval and a keyword-based cross-modal vehicle retrieval system. Multimedia Tools and Applications, 2024, 83: 24841–24874

[251]

Kim J D, Ohta T, Tateisi Y, Tsujii J. GENIA corpus - a semantically annotated corpus for bio-textmining. Bioinformatics, 2003, 19( S1): i180–i182

[252]

Chen P, Xu H, Zhang C, Huang R. Crossroads, buildings and neighborhoods: a dataset for fine-grained location recognition. In: Proceedings of 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022, 3329−3339

[253]

Liu J, Pasupat P, Cyphers S, Glass J. Asgard: a portable architecture for multilingual dialogue systems. In: Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 2013, 8386−8390

[254]

Tedeschi S, Navigli R. MultiNERD: a multilingual, multi-genre and fine-grained dataset for named entity recognition (and disambiguation). In: Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022. 2022, 801−812

[255]

Doğan R I, Leaman R, Lu Z. NCBI disease corpus: a resource for disease name recognition and concept normalization. Journal of Biomedical Informatics, 2014, 47: 1–10

[256]

Pradhan S, Moschitti A, Xue N, Ng H T, Björkelund A, Uryupina O, Zhang Y, Zhong Z. Towards robust linguistic analysis using OntoNotes. In: Proceedings of the 17th Conference on Computational Natural Language Learning. 2013, 143−152

[257]

Pradhan S, Elhadad N, South B R, Martínez D, Christensen L, Vogel A, Suominen H, Chapman W W, Savova G. Task 1: ShARe/CLEF eHealth evaluation lab 2013. In: Proceedings of the Working Notes for CLEF 2013 Conference. 2013

[258]

Mowery D L, Velupillai S, South B R, Christensen L, Martínez D, Kelly L, Goeuriot L, Elhadad N, Pradhan S, Savova G, Chapman W W. Task 2: ShARe/CLEF eHealth evaluation lab 2014. In: Proceedings of the Working Notes for CLEF 2014 Conference. 2014, 31−42

[259]

Lu D, Neves L, Carvalho V, Zhang N, Ji H. Visual attention model for name tagging in multimodal social media. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018, 1990−1999

[260]

Rijhwani S, Preotiuc-Pietro D. Temporally-informed analysis of named entity recognition. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, 7605−7617

[261]

Jiang H, Hua Y, Beeferman D, Roy D. Annotating the tweebank corpus on named entity recognition and building NLP models for social media analysis. In: Proceedings of the 13th Language Resources and Evaluation Conference. 2022, 7199−7208

[262]

Zhang Q, Fu J, Liu X, Huang X. Adaptive co-attention network for named entity recognition in tweets. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 2018, 5674−5681

[263]

Ushio A, Barbieri F, Silva V, Neves L, Camacho-Collados J. Named entity recognition in twitter: a dataset and analysis on short-term temporal shifts. In: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing. 2022, 309−319

[264]

Wang X, Tian J, Gui M, Li Z, Wang R, Yan M, Chen L, Xiao Y. WikiDiverse: a multimodal entity linking dataset with diversified contextual topics and entity types. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022, 4785−4797

[265]

Derczynski L, Nichols E, van Erp M, Limsopatham N. Results of the WNUT2017 shared task on novel and emerging entity recognition. In: Proceedings of the 3rd Workshop on Noisy User-Generated Text. 2017, 140−147

[266]

Gurulingappa H, Rajput A M, Roberts A, Fluck J, Hofmann-Apitius M, Toldo L. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. Journal of Biomedical Informatics, 2012, 45( 5): 885–892

[267]

Yao Y, Ye D, Li P, Han X, Lin Y, Liu Z, Liu Z, Huang L, Zhou J, Sun M. DocRED: a large-scale document-level relation extraction dataset. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019, 764−777

[268]

Zheng C, Wu Z, Feng J, Fu Z, Cai Y. MNRE: a challenge multimodal dataset for neural relation extraction with visual evidence in social media posts. In: Proceedings of 2021 IEEE International Conference on Multimedia and Expo. 2021, 1−6

[269]

Riedel S, Yao L, McCallum A. Modeling relations and their mentions without labeled text. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases. 2010, 148−163

[270]

Stoica G, Platanios E A, Poczos B. Re-TACRED: addressing shortcomings of the TACRED dataset. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence. 2021, 13843−13850

[271]

Luan Y, He L, Ostendorf M, Hajishirzi H. Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction. In: Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing. 2018, 3219−3232

[272]

Hendrickx I, Kim S N, Kozareva Z, NakovP, Séaghdha D Ó, Padó S, Pennacchiotti M, Romano L, Szpakowicz S. SemEval-2010 task 8: multi-way classification of semantic relations between pairs of nominals. In: Proceedings of the 5th International Workshop on Semantic Evaluation. 2010, 33−38

[273]

Zhang Y, Zhong V, Chen D, Angeli G, Manning C D. Position-aware attention and supervised data improve slot filling. In: Proceedings of 2017 Conference on Empirical Methods in Natural Language Processing. 2017, 35−45

[274]

Alt C, Gabryszak A, Hennig L. TACRED revisited: a thorough evaluation of the TACRED relation extraction task. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, 1558−1569

[275]

Satyapanich T, Ferraro F, Finin T. CASIE: extracting cybersecurity event information from text. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. 2020, 8749−8757

[276]

Kim J D, Wang Y, Takagi T, Yonezawa A. Overview of genia event task in BioNLP shared task 2011. In: Proceedings of BioNLP Shared Task 2011 Workshop. 2011, 7−15

[277]

Kim J D, Wang Y, Yamamoto Y. The genia event extraction shared task, 2013 edition -overview. In: Proceedings of BioNLP Shared Task 2013 Workshop. 2013, 8−15

[278]

Sun Z, Li J, Pergola G, Wallace B, John B, Greene N, Kim J, He Y. PHEE: a dataset for pharmacovigilance event extraction from text. In: Proceedings of 2022 Conference on Empirical Methods in Natural Language Processing. 2022, 5571−5587

[279]

Ebner S, Xia P, Culkin R, Rawlins K, Van Durme B. Multi-sentence argument linking. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, 8057−8077

[280]

Zamai A, Zugarini A, Rigutini L, Ernandes M, Maggini M. Show less, instruct more: Enriching prompts with definitions and guidelines for zero-shot ner. 2024, arXiv preprint arXiv:2407.01272

[281]

Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L. BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, 7871−7880

[282]

Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu P J. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 2020, 21( 1): 140

[283]

Xue L, Constant N, Roberts A, Kale M, Al-Rfou R, Siddhant A, Barua A, Raffel C. mT5: a massively multilingual pre-trained text-to-text transformer. In: Proceedings of 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021, 483−498

[284]

Chung H W, Hou L, Longpre S, Zoph B, Tay Y, Fedus W, Li Y, Wang X, Dehghani M, Brahma S, Webson A, Gu S S, Dai Z, Suzgun M, Chen X, Chowdhery A, Castro-Ros A, Pellat M, Robinson K, Valter D, Narang S, Mishra G, Yu A, Zhao V, Huang Y, Dai A, Yu H, Petrov S, Chi E H, Dean J, Devlin J, Roberts A, Zhou D, Le Q V, Wei J. Scaling instruction-finetuned language models. 2022, arXiv preprint arXiv: 2210.11416

[285]

Du Z, Qian Y, Liu X, Ding M, Qiu J, Yang Z, Tang J. GLM: general language model pretraining with autoregressive blank infilling. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022, 320−335

[286]

Touvron H, Lavril T, Izacard G, Martinet X, Lachaux M A, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F, Rodriguez A, Joulin A, Grave E, Lample G. LLaMA: open and efficient foundation language models. 2023, arXiv preprint arXiv: 2302.13971

[287]

Taori R, Gulrajani I, Zhang T, Dubois Y, Li X. Stanford alpaca: An instruction-following llama model. See github. com/tatsu-lab/stanford_alpaca website, 2023

[288]

Chiang W L, Li Z, Lin Z, Sheng Y, Wu Z, Zhang H, Zheng L, Zhuang S, Zhuang Y, Gonzalez J E, Stoica I, Xing E P. Vicuna: an open-source chatbot impressing GPT-4 with 90%* ChatGPT quality. See vicuna. lmsys. org websit, 2023

[289]

Hugo T, Louis M, Kevin S, Albert P, Almahairi A, , . Llama 2: open foundation and fine-tuned chat models. 2023, arXiv preprint arXiv: 2307.09288

[290]

Rozière B, Gehring J, Gloeckle F, Sootla S, Gat I, Tan X E, Adi Y, Liu J, Sauvestre R, Remez T, Rapin J, Kozhevnikov A, Evtimov I, Bitton J, Bhatt M, FerrerC C, Grattafiori A, Xiong W, Défossez A, Copet J, Azhar F, Touvron H, Martin L, Usunier N, Scialom T, Synnaeve G. Code llama: open foundation models for code. 2023, arXiv preprint arXiv: 2308.12950

[291]

Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI Blog, 2019, 1( 8): 9

[292]

Brown T B, Mann B, Ryder N, Subbiah M, KaplanJ D, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D M, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D. Language models are few-shot learners. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 159

[293]

Wang B. Mesh-Transformer-JAX: model-parallel implementation of Transformer language model with JAX. See github.com/kingoflolz/mesh-transformer-jax website, 2021

[294]

Ouyang L, Wu J, Jiang X, Almeida D, Wainwright C L, Mishkin P, Zhang C, Agarwal S, Slama K, Ray A, Schulman J, Hilton J, Kelton F, Miller L, Simens M, Askell A, Welinder P, Christiano P, Leike J, Lowe R. Training language models to follow instructions with human feedback. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 27730−27744

Acknowledgements

This work was supported in part by the grants from the National Natural Science Foundation of China (Nos. 62222213, 62072423). Additionally, this research was partially supported by Research Impact Fund (No. R1015-23), APRC - CityU New Research Initiatives (No. 9610565, Start-up Grant for New Faculty of CityU), CityU - HKIDS Early Career Research Grant (No. 9360163), Hong Kong ITC Innovation and Technology Fund Midstream Research Programme for Universities Project (No. ITS/034/22MS), Hong Kong Environmental and Conservation Fund (No. 88/2022), and SIRG - CityU Strategic Interdisciplinary Research Grant (No. 7020046), Huawei (Huawei Innovation Research Program), Tencent (CCF-Tencent Open Fund, Tencent Rhino-Bird Focused Research Program), Ant Group (CCF-Ant Research Fund, Ant Group Research Fund), Alibaba (CCF-Alimama Tech Kangaroo Fund (No. 2024002)), CCF-BaiChuan-Ebtech Foundation Model Fund, and Kuaishou.

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

RIGHTS & PERMISSIONS

2024 The Author(s) 2024. This article is published with open access at link.springer.com and journal.hep.com.cn

AI Summary AI Mindmap

PDF(3347 KB)

Supplementary files

Highlights (768 KB)

1313

Accesses

Citations

Detail

Sections

Recommended

Abstract
Graphical abstract
Keywords
Cite this article
1 Introduction
Fig.1 LLMs have been extensively explored for generative IE. These studies encompass various IE techniques, specialized frameworks designed for a single subtask, and universal frameworks capable of addressing multiple subtasks simultaneously
Fig.2 Taxonomy of research in generative IE using LLMs. Some papers have been omitted due to space limitations
2 Preliminaries of generative IE
Fig.3 Examples of different IE tasks
3 LLMs for different information extraction tasks
3.1 Named entity recognition
Tab.1 Comparison of LLMs for named entity recognition (identification & typing) with the Micro-F1 metric (%). † indicates that the model is discriminative. We demonstrate some universal and discriminative models for comparison. IE techniques include Cross-Domain Learning (CDL), Zero-Shot Prompting (ZS Pr), In-Context Learning (ICL), Supervised Fine-Tuning (SFT), Data Augmentation (DA). Uni. denotes whether the model is universal. Onto. 5 denotes the OntoNotes 5.0. Details of datasets and backbones are presented in Section 8. The settings for all subsequent tables are consistent with this format
3.2 Relation extraction
Tab.2 Comparison of LLMs for relation extraction with the “relation strict” [4] Micro-F1 metric (%). † indicates that the model is discriminative
Tab.3 Comparison of LLMs for relation classification with the Micro-F1 metric (%)
3.3 Event extraction
Tab.4 Comparison of Micro-F1 Values for Event Extraction on ACE05. Evaluation tasks include: Trigger Identification (Trg-I), Trigger Classification (Trg-C), Argument Identification (Arg-I), and Argument Classification (Arg-C). † indicates that the model is discriminative
3.4 Universal information extraction
Fig.4 The comparison of prompts of NL-LLMs and Code-LLMs for universal IE. Both NL-based and code-based methods attempt to construct a universal schema, but they differ in terms of prompt format and the way they utilize the generation capabilities of LLMs. This figure is adopted from [5] and [6]
3.5 Summaries of tasks
4 Techniques of LLMs for generative IE
4.1 Data augmentation
Fig.5 Comparison of data augmentation methods
4.2 Prompt design
4.3 Zero-shot learning
4.4 Constrained decoding generation
4.5 Few-shot learning
4.6 Supervised fine-tuning
4.7 Summaries of techniques
5 Applications on specific domains
Tab.5 The statistics of research in specific domain
6 Evaluation & analysis
7 Future directions
8 Benchmarks & backbones
8.1 Representative datasets
Tab.6 A summary of some representative IE datasets
8.2 Benchmarks
Tab.7 Statistics of common datasets for information extraction. ∗ denotes the dataset is multimodal. # refers to the number of categories or sentences. The data in the table is partially referenced from InstructUIE [192]
8.3 Backbones
Tab.8 The common backbones for generative information extraction. We mark the commonly used base and large versions for better reference
9 Conclusion
References
Acknowledgements
Competing interests
Open Access
RIGHTS & PERMISSIONS

Received	Accepted	Published
02 Jun 2024	08 Oct 2024	15 Dec 2024
Just Accepted Date	Issue Date
10 Oct 2024	08 Nov 2024

About the journal

Browse

Authors & reviewers

Abstract

Graphical abstract

Keywords

Cite this article

1 Introduction

Fig.1 LLMs have been extensively explored for generative IE. These studies encompass various IE techniques, specialized frameworks designed for a single subtask, and universal frameworks capable of addressing multiple subtasks simultaneously

Fig.2 Taxonomy of research in generative IE using LLMs. Some papers have been omitted due to space limitations

2 Preliminaries of generative IE

Fig.3 Examples of different IE tasks

3 LLMs for different information extraction tasks

3.1 Named entity recognition

3.2 Relation extraction

Tab.2 Comparison of LLMs for relation extraction with the “relation strict” [4] Micro-F1 metric (%). † indicates that the model is discriminative

Tab.3 Comparison of LLMs for relation classification with the Micro-F1 metric (%)

3.3 Event extraction

Tab.4 Comparison of Micro-F1 Values for Event Extraction on ACE05. Evaluation tasks include: Trigger Identification (Trg-I), Trigger Classification (Trg-C), Argument Identification (Arg-I), and Argument Classification (Arg-C). † indicates that the model is discriminative

3.4 Universal information extraction

Fig.4 The comparison of prompts of NL-LLMs and Code-LLMs for universal IE. Both NL-based and code-based methods attempt to construct a universal schema, but they differ in terms of prompt format and the way they utilize the generation capabilities of LLMs. This figure is adopted from [5] and [6]

3.5 Summaries of tasks

4 Techniques of LLMs for generative IE

4.1 Data augmentation

Fig.5 Comparison of data augmentation methods

4.2 Prompt design

4.3 Zero-shot learning

4.4 Constrained decoding generation

4.5 Few-shot learning

4.6 Supervised fine-tuning

4.7 Summaries of techniques

5 Applications on specific domains

Tab.5 The statistics of research in specific domain

6 Evaluation & analysis

7 Future directions

8 Benchmarks & backbones

8.1 Representative datasets

Tab.6 A summary of some representative IE datasets

8.2 Benchmarks

Tab.7 Statistics of common datasets for information extraction. ∗ denotes the dataset is multimodal. # refers to the number of categories or sentences. The data in the table is partially referenced from InstructUIE [192]

8.3 Backbones

Tab.8 The common backbones for generative information extraction. We mark the commonly used base and large versions for better reference

9 Conclusion

{{custom_sec.title}}

{{custom_sec.title}}

References

Acknowledgements

Competing interests

Open Access

RIGHTS & PERMISSIONS

Tab.2 Comparison of LLMs for relation extraction with the “relation strict” [4] Micro-F1 metric (%). $^{†}$ indicates that the model is discriminative

Tab.4 Comparison of Micro-F1 Values for Event Extraction on ACE05. Evaluation tasks include: Trigger Identification (Trg-I), Trigger Classification (Trg-C), Argument Identification (Arg-I), and Argument Classification (Arg-C). $^{†}$ indicates that the model is discriminative

Tab.7 Statistics of common datasets for information extraction. $*$ denotes the dataset is multimodal. # refers to the number of categories or sentences. The data in the table is partially referenced from InstructUIE [192]