E2CNN: entity-type-enriched cascaded neural network for Chinese financial relation extraction

Mengfan LI , Xuanhua SHI , Chenqi QIAO , Xiao HUANG , Weihao WANG , Yao WAN , Teng ZHANG , Hai JIN

Front. Comput. Sci., 2025, 19(10): 1910352. DOI: 10.1007/s11704-024-3983-6
Artificial Intelligence
RESEARCH ARTICLE


Abstract

Knowledge Graphs (KGs) are pivotal for effectively organizing and managing structured information across various applications. Financial KGs have been successfully employed to advance applications such as auditing, anti-fraud, and anti-money laundering. Despite this success, the construction of Chinese financial KGs has received limited research attention due to their complex semantics. A significant challenge is the overlap triple problem, where entities participate in multiple relations within a sentence, hampering extraction accuracy: more than 39% of the triples in Chinese datasets exhibit such overlap. To address this, we propose the Entity-type-Enriched Cascaded Neural Network (E2CNN), which leverages special tokens to mark entity boundaries and types. E2CNN enforces consistency in entity types and excludes incompatible relations, mitigating the overlap triple problem and enhancing relation extraction. In addition, we introduce FINCORPUS.CN, a publicly available Chinese financial dataset annotated from the annual reports of 2,000 companies and containing 48,389 entities and 23,368 triples. Experimental results on the DuIE dataset and FINCORPUS.CN underscore E2CNN's superiority over state-of-the-art models.

Keywords

financial knowledge graph / overlap triples / cascaded neural network / relation extraction

Cite this article

Mengfan LI, Xuanhua SHI, Chenqi QIAO, Xiao HUANG, Weihao WANG, Yao WAN, Teng ZHANG, Hai JIN. E2CNN: entity-type-enriched cascaded neural network for Chinese financial relation extraction. Front. Comput. Sci., 2025, 19(10): 1910352. DOI: 10.1007/s11704-024-3983-6


1 Introduction

Knowledge graphs have become a widely adopted standard for knowledge representation in the semantic web, where knowledge is encoded as a collection of “facts”. Typically, the facts are expressed as triples of the form (subject, predicate, object), where the subject and object are entities, and the predicate indicates a relation between them. In general, knowledge graphs can be applied to support various downstream tasks, such as question answering [1,2], data integration [3], and fact-checking [4].

Currently, significant efforts have been invested in constructing knowledge graphs (KGs), with a primary focus on named entity recognition [5,6] and relation extraction [7−9]. Several domain knowledge graphs have already been established, including those for food [10] and manufacturing [11]. In this research, we focus on constructing a Chinese financial knowledge graph. This resource holds potential applications in diverse anti-fraud services, such as consumer credit services [12], credit card applications [13], and fraud detection and identification [14]. Although some attempts have been made to build language-independent neural networks for knowledge graph construction from diverse language corpora [15], extracting relations from the Chinese financial corpus remains challenging due to its unique characteristics.

One obstacle is that the same entity can appear in multiple relations within a single sentence. According to our statistics on the collected Chinese financial corpus, nearly 62.91% of entities exhibit this phenomenon, which is known as the overlap triple problem [16,17]. In Fig.1, we provide an example illustrating the entity pair overlap (EPO) and single entity overlap (SEO) patterns of overlap triples. For instance, consider the sentence “Suzhou Jiangnan Film Vice President Zhang Hua has become a board member and investor of Yunhai Film.” Multiple relations exist between “Zhang Hua” and “Yunhai Film”. Furthermore, “Suzhou Jiangnan Film” serves as the subject in the triple (Suzhou Jiangnan Film, locate, Suzhou) and as the object in the triple (Zhang Hua, belong, Suzhou Jiangnan Film).

To tackle these issues, we employ a cascaded neural network. Initially, we detect a group of candidate subjects. Thereafter, for each subject, we proceed to extract all related objects via pre-defined relations, addressing the entity pair overlap. For single entity overlap, since extraction is structured for each relation, every subject can match its potential objects in different relations.

Moreover, we observe that the relation types in the Chinese financial corpus correlate strongly with the entity types, so both the boundaries and the types of entities offer a valuable opportunity for improvement. This improvement materializes in two ways. (1) Consistency and exclusion in relation types. Objects sharing the same subject and pre-defined relation share a common type, while certain entity types and relations cannot co-occur. As illustrated in Fig.2, entities such as “Suzhou Jiangnan Film” and “Yunhai Film” are both of the “company” type and are both identified under the “belong” relation. A head entity of the “person” type, such as “Zhang Hua”, can plausibly match relations like “belong”, “cooperate”, and “invest” (whose objects are of the “company” type), but not relations like “launch”, “own”, “locate”, and “apply to”. This exclusion helps prevent false relations. (2) Defining entity boundaries. Identifying an entity's type also yields its demarcation, i.e., the beginning and end of the entity. Clearly defined boundaries aid in classifying nested entities. For example, in the triple (“Suzhou Jiangnan Film”, locate, “Suzhou”), the entity boundaries confirm the presence of two distinct entities within the phrase “Suzhou Jiangnan Film”.

In light of the above, we propose an Entity-type-Enriched Cascaded Neural Network (E2CNN) for constructing financial knowledge graphs. Specifically, we perform span-type identification, employing the special tokens [e:type] and [/e:type] to demarcate the boundaries and types of extracted entities. Then, relying on the entity-type-enriched representation, the cascaded neural network first extracts all subjects and then extracts their corresponding objects for each relation type simultaneously. The primary contributions of this paper are as follows:

● We propose a novel entity-type-enriched cascaded neural network (E2CNN) that considers the overlap triple problem and entity-type information to construct a Chinese financial knowledge graph. We investigate the challenge of relation extraction from specific financial application scenarios and data characteristics. The learned knowledge graph is publicly available on GitHub.

● We release a Chinese financial dataset (termed FINCORPUS.CN) based on a collection of Chinese financial company annual reports. The data annotation process involves manual annotation and cross-validation, resulting in 7 financial relation types and 6 financial entity types from 2,000 companies.

● We conduct comprehensive experiments on the publicly available dataset DUIE and our newly established FINCORPUS.CN. The results demonstrate that E2CNN outperforms several state-of-the-art models, highlighting its effectiveness in relation extraction.

2 Methodology

Consider the Chinese financial corpus $C = \{s_1, s_2, \ldots, s_m\}$, where $m$ is the total number of sentences in the corpus. The $i$th sentence consists of multiple words, i.e., $s_i = \{w_1, w_2, \ldots, w_n\}$, where $n$ is the number of words in the $i$th sentence. Our goal is to extract relational facts and construct a knowledge graph $G = \langle V, E \rangle$ from them, where subjects and objects are vertices and each predicate is an edge between its vertices. Fig.3 shows an overview of the proposed E2CNN. It contains two principal components: span-type identification and a cascaded neural network. Essential notations and definitions used in this paper can be found in Tab.1.

2.1 Span-type identification

A span refers to a collection of consecutive words in the sequence; the span-based method learns a deep representation for each possible span and classifies it into its corresponding type. The process is described in Algorithm 1. $p{:}q$ denotes the candidate span consisting of the consecutive words with subscripts from $p$ to $q$ in sentence $s$, $E_{p:q}$ is the encoding vector of candidate span $p{:}q$, and $P_{p:q}$ is the probability that $p{:}q$ is predicted as each type.

Specifically, we first vectorize the input sentence $s$ using the pre-trained model BERT to obtain the contextualized representation $E_s \in \mathbb{R}^{n \times d}$ for all tokens in the sentence [18], and then enumerate all possible spans in the input sentence. The original vector of a span is represented as $E_{p:q} = \{e_p, e_{p+1}, \ldots, e_q\}$ $(1 \le p \le q \le n)$, where $e_p$ and $e_q$ denote the vector representations of the head and tail of the span, respectively.
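To illustrate this step, a minimal PyTorch sketch is given below, assuming the Hugging Face release of the chinese-roberta-wwm-ext checkpoint used in our experiments; the span-length cap max_span_len is an illustrative choice rather than a tuned value.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("hfl/chinese-roberta-wwm-ext")
encoder = BertModel.from_pretrained("hfl/chinese-roberta-wwm-ext")

def encode_and_enumerate(sentence: str, max_span_len: int = 10):
    """Encode a sentence into E_s (n x d) and enumerate candidate spans (p, q), p <= q."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        E_s = encoder(**inputs).last_hidden_state[0]  # contextualized token embeddings
    n = E_s.size(0)
    # All spans of bounded length; 0-indexed here, 1-indexed in the text.
    spans = [(p, q) for p in range(n) for q in range(p, min(p + max_span_len, n))]
    return E_s, spans
```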

Based on the initial embedding, general span-based methods offer two mainstream ways to identify span types. One concatenates the hidden states of the first and last tokens with a length feature vector to create the span representation, which is then classified [19−21]:

$E_{p:q} = [e_p; e_q; l_{q-p+1}]$, (1)

where $l_{q-p+1} \in \mathbb{R}^{d_l}$ represents the length feature corresponding to the length of the span, and $d_l$ is the dimension of the length feature.
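The concatenation-based span representation of Eq. (1) can be sketched as follows; the length-embedding dimension $d_l = 25$ and the length cap are illustrative choices.

```python
import torch
import torch.nn as nn

class ConcatSpanRep(nn.Module):
    """E_{p:q} = [e_p; e_q; l_{q-p+1}]: head and tail embeddings plus a learned length feature."""
    def __init__(self, d_l: int = 25, max_len: int = 50):
        super().__init__()
        self.len_emb = nn.Embedding(max_len + 1, d_l)  # lookup table for l_{q-p+1}

    def forward(self, E_s: torch.Tensor, p: int, q: int) -> torch.Tensor:
        length = torch.tensor(min(q - p + 1, self.len_emb.num_embeddings - 1))
        return torch.cat([E_s[p], E_s[q], self.len_emb(length)], dim=-1)  # (2d + d_l,)
```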

The other approach uses a multi-head Biaffine decoder [22,23] to obtain a score matrix for each enumerated span:

$E_p = \mathrm{LeakyReLU}(E_s W_p)$,

$E_q = \mathrm{LeakyReLU}(E_s W_q)$,

$E_{p:q} = \mathrm{MHBiaffine}(E_p, E_q)$, (2)

where $W_p, W_q \in \mathbb{R}^{d \times h}$ and $h$ is the hidden size. $\mathrm{MHBiaffine}$ denotes the multi-head Biaffine decoder.
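For Eq. (2), a single-head simplification of the multi-head Biaffine decoder can be sketched as below; the bias-augmented biaffine form and the initialization are our assumptions rather than the exact construction of [22,23].

```python
import torch
import torch.nn as nn

class BiaffineSpanScorer(nn.Module):
    """Single-head simplification of Eq. (2): scores every (p, q) pair per entity type."""
    def __init__(self, d: int, h: int, n_types: int):
        super().__init__()
        self.W_p = nn.Linear(d, h)  # E_p = LeakyReLU(E_s W_p)
        self.W_q = nn.Linear(d, h)  # E_q = LeakyReLU(E_s W_q)
        self.act = nn.LeakyReLU()
        self.U = nn.Parameter(torch.empty(n_types, h + 1, h + 1))
        nn.init.xavier_uniform_(self.U)

    def forward(self, E_s: torch.Tensor) -> torch.Tensor:
        ones = torch.ones(E_s.size(0), 1, device=E_s.device)
        E_p = torch.cat([self.act(self.W_p(E_s)), ones], dim=-1)  # (n, h+1), bias term appended
        E_q = torch.cat([self.act(self.W_q(E_s)), ones], dim=-1)
        # (n, n, n_types): entry [p, q, t] scores span p..q as type t
        return torch.einsum("ph,thk,qk->pqt", E_p, self.U, E_q)
```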

Therefore, the final representation of a span can be denoted as

$E_{p:q} = \mathrm{emb}(p{:}q)$,

where the $\mathrm{emb}$ method can adopt either Eq. (1) or Eq. (2). The probability of the span belonging to each entity type is calculated as follows:

$P_{p:q} = \mathrm{softmax}(W_e E_{p:q} + b_e)$,

where $W_e$ and $b_e$ are the learnable parameters of the fully connected layer. Since we pre-define 6 entity types, $P_{p:q} \in \mathbb{R}^{6 \times 1}$. We then prune the non-entity spans (those for which no type probability is above the predefined probability threshold $\bar{P}$), sort the remaining spans by their maximum entity score in $P_{p:q}$, and mark the corresponding type $t_s$ as the predicted entity type. Finally, the predicted entity set $D_{entity}$ is obtained as follows:

$D_{entity} = \{(p_1, q_1, t_1), \ldots, (p_z, q_z, t_z)\}$,

where $z$ is the number of predicted entities, and each predicted entity is a triple in which $p_z$ is the beginning position, $q_z$ the end position, and $t_z$ the predicted entity type of the $z$th entity.
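Putting the classification and pruning steps together, the assembly of $D_{entity}$ can be sketched as follows, with an illustrative threshold $\bar{P} = 0.5$; classifier stands for the fully connected layer above.

```python
import torch

def predict_entities(spans, span_reps, classifier, p_bar: float = 0.5):
    """Classify each candidate span, prune non-entities, and build D_entity = {(p, q, t)}."""
    D_entity = []
    for (p, q), rep in zip(spans, span_reps):
        probs = torch.softmax(classifier(rep), dim=-1)  # P_{p:q} over the 6 entity types
        score, t = probs.max(dim=-1)
        if score.item() > p_bar:                        # prune spans below the threshold
            D_entity.append((p, q, t.item(), score.item()))
    D_entity.sort(key=lambda e: -e[-1])                 # sort by maximum entity score
    return [(p, q, t) for p, q, t, _ in D_entity]
```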

2.2 Entity-type-enriched representation

We use two special tokens [e:type] and [/e:type] to mark the extracted entities, indicating the beginning and end of the entity, respectively. Taking Fig.3 as an example, “Donghai Technology” is of the “company” type, and “Wang Ming” is of the “person” type. Thus, the embeddings of the tokens “Dong” and “Wang” are added with the corresponding beginning label embeddings $E_{[e:com]}$ and $E_{[e:per]}$, and the embeddings of “Technology” and “Ming” are added with the end label embeddings $E_{[/e:com]}$ and $E_{[/e:per]}$. The enriched embedding with the entity label is formulated as

$E_s = \{\ldots, E_{[e:type]} + e_p, \ldots, E_{[/e:type]} + e_q, \ldots\}$.

For each span in $D_{entity}$, the beginning label embedding is added to the span head $e_p$, and the end label embedding is added to the tail position $e_q$. The representation containing the entity type labels is then obtained through BERT [18].

Note: We primarily concentrate on identifying the head and tail positions; enriching entity types strengthens the boundary and type information of the entity. Thus, we add the special token embeddings $E_{[e:type]}$ and $E_{[/e:type]}$ only at the head and tail positions of the entity, as opposed to incorporating them throughout the entire entity, which prevents disruption to the sequential operations.
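A sketch of this enrichment step is shown below; the layout of the marker embedding table (two rows per entity type) is our illustrative choice.

```python
import torch
import torch.nn as nn

def enrich_with_type_markers(E_s: torch.Tensor, D_entity, marker_emb: nn.Embedding):
    """Add E_[e:type] / E_[/e:type] to the head and tail token embeddings of each entity.

    marker_emb holds 2 rows per type: row 2t is the begin marker, row 2t + 1 the end marker.
    """
    E = E_s.clone()
    for p, q, t in D_entity:
        E[p] = E[p] + marker_emb(torch.tensor(2 * t))      # E_[e:type] at the head
        E[q] = E[q] + marker_emb(torch.tensor(2 * t + 1))  # E_[/e:type] at the tail
    return E
```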

2.3 Cascaded neural network

In the cascaded neural network module, we extract all subjects first, and then extract their corresponding objects for each relation type.

2.3.1 Subject extraction module

Subject extraction aims to discern the subjects among the entities, focusing on semantics. The process employs a distinct binary classifier to predict whether each position in the sequence is the beginning or end of a subject. For each position $e_j$ in $E_s$, we use a sigmoid function to obtain the probability $p_{beg}^{sub:j}$ that the position is the beginning of a subject, and the probability $p_{end}^{sub:j}$ that it is the end:

$p_{beg}^{sub:j} = \mathrm{sigmoid}(W_{beg:sub} e_j + b_{beg:sub})$,

$p_{end}^{sub:j} = \mathrm{sigmoid}(W_{end:sub} e_j + b_{end:sub})$,

where $W_{beg:sub}$, $b_{beg:sub}$, $W_{end:sub}$, and $b_{end:sub}$ are the learnable parameters of the subject classifiers. The calculated probability at each position is compared to the predetermined probability threshold $\bar{h}$; if it is higher, the position is counted as a subject boundary and marked as 1, and otherwise marked as 0. The predicted subject set is then compiled by pairing the beginning and end positions marked with 1. The final representation $e_{sub}^{m}$ of the $m$th predicted subject is calculated as

$e_{sub}^{m} = \frac{1}{2}(e_{beg:m} + e_{end:m})$,

i.e., the average of the beginning and end token embeddings of the entity.
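The subject extraction module can be sketched as follows; the nearest-end pairing rule is a common convention for cascade taggers and is an assumption here.

```python
import torch
import torch.nn as nn

class SubjectTagger(nn.Module):
    """Binary begin/end taggers over every position, with threshold-based pairing."""
    def __init__(self, d: int):
        super().__init__()
        self.beg = nn.Linear(d, 1)  # W_{beg:sub}, b_{beg:sub}
        self.end = nn.Linear(d, 1)  # W_{end:sub}, b_{end:sub}

    def forward(self, E_s: torch.Tensor, h_bar: float = 0.5):
        p_beg = torch.sigmoid(self.beg(E_s)).squeeze(-1)     # (n,)
        p_end = torch.sigmoid(self.end(E_s)).squeeze(-1)
        begs = (p_beg > h_bar).nonzero().flatten().tolist()  # positions marked 1
        ends = (p_end > h_bar).nonzero().flatten().tolist()
        subjects = []
        for b in begs:
            e = next((e for e in ends if e >= b), None)      # nearest end at/after the begin
            if e is not None:
                e_sub = (E_s[b] + E_s[e]) / 2                # e_sub^m, the averaged boundary tokens
                subjects.append((b, e, e_sub))
        return subjects
```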

2.3.2 Relation-specific objects extraction

Assuming the $m$th subject among the candidate subjects is selected, the input for relation-specific object extraction is as follows:

$E_m = E_s \oplus e_{sub}^{m}$,

where $\oplus$ represents the matrix addition operation. For any word vector $e_j \in E_m$ and any relation type $r$ in the relation type set $R$, we use the sigmoid function to convert the output of the fully connected layer into probabilities:

$P_{beg}^{obj:j} = \mathrm{sigmoid}(W_{beg:obj} e_j + b_{beg:obj})$,

$P_{end}^{obj:j} = \mathrm{sigmoid}(W_{end:obj} e_j + b_{end:obj})$,

where $W_{beg:obj}$ and $b_{beg:obj}$ are the learnable parameters for the object beginning position, while $W_{end:obj}$ and $b_{end:obj}$ are those for the end position. The predicted probabilities are denoted $P_{beg}^{obj:j}$ and $P_{end}^{obj:j}$. If a probability exceeds the threshold $\bar{t}$, the marker bit at that position is set to 1; otherwise, it is set to 0. Finally, the beginning and end positions marked with 1 are paired to identify all candidate objects $obj_{pred}$.
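The relation-specific object taggers can be sketched in the same style, packing the per-relation begin/end classifiers into two linear layers for brevity; this packing is an implementation convenience, not a claim about the exact layout used.

```python
import torch
import torch.nn as nn

class RelationObjectTagger(nn.Module):
    """For each relation r in R, tag object begin/end positions over E_m = E_s (+) e_sub."""
    def __init__(self, d: int, n_relations: int):
        super().__init__()
        self.beg = nn.Linear(d, n_relations)  # column r holds W_{beg:obj} for relation r
        self.end = nn.Linear(d, n_relations)

    def forward(self, E_s: torch.Tensor, e_sub: torch.Tensor, t_bar: float = 0.5):
        E_m = E_s + e_sub.unsqueeze(0)        # broadcast the subject vector to every position
        P_beg = torch.sigmoid(self.beg(E_m))  # (n, |R|)
        P_end = torch.sigmoid(self.end(E_m))
        obj_pred = []                          # (relation, obj_begin, obj_end)
        for r in range(P_beg.size(1)):
            begs = (P_beg[:, r] > t_bar).nonzero().flatten().tolist()
            ends = (P_end[:, r] > t_bar).nonzero().flatten().tolist()
            for b in begs:
                e = next((e for e in ends if e >= b), None)
                if e is not None:
                    obj_pred.append((r, b, e))
        return obj_pred
```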

Discussion: How does the cascaded neural network address the identified overlap issues?

For each subject, we extract all corresponding objects under every pre-defined relation. Consequently, for the same entity pair, probable objects can be identified under different relation classifiers. Extracting objects for a given subject according to each relation, rather than classifying which single relation an entity pair belongs to, aids in mining multiple relations or objects.

For the additional subject-object overlap (SOO) problem, where nested entities appear in the subject or object, as in the triple (“Suzhou Jiangnan Film”, locate, “Suzhou”), span-type identification enumerates every candidate span, so both “Suzhou” and “Suzhou Jiangnan Film” can be identified as entities. Moreover, the cascaded neural network judges the entity boundary at each position in turn, so boundaries such as “zhou” and “Film” can be reliably distinguished.

2.4 Training

For the span-type identification task, we fine-tune the pre-trained language model with a task-specific cross-entropy loss:

$L_e = -\sum_{s \in span} \log P_s(t_s \mid s)$,

where $t_s$ is the annotated entity type. For the subject extraction module, the beginning and end classifiers are trained with binary cross-entropy losses:

$L_{sub:beg} = -\sum_{k=1}^{n} \left[ x_{k:beg} \log p_{k:beg} + (1 - x_{k:beg}) \log(1 - p_{k:beg}) \right]$,

$L_{sub:end} = -\sum_{k=1}^{n} \left[ x_{k:end} \log p_{k:end} + (1 - x_{k:end}) \log(1 - p_{k:end}) \right]$,

where $n$ is the length of the sentence, and $x_{k:beg}$ and $x_{k:end}$ are the true labels of the $k$th word for the beginning and end positions of the subject, respectively. If the $k$th word is the beginning position of a subject, then $x_{k:beg}$ is 1; otherwise, it is 0. $p_{k:beg}$ and $p_{k:end}$ are the probabilities of the $k$th word being predicted as a subject boundary position. The final loss function of the subject extraction process is the sum of the two losses:

$L_{sub} = L_{sub:beg} + L_{sub:end}$.

In the cascaded neural network, for a given relation $r$ in the predefined relation types $R$, the loss function of the module that predicts the beginning position of the object is as follows:

$L_{r:beg} = -\sum_{j=1}^{n} \left[ y_{j:beg} \log p_{j:beg} + (1 - y_{j:beg}) \log(1 - p_{j:beg}) \right]$.

Correspondingly, the loss function of the module that predicts the end position of the object is

$L_{r:end} = -\sum_{j=1}^{n} \left[ y_{j:end} \log p_{j:end} + (1 - y_{j:end}) \log(1 - p_{j:end}) \right]$,

where $y_{j:beg}$ and $y_{j:end}$ are the true labels of the $j$th word for the beginning and end positions of the object, respectively, and $p_{j:beg}$ and $p_{j:end}$ are the probabilities of the $j$th word being predicted as an object boundary position. The final loss function of the joint extraction process is the sum of the losses over all relation types:

$L_{obj} = \sum_{r \in R} \left( L_{r:beg} + L_{r:end} \right)$.
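These summed binary cross-entropy terms map directly onto PyTorch; a minimal sketch follows, where reduction="sum" matches the summations above.

```python
import torch.nn.functional as F

def subject_loss(p_beg, p_end, x_beg, x_end):
    """L_sub = L_sub:beg + L_sub:end, summed over all n positions (targets are float 0/1)."""
    return (F.binary_cross_entropy(p_beg, x_beg, reduction="sum")
            + F.binary_cross_entropy(p_end, x_end, reduction="sum"))

def object_loss(P_beg, P_end, Y_beg, Y_end):
    """L_obj: begin/end BCE summed over positions and all relations r in R."""
    return (F.binary_cross_entropy(P_beg, Y_beg, reduction="sum")
            + F.binary_cross_entropy(P_end, Y_end, reduction="sum"))
```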

3 Experiments

In this section, we present the experimental setup and results of the E2CNN model on the relation extraction task.

3.1 Datasets

We employ two datasets: the large-scale Chinese general-domain dataset DuIE [24] and our FINCORPUS.CN dataset. The statistical information is presented in Tab.2.

DuIE, developed by Baidu, is a large-scale Chinese dataset containing 450,000 instances and 49 common relation types. Due to the utilization of distant supervision for automatically generating labeled data, the dataset’s quality is somewhat compromised, and the data distribution is not uniform. To address this, we clean and screen the original dataset, ultimately selecting 17 relation types and 15 entity types.

FINCORPUS.CN is a Chinese financial dataset constructed from a collection of annual reports published by 2,000 financial companies. Here, we use brat, a widely used tool for entity and relation labeling, to manually annotate and cross-validate the data. The ontology diagram of FINCORPUS.CN is shown in Fig.4.

FINCORPUS.CN comprises 8,319 samples, encompassing a total of 7 types of financial relations and 6 types of financial entities. Details regarding the involved entities and relations are elucidated in Tab.3. The distribution of the number of entities and relation types in the samples is illustrated in Fig.5.

3.2 Evaluation metrics

We employ precision (P), recall (R), and F1-measure (F1) as evaluation metrics. Precision is the proportion of true positives among all positive examples predicted by the model; recall is the proportion of true positives predicted relative to the total number of actual positives. The F1 value, the harmonic mean of precision and recall, measures the overall effectiveness of the model. A relation is extracted correctly only when the entity boundaries of the head and tail entities and the relation type are all predicted accurately.
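Concretely, predicted and gold triples are compared by exact match and the metrics are micro-averaged; a minimal sketch:

```python
def triple_prf(pred: set, gold: set):
    """Micro P/R/F1 over (subject, relation, object) triples; a triple counts as correct
    only if the head/tail entity boundaries and the relation type all match exactly."""
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```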

3.3 Relation extraction test results and analysis

The F1 scores of relation extraction on DuIE and FINCORPUS.CN reach 84.93% and 75.24%, respectively, both exceeding the baseline models. E2CNN is compared against five high-performing models:

DyGIE++ [25] uses BERT to obtain token representations and T-Concat span representations of the input sentence, and then iteratively propagates coreference and relation-type confidences through a span graph to refine the representations.

CasRel [16] models relations as functions that map subjects to objects instead of discrete labels to better handle the overlap problem. However, it does not consider the effect of entity type on relation extraction.

PL-Marker [26] summarizes the existing work on span representation and adopts a fusion subject-oriented packing scheme in the span pair model. For the subject span, entity markers are applied by inserting [S] and [/S] before and after the span to indicate the beginning and end of the span, respectively.

BiRTE [27] proposes a parallel bidirectional extraction framework to extract all possible subject–object pairs and assigns all possible relations for pairs using a biaffine model. Additionally, it introduces a share-aware mechanism to address the issue of convergence rate inconsistency.

OD-RTE [28] approaches the relation extraction task by treating it as an object detection problem. Additionally, it introduces the bidirectional diagonal walk decoding algorithm for extracting all types of triples.

Tab.4 shows that the performance of our E2CNN model is significantly better than that of previous models in terms of end-to-end relation extraction. Our model outperforms the state-of-the-art (SOTA) models by 19.7% and 28.69% in F1 on DuIE and FINCORPUS.CN, respectively. Moreover, the experimental results reveal significant variations in the performance of the same model across the FINCORPUS.CN and DuIE datasets. This discrepancy is primarily attributed to dissimilarities between the datasets: the sentence structure in DuIE is relatively straightforward and consistent, making information easier to learn and extract, whereas the Chinese financial dataset FINCORPUS.CN contains a substantial number of overlap triples, reflecting distinctive data features and syntactic complexity. A comparison of statistics and sample cases from the DuIE and FINCORPUS.CN datasets is presented in Fig.6.

Tab.5 compares the F1, P, and R scores of the E2CNN and CasRel models on the normal, single-entity-overlap, and entity-pair-overlap triples. The table shows that E2CNN surpasses CasRel in all categories, especially on the normal triples, where E2CNN achieves a 29.19% higher F1 score than CasRel. Moreover, E2CNN also outperforms CasRel by 21.86% and 31.1% on the single-entity-overlap and entity-pair-overlap triples, respectively. This demonstrates that the E2CNN model is more capable of handling complex relation extraction tasks.

Implementation details. For a fair comparison, we employ the chinese-roberta-wwm-ext model [18] as the pre-trained BERT model in all experiments. In the span-type identification task, the model is trained with a learning rate of $1 \times 10^{-5}$ over 100 epochs with a batch size of 8. For the relation extraction task, we adjust the batch size to 4 and train for 10 epochs. Additionally, to accommodate the task requirements, we set a maximum sentence length of 300. The model is implemented using the PyTorch [29] framework. During our experiments, the threshold for object detection is consistently set to 0.5 for both the head and the tail. This parameter could potentially be tuned to further enhance performance, but we leave its optimization to future work.
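For reference, the hyperparameters reported above can be collected in one place; the relation-task learning rate is not stated above and is assumed equal to the span-task rate.

```python
# Reported hyperparameters as a config dict (a sketch; the training loop itself is not shown).
CONFIG = {
    "encoder": "hfl/chinese-roberta-wwm-ext",                      # pre-trained BERT variant [18]
    "span_task": {"lr": 1e-5, "epochs": 100, "batch_size": 8},
    "relation_task": {"lr": 1e-5, "epochs": 10, "batch_size": 4},  # lr assumed, not reported
    "max_sentence_len": 300,
    "object_threshold": 0.5,                                       # head and tail, left untuned
}
```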

4 Ablation study

We conduct an ablation experiment on the effect of entity types on different relations and calculate the type distribution of subjects and objects in the triples predicted on the test set. Tab.6 shows the results: after removing the type information, the proportions of false-type and non-entity predictions both increase, the latter significantly. E2CNN achieves higher F1 scores on most relations and markedly reduces the proportion of false-type and non-entity predictions. This demonstrates that entity type information helps the model extract more accurate and relevant triples from the corpus.

5 Related work

Currently, many knowledge graphs have been developed in different domains, such as RcpKG [30] for food and FabKG [11] for manufacturing. However, few works focus on the financial domain, although building a financial knowledge graph can facilitate answering complex financial-domain queries [31]. Constructing Knowledge Graphs (KGs) by extracting facts from structured or unstructured data sources, especially text, is of great importance for supporting services like question answering [32,33], fact checking [34,35], and data integration [36,37]. Over the past few years, various knowledge graphs, including DBpedia [38], YAGO [39], and NELL [40], have been developed for general-purpose domains.

In the financial field, Elhammadi et al. [41] proposed a pipelined knowledge extraction system that uses a conditional random field (CRF) to filter data and combines semantic role labeling (SRL) with pattern-based information extraction to extract domain-targeted noun/verb-mediated relations from financial news. However, the Chinese financial sector still lacks mature techniques for the systematic and automatic construction of KGs.

Relational triple extraction [28,42] from natural language corpora is a crucial stage in the construction of massive KGs. The primary line of study at present is relation extraction based on deep neural networks, where supervised methods [43] perform far better than semi-supervised [44] and unsupervised methods [45]. Current methods for extracting relational triples fall into two main categories: pipeline-based methods and joint-based methods.

5.1 Pipeline-based methods

Generally, the pipeline of KGs construction can be organized as two subtasks: 1) named entity recognition [5,6,46], which aims to recognize the financial entities from natural-language sentences; 2) relation extraction [7,8], which aims to link the financial entities via a relation.

Early works [47−50] follow a pipeline-based paradigm, namely training one model to extract entities (i.e., entity recognition) and another model to classify relations between them (i.e., relation extraction). Zeng et al. [51] proposed utilizing a CNN to learn word-level features, further learning sentence-level features through convolution, and then classifying the features into relations, demonstrating the feasibility of applying deep learning models to the relation extraction task. Zhou et al. [52] suggested improving feature capture by adding an attention mechanism on top of BiLSTM. Zhong and Chen [20] present a pipeline approach that represents all candidate spans and identifies the corresponding entity types, highlighting the importance of fusing entity information at the input layer of the relation model. Based on their work, Ye et al. [26] outline three commonly used span representation extraction methods (i.e., T-concat, Solid Marker, and Levitated Marker) and investigate how various span representations affect relation extraction performance.

5.2 Joint-based methods

The joint-based methods [53] combine entity extraction and relation extraction to accomplish extraction in an end-to-end manner. Zheng et al. [54] proposed a joint extraction method that turns the two subtasks into a unified sequence labeling task with BiLSTM for encoding and decoding, avoiding the entity redundancy problem caused by shared parameters. Bekoulis et al. [55] combined CRF for entity extraction with multi-head selection for relation classification to achieve joint extraction.

Conventional joint models are founded on hand-crafted features [56,57], which require labor-intensive human work. Recent research [58−60] has explored neural-network-based techniques, offering cutting-edge performance while minimizing manual effort. Miwa and Bansal [61] proposed a neural network that captures dependency-tree substructure information, but their model achieved joint extraction only by sharing parameters, not by joint decoding. In contrast to earlier methods, Zheng et al. [54] used a tagging scheme to model the relation extraction problem instead of adhering to the split between entity extraction and relation extraction, treating relational triples as a whole and directly extracting information at the triple level.

Recently, external knowledge has been employed to complement traditional context information, such as part-of-speech (POS) labels [62]. Similar approaches are used in Chinese NER and RE. Li et al. [63] incorporated word-level information into character sequence inputs to avoid segmentation errors. Xuan et al. [64] incorporated graph information into the character representation.

Despite the extensive study of joint-based relation extraction methods, much of the current work has not considered the substantial number of overlapping instances in corpora. Zeng et al. [60] categorized triples as normal, single-entity overlap, and entity-pair overlap, and attempted to use a sequence-to-sequence model to solve the overlap problem. Fu et al. [65] suggested a graph-convolutional-network-based method for this problem. Shang et al. [66] utilized a scoring-based classifier and a relation-specific horns tagging strategy to address issues like cascaded errors and redundant information. Even after these initial successes [67], existing approaches continue to treat relations as separate labels, which makes overlapping relation extraction challenging.

6 Conclusion and future work

In this study, we investigate methods for overlap triple extraction from Chinese financial corpora and offer a dataset for financial knowledge graph construction. We utilize a cascaded neural network to extract all subjects first and then extract their corresponding objects for each relation type concurrently. Meanwhile, the extracted information about entity boundaries and types provides an opportunity for better relation extraction. Consequently, we propose the Entity-type-Enriched Cascaded Neural Network (E2CNN), which enriches the cascaded neural network with entity type information. We perform comprehensive experiments on the open-source Chinese dataset DuIE and our Chinese financial dataset FINCORPUS.CN. Experimental results demonstrate the effectiveness of E2CNN compared with several SOTA baselines and show that, by using entity type information, E2CNN enhances relation extraction performance.

References

[1]

Saxena A, Chakrabarti S, Talukdar P. Question answering over temporal knowledge graphs. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021, 6663−6676

[2]

Zhang M, He T T, Dong M. Meta-path reasoning of knowledge graph for commonsense question answering. Frontiers of Computer Science, 2024, 18(1): 181303

[3]

Collarana D, Galkin M, Traverso-Ribón I, Lange C, Vidal M E, Auer S. Semantic data integration for knowledge graph construction at query time. In: Proceedings of the 11th IEEE International Conference on Semantic Computing. 2017, 109−116

[4]

Kim J, Choi K S. Unsupervised fact checking by counter-weighted positive and negative evidential paths in a knowledge graph. In: Proceedings of the 28th International Conference on Computational Linguistics. 2020, 1677−1686

[5]

Sang E F T K, De Meulder F. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL 2003. 2003, 142−147

[6]

Ratinov L, Roth D. Design challenges and misconceptions in named entity recognition. In: Proceedings of the 13th Conference on Computational Natural Language Learning. 2009, 147−155

[7]

Zelenko D, Aone C, Richardella A. Kernel methods for relation extraction. In: Proceedings of 2002 Conference on Empirical Methods in Natural Language Processing. 2002, 71−78

[8]

Bunescu R C, Mooney R J. A shortest path dependency kernel for relation extraction. In: Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing. 2005, 724−731

[9]

Zheng Z Y, Liu Y, Li D, Zhang X J. Distant supervised relation extraction based on residual attention. Frontiers of Computer Science, 2022, 16(6): 166336

[10]

Haussmann S, Seneviratne O, Chen Y, Ne’eman Y, Codella J, Chen C H, McGuinness D L, Zaki M J. FoodKG: a semantics-driven knowledge graph for food recommendation. In: Proceedings of the 18th International Semantic Web Conference. 2019, 146−162

[11]

Kumar A, Bharadwaj A G, Starly B, Lynch C. FabKG: a knowledge graph of manufacturing science domain utilizing structured and unconventional unstructured knowledge source. In: Proceedings of the Workshop on Structured and Unstructured Knowledge Integration (SUKI). 2022, 1−8

[12]

Kang Y Z, Jia N, Cui R B, Deng J. A graph-based semi-supervised reject inference framework considering imbalanced data distribution for consumer credit scoring. Applied Soft Computing, 2021, 105: 107259

[13]

Van Belle R, Mitrović S, De Weerdt J. Representation learning in graphs for credit card fraud detection. In: Proceedings of the 4th ECML PKDD Workshop on Mining Data for Financial Applications. 2020, 32−46

[14]

Zhan Q, Yin H. A loan application fraud detection method based on knowledge graph and neural network. In: Proceedings of the 2nd International Conference on Innovation in Artificial Intelligence. 2018, 111−115

[15]

Iglesias-Molina A, Chaves-Fraga D, Priyatna F, Corcho O. Towards the definition of a language-independent mapping template for knowledge graph creation. In: Proceedings of the 3rd International Workshop on Capturing Scientific Knowledge Co-located with the 10th International Conference on Knowledge Capture. 2019, 33−36

[16]

Wei Z, Su J, Wang Y, Tian Y, Chang Y. A novel cascade binary tagging framework for relational triple extraction. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, 1476−1488

[17]

Wang G, Zeng Y, Li R H, Qin H, Shi X, Xia Y, Shang X, Hong L. Temporal graph cube. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(12): 13015–13030

[18]

Cui Y, Che W, Liu T, Qin B, Wang S, Hu G. Revisiting pre-trained models for Chinese natural language processing. In: Proceedings of Findings of the Association for Computational Linguistics: EMNLP 2020. 2020, 657−668

[19]

Xu W, Chen Y, Ouyang J. A streamlined span-based factorization method for few shot named entity recognition. In: Proceedings of 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024, 1673−1683

[20]

Zhong Z, Chen D. A frustratingly easy approach for entity and relation extraction. In: Proceedings of 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021, 50−61

[21]

Dixit K, Al-Onaizan Y. Span-level model for relation extraction. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019, 5308−5314

[22]

Yu J, Bohnet B, Poesio M. Named entity recognition as dependency parsing. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, 6470−6476

[23]

Yan H, Sun Y, Li X, Qiu X. An embarrassingly easy but strong baseline for nested named entity recognition. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 1442−1452

[24]

Li S, He W, Shi Y, Jiang W, Liang H, Jiang Y, Zhang Y, Lyu Y, Zhu Y. DuIE: a large-scale Chinese dataset for information extraction. In: Proceedings of the 8th CCF International Conference on Natural Language Processing and Chinese Computing. 2019, 791−800

[25]

Wadden D, Wennberg U, Luan Y, Hajishirzi H. Entity, relation, and event extraction with contextualized span representations. In: Proceedings of 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019, 5784−5789

[26]

Ye D, Lin Y, Li P, Sun M. Packed levitated marker for entity and relation extraction. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022, 4904−4917

[27]

Ren F, Zhang L, Zhao X, Yin S, Liu S, Li B. A simple but effective bidirectional framework for relational triple extraction. In: Proceedings of the 15th ACM International Conference on Web Search and Data Mining. 2022, 824−832

[28]

Ning J, Yang Z, Sun Y, Wang Z, Lin H. OD-RTE: a one-stage object detection framework for relational triple extraction. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 11120−11135

[29]

Dai H, Peng X, Shi X, He L, Xiong Q, Jin H. Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment. Science China Information Sciences, 2022, 65: 112103

[30]

Lei Z, Ul Haq A, Zeb A, Suzauddola M, Zhang D. Is the suggested food your desired? Multi-modal recipe recommendation with demand-based knowledge graph. Expert Systems with Applications, 2021, 186: 115708

[31]

Zehra S, Mohsin S F M, Wasi S, Jami S I, Siddiqui M S, Syed M K U R R. Financial knowledge graph based financial report query system. IEEE Access, 2021, 9: 69766–69782

[32]

Rony M R A H, Chaudhuri D, Usbeck R, Lehmann J. Tree-KGQA: an unsupervised approach for question answering over knowledge graphs. IEEE Access, 2022, 10: 50467–50478

[33]

Shang C, Wang G, Qi P, Huang J. Improving time sensitivity for question answering over temporal knowledge graphs. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022, 8017−8026

[34]

Lin P, Song Q, Wu Y. Fact checking in knowledge graphs with ontological subgraph patterns. Data Science and Engineering, 2018, 3(4): 341–358

[35]

Lin P, Song Q, Wu Y, Pi J. Discovering patterns for fact checking in knowledge graphs. Journal of Data and Information Quality, 2019, 11(3): 13

[36]

Cudré-Mauroux P. Leveraging knowledge graphs for big data integration: the XI pipeline. Semantic Web, 2020, 11(1): 13–17

[37]

Dandan R, Despres S. DIKG2: a semantic data integration approach for knowledge graphs generation from Web forms. In: Proceedings of the 34th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. 2021, 255−260

[38]

Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes P N, Hellmann S, Morsey M, van Kleef P, Auer S, Bizer C. DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 2015, 6(2): 167–195

[39]

Suchanek F M, Kasneci G, Weikum G. Yago: a core of semantic knowledge. In: Proceedings of the 16th International Conference on World Wide Web. 2007, 697−706

[40]

Mitchell T, Cohen W, Hruschka E, Talukdar P, Yang B, Betteridge J, Carlson A, Dalvi B, Gardner M, Kisiel B, Krishnamurthy J, Lao N, Mazaitis K, Mohamed T, Nakashole N, Platanios E, Ritter A, Samadi M, Settles B, Wang R, Wijaya D, Gupta A, Chen X, Saparov A, Greaves M, Welling J. Never-ending learning. Communications of the ACM, 2018, 61(5): 103–115

[41]

Elhammadi S, Lakshmanan L V S, Ng R, Simpson M, Huai B, Wang Z, Wang L. A high precision pipeline for financial knowledge graph construction. In: Proceedings of the 28th International Conference on Computational Linguistics. 2020, 967−977

[42]

Kim K, Hur Y, Kim G, Lim H. GREG: a global level relation extraction with knowledge graph embedding. Applied Sciences, 2020, 10(3): 1181

[43]

McCallum A, Freitag D, Pereira F C N. Maximum entropy Markov models for information extraction and segmentation. In: Proceedings of the 17th International Conference on Machine Learning. 2000, 591−598

[44]

Brin S. Extracting patterns and relations from the world wide web. In: Proceedings of International Workshop on the World Wide Web and Databases. 1998, 172−183

[45]

Hasegawa T, Sekine S, Grishman R. Discovering relations among named entities from large corpora. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics. 2004, 415−422

[46]

Gong C, Li Z, Xia Q, Chen W, Zhang M. Hierarchical LSTM with char-subword-word tree-structure representation for Chinese named entity recognition. Science China Information Sciences, 2020, 63(10): 202102

[47]

Mintz M, Bills S, Snow R, Jurafsky D. Distant supervision for relation extraction without labeled data. In: Proceedings of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. 2009, 1003−1011

[48]

Gormley M R, Yu M, Dredze M. Improved relation extraction with feature-rich compositional embedding models. In: Proceedings of 2015 Conference on Empirical Methods in Natural Language Processing. 2015, 1774−1784

[49]

Florian R, Hassan H, Ittycheriah A, Jing H, Kambhatla N, Luo X, Nicolov N, Roukos S. A statistical model for multilingual entity detection and tracking. In: Proceedings of Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. 2004, 1−8

[50]

Florian R, Jing H, Kambhatla N, Zitouni I. Factorizing complex models: a case study in mention detection. In: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. 2006, 473−480

[51]

Zeng D, Liu K, Lai S, Zhou G, Zhao J. Relation classification via convolutional deep neural network. In: Proceedings of the 25th International Conference on Computational Linguistics. 2014, 2335−2344

[52]

Zhou P, Shi W, Tian J, Qi Z, Li B, Hao H, Xu B. Attention-based bidirectional long short-term memory networks for relation classification. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016, 207−212

[53]

Qiao B, Zou Z, Huang Y, Fang K, Zhu X, Chen Y. A joint model for entity and relation extraction based on BERT. Neural Computing and Applications, 2022, 34(5): 3471–3481

[54]

Zheng S, Wang F, Bao H, Hao Y, Zhou P, Xu B. Joint extraction of entities and relations based on a novel tagging scheme. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017, 1227−1236

[55]

Bekoulis G, Deleu J, Demeester T, Develder C. Joint entity recognition and relation extraction as a multi-head selection problem. Expert Systems with Applications, 2018, 114: 34–45

[56]

Yu X, Lam W. Jointly identifying entities and extracting relations in encyclopedia text via a graphical model approach. In: Proceedings of the 23rd International Conference on Computational Linguistics. 2010, 1399−1407

[57]

Li Q, Ji H. Incremental joint extraction of entity mentions and relations. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. 2014, 402−412

[58]

Gupta P, Schütze H, Andrassy B. Table filling multi-task recurrent neural network for joint entity and relation extraction. In: Proceedings of the 26th International Conference on Computational Linguistics. 2016, 2537−2547

[59]

Katiyar A, Cardie C. Going out on a limb: joint extraction of entity mentions and relations without dependency trees. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017, 917−928

[60]

Zeng X, Zeng D, He S, Liu K, Zhao J. Extracting relational facts by an end-to-end neural model with copy mechanism. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018, 506−514

[61]

Miwa M, Bansal M. End-to-end relation extraction using LSTMs on sequences and tree structures. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016, 1105−1116

[62]

Nie Y, Tian Y, Song Y, Ao X, Wan X. Improving named entity recognition with attentive ensemble of syntactic information. In: Proceedings of Findings of the Association for Computational Linguistics. 2020, 4231−4245

[63]

Li Z, Ding N, Liu Z, Zheng H, Shen Y. Chinese relation extraction with multi-grained information and external linguistic knowledge. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019, 4377−4386

[64]

Xuan Z, Bao R, Jiang S. FGN: fusion glyph network for Chinese named entity recognition. In: Proceedings of the 5th China Conference on Knowledge Graph and Semantic Computing: Knowledge Graph and Cognitive Intelligence. 2020, 28−40

[65]

Fu T J, Li P H, Ma W Y. GraphRel: modeling text as relational graphs for joint entity and relation extraction. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019, 1409−1418

[66]

Shang Y M, Huang H, Mao X. OneRel: joint entity and relation extraction with one module in one step. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence. 2022, 11285−11293

[67]

Wang Y, Yu B, Zhang Y, Liu T, Zhu H, Sun L. TPLinker: single-stage joint extraction of entities and relations through token pair linking. In: Proceedings of the 28th International Conference on Computational Linguistics. 2020, 1572−1582

RIGHTS & PERMISSIONS

© The Author(s) 2024. This article is published with open access at link.springer.com and journal.hep.com.cn
