Exploring & exploiting high-order graph structure for sparse knowledge graph completion

Tao HE, Ming LIU, Yixin CAO, Zekun WANG, Zihao ZHENG, Bing QIN

Front. Comput. Sci., 2025, 19(2): 192306. DOI: 10.1007/s11704-023-3521-y
Artificial Intelligence
RESEARCH ARTICLE


Abstract

Sparse Knowledge Graph (KG) scenarios pose a challenge for previous Knowledge Graph Completion (KGC) methods: completion performance decreases rapidly as graph sparsity increases. The problem is further exacerbated by the widespread presence of sparse KGs in practical applications. To alleviate this challenge, we present a novel framework, LR-GCN, which automatically captures valuable long-range dependencies among entities to supplement insufficient structure features and distills logical reasoning knowledge for sparse KGC. The proposed approach comprises two main components: a GNN-based predictor and a reasoning path distiller. The reasoning path distiller explores high-order graph structures such as reasoning paths and encodes them as rich-semantic edges, explicitly injecting long-range dependencies into the predictor. This step also plays an essential role in densifying KGs, effectively alleviating the sparsity issue. In addition, the path distiller distills logical reasoning knowledge from the mined reasoning paths into the predictor. The two components are jointly optimized using a well-designed variational EM algorithm. Extensive experiments and analyses on four sparse benchmarks demonstrate the effectiveness of our proposed method.


Keywords

knowledge graph completion / graph neural networks / reinforcement learning

Cite this article

Tao HE, Ming LIU, Yixin CAO, Zekun WANG, Zihao ZHENG, Bing QIN. Exploring & exploiting high-order graph structure for sparse knowledge graph completion. Front. Comput. Sci., 2025, 19(2): 192306 https://doi.org/10.1007/s11704-023-3521-y

1 Introduction

Knowledge Graph Completion (KGC) is the task of inferring missing facts in the triple format (h, r, t) from existing facts in a given Knowledge Graph (KG). Despite previous successes, KGC often encounters sparsity issues [1]. To illustrate this, we conducted pilot studies using five typical KGC models on four datasets with varying sparsity levels. As illustrated in Fig.1, the performance curves exhibit a clear downward trend as sparsity increases. Unfortunately, KGs in practical applications are typically much sparser than those in current research [2], given insufficient corpora and the imperfect performance of information extraction techniques [3,4]. Therefore, investigating sparse KGs would greatly benefit real-world applications such as question answering [5,6], conversation [7], and question generation [8].
Fig.1 KGC results of different previous KG embedding models on FB15K-237 and its sparse subsets (60%, 40%, and 20% denote percentages of retained triples). The performance drops dramatically as we remove triples. (a) Hits@10; (b) MRR


Previous studies have utilized the graph attention mechanism within the Graph Neural Network (GNN) framework [2] or contrastive learning [9] to address graph sparsity. However, these methods only consider the first-order neighbor structure and disregard rich higher-order graph structure features such as rules and motifs [10], which are essential semantic components of the graph structure [11]. A naive solution is to stack K graph convolutional layers [12] so that models can consider distant neighbors. However, this approach is susceptible to the over-squashing problem [13]: features from K-hop neighbors are severely compressed during aggregation because the number of K-hop neighbors grows exponentially with distance.
To this end, we endeavor to design a more effective approach that leverages rich high-order graph structure features for the sparse KGC task, where two primary challenges need to be addressed: 1) Exploration. The exponentially many K-hop neighbors contain abundant noisy structures, so it is crucial to identify meaningful long-range dependencies while filtering out irrelevant high-order structures. 2) Exploitation. GNN methods prioritize modeling neighbor correlations, whereas high-order structural knowledge from motifs or logical rules is more robust and general. Thus, it is beneficial to integrate high-order structural knowledge into GNN architectures on sparse KGs.
To tackle these issues, we propose a novel framework, LR-GCN, which enhances GNN-based methods by capturing reasoning paths as high-order structure information. LR-GCN comprises two jointly optimized modules: a GNN-based predictor and a reasoning path distiller, referred to as MLN-RL. The fundamental motivation is to drive MLN-RL to explore meaningful reasoning paths via Reinforcement Learning [14], which serve as informative high-order structures. The high-order structure knowledge within these reasoning paths is then exploited both explicitly and implicitly through two joint learning strategies. First, we explicitly incorporate long-range dependencies into the graph convolutional layers of the GNN-based predictor by encoding discovered paths as rich-semantic edges that connect distant but relevant neighbors; this also densifies the KG and alleviates the sparsity issue. Second, we implicitly distill the logical reasoning abilities captured by induced rules into the predictor via a Markov Logic Network (MLN) [15], a probabilistic logic model that applies Markov networks to first-order logic and enables uncertain inference. The variational EM algorithm [16] is leveraged to unify the optimization of the MLN and the GNN-based predictor: unobserved triples are treated as hidden variables so that first-order rule knowledge is distilled into the GNN-based predictor, and we expect the logical reasoning capability of the predictor to be enhanced during this process. By jointly learning these two strategies, we effectively integrate high-order structure knowledge into embedding learning. Our extensive empirical evaluation shows that LR-GCN boosts the Hits@10 performance of the backbone model (CompGCN) by a clear margin on four sparse datasets: FB15K-237_10 (+4.26%), FB15K-237_20 (+2.21%), NELL23K (+4.37%), and WD-singer (+2.36%).
Our contributions can be summarized as follows.
● We introduce a novel GNN-based framework, LR-GCN, which explores and exploits high-order graph structures to relieve the challenge of sparse KGC.
● We propose a novel path-based method, MLN-RL, which generates reasoning paths with delicately calibrated rule weights. This approach effectively filters out noisy paths and explores more instructive high-order graph structures.
● We also propose two strategies to explicitly and implicitly exploit the mined high-order graph structure information, including integrating long-range dependencies into the graph convolution and distilling logical reasoning knowledge based on the variational EM algorithm.
● Extensive experiments and detailed analyses on four sparse benchmarks consistently demonstrate the effectiveness of our proposed method.

2 Preliminary

In this section, we provide a formal definition of the knowledge graph completion (KGC) task, along with the notations used in the framework section. We then present a concise overview of how Graph Neural Network models and Reinforcement Learning algorithms are used to address the KGC task, which will facilitate comprehension of the subsequent method sections.

2.1 Problem definition

A KG is represented by G = (E, R, T), where E = {e_1, ..., e_|E|} and R = {r_1, ..., r_|R|} denote the entity and relation sets, and T = {t_i = (e_s, r_q, e_o)}_{i=1}^{|T|} is the set of observed triples in G. Based on this definition, the KGC task aims to predict e_o given a new query (e_s, r_q, ?) or e_s given (?, r_q, e_o). For uniformity, we follow previous works and convert both directions into queries of the form (e_s, r_q, ?) by augmenting a reverse relation r_q^{-1} for each query (?, r_q, e_o). Therefore, KGC can be defined as predicting the tail entity e_o for the query (e_s, r_q, ?).
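As a concrete illustration of this query normalization, the following minimal sketch (hypothetical helper name, with triples stored as plain string tuples) augments each relation with an inverse so that head prediction becomes tail prediction:

```python
# Minimal sketch of the reverse-relation augmentation described above.
# Hypothetical helper: triples are plain (head, relation, tail) string tuples.
def augment_with_inverse(triples):
    """For each observed triple (h, r, t), add (t, r + '_inv', h) so that a
    head-prediction query (?, r, t) can be rewritten as tail prediction (t, r_inv, ?)."""
    augmented = list(triples)
    for h, r, t in triples:
        augmented.append((t, r + "_inv", h))
    return augmented

triples = [("AlanTuring", "born_in", "London"), ("London", "located_in", "UK")]
print(augment_with_inverse(triples))
```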

2.2 GNN-based KG embedding

Our focus is on enhancing embedding learning based on Graph Neural Networks [12] for sparse KGC. In GNN-based methods, given a training case (e_s, r_q, e_o), the KG structure is first encoded with graph convolutional layer(s), producing entity and relation embeddings. On top of the encoded embeddings, KGE methods such as TransE, DistMult, and ConvE are leveraged to predict a probability distribution over all entities. The above process can be formulated as follows:
$v_{pred} = f_{\phi}(e_s, r_q \mid \mathcal{E}, \mathcal{R}),$
where $v_{pred} \in \mathbb{R}^{|\mathcal{E}|}$ denotes the probability vector over all entities, and $\phi$ denotes the model parameters. Finally, the GNN-based model is optimized with the binary cross-entropy (BCE) loss:
$\mathcal{L}_{label} = \mathrm{BCELoss}(v_{pred}, e_o).$
In this paper, we apply CompGCN(-ConvE) [17] as our base GNN-based model. Our framework is agnostic to the particular choice of GNN-based predictor, so other relevant models such as R-GCN, SACN, and SE-GNN can also be employed.
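To make the scoring pipeline above concrete, here is a minimal PyTorch-style sketch (not the authors' code): pre-computed embeddings stand in for the output of the graph convolutional encoder, and a DistMult-style decoder stands in for ConvE.

```python
import torch
import torch.nn as nn

# Embeddings stand in for the output of the graph convolutional layers.
num_entities, num_relations, dim = 100, 20, 64
ent_emb = nn.Embedding(num_entities, dim)
rel_emb = nn.Embedding(num_relations, dim)

def predict(es_idx, rq_idx):
    """v_pred = f_phi(e_s, r_q | E, R): a probability for every candidate tail."""
    query = ent_emb(es_idx) * rel_emb(rq_idx)        # (batch, dim), DistMult-style
    scores = query @ ent_emb.weight.t()              # (batch, num_entities)
    return torch.sigmoid(scores)

es, rq, eo = torch.tensor([3]), torch.tensor([5]), torch.tensor([17])
v_pred = predict(es, rq)
label = torch.zeros_like(v_pred)
label[0, eo] = 1.0                                   # one-hot gold tail e_o
loss_label = nn.functional.binary_cross_entropy(v_pred, label)   # L_label
```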

2.3 Reinforcement learning-based method

The Reinforcement Learning-based (RL-based) approach represents a promising avenue for addressing Knowledge Graph Completion (KGC) by utilizing reinforcement learning to identify interpretable reasoning paths, although it often lags behind Knowledge Graph Embedding (KGE) techniques in terms of performance [1,14]. Specifically, given a query Q = (e_s, r_q, ?), the RL-based method searches for multiple interpretable reasoning paths of the form g: (e_s, r_1, e_2) → (e_2, r_2, e_3) → ⋯ → (e_{n-1}, r_{n-1}, e_n), and predicts the answer triple t_c = (e_s, r_q, e_n) for each path. The path-searching process is modeled as a Markov Decision Process, and the model is expected to induce the inherent rule rule[g] of the form l: r_1 ∧ r_2 ∧ ⋯ ∧ r_{n-1} → r_q, as illustrated in Fig.2, where the first part of the rule is called the rule body l[b] and the second part the rule head l[h]. Moreover, we assign a weight 0 ≤ w_l = p(l[h] | l[b]) ≤ 1 to each rule l to measure the confidence of the final predictive result. In this study, entities that are not directly connected to the query entity e_s in the path g, i.e., e_3, e_4, ..., e_n, are referred to as high-order neighbors. In the following sections, we build upon MultihopKG [14] to discover helpful paths within sparse KGs.
Fig.2 An illustration of inducing rules from reasoning paths. To reduce the length of rules, we view loops within paths as pointless segments and remove them

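As a concrete sketch of the rule induction and loop removal illustrated in Fig.2 (a hypothetical helper, not the authors' implementation), the following code extracts the relation chain of a path, cuts out loop segments that revisit an entity, and pairs the resulting rule body with the query relation as the rule head:

```python
# Hypothetical sketch: induce a rule body from a reasoning path, removing loops
# (segments that return to an already-visited entity), as in Fig.2.
def induce_rule(path, query_relation):
    """path: list of (head, relation, tail) hops starting at the query entity."""
    entities = [path[0][0]]
    relations = []
    for h, r, t in path:
        if t in entities:                     # loop detected: cut back to the first visit
            idx = entities.index(t)
            entities = entities[: idx + 1]
            relations = relations[:idx]
        else:
            entities.append(t)
            relations.append(r)
    return {"body": relations, "head": query_relation}

path = [("A", "r1", "B"), ("B", "r2", "C"), ("C", "r3", "B"), ("B", "r4", "D")]
print(induce_rule(path, "rq"))   # body: ['r1', 'r4'], head: 'rq'
```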

3 Framework

In this section, we introduce our proposed framework, LR-GCN, for addressing the challenge of sparse Knowledge Graph Completion (KGC), as illustrated in Fig.3. Our approach comprises two primary components: the reasoning path distiller module, MLN-RL, and the GNN-based predictor. The MLN-RL module is responsible for exploring reasoning paths that are closely associated with the queries and for distilling logical rule knowledge into the GNN-based predictor. The GNN-based model utilizes these reasoning paths as high-order graph structure features to supplement the insufficient 1-hop structural information. Moreover, the MLN-RL module updates its rule weights according to feedback from the GNN-based predictor, improving the quality of the explored reasoning paths and the distilled logical rule knowledge, which in turn enhances the performance of the GNN-based predictor.
Fig.3 Our framework consists of two modules: the GNN-based model with the long-range dependency convolution layer and the MLN-RL model, which are jointly optimized through high-order knowledge distillation. Given a query (e_s, r_q, ?), MLN-RL first reasons out paths and classifies them into two groups according to whether each path is correct. The positive paths and segments of negative paths are used to construct new edges that capture long-range dependencies explicitly, while the negative paths are fed into the MLN to distill knowledge into the GNN-based model via the variational EM algorithm


To fully leverage the high-order structure knowledge obtained from MLN-RL, we propose two distinct exploitation methods based on the correctness of the reasoning paths. For positive paths, i.e., paths that predict correct entities, as well as all path fragments starting from query entities, we explicitly encode the high-order structure information as long-range dependency knowledge and integrate it into the graph convolutional layers. For negative paths, i.e., paths that predict false answers, rather than simply discarding them, we encourage the GNN-based model to implicitly distill high-order structure knowledge and reasoning capabilities from MLN-RL, grounded theoretically in the variational EM algorithm. These two strategies are optimized on different categories of reasoning paths without mutual conflict. The theoretical framework incorporates both strategies, and our empirical experiments demonstrate their cumulative gains, as expounded in the experimental section.
In the following sections, we present our proposed reasoning path distiller MLN-RL in Section 3.1. We then elucidate the strategy of utilizing positive reasoning paths within the GNN architecture, which results in the convolution layer structure that accommodates long-range dependencies, as outlined in Section 3.2. We further expound on the joint learning method between MLN-RL and the GNN-based predictor with respect to negative paths, as detailed in Section 3.3. This learning approach allows the high-order structure knowledge to be distilled into the GNN-based predictor. Finally, we summarize the total optimization process in Section 3.4.

3.1 MLN-RL: reasoning path distiller

Current RL-based approaches are limited to considering only the prior weight of the reasoning path g: (e_s, r_1, e_2) → ⋯ → (e_{n-1}, r_{n-1}, e_n), denoted as w_θ(g | Q) = p((r_1, e_2) | e_s) · p((r_2, e_3) | e_2) ⋯ p((r_{n-1}, e_n) | e_{n-1}), to predict the final answer e_n. However, we argue that these approaches neglect the likelihood weight of the reasoning path g, denoted as w_θ((e_s, r_q, e_n) | g). In this study, we introduce the confidence of the rule l induced from path g, w_l = p_w(l[h] | l[b]), as an alternative to the likelihood weight, where l[h] and l[b] are defined in Section 2.3. In contrast, previous studies uniformly assign a value of 1 to the likelihood weight w_θ((e_s, r_q, e_n) | g). We therefore define the predictive probability of e_n as follows:
$p_{w,\theta}(e_n \mid g, Q) = p_{w,\theta}\big(t_c = (e_s, r_q, e_n)\ \text{is valid} \mid g, Q\big) = \sigma\big(w_{l=rule[g]} \cdot w_{\theta}(g \mid Q)\big),$
where Q is the input query (e_s, r_q, ?), σ denotes the Sigmoid function, and θ denotes the parameters of the RL model.
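A small numerical sketch of this scoring rule (hypothetical values; w_θ(g | Q) is the product of the policy's per-step probabilities and w_l is the confidence of the induced rule):

```python
import math

# Sketch of p_{w,theta}(e_n | g, Q) = sigmoid(w_l * w_theta(g | Q)).
def path_prior(step_probs):
    """w_theta(g | Q): product of the RL policy's per-step probabilities."""
    prior = 1.0
    for p in step_probs:
        prior *= p
    return prior

def predict_prob(rule_weight, step_probs):
    x = rule_weight * path_prior(step_probs)
    return 1.0 / (1.0 + math.exp(-x))        # sigma(.)

# e.g., a 3-hop path scored with a moderately confident induced rule
print(predict_prob(rule_weight=0.8, step_probs=[0.6, 0.5, 0.7]))
```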
We note that our method bears resemblance to MLN [15] in form. Conventional MLN requires a pre-defined rule set and then searches for all possible rule-grounded paths, which can be time-intensive. Our approach improves upon this by first identifying paths with an RL-based technique and then inducing rules dynamically from these paths, resulting in a more efficient, continuously updated rule base. As our method is distinct from conventional MLN, we refer to it as MLN-RL. The variational EM algorithm is used to update the rule weights; a more detailed account is provided in Section 3.3.

3.2 GNN-based predictor with long range dependency convolution

Given a positive reasoning path (e_s, r_1, e_2) → (e_2, r_2, e_3) → ⋯ → (e_{n-1}, r_{n-1}, e_n) for the query (e_s, r_q, ?), we first connect e_s with e_n by constructing a new virtual fact (e_s, r_{1:n-1}, e_n), where a composite relation r_{1:n-1} = r_1 ∘ r_2 ∘ ⋯ ∘ r_{n-1} is introduced. We also hope to exploit information from other high-order neighbors of all reasoning paths, even negative ones. For example, for a query (Alan Turing, language, ?), the reasoning path (Alan Turing, born_in, London) → (London, located_in, UK) → (UK, located_in, EU) is obviously wrong. However, we can still mine helpful information from the path segment (Alan Turing, born_in, London) → (London, located_in, UK). To this end, we construct n-3 virtual triples {(e_s, r_{1:i}, e_{i+1})}_{i=2}^{n-2} for each path (e_s, r_1, e_2) → ⋯ → (e_{n-1}, r_{n-1}, e_n) by introducing n-3 composite relations r_{1:2}, r_{1:3}, ..., r_{1:n-2}. Specifically, we calculate the embedding h_{r_{1:i}} of the composite relation r_{1:i} as follows:
$\alpha_{i,j} = \frac{\sigma(W_{attn}[h_{r_q}; h_{r_j}])}{\sum_{k=1}^{i} \sigma(W_{attn}[h_{r_q}; h_{r_k}])},$
$h_{r_{1:i}} = \sum_{j=1}^{i} \alpha_{i,j}\, h_{r_j},$
where α_{i,j} denotes the attention weight of relation r_j in the relation path r_1 ∘ r_2 ∘ ⋯ ∘ r_i, [·;·] represents the concatenation operator, W_attn is a learnable parameter matrix, and σ(·) is the LeakyReLU function.
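A minimal PyTorch-style sketch of this attention-based composition (hypothetical variable names; for numerical stability the sketch normalizes the LeakyReLU scores with a softmax rather than by their raw sum):

```python
import torch
import torch.nn as nn

dim = 64
W_attn = nn.Linear(2 * dim, 1, bias=False)    # plays the role of W_attn above
leaky_relu = nn.LeakyReLU()

def composite_relation(h_rq, h_path_rels):
    """h_rq: (dim,) query relation embedding; h_path_rels: (i, dim) embeddings
    of r_1..r_i. Returns h_{r_{1:i}}, the attention-weighted sum."""
    query = h_rq.unsqueeze(0).expand_as(h_path_rels)                      # (i, dim)
    scores = leaky_relu(W_attn(torch.cat([query, h_path_rels], dim=-1)))  # (i, 1)
    alpha = torch.softmax(scores, dim=0)                                  # alpha_{i,j}
    return (alpha * h_path_rels).sum(dim=0)                               # (dim,)

h_rq = torch.randn(dim)
h_path = torch.randn(3, dim)                  # relations r_1, r_2, r_3
h_composite = composite_relation(h_rq, h_path)
```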
After compressing reasoning paths into new triples T_den = {(e_s, r_{1:i}, e_{i+1})}_{i=2}^{n-1}, the generated triples are integrated into the graph convolutional filter, improving the model's ability to capture long-range dependencies within KGs. To clarify, we do not explicitly add the newly constructed edges to the graph convolution layer during training, as this would alter the structure of the KG. Instead, we introduce an additional training loss to incorporate knowledge related to long-range dependencies:
$\mathcal{L}_{den} = \frac{1}{|\mathcal{T}_{den}|} \sum_{(e_s, r_{1:i}, e_{i+1}) \in \mathcal{T}_{den}} \mathrm{BCE}\big(f_{\phi}(e_s, r_{1:i}),\ e_{i+1}\big).$
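Following this construction, a hypothetical sketch that turns one reasoning path into the virtual triples of T_den and accumulates the auxiliary BCE loss; predict_fn and bce_fn are stand-ins for the GNN-based predictor f_ϕ and the BCE loss above:

```python
# Sketch: build virtual triples {(e_s, r_{1:i}, e_{i+1})}, i = 2..n-1, from one
# path e_s -r1-> e_2 -r2-> ... -r_{n-1}-> e_n, then accumulate L_den.
def virtual_triples(path):
    """path: list of (head, relation, tail) hops; returns [(e_s, (r_1,...,r_i), e_{i+1})]."""
    e_s = path[0][0]
    rels, triples = [], []
    for i, (h, r, t) in enumerate(path, start=1):
        rels.append(r)
        if i >= 2:                             # skip the original 1-hop edge
            triples.append((e_s, tuple(rels), t))
    return triples

def densification_loss(paths, predict_fn, bce_fn):
    """L_den: mean BCE over all virtual triples. predict_fn and bce_fn are
    hypothetical stand-ins for f_phi and the BCE loss of the predictor."""
    t_den = [tr for p in paths for tr in virtual_triples(p)]
    losses = [bce_fn(predict_fn(e_s, comp_rel), e_t) for e_s, comp_rel, e_t in t_den]
    return sum(losses) / max(len(losses), 1)
```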

3.3 High-order knowledge distillation

Negative reasoning paths are also treasure troves with much to offer. For example, some paths may actually be false negatives resulting from the incompleteness of KGs. To make effective use of these negative paths, we view the facts within a negative path g: (e_s, r_1, e_2) → (e_2, r_2, e_3) → ⋯ → (e_{n-1}, r_{n-1}, e_n) as observed variables and the predicted triple t_c = (e_s, r_q, e_n) as a hidden variable. As detailed in Section 3.1, building upon previous literature on Markov Logic Networks [18], we model the joint distribution of the observed and hidden variables using MLN-RL as follows:
$p_{w,\theta}\big(t_c = (e_s, r_q, e_n),\ g\big) = \frac{1}{Z(w)} \exp\big(w_{rule[g]} \cdot w_{\theta}(g \mid Q)\big),$
where Z(w) is the partition function, and rule[g] refers to the inherent rule of the reasoning path g; w_{rule[g]} and w_θ(g | Q) are defined in Section 3.1.
The training procedure of MLN-RL starts by maximizing the log-likelihood of the observed facts within the path g, i.e., log p_{w,θ}(g). To introduce the hidden variable, we instead optimize the evidence lower bound (ELBO) of the data log-likelihood:
$\log p_{w,\theta}(g) \ge \mathrm{ELBO}(p_{w,\theta}, p_{\phi}) = \mathbb{E}_{p_{\phi}(t_c)}\big[\log p_{w,\theta}(t_c, g) - \log p_{\phi}(t_c)\big],$
where p_ϕ denotes the GNN-based model and p_ϕ(t_c) serves as the variational distribution. We use the variational EM algorithm [16] to optimize the ELBO, alternating between a variational E-step and an M-step.
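For completeness, a sketch of the standard derivation of this bound via Jensen's inequality (a routine step not spelled out in the text):

$\log p_{w,\theta}(g) = \log \sum_{t_c} p_{w,\theta}(t_c, g) = \log \mathbb{E}_{p_{\phi}(t_c)}\!\left[\frac{p_{w,\theta}(t_c, g)}{p_{\phi}(t_c)}\right] \ge \mathbb{E}_{p_{\phi}(t_c)}\!\left[\log \frac{p_{w,\theta}(t_c, g)}{p_{\phi}(t_c)}\right] = \mathbb{E}_{p_{\phi}(t_c)}\big[\log p_{w,\theta}(t_c, g) - \log p_{\phi}(t_c)\big].$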

3.3.1 E-step: inference

In the E-step, we update p_ϕ to maximize the ELBO. Substituting Eq. (7) into Eq. (8), the ELBO is reorganized as below:
$\mathrm{ELBO} = \mathbb{E}_{p_{\phi}(t_c)}\big[w_{rule[g]} \cdot w_{\theta}(e_n \mid g, Q)\big] - \log Z(w) - \mathbb{E}_{p_{\phi}(t_c)}\big[\log p_{\phi}(t_c)\big],$
where w_{rule[g]} and the RL parameters θ are fixed in the E-step, so the partition function Z(w) can be treated as a constant. We introduce a theorem to optimize the ELBO in Eq. (9).
Theorem 1 Optimizing $\mathbb{E}_{p_{\phi}(t_c)}\big[w_{rule[g]} \cdot w_{\theta}(e_n \mid g, Q)\big]$ by gradient descent approximates optimizing:
$\log p_{\phi}(t_c) \cdot \big[w_{rule[g]} \cdot w_{\theta}(e_n \mid g, Q)\big].$
Please refer to Appendix 6 for the detailed proof. Therefore, we optimize p_ϕ by minimizing the following loss in the E-step:
$\mathcal{L}_{elbo} = -\log p_{\phi}(t_c) \cdot \big[w_{rule[g]} \cdot w_{\theta}(e_n \mid g, Q)\big] + \lambda\, \mathbb{E}_{p_{\phi}}\big[\log p_{\phi}(t_c)\big],$
where λ is a hyperparameter to be tuned. We now explain the practical meaning of this objective. The first term requires p_ϕ, i.e., the GNN-based model, to maximize the likelihood of the triple predicted by path g, weighted by the path importance w_{rule[g]} · w_θ(e_n | g, Q); through this term, the high-order graph structure knowledge contained in reasoning paths is distilled into the GNN-based model. The second term acts as an entropy constraint, encouraging p_ϕ to retain the knowledge it has already learned and not to over-trust MLN-RL. The GNN-based predictor is thus expected to trade off between retaining its own knowledge and learning from MLN-RL.
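One possible reading of this E-step objective as code (a hedged sketch, not the authors' implementation): t_c is treated as a Bernoulli variable whose probability pred_prob = p_ϕ(t_c) comes from the GNN-based predictor, while path_weight = w_{rule[g]} · w_θ(e_n | g, Q) is a fixed scalar supplied by MLN-RL.

```python
import torch

def elbo_loss(pred_prob, path_weight, lam=0.1):
    """E-step loss sketch: a distillation term that pushes the predictor toward
    the path-weighted triple, plus an entropy-style term (weight lam assumed)
    that keeps the predictor from over-trusting MLN-RL."""
    distill = -path_weight * torch.log(pred_prob + 1e-12)
    neg_entropy = (pred_prob * torch.log(pred_prob + 1e-12)
                   + (1.0 - pred_prob) * torch.log(1.0 - pred_prob + 1e-12))
    return distill + lam * neg_entropy

loss = elbo_loss(torch.tensor(0.3), path_weight=0.4)
```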

3.3.2 M-step: learning

In the M-step, we fix the GNN-based model p_ϕ and update the weights of the rules induced from reasoning paths in MLN-RL by maximizing the ELBO. However, directly optimizing the ELBO is intractable because the partition function Z(w) is no longer a constant. Following existing studies [18], we instead optimize the pseudo-likelihood function:
$F_{PL}(w) = \mathbb{E}_{p_{\phi}(t_c)}\big[\log p_{w,\theta}(t_c \mid MB(t_c))\big] \approx \mathbb{E}_{p_{\phi}(t_c)}\big[\log p_{w,\theta}(t_c \mid g)\big],$
where MB(t_c) is the Markov blanket of t_c, comprising the triples that appear together with t_c in the groundings of the induced rules, and p_{w,θ}(t_c | g) is defined in Eq. (3).
For each induced rule l from path g that can conclude the triple tc, we optimize the rule weight wl by gradient descent, with the derivative:
$\nabla_{w_l}\, \mathbb{E}_{p_{\phi}(t_c)}\big[\log p_{w,\theta}(t_c \mid g)\big] \approx p_{\phi}(t_c = 1) - p_{w,\theta}(t_c = 1 \mid g),$
where t_c = 1 means that t_c is valid. The proof of this conclusion is given in [18]. Intuitively, for each triple t_c predicted by a false-prediction path g, we use p_ϕ(t_c = 1) as the target for updating the probability p_{w,θ}(t_c = 1 | g). In this way, the rule weights are updated to better measure the prior importance of reasoning paths.
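A tiny sketch of one M-step update under this approximate gradient (hypothetical learning rate; the GNN predictor's belief p_ϕ(t_c = 1) acts as the target for the MLN-RL probability):

```python
def update_rule_weight(w_l, p_phi_true, p_mlnrl_true, lr=0.1):
    """One gradient-ascent step on the rule weight w_l using
    grad ≈ p_phi(t_c = 1) - p_{w,theta}(t_c = 1 | g)."""
    grad = p_phi_true - p_mlnrl_true
    return w_l + lr * grad

w_l = update_rule_weight(w_l=0.8, p_phi_true=0.7, p_mlnrl_true=0.55)
```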

3.3.3 Discussion

In the implementation of the E-step, we optimize p_ϕ, i.e., the GNN-based predictor, to maximize the likelihood not only of the triple t_c predicted by MLN-RL but of all triples in the corresponding reasoning path, weighted by the path weight. The GNN-based predictor therefore learns both the predicted results and the reasoning processes from MLN-RL: it distills high-order structure knowledge from the predicted results and acquires reasoning capabilities from the reasoning processes of MLN-RL.

3.4 Optimization and evaluation

To speed up training, we pretrain the base GNN-based model and MLN-RL separately until convergence. We then jointly train the GNN-based predictor and MLN-RL for a certain number of epochs.
During joint learning, for a given training case (e_s, r_q, e_o), reasoning paths are generated by MLN-RL for the query (e_s, r_q, ?). Three losses are minimized together to train the GNN-based model:
$\mathcal{L}_{gnn} = \mathcal{L}_{label} + \beta\, \mathcal{L}_{den} + \gamma\, \mathcal{L}_{elbo},$
where L_label, L_den, and L_elbo are defined in Eq. (2), Eq. (6), and Eq. (11), respectively, and β and γ are the corresponding loss weights, which we treat as hyperparameters. The rule weights in MLN-RL are then updated in the M-step using the derivative in Eq. (13). Finally, the high-order graph structure knowledge learned from MLN-RL is integrated into the GNN-based model, resulting in the LR-GCN model used for evaluation.
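A high-level sketch of one joint-learning epoch, with all interfaces hypothetical (the actual optimization schedule and module APIs may differ):

```python
def joint_learning_epoch(train_cases, gnn, mln_rl, beta=1.0, gamma=0.1):
    """E-step: train the GNN-based predictor with
    L_gnn = L_label + beta * L_den + gamma * L_elbo; M-step: update rule weights.
    gnn and mln_rl expose hypothetical methods for the components described above."""
    for e_s, r_q, e_o in train_cases:
        paths = mln_rl.reason(e_s, r_q)                         # explore reasoning paths
        pos_paths = [p for p in paths if p.answer == e_o]       # correct predictions
        neg_paths = [p for p in paths if p.answer != e_o]

        l_label = gnn.label_loss(e_s, r_q, e_o)                 # Eq. (2)
        l_den = gnn.densification_loss(pos_paths)               # Eq. (6): long-range edges
        l_elbo = gnn.distillation_loss(neg_paths, mln_rl)       # Eq. (11): E-step
        gnn.step(l_label + beta * l_den + gamma * l_elbo)

    mln_rl.update_rule_weights(gnn)                             # M-step: Eq. (13)
```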

4 Experiments

4.1 Experimental settings

Datasets This study follows the experimental setups of DacKGR [1] and HoGRN [2] on two sparse datasets, namely WD-singer and NELL23K. To evaluate the performance of our framework in sparse scenarios, we also uniformly sample 10%, 20%, 30%, and 60% of the triples from FB15K-237, resulting in sparser datasets denoted as FB15K-237_10, FB15K-237_20, FB15K-237_30, and FB15K-237_60, respectively. Unlike DacKGR [1] and HoGRN [2], our methodology ensures that all entities and relations are preserved by enforcing each entity (relation) to participate in at least one triple fact. Tab.1 summarizes the dataset statistics. We provide construction details of datasets in Appendixes.
Tab.1 Summary statistics of datasets
WD-singer NELL23K FB15K-237_10 FB15K-237_20 FB15K-237_30 FB15K-237_60 FB15K-237
#Entity 10282 22925 14541 14541 14541 14541 14541
#Relation 270 400 237 237 237 237 237
#Train Set 16142 24321 27211 54423 108846 163269 272115
#Dev Set 2163 4951 17535 17535 17535 17535 17535
#Test Set 2203 4944 20466 20466 20466 20466 20466
Avg In-degree 1.570 1.170 1.876 3.752 5.628 11.256 18.760
Baselines We mainly compare LR-GCN with its backbone CompGCN [17] in the following experiments. The primary objective is to assess whether, and to what extent, LR-GCN improves upon the backbone model. Additionally, other embedding-based KGC models such as TransE [19], RotatE [20], ComplEx [21], TuckER [22], and ConvE [23], and GNN-based methods such as SACN [24], NBFNet [25], and RED-GNN [26], are evaluated. Furthermore, the results of relevant studies on sparse KGs, namely DacKGR [1] and HoGRN [2], are also presented for comparison.
Hyperparameters In our implementation, we set the embedding dimension to 200 for all KGE models. We first pre-train CompGCN as the backbone and then jointly retrain CompGCN and MLN-RL until convergence under the same learning rate. For fairness, we also retrain CompGCN independently for the same number of epochs. Learning rates for CompGCN and LR-GCN are set to 0.005 on WD-singer and NELL23K, and 0.001 on FB15K-237 and its sub-datasets. For more hyperparameters please refer to Appendixes.
Evaluation metrics Following previous work, we use ranking metrics, i.e., MRR and Hits@k, to evaluate our framework. We filter out from the ranking all remaining entities that are valid answers for the test query (h, r, ?). The metrics are measured in both tail-prediction and head-prediction directions. We strictly follow the "RANDOM" protocol proposed by [27] to evaluate our methods.
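A small sketch of the filtered ranking computation with random tie-breaking, in the spirit of the "RANDOM" protocol (the score list, gold index, and known_true set are hypothetical stand-ins for the model's scores and the filter set):

```python
import random

def filtered_rank(scores, gold, known_true):
    """Rank of the gold entity after filtering other known-true answers;
    ties with the gold score are broken randomly."""
    gold_score = scores[gold]
    candidates = [e for e in range(len(scores)) if e == gold or e not in known_true]
    better = sum(1 for e in candidates if scores[e] > gold_score)
    equal = sum(1 for e in candidates if e != gold and scores[e] == gold_score)
    return better + random.randint(0, equal) + 1

def mrr_and_hits(ranks, k=10):
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_k = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits_k

ranks = [filtered_rank([0.1, 0.9, 0.4, 0.9], gold=2, known_true={1})]
print(mrr_and_hits(ranks, k=10))
```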

4.2 Main results

Tab.2 presents the main results on the four datasets, showcasing the consistent improvements of LR-GCN across all sparse datasets. (1) In terms of MRR, LR-GCN outperforms CompGCN with relative improvements of 16.87%, 9.65%, 5.50%, and 17.04%, respectively. These results validate the effectiveness of our proposed method in capturing long-range dependencies to supplement insufficient structure features. (2) LR-GCN exhibits more significant improvements across all benchmarks than its variants without the Knowledge Distillation (KD) and Long Range Convolution (LRC) modules. This is because the high-order structure knowledge captured by KD and LRC is different and complementary, resulting in better performance when combined. (3) Compared to NBFNet, our method yields slightly inferior results on FB15K-237_10 and FB15K-237_20, primarily due to limitations inherent in our backbone model CompGCN. However, our objective is to utilize higher-order structural information to enhance the backbone's performance on sparse KGs, and our framework is model-agnostic; we are confident that substituting NBFNet for CompGCN would yield superior metrics.
Tab.2 Experimental results on FB15K-237_10, FB15K-237_20, WD-singer, and NELL23K. The last line records the relative improvements of LR-GCN over CompGCN. Hits@N and MRR values are in percentage. “KD” denotes High-order Knowledge Distillation and “LRC” represents Long Range Convolution. The best score is in bold and the second is underlined
FB15K-237_10 FB15K-237_20 WD-singer NELL23K
(per dataset: Hits@1, Hits@10, MRR)
TransE 4.94 24.03 11.58 8.33 28.76 15.27 22.56 49.84 32.68 4.62 30.21 13.35
RotatE 5.64 17.16 9.52 10.40 28.52 16.43 31.43 45.96 36.79 9.63 28.13 15.75
ComplEx 9.61 22.77 13.92 10.69 26.17 15.81 29.87 43.53 34.94 14.21 32.57 20.26
TuckER 6.51 16.44 9.89 9.73 25.91 15.08 32.12 44.83 36.87 13.75 30.35 19.36
ConvE 10.56 23.90 15.69 11.38 29.51 17.27 31.46 47.37 37.22 14.44 37.55 22.73
SACN 9.60 22.56 13.98 11.36 28.82 17.07 28.49 43.70 34.10 14.36 35.01 21.22
RED-GNN 8.43 19.8 12.22 12.28 30.88 18.43 32.57 49.18 38.79 16.63 39.65 24.24
NBFNet 11.13 27.78 16.64 12.45 31.87 18.89 32.59 48.10 38.81 14.40 39.04 22.74
DacKGR 10.21 21.34 13.91 10.99 26.15 15.87 26.49 44.30 32.66 13.00 31.63 18.99
HoGRN 48.80 39.07 39.98 24.56
CompGCN (backbone) 9.52 23.11 14.11 11.27 29.51 17.30 31.75 47.62 37.65 13.79 37.57 21.59
LR-GCN (w/o KD) 10.41 26.32 15.77 11.94 30.98 18.22 32.06 49.61 38.71 15.41 39.84 23.41
LR-GCN (w/o LRC) 10.60 26.25 15.89 11.59 30.11 17.74 31.89 48.28 38.00 15.99 39.11 23.53
LR-GCN 11.07 27.37 16.49 12.57 31.72 18.97 32.80 49.98 39.27 17.31 41.90 25.27
–Rela. Impr. +16.28% +18.43% +16.87% +11.54% +7.49% +9.65% +3.31% +4.96% +5.50% +25.53% +11.53% +17.04%

4.3 KG sparsity analysis

This section analyzes the impact of knowledge graph (KG) sparsity on FB15K-237 and its sub-datasets FB15K-237_x (x = 10, 20, 30, 60). As explained in Section 4.1, we construct the four sub-datasets FB15K-237_10, FB15K-237_20, FB15K-237_30, and FB15K-237_60 by randomly removing 90%, 80%, 70%, and 40% of the triples. To ensure equal task difficulty across graph sparsities, we keep the sets of entities and relations constant across all sub-datasets, guaranteeing that each entity and relation participates in at least one triple. The results in Fig.4 reveal that the improvements over CompGCN decrease as the average in-degree increases. This is because entities in dense KGs possess sufficient first-order neighbors, so the information gained from higher-order structures is limited.
Fig.4 Improvements of LR-GCN on FB15K-237 and 4 sparse datasets against to CompGCN (60%, 30%, 20%, and 10% denote percentages of retained triples)


4.4 In-degree analysis

This section evaluates the performance of LR-GCN and CompGCN on entities with varying in-degree to examine LR-GCN's superiority under high-sparsity conditions. The test set is divided into subsets based on entity in-degree, and the two models are compared on each subset. The results in Fig.5 show that: (1) As the in-degree decreases, entity frequency increases. This suggests that resolving high-sparsity concerns, especially for entities with low in-degree, is of great significance for improving overall completion performance. (2) Both models' performance decreases as the in-degree decreases, further indicating that low in-degree worsens embedding learning, consistent with the analysis in Section 4.3. (3) The improvements of LR-GCN over CompGCN decrease as the in-degree increases (except for the largest in-degree ranges), revealing that LR-GCN benefits high-sparsity entities the most. We attribute this to the fact that high-sparsity entities gain more information from high-order graph structures. (4) Obvious improvements also occur in the largest in-degree ranges on both datasets. We hypothesize that these gains originate from the reasoning capacity distilled from MLN-RL, which benefits from the rich logical patterns inherent in dense graph structures. Future research will investigate this hypothesis further.
Fig.5 MRR results and entity frequency grouped by entity in-degree on NELL23K and FB15K-237_10. (a) NELL23K; (b) FB15K-237_10


4.5 Comparison with K-stack method

In this section, we evaluate whether using RL reasoning paths to guide the exploration and exploitation of higher-order graph structural information is more effective than simply stacking layers. We compare LR-GCN with CompGCN stacked with K graph convolutional layers on the WD-singer and NELL23K datasets. The hyperparameters of CompGCN are kept constant except for the number of layers K. Since we set the maximum number of reasoning steps in MultihopKG [14] to 3, the horizon of the GNN-based predictor is broadened to selectively see its 3-hop neighbors; we therefore experiment with K = 1, 2, 3. The results in Tab.3 show that CompGCN's performance cannot be improved by simply stacking graph convolution layers, which supports our statement about the over-squashing problem. Additionally, LR-GCN significantly outperforms CompGCN (K=2) and CompGCN (K=3), indicating that the high-order graph structure information obtained by our method is more useful for sparse KGC.
Tab.3 Performance comparison of LR-GCN with CompGCN stacked with K=1,2,3 graph convolutional layers on WD-singer and NELL23K
WD-singer NELL23K
Hits@10 MRR Hits@10 MRR
CompGCN (K=1) 47.62 37.65 37.57 21.60
CompGCN (K=2) 47.98 36.55 37.34 21.13
CompGCN (K=3) 47.80 34.22 37.55 21.01
LR-GCN 49.98 39.27 41.90 25.27

4.6 MLN-RL performance

We propose the RL-based method MLN-RL primarily to enable existing RL-based methods to output probabilities, which facilitates knowledge distillation into GNN-based models. Moreover, the quality of the reasoning path distiller heavily affects the final performance of the GNN-based predictor. We therefore train the MLN-RL module separately and compare it with DacKGR [1]. Tab.4 presents the results of DacKGR and MLN-RL on FB15K-237_10, WD-singer, and NELL23K. MLN-RL outperforms DacKGR, especially on WD-singer, which verifies the feasibility and effectiveness of MLN-RL.
Tab.4 Performances of MLN-RL compared with its base model DacKGR on FB15K-237_10, WD-singer and NELL23K. Values are in percentage
DacKGR MLN-RL
Hits@1 MRR Hits@1 MRR
FB15K-237_10 10.21 13.91 10.33 13.97
WD-singer 26.49 32.66 28.05 33.65
NELL23K 13.00 18.99 13.02 19.05

4.7 Path length analysis

Obviously, the effectiveness and efficiency of utilizing high-order graph structure information depend on the exploration results of MLN-RL. To analyze the effect of the MLN-RL module on the LR-GCN framework, we conducted experiments on NELL23K and WD-singer with different reasoning path lengths. The experimental results are shown in Tab.5 and Fig.6: (1) Tab.5 records the average training time per epoch of LR-GCN on WD-singer and NELL23K. The training time per epoch increases significantly as the path length grows, which shows that the time bottleneck of our method lies in the higher-order structural information search phase. (2) Fig.6 illustrates performance under different path-length settings for DacKGR, MLN-RL, and LR-GCN. Although the performance of DacKGR decreases with increasing path length, the MLN-RL module is able to maintain its improvements, demonstrating the effectiveness of incorporating the rule prior information. (3) As the path length increases, the performance of MLN-RL decreases notably, which indicates that the quality of the rules induced from long paths decreases, i.e., the number of noisy paths increases with path length. However, the performance of LR-GCN does not decrease, which shows that the LR-GCN framework is robust to noisy paths.
Tab.5 Training time per epoch on WD-singer and NELL23K. Each of these values is in minutes
Path length
3 4 5 6
WD-singer 3.133 4.3 5.03 6.067
NELL23K 1.683 2.467 3.017 3.5
Fig.6 Hits@1 and MRR results for different reasoning paths lengths on WD-singer. (a) Hits@1; (b) MRR


Fig.7 Blue edges denote the reasoning paths searched for corresponding queries. Orange and green nodes denote the predicted answers by the pre-trained GNN-based predictor before joint learning and the golden answers, respectively. (a) Case 1: (Intelligent dance music, parent_genre, ?); (b) Case 2: (David Nutter, nominated_for, ?)


4.8 Case study

We visualize some examples learned by LR-GCN on the FB15K-237_10 dataset in Fig.7 to further explain our motivations. The first instance asks for the parent genre of Intelligent dance music. Prior to joint learning, the CompGCN model predicts an incorrect answer, Folk rock, but it successfully predicts the correct answer with the aid of the positive reasoning path found by the MLN-RL module. This can be attributed to the closer semantic connection established between Intelligent dance music and Techno, as well as the possible injection of the logic rule parent_genre ∧ parent_genre → parent_genre during training. The second instance demonstrates a negative reasoning path whose predicted answer is erroneous; however, the intermediate entity in the reasoning path helps correct the prediction during joint learning. A similar analysis applies to this case.

5 Related work

Knowledge graph completion Knowledge Graph Embedding (KGE) serves as the most common approach to Knowledge Graph Completion; it aims to learn dense distributed representations for entities and relations in KGs. According to the design of the scoring function, KGE approaches can be classified into three main families: translation-based, semantic matching-based, and deep learning-based models [28], represented by TransE [19], DistMult [29], and ConvE [23], respectively. On top of these KGE models, GNN-based KGC models first encode the graph structure with a Graph Neural Network (GNN) [12] and then score triples using the aforementioned KGE methods, represented by R-GCN [30], CompGCN [17], and SE-GNN [31]. Reinforcement Learning (RL) can also be applied to multi-hop Knowledge Graph Completion [14,32], where an agent is trained to find a reasoning path that explains the given query. RL-based methods often lag behind KGE models but provide better explainability. However, previous KGC models assume that KGs are dense enough to learn rich-semantic embeddings, which is not common in real-world scenarios [2]. On the contrary, KGs are often sparse due to the limitations of corpora and extraction techniques, and obvious performance degradation can be observed for previous KGC models as KG sparsity grows.
Sparse knowledge graph completion To address sparse KGC, [1] proposed DacKGR, a Reinforcement Learning (RL) based method that extends the action space using a pre-trained KGE model. The improvements of DacKGR are remarkable compared with its base method MultiHopKG [14] but still lag behind KGE models. [33] proposed to apply textual information such as entity names and descriptions to supplement insufficient features, but not all KGs contain textual information. [2] proposed HoGRN to relieve high sparsity from different views, introducing weight-free attention and learning high-quality relation embeddings within the Graph Neural Network framework. [9] also proposed to aggregate messages with an attention mechanism and to utilize contrastive learning to enrich entity semantics. However, these works are confined to the efficient use of 1-hop neighbor information and do not introduce higher-order graph structure features to enrich the semantics of entities.
Path-based knowledge graph reasoning This work utilizes path-based models to explore effective higher-order graph structures. The most common path-based approaches include rule-based [34-38], differentiable logic-based [39-41], and reinforcement learning-based approaches [1,14,42]. Rule-based approaches reason by mining chained rules from KGs, and the mined rules are used in two main ways: some works predict target entities by matching rule bodies [34], while others combine rule mining with KGE learning [36,37], where the rules augment representation learning and the KG embeddings in turn help improve the quality of the mined rules. Differentiable logic-based approaches represent paths as products of relational sequences and learn the logic through the back-propagation algorithm [43]. Reinforcement learning-based methods model multi-hop inference path generation as a multi-step decision-making process [14]. Building on path-based approaches, we observe that previous methods only utilize the prior probabilities of paths while ignoring their likelihood (rule-confidence) weights, so we propose a path inference algorithm that incorporates the Markov Logic Network [15].
KGC based on Markov logic network The Markov Logic Network (MLN) [15] defines a joint distribution over observed and hidden variables in an undirected graphical model using first-order logic. [18] and [44] applied MLN to KGs to distill logic rules into embedding learning via data augmentation. However, MLN has been criticized for its high computational complexity, specifically in inducing formulas and grounding paths. Moreover, [18,44] did not consider graph densification or explicitly integrating high-order graph structures into graph convolutional layers. This study improves the computational efficiency of MLN by directly inducing rules from reasoning paths generated by a Reinforcement Learning-based model and enables dynamic updating of the rule base; details are provided in Section 3.1.

6 Conclusions

In this paper, we focus on the task of sparse Knowledge Graph Completion and attribute its difficulty to inadequate information from direct neighbors. We propose a novel GNN-based framework, LR-GCN, that addresses this challenge by leveraging high-order graph structure information to enrich entity semantics. LR-GCN comprises two components: a base GNN-based model and an RL-based reasoning path distiller, MLN-RL. MLN-RL learns to explore meaningful reasoning paths, and two different strategies are then employed to fully exploit the long-range dependency knowledge contained in these paths, learning complementary knowledge. Experimental results support the effectiveness of LR-GCN in addressing sparse KGC. Future work may involve incorporating additional features, such as textual information, to improve the reasoning abilities of GNN-based methods, and improving the efficiency of the proposed framework to scale it to larger KGs.

Tao He is currently a PhD student in the Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China. He received the BS and MS degrees from Harbin Institute of Technology, China. His research interests are knowledge reasoning and question answering, which include knowledge graph completion, knowledge graph question answering, and video question answering

Ming Liu received the PhD degree from the School of Computer Science and Technology, Harbin Institute of Technology, China in 2010. He is a full professor of the Department of Computer Science, and a faculty member of the Research Center for Social Computing and Information Retrieval (HIT-SCIR), Harbin Institute of Technology, China. His research interests include knowledge graphs and machine reading comprehension

Yixin Cao is an assistant professor with Singapore Management University, Singapore. Before that, he was a research assistant professor of Nanyang Technology University, Singapore. He also was a research fellow with NExT++, National University of Singapore (NUS). He received his PhD degree in Computer Science from Tsinghua University, China in 2018. His research areas span natural language processing, knowledge graph, recommendation and knowledge-patched LLMs

Zekun Wang is currently a PhD student in the Social Computing and Information Retrieval research center, Harbin Institute of Technology, China. He received the BS degree from Harbin Institute of Technology, China. His research interests are efficient pretrained models

Zihao Zheng is currently a PhD student in the Social Computing and Information Retrieval research center, Harbin Institute of Technology, China. He received the BS degree from Harbin Institute of Technology, China. His research interests are information extraction and multimodal learning, which include relation extraction, named entity recognition and multimodal extraction

Bing Qin received the PhD degree from the School of Computer Science and Technology, Harbin Institute of Technology, China in 2005. She is a full professor of the Department of Computer Science, and the director of the Research Center for Social Computing and Information Retrieval (HIT-SCIR), Harbin Institute of Technology, China. Her research interests include natural language processing, information extraction, document-level discourse analysis, and sentiment analysis

References

[1]
Lv X, Han X, Hou L, Li J, Liu Z, Zhang W, Zhang Y, Kong H, Wu S. Dynamic anticipation and completion for multi-hop reasoning over sparse knowledge graph. In: Proceedings of 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020, 5694−5703
[2]
Chen W, Cao Y, Feng F, He X, Zhang Y. Explainable sparse knowledge graph completion via high-order graph reasoning network. 2022, arXiv preprint arXiv: 2207.07503
[3]
Xu X, Zhu Y, Wang X, Zhang N. How to unleash the power of large language models for few-shot relation extraction? In: Proceedings of the 4th Workshop on Simple and Efficient Natural Language Processing (SustaiNLP). 2023, 190−200
[4]
Sui D, Zeng X, Chen Y, Liu K, Zhao J. Joint entity and relation extraction with set prediction networks. IEEE Transactions on Neural Networks and Learning Systems, 2023, 1–12, doi: 10.1109/TNNLS.2023.3264735
[5]
Cao S, Shi J, Pan L, Nie L, Xiang Y, Hou L, Li J, He B, Zhang H. KQA Pro: a dataset with explicit compositional programs for complex question answering over knowledge base. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022, 6101−6119
[6]
Galkin M, Zhu Z, Ren H, Tang J. Inductive logical query answering in knowledge graphs. 2022, arXiv preprint arXiv: 2210.08008
[7]
Li D, Li Y, Zhang J, Li K, Wei C, Cui J, Wang B. C3KG: a Chinese commonsense conversation knowledge graph. In: Proceedings of Findings of the Association for Computational Linguistics: ACL 2022. 2022, 1369−1383
[8]
Fei Z, Zhou X, Gui T, Zhang Q, Huang X. LFKQG: a controlled generation framework with local fine-tuning for question generation over knowledge bases. In: Proceedings of the 29th International Conference on Computational Linguistics. 2022, 6575−6585
[9]
Tan Z, Chen Z, Feng S, Zhang Q, Zheng Q, Li J, Luo M. KRACL: contrastive learning with graph context modeling for sparse knowledge graph completion. In: Proceedings of the ACM Web Conference 2023. 2023, 2548−2559
[10]
Jin D, Gong Y, Wang Z, Yu Z, He D, Huang Y, Wang W. Graph neural network for higher-order dependency networks. In: Proceedings of the ACM Web Conference 2022. 2022, 1622−1630
[11]
Yang C, Liu M, Zheng V W, Han J. Node, motif and Subgraph: leveraging network functional blocks through structural convolution. In: Proceedings of 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 2018, 47−52
[12]
Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th International Conference on Learning Representations. 2017
[13]
Topping J, Di Giovanni F, Chamberlain B P, Dong X, Bronstein M M. Understanding over-squashing and bottlenecks on graphs via curvature. In: Proceedings of the 10th International Conference on Learning Representations. 2022
[14]
Lin X V, Socher R, Xiong C. Multi-hop knowledge graph reasoning with reward shaping. In: Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing. 2018, 3243−3253
[15]
Richardson M, Domingos P . Markov logic networks. Machine Learning, 2006, 62( 1): 107–136
[16]
Bishop C M. Pattern Recognition and Machine Learning. New York: Springer, 2006
[17]
Vashishth S, Sanyal S, Nitin V, Talukdar P. Composition-based multi-relational graph convolutional networks. In: Proceedings of the 8th International Conference on Learning Representations. 2020
[18]
Qu M, Tang J. Probabilistic logic neural networks for reasoning. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 7712−7722
[19]
Bordes A, Usunier N, Garcia-Durán A, Weston J, Yakhnenko O. Translating embeddings for modeling multi-relational data. In: Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013, 2787−2795
[20]
Sun Z, Deng Z H, Nie J Y, Tang J. Rotate: Knowledge graph embedding by relational rotation in complex space. In: Proceedings of the 7th International Conference on Learning Representations. 2019
[21]
Trouillon T, Welbl J, Riedel S, Gaussier É, Bouchard G. Complex embeddings for simple link prediction. In: Proceedings of the 33rd International Conference on International Conference on Machine Learning. 2016, 2071−2080
[22]
Balažević I, Allen C, Hospedales T. TuckER: Tensor factorization for knowledge graph completion. In: Proceedings of 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019, 5185−5194
[23]
Dettmers T, Minervini P, Stenetorp P, Riedel S. Convolutional 2D knowledge graph embeddings. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 2018, 221
[24]
Shang C, Tang Y, Huang J, Bi J, He X, Zhou B. End-to-end structure-aware convolutional networks for knowledge base completion. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. 2019, 3060−3067
[25]
Zhu Z, Zhang Z, Xhonneux L P A C, Tang J. Neural bellman-ford networks: A general graph neural network framework for link prediction. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021, 29476−29490
[26]
Zhang Y, Yao Q. Knowledge graph reasoning with relational digraph. In: Proceedings of the ACM Web Conference 2022. 2022, 912−924
[27]
Sun Z, Vashishth S, Sanyal S, Talukdar P, Yang Y. A re-evaluation of knowledge graph completion methods. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, 5516−5522
[28]
Rossi A, Barbosa D, Firmani D, Matinata A, Merialdo P . Knowledge graph embedding for link prediction: a comparative analysis. ACM Transactions on Knowledge Discovery from Data, 2021, 15( 2): 14
[29]
Yang B, Yih W T, He X, Gao J, Deng L. Embedding entities and relations for learning and inference in knowledge bases. In: Proceedings of the 3rd International Conference on Learning Representations. 2015
[30]
Schlichtkrull M, Kipf T N, Bloem P, Van Den Berg R, Titov I, Welling M. Modeling relational data with graph convolutional networks. In: Proceedings of the 15th European Semantic Web Conference. 2018, 593−607
[31]
Li R, Cao Y, Zhu Q, Bi G, Fang F, Liu Y, Li Q. How does knowledge graph embedding extrapolate to unseen data: a semantic evidence view. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. 2022, 5781−5791
[32]
Wan G, Pan S, Gong C, Zhou C, Haffari G. Reasoning like human: hierarchical reinforcement learning for knowledge graph reasoning. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence. 2021, 1926−1932
[33]
He T, Jiang T, Zheng Z, Zhu H, Zhang J, Liu M, Zhao S, Qin B. VEM2L: a plug-and-play framework for fusing text and structure knowledge on sparse knowledge graph completion. 2022, arXiv preprint arXiv: 2207.01528
[34]
Galárraga L, Teflioudi C, Hose K, Suchanek F M. Fast rule mining in ontological knowledge bases with AMIE+. The VLDB Journal, 2015, 24(6): 707−730
[35]
Qu M, Chen J, Xhonneux L P, Bengio Y, Tang J. RNNLogic: learning logic rules for reasoning on knowledge graphs. 2020, arXiv preprint arXiv: 2010.04029
[36]
Niu G, Zhang Y, Li B, Cui P, Liu S, Li J, Zhang X. Rule-guided compositional representation learning on knowledge graphs. In: Proceedings of the 34th AAAI conference on artificial intelligence. 2020, 2950−2958
[37]
Niu G, Li B, Zhang Y, Pu S. Perform like an engine: A closed-loop neural-symbolic learning framework for knowledge graph inference. In: Proceedings of the 29th International Conference on Computational Linguistics. 2021, 1391−1400
[38]
Xu J, Zhang J, Ke X, Dong Y, Chen H, Li C, Liu Y. P-INT: a path-based interaction model for few-shot knowledge graph completion. In: Proceedings of Findings of the Association for Computational Linguistics: EMNLP 2021. 2021, 385−394
[39]
Yang F, Yang Z, Cohen W W. Differentiable learning of logical rules for knowledge base reasoning. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 2316−2325
[40]
Sadeghian A, Armandpour M, Ding P, Wang D Z. DRUM: End-to-end differentiable rule mining on knowledge graphs. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 1375
[41]
Wang P W, Stepanova D, Domokos C, Kolter J Z. Differentiable learning of numerical rules in knowledge graphs. In: Proceedings of the 8th International Conference on Learning Representations. 2020
[42]
Zhang D, Yuan Z, Liu H, Lin X, Xiong H. Learning to walk with dual agents for knowledge graph reasoning. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. 2022, 5932−5941
[43]
Rumelhart D E, Hinton G E, Williams R J . Learning representations by back-propagating errors. Nature, 1986, 323( 6088): 533–536
[44]
Zhang Y, Chen X, Yang Y, Ramamurthy A, Li B, Qi Y, Song L. Efficient probabilistic logic reasoning with graph neural networks. In: Proceedings of the 8th International Conference on Learning Representations. 2020

Acknowledgements

The research in this article was supported by the National Key R&D Program of China (2022YFF0903301), the National Natural Science Foundation of China (Grant Nos. U22B2059, 61976073, 62276083), the Shenzhen Foundational Research Funding (JCYJ20200109113441941), and the Major Key Project of PCL (PCL2021A06).

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

RIGHTS & PERMISSIONS

2025 Higher Education Press