1. Research Center of Complex Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, China
2. Institute of Accounting and Finance, Shanghai University of Finance and Economics, Shanghai 200433, China
liujg004@ustc.edu.cn
Show less
History+
Received
Accepted
Published
2021-10-25
2022-02-14
2022-12-15
Issue Date
Revised Date
2022-05-16
PDF
(2150KB)
Abstract
The identification of spreading influence nodes in social networks, which studies how to detect important individuals in human society, has attracted increasing attention from physical and computer science, social science and economics communities. The identification algorithms of spreading influence nodes can be used to evaluate the spreading influence, describe the node’s position, and identify interaction centralities. This review summarizes the recent progress about the identification algorithms of spreading influence nodes from the viewpoint of social networks, emphasizing the contributions from physical perspectives and approaches, including the microstructure-based algorithms, community structure-based algorithms, macrostructure-based algorithms, and machine learning-based algorithms. We introduce diffusion models and performance evaluation metrics, and outline future challenges of the identification of spreading influence nodes.
Complicated interactions between individuals can be well described by social networks, where nodes represent online users and offline individuals, and edges denote relations between nodes (Yan et al., 2013; Buyalskaya et al., 2021). Therefore, the social network has been studied by many branches of science in solving a wide range of problems, including information diffusion (Watts and Dodds, 2007; Muthukrishna and Schaller, 2020), spread of infectious disease (Jia et al., 2020; Bertozzi et al., 2020), formation of social relationships (Liben-Nowell and Kleinberg, 2007), and identification of online user reputation (Liu et al., 2017b; Dai et al., 2018). One of the key academic questions of social networks is the so-called identification of spreading influence nodes, which aims to find nodes that can maximize the scale of information diffusion. For instance, in online social platforms, such as Weibo, Facebook, and Twitter, a group of users serves as the influencer who can spread information widely and rapidly and arouse widespread concern and discussions of a topic in a short period (Lou and Tang, 2013). Identifying these spreading influence nodes is of importance for a great number of applications (Hou et al., 2014), such as viral marketing (Huang et al., 2019), controlling rumor (Borge-Holthoefer and Moreno, 2012), and fake news verification (Campan et al., 2017). Specifically, knowing the spreading influence of each node, marketing managers can accurately identify which influencers can help to promote their new products to target customers more effectively, thereby maximizing the use of advertising budgets. Official departments will be able to detect the rumor spreading sources and take corresponding measures before the rumor causes huge influence on social order. Online users can judge whether a news is fake by verifying the importance of news sources.
Identifying the spreading influence nodes of social networks is a long-standing challenge in modern social science, which has attracted considerable research effort over the past decades. Measure the influence of each node by conducting real-world experiments in social networks with more than millions of nodes is unfeasible because resources and time are limited. The mainstream ideology of this field is to estimate the spreading influence of nodes based on nodes’ attributes and structural characteristics because it can sharply reduce the costs if the identification algorithms of spreading influence nodes are accurate (Lü et al., 2016; Liu et al., 2021). Existing literature reviews summarized the identification algorithms of spreading influence nodes with different focuses. Liu et al. (2013b) gave an overview of the identification algorithms of spreading influence nodes from the network topology and diffusion models’ viewpoints and systematically analyzed the advantages and disadvantages of different algorithms. Ren and Lü (2013) introduced more than 30 different algorithms before 2014. Lü et al. (2016) presented a survey on the identification algorithms of spreading influence nodes and performance evaluation metrics, and compared the performance of representative methods in different types of networks. Liu et al. (2021) reviewed the algorithms developed on the basis of centralities (Freeman, 1977; 1978), PageRank (Brin and Page, 1998), and Hyperlink-Induced Topic Search (HITS) (Kleinberg, 1999). In addition to algorithms designed for the static network, the temporal network-based method has received increasing attention in recent years (Yang et al., 2018a; Yin et al., 2018; Guo et al., 2019) because it can well describe the dynamic characteristics of social systems. Chen et al. (2020) summarized three different types of algorithms proposed for the temporal network: The network topology-based, the random walk-based, and machine learning-based algorithms. Ren (2020) pointed out the challenges when applying the temporal network-based algorithms in the growing networks, time-varying networks, and perturbed temporal network.
A number of novel algorithms based on new techniques and ideas have emerged in recent years due to the confluence of improved computational capabilities, the explosive growth of new datasets, the increasing trend of interdisciplinary collaboration, and fast-changing demands. The existing algorithms can be classified into four categories in accordance with the type of structural attributes used to design identification algorithms: The microstructure-based (MSB), community structure-based (CSB), macrostructure-based (MASB), and machine learning-based (MLB) algorithms. Specifically, the MSB algorithms are developed to meet the efficient identification of spreading influence nodes in large-scale networks. Recent studies have given more attention to the relations and attributes of high-order neighbors rather than simply aggregating the structure information of nearest neighbors to enhance the accuracy of the MSB algorithms (Dai et al., 2019; Sun et al., 2019). Community structure information, which can help researchers to obtain a more comprehensive understanding of the social network, has been increasingly used to identify spreading influence nodes (Galvão et al., 2010; Ghalmane et al., 2019a; Zhang et al., 2019b). Combining the macro and micro structure information to enhance the generalizability of algorithms has become a new trend (Zareie et al., 2019; Namtirtha et al., 2021). The MLB algorithms have begun to appear in this field and still lack a systematic review (Yu et al., 2020; Fan et al., 2020). To catch up with the recent progress of the identification of spreading influence nodes, we present a review on new developments of the MSB, CSB, MASB, and MLB algorithms. We introduce diffusion models and performance evaluation metrics commonly used in studies of the identification of spreading influence nodes. We attempt to summarize the current challenges of this field. Details, including the methodology, study problems, data, and main findings of representative MSB, CSB, MASB, and MLB algorithms are summarized in Tab.1.
The rest of this review is organized as follows. Section 2 presents the basic definitions of the social network and the description of the identification problem of spreading influence nodes. Section 3 introduces the MSB algorithms. Section 4 summarizes the CSB algorithms. Sections 5 and 6 discuss the MASB algorithms and the MLB algorithms, respectively. Sections 7 and 8 describe the diffusion models and performance evaluation metrics, respectively. Section 9 summarizes the future study trends and unsolved problems of this field.
2 Related definition and problem description
Let be an unweighted network consisting of nodes and edges, where and denote the set of nodes and the set of edges, respectively. In social networks, online users or offline individuals can be regarded as nodes, and edges describe the relations between these nodes. Considering whether the edge has weights and directions, social networks can be classified into the undirected and weighted network, the undirected and unweighted network, the directed and unweighted network, and the directed and weighted network, as shown in Fig.1. The social network can also be represented by its adjacency matrix , where if node is connected to node , otherwise.
The identification task of spreading influence nodes in social networks is to find nodes that can cause a great influence on the structure of social networks or maximize the scale and speed of information spreading. Specifically, the identification of spreading influence nodes can be further divided into the task of node ranking and the task of influence maximization. The node ranking task refers to ranking nodes in descending order in accordance with their spreading influence scores obtained by applying an evaluation function of spreading influence nodes. The influence maximization problem aims to find a set of seed nodes with a fixed-size to achieve the maximum influence.
3 MSB algorithms
In the era of big data, social networks, such as Weibo, are characterized by large scale and intricate connections between users. For such networks, directly using the structure information of the entire network to identify spreading influence nodes will be costly and inefficient. To develop efficient identification algorithms that can be applied to large-scale social networks, an increasing number of researchers attempt to identify spreading influence nodes by considering only the micro-level structural information. Hu et al. (2018) proved the feasibility of identifying the spreading influence nodes via the micro-level structure information on the basis of percolation theory (Dorogovtsev et al., 2008). They found the nucleation behavior of the spreading process, that is, if the number of individuals influenced by spreading sources is larger than a small characteristic number, then the information will rapidly reach the percolation cluster regardless of the global structure of the network, and it will be contained within a local area otherwise. Over the past decades, a great number of the MSB algorithms that can achieve relatively high accuracy while keeping low complexity have been proposed (Liu et al., 2017a; Bao et al., 2017).
The simplest MSB algorithm is the degree centrality (DC) (Freeman, 1978), which defines the number of a node’s first-order neighbors as its spreading influence, which is given as
where denotes node ’s degree, and is the total number of nodes in the network. In common sense, information shared by online users may influence their followers and indirectly affect followers’ friends. Having users with the same number of followers on online social platforms is common. In such cases, only considering the number of directly connected nodes may be extremely naive. An example is shown in Fig.2 that although the degree centralities of nodes 1 and 2 are the same, the difference in the number of second-order neighbors will be ignored by the DC method.
In spite of the above limitations, many MSB algorithms borrowed the idea of the DC method because it requires the least information. The local centrality (LC) (Chen et al., 2012) improved the performance of the DC method by simply aggregating the degree of high-order neighbors. However, for large-scale social networks, the more inclusion of information of higher-order neighbors, the better the performance of the identification algorithm is not always the case. Therefore, deciding how many steps of neighbor nodes to consider is a key point in balancing the accuracy and computational complexity of identification algorithms. Liu et al. (2016b) compared the changes in accuracy and complexity of their proposed algorithm called neighborhood centrality (NC) when using the structural information of different order neighbors. They found that once the structure information of more than three-order neighbors are used, the accuracy will not obtain remarkably improvement while the complexity increases intensely. The NC algorithm is given as
where denotes a benchmark identification algorithm of spreading influence nodes, is the set of first-order neighbors of node , is a free parameter, and represents the neighbor’s order. As shown in Eq. (2), the NC algorithm assuming a node’s spreading influence will be largely affected by neighbors close to it and will be slightly influenced by distant neighbors. In other words, Liu et al. (2016b) thought that the neighbors’ contribution to the node’s spreading influence is related to distances from the neighbor nodes to the target node.
The local neighbor contribution (LNC) algorithm (Dai et al., 2019) considers the node’s self-influence and its neighbors’ contribution to the spreading influence, which is given as
where is the sum of neighbors’ degree of node and denotes number of first- and second-order neighbors of node . represents node ’s contribution to the spreading influence of node . represents the combination in mathematics.
Except for the difference in neighbors’ contributions on the spreading influence of the node, two connected nodes, and , have different influence on each other in directed networks. Yu et al. (2019) thought that this difference should be considered in undirected networks and defined the spreading strength (SS) of node on node , that is, , as the combination of the direct influence of node on node and the indirect influence of node ’s neighbors on node , which is given as
where denotes the number of node ’s neighbors that are not neighbors of node , represents the number of paths between and with length 2, and is a free parameter.
The algorithms based on the degree information of high-order neighbors, that is, high-order-neighbor-degree-based algorithms, are efficient and easy to understand. However, in addition to the degree information of neighbor nodes, the topological connection of a node’s neighbors, which indicates the potential for the node to spread information to other parts of networks, plays an important role in measuring its spreading influence (Soffer and Vázquez, 2005; Ren et al., 2013b). The local clustering coefficient is used to design the MSB algorithms for quantifying the interactions between neighbor nodes. Mathematically, the local clustering coefficient in undirected networks is defined as
Soffer and Vázquez (2005) uncovered the negative correlation between the clustering coefficient and degree value in undirected networks. On this basis, Chen et al. (2013) further discovered that nodes with small clustering coefficient are likely to connect with more nodes in the future and proposed ClusterRank (CR) method, represented by , which is defined as
where represents the effect of node ’s local clustering coefficient on its spreading influence. Specifically, is a decreasing function concerning the local clustering coefficient because a higher local clustering coefficient of a node indicates that its neighbors interact with each other closer than with nodes in other parts of networks, resulting in information shared by the node being easily contained only in a local area. In addition to the local clustering coefficient of first-order neighbors, Gao et al. (2014) proposed local structure centrality (LSC) by further considering the positive effect of the local clustering coefficient of a node’s second-order neighbors on its spreading influence, which is given as
where is the total number of a node’s first- and second-order neighbors, and is a free parameter that can be adjusted in accordance with structural attributes of networks to guarantee a stable performance. However, the adjustment of the free parameter leads to high extra computational complexity. Berahmand et al. (2018) presented a parameter-free centrality algorithm that considers the degree and the local clustering coefficients of a node’s first- and second-order neighbors, which is defined as
Inspired by the positive effect of the local clustering coefficient of second-order neighbors on the spreading influence of the node, Yang et al. (2020) introduced entropy technology to calculate the weights assigned to degree and local clustering coefficient, and proposed a novel centrality (DCC) algorithm, which is given as
where accounts for the effect of degree and neighbors’ degree of node , measures the effect of first- and second-order neighbors’ local clustering coefficients, and .
The introduction of the local clustering coefficient enables the identification algorithms to have a higher resolution in distinguishing the spreading influence of nodes with the same number of neighbors, and local clustering coefficient-based algorithms can identify nodes located in the dense parts of networks. However, nodes located in locally dense but globally peripheral regions of the network are easily misclassified as spreading influence nodes because the local-clustering-coefficient-based algorithms give less attention to the position of nodes. The main challenge of designing an effective MSB algorithm for identifying multiple spreading influence nodes is how to reduce the overlapping spreading influence of selected nodes. Taking DC algorithm as an example, social networks often have the heterogeneous property, that is, nodes with a small degree are more likely to connect with nodes with a large degree. Seed nodes selected by the DC method are easy to gather within a local area of social networks (Barabási and Bonabeau, 2003; Zhou et al., 2018). One of the main ideas to alleviate this problem is to avoid selecting seed nodes with similar topological attributes within a local area to ensure that they are distributed in different parts of the network. Sheikhahmadi et al. (2015) considered the distance and the number of common neighbors between seed nodes to ensure that selected nodes are evenly distributed in the network. However, calculating the distances between each pair of nodes in large-scale networks is time consuming. Instead of using distance, Liu et al. (2017a) proposed local structure similarity (LSS), which sets the structure similarity between seed nodes as a constraint when identifying seed nodes. Specifically, LSS first selects the node with the largest degree as the initial seed and then find other seeds in an iterative manner. In each iteration, the next seed node will be chosen from the first- and second-order common neighbors of all selected seed nodes in accordance with the structure similarity score , which is given as
where denotes the set of first-order neighbors of node , is the first-order neighbors of node if node is connected to node , and is the set of first- and second-order neighbors of node if node is the second-order neighbor of node . If the structure similarity scores between the target node and all seed nodes are smaller than a given threshold , then the target node will be selected as the next seed node. To ensure that seed nodes are detected from different parts of the network, Bao et al. (2017) borrowed the idea of -means clustering (Macqueen, 1967) and presented heuristic clustering (HC) to reduce the overlapping spreading influence. Initially, centers of clusters are randomly chosen. Noncenter nodes are classified into these clusters in accordance with the local path similarity (Zhou et al., 2009) between them and center nodes, which is defined as
where denotes the set of nodes in cluster , W is the similarity matrix, is a free parameter, and is used to update the cluster center, which represents the significant of node in cluster . The node classifying step and center updating step are repeated until the steady-state is reached. The center of each cluster is selected as the seed node.
Similarity-based algorithms ensure that those seed nodes will not gather in a local region of networks by setting the number of common neighbors or distance between seed nodes as the constraint. However, the performance of this type of algorithms is sensitive to the selection of the initial seed node. For instance, the LSS algorithm only considers the degree of initial seed node, and structural attributes, such as the local clustering coefficient and the node’s position, which are proven to have a significant influence on the spreading influence of nodes, are ignored.
Inspired by the voting process, Zhang et al. (2016) proposed the VoteRank algorithm. In the initial phase, all nodes will be assigned the same voting ability and voting score . Nodes will then start to vote for their neighbors in an iterative manner. In each voting round, node ’s voting score will be updated by using the following equation
The node with the highest voting score of each round will be selected as the seed node, and its voting score will be reset to 0 in the next round. The voting abilities of the node’s neighbors will be decreased in the next round to avoid the overlapping influence problem. The VoteRank algorithm initially sets the voting abilities of all nodes to 1, which implicitly assumes that neighbors are the same as the target node, thereby leading to low resolution. To solve this problem, inspired by social conformity theory and community structure of networks, Zhang et al. (2019b) improved VoteRank from the viewpoints of the individual and the group. In common sense, attractiveness between individuals are different, which is quantified by the node’s in-degree and out-degree, that is, attractive power (AP), to measure the individual-level voting ability of the node, which is given as
where and denote the set of in- and out-neighbors of node in the directed network, respectively. The group-level voting ability, that is, initiating power (IP), is measured by the size of the community that the node belongs, which is defined as
where represents the size of the community to which node belongs. The voting score of each node is calculated as follows
Zhang et al. (2019b) presented node selection strategies from individual and group perspectives to reduce the overlapping spreading influence. From the viewpoint of the individual, the node will be removed from the network once the node is selected as the seed node. From the group viewpoint, the candidate will not be selected when the community to which the candidate node belongs is strongly connected with the communities that the seed nodes belong to. Kumar and Panda (2020) introduced the neighborhood coreness algorithm (NCRank) to enhance the resolution of the VoteRank algorithm. Considering the -shell values of neighbors, the voting score is given as
where represents the neighborhood coreness of node , and denotes a free parameter. Guo et al. (2020) proposed the EnRenew algorithm by considering the difference between nodes when decreasing their influence on the basis of information entropy, which is defined as
After the node is selected, the voting scores of its -order neighbors will be decreased by using following equation to avoid the overlapping spreading influence
where denotes the -length reachable nodes of node , and is the average degree of the network. Sun et al. (2019) extended the use of VoteRank in weighted networks by considering the edge weight and proposed the WvoteRank algorithm. The voting score of the node in weighted networks is defined as
where is the weight of edge .
The VoteRank-based algorithms are efficient because no distance calculation is included. However, the initial state of each node and the spreading influence decreasing strategy will cause a large influence on the performance of the VoteRank-based algorithms.
The spreading influence of nodes depend on their structural attributes and the spreading mechanism. On the basis of the spreading mechanism of the independent cascading (IC) model, the degree discount (DD) algorithm (Chen et al., 2009) selects a set of seed nodes by discounting a node’s degree in accordance with the number of seeds in its neighborhood to alleviate the overlapping spreading influence between seed nodes. Chen et al. (2019) assumed that if the information propagated by a node can easily influence its high-order neighbors, this node will be considered to have more potential to initiate large-scale propagation. Considering the diffusion process and the sum of probabilities that high-order neighbors being influenced by the target node, the spreading influence of nodes is quantified by using the following equation
where represents the probability of the -order neighbor being influenced by node , which is defined as
where represents the probability that node is not infected by nodes belong to , denotes the infection rate, and .
The diffusion model-based algorithms consider the spreading dynamics when evaluating the spreading influence of nodes, which is more in line with reality. However, the diffusion mechanism varies with the spreading events, thereby restricting the applications of this type of algorithms in different scenarios.
The MSB algorithms have offered solutions for identifying spreading influence nodes in large-scale social networks. Specifically, the MSB algorithms can be further classified into five main method streams based on the idea used to design identification algorithms: The high-order-neighbor-degree-based, the local-clustering-coefficient-based, the similarity-based, the VoteRank-based, and the diffusion-model-based methods. The advantages and disadvantages of representative algorithms and the five main method streams mentioned in this section are listed in Tab.2 and Tab.3. Although the MSB algorithms have achieved promising performance, several challenges still need to be addressed in the future.
For the node ranking problem, the high-order-neighbor-degree-based algorithms are efficient because they mainly focus on the degree values of nodes. However, the topological connections between neighbors did not receive sufficient attention. The local-clustering-coefficient-based algorithm improved this shortcoming by considering the relations of a node’s first- and second-order neighbors. The two types of MSB algorithms implicitly assume that spreading influence nodes are nodes located in densely connected regions of networks, which might lead to identification algorithms performing poorly in networks with a high edge density because the community structure and macro-level information are ignored. Therefore, improving the performance of MSB algorithms in densely connected networks while keeping low computational complexity can be a future direction worthy of attention.
For the influence maximization problem, the similarity-based algorithms try to reduce overlapping spreading influence by suppressing the similarity between seed nodes, which depend on the selection of initial seed nodes and similarity measurements. Researchers are encouraged to modify the initial seed node selection strategies and similarity indices to develop more effective algorithms. The VoteRank-based algorithms select seed nodes by considering neighbors’ contributions in the spreading influence of nodes in an iterative manner. How to quantify the difference in neighbors’ contributions is still worthy of study. Although diffusion-model-based algorithms consider the topological attributes of nodes and spreading dynamics when choosing seed nodes, they depend on a specific diffusion model. Whether this type of algorithms can work well when using different diffusion models needs to be further explored.
4 CSB algorithms
Community structure is a ubiquitous characteristic of social networks (Girvan and Newman, 2002; Yang et al., 2018b; Dong et al., 2021) because individuals tend to organize as groups based on their interests, occupations, social status, and other attributes. Each community can be viewed as a subnetwork in social networks, where nodes in the same subnetwork are densely connected, and edges between subnetworks are relatively sparse. Exploring the community structure can help researchers obtain an in-depth understanding of the network structure and the mechanism of the spreading process. For example, information transmission between communities requires the participation of individuals serving as the bridge. These individuals often have strong control over the information flow in common sense. In terms of this idea, Zhao et al. (2014) combined the centralities with index of the number of communities a node is connected with to identify the spreading influence nodes and discovered that this strategy can help detect spreading influence nodes that single centrality may ignore. However, the community structure of a network may change when applying different community detection algorithms (Palla et al., 2005; Newman, 2006; Pan et al., 2010; Tang et al., 2016), leading to this strategy being unstable. To alleviate this shortcoming, Zhao et al. (2015) considered the size of the community and the distribution of neighbors to reduce the dependence on community detection algorithms (Cantwell and Newman, 2019) and proposed community-based centrality (CbC), which is defined as
where denotes the number of node ’s neighbors in community , is the total number of communities, and is the size of community . Tulu et al. (2018) introduced the Shannon entropy to measure the spreading influence of nodes and proposed the community-based mediator (CbM), which defines the relations between the target node and nodes in the same community it belongs and nodes in other communities as its spreading influence, which is given as
where denotes the internal entropy and external edge density of node , and is the community. and represent the external and internal edge densities of node , respectively. Zhao et al. (2020c) improved the accuracy of closeness centrality (CC) (Sabidussi, 1966) by introducing the community structure information of the node and its neighbors. The mathematical formulation of the improved CC (ICC) is given as
where denotes the CC of node , and is the set of communities the node ’s neighbors are connected to, except the community containing node .
From the perspective of network division based on the community structure of networks, Ghalmane et al. (2019b) proposed a CSB spreading influence identification framework called modular centrality (MC). Specifically, the whole process is as follows: 1) Construct the local network and the global network by removing edges between communities and within each community; 2) Calculate the spreading influence of nodes in the local network and the global network by using the selected node spreading influence identification algorithm; and 3) The final spreading influence of a node is the sum of its spreading influence on local and global networks. They found that identifying spreading influence nodes in the local network will be more accurate than in the global network when a clear community structure is found in the original network, and the accuracy will be higher in the global network otherwise. In the real world, one node can belong to multiple communities, indicating that networks may have an overlapping community (OC) structure. In such a case, the local and global networks constructed by the proposed framework will be inappropriate. Therefore, Ghalmane et al. (2019a) modified the network construction rule, which is shown in Fig.3.
To detect the spreading influence nodes in networks with the OC structure, Wei et al. (2018) used the BigCLAM model (Yang and Leskovec, 2013) to identify the OC structure of the network. The seed nodes are found in accordance with network constraint coefficient and the number of communities the first- and second-order neighbors are connected with. The mathematical formulation of the proposed method is defined as
where denotes the network constraint coefficient of node , and is the number of communities node is connected with.
The CSB node ranking algorithms have shown advantages of using the community-level information to identify spreading influence nodes. However, the computational complexity of identification algorithms will increase because more community structural attributes are considered. Therefore, filtering significant attributes is a key step for designing an effective CSB algorithm. The use of the community structure information by splitting networks into subnetworks provides a novel perspective for designing CSB algorithms. However, the community structure of networks depends on community detection algorithms because the community structure of most real-world networks is unknown, which may affect the performance of this type of algorithm.
In addition to helping in analyzing network structure, community structure information can be used to reduce the time complexity when dealing with the influence maximization task. On the basis of submodular property of the influence spread (Galstyan and Cohen, 2007), Halappanavar et al. (2016) found that the overall influence of seed nodes selected from each community independently approximates to the influence of seed nodes identified by using the global structure information. Inspired by the community structure of social networks, Wang et al. (2010) developed a community-based greedy algorithm (CGA) based on the IC model (Kempe et al., 2003), which is more efficient than the MixGreedy algorithm (Kempe et al., 2003). Specifically, the CGA algorithm uses dynamic programming to determine which community can bring the largest importance gain in each step to narrow the searching space from the network level to the community level. The largest importance gain that community , , can bring is defined as
where denotes the set of nodes with the size , and represents the overall spreading influence of . On the basis of Eq. (31), the community selection strategy is defined as
where is the largest importance gain that community , , can bring, and denotes the importance gain of selecting the th seed node from previous communities. As shown in Eq. (32), CGA will attempt to detect the th spreading influence node in community if the importance gain of finding the th seed node in community is larger than in previous communities. Although this strategy has relatively low complexity, it cannot guarantee high accuracy. On the basis of spreading dynamics, Shang et al. (2017) divided the whole process into two phases: 1) The set of seed nodes influences their nearest neighbors in ; and 2) influences nonseed nodes in each community. In the first phase, the probability of nodes in being influenced by seeds is defined as
where denotes the probability of node being influenced by seed node . In the second phase, the influence of is calculated as follows
where represents the probability of node in community being influenced. The spreading influence of the set of seeds is defined on the basis of the weighted cascading model (Guo et al., 2020)
where denotes the set of neighbors of , and is a free parameter. Determining a reasonable probability of nonseed nodes being influenced by to ensure a high accuracy will be time consuming.
The CSB algorithms based on the spreading mechanism can identify seed nodes more comprehensively. The main limitation of this type of algorithm is similar to the diffusion model-based algorithms mentioned in the MSB algorithm section. Specifically, the performance of this type of algorithms might be influenced once the diffusion mechanism changes.
Intuitively, communities’ contributions to information diffusion are different. For example, compared with an independent small-scale community, a large-scale community with high inner edge density and strong relations with other communities is more likely to spread information on a large scale. On the basis of this intuition, Chen et al. (2014) used the community size as a criterion to first filter a set of significant communities and then identify seed nodes in each significant community by comprehensively considering the degree and the similarity of neighbor nodes, and whether the node is a hub. Although this strategy can achieve high accuracy while keeping low complexity, the information provided by the community size is limited and fails to reflect the relations between nodes in different communities. Wang et al. (2018) suggested that the spreading influence of a community can be measured from two aspects, which are its internal and external spreading influence, and the proposed community-hole index (CHR) considers the community spreading influence and the node’s position. Specifically, the spreading influence of community , that is, , is defined as
where and denote two free parameters. and represent the edge density of the community containing and the influence of the community structure on community importance, respectively. The position of a node is quantified by using the following equation
where and denote the set of neighbors that belongs and does not belong to the community containing node , respectively. The final spreading influence of a node is calculated by combining the community-level spreading influence and its position, which is given as
From the viewpoint of the seed node selection process, Qiu et al. (2019) divided the process into three steps: 1) Find candidates on the basis of the number of communities the node is connected to and its degree; 2) Further filter candidates by considering the number and the average size of communities the node is connected to, and the node’s degree; and 3) Select seed nodes from candidates based on the greedy algorithm and the IC model. Specifically, in the first phase, the core node set and the periphery node set are identified in each community independently. In the second phase, candidates will be selected in accordance with the following equation
where is the influence of community , and denotes the average size of communities node is connected to. After the searching space is narrowed, the greedy algorithm will identify seed nodes from the set of candidates.
The CSB algorithms have shown their potential in increasing the accuracy of node ranking and accelerating the speed of seed node selection. The advantages and disadvantages of representative CSB algorithms and the main method streams mentioned in this section are summarized in Tab.4 and Tab.5. However, several challenges still need to be addressed.
For the node ranking problem, the community-structural-attribute-based algorithms directly exploit the community-level information when identifying spreading influence nodes. However, the more consideration of the community-level attributes, the better the performance of the algorithms is not always the case because the efficiency of the identification algorithms is important. Thus, how to balance the accuracy and efficiency of the community-structural based algorithm needs to be further explored.
The CSB node ranking and influence maximization algorithms do rely on community detection algorithms because the community structures of most real-world networks are unknown. The community structure identified by different community detection algorithms may be different even for the same network. Thus, how to reduce the dependence of CSB algorithms on community structure detection algorithms is still a challenge in the future. The community structural attributes will be considered when designing the CSB algorithms. However, few studies focus on the relationship between node spreading influence and community structural attributes. Most real-world networks display an overlapping community structure, but most existing CSB algorithms are developed on the basis of the nonoverlapping community structure. Therefore, another challenge is extending these algorithms’ application to networks with the overlapping community structure.
5 MASB algorithms
Regarding the high time complexity, the MASB algorithms, identifying the spreading influence nodes based on the network’s global structure information, perform well in densely connected networks (Namtirtha et al., 2018). The commonly used MASB centralities include betweenness centrality (BC) (Freeman, 1977), closeness centrality (CC) (Sabidussi, 1966), eigenvector centrality (EC) (Bonacich, 1972), and -shell decomposition (Kitsak et al., 2010). BC defines nodes serving as the role of the bridge of two disconnected groups as spreading influence nodes, which is given as
where denotes the number of shortest paths between node and node , and represents the number of shortest paths between node and node that pass node .
CC defines nodes located in the center of the network as spreading influence nodes. Specifically, the location of a node is quantified by the average distance between it and the rest, which is defined as
where is the length of the shortest path between node and node .
EC measures the spreading influence of a node by the spreading influence of its neighbors, which is defined as
where means there is an edge between nodes and , otherwise. denotes a proportional parameter, and is an eigenvector where each element corresponds to each node’s spreading influence. A convergence state can be achieved by updating iteratively. PageRank is one of the most well-known algorithms developed on the basis of EC and is designed to rank the searching results returned by the Google search engine (Brin and Page, 1998). The original version of PageRank can only work well in strongly connected networks. A return probability was added to overcome this limitation. However, determining a proper return probability requires many tests, which is less efficient when used in social networks. Lü et al. (2011) proposed a parameter-free version of PageRank called LeaderRank (LR), which can converge faster than PageRank. Specifically, in the initialization phase, LeaderRank will first add a ground node interconnected with all existing nodes to make the network strongly connected. The LR values of all nodes, except the ground node, will be set to 1. After the initialization step, the LR value of each node will be updated iteratively by using the following equation
When the steady-state is reached, the LR value of the ground node will be evenly allocated to other nodes, and the final LR value of each node corresponds to its spreading influence. Li et al. (2014) suggested the LR value of the ground node should not be evenly assigned. Nodes with larger in-degree should obtain more LR value from the ground node. For example, in social networks, the larger the in-degree of users is, the more followers they have, which can reflect users’ popularity. The improved update rule by considering the in-degree of the node is defined as
where for any node and the ground node , and denotes a free parameter. For any node pair and , when , otherwise.
The main idea of iteration-based algorithms is to aggregate high-order neighbors’ structural information in an iterative manner, which is relatively more effective than algorithms that directly exploit the distance information. However, neighbors’ contributions to the spreading influence of the target node are mainly quantified by degree values. Measuring the importance of each neighbor node in a more comprehensive manner might further improve the performance of iterative-based algorithms.
Liu et al. (2016a) presented the dynamic-sensitive (DS) algorithm by simulating the discrete susceptible-infected-recovered (SIR) model to measure the spreading influence of nodes. Specifically, the DS algorithm assumes that the activated node has the probability to activate inactive nodes, and the activated node has the probability to return to the inactive state. The probability that a node will be activated at time is as follows
where denotes the cumulative probability of the node being activated from time 1 to time , and I is the identity matrix. is further approximated as
Let when node is the initial activated node, and the spreading power of node can be calculated by
where . DS considers the network topology and the spreading dynamics.
As mentioned above, the MSB algorithms give less attention to the macro-level position of the node. However, Kitsak et al. (2010) found that a node’s position in the network can reflect its spreading influence more accurately than its degree and proposed -shell decomposition, which splits nodes into different layers by recursively removing nodes in accordance with their degree. A simple example of finding nodes belonging to shell layer 1 by -shell decomposition is shown in Fig.4. Specifically, the whole process is as follows: 1) Removing nodes with degree value 1 in the original network (Fig.4(a)), and a new network can be obtained (Fig.4(b)); 2) Continue to remove nodes with degree value 1 (Fig.4(b)); and 3) Repeat step 2 until all remaining nodes’ degree is larger than 1 (Fig.4(c)).
Although the -shell decomposition can identify nodes located in the core–shell of networks, the spreading influence of nodes in the same shell cannot be distinguished. In recent years, considerable research efforts have been made to improve this limitation. Specifically, the proposed ideas can be mainly classified into distance-based, degree-based, removing-order-based and edge-diversity-based (Zeng and Zhang, 2013; Ren et al., 2013a; Maji et al., 2020). Liu et al. (2013a) alleviated this issue by considering the length of shortest paths between the target node and core nodes, which is given as
where denotes the set of nodes in the core–shell layer and represents the set of nodes in the th shell layer. Instead of using the distance, Ma et al. (2014) borrowed the idea of “the rich get richer” and designed an algorithm that combined resource allocation dynamic and -shell decomposition to measure the spreading influence of nodes. In the initialization phase, every node will be assigned the same resources. Then each node will iteratively allocate its resources to neighbor nodes according to their -shell values. The number of resources allocated by node to node at time is defined as
where denotes the -shell value of node and denotes resources that node has at time . This iteration process will stop once the resource gain is lower than a given threshold. The following equation calculates the total resource value of node at time
By modifying the pruning rule of the -shell decomposition algorithm to be more fine-grained, Liu et al. (2015c) presented an improved -shell (IKs) algorithm. Specifically, the rule is changed to remove nodes with the smallest degree value in each iteration and assign the IKs value to each node. Based on the IKs algorithm, Zareie et al. (2019) proposed an algorithm that measures the spreading influence of nodes by considering the diversity of its neighbor nodes, and the spreading and the intensity of influence. In order to calculate the diversity of node ’s neighbor nodes, the Shannon entropy, , is introduced, which is defined as
where denotes the sum of IKs values of node ’s neighbors. The spreading and intensity of the node are quantified by Jensen-Shannon Divergence (JSD) (Lin, 1991), namely
where denotes the distribution of IKs values of node ’s neighbors. The diversity-strength centrality, , of node is calculated as follows
Although many nodes will be regarded as the same by -shell decomposition algorithm, these nodes may have different DC values. Bae and Kim (2014) used degree and -shell values of nodes’ neighbors to distinguish differences in the spreading influence of nodes located in the same shell layer and proposed neighborhood coreness ( ), which is defined as
Focusing on nodes with the largest -shell value, Lin et al. (2014) used the sum of neighbors’ -shell value to identify the most influential node of the networks, and proposed improved neighbors’ -core (INK) algorithm, which is given as
where is a tunable parameter.
In addition to the difference in degree value of nodes in the same layer, the order in which nodes located in the same shell are removed is also different. Wang et al. (2016) suggested that iteration factors of the -shell decomposition can be used to distinguish the spreading influence of nodes in the same shell layer. Compared with these early removed nodes, later removed nodes are closer to the core layer of the network and thus have stronger spreading influence. The mathematical formulation is defined as
where denotes the iteration round that node being removed and represents the total number of iteration rounds. Based on this, the influence capability of node , , is defined as
According to the order in which nodes are removed and nodes’ -shell values, Li et al. (2018) divided neighbors of each node into four categories according to the order in which nodes are removed and nodes’ -shell value: 1) The upper class contains the set of neighbors that have greater -shell values than the target node; 2) The equal upper class contains neighbors with the same -shell value of the target node, and the deletion order is the same or later than the target node; 3) The equal lower class represents the set of neighbor nodes that have the same -shell value as the target node, and the deletion order is the same as the deletion order of the target node or before the target node; and 4) The lower class contains neighbor nodes with smaller -shell value than the target node. Based on the numbers of different types of neighbors, the spreading capability of node , , is quantified as follows
where , , and are free parameters, and , , , and denote the number of neighbors belonging to upper, equal upper, equal lower, and lower class, respectively.
Except for the low resolution, the -shell decomposition algorithm cannot always guarantee a good performance across different types of networks. Liu et al. (2015a) found that nodes located in the core–shell layer can be further divided into true core and core-like groups in accordance with the edge diversity between these core nodes and nodes in other shell layers. Specifically, nodes in the true core group refer to nodes located in the core–shell layer and have a stronger spreading influence than nodes in other shell layers, and the node belongs to a core-like group otherwise. The edge density is used to judge whether core nodes belong to the true core group, which is defined as
where denotes the total number of shell layers of the network and represents the average edge strength of the th shell to the th shell. Inspired by this, Liu et al. (2015) presented a strategy to improve the accuracy of -shell decomposition, that is, removing redundant edges to better identify nodes belonging to the true core group by adding a weight to each edge and setting a threshold value before applying -shell decomposition. Specifically, the redundant edge refers to the edge that has relatively low spreading influence but may lead to form a core-like group, and the weight of each edge is related to the number of common neighbors between two connected nodes. The fewer the common neighbors are, the larger the weight is. Namtirtha et al. (2021) suggested that MSB and MASB algorithms have their advantages in networks with different connectivity strengths and discovered that -shell decomposition performs well in networks with strong connectivity and neighbor degree centrality is suitable for sparse networks. On this basis, network global structure-based centrality (NGSC) that combines -shell decomposition and neighbor degree centrality was proposed, which is defined as
where and are two free parameters that can be modified in accordance with the network’s connectivity strength to ensure that NGSC can obtain a stable performance in different types of networks. However, tuning parameters is a time-consuming procedure. To solve this issue, Maji (2020) has designed a parameter-free version of NGSC, that is
where . Ma et al. (2020) proposed an algorithm that simultaneously measures the spreading influence of the node from the viewpoints of its local and global spreading influence. The entropy of -shell values is introduced to quantify the global spreading influence of nodes, which is defined as
where denotes the set of -shell values of node ’s neighbors, and represents the number of nodes in the th shell layer. The local spreading influence of a node is measured on the basis of the assumption that the higher the similarity between the target node and its neighbors is, the higher the belonging of its neighbors to the target node is. The similarity between the target node and its neighbors is defined as
On the basis of the global and local importance, the final importance of node , that is, , is calculated as follows
where .
Although the combination of micro and macro structural attributes has significantly improved the generalizability of the identification algorithm of spreading influence nodes, the weight assigned to each attribute needs to be defined manually, limiting the introduction of more structural information.
Structural hole theory (Burt et al., 2013) suggested that nodes with strong control over the information flow in the entire network are spreading influence nodes. As shown in Fig.5, a structural hole node 1 exists in Fig.5(a) compared with three interconnected nodes in Fig.5(b). In Fig.5(a), if node 2 would like to transmit information to node 3 or node 4, then it requires the participation of node 1. If the edge between nodes 2 and 3 is added, the control ability of node 1 will decrease. The network constraint coefficient is used to decide whether the node is a structural hole, which is given as
However, the network constraint coefficient only considers the relations between the node and its first-order neighbors, which may lead to low resolution. Su and Song (2015) improved the network constraint coefficient by further considering the interactions between second-order neighbors, which is defined as
where denotes the sum of the degree of ’s neighbor nodes. Zhang et al. (2019a) used the improved version of network constraint coefficient (Su and Song, 2015) and network connectivity to measure the local and global importance of the node and proposed CumulativeRank. Specifically, the local spreading influence of node , denoted by , is defined as
The global importance of CumulativeRank of node , denoted by , is defined as the influence caused by removing the node on the network structure
where denotes the cost of removing node , represents the size of the largest connected component, and is the number of connected components after node is removed. The final spreading influence of the node is calculated as follows
where denotes the normalized global importance. Ullah et al. (2021) proposed a global structure model (GSM) by combining the self-importance and global spreading influence of the node, which is defined as
where the node’s self-importance is measured by its -shell value, and the global importance of the node is quantified by the distances between the node and other nodes in the network and -shell values. Inspired by the gravity law, Ma et al. (2016) first introduced the gravity formulation to measure the importance of nodes and proposed gravity centrality (GC), which sets the -shell value as the mass and the length of the shortest path as the distance. The mathematical formulation of GC is defined as
Li et al. (2019) thought that nodes with larger degree values are more influential than nodes with smaller degree values. The gravity model (GM) was proposed by considering the distance between a node and all other nodes and replacing the -shell values as degree values, which is given as
The GM algorithm considers all the paths between the target node and other nodes, thereby resulting in high computational complexity. To this end, Li et al. (2019) designed a local version of GM algorithm (LGM) by only considering the interactions between nodes within a local area, which is defined as
where is the truncation radius. Yan et al. (2020) considered multilevel structural attributes when evaluating the spreading influence of nodes. On the basis of GC, the mass is defined as the weighted sum of different centralities, where weights are calculated by entropy technology. In the real-world diffusion process, the distance from node to node is not necessarily the same as the distance from node to node . However, the Euclidean distance used in the GC assumes that these distances are equal, thereby leading to the dynamic information being ignored. Shang et al. (2021) introduced the effective distance (Brockmann and Helbing, 2013) to capture the hidden dynamic information rather than modifying the mass part of the GC, which is given as
where denotes the effective distance from node to node , which is defined as
If multiple paths are found between nodes and , the shortest path will be used.
The MASB algorithms can accurately identify spreading influence nodes in densely connected networks because they focus on the macro-level structural attributes of networks, such as the position and the distance. However, in the era of the explosive growth of information and data, researchers are often faced with large-scale networks, making the use of MASB algorithms to be more limited. Specifically, the challenges of the MASB algorithms are as follows.
For the node ranking problem, the -shell-based algorithms pay more attention to the difference in the spreading influence of nodes located in the same shell. Studies have shown that a group of core-like nodes exists, which have a high coreness, but they are not the spreading influence nodes. Identifying these core-like nodes can help filter the spreading influence nodes more accurately. The GM has received increasing attention from researchers in the identification of spreading influence nodes due to its great potential in measuring the spreading influence of nodes. However, the distance calculation is time consuming. Thus, improving the effectiveness of gravity-model-based algorithms is still a challenge.
Researchers have developed spreading influence identification algorithms with high generalizability by considering the micro and macro structural attributes simultaneously. However, the weights assigned to each attribute need to be predefined when using this type of algorithms, which is time consuming. How to solve this limitation is also important in the future.
The advantages and disadvantages of representative MASB algorithms and the main method streams mentioned in this section are presented in Tab.6 and Tab.7.
6 MLB algorithms
Driven by growing demands of using graph form datasets to solve real-world problems (Peng et al., 2019) and the increasing trend of interdisciplinary cooperation, the integration of machine learning and network science has received increasing attention from researchers in the two fields (Belkin and Niyogi, 2003). Studies of network science have built a solid foundation for researchers to better use graph form datasets (Peng et al., 2019). Machine learning models can dig out more network topological information, thereby providing strong support for studies of network science (Silva and Zhao, 2012). In recent years, the MLB algorithms have begun to appear in many branches of network science, including node classification (Hall et al., 2009), link prediction (Zhang and Chen, 2018; Chen et al., 2021a), and network statistical feature extraction (Sacchet et al., 2014). As one of the core issues in the study of social networks, machine learning models have been introduced in the research on the identification of spreading influence nodes. A number of the MLB algorithms that can achieve promising performance have been developed in recent years. The MLB algorithms can be mainly divided into two categories: The statistical machine learning-based (SMLB) algorithms and deep learning-based (DLB) algorithms.
Unlike the three different types of algorithms mentioned above, the MLB algorithm aims to train a model by a given dataset so that it can be used to predict the spreading influence of nodes or to judge whether a node is important in unseen networks. The statistical machine learning models, such as the support vector machine (SVM), decision tree, linear regression, and logistic regression, can consider multiple structural attributes at once when identifying spreading influence nodes. However, the models’ accuracy will not be necessarily improved as more attributes are used. Therefore, feature selection is a key procedure to ensure stable performance of the SMLB algorithms. Bucur (2020) found that combining two complementary centralities to identify spreading influence nodes can achieve higher accuracy than using a single centrality. Hu et al. (2019) used principal component analysis (Moore, 1981) to test different centralities’ contributions, including DC, BC, CC, clustering coefficient, HITS value, Laplacian centrality (Qi et al., 2013), and network constraint coefficient, to the identification accuracy of spreading influence nodes and discovered that the Laplacian centrality and network constraint coefficient are important in all seven different types of networks. Han et al. (2015) set the network constraint coefficient, BC, hierarchy, efficient, the network size, PageRank, and clustering coefficient as the input of the ListNet algorithm (Cao et al., 2007) to evaluate the spreading influence of nodes. Zhao et al. (2020a) transformed the identification task of spreading influence nodes into a classification problem rather than viewing it as a regression problem and used nine centralities and the infection rate of the SIR model to train classification models, including random forest, SVM, and Naive Bayes. The simulation result of the SIR model is numerical, which is unsuitable to be directly used as classification labels. Therefore, labels were obtained by using the following equation
where denotes the infected scale of node acquired by simulating the SIR model, represents the minimum spreading scale among all nodes, and , where is the number of labels. Ivanov et al. (2018) directly labeled the nodes as the seed node and the nonseed node, and proposed a framework that can find other seed nodes based on given seed nodes. The framework contains the following steps: 1) Map nodes into low-dimensional vectors on the basis of network embedding algorithms; 2) Train the classification model on the basis of positive and negative samples; and 3) Use the trained classification model to find nodes with the highest probability of being the positive sample to complete the seed set. Specifically, positive samples refer to given seed nodes, and negative samples are selected from nonseed nodes in accordance with their degree. The smaller the degree of the node is, the higher the probability that the node is a negative sample.
Intuitively, information can be spread more easily and efficiently between densely connected nodes (Freeman et al., 1991). To measure the connectivity of the local area that a node is located rather than using the length of the shortest path between nodes, Yang and Xiong (2021) used Euclidean distance between nodes by introducing DeepWalk, a network embedding algorithm, to map nodes into low-dimensional vectors. The spreading influence of node , , is defined as
where denotes the low-dimensional representation of node .
To enhance the robustness of interference-sensitive algorithms, such as -shell decomposition, Tixier et al. (2019) was inspired by the idea of ensemble learning and presented a strategy that will first generate multiple perturbed networks on the basis of the original network, then calculate the spreading influence of nodes in all perturbed networks and the original network independently, and rank nodes by the average spreading influence of nodes in all networks. This strategy has effectively increased the robustness of methods such as PageRank and -shell, while keeping low computational complexity.
In addition to network topology attributes, the spreading influence of users in online social platforms can be described by nontopological features, such as their occupation, age, and content of their posts. Nargundkar and Rao (2016) measured the spreading influence of Twitter users by feeding the linear regression model with nontopological features, such as the number of posts and reposts of Twitter users.
As mentioned above, feature engineering is required before training SMLB algorithms, which is extremely time consuming. With the rapid development of deep learning, the end-to-end deep learning model has become a preferred tool when dealing with the identification task of spreading influence node because DLB algorithms can finish feature selection automatically. The graph form data are not Euclidean data such as images and audios, to which deep learning models, such as the convolutional neural network (CNN) and recurrent neural network (RNN), can be directly applied. The graph neural networks (GNNs), which attempt to extend the use of CNN and RNN on graph form data, were proposed to address this challenge. Specifically, the GNNs can be further classified as the recurrent GNNs, the convolutional GNNs (ConvGNNs), and the spatial–temporal GNNs (Wu et al., 2021). In the identification of spreading influence nodes, the ConvGNNs are widely used owing to their promising performance in processing graph form data. From the perspective of node embedding, ConvGNNs can be divided into transductive learning-based and inductive learning-based. The transductive learning-based model learns node embeddings for a given network, which must be retrained once the network structure changes. The inductive learning-based model can learn embeddings of unseen nodes after the training step. Niepert et al. (2016) proposed PATCHY-SAN, a transductive learning-based algorithm that extends the use of the CNN to complex networks. The graph convolutional networking (GCN) (Kipf and Welling, 2016) uses the normalized Laplacian matrix of the network as the parameter to aggregate the information of neighbors to learn the low-dimensional representation of the node. Different from the abovementioned transductive learning-based methods, GraphSAGE (Hamilton et al., 2017) is an inductive learning-based method that aims to learn an aggregator that can aggregate the structure information of neighbor nodes so that it can generate node embeddings for unseen networks. Among the ConvGNNs, the GCN algorithm has been used by many GNN-based identification algorithms owing to its simplicity and effectiveness. Wang et al. (2019) designed a DLB algorithm called influence deep learning (IDL) for social networks, which considers the topological and the action logs of the social network users when evaluating their spreading influence. Specifically, IDL samples a fixed-size subnetwork for each node based on the action logs of users by using the random walk method. The pretrained network embedding method is then used to obtain the low-dimensional representation of each node and feeds these vectors into GCN to generate the trained embeddings for each node for predicting the spreading influence of nodes. The instance normalization technique is adopted to make the algorithm focus on the relative position of the node rather than the absolute position. The normalized representation of node , , is defined as
where denotes the low-dimensional vector of user , and represent the mean and variance of user representation vectors, respectively, and is a small number for numerical stability. The framework of IDL is shown in Fig.6.
Zhao et al. (2020b) generated the neighborhood network for each node by using the breadth-first-search (BFS) algorithm rather than using random walk. The normalized Laplacian matrix of the neighborhood network, DC, BC, CC, and clustering coefficient are the input of GCN for learning the embeddings of nodes. The output of GCN is used as the input of a fully connected neural network to predict the spreading influence of nodes. The discrimination capabilities of labels under different infection rates are tested by using Eq. (86) to generate high-quality labels based on the SIR model, because the chosen has a significant influence on evaluating the spreading influence of nodes.
where and represent the influence of the high influence and low influence groups, respectively, and denote the largest and smallest influence, respectively, and is the percentage of the high influence groups. Yu et al. (2020) used the BFS algorithm to extract the fixed-size neighborhood network for each node, re-encoded the nodes of the neighborhood network in the order in which they are selected, and transformed the adjacency matrix of the neighborhood network in accordance with Eq. (87) to generate the input of each node.
where denotes number of nodes in each neighborhood network. When the input matrix of each node is obtained, these transformed matrices will be fed into a CNN model (RCNN) to train a spreading influence prediction model. The framework of RCNN is shown in Fig.7. Although RCNN algorithm is applicable to large-scale networks, it uses only the degree information. Ou et al. (2022) improved the performance of RCNN by constructing a three-channel input for each node that preserves the micro, community and macro attributes.
The architecture of the ConvGNN-based algorithms can be simplified as Fig.8. Specifically, the details of the architecture are as follows: 1) Extract the neighborhood network for each node from the training network by using strategies, such as the random walk or BFS; 2) Construct the input vector for each node based on its topological attributes or nontopological attributes; 3) Obtain the label of each node (if the ground truth labels are available, this step is not required; otherwise, the labels can be obtained by using diffusion models); 4) Generate node embeddings via GCN layers; 5) Predict the spreading influence of each node by using fully-connected networks; 6) Calculate the loss by comparing the predictions with the labels; 7) Optimize the parameters of the models based on the loss of the model; and 8) Test the performance of the trained model on other networks. The ConvGNN-based algorithms have shown their great potential for measuring the spreading influence of nodes. However, existing algorithms give less attention in balancing their efficiency and accuracy. The performance of the ConvGNN-based algorithms, such as RCNN, will be influenced by the structural difference in training and test networks.
Reinforcement learning (RL) has attracted increasing attention. Fan et al. (2020) adopted the RL technique to evaluate spreading influence. The objective function considering the network connectivity is defined as
where denotes the connectivity of the network . Equation (88) measures the change in network connectivity after removing selected nodes. The smaller the value is, the more important the selected node is. The basic idea of this algorithm is to transform the node importance identification into a Markov decision process, which lets the agent interact with the environment based on a series of states and actions for maximizing rewards. In the identification task of spreading influence nodes, the environment is the target network, the state is the residual network, the actions are activating or removing the selected node, and the reward is the decrease in . The network embedding algorithm was introduced to solve state and action representation problems. The experimental result shows that this algorithm can achieve high accuracy in real-world networks after training on 200000 small-scale artificial random networks.
The advantage of machine learning models in processing multiple features enables the MLB algorithms to achieve a high identification accuracy. However, the following challenges still need to be addressed in the future.
First, the spreading influence of nodes in social networks depends on the topological and nontopological attributes of nodes. However, most existing MLB algorithms only consider the topological attributes of nodes. Therefore, introducing the nontopological information when identifying the spreading influence of nodes via MLB algorithms can be a future development direction of the MLB algorithms.
Second, most MLB algorithms are supervise learning-based, which requires labels of nodes. Diffusion models, such as SIR (Hethcote, 2000), IC (Kempe et al., 2003), and linear threshold (LT) (Kempe et al., 2003), are commonly used to generate the label. However, this strategy will cost considerable time and computational resources when dealing with large-scale networks.
Third, the performance of the MLB algorithms will be unstable when a significant difference is found between the structure of the training network and the test network. How to use small-scale networks to train such models that can achieve strong generalization capabilities is a key issue that needs to be solved in the future.
Finally, the SMLB and DLB algorithms attempt to enhance the identification accuracy of spreading influence nodes by using a set of structural attributes, such as DC, BC, and CC. However, the computational complexity of calculating multiple structural attributes is high, limiting the use of the SMLB and DLB algorithms in large-scale networks. Thus, more attention should be paid to balancing the accuracy and efficiency when designing MLB algorithms. The advantages and disadvantages of representative MLB algorithms and the two main method streams of MLB algorithms introduced in this section are listed in Tab.8 and Tab.9.
7 Diffusion models
We do not know which nodes are important in social networks, otherwise, we do not need to identify the spreading influence of nodes. Therefore, diffusion models are used to generate approximations of nodes’ real spreading influence and to test the performance of algorithms. Given that diffusion models are designed on the basis of different spreading mechanisms (Cohen, 1992; Fu et al., 2020), choosing an appropriate diffusion model that can best describe the spreading process of the target task is important to ensure that test results are credible. In this section, we introduce diffusion models that are widely used in studies of identification of spreading influence nodes, such as the susceptible-infected (SI) model (Barabási and Albert, 1999), the susceptible-infected-susceptible (SIS) model (Cohen, 1992), SIR model (Hethcote, 2000), LT model (Kempe et al., 2003), and IC model (Kempe et al., 2003).
7.1 SI model
As one of the classic epidemic models, the SI model sets each node in one of the two states: The susceptible state and the infected state . In the initialization phase, seed nodes will be set as infected. When the diffusion starts, the susceptible nodes will be infected by their infected neighbors with an infection rate . The propagation process stops until no newly infected node is found, and the number of total infected nodes is the overall influence of seed nodes.
7.2 SIS model
The SIS model considers the possibility of infected nodes becoming susceptible nodes again based on the SI model. The SIS model sets each node to be in one of the two states: The susceptible state and the infected state . The only difference between the SIS and SI models is that the infected nodes may become susceptible nodes again.
7.3 SIR model
On the basis of the SI model, the SIR model adds the recovered state. Specifically, the SIR model sets each node to be in one of three states, which are the susceptible state , the infected state , and the recovered state . After the diffusion process starts, susceptible nodes will be infected by their infected neighbor nodes with the infection rate , and infected nodes will recover with probability .
7.4 LT model
The LT model was designed to simulate the diffusion process in directed networks. Specifically, each edge of the directed network will be assigned a weight. For example, is the weight of the edge pointing from node to node , which represents the spreading influence of node to node among all node ’s in-neighbors. In the initialization phase, seed nodes will be set to either activated nodes or inactivated nodes. When the diffusion process starts, the inactivated node will be activated if the sum of weights of edges between the node and all its activated in-neighbors is greater than a given threshold . The diffusion process stops until no newly activated node is found.
7.5 IC model
The IC model was also designed for directed networks. In the initialization phase, seed nodes will be set to either activated or inactivated nodes. After the propagation starts, inactivated nodes will be activated by their activated in-neighbors with probability . If an inactive node has multiple activated in-neighbor nodes, these in-neighbor nodes will independently try to activate the node in random order. The propagation process stops when no new activated node is found.
7.6 Weighted IC model
The weighted IC model (Palla et al., 2005) is a weighted version of the IC model. During the diffusion process, assuming the inactive node is the out-neighbor of the activated node at time , node will be activated by node with the probability of at time . If the inactive node has active in-neighbors at time , the node will be activated at time with the probability of .
7.7 Conformity-aware IC model
Conformity awareness plays a vital role in spreading information, opinion, and beliefs in the real world. For example, online users are more likely to repost the information that most users believe in the social platform. On this basis, Li et al. (2013) proposed the conformity-aware IC model. Specifically, assuming the inactive node is the out-neighbor of the active node at time , node will be activated by node with the probability
where denotes the influence of node , and represents the conformity of node .
8 Performance evaluation metrics
Evaluation metrics are needed to compare the ranking results obtained by diffusion models and identification algorithms. This process is required to test the accuracy of the identification algorithms of spreading influence nodes. This section introduces eight widely used evaluation metrics in this research field.
8.1 Average spreading influence
Comparing the average influence of top ( ) most influential nodes identified by different identification algorithms (Chen et al., 2013; Berahmand et al., 2018) can measure algorithms’ performance to some extent. The average spreading influence of nodes identified by a specific algorithm, , is defined as
where denotes the set of seed nodes identified by using a specific algorithm, is the total number of nodes, and represents the spreading influence of node .
8.2 Influence scale
The influence scale can reflect the change in influence of nodes identified by a specific algorithm over time, which is defined as
where and denote the number of infected and recovered nodes at time , respectively.
8.3 Imprecision function
The imprecision function (Kitsak et al., 2010) is introduced to quantify the difference in the average spreading scales between top nodes identified by the identification algorithm and the most efficient spreaders identified by diffusion models ( ). The spreading efficiency is defined as the number of nodes infected by node , denotes the set of top nodes selected in accordance with the spreading efficiency, and represents the set of top nodes identified by the identification algorithm of spreading influence nodes. The imprecision value of is defined as
where and denote the average influence of and , respectively. The closer the imprecision value is to 0, the closer the average influence of the node set identified by is to the average influence of the most influential node set identified by diffusion models.
8.4 Relative difference of spreading scales
The relative difference of spreading scales (Knight, 1966; Zhao et al., 2014a) between two sets of top most important nodes identified by two different node importance identification algorithms is defined as
where denotes the total influence of the set of seed nodes identified by algorithm . The total influence of seed nodes identified by algorithm is greater than that by algorithm when .
8.5 Kendalls’ correlation coefficient
The Kendalls’ correlation coefficient (Wang et al., 2016) is used to measure the similarity of two ordered lists and is widely utilized when testing the performance of identification algorithms. Assuming two ordered lists and , each of which contains elements. denotes the th element pair of and . When any two element pairs of and have the same ranking, such as and or and , the two element pairs are a concordant pair, otherwise, they are a discordant pair. The Kendalls’ correlation coefficient is calculated in accordance with the number of concordant pairs and discordant pairs of two ordered lists, which is defined as
where C and D denote the number of concordant pairs and discordant pairs, respectively, and is the total number of elements in each order list. The closer the Kendalls’ coefficient is to 1, the more similar the two ordered lists are. Another evaluation metric that has a similar function as the Kendalls’ coefficient is the Jaccard correlation coefficient (Wang et al., 2018), which is given as
where and represent the seed nodes selected by the identification algorithm and the most influential nodes acquired by stimulating the diffusion model, respectively.
8.6 Monotonicity
The monotonicity (Wang et al., 2016) is used to measure the uniqueness of nodes’ rank obtained by the identification algorithms of spreading influence nodes, which is given as
where denotes the number of nodes assigned to rank , represents a spreading influence nodes identification algorithm, contains all unique values obtained by applying , and is the total number of nodes. The value of is between 0–1. The closer the value is to 1, the fewer nodes are assigned to the same level.
The complementary cumulative distribution function (CCDF) (Li et al., 2018; Zareie et al., 2019) has a similar function as monotonicity, which describes the distribution of nodes in different rankings. The mathematical formulation is defined as
where the cumulative distribution function denotes the probability that the node’s rank is less than or equal to .
9 Discussion and conclusions
In this review, we briefly summarized the recent progress of studies on the identification of spreading influence nodes, emphasizing the applications of the identification algorithms of spreading influence nodes in social networks. An increasing number of novel algorithms based on new techniques resulting from the studies of other research fields, especially ideas and tools from community detection and machine learning, have emerged in recent years due to the confluence of improved computational capabilities, the explosive growth of new datasets, increasing trend of interdisciplinary development, and fast-changing demands.
No single algorithm can achieve stable performance in all types of networks (Lü et al., 2016; Bucur, 2020; Namtirtha et al., 2021). An in-depth understanding of social network structural attributes that affect algorithmic performance can be considered the guide for the choice of algorithms when the accuracy and complexity have to be considered. Specifically, the MSB algorithms may be a good choice if the social network is large and sparse because they can achieve relatively high accuracy with extremely low computational complexity. However, if nodes are densely connected and the size of the social network is small, then the MSB algorithms may perform not as well as expected because they mainly focus on the edge density within a local area of nodes. In such cases, we can try to use CSB algorithms or MASB algorithms to utilize community-level and macro-level information. For social networks with a clear modular structure, the CSB algorithms may provide highly accurate identification results and an in-depth understanding of how information is actually spread between nodes belonging to different communities. The spreading influence of nodes in social networks can be determined by topological and nontopological attributes, such as interaction frequency and the number of reposts and comments, making the study of the identification of spreading influence nodes a natural place to apply machine learning models. Compared with the micro-macro-based MASB algorithms, which require weights assigned to each attribute to be predefined, the MLB algorithms are more suitable for identifying spreading influence nodes based on multiple attributes. To sum up, the advantages and disadvantages of MSB, CSB, MASB, and MLB algorithms are summarized in Tab.10. Although great advancement has been made in recent years, a number of unsolved problems that can affect the future development on the identification of spreading influence nodes still exist.
First, as discussed previously, the performance of the identification algorithms of spreading influence nodes is highly correlated with the structural attributes of social networks, making the benchmark datasets for comparing the accuracy of different types of algorithms a prerequisite to ensure the credibility of testing results. However, the performance of most existing algorithms is tested on different networks due to the lack of unifying datasets, which may lead to an unwanted outcome that only the good aspects of the algorithm are reported, thereby hindering their application.
Second, another huge challenge is the identification of spreading influence nodes in temporal networks, where connections between nodes will change over time. Although algorithms designed for static networks can achieve relatively high accuracy, the temporal networks are more in line with real-world situations. Therefore, further improving these algorithms so that they can be applied in temporal networks will be worth paying attention to.
Third, the algorithms’ performance can be significantly enhanced by considering the structural attributes and the nonstructural attributes of nodes. In social networks, the spreading influence of individuals can be influenced by their occupations, social status, and online reputations. However, most of the existing algorithms mainly focused on the network’s topological attributes and did not use nontopological information. The machine learning model might be an appropriate choice to consider multiple features at once while identifying important social network nodes.
Finally, although MLB algorithms can predict the node’s spreading influence by simultaneously considering different features of the node, the learning strategy of these algorithms is supervised learning, which requires labels to be generated by information diffusion models or disease diffusion models. Training these models on large-scale networks with more than millions of nodes is costly due to limited time and resources. Therefore, unsupervised learning-based algorithms may become new hot spot in the near future. The distribution between the training data and test data will affect the performance of machine learning models. However, the influence of the structural differences in training networks and test networks on the performance of MLB algorithms has not been well studied.
Albert,RJeong,HBarabási,A L ( 1999). Diameter of the World-Wide Web. Nature, 401( 6749): 130– 131
[2]
Bae,JKim,S ( 2014). Identifying and ranking influential spreaders in complex networks by neighborhood coreness. Physica A: Statistical Mechanics and Its Applications, 395: 549– 559
[3]
Bao,Z KLiu,J GZhang,H F ( 2017). Identifying multiple influential spreaders by a heuristic clustering algorithm. Physics Letters A, 381( 11): 976– 983
[4]
Barabási,A LAlbert,R ( 1999). Emergence of scaling in random networks. Science, 286( 5439): 509– 512
Belkin,MNiyogi,P ( 2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15( 6): 1373– 1396
[7]
Berahmand,KBouyer,ASamadi,N ( 2018). A new centrality measure based on the negative and positive effects of clustering coefficient for identifying influential spreaders in complex networks. Chaos, Solitons, and Fractals, 110: 41– 54
[8]
Bertozzi,A LFranco,EMohler,GShort,M BSledge,D ( 2020). The challenges of modeling and forecasting the spread of COVID-19. Proceedings of the National Academy of Sciences of the United States of America, 117( 29): 16732– 16738
[9]
Boguñá,MPastor-Satorras,RDíaz-Guilera,AArenas,A ( 2004). Models of social networks based on social distance attachment. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 70( 5): 056122
[10]
Bonacich,P ( 1972). Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology, 2( 1): 113– 120
[11]
Borge-HolthoeferJMorenoY ( 2012). Absence of influential spreaders in rumor dynamics. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 85(2): 026116
[12]
Brin,SPage,L ( 1998). The anatomy of a large-scale hypertextual web search engine. In: Proceedings of the 7th International Conference on World Wide Web. Brisbane: Association for Computing Machinery, 107– 117
[13]
Brockmann,DHelbing,D ( 2013). The hidden geometry of complex, network-driven contagion phenomena. Science, 342( 6164): 1337– 1342
[14]
Bucur,D ( 2020). Top influencers can be identified universally by combining classical centralities. Scientific Reports, 10( 1): 20550
[15]
Burt,R SKilduff,MTasselli,S ( 2013). Social network analysis: Foundations and frontiers on advantage. Annual Review of Psychology, 64( 1): 527– 547
[16]
Buyalskaya,AGallo,MCamerer,C F ( 2021). The golden age of social science. Proceedings of the National Academy of Sciences of the United States of America, 118( 5): e2002923118
[17]
Campan,ACuzzocrea,ATruta,T M ( 2017). Fighting fake news spread in online social networks: Actual trends and future research directions. In: IEEE International Conference on Big Data. Boston, MA, 4453– 4457
[18]
CantwellG TNewmanM E J ( 2019). Mixing patterns and individual differences in networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 99(4): 042306
[19]
CaoZQinTLiuT YTsaiM FLiH (2007). Learning to rank: From pairwise approach to listwise approach. In: Proceedings of the 24th International Conference on Machine Learning. Corvallis, OR: Association for Computing Machinery, 129– 136
[20]
Chen,D BGao,HLü,LZhou,T ( 2013). Identifying influential nodes in large-scale directed networks: The role of clustering. PLoS One, 8( 10): e77455
[21]
Chen,D BLü,L YShang,M SZhang,Y CZhou,T ( 2012). Identifying influential nodes in complex networks. Physica A: Statistical Mechanics and Its Applications, 391( 4): 1777– 1787
[22]
Chen,D BSun,H LTang,QTian,S ZXie,M ( 2019). Identifying influential spreaders in complex networks by propagation probability dynamics. Chaos, 29( 3): 033120
[23]
Chen,J YZhang,JXu,X HFu,C BZhang,DZhang,Q PXuan,Q ( 2021a). E-LSTM-D: A deep learning framework for dynamic network link prediction. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51( 6): 3699– 3712
[24]
Chen,SRen,Z MLiu,CZhang,Z K ( 2020). Identification methods of vital nodes on temporal network. Journal of University of Electronic Science and Technology of China, 49( 2): 291– 314 (in Chinese)
[25]
Chen,WWang,Y JYang,S Y ( 2009). Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Paris, 199– 208
[26]
Chen,YGuo,QLiu,MLiu,J G ( 2021b). Improved gravity model for identifying the influential nodes. Europhysics Letters, 136( 6): 68004
[27]
Chen,Y CZhu,W YPeng,W CLee,W CLee,S Y ( 2014). CIM: Community-based influence maximization in social networks. ACM Transactions on Intelligent Systems and Technology, 5( 2): 1– 31
[28]
Cohen,J E ( 1992). Infectious diseases of humans: Dynamics and control. Journal of the American Medical Association, 268( 23): 3381
[29]
Dai,J YWang,BSheng,J FSun,Z JKhawaja,F RUllah,ADejene,D ADuan,G H ( 2019). Identifying influential nodes in complex networks based on local neighbor contribution. IEEE Access, 7: 131719– 131731
[30]
Dai,LGuo,QLiu,X LLiu,J GZhang,Y C ( 2018). Identifying online user reputation in terms of user preference. Physica A: Statistical Mechanics and Its Applications, 494: 403– 409
[31]
Dong,GWang,FShekhtman,L MDanziger,M MFan,JDu,RLiu,JTian,LStanley,H EHavlin,S ( 2021). Optimal resilience of modular interacting networks. Proceedings of the National Academy of Sciences of the United States of America, 118( 22): e1922831118
[32]
Dorogovtsev,S NGoltsev,A VMendes,J F F ( 2008). Critical phenomena in complex networks. Reviews of Modern Physics, 80( 4): 1275– 1335
[33]
Fan,CZeng,LSun,YLiu,Y Y ( 2020). Finding key players in complex networks through deep reinforcement learning. Nature Machine Intelligence, 2( 6): 317– 324
[34]
Freeman,L C ( 1977). A set of measures of centrality based on betweenness. Sociometry, 40( 1): 35– 41
[35]
Freeman,L C ( 1978). Centrality in social networks conceptual clarification. Social Networks, 1( 3): 215– 239
[36]
Freeman,L CBorgatti,S PWhite,D R ( 1991). Centrality in valued graphs: A measure of betweenness based on network flow. Social Networks, 13( 2): 141– 154
[37]
Fu,J QLiu,MDeng,C YHuang,JJiang,M ZGuo,QLiu,J G ( 2020). Spreading model of the COVID-19 based on the complex human mobility. Journal of University of Electronic Science and Technology of China, 49( 3): 383– 391 (in Chinese)
[38]
GalstyanACohenP ( 2007). Cascading dynamics in modular networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 75(3): 036109
[39]
Galvão,VMiranda,J GAndrade,R FAndrade,Jr J SGallos,L KMakse,H A ( 2010). Modularity map of the network of human cell differentiation. Proceedings of the National Academy of Sciences of the United States of America, 107( 13): 5750– 5755
[40]
Gao,SMa,JChen,Z MWang,G HXing,C M ( 2014). Ranking the spreading ability of nodes in complex networks based on local structure. Physica A: Statistical Mechanics and Its Applications, 403: 130– 147
[41]
Ghalmane,ZCherifi,CCherifi,HHassouni,M E ( 2019a). Centrality in complex networks with overlapping community structure. Scientific Reports, 9( 1): 10133
[42]
Ghalmane,ZEl Hassouni,MCherifi,CCherifi,H ( 2019b). Centrality in modular networks. EPJ Data Science, 8( 1): 15
[43]
Girvan,MNewman,M E J ( 2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99( 12): 7821– 7826
[44]
GuimeràRDanonLDíaz-GuileraAGiraltFArenasA ( 2003). ELF-similar community structure in a network of human interactions. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 68( 6): 065103
[45]
Guo,CYang,LChen,XChen,DGao,HMa,J ( 2020). Influential nodes identification in complex networks via information entropy. Entropy, 22( 2): 242– 260
[46]
Guo,QYin,R RLiu,J G ( 2019). Node importance identification for temporal networks via the TOPSIS method. Journal of University of Electronic Science and Technology of China, 48( 2): 296– 300 (in Chinese)
[47]
Halappanavar,MSathanur,A VNandi,A K ( 2016). Accelerating the mining of influential nodes in complex networks through community detection. In: Proceedings of the ACM International Conference on Computing Frontiers. Como, 64– 71
[48]
Hall,MFrank,EHolmes,GPfahringer,BReutemann,PWitten,L H ( 2009). The WEKA data mining software: An update. SIGKDD Explorations, 11( 1): 10– 18
[49]
HamiltonW LYingRLeskovecJ (2017). Inductive representation learning on large graphs. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, CA: Curran Associates Inc., 1025– 1035
Hethcote,H W ( 2000). The mathematics of infectious diseases. SIAM Review, 42( 4): 599– 653
[52]
Hou,LLiu,J GPan,XWang,B H ( 2014). A social force evacuation model with the leadership effect. Physica A: Statistical Mechanics and Its Applications, 400: 93– 99
[53]
Hu,GXu,XZhang,W MZhou,Y ( 2019). Contribution analysis for assessing node importance indices with principal component analysis. Acta Electronica Sinica, 47( 2): 358– 365 (in Chinese)
[54]
Hu,YJi,SJin,YFeng,LStanley,H EHavlin,S ( 2018). Local structure can identify and quantify influential global spreaders in large scale social networks. Proceedings of the National Academy of Sciences of the United States of America, 115( 29): 7468– 7472
Ivanov,SDurasov,NBurnaev,E ( 2018). Learning node embeddings for influence set completion. In: IEEE International Conference on Data Mining Workshops. Singapore, 1034– 1037
[57]
Jeong,HMason,S PBarabási,A LOltvai,Z N ( 2001). Lethality and centrality in protein networks. Nature, 411( 6833): 41– 42
[58]
Jia,J SLu,XYuan,YXu,GJia,JChristakis,N A ( 2020). Population flow drives spatio–temporal distribution of COVID-19 in China. Nature, 582( 7812): 389– 394
[59]
Kempe,DKleinberg,JTardos,E ( 2003). Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Washington, D.C., 137– 146
Kitsak,MGallos,L KHavlin,SLiljeros,FMuchnik,LStanley,H EMakse,H A ( 2010). Identification of influential spreaders in complex network. Nature Physics, 6( 11): 888– 893
[62]
Kleinberg,J M ( 1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46( 5): 604– 632
[63]
Klimt,BYang,Y ( 2004). The Enron Corpus: A new dataset for email classification research. In: Proceedings of the 15th European Conference on Machine Learning. Berlin: Springer, 217– 226
[64]
Knight,W R ( 1966). A computer method for calculating Kendall’s τ with un-grouped data. Journal of the American Statistical Association, 61( 314): 436– 439
[65]
Kumar,ASnyder,M ( 2002). Protein complexes take the bait. Nature, 415( 6868): 123– 124
[66]
Kumar,SPanda,B S ( 2020). Identifying influential nodes in social networks: Neighborhood coreness based voting approach. Physica A: Statistical Mechanics and Its Applications, 553: 124215
[67]
Kunegis,J ( 2016). KONECT: The Koblenz network collection. In: Proceedings of the 22nd International Conference on World Wide Web. Rio de Janeiro: Association for Computing Machinery, 1343– 1350
[68]
Leskovec,JKleinberg,JFaloutsos,C ( 2007). Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data, 1( 1): 2
[69]
Leskovec,JLang,K JDasgupta,AMahoney,M W ( 2009). Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6( 1): 29– 123
[70]
Liben-Nowell,D LKleinberg,J ( 2007). The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58( 7): 1019– 1031
[71]
Li,CWang,LSun,S WXia,C Y ( 2018). Identification of influential spreaders based on classified neighbors in real-world complex networks. Applied Mathematics and Computation, 320: 512– 523
[72]
LiHBhowmickS SSunA X ( 2013). CINEMA: Conformity-aware greedy algorithm for influence maximization in online social networks. In: Proceedings of the 16th International Conference on Extending Database Technology. Genoa: Association for Computing Machinery, 323– 334
[73]
Li,QZhou,TLü,L YChen,D B ( 2014). Identifying influential spreaders by weighted LeaderRank. Physica A: Statistical Mechanics and Its Applications, 404: 47– 55
Liu,J GRen,Z MGuo,Q ( 2013a). Ranking the spreading influence in complex networks. Physica A: Statistical Mechanics and Its Applications, 392( 18): 4154– 4159
[79]
Liu,J GRen,Z MGuo,QWang,B H ( 2013b). Node importance ranking of complex networks. Acta Physica Sinica, 62( 17): 178901
[80]
Liu,J GWang,Z YGuo,QGuo,LChen,QNi,Y Z ( 2017a). Identifying multiple influential spreaders via local structural similarity. Europhysics Letters, 119( 1): 18001
[81]
Liu,J QLi,X RDong,J C ( 2021). A survey on network node ranking algorithms: Representative methods, extensions, and applications. Science China Technological Sciences, 64( 3): 451– 461
[82]
Liu,X LLiu,J GYang,KGuo,QHan,J T ( 2017b). Identifying online user reputation of user-object bipartite networks. Physica A: Statistical Mechanics and Its Applications, 467: 508– 516
[83]
Liu,YTang,MZhou,TDo,Y ( 2015a). Core-like groups result in invalidation of identifying super-spreader by k-shell decomposition. Scientific Reports, 5( 1): 9602
[84]
Liu,YTang,MZhou,TDo,Y ( 2015b). Improving the accuracy of the k-shell method by removing redundant links: From a perspective of spreading dynamics. Scientific Reports, 5( 1): 13172
[85]
Liu,YTang,MZhou,TDo,Y ( 2016b). Identify influential spreaders in complex networks, the role of neighborhood. Physica A: Statistical Mechanics and Its Applications, 452: 289– 298
[86]
Liu,Z HJiang,CWang,J YYu,H ( 2015c). The node importance in actual complex networks based on a multi-attribute ranking method. Knowledge-Based Systems, 84: 56– 66
[87]
Lou,T CTang,J ( 2013). Mining structural hole spanners through information diffusion in social networks. In: Proceedings of the 22nd International Conference on World Wide Web. Rio de Janeiro: Association for Computing Machinery, 825– 836
[88]
Lü,LZhang,Y CYeung,C HZhou,T ( 2011). Leaders in social networks, the delicious case. PLoS One, 6( 6): e21202
Lusseau,DSchneider,KBoisseau,O JHaase,PSlooten,EDawson,S M ( 2003). The bottlenose Dolphin community of doubtful sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54( 4): 396– 405
[91]
Ma,L LMa,CZhang,H FWang,B H ( 2016). Identifying influential spreaders in complex networks based on gravity formula. Physica A: Statistical Mechanics and Its Applications, 451: 205– 212
[92]
Ma,S JRen,Z MYe,C MGuo,QLiu,J G ( 2014). Node influence identification via resource allocation dynamics. International Journal of Modern Physics C, 25( 11): 1450065
[93]
Ma,T HLiu,QCao,JTian,YAl-Dhelaan,AAl-Rodhaan,M ( 2020). LGIEM: Global and local node influence based community detection. Future Generation Computer Systems, 105: 533– 546
[94]
MacqueenJ (1967). Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press, 281– 297
[95]
Maji,G ( 2020). Influential spreaders identification in complex networks with potential edge weight based k-shell degree neighborhood method. Journal of Computational Science, 39: 101055
[96]
Maji,GMandal,SSen,S ( 2020). A systematic survey on influential spreaders identification in complex networks with a focus on k-shell based techniques. Expert Systems with Applications, 161: 113681
[97]
MassaPSalvettiMTomasoniD (2009). Bowling alone and trust decline in social network sites. In: Proceedings of 8th IEEE International Conference on Dependable, Autonomic and Secure Computing. Chengdu, 658– 663
[98]
McAuleyJLeskovecJ (2012). Learning to discover social circles in ego networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, NV: Curran Accociates, 539– 547
[99]
Moore,B ( 1981). Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE Transactions on Automatic Control, 26( 1): 17– 32
[100]
Muthukrishna,MSchaller,M ( 2020). Are collectivistic cultures more prone to rapid transformation? Computational models of cross-cultural differences, social network structure, dynamic social influence, and cultural change. Personality and Social Psychology Review, 24( 2): 103– 120
[101]
Namtirtha,ADutta,ADutta,B ( 2018). Weighted k-shell degree neighborhood method: An approach independent of completeness of global network structure for identifying the influential spreaders. In: 10th International Conference on Communication Systems & Networks. Bengaluru: IEEE, 81– 88
[102]
Namtirtha,ADutta,ADutta,BSundararajan,ASimmhan,Y ( 2021). Best influential spreaders identification using network global structural properties. Scientific Reports, 11( 1): 2254
[103]
Nargundkar,ARao,Y S ( 2016). InfluenceRank: A machine learning approach to measure influence of Twitter users. In: International Conference on Recent Trends in Information Technology. Chennai: IEEE, 1– 6
[104]
Newman,M E J ( 2001). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences of the United States of America, 98( 2): 404– 409
[105]
NewmanM E J ( 2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 74( 3): 036104
[106]
NiepertMAhmedMKutzkovK (2016). Learning convolutional neural networks for graphs. In: Proceedings of the 33rd International Conference on Machine Learning. New York, NY: JMLR.org, 2014– 2023
[107]
Ou,YGuo,QXing,J LLiu,J G ( 2022). Identification of spreading influence nodes via multi-level structural attributes based on the graph convolutional network. Expert Systems with Applications, 203: 117515
[108]
Pal,S KKundu,SMurthy,C A ( 2014). Centrality measures, upper bound, and influence maximization in large scale directed social networks. Fundamenta Informaticae, 130( 3): 317– 342
[109]
Palla,GDerényi,IFarkas,IVicsek,T ( 2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435( 7043): 814– 818
[110]
Pan,R KSaramäki,J ( 2012). The strength of strong ties in scientific collaboration networks. Europhysics Letters, 97( 1): 18007
[111]
Pan,YLi,D HLiu,J GLiang,J Z ( 2010). Detecting community structure in complex networks via node similarity. Physica A: Statistical Mechanics and Its Applications, 389( 14): 2849– 2857
[112]
Peng,CWang,XPei,JZhu,W ( 2019). A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering, 31( 5): 833– 852
[113]
Qi,XDuval,R DChristensen,KFuller,ESpahiu,AWu,QWu,YTang,WZhang,C ( 2013). Terrorist networks, network energy and node removal: A new measure of centrality based on Laplacian energy. Social Networking, 2( 1): 19– 31
[114]
Qiu,L QJia,WYu,J FFan,XGao,W W ( 2019). PHG: A three-phase algorithm for influence maximization based on community structure. IEEE Access, 7: 62511– 62522
[115]
Ren,XZhu,YWang,SLiao,HHan,XLü,L ( 2015). Online social network analysis and the relation with regional economic development. Journal of University of Electronic Science and Technology of China, 44( 5): 643– 651 (in Chinese)
[116]
Ren,X LLü,L Y ( 2013). Review of ranking nodes in complex networks. Chinese Science Bulletin, 59( 13): 1175– 1197
[117]
Ren,Z M ( 2020). Node influence of the dynamic networks. Acta Physica Sinica, 69( 4): 24– 32 (in Chinese)
[118]
Ren,Z MLiu,J GShao,FHu,Z LGuo,Q ( 2013a). Analysis of the spreading influence of the nodes with minimum k-shell value in complex networks. Acta Physica Sinica, 62( 10): 108902
[119]
Ren,Z MShao,FLiu,J GGuo,QWang,B H ( 2013b). Node importance measurement based on the degree and clustering coefficient information. Acta Physica Sinica, 62( 12): 128901
[120]
Sabidussi,G ( 1966). The centrality index of a graph. Psychometrika, 31( 4): 581– 603
[121]
Sacchet,M DPrasad,GFoland-Ross,L CThompson,P MGotilb,I H ( 2014). Elucidating brain connectivity networks in major depressive disorder using classification-based scoring. In: 11th International Symposium on Biomedical Imaging. Beijing: IEEE, 246– 249
[122]
Shang,J XZhou,S BLi,XLiu,L CWu,H C ( 2017). CoFIM: A community-based framework for influence maximization on large-scale networks. Knowledge-Based Systems, 117: 88– 100
[123]
Shang,QDeng,YCheong,K H ( 2021). Identifying influential nodes in complex networks: Effective distance gravity model. Information Sciences, 577: 162– 179
[124]
Sheikhahmadi,ANematbakhsh,M AShokrollahi,A ( 2015). Improving detection of influential nodes in complex networks. Physica A: Statistical Mechanics and Its Applications, 436: 833– 845
[125]
Silva,T CZhao,L ( 2012). Network-based high level data classification. IEEE Transactions on Neural Networks and Learning Systems, 23( 6): 954– 970
Spring,NMahajan,RWetherall,D ( 2002). Measuring ISP topologies with rocketfuel. ACM SIGCOMM Computer Communication Review, 32( 4): 133– 145
[128]
Su,X PSong,Y R ( 2015). Leveraging neighborhood “structural holes” to identifying key spreaders in social networks. Acta Physica Sinica, 64( 2): 020101
[129]
Sun,H LChen,D BHe,J LChng,E ( 2019). A voting approach to uncover multiple influential spreaders on weighted networks. Physica A: Statistical Mechanics and Its Applications, 519: 303– 312
[130]
Tang,LLiu,H ( 2009). Relational learning via latent social dimensions. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Paris, 817– 826
[131]
Tang,L YLi,S NLin,J HGuo,QLiu,J G ( 2016). Community structure detection based on the neighbor node degree information. International Journal of Modern Physics C, 27( 4): 1650046
[132]
Tixier,A J PRossi,M E GMalliaros,F DRead,JVazirgiannis,M ( 2019). Perturb and combine to identify influential spreaders in real-world networks. In: Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. Vancouver, 73– 80
[133]
Tulu,M MHou,RYounas,T ( 2018). Identifying influential nodes based on community structure to speed up the dissemination of information in complex network. IEEE Access, 6: 7390– 7401
[134]
Ullah,AWang,BSheng,JLong,JKhan,NSun,Z ( 2021). Identification of nodes influence based on global structure model in complex networks. Scientific Reports, 11( 1): 6173
[135]
Wang,FShe,JOhyama,YWu,M ( 2019). Deep-learning-based identification of influential spreaders in online social networks. In: IECON 45th Annual Conference of the IEEE Industrial Electronics Society. Lisbon, 6854– 6858
[136]
Wang,YCong,GSong,G JXie,K Q ( 2010). Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, D.C., 1039– 1048
[137]
WangY FYanG HMaQ QWuYZhangM (2018). Identifying influential nodes based on vital communities. In: 16th International Conference on Dependable, Autonomic and Secure Computing, 16th International Conference on Pervasive Intelligence and Computing, 4th International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress. Athens: IEEE, 314– 317
[138]
Wang,Z XZhao,YXi,J KDu,C J ( 2016). Fast ranking influential nodes in complex networks using a k-shell iteration factor. Physica A: Statistical Mechanics and Its Applications, 461: 171– 181
[139]
Watts,D JDodds,P S ( 2007). Influential, networks, and public opinion formation. Journal of Consumer Research, 34( 4): 441– 458
[140]
Watts,D JStrogatz,S H ( 1998). Collective dynamics of “small-world” networks. Nature, 393( 6684): 440– 442
[141]
Wei,HPan,ZHu,GZhang,LYang,HLi,XZhou,X ( 2018). Identifying influential nodes based on network representation learning in complex networks. PLoS One, 13( 7): e0200091
[142]
Wu,ZPan,SChen,FLong,GZhang,CYu,P S ( 2021). A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32( 1): 4– 24
[143]
XieN ( 2006). Social Network Analysis of Blogs. Dissertation for the Master’s Degree. Bristol: University of Bristol
[144]
Yan,STang,S TPei,S SJiang,JZhang,XDing,W RZheng,M Z ( 2013). The spreading of opposite opinions on online social networks with authoritative nodes. Physica A: Statistical Mechanics and Its Applications, 392( 17): 3846– 3855
[145]
Yan,X LCui,Y PNi,S J ( 2020). Identifying influential spreaders in complex networks based on entropy weight method and gravity law. Chinese Physics B, 29( 4): 048902
[146]
Yang,JLeskovec,J ( 2012). Defining and evaluating network communities based on ground-truth. In: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics. Beijing, 1– 8
[147]
Yang,JLeskovec,J ( 2013). Overlapping community detection at scale: A nonnegative matrix factorization approach. In: Proceedings of the 6th ACM International Conference on Web Search and Data Mining. Rome, 587– 596
[148]
Yang,J NLiu,J GGuo,Q ( 2018a). Node importance identification for temporal network based on inter-layer similarity. Acta Physica Sinica, 67( 4): 279– 286 (in Chinese)
[149]
Yang,KGuo,QLiu,J G ( 2018b). Community detection via measuring the strength between nodes for dynamics networks. Physica A: Statistical Mechanics and Its Applications, 509: 256– 264
[150]
Yang,X HXiong,S ( 2021). Identification of node influence using network representation learning in complex network. Journal of Chinese Computer Systems, 42( 2): 418– 423 (in Chinese)
[151]
Yang,Y ZWang,XChen,YHu,MRuan,C W ( 2020). A novel centrality of influential nodes identification in complex networks. IEEE Access, 8: 58742– 58751
[152]
Yin,R RGuo,QYang,J NLiu,J G ( 2018). Inter-layer similarity-based eigenvector centrality measures for temporal networks. Physica A: Statistical Mechanics and Its Applications, 512: 165– 173
[153]
Yu,E YWang,Y PFu,YChen,D BXie,M ( 2020). Identifying critical nodes in complex networks via graph convolutional networks. Knowledge-Based Systems, 198: 105893
[154]
Yu,S BGao,LXu,L DGao,Z Y ( 2019). Identifying influential spreaders based on indirect spreading in neighborhood. Physica A: Statistical Mechanics and Its Applications, 523: 418– 425
[155]
Zachary,W W ( 1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33( 4): 452– 473
[156]
Zareie,ASheikhahmadi,AJalili,M ( 2019). Influential node ranking in social networks based on neighborhood diversity. Future Generation Computer Systems, 94: 120– 129
[157]
Zeng,A CZhang,J ( 2013). Ranking spreaders by decomposing complex networks. Physics Letters A, 377( 14): 1031– 1035
[158]
Zhang,DWang,YZhang,Z ( 2019a). Identifying and quantifying potential super-spreaders in social networks. Scientific Reports, 9( 1): 14811
[159]
Zhang,J XChen,D BDong,QZhao,Z D ( 2016). Identifying a set of influential spreaders in complex networks. Scientific Reports, 6( 1): 27823
[160]
Zhang,M HChen,Y X ( 2018). Link prediction based on graph neural networks. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal: Curran Associates Inc., 5171– 5181
[161]
Zhang,WYang,JDing,X YZou,X MHan,H YZhao,Q C ( 2019b). Groups make nodes powerful: Identifying influential nodes in social networks based on social conformity theory and community features. Expert Systems with Applications, 125: 249– 258
[162]
Zhao,G HJia,PHuang,CZhou,AFang,Y ( 2020a). A machine learning based framework for identifying influential nodes in complex networks. IEEE Access, 8: 65462– 65471
[163]
Zhao,G HJia,PZhou,AZhang,B ( 2020b). InfGCN: Identifying influential nodes in complex networks with graph convolutional networks. Neurocomputing, 414: 18– 26
[164]
Zhao,X YHuang,BTang,MZhang,H FChen,D B ( 2014a). Identifying effective multiple spreaders by coloring complex networks. Europhysics Letters, 108( 6): 68005
[165]
Zhao,Z JGuo,QYu,KLiu,J G ( 2020c). Identifying influential nodes for the networks with community structure. Physica A: Statistical Mechanics and Its Applications, 551: 123893
[166]
Zhao,Z YYu,HZhu,Z LWang,X F ( 2014b). Identifying influential spreaders based on network community structure. Chinese Journal of Computers, 37( 4): 753– 766 (in Chinese)
[167]
Zhao,Z YWang,X FZhang,WZhu,Z L ( 2015). A community-based approach to identifying influential spreaders. Entropy, 17( 4): 2228– 2252
[168]
Zhou,M YXiong,W MWu,X YZhang,Y XLiao,H ( 2018). Overlapping influence inspires the selection of multiple spreaders in complex networks. Physica A: Statistical Mechanics and Its Applications, 508: 76– 83
[169]
Zhou,TLü,L YZhang,Y C ( 2009). Predicting missing links via local information. European Physical Journal B, 71( 4): 623– 630
RIGHTS & PERMISSIONS
Higher Education Press
AI Summary 中Eng×
Note: Please be aware that the following content is generated by artificial intelligence. This website is not responsible for any consequences arising from the use of this content.