1 Introduction
Tab.1 Approach names, categories, study problems, data, diffusion models, and main analysis findings of the representative MSB, CSB, MASB, and MLB algorithms |
Approach | Category | Study problem | Data | Diffusion model | Main analysis findings |
---|---|---|---|---|---|
Percolation-based greedy algorithm (PBGA) (Hu et al., 2018) | MSB | Influence maximization | GrQc (Leskovec et al., 2007), HepTh (Leskovec et al., 2007), Enron (Leskovec et al., 2009), NoLA Facebook, DBLP (Yang and Leskovec, 2012), QQ (Ren et al., 2015), LiveJournal (Yang and Leskovec, 2012), Weibo, Delicious (Lü et al., 2011) | Susceptible-infected-recovered (SIR) (Hethcote, 2000) | The spreading influence of nodes can be approximately estimated by using the local structural information of nodes |
Spreading strength (SS) (Yu et al., 2019) | Node ranking | Facebook (McAuley and Leskovec, 2012), PGP (Boguñá et al., 2004), Protein (Jeong et al., 2001), Guntella08 (Leskovec et al., 2007), GrQc (Leskovec et al., 2007), CondMat (Leskovec et al., 2007), HepTh (Leskovec et al., 2007), US Air, PowerGrid (Watts and Strogatz, 1998) | SIR (Hethcote, 2000) | The indirect influence of a node on its neighborhood is important for measuring the spreading influence of the node | |
Local centrality (LC) (Chen et al., 2012) | Node ranking | Blog (Xie, 2006), Netscience (Newman, 2006), Router (Spring et al., 2002), Email (Guimerà et al., 2003) | SIR (Hethcote, 2000) | The degree information of high-order neighbors can improve the accuracy and resolution of the degree centrality (DC) | |
Neighborhood centrality (NC) (Liu et al., 2016b) | Node ranking | Email (Guimerà et al., 2003), HepTh (Leskovec et al., 2007), Hamster (Kunegis, 2016), PGP (Boguñá et al., 2004), Astro Physics (Newman, 2001), Router (Spring et al., 2002) | SIR (Hethcote, 2000) | Considering the structural information of a node’s neighbors within two steps is a good choice to balance accuracy and efficiency | |
Local structure similarity (LSS) (Liu et al., 2017a) | Influence maximization | GrQc (Leskovec et al., 2007), Router (Spring et al., 2002), Hamster (Kunegis, 2016), Polblogs | SIR (Hethcote, 2000), Susceptible-infected (SI) (Barabási and Albert, 1999) | The local structural property of nodes can help identify multiple spreading influence nodes more accurately than by using the distance | |
VoteRank (Zhang et al., 2016) | Influence maximization | YouTube (Yang and Leskovec, 2012), CondMat (Leskovec et al., 2007), Berkstan (Leskovec et al., 2009), Notre DAME (Albert et al., 1999) | SIR (Hethcote, 2000), SI (Barabási and Albert, 1999) | The performance of VoteRank is highly correlated with the number of ranked nodes | |
ClusterRank (CR) (Chen et al., 2013) | Node ranking | Delicious (Lü et al., 2011), SM | SIR (Hethcote, 2000) | Nodes with small clustering coefficients are likely to connect with more nodes in the future | |
Local structure centrality (LSC) (Gao et al., 2014) | Node ranking | Email (Guimerà et al., 2003), Blog, PGP (Boguñá et al., 2004), Twitter | SIR (Hethcote, 2000) | The positive effect of the clustering coefficient of a node’s second-order neighbors has a significant influence on the spreading influence of the node | |
V-communities (Vc) (Zhao et al., 2014b) | CSB | Node ranking | Facebook (McAuley and Leskovec, 2012), GrQc (Leskovec et al., 2007), Netscience (Newman, 2006), Protein (Jeong et al., 2001) | SIR (Hethcote, 2000) | The number of communities connected to a node can help detect the spreading influence nodes that a single centrality may ignore |
Community-based centrality (CbC) (Zhao et al., 2015) | Node ranking | Facebook (McAuley and Leskovec, 2012), Metabolic, Email, PowerGrid (Watts and Strogatz, 1998), Router (Spring et al., 2002), Blogcatalog | SIR (Hethcote, 2000) | The size of the community and the distribution of a node’s neighbors in each community play important roles in measuring the node’s spreading influence | |
Community-based mediator (CbM) (Tulu et al., 2018) | Node ranking | Karate (Zachary, 1977), American football network (Girvan and Newman, 2002), Dolphin (Lusseau et al., 2003), Airport, Internet | SIR (Hethcote, 2000) | The edge density within each community and the edge density between communities can be used to identify spreading influence nodes accurately with low computational complexity | |
Community-hole index (CHR) (Wang et al., 2018) | Node ranking | GrQc (Leskovec et al., 2007), Weibo (Tang and Liu, 2009), arXiv (Pan and Saramäki, 2012), Amazon | SIR (Hethcote, 2000) | The importance of communities connected to a node is related to the node’s spreading influence | |
Modular centrality (MC) (Ghalmane et al., 2019a) | Node ranking | Facebook (McAuley and Leskovec, 2012), Netscience (Newman, 2006), GrQc (Leskovec et al., 2007) | SIR (Hethcote, 2000) | Dividing the network with an overlapping community structure into local and global networks can help identify spreading influence nodes more accurately | |
Network global structure-based centrality (NGSC) (Namtirtha et al., 2021) | MASB | Influence maximization | Odlis, Netscience (Newman, 2006), Advogato (Massa et al., 2009) | SIR (Hethcote, 2000) | The network components and network density have a significant influence on the performance of identification algorithms |
Gravity centrality (GC) (Ma et al., 2016) | Node ranking | Facebook, Netscience (Newman, 2006), Email (Guimerà et al., 2003), TAP (Zeng and Zhang, 2013), Y2H (Kumar and Snyder, 2002), Blogs (Xie, 2006), Router (Spring et al., 2002), HepTh (Leskovec et al., 2007), PGP (Boguñá et al., 2004) | SIR (Hethcote, 2000) | The gravity model can be applied to identify the spreading influence of nodes with relatively high accuracy | |
(Shang et al., 2021) | Node ranking | Jazz, Netscience (Newman, 2006), GrQc (Leskovec et al., 2007), EEC, Email, PB, Facebook, US Air, Physicians, PDZBase, Haggle, Infectious | SI (Barabási and Albert, 1999) | Replacing the Euclidean distance of the gravity model by the effective distance can achieve higher accuracy | |
Dynamic-sensitive (DS) (Liu et al., 2016a) | Influence maximization | Erdos, Email contact (Kitsak et al., 2010), Router (Spring et al., 2002), Protein (Jeong et al., 2001) | SIR (Hethcote, 2000), SI (Barabási and Albert, 1999) | The spreading influence of a node is determined by topological structures and the spreading dynamics | |
Influence capacity (Wang et al., 2016) | Node ranking | Karate (Zachary, 1977), Netscience (Newman, 2006), Dolphin (Lusseau et al., 2003), Email (Guimerà et al., 2003), Jazz, PGP (Boguñá et al., 2004), Blog (Xie, 2006), Facebook, Enron (Leskovec et al., 2009), Twitter | SIR (Hethcote, 2000) | The order in which nodes within the same layer are removed can distinguish the spreading influence of nodes located in the same shell | |
Link entropy (Liu et al., 2015a) | Node ranking | Router (Spring et al., 2002), Email contact (Kitsak et al., 2010), AS, Email (Guimerà et al., 2003), HepTh (Leskovec et al., 2007), Hamster (Kunegis, 2016), PGP (Boguñá et al., 2004), Netscience (Newman, 2006), Astro Physics (Newman, 2001) | SIR (Hethcote, 2000) | The node with high coreness is a spreading influence node with a strong edge diversity to nodes located in other shells of the network | |
(Liu et al., 2013a) | Node ranking | Email (Guimerà et al., 2003), PGP (Boguñá et al., 2004), AS, P2P (Leskovec et al., 2007) | SIR (Hethcote, 2000) | The distance from the node to nodes in the core–shell layer can effectively distinguish the spreading influence of nodes in the same shell layer | |
Multicentrality predictors (Bucur, 2020) | MLB | Node ranking | Adolescent, Advogato (Massa et al., 2009), Astro Physics (Newman, 2001), CondMat (Leskovec et al., 2007), GrQc (Leskovec et al., 2007), HepTh (Leskovec et al., 2007), AS, Brightkite, Email, Epinions, Euroroad, Facebook, Github, Guntella, Googleplus, Hamster (Kunegis, 2016), IMDB, OpenFlights, PGP (Boguñá et al., 2004), Twitch, Twitter Stanford, US Airports, PowerGrid (Watts and Strogatz, 1998), WikiTalk | SIR (Hethcote, 2000) | The spreading influence nodes identified by the MSB algorithms may be located in the peripheral regions of the network, and the MASB algorithms can help to rectify this shortcoming |
Perturb and combine (P&C) (Tixier et al., 2019) | Node ranking | Email (Klimt and Yang, 2004), Epinions, WikiVote | SIR (Hethcote, 2000) | The idea of ensemble learning can improve the robustness of the -core and PageRank algorithm | |
Influence deep learning (IDL) (Wang et al., 2019) | Node ranking | Sina Weibo, Epinions, WikiVote, NetHEPT (Pal et al., 2014) | Independent cascading (IC) (Kempe et al., 2003) | The graph convolutional networking (GCN) model has a great potential for the identification of spreading influence nodes |
2 Related definition and problem description
3 MSB algorithms
Tab.2 Advantages and disadvantages of representative MSB algorithms, where is the total number of iteration rounds, is the average degree of nodes, and and are total number of nodes and edges of a network, respectively |
Methods | Advantages | Disadvantages | Computational complexity |
---|---|---|---|
DC (Freeman, 1978) | Low computational complexity; Simple and easy to understand | The structure information of high-order neighbors is ignored | |
LC (Chen et al., 2012) | Considers the degree of high-order neighbor nodes | Other structural attributes are disregarded, except the degree of the node | |
LNC (Dai et al., 2019) | Measures the contributions of different neighbors to the spreading influence of the node; Outperforms DC and betweenness while keeping low computational complexity | Unsuitable for the random network | |
VoteRank (Zhang et al., 2016) | Uses the idea of voting to aggregate the structural information of high-order neighbors; The accuracy is higher than PageRank and LeaderRank; Low computational complexity | The differences in the initial voting abilities of nodes are ignored | |
AIRank (Zhang et al., 2019b) | Distinguishes the voting abilities of nodes from the perspective of individuals and groups | The performance will be unstable when the edge density between communities is low | |
NCRank (Kumar and Panda, 2020) | Distinguishes the voting abilities of nodes by considering the position of the node | Adjusting free parameters to obtain a stable performance is time consuming | |
EnRenew (Guo et al., 2020) | Different initial voting abilities of nodes are distinguished on the basis of the information entropy | The computational complexity is higher than VoteRank | |
DynamicRank (Chen et al., 2019) | Uses the probabilities of high-order neighbors being influenced to measure the spreading influence of nodes | Adjusting the free parameter to obtain a stable performance is time consuming |
Tab.3 Advantages and disadvantages of five main MSB method streams |
Method streams | Related works | Advantages | Disadvantages |
---|---|---|---|
High-order-neighbor-degree-based | LC (Chen et al., 2012); NC (Liu et al., 2016b); LNC (Dai et al., 2019); SS (Yu et al., 2019) | Efficient and easy to understand | Topological information, such as interactions between neighbors and the edge density, is ignored |
Local-clustering-coefficient-based | CR (Chen et al., 2013); LSC (Gao et al., 2014); Centrality (Berahmand et al., 2018); DCC (Yang et al., 2020) | Higher resolution; Enables the spreading influence of nodes with the same degree value to be further distinguished | The performance will be suppressed in densely connected networks because it only focuses on the edge density in the node’s neighborhood |
Similarity-based | DegreeDistance (Sheikhahmadi et al., 2015); LSS (Liu et al., 2017a); HC (Bao et al., 2017) | Ensures that seed nodes are distributed in different parts of the network | The selection of initial seed nodes has a large influence on its performance |
VoteRank-based | VoteRank (Zhang et al., 2016); AIRank (Zhang et al., 2019b); NCRank (Kumar and Panda, 2020); EnRenew (Guo et al., 2020) | No distance calculation is included | The initial state of each node and the spreading influence decreasing strategy will cause large influence on its performance |
Diffusion-model-based | DD (Chen et al., 2009); DynamicRank (Chen et al., 2019) | The topological information and the diffusion mechanism are considered | When the diffusion mechanism changes, its performance will be affected |
4 CSB algorithms
Tab.4 Advantages and disadvantages of representative CSB algorithms, where is the number of seed nodes, is the total number of communities, is the time to compute the degree of a node in community , is the size of community , and and are the number of candidate nodes and edges, respectively |
Methods | Advantages | Disadvantages | Computational complexity |
---|---|---|---|
Vc (Zhao et al., 2014b) | Identifies spreading influence nodes that cannot be detected by a single centrality | The performance will be unstable when the community structure of the network changes | |
CbC (Zhao et al., 2015) | Alleviates the instability of Vc by considering the community size and the distribution of neighbors | Other community structural attributes are not used except the size of the community | |
CbM (Tulu et al., 2018) | The edge densities within and between communities are considered; CbM outperforms CbC | The computational complexity is higher than CbC | |
CGA (Wang et al., 2010) | Reduces the computational complexity of MixGreedy by narrowing the searching space to the community scale | The accuracy is lower than MixGreedy | |
Community-based framework for influence maximization (CoFIM) (Shang et al., 2017) | Divides the propagation process into two phases, which is more explainable; The computational complexity is low while keeping high accuracy | Determining an appropriate infection rate and free parameters is time consuming | |
PHG (Qiu et al., 2019) | Reduces the computational complexity by filtering candidates before applying the greedy algorithm | Only considers the degree of nodes when identifying the core node set | |
ICC (Zhao et al., 2020c) | Improves the performance of CC by adopting community structural attributes | Unsuitable for large-scale networks | – |
Community-based influence maximization (CIM) (Chen et al., 2014) | Narrows the seed nodes’ searching space by the size of the community | Other community structural attributes are ignored, except the size of the community | – |
CHR (Wang et al., 2018) | Considers the influence of community-level importance to node-level importance | Unsuitable for large-scale networks | – |
Note: “–” denotes that the time complexity is not provided in the original paper. |
Tab.5 Advantages and disadvantages of three main CSB method streams |
Method streams | Related works | Advantages | Disadvantages |
---|---|---|---|
Community-structural-attribute-based | Vc (Zhao et al., 2014b); CbC (Zhao et al., 2015); CbM (Tulu et al., 2018); ICC (Zhao et al., 2020c); OC (Wei et al., 2018); MC (Ghalmane et al., 2019a) | Community-level structural information can help detect spreading influence nodes that centralities may ignore | Its performance relies on the community detection algorithms and the selection of community structural attributes |
Diffusion-mechanism-based | CGA (Wang et al., 2010); CoFIM (Shang et al., 2017) | The topological information and diffusion mechanism are considered; Community structure information helps to accelerate the speed of seed node selection | Application scenarios are limited by the diffusion mechanism |
Community-importance-based | CIM (Chen et al., 2014); CHR (Wang et al., 2018); PHG (Qiu et al., 2019) | The difference in the importance of communities is considered; Community structure information helps to accelerate the speed of seed node selection | How to quantify the importance of community has not been well studied |
5 MASB algorithms
![](https://academic.hep.com.cn//article\2022\2095-7513/2095-7513-9-4-520/thumbnail/fem-21065-jgl-fig5.jpg)
Fig.5 Diagram of the structural hole (adapted from Su and Song (2015)): (a) network with a structural hole; and (b) network without a structural hole. |
Tab.6 Advantages and disadvantages of representative MASB algorithms |
Methods | Advantages | Disadvantages | Computational complexity |
---|---|---|---|
BC (Freeman, 1977) | Identifies nodes that have strong control over the spreading process | High computational complexity; Unsuitable for large-scale networks | |
CC (Sabidussi, 1966) | Uses the distances between nodes to measure the spreading of nodes | High computational complexity; Unsuitable for large-scale networks | |
PageRank (Brin and Page, 1998) | Aggregates the structure information of neighbor nodes iteratively; Low computational complexity | Difficult to converge when there are nodes with an out-degree 0 | |
-shell (Kitsak et al., 2010) | Considers the position of the node on the basis of the node’s degree | The spreading influence of nodes in the same shell layer cannot be distinguished | |
LeaderRank (Lü et al., 2011) | Ensures a faster convergence speed than PageRank by adding a ground node | Only suitable for directed networks | |
Local and global node influence (LGI) (Ma et al., 2020) | Introduces the entropy to distinguish differences in the spreading influence of nodes in the same shell layer | Tuning free parameters is time consuming | |
CumulativeRank (Zhang et al., 2019a) | The local and global spreading influence of nodes is considered simultaneously | Unsuitable for random networks | |
GSM (Ullah et al., 2021) | Considers the self-importance of the node and the relationship of the node with other nodes simultaneously | High computational complexity; Unsuitable for large-scale networks | |
IKs (Liu et al., 2015b) | The nodes are divided into different shell layers in a more granular manner | Unsuitable for random networks |
Tab.7 Advantages and disadvantages of four main MASB method streams |
Method streams | Related works | Advantages | Disadvantages |
---|---|---|---|
Iteration-based | EC (Bonacich, 1972); PageRank (Brin and Page, 1998); LeaderRank (Lü et al., 2011); WleaderRank (Li et al., 2014) | Aggregates the structural information of neighbor nodes in an iterative manner, which is more efficient than directly exploiting the structural information of the entire network | The contribution of neighbor nodes to the spreading influence of the target node is mainly measured by the degree information |
-shell-based | (Liu et al., 2013a); Resource Allocation Dynamics (Ma et al., 2014); IKs (Liu et al., 2015b); DSC (Zareie et al., 2019); (Bae and Kim, 2014); Link entropy (Liu et al., 2015a); Influence capacity (Wang et al., 2016); Classified neighbors (CN) (Li et al., 2018) | The position information of nodes in the network is considered, which can avoid to identify nodes located in peripheral regions of the network as spreading influence nodes | The nodes located in the core–shell layer are not always the spreading influence nodes |
Micro-macro-based | NGSC (Namtirtha et al., 2021); ksd (Maji, 2020); Influence (Ma et al., 2020) | Guarantees stable performance in both sparsely and densely connected networks | Weights assigned to each attribute need to be predefined, which is time consuming |
Gravity-model-based | GC (Ma et al., 2016); GM (Li et al., 2019); Yan et al. (2020); (Shang et al., 2021); Effective distance gravity model (EDGM) (Chen et al., 2021b) | Considers the position of nodes and the effect of the distance between nodes on their interaction | The calculation of distance results in high computational complexity |
6 MLB algorithms
Tab.8 Advantages and disadvantages of representative MLB algorithms |
Methods | Advantages | Disadvantages | Machine learning models |
---|---|---|---|
InfEmb (Ivanov et al., 2018) | Determines other seed nodes based on the structure information of the given seed nodes | Unable to identify the remaining seed nodes with few positive samples accurately; Only the degree of the node is considered when selecting negative samples | Linear regression |
InfluenceRank (Nargundkar and Rao, 2016) | Considers the topological and nontopological features simultaneously | Cannot ensure that the selected features accurately reflect the node’s spreading influence | DeepWalk; SVM |
P&C (Tixier et al., 2019) | Enhances the robustness of algorithms with weak anti-interference ability | Relies on the network perturbation method | Bagging |
IDL (Wang et al., 2019) | Considers the action logs of online users and network topology | Only suitable for social networks | GCN |
InfGCN (Zhao et al., 2020b) | Uses the low-dimensional vector representation and centralities of nodes as the input to predict the spreading influence of nodes | High computational complexity | GCN |
RCNN (Yu et al., 2020) | Only uses the degree of the node to transform the matrix representation of the node, which is more efficient than InfGCN and IDL | The performance is unstable when the structure of the training network and the test network are different | CNN |
FINDER (Fan et al., 2020) | Applies the reinforcement learning technique to evaluate the spreading influence of nodes | High computational complexity | RL; DeepWalk |
Tab.9 Advantages and disadvantages of two main MLB method streams |
Method streams | Related works | Advantages | Disadvantages |
---|---|---|---|
SMLB | Multicentrality predictors (Bucur, 2020); Hu et al. (2019); Han et al. (2015); Zhao et al. (2020a); InfEmb (Ivanov et al., 2018); NCL (Yang and Xiong, 2021); InfluenceRank (Nargundkar and Rao, 2016); P&C (Tixier et al., 2019) | Weights assigned to each structural attribute can be obtained automatically by training the model; Highly interpretable | Requires the time-consuming feature engineering process; The accuracy of the model is unstable when the network structure of the training set and the test set are different |
DLB | IDL (Wang et al., 2019); InfGCN (Zhao et al., 2020b); RCNN (Yu et al., 2020); FINDER (Fan et al., 2020) | Feature engineering is not required; Learning from streaming data | Easy to overfit in the training process; The training process is a black box; The accuracy of the model is unstable when the network structure of the training set and the test set are different |
7 Diffusion models
7.1 SI model
7.2 SIS model
7.3 SIR model
7.4 LT model
7.5 IC model
7.6 Weighted IC model
7.7 Conformity-aware IC model
8 Performance evaluation metrics
8.1 Average spreading influence
8.2 Influence scale
8.3 Imprecision function
8.4 Relative difference of spreading scales
8.5 Kendalls’ correlation coefficient
8.6 Monotonicity
9 Discussion and conclusions
Tab.10 Advantages and disadvantages of MSB, CSB, MASB, and MLB algorithms |
Algorithms | Advantages | Disadvantages |
---|---|---|
MSB | Estimates the spreading influence of nodes via micro-level structural information, thereby enabling the identification of spreading influence nodes in large-scale networks to be feasible | Focuses on the edge density within a local area of the target node, leading to nodes located in peripheral regions of networks to be misclassified as spreading influence nodes |
CSB | Accelerates the speed of seed node selection for the influence maximization task; Community structural attributes can help to improve the accuracy of centralities | The performance of CSB algorithms depends on the community detection algorithms |
MASB | Helps to rectify the identification results of MSB algorithms; Performs well in densely connected networks | High computational complexity hinders the use of MASB algorithms in large-scale networks; Weights of MASB algorithms that consider multiple structural attributes need to be predefined |
MLB | Automatically calculates the weights assigned to different attributes of nodes; Higher accuracy of identification than traditional algorithms | The computational complexity of MLB algorithms did not receive sufficient attention; The performance of MLB algorithms is influenced by the structural difference in the training network and the test network |