MTMO: an efficient network-centric algorithm for subtree counting and enumeration

Guanghui Li; Jiawei Luo; Zheng Xiao; Cheng Liang

doi:10.1007/s40484-018-0140-y

PDF(626 KB)

Quant. Biol. ›› 2018, Vol. 6 ›› Issue (2) : 142-154. DOI: 10.1007/s40484-018-0140-y

RESEARCH ARTICLE

MTMO: an efficient network-centric algorithm for subtree counting and enumeration

Author information +

History +

Abstract

Background: The frequency of small subtrees in biological, social, and other types of networks could shed light into the structure, function, and evolution of such networks. However, counting all possible subtrees of a prescribed size can be computationally expensive because of their potentially large number even in small, sparse networks. Moreover, most of the existing algorithms for subtree counting belong to the subtree-centric approaches, which search for a specific single subtree type at a time, potentially taking more time by searching again on the same network.

Methods: In this paper, we propose a network-centric algorithm (MTMO) to efficiently count k-size subtrees. Our algorithm is based on the enumeration of all connected sets of k–1 edges, incorporates a labeled rooted tree data structure in the enumeration process to reduce the number of isomorphism tests required, and uses an array-based indexing scheme to simplify the subtree counting method.

Results: The experiments on three representative undirected complex networks show that our algorithm is roughly an order of magnitude faster than existing subtree-centric approaches and base network-centric algorithm which does not use rooted tree, allowing for counting larger subtrees in larger networks than previously possible. We also show major differences between unicellular and multicellular organisms. In addition, our algorithm is applied to find network motifs based on pattern growth approach.

Conclusions: A network-centric algorithm which allows for a faster counting of non-induced subtrees is proposed. This enables us to count larger motif in larger networks than previously.

Graphical abstract

Keywords

complex network / evolutionary systems biology / network motif discovery / subtree counting / subtree isomorphism

Cite this article

EndNote

Ris (Procite)

Bibtex

Download citation ▾

Guanghui Li, Jiawei Luo, Zheng Xiao, Cheng Liang. MTMO: an efficient network-centric algorithm for subtree counting and enumeration. Quant. Biol., 2018, 6(2): 142‒154 https://doi.org/10.1007/s40484-018-0140-y

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D. and Alon, U. (2002) Network motifs: simple building blocks of complex networks. Science, 298, 824–827 CrossRef Pubmed Google scholar

[2]	Alon, N., Dao, P., Hajirasouliha, I., Hormozdiari, F. and Sahinalp, S. C. (2008) Biomolecular network motif counting and discovery by color coding. Bioinformatics, 24, i241–i249 CrossRef Pubmed Google scholar

[3]	Huan, J., Wang, W. and Prins, J. (2003) Efficient mining of frequent subgraphs in the presence of isomorphism. In Proc. Third IEEE Int’l Conf. on Data Mining, pp. 549–552

[4]	Kuramochi, M. and Karypis, G. (2001) Frequent subgraph discovery. In Proc. First IEEE Int’l Conf. on Data Mining, pp. 313–320

[5]	Liang, C., Li, Y., Luo, J. and Zhang, Z. (2015) A novel motif-discovery algorithm to identify co-regulatory motifs in large transcription factor and microRNA co-regulatory networks in human. Bioinformatics, 31, 2348–2355 CrossRef Pubmed Google scholar

[6]	Chen, X., Yan, C. C., Zhang, X., Zhang, X., Dai, F., Yin, J. and Zhang, Y. (2016) Drug-target interaction prediction: databases, web servers and computational models. Brief. Bioinform., 17, 696–712 CrossRef Pubmed Google scholar

[7]	Chen, X., Yan, C. C., Zhang, X. and You, Z. H. (2017) Long non-coding RNAs and complex diseases: from experimental results to computational models. Brief. Bioinformatics, 18, 558–576 Pubmed

[8]	Chen, X., Huang, Y. A., You, Z. H., Yan, G. Y. and Wang, X. S. (2017) A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics, 33, 733–739 Pubmed

[9]	Chen, X., Xie, D., Zhao, Q. and You, Z. H. (2017) MicroRNAs and complex diseases: from experimental results to computational models. Brief. Bioinform., doi: 10.1093/bib/bbx130 Pubmed

[10]	Chen, X., Yan, C. C., Zhang, X., You, Z. H., Deng, L., Liu, Y., Zhang, Y. and Dai, Q. (2016) WBSMDA: within and between score for MiRNA-disease association prediction. Sci. Rep., 6, 21106 CrossRef Pubmed Google scholar

[11]	Dao, P., Schönhuth, A., Hormozdiari, F., Hajirasouliha, I., Sahinalp, S. C. and Este, M. (2009) Quantifying systemic evolutionary changes by color coding confidence-scored PPI networks. In 9th Int’l Workshop on Algorithms in Bioinformatics, pp. 37–48

[12]	Omidi, S., Schreiber, F. and Masoudi-Nejad, A. (2009) MODA: an efficient algorithm for network motif discovery in biological networks. Genes Genet. Syst., 84, 385–395 CrossRef Pubmed Google scholar

[13]	Alon, N., Yuster, R. and Zwick, U. (1995) Color-coding. J. Assoc. Comput. Mach., 42, 844–856 CrossRef Google scholar

[14]	Zhao, Z., Khan, M., Kumar, V. S. A. and Marathe, M. V. (2010) Subgraph enumeration in large social contact networks using parallel color coding and streaming. In Proc. IEEE 39th Int’l Conf. on Parallel Processing, pp. 594–603

[15]	Zhao, Z., Wang, G., Butt, A. R., Khan, M., Kumar, V. S. A. and Marathe, M. V. (2012) SAHAD: subgraph analysis in massive networks using Hadoop. In Proc. 26th Int’l. Parallel and Distributed Processing Symp., pp. 390–401

[16]	Slota, G. M. and Madduri, K. (2013) Fast approximate subgraph counting and enumeration. In Proc. IEEE 42nd Int’l Conf. on Parallel Processing, pp. 210–219

[17]	Slota, G. M. and Madduri, K. (2014) Complex network analysis using parallel approximate motif counting. In Proc. 28th Int’l. Parallel and Distributed Processing Symp., pp. 405–414

[18]	Slota, G. M. and Madduri, K. (2015) Parallel color-coding. Parallel Comput., 47, 51–69 CrossRef Google scholar

[19]	Wernicke, S. (2006) Efficient detection of network motifs. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 3, 347–359 CrossRef Pubmed Google scholar

[20]	Kashani, Z. R., Ahrabian, H., Elahi, E., Nowzari-Dalini, A., Ansari, E. S., Asadi, S., Mohammadi, S., Schreiber, F. and Masoudi-Nejad, A. (2009) Kavosh: a new algorithm for finding network motifs. BMC Bioinformatics, 10, 318 CrossRef Pubmed Google scholar

[21]	Khakabimamaghani, S., Sharafuddin, I., Dichter, N., Koch, I. and Masoudi-Nejad, A. (2013) QuateXelero: an accelerated exact network motif detection algorithm. PLoS One, 8, e68073 CrossRef Pubmed Google scholar

[22]	Paredes, P. and Ribeiro, P. (2013) Towards a faster network-centric subgraph census. In IEEE/ACM Int’l Conf. on Advances in Social Networks Analysis and Mining, pp. 264–271

[23]	Ferreira, R., Grossi, R. and Rizzi, R. (2011) Output-sensitive listing of bounded-size trees in undirected graphs. In Proc. ESA’11, pp. 275–286

[24]	Wasa, K. (2016) Enumeration of enumeration algorithms. arXiv:1605.05102

[25]	Bu, D., Zhao, Y., Cai, L., Xue, H., Zhu, X., Lu, H., Zhang, J., Sun, S., Ling, L., Zhang, N., (2003) Topological structure analysis of the protein-protein interaction network in budding yeast. Nucleic Acids Res., 31, 2443–2450 CrossRef Pubmed Google scholar

[26]	Batagelj, V. and Mrvar, A. (2006) Pajek Datasets. Available: http://vlado.fmf.uni-lj.si/pub/networks/data/

[27]	ISCAS89 benchmark suite. http://www.cbl.ncsu.edu/CBL Docs/iscas89.html

[28]	Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E. and Dawson, S. M. (2003) The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol., 54, 396–405 CrossRef Google scholar

[29]	Newman, M. (2009) Network Data. Available: http://www-personal.umich.edu/˜mejn/netdata/

[30]	Xenarios, I., Salwínski, L., Duan, X. J., Higney, P., Kim, S. M. and Eisenberg, D. (2002) DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res., 30, 303–305 CrossRef Pubmed Google scholar

[31]	Mewes, H. W., Amid, C., Arnold, R., Frishman, D., Güldener, U., Mannhaupt, G., Münsterkötter, M., Pagel, P., Strack, N., Stümpflen, V., (2004) MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Res., 32, D41– D44 CrossRef Pubmed Google scholar

[32]	Wan, S. and Zou, Q. (2017) HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing. Algorithms Mol. Biol., 12, 25 CrossRef Pubmed Google scholar

[33]	Zou, Q., Hu, Q., Guo, M. and Wang, G. (2015) HAlign: fast multiple similar DNA/RNA sequence alignment based on the centre star strategy. Bioinformatics, 31, 2475–2481 CrossRef Pubmed Google scholar

[34]	Stinson, D. (1999) Combinatorial Algorithms: Generation, Enumeration, and Search, pp. 48–52. Boca Raton: CRC Press

[35]	Alamgir, Z. and Abbasi, S. (2007) Combinatorial algorithms for listing paths in minimal change order. In Proc. Fourth Conf. Combinatorial and Algorithmic Aspects of Networking, pp. 112–130

[36]	Hilton, P. and Pedersen, J. (1991) Catalan numbers, their generalization, and their uses. Math. Intell., 13, 64–75 CrossRef Google scholar

[37]	Aho, A., Hopcroft, J. and Ullman, J. (1974) The Design and Analysis of Computer Algorithms, pp 84–85. New Jersey: Addison-Wesley Publishing Co.

[38]	Heubach, S. and Mansour, T. (2004) Compositions of n with parts in a set. Congr. Numer., 168, 127–143

SUPPLEMENTARY MATERIALS

The supplementary materials can be found online with this article at https://doi.org/ 10.1007/s40484-018-0140-y.

ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (No. 61572180) and Scientific and Technological Research Project of Education Department in Jiangxi Province (No. GJJ170383).

COMPLIANCE WITH ETHICS GUIDELINES

The authors Guanghui Li, Jiawei Luo, Zheng Xiao and Cheng Liang declare that they have no conﬂict of interests.This article does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature

AI Summary AI Mindmap

PDF(626 KB)

Accesses

Citations

Detail

Sections

Recommended

Received	Revised	Accepted	Published
03 Nov 2017	14 Dec 2017	25 Dec 2017	11 Jun 2018
Issue Date
11 Jun 2018

About the journal

Aims & scopes

Description

Editorial board

Abstracting / Indexing

Cover gallery

Contact us

Browse

Just accepted

Online first

Latest issue

All volumes and issues

Collections

Featured articles

Most accessed

Most cited

Collections

Authors & reviewers

Online submisson

Call for papers

Editorial policy

Guidelines for authors

Download templates

Classifications via endnote

Guidelines for reviewers

Author FAQs