COMMUNICATION

An examination of the OMIM database for associating mutation to a consensus reference sequence

  • 1. Shanghai Center for Bioinformation Technology, Shanghai 200235, China; 2. School of Life Science, Fudan University, Shanghai 200433, China; 3. School of Life Science and Technology, Tongji University, Shanghai 200295, China; 4. University of Wisconsin-Milwaukee, Milwaukee, WI 53201, USA

Received date: 24 Feb 2012

Accepted date: 19 Mar 2012

Published date: 01 Mar 2012

Abstract

Gene mutations (e.g., substitutions, insertions and deletions) and their associated phenotype information are important biomedical knowledge. Many biomedical databases (e.g., OMIM) incorporate such data. However, few studies have examined the quality of these data. In the current study, we examined the quality of protein single-point mutations in OMIM and tested whether the mutation positions align with the corresponding reference sequences. Our results show that close to 20% of the mutation data cannot be mapped to a single reference sequence. The failed mappings are caused by position conflicts, site shifting (peptide, N-terminal methionine) and other types of data error. We propose a preliminary model to resolve such inconsistencies in the OMIM database.
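
The kind of position check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the mutation notation (wild-type residue, 1-based position, mutant residue, e.g. "A4T"), the function names, the category labels and the `max_shift` search window are all assumptions made for illustration. The sketch tests whether the wild-type residue of a single-point mutation is found at the stated position of a reference sequence, then at an offset of one (numbering that omits the N-terminal methionine), then at larger offsets (numbering based on a sequence lacking a leading peptide).

```python
import re

# Assumed notation: <wild-type residue><1-based position><mutant residue>, e.g. "A4T".
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(text):
    """Split a mutation string such as 'A4T' into ('A', 4, 'T')."""
    match = MUTATION_RE.match(text)
    if match is None:
        raise ValueError(f"not a single-point mutation: {text!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def classify_mapping(mutation, reference, max_shift=30):
    """Return (category, shift) for one mutation against one reference sequence.

    The categories are hypothetical labels: 'match', 'methionine_shift',
    'offset_shift' and 'conflict'.
    """
    wild_type, position, _ = parse_mutation(mutation)

    def residue_at(pos):
        # Positions are 1-based; return None when pos falls outside the sequence.
        return reference[pos - 1] if 1 <= pos <= len(reference) else None

    # Case 1: the wild-type residue sits exactly at the stated position.
    if residue_at(position) == wild_type:
        return ("match", 0)

    # Case 2: numbering that skipped the N-terminal methionine is off by one.
    if reference.startswith("M") and residue_at(position + 1) == wild_type:
        return ("methionine_shift", 1)

    # Case 3: numbering based on a sequence without a leading peptide is off
    # by the length of that peptide; scan a bounded window of offsets.
    for shift in range(2, max_shift + 1):
        if residue_at(position + shift) == wild_type:
            return ("offset_shift", shift)

    # Case 4: no offset in the window explains the stated position.
    return ("conflict", None)


# Toy reference sequence, invented for this example.
reference = "MKTAYIAKQR"
results = {m: classify_mapping(m, reference) for m in ["A4T", "T2S", "Q5E", "W3G"]}
```

On a short toy sequence a larger offset can match by chance, so a realistic check would presumably require several mutations of the same gene to agree on one offset before accepting it.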

Cite this article

Zuofeng Li, Beili Ying, Xingnan Liu, Xiaoyan Zhang, Hong Yu. An examination of the OMIM database for associating mutation to a consensus reference sequence[J]. Protein & Cell, 2012, 3(3): 198-203. DOI: 10.1007/s13238-012-2037-2
