An examination of the OMIM database for associating mutation to a consensus reference sequence

Zuofeng Li, Beili Ying, Xingnan Liu, Xiaoyan Zhang, Hong Yu

Protein Cell ›› 2012, Vol. 3 ›› Issue (3) : 198-203. DOI: 10.1007/s13238-012-2037-2
COMMUNICATION

Abstract

Gene mutations (e.g., substitutions, insertions, and deletions) and their related phenotype information are important biomedical knowledge, and many biomedical databases (e.g., OMIM) incorporate such data. However, few studies have examined the quality of these data. In the current study, we examined the quality of protein single-point mutations in OMIM and determined whether the corresponding reference sequences align with the reported mutation positions. Our results show that close to 20% of the mutation data cannot be mapped to a single reference sequence. The failed mappings are caused by position conflicts, site shifting (peptide, N-terminal methionine), and other types of data error. We propose a preliminary model to resolve such inconsistencies in the OMIM database.
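
To make the idea of mapping mutations to a reference sequence concrete, the following is a minimal sketch, not the model proposed in the article. It assumes mutations are written in one-letter form such as "K2E" (wild-type residue, 1-based position, mutant residue) and looks for a single position offset under which every wild-type residue of a gene agrees with the reference. The function names, the toy sequence, and the `max_shift` bound are hypothetical and chosen only for illustration.

```python
import re
from typing import List, Optional, Tuple

# Assumed notation: wild-type residue, 1-based position, mutant residue (e.g. "K2E").
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(text: str) -> Tuple[str, int, str]:
    """Split a one-letter point mutation into (wild_type, position, mutant)."""
    match = MUTATION_RE.match(text)
    if match is None:
        raise ValueError(f"not a single-point mutation: {text!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def residue_matches(sequence: str, wild_type: str, position: int) -> bool:
    """True if the 1-based position exists and holds the wild-type residue."""
    return 1 <= position <= len(sequence) and sequence[position - 1] == wild_type


def find_consistent_shift(
    sequence: str, mutations: List[str], max_shift: int = 30
) -> Optional[int]:
    """Return the smallest offset (0..max_shift) that maps all mutations.

    Offset 0 means the reported positions fit the reference directly.
    Offset 1 is what one would expect if the numbering skipped the
    N-terminal methionine; larger offsets could reflect a removed
    leading peptide. None means no offset in range fits (a conflict).
    """
    parsed = [parse_mutation(m) for m in mutations]
    for shift in range(max_shift + 1):
        if all(residue_matches(sequence, wt, pos + shift) for wt, pos, _ in parsed):
            return shift
    return None


# Toy reference sequence (hypothetical, not taken from any database).
reference = "MKTAYIAKQR"
direct = find_consistent_shift(reference, ["K2E", "Y5F"])
shifted = find_consistent_shift(reference, ["K1E", "Y4F"])
conflict = find_consistent_shift(reference, ["W3G"])
```

Requiring all mutations of a gene to agree on one offset is a design choice of this sketch: a single residue matching at some shifted position is weak evidence on its own, because any given amino acid recurs many times in a typical protein.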

Keywords

single-point mutation / OMIM / reference sequence / data quality

Cite this article

Zuofeng Li, Beili Ying, Xingnan Liu, Xiaoyan Zhang, Hong Yu. An examination of the OMIM database for associating mutation to a consensus reference sequence. Prot Cell, 2012, 3(3): 198‒203 https://doi.org/10.1007/s13238-012-2037-2
