Inter-observer variability between readers of CT images: all for one and one for all
Nikolas S. Kulberg, Roman V. Reshetnikov, Vladimir P. Novik, Alexey B. Elizarov, Maxim A. Gusev, Victor A. Gombolevskiy, Anton V. Vladzymyrskyy, Sergey P. Morozov
Digital Diagnostics, 2021, Vol. 2, Issue 2, pp. 105-118.
BACKGROUND: The markup of medical image datasets is based on the subjective interpretation of the observed entities by radiologists. There is currently no widely accepted protocol for determining ground truth based on radiologists’ reports.
AIM: To assess the accuracy of radiologist interpretations and their agreement for the publicly available dataset “CTLungCa-500”, as well as the relationship between these parameters and the number of independent readers of CT scans.
MATERIALS AND METHODS: Thirty-four radiologists took part in the dataset markup. The dataset included 536 patients who were at high risk of developing lung cancer. For each scan, six radiologists independently created a report. An arbitrator then reviewed the lesions they had identified. To assess diagnostic accuracy, the numbers of true-positive, false-positive, true-negative, and false-negative findings were calculated for each reader. Inter-observer variability was analyzed using the percentage agreement metric.
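The two measures named above can be illustrated with a minimal Python sketch. It assumes that each reader's report is reduced to a present/absent decision per candidate site and that the arbitrated markup serves as ground truth; the abstract does not give the exact definitions used in the study, so the function names, the pairwise form of percentage agreement, and the toy data are all hypothetical.

```python
from itertools import combinations


def confusion_counts(reader, truth):
    """Count (TP, FP, TN, FN) for one reader against the arbitrated truth."""
    tp = sum(r and t for r, t in zip(reader, truth))
    fp = sum(r and not t for r, t in zip(reader, truth))
    tn = sum(not r and not t for r, t in zip(reader, truth))
    fn = sum(not r and t for r, t in zip(reader, truth))
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of sites where the reader's decision matches the truth."""
    return (tp + tn) / (tp + fp + tn + fn)


def percent_agreement(readings):
    """Mean pairwise percentage agreement (an assumed definition)."""
    pairs = list(combinations(readings, 2))
    agree = sum(sum(a == b for a, b in zip(x, y)) for x, y in pairs)
    total = sum(len(x) for x, _ in pairs)
    return 100.0 * agree / total


# Hypothetical toy data: five candidate sites, three readers.
truth = [True, True, False, False, True]
r1 = [True, False, False, False, True]
r2 = [True, True, True, False, True]
r3 = [True, False, False, False, False]

print(confusion_counts(r1, truth))        # (2, 0, 2, 1)
print(accuracy(*confusion_counts(r1, truth)))
print(percent_agreement([r1, r2, r3]))    # 60.0
```

Pairwise agreement is only one of several possible readings of "percentage agreement"; a stricter variant would count a site as agreed only when all readers give the same decision.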
RESULTS: Increasing the number of independent readers interpreting each CT scan increased accuracy and decreased agreement. Most disagreements concerned the presence of a lung nodule at a specific site on the CT scan.
CONCLUSION: If arbitration is provided, an increase in the number of independent initial readers can improve their combined accuracy. The experience and diagnostic accuracy of individual readers have no bearing on the quality of a crowd-tagging annotation. The optimal balance of markup accuracy and cost was achieved at four independent readings per CT scan.
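Why pooling readers can help when an arbitrator is available can be sketched as follows. The sketch assumes that the arbitrator discards every false-positive finding, so the combined markup of k readers misses only those lesions that none of the k readers reported. This is a simplification introduced for illustration, not the study's procedure, and the function name and data are hypothetical.

```python
from itertools import combinations


def pooled_sensitivity(readings, truth, k):
    """Mean share of true lesions found by at least one of k readers,
    averaged over all k-reader subsets."""
    n_true = sum(truth)
    scores = []
    for subset in combinations(readings, k):
        found = sum(
            t and any(r[i] for r in subset) for i, t in enumerate(truth)
        )
        scores.append(found / n_true)
    return sum(scores) / len(scores)


# Hypothetical toy data: four true lesions and one non-lesion site.
truth = [True, True, True, True, False]
readers = [
    [True, False, False, True, False],
    [False, True, False, True, True],
    [False, False, True, False, False],
]

for k in (1, 2, 3):
    print(k, round(pooled_sensitivity(readers, truth, k), 3))
```

In this toy case the pooled sensitivity rises from about 0.417 with one reader to 0.75 with two and 1.0 with three. Each added reader also adds reading cost, which is the trade-off the conclusion refers to.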
KEYWORDS: X-ray computed tomography; datasets as topic; ground truth; observer variation