About the Data

Two primary datasets of pulmonary CT scans have been made available for this challenge.

Lung Image Database Consortium (LIDC-IDRI)

Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), with active participation from the Food and Drug Administration (FDA)

Located on the Cancer Imaging Archive website and freely available under the Creative Commons Attribution 3.0 Unported License, this image collection “is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis.” It consists of 1,018 cases in which four thoracic radiologists have independently labeled nodules and non-nodules. The labels include freehand outlines of nodules ≥ 3 mm in diameter on each CT slice where the nodule is visible, and approximate centroid markings for nodules < 3 mm and for non-nodules. Read more about the data on the data documentation page, and please help keep that page up to date!
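The labeling rule described above can be written as a small function. This is only a minimal sketch for orientation: the function name, its arguments, and the returned strings are hypothetical and do not correspond to any field in the LIDC-IDRI annotation files; only the 3 mm threshold and the outline/centroid distinction come from the description above.

```python
from typing import Optional

# Size threshold taken from the dataset description above.
OUTLINE_THRESHOLD_MM = 3.0


def expected_annotation(is_nodule: bool, diameter_mm: Optional[float]) -> str:
    """Return the kind of mark a radiologist is expected to have left.

    Hypothetical helper: nodules of at least 3 mm get a freehand outline
    on every slice where they are visible; smaller nodules and
    non-nodules get only an approximate centroid.
    """
    if is_nodule and diameter_mm is not None and diameter_mm >= OUTLINE_THRESHOLD_MM:
        return "outline"
    return "centroid"


if __name__ == "__main__":
    for is_nodule, diameter in [(True, 5.0), (True, 2.0), (False, 6.0)]:
        print(is_nodule, diameter, expected_annotation(is_nodule, diameter))
```

One practical consequence is that segmentation-style targets exist only for the larger nodules; smaller nodules and non-nodules provide point locations only.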

DrivenData is grateful to the maintainers of this resource for making it available for this project.


  • Armato III, Samuel G., McLennan, Geoffrey, Bidaut, Luc, McNitt-Gray, Michael F., Meyer, Charles R., Reeves, Anthony P., … Clarke, Laurence P. (2015). Data From LIDC-IDRI. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2015.LO9QL9SX
  • Armato SG III, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, Zhao B, Aberle DR, Henschke CI, Hoffman EA, Kazerooni EA, MacMahon H, van Beek EJR, Yankelevitz D, et al.: The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical Physics, 38: 915-931, 2011. (paper)
  • Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. (paper)

Link to data download

Additional datasets may be useful as well; check the data documentation page for more resources.

National Lung Screening Trial (NLST)

National Cancer Institute (NCI), part of the National Institutes of Health (NIH)

CT scans from a randomized controlled trial, including records for 26,254 patients totaling 203,099 series and 21,082,502 images. Lab results, including 177,500 records of abnormalities on CT and 2,100 records of lung cancer, have been made available for training. Read more about the data on the data documentation page, and please help keep that page up to date!
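To get a feel for the scale of these counts, the averages per patient and per series can be worked out directly. This is a rough sketch using only the totals quoted above; the constant and function names are made up for illustration, and the results are plain averages that say nothing about how series or images are actually distributed across patients.

```python
# Totals quoted in the NLST description above.
PATIENTS = 26_254
SERIES = 203_099
IMAGES = 21_082_502


def average(total: int, units: int) -> float:
    """Plain arithmetic mean of `total` items spread over `units`."""
    return total / units


series_per_patient = average(SERIES, PATIENTS)
images_per_series = average(IMAGES, SERIES)
images_per_patient = average(IMAGES, PATIENTS)

if __name__ == "__main__":
    print(f"series per patient: {series_per_patient:.1f}")
    print(f"images per series:  {images_per_series:.1f}")
    print(f"images per patient: {images_per_patient:.1f}")
```

On average this works out to several series per patient and on the order of a hundred images per series, which is worth keeping in mind when planning storage and loading.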

DrivenData is extremely grateful to the National Cancer Institute for access to NCI's data collected by the National Lung Screening Trial (NLST).