Two primary datasets of pulmonary CT scans have been made available for this challenge.

Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI)

Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), with active participation from the Food and Drug Administration (FDA)

Hosted on The Cancer Imaging Archive (TCIA) website and freely available under the Creative Commons Attribution 3.0 Unported License, this image collection “is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis.” It consists of 1,018 cases in which four thoracic radiologists independently labeled nodules and non-nodules. Nodules ≥ 3 mm in diameter are marked with freehand outlines in every CT slice where the nodule is visible; nodules < 3 mm and non-nodules are marked only with approximate centroids. Read more about the data on the data documentation page, and please help keep it up to date.
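The marking rule described above can be summarized in a small sketch. The names used here (`MarkType`, `Finding`, `diameter_mm`, `expected_mark`) are hypothetical illustrations, not the actual LIDC-IDRI annotation format, and representing each finding by a single diameter value is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class MarkType(Enum):
    # Hypothetical labels for the two kinds of marks described in the text.
    OUTLINE = "freehand outline"
    CENTROID = "approximate centroid"


@dataclass
class Finding:
    # Hypothetical record of one radiologist's finding on a CT scan.
    is_nodule: bool
    diameter_mm: float  # assumed single diameter estimate; ignored for non-nodules


# Size threshold from the text: nodules at or above it are outlined.
OUTLINE_THRESHOLD_MM = 3.0


def expected_mark(finding: Finding) -> MarkType:
    """Return the kind of mark the described protocol calls for."""
    if finding.is_nodule and finding.diameter_mm >= OUTLINE_THRESHOLD_MM:
        # Nodules >= 3 mm: freehand outline in every slice where visible.
        return MarkType.OUTLINE
    # Nodules < 3 mm and all non-nodules: approximate centroid only.
    return MarkType.CENTROID


if __name__ == "__main__":
    for f in [Finding(True, 6.5), Finding(True, 2.0), Finding(False, 8.0)]:
        print(f, "->", expected_mark(f).value)
```

Note that size only matters for nodules: a large non-nodule still receives a centroid mark, not an outline.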

Additional datasets may be useful as well; check the data documentation page for more resources.

National Lung Screening Trial (NLST)

National Cancer Institute (NCI), part of the National Institutes of Health (NIH)

This collection contains CT scans from a randomized controlled trial, with records for 26,254 patients totaling 203,099 series and 21,082,502 images. Lab results, including 177,500 records of abnormalities on CT and 2,100 records of lung cancer, have been made available for training. Read more about the data on the data documentation page, and please help keep it up to date.

