Bringing AI tools for lung cancer detection from concept to clinic.

In the Concept to Clinic challenge, hundreds of data scientists and engineers from around the world came together to build open source tools to fight the world’s deadliest cancer.

[Screenshot: the Concept to Clinic application interface]

When advances in AI applications are trapped in inaccessible research code, nobody else can benefit from the code’s insights.

Concept to Clinic is a novel approach for catalyzing technical communities to make artificial intelligence more useful for the real people whose lives it can improve—like lung cancer radiologists and the patients they serve.



For this project, algorithms came from cutting-edge research in the use of deep learning for lung cancer screening. Trained on thousands of prior scans labeled by teams of radiologists, these algorithms analyze images to detect patterns that indicate potential lung cancer and to assess whether additional procedures are warranted.


As a new kind of data challenge, this project focused on catalyzing the software development process that could make these advances useful, not just for data scientists interested in cutting-edge methods but for clinicians and patients on the front lines of lung cancer detection. Contributors adapted the algorithms so that they could be applied to the assessment of individual nodules, and connected them to a robust backend and user interface so that radiologists can try them out and provide their own input on an ongoing basis.


The tool built during Concept to Clinic started out as nothing more than a design sketch. The perspectives of practicing radiologists and imaging researchers were built directly into the tool, from the technical design document through the prototype demo and user feedback.

These conversations—and user-oriented design in AI more broadly—are critical to bridging the AI application gap that divides research algorithms from the benefit they can have on real users and patient lives.


The prototype developed during the live challenge period between August 2017 and January 2018 focused on helping clinicians flag, assess, and report concerning nodules from CT scans. It consists of four primary user interface screens:

Screen I: Open Image

Import DICOM images from CT scans for your case.

  • View available DICOM images from patient scans
  • Select scans of interest and preview imagery
  • Start a new case by importing selected images (a minimal loading sketch follows this list)
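
Below is a minimal sketch of what the import step involves under the hood: reading a directory of DICOM slices and stacking them into a 3D volume. It assumes the pydicom and numpy packages; the directory path and function name are illustrative, not the project's actual import code.

```python
# A minimal sketch of the import step, assuming the pydicom and numpy
# packages. The directory path and function name are illustrative, not
# the project's actual import code.
from pathlib import Path

import numpy as np
import pydicom


def load_ct_volume(dicom_dir):
    """Read the DICOM slices in a directory and stack them into a 3D volume."""
    slices = [pydicom.dcmread(path) for path in Path(dicom_dir).glob("*.dcm")]
    # Order slices along the scan axis using the z component of
    # ImagePositionPatient, which DICOM records for each slice.
    slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))
    volume = np.stack([s.pixel_array for s in slices])
    return volume, slices[0]  # the volume plus one header for metadata


volume, header = load_ct_volume("data/patient_0001/")
print(volume.shape, header.PatientID)
```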

Screen II: Detect and Select

AI algorithms automatically flag potential nodules and suggest risk scores for radiologists to use in diagnostics.

  • See the specific parts of the scan that the algorithms flag as potentially concerning
  • Assign each candidate a risk probability that it warrants follow-up, based on past screening results
  • Let radiologists adopt or dismiss proposed candidates, and add new ones manually (a sketch of this review flow follows the list)
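
As a sketch of the review flow above, here is one way flagged candidates and radiologist decisions could be represented. The Candidate fields and review helper are hypothetical, not the project's actual backend models.

```python
# A sketch of how flagged candidates and radiologist decisions might be
# represented. The Candidate fields and review helper are hypothetical,
# not the project's actual backend models.
from dataclasses import dataclass


@dataclass
class Candidate:
    """A location the detection algorithms flag as potentially concerning."""
    x: int                # voxel coordinates within the CT volume
    y: int
    z: int
    probability: float    # estimated chance that follow-up is warranted
    reviewed: bool = False
    accepted: bool = False


def review(candidate, accept):
    """Record a radiologist's decision to adopt or dismiss a candidate."""
    candidate.reviewed = True
    candidate.accepted = accept


candidates = [Candidate(120, 88, 42, 0.91), Candidate(64, 200, 17, 0.12)]
review(candidates[0], accept=True)    # adopt the high-probability candidate
review(candidates[1], accept=False)   # dismiss the unlikely one
```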

Screen III: Annotate and Segment

Radiologists use additional AI-powered measurement tools to add information for top areas of concern.

  • Display each accepted nodule of concern for radiologists to evaluate
  • Enable radiologists to add information about relevant characteristics and attach their own risk estimates, which are captured as data to power continued learning and improvement
  • Surface suggested boundaries for each nodule through segmentation algorithms, which the radiologist can manually refine (an illustrative sketch follows the list)
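
For illustration, the sketch below stands in for the segmentation step with simple thresholding and connected components: it grows a suggested boundary around a seed voxel inside an accepted nodule. The threshold and toy volume are assumptions for the sketch; the project's segmentation algorithms are learned models rather than this heuristic.

```python
# An illustrative stand-in for the segmentation step, using simple
# thresholding and connected components from scipy. The threshold and toy
# volume are assumptions for the sketch; the project's segmentation
# algorithms are learned models rather than this heuristic.
import numpy as np
from scipy import ndimage


def segment_nodule(volume, seed, threshold=-400):
    """Return a binary mask of the connected region around a seed voxel.

    `volume` holds Hounsfield units; `seed` is a (z, y, x) voxel inside
    the accepted nodule.
    """
    mask = volume > threshold        # keep tissue denser than the cutoff
    labels, _ = ndimage.label(mask)  # split the mask into connected blobs
    return labels == labels[seed]    # keep only the blob containing the seed


# Toy volume: air everywhere except a small dense block acting as a nodule.
volume = np.full((32, 32, 32), -1000)
volume[10:14, 10:14, 10:14] = 60
mask = segment_nodule(volume, seed=(11, 11, 11))
print(mask.sum(), "voxels inside the suggested boundary")
```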

Screen IV: Report and Export

Standardized reports are generated automatically based on best practices.

  • Display information captured and automatically organized from previous screens into standard reporting templates
  • Export data for record-keeping and follow-up (a minimal export sketch follows the list)
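
A minimal sketch of the export step: collect what earlier screens captured into a record and write it to disk. The field names and JSON format are illustrative, not a claim about the prototype's standardized reporting templates.

```python
# A minimal sketch of the export step: collect what earlier screens
# captured into a record and write it to disk. The field names and JSON
# format are illustrative, not the prototype's reporting templates.
import json
from datetime import date


def build_report(patient_id, nodules):
    """Organize accepted nodules into a simple exportable record."""
    return {
        "patient_id": patient_id,
        "report_date": date.today().isoformat(),
        "nodules": [
            {
                "location": nodule["location"],
                "diameter_mm": nodule["diameter_mm"],
                "risk_probability": nodule["risk_probability"],
            }
            for nodule in nodules
        ],
    }


report = build_report(
    "PAT-0001",
    [{"location": "right upper lobe", "diameter_mm": 8.2, "risk_probability": 0.74}],
)
with open("report_PAT-0001.json", "w") as f:
    json.dump(report, f, indent=2)
```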


The code developed during this challenge is openly available for anyone to learn from and use. As an open source project, the impact from the contributions made during this challenge should extend well beyond the boundaries of one prototype or repository. You can find the latest version of the project codebase on GitHub.

Beyond the progress made during this challenge, there are many opportunities to continue to improve upon the codebase. Machine learning models can be refined to provide increasingly accurate predictions for the locations and descriptions of nodules. User testing can help to refine the user interface and backend functionality. Nodules can be tracked longitudinally to give radiologists better tools for monitoring and quantifying changes over time. All of these enhancements could help bring useful functionality to radiologists and patients more quickly.

The contributors

Congratulations to the top contributors! Meet the overall leaders in points across the entire project.




See the full leaderboard →

From August 2017 to January 2018, more than 600 contributors from 68 countries around the world entered this challenge.

Throughout the challenge, $100,000 was awarded to top contributors in the different areas needed for development: prediction models, backend engineering, frontend design and implementation, and community development. Contributors submitted over 150 pull requests (PRs) to the project repository, addressing more than 130 registered issues. Contributions were awarded points based on their assessed level of effort and their importance in moving the project forward.

“It was overwhelming to see the enthusiasm of the participants in the project. Every time I started the software on my machine, it had another new feature that made it easier to explore and navigate through patient scans.”

Willi Gierke, 1st Place

Hear more from contributors Willi Gierke and Serhiy Shekhovtsov in interviews on our blog.

Project background

The problem

Lung cancer causes more deaths each year than any other cancer. As with all cancers, the earlier it can be detected, the better the patient outcomes. When the disease is still localized in the lungs, the five-year survival rate is 55%. For tumors that have spread to other organs, the five-year survival rate drops to just 4%.

Read more about the problem →

Built on machine learning

The primary algorithm adapted in this project comes from the top-performing model in the 2017 Data Science Bowl, a competition in which thousands of machine learning models vied to accurately classify cancer risk from lung CT scans. This solution was able to detect nodules with greater than 85% recall and to reduce the rate of false positive results compared with previous clinical studies. The algorithm is built on U-Net, a deep learning segmentation architecture: a 3D region proposal network detects potential nodules, and a similar network engineered for diagnosis then assigns cancer probabilities to the proposed nodules. Combined, these two U-Net-based modules outperformed all other models entered in the competition.
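
To make the architecture concrete, here is an illustrative 3D convolutional block of the kind a U-Net-style network stacks at each resolution level, written in PyTorch. It is a toy sketch of the general building block, not the winning Data Science Bowl model or the project's adapted code.

```python
# An illustrative 3D convolutional block of the kind a U-Net-style
# network stacks at each resolution level. This is a toy sketch of the
# general architecture, not the winning Data Science Bowl model.
import torch
import torch.nn as nn


class ConvBlock3D(nn.Module):
    """Two 3D convolutions with batch norm and ReLU, one U-Net level's worth."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


# A single-channel 64x64x64 CT patch passes through the block.
patch = torch.randn(1, 1, 64, 64, 64)
features = ConvBlock3D(1, 24)(patch)
print(features.shape)  # torch.Size([1, 24, 64, 64, 64])
```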

Read more about the algorithms →

Project partners

From before the project started, we had an amazing group of technical, clinical, and domain experts guiding the structure of the challenge and the design of the technical approach. This was an ambitious project that relied on a wide range of expertise and experience. A technical panel provided input on code contributions, while clinicians and researchers provided feedback on the design and implementation of those changes, along with context for why this matters in a clinical setting.

Read more about the partners →

Thank you to all of the amazing contributors to this project!

Special thanks to the Bonnie J. Addario Lung Cancer Foundation (ALCF), which has been an instrumental partner in designing and sponsoring the challenge. Thanks also to the incredible partners and all the in-kind sponsors who made this project possible: Amazon Web Services, GitHub, Fossa, TravisCI, and Docker.