Official announcements

AWS Credits for Contributions to the Algorithms!

Thanks for all the awesome work so far! We're in a great place to use our remaining time improving the quality of our algorithms and how they integrate with the application. To that end, we want to make sure that contributors have the resources to train deep learning models and test their predictions. Our friends at AWS have generously provided us with credits to help with the cause. If you're planning to improve the algorithms, email info [at] drivendata [dot] org and tell us what you're planning to work on and why you need cloud compute resources to do it. We'll provide $1,000 in credits to each of the first 6 contributors planning to tackle big issues!
4 weeks ago

Join us for a walkthrough of the current state of the application!

We'll be running a walkthrough of the application for all contributors on Monday, November 27 at 4pm UTC ([check your time zone](https://time.is/1600_27_Nov_2017_in_UTC)). This will be a great chance to spin up on the project and see the latest version of the end-to-end application. Thanks so much for all of your hard work so far! Come see what we've accomplished and get motivated for what's next. Details for joining the webinar are below:

#### Join from PC, Mac, Linux, iOS or Android:
- https://zoom.us/j/378483110

#### Or iPhone one-tap:
- US: +16465588656,,378483110# or +16699006833,,378483110#

#### Or Telephone:
- Dial (for higher quality, dial a number based on your current location):
  - US: +1 646 558 8656 or +1 669 900 6833
- Meeting ID: 378 483 110
- [International numbers available here](https://zoom.us/zoomconference?m=iLlwfthYMV1QcYZrVsiHQY91HnZ5jVUj)
1 month, 3 weeks ago

MVP phase coming to a close!

Hi all,

We're excited by the frenzy of progress being made on the project over the last few weeks. Some logistical notes for how this phase will get closed out:

* PRs opened up until the MVP phase officially closes at midnight UTC tonight will be eligible for points on the MVP leaderboard. Naturally, that means the MVP leaderboard will be subject to change for a few days after the phase ends as the PRs are resolved.
* We will treat these PRs normally and they will go through the customary review process (as usual, our policy within the open source side of the house is to pretend the points don't exist!), except that we expect contributors to be extremely responsive in addressing questions or requested changes so that we can finalize the MVP leaderboard in a timely fashion.
* Once the phase ends, the milestone tag on GitHub will be switched over for issues that now properly belong in the feature building phase. This may not happen instantaneously, but no new points will be entered for the MVP phase after midnight UTC regardless of the GitHub milestone.

For a reminder of how prizes will be allocated at the end of a milestone, see the ['Prizes' page](https://concepttoclinic.drivendata.org/prizes).

Happy development!

- The C2C team
2 months, 3 weeks ago

Private Docker Cloud repos for early contributors

Just to sweeten the deal: the first 100 contributors to submit accepted PRs will each earn 5 private Docker Cloud repos, thanks to the generous sponsorship of our friends at Docker! Check out the Docker Cloud [site](https://cloud.docker.com/) to learn more about what you can do with private repositories.
5 months, 1 week ago

Launch Announcement

The Concept to Clinic challenge is live! Thanks for joining in this community effort to bring advances in AI to the front lines of lung cancer detection. Read up on the challenge and Sign Up to start contributing today!
5 months, 2 weeks ago

Newsfeed

@WGierke commented on PR #290: [WIP] Classification Evaluation

The current implementation takes 8.5h to run, gives an average accuracy/precision of 10% and an average loss of 3.55 :/
9 hours, 14 minutes ago

@WGierke opened a new pull request: #290: [WIP] Classification Evaluation

I'm currently adding the possibility to evaluate a classification model on the LIDC dataset, as described in #271. Furthermore, I'm benchmarking the model that's currently implemented. Even though I haven't finished yet, I wanted to show you what I'm currently working on.

## CLA
- [X] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
17 hours, 22 minutes ago

@lamby commented on PR #288: [WIP] Wizard flow

Let me know when this is ready for review.
19 hours, 43 minutes ago

@reubano commented on issue #268: Classification throws RuntimeError for real nodule location

@Serhiy-Shekhovtsov that's right! You did mention that before. I'll edit the original issue so others don't fall back into the same trap. Thanks!
20 hours, 39 minutes ago

@Serhiy-Shekhovtsov commented on issue #268: Classification throws RuntimeError for real nodule location

@reubano the patch is actually invalid:

> the test is running on a small DICOM image, and 72 is far outside the boundaries of the Z dimension (0, 28). The cropping will fail to create a patch of the expected size (96, 96, 96). **The test can be fixed by simply replacing `dicom_path_003` (the small image) with `dicom_paths[2]` (the full image).**

So I have fixed that test by creating two new tests in [this commit](https://github.com/concept-to-clinic/concept-to-clinic/pull/272/commits/df2738277856e0eb6e3c00d4c8e92011f9d2b446).
20 hours, 45 minutes ago

@reubano commented on issue #268: Classification throws RuntimeError for real nodule location

@Serhiy-Shekhovtsov did you add the new `test_classify_dicom_nodule` test from the included patch file?
21 hours, 19 minutes ago

@Serhiy-Shekhovtsov commented on issue #268: Classification throws RuntimeError for real nodule location

@reubano I can't reproduce it on the latest version. Can you check it again, please?
21 hours, 58 minutes ago

@caseyfitz commented on issue #131: Continuous improvement of nodule classification models (see #2)

Hi @swarm-ai, improved models would be _very welcome_. Good luck! There is currently no workflow for including training processes in the application codebase, but this is something we'd love to have. The minimum we currently need to incorporate an improved model is:

1. the weights, which live in the `assets/` subdir for each algorithm
2. the architecture, which lives in the `src/` subdir for each algorithm

With an eye towards future development, we'd be happy to see a PR that augments the algorithm directories with a `training/` subdir (in addition to the current `src` and `assets`); see the layout sketch below.
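For orientation, the layout being discussed would look roughly like this (a sketch only; `training/` is the proposed addition, and the file names are examples drawn from this thread):

```
prediction/src/algorithms/classify/
├── assets/      # trained weights, e.g. gtr123_model.ckpt
├── src/         # model architecture, e.g. gtr123_model.py
└── training/    # proposed: training scripts and process documentation
```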
1 day, 16 hours ago

@reubano commented on issue #131: Continuous improvement of nodule classification models (see #2)

@swarm-ai that repo is just for reference. Are you able to incorporate your performance enhancements into the code in this repo? The GRT model has already been included as per #4.
1 day, 17 hours ago

@WGierke opened a new pull request: #289: /<algorithm>/predict/: Only expect GET or POST requests

While skimming through [code that still needs to be tested in the prediction service](https://wgierke.github.io/stuff/prediction/index.html), I noticed this line that prevented the file from being completely covered by tests. The condition was never false, since the `methods=['GET', 'POST']` parameter prevents any request other than a `GET` or `POST` from reaching this LOC.

## CLA
- [ ] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
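For context, here is a minimal sketch of the pattern being removed (the route and view names are hypothetical, not the service's actual code): with `methods=['GET', 'POST']` on the route, Flask itself rejects every other HTTP method with a 405, so a trailing branch that checks for other methods is dead code.

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)


@app.route('/classify/predict/', methods=['GET', 'POST'])
def predict_view():
    if request.method == 'POST':
        return jsonify({'status': 'prediction started'})
    elif request.method == 'GET':
        return jsonify({'status': 'ready'})
    else:
        # Dead code: Flask already returns 405 for any method not listed
        # in `methods`, so this branch can never run or be covered by tests.
        abort(405)
```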
1 day, 19 hours ago

@swarm-ai commented on issue #131: Continuous improvement of nodule classification models (see #2)

Hi @reubano, I have been working on retraining the classifier and detector models for better performance. I am planning to document the process for both models and submit a pull request to the concept-to-clinic clone of the GRT code base here: https://github.com/concept-to-clinic/DSB2017. Will that work? I did not find any training code set up in the concept-to-clinic repo.
1 day, 22 hours ago

@reubano commented on issue #268: Classification throws RuntimeError for real nodule location

@Serhiy-Shekhovtsov still failing for me on master

```bash
$ sudo docker-compose -f local.yml run prediction pytest -vrsk src/tests/test_classification.py
Starting base ... 
Starting base ... done
======================================== test session starts =========================================
platform linux -- Python 3.6.3, pytest-3.1.3, py-1.5.2, pluggy-0.4.0 -- /usr/bin/python3.6
cachedir: .cache
rootdir: /app, inifile:
collected 6 items

src/tests/test_classification.py::test_classify_predict_load PASSED
src/tests/test_classification.py::test_classify_dicom PASSED
src/tests/test_classification.py::test_classify_real_nodule_small_dicom PASSED
src/tests/test_classification.py::test_classify_dicom_nodule FAILED
src/tests/test_classification.py::test_classify_real_nodule_full_dicom PASSED
src/tests/test_classification.py::test_classify_luna PASSED

============================================== FAILURES ==============================================
_____________________________________ test_classify_dicom_nodule _____________________________________

dicom_path_003 = '/images/LIDC-IDRI-0003/1.3.6.1.4.1.14519.5.2.1.6279.6001.101370605276577556143013894866/1.3.6.1.4.1.14519.5.2.1.6279.6001.170706757615202213033480003264'
model_path = '/app/src/algorithms/classify/assets/gtr123_model.ckpt'

    def test_classify_dicom_nodule(dicom_path_003, model_path):
>       predicted = trained_model.predict(dicom_path_003, [{'x': 367, 'y': 349, 'z': 72}], model_path)

src/tests/test_classification.py:21:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/algorithms/classify/trained_model.py:40: in predict
    return gtr123_model.predict(dicom_path, centroids, model_path)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ct_path = '/images/LIDC-IDRI-0003/1.3.6.1.4.1.14519.5.2.1.6279.6001.101370605276577556143013894866/1.3.6.1.4.1.14519.5.2.1.6279.6001.170706757615202213033480003264'
nodule_list = [{'x': 367, 'y': 349, 'z': 72}]
model_path = '/app/src/algorithms/classify/assets/gtr123_model.ckpt'

    def predict(ct_path, nodule_list, model_path=None):
        """
        Args:
            ct_path (str): path to a MetaImage or DICOM data.
            nodule_list: List of nodules
            model_path: Path to the torch model (Default value = "src/algorithms/classify/assets/gtr123_model.ckpt")

        Returns:
            List of nodules, and probabilities
        """
        if not model_path:
            CLASSIFY_DIR = path.join(Config.ALGOS_DIR, 'classify')
            model_path = path.join(CLASSIFY_DIR, 'assets', 'gtr123_model.ckpt')

        if not nodule_list:
            return []

        casenet = CaseNet()
        casenet.load_state_dict(torch.load(model_path))
        casenet.eval()

        if torch.cuda.is_available():
            casenet = torch.nn.DataParallel(casenet).cuda()
        # else:
        #     casenet = torch.nn.parallel.DistributedDataParallel(casenet)

        preprocess = PreprocessCT(clip_lower=-1200., clip_upper=600., spacing=True, order=1,
                                  min_max_normalize=True, scale=255, dtype='uint8')

        # convert the image to voxels(apply the real spacing between pixels)
        ct_array, meta = preprocess(*load_ct(ct_path))

        patches = patches_from_ct(ct_array, meta, config['crop_size'], nodule_list,
                                  stride=config['stride'], pad_value=config['filling_value'])

        results = []

        for nodule, (cropped_image, coords) in zip(nodule_list, patches):
>           cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float())
E           RuntimeError: the given numpy array has zero-sized dimensions. Zero-sized dimensions are not supported in PyTorch

src/algorithms/classify/src/gtr123_model.py:273: RuntimeError
================================ 1 failed, 5 passed in 39.14 seconds =================================
```
2 days, 18 hours ago

@louisgv opened a new pull request: #288: [WIP] Wizard flow

## Description
Work in progress...

## Reference to official issue
#279

## Motivation and Context
Adding a wizard flow for each tab to assist the user on what they should be doing next.

## How Has This Been Tested?

## Screenshots (if appropriate):

## Metrics (if appropriate):
If you are submitting a PR for a prediction algorithm (segmentation, identification, or classification), please fill in values for as many of the below statistics as are relevant.

*algorithms by metric*

metric | relevant algorithms
-------|--------------------
[accuracy <sup>1</sup> <sup>2</sup>](https://stats.stackexchange.com/a/231237/143678) | classification, identification
[data IO](https://unix.stackexchange.com/questions/55212) | classification, identification, segmentation
[Dice coefficient <sup>3</sup>](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) | segmentation
[disk space usage](https://www.cyberciti.biz/faq/linux-check-disk-space-command) | classification, identification, segmentation
[Hausdorff distance <sup>3</sup>](https://en.wikipedia.org/wiki/Hausdorff_distance) | segmentation
[Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index) | segmentation
[Log Loss](http://wiki.fast.ai/index.php/Log_Loss) | classification, identification <sup>4</sup>
[memory usage](https://stackoverflow.com/questions/110259) | classification, identification, segmentation
[prediction time <sup>2</sup>](https://stackoverflow.com/questions/385408) | classification, identification, segmentation
[sensitivity <sup>3</sup>](http://wiki.fast.ai/index.php/Deep_Learning_Glossary#Recall) | segmentation
[specificity <sup>3</sup>](http://wiki.fast.ai/index.php/Deep_Learning_Glossary#Specificity) | segmentation
[training time <sup>2</sup>](https://stackoverflow.com/questions/385408) | classification, identification, segmentation

*notes*

1. Use 5-fold cross validation if there is enough time and computational power available, otherwise use a holdout set
2. This metric may be automatically calculated by the machine learning architecture, e.g., Keras
3. The calculations for these metrics [are available here](https://github.com/concept-to-clinic/concept-to-clinic/blob/master/prediction/src/algorithms/segment/src/evaluate.py)
4. In order to calculate Log Loss for identification, the data needs to be arranged in a way that shows, for each pixel, whether or not it is a nodule centroid. Restated, the pixel-level labels of 1/0 would correspond to centroid/not-centroid.

*metrics by algorithm*

algorithm | relevant metrics
---------------|------------------
classification | accuracy, data IO, disk space usage, Log Loss, memory usage, prediction time, training time
identification | accuracy, data IO, disk space usage, Log Loss, memory usage, prediction time, training time
segmentation | data IO, Dice coefficient, disk space usage, Hausdorff distance, Jaccard index, memory usage, prediction time, sensitivity, specificity, training time

When reporting your values, please use a format similar to the following example.

algorithm | metric | value
-------------|--------|------:
segmentation | accuracy | 99.5
segmentation | Jaccard index | 0.5
segmentation | prediction time (s) | 45.3
segmentation | memory usage (MB) | 5.4

## CLA
- [ ] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
3 days ago

@reubano commented on issue #287: Cannot build from docker

Yes, I had issues using an external drive before as well. Also, see #165 for potential solutions to the space issue.
3 days ago

@reubano commented on issue #113: Fix Vue container - Error: Cannot find module 'chalk'

@nwaughachukwuma I stopped seeing the error after switching to an AWS server. Can you give that a try?
3 days ago

@louisgv commented on issue #287: Cannot build from docker

Hmm, this error was only thrown when I ran Docker integrated with my external drive. If I disable the external drive and run Docker, it seems to be running now...
3 days, 1 hour ago

@WGierke commented on issue #287: Cannot build from docker

Unfortunately, I can't confirm that on
- Linux Ubuntu 16.04.3 LTS
- Docker version 17.09.0-ce, build afdb6d4
- docker-compose version 1.15.0, build e12f3b9
3 days, 8 hours ago

@nwaughachukwuma commented on issue #113: Fix Vue container - Error: Cannot find module 'chalk'

Hey guys (@reubano and all), I'm getting this error on my Heroku app after CI/CD from GitLab to Heroku. Here is the gitlab-ci.yml script:

```
image: node:6

stages:
  - ver
  - init
  - tests
  - deploy

cache:
  paths:
    - node_modules/

ver:
  stage: ver
  script:
    - node --version
    - whoami

init:
  stage: init
  script:
    - npm install -g chalk
    - npm cache clean
    - rm -rf node-modules
    - npm install

run_tests:
  # needed for testing selenium on docker with java
  image: karthikiyengar/node-nightwatch:0.0.1
  stage: tests
  script:
    - sudo npm install
    - sudo npm test

deploy_staging:
  stage: deploy
  script:
    - npm install -g chalk
    - npm cache clean
    - rm -rf node-modules
    - npm install
    - git remote add heroku https://heroku:$HEROKU_API_KEY@git.heroku.com/webapp-staging.git
    - git push heroku master
    - echo "Deployed to staging server"
  environment: staging
  only:
    - master

deploy_production:
  stage: deploy
  script:
    - npm install -g chalk
    - npm cache clean
    - rm -rf node-modules
    - npm install
    - git remote add heroku https://heroku:$HEROKU_API_KEY@git.heroku.com/webapp.git
    - git push heroku master
    - echo "Deployed to production server"
  environment: production
  when: manual
  only:
    - master
```

Help will be greatly appreciated. Thanks!
3 days, 12 hours ago

@louisgv created a new issue: #287: Cannot build from docker

I pulled from latest and tried to clean and rebuild docker. However, it failed on my system.

## Expected Behavior
Running `docker-compose -f local.yml build` should build the application.

## Current Behavior
The following error was thrown:

```
Running setup.py clean for networkx
Successfully built sqlalchemy
Failed to build networkx
Installing collected packages: pyflakes, pycodestyle, mccabe, flake8, py, pytest, scipy, cycler, pyparsing, matplotlib, decorator, networkx, PyWavelets, scikit-image, sqlalchemy, pylidc
Running setup.py install for networkx: started
Running setup.py install for networkx: finished with status 'done'
Successfully installed PyWavelets-0.5.2 cycler-0.10.0 decorator-4.2.1 flake8-3.3.0 matplotlib-2.1.1 mccabe-0.6.1 networkx-2.0 py-1.5.2 pycodestyle-2.3.1 pyflakes-1.5.0 pylidc-0.1.9 pyparsing-2.2.0 pytest-3.1.3 scikit-image-0.13.1 scipy-1.0.0 sqlalchemy-1.2.1
Exception:
Traceback (most recent call last):
  File "/usr/lib/python3.6/shutil.py", line 436, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
FileNotFoundError: [Errno 2] No such file or directory: 'options'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/local/lib/python3.6/dist-packages/pip/commands/install.py", line 385, in run
    requirement_set.cleanup_files()
  File "/usr/local/lib/python3.6/dist-packages/pip/utils/build.py", line 38, in __exit__
    self.cleanup()
  File "/usr/local/lib/python3.6/dist-packages/pip/utils/build.py", line 42, in cleanup
    rmtree(self.name)
  File "/usr/local/lib/python3.6/dist-packages/pip/_vendor/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/local/lib/python3.6/dist-packages/pip/_vendor/retrying.py", line 212, in call
    raise attempt.get()
  File "/usr/local/lib/python3.6/dist-packages/pip/_vendor/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/local/lib/python3.6/dist-packages/pip/_vendor/six.py", line 686, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/pip/_vendor/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/local/lib/python3.6/dist-packages/pip/utils/__init__.py", line 102, in rmtree
    onerror=rmtree_errorhandler)
  File "/usr/lib/python3.6/shutil.py", line 480, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/lib/python3.6/shutil.py", line 418, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib/python3.6/shutil.py", line 418, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib/python3.6/shutil.py", line 438, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/usr/local/lib/python3.6/dist-packages/pip/utils/__init__.py", line 114, in rmtree_errorhandler
    func(path)
IsADirectoryError: [Errno 21] Is a directory: '/tmp/pip-build-hv_u296y/flake8/flake8/options'
ERROR: Service 'base' failed to build: The command '/bin/sh -c pip install -r /requirements/prediction/local.txt' returned a non-zero code: 2
```

## Steps to Reproduce
1. Pull from master
2. Run `docker-compose -f local.yml build`

## Context (Environment)
+ Linux Debian
+ Docker version 17.09.0-ce, build afdb6d4
+ docker-compose version 1.17.0, build ac53b73

## Checklist before submitting
- [ ] I have confirmed this using the officially supported Docker Compose setup using the `local.yml` configuration and ensured that I built the containers again and they reflect the most recent version of the project at the `HEAD` commit on the `master` branch
- [ ] I have searched through the other currently open issues and am confident this is not a duplicate of an existing bug
- [ ] I provided a **minimal code snippet** or list of steps that reproduces the bug.
- [ ] I provided **screenshots** where appropriate
- [ ] I filled out all the relevant sections of this template
3 days, 16 hours ago

@louisgv commented on issue #150: Vue: implement 'Segmentation' UI component(s)

No, I am not.

> @louisgv <https://github.com/louisgv> are you working on this?
3 days, 21 hours ago

@Serhiy-Shekhovtsov commented on issue #279: UI feature: add overall progress indicator ("wizard" element)

@louisgv I am not working on it, I just like the feature :) Especially if the progress gets saved across reloads. This pains me every time I work on the interface frontend: I have to go back to the first step and select the case each time the page is reloaded.
3 days, 21 hours ago

@louisgv commented on issue #279: UI feature: add overall progress indicator ("wizard" element)

@Serhiy-Shekhovtsov are you actively working on this issue? (Since you upvoted it). Just wanted to make sure we're not colliding.
3 days, 22 hours ago

@Serhiy-Shekhovtsov opened a new pull request: #286: Sending and saving of nodule properties on Annotate screen

Fixed a few bugs on the Annotate screen. Fixed sending of the nodule data to the backend. Added fields for storing `probability_concerning` and `note`. Includes a fix for #285.

## CLA
- [ ] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
5 days, 19 hours ago

@isms commented on issue #280: Annotation: per-nodule notes should actually be POSTed and saved

@Serhiy-Shekhovtsov no, not at all, PATCH is the right verb 👍
1 week, 1 day ago

@Serhiy-Shekhovtsov commented on issue #280: Annotation: per-nodule notes should actually be POSTed and saved

@isms, I can see that we are using `PATCH` for updating the candidate. Is there a reason for using `POST` here?
1 week, 1 day ago

@vessemer commented on issue #283: Continuous improvement of nodule segmentation and volume estimates

Yes, sure. I'll add some comments in `trained_model.calculate_volume` with my next commit, since the code is somewhat obscure :)
1 week, 2 days ago

@Serhiy-Shekhovtsov created a new issue: #285: User should be able to run nodule candidates prediction

Currently, we can only create a new case based on local images. There should be a button to start the actual prediction. After clicking it, the page should show that the prediction is running. Once the prediction is ready, it should update the status on the page.

## Checklist before submitting
- [ ] I have confirmed this using the officially supported Docker Compose setup using the `local.yml` configuration and ensured that I built the containers again and they reflect the most recent version of the project at the `HEAD` commit on the `master` branch
- [ ] I have searched through the other currently open issues and am confident this is not a duplicate of an existing bug
- [ ] I provided a **minimal code snippet** or list of steps that reproduces the bug.
- [ ] I provided **screenshots** where appropriate
- [ ] I filled out all the relevant sections of this template
1 week, 2 days ago

@Serhiy-Shekhovtsov opened a new pull request: #284: Added test for multiple patches cropping

Added a test for the changes introduced in PR #273 (fixed classification prediction for multiple nodules).

## CLA
- [ ] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
1 week, 2 days ago

@caseyfitz commented on issue #283: Continuous improvement of nodule segmentation and volume estimates

Ah, thanks @vessemer! I thought the functionality was clear to me, but I must have been confused by the fact that `labels = [mask[centroid['x'], centroid['y'], centroid['z']] for centroid in centroids]` was returning `[1 1 1 1 1 1]` for the six `centroids` I was passing it (for LIDC-0003). I didn't realize that `scipy.ndimage.label` has a [default `structure` parameter](http://lagrange.univ-lyon1.fr/docs/scipy/0.17.1/generated/scipy.ndimage.label.html) representing squared connectivity, which should be sufficient for this stage of the project. The problem, then, seems to be that the image has only *one* connected component, yes? If so, then point 2 in the issue statement above should be good to go for now (in which case I'll edit the issue) and the immediate problems are just those in point 1. Make sense?
1 week, 2 days ago

@vessemer commented on issue #283: Continuous improvement of nodule segmentation and volume estimates

@caseyfitz did you carefully read the code of `trained_model.calculate_volume`? This code treats centroids as connected components and deals well with them.
1 week, 2 days ago

@caseyfitz commented on issue #138: Continuous improvement of lungs segmentation algorithm

New issue #283 focused on nodule segmentation and volume estimation (as opposed to segmentation of the entire lung).
1 week, 2 days ago

@caseyfitz created a new issue: #283: Continuous improvement of nodule segmentation and volume estimates

After exploring the segmentation code under `prediction/src/algorithms/segment/`, we have identified a few outstanding issues related to the segmentation functionality and volume calculations. These issues are all interrelated, but we've tried to divide them into two general categories (whose code paths start in `segment/trained_model.py`):

1. Model architecture / complexity (`trained_model.predict`)
   * Model output shape (512, 512, 1024): the `.npy` mask saved to `segment_path` should not have 1024 slices. Most slices after 200 are uniform, for example in `LIDC-IDRI-0003` with value `0.45197698` and an overall range around `-0.35` to `0.8`.
   * The `simple_3d_model.py` and `unet_3d_model.py` each use the same `best_model_Simple3DModel` and make identical predictions. However, the full unet will only process some full-size test images without throwing a `MemoryError`.
   * It may be too much to try and retrain a new model this late, but it is desirable to have at least one model that accepts any appropriately sized input and outputs the correct shape.
2. Nodule volume calculation (`trained_model.calculate_volume`)
   * The naive approach using `numpy.bincount`, which calculates nodule volumes by summing non-zero values in the binary mask saved as `lung-mask.npy`, does not use centroid information and merely sums non-zero values in the scan, yielding a (poor) _total centroid volume_ rather than the distinct volumes of each centroid in `centroids`. One negative impact of this is that for `n` centroids, the predicted volume is just this total volume, `n` times.
   * More advanced brute-force approaches using convex hulls (`scipy.spatial.ConvexHull`, `skimage.morphology.convex_hull_image`) are either too memory intensive or only work with 2D arrays. Plus, it's not clear that a standard convex hull approach would be best anyway, since the entire lungs aren't our interest, but subsets of the lungs (perhaps something like `skimage.morphology.convex_hull_object`, but this only works on 2D arrays).
   * The ideal function (as specified in the docstring for `trained_model.calculate_volume`) takes a list of centroids as input and calculates e.g. 3D connected components given those centroids; a sketch of this idea follows below.
   * Note that in the current `Simple3DModel`, masking of nodules does not perform well, and it's possible that there is essentially one large connected component spanning ~200 slices.

The approach to exploring these issues has been to use an interactive Jupyter notebook, rooted in the `prediction` directory of the application. From there, one can use `from src.algorithms.segment.trained_model import predict` to start playing with the outputs directly and testing changes on the fly. (Pro tip: use the magic `%load_ext autoreload` to [autoreload](https://ipython.org/ipython-doc/3/config/extensions/autoreload.html) the functions with your changes every time you call them.)

And as always, please update the documentation with any new changes for easy points! (The [segment predict docs](http://concept-to-clinic.readthedocs.io/en/latest/design-doc.html#segment) are pretty weak right now.)
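As a concrete illustration of the connected-components idea in point 2, here is a minimal sketch (not the fix itself; it assumes a boolean `mask` array and `centroids` given as dicts of voxel indices, matching the indexing used elsewhere in this thread):

```python
import numpy as np
from scipy import ndimage


def volumes_by_centroid(mask, centroids):
    """Sketch: distinct volume per centroid via 3D connected components."""
    # label() uses squared connectivity by default, as noted above
    labeled, num_components = ndimage.label(mask)
    volumes = []
    for centroid in centroids:
        component = labeled[centroid['x'], centroid['y'], centroid['z']]
        # a centroid that falls outside every component gets volume 0
        volumes.append(int(np.sum(labeled == component)) if component > 0 else 0)
    return volumes
```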
1 week, 2 days ago

@WGierke commented on PR #282: Add Python Coverage Report

@lamby @isms Now, we just generate the standard report. If someone wants to add that the report should be uploaded to codecov.io: it worked for me by running `coverage xml` and uploading the result with `bash <(curl -s https://codecov.io/bash) -t THE-CODECOV-TOKEN`. Maybe it's not necessary to specify the token when uploading from a Travis build, but I'm not sure.

PS: the report from the current build:

prediction service:
```
Name                                                      Stmts   Miss Branch BrPart  Cover
-------------------------------------------------------------------------------------------
__init__.py                                                   0      0      0      0   100%
config.py                                                    22      0      0      0   100%
src/__init__.py                                               0      0      0      0   100%
src/algorithms/__init__.py                                    0      0      0      0   100%
src/algorithms/classify/__init__.py                           0      0      0      0   100%
src/algorithms/classify/src/__init__.py                       0      0      0      0   100%
src/algorithms/classify/src/gtr123_model.py                 151      1     26      1    99%
src/algorithms/classify/trained_model.py                      4      0      0      0   100%
src/algorithms/identify/__init__.py                           0      0      0      0   100%
src/algorithms/identify/prediction.py                       190    153     58      0    15%
src/algorithms/identify/src/__init__.py                       0      0      0      0   100%
src/algorithms/identify/src/gtr123_model.py                 284    229     68      1    16%
src/algorithms/identify/trained_model.py                     43     26     18      1    36%
src/algorithms/segment/__init__.py                            0      0      0      0   100%
src/algorithms/segment/src/__init__.py                        0      0      0      0   100%
src/algorithms/segment/src/evaluate.py                       31      0      0      0   100%
src/algorithms/segment/src/models/__init__.py                 0      0      0      0   100%
src/algorithms/segment/src/models/segmentation_model.py      38     10      0      0    74%
src/algorithms/segment/src/models/simple_3d_model.py         39      7      2      0    78%
src/algorithms/segment/trained_model.py                      28      0      6      0   100%
src/factory.py                                               19      4      4      2    74%
src/preprocess/__init__.py                                    0      0      0      0   100%
src/preprocess/crop_dicom.py                                 21      1      8      1    93%
src/preprocess/crop_patches.py                               33      4     16      5    82%
src/preprocess/errors.py                                      5      0      2      1    86%
src/preprocess/extract_lungs.py                             171     88     63      2    47%
src/preprocess/generators.py                                313     59    172     25    78%
src/preprocess/improved_lung_segmentation.py                284     17    100      9    92%
src/preprocess/load_ct.py                                    91      6     28      2    93%
src/preprocess/lung_segmentation.py                         139    105     36      1    20%
src/preprocess/preprocess_ct.py                              78     14     46     14    77%
src/views.py                                                 32      0     10      1    98%
-------------------------------------------------------------------------------------------
TOTAL                                                      2016    724    663     66    62%
```

interface service:
```
Name                            Stmts   Miss Branch BrPart  Cover
---------------------------------------------------------------
backend/__init__.py                 0      0      0      0   100%
backend/api/__init__.py             0      0      0      0   100%
backend/api/serializers.py         65     16      2      1    75%
backend/api/urls.py                28     10      4      0    56%
backend/api/views.py               79      7      3      0    91%
backend/cases/__init__.py           0      0      0      0   100%
backend/cases/enums.py             65      0     18      0   100%
backend/cases/factories.py         18      0      0      0   100%
backend/cases/models.py            84      0      4      0   100%
backend/images/__init__.py          0      0      0      0   100%
backend/images/factories.py        15      0      6      0   100%
backend/images/models.py           84      1     23      0    99%
config/__init__.py                  0      0      0      0   100%
config/settings/__init__.py         0      0      0      0   100%
config/settings/base.py            33      2      0      0    94%
config/settings/local.py           12      0      0      0   100%
config/urls.py                     10      4      4      1    50%
manage.py                          16      6      2      1    61%
---------------------------------------------------------------
TOTAL                             509     46     66      3    90%
```
1 week, 3 days ago

@isms commented on PR #282: Add Python Coverage Report

My thought is that a text report in the build is all we're looking for to close the issue. As @WGierke mentioned, an integration is easy and just a matter of configuring the build tool. Anyway, fiddling with Travis is out of scope for contributors, so my recommendation is still just to generate the text report as part of the testing commands and leave it there. (We may or may not loop in a third-party service this late in the game.)
1 week, 3 days ago

@WGierke commented on PR #282: Add Python Coverage Report

[Apparently codecov.io supports Coverage.py](https://github.com/codecov/example-python). I'm currently trying to add an upload of the generated report to codecov.io on my fork. This way we can have both: a nice HTML version that can be generated locally (if need be) and a repository integration that shows us the coverage change that merging a branch would induce.
1 week, 3 days ago

@reubano commented on PR #272: Fixed coordinates scaling for classification prediction

Yes, tests pass for me now... maybe the Travis error was a fluke? I just reran the build.
1 week, 4 days ago

@WGierke commented on PR #282: Add Python Coverage Report

@lamby I tried this "spike" for a couple of reasons. First, I don't have the rights to add an integration like Coveralls to the repo :-) Second, if I understand it right, one would always be required to push to GitHub first to receive a test report. Not only does this take way longer than generating the report locally, but there are also tests that always time out on Travis, which means that one can never fully rely on the report generated online. However, I understand your point.
1 week, 4 days ago

@lamby commented on PR #282: Add Python Coverage Report

Wouldn't it be better to push to an external service like Coveralls (or similar)? This way we get nice GitHub integration that says "your PR decreases coverage by 2%". I believe some of these services even let you block merges (!!) if you decrease coverage :)
1 week, 4 days ago

@Serhiy-Shekhovtsov commented on PR #272: Fixed coordinates scaling for classification prediction

@reubano, the coordinates for this test have been hand-crafted to match the real nodule on the third image: **LIDC-IDRI-0003**. But the `glob` method returns an unsorted list, and by coincidence, on your machine the third item is **LIDC-IDRI-0002**. I will fix the test to sort the list. ![image](https://user-images.githubusercontent.com/607527/34689922-df2e5496-f4bf-11e7-99ce-7c36862d9a7f.png)
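The described fix amounts to sorting the glob results so that "the third image" is deterministic across machines (the pattern below is illustrative, not the fixture's actual pattern):

```python
from glob import glob

# glob() returns paths in filesystem-dependent order, so sort for stability
dicom_paths = sorted(glob('/images_full/LIDC-IDRI-*/*/*'))
```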
1 week, 4 days ago

@reubano commented on PR #272: Fixed coordinates scaling for classification prediction

Ran this a few times, and while @caseyfitz's use case looks solved, the `test_classify_real_nodule_full_dicom` test is failing for me:

```python
_____________________________________ test_classify_real_nodule_full_dicom _____________________________________

dicom_paths = ['/images_full/LIDC-IDRI-0002/1.3.6.1.4.1.14519.5.2.1.6279.6001.490157381160200744295382098329/1.3.6.1.4.1.14519.5.2.1...14519.5.2.1.6279.6001.298806137288633453246975630178/1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192']
model_path = '/app/src/algorithms/classify/assets/gtr123_model.ckpt'

    def test_classify_real_nodule_full_dicom(dicom_paths, model_path):
        predicted = trained_model.predict(dicom_paths[2], [{'x': 367, 'y': 349, 'z': 75}], model_path)
        assert predicted
>       assert 0.3 <= predicted[0]['p_concerning'] <= 1
E       assert 0.3 <= 0.0021293163299560547

src/tests/test_classification.py:23: AssertionError
```
1 week, 5 days ago

@Serhiy-Shekhovtsov commented on PR #272: Fixed coordinates scaling for classification prediction

To be precise, the reported issue was caused by wrong coordinates: the coordinates specified in the test were good for the full-size DICOM, but the test was running on a small image. However, I have found another issue with classification; it was confirmed by @caseyfitz [here](https://github.com/concept-to-clinic/concept-to-clinic/pull/272#issuecomment-355413434). This PR includes tests for real nodules.
1 week, 5 days ago

@reubano commented on PR #272: Fixed coordinates scaling for classification prediction

@Serhiy-Shekhovtsov great! And just to be clear, this fixes #268 right? Is there any other test (besides that one) I should run to make sure this PR works as expected?
1 week, 5 days ago

@Serhiy-Shekhovtsov commented on PR #272: Fixed coordinates scaling for classification prediction

@reubano converting to a boolean is a fix for an issue of lost `meta.spacing`. Now `params.spacing` is a boolean that tells whether we need to scale the image, but it doesn't wipe out `meta.spacing` anymore.
1 week, 5 days ago

@reubano commented on PR #272: Fixed coordinates scaling for classification prediction

Was the issue about converting `spacing` from a number to a boolean resolved? I wasn't sure if the back and forth reached a conclusion.
1 week, 5 days ago

@WGierke commented on issue #275: Unit tests for Python backend code

Does something like #282 make sense? It generates a coverage report locally which could then also be persisted by Travis.
1 week, 6 days ago

@WGierke opened a new pull request: #282: Add Python Coverage Report

As discussed in #275, it would be helpful to generate a coverage report of the unit tests in which the total coverage and missing lines can be seen. This PR adds [Coverage.py](https://coverage.readthedocs.io/en/coverage-4.4.2/), which keeps track of which LOC haven't been exercised after running the tests. It generates an `htmlcov` directory containing the HTML files that should be served afterwards to correctly render the report. An example of such a report for the current master is [here](https://wgierke.github.io/stuff/). To persist this report generated by Travis, the project manager needs to [set up Travis artifacts](https://docs.travis-ci.com/user/uploading-artifacts/).

## CLA
- [X] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
1 week, 6 days ago

@isms commented on issue #275: Unit tests for Python backend code

Good thought, we have used and enjoyed codecov.io. Maybe we can start with just a report in the test pipeline, and staff can look into setting up the integration if time permits.
1 week, 6 days ago

@lamby commented on PR #272: Fixed coordinates scaling for classification prediction

> turned out, one little else at this line can solve the issue with multiple centroids. I will push the fix and tests soon.

"With this one weird trick discovered by a schoolteacher", huh? Looking forward to the tests and resolving the conflicts... :)
2 weeks ago

@WGierke opened a new pull request: #281: #274 Add doc strings

I skimmed through the code and added docstrings to some functions I had added whose exact method signatures and purposes I was still able to recall. More exhaustive docstring PRs will follow.

## Reference to official issue
#274

## CLA
- [X] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
2 weeks ago

@isms commented on issue #274: Code quality: adding docstrings (for points!)

Google or NumPy style are both fine, as long as the docstrings conform to the respective specification.
2 weeks ago

@isms created a new issue: #280: Annotation: per-nodule notes should actually be POSTed and saved

## Expected Behavior
Manipulating these should be `POST`ed to the backend and saved in the appropriate model instance:

![image](https://user-images.githubusercontent.com/1284973/34627406-6c2f62a2-f22e-11e7-8b47-52c6b6ec5a5a.png)

## Current Behavior
No-op
2 weeks ago

@WGierke commented on issue #274: Code quality: adding docstrings (for points!)

Can we agree on a common [docstring format](https://stackoverflow.com/questions/3898572/what-is-the-standard-python-docstring-format)?
2 weeks ago

@isms created a new issue: #279: UI feature: add overall progress indicator ("wizard" element)

## Expected Behavior
We want the users to have a clear path in the workflow:

- Load a case
- Mark all nodules
- Mark any nodules not found if necessary
- Annotate and bound all nodules marked concerning
- Move to report

We should clearly show them at all times a progress indicator showing which phase they are in (step 1/4) and what remains in that phase (2/3 nodules identified). This should all be fairly easy to calculate from the current Case.

Some random images of the kind of element:

![](http://d5vf6134d8ffdnfp1qv4rv3l.wpengine.netdna-cdn.com/wp-content/uploads/06-vertical-progress-checkout.jpg)
![](https://i.gyazo.com/298959b3aacc180e1dce0b27ff65522b.png)

## Possible implementation
Example: http://vuejs.creative-tim.com/vue-form-wizard/#/
2 weeks ago

@isms created a new issue: #278: Report: RSNA standard template should pull from the actual Case

## Expected Behavior
Fill in technical parameters, overall findings, and nodule data from the actual case (with thumbnails).

## Current Behavior
Mock data.

## Possible Solution
- [ ] Add serializers for data not already in Case so it is present in the Vuex representation of the current case
- [ ] Make sure this is all used on the frontend with no mock data left
2 weeks ago

@WGierke commented on issue #275: Unit tests for Python backend code

Would it be useful to add integrations such as [CodeCov](https://codecov.io/) to continuously monitor whether a test really adds more coverage, or whether a feature reduces coverage?
2 weeks ago

@isms created a new issue: #277: Report: per-nodule 3D shapefile export from 3D boolean mask

Assume a boolean mask for each nodule that is a 3D numpy array of the same dimensions as the DICOM image (e.g. 512x512x24 layers), where some internal region of the 3D space is a [polytope](https://en.wikipedia.org/wiki/Polytope).

## Expected Behavior
A user wants to download a [shapefile (.shp)](https://en.wikipedia.org/wiki/Shapefile#Shapefile_shape_format_(.shp)) for 3D printing or viewing in CAD. It should use the DICOM metadata to recover real-world 3D dimensions.

## Current Behavior
Not implemented.

## Possible Solution
Libraries like `pyshp` may be helpful.
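One hypothetical starting point (an assumption, not part of the issue itself): recover a surface mesh from the mask with marching cubes, using the DICOM voxel spacing to restore real-world dimensions, before handing the geometry to an export library such as `pyshp`.

```python
import numpy as np
from skimage import measure


def nodule_surface(mask, spacing_mm):
    """Sketch: extract a real-world-scaled surface mesh from a boolean mask.

    `spacing_mm` is the per-axis voxel size from the DICOM metadata.
    Depending on the scikit-image version, the function may be named
    `marching_cubes_lewiner` instead.
    """
    verts, faces, normals, values = measure.marching_cubes(
        mask.astype(np.uint8), level=0.5, spacing=spacing_mm)
    return verts, faces  # vertices are in millimeters; faces index into verts
```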
2 weeks ago

@isms created a new issue: #276: Report: implement ACR Lung-RAD™ findings based on Case

Here are the assessment criteria: https://www.acr.org/-/media/ACR/Files/RADS/Lung-RADS/LungRADS_AssessmentCategories.pdf?la=en

## Expected Behavior
On the report, show what we can of where in the decision criteria the case characteristics put us.

## Possible enhancements
Ask for anything missing (e.g. patient age); allow an interactive outcome calculator from input scenarios.

## Current Behavior
Not implemented.
2 weeks ago

@isms created a new issue: #275: Unit tests for Python backend code

Go through and add [unit tests](https://en.wikipedia.org/wiki/Unit_testing) for Django/DRF or Python functions in the prediction service. (This is correctness testing, separate from ML evaluation.) Tests should not unduly slow down the build.

<hr>

*Points will be awarded continuously through the end of the competition -- this issue will not close.*
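A minimal example of the kind of test being asked for (the function under test is hypothetical, chosen only to show the pattern):

```python
import pytest


def clip_probability(p):
    """Hypothetical prediction-service helper: clamp a probability to [0, 1]."""
    return min(max(p, 0.0), 1.0)


@pytest.mark.parametrize('raw, expected', [(-0.2, 0.0), (0.5, 0.5), (1.7, 1.0)])
def test_clip_probability(raw, expected):
    assert clip_probability(raw) == expected
```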
2 weeks ago

@isms created a new issue: #274: Code quality: adding docstrings (for points!)

## Expected Behavior Go through and add docstrings or improve existing docstrings where helpful.
2 weeks ago

@Serhiy-Shekhovtsov commented on issue #268: Classification throws RuntimeError for real nodule location

Coordinates are stored in non-scaled form, so we have to scale them, and for that we need to know the spacing. So the point of the suggested change is to preserve the properties of the `meta` object.
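To illustrate the point, a sketch of what preserving `meta.spacing` enables (the axis order and the 1mm-isotropic resampling target are assumptions, not the project's confirmed conventions):

```python
def scale_centroid(centroid, spacing):
    """Map a stored (non-scaled) voxel coordinate into the resampled image."""
    scaled = {}
    # assumed: the CT was resampled to 1mm isotropic voxels, so original
    # voxel indices are multiplied by the preserved per-axis spacing
    for axis, factor in zip(('x', 'y', 'z'), spacing):
        scaled[axis] = int(round(centroid[axis] * factor))
    return scaled
```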
2 weeks ago

@reubano commented on PR #273: fixed classification prediction for multiple nodules

Thanks for this! Would you also mind adding a test that fails unless this patch is applied? That will make sure we don't create regressions with future changes.
2 weeks ago

@reubano commented on issue #271: Evaluation Pipeline for Models

@vessemer great observation! I see that they provide evaluation code as well. So yes, adjusting it to fit our use case will be extremely useful!
2 weeks, 1 day ago

@Serhiy-Shekhovtsov opened a new pull request: #273: fixed classification prediction for multiple nodules

Bug fixed. When the length of the nodules list is more than 1, prediction fails as follows:

```
~/concept-to-clinic/prediction/src/algorithms/classify/src/gtr123_model.py in predict(ct_path, nodule_list, model_path)
    275 results = []
    276
--> 277 for nodule, (cropped_image, coords) in zip(nodule_list, patches):
    278     cropped_image = Variable(torch.from_numpy(cropped_image[np.newaxis, np.newaxis]).float())
    279     cropped_image.volatile = True

ValueError: too many values to unpack (expected 2)
```

More details [here](https://github.com/concept-to-clinic/concept-to-clinic/pull/272#issuecomment-355413434).

## CLA
- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
2 weeks, 1 day ago

@Serhiy-Shekhovtsov commented on PR #272: Fixed coordinates scaling for classification prediction

@caseyfitz it turned out that one little `else` at [this line](https://github.com/Serhiy-Shekhovtsov/concept-to-clinic/blob/4887eeb6778013fce10e7e91fd846c84cb7cb248/prediction/src/preprocess/crop_patches.py#L68) can solve the issue with multiple centroids. I will push the fix and tests soon.
2 weeks, 1 day ago