@dchansen

Member since Aug. 23, 2017

Points

| Timestamp | Points | Contributor | Ad-hoc | References |
|---|---|---|---|---|
| Oct. 16, 2017 | 40 | @dchansen | No | Issue #4, PR #122 |

Activity

@dchansen commented on PR #142: Implemented simple nodule segmentation algorithm

I am closing the pull request. The point of it was to implement something simple, to quickly get the prediction service fully featured - 19 days later, I think we somewhat missed that goal. It is also directly competing with pull request [#147](https://github.com/concept-to-clinic/concept-to-clinic/pull/147), but with no objective way to compare the two (we can't use the LIDC data). Either way, whichever pull request is not selected will essentially be wasted work.
1 year, 1 month ago

@dchansen commented on PR #147: #3 Segmentation Algorithm

I'll check my exact configuration and memory as soon as I can, though it might not be before Wednesday. Sorry. I think it was 128x128x128, but I'll check. Total memory was 2x12 GB. I believe part of the efficiency of VNET comes from the use of strided convolutions rather than pooling layers. I do know that strided convolutions can lead to checkerboard artefacts, though, so it might not give as nice results. The original VNET paper used a size of 128x128x64 and was able to fit 2 batches in 8 GB of memory for training. That should be enough to fit most (all?) nodules. My concern with LIDC is that it might encourage overfitting to that dataset. Doing something like 5-fold cross-validation would be quite difficult, as some of these models literally take weeks to train on a single Titan X.
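To illustrate the point (an editor's sketch in PyTorch, not code from the thread; layer sizes are arbitrary): pooling materializes a full-resolution convolution output before shrinking it, while a strided convolution produces the downsampled output directly.

```python
import torch
import torch.nn as nn

# Pooling-based downsampling: convolve at full resolution, then max-pool.
pooled = nn.Sequential(
    nn.Conv3d(16, 32, kernel_size=3, padding=1),
    nn.MaxPool3d(kernel_size=2),
)

# VNET-style downsampling: one strided convolution performs the feature
# transform and the 2x resolution reduction in a single step, so the
# full-resolution intermediate activation is never stored.
strided = nn.Conv3d(16, 32, kernel_size=2, stride=2)

x = torch.randn(1, 16, 64, 64, 64)
print(pooled(x).shape, strided(x).shape)  # both (1, 32, 32, 32, 32)
```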
1 year, 2 months ago

@dchansen commented on PR #147: #3 Segmentation Algorithm

This looks great. I think a 3D U-Net (sometimes called V-NET?) would probably take the prize for segmentation, but as you say, memory is a concern with 1 mm voxel sizes. I don't think 32 GB of memory use for evaluation can be right, though - I'm currently training a VNET on 2 GPUs for an unrelated project, and with a batch size of 4, I'm still not out of memory. For inference, the memory use for a 144x144x144 volume should be less than 1 GB. I am wondering how we benchmark these approaches fairly. We can't really evaluate on LIDC, as that's what everyone will use to train the models. Do we have a good validation dataset? I only have access to radiotherapy ones, and that's not quite the same thing.
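As a back-of-envelope check of that inference figure (an editor's sketch, assuming float32 activations):

```python
# One 144x144x144 float32 feature channel:
voxels = 144 ** 3                       # 2,985,984 voxels
mib_per_channel = voxels * 4 / 2 ** 20  # ~11.4 MiB per channel
# Even ~64 channels alive at once is ~730 MiB, consistent with <1 GB.
print(mib_per_channel)
```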
1 year, 2 months ago

@dchansen commented on PR #142: Implemented simple nodule segmentation algorithm

@WGierke This was very much a minimal implementation, so I haven't tested it that thoroughly. Maybe we could open a separate issue to create evaluation scripts for this feature? I'll add some documentation for the variables on Monday.
1 year, 2 months ago

@dchansen opened a new pull request: #142: Implemented simple nodule segmentation algorithm

This pull request implements a simple level-set based segmentation algorithm for nodule segmentation.

## Description

This implements a simple segmentation algorithm based on SITK, and returns a binary mask, diameter and volume.

## Reference to official issue

#3

## Motivation and Context

Part of the minimal viable product.

## How Has This Been Tested?

The segmentation has been tested manually on a few LIDC images. A better approach would be a test based on pylidc that reports Dice values.

## Screenshots (if appropriate):

![Nodule Segmentation](https://i.imgur.com/QBwqU2D.png)

## CLA

- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

I'm just not very experienced with git rebase, and I found myself having to redo every merge made on the project for the last ten days. I have messed up my repository quite badly - I think my best approach is to try again from home, where I have a pristine copy of my repo from before my attempts at rebasing.
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

I seem to have messed that up rather thoroughly. This will take me a while to sort out.
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

The torch import is required simply because torch needs to be imported before Keras, always. I am very open to other ways of doing that.
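A minimal illustration of the workaround being described (the lint-suppression comment is an assumption about how one might keep flake8 quiet):

```python
# torch must be imported before Keras/TensorFlow to avoid the segfault
# discussed in this thread; the import is needed only for its side effects.
import torch  # noqa: F401

from keras.models import load_model
```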
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

I changed the image to be Ubuntu based instead. All tests should pass now.
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

`pytorch:36` segfaults. `varunagrawal/pytorch` does not, but I had to manually install opencv and tkinter in it.
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

Using a different Docker image solved the problem for me. I cannot for the life of me figure out where the segfault is coming from in the `python:3.6` image, however.
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

The tests run in my local environment (outside Docker). Inside Docker they pass when run individually, but not when run all at once. I suspect it is Tensorflow/Torch shenanigans, but it is quite tricky to debug.
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

@lamby GitHub claims that it has no conflicts with the base branch?
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

My issue was that I had to disable a lot of code in [identify.trained_model](https://github.com/dchansen/concept-to-clinic/blob/3396244f45f6683bdcc74b8f05081c16549ae315/prediction/src/algorithms/identify/trained_model.py#L22-L71). But I think this is what you want?
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

@vessemer Well, if you can share the code, I can have a look at it.
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

@reubano Yes, that was my thought exactly. The problem right now is that all the models take ownership of trained_model.predict, and there is no good way of switching between particular models. Ideally, we might also like to build a meta-model combining many different ones, and a uniform interface would help greatly.
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

@vessemer I am a little unsure about the benefit of moving the model to Keras now that Tensorflow and Torch seem to work without disturbing one another. Different approaches will require different enough post- and pre-processing that they can't be represented as a Keras model anyway. Long term, I do see the advantage of having all models on a single deep-learning framework, but is Keras the right choice? Using [ONNX](https://github.com/onnx/onnx) as the way to save the model would allow for easy use of many frameworks, including Caffe2 and CNTK. Either way, as soon as there is a consensus, I would be happy to help convert the models to that particular format. One difference between the Torch and Keras batchnorm is that they use different epsilon values (1e-5 vs 1e-3), so maybe that's the issue? Could you make your WIP on the model standardization available?
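A minimal sketch of the ONNX route (an editor's illustration, not project code; the toy model and file name are placeholders):

```python
import torch
import torch.nn as nn

# Toy model standing in for a converted detector.
model = nn.Sequential(nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU())
dummy_input = torch.randn(1, 1, 32, 32, 32)

# torch.onnx.export traces the model and writes a framework-neutral graph
# that ONNX-aware runtimes such as Caffe2 and CNTK can then load.
torch.onnx.export(model, dummy_input, "model.onnx")
```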
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

I'm trying to do a reasonable merge right now, and `algorithms.identify.trained_model` is giving me a headache. I think we really need a standardized way of specifying a model which is independent of frameworks (Keras, Torch, Tensorflow, etc.). I would suggest that each model independently implements a predict method taking only a DICOM path (for identification) or a DICOM path and a list of candidates. trained_model.predict would then take the same arguments, plus a string identifying the algorithm to be used. How does this sound?
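A rough sketch of that dispatch idea (an editor's illustration; the registry layout, stub model, and return shape are hypothetical, not the project's actual API):

```python
# Stand-in for a per-model module exposing its own predict().
def gtr123_predict(dicom_path):
    """Placeholder for the grt123 identification model."""
    return [{"x": 0, "y": 0, "z": 0, "p_nodule": 0.5}]

IDENTIFIERS = {
    "gtr123": gtr123_predict,
    # further models would register themselves here
}

def predict(dicom_path, algorithm="gtr123"):
    """Dispatch identification to a specific model by name."""
    try:
        model_predict = IDENTIFIERS[algorithm]
    except KeyError:
        raise ValueError("Unknown algorithm: %r" % algorithm)
    return model_predict(dicom_path)

print(predict("/images/LIDC-0001"))  # hypothetical DICOM path
```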
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

This would work for now. I think the long term solution should be to move the gtr123 preprocessing into preprocess_ct. I will probably not have time to do that until next week though.
1 year, 2 months ago

@dchansen commented on issue #4: Feature: Adapt the grt123 model

It was an error on my side. It's fixed in the repo now.
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

The git lfs issue should be resolved now. I kept the previous code because I was unsure what the long-term plans were regarding the style of the project. Conceivably, this pull request could be changed to fit the style of the previous one. While I don't agree with all the choices made, it was in many ways nicer than what I contributed here. The main reason I did not follow it more closely was one of time, as I just wanted a working adaptation of the gtr123 model as soon as possible. I have fixed all the flake8 issues except the complexity ones. Hopefully, I will have them fixed by the end of the day.
1 year, 2 months ago

@dchansen commented on issue #4: Feature: Adapt the grt123 model

I have the full model adapted for Python3 in the pull request at https://github.com/concept-to-clinic/concept-to-clinic/pull/122. The nodule identification part is extremely memory-hungry. Be sure to enable the volatile setting on the Variables and reduce the chunk size in the SplitComb.
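A minimal sketch of the volatile trick (an editor's illustration, assuming the legacy pre-0.4 PyTorch API in use at the time; the network is a placeholder):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

model = nn.Conv3d(1, 8, kernel_size=3, padding=1)  # placeholder network

# volatile=True told old PyTorch that gradients would never be needed, so
# intermediate activations were not kept during the forward pass, sharply
# reducing inference memory. (Modern PyTorch replaces this with
# `with torch.no_grad():`.)
chunk = Variable(torch.randn(1, 1, 96, 96, 96), volatile=True)
output = model(chunk)
```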
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification and classification algorithms

I am a little unsure why, but the conflict between Tensorflow and PyTorch seems to be gone. It was related to this issue in Tensorflow: https://github.com/tensorflow/models/issues/523
1 year, 2 months ago

@dchansen commented on PR #122: Converted Team 'grt123' identification algorithm

@isms I agree that it's not a feasible compromise at all. I did it mainly to highlight the issue. I think we should have a single deep learning framework for the project as a whole, as they don't really play well together. I am not sure if Keras is the best choice, but either way, the project leads should make the decision. Perhaps we should open an issue? Converting the model itself into Keras would be relatively easy, but converting the weights is somewhat more involved. Converting to Caffe2 or CNTK would be easy, as they all support the ONNX format.
1 year, 2 months ago

@dchansen opened a new pull request: #122: Converted Team 'grt123' identification algorithm

## Description

This pull request implements the grt123 identification algorithm. It is based on PyTorch, which currently conflicts with Tensorflow, so the pull request also disables the Keras import. While the algorithm is quite fast on GPU, it is painfully slow (but working) on the CPU due to a bug in PyTorch. The bug is fixed in the PyTorch development branch, but the next release is about 2 months off.

## Reference to official issue

https://github.com/concept-to-clinic/concept-to-clinic/issues/4
https://github.com/concept-to-clinic/concept-to-clinic/issues/1

## Motivation and Context

This pull request implements part of the prediction service (https://concept-to-clinic.readthedocs.io/en/latest/design-doc.html#detect-prediction-service).

## How Has This Been Tested?

The algorithm has largely been tested by manually confirming about 5 random LUNA2016 datasets - as the loaded model has already been tested extensively elsewhere, this was deemed sufficient. It has been tested both outside of Docker, on a Titan X GPU, and inside Docker on an Intel i7 CPU.

## Screenshots (if appropriate):

## CLA

- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
1 year, 2 months ago

@dchansen commented on PR #119: Add function that loads MetaImage files

I would strongly support moving to SimpleITK for everything, as it correctly deals with things like calibration curves, orientations, etc., and gives us an easy way to pass along images and their metadata in one package.
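A minimal sketch of what that buys us (an editor's illustration; the file path is a placeholder):

```python
import SimpleITK as sitk

# SimpleITK loads a MetaImage (.mhd/.raw pair) as one object that carries
# the voxel data and the geometric metadata together.
image = sitk.ReadImage("scans/candidate.mhd")  # placeholder path

voxels = sitk.GetArrayFromImage(image)  # numpy array in (z, y, x) order
print(image.GetSpacing())    # physical voxel size in mm
print(image.GetOrigin())     # physical position of the first voxel
print(image.GetDirection())  # orientation cosines
```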
1 year, 2 months ago

@dchansen commented on issue #1: Feature: Implement identification algorithm

My code is available at https://github.com/dchansen/concept-to-clinic. Note that it covers only the first part (identification) and not the second part. I will try to iron out the last bugs and have a pull request ready tomorrow.
1 year, 3 months ago

@dchansen commented on issue #1: Feature: Implement identification algorithm

I have adapted the grt123 code to do this and implemented it in my concept-to-clinic repo. The results are not perfect, but certainly much better than nothing. I am a little unclear on what is required beyond providing the basic algorithm. Should I verify it on the LUNA dataset? I would also be unable to provide training details, as I am simply using the pretrained model. Finally, PyTorch and Tensorflow do not play nice together, so I am currently having to disable the Keras imports in the classification algorithm.
1 year, 3 months ago

@dchansen commented on issue #1: Feature: Implement identification algorithm

The grt123 team's solution is available, including a serialized version. Would that be acceptable for this feature?
1 year, 3 months ago