@tdraebing

Signed up on Aug. 4, 2017

Points

| Timestamp | Points | Contributor | Ad-hoc | References |
| --- | --- | --- | --- | --- |
| Sept. 11, 2017 | 3 | @tdraebing | No | Issue #24, PR #107 |
| Sept. 5, 2017 | 5 | @tdraebing | No | Issue #58, PR #70 |
| Aug. 28, 2017 | 2 | @tdraebing | No | Issue #58 |
| Aug. 23, 2017 | 20 | @tdraebing | No | Issue #38, PR #57 |
| Aug. 12, 2017 | 20 | @tdraebing | No | Issue #12, PR #44 |

Activity

@tdraebing commented on issue #121: Use SimpleITK to load DICOM-images

SimpleITK, as a port from C/C++, is for sure not really pythonic, and having to access metadata through the encoded tags is a pain. My idea was to keep the number of dependencies small and to be more consistent in which packages we use for which tasks. Also, SimpleITK uses C in the backend as far as I understand and might therefore be faster, although that would matter more if we also did the preprocessing/transformations with SimpleITK. I could try some benchmarks on the weekend.
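For reference, a minimal sketch of how such a benchmark could load a DICOM series with SimpleITK; the directory path is a placeholder, not a path from the repo:

```python
import SimpleITK as sitk

series_dir = "tests/assets/test_image_data"  # placeholder path

# Discover the files belonging to the series and read them as one volume.
reader = sitk.ImageSeriesReader()
file_names = reader.GetGDCMSeriesFileNames(series_dir)
reader.SetFileNames(file_names)
image = reader.Execute()

# Convert to a numpy array (indexed z, y, x) for further processing.
voxels = sitk.GetArrayFromImage(image)
print(voxels.shape)
```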
1 year, 2 months ago

@tdraebing commented on PR #119: Add function that loads MetaImage files

Awesome. Thanks for checking up on the license! I opened a new issue to deal with loading the DICOM-images with SimpleITK.
1 year, 2 months ago

@tdraebing created a new issue: #121: Use SimpleITK to load DICOM-images

## Expected Behavior

DICOM-images and -series should be loaded using the `SimpleITK`-package.

## Current Behavior

Currently DICOM-images are loaded into memory using `pydicom` and `dicom-numpy` in `load_dicom.py`. A pull request is on its way.

## Checklist before submitting

- [ ] I have confirmed this using the officially supported Docker Compose setup using the `local.py` configuration and ensured that I built the containers again and they reflect the most recent version of the project at the `HEAD` commit on the `master` branch
- [ ] I have searched through the other currently open issues and am confident this is not a duplicate of an existing bug
- [ ] I provided a **minimal code snippet** or list of steps that reproduces the bug
- [ ] I provided **screenshots** where appropriate
- [ ] I filled out all the relevant sections of this template
1 year, 2 months ago

@tdraebing commented on PR #119: Add function that loads MetaImage files

Hi @vessemer, thanks for the pull request. I am currently pretty swamped with the last months of my PhD, so I didn't manage to finish my pull request in the last few days. Based on my work on the issue, I have some further suggestions:

- The `load_dicom`-function uses `pydicom` and `dicom-numpy` to load the DICOM-images. SimpleITK also has a method implemented to do so (which might even be more tolerant of compressed images). Fully switching to SimpleITK would reduce the number of dependencies and make the code easier to maintain and follow. Some changes in `crop_dicom` and `preprocess_dicom` would also be needed. This might be handled in a new issue.
- In your `MetaData`-class you calculate the slice thickness. There actually is a value for this in the metadata of the DICOM-images: `0018|0050` (or, for a pydicom-object, `SliceThickness`); see the sketch after this comment.
- The `load_meta`-function seems to be quite redundant with `load_ct`. Maybe adding a flag (like `voxel=True` if the voxel data should be returned as well) would be better here.
- This is just my personal preference, but maybe it would make sense to split the logic in `MetaData.__init__()` into separate method(s). That would make it easier to extend the object later on, for example if patient data, which is also stored in the metadata, becomes relevant in the interface.
- From your tests I read that you added data from the LUNA16-competition. Did you check whether using it in other projects is fine under their license?
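A minimal sketch of reading the slice thickness both ways; the file name is a placeholder:

```python
import pydicom
import SimpleITK as sitk

path = "slice_0001.dcm"  # placeholder file name

# pydicom exposes the slice thickness as a named attribute.
ds = pydicom.dcmread(path)
print(float(ds.SliceThickness))

# SimpleITK exposes the same value under the raw DICOM tag 0018|0050.
reader = sitk.ImageFileReader()
reader.SetFileName(path)
reader.ReadImageInformation()
print(float(reader.GetMetaData("0018|0050")))
```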
1 year, 2 months ago

@tdraebing commented on issue #98: Add function that loads MetaImage files

I would start working on that. Are there any image files in those formats in the larger data set that will be used for model training/testing later on, which we could use for tests? Otherwise I would try to find some.
1 year, 3 months ago

@tdraebing opened a new pull request: #107: Added documentation for the MDai algorithm.

## Description

I added documentation for the model created by MD.ai for the Kaggle Data Science Bowl 2017. The documentation given by the authors only included instructions to run the pipeline, but did not comment on the model architecture or the thoughts behind the design. I went through the code to find out how the model is designed. If you find that I missed something important, please feel free to comment on or extend the pull request.

## Reference to official issue

Issue #24

## Motivation and Context

This algorithm was chosen as a candidate to be implemented into the software, since it was among the top 10 of the DSB 2017. The documentation should help in deciding which algorithm should be included, and help contributors who want to implement it into the software.

## How Has This Been Tested?

PyCharm renders the *.md-file correctly.

## CLA

- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
1 year, 3 months ago

@tdraebing commented on PR #70: Added first draft of template for documentation of prediction algorithm

I removed the question of feasibility for now, since this might also be commented on in the section dealing with adapting the model to our needs. I also changed part of the dependency section into a table layout.
1 year, 3 months ago

@tdraebing commented on PR #70: Added first draft of template for documentation of prediction algorithm

Yes, the second one should contain detailed steps or comments. The point "Prerequisites - is porting to Python 3.5 feasible?" should be answered with yes/no. It was mainly put there in case an algorithm is not already written in any version of Python. Would that case even occur, or is it out of scope? If we only plan to use algorithms already implemented in Python anyway, I would suggest removing this point.
1 year, 3 months ago

@tdraebing commented on PR #70: Added first draft of template for documentation of prediction algorithm

Ok, I added an extra point for the ML framework into the "Prerequisites"-section.
1 year, 3 months ago

@tdraebing commented on PR #57: Reducing test set size and adding a script to crop dicom images.

I am not very familiar with `lfs`, but as I understand it, right now `lfs` scans for all `*.dcm`-files in the repo folder, as specified in the `.gitattributes`-file. Thus the small dataset would also be handled by `lfs`, and `git lfs ls-files` indeed shows that the small datasets are managed by `lfs`. I am not completely sure how to fix this correctly without making a mistake. I guess I would have to follow this guide: https://help.github.com/articles/removing-files-from-git-large-file-storage/ and change the entry in `.gitattributes` to a more specific path. But as I understand it, this will completely remove the files from the repo. Would changing the `.gitattributes`-file and committing the image files again be enough to replace the `lfs`-pointers in the repo with the proper files? Could anybody comment on that? (A sketch of the commands is below.)
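A hedged sketch of what narrowing the LFS pattern might look like; all paths here are hypothetical, and per the linked guide the files already stored as LFS objects may additionally need to be rewritten out of the history:

```sh
# Stop tracking every *.dcm in the repo with LFS...
git lfs untrack "*.dcm"
# ...and track only the full-size originals (hypothetical path).
git lfs track "tests/assets/full_dataset/**/*.dcm"
git add .gitattributes

# Re-stage the small test images so they are committed as regular blobs
# (hypothetical path).
git rm --cached tests/assets/small_dataset/*.dcm
git add tests/assets/small_dataset/*.dcm
git commit -m "Track only the full-size DICOM files with LFS"

# Verify which files LFS still manages.
git lfs ls-files
```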
1 year, 3 months ago

@tdraebing commented on PR #70: Added first draft of template for documentation of prediction algorithm

You are right, that should be a separate point. It would show up in the dependencies section, but that might not be easy or convenient to find for somebody reading the docs. Before I add that point: is the aim of the project to support several frameworks, or would you rather rewrite the models to fit one backend of choice (e.g. all to Keras with a TensorFlow backend)? In the latter case I would add that part to the "Adaptation"-section.
1 year, 3 months ago

@tdraebing commented on issue #58: Create a template for algorithm documentation

I created a pull request with a first draft of the template. If you have ideas on how to improve it, please feel free to comment on it or to directly add a commit.
1 year, 3 months ago

@tdraebing opened a new pull request: #70: Added first draft of template for documentation of prediction algorithm

## Description

I added a first draft of a template for documenting candidates of prediction algorithms that could be included in this project.

## Reference to official issue

Issue #58

## Motivation and Context

The template contains the following sections:

- A short summary
- Source of the algorithm
- License
- Prerequisites
- Description of the algorithm design
- Reference/User instructions for the trained model
- Model performance
- Use cases
- Changes that are needed for adaptation into this project
- Misc. comments
- References

## How Has This Been Tested?

The markdown is displayed correctly in PyCharm. This is just a first draft and I encourage everybody to comment or commit possible improvements to this pull request.

## CLA

- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
1 year, 3 months ago

@tdraebing created a new issue: #58: Create a template for algorithm documentation

## Expected Behavior

Future documentation of detection algorithms (Issues #16, #18, #19, #20, #21, #22, #23, #24, #25, #26, #27 and #28) should have a consistent structure and make it easy to compare the different algorithms.

## Current Behavior

The issues mentioned above ask for documentation of algorithms from the Data Science Bowl. If addressed by several people, the documentation of each algorithm would be inconsistent and messy, making it unnecessarily hard to read and compare the algorithms. Thus the advantage over just using the original documentation would be minimal.

## Possible Solution

Create a template file specifying the sections and, as much as possible, the content to be filled in. This issue thread is also thought of as a place to discuss which information about the algorithms should be included in the documentation.

## Possible Implementation

Add a `template_algorithms.md` file containing the template to the `docs/template`-folder.
1 year, 4 months ago

@tdraebing commented on PR #57: Reducing test set size and adding a script to crop dicom images.

I updated the pull request. The docstrings now follow Google-style formatting, and the full test set was included again next to the cropped dataset. I also updated the docker configuration to load the small dataset into the `/images`-folder when running tests.
1 year, 4 months ago

@tdraebing commented on issue #38: Create a smaller test image dataset

That seems like a good idea. I will update the pull request to change the docstring styling and include both test set sizes. I think it will also make sense to add a short readme file to the assets folder giving the specifics for anyone who wants to fiddle around with the test data.
1 year, 4 months ago

@tdraebing commented on issue #12: Add function that loads DICOM images

I gave this issue some more thought. I think for future tasks it might be helpful, instead of just handing over an array with the pixel data, to create an object that also contains some useful metadata, and that could serve as a place where future prediction algorithms store the positions of the nodules. What do you think? I would be happy to refactor the script.
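A rough sketch of what such an object might look like; the class and attribute names are hypothetical, not existing code:

```python
import numpy as np


class DicomSeries(object):
    """Hypothetical container for a loaded DICOM series."""

    def __init__(self, voxels, spacing, slice_thickness):
        self.voxels = voxels                    # numpy array, shape (z, y, x)
        self.spacing = spacing                  # voxel spacing in mm (z, y, x)
        self.slice_thickness = slice_thickness  # from DICOM tag 0018|0050
        # Positions of detected nodules (voxel coordinates), to be
        # filled in later by prediction algorithms.
        self.nodules = []
```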
1 year, 4 months ago

@tdraebing commented on issue #38: Create a smaller test image dataset

I finally finished the pull request, but I have some concerns that I wanted to share before it is merged. While the reduced size of the dataset is no issue for testing tooling or similar tasks, it might reduce the dataset's information content considerably, which could cause issues for training models etc. For example, a lot of positional information gets lost: where in the lung is the nodule/tumor? Also, the nodules annotated in the *.xml-files are spread pretty widely, so we lost a few here. In the current pull request the *.xml-files, which could be useful for supervised learning, are not included. If you think they should be included, I will happily update them to the new test set and add them in a new commit. I would also suggest creating a new issue that deals with reading the xml-data into memory. Please let me know what you think. Cheers, Thomas
1 year, 4 months ago

@tdraebing opened a new pull request: #57: Reducing test set size and adding a script to crop dicom images.

Replaced large test images with smaller ones to decrease the repo size.

## Description

I decreased the size of all three data sets by reducing the number of images (cropping in the z-dimension). For the data sets LIDC-IDRI-0001 and LIDC-IDRI-0003 I chose regions that contain a big mass inside the lung, which is probably a tumor. LIDC-IDRI-0002 does not contain such a mass, so I chose a range of slices that contain nodules as stated by the *.xml file. (Disclaimer: I am not a medical professional. As a biologist I only have very limited knowledge about diagnosing lung cancer on CT-images.) The *.xml files were removed from the data set for now. The data sets are in the same location as the original data sets; the names changed to be more descriptive and now represent the depth of the slices in mm instead of a non-consecutive id. Additionally, a script was added at `prediction/src/preprocess/crop_dicom.py` that allows cropping of dicom series in the x-, y- and z-dimensions for future use. Existing tests were updated accordingly.

## Reference to official issue

Issue #38

## Motivation and Context

The datasets were quite large, and smaller datasets substantially decrease the repo size. The created script might also be useful for preprocessing in prediction tasks.

## How Has This Been Tested?

Unit tests for the new script were created to check whether the expected output is created. Existing tests were updated to deal appropriately with the smaller test sets (one test relied on a *.xml-file being in the folder). The other tests resulted in no errors. The smaller datasets can be opened using an external DICOM viewer (RadiAnt).

## CLA

- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
1 year, 4 months ago

@tdraebing commented on issue #38: Create a smaller test image dataset

I had a look into this and found that the image dimensions are stored in the metadata of each *.dcm-file. The xml-file denotes the positions of the marked nodules and is not needed to load the images. `pydicom` actually does not care if you delete images of a series, but `dicom-numpy` does, since it tests whether the distance between slices is constant. Since the numbering of the files is not in the same order as the z-positions of the slices, simply deleting files does not work here. I wrote a function that can reduce the dimensions of a series in x, y and z. I think this function may be useful in general, because new test image sets might be needed for different tasks. I will create smaller test sets asap and open a pull request. Should I include the script in this pull request as well? And what should happen to the full data sets? I would propose changing the docker script to only include the small dataset when running the server, but keeping the original files in the repo.
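A rough sketch of such a cropping function, assuming uncompressed pixel data; the names are illustrative and this is not the actual `crop_dicom.py` implementation:

```python
from pathlib import Path

import pydicom


def crop_series(src_dir, dst_dir, z_start, z_stop, y_slice, x_slice):
    """Crop a DICOM series in x, y and z and write the result to dst_dir."""
    # Sort the slices by their physical z-position, since the file
    # numbering does not follow the slice order.
    slices = [pydicom.dcmread(str(p)) for p in Path(src_dir).glob("*.dcm")]
    slices.sort(key=lambda ds: float(ds.ImagePositionPatient[2]))

    for i, ds in enumerate(slices[z_start:z_stop]):
        # Crop the pixel data and update the metadata to match.
        cropped = ds.pixel_array[y_slice, x_slice]
        ds.PixelData = cropped.tobytes()
        ds.Rows, ds.Columns = cropped.shape
        ds.save_as(str(Path(dst_dir) / "slice_{:03d}.dcm".format(i)))


# Example call: keep slices 10-39 and a 300x300 pixel window.
# crop_series("full_set", "small_set", 10, 40,
#             slice(100, 400), slice(100, 400))
```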
1 year, 4 months ago

@tdraebing commented on PR #44: Implemented function to load a dicom series into memory as a numpy-array

Thanks for the help with making the test images available on the server. I implemented the changes and linted the code.
1 year, 4 months ago

@tdraebing opened a new pull request: #44: Implemented function to load a dicom series into memory as a numpy-array

## Description

This pull request adds the `load_dicom`-function in `./prediction/src/preprocess`, which allows a DICOM-series at a given folder path to be loaded into memory as a numpy-array using the `pydicom` and `dicom_numpy`-packages. The function reraises exceptions from `pydicom` and `dicom_numpy` in case of corrupt DICOM-files, or raises an `EmptyDicomSeriesException`, implemented in `./prediction/src/preprocess/errors.py`, if the given folder does not contain `*.dcm`-files. More extensive error handling might be necessary in later development stages. (A rough sketch of the approach is below.)

## Reference to official issue

Issue #12

## Motivation and Context

The change allows loading of DICOM-image series into memory for further processing.

## How Has This Been Tested?

Unit tests using `pytest` were added at `./prediction/src/tests/test_load_dicom.py`. The tests ensure that files are read correctly by `pydicom` and `dicom_numpy` or that an exception is raised, and that the output is a numpy nd-array with the right number of images in the z-dimension. To be able to run those tests, one test case from `./tests/assets` was copied into the `./prediction/src/tests/assets`-folder so it is reachable using the current docker-configuration. If there is a more elegant solution to this, please tell :-).

## CLA

- [x] I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
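A rough sketch of the approach described above, using dicom_numpy's `combine_slices`; the exception class is defined inline here for self-containment, mirroring the one named in the PR:

```python
import glob
import os

import dicom_numpy
import pydicom


class EmptyDicomSeriesException(Exception):
    """Raised when a folder contains no *.dcm files (as named in the PR)."""


def load_dicom(folder_path):
    """Load a DICOM series from folder_path into a numpy array."""
    paths = glob.glob(os.path.join(folder_path, "*.dcm"))
    if not paths:
        raise EmptyDicomSeriesException(
            "No *.dcm files found in {}".format(folder_path))

    # pydicom raises on corrupt files; dicom_numpy raises a
    # DicomImportException for inconsistent series. Both simply
    # propagate to the caller here.
    datasets = [pydicom.dcmread(p) for p in paths]
    voxels, ijk_to_xyz = dicom_numpy.combine_slices(datasets)
    return voxels
```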
1 year, 4 months ago