05/02/2016
Data Second Annual Data Science Bowl | Kaggle
$200,000 • 460 teams
Second Annual Data Science Bowl Mon 14 Dec 2015
Merger and 1st Submission Deadline
Mon 14 Mar 2016 (38 days to go)
Data Files

File Name                        Available Formats
validate                         .zip (5.16 gb)
train                            .zip (12.71 gb)
train.csv                        .zip (3.05 kb)
sample_submission_validate.csv   .zip (3.12 kb)

In this dataset, you are given hundreds of cardiac MRI images in DICOM format. These are 2D cine images, each containing approximately 30 frames across the cardiac cycle. Each slice is acquired on a separate breath hold, so the registration from slice to slice is expected to be imperfect. The competition task is to create an automated method capable of determining the left ventricle volume at two points in time: at the end of systole, when the heart is contracted and the ventricles are at their minimum volume, and at the end of diastole, when the heart is at its largest volume.
The volumes at systole, $V_S$, and diastole, $V_D$, form the basis of an important clinical measurement known as the ejection fraction:

$$100 \cdot \frac{V_D - V_S}{V_D}$$

This quantity represents the fraction of outbound blood pumped from the heart with each heartbeat. An ejection fraction that is too low can signify a wide range of cardiac problems.
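
As a minimal sketch, the ejection fraction defined above can be computed directly from the two volumes. The function name and the example volumes are illustrative assumptions, not part of the competition materials.

```python
def ejection_fraction(v_diastole: float, v_systole: float) -> float:
    """Return the ejection fraction in percent: 100 * (VD - VS) / VD."""
    if v_diastole <= 0:
        raise ValueError("diastolic volume must be positive")
    return 100.0 * (v_diastole - v_systole) / v_diastole


# Illustrative volumes in mL (made-up values, not taken from the dataset).
ef = ejection_fraction(v_diastole=120.0, v_systole=50.0)
print(round(ef, 1))  # 58.3
```
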
Variations in anatomy, function, image quality, and acquisition make automated quantification of left ventricle size a challenging problem. You will encounter this variation in the competition dataset, which aims to provide a diverse representation of cases. It contains patients from young to old, images from numerous hospitals, and hearts from normal to abnormal cardiac function. A computational method which is robust to these variations could both validate and automate the cardiologists' manual measurement of ejection fraction.

This is a two-stage competition. In the first stage, you are building models based on the training dataset, and testing your models by submitting predictions on the validation set. Two weeks before the final deadline, you will submit your model to Kaggle. At this point, the second stage of the competition starts. Kaggle will release the final test dataset, on which you will run your models. The final standings are based on this final test set.

File descriptions

Each case has an associated directory of DICOM files. The exact number of images will differ from case to case, either varying in the number of slices, the views which are captured, or the number of frames in the time sequences. The main view for assessing ventricle size is the short axis stack, which contains images taken in a plane perpendicular to the long axis of the left ventricle. These have the prefix "sax_" in the competition dataset. Most cases also have alternative views, which you should feel free to incorporate into your methodology. The structure is as follows:

train.zip - the train set directory, contains cases where you will have the associated systolic and diastolic volumes
validate.zip - the validation set directory, used for the leaderboard in stage one of the competition. You should predict the volumes for these cases during stage one.
test.zip - the test set, used for the leaderboard in stage two of the competition (a.k.a. the final standings). You should predict the volumes for these cases during stage two. This file will not be released until the second stage.
train.csv - contains the systolic and diastolic volumes for the cases in the training set.
sample_submission_validate.csv - a sample submission file in the correct format for stage one
sample_submission_test.csv - a sample submission file in the correct format for stage two. This file will not be released until the second stage.

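
The layout described above could be explored with a sketch like the following. It searches a case directory recursively for "sax_" directories rather than assuming an exact nesting, and it assumes train.csv has header columns named Id, Systole, and Diastole; both the column names and the function names are assumptions for illustration.

```python
import csv
from pathlib import Path


def short_axis_dirs(case_dir):
    """Return directories under case_dir whose name starts with "sax_".

    Results are sorted by path, which is lexicographic ("sax_10" sorts before "sax_5").
    """
    return sorted(p for p in Path(case_dir).rglob("sax_*") if p.is_dir())


def load_volumes(csv_path):
    """Read the training volumes into {case_id: (systole, diastole)}.

    Assumes columns named Id, Systole, Diastole (an assumption, check the file header).
    """
    with open(csv_path, newline="") as f:
        return {
            row["Id"]: (float(row["Systole"]), float(row["Diastole"]))
            for row in csv.DictReader(f)
        }
```

Searching by prefix keeps the alternative views out of the short axis stack while leaving them on disk for methods that want to use them.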
DICOM

The DICOM standard is complex and there are a number of different tools to work with DICOM files. You may find the following resources helpful for managing the competition data:

The lite version of OsiriX is useful for viewing images on OSX
pydicom - a package for working with images in python
oro.dicom - a package for working with images in R
Mango is a useful DICOM viewer for Windows

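
Before handing files to one of the tools above, a quick sanity check can filter out files that are clearly not DICOM. The sketch below relies on the DICOM file format convention of a 128-byte preamble followed by the four bytes "DICM". It uses only the standard library, the function name is hypothetical, and it is no substitute for a real parser such as pydicom, since some files in the wild omit the preamble.

```python
def looks_like_dicom(path):
    """Return True if the file has a 128-byte preamble followed by b"DICM"."""
    with open(path, "rb") as f:
        header = f.read(132)
    # Files shorter than 132 bytes cannot carry the preamble plus the marker.
    return len(header) == 132 and header[128:132] == b"DICM"
```
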
FAQ

We will add to this section as relevant common questions arise.

How do I know where the left ventricle is? How do I compute its volume?
Watch this video for a primer on the anatomy and process used by clinicians: Second Annual Data Science Bowl Competition Tutorial ...

I see more than one series at the same slice location. How should we deal with those cases?
Generally, a slice location is repeated if there is an artifact on the images. You can use either slice but the odds are that the last slice at a given slice location is the best the technologist could acquire.

Some MRI images are not consistent (in size, shape, or structure). What should we do about these?
We have opted to include as many cases as possible in this dataset. As this is real data from many sources, it is bound to have some amount of unwanted variability. You should do your best to handle these files. Since this is a two-stage competition and the test set may have unseen abnormalities, we recommend including some form of error catching as you write your code.

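
The two recommendations above (prefer the last series at a repeated slice location, and catch errors on unexpected cases) might be sketched as follows. This assumes that a higher series number means a later acquisition, and that a single fallback prediction is acceptable for cases that fail; all function and parameter names are hypothetical.

```python
def last_series_per_location(series):
    """Keep one series per slice location.

    series: iterable of (series_number, slice_location) pairs.
    Assumes the highest series number at a location is the last one acquired.
    Returns {slice_location: series_number}.
    """
    chosen = {}
    for number, location in series:
        if location not in chosen or number > chosen[location]:
            chosen[location] = number
    return chosen


def predict_all(case_ids, predict_case, fallback):
    """Run predict_case on every case, substituting fallback when it raises.

    Returns (results, failed), where failed lists (case_id, error text) pairs
    so that problem cases can be inspected later instead of aborting the run.
    """
    results, failed = {}, []
    for case_id in case_ids:
        try:
            results[case_id] = predict_case(case_id)
        except Exception as exc:  # deliberately broad: unseen abnormalities
            failed.append((case_id, repr(exc)))
            results[case_id] = fallback
    return results, failed
```

Recording the failures rather than silently swallowing them makes it possible to tell how many predictions came from the fallback.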
Citation

The data for the Data Science Bowl is available for research and academic pursuits. Please cite as ‘Data Science Bowl Cardiac Challenge Data’.
© 2016 Kaggle Inc
https://www.kaggle.com/c/secondannualdatasciencebowl/data