Data - Second Annual Data Science Bowl | Kaggle

05/02/2016


$200,000 • 460 teams

Second Annual Data Science Bowl Mon 14 Dec 2015


Merger and 1st Submission Deadline

Mon 14 Mar 2016 (38 days to go)


Data Files

File Name                         Available Formats
validate                          .zip (5.16 GB)
train                             .zip (12.71 GB)
train.csv                         .zip (3.05 KB)
sample_submission_validate.csv    .zip (3.12 KB)

In this dataset, you are given hundreds of cardiac MRI images in DICOM format. These are 2D cine sequences that each contain approximately 30 images across the cardiac cycle. Each slice is acquired on a separate breath hold, so the registration from slice to slice is expected to be imperfect. The competition task is to create an automated method capable of determining the left ventricle volume at two points in time: at the end of systole, when the heart is contracted and the ventricles are at their minimum volume, and at the end of diastole, when the heart is at its largest volume.
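To make the two target quantities concrete, here is a minimal Python sketch. It assumes you already have one left ventricle volume estimate per frame of a cine sequence; producing those estimates is the actual competition problem and is not shown. The function name `systole_diastole_volumes` and the sample numbers are made up for illustration.

```python
from typing import Sequence, Tuple


def systole_diastole_volumes(frame_volumes_ml: Sequence[float]) -> Tuple[float, float]:
    """Return (systolic, diastolic) volume from per-frame volume estimates.

    The systolic volume is taken as the minimum over the cardiac cycle and
    the diastolic volume as the maximum, following the task description.
    """
    if len(frame_volumes_ml) == 0:
        raise ValueError("need at least one frame volume")
    return min(frame_volumes_ml), max(frame_volumes_ml)


# Hypothetical per-frame volumes in mL (not real data).
example = [150.0, 120.0, 80.0, 60.0, 90.0, 140.0]
v_systole, v_diastole = systole_diastole_volumes(example)
print(v_systole, v_diastole)
```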


The volumes at systole, $V_S$, and diastole, $V_D$, form the basis of an important clinical measurement known as the ejection fraction:

$$100 \cdot \frac{V_D - V_S}{V_D}.$$

This quantity represents the fraction of outbound blood pumped from the heart with each heartbeat. An ejection fraction that is too low can signify a wide range of cardiac problems.
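The ejection fraction, 100 * (V_D - V_S) / V_D, is simple to compute once both volumes are known. The sketch below is one way to write it in Python; the function name and the input checks are illustrative choices, not part of the competition materials.

```python
def ejection_fraction(v_diastole: float, v_systole: float) -> float:
    """Ejection fraction in percent: 100 * (V_D - V_S) / V_D."""
    if v_diastole <= 0:
        raise ValueError("diastolic volume must be positive")
    if not 0 <= v_systole <= v_diastole:
        raise ValueError("systolic volume must lie between 0 and the diastolic volume")
    return 100.0 * (v_diastole - v_systole) / v_diastole


# Hypothetical volumes in mL (not real data).
print(ejection_fraction(150.0, 60.0))  # 60.0
```

The checks reject inputs where the systolic volume exceeds the diastolic one, which would otherwise produce a negative fraction.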

Variations in anatomy, function, image quality, and acquisition make automated quantification of left ventricle size a challenging problem. You will encounter this variation in the competition dataset, which aims to provide a diverse representation of cases. It contains patients from young to old, images from numerous hospitals, and hearts from normal to abnormal cardiac function. A computational method which is robust to these variations could both validate and automate the cardiologists' manual measurement of ejection fraction.

This is a two-stage competition. In the first stage, you are building models based on the training dataset, and testing your models by submitting predictions on the validation set. Two weeks before the final deadline, you will submit your model to Kaggle. At this point, the second stage of the competition starts. Kaggle will release the final test dataset, on which you will run your models. The final standings are based on this final test set.

File descriptions

Each case has an associated directory of DICOM files. The exact number of images will differ from case to case, varying in the number of slices, the views which are captured, or the number of frames in the time sequences. The main view for assessing ventricle size is the short axis stack, which contains images taken in a plane perpendicular to the long axis of the left ventricle. These have the prefix "sax_" in the competition dataset. Most cases also have alternative views, which you should feel free to incorporate into your methodology.

The structure is as follows:

train.zip - the train set directory; contains cases for which you have the associated systolic and diastolic volumes
validate.zip - the validation set directory, used for the leaderboard in stage one of the competition. You should predict the volumes for these cases during stage one.
test.zip - the test set, used for the leaderboard in stage two of the competition (a.k.a. the final standings). You should predict the volumes for these cases during stage two. This file will not be released until the second stage.
train.csv - contains the systolic and diastolic volumes for the cases in the training set
sample_submission_validate.csv - a sample submission file in the correct format for stage one
sample_submission_test.csv - a sample submission file in the correct format for stage two. This file will not be released until the second stage.
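Since the short axis views are identified by the "sax_" prefix, a first step is often to collect those directories for a case. The sketch below does this with the Python standard library only. It walks the case directory recursively because the exact nesting inside the archives is not described here; the names `find_sax_dirs`, `case`, `study`, and `other_1` are hypothetical and used only for the demonstration.

```python
import os
import tempfile
from typing import List


def find_sax_dirs(case_dir: str) -> List[str]:
    """Return all directories under case_dir whose name starts with 'sax_'."""
    found = []
    for root, dirnames, _filenames in os.walk(case_dir):
        for name in dirnames:
            if name.startswith("sax_"):
                found.append(os.path.join(root, name))
    # Lexicographic order: note that 'sax_10' sorts before 'sax_5'.
    return sorted(found)


# Demonstration on a throwaway directory tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sax_5", "sax_6", "other_1"):
        os.makedirs(os.path.join(tmp, "case", "study", name))
    for path in find_sax_dirs(os.path.join(tmp, "case")):
        print(os.path.basename(path))
```

Directories without the prefix (the alternative views) are skipped here, but the same walk could collect them if you want to use them.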

DICOM

The DICOM standard is complex and there are a number of different tools to work with DICOM files. You may find the following resources helpful for managing the competition data:

The lite version of OsiriX is useful for viewing images on OS X
pydicom - a package for working with images in Python
oro.dicom - a package for working with images in R
Mango is a useful DICOM viewer for Windows
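As a rough illustration of the pydicom route, the sketch below reads one file and collects a few header fields. It assumes a recent pydicom release, where `pydicom.dcmread` is the reader function (older releases used a different import name). The attribute names are standard DICOM keywords, but whether every file in this dataset carries all of them is an assumption, so missing values come back as `None`. The helper names `load_frame` and `describe_frame` are invented for this example.

```python
from typing import Any, Dict


def load_frame(path: str) -> Any:
    """Read one DICOM file and return the pydicom dataset."""
    # Third-party import kept local so describe_frame works without pydicom.
    import pydicom

    return pydicom.dcmread(path)


def describe_frame(ds: Any) -> Dict[str, Any]:
    """Collect a few standard DICOM attributes, tolerating missing ones."""
    return {
        "series_number": getattr(ds, "SeriesNumber", None),
        "instance_number": getattr(ds, "InstanceNumber", None),
        "slice_location": getattr(ds, "SliceLocation", None),
        "pixel_spacing": getattr(ds, "PixelSpacing", None),
        "rows": getattr(ds, "Rows", None),
        "columns": getattr(ds, "Columns", None),
    }


# Typical use (the path is a placeholder, not a real file in the dataset):
#   ds = load_frame("path/to/some_file.dcm")
#   info = describe_frame(ds)
#   image = ds.pixel_array  # NumPy array of pixel values
```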

FAQ

We will add to this section as relevant common questions arise.

How do I know where the left ventricle is? How do I compute its volume?

Watch this video for a primer on the anatomy and process used by clinicians: Second Annual Data Science Bowl Competition Tutorial ...

I see more than one series at the same slice location. How should we deal with those cases?

Generally, a slice location is repeated if there is an artifact on the images. You can use either slice, but the odds are that the last slice at a given slice location is the best the technologist could acquire.

Some MRI images are not consistent (in size, shape, or structure). What should we do about these?

We have opted to include as many cases as possible in this dataset. As this is real data from many sources, it is bound to have some amount of unwanted variability. You should do your best to handle these files. Since this is a two-stage competition and the test set may have unseen abnormalities, we recommend including some form of error catching as you write your code.
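Two of the answers above translate directly into code: preferring the last series at a repeated slice location, and catching errors per case so one bad file does not stop a whole run. The Python sketch below shows one possible shape for each. It assumes that "last" can be approximated by the highest series number, and that rounding slice locations to 0.1 is a reasonable way to group them; both are assumptions, as are all function and parameter names.

```python
from typing import Callable, Dict, Iterable, Tuple


def keep_last_series_per_location(
    series: Iterable[Tuple[int, float]],
) -> Dict[float, int]:
    """Map each slice location to the highest series number seen there.

    `series` holds (series_number, slice_location) pairs.
    """
    chosen: Dict[float, int] = {}
    for series_number, location in series:
        key = round(location, 1)  # assumed tolerance for float noise
        if key not in chosen or series_number > chosen[key]:
            chosen[key] = series_number
    return chosen


def predict_with_fallback(
    case_ids: Iterable[str],
    predict: Callable[[str], Tuple[float, float]],
    fallback: Tuple[float, float],
) -> Dict[str, Tuple[float, float]]:
    """Run `predict` on every case, substituting `fallback` when it fails."""
    results: Dict[str, Tuple[float, float]] = {}
    for case_id in case_ids:
        try:
            results[case_id] = predict(case_id)
        except Exception as exc:  # broad on purpose: unseen abnormalities
            print(f"case {case_id}: using fallback after error: {exc}")
            results[case_id] = fallback
    return results
```

A sensible `fallback` might be a typical (systolic, diastolic) pair computed from the training labels, so that a failed case still yields a plausible prediction.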

Citation

The data for the Data Science Bowl is available for research and academic pursuits. Please cite as ‘Data Science Bowl Cardiac Challenge Data’.

© 2016 Kaggle Inc


https://www.kaggle.com/c/secondannualdatasciencebowl/data
