Open-Source MRI Dataset from USC Now Available to Researchers
The University of Southern California has released an open-source dataset of anatomical brain images taken from MRIs of stroke victims. The dataset is intended to spur advances in machine learning by providing a large set of manually-traced lesions.
Manually-traced lesions are useful but labor-intensive.
Researchers have attempted to automate lesion segmentation through algorithms. However, this automation is in its primitive stages, and machines cannot yet identify lesions with great accuracy. Thus, manually-traced lesions are still the gold standard, but require a large amount of work from a trained neuroanatomy expert.
USC’s dataset attempts to bridge the gap between human tracers and machines. By providing 304 T1-weighted MRIs with lesions segmented by a human, the study’s authors hope computer programmers can develop an accurate lesion segmentation algorithm. The dataset is available for download free of charge here.
Strokes are a leading cause of death and disability in the United States.
Mortality rates from strokes have steadily declined worldwide, but around two-thirds of stroke survivors suffer long-term disabilities that affect their daily activities. This situation has led scientists to focus on which interventions provide the best outcomes for stroke survivors.
Doctors have opportunities for intervention at both the acute and chronic stages. In the former, intervention can save neural tissue and promote functional recovery. In the latter, rehabilitation can help long-term recovery.
Magnetic resonance imaging can aid doctors in making intervention decisions.
Clinical brain images taken within 24 hours of a stroke help doctors determine whether to administer thrombolytic drugs or perform surgery to save neural tissue. Because clinical scans are taken for almost all stroke victims, there have been great strides in using large-scale datasets of these acute scans for predictive modeling.
Unfortunately, sub-acute and chronic scans are performed less often and are therefore harder to obtain, so predictive modeling at these stages is less advanced. That’s one thing the study’s co-author Sook-Lei Liew would like to change.
“The goal of ATLAS is to generate a dataset that machine learning and computer scientists could use to develop better automated algorithms to identify the lesions,” Liew told Health Data Management.
Machine learning requires large, accurate datasets to train and to test.
Liew hopes that computers will eventually be able to identify biomarkers in stroke patients, making it easier to prescribe the appropriate rehabilitation therapy and treatment. Her next step is to create a separate dataset for testing the algorithms developed using the current dataset.
“In machine learning, you always need a training dataset and a testing dataset,” Liew noted. “Even if people aren’t interested in stroke, it’s also an interesting dataset to train any sort of computer vision algorithm because it’s a challenging problem.”
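To make the training/testing distinction concrete, here is a minimal Python sketch of a hold-out split over 304 scan identifiers. It is an illustration only, not the authors' procedure: the identifier format (`scan_001` and so on), the function name `split_scans`, and the 20 percent test fraction are all assumptions made for this example, and Liew describes building a separate test dataset rather than carving one out of the current release.

```python
import random


def split_scans(scan_ids, test_fraction=0.2, seed=0):
    """Shuffle scan IDs reproducibly and return (train_ids, test_ids).

    Hypothetical helper for illustration; not part of any released tooling.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    ids = sorted(scan_ids)             # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)   # seeded shuffle, independent of global state
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


# Placeholder identifiers standing in for the 304 scans (assumed naming).
scan_ids = [f"scan_{i:03d}" for i in range(1, 305)]
train_ids, test_ids = split_scans(scan_ids)
print(len(train_ids), len(test_ids))
```

The key property is that no scan appears in both groups, so an algorithm is scored only on images it never saw during training. If a dataset contains more than one scan from the same person, splitting by person rather than by scan avoids the same brain appearing on both sides.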
Slabodkin, G. USC Releases MRI Stroke Dataset To Spur AI Research. Health Data Management. Available here. Accessed February 22, 2018.
Liew, S et al. A Large, Open Source Dataset of Stroke Anatomical Brain Images and Manual Lesion Segmentations. Scientific Data. February 20, 2018;5:180011. doi:10.1038/sdata.2018.11 Accessed February 22, 2018.
Feigin, VL et al. Global and Regional Burden of Stroke During 1990-2010: Findings From the Global Burden of Disease Study 2010. The Lancet. January 2014;383(9913):245-255. doi:10.1016/S0140-6736(13)61953-4 Accessed February 22, 2018.