A new cancer database, the Genomic Data Commons, aims to make research more accessible
Cancer research is becoming more accessible due to a new initiative that gives researchers access to big data from thousands of patient cases. In June 2016, the Genomic Data Commons (GDC), a web portal that features information on cancer tumors from more than 30,000 patients, was revealed to the public during the American Society of Clinical Oncology’s annual meeting.
The database is managed by the National Cancer Institute (NCI) at the University of Chicago and is funded by the NCI through President Obama's Precision Medicine Initiative for Oncology. A portion of the GDC data is open to the public, while qualified researchers can access more detailed data, says Louis M. Staudt, MD, PhD, senior investigator at NCI.
“The database includes all common cancer types. There are 50 to 500 or more tumors in each type,” Staudt says. The pediatric database features 29 cancer types and more than 3,000 patient cases.
The GDC is the first open-access cancer database that combines genetic information on cancer tumors with treatment information. In total, the database features 4.1 petabytes, or about 4 million gigabytes, of information. Staudt says that, depending on the type of cancer, some behavioral or environmental data is also collected, such as whether patients with lung cancer are smokers.
“Data sets include genomic data from tumors from patients with cancer, clinical data, treatment and the results from treatment,” says Staudt. “There’s also genomic data mutations in DNA, RNA expressions data, which is the activity of genes in a cancer cell. It’s a multiple platform approach to looking at data.”
Nearly 14,000 patient cases in the GDC came from the NCI. Foundation Medicine, a cancer genome analysis company, donated 18,000 cases, Staudt says.
“We can easily add many more thousands of patients from other public databases,” he says. “We are measuring the success of the Genomic Data Commons attracting other public data sets.”
Creation of the GDC began in 2014 at the University of Chicago Center for Data Intensive Science. Working with the NCI, researchers from the University of Chicago created information frameworks and standards that make raw and processed data from cancer treatment more accessible and easier to understand.
The architects of the GDC used Facebook and Google as inspiration: both platforms house large amounts of diverse data but also focus on the user experience for users who range from computer novices to experts. Though the full data is only open to qualified researchers, it is open to research teams regardless of size or budget.
In 2014, Robert Grossman, GDC principal investigator and professor of medicine and director of the Center for Data Intensive Science at the University of Chicago, stated, “The Genomic Data Commons has the potential to transform the study of cancer at all scales. It supplies the data so that any researcher can test their ideas, from comprehensive ‘big-data’ studies to genetic comparisons of individual tumors to identify the best potential therapies for a single patient.”
Staudt says that increasing the usability is a short-term goal of the project. The GDC also provides an application programming interface that allows developers to access specific data files.
“In the near future, we hope that researchers will be able to ask questions on the site, and visualize the data,” Staudt says. “Also, the massive size of the data precludes many researchers.”
Here are four ways the GDC could advance cancer care and other treatments:
1. Make it easier for researchers to advance cancer care. The diversity of data available is important, because often researchers in different groups are analyzing different aspects of cancer and its treatment. The GDC uses algorithms to harmonize and standardize data to ensure that researchers across platforms are able to utilize the data.
2. Improve the understanding of cancer tumors. One of the goals of the GDC is to create a more diverse pool of candidates for clinical trials to get a better understanding of the differences in cancer tumors.
Vice President Joe Biden, who leads President Barack Obama’s “cancer moonshot” initiative, announced the GDC to researchers, clinicians and patient advocates at ASCO. “Increasing the pool of researchers who can access data and decreasing the time it takes for them to review and find new patterns in that data is critical to speeding up development of lifesaving treatments for patients,” Biden said at the event.
3. Serve as a model for future databases. The GDC could be used to create open-source databases for other illnesses, including diabetes, heart disease and Alzheimer’s, says Grossman.
4. Advance precision medicine. In the future, the GDC will support single-person clinical trials, which is a first step in enhancing precision medicine for cancers. Also, GDC architects are working with cloud-based technologies that will allow researchers to perform on-site experiments and remote analyses of large amounts of data.
“Long term, we want to create a knowledge base for cancer and increase clinical data on drugs,” Staudt says. “This is called precision medicine in oncology. We hope to collect data from clinical trials and move forward with some more useful clinical knowledge.”
Grossman, Biden and Staudt touring the Genomic Data Commons. Source: University of Chicago
Donna Marbury is a writer in Columbus, Ohio.