Collecting and manipulating enormous amounts of data soon will begin to play a vital role in research and delivery of health care, according to California leaders of the "big data" movement.
"You're starting to see a crescendo of big data efforts and that's sharply increasing awareness of what that can mean in health care," said David Haussler, director of the Center for Biomolecular Science and Engineering at UC-Santa Cruz.
Last week, UC-Santa Cruz researchers announced plans to create the world's largest repository for cancer genomes. In conjunction with the National Cancer Institute, the Cancer Genomics Hub is designed to provide researchers with a huge and growing database of biomedical information that can be used in "personalized" or "precision" care, in which treatments target specific genetic changes found in an individual patient's cancer cells.
"Big data collection and computing is allowing us for the first time to get a complete molecular characterization of cancer," Haussler said.
"The scale of what we're doing is far beyond anything anybody's been able to put together before. I think you're going to start to see this sort of big data effort on several fronts -- partly because of supercomputing capabilities that we haven't had until recently and also because of wireless devices that are increasingly being used to transmit data," Haussler said.
$10.5 Million Project
The $10.5 million genome project will incorporate genetic information from 10,000 cancer patients fighting 20 different kinds of adult cancer and five childhood cancers.
Managed by Haussler and his UC-Santa Cruz team, the Cancer Genomics Hub, or CGHub, is housed at the San Diego Supercomputer Center. Researchers from other institutions -- including the Massachusetts Institute of Technology and Memorial Sloan-Kettering Cancer Center -- plan to build their own supercomputing systems next to the CGHub to make the most efficient use of the database.
"Known as co-locating, this has become a popular solution for large institutions with their own research needs," Haussler said.
Haussler said he hopes eventually to get some or all of the CGHub data into a cloud computing sphere. "It would be great to have that ability to create a sort of Amazon for health data, but the problem with that right now is that clouds offer a one-style-fits-all kind of computing, which limits the sorts of things you can do with the data," Haussler said.
Changing Research, Delivery Methods
Bringing together previously unheard-of amounts of data has the potential to change the way medical researchers and providers do their jobs, Haussler said.
The cancer genome project has the potential "to change the way cancer is treated. Cancer is incredibly complex, with different kinds of tumors within the same kinds of cancer. By having the ability to examine so many different cancers, we hope to move toward the idea of precision or personalized medicine, treatment tailored to your specific molecular makeup," Haussler said.
Eventually there could be some application crossovers for stem cell research.
"There are some initiatives afoot to bring more genomic technology into stem cell research and ultimately into stem cell treatment," Haussler said.
Big Data Projects, Reports, Events
The Cancer Genomics Hub is one of several big data efforts under way in California and elsewhere in the nation.
The Obama administration announced a month ago that UC-Berkeley and the Lawrence Berkeley National Laboratory will receive a total of $35 million in federal funding to participate in a national initiative to harness large amounts of information -- including health care data -- for research purposes.
The funding is part of the $200 million that six federal agencies have committed to the "Big Data Research and Development Initiative."
The National Science Foundation awarded $10 million to UC-Berkeley's new AMP Lab (algorithms, machines and people) to create open-source software to collect and organize massive amounts of data from a number of federal agencies, including the Department of Defense, CDC and NIH.
Three other notable big data projects, reports and events are:
- A report last month from the Ewing Marion Kauffman Foundation suggested that providing access to big data could help control "America's most urgent public policy problem" -- rising health care costs. "Using proper safeguards, we need to open the information that is locked in medical offices, hospitals and the files of pharmaceutical and insurance companies," said John Wilbanks, Kauffman senior fellow and an author of the report, "Valuing Health Care: Improving Productivity and Quality."
- In a project similar to the UC-Santa Cruz Cancer Genomics Hub, researchers at State University of New York at Buffalo are using big data technology to research multiple sclerosis. Researchers will collect and digest vast amounts of data looking for correlations among genetic, clinical and environmental factors that might help reveal the causes or severity of MS in some individuals.
- The Health Datapalooza next month in Washington, D.C., will examine new methods for collecting and using big data. The two-day event -- June 5 to June 6 -- is the third Health Data Initiative Forum.