Georgia's Online Cancer Information Center

Federate the data: NCI produces master plan for the Childhood Cancer Data Initiative

12/04/2020, Cancer Letter

By Matthew Bin Han Ong

NCI is using a hub-and-spoke structure to engineer an ambitious data federation for pediatric cancers—which, if done right, is anticipated to become the gold standard for a new generation of comprehensive cancer databases.

The Childhood Cancer Data Initiative, which is now on track to be developed over 10 years, is envisioned to gather data from every child, adolescent, and young adult diagnosed with cancer in the United States, regardless of where they receive care.

“The CCDI is a critical new attempt to try to aggregate and understand and develop a broad series of types of data—clinical data, treatment data outcome, molecular biospecimen data—in a longitudinal fashion, from both individual children and young adults, as well as populations of children and young adults to try to share this data in ways that will improve treatment, quality of life, and survivorship for all children with cancer,” NCI Deputy Director Jim Doroshow said Dec. 2 in a virtual meeting of the National Cancer Advisory Board and the Board of Scientific Advisors.

The initiative, first announced by NCI in March 2019, was the institute’s response to the White House’s plan to spend $500 million on pediatric cancer research (The Cancer Letter, Feb. 8, 2019). Congress has appropriated $50 million in fiscal year 2020 for the CCDI, and is expected to continue to do so for another nine years.

“We envision this to be a very high grade dataset that will be useful for real cutting-edge translation and basic research,” NCI Director Ned Sharpless said at the time (The Cancer Letter, March 8, 2019). “This quality, this size, this scope doesn’t exist in any area of biomedical research.”


The stakes in this project are high, since it would serve as a springboard for a much bigger and longer-term goal: to create a comprehensive data federation in oncology. For now, pediatric cancer is a great place to start, experts contend, not only because the number of cases is lower—estimated at about 16,000 a year—but also because there is a pressing need for data aggregation and sharing in a subset of diseases that tend to be rare.

Observers have characterized the CCDI as a heavy lift for any institution, but one that is achievable, because of rapid advancements in data science, analytics, and cloud computing.

Once deployed, a comprehensive data federation would prove to be invaluable for cancer research, speeding up access and collaboration at an unprecedented scale.

Until then, the hard work of harmonizing data systems and creating common data elements is expected to span years, many working groups, and even more data scientists.

A federated data ecosystem

The data federation that NCI has in mind would allow researchers to move seamlessly within an interoperable data commons that would serve as a kaleidoscopic gateway of sorts to other existing data repositories—or, to use NCI-speak, a Federated Pediatric Cancer Data Ecosystem.

When complete, users would have access to a broad array of integrated data and resources—clinical and molecular information, including genomic, phenomic, diagnostic, and preclinical data; as well as biospecimens, including germline samples, PDX models and cell lines, secondary cancers, organoids, and central nervous system tumors.
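The gateway idea described above can be illustrated with a minimal sketch. It is not NCI's design: the `FederatedGateway` class, the repository names, and the record fields are all hypothetical, chosen only to show how one query could fan out to several independently held repositories and return results tagged with their source.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
# A repository is modeled as a search function; real repositories would be remote services.
Repository = Callable[[str], List[Record]]


def make_repository(records: List[Record]) -> Repository:
    """Build a toy in-memory repository (hypothetical stand-in for a real data source)."""
    def search(diagnosis: str) -> List[Record]:
        return [r for r in records if r["diagnosis"] == diagnosis]
    return search


class FederatedGateway:
    """Hypothetical hub: data stay in each repository, queries are fanned out."""

    def __init__(self) -> None:
        self._repos: Dict[str, Repository] = {}

    def register(self, name: str, repo: Repository) -> None:
        self._repos[name] = repo

    def query(self, diagnosis: str) -> List[Record]:
        results: List[Record] = []
        for name, search in self._repos.items():
            for record in search(diagnosis):
                # Tag each hit with the repository it came from.
                results.append({**record, "source": name})
        return results


# Illustrative data only; these are not real repositories or patients.
gateway = FederatedGateway()
gateway.register("repo_a", make_repository([
    {"patient_id": "A1", "diagnosis": "neuroblastoma"},
    {"patient_id": "A2", "diagnosis": "osteosarcoma"},
]))
gateway.register("repo_b", make_repository([
    {"patient_id": "B1", "diagnosis": "neuroblastoma"},
]))

hits = gateway.query("neuroblastoma")
```

In this sketch the hub holds no patient data of its own, which is the property that distinguishes a federation from a single centralized database.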

To make that possible, work is already underway to develop a catalog of all available childhood cancer data registries and establish the National Childhood Cancer Registry to link clinical patient data.

“This will, I think, truly be fundamental at the highest level to understand where the information exists and how we can do a better job of putting it together,” Doroshow said Dec. 2 at the joint meeting. “Initial efforts to lay the foundation for a federated pediatric data ecosystem for research repositories and patient registries has also gotten off the ground.”

Also, initial funds have been used to:

  • Aggregate existing data through transfer of patient-linked clinical and molecular data, and analytic tools to NCI resources,

  • Generate new cancer models and sequence data to fill gaps for key NCI initiatives,

  • Develop or adapt analytic tools and computational methods using grants and contracts for use in childhood and AYA cancer research,

  • Establish a Rare Pediatric Tumor Cell Atlas from tissues obtained through the NCI Pediatric Rare Tumor Network, and

  • Supplement intramural and extramural grants and contracts for enhanced clinical trials data reporting and risk prediction.

The remaining years of development for the CCDI will focus on building a coordination center that would serve as a conduit for three programs: the National Childhood Cancer Cohort, the Childhood Molecular Characterization Protocol, and the Childhood Cancer Data Platform.

The initial data registries and repositories named at the Dec. 2 joint meeting—and identified as foundational contributing sources for the CCDI—include: Kids First, Project:EveryChild, St. Jude Cloud, Ped cBioPortal, Treehouse, and TARGET. These databases, in tandem with the three aforementioned NCI programs, will enable CCDI architects to identify unmet data needs and fill in knowledge gaps.

“The focus here is really allowing deep analytics to occur across the different federated systems and support interoperability among all these different resources,” Warren Kibbe, chief data officer at the Duke Cancer Institute, said at the joint meeting. Kibbe was part of the leadership team that worked on curating the institute’s Genomic Data Commons when he was director of NCI’s Center for Biomedical Informatics and Information Technology (The Cancer Letter, April 4, 2016).

“Some of those activities include developing common data elements, CDEs, making sure that when things are captured in ways that aren’t reflected in common data elements, that we understand how to harmonize them,” Kibbe said Dec. 2. “Of course, the systems themselves need to all be interoperable and data should be able to flow between these different systems. Maximizing data utility for different scientific use cases and making sure that that’s coordinated across all these different activities is incredibly important.”
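To make the harmonization step Kibbe describes concrete, here is a minimal sketch under stated assumptions. The common data element, the two source registries, and their local codes are invented for illustration and are not actual NCI CDE definitions; the point is only that each source's local field and coding are mapped onto one shared element with a fixed set of permitted values.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical CDE: a shared field name plus its permitted values."""
    name: str
    permitted_values: FrozenSet[str]


# Illustrative CDE, not an official definition.
SEX_CDE = CommonDataElement("sex", frozenset({"female", "male", "unknown"}))

# Hypothetical per-source mappings: local field name and local code -> CDE value.
SOURCE_MAPPINGS: Dict[str, Tuple[str, Dict[str, str]]] = {
    "registry_a": ("gender", {"F": "female", "M": "male"}),
    "registry_b": ("sex_code", {"1": "male", "2": "female", "9": "unknown"}),
}


def harmonize(source: str, record: dict, cde: CommonDataElement = SEX_CDE) -> dict:
    """Translate one source record into the shared CDE representation."""
    local_field, value_map = SOURCE_MAPPINGS[source]
    raw = str(record.get(local_field, "")).strip()
    # Values with no known mapping fall back to "unknown" rather than being guessed.
    value = value_map.get(raw, "unknown")
    if value not in cde.permitted_values:
        raise ValueError(f"{value!r} is not permitted for CDE {cde.name!r}")
    return {cde.name: value}


harmonized_a = harmonize("registry_a", {"gender": "F"})
harmonized_b = harmonize("registry_b", {"sex_code": 1})
```

Once both records are expressed in the same element, they can be pooled or queried together, which is what interoperability across systems requires at the field level.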

Doroshow and Kibbe’s presentation slides can be downloaded here.

“A massive undertaking”

Although launched in the Trump era, the CCDI is part of then-Vice President Joe Biden’s legacy to eliminate silos in cancer research and consolidate data that are complementary, but are sequestered in disparate ecosystems.

This latest blueprint by NCI comes four years after Biden, now president-elect, in coordination with bipartisan leadership in Congress, obtained $1.8 billion for NCI to carry out the scientific recommendations that formed the basis of the 2016 Beau Biden Cancer Moonshot (The Cancer Letter, To the Moon).

For instance, Moonshot funds that have been used to develop the NCI Pediatric Rare Tumor Network will ultimately contribute to the CCDI’s Rare Pediatric Tumor Cell Atlas.

NCI’s detailed plan for the CCDI ignited lively discussion at the joint advisory meeting on Wednesday, as the institute’s advisors sought to enunciate the foundational aspects of the initiative, ascertain the feasibility of the portfolio’s vision, and assess NCI’s current bandwidth for taking on a colossal, unprecedented data project.

“As I look at the structure of this, what occurred to me is that this is going to be a massive undertaking,” said Kevin Shannon, chair of the CCDI Working Group, and Auerback Distinguished Professor of Molecular Oncology and a professor in the Department of Pediatrics at the University of California, San Francisco.

“It’s really exciting, but what I’m wondering—I guess I should ask Ned this—is whether there’s been any thought given at the NCI level to setting up an office for this, with a director who actually is, if you will, the CCDI Czar—somebody recruited to NCI from the community to try to bring all these pieces together,” Shannon said at the meeting.

“I’ve known Jim [Doroshow] for a long time and he does a great job. It just seems like taking this on, in addition to everything else he has on his plate, maybe such an individual could be recruited from the intramural program and could report directly to Jim around the CCDI. I mean, I wasn’t saying set it aside from the NCI leadership. I was saying that I think you’re going to need a focal nucleating person to make this really work.”

That would make sense, Sharpless said.

“I do have a concern that parachuting someone in from the outside of the NCI is a little more challenging than it sounds. It has worked in a few instances and it has not worked well in other instances,” Sharpless said. “I thought having Jim Doroshow lead this for a while, given that he’s deputy director of the NCI was in some ways ideal, because Jim really has the ability to tell people what to do inside the agency, and I think that will be important for at least getting the ball rolling with Warren’s very important help.

“But I think you’re right that it will need a lot of people in org charts and bodies, and may need a different leadership structure as it really gets going. And I think we should continue to think about that. As Jim gets this directed in the place where he perhaps feels more comfortable, he may want to consider the structure you endorsed.”

The full presentation by Doroshow and Kibbe follows. Comments from NCI advisors appear after their remarks:

Jim Doroshow:

It’s my pleasure to give the boards this update on the NCI CCDI initiative, a very important activity that has been initiated over the past year or so.

It’s going to be my part of the presentation to update you on the progress from year one of the initiative.

And then Dr. Kibbe is going to talk about the program structure and the goals for future years at the CCDI and also our proposed governance structure that we’re bringing to the board for your insight.

As many, if not all, of the board members remember, the CCDI is a critical new attempt to try to aggregate and understand and develop a broad series of types of data—clinical data, treatment data outcome, molecular biospecimen data—in a longitudinal fashion, from both individual children and young adults, as well as populations of children and young adults to try to share this data in ways that will improve treatment, quality of life, and survivorship for all children with cancer. So, I’m going to review briefly the accomplishments from year one.


Probably the most important thing that happened in the past year or so, as this board knows, was the development by the BSA of a CCDI working group report that really serves as a footprint and a pathway for us to understand and pursue these initiatives, focusing on the 24 specific recommendations that the working group report outlines.

Those recommendations can be grouped into seven major categories. These include the need to bring together and aggregate a wide range of cancer research databases, to look at a variety of different types of data, how to collect that data, how to aggregate them, and analyze those data.

Very importantly, the working group report pointed out potential barriers to progress in the sense that there are large amounts of data in different silos across the country, and provided input into how we should try to overcome those barriers.

Also, quite clearly, the report outlines the need for generating new data, both from patients and also from preclinical models, from molecular characterization to a variety of different kinds of population information that will be essential to understanding how to do the best for the children who have cancer in this country.

Another important area that was emphasized was to understand the distinction between research data, which can be aggregated and analyzed, and also the development of the large amounts of clinical data that may not be organized in the way research and clinical trial data are already organized, but may be in EHRs or in registries.

And the question, really, is how do we get access to that data, put it together and make it available for research purposes and understanding how to improve care.

As important as all of these other initiatives and recommendations was how to do this for and involve a very diverse array of stakeholders and to get input from the entire community of investigators, scientists, parents, children, and advocates.

And finally, the report concludes with a series of potentially transformative opportunities that NCI should think carefully about as it goes forward in this 10-year initiative by using the funds that have been allocated from Congress and to improve these activities.

Year one progress report

I’d like to start out by going through, basically, how did we spend the money? We had $50 million that was appropriated by Congress to initiate this activity and I’m going to give you some broad categories and some specific areas where the monies were utilized, and, in fact, where activities were started, because, as most of you know, these monies became available fairly late in the fiscal year.

And so, in truth, many of these activities are just in their startup phase and have a long way to go in terms of providing information related to the support provided in the past fiscal year.

Among the things that were importantly pointed out in the working group report was the need to develop a catalog of available childhood cancer data registries and data repositories. This initiative, which has gotten off the ground, has completed a landscape analysis of where these data exist, and has started to build a pediatric data catalog prototype.

This will, I think, truly be fundamental at the highest level to understand where the information exists and how we can do a better job of putting it together. Initial efforts to lay the foundation for a federated pediatric data ecosystem for research repositories and patient registries has also gotten off the ground.

Among the most important aspects of this recommendation was to develop a National Childhood Cancer Registry and to link patient data. And not only the data that’s available in large medical centers and pediatric cancer hospitals, but also for patients who are at smaller and rural hospitals where registries exist, but data aggregation has not been something that’s been possible to date.

And we also are very interested, as pointed out by the working group report, in trying to aggregate the large amount of preclinical data, both to inform the FDA’s molecular targets list, but also to develop a Pediatric Preclinical Data Commons so that the models that have been studied across many different programs can be available to a much larger scope of investigators, for whom the tools need to be built to understand both the outcome and molecular characterization information that has been built into the development of those models.

Also, pointed out by the working group before was the need to try to aggregate existing data and to develop the means and the analytic tools to interrogate those data.

So, as you can see, a large part of funds was dedicated to supplementing Cancer Center Support Grants to get access to their registries and data repositories, and also to enhance the activities of the NCI-funded Childhood Cancer Survivor Study.

Now, as I pointed out before, we not only need to aggregate the data from pre-clinical models that have already been studied, but also to fill in a lot of gaps and to develop models in areas that have not been extensively studied.

So, a substantial amount of funds was initially put into trying to develop germline samples and further characterize diagnostic tumor samples from the Pediatric MATCH program, to characterize a large number of additional childhood cancer PDX models and cell lines to understand the molecular characteristics of secondary cancers that are interrogated or will be interrogated as part of the CCSS, and to develop more organoid and cell models in rare tumors and for CNS tumors.

We also really need to understand that aggregating all the data—although very useful—will not be nearly as helpful, unless we can develop analytical tools and computational methods to interrogate that data.

And so, grants and contracts have been let to try to automate the curation of the data—whether it’s natural language processing, other kinds of AI attempts—to how to best interpret pathology data and imaging from pathology specimens and patient reports, and also to aggregate and interrogate the pediatric preclinical models for which there’s data already, but also that are going to be developed.

As part of this overall initiative, funds are also being generated to try to fill out a Rare Pediatric Tumor Cell Atlas, utilizing the Moonshot funds that have gone toward the development of the NCI Pediatric Rare Tumor Network. I think this is a unique opportunity to provide additional information in a variety of different levels that will help this effort.

And finally, there have been funds, a substantial amount of funds, that have been utilized to supplement existing grants and contracts to further our understanding of the etiology of childhood malignancies, to develop clinical risk prediction systems and genetic susceptibility models, to enhance our ability to report patient-reported outcomes and toxicities in pediatric patients and young adults, to improve the reporting of childhood cancer clinical trial data, and, also, to get a better understanding of a variety of molecular pathogenesis studies in pediatric patients and some diseases that are important not only for pediatric, but also for young adult cancer development.

So, this gives you, at a very high level, where the money has gone, it’s in the process of being spent, and will generate, I think, a very significant first step in trying to address the recommendations of the working group that we reported a few months ago.

Goals and objectives for years 2-10

I’d like to turn over the presentation now to Warren Kibbe, who will provide you with a high-level view of the goals, program structure, and governance for the future years at the CCDI. Warren?


Warren Kibbe:

Thank you, Jim. It’s great to be here. And I want to just add my thanks to the CCDI Working Group and particularly, I want to thank the chairs, Drs. Otis Brawley and Kevin Shannon for their work in putting that together. And you’ll see a lot of the themes, actually all of the themes laid out in that working group get echoed in the plans for the next nine years.

So, the foundational goals, again, they’re really an echo of what’s in the working group recommendations, to gather data from every child and AYA patient diagnosed with cancer, regardless of where they receive their care.

And of course, a lot of the pieces that we’ll be presenting are how we go about doing that—develop core data from consented patients, include both tumor and germline molecular characteristics.

And that’s really with the intent of enabling research on patient-level data in a secure and de-identified way, of course, for realizing improved outcomes for pediatric and AYA cancers. And then, create a system that brings data of different types together in a way that really enables researchers and hopefully incentivizes researchers to query the data that’s then available for new kinds of research.

To give you a sense of the structure of the way that we’ve been laying out the working group recommendations, it’s really to create a National Childhood Cancer Cohort, that’s one of the focus areas.

To create the Childhood Molecular Characterization Protocol, and I’ll describe each of these things in more detail in the coming slides.

Of course, have the Childhood Cancer Data Platform that is one of the critical parts for CCDI—actually, all of these are critical parts for CCDI—and then to have a coordination center that allows us to make sure that all of these different activities are both coordinated, and then aligned with the many other activities that are going on in pediatric and AYA cancer.

And to oversee each of these or to provide guidance to each of these activities—so, four working groups—and each will be co-chaired by someone from the NCI and an extramural expert, and the working groups will be made up of NCI staff, external experts and advocates.

Looking through the working group’s recommendations and the existing landscape of pediatric cancer, we realized there were some pretty obvious gaps that were clearly identified in the working group report.

And again, these different areas of Childhood Cancer Data Platform, the Cancer Cohort, and the Molecular Characterization Protocol were really designed to help fill in those gaps as depicted here graphically.


National Childhood Cancer Cohort

Now I’ll dig into each of those areas a little bit more, so a little bit more specificity.

Again, as I laid out as one of the overall principles, the Childhood Cancer Cohort will gather data from every child diagnosed with cancer in the United States and capture that care trajectory both for children and AYA cancer patients, including the care provided outside of COG and other networks, to identify both gaps and identify disparities in care and outcome for those patients.

Another critical piece is tracking biospecimens and biospecimen availability across that whole cohort, provide access to data from underserved patients, again, for the purpose of trying to reduce disparities that exist in different populations across the country and provide a consistent research consent. So, again, being able to lower the barrier for access to the data.

And then finally allow for long-term followup of childhood cancer patients. And something that’s already been mentioned by Dr. Doroshow is really including the work in the National Childhood Cancer Registry as a foundational element in this.

And, of course, I’ll just point out, every one of these slides just includes the working group, so keep that in mind when I’m going through. There’s a working group chaired both by NCI and an extramural expert.

Molecular Characterization Protocol

I’m going to go through the Childhood Molecular Characterization Protocol. Again, it will be a national strategy, which is building on the efforts of Project:EveryChild, to offer appropriate clinical and molecular characterization to every child.

This will enable discovery when these and other data are connected together, provide a minimum set of molecular diagnostics for every pediatric and AYA cancer patient.

They’ll be accessible to all children with cancer, so that filling in the gap between children that go onto clinical trials and those that don’t—that’s really where the Molecular Characterization Protocol is directed. And we anticipate that the protocol will provide clinical sequencing of roughly 3,000 patients.

And, again, as discussed in year one, it will also be aligned with the Rare Pediatric Tumor Cell Atlas.

