If you couldn’t attend our last webinar on Open Data, here are the key takeaways about the Health Data Hub. It is a young but essential organization, especially for the management of the health crisis. We spoke with Stéphanie Combes, its Director, about the organization's mission and how it could transform our healthcare system.
First of all, where did the idea of the Health Data Hub come from?
The Health Data Hub has its roots in Cédric Villani's report on artificial intelligence, published in March 2018, which advocates the sharing of health data. When this report was submitted, the President of the Republic announced the creation of a Health Data Hub. I was still working at the Ministry of Health at the time, when Agnès Buzyn, Minister of Health, commissioned an expert mission to define the role of this health data hub. We interviewed over a hundred stakeholders. Despite that impressive number, the use cases considered useful for the healthcare system were still very theoretical. The finding at the time was that much of the health data was compartmentalized and scattered across the country, and that the available technology was not sufficient.
When the mission's report was submitted, the Minister asked us to implement the proposed roadmap and to deliver the first concrete results before the end of 2019. We first worked on the legal framework for creating such an organization: article 41 of the law “Organization and Transformation of the Health System” describes the objectives and missions of the Health Data Hub. We then organized a call for projects to design a relevant service offer, working closely with teams in the field who wanted to deliver concrete use cases. At the end of 2019, a structure was created with teams whose mission is to support users of health data. That was the birth of the Health Data Hub.
Concretely, what is the role of the Health Data Hub?
In early January 2020, the public interest group (the legal status of the Health Data Hub), which brings together 56 stakeholders from the health data ecosystem, approved its strategic roadmap for the next three years. It rests on four pillars.
The main mission of the Health Data Hub is to allow authorized health project leaders to access non-nominative health data through a secure technological platform while respecting citizens’ rights. These projects must serve a public interest, that is to say, aim to improve the quality of care and support patients. The platform has the advantage of bringing together a number of databases of interest, known as the catalog. This allows the Health Data Hub teams to cross-reference data that can be reused by many projects, and therefore to multiply initiatives while reducing data access times, which are sometimes very long in France today. Indeed, without centralizing the most relevant data, some projects take several years to see the light of day or are simply not feasible.
In addition, the list of databases making up the catalog will be set out in an order published after consultation with the CNIL. This allows the catalog to evolve regularly according to ministerial priorities or requests from the scientific and innovation community. Contrary to what is sometimes thought, the Health Data Hub is not a giant platform gathering all data in a single place, but a scalable system of databases designed to best meet the needs of all stakeholders: project leaders, but also the organizations that produce the databases and wish to enrich them or make them more visible.
Today, we are trying to align this whole approach with the European strategy to set up common data spaces in nine sectors, including health. Discussions are underway to design the equivalent of the Health Data Hub at the European level, and we are the competent authority in France for working on these issues with the other French and European players involved.
Another of our missions is to engage the ecosystem and help raise awareness of the issues related to the reuse of health data. For example, last December, for the second consecutive year, we held a conference with the Digital Health Delegation and the “IA4Health” Grand Challenge. We also launched a Data Challenge with Santé Publique France and a Winter School at the start of 2021, to raise awareness of health data and the importance of sharing it.
Who is providing the data today?
Today, we are working with around twenty prospective partners to make their non-nominative data available in the catalog as soon as it can be implemented. At present, we are authorized to share data relating to the epidemic within the framework of the health emergency. This foreshadows the catalog that may exist once the last implementing texts of the law are published (expected within a few weeks). We therefore share emergency department visit data from Santé Publique France, as well as data from the Health Insurance, ATIH and health establishments relating to patients hospitalized for Covid, and shortly also data from SIVIC (Hospitalization Information System), SIDEP (PCR screening information system) and many others. It is important to emphasize that project leaders can only access this data with authorization from the CNIL. The data is pseudonymized, meaning that directly identifying data is removed, but it remains sensitive, hence the importance of a secure platform to access it. Data sharing is done in partnership with the data holders, which is why we are still discussing the terms of sharing with them (scientific, economic, governance, etc.). It is crucial for those making their data available to have a say over who accesses the data and how it is used. They cannot, however, oppose a legitimate project.
By mid-2021, we should finalize a first list of databases that make up the catalog. The final decree is expected in late February or early March, and the order listing the databases three or four months later. We are therefore ready to make this data available once these texts are published.
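To make the notion of pseudonymization mentioned in this answer more concrete, here is a minimal Python sketch. It is an illustration only and not the Health Data Hub's actual process: the field names, the secret key and the `pseudonymize` helper are all hypothetical, and a keyed hash (HMAC) is just one common way to replace a direct identifier with a stable pseudonym.

```python
import hashlib
import hmac

# Hypothetical secret key: in a real setting it would be held by a trusted
# party and never stored alongside the data.
SECRET_KEY = b"example-key-not-for-production"

# Hypothetical list of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "social_security_number", "address"}


def pseudonymize(record: dict, id_field: str = "social_security_number") -> dict:
    """Return a copy of `record` without direct identifiers, plus a pseudonym.

    The pseudonym is a keyed hash of the identifier, so the same person
    always gets the same pseudonym and records can still be linked.
    """
    pseudonym = hmac.new(
        SECRET_KEY, record[id_field].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pseudonym"] = pseudonym
    return cleaned


# Made-up example record, for illustration only.
example = {
    "name": "Jane Doe",
    "social_security_number": "0000000000000",
    "address": "1 rue de l'Exemple",
    "diagnosis": "heart failure",
    "admission_year": 2020,
}
print(pseudonymize(example))
```

The remaining fields (diagnosis, dates, and so on) can still be sensitive or indirectly identifying, which is why pseudonymized data is only made available on a secure platform and with authorization.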
What use cases are being put into production thanks to the Health Data Hub?
First, I would like to mention the Hydro Project led by the startup Implicity, a company in the connected medical device industry. The project links data from connected pacemakers with hospitalization data. By correlating these data, it would be possible to predict episodes of heart failure from the pacemaker data that Implicity monitors, in order to anticipate these episodes and improve their management. These data can be linked on the technological platform of the Health Data Hub.
Second, the Deepsarc Project, which is being launched, aims to link a cohort of patients with sarcoma (a rare cancer) with health insurance data. These patients are so few that clinical trials are difficult. In this case, the linkage will make it possible to monitor patients' use of care more precisely according to the type of treatment, and to better personalize treatment according to the patient’s situation, with a better chance of effectiveness.
Finally, the Deepiste Project aims to develop an artificial intelligence tool to automatically analyze mammograms and risk factors. The project also intends to release its algorithms as open source.
All these projects aim to support healthcare professionals in making diagnoses and to better meet the needs of patients, all in record time. It is therefore important for us to partner with organizations that share our values and that will make knowledge available by sharing documentation or tools.
The Health Data Hub gained momentum during the health crisis. Can you tell us more?
It was indeed during the first lockdown that we were able to put the technological platform into production to support the management of the crisis. In the last quarter of 2020, we had seven project authorizations from the CNIL out of the 40 projects we are supporting, including three for projects related to the epidemic. This is an important milestone, because the regulatory aspect is an integral part of our progress.
These include the CoviSAS project. Patients with obstructive sleep apnea syndrome, due to repeated oxygen deprivation, often develop associated diseases that can make them vulnerable to COVID-19 (obesity, diabetes, high blood pressure, cardiovascular diseases). The CoviSAS project, led by the MIAI chair of artificial intelligence at the University of Grenoble-Alpes and the company Semeia, a supplier of software solutions using artificial intelligence, aims to determine the prevalence of severe forms of COVID-19 in these patients and to identify combinations of associated illnesses (co-morbidities) leading to a higher rate of ICU stay or death.
The Frog Covid study, for its part, is looking at recurrent associations of other diseases in patients with severe (hospitalization) or very severe (admission to intensive care) forms of COVID-19. Through this study, Clinityx, a consultancy specializing in data collection solutions and algorithms, and the INSERM research unit Cardiovascular MArkers in Stressed COndiTions (MASCOT) seek to identify factors that predict the risk of developing severe to very severe COVID-19, in order to define profiles of particularly at-risk patients.
Beyond access to the data, what do you think are the key success factors for projects that use health data?
Access to data is sometimes complicated in France, but even once access is authorized and effective, it is difficult to get to grips with the data and produce reliable analyses. Access to knowledge is therefore also crucial, which is why we have created collaborative documentation around Health Insurance data, with the aim of extending it to the whole catalog. We have created a GitLab repository containing documentation from Santé Publique France, INSERM and the Ministry of Health, as well as programs that make data manipulation easier and synthetic data so that potential users can understand the data format ahead of time. These synthetic data must not present any risk of re-identification.
The other key factor for the success of a health data reuse project is bringing together the necessary and varied skills in the project teams. These skills are sometimes rare and expensive, but they are essential, and we notice that more and more hospitals, for example, are organizing themselves to build suitable multidisciplinary teams (IT specialists, data engineers, lawyers). For example, APHP has been setting up a health data warehouse for a few years, which has made it possible to export daily statistics on Covid patients.
In your opinion, what steps remain to be taken to become a true health data hub?
First, we have completed the setup phase, and many projects are starting up, which reassures us that the Health Data Hub is useful. For 2021, the ambition is to have the first concrete results. We are aware that research projects always take some time to deliver, but some information may be shareable quickly because new databases have already been built as part of the most advanced projects.
Then, as soon as the decree is published and we have finalized our service offer with catalog partners, we will be able to extend the catalog to databases other than those relating to the epidemic. Once these steps have been taken, the Health Data Hub will allow project leaders to access unprecedented databases within timeframes that are more acceptable given international competition and what is at stake for patients. Any type of project leader pursuing a public interest will be able to access our services, subject to authorization from the CNIL.
As part of this catalog, we would like to create strategic partnerships with key players in the ecosystem, such as INSERM around a few cohorts, and hospitals, with the aim of creating one or more multi-center clinical databases enriched with medico-administrative health insurance data.
The implementation of this catalog is the priority of the Health Data Hub and is of an eminently strategic nature for French research.
I have been working with data for 9 years: structured data, geolocated data, textual data. Python, R, Rshiny are my friends. After several years as a data scientist at INSEE, I joined DREES with the desire to exploit the potential of these health data from a new perspective.