‘De-Identified’ Studies Are Not as Anonymous as Some May Think

April 6, 2023

Systematic review suggests small amounts of data can be used to re-identify individuals.

While wearable devices have opened up a new paradigm in healthcare research due to their ability to generate huge amounts of health-related data, such research is predicated on the idea that the data has been “de-identified” to preserve the privacy of the individuals wearing the devices.

However, a new report in The Lancet Digital Health suggests the current process of de-identification may not be as protective of privacy as the name implies. The authors of the report conducted a systematic review of studies that attempted to re-identify patients from de-identified data sets using biometric signals from wearable devices. They found that correct identification was possible in between 86% and 100% of cases.

Jessilyn Dunn, Ph.D., of Duke University, and her colleagues investigated how de-identified health data about individuals can be re-identified with other data.

“Although data sharing provides tremendous benefits, it also poses many crucial questions around privacy risks to patients and study participants that remain unanswered,” wrote corresponding author Jessilyn Dunn, Ph.D., of Duke University, and colleagues.

The investigators noted that wearables, such as smartwatches, have an increasing array of capabilities, including the ability to track a patient’s steps, heart rate and location. The data generated by the devices can be used by the individual wearing the device, but it can also be used by software-makers to improve their algorithms, and by scientific researchers to study population health or the impact of specific medical interventions.

The National Institutes of Health has adopted guidelines aimed at promoting de-identified data-sharing, but the investigators said the possibility of re-identification (using other data to link wearable device data with a person’s identity) could open the door to data misuse by governments, corporations, or other individuals.

For example, the investigators posited a scenario in which a patient participates in an employee wellness program that tracks her steps and heart rate and also requires the collection of demographic and identity information. In their scenario, the patient had previously participated in a stroke prevention study that tracked the same metrics but also included other health information that was inaccessible to her employer (the scenario used an HIV diagnosis as an example). If the study’s data were made publicly available in a de-identified manner, the employer might be able to link the data back to its employee, thereby learning of her HIV diagnosis. The employer could then theoretically use that information to discriminate against the patient, for instance, by curtailing its contribution to her health coverage.
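The linkage in this scenario can be illustrated with a minimal Python sketch. It is not the method used in any of the reviewed studies: the names, participant codes, and numbers are invented, and nearest-neighbor matching on two averaged features is an assumption chosen for simplicity.

```python
import math

def nearest_match(sample, identified):
    """Return the name in `identified` whose feature vector is closest to `sample`."""
    return min(identified, key=lambda name: math.dist(sample, identified[name]))

# Hypothetical data. Each vector is
# (mean daily steps in thousands, resting heart rate in beats per minute).

# Wellness-program records: the employer knows who each person is.
employer_records = {
    "Alice": (8.2, 61.0),
    "Bob": (4.1, 74.0),
    "Carol": (11.5, 55.0),
}

# Public study records: names replaced by participant codes.
study_records = {
    "P-017": (8.0, 62.0),
    "P-042": (11.9, 54.0),
    "P-103": (4.4, 73.0),
}

# Link each de-identified study record to the closest identified record.
links = {
    code: nearest_match(vector, employer_records)
    for code, vector in study_records.items()
}
print(links)
# {'P-017': 'Alice', 'P-042': 'Carol', 'P-103': 'Bob'}
```

In this toy example, the participant codes alone reveal nothing. The link to a person becomes possible only because the employer holds a second data set in which the same signals are attached to names.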

In an effort to better understand the scope of the potential “re-identification” problem, Dunn and colleagues performed a literature search that ultimately yielded 72 studies that met their inclusion criteria. Of those, 64 were classified as high quality and 8 as moderate quality, according to the investigators’ custom study-quality assessment tool.

In most of the studies included in the analysis (57), the metric used to assess re-identification was “correct identification rate” (CIR). Those studies found CIR values ranging from 86% to 100%, Dunn and colleagues said, “suggesting that reidentification risks from wearable device data are higher than previously appreciated.”
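The article does not spell out how CIR is calculated. Assuming it is simply the share of identification attempts in which the correct person is named, a minimal Python sketch looks like this (the function name and the sample data are hypothetical):

```python
def correct_identification_rate(true_ids, predicted_ids):
    """Fraction of attempts where the predicted identity equals the true one.

    Assumed definition: correct identifications / total attempts.
    """
    if len(true_ids) != len(predicted_ids):
        raise ValueError("Both lists must have the same length.")
    if not true_ids:
        raise ValueError("At least one identification attempt is required.")
    correct = sum(t == p for t, p in zip(true_ids, predicted_ids))
    return correct / len(true_ids)

# Hypothetical example: 4 attempts, 3 of them correct.
truth = ["p1", "p2", "p3", "p4"]
guesses = ["p1", "p2", "p9", "p4"]
print(correct_identification_rate(truth, guesses))  # 0.75
```

Under this definition, a CIR of 86% would mean that 86 of every 100 identification attempts pointed to the right person.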

The authors cautioned that most of the studies were small, with fewer than 100 participants each. However, the four larger studies showed results consistent with the smaller studies.

Adding to the concern, the investigators said re-identification was possible even with very little data.

For example, the investigators cited one study that found 50 seconds’ worth of accelerometer and gyroscope data from people who brushed their teeth while wearing an LG G smartwatch could be used to identify patients with a CIR of 96%.

“This discovery is concerning since publicly identified data is becoming increasingly abundant, given data-sharing advocacy and policy by influential bodies, such as the U.S. Food and Drug Administration and National Institutes of Health,” Dunn and colleagues said.

The investigators said it is still necessary to have identifiers in order to re-identify someone—simply having two de-identified data sets including the same person is not enough to identify that person. However, the authors noted that the availability of identifiers is on the rise because “an increasing number of companies are entering third-party data-sharing agreements, some of which are ethically tenuous.”

Still, Dunn and colleagues were clear that they are not suggesting blocking the sharing of biometric data.

“On the contrary, this systematic review exposes the need for more careful consideration of how data should be shared since the risk of not sharing data (e.g., algorithmic bias and failure to develop new algorithmic tools that could save lives) might even be greater than the risk of reidentification,” they wrote.

Rather, they said, their study is a warning that in order for open science to flourish, better measures are needed to preserve privacy.

“For example, an emphasis on research directions for developing privacy-protecting methods… could allow the biomedical research community to continue to reap the many benefits of data sharing while protecting the privacy of individuals,” they concluded.
