Institute: ONC | Component: 2 | Unit: 9 | Lecture: a | Slide: 20
Institute:Office of National Coordinator (ONC) Workforce Training Curriculum
Component:The Culture of Health Care
Unit:Privacy, Confidentiality, and Security
Lecture:Definitions of privacy, confidentiality, and security
Slide content:Is De-identified Data More Secure? Not Necessarily
- 87% of the US population can be uniquely identified by five-digit ZIP code, gender, and date of birth (Sweeney, 2002)
- Sweeney identified William Weld, governor of Massachusetts, in a health insurance database for state employees by purchasing the voter registration list for Cambridge, Massachusetts, for $20 and linking ZIP code, gender, and date of birth to the de-identified medical database (Sweeney, 1997)
- Genomic data can aid re-identification in clinical research studies (Malin & Sweeney, 2005; Lumley & Rice, 2010)
- Social Security numbers can be predicted from public data (Acquisti & Gross, 2009)
Slide notes:When data is referred to as de-identified, it means that personally identifying characteristics, such as name, address, or other fields that make up personal health information, have been removed. Is de-identified data secure? It may not always be as secure as intended. Sweeney brought this to light, and her work has received notice in the popular press. While completing her PhD at MIT, she did a widely cited study that identified William Weld, the governor of Massachusetts at the time, by linking de-identified data to publicly available data sources. Her research also showed that eighty-seven percent of the U.S. population could be uniquely identified by five-digit ZIP code, gender, and date of birth. So when relatively common data elements are combined, individuals may be easily identified. In the case of William Weld, Sweeney was able to access a health insurance database for state employees, and Governor Weld was a state employee. She was also able to purchase the voter registration list for the city of Cambridge, Massachusetts, where the governor lived. She then combined the two databases, linking on ZIP code, gender, and date of birth, and was able to identify the governor, as will be demonstrated further in the next slide. Just as genomic data generated in clinical research studies may make individuals identifiable, some recent research has shown how Social Security numbers of individuals can be predicted from public data, because so many data sets contain Social Security numbers.
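The linkage described in these notes can be sketched in a few lines of Python. This is only an illustration of the general idea, not Sweeney's actual procedure or data: every record, name, and field below is hypothetical, and the helper names (`quasi_key`, `unique_fraction`, `link`) are invented for this sketch. It assumes both data sets store ZIP code, gender, and date of birth in the same format.

```python
from collections import Counter

# Fields that are not names but can still single a person out in combination.
QUASI_IDENTIFIERS = ("zip", "gender", "dob")

# Hypothetical "de-identified" medical records: names removed, other fields kept.
medical = [
    {"zip": "02138", "gender": "M", "dob": "1950-01-15", "diagnosis": "condition A"},
    {"zip": "02138", "gender": "F", "dob": "1962-07-04", "diagnosis": "condition B"},
    {"zip": "02139", "gender": "F", "dob": "1962-07-04", "diagnosis": "condition C"},
    {"zip": "02139", "gender": "M", "dob": "1971-03-09", "diagnosis": "condition D"},
    {"zip": "02139", "gender": "M", "dob": "1971-03-09", "diagnosis": "condition E"},
]

# Hypothetical public voter list: names present alongside the same three fields.
voters = [
    {"name": "Voter One", "zip": "02138", "gender": "M", "dob": "1950-01-15"},
    {"name": "Voter Two", "zip": "02138", "gender": "F", "dob": "1962-07-04"},
    {"name": "Voter Three", "zip": "02139", "gender": "M", "dob": "1971-03-09"},
]


def quasi_key(record):
    """Return the (zip, gender, dob) combination for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def unique_fraction(records):
    """Fraction of records whose quasi-identifier combination appears only once."""
    counts = Counter(quasi_key(r) for r in records)
    unique = sum(1 for r in records if counts[quasi_key(r)] == 1)
    return unique / len(records)


def link(medical_records, voter_records):
    """Pair a name with a diagnosis when the combination is unique in both sets."""
    medical_counts = Counter(quasi_key(r) for r in medical_records)
    voter_counts = Counter(quasi_key(v) for v in voter_records)
    voters_by_key = {quasi_key(v): v for v in voter_records}
    matches = []
    for record in medical_records:
        key = quasi_key(record)
        if medical_counts[key] == 1 and voter_counts[key] == 1:
            matches.append((voters_by_key[key]["name"], record["diagnosis"]))
    return matches


if __name__ == "__main__":
    print(unique_fraction(medical))
    for name, diagnosis in link(medical, voters):
        print(name, "->", diagnosis)
```

In this toy data, two of the three voters are re-identified even though the medical records contain no names. The third voter is not matched because two medical records share the same ZIP code, gender, and date of birth, which is the kind of ambiguity that stronger de-identification tries to create on purpose.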