De-identification of PHI: Two Key Methods (2024)

De-identification of protected health information (PHI) under the Health Insurance Portability and Accountability Act (HIPAA) is an important aspect of ensuring patient privacy and data security. HIPAA sets guidelines and standards for the appropriate use, disclosure, and protection of PHI. De-identification refers to the process of removing or obscuring specific identifiers related to individuals to reduce the risk of exposing sensitive or personal information.

To achieve HIPAA-compliant de-identification, entities must adhere to one of two established methods: Safe Harbor and Expert Determination.

In this article, we’ll cover everything you need to know about de-identification of PHI using these two methods.

What Is HIPAA De-Identification?

HIPAA de-identification refers to the process of removing specific identifiers from protected health information (PHI) to ensure compliance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. The goal of de-identification is to enable healthcare organizations, researchers, and other entities to share health data for various purposes without violating the privacy rights of patients or breaching any legal requirements concerning PHI.

What Is De-Identified Data?

De-identified data is health information that has been stripped of identifying characteristics or elements, so that there is no reasonable basis to believe it can be linked to an individual. Under the HIPAA Privacy Rule, such information is no longer individually identifiable health information, so the restrictions and privacy safeguards that apply to PHI no longer apply to it.

There are two primary methods for de-identification under the HIPAA Privacy Rule:

Safe Harbor: This method involves removing 18 specified identifiers from the data set, such as names and geographic subdivisions smaller than a state. The covered entity must also have no actual knowledge that the remaining information could be used to identify an individual.
Expert Determination: Through this method, a qualified expert evaluates the risk of re-identification of the data and determines if it meets the required standard for de-identification.

Utilizing de-identified data enables organizations to share vital health information for large-scale medical research studies, policy assessments, comparative effectiveness studies, and other data-driven endeavors without infringing upon the privacy rights of patients or requiring patient authorizations. This allows for the advancement of medical knowledge and improved healthcare while also ensuring the protection of individual privacy.

Why Is PHI De-Identification Important?

De-identification of Protected Health Information (PHI) is a crucial process in maintaining patient privacy and compliance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. PHI consists of individually identifiable health information relating to an individual's health status, provision of health care, or payment for health care. HIPAA regulations require covered entities, such as healthcare providers, insurance companies, and their business associates, to protect the confidentiality of PHI.

The importance of PHI de-identification lies in its ability to protect individuals' privacy while enabling the use of health data for various purposes. De-identified information can be used to gain valuable insights on population health, advance medical research, and inform healthcare policy making. Moreover, de-identified information is no longer subject to the provisions of the HIPAA Rules, providing greater flexibility for data sharing and analysis.

De-identification reduces the risk of unauthorized access, disclosure, and improper use of PHI. When information is de-identified, the chance of re-identifying individuals from the available data becomes significantly lower, safeguarding the privacy and confidentiality of patient data.

Key Methods for De-Identifying PHI According to the HIPAA Privacy Rule

HIPAA Safe Harbor Method

The HIPAA Safe Harbor Method is one of the two primary methods for de-identifying Protected Health Information (PHI) according to the HIPAA Privacy Rule. This method requires the removal of specific identifiers from the PHI, which significantly reduces the risk of re-identification.

Identifiers to Be Removed According to the Safe Harbor Method

To achieve de-identification using the Safe Harbor method, the following 18 identifiers of the individual, and of the individual's relatives, employers, or household members, must be removed:

  1. Names
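  2. Geographic subdivisions smaller than a state, including street address, city, county, precinct, and ZIP code (the first three digits of a ZIP code may be kept when the area they cover contains more than 20,000 people)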
  3. Telephone numbers
  4. Fax numbers
  5. Email addresses
  6. Social Security numbers
  7. Medical record numbers
  8. Health plan beneficiary numbers
  9. Account numbers
  10. Certificate and license numbers
  11. Vehicle identifiers and serial numbers
  12. Device identifiers and serial numbers
  13. Web URLs
  14. IP addresses
  15. Biometric identifiers (e.g., fingerprints, voice prints)
  16. Full-face photographs and comparable images
  17. Any other unique identifying number, code, or characteristic
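  18. All elements of dates (except the year) directly related to an individual, such as birth, admission, discharge, and death dates; ages over 89 may only be reported as a single category of 90 or older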
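
As a rough illustration of the list above, the sketch below drops a handful of direct identifiers from a single structured record and reduces dates to the year. The field names (such as "medical_record_number" and "birth_date") and the function name are hypothetical, and the code assumes dates are stored as ISO "YYYY-MM-DD" strings. It handles only structured fields and is not a complete Safe Harbor implementation: free text, images, and the catch-all identifier category need separate treatment.

```python
from datetime import date

# Hypothetical field names; a real schema will use its own.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_beneficiary_number",
    "account_number", "license_number", "vehicle_id", "device_id",
    "url", "ip_address",
}

# Hypothetical date fields that are reduced to the year only.
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date"}


def strip_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record without direct identifiers and with dates cut to the year."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop the identifier entirely
        if field in DATE_FIELDS:
            # Keep only the year; assumes an ISO "YYYY-MM-DD" string.
            year_field = field.replace("_date", "_year")
            cleaned[year_field] = date.fromisoformat(value).year
        else:
            cleaned[field] = value
    return cleaned
```

A record such as {"name": ..., "ssn": ..., "birth_date": "1980-05-17", "diagnosis": "asthma"} would come back with only the birth year and the diagnosis.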

HIPAA Expert Determination Method

The HIPAA Expert Determination Method is the second primary method for de-identifying PHI according to the Privacy Rule. This method relies on a qualified expert who determines, based on knowledge of generally accepted statistical and scientific principles, that the risk of re-identification is very small. The expert must also document the methods and results of the analysis that justify the determination.

Unlike the Safe Harbor method, which focuses on the removal of specific identifiers, the Expert Determination method allows more flexibility. The expert can consider various factors, such as the data's statistical properties, the data recipient's ability to re-identify the patients, and the overall risk of re-identification before giving an expert opinion about the de-identified data.
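
HIPAA does not prescribe a specific metric for this assessment. As one illustrative example of a statistical property an expert might look at, the sketch below computes the size of the smallest group of records that share the same combination of indirect identifiers (the idea behind k-anonymity). The function name and the field names are assumptions made for the example, not part of any HIPAA guidance.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values.

    A small result means some individuals are nearly unique on these
    fields, which suggests a higher re-identification risk.
    """
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0
```

A result of 1 means at least one record is unique on the chosen fields; an expert would weigh a figure like this alongside other factors, such as who receives the data.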

Additional Strategies

In addition to these two methods, several other strategies can help minimize the risk of PHI re-identification:

  • Using aggregated data that summarizes information instead of providing individual-level details.
  • Limiting the granularity of data, such as using the year of birth instead of the exact date.
  • Implementing access controls and secure storage measures for the de-identified data.
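
The first two strategies can be combined. The sketch below is a minimal example that assumes hypothetical "birth_date" and "diagnosis" fields with dates stored as "YYYY-MM-DD" strings: it coarsens each birth date to the year and returns only counts per group, so no individual-level rows are shared.

```python
from collections import Counter


def aggregate_counts(records, date_field="birth_date", group_field="diagnosis"):
    """Count records per (birth year, group value) instead of exposing each row."""
    counts = Counter(
        (record[date_field][:4], record[group_field])  # first 4 chars = year
        for record in records
    )
    return dict(counts)
```

Groups with very small counts can still point to specific people, so aggregated output may need further review before release.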

Covered entities and their business associates must be vigilant in protecting PHI and mitigating the risk of re-identification. By following the HIPAA Privacy Rule guidelines and employing additional strategies, organizations can maintain compliance and ensure the privacy and security of individuals' health information.

Key Takeaways on Data De-Identification for HIPAA

The process of de-identification plays a crucial role in complying with the Health Insurance Portability and Accountability Act (HIPAA), specifically under the Privacy Rule. 

De-identifying PHI offers numerous advantages, including:

  • Privacy protection: By eliminating sensitive information, organizations can better safeguard patients' privacy while still benefiting from valuable health data insights.

  • Regulatory compliance: De-identification helps ensure compliance with the HIPAA Privacy Rule, which demands the protection of individuals' medical records and other personal health information.

  • Data sharing and collaboration: De-identified data can be exchanged and analyzed by different stakeholders, fostering collaboration in research and improving overall healthcare outcomes.

It's important to bear in mind several considerations while implementing de-identification methods:

  • Accuracy and integrity of data: Organizations must ensure that the de-identification process does not compromise the utility and meaning of the data. This balance is particularly important to retain the value of the data for research purposes.

  • Ongoing compliance: As technology advances, new methods of re-identification may arise. Organizations must continuously evaluate their de-identification techniques to keep up with the evolving landscape and maintain HIPAA compliance.


What Is the Difference Between De-Identified and Anonymized Data?

De-identified data refers to protected health information (PHI) that has been stripped of specific identifiers in accordance with the HIPAA Privacy Rule, so that there is no reasonable basis to believe the remaining data can identify an individual. Anonymized data, by contrast, generally refers to data from which every element that could link it to an individual has been irreversibly removed, so that it cannot be traced back to a person.

Is the List of Safe Harbor Identifiers the Same as the Definition of PHI?

No, the list of Safe Harbor identifiers is not the same as the definition of PHI. The Safe Harbor method is one of the de-identification methods set out in the HIPAA Privacy Rule; it lists 18 identifiers that must be removed from PHI for the data to count as de-identified. These identifiers include names, geographic subdivisions smaller than a state, and unique identifying numbers, among others. PHI, on the other hand, refers to any health information that is individually identifiable.

What Constitutes “Any Other Unique Identifying Number, Characteristic, Or Code” with Respect to the Safe Harbor Method of the Privacy Rule?

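In the context of the Safe Harbor method under the HIPAA Privacy Rule, "any other unique identifying number, characteristic, or code" is a catch-all for any piece of information that is not explicitly named in the list but could still identify an individual, either by itself or when combined with other available data. Examples include a clinical trial record number, a barcode embedded in a patient's record, or a distinctive characteristic such as a unique job title. Medical record numbers, vehicle identifiers, and Internet Protocol (IP) addresses are listed as separate identifiers and do not fall under this category. It is essential to remove such catch-all identifiers to comply with the Safe Harbor method.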

Do Doctors' Names Have to Be Removed from a Data Set for PHI to Be De-Identified?

Not always. Under the Safe Harbor method, the 18 identifiers that must be removed are those of the individual who is the subject of the information and of that individual's relatives, employers, or household members. A doctor's name must therefore be removed when the doctor falls into one of those categories. In other cases the name is not itself a listed identifier, but the covered entity must still have no actual knowledge that it could be used, alone or with other information, to identify the patient.

What Is the Difference Between the Safe Harbor Method of De-Identification Vs. Data Masking?

The Safe Harbor method of de-identification is a specific process defined under the HIPAA Privacy Rule for removing 18 identifiers from PHI. Conversely, data masking is a general term for techniques used to replace sensitive information with fictional or scrambled data while maintaining the structure and format. Data masking can be used for various purposes, including de-identification, and may involve techniques such as substitution, shuffling, or encryption.
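
To make two of these masking techniques concrete, the sketch below shows substitution (each distinct value is replaced with a placeholder of a fixed format) and shuffling (a column's values are reordered so they no longer line up with their original rows). The function names and the "PATIENT-0001" placeholder format are invented for the example. Masking on its own does not establish HIPAA de-identification, and anyone who keeps the substitution mapping can re-link the data.

```python
import random


def substitute(values, prefix="PATIENT"):
    """Replace each distinct value with a consistent placeholder such as PATIENT-0001."""
    mapping = {}
    masked = []
    for value in values:
        if value not in mapping:
            mapping[value] = f"{prefix}-{len(mapping) + 1:04d}"
        masked.append(mapping[value])
    return masked


def shuffle_column(values, seed=None):
    """Return the same values in a random order, leaving the input unchanged."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

Substitution keeps repeated values consistent, which preserves joins and counts, while shuffling keeps the column's overall distribution but breaks the link between a value and its row.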

Is De-Identified Data Confidential?

De-identified data is not PHI under the HIPAA Privacy Rule, because it has been stripped of identifiers and cannot reasonably be linked back to a specific individual, so HIPAA's confidentiality requirements no longer apply to it. Other laws or contractual obligations may still restrict how it is used. It therefore remains crucial to handle de-identified data responsibly and maintain best practices in data security to protect privacy and avoid potential re-identification risks.
