How to De-Identify Data
Identifying Number
- There are many potential identifying numbers. For example, the preamble to the Privacy Rule noted that “Clinical trial record numbers are included in the general category of any other unique identifying number, characteristic, or code.”
Identifying Code
- A code corresponds to a value that is derived from a non-secure encoding mechanism. For instance, a code derived from a secure hash function without a secret key (e.g., “salt”) would be considered an identifying element. This is because the resulting value would be susceptible to compromise by the recipient of such data. As another example, a growing number of electronic medical record and electronic prescribing systems assign and embed barcodes into patient records and their medications. These barcodes are often designed to be unique for each patient, or for each event in a patient’s record, and thus can easily be used for tracking purposes. See the discussion of re-identification.
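To illustrate why an unkeyed hash is susceptible to compromise, here is a minimal Python sketch. The record number format (seven digits), the sample value, and the function names are assumptions made for illustration; they do not come from the Privacy Rule or from any UF system. The sketch shows that a recipient who can guess the format of the input can recompute an unkeyed hash for every candidate value, while the same search fails against a code computed with a secret key that the recipient does not hold.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, Optional


def unkeyed_code(mrn: str) -> str:
    """Plain SHA-256 of the record number: anyone can recompute it."""
    return hashlib.sha256(mrn.encode("utf-8")).hexdigest()


def keyed_code(mrn: str, key: bytes) -> str:
    """HMAC-SHA-256 of the record number under a secret key."""
    return hmac.new(key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()


def guess_mrn(code: str, candidates: Iterable[str]) -> Optional[str]:
    """Simulate a recipient hashing every plausible record number."""
    for candidate in candidates:
        if unkeyed_code(candidate) == code:
            return candidate
    return None


# Hypothetical 7-digit record number, used only for this illustration.
released = unkeyed_code("0004217")

# The recipient enumerates a small range of plausible record numbers.
recovered = guess_mrn(released, (f"{n:07d}" for n in range(10_000)))

# The key stays with the data holder and is never released.
secret_key = secrets.token_bytes(32)
protected = keyed_code("0004217", secret_key)
not_recovered = guess_mrn(protected, (f"{n:07d}" for n in range(10_000)))
```

Here `recovered` holds the original record number and `not_recovered` is `None`. The sketch only demonstrates the weakness described above; whether a keyed code is acceptable for a particular release is a question for the Privacy Office.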
Identifying Characteristic
- A characteristic may be anything that distinguishes an individual and allows for identification. For example, a unique identifying characteristic could be the occupation of a patient, if it was listed in a record as “current President of State University”.
Many questions have been received regarding what constitutes any other unique identifying number, characteristic or code in the Safe Harbor approach. Generally, a code or other means of record identification that is derived from PHI would have to be removed from data de-identified following the Safe Harbor method. To clarify what must be removed, the implementation specifications provide an exception with respect to “re-identification” by the covered entity. The objective is to permit covered entities to assign certain types of codes or other record identification to the de-identified information so that it may be re-identified by the covered entity at some later date. Such codes or other means of record identification assigned by the covered entity are not considered direct identifiers that must be removed.
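One way a covered entity might assign such codes can be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: the function name `assign_codes`, the sample record identifiers, and the 16-character code length are all hypothetical. The codes are drawn at random rather than computed from any PHI, and the lookup table linking codes to records stays with the covered entity.

```python
import secrets
from typing import Dict, Iterable


def assign_codes(record_ids: Iterable[str]) -> Dict[str, str]:
    """Return a crosswalk mapping a random code to each internal record ID.

    The codes come from a random source, so they are not derived from
    the record ID or from any other information about the patient.
    """
    crosswalk: Dict[str, str] = {}
    for record_id in record_ids:
        code = secrets.token_hex(8)  # 16 hexadecimal characters
        while code in crosswalk:     # retry on the rare collision
            code = secrets.token_hex(8)
        crosswalk[code] = record_id
    return crosswalk


# Hypothetical internal identifiers, for illustration only.
crosswalk = assign_codes(["MRN-001", "MRN-002", "MRN-003"])

# Only the codes accompany the de-identified records; the crosswalk
# itself is retained by the covered entity for later re-identification.
released_codes = list(crosswalk.keys())
```

Because the codes are random, a recipient holding only `released_codes` has nothing to reverse; re-identification requires the crosswalk, which is not disclosed.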
In the context of the Safe Harbor method, actual knowledge means clear and direct knowledge that the remaining information could be used, either alone or in combination with other information, to identify an individual who is a subject of the information. This means that a covered entity has actual knowledge if it concludes that the remaining information could be used to identify the individual. The covered entity, in other words, is aware that the information is not actually de-identified information.
The following examples illustrate when a covered entity would fail to meet the actual knowledge provision.
Example 1: Revealing Occupation
Imagine a covered entity was aware that the occupation of a patient was listed in a record as “former president of the State University”. This information in combination with almost any additional data, like age or state of residence, would clearly lead to an identification of the patient. In this example, a covered entity would not satisfy the de-identification standard by simply removing the enumerated identifiers because the risk of identification is of a nature and degree that a covered entity must have concluded that the information could identify the patient. Therefore, the data would not have satisfied the de-identification standard’s Safe Harbor method unless the covered entity made a sufficient good faith effort to remove the “occupation” field from the record.
Example 2: Clear Familial Relation
Imagine a covered entity was aware that the anticipated recipient, a researcher who is an employee of the covered entity, had a family member in the data (e.g., spouse, parent, child, or sibling). In addition, the covered entity was aware that the data would provide sufficient context for the employee to recognize the relative. For instance, the details of a complicated series of procedures, such as a primary surgery followed by a set of follow-up surgeries and examinations, for a person of a certain age and gender, might permit the recipient to comprehend that the data pertains to his or her relative’s case. In this situation, the risk of identification is of a nature and degree that the covered entity must have concluded that the recipient could clearly and directly identify the individual in the data. Therefore, the data would not have satisfied the de-identification standard’s Safe Harbor method.
Example 3: Publicized Clinical Event
Rare clinical events may facilitate identification in a clear and direct manner. For instance, imagine the information in a patient record revealed that a patient gave birth to an unusually large number of children at the same time. During the year of this event, it is highly possible that this occurred for only one individual in the hospital (and perhaps the country). As a result, the event was reported in the popular media, and the covered entity was aware of this media exposure. In this case, the risk of identification is of a nature and degree that the covered entity must have concluded that the individual subject of the information could be identified by a recipient of the data. Therefore, the data would not have satisfied the de-identification standard’s Safe Harbor method.
Example 4: Knowledge of a Recipient’s Ability
Imagine a covered entity was told that the anticipated recipient of the data has a table or algorithm that can be used to identify the information, or a readily available mechanism to determine a patient’s identity. In this situation, the covered entity has actual knowledge because it was informed outright that the recipient can identify a patient, unless it subsequently received information confirming that the recipient does not in fact have a means to identify a patient. Therefore, the data would not have satisfied the de-identification standard’s Safe Harbor method.
Please use the checklists below, provided by the UF Privacy Office, to ensure that your data is properly de-identified. There are two categories of de-identification.
De-Identified Data
Checklist
Please review the 2019 De-Identified Checklist to properly and completely de-identify your PHI.
Please note: This checklist is subject to change by the Privacy Office. Even if you use this list, you may still be subject to review. If you have any questions or concerns regarding your data, please contact the Privacy Office.
De-Identified Limited Data Set
Checklist
A Limited Data Set (LDS) is a limited set of identifiable patient information as defined by the HIPAA Privacy Regulations. An LDS may be shared with an outside person or entity without written patient authorization if two basic conditions are met:
- The purpose of the disclosure is research, public health, or health care operations; and
- The party disclosing the information signs a data use agreement (DUA) with the data recipient.
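The two conditions above can be expressed as a simple pre-release check. The sketch below is illustrative only: the function name `may_share_lds` and the purpose strings are assumptions, and passing the check does not replace review by the Privacy Office.

```python
# Purposes named in the two-condition rule above.
ALLOWED_PURPOSES = {"research", "public health", "health care operations"}


def may_share_lds(purpose: str, dua_signed: bool) -> bool:
    """Return True only if both basic LDS conditions are met."""
    purpose_ok = purpose.strip().lower() in ALLOWED_PURPOSES
    return purpose_ok and dua_signed
```

For example, `may_share_lds("Research", True)` returns `True`, while a research disclosure without a signed DUA, or a disclosure for a purpose outside the three listed, returns `False`.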
Please review the LDS HIPAA de-identification checklist to properly de-identify your Limited Data Set of PHI.