Defining Personal Information

Along with many others, I have sought to make the case that data falls along many stages between personal and non-personal information, and that laws should recognize different obligations for different types of data.  A definition that is too broad risks setting infeasible or unwise requirements for data that is low risk and high utility, while a definition that is too narrow risks excluding risky uses of data.  In earlier work, my colleagues and I argued for multiple categories of data, including identified, identifiable, pseudonymous, protected pseudonymous, de-identified, and anonymous. The GDPR heads in this direction with its definition of pseudonymization, treating it as a safeguarding measure that, along with other factors, can weigh in favor of more flexible use.

For legislation at this point, we might be wise to build on the GDPR and on an assessment of the technical stages of personal data.  Here is what I am thinking – I would love to see something like this incorporated into the bill.  Reactions welcome!

Draft legislative definition of “Covered Data”

  1. In this Act, “Covered Data” means any data that: (1) is under the control of a Covered Entity; and (2) is linked or can practicably be linked to an individual by the Covered Entity or by an anticipated recipient of the data.

  2. “Covered Data” includes:

     a) “Identified Data” – information explicitly linked to a known individual;

     b) “Identifiable Data” – information that is not explicitly linked to a known individual, but that can practicably be linked by the Covered Entity or intended recipients;  [not subject to access requests/portability etc., but subject to all other restrictions]

     c) “Pseudonymous Data” – information that cannot be linked to a known individual without additional information kept separately;

     d) “De-Identified Data” – (i) data from which direct and indirect identifiers have been permanently removed; (ii) data that has been perturbed to the degree that the risk of re-identification is small, given the context of the data set; or (iii) data that an expert has confirmed poses a very small risk of being used by an anticipated recipient to identify an individual.
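The four categories above can be read as points on a spectrum of identifiability. As an illustration only (not part of the draft text), here is a minimal sketch of how a compliance system might encode that spectrum; the field names and decision order are my assumptions, and a real classification would also require the contextual and expert assessments the draft calls for:

```python
from enum import Enum

class DataCategory(Enum):
    IDENTIFIED = "identified"        # explicitly linked to a known individual
    IDENTIFIABLE = "identifiable"    # practicably linkable by entity/recipients
    PSEUDONYMOUS = "pseudonymous"    # linkable only via a separately kept key
    DE_IDENTIFIED = "de_identified"  # identifiers removed/perturbed; small re-id risk

def classify(explicitly_linked: bool,
             practicably_linkable: bool,
             key_kept_separately: bool) -> DataCategory:
    """Sketch of the decision order implied by the draft definitions."""
    if explicitly_linked:
        return DataCategory.IDENTIFIED
    if practicably_linkable:
        if key_kept_separately:
            return DataCategory.PSEUDONYMOUS
        return DataCategory.IDENTIFIABLE
    return DataCategory.DE_IDENTIFIED
```

The point of the sketch is that each step down the decision tree corresponds to a lighter set of obligations under the draft.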


[Key impacts on substantive requirements of data being classified as “De-Identified Data”:

  • When subject to controls that are legal, administrative, technical, contractual, or enforceable (public commitment/FTC), or some combination of such controls, the data is not subject to many requirements.
  • The data cannot be made public.
  • The data cannot be shared without controls that reasonably prevent identification by anticipated recipients.
  • The data is not subject to access/portability.
  • Such de-identification is a determinative factor in assessing whether a use is “incompatible/out of context/subject to consent requirements” under a federal privacy law’s substantive provisions.
  • In many circumstances, the Act imposes different requirements regarding Identified Data and De-Identified Data; the Act incentivizes Covered Entities to de-identify Identified Data when appropriate.]  
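Clause (d)(i)-(ii) of the definition describes two technical routes to de-identification: removing identifiers and perturbing values. A minimal sketch under assumed field names and a toy noise level (real de-identification would require the contextual risk assessment and expert review the draft describes):

```python
import random

# Assumed field names for illustration only.
DIRECT_IDENTIFIERS = {"name", "email", "ssn"}
INDIRECT_IDENTIFIERS = {"zip_code", "birth_date"}

def de_identify(record: dict, noise: float = 2.0) -> dict:
    """Drop direct and indirect identifiers, then perturb remaining
    numeric values with small random noise."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS | INDIRECT_IDENTIFIERS:
            continue  # (d)(i): permanently remove identifiers
        if isinstance(value, (int, float)):
            value = value + random.uniform(-noise, noise)  # (d)(ii): perturb
        out[field] = value
    return out
```

Whether the resulting risk of re-identification is actually “small” depends on the context of the data set, which is why the draft pairs these mechanics with expert confirmation in (d)(iii).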


[Key impacts of data being classified as “Pseudonymous Data”:

  • Data cannot be made public.
  • Data cannot be shared without controls that reasonably prevent identification by anticipated recipients.
  • Pseudonymization is an important but not determinative factor in assessing whether a use is “incompatible/out of context/subject to consent requirements” under a federal privacy law’s substantive provisions.
  • When pseudonymous data is shared and used by third parties for personalization, targeting, or profiling, the right to opt out is applicable, unless the data is only used in aggregate form (for analysis, research, or ad reporting).
    • Important point: Data that has been pseudonymized, but for which a key is not available, or for which assurances are in place that prevent intended recipients from identifying users under clause (iii) of the “De-Identified Data” definition, can qualify as De-Identified Data.
  • Access/portability requirements depend on technical feasibility.]
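The mechanics behind the pseudonymous category can be illustrated with a keyed hash, where the key is the “additional information kept separately.” This is a hedged sketch, not part of the draft; the custodian arrangement in the comment is an assumption:

```python
import hmac
import hashlib

def pseudonymize(user_id: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash. Without the
    separately held key, the pseudonym cannot be traced back to the
    identifier by computing the hash of candidate identifiers."""
    return hmac.new(key, user_id.encode(), hashlib.sha256).hexdigest()

# Assumed: the key is stored apart from the data set, e.g. with a
# separate custodian, so recipients of the data never hold both.
key = b"held-by-a-separate-custodian"
pseudonym = pseudonymize("user@example.com", key)
```

Because the same key always maps the same individual to the same pseudonym, analysis across records remains possible without exposing the identifier; destroying the key, or contractually denying recipients access to it, is what the draft’s “important point” suggests can move the data toward the de-identified category.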


  3. Exceptions – The term “Covered Data” does not include:
    1. Publicly available information. “Publicly available” means information that is lawfully made available from federal, state, or local government records when that information is used for a purpose that is compatible with the purpose for which the data is maintained and made available in the government records.
    2. Data used by an employer solely in connection with an employee’s employment or post-employment status (e.g., retirement);
    3. Data used by a business in the context of business-to-business activities;
    4. Data deleted by a Covered Entity;
    5. Non-Identifiable Data, which has been strongly de-identified (direct and indirect identifiers have been removed, or the data has been significantly perturbed or highly aggregated) and for which an expert assessment assures that the data can be made public or shared (or shared a limited number of times) and presents no privacy risk or very little privacy risk; and
    6. Data used to identify or mitigate cybersecurity threats; ensure the security and stability of a Covered Entity’s networks and/or physical infrastructure; or operate anti-fraud programs;
    7. Data used to prevent or detect criminal activity or child exploitation;
    8. Data used to comply with a legal requirement;
    9. Data regarding a deceased individual that does not reveal Covered Data regarding a living individual [e.g. genetic data].


  4. Section [t/k] of the Act authorizes a mechanism [t/k] by which [t/k] can revise or supplement the definition of “Covered Data” through [t/k administrative mechanism].



  1. Peter Swire
    Invitation to Jules Polonetsky, Omer Tene, or John Verdi of Future of Privacy Forum – David Hoffman’s language seems to overlap with efforts FPF has been making on this topic. Could you chime in on any additional points from those FPF efforts?

  2. Omer Tene
    Peter, not sure if your comment here is pre or post Jules’ comment, which outlines FPF efforts in this respect. Intel’s definition is in line with Danny’s preference for a briefly stated, principle-based statute that leaves room for agency/judicial interpretation. It also reflects an EU-style definition that has gained much traction around the world. (I’d just tweak the term “location data” by adding the word “precise” before it.)
    In a way, it also tracks FPF efforts since it distinguishes between direct identifiers (“a name, an identification number, [precise] location data, an online identifier”) and a collection of indirect identifiers (“physical, physiological, genetic, biometric, mental, economic, cultural or social identity”), which could be used for identification.
    At FPF, we have found that policymakers, regulators, businesses and advocates continue to struggle with this high level definition, which inevitably leads to “all or nothing” debates about the futility or beauty of de-id.
    We therefore try to slice it thinner, to unveil the complete spectrum of identifiability (or de-id), and to calibrate different obligations to various intermediate states. Perhaps Danny will say this is a task better left for the courts. But our experience is that organizations are yearning for guidance on this most fundamental – and controversial – piece of the framework.