How can I preserve participants' privacy?

Ensuring that published data are truly anonymous can be a tricky task. We owe it to our participants to guarantee their privacy while being as transparent about our research as possible.

  • Meyer (2018) provides Practical Tips for Ethical Data Sharing.
  • Fraser & Willison (2009) present various strategies and Tools for De-Identification of Personal Health Information. More recent implementations of these approaches include anonymizer for R, which obfuscates potentially identifying information; the online tool Amnesia, which shortens identifiers (e.g., postcodes) until they are no longer unique; and Faker for Python, which generates random but plausible substitute data.
  • The Open Science Framework has defined criteria for a Protected Access Open Data Badge and has collected a list of repositories where sensitive data can be stored safely.
  • For highly sensitive data, analyses can be made reproducible by publishing descriptive statistics (e.g., means and standard deviations) alongside correlation or covariance matrices instead of the raw data. For example, Fried et al. (2017) report a fully open and reproducible analysis based on a publicly available aggregate version of their data.
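
The "shorten identifiers until they are no longer unique" idea can be illustrated with a minimal sketch. It is not Amnesia's own algorithm. It makes several simplifying assumptions: there is a single quasi-identifier (postcode), every postcode is truncated by the same amount, and each truncated value must be shared by at least k participants, which is k-anonymity on that one column. The function names and the threshold k are hypothetical choices made for this illustration. In real datasets, combinations of quasi-identifiers such as postcode, age and gender usually have to be checked together.

```python
from collections import Counter


def truncate(code, length):
    """Keep the first `length` characters and mark dropped ones with a single '*'."""
    return code if len(code) <= length else code[:length] + "*"


def generalize_postcodes(postcodes, k=2):
    """Shorten all postcodes by the same amount until every generalized value
    occurs at least k times (k-anonymity on this single column only)."""
    longest = max(len(code) for code in postcodes)
    # Try the least destructive truncation first, then shorten step by step.
    for length in range(longest, -1, -1):
        generalized = [truncate(code, length) for code in postcodes]
        if min(Counter(generalized).values()) >= k:
            return generalized
    raise ValueError("could not reach k-anonymity (fewer than k records?)")


# Usage with made-up postcodes (illustrative data, not from any study).
codes = ["10115", "10117", "10119", "80331", "80333", "80335"]
print(generalize_postcodes(codes, k=3))
# prints ['1011*', '1011*', '1011*', '8033*', '8033*', '8033*']
```

Truncating every code to the same length keeps the sketch short. More careful tools generalize each record only as far as needed, which loses less information.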

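Sharing summary statistics instead of raw data works because many analyses depend on the data only through means, standard deviations and correlations. The following minimal sketch shows this for an ordinary least-squares regression. It is not the analysis of Fried et al. (2017), and the function names (summarize, solve, ols_from_summary) are hypothetical. It assumes that the shared statistics are the sample means, sample standard deviations and the Pearson correlation matrix, with the outcome as the last variable. The standardized slopes then solve R_xx · beta = r_xy. Each raw slope is beta_j · sd_y / sd_j, and the intercept is mean_y minus the sum of slope_j · mean_j.

```python
from statistics import mean, stdev


def summarize(columns):
    """What a researcher might publish instead of raw data:
    n, means, standard deviations and the Pearson correlation matrix."""
    n = len(columns[0])
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    corr = [[sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / ((n - 1) * sa * sb)
             for cb, mb, sb in zip(columns, means, sds)]
            for ca, ma, sa in zip(columns, means, sds)]
    return n, means, sds, corr


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x


def ols_from_summary(means, sds, corr):
    """Regress the last variable on all others using only summary statistics."""
    p = len(means) - 1
    r_xx = [row[:p] for row in corr[:p]]    # correlations among predictors
    r_xy = [row[p] for row in corr[:p]]     # predictor-outcome correlations
    beta = solve(r_xx, r_xy)                # standardized slopes
    slopes = [b * sds[p] / sds[j] for j, b in enumerate(beta)]
    intercept = means[p] - sum(s * m for s, m in zip(slopes, means[:p]))
    return intercept, slopes


# Usage with toy data that follow y = 1 + 2*x1 + 3*x2 exactly.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
n, means, sds, corr = summarize([x1, x2, y])
intercept, slopes = ols_from_summary(means, sds, corr)
print(round(intercept, 6), [round(s, 6) for s in slopes])
```

The coefficients come from the summary alone, so a reader can check them without ever seeing individual responses. Standard errors would additionally need n, which is why the sketch publishes it too.
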
See also

The DataWiz Knowledge Base provides much more information about informed consent and participants' privacy, among many other issues. Don't miss this excellent resource!