
Notes on Differential Privacy

I recently saw an interesting demo at work about differential privacy, and thought I should write down some notes! (and I’ve bugged the speaker to publish an article, yes). The concept of differential privacy reminded me of Mozilla’s Lean Data Practices, although I think they serve different purposes … Lean Data asks if you should collect anything in the first place, and I’d say differential privacy gives you a technique to collect data while respecting the user’s privacy.

What’s differential privacy?

So you want to collect data about your users in order to make better product decisions. But collecting data about your users is creepy, and it's definitely Not Cool to expose your users' private data. How can you deal with this? Differential privacy!

As you gather the data, you add "noise" to it, so that what you store isn't quite what the user actually reported.

The cool trick is that once you have a large amount of data, you can still get valuable statistics out of it … without ever having stored any user's real, individual data 😎

How might I implement it?

The differential-privacy repo has libraries in a few languages, including a Go implementation, that make it easier to adopt this strategy.

Some notes on epsilon

I hope the speaker does publish their article, because rather than recreating their demo here, I’m going to write down my favorite note from it: epsilon.

Real systems have a privacy budget — each bit of data you collect spends some of it. That's because each bit of data you collect makes it a little more likely that someone can correlate a piece of data with a particular user (thus compromising their privacy). That's where the epsilon value comes in: epsilon is the parameter that lets you tune how much noise you add to the data. More noise means more privacy but less accuracy, and vice versa.
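To make the epsilon-to-noise relationship concrete, here's a sketch of the Laplace mechanism, the textbook way epsilon calibrates noise: the noise scale is sensitivity/ε, so a smaller epsilon means more noise (more privacy, less accuracy). This is a hand-rolled illustration, not the differential-privacy library's own implementation, and the count of 1000 is made up.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// laplaceNoise draws one sample from a Laplace distribution with the
// given scale, via inverse transform sampling.
func laplaceNoise(scale float64) float64 {
	u := rand.Float64() - 0.5 // uniform in [-0.5, 0.5)
	return -scale * math.Copysign(math.Log(1-2*math.Abs(u)), u)
}

// privateCount adds Laplace noise calibrated to epsilon. The
// sensitivity is 1 here: one user can change a count by at most 1.
func privateCount(trueCount int, epsilon float64) float64 {
	const sensitivity = 1.0
	return float64(trueCount) + laplaceNoise(sensitivity/epsilon)
}

func main() {
	// Smaller epsilon -> larger noise scale -> noisier (more private) counts.
	for _, eps := range []float64{0.1, 1.0, 10.0} {
		fmt.Printf("eps=%4.1f  noisy count: %.1f\n", eps, privateCount(1000, eps))
	}
}
```

Running it a few times shows the eps=0.1 count bouncing around by tens, while the eps=10.0 count barely moves — that's the privacy/accuracy trade-off in one knob.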

A question I have after reading the type definition for the options you can pass to the dpagg package: what does the delta value mean? If epsilon controls how much noise to apply, what is delta? Is it how much difference we're aiming for between the real data and the more-privacy-aware data?

I want to try it!

There’s an example in the differential-privacy repo so you can walk through using that library, if you like.
