OpenAI Privacy Filter

294 points by tanelpoder 25 days ago | 66 comments

stratos123 25 days ago |

There are some interesting technical details in this release:

> Privacy Filter is a bidirectional token-classification model with span decoding. It begins from an autoregressive pretrained checkpoint and is then adapted into a token classifier over a fixed taxonomy of privacy labels. Instead of generating text token by token, it labels an input sequence in one pass and then decodes coherent spans with a constrained Viterbi procedure.

> The released model has 1.5B total parameters with 50M active parameters.

> [To build it] we converted a pretrained language model into a bidirectional token classifier by replacing the language modeling head with a token-classification head and post-training it with a supervised classification objective.
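
To make "constrained Viterbi" concrete, here is a minimal sketch of span decoding over per-token label scores. The BIO label set, the score values, and the helper names (`allowed`, `viterbi`, `to_spans`, `token_scores`) are all assumptions for illustration; the quotes above don't specify the real taxonomy or transition rules. The point is only that a per-token argmax can produce an invalid sequence (an `I-` tag with no opening `B-`), while a constrained decoder picks the best sequence that is valid.

```python
import math

# Hypothetical label set; the real taxonomy is not given in the quotes above.
LABELS = ["O", "B-PER", "I-PER", "B-EMAIL", "I-EMAIL"]

def allowed(prev, cur):
    """BIO constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True

def viterbi(scores):
    """scores[t][label] is a log-score for token t; returns the best valid path."""
    # A sequence may not start with an I- label.
    best = {l: (-math.inf if l.startswith("I-") else scores[0][l]) for l in LABELS}
    back = []
    for t in range(1, len(scores)):
        new_best, pointers = {}, {}
        for cur in LABELS:
            prev_score, prev = max((best[p], p) for p in LABELS if allowed(p, cur))
            new_best[cur] = prev_score + scores[t][cur]
            pointers[cur] = prev
        best = new_best
        back.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(best, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def to_spans(path):
    """Turn a valid BIO path into (start_token, end_token, label) spans."""
    spans, start = [], None
    for i, lab in enumerate(path + ["O"]):
        if start is not None and not lab.startswith("I-"):
            spans.append((start, i, path[start][2:]))
            start = None
        if lab.startswith("B-"):
            start = i
    return spans

def token_scores(overrides):
    """One token's log-scores: -5.0 by default, overridden per label."""
    return {l: overrides.get(l, -5.0) for l in LABELS}

# Made-up scores for the tokens: My / name / is / Jane / Doe
scores = [
    token_scores({"O": 0.0}),
    token_scores({"O": 0.0}),
    token_scores({"O": 0.0}),
    token_scores({"O": -0.6, "B-PER": -0.8}),  # "Jane": O narrowly wins per token
    token_scores({"O": -3.0, "I-PER": -0.1}),  # "Doe": clearly inside a name
]

greedy = [max(s, key=s.get) for s in scores]  # per-token argmax, no constraints
path = viterbi(scores)
spans = to_spans(path)
```

With these made-up scores the greedy labels end in `O, I-PER`, which is not a valid span, while the constrained path ends in `B-PER, I-PER` and decodes to a single PER span over tokens 3 and 4.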

LatencyKills 22 days ago | |

Couldn't this be used to locate private data in unstructured text without having to rely on other means of PII detection?

1. Pass the raw text through the filter to obtain the spans.

2. Map all the spans back to the original text.

Now you have all the PII information.

Everdred2dx 22 days ago | | |

Yep, and already has been done.

https://github.com/chiefautism/privacy-parser

yjftsjthsd-h 22 days ago | | |

If you have the redacted and unredacted versions, then you can diff them; that seems unsurprising? Unless I'm really misunderstanding "spans"?

fzxu22 22 days ago |

Working on this: https://github.com/KevinXuxuxu/anon_proxy, a sort of anonymization proxy to use with LLM providers. It does model (OpenAI privacy filter) + regex PII detection, and replaces them back-and-forth for API requests and responses. With locally hosted detection model, no PII leaves your local environment. I find it very useful especially when you're working on sensitive documents (legal, tax, immigration etc.), hope you find it helpful as well :)

blfr 22 days ago | |

This is very cool because it allows you to use any model. Obviously, it still lets the model and its operator see the entire context of the conversation.

I quite like Moxie's Confer[1] approach to just encrypt the whole thing in such a way that no one except the end-user sees the plaintext.

[1] https://confer.to/

fzxu22 22 days ago | | |

Thanks for the comment! And yes, redaction-based measures will always face a trade-off between privacy and the intelligence you get out of the LLM, e.g. the provider will inevitably know you're in some sort of legal/tax issue even without any PII. And in some cases the intelligence you want will depend on the LLM knowing some detail (e.g. your AGI when doing tax preparation).

On the other hand, Moxie's Confer is really interesting! At first glance I thought it was using homomorphic encryption, but it turns out to be based on hardware isolation. TIL +1

doodlebugging 22 days ago | | |

That looks interesting. I would like to see them update the Privacy Policy and Terms to acknowledge that their service also works with an Apple ID or with another email. At present, it suggests that the only authentication allowed on your end is through Google's GMail.

codethief 22 days ago | | |

It's a nice approach – if only Intel SGX were more trustworthy.

stingraycharles 22 days ago | |

How does it handle “unredaction” in responses? E.g. let’s say the LLM does something with the document. You redacted its input, so it emits redacted content. Now what?

fzxu22 22 days ago | | |

The proxy keeps a 2-way mapping between identified PII and its redaction, e.g. Jane Doe <-> <PERSON_1>, so the process is reversible, i.e. redactions in the LLM response will be replaced with the originals, and it should feel transparent on the user's end. I'll add a more detailed example to the README to make it clear.

MassiveQuasar 22 days ago | | |

The way I handled it is by assigning the redacted tag an id which gets translated back to the saved PII in the output.

_pdp_ 22 days ago |

We implemented this type of feature years ago. I can make a couple of comments as a result:

1. Sanitised PII data needs to be de-sanitised on the client in order to keep the UX somewhat functional. For example, if you say "my name is John", which gets redacted to [NAME], and the model responds with "Hi [NAME]", it needs to be converted back to "Hi John". This means that you need a mechanism for reversing PII redaction at the layer where the user is interacting. Of course, that only matters if you care about user experience.

2. Redacted PII data is practically useless for most intents and purposes. The model won't be able to do much without some data, and there are many things that are considered PII. For a simple chat system this is fine. For something more complex where the user needs to interact with the LLM this becomes extremely challenging, as the LLM may not be able to do anything at all. There is also the chance of hallucination.

Overall, it is a feature that we support at platform level but it is not something people tend to use due to these limitations.

In my mind the only practical thing to do is to remove some types of PII that represent a security risk and make sure that you use a trusted model that purges PII data as quickly as possible. This will require a very different type of system.

nl 22 days ago |

I'm nowhere near as smart as OpenAI of course, but I did build https://tools.nicklothian.com/webner/index.html that uses a BERT-based named-entity-recognition model running in your browser to do a subset of PII redaction.

It works pretty well for the use cases I was playing with.

The OpenAI model is small enough that I might enhance my tool to use it.

stingraycharles 22 days ago | |

I just used it on a document, but the number of false positives it generates makes it fairly difficult to use?

I fed it a ~ 100 line markdown document, took about 10 seconds, and it decided that "matter" (as in, frontmatter), "end" (as in, frontend), MCP (as in, mcp server) are organizations.

Most of them don't even make grammatical sense, e.g. "Following the discussion in <PERSON_1>, blahblah".

Brings me back to what NLP was like a decade ago. I always thought spaCy was a very nice project in that space.

nl 22 days ago | | |

Yeah this really is roughly NLP ~10 years ago.

It does work better on plain text than markdown because of casing. I can't see what you used (kinda the point, because it all runs in your browser) but if you can share the markdown as a gist or something I can take a look and comment more concretely.

mplanchard 25 days ago |

It would be nice if their examples weren’t mostly things that are easy to catch with regex, but it’s cool to see it released as an open, local model.

JLO64 22 days ago | |

For my customers I use regexes to block them from potentially publishing personal emails/phone numbers to their websites but I really wouldn't mind running this in addition just for the extra peace of mind. I don't have a GPU on our server, but I hope this is light enough of a model to handle CPU only inference on less than 2k tokens at a time.

mayneack 22 days ago |

Curious how this compares to presidio which mixes regex with a model: https://microsoft.github.io/presidio/

phren0logy 22 days ago | |

I would think you could use this model inside of Presidio, right?

maciejzj 22 days ago |

On a side note, when I click the link it redirects me to machine-translated version of OpenAI website with completely botched meaning - the word “redacted” is translated to a false friend “redagować” which means to edit/refine text, not anonymize.

mentalgear 24 days ago |

SuperagentLM made on-edge PII redaction models available a few years ago already, in sizes 20B, 3B, and 200M. They still seem to be available via their legacy API - well worth checking out to compare against this one. https://docs.superagent.sh/legacy/llms/superagent-lm-redact-...

usdogu 22 days ago |

Someone has created the reverse of it: https://github.com/chiefautism/privacy-parser

hiAndrewQuinn 25 days ago |

I'm surprised nobody else has commented on this. This is a very straightforward and useful thing for a small locally runnable model to do.

apothegm 25 days ago | |

And also something that it’s dangerous to try to do stochastically.

hiAndrewQuinn 25 days ago | | |

It's going to be stochastic in some sense whether you want it to be or not; human error never reaches zero percent. I would bet you a penny you'd get better results doing one two-second automated pass + your usual PII redaction than your PII redaction alone.

agnishom 22 days ago | | |

One could chain a regex based system together with this
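
A minimal sketch of what such chaining could look like: run deterministic regexes for the easy patterns, take the model's spans as a second detector, and union the two. The regexes here are deliberately simple, and `model_spans` is a hand-written stand-in for whatever the model would return; neither is taken from the release.

```python
import re

# Simple illustrative patterns, not production-grade validators.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def regex_spans(text):
    """Deterministic detector: (start, end, label) character spans."""
    spans = [(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]
    spans += [(m.start(), m.end(), "PHONE") for m in PHONE_RE.finditer(text)]
    return spans

def merge_spans(*span_lists):
    """Union spans from several detectors; overlapping spans are fused."""
    merged = []
    for start, end, label in sorted(s for lst in span_lists for s in lst):
        if merged and start < merged[-1][1]:
            # Overlap: extend the earlier span and keep its label.
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged

text = "Call Jane at +1 555 010 9999 or jane@example.com"

# Stand-in for the model's output (hypothetical): it found the name only.
model_spans = [(5, 9, "PERSON")]

merged = merge_spans(regex_spans(text), model_spans)
found = [(text[s:e], label) for s, e, label in merged]
```

Taking the union favors recall: anything either detector flags gets redacted, at the cost of inheriting both detectors' false positives.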

moralestapia 25 days ago | | |

The alternative being?

Fraaaank 22 days ago | |

From a compliance POV it's not enough. For example: "<NAME PERSON ONE> is president of the United States" is still identifiable even though the name has been redacted.

Since you can't be 100% certain that a filter redacts all personal data, you'd have to make sure that you have measures in place which allow OpenAI to legally process personal data on your behalf. Otherwise you'd technically have a data breach (from a GDPR pov).

And if OpenAI can legally process personal data on your behalf, why bother filtering if processing without filtering is also compliant?

hiAndrewQuinn 22 days ago | |

For the confused: this link must have gotten revived or something, I posted this comment a few days ago. Looks like it's getting the accolades I claim it deserves now.

tanelpoder 22 days ago | | |

It was put into second-chance pool by moderators. I originally submitted this link a few days ago and today got this (semi?)automated email from HN, an excerpt below:

  The submission "OpenAI Privacy Filter" that you posted to Hacker News (https://news.ycombinator.com/item?id=47870901) looks good, but hasn't had much attention so far. We put it in the second-chance pool, so it will get a random placement on the front page some time in the next day or so.

  This is a way of giving good HN submissions multiple chances at the front page. If you're curious, you can read about it at https://news.ycombinator.com/item?id=26998308 and other links there.

ashwindharne 25 days ago | |

Same here, this is an incredibly useful thing to have in the toolkit

CMay 22 days ago |

Just so people are clear, these types of models are almost universally naive and basic. If all you have is a single generic neutral message, "Hi, this is Bob.", it will be sufficient in most cases. If you have a pile of data, I am not aware of any PII redaction tool that has factored in all of the risks to identity leakage.

The problem is when companies use things like this and somehow believe they are anonymizing the data. No, you are not.

Still, for scenarios where the processed data isn't being directly published or shared, but used as some intermediate step like moderation enforcement, human evaluation layers or model training it can be useful to filter these things out.

hanneshapke 20 days ago |

The privacy filter is great, but it leaves a gap for non-technical users. Therefore, Dataiku's open source team at 575 Lab has built the Kiji Privacy Proxy. It is a complete app (MacOS/Linux/Chrome extension) that uses a fine-tuned DeBERTa model. It acts as a forward proxy, but we also have an experimental transparent proxy setup.

It detects 20+ entities, not just masks them, but also converts them back on the return trip.

The project and the entire ml stack are open source and Apache 2.0 (dataset and model on Huggingface, label review, and ml pipeline setup).

Repo: https://github.com/dataiku/kiji-proxy Demo: https://youtu.be/txzzY5bU2Ig

7777777phil 25 days ago |

> The model is available today under the Apache 2.0 license on Hugging Face and GitHub.

Bringing back the Open to OpenAI..

awestroke 22 days ago | |

It's only open because nobody who's interested in this model would send their data to openai to be stripped of PII. If they thought otherwise, it would be closed-weights and API-only for "safety" reasons

Havoc 25 days ago |

50M effective parameters is impressively light. Is there a similarly light model on the prompt injection side? Most of the mainstream ones seem heavier

benmann 22 days ago |

I welcome this release. Lots of good reasons to have these models and practices in place, even outside regulated industries. Even the EU AI Act makes some of this necessary (in theory). I've built redaction and rehydration through specialized NER models into https://grepture.com, so definitely adding this to the pipeline. Optionally sitting in the hot path, allowing actual tinkering with requests before/after they hit the LLM can be really handy for compliance or direct user input scenarios.

I_am_tiberius 22 days ago |

I assume they use this model to be able to train new models with user data.

I_am_tiberius 22 days ago | |

If they think they are respecting their users' privacy by doing so, they are very very wrong.

flashdesk 22 days ago |

This is exactly where stochastic approaches feel uncomfortable.

For anything touching security or privacy, even small inconsistencies can quickly erode trust.

freakynit 22 days ago |

Can someone explain how I can reconstruct the original entities if there is, for example, more than one person's name?

pros 22 days ago | |

You cannot — not with the model alone. It gives you spans + types, not identity.

You need to do that part yourself after the model runs. The filter gives you spans; for each one, assign a stable ID (PERSON_1, PERSON_2) and keep {PERSON_1: "Harry", PERSON_2: "Ron"} next to the document. Swap IDs in before the LLM call, swap originals back in the reply.

Scoping that map to a document/project keeps the same person consistent across calls, so Harry stays PERSON_1 instead of becoming PERSON_3 the next time he's mentioned.

(Disclosure: I'm building a Mac privacy tool, RedMatiq, that does exactly this. The mapping layer turned out substantially harder than detection.)
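
The mapping described above can be sketched roughly as follows. `PlaceholderMap` and its methods are hypothetical names, and the spans are produced by a regex stand-in rather than the actual model; the sketch only shows the ID assignment, the redact step, and the swap back.

```python
import re

class PlaceholderMap:
    """Two-way mapping between original strings and stable placeholders."""

    def __init__(self):
        self.to_placeholder = {}  # ("PERSON", "Harry") -> "<PERSON_1>"
        self.to_original = {}     # "<PERSON_1>" -> "Harry"
        self.counters = {}        # next ID per label

    def placeholder_for(self, label, value):
        key = (label, value)
        if key not in self.to_placeholder:
            n = self.counters.get(label, 0) + 1
            self.counters[label] = n
            tag = f"<{label}_{n}>"
            self.to_placeholder[key] = tag
            self.to_original[tag] = value
        return self.to_placeholder[key]

    def redact(self, text, spans):
        """spans: non-overlapping (start, end, label) character offsets."""
        ordered = sorted(spans)
        # Assign IDs in reading order so the first person seen is PERSON_1.
        tags = [self.placeholder_for(label, text[s:e]) for s, e, label in ordered]
        # Replace right-to-left so earlier offsets stay valid.
        for (s, e, _), tag in reversed(list(zip(ordered, tags))):
            text = text[:s] + tag + text[e:]
        return text

    def restore(self, text):
        """Swap known placeholders back; unknown ones are left untouched."""
        return re.sub(
            r"<[A-Z_]+_\d+>",
            lambda m: self.to_original.get(m.group(0), m.group(0)),
            text,
        )

mapping = PlaceholderMap()  # keep one per document/project for stable IDs

doc = "Harry met Ron. Harry waved."
# Stand-in for model output: in practice these spans come from the filter.
spans = [(m.start(), m.end(), "PERSON") for m in re.finditer(r"Harry|Ron", doc)]

redacted = mapping.redact(doc, spans)
reply = mapping.restore("Hello <PERSON_2> and <PERSON_1>!")
later = mapping.redact("Ron left.", [(0, 3, "PERSON")])
```

Here `redacted` is `<PERSON_1> met <PERSON_2>. <PERSON_1> waved.`, and because the same map is reused, Ron stays `<PERSON_2>` in the later call. Exact string matching is the weak point of this sketch: "Harry" and "Harry Potter" would get different IDs, which is part of why the mapping layer is harder than it looks.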

freakynit 22 days ago | | |

Thanks.. I was expecting it would itself return the redacted document and this map ... but, that spans approach works too.. with a bit of effort.

Also, care to share your app link/homepage? I googled, but couldn't find it.

ares623 22 days ago |

This looks actually useful. But can someone help me understand how you address the non-perfect scores: "Privacy Filter achieves an F1 score of 96% (94.04% precision and 98.04% recall)."

How would you actually use this if it can fail redacting 4% of the data. How do you reliably know which 4% failed?
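
As a quick check on the arithmetic: the quoted numbers are consistent with F1 being the harmonic mean of precision and recall, and the share of PII that gets missed is governed by recall, not by F1. The sketch below assumes the standard definitions; the quote doesn't say whether the benchmark counts spans or tokens.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

precision, recall = 0.9404, 0.9804  # figures quoted above

f1 = f1_score(precision, recall)
miss_rate = 1 - recall             # share of true PII items not flagged
false_alarm_share = 1 - precision  # share of flagged items that were not PII

print(f"F1 = {f1:.4f}, missed = {miss_rate:.2%}, false alarms = {false_alarm_share:.2%}")
```

So on that benchmark roughly 2% of true PII goes unflagged, and roughly 6% of what is flagged is not PII. Neither number tells you which items failed on your own data.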

vessenes 22 days ago | |

My experience with models that can reach high-90s percent scores on benchmarks is that the last few percent are often arguable or vague cases where even experts would disagree. You could try it yourself by training an MNIST classifier and seeing which digits your model inevitably cannot guess -- you'll be like "...wait a minute..."

Anyway, I have no idea what the underlying data here looks like, but I bet it's pretty unusual.

When I was working at my first job out of college, we were given a large contract and told to redact with black Sharpie every name of a company; it was a basic document prep exercise ahead of a strategy session for a competitor. Standard practice was to share general information but not specific. Our redaction accuracy on 200 pages of contract was ... not 100%.

frognumber 22 days ago | |

This is not a tool which can be used to assume information is anonymized.

The way OpenAI describes it is ...

... concerning.

"Our goal is for models to learn about the world, not about private individuals. Privacy Filter helps make that possible." This means they're using sensitive PII to train models.

A smart AI will re-identify all the information -- including that in the 96% -- in a snap. That's already a solved problem.

flashdesk 22 days ago |

This is where stochastic approaches start to feel a bit uncomfortable.

Even small mistakes can make something dealing with sensitive data hard to trust. It seems useful as a first pass, but I’d probably still want some deterministic checks or a human in the loop to feel confident using it.

fathermarz 22 days ago | |

I built a community tool for exactly this, based on privacy-first principles but around the what. It’s workflow based and not “put your sensitive data into ChatGPT and hope it captures the right stuff”. Mostly built for security folks but anyone can use it

Check it out: https://redact.cabreza.com

billylo 22 days ago |

I ran a comparison using this OPF with what I have in clarity.evergreen-labs.org (it's a Tauri app with local PII redaction.)

A few things jump out:

1) Dates are aggressively redacted, creating false positives.

2) Non-English names are not working yet.

dsavant 22 days ago |

https://peyeeye.ai literally solves all the problems everyone is mentioning in this thread

pando85 22 days ago | |

Privacy tools from the company that scraped everyone's data without asking. The irony is lost on them.

imrozim 22 days ago |

PII masking is always an afterthought when building a backend. 50M active params means it can actually run in production.

a10c 21 days ago |

I'll bet the Justice Department can't wait to get their hands on this.

ndom91 25 days ago |

Where's the gguf from Unsloth and co?

