OpenAI Privacy Filter (openai.com)
> Privacy Filter is a bidirectional token-classification model with span decoding. It begins from an autoregressive pretrained checkpoint and is then adapted into a token classifier over a fixed taxonomy of privacy labels. Instead of generating text token by token, it labels an input sequence in one pass and then decodes coherent spans with a constrained Viterbi procedure.
> The released model has 1.5B total parameters with 50M active parameters.
> [To build it] we converted a pretrained language model into a bidirectional token classifier by replacing the language modeling head with a token-classification head and post-training it with a supervised classification objective.
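To make the "constrained Viterbi" part of the quote concrete, here is a minimal sketch of span decoding over per-token scores. The quote does not say which tag scheme the model uses, so the BIO tags, the `NAME` label, and the function names below are assumptions for illustration, not OpenAI's implementation.

```python
import math

def allowed(prev_tag, cur_tag):
    """BIO rule (assumed scheme): I-X may only follow B-X or I-X."""
    if cur_tag.startswith("I-"):
        label = cur_tag[2:]
        return prev_tag in ("B-" + label, "I-" + label)
    return True

def constrained_viterbi(scores, tags):
    """scores[t][k] is the log-score of tags[k] at token t.

    Returns the highest-scoring tag sequence that never violates `allowed`.
    """
    n = len(tags)
    # A sequence may not start inside a span, so I- tags begin at -inf.
    best = [-math.inf if tags[k].startswith("I-") else scores[0][k] for k in range(n)]
    back = []
    for row in scores[1:]:
        new_best, pointers = [], []
        for k in range(n):
            # Best legal predecessor for tag k at this position.
            prev_score, prev_k = max(
                (best[j], j) for j in range(n) if allowed(tags[j], tags[k])
            )
            new_best.append(prev_score + row[k])
            pointers.append(prev_k)
        best = new_best
        back.append(pointers)
    # Walk the back-pointers from the best final tag.
    k = max(range(n), key=lambda i: best[i])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    return [tags[i] for i in reversed(path)]

tags = ["O", "B-NAME", "I-NAME"]
scores = [
    [0.0, -1.0, -5.0],   # per-token argmax would pick O
    [-2.0, -3.0, -0.1],  # per-token argmax would pick I-NAME (illegal after O)
    [-0.1, -3.0, -2.0],
]
decoded = constrained_viterbi(scores, tags)
```

Picking the best tag per token independently would give O, I-NAME, O here, which is not a well-formed span. The constrained pass trades a little score on the first token for a legal B-NAME, I-NAME, O path.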
1. Pass the raw text through the filter to obtain the spans.
2. Map all the spans back to the original text.
Now you have all the PII information.
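A rough sketch of those two steps, assuming the filter hands back character-offset spans shaped like `{"start", "end", "label"}`. That shape, the `collect_pii` helper, and the `PRIVATE_EMAIL` label are made up for illustration; check the model's actual output format before relying on any of this.

```python
from collections import defaultdict

def collect_pii(text, spans):
    """Group the original substrings by span label.

    `spans` is assumed to be a list of dicts with character offsets
    ("start", "end") and a "label"; this is a hypothetical shape.
    """
    found = defaultdict(list)
    for span in spans:
        found[span["label"]].append(text[span["start"]:span["end"]])
    return dict(found)

text = "My name is John and my email is john@example.com."
email = "john@example.com"
# Hand-written spans standing in for the model's output.
spans = [
    {"start": text.index("John"), "end": text.index("John") + 4, "label": "PRIVATE_NAME"},
    {"start": text.index(email), "end": text.index(email) + len(email), "label": "PRIVATE_EMAIL"},
]
pii = collect_pii(text, spans)
```

Filtering by label afterwards is a dictionary lookup, e.g. `pii["PRIVATE_NAME"]`.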
I quite like Moxie's Confer[1] approach to just encrypt the whole thing in such a way that no one except the end-user sees the plaintext.
On the other hand, Moxie's Confer is really interesting! At first glance I thought it was using homomorphic encryption, but it turns out to be based on hardware isolation. TIL +1
1. Sanitised PII data needs to be de-sanitised on the client in order to keep the UX somewhat functional. For example, if you say "my name is John", which gets redacted to [NAME], and the model responds with "Hi [NAME]", it needs to be converted back to "Hi John". This means that you need to have a mechanism for reversing PII at the layer where the user is interacting. Of course, that is only true if you care about user experience.
2. Redacted PII data is practically useless for most intents and purposes. The model won't be able to do much without some data, and there are many things that are considered PII. For a simple chat system this is fine. For something more complex where the user needs to interact with the LLM, this becomes extremely challenging, as the LLM may not be able to do anything at all. There is also the chance of hallucination.
Overall, it is a feature that we support at platform level but it is not something people tend to use due to these limitations.
In my mind the only practical thing to do is to remove some types of PII that represent a security risk and make sure that you use a trusted model that purges PII data as quickly as possible. This will require a very different type of system.
It works pretty well for the use cases I was playing with.
The OpenAI model is small enough that I might enhance my tool to use it.
I fed it a ~100-line markdown document. It took about 10 seconds and decided that "matter" (as in frontmatter), "end" (as in frontend), and MCP (as in MCP server) are organizations.
Most of them don't even make grammatical sense, e.g. "Following the discussion in <PERSON_1>, blahblah".
Brings me back to what NLP was like a decade ago. I always thought spaCy was a very nice project in that space.
It does work better on plain text than markdown because of casing. I can't see what you used (kinda the point, because it all runs in your browser), but if you can share the markdown as a gist or something I can take a look and comment more concretely.
Since you can't be 100% certain that a filter redacts all personal data, you'd have to make sure that you have measures in place which allow OpenAI to legally process personal data on your behalf. Otherwise you'd technically have a data breach (from a GDPR pov).
And if OpenAI can legally process personal data on your behalf, why bother filtering if processing with filtering is also compliant?
The submission "OpenAI Privacy Filter" that you posted to Hacker News (https://news.ycombinator.com/item?id=47870901) looks good, but hasn't had much attention so far. We put it in the second-chance pool, so it will get a random placement on the front page some time in the next day or so.
This is a way of giving good HN submissions multiple chances at the front page. If you're curious, you can read about it at https://news.ycombinator.com/item?id=26998308 and other links there.
The problem is when companies use things like this and somehow believe they are anonymizing the data. No, you are not.
Still, for scenarios where the processed data isn't being directly published or shared, but used as some intermediate step like moderation enforcement, human evaluation layers or model training it can be useful to filter these things out.
It detects 20+ entity types and doesn't just mask them: it also converts them back on the return trip.
The project and the entire ml stack are open source and Apache 2.0 (dataset and model on Huggingface, label review, and ml pipeline setup).
Repo: https://github.com/dataiku/kiji-proxy Demo: https://youtu.be/txzzY5bU2Ig
Bringing back the Open to OpenAI..
For anything touching security or privacy, even small inconsistencies can quickly erode trust.
You need to do that part yourself after the model runs. The filter gives you spans; for each one, assign a stable ID (PERSON_1, PERSON_2) and keep {PERSON_1: "Harry", PERSON_2: "Ron"} next to the document. Swap IDs in before the LLM call, swap originals back in the reply.
Scoping that map to a document/project keeps the same person consistent across calls, so Harry stays PERSON_1 instead of becoming PERSON_3 the next time he's mentioned.
(Disclosure: I'm building a Mac privacy tool, RedMatiq, that does exactly this. The mapping layer turned out substantially harder than detection.)
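For anyone wanting to try the mapping idea, a bare-bones sketch follows. It assumes spans arrive as character offsets with a label; the `PseudonymMap` class and its method names are invented here and are not taken from RedMatiq or from the OpenAI model's API.

```python
import re

class PseudonymMap:
    """Keeps one stable placeholder per (label, original value) pair."""

    def __init__(self):
        self.id_by_value = {}  # (label, original) -> placeholder
        self.value_by_id = {}  # placeholder -> original
        self.counters = {}     # label -> last number handed out

    def placeholder_for(self, label, value):
        key = (label, value)
        if key not in self.id_by_value:
            n = self.counters.get(label, 0) + 1
            self.counters[label] = n
            placeholder = f"{label}_{n}"
            self.id_by_value[key] = placeholder
            self.value_by_id[placeholder] = value
        return self.id_by_value[key]

    def redact(self, text, spans):
        ordered = sorted(spans, key=lambda s: s["start"])
        # Assign IDs in reading order so numbering follows first appearance.
        for s in ordered:
            self.placeholder_for(s["label"], text[s["start"]:s["end"]])
        # Replace right to left so earlier offsets stay valid.
        for s in reversed(ordered):
            ph = self.placeholder_for(s["label"], text[s["start"]:s["end"]])
            text = text[:s["start"]] + ph + text[s["end"]:]
        return text

    def restore(self, text):
        if not self.value_by_id:
            return text
        # Longest first so PERSON_10 is not matched as PERSON_1 plus "0".
        longest_first = sorted(self.value_by_id, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(p) for p in longest_first))
        return pattern.sub(lambda m: self.value_by_id[m.group(0)], text)

mapping = PseudonymMap()  # keep one per document/project
doc = "Harry met Ron. Harry waved."
spans = [
    {"start": 0, "end": 5, "label": "PERSON"},
    {"start": 10, "end": 13, "label": "PERSON"},
    {"start": 15, "end": 20, "label": "PERSON"},
]
redacted = mapping.redact(doc, spans)
reply = mapping.restore("Hi PERSON_1, say hello to PERSON_2.")
```

This only handles exact string matches. Treating "Harry", "Harry Potter", and "Mr. Potter" as the same person is where the hard part presumably starts, and nothing here attempts it.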
Also, care to share your app link/homepage? I googled, but couldn't find it.
How would you actually use this if it can fail to redact 4% of the data? How do you reliably know which 4% failed?
Anyway, I have no idea what the underlying data here looks like, but I bet it's pretty unusual.
When I was working on my first job out of college, we were given a large contract and told to redact with black Sharpie every name of a company; it was a basic document prep exercise ahead of a strategy session for a competitor. Standard practice was to share general information but not specific. Our redaction error rate on 200 pages of contract was ... not 100%.
The way OpenAI describes it is ...
... concerning.
"Our goal is for models to learn about the world, not about private individuals. Privacy Filter helps make that possible." This means they're using sensitive PII to train models.
A smart AI will re-identify all the information -- including that in the 96% -- in a snap. That's already a solved problem.
Even small mistakes can make something dealing with sensitive data hard to trust. It seems useful as a first pass, but I’d probably still want some deterministic checks or a human in the loop to feel confident using it.
Check it out: https://redact.cabreza.com
A few things jump out:
1) Dates are aggressively redacted, creating false positives. 2) Non-English names are not working yet.
I'm suggesting that a model designed for high-accuracy redaction can also be used to find all PII in unredacted text. For example, if I don't already know how to find PII (e.g., regex, NLP, etc.) I can use OpenAI's Privacy Filter model to do the work for me.
And because each span has a type (PRIVATE_NAME, etc.) I don't even need to do any work to find only the specific information I am looking for; something that simple diffing wouldn't do.
I'm not saying it's an issue, I just think it is interesting that a tool designed to protect PII can also be used to find it with minimal effort. And it looks like someone already implemented it: https://github.com/chiefautism/privacy-parser.
I've built large human data entry operations. Variable throughput, monotony, hiring and perf management and firing, management, quality management. All of these things are large investments of human effort and money.
If I can achieve the same quality level (or in some use cases, even slightly degraded output) with software scaling characteristics and costs... I see zero reasons outside regulatory compliance reasons to have people do it.
Sure they do, computers repeatedly, quickly, and predictably do what they are programmed to do. Which includes any human errors in that programming.
And now they predictably do what they are not programmed to do.
Sure, there's some math that says the difference between really close and exact isn't a big deal; but then you're also saying your secrets don't need to be exact when decoding them, and at the moment they absolutely do.
Sure looks like a weird privacy veil that sorta might work for some things, like frosted glass, but think of a toilet stall with all frosted glass, are you still comfortable going to the bathroom in there?
The use case for this is that many enterprise customers want SaaS products to strip PII from ingested content, and there's no non-model way to do it.
Think, ingesting call transcripts where those calls may include credit card numbers or private data. The call transcripts are very useful for various things, but for obvious reasons we don't want to ingest the PII.
Credit card numbers are deterministic. A five year old could write a script to strip out credit card numbers.
As for other PII ? You're seriously expecting an LLM to find every instance of every random piece of PII ? Worldwide ? In multiple languages ? I've got an igloo I'd like to sell you ...
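For reference, the deterministic version for card numbers typed out as digits might look like the sketch below: find digit runs of plausible length and keep only those that pass the Luhn checksum. The regex, the `[CARD]` placeholder, and the function names are choices made for this example, and it only covers numbers written as digits with optional spaces or hyphens.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_ok(digits):
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def strip_card_numbers(text, placeholder="[CARD]"):
    """Replace digit runs that pass the Luhn check; leave everything else."""
    def replace(match):
        digits = re.sub(r"[ -]", "", match.group(0))
        return placeholder if luhn_ok(digits) else match.group(0)
    return CARD_CANDIDATE.sub(replace, text)

# 4111 1111 1111 1111 is a widely used test number that passes Luhn.
cleaned = strip_card_numbers("Card: 4111 1111 1111 1111, thanks")
untouched = strip_card_numbers("Ref: 4111 1111 1111 1112, thanks")
```

The checksum keeps most order numbers and phone numbers from being redacted by accident, at the cost of missing card numbers that were mistyped.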
"four three uh let's see sorry my vision is bad six eight..."
Easy versions of problems are easy. But reality is messy.
And no, neither I nor anybody else is expecting a model with 50M active parameters to find every instance. But finding 90% or 95% or 99% is pretty good, and sufficiently good for many use cases.
I don't know the last time you relayed card details over the phone, but the last 100 times I did it, the agent did one of two things:
(a) Said "Please wait while I turn off recording"; or
(b) Transferred the call to an automated system that read the card details via the phone keypad input and then took back control of the call afterwards.
Relaying card details over the phone is a problem that has been comprehensively solved. You don't need an LLM for it!
> But finding 90% or 95% or 99% is pretty good
I would humbly suggest that you are over-estimating the capabilities of an LLM. ;)