Training our own AI models (posthog.com)
I wonder if they regret opensource, considering people will be using LLMs to replace them which have surely trained off of their code.
But then it gets used to describe the reverse, and we have to add words to clarify.
I once saw a post here with correctly described opt-in telemetry, and the top comment was attacking them for the reverse, thinking it was including them by default, so there's little winning. It's one of those words that has just come to mean its opposite.
The internet really stinks. The 1999 teenager in me somewhere is really bummed.
I've taken to a) leaving a negative Google or Yelp review for such establishments and b) never coming back. This is a practice that needs to die.
But it's apparently yet one more thing we have to be actively suspicious of as it defaults towards an intolerable state. So it's easier to just rip it out of the system and move on.
Users on our EU cloud instance are opted out by default
So too users with agreements that prevent training (e.g. BAA, MSA, or similar)
All other users on our US cloud instance are opted in by default
We will anonymize all data before it's used for training
We will only use data that already exists in your PostHog instance
We will do all the model training ourselves, which means...
We won't sell or send your data to third-party model providers
You can opt out at any time via your org settings in PostHog (admin access required)
Training won't start until June 29, so there's plenty of time to decide
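Read as a decision rule, the defaults quoted above might be sketched roughly like this. It is only an illustration: the `Org` fields and the `trains_on_data` function are hypothetical names, not anything from PostHog's codebase or API, and both the precedence of agreements and letting an explicit admin choice override the default are assumptions on my part.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Org:
    # All fields are hypothetical, chosen only to mirror the quoted list.
    region: str                           # "eu" or "us" cloud instance
    has_blocking_agreement: bool = False  # e.g. BAA, MSA, or similar
    admin_choice: Optional[bool] = None   # None = admin never touched the setting


def trains_on_data(org: Org) -> bool:
    """Return True if the org's data would be used for training under the quoted defaults."""
    # Assumption: an agreement that prevents training wins over everything else.
    if org.has_blocking_agreement:
        return False
    # Assumption: an explicit admin choice in org settings overrides the default.
    if org.admin_choice is not None:
        return org.admin_choice
    # Quoted defaults: EU cloud is opted out, everyone else on US cloud is opted in.
    return org.region == "us"
```

The last line is the part the thread is arguing about: what happens when nobody touches the setting.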
AFAICT this now gives them default permission to train an LLM on your code (as PostHog telemetry data is inextricably tied to your code), use it, and even sell it if they wanted to (as it's not your data anymore, it's their model). Yikes.
- The OS Redesign
- "Sexy Legal Documents"
- Emails with "<relevant hedgehog meme goes here>" as the subject line
- Having a merch shop with action figures of your CEO
It works both ways. When you're looking for adoption and making very pro-user moves, I guess it can be a benefit. However, when you're now looking to grow revenue and making very anti-user moves, it's insult to injury.
I'm the last person to say that tech "shouldn't be fun" or something overly-broad like that, but if your messaging doesn't match the decisions of leadership, you're gonna have a bad time.
I remember people cheering about their "OS" web redesign, which was the most confusing and unnecessary UX complication when I needed to go track down a session replay to debug something. (They've since added navigation to the top right.)
And since the big platforms don't have to unwind their advantages or pay back for the methods that are now restricted and considered illegal, they can peacefully extract rents from their entrenched positions for even longer, while everyone else is prevented from using the same ladder they climbed.
> Put simply, because otherwise we will not have enough data to train a model that's actually useful.
AKA we won't be able to make as much money if we required you to give us permission to use your data.
PostHog has unfettered logged-in access to some sensitive stuff. What steps are they actually taking to scrub sensitive data from my replays before they are used to train a model?
I’ve now made our decision. We won’t be using them.
If they are going to position themselves as the non-slimy no-BS guys, they can’t pull this nonsense.
They’ll use your product and your data to later sell a product back to you.
As an aside, this also means the EU rules are working.
This does not make any sense.
> Now they want to use the data for a business purpose.
They raised VC money and they want a return so this was predictable.
The temptation and the value is too great, and the opt-in opt-out consent thing ends up being a fuckery where the company tries to trick the user into allowing them to take a look into the data, presumably because they are selling the product at a loss and need an alternative revenue model.
Just make it impossible from the get-go. The fine print would be that the data can be shared out-of-band explicitly, in an email, or if explicitly copy-pasted into a support chatbox, but there would be no mechanism for us to read the data from the databases, much less from the client.
I don't mean it would be an air-tight mechanism like Signal or ProtonMail. If a court order asked us to produce client info, we would still reserve the right to produce the data, but only exceptionally, and definitely not for training models.
Another term I would incorporate is a seppuku clause: if we get hacked, I resign and the company goes bankrupt. Anything else is the wrong attitude to computer security for companies that want to scale to global reach.
Cool, cool. Glad to see that you are the arbiter of what your users have "opted" to do, and their input isn't required.
While we're at it, I'm going to "volunteer" your time to rebuild my patio this weekend. You don't need to worry about volunteering, I've done it for you.
Opt-in vs opt-out organ donorship has a large impact.
Most people on any web app won’t stray from the defaults.
This feels like a really bad defense. It’s great you provide transparency but I don’t want my analytics system writing my code. There are already so many other first movers that are better that I would rather connect to your analytics.
Anonymize by what definition? GDPR? Do note that this is a very high bar.
> All other users on our US cloud instance are opted in by default
Including end users in the EU? You should remember that you obtained the personal data directly from the data subject, meaning Article 13 obligations apply. Article 13 omissions cannot be cured retroactively. Can you show that all of your customers have provided sufficient Article 13 notice to cover this processing?
And do note that you are almost definitely within the scope of Article 3(2)(b).
That's actually an interesting note. So you all will be managing the training runs on hardware you own or rent and manage?
I feel like you either know that already, or should, but either way I won't be using your product anymore. Just pulled it out of the projects I'm personally in charge of and in the future I'm going to recommend against using it both internally and for clients.
Legitimately disappointed.
if you are looking at your metrics, I want to be clear that this transition will not happen overnight, but it _will_ happen for this reason, so just be aware that your short-term metrics won't tell the full story
This is slimy.
1. Lobby your representatives to improve your data protection laws, even if you think it's pointless to do so
2. Stop attacking EU data protection laws, even if they inconvenience you
As can be seen from this announcement, data protection laws do make a difference.
But at what point do we call a spade a spade and say it's just them secretly inflating their prices? "everything is a penny but we charge a 1000000% service charge"
I'm talking about restaurants that just add service charge to everyone.
Legitimate question, I am not trying to prove a point.
The fact that they only opt-out EU users, because regulation forces them, tells you all you need to know about the moral compass of PostHog.
This shouldn't even require regulation, but apparently expecting companies to act morally is a bloody pipe dream. Profit over morals and concern for your customers, apparently.
PostHog here is saying they will train on your data but opting out is allowed. For the taxes analogy to work, PostHog would not offer the opt-out option at all and you'd be doing something like hacking their system to filter your data out on their end.
The same cannot be said for some random corporation training AI models off your data to make a buck or two.
Organ transplant surgery costs hundreds of thousands of dollars, yet donors get zilch, which is completely unfair when everyone else in the value chain gets paid.
If instead it was "allow my organs to be sold for my estate" I think the supply of organs would greatly increase, which would be win/win.
Not the brain, the center of consciousness, being kept in some sort of horror-movie half-alive state. I do not think we understand consciousness enough to rule out what those brains are experiencing.
Consent matters.