I run a SaaS business and I dropped Google Analytics a long, long time ago. Primarily because of the tracking, but also because I really couldn't see the value of the data.
In the old days, you could at least use the "Referer" (sic) header to know where people came from and what they searched for. But the search terms have long been gone from it, and the only source of that data now is Google Search Console or Bing Webmaster Tools.
Page visits are a vanity metric: they tell me nothing about my business. The only things that actually matter for a SaaS are signups and MRR. Measuring your business by page views is like measuring the performance of a Walmart by counting cars on the nearby freeway. Yes, the numbers are somewhat related, but you can't draw any conclusions.
I made it a point not to include any third-party JavaScript on my site, but even if I were to make an exception for these analytics, I can't really see the point, unless you are running an ad-driven site where pageviews are king.
Say, for example, all your users start spending 30% more time on your reset-password page after you push out some changes. How would you know? What could be causing that? Could something be broken with the login? Apply this to everything.
Not having analytics is literally not caring about what your users do in your product, so either you're never changing the product and are 100% confident it'll always work, or you're probably giving them a worse experience than you could.
How you do this tracking is another story, but there are ethical ways to do it.
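To make the reset-password example concrete, here is a minimal sketch, assuming you already record how long each view of a page lasts. It compares the median time on page before and after a deploy; the function names, the 25% alert threshold, and the sample numbers are all made up for illustration.

```javascript
// Median of a numeric array; copies before sorting so the caller's array is untouched.
function median(values) {
  if (values.length === 0) throw new Error("median of an empty array");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Hypothetical check: relative change (in %) of the median time on a page
// between a window before a deploy and a window after it.
function medianTimeChangePercent(beforeSeconds, afterSeconds) {
  const before = median(beforeSeconds);
  return ((median(afterSeconds) - before) / before) * 100;
}

// Made-up samples for the reset-password page, in seconds.
const beforeDeploy = [18, 20, 22, 25, 30];
const afterDeploy = [23, 26, 29, 32, 39];
const change = medianTimeChangePercent(beforeDeploy, afterDeploy);
if (change > 25) {
  console.log(`Median time on page up ${change.toFixed(1)}%: check the reset flow`);
}
```

The median is used rather than the mean so that a few abandoned tabs left open for hours don't dominate the comparison.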
Adding obnoxious tracking of course causes some user loss itself, which the tracking cannot measure. On some of those "modern" websites that show me a white screen without JS, I check uBlock Origin and see the site's own domain and some Google shit? Tab closed. No thank you, I will go elsewhere.
Understanding what drives traffic to your SaaS website is such an important piece of information. For instance, say you write two articles: one describing how to use your product to achieve something customers want to do, and another comparing your product to a competitor's. If one of the two brings in 50x more traffic than the other, you'd certainly want to know, because then you know which articles give you the biggest return on the time spent writing them.
That's just one of many examples of how web analytics is an important tool for being a good salesperson.
With true analytics, understanding a typical session helps you optimize users' workflows, making sure relevant features are easily discovered in the right place.
It really helps when you want to work on user experience. You may need metrics such as LCP, INP and CLS broken down per page type, with the ability to drill down into the data and get it in real time.
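A minimal sketch of the per-page-type breakdown, assuming LCP, INP and CLS samples have already been collected from real users (for example with Google's web-vitals library) and tagged with a page type. The sample schema and function names are hypothetical. It reports the 75th percentile, which is the percentile Google uses when assessing Core Web Vitals.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// Group real-user samples by page type and metric, then report the 75th percentile.
// The sample shape { pageType, metric, value } is a hypothetical schema.
function p75ByPageType(samples) {
  const groups = new Map(); // pageType -> Map(metric -> values[])
  for (const { pageType, metric, value } of samples) {
    if (!groups.has(pageType)) groups.set(pageType, new Map());
    const byMetric = groups.get(pageType);
    if (!byMetric.has(metric)) byMetric.set(metric, []);
    byMetric.get(metric).push(value);
  }
  const result = {};
  for (const [pageType, byMetric] of groups) {
    result[pageType] = {};
    for (const [metric, values] of byMetric) {
      result[pageType][metric] = percentile(values, 75);
    }
  }
  return result;
}

const report = p75ByPageType([
  { pageType: "product", metric: "LCP", value: 1800 },
  { pageType: "product", metric: "LCP", value: 2200 },
  { pageType: "product", metric: "LCP", value: 2600 },
  { pageType: "product", metric: "LCP", value: 4000 },
  { pageType: "article", metric: "CLS", value: 0.02 },
]);
console.log(report.product.LCP); // 2600
```

Drilling down further (by device, country, release) is the same grouping with more keys.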
The ROI of such a script depends on what you do with the data. If it's only for vanity, or nobody even looks at it, you are emitting CO2 for nothing.
These are qualitative improvements, which are extremely unlikely to stem from quantitative metrics, especially when the sample size isn't significant (as is the case for the vast majority of pages in existence).
One of the most disappointing client experiences I had was after building a custom shop for a company heavily focused on graphic art. We optimized the hell out of their site, getting performance scores of 97+ even though every page was image-heavy and included a product grid with a Pinterest-like masonry layout.
A few days before launch they asked us to add their Google Pixel script. By the next day they had added 7 or 8 different third-party scripts and knocked performance scores down into the mid 50s. It's their site and they can do what they want with it, but I could have saved a lot of dev time had I known performance didn't matter at all.
Page visits tell you how many people you get. If you then compare that with how many signups you get, you have a conversion rate. That’s an important figure. Page visits can also tell you whether your marketing efforts have worked. Imagine doing all the marketing work and not knowing if it did anything.
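The arithmetic is simple, but a small sketch with made-up numbers shows why the ratio says more than either count alone. The function name is hypothetical.

```javascript
// Conversion rate as a percentage: signups per 100 visits.
function conversionRate(visits, signups) {
  if (visits <= 0) return 0; // no traffic, no meaningful rate
  return (signups * 100) / visits;
}

// Made-up numbers for two months, e.g. before and after a marketing push.
console.log(conversionRate(12000, 180)); // 1.5
console.log(conversionRate(20000, 220)); // 1.1
```

Here visits grew from 12,000 to 20,000 and signups from 180 to 220, yet the conversion rate fell from 1.5% to 1.1%.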
We're a membership-driven organization, and by "membership" I mean we rely on donations to fund our content creation (though whether you're a member or not, you have the same level of access to our content). We care about raw traffic numbers because they relate directly to our mission of informing people. They tell us how many people we inform day to day.
So yeah, we care about those raw numbers, and those numbers are difficult to get without JavaScript right now because of caching and our hosting providers' terrible log retention.
Raw traffic numbers only tell part of the story, though. We want to know the path people take from first landing on the site to becoming a donating member, so that (in theory) we can do more of the things that promote that behavior in more people. That's The Funnel, and that's where orgs like Plausible are best. They do first-party tracking, so the data stays with us. And since it's first-party tracking, we can follow a person's overall relationship with our site, from the first news story they read to the moment they first hit our donation page three years into the relationship, or whatever.
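A minimal sketch of counting The Funnel, assuming you have, for each visitor, the ordered list of pages they viewed (however that first-party history is collected). The step paths and the function name are hypothetical; a visitor only counts toward a step if they reached every earlier step first.

```javascript
// Count how many visitors reached each funnel step in order.
// A visitor counts toward step i only after reaching steps 0..i-1 earlier in their history.
function funnel(steps, histories) {
  const counts = new Array(steps.length).fill(0);
  for (const pages of histories) {
    let next = 0; // index of the next step this visitor has yet to reach
    for (const page of pages) {
      if (next < steps.length && page === steps[next]) {
        counts[next] += 1;
        next += 1;
      }
    }
  }
  return steps.map((step, i) => ({
    step,
    visitors: counts[i],
    // Share of the previous step's visitors who also reached this one.
    fromPrevious: i === 0 ? 1 : counts[i - 1] === 0 ? 0 : counts[i] / counts[i - 1],
  }));
}

const result = funnel(["/news", "/membership", "/donate"], [
  ["/news", "/about", "/membership", "/donate"],
  ["/news", "/membership"],
  ["/news"],
  ["/donate"], // never reached the first step, so it is not counted
]);
for (const { step, visitors, fromPrevious } of result) {
  console.log(`${step}: ${visitors} visitors (${(fromPrevious * 100).toFixed(0)}% of previous step)`);
}
```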
We should be able to do that with our GA set up, but one of the reasons I want us to shift to Plausible is for its simplicity.
It's funny that they spout nonsense about better UX or how you wouldn't be able to do CRO when you'd just laid out two metrics that are actually important and don't require any website analytics to track.
Their most recent blog post:
Things I hate about GA4
It's also cookieless, the hosted version is free to use within reason, and it's extremely lightweight if you choose to self-host it. It doesn't even need a separate database, it can run self-contained with SQLite (or Postgres if you prefer). A good fit for small sites where the big industrial-grade solutions are overkill.
So much so that I made my own that focuses on self-hostability using SQLite and DuckDB (no external dependencies, can run on a 256MB VM): https://github.com/medama-io/medama
Stuck it behind an NGINX frontend and it works just fine.
https://github.com/Glimesh/glimesh_app/blob/main/lib/track.d...
I’m vain and curious enough to want to see the Google data, but not so much as to pay $160/yr for the Matomo plugin for my personal blog.
[0] This isn’t the same as Google Analytics. You can get this information without installing a tracker on your site.
https://search.google.com/search-console/about
It's not perfect, but it is free.
I think Plausible’s self-hosting is not simple: it requires unnecessarily heavy databases like ClickHouse, which can be overkill for the average website owner. By comparison, this project can run on a 256MB VM for most small websites, with no external dependencies.
Sites can track sessions without tracking personal data.
> The IP address and User-Agent are never stored to the database or disk, and there is no conceivable way to trace the random UUID back to this.
>
> It’s only stored in memory, which is needed anyway for basic networking to work.
I can't say whether that is GDPR compliant, but it's definitely not storing the hash.
Could you detail how that would work?
https://www.gtlaw-dataprivacydish.com/2021/03/what-is-hashin...
I think a simple hash(IP) is only pseudonymization and can be reversed with a bit of work, so it cannot be stored without consent.
Of course, mapping each IP to a random ID and not storing the mapping should be completely OK.
And legitimate reasons allow storing the mapping for a short period for debugging and attack detection.
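Here is a sketch of "mapping each IP to a random ID and not storing the mapping", assuming a single Node process handles all requests. The class name and the rotation schedule are hypothetical. The ID comes from a random generator rather than from hashing the IP, so once the in-memory map is cleared, nothing that was persisted can be recomputed from an address.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical in-memory pseudonymizer: each IP gets a random ID, never a hash of it.
// Only the IDs would ever be written out; the IP -> ID map lives in this process alone.
class EphemeralVisitorIds {
  constructor() {
    this.ids = new Map();
  }

  idFor(ip) {
    let id = this.ids.get(ip);
    if (id === undefined) {
      id = randomUUID();
      this.ids.set(ip, id);
    }
    return id;
  }

  // Call on a schedule (e.g. daily, or sooner) to forget every IP -> ID association.
  rotate() {
    this.ids.clear();
  }
}

const visitors = new EphemeralVisitorIds();
console.log(visitors.idFor("198.51.100.7") === visitors.idFor("198.51.100.7")); // true
```

Whether a random ID that stays stable across requests is itself personal data is exactly what the rest of this thread argues about; the sketch only shows the mechanism.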
And regarding anonymisation: is it enough to remove the last two octets of an IPv4 address, or must it be more?
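For reference, this is what truncating an address looks like mechanically; it doesn't settle how many octets are enough. A sketch with a hypothetical helper name, where `keepOctets = 2` corresponds to removing the last two parts.

```javascript
// Zero out the trailing octets of a dotted-quad IPv4 address.
// keepOctets = 3 drops only the last octet; keepOctets = 2 drops the last two.
function truncateIPv4(ip, keepOctets = 2) {
  const parts = ip.split(".");
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  return parts.map((p, i) => (i < keepOctets ? String(Number(p)) : "0")).join(".");
}

console.log(truncateIPv4("203.0.113.57"));    // 203.0.0.0
console.log(truncateIPv4("203.0.113.57", 3)); // 203.0.113.0
```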
So if you store and analyze everything "locally" on your server, you don't need cookies and therefore no banner, no matter how much you "track", since it's all requests made to your own server whose telemetry you merely use.
You can't share that data without consent, but that's a separate data protection matter from the cookie banners.
1) If it's strictly necessary, e.g. logging in or legal obligation, you're fine and don't need to ask
2) If the data can be associated with a specific human, and it isn't covered by #1, then ask
3) ??? legitimate interest ???
* but I know from experience that this means "don't trust my own feelings of clarity, ask a lawyer"
If you have an online mail service, you have to save the email addresses of messages you receive so you can respond to them.
If you allow people to forward the emails they receive to other addresses, you need to save those other addresses.
That data would live in the databases for those features. But if you also have third-party marketing analytics, having a legitimate interest in saving an email address to make the application work doesn't mean you can pass that address into the third-party marketing analytics. That is not legitimate interest.
If you have a newsletter service and someone signs up to receive the newsletter, then you need to save their email to send it. You don't need to ask; they have implicitly given you permission by asking you to send them the newsletter.
If you have a process for removing users from the service for violating the terms, then you probably need to keep information about them; otherwise they can just ask you to delete their info and then sign up again. This would fall under the parts of the Digital Services Act about obligations to users and the appeals process for removal, but that's a different thing: if you have removed someone, you need to be able to identify them when they try to come back.
> Allow users to access your service even if they refuse to allow the use of certain cookies
Does that mean sites like https://www.spiegel.de are not GDPR compliant?
That can apply to most businesses.
These are still JavaScript solutions, so if their JS code is broken, you just don't get the data. You end up with unknown unknowns.
The only truly reliable data you can get is from your server logs, and obviously you are limited by whatever the browser gives you in the request.
But again, IANAL, so don't take my word on that.
What’s that? We need users consent for ad cookies? Ok let’s also make them consent to the session cookie too as a way to confuse them or get them to lazily just click the accept all cookies button rather than find the exact cookies the site needs to run without ads.
Not sure how much of an IPv4 address must be anonymized. If you want to be sure, just anonymize the whole thing. It's important to make it random, and not to use a hashing function that always gives the same output for the same input IP (in that case, it counts as pseudonymized and can be PII).
Also, IANAL, just a dude who is passionate about online privacy.
When you next fetch that resource, because it is stale, the browser will revalidate it by passing an If-None-Match header containing the ETag. Update the ETag to include the original timestamp and the current timestamp.
So on every page load (or whichever other event you want to measure), you will be told when that session started, the session id and when that visitor was last seen.
To set the maximum session duration, reset the ETag if the last seen timestamp passed to you in If-None-Match is too long ago.
This can even work without JavaScript by using an img element.
The only data tracked with this is the session start time, last seen time, and a random session id. Since the session id isn’t related to any of your business logic, it cannot be used to identify an individual.
To further isolate this data, locate the tracking resource on a different hostname. The browser then won't send your main site's cookies with the request (as long as they aren't scoped to a shared parent domain), so your analytics backend can't record identifying information even if it wanted to. This will also prevent you from tracking which page is being visited, since browsers by default send only the origin as the referrer on cross-origin requests, though you can override that with the no-referrer-when-downgrade referrer policy.
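The ETag steps above can be sketched as a small server-side function. This assumes the resource is served with `Cache-Control: no-cache` so the browser revalidates on every fetch. The ETag layout `"<sessionId>.<startedAt>.<lastSeen>"`, the 30-minute timeout, and the function names are all hypothetical; only the `ETag`, `If-None-Match` and `Cache-Control` headers are standard HTTP.

```javascript
const { randomUUID } = require("node:crypto");

// Assumed inactivity limit; after this, the next request starts a new session.
const SESSION_TIMEOUT_MS = 30 * 60 * 1000;

// Hypothetical ETag layout: "<sessionId>.<startedAtMs>.<lastSeenMs>" (quoted, as ETags are).
function parseTag(ifNoneMatch) {
  if (!ifNoneMatch) return null;
  const m = /^"([0-9a-f-]{36})\.(\d+)\.(\d+)"$/.exec(ifNoneMatch.trim());
  return m ? { sessionId: m[1], startedAt: Number(m[2]), lastSeen: Number(m[3]) } : null;
}

// Given the If-None-Match request header, return what to record for this hit and
// the new ETag to send back. Answer 200 with the new ETag (not 304) so the browser
// replaces its stored copy and sends the updated tag next time.
function handleBeacon(ifNoneMatch, now = Date.now()) {
  const prev = parseTag(ifNoneMatch);
  const session =
    prev && now - prev.lastSeen <= SESSION_TIMEOUT_MS
      ? { sessionId: prev.sessionId, startedAt: prev.startedAt, lastSeen: prev.lastSeen }
      : { sessionId: randomUUID(), startedAt: now, lastSeen: null }; // new or expired session
  return { session, etag: `"${session.sessionId}.${session.startedAt}.${now}"` };
}
```

In a Node `http` server you would read `req.headers["if-none-match"]`, call `handleBeacon`, then `res.setHeader("ETag", etag)` and `res.setHeader("Cache-Control", "no-cache")` before sending a tiny image, which is what lets a plain img element drive it.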
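// Report once per browser tab: sessionStorage is per tab and is cleared when the tab closes.
if (!sessionStorage.getItem("sessionReported")) {
  reportSession(); // your own beacon call (not defined here)
  sessionStorage.setItem("sessionReported", "1");
}

Plus the data that you're required to retain by other laws. E.g. banks/financial institutions might be required to retain a lot of data for several years for audit and compliance purposes.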
Normal people will not see the tracking. It's when laws force cookie banners that it starts to register in people's minds, because the cookie banner is annoying.
If it was a different random id for every request, then sure, OK.
If it's the same random ID used on multiple requests, then it becomes PII, as its purpose is to uniquely identify an individual. It should not be logged or stored.
But if what you are saying is true, then it's impossible to know how many people visited your website unless you have a banner. What about logs, then? Everybody seems to be happily using those because they count as "legitimate interest", since servers couldn't work without them, yet they contain way more identifying data than what Plausible saves.
That doesn't make it any less PII. Also, the 20 minutes thing is just a number you plucked out of thin air - it's actually valid for 24 hours.
> But if what you are saying is true then it's impossible to know how many people visited your website unless you have banner.
No, that's not what I'm saying at all. First of all, that claim is clearly false. If your web server logged only the URL and nothing else, no time, nothing, you would have accurate usage counts for every single part of your site.
For the record, I actually think Plausible attempts to do a good job: it's clear they are trying their best to be privacy focused, not log anything, and only provide data in aggregate. That's all good stuff. However, I'm not sure their stance that they don't require consent is valid, because the hash itself is PII. The reason I think the hash is PII is how it is being used: to identify an individual user.
Oh, and servers can work perfectly fine without logs. People like logs, but they're by no means necessary.
Logs by themselves aren't necessarily a problem if you have a clear data policy in place and there is a legitimate use for them. The point is disclosure of the data use, and timely deletion of any data that isn't strictly necessary for the business use. So you can keep PII relating to billing for as long as they have a subscription, or as long as you are legally required to keep customer records. After that, it needs to be deleted. Anything like access logs that you can justify a business need for can be kept, perhaps for a few days or ideally hours until you extract aggregate data, but again you need to state that in your privacy policy, and they should be deleted as soon as reasonably possible.
And as I said before, all you need to do to comply with the law is to make sure you have the user's consent before tracking them. It isn't really that onerous. The question is, if you don't want the user to know how you're tracking them, why not? What are you hiding?
For example, it would make stopping a DDoS attack much harder if you would need to anonymize IPs.
Here's some interesting discussion on this very topic: https://law.stackexchange.com/questions/28603/how-to-satisfy...
No, they don't. They don't tell you if visits are up, if more people heard of you or anything. They just tell you that x number of people signed up. We can guess that marketing is going better but maybe it's the time of year where more people need the service. If signups go down, maybe you just had downtime or something on your page was broken.
If you look at any number in isolation you're never going to get the full picture.
And your MRR can go up without any marketing. You can just do sales.
In my experience, it's extremely cheap and easy to get a load of fake page impressions from bots, or to buy your US-only company loads of pageviews from low-cost-of-living countries, or to expand the top of your sales funnel with weak prospects who'll never convert to sales.
Seems to me only a fool would pat themselves on the back for doing so.
Imma make my analytics look really good when they're crap because??? People buy fake followers because others can see it. No one else is looking at your analytics. And you sure as hell don't want to increase your page views since your conversion rate would tank and that's the most important metric.
Again, if you don't care about visits, you don't care if they're up. OP said it best: signups and MRR.
People hearing about you: do you seriously believe that website analytics are suitable tools that provide reliable metrics for brand/product awareness, recognition, product-market fit, etc.?
>maybe it's the time of year where more people need the service. If signups go down, maybe you just had downtime or something on your page was broken.
Exactly, seasonality and website uptime / page functionality are important. They should be measured. At the same time, website analytics have nothing valuable to add to these measurements.
>And your MRR can go up without any marketing. You can just do sales.
I think you are circling around it: all those analytics metrics are just a means to justify the existence of useless 'marketers' who have no idea how to actually measure brand visibility, recognition, or any qualitative metric. These 'specialists' can't even fathom (heh) that business seasonality is something that shows up in a north-star metric and have no imagination or technical ability to set up a website monitoring service or a crawler, use a CRM for attribution, etc.
Oh, they did. My bad. OP is a god, they can't be wrong. Oh wait, I'm saying OP is narrow-minded and missing out.
> People hearing about you: do you seriously believe that website analytics are suitable tools that provide reliable metrics for brand/product awareness, recognition, product-market fit, etc.?
Why are you bringing up PMF when it comes to analytics? BUT! Yes, it can. If your users are using your shit all the time and you've got analytics all over your app, you've probably got a better
But remember when I said earlier that looking at a single stat in isolation is bad? Ssh.
> I think you are circling around it: all those analytics metrics are just a means to justify the existence of useless 'marketers' who have no idea how to actually measure brand visibility, recognition, or any qualitative metric. These 'specialists' can't even fathom (heh) that business seasonality is something that shows up in a north-star metric and have no imagination or technical ability to set up a website monitoring service or a crawler, use a CRM for attribution, etc.
"Useless marketers"...
Anyway, you're calling other people useless while saying all data except your north-star metric is useless.
Imo, this is arrogance and ignorance mixed together.
It's a simple trick: declaring all the collected data to be technical data, when in fact it is linkable to a data subject.
Thus, collecting the data requires consent, because a subject is identified, at least for the session.
If you can identify unique visitors you are clearly identifying individuals.
hash(daily_salt + website_domain + ip_address + user_agent)
That's what they do. Within 24 hours the daily salt is gone, and the data is anonymous. https://plausible.io/data-policy#how-we-count-unique-users-w...
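The quoted formula can be sketched as follows, assuming SHA-256 as the hash, a 32-byte random salt rotated at UTC midnight, and newline separators between the fields. Those choices and the function names are illustrative; they are not taken from Plausible's code.

```javascript
const { createHash, randomBytes } = require("node:crypto");

// One random salt per UTC day, held only in memory and never written to disk.
let saltDay = null;
let dailySalt = null;

function saltFor(now) {
  const day = new Date(now).toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  if (day !== saltDay) {
    saltDay = day;
    dailySalt = randomBytes(32); // the previous day's salt is overwritten and lost
  }
  return dailySalt;
}

// hash(daily_salt + website_domain + ip_address + user_agent), hex-encoded.
// The newline separators are an addition so that e.g. ("a.com", "1.2") and
// ("a.com1", ".2") can't produce the same input string.
function visitorId(domain, ip, userAgent, now = Date.now()) {
  return createHash("sha256")
    .update(saltFor(now))
    .update(`${domain}\n${ip}\n${userAgent}`)
    .digest("hex");
}
```

Within one day the same visitor maps to the same ID, which is what makes unique counts possible; once the salt is replaced, yesterday's IDs can no longer be recomputed from an IP and user agent.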
You still need consent to collect it (or some other kind of legal shenanigans). The intent is to track a person; it is not technically necessary. You might have a legitimate interest, but in the end you still have to consider the GDPR to use this tool.
https://europa.eu/youreurope/business/dealing-with-customers...
I have doubts that just identifying unique visitors would also identify individuals. Shouldn't their current approach of creating a random ID that is unique for 24 hours be fine under GDPR, or would it still violate it?
Anonymisation of data is itself data processing, and some argue that it is subject to a privacy impact assessment, on the grounds that if it is done poorly, the consequences are severe for individuals who can then be de-anonymized.
The duration itself does not change the outcome.
That said, the approach Plausible takes is much better than using any cookie.
One of the key rights individuals have is to request that ALL PII about them is deleted from all of your records, and you have to comply with this request within a certain timeframe, and a maximum of 30 days. This includes backups, logs, everything.
Obviously, it's impractical to try to edit old backups to remove PII, so you have to be careful how you deal with logs in the first place - you might want them to be backed up on another machine with a maximum lifetime of a few days, you might want to not back them up at all and only backup your aggregated data, etc.
But keeping logs for a few days can be justified for, as you say, DDoS mitigation, post-failure root-cause analysis, etc. The default for that data should be to delete it as soon as it's no longer useful for that purpose, which for most companies will be a couple of days, maybe another couple for the weekend. You can keep it longer, for instance for active analysis, but the default should be to delete it as soon as possible.
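One way to make deletion the default, sketched under assumptions: log entries are objects with a millisecond `timestamp` and a `path`, and anything older than the retention window is rolled up into per-day, per-path counts and then dropped. The field names and the function name are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Split raw entries into those still inside the retention window and aggregate
// counts (per UTC day and path) for the expired ones, which are then discarded.
function rollUpAndPurge(entries, retentionDays, now = Date.now()) {
  const cutoff = now - retentionDays * DAY_MS;
  const kept = [];
  const dailyCounts = new Map(); // key: "YYYY-MM-DD /path" -> hits
  for (const e of entries) {
    if (e.timestamp >= cutoff) {
      kept.push(e); // still needed for e.g. attack analysis
    } else {
      const day = new Date(e.timestamp).toISOString().slice(0, 10);
      const key = `${day} ${e.path}`;
      dailyCounts.set(key, (dailyCounts.get(key) ?? 0) + 1);
    }
  }
  return { kept, dailyCounts };
}
```

Running this from a daily job and overwriting the raw log with `kept` means the IPs in old entries never survive past the window.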
There is no "disproportionate punishment" under GDPR in practice, unless you're doing something egregious, and even then (see Facebook). I'm very familiar with the UK regulator, they publish their enforcement actions [1]. I'm not aware of a single case of a cautionary letter, much less "disproportionate punishment", that they sent over a cookie banner on its own. Are you?
Besides, you correctly hinted at the incentive structure. Your lawyer might advise you to slap on a cookie banner just because they have zero incentive not to; they don't care about your users' experience. You might care, though. Personally, I consulted multiple external DPOs and lawyers, as well as primary sources, before forming my opinion.
Their position was simple: my team uses 3rd party analytics tools (no ads or anything) so IPs will be passed and cookies will be stored. We don’t control them, we don’t know what kind, if they can be considered personal info or not (GDPR is intentionally vague - classic bad law). So we need to be extra careful since our regulator is not a sane one like the UK’s. Thus: follow the common practice - cookie banner. End of story.
If I were you, I'd consider changing my lawyers. This is explicitly forbidden by GDPR (art 28), you have to know what your contracted data processors are doing, and you have to have processes in place to assure data subjects rights (eg remove their data from your contracted third parties on request). Cookie banners have nothing to do with this, and you're in breach of GDPR cookie banner or not. If your lawyers didn't stop you from breaching art 28 but recommended slapping a cookie banner "to be extra careful", that's a major red flag.
This is a super weird spin on what I said. I work on content-heavy media sites that are not ad-driven. They're funded either by grants, like research or journalism, or they present commercial work: architects, design studios, publishers, writers… All of these clients want ballpark numbers of how many people visited the site. Nobody processes or sells this data. It's tens to hundreds of visitors a day. We try to use the most private way we know of.
It's crazy that because of the sick practices of this industry, I'm suddenly the suspicious one. Some kind of "nothing to hide" fallacy, huh? No, we are not hiding anything. We just don't want annoying consent because of a visitor counter. The ones hiding something are the ones with tricky, psycho-designed multi-step consent banners. We just don't want to be in the same bunch because of a few basic stats.
You don't need cookies for that.
Again, as I've said before, you can for instance log data for technical reasons, e.g. wanting to post-mortem a failure or attack, as long as the data is deleted promptly as a matter of course. You shouldn't use the PII in that log for analysis without the user's consent (so for a log file, that means you probably should never use the IP address except for endpoints that are only accessible to logged in users), but the URL they accessed isn't PII (unless you start putting identifying tokens in it).
If you just want ballpark numbers, just extract the URL field only, and count how many times each appears. Obviously, this will give you metrics on how popular each page / asset is, not how many unique users you have. To do that, you have to identify unique users, and to do that you need to have their consent.
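A sketch of that URL-only count, assuming access logs in the Common or Combined Log Format. Only the request path is read; the IP address and user agent in each line are ignored. Query strings are dropped too, since that is where identifying tokens tend to end up. The function name and sample lines are made up.

```javascript
// Count hits per URL path from access-log lines in Common/Combined Log Format,
// reading only the request line and ignoring IP, user agent, and everything else.
function hitsPerPath(lines) {
  const counts = new Map();
  for (const line of lines) {
    const m = /"(?:GET|HEAD|POST|PUT|DELETE|PATCH|OPTIONS) ([^ "]+) HTTP\/[\d.]+"/.exec(line);
    if (!m) continue; // skip malformed lines
    const path = m[1].split("?")[0]; // drop query strings, which may carry identifiers
    counts.set(path, (counts.get(path) ?? 0) + 1);
  }
  return counts;
}

const sample = [
  '203.0.113.5 - - [10/Jan/2024:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
  '198.51.100.9 - - [10/Jan/2024:10:01:00 +0000] "GET /pricing?utm_source=news HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
  '198.51.100.9 - - [10/Jan/2024:10:02:00 +0000] "GET /blog/post-1 HTTP/2.0" 200 2048 "-" "Mozilla/5.0"',
  "garbage line",
];
const counts = hitsPerPath(sample);
console.log(Object.fromEntries(counts)); // { '/pricing': 2, '/blog/post-1': 1 }
```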
> We just don't want annoying consent because of a visitor counter.
But the law requires you to get their consent.
> The ones hiding something are the ones with tricky, psycho-designed multi-step consent banners.
To be fair, I agree with you. They are deliberately designed to be awful in the hope that the user will just take the path of least resistance and accept their terms. However, it is still a choice. When I see such a consent form, I either just close the window or, if it's something I really want to read, re-open it in incognito mode so I won't get a persistent cookie.
The point is that the regulatory line needs to be drawn somewhere. At the moment the law draws it like this: if PII is required for your site to function, then you must ensure the user knows you're doing it. If PII isn't strictly required for your site to function, but it provides a benefit to your company (usually reframed as how it ultimately helps the customer), then you must request consent. Both of these cases are covered by the usual kind of popup, which is why you'll see some options you can disable (like sharing data with partners) and some you can't (like cookies for logging in). But you still need consent.
> We just don't want to be in the same bunch because of a few basic stats.
Then just collect basic stats like how many hits each page got. That's fine, you don't need cookies or PII for that. Number of active users isn't a basic stat though, as it clearly requires you to distinguish between different users and any process you use to do that creates PII.
Perhaps you should consider just explaining why you want the cookie in your popup. If you word it in such a way that explains that you're only using daily active users as a metric to justify continued funding, you'll probably find most people are totally happy to click accept. A message plus simple ACCEPT / DECLINE is fine, as long as the message makes clear what you're doing. Note that you can set an "essential cookie" in response to them clicking DECLINE as long as you've explained that the website uses essential and non-essential cookies, but obviously it shouldn't contain anything other than a simple accept/decline result.
I will not jump the gun just yet. We will stay in this gray zone until I see the authorities take issue with the approach of Matomo/Plausible. So far I have seen the opposite. If they did, we would remove the analytics entirely, because there is nothing worse than a cookie banner, which instantly annoys users and puts you on the same level as any other mainstream site that does fingerprinted tracking.
No one will get fined for not asking consent for this. Our DPO just said ‘don’t be silly’ when I asked him. But we will see if it gets tested (my bet: it won’t).
Sadly, reckons don't hold up in court.
> you cannot retrieve the ip from the hash
You don't need to retrieve the ip to make it PII, the hash itself is PII.
You might not think of it as containing actual "personal information", but its sole purpose is to attempt to uniquely identify a person. That makes it PII.
> (and residential IPs are usually dynamic)
This actually makes the short lifetime more suitable as PII, because it reduces the likelihood of the same IP being used by a different person and tracked as the same person.
> The short lifetime together with never storing the hash makes it so you cannot de-anonymise the user.
That also doesn't matter, because the lifetime of the token is long enough to track the user through an entire typical session, maybe several.
The stupid thing in all these shenanigans is that collecting the data isn't itself the problem, it's not getting the user's consent. Just tell the user what you're doing, and it's not a problem - if it's a "technically required" cookie they can make an informed choice to use your site or not, if it's an "optionally required" cookie, they can choose whether to accept or not. Most users won't care and will click on the biggest, most obvious buttons. The ones that do care are likely atypical and would skew your metrics anyway.
You can as long as you have IPv4 visitors, because the search space is small enough to brute-force. There are only four billion IP addresses. The user-agent complicates things a little but there aren’t many of those, so you could retrieve the IP addresses of most visitors from the hash if you wanted to.
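To illustrate the brute-force argument, here is a sketch assuming a naive tracker stored an unsalted SHA-256 of IP and user agent. It only enumerates one /24 block (256 addresses) and two user agents so it runs instantly; the full IPv4 space is about 4.3 billion addresses, a large but finite job. A secret salt that is later thrown away, as in the daily-salt formula quoted earlier in the thread, is what prevents this kind of recomputation. The function names are hypothetical.

```javascript
const { createHash } = require("node:crypto");

// Unsalted hash of IP + user agent, as a naive tracker might store it.
const naiveHash = (ip, ua) => createHash("sha256").update(`${ip}|${ua}`).digest("hex");

// Enumerate candidate addresses in one /24 and a list of common user agents
// until the stored hash matches.
function recoverIp(targetHash, prefix, userAgents) {
  for (let last = 0; last <= 255; last++) {
    const ip = `${prefix}.${last}`;
    for (const ua of userAgents) {
      if (naiveHash(ip, ua) === targetHash) return { ip, ua };
    }
  }
  return null;
}

const uas = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
];
const stored = naiveHash("203.0.113.42", uas[1]);
console.log(recoverIp(stored, "203.0.113", uas)?.ip); // 203.0.113.42
```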
> residential IPs are usually dynamic
Usually isn’t good enough. I’ve had residential IPs that are on public record belonging to me personally. IP addresses can be personally identifying information, so they need to be treated that way.
I get what you're saying: if you know the IP address, you can often easily discover who the individual is. I'd counter that, for most people, this isn't actually the case. For many companies, only the ISP, Google, Apple, Facebook, etc. know who the real user of an IP is (incidentally, the people most keen to force analytics on you, but that's another issue).
However, that is all kind of moot. The hash itself is PII, because it can be used to track an individual. PII isn't about the difficulty of determining the specific identity of a user, it's about the difficulty in identifying a specific user. The distinction is subtle, but important.
Take an example: people are using a wireless hotspot somewhere. Maybe you own a coffee shop, and over the course of a few weeks you're alerted to the fact that someone has been accessing illegal content that could get your business in trouble. You've been careful to comply with the GDPR, and your logs only include the time and the hostname of the server accessed. On its own, there is no PII there. But combine that with, say, credit card transactions or video footage showing who was in the coffee shop every time this happened. Then boom! Suddenly, your time data has become PII. Maybe not uniquely correlated to a single person, but to a group of people. With every instance of a correlation between that person and a group of random people, it doesn't take many to narrow it down to a specific individual.
This is why, to actually comply with GDPR, you need to only store logs for as short a time as is technically required (legally beyond a month is hard to justify, ideally a few days at most) and then you should aggregate into groups where individuals cannot be isolated. If your aggregations result in groups of people that are too small, you need to change the aggregation groups, or report an empty group. It's totally fine to store data like "on this day, n people went from this page to this page, average linger time blah seconds" if n is 10 or more. If n is 1 or close to it, that data is still identifying.
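A minimal sketch of that aggregation rule, assuming each event is one person's page-to-page transition on a given day, so a group's event count equals its head count. The threshold of 10 follows the comment above; the event shape and function name are hypothetical.

```javascript
// Aggregate page-to-page transitions per day and suppress any group with fewer
// than minGroupSize people, reporting it as empty instead of a small, identifying count.
function aggregateTransitions(events, minGroupSize = 10) {
  const groups = new Map();
  for (const { day, from, to, lingerSeconds } of events) {
    const key = `${day} ${from} -> ${to}`;
    const g = groups.get(key) ?? { n: 0, totalLinger: 0 };
    g.n += 1;
    g.totalLinger += lingerSeconds;
    groups.set(key, g);
  }
  const rows = [];
  for (const [key, { n, totalLinger }] of groups) {
    rows.push(
      n >= minGroupSize
        ? { key, n, avgLingerSeconds: totalLinger / n }
        : { key, n: null, avgLingerSeconds: null } // suppressed: too few people
    );
  }
  return rows;
}
```

If many groups end up suppressed, the fix is coarser groups (a week instead of a day, a section instead of a page) rather than a lower threshold.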
Most websites don't get fined using GA. Plausible is a huge step in the right direction, but their claims are very strong and not backed up by the GDPR if you take a closer look.
Regarding fines: most offices will give you a warning instead of a fine, you adjust your cookie banner and you are good to go
A bad law, an ambiguous law compels you to be defensive and take precautions. Cookie banners are one of many such defenses and everybody seems to be doing it, validating our strategy.
Thanks for your advice, but unless you are willing to defend me in court and put your money where your mouth is, with all due respect, I will consider its value to be exactly how much I paid for it.
On the other hand they probably realise there's zero chance for substantial review of your GDPR practices by the regulator (much less seeing them in court), so they can recommend sticking a useless plaster (opt-in has to be specific, and how can it be specific if you collect it for unknown future changes) and keep you in the dark about more substantial requirements.
GDPR is a very good and clearly stated law, you can read through it yourself in about half an hour to an hour, a negligible time investment for such an important piece of legislation. The purported ambiguity is a psyop by people who don't want to comply.
For example, consider IP addresses as PII (which, of course, the GDPR does not clearly specify). Then analytics processing them needs consent. Thus the cookie popup.
Anything else is interpretation unproven in court.
An IP address is required for the site to function; your server can't avoid collecting it. Plausible also only processes it for uniqueness and doesn't save it as is. Interestingly, most web servers and firewalls have to keep track of IP addresses, so they end up saved in access logs and caches, which makes them more problematic than Plausible. Yet it's most likely fine, because the intent is not to track individual users but to improve the service and keep it running. Plausible's intent is also not to track individual users but to collect visitor counts, which are likewise used to improve the service.
I think you might be prematurely spreading fear.
Who has gone on record with this, and in which jurisdictions?
I think there is a marketing tactic that ad/analytics companies and marketers use against services like Plausible. They say these services also require a cookie popup and won't give you as much detailed info, so why would you use them? Most websites would be fine with the limited data Plausible provides, but it breaks the ad/analytics industry's business plan.
That's exactly the point. Processing of personal data to identify a unique person.
Regarding firewalls and logs: it's argued that this is a legitimate interest, as stated in Recital 49 of the GDPR. So they got a free pass, for better or worse.
> I think you might be permanently spreading fear
Don't get me wrong, I like the approach. But it's not a get out of GDPR free card.
Not sure that's what I said. They cannot identify a unique person. They identify unique legitimate visits per day.
If logs and firewalls count as legitimate interest because you have to give the server your IP address for anything to work, then the same can be said about Plausible, especially since the IP address is immediately thrown away, unlike with firewalls, where the main point is to keep a record of bad actors.
It is very different from Google Analytics, where the whole point is to pinpoint repeat visitors, their behaviour, etc. You simply can't do that with a service like Plausible. What you can do is know how many legitimate visits you had and what was visited. For most websites that is enough, and at the same time I would be surprised if knowing how many people visited your site were not a legitimate requirement for the service to function.
> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
IP addresses only allow identifying a natural person when combined with other data, such as ISP data or a profile built across dozens of websites. This is not the same kind of personal data as a name + address, Breyer notwithstanding (note the bit about the ISP in the judgment).
GDPR is not about identifying an abstract entity, it's about identifying a natural person. Doing the former for long enough/with enough data allows the latter, but especially with time-limited in-memory hashes that's a non-existent window of opportunity.
In practice this would probably need to be resolved in court, and I'm sure not a single SME using Plausible or similar will even get a stern letter, much less a fine.
Agreed.
Plausible just makes false claims like:
> All the site measurement is carried out absolutely anonymously. Cookies are not used and no personal data is collected. There are no persistent identifiers.
That's a heavy statement and it is simply not true, as you quoted:
> an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person
hash(daily_salt + website_domain + ip_address + user_agent) will fall under this definition.
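For reference, a minimal Python sketch of that formula with a salt that rotates daily and is kept only in memory. SHA-256, the "|" separator, the 32-byte salt and the DailySalt/visitor_id names are assumptions for illustration, not Plausible's actual code.

```python
import hashlib
import secrets


class DailySalt:
    """Holds one random salt per day in memory; a new day replaces it."""

    def __init__(self):
        self._day = None
        self._salt = b""

    def for_day(self, day):
        if day != self._day:
            # New day: generate a fresh salt. The previous salt is discarded,
            # so hashes from earlier days can no longer be recomputed by us.
            self._day = day
            self._salt = secrets.token_bytes(32)
        return self._salt


def visitor_id(salt, website_domain, ip_address, user_agent):
    """hash(daily_salt + website_domain + ip_address + user_agent), as SHA-256.

    A separator is used so adjacent fields cannot run into each other.
    """
    data = b"|".join(
        [salt, website_domain.encode(), ip_address.encode(), user_agent.encode()]
    )
    return hashlib.sha256(data).hexdigest()


salts = DailySalt()
print(visitor_id(salts.for_day("2024-01-01"), "example.com", "203.0.113.7", "Mozilla/5.0"))
```

The same visitor gets the same ID all day and a different one the next day, which is what lets this count unique daily visitors without a persistent identifier.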
But again, you are right: it's better than anything any other service does.
The lack of persistence is one of the main design points.
If you're saying it's collection, that gets complicated because that data has to be there for the server to work at all.
That’s not what I said. I said if you have the hash, you can derive the IP address from it in most cases.
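One way to read the "derive the IP from the hash" point, as a hypothetical Python sketch: if whoever holds a hash also has that day's salt, the domain and the user agent, the only unknown is the IP address, and that space is small enough to enumerate (IPv4 has about 4.3 billion addresses). The sketch reuses an assumed SHA-256 layout with a "|" separator, and only searches one /24 so it runs quickly; it does not work once the salt has been discarded.

```python
import hashlib
import ipaddress


def visitor_id(salt, website_domain, ip_address, user_agent):
    """Assumed hash layout: SHA-256 over salt|domain|ip|user_agent."""
    data = b"|".join(
        [salt, website_domain.encode(), ip_address.encode(), user_agent.encode()]
    )
    return hashlib.sha256(data).hexdigest()


def recover_ip(target_hash, salt, website_domain, user_agent, network):
    """Try every host address in `network` until one reproduces target_hash.

    Only feasible because the salt, domain and user agent are known;
    returns None if no address in the range matches.
    """
    for ip in ipaddress.ip_network(network).hosts():
        if visitor_id(salt, website_domain, str(ip), user_agent) == target_hash:
            return str(ip)
    return None


salt = b"known-salt"  # hypothetical: a salt that was retained instead of discarded
h = visitor_id(salt, "example.com", "198.51.100.42", "Mozilla/5.0")
print(recover_ip(h, salt, "example.com", "Mozilla/5.0", "198.51.100.0/24"))
```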
My point is that whether you can determine the IP address from the hash or not doesn't matter. The hash itself is PII.
Firewalls are a curious case. It is argued that the data is not collected but transmitted to the controller. Almost as if you get a letter with personal data and now have to deal with it.
Yes, it's a stretch. Not happy with it but I don't see any practical solution either...
> (4) At the latest at the time of the first communication with the data subject, the right referred to in paragraphs 1 and 2 shall be explicitly brought to the attention of the data subject and shall be presented clearly and separately from any other information.
I am not a lawyer, but as far as I can tell, there is no legal way to collect PII (including IP address) or place tracking identifiers on the user's device without at least informing the user explicitly under the GDPR and the ePrivacy Directive.
But soon there was an agreement that Art 13 lit. 4 could be interpreted to mean that, as long as you don't collect any data beyond server logs, this would be deemed sufficient. In other words, as long as you don't invoke Art 21 lit. 1 of the GDPR.
But since everybody wants to track you on the basis of their legitimate interest, the web became full of cookie banners.