I think that's the heart of why I so despise the GDPR. In an attempt to change site behavior, politicians passed a law putting a burden on sites that did an undesirable thing (rather than, say, making the undesirable thing itself illegal).
Perhaps they thought sites would avoid the burden.
Did they not anticipate full shifting of burden onto end-users? Because being able to know how a site is used is extremely valuable to the site's owners.
I switched back to AWStats for my personal stuff. It's probably too basic for business or company apps, but for your personal stuff without javascript/cookies, it's still a great analytics tool.
I appreciate the problem and I would like to stop using GA in my static pages as well, but trading one privately-owned software from a tech giant for another privately-owned software from a different tech giant seems a bit ludicrous. I would readily swap GA for some decent open-source solution though.
Thus far, AWS has proven to be safe for companies to host their data upon, and there have been no leaks of data stored in AWS into Amazon's marketing program. The HIPAA, PCI, and FedRAMP certifications help back up their claims that a company's AWS data stays in AWS.
So in the end, paying for stuff just shows that you're a more valuable product to sell. And gives them a great primary key to track you by.
The approach of 'today, company X is known to make money via Y, so we can trust them with private data' only works until that data becomes valuable enough for the company to invest in extracting.
The web SDK also supports collection of client-side JavaScript errors, which is neat for tracking down bugs and things which might harm user experience.
Why don't you? Do you really need this tracking at all?
I'd be interested to see how Amazon's setup would handle the setup I am the backup analytics nerd for.
Major beverage brand: hundreds of websites, multiple locales per site (a dozen or more for the big brands), and it has to handle roll-up as well as custom metrics and dimensions.
I've tried to separate myself from Google in various ways, and one of those was to replace Google Analytics with open source software. I tried several; they're all either non-functional out of the box, or require significant time investment to even start approaching Google Analytics.
After losing about a month of stats (which matters when you're also running AdSense), I ended up going back to Google. It took the same amount of time to set up as when I initially set it up: around 2 minutes of adding the tracking code and uploading it.
There are much better options out there. Quite apart from the solutions listed in these comments, a better option is to reconsider whether you really need analytics at all. Maybe the answer is yes if you are a business trying to understand your customers. But not every blog and project page needs analytics.
> ex-Amazon contractor, front-end lover, accessibility nerd, down for building cool shit, especially Vue.js and Amplify.js consulting
My alarm bells ring when the answer to "stop using X" is to "start using Y" where Y == company I worked for.
This isn't to say GA is or isn't problematic, but the article's bias is problematic.
Shows me a fun 'You are not connected to the internet' page that lets you doodle on the page.
I got to it by adding the following filter
@@||dev.to/goatandsheep/stop-donating-your-customers-data-to-google-analytics-191?i=i$xhr,1p
More to the point: there is probably going to be a bias in the analytics. Different people have different reasons for protecting themselves against tracking, but it is highly unlikely that people who are unaware of or disinterested in the issue will use a blocker.
Terrible argument.
Am I missing something?
We use Splunk as our data engine and you can install it on your own server. This way you have full control, access and ownership of your data without letting third parties get any data. In that sense Harvest is basically the infrastructure that allows you to collect, store, use and visualize your data.
Besides that, we have been focusing on features that will help companies comply with privacy regulations. It is proven that this is not always easy in the complex world of online data.
For more information check https://harvest.graindata.com/en.
The original GA does not give Google useful cross-site user data because it uses only first-party cookies and anonymizes data as it collects it. To my knowledge you can still implement GA this way if you want to. Such an implementation would be GDPR compliant in not tracking any personal data, although your counsel might still say you need to list them as “analytics” cookies in a cookie banner (mine did).
However, GA data showed its usefulness when selling the business. The data was considered a trusted source of information by the buyer. And all the definitions (unique user, etc.) were aligned with the buyer's, so it was easier for them to assess the metrics.
See https://hackernoon.com/serverless-and-lambdaless-scalable-cr... And https://aws.amazon.com/blogs/compute/using-amazon-api-gatewa...
Except for each of the 3 components you listed that make up your system. They are 3rd party dependencies.
I think OP was clearly referring to a self-managed solution as opposed to a set of 3rd party services like GA, Segment, etc., where the flow of data is out of your control.
Are you relying only on data you can get from your app? There is no reason not to build your own solution.
Are you relying on data you can't get from your app/website? Then you can only use GA, since FB does not have a service like this.
Very few businesses/people would choose to pay for something when GA is free. Why do that? To tell your customers "we value your privacy"?
It’s one of Apple’s strongest marketing pillars.
Still building it, but you can sign up for when it launches here: https://forms.gle/MhojBWWfdiWjZatC7 (I know it's ironically on google forms and I'll move away soon)
How do you intend to make money from this free service?
At the end of the day, if anything goes wrong, I'll always be happy to open source the whole thing.
[1]: https://matomo.org/docs/installation "Installing Matomo On-Premise"
[2]: https://matomo.org/pricing/ "Matomo Pricing"
Here's the link: https://sdan.io/pingpong. If you want to sign up or give feedback, I would greatly appreciate it: https://forms.gle/MhojBWWfdiWjZatC7 !
https://storage.googleapis.com/pingponganalytics/pingone.js
So... I should install a script that loads from Google servers?
Apart from that, you probably don't want to tie yourself to google like that. Once the users have this in their pages they will _never_ update it. You should use your own domain.
It's almost as if you need to be a software engineer and do actual software engineering, to responsibly use tools like analytics.
> I ended up going back to Google. It took the same amount of time to set up as when I initially set it up: around 2 minutes of adding the tracking code and uploading it.
So how much effort is the privacy of your visitors worth, then?
It sounds like deep down you know the right thing to do, which is a lot of work, but seeing everybody (in your bubble of technical peers) just as easily use Google Analytics, makes you feel like you're owed the difference to these profits.
Maybe there shouldn't be a 2-minute turnkey solution to analytics, because even if it's self-hosted, your next excuse is going to be that it requires a significant time effort to keep it secure and act responsibly with the data.
I think that for a lot of the "alternative" analytics tools, feature parity with Google Analytics isn't necessarily a goal, so this may explain your disappointment. I think the only exception here is Matomo, which is the only "advanced" OSS analytics as far as I know.
FTFY
I'm aware of this. The next logical step is to find a better ad network.
There is basically no strong indication right now of any large segment of users boycotting sites because the users care about privacy. There's the same small amount that have always been present and the number doesn't appear to be growing.
fathom - Looks great. I am OK with closed source products (my motivation is self-hosting/privacy) but the direction is not clear to me. Maybe they will have a blog about it at some point - https://github.com/usefathom/fathom/issues/268. Having multiple code bases is going to be super hard.
goaccess.io - this analyses web logs
google-analytics-proxy - project is dead
matomo - this is what i use now and it works great. has a lot of quirks but if you spend some time, you can make it work.
ackee, goatcounter - simple but looks like this does not track users/sessions. it's mostly for page hits.
countly - looks good if you are enterprise. there is no pricing :(
freshlytics (from another thread) - page says it's in beta and not production ready
- Goatcounter also came up a few days ago on HN and got a lot of traction/discussion[1]
I can't find the comment it was posted in, but a HN user did a good comparison of a few privacy-focused analytics tools[2]
[0] https://simpleanalytics.com/
[1] https://news.ycombinator.com/item?id=22044854
[2] https://dev.to/hmhrex/a-comparison-of-the-top-3-privacy-focu...
Edit: I found the OG comment for the blog post: https://news.ycombinator.com/item?id=21716544
Doing log-analysis has its own drawbacks: not everyone has access to them, bot traffic will be a lot higher, and certain information is hard to access (like screen size). You can't always "just" use it.
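To make the trade-off concrete, here is a minimal sketch of what log-based counting looks like. It assumes the server writes the common "combined" log format; the sample lines, the `count_page_hits` helper, and the bot keyword list are made up for illustration and are far cruder than what a real tool does. Note that nothing in a log line carries the screen size, which is the limitation mentioned above.

```python
import re
from collections import Counter

# Combined Log Format:
# ip identd user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Crude heuristic (an assumption for this sketch): many crawlers
# identify themselves with one of these words in the user agent.
BOT_HINTS = ("bot", "crawler", "spider")


def count_page_hits(lines):
    """Count successful GET requests per path, skipping obvious bots."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # not a combined-format line
        agent = match.group("agent").lower()
        if any(hint in agent for hint in BOT_HINTS):
            continue  # self-identified crawler
        if match.group("method") == "GET" and match.group("status") == "200":
            hits[match.group("path")] += 1
    return hits


# Hypothetical sample data: two browser visits and one crawler visit.
SAMPLE = [
    '203.0.113.5 - - [10/Jan/2020:10:00:00 +0000] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.9 - - [10/Jan/2020:10:00:02 +0000] "GET /post HTTP/1.1" 200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Jan/2020:10:00:05 +0000] "GET /about HTTP/1.1" 200 256 "https://example.com/" "Mozilla/5.0 (Macintosh)"',
]

print(count_page_hits(SAMPLE))
```

Bots that do not identify themselves slip straight through a filter like this, which is why log-based numbers tend to run higher than script-based ones.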
(Netlify does sell access to log data but it looks expensive for most hobby / personal sites)
With our open-source data collection framework like RudderStack (an alternative to Segment), dumping data into a warehouse (Redshift/BigQuery/Druid etc) and sticking another open-source visualization layer on top (e.g. like SuperSet), it is possible to put together an alternative to Google Analytics. One of our early users did it and we wrote a blog about it
No, they don't anonymize the collected data (for any reasonable definition of "anonymous"). The IP address alone gives GA a very close approximation of a unique key, and their own documentation[1] explains the "anonymization" process:
"... the last octet of the user IP address is set to zero ..."
(If the logged event doesn't opt in to this behavior by adding &aip=1, then GA presumably saves the entire IP. How many GA users bother setting that option?)
The 8 least significant bits of an IPv4 address are the least interesting. The remaining 24 bits give GA the ASN and are a lot of entropy for fingerprinting. It would be trivial to recover a unique key from the "anonymized" address by combining it with other analytics data, other cookies, and timestamps.
[1] https://support.google.com/analytics/answer/2763052?hl=en
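For anyone who wants to see how little that masking removes, here is a small sketch of "set the last octet to zero". This is not GA's code, only the operation the documentation quote describes; `anonymize_ipv4` is a name made up for this example and the addresses come from the reserved documentation ranges.

```python
import ipaddress


def anonymize_ipv4(addr: str) -> str:
    """Zero the last octet of an IPv4 address, keeping the upper 24 bits."""
    ip = ipaddress.IPv4Address(addr)
    masked = int(ip) & 0xFFFFFF00  # clear the 8 least significant bits
    return str(ipaddress.IPv4Address(masked))


# Two hosts on the same /24 collapse to one value, but the network
# prefix (and with it the provider and rough location) stays visible.
print(anonymize_ipv4("203.0.113.57"))   # 203.0.113.0
print(anonymize_ipv4("203.0.113.200"))  # 203.0.113.0
```

At most 256 addresses share each masked value, so the result is a coarse bucket rather than an anonymous one.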
You can either disable cookies to run GA in cookieless mode [1], which presumably will affect how GA performs, since they can't determine repeat visits (but this might be fine, depending on the type of site you have), or you need to gain active consent to enable analytics cookies [2], which isn't much good if you want metrics for all users, not just those that opted in.
If someone has solved this reasonably, I'd love to hear how! For now it seems like cookieless is my only option.
[1] https://developers.google.com/analytics/devguides/collection... [2] https://ico.org.uk/for-organisations/guide-to-pecr/cookies-a...
Your counsel should also have advised you that you need active consent in your cookie banner, since GDPR raised the standard for consent, which is the stumbling block I'm facing. [1]
[1] See "In brief": https://ico.org.uk/for-organisations/guide-to-pecr/cookies-a...
My Firefox was a minor update earlier than the one on the sibling comment (72.0.1). It has updated now, and the site claims that my connection is down on my machine too.
Now that I have seen the message... It's a funny thing for a web page to claim.
But in seriousness the 10 lines would be just use local storage or wotnot to store a tag, then call tracker.com?tag=... on each page load. "Rest is done on the server (TM)"
Just like people are putting Alexa in their homes and businesses, it can be potentially used for anti-competitive reasons, or to get inside information. This will get worse over time as we know...
A simple example is hacking a CEO's Alexa to listen to his phone calls at home to get insider trading tips...
The companies that make these products are not bound by an official code of ethics, and Governments barely understand the implications of technology, much less than corruption of technology. Laws to prevent misuse and manipulation of consumer products are weak, but proper investigation and enforcement of those laws are even weaker.
Google has also been changing the Chrome browser to suit its information gathering needs. If they own the majority of market share, they won't really need their analytics tools on every site. We need to start thinking on a larger scale about how companies can influence culture, markets, and lives, and how to ensure there are proper rules in place to prevent catastrophe.
I recognize that there is more to digital business than ads, such as paternalistic commercial guidance, dark patterns, web traversal, and so on. However, I haven't seen proof that these patterns matter, especially given recent critiques of A/B testing (relative to multi-armed bandit).
This is a gross mischaracterization of what the linked paper says.
"It's extremely difficult to measure the impact" (the claim that the cited paper puts forward) has quite a different meaning from "no measurable impact." The paper is entirely about the difficulty of measuring the impact, and studiously avoids, as far as I can tell, any inference about what the impact actually was in these experiments. For example, Table I gives the mean of the control group sales and a standard deviation, but no mean of the treatment group sales, which you would need to do a statistical test of whether there was an impact; similarly, Table II reports standard deviations of the sales effect and ROI, but not means; Table III presents power calculations based on hypothetical ROIs and the real measured standard deviations, but gives no clue as to what the real ROIs were. Nowhere does the paper support your claim that there is no measurable impact.
In addition, the situation is very different for small companies that most people have never heard of. The article is citing studies done with major corporations with millions of customers and that are already well established in the collective consciousness. Measuring the effect of an advertising campaign taking you from 3.23 million customers to 3.231 million customers is indeed very difficult, particularly when you might fluctuate by tens of thousands of customers on a weekly basis. Measuring the effect of an AdWords campaign taking you from 200 customers to 250 customers is much easier.
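The arithmetic behind that comparison, as a rough sketch. The customer counts are the ones from the comment; the weekly swing of 20,000 is an assumed stand-in for "tens of thousands", and `relative_lift` is just a helper name for this example.

```python
def relative_lift(before: int, after: int) -> float:
    """Change in customers as a fraction of the starting count."""
    return (after - before) / before


# Figures taken from the comment above.
big_before, big_after = 3_230_000, 3_231_000
small_before, small_after = 200, 250

# Assumption: "tens of thousands" of weekly fluctuation, taken as 20,000.
ASSUMED_WEEKLY_SWING = 20_000

big_relative = relative_lift(big_before, big_after)
small_relative = relative_lift(small_before, small_after)

# How large the campaign effect is compared with ordinary weekly noise.
big_signal_to_noise = (big_after - big_before) / ASSUMED_WEEKLY_SWING

print(f"large company lift: {big_relative:.4%}")
print(f"small company lift: {small_relative:.0%}")
print(f"large company effect / weekly swing: {big_signal_to_noise:.2f}")
```

Under that assumption the large company's gain is a small fraction of its normal week-to-week movement, while the small company's gain is a quarter of its whole customer base.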
Or, idk if that is quite the right formulation for what I mean.
But, at the least, it seems likely that some people will sometimes be willing to take an amount of effort to protect the privacy of a large number of people, when they wouldn’t take that same amount of effort to just protect their own to the same degree.
It seems likely to me that a major impact of lack of privacy comes from many people lacking privacy, in ways that wouldn’t happen if it were only a few lacking it, and also where a few re-gaining it doesn’t influence the impact all that much.
If so, then people not avoiding something because of privacy concerns in sufficient numbers to substantially influence the amount of use, doesn’t seem to entirely rule out that they care about privacy. Perhaps their behavior could be attributed to a collective action problem, where they each would prefer that all of them avoid it, but don’t find it worthwhile to be among only a small number of people avoiding it.
If AWS were really aggregating customer's data stored in AWS' platform, I think we'd be seeing a lot more about it in the news. And there would be a lot more than just Walmart advocating against its use.
This is why this comparison doesn't make sense. Google Analytics is a 3rd party service where you have no control at all of your data. You put a script of your app, and then you basically funnel data to them. That's it.
Using Pinpoint, in this case, is the equivalent of using EC2 and S3. You can control the flow and the lifecycle of data (deleting data forever, for example).
You should be able to trust that your customer data is safe there, otherwise, why use AWS at all? or better yet, why use any public cloud infrastructure provider at all?
If you're concerned about that, you probably should build your own self-managed server infrastructure.
You can opt out though.
First point under Data Privacy: https://aws.amazon.com/rekognition/faqs/
It's people that struggle with the conundrum: "X, Y, or ethical. Pick two".
As for Fathom, I find that last "since that people are confused"-comment rather funny, since their messaging on this has been confused for almost a year, haha
That said, that will only do about .01% of all the features of GA; like the infamous "FTP vs. Dropbox" the premise itself isn't exactly "wrong", just missing a bigger point.
> We are keeping this version open-source, forever, and committing to maintain it. We also have a business to run, and while we love open-source, it isn't paying our bills (and Fathom takes a lot of work from 2 people to keep going) and we're not a charity.
> If this repo was full of contributions and other folks pitching in, this would be a different story, but it's not—which is totally fine and accepted. But, since we want to keep going with Fathom, we have to separate V1 and V2 so we can make it sustainable. Otherwise we'd have to abandon it (which serves no one).
> If you truly want your complaint heard, maybe contribute to what you're complaining about (financially, time, effort, etc). My wife always tells me that I'm not allowed to gripe unless I'm also taking action.
Even Google doesn't dare monetize data stored in their enterprise customer's databases.
EDIT: Failing to offer a privacy-enhancing feature, and actively compromising and mining your customers data are quite different scenarios.
That story is 48 hours old; it remains to be seen what the effects will be.
I don't understand this claim at all. It's always been clear that if you're paranoid about the state, you should turn off iCloud backups. And it's not like Apple is selling your backups either.
And yet even here, there is widespread surprise at the news that iCloud backups are not fully encrypted in such a way that keeps them private from Apple.
If we (as a group) have in some factions been caught by surprise, what are the chances that the general public are also not aware?
Anyone who thought Apple kept iCloud backups fully encrypted was being willfully ignorant. Apple has been fully open about the fact that they share iCloud backups with the FBI. This exact same situation happened in the San Bernardino case, where Apple provided the backups but the FBI cried about how they wouldn’t unlock the phone.
The tinfoil hat advice has always been to turn off iCloud backups. I don’t buy that anyone privacy conscious should have missed the fact that even Snowden was saying “use an iPhone but turn off iCloud backups” for the last several years.
If you got caught by surprise, then you weren’t paying attention. Apple wasn’t keeping it a secret that they would share your backups with law enforcement. The only reason this is possible is because today they don’t require you to enter a password to restore your iPhone to a new phone. Even the way the technology works today implies that Apple can read your iCloud backups without you knowing.
Of course, if that is all you are doing, you should be using Matomo or Fathom or whatever, but it is not fair to say GA could easily be replaced.
I've used GoAccess for a while now and am mostly happy with it. It's fast enough and can generate pretty good looking static HTML, which is mostly what you want for those simple use cases.
A side effect of processing log files is that you can freely try software on historical data.
Basically, if it accepts a GET with query parameters, it should work.
> We are keeping this version open-source, forever, ...
Here they are committing to keeping their existing Open Source around instead of taking their toys and going home.
> If this repo was full of contributions and other folks pitching in, this would be a different story, but it's not—which is totally fine and accepted.
I think they aren't asking for contributions to v2 but rather for contributions to v1 which they are committed to keeping Open Source. But even then they acknowledge that users are free to use it without contributing in any way.
But any service that sends something like "GET example.com/collect?path=/foo" can be loaded with an <img>. It's perhaps not an explicitly documented feature, but it will still work :-)
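A minimal sketch of the receiving end of such a request, assuming a self-hosted collector. The `/collect` path and `path` parameter mirror the example URL above; `record_hit`, `CollectHandler`, `serve`, and the in-memory counter are names invented for this illustration, not any particular product's API. The server answers with a 1x1 GIF so the request can come from an image tag.

```python
import base64
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# A commonly used 1x1 transparent GIF, base64-encoded.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

# In-memory page counter; a real setup would persist this somewhere.
HITS = Counter()


def record_hit(request_target: str, hits: Counter) -> bool:
    """Record one page view for a request like /collect?path=/foo."""
    parsed = urlparse(request_target)
    if parsed.path != "/collect":
        return False
    page = parse_qs(parsed.query).get("path", ["/"])[0]
    hits[page] += 1
    return True


class CollectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not record_hit(self.path, HITS):
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.send_header("Cache-Control", "no-store")  # count every view
        self.end_headers()
        self.wfile.write(PIXEL)


def serve(port: int = 8000) -> None:
    """Run the collector on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), CollectHandler).serve_forever()
```

Calling `serve()` and pointing an img tag's src at `http://127.0.0.1:8000/collect?path=/foo` would register one view for `/foo` with no JavaScript and no cookies involved.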