Amazon is blocking Google’s FLoC(digiday.com) |
Obviously Google does it for self interested reasons, but thank goodness they do - you can hate Google and targeted ads all you want but without Google pushing web and ad tech forward it would stand little chance against the competing proprietary platforms.
Your suggestion that Google pay sites for the traffic they generate should like that ridiculous News Corp/Australian shakedown of Facebook and Google, which people were only able to justify based on their hatred of the target companies and a willingness to sacrifice the web to their ends.
Little chance of what? It sounds like this is framing the web as some sort of commercial venture. And Google is the gatekeeper. A venture where they can effectively make sites "appear" or "disappear" from the web and they decide what the public will or will not see. Google watches the traffic, shows what is "popular" and buries the rest. Everyone begs for Google's favour to show their site "at the top, on page one". If not an organic listing, then Google will let anyone pay to be "at the top, on page one" in the form of an ad that looks much like a search result.
That's a very dysfunctional "public platform". (The Google founders wrote about how dysfunctional it was to sell out that way in their 1998 paper announcing their new, alternative search engine.) No one ever agreed the only way the "public platform" would be useful is for a few big corporations to control it. That is a recent idea held only by those who stand to (continue to) benefit from its realisation.
News Corp is bad, Google is bad, Facebook is bad, but c'mon this does not mean the web has to be bad. If one cannot see the difference between "the web" and a few big corporations, then some "reframing" is definitely in order. The web is a medium not a destination. Google, Facebook and others trying to emulate them are all acting as middlemen on the medium.
Garbage like AMP, or flexing their dominance in the search market to force websites to comply with this or that or risk delisting, is garbage.
Citation needed. What proprietary platforms would have taken hold if not for the grace of gmail?
> Your suggestion that Google pay sites for the traffic they generate should (sic.) like that ridiculous News Corp/Australian shakedown of Facebook and Google
Facebook is complying: https://www.msn.com/en-us/money/companies/facebook-to-lift-a... because hey, sharing the pot is better than no pot.
I think the point is that nobody would go to Google if they didn't need to look something up on Wikipedia. So while Google helps users discover content and funnel them towards sites, Google would be 100% useless without the content that ultimately drives the traffic. The status quo, where Google lays 100% claim to the traffic and gets to control monetization, is frankly not in anybody's interest. So why should we accept it?
While I agree with you that Google paying for serving requests or some other equity mechanism sounds just plain odd, there are few tools to deal with multinational monopolies. Tesla is making bank right now in no small part from carbon offsets and consumer tax benefits, and that’s all because Aramco and big oil won’t diverge from their shareholder interests. Google usually welcomes novel web/social mechanisms and it’s very telling when they so thoroughly reject the interests of news sites. Or try to solve the problem with something crappy like AMP.
To give some evidence for this, Google pushed hard for PWAs - it serves their interests since they can focus on one platform for their desktop platforms, but also means that on Desktop (via Chrome) and Android each web app can just install themselves without having to distribute a native package or go through an app store.
True, as are some of the counterpoints. I don't think it contradicts OP's point though. FLoC is designed by Google, for Google's needs. Some/most of those are genuinely privacy related, the way that they're related is via advertising/targeting/tracking... which Google rely on for all their revenue.
Amazon, meanwhile, doesn't benefit from FLoC much... hence conflict.
These datasets are being used as defining advantages by both companies. Why should amazon want to adopt/feed google's new analytics project?
My understanding was Google works a ton on open source and essentially making "the internet" better so that people will ultimately use Google more (since Google is the backbone of the internet) and therefore consume more ads.
All of these tech advancements definitely help the world more than they help Google, but I'm failing to see why/how FLoC helps the community more than it does Google. Not saying Google is in the wrong to do things out of self-interest, but this scenario is a little different.
> make such huge investments in the one public platform we've got
How are things like AMP justifying this goal?
Ofc every company is doing things to advance its own interests, in that regard, Amazon has 0 incentives to share customer data which is truly unique/invaluable, with Google, or any 3rd parties.
The more the things change, the more they stay the same
How is this different from arguing that sites, such as Google or Facebook, should have to pay to link to news articles? I appreciate and support Wikipedia, but I don’t think Google should be expected to help pay for it (though I’d appreciate if they did as a form of public service).
They gave a paltry $2m to the endowment a couple years ago. ...and how much did they make off serving Wikipedia content?
https://techcrunch.com/2019/01/22/google-org-donates-2-milli...
Google has publicly recognized that they have a problem with trust and incentives. So when they admit that and continue to non-execute on addressing core problems, that's when the monopoly needs to be rebalanced.
What does this mean? You think Google should pay for people who are sent to wikipedia.org after a Google Search? Or you think Google should pay for the information they scrape from Wikipedia and display to users on a Google search results page?
I'm pretty happy with all the free youtube content, search engine results, email, storage, word processor, spreadsheet, slide shows, messaging, and more I get
This cannot be stated enough. I think just YouTube alone would be enough to justify Google's existence.
Meanwhile Amazon has Twitch, and people there don't seem to think too highly of how things are being managed (they somehow managed to break every single adblock available and at this point have won against adblockers).
"[Amazon is] preventing Google’s tracking system FLoC — or Federated Learning of Cohorts — from gathering valuable data reflecting the products people research in Amazon’s vast e-commerce universe"
Compare with, e.g.:
"Amazon is taking steps to protect its user's privacy by blocking Google's heavy-handed overreach in leveraging its Chrome browser to spy on user's personal shopping habits and sell that information to other retailers".
(Note: I'm not saying my rewrite is unbiased. It's not. It's just biased in a different direction to highlight the contrast.)
* I don't suspect he is.
But, it is certainly useful to publicly see floc sentiment. As far as I know, Amazon hasn't said anything publicly about floc, but now we know they are aware and doing something about it.
I saw that GitHub and The Guardian also rolled out the header.
Waiting for a website tracking who all has opted out to pop up.
I think the header also has value as a "last resort" to catch any unintentional use of floc if your org doesn't want it.
Permissions-Policy: interest-cohort=()
Source: https://www.drupal.org/project/drupal/issues/3209628
https://paramdeo.com/blog/opting-your-website-out-of-googles...
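For anyone who wants to try the header mentioned above, here is a minimal sketch of sending it from a Python WSGI app. The names `add_floc_opt_out`, `demo_app` and `response_headers` are made up for illustration; only the header name and value come from this thread, and a real deployment would more likely set it in the web server or CDN config.

```python
# Hypothetical WSGI middleware: adds the FLoC opt-out header to every response.
FLOC_OPT_OUT = ("Permissions-Policy", "interest-cohort=()")


def add_floc_opt_out(app):
    """Wrap a WSGI app so each response carries the opt-out header."""
    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Leave the response alone if the app already mentions interest-cohort.
            already_set = any(
                name.lower() == "permissions-policy" and "interest-cohort" in value
                for name, value in headers
            )
            if not already_set:
                headers = list(headers) + [FLOC_OPT_OUT]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start_response)
    return wrapped


def demo_app(environ, start_response):
    """Stand-in application used only for this example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def response_headers(app):
    """Call a WSGI app once and return the headers it sent (for quick checks)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = headers

    b"".join(app({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, start_response))
    return captured["headers"]


if __name__ == "__main__":
    for name, value in response_headers(add_floc_opt_out(demo_app)):
        print(f"{name}: {value}")
```

As others point out in this thread, this is only a request to the browser; the server has no way to verify that it was honored.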
While they can be installed manually with extra steps, there are also other browsers out there.
Is there a way we can just obfuscate / ruin our data with them?
Like a tool or browser extension I can run that clicks / visits a bunch of random links and totally trashes which "cohort" Google thinks I belong in.
I'd pay for this more than paying to opt-out. Then serve me all the ads you want.
Also there's an issue that bots are detected easily.
What do they mean here, that the actual page request does not send the "no FLoC" HTTP header but the requests from Analytics do?
What happens in this scenario?
So they might be trialing it this way because of that, to help boost their ad platform and hinder FLoC, so that Google cannot drop third-party cookies that easily. FLoC’s in-browser processing makes Google the de facto judge of what information goes into FLoC identifiers and what does not, while Google itself keeps getting all the unrestricted data from its browser separately.
By hindering mass-scale adoption of FLoC, they’re trying to delay the dropping of third-party cookies, to slow Google down from getting an advantage over them.
At least that’s what I think. They might be testing it for other reasons; only an Amazon exec can answer that specifically.
Personally I don't see depersonalized targeting as a bad thing. Better than advertising dish washers to people who just bought a dish washer or some such nonsense.
Hmm.
Additionally, it alleviates the creepiness factor a bit ("they're so bad at tracking, they don't even realize I just bought one!", so you don't think about the perfect match with headphones you were just offered) and they might simply have missed the purchase.
Gut punch
What would be the total bandwidth, energy and CO2 usage if the largest net entities from Google used this header?
Google: More control to us, please.
Amazon: No.
That part seems to be the only universal truth these days.
But from what I think I know that's kind of right technically, but kind of not in terms of actual real privacy.
Yes, the actual browsing data, e.g. for the basic floc cohorts only what amazon product page you visited, is no longer 'sent' to ad networks (that's a pretty big oversimplification of how ad networks track you but for brevity). That data is parsed in your browser to generate a cohort ID for you.
But this cohort ID is exposed to the world via document.interestCohort() and is what's used for targeting and tracking.
To me it seems that the cohorts are so small "thousands of people" + IP or UA it's basically the same as a semi-long lasting uuid.
And if you have like even 10 different cohort IDs, even if some of them are 'fake'/'noise' that's probably enough to ID you alone
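A rough back-of-the-envelope version of that worry, as a sketch. The numbers here (about 3 billion browsers, cohorts of about 2,000 people) are assumptions picked for illustration, not figures from Google, and the function names are made up. The point is only how the bits add up.

```python
import math

# Assumed figures, for illustration only.
ASSUMED_POPULATION = 3_000_000_000   # browsers that could be in some cohort
ASSUMED_COHORT_SIZE = 2_000          # "thousands of people" per cohort


def bits_to_single_out(population):
    """Bits of information needed to pick one member out of a population."""
    return math.log2(population)


def bits_revealed_by_cohort(population, cohort_size):
    """Bits revealed by knowing which equally sized cohort someone is in."""
    return math.log2(population / cohort_size)


needed = bits_to_single_out(ASSUMED_POPULATION)
from_cohort = bits_revealed_by_cohort(ASSUMED_POPULATION, ASSUMED_COHORT_SIZE)
# What is left for IP address, user agent, etc. to supply: log2(cohort size).
remaining = needed - from_cohort

if __name__ == "__main__":
    print(f"bits needed to single out one browser: {needed:.1f}")
    print(f"bits revealed by the cohort ID:        {from_cohort:.1f}")
    print(f"bits still needed from IP/UA/etc.:     {remaining:.1f}")
```

Under those assumptions the cohort ID does most of the work, and the other signals only need to supply about 11 more bits. Real cohorts are not equally sized and real signals are correlated, so treat this as intuition rather than a measurement.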
Here's an image from google's site.
https://web-dev.imgix.net/image/80mq7dk16vVEg8BBhsVe42n6zn82...
It also seems like Chrome/google might be still defaulting browser settings to give themselves even more data just like they currently do?
https://github.com/WICG/floc#qualifying-users-for-whom-a-coh...
BUT when you layer on the other proposals (Fledge/Turtledove/Dovekey or whatever) - which I don't understand that much maybe someone else can explain - it seems like it basically collects this page/product level data and makes it available to DSP etc for tracking/ad serving (again if not technically 1:1 basically in consequence given the sizes of these groups).
Like one of the proposals talks about a 'trusted' key/value server which doesn't seem that different from what already happens? The original proposal wanted to move the entire ad bid/target/serve process into the browser.
Thank god they figured out it is illegal in Europe to do this without opt-in and didn't roll out FLoC here...
Like perhaps using AdSense, Google Analytics, Google Sign In, etc, will include a buried implied "opt in" for your site at some point.
Google is quite good at rolling out changes slowly enough to spread out any outrage. Watching the progression of ads take over their SERP pages, it was very slow and subtle. No ads, then just sidebar ads. Then one ad below the first one or two results, then above them, eventually leading to some pages with nothing but ads above the fold. Over many, many years.
The floc repo currently says "The algorithms might be based on the URLs of the visited sites, on the content of those pages, or other factors." Which is not super helpful. It seems like Google could fairly easily hide info from Floc since they own both sides.
(And while the author does say "Best guess", this isn't just an empty Google promise—if this changes, it would change the entire tenor of consensus-based standardization discussions that are happening here, and significantly lower Google's standing in the web standards community, which they care a lot about)
Not straight up lying, but downplaying concerns without actually being able to lay those concerns to rest.
[1] https://twitter.com/Log3overLog2/status/1384337637763387394?...
Sort of. Kind of.
googlebot only respects part of robots.txt, the part that refers specifically to itself. It doesn't respect global robots.txt rules.
Google also explicitly don't really respect the disallow rules:
> However, robots.txt Disallow does not guarantee that a page will not appear in results: Google may still decide, based on external information such as incoming links, that it is relevant. If you wish to explicitly block a page from being indexed, you should instead use the noindex robots meta tag or X-Robots-Tag HTTP header. In this case, you should not disallow the page in robots.txt, because the page must be crawled in order for the tag to be seen and obeyed. [0]
[0] https://developers.google.com/search/docs/advanced/robots/ro...
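To make the "only respects the part that refers to itself" point concrete, here is a small sketch using Python's standard-library robots.txt parser. This shows how that parser resolves groups, not how Googlebot itself is implemented; the robots.txt content, the example.com host and the `allowed` helper are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: a global group that blocks everything,
# plus a Googlebot-specific group that only blocks /private.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Ask the parser whether `agent` may fetch `path` on the example host."""
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent, path in [
        ("Googlebot", "/public"),
        ("Googlebot", "/private/page"),
        ("SomeOtherBot", "/public"),
    ]:
        print(agent, path, allowed(agent, path))
```

A crawler with its own group follows that group and never falls back to the `*` rules, so the global `Disallow: /` does not apply to it here. And as the quoted docs say, being disallowed from crawling is separate from being kept out of the index.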
sorry, wasn't meaning to imply Googs ignores robots.txt. I was going for conceptually it is easy to ignore it, just as it is easy, conceptually, to ignore HTTP headers.
>and tracking opt-outs for years, right?
is this provable? if i opt-out with my g-account in the browser on a desktop, that should imply i want out of all tracking, yet you have to do it on each app on each platform. it's whack-a-mole that is impossible to win.
The organization controlling "the thing" is the entity that asked for the feature, so we believe the thing will both know about it and honor it.
It's the only purpose this flag has.
Products that target based on actual user intent benefit from cookie blocks, as that cannot be meaningfully blocked ever. (i.e., when you search for "brunch" ads relating to brunch show up)
Products that target based on behavior away from the product will suffer - but morally I'm ok with that.
Google happens to own one of the most intentful products out there - you directly tell the product what you want to see! The main pain for them will be loss of targeting ability in their network ads displayed on 3rd party sites - but their first-party products I suspect will see a boost in the new world.
Has everyone forgotten OTA broadcast television? Where Geritol spent a fortune advertising on the Lawrence Welk Show? And Kellogs flooded Saturday morning cartoons?
I may be wrong, but I don't think advertisers have boosted their budgets in the age of targeted advertising. Google has done well to replace the old channels for advertising with their own pipeline. For the last twenty years it has mattered which ad platform could more accurately target your demographic. Google has won most of that war. Today, you pay Google whether the ad is targeted or not. So now, they can shift the battlefront to create other barriers to entry. And to keep people dependent on their infrastructure to package and deliver advertising at all.
A content website has nothing to sell, assuming it's not behind a paywall. They are typically funded using general purpose tracking ads. The ads are based on other websites you visit and have nothing to do with the content you're reading.
These websites may face a serious threat, and need an entirely different model. The most straight-forward alternative I imagine to be contextual non-tracked ads. Ads related to the content you're reading.
Other victims are to be found in the shady world of data aggregators. Their entire existence is based on cross site tracking.
Whilst websites and data parties may suffer, Google will continue to hoard data. Almost every website will continue to use Google analytics, Google fonts, Google Tag Manager, the like. This on top of the wide array of consumer products you may use: Android, its various Google apps, Gmail, Youtube, all of it.
It's virtually impossible to avoid Google touchpoints, they will continue to know more about you than you do about yourself. They don't need AdWords for that.
This wording annoys me. The websites have nothing to do with it. Google choosing to turn its browser into spyware that leaks information about what you used to do with it isn't the website's fault; the webserver doesn't do anything and doesn't have anything done to it, so there is nothing for it to opt out of.
Google chose to give websites a way to request that the user's browser doesn't include the fact that they visited this website in its cohort calculation. That's fine, but the messaging around it is a transparent attempt at shifting the blame. It's not the website opting out or in, it's the website acting as an uninvolved third party bystander asking Google to stop. Asking why a website didn't opt out is equivalent to a thief asking "well why didn't you stop me?" to the person looking on from the sidewalk.
We shouldn't accept this messaging. We should be very clear that Chrome is the entity spying on you, not the website, and that the website has no power to decide whether or not chrome spies on you, only the ability to make a polite request that it doesn't (or more accurately, does so less).
FLoC is only opt in for testing the proposal[0]. As a sibling comment says this is technically performative but publicly signals a stance against the proposal.
Though we also shouldn't forget that Amazon loves third party tracking and happily falls back to IP address associations if cookies aren't available.
Edit:
[0] https://developer.chrome.com/blog/floc/#take-part-in-a-floc-...
You can opt-in to actively be a part of FLoC, but if you don't opt-out, Google may randomly choose you to be part of their testing.
Edit: I think your point may have been from the perspective of a website owner. Sorry.
Wikipedia doesn't run ads on their pages, so Google showing content from Wikipedia directly in the search results doesn't take away any revenue from Wikipedia. If anything it reduces their operating costs to have Google serve the content (with attribution!) rather than sending users directly to Wikipedia's servers.
Jokes aside, if it was a multiplayer game that wouldn't be an impossibility.
I like the recent trend of friend-copies of games that are co-op first like "It Takes Two", "Operation Tango" (is that name correct?) and the two-player Wolfenstein I forget the name of.
> Efforts to standardize Do Not Track by the W3C in the Tracking Preference Expression (DNT) Working Group reached only the Candidate Recommendation stage and ended in September 2018 due to insufficient deployment and support. [...] Despite supporting it in its Chrome web browser, Google did not implement support for DNT on its websites, and directed users to its online privacy settings and opt-outs for interest-based advertising instead. The Digital Advertising Alliance, Council of Better Business Bureaus and the Direct Marketing Association does not require its members to honor DNT signals.
Permissions-Policy: interest-cohort=()
FLEDGE/Turtle*/etc. is a different issue. I'm not sure it will be more private than 3rd party cookies since the spec is not very clear and it has so many moving parts. I have heard from some Chrome devs that if it doesn't end up better for privacy than 3rd party cookies, it won't get past the origin trial stage.
The docs/images they use make it look like an array but I just read the origin trial info page and it says document.interestCohort() only returns cluster id and algo version id.
still though the point stands i think. even say 1 million people in one cohort id # (they use 'thousands' to describe) + ip + UA and it's pretty unique, until apple and others proxying everything as recent posts suggest. Add whatever 8 bits or however many privacy allowance entropy and it's probably very unique and trackable over time if you have say TTD scale.
totally! it's very very confusing and I don't understand some (ok maybe a lot lol) of the RTB/context/retarget proposals and multiple RTB stakeholders have submitted their own too and they all have really stupid confusing names. But that's what I gather that it's basically the same result. It feels like the only way to do similar retargeting, conversion tracking is to have one 'trusted' source who gets all the data
Unless you were studying the impact of ads you receive based on cohort, like https://their.tube.
Why did they do it? Because news websites were heavy, slow, bad experiences compared to Facebook Instant news and Apple News etc., and so those proprietary options were winning. AMP was designed to allow web sites to compete with that.
It was reported that Apple News is taking 50% cut. When media companies keep customers on their own sites they have many options - more are now running their own ad business entirely (NYT most recently). For many reasons I hated to see those proprietary platforms crush the web sites, but the web sites really were too slow and heavy.
I'm certainly not telling you to like AMP - my point is that even their most hated, ham fisted product fits into this mold. It is totally open in every important way (look it up if you don't believe me) and it made a big difference in allowing sites to compete with proprietary platforms.
MS is happy to use/embrace Linux, Chrome (even AMP) etc. but contributing is new to them. The embrace & extinguish thing is not the same when the company is creating and contributing the tech themselves.
They could have prioritized websites with fewer tracking/ads/scripts.
I don't believe that Google cares at all about what's good for the web. They simply want to exploit it and pocket the money (as opposed to re-invest any major portion back in the infra/community) - in that sense, they're no different than any other nameless/faceless corporation.
They are a for profit corporation in the end, so it is unfortunate to depend on them, of course, but I think they need to care about the health of the web - their profits tomorrow depend on it. And I think they've demonstrated it by creating so much tech that they give away.
The downside comes down to the end user experience if those websites being prioritized have lower quality material, which in turn might force those users to use a different search engine that might not care about that if it means they're getting more users.
If they wanted to penalize slow sites they could have… penalized slow sites. There are numerous metrics (paint time, etc) that they can track for that.
Simply prioritising fast, mobile-friendly sites in search results would have achieved that aim.
If there's one thing that's clear from visiting any news publisher's website, it's that news publishers are unable to build sites that are fast and mobile-friendly. But one thing news publishers do know how to do is rig up their CMS to also publish to proprietary systems like Facebook Instant Articles.
The magic of AMP was that it tricked publishers into thinking they were publishing to one of those proprietary systems, when in reality they were building a fast mobile website! Because it imposed strict rules rather than just "faster is better", publishers could throw out all of the stupid, awful practices they'd built up around making websites. Can't use that bloated framework of the week, AMP doesn't support it. Can't give the ads department free rein to ship whatever third-party scripts they please, AMP doesn't support it. Don't worry, website team, we're not threatening your jobs -- AMP is just another proprietary reading system, just like Facebook's.
No, it shows that they're not motivated to build fast and lightweight sites. If Google search severely penalised bloated websites, bloat would soon improve.
That's "easy"?! How does my mom do that for her WordPress site?
Proposal: Treat FLoC like a security concern - https://make.wordpress.org/core/2021/04/18/proposal-treat-fl...
Consider implications of FLoC and any actions to be taken on the provider (WordPress) front - https://core.trac.wordpress.org/ticket/53069
Chrome has promised to listen if websites say they don't want to be included in the browser history they calculate that statistic on, but it's all client side, there is nothing the website can actually do but request that they aren't included.
It doesn't work that way at all.
Hilariously, I even opposed removing the code later because I wanted us to be a good citizen but it was practically dead code because people were still calling us evil. They could literally set their UA to play along (or use one that set it by default).
I think we always kept the code in but it only incurred cost and we got blamed anyway. I think, looking back, I should have just removed that piece of middleware since no user ever really cared. It wasn't worth it for the org to pay for code so I could have a clean conscience.
There is a non-exhaustive list of features/APIs here: https://github.com/w3c/webappsec-feature-policy/blob/master/...
Each feature takes an allowlist, specifying which, if any, origins can use the feature.
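As a sketch of what "each feature takes an allowlist" looks like in the header itself: the helper below builds a Permissions-Policy value from a mapping of feature name to allowed origins. It follows the header syntax as I understand it from the Permissions Policy spec (an empty list means no origin may use the feature). The function name and the maps.example origin are invented for the example.

```python
def build_permissions_policy(policy):
    """Build a Permissions-Policy header value.

    `policy` maps a feature name to its allowlist: a list containing
    "self", origin URLs, or the single entry "*" for "any origin".
    An empty list disables the feature for every origin.
    """
    parts = []
    for feature, allowlist in policy.items():
        if allowlist == ["*"]:
            parts.append(f"{feature}=*")
            continue
        # `self` is a bare token; origins are quoted strings.
        tokens = [entry if entry == "self" else f'"{entry}"' for entry in allowlist]
        parts.append(f"{feature}=({' '.join(tokens)})")
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_permissions_policy({
        "interest-cohort": [],                              # nobody: the FLoC opt-out
        "geolocation": ["self", "https://maps.example"],    # this site plus one origin
    }))
```

With an empty allowlist this produces the same `interest-cohort=()` value quoted elsewhere in the thread.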
https://github.com/w3c/webappsec-permissions-policy/issues/1...
What is happening in w3c?!
edit: ahhh i see it's in the http headers, not the head of the html. nvm.
Saying you did something doesn't help the user know that DNT was followed
Maybe we tried some other codes but anything but 200 was unsafe to many UAs (you could 3xx but UAs would break on 304 too because the tracking pixel wasn't actually cached). Anything that led to UA breakage was verboten anyway on our side since we didn't want anyone to have a broken experience because they set DNT. That would have been bullshit.
We were dumb-enough to handle P3P headers too (which AFAIK no one really used in the end). Lots of dead code. Ugh.
As is the nature of dualities, the web has benefited immensely from Google's investments even if it would have charted a different (and in your opinion, a better) course had Google not existed in the first place. Someone pointed out, you couldn't say the same for Amazon. As for incompetence: imho, webrtc, which Google standardized and open sourced, is likely the single most important innovation on the interwebs (in terms of impact) just ahead of Microsoft's XMLHttpRequest.
That's a really weird claim. We can point to some real ways Google has benefited the web: Their search engine is excellent, and was a huge leap forward when it was released. SPDY/QUIC are set to become the next HTTP2/HTTP3. And Google Chrome has made the browser a much more powerful and compelling platform over the last few years. If anything they're investing too much - and hurting the web by making it hard for other browser vendors to keep up.
But webrtc?? Webrtc is still mostly a toy, barely used outside of video conferencing. It's insanely overcomplicated for any other use case. And I still haven't seen a compelling reason to use it for anything else. Decentralized communication doesn't buy you much when the site itself is still loaded from a centralized server.
More important / impactful than XMLHttpRequest? No, I think not.
RedHat benefited significantly from funding by large corporations in its early days.
Undoubtedly these companies helped shape the Linux ecosystem. A single company doesn't control it, but as big as Google is a single company doesn't control the web either.
To be fair, I am not the one that's assuming things here. I am speaking of how Google has indeed contributed when they really didn't have to (as pointed out with the example of Amazon).
> Linux happened without single corporation controlling it.
A consortium of corporations, sure: linaro.org
They’re a lot like a revolutionary government that gradually becomes corrupt and as bad as the regime they overthrew.
Complex browser-based alternative to TCP? Standardized alternative to Socket.io? I can't say it's not useful but webrtc is hardly the most important thing...
I'm not as positive about Google today as I used to be in the past, but I don't feel it's fair to pretend that they didn't help us take giant steps forward.
The world would be a better place if google search had been made a not-for-profit (maybe like wikipedia?)
By this point I would (maybe) pay a monthly subscription for a really good websearch like google circa 2005-2010
UI changes and new features aside, the web is just so much more adversarial nowadays. It's no wonder so much rubbish floats to the top of Google, because the reality is that it's drowning out all the other content.
If you had the source code for 2005 Google it would be objectively worse today than it was then.
I'm trying to think of any changes to Wikipedia that happened after it launched and can't think of any. It surely does its job, but it doesn't change and there is no drive. The wiki concept was novel at the time, they did and continue to do an amazing job, but there's no evolution there. Or maybe I'm just blind or unaware or biased - but, honestly, I tried to think of something and nothing came to mind.
Google constantly tries out some new things. They're really bad at maintaining them, they can't stop inventing chat services, they suck a lot and we could bash them endlessly, but let's credit what's due - they're always exploring some frontiers.
Just because cars still mostly have 4 wheels doesn't mean automotive engineers haven't been innovating the past 100 years.
Which among other things shows that patents are bad for innovation in a new and quickly changing industry. Google came up with their algorithm and heavily patented it. As an invention it was not ground-breaking, but it matched very well how the web worked. This gave them essentially a monopoly in search from which they massively profited. At least now those patents expire.
One consequence was the preceding generation of search engines being harder to drive for everyday folks, and a relevance approximation thereby more immediately accessible on the consumer scale, but let's face that the algorithmic approach also spawned a whole bottom-feeding industry of SEO snake oil vendors and their merry-go-round of clickbait, malware, and global-scale consumer surveillance. The incentive to hang yourself from a single keyword means that Google became the foster parent of AOL's Eternal September.
My personal feeling on the matter of Gmail and Google Maps is that they are best attributed to their personal creators (Paul Buchheit, and the Rasmussens, respectively), not the corporation. The seed of Google Maps was an acquisition, after all, and many other technologies I've seen offered up in neighbouring threads as proof of Google's benevolence were either acquisitions, or ones where substantial parts of any credit must be shared (webrtc has been mentioned; it is both).
Javascript in the browser still sucks mightily, and although it's not an argument I particularly wish to stir up there's plenty to say in support of that perspective. What's more, many of the best solutions are the product of independent/small/OSS groups, although I will confess a soft spot for TypeScript. Consequently, and especially w.r.t Gmail, Youtube, Maps, and <whatever Google Apps is called this year>, Chrome starts to look like the Lotus Notes of today: a thick client, developed by a large firm, in support of its specific service & platform offerings.
That’s the reason why Google, a very small newcomer, crashed the entire search engine market.
I meant to highlight that Linux, not in its entirety but parts of it, is indeed driven by orgs (that you haven't even heard of).
Just be prepared: the future of the internet is navigating politics. You better be prepared for people to be upset that query X returns results too right/left/up/down for their political preferences. Then senators start tweeting at you.
Also, my search engine will be called "Jays Favorite Websites" and the right side of politics can bite me.
I think the problem here is just one of language. A summary statistic is a number calculated from a set of data that gives you some idea of the contents of the data, but condenses it in a way that you can't reproduce the original data. Common examples for numeric data sets are things like mean, mode, median, standard deviation. Common examples for data sets consisting of a finite list of strings (such as browser history) would be things like average length, character frequency, count, etc. The cohort ID generated is unambiguously such a summary statistic.
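To make the "can't reproduce the original data" part concrete, here is a minimal sketch in Python. The function name `summarize` and both histories are made up for illustration; this is not how FLoC computes anything, just the general idea of a summary statistic over a list of strings.

```python
from statistics import mean

def summarize(history):
    """Condense a list of strings into a few summary statistics."""
    return {
        "count": len(history),
        "avg_length": mean(len(s) for s in history) if history else 0,
        # Frequency of one character, as a stand-in for "character frequency".
        "dots": sum(s.count(".") for s in history),
    }

# Two hypothetical, different browsing histories.
history_a = ["news.example", "mail.example"]
history_b = ["maps.example", "shop.example"]

# Both condense to the same summary, so the summary alone
# cannot tell you which history it came from.
print(summarize(history_a))
print(summarize(history_a) == summarize(history_b))
```

Many different inputs map to the same output, which is the property being claimed for the cohort ID.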
So instead of building a profile on specific users, the website (or ad network) builds profiles on cohort IDs. Users can change IDs, or mask theirs altogether if they wish.
Retrospectively, I feel so stupid to have promoted « free products paid by an advertising company » for years.
I know Google duped everyone on this, so I don't feel alone in being stupid. But still, in hindsight, there was no possible future where Google could stay on top without doing dirty things. We didn't see the targeted ads coming.
I've seen censorship, identity theft, money laundering, online harassment, online crime, information theft on a very large scale to target, arrest and hang political opponents... All of which Google used to oppose more than any other big tech co. But somehow, the pitchforks are out against targeted ads.
For me, Google is "evil" for what it is, not what it does. And I think that Google is an utterly dangerous company with such an unbelievable amount of precious data that it could change the course of history in a really bad way if it falls into the wrong hands in the future.
Who knows who will control Google in 5, 10 or 50 years? Nobody knows. But we can be sure that in 50 years, the data that Google collects about us today will be lying on some hard disk drive, ready to be used for who knows what.
Google is able to define and store indefinitely "who you are". And human history is full of times when "who you are" or even "who you were" was really dangerous information that threatened your liberty or your life.
DNT was DOA. You can blame Microsoft for that one.
The website or ad network is able to read those numbers and build profiles on them, but it's still divorced from the user and their specific data.
I think a better comparison is that of a hash. It sums up the data, but is just a unique identifier for it. Of course with a cohort ID it's non-unique (by design).
Because the browser is only sending a number, it retains the ability to change, randomize, or obscure that number. That's an important privacy consideration of the system.
For what it's worth, I do think more work is needed. One of Mozilla's suggestions which I liked was to automatically send a missing ID on occasion, just to keep things a little hazy and reduce fingerprinting viability.
Fingerprinting is inherently less necessary as a result of FLoC, and you need to balance it so it doesn't become necessary again, but it's a way to protect users who fully opt out without themselves becoming fingerprintable.
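A rough sketch of the "occasionally send a missing ID" idea, in Python. Everything here is an assumption for illustration: the function name `cohort_to_send`, the `omit_probability` parameter and its default are hypothetical, and this is not code from any browser or from Mozilla's proposal.

```python
import random

def cohort_to_send(real_cohort, omit_probability=0.1, rng=random):
    """Return the cohort ID to report, or None to report no ID at all.

    real_cohort is None when the user has fully opted out.
    omit_probability is a hypothetical knob: the chance that an
    opted-in browser withholds its ID for this request.
    """
    if real_cohort is None:
        return None  # opted-out users never send an ID
    if rng.random() < omit_probability:
        return None  # opted-in user, but stay hazy this time
    return real_cohort

# From the outside, a missing ID no longer proves the user opted out.
print(cohort_to_send(1234))
print(cohort_to_send(None))
```

The trade-off is in the probability: too low and opted-out users still stand out, too high and the ID stops being useful to the ad network, which pushes it back toward fingerprinting.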
Almost certainly your browser history is summarized into a vector, and then the closest class number is chosen and sent.
You might not know which vector the number represents, but it does represent a vector for the centroid, and has relationships with other cohorts.
I'd say it's guaranteed that that interface is leaky.
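The guess above (history summarized into a vector, then the closest class number sent) can be sketched like this in Python. To be clear, this is only an illustration of that guess, not Google's documented algorithm: the centroid table, the cohort numbers, the 2-dimensional vectors and the name `nearest_cohort` are all invented here.

```python
import math

# Hypothetical cohort centroids in a made-up 2-D "interest" space.
CENTROIDS = {
    101: (1.0, 0.0),
    202: (0.0, 1.0),
    303: (1.0, 1.0),
}

def nearest_cohort(vector, centroids=CENTROIDS):
    """Return the ID of the centroid closest to the given vector."""
    return min(centroids, key=lambda cid: math.dist(vector, centroids[cid]))

# A browsing history already summarized into a vector (assumed step).
print(nearest_cohort((0.9, 0.1)))

# The number is opaque, but the centroids behind the numbers still
# have geometry: cohort 101 sits closer to 303 than to 202.
print(math.dist(CENTROIDS[101], CENTROIDS[303]) <
      math.dist(CENTROIDS[101], CENTROIDS[202]))
```

If something like this is what happens, anyone who can estimate those distances between cohorts learns more than the bare number suggests, which is where the leak would come from.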
I would sometimes worry about that. At some point internet privacy will be completely erased, not by malicious or greedy corporate practices but by algorithms scraping all of big data on the internet. This will ultimately be inevitable, almost everything you've ever posted on the internet will at some point be retroactively traced back to you.
The upside is that this isn't what's going to happen to your data, nor to my data. This is what's going to happen to _everyone's_ data. As long as the whole world bites the dust together, I can see the actual damage being minimized.
Of course, this does not take into account living within governments that don't care about human rights and that you happen to disagree with or be critical of.
50 years from now the technology should have improved so much that Google will look laughably blind and powerless, I'm afraid. Let's not forget that Google, for all its evil, is still just an aging part of the dying old internet, which was naively brought into existence to be interesting and useful at a time when the world was not expecting that such a thing was possible. Now the technology is slowly aligning with expectations and will soon assist in surveillance and control. We will miss targeted ads.
Maybe someone could set up a library that implements Swing on top of a HTML Canvas? Just waiting for all the classes to download would help with that nostalgic feeling.