Re-decentralizing the Web, for good this time (ruben.verborgh.org)
Watching your TED talk in 2013 was one of the most influential moments in my life, and discovering the semantic web was perhaps my greatest epiphany. While the vision never left my mind, I never acted on it. Until now.
I'm dedicating 2019 to linked data. I'm going all-in.
Last week, I started to build a tool to convert unstructured input to linked data. Even after recognizing canonical literals (email, phone, URL, color, gender, boolean, integer, float, date, time span, money, weight, distance, language, image, geo coordinates), I couldn't accurately infer predicates and guess classes. Before trying more complicated stuff like Bayesian inference, I decided to try a simpler exercise.
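As a rough illustration of the literal-recognition step, here is a minimal regex-based sketch covering a handful of the listed types. The patterns, their order, and the function name `recognize_literal` are assumptions for illustration, not the commenter's tool; real-world formats (phone numbers, money, localized dates) need far more than this.

```python
import re
from typing import Optional

# Hypothetical, deliberately small pattern set; order matters (e.g. date before integer).
LITERAL_PATTERNS = [
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")),
    ("url", re.compile(r"^https?://\S+$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),            # ISO 8601 calendar date only
    ("color", re.compile(r"^#(?:[0-9a-fA-F]{3}){1,2}$")),    # hex colors like #fff or #1a2b3c
    ("boolean", re.compile(r"^(?:true|false|yes|no)$", re.IGNORECASE)),
    ("integer", re.compile(r"^[+-]?\d+$")),
    ("float", re.compile(r"^[+-]?\d*\.\d+(?:[eE][+-]?\d+)?$")),
    ("geo", re.compile(r"^[+-]?\d{1,2}(?:\.\d+)?,\s*[+-]?\d{1,3}(?:\.\d+)?$")),  # "lat, lon"
]

def recognize_literal(value: str) -> Optional[str]:
    """Return the first matching literal type name, or None if nothing matches."""
    text = value.strip()
    for name, pattern in LITERAL_PATTERNS:
        if pattern.match(text):
            return name
    return None
```

Recognizing literals this way is the easy half; as the comment notes, inferring which predicate and class a recognized value belongs to is where this approach runs out.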
This time, I want to aggregate structured data from different sources and map it to some existing ontologies. For example, I want to convert some JSON about comments and links from Reddit and Hacker News to RDF using the http://schema.org vocabulary.
- Can I feed the JSON into some ML system that automatically figures out the mapping? What if I provide some annotation or feedback?
- Can I manually turn the JSON into JSON-LD and use that as the mapping information? What about complex transformations (different structures and literals)?
- Should I implement the mapping manually using my favorite programming language?
- Should I use R2RML or RML?
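For the "implement the mapping manually" option, a minimal sketch might turn a Hacker News API comment item into a schema.org `Comment` expressed as JSON-LD. The input field names (`id`, `by`, `time`, `text`, `parent`) are those returned by the public HN Firebase API, and `Comment`, `author`, `dateCreated`, and `parentItem` are real schema.org terms; using the item URL as `@id` and the function name are assumptions.

```python
import json
from datetime import datetime, timezone

def hn_comment_to_jsonld(item: dict) -> dict:
    """Map a Hacker News API comment item to a schema.org Comment in JSON-LD."""
    base = "https://news.ycombinator.com/item?id="
    doc = {
        "@context": "https://schema.org",
        "@type": "Comment",
        "@id": f"{base}{item['id']}",  # assumption: the HN item URL serves as the identifier
        "text": item.get("text", ""),
        "author": {
            "@type": "Person",
            "name": item["by"],
            "url": f"https://news.ycombinator.com/user?id={item['by']}",
        },
        # The HN API reports Unix seconds; schema.org expects an ISO 8601 date-time.
        "dateCreated": datetime.fromtimestamp(item["time"], tz=timezone.utc).isoformat(),
    }
    if "parent" in item:
        doc["parentItem"] = {"@id": f"{base}{item['parent']}"}
    return doc

example = {"id": 2, "by": "alice", "time": 0, "text": "Nice post", "parent": 1, "type": "comment"}
print(json.dumps(hn_comment_to_jsonld(example), indent=2))
```

A Reddit mapping would be a second function targeting the same vocabulary; the shared output vocabulary is what makes the two sources comparable afterwards.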
What's the state of the art today for semantic data integration?
- Homepage http://wit.istc.cnr.it/stlab-tools/fred/
- Paper https://www.researchgate.net/publication/280113533_FRED_From...
There are likely other projects and papers; google 'text to rdf nlp'.
Stephen Reed (ex-Cyc engineer) also did some interesting work in this field, in his Texai project, over 10 years ago. There are few references to it on the web now, though: that part of his project is no longer open source, and I know of no mirrors.
- Paper https://pdfs.semanticscholar.org/8026/107de65c5a14aa8d0d47f9...
- Homepage http://texai.org
- http://homepages.inf.ed.ac.uk/kbyrne3/docs/thesisfinal.pdf
- https://www.researchgate.net/publication/228378264_A_very_br...
Is "we" Solid or Ethereum?
For decentralization, the root problem has always existed: while pointing at another resource requires no permission, receiving and hosting that resource does. Your government has to let you receive it, and your ISP has to let you host it.
This is a much lower level problem compared to the three challenges Berners-Lee puts forward, which seem to have little to do with decentralization.
1. taking back control of our personal data;
2. preventing the spread of misinformation;
3. realizing transparency for political advertising.
What about Google Chrome?
Facebook, probably not so much. Their business model is data harvesting.
Regarding Solid, note that we don't want to overthrow or replace any existing social networks. We start with offering experiences they cannot offer due to their siloed nature.
Pretty sure Chrome did. Or WebKit/Blink family. This is GOOD imho.
Such a centralization comes with the risk of websites only working with one browser, forcing people to choose a certain device, operating system, and browser vendor.
Regular people and businesses are always going to make the decision in front of them.
'Decentralization' in and of itself is not something anyone directly cares about. People care about privacy, somewhat, but there are other paths to privacy, or at least, consumers may very well believe there are.
Decentralization will only happen with a real impetus: a product or service that facilitates it and that people want, either for reasons related to decentralization or, more likely, for some other reason that just happens to facilitate decentralization.
In both cases, DNS and TLS CA-based stuff is about trust. You need to trust the DNS server, as there could be malicious servers sneaking in, and you need to trust the cert.
But once you have a social network with a large strong set, you could base the trust on the strong set, and in particular, individuals in that strong set who can demonstrate that they have a clue.
Once we have that, we can get rid of these Achilles' heels, but quite frankly, I don't believe in a strategy that takes on those problems first.
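In the PGP web of trust, the "strong set" is conventionally the largest set of keys in which every key can reach every other key through signatures, i.e. the largest strongly connected component of the signature graph. A minimal sketch using Kosaraju's algorithm follows; the edge list format and function names are assumptions, and it ignores trust levels, expiry, and revocation.

```python
from collections import defaultdict

def strongly_connected_components(edges):
    """Kosaraju's algorithm (iterative) on a directed 'who signed whom' graph."""
    graph, reverse = defaultdict(list), defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        reverse[dst].append(src)
        nodes.update((src, dst))

    # First pass: record nodes in order of DFS completion.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbors = stack[-1]
            nxt = next(neighbors, None)
            if nxt is None:
                stack.pop()
                order.append(node)
            elif nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, iter(graph[nxt])))

    # Second pass: DFS on the reversed graph in reverse finishing order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components

def strong_set(edges):
    """The 'strong set' is the largest strongly connected component."""
    return max(strongly_connected_components(edges), key=len)
```

For example, if alice, bob, and carol sign each other in a cycle while eve signs alice and nobody signs eve back, only the first three form the strong set.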
Sure, I obviously have OpenNIC in my DNS resolution. I haven't once seen an address that required it, other than when I set it up. I think our approach is much better: base it on people and the strongest part of their network.
Disclaimer: building a registrar[2] for Handshake so we're pretty excited about it!
> The situation becomes problematic when we are robbed of our choice, deceived into thinking there is only one access gate to a space that, in reality, we collectively own.
Robbery - the action of taking property unlawfully from a person or place by force or threat of force. [0]
Deceit - The action or practice of deceiving someone by concealing or misrepresenting the truth [1]
That's what those words mean. They also have nothing to do with anything that has happened with the internet over the last 20 years.
Are you railing against the use of "rob" with an intangible noun? Would you cry foul at phrases like "robbed of their dignity?" Do you ignore alternative definitions like "to deprive of something unjustly or injuriously?"[0]
Do you believe that nobody involved in centralization conceals or misrepresents the truth? Does a marketer never overstate the benefits of their hosted solution?
> about: Hacker, community guy and project release manager at Inrupt, working on Solid.
HTH
(But they have a key signing ceremony, so we can trust that it is secure.)
There are alternative DNS roots, though. I participated in running one myself for a time.
1. https://en.wikipedia.org/wiki/ICANN
2. https://en.wikipedia.org/wiki/Root_name_server#Root_server_s...
You can argue that the google dns can sign the answer with a cert you trust, thus you know you got the right answer. And then, when you get the address of the service you want to talk to, you can check their cert and know for sure you are again talking to the thing you want to talk to.
And here lies the problem: how are you going to check those certs? Are you going to queue at a Google office and obtain a cert on a USB stick, then import that into your environment, then do the same for all the other services you want to talk to? Or are you going to trust a central authority? What if this central authority is messing with you? What if someone up the chain is messing with your central authority?
Let's not even talk about how your router might send your DNS request to some guy's home server that answers as 8.8.8.8 instead of Google, and you can't do shit about it...
Everything we know about the internet works the way it works because everybody involved agrees to do the right thing, from IP routing, to DNS resolving, to cert verification. All of this is based on a set of rules set by a central authority that everybody chooses to follow. You can't speak about web decentralization as long as proving who you are is a very centralized system.
Instead, it's just 'what about old-fashioned websites, plus lots of XML schema and long spec documents?' It just tastes like a rehash of Berners-Lee's existing '5-star open data' spiel ( https://5stardata.info/en/ ), but now billed as the thing that'll fix the internet. 5-star open data has been around for years now, and, well, the linked data future isn't here. When's the last time you consumed RDF in an application?
Ultimately I think there are technical solutions to making the decentralized web more attractive than the walled gardens, but at this point they will need to be ridiculously polished and shiny to even get a look, and this stuff... is not. Going forward it gets even worse, they're going to be opposed at every step by corporations with more money than most nations.
The internet was originally decentralized because the government wanted to make it that way, and I think the only way to get back there is going to require a gigantic, economically unattractive investment. There are at least a few governments that may have the capability but I can't name one that would have the motivation. Hopefully some billionaire's charity will decide saving the internet is a worthy legacy.
The internet doesn't really tolerate serious technical barriers stopping someone from automatically multiplexing the content from various social networks into a single read-write stream, for example. The issue is that when someone attempts to do that kind of thing, they get sued and they end up owing BigTechCo millions of dollars. [0]
An open internet is _not_ a technical issue. It's a legal one.
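To illustrate how small the technical side of the read half of that "single stream" is, here is a minimal sketch that merges several already-fetched feeds into one newest-first timeline with `heapq.merge`. The `Post` type and source labels are hypothetical, it assumes each feed is already sorted newest-first, and it says nothing about the write side or the legal exposure the comment is about.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple

class Post(NamedTuple):
    timestamp: int   # Unix seconds
    network: str     # hypothetical source label, e.g. "mastodon"
    text: str

def multiplex(*feeds: Iterable[Post]) -> Iterator[Post]:
    """Merge feeds that are each sorted newest-first into one newest-first stream."""
    return heapq.merge(*feeds, key=lambda post: post.timestamp, reverse=True)
```

Because `heapq.merge` is lazy, each feed can be a generator that pages through its source only as far as the reader scrolls.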
There are so many abuse-related issues on the web, and unfortunately I’ve seen no decentralized effort that works. Cloudflare brought cost-effective DDoS protection to the masses.
Anybody who wants to advance the open web should focus their efforts on a P2P library with extremely good NAT traversal capabilities that is extremely reliable, simple to use, and supports as many programming languages as possible, certainly not just C++ or C. It needs to be deployable under a permissive license on all major platforms (macOS, Windows, Unix, Linux, iOS, Android, and browsers) and must not transport any data or chew up bandwidth without giving the programmer and end user total control over this. It needs to have a dead simple, almost idiot-proof API. The resulting network on top of IP needs to be searchable, not too high latency, and able to route to any endpoint on it.
That's still the biggest hurdle for the Open Web. Everything else is secondary.
The confusion in your comment is that one would need RDF to do Linked Data. I've written about that misconception here: https://ruben.verborgh.org/blog/2018/12/28/designing-a-linke...
Don't get me wrong, the Semantic Web community has made mistakes and has not been developer-friendly. But we're not still stuck in the 90s. For instance, XML hasn't been a part of any of this for many years.
AKA instead of "Eval is Evil" we might instead say "XML is Evil"
With open-science mandates coming from governments around the world, researchers are looking for ways to share their data in meaningful ways. I can think of a significant amount of research that regularly consumes RDF, particularly in the fields of medical biology and genomics, where it's used to annotate data. This is where I'd guess you'll see it take a foothold; for example, medical diagnosis codes are notoriously disparate, and there is a strong appreciation for what semantics could address. Unify, exchange, and consume medical diagnoses ... profit.
Links etc. off the top of my head:
* GO - The gene ontology, used in hundreds of thousands of genomic annotations https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3944782/
* UBERON - https://genomebiology.biomedcentral.com/articles/10.1186/gb-...
* The second year of US2TS - http://us2ts.org/2019/posts/registration.html
* OBO foundry - https://github.com/OBOFoundry/OBOFoundry.github.io
Where is a good place to participate in those debates, especially data authenticity and local server pods?
FreedomBox didn't go anywhere. FreeNAS with ZFS is reliable but not designed to be exposed to the public internet. Many local services are using a centralized rendezvous server for NAT hole punching.
On the shiny commercial front, MyAmberLife has $13M in funding for a home server but it's mostly controlled by a central cloud service. Do Western Digital, Synology, QNAP, Drobo, etc care about decentralization?
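To make the rendezvous-server pattern concrete, here is a minimal sketch of the UDP hole-punching message flow. Everything runs on localhost, so no NAT is actually traversed; the point is the order of messages. The function names and the "host:port" wire format are assumptions. Real implementations need retries, keepalives, and relay fallbacks, and simple UDP hole punching generally fails behind symmetric NATs.

```python
import socket

def udp_socket() -> socket.socket:
    """A UDP socket on an ephemeral localhost port with a short timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    sock.settimeout(2.0)
    return sock

def encode_addr(addr) -> bytes:
    return f"{addr[0]}:{addr[1]}".encode()

def decode_addr(data: bytes):
    host, port = data.decode().rsplit(":", 1)
    return host, int(port)

def rendezvous_demo():
    """Register with a rendezvous server, exchange endpoints, then talk directly."""
    server, peer_a, peer_b = udp_socket(), udp_socket(), udp_socket()
    try:
        server_addr = server.getsockname()

        # 1. Each peer registers; the server records the source address it observes
        #    (behind a NAT this would be the peer's public IP:port mapping).
        peer_a.sendto(b"register", server_addr)
        _, seen_a = server.recvfrom(1024)
        peer_b.sendto(b"register", server_addr)
        _, seen_b = server.recvfrom(1024)

        # 2. The server tells each peer the other's observed endpoint.
        server.sendto(encode_addr(seen_b), seen_a)
        server.sendto(encode_addr(seen_a), seen_b)
        target_for_a = decode_addr(peer_a.recvfrom(1024)[0])
        target_for_b = decode_addr(peer_b.recvfrom(1024)[0])

        # 3. Both peers send to each other directly; on real NATs these outbound
        #    packets are what open the mappings ("punch the holes").
        peer_a.sendto(b"hello from a", target_for_a)
        peer_b.sendto(b"hello from b", target_for_b)
        return peer_a.recvfrom(1024)[0], peer_b.recvfrom(1024)[0]
    finally:
        for sock in (server, peer_a, peer_b):
            sock.close()
```

Note that the rendezvous server is only needed for step 2; once the peers know each other's endpoints, the data path no longer touches it.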
The current solution to several issues related to electronic health records was to create a new standard that uses RDF and linked data, which solved most of the issues with the previous standard. See FHIR: https://en.wikipedia.org/wiki/Fast_Healthcare_Interoperabili...
In fact, it seems to me that linked data discussions are becoming relevant again because it is now clearer that we have been misusing/overusing/badly applying REST, microservice architectures, and GraphQL for problems that were already analyzed and solved.
But, of course, for a single application that doesn't require interoperability, standardized data exchange formats, or support for flexible data representation, Linked Data and RDF will clearly be unnecessary. But over time, the future of data interconnection plays on the side of Linked Data, IMHO.
So far, attempts to create alternative infrastructures to Linked Data + RDF are more likely to produce ad hoc, informally specified, bug-ridden, slow implementations of Linked Data and RDF.
And just to make sure we are on the same page here: it's not academics' job to build usable products. We will continue working on things that are novel from the academic standpoint; if people like you dismiss LD/SemWeb, those novel things will have "a terrible track record of solving real-world problems". I hope this does not come across as too personal.
And using the semantic web for that is just as bad. A basic JSON API would be much more stable than parsing a document with navigation and the like just to get that data.
Highly unlikely someone will bother with this (in addition to all the other quirks) while making their website.
For example, I can easily fire up a Tor onion service on my never-turns-off home desktop computer and reach my stuff from anywhere. Why can't I reach my friends' stuff the same way? Because, to use business-speak, there's nothing "turnkey". It's something I've been pondering and working on. Sure, the bigger players may have to be in DCs, have more stringent uptime requirements, and distribute their bandwidth/workload more. But for most of us, desktop software and web-of-trust style connections could go a long way so long as the front of the software has a FB feel (e.g. a feed, messages, etc). We can tackle discovery, searching, aggregation, offloading, etc later.
Just look at ActivityPub. It's essentially OStatus, but instead of XML we slapped namespaces on JSON, wrote a bunch of overly complex preprocessing procedures so that everyone can output data just the way they want[1], and still made half the spec ambiguous enough[2] that implementers essentially follow the one rule that matters: maintain compatibility with Mastodon.
[1]: https://www.w3.org/TR/json-ld-api/#algorithm-5
[2]: https://please-just-end.me/ap.html#block-activity-outbox (domain name relevant to content)
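A small example of the flexibility being complained about: in ActivityStreams 2.0, addressing fields like `to` and `cc` may be a single IRI string, an embedded object with an `id`, or an array mixing both, so every consumer ends up writing normalization like the sketch below. This is hand-rolled normalization, not the JSON-LD expansion algorithm referenced in [1]; the function names are hypothetical, and it does not handle compact forms such as `as:Public`.

```python
from typing import Any, List

def as_id_list(value: Any) -> List[str]:
    """Normalize an ActivityStreams property that may be absent, a single IRI string,
    an embedded object with an "id", or an array mixing both, into a list of IRIs."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    ids = []
    for item in items:
        if isinstance(item, str):
            ids.append(item)
        elif isinstance(item, dict) and isinstance(item.get("id"), str):
            ids.append(item["id"])
        # Anything else (e.g. an anonymous embedded object) is skipped in this sketch.
    return ids

def audience(activity: dict) -> List[str]:
    """Collect every addressee from the "to" and "cc" fields, without duplicates."""
    seen = []
    for iri in as_id_list(activity.get("to")) + as_id_list(activity.get("cc")):
        if iri not in seen:
            seen.append(iri)
    return seen
```

Two activities that are equivalent on the wire but shaped differently (string vs. array, reference vs. embedded object) normalize to the same audience list.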
https://github.com/kazarena/json-gold/blob/master/ld/api_nor...
Nothing proves his point more than:
<script src="//www.google-analytics.com/analytics.js...
First, this idealistic idea that "we" are going to take back our data. Who is this we? Only the smart, high-agency people who have time to spare. The commercial web is increasingly tuned to the normal user, who is low-agency and easily led around. Who will win a battle of user acquisition and retention? Facebook or the rebels? Facebook of course. So any solutions proposed here are just for a tiny percentage of users who will then be isolated from the real and useful social networks. Or more realistically use both.
Or maybe if the infrastructure is built, a layer of savvy entrepreneurs can emerge to monetize it? I'm thinking of reaganemail, selling an anti-google email account to the AM radio crowd.
Second, the idea of somehow eliminating censorship. De facto censorship will always exist, even if you sugarcoat it as Twitter has tried - "your content is still there, but only if someone explicitly looks for it". Any platform without censorship will just be flooded by every marketer and political zealot, for starters.
Also, I think he is conflating filter bubbles with centralization. Without centralization, wouldn't we still have filter bubbles as people self-select into their online communities?
Supposing we manage to solve this problem, what's to say average people can't participate in 10 years or so when the tech has been made easier to use?
It didn't start centralized. Centralization happened. I might be more cynical than I should be but as a designer I struggle to see the future in which we have social dynamics that favor decentralization instead of convergence into a less self-managed system (i.e. all current centralized networks).
Perhaps, but those would be self-selected, not imposed by the provider. Big difference.
Don't get me wrong, I'm all for projects like this. I think it's wonderful. I just never really got how the apps will work with the same data without being forced into a particular data model (which seems like it would limit what you could do).
I'm fairly confident that 98% of the population of the earth doesn't give a crap that their data is collected, or that they don't "control" it. This whole "decentralized web" thing is just privacy nerds trying to convince us that we need this, when really no regular consumer is asking for it.
We have parallels from other platforms - specifically the fixed and mobile phone networks.
There used to be monopolies in local phone service. There were new competitors, but to change provider, you had to change phone number.
Even changing cell phone provider required a number change.
This obviously had strong network effects pulling you to stay with your provider. You had to tell _everyone_ in your extended network where to find you and have them update all of their business records when you changed from one carrier to another.
Eventually, everyone figured out this was stupid, and Number Portability [1] was forced on carriers by regulation.
This problem is completely gone now. You can take your number with you.
If we allow people to take their data to new social networks, and force federation, then we will get decentralization. However, it won't happen without regulation anymore than it did with the phone companies.
[1] https://en.wikipedia.org/wiki/Local_number_portability#Histo...
The government then says "You have to allow competitors access to X, and you have to do it by date Y".
Then the companies get together and agree on how to do it because they agree that government dictated standards suck. There is usually some jostling around with someone wanting to run a centralized database for a nice per-transaction fee. Typically this is tossed out in committee, but not always.
*disclaimer: I help develop Bunsen Browser, the mobile companion for Beaker Browser.
Choosing between service providers is no more meaningful for privacy than asking Windows users to download arbitrary apps. If smart phones are any more secure than desktops, it's because Apple and Google are constantly improving OS-level security and policing their app stores for malware.
Of course app stores have well-known flaws. But if we want to do better than that, someone has to figure out a better way to choose good rules and enforce the rules better.
It's a p2p caching proxy that also lets you edit web pages collaboratively in realtime over a LAN or the internet. It has a contacts list system and p2p chat functionality. This project effectively died due to lack of interest and I still have various security concerns about it (Should you break/reimplement Same-Origin policy or break/reimplement the TLS chain of trust?)
The main security concern is that because it decentralises HTTP in-place (existing URLs can now be looked up on an arbitrary number of overlay networks if the original URL isn't providing an OK response) it puts users at risk of malicious actors spamming overlay networks with browser exploits for popular resources like "news.ycombinator.com/".
I hope TBL and co converge on satisfying answers to these problems or constrain their design to not bother with decentralising existing URLs in-situ.
Code lives here: https://github.com/Psybernetics/Synchrony
Feel free to shoot me any questions.
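The in-place fallback described for Synchrony (look a URL up on overlay networks when the origin doesn't return OK) can be sketched roughly as follows. This is not Synchrony's code; the fetcher callables and the `(status, body)` response shape are hypothetical stand-ins. The docstring marks where the security concern bites: an overlay answer carries no proof that it came from the origin.

```python
from typing import Callable, Iterable, Optional, Tuple

Response = Tuple[int, bytes]           # (HTTP status code, body); hypothetical shape
Fetcher = Callable[[str], Optional[Response]]

def fetch_with_overlay_fallback(url: str, origin: Fetcher,
                                overlays: Iterable[Fetcher]) -> Optional[Tuple[str, bytes]]:
    """Try the origin first; if it does not answer 200 OK, ask each overlay network.

    Returns (source, body), or None if nobody has the resource. An overlay answer is
    only as trustworthy as whoever published it: nothing here ties the body to the
    origin's TLS identity, so a malicious peer can serve anything for a popular URL.
    """
    response = origin(url)
    if response is not None and response[0] == 200:
        return "origin", response[1]
    for index, overlay in enumerate(overlays):
        response = overlay(url)
        if response is not None and response[0] == 200:
            return f"overlay-{index}", response[1]
    return None
```

Returning the source label alongside the body at least lets the browser treat overlay content differently from origin content, which is where the Same-Origin and TLS questions above come in.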
From what I understand, the proposal here doesn't seem to allow for the advertising model. I don't think a service can grow and survive by making people pay, because people are too cheap.
There might be a better chance for something like this if they allow for the economics:
- Maybe the data host can provide an "advertising" profile which the user has control of. This can be exposed to the application hosts to allow for advertising.
- Maybe you also throw micropayments into the mix, along with bartering for information or micropayments.
Another issue is complexity. A number of comments have talked about over-engineered solutions and protocols. This decentralized idea could be started with something small like an open social network standard. I think I saw something similar to this on HN not too long ago:
- You have a web site, which is your profile. A provider could give you a nice editor for it.
- You have a feed, where you can put pictures, short posts, long posts, whatever. This is distributed with RSS. (The host makes this all seamless for you.)
- Identity is controlled with OAuth, used only to give an identity to visiting users. The owner can manage permissions for certain remote users (his "friends").
Such a service could be managed on your own web server, or there could be different cloud providers that make this arbitrarily easy, with arbitrary levels of functionality on the "profile" page, the "feed", and the "friend" permission management.
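The feed part of that proposal needs very little: a minimal sketch of rendering posts as an RSS 2.0 document with only the standard library. The post dictionary shape, the URLs, and the function name are hypothetical; identity (OAuth) and per-friend permissions are not covered.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def build_rss(site_title: str, site_url: str, posts: list) -> str:
    """Render posts (dicts with title, url, body, published) as an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Posts from {site_title}"
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["url"]
        ET.SubElement(item, "guid").text = post["url"]
        ET.SubElement(item, "description").text = post["body"]
        # RSS 2.0 dates use the RFC 822 format, e.g. "Mon, 07 Jan 2019 12:00:00 +0000".
        ET.SubElement(item, "pubDate").text = format_datetime(post["published"])
    return ET.tostring(rss, encoding="unicode")
```

A hosting provider would regenerate this file whenever the owner posts; subscribers' readers poll it, so no central service sits between them.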
This whole article looks like "well, the obstacles are not technological, but let me write a few pages about technology anyways".
If the obstacles are not technological, then we need non-technological solutions. So far, I think GDPR is one such non-technological step towards taking back control of our personal data.
The hardest problem in my opinion is "preventing the spread of misinformation" because we essentially need a way to distinguish between malice and stupidity. Without mind-reading I do not see how this could be possible at scale.
There are stronger alternatives. We need to make a push to begin using them.
You then need an alternative name system which links a unique human-readable name to a public key. This is the trickier part (see Zooko's triangle), but there are some creative solutions like Namecoin and the Blockstack Name Service.
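Zooko's triangle says a naming system can offer at most two of: human-meaningful, secure, decentralized. Tor v3 onion addresses are a concrete example of the "secure + decentralized" corner: the name is derived directly from an ed25519 public key, following Tor's rend-spec-v3 (`base32(PUBKEY | CHECKSUM | VERSION) + ".onion"`, with a truncated SHA3-256 checksum). The sketch below assumes you already have the 32-byte public key; systems like Namecoin try to add the human-readable layer on top of such keys.

```python
import base64
import hashlib

def onion_v3_address(public_key: bytes) -> str:
    """Derive a Tor v3 onion address from a 32-byte ed25519 public key.

    The name is self-certifying: anyone can check that a key matches the name, and no
    registry is involved, but the result is not human-meaningful.
    """
    if len(public_key) != 32:
        raise ValueError("expected a 32-byte ed25519 public key")
    version = b"\x03"
    checksum = hashlib.sha3_256(b".onion checksum" + public_key + version).digest()[:2]
    # 32 + 2 + 1 = 35 bytes encode to exactly 56 base32 characters, with no padding.
    return base64.b32encode(public_key + checksum + version).decode("ascii").lower() + ".onion"
```

The 56-character result is exactly the kind of name nobody can remember, which is why the human-readable mapping is the hard part.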
I'm pretty sure there aren't better alternatives.
In practice, I am satisfied with just using my own domain for email, my web site, and self-hosted blog. For communication I like FaceTime so I can see people while I am talking with them, phone, and email.
I still use social media, very occasionally, to see what people are doing and sometimes advertise my new open source projects and updates, and any books I write. Most of the problems people talk about with Facebook/Twitter don’t bother me as long as I only use the systems infrequently. I am not tempted to cancel my accounts.
Ask yourself: who is this for? People who are not already deeply passionate will stop reading unless they are engaged in a minute of reading. Note that a minute is being extremely generous; on a commercial consumer site, it's apparently an average of 7 seconds before someone will click away.
I recommend that you check out this video and reconsider how you might reframe your message as a call to action that speaks to a better future we can create together.
https://youtu.be/qp0HIF3SfI4?t=121
I even jumped you to the good part.
If we still don't have decentralization, it's because it is not as easy.
The solution involved running a mesh network with nodes on users' laptops or desktops and a corresponding node in the cloud. These nodes would index local data and provide replication of metadata across nodes and backup of actual data to the cloud node.
A locally running web app acted as replacement for 'windows explorer'. It allowed the user to access all their files and folders across all their nodes, access them (open document, play music/videos, see contacts etc), create smart collections and share these files, folders or collections with other users in a secure authenticated and private manner.
Each user got an identity, which comprised a dedicated domain (or subdomain) and a PKI certificate tied to that domain. Each node had its own private key, and the nodes' public keys were tied together by the identity certificate.
All communication between nodes (of the same user or across users) was authenticated and encrypted using these identity/node keys and certificates. No central node existed in the system that could spy on these activities. The architecture separated the network discovery cloud nodes from your data cloud nodes and allowed your data cloud nodes to be hosted separately anywhere (say, in your own cloud instances).
This is the only system I have seen that utilized zero knowledge protocols and made it accessible to common people to manage their data and share with others as well.
But unfortunately, as a business it never took off. It got acquired by EMC and merged with Mozy (the good old data backup company), and then this product died a silent death in 2010.
Maybe it was timing; had this product launched after Snowden, it might have done well.
But now, I think a more urgent and a relatively less complex problem to solve is one of distributed communication. In this era of always connected powerful devices (mobile phones, home gateways), why don't we all have our own personal email/chat servers that nobody else can spy on? Why does email and chat have to get relayed via big aggregators who mine so much data as well as metadata?
Not only do they violate privacy, they succumb to security breaches and cause serious damages.
I feel the stage is set for this disruption: crypto protocols, always-on cheap connectivity, compute power at the edge, and sensitivity to privacy/security in general population – all of these ingredients are appropriately set right now for this to happen.
Maybe this is a lesson that we need to be less tolerant towards the creation of centralised services because those with money and power will seek to bring decentralised systems under their own control.
- GPU passthrough VM (gaming)
- SATA passthrough (FreeNAS)
- multi NIC passthrough (pfSense/OpenWRT)
- app server/cloud/P2P Linux or FreeBSD VM(s)
http://unraid.net sells a KVM-based product. VMware ESXi and XenServer are free. Connect a Ubiquiti AC-Lite WiFi access point to a dedicated NIC on the x86 box, and WAN to another NIC. Since pfSense owns the WAN NIC, it can host a VPN server for your devices, including mobile. All VMs get virtual NICs. A Dell T30 with a quad-core Xeon and ECC costs about $400 with 8GB RAM and a 1TB disk; it can hold 4 x 3.5" drives (20 TB in RAID-1) and 2 x 2.5" SSDs.
Level1Techs has intro videos on home servers: https://www.youtube.com/results?search_query=level1+home+ser...
Advantages:
- Stable and boring x86 platform
- Good performance for gaming
- Commercially supported hardware
- Upgradeable storage and GPU
- Upgradeable router software
So Microsoft moved Skype to a centralized service and has been trying to monetize it ever since.
The problem with decentralized servers isn't technical; something half as fast as your phone could easily handle distributed versions of popular websites. The ARM-based "wall warts" were plenty fast, and they are several generations old already.
How could decentralized applications/services be sustainably funded? If not through advertising, how? If through advertising, what's the benefit to users?
Most importantly, why would users care about decentralized vs centralized?
It does seem that a modest ARM-based server that's silent, potentially integrated into a WiFi router, would hugely reduce the downsides of p2p networks. With free power and cheap bandwidth, and by staying part of the p2p network, it would avoid long startup times for applications. Users on their phones would get instant access to their data while their local node did any proof of work, DHT tracking, and reputation earning necessary to use bandwidth, CPU, and storage from other peers.
I don't see any technical barriers, just that users wouldn't care, and nobody would want to pay for it.
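As a rough picture of the "DHT tracking" such a node would do: Kademlia-style DHTs (BitTorrent's Mainline DHT is one) give every node a 160-bit ID, define the distance between two IDs as their bitwise XOR, and answer lookups by returning the known contacts closest to the target. The sketch below shows only that core idea; deriving IDs by hashing a name and the function names are assumptions, and routing buckets, RPCs, and churn handling are omitted.

```python
import hashlib
from typing import Iterable, List

def node_id(name: str) -> int:
    """Hypothetical 160-bit ID derived from a name (real DHTs often use random IDs)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: the bitwise XOR of two IDs, read as an integer."""
    return a ^ b

def k_closest(target: int, known_nodes: Iterable[int], k: int = 3) -> List[int]:
    """Return the k known node IDs closest to target under the XOR metric."""
    return sorted(known_nodes, key=lambda node: xor_distance(node, target))[:k]
```

An always-on home node can keep its routing table warm this way, so a phone joining later only has to ask its own node instead of bootstrapping into the DHT from scratch.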
Probably few, except those actively searching for it. And, especially with decentralisation, the inevitable outcome of being too popular is that it starts to become centralised again, to make things easier.
Besides that, I thought for a moment about Apple's tech. If you and a friend have an iPhone and you're both trying to connect to the same network (and have each other in your contacts, I think), iOS will allow you to automatically share the credentials and connect the other phone too.
I reckon that you'd see a lot of value in that kind of device integration, which is essentially peer-to-peer.
How any other open source project is funded (e.g. corporate/individual donations, grants, crowdsourcing, support, ancillary products, etc). I don't believe that, at least for an MVP, much funding is needed compared to the scope of some of the successfully funded open source projects that exist.
> Most importantly, why would users care about decentralized vs centralized?
They wouldn't. And ideally, beyond the annoying hoops on initial versions (e.g. discovery/identity), they shouldn't. Your software needs to win on features. A self-hosted, subscribable Reddit clone w/ chat would be a very good start.
Routers have a few key advantages over most other computing devices owned by the public: routers normally have a public (non-NAT) IP address, they're always on, and battery power is not a concern. If people could install a Tor implementation on their router with just a few taps, Tor usage could expand dramatically. Developers of decentralized social networks might finally get a foothold once the installation problems are gone.
IMHO the best way to re-decentralize the Internet is by creating routers that host arbitrary apps, along with a marketplace of router apps.
The way to bootstrap this idea seems simple: sell a new router with a thin margin, provide SDKs for free, let developers charge what they want for the apps, and take a 30% cut from app purchases. The plan seems so clear that I wonder why I haven't yet heard of anyone doing it. :-)
> routers that host arbitrary apps
Servers. You are describing servers.
Edit: upon reflection, rather a lot of people I know don't have routers, either; they use shared internet (e.g. xfinity) or only have mobile plans.
Your ISP will still know what you are doing, and will have the ability to block you from doing it should the need arise.
I also feel like people are proposing solutions based on technologies as they stand today. Chosen solutions to the decentralization issue need to consider realistic future cases. As a quick thought experiment: what happens in the near future, say 3 to 5 years out, when a huge chunk of people use 5G as their primary connection? AT&T, Sprint, Verizon, or whoever will still get all of your traffic information in this very plausible future. "It's encrypted" or "It's going through a relay" is just not a sufficient response if you want privacy from AT&T. I mean, think about it: chances are, the relay will be using AT&T too.
Which brings me to the big problem with solutions like these, ie - inevitable recentralization. Google, or Facebook, (and now because of how this new decentralization idea works, even AT&T), are in an almost unassailable position to act as the "switchboard" for all of this non-indexed data. Need to know where your aunt, who just moved, is on this new decentralized network? Are you going to ask google? or "WeAreDecentralizationIdealists.com"? Oh, you're going to ask your own node? Sorry, her new information has not propagated to your node yet. Check back tomorrow. Oh forget it, just call your aunt and ask her to give you the new information so you can input it manually.
No, she will need to be registered somewhere to seamlessly communicate changes to her node's connection information. And that "somewhere" will likely be a BigCo.
If we want to replace the behemoths, we need to come up with solutions that are just as easy to use (easier, actually) and avoid any possible blocking or tracking. That requires some very creative people to think radically. Easily blockable in-home network nodes, I think, not only fail to really solve the problems but are also doomed to failure usability-wise when compared with Google or Facebook.
Meh, you don't need another box; just keep the family desktop on. But even without that, there is still value. ISP tracking isn't much of an issue when using an existing network like Tor. The software starts up, reads the locally encrypted SQLite DB for your list of friends' onion IDs, and connects to the gRPC services your friends are hosting on their machines as onion services. It maintains that stream and begins receiving feed data from these other servers (caching locally as it arrives, which is more ideal in an ephemeral world than live retrieval, though it can be a mix of both depending on settings). All without the ISP knowing a thing except that you are connected to Tor.
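A minimal sketch of the local-cache half of that flow, using Python's built-in `sqlite3`. The table layout, the function names, and the placeholder onion ID are all hypothetical; encryption at rest and the Tor/gRPC transport are deliberately left out, so this only shows why caching on receipt lets the feed render while friends are offline.

```python
import sqlite3

# Hypothetical schema; a real client would also encrypt this file at rest.
def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS friends (onion_id TEXT PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        " onion_id TEXT, post_id TEXT, body TEXT,"
        " PRIMARY KEY (onion_id, post_id))"
    )
    return db

def add_friend(db, onion_id, name):
    db.execute("INSERT OR REPLACE INTO friends VALUES (?, ?)", (onion_id, name))
    db.commit()

def cache_received(db, onion_id, items):
    """Store items streamed from a friend's service; re-delivered items are ignored."""
    db.executemany(
        "INSERT OR IGNORE INTO posts VALUES (?, ?, ?)",
        [(onion_id, item["id"], item["body"]) for item in items],
    )
    db.commit()

def local_feed(db):
    """Read the feed from the local cache only, so it works while friends are offline."""
    return db.execute(
        "SELECT f.name, p.body FROM posts p JOIN friends f USING (onion_id)"
        " ORDER BY f.name, p.post_id"
    ).fetchall()

if __name__ == "__main__":
    db = open_store()
    add_friend(db, "exampleaunt.onion", "Aunt May")  # placeholder, not a real v3 onion ID
    cache_received(db, "exampleaunt.onion", [{"id": "1", "body": "Moved house!"}])
    print(local_feed(db))  # [('Aunt May', 'Moved house!')]
```

The composite primary key plus `INSERT OR IGNORE` makes re-delivery after a reconnect harmless, which matters when streams drop and resume.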
> Which brings me to the big problem with solutions like these, ie - inevitable recentralization.
Yup. Can't easily get around this. People are going to gravitate towards what's easier and what they want on the outside, ignoring what's on the inside. It happens to most continually adopted standards, even if it's just a more trusted server with more uptime. And that's ok; I don't want to win some ideological battle at the cost of user happiness. I completely agree the software must be so easy you can't tell what's under it, but I don't think it requires that radical/creative thinking. Just user-oriented effort instead of the constant barrage of difficult-to-setup tech demos.
Putting an automatically configured VPN on such a box would be extremely easy, no?
I have family to think about: I don't have time to update my servers every time some new zero-day is fixed. I'd rather pay someone else to deal with those details. My five-year-old is growing up; time with him is far more important than fixing security holes.
Nah, your auto-updating desktop is fine. The software itself might evolve to a hands-off, evergreen-type approach, but for now a desktop daemon is fine. Keeping servers updated seems so non-trivial because we visualize it in an ops sense, like we're at work.
Often upload is 10x slower than download.
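To put rough numbers on that asymmetry, here is a back-of-the-envelope sketch. The file size, number of friends, and link speeds are made-up illustrative values, and it ignores protocol overhead and any re-sharing between recipients.

```python
def seconds_to_serve(size_mb, recipients, upload_mbit_s):
    """Time to push one file to every recipient over a single home uplink.

    size_mb is in megabytes and upload_mbit_s in megabits per second (8 bits per byte).
    Ignores protocol overhead, relays, and recipients re-sharing the file.
    """
    return size_mb * 8 * recipients / upload_mbit_s

# Illustrative only: a 5 MB photo sent to 40 friends.
print(seconds_to_serve(5, 40, 100))  # 16.0 s if upload matched a 100 Mbit/s downlink
print(seconds_to_serve(5, 40, 10))   # 160.0 s on a 10x slower 10 Mbit/s uplink
```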
I think the better solution is to leverage cheap vms. At least performance stands a chance. It's just a matter of making cloud computers accessible and usable to non technical people.
Home servers are a very difficult sell (see $500 Helm) compared to VMs running in a data center and IMO the privacy difference is mostly illusory.
^ That is the reaction to expect from normal desktop users. I mean literally: download an exe and up pops your feed, ready for you to add your friends, favorite businesses, news sites, link aggregators, etc. given their onion ID (yes, the onion ID is annoyingly long, especially v3, but discovery/identity comes later; don't let it hold up the system).
I'm not convinced you need a "home server" in the traditional sense. Just accept what you lose, uptime, if you use your laptop or phone to do the hosting. You can share between them too given a synced private key, which is the software's job, not the user's. Still, an ephemeral social network self-hosted on the desktop can go a long way (and again, people will let the need for uptime drive their always-on desktop decision). This stuff requires so few resources to start that a cheap Raspberry Pi with an install/reach-from-other-device setup would work just fine if they don't have a home computer and want one just for this. Large storage can come later.
I do agree the privacy difference is minimal.
There are challenges like establishing initial connections, or push notifications, but these can hopefully be worked out.
This same thought has popped up in my mind as well before... I think it's a probable future only if these devices are discrete plug-in and forget machines and generate revenue for the owner.
I'm running Solid on my own box, and I can't see myself doing it any other way, but it was pretty hard to set it up. We need to change that.
I think you radically overestimate the desire and ability of the average computer-user to consider their devices' uptime.
"Al Gore claims he's an environmentalist, yet he flies using his personal gas guzzling plane."
And yeah, it's an ad hominem. "Is a fallacious argumentative strategy whereby genuine discussion of the topic at hand is avoided by instead attacking the character, motive, or other attribute of the person making the argument"
You can most definitely use centralized servers to disseminate decentralization. I'm pretty sure https://ipfs.io started as a github project and on a monolithic server backed by cloudflare. That doesn't dismiss them in the least - considering they now heavily dogfood.
Yes, I track how popular what content is on my site. Motivates me to write more. Please feel free to block trackers; I do that as well.
I don't care if people run analytics; it's just when they go sharing all that data with a third party that it gets troubling.
You can talk about it all you want, but as long as you continue to participate in the very thing you rail against, you're going to struggle to be taken seriously.
> "But look, you found the notice, didn't you?" "Yes," said Arthur, "yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard.'"
So, basically, there is one data model, RDF, but RDF does not require the same set of fields; on the contrary, you are free to write your own. Obviously, you wouldn't get good interoperability if you do. So, there are several things you can do:
1) Adopt what others are using. 2) Map your "fields" (we'd rather call them vocabularies) to the stuff others are doing, and rely on apps to figure out interop using reasoners. 3) Don't care; your app will work fine for you.
I mean, 3) is fine, it is just that you'd be missing out. 2) also works, kinda, but reasoners aren't all that easy to use, so I'd mostly like to see people go for 1).
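A minimal sketch of option 2, with triples as plain Python tuples instead of a real RDF library. The `https://example.org/...` vocabulary and resources are hypothetical; `rdfs:subPropertyOf` and the schema.org `text`/`author` properties are real terms. The "reasoner" here applies only one RDFS rule (rdfs7), which is far less than a real reasoner does, but it shows how an app that only understands schema.org can still read data written with home-grown terms.

```python
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
SCHEMA = "https://schema.org/"
MY = "https://example.org/myvocab#"  # hypothetical home-grown vocabulary

data = {
    ("https://example.org/c/1", MY + "body", "Nice article!"),
    ("https://example.org/c/1", MY + "writtenBy", "https://example.org/alice#me"),
}

# Option 2: declare how my terms relate to a widely used vocabulary.
mapping = {
    (MY + "body", RDFS_SUBPROP, SCHEMA + "text"),
    (MY + "writtenBy", RDFS_SUBPROP, SCHEMA + "author"),
}

def apply_subproperty_rule(triples):
    """RDFS rule rdfs7 (p subPropertyOf q, s p o => s q o), iterated to a fixpoint."""
    graph = set(triples)
    while True:
        sub = {(p, q) for (p, r, q) in graph if r == RDFS_SUBPROP}
        new = {(s, q, o) for (s, p, o) in graph for (p2, q) in sub if p == p2} - graph
        if not new:
            return graph
        graph |= new

def values(graph, prop):
    return sorted(o for (_, p, o) in graph if p == prop)

inferred = apply_subproperty_rule(data | mapping)
print(values(inferred, SCHEMA + "text"))  # ['Nice article!']
```

Option 1 would simply write `SCHEMA + "text"` in the first place, which is why it needs no inference step at all.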
So, we need to make it really easy to find existing stuff. You could go for the big one, i.e. https://schema.org/, or you could go into more detail and look at https://lov.linkeddata.es/dataset/lov/ . The former has a lot of traction, the latter is really decentralized, so I kinda prefer that.
Then, we have to make it really easy to author new stuff when you can't find existing stuff, because that will happen. Then, we need to make it easy for others to find yours, so that they can start using it too for similar applications. And I'm thinking that it will be kind of a graduation process, where you first look for existing stuff, and when you fail to find anything, you just mint your own without thinking about others, just to get something that works up and running. Once your app starts gaining traction, you tighten it up, and if something else then gets popular, you can migrate to that with little disruption.
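The graduation process can be sketched in a few lines. Everything here is a hypothetical stand-in: `KNOWN_TERMS` plays the role of a search against schema.org or LOV, and the `example.org`/`example.net` namespaces are placeholders for your own minted vocabulary and for whatever shared term later becomes popular.

```python
# Hypothetical catalogue standing in for a vocabulary search; entries are illustrative.
KNOWN_TERMS = {
    "author": "https://schema.org/author",
    "text": "https://schema.org/text",
}
OWN_NAMESPACE = "https://example.org/myapp/vocab#"  # hypothetical namespace we control

def resolve_term(label, catalogue=KNOWN_TERMS, namespace=OWN_NAMESPACE):
    """Step 1: reuse an existing term if one is found. Step 2: otherwise mint our own.
    Returns (IRI, minted?)."""
    if label in catalogue:
        return catalogue[label], False
    return namespace + label, True

def migrate(triples, old, new):
    """Step 3: once a shared term gets popular, rewrite our minted predicate to it."""
    return {(s, new if p == old else p, o) for (s, p, o) in triples}

iri, minted = resolve_term("upvotes")
print(iri, minted)  # https://example.org/myapp/vocab#upvotes True
```

Keeping minted terms under a single namespace you control is what makes the later migration a mechanical rewrite rather than a hunt through the data.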
So, we're not there yet, but we're thinking and working on it a lot.
I encourage you to read the article, where you'll see that I'm arguing from a permissionless innovation perspective, not so much privacy.
Plenty of services are API compatible with Amazon S3 (e.g. anyone can run their own S3 clone) so people can modify existing sites to use S3 with OAuth. Use OAuth to allow the user to delegate access to their S3 service link. No new protocols needed, no big innovations required.
But for this to work on anything other than the most rudimentary data (media files, blog posts, and serialized data) would require completely changing the way all modern applications are written. Databases would all have to change, APIs would all need to follow specific standards, and networks would need to become a hell of a lot more stable, higher bandwidth, and lower latency.
Assume you're Twitter, and you want to map-reduce all of the data of all your users to find out how many people retweeted a user, and then notify those users. Now you need to connect to every user's service provider, get their data, store it temporarily on your own servers, duplicate everything, do your processing, and then write changes back to all storage services for all users. Now do this every second. If you don't, you have to store this map-reduced data on your own service's storage, which violates the principle of only using the user's storage pod.
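To make the cost concrete, here is a minimal sketch of that pull-based map-reduce, assuming each user's pod is just an in-memory list and `fetch_pod` is a hypothetical stand-in for a network request to that user's provider. The logic is trivial; the problem the comment describes is that every run has to touch every pod.

```python
from collections import Counter

# Hypothetical pods: each user's storage holds only that user's own actions.
pods = {
    "alice": [{"type": "retweet", "of_author": "carol"}],
    "bob": [{"type": "retweet", "of_author": "carol"}, {"type": "like", "of_author": "alice"}],
    "dave": [{"type": "retweet", "of_author": "alice"}],
}

def fetch_pod(user):
    """Stand-in for one remote request per user, per run."""
    return pods[user]

def count_retweets(users):
    # Map: read every pod and emit (author, 1) for each retweet.
    pairs = ((a["of_author"], 1) for u in users for a in fetch_pod(u) if a["type"] == "retweet")
    # Reduce: sum per retweeted author.
    totals = Counter()
    for author, n in pairs:
        totals[author] += n
    return totals

print(count_retweets(pods))  # Counter({'carol': 2, 'alice': 1})
```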
In fact, data would have to become more centralized to work in this model. Currently, application data exists across a range of services in a variety of networks, all of it being dynamically accessed in different ways before it is accessed by a user. There are dozens of different databases used just to open up the TV Guide on your cable company's set-top box. All of that would have to be centralized in one or two databases in order for the storage and processing to be disconnected.
Not only that, but a lot of data is useless to anyone but the original service provider or original application. Only a Facebook clone would be able to use Facebook's data, and only data relevant to Facebook's ad sales should stay on Facebook's servers, even if it contains "Peter clicked on ad X at Y time". Should there be a separation of what kind of data gets decentralized? Do we really want to go down the rabbit hole of what is my data, and what is data about me that a company has originated and created value from? (Is a picture mine because it's a picture of something I own, or is it mine if I took the picture?)
The idea that every component of every application could be completely decentralized from each other is unlikely. Now, what is more in the realm of possibility is doing a Google or Facebook, and creating features that allow exporting or importing all data. But that process is not perfect, and the procedure can take from minutes to days. And to use this data it would still all have to follow standards specific to a particular application.
And again, we already have a lot of these data standards. We have standards for most of the kinds of data that exist today, such as calendar, contacts, e-mail, instant message, voip, office documents, images, and so on. We have standards to synchronize and syndicate data feeds. We have standards to federate accounts and manage permissions. But commercial sites don't natively build these features as interoperable with each other - because, why would they?
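As one example of such a standard, a calendar event can be exported as iCalendar (RFC 5545) with nothing but string formatting. This is a minimal sketch: the `PRODID` value and the event are made up, it expects timezone-aware datetimes, and it skips the text escaping and line folding a complete implementation needs.

```python
from datetime import datetime, timezone

def to_ics(uid, summary, start, end, stamp=None):
    """Serialize one event as a minimal iCalendar object (CRLF line endings).

    Expects timezone-aware datetimes; escaping and line folding are omitted,
    so SUMMARY should be short plain text.
    """
    fmt = "%Y%m%dT%H%M%SZ"  # UTC date-time form
    utc = lambda dt: dt.astimezone(timezone.utc).strftime(fmt)
    stamp = stamp or datetime.now(timezone.utc)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//sketch//EN",  # hypothetical product identifier
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{utc(stamp)}",
        f"DTSTART:{utc(start)}",
        f"DTEND:{utc(end)}",
        f"SUMMARY:{summary}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"

start = datetime(2019, 1, 10, 18, 0, tzinfo=timezone.utc)
end = datetime(2019, 1, 10, 19, 0, tzinfo=timezone.utc)
print(to_ics("1@example.org", "Dinner", start, end, stamp=start))
```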
Storage and processing of data are intimately connected with the specific applications that use them, and trying to decouple them will result in inefficiency and complication, with no clear advantages.
> unfortunately, as a business it never took off
Sounds like the timing was too early in 2005. I believe these days we're all so tired of the privacy and security situation, that the world is ready for something like this.
> [it] utilized zero knowledge protocols and made it accessible to common people to manage their data and share with others
This describes exactly what the re-decentralized web needs.
Seeing the many attempts over recent years, it looks like there are significant technical, financial/business and social challenges - but I totally agree with your conclusion, that "the stage is set for this disruption". It also feels like the tide is rising, that the solution is being worked on from numerous fronts and eventually a more evolved system will be adopted by the public.
The crucial point is that Solid will bring more choice: there will be social feed viewers that will be more invasive, and those that will be less invasive. People can choose the one they like, without consequences as to whom they can interact with. Today, we do not have a choice: if we want to interact with people who use Facebook, we have to use Facebook as well.
This sort of discussion looks often like rose tinted spectacles. The past wasn't so different to today.
Easy: use DNS, store the PGP key ID in a TXT record, and then look up the public key for that ID using a PGP key server.
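A sketch of that lookup chain, under loudly stated assumptions: the `pgp-key-id=` TXT format is invented for illustration (not a standard), `lookup_txt` is a stub in place of a real DNS resolver, and the keyserver host is just an example. The URL follows the HKP-style `/pks/lookup?op=get&search=0x...` form that many key servers accept.

```python
from urllib.parse import urlencode

# Hypothetical DNS data; a real client would query DNS instead of this dict.
FAKE_DNS = {"example.org": ["v=spf1 -all", "pgp-key-id=0123456789ABCDEF"]}

def lookup_txt(domain):
    """Stub for a DNS TXT query; returns the TXT strings for the domain."""
    return FAKE_DNS.get(domain, [])

def key_id_for(domain):
    """Pick out the (assumed) pgp-key-id TXT record, ignoring unrelated TXT records."""
    for record in lookup_txt(domain):
        if record.startswith("pgp-key-id="):
            return record.split("=", 1)[1].upper()
    return None

def keyserver_url(key_id, server="https://keys.openpgp.org"):
    """Build an HKP-style fetch URL for the key with this ID."""
    return f"{server}/pks/lookup?" + urlencode({"op": "get", "search": "0x" + key_id})

print(keyserver_url(key_id_for("example.org")))
```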
So...centralize many/most popular internet services on infrastructure that provides cheap, reliable VMs. At the risk of overdrafting my snark budget: that sounds familiar.
https://news.ycombinator.com/user?id=cordonbleu
this is handy as well.
A related beef, though, is that with any reasonably sized dataset, ontology-based inference is computationally very difficult; you have to cut all sorts of corners and know all sorts of tricks to actually infer across your data. In other words, if semantic data are going to truly become ubiquitous, we need to infer across them in real time. Inference takes everything in your dataset into account, so adding a single axiom means that, if you want to be complete, you have to compute everything all over again, which is slow.
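A toy illustration of why adding one axiom is expensive for a naive reasoner: the sketch below does forward chaining to a fixpoint with just two RDFS rules (transitive `subClassOf`, and instances inheriting superclasses), re-scanning the whole graph on every pass. The `ex:` data is made up. Production reasoners use smarter, incremental strategies; this only shows the recompute-from-scratch baseline the comment is complaining about.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def closure(triples):
    """Naive forward chaining to a fixpoint with two RDFS rules:
    rdfs11 (subClassOf is transitive) and rdfs9 (instances inherit superclasses).
    Each pass re-scans the whole graph, so cost grows with the data."""
    graph = set(triples)
    while True:
        sub = {(a, b) for (a, p, b) in graph if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for (a, b) in sub for (b2, c) in sub if b == b2}
        new |= {(x, TYPE, c) for (x, p, a) in graph if p == TYPE for (a2, c) in sub if a == a2}
        new -= graph
        if not new:
            return graph
        graph |= new

facts = {("ex:rex", TYPE, "ex:Dog"), ("ex:Dog", SUBCLASS, "ex:Mammal")}
g1 = closure(facts)
# Adding a single axiom: to stay complete, this naive approach recomputes everything.
g2 = closure(facts | {("ex:Mammal", SUBCLASS, "ex:Animal")})
print(("ex:rex", TYPE, "ex:Animal") in g1, ("ex:rex", TYPE, "ex:Animal") in g2)  # False True
```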
And I can't see how a response of "let's get rid of the chess board so they can't play" would be an adult response to this problem.
But that’s beyond the point that I originally wanted to make: spam and email have very similar parallels to security and the Internet. In another universe, it’s possible that the issues of spam and security could’ve been incorporated into the protocols themselves. But for some reason I’m sure is rational, those issues were moved outside, to the hosts, left to be solved by middleboxes such as firewalls and Google’s spam filter.
I would argue that this was the smarter choice and that both of these involve the same problem: spam and security are ill-defined and constantly evolving.
I have tried FastMail and I like the company, but the spam filtering is not good. But good luck even trying to replicate FastMail anti-spam if you roll your entire email stack yourself. I have run my own email server even well before Gmail was popular and it was a nightmare, I'll never do it again. If a company asks me to do it, I'll quit.
Even if you put Gmail aside, email became reputation based, which is naturally going to mean the larger centralized platforms will succeed over a few independent outliers because they will always have a better reputation and control what is allowed to go out or in. Just like how the United States still has a large role in the internet and what is seen internationally because so much is still centralized within those borders, or the centralized companies that are within those borders control it.
It is definitely necessary to properly set up DKIM, SPF, and DMARC once, plus TLS by default (thanks to Let's Encrypt), but after that the setup is pretty much hands-off.
SpamAssassin filters my spam down to at most one mail a day, and that's usually because it's some new type of topic not caught by the previously trained Bayesian filter.
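For reference, the one-time DNS part roughly amounts to three TXT records. This sketch generates them for a hypothetical domain, DKIM selector, and policy choice, following the tag syntax of SPF (RFC 7208), DKIM (RFC 6376), and DMARC (RFC 7489); generating the DKIM key pair and configuring the mail server to sign are not shown.

```python
def mail_dns_records(domain, dkim_selector, dkim_public_key_b64):
    """Return {DNS name: TXT value}; policies here are example choices, not advice."""
    return {
        domain: "v=spf1 mx -all",  # only hosts in this domain's MX records may send
        f"{dkim_selector}._domainkey.{domain}": f"v=DKIM1; k=rsa; p={dkim_public_key_b64}",
        f"_dmarc.{domain}": f"v=DMARC1; p=quarantine; rua=mailto:postmaster@{domain}",
    }

def parse_tags(record):
    """Split a DKIM/DMARC-style 'tag=value; tag=value' record into a dict."""
    tags = {}
    for part in record.split(";"):
        if part.strip():
            key, _, value = part.strip().partition("=")  # first '=' only; base64 may end in '='
            tags[key.strip()] = value.strip()
    return tags

for name, value in mail_dns_records("example.org", "mail", "MIIBIjAN...").items():
    print(name, "TXT", value)
```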
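SpamAssassin's actual Bayes component is more elaborate than this; the sketch below is a generic toy naive Bayes classifier with add-one smoothing, trained on made-up messages. It only illustrates the idea of a Bayesian filter, and why a message about a topic it never saw in training gives it little to go on.

```python
import math
from collections import Counter

class ToyBayes:
    """Toy naive Bayes spam scorer with add-one (Laplace) smoothing."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text, label):
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_probability(self, text):
        total_docs = sum(self.docs.values())
        if not total_docs:
            return 0.5  # untrained: no evidence either way
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        log_p = {}
        for label in ("spam", "ham"):
            counts = self.words[label]
            denom = sum(counts.values()) + len(vocab)
            log_p[label] = math.log(self.docs[label] / total_docs)  # assumes both classes trained
            for w in text.lower().split():
                log_p[label] += math.log((counts[w] + 1) / denom)
        # Normalize in log space to avoid overflow, then return P(spam | text).
        m = max(log_p.values())
        ps, ph = math.exp(log_p["spam"] - m), math.exp(log_p["ham"] - m)
        return ps / (ps + ph)

f = ToyBayes()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches now", "spam")
f.train("meeting notes for tomorrow", "ham")
f.train("lunch tomorrow with the team", "ham")
print(round(f.spam_probability("cheap pills now"), 2))
```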
> This year the event will take place from 3rd to 9th August 2015 in Maribor, Slovenia
2015 ??
The "normal desktop user" should probably not be running their own self-hosting setup, because they will fail at backups and reliability and performance.
I wonder if Ubiquiti would consider selling a "developer" model of the EdgeRouter X or similar with more RAM.
Or maybe look for a Raspberry Pi clone with dual Ethernet (not connected via USB). Add a 128GB microSD for $20-$25 to have some room for video, photos, email, chat logs, etc.
As confirmed by the lack of complaints about auto-update in Windows 10. I mean, it's not like updates ever break anything anyway.
I disagree. I've yet to encounter a system where automatic updates didn't sometimes break things. It's a nice theory, but in practice things break when you change things they depend on even when you're trying not to, and a lot of developers don't try that hard. Not to mention, sometimes you're dealing with design flaws that cause API changes, or programs relying on behavior that wasn't officially part of the stable interface. Auto-updating causes more problems than it solves.
I used SpamAssassin back in the day and it was effective for a bit, and then pretty much all spammers figured out how to avoid it.
Also, I'm not talking about the obvious all-caps VIAGRA spam, which I think is fairly easy to solve. But spam has become much more nuanced, Gmail is the best at it, and its false positive rate is still amazingly low.
The only one I can think of right now would be Font Awesome 5 (via Kickstarter), which already was a very popular product with big name recognition and had a professionally run Kickstarter.
Pretty much every open source project only survives because developers donate their own time, or companies allow their developers to do so.
And even there, imho Bob (iirc) from the video was right - I hate font awesome 5, and always install either version 4 or "fork awesome" in my projects.
If I had data that I legit feared the government finding, I wouldn't be protecting it with a home server - I'd protect it by encrypting the crap out of it and storing it on a pretty generic cloud service, not in my home or anything easily traceable to me.
Your TV Guide is a good example of things that aren't hard. They don't change very quickly, so you can just use a cache. That's easy.
Finding the number of RTs, that's also easy, apart from it being an open world of course. When they RT, they notify you. And you want to display those RTs with your tweet? Just cache those who notified you.
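A minimal sketch of that push-based alternative: retweeters' apps notify the author, and the author's app just caches who notified it. The message fields (`type`, `actor`, `object`) and the class name are made up for illustration. Storing a set per tweet makes duplicate deliveries harmless, and, per the open-world caveat, the count only reflects notifications that actually arrived.

```python
from collections import defaultdict

class RetweetCache:
    """Hypothetical author-side cache of retweet notifications."""

    def __init__(self):
        self.retweeters = defaultdict(set)  # tweet id -> users who said they retweeted it

    def receive(self, notification):
        # Field names are illustrative; other notification types are ignored.
        if notification.get("type") == "retweet":
            self.retweeters[notification["object"]].add(notification["actor"])

    def count(self, tweet_id):
        # .get avoids creating empty entries for tweets nobody retweeted.
        return len(self.retweeters.get(tweet_id, ()))

cache = RetweetCache()
cache.receive({"type": "retweet", "actor": "bob", "object": "tweet-1"})
cache.receive({"type": "retweet", "actor": "bob", "object": "tweet-1"})  # duplicate delivery
cache.receive({"type": "retweet", "actor": "dave", "object": "tweet-1"})
print(cache.count("tweet-1"))  # 2
```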
Stable data access standard? That is Solid itself. And the data model, that's RDF.
There are ways that you can go about doing this stuff.
Finally, we're also getting some traction around this in academia; they've been hung up on stuff that isn't helpful for too long.
Finding the number of retweets is also more difficult, because other data gets recorded too. Not only do you have your own data now, you now have the data of everyone else who retweeted you. Is it your data, or theirs? Who is caching it, and for how long? How does refreshing the cache affect the consistency of each user's view? With decentralized applications you have to choose what kind of functionality you will support.
But, yes, in theory, if you allowed only one service provider to use some given data, you could rely on caching (read: holding a copy of data indefinitely) to a good extent. But as soon as you have multiple using it, you enter the extremely hairy world of multi-master high-availability strong-consistency replication. AKA, absolute hell. But this isn't even the most difficult problem to me.
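A tiny illustration of one way multi-master replication goes wrong: two replicas of the same retweet counter each accept a write concurrently, and a simple last-writer-wins merge (chosen here for illustration; real systems use a variety of policies) silently drops one of them. The replica structure and logical timestamps are hypothetical.

```python
# Two replicas of one counter, both accepting writes (multi-master).
def read_modify_write(replica, delta):
    """Apply a local update and bump the replica's logical timestamp."""
    return {"count": replica["count"] + delta, "ts": replica["ts"] + 1}

def last_writer_wins(x, y):
    """Keep whichever version has the higher timestamp; ties go to x."""
    return x if x["ts"] >= y["ts"] else y

base = {"count": 10, "ts": 0}
a = read_modify_write(base, +1)  # a retweet recorded on replica A
b = read_modify_write(base, +1)  # a concurrent retweet recorded on replica B

merged = last_writer_wins(a, b)
print(merged["count"])  # 11, although two retweets happened
```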
We already had some good data access standards. The question is, why weren't sites using them to allow data interoperability/mobility? Answer: they didn't want to. So even if you create a technical solution for all of this, the best you will get is the Facebooks of the world publishing a read-only calendar feed, clunky, slow export tools, and single-feature one-way application integrations. Like we have now.
I don't see an ethical or social reason to decouple the data from the services I use, and I don't think the majority of the world population does, either. The only ethical/social concern I have is with the very existence of the service, which is a different concern.
There are protections in both cases. But they are different.
For instance, in the USA, there is no requirement that you have a mechanism to provide information stored at home to law enforcement, even under warrant/court order. If you have one, they can compel you to use it, but if you don't they have no remediation.
But a cloud provider is legally required to have that mechanism, and if they don't exercise it when ordered to, they are punished. See: https://en.wikipedia.org/wiki/Stored_Communications_Act and https://en.wikipedia.org/wiki/CLOUD_Act.
I started a reply stating that one can no longer use <i> elements and must instead include SVGs, but then I decided to double check, and found I was wrong all this time... Some minor details have changed, but mostly for the better. Their marketing is a bit annoying too, but that's the price of having a sustainable business model.
I obviously haven't checked version 5 in projects yet, but I guess I should now. So thanks for making me realise my mistake...
That's not really what Solid is about. Solid is not just about me, it is about us. The stuff we do together. The sharing we do, but we share not just with anybody, but with someone we trust. It may be something really trivial: I share my grocery shopping list with my wife. It is not sensitive by any means, but it is also nobody else's business. Those are data used on my terms. Nobody should be peeking into my life to map me, as I go along with my daily business, but my daily business consists of interacting with a lot of people, and I do share and I want to share, but I want personal data control.
Now, personal data control is really the key to permissionless innovation. So, we're not just doing it to protect against snooping; once people have their data, you get a level playing field where there can be competition for the best user experiences.
E-mail is pretty much the last bastion of the old open Internet, and the amount of resources needed to just deal with malicious e-mails is huge. Mindbogglingly huge. And those costs cut out a lot of organizations from being able to operate their own e-mail servers (either the costs of doing it or the costs of verifying to the big players that the e-mail you're sending isn't garbage).
And that's pretty much the story across the board. The old Internet was overwhelmed by bad actors who would ruin everything. Facebook and Twitter house a lot of awful stuff. But can you imagine how bad it would be if we were all still using USENET and IRC?
And, yes I agree, it would make the internet great again.
I wish it was popular instead of random set of forums, mailing lists and reddits.
The walled gardens exist because the open Internet kind of sucks, really.
Was it that, or was it that the open internet was proving more difficult to monetize? Knock-on effects on resources thrown at UX may be relevant.
Is this still true when “telling who's a ‘robot’” is such a common thing to have happen? For instance, I've heard of at least one major platform both sending back quite a lot of UI telemetry and considering third-party clients a violation of their ToS; I haven't heard of strong action being taken yet. (I'm avoiding naming them both because I'm operating partly on hearsay and because I'm more interested in the general question.) Hasn't big tech had a lot of time and motivation to advance “how to detect people who are using some weird software to talk to us”?
Simpler forms of technical barriers, like with the AIM protocol, were defeated in the past, but it seems like massively upgraded data backchannels, machine learning algorithms, and the new normality of silent automatic updates all the time might strongly favor a centralized defender. Plus IIRC the CableCARD wars didn't go so great, and there were presumably a lot of people motivated to save money on expensive TV packages, whereas risking losing access to all your friends for having slightly better control over something that's notionally “free” anyway sounds like a harder sell.
I don't think it's easy to defeat “socially required tech” + “automatic updates” + “machine learning” at all.
If the legal barrier went away, and someone surmounted the "unserious" (which I doubt) technical barriers to doing that, then the content providers would go out of business. That's arguably a good thing, but I suspect many people would disagree. The problem is the profit motive and financing model for what consumers want, not the degree of decentralization. Google didn't screw up the internet; people did, by preferring what Google offers.
The internet as used by many, many people consists of a few centralized walled gardens. Walled gardens also exist because of network effects.
An open internet is _not_ a technical issue. It's a legal one.
Perhaps it's a social one as well.
Doesn't exist anymore (often enough).
> Just user-oriented effort instead of the constant barrage of difficult-to-setup tech demos.
Things like identity management and data storage make these barriers pretty deep. “I can delete my post and it basically won't be accessible anymore” (there can be physical exceptions so long as they're legibly exceptions to the social reality), “I don't have to think about how big my images are and can just post as much as I want”, “I can lose any of my own hardware and everything will still be there because it's in the cloud”, and “I can tell who my friends are based on common knowledge within my circles of their unique name which is easy to remember and meaningful” are all things that heavily constrain what you can do “on the inside”.
Mastodon has meanwhile managed to either do something right or get lucky wrt the path dependency of building structures where prosocial hosting behavior is convenient: a whole bunch of mostly-volunteer instances have sprung up, adopters have managed to make instance choice part of identity so that the domain-part isn't just a “meaningless extra thing to remember”, and federation remains reasonably strong; meanwhile, financial support for server costs has mostly leant toward the Patreon model, allowing a fraction of generous users to help support a bunch of free riders while not having to directly participate in administration. At the same time, despite Mastodon having almost exactly copied Twitter's model in terms of available user interactions, the zeitgeist has repeatedly suggested that users getting on board for the first time often had no idea what instances even could be, and had to have the very concept explained several different ways before it got real traction. Random instance death is also a problem that's tempering the mood nowadays, because keeping the server up requires enough motivation which sometimes runs out, and some instances have started having problems with media storage requirements, which, see above (though I'm told the internal architecture could use some optimization too).
There's something deeper in here surrounding the thorough conflation of type with instance in the popular side of the digital world; I feel as though something critical to the more literate concept of this didn't make it into the default folk model, such that only centralized services are legible. I have some hope that Mastodon and related ActivityPub-based federated services absorbing waves of people fleeing the abusive behavior of major social media (such as the recent Tumblr exodus) will make a dent in this and cause the appropriate concepts to reach critical cultural density.
It seems to me that Mastodon gained traction in a way that wasn't much different from email. In both cases there was a communication network which anyone could add a node to pretty easily. Your node could have local moderation policies and you wouldn't have too many issues with getting blocked as long as you acted nice. For both email and Mastodon this enabled a few motivated administrators with some influence to onboard thousands of users quickly, and benefit from a network effect bigger than just their node.
The similarities to email end when you consider _why_ people migrated to the network in question. With Mastodon the waves have been culturally/politically motivated people running away from something (ex-Twitter users fleeing abuse, Japanese loli enthusiasts trying to avoid embarrassment, sex workers seeking an alternative to SESTA/FOSTA regulatory deplatforming). Whereas with email people were running _to_ something - the first free, global communications network of its kind.
It's also interesting to note that email nowadays has lost these advantages (running a server is much tougher now and you can't really pull it off without blessings from Google and others). Sure enough email is now on the decline.
Then “free email” providers on the early consumer Internet were able to compete on things like storage, because that was something it was still acceptable to expect people to pay attention to. One of Gmail's original big draws was “lots and lots of storage”.
Email being in decline for social purposes feels partly related to changing feedback expectations and UI inertia, but I'd guess also some relation to a “mental association with uncool things” including both spam and stressful/boring transactional email; it's become more of a business thing. The spam side of that is related to the gradual federation lockdown, and something similar could happen to Mastodon, but at least there's some discussion about it happening in advance now.
I'm not sure what this all adds up to.
Only true if you care about Gmail (a centralized service) getting your e-mail and delivering it to its users. You could just not care about Gmail and still interoperate with lots of mail servers - even lots of servers that do have Google's blessing.
I literally do not know anyone who owns a desktop computer any more.
Another way might be something similar to ad-blockers for web browsers.
In my book, they're moderately worse. I entered the internet around the time of free web forums, where anyone could run one for any reason. I moderated a couple moderately sized ones, mostly oriented around computer games. Overall it wasn't too bad. Certainly nothing compared to what I've seen in larger communities. I suspect the larger the community, the worse the garbage.
But the main reason I say commercial players have made it worse is that they've also commercialized content moderation. Which is to say, they employ people to sit at a desk looking at the absolute worst humanity has to offer for 8 straight hours a day for barely better than minimum wage. That's like a job straight out of Black Mirror.
Forum moderation, by contrast, was/is a volunteer position. You were only in it for as long as you chose to be, and you could leave at any time without any effect on your livelihood.
So I'd argue the "community watch" model of amateur forum moderators was closer to the greater good than the commercial walled gardens.
Had to replace my dishwasher because the controls shorted out.
A motor or the display could just as easily burn out, but this article is about software problems, which you could fix with a workalike board.
A) know how the sausage is made, and
B) stand to actually lose something if people believe them, i.e. they are standing by their point in spite of the negative consequences to themselves.
People on the dole who argue against (details / implementation of) social security should be taken seriously. Rich people arguing against tax breaks for the wealthy. Programmers against big tech firms. Etc.
If everyone owned a personal cloud box, security practices would quickly fall off a cliff, and those boxes would become the new botnet.
Maybe leave people alone, stop sticking our fingers in all the places they might stick to money. Let the geeks take care of their little tribe.
Though it's not really simple enough for non-technical people to set up.
It's really self-host heaven. I don't pay for Fastmail, Spotify, or video streaming anymore.
[1] https://github.com/linuxserver/docker-airsonic
[2] https://github.com/linuxserver/docker-beets
[3] https://www.synology.com/en-global/dsm/feature/video_station
In the depths of my soul I would love to re-decentralize the web. I truly believe data centralization will cause people to suffer a lot. But decentralized tech needs to solve so many problems before alternatives to centralization become viable. Centralized approaches also improve over time and are a moving target to keep up with.
Which NAT/PAT/port-forwarding issues are you thinking of?
This is what Hubzilla can do, along with using the Zot protocol for distributed identity. The main issue really is having an easy, wizard-based installation/configuration for new users wanting to host.
Why: you only need one for a family, and most of the time there's already a person in the family who does "PC stuff". And even if there isn't, there's always someone who'll learn it if a friend has one.
The rest of this post is not targeted at you but rather on a whole attitude here at HN:
-------------------------------
Anyone who can operate a web browser, has any education in IT and knows enough English to read instructions in the box should be able to set up one.
In fact I think just being able to read the quick start instructions should be enough to install one with basic features.
Setting up websites in the '90s and early 2000s was a lot harder. The same goes for using older PCs with DOS.
A major problem today seems to be learned helplessness. In our well-meant and to some degree profitable[0] effort to make sure anyone can use anything, we are creating a situation where people are more helpless.
Seriously: if app stores and walled gardens had been introduced first, the web and email would be considered too complicated now. I can imagine HN: """You mean my siblings, parents and grandparents are going to install this "e-mail" thing? Even if they were able to configure "smtp" and whatnot, they'd forget the "email address" or even how to start it before tomorrow."""
[0]: If anyone doesn't catch my drift, your brightest customers might not be the ones who pay the most ;-)
Edits: a number of them :-)
That's not true at all. Confirmation bias is rough when you're technical; you keep spotting other technical people.
But I'm not totally convinced either: email has been huge despite the configuration needed, also with people who had to take it step by step twice and make notes while doing it. Some figured it out on their own (or, more realistically, using the step-by-step instructions that came bundled with their first modem). Others had a son or a grandson who'd picked it up at school. Others got it at work.
My grandparents were the youngest group of people I can think of who didn't have access to email somehow.
And my wife's grandparents have/had access to email and used actual mail clients too, not just Hotmail or Gmail.