Re-decentralizing the Web, for good this time

Re-decentralizing the Web, for good this time (ruben.verborgh.org)

527 points by Schoolmeister 7 years ago | 281 comments

tmcw 7 years ago |

It puzzles me that the linked data future is still discussed, as if we didn't already try it, and didn't already discover that developers dislike arcane RDF standards and the academic-rooted designers of the specifications have a terrible track record of solving real-world problems. And that now they're presenting linked data as some critical component of the decentralized web while skipping out on the debates that everyone else in the space is having - like whether decentralization can be fast, or how to ensure data authenticity, or whether a 'local server / pod' can be built that doesn't get hosed by hole-punching through a home Comcast connection.

Instead, it's just 'what about old-fashioned websites, plus lots of xml schema and long spec documents'? It just tastes like a rehash of Berners-Lee's existing '5-star open data' spiel ( https://5stardata.info/en/ ) but now with the billing that it'll fix the internet. 5-star open data has been around for years now, and, well, the linked data future isn't here. When's the last time you consumed RDF in an application?

svachalek 7 years ago | |

I really, really sympathize with the goals here but when I read through these proposals a few months ago I literally facepalmed. They seem about as realistic as praying for some kind of deus ex machina.

Ultimately I think there are technical solutions to making the decentralized web more attractive than the walled gardens, but at this point they will need to be ridiculously polished and shiny to even get a look, and this stuff... is not. Going forward it gets even worse, they're going to be opposed at every step by corporations with more money than most nations.

The internet was originally decentralized because the government wanted to make it that way, and I think the only way to get back there is going to require a gigantic, economically unattractive investment. There are at least a few governments that may have the capability but I can't name one that would have the motivation. Hopefully some billionaire's charity will decide saving the internet is a worthy legacy.

cookiecaper 7 years ago | | |

The internet is already decentralized. Some billionaire can't do anything to fix the situation, at least not directly, because our draconian copyright and network access laws are the only reason that walled gardens are able to exist.

The internet doesn't really tolerate serious technical barriers stopping someone from automatically multiplexing the content from various social networks into a single read-write stream, for example. The issue is that when someone attempts to do that kind of thing, they get sued and they end up owing BigTechCo millions of dollars. [0]

An open internet is _not_ a technical issue. It's a legal one.

[0] https://www.eff.org/cases/facebook-v-power-ventures
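As an illustration of how small the technical side of the multiplexing described above is, here is a minimal read-only sketch in Python. It assumes each network already hands over its posts as a newest-first list; the `Post` type, the network names, and the sample data are all hypothetical, and no real social network API is called.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Post:
    # Hypothetical common shape for a post from any network.
    timestamp: int  # seconds since the Unix epoch
    network: str
    author: str
    text: str


def merge_feeds(*feeds: Iterable[Post]) -> Iterator[Post]:
    """Lazily merge feeds that are each sorted newest-first into one
    newest-first stream."""
    return heapq.merge(*feeds, key=lambda post: post.timestamp, reverse=True)


# Made-up sample data standing in for two separate networks.
feed_a = [
    Post(300, "network_a", "alice", "third"),
    Post(100, "network_a", "alice", "first"),
]
feed_b = [Post(200, "network_b", "bob", "second")]

timeline = list(merge_feeds(feed_a, feed_b))
for post in timeline:
    print(post.timestamp, post.network, post.author, post.text)
```

Writing back to each network is left out on purpose: the merge itself is a few lines, which is consistent with the claim that the hard part is getting permission to fetch and post the data at all.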

eeeeeeeeeeeee 7 years ago | | |

I found it interesting they highlight the decentralized nature of email. And yet the decentralized design meant spam was basically unsolvable. It wasn’t until Gmail came along that they largely solved the spam problem.

There are so many abuse related issues on the web and I’ve seen no decentralized effort that works unfortunately. Cloudflare brought cost effective DDoS protection to the masses.

13415 7 years ago | | |

The problem with this 'Solid' project seems to be that it is vaporware.

Anybody who wants to advance the open web should focus his efforts on a P2P library with extremely good NAT traversal capabilities that is extremely reliable and simple to use and supports as many programming languages as possible - certainly not just C++ or C. It needs to be deployable under a permissive license on all major platforms macOS, Windows, Unix, Linux, iOS, Android, and browsers, and may not transport any data or chew away bandwidth without allowing total control over this by the programmer and end user. It needs to have a dead simple, almost idiot-proof API. The resulting network on top of IP needs to be searchable, not too high latency, and route to any endpoint on it.

That's still the biggest hurdle for the Open Web. Everything else is secondary.

rubenverborgh 7 years ago | |

You'll be surprised to hear that developers like Linked Data. People starting with Linked Data development today are not burdened by the Semantic Web legacy and mistakes of the past. We've been working with front-end devs who have never seen RDF, and never will. They enjoy how Linked Data is able to cross borders and leads to more data than a centralized database could ever give you.

The confusion in your comment is that one would need RDF to do Linked Data. I've written about that misconception here: https://ruben.verborgh.org/blog/2018/12/28/designing-a-linke...

Don't get me wrong, the Semantic Web community has made mistakes and has not been developer-friendly. But we're not still stuck in the 90s. For instance, XML hasn't been a part of any of this for many years.

anomie31 7 years ago | | |

By RDF did you mean RDF/XML specifically? JSON-LD is still RDF, it's just serialized differently, which is fine, I like RDF, but OP may have more specific concerns than the syntax.
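To make the "JSON-LD is still RDF" point concrete, here is a toy Python sketch that turns a flat JSON-LD-style document into subject/predicate/object triples. It is not the JSON-LD expansion algorithm: it assumes a single flat object whose `@context` maps every term directly to an IRI, and it treats every value as a plain literal. The `example.org` identifiers are made up; the two FOAF property IRIs are real.

```python
import json

# A small JSON-LD-style document; the person and URLs are illustrative only.
doc = json.loads("""
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage"
  },
  "@id": "https://example.org/people/alice",
  "name": "Alice",
  "homepage": "https://example.org/alice"
}
""")


def to_triples(document):
    """Toy conversion: map each non-keyword key through the context and emit
    (subject, predicate, object) tuples. Real JSON-LD processing handles far
    more (nesting, typed values, prefixes, remote contexts)."""
    context = document["@context"]
    subject = document["@id"]
    triples = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        triples.append((subject, context[key], value))
    return triples


triples = to_triples(doc)
for triple in triples:
    print(triple)
```

The same triples could be written as Turtle or RDF/XML; only the serialization changes, which is the distinction being drawn here.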

brianberns 7 years ago | | |

The very first paragraph on linkeddata.org says it is for "exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."

pickpuck 7 years ago | | |

I've been thinking someone with the expertise and influence should write "RDF: The Good Parts"

AKA instead of "Eval is Evil" we might instead say "XML is Evil"

xipho 7 years ago | |

You're right, there are a lot of academics that like the idea of a semantic web; it rings true to a lot of scientific principles. There are also a lot of ideas and far fewer day-to-day applications. Science ticks along a lot slower than startup culture, however, so it's not surprising that a) understanding of the issues comes slower, but also b) some "experiments" that would utilize semantic data have not yet been fully tested. Read a list of points that address "why the semantic web is dead": many of those points are precisely what science seeks, i.e. principles that promote a slow and deep understanding of a domain of knowledge.

With open-science mandates coming from governments around the world, researchers are looking for ways to share their data in meaningful ways. I can think of a significant amount of research that regularly consumes RDF, particularly in the fields of medical biology and genomics where it's used to annotate data. This is where I'd guess you'll see it take a foothold; for example, medical diagnosis codes are notoriously disparate and there is a strong appreciation for what semantics could address. Unify, exchange, and consume medical diagnoses ... profit.

Links etc. off the top of my head-

* GO - The Gene Ontology, used in hundreds of thousands of genomic annotations https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3944782/

* UBERON - https://genomebiology.biomedcentral.com/articles/10.1186/gb-...

* The second year of US2TS - http://us2ts.org/2019/posts/registration.html

* OBO foundry - https://github.com/OBOFoundry/OBOFoundry.github.io

mbrock 7 years ago | | |

I’ve been learning about Barry Smith’s project for a scientific base ontology, Basic Formal Ontology, and it’s really fascinating stuff.

degyves 7 years ago | | |

Even more, we're already there: US ONC Health IT and HL7 are currently on the FHIR standard, which is slowly integrating linked data principles. E.g. it currently uses JSON-LD: https://www.healthit.gov/buzz-blog/interoperability/heat-wav...

Karrot_Kream 7 years ago | |

There's been a lot of effort to improve RDF ergonomics with JSON-LD, and ActivityPub is a widely used standard based on JSON-LD (though my experience with implementing it has been quite challenging).

hypothete 7 years ago | | |

I feel you on implementation - every time I make an attempt to try out ActivityPub I get intimidated by the combo of JSON-LD and the verbosity of Activity Streams vocabulary: https://www.w3.org/TR/activitystreams-vocabulary/

walterbell 7 years ago | |

> skipping out on the debates that everyone else in the space is having - like whether decentralization can be fast, or how to ensure data authenticity, or whether a 'local server / pod' can be built that doesn't get hosed by hole-punching through a home Comcast connection

Where is a good place to participate in those debates, especially data authenticity and local server pods?

FreedomBox didn't go anywhere. FreeNAS with ZFS is reliable but not designed to be exposed to the public internet. Many local services are using a centralized rendezvous server for NAT hole punching.

On the shiny commercial front, MyAmberLife has $13M in funding for a home server but it's mostly controlled by a central cloud service. Do Western Digital, Synology, QNAP, Drobo, etc care about decentralization?
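For readers unfamiliar with the centralized rendezvous step mentioned above, the following Python sketch models only its bookkeeping: each peer registers the public endpoint the server sees for it, and the server then tells each peer where to reach the other. The class and method names are hypothetical, the addresses come from documentation-only IP ranges, and no packets are sent; real hole punching also needs both peers to send UDP traffic to each other at roughly the same time.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (public IP, public port) as seen by the server


class RendezvousServer:
    """Toy model of the bookkeeping a rendezvous server does."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, Endpoint] = {}

    def register(self, peer_id: str, observed: Endpoint) -> None:
        # In practice the server learns this from the source address of the
        # peer's packet; here it is simply passed in.
        self._endpoints[peer_id] = observed

    def introduce(self, peer_a: str, peer_b: str) -> Dict[str, Endpoint]:
        """Tell each peer which endpoint to send packets to."""
        return {
            peer_a: self._endpoints[peer_b],
            peer_b: self._endpoints[peer_a],
        }


server = RendezvousServer()
server.register("alice", ("203.0.113.7", 40001))
server.register("bob", ("198.51.100.9", 51234))

targets = server.introduce("alice", "bob")
print(targets)
```

Even in this stripped-down form, both peers depend on one shared server to find each other, which is the centralization being pointed out.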

tmcw 7 years ago | | |

Some relevant efforts are dat (and Beaker Browser, the user-friendly frontend), Secure Scuttlebutt (and Patchwork, the user-friendly frontend), and IPFS. Somewhat less legit (imho) is ZeroNet, and somewhat earlier-stage or more obscure is Upspin.

degyves 7 years ago | |

Not only is Linked Data + RDF now simpler and nicer to learn, there are also more libraries to work with. It is also more critical than ever for many industries.

The current solution to several issues with electronic health records was to create a new standard that uses RDF and linked data, which solved most of the issues of the previous standard. See FHIR: https://en.wikipedia.org/wiki/Fast_Healthcare_Interoperabili...

In fact, it seems to me that linked data discussions are becoming relevant again because it is now clearer that we have been misusing/overusing bad REST, microservices architectures and GraphQL for problems that had already been analyzed and solved.

But, of course, for a single application that doesn't require interoperability, standardized data exchange formats, or support for flexible data representation, Linked Data and RDF will clearly be unnecessary. Over time, though, the future of data interconnection is on the side of Linked Data IMHO.

Until now, attempts to create alternative infrastructures to Linked Data + RDF have been more likely to produce ad-hoc, informally specified, bug-ridden, slow implementations of Linked Data and RDF.

smarx007 7 years ago | |

In my opinion, the linked data future is still discussed, because when Tim Berners-Lee first presented the idea of Semantic Web in 1994, he used what I would call an IoT scenario to describe it ([1], scroll to the end; I really hope he is not reading this comment). Now that the IoT is here, maybe more people are ready to listen.

And just to make sure we are on the same page here: it's not academics' job to build usable products. We will continue working on things that are novel from the academic standpoint; if people like you dismiss LD/SemWeb, those novel things will have "a terrible track record of solving real-world problems". I hope this does not come across as too personal.

[1]: https://www.w3.org/Talks/WWW94Tim/

y4mi 7 years ago | | |

No, your link doesn't talk about an IoT scenario. Selling and purchasing houses has absolutely nothing to do with IoT.

And using semantic web for that is just as bad. A basic json API would be much more stable than parsing a document with navigation and similar just to get that data.

shriphani 7 years ago | |

Agreed 100%. schema.org and whatever FB's equivalent is only "succeeded" because of the incentive in their ranking models (thus centralization).

Highly unlikely someone will bother with this (in addition to all the other quirks) while making their website.

kgwxd 7 years ago | |

Why does it matter the last time anyone consumed RDF? If it's a good fit for the problem at hand (I don't know if it is), then there is no reason to not use it. It doesn't matter how old an idea is, just how it gets used.

kodablah 7 years ago |

I think the part ignored by so many is the need to decentralize the computers into the home. I'm not talking meshes or shared resources. For the majority of use cases, we don't need distributed storage, compute, etc. Just start making these self-hosted "servers", "data pods", etc as easy to install as desktop software and make it clear that they are inaccessible when the computer is off. People that aren't already will gravitate towards at least one always-on machine in their house. Modern societies have reasonable upload speeds and electricity/network uptime to support it. Sure, things like ISP firewall/NAT and dynamic IPs are a bit of a barrier, but you can have volunteers help with relays.

For example, I can easily fire up a Tor onion service on my never-turns-off home desktop computer and reach my stuff from anywhere. Why can't I reach my friends' stuff the same way? Because, to use business-speak, there's nothing "turnkey". It's something I've been pondering and working on. Sure, the bigger players may have to be in DCs, have more stringent uptime requirements, and distribute their bandwidth/workload more. But for most of us, desktop software and web-of-trust style connections could go a long way so long as the front of the software has a FB feel (e.g. a feed, messages, etc). We can tackle discovery, searching, aggregation, offloading, etc later.

sascha_sl 7 years ago |

The W3C has a proven track record of producing overengineered shit when it comes to "the semantic web".

Just look at ActivityPub. It's essentially OStatus but instead of XML we slapped namespaces on JSON, wrote a bunch of overly complex preprocessing procedures so that everyone can output just the way they want[1] and still made half the spec ambiguous enough[2] that implementers essentially follow the one rule that matters: maintain compatibility with Mastodon.

[1]: https://www.w3.org/TR/json-ld-api/#algorithm-5

[2]: https://please-just-end.me/ap.html#block-activity-outbox (domain name relevant to content)

zozbot123 7 years ago | |

The Semantic Web does not require XML these days. JSON-LD (for JSON) and GRDDL/Microdata (for HTML) are widely-acknowledged standards. For simple text use, akin to a Markdown-formatted document, you can use Turtle. If you believe that JSON-LD is genuinely ambiguous, take it to the authors of that spec and contribute to getting it fixed.

sascha_sl 7 years ago | | |

Do you think a format that requires this amount of code to predictably parse is a good format?

https://github.com/kazarena/json-gold/blob/master/ld/api_nor...

StreamBright 7 years ago |

"In order to regain freedom and control over the digital aspects of our lives"

Nothing proves his point more than:

  <script src="//www.google-analytics.com/analytics.js...

crucini 7 years ago |

I skimmed this and see two big problems.

First, this idealistic idea that "we" are going to take back our data. Who is this we? Only the smart, high-agency people who have time to spare. The commercial web is increasingly tuned to the normal user, who is low-agency and easily led around. Who will win a battle of user acquisition and retention? Facebook or the rebels? Facebook of course. So any solutions proposed here are just for a tiny percentage of users who will then be isolated from the real and useful social networks. Or more realistically use both.

Or maybe if the infrastructure is built, a layer of savvy entrepreneurs can emerge to monetize it? I'm thinking of reaganemail, selling an anti-google email account to the AM radio crowd.

Second, the idea of somehow eliminating censorship. De facto censorship will always exist, even if you sugar coat it as Twitter has tried - "your content is still there, but only if someone explicitly looks for it". Any platform without censorship will just be flooded by every marketer and political zealot, for starters.

Also, I think he is conflating filter bubbles with centralization. Without centralization, wouldn't we still have filter bubbles as people self-select into their online communities?

gerbilly 7 years ago | |

Well, when I got on the internet in 1988, we were all 'smart, high-agency people who have time to spare'

Supposing we manage to solve this problem, what's to say average people can't participate in 10 years or so when the tech has been made easier to use?

flixic 7 years ago | | |

Early web was quite decentralized already. Many separate Bulletin Boards, later forums. Many people writing there had an idea how to create their own.

It didn't start centralized. Centralization happened. I might be more cynical than I should be but as a designer I struggle to see the future in which we have social dynamics that favor decentralization instead of convergence into a less self-managed system (i.e. all current centralized networks).

TuringTest 7 years ago | |

> Without centralization, wouldn't we still have filter bubbles as people self-select into their online communities?

Perhaps, but those would be self-selected, not imposed by the provider. Big difference.

orthecreedence 7 years ago |

I've read a bit about Solid in the past, but never quite understood how it will handle different data models. Does it force social data to all look the same (as in, have a predefined set of fields)? If not, how do apps built on it interoperate?

Don't get me wrong, I'm all for projects like this. I think it's wonderful. I just never really got how the apps will work with the same data without being forced into a particular data model (which seems like it would limit what you could do).

peterwwillis 7 years ago |

The web is decentralized, and we already have control of our data. The problem is, people keep giving it away.

I'm fairly confident that 98% of the population of the earth doesn't give a crap that their data is collected, or that they don't "control" it. This whole "decentralized web" thing is just privacy nerds trying to convince us that we need this, when really no regular consumer is asking for it.

jpollock 7 years ago |

Technology won't help with this, regulation will.

We have parallels from other platforms - specifically the fixed and mobile phone networks.

There used to be monopolies in local phone service. There were new competitors, but to change provider, you had to change phone number.

Even changing cell phone provider required a number change.

This obviously had strong network effects pulling you to stay with your provider. You had to tell _everyone_ in your extended network where to find you and have them update all of their business records when you changed from one carrier to another.

Eventually, everyone figured out this was stupid, and Number Portability [1] was forced on carriers by regulation.

This problem is completely gone now. You can take your number with you.

If we allow people to take their data to new social networks, and force federation, then we will get decentralization. However, it won't happen without regulation any more than it did with the phone companies.

[1] https://en.wikipedia.org/wiki/Local_number_portability#Histo...

TylerE 7 years ago | |

If you have ever seen any attempt by any government to regulate software you would know this to be a Lovecraftian nightmare, and not a solution.

jancsika 7 years ago | | |

If you have ever seen any attempt by any volunteer-run FLOSS team to solve nationwide social problems with technology you would know this to be a Lovecraftian nightmare, and not a solution.

jpollock 7 years ago | | |

The way this usually works, is there's an entrant into the market who decides it is cheaper to use the government to gain access to a network instead of building their own (Number Portability, Local Loop Unbundling).

The government then says "You have to allow competitors access to X, and you have to do it by date Y".

Then the companies get together and agree on how to do it because they agree that government dictated standards suck. There is usually some jostling around with someone wanting to run a centralized database for a nice per-transaction fee. Typically this is tossed out in committee, but not always.

staticvar 7 years ago |

Beaker Browser is a cool experiment showing how you can decentralize the web in a way that is both easy for end users and fun for developers because it pushes web standards.

https://beakerbrowser.com

*disclaimer: I help develop Bunsen Browser, the mobile companion for Beaker Browser.

MarsAscendant 7 years ago | |

Could you explain to me, a newbie, what Beaker promotes and what its advantage is against browsers that use HTTP?

skybrian 7 years ago |

I'm not sure this is a coherent plan, since it doesn't talk about how privacy rules get enforced for services. Who vets the services? If a fun game that you gave access to your "personal data pod" turns out to be Cambridge Analytica and just copies everything it sees into its own database, how is that an improvement over Facebook apps?

Choosing between service providers is no more meaningful for privacy than asking Windows users to download arbitrary apps. If smart phones are any more secure than desktops, it's because Apple and Google are constantly improving OS-level security and policing their app stores for malware.

Of course app stores have well-known flaws. But if we want to do better than that, someone has to figure out a better way to choose good rules and enforce the rules better.

deevolution 7 years ago |

People don't care about decentralization or centralization. This is all a big generalization, but humans are lazy and when it comes to making a moral choice, they're going to pick the path of least resistance and completely ignore all moral consequences. It looks like at the moment centralized services are what the people want and it's what they deserve.

repolfx 7 years ago | |

Why is "decentralisation" even the moral choice to begin with? A lot of projects claim to be decentralised, but when you ask "ok, who has the power in this project" it turns out that a small cabal of developers has most of the same rights and powers a corporate, centrally hosted service would have. It takes very careful analysis to discover whether a thing is really meaningfully decentralised or is just claiming to be.

LukeB42 7 years ago |

Shameless plug but I designed and wrote something for doing this from 2011 to 2015 because nothing like it existed or indeed exists as far as I'm aware.

It's a p2p caching proxy that also lets you edit web pages collaboratively in realtime over a LAN or the internet. It has a contacts list system and p2p chat functionality. This project effectively died due to lack of interest and I still have various security concerns about it (Should you break/reimplement Same-Origin policy or break/reimplement the TLS chain of trust?)

The main security concern is that because it decentralises HTTP in-place (existing URLs can now be looked up on an arbitrary number of overlay networks if the original URL isn't providing an OK response) it puts users at risk of malicious actors spamming overlay networks with browser exploits for popular resources like "news.ycombinator.com/".

I hope TBL and co converge on satisfying answers to these problems or constrain their design to not bother with decentralising existing URLs in-situ.

Code lives here: https://github.com/Psybernetics/Synchrony

Feel free to shoot me any questions.
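The in-place fallback and the poisoning risk described in this comment can be sketched roughly as follows. This is not taken from Synchrony: the function, the fetcher callables, and the overlay names are hypothetical, and the optional digest check is only one assumed mitigation. Where a trustworthy digest would come from is exactly the unresolved question raised above.

```python
import hashlib
from typing import Callable, Iterable, Optional, Tuple

Fetcher = Callable[[str], Optional[bytes]]  # returns None when unavailable


def fetch_with_fallback(
    url: str,
    fetch_origin: Fetcher,
    overlays: Iterable[Tuple[str, Fetcher]],
    expected_sha256: Optional[str] = None,
) -> Tuple[Optional[bytes], Optional[str]]:
    """Try the original URL first, then each overlay network in order.
    Returns (body, source name), or (None, None) if nothing usable is found."""
    body = fetch_origin(url)
    if body is not None:
        return body, "origin"
    for name, fetch in overlays:
        candidate = fetch(url)
        if candidate is None:
            continue
        if expected_sha256 is not None:
            if hashlib.sha256(candidate).hexdigest() != expected_sha256:
                continue  # reject content that does not match the digest
        return candidate, name
    return None, None


# Simulated fetchers: the origin is down, one overlay serves hostile content.
good = b"<html>real page</html>"
evil = b"<html>exploit</html>"
overlays = [
    ("evil_overlay", lambda url: evil),
    ("honest_overlay", lambda url: good),
]


def origin_down(url: str) -> Optional[bytes]:
    return None


unchecked = fetch_with_fallback("http://example.org/", origin_down, overlays)
checked = fetch_with_fallback(
    "http://example.org/",
    origin_down,
    overlays,
    expected_sha256=hashlib.sha256(good).hexdigest(),
)
print(unchecked[1], checked[1])
```

Without a digest, the first overlay to answer wins, which is the spam scenario for popular URLs; with one, the hostile copy is skipped, but only if the expected digest itself can be trusted.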

gpsx 7 years ago |

Other people here have made this general point: large centralized services like Google and Facebook have become so big through a lot of effort and a lot of cost, which was paid for by all the money they make. At a minimum they have to pay for their server use.

From what I understand, the proposal here seems not to allow for the advertising model. I don't think a service can grow and survive by making people pay, because people are too cheap.

There might be a better chance for something like this if they allow for the economics.

- Maybe the data host can provide an "advertising" profile which the user has control of. This can be exposed to the application hosts to allow for advertising.

- Maybe you also throw micropayments into the mix, along with bartering for information or micropayments.

Another issue is complexity. A number of comments have talked about over-engineered solutions and protocols. This decentralized idea could be started with something small like an open social network standard. I think I saw something similar to this on HN not too long ago:

- You have a web site, which is your profile. A provider could give you a nice editor for it.

- You have a feed, where you can put pictures, short posts, long posts, whatever. This is distributed with RSS. (The host makes this all seamless for you.)

- Identity is controlled with OAuth, used only to give an identity to visiting users. The owner can manage permissions for certain remote users (his "friends").
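The feed part of this list is the easiest to sketch. Below is a minimal Python example, using only the standard library, that renders a user's posts as an RSS 2.0 document. The function name, the post dictionaries, and the `example.org` URLs are assumptions made for illustration; the profile page and the OAuth-based permissions are not covered.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, posts):
    """Render posts as a minimal RSS 2.0 document and return it as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["link"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical profile and posts.
feed_xml = build_feed(
    title="Alice's feed",
    link="https://alice.example.org/",
    description="Posts from Alice's profile site",
    posts=[
        {"title": "Short post", "link": "https://alice.example.org/posts/1"},
        {"title": "Holiday pictures", "link": "https://alice.example.org/posts/2"},
    ],
)
print(feed_xml)
```

Any existing feed reader could subscribe to output like this, which is the appeal of starting small with formats that already exist.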

Such a service could be managed on your own web server, or there could be different cloud providers that make this arbitrarily easy, with arbitrary levels of functionality on the "profile" page, the "feed" and the "friend" permission management.

qznc 7 years ago |

> From the above, it is clear that our primary obstacles are not technological [5]; hence Tim Berners-Lee’s call [6] to "assemble the brightest minds from business, technology, government, civil society, the arts, and academia to tackle the threats to the Web’s future". Yet at the same time, computer scientists and engineers need to deliver the technological burden of proof that decentralized personal data networks can scale globally and that they can provide people with an experience similar to that of centralized platforms.

This whole article looks like "well, the obstacles are not technological, but let me write a few pages about technology anyways".

If the obstacles are not technological, then we need non-technological solutions. So far I think GDPR is one such non-technological step towards taking back control of our personal data.

The hardest problem in my opinion is "preventing the spread of misinformation" because we essentially need a way to distinguish between malice and stupidity. Without mind-reading I do not see how this could be possible at scale.

rubenverborgh 7 years ago | |

Yes, I was asked specifically to write a chapter on Tim's work. What we have is the technology to make it work—it's a necessary, but not a sufficient condition.

wmf 7 years ago |

I can't escape the feeling that SOLID will be at best neutral but likely will make things worse. Some of the diagrams show more companies having access to your data where it will continue to be mined, sold, etc. If you control storage but not execution it seems like you control nothing.

rubenverborgh 7 years ago | |

You control access permissions too :-)

yati 7 years ago | | |

An application with access to bits in your pod can always copy that data to its own storage right? e.g., a Facebook built on Solid can still grab all the data it needs (with the requisite perms) and store it away, build a profile per WebID, continue serving "cached" copies of changed content, etc. What are your thoughts on this?
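The concern can be shown with a toy model in Python. This is not Solid's actual API or its access control mechanism; the `Pod` class, the app identifier, and the data key are hypothetical. The point is only that revoking read access stops future reads but cannot reach a copy the application already made.

```python
class Pod:
    """Toy data pod with a per-application read permission."""

    def __init__(self, data):
        self._data = dict(data)
        self._readers = set()

    def grant_read(self, app_id):
        self._readers.add(app_id)

    def revoke_read(self, app_id):
        self._readers.discard(app_id)

    def read(self, app_id, key):
        if app_id not in self._readers:
            raise PermissionError(f"{app_id} may not read {key}")
        return self._data[key]


pod = Pod({"profile/name": "Alice"})
pod.grant_read("some_app")

# While it has permission, the app stores its own copy of what it reads.
app_copy = {"profile/name": pod.read("some_app", "profile/name")}

pod.revoke_read("some_app")
try:
    pod.read("some_app", "profile/name")
    still_readable = True
except PermissionError:
    still_readable = False

print(still_readable, app_copy)
```

After revocation the pod refuses the app, yet `app_copy` still holds the data, so permissions alone do not control what happens to data once it has been read.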

rickcogley 7 years ago |

The problem is people. To put it charitably, not everyone is "technical" enough to figure out how to own their own data, so I think silos and walled gardens are here to stay, because they are quick and easy for people. I for one, fully support keeping my own data in (as much as possible) future-proof formats, and although I've had a blog in some form for years, I want to move away from standard social media as much as possible.

maxk42 7 years ago |

The web cannot be decentralized without putting an end to SSL. As long as certificate-issuers are the arbiters of commerce and browsers push users to trust unsecured websites more and more, malicious governments will be able to silence people by revoking their certs.

There are stronger alternatives. We need to make a push to begin using them.

jakelazaroff 7 years ago | |

I'm not sure what stronger alternatives you're referring to. Can you elaborate or share any resources?

sparkie 7 years ago | | |

If, instead of identifying services by some human readable name, they are identified by their public keys, then we don't need certificates - there are several encrypted and authenticated transport protocols which only require knowledge of the destination's public key upfront.

You then need an alternative name system which links a unique human readable name to a public key. This is the trickier part (see Zooko's triangle), but there are some creative solutions like Namecoin and the Blockstack Name Service.
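A rough Python sketch of both halves of this idea follows. It is a simplification built on assumptions: the address is derived by hashing the key rather than being the key itself, the "public keys" are placeholder bytes rather than real key material, and the registry is a single in-memory table, which sidesteps the decentralization problem that Namecoin-style systems try to solve. A real protocol would also require the service to prove it holds the matching private key.

```python
import base64
import hashlib
import hmac
from typing import Dict


def address_for(public_key: bytes) -> str:
    """Derive a self-certifying address from a public key (assumed scheme:
    lowercase, unpadded base32 of the key's SHA-256 digest)."""
    digest = hashlib.sha256(public_key).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def key_matches_address(public_key: bytes, address: str) -> bool:
    """Check that a presented key is the one the address commits to."""
    return hmac.compare_digest(address_for(public_key), address)


class NameRegistry:
    """Toy first-come-first-served mapping from readable names to addresses."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def register(self, name: str, address: str) -> bool:
        if name in self._owners:
            return False  # the name is already taken
        self._owners[name] = address
        return True

    def resolve(self, name: str) -> str:
        return self._owners[name]


service_key = bytes(range(32))  # placeholder bytes, not a real public key
impostor_key = bytes(32)        # a different placeholder key

registry = NameRegistry()
registry.register("example-service", address_for(service_key))
address = registry.resolve("example-service")

print(address)
print(key_matches_address(service_key, address))
print(key_matches_address(impostor_key, address))
```

No certificate authority appears anywhere in the check: the address itself says which key to expect. The hard part moves to the registry, which has to stay unique and trustworthy without a central operator.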

peterwwillis 7 years ago | |

Your argument might sound stronger if you didn't use the name of a dead protocol. All use of SSL has been prohibited by RFC since 2015.

I'm pretty sure there aren't better alternatives.

kodablah 7 years ago | | |

People tend to use "SSL" to mean "CA-based TLS" these days, as in "SSL certificate".

mark_l_watson 7 years ago |

I went to the Decentralized Web Conference a few years ago and really liked it. In spirit, I am onboard.

In practice, I am satisfied with just using my own domain for email, my web site, and self-hosted blog. For communication I like FaceTime so I can see people while I am talking with them, phone, and email.

I still use social media, very occasionally, to see what people are doing and sometimes advertise my new open source projects and updates, and any books I write. Most of the problems people talk about with Facebook/Twitter don’t bother me as long as I only use the systems infrequently. I am not tempted to cancel my accounts.

snazz 7 years ago |

The design, typography, and diagrams in this article are wonderful. I like it when people pay this much attention to detail!

peteforde 7 years ago |

This essay is in radical need of a TL;DR. If something is this important, you owe it to the subject matter not to bury the lede under a mountain of history and flowery exposition.

Ask yourself: who is this for? People who are not already deeply passionate will stop reading unless they are engaged within a minute of reading. Note that a minute is being extremely generous; on a commercial consumer site, it's apparently an average of 7 seconds before someone will click away.

I recommend that you check out this video and reconsider how you might reframe your message as a call to action that speaks to a better future we can create together.

https://youtu.be/qp0HIF3SfI4?t=121

I even jumped you to the good part.

sprayk 7 years ago | |

I definitely scrolled around, looking for some kind of summary. I wanted to figure out if I was gonna be wasting my time reading another recap of what I lived through or if there was gonna be a proposal for some way to get back to decentralization that I could evaluate and keep in mind when designing my own apps. Couldn't figure it out from scrolling so I bailed.

jshen 7 years ago |

I’m all for the principles here, but one worry I have is the loss of efficiencies afforded by economies of scale which could dramatically increase the carbon footprint compared to the centralized versions.

wmf 7 years ago | |

This is why carbon needs to be priced so you could have facts about the magnitude of the problem (spoiler: probably pretty small) instead of trying to make qualitative tradeoffs (is it worth destroying society to save the environment?)

firefoxd 7 years ago |

Decentralized web can be downloaded and backed up by one entity. Then, you can go to that centralized entity to enjoy all the content.

If we still don't have decentralization, it's because it is not as easy.

vinay_ys 7 years ago |

In 2005, I worked at a startup that attempted to solve the problem of privacy and security for personal information (photos, home videos, music, personal health/finance documents, contacts etc) while also providing ability to share and collaborate.

The solution involved running a mesh network with nodes on user's laptop or desktop and a corresponding node in the cloud. These nodes would index local data and provide replication of metadata across nodes and backup of actual data to cloud node.

A locally running web app acted as replacement for 'windows explorer'. It allowed the user to access all their files and folders across all their nodes, access them (open document, play music/videos, see contacts etc), create smart collections and share these files, folders or collections with other users in a secure authenticated and private manner.

Each user got an identity, which comprised a dedicated domain (or subdomain) and a PKI certificate tied to that domain. Each node had its own private key, and the nodes' public keys were tied together by the identity certificate.

All communication between nodes (of the same user or across users) was authenticated and encrypted using these identity/node keys and certificates. No central node existed in the system that could spy on these activities. The architecture separated the network discovery cloud nodes from your data cloud nodes, and it allowed your data cloud nodes to be hosted separately anywhere (say, in your own cloud instances).

This is the only system I have seen that utilized zero knowledge protocols and made it accessible to common people to manage their data and share with others as well.

But unfortunately, as a business it never took off. It got acquired by EMC and merged with Mozy (the good old data backup company), and then this product died a silent death in 2010.

Maybe it was timing; maybe if this product had launched after Snowden, it would have done well.

But now, I think a more urgent and a relatively less complex problem to solve is one of distributed communication. In this era of always connected powerful devices (mobile phones, home gateways), why don't we all have our own personal email/chat servers that nobody else can spy on? Why does email and chat have to get relayed via big aggregators who mine so much data as well as metadata?

Not only do they violate privacy, they succumb to security breaches and cause serious damages.

I feel the stage is set for this disruption: crypto protocols, always-on cheap connectivity, compute power at the edge, and sensitivity to privacy/security in general population – all of these ingredients are appropriately set right now for this to happen.

captainbland 7 years ago |

The point about the decentralised web allowing the permissionless creation of centralising systems reminds me of the paradox of tolerance, where tolerant societies are thought to be taken over by intolerance if they tolerate that intolerance.

Maybe this is a lesson that we need to be less tolerant towards the creation of centralised services because those with money and power will seek to bring decentralised systems under their own control.

transpute 7 years ago |

For the technically savvy, you can run a virtualized desktop:

  - GPU passthrough VM (gaming)
  - SATA passthrough (FreeNAS)
  - multi NIC passthrough (pfSense/OpenWRT)
  - app server/cloud/P2P Linux or FreeBSD VM(s)

http://unraid.net sells a KVM-based product. VMware ESXi and XenServer are free. Connect a Ubiquiti AC-Lite WiFi access point to a dedicated NIC on the x86 box, WAN to another NIC. Since pfSense owns the WAN NIC, it can host a VPN server for your devices, including mobile. All VMs get virtual NICs. Dell T30 with quad-core Xeon and ECC costs about $400 with 8GB RAM and 1TB disk, it can hold 4 x 3.5" drives (20 TB in RAID-1) and 2 x 2.5" SSD.

Level1Techs has intro videos on home servers: https://www.youtube.com/results?search_query=level1+home+ser...

Advantages:

  - Stable and boring x86 platform
  - Good performance for gaming
  - Commercially supported hardware
  - Upgradeable storage and GPU
  - Upgradeable router software

miguelrochefort 7 years ago |

Great article Ruben! I've been following Solid's progress for a while, and I think your article very eloquently summarizes its purpose and relevance. I'm especially interested in the ability to circumvent the middle-men, and resolve the marketplace chicken-and-egg problem once and for all.

Watching your TED talk in 2013 was one of the most influential moments in my life, and discovering the semantic web was perhaps my greatest epiphany. While the vision never left my mind, I never acted on it. Until now.

I'm dedicating 2019 to linked data. I'm going all-in.

Last week, I started to build a tool to convert unstructured input to linked data. Even after recognizing canonical literals (email, phone, url, color, gender, boolean, integer, float, date, time span, money, weight, distance, language, image, geo coordinates), I couldn't accurately infer predicates and guess classes. Before trying more complicated stuff like bayesian inference, I decided to try a simpler exercise.

This time, I want to aggregate structured data from different sources and map it to some existing ontologies. For example, I want to convert some JSON about comments and links from Reddit and Hacker News to RDF using the http://schema.org vocabulary.

- Can I feed the JSON into some ML system that automatically figures out the mapping? What if I provide some annotation or feedback?

- Can I manually turn the JSON into JSON-LD and use that as the mapping information? What about complex transformations (different structures and literals)?

- Should I implement the mapping manually using my favorite programming language?

- Should I use R2RML or RML?

What's the state of the art today for semantic data integration?
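For the "turn the JSON into JSON-LD by hand" option, a minimal sketch in Python might look like the following. The input field names (`id`, `by`, `time`, `text`, `parent`) follow the public Hacker News item JSON; the choice of schema.org `Comment`, `author`, `dateCreated` and `parentItem` as targets, and the helper name `hn_comment_to_jsonld`, are assumptions made for illustration, not an established mapping.

```python
import json
from datetime import datetime, timezone

# URL patterns used to mint identifiers for items and users.
HN_ITEM_URL = "https://news.ycombinator.com/item?id={}"
HN_USER_URL = "https://news.ycombinator.com/user?id={}"


def hn_comment_to_jsonld(item: dict) -> dict:
    """Map one Hacker News comment item to a schema.org Comment (hypothetical mapping)."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Comment",
        "@id": HN_ITEM_URL.format(item["id"]),
        "text": item.get("text", ""),
        "author": {
            "@type": "Person",
            "@id": HN_USER_URL.format(item["by"]),
            "name": item["by"],
        },
        # The source stores Unix seconds; emit an ISO 8601 timestamp instead.
        "dateCreated": datetime.fromtimestamp(
            item["time"], tz=timezone.utc
        ).isoformat(),
    }
    # Only comments with a parent get a link back to it.
    if "parent" in item:
        doc["parentItem"] = {"@id": HN_ITEM_URL.format(item["parent"])}
    return doc


sample = {"id": 2, "by": "alice", "time": 0, "text": "Hello", "parent": 1}
print(json.dumps(hn_comment_to_jsonld(sample), indent=2))
```

A hand-written function like this handles structural changes (renamed keys, nested authors, timestamp formats) that a plain JSON-LD `@context` cannot express on its own, at the cost of one such function per source.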

jimsmart 7 years ago | |

Maybe take a look at FRED? (Disclaimer: not used it myself)

- Homepage http://wit.istc.cnr.it/stlab-tools/fred/

- Paper https://www.researchgate.net/publication/280113533_FRED_From...

There are likely other projects and papers, google 'text to rdf nlp'

Stephen Reed (ex-Cyc engineer) also did some interesting work in this field, in his Texai project, over 10 years ago. Although there are few references to it on the web now: that part of his project is no longer open source (and I know of no known mirrors).

- Paper https://pdfs.semanticscholar.org/8026/107de65c5a14aa8d0d47f9...

- Homepage http://texai.org

jimsmart 7 years ago | | |

Very much related, "Populating the Semantic Web—Combining Text and Relational Databases as RDF Graphs", Kate Byrne.

- http://homepages.inf.ed.ac.uk/kbyrne3/docs/thesisfinal.pdf

jimsmart 7 years ago | | |

Related paper by Stephen Reed, "A (very) brief introduction to fluid construction grammar"

- https://www.researchgate.net/publication/228378264_A_very_br...

jimsmart 7 years ago | | |

Actually, the online API for FRED seems broken, and none of it seems to be open source - and the paper is light on details.

Sargos 7 years ago |

Solid looks to be trying to reimplement what platforms like Ethereum are already building. The same ethos is there and this is very well written but I wonder if the Solid project just missed that when doing their research. Hopefully all of their efforts don't go to waste and they can extend some of their work to the broader decentralized web community.

rubenverborgh 7 years ago | |

No: blockchain technologies are about reaching decentralized agreements. Solid is about everyone being able to write their own things (so no agreement) without centralized parties.

cslarson 7 years ago | | |

The Ethereum project is about providing a complete decentralized web3 stack - not just a blockchain, though the database layer that provides is a critical part of it.

kjetilk 7 years ago | |

Not really, it is pretty much the other way around. :-) We're basically building the simplest thing that could possibly work, they are rebuilding a lot of infrastructure that they have to use, but we can use where it makes sense. So, they are kinda trying to implement the Web, which is a much bigger task than adding access control and identity... :-) There's also been quite a lot of overlap between people working on Solid and working on Blockchains in the past, so we know it well. But we're not really in competition, we'd be fine coexisting.

pnut 7 years ago | | |

Just FYI, it is unclear from your comment which of these organisations you are associated with.

Is "we" Solid or Ethereum?

_5ysi 7 years ago |

> He and many others were able to state their critical opinions because they had the Web as an open platform, so they did not depend on anyone’s permission to publish their words. Crucially, the Web’s hyperlinking mechanism lets blogs point to each other, again without requiring any form of permission. This allows for a decentralized value network between equals, where readers remain in active and conscious control of their next move.

For decentralization, the root problem has always existed: while pointing at another resource requires no permission, receiving and hosting that resource does. Your government has to let you receive it and your ISP has to let you host it.

This is a much lower level problem compared to the three challenges Berners-Lee puts forward, which seem to have little to do with decentralization.

1. taking back control of our personal data;

2. preventing the spread of misinformation;

3. realizing transparency for political advertising.

ilaksh 7 years ago |

I think if you add some cryptography to Solid and use JSON-LD and pick some schemas and not expect everyone to implement OWL and then get a usable naming scheme for IPFS (or replace IPFS with something similar with names that work) and then create some P2P Solid servers then this could work pretty well.

mikob 7 years ago |

What's to stop a service from having a pod that stores users' data in a mutated form? (Forgive me for the basic question).

jimsmart 7 years ago | |

Signature hashes of the files, (possibly/probably) with those hashes being further signed/verified with the user's key, to establish trust. With further chains of trust through key signing, if needed.
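A rough sketch of that idea in Python, using only the standard library. Note the big assumption: a real deployment would sign the digest with the user's private key (public-key signatures), whereas this sketch substitutes an HMAC over a secret key purely to show the flow. All function names and the key here are hypothetical.

```python
import hashlib
import hmac


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the stored bytes."""
    return hashlib.sha256(data).hexdigest()


def sign_digest(digest: str, user_key: bytes) -> str:
    # Stand-in for a real public-key signature: HMAC needs a shared secret,
    # so this illustrates the check, not the actual trust model.
    return hmac.new(user_key, digest.encode(), hashlib.sha256).hexdigest()


def is_unmodified(data: bytes, signed_digest: str, user_key: bytes) -> bool:
    """Recompute the digest of what the pod returned and compare signatures."""
    expected = sign_digest(file_digest(data), user_key)
    return hmac.compare_digest(expected, signed_digest)


key = b"hypothetical-user-key"
original = b"my profile data"
# The user keeps this receipt when the file is first written to the pod.
receipt = sign_digest(file_digest(original), key)

print(is_unmodified(original, receipt, key))             # True
print(is_unmodified(b"my profile data!", receipt, key))  # False
```

The pod can still refuse to serve the data or serve a stale copy; the check only detects that the bytes returned differ from the ones that were signed.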

TheMagicHorsey 7 years ago |

Somebody should make an easy to install home server with a standard API to access data. People will start decentralizing on their own when you can just buy a box and do some basic configuration, and have a secure home web server. And then developers will build on the decentralized platform because it has users.

dabockster 7 years ago |

> Since 2010, no single browser has gained more than two thirds of global market share anymore

What about Google Chrome?

kornork 7 years ago |

Maybe I missed it somewhere, but I didn't quite follow how this is going to gain a foothold. Solid has the same problem any new social media platform has - before people want to use it, people have to be using it. Facebook and Google certainly have no incentive to promote it.

kjetilk 7 years ago | |

Oh, but Solid isn't just a social network. True, social networks have really powerful network effects, so it is a key to success, but not the only key. We're separating data from apps, which enables permissionless innovation. That means a lot of people can start writing cool things that they just can't now, because they are constrained by those platform companies. We're doing that too. And once people start doing that, every useful app that comes to Solid will grow the platform, first probably as small communities here and there, and then those communities get new connections, and boom, disruption! :-)

rubenverborgh 7 years ago | |

Google might. They do more than advertising, including selling services. They might happily make Google Drive compatible with Solid, especially if this helps break the Facebook monopoly.

Facebook, probably not so much. Their business model is data harvesting.

Regarding Solid, note that we don't want to overthrow or replace any existing social networks. We start with offering experiences they cannot offer due to their siloed nature.

carlsborg 7 years ago |

The specification (set of protocols) is here https://github.com/solid/solid-spec

ngcc_hk 7 years ago |

This misses that about 20% of humanity lives inside a walled national garden. If you have a protocol that is individual- or home-oriented, would it even be allowed to work?

EGreg 7 years ago |

> Since 2010, no single browser has gained more than two thirds of global market share anymore

Pretty sure Chrome did. Or WebKit/Blink family. This is GOOD imho.

rubenverborgh 7 years ago | |

Unfortunately, "pretty sure" is not the way sound arguments are made ;-) Evidence is: https://netmarketshare.com/browser-market-share.aspx?options... and http://gs.statcounter.com/browser-market-share#monthly-20171...

Such a centralization comes with the risk of websites only working with one browser, forcing people to choose a certain device, operating system, and browser vendor.

austincheney 7 years ago |

I don't see any real possibility for decentralization so long as HTTP(S) is the protocol of the web.

zaro 7 years ago |

This sounds like yet another technical solution to a problem that is mostly societal.

sonnyblarney 7 years ago |

All of this is very academic.

Regular people and businesses are always going to make the decision in front of them.

'Decentralization' in and of itself is not something anyone directly cares about. People care about privacy, somewhat, but there are other paths to privacy, or at least, consumers may very well believe there are.

Decentralization will only happen with a real impetus: a product or service that facilitates it, that people want, either for issues related to decentralization or, more likely, for some other reason that just happens to facilitate decentralization.

pbalau 7 years ago |

This is a load of ... You can't decentralize the web for 2 reasons: DNS and SSL. And then you have the IP organisation, the name escapes me right now.

kjetilk 7 years ago | |

Oh, but that's more a matter of where you start and what you bootstrap.

In both cases, DNS and TLS CA-based stuff is about trust. You need to trust the DNS server, as there could be malicious servers sneaking in, and you need to trust the cert.

But once you have a social network with a large strong set, you could base the trust on the strong set, and in particular, individuals in that strong set who can demonstrate that they have a clue.

Once we have that, we can get rid of these achilles heels, but quite frankly, I don't believe in a strategy that takes on those problems first.

Sure, I obviously got OpenNIC in my DNS resolution. Haven't once seen an address that required me to use it beyond when I set it up. I think our approach is much better. Base it on people and the strongest part of their network.

troquerre 7 years ago | |

There are projects aiming to decentralize DNS and SSL as well. Handshake[1] decentralizes root DNS and enforces ownership through public-key cryptography. This gets rid of the need for Certificate Authorities which decentralizes both DNS and SSL in one go.

Disclaimer: building a registrar[2] for Handshake so we're pretty excited about it!

[1] https://handshake.org/ [2] https://namebase.io

peterwwillis 7 years ago | |

Both DNS and TLS PKI (nothing uses SSL) are decentralized by design.

troquerre 7 years ago | | |

DNS at the root level is not decentralized. .com, .io, .net etc are centralized and even owned by for-profit companies.

pbalau 7 years ago | | |

Who decides I can or cannot get foobar.tld?

alexashka 7 years ago |

Huh...

> The situation becomes problematic when we are robbed of our choice, deceived into thinking there is only one access gate to a space that, in reality, we collectively own.

Robbery - the action of taking property unlawfully from a person or place by force or threat of force. [0]

Deceit - The action or practice of deceiving someone by concealing or misrepresenting the truth [1]

That's what those words mean. They also have nothing to do with anything that has happened with the internet over the last 20 years.

[0] https://en.oxforddictionaries.com/definition/robbery

[1] https://en.oxforddictionaries.com/definition/deceit

fwip 7 years ago | |

I don't think quoting the dictionary here makes you look especially smart.

Are you railing against the use of "rob" with an intangible noun? Would you cry foul at phrases like "robbed of their dignity?" Do you ignore alternative definitions like "to deprive of something unjustly or injuriously?"[0]

Do you believe that nobody involved in centralization conceals or misrepresents the truth? Does a marketer never overstate the benefits of their hosted solution?

[0] https://www.dictionary.com/browse/robbed