The Need for Web UDP (github.com)
> 1. Security - it has to benefit from SSL.
There is a lot more to security than transport-layer encryption and authentication.
> Connection based
UDP is too hard, so re-inventing TCP?
> p2p is not reliable ... monetization
Oh, you want to require the SAaSS[1] model.
> Simple to use
The already-stated "requirements" are asking for something more complex than WebRTC.
> Minimum header overhead
Wait, are you thinking about using UDP to transport HTTP?! Do you even know what your MTU is?
> WebRTC suffers from complexity
That complexity exists for a reason. Nowhere in this document is a discussion of the potential problems of using UDP or the ways the new service might be exploited by malicious actors.
[1] Service as a Software Substitute
You are welcome to PR.
> Connection based
This could be explored: starting from a simple handshake, all the way to a fully connection-based protocol. Open for discussion based on developer needs.
> The already-stated "requirements" are asking for something more complex than WebRTC.
You are welcome to highlight the specifics that make you think that way.
> Wait, are you thinking about using UDP to transport HTTP?! Do you even know what your MTU is?
UDP is not a stream-based but a message-based protocol. Just as WebSockets implement their own layer over pure TCP, WebUDP could implement its own layer over pure UDP for various reasons.
> > WebRTC suffers from complexity
> That complexity exists for a reason.
For P2P type communications, this complexity is perhaps reasonable.
For server-client communications, not at all.
> Nowhere in this document is a discussion of the potential problems of using UDP or the ways the new service might be exploited by malicious actors.
This document is an initial effort to start a public discussion and give a reasonable shape to what WebUDP could look like. You are welcome to participate.
I'm very[1] familiar with the IP family of protocols.
> Open for discussion
If you don't know what your requirements are, you shouldn't be choosing a transport technology. It sounds like you want a library that wraps WebSockets or WebRTC and handles most of the complexity.
> WebUDP could implement its own [transport] layer over pure UDP
Then you want TCP. The only reason to use UDP is to avoid the complexities of a transport layer. Transport reliability is very hard; this isn't something that is easy to re-implement by yourself in UDP.
More importantly, I take it you don't know what your MTU is? The Maximum Transmission Unit[2] is the maximum packet size. On Ethernet-based networks, it's probably ~0-100 octets less than Ethernet's 1500-octet MTU. You need to keep UDP packets under this limit, or they will fragment. Fragmented IP packets may not arrive at all, and even when they do, the OS will wait until all fragments arrive before passing the data up to the app. If you're insane and send HTTP headers in each packet, you've wasted most of your data space. Each packet? Shouldn't we send headers in the first packet only? Except that every packet IS the "first packet" in a stateless protocol like UDP. It's the transport features of TCP that create ordered-data semantics.
[1] I used to write firmware for embedded systems. That included writing - from scratch, in Z80 asm - the entire network layer: ethernet (RealTek), ARP, IP, UDP, TCP, SNMP, HTTP, etc.
As soon as your message exceeds the MTU, things get complicated. Sure you can layer something to re-assemble, but if packets are dropping, this is going to start getting problematic really fast. And if packets are not dropping, then TCP shouldn't overly increase latency anyway.
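To put numbers on this, here is a minimal Python sketch of the MTU arithmetic and of a naive re-assembly layer. It assumes IPv4 without options (20-byte header), the 8-byte UDP header, and a 1500-octet Ethernet MTU. The 4-byte chunk header and all function names are made up for illustration and are not part of any proposal.

```python
import struct
from typing import Dict, List, Optional

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes, fixed by the UDP specification
# Hypothetical chunk header: message id (2 bytes), chunk index, chunk count.
CHUNK_HEADER = struct.Struct("!HBB")


def max_udp_payload(mtu: int = 1500) -> int:
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def split_message(msg_id: int, data: bytes, mtu: int = 1500) -> List[bytes]:
    """Split data into datagram payloads that each fit within the MTU."""
    room = max_udp_payload(mtu) - CHUNK_HEADER.size
    pieces = [data[i:i + room] for i in range(0, len(data), room)] or [b""]
    if len(pieces) > 255:
        raise ValueError("message too large for a one-byte chunk count")
    return [
        CHUNK_HEADER.pack(msg_id, index, len(pieces)) + piece
        for index, piece in enumerate(pieces)
    ]


def reassemble(chunks: List[bytes]) -> Optional[bytes]:
    """Rebuild one message from its chunks; None if any chunk is missing."""
    parts: Dict[int, bytes] = {}
    expected = 0
    for chunk in chunks:
        _msg_id, index, expected = CHUNK_HEADER.unpack_from(chunk)
        parts[index] = chunk[CHUNK_HEADER.size:]
    if not chunks or len(parts) != expected:
        return None
    return b"".join(parts[i] for i in range(expected))
```

Note that losing any single chunk loses the whole message, which is exactly why this gets problematic once packets start dropping.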
WebRTC data channels in unreliable mode will work just fine. Is it as easy as opening up a WebSocket connection? No, it's not. Is it as easy on the server side as accepting a WebSocket connection? Also no.
But it really isn't that hard[0], and people have built libraries to help you out. So just use one, and move on with your life.
And you also benefit from a standard that has been fleshed out over multiple years by some very smart (if imperfect) people.
On the browser side, it's already supported by all major browsers, with the notable exception of iOS Safari (which should change this fall with the release of iOS 11)[1]. Even though it's not ideal, you can fall back to WebSocket for the few holdouts.
[0] Source: I've done it before, building it from scratch.
[1] https://gafferongames.com/post/why_cant_i_send_udp_packets_f...
IMO the most complex parts of WebRTC are the SRTP-DTLS encryption (except you also specified TLS as a requirement for Web UDP), and STUN/TURN (which are optional and not required for client-server).
I wish more Firefox OS APIs had become web standards. They would allow for some very powerful PWAs.
[0]: https://developer.mozilla.org/en-US/docs/Archive/B2G_OS/API/...
[1]: https://developer.mozilla.org/en-US/docs/Archive/B2G_OS/API/...
The biggest problem with Firefox OS's TCP standard is that it used an event-driven model, which is somewhat at odds with more current promise-based thinking. The more natural version is to use something based on WHATWG Streams, but that spec has been stuck in vaporware-land for years.
> I wish more Firefox OS APIs had become web standards. They would allow for some very powerful PWAs.
The TCP specification actually was undergoing standardization (reformatted to use the streams API): https://www.w3.org/TR/tcp-udp-sockets/ . The problem was the working group ended up closing down, and since the specification wasn't suitable for use on webpages, it ended up with nobody to host it.
In a Web context this approach wouldn't work and would lead to security issues. Just as there was a need for WebSockets (TCP), there is a need for a similar API for UDP, but it cannot offer raw access for creating UDP connections, as this leads to many security concerns.
And this is a collaborative effort, not a personal one. So all input is welcome.
I've used WebRTC for p2p and server-client cases, and it is a nightmare for the latter. Many other developers have described very similar experiences when it comes to server-client cases.
Moreover, after many years we see very little adoption of WebRTC for server-client cases due to its complexity. WebSockets, on the other hand, took very little time to get adopted by many back-end platforms as well as browser vendors. I wrote my own WebSockets solution a long time ago on .NET, before .NET 4.5 was released (which includes its own WebSockets implementation).
It will be fast only in the beginning when a few clients are participating, but then screw over the infrastructure with degenerative congestive behaviors when "everyone" is on it. And by then, it will be a standard everyone is stuck with, with the only way out being to complicate it with a tirade of hacky refinements based on guesswork combined with crossed fingers.
That's not even considering malicious interference: what sorts of attacks will be discovered on this new UDP based shit, and what sorts of hacks will be required to mitigate them.
Since most traffic for games is server->client, most of the congestion will happen when several users are competing for the same customer link (DSL or cable modem). This already happens with streaming services, and people just yell at each other to stop downloading updates while I'm watching Netflix.
Indeed, the subscriber lines and surrounding edge hardware have not kept up with the times. Depending on where you are and who your provider is, chances are you're getting the same shitty line rates you had ten years ago (or more), though you have more memory, a bigger hard drive and a faster CPU, and the backbone is faster.
No one really wants to support a network with the evil of arbitrary UDP from the browser. In SCTP, the handshake combined with crypto tricks allows a server to make sure the initiator stores a larger cookie than it needs to hold for verification, throttling the DDOS riffraff.
Making SCTP a web standard would improve end-node support (and actual app use), which is beginning to wane; those are SCTP's adoption problems.
I mean, if you're streaming audio and video in real time, is there really any point to TCP? If a few frames get dropped, then bursted back once the connection stabilizes, does that improve the user experience in any way?
WebRTC seems like a perfect candidate for UDP communications for the actual media streams.
The goal of the topic is to explore a simple option for low-latency server-client communication, without reliability and without ordered delivery.
WebRTC can be used for such a case, although it is not designed for it. Because of that, the implementation is very complex and not widely adopted. This is what we are trying to explore: either a new API or simplifications to WebRTC that make it a simple choice for UDP in server-client scenarios.
With UDP you have to create your own feedback mechanism to find the optimal bitrate to stream to the far side at.
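One common shape for such a feedback loop is additive-increase/multiplicative-decrease (AIMD), the same idea TCP congestion control builds on. The sketch below is not from the thread: the class name, the constants, and the 2% loss threshold are illustrative assumptions, and a real media stack would also look at delay, not just loss.

```python
class AimdBitrate:
    """Toy sender-side rate controller driven by receiver loss reports."""

    def __init__(self, start_kbps=500.0, min_kbps=50.0, max_kbps=5000.0,
                 step_kbps=50.0, backoff=0.5, loss_threshold=0.02):
        self.rate_kbps = start_kbps
        self.min_kbps = min_kbps
        self.max_kbps = max_kbps
        self.step_kbps = step_kbps            # additive increase per report
        self.backoff = backoff                # multiplicative decrease factor
        self.loss_threshold = loss_threshold  # assumed tolerance for loss

    def on_report(self, sent: int, received: int) -> float:
        """Update the target rate from one feedback interval."""
        loss = 0.0 if sent == 0 else 1.0 - received / sent
        if loss > self.loss_threshold:
            # Loss suggests congestion: cut the rate sharply.
            self.rate_kbps = max(self.min_kbps, self.rate_kbps * self.backoff)
        else:
            # No meaningful loss: probe for more bandwidth slowly.
            self.rate_kbps = min(self.max_kbps, self.rate_kbps + self.step_kbps)
        return self.rate_kbps
```

The receiver would have to report `received` counts back periodically over the same unreliable channel, which is part of the work UDP leaves to the application.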
At the moment, I see a lot of ridiculous stuff like phone apps talking to some cloud instance which tries to jam the packets back through your firewall into your Internet light bulbs. Congratulations, you literally just used thousands of kilometers of fiber and billions of dollars of routing infrastructure to make the world's most expensive how many... light bulb joke.
I suppose you'll still need to deploy a STUN/TURN server to deal with the NAT issues unless you're happy with IPv6 only, but that's not really something the standard can fix.
I would propose having two HTML profiles in the future, HTML document and HTML application (and maybe HTML legacy). HTML document would be restricted in what you can do, and would be primarily for reading hypertext. For HTML application you would have to go through a few clicks to install or activate it - now you are going to say that people will just click it away, but that is already the case with current desktop app installers, so it is not more insecure! An application profile page will be able to access the net just like any other native application. Most importantly, it will be able to bypass same-origin policy and send UDP and TCP anywhere - but not with credentials of course.
You'd still have the problem of being able to probe internal networks, and being able to manipulate UPnP routers. For the first, the network admin could have a group profile setting or similar to disable this kind of access. For the second, browsers could selectively block this on a case-by-case basis if needed.
For the problem of DDOS, I think we should not let that restrict us from implementing useful technologies. Rather we should fix it at the source. For example, maybe one could lock down certain routes if an attack is detected. All traffic along these routes is throttled, unless you send along a proof-of-work token. I'm just making this up, but my point is that I think we haven't exhausted all options here.
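The commenter says outright that the proof-of-work idea is made up on the spot, so the following is only a sketch of what such a token could look like, in the style of hashcash: the sender searches for a nonce whose SHA-256 hash starts with a given number of zero bits, which is costly to find but takes a single hash to verify. The function names, the challenge format, and the difficulty value are all assumptions.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def _digest(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty: int) -> int:
    """Find a nonce; expected work is about 2**difficulty hashes."""
    for nonce in count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """A single hash is enough to check the sender's work."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty
```

A throttled route would hand out the challenge and raise `difficulty` under attack; how routers would agree on challenges is left entirely open here, as it is in the comment.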
WebRTC isn't very complicated.
The hardest part is probably ICE, which basically involves each point telling each other what they see, and potentially consulting a third party (STUN/TURN). I'd love to see more magic there, but once that's in place, I don't see what's so hard about just using DataChannels.
One idea might be to put signalling into HTTP headers, e.g. have the client and server introduce something like:
ICE: sdp-desc...
and if so, allow WebRTC to skip the ICE negotiation step if speaking to the server.
The only way to use HTTP/QUIC for packet-like communication might be to send each packet inside a separate HTTP request. But I guess that will have a super high overhead (the lifecycle of a whole stream must be managed for a packet which actually has no lifecycle) and will most likely also not deliver the expected results (afaik HTTP on top of QUIC still has head-of-line blocking for request/stream establishment. Request headers must be received in order).
New JavaScript APIs which utilize QUIC could work. However, one would need to explore whether QUIC is actually helpful for the target applications, since it provides stream semantics, whereas UDP is purely packet based. QUIC might also introduce issues on the server side similar to WebRTC's: it's a complex protocol spanning multiple layers (everything from flow control to encryption). Therefore it will be hard to integrate into environments where no QUIC library is available. But that's only a feeling, since I haven't yet reviewed the QUIC specification in detail.
Will the documentation/RFCs encourage folks to fail gracefully if UDP is not supported in their network?
Could this spec include support for SRV records? It isn't allowed in HTTP/1.1.
... /s (hopefully)
It exposes many security concerns; that's why WebSockets were favoured over TCPSocket. We want something similar for UDP.
Does anyone know anything about that?
"This ISG is seen as a transitional group i.e. a vehicle for the 5G community (and others of interest) to first gather their thoughts and prepare the case for the Internet community’s engagement in a complementary and synchronised modernisation effort."
The effort seems to be at quite an early stage for now (architecture, models, requirements, etc.).
I personally don't see TCP/IP going anywhere with 5G, but we may see more parallel deployments of protocols within isolated 5G network slices.
[1] http://www.etsi.org/technologies-clusters/technologies/next-...
It'd probably be lighter weight than WebRTC, which is IMHO an over-engineered nightmare. I'd like to see just the A/V encode/decode parts of WebRTC live on and the rest of it get deprecated in favor of web UDP and open-ended browser based P2P implementations. That's what should have happened, not a monolithic katamari ball of a standard.
The SSH and mosh apps use a binding between the POSIX socket interface on the Native Client side and these APIs on the JS side.
WebUDP wouldn't necessarily make the situation any worse.
1. The receiving end must consent to receiving packets by completing the DTLS handshake required by WebRTC.
2. The browser enforces SCTP congestion control to avoid spamming large numbers of UDP packets. Basically, it'll start throttling you when it detects packet loss.
The only way this could be done safely is if you are required to establish a TCP connection, and then "upgrade" it to UDP. There are too many pointy sticks that developers could impale themselves on, with the side effect of creating DDoS vectors.
Since it's restricted by origin (probably using literally the same mechanism CORS uses) and over HTTPS, a malicious actor can't DDoS anyone but themselves.
If you're really worried, maybe extend it to allow the server to limit the duration of the permission and the maximum bandwidth, but I think you don't need that.
EDIT: Oh, oops, binding to the hostname doesn't help because you can just repoint your hostname in DNS at someone else. You'd ideally need to bind it to the IP, in which case you definitely want it to be a time-limited permission. (But I think that attack is also feasible, though a bit harder, with the handshake-before-each-UDP-connection model. Just have the attacker set up a custom DNS server for their domain, which for any hostname sends you a low-TTL response pointing to the attacker's HTTPS server, and then a higher-TTL response to the victim. Each user gets pointed at a random hostname within that domain.)
UDP is arguably faster in some ways, provided the application load is managed well.
Implementing a WebRTC data channel endpoint in a server is not for the faint of heart, though. You would have to implement a lot of complex RFCs.
No same-origin-policy would be lovely combined with XSS vulnerabilities.
Suddenly all the visitors of that website would be doing DDOS on a random host.
Also, people already exploit XSS for DDOS-ing, although not via UDP, but TCP/HTTP. Granted, you can possibly make a worse attack if you have UDP.
If every web page would effectively reinvent their own transport-level protocol* including subtly different connection handling, congestion behavior and drop/reorder tolerance, that sounds like it would make life a nightmare for all network intermediaries.
(* I know the transport-level protocol is technically UDP but that doesn't count because all the practically relevant aspects are defined on top of it)
Also, ICE etc solves a real problem - that users of Web UDP would have to deal with as well. Why demand that everyone reinvents the solution for themselves if we can include a standard solution in the browsers for everyone?
Yes, if you just consider a client-server model where the server has a public, routable, non-firewalled IP, then you can do away with ICE/STUN/TURN, and it gets simpler. But that's not what WebRTC was designed for.
The SDP stuff might seem a bit arcane, and I would agree with you, but it's a widely-used telephony industry standard and there are libraries that will generate/parse them for you without you needing to do much.
Bona fides: I've implemented server-side WebRTC in C++ and Java. Was it trivial? No. Was it ridiculously difficult? No.
Building blocks and programmability are better for a long-lived platform like the web than giant inflexible monoliths. UDP and WebAssembly let you implement the stack you describe and anything else that comes down the pike.
Because of that, WebSockets were created so that the handshake is handled by the browser, transparently to the developer, ensuring that port scanning is not possible and preserving the origin-based security model of HTTP. We want the same for WebUDP.
UDP has been used by many industries for a long time and is a well-known protocol to build a foundation on.
Compare it to the speed of adoption of WebSockets, the variety of WebSockets implementations, and how much they are used commercially.
There are many conversations among developers who have also attempted to use WebRTC for server-client communications, and the trend is apparent.
One of the developers on the WebRTC team at Google, who worked on DataChannel and the network code, admits that the team is aware of the complexity and difficulties of WebRTC in server-client cases: https://news.ycombinator.com/item?id=13266874
WebRTC is specifically designed for server-client cases. In the case of video and audio, the server is called a "mixer" or MCU. This is extremely important for large multi-user conferences - if it weren't for a mixer, you'd have to upload your video once for every user, burning all of your upload bandwidth. Examples of services using this are Appear.in Premium and Cisco Spark.
Note that games are the #1 use case mentioned in the Datachannel draft [1]. It's perfectly fine to argue that the standard is bad, but it was designed for these use cases.
The Google developer you linked seems to be arguing for improving the WebRTC C++ library, not changing the protocol.
[1] https://tools.ietf.org/html/draft-ietf-rtcweb-data-channel-1...
This is exactly what I pointed out: WebSocket has a much larger implementation base across browsers and back-ends due to its simplicity.
> There is also a list of various transparent fallbacks that can be implemented like long polling which WebRTC doesn't have.
Those fallbacks were temporary solutions during the adoption of WebSockets. Today they are obsolete, as WebSockets are widely adopted and a pure WebSockets implementation is fairly simple.
This is what we want from WebUDP for server-client cases, with a fallback to WebSockets if WebUDP is not supported.
There is also a list of various transparent fallbacks that can be implemented like long polling which WebRTC doesn't have.
You could handle it with some kind of cookie/token that enables the LB to route to the right place, but that opens a whole bunch of other complicated logic too.
Whereas if the UDP application/server were able to handle it independently of HTTP(S), you wouldn't need any of that.
The userspace SCTP stack that gets built into Firefox for use by WebRTC will do it too.
Similarly, once you have a userspace SCTP stack you have allowed the garbage to reach userspace resources.
Everyone knows how to not be a jerk while using UDP or SCTP, but people who have the goal of being a jerk are more manageable if you only give them remote SCTP access.
My home firewall is set up to allow anything originating here to pass but block most things from outside. For this to work, the firewall needs to be able to track the state of the protocol exchange, which will be different for each protocol. Few firewalls can do this for SCTP or DCCP yet; I'm in the process of adding SCTP support to the one that I use.
But SCTP implements reliability and ordered delivery making it more of an alternative to TCP, than a solution for low-latency communication with cases where reliability and/or ordered delivery is not required.
[1] https://en.wikipedia.org/wiki/Datagram_Congestion_Control_Pr...
> SCTP supports the transportation of user messages that have no
> application-specified order, yet need guaranteed reliable delivery.
But most people don't really care (that insisting on re-delivering out-of-date packets will use some bandwidth); they just have an existing UDP app they want to port to the web, and converting it to SCTP is easy enough (I think I've even used an LD_PRELOAD to convert for that? But I may be confusing a different ULP substitution).
Making them do that and set everything up is actually a good mechanism for protecting the rest of the internet and allowing a web SCTP that browsers could enable even by default, with less risk of in page ad hijackings, etc. Allowing UDP is something that I hope every browser leaves as impossible to allow without going into configuration or being in an entirely different use context of web APIs than a browser.
And while you seemingly have a lot of experience with networking programming, and I have hardly any, I think I can easily correct you on the argument of a small MTU.
Many websites that have a more real-time networking need currently use WebSockets over a TCP channel separate from the HTTP stream. Over this channel many small messages are sent, for example updating the player's coordinates in a game. It is tolerable that some of these messages may be delivered out of order or may be lost.
Wouldn't that be a good fit for UDP?
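A minimal sketch of that pattern, assuming a made-up 12-byte datagram layout (a 32-bit sequence number plus two 32-bit float coordinates): the receiver keeps only the newest position and ignores anything older, so a lost or late packet never needs a retransmission. The names `encode_update` and `LatestPosition` are illustrative, and sequence-number wraparound is ignored for brevity.

```python
import struct

# Hypothetical wire layout: sequence number, x, y (network byte order).
UPDATE = struct.Struct("!Iff")


def encode_update(seq: int, x: float, y: float) -> bytes:
    """Pack one position update into a single small datagram payload."""
    return UPDATE.pack(seq, x, y)


class LatestPosition:
    """Keeps the newest position; stale or duplicate datagrams are dropped."""

    def __init__(self):
        self.seq = -1
        self.position = None

    def on_datagram(self, payload: bytes) -> bool:
        """Apply an update; returns False if it was older than what we have."""
        seq, x, y = UPDATE.unpack(payload)
        if seq <= self.seq:
            return False  # arrived late or twice, newer state already applied
        self.seq = seq
        self.position = (x, y)
        return True
```

Over TCP, the late update would instead hold up every newer one behind it until it was retransmitted.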
Aside from that, you say that "the only reason to use UDP is to avoid the complexities of a transport layer", which I think is exactly right. You then say "transport reliability is very hard", which I can imagine it must be.
But then transport reliability is only one of the possible features of a transport layer. The whole point is that when you have access to UDP as a transport primitive, you can pick and choose which transport layer features you do and don't require on top of that.
> Wait, are you thinking about using UDP to transport HTTP?! Do you even know what your MTU is?
> If you're insane and send HTTP headers in each packet, you've wasted most of your data space.
I think header in the requirement just means protocol/PDU header, which is required to deploy any kind of higher-level protocol on top of UDP, just as the UDP and IP headers sit at the start of the data received in Ethernet frames. These headers can be anything depending on the use case, even a single byte or zero bytes. Nowhere does the document mention sending HTTP headers on top of UDP (which obviously could cause MTU problems, although this kind of mechanism was nevertheless standardized with CoAP).
Great, you have expertise to contribute. And with this discussion you already are.
> If you don't know what your requirements are, you shouldn't be choosing a transport technology. It sounds like you want a library that wraps WebSockets or WebRTC and handles most of the complexity.
There are "libraries" that wrap some protocols. They don't make adoption of the underlying technology (WebRTC in this case) any easier.
This effort is to establish clear requirements, not yet to propose an exact solution. So this discussion is just a process of establishing those requirements.
High-level requirements are clearly stated in the document, and an important one is simplicity. Currently there is no simple solution for low-latency server-client communication.
> > WebUDP could implement its own [transport] layer over pure UDP
> Then you want TCP. The only reason to use UDP is to avoid the complexities of a transport layer. Transport reliability is very hard; this isn't something that is easy to re-implement by yourself in UDP.
A layer on top doesn't have to involve full-on replication of TCP techniques or alternatives. WebSockets, likewise, add very little to pure TCP: just data framing (a header) for messages.
> More importantly, I take it you don't know what your MTU is? The Maximum Transmission Unit[2] is the maximum packet size. On Ethernet-based networks, it's probably ~0-100 octets less than Ethernet's 1500-octet MTU. You need to keep UDP packets under this limit, or they will fragment. Fragmented IP packets may not arrive at all, and even when they do, the OS will wait until all fragments arrive before passing the data up to the app. If you're insane and send HTTP headers in each packet, you've wasted most of your data space. Each packet? Shouldn't we send headers in the first packet only? Except that every packet IS the "first packet" in a stateless protocol like UDP. It's the transport features of TCP that create ordered-data semantics.
This is really valuable input, and is exactly what is needed to fuel further requirements for an implementation.
You know, you can provide valuable input without trying to "argue" ;)
"Alternative answer: be careful to not reinvent the wheel.
TCP is the product of decades of networking experience. There is a reason for every or almost every thing it does. It has several algorithms most people do not think about often (congestion control, retransmission, buffer management, dealing with reordered packets, and so on).
If you start reimplementing all the TCP algorithms, you risk ending up with a (paraphrasing Greenspun's Tenth Rule) "ad hoc, informally-specified, bug-ridden, slow implementation of TCP".
If you have not done so yet, it could be a good idea to look at some recent alternatives to TCP/UDP, like SCTP or DCCP. They were designed for niches where neither TCP nor UDP was a good match, precisely to allow people to use an already "debugged" protocol instead of reinventing the wheel for every new application."
And you are right - we do not want to reinvent the wheel.
And once you have that permission, you never need to make an HTTPS connection again.
What you do care about is that your actual application traffic, such as your login session, and your UDP traffic have some way of being associated with each other, but you have that problem regardless of how the client (whether it's a browser client or a normal desktop/mobile app) gets permission to send UDP.
As an example, the user could visit example.com, load some HTML and JS, send a request to login.example.com, get a session key, send a single HTTPS request to data.example.com exactly once, and then send UDP to data.example.com protected by that session key. You never send HTTPS to data.example.com again; from this point onwards you only send UDP. Coordination between your HTTPS login server and your UDP data server is no different in this model from the native app model.
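One way the "protected by that session key" step could look, sketched with Python's standard `hmac` module. This only authenticates datagrams and does not encrypt them. The packet layout, the 16-byte truncated tag, and the function names are assumptions for illustration; a real design would more likely use DTLS or an AEAD cipher, and would also need to reject replayed sequence numbers.

```python
import hashlib
import hmac
import struct

TAG_LEN = 16               # truncated HMAC-SHA256 tag, an arbitrary choice here
SEQ = struct.Struct("!Q")  # 64-bit sequence number at the start of each datagram


def seal(session_key: bytes, seq: int, payload: bytes) -> bytes:
    """Build a datagram: sequence number, payload, then an HMAC tag over both."""
    body = SEQ.pack(seq) + payload
    tag = hmac.new(session_key, body, hashlib.sha256).digest()[:TAG_LEN]
    return body + tag


def open_datagram(session_key: bytes, datagram: bytes):
    """Return (seq, payload), or None if the datagram is malformed or forged."""
    if len(datagram) < SEQ.size + TAG_LEN:
        return None
    body, tag = datagram[:-TAG_LEN], datagram[-TAG_LEN:]
    expected = hmac.new(session_key, body, hashlib.sha256).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        return None  # wrong key or tampered contents
    (seq,) = SEQ.unpack_from(body)
    return seq, body[SEQ.size:]
```

The UDP server only needs the session key that the login server issued, which is the same coordination a native app would require.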
I guess what I'm trying to say is UDP could work, but I think trying to bootstrap the initial connection/session info with HTTP is probably going to be FAR more trouble than it's worth.
Doing that with NAT is even trickier. Take a look at the way some firewalls need to configure a DMZ for gaming, or SIP for some examples.
The point that seems to be getting lost is not that I want SCTP or DCCP support. It's that I don't think anyone should accept anything that could become an arbitrary UDP access loophole. The point of the current path is to replace the problem techs and add use cases safely as we go, to gradually pay for a better network by making standards that aren't just the easiest thing for web-devs.
Every time someone tries to walk too close to the edge in a way that can open security problems for people who aren't running a server fully opted-in to web 2.0/etc, they risk a security backlash that could ban browser updates and effectively delay/kill unrelated innocent features and fixes with ripples for ~5-10 years.