Building video chat into my personal website using WebRTC, WebSockets, and Go (mattbutterfield.com)
This is the version of the JS code that I got going (I couldn't reason about straight inline scripting, so I had to make unnecessary classes; you don't need them): https://gist.github.com/emehrkay/1ea9a87a91e00b27843d9b71a3c...
You also need to tell nginx to proxy the wss connection with HTTP/1.1, or the handshakes fail:
```
location /websocket/path {
    proxy_pass http://whateverSiteDotCom;
    proxy_http_version 1.1;
    proxy_set_header Connection "upgrade";
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Origin '';
}
```
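For anyone wondering why those directives matter: by default nginx proxies with HTTP/1.0 and does not forward the hop-by-hop `Upgrade` and `Connection` headers, so the backend never sees a valid WebSocket handshake. Here is a minimal Go sketch of the check a backend effectively performs. The names `isWebSocketUpgrade`, `headerHasToken`, and `newRequest` are made up for illustration and are not from the post or from any WebSocket library.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// isWebSocketUpgrade reports whether r carries what a WebSocket handshake
// needs: HTTP/1.1 or later, "Connection: upgrade" and "Upgrade: websocket".
// (Hypothetical helper; real libraries also validate Sec-WebSocket-* headers.)
func isWebSocketUpgrade(r *http.Request) bool {
	if !r.ProtoAtLeast(1, 1) {
		return false
	}
	return headerHasToken(r.Header, "Connection", "upgrade") &&
		headerHasToken(r.Header, "Upgrade", "websocket")
}

// headerHasToken does a case-insensitive search for token in a
// comma-separated header value.
func headerHasToken(h http.Header, name, token string) bool {
	for _, v := range h.Values(name) {
		for _, part := range strings.Split(v, ",") {
			if strings.EqualFold(strings.TrimSpace(part), token) {
				return true
			}
		}
	}
	return false
}

// newRequest builds a request the way the backend might see it from the proxy.
func newRequest(proto string, minor int, headers map[string]string) *http.Request {
	r := httptest.NewRequest(http.MethodGet, "/websocket/path", nil)
	r.Proto, r.ProtoMajor, r.ProtoMinor = proto, 1, minor
	for k, v := range headers {
		r.Header.Set(k, v)
	}
	return r
}

func main() {
	// Roughly what arrives without the extra directives.
	plain := newRequest("HTTP/1.0", 0, map[string]string{"Connection": "close"})
	// Roughly what arrives with proxy_http_version 1.1 and both headers set.
	fixed := newRequest("HTTP/1.1", 1, map[string]string{
		"Connection": "upgrade",
		"Upgrade":    "websocket",
	})
	fmt.Println(isWebSocketUpgrade(plain), isWebSocketUpgrade(fixed)) // prints "false true"
}
```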
A bit dense, and could use some error handling...but it actually seems to work fine!
- this blogpost: 100 loc
- pion (open source): 100k loc?
- dolby.io / agora: I'm guessing >1m loc
- zoom.... even more?
Not to mention all the features that people actually want like muting, toggle video, noise detection/cancellation.
So yeah, setting up a P2P video chat in 2021 is somewhat easy. Until it's not.
So of course it's hard. Nothing is built with video chat in mind, especially nothing that's existed for 30+ years like the Web. Our solutions are janky and feel bolted-on because they are.
Also, I think video (especially live-streamed video) is hands-down the hardest format to work with in computing. It's simultaneously network, disk, memory, and processor intensive, and doubly so with 2+ streams at the same time. We try to fix some of this with compression, but that just makes the codecs more complex, which makes it harder to work with...
Truth is though, you could "just add video chat," if you accept using a video chat vendor, of which there are probably hundreds (WebEx, Google Meet, Microsoft Teams, Discord, off the top of my head). But that means offloading the complexity to someone else. In many cases that's the right call. In OP's case this was clearly meant to be a learning experience so rolling something DIY is of course acceptable. Hard to estimate, maybe, but of course it would be hard to estimate something you don't know anything about and have never done before. Would a 4th grader be good at estimating how long it would take them to learn enough abstract algebra to start publishing papers on it?
EDIT: I guess one part might be that people are less likely to recognize specializations than in other disciplines?
"Oh yeah, and maybe some power, IDK maybe three phase. Also PoE and smart lighting and some insulation and enough room for a CNC machine and..."
It would be better if this sort of thing was heavily caveated ("this is the Hello World of WebRTC") because otherwise a lot of people (non-technical types, junior engineers) see it and think -- well we can do that, should take us a few weeks max.
sean@SeanLaptop:~/go/src/github.com/pion$ find . -type f -name '*.go' | xargs wc
47871 162998 1394063 total
pion/webrtc is the largest package with 58k lines. Every other package (ICE, DTLS, SCTP....) is around 20k lines. It feels wrong that WebRTC is so large (and not pushed into sub packages); will for sure be digging into that for fun in the next few weeks :)

- A lot of examples: https://github.com/pion/webrtc/tree/master/examples
- A lot of tests
- If you exclude `_test.go` and `examples/` you are down to ~58k lines, which is only ~3x bigger than the (much simpler!) ICE and SCTP packages.
With a naive exclude via grep -v '_test.go' and grep -v 'examples/*' we are down to:
16180 58136 498113 total
To stably build a negotiation system you'll probably need an infrastructure of websockets and some kind of nosql db to handle identity and other quirks around negotiation...
Example... how do you handle refresh from a new tab or after the connection has dropped... some kind of device signature is probably needed too!!
(We've just spent a year building this for ecommerce @ https://yown.it)
BIG thumbs up for the interest in WebRTC though, enormous potential...
It was really hard to make p2p work and debugging the ice connections was even harder.
So a little clickbaity title. If the backend wasn't distributed the title would be a little more apt.
If you are looking for a native option use [0] or [1] and you can send anything from ffmpeg to WebRTC. ffmpeg itself doesn't support WebRTC, so you need to use something else for the last part.
[0] https://github.com/rviscarra/webrtc-remote-screen
[1] https://github.com/pion/webrtc/tree/master/examples/rtp-to-w...
Have a look at WebTransport to see a future alternative with potential.
For those who are interested, the technical term is signalling (not negotiation), and there are many providers that will help with that (ably.com, pubnub.com, pusher.com), you don't need to build your own infrastructure. WebSockets is also just one option.
Using an SFU/MCU is almost a requirement for multi-person calls, becoming more important for bigger groups.
I had a look at yown.it, I don't know what it does, your description of it is a bit vague. Those problems you mention are not hard to solve: "device signature"? You just set a cookie. Connection dropped? Cookie got you covered. New tab? Cookie got you covered. Refresh? Cookie got you covered.
Other interesting technologies are:
Twilio's network traversal service: https://www.twilio.com/stun-turn
Agora's higher level products (e.g. video call, voice call) https://www.agora.io/en
Given we allow anonymous connections, we need to associate each WebRTC connection with user defined data (read user profile). It's not quite as simple as "a cookie" because one user can have multiple devices, updated user information has to sync across the other connections and for a smooth experience you have to have synced connection statuses.
We did look at syncing all this with RTC data channels, problem... you can't get message history and you also can't depend on the channel until after a successful negotiation, which again for us is only part of the larger infrastructure...
This forces the use of a parallel comms system such as websockets, allowing for event based synchronisation as well as the organisation of the WebRTC metadata both pre and post connection...
Most people don't want "naked javascript" with two faces on it, and WebRTC is a fantastic tool for video and audio streaming, however it is limited in its wider use (which is perfectly fine it does enough!)...
I think the problem is that people associate "video chat" with simply the media streaming, whereas the reality is that integrating it into a feature rich front end framework is significantly more complicated, and not simply a case of "adding a cookie"
The difference between the solutions you posted and websockets is as far as I can tell, "your own websockets" or "pay someone else to run your websockets".
I am working on an open source book that includes a WebRTC networking chapter[0]. Would love your opinions/feedback on whether this would actually have been helpful when learning this stuff!
I too experimented with a p2p golang webchat setup. All the jargon was confusing and very hard to look up. This post has already given me much more clarity!!
1. Backed by an OS manufacturer that doesn't care about the web
2. Spends more time working on features that suit itself than meeting standards agreed upon by a body of which they're a part.
3. The only sanctioned/allowed browser on their platform (MS didn't even achieve this holy grail)
4. Lagging behind most other popular browsers by years in some cases
But due to it being the ONLY browser that'll run on iOS, I have no choice but to dumb down user experience for it. This year's lovely issue has been MediaRecorder - but supposedly that's made it into the most recent release.
Which, as it turns out, is a lot of users. I've seen estimates in the range of 10 to 20% of users. Which means, for a random selection of 7 users, you pretty much have a 50/50 chance of not being able to peer everyone using just STUN.
Unless you're capping the video bitrate, the browser will try to use whatever its default target is, for each connection. On Chrome that's 3 Mb/s, which is a lot of network bandwidth, and turns out to be a lot of CPU as well just shuffling those packets through the encoding->sending->bandwidth-estimation and receiving->decoding->rendering pipelines.
Capping the video bitrate is more complicated and confusing than it should be. It's better now that the browser implementations are all more or less closing in on "WebRTC 1.0" compliance. But you still need to reach into either the raw SDP you are exchanging during signaling, or the RTCPeerConnection objects, and set the encoding bitrate target.
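For the raw-SDP route, here is a rough sketch of what the munging can look like if the signaling server is written in Go. It inserts a `b=AS:<kbps>` line after the `c=` line of each video section. `capVideoBitrate` is a made-up helper and not part of pion or any browser API, and browsers have differed in which bandwidth modifier they honor, so treat this as an illustration rather than a guaranteed cap. The other route mentioned above is to set `maxBitrate` on the sender's encoding parameters in the browser.

```go
package main

import (
	"fmt"
	"strings"
)

// capVideoBitrate inserts a "b=AS:<kbps>" bandwidth line into every video
// media section of an SDP blob, right after that section's "c=" line.
// Existing "b=AS:" lines in video sections are dropped so the cap is not
// duplicated. (Hypothetical helper for illustration only.)
func capVideoBitrate(sdp string, kbps int) string {
	lines := strings.Split(sdp, "\r\n")
	out := make([]string, 0, len(lines)+1)
	inVideo := false
	for _, line := range lines {
		if strings.HasPrefix(line, "m=") {
			// Every "m=" line starts a new media section.
			inVideo = strings.HasPrefix(line, "m=video")
		}
		if inVideo && strings.HasPrefix(line, "b=AS:") {
			continue // replaced by the new cap below
		}
		out = append(out, line)
		if inVideo && strings.HasPrefix(line, "c=") {
			out = append(out, fmt.Sprintf("b=AS:%d", kbps))
		}
	}
	return strings.Join(out, "\r\n")
}

func main() {
	// A heavily trimmed SDP, just enough to show where the line lands.
	sdp := "v=0\r\n" +
		"m=audio 9 UDP/TLS/RTP/SAVPF 111\r\n" +
		"c=IN IP4 0.0.0.0\r\n" +
		"m=video 9 UDP/TLS/RTP/SAVPF 96\r\n" +
		"c=IN IP4 0.0.0.0\r\n" +
		"a=rtpmap:96 VP8/90000\r\n"
	fmt.Print(capVideoBitrate(sdp, 500))
}
```

Only the video section gets the extra line; the audio section is left alone.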
The SaaS platforms that offer WebRTC APIs and infrastructure all do a lot of work under the covers to set bitrate caps, track constraints (resolution, for example), and other bits and pieces of WebRTC config that work well on a wide variety of networks, devices, and browsers.
There is a little more nuance than just paternalistic networks though. In some cases, like NAT mapping exhaustion, you just can't give an individual user multiple long-lived mappings. Address-dependent filtering/mapping also makes sense in some cases. It makes P2P harder, but does give you the ability to provide your users more sessions at least!
https://medium.com/the-making-of-whereby/what-kind-of-turn-s...
If I were to guess, the problem GP is facing is bandwidth: a mesh network uses far more of it. Across the whole call the number of streams grows quadratically, but for each user the bandwidth is linear: N more people require N more streams' worth of bandwidth. This is fine for downloads, but uploading N more streams can be much more challenging for certain networks.
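To put rough numbers on the mesh case, here is a small sketch. It assumes a full mesh where every participant sends one copy of their video to every other participant, and it uses the 3 Mb/s Chrome default quoted elsewhere in this thread as the per-stream rate. Both function names are illustrative.

```go
package main

import "fmt"

// meshUploadMbps is the upload one participant needs in a full mesh of n
// people: one outgoing stream per remote peer.
func meshUploadMbps(n int, perStreamMbps float64) float64 {
	return float64(n-1) * perStreamMbps
}

// meshTotalStreams is the number of one-way streams in the whole call:
// each of the n participants sends to the other n-1.
func meshTotalStreams(n int) int {
	return n * (n - 1)
}

func main() {
	const perStream = 3.0 // Mb/s; assumed uncapped per-connection target
	for _, n := range []int{2, 4, 7} {
		fmt.Printf("%d people: %.0f Mb/s upload each, %d streams in total\n",
			n, meshUploadMbps(n, perStream), meshTotalStreams(n))
	}
}
```

For a 7-person call that is 18 Mb/s of upload per participant and 42 streams overall, which is why an SFU or a bitrate cap starts to look attractive.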
Way more for customers that are mostly serving corporate users, of course (firewalls). And more for mobile-heavy user populations.
Actually, that's a good reminder that it would be nice to understand the mobile data networks breakdown in more detail. Most of the US mobile data networks require TURN, as far as I remember when I last looked at this. But I don't know if that's true everywhere in the world.
I wonder if we come back to this in 10 years what this number will be. Linux on the desktop and IPv6 is just around the corner...
"We did look at syncing all this with RTC data channels,", that's when you use a reliable service with additional functionality like history and presence, not WebRTC data channels, that might be why you struggled. It sounds like you should be using WebSockets for this type of data.
It sounds like you're trying to build chat for ecommerce websites, but isn't that Intercom, or tidio.com (free tier alternative)? Agora is lower level, but also solves these problems and more: messaging, audio, video calls. I don't think any of these offer cross-device identification without having users log in on all their devices.
Exactly, we have...
I wrote a little blog about it: https://yown.it/live-video-call-webrtc
If that doesn't explain it well enough I'll just assume you are being intentionally aggressive!
None of the above solutions enable users to easily manage personalized commerce experiences without paying a developer!!
> Exactly, we have...
I guess it also works when using TOR...
And UPnP (Universal Plug and Play) sounds like it's for device discovery in the same local network, so again, it doesn't sound related to WebRTC; we can connect directly with each other on the same local network anyway.
UPnP has a number of functions, including forwarding of WAN packets to a specific LAN device.
Bahaha, brilliant.
iOS binaries that are not signed by apple are not permitted to mark memory pages as executable if they have previously been writable.
This restriction makes exploiting buffer overruns very difficult on iOS - particularly important as objective-c doesn't give you much help avoiding them.
However, you can't write a runtime compiler unless you can generate bytecode (write) and then execute it, and nobody has found a way to write a performant javascript or CSS engine without some form of runtime compilation.
So, Apple does allow you to write your own browser backends, but they won't give their signature, which would permit you to use riskier techniques to gain performance.
In practice, that means any browser not using the safari engines would be unacceptably slow on the modern web.