Building a BitTorrent client from the ground up in Go (2020) (blog.jse.li)

184 points by stevenmh 1 year ago | 89 comments

pzmarzly 1 year ago |

It is easy to make a torrent client, but very hard to make a good torrent client. A very good, or let's say "perfect", one needs to support multiple transport protocols (TCP, "uTP" aka UDP, "WebTorrent" aka WebRTC), multiple discovery mechanisms (DHT, PEX, HTTP trackers, WebSocket trackers), multiple torrent formats (v1, v2, hybrid), should use the network optimally (max out the speed without overloading the network - IIRC some clients measure average packet latency and apply some backpressure if it starts going up), resolve magnet URLs, set up port forwarding, reconfigure firewalls, offer an API for the *arr stack, be a good netizen (report stats correctly, send packets within the specs, do not spam - otherwise other clients will blocklist you in their code or config), implement many BEPs (mutable torrents are cool), be able to recover from an interrupted state based only on the data that's on disk, have a configurable downloading order (people want to start playing videos before they finish downloading, so you may want to e.g. download the header and footer of each file first), and ideally detect duplicates between torrents (cross-seeding). And then there will be people throwing 2TB+ torrents at it (e.g. TLMC) to benchmark it, and saying your client is "literally unusable" if it doesn't handle it.

So, building a "perfect" torrent client from the ground up is a daunting task. But the "good" news is that nobody built such a "perfect" client just yet, so if you have some spare months of your time, you can take a shot at it. Or even better yet, open the issue tracker for one of the popular clients or libraries, and add one of the missing features from the list above.
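To give a feel for the smallest item on that list, "resolve magnet URLs", here is a minimal Go sketch of the first step only: pulling the v1 info hash, display name and tracker list out of a magnet URI. The `Magnet` type and `ParseMagnet` function are hypothetical names chosen for illustration, and the sketch assumes a v1 `urn:btih:` hash in hex or base32. Actually fetching the torrent metadata from peers afterwards is a much larger job and is not shown.

```go
package main

import (
	"encoding/base32"
	"encoding/hex"
	"errors"
	"net/url"
	"strings"
)

// Magnet holds the magnet-link fields this sketch understands (hypothetical type).
type Magnet struct {
	InfoHash    [20]byte
	DisplayName string
	Trackers    []string
}

// ParseMagnet extracts the v1 info hash (xt=urn:btih:...), the display
// name (dn) and the tracker URLs (tr) from a magnet URI.
func ParseMagnet(uri string) (Magnet, error) {
	var m Magnet
	u, err := url.Parse(uri)
	if err != nil {
		return m, err
	}
	if u.Scheme != "magnet" {
		return m, errors.New("not a magnet URI")
	}
	q := u.Query()
	const prefix = "urn:btih:"
	for _, xt := range q["xt"] {
		if !strings.HasPrefix(xt, prefix) {
			continue // other hash types (e.g. v2) are not handled in this sketch
		}
		enc := strings.TrimPrefix(xt, prefix)
		var raw []byte
		switch len(enc) {
		case 40: // hex-encoded 20-byte SHA-1
			raw, err = hex.DecodeString(enc)
		case 32: // base32-encoded 20-byte SHA-1
			raw, err = base32.StdEncoding.DecodeString(strings.ToUpper(enc))
		default:
			err = errors.New("unexpected info hash length")
		}
		if err != nil {
			return m, err
		}
		copy(m.InfoHash[:], raw)
		m.DisplayName = q.Get("dn")
		m.Trackers = q["tr"]
		return m, nil
	}
	return m, errors.New("no urn:btih info hash found")
}
```

Calling `ParseMagnet("magnet:?xt=urn:btih:<40 hex chars>&dn=example")` returns the decoded 20-byte hash and the name; anything that is not a magnet URI yields an error.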

dgroshev 1 year ago | |

A few more challenges:

- seeding a million active torrents (this is entirely possible and needed but drives a lot of architecture)

- a UI for a million active torrents

- be nice to trackers (keep-alive, batched reports; again, architecture gets tricky, IIRC libtorrent still can't do that)

- correct pipelining

- congestion control on uTP

- I/O scheduling/batching

- testing all of the above

It is quite a lot.
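For the "congestion control on uTP" item, the core idea is delay-based: back off when queuing delay rises instead of waiting for packet loss. Below is a heavily simplified Go sketch modeled on the window update in the LEDBAT specification (RFC 6817). It is not libtorrent's or any client's actual implementation; the `Window` type, the 1400-byte packet size and the minimum window are assumptions made for illustration, and real implementations also handle loss, base-delay history and clock drift.

```go
package main

import "time"

const (
	targetDelay = 100 * time.Millisecond // queuing delay the sender aims for
	gain        = 1.0                    // how aggressively the window reacts
	mss         = 1400.0                 // assumed payload bytes per packet
	minCwnd     = 2 * mss                // never shrink below two packets
)

// Window is a delay-based congestion window (hypothetical type).
type Window struct {
	Cwnd      float64       // bytes allowed in flight
	baseDelay time.Duration // smallest one-way delay observed so far
	haveBase  bool
}

// NewWindow starts with the minimum window.
func NewWindow() *Window {
	return &Window{Cwnd: minCwnd}
}

// OnAck adjusts the window from one delay sample. When the estimated
// queuing delay is below the target the window grows; when it is above
// the target the window shrinks, which is the backpressure.
func (w *Window) OnAck(delay time.Duration, bytesAcked int) {
	if !w.haveBase || delay < w.baseDelay {
		w.baseDelay = delay
		w.haveBase = true
	}
	queuing := delay - w.baseDelay
	offTarget := float64(targetDelay-queuing) / float64(targetDelay)
	w.Cwnd += gain * offTarget * float64(bytesAcked) * mss / w.Cwnd
	if w.Cwnd < minCwnd {
		w.Cwnd = minCwnd
	}
}
```

Feeding it a low delay sample grows `Cwnd`; a later sample whose delay sits well above the base delay shrinks it again, without any packet having been lost.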

blacklion 1 year ago | | |

Looks like we have the same experience with torrents :-)

anacrolix 1 year ago | |

My client anacrolix/torrent has most of that, including streaming, hybrid, v2, all trackers and protocols.

I don't do much port forwarding, it's too fiddly. I think I provide UPnP out of the box and a few other variants that were contributed.

I also have all the DHT extensions, like mutable torrents and get/put etc.

Large torrents do pop up occasionally, but it's been a long time since someone found a performance issue there; it used to happen every few months. Lots of optimisation.

GitHub.com/anacrolix/torrent

a-french-anon 1 year ago | | |

Any clean room impl. is good news to me, but is the support for private torrent/tracker complete (https://github.com/anacrolix/torrent/issues/531)?

NB: especially after https://github.com/Luminarys/synapse died and Transmission "botched" its 4.x rewrite (currently on 3 with some backported patches).

pzmarzly 1 year ago | | |

I know your library does a lot, I have been your GH Sponsor for months ;) Thanks for reminding me about it, I wanted to up the amount. And to thank you for working on such cool stuff and making the ecosystem better.

I missed the UPnP support, thanks for correcting my beliefs.

Which client with WebUI would you recommend the most? I tried using exatorrent and distribyted, but was running into bugs.

blacklion 1 year ago | |

Also, a good disk I/O scheduler is needed, or you can thrash even striped NVMe disks, not to mention spinning rust.

Good multithreading without any global locks is a must, or your UI/remote control becomes unresponsive with a large number of active torrents.

Also, it needs a daemon mode that can be compiled and installed without any GUI libs (like Qt, GTK, Wayland, libX11, anything like this) and a fast, responsive remote GUI for all platforms that can manage thousands of torrents simultaneously (I'm looking at you, <any-existing-torrent-client>). It could be a Web UI, but so far I haven't seen a usable one. The only thing that semi-works is Transmission + Transmission-Remote-GUI, written in Pascal. But it is effectively abandonware and works worse and worse with each Transmission update.

And it should be scriptable, to allow automating a change of torrent on the tracker, for example (you need to detect it from the tracker-specific "Torrent not registered" error response and re-download the torrent, building the URL from, for example, the "comment" field of the torrent, again in a tracker-specific way).

bool3max 1 year ago | |

> But the "good" news is that nobody built such a "perfect" client just yet,

Which of the qualities you listed are lacking from the currently most popular torrent clients (qBittorrent, etc.)?

pzmarzly 1 year ago | | |

I am using qBittorrent, and off the top of my head:

- WebTorrent and WebSocket patch for qBittorrent is ready but not merged (waiting on libtorrent's decision),

- cross-seeding support is poor (a separate "cross-seed" binary can be used to set up hardlinks to fool qBittorrent into cross-seeding, but it cannot detect duplicates on its own)

- when it comes to torrent management, there is no way to group torrents into groups with common settings (important if you use multiple private trackers) - people recommend having multiple installations of qBittorrent side by side

- when it comes to reconfiguring NAT and firewall, qBittorrent supports the UPnP IGD protocol, but I am not sure about NAT-PMP and PCP

- I have never seen qBittorrent connect to a single IPv6 peer - so I don't know if the support is there

- download order - you can choose "download in order" or "download rarest first". I don't think "download in order" downloads footers, so mp4 files won't work (IIRC mp4 stores metadata in the footer, mkv in the header)
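The download-order point can be made concrete with a small sketch. Assuming the client knows a file's byte offset and length within the torrent's concatenated data, a streaming-friendly order requests the first and last pieces of the file before going sequentially through the middle. `StreamingOrder` is a hypothetical helper, not qBittorrent's or any library's API.

```go
package main

// StreamingOrder returns the piece indices that cover one file, ordered so
// that the piece holding the file's first bytes and the piece holding its
// last bytes come first, followed by the remaining pieces in sequence.
// fileOffset and fileLength are byte positions within the torrent's
// concatenated data; pieceLength is the torrent's piece size.
func StreamingOrder(fileOffset, fileLength, pieceLength int64) []int64 {
	if fileOffset < 0 || fileLength <= 0 || pieceLength <= 0 {
		return nil
	}
	first := fileOffset / pieceLength
	last := (fileOffset + fileLength - 1) / pieceLength

	order := []int64{first}
	if last != first {
		order = append(order, last) // footer piece right after the header piece
	}
	for p := first + 1; p < last; p++ {
		order = append(order, p)
	}
	return order
}
```

For a file spanning pieces 0 through 9 this yields 0, 9, 1, 2, ..., 8. A real client would likely fetch more than one piece at each end and still mix in rarest-first selection so the swarm stays healthy.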

andai 1 year ago | | |

Personally, "being as small as uTorrent used to be", but clearly that's not a deal-breaker. (Then again, neither is like two thirds of the stuff GP mentioned!)

wincy 1 year ago | | |

Last I tried it their hybrid torrent implementation was very broken and would fail half the time.

thrdbndndn 1 year ago | |

For me, the hardest part of building a good BT client (or any software, really) is getting the UX right.

These days, I use QB after uTorrent's downfall, but even after all these years, its UX still isn’t quite there.

szvsw 1 year ago | | |

As someone who stopped torrenting around 2015 or maybe 2016, what happened to uTorrent? Also RIP what.cd, I still am sad that it is gone.

pzmarzly 1 year ago | | |

qBittorrent + VueTorrent is not perfect, but it's hard to find anything better. If you haven't tried it, take a look.

https://github.com/VueTorrent/VueTorrent

6510 1 year ago | |

Alternatively, a minimal client with few features that can seed large numbers of torrents without crashing or freezing. Settings for how many it may seed at a time, to prioritize what needs seeding, and to ignore overly seeded things. Options to seed only blocks no one has. Remotely controlled.

anacrolix 1 year ago | | |

I had a super seeder implementation that seeded over 400k torrents simultaneously using some fancy callbacks at certain points in the protocol.

If this is actually a common use case, I can resurrect it into a usable form for the public if there's interest or funding. https://github.com/anacrolix/torrent

Eikon 1 year ago | |

> but very hard to make a good torrent client

> So, building a "perfect" torrent client from the ground up is a daunting task.

Don’t you think you’re exaggerating a bit? It’s not daunting by any stretch. The feature set you described is fairly straightforward and something even a beginner developer could tackle without too much hassle.

I’m honestly kind of tired of seeing people act like doing anything these days is some impossible feat.

dgroshev 1 year ago | | |

It's not impossible, it's just hard to be better than libtorrent, otherwise a new client only makes sense as an exercise.

Arvid Norberg is hardly a "beginner developer", yet look how hard it is to make a performant implementation of uTP: https://github.com/arvidn/libtorrent/issues/3542

pdimitar 1 year ago | | |

Why would it be an exaggerated claim?

Try it yourself, from scratch, and see how quickly you will finish something that will rival and surpass, say, qBittorrent.

None of the tasks is breaking new ground or is creatively difficult. But there's a metric ton of those tasks in order to make a truly good torrent client.

anacrolix 1 year ago | | |

Amen.

newsclues 1 year ago | |

Given the state of software quality, it is remarkable that torrent clients are SO GOOD, given they are free!

anacrolix 1 year ago | | |

There's only a handful of actually complete ones. For a few, it's surprising they even work: they're crazy complicated and weird, but once they're stable they need very little maintenance. They're all also written in very different styles due to a mix of C/C++ and very high concurrency.

wslh 1 year ago | |

Yes, even qBittorrent "destroys" your other connections at maximum speeds.

troupo 1 year ago | |

You may also want both a CLI and a web access/management interface

vjerancrnjak 1 year ago | |

All of that without encryption. What a meh

But shouldn’t be hard to do with modern facilities (async runtimes, streaming libraries etc.) I don’t enjoy many clients yet I use them.

It’s definitely a good problem with which to test a programming language. Many fail on parsing, or inefficient network interfaces, inefficient file interfaces, sha sum unusable without intrinsics, etc.

ktpsns 1 year ago |

This looks simple not only because the article is written well but also because Go is the go-to-language for complex networking situations. Doing things in parallel, even pipelining? This would make quite some spaghetti algorithm in C/C++, even async rust/python world would not look so clean as in Go. This is clearly a big strength of the language.
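The kind of Go concurrency being praised here can be sketched as a worker pool over channels: a buffered work queue of piece indices, several goroutines pulling from it, and a results channel. This is a minimal sketch loosely in the spirit of the article, not its actual code. `DownloadAll`, `ErrPeerFailed` and the `fetch` callback (a stand-in for "request this piece from a peer") are hypothetical names.

```go
package main

import "errors"

// ErrPeerFailed is a hypothetical error a fetch function can return when a
// peer drops the connection or sends bad data.
var ErrPeerFailed = errors.New("peer failed")

type result struct {
	index int
	data  []byte
}

// DownloadAll fetches numPieces pieces using the given number of worker
// goroutines. A failed piece goes back on the queue so another attempt
// (possibly by another worker) picks it up. It assumes workers > 0 and
// that fetch eventually succeeds for every piece.
func DownloadAll(numPieces, workers int, fetch func(index int) ([]byte, error)) [][]byte {
	// Buffered to numPieces so that re-queueing a failed piece never blocks.
	work := make(chan int, numPieces)
	results := make(chan result)
	for i := 0; i < numPieces; i++ {
		work <- i
	}

	for w := 0; w < workers; w++ {
		go func() {
			for index := range work {
				data, err := fetch(index)
				if err != nil {
					work <- index // put the piece back for a retry
					continue
				}
				results <- result{index, data}
			}
		}()
	}

	pieces := make([][]byte, numPieces)
	for done := 0; done < numPieces; done++ {
		r := <-results
		pieces[r.index] = r.data
	}
	close(work) // every piece is accounted for; let the workers exit
	return pieces
}
```

In a real client each worker would own one peer connection, and `fetch` would pipeline several block requests per piece and verify the hash before reporting success.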

pkolaczk 1 year ago | |

> even async rust/python world would not look so clean as in Go.

That’s quite debatable and my experience is different. There is a whole lot of high level stuff that can be expressed with eg async streams and functional transformation chains in Rust, that Go has no counter offer for. Same for being able to use any future in select/join not just channels. Also I find cleanup / error handling in Rust much cleaner.

cherryteastain 1 year ago | |

Agree with C and old style C++, but going off the flowchart [1] in the article this could be done quite cleanly with boost::asio and C++20 coroutines as well.

[1] https://blog.jse.li/torrent/download.png

johnisgood 1 year ago | |

> Go is the go-to-language for complex networking situations

And Elixir / Erlang for serious ones.

cybrox 1 year ago | | |

Rarely on the client side.

dataviz1000 1 year ago | | |

Unfortunately, BEAM VM is very slow compared to multiple other languages including Go. It's great when starting out with a few developers, however, since Go and Rust are much more performant, it is possible to hire several more Go or Rust developers with the server cost savings instead of being tied to Elixir. There is always a trade off between easy to use and high performance.

But, yeah, if I was going to bootstrap a startup at the seed level, Elixir is the best choice for backend. If I'm spending $500K+ a year on infrastructure, I'll be looking at Go and Rust.

neonsunset 1 year ago | |

And Go is not even the best or fastest language at this either - Goroutines are quite a caveman API, with footguns that better languages have solved.

cultofmetatron 1 year ago | |

> Go is the go-to-language for complex networking situations.

laughs in elixir

lqet 1 year ago |

Very cool, and surprisingly simple. I was always mystified as a kid how these file-sharing protocols actually work, but if you just look at a torrent as a collection of fixed-size pieces you are asking peers managed by a tracker for, it becomes really straightforward to implement.
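The "collection of fixed-size pieces" view maps directly onto code. In a v1 torrent, the `pieces` field of the metainfo is a concatenation of 20-byte SHA-1 hashes, one per piece, and a client checks each downloaded piece against its hash. The helper names below (`HashPieces`, `VerifyPiece`) are hypothetical and the sketch ignores multi-file layouts.

```go
package main

import (
	"bytes"
	"crypto/sha1"
)

// HashPieces splits data into pieces of pieceLen bytes (the last piece may
// be shorter) and returns their SHA-1 hashes concatenated, which is the
// layout of the "pieces" field in a v1 .torrent file.
func HashPieces(data []byte, pieceLen int) []byte {
	if pieceLen <= 0 {
		return nil
	}
	var out []byte
	for start := 0; start < len(data); start += pieceLen {
		end := start + pieceLen
		if end > len(data) {
			end = len(data)
		}
		sum := sha1.Sum(data[start:end])
		out = append(out, sum[:]...)
	}
	return out
}

// VerifyPiece reports whether data matches the expected hash of the piece
// at the given index within the concatenated hashes.
func VerifyPiece(index int, data []byte, hashes []byte) bool {
	start := index * sha1.Size
	if index < 0 || start+sha1.Size > len(hashes) {
		return false
	}
	sum := sha1.Sum(data)
	return bytes.Equal(sum[:], hashes[start:start+sha1.Size])
}
```

A downloader would call `VerifyPiece` on every completed piece and discard (and re-request) anything that fails, which is what makes it safe to accept data from untrusted peers.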

Now I wonder how clients protect themselves against abusers (i.e., people who never upload a single bit but only download). I often noticed that when I set the maximum upstream to 0, clients would stop sending me pieces. Do clients share a predefined list of configuration parameters with each other or with the tracker (max upstream, max downstream, etc.)? Or is it something more sophisticated?
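One mechanism that fits the behaviour described is the choking algorithm from the original protocol specification (BEP 3): no parameters are exchanged, each client simply measures how fast every peer has been sending it data and periodically unchokes (agrees to upload to) only the best few interested peers, plus an occasional "optimistic" unchoke to discover new ones. A peer that uploads nothing tends to stay choked. The Go sketch below shows only the ranking step; the `Peer` type and `ChooseUnchoked` function are hypothetical, and real clients vary in the details.

```go
package main

import "sort"

// Peer is a hypothetical view of one connected peer.
type Peer struct {
	ID           string
	DownloadRate float64 // bytes/s we recently received from this peer
	Interested   bool    // whether the peer wants data from us
}

// ChooseUnchoked returns the IDs of up to `slots` interested peers, ranked
// by how fast they have been uploading to us. Everyone else stays choked
// until the next round (the optimistic unchoke is left out of this sketch).
func ChooseUnchoked(peers []Peer, slots int) []string {
	if slots < 0 {
		slots = 0
	}
	candidates := make([]Peer, 0, len(peers))
	for _, p := range peers {
		if p.Interested {
			candidates = append(candidates, p)
		}
	}
	// Fastest uploaders first; stable so ties keep their original order.
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].DownloadRate > candidates[j].DownloadRate
	})
	if len(candidates) > slots {
		candidates = candidates[:slots]
	}
	ids := make([]string, len(candidates))
	for i, p := range candidates {
		ids[i] = p.ID
	}
	return ids
}
```

With four slots and a dozen interested peers, a peer whose `DownloadRate` is zero only gets data during the occasional optimistic unchoke, which would explain transfers drying up after capping upstream at 0.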

snalty 1 year ago |

CodeCrafters has a BitTorrent client building challenge that I watched Jon Gjengset do on a stream, if someone wants to try and build their own client in an iterative and checked way.

https://app.codecrafters.io/courses/bittorrent/overview

DrammBA 1 year ago | |

It's worth noting that this is a paid service offering some free courses for a limited time. The BitTorrent course does not appear to be free at the moment.

kamranjon 1 year ago |

I was just thinking about how incredible BitTorrent was the other day, how it really does outperform non-distributed solutions in many cases and how it is one of the rare examples of that.

One thing I was thinking though was that the finding of Peers seems to be the bottleneck - that if peer resolution could be almost instantaneous, the BT protocol could be used for so many more use cases.

Does anyone know if this part of the process could ever be improved or does that just come with the territory?

SoKamil 1 year ago |

This looks surprisingly simple. A perfect project for learning a programming language by creating a thing you can interact with, without getting overwhelmed by domain complexity.

Do you know more such problems?

jedimastert 1 year ago | |

There's an ad at the end of the post for something called CodeCrafters that appears to be a repository of such interesting problems.

TheDong 1 year ago | |

Creating a simple ray tracer for learning a language’s vector math libraries and parallelization tooling.

An FTP client and server.

An IRC client for the older protocol version.

A gemtext parser and gemini client.

A redis clone supporting only the simplest operations.

A brainfuck interpreter.

Twitter.

A todo list program (maybe a little too hard for this list)

HumblyTossed 1 year ago | | |

Oh wow. I remember all the "I wrote a twitter in a weekend. Why does twitter need a thousand engineers?" posts everywhere.

imbusy111 1 year ago |

One thing that I find hard to understand is how you're able to connect to a random peer online with just an IP and port combination. The peer is likely behind a NAT proxy, and never communicated with you to begin with. In my head, NAT shouldn't allow this connection through, or be able to associate the port with the individual peer machine since the connection source (you) is arbitrary.

This is possible with port forwarding. But that's a niche set of peers, who have the power to configure port forwarding on a NAT proxy.

donatj 1 year ago |

This was post number 5 when I started reading it and 45 when I finished; I am guessing some people from the RIAA and MPA found their way to the flag button.

indulona 1 year ago |

i wanted to do the same when i got fed up with bt clients. but then i found there already are go clients and i found i like qbt. but it is a good read nevertheless.