Open Protocol for Resumable File Uploads

Open Protocol for Resumable File Uploads (tus.io)

159 points by gluegadget 7 years ago | 43 comments

kvz 7 years ago |

Was just casually (well ok maybe it’s more compulsive than that) browsing HN and was pleasantly surprised to find tus on the front page. I’m one of the core contributors and happy to answer questions. Although it’s late here so it may take a few hours while I’m asleep :)

geekodour 7 years ago | |

I have not looked into tus properly yet, but how does this compare with BitTorrent seeding, and can the two be combined somehow?

kvz 7 years ago | | |

People ask that a lot, yes. On the surface they have a lot in common. Both can be used to transmit huge files, both can chunk files up and transmit only the remaining parts, both can pick up and resume at a later point in time, and both can send these chunks simultaneously (in the case of tus, optionally, with the Concat extension).

Tus, however, works as a thin layer on top of HTTP, so it’s easy to drop into existing web sites/load balancers/auth proxies/firewalls. BitTorrent ports are often closed off on airport/hotel/corporate networks. But websites work. And if you can access a website, you will be able to upload files to it with tus.

Another difference is that tus assumes classic client/server roles. The client uploads to the server. Downloading is done via your regular http stack and not facilitated by tus. BitTorrent facilitates both uploading and downloading in single clients. It is more peer-to-peer and decentralized in nature, where tus clients typically upload to a central point (like: many video producers upload to Vimeo. Not very contrived as Vimeo adopted tus).

There are more differences (Discoverability, trackers, pull vs push, pulling from many peers at once) but the comment is getting very long so I hope this already helps a bit :)

Happy to dive deeper into this on request tho :)

chillaxtian 7 years ago |

S3 Multi-Part Upload API can be used to chunk an object into smaller parts, which can succeed or fail independently.

https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview....

kvz 7 years ago | |

Yes, that is very helpful. Our S3 storage backend for tusd uses it, and our https://uppy.io file uploader does too, usable directly from the browser (so you can choose not to use tus at all with it). S3 resumable uploads do come with a few limitations that make some people still choose tus tho:

* chunks need to be >5MB which can be problematic on flaky/poor connections (rural areas, tunnels, clubs/basements, people on the move switching connections all the time)

* your s3 bucket needs to allow write by the world, or you need to deploy signature authentication

* there’s an s3 vendor lock-in some might worry about

* not an open protocol, no chance of advancing it with the community

That said, that still leaves a large audience for direct s3 resumable uploads and I’m thankful aws offers it!
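
To see why the part-size floor matters on a poor connection, here is a rough sketch of how a file splits into S3 multipart parts. It assumes the documented S3 limits of a 5 MiB minimum part size (except for the last part) and at most 10,000 parts per upload; `plan_parts` is a made-up helper for illustration, not part of any AWS SDK.

```python
import math

MIN_PART = 5 * 1024 * 1024   # S3 minimum part size (the last part may be smaller)
MAX_PARTS = 10_000           # S3 maximum number of parts per upload


def plan_parts(file_size, part_size=MIN_PART):
    """Return (offset, length) pairs covering file_size bytes."""
    if part_size < MIN_PART:
        raise ValueError("part_size is below the S3 minimum")
    count = max(1, math.ceil(file_size / part_size))
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return [(i * part_size, min(part_size, file_size - i * part_size))
            for i in range(count)]


parts = plan_parts(12 * 1024 * 1024)
```

With the minimum part size, a 12 MiB file becomes three parts, and a connection that drops mid-part has to resend up to 5 MiB, since a part either succeeds or fails as a whole.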

michaelmior 7 years ago | | |

As far as vendor lock-in goes, it seems like there are a large number of other vendors supporting the S3 API, so this doesn't seem like a huge concern.

eps 7 years ago |

If I read the spec correctly, the PATCH method is actually more of an APPEND, no?

It would seem logical and practical to allow PATCH to modify any part of a resource that is already present on the server and/or to extend it by appending. This would also make the whole thing useful beyond resuming of interrupted uploads, e.g. to allow for rsync-style updating of existing files.

kvz 7 years ago | |

Yes, APPEND is not an official HTTP method though. Allowing parts to be modified at any location makes things a little bit more complex and comes with some overhead. If you do need to upload multiple chunks simultaneously, you can opt into our Concat extension however, which does exactly that. Our latest blog post has some images to illustrate.

eps 7 years ago | | |

What overhead is that exactly?

My point is that you appear to be pushing for adoption of an extension that handles one specific use case for PATCH, when a more general extension is trivially possible with little to no extra effort.

digianarchist 7 years ago |

The HTML5 FileAPI has been around for a few years now yet a lot of sites don't support resumable uploads. I know it adds a bunch of complexity server side as you have to restitch those pieces together but it makes for a good user experience.

kvz 7 years ago | |

I hope with a client like https://uppy.io and a server like tusd, it’s much more manageable these days. Less boilerplate writing and more battle tested components for sure.

aiCeivi9 7 years ago |

Slightly off-topic: why, after so many years, do Chrome & Firefox have such poor support for resuming interrupted file downloads? In the case of Firefox I am almost sure it was better in the past. I have to use 'wget -c' or https://www.freedownloadmanager.org/ for bigger files.

chrisrhoden 7 years ago | |

As I suspect you may already know, this is dependent on the server 1) indicating support for byte range requests and 2) correctly implementing it.

I don't think I have noticed Firefox getting worse at this over time, but I'm not downloading large files every day. Would you be willing to share where you're noticing this?

nikeee 7 years ago | |

It depends on the server, which has to implement HTTP Ranges [0]. Servers like nginx and Apache 2 should support it. I'm not certain about all the Node.js and Go backends out there. I don't think the support in Firefox has changed.

[0]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requ...
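
As a rough illustration of what a resuming download client has to check, the sketch below builds the `Range` request header and then decides whether the response can safely be appended to the partial file. The function names are made up for this example, and the response headers are assumed to arrive as a plain dict with canonical header casing.

```python
import re


def resume_request_headers(bytes_on_disk):
    """Headers asking the server for everything after what we already have."""
    return {"Range": f"bytes={bytes_on_disk}-"} if bytes_on_disk > 0 else {}


def can_append(status, response_headers, bytes_on_disk):
    """True only if the server honoured the range and it starts where we stopped."""
    if status != 206:
        return False  # 200 means the server ignored Range and sent the whole file
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)",
                         response_headers.get("Content-Range", ""))
    return bool(match) and int(match.group(1)) == bytes_on_disk


headers = resume_request_headers(1024)
ok = can_append(206, {"Content-Range": "bytes 1024-2047/2048"}, 1024)
```

If the server answers 200 instead of 206, the client has to start over, which is the failure mode a server without range support produces.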

speeq 7 years ago |

ioquatix 7 years ago |

There is a ruby implementation too: https://github.com/janko-m/tus-ruby-server

kvz 7 years ago | |

Love the work that Janko is doing in our ecosystem! There are implementations for most major languages. So a tus server could even just be some php code that you install with composer and add to your existing Apache setup.

amelius 7 years ago |

Finally a Request-For-Comments that actually contains a comments section!

treve 7 years ago |

This has come a long way since 2013. Congrats, looks very robust now!

kvz 7 years ago | |

Thank you for the kind words!

JdeBP 7 years ago |

Zawinski's Law needs some revision. Not only do WWW apps expand until users can chat asynchronously, but WWW protocols expand until they incorporate ZMODEM. (-:

silvestrov 7 years ago |

Tus-Version: 1.0.0,0.2.2,0.2.1

seems like over-design. The list will get very long over time.

Just use a single integer instead and have the header include min and max version supported. E.g.

Tus-Version: 1-4

meaning it supports version 1 thru 4. No reason to be able to say version 1 and 4 but not 2 and 3.
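
To make the two formats concrete, here is a small sketch that parses both and picks a version. `parse_list` handles the comma-separated form from the spec, and `parse_range` handles the min-max integer form proposed here, which is hypothetical and not part of tus. `pick_version` assumes the server lists versions in order of preference; all three names are invented for this example.

```python
def parse_list(header_value):
    """Split a comma-separated Tus-Version value, keeping the server's order."""
    return [v.strip() for v in header_value.split(",") if v.strip()]


def parse_range(header_value):
    """Parse the hypothetical 'min-max' integer form, e.g. '1-4'."""
    low, high = (int(part) for part in header_value.split("-"))
    return list(range(low, high + 1))


def pick_version(server_versions, client_versions):
    """Return the first server-listed version the client also supports, else None."""
    for version in server_versions:
        if version in client_versions:
            return version
    return None


chosen = pick_version(parse_list("1.0.0,0.2.2,0.2.1"), {"0.2.2", "0.2.1"})
span = parse_range("1-4")
```

The list form can express gaps and a preference order; the range form is shorter but can only describe a contiguous run of versions.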

kvz 7 years ago | |

We are discussing this very topic here https://github.com/tus/tus-resumable-upload-protocol/issues/... — it has stalled a bit so I would be very happy to see you or other interested/concerned HN readers weigh in. People sharing concerns on GitHub is the main way the protocol has progressed.

aaaaaaaaaab 7 years ago |

What’s wrong with HTTP PUT with Content-Range?

treve 7 years ago | |

  > An origin server that allows PUT on a given target resource MUST send
  > a 400 (Bad Request) response to a PUT request that contains a
  > Content-Range header field (Section 4.2 of [RFC7233]),

https://tools.ietf.org/html/rfc7231#section-4.3.4

zzo38computer 7 years ago | | |

Maybe that should be fixed, then. HTTP PUT with the range specified seems sensible to me.

dcbadacd 7 years ago |

Uhh, 206 partial content??

wtfrmyinitials 7 years ago | |

206 is for downloads, not uploads.

dcbadacd 7 years ago | | |

Oh, right. I misread the title.

eximius 7 years ago |

Is there a TL;DR? I see the whole spec is there but I don't have time to read it just this second.

Does it use anything fancy like fountain codes or does it just renegotiate chunks each time or something else?

kvz 7 years ago | |

The latter.

1. The client POSTs, this allocates a unique Location which the server returns and

2. the client saves this (e.g. in localStorage) along with local file identifiers so it can be looked up later and can

3. query that URL to check how many bytes were already received, and then

4. PATCH the remaining bytes

Repeat steps 3 & 4 on failures/resumes.
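
A minimal sketch of the four steps above, using an in-memory stand-in instead of real HTTP so it runs on its own. `InMemoryTusServer`, `upload`, and `fail_after` are made-up names for illustration; the comments note which headers and status codes a real tus 1.0 exchange would use.

```python
class InMemoryTusServer:
    """Hypothetical stand-in for a tus server; keeps uploads in a dict."""

    def __init__(self):
        self.uploads = {}  # url -> {"length": int, "data": bytearray}

    def post(self, upload_length):
        # Step 1: creation. A real server answers 201 with a Location header.
        url = f"/files/{len(self.uploads) + 1}"
        self.uploads[url] = {"length": upload_length, "data": bytearray()}
        return url

    def head(self, url):
        # Step 3: a real server reports this in the Upload-Offset header.
        return len(self.uploads[url]["data"])

    def patch(self, url, upload_offset, chunk):
        # Step 4: the offset must match what the server already has,
        # otherwise a real server responds 409 Conflict.
        upload = self.uploads[url]
        if upload_offset != len(upload["data"]):
            raise ValueError("409 Conflict: offset mismatch")
        upload["data"] += chunk
        return len(upload["data"])


def upload(server, url, payload, chunk_size, fail_after=None):
    """Send what the server is missing; optionally simulate a dropped connection."""
    offset = server.head(url)
    sent = 0
    while offset < len(payload):
        if fail_after is not None and sent >= fail_after:
            raise ConnectionError("simulated network drop")
        chunk = payload[offset:offset + chunk_size]
        offset = server.patch(url, offset, chunk)
        sent += 1
    return offset


server = InMemoryTusServer()
payload = b"x" * 10
url = server.post(len(payload))   # steps 1 and 2: create, remember the URL
try:
    upload(server, url, payload, chunk_size=4, fail_after=1)
except ConnectionError:
    pass                          # first attempt dies after one chunk
resumed_from = server.head(url)   # step 3: ask how much arrived
upload(server, url, payload, chunk_size=4)  # step 4: send the rest
```

The second call to `upload` starts from the offset the server reports rather than from anything the client remembers, so an interrupted transfer never resends bytes the server already holds.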

emersion 7 years ago | |

You basically just send the offset when you resume the upload.

gsich 7 years ago |

rsync?

Yes I know this is mainly for browsers.

kvz 7 years ago | |

Yes, for browsers it’s cheaper to build upon HTTP, and it lets you move through airport/hotel/corporate firewalls without problems.

Tus is also used in datacenters for high-throughput & reliable transmissions. Probably in most cases rsync is a sensible choice, but sometimes maybe you already have tus, HTTP-based auth, load balancing, etc. in place that you want to leverage, or maybe you want to avoid exchanging SSH secrets.
