Thinking about 'meta' torrent file format

Thinking about 'meta' torrent file format (gist.github.com)

37 points by mattengi 12 years ago | 23 comments

asdfaoeu 12 years ago |

I've actually been thinking about this a bit as well.

I think you can just avoid the torrent file completely and use a Merkle tree hash, like how new torrent files work, and then you end up with just one torrent file per file. Peer acquisition would work through the DHT.

Directories would be simple: just a matter of creating a new "file" with the hashes and names of the contents, like how git directories work (extending on this, you could have a version control system like git).
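A rough sketch of what that could look like. The hash function (SHA-256), the 16 KiB leaf size, and the names `merkle_root` and `directory_object` are all assumptions made for illustration, not anything the torrent format specifies:

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # assumed leaf size, purely illustrative


def merkle_root(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Return a binary Merkle root over fixed-size blocks of data."""
    # Hash each block to form the leaves (an empty file gets one empty leaf).
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    level = [hashlib.sha256(block).digest() for block in blocks]
    # Pair up hashes until one remains; an odd node is paired with itself.
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def directory_object(entries: dict[str, bytes]) -> bytes:
    """Serialize a directory as sorted 'hex-root name' lines, like a git tree."""
    lines = [f"{root.hex()} {name}" for name, root in sorted(entries.items())]
    return "\n".join(lines).encode()


files = {"a.txt": b"hello", "b.txt": b"world"}
roots = {name: merkle_root(content) for name, content in files.items()}
# The directory is itself just another "file", so it gets its own root.
dir_root = merkle_root(directory_object(roots))
print(dir_root.hex())
```

Since the directory root depends on every file root, changing any file changes the directory root, which is the property the git comparison relies on.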

A noticeable change is that each individual file is uniquely shared. I believe this is a feature (it avoids duplicate torrents for the same file), but it also means that anyone can see who's downloading a file. A solution would be another key hash that causes the DHT ID to be hashed again, to allow individual darknets.

KMag 12 years ago | |

I agree that advertising single file Merkle tree roots on the DHT is a good thing, and that one could nicely build git-like directory structures, but why force the leaves of the tree to be singleton torrent files?

Why not instead advertise individual files on the DHT by their Merkle tree roots, and put the Merkle tree roots in each entry of the "files" section of the torrent file? This doesn't force re-packaging of existing torrents into singleton torrents. Seeders can advertise single files from old torrents and clients with new torrents can take advantage of this advertising.

I disagree with the munged-key darknet idea. If you want a darknet, run it on a non-public DHT, with cryptographic handshakes and encrypted traffic. Cryptographically munging the DHT keys on a public DHT only creates a "light grey net" that's trivially circumvented and provides a false sense of privacy.

jychang 12 years ago | | |

> Why not instead advertise individual files on the DHT by their Merkle tree roots, and put the Merkle tree roots in each entry of the "files" section of the torrent file?

You can't do that because torrents aren't file-delimited, they are block-delimited. You can't check that two files are the same across torrents without first downloading both torrents.

RamiK 12 years ago |

https://en.wikipedia.org/wiki/Metalink

It's in there somewhere...

Edit: Here's a more relevant use case:

https://wiki.debian.org/Metalink

oakwhiz 12 years ago |

To see one method that is used to work around this sort of thing: The folks over at http://www.tlmc.eu/ have been expanding the same 1.2TB collection of files for a while, just by stopping the old torrent, running a Python script to patch the changes, and then rechecking and starting the new torrent from the old directory.

zeitg3ist 12 years ago | |

But doesn't this mean that all the other peers would need to manually upgrade their copy of the torrent file?

plorkyeran 12 years ago | | |

Yes. It works reasonably well in this specific case because it's such a niche thing (you don't download 1.25 TB of Touhou music if you don't really care about Touhou music), but it doesn't benefit from people who continue to seed things they've long forgotten about.

dz0ny 12 years ago |

Private trackers will say no. Public trackers may welcome this...

predakanga 12 years ago | |

There are many types of private tracker that would love to see this - for instance consider gaming trackers, where you may have a single .torrent for a large collection of ROMs, or DLC for a game. Consider TV trackers, tracking a whole TV season with a single .torrent file, or music trackers with discographies.

More importantly, a key concern on private trackers is swarm size - an extension like this would have the potential to expand the available peers on a given file, if the file exists in other swarms on the same tracker. Not a very common use case, but one to consider nonetheless.

sargun 12 years ago |

Is this basically an append-only torrent file? This could actually be implemented without having to do many changes to the torrent format. You can just have the client de-dupe based on file length + hash.
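Something like this, as a minimal sketch. It assumes the metadata carries a whole-file hash per entry, which a plain torrent does not guarantee; `FileEntry` and `plan_update` are made-up names:

```python
from typing import NamedTuple


class FileEntry(NamedTuple):
    path: str
    length: int
    digest: str  # hex hash of the whole file (assumed to be in the metadata)


def plan_update(old: list[FileEntry], new: list[FileEntry]):
    """Split the new file list into files already held and files to fetch."""
    have = {(f.length, f.digest): f.path for f in old}
    reuse, fetch = {}, []
    for f in new:
        key = (f.length, f.digest)
        if key in have:
            reuse[f.path] = have[key]  # new path -> existing local path
        else:
            fetch.append(f.path)
    return reuse, fetch


old = [FileEntry("v1/a.bin", 3, "aa"), FileEntry("v1/b.bin", 5, "bb")]
new = [FileEntry("v2/a.bin", 3, "aa"), FileEntry("v2/b.bin", 5, "cc"),
       FileEntry("v2/c.bin", 7, "dd")]
reuse, fetch = plan_update(old, new)
print(reuse)  # files that can be copied or linked locally
print(fetch)  # files that still have to come from the swarm
```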

oakwhiz 12 years ago | |

Couldn't you also hash the root of the hash tree for the new appended data with the root of the hash tree for the old torrent? It would be like a hash chain of hash trees, but pointing backward in time.
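A sketch of that chaining, assuming SHA-256 and assuming each version already has a hash-tree root; `link` and `chain_head` are hypothetical names:

```python
import hashlib


def link(previous_head: bytes, appended_root: bytes) -> bytes:
    """Combine the previous chain head with the root of the newly appended data."""
    return hashlib.sha256(previous_head + appended_root).digest()


def chain_head(tree_roots: list[bytes]) -> bytes:
    """Fold the roots oldest-first; the head commits to the whole history."""
    head = tree_roots[0]
    for root in tree_roots[1:]:
        head = link(head, root)
    return head


# Stand-in roots for three successive versions of the collection.
roots = [hashlib.sha256(label).digest() for label in (b"v1", b"v2", b"v3")]
print(chain_head(roots).hex())
```

Anyone holding the old head and the new root can recompute the new head, so an update can be checked without rehashing the old data.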

kovalkos 12 years ago |

Another problem with torrents is compression of files. Compressing a torrent makes it impossible to select only 1 file from a big collection.

JTon 12 years ago | |

This is true. But afaik it's frowned upon and it's not really a big deal in serious communities.

j_s 12 years ago | |

I would think this is a failure of the client, which should support compression formats well enough to be able to fish around inside of the compressed file once it got the metadata portion (zip directory or whatever).

http://en.wikipedia.org/wiki/Zip_%28file_format%29#Design: A directory is placed at the end of a .ZIP file. This identifies what files are in the .ZIP and identifies where in the .ZIP that file is located. This allows .ZIP readers to load the list of files without reading the entire .ZIP archive.
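To illustrate what that directory gives you, here is a small sketch using Python's standard `zipfile` module. It builds a throwaway archive in memory and lists where each member starts and how many compressed bytes it occupies. `make_example_zip` and `list_members` are made-up helper names, and a real client would have to fetch the tail of the archive first rather than read it all from memory:

```python
import io
import zipfile


def make_example_zip() -> bytes:
    """Build a small in-memory archive to inspect."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", "hello" * 1000)
        zf.writestr("b.txt", "world" * 1000)
    return buf.getvalue()


def list_members(zip_bytes: bytes):
    """Return (name, local header offset, compressed size, unpacked size)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [(i.filename, i.header_offset, i.compress_size, i.file_size)
                for i in zf.infolist()]


for name, offset, packed, size in list_members(make_example_zip()):
    print(f"{name}: local header at byte {offset}, "
          f"{packed} bytes compressed, {size} bytes unpacked")
```

With the offset and compressed size of a member, a client could in principle work out which pieces of the torrent cover just that member.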

JTon 12 years ago | | |

This is a great idea! I wonder why it hasn't already been implemented.

brokenparser 12 years ago |

Perhaps we could make trackers more intelligent and have them combine peer pools, so they create something like a Venn diagram of torrents. In addition to telling you which peers are available, it'll tell you what to request from them. You already have all of the file hashes in the torrent anyway, so any wrongdoing here will get discarded.

predakanga 12 years ago | |

Unfortunately it's not as simple as that - when asking each other for data, the individual peers ask for a particular 'piece' of the torrent, where that piece isn't relative to a given file, but the torrent as a whole.

The files are concatenated into one long stream, and the piece number is an index to that, with no guarantees about alignment.

For instance, if you have a torrent (we'll call it 'X') with three files: the 4 MB file 'a', the 3 MB file 'b' and the 1 MB file 'c', and two separate torrents ('Y' and 'Z') describing files 'b' and 'c' separately, then the pieces would map something like this:

'Y' piece 1 -> 'X' piece 17
'Z' piece 1 -> 'X' piece 29

That's an absolute best-case scenario though - in most cases, file sizes aren't quite as perfect as that (each being a multiple of the default piece size, 256 KB). If 'b' just happened to be 1373 KB, or anything else that wasn't a multiple of 256 KB, then any files after it aren't addressable from other torrents, because they no longer start on a piece boundary.
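The arithmetic behind that example can be sketched as follows. Piece numbers are 1-based to match the mapping above, sizes are taken as binary units (1 KB = 1024 bytes), every file is assumed to be non-empty, and `file_layout` is a made-up helper name:

```python
KB = 1024
PIECE = 256 * KB  # piece size used in the example


def file_layout(files, piece_size=PIECE):
    """Return (name, first_piece, last_piece, aligned) per file, 1-based pieces."""
    layout, offset = [], 0
    for name, length in files:
        first = offset // piece_size + 1
        last = (offset + length - 1) // piece_size + 1
        aligned = offset % piece_size == 0  # does the file start on a piece boundary?
        layout.append((name, first, last, aligned))
        offset += length  # files are concatenated into one long stream
    return layout


# Best case: every file is a multiple of the piece size.
print(file_layout([("a", 4096 * KB), ("b", 3072 * KB), ("c", 1024 * KB)]))
# 'b' is 1373 KB instead: 'c' now starts in the middle of a piece.
print(file_layout([("a", 4096 * KB), ("b", 1373 * KB), ("c", 1024 * KB)]))
```

In the second layout 'c' shares its first piece with the tail of 'b', so that piece's hash covers bytes from both files.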

TheLoneWolfling 12 years ago | | |

Why not?

You just have at most two blocks of additional overhead.

You would have to know where the file begins and ends within the downloaded blocks, but that's already in the torrent file.
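To put a number on the overhead, here is a small sketch. Piece indices are 0-based here, sizes use 1 KB = 1024 bytes, the file is assumed to be non-empty, and `covering_pieces` and `overhead_bytes` are made-up names:

```python
KB = 1024
PIECE = 256 * KB


def covering_pieces(start: int, length: int, piece_size: int = PIECE):
    """Return the first and last 0-based piece index touching the byte range."""
    return start // piece_size, (start + length - 1) // piece_size


def overhead_bytes(start: int, length: int, total: int, piece_size: int = PIECE) -> int:
    """Bytes downloaded beyond the file itself when fetching whole pieces."""
    first, last = covering_pieces(start, length, piece_size)
    span_start = first * piece_size
    span_end = min((last + 1) * piece_size, total)  # the final piece may be short
    return (span_end - span_start) - length


# Torrent with files of 4096 KB, 1373 KB and 1024 KB; fetch only the last one.
total = (4096 + 1373 + 1024) * KB
start = (4096 + 1373) * KB
print(covering_pieces(start, 1024 * KB))
print(overhead_bytes(start, 1024 * KB, total) // KB, "KB of extra data")
```

The extra data is always less than two pieces: part of the piece the file starts in and part of the piece it ends in.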