Thinking about 'meta' torrent file format (gist.github.com)
I think you can avoid the torrent file completely and use a Merkle tree hash, the way newer torrent files do, so that you end up with just one torrent per file. Peer acquisition would then work through the DHT.
Directories would be simple: just create a new "file" listing the hashes and names of its contents, the way git handles directories (extending this, you could build a version control system like git).
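To make the directory idea concrete, here is a rough sketch of a git-like "directory as a file of hashes and names". The function names (`hash_blob`, `hash_directory`), the choice of SHA-256, and the `"<hash> <name>"` line format are all assumptions for illustration; git's real tree format is different, and nothing here is part of any torrent spec.

```python
import hashlib


def hash_blob(data: bytes) -> str:
    """Content hash of a single file (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


def hash_directory(entries: dict) -> str:
    """Hash a directory given a mapping of entry name -> hex hash.

    The directory is serialized as one "<hash> <name>" line per entry,
    sorted by name so the result does not depend on insertion order.
    An entry may be a file hash or another directory's hash.
    """
    lines = [f"{digest} {name}\n" for name, digest in sorted(entries.items())]
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()


# A tiny two-level tree: README plus src/main.c
readme = hash_blob(b"hello\n")
src = hash_directory({"main.c": hash_blob(b"int main(void) { return 0; }\n")})
root = hash_directory({"README": readme, "src": src})

# Changing any file changes every directory hash above it.
src_changed = hash_directory({"main.c": hash_blob(b"int main(void) { return 1; }\n")})
root_changed = hash_directory({"README": readme, "src": src_changed})
```

Because a directory is addressed by the hash of its listing, two unrelated uploads of the same tree end up with the same root, which is the deduplication property being argued for.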
A noticeable change is that each individual file is shared under its own unique identifier. I believe this is a feature (it avoids duplicate torrents for the same file), but it also means that anyone can see who's downloading a file. A solution would be an additional key that is hashed together with the DHT ID, which would allow individual darknets.
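As a sketch of what "hashing the DHT ID again with a key" could look like: the function name `dht_lookup_key`, the use of HMAC-SHA1, and the idea of a shared `group_key` are my assumptions, not an existing extension. The 20-byte output just matches the 160-bit key size the mainline DHT uses.

```python
import hashlib
import hmac
from typing import Optional


def dht_lookup_key(file_root: bytes, group_key: Optional[bytes] = None) -> bytes:
    """Derive the 20-byte key under which a file is announced on the DHT.

    Without a group key, everyone derives the same public key from the
    file's root hash. With a group key, only peers who know that key can
    compute where the file is announced.
    """
    if group_key is None:
        return hashlib.sha1(file_root).digest()
    return hmac.new(group_key, file_root, hashlib.sha1).digest()


file_root = hashlib.sha256(b"example file contents").digest()
public_key = dht_lookup_key(file_root)
private_key = dht_lookup_key(file_root, group_key=b"shared secret")
```

Note that this only changes the lookup key. It does not encrypt traffic or hide peers from anyone who learns the group key.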
Why not instead advertise individual files on the DHT by their Merkle tree roots, and put the Merkle tree roots in each entry of the "files" section of the torrent file? This doesn't force re-packaging of existing torrents into singleton torrents. Seeders can advertise single files from old torrents and clients with new torrents can take advantage of this advertising.
I disagree with the munged-key darknet idea. If you want a darknet, run it on a non-public DHT, with cryptographic handshakes and encrypted traffic. Cryptographically munging the DHT keys on a public DHT only creates a "light grey net" that's trivially circumvented and provides a false sense of privacy.
You can't do that because torrents aren't file-delimited, they are block-delimited. You can't check that two files are the same across torrents without first downloading both torrents.
It's in there somewhere...
Edit: Here's a more relevant use case:
More importantly, a key concern on private trackers is swarm size - an extension like this would have the potential to expand the available peers on a given file, if the file exists in other swarms on the same tracker. Not a very common use case, but one to consider nonetheless.
http://en.wikipedia.org/wiki/Zip_%28file_format%29#Design: A directory is placed at the end of a .ZIP file. This identifies what files are in the .ZIP and identifies where in the .ZIP that file is located. This allows .ZIP readers to load the list of files without reading the entire .ZIP archive.
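For a quick illustration of that property, Python's standard `zipfile` module builds its file listing from the directory at the end of the archive, so names, sizes and offsets are available without decompressing any member. The archive contents below are made up for the example.

```python
import io
import zipfile

# Build a small archive in memory (file names and contents are arbitrary).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"alpha")
    zf.writestr("dir/b.txt", b"bravo" * 1000)

# Opening the archive reads the directory at the end of the file;
# infolist() then reports each member's name, size and position
# without reading the member data itself.
with zipfile.ZipFile(buf) as zf:
    listing = [(info.filename, info.file_size, info.header_offset)
               for info in zf.infolist()]

for name, size, offset in listing:
    print(f"{name}: {size} bytes, local header at offset {offset}")
```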
The files are concatenated into one long stream, and the piece number is an index to that, with no guarantees about alignment.
For instance, if you have a torrent (we'll call it 'X') with three files: the 4mb file 'a', the 3mb file 'b' and the 1mb file 'c', and two separate torrents ('Y' and 'Z') describing files 'b' and 'c' separately, then the pieces would map something like this:
'Y' piece 1 -> 'X' piece 17
'Z' piece 1 -> 'X' piece 29
That's an absolute best case scenario though - in most cases, file sizes aren't quite as perfect as that (each being a multiple of the default piece size, 256kb). If 'b' just happened to be 1373kb, or anything else that wasn't a multiple of 256kb, then any files after it aren't addressable from other torrents.
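The arithmetic is easy to check with a few lines of Python. This is only a sketch: `file_piece_spans` is a made-up helper, piece numbers are 1-based to match the mapping above, and it assumes non-empty files laid out back to back with a 256kb piece size.

```python
KB = 1024
MB = 1024 * KB
PIECE = 256 * KB  # piece size used in the example above


def file_piece_spans(files, piece=PIECE):
    """Return (name, first_piece, last_piece, aligned) for each file.

    `files` is a list of (name, size_in_bytes) in torrent order. Files are
    concatenated into one stream; `aligned` is True only when the file
    starts exactly on a piece boundary.
    """
    spans = []
    offset = 0
    for name, size in files:
        first = offset // piece + 1
        last = (offset + size - 1) // piece + 1
        spans.append((name, first, last, offset % piece == 0))
        offset += size
    return spans


# Best case: every size is a multiple of the piece size.
best = file_piece_spans([("a", 4 * MB), ("b", 3 * MB), ("c", 1 * MB)])
# 'b' starts at piece 17 and 'c' at piece 29, both on a boundary.

# Same torrent, but 'b' is 1373kb: 'c' now starts in the middle of a piece.
skewed = file_piece_spans([("a", 4 * MB), ("b", 1373 * KB), ("c", 1 * MB)])
```

In the skewed case 'c' starts 93kb into the piece that also holds the tail of 'b', so none of its piece hashes line up with a standalone torrent of 'c'.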
You just have at most two blocks of additional overhead.
You would need to know where the file begins and ends within the downloaded blocks, but that's already in the torrent file.
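Roughly, the "at most two blocks" bound looks like this. `covering_range` is a hypothetical helper with 0-based piece indices; it assumes a non-empty file and ignores the fact that the final piece of a torrent can be shorter than the rest.

```python
KB = 1024
PIECE = 256 * KB  # assumed piece size


def covering_range(file_offset, file_size, piece=PIECE):
    """Pieces needed to fetch one file out of a larger torrent.

    Returns (first_piece, last_piece, skip, overhead):
      first_piece, last_piece: 0-based indices of the pieces to download
      skip: bytes to discard from the front of the first piece
      overhead: total bytes downloaded that are not part of the file
    """
    first = file_offset // piece
    last = (file_offset + file_size - 1) // piece
    fetched = (last - first + 1) * piece
    skip = file_offset - first * piece
    return first, last, skip, fetched - file_size


# A 1mb file that starts 5469kb into the torrent (i.e. after a 4mb file
# and a 1373kb file), so it is not aligned to a piece boundary.
first, last, skip, overhead = covering_range(5469 * KB, 1024 * KB)
```

The wasted bytes sit in the partial first piece and the partial last piece, so the overhead always stays below two pieces, whatever the alignment.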
I am going to have nightmares tonight...
Clients that have downloaded all of the data for a single file (but may or may not have downloaded all of the data for the full torrent) have the data for the file and can calculate the Merkle tree root for that file, and advertise availability on the DHT.
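A minimal sketch of that per-file calculation, to show it depends only on the file's bytes and not on which torrent they came from. The block size, SHA-256, and the duplicate-the-last-node padding are my assumptions; the actual BitTorrent Merkle extensions define their own hash, block size and padding, and this does not claim to match them.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # assumed leaf block size


def merkle_root(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Compute a Merkle tree root over fixed-size blocks of one file."""
    # Leaf level: one hash per block (an empty file gets one empty leaf).
    level = [hashlib.sha256(data[i:i + block_size]).digest()
             for i in range(0, len(data), block_size)]
    if not level:
        level = [hashlib.sha256(b"").digest()]

    # Combine pairs of hashes until a single root remains.
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd count: duplicate the last node
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


# The root depends only on the file contents, so a seeder holding the file
# as part of an old multi-file torrent gets the same value as anyone else.
file_bytes = b"x" * (BLOCK_SIZE * 2)
root = merkle_root(file_bytes)
```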
Clients with new style torrent files that included Merkle tree roots in file descriptions would then be able to download those files. This has nothing to do with comparing torrent files.
Here is a tool that makes it possible to preview video/audio quality by getting the first and last .rar file: http://techzil.com/play-rar-files-without-extracting-uisng-d...
In practice, what this means is that you can't verify that two files of the same name and size but at different alignments within the consolidated data stream are identical; you can't compare hashes, can't do anything without first downloading. This opens the door to mass poisoning of swarms without even having to enter them in the first place.
There are potential solutions (including providing a broader hash per-file, as opposed to per-piece), but my statement was only that it's not that simple, not that it's impossible.