The Pirate Bay Runs on 21 “Raid-Proof” Virtual Machines (torrentfreak.com)
Probably not true for "This saved costs". From what I've seen, virtual machines usually cost more than twice the monthly price of renting the equivalent "real" machine.
They could have used dedicated servers; there are more dedicated server providers than VM providers, thus achieving the same goal, less expensively.
Probably not true for "better uptime" either; VMs are still hosted on real hardware, which fails, too. (Although distributing the work across more independent machines can improve uptime.)
1) Hardware seizure expenses vs. LEOs just duplicating the HDD of a VM.
2) TPB needs to locate in disparate jurisdictions to take advantage of different legal situations. With owned hardware, that would involve a ton of shipping costs, probably more lost hardware, and paying for remote hands.
3) They had been paying a premium for 'bulletproof' hosting.
There are dedicated server providers in every part of the world, including MPAA-proof countries.
Is there a way the codebase could be entirely encrypted and not even accessible to the cloud provider (with some 'boot password' needed each time the server starts up)?
I don't know how accurate this is, though.
Encryption (with the decryption key fetched at boot from, say, a particular .onion address) would work against backups, but won't protect against an adversary with admin access to the host while the virtual server is running.
I'd love to hear a little more about the architecture.
[1] https://www.ipredator.se/ [2] http://torrentfreak.com/pirate-bay-announces-ipredator-globa...
I don't know, that's why I'm asking though.
That level of hardware/cores seems a bit over the top given what TPB does.
When I was a boy we had this thing called 'Alta Vista'. It was the search engine before Bing! came along. Processors did not run at gigahertz speeds back then and a large disk was 2 GB. Nonetheless most offices had the internet and when people went searching 'Alta Vista' was the first port of call for many.
TPB has an index of a selective part of the internets, i.e. movies, software, music, that sort of thing. Meanwhile, back in the 1990s, AltaVista indexed everything, as in the entire known internets, with everything stored away in less than the 620 GB used by TPB for their collection of 'stolen' material.
From http://en.wikipedia.org/wiki/AltaVista
Alta Vista is a very large project, requiring the cooperation of at least 5 servers, configured for searching huge indices and handling a huge Internet traffic load. The initial hardware configuration for Alta Vista is as follows:
Alta Vista -- AlphaStation 250 4/266; 4 GB disk; 196 MB memory. Primary web server for gotcha.com. Queries directed to WebIndexer or NewsIndexer.
NewsServer -- AlphaStation 400 4/233; 24 GB of RAID disks; 160 MB memory. News spool from which news index is generated. Serves articles (via http) to those without news server.
NewsIndexer -- AlphaStation 250 4/266; 13 GB disk; 196 MB memory. Builds news index using articles from NewsServer. Answers news index queries from Alta Vista.
Spider -- DEC 3000 Model 900 (replacement for Model 500); 30 GB of RAID disk; 1 GB memory. Collects pages from the web for WebIndexer.
WebIndexer -- Alpha Server 8400 5/300; 210 GB RAID disk (expandable); 4 GB memory (expandable); 4 processors (expandable). Builds the web index using pages sent by Spider. Answers web index queries from Alta Vista.
They also didn't get as much traffic as TPB, since there weren't that many people connected back then.
I would also imagine that they didn't have to HIDE their services either.
IIRC there were (quite) a few before Bing. More to the point, Google was the pinnacle of web searches long before Bing came into existence.
Alta Vista started out with a modest size index of 20 million pages. Let's imagine those pages were all of 1 KB in size, then, 20 * 10^6 * 10^3 comes to 20 * 10^9 bytes, or 20 GB. So, in terms of stuff indexed, that is considerably larger than TPB. Agreed?
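Spelled out as a quick sanity check (the 1 KB average page size is the guess made in the comment, not a measured figure, and the units are decimal):

```python
# Back-of-the-envelope size of AltaVista's initial index.
PAGES = 20 * 10**6        # 20 million pages, as stated in the comment
BYTES_PER_PAGE = 10**3    # ASSUMPTION from the comment: every page is 1 KB

total_bytes = PAGES * BYTES_PER_PAGE
total_gb = total_bytes / 10**9   # decimal gigabytes

print(f"{total_bytes} bytes is about {total_gb:.0f} GB")
```

Changing `BYTES_PER_PAGE` shows how sensitive the estimate is to that one guess.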
Well, maybe not. They could have used compression to get the vastness of TPB onto that USB stick. Around that time - 2012 - they had 1.6 million torrents. That is some way off the gigabytes that AltaVista indexed, no matter how you bloat the maths. Sad to say, but, in the 1990s, the internet was actually larger than your porn collection.
How useful is reqs/second anyway? By that score Google probably does very badly as a search usually returns the answer on the first page. With old-style search engines you might need to go through scores of pages before getting what you want. I found TPB to be a bit like that too, wading through results pages more than necessary.
TPB is not 'safe for work' and in a lot of jurisdictions you cannot even access it from home. In the UK (which is a small but well populated country) it is not that easy to get onto TPB - you have to have hacker voodoo skills to do that or route through a VPN as none of the main ISPs will let you on. Most of the civilised world has the same need to protect citizens from the evils of TPB so places where it can be accessed are not that common. Even if you could access it, would you? Probably...
Meanwhile, back in 1998 - a year or two before the dotcom crash - plenty of people were using search engines such as AltaVista (which was the best back then) for actual work. Maybe not everyone, but enough people knew about computers and things like AOL disks, modems and what not. The internet was big.
Which reminds me of my main point, the one you thought so important to downvote rather than give kudos for being insightful. TPB uses a constellation of computers and consumes vastly more resources than the biggest search engine of the 1990s, yet the utility of TPB is limited to only a few fortunate enough to live somewhere where TPB can be accessed. What can be searched for on TPB is a mere subset of what was on AltaVista, albeit different and not so useful stuff. I would say that with AltaVista they were doing far more with what they had, reaching a better audience, doing something more useful for the world (than serving weight loss adverts) and altogether performing a miracle. TPB is a slouch in comparison.
If you have 20 servers in 20 different providers, who don't know who they are hosting, you don't need this.
Also, VPS provisioning takes seconds or minutes instead of hours, so they could redeploy to another provider if they suddenly got the boot from one. And via Amazon/DigitalOcean-type APIs this reprovisioning-on-failure could be fully automated.
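A rough sketch of what that automated reprovisioning might look like. Everything here is hypothetical: `Provider` is a stand-in I made up for a real cloud API client, the provider names are invented, and nothing is known about how TPB actually does it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provider:
    """Hypothetical stand-in for one cloud provider's API client."""
    name: str
    available: bool = True
    instances: List[str] = field(default_factory=list)

    def create_instance(self, role: str) -> str:
        # A real client would make an HTTP call here; this stub just
        # simulates a provider that either accepts or refuses the request.
        if not self.available:
            raise RuntimeError(f"{self.name} refused the request")
        instance_id = f"{self.name}-{role}-{len(self.instances)}"
        self.instances.append(instance_id)
        return instance_id


def reprovision(role: str, providers: List[Provider], failed: str) -> str:
    """Try every provider except the one that failed until one accepts the VM."""
    for provider in providers:
        if provider.name == failed:
            continue
        try:
            return provider.create_instance(role)
        except RuntimeError:
            continue  # this provider said no, move on to the next one
    raise RuntimeError(f"no provider could host role {role!r}")


# Provider "alpha" just kicked us out, and "beta" is refusing new VMs.
providers = [Provider("alpha"), Provider("beta", available=False), Provider("gamma")]
new_id = reprovision("search", providers, failed="alpha")
print(new_id)
```

In practice the hard part is presumably not the loop but getting a usable disk image and data onto the new VM, which this sketch ignores.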
Running a top 100 site[1] on 21 VMs in 2014 is quite impressive.
Sweden and Holland used to be considered anti-copyright havens, but the movie/recording industry mafia eventually pressured them into passing legislation that squashed this.
The only countries you could operate in and be reasonably copyright resistant are Iceland and Switzerland, because they are not EU members and have great data protection laws as of now. Dedicated servers there are quite overpriced though.
So you could cross reference names of the people raided with payment information of the VPS providers (usual suspects or top "n" providers let's say). Of course that could be hidden as well.
The other issue is how anyone knows this isn't misinformation anyway, and that the VPS providers don't play a role, or not as much of a role as is indicated. Just because someone is writing this or because they said it?
What advantage is there for anyone (like this) in revealing anything about how they are situated security-wise, if not to lead people off the beaten track, even given some possible marketing advantage?
I doubt this will be useful as they're probably using Bitcoin or prepaid cards or things like that for payments.
> What advantage does it have for anyone (like this) to reveal anything
It could teach others how to set up websites that are harder to censor or more resilient to raids (plus it gets them free PR/traffic).
Because then they would also have to care about the thousands of other VMs that may run all kinds of stuff that is illegal somewhere, questionable, or politically, socially, culturally or commercially sensitive, et cetera.
No ISP can afford to be proactive about this. They cannot afford to care. Or even know.
It will be amazing if/when we get to the point where you can have a virtual server where you know that the person with physical access to the server cannot access your data.
Even if it is two orders of magnitude slower than raw hardware that's still fast enough for some things. (For example, being able to have a username+password -> personal info database safely run on someone else's hardware.) And once it takes off there probably will start to be hardware support/accelerators for it - like vector intrinsics and AES instructions currently.
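To make "computing on data the host cannot read" a bit more concrete, here is a toy sketch of the Paillier scheme, which is only additively homomorphic. That is far short of the fully homomorphic encryption a general database query would need, and the primes below are tiny and completely insecure; they are picked just so the arithmetic is easy to follow. It needs Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a Paillier key pair from two distinct primes (toy sizes only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                                          # standard simple choice of g
    x = pow(g, lam, n * n)
    mu = pow((x - 1) // n, -1, n)                      # modular inverse of L(g^lam mod n^2)
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    n, g = pub
    while True:
        r = secrets.randbelow(n - 1) + 1               # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


pub, priv = keygen(1009, 1013)       # ASSUMPTION: toy primes, never use in practice
c1 = encrypt(pub, 20)
c2 = encrypt(pub, 22)

# The "server" multiplies ciphertexts without ever seeing 20 or 22 ...
c_sum = (c1 * c2) % (pub[0] ** 2)

# ... and only the key holder can read the sum of the plaintexts.
total = decrypt(pub, priv, c_sum)
print(total)
```

The untrusted side only ever touches `c1`, `c2` and `c_sum`; whoever holds `priv` is the only one who learns the result.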
Giving Joe Public access to the private key necessary for interpreting the query result allows attackers to inspect all of the intermediate states of the query finite state machine, which allows debugging and inspection just as if homomorphic encryption wasn't in use.
I suppose the routing proxy could hold the private key and decrypt the query result for the general public. However, the location of the routing proxy is almost certainly going to be compromised before the locations of the servers executing the queries, so in the decrypting proxy scenario, the attackers will almost certainly have the secret keys before they get access to the boxes executing the queries. There's also the problem that the messages being decrypted are the final states of finite state machines that executed the queries, so the messages to be copied over the network add up in size to at least the size of the dataset being queried. (The data can be sharded into many smaller databases, and almost certainly would be in order to speed up the homomorphic computation steps, but this doesn't cut down on the amount of network traffic necessary to retrieve all search results for a single query. A simple query on 1 TB of data, split into 10,000 databases each of 100 MB would require copying and remotely decrypting 10,000 messages, each over 100 MB in size.)
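The numbers in that parenthetical work out as follows. The link speed is purely my assumption, added to give a feel for the transfer time; the comment itself only gives the dataset and shard sizes.

```python
# Lower bound on data moved per query in the decrypting-proxy scenario.
DATASET_BYTES = 10**12          # 1 TB of data (from the comment)
SHARDS = 10_000                 # number of smaller databases (from the comment)
LINK_BITS_PER_SECOND = 10**9    # ASSUMPTION: a 1 Gbit/s link to the proxy

shard_bytes = DATASET_BYTES // SHARDS    # size of each shard
min_traffic_bytes = SHARDS * shard_bytes # each result is at least shard-sized

# Time to pull all results over the assumed link, ignoring all overhead.
seconds = min_traffic_bytes * 8 // LINK_BITS_PER_SECOND

print(f"shard size: {shard_bytes // 10**6} MB")
print(f"minimum traffic per query: {min_traffic_bytes // 10**9} GB")
print(f"transfer time at 1 Gbit/s: {seconds} s (about {seconds / 3600:.1f} hours)")
```

Sharding changes how many messages there are, not the total: the product of shard count and shard size is still the full dataset.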
It might be possible to discover a homomorphic encryption scheme whereby knowledge of the private key allows one to devise a mapping from a higher dimensional finite state machine to a lower dimensional finite state machine, where the secret key for the smaller dimensional machine doesn't leak information about the secret key for the larger machine. In this case, it may be possible to perform some finishing operations on the query to prepare it for conversion to the smaller state machine and give the public the private key to the smaller state machine, so that the query result could be read from the machine by the public without the public being able to observe intermediate states of the query computation. However, I believe this is far beyond our current mathematical understanding.
This doesn't apply to TPB, but one could give each user of, say, an email webapp the private key to his/her own data while still facilitating server-side search.
> I suppose the routing proxy could hold the private key and decrypt the query result for the general public. However, the location of the routing proxy is almost certainly going to be compromised before the locations of the servers executing the queries.
This wouldn't be completely useless since it lets you offload much of the storage and computation onto commodity cloud providers without revealing what's on the machines, even if they're scanning your RAM. From the article it seems like TPB is getting some kind of utility out of such a scheme: "All virtual machines are hosted with commercial cloud hosting providers, who have no clue that The Pirate Bay is among their customers. All traffic goes through the load balancer, which masks what the other VMs are doing."
> There's also the problem that the messages being decrypted are the final states of finite state machines that executed the queries, so the messages to be copied over the network add up in size to at least the size of the dataset being queried.
In a homomorphic encryption scheme that supported querying, only the encrypted results would need to be relayed back from each search shard, no?
Yes, but as I stated originally, the size of the result is the size of the finite state machine which encodes all of the query data plus the search mechanics. We may in the future discover ways around this, but it's a limitation of the current state of the art.
I'm responding to the GP, who was hoping that homomorphic encryption would allow TPB to hand an attacker a working copy of the database on which the attacker could run queries, but not leak information about what the database was doing.
I'm making the statement that allowing Joe Public the ability to interpret query results allows the attacker the ability to observe the database's internal state at each step of the query, nullifying any advantages of homomorphic encryption.
I explained why current homomorphic encryption doesn't allow the kind of separation of access the GP was hoping for, and outlined one way a theoretical discovery advancing the state of the art might allow what the GP was hoping for.