MongoDB Releases Queryable Encryption Preview (mongodb.com)
"In use" implies that you have a need to process that data. It doesn't matter if the end client is submitting queries in plain text (protected in transit) or this fancy encryption, the client (or server) still needs to be authorized to query that data. Translating from plain-text to encryption does not add additional protections from a compliance perspective.
At an organizational level, it's extremely hard to control what information gets put into a SaaS product. There are far too many ways in which data can be de-anonymized or inferred (e.g., the mere existence of a field can have privacy implications).
It's far safer to use a SaaS provider that meets general control requirements than to try to shoehorn encrypted data into them.
It's not just the query that is encrypted in this case, but the data being queried. From MongoDB's description, the server never receives or stores plaintext data, and the query results can only be decrypted by a client who has the same key that was used to encrypt the data in the first place. From a compliance perspective, that's amazing if it works. It means the server is never storing or processing anything but ciphertext.
In the most extreme cases, the unencrypted values never leave the client. The database can concentrate on delivering storage and fast query answers without paying much attention to issues of security. Clients don't need to trust the database because they control the encryption.
Just as a dumb example: an auditor says passwords need to be hashed with bcrypt. They find a code sample that says "store(bcrypt(password))". Awesome; complied to a T. But true security goes beyond that: are we using a library for bcrypt, or an internal implementation? Is the internal implementation done well? Is the library free of CVEs (maybe they check that)? Did we trace that call to ensure the hash it generates is what gets inserted into the DB, or was it intercepted by some middleware? Did we name the function 'bcrypt' when it's actually just MD5?
My point is really not that auditing is pointless, but rather that it's fundamentally limited in what kinds of attestations it can make.
One great example I can pull from a few recent audits I've been through: serverless tech like Fargate. This oftentimes blows auditors away (or, rather, it used to; nowadays they've seen it so often that they just know). It checks so many boxes. They'll present multi-page forms about data center colos and operating system security and operator SSH access and we'll say "We use Fargate". "Oh nice, ok we can check all of these and carve out with AWS's attestation for (ComplianceFrameworkX)". It saves hours, days, of time.
That's, I think, where homomorphic encryption can go. That isn't what this is, but it's a step toward it. It's not about meeting today's compliance frameworks; it's about evolving the frameworks. And in the interim, as advanced R&D teams meet these auditors, they'll educate them: yeah, you've got a lot of questions here, but it's not that we do or don't meet them; it's that they're fundamentally the wrong questions to ask. We understand the spirit, here's how we meet the spirit, and here's how we're actually better off than if we had just checked Yes on all of them.
Third example: years ago, our team was the first place our auditor had ever seen Let's Encrypt and the k8s cert-manager (then called kube-lego). He wanted an attestation that TLS certificates were current and not near expiration. We countered: they can't be near expiration, because we have automated systems that renew them. He'd never seen anything like it; he was used to expensive certificates and operations runbooks for renewal, and we nerded out for ten minutes showing it all off. Instead of documenting a runbook for renewing certificates, he documented our runbook for maintaining this automated service and ensuring its uptime. Win-win.
It's a slow process, and it's made even slower because there are tons of people in the industry who treat the frameworks as gospel. But ultimately, we control the technology, not them. We decide what is secure; they just attest to it and double-check.
CipherStash works with any database and also supports range queries and sorting/ordering. We do it in the application layer. It only supports Ruby so far, but C#, Java, Python, and Rust are in the works.
It says it will support prefix search, substring search, and the like. Can anyone point me in the right direction on what the algorithm may be here? I don't get how you could do those things without making the encryption less secure and/or decrypting every record on the fly.
Another interesting use case I found that isn't mentioned here is sort. I've had customers ask me to be able to sort the results by PII and we tell them... no, we can't do that because the field is encrypted.
MongoDB is very short on details, and I suspect they do something worse than homomorphic encryption, that does indeed make some kind of compromise between privacy and convenience.
CryptDB: Processing Queries on an Encrypted Database
E.g., googling, I found http://cs.brown.edu/people/seny/pubs/edb.pdf
[0]: https://en.wikipedia.org/wiki/Column_Level_Encryption
[1]: https://github.com/bincyber/go-sqlcrypter
[2]: https://www.vaultproject.io/docs/secrets/transit#convergent-...
"Queryable Encryption was designed by MongoDB’s Advanced Cryptography Research Group, headed by Seny Kamara and Tarik Moataz"
Some related papers with those two as authors:
Unfortunately, I can't seem to find this again, but a quick search turned up two papers saying that just encrypting your DB isn't enough: [0], [1]. In particular, [1] doesn't seem to go into the details of how you could recover the data, but mentions that many operations as performed by "normal" databases leak information if performed over encrypted data. Maybe someone who is more familiar with Queryable Encryption can comment on this?
[0] https://www.cs.cornell.edu/~shmat/shmat_hotos17.pdf [1] https://www.microsoft.com/en-us/research/wp-content/uploads/...
(their pagination is implemented just by increasing the limit parameter).
It's also difficult to see how this could work on the server side without exposing some information about the encrypted fields. For example, if all documents have a value that begins with "a", then there must exist a prefix query that matches all those documents. I would expect it to be possible to figure out whether such a query is possible or not, only given access to the encrypted data, but even if that's not possible, the simple fact that a prefix query was issued that matched all documents gives away that information.
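A minimal sketch of that leakage, assuming a naive scheme (not necessarily MongoDB's) in which the client stores one deterministic HMAC token per prefix of each value. The key, the `prefix_tokens` helper, and the sample data are hypothetical. The server answers prefix queries by set membership without ever seeing plaintext, but because the stored tokens are deterministic, it can also compare token sets between documents and learn which ones share a prefix before any query is issued.

```python
import hashlib
import hmac

KEY = b"client-secret-key"  # hypothetical client-held key; never sent to the server


def token(s: str) -> str:
    # Deterministic keyed token: same input + same key -> same token.
    return hmac.new(KEY, s.encode(), hashlib.sha256).hexdigest()


def prefix_tokens(value: str, max_len: int = 8) -> set[str]:
    # Client-side: one token per prefix of the value, up to max_len characters.
    return {token(value[:i]) for i in range(1, min(len(value), max_len) + 1)}


# What the server stores: opaque token sets, one per document.
plain = {1: "alice", 2: "albert", 3: "anna", 4: "bob"}
docs = {doc_id: prefix_tokens(v) for doc_id, v in plain.items()}


def server_prefix_query(q_token: str) -> list[int]:
    # Server-side: set membership only; the server never sees plaintext.
    return sorted(d for d, toks in docs.items() if q_token in toks)


print(server_prefix_query(token("al")))  # [1, 2]
print(server_prefix_query(token("a")))   # [1, 2, 3]

# Leakage without any query: documents sharing tokens share a prefix.
print(len(docs[1] & docs[2]))  # 2 -> docs 1 and 2 share a 2-character prefix
```

Schemes that randomize the stored tokens can avoid this static leakage, but the query results themselves (which documents matched) are still visible to the server.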
For something like HIPAA, this adds very little value if fields are semi-known.
Is it really all client side? How could they do things like substring matching without sending the entire index back and forth to the client? The graphic seems to show the query being executed solely on the server (although graphics often lie).
Then it's just a matter of counting matching trigrams/chunks. The server doesn't need to know how to read the trigrams.
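A minimal sketch of the trigram idea, assuming the client blinds each trigram with a keyed HMAC before sending it (the key, helper names, and documents are hypothetical; this is not a description of MongoDB's or CipherStash's actual scheme). The server only intersects opaque token sets and counts the overlaps.

```python
import hashlib
import hmac

KEY = b"client-secret-key"  # hypothetical; held only by the client


def trigrams(text: str) -> set[str]:
    # All overlapping 3-character windows of the lowercased text.
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


def blind(grams: set[str]) -> set[str]:
    # Client-side: replace each trigram with a keyed HMAC the server cannot invert.
    return {hmac.new(KEY, g.encode(), hashlib.sha256).hexdigest() for g in grams}


# Server stores only blinded trigram sets.
plain = {1: "queryable", 2: "encryption", 3: "query planner"}
index = {doc_id: blind(trigrams(s)) for doc_id, s in plain.items()}


def server_count_matches(blinded_query: set[str]) -> dict[int, int]:
    # Server-side: count how many blinded query trigrams each document contains.
    return {d: len(blinded_query & toks) for d, toks in index.items()
            if blinded_query & toks}


print(server_count_matches(blind(trigrams("query"))))  # {1: 3, 3: 3}
print(server_count_matches(blind(trigrams("crypt"))))  # {2: 3}
```

A document is a substring candidate when its count equals the number of query trigrams (3 for "query"). Trigram containment is necessary but not sufficient, so the client would still check candidates after decrypting, and queries shorter than three characters produce no trigrams at all.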
So let me get this right: it's encrypted, but you can search by prefix and suffix?
So all the attacker has to do is do it one letter at a time, see if it starts with A, B, C, once they figure that out, go to the next letter and so on. (I presume that the DB is not supposed to be trusted since they make such a big fuss about only being decryptable on the client side)
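The letter-by-letter attack can be sketched as below, assuming the attacker can issue prefix queries and observe whether a target record matches (for example, through a compromised application); `prefix_oracle` and the secret value are hypothetical stand-ins. The point is the cost: at most one query per alphabet symbol per character, so recovery is linear in the value's length rather than exponential.

```python
import string

SECRET = "carol"  # hypothetical stored value the attacker wants to recover


def prefix_oracle(prefix: str) -> bool:
    # Stands in for "issue a prefix query and see whether the target record matches".
    return SECRET.startswith(prefix)


def recover(alphabet: str = string.ascii_lowercase, max_len: int = 32) -> tuple[str, int]:
    known, queries = "", 0
    while len(known) < max_len:
        for ch in alphabet:
            queries += 1
            if prefix_oracle(known + ch):
                known += ch  # extend the recovered prefix by one character
                break
        else:
            break  # no extension matched: the full value has been recovered
    return known, queries


value, n = recover()
print(value, n)  # carol 75
```

For "carol" this takes 75 queries (49 to find the five letters plus 26 to confirm the value has ended), versus 26^5, about 11.9 million, guesses for blind brute force.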
Also, there doesn't seem to be a whitepaper detailing the algorithms or their threat model. Bitcoin scams try harder than this.
The use case here is just "advanced encryption at rest". Encrypting at rest is one thing, but this means people are less likely to see PII by accident, for example.
"Queryable Encryption implements a fast, searchable scheme that allows the server to process queries on fully encrypted data, without knowing anything about the data. The data and the query itself remain encrypted at all times on the server."
They are strongly implying that someone with access to the database should not be able to decrypt the data. According to their blog post, that seems to be the entire value proposition compared to what they describe as traditional encryption at rest.
Designing secure cryptosystems is hard. Experts fail at it all the time. The lack of technical details is a major red flag.
Not to mention the distinct possibility that even if this group made a secure system, the mongodb marketing dept may very well be misrepresenting its security/limitations.
I don't think this is exactly homomorphic. I hope they put out a whitepaper so researchers can properly evaluate its security.
See the MuchPIR project (https://github.com/ReverseControl/MuchPIR), which implements Information-Theoretic Private Information Retrieval (IT-PIR) in PostgreSQL. In addition to the demo, there is a high-performance version available for commercial use.
If they are able to do this without decrypting the data, then I think you could describe this as a somewhat weak encryption that exposes some data attributes as queryable. You could not implement this with strong encryption without at least decrypting for indexing.
This is obviously not that. They're encrypting locally. However, Simon Oya and Dr. Kerschbaum's paper, https://arxiv.org/abs/2010.03465, demonstrates a fantastically efficient attack that recovers keywords against most constructions without needing many queries. It remains to be seen how MongoDB's implementation will hold up.
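For readers unfamiliar with IT-PIR, the textbook two-server XOR scheme (due to Chor, Goldreich, Kushilevitz and Sudan) shows the core idea. This is a minimal sketch, not MuchPIR's implementation; the record layout and helper names are hypothetical, and privacy assumes the two servers do not collude.

```python
import secrets
from functools import reduce

# Hypothetical database of fixed-length records, replicated on two non-colluding servers.
DB = [b"rec0", b"rec1", b"rec2", b"rec3", b"rec4", b"rec5"]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def server_answer(db: list[bytes], subset: set[int]) -> bytes:
    # Each server XORs together the records whose indices it was sent.
    zero = bytes(len(db[0]))
    return reduce(xor_bytes, (db[j] for j in subset), zero)


def client_query(i: int, n: int) -> tuple[set[int], set[int]]:
    # Uniformly random subset for server A; the same subset with index i toggled for B.
    s_a = {j for j in range(n) if secrets.randbits(1)}
    s_b = s_a ^ {i}
    return s_a, s_b


def retrieve(i: int) -> bytes:
    s_a, s_b = client_query(i, len(DB))
    ans_a = server_answer(DB, s_a)  # server A alone sees only a random subset
    ans_b = server_answer(DB, s_b)  # so does server B on its own
    return xor_bytes(ans_a, ans_b)  # every record except i appears twice and cancels


print(retrieve(3))  # b'rec3'
```

Each server individually sees a uniformly random subset of indices, so it learns nothing about `i`; the price is that every query touches about half the database on each server.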
This is a very interesting space, but structured encryption is the right way to put the theory to good use.
Most of the other encryption mechanisms such as homomorphic, partially homomorphic, etc. are just too impractical or require very specific niche use cases to be useful.
There are other misnamed technologies I've seen in marketing, such as "polymorphic encryption" or "vaultless", but most of these haven't had real research or cryptanalysis behind them.
[0] https://info.ionic.com/hubfs/IonicDotCom/Resources/Assets/Se... [1] https://eprint.iacr.org/2017/111.pdf
That's what I mean by a "local client," there has to be something on the client side and it cannot just be something that communicates over the internet to a server w/o some sort of local encryption first.
the conclusions drawn by this paper with regard to CryptDB's guarantees for medical applications are incorrect: had the guidelines been followed, none of the claimed attacks would have been possible. [1]
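This isn't really true, because in any modern encryption scheme there are multiple ciphertexts that decrypt to the same plaintext (encryption is randomized). If you drop that property, encryption becomes deterministic and you weaken it: equal plaintexts produce equal ciphertexts, which opens the door to chosen-plaintext attacks.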
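A minimal sketch of the distinction, using a toy HMAC-SHA256 counter-mode stream cipher from the standard library (illustration only; real code should use a vetted AEAD such as AES-GCM, or AES-SIV when determinism is intended). With a fresh random nonce, the same plaintext encrypts to a different ciphertext each time; with a nonce derived from the plaintext, equal plaintexts produce equal ciphertexts. Key names and helpers are hypothetical.

```python
import hashlib
import hmac
import os

ENC_KEY = os.urandom(32)  # hypothetical demo keys
SIV_KEY = os.urandom(32)


def keystream(nonce: bytes, length: int) -> bytes:
    # Toy stream cipher: HMAC-SHA256 in counter mode. Not for production use.
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(ENC_KEY, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(pt: bytes, nonce: bytes) -> bytes:
    # Ciphertext = 16-byte nonce || (plaintext XOR keystream).
    return nonce + bytes(a ^ b for a, b in zip(pt, keystream(nonce, len(pt))))


def decrypt(ct: bytes) -> bytes:
    nonce, body = ct[:16], ct[16:]
    return bytes(a ^ b for a, b in zip(body, keystream(nonce, len(body))))


def encrypt_randomized(pt: bytes) -> bytes:
    return encrypt(pt, os.urandom(16))  # fresh nonce: many ciphertexts per plaintext


def encrypt_deterministic(pt: bytes) -> bytes:
    # Nonce derived from the plaintext (SIV-style): exactly one ciphertext per plaintext.
    return encrypt(pt, hmac.new(SIV_KEY, pt, hashlib.sha256).digest()[:16])


r1, r2 = encrypt_randomized(b"diabetic"), encrypt_randomized(b"diabetic")
d1, d2 = encrypt_deterministic(b"diabetic"), encrypt_deterministic(b"diabetic")
print(r1 == r2, d1 == d2)  # False True
print(decrypt(r1) == decrypt(d1) == b"diabetic")  # True
```

The deterministic variant is what makes server-side equality matching possible, and it is also what lets anyone who can submit a chosen plaintext and observe the stored ciphertext test whether a guessed value appears in the data.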
Classic example is on-prem enterprises requiring data encryption at rest when moving to a cloud vendor. I can explain to a client that encrypting an S3 bucket with an AWS-managed key doesn't really prevent anything beyond someone physically stealing a hard drive from the AWS data center, and that the cloud provider can still see all of their data because they control the encryption key... or I can just click the "encrypt data" flag on the S3 bucket, make their security and compliance officer happy, and be done with it.
So, you're totally right, but this might be a case they needed to satisfy where an enterprise security team or regulatory agency said that they couldn't put X data field in the cloud unless it was encrypted, but X data field was really important to the application team.
It does, though. To get that data, you now need access to the bucket itself _and_ the KMS-managed encryption key. You might not be protecting the data from AWS, but one bucket misconfiguration doesn't lead to wholesale data loss now.
Is it perfect? No. You can misconfigure both. But misconfiguring KMS access is harder to do.