Swift Homomorphic Encryption

322 points by yAak 1 year ago | 122 comments

tedunangst 1 year ago |

I feel like phone number lookup is the textbook example of homomorphic encryption not actually working because there's so few keys you can simply enumerate them.

colmmacc 1 year ago | |

I think here the query exposes who called who, which isn't as enumerable. By encrypting the query homomorphically on the client, the answering service has no knowledge of what number the lookup is for, and so Apple can't build a database of who calls you.

tedunangst 1 year ago | | |

It includes both numbers? That wasn't clear. It sounded like they're just looking up the calling number for fancy caller id. How does the recipient affect the query?

silasdavis 1 year ago | |

I'm not sure what enumeration attack you have in mind, but if you were to encrypt the same value many times you would not get the same ciphertext under most schemes.

willseth 1 year ago | |

The novelty is that the server processing the phone number can perform the lookup without actually knowing the phone number or whether it matched.

Dylan16807 1 year ago | |

Are you thinking of hashing?

As far as I'm aware homomorphic encryption can keep even a single bit safe, but maybe I missed something.
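To make the contrast with hashing concrete, here is a minimal sketch of the enumeration problem. It assumes an unsalted SHA-256 over the digits; the number, the prefix, and the helper names are all made up for illustration. Because the space of phone numbers is tiny, anyone holding the hash can try every candidate.

```python
import hashlib

def hash_number(number: str) -> str:
    """Unsalted SHA-256 of a phone number string (illustrative only)."""
    return hashlib.sha256(number.encode()).hexdigest()

def enumerate_preimage(target_hash: str, prefix: str, digits: int):
    """Try every number with the given prefix until one hashes to target_hash."""
    for i in range(10 ** digits):
        candidate = f"{prefix}{i:0{digits}d}"
        if hash_number(candidate) == target_hash:
            return candidate
    return None

secret = "555010" + "4821"      # hypothetical phone number
leaked = hash_number(secret)    # what a hash-based lookup would send

# Only 10,000 candidates here; a full 10-digit space is still easily searchable.
recovered = enumerate_preimage(leaked, "555010", 4)
```

A randomized encryption scheme does not have this problem, since encrypting the same number twice gives unrelated ciphertexts.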

scosman 1 year ago | |

add a seed.

golol 1 year ago |

I find homomorphic encryption fascinating as it can in some sense move a simulation into an inaccessible parallel universe.

Jerrrrrrry 1 year ago | |

> move a simulation into an inaccessible parallel universe.

more like, "move a computation into a progressed, but still unknown, state"

tpurves 1 year ago |

Anyone interested in FHE should also be checking out https://www.zama.ai they've made a ton of progress recently in making FHE practical.

bluedevilzn 1 year ago |

This must be the first real world use case of HE. It has generally been considered too slow to do anything useful but this is an excellent use case.

osaariki 1 year ago | |

Edge's Password Monitor feature uses homomorphic encryption to match passwords against a database of leaks without revealing anything about those passwords: https://www.microsoft.com/en-us/research/blog/password-monit... So not the first, but definitely cool to see more adoption!

cedws 1 year ago | | |

This is nicer than the k-anonymity algorithm that Have I Been Pwned uses, but probably an order of magnitude more expensive to run.

dagmx 1 year ago | | |

I believe Safari does the same as well, so not even technically the first at Apple if I’m correct?

MBCook 1 year ago | |

I tried to look homomorphic encryption up casually earlier this year. I saw references that it was being used, but I don’t think they said where.

This is one topic I have a very hard time with, I just don’t know enough math to really grok it.

It just seems crazy a system could operate on encrypted data (which is effectively random noise from the server’s point of view) and return a result that is correctly calculated and encrypted for the client, despite never understanding the data at any point.

I sort of understand the theory (at a very simple level) but my brain doesn’t want to agree.

oblvious-earth 1 year ago | | |

Maybe it’s the fact it can be done with multiple operators and strong encryption that is hard to grok, but at least here is a very simple example of a limited partially homomorphic encryption:

You have a 7-bit character representation (e.g. ASCII) and your encryption is to add 1 mod 128. E.g. 0 -> 1, 1 -> 2, ... 126 -> 127, 127 -> 0.

As it turns out, all your operations can be represented as adding or subtracting constants. You can now encrypt your data (+1), send it to a remote server, send all the adding and subtracting operations, pull back the processed data, decrypt the data (-1).

Of course, this example is neither useful encryption nor generally useful operation, but can be useful for grokking why it might be possible.
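The add-1-mod-128 toy above can be sketched in a few lines. The function names are hypothetical, and as the comment says, this is not real encryption; it only shows that the server can apply additions to data it received in shifted form.

```python
MOD = 128  # 7-bit character space
KEY = 1    # the "+1" shift from the example

def encrypt(m: int) -> int:
    return (m + KEY) % MOD

def decrypt(c: int) -> int:
    return (c - KEY) % MOD

def server_add_constant(c: int, k: int) -> int:
    """Server adds a constant to the shifted value without using KEY."""
    return (c + k) % MOD

plain = ord("a")                          # 97
cipher = encrypt(plain)                   # sent to the server
cipher = server_add_constant(cipher, 3)   # server-side operation
cipher = server_add_constant(cipher, -1)  # another server-side operation
result = decrypt(cipher)                  # same as (97 + 3 - 1) % 128
```

The shift commutes with addition, which is the whole trick: decrypting after the server's work gives the same answer as doing the work on the plaintext.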

kmeisthax 1 year ago | | |

Let's say I want you to add two numbers, but I don't want you to know what those numbers are, nor what the result is. What I can do is multiply both numbers by some other number you don't know. I then give you the premultiplied numbers, you add them, and give back a premultiplied answer. I can then divide out the number to get the true result.

What we've done here is this:

(a * key) + (b * key) = (c * key)

The rules of elementary algebra allow us to divide out the key on both sides because of a few useful symmetries that addition and multiplication have. Namely, these two equations are always the same number:

(a + b) * key = (a * key) + (b * key)

This is known as the distributive property. Normally, we talk about it applying to numbers being added and multiplied, but there are plenty of other mathematical structures and pairs of operations that do this, too. In the language of abstract algebra, we call any number system and pair of operations that distribute like this a "field", of which addition and multiplication over real[0] numbers is just one example.
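The premultiply-then-add trick can be sketched like this. It is a toy under stated assumptions: plain Python integers, a random multiplier as the "key", and made-up helper names. It is not a secure scheme (for one thing, the gcd of the two masked values is a multiple of the key), but it shows the distributive property doing the work.

```python
import secrets

def blind(x: int, key: int) -> int:
    """Client side: hide x by multiplying with the secret key."""
    return x * key

def unblind(y: int, key: int) -> int:
    """Client side: divide the key back out (exact, since y is a multiple of key)."""
    return y // key

key = secrets.randbelow(10**6) + 2   # secret multiplier, at least 2
a, b = 1234, 5678

# The "server" only ever sees the two masked numbers and adds them.
masked_sum = blind(a, key) + blind(b, key)

c = unblind(masked_sum, key)         # (a*key + b*key) / key == a + b
```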

A simple example of a field that isn't the normal number system you're used to is a 'finite field'. To visualize these, imagine a number circle instead of a line. We get a finite field by chopping off the number line at some prime[1] number that we decide is the highest in the loop. But this is still a field: addition and multiplication keep distributing.

It turns out cryptography loves using finite fields, so a lot of these identities hold in various cryptosystems. If I encrypt some data with RSA, which is just a pair of finite field exponents, multiplying that encrypted data will multiply the result when I decrypt it later on. In normal crypto, this is an attack we have to defend against, but in homomorphic crypto we want to deliberately design systems that allow manipulation of encrypted data like this in ways we approve of.
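The RSA remark can be checked with textbook RSA and tiny primes. This is a sketch only: the primes and exponent are the usual classroom values, there is no padding, and nothing about it is secure. Multiplying two ciphertexts gives a ciphertext of the product of the plaintexts.

```python
# Toy textbook RSA (no padding, tiny primes): insecure, for illustration only.
p, q = 61, 53
n = p * q                            # public modulus, 3233
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (needs Python 3.8+)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

m1, m2 = 7, 12

# Whoever holds only the ciphertexts can multiply them together...
c_product = (encrypt(m1) * encrypt(m2)) % n

# ...and the key holder decrypts the product of the plaintexts (mod n).
product = decrypt(c_product)
```

Real RSA deployments add randomized padding precisely to destroy this property.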

[0] Also complex numbers.

[1] Yes, it has to be prime and I'm unable to find a compact explanation as to why, I assume all the symmetries of algebra we're used to stop working if it's not.
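On footnote [1], one compact way to see it: distributivity still holds modulo a composite number, but division breaks, because some nonzero values have no multiplicative inverse. A small sketch (the helper name is made up) comparing arithmetic mod 7 and mod 6:

```python
def invertible_elements(modulus: int):
    """Nonzero residues that have a multiplicative inverse mod `modulus`."""
    return [a for a in range(1, modulus)
            if any((a * b) % modulus == 1 for b in range(1, modulus))]

mod7 = invertible_elements(7)   # prime modulus: every nonzero residue is invertible
mod6 = invertible_elements(6)   # composite modulus: 2, 3 and 4 cannot be divided out

# With a composite modulus, two nonzero numbers can multiply to zero,
# so "divide both sides by the key" is no longer always possible.
zero_product = (2 * 3) % 6
```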

nightpool 1 year ago | |

Second: Google Recaptcha Enterprise can use Homomorphic Encryption to check whether your password has been compromised (searching the set of all breached passwords without disclosing which individual password you want to check)

Now, in practice, HaveIBeenPwned does the exact same thing with a k-anonymity scheme based off of SHA-1 hash prefixes, which is wayyyy easier in practice and what most people actually deploy, but the Google thing is cool too.
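For comparison, a k-anonymity lookup can be sketched as follows. This assumes SHA-1 with a five-hex-character prefix, as in HIBP's range API; the in-memory index and the function names are made up, and no network call is involved. The server learns only the prefix, which many unrelated passwords share.

```python
import hashlib

PREFIX_LEN = 5  # hex characters revealed to the server (assumption)

def sha1_hex(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()

def build_server_index(breached):
    """Server side: bucket breached-password hash suffixes by hash prefix."""
    index = {}
    for pw in breached:
        h = sha1_hex(pw)
        index.setdefault(h[:PREFIX_LEN], set()).add(h[PREFIX_LEN:])
    return index

def client_check(password, query_bucket) -> bool:
    """Client side: send only the prefix, compare suffixes locally."""
    h = sha1_hex(password)
    suffixes = query_bucket(h[:PREFIX_LEN])
    return h[PREFIX_LEN:] in suffixes

index = build_server_index(["password", "123456", "hunter2"])
lookup = lambda prefix: index.get(prefix, set())   # stand-in for the HTTP request

is_breached = client_check("hunter2", lookup)
is_safe = not client_check("correct horse battery staple", lookup)
```

The trade-off versus HE is that the prefix does leak a few bits about the password, in exchange for a far cheaper query.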

7e 1 year ago | |

A TEE would be a cheaper and more straightforward solution, though.

saagarjha 1 year ago | | |

They also mean if you break the TEE then your privacy guarantees are lost. This, of course, has happened many times.

glenngillen 1 year ago | |

I believe Cipherstash is using HE to do what they do: https://cipherstash.com

kmdupree 1 year ago | | |

it says on their webpage that they aren't using HE

tiffanyh 1 year ago |

This is hugely significant (long-term), though it won't be felt immediately.

This is a massive announcement for AI and use cases related to PII.

oulipo 1 year ago |

How does it compare to the FHE from https://zama.ai ?

rhindi 1 year ago | |

They use BFV, which is an FHE scheme allowing a limited number of fast additions and multiplications (enough for their use case).

Zama uses TFHE, which allows any operation (eg comparisons) with unlimited depth.

So if you only need add/mul, BFV, BGV and CKKS are good options. For anything else, you better use TFHE

jayavanth 1 year ago | | |

I was curious about that choice as well. I guess they also just wanted to operate on integers and not floats

gumby 1 year ago |

The name is hilarious because HME is anything but speedy -- by many orders of magnitude.

I think the real fix is secure enclaves, and those have proven to be difficult as well.

karulont 1 year ago | |

There was a recent paper that also uses Swift in the name:

“Cheddar: A Swift Fully Homomorphic Encryption Library for CUDA GPUs” - https://arxiv.org/pdf/2407.13055

We were a little worried, but quickly discovered that they used Swift as an adjective not as a programming language.

[Disclosure: I work on the team responsible for the feature]

Someone 1 year ago | |

> I think the real fix is secure enclaves

FTA: “Live Caller ID Lookup uses homomorphic encryption to send an encrypted query to a server that can provide information about a phone number without the server knowing the specific phone number in the request”

So, this would require a distributed Secure Enclave or one of them on Apple’s server communicating with one on an Apple device (likely, certainly over time, with lots of different Apple devices for lots of different iCloud accounts)

dllthomas 1 year ago | | |

I don't see why it would? IIUC, the promise of homomorphic encryption is that I can encrypt my database of contacts and send it to an untrusted server, later send the encrypted query to that untrusted server, and get back an encrypted response, without the server being able to tell anything that couldn't be told from the wire (some bounds on how much data, timing of communication, that sort of thing) or provide an incorrect answer.

shortstuffsushi 1 year ago | |

I think Swift in this case is just referring to the programming language, Swift, and not a characteristic of the encryption library itself

dllthomas 1 year ago | | |

Right, but that doesn't make it not funny.

ganyu 1 year ago | |

At least 10^4 times slower than raw code, I think

That makes HE anything but Swift (

bawolff 1 year ago | |

It's like high-temperature superconductors, it's all relative.

layer8 1 year ago | |

I didn’t look at the domain at first and ended up being quite disappointed. :)

ReptileMan 1 year ago |

What is the processing that the server does on the encrypted phone number? I am not sure I understand. I always thought that this type of encryption was (roughly and imprecisely) - you send some encrypted blob to the server, it does some side effect free number crunching on the blob and returns the output blob. You decrypt the blob and everyone is happy.

But to return information if some number is spam it has to be either plaintext or hashed condition somewhere outside of the phone?

fboemer 1 year ago | |

https://news.ycombinator.com/item?id=41115179 gives some intuition. The server database is stored in plaintext, but the server response will be encrypted under the client's key.

[Disclosure: I work on the team responsible for the feature]

dboreham 1 year ago | |

The "side effect free number crunching" in this case is: is <encrypted_phone_number> in <set_of_encrypted_bad_numbers>

You're on the right track with the idea of hashing -- I find it helpful to explain any fancy encryption scheme beginning with "if it were just hashing", then extend to "well this is a very fancy kind of hash", and <poof> now I kind of understand what's going on. Or at least it's no longer magic.

saagarjha 1 year ago | | |

I don't think the set of bad numbers needs to be encrypted.

yalogin 1 year ago |

FHE is cool but I wonder how many use cases it actually fits. Don’t get me wrong, it gives better security guarantees for the end user but do they really care if the organization makes a promise about a secure execution environment in the cloud?

Also from an engineering point of view, using FHE requires a refactoring of flows and an inflexible commitment to all processing downstream. Without laws mandating it, do organizations have enough motivation to do that?

kybernetikos 1 year ago | |

I think the main thing that throws it into question is when you get the software that sends the data to the service and the service from the same people (in this case apple). You're already trusting them with your data, and a fancy HE scheme doesn't change that. They can update their software and start sending everything in plain text and you wouldn't even realise they'd done it.

FHE is plausibly most useful when you trust the source of the client code but want to use the compute resource of an organisation you don't want to have to trust.

bobbylarrybobby 1 year ago | |

I assume companies like it because it lets them compute on servers they don't trust. The corollary is they don't need to secure HE servers as much because any data the servers lose isn't valuable. And the corollary to that is that companies can have much more flexible compute infra, sending HE requests to arbitrary machines instead of only those that are known to be highly secure.

nightpool 1 year ago | |

> but do they really care if the organization makes a promise about a secure execution environment in the cloud?

Uh... demonstrably yes? No "secure execution environment" is secure against a government wiretap order. FHE is.

chipsrafferty 1 year ago | | |

Unless the operating system for iPhones is open source and one can verify which version they have installed, users can't really be sure that Apple is doing this. They could just say they are doing things to protect users' privacy, and then not, and sell their data.

nmadden 1 year ago |

The thing that I always want to know with FHE: the gold standard of modern encryption is IND-CCA security. FHE by definition cannot meet that standard (being able to change a ciphertext to have predictable effects on the plaintext is the definition of a chosen ciphertext attack). So how close do modern FHE schemes get? ie how much security am I sacrificing to get the FHE goodness?

GTP 1 year ago | |

Is the used scheme fully homomorphic encryption or just homomorphic wrt a specific operation? Because they only mention "homomorphic" without the "fully".

fboemer 1 year ago | | |

Swift Homomorphic Encryption implements the Brakerski-Fan-Vercauteren (BFV) HE scheme (https://eprint.iacr.org/2012/078, https://eprint.iacr.org/2012/144) (without bootstrapping). This is a leveled HE scheme, which supports a limited number of encrypted adds and multiplies (among other operations).

[Disclosure: I work on the team responsible for the feature]

nmadden 1 year ago | | |

With respect to IND-CCA, it doesn’t matter. Neither is compatible.

hansvm 1 year ago | |

You can't attain IND-CCA2 (adaptively choosing ciphertexts based on previous decryptions). You can attain IND-CCA1 (after a decryption oracle, you're done fiddling with the system).

nmadden 1 year ago | | |

Right, but IND-CCA1 is kind of a toy security goal though. A sort of theoretical consolation prize if you can’t achieve the real thing. And AFAICT, no actually implemented schemes do obtain even CCA1?

menkalinan 1 year ago |

I don't quite understand how the server can match the ciphertext with a value without knowing the key. How does the server determine that the ciphertext corresponds to the specific value? If the server constructs this ciphertext-value database, how does it know what algorithm to use to create ciphertext from a value and store on its side?

karulont 1 year ago | |

Check my comment here for some intuition: https://news.ycombinator.com/item?id=41115179

Basically the server does not know, it just computes with every possible value. And the result turns out to be what the client was interested in.

motohagiography 1 year ago |

great to see this becoming part of mainstream tools. the question I have is, when a weakness is published in FHE, is it more like a hash function you can do some transformations on, but there is no 'decryption' to recover plaintext again- or is it more like a symmetric cipher, where all your old ciphertexts can be cracked, but now your FHE data sets are no longer considered secure or private and need to be re-generated from their plaintexts with the updated version?

what is the failure mode of FHE and how does it recover?

j2kun 1 year ago | |

It is more like a symmetric cipher. Once you have a key you can decrypt everything encrypted with that key

motohagiography 1 year ago | | |

the risk in this is that FHE is proposed as a privacy protecting tech, and it will "squeeze a lot of toothpaste out of the tube" in private data sharing, where a weakness will be a rug pull under all the data subjects whose data was shared under the aegis of being "encrypted."

It's important to understand this failure mode, imo.

lsh123 1 year ago |

If we assume that server is “evil” then the server can store both PIR encrypted and plain text phone number in the same row in the database and when this row is read, simply log plain text phone number. What do I miss here? We can send PIR request and trust server not to do the above; or we can send plain text phone number and trust server not to log it — what’s the difference?

karulont 1 year ago | |

A very simple PIR scheme on top of homomorphic encryption that supports multiplying with a plaintext and homomorphic addition, would look like this:

The client one-hot-encodes the query: Enc(0), Enc(1), Enc(0). The server has 3 values: x, y, z. Now the server computes: Enc(0) * x + Enc(1) * y + Enc(0) * z == Enc(y). Client can decrypt Enc(y) and get the value y. Server received three ciphertexts, but does not know which one of them was encryption of zero or one, because the multiplications and additions that the server did, never leak the underlying value.

This gives some intuition on how PIR works, actual schemes are more efficient.
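The one-hot PIR idea above can be run end to end with any additively homomorphic scheme. The sketch below swaps in a toy Paillier cryptosystem with tiny primes as a stand-in; it is not BFV and not what Swift Homomorphic Encryption implements, and the parameters and function names are chosen purely for illustration. In Paillier, "ciphertext times plaintext" is an exponentiation and "ciphertext plus ciphertext" is a multiplication mod n^2.

```python
import math
import random

# Toy Paillier keypair with tiny primes: insecure, illustration only.
p, q = 293, 433
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1), private
mu = pow(lam, -1, n)                                 # private; valid for g = n + 1

def enc(m: int) -> int:
    """Randomized encryption: the same m gives a different ciphertext each time."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c: int) -> int:
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

def server_answer(query, database):
    """Compute Enc(sum(b_i * x_i)) from ciphertexts Enc(b_i) and plaintext x_i."""
    acc = 1                                    # trivial encryption of 0
    for c, x in zip(query, database):
        acc = (acc * pow(c, x, n2)) % n2       # Enc(b_i) * x_i, then homomorphic add
    return acc

database = [111, 222, 333]                     # server's plaintext values x, y, z
query = [enc(0), enc(1), enc(0)]               # client's one-hot selection of y
response = server_answer(query, database)      # server touches every row
value = dec(response)                          # client recovers y
```

Note that the server processes every row for every query, which is why it cannot tell which one the client wanted.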

[Disclosure: I work on the team responsible for the feature]

lsh123 1 year ago | | |

Does the server read specific rows from the spam numbers DB or the whole database?

jayd16 1 year ago | |

The server never gets the plaintext at all. It only ever receives encrypted data that it cannot read.

vlovich123 1 year ago | | |

I think OP is talking about the set of “spam phone numbers” stored on the server and looking at side channels based on what data is looked up by processing the query.

vlovich123 1 year ago | |

It’s a lot more complicated because the phone numbers themselves are stored encrypted and there’s not a 1:1 mapping between the encrypted representation and the plaintext. So processing the query is actually blinding the evil server afaik.

lsh123 1 year ago | | |

Evil server stores BOTH encrypted and plain text phone number in the same db row

attilakun 1 year ago |

Is there a good primer that explains the math basis of this?

j2kun 1 year ago | |

https://www.jeremykun.com/2024/05/04/fhe-overview/

tombert 1 year ago |

I wrote some basic homomorphic encryption code for a hackathon like 8 years ago. When I interviewed for a BigTechCo [1] about a year later, the topic came up, and when I tried explaining what homomorphic encryption was to one of the interviewers, he told me that I misunderstood, because it was "impossible" to update encrypted data without decrypting it. I politely tried saying "actually no, that's what makes homomorphic encryption super cool", and we went back and forth; eventually I kind of gave up because I was trying to make a good impression.

I did actually get that job, but I found out that that interviewer actually said "no", I believe because he thought I was wrong about that.

[1] My usual disclaimer: It's not hard to find my work history, I don't hide it, but I politely ask that you do not post it here directly.