Formal verification of Amazon's s2n SSL/TLS library | Dark Hacker News

NathanCollins 9 years ago

I don't think I understand what you mean by "white-box testing" here, but perhaps it's helpful to clarify what I meant by "equivalence" above, and how it relates to testing. What we did here was verify input/output equivalence between the imperative C code and our functional mathematical spec in Cryptol, for a range of key and input buffer sizes. This corresponds to testing all inputs of those sizes, which is not possible to do by direct testing: e.g., for a 64-byte key and a 1000-byte message, the equivalence corresponds to checking

256^(64 + 1000) = 2^8512 ≈ 2.3 × 10^2562

tests, which would take "forever" to verify by direct testing.
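To put a number on that input space, here is a small sketch. It assumes the count is taken over raw byte values (256 possibilities per byte), and the testing rate is a hypothetical figure picked only for illustration, not a measurement of any real test harness.

```python
KEY_BYTES = 64     # key size from the example above
MSG_BYTES = 1000   # message size from the example above

# Each byte takes one of 256 values, so the number of (key, message) pairs is
# 256 ** (KEY_BYTES + MSG_BYTES), which equals 2 ** (8 * (KEY_BYTES + MSG_BYTES)).
total_inputs = 256 ** (KEY_BYTES + MSG_BYTES)
decimal_digits = len(str(total_inputs))

# Hypothetical throughput, chosen only for illustration.
TESTS_PER_SECOND = 10 ** 12
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
years = total_inputs // (TESTS_PER_SECOND * SECONDS_PER_YEAR)

print(f"inputs: 2^{total_inputs.bit_length() - 1} ({decimal_digits} decimal digits)")
print(f"years needed at 10^12 tests per second: about 10^{len(str(years)) - 1}")
```

Even at that made-up rate, the exponent barely moves, which is the sense in which direct testing takes "forever".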

We did not prove any properties of our mathematical specification in Cryptol, but the claim is that it's close enough to the official FIPS mathematical specification for HMAC [1] that it's easy to believe that it's correct. However, a group at Princeton has also verified HMAC in the past, and gone further than us by not only proving that the imperative C code is input/output equivalent to their mathematical spec in Coq, but also proving that their mathematical spec has the security properties of a secure hash function [2].

[1] http://csrc.nist.gov/publications/fips/fips198-1/FIPS-198-1_...

[2] https://www.cs.princeton.edu/~appel/papers/verified-hmac.pdf
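As a rough illustration of what "input/output equivalence against a spec" means, here is a sketch in Python rather than C and Cryptol. It assumes HMAC-SHA-256; `hmac_spec` is a hypothetical name for a direct transcription of the FIPS 198-1 steps, and Python's standard `hmac` module stands in for the implementation under test. Unlike the verification described above, this only samples random inputs of each size, so it is testing, not proof.

```python
import hashlib
import hmac
import os

BLOCK_SIZE = 64  # SHA-256 block size in bytes (B in FIPS 198-1)

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor_with(data: bytes, pad_byte: int) -> bytes:
    return bytes(b ^ pad_byte for b in data)

def hmac_spec(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256 written as a transcription of the FIPS 198-1 steps."""
    # Keys longer than one block are hashed first, then zero-padded to B bytes.
    k0 = sha256(key) if len(key) > BLOCK_SIZE else key
    k0 = k0.ljust(BLOCK_SIZE, b"\x00")
    inner = sha256(xor_with(k0, 0x36) + message)   # H((K0 xor ipad) || text)
    return sha256(xor_with(k0, 0x5C) + inner)      # H((K0 xor opad) || inner)

def check_equivalence(key_sizes, msg_sizes, trials=20) -> int:
    """Compare spec and implementation on random inputs; return the count checked."""
    checked = 0
    for key_size in key_sizes:
        for msg_size in msg_sizes:
            for _ in range(trials):
                key, msg = os.urandom(key_size), os.urandom(msg_size)
                expected = hmac.new(key, msg, hashlib.sha256).digest()
                assert hmac_spec(key, msg) == expected
                checked += 1
    return checked

# Sizes chosen around the block boundary, plus the 1000-byte message from above.
print(check_equivalence([1, 63, 64, 65, 128], [0, 1, 55, 56, 64, 1000]))
```

The sizes around 64 bytes are picked by hand because the key handling changes there; a proof over all inputs of a given size does not need that kind of guess.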

guitarbill 9 years ago

AFAIK, white-box testing is simply when you can look at the source code (as opposed to black-box testing); for example, a unit test is a type of white-box test.

What I was struggling to express is that in the mathematical notation, the operations are well defined (right?); in C that's not necessarily the case. So you could argue that if you were writing direct tests, you don't need to check all inputs; testing edge cases will do. And maybe that's true, but it's practically impossible for complex algos: how do you know which inputs cause edge-case behaviour? So I was agreeing that this approach is probably better than having some fallible human write test cases :) (better = more thorough and reliable) And although you'd have to make sure the same fallible human hasn't put bugs in the mathematical spec, as you've said, that's probably easier to check.
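A toy example of why the edge cases are hard to guess: the sketch below compares a mathematical absolute value against a model of fixed-width code, at a bit width small enough to enumerate. All names are hypothetical, and the wraparound is only a model; in real C, negating the most negative `int` is undefined behaviour rather than a guaranteed wrap.

```python
WIDTH = 8  # small enough to enumerate; the same issue exists at 32 or 64 bits

def to_signed(x: int) -> int:
    """Interpret the low WIDTH bits of x as a two's-complement signed integer."""
    x &= (1 << WIDTH) - 1
    return x - (1 << WIDTH) if x >= 1 << (WIDTH - 1) else x

def abs_spec(x: int) -> int:
    """Mathematical spec: absolute value over unbounded integers."""
    return abs(x)

def abs_impl(x: int) -> int:
    """Model of fixed-width code: the negation wraps around at WIDTH bits."""
    return to_signed(-x) if x < 0 else x

def find_mismatches():
    """Enumerate every WIDTH-bit signed input and collect disagreements."""
    lo, hi = -(1 << (WIDTH - 1)), 1 << (WIDTH - 1)
    return [x for x in range(lo, hi) if abs_spec(x) != abs_impl(x)]

print(find_mismatches())  # prints [-128]
```

Exactly one input out of 256 disagrees here. At 32 bits it would be one in 2^32, which random tests are very unlikely to hit unless someone already knows to try it.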

EDIT: Never mind, I found part three about undefined behaviour. I had written: You seem to know loads about this, maybe you could say how undefined C behaviour is handled when comparing against a spec? Is e.g. shift-past-bitwidth simply forbidden? The only alternative I can think of is looking at the disassembly on a certain platform and checking those instructions, which sounds less than ideal.

NathanCollins 9 years ago

Thanks for clarifying "white-box testing".

Some comments:

* the operations in the mathematical spec are mostly well defined, but e.g. division by zero is not defined. However, the verification handles this by checking that all operations are well-defined on all possible inputs.

* yes, identifying the "edge cases" is not something you can do easily, and it is hard to make formal. In some sense, the fact that the non-edge-case inputs are treated in a uniform way is probably what allows the verification to succeed at all.

* a short summary of the answer you already found in the third blog post: what we actually verify is the LLVM assembly that Clang produces when compiling the C program. Much of the potentially undefined behavior in a C program is translated away by the compiler on the way to LLVM assembly. For any potential undefined behavior that remains in the LLVM assembly, the verification checks that it cannot happen at runtime.
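To make "checks that it cannot happen at runtime" concrete for the shift-past-bitwidth case, here is a brute-force sketch at a toy bit width. The helper names are hypothetical, and a real verifier reasons symbolically about all inputs rather than enumerating them; the point is only that every operation carries a side condition that must hold on every input.

```python
WIDTH = 8                 # toy bit width so that every input can be enumerated
MASK = (1 << WIDTH) - 1

class UndefinedOperation(Exception):
    """Raised when an operation would be undefined for WIDTH-bit integers."""

def checked_shl(value: int, amount: int) -> int:
    # Shifting by the bit width or more is undefined in C.
    if not 0 <= amount < WIDTH:
        raise UndefinedOperation(f"left shift by {amount}")
    return (value << amount) & MASK

def checked_shr(value: int, amount: int) -> int:
    if not 0 <= amount < WIDTH:
        raise UndefinedOperation(f"right shift by {amount}")
    return value >> amount

def rotl_naive(x: int, n: int) -> int:
    # Common rotate idiom; for n == 0 it shifts right by the full width.
    return checked_shl(x, n) | checked_shr(x, WIDTH - n)

def rotl_masked(x: int, n: int) -> int:
    # Masking both shift amounts keeps them in range for every n.
    n &= WIDTH - 1
    return checked_shl(x, n) | checked_shr(x, -n & (WIDTH - 1))

def undefined_inputs(fn):
    """Return every (x, n) on which fn performs an undefined operation."""
    bad = []
    for x in range(1 << WIDTH):
        for n in range(WIDTH):
            try:
                fn(x, n)
            except UndefinedOperation:
                bad.append((x, n))
    return bad

print(len(undefined_inputs(rotl_naive)))   # prints 256 (every x with n == 0)
print(len(undefined_inputs(rotl_masked)))  # prints 0
```

In this picture, the naive rotate would fail verification because some input reaches an undefined shift, while the masked version passes because no input does.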