Yin-Yang – A tool for stress-testing SMT solvers

Yin-Yang – A tool for stress-testing SMT solvers(testsmt.github.io)

54 points by zsu 5 years ago | 14 comments

gugagore 5 years ago |

Any ideas why Z3 has so many more issues than CVC4?

Maybe because CVC4 has better practices around fuzzing? https://github.com/CVC4/CVC4/wiki/Fuzzing-CVC4 Not sure.

I am also looking for a recommendation for a survey comparing different SMT solvers, including e.g. https://github.com/SRI-CSL/yices2 and https://github.com/dreal/dreal4

dwinterer 5 years ago | |

Good question, gugagore!

We have treated Z3 and CVC4 exactly equal, i.e. every formula on which Z3 was tested, CVC4 got also tested. It is thus striking that we found almost twice as many bugs in Z3 as compared to CVC4. The solvers have different development models. Whereas Z3 relies on a single main developer and a couple of assisting developers, CVC4 is more of a community effort. CVC4 requires code reviews, Z3 usually not. On the other hand, issues in Z3 get usually fixed faster and Z3 has the larger codebase (Z3: ~400k LoC vs CVC4: ~200k LoC). The fuzzing practices of CVC4 did not play a role, though. As soon as they were in place (since August 2020), we applied their rules to both solvers. Our earlier work "Validating SMT Solvers via Semantic Fusion" published at PLDI 2020 (bug hunting from July 2019 - November 2019) showed the same trend. To the best of our knowledge, all SMT solver bug hunting campaigns prior to ours, found no bugs in CVC4 at all. CVC4 simply seems to be harder to crack.

zsu 5 years ago | | |

To add a quick follow-up to dwinterer's nice reply, note please Z3's current support for nonlinear arithmetic and string logics is more advanced than CVC4's, where many of the detected bugs in Z3 occurred.

ahelwer 5 years ago | | |

I really wish Microsoft would hire some developers to help Nikolaj. I've done some simple volunteer dev work for Z3 (just creating a nuget package & improving their build/release pipeline mostly) but my impression is there is a ton of work that could be done if one were to really sink their teeth into the codebase. Really tricky, interesting, applied-theory type work too.

I'm biased but if I had any purse string authority at Microsoft, RiSE's budget would be a multiple of what it currently is.

philzook 5 years ago | |

The SMT-COMP competition is held every year. https://smt-comp.github.io/2020/results.html Look up your problem domain and see how they compare.

A nice thing about the smtlib2 format is that you can try the same problem in different solvers without too much pain. Generally every solver will out perform other solvers on at least some problems.

For general purpose use, Z3 is a good starting point.

I've had good luck with Boolector.

I haven't really tried dreal. It is specialized for problems involving nonlinear predicates over real numbers, so it targets a different application domain than the other solvers.

gugagore 5 years ago |

This video illustrates the principle well: https://www.youtube.com/watch?v=hEWpbO5yXPw

espadrine 5 years ago | |

Good video!

It could have been improved by digging into why the code of Z3 and CVC4 had those bugs.

It is surprising to me that they could be wrong on such simple statements (and such a simple fuzzing procedure). I expected the faulty statements to be gigantic edge-case monstrosities, but they are just a few lines of boolean logic.

zsu 5 years ago | | |

Agreed; such an analysis/study would be nice to have. It is worth to point that the small tests were the results of test input reduction --- the original tests were often quite large. Many of the bugs were due to incorrect handling of corner/boundary cases, such as "division by zero" (which has well-defined semantics in SMT-LIB).

zsu 5 years ago |

Project website: https://testsmt.github.io/

wyldfire 5 years ago |

It's verification-verifiers all the way down.

carapace 5 years ago |

Congratulations! This is fantastic.

zsu 5 years ago | |

Thanks!

stonecharioteer 5 years ago |

"Come on Richard. Yin and Yang? Pretty sure it's Ying and Yang." - Ron LaFlamme