Modern SAT solvers: fast, neat and underused (2018)(codingnest.com) |
Modern SAT solvers: fast, neat and underused (2018)(codingnest.com) |
It's worth mentioning that there are higher level abstractions that are far more accessible than SAT. If I were teaching a course on this, I would start with either Answer Set Programming (ASP) or Satisfiability Modulo Theories (SMT). The most widely used solvers for those are clingo [0] and Z3 [1]:
With ASP, you write in a much clearer Prolog-like syntax that does not require nearly as much encoding effort as your typical SAT problem. Z3 is similar -- you can code up problems in a simple Python API, or write them in the smtlib language.
Both of these make it easy to add various types of optimization, constraints, etc. to your problem, and they're much better as modeling languages than straight SAT. Underneath, they have solvers that leverage all the modern CDCL tricks.
We wrote up a paper [2] on how to formulate a modern dependency solver in ASP; it's helped tremendously for adding new types of features like options, variants, and complex compiler/arch dependencies to Spack [3]. You could not get good solutions to some of these problems without a capable and expressive solver.
[0] https://github.com/potassco/clingo
[1] https://github.com/Z3Prover/z3
[2] https://arxiv.org/abs/2210.08404, https://dl.acm.org/doi/abs/10.5555/3571885.3571931
Each "school" excels at different types of problems. ASP for modelling a knowledge-base and running queries on it, CP for discrete optimization problems or for all-solution search, SMT for formal verification and proofs, MIP for optimization of (mostly) continuous variables.
Modern solvers in these "schools" can do things traditionally meant for other "schools". Z3 can do optimization, clingo can include CP-style constraints with clingcon, some MIP solvers can find all solutions, etc.
All in all, this type of "classical" AI is super interesting and I hope the hype on LLMs doesn't suck up all the funding that would go to research in this area.
These days I recommend people start with the Lifschitz book [2] and read through the Potassco book [0]. Lifschitz's book is a much gentler introduction to ASP and logic programming in general and its examples are in ASP code (not math). It's also more geared towards the programming side than the solving side, which is probably better for most people until they really want to understand what clingo/gringo/clasp are doing and what their limitations are.
There are other more applied courses, like Adam Smith's Applied ASP course at UCSC [3]. The problems in that course look like a lot of fun.
[0] https://potassco.org/book/
[1] https://teaching.potassco.org
[2] https://www.cs.utexas.edu/users/vl/teaching/378/ASP.pdf, https://www.amazon.com/Answer-Set-Programming-Vladimir-Lifsc...
Modern SAT solvers are fast, neat and criminally undertaught by the universities. Seriously, why isn't this taught at the undergraduate level in CS70 [1] discrete math type courses or CS170 [2] analysis of algorithms type courses?
Solving Environment | / - | \The CNF Converter is a gem.
https://github.com/scala/scala/blob/v2.13.5/src/compiler/sca...
I don't know anything about the Scala compiler. A few years ago I needed a CNF Converter and I ripped their Logic Module.
(performant CNF Converter are harder to find than SAT Solver)
He has a few open source SAT solvers and tooling that provide good and proven examples on modern SAT solver techniques.
https://www.anaconda.com/blog/understanding-and-improving-co...
For example, what are the defining characteristics of a graphs for which the travelling salesman problem is resolvable without resorting to brute force?
The Silent (R)evolution of SAT https://news.ycombinator.com/item?id=36079115
Boolean satisfiability problem
https://en.wikipedia.org/wiki/Boolean_satisfiability_problem
"In logic and computer science, the Boolean satisfiability problem (sometimes called propositional satisfiability problem and abbreviated SATISFIABILITY, SAT or B-SAT) is the problem of determining if there exists an interpretation that satisfies a given Boolean formula. In other words, it asks whether the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE."
This reminds me of make[0] and of being made aware that make[0] is a SAT solver. I think it was when I attended a conference. Unfortunately I cannot find an authoritative source to quote, so will rely on, and be grateful to, the HN community to correct me should this be wrong.
https://github.com/darius/sturm/blob/master/satgame.py (Python 2)
I had a lot of fun making my own CDCL solver in Rust, and I’ve really enjoyed messing with Z3 for some theoretical computer science. On all of my explorations, there was a very tangible problem size beyond which the solve time was unusable.
In the case of Z3 with most real world problems, the typical problem size is beyond this limit.
In the specific context of language there was a famous debate between Chompsky and Norvig that touches on these themes: http://norvig.com/chomsky.html
I believe events of recent years have not been kind to Chompsky's side of this debate. I'm less bullish on large language models turning into AGI than many people here, but I think if we do develop AGI it's a certainty it will be a based on probabilistic models, not logically consistent formalisms alone.
The overheads are huge, and it's very bad at dealing with fuzzy situations.
I don't recall being taught any practical uses of SAT. It was introduced only in the context of Cook's theorem, as the problem you needed to reduce to other problems in order to show NP-completeness.
I think most people now learn SAT in that theoretical context, not as a tool to solve problems.
Note the former is used as a solution technique for feeding into SAT solvers, where the latter's goal is basically the exact opposite (to show NP-hardness and hence algorithmic intractability). Formal methods courses do the former, but algorithms courses usually use SAT for the latter.
[1] Rough problem outline: Input goods must be transformed through a complex series of assembly/machining/processing lines to output goods; each line transforms one or two inputs into an (intermediary or end-) product; an assembly line produces it's product in a fixed number of time units and cannot be stopped or delayed; finished intermediary products arrive at the end of an assembly line, which must be cleared by then; there are a limited number of temporary storage spaces where intermediary products can be moved to/from in a fixed number of time units; some assembly lines must wait for two intermediary products to be completed to start a job combining them into another intermediary or end product; end products must then be moved to their destinations.
These should also be taught in an advanced PL course, e.g liquid, dependent etc types.
Which is a bit of the general idea of these being underused. If you can get your problem into a SAT form or three, than feed it to a state of the art solver, it can work amazingly well. But, you will be spending a lot of time moving to and from the SAT formulation.
Schaefer's dichotomy theorem shows that, when possible, just make sure to use Horn clauses when possible.
Takes a bit of thinking but is superior to huristics or SAT solvers if you can refactor to allow for it.
many package dependency managers use sat solvers since suse spearheaded this in 2007/2008 with libsolv, which is blazing fast for large repositories.
This meant that people who had moved to Conda because they couldn't get Pip to install important packages into their environment took another look and found that actually they could get things installed using Pip now.
At the same time Pip also got a resolver allowing you to have install time confidence you're not installing conflicting package, and recently (Pip 23.1+) the resolver's backtracking got pretty good.
That said Conda mostly solved this (and once mamba is the default resolver engine things will be really fast), Pip is not ever going to be a package manager, and Poetry still isn't an environment manager, and most other Python package/installer alternatives to Conda won't do things like install your Jupyterlab's nodejs dependency.
After many years I now almost exclusively use Pip to install into an environment, but still nothing beats Conda for bootstraping the non-Python-package requirement's (such as Python itself) nor for getting things working when you are in a weird environment that you can't install OS dev libraries into.
Fabrice Le Fessant, Luc Maranget, Optimizing Pattern-Matching ICFP'2001 http://pauillac.inria.fr/~maranget/papers/opat/
I’m no SMT expert, but the way I’ve done it is to make some representation in Python Z3, and then write some function or class that generates those. I was solving MLB eliminations (more complex than it sounds) and I think I used arrays of ints for number of wins. So I’d pull MLB data, turn that into schedule objects which turned themselves into z3 constraints.
- Puzzles: Sudoku, St8ts
- Bridge crossing: Missionaries and cannibals, 17 minute bridge crossing (I need a computer to solve this anyway)
- Concurrency: finding bug in mutual exclusion algorithm (Peterson algorithm with a bug)
- Graphs: coloring a map, finding a Hamilton path
- Sorting: Finding bugs in a sorting network
For many students it was difficult to encode problems in SAT. They seemed to understand given example encodings, but then found it difficult to vary them. There is a lot of freedom in how one may encode things and it's hard at first to debug at first when things don't work in the way you're expecting. If there is no solution, then one needs to investigate where there are unwanted constraints or errors in the encoding. If there are unwanted solutions, one needs to identify the missing constraints. It was hard to get across how to do this and it's probably frustrating for beginners.
> It was introduced only in the context of Cook's theorem, as the problem you needed to reduce other problems to in order to show NP-completeness.
Are you referring to reductions from SAT, or to SAT? You seem to be mentioning both?
I will also note that this amuses me to no end. If you have enough dependencies that you need the speed of a SAT solver.... how many dependencies do you have? And why are they changing so dang much?
The playlist is aimed at a general scientific/business audience, the presenter suggests that a lot of natural and business systems can be described in this manner. The presenter also mentions how a Clingo program was used, without modification, to optimize radio frequency band allocation.
Here's a repository [1] of ASP programs in clingo. Under problem classes, I see mostly: game AIs, graph problems, various puzzles, so on.
[0] https://www.youtube.com/playlist?list=PL7DBaibuDD9O4I05DiQfi...
> DNF is a fork of Yum 3.4 that uses libsolv via hawkey for a backend. The main goals of the project are:
> * using a SAT solver for dependency resolving
> ...
Edit: I will note that the linked doc is silly old. And it seems that the original proposed replacement for YUM was Hawkeye, and is deprecated? But DNF is still steaming ahead? I didn't find any obvious links talking about performance. Did the SAT idea pan out? I'd almost think it was a bust, with some of these wikis. :(
The syntax looks like Prolog, but predicate negations are a first class citizen, avoids infinite loops.
Prolog's approach is like a depth first search through a search tree -- ASP is like a nondeterministic turing machine, exploring all branches simultaneously from the bottom up.
[1] https://en.wikipedia.org/wiki/Answer_set_programming
[2] https://www.youtube.com/playlist?list=PL7DBaibuDD9O4I05DiQfi...
You can spend time looking really hard at the problem instances you have and identifying common patterns, and write a complex algorithm that works well as long as the dependencies you are trying to solve at least sort-of follow these patterns. This usually works well until it fails completely, at which point you can look really hard for new patterns in new use cases, and make your complex algorithm handle those as well.
But there's also the option of turning your problem into a SAT (or answer set programming, or constraint programming, or integer programming, etc.) formulation, using an existing SAT solver, and not have to write any complex algorithm at all.