From hours to 360ms: over-engineering a puzzle solution

From hours to 360ms: over-engineering a puzzle solution(blog.danielh.cc)

97 points by max__dev 1 year ago | 35 comments

toast0 1 year ago |

I don't think it would make a large difference in runtime, but the bounds aren't quite right.

The largest possible value isn't 999,999,999, because that's not a legal value. The largest value is 987,654,321

Dividing that by 9, you get 109,739,369, but that's not a legal value either, you'd want to start with 109,738,365 for the gcd as it's a valid possible row. I don't know that all gcds would need to be a valid row, but any gcd larger than 98,765,432 would need to be used at multiples from 1-9; with that gcd or smaller, multiple 10 becomes viable and the gcd doesn't need to be valid with multiple of 1.

amyres 1 year ago | |

One more nitpick on the upper bound of the GCD: since we know that 0 is going to be in every column, one of the row numbers must start with a 0 and will be an 8-digit number (0 in the first cell). The GCD can't be larger than the smallest row number.

The GCD search should start at 098,765,432 and then go down from there.

Someone 1 year ago | |

> you'd want to start with 109,738,365 for the gcd as it's a valid possible row

Nitpick: that has two threes.

Also, what’s wrong with 109,738,654 from that argument’s view?

toast0 1 year ago | | |

Yes, I think you're right!

gus_massa 1 year ago |

If the floating point trick worth it? When I was young, floating point operations were very slow and I still try to avoid them like the plague.

I remembered there was a trick to avoid popcount, but I didn't remember it. So I found https://stackoverflow.com/questions/51387998/count-bits-1-on...

That version counts, the 1 bit's but here we already know the number is not 0 and we want to know that the inverted number it has exactly 1 bit set, so instead of

  popcnt(n) == 1

we can use

  n & (n-1) == 0

Moreover, n is the inverted number

  n = 0b1111111111 - s

so an alternative is to run the check in the original number, and I think that this does the trick:

  s | (s+1) == 0b1111111111

adgjlsfhk1 1 year ago | |

floating point has the same bandwidth and ~3x the latency of integer+/-

natessilva 1 year ago |

Should look into Knuth’s DLX. My JS implementation solved this case in 1.8ms

gus_massa 1 year ago | |

If you can write a blog post about it (including source code), I guess it will be good to post here.

aroundtown 1 year ago |

For us slow kids in the back what does this mean, “[…] such that the nine 9-digit numbers formed by the rows of the grid has the highest-possible GCD over any such grid”?

genewitch 1 year ago | |

All the digits in a row become a number. All nine such numbers have a greatest common denominator. The puzzle solution is the center row of the grid where the numbers have the greatest common denominator possible. I. E. None of them are prime.

It took 3 readings and a scan of some of the later words but I think this is the correct reading

So a naive program can just solve the puzzle repeatedly, differently, and ccmpute the GCD of the rows, and output the one with the highest result. There aren't infinitely many solutions to a sodoku style puzzle, so...?

sizediterable 1 year ago |

I also wrote my solution in Rust. I was pleased that my approach gave me an opportunity to write a linked list and get the chance to apply some rarely used learning[0].

My solution doesn't use SIMD, but is actually takes about the same amount of time as the the solution in the article, given the same number of cores, though an glaring weakness of my approach is that it can only scale up to 7 cores as written.

Rough outline of how mine works:

- Set current best GCD to 1.

- Search through all valid board configurations.

- Bail out from the search early if the current partially generated board couldn't have a GCD greater than the current maximum found, e.g. if we've only generated two rows of the board, and the GCD of those two rows are already less than the max.

- Update the current best GCD as you find higher ones.

- Share the current best GCD value across multiple threads. That way the longer the program runs, the earlier and earlier the searches will start bailing out.

- Don't always read from the shared variable to avoid contention. Instead, each thread has its own copy of the last value it read which we compare with first before copying and caching the shared value.

- Another interesting property of this approach is that it can be used to validate the correct solution even faster than it takes to find it. Instead of initially setting the max GCD to 1, set it to `solution - 1`. That way branches in our search will bail even sooner from the beginning. This leads to the program running about 20% more quickly.

Source: https://gist.github.com/lazytype/35b45f3ea81b5c1c5555546fe6f...

[0] https://rust-unofficial.github.io/too-many-lists/

amyres 1 year ago |

Since we're over-engineering here, one more optimization is to skip all potential GCDs that are divisible by 2 or 5.

Suppose the GCD was divisible by 2, then all rows would be even. Since the last digit of an even integer is in {0,2,4,6,8} and we need 9 unique numbers in the final column, we know that 4 or 5 of the row numbers must be odd. So the GCD can't be even.

Similarly, the GCD can't be divisible by 5. If it were, all rows numbers would need a 0 or 5 in the final digit.

lern_too_spel 1 year ago | |

We know that the GCD is divisible by 9 because the sum of the digits for every row is 45.

Jtsummers 1 year ago | | |

Not necessarily. The sum of the digits ranges from 36-45, there are 10 possible digits of which 9 will be used. If, say, 5 were the unused number then it would not be divisible by 9.

junon 1 year ago |

> My first attempt when solving this puzzle was to encode the constraints as an SMT optimization problem, and using Z3 to find the solution.

This is why choosing the right tool for the job matters. Z3 is hilariously overkill for such a problem (not to undermine how interesting and useful Z3 is, and not to accuse the author of doing anything wrong).

Z3 is only one class of solver; a finite domain constraint solver is much more suited to these sorts of tasks.

https://gist.github.com/Qix-/3580f268e2725848971703e74f6b26b...

    $ time node sudoku.js
    
    1 2 7  6 8 5  4 3 9  
    5 9 4  1 3 7  8 2 6  
    8 3 6  4 2 9  5 7 1  
    
    3 6 1  8 5 4  2 9 7  
    7 4 5  2 9 1  3 6 8  
    2 8 9  3 7 6  1 5 4  
    
    6 5 3  9 1 8  7 4 2  
    4 1 2  7 6 3  9 8 5  
    9 7 8  5 4 2  6 1 3 
    
    0:00.08elapsed

gus_massa 1 year ago | |

The article is not super clear, but the problem in the article has an additional requests, to maximize the GCD of the rows interpreted as "9" digits numbers, in your case:

GCD(127685439, 594137826, 836429571, ...) = 9

but the author finds one GCD with 8 digits.

Also, for this reason it's important in this version to keep the "0" as "0" instead of mapping it to "1".

foota 1 year ago |

Sigh. My friend did this problem on pen and paper and made me feel stupid. Their solution was so wildly clever! It relied on the observation that the sum of the rows of the sudoku board (given the digits in use) is a known fixed value, and went from there (I'll leave the rest as an exercise to the reader to avoid spoilers).

rahimnathwani 1 year ago | |

The Jane Street page explains the (a?) number theory approach.

toast0 1 year ago | |

This isn't exactly a sudoku board, because they allow for 0.

Jtsummers 1 year ago | | |

While it's not a true sudoku board, that invariant still holds. Whichever set of digits you end up using (it'll be the same 9 for rows, columns, and blocks) the sum will be same for all of them. It's also unique to the set of 9 numbers you end up using to solve the puzzle.

If you use 1-9, sum is 45. For anything else, it's 45 - (the unused number).

foota 1 year ago | | |

Yep, I'm aware, hence the given digits part. The whole board is still drawn from 9 digits, it's just that precisely which 9 digits is unknown.

cwmoore 1 year ago |

This looks like fun for hours and hours, special 100 digit numbers.

The last time I saw a puzzle like this it led me to https://www.kakurokokoro.com

moralestapia 1 year ago |

If anything, what a testament of the massive failure Z3 is.

Jtsummers 1 year ago | |

Unless I missed it, their Z3 solution wasn't even presented. So we can't comment on how good or bad Z3 is without seeing how good or bad their Z3 attempt was.

moralestapia 1 year ago | | |

That is true, indeed.

It could be that the author is just massively unskilled at writing Z3 code.

fn-mote 1 year ago | |

Can you say more about this?

On the face of it this comment seems ridiculous to me. Z3 is fabulously successful in other domains. Perhaps the problem fit is not there, or the problem encoding chosen was not appropriate.

moralestapia 1 year ago | | |

(Way) worse than exhaustive brute force search ... you don't really have to say much more.

I think it's the first time I see such thing in the wild.

$ time node sudoku.js 1 2 7 6 8 5 4 3 9 5 9 4 1 3 7 8 2 6 8 3 6 4 2 9 5 7 1 3 6 1 8 5 4 2 9 7 7 4 5 2 9 1 3 6 8 2 8 9 3 7 6 1 5 4 6 5 3 9 1 8 7 4 2 4 1 2 7 6 3 9 8 5 9 7 8 5 4 2 6 1 3 0:00.08elapsed

max_gcd = 0 for a1 in range(9): for b1 in range(9): ... *check_valid(a1, b1, ...)* n1 = to_number(a1, b1, ...) ... new_gcd = gcd(n1, n2, ...) max_gcd = max(max_gcd, new_gcd)