From hours to 360ms: over-engineering a puzzle solution(blog.danielh.cc) |
From hours to 360ms: over-engineering a puzzle solution(blog.danielh.cc) |
The largest possible value isn't 999,999,999, because that's not a legal value. The largest value is 987,654,321
Dividing that by 9, you get 109,739,369, but that's not a legal value either, you'd want to start with 109,738,365 for the gcd as it's a valid possible row. I don't know that all gcds would need to be a valid row, but any gcd larger than 98,765,432 would need to be used at multiples from 1-9; with that gcd or smaller, multiple 10 becomes viable and the gcd doesn't need to be valid with multiple of 1.
The GCD search should start at 098,765,432 and then go down from there.
I remembered there was a trick to avoid popcount, but I didn't remember it. So I found https://stackoverflow.com/questions/51387998/count-bits-1-on...
That version counts, the 1 bit's but here we already know the number is not 0 and we want to know that the inverted number it has exactly 1 bit set, so instead of
popcnt(n) == 1
we can use n & (n-1) == 0
Moreover, n is the inverted number n = 0b1111111111 - s
so an alternative is to run the check in the original number, and I think that this does the trick: s | (s+1) == 0b1111111111It took 3 readings and a scan of some of the later words but I think this is the correct reading
So a naive program can just solve the puzzle repeatedly, differently, and ccmpute the GCD of the rows, and output the one with the highest result. There aren't infinitely many solutions to a sodoku style puzzle, so...?
My solution doesn't use SIMD, but is actually takes about the same amount of time as the the solution in the article, given the same number of cores, though an glaring weakness of my approach is that it can only scale up to 7 cores as written.
Rough outline of how mine works:
- Set current best GCD to 1.
- Search through all valid board configurations.
- Bail out from the search early if the current partially generated board couldn't have a GCD greater than the current maximum found, e.g. if we've only generated two rows of the board, and the GCD of those two rows are already less than the max.
- Update the current best GCD as you find higher ones.
- Share the current best GCD value across multiple threads. That way the longer the program runs, the earlier and earlier the searches will start bailing out.
- Don't always read from the shared variable to avoid contention. Instead, each thread has its own copy of the last value it read which we compare with first before copying and caching the shared value.
- Another interesting property of this approach is that it can be used to validate the correct solution even faster than it takes to find it. Instead of initially setting the max GCD to 1, set it to `solution - 1`. That way branches in our search will bail even sooner from the beginning. This leads to the program running about 20% more quickly.
Source: https://gist.github.com/lazytype/35b45f3ea81b5c1c5555546fe6f...
Suppose the GCD was divisible by 2, then all rows would be even. Since the last digit of an even integer is in {0,2,4,6,8} and we need 9 unique numbers in the final column, we know that 4 or 5 of the row numbers must be odd. So the GCD can't be even.
Similarly, the GCD can't be divisible by 5. If it were, all rows numbers would need a 0 or 5 in the final digit.
This is why choosing the right tool for the job matters. Z3 is hilariously overkill for such a problem (not to undermine how interesting and useful Z3 is, and not to accuse the author of doing anything wrong).
Z3 is only one class of solver; a finite domain constraint solver is much more suited to these sorts of tasks.
https://gist.github.com/Qix-/3580f268e2725848971703e74f6b26b...
$ time node sudoku.js
1 2 7 6 8 5 4 3 9
5 9 4 1 3 7 8 2 6
8 3 6 4 2 9 5 7 1
3 6 1 8 5 4 2 9 7
7 4 5 2 9 1 3 6 8
2 8 9 3 7 6 1 5 4
6 5 3 9 1 8 7 4 2
4 1 2 7 6 3 9 8 5
9 7 8 5 4 2 6 1 3
0:00.08elapsedGCD(127685439, 594137826, 836429571, ...) = 9
but the author finds one GCD with 8 digits.
Also, for this reason it's important in this version to keep the "0" as "0" instead of mapping it to "1".
If you use 1-9, sum is 45. For anything else, it's 45 - (the unused number).
The last time I saw a puzzle like this it led me to https://www.kakurokokoro.com
It could be that the author is just massively unskilled at writing Z3 code.
On the face of it this comment seems ridiculous to me. Z3 is fabulously successful in other domains. Perhaps the problem fit is not there, or the problem encoding chosen was not appropriate.
I think it's the first time I see such thing in the wild.
Either one of the two statements is true, no other way around.
Who asked?
It's a forum, people come here to discuss the submissions.
You should know this ...
No, I mean even the most naive brutal force code comes back after ~10 minutes, whereas the Z3 didn't finish even after "several hours".
I am now quite curious about writing a Z3 implementation of this myself and try to figure out why is it taking that much time.
max_gcd = 0
for a1 in range(9):
for b1 in range(9):
...
*check_valid(a1, b1, ...)*
n1 = to_number(a1, b1, ...)
...
new_gcd = gcd(n1, n2, ...)
max_gcd = max(max_gcd, new_gcd)
and the first brute force version was for gcd in range(111111111):
for r1 in range(?):
for r2 in range(?):
...
a1, a2,... = split(r1*gcd)
...
*check_valid(a1, b1, ...)*
max_gcd = gcd
(I guess the author was checking earlier, but let's put only one check in may fake version.)It's very difficult for a compiler to transform one into the other.