https://arxiv.org/abs/2105.02124
> The universal Turing machine is generally considered to be the simplest, most abstract model of a computer. This paper reports on the discovery of an accidental arbitrary code execution vulnerability in Marvin Minsky's 1967 implementation of the universal Turing machine. By submitting crafted data, the machine may be coerced into executing user-provided code. The article presents the discovered vulnerability in detail and discusses its potential implications. To the best of our knowledge, an arbitrary code execution vulnerability has not previously been reported for such a simple system.
> A common strategy for understanding a problem is to reduce it to its minimal form. In the field of computer security,we may ask the question: "What is the simplest system exploitable to arbitrary code execution?" In this article, we pro-pose an answer to that question by reporting on the discovery that a well-established implementation of the universal Turing machine is vulnerable to a both unintentional and non-trivial form of arbitrary code execution.
> This paper reports on the discovery of an accidental arbitrary code execution vulnerability in Marvin Minsky's 1967 implementation of the universal Turing machine. By submitting crafted data, the machine may be coerced into executing user-provided code.
It's a universal Turing machine. Its whole purpose is running "user-provided code". That's what a Universal Turing machine does, it runs arbitrary Turing machines. This is a little bit like saying "we found a weakness in the Python interpreter whereby you can feed it specially crafted input that allows you to run arbitrary Python programs". Like... yeah... that's what it's supposed to do.
If my understanding is correct, it's more like, "we found a weakness in the Python interpreter whereby you can feed it a specially crafted input to a Python script that allows you override that script and forces the Python interpreter to do something else", which can be a reasonable point to make.
BTW, if we keep using the Python analogy, this scenario sounds like a Python script that loads its configuration data from a Pickle file on the hard drive. In Python, Pickle doesn't validate the input and it's capable of modifying the internal state of the program. What the authors say is that malicious data in the Pickle file leads to code execution due to deserialization of untrusted data. Depending on your perspective, just like Python says a malicious Pickle file is not in its threat model, you can argue that the Universal Turing Machine is never meant to be protected from malicious data on the tape, and this is not a exploit, but an interesting thought experiment nevertheless.
The paper says,
> The universal machine, U, will be given just the necessary materials: a description, on its tape, of T and of [the initial configuration on T's own, simulated tape] s_x; some working space; and the built-in capacity to interpret correctly the rules of operation as given in the description of T. Its behavior will be very simple. U will simulate the behavior of T one step at a time [...]
> [...] There is one obvious trust boundary in a universal Turing machine, U: the initial string on the tape of the simulated Turing machine, T. That string corresponds to the user-provided data of an ordinary computer program. Because the potential users may be unknown to the developers and administrators of the computer and its programs, it is common to view this data as untrusted. In our explorations of the universal Turing machine, we will make the same assumption. Therefore, if it were possible to execute arbitrary code without manipulating the program of T, but only by providing crafted data on T’s simulated tape, that would constitute a vulnerability.
Basically the authors' reasoning is:
1. An Universal Turing machine is an interpreter/simulation that is capable of executing code written for any Turing machine, provided by the user.
2. The user code takes an external input - the initial content on the tape.
3. It's possible to maliciously craft an input (the content on the tape) to the user code to hijack the simulation, without modifying the user code.
At least they're honest about it with this CVE...
I reported CVE-2014-3423 back in the day, relating to GNU Emacs using a predictable filename when talking to the Mosiac browser. No choice, as that was what the browser required, something that wouldn't exist these days.
Not sure if declaring this is standard practice, but I had a good laugh.
Making a language that have the expressive power of finite state machines could be an example.
- Configuration languages (JSON, YAML, XML) are pure combinatorial logic.
- Regular expressions are... mostly not actually regular, but you get the idea.
- Some templating languages are deliberately less powerful than Turing machines, e.g. ST4 is context-free.
- Prepared SQL statements are a similar idea on a different axis.
The real question is whether a non-TC language could be useful for general purpose programming. Such a language might come with very strong guarantees (termination, time complexity, even correctness or a limited form of correctness), but they might be extra-cumbersome for 'normal' workloads.
It's as if you could pass a "null terminator" in a Unicode string that didn't match the "normal" null character.
Brainfuck is a classical example of a turing complete language that doesn't have a notion of functions. What does "recursion" mean here? Nothing, I think. So you don't need recursion for Turing completeness.
Some people work on languages that support functions as first class objects, which don't natively support recursion. This is nice from a language design standpoint, because functions only see the scope of their arguments and can't look any further. The "Y Combinator" pattern is used to bootstrap recursion into these languages. You may have heard of it.
Turing-incomplete languages can also support unlimited recursion. For example, take a Turing complete language, and equip it with a state of the art theorem prover. Before entering a loop, or calling a function, the theorem prover is allowed to spend up to 1 minute proving that the loop or function call will terminate. If the theorem prover times out, the program halts with an error. Some functions, such as a naive implementation of factorial, can be easily proved to halt for all inputs. So this language allows recursion, but by construction, all programs terminate, so it's turing incomplete.
Obviously using a recursive descent parser is out but thats probably easy to work around right?
I could be wrong tho so don't base mission critical code on this without rigorous testing!
Key to being a good programmer: at least read a bit about the language before speculating about it. Not sure what you mean by "allows constants", but you're only given increment and decrement operators, and a promise that fresh tape is zeroed out.
Brainfuck supports iteration. Performing recursion in a higher level language will still look like iteration in Brainfuck.
The bytecode is theoretically turing-complete, but the verifier present in the kernel ensures that the code contains no loops and accesses no out-of-bounds memory.
With `clang`, you can compile C down to eBPF bytecode. It's a subset of C, with a lot of restrictions, but it is familiar and capable of some remarkable things past just simple packet parsing & filtering.
U is fed machine T and input I. By only changing I you can force U to execute a different machine X.
The implications of this are well understood. You'd have similar vulnerabilities if you ran raw machine code on, say, an x86 processor. Enforcing checks and sanitization in such a way that the 'exploit' can't happen is the exact kind of job you'd have to do in order to write an operating system.
The CVE is a joke in the sense that you don’t really need a CVE for a universal Turing machine, as they are quite literally only academic.
paper is essentially saying that by using simplest machine possible
we prove it does not provide layers of security abstractions which were invented and deployed in last 40+ years to computers
so thats like getting CVE for turning off ASLR, AppARMOR SElinux etc in linux kernel.
Showing a universal Turing machine is a proof of the fact that when you write stuff like the smn theorem or anything that uses universal functions you aren't writing nonsense.
There is no "contract" between the universal machine and the hypothetical user. One could certainly ensure that, say, the input TM is properly formatted, or that "the runtime" will only use a certain set of cells, and so on. The difficulties and problems one would find in trying to do so are the same one would face when writing an operating system and a compiler.
Filtering only printable user input helps but even bit map images can expose a heap to a sensitive registers that will execute some target specific generated shell code.