TIL you can de-obfuscate code with ChatGPT (twitter.com)
So, all I've learned is that ChatGPT knows the obfuscated and de-obfuscated versions of code that it itself has generated.
My original code:
    function resizeImage(img) {
      var maxHeight = 350;
      var ratio = 1;
      if (img.height > maxHeight) {
        ratio = maxHeight / img.height;
      }
      var width = img.width * ratio;
      var height = img.height * ratio;
      var canvas = document.createElement('canvas');
      canvas.height = height;
      canvas.width = width;
      var ctx = canvas.getContext('2d');
      ctx.drawImage(img, 0, 0, width, height);
      return canvas.toDataURL("image/jpeg", 0.8);
    }
ChatGPT's answer:
https://i.imgur.com/5jgPMEd.png
Also interesting is how its explanation of the deobfuscated code, although broadly correct about the goal, doesn't accurately describe the steps. It's almost as if it's disregarding the code altogether and merely describing another implementation of "resizeImage".
    function hi() {
      console.log("Hello World!");
    }
    hi();
ChatGPT was unable to deobfuscate. Here's the answer: https://i.imgur.com/20XhPw6.png
I’ve been toying around with ChatGPT for a few weeks now and I encountered a few situations in which ChatGPT was like 90% accurate at best. Things like suggesting snippets of configuration files or plugin research. It’s good to get an idea and get started somewhere, but I certainly cannot trust it blindly.
This is kind of what makes it good for generating code, because everything it generates can be pretty quickly verified and validated by another machine (interpreter/compiler).
Makes it not so great for writing essays on books you didn't read, and especially for doing math you don't understand... because it can't do math AT ALL.
Let's hypothetically assume we have some sort of AGI and we can ask it to write programs and text and nothing else.
Is there anyone on this planet who would think that they don't need to look at the generated code? I mean imagine a manager simply feeding in tickets and getting a finished application out without ever knowing how it was produced.
The application is business critical and any kind of mistake could ruin his business which puts the manager at complete mercy of the AI.
Now you might say that this happens with humans as well but when humans cause problems we let other humans review and test their code.
AI causes problems? Let's add more humans. Wait a minute...
It can be verified in the sense that it builds, but that doesn't mean that it actually does what you asked it to do, or that it does it on all valid inputs. The worst bugs to track down are silent logic bugs.
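To make the "builds but is silently wrong" point concrete, here is a minimal sketch. Both function names are hypothetical, and the "generated" version is invented for illustration, not actual ChatGPT output. It parses and runs fine and agrees with the reference on typical inputs, yet is wrong on an edge case.

```javascript
// Hypothetical "generated" helper: it parses and runs, but the rule is incomplete.
function isLeapYearGenerated(year) {
  return year % 4 === 0; // misses the century exceptions
}

// Reference implementation of the Gregorian leap-year rule.
function isLeapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Typical input: both versions agree.
console.log(isLeapYearGenerated(2024), isLeapYear(2024));

// Edge case: the versions disagree, and nothing crashes or warns.
console.log(isLeapYearGenerated(1900), isLeapYear(1900));
```

An interpreter accepts both versions happily; only a test that happens to include a century year would notice the difference.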
---
This code uses JavaScript's `eval` function to obfuscate the code by looping over an array of strings and passing them as arguments to `eval` to create a variable. It also uses an anonymous function to obfuscate the code. The code is deobfuscated by replacing the `eval` function and the anonymous function with their respective strings.
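For readers who haven't seen the pattern that explanation is talking about, here is a rough sketch of string-array-plus-`eval` obfuscation. The `parts` array and its contents are made up for illustration and are not the code from the screenshot.

```javascript
// Illustrative only: the real statement is split into string fragments.
const parts = ['var greeting', ' = ', '"Hello World!"', ';', ' greeting'];

// Obfuscated form: the statement only exists once the fragments
// are joined and handed to eval.
const viaEval = eval(parts.join(''));

// "Deobfuscation" here is just joining the fragments without executing them.
const recovered = parts.join('');

console.log(viaEval);
console.log(recovered);
```

Real obfuscators layer far more on top of this (encoded strings, shuffled arrays, wrapper functions), so this shows only the basic idea.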
Clearly if you didn't know how to write the other 29 lines of code there's no way you are going to be able to debug the regex.
In this case, 80% of the answer for 20% of the effort?
It's very similar to the deobfuscated version, but ChatGPT wrote the code in the first place.
It's just like any other AI system: it returns a best-effort answer with some confidence level, and that doesn't map well to binary outcomes.
So yes, it can be accurate. But there are scenarios where it must be strictly, binary correct, and it's not great at that bit.
I’m still excited to use it, but you have to know enough about coding to ensure correctness. It’s nowhere near possible for a non-coder to build a complicated app with it (so far).
I had it write a handful of scripts for me yesterday. It got about 90% of it right on the first pass and 99% of it right on the second pass. You still need to have some understanding of what you're doing so you can see when things are wrong but man if it doesn't save you a lot of time.
alex literally asked chatgpt to come up with a math problem and its solution, both from whole cloth. and you ask so what.
well, everyone should ask "so what?" to alex.
anything else is just naive, wishful thinking and a waste of time. you will have to deobfuscate the code manually anyway
but at least you got on HN
EDIT:
OMG...
the obfuscated code was generated by chatgpt
AND DOESNT EVEN WORK
int main(int b,char**i){long long n=B,a=I^n,r=(a/b&a)>>4,y=atoi(*++i),_=(((a^n/b)*(y>>T)|y>>S)&r)|(a^r);printf("%.8s\n",(char*)&_);}
If you want to get the same (or similar) results:
curl https://api.openai.com/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "code-davinci-002",
"prompt": "Deobfuscate the following C code:\n\nint main(int b,char**i){long long n=B,a=I^n,r=(a/b&a)>>4,y=atoi(*++i),_=(((a^n/b)*(y>>T)|y>>S)&r)|(a^r);printf(\"%.8s\\n\",(char*)&_);}",
"temperature": 0,
"max_tokens": 256,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 0
}'
That one loops on the response:
The code is obfuscated with the following rules:
* `B` is the value of `b`
* `I` is the value of `i`
* `T` is the value of `y`
* `S` is the value of `_`
Changing to "model": "text-davinci-003" gives:
    int main(int b, char** i) {
      long long n = b, a = *i[0] ^ n, r = (a / b & a) >> 4, y = atoi(*++i), _ = (((a ^ n / b) * (y >> 8) | y >> 7) & r) | (a ^ r);
      printf("%.8s\n", (char*)&_);
    }
Which... isn't exactly helpful.
Asking it to explain it instead:
This code is a C program that prints out a string. The program takes two parameters, b and i, and assigns them to variables n and a. Then the program performs various calculations on those variables and the parameters, such as shifting and bitwise operations. Finally, the program prints out the result of the calculations, which is stored in the variable _, as an 8-character string.
It would be interesting to test if it can solve "memory hard" problems, like repeated obfuscation or hashing, that would require large internal state. Then you could map how large the equivalent memory would be.
- "the result doesn't even work...": yeah, but even as a rubber duck it's amazing. what the hell do you expect, got your payment too?
- "wow, amazing...": not really. best case it's a Google without (direct) advertising that found the original code or very similar parts.
tried with my own obfuscated code, not from the net... could not get anything out of it
In my head, this is like asking someone to translate "Hello" into French and then asking them to translate "Bonjour" back to English. It proves nothing about capabilities or usefulness.
I have asked it to generate some macOS-compatible VBA code. Oh dear, that "macro" has never worked so far.
I've seen a project for battlefield 3 tho already have the feeling it's a team effort at minimum?
Note: I broke the disassembly intentionally because when I presented the original disassembly it immediately outputted the/a C program to factorize integers.
I can't even imagine that would be a particularly hard thing to do, especially if it isn't correct even before actively attacking ChatGPT! Fooling it even harder won't be terribly difficult. This is advantage attacker overall.
I imagine it would be as easy as using some cognitively loaded, but wrong, terms as variable names instead of short letters and numbers. Ask ChatGPT "please unobfuscate this network code" and get back a substring search algorithm because the network code was written with a dozen variants on "haystack" and "needle" for variable names, for instance.
ChatGPT being actively wrong would be a step back for such deobfuscators then, not a positive at all.
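A toy sketch of the attack described above, with everything in it hypothetical: the identifiers all say "substring search", but the body is a simple additive checksum. Anything that leans on names rather than behavior would be led astray.

```javascript
// Names suggest a substring search; the body actually sums character codes
// modulo 65521 (a checksum-style accumulation). Purely illustrative.
function findNeedle(haystack, needle) {
  let matchIndex = needle; // really the running checksum seed
  for (let offset = 0; offset < haystack.length; offset++) {
    matchIndex = (matchIndex + haystack.charCodeAt(offset)) % 65521;
  }
  return matchIndex; // not an index at all
}

// 97 + 98 + 99 = 294, nothing to do with searching.
console.log(findNeedle("abc", 0));
```

Whether a given model actually falls for this is an empirical question; the sketch only shows how cheap the misleading names are to produce.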
Such testing won't be able to prove that the two are equivalent (unless it's exhaustive) but with decent coverage of the original you can get some good confidence. The goal of deobfuscation is usually understanding, so I'm not sure you need strong guarantees of perfect semantic equivalence with no human intervention/judgment.
And of course, existing deobfuscators have bugs and aren't guaranteed to preserve semantics either.
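The kind of testing meant here can be sketched as a small differential test: run the obfuscated original and the candidate rewrite on the same generated inputs and report the first disagreement. All names below (`makeRng`, `findMismatch`, the sample functions) are assumptions made up for this sketch, and passing it gives confidence, not proof.

```javascript
// Small seeded generator (linear congruential) so runs are reproducible.
function makeRng(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Returns the first input where the two functions disagree, or null if none was found.
function findMismatch(original, candidate, genInput, trials, seed) {
  const rng = makeRng(seed);
  for (let i = 0; i < trials; i++) {
    const input = genInput(rng);
    const expected = original(input);
    const actual = candidate(input);
    if (expected !== actual) return { input, expected, actual };
  }
  return null;
}

// Hypothetical pair: an "obfuscated" triple-the-input and two guesses at its meaning.
const obfuscated = (n) => ((n << 1) + n) ^ 0;
const goodGuess = (n) => n * 3;
const badGuess = (n) => n * 2;

// Integers in [-1000, 1000].
const smallInt = (rng) => Math.floor(rng() * 2001) - 1000;

console.log(findMismatch(obfuscated, goodGuess, smallInt, 1000, 42));
console.log(findMismatch(obfuscated, badGuess, smallInt, 1000, 42));
```

The good guess survives every trial and the bad one is caught almost immediately. Note the input generator matters: outside the 32-bit integer range the two "equivalent" versions above would diverge, which is exactly the coverage caveat.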
so, ROUGHLY, what is it trying to do?
Just be happy with plausible-sounding answers, people. Sheesh!
Regardless, I find it silly to focus on the small flaws when we're witnessing a foundational shift in what kind of problems we can solve.
The key is to verify... and that's true for AI and people too, though sadly that's not something people are used to doing.
However, verifying results is and will always be important. Iterating on the prompts and alternative paths is also important.
But even as it is right now, it is very useful to a lot of people. Remember, a lot of people accepted self-driving cars that can drive themselves off roads and crash into trailers, even paid extra for it, while this is just text generation for now.
I see that it is going to transform everything. No technical revolution has ever been more readily apparent. We now have an interface to talk to computers that can translate unstructured raw language into other formats. This is the real innovation that will unlock the power of more specialized, more advanced models in the near future.
It's not merely UI changes.
Or like crypto was about to revolutionize the world in 2010?
Or maybe like we would have fully autonomous cars "in two years" in 2012?
I might be a pessimistic grandpa, but that's no worse than being a blind technophile.
I agree, we’re not far from that, or we’re there now.
And it had some funny mistakes in there - something called "Reload service XYZ" and it was actually a hard restart of the service, rather silly file locations and such, sure.
But at the same time, it saved us an hour or two of boilerplate setup and even dug up a somewhat smart way to validate the configuration for this very specific service. This allowed us to jump more into understanding the service, tuning the config and setting up good tests for the setup instead of the same boring 20 resources in a config management.
I guess I could also ask if we could have some better form of service or config management which eliminates this boilerplate... but ChatGPT made our current day-to-day work a little easier there.
Especially given that the "obfuscated code" is not syntactically valid. Even if you repair the syntax, it contains a number of errors, and eventually descends into gibberish (although some of what it does is not bad!). There is nothing to deobfuscate here.
If I ask it for the dimensions of a product and it gives the wrong figures, it should just tell me it doesn't know instead of inventing something.
That's the problem. It doesn't tell you when something is wrong so you can never trust if it's right unless you happen to know the field. That makes it far less useful.
and then it generated something that looks like valid source based on that garbage.
is the source it spat out runnable? then it is not the same program as the input, and (any way you spin it) nothing has been deobfuscated. it just dreamed about the prompt a bit and then showed you its dream diary notes. wake up boy, this is statistics, not a magic swiss army knife API.
w.r.t codebases I may look at some of the free models (as this gets around the cost problem) and try to feed it prompts as a block of code plus meaningful references to same under the token limit.
[0] https://github.com/victor-li/pwnable.kr-write-ups/blob/maste...
I meant what I said. I expect ChatGPT would happily output a substring search algorithm for the accept loop of an HTTP server if you just put enough "haystack" and "needle" words in the obfuscated code. How are you supposed to "refine" that into the truth?
To the extent that there is an answer, the answer is, completely ignore the ChatGPT output and use existing tools. Which is to say, ChatGPT would be worse than useless at that point.
I'm not saying ChatGPT will be slightly off, and maybe the obfuscator can kick it to be another 5 or 10% wrong. I'm saying, it is likely trivial to update the obfuscator to make ChatGPT utterly wrong, in every detail, up to and including the entire fundamental nature of the code.
If you're saying that obfuscators can eventually adapt, then sure. So can deobfuscators. This particular problem is kind of inherently an arms race.