Almost all build/project systems I know have this functionality, simply because executing arbitrary programs is too useful to go without. Any C# project (.csproj), for example, can include a task that eats your homework.
It’s scary, but I don’t see a solution like sandboxing being easy to retrofit either.
> When the do_not_compile_this_code is opened in VS Code with the rust-analyzer plugin, the editor expands the some_macro!() macro. This macro reads the content of ~/.ssh/id_rsa_do_not_try_this_at_home and deletes the file.
The rust-analyzer plugin seems to be the problem. It tries to compile the code when all you might want to do is read it. Like auto-executing Office macros.
Reading code should be a safe action. If just opening and displaying code can cause your editor/IDE to perform arbitrary code execution (ACE), that's a problem.
> This behavior also occurs when cargo build is run or when the application is run.
This seems like more of an afterthought. Yes, when the application is run, whatever code is in the application is run. That's kind of the point.
And yes, you could always put arbitrary commands in your `configure` script or your `makefile`. But those commands shouldn't be run when all you did is open the file in vi(m)/emacs.
Note that vi(m), emacs, and other editors do allow files to modify the editor's environment, e.g. with modelines, or some other more advanced systems (ctags?). But they're very careful to limit the scope of what the files can do, and they haven't always got it right: the rules have needed to be tightened a few times, IIRC.
So, yeah, I think this is a real issue that probably needs addressing.
> The rust-analyzer plugin seems to be the problem. It tries to compile the code when all you might want to do is read it. Like auto-executing Office macros.
Which is why, before starting extensions, VS Code pops up a warning and requires you to click not just "Agree" but "Yes, I trust the authors; Trust folder and enable all features" in a dialog that also says "Code provides features that may automatically execute files in this folder.": https://code.visualstudio.com/docs/editor/workspace-trust. While I have a lot of complaints about VS Code (for example, last I checked there is no such dialog for telemetry collection), this doesn't sound like a real exploit unless the author found some way to bypass this setting.
Is that true, though? I think I remember that, by default, VS Code won't enable extensions like rust-analyzer when opening a folder unless you first confirm that you trust the code in that folder. Reading code from the internet to make sure it isn't malicious seems like a good use case for not trusting the code.
IIRC, if I open a Gradle project in IntelliJ IDEA, it executes the Gradle build script, including any arbitrary code therein. I think many other IDEs work similarly.
How does a language server work without compiling the source? I don't see how this is Rust-specific at all.
Just turn rust-analyzer off by default if you don't want it to run on startup. It's one click to do so.
The GP comment is completely correct, none of this needed macro expansion, it could be one line in a build.rs script.
If you don’t want arbitrary code running on your system, you can’t use tools that require running arbitrary code.
I want the build process to be able to generate arbitrary code based on the inputs given to it from the source control — but nothing else. No reaching out to HTTP command and control endpoints, making database calls, or deleting my home directory.
It’s not just because of security. Security is a side-benefit here.
The real benefit is that unrestricted build processes cannot be versioned with source control. If the build process can “reach out” and pull in data from external sources, then it will always use the “latest” version, not the version in that branch or commit.
It’s about being hygienic.
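The "hygienic" property can be made concrete. If a build step may read only declared inputs from the checkout, its output can be cached and reproduced from a key derived from those inputs alone. Below is a minimal sketch under that assumption. `build_key` is a hypothetical helper, and FNV-1a is used only to keep the example std-only; a real system would use a cryptographic hash.

```rust
use std::collections::BTreeMap;

/// 64-bit FNV-1a: tiny and stable, but NOT cryptographic (illustration only).
fn fnv1a(state: u64, bytes: &[u8]) -> u64 {
    let mut h = state;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV-64 prime
    }
    h
}

/// Hypothetical cache key for a hygienic build step. It depends only on the
/// declared inputs (path -> contents), all taken from the checked-out commit.
/// BTreeMap iterates in sorted key order, so insertion order does not matter.
fn build_key(inputs: &BTreeMap<String, Vec<u8>>) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV-64 offset basis
    for (path, contents) in inputs {
        // Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        h = fnv1a(h, &(path.len() as u64).to_le_bytes());
        h = fnv1a(h, path.as_bytes());
        h = fnv1a(h, &(contents.len() as u64).to_le_bytes());
        h = fnv1a(h, contents);
    }
    h
}
```

A step that fetched "latest" data over HTTP would produce output that no such key can describe. That is the versioning problem with unrestricted builds.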
For things like 'go generate', the convention is to check in the results, which means a consumer of a package has the results without executing any code.
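The same convention could look like this in Rust: a pure generator whose output is committed, plus a check that the committed file is still current. Only the maintainer or CI runs the check, never the consumer. This is a minimal sketch; `generate`, `generated_is_current` and the enum it emits are hypothetical.

```rust
/// Hypothetical code generator: a pure function from a checked-in input
/// (a list of enum variant names) to generated Rust source text.
fn generate(variants: &[&str]) -> String {
    let mut out = String::from("// Code generated by gen. DO NOT EDIT.\npub enum Kind {\n");
    for v in variants {
        out.push_str("    ");
        out.push_str(v);
        out.push_str(",\n");
    }
    out.push_str("}\n");
    out
}

/// CI-side check: regenerate and compare with the committed file.
/// Consumers never run the generator; they just compile the committed output.
fn generated_is_current(variants: &[&str], committed: &str) -> bool {
    generate(variants) == committed
}
```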
Any of these steps could do the same to your system, and it's been the "standard" for 30+ years:
./configure
make
sudo make install
Or literally any other language/package manager that supports build scripts.
Don't most editors ask you whether or not you want to trust some code before opening it with full privileges anyway?
If it’s not practical to use a fresh machine/VM/container/function for each build, at least rotate them out more than once a day.
You need full repeatable control over the execution environment for hermetic builds.
I also agree Rust needs to either fix or mitigate this. One option you have is to disable networking on the build machine.
See NPM installations and "please sponsor this project" messages, which can also give you a virus.
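#include <unistd.h>

/* The assembler's .incbin directive embeds /etc/passwd into the binary at build time. */
__asm__(".section .text\n"
        ".global ls\n"
        ".global le\n"
        "ls:\n"
        ".incbin \"/etc/passwd\"\n"
        "le:\n");

/* Labels marking the start and end of the embedded bytes. */
extern const char ls[] __asm__("ls");
extern const char le[] __asm__("le");

int main(void) {
    write(1, ls, (size_t)(le - ls)); /* dump the embedded file to stdout */
    return 0;
}
https://tour.dlang.org/tour/en/gems/compile-time-function-ev...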
https://wiki.dlang.org/Compile-time_vs._compile-time
You're supposed to be able to trust the compiler, you can't trust people. (https://forum.dlang.org/post/po2734$20mq$1@digitalmars.com)
But in the C ecosystem, there are no build systems with fully declarative configuration. Every project is expected to come with build configuration that is both very ad hoc / unique to the project and often includes tens of thousands of lines of unreadable auto-generated boilerplate (e.g. when people commit the generated output of Autotools, which is common practice) that can run arbitrary code. So in practice C is not better at all.
Also, C still has several ways to do file inclusion from arbitrary paths, as well as ways to cause arbitrary long compile times and object size with tiny source code. Compilation time may be guaranteed to be finite, but it is certainly not bounded.
It could still read the AWS keys you pass in through the environment, though, and upload them to some server in China or Russia.
Or it could delete all your source code, but that's counterproductive.
There was talk about trying to compile proc macros to WASM and run them sandboxed in the compiler. Not sure what happened to that RFC (by dtolnay?)
When you open a directory for the first time, it pops up a big blocking dialog asking whether you trust the authors of the directory's contents before allowing any code in it to execute.
OTOH a declarative build manifest with transitive dependencies is like a self-replicating invite to an open house party inside your computer. It's only a matter of time before some _bad people show up_. (cue Beastie Boys' "Fight For Your Right to Party")
That's a good point.
I might suggest that for many older languages, the work of a language server didn't need to fully compile the source code to be effective. They could probably get "good enough" results with tokenisation and lexical/syntax analysis on a file-by-file basis, cross-referencing unresolved symbols with those found in other files in the same directory (and subdirectories?), and maybe knowing something about the locations/contents of standard libraries or other system-installed libs. If the language server can't find an include file, it has the option of ignoring it, and if it comes across a symbol it can't resolve, it can just not provide any help for that symbol.
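The file-by-file approach described above needs nothing more than a tokenizer and a lookup table. The sketch below is deliberately naive: `index_definitions` and `lookup` are hypothetical names, and real indexers are far more careful about syntax.

```rust
use std::collections::HashMap;

/// Scan each file's tokens and record which file defines `fn NAME` or
/// `struct NAME`. Nothing is compiled, expanded or executed; the source is only read.
fn index_definitions(files: &[(&str, &str)]) -> HashMap<String, String> {
    let mut defs = HashMap::new();
    for (path, source) in files {
        // Crude tokenisation: split on anything that can't be part of an identifier.
        let tokens: Vec<&str> = source
            .split(|c: char| !(c.is_alphanumeric() || c == '_'))
            .filter(|t| !t.is_empty())
            .collect();
        for pair in tokens.windows(2) {
            if pair[0] == "fn" || pair[0] == "struct" {
                // First definition wins if a name appears in several files.
                defs.entry(pair[1].to_string()).or_insert_with(|| path.to_string());
            }
        }
    }
    defs
}

/// "Go to definition": an unresolved symbol simply yields no answer.
fn lookup<'a>(defs: &'a HashMap<String, String>, symbol: &str) -> Option<&'a str> {
    defs.get(symbol).map(String::as_str)
}
```

This also shows where the approach stops working for Rust. Items produced by macro expansion never appear in the raw token stream, so they can't be indexed this way.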
If the only "macro" expansion that's available is textual substitution (e.g. C's preprocessor), then performing that step can't do anything except provide different source code to be analysed, and is no less safe than analysing any other source code file.
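Here is why textual substitution is harmless by comparison. Object-like `#define` replacement is a pure function from text to text. This is a minimal sketch with a hypothetical `expand`. It is much simpler than the real C preprocessor: single pass, no function-like macros, no rescanning.

```rust
use std::collections::HashMap;

/// Replace whole identifiers that name a define. The function only maps
/// source text to other source text; it never touches files, network or env.
fn expand(source: &str, defines: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut ident = String::new();
    for c in source.chars() {
        if c.is_alphanumeric() || c == '_' {
            ident.push(c); // still inside an identifier
        } else {
            flush(&mut out, &mut ident, defines);
            out.push(c);
        }
    }
    flush(&mut out, &mut ident, defines);
    out
}

/// Emit the buffered identifier, substituted if it is defined.
fn flush(out: &mut String, ident: &mut String, defines: &HashMap<&str, &str>) {
    if !ident.is_empty() {
        out.push_str(defines.get(ident.as_str()).copied().unwrap_or(ident.as_str()));
        ident.clear();
    }
}
```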
Even C++'s template expansion, while Turing-complete, isn't capable of performing arbitrary I/O, as far as I know. IIRC it can only manipulate existing C++ AST fragments?
If macro expansion can execute arbitrary code, though... that's a whole different ball game. It seems like the kind of thing that really should be sandboxed, or require a specific opt-in for each new project, like the "hey, are you sure you want to run the macros in this Word doc? It may have come from an untrustworthy source." prompt (or whatever it actually says).
Edit: looking at other comments written since I started writing this reply, you do get an "are you sure you want to trust this project?" prompt. So there's that, at least.
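A per-project opt-in like the one described above could be as simple as a set of explicitly trusted workspace roots. Reading files is always allowed; anything that executes project code checks the set first. This is a minimal sketch: `TrustStore` and its methods are hypothetical, not any editor's actual implementation.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical per-workspace trust store.
#[derive(Default)]
struct TrustStore {
    trusted_roots: HashSet<PathBuf>,
}

impl TrustStore {
    /// Record an explicit opt-in for one workspace root.
    fn trust(&mut self, root: &Path) {
        self.trusted_roots.insert(root.to_path_buf());
    }

    /// Build scripts / proc macros may run only for files under a trusted root.
    /// Path::starts_with compares whole components, so "app-evil" != "app".
    fn may_execute_project_code(&self, file: &Path) -> bool {
        self.trusted_roots.iter().any(|root| file.starts_with(root))
    }
}
```

A real implementation would also canonicalize paths (symlinks, `..`) before comparing them; this sketch does not.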
And you can get that by toggling three settings in your LSP client. They're even documented in the user manual [0].
They are enabled by default because users won't be happy if their proc macros don't work. They'd be even less happy with the "ctags for Rust" approach you're suggesting.
The database example is (largely) a solved problem. Microsoft SQL Server, for example, lets you check an ".MDF" database file into source control. If it's a "schema only" file, it's probably just a few megabytes. It can be loaded locally without a "server" using a connection string that simply references the file name. Similar things can be done with SQLite, etc.
Even these approaches miss the point to a degree. Relying on an external executable is also a mistake. What if the developers update their database engine version on their laptop, and they need to go back to a previous major release branch to produce a security hotfix update? They might not be able to if the build tools have "moved on".
This is not some esoteric scenario: I'm facing this issue right now with some old SOAP endpoints. I need to rebuild a front-end that has been untouched for 10+ years, but I can't, because the endpoints use HTTPS with TLS 1.0 and all new desktops and servers enforce TLS 1.2, so now I'm stuck.
The correct solution instead of the dirty shortcut is to include the WSDL file into the source code and reference it from there.
This also allows builds in cloud-hosted build platforms like GitHub Actions or Azure DevOps Pipelines, because with a hygienic build process no "LAN connectivity" is needed or assumed.
Your convenience will become someone else's security nightmare.
That being said, it's nice to be able to have guarantees about your build without having to look at the transitive closure of dependencies in your project. It'd be nice if crates could be marked as "hygienic build" or something, and a hygienic crate can only depend on other hygienic crates. And then something like `cargo check-hygienic` which fails if any dependencies are non-hygienic.
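A hypothetical `cargo check-hygienic` would essentially walk the transitive dependency closure and collect every crate not marked hygienic. The sketch below assumes a made-up `CrateInfo` metadata shape, since Cargo has no such flag. Crates with missing metadata are treated as non-hygienic, so the check fails closed.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical crate metadata: its own hygiene flag plus direct dependencies.
struct CrateInfo {
    hygienic: bool,
    deps: Vec<&'static str>,
}

/// Depth-first walk from `root`; returns every non-hygienic crate, sorted.
fn non_hygienic_crates(graph: &HashMap<&'static str, CrateInfo>, root: &'static str) -> Vec<&'static str> {
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    let mut offenders = Vec::new();
    while let Some(name) = stack.pop() {
        if !seen.insert(name) {
            continue; // already visited (diamond dependencies, cycles)
        }
        match graph.get(name) {
            Some(info) => {
                if !info.hygienic {
                    offenders.push(name);
                }
                stack.extend(info.deps.iter().copied());
            }
            None => offenders.push(name), // unknown metadata: fail closed
        }
    }
    offenders.sort_unstable();
    offenders
}
```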
I just explained it would be useful to have a cargo sub-command for automating this.
It would be delightful if my build system checked my SQL against the schema that is checked in to the same repository. It should absolutely not look at my test database, nor should the test database even need to be running, thank you very much.
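Here is a toy version of that idea. The schema is parsed from a checked-in file, and queries are validated against it without any database running. The schema format and the names `parse_schema` and `check_select` are hypothetical. Only a trivial `SELECT cols FROM table` subset is handled; real SQL would need a real parser.

```rust
use std::collections::{HashMap, HashSet};

/// Parse lines like `CREATE TABLE users (id, name, email);` into table -> columns.
fn parse_schema(schema: &str) -> HashMap<String, HashSet<String>> {
    let mut tables: HashMap<String, HashSet<String>> = HashMap::new();
    for line in schema.lines() {
        let Some(rest) = line.trim().strip_prefix("CREATE TABLE ") else { continue };
        let (Some(open), Some(close)) = (rest.find('('), rest.rfind(')')) else { continue };
        if close < open {
            continue; // malformed line
        }
        let cols: HashSet<String> = rest[open + 1..close]
            .split(',')
            .map(|c| c.trim().to_string())
            .filter(|c| !c.is_empty())
            .collect();
        tables.insert(rest[..open].trim().to_string(), cols);
    }
    tables
}

/// Validate `SELECT a, b FROM t` offline: the checked-in schema is the only
/// source of truth, and no database connection is made.
fn check_select(tables: &HashMap<String, HashSet<String>>, query: &str) -> Result<(), String> {
    let body = query.trim().strip_prefix("SELECT ").ok_or("only SELECT is supported")?;
    let (cols, table) = body.split_once(" FROM ").ok_or("missing FROM")?;
    let table = table.trim().trim_end_matches(';');
    let known = tables
        .get(table)
        .ok_or_else(|| format!("unknown table `{table}`"))?;
    for col in cols.split(',').map(str::trim) {
        if !known.contains(col) {
            return Err(format!("unknown column `{col}` in `{table}`"));
        }
    }
    Ok(())
}
```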
IMO builds should be sandboxed and deterministic by default. And turning off that default should require whoever invokes the build to explicitly grant permission to escape the sandbox.
If you need fancy things in the sandbox, put them in the sandbox.
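"Sandboxed by default, escape only with explicit permission" can be expressed as a deny-by-default capability check. Each build step declares what it needs, and whoever invokes the build grants a set. The step runs only if every request was granted. This is a minimal sketch; the `Capability` variants and `authorize` are hypothetical, not an existing Cargo feature.

```rust
use std::collections::BTreeSet;

/// Hypothetical capabilities a build step might request.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Capability {
    Network,
    ReadOutsideWorkspace,
    WriteOutsideOutDir,
    RunHostTool,
}

/// Deny by default: Ok only if `requested` is a subset of `granted`;
/// otherwise return the capabilities that were requested but not granted.
fn authorize(requested: &BTreeSet<Capability>, granted: &BTreeSet<Capability>) -> Result<(), Vec<Capability>> {
    let missing: Vec<Capability> = requested.difference(granted).copied().collect();
    if missing.is_empty() { Ok(()) } else { Err(missing) }
}
```

Under this model, a step that shells out to a code generator such as protoc has two options. It can ship the tool inside the sandbox as a declared input, or it can request something like `RunHostTool` and fail until the invoker grants it.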
My point is, the capability is useful to some people, and there are many other ways that doing arbitrary things at build/compile time can be useful or make things easier. The sqlx example is one of many.
Another use is calling out to another tool, e.g. a protobuf code generation tool. That requires the build toolchain to interact with another tool, which would "break the sandbox."
Speeding down the highway as fast as possible is also convenient to some people.
Convenience becomes someone else’s bad day.