The API could even be a more modern pointer+length interface rather than null termination, to sidestep that class of mistakes/exploits (CWE-170).
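To illustrate the CWE-170 class of bug, here is a minimal Python sketch of how a NUL-terminated interface and a pointer+length interface can disagree about the same buffer. Both helper names are hypothetical. A validator that sees the whole buffer approves a `.txt` name, while anything that reads the buffer as a C string stops at the embedded NUL and sees a different name.

```python
def c_string_view(buf: bytes) -> bytes:
    """What a NUL-terminated API sees: everything before the first NUL byte."""
    end = buf.find(b"\0")
    return buf if end == -1 else buf[:end]


def counted_view(buf: bytes, length: int) -> bytes:
    """What a pointer+length API sees: exactly `length` bytes, NULs included."""
    return buf[:length]


name = b"payload.exe\0.txt"

# A length-aware validator checks the full buffer and is satisfied...
print(counted_view(name, len(name)).endswith(b".txt"))  # True
# ...but a consumer that stops at the first NUL sees a different file name.
print(c_string_view(name))  # b'payload.exe'
```

With a pointer+length interface both sides see the same 16 bytes, so the callee can reject the embedded NUL instead of silently truncating.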
https://www.daviddeley.com/autohotkey/parameters/parameters.... is a great read on how fragmented this all seems to be.
> Spaces embedded in strings may cause unexpected behavior; for example, passing _exec the string "hi there" will result in the new process getting two arguments, "hi" and "there".
https://learn.microsoft.com/en-us/cpp/c-runtime-library/exec...
Oh yeah, pass those arguments as a list, then we'll completely ignore that and fuck your shit up. Err, I mean you need to quote them! Even though they are passed as separate arguments.
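A minimal Python sketch of the problem: joining an argv list with plain spaces, which is effectively what `_exec` does, loses the argument boundaries, so the caller has to add quoting itself. `quote_arg` below follows the quoting convention the Microsoft C runtime's parser expects (backslashes are only special when they run up to a double quote). The function names are illustrative, and this quoting protects nothing if the receiver turns out to be `cmd.exe`.

```python
def naive_join(argv: list[str]) -> str:
    # Roughly what happens when arguments are concatenated without quoting.
    return " ".join(argv)


def quote_arg(arg: str) -> str:
    """Quote one argument for a receiver that parses like the MSVC runtime."""
    if arg and not any(c in arg for c in ' \t\n\v"'):
        return arg  # nothing special, pass through unchanged
    out = ['"']
    backslashes = 0
    for c in arg:
        if c == "\\":
            backslashes += 1  # defer: their meaning depends on what follows
        elif c == '"':
            # Double the pending backslashes, then escape the quote itself.
            out.append("\\" * (2 * backslashes + 1) + '"')
            backslashes = 0
        else:
            # Backslashes not followed by a quote are literal.
            out.append("\\" * backslashes + c)
            backslashes = 0
    # Trailing backslashes sit right before our closing quote, so double them.
    out.append("\\" * (2 * backslashes) + '"')
    return "".join(out)


def build_command_line(argv: list[str]) -> str:
    return " ".join(quote_arg(a) for a in argv)


# A whitespace-splitting receiver reads the first line as three arguments.
print(naive_join(["prog", "hi there"]))          # prog hi there
print(build_command_line(["prog", "hi there"]))  # prog "hi there"
```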
I fail to see how this would help. If I understand correctly, the issue is how cmd.exe interprets the args, not how the args get to it.
- Create `CreateProcessArgv`, a version of `CreateProcess` that takes `argv` rather than `lpCommandLine` (like `execv*`)
- Create `GetCommandLineArgv`, an alternative to `GetCommandLine` that returns an `argv`
- Create `ProcessCreatedWithArgv` so a program can prefer either `GetCommandLine` or `GetCommandLineArgv` (for compatibility with those that have their own quoting, such as cmd)
Then child processes can use `GetCommandLineArgv` with no overhead if the parent invoked them with `CreateProcessArgv`; otherwise `CreateProcess` and `GetCommandLine` continue to work with no overhead. A compatibility layer in the kernel would either split `lpCommandLine` or quote `argv` for the mixed `CreateProcess`+`GetCommandLineArgv` and `CreateProcessArgv`+`GetCommandLine` combinations. There would probably also need to be a way to opt out of receiving `lpCmdLine` in `WinMain`.
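A toy Python model of how the proposed functions could fit together. `CreateProcessArgv`, `GetCommandLineArgv` and `ProcessCreatedWithArgv` are the hypothetical names from the proposal, not real Windows APIs, and Python's POSIX-style `shlex.join`/`shlex.split` merely stand in for whatever quote/split rules a real compatibility layer would use (most plausibly the `CommandLineToArgvW` ones). The sketch only shows the shape of the idea: the child gets back exactly what the parent passed when both sides use the same form, and conversion happens only on a mismatch.

```python
import shlex


class HypotheticalProcess:
    """Toy model of the proposed API; none of these functions exist in Windows."""

    def __init__(self, *, command_line=None, argv=None):
        # Models ProcessCreatedWithArgv: which form did the parent supply?
        self.created_with_argv = argv is not None
        self._command_line = command_line
        self._argv = list(argv) if argv is not None else None

    def get_command_line(self) -> str:
        # CreateProcessArgv + GetCommandLine: quote only on this mismatch.
        if self._command_line is None:
            self._command_line = shlex.join(self._argv)
        return self._command_line

    def get_command_line_argv(self) -> list[str]:
        # CreateProcess + GetCommandLineArgv: split only on this mismatch.
        if self._argv is None:
            self._argv = shlex.split(self._command_line)
        return list(self._argv)


def create_process(command_line: str) -> HypotheticalProcess:
    return HypotheticalProcess(command_line=command_line)


def create_process_argv(argv: list[str]) -> HypotheticalProcess:
    return HypotheticalProcess(argv=argv)
```

Caching the converted form means each mismatch is paid for at most once per process, while matched pairs never convert at all.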
Seems not-impossible, but also a bit of a pipe dream...
This "usual escaping mechanism" is a bit of a weasel word. Windows passes a single null-terminated character string to a process, and every application's runtime must parse that into arguments itself.
I think what "usual escaping mechanism" refers to is the algorithm implemented in the Microsoft Visual C runtime, which takes the command line string and produces a char *argv[] for the main function.
There is no telling which programs use that exact algorithm and which don't. Programs built with Microsoft languages probably do; VC and VC++ obviously do.
https://learn.microsoft.com/en-us/windows/win32/api/shellapi...
https://learn.microsoft.com/en-us/cpp/cpp/main-function-comm...
Also, here is another implementation of this algorithm in C#, used in .NET:
https://github.com/dotnet/runtime/blob/main/src/libraries/Sy...
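For reference, a simplified Python sketch of the receiver side of that algorithm, splitting the argument portion of a command line the way the C runtime documentation describes: whitespace separates arguments, double quotes group, and backslashes are only special when they run up to a double quote. It deliberately omits the special handling of `argv[0]` and the newer rule that turns `""` inside a quoted region into a literal quote, so treat it as an approximation, not a drop-in replacement for `CommandLineToArgvW`.

```python
def split_args(command_line: str) -> list[str]:
    """Split the argument part of a command line, C-runtime style (simplified)."""
    args = []
    i, n = 0, len(command_line)
    while True:
        # Skip whitespace between arguments.
        while i < n and command_line[i] in " \t":
            i += 1
        if i >= n:
            return args
        buf = []
        in_quotes = False
        while i < n:
            c = command_line[i]
            if c == "\\":
                # Measure the run of backslashes and look at what follows it.
                j = i
                while j < n and command_line[j] == "\\":
                    j += 1
                count = j - i
                if j < n and command_line[j] == '"':
                    buf.append("\\" * (count // 2))  # each pair -> one backslash
                    if count % 2:
                        buf.append('"')  # odd run: the quote is literal
                        i = j + 1
                    else:
                        i = j  # even run: the quote is a delimiter, handled next
                else:
                    buf.append("\\" * count)  # not before a quote: literal
                    i = j
            elif c == '"':
                in_quotes = not in_quotes  # opens or closes a quoted region
                i += 1
            elif c in " \t" and not in_quotes:
                break  # unquoted whitespace ends this argument
            else:
                buf.append(c)
                i += 1
        args.append("".join(buf))


print(split_args(r'"C:\Program Files\\" "say \"hi\"" plain'))
# ['C:\\Program Files\\', 'say "hi"', 'plain']
```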
On Linux the parsing is done on the caller's side of the interface, so the caller knows what quoting rules to apply – could be Python's rules if they're using Python to construct the argument array, bash's rules if they're using bash, etc.
On Windows, the parsing is done on the receiver's side of the interface, so the caller can't know how it's supposed to be quoted unless they have special knowledge of a specific receiver and its parsing rules.
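A small Python illustration of the Linux side, assuming a POSIX system: the caller does the parsing (here with `shlex.split`, which follows shell-like rules) and the resulting list reaches the child unchanged, with no second parse on the receiving end. The child is simply another Python interpreter that echoes its `sys.argv` as JSON.

```python
import json
import shlex
import subprocess
import sys

# Caller-side parsing: the caller picks the quoting rules (shell-like here).
args = shlex.split("""'hi there' '$HOME' "a;b" """)

# The list goes to the child as-is; the child does no splitting of its own.
child = subprocess.run(
    [sys.executable, "-c", "import json, sys; print(json.dumps(sys.argv[1:]))", *args],
    capture_output=True,
    text=True,
    check=True,
)
received = json.loads(child.stdout)
print(received)  # ['hi there', '$HOME', 'a;b']
```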
I could see how allowing the user to whitelist individual scripts would make sense, but as far as I can tell that's not how it works? A blanket policy of "all scripts are forbidden unless wrapped with fragile and shady-looking hacks" doesn't seem particularly useful.
But I guess here we have some of the underlying problem.
If something just executes whatever you throw at it, people complain.
If something doesn't just execute whatever you throw at it, people complain as well. ;)
Sadly some software I use is so old that the only way to call PowerShell scripts is via a batch script...
> bad, but not the worst
For example imagine that I have a shell script to write an entry to a guestbook. Maybe I call it from my webapp like this:
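```python
# webapp.py
import subprocess
# untrusted_msg is whatever text the visitor submitted
subprocess.run(['guestbook', untrusted_msg])
```
On Linux this is perfectly fine. I can then write my guestbook script like this:
```bash
#!/bin/bash
echo "$1" >> guestbook.txt
```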
As far as I am aware there are no security issues here. The user can pass whatever they want as the message, and other than some mess in the `guestbook.txt` file they can't cause any harm.
However, this doesn't work well on Windows, because in order to escape the arguments you need to know how the `guestbook` program parses its arguments. Right now basically all languages assume that the program being called parses its command line the way `CommandLineToArgvW` does. But if `guestbook` is a batch file, `cmd.exe` parses the command line with different rules, and remote code execution can occur before the batch script even starts executing.
Basically in order to properly escape the arguments the caller needs to know what is being called. The current APIs don't have a way to know this so they can't do it right in all cases.
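To make the batch-file case concrete, here is a deliberately simplified Python model of how `cmd.exe` tracks quotes: every `"` toggles the quoted state, backslashes mean nothing, and `^` escapes the next character outside quotes. The real parser has more rules (`%` expansion among them), which this ignores, and `unquoted_metachars` is a hypothetical helper, not part of any library. Feeding it the C-runtime-style quoting of the message `hi" & calc & "` shows the `&` operators landing outside any quotes.

```python
def unquoted_metachars(command_line: str) -> list[str]:
    """Return cmd.exe operator characters that a simplified cmd parser
    would see outside of double quotes."""
    found = []
    in_quotes = False
    i = 0
    while i < len(command_line):
        c = command_line[i]
        if c == '"':
            in_quotes = not in_quotes  # no \" escape in cmd: every quote toggles
        elif not in_quotes and c == "^":
            i += 1  # caret escapes the next character outside quotes
        elif not in_quotes and c in "&|<>":
            found.append(c)
        i += 1
    return found


# How a C-runtime-style quoter would encode the argument  hi" & calc & "
quoted = r'"hi\" & calc & \""'
print(unquoted_metachars("guestbook.bat " + quoted))  # ['&', '&']
print(unquoted_metachars('guestbook.bat "hi & bye"'))  # []
```

The quoting is correct for a C-runtime receiver, but the backslash that protects the inner quotes is invisible to this model of cmd, which is exactly the caller-must-know-the-callee problem.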
I was just trying to write a simple batch script that accepted filenames as arguments and was surprised to find that there is no safe way to do so: they always go through cmd's `%VAR%` expansion, so if you have a filename like "foo %PATH% bar.txt" (which is a valid name), the script will receive it with the PATH variable expanded and cannot get at the actual filename.
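A rough Python model of what happens to that filename, assuming command-line rather than batch-file rules: a defined `%NAME%` is replaced even inside double quotes, and an undefined one is left alone. `expand_percent` is a hypothetical helper; cmd's real parser also handles `%%`, substring syntax like `%VAR:~0,5%` and delayed expansion, none of which is modelled here.

```python
import re


def expand_percent(text: str, env: dict[str, str]) -> str:
    """Replace %NAME% with its value, case-insensitively, quotes or not."""
    upper_env = {k.upper(): v for k, v in env.items()}

    def substitute(match: re.Match) -> str:
        # Undefined variables stay as literal text in this model.
        return upper_env.get(match.group(1).upper(), match.group(0))

    return re.sub(r"%([^%]+)%", substitute, text)


env = {"Path": r"C:\Windows;C:\Windows\System32"}
print(expand_percent('"foo %PATH% bar.txt"', env))
# "foo C:\Windows;C:\Windows\System32 bar.txt"
```

In this model the surrounding double quotes make no difference, which matches the experience above that the script never sees the literal name.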
Also, passing arguments to programs is unsafe on Windows even if you don't go through the shell, because the quoting rules are entirely up to the program being invoked. The CreateProcess function[0] accepts a string, not an array, so you have to quote the arguments – but you can't do this quoting correctly unless you know exactly what program you're invoking and what grammar it has chosen for parsing its lpCommandLine string.
The article mentions that "many programming languages guarantee that the command arguments are escaped properly", but there is no universal "escaped properly" on Windows. There is escaped properly for the C runtime's parser[1], or escaped properly for CommandLineToArgv[2] which parses "in a way that is similar to" the C runtime, or escaped properly for .NET which has its own set of rules[3] – but there is no guarantee that any particular program is using any of these ways; any program can use whatever rules it likes!
Raymond Chen has written[4] about this as well.
PowerShell has an interesting workaround[5] of sorts: If you specify "-EncodedCommand" and "-EncodedArguments" it lets you pass base64-encoded strings when you "require complex quotation marks or curly braces".
[0] https://learn.microsoft.com/en-us/windows/win32/api/processt...
[1] https://learn.microsoft.com/en-us/cpp/c-language/parsing-c-c...
[2] https://learn.microsoft.com/en-us/windows/win32/api/shellapi...
[3] https://learn.microsoft.com/en-us/dotnet/api/system.environm...
[4] https://devblogs.microsoft.com/oldnewthing/20100917-00/?p=12...
[5] https://learn.microsoft.com/en-us/powershell/module/microsof...
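A minimal Python sketch of building such an invocation. `-EncodedCommand` expects the Base64 encoding of the script's UTF-16LE bytes, so the encoded argument contains only letters, digits, `+`, `/` and `=`, none of which need quoting under the C runtime's rules. This assumes `powershell.exe` is the target and only constructs the argument list; it does not run anything.

```python
import base64


def encoded_command_args(script: str) -> list[str]:
    """Build a powershell.exe argument list that carries `script` as Base64."""
    # PowerShell decodes -EncodedCommand as Base64 over UTF-16LE text.
    encoded = base64.b64encode(script.encode("utf-16-le")).decode("ascii")
    return ["powershell.exe", "-NoProfile", "-EncodedCommand", encoded]


args = encoded_command_args('Write-Output "hi there & {braces}"')
print(args)
```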
Here's my try in Ruby from a prior life: https://github.com/chef/mixlib-shellout/blob/main/lib/mixlib...
In practice, with the exception of `cmd.exe`, which is an old beast that cannot be redeemed due to backwards compatibility, there is a consistent way to round-trip argv to more or less all programs one encounters in the wild. It's not a guarantee, and I'm sure you could find a program that does something weird, but you could find the same in the POSIX world. In both cases, we can probably agree that a program parsing its arguments in a non-standard way is the one at fault.
Why block execution of PowerShell scripts when batch files, WSH scripts and plain executables can still run? You could try to prevent those other kinds of scripts from even getting onto the machine, I guess, but then why wouldn't you simply do the same for PowerShell scripts?
The AllSigned policy, where it asks you explicitly about trusting new publishers[0], seems like what I'm asking for, except that it apparently requires the certificate to be installed in Trusted Root Certification Authorities! That's way more trust than should be necessary.
The only option that seems to make sense (aside from Unrestricted) is buying a certificate from an existing CA that's already trusted, so that users don't need to trust you with acting as a CA, but that's quite expensive.
[0] https://www.hanselman.com/blog/signing-powershell-scripts
Unless you get a 2nd person on the team (working remotely), and they want to be able to sign scripts as well?
Unless you get some sort of automated CI/CD system?
At the core, on Linux a list of separate arguments is The Format of arguments, so it can be exposed and used without question. On Windows the native format is a single string, which can still be used to achieve the same things, but now the caller must know how the callee expects multiple arguments to be encoded (if it accepts them at all), and stdlibs so far have just assumed one format, while bat files use a different one.
Suppose `/bin/sh` concatenated all arguments together, then split them back apart. That would be a stupid thing to do, but that stupidity would be entirely contained within `/bin/sh`. A bug report for `/bin/sh` could clearly point to the broken component and state that it needs to be fixed. This is possible because the `execve` API provides a list of strings. Any extra (concatenate, split) pairs must exist on one side or another of the border imposed by `execve`.
Here, there's a mismatch between two entirely separate components. The `CreateProcess` API accepts an arbitrary string. The `GetCommandLine` function returns that same arbitrary string. The (concatenate, split) pair must straddle the border between the two processes, with concatenation done on the side that calls `CreateProcess`, and splitting done on the side that calls `GetCommandLine`. A developer for the parent process can shrug and say that it's the fault of the child process for not parsing arguments correctly. A developer for the child process can shrug and say that it's the fault of the parent process for not providing arguments in the expected form.
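A tiny Python illustration of that straddling (concatenate, split) pair, with the two halves written to different conventions. The parent side uses CPython's `subprocess.list2cmdline`, an undocumented helper that implements C-runtime-style quoting; the child side uses the POSIX-style `shlex.split`. Each half is correct by its own rules, yet the backslashes in a Windows path do not survive the trip, and nothing in the interface says which side is wrong.

```python
import shlex
import subprocess

original = ["C:\\Users\\me\\notes.txt", "hi there"]

# Parent side of the border: concatenate using C-runtime-style quoting.
command_line = subprocess.list2cmdline(original)

# Child side of the border: split using POSIX shell-style rules.
received = shlex.split(command_line)

print(command_line)  # C:\Users\me\notes.txt "hi there"
print(received)      # ['C:Usersmenotes.txt', 'hi there']
```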
From the Linux `execve(2)` man page:
> pathname must be either a binary executable, or a script starting with a line of the form: `#!interpreter [optional-arg]`
which is the equivalent of Windows starting CMD.EXE to execute a batch file. The only difference WRT the shell being invoked implicitly is how a script is detected (file name extension vs. first line of content), but that doesn't seem to be relevant when it comes to the shell mis-interpreting its inputs.
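For comparison, a Python sketch of the argument vector Linux builds for a `#!` script, following the `execve(2)` description: the interpreter, the optional argument if present (on Linux the rest of the line, passed as one string; other systems differ), the script's pathname, then the caller's arguments from `argv[1]` onward. `shebang_argv` is a hypothetical helper. The point is that the caller's arguments are carried over as separate list elements and never re-parsed, so any misinterpretation has to come from the interpreter itself.

```python
def shebang_argv(first_line: str, pathname: str, argv: list[str]) -> list[str]:
    """Model the argv an interpreter receives when `pathname` is a #! script."""
    if not first_line.startswith("#!"):
        raise ValueError("not a #! script")
    # Split off the interpreter; on Linux the rest of the line is one argument.
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        raise ValueError("missing interpreter")
    interpreter, optional = parts[0], parts[1:]
    # The original argv[0] is dropped; argv[1:] is appended untouched.
    return [interpreter, *optional, pathname, *argv[1:]]


print(shebang_argv("#!/bin/bash", "./guestbook", ["guestbook", 'hi" & calc & "']))
# ['/bin/bash', './guestbook', 'hi" & calc & "']
```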