Neat Rust Tricks: Passing Closures to C (blog.seantheprogrammer.com)
Maybe I just overcomplicate things ;-)
Fun fact: you can't safely do this with ctypes. Since it is called as pure Python, it cannot do watertight Python exception handling in a callback context (because even if you have a try/except block, an exception can always happen right before or after it), and ctypes provides no usable internal way of doing it - it just eats exceptions inside callbacks. This is what motivated me to rewrite Ceph's librbd bindings from ctypes to Cython.
The "neat" factor comes from how little type wrangling and unsafe code is needed.
(Also hi, go contribute to Nixpkgs again!)
Essentially it's taking advantage of the fact that closures are static methods with "implicit" data pointers. It should be fairly obvious that this is a massive violation of safety and undefined behavior, and most likely to break when debugging symbols etc. are inserted.
The safest way to do this until Rust has figured out a stable-enough-ABI for closure passing would be a thread-local trampoline, I guess. Not very nice..
[0] https://github.com/psychonautwiki/rust-ul/blob/master/src/he...
For callbacks the overhead likely isn't significant.
Example:
http://rosettacode.org/wiki/Window_creation#Win32.2FWin64
In this program, a translation of Microsoft's "Your First Windows Program" from MSDN, defun is used to define a WindowsProc callback. defun generates a lambda under the hood, which carries a lexical scope.
The lambda is passed directly to Win32 as a callback, which is nicely called for repainting the window. (Or at least, things appear that way to the programmer.)
Setting this up requires a few steps. We need a target function, of course, which can be any callable object.
Then there is this incantation:
(deffi-cb wndproc-fn LRESULT (HWND UINT LPARAM WPARAM))
The deffi-cb operator takes a name and some type specifications: return type and parameters. The name is defined as a function; so here we get a function called wndproc-fn. This function is a converter. If we pass it a Lisp function, it gives back a FFI closure object.
Then in the program, we instantiate this closure object, and stick it into the WNDPROC structure as required by the Windows API. Here we use the above wndproc-fn converter to obtain WindowProc in the shape of a FFI closure:
(let* ((hInstance (GetModuleHandle nil))
(wc (new WNDCLASS
lpfnWndProc [wndproc-fn WindowProc]
...
The lpfnWndProc member of the WNDCLASS FFI structure is defined as having the FFI type closure; that will correspond to a function pointer on the C side. The rest is just Windows:
(RegisterClass wc)
register the class, and then CreateWindow with that class by name and so on.
http://www.kylheku.com/cgit/txr/tree/tests/017/qsort.tl
It's done in two ways, as UTF-8 char * strings and as wchar_t * strings.
What's used as the callback is the function cmp-str which is in TXR Lisp's standard library. A lambda expression could be used instead.
Also tested is the perpetration of a non-local control transfer out of the callback, instead of the normal return. This properly cleans up the temporary memory allocated for the string conversions.
Author is hilarious. Who is familiar with that but not c?
This is just like the node.js craze a few years ago - people will rant on trying to justify why you should use rust and write the "rust" way before realising that what they already had worked as intended.
A true replacement for C (when one is finally developed) will remove all of these doubts and back-shadowing behaviour almost instantly (kind of like the react way of ux did)
EDIT: typo
As for C staying around, unfortunately yes, until we get rid of POSIX based OSes, C will be around.
After all we need to keep those <UNIX clone OS> Security conferences alive. /s
Side point: ftw(3) is a much more interesting Unix API to call from an FFI layer than qsort(3). I spent about a year pestering people from Sun to implement fts_open(3) and friends, because it presents a saner API for FFIs with the same functionality.
It seems Solaris has been adding many BSD and, especially, Linux compatibility APIs lately. It seems too little, too late; or perhaps the initiative is part of their effort to EoL Solaris, providing an upgrade path to Linux.
There's really no reason to pass a rust closure to `qsort` instead of sorting in Rust. That said, if there's demand for real world use cases that require passing Rust closures to C APIs that take only a function pointer and not a data pointer, I'll be happy to write a follow up.
That's still true even if the API takes a separate context pointer that is given to your function as an argument.
There is still a function pointer there, and what you'd like to use as a function pointer is a function in your local language, and that's an object with an environment. Even if some instances of it have no environment, the FFI mechanism might want to tack on its own. For instance, the FFI needs to be able to route the callback to the appropriate function. Whatever function pointer FFI gives to the C function, when the C library calls that function, FFI has to dispatch it back to the original high level function. That requires context. Now that context could be shoehorned into that context parameter, but it could be inconvenient to do so; that parameter belongs to the program and to the conversation that program is having with the C API.
(qsort is really only for C. Other languages can potentially inline the comparison function, so using FFI for that is kind of insane.)
Problem solved.
This is the TXR Lisp interactive listener of TXR 228.
Quit with :quit or Ctrl-D on empty line. Ctrl-X ? for cheatsheet.
1> (with-dyn-lib nil
(deffi qsort "qsort" void ((ptr (array wstr)) size-t size-t closure))
(deffi-cb qsort-cb int ((ptr wstr-d) (ptr wstr-d))))
#:lib-0005
2> (let ((vec #("the" "quick" "brown" "fox"
"jumped" "over" "the" "lazy" "dogs")))
(prinl vec)
(qsort vec (length vec) (sizeof wstr)
[qsort-cb (lambda (a b) (cmp-str a b))])
(prinl vec))
#("the" "quick" "brown" "fox" "jumped" "over" "the" "lazy" "dogs")
#("brown" "dogs" "fox" "jumped" "lazy" "over" "quick" "the" "the")
#("brown" "dogs" "fox" "jumped" "lazy" "over" "quick" "the" "the")
The lambda is pointless; we could create the FFI closure directly from cmp-str with [qsort-cb cmp-str]. It shows more clearly that we can use any closure.
- is inherently slow, because CPUs have separate data and instruction caches;
- is extra slow in practice because you need a separate allocation for executable memory (unless your stacks and heap are RWX, which is a terrible idea);
- is not portable, requiring architecture- and OS-specific code; and
- is not supported at all in many environments (of varying levels of braindeadness).
For a statically compiled language like Rust, it makes much more sense to use the context pointer.
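A minimal sketch of the context-pointer approach, for readers who want to see its shape. The "C API" here, for_each_int, is a hypothetical stand-in written in Rust so the example is self-contained; a real C library would be declared in an extern block instead. The names trampoline and for_each are also just illustrative.

```rust
use std::os::raw::{c_int, c_void};

/// Hypothetical stand-in for a C function that takes a callback plus a
/// context pointer and hands that pointer back on every call.
extern "C" fn for_each_int(
    data: *const c_int,
    len: usize,
    cb: extern "C" fn(*mut c_void, c_int),
    ctx: *mut c_void,
) {
    for i in 0..len {
        // SAFETY: the caller promises `data` points to `len` readable ints.
        let v = unsafe { *data.add(i) };
        cb(ctx, v);
    }
}

/// One copy of this function is compiled per closure type `F`.
/// It recovers the closure from the context pointer and calls it.
extern "C" fn trampoline<F: FnMut(c_int)>(ctx: *mut c_void, value: c_int) {
    // SAFETY: `ctx` was created from `&mut F` in `for_each` and is still alive.
    let f = unsafe { &mut *(ctx as *mut F) };
    f(value);
}

/// Safe wrapper: accepts any FnMut closure, including one borrowing locals.
pub fn for_each<F: FnMut(c_int)>(items: &[c_int], mut f: F) {
    for_each_int(
        items.as_ptr(),
        items.len(),
        trampoline::<F>,
        &mut f as *mut F as *mut c_void,
    );
}

fn main() {
    let mut sum = 0;
    for_each(&[1, 2, 3, 4], |v| sum += v);
    println!("sum = {sum}");
}
```

No executable memory is generated at run time: the function pointer is an ordinary compiled function and the closure's environment travels through the context argument. A panic inside the closure would still need handling before it reaches the C caller; this sketch ignores that.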
And big honkin' manual.
Roughly speaking:
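use std::cell::RefCell;
use std::os::raw::{c_int, c_void};

type Callback = Box<dyn FnMut(i32, i32) -> i32>;

thread_local! {
    // Holds the closure for the qsort call currently running on this thread.
    static CBQ: RefCell<Option<Callback>> = RefCell::new(None);
}

unsafe extern "C" {
    // libc's qsort; the comparator receives pointers to two elements.
    fn qsort(base: *mut c_void, nmemb: usize, size: usize,
             compar: extern "C" fn(*const c_void, *const c_void) -> c_int);
}

pub fn rust_qsort(array: &mut [i32], callback: impl FnMut(i32, i32) -> i32 + 'static) {
    let boxed: Callback = Box::new(callback);
    let prev = CBQ.with(|c| c.replace(Some(boxed)));
    assert!(prev.is_none());
    unsafe {
        qsort(array.as_mut_ptr().cast(), array.len(),
              std::mem::size_of::<i32>(), rust_qsort_callback);
    }
    CBQ.with(|c| c.take()).unwrap();
}

extern "C" fn rust_qsort_callback(a: *const c_void, b: *const c_void) -> c_int {
    // Take the closure out while it runs, so a nested rust_qsort finds CBQ empty.
    let mut callback = CBQ.with(|c| c.take()).unwrap();
    let (a, b) = unsafe { (*a.cast::<i32>(), *b.cast::<i32>()) };
    let result = callback(a, b);
    // Put it back for the next comparison.
    CBQ.with(|c| *c.borrow_mut() = Some(callback));
    result
}

fn main() {
    let mut a = vec![4, 5, 6, 3, 2];
    rust_qsort(&mut a, |a, b| {
        if a < b {
            -1
        } else if a > b {
            1
        } else {
            0
        }
    });
    println!("{:?}", a);
}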
ought to work. (There's some fun with generics and panics, which is some fun to solve, but nothing which breaks the premise above).
wrapped_qsort(/* array,callback */)
{
    auto tmp = CBQ;
    CBQ = wrap(callback);
    qsort(array.ptr,array.len,cbq_callback);
    CBQ = tmp; /* pop old value from stack */
}
It won't fail recursively - while the second call is happening, the first will be stored on the stack (see the take and replace in the callback shim function)
It shows no such thing.
I generally work on relatively small ~1MLOC C++ codebases. There are codebases out there measured in the hundreds of MLOC. These are not the comparatively tiny javascript codebases you find React used in - where additional milliseconds of download / parse / evaluation time has a measurable effect on your user retention statistics. There is no "near instant" at 100M+ LOC scales. There is only incremental rewrites, and incremental rewrites means making your new code talk to your old code, and to other people's existing code.
This means interop with existing C ABIs. There is no such thing as a C "replacement" that can't talk to an existing C ABI, or expose an existing C ABI.
Of course, a C ABI doesn't mean C. It's more frequently C++ in my ecosystem, for example. But it could just as easily be Rust, or any number of other languages capable of exposing a C ABI.
I agree with your argument, but I think in practice trying to "port" something with hundreds of MLOC is a losing battle (especially away from C). By the time you finished porting to rust (or your other language of the week), rust will likely have come and gone and will have been replaced by something either better or "better".
IMO people should spend less time trying to re-invent the wheel in rust and more time either improving C or the tooling / static analysis for C. It would avoid _so many_ of these issues.
And I'm all for more C tooling. Static analysis, fuzzing, sanitizers, valgrind, clang thread safety annotations, etc. are all wonderful tools I lean on heavily. But these are opt in, patchwork, platform specific, rife with false positives, false negatives, inconsistent, slow, painful to configure and use... I've wanted far more out of my C and C++ tooling than it's been able to give me for many years now. I'll frequently try out new attributes and annotations, only to curse when they fail to handle really trivial edge cases.
Meanwhile, Rust? It already catches things I didn't even realize I wanted to catch. Static checks opt-out into fast dynamic checks opt-out into heisenbugs in auditable unsafe blocks. The defaults are great.
I doubt C or C++'s tooling will reach the state of Rust, as frozen today, within the next decade. Smart people have tried long and hard to improve things, with quite middling results, convincing me it's a hard problem. I'm a bit more optimistic that it might catch up within the next century, but if I'm not long dead by then, I'll almost certainly be long retired. If it eventually does catch up, I suspect it'll have taken more than a few notes from Rust's approach.
I share your wariness of re-inventing the wheel, but the C & C++ static analysis ecosystem has left enough to be desired that I think it's warranted in this case. It's to rust's credit that they aren't re-inventing everything, and e.g. leverage LLVM for codegen, optimizations, debug info generation, etc.
That Solaris, iOS and, in the future, Android pursue hardware memory tagging as a workaround for memory corruption exploits is proof of how bad the security situation is.
Technically this can also be done via static code trampolines that are mmap'd as well [1]. That approach has been used on iOS in the past to turn blocks into raw function pointers.
If you have a platform that allows W+X on code (yikes!), you can do [2] as well.
[1] https://github.com/plausiblelabs/plblockimp/blob/master/Sour...
[2] https://www.mikeash.com/pyblog/friday-qa-2010-02-12-trampoli...
Or if the OS doesn't support W+X allocation at all, then you can have a bunch of tightly packed pregenerated trampolines in the binary.
clo_code:
4C8B1501100000 mov r10, [rel clo_code+0x1008]
FF25F30F0000 jmp [rel clo_code+0x1000]
0F1F00 nop3
# one page away...
struct clo_slot {
void (*func)(void* _R10,...);
void* data;
};
Edit: to use r10 rather than rotating all the argument registers.
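For comparison, a rough Rust analogue of the pregenerated-trampoline idea, with no machine code involved. It is a sketch under assumptions, not the scheme above: a fixed pool of ordinary compiled functions (one per const-generic index), each reading its closure from a matching slot in a thread-local table. The names SLOTS, TABLE, tramp, bind and release are all made up for illustration, and the returned pointer is only meaningful on the thread that bound it.

```rust
use std::cell::RefCell;
use std::os::raw::c_int;
use std::rc::Rc;

type Callback = Rc<dyn Fn(c_int, c_int) -> c_int>;
type RawFn = extern "C" fn(c_int, c_int) -> c_int;

/// Size of the pool; fixed at compile time, like trampolines baked into a binary.
const SLOTS: usize = 4;

thread_local! {
    // One data slot per pregenerated trampoline.
    static TABLE: RefCell<[Option<Callback>; SLOTS]> = RefCell::new(Default::default());
}

/// Trampoline number N: a plain function with no context argument.
/// It looks up "its" closure by index and forwards the call.
extern "C" fn tramp<const N: usize>(a: c_int, b: c_int) -> c_int {
    // Clone the Rc out so the table is not borrowed while the closure runs.
    let f = TABLE.with(|t| t.borrow()[N].clone()).expect("slot is empty");
    f(a, b)
}

/// The pregenerated pool of entry points.
static TRAMPOLINES: [RawFn; SLOTS] = [tramp::<0>, tramp::<1>, tramp::<2>, tramp::<3>];

/// Bind a closure to a free slot; returns the slot index and a bare function pointer.
pub fn bind(f: impl Fn(c_int, c_int) -> c_int + 'static) -> Option<(usize, RawFn)> {
    TABLE.with(|t| {
        let mut t = t.borrow_mut();
        let i = t.iter().position(|s| s.is_none())?;
        let cb: Callback = Rc::new(f);
        t[i] = Some(cb);
        Some((i, TRAMPOLINES[i]))
    })
}

/// Free a slot so its trampoline can be reused.
pub fn release(slot: usize) {
    TABLE.with(|t| t.borrow_mut()[slot] = None);
}

fn main() {
    let offset: c_int = 10;
    let (slot, fp) = bind(move |a, b| a + b + offset).expect("no free trampoline");
    // `fp` has no context parameter, so it fits APIs that accept only a function pointer.
    println!("{}", fp(1, 2));
    release(slot);
}
```

The trade-off is the one the pool implies: the number of simultaneously live closures is capped at SLOTS, and bind returns None once they are all taken.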