LuaJIT bad FFI callback issue
LuaJIT's FFI is super fast. It is even faster than a dynamic library call from C/C++ and 40 times faster than Golang's CGO (reference). But there is one thing that is not allowed:
an FFI call to a C function gets JIT-compiled, and that C function in turn calls a callback that calls into Lua again.
When this happens, the user gets a "bad callback" error.
What is a LuaJIT FFI callback?
A LuaJIT FFI callback is a Lua function that is called from C code through the FFI.
An example to trigger the bad callback issue.
    -- save to file test.lua
    local ffi = require("ffi")

    ffi.cdef [[
    typedef int (*my_fn_t)(int);
    int f2();
    void setup(my_fn_t f, int);
    ]]

    local lib = ffi.load("test")
    function setup(cb, a)
      lib.setup(cb, a)
    end

    function f0()
      -- The FFI call to f2(), which is defined in the C library libtest.so.
      return lib.f2() + 1
    end

    do
      local cb = ffi.cast("my_fn_t",
        -- This is the FFI callback function.
        function(a)
          return a
        end)

      for i=1,100 do
        if i == 80 then
          setup(cb, 10)
        end
        f0()
      end
    end
The C library that test.lua loads:

    // save to file lib.c
    // and compile it to a shared library:
    // gcc -Wall -O -g -o libtest.so -fpic -shared lib.c
    // put libtest.so in the same directory as test.lua
    typedef int (*my_fn_t)(int);

    my_fn_t gf = 0;
    int ga;

    void setup(my_fn_t f, int a)
    {
        gf = f;
        ga = a;
    }

    int f2()
    {
        if (gf == 0) {
            // this is necessary to escape the auto-detection.
            return 3;
        } else {
            return gf(ga) + 1;
        }
    }
Running it then gives the "bad callback" error:

    $ luajit test.lua
    PANIC: unprotected error in call to Lua API (bad callback)
Running it with a trace dump shows that the f0() call is compiled by the JIT compiler:
    ---- TRACE 1 start test.lua:27
    0022 ISNEN 6 0 ; 80
    0023 JMP 7 => 0028
    0028 GGET 7 9 ; "f0"
    0029 CALL 7 1 1
    0000 . FUNCF 2 ; test.lua:15
    0001 . UGET 0 0 ; lib
    0002 . TGETS 0 0 0 ; "f2"
    0000 . . . FUNCC ; ffi.clib.__index
    0003 . CALL 0 2 1
    0000 . . FUNCC ; ffi.meta.__call // FFI call at line: 17 in test.lua is compiled.
    0004 . ADDVN 0 0 0 ; 1
    0005 . RET1 0 2
    0030 FORL 3 => 0022
The call chain is: Trace 1 -> lib.f2() -> Lua callback function.
LuaJIT's bad callback auto detection feature
In some cases, LuaJIT can automatically detect a bad callback and disable JIT compilation for the related FFI call. Here is a slightly modified version of the previous example.
    local ffi = require("ffi")

    ffi.cdef [[
    typedef int (*my_fn_t)(int);
    int f2();
    void setup(my_fn_t f, int);
    ]]

    local lib = ffi.load("test")

    function setup(cb, a)
      lib.setup(cb, a)
    end

    function f0()
      -- The FFI call to f2(), which is defined in the C library libtest.so.
      return lib.f2() + 1
    end

    do
      local cb = ffi.cast("my_fn_t",
        -- This is the FFI callback function.
        function(a)
          return a
        end)

      local cb2 = ffi.cast("my_fn_t",
        -- This is a second FFI callback function.
        function(a)
          return a+1
        end)

      setup(cb, 10)

      for i=1,100 do
        if i == 80 then
          setup(cb2, 10)
        end
        f0()
      end
    end
The C library is changed so that f2() always calls the callback:

    typedef int (*my_fn_t)(int);

    my_fn_t gf = 0;
    int ga;

    void setup(my_fn_t f, int a)
    {
        gf = f;
        ga = a;
    }

    int f2()
    {
        // f2 will always call a Lua callback.
        return gf(ga) + 1;
    }
This example runs without the bad callback error. The dumped trace shows that JIT compilation of the FFI call in f0() is aborted and the call is added to the blacklist. Since no FFI call is JIT-compiled, it is now safe for the C code to call back into Lua.
    ---- TRACE 1 start test1.lua:35
    0030 ISNEN 7 0 ; 80
    0031 JMP 8 => 0036
    0036 GGET 8 9 ; "f0"
    0037 CALL 8 1 1
    0000 . FUNCF 2 ; test1.lua:15
    0001 . UGET 0 0 ; lib
    0002 . TGETS 0 0 0 ; "f2"
    0000 . . . FUNCC ; ffi.clib.__index
    0003 . CALL 0 2 1
    0000 . . FUNCC ; ffi.meta.__call
    ---- TRACE 1 abort test1.lua:17 -- blacklisted
The same effect can be achieved by manually turning off JIT compilation in the first test case:
    do
      local cb = ffi.cast("my_fn_t",
        -- This is the FFI callback function.
        function(a)
          return a
        end)

      jit.off(f0)
      for i=1,100 do
        if i == 80 then
          setup(cb, 10)
        end
        f0()
      end
    end
Why can't the auto detection always catch the LuaJIT FFI callback case?
The auto detection only takes effect during LuaJIT trace compilation. In the first example, no FFI callback has happened by the time the trace is compiled, because we deliberately install the callback in the Lua loop only when i == 80. In the later iterations (i > 80), f2() does call the callback, which the compiled trace does not expect, and the compiled code cannot be changed at that point. The assumption "an FFI call into a C function that calls back into Lua is never JIT-compiled" is broken, and the program fails with the "bad callback" error.
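The timing problem can be illustrated with a small toy model in C. This is only a sketch under simplifying assumptions, not LuaJIT's actual implementation: `simulate`, `lua_cb`, and the flags `in_trace`, `saw_callback`, and `bad_callback` are hypothetical names, `hot_at` stands in for "the iteration at which the loop gets compiled", and the iteration that installs the callback is assumed to run outside the compiled trace.

```c
#include <stddef.h>

typedef int (*my_fn_t)(int);

static my_fn_t gf = NULL;     /* callback slot, as in lib.c */
static int in_trace = 0;      /* toy flag: 1 while "compiled" code is running */
static int saw_callback = 0;  /* toy flag: a callback ran during the current call */
static int bad_callback = 0;  /* toy flag: set where LuaJIT would report "bad callback" */

/* Stand-in for the Lua callback function(a) return a end. */
static int lua_cb(int a) {
    saw_callback = 1;
    if (in_trace) bad_callback = 1;  /* callback reached from compiled code */
    return a;
}

/* Same logic as f2() in lib.c, with ga fixed to 10. */
static int f2(void) {
    return gf ? gf(10) + 1 : 3;
}

/*
 * Runs the loop for n iterations.
 * hot_at:   iteration at which the toy "JIT" decides whether to compile the call
 * setup_at: iteration at which the callback is installed (0 = before the loop)
 * Returns 1 if a "bad callback" is hit, 0 otherwise.
 */
int simulate(int hot_at, int setup_at, int n) {
    int compiled = 0, blacklisted = 0;
    gf = (setup_at == 0) ? lua_cb : NULL;
    bad_callback = 0;
    for (int i = 1; i <= n; i++) {
        if (i == setup_at) gf = lua_cb;
        /* The compile decision only sees what has happened so far. */
        if (i == hot_at && !blacklisted) compiled = 1;
        /* Assumption: the setup iteration leaves the trace and runs interpreted. */
        int on_trace = compiled && i != setup_at;
        saw_callback = 0;
        in_trace = on_trace;
        f2();
        in_trace = 0;
        if (bad_callback) return 1;
        /* An interpreted call that triggered a callback blacklists the call. */
        if (!on_trace && saw_callback) blacklisted = 1;
    }
    return 0;
}
```

In this model, `simulate(56, 80, 100)` mirrors the first example and returns 1, because the call is compiled before any callback has been seen. `simulate(56, 0, 100)` mirrors the second example and returns 0, because the callback is seen before the compile decision and the call is blacklisted. The value 56 is just a stand-in for some iteration before 80.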
How do other VMs solve this issue?
Section 6.5 of the paper "Trace-based Just-in-Time Type Specialization for Dynamic Languages" by Andreas Gal et al. describes a similar situation:
Another problem is that external functions can reenter the interpreter by calling scripts, which in turn again might want to access the call stack or global variables. To address this problem, we made the VM set a flag whenever the interpreter is reentered while a compiled trace is running. Every call to an external function then checks this flag and exits the trace immediately after returning from the external function call if it is set. There are many external functions that seldom or never reenter, and they can be called without problem, and will cause trace exit only if necessary.
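The flag-based scheme in this quote can be sketched in C as follows. This is a minimal illustration of the idea only. All names (`trace_running`, `reentered`, `interp_reenter`, `trace_call_external`, `run_trace`) are hypothetical and are not taken from the paper's implementation or from LuaJIT.

```c
#include <stddef.h>

typedef int (*ext_fn_t)(void);

static int trace_running = 0;  /* 1 while compiled trace code is executing */
static int reentered = 0;      /* set when the interpreter is re-entered under a trace */

/* Entry point used by external code to call back into a script. */
int interp_reenter(int a) {
    if (trace_running) reentered = 1;  /* the VM records the re-entry */
    return a;                          /* stand-in for running the script */
}

/* Trace-side wrapper for an external call.
 * Returns 1 if the trace must exit right after the call. */
int trace_call_external(ext_fn_t fn, int *result) {
    reentered = 0;
    *result = fn();
    return reentered;
}

/* Two hypothetical external functions. */
int ext_pure(void) { return 3; }                            /* never re-enters */
int ext_calls_back(void) { return interp_reenter(10) + 1; } /* always re-enters */

/* Runs a toy "trace" that calls fn up to n times.
 * Returns how many calls were made before the trace ended or exited. */
int run_trace(ext_fn_t fn, int n) {
    int calls = 0;
    trace_running = 1;
    for (int i = 0; i < n; i++) {
        int r;
        int must_exit = trace_call_external(fn, &r);
        calls++;
        if (must_exit) break;  /* exit the trace; the interpreter would resume here */
    }
    trace_running = 0;
    return calls;
}
```

With these definitions, `run_trace(ext_pure, 5)` stays on the trace for all 5 calls, while `run_trace(ext_calls_back, 5)` leaves the trace after the first call, which matches the "trace exit only if necessary" behavior described in the quote.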
The approach from the paper looks like it would cause problems in LuaJIT. Suppose the FFI call is compiled into Trace 1 and the callback function becomes hot enough to be compiled into Trace 2. Trace 2 will then modify some state in ways that Trace 1 probably does not expect, and on return from Trace 2 to Trace 1 an error will likely occur because of the corrupted state. However, it is worth giving it a try.