I tested this by building https://github.com/trolleyman/cuda-macros (the `cuda-macros-test` crate), and it builds and links correctly. I haven't had a chance to check that the result actually runs yet, but will in the morning.
I'm not sure this is the most elegant approach, but it seemed to be the one that required the fewest changes. I'm also not sure it is the correct way of doing this, especially with regard to cross-compiling, but it gets things up and running at least.
To test, run `cargo build` in the root of the repo linked above. The build setup is a bit hacky, but essentially it generates the CUDA function below and calls it.
```cuda
extern "C" __global__ void hello(int32_t* x, int32_t y) {
    printf("Hello from block %d, thread %d (y=%d)\n", blockIdx.x, threadIdx.x, y);
    *x = 2;
}
```
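```rust
// Rust-side declaration of the generated function.
// Items in an `extern` block are implicitly unsafe to call.
extern "C" {
    fn hello(x: *mut i32, y: i32);
}
```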
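
For anyone who wants to check the call shape without a GPU, here is a minimal host-only sketch. It is not what the crate generates: `hello_host` and `call_hello` are hypothetical names, and the stand-in runs on the CPU with the same C ABI signature as the kernel above. A real kernel would need `x` to point to device-accessible memory and would be launched through the CUDA runtime rather than called directly.

```rust
/// Hypothetical CPU stand-in for the generated `hello` kernel.
/// Same C ABI signature, but no blocks or threads are involved.
unsafe extern "C" fn hello_host(x: *mut i32, y: i32) {
    println!("Hello from the host stand-in (y={})", y);
    // Mirror the kernel's only side effect: write 2 through the pointer.
    unsafe { *x = 2 };
}

/// Hypothetical safe wrapper: owns the output slot and hides the raw pointer.
fn call_hello(y: i32) -> i32 {
    let mut x: i32 = 0;
    // Sound here because `x` is a live, exclusively borrowed local.
    unsafe { hello_host(&mut x, y) };
    x
}

fn main() {
    let x = call_hello(7);
    println!("x after call: {}", x);
}
```

Wrapping the raw call in a small safe function like this keeps the `unsafe` block in one place, which might also be a reasonable shape for the real binding once the kernel is confirmed to run.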