when i built this i was at a yc hackathon, so i just used blackhole because it was fast to get working. the better approach is to hook in at the AudioUnit level inside the injected process instead of going through the HAL. facetime uses a VoiceProcessingIO AudioUnit, so if you're already injecting into the process you can intercept the render callbacks directly and grab/inject PCM buffers before they ever touch the system audio layer.
from there just ship them out over a unix socket or shared memory ring buffer from within the process.
in theory this sidesteps the background/session issues entirely: you're operating inside the process boundary, so the render callbacks should keep firing as long as the call's audio graph is active, regardless of app state.
that's the idea at least. i never actually got it working, which is why i went with blackhole.
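for the ring buffer half of that, here's a rough sketch of what the handoff could look like. this is not code from the project, and every name in it (`pcm_ring`, `pcm_ring_write`, `pcm_ring_read`, `RING_CAPACITY`) is made up for illustration. it assumes mono float32 samples and exactly one writer (the render callback) and one reader. the point is that the write side never locks or allocates, which is what you want on the audio thread.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical capacity in samples; must be a power of two so the
   index mask below works */
#define RING_CAPACITY 4096u

typedef struct {
    _Atomic uint32_t head;          /* total samples written (writer only) */
    _Atomic uint32_t tail;          /* total samples read (reader only) */
    float samples[RING_CAPACITY];   /* assumption: mono float32 PCM */
} pcm_ring;

/* writer side, meant to be called from the render callback.
   never blocks: if the reader is behind, it writes what fits and
   drops the rest. returns the number of samples stored. */
static size_t pcm_ring_write(pcm_ring *r, const float *src, size_t n)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t free_space = RING_CAPACITY - (uint32_t)(head - tail);
    if (n > free_space)
        n = free_space;
    for (size_t i = 0; i < n; i++)
        r->samples[(head + i) & (RING_CAPACITY - 1)] = src[i];
    /* release so the reader sees the samples before the new head */
    atomic_store_explicit(&r->head, head + (uint32_t)n, memory_order_release);
    return n;
}

/* reader side, e.g. a helper thread that forwards samples elsewhere.
   returns the number of samples copied into dst. */
static size_t pcm_ring_read(pcm_ring *r, float *dst, size_t n)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t avail = (uint32_t)(head - tail);
    if (n > avail)
        n = avail;
    for (size_t i = 0; i < n; i++)
        dst[i] = r->samples[(tail + i) & (RING_CAPACITY - 1)];
    /* release so the writer can reuse the slots we just consumed */
    atomic_store_explicit(&r->tail, tail + (uint32_t)n, memory_order_release);
    return n;
}
```

to share this across processes you'd put the `pcm_ring` struct in a shared memory mapping instead of normal process memory, which only works if 32-bit atomics are lock-free on the platform. dropping samples when the buffer is full is a design choice: blocking the audio thread would be worse than losing a few frames.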