I used IPC and child processes for a cryptocurrency miner protocol I built a few years back. It ran a full 20% faster than letting Go's concurrent work-stealing scheduler handle it.
In most workloads, though, CPU-bound processing is the exception, not the rule.
If Go had syntax for binding goroutines to OS threads, you could fix that problem within a single process as well.
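Go has no syntax for this, but the closest thing in the standard library is `runtime.LockOSThread`, which wires the calling goroutine to its current OS thread until `runtime.UnlockOSThread` is called. A minimal sketch, assuming a hypothetical CPU-bound task (`spin`, a plain LCG loop): each goroutine locks itself to its own thread for the duration of its task. Note that this only reserves a thread for the goroutine; it does not pin that thread to a CPU core, and whether it closes the 20% gap is untested here.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spin is a hypothetical CPU-bound task: iterate a 64-bit LCG.
func spin(seed uint64, iters int) uint64 {
	x := seed
	for i := 0; i < iters; i++ {
		x = x*6364136223846793005 + 1442695040888963407
	}
	return x
}

// runLocked runs n tasks, one per goroutine. Each goroutine calls
// runtime.LockOSThread so it stays on a single OS thread and no other
// goroutine runs on that thread until it unlocks.
func runLocked(n, iters int) []uint64 {
	results := make([]uint64, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			runtime.LockOSThread()
			defer runtime.UnlockOSThread()
			results[i] = spin(uint64(i), iters)
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	res := runLocked(runtime.NumCPU(), 50_000_000)
	fmt.Println(len(res), "tasks done")
}
```

Actually pinning threads to cores would need a platform-specific call on top of this (on Linux, a `sched_setaffinity` wrapper), which is outside what the runtime package offers.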