garbage collection has to be done one way or another. if you don't free after you malloc in C/C++, memory use can blow up until you hit an OOM condition and get force-killed by the kernel
coroutine scheduling only runs this work as a background process when you are running on more than one kernel thread, because it tries to parallelise as much as possible
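a quick way to see which case you are in is to ask the runtime how many kernel threads it may run Go code on at once. this is only a minimal sketch: `schedulerThreads` is a made-up helper name, and reading "more than one kernel thread" as GOMAXPROCS > 1 is my assumption about what is meant above.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerThreads (hypothetical helper) reports how many kernel threads may
// execute Go code simultaneously (GOMAXPROCS) and how many logical CPUs the
// process can see. Passing 0 to GOMAXPROCS queries the value without changing it.
func schedulerThreads() (maxProcs, cpus int) {
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

func main() {
	maxProcs, cpus := schedulerThreads()
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d\n", maxProcs, cpus)
	if maxProcs == 1 {
		// Assumption: with a single thread, runtime housekeeping has to share
		// that thread with your goroutines instead of running alongside them.
		fmt.Println("single thread: background work shares it with your code")
	}
}
```

running it with the `GOMAXPROCS=1` environment variable set is an easy way to compare the two cases on the same machine.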
this is one of the other deficiencies of Go: if you need to do bulk compute, it's better to refactor your processing unit into independent processes and coordinate them over IPC. I have also done this, and the difference is about 20% for heavy compute-bound processing (it was a crypto miner). mining vanity addresses is another example of what benefits from this in Go, whereas languages with explicit access to kernel thread control can do this natively
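roughly what the process-per-worker layout can look like, as a sketch under assumptions and not the code from the miner mentioned above: the program re-executes itself once per CPU in a hypothetical `worker` mode, each child searches its own slice of the nonce space, and the parent collects results over stdout pipes as the IPC channel. the names `span`, `splitRange` and `search`, the argv protocol, the `1<<20` search space, and the SHA-256 hex prefix standing in for a vanity address are all made up for illustration.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"strconv"
	"strings"
)

// span is a half-open range [Start, End) of candidate nonces.
type span struct{ Start, End uint64 }

// splitRange divides [0, total) into at most n contiguous spans.
func splitRange(total uint64, n int) []span {
	if n < 1 {
		n = 1
	}
	chunk := (total + uint64(n) - 1) / uint64(n) // ceiling division
	spans := make([]span, 0, n)
	for start := uint64(0); start < total; start += chunk {
		end := start + chunk
		if end > total {
			end = total
		}
		spans = append(spans, span{start, end})
	}
	return spans
}

// search is the compute-bound part: it returns the first nonce in s whose
// SHA-256 hex digest starts with prefix.
func search(s span, prefix string) (uint64, bool) {
	for n := s.Start; n < s.End; n++ {
		sum := sha256.Sum256([]byte(strconv.FormatUint(n, 10)))
		if strings.HasPrefix(hex.EncodeToString(sum[:]), prefix) {
			return n, true
		}
	}
	return 0, false
}

func main() {
	// Worker mode (assumed protocol): <binary> worker <start> <end> <prefix>.
	if len(os.Args) == 5 && os.Args[1] == "worker" {
		start, err1 := strconv.ParseUint(os.Args[2], 10, 64)
		end, err2 := strconv.ParseUint(os.Args[3], 10, 64)
		if err1 != nil || err2 != nil {
			os.Exit(2)
		}
		if n, ok := search(span{start, end}, os.Args[4]); ok {
			fmt.Println(n) // the result travels back to the parent via stdout
		}
		return
	}

	// Parent mode: start one worker process per CPU, then read each pipe.
	self, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	type job struct {
		cmd *exec.Cmd
		out *bufio.Scanner
	}
	var jobs []job
	for _, s := range splitRange(1<<20, runtime.NumCPU()) {
		cmd := exec.Command(self, "worker",
			strconv.FormatUint(s.Start, 10), strconv.FormatUint(s.End, 10), "0000")
		stdout, err := cmd.StdoutPipe()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		if err := cmd.Start(); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		jobs = append(jobs, job{cmd, bufio.NewScanner(stdout)})
	}
	for _, j := range jobs {
		if j.out.Scan() { // each worker prints at most one line
			fmt.Println("match:", j.out.Text())
		}
		j.cmd.Wait() // reap the child only after its pipe has been read
	}
}
```

each worker is its own OS process with its own runtime and heap, so one worker's garbage collection never pauses another. whether that buys anything close to the figure quoted above depends on the workload, so measure it against a plain goroutine pool before committing to the extra moving parts.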