Go: GOMAXPROCS & Live Updates

ℹ️ This article is based on Go 1.13.
GOMAXPROCS controls the maximum number of OS threads that can execute Go code simultaneously. It can be set when launching your program or changed during execution. By default, Go sets it to the number of logical CPUs available, but this has not always been the case.
Default Value
Since Go 1.5, the default value for GOMAXPROCS has changed from one to the number of visible CPUs. This change was made possible by improvements to the Go scheduler and to context switching between goroutines. In the early days of Go, programs that worked concurrently with frequent goroutine switches suffered from the cost of switching between threads.
The proposal for this new value of GOMAXPROCS provides benchmarks that show this improvement:
- The first benchmark creates a chain of 100 goroutines connected by channels, buffered and unbuffered:

- A second benchmark, which generates prime numbers, shows how using more cores went from a big negative impact to a huge positive impact:

[Figure: GOMAXPROCS now has a great positive impact]

The scheduler has clearly solved the problem encountered in the single-threaded program and now scales well.
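To make the first benchmark more concrete, a chain of goroutines connected by channels could look roughly like the sketch below. It is written for illustration and is not the code used in the proposal; `chain` and its parameters are hypothetical names. Each stage receives a value, adds one, and passes it on, so the result shows that the value travelled through every goroutine.

```go
package main

import "fmt"

// chain (hypothetical helper) builds n goroutines linked by channels with the
// given buffer size, sends value into the first one, and returns what comes
// out of the last one. Each stage adds 1 to the value it forwards.
func chain(n, buffer, value int) int {
	first := make(chan int, buffer)
	in := first
	for i := 0; i < n; i++ {
		out := make(chan int, buffer)
		go func(in <-chan int, out chan<- int) {
			for v := range in {
				out <- v + 1
			}
			close(out) // propagate shutdown down the chain
		}(in, out)
		in = out
	}
	first <- value
	close(first)
	return <-in
}

func main() {
	fmt.Println(chain(100, 0, 0)) // unbuffered channels, prints 100
	fmt.Println(chain(100, 1, 0)) // buffered channels, prints 100
}
```

Every hop in this chain forces a goroutine switch, which is why this kind of workload was sensitive to the cost of switching between threads.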
Update during the execution
Go allows GOMAXPROCS to be updated at any time during execution, for example when a VM or container reconfigures the number of available CPUs. Since the instruction to grow or shrink the number of processors can happen at any time, it becomes effective as soon as Go goes through a “Stop the World” phase. Adding a new processor is quite straightforward: Go creates its local cache, mcache, and adds the processor to the idle list. Here is an example with a newly allocated P when going from two to three processors:

Then, when the world starts again, the new P gets a goroutine to run:

Reducing the number of processors is a bit more complex. Removing a P requires emptying its local goroutine queue by moving its goroutines to the global queue:

Then, it has to free the local mcache in order to make it reusable:

Here is an example of the trace when going from two Ps to one, then from one to three:
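A trace of this kind can be captured with the `runtime/trace` package. The sketch below is one way such a run might be set up, under the assumption that a few busy goroutines are enough to keep the Ps occupied; `resize`, `spin`, the pause duration, and the `trace.out` file name are illustrative choices, not taken from the original experiment.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/trace"
	"sync"
	"time"
)

// resize (hypothetical helper) applies each GOMAXPROCS value in turn, pausing
// after each change, and returns the values the runtime reported.
func resize(steps []int, pause time.Duration) []int {
	seen := make([]int, 0, len(steps))
	for _, n := range steps {
		runtime.GOMAXPROCS(n)
		seen = append(seen, runtime.GOMAXPROCS(0))
		time.Sleep(pause)
	}
	return seen
}

// spin keeps a goroutine busy in small chunks until stop is closed.
func spin(stop <-chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		select {
		case <-stop:
			return
		default:
			sum := 0
			for i := 0; i < 1000000; i++ {
				sum += i
			}
			_ = sum
		}
	}
}

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		fmt.Println("cannot create trace file:", err)
		return
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		fmt.Println("cannot start trace:", err)
		return
	}
	defer trace.Stop()

	stop := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go spin(stop, &wg)
	}

	// Two Ps, then one, then three, as in the example above.
	fmt.Println(resize([]int{2, 1, 3}, 100*time.Millisecond))

	close(stop)
	wg.Wait()
}
```

The resulting file can then be opened with `go tool trace trace.out` to see the number of Ps change over time.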

GOMAXPROCS=1
Increasing GOMAXPROCS to a higher value does not mean your program will run faster. The Go documentation explains it well:
It depends on the nature of your program. Problems that are intrinsically sequential cannot be sped up by adding more goroutines. Concurrency only becomes parallelism when the problem is intrinsically parallel.
Let’s now see how concurrency alone is enough for some kinds of programs. Here is some code that checks a list of URLs to find out whether each website is up:
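The sketch below is one possible shape for such a checker, not necessarily the exact program that was benchmarked. The names `checkAll` and `isUp`, the five-second timeout, the "status below 400 means up" rule, and the example URLs are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// checkAll (hypothetical helper) runs check on every URL in its own goroutine
// and returns the result for each URL.
func checkAll(urls []string, check func(string) bool) map[string]bool {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	results := make(map[string]bool, len(urls))
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			ok := check(u) // the goroutine spends most of its time waiting here
			mu.Lock()
			results[u] = ok
			mu.Unlock()
		}(u)
	}
	wg.Wait()
	return results
}

// isUp (hypothetical helper) reports whether a GET request succeeds with a
// status code below 400.
func isUp(url string) bool {
	client := http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode < http.StatusBadRequest
}

func main() {
	// Placeholder URLs; replace them with the sites you want to check.
	urls := []string{"https://example.com", "https://example.org"}
	for u, ok := range checkAll(urls, isUp) {
		fmt.Println(u, "up:", ok)
	}
}
```

Passing the check as a function keeps the concurrent part independent of the network, which makes it easy to substitute a fake check when measuring or testing.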

This example works well with concurrency since it spends a lot of time waiting, which gives the Go scheduler room to run other goroutines in the meantime. Here is the benchmark with different values of GOMAXPROCS, set with the go test flag -cpu=1,2,4,8:
name         time/op
URLsCheck-8  4.19s ± 2%
URLsCheck-4  4.30s ± 5%
URLsCheck-2  4.33s ± 4%
URLsCheck-1  4.14s ± 1%

Adding parallelism here does not bring any advantage. Using the full capacity of the CPUs will improve performance in many cases, but it is worth running your tests and benchmarks with different values to see how your program behaves.





