Vincent Blanchon



Go: GOMAXPROCS & Live Updates

Illustration created for “A Journey With Go”, made from the original Go Gopher, created by Renee French.

ℹ️ This article is based on Go 1.13.

GOMAXPROCS controls the maximum number of OS threads that can execute Go code simultaneously. It can be set when launching a program or changed during execution. By default, Go sets the value to the number of logical CPUs available, but it has not always been like this.
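As a minimal sketch of both ways to set the value: the `GOMAXPROCS` environment variable is read at launch, and `runtime.GOMAXPROCS` changes it during execution. The helper name `setProcs` is only an illustration and is not part of the original article.

```go
package main

import (
	"fmt"
	"runtime"
)

// setProcs (hypothetical helper) sets GOMAXPROCS to n and returns the
// previous value. With n < 1 it only queries the current value.
func setProcs(n int) int {
	return runtime.GOMAXPROCS(n)
}

func main() {
	// At launch, the value comes from the GOMAXPROCS environment variable
	// when it is set (e.g. `GOMAXPROCS=2 ./app`), otherwise from the CPU count.
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS at start:", setProcs(0))

	// During execution, the value can be changed at any time.
	prev := setProcs(2)
	fmt.Println("previous:", prev, "current:", setProcs(0))
}
```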

Default Value

Since Go 1.5, the default value of GOMAXPROCS has changed from one to the number of visible CPUs. This change was made possible by improvements to the Go scheduler and to context switching between goroutines. Indeed, in the early days of Go, concurrent programs that switched goroutines frequently suffered from the cost of switching between threads.

The proposal for this new value of GOMAXPROCS provides benchmarks that show this improvement:

  • The first benchmark creates a chain of 100 goroutines connected by channels, buffered and unbuffered:
Scheduler improvement with higher value of GOMAXPROCS
  • A second benchmark, which generates prime numbers, shows how using more cores went from a big negative impact to a huge positive impact:
Higher value for GOMAXPROCS now has a great positive impact

The scheduler has clearly solved the problem encountered in the single-threaded program and now scales well.
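To give an idea of what the first benchmark exercises, here is a rough sketch of a chain of goroutines connected by channels. It is not the code from the proposal: the names `chain` and `run`, the "add one and forward" stage, and the message count are assumptions made for illustration. Timings depend on the machine, so none are shown.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// chain starts n goroutines linked by channels of capacity buf
// (buf == 0 means unbuffered). Each stage adds one to the value it
// receives and forwards it to the next stage.
func chain(n, buf int) (chan<- int, <-chan int) {
	head := make(chan int, buf)
	src := head
	for i := 0; i < n; i++ {
		dst := make(chan int, buf)
		go func(src <-chan int, dst chan<- int) {
			for v := range src {
				dst <- v + 1
			}
			close(dst)
		}(src, dst)
		src = dst
	}
	return head, src
}

// run pushes msgs zeros through a chain of n stages and sums the results,
// so the returned value is n*msgs.
func run(n, buf, msgs int) int {
	in, out := chain(n, buf)
	go func() {
		for i := 0; i < msgs; i++ {
			in <- 0
		}
		close(in)
	}()
	sum := 0
	for v := range out {
		sum += v
	}
	return sum
}

func main() {
	for _, procs := range []int{1, runtime.NumCPU()} {
		prev := runtime.GOMAXPROCS(procs)
		start := time.Now()
		sum := run(100, 0, 10000)
		fmt.Printf("GOMAXPROCS=%d sum=%d elapsed=%v\n", procs, sum, time.Since(start))
		runtime.GOMAXPROCS(prev)
	}
}
```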

Update during the execution

Go allows GOMAXPROCS to be updated at any time during execution. This can be the result of VMs or containers reconfiguring the number of available CPUs. Since the instruction to grow or shrink the number of processors can come at any time, it takes effect as soon as Go goes through a “Stop the World” phase. Adding a new processor is quite straightforward: Go creates its local cache, mcache, and adds the processor to the idle list. Here is an example with a newly allocated P when going from two to three processors:

GOMAXPROCS is growing by one processor

Then, when the world starts again, the new P gets a goroutine to run:

Reducing the number of processors is a bit more complex. Removing a P requires emptying its local goroutine queue by moving the goroutines to the global queue:

Goroutines move to the global queue

Then, it has to free the local mcache to make it reusable:

Here is an example of a trace when going from two Ps to one, then from one to three:
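A trace of this kind can be recorded with the standard `runtime/trace` package. The sketch below is an assumption about how such a trace might be produced, not the author's program: the helpers `spin` and `recordResize`, the worker count, and the loop size are all made up for illustration. The resulting file can be opened with `go tool trace trace.out`.

```go
package main

import (
	"bytes"
	"log"
	"os"
	"runtime"
	"runtime/trace"
	"sync"
)

// spin (hypothetical helper) runs CPU-bound workers so the trace has
// goroutines to display on each P.
func spin(workers, iterations int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			x := 0
			for i := 0; i < iterations; i++ {
				x += i
			}
			_ = x
		}()
	}
	wg.Wait()
}

// recordResize records an execution trace while GOMAXPROCS takes each
// value in steps, then restores the original value.
func recordResize(steps []int) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}
	prev := runtime.GOMAXPROCS(0)
	for _, n := range steps {
		runtime.GOMAXPROCS(n)
		spin(4, 2000000)
	}
	runtime.GOMAXPROCS(prev)
	// Stop must run before reading the buffer so that all data is flushed.
	trace.Stop()
	return buf.Bytes(), nil
}

func main() {
	// Two Ps, then one, then three, as in the trace described above.
	data, err := recordResize([]int{2, 1, 3})
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("trace.out", data, 0644); err != nil {
		log.Fatal(err)
	}
}
```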

GOMAXPROCS=1

Increasing GOMAXPROCS to a higher value does not mean your program will run faster. The Go documentation explains it well:

It depends on the nature of your program. Problems that are intrinsically sequential cannot be sped up by adding more goroutines. Concurrency only becomes parallelism when the problem is intrinsically parallel.

Let’s now see how concurrency alone is enough for some kinds of programs. Here is some code that checks some URLs to know whether each website is up or not:
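The original listing was an image and is not reproduced here. As a stand-in, the following is a minimal sketch of what such a checker could look like. The function name `checkURLs`, the five-second timeout, the "2xx means up" rule, and the example URLs are all assumptions, not the author's code.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// client is shared by all requests; the timeout value is an arbitrary choice.
var client = &http.Client{Timeout: 5 * time.Second}

// checkURLs requests every URL in its own goroutine and reports, per URL,
// whether the request succeeded with a 2xx status.
func checkURLs(urls []string) map[string]bool {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	up := make(map[string]bool, len(urls))
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			ok := false
			// The goroutine is parked while waiting on the network,
			// which lets the scheduler run the other goroutines.
			resp, err := client.Get(u)
			if err == nil {
				resp.Body.Close()
				ok = resp.StatusCode >= 200 && resp.StatusCode < 300
			}
			mu.Lock()
			up[u] = ok
			mu.Unlock()
		}(u)
	}
	wg.Wait()
	return up
}

func main() {
	// Placeholder URLs, chosen only for the example.
	urls := []string{"https://golang.org", "https://example.com"}
	for u, ok := range checkURLs(urls) {
		fmt.Println(u, "up:", ok)
	}
}
```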

This example works well with concurrency since it has a lot of pauses, giving the Go scheduler room to run other goroutines while waiting. Here is the benchmark with different values of GOMAXPROCS, thanks to the -cpu=1,2,4,8 flag of the testing package:

name         time/op
URLsCheck-8  4.19s ± 2%
URLsCheck-4  4.30s ± 5%
URLsCheck-2  4.33s ± 4%
URLsCheck-1  4.14s ± 1%

Adding parallelism here does not bring any advantage. Using the full capacity of the CPUs will improve performance in many cases, but it is still worth running the tests and benchmarks with different values of GOMAXPROCS to see how the program behaves.
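The same effect can be observed without the network. In this sketch, `time.Sleep` stands in for the wait on a remote server, which is a simplification of a real HTTP call. The helper name `waitBound`, the task count, and the sleep duration are assumptions for illustration. Since every goroutine spends its time waiting rather than computing, the elapsed time should stay close to a single wait whatever the value of GOMAXPROCS.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// waitBound starts `tasks` goroutines that each sleep for `wait`
// (a stand-in for a network round trip) with GOMAXPROCS set to procs,
// and returns the total elapsed time. The previous value is restored.
func waitBound(procs, tasks int, wait time.Duration) time.Duration {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev)

	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(wait)
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	for _, procs := range []int{1, 2, 4, 8} {
		elapsed := waitBound(procs, 100, 50*time.Millisecond)
		fmt.Printf("GOMAXPROCS=%d  %v\n", procs, elapsed)
	}
}
```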
