Emma Boudreau

4 Most CRUCIAL Things To Know Before Parallel Computing

How can MULTI-threading POSSIBLY be SLOWER than SINGLE-threading?

Hardware has come a long way in the short period between its initial consumer launch and the modern era, in which phones have become people’s wallets and a connection to most of the people on Earth fits in a pocket. Modern computers have double or triple the number of cores found in machines that are merely 10 years old, and those cores are faster as well. Graphics cards have also evolved substantially over the years, alongside improvements to connectors, display technology, battery technology, and more. An important thing to remember about hardware is that it is only as useful as the software that runs on it. Your firmware is designed to run your components, which are in turn designed to run your operating system; every piece of hardware needs an appropriate amount of software behind it. Without software, hardware is just a paperweight.

With all of this new hardware in consumers’ hands, software often needs to catch up. It is doubtful that most applications take advantage of all 16 cores of a machine’s Central Processing Unit (CPU). It is equally doubtful that most applications are perfectly optimized for, or even use, a Graphics Processing Unit (GPU). Taking advantage of this hardware can go a long way for performance-intensive tasks, and as software developers we certainly want to make full use of the hardware our software runs on. The use-cases for different distributed computing techniques are certainly there, but the process of distributing your tasks is not always intuitive.

what is a thread?

The thread is the first thing we need to understand to get started with parallel computing. Even with forms of parallel computing that involve hardware other than the CPU, understanding threads is essential for distributing tasks. A thread is a single sequence of instructions that a processor works through. When an application starts, our kernel gives it a thread and schedules that thread onto a processor core. An application can run concurrently across all available threads, but only if the software is built to take advantage of them. When we say a processor has N threads, we are saying that it is able to execute N small sequences of tasks at the same time.
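In Julia, the thread count is fixed when the session starts (for example with `julia --threads 8`). The sketch below shows how to inspect that count and give each thread some work. It assumes Julia 1.9 or newer, and `busy_sum` is a hypothetical stand-in for real work.

```julia
using Base.Threads

# Hypothetical CPU-bound work for each task to run
function busy_sum(n::Int)
    s = 0
    for k in 1:n
        s += k % 7
    end
    return s
end

# Hardware threads reported by the OS vs. threads this Julia session was started with
println("CPU threads reported by the OS: ", Sys.CPU_THREADS)
println("Threads available to Julia:     ", nthreads())

# Spawn one task per Julia thread; the scheduler is free to place them on different threads.
# threadid() is read inside each task, so it reports where that task was running.
tasks = map(1:nthreads()) do _
    Threads.@spawn (threadid(), busy_sum(1_000_000))
end
results = fetch.(tasks)

for (tid, s) in results
    println("task on thread ", tid, " computed ", s)
end
```

With a single thread, every task reports thread 1. Starting Julia with `--threads` lets the tasks spread out.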

This is where threads come from: the kernel of our operating system maps these tasks onto our hardware, and in return we get to see them in a system monitor.


In the system monitor, we see a new process, julia; every process the kernel runs has at least one thread of its own. Really, the only way to run an additional workload at the same time is to run it on another thread. The exception is doing things asynchronously, though asynchronous code does not actually do things at the same time. Instead, it pauses some tasks to make progress on others. Let’s try starting Julia with more than one thread and spawning some processes with ParametricProcesses:

06:26 AM|emma|ManyThread🩷> julia --threads 8
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.2 (2023-07-05)
 _/ |\__'_|_|_|\__'_|  |  Fedora 38 build
|__/                   |

julia> using ParametricProcesses
[ Info: Precompiling ParametricProcesses [9ce9415f-ecd2-4b63-a3f6-984ce63e76ce]

julia> pm = processes(4)
2 |Threaded process: 1 (inactive)
3 |Threaded process: 2 (inactive)
4 |Threaded process: 3 (inactive)
5 |Threaded process: 4 (inactive)

Now we have four Julia worker processes.
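ParametricProcesses manages these workers for us. For comparison only, the sketch below starts four local workers with Julia’s Distributed standard library directly. It makes no claim about how ParametricProcesses is implemented, but it does make one thing visible: each worker is its own operating-system process.

```julia
using Distributed

# Start four local worker processes with the Distributed standard library
# (a comparison sketch, not the ParametricProcesses API)
new_ids = addprocs(4)

println("all process ids: ", procs())    # the master process (id 1) plus the workers
println("worker ids:      ", workers())

# Every worker is a separate operating-system process with its own PID
for w in new_ids
    println("worker ", w, " has OS pid ", remotecall_fetch(getpid, w))
end
println("master has OS pid ", getpid())
```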

the trade-offs to parallel computing

The keen-eyed might have already noticed a significant shortcoming of parallel computing. Because these workers are separate processes, data has to be transmitted from one process to another. Data can be shared through arguments, but the functions and dependencies those arguments rely on must also be shared. In addition, each worker is simply a new instance of the language, so we end up trading a significant amount of memory. In this case, simply having 4 threaded workers used a whopping 700 MB+ of memory, and that is before we have even done anything.
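The sketch below illustrates both costs with the Distributed standard library, on the assumption that ParametricProcesses may handle some of this for you. First, a function defined only on the master process cannot be called on a worker until its definition is shared with `@everywhere`. Second, every worker carries its own memory footprint. `local_double` and `shared_double` are hypothetical names, and the memory figures will vary by machine and Julia version.

```julia
using Distributed

ws = addprocs(2)

# Defined on the master process only
local_double(x) = 2x

# Calling it on a worker fails: the named function is sent by reference, and the
# worker has never seen its definition (the error comes back as a RemoteException)
failed = try
    remotecall_fetch(local_double, first(ws), 21)
    false
catch
    true
end
println("calling a master-only function on a worker failed: ", failed)

# @everywhere evaluates the definition on the master and on every worker
@everywhere shared_double(x) = 2x
println(remotecall_fetch(shared_double, first(ws), 21))

# Each worker is a full Julia runtime with its own peak resident memory
for w in ws
    mib = remotecall_fetch(Sys.maxrss, w) / 2^20
    println("worker ", w, " peak resident memory: ", round(mib; digits = 1), " MiB")
end
```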

Another important point follows from this: data has to be transmitted from one process to another. No matter what type of data it is, it is important to remember what is happening. Whenever we move data between processes in a high-level programming language, it has to be copied. The data is serialized, handed to the operating system’s kernel (for example, through a socket or a pipe), and deserialized on the other side. This by nature takes time, and it is an important trade-off to remember. The more workers we distribute data to, the more time the transfers take, and the more processes we start, the longer our application takes to start. Building software that runs well is all about correctly identifying your target: what is your workload, and what is your biggest hardware limitation? How many threads would be ideal for your use-case?
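To make the transfer cost concrete, here is a rough timing sketch with the Distributed standard library. To my knowledge, it talks to local workers over TCP connections by default. Summing an array locally only reads memory, while sending the same array to a worker means serializing roughly 80 MB first. The array size is an arbitrary assumption and timings differ from machine to machine, so treat the numbers as illustrative rather than as a benchmark.

```julia
using Distributed

w = first(addprocs(1))

data = rand(10_000_000)   # about 80 MB of Float64 values

# First calls also warm up compilation, so it is not counted in the timings below
s_local = sum(data)
s_remote = remotecall_fetch(sum, w, data)

# Local: the array is already in this process's memory
t_local = @elapsed sum(data)

# Remote: the array is serialized, sent to the worker process,
# deserialized there, summed, and the result is sent back
t_remote = @elapsed remotecall_fetch(sum, w, data)

println("local sum:  ", round(1000 * t_local; digits = 2), " ms")
println("remote sum: ", round(1000 * t_remote; digits = 2), " ms")
```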

use-cases for parallel computing

A common theme when getting started with parallel computing is unanticipated results. For example, someone multi-threads a Function for the first time expecting it to be faster, only to find that it is far slower than the single-threaded version. This is understandably confusing to novice developers: surely an application with more threads must always run better? In reality, the overhead of starting threads and moving data between them can outweigh the work being split up. It is therefore worth understanding the major use-cases for distributed computing when determining

  • whether your application should use multiple threads to begin with,
  • and how many threads your application might want to utilize.
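One common reason a first multi-threaded Function ends up slower is granularity. If every tiny piece of work gets its own task, scheduling overhead can cost more than the work itself. The sketch below keeps that overhead down by giving each thread one large chunk instead. `chunked_sum` is a hypothetical helper, not a library function, and it assumes the chunks can be summed independently.

```julia
using Base.Threads

# Split `xs` into roughly equal contiguous chunks and sum each chunk in its own task.
# One task per chunk (not per element) keeps scheduling overhead small.
function chunked_sum(xs::AbstractVector{T}; nchunks::Int = nthreads()) where {T<:Number}
    n = length(xs)
    n == 0 && return zero(T)
    nchunks = clamp(nchunks, 1, n)                      # never more chunks than elements
    bounds = round.(Int, range(0, n; length = nchunks + 1))
    tasks = map(1:nchunks) do i
        lo, hi = bounds[i] + 1, bounds[i + 1]
        Threads.@spawn sum(@view xs[lo:hi])             # each task sums one chunk
    end
    return sum(fetch.(tasks))                           # combine the partial sums
end

xs = rand(1_000_000)
println(chunked_sum(xs) ≈ sum(xs))
```

Choosing `nchunks = nthreads()` is a simple default. A larger number of chunks can help when chunks take uneven amounts of time, at the cost of more tasks.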

I would say there are two main uses for parallel computing. The first is large, intensive calculations. This does not apply to most algorithms, but there are certainly cases where we want to distribute calculations across a cluster, across our available threads, or onto our GPU. Software engineering has use-cases for this, such as rendering, but for the most part, modern use-cases for this kind of distributed computing are in more specialized fields, like Data Science. There is an exception to this, and it is the second use-case for distributed computing I wanted to talk about:

callbacks.

Callback is the traditional term for a function that is registered to be called later. This is a typical workflow for Graphical User Interfaces: we attach events to UI elements, those events call our function, and we get some graphical result back. Callbacks are a fantastic use-case for multi-threading, as they allow us to process more than one event at the same time.
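The sketch below illustrates the idea, assuming the callbacks are independent of one another. Each callback registered for an event runs in its own task, so a slow handler does not hold up the others or the code that fired the event. `Registry`, `on!`, and `fire!` are hypothetical names for illustration only, not the API of any GUI toolkit. Many toolkits also require UI updates themselves to happen on the main thread.

```julia
using Base.Threads

# Hypothetical minimal event registry: event name => registered callbacks
const Registry = Dict{Symbol, Vector{Function}}

# Register `f` to be called later whenever `event` fires
function on!(reg::Registry, event::Symbol, f::Function)
    push!(get!(reg, event, Function[]), f)
    return reg
end

# Fire `event`: each callback runs in its own task; the tasks are returned immediately
function fire!(reg::Registry, event::Symbol, args...)
    tasks = Task[]
    for cb in get(reg, event, Function[])
        push!(tasks, Threads.@spawn cb(args...))
    end
    return tasks
end

reg = Registry()
on!(reg, :click, x -> (sleep(0.5); "slow handler saw $x"))
on!(reg, :click, x -> "fast handler saw $x")

# fetch waits for each task; results come back in registration order
println(fetch.(fire!(reg, :click, 42)))
```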

There certainly are use-cases for parallel computing everywhere, but the task needs to warrant the hardware we are allocating to it. Every type of parallel computing comes with significant trade-offs, and only tasks that are demanding enough will make those trade-offs worthwhile. If they are not worthwhile, we end up with a worse result than if we had done the work on a single thread. That is unfortunate, considering how much effort parallelizing often takes.

GPU parallel computing platforms

Something else to consider when approaching parallel computing is the variety of platforms available for Graphics Processing Units. After all, aside from multi-threading, the most common form of parallel computing is likely to involve tasks suited to a graphics processor. Before even buying a GPU for parallel computing, it is important to decide which technology you want to use, because only certain technologies work with certain cards. The three largest platforms for distributing tasks to GPUs are

  • CUDA
  • OpenCL
  • and ROCm.

CUDA is Nvidia’s proprietary parallel computing platform, built around Nvidia’s CUDA cores and hardware. For most programmers, this is generally the first choice, simply because CUDA is the most prominent GPU computing technology in the industry. CUDA is also exclusive to Nvidia GPUs, so if you want to use CUDA, an Nvidia card is a prerequisite. The same cannot be said for the next example, OpenCL.

OpenCL is a bit different: it is an open standard, maintained by the Khronos Group, for distributing work across different kinds of hardware. OpenCL provides its own C-based language for writing kernels, and it is supported by a lot of different hardware. That makes it a great option for use-cases that want to be GPU-agnostic. OpenCL can also target far more than graphics processors, such as CPUs and other accelerators.

The last option on this list is AMD ROCm, a parallel computing platform AMD has been developing to compete with CUDA. In my experience, ROCm is not as widely used in the industry. Still, the AMDGPU graphics driver (for Linux users) is pretty fantastic, so ROCm might be a good option for those who want to stay away from Nvidia for one reason or another.

Parallel computing is a revolutionary and incredibly powerful concept that can do a lot for software. Using different parallel computing techniques, we are able to distribute our software’s work as evenly as possible across our hardware. Though parallel computing is incredibly powerful, it must be used in the right context to actually be effective, and there are a number of important trade-offs and pitfalls to be aware of. Fortunately, the information in this article provides a firm foundation that will help a lot in using distributed computing effectively. Thank you all for reading!
