Héla Ben Khalfallah

Memory Safety Violations

How to avoid memory issues?

Running Program’s Memory (Image by the author)

Running Program’s Memory

A typical memory layout of a running program can be as follows:

  • Static memory: static size, static allocation (compile time), for global variables, and static local variables.
  • Stack memory (Call Stack): static size, dynamic allocation (run time), for local variables.
  • Heap memory: dynamic size, dynamic allocation (run time), for variable-sized objects. In some programming languages, it is controlled by the programmer.

For all data, memory must be allocated (i.e., memory space reserved).

Great! We now know how and when our program uses memory!

Now, let’s see how programming languages manage (allocate and free) Heap memory!

Allocating and Deallocating Memory in the Heap

There are several types of memory management:

  • Manual: C, C++
  • Manual retain-release (MRR): Objective-C
  • Automatic Reference Counting (ARC): Objective-C, Swift
  • Garbage Collection (fully automatic management): Java, JavaScript
  • Ownership: Rust

Let’s take a quick look inside the magic of each option!

Manual memory management

In C, functions such as malloc() are used to dynamically allocate memory from the Heap. When the memory is no longer needed, the pointer is passed to free which deallocates the memory so that it can be used for other purposes.

#include <stdlib.h>
int *array = malloc(num_items * sizeof(int));

malloc() tries to find a block of unused memory large enough to hold the requested number of bytes and reserves it. If no such block is available, it returns NULL, so the result must be checked before use.

Memory obtained from malloc() is not deallocated automatically. It must be released explicitly using free().

Forgetting to deallocate leads to memory leaks and running out of memory!

We must not use a freed pointer unless reassigned or reallocated!

Wow, isn’t that hard?

Let’s take a look inside C++!

In C++, memory management is accomplished using the new and delete operators. new is used to allocate memory during execution time. delete deallocates the reserved memory.

As with the C language, we must be careful not to forget to free memory and avoid access errors!

Despite the difficulty of manually managing the memory, the advantage is that we know the exact needs of our program, and we can free the memory instantly after its use.

We also ensure that objects exist as long as they should, but no longer.

You can also create your automatic way of managing memory based on low-level memory management functions!

From Manual Retain Release (MRR) to Automatic Reference Counting (ARC)

iOS and OS X apps achieve memory management through reference-counting:

  • When we claim ownership of an object, we increase its reference count.
  • When we’re done with the object, we decrease its reference count.
  • When the count reaches zero, the runtime is allowed to destroy the object.

Once upon a time, with Objective-C, we manually controlled an object’s reference count by calling special memory-management methods:

  • alloc: creates an object with a reference count of one (we own the object).
  • retain: increases the reference count by one (we take ownership of an existing object).
  • release, autorelease: decrease the reference count by one (we relinquish ownership of an object); release does so immediately, autorelease defers it until the current autorelease pool is drained.

This is called Manual Retain Release (MRR)!

Reference Counting Memory Management

It’s our job to claim and relinquish ownership of every object in the program:

  • If we forget to free an object, its underlying memory is never freed, resulting in a memory leak (we will see it in more detail later).
  • When we try to free an object too many times, it results in a dangling pointer (we will see it in more detail later).
  • In either case, the program will most likely crash.
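To make the balance between retain and release more concrete, here is a minimal sketch in JavaScript. It is only a toy model: the RefCounted class and its fields are hypothetical names invented for illustration, and this is not how the Objective-C runtime is implemented. It shows a balanced retain/release pair, one release too many, and an object that is never released.

```javascript
// Toy model of manual retain/release (hypothetical class, for illustration only).
class RefCounted {
  constructor(name) {
    this.name = name;
    this.count = 1;            // like alloc: the creator owns the object
    this.deallocated = false;
  }

  retain() {
    if (this.deallocated) {
      throw new Error(`${this.name}: retain after dealloc (dangling reference)`);
    }
    this.count += 1;
    return this;
  }

  release() {
    if (this.deallocated) {
      throw new Error(`${this.name}: over-release (dangling reference)`);
    }
    this.count -= 1;
    if (this.count === 0) {
      this.deallocated = true; // the memory would be reclaimed here
    }
  }
}

const image = new RefCounted('image'); // count = 1
image.retain();                        // a second owner, count = 2
image.release();                       // count = 1
image.release();                       // count = 0, object is deallocated

let overReleaseError = null;
try {
  image.release();                     // one release too many
} catch (err) {
  overReleaseError = err.message;
}

// Never released: the count stays at 1, which models a memory leak.
const leaked = new RefCounted('leaked');
```

In this sketch the over-release is reported as an error. In a real MRR program, nothing guarantees such a friendly failure, which is why the balance is so hard to keep by hand.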

It’s hard to keep the balance between every alloc, retain, copy and release or autorelease!

Fortunately, with the new versions of Objective-C and Swift, we have moved to ARC!

Automatic Reference Counting works exactly the same way as MRR, but the compiler inserts the appropriate memory-management calls for us. This means that we no longer call retain, release, or autorelease manually. Whoa!

Automatic Reference Counting lets us largely forget about memory management in day-to-day code (retain cycles remain our responsibility, as we will see). The idea is to focus on high-level functionality instead of the underlying memory management.

You can find more details about ARC here. Enjoy!

Aha, that’s a big step to automatically manage memory, but it’s not the only way! Let’s continue our discovery!

Garbage collection

Garbage Collection (GC) is an automatic memory management technique used by runtimes such as those of Java and JavaScript.

In Java, objects are allocated using a new operator.

To simplify, a GC works like a background thread that runs from time to time, analyzes memory usage, and tries to free unused objects.

In general, all garbage collectors focus on two tasks:

  • Find out all objects that are still alive or used (Marking Reachable Objects).
  • Get rid of everything else — the supposedly dead and unused objects (removing unused objects, or sweep).

This algorithm is called Mark and Sweep. Gotcha!
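The two phases can be sketched on a toy heap. This is only an illustration under simple assumptions: the heap is a Map of hypothetical objects that list the ids they reference, and markAndSweep is a name invented for this example. Real collectors are far more sophisticated.

```javascript
// Toy heap: each entry lists the ids of the objects it references.
function markAndSweep(heap, rootIds) {
  const marked = new Set();
  const pending = [...rootIds];

  // Mark phase: follow references starting from the roots.
  while (pending.length > 0) {
    const id = pending.pop();
    if (marked.has(id) || !heap.has(id)) continue;
    marked.add(id);
    pending.push(...heap.get(id).refs);
  }

  // Sweep phase: remove everything that was not marked.
  const collected = [];
  for (const id of [...heap.keys()]) {
    if (!marked.has(id)) {
      heap.delete(id);
      collected.push(id);
    }
  }
  return collected;
}

const heap = new Map([
  ['a', { refs: ['b'] }],
  ['b', { refs: [] }],
  ['c', { refs: ['d'] }], // c and d reference each other (a cycle)...
  ['d', { refs: ['c'] }], // ...but nothing reachable from the roots points to them
]);

// 'a' is the only root, so 'a' and 'b' survive while 'c' and 'd' are swept.
const collected = markAndSweep(heap, ['a']);
```

Note that the cycle between c and d is collected because reachability is computed from the roots, not from reference counts.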

Well, what can we notice from all these definitions:

  • the GC is tied to the runtime and not to the programming language.
  • practically the execution period is not regular and can happen at indeterminate intervals: either after a certain amount of time has passed, or when the runtime sees available memory getting low.
  • this implies that objects are not necessarily released at the exact moment they are no longer used.
  • normal program execution is suspended, at least briefly, while the garbage collection algorithm runs to find what is used and clean up what is unused.

Whoa! It’s not very magical!

To be honest, I don’t like GC because of this indeterminism. I prefer the ARC approach, where the work is decided ahead of time, over a runtime task that may slow down program execution.

Garbage collection vs ARC

With ARC, the compiler will inject code into the executable that keeps track of object reference counts and will automatically release objects as necessary, rather than having the runtime look for and dispose of unused objects in the background.

Automatic Reference Counting (ARC). At compile time, it inserts into the object code messages retain and release which increase and decrease the reference count at run time, marking for deallocation those objects when the number of references to them reaches zero. ARC differs from tracing garbage collection in that there is no background process that deallocates the objects asynchronously at runtime. Unlike tracing garbage collection, ARC does not handle reference cycles automatically. Automatic Reference Counting — Wikipedia

Amazing! The retain and release calls are inserted at compile time. Destruction is deterministic: an object is freed as soon as its count reaches zero, with no need for background processing!

Memory management vs memory safety

I think you’re starting to realize that automatic doesn’t imply safe:

  • What if GC arrives late to free memory?
  • What if we use the variable at the same time it is being released by the GC (multi-threading)?
  • What if ARC doesn’t manage retain cycles well?
  • What about temporary data?
  • What about global variables?
  • What if the input size exceeds the memory?

Unfortunately, these cases cause many memory problems that lead to program crashes and sometimes to security violations!

In the majority of cases, automatic memory management guarantees a certain degree of safety. However, this is not sufficient, and we will see why. So, let’s move on to see what memory problems can occur and how we can avoid them!

Memory Violations and How To Avoid Them

Memory jargon

Before we begin, here is some important memory jargon:

Buffer is a place to put information temporarily while waiting for something else to process the data or while we’re processing it. For example, when input comes from the keyboard, it is stored in an input buffer until it is read and used by the application.

Pointer is a variable that stores the memory address of an object.

Buffer overflow

Aha, I think you know this well-known security vulnerability!

A buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer’s boundary and overwrites adjacent memory locations. Buffer overflow — Wikipedia

In other words, a buffer overflow occurs when a program tries to put more data into a buffer than it can hold, or when a program tries to put data into a memory area beyond a buffer:

Buffer overflow Example (Image by the author)

Ah, you see why I said that automatic does not imply safe!

Buffer overflow vulnerabilities can occur in code that:

  • relies on external data to control its behavior.
  • depends on data properties that are applied outside the immediate scope of the code.
  • uses C and C++ memory manipulation functions that do not perform bounds checking and can easily write past the allocated bounds of the buffers they operate upon.

Writing outside the bounds of a block of allocated memory can corrupt data, crash the program, or cause the execution of malicious code.

To avoid this type of violation, we need to verify all code that accepts input from users via the HTTP request and ensure it provides appropriate size and type checking on all such inputs.

It’s also recommended to validate that array indices are within the proper bounds before using them as an index to an array.
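As a rough illustration of this kind of check, here is a small sketch in JavaScript. The helper name safeWrite is hypothetical. It validates the offset and the size of the incoming data before copying anything into a fixed-size buffer, and rejects the write with a clear error instead of letting it go past the end.

```javascript
// Hypothetical helper: copies `data` into `buffer` only if it fits entirely.
function safeWrite(buffer, offset, data) {
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError(`invalid offset: ${offset}`);
  }
  if (offset + data.length > buffer.length) {
    throw new RangeError(
      `write of ${data.length} bytes at offset ${offset} exceeds buffer of ${buffer.length} bytes`
    );
  }
  buffer.set(data, offset);
}

const buffer = new Uint8Array(8);
safeWrite(buffer, 0, [1, 2, 3, 4]);   // fits: bytes 0..3 are written

let rejected = false;
try {
  safeWrite(buffer, 6, [9, 9, 9]);    // would need bytes 6..8, but the buffer ends at 7
} catch (err) {
  rejected = err instanceof RangeError;
}
```

The same idea applies to any externally supplied length or index: check it against the real size of the destination before using it.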

This problem concerns all programming languages, whether memory is managed manually or automatically, because it comes from the way the code is written. In languages with run-time bounds checking, it surfaces as an exception (BufferOverflowException, ArrayIndexOutOfBoundsException, IndexOutOfBoundsException) rather than as silent memory corruption.

FYI, there is another buffer problem: buffer over-read, which occurs when a program reading from a buffer goes past its boundary and reads adjacent memory.

Null dereference

A null pointer dereference occurs when a NULL pointer is used as if pointing to a valid memory area.

A null pointer should not be confused with an uninitialized pointer:

  • an uninitialized variable is a variable that is declared but not set to a defined known value before being used.
  • a null pointer is a pointer that does not point to any valid memory location (pointing to nothing). It holds a reserved value, commonly address zero, that never refers to a valid object.

NULL pointer dereference can happen:

  • when the program does not check for an error after calling a function that can return with a NULL pointer if the function fails.
  • when the program does not properly anticipate or handle exceptional conditions that rarely occur during normal operation of the software.
  • through a number of flaws, including race conditions and simple programming omissions.

In C, dereferencing a null pointer is undefined behavior. In Java, access to a null reference triggers a NullPointerException.

To avoid this type of violation:

  • before using a pointer, ensure it is not equal to NULL.
  • after freeing a pointer, set it to NULL so that any later use is caught by the NULL check instead of touching freed memory.
  • in a multi-threaded or otherwise asynchronous environment, acquire the appropriate lock before the NULL check (the if statement) and release it only once the pointer is no longer being used.
  • in Java, NullPointerException can be caught by error handling code, but the preferred practice is to ensure that such exceptions never occur.
  • use a defensive programming approach.
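In JavaScript, the equivalent of a NULL check is a check against null or undefined before accessing a property. The sketch below is only an example: findUser, getCity, and the shape of the user objects are hypothetical. It combines an explicit check with optional chaining and a default value.

```javascript
// Hypothetical lookup that returns null when the user is unknown.
function findUser(users, id) {
  return users.get(id) ?? null;
}

function getCity(users, id) {
  const user = findUser(users, id);
  if (user === null) {
    return 'unknown';                      // explicit check before dereferencing
  }
  return user.address?.city ?? 'unknown';  // address itself may be missing
}

const users = new Map([
  [1, { name: 'Ada', address: { city: 'London' } }],
  [2, { name: 'Linus' }],                  // no address recorded
]);

const knownCity = getCity(users, 1);
const missingAddress = getCity(users, 2);
const missingUser = getCity(users, 3);
```

Without these checks, the last two calls would throw a TypeError when reading a property of null or undefined.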

For a real-world example, see Null pointer dereference in Google Chrome (cybersecurity-help.cz).

Dangling and wild pointers

Did you hear about the Dangling pointer vulnerability using DOM plugin array — Mozilla and the Dangling pointer vulnerability in nsTreeSelection — Mozilla?

Security researcher Sergey Glazunov reported a dangling pointer vulnerability in the implementation of navigator.plugins in which the navigator object could retain a pointer to the plugins array even after it had been destroyed.

An attacker could potentially use this issue to crash the browser and run arbitrary code on a victim’s computer. 584512 — (CVE-2010–2767) nsPluginArray — memory corruption (mozilla.org)

A dangling pointer arises when the referenced object is deleted or deallocated while the pointer still holds its address. This creates a problem because the pointer now points to memory that is no longer valid. Oops!

A pointer that is not initialized before its first use (not even to NULL) is known as a wild pointer. Its behavior is undefined because it may point to an arbitrary location, which can crash the program. That’s why it’s called a wild pointer. OMG!

These issues arise because browsers and their JavaScript engines are largely written in C++ (recent versions of Firefox also include components written in Rust).

Using an automatic memory management mechanism (GC or ARC) considerably reduces the probability of encountering these pointer problems, but that does not prevent memory leaks as we will see below. Let’s move on!

Stack Overflow

I think you’ve encountered this error at least once if you’re using JavaScript in the browser:

JavaScript Stack Overflow — maximum call stack size exceeded (Image by the author)

This error usually occurs with recursive calls and indicates that the maximum Stack size has been exceeded:

Call Stack Overflow (Image by the author)

Here too, the GC does not help because every pending call keeps its stack frame alive until the calls it made have returned!

This problem is not specific to JavaScript. It affects all programming languages that do not implement Tail Call Optimization.

A tail call is when the last statement of a function is a call to another function. The optimization consists in having the tail call function replace its parent function in the stack. This way, recursive functions won’t grow the stack.

Tail Call Optimization (Image by the author)

The question now is what can be done if the language does not implement Tail Call Optimization (TCO) by default?

Well, there is a magic pattern that can help us in this case, and it can be applied in all languages to closely imitate the behavior of TCO: the Trampoline pattern.

A trampoline function wraps our recursive function in a loop. Under the hood, it calls the recursive function piece by piece until it no longer produces recursive calls:

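// Trampoline: keeps calling the returned function (a thunk) until a plain value comes back
const Trampoline = fn => (...args) => {
  let result = fn(...args)
  // Each thunk runs inside this loop, so the call stack does not grow
  while (typeof result === 'function') {
    result = result()
  }
  return result
}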
Recursive Tail Calls and Trampolines in Swift — uraimo.com

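With Trampoline, we make almost no change to the normal recursive algorithm (non-TCO!): the recursive function only has to return a function wrapping its next call instead of making that call directly. In exchange, we skip the call-stack build-up entirely. It’s just a new utility!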
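Here is a small usage sketch. The Trampoline helper is repeated so that the example runs on its own, and sumTo is a hypothetical function written for this illustration. It adds the numbers from 1 to n and returns a thunk for each recursive step.

```javascript
const Trampoline = fn => (...args) => {
  let result = fn(...args);
  while (typeof result === 'function') {
    result = result();
  }
  return result;
};

// Tail-recursive sum that returns a thunk instead of calling itself directly.
const sumTo = (n, acc = 0) =>
  n === 0 ? acc : () => sumTo(n - 1, acc + n);

const safeSumTo = Trampoline(sumTo);

// 100000 nested calls would typically exceed the call stack with plain recursion;
// here every step runs from the loop inside Trampoline.
const total = safeSumTo(100000);
```

One limitation worth keeping in mind: since the loop treats any returned function as the next step, this version cannot be used when the final result is itself a function.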

A wonderful pattern!

Out of Memory (OOM)

This error occurs when there is insufficient space to allocate an object in the Heap, and the Heap cannot be expanded further. In this situation, the GC can’t help!

OOM error usually means that the program is doing something wrong, such as the following:

  • holding onto objects too long
  • trying to process too much data at a time
  • having many global variables (a problem of function and variable scope)

For example, in JavaScript, the references that are directly pointed to the root (global or window) are always active (used), and the GC cannot clear them!

So, what can we do to avoid OOM? Let’s see!

To process large data,

  • for backend applications using Node.js, we can use streams, pagination, and geospatial queries.
  • for frontend applications using JavaScript and React, we can use techniques like windowing or pagination.
  • in general, it is highly recommended to adopt these principles: process on demand, display on demand, and at the beginning load and process only what is necessary (lazy evaluation).
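The "process on demand" principle can be sketched with a generator. Everything here is an assumption made for the example: fetchPage stands in for a database query or an API call, TOTAL_RECORDS is an arbitrary size, and lazyPages is a hypothetical helper. The point is that only one page is held at a time, and a page is loaded only when the consumer asks for it.

```javascript
// Hypothetical data source: stands in for a database query or API call
// returning at most `limit` records starting at `offset`.
const TOTAL_RECORDS = 10;
function fetchPage(offset, limit) {
  const end = Math.min(offset + limit, TOTAL_RECORDS);
  const page = [];
  for (let id = offset; id < end; id += 1) {
    page.push({ id });
  }
  return page;
}

// Generator that loads a page only when the consumer requests the next value.
function* lazyPages(pageSize) {
  let offset = 0;
  while (true) {
    const page = fetchPage(offset, pageSize);
    if (page.length === 0) return;
    yield page;
    offset += pageSize;
  }
}

const pages = lazyPages(4);
const firstPage = pages.next().value;   // only the first 4 records have been loaded so far

const pageSizes = [firstPage.length];
for (const page of pages) {
  pageSizes.push(page.length);          // each remaining page is loaded, used, then dropped
}
```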

To avoid strong references where they are not needed, consider the weak-reference APIs: WeakMap and WeakSet in JavaScript, WeakHashMap in Java.
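As a brief sketch of the idea in JavaScript, a WeakMap can attach extra data to an object without keeping that object alive. The names metadata, track, and infoFor are hypothetical. The moment of collection cannot be observed from code, so the example only shows the API usage.

```javascript
// Metadata keyed by object: the entry does not keep the key object alive.
const metadata = new WeakMap();

function track(node, info) {
  metadata.set(node, info);
}

function infoFor(node) {
  return metadata.get(node);
}

let node = { id: 'sidebar' };
track(node, { visits: 3 });

const isTracked = metadata.has(node);
const visits = infoFor(node).visits;

// Once the last strong reference is gone, the entry becomes eligible for
// garbage collection. With a regular Map, it would stay in memory.
node = null;
```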

To cache data, we can adopt a mechanism such as LRU (least-recently-used) by setting a max number of the most recently used items that we want to keep.
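A minimal LRU cache can be sketched on top of a JavaScript Map, which iterates its keys in insertion order. The LRUCache class below is a hypothetical illustration rather than a production-ready implementation.

```javascript
class LRUCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();        // Map iterates keys in insertion order
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    this.entries.delete(key);        // re-insert to mark the key as most recently used
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used one.
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }
}

const cache = new LRUCache(2);
cache.set('a', 1);
cache.set('b', 2);
cache.get('a');      // 'a' becomes the most recently used entry
cache.set('c', 3);   // the cache is full, so 'b' is evicted
```

Because the size is bounded, the cache can never grow until it exhausts the Heap, whatever the number of distinct keys.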

To keep functions and variables properly scoped, we can adopt a Functional Programming approach. For JavaScript, we should avoid global variables, properties attached to window, and global listeners as much as possible.

Memory leaks

While the GC effectively handles a good portion of memory, it doesn’t guarantee a foolproof solution to memory leaking. The GC is pretty smart, but not flawless. Memory leaks can still sneak up. Memory leaks are a genuine problem in Java. Understanding Memory Leaks in Java | Baeldung

I like this explanation. It sums up everything I want to explain in this article: a GC is important but not sufficient. We shouldn’t rely on the GC to clean everything up for us, but we should help it do its job well!

I think you are surprised that we can encounter memory leaks despite the presence of a GC. Let’s look together!

A memory leak is a situation where objects in the heap are no longer in use, but the garbage collector is unable to remove them from memory. Hence, they are unnecessarily maintained. When does this situation occur?

In Java, memory leaks can occur due to:

  • static variables
  • unclosed resources (make a new connection or open a stream)
  • whenever a class’ finalize() method is overridden, then objects of that class aren’t instantly garbage collected. Instead, the GC queues them for finalization, which occurs at a later point in time
  • when using ThreadLocal, each thread holds an implicit reference to its own copy of the ThreadLocal variable instead of sharing the resource across multiple threads, and that copy stays in memory as long as the thread is alive

In JavaScript, memory leaks can occur due to:

  • Undeclared or accidental global variables
  • Forgotten setTimeout and setInterval
  • Out of DOM reference or Detached node: nodes that have been removed from the DOM but are still available in the memory
  • Uncleaned DOM event listener
  • WebSocket subscription and request to an API
  • React components that perform state updates and run asynchronous operations can cause memory leak issues if the state is updated after the component is unmounted.
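Several of these leaks share the same remedy: whatever we start or subscribe to, we must also stop or unsubscribe from. The sketch below is one possible shape for this, with hypothetical names (startPolling, dispose). It assumes an environment that provides EventTarget and Event, such as a browser or a recent Node.js version.

```javascript
// Starts a timer and a listener, and returns a handle that undoes both.
function startPolling(target, onTick, intervalMs) {
  let events = 0;
  const onMessage = () => {
    events += 1;
  };

  const timerId = setInterval(onTick, intervalMs);
  target.addEventListener('message', onMessage);

  return {
    eventCount: () => events,
    dispose() {
      clearInterval(timerId);                            // no forgotten setInterval
      target.removeEventListener('message', onMessage);  // no leftover listener
    },
  };
}

const target = new EventTarget();
const subscription = startPolling(target, () => {}, 1000);

target.dispatchEvent(new Event('message'));  // counted
subscription.dispose();
target.dispatchEvent(new Event('message'));  // ignored: the listener is gone
```

In a React component, a function like dispose is typically what we would return from the effect that created the subscription, so that it runs when the component is unmounted.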

Wow! That’s scary knowing that JS and Java memory management uses garbage collection!

Multithreading and Race condition

In various parallel programming models, processes or threads share a common address space, which they read from and write to asynchronously. In this model, all of them have equal access to shared memory.

A race condition occurs in a shared-memory program when two threads access the same variable concurrently and at least one of them performs a write. Because the accesses are not ordered, the result depends on which thread gets there first.

For information, it is safe for several threads to try to read a shared resource as long as they do not try to modify it.

Any system in which code runs concurrently can be vulnerable to a race condition attack!

In a race condition, shared memory can be corrupted by threads.

Various mechanisms such as locks (synchronized) and semaphores may be used to control access to the shared memory.
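JavaScript code normally runs on a single thread, but asynchronous tasks can interleave in a similar way: a task reads a value, yields, and writes back a stale result. The sketch below illustrates the locking idea under that assumption. Mutex, pause, and demo are hypothetical names, and the lock is a minimal promise chain, not a replacement for real thread synchronization.

```javascript
// Minimal promise-based lock: each task waits for the previous one to finish.
class Mutex {
  constructor() {
    this.tail = Promise.resolve();
  }

  runExclusive(task) {
    const result = this.tail.then(() => task());
    this.tail = result.catch(() => {});   // keep the chain usable if a task fails
    return result;
  }
}

const pause = () => new Promise(resolve => setTimeout(resolve, 0));

async function demo() {
  const unsafe = { value: 0 };
  const safe = { value: 0 };
  const lock = new Mutex();

  // Read, yield to other tasks, then write back: a classic lost update.
  const increment = async counter => {
    const current = counter.value;
    await pause();
    counter.value = current + 1;
  };

  // Both tasks read 0 before either writes, so one increment is lost.
  await Promise.all([increment(unsafe), increment(unsafe)]);

  // With the lock, the second task starts only after the first has written.
  await Promise.all([
    lock.runExclusive(() => increment(safe)),
    lock.runExclusive(() => increment(safe)),
  ]);

  return { unsafe: unsafe.value, safe: safe.value };
}
```

Calling demo() resolves to an object in which the unprotected counter has lost one update while the protected one has not.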

Java offers several data structures allowing concurrent access: DelayQueue, BlockingQueue, ConcurrentMap, ConcurrentHashMap.

Conclusion

What a journey into memory! It was very instructive!

We saw how the memory of a program is laid out and how it is managed: manually or automatically via ARC or GC.

We have seen that automatically managing memory is important but not sufficient. Some memory issues can occur even in a run-time with a GC.

Buffer overflow, buffer over-read, null dereference, dangling and wild pointers, stack overflow, and out of memory (OOM) can show up in many programming languages. They are more related to an “unsafe” coding defect than to an internal problem in the programming language or the runtime.

The most important piece of advice is to take a defensive approach when writing code. A program should keep running properly even when it meets unforeseen conditions or unexpected inputs.

That’s all, folks, for this journey. Happy reading!

Thank you for reading my article.

Want to Connect?
You can find me at GitHub: https://github.com/helabenkhalfallah