Why Data Types Matter
How you handle data types may lead to a number of vulnerabilities or odd behavior attackers can abuse
One of my posts that may later become a book on Secure Code. Also one of my posts on Application Security.
You’ve probably already heard about data types if you’ve been programming for any length of time. But have you dug into the details? Why should you care about data types?
In some languages, you have to declare the correct data type for a variable in your code. Other languages try to infer the type of data you want to use when you assign a value to a variable. Who cares what the data type is? Just let the programming language figure it out, right? Well, let’s take a look at how that might work out.
Problems with mismatched data types
Most programming languages include something called primitive data types. These data types are defined within the language itself, and each has certain properties. For example, Java’s int data type holds a whole number between -2,147,483,648 and 2,147,483,647.
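Java exposes these limits as constants on its wrapper classes (Byte, Short, Integer, Long), so you don’t have to memorize them. The sketch below prints each integer type’s range and adds a small helper, fitsInInt (an illustrative name, not a JDK method), that checks whether a larger value can be stored in an int without changing.

```java
// Sketch: print the ranges of Java's integer primitives from the JDK's own constants.
final class IntegerRanges {

    // Illustrative helper: can this long be stored in an int without changing?
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);

        System.out.println(fitsInInt(3_000_000_000L)); // false
    }
}
```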
Let’s look at data types in Java and C#. What do you notice about the data types in these two charts?
Java
https://www.w3schools.com/java/java_data_types.asp

C#
https://www.w3schools.com/cs/cs_data_types.php

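Java has a 2-byte short data type, and the C# chart doesn’t list one. (C# does have a short type, System.Int16, but types that share a name don’t always behave the same way: a Java byte is signed, from -128 to 127, while a C# byte is unsigned, from 0 to 255.) When processing numeric values such as in banking systems, you’ll need to ensure that systems that integrate with each other use common data types. If using two different languages, you’ll need to define the minimum and maximum values allowed in each system to prevent bugs.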
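Here is what a byte mismatch between the two languages looks like from the Java side. C# byte values are unsigned (0 to 255), while Java’s byte is signed (-128 to 127). This is a minimal sketch assuming a hypothetical C# service sends the value 200, which is a legal C# byte. A blind cast to Java’s signed byte silently turns it into -56. The requireInRange helper is an illustrative name, not a library method. It rejects values outside the range both systems agreed on.

```java
// Sketch: a value that is valid in one system silently changes in another.
// Assumes a hypothetical C# sender that uses an unsigned byte (0 to 255).
final class ByteMismatch {

    // Hypothetical helper: rejects values outside the agreed range
    // instead of letting a narrowing cast wrap them around.
    static int requireInRange(int value, int min, int max) {
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                "value " + value + " is outside the agreed range " + min + " to " + max);
        }
        return value;
    }

    public static void main(String[] args) {
        int fromCSharp = 200; // legal for a C# byte

        // Narrowing to Java's signed byte keeps only the low 8 bits: 200 - 256 = -56.
        byte narrowed = (byte) fromCSharp;
        System.out.println(narrowed); // -56

        // If you know the sender meant an unsigned byte, you can recover it.
        System.out.println(Byte.toUnsignedInt(narrowed)); // 200

        // Better: validate against the agreed range before converting.
        try {
            requireInRange(fromCSharp, Byte.MIN_VALUE, Byte.MAX_VALUE);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```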
Now consider some of the numeric data types in Microsoft’s version of SQL, Transact-SQL.

Here the data types are even more granular, which conserves space in a database where storage and processing time are crucial to performance.
One of the most common causes of batch job failures at the bank where I worked was importing data from external systems that sent numbers larger than our system could process. That often caused a production incident. Sometimes an on-call team member would have to respond to a phone call and write a query to manually adjust the system.
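One way to turn that kind of failure into a rejected record instead of a crashed job is to check each incoming number against the target column before loading it. The sketch below is an illustration, not the bank’s actual code: it assumes a hypothetical DECIMAL(12, 2) amount column and uses Java’s BigDecimal to count integer and fractional digits. Replace the precision and scale with your own schema’s values.

```java
import java.math.BigDecimal;

// Sketch of a pre-import check for a hypothetical DECIMAL(12, 2) column:
// 12 digits in total, 2 of them after the decimal point.
final class AmountImportCheck {
    static final int PRECISION = 12;
    static final int SCALE = 2;

    // True if the raw text is a number that fits the column without overflow or rounding.
    static boolean fitsColumn(String raw) {
        if (raw == null) {
            return false;
        }
        final BigDecimal value;
        try {
            value = new BigDecimal(raw.trim());
        } catch (NumberFormatException e) {
            return false; // not a number at all
        }
        // Ignore trailing zeros such as the ones in "100.00" when counting digits.
        BigDecimal stripped = value.stripTrailingZeros();
        int fractionDigits = Math.max(stripped.scale(), 0);
        int integerDigits = stripped.precision() - stripped.scale();
        return fractionDigits <= SCALE && integerDigits <= PRECISION - SCALE;
    }

    public static void main(String[] args) {
        String[] incoming = {"1234567890.12", "12345678901.00", "0.001", "12O.50"};
        for (String raw : incoming) {
            System.out.println(raw + " -> " + (fitsColumn(raw) ? "load" : "reject and report"));
        }
    }
}
```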
That manual query or production fix usually doesn’t get the same scrutiny and testing that code gets during development. At some organizations, a “production support” or “operations” team makes this correction. That team is usually doing its best with the information it has, but often doesn’t fully understand the functionality and nuances of the application.
What could go wrong when someone manually writes a query that impacts financial records? If you aren’t already imagining a number of scenarios, I’ll help. What if two people collude to write a query that sends money to the wrong bank account, taking advantage of a process that gets less scrutiny? What if the person has a bug in a hastily written manual query that wipes out valid records or posts incorrect values, and nobody notices? What if the person writes a query that fixes the problem but deliberately introduces a system vulnerability, along with an explanation that non-technical people don’t understand?
I provided another example of what could go wrong with incorrect data types in my last post on secure transactions. The scenario I described involved a flat file, a CSV file, that contained the wrong data. Someone might put a string in a column that should contain an integer. The CSV file doesn’t care, but that value could trigger an error in a downstream system. In my example, the error led to a free wire transfer for a customer, because the code failed to wrap a set of related operations in a transaction.
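Because a CSV file enforces no types, the importing code has to. A minimal sketch, assuming a hypothetical two-column file of account_id,amount_cents with no quoted fields: it parses and type-checks every row before returning any of them, so a single bad value stops the whole batch up front instead of leaving it half processed. Real CSV files with quotes or embedded commas need a proper CSV parser.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: validate every row of a hypothetical "account_id,amount_cents" file
// before any row is processed. Does not handle quoted fields.
final class CsvTypeCheck {

    static final class Row {
        final long accountId;
        final long amountCents;

        Row(long accountId, long amountCents) {
            this.accountId = accountId;
            this.amountCents = amountCents;
        }
    }

    // Returns all rows only if every line has two integer fields; otherwise throws.
    static List<Row> parseAll(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String[] fields = lines.get(i).split(",", -1);
            if (fields.length != 2) {
                throw new IllegalArgumentException("line " + (i + 1) + ": expected 2 fields");
            }
            try {
                rows.add(new Row(Long.parseLong(fields[0].trim()),
                                 Long.parseLong(fields[1].trim())));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("line " + (i + 1) + ": value is not an integer", e);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> file = List.of("1001,2500", "1002,abc", "1003,700");
        try {
            parseAll(file);
        } catch (IllegalArgumentException e) {
            System.out.println("whole file rejected: " + e.getMessage());
        }
    }
}
```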
Incorrect data types lead to unexpected results
The other day I noticed the following post on Twitter. Notice anything a bit odd?

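The last line parses 0.0000005 as 5. That happens because parseInt converts its argument to a string first, and JavaScript writes 0.0000005 as “5e-7”; parseInt then reads digits until it reaches the non-digit “e” and returns 5. What could go wrong?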
Let’s say an application processes dividends that could produce a number with a lot of decimal places. Instead of a fraction of a cent, the person receiving this dividend would receive $5.00. Now say hundreds of thousands of people receive that dividend. That could add up!
Many people commented on this post. Some of them wrote about how ridiculous it is that JavaScript behaves this way.
Others had a different perspective. They chastised anyone who would write code like this, because the parseInt function expects a string rather than a number. Who would be so foolish as to use a function incorrectly?
JavaScript is one of the languages that let you create variables without specifying a data type. Do you always check what data type a JavaScript function expects before you call it? I suspect many programmers don’t. Some aren’t aware that they should. Others are in a hurry to get something done and don’t think about the risk, or don’t know it exists.
Another potential problem is simply making a mistake. Yes, you’re all perfect, right? I’m not. I could easily see myself doing this when I’m quickly writing some code to accomplish a task. You know you’re supposed to pass in a string, and that you should understand all the inputs to a function before calling it. But while typing fast, you forget to put quotes around the value. Oops. It happens.
On the other hand, shouldn’t this function check for proper inputs? Why do you, as a programmer, need to understand how every single function you call works? Shouldn’t the function protect you from mistakes like this? Some programming languages do and others don’t. It may be a pain to define your data types upfront, but a language that enforces correct data types should protect you from problems like this.
What’s the takeaway? A programming language that forces you to define data types can lead to fewer errors. The programmer has to choose the correct data type when defining a variable, and the language throws an error if the programmer passes a variable whose data type doesn’t match the function’s signature. In a compiled language such as Java, that error appears at compile time, before the code ever runs. This makes programming in that language more complicated but can prevent errors such as the one above.
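For comparison, here is roughly what the parseInt mistake looks like in Java, where Integer.parseInt is declared to take a String. Passing the double directly does not compile, so that line is shown commented out. Converting the double to text first produces “5.0E-7”, which Integer.parseInt rejects with a NumberFormatException instead of quietly returning 5. The parsesAsInt helper is a hypothetical name for illustration.

```java
// Sketch: Java's type checking stops the parseInt(0.0000005) mistake.
final class ParseCheck {

    // Hypothetical helper: true only if the text is a valid int literal.
    static boolean parsesAsInt(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        double tiny = 0.0000005;

        // int n = Integer.parseInt(tiny); // does not compile: parseInt expects a String

        // Converting explicitly shows the scientific notation, much like JavaScript's "5e-7".
        String text = Double.toString(tiny);
        System.out.println(text);              // 5.0E-7
        System.out.println(parsesAsInt(text)); // false: rejected, not silently 5
    }
}
```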
This is how the languages you choose for creating your applications can affect your security, by the way. When you choose to use a particular language for applications involving sensitive data, dig into the details of how they work under the hood to prevent security problems later. If you choose to use a language that doesn’t enforce data types you’ll need to ensure your application checks that the right type of data is leveraged throughout to prevent related security problems.
Vulnerabilities and unexpected behavior in primitive data types
Sometimes the data types in your favorite programming language have a vulnerability. Of course, you’ll need to be aware of those and update to the latest version as quickly as possible to ensure your system is not vulnerable to attack. However, you can also learn from these vulnerabilities and ensure when you implement your own data types they do not have similar problems.
Primitive data types may also have unexpected behavior. Be aware of these issues and select data types appropriately for each piece of application functionality. For example, binary floating-point types such as Java’s double cannot represent most decimal fractions exactly, so currency calculations that use them can produce rounding errors. If you are using Java, you’ll want to avoid the double data type in financial applications. Dig into the details of the data types you select and make sure they are appropriate for your application.
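A quick way to see the problem is to add ten cents ten times. The sketch below does that with double and with Java’s BigDecimal, the subject of the article linked below. It builds each BigDecimal from a String; new BigDecimal(0.10) would copy the double’s binary approximation instead. The method names are illustrative.

```java
import java.math.BigDecimal;

// Sketch: summing ten cents ten times with double versus BigDecimal.
final class MoneyTypes {

    static double sumDouble(double amount, int times) {
        double total = 0.0;
        for (int i = 0; i < times; i++) {
            total += amount;
        }
        return total;
    }

    // Builds the amount from a String so BigDecimal holds exactly 0.10.
    static BigDecimal sumBigDecimal(String amount, int times) {
        BigDecimal step = new BigDecimal(amount);
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < times; i++) {
            total = total.add(step);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumDouble(0.10, 10));       // 0.9999999999999999
        System.out.println(sumBigDecimal("0.10", 10)); // 1.00
    }
}
```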
This article goes into a lot more detail on Java data types and currency calculations if you are interested:
https://www.infoworld.com/article/2071332/the-need-for-bigdecimal.html
A related issue involves functions that round numbers. This article from Microsoft includes rounding discrepancies in C#:
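Rounding differs between functions and modes even for the same input. C#’s Math.Round, for instance, uses banker’s rounding (round half to even) by default, so Math.Round(2.5) returns 2. The Java sketch below shows three related pitfalls: HALF_UP and HALF_EVEN disagree on 2.5, Math.round breaks ties toward positive infinity so -2.5 becomes -2, and rounding a value that was already stored as a double can go the wrong way, because 2.675 as a double is slightly less than 2.675. The round helper is an illustrative name.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch: the same inputs rounded by different rules.
final class RoundingDemo {

    static BigDecimal round(String value, int places, RoundingMode mode) {
        return new BigDecimal(value).setScale(places, mode);
    }

    public static void main(String[] args) {
        // Tie-breaking rules disagree on halves.
        System.out.println(round("2.5", 0, RoundingMode.HALF_UP));   // 3
        System.out.println(round("2.5", 0, RoundingMode.HALF_EVEN)); // 2

        // Math.round breaks ties toward positive infinity.
        System.out.println(Math.round(2.5));  // 3
        System.out.println(Math.round(-2.5)); // -2

        // 2.675 stored as a double is really 2.67499999..., so it rounds down.
        System.out.println(new BigDecimal(2.675).setScale(2, RoundingMode.HALF_UP)); // 2.67
        System.out.println(round("2.675", 2, RoundingMode.HALF_UP));                 // 2.68
    }
}
```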
I wrote a blog post about a problem with foreign currencies in ColdFusion back in 2010:
From some tax documentation I am reading for the Sales Tax Online web tax calculation component:
“You may encounter a precision problem for very large amounts when using the ColdFusion number type. Because this type has a very large exponential range, it necessarily sacrifices precision in the number of significant digits it can carry. This will not generally be a problem with amounts expressed in currencies that have only two significant fractional digits (e.g., US Dollars), however, foreign currencies can have as many as four significant fractional digits. A value with a large non-fractional portion could suffer a rounding error in the fractional digits.”
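The same trade-off applies to Java’s double, which carries roughly 15 to 17 significant decimal digits. A minimal sketch with a deliberately large, made-up amount that has four fractional digits: near one quadrillion, adjacent double values are 0.125 apart, so the final .0001 simply disappears. BigDecimal keeps every digit. The losesDigits helper is an illustrative name.

```java
import java.math.BigDecimal;

// Sketch: a large amount with four fractional digits, stored two ways.
final class LargeAmountPrecision {

    // Hypothetical helper: true if converting the text to double changes its value.
    static boolean losesDigits(String amount) {
        BigDecimal exact = new BigDecimal(amount);
        BigDecimal viaDouble = new BigDecimal(Double.parseDouble(amount));
        return exact.compareTo(viaDouble) != 0;
    }

    public static void main(String[] args) {
        String amount = "1000000000000000.0001";

        double asDouble = Double.parseDouble(amount);
        System.out.println(new BigDecimal(asDouble).toPlainString()); // 1000000000000000
        System.out.println(new BigDecimal(amount).toPlainString());   // 1000000000000000.0001
        System.out.println(losesDigits(amount));                      // true
    }
}
```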
To help discover issues with unexpected behavior, test applications thoroughly at the boundaries of each input. Bounds testing means that if you allow values from 1 to 10 for a particular variable, you test values at and just outside those boundaries: 0, 1, 10, and 11. You’ll probably also want to test negative numbers, because sometimes people add functionality or use data types that drop the sign, and that leads to unexpected results.
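Those boundary cases translate directly into a small table of inputs and expected results. A minimal sketch in plain Java (no test framework assumed), using a hypothetical isValidQuantity rule for the 1 to 10 range, plus the extreme int values for good measure:

```java
// Sketch: boundary tests for a hypothetical rule that quantity must be 1 to 10.
final class BoundsTest {

    static boolean isValidQuantity(int quantity) {
        return quantity >= 1 && quantity <= 10;
    }

    public static void main(String[] args) {
        // Values at and just outside each boundary, plus negatives and extremes.
        int[] inputs =       {Integer.MIN_VALUE, -1,    0,     1,    10,   11,    Integer.MAX_VALUE};
        boolean[] expected = {false,             false, false, true, true, false, false};

        int failures = 0;
        for (int i = 0; i < inputs.length; i++) {
            boolean actual = isValidQuantity(inputs[i]);
            if (actual != expected[i]) {
                failures++;
                System.out.println("FAIL: " + inputs[i] + " returned " + actual);
            }
        }
        System.out.println(failures == 0 ? "all boundary tests passed" : failures + " failures");
    }
}
```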
You could use automation to help you find discrepancies between data types and functions. I haven’t done this myself, but it would be pretty interesting. If you try it out, give my blog post a plug! Write code that loops through numbers, passes the same numeric inputs to different data types and functions, and compares the results. Any time the results don’t match, you have a potential issue you’ll want to understand before using that data type or function.
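Here is one way such a comparison might look; treat it as a starting point rather than a finished tool. It feeds every amount from 1.000 to 1.999 into two rounding approaches, one using double arithmetic and one using BigDecimal with HALF_UP, and collects the inputs where they disagree. The method names are illustrative. An input such as 1.005 shows up in the list because 1.005 * 100 evaluates to 100.49999999999999 in double arithmetic.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

// Sketch: differential test of two ways to round an amount to cents.
final class RoundingDiff {

    // Approach A: double arithmetic, as often seen in quick code.
    static BigDecimal roundWithDouble(BigDecimal amount) {
        double d = amount.doubleValue();
        return BigDecimal.valueOf(Math.round(d * 100) / 100.0);
    }

    // Approach B: decimal arithmetic with an explicit rounding mode.
    static BigDecimal roundWithBigDecimal(BigDecimal amount) {
        return amount.setScale(2, RoundingMode.HALF_UP);
    }

    // Runs the same inputs, 1.000 through 1.999, through both approaches.
    static List<String> findMismatches() {
        List<String> mismatches = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            BigDecimal amount = BigDecimal.ONE.add(BigDecimal.valueOf(i, 3));
            BigDecimal a = roundWithDouble(amount);
            BigDecimal b = roundWithBigDecimal(amount);
            if (a.compareTo(b) != 0) {
                mismatches.add(amount.toPlainString() + ": double=" + a + ", BigDecimal=" + b);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<String> mismatches = findMismatches();
        System.out.println(mismatches.size() + " mismatches");
        mismatches.forEach(System.out::println);
    }
}
```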
Creating your own data types
In addition to primitive data types, languages may allow you to create your own data types. You may create data types with the definition of your choosing. Some languages allow you to create objects.
I’m not going to explain objects in complete depth here as you should look up and understand how objects work in your programming language of choice. But I want to address the fact that when creating and using your own data types you have some of the same issues you do when using primitive data types. You need to ensure the correct data types are in use within your own objects.
When you create an object it often has characteristics called properties and methods. Properties leverage primitive data types to describe the object within your system. These larger groupings of primitive data types are expected to have certain characteristics and as you use your object you should validate the integrity of the data contained in your object. Prevent assignment of incorrect data types to object properties to prevent security errors.
Methods are actions that your object takes within an application. These methods often receive data types as input. Just like properties, those data types passed in as arguments to a method could be primitive data types or other objects. Ensure the data types your method receives are correct to prevent a myriad of security problems.
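A minimal sketch of both ideas in Java, using a hypothetical Account class: the constructor validates the properties, and the deposit method validates its argument before changing any state. The ten-digit account ID and the two-decimal-place limit are assumptions chosen for illustration.

```java
import java.math.BigDecimal;
import java.util.Objects;

// Sketch: a hypothetical Account type that validates its properties
// and its method arguments before changing state.
final class Account {
    private final String accountId;
    private BigDecimal balance;

    Account(String accountId, BigDecimal openingBalance) {
        Objects.requireNonNull(accountId, "accountId");
        // Assumed rule for illustration: account IDs are exactly ten digits.
        if (!accountId.matches("[0-9]{10}")) {
            throw new IllegalArgumentException("accountId must be ten digits");
        }
        this.accountId = accountId;
        this.balance = requireMoney(openingBalance);
    }

    // Method arguments get the same checks as properties, plus a sign check.
    void deposit(BigDecimal amount) {
        BigDecimal checked = requireMoney(amount);
        if (checked.signum() <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balance = balance.add(checked);
    }

    String accountId() {
        return accountId;
    }

    BigDecimal balance() {
        return balance;
    }

    // Rejects null and any amount with more than two decimal places.
    private static BigDecimal requireMoney(BigDecimal amount) {
        Objects.requireNonNull(amount, "amount");
        if (amount.stripTrailingZeros().scale() > 2) {
            throw new IllegalArgumentException("amount has more than two decimal places");
        }
        return amount;
    }
}
```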
A security problem caused by passing the wrong data type into an application is known as object type confusion. Object type confusion can lead to a number of problems, up to executing commands on a remote machine. This book is about securing code, so I’m not going into all the details of how these attacks work and the potential damage. If you want to look at one of these attacks in more detail, check out this blog post from Microsoft involving an Adobe Flash vulnerability.
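In memory-unsafe languages such as C and C++, type confusion is dangerous because the program can end up treating memory as the wrong type. Java checks casts at run time, so a wrong cast fails with a ClassCastException instead. Checking the type before using the object avoids even that. The describe method below is a hypothetical message handler for illustration.

```java
// Sketch: a hypothetical handler that checks an object's type before using it.
final class TypeCheckedHandler {

    static String describe(Object message) {
        if (message instanceof String) {
            String text = (String) message;
            return "text of length " + text.length();
        }
        if (message instanceof Integer) {
            int number = (Integer) message;
            return "number " + number;
        }
        // Anything else is rejected rather than guessed at.
        String type = message == null ? "null" : message.getClass().getName();
        throw new IllegalArgumentException("unexpected type: " + type);
    }

    public static void main(String[] args) {
        Object payload = Integer.valueOf(42);

        try {
            String forced = (String) payload; // blind cast, checked at run time
            System.out.println(forced);
        } catch (ClassCastException e) {
            System.out.println("blind cast rejected by the JVM");
        }

        System.out.println(describe(payload)); // number 42
    }
}
```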
Here’s an example of a CVE in TensorFlow that allows an attacker to create a model that causes an integer overflow; in other words, the attacker pushes a value into an integer that falls outside the range its data type can hold.
What kind of damage might an integer overflow cause? In one well-known instance, it destroyed a rocket: in 1996, an overflow while converting a 64-bit floating-point value to a 16-bit integer led to the loss of the first Ariane 5 shortly after launch. You can research more examples of why you don’t want to accept data larger than the expected data type can hold, including buffer overflows. These types of vulnerabilities have caused numerous security problems over time.
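Java doesn’t crash on integer overflow; by default it silently wraps around, which can be worse. A minimal sketch: adding 1 to the largest int produces the most negative int, while Math.addExact and Math.toIntExact (both in the JDK since Java 8) throw an ArithmeticException instead. The safeAdd wrapper is an illustrative name.

```java
// Sketch: silent wraparound versus checked arithmetic.
final class OverflowDemo {

    static int safeAdd(int a, int b) {
        return Math.addExact(a, b); // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        int balance = Integer.MAX_VALUE;

        // Plain int arithmetic wraps around without any error.
        System.out.println(balance + 1); // -2147483648

        try {
            safeAdd(balance, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }

        // Narrowing a long that doesn't fit is also checked with toIntExact.
        try {
            Math.toIntExact(3_000_000_000L);
        } catch (ArithmeticException e) {
            System.out.println("value too large for int");
        }
    }
}
```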
Here’s another vulnerability due to failing to check data types in PHP that leads to sensitive information exposure:
Some programming languages help you prevent these problems with strong type checking. If they don’t, you should check those data types yourself. In a language like Java, you declare the data type when you create a variable and for the parameters of functions you define, and the compiler checks that the types match. Java advertised that it prevented buffer overflows when it arrived on the scene. It does, mostly, through runtime bounds checks on arrays, type checking, and error handling (a topic covered in a prior blog post).
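Here is what Java’s runtime array bounds check looks like in practice. A minimal sketch: the copy loop would overrun an 8-byte buffer in C, but in Java the first out-of-range write throws an ArrayIndexOutOfBoundsException. The copyBounded helper (an illustrative name) shows the better habit of copying only what fits instead of relying on the exception.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: Java's array bounds checks stop a classic buffer overflow.
final class BoundsChecked {

    // Copies at most `capacity` bytes, truncating longer input.
    static byte[] copyBounded(byte[] input, int capacity) {
        return Arrays.copyOf(input, Math.min(input.length, capacity));
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[8];
        byte[] input = "this input is longer than eight bytes".getBytes(StandardCharsets.US_ASCII);

        try {
            // In C, this loop could write past the end of buffer.
            for (int i = 0; i < input.length; i++) {
                buffer[i] = input[i];
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("write past the end of the buffer was stopped");
        }

        System.out.println(copyBounded(input, buffer.length).length); // 8
    }
}
```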
Use proper error handling to ensure your custom objects accept only the expected values and call functions with the proper values, so that unexpected errors don’t introduce security vulnerabilities.
Null
A null value in programming indicates that a variable has no value. It’s empty; nothing is assigned to it. Some programming languages distinguish between null and zero, such as Microsoft’s Transact-SQL. Other languages treat null and zero as one and the same. Some languages use a different name for null values, such as Python’s None. Different languages handle null in different ways, and it is important to understand those distinctions.
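Java is one of the languages that keeps null and zero distinct. A minimal sketch: a boxed Integer can hold null, meaning “no value recorded,” which is different from 0, and unboxing that null in arithmetic throws a NullPointerException rather than quietly treating it as zero. The orZero helper, an illustrative name, makes the choice explicit.

```java
// Sketch: null and zero are different things in Java.
final class NullVsZero {

    // Hypothetical helper that states the null policy explicitly.
    static int orZero(Integer value) {
        return value == null ? 0 : value;
    }

    public static void main(String[] args) {
        Integer unknownBalance = null; // no value recorded yet
        Integer zeroBalance = 0;       // a value of zero was recorded

        System.out.println(zeroBalance.equals(0));  // true
        System.out.println(unknownBalance == null); // true

        try {
            int total = unknownBalance + 10; // unboxing null throws
            System.out.println(total);
        } catch (NullPointerException e) {
            System.out.println("null is not zero");
        }

        System.out.println(orZero(unknownBalance) + 10); // 10
    }
}
```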
Null values are the source of many bugs, as well as security problems. When you forget to assign a value to a variable and pass it to a method, you may see the infamous “null value” error message appear on your screen. Too many times programmers forget to check for null values, and the resulting error is not always descriptive or helpful. Checking a stack trace may lead you back to the offending function or library that caused a system crash with this nondescript error message.
Some data types use a null value to indicate the end of a value. For example, C marks the end of a string with a null character, and code built on C libraries may stop reading a string when it encounters one. Sometimes a buffer will assume it’s at the end when it reaches a null value. Attackers may take advantage of this by inserting null values before the actual end of the value, which can then cause the application to dump memory or execute code passed in after the null value. Sometimes the null value will be obfuscated (obscured or disguised) using encoded values, the topic of a future post.
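A common form of this attack is null byte injection against file names. A minimal sketch, with a made-up upload name: Java strings can contain the NUL character, so a naive endsWith(".jpg") check passes, but any component that treats NUL as the end of the string (as C does) sees report.php instead. The substring call only simulates that C-style view. Rejecting NUL characters outright, as the hypothetical rejectNullBytes helper does, closes the gap.

```java
// Sketch: why an embedded NUL character can defeat a file-name check.
final class NullByteCheck {

    // Hypothetical helper: refuse any input that contains a NUL character.
    static String rejectNullBytes(String input) {
        if (input.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("input contains a null character");
        }
        return input;
    }

    public static void main(String[] args) {
        String upload = "report.php\0.jpg"; // attacker-supplied name (made up)

        // The naive check looks at the whole Java string and passes.
        System.out.println(upload.endsWith(".jpg")); // true

        // Simulates code that stops reading at the first NUL, as C string functions do.
        String cView = upload.substring(0, upload.indexOf('\0'));
        System.out.println(cView); // report.php

        try {
            rejectNullBytes(upload);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```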
Null values in financial applications can also cause unwanted errors and unexpected results. Is the null treated as a zero? As an empty string? As nothing at all? Perhaps your application’s programming language handles null differently than your database does. When a null value occurs in a batch processing job, it may cause the system to crash. A null value may also alter the outcome of a calculation. Always check for null values in stored procedures or functions used for financial calculations.
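To see how much the choice matters, consider averaging three payments where one is missing. SQL’s AVG ignores NULLs, so it averages only the values present; code that treats null as zero divides by three instead. A minimal sketch of both policies with BigDecimal (the method names are illustrative). It uses Arrays.asList because List.of rejects null elements.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Sketch: two null policies give two different averages for the same data.
final class NullPolicy {

    // Policy A: skip nulls, as SQL's AVG does. Throws ArithmeticException if no values are present.
    static BigDecimal averageSkippingNulls(List<BigDecimal> amounts) {
        List<BigDecimal> present = amounts.stream()
            .filter(Objects::nonNull)
            .collect(Collectors.toList());
        BigDecimal sum = present.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
        return sum.divide(BigDecimal.valueOf(present.size()), 2, RoundingMode.HALF_EVEN);
    }

    // Policy B: treat each null as zero.
    static BigDecimal averageNullAsZero(List<BigDecimal> amounts) {
        BigDecimal sum = amounts.stream()
            .map(a -> a == null ? BigDecimal.ZERO : a)
            .reduce(BigDecimal.ZERO, BigDecimal::add);
        return sum.divide(BigDecimal.valueOf(amounts.size()), 2, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        List<BigDecimal> payments =
            Arrays.asList(new BigDecimal("100.00"), null, new BigDecimal("200.00"));
        System.out.println(averageSkippingNulls(payments)); // 150.00
        System.out.println(averageNullAsZero(payments));    // 100.00
    }
}
```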
You should always test your applications with null values for all the different inputs to ensure you do not get unwanted results. Also test the word ‘null’, because sometimes programmers inadvertently check for the string ‘null’ instead of an actual null value. The programming language itself may have an issue with the string ‘null’ as well.
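The difference between a real null and the text “null” is easy to get wrong. A minimal sketch, using a hypothetical license plate field: the buggy check treats the word “null” as a missing value, so a plate that actually reads NULL would be handled as if no plate were entered. The correct check looks only for an actual null or blank input.

```java
// Sketch: a hypothetical plate field checked two ways.
final class NullInputTest {

    // Buggy check: treats the literal text "null" as a missing value.
    static boolean isMissingBuggy(String plate) {
        return plate == null || plate.equalsIgnoreCase("null");
    }

    // Correct check: only an actual null or blank input is missing.
    static boolean isMissing(String plate) {
        return plate == null || plate.trim().isEmpty();
    }

    public static void main(String[] args) {
        String[] inputs = {null, "", "null", "NULL", "ABC123"};
        for (String input : inputs) {
            String shown = input == null ? "<null>" : "\"" + input + "\"";
            System.out.println(shown + " buggy=" + isMissingBuggy(input)
                + " correct=" + isMissing(input));
        }
    }
}
```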
Speaking of unwanted results involving null values, here’s a hilarious presentation called “Go Null Yourself” by someone who thought it would be fun to get a license plate with the word “null” on it. I highly recommend this entertaining presentation on what can go wrong with unexpected processing of null values. I doubt you can watch it without at least cracking a smile.