Summary

The article revisits the famous quote by Phil Karlton that "There are only two hard things in Computer Science: cache invalidation and naming things," and argues that these two problems are more pervasive and fundamental to programming than often acknowledged, suggesting that almost everything in Computer Science can be difficult.

Abstract

The author of the article, a software engineer, reflects on the well-known adage attributed to Phil Karlton, which humorously identifies cache invalidation and naming as the only two hard problems in Computer Science. The piece delves into the significance of these issues, highlighting the challenges of naming conventions in code and the complexities of caching mechanisms, which are central to programming due to the foundational principles of the Von Neumann architecture. The author extends the concept of caching to include basic data assignment in programming, arguing that managing data effectively is a substantial part of what makes programming difficult. The article suggests that the two problems are not just challenges but fundamental aspects of Computer Science, with implications that make most programming tasks inherently complex. The author also touches on the potential of Functional Programming to mitigate some of these difficulties by managing data within a localized scope and reducing side effects.

Opinions

The author believes that the quote by Phil Karlton is an understatement and that the two problems it highlights are indicative of the broader complexity of Computer Science.
There is a hint of self-deprecation as the author admits to creating classes with vague names like "Manager," "Handler," or "Helper," acknowledging the common struggle with naming things clearly and meaningfully.
The author asserts that caching, a fundamental aspect of programming, is akin to life in the field, emphasizing its ubiquity and necessity.
The article posits that almost everything in Computer Science is difficult, contrary to the implication of Karlton's quote that most issues are relatively easy.
The author suggests that Functional Programming can simplify some aspects of programming by localizing data management and reducing side effects, but also recognizes that it is not a one-size-fits-all solution.
The piece concludes that while making programming tasks stateless could simplify them, it is not always feasible, and the goal should be to shift tasks from the "hard" to the "easy" column where possible.

Debunking the infamous “Only two hard problems in Computer Science”

As a software engineer, one of my favorite sayings has always been:

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

Is it a joke, a truism, or both? The saying comes up fairly often during design (naming) and debugging why you have the wrong value (caching). With regards to naming, for example, browse through your Git repo and count the number of class names that end in Manager, Handler, or Helper. What exactly is the responsibility of these classes? (Guilty admission: I just created a new Handler class yesterday).

Not to mention the challenge with arriving on consistent, clear domain names for entities related to the actual capability you are trying to build. There is an entire book and discipline centered around the idea that building software is much easier if we simply use the same names for things throughout the documentation, design, code, and interfaces.

And then there is caching. Trying to make something faster before we know its slow is a favorite past time of engineers. Even those of us that learned this the hard way catch ourselves doing it sometimes. For an insightful and comic take on this, see why Donald Knuth claimed that “premature optimization is the root of all evil”.

Back to Phil Karlton’s quote, one can take either view point. Its either a literal observation or an extremely funny joke (keeping in mind there is a hint of truth in every joke). In either case, the implication is clear: most issues conversely in Computer Science are less (or not that) difficult. They are what we like to call ‘tractable’ or ‘solvable’ problems. Given some reasonable amount of time, we can accomplish these things.

However, as a relatively new panelist on the Ruby Rogues podcast, I participated in a discussion this week that changed my entire view on this. A light bulb went off, and I realized that the reality of programming means we should actually infer the opposite. Almost everything in Computer Science is difficult (or at least can be made so).

Caching is Life

The Dani Rojas character on the show Ted Lasso likes to say that football is life. In many ways, caching is life in programming.This is why the two hard problems quote is wildly underestimated in its vast scope.

Why do I say this? A 15-second interlude on the foundational paradigm of computing helps answer this. Almost all common programming falls under the Von Neumann architecture where only two things matter: data and instructions. Our programs read and write data, and they execute instructions against that data. That’s it at the most basic level.

Think back to Assembly language if you were ever (un)fortunate enough to try it out (only one semester in college for me, thank you very much), and you’ll better appreciate this concept. Thankfully, we have many abstractions and frameworks that now sit on top of this but the underlying concepts remain.

I’m skeptical, show me the code

Consider a simple assignment statement in your program. This may take various forms including one of the following:

# Example 1
a = 10

# Example 2
sum = add(5, 7)

# Example 3
price = get_bitcoin_price_in_usd()

In these examples, we are caching the expression on the right-hand-side so that we can reference it in the future using the variable name on the left-hand-side. But wait, you say, this isn’t caching. Caching is when I get something from the database and save it in memory, so I don’t have to hit the database again for that item.

That is true. That is an example of caching. However, so are the three code samples above. They apply the same concept in that I don’t have to do-the-thing-I-need-to-do-to-get-the-data again.

A simple constant value assignment (Example #1) is not that interesting. Example #2 always results in the same value. The answer could be memoized, another form of caching, but it’s hardly worth it in this case.

Finally, Example #3 provides a clear analogy for our purposes here. As anyone who follows cryptocurrency knows, the second after you get the price of Bitcoin, your cache should likely be invalidated as the price can change quickly.

Half of programming is hard, half is easy

By extension then, one of the two foundational elements of programming is challenging, i.e. managing data in our programs. Anytime we reference data using a variable, it can be viewed as caching. Based on a simplistic interpretation of Von Neumann’s model, this is half of what we do!

Phil Karton was not only right, he was more right than perhaps even he imagined. The two problems we face are not limited though, they are fundamental to our discipline.

So what is the net result of this observation? We can’t get rid of data because I can’t think of a useful program that doesn’t involve some kind of data.

Purists will tell you that Functional Programming is the answer. All data is managed within a localized scope (the function), so although you strictly speaking still have our academic view of caching, you eliminate side-effects and minimize the blast radius of continuing to hold onto (potentially wrong) values. Our variable only matters during that single invocation of the function.

Consider again our assignment statement to retrieve the price of Bitcoin. While the price value will undoubtedly change over time, we can make meaningful decisions about it during the lifetime of a function invocation, such as whether we should buy, sell, or hodl (not a typo, see background here).

Although I am a huge fan of Functional Programming, you should always use the right tool for the job. It is great in some use cases, and terrible at others. For example, user interfaces inherently maintain state and are typically best implemented using objects. What did the user type into your form? Simply reference the corresponding object’s text_box.value. At the same time, extensive use of stateful objects in workflow systems can lead to unintended side effects.

Or consider another (Ruby) code snippet to further illustrate the point from a different angle.

# Smaller chance of side effects with assingment to a
def my_function
    a = 10
    # do some great stuff
end

# Bigger chance of side effects (dollar sign is a global var)
$a = 10

Stateless will generally always be simpler

If life were only that easy, we would make everything stateless. Alas however, we can’t always use this approach without making some significant tradeoffs. But we can look for opportunities to do so. We don’t have to use objects for everything. More generally, we don’t need to maintain state outside of the database in many use cases.

Our goal can be to shift more of our programming tasks from the hard column over to the easy column. Now the challenge is just coming up with the right set of instructions. Oh yeah, and naming those functions.

If you are looking for a new job in tech and need to prepare for interviews, find out more using my 7-hour plan.