Teri Radichel

Should We Apply Statistics to Cybersecurity Risk Decisions?

ACM.2 Considering different methods of risk analysis

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

⚙️ Check out my series on Automating Cybersecurity Metrics. The Code.

🔒 Related Stories: Cybersecurity | Cybersecurity Math | Governance

💻 Free Content on Jobs in Cybersecurity | ✉️ Sign up for the Email List

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

https://en.wikipedia.org/wiki/File:Standard_Normal_Distribution.png

In the last post I reviewed the book How to Measure Anything in Cybersecurity Risk. I listened to parts of it again while researching and writing this blog post.

The book covers various formulas and methods to calculate the probability that an organization will have a data breach.

While listening to the methods in this book I had three questions:

1. How accurate are the methods recommended in this book?
2. Do the methods reduce the chance an organization will have a data breach?
3. How do these methods compare with others used to quantify and evaluate cybersecurity risk?

Accuracy of actuarial methods applied to cybersecurity

If you work as an actuary in the cyber insurance industry, you will definitely be applying statistical methods to datasets to try to determine how to price cybersecurity policies. That is a given.

We can look to the insurance industry to see how well the application of statistical methods is working at the moment. Insurance companies need to predict how many of the companies to whom they sell policies will have data breaches and how much they will have to pay out in order to set the rates for all their customers.

The article explains that insurance companies are struggling to correctly calculate risk and set rates for cyber insurance:

The overarching issue is that cyber is a sizable insurance market, and it is relatively new. There just isn’t enough good data and loss experience to properly underwrite the risk.

That statement is at odds with the book on cybersecurity metrics that says you don’t need a lot of data to calculate risk. The article also notes the variables and constant change related to data breaches make it hard to assess and calculate cybersecurity risk:

“Cyber is, of course, still emerging and evolving. But whether or not it’s a new insurance line, so much in cyber is conditional — it is dynamic and changing, and that is driving the number and nature of the claims that insurers are seeing,” he added.

Another article states that cyber insurance companies need more money because they underestimated the risk and what they will need to pay out:

Tangent: Where is that money going when an insurance company pays on a claim? It’s going towards lawsuits, fines, and ransomware payments. You could say all that money is going into the pockets of lawyers, criminals, and governments. The insurance payouts aren't stopping the breaches or helping the actual victims. And by the way, if you're investing in Bitcoin and other crypto, you're essentially investing in criminal enterprises.

Insurance companies are raising rates in response to initially underestimating the risk and cost of providing cyber insurance.

Insurance companies are making adjustments in an effort to get these calculations right and prevent losses. Direct-written premiums collected by the largest U.S. insurance carriers in 2021 swelled by 92% year-over-year, according to information submitted to the National Association of Insurance Commissioners, an industry watchdog, and compiled by ratings firms. Analysts say that the increase primarily reflects higher rates, rather than insurers significantly expanding the amount of money they are willing to cover.

Insurance companies will certainly improve their risk metrics over time. They have to because they need to set rates appropriately or they will lose money. As you will see if you read on, one of the drivers of producing better risk metrics involves a more accurate data model or process simulation.

If predicting the probability of a data breach is critical to the success of companies providing cyber insurance and they can’t get it right, how likely is it that an organization with many other competing objectives will accurately predict its own cybersecurity risk? Of course, that doesn’t mean we shouldn’t try. It just points to a good chance that we might get it wrong.

Eliminating coverage for events that are too difficult to predict

One method insurance companies can use to reduce their own risk is to stop insuring the most risky and expensive scenarios. That’s what health insurance companies did when they stopped insuring people with the most expensive medical conditions. Home insurers do this by excluding natural disasters.

That’s also what Lloyd’s of London did by excluding nation state attacks.

What constitutes a nation state attack? Just about everything these days is attributed to some organization connected to a government. That new policy amendment could apply to a large number of claims. What if attribution is unknown? The article states that the insurance company will decide.

This approach seems to give the insurance company a lot of leeway on whether or not it will pay a claim for a data breach. Unfortunately, an organization facing the possibility of a data breach cannot simply exclude nation state attacks as a possibility to reduce its own risk. As I discuss in my book, you might want to check if your insurance policy excludes acts of war.

Statistics for Individuals vs. Populations

I read a comment from someone that highlights the difference between an actuary trying to set rates for an insurance company and an individual organization trying to predict if and when it will have a data breach. The person’s misconception lay in the fact that he thought an actuary could predict when he would die.

An actuary will be calculating risk for a population of insured customers to set a rate to charge customers that will still allow the insurance company to make money. The actuary may state that 5% of the population will get sick from a particular ailment based on trends and statistics and 1% will die. But the actuary cannot predict exactly when an individual in that population will die and how.
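The gap between a population rate and an individual outcome can be sketched with a small simulation. This is only an illustration: the 1% rate echoes the example above, while the population size, the seed, and the function name `simulate_population` are assumptions made for the sketch.

```python
import random

def simulate_population(size, death_rate, seed):
    """Return one boolean per member: True means that member died in the period."""
    rng = random.Random(seed)
    return [rng.random() < death_rate for _ in range(size)]

# Hypothetical inputs: a 1% death rate applied to 100,000 insured members.
DEATH_RATE = 0.01
population = simulate_population(size=100_000, death_rate=DEATH_RATE, seed=1)

# The rate across the whole population lands close to the assumed 1%.
observed_rate = sum(population) / len(population)
print(f"Observed death rate across the population: {observed_rate:.4f}")

# The same rate says very little about any single member. All the model can
# offer for member 0 is "about a 1% chance", not when or whether it happens.
print(f"Outcome for member 0: {'died' if population[0] else 'survived'}")
```

The population-level number is stable enough to price a policy. The single-member outcome is simply whichever way that one draw happened to fall.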

How meaningful to an individual organization is a probability of a data breach that is based on industry-wide statistics? It does provide some insight into the risk associated with the decision whether to fix a known security problem. But it can’t definitively answer the question “Are we going to have a data breach if we don’t fix that problem?”

You have two ways of looking at a security vulnerability, both of which involve quantifiable methods. One is more likely to prevent a data breach than the other:

On the one hand, you can estimate the probability of a security event based on a particular configuration within your organization that attackers have exploited at other companies. You decide to take a chance that the event will not happen based on the entire population of companies and the percent that have been affected by this vulnerability exploit (which you probably don’t really know). In this scenario, you haven’t reduced your risk; you’re simply taking a chance based on the odds. The cost if you are wrong and get attacked through this configuration is most likely in the realm of the average cost of a data breach.

Conversely, you may decide to shore up security defenses related to a known bad security configuration. That choice provides a concrete risk reduction. It’s not a probability, an estimate with a range of confidence, or a guess. This choice offers a measurable risk reduction based on a reduction of the quantity of security configurations that exist within the organization that could lead to a data breach. The cost if you are wrong and do not get attacked is the cost of the security control.
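A minimal sketch of what counting known-bad configurations might look like follows. The inventory, the resource names, and the helper `count_bad` are all hypothetical; a real count would come from whatever configuration data your organization can actually query.

```python
# Hypothetical inventory: True marks a resource that still has the
# known-bad security configuration.
inventory = {
    "storage-a": True,
    "storage-b": False,
    "storage-c": True,
    "storage-d": True,
    "storage-e": False,
}

def count_bad(configs):
    """Count resources that still have the known-bad configuration."""
    return sum(1 for is_bad in configs.values() if is_bad)

before = count_bad(inventory)

# Remediate two of the known-bad resources (an assumed remediation step).
remediated = {**inventory, "storage-a": False, "storage-c": False}
after = count_bad(remediated)

# The reduction is a count of things that no longer exist, not an estimate.
reduction = (before - after) / before
print(f"Known-bad configurations: {before} -> {after} ({reduction:.0%} reduction)")
```

Nothing in this calculation depends on industry-wide breach statistics. The number goes down only when a configuration is actually fixed.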

Cybersecurity decision-making process

Let’s say you have five doors on your house. You have locks on four of the doors. You know that a lock is a pretty good deterrent when it comes to preventing someone from stealing what’s inside your house.

Would you ever just skip installing a lock on the fifth door of a home or business in a large city where crime is prevalent? Do you spend hours calculating probabilities and risks to make that decision? Or do you install the lock based on the information available to you?

Of course, a lock is not perfect security. We all know about lock pickers in cybersecurity. But just because someone could potentially pick the lock, does that mean you should forget about installing the lock on the fifth door?

What could you do about the lock pickers? You could install an alarm. An alarm is an additional expense. Do you need to spend money on an alarm? Well, you could spend time calculating the probability that burglars will break into your home if you don’t have an alarm. Alternatively, you could do some research and find reports like this:

When selecting a target…Approximately 83 percent said they would try to determine if an alarm was present before attempting a burglary.

You know that your chances of a break-in are lower if you have an alarm. You also know that the cost of stolen valuables plus the cost of the associated actions you will need to take to deal with the break-in (time, money, and opportunity cost) is higher than the cost of the alarm, so you get one.
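One quick way to sanity-check a decision like this without estimating the break-in probability itself is to ask how much the alarm would have to lower that probability to pay for itself. The sketch below assumes a simple expected-cost view, and the dollar amounts and the function name `break_even_reduction` are made up for illustration.

```python
def break_even_reduction(control_cost, loss_if_breached):
    """Smallest drop in event probability at which the control pays for itself.

    Under an expected-cost view, the control is worth buying when
    (probability drop) * (loss if breached) exceeds the control's cost.
    """
    return control_cost / loss_if_breached

# Hypothetical figures: a 500 dollar alarm versus a 10,000 dollar total loss
# from a break-in (stolen valuables plus time, money, and opportunity cost).
ALARM_COST = 500
BREAK_IN_LOSS = 10_000

threshold = break_even_reduction(ALARM_COST, BREAK_IN_LOSS)
print(
    "The alarm pays for itself if it lowers the break-in probability "
    f"by more than {threshold * 100:.1f} percentage points"
)
```

With a threshold this low, a report like the one quoted above may be all the evidence needed, and the decision does not hinge on a precise probability.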

Insurance companies offer a discount for companies that install things like alarms and sprinkler systems. They ask for information about the building where a company is located prior to insuring office space. Perhaps they need to ask more questions about the state of the security configurations at a company before providing cyber insurance.

Statistical tools — Caveats

Statistical tools can be helpful when trying to model and predict events. Clearly they have some limits, as demonstrated by the issues the insurance industry is having right now with cyber insurance. If you plan to use them, be aware of the following caveats.

One chapter in the book introduces Monte Carlo simulations as a means for predicting cybersecurity risk. This type of simulation is used by some financial advisors to predict probable investment outcomes. You can find some criticisms of this particular predictive tool in the following article:

As explained by the investment advisor in this article, the output is highly correlated with the input and could produce wildly different results based on those inputs for a single portfolio. He feels that he was able to produce more realistic predictions by analyzing the portfolio with actual market data over different periods of time. My own thoughts on all of this are aligned with the last line of this article:

The bottom line for investors today, Evensky concludes, is being less concerned with the probability of success and more concerned with the consequences of failure.

According to the explanation in the link below, the accuracy depends on the accuracy of your simulation model. Or as stated here:

As you pointed out, Monte Carlo simulations are simulations. So they will not be accurate unless you simulate the processes in question realistically.

https://www.quora.com/Is-the-Monte-Carlo-method-accurate
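To see how strongly a Monte Carlo result follows its inputs, here is a deliberately simple sketch. It is not the model from the book. It assumes at most one breach per simulated year and a lognormal loss size, and every number in it (breach probabilities, median loss, trial count, seed) is hypothetical.

```python
import math
import random
import statistics

def simulate_annual_loss(breach_probability, median_loss, trials, seed):
    """Monte Carlo estimate of the mean annual loss under a toy model.

    Each trial is one simulated year. A breach occurs with the given
    probability, and if it does, the loss is drawn from a lognormal
    distribution whose median is median_loss.
    """
    rng = random.Random(seed)
    mu = math.log(median_loss)  # the median of a lognormal is exp(mu)
    losses = []
    for _ in range(trials):
        if rng.random() < breach_probability:
            losses.append(rng.lognormvariate(mu, 1.0))
        else:
            losses.append(0.0)
    return statistics.mean(losses)

# Same model, same loss assumptions, three different guesses at the
# annual breach probability.
estimates = {}
for p in (0.02, 0.05, 0.10):
    estimates[p] = simulate_annual_loss(
        p, median_loss=1_000_000, trials=50_000, seed=7
    )
    print(f"Assumed breach probability {p:.0%}: mean annual loss {estimates[p]:,.0f}")
```

The simulation runs the same way each time, yet the estimated loss moves roughly in step with the assumed breach probability. If that input is a guess, the output is a more elaborate version of the same guess.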

One of the other tools used in this book is Bayes’ theorem. As highlighted in the articles below, the accuracy of your model data greatly influences the quality of your predictions.

As long as you use your real prior density in constructing your model, then all Bayesian statistics are admissible, where admissibility is defined as the least risky way to make an estimate.

Validity is maintained as long as the prior probability model is correctly specified regardless of prespecified experimental design.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6406060/

As Yuling says, the full Bayes posterior is the right answer if the model is correct — but the model isn’t ever correct.

Bayes’ Theorem as it relates to false positives and false negatives from covid tests:

If you get a positive result on a Covid test that only gives a false positive one time in every 1,000, what’s the chance that you’ve actually got Covid? Surely it’s 99.9%, right? No! The correct answer is: you have no idea. You don’t have enough information to make the judgment.
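The missing information in that example is the prior: how common the condition is among the people being tested. A short sketch of Bayes’ theorem shows how much the answer moves with that one input. The false positive rate of 1 in 1,000 comes from the quote; the prevalence values and the perfect sensitivity are assumptions chosen only to make the point.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(condition | positive test) from Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

FALSE_POSITIVE_RATE = 0.001  # 1 in 1,000, as in the quote
SENSITIVITY = 1.0            # assumed: the test never misses a real case

# Two hypothetical priors: 1 in 10,000 people infected versus 1 in 100.
rare = posterior_given_positive(0.0001, SENSITIVITY, FALSE_POSITIVE_RATE)
common = posterior_given_positive(0.01, SENSITIVITY, FALSE_POSITIVE_RATE)

print(f"Prevalence 1 in 10,000: P(infected | positive) = {rare:.1%}")
print(f"Prevalence 1 in 100:    P(infected | positive) = {common:.1%}")
```

With the same test, a positive result means roughly a 9% chance of infection in one case and roughly a 91% chance in the other. The prior drives the answer, which is why a poorly chosen prior produces a poor prediction.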

Don’t get me wrong. I still think Bayesian methods are great, and I think the proclivity of Bayesian inferences to tend toward the ridiculous is just fine — as long as we’re willing to take such poor predictions as a reason to improve our models. But Bayesian inference can lead us astray, and we’re better statisticians if we realize that.

Cons:

1. Choice of prior. Coming up with a prior that’s well reasoned and actually represents your best attempt at summarizing a prior is a great deal of work in many cases. 
2. It’s computationally intensive. 
3. Posterior distributions are somewhat more difficult to incorporate into a meta-analysis, unless a frequentist, parametric description of the distribution has been provided.
4. Reviewer objections.

If your evidence is flimsy, Bayes’ theorem won’t be of much use. Garbage in, garbage out.

Clearly, using Bayes’ theorem will produce undesirable results if you don’t start with proper assumptions. I’ll let you review the book I’ve reviewed above to see what you think about the proposed model and if you think it will produce results that meet your needs. I’ve already moved on to other methods I’ll be covering in future posts.

The value of probability in cybersecurity

Using statistics to calculate the potential for future data breaches has some merit, especially in the insurance industry. How accurate are these predictions? This post covers some of the caveats of using statistical methods and potentially misleading outcomes. If you plan to use these methods make sure you fully understand the methods and the potential for error.

Alternative ways exist to quantify cybersecurity at an organization and evaluate risk, just like investors use alternative methods to make stock market investments. Follow me or sign up for the email list for more in this series like my next post on A Value-Based Approach to Cybersecurity Metrics.

Follow for updates.

Teri Radichel | © 2nd Sight Lab 2022

About Teri Radichel:
~~~~~~~~~~~~~~~~~~~~
⭐️ Author: Cybersecurity Books
⭐️ Presentations: Presentations by Teri Radichel
⭐️ Recognition: SANS Award, AWS Security Hero, IANS Faculty
⭐️ Certifications: SANS ~ GSE 240
⭐️ Education: BA Business, Master of Software Engineering, Master of Infosec
⭐️ Company: Penetration Tests, Assessments, Phone Consulting ~ 2nd Sight Lab