Teri Radichel | © 2nd Sight Lab 2020

Security & Machine Learning — Part 1

Basic machine learning terminology

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

🔒 Related Stories: AI & Machine Learning

💻 Free Content on Jobs in Cybersecurity | ✉️ Sign up for the Email List

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We’ve been doing this meetup series on AWS DeepRacer at the Seattle AWS Architects and Engineers meetup. I got one of the cars to use with our group. Our intention was to hold an event and race the car. Unfortunately, COVID hit, and we weren’t able to hold meetups in person, so we are doing this virtually. The great thing is that AWS has a way to set up virtual races, so you don’t even need the physical AWS DeepRacer to participate.

If you are reading this prior to our meetup on Monday, October 5, 2020, you can find out how to submit a model here and compete for prizes! I’ll be posting another blog post tomorrow with some additional information about creating and submitting a model, and you can find links from @teriradichel on Twitter.

I will admit that my first reaction to the AWS DeepRacer was, why would I want to do that? But some people really got into it. I watched some of the DeepRacer League videos, and they are actually quite intriguing to watch. I figured that getting a DeepRacer and using it would motivate me to gain a deeper understanding of ML and how it might apply to things we do at 2nd Sight Lab, like pentesting and analysis of security logs. During the meetup series, Drake Loud from Pariveda Solutions gave an excellent presentation that introduced us to machine learning basics.

Playing with the car was fun, and we will see the results of our race at the final meetup. The available parameters or “features” to set for the vehicle were interesting. I wanted to know more about how to optimize a model to win a race. However, I was also curious to find out how effectively we can apply machine learning to security problems. That led me to pick up this book on the topic: Machine Learning & Security: Protecting Systems with Data and Algorithms.

I will write more about my thoughts on the book and the use of machine learning in security in part two of this series. Initially, I just needed to get a handle on the terminology and general concepts for ML. These are some notes that you may find helpful. They also apply to determining how to set up a model for your car. As I suspected, a successful model that can win a race depends on your training data, the track, and some iterative testing.

Machine learning — issues in cybersecurity:

When it comes to machine learning for security, a few issues exist. I’ve heard some of these before when AWS machine learning experts came to speak at our meetup. These concepts apply not only to security but to using machine learning to solve prediction problems in general:

More data = better results. To get accurate results, you’ll be better off if you have a large volume and variety of data. That’s why I believe ML on cloud platforms is particularly powerful: the cloud providers have a lot of data to leverage.
Some aspects of ML may be overly resource-intensive in certain environments, such as embedded systems.
You may need to optimize for different factors with different datasets and problems. As an example, AWS offers several tracks on which you can test your model. The variation in tracks will undoubtedly affect how well your car performs. You may optimize your model for one track, and then it performs miserably on another.
ML may be unnecessary where a simple rule will suffice. That is something I’ve thought about from the start. In some cases, machine learning may be overkill for the objective you’re targeting, especially when you don’t have enough data to build a generalized model that works in all circumstances.

Machine Learning Terminology

Whenever I start learning a new topic, I try to grasp the broad concepts and terminology first. Later I delve into the details. Here are some machine learning terms to help you get started. You can also watch Drake Loud’s presentation on the topic.

Machine Learning: Statistical learning algorithms capable of generalizing abstractions (models) by analyzing and interpreting datasets.

Supervised: Learning from labeled data, that is, previously observed examples whose categories or values are already known. For example, you might have defined features of different types of animals in your initial data set. From that you derive an algorithm to categorize new data sets with unidentified animal types.

Unsupervised: Abstractions from unlabeled data sets that you can apply to new datasets. In this case you may be looking for clusters of observations with similar properties.

Classification: Assigning observations to categories.

Regression: Predicting numeric properties related to observations. Estimating relationships between variables.

Artificial Intelligence: Algorithmic solutions to complex problems typically solved by humans. The book uses the example of a self-driving car.

Deep Learning or Neural Network: A strict subset of machine learning that uses layers of simpler statistics to learn about data.

Pattern recognition: Find common characteristics in the data that indicate an attack.

Anomaly detection: Creating a baseline of normal activity and determining when actions deviate from that baseline.
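The baseline idea can be sketched in a few lines. This is a minimal illustration, assuming the baseline is just the mean and standard deviation of hypothetical hourly failed-login counts, and that anything more than three standard deviations away is anomalous; the threshold of 3 is an arbitrary assumption, and real detectors use much richer baselines.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Summarize normal activity as (mean, sample standard deviation)."""
    return mean(history), stdev(history)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose distance from the mean exceeds `threshold` std devs."""
    mu, sigma = baseline
    if sigma == 0:
        # No variation in the baseline: anything different is unusual
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical hourly failed-login counts observed during normal operation
history = [4, 5, 3, 6, 5, 4, 5, 6, 4, 5]
baseline = build_baseline(history)
print(is_anomalous(5, baseline))   # False
print(is_anomalous(40, baseline))  # True
```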

Stemming: Removing morphological affixes from words for more flexible pattern matching, for example, so that different forms of the same verb (“connect”, “connected”, “connecting”) match.
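To show the idea, here is a deliberately naive suffix-stripping stemmer. The suffix list and the minimum stem length are assumptions made for illustration; real stemmers such as the Porter algorithm (available in NLTK as `PorterStemmer`) apply many more rules.

```python
# Hypothetical suffix list, longer suffixes checked first;
# real stemmers use far more rules than this
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

words = ["connect", "connected", "connecting", "connects"]
print({naive_stem(w) for w in words})  # {'connect'}
```

All four variants collapse to the same stem, so a pattern written for “connect” would match every form.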

Fuzzy Hashing or LSH (Locality-Sensitive Hashing): A method to find values that are similar (only slightly different) rather than exact matches.
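One classic locality-sensitive hashing scheme is MinHash, which estimates the Jaccard similarity of two sets. The sketch below assumes strings are broken into 3-character shingles and uses seeded MD5 digests as the family of hash functions; both choices are illustrative assumptions. Fuzzy hashing tools such as ssdeep work differently (context-triggered piecewise hashing), but the goal of matching near-duplicates is the same.

```python
import hashlib

def shingles(text, k=3):
    """Break text into the set of overlapping k-character substrings."""
    return {text[i : i + k] for i in range(len(text) - k + 1)}

def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_hashes):
        signature.append(
            min(
                int(hashlib.md5(f"{seed}:{item}".encode()).hexdigest(), 16)
                for item in items
            )
        )
    return signature

def estimated_similarity(sig_a, sig_b):
    """The fraction of matching positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

# Hypothetical command lines: a and b differ by one character, c is unrelated
a = minhash_signature(shingles("powershell -enc SQBFAFgA"))
b = minhash_signature(shingles("powershell -enc SQBFAFgB"))
c = minhash_signature(shingles("ls -la /home"))
print(estimated_similarity(a, b))
print(estimated_similarity(a, c))  # 0.0
```

The two near-identical command lines share most shingles and therefore most signature positions, while the unrelated one shares no shingles at all, so its estimate is zero.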

Term Frequency/Inverse Document Frequency (TF-IDF): A better indicator of a word’s importance in a text sample than raw word counts, because it normalizes each word’s count by how common that word is across all documents.
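A minimal TF-IDF computation, assuming whitespace tokenization, term frequency as count divided by document length, and the unsmoothed `idf = log(N / df)`. That is only one of several common variants; scikit-learn’s `TfidfVectorizer`, for example, smooths the IDF by default. The log lines are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per (non-empty) document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

logs = [
    "user login success",
    "user login success",
    "user login failure root",
]
w = tf_idf(logs)
print(w[2])
```

Words that appear in every log line (“user”, “login”) get a weight of zero, while the rare words in the last line (“failure”, “root”) stand out.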

Ensemble/Stacked Generalization/Stacking: Taking advantage of the strengths of different methods of machine learning.

Overfit: A model tied too tightly to the training dataset that doesn’t generalize to other, larger, or more varied data sets.

Time Series Analysis: Regression problems for which inputs have a time dimension (often system logs).

Cluster Analysis: Grouping data points so that points in the same group (cluster) are more similar to one another than to points in other groups.

Feature: Properties of something observed by a machine learning algorithm.

One-hot encoding: Representing a categorical value as a set of binary features, where each row in the dataset has precisely one of those features set to one.
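A small pure-Python sketch of one-hot encoding, using hypothetical network protocol names as the categorical values. Each resulting column also behaves like a dummy variable. Sorting the categories is an arbitrary choice for a stable column order; libraries such as pandas (`get_dummies`) and scikit-learn (`OneHotEncoder`) provide production versions.

```python
def one_hot(values):
    """Map each categorical value to a binary vector with exactly one 1."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows

cats, rows = one_hot(["tcp", "udp", "icmp", "tcp"])
print(cats)  # ['icmp', 'tcp', 'udp']
print(rows)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```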

Dummy Variables (statistics): A variable that takes 0 or 1 as a value to indicate the absence or presence of some category or condition that may influence a prediction. They may be considered numeric stand-ins for qualitative variables.

Training Dataset: The data used to produce a model.

Model: An algorithm that takes in data points and outputs predictions.

Model family: The universe of models to choose from for a particular problem.

Cost function or Loss function: Quantitatively compare different models. Measures the cost or loss of wrong predictions.

Optimization algorithm: Helps choose the best model.

The goal of a Machine Learning Algorithm: Find parameters that produce predicted labels and minimize loss.

Convergence: Satisfactory optimization result.

Categorical Variable: Can take one of several usually fixed values.

Continuous Variable: Can take an uncountable set of values (any value within a range).

Discrete Variable: Can take one of a set number of fixed values.

Linear regression: Find an equation to accurately predict a numeric outcome based on the values of one or more input variables. Continuous outcome. Regression algorithm.
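For a single input variable, the least-squares line has a closed form: the slope is the covariance of x and y divided by the variance of x. The sketch below assumes one input and uses hypothetical data (requests per minute versus CPU utilization).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: requests per minute vs. CPU utilization (%)
xs = [10, 20, 30, 40]
ys = [15, 25, 35, 45]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 1.0 5.0
```

The prediction for a new input is simply `slope * x + intercept`, a continuous number rather than a class label.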

Logistic regression: Requires more data. In some cases, linear regression will produce results more efficiently. Calculates the probability of a particular response value based on a set of input variables. Discrete outcome. Classification algorithm. Outputs a probability between 0 and 1.
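A minimal logistic regression trained with batch gradient descent on log loss, to show how the sigmoid turns a linear score into a probability between 0 and 1. The learning rate, epoch count, and data (failed logins per minute labeled as malicious or not) are all hypothetical; libraries use better optimizers and regularization.

```python
import math

def sigmoid(z):
    """Squash any real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Batch gradient descent on log loss for p(y=1) = sigmoid(w * x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. score
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical data: failed logins in a minute -> 1 if the session was malicious
xs = [0, 1, 2, 3, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(sigmoid(w * 1 + b))  # below 0.5
print(sigmoid(w * 9 + b))  # above 0.5
```

Thresholding the output probability (commonly at 0.5) turns it into the discrete class label.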

Decision trees: Supervised learning model that is a binary tree data structure. Simple to understand (a series of binary decisions). Efficient. May suffer from overfitting and not appropriate for some kinds of relationships. Tend to be less accurate than other methods.

Gini impurity: The probability that a sample from a subset would be labeled incorrectly if it were labeled randomly according to the distribution of labels in that subset. A higher-quality decision tree splits a set into subsets cleanly separated by their labels, resulting in a lower Gini impurity score.
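Gini impurity works out to one minus the sum of the squared class proportions. A short sketch with hypothetical “benign” and “malicious” labels shows that a pure subset scores 0 and an even two-class mix scores 0.5.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["benign"] * 4))               # 0.0
print(gini(["benign", "malicious"] * 2))  # 0.5
```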

Variance reduction: The total reduction of variance in a set after splitting it in two. The best split is the one that provides the largest variance reduction.

Information gain: The reduction in entropy from splitting a data set, measured as the parent node’s entropy minus the weighted entropy of its child nodes. A useful split produces child nodes with less entropy than the parent node.
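Information gain can be computed directly from that definition. This sketch assumes Shannon entropy in bits and weights each child by its share of the parent’s samples; the labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["benign"] * 4 + ["malicious"] * 4
perfect_split = [["benign"] * 4, ["malicious"] * 4]
useless_split = [["benign", "benign", "malicious", "malicious"]] * 2
print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A split that separates the classes perfectly removes all of the parent’s entropy, while a split that leaves each child just as mixed as the parent gains nothing.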

Random Forests: Simple ensembling of multiple decision trees.

Gradient-Boosted decision trees (GBDTs): Leverage intelligent combinations of individual decision tree predictions to get better overall predictions.

Support vector machine (SVM): Linear classifier that leverages hinge loss.

Hinge loss: A loss function that penalizes only incorrect predictions and correct predictions very close to the line (decision boundary) between correct and incorrect. Looks at margins.
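The standard hinge loss for a label `y` in {-1, +1} and a raw classifier score is `max(0, 1 - y * score)`. The scores below are hypothetical and show the three cases described above.

```python
def hinge_loss(y, score):
    """Hinge loss for a label y in {-1, +1} and a raw classifier score.

    Zero when the prediction is correct with a margin of at least 1;
    otherwise it grows linearly as the score approaches or crosses
    the decision boundary.
    """
    return max(0.0, 1.0 - y * score)

print(hinge_loss(+1, 2.5))   # 0.0 (correct, outside the margin)
print(hinge_loss(+1, 0.4))   # 0.6 (correct, but inside the margin)
print(hinge_loss(+1, -1.0))  # 2.0 (incorrect)
```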

Bayes’ Theorem: Describes the probability of an event based on prior knowledge of conditions that might be related to that event.
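A worked example shows why the prior matters so much in security alerting. The numbers are hypothetical: 1% of sessions are malicious, and a detector flags 99% of malicious sessions but also 5% of benign ones. Bayes’ theorem gives the probability that a flagged session is actually malicious.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) = P(E | H) P(H) / P(E), with P(E) expanded over H and not-H."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical rates: P(malicious) = 0.01, P(alert | malicious) = 0.99,
# P(alert | benign) = 0.05
p = posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
print(round(p, 3))  # 0.167
```

Even with a detector that catches 99% of attacks, only about one alert in six is a true positive here, because benign sessions vastly outnumber malicious ones.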

Naive Bayes classifier: Uses Bayes’ theorem to classify events, assuming that all features are independent of one another given the class.

k-Nearest Neighbors: A lazy learning algorithm. Collects (stores) data during training and puts off computations until classification time. Fast to train. Space-inefficient. Rarely seen in practical models. Skewed towards classes with more samples in the training data.
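The “lazy” behavior is easy to see in code: `fit` only stores the data, and all distance computations happen in `predict`. This sketch assumes Euclidean distance, a simple majority vote, and hypothetical two-feature traffic samples; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

class KNearestNeighbors:
    """Lazy learner: fit() only stores data; the work happens in predict()."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Rank every stored point by Euclidean distance to the query
        ranked = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        nearest = [label for _, label in ranked[: self.k]]
        # Majority vote among the k nearest neighbors
        return Counter(nearest).most_common(1)[0][0]

# Hypothetical features: (requests per second, error rate %)
X = [(1, 0.5), (2, 1.0), (1.5, 0.8), (50, 20), (60, 25), (55, 22)]
y = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = KNearestNeighbors(k=3).fit(X, y)
print(model.predict((2, 0.9)))  # normal
print(model.predict((58, 21)))  # attack
```

Because every prediction scans the whole stored data set, memory use and prediction time grow with the training data, which is the space inefficiency noted above.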

Neural Networks or Artificial Neural Networks (ANNs): Loosely modeled after the human brain. Neurons are individual mathematical functions (activation functions, such as step functions) that take weighted input and may emit output that triggers another neuron.

Regularization: Adding a term to the loss function that represents model complexity quantitatively, which penalizes overly complex models and helps reduce overfitting.
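A small sketch of regularization, assuming an L2 (ridge) penalty added to the mean squared error of a linear model; the penalty weight `lam` and the data are hypothetical. The second feature duplicates the first, so both weight vectors fit the data perfectly and only the complexity term tells them apart.

```python
def mse(weights, bias, xs, ys):
    """Mean squared error of a linear model y = w . x + b."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(w * xi for w, xi in zip(weights, x)) + bias
        total += (pred - y) ** 2
    return total / len(xs)

def l2_regularized_loss(weights, bias, xs, ys, lam=0.1):
    """Data loss plus lam * sum(w ** 2); the bias is conventionally not penalized."""
    return mse(weights, bias, xs, ys) + lam * sum(w * w for w in weights)

# Hypothetical data where the second feature duplicates the first
xs = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
ys = [2.0, 4.0, 6.0]
print(l2_regularized_loss([1.0, 1.0], 0.0, xs, ys))   # 0.2
print(l2_regularized_loss([3.0, -1.0], 0.0, xs, ys))  # 1.0
```

An optimizer minimizing the regularized loss would prefer the smaller, simpler weights `[1.0, 1.0]`. L1 (lasso) penalties, which sum absolute values instead, are another common choice.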

Clustering Algorithms: Group data points together. Examples: grouping, k-means, hierarchical, locality-sensitive hashing, density-based spatial clustering of applications with noise (DBSCAN).
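Of the clustering algorithms listed, k-means is the simplest to sketch. This version of Lloyd’s algorithm assumes the first k points as initial centroids, which is a simplification (k-means++ or random restarts are the usual improvements), and uses hypothetical 2-D points forming two well-separated groups.

```python
import math

def k_means(points, k, iterations=100):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest centroid and moving each centroid to the mean of its points."""
    # Simplest possible initialization: the first k points (an assumption)
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = k_means(points, k=2)
print(centroids)  # [(1.5, 1.5), (8.5, 8.5)]
```

Note that k-means requires choosing k up front, whereas DBSCAN discovers the number of clusters from point density and can label outliers as noise.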

How to pick a model: Test different methods.

That last point, test different methods, leads me to my next blog post, where I’ll talk about creating my model and share my thoughts on the application of machine learning to cybersecurity presented in the remainder of the book. In the final post in the series, I cover optimization of machine learning models.

Follow for updates.

About Teri Radichel:
~~~~~~~~~~~~~~~~~~~~
⭐️ Author: Cybersecurity Books
⭐️ Presentations: Presentations by Teri Radichel
⭐️ Recognition: SANS Award, AWS Security Hero, IANS Faculty
⭐️ Certifications: SANS ~ GSE 240
⭐️ Education: BA Business, Master of Software Engineering, Master of Infosec
⭐️ Company: Penetration Tests, Assessments, Phone Consulting ~ 2nd Sight Lab

Need Help With Cybersecurity, Cloud, or Application Security?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
🔒 Request a penetration test or security assessment
🔒 Schedule a consulting call
🔒 Cybersecurity Speaker for Presentation

Follow for more stories like this:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
❤️ Sign Up my Medium Email List
❤️ Twitter: @teriradichel
❤️ LinkedIn: https://www.linkedin.com/in/teriradichel
❤️ Mastodon: @teriradichel@infosec.exchange
❤️ Facebook: 2nd Sight Lab
❤️ YouTube: @2ndsightlab