BlackBerry Blog

Data Science Talks Machine Learning and Artificial Intelligence

Artificial intelligence (AI) and machine learning (ML) are incredibly powerful tools for the security industry as a whole, and their capabilities extend to virtually any other industry as well.

Once I started learning ML while working at Cylance, I realized how powerful a tool it was. It changed how I thought about problems and enabled me to tackle problems at scales I would otherwise have considered impossible. It also freed up my own time: while the ML does the work, I can move on to other tasks, which lets me handle not only harder problems but more of them at once. Both as an employee and as a potential hire for other organizations, my value increased drastically.

Let’s be real: it's also just really cool to work with. Our 2016 Black Hat USA talk on ML in infosec shows a bunch of fun examples:


VIDEO: Cylance Data Science Team at Black Hat 2016

When getting into the ML world, I found that many educational resources were hyper-focused on explaining how every algorithm worked instead of showing how ML could be applied in practice. They also rarely focused on what a developer, researcher, or other practitioner would really need to know to get started with machine learning.

No, you don’t need to be able to do backpropagation by hand to use a neural network, although it does help to understand the theory. For this reason, our new book covers algorithms at a high level and avoids digging too far into the weeds. We also provide examples with immediate takeaways, built on simple tools that implement the machine learning techniques explained in each chapter.

Example Code: https://github.com/CylanceSPEAR/IntroductionToMachineLearningForSecurityPros

Clustering

We start by covering clustering, which is essentially grouping pieces of information by similarity. For instance, if you had a large set of images that you wanted to group by their similarities, clustering would be the natural choice. The chapter also covers features and high dimensionality, which are essential for understanding later chapters.
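To make "grouping by similarity" concrete, here is a minimal sketch that turns a few HTTP requests into numeric feature vectors and compares them with Euclidean distance. The four features and the `extract_features` and `euclidean` helpers are hypothetical choices for illustration, not the book's feature set. Each feature is one dimension, so adding features raises the dimensionality of the space.

```python
import math
from urllib.parse import parse_qs, urlsplit


def extract_features(method: str, url: str, status: int) -> list[float]:
    """Map one HTTP request to a hypothetical feature vector; each entry is one dimension."""
    parts = urlsplit(url)
    return [
        float(len(parts.path)),               # length of the request path
        float(len(parse_qs(parts.query))),    # number of distinct query parameters
        1.0 if method == "POST" else 0.0,     # 1 for POST requests
        1.0 if 400 <= status < 500 else 0.0,  # 1 for client-error responses
    ]


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two feature vectors: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


page_view_1 = extract_features("GET", "/index.html?lang=en", 200)
page_view_2 = extract_features("GET", "/about.html?lang=fr", 200)
login_probe = extract_features("POST", "/wp-login.php?user=admin&pass=x&redirect=/", 403)
```

In this feature space the two page views are identical (distance 0), while the login probe sits √10 ≈ 3.16 away. Because path length spans a much larger range than the binary flags, real pipelines usually scale features before measuring distance.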

We also cover the k-means and DBSCAN clustering algorithms in some depth to show the reader how they work under the hood.
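As a companion to the k-means discussion, here is a minimal pure-Python sketch of Lloyd's algorithm, the standard way k-means is computed. The function name, random initialization from input points, and convergence test are our own illustrative choices, not code from the book's repository. In practice you would normally reach for a library implementation such as scikit-learn's `KMeans`.

```python
import math
import random

Point = tuple[float, ...]


def kmeans(points: list[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> tuple[list[Point], list[int]]:
    """Cluster points into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return centroids, labels
```

Note that k must be chosen up front; DBSCAN, by contrast, infers the number of clusters from point density.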

In this chapter, we use the practical example of clustering HTTP logs to identify groups of similar behavior as well as anomalous behavior.
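One way anomalies fall out of clustering is through DBSCAN's noise label: points that belong to no dense region. The sketch below is a brute-force pure-Python DBSCAN (following the same `eps`/`min_samples` convention as scikit-learn's `DBSCAN`, where a point's neighborhood includes itself). The per-client feature values are hypothetical and assumed to be already scaled; they are not derived from the book's log data.

```python
import math

NOISE = -1


def dbscan(points: list[tuple[float, ...]], eps: float, min_samples: int) -> list[int]:
    """Return a cluster id per point, or NOISE (-1) for points in no dense region."""
    def neighbors(i: int) -> list[int]:
        # Brute-force range query; the neighborhood includes the point itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:  # not a core point (may later become a border point)
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:      # reachable after all: a border point
                labels[j] = cluster
            if labels[j] is not None:   # already assigned; do not expand again
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:  # j is also a core point: keep growing
                queue.extend(j_neighbors)
        cluster += 1
    return labels


# Hypothetical per-client summaries of HTTP activity (e.g., scaled request rate
# and error rate).
clients = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.2),  # one common behavior
           (5.0, 5.0), (5.1, 5.2), (4.9, 5.1),              # a second behavior
           (9.0, 0.5)]                                      # an outlier
labels = dbscan(clients, eps=0.5, min_samples=3)
anomalies = [i for i, lab in enumerate(labels) if lab == NOISE]
```

Here the first four clients form one cluster, the next three form another, and the last client is labeled noise, making it a candidate for anomalous behavior.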

Classification

The classification chapter is the heart of the book, as classification is one of the more powerful methods of machine learning. With classification, you can use ML to make decisions about information: for example, determining whether a file is malicious or benign, or which botnet a command and control panel belongs to.

The example for this chapter focuses on identifying botnet command and control panels, using decision trees and logistic regression to make the classification in very few requests and thereby minimize the noise that a command and control operator might notice.
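To show what the logistic-regression half of that approach computes, here is a minimal pure-Python sketch trained by batch gradient descent. The three binary features (each cheap to collect with a request or two) and the toy labels are hypothetical, not the book's dataset. For real work, scikit-learn's `LogisticRegression` and `DecisionTreeClassifier` are the usual tools.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Estimated probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X: list[list[float]], y: list[int], lr: float = 1.0,
                              epochs: int = 5000) -> tuple[list[float], float]:
    """Fit weights and bias by batch gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical features per candidate server: [login path returns 200,
# page title matches a known panel, favicon matches a known panel].
# Label 1 = command and control panel of the family of interest.
X = [[1, 1, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1],
     [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logistic_regression(X, y)
```

The learned weights are directly interpretable: each one says how much its feature pushes a server toward the "panel" label, which also hints at which requests are worth sending first.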

Probability

The probability chapter covers basic probability and shows how it can be used for both clustering and classification. Probability is a powerful tool for anyone to take advantage of, especially when it comes to ML. I mean, what are the odds that probability won’t be useful to you in the future, no matter what you’re doing?
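As a worked illustration of probability used for classification, Bayes' rule turns word frequencies into a spam probability. The numbers are hypothetical and only demonstrate the arithmetic: assume 15% of messages are spam, the word "free" appears in 30% of spam messages, and it appears in 2% of legitimate ("ham") messages.

```latex
\begin{aligned}
P(\text{spam} \mid \text{free})
  &= \frac{P(\text{free} \mid \text{spam})\,P(\text{spam})}
          {P(\text{free} \mid \text{spam})\,P(\text{spam}) + P(\text{free} \mid \text{ham})\,P(\text{ham})} \\
  &= \frac{0.30 \times 0.15}{0.30 \times 0.15 + 0.02 \times 0.85}
   = \frac{0.045}{0.062} \approx 0.726
\end{aligned}
```

Under these assumptions, seeing "free" raises the probability of spam from the 15% prior to roughly 73%.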

For the example in this chapter, we tackle the classic problem of machine learning in security: identifying spam messages. We apply a small twist to the problem by identifying spam SMS messages.
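A common probabilistic approach to this problem is multinomial naive Bayes, sketched below in pure Python with add-one (Laplace) smoothing. This is our illustration of the general technique, not necessarily the book's exact method, and the six training messages are invented toy data rather than a real SMS corpus.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages: list[str], labels: list[str]) -> "NaiveBayesSpamFilter":
        self.classes = sorted(set(labels))
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for message, label in zip(messages, labels):
            self.counts[label].update(tokenize(message))
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def score(self, message: str, c: str) -> float:
        """log P(c) + sum of log P(token | c): log P(c | message) up to a shared constant."""
        denom = self.totals[c] + self.vocab_size
        return self.log_priors[c] + sum(
            math.log((self.counts[c][tok] + 1) / denom) for tok in tokenize(message))

    def predict(self, message: str) -> str:
        return max(self.classes, key=lambda c: self.score(message, c))


messages = [
    "WINNER! Claim your free prize now, text WIN to 80086",
    "Free entry: reply now to win cash",
    "URGENT: your account has won a free gift, call now",
    "Are we still on for lunch tomorrow?",
    "Can you pick up milk on the way home",
    "Running late, see you at the meeting",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = NaiveBayesSpamFilter().fit(messages, labels)
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and smoothing keeps an unseen word from zeroing out a class entirely.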

Deep Learning

Deep learning is one of the coolest parts of ML, allowing you to teach a virtual brain to tackle any number of problems. It's a powerful tool to understand and use, but it can be quite complex.
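A deep network is built from many simple units, so it can help to see one unit on its own. This sketch shows a single sigmoid neuron, its gradient computed with the chain rule (the core of backpropagation), and a numerical gradient check. The squared-error loss, sigmoid activation, and function names are our illustrative choices. Frameworks compute these gradients automatically for full networks.

```python
import math


def forward(w: list[float], b: float, x: list[float]) -> float:
    """One artificial neuron: a weighted sum of inputs squashed by a sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def loss(w: list[float], b: float, x: list[float], y: float) -> float:
    """Squared error between the neuron's output and the target y."""
    return 0.5 * (forward(w, b, x) - y) ** 2


def backward(w: list[float], b: float, x: list[float], y: float) -> tuple[list[float], float]:
    """Chain rule: dL/dz = (a - y) * a * (1 - a); dL/dw_i = dL/dz * x_i; dL/db = dL/dz."""
    a = forward(w, b, x)
    dz = (a - y) * a * (1.0 - a)
    return [dz * xi for xi in x], dz


def train_step(w: list[float], b: float, x: list[float], y: float,
               lr: float = 0.5) -> tuple[list[float], float]:
    """One gradient-descent update of the weights and bias."""
    grad_w, grad_b = backward(w, b, x, y)
    return [wi - lr * gi for wi, gi in zip(w, grad_w)], b - lr * grad_b


# Gradient check: the analytic gradient should match a central finite difference.
w, b, x, y = [0.3, -0.2], 0.1, [1.0, 2.0], 1.0
analytic = backward(w, b, x, y)[0][0]
h = 1e-6
numeric = (loss([w[0] + h, w[1]], b, x, y) - loss([w[0] - h, w[1]], b, x, y)) / (2 * h)
```

A gradient check like this is a cheap way to confirm a hand-derived gradient before trusting it in training.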

In this chapter, we cover not only the basics of neural networks but also more advanced topics such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, as they are massively powerful.
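The core operation of a convolutional neural network is small enough to write by hand. This sketch applies one hand-set filter to a 1D sequence, then a ReLU and global max-pooling. In a trained CNN the filter weights are learned rather than chosen, and many filters run in parallel. The function names and the "rising step" filter are our own illustrations.

```python
def conv1d(signal: list[float], kernel: list[float], bias: float = 0.0) -> list[float]:
    """'Valid' 1D convolution as CNN layers compute it (strictly, cross-correlation):
    slide the kernel along the input and take a dot product at each offset."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]


def relu(values: list[float]) -> list[float]:
    """Zero out negative responses."""
    return [max(0.0, v) for v in values]


def global_max_pool(values: list[float]) -> float:
    """Keep only the strongest response, regardless of where it occurred."""
    return max(values)


# A hand-set filter that fires on a rising step (a low value followed by a high one).
edge_filter = [-1.0, 1.0]
early = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

Both inputs produce the same pooled response of 1.0 even though the step appears at different positions. Detecting a pattern wherever it occurs is a large part of why convolutions suit sequence data such as bytes or characters.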

For our example, we predict the length of an XOR encryption key from the encrypted text, and we leave room for readers to build their own model that predicts the key itself.
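One way to frame the key-length problem for a learning model is to generate labeled data: encrypt sample text under random repeating XOR keys, then turn each ciphertext into a fixed-length feature vector labeled with the true key length. The byte-coincidence features are a classical cryptanalysis heuristic chosen for illustration. They are an assumption, not necessarily the inputs the book's neural network uses, and all helper names are hypothetical.

```python
import random


def xor_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; running it again with the same key decrypts."""
    return bytes(p ^ key[i % len(key)] for i, p in enumerate(plaintext))


def coincidence_features(ciphertext: bytes, max_shift: int = 16) -> list[float]:
    """For each shift s in 1..max_shift, the fraction of positions i with c[i] == c[i + s].

    If s is a multiple of the key length, c[i] and c[i + s] were XORed with the same key
    byte, so they match exactly when the plaintext bytes match. Natural-language text
    repeats bytes far more often than random data does, so those shifts tend to stand out.
    """
    features = []
    for s in range(1, max_shift + 1):
        pairs = len(ciphertext) - s
        matches = sum(ciphertext[i] == ciphertext[i + s] for i in range(pairs))
        features.append(matches / pairs if pairs > 0 else 0.0)
    return features


def make_training_example(plaintext: bytes, rng: random.Random,
                          max_key_len: int = 16) -> tuple[list[float], int]:
    """Encrypt under a random key; return (features, true key length) as one labeled sample."""
    key_len = rng.randint(1, max_key_len)
    key = bytes(rng.randrange(256) for _ in range(key_len))
    return coincidence_features(xor_encrypt(plaintext, key), max_shift=max_key_len), key_len
```

Generating many such samples from a text corpus yields a dataset on which any classifier, a neural network included, could be trained to predict the key length.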

Introduction to Artificial Intelligence for Security Professionals

Artificial intelligence and machine learning are incredibly powerful and empowering topics for anyone in the security or development field to learn. For that reason, the Data Science team here at Cylance has written a book to help anyone get into the world of ML.

The focus of the book is to let someone without a massive math background learn how ML works and how it can be applied, without needing a PhD.

The book our Data Science team has written is intended as an introduction to machine learning for people in information security or software engineering. Rather than taking a math-heavy approach, it focuses on practical applications to get the reader from novice to capable as quickly as possible.

If you are looking to greatly increase your value as an employee, and even more so as a problem solver, check out our book.

About the Cylance Data Science Team

The Cylance Data Science team consists of experts in a variety of fields. Members of the team who contributed to this book include:

  • Brian Wallace is a security researcher turned data scientist with a propensity for building tools that merge the worlds of information security and data science.
  • Sepehr Akhavan-Masouleh is a data scientist with a Ph.D. from the University of California, Irvine, who works on applying statistical and machine learning models to cybersecurity.
  • Andrew Davis is a neural network wizard wielding a Ph.D. in computer engineering from the University of Tennessee.
  • Mike Wojnowicz is a data scientist with a Ph.D. from Cornell University who enjoys developing and deploying large-scale probabilistic models because of their interpretability.
  • John H. Brock is a data scientist who researches applications of machine learning to static malware detection and analysis. He holds an M.S. in computer science from the University of California, Irvine, and can usually be found debugging Lovecraftian open source code while mumbling to himself about the virtues of unit testing.
The BlackBerry Cylance Data Science and Machine Learning Team

About The BlackBerry Cylance Data Science and Machine Learning Team

The BlackBerry Cylance Data Science and Machine Learning research team consists of experts in a variety of fields. With machine learning at the heart of all of BlackBerry’s cybersecurity products, the Data Science Research team is a critical, highly visible, and high-impact team within the company. The team brings together experts from machine learning, statistics, computer science, computer security, and various applied sciences, with backgrounds including deep learning, Bayesian statistics, time-series modeling, generative modeling, topology, scalable data processing, and software engineering.

What we do:

  • Invent novel machine-learning techniques to tackle important problems in computer security
  • Write code that scales to very large datasets, often with millions of dimensions and billions of attributes
  • Discover ways to strengthen machine learning models against adversarial attacks
  • Publish papers and present research at conferences