Nick Thomas

How Companies Are Detecting Spear Phishing Attacks Using Machine Learning


Spear phishing attacks are becoming more sophisticated and harder to spot, but machine-learning algorithms are helping businesses detect and stop these malicious attacks.


Most people are familiar with phishing, where an attacker sends out a malicious email that pretends to be legitimate. Common phishing emails pretend to be notifications from trusted organizations (banks, Amazon, Netflix, etc.) that require the recipient to log into their account to fix an issue. By setting up a website that mimics the legitimate site, attackers can collect login credentials and other personal information.


Spear phishing emails are a more targeted version of phishing emails. Rather than use a pretext that applies to a large number of people (like an email from a common bank), spear phishers research their intended targets and tailor their malicious emails to that target. A simple example is one where the attacker uses the company logo and the name of a CEO to craft an attack that looks like the CEO ordering an employee to do something for them. Since these phishing emails look more realistic and personal, they are more likely to trick their targets.


The detection of spear phishing emails is a major component of a company's cyberdefense strategy. However, the more sophisticated a phishing email becomes, the more difficult it is to detect. Here we'll discuss some applications of machine learning to detect phishing emails.


Detect spear phishing with machine learning

Machine learning is a powerful tool for analyzing data and extracting patterns and anomalies. However, these algorithms need usable data in order to draw valid conclusions. There are presently three methods by which machine learning and anomaly detection algorithms can be applied to the detection of spear phishing emails.


Social graph analysis

One of the first ways that machine learning can be applied to spear phishing detection is based on a "social graph" of the common communication patterns within a company. For example, members of the same department in the company are expected to communicate frequently and will have a high level of interconnectivity. On the flip side, you don't expect the accounting department intern to be frequently sending emails to the CEO or vice versa.


Building a social graph of a company is straightforward. By observing the information included in the headers of each email sent within the company, connections can be observed without needing to read the contents of the email itself. And by weighting connections between company employees based on frequency of communication, a social graph can be created.
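
As a minimal sketch of that idea, the Python below counts sender-to-recipient pairs using only the From, To, and Cc headers. The function name build_social_graph and the sample addresses are illustrative assumptions rather than any product's actual implementation, and a real deployment would more likely read from mail server logs than from raw strings.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def build_social_graph(raw_messages):
    """Weight each (sender, recipient) edge by how many emails used it.

    Only headers are inspected; message bodies are never read.
    """
    graph = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", []))]
        recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        recipients = [addr.lower() for _, addr in getaddresses(recipient_headers)]
        for sender in senders:
            for recipient in recipients:
                graph[(sender, recipient)] += 1
    return graph


# Hypothetical sample messages, for illustration only.
SAMPLE_MESSAGES = [
    "From: Alice <alice@example.com>\nTo: bob@example.com\nSubject: a\n\nhi",
    "From: alice@example.com\nTo: Bob <bob@example.com>\nCc: carol@example.com\nSubject: b\n\nhi",
]

graph = build_social_graph(SAMPLE_MESSAGES)
for (sender, recipient), weight in sorted(graph.items()):
    print(f"{sender} -> {recipient}: {weight}")
```

Addresses are lowercased so that the same mailbox written with different capitalization maps to a single node.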


Spear phishing attacks usually use a communication path outside of the ordinary within a company. One example is a business email compromise (BEC) attack where the attacker pretends to be someone in authority (like the CEO) and contacts another member of staff with instructions. These attacks are designed to take advantage of employees' instinctive reaction to obey authority.


Social graph analysis can help detect this type of spear phishing attack. By observing the connections used by each email and comparing them to the model, machine-learning algorithms can detect anomalous emails. While these may be legitimate, providing a warning decreases the probability that the recipient will be taken in by an attack.
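
One simple way to picture the comparison step is a threshold on edge weight: if a sender has rarely or never emailed a recipient, the message gets a warning. The sketch below assumes the graph is a mapping from (sender, recipient) pairs to past email counts. The name is_unusual_path and the min_count threshold are hypothetical, and a production system would likely use a learned model rather than a fixed cutoff.

```python
def is_unusual_path(graph, sender, recipient, min_count=3):
    """Return True when this sender has rarely or never emailed this recipient.

    min_count is an assumed, tunable cutoff, not a recommended value.
    """
    return graph.get((sender.lower(), recipient.lower()), 0) < min_count


# Hypothetical edge weights: number of past emails per (sender, recipient) pair.
graph = {
    ("alice@example.com", "bob@example.com"): 42,
    ("ceo@example.com", "cfo@example.com"): 17,
}

incoming = [
    ("alice@example.com", "bob@example.com"),
    ("ceo@example.com", "intern@example.com"),
]
for sender, recipient in incoming:
    if is_unusual_path(graph, sender, recipient):
        print(f"Warning: {sender} rarely emails {recipient}")
```

A flagged message is only a candidate for a warning, since unusual paths can also be legitimate.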


User communication profiling

Everyone has their own unique style and voice that is expressed when writing emails. Some of these are generally applicable (few CEOs include emoticons or text message abbreviations in their official communications) while others are more specific (someone may have a favorite phrase that they use frequently). These idiosyncrasies can be used to help detect and protect against spear phishing emails.


Natural language processing (NLP) is a field dedicated to teaching computers to understand and model language. Using NLP techniques, it is possible to analyze written text and extract identifying features from it. For example, the use of a dangling preposition (like the "for" in "What do you want that for?") is more common in some areas (and the people who grew up in those areas) than others. Also, people have different vocabularies, and a simple statistical analysis of word and phrase choice and preferred sentence structure and complexity can help to differentiate the writing of different people.


Using this type of linguistic analysis, certain types of spear phishing emails can be detected with anomaly detection algorithms. If someone writes an email in their own voice but then signs it as the CEO, it won't fit the profile generated from the CEO's legitimate emails. However, spear phishing emails which use a legitimate email as a template but change the destination of a few links may be undetectable by this method. Using this technique in combination with the others discussed here improves the probability that spear phishing emails will be detected and their recipients warned appropriately.
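
To make the profiling idea concrete, here is a deliberately small sketch that represents each text by how often it uses a handful of common words, averages those vectors into a per-user profile, and scores a new email by cosine similarity. The word list, function names, and sample texts are all assumptions chosen for illustration. Real systems would use far richer features (phrase choice, sentence structure, punctuation) and much more training text.

```python
import math
import re
from collections import Counter

# Assumed, illustrative feature set: relative frequency of a few common words.
FEATURE_WORDS = ("the", "a", "and", "but", "of", "to", "for",
                 "please", "thanks", "i", "we", "you")


def style_vector(text):
    """Relative frequency of each feature word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    counts = Counter(words)
    return [counts[w] / total for w in FEATURE_WORDS]


def build_profile(texts):
    """Average the style vectors of a user's known-legitimate emails."""
    vectors = [style_vector(t) for t in texts]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical training texts and a hypothetical suspicious email.
ceo_profile = build_profile([
    "Please send the report to the board.",
    "Thanks for the update, and please copy the team.",
])
suspect = "lol omg pls send asap"
score = cosine_similarity(ceo_profile, style_vector(suspect))
print(f"similarity to CEO profile: {score:.2f}")
```

A low similarity suggests the email does not match the claimed sender's usual voice. As noted above, an attacker who reuses a legitimate email as a template would score high, so this check is best combined with the others.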


Email structural analysis

When you get an email, what you see in your email client of choice is the sender, recipient, time, subject, message, and attachments. While this is the bulk of the information contained within most emails, it is far from everything in them and this other information can help in detecting spear phishing attacks.


For example, emails contain the chain of IP addresses that an email hopped through from the sender to the recipient. If the sender typically uses Gmail for their mail and rarely travels outside the United States, the originating IP address should probably be a Google server located within the U.S. While the chain of IP addresses can be faked or modified, it typically involves adding additional hops to conceal the originating address. If a user's emails typically take a hop or two to reach their destination and suddenly this one takes five to ten hops, then maybe it deserves additional scrutiny.
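
The hop chain is recorded in an email's Received headers, with each relaying server normally adding one. A rough sketch of the hop-count check, using only Python's standard library, might look like the following. The function names, the slack parameter, and the sample message are assumptions for illustration, and since Received headers added before the message reaches your own servers can be forged, the count is a hint rather than proof.

```python
from email import message_from_string


def hop_count(raw_message):
    """Number of Received headers, i.e. recorded hops, in a raw email."""
    msg = message_from_string(raw_message)
    return len(msg.get_all("Received", []))


def hop_count_is_unusual(raw_message, typical_hops, slack=2):
    """Flag a message whose hop count exceeds the sender's usual count.

    typical_hops would come from the sender's history; slack is an
    assumed tolerance, not a recommended value.
    """
    return hop_count(raw_message) > typical_hops + slack


# Hypothetical message with three recorded hops.
SAMPLE = (
    "Received: from mx.example.com by inbox.example.com; Mon, 1 Jan 2024 10:00:02 +0000\n"
    "Received: from relay.example.net by mx.example.com; Mon, 1 Jan 2024 10:00:01 +0000\n"
    "Received: from sender.example.org by relay.example.net; Mon, 1 Jan 2024 10:00:00 +0000\n"
    "From: alice@example.org\n"
    "To: bob@example.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)

print(hop_count(SAMPLE), hop_count_is_unusual(SAMPLE, typical_hops=2))
```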

Another way to detect spear phishing through structural analysis of emails is observation of the headers that are and are not included in the email. For example, Gmail has several headers (X-GM-Message-State and X-Google-Smtp-Source are examples). If you have an email claiming to be from a Gmail server that lacks these headers or an email not from Gmail that has them, it may be cause for suspicion.
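
A header-presence check along these lines can be sketched in a few lines of Python. The example below uses the two Gmail headers named above and treats a From address ending in @gmail.com as the "claims to be Gmail" signal, which is a simplifying assumption made here. The function name and sample messages are likewise hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr

# Header names as given in the text; lookups on Message objects are
# case-insensitive, so capitalization does not affect matching.
GMAIL_HEADERS = ("X-GM-Message-State", "X-Google-Smtp-Source")


def gmail_header_mismatch(raw_message):
    """Return True when Gmail-specific headers disagree with the From domain."""
    msg = message_from_string(raw_message)
    _, address = parseaddr(msg.get("From", ""))
    claims_gmail = address.lower().endswith("@gmail.com")  # assumed heuristic
    present = [name in msg for name in GMAIL_HEADERS]
    if claims_gmail:
        return not all(present)  # Gmail sender missing expected headers
    return any(present)          # non-Gmail sender carrying Gmail headers


# Hypothetical messages for illustration.
forged = "From: Boss <boss@gmail.com>\nTo: you@example.com\nSubject: wire\n\nhi"
genuine = (
    "X-Google-Smtp-Source: abc\nX-Gm-Message-State: xyz\n"
    "From: boss@gmail.com\nTo: you@example.com\nSubject: lunch\n\nhi"
)
print(gmail_header_mismatch(forged), gmail_header_mismatch(genuine))
```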


By observing and recording common structural details of a user's emails, it's possible to build up a user-specific profile for each employee in an organization. Each new email can then be compared to this profile using anomaly detection algorithms and flagged to notify the recipient if there is reason for suspicion.
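
As one possible illustration of such a profile, the sketch below records how often each header name appears in a user's past emails, then scores a new email by counting usual headers that are missing plus headers never seen before. This counting rule is a stand-in chosen for simplicity, not the anomaly detection algorithm any particular tool uses, and all names and the 0.9 cutoff are assumptions.

```python
from email import message_from_string


def header_names(raw_message):
    """Lowercased set of header names present in a raw email."""
    return {name.lower() for name in message_from_string(raw_message).keys()}


def build_structural_profile(raw_messages):
    """Fraction of a user's past emails in which each header name appeared."""
    if not raw_messages:
        return {}
    counts = {}
    for raw in raw_messages:
        for name in header_names(raw):
            counts[name] = counts.get(name, 0) + 1
    return {name: c / len(raw_messages) for name, c in counts.items()}


def structural_anomaly_score(profile, raw_message, common=0.9):
    """Count usual headers that are missing plus headers never seen before.

    common is an assumed cutoff for what counts as a 'usual' header.
    """
    seen = header_names(raw_message)
    usual = {name for name, freq in profile.items() if freq >= common}
    missing = usual - seen
    unexpected = seen - set(profile)
    return len(missing) + len(unexpected)


# Hypothetical history and new message for illustration.
history = ["From: a@example.com\nTo: b@example.com\nSubject: s\nX-Mailer: m\n\nhi"] * 2
new_email = "From: a@example.com\nTo: b@example.com\nSubject: s\nX-Weird: 1\n\nhi"

profile = build_structural_profile(history)
print(structural_anomaly_score(profile, new_email))
```

A score of zero means the email's header set matches the user's history, and higher scores indicate more structural differences worth a closer look.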


Protecting against phishing attacks

Detecting and protecting against spear phishing attacks is a key component of an organization’s cyberdefense strategy. Ninety-one percent of cyberattacks begin with a phishing email, so an effective spear phishing detection strategy is an important step in protecting an organization's cyberassets.


Many email protection tools provide basic protection against phishing attacks by checking for malicious links or attachments within an email. However, spear phishing attacks do not require this kind of content to function since the goal may be to tell the recipient to wire money to a certain bank account in response to an attached, non-malicious invoice. Selection of an email protection tool that provides these advanced spear phishing detection features will improve an organization’s cybersecurity posture and decrease the probability of a breach.


Credit: Andrew Goldberg

I am Chief Scientist at Inky.com, leading development at Inky Email Protection, an enterprise communications security platform, working to protect corporate email from new breeds of sophisticated phishing attacks. Besides full-stack development, my background includes machine learning, big data, and natural language processing applied to text and communications data.
