What security bugs will be exploited? Researchers create an ML model to find out

Using machine learning trained on data from more than two dozen sources, a team of academic researchers created a model to predict which vulnerabilities are likely to result in a working exploit – a potentially valuable tool that could help companies make better decisions about which software vulnerabilities to prioritize.

The model, called Expected Exploitability, can detect 60% of the vulnerabilities that will have working exploits, with a prediction precision – or “correctness,” in classification terminology – of 86%. A key aspect of the research is allowing some metrics to change over time: not all relevant information is available when a vulnerability is disclosed, and using subsequent events allowed the researchers to refine the accuracy of the prediction.
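As a rough illustration of the two classification metrics just mentioned – recall (the share of truly exploitable bugs the model flags) and precision (the share of flagged bugs that truly become exploitable) – here is a minimal sketch. The counts are invented for demonstration and are not from the paper:

```python
# Illustrative sketch of precision and recall from confusion-matrix
# counts. The numbers below are hypothetical, chosen only so the
# results line up with the figures cited in the article.

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true/false positive and false negative counts."""
    precision = tp / (tp + fp)  # flagged bugs that were truly exploitable
    recall = tp / (tp + fn)     # truly exploitable bugs that were flagged
    return precision, recall

# Hypothetical tally: 60 exploitable bugs caught, 10 false alarms,
# 40 exploitable bugs missed.
p, r = precision_recall(tp=60, fp=10, fn=40)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.86 recall=0.60
```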

By improving exploit predictability, companies can reduce the number of vulnerabilities deemed critical for remediation, but the metric has other uses as well, says Tudor Dumitraș, associate professor of electrical and computer engineering at the University of Maryland at College Park and one of the authors of the research paper, released last week at the USENIX Security Conference.

“Exploitability prediction is not only relevant for companies that want to prioritize patching, but also for insurance companies trying to calculate risk levels and for developers, because it can be a step toward understanding what makes a vulnerability exploitable,” he says.

The research from the University of Maryland at College Park and Arizona State University is the latest attempt to give companies additional insight into which vulnerabilities could be, or are likely to be, exploited. In 2018, researchers from Arizona State University and the USC Information Sciences Institute focused on analyzing Dark Web discussions to find phrases and features that could be used to predict the likelihood that a vulnerability would be, or had been, exploited.

And in 2019, researchers at data research firm Cyentia Institute, RAND Corp. and Virginia Tech presented a model that improved predictions of vulnerabilities that would be exploited by attackers.

Many systems rely on manual processes by analysts and researchers, but the expected exploitability metric can be fully automated, says Jay Jacobs, chief data scientist and co-founder of the Cyentia Institute.

“This research is different because it focuses on detecting all the subtle clues automatically, consistently, and without relying on an analyst’s time and opinions,” he says. “Everything is done in real time and at scale. It can easily keep up with, and evolve with, the flood of vulnerabilities disclosed and published daily.”

Not all features were available at the time of disclosure, so the model also had to account for time and overcome the challenge of so-called “label noise”: when machine learning algorithms assign labels at a static point in time – classifying patterns as, say, exploitable and non-exploitable – those labels can undermine the accuracy of the algorithm if they later turn out to be incorrect.
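The label-noise problem can be sketched in a few lines: a vulnerability labeled “no exploit” at one snapshot may gain a working exploit later, silently flipping its true label. The records below are hypothetical and only illustrate the effect, not the paper’s actual data handling:

```python
from datetime import date

# Hypothetical records: (vuln id, disclosure date, date a working
# exploit appeared, or None if no exploit is known). Labels taken at
# a fixed snapshot can later turn out wrong -- "label noise".
vulns = [
    ("CVE-A", date(2020, 1, 10), date(2020, 2, 1)),
    ("CVE-B", date(2020, 1, 15), None),
    ("CVE-C", date(2020, 1, 20), date(2021, 6, 5)),
]

def label_at(snapshot: date, exploit_date) -> int:
    """1 = a working exploit was known by the snapshot date, 0 = not (yet)."""
    return int(exploit_date is not None and exploit_date <= snapshot)

early = [label_at(date(2020, 3, 1), e) for _, _, e in vulns]
late = [label_at(date(2022, 1, 1), e) for _, _, e in vulns]
print(early, late)  # [1, 0, 0] [1, 0, 1] -- CVE-C's label flips over time
```

A model trained on the early snapshot would learn CVE-C as “not exploitable,” even though that label is later contradicted by events.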

PoCs: security bug analysis for exploitability

The researchers used information on nearly 103,000 vulnerabilities and then compared it with 48,709 proof-of-concept (PoC) exploits collected from three public repositories – ExploitDB, BugTraq, and Vulners – representing exploits for 21,849 distinct vulnerabilities. The researchers also mined social media discussions for keywords and tokens – phrases of one or more words – and created a dataset of known exploits.

However, PoCs aren’t always a good indicator of whether a vulnerability is exploitable, the researchers said in the paper.

“PoCs are designed to trigger the vulnerability by crashing or hanging the target application and are often not directly weaponizable,” the researchers said. “[W]e observe that this leads to many false positives for predicting functional exploits. In contrast, we find that certain PoC characteristics, such as code complexity, are good predictors, because triggering a vulnerability is a necessary step for every exploit, making these features causally connected to the difficulty of creating functional exploits.”
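As a toy illustration of the idea that PoC code complexity can serve as a feature, the sketch below counts non-empty lines and branching keywords in a PoC’s source. This is a deliberately simple stand-in; the features described in the paper are far richer:

```python
import re

# Toy "code complexity" feature for a PoC: line count plus a count of
# branching/looping keywords. An illustrative stand-in only -- not the
# researchers' actual feature extraction.
BRANCH_KEYWORDS = re.compile(r"\b(if|for|while|switch|case)\b")

def complexity_features(poc_source: str) -> dict:
    """Return simple complexity measures for a PoC source string."""
    lines = [ln for ln in poc_source.splitlines() if ln.strip()]
    return {
        "loc": len(lines),                               # non-empty lines
        "branches": len(BRANCH_KEYWORDS.findall(poc_source)),
    }

# Hypothetical PoC snippet (C-like source, treated as plain text here).
poc = """
int main() {
    for (int i = 0; i < 64; i++) {
        if (trigger(i)) crash();
    }
}
"""
print(complexity_features(poc))  # {'loc': 5, 'branches': 2}
```

The intuition from the quote above is that such structural properties relate to how hard the vulnerability is to trigger, and hence to the difficulty of building a functional exploit.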

Dumitraș notes that predicting whether a vulnerability will actually be exploited adds further difficulty, as researchers would need to create a model of attackers’ motivations.

“If a vulnerability is exploited in the wild, then we know there is a working exploit out there, but we know of other cases where there is a working exploit yet no known exploitation in the wild,” he said. “Vulnerabilities that have a working exploit are dangerous and should therefore be prioritized for patching.”

A study published by Kenna Security – now owned by Cisco – and the Cyentia Institute found that the existence of public exploit code led to a sevenfold increase in the likelihood of an exploit being used in the wild.

Yet prioritizing patches isn’t the only way exploit prediction can benefit businesses. Cyberinsurance companies could use exploit prediction to determine potential risk to policyholders. Additionally, the model could be used to analyze software under development to find patterns that might indicate whether the software is easier or harder to exploit, Dumitraș says.
