Living-Off-The-Land Command Detection Using Active Learning

Talha Ongun; Jack W. Stokes; Jonathan Bar Or; Ke Tian; Farid Tajaddodianfar; Joshua Neil; Christian Seifert; Alina Oprea; John C. Platt

Living-Off-The-Land Command Detection Using Active Learning

Talha Ongun ,
Jack W. Stokes ,
Jonathan Bar Or ,
Ke Tian ,
Farid Tajaddodianfar ,
Joshua Neil ,
Christian Seifert ,
Alina Oprea ,
John C. Platt

International Symposium on Research in Attacks, Intrusions and Defenses | October 2021

下载 BibTex

In recent years, enterprises have been targeted by advanced adversaries who leverage creative ways to infiltrate their systems and move laterally to gain access to critical data. One increasingly common evasive method is to hide the malicious activity behind a benign program by using tools that are already installed on user computers. These programs are usually part of the operating system distribution or another user-installed binary, therefore this type of attack is called “Living-Off-The-Land”. Detecting these attacks is challenging, as adversaries may not create malicious files on the victim computers and anti-virus scans fail to detect them.

We propose the design of an Active Learning framework called LOLAL for detecting Living-Off-the-Land attacks that iteratively selects a set of uncertain and anomalous samples for labeling by a human analyst. LOLAL is specifically designed to work well when a limited number of labeled samples are available for training machine learning models to detect attacks. We investigate methods to represent command-line text using word-embedding techniques, and design ensemble boosting classifiers to distinguish malicious and benign samples based on the embedding representation. We leverage a large, anonymized dataset collected by an endpoint security product and demonstrate that our ensemble classifiers achieve an average F1 score of 96% at classifying different attack classes. We show that our active learning method consistently improves the classifier performance, as more training data is labeled, and converges in less than 30 iterations when starting with a small number of labeled instances.