Formal concept analysis approach to understand digital evidence relationships

Article

Link to Science Direct

Authors: Pavol Sokol, Ľubomír Antoni, Ondrej Krídlo, Eva Marková, Kristína Kováčová, Stanislav Krajči

Abstract

The number of cyber attacks is increasing daily, which requires organizations to respond quickly and adequately to security incidents. Digital forensics plays an essential role in these activities. In the digital investigation process, it is necessary to separate relevant digital evidence from irrelevant evidence. In this paper, we describe the construction of ordinary and fuzzy formal contexts based on digital evidence collected from event logs and the filesystem (New Technology File System). We generate four concept lattices for various subsets of attributes regarding timestamps, types of files, or event logs. The association rules and their connections with Formal Concept Analysis are explored, and several algorithms, including GUHA methods, are applied to our data. We compare, evaluate, and interpret the various methods for association rule mining. Moreover, we describe the state of the art of fuzzy attribute implications in Formal Concept Analysis and interpret the implications for our epoch-time attributes. Our solution provides warnings that prompt the security analyst to manually inspect suspicious records in the data. Hence, the analyst can quickly find records relevant to the case and perform further analysis.

Introduction

In recent years, there has been a trend of increasing sophistication in cyber security threats. These threats are executed with incredible speed and precision and often target specific organizations rather than being indiscriminate. As a result, organizations should ensure effective strategies for responding to security threats and incidents. Digital forensics plays a vital role in this process. It allows forensic analysts to quickly gather relevant information about the case and identify the source of the security incident, the attacker’s methods and motivations, and the potential impact.
As the frequency of cyberattacks increases, there is a growing need for security analysts and teams to handle security incident response. With the abundance of alerts generated by monitoring devices, it can be overwhelming for analysts to assess the situation and gather all necessary information quickly. Therefore, analysts must make informed decisions about their next steps to minimize the potential loss of sensitive information and prevent future attacks.
Handling security incidents is critical to information and cyber security for organizations. This reactive activity aims to identify the source of the incident, understand the attacker’s methods, assess the impact, and implement security measures.
Timely and effective incident resolution is essential, which is why digital forensics is often utilized. This advanced form of analysis involves the investigation of all devices that can store digital data to confirm or refute forensic hypotheses, particularly in the context of a security incident.
As the use of digital devices continues to rise, the number of investigations requiring digital evidence is also expected to increase. This will likely result in a growing backlog for law enforcement agencies tasked with analyzing and utilizing this type of evidence in their cases [49].
Digital evidence obtained from individual devices commonly includes both relevant and irrelevant information for the case under investigation. Therefore, it is necessary to separate the relevant digital evidence from the irrelevant. Currently, most techniques for this purpose involve manual searching [68]. However, machine learning techniques can significantly accelerate digital forensics, mainly through pattern recognition algorithms and solutions for detecting abnormal behavior [62]. In recent years, there has been a growing trend of incorporating machine learning, including deep learning, into digital forensics [38].
A digital investigation aims to identify and obtain relevant information from the system, including metadata and a timeline. Metadata such as file size, path, and name are commonly used to filter and index files, while the creation and analysis of a timeline allow for the chronological representation of sets of records [44]. Timeline analysis is a crucial forensic capability for investigating cyber attacks [27], as it enables security teams to quickly identify digital evidence or events of significant forensic value and gain a comprehensive understanding of what happened before, during, and after the incident [44].
This research aims to address the challenge of automatically identifying relevant digital evidence, and the relationships among the pieces of evidence, within the file system of a Windows operating system using the New Technology File System (NTFS). We also consider Windows Event Logs from the system.
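As an illustration of the metadata filtering and timeline construction described above, the following minimal sketch sorts a few file records by modification time and restricts them to a time window. The record fields, paths, and epoch values are hypothetical and are not taken from the paper's dataset.

```python
from datetime import datetime, timezone

# Hypothetical file records; "modified" is an epoch timestamp in seconds.
records = [
    {"path": "C:/Users/alice/report.docx", "size": 18432, "modified": 1700000300},
    {"path": "C:/Temp/payload.exe", "size": 90112, "modified": 1700000100},
    {"path": "C:/Temp/dropper.exe", "size": 40960, "modified": 1700000200},
    {"path": "C:/Users/alice/notes.txt", "size": 512, "modified": 1699990000},
]


def build_timeline(records, start=None, end=None):
    """Return records whose timestamp lies in [start, end], oldest first."""
    selected = [
        r for r in records
        if (start is None or r["modified"] >= start)
        and (end is None or r["modified"] <= end)
    ]
    return sorted(selected, key=lambda r: r["modified"])


def to_iso(epoch):
    """Render an epoch timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    for record in build_timeline(records, start=1700000000, end=1700000250):
        print(to_iso(record["modified"]), record["path"], record["size"])
```

Restricting the window first keeps the timeline short enough to read, which matches the goal of pointing the analyst at a small set of records.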
File systems are a rich source of information about user activity, as they can track every file created, modified, copied, or deleted on a device. As a result, most digital evidence is often found within the file system [16], [54].
The proposed model focuses on identifying unusual occurrences of digital evidence that may interest forensic analysts, particularly in the initial stages of their analysis, and on searching for connections and relationships between them. The model can streamline forensic analysts’ work by facilitating the identification of relevant evidence.
One crucial aspect of security incident response and digital forensics is identifying the attributes of digital evidence and examining the relationships between them [56]. Another important aspect is identifying the digital evidence relevant to the specific case being analyzed [68], [22]. Machine learning has been widely used in digital forensic investigations for various purposes, such as data discovery, device triage, and network forensics. According to Flach [33], machine learning consists of tasks, models, and features and follows three steps: task definition, feature construction, and evaluation and optimization [29].
We aim to use Formal Concept Analysis to identify relevant digital evidence and then examine the relationships between the attributes of digital evidence. To summarize the issues discussed above, this research addresses the following objectives:

to propose and analyze a method for identifying digital evidence relevant to the case based on attribute representation;
to analyze the relationships between attributes representing digital evidence in digital forensics.

To meet these objectives, we focus on Formal Concept Analysis and several methods for association rule mining. Formal Concept Analysis [36], [35], [37] provides methods and algorithms for exploring meaningful groupings of digital objects based on their shared attributes. Moreover, it provides visualization capabilities by means of concept lattices, which we present for the area of digital forensics in Section 4.
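To make the notion of grouping objects by shared attributes concrete, the following minimal sketch enumerates the formal concepts of a small binary context. The objects and attributes are hypothetical, and the brute-force enumeration over attribute subsets is chosen for readability; it is not the algorithm used in the paper and would not scale to real forensic data.

```python
from itertools import combinations

# Hypothetical toy context: objects are file records, attributes are binary properties.
context = {
    "report.docx": {"modified", "accessed"},
    "payload.exe": {"created", "modified", "executable"},
    "dropper.exe": {"created", "executable"},
    "notes.txt": {"accessed"},
}
all_attributes = set().union(*context.values())


def extent(attrs):
    """Objects that have every attribute in attrs."""
    return {obj for obj, has in context.items() if attrs <= has}


def intent(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    if not objs:
        return set(all_attributes)
    return set.intersection(*(context[o] for o in objs))


def formal_concepts():
    """Collect all (extent, intent) pairs by closing every attribute subset."""
    found = set()
    attrs = sorted(all_attributes)
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            ext = extent(set(subset))
            found.add((frozenset(ext), frozenset(intent(ext))))
    return found


concepts = formal_concepts()

if __name__ == "__main__":
    for ext, itt in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
        print(sorted(ext), "<->", sorted(itt))
```

Each pair is a formal concept: the extent is exactly the set of objects sharing the intent, and the intent is exactly the set of attributes common to the extent. Ordering the concepts by inclusion of their extents yields the concept lattice.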
This research extends our previous work [64]. In this paper, we apply Formal Concept Analysis, a set of data analysis methods based on lattice theory, to automatically identify relevant digital evidence within the file system and to analyze the relationships between the attributes representing digital evidence. The proposed method streamlines the identification of relevant evidence and relationships, reducing the time and effort required to conduct a forensic investigation. We have also extended this analysis with an additional source of digital evidence (event logs) and extended our results to a fuzzy setting. In addition to Formal Concept Analysis, we use several methods for association rule mining.
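For readers unfamiliar with association rules, the following minimal sketch computes the two standard quality measures of a rule, support and confidence, over a handful of records. The records and attribute names are hypothetical, and the code illustrates only the definitions; it does not reproduce the mining algorithms or GUHA methods evaluated in the paper.

```python
def support(records, attrs):
    """Fraction of records that contain every attribute in attrs."""
    return sum(1 for r in records if attrs <= r) / len(records)


def confidence(records, premise, conclusion):
    """Share of records satisfying the premise that also satisfy the conclusion."""
    base = support(records, premise)
    if base == 0:
        return None  # the rule is undefined when no record matches the premise
    return support(records, premise | conclusion) / base


# Hypothetical records: each set lists the attributes that hold for one file.
records = [
    {"executable", "created_at_night"},
    {"executable", "created_at_night"},
    {"executable"},
    {"document"},
]

if __name__ == "__main__":
    print(support(records, {"executable"}))
    print(confidence(records, {"executable"}, {"created_at_night"}))
    print(confidence(records, {"created_at_night"}, {"executable"}))
```

A rule whose confidence equals 1, such as the last one in this toy data, holds for every record that matches its premise; in Formal Concept Analysis such a rule is called an attribute implication.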
In this paper, we focus on identifying unusual occurrences of digital evidence that may interest forensic analysts, particularly in the initial stages of their analysis, and on searching for connections and relationships between them. This research makes a significant contribution to the field of digital forensics by providing a valuable tool for assisting forensic analysts in their investigations and improving incident response for organizations.
The remainder of the paper is structured as follows: Section 2 presents a literature review of existing research on the automatic analysis of events and forensic artifacts and on Formal Concept Analysis in cyber security. Section 3 describes the dataset and methods used in this research, focusing on dataset preprocessing and on the descriptions of the filesystem and event log datasets. Section 4 focuses on the concept lattices of digital evidence, especially the lattices of filesystem operations, Windows filesystem data, and event logs. In Section 5, we provide the comparison, evaluation, and interpretation of several methods for association rule mining of digital evidence. Section 6 discusses the fuzzy attribute implications of event logs in more detail. Section 7 presents the conclusions.
