Data Collection and Data Analysis in Honeypots and Honeynets

Authors: Pavol Sokol, Patrik Pekarčík, Tomáš Bajtoš

Abstract

Honeypots and honeynets are unconventional security tools for studying the techniques,
methods, tools, and goals of attackers. Data analysis is therefore an important part
of honeypots and honeynets. In this paper we focus on the analysis of data collected from
different honeypots and honeynets. We discuss a framework for analysing honeypot
and honeynet data. We also outline a secure way to transfer the collected data from
honeypots to the analysis itself. Finally, we propose a framework for the analysis of
attacks based on data collected by honeypots and honeynets.

Introduction

Traditional means of defence (e.g. firewalls, IDS, IPS) are becoming less and less
effective because the behaviour, methods, and tools of attackers keep changing.
As a result, attackers are several steps ahead of defensive mechanisms.
From this perspective, we need to find new approaches to protect the information and
infrastructure of organizations. One effective approach is the concept of
honeypots and honeynets.
A honeypot is “a computing resource, whose value is in being attacked” [1]. Lance
Spitzner defines honeypots as “an information system resource whose value lies in
unauthorized or illicit use of that resource” [2]. Honeypots are a very useful tool for
learning about tools, procedures, targets, and methods of attackers.
For the purpose of this paper, we classify honeypots according to their role and
their level of interaction. The first classification is based on the role of the honeypot.
According to this classification, honeypots are divided into server-side honeypots and
client-side honeypots. Server-side honeypots (e.g. Conpot [3]) are useful in detecting
new exploits, collecting malware, and enriching threat analysis research.
Honeypots aimed at client-side attacks, on the other hand, are called client-side
honeypots (e.g. Thug [4]). Their prime motive is to identify and detect
malicious activities across the Internet [5].
The second classification is based on the level of interaction. The level of
interaction can be defined as the range of possibilities that a honeypot allows an
attacker to have. Low-interaction honeypots detect attackers using software
emulation of the characteristics of a particular operating system and network services
on the host operating system. The advantage of this approach is better control over
the attacker’s activities, since the attacker is limited to software running on the host
operating system. The disadvantage is that a low-interaction honeypot emulates
one service, or a couple of services, but does not
emulate a complete operating system. Examples of this type of honeypot are
Dionaea [6] and HoneyD [7].
Honeypots that offer attackers more ability to interact than low-interaction
honeypots, but less functionality than high-interaction solutions, are called
medium-interaction honeypots. They can “expect certain activity and are designed
to give certain responses beyond what a low-interaction honeypot would give” [1].
An example of this type of honeypot is Kippo [8].
To get more information about attackers, their methods, and their attacks, we use
a complete operating system with all services. This type of honeypot is called a
high-interaction honeypot. Its main aim is to provide the attacker
with access to a real operating system [9]. Examples of this type of honeypot are
HonSSH [10] and Sebek [11].
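The two classifications above can be captured in a small data structure. The following is a minimal sketch in Python, not part of any honeypot's own tooling: the names `Role`, `Interaction`, `Honeypot`, `EXAMPLES`, and the two lookup helpers are illustrative assumptions. Each example tool carries only the classification this section states for it; the other dimension is left unset.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Role(Enum):
    SERVER_SIDE = "server-side"
    CLIENT_SIDE = "client-side"


class Interaction(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Honeypot:
    """One example tool; a dimension is None when the text does not state it."""
    name: str
    role: Optional[Role] = None
    interaction: Optional[Interaction] = None


# Only the classifications named in this section are recorded here.
EXAMPLES = [
    Honeypot("Conpot", role=Role.SERVER_SIDE),
    Honeypot("Thug", role=Role.CLIENT_SIDE),
    Honeypot("Dionaea", interaction=Interaction.LOW),
    Honeypot("HoneyD", interaction=Interaction.LOW),
    Honeypot("Kippo", interaction=Interaction.MEDIUM),
    Honeypot("HonSSH", interaction=Interaction.HIGH),
    Honeypot("Sebek", interaction=Interaction.HIGH),
]


def names_by_role(role: Role) -> List[str]:
    """Return the example tools listed under the given role."""
    return [h.name for h in EXAMPLES if h.role == role]


def names_by_interaction(level: Interaction) -> List[str]:
    """Return the example tools listed under the given interaction level."""
    return [h.name for h in EXAMPLES if h.interaction == level]


if __name__ == "__main__":
    for level in Interaction:
        print(level.value, names_by_interaction(level))
```

Keeping role and interaction level as separate optional fields reflects that the two classifications are independent of each other.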
The concept of honeypots is extended by honeynets. A honeynet can be defined as
“a highly controlled network of honeypots” [12]. At present, a complete honeynet
running on a single computer in a virtual environment is used [12]. This type of
honeynet is called a virtual honeynet.
To successfully deploy a honeynet, we must correctly deploy the honeynet
architecture. There is “no single rule on how one should deploy this architecture”
[13]. Three core elements define the honeynet architecture [2,12]:
• Data capture: monitors and logs all activities of the attacker within the honeynet.
• Data control: controls and contains the activity of the attacker.
• Data collection: all data are captured and stored in one central location.
The first two core functions are the most important, and they apply to
every honeynet deployment. The last core function, data collection, is applied
when an organization has multiple honeynets in distributed
environments.
Some authors [14,15] add data analysis to the above-mentioned core elements.
Data analysis is the ability of a honeynet to analyse the data collected
from it. Data analysis is used for “understanding, analysing, and tracking the
captured probes, attacks or some other malicious activities” [1]. An example of this
core element is a combination of security devices, such as a firewall (iptables),
an intrusion prevention system (Cisco IPS), and an intrusion detection system (Snort,
Suricata), where these security devices can analyse the network traffic in detail and
present the results of the analysis visually. In this paper we focus on data analysis.
Deployment and usage of honeypots and honeynets bring many benefits, e.g. the
possibility of discovering new forms of attacks. On the other hand, their usage
also brings some problems. The primary motivation for this paper is that there
are several problems in the field of data analysis. Many honeypot
implementations collect data, but in most cases they store it in different
formats, or the collected data themselves differ. It is therefore
difficult to analyse an attack using data from various types of honeypots. Another
problem is finding a secure way of transferring the collected data from honeypots
to the analysis itself.
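The format problem described above can be illustrated with a minimal Python sketch that maps records from two sensors into one common event structure. Everything here is an assumption made for illustration: the `Event` fields, the JSON keys `time` and `src`, the comma-separated layout `epoch,ip,port`, and the sensor names do not correspond to the output of any particular honeypot, and this is not the framework proposed in this paper.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Event:
    """Hypothetical common record shared by all sensors."""
    timestamp: datetime          # always stored in UTC
    source_ip: str
    sensor: str
    details: dict = field(default_factory=dict)


def from_json_line(line: str, sensor: str) -> Event:
    """Parse an assumed JSON record with 'time' (ISO 8601) and 'src' keys."""
    raw = json.loads(line)
    ts = datetime.fromisoformat(raw.pop("time")).astimezone(timezone.utc)
    # Remaining keys are kept as sensor-specific details.
    return Event(ts, raw.pop("src"), sensor, raw)


def from_csv_line(line: str, sensor: str) -> Event:
    """Parse an assumed 'epoch,ip,port' record."""
    epoch, ip, port = line.strip().split(",")
    ts = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return Event(ts, ip, sensor, {"dst_port": int(port)})


if __name__ == "__main__":
    events = [
        from_json_line(
            '{"time": "2015-03-01T12:00:05+00:00", "src": "192.0.2.10", "user": "root"}',
            "ssh-1",
        ),
        from_csv_line("1425211200,192.0.2.10,445", "smb-1"),
    ]
    # Once normalized, records from different honeypots sort on one timeline.
    for event in sorted(events, key=lambda e: e.timestamp):
        print(event.timestamp.isoformat(), event.sensor, event.source_ip, event.details)
```

Normalizing timestamps to UTC and keeping sensor-specific fields in a free-form `details` dictionary is one possible design; it lets activity from the same source address be ordered across sensors without forcing every honeypot into the same set of fields.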
To formalize the scope of our work, we state three research questions:
• How can data be collected securely for further analysis?
• How can data from different types of honeypots be analysed?
• How can an incident be analysed according to data collected from honeypots?
This paper is organized into five sections. Section II focuses on papers
related to data analysis of honeypots and honeynets. Section III is the main part of
the paper. It provides a framework for data analysis and answers the first
and the second research questions. Section IV outlines an incident
taxonomy based on honeypot and honeynet data and answers the third
research question. Section V concludes the paper and contains
suggestions for future work.