Lessons Learned from Honeypots – Statistical Analysis of Logins and Passwords

Authors: Pavol Sokol, Veronika Kopčová

Abstract

Honeypots are unconventional tools for studying the methods, tools and goals of attackers. In addition to IP addresses, timestamps and attack counts, these tools collect login and password combinations. Analysis of the data collected by honeypots can therefore offer a different view of logins and passwords. In this paper, advanced statistical methods and correlations with spatially oriented data are applied to obtain more detailed information about logins and passwords. We also use the chi-square test of independence to study the difference between logins and passwords. In addition, we study the agreement between the structure of passwords and logins using kappa statistics.

Introduction

In the current information society we face increasing security threats. An important part of information security is therefore the protection of information. Common security tools, methods and techniques used in the past are ineffective against new security threats, so it is necessary to choose other tools and techniques. Network forensics tools, especially honeypots and honeynets, appear to be very useful. The use of the word “honeypot” is quite recent [1]; however, honeypots have been used in computer systems for more than twenty years. A honeypot can be defined as a computing resource whose value is in being attacked [2]. Lance Spitzner defines a honeypot as an information system resource whose value lies in unauthorized or illicit use of that resource [3].

The most common classification of honeypots is based on the level of interaction, defined as the range of possibilities the attacker is given after attacking the system. Honeypots can be divided into low-interaction and high-interaction. Low-interaction honeypots emulate the characteristics of network services or a particular operating system. An example of this type of honeypot is Dionaea [4]. High-interaction honeypots, on the other hand, use a complete operating system with all services to obtain more accurate information about attacks and attackers [5]. An example of this type of honeypot is HonSSH [6].

The concept of the honeypot is extended by the honeynet, a special kind of high-interaction honeypot. A honeynet can also be described as “a virtual environment, consisting of multiple honeypots, designed to deceive an intruder into thinking that he or she has located a network of computing devices of targeting value” [7]. The honeynet architecture has four main parts: data control, data capture, data collection and data analysis [27].

The main reason to use these tools is the collection and analysis of the data they capture. Learning new, unconventional information about attacks, attackers and their tools contributes to the protection of the network services and computer networks of organizations. Each honeypot collects the IP addresses of attackers and additional data that depend on the type of honeypot. In this paper we use low-interaction Kippo honeypots [8], which collect timestamps, the IP address of the attacker, the type of SSH client and combinations of logins and passwords. For the purposes of this paper we focus on logins, passwords and their combinations.
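
As an illustration of the kind of record described above, the following minimal sketch models one captured login attempt and counts the most frequent login and password combinations. The class name `LoginAttempt`, its field names and the helper `top_combinations` are assumptions introduced here for illustration; they do not reflect Kippo's actual log schema or the processing used in this paper.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LoginAttempt:
    """One captured SSH login attempt (hypothetical field names)."""
    timestamp: datetime
    source_ip: str
    ssh_client: str
    login: str
    password: str


def top_combinations(attempts, n=3):
    """Return the n most frequent (login, password) pairs with their counts."""
    counts = Counter((a.login, a.password) for a in attempts)
    return counts.most_common(n)


# Invented sample data; the addresses come from the documentation range 192.0.2.0/24.
sample_attempts = [
    LoginAttempt(datetime(2015, 1, 1, 12, 0, 0), "192.0.2.10", "SSH-2.0-libssh2_1.4.2", "root", "123456"),
    LoginAttempt(datetime(2015, 1, 1, 12, 0, 5), "192.0.2.10", "SSH-2.0-libssh2_1.4.2", "admin", "admin"),
    LoginAttempt(datetime(2015, 1, 1, 12, 1, 0), "192.0.2.77", "SSH-2.0-PuTTY", "root", "123456"),
]

if __name__ == "__main__":
    for (login, password), count in top_combinations(sample_attempts):
        print(login, password, count)
```

Keeping the login and password together as a pair makes it straightforward to analyze either field alone or the combination, which is the focus stated above.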

This paper follows up on earlier analyses of data collected from honeypots and honeynets. In [9], the authors focus on automated secure shell (SSH) brute-force attacks and discuss password length, password composition compared to known dictionaries, dictionary sharing, username-password combinations, username analysis and timing analysis. The main aim of this paper, in contrast, is to shed light on attackers’ behaviour and to provide recommendations for SSH users and administrators. We focus on two main statistical analyses: first, the chi-square test of independence, which analyzes group differences; second, kappa statistics, which measure agreement between observers.
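
To make the two statistics named above concrete, the following is a minimal sketch in plain Python that computes the Pearson chi-square statistic and Cohen's kappa from paired categorical labels. The categorization function `char_class` (digits, letters, mixed) and the sample attempts are assumptions made for illustration only; they are not the variables, categories or data analyzed in this paper.

```python
from collections import Counter


def char_class(s):
    """Classify a string by composition (hypothetical three-category scheme)."""
    if s.isdigit():
        return "digits"
    if s.isalpha():
        return "letters"
    return "mixed"


def chi_square_statistic(pairs):
    """Pearson chi-square statistic and degrees of freedom for (row, col) label pairs."""
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            # Expected count under independence of rows and columns.
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def cohen_kappa(pairs):
    """Cohen's kappa: agreement between two labelings, corrected for chance.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(pairs)
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    p_observed = sum(1 for r, c in pairs if r == c) / n
    p_expected = sum(row_totals[k] * col_totals[k] for k in row_totals) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Invented (login, password) attempts, reduced to structure labels.
attempts = [("root", "123456"), ("admin", "admin"), ("root", "root123"),
            ("user1", "password"), ("1234", "1234"), ("test", "test")]
structure_pairs = [(char_class(login), char_class(pw)) for login, pw in attempts]

if __name__ == "__main__":
    print(chi_square_statistic(structure_pairs))
    print(cohen_kappa(structure_pairs))
```

Here the login structure and the password structure play the role of the two "observers" in the kappa computation. The chi-square statistic still has to be compared against the chi-square distribution with the returned degrees of freedom to obtain a p-value; library routines such as `scipy.stats.chi2_contingency` and `sklearn.metrics.cohen_kappa_score` cover both steps in practice.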

To formalize the scope of our work, we state two research questions:

  • What attributes of logins, passwords and their combinations are significant for the security of systems?

  • What is the relationship between logins and passwords and the origin of attacks?

This paper is organized into seven sections. Section 2 reviews published research related to lessons learned from the analysis of data from honeypots and honeynets. Section 3 outlines the dataset and methods used for the experiment. Sections 4, 5 and 6 focus on statistical and spatial analysis of logins, passwords and their combinations. The last section contains conclusions, discussion and our suggestions for future research.