Analysis and Prediction of Legal Judgements in the Slovak Criminal Proceedings




Authors: Dávid Varga, Zoltán Szoplák, Stanislav Krajči, Pavol Sokol, Peter Gurský

Abstract

This paper uses machine learning to analyze criminal judgements in the Slovak Republic to determine their adequacy and set a baseline for predicting their outcomes. First, we summarize past and recent advancements in predicting verdicts and other attributes of legal text written in different languages. We then demonstrate data preparation of all publicly available Slovak judgements, extraction of their verdicts, and separation into main parts using a Slovak word inflection dictionary called Tvaroslovník.

Later, we use this data to classify the judgements as acquittals or convictions using several known machine learning methods, ranging from simple statistical methods such as SVM and random forests to deep learning networks based on convolution, recurrence, and their combinations. We evaluate their efficiency, identify the terms that are significantly correlated with each result class, and offer a hypothesis as to why these terms are correlated with these results.

We have found that a sequential input of word2vec embeddings combined with convolution-based deep learning methods produces the best results, achieving over 99% accuracy.

Introduction

Since 2016, the Ministry of Justice of the Slovak Republic has published more than 3 million publicly available court decisions online. These court decisions contain some structured data, e.g., the name of the judge or court, but mostly free text. This free text contains the most relevant parts of court decisions: the final verdict and the reasoning behind the verdict. We aim to find a method to identify court decisions that are not sufficiently reasoned and provide such decisions to lawyers for a more detailed analysis.

In this paper, we examine several statistical and machine learning methods of text representation and classification, with the aim of correctly predicting court decisions based on the reasoning alone. Once our model is trained, it receives both the reasoning and the verdict of a court decision as inputs. The model predicts the verdict from the reasoning and compares this prediction with the true verdict supplied at the input. Two situations can then occur. If the predicted verdict is identical to the true verdict, we consider the court decision sufficiently reasoned. If the predicted verdict differs from the true verdict, we consider the decision insufficiently reasoned.
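The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Decision` record, the `flag_insufficiently_reasoned` function, and the keyword-based `toy_predictor` are hypothetical names introduced here, and the toy predictor merely stands in for a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Decision:
    """Hypothetical record for one court decision."""
    decision_id: str
    reasoning: str
    verdict: str  # "conviction" or "acquittal"


def flag_insufficiently_reasoned(
    decisions: Iterable[Decision],
    predict_verdict: Callable[[str], str],
) -> List[Decision]:
    """Return the decisions whose predicted verdict differs from the true one."""
    flagged = []
    for decision in decisions:
        # The prediction uses the reasoning only; the true verdict is
        # consulted afterwards, for the comparison.
        if predict_verdict(decision.reasoning) != decision.verdict:
            flagged.append(decision)
    return flagged


def toy_predictor(reasoning: str) -> str:
    """Assumed stand-in for a trained model: a single keyword rule."""
    return "acquittal" if "not proven" in reasoning.lower() else "conviction"


sample = [
    Decision("A", "The act was not proven beyond doubt.", "acquittal"),
    Decision("B", "The evidence clearly establishes the act.", "conviction"),
    Decision("C", "The court considered the evidence.", "acquittal"),
]

flagged_ids = [d.decision_id for d in flag_insufficiently_reasoned(sample, toy_predictor)]
print(flagged_ids)
```

In this toy run, only decision C is flagged, because its reasoning gives the stand-in predictor no basis for an acquittal. Any real classifier exposing a reasoning-to-verdict function could be passed in place of `toy_predictor`.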

The model justifies its prediction by extracting the parts of the court’s reasoning that most influenced the prediction of the verdict. This paper builds on the research presented in Sokol et al. [29], in which the authors formulated their conclusions on the current state and developing trends in the use of digital evidence in judicial proceedings and the use of the in dubio pro reo principle in criminal proceedings.

To achieve a better understanding of how judgements are reasoned, this paper aims to:

create a classification model which can predict the verdict of a judgement from its reasoning part;
identify significant terms in the reasoning of judgements that are closely related to the results of judgements in criminal proceedings (innocent or guilty).
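As a rough illustration of the second aim, the sketch below scores how strongly each term is associated with one verdict class. The association measure (a smoothed log-odds ratio over document frequencies), the function name `term_log_odds`, and the toy sentences are all assumptions made for this example; this section does not state which measure the authors used.

```python
import math
from collections import Counter
from typing import Dict, List


def term_log_odds(
    docs: List[str], labels: List[str], positive: str = "conviction"
) -> Dict[str, float]:
    """Smoothed log-odds ratio of each term appearing in positive vs. other documents.

    Positive scores lean towards the `positive` class, negative scores towards the other.
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = n_neg = 0
    for text, label in zip(docs, labels):
        terms = set(text.lower().split())  # document frequency: count each term once
        if label == positive:
            n_pos += 1
            pos_df.update(terms)
        else:
            n_neg += 1
            neg_df.update(terms)

    scores = {}
    for term in set(pos_df) | set(neg_df):
        # Add-one smoothing keeps both probabilities strictly between 0 and 1.
        p = (pos_df[term] + 1) / (n_pos + 2)
        q = (neg_df[term] + 1) / (n_neg + 2)
        scores[term] = math.log(p / (1 - p)) - math.log(q / (1 - q))
    return scores


# Toy English stand-ins for reasoning texts (hypothetical data).
docs = [
    "guilt proven by witness",
    "witness confirmed guilt",
    "act not proven",
    "doubt remains act not proven",
]
labels = ["conviction", "conviction", "acquittal", "acquittal"]

scores = term_log_odds(docs, labels)
for term, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{term:10s} {score:+.2f}")
```

Terms at the top of the printed list lean towards conviction and those at the bottom towards acquittal. On real data, the texts would first need tokenization and lemmatization suited to Slovak.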

This paper is organized into six sections. Section 2 reviews past and recent advancements in the classification of legal documents. Section 3 is devoted to data preprocessing and judgement extraction. Section 4 describes the different methods of text representation and the learning algorithms that use them. The results produced by these algorithms and their subsequent analysis are presented in Section 5, followed by the last section containing conclusions and future work.
