\documentclass[conference]{IEEEtran}
\IEEEoverridecommandlockouts{}
% The preceding line is only needed to identify funding in the first footnote.
% If that is unneeded, please comment it out.
\usepackage{csquotes}
\usepackage[style=ieee,backend=biber]{biblatex}

\addbibresource{./bibliography.bib}

\usepackage{amsmath,amssymb,amsfonts}
\usepackage{algorithmic}
\usepackage{booktabs}
\usepackage{graphicx}
\usepackage{textcomp}
\usepackage{xcolor}
\usepackage{caption}
\usepackage{subcaption}

% code listings
\usepackage{minted}
\usepackage{relsize}
% acronyms
\usepackage{acro}
\acsetup{list-long-format=\capitalisewords}

%additional packages
%\usepackage[ngerman]{babel}
\usepackage[utf8]{inputenc}
\usepackage{hyperref}
\usepackage{cleveref}
\usepackage{url}
%% abbreviations begin
\usepackage[acronym,hyperfirst = false]{glossaries}
\glsdisablehyper{}
%\usepackage[acronym,acronymlists={main,
%abbreviationlist},shortcuts,toc,description,footnote]{glossaries}
\newglossary[clg]{abbreviationlist}{cyi}{cyg}{List of Abbreviations}
\newglossary[slg]{symbolslist}{syi}{syg}{Symbols}
\renewcommand{\firstacronymfont}[1]{\emph{#1}}
\renewcommand*{\glspostdescription}{} % remove the period at the end of each description
\renewcommand*{\acrnameformat}[2]{#2 (\acronymfont{#1})} % long form of the acronyms
\makeglossaries{}
\date{\today}
\input{glossary}
%% abbreviations end

\include{acronyms}

\begin{document}

\title{Overview Over Attack Vectors and Countermeasures for Buffer Overflows}

\author{\IEEEauthorblockN{Valentin Brandl}
\IEEEauthorblockA{\textit{Faculty of Computer Science and Mathematics} \\
\textit{OTH Regensburg}\\
Regensburg, Germany \\
valentin.brandl@st.oth-regensburg.de\\
MatrNr. 3220018}
}

\maketitle

\begin{abstract}

This paper explains the details behind buffer overflows, explores the
problems stemming from this kind of software vulnerability and discusses
possible countermeasures with a focus on their effectiveness, performance
impact and ease of use. It covers compiler- and operating-system-based
mitigations (such as ASLR, NX and stack canaries) as well as type-system-based
solutions (e.g.\ dependent types) to this prevalent type of software bug and
assesses them by their performance impact and the effort needed to introduce
them into existing software projects. An analysis of the current state of the
art informs the reader about what to expect when writing software today. The
analysis shows that most techniques only make it harder to exploit buffer
overflows for code execution but do nothing to prevent introducing them in the
first place.

\end{abstract}

\begin{IEEEkeywords}
Buffer Overflow, Software Security
\end{IEEEkeywords}

\section{Motivation}\label{ref:motivation}

In the early days of programming, memory was managed manually to make the best
use of slow hardware and scarce memory. This opened the door to many kinds of
programming errors: memory can be deallocated more than once (double free),
invalid pointers can be dereferenced (\mintinline{C}{NULL} pointer dereference;
this is still a problem in many modern languages), or the program can read or
write outside the bounds of a buffer (information leaks, \acp{bof}). Languages
affected by this include C, C++ and Fortran. While modern programming
languages solve most if not all of these problems, critical parts of the
world's infrastructure are still implemented in these older languages, because
they allow the implementation of highly performant programs, offer
deterministic runtime behaviour (e.g.\ no pauses due to garbage collection),
power legacy systems, or for portability reasons. Scientists and software
engineers have proposed many solutions to this problem over the years; this
paper gives an overview of them and compares them.

Reading out of bounds can result in an information leak and is, in most cases,
one of the less critical consequences of a \ac{bof}. There are exceptions,
however, e.g.\ the Heartbleed bug~\cite{Heardbleed2014} in OpenSSL, which
allowed dumping secret keys from memory. Out-of-bounds writes are almost always
critical and result in code execution vulnerabilities or at least application
crashes.

In 2018, 14\% (2368 out of 16556)~\cite{Cve2018} of all software vulnerabilities
that have a CVE assigned were overflow related. This shows that, even though
this type of bug is very old and well known, it is still relevant today.

\section{Background}\label{ref:background}

\subsection{Technical Details}

Code execution via \ac{bof} vulnerabilities almost always works by overwriting
the return address in the current stack frame (known as \enquote{stack
smashing})~\cite{Smashing2004}. When the \mintinline{ASM}{RET} instruction is
executed, an attacker-controlled address is loaded into the \ac{ip} register
and the code at this address is executed~\cite{Detection2018}. Other
approaches include overwriting addresses in the \ac{plt} of a binary (the
\ac{plt} contains the addresses of dynamically linked library functions), so
that a call to a linked function invokes an attacker-controlled function
instead, or (in C++) overwriting the \ac{vmt}, which stores the pointers to an
object's methods.

A simple vulnerable C program might look like this:

\begin{figure}[h!]
\begin{minted}{c}
#include <string.h>

void vuln(char *input) {
  char buf[50];
  size_t len = strlen(input);
  /* no bounds check: len may exceed sizeof(buf) */
  for (size_t i = 0; i < len; i++) {
    buf[i] = input[i];
  }
}

int main(int argc, char **argv) {
  vuln(argv[1]);
  return 0;
}
\end{minted}
\caption{Vulnerable C program}\label{lst:vuln}
\end{figure}

A successful stack \ac{bof} exploit places the payload in memory by supplying
it as an argument to the program (or by placing it in an environment variable,
writing it to a file that the program reads, sending it in a network packet,
\dots) and overwrites the return address by providing an input of more than 50
bytes, which causes a write out of bounds. When the function executes its
\mintinline{C}{return} statement and the \ac{ip} jumps into the payload, the
attacker's code is executed. This works because of the way CPUs perform
function calls: The stack frame of the current function lies between the
\ac{bp} and the \ac{sp}, as shown in~\cref{fig:before}. When a function is
called, the values of the \ac{bp} and the \ac{ip} are pushed onto the stack
(\cref{fig:call}) and the CPU writes the address of the called function into
the \ac{ip}. When the function returns, the old \ac{ip} is restored from the
stack and execution continues where the function call occurred. If an overflow
overwrites the saved \ac{ip} (\cref{fig:exploit}), the attacker controls where
execution continues.

\begin{figure}[h!]
\begin{subfigure}[b]{.3\textwidth}
\includegraphics[width=\textwidth]{./dot/before.pdf}
\caption{Stack layout before function call}\label{fig:before}
\end{subfigure}\\
\begin{subfigure}[b]{.3\textwidth}
\includegraphics[width=\textwidth]{./dot/call.pdf}
\caption{Stack layout after function call}\label{fig:call}
\end{subfigure}\\
\begin{subfigure}[b]{.3\textwidth}
\includegraphics[width=\textwidth]{./dot/exploit.pdf}
\caption{Stack layout after overflow}\label{fig:exploit}
\end{subfigure}
\caption{Stack layouts during a \ac{bof} exploit}
\end{figure}%

This is only one of several overflow types and exploitation techniques. Others
include:

\begin{itemize}

\item Heap-based \ac{bof}: In this case there is no return address to
overwrite, but objects on the heap might contain function pointers (e.g.\ for
dynamic dispatch), which can be overwritten so that the attacker's code is
executed when they are called~\cite{Detection2018}.

\item Integer overflow: A calculation on fixed-size integers determines how
much memory to allocate. The calculation overflows and only a small buffer is
allocated~\cite{Detection2018}. Later, the program indexes the buffer with a
large integer and reads or writes outside the buffer. This kind of
vulnerability can also lead to other problems because, at least in C, signed
integer overflow is undefined behaviour.

\end{itemize}
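
The integer overflow case can be made concrete with a short sketch. The helper
\mintinline{c}{alloc_array} below is a hypothetical name chosen for this
example and is not taken from any of the cited works. It rejects element
counts whose byte size would wrap around \mintinline{c}{SIZE_MAX} instead of
silently allocating a buffer that is too small:

\begin{minted}[breaklines]{c}
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for `count` elements of
 * `size` bytes each. Returns NULL if count * size would wrap
 * around, instead of allocating a too-small buffer. */
void *alloc_array(size_t count, size_t size) {
  if (size != 0 && count > SIZE_MAX / size) {
    return NULL; /* multiplication would overflow */
  }
  return malloc(count * size);
}
\end{minted}

The check runs before the multiplication, because the wrapped result can no
longer be distinguished from a legitimate small size afterwards.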

This paper does not explore other kinds of \ac{bof} in detail because the
concept is always the same: Unchecked indexing into memory allows the attacker
to overwrite some kind of return or call address, which allows hijacking the
execution flow.

The most trivial kind of payload uses a \mintinline{ASM}{NOP} sled. Here the
attacker places as many \mintinline{ASM}{NOP} instructions as possible in front
of the shell-code (e.g.\ code that invokes \mintinline{shell}{/bin/sh}) and
points the overwritten \ac{ip} or function pointer somewhere into the
\mintinline{ASM}{NOP}s. The execution \enquote{slides} (hence the name) through
the \mintinline{ASM}{NOP}s until it reaches the shell-code. Most of the
mitigation techniques described in this paper protect against this kind of
exploit, but there are different and more complex ways of exploiting \acp{bof}
that are not as easily mitigated.

\section{Concept and Methods}\label{ref:concept}

\subsection{Research Methods}

This paper describes several techniques that have been proposed to mitigate the
problems introduced by \acp{bof} and tries to answer the following questions:

\begin{itemize}

\item What is the performance impact?

\item How effective is the technique? Does it actually prevent the
exploitation of \acp{bof}?

\item How realistic is it for developers to use the technique in real-world
code? Is an incremental introduction possible?

\end{itemize}

The paper focuses on solutions for the C language, since it is still the second
most used language as of December 2019~\cite{Tiobe2019}. Some of the described
techniques are language agnostic, but this is not the focus of this paper. The
paper closes with a discussion of the current state.

For the literature research, the paper~\citetitle{Detection2018} served as a
starting point. From there, the author performed a snowball search with
combinations of the keywords \enquote{buffer}, \enquote{overflow},
\enquote{detection}, \enquote{prevention} and \enquote{dependent typing} using
\url{https://scholar.google.com/}.

The results are evaluated and prioritized using the following criteria:

\begin{itemize}

\item Type of publication, in the following order:

\begin{enumerate}
\item conference papers
\item unpublished papers
\item books
\item online sources
\end{enumerate}

\item Number of citations

\item Publisher

\item Author's reputation and institute

\item Overall quality (first by checking structure and abstract, then by
the actual content)

\end{itemize}

\subsection{\Ac{rbc}}

The easiest and maybe single most effective method to prevent \acp{bof} is to
check at runtime whether a read or write operation is out of bounds. This
requires storing the size of a buffer together with the pointer to the buffer
(so-called fat pointers) and checking every read or write in the buffer against
this size. Almost every language that comes with a managed runtime uses
\ac{rbc}. For this technique to be effective in general, writes through raw
pointers must be disallowed; otherwise the checks can be circumvented. \Ac{rbc}
introduces a runtime overhead for every indexed read or write operation. This
is a problem if a program runs on limited hardware, and it might impact
real-time properties.
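
A minimal sketch of a fat pointer in C could look as follows. The type
\mintinline{c}{fat_ptr} and the function \mintinline{c}{fat_write} are
illustrative names chosen for this sketch and do not come from an existing
library. Every write goes through a check against the stored length, and an
out-of-bounds index is reported to the caller instead of corrupting adjacent
memory:

\begin{minted}[breaklines]{c}
#include <stdbool.h>
#include <stddef.h>

/* Illustrative fat pointer: base address plus buffer length. */
typedef struct {
  char *data;
  size_t len;
} fat_ptr;

/* Checked write: returns false (and writes nothing) if `idx`
 * is outside the buffer described by `p`. */
bool fat_write(fat_ptr p, size_t idx, char value) {
  if (idx >= p.len) {
    return false;
  }
  p.data[idx] = value;
  return true;
}
\end{minted}

How a failed check is handled (error code, abort, exception) is a design
decision; the sketch merely returns \mintinline{c}{false}.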

Introducing \ac{rbc} into an existing codebase is not easy. Using fat pointers
in a few functions does not prevent other parts of the code from using raw
pointers into the same buffer. For the technique to be effective, the whole
codebase needs to be changed to disallow raw pointers, which, depending on its
size, might not be feasible. Still, if done correctly and consistently, there
will be no \ac{bof} vulnerabilities. \Ac{dos} is still possible depending on
how invalid indexing is handled, because the program might terminate, even if
gracefully, when an out-of-bounds index is used.

\subsection{Prevent/Detect Overwriting Return Address}

Since stack-based \ac{bof} exploits work by overwriting the return address in
the current stack frame, preventing or at least detecting this can be quite
effective without much overhead at runtime. \citeauthor{Rad2001} describe a
technique that stores a redundant copy of the return address in a secure memory
area that is guarded by read-only memory, so it cannot be overwritten by
overflows. When returning, the copy of the return address is compared to the
one in the current stack frame, and the \mintinline{ASM}{RET} instruction is
only executed if they match~\cite{Rad2001}. While this is effective against
stack-based \acp{bof}, in the described form it does not protect against
\ac{vmt} or \ac{plt} overwrites. An extension could be made to also protect the
\ac{plt} and \ac{vmt}, but custom constructs using function pointers would
remain vulnerable. Since this technique is a compiler extension, no
modification of the codebase is required to enable it. While it does not
prevent all kinds of \ac{bof}, it mitigates stack-based \acp{bof} that target
the return address, with only minimal overhead when calling and returning from
a function.
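
The bookkeeping behind this idea can be modelled in a few lines of C. The
following is a simplified sketch and not the implementation by
\citeauthor{Rad2001}; the names \mintinline{c}{rar_push} and
\mintinline{c}{rar_check_and_pop} as well as the fixed depth are assumptions
made for illustration, and the protection of the repository by read-only pages
is omitted:

\begin{minted}[breaklines]{c}
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RAR_DEPTH 64 /* assumed maximum call depth */

/* Simplified model of a return address repository. A real
 * implementation would guard this array with read-only
 * pages; this sketch only shows the bookkeeping. */
static uintptr_t rar[RAR_DEPTH];
static size_t rar_top = 0;

/* Function prologue: save a copy of the return address. */
bool rar_push(uintptr_t ret_addr) {
  if (rar_top == RAR_DEPTH) {
    return false; /* repository full */
  }
  rar[rar_top++] = ret_addr;
  return true;
}

/* Function epilogue: true only if the address found on the
 * stack matches the saved copy. */
bool rar_check_and_pop(uintptr_t ret_addr_on_stack) {
  if (rar_top == 0) {
    return false;
  }
  return rar[--rar_top] == ret_addr_on_stack;
}
\end{minted}

In a compiler-based implementation, the calls to these two functions would be
emitted automatically in every function prologue and epilogue.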

An older technique from 1998 proposes to put a canary word (named after the
canaries that were used in mines to detect toxic gases) between the data of a
stack frame and the return address~\cite{Stackguard1998,AtkDef2016}. When
returning, a check confirms that the canary is intact; if it is not, a
\ac{bof} occurred. This technique is implemented by major
compilers~\cite{Gcc2003} but can be defeated if an information leak reveals
the canary to the attacker. The attacker is then able to construct a payload
that keeps the canary intact. This mitigation has a minimal performance
impact~\cite{Gcc2003} and offers a good level of protection. It is a compiler
extension, so there is no need to modify the codebase.
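
The principle can be illustrated with a small model in plain C. The sketch
below does not overflow a real stack frame (that would be undefined
behaviour); instead it models a frame as a byte array that holds the local
data followed by the canary. All names and the fixed canary value are
assumptions made for this example; real implementations choose the canary
randomly at program start:

\begin{minted}[breaklines]{c}
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DATA_SIZE 16
#define CANARY 0x5AA5C33Cu /* assumed fixed value */

/* Model of a stack frame: local data, then the canary. */
typedef struct {
  unsigned char bytes[DATA_SIZE + sizeof(unsigned)];
} frame;

void frame_init(frame *f) {
  unsigned canary = CANARY;
  memset(f->bytes, 0, DATA_SIZE);
  memcpy(f->bytes + DATA_SIZE, &canary, sizeof(canary));
}

/* Unchecked copy into the data region. In this model `len`
 * must not exceed sizeof(f->bytes); everything above
 * DATA_SIZE spills into the canary. */
void frame_copy(frame *f, const void *src, size_t len) {
  memcpy(f->bytes, src, len);
}

/* Epilogue check: is the canary still intact? */
bool frame_canary_ok(const frame *f) {
  unsigned canary;
  memcpy(&canary, f->bytes + DATA_SIZE, sizeof(canary));
  return canary == CANARY;
}
\end{minted}

A copy of at most \mintinline{c}{DATA_SIZE} bytes leaves the canary untouched,
while a longer copy changes it and is detected by
\mintinline{c}{frame_canary_ok}.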

\subsection{Type System Solutions}

\citeauthor{Dep2007} propose an extension to the C type system that adds
dependent types. These types have an associated value, e.g.\ a pointer type
can have the size of the buffer it points to associated with
it~\cite{Dep2007}. This prevents indexing into a buffer with out-of-bounds
values. The extension is a superset of C, so any valid C code can be compiled
with it and the codebase can be improved incrementally. If the type extension
is expressive enough, the additional information might form the basis for
formal verification. In some cases, the type annotations can be
inferred~\cite{Dep2007}.
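
To give an impression of what such an annotation might look like, the
following sketch uses a hypothetical \mintinline{c}{COUNT} macro. The concrete
syntax is an assumption made for illustration and is not claimed to match the
notation in~\cite{Dep2007}. The macro expands to nothing, so an ordinary C
compiler accepts the code unchanged:

\begin{minted}[breaklines]{c}
#include <stddef.h>

/* Hypothetical annotation: states that the annotated pointer
 * refers to `n` valid elements. It expands to nothing here;
 * a dependent type checker would use it to verify accesses. */
#define COUNT(n)

/* The type of `values` depends on the value of `len`. */
long sum(const int *COUNT(len) values, size_t len) {
  long total = 0;
  for (size_t i = 0; i < len; i++) {
    total += values[i]; /* i < len, so within the bound */
  }
  return total;
}
\end{minted}

Because the annotation disappears for a regular compiler, annotated and
unannotated functions can coexist, which is what makes an incremental
introduction plausible.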

This technique prevents all kinds of overflows where it is used, but it
requires changes to the codebase and is only effective where these changes are
applied. Since it is a compile-time solution, it increases compilation time but
has no negative effect on the runtime.

\subsection{Address Space Layout Randomization}

\Ac{aslr} aims to prevent exploitation of \acp{bof} by placing code and data
at random locations in memory~\cite{AtkDef2016}. That way, it is not trivial
to set the return address to point to the payload in memory. This is effective
against every kind of \ac{bof} vulnerability, but it is still possible to
exploit \ac{bof} vulnerabilities in combination with information leaks or
other techniques like heap spraying. In addition, on 32-bit systems the
address space is small enough to brute-force the address until the payload in
memory is hit~\cite{Effectiveness2014}.

This is another technique that works without modification of the codebase.
There is also no runtime overhead, because nothing changes except the location
of the program in memory.

\subsection{w\^{}x Memory}

w\^{}x (also known as \ac{nx} or \ac{dep}) makes memory either writable or
executable, but never both~\cite{AtkDef2016}. That way, an attacker cannot
place arbitrary payloads in memory and execute them. There are still
techniques to circumvent this by reusing existing executable code. The
ret-to-libc technique redirects execution to existing libc functions with
attacker-controlled parameters, e.g.\ if the program uses the
\mintinline{C}{system} function, the attacker can overwrite the return address
with the address of \mintinline{C}{system}, plant \mintinline{shell}{/bin/sh}
as its parameter on the stack and get a shell on the system. \Ac{rop} (a
generalization of ret-to-libc exploits) uses so-called \ac{rop} gadgets,
combinations of instructions that modify memory or registers followed by a
\mintinline{ASM}{RET} instruction, to build instruction chains that perform
the desired computation. This is achieved by placing the addresses of the
gadgets in the right order on the stack, which reuses the existing code and
thereby circumvents the w\^{}x protection. These chains of gadgets, known as
\ac{rop} chains, are Turing complete~\cite{Rop2007}, so in theory it is
possible to construct any imaginable payload, as long as the exploited program
contains enough gadgets and the overflowing buffer has enough space.
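
On POSIX systems, page permissions are expressed with the
\mintinline{c}{PROT_READ}, \mintinline{c}{PROT_WRITE} and
\mintinline{c}{PROT_EXEC} flags of \mintinline{c}{mmap} and
\mintinline{c}{mprotect}. The following sketch expresses the w\^{}x rule as a
check on these flags. The two function names are hypothetical; the code only
illustrates the rule and is not how an operating system enforces it:

\begin{minted}[breaklines]{c}
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* True if `prot` respects w^x, i.e. it does not request
 * write and execute permission at the same time. */
bool prot_is_wx_safe(int prot) {
  return !((prot & PROT_WRITE) && (prot & PROT_EXEC));
}

/* Hypothetical wrapper around the POSIX mprotect call that
 * refuses to make a region writable and executable at once.
 * Returns -1 without touching the mapping in that case. */
int mprotect_wx(void *addr, size_t len, int prot) {
  if (!prot_is_wx_safe(prot)) {
    return -1;
  }
  return mprotect(addr, len, prot);
}
\end{minted}

Code reuse attacks such as \ac{rop} pass this rule trivially, because they
never need a page that is writable and executable at the same time.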

\section{Discussion}\label{ref:discussion}

\subsection{Effectiveness}

\subsubsection{\ac{aslr}}

\Ac{aslr} has proven effective and sees wide use in production. Most major
operating systems implement this technique~\cite{FBSDaslr}. Some even use kernel
\ac{aslr}~\cite{Linuxaslr}. Since this mechanism is active at runtime, it does
not require any changes to the code itself; the program only has to be compiled
as a \ac{pie}. On 32-bit CPUs, only 16 bits of the address are randomized. These
16 bits can be brute-forced within minutes or even
seconds~\cite{AslrEffective2004}.
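
As a rough worked example, assume that the randomized part of the address
stays the same between attempts and that every candidate is equally likely.
With $n = 16$ randomized bits there are
\begin{equation*}
2^{n} = 2^{16} = 65536
\end{equation*}
possible layouts, so an attacker who tries each candidate once needs at most
$2^{16}$ and on average about
\begin{equation*}
\frac{2^{16}}{2} = 2^{15} = 32768
\end{equation*}
attempts. At an assumed rate of $r$ attempts per second, this corresponds to an
expected duration of about $32768 / r$ seconds. The rate $r$ is a free
parameter here and not a measured value.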

There is no runtime overhead since the only change is the position of the
program in memory. This technique can and should be used on modern systems
because it requires no additional work, except maybe recompilation.

\subsubsection{w\^{}x}

The rise of code reuse exploits like \ac{rop} and ret-to-libc shows the limits
of w\^{}x protection. It makes vulnerabilities harder to exploit by preventing
the most naive types of payloads, but it does not actually prevent exploits
from happening.

\Ac{nx} does not rule out exploitation but makes it harder for an attacker who
does not know the system the program is running on (e.g.\ in the case of a
network service). It has no runtime overhead and is a compile-time option, so
it does not hurt to enable \ac{nx}.

\subsubsection{Runtime Bounds Checks}

Checking for overflows at runtime is very effective but can have a significant
performance impact, so it is not feasible in every case. It also comes with
its own pitfalls: integer overflows can occur when calculating the bounds,
which might introduce other problems.

\subsection{State of the Art}

Operating systems have started to compile C code to \acp{pie} by
default~\cite{ArchPie2017}, and \ac{aslr} is enabled, too. The same goes for
\ac{nx} and stack canaries~\cite{ArchPie2017}. The combination of these
mitigations makes it hard to write general exploits for modern operating
systems.

To check the current state, the author investigates which mitigations are
enabled by default in the latest release (9.2) of the \ac{gcc} and the latest
commit of the LLVM project (\mintinline[breaklines]{shell}{181ab91efc9}) by
building both compilers using the default configuration. The experiments are
performed on a 64-bit Debian 9.11 system running version 4.19.0 of the Linux
kernel. The following commands build the compilers from source:

\begin{figure}[h!]
\begin{subfigure}[b]{.3\textwidth}
\begin{minted}{shell}
mkdir objdir \
  && cd objdir \
  && ../configure \
    --build=x86_64-linux-gnu \
    --host=x86_64-linux-gnu \
    --target=x86_64-linux-gnu \
    --disable-multilib \
  && make -j8
\end{minted}
\caption{\ac{gcc} compilation script}\label{lst:gcc}
\end{subfigure}
\\
\begin{subfigure}[b]{.3\textwidth}
\begin{minted}{shell}
mkdir build \
  && cd build \
  && cmake -DLLVM_ENABLE_PROJECTS=clang \
    -DCMAKE_BUILD_TYPE=Release \
    -G "Unix Makefiles" ../llvm \
  && make -j8
\end{minted}
\caption{clang compilation script}\label{lst:clang}
\end{subfigure}
\end{figure}

The \mintinline{shell}{build}, \mintinline{shell}{host} and
\mintinline{shell}{target} parameters in~\cref{lst:gcc} describe the build,
host and target platforms of the compiler, and
\mintinline{shell}{disable-multilib} disables 32-bit support, which is not
needed for this experiment. The \mintinline{sh}{-j8} flag tells make to use
all 8 available cores for compilation.
\mintinline{shell}{CMAKE_BUILD_TYPE=Release} creates a release build of the
clang compiler (see~\cref{lst:clang}).

The fresh builds of \ac{gcc} and clang compile the code from~\cref{lst:vuln} to
check which mitigations are enabled by default. After compiling the source code
with \mintinline[breaklines]{shell}{gcc -o vuln.gcc vuln.c} and
\mintinline[breaklines]{shell}{clang -o vuln.clang vuln.c}, the
\mintinline{shell}{checksec.sh} tool~\cite{Checksec2019} shows which
mitigations are active in the resulting binaries:

\begin{table}[h!]
\begin{center}
\begin{tabular}{lrr}
\toprule
Mitigation & Active in \ac{gcc}? & Active in clang? \\
\midrule
Stack Canary & No & No \\
\midrule
\ac{nx} & Yes & Yes \\
\midrule
\ac{pie} & No & No \\
\bottomrule
\end{tabular}
\caption{Enabled mitigations in a default \ac{gcc} and clang
build}\label{tab:mitigations}
\end{center}
\end{table}

Surprisingly, two of the most popular C compilers enable only one of the
described compile-time mitigations by default (see~\cref{tab:mitigations}).
Maintainers of operating system packages of the compilers might choose a more
secure configuration, as shown in~\cite{ArchPie2017}, but compiler vendors
might still want to choose better defaults, too.

The mitigations discussed so far in this section do not change anything about
the existence of \acp{bof}; they just try to prevent their exploitation for
code execution. The vulnerable programs terminate if the stack canary is
overwritten, a jump into \ac{nx} memory occurs or execution continues inside
garbage data due to \ac{aslr}. The underlying problem persists; only the worst
consequences are mitigated. \Ac{dos} is still a problem in safety-critical
systems (e.g.\ cars, planes, medical devices) or in any area with real-time
requirements.

Language extensions that fix the problem of \acp{bof}, as described
in~\cite{Dep2007}, require a lot of discipline to use them everywhere. They are
only fully effective if the whole codebase uses the new features. Introducing
them in an existing codebase is quite unrealistic since it requires many
modifications. On the other hand, they actually prevent \acp{bof} from
happening and not just from being exploited, so they look like an interesting
concept for safety-critical software.

\section{Conclusion}\label{ref:conclusion}

While there are many techniques that protect against different types of
\acp{bof}, none of them is effective in every situation, but in combination
they offer good protection against code execution attacks. Maybe the time has
come to stop using memory-unsafe languages wherever their use is not
unavoidable. There are many modern programming languages that target the same
problem space as C, C++ or Fortran but without the issues coming from these
languages. If it is feasible to use a garbage collector, languages like Go,
Java or even scripting languages like Python might work just fine. If
real-time properties are required, Rust could be the way to go, without any
language runtime and with deterministic memory management. For any other
problem, almost any memory-safe language is better than using unsafe C.

\printbibliography{}
% \bibliographystyle{IEEEtran}
% \bibliography{bibliography}
% \printacronyms{}

\end{document}

% vim: set filetype=tex ts=2 sw=2 tw=80 et spell :