\documentclass[conference]{IEEEtran}
\IEEEoverridecommandlockouts{}
% The preceding line is only needed to identify funding in the first footnote.
% If that is unneeded, please comment it out.
\usepackage{csquotes}
\usepackage[style=ieee,backend=biber]{biblatex}

\addbibresource{./bibliography.bib}

\usepackage{amsmath,amssymb,amsfonts}
\usepackage{algorithmic}
\usepackage{booktabs}
\usepackage{graphicx}
\usepackage{textcomp}
\usepackage{xcolor}
\usepackage{caption}
\usepackage{subcaption}

% code listings
\usepackage{minted}
\usepackage{relsize}
% acronyms
\usepackage{acro}
\acsetup{list-long-format=\capitalisewords}

%additional packages
%\usepackage[ngerman]{babel}
\usepackage[utf8]{inputenc}
\usepackage{hyperref}
\usepackage{url}
%%fuer abkuerzungen begin
\usepackage[acronym,hyperfirst = false]{glossaries}
\glsdisablehyper{}
%\usepackage[acronym,acronymlists={main,
%abbreviationlist},shortcuts,toc,description,footnote]{glossaries}
\newglossary[clg]{abbreviationlist}{cyi}{cyg}{List of Abbreviations}
\newglossary[slg]{symbolslist}{syi}{syg}{Symbols}
\renewcommand{\firstacronymfont}[1]{\emph{#1}}
\renewcommand*{\glspostdescription}{}	% Punkt am Ende jeder Beschreibung entfernen
\renewcommand*{\acrnameformat}[2]{#2 (\acronymfont{#1})}	% Langform der Akronyme
\makeglossaries{}
\date{\today}
\input{glossary}
%%fuer abkuerzungen end

\include{acronyms}

\begin{document}

\title{An Overview of Attack Vectors and Countermeasures for Buffer Overflows}

\author{\IEEEauthorblockN{Valentin Brandl}
\IEEEauthorblockA{\textit{Faculty of Computer Science and Mathematics} \\
\textit{OTH Regensburg}\\
Regensburg, Germany \\
valentin.brandl@st.oth-regensburg.de\\
MatrNr. 3220018}
}

\maketitle

\begin{abstract}
TODO
\end{abstract}

\begin{IEEEkeywords}
Buffer Overflow, Software Security
\end{IEEEkeywords}


\section{Motivation}\label{ref:motivation}

When the first programming languages were designed, memory had to be managed
manually to make the best use of slow hardware. This opened the door to many
kinds of programming errors: memory can be deallocated more than once
(double-free), or a program can read or write outside the bounds of a buffer
(information leaks, \acp{bof}). Affected languages include C, C++ and Fortran.
These languages are still used in critical parts of the world's infrastructure,
because they allow highly performant programs to be written, because they power
legacy systems, or for portability reasons. Scientists and software engineers
have proposed many solutions to this problem over the years; this paper gives an
overview of these solutions and compares them.

Reading out of bounds can result in an information leak and is less critical
than \acp{bof} in most cases, but there are exceptions, e.g.\ the Heartbleed bug
in OpenSSL, which allowed dumping secret keys from memory. Out-of-bounds writes
are almost always critical and result in code execution vulnerabilities or at
least application crashes.

In 2018, 14\% (2368 out of 16556)~\cite{Cve2018} of all software vulnerabilities
with an assigned CVE were overflow related. This shows that, even though this
type of bug is very old and well known, it is still relevant today.

\section{Background}\label{ref:background}

% TODO: many references

\subsection{Technical Details}

Exploitation of \ac{bof} vulnerabilities almost always works by overwriting the
return address in the current stack frame, so that when the
\mintinline{ASM}{RET} instruction is executed, an attacker-controlled address is
loaded into the instruction pointer register and the code at this address is
executed. Other approaches include overwriting addresses in the \ac{plt} of a
binary so that, when a linked function is called, an attacker-controlled
function is called instead, or (in C++) overwriting the vtable where the
pointers to an object's methods are stored.

A simple vulnerable program might look like this:

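\begin{minted}{c}
#include <stddef.h>
#include <string.h>

int main(int argc, char **argv) {
  char buf[50];
  /* No bounds check: any argument longer than 50 bytes writes past buf. */
  for (size_t i = 0; i < strlen(argv[1]); i++) {
    buf[i] = argv[1][i];
  }
  return 0;
}
\end{minted}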

A successful exploit places the payload in memory by supplying it as an argument
to the program and overwrites the return address by providing an input longer
than 50 bytes, thereby writing out of bounds. When the \mintinline{C}{return}
statement is executed, control jumps into the payload and the attacker's code is
executed. This works because of the way function calls are implemented on CPUs.
The stack frame of the current function lies between the two pointers \ac{bp}
and \ac{sp}, as shown in Fig.~\ref{fig:before}. When a function is called, the
values of the \ac{bp}, \ac{sp} and \ac{ip} are pushed onto the stack
(Fig.~\ref{fig:call}) and the \ac{ip} is set to the address of the called
function. When the function returns, the old \ac{ip} is restored from the stack
and execution continues from where the function was called. If an overflow
overwrites the old \ac{ip} (Fig.~\ref{fig:exploit}), execution continues in
attacker-controlled code.

\begin{figure}[h!]
  \includegraphics[width=.3\textwidth]{./dot/before.pdf}
  \caption{Stack layout before function call}\label{fig:before}
\end{figure}%

\begin{figure}[h!]
  \includegraphics[width=.3\textwidth]{./dot/call.pdf}
  \caption{Stack layout after function call}\label{fig:call}
\end{figure}%

\begin{figure}[h!]
  \includegraphics[width=.3\textwidth]{./dot/exploit.pdf}
  \caption{Stack layout after overflow}\label{fig:exploit}
\end{figure}

This is only one of several exploitation techniques, but the general idea stays
the same: overwrite the return address or some kind of function pointer (e.g.\
in vtables or the \ac{plt}). Once the function returns or the overwritten
pointer is called, the execution flow is hijacked and the attacker can execute
arbitrary code.

The most trivial kind of exploit uses a so-called \mintinline{ASM}{NOP} sled.
Here the attacker prepends as many \mintinline{ASM}{NOP} instructions as
possible to the shellcode (e.g.\ to invoke \mintinline{shell}{/bin/sh}) and
points the overwritten \ac{ip} somewhere inside the \mintinline{ASM}{NOP}s. The
execution \enquote{slides} through the \mintinline{ASM}{NOP}s until it reaches
the shellcode. Most of the mitigation techniques described in this paper protect
against this kind of exploit, but there are different and more complex ways of
exploiting \acp{bof} that are not as easily mitigated.
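
The memory layout of such a payload can be sketched as follows. This is only an
illustration of the layout, not a working exploit: the function name
\mintinline{c}{build_payload} and its parameters are hypothetical, the shellcode
is passed in as opaque bytes, and a real payload would additionally need padding
so that the address lands exactly on the saved \ac{ip}.

\begin{minted}{c}
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP 0x90 /* single-byte NOP opcode on x86 */

/* Lays out [NOP sled][code][return address] in out (hypothetical helper).
 * Returns the number of bytes written, or 0 if out is too small. */
size_t build_payload(unsigned char *out, size_t out_len, size_t sled_len,
                     const unsigned char *code, size_t code_len,
                     uintptr_t ret_addr) {
  /* Check each part separately so the size computation cannot wrap. */
  if (sled_len > out_len) return 0;
  if (code_len > out_len - sled_len) return 0;
  if (sizeof ret_addr > out_len - sled_len - code_len) return 0;

  memset(out, NOP, sled_len);                     /* the sled */
  memcpy(out + sled_len, code, code_len);         /* the code to reach */
  memcpy(out + sled_len + code_len, &ret_addr,    /* address inside sled */
         sizeof ret_addr);
  return sled_len + code_len + sizeof ret_addr;
}
\end{minted}

The longer the sled, the less precisely the attacker has to guess the address
that ends up in the \ac{ip}.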

\subsection{Implications}

\section{Concept and Methods}\label{ref:concept}

\subsection{Methods}

This paper describes several techniques that have been proposed to fix the
problems introduced by \acp{bof}. Each technique is evaluated with respect to
its performance impact, its effectiveness (e.g.\ does the technique actually
prevent exploitation of \acp{bof}?) and how realistic it is for developers to
use the technique in real-world code (e.g.\ is incremental introduction into an
existing codebase possible?). The paper closes with a discussion of the current
state.

\subsection{Runtime Bounds Checks}

The easiest and maybe single most effective method to prevent \acp{bof} is to
check whether a read or write operation is out of bounds. This requires storing
the size of a buffer together with the pointer to the buffer and checking at
runtime, for each read or write in the buffer, whether it is in bounds. Almost
every language that comes with a runtime uses runtime checking. For this
technique to be effective in general, writes through raw pointers must be
disallowed. Otherwise the security checks can be circumvented.
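
A minimal sketch of this idea in C could look as follows. The type
\mintinline{c}{struct checked_buf} and the two accessor functions are
hypothetical names chosen for illustration; languages with built-in bounds
checks generate equivalent checks automatically.

\begin{minted}{c}
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the buffer travels together with its size. */
struct checked_buf {
  char *data;
  size_t len;
};

/* Writes value at index i only if i is in bounds. */
bool checked_write(struct checked_buf *b, size_t i, char value) {
  if (i >= b->len) return false; /* out of bounds: refuse the write */
  b->data[i] = value;
  return true;
}

/* Reads index i into *out only if i is in bounds. */
bool checked_read(const struct checked_buf *b, size_t i, char *out) {
  if (i >= b->len) return false; /* out of bounds: refuse the read */
  *out = b->data[i];
  return true;
}
\end{minted}

The sketch only helps as long as all accesses go through the accessors. Code
that uses \mintinline{c}{b->data} directly bypasses the check, which is the
raw pointer problem described above.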

\subsection{Prevent/Detect Overriding Return Address}

Since most traditional \ac{bof} exploits work by overwriting the return address
in the current stack frame, preventing or at least detecting this can be quite
effective without much overhead at runtime. \citeauthor{Rad2001} describe a
technique that stores a redundant copy of the return address in a secure memory
area that is guarded by read-only memory, so it cannot be overwritten by
overflows. When returning, the copy of the return address is compared to the one
in the current stack frame, and the \mintinline{ASM}{RET} instruction is
executed only if they match~\cite{Rad2001}. While this is effective against
\ac{rop} based exploits, it does not protect against vtable overwrites.
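
The compare-on-return logic can be modelled in plain C as shown below. This is
a simplified simulation and not the implementation from~\cite{Rad2001}: the
names \mintinline{c}{shadow_push} and \mintinline{c}{shadow_check_pop} are
hypothetical, and the copy is kept in an ordinary array instead of a memory area
guarded by read-only pages.

\begin{minted}{c}
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64

/* Redundant copies of return addresses. Assumption: in a real system this
 * area would be protected from overflows; here it is a plain array. */
static uintptr_t shadow[SHADOW_DEPTH];
static size_t shadow_top = 0;

/* Would run in the function prologue: remember the return address. */
bool shadow_push(uintptr_t ret_addr) {
  if (shadow_top == SHADOW_DEPTH) return false;
  shadow[shadow_top++] = ret_addr;
  return true;
}

/* Would run in the function epilogue: returns true only if the address
 * found in the stack frame still matches the saved copy. */
bool shadow_check_pop(uintptr_t ret_addr_on_stack) {
  if (shadow_top == 0) return false;
  return shadow[--shadow_top] == ret_addr_on_stack;
}
\end{minted}

If the check fails, the program would be terminated instead of executing the
return.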

An older technique from 1998 proposes to put a canary word between the data of a
stack frame and the return address~\cite{Stackguard1998}. When returning, the
canary is checked; if it is no longer intact, a \ac{bof} has occurred. This
technique is used in major operating systems %TODO
but can be defeated if there is an information leak that reveals the canary to
the attacker. The attacker is then able to construct a payload that keeps the
canary intact.
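
The canary check can be illustrated with a simulated stack frame. The sketch
below models the frame as a byte array so that the overflow stays within defined
behaviour; \mintinline{c}{struct frame}, the offsets and the function names are
assumptions made for this example and do not reflect the layout a compiler
actually generates.

\begin{minted}{c}
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN 16
#define CANARY_OFF BUF_LEN
#define RET_OFF (BUF_LEN + sizeof(uint64_t))

/* Simulated frame: [buffer][canary][return address]. */
struct frame {
  unsigned char bytes[BUF_LEN + 2 * sizeof(uint64_t)];
};

void frame_init(struct frame *f, uint64_t canary, uint64_t ret) {
  memset(f->bytes, 0, sizeof f->bytes);
  memcpy(f->bytes + CANARY_OFF, &canary, sizeof canary);
  memcpy(f->bytes + RET_OFF, &ret, sizeof ret);
}

/* Copy without checking against BUF_LEN, like the vulnerable program.
 * The caller must keep n <= sizeof f->bytes to stay inside the model. */
void frame_copy(struct frame *f, const void *src, size_t n) {
  memcpy(f->bytes, src, n);
}

/* Would run before returning: has the canary been overwritten? */
bool canary_intact(const struct frame *f, uint64_t expected) {
  uint64_t c;
  memcpy(&c, f->bytes + CANARY_OFF, sizeof c);
  return c == expected;
}
\end{minted}

An input that reaches the return address necessarily overwrites the canary
first. An attacker who has learned the canary value can place it at
\mintinline{c}{CANARY_OFF} in the input, in which case the check passes even
though the return address was changed.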

\subsection{Restricting Language Features to a Secure Subset}
\subsection{Static Analysis}
\subsection{Type System Solutions}

\citeauthor{Dep2007} propose an extension to the C type system that adds
dependent types. These types have an associated value, e.g.\ a pointer type
can have the buffer size associated with it. This prevents indexing into a
buffer with out-of-bounds values. The extension is a superset of C, so any valid
C code can be compiled using the extension and the codebase can be improved
incrementally. If the type extension is advanced enough, the additional
information might form the basis for formal verification.

\subsection{Address Space Layout Randomization}

\Ac{aslr} aims to prevent exploitation of \acp{bof} by placing code at random
locations in memory. That way, it is not trivial to set the return address to
point to the payload in memory. This is effective against generic exploits, but
it is still possible to exploit \ac{bof} vulnerabilities in combination with
information leaks or other techniques like heap spraying. Also, on 32-bit
systems the address space is small enough to mount a brute-force attack until
the payload in memory is hit.

\subsection{w\^{}x Memory}

w\^{}x (also known as \ac{nx}) makes memory either writable or executable, but
never both. That way, an attacker cannot place arbitrary payloads in memory and
execute them. There are still techniques to circumvent this by reusing existing
executable code. The ret-to-libc technique uses existing libc functions with
attacker-controlled parameters, e.g.\ if the program uses the
\mintinline{shell}{system} function, the attacker can plant
\mintinline{shell}{/bin/sh} as a parameter on the stack, followed by the address
of \mintinline{shell}{system}, and get a shell on the system. \Ac{rop} (a
superset of ret-to-libc exploits) uses so-called \ac{rop} gadgets, combinations
of memory-modifying instructions followed by the \mintinline{ASM}{RET}
instruction, to build instruction chains that execute the desired shellcode.
This is done by placing the desired return addresses in the right order on the
stack, which reuses the existing code to circumvent the w\^{}x protection. These
sequences of gadgets are called \ac{rop} chains and are
Turing-complete~\cite{Rop2007}, so in theory it is possible to implement any
imaginable payload, as long as the exploited program contains enough gadgets and
the overflowing buffer has enough space.
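
The principle of chaining existing code can be illustrated with a toy model in
C. This is a simulation under simplifying assumptions, not an exploit: the
gadgets are ordinary functions with hypothetical names, and the array passed to
\mintinline{c}{run_chain} plays the role of the overwritten stack, where each
entry is the next \enquote{return address}.

\begin{minted}{c}
#include <stddef.h>

/* Toy machine state that the gadgets modify. */
struct state {
  long acc;
};

/* A gadget: a short piece of existing code that ends by returning. */
typedef void (*gadget)(struct state *);

void g_load_one(struct state *s)  { s->acc = 1; }
void g_double(struct state *s)    { s->acc *= 2; }
void g_add_three(struct state *s) { s->acc += 3; }

/* Executes the chain: returning from one gadget transfers control to the
 * next entry, just like RET pops the next address from the stack. */
long run_chain(const gadget *chain, size_t n) {
  struct state s = {0};
  for (size_t i = 0; i < n; i++) {
    chain[i](&s);
  }
  return s.acc;
}
\end{minted}

No new code is introduced in this model. The attacker only controls the order
of the entries in the chain, yet different orders compute different results
from the same three gadgets.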


\section{Discussion}\label{ref:discussion}

\subsection{Ineffective or Inefficient}

\subsubsection{\ac{aslr}}

\Ac{aslr} has been really effective and is widely used in production. It is
included in most major operating systems~\cite{FBSDaslr}. Some even use kernel
\ac{aslr}~\cite{Linuxaslr}. Since this mechanism is active at runtime, it does
not require any changes in the code itself; the program only has to be compiled
as a \ac{pie}. On 32-bit CPUs, only 16 bits of the address are randomized. These
16 bits can be brute-forced in a few minutes or seconds~\cite{AslrEffective2004}.
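
A rough estimate shows why 16 bits of randomness offer little protection. With
$N$ equally likely layouts, the expected number of attempts depends on whether
the layout stays fixed between attempts (each wrong guess rules out one
candidate) or is re-randomized after every attempt:
\begin{align*}
  N &= 2^{16} = 65536, \\
  \mathbb{E}[\text{attempts, fixed layout}] &= \frac{N + 1}{2} \approx 32768, \\
  \mathbb{E}[\text{attempts, re-randomized}] &= N = 65536.
\end{align*}
Assuming a purely hypothetical rate of 300 attempts per second, this corresponds
to roughly $32768 / 300 \approx 109$ seconds and $65536 / 300 \approx 218$
seconds respectively. The achievable rate depends on the target and is an
assumption here, not a measured value.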

\subsubsection{w\^{}x}

With the rise of \ac{rop} techniques, w\^{}x protection has been shown to be
ineffective on its own. It makes vulnerabilities harder to exploit but does not
prevent exploitation.

\subsubsection{Runtime Bounds Checks}

Checking for overflows at runtime is very effective but can have a large
performance impact, so it is not feasible in every case. It also comes with its
own pitfalls: integer overflows when calculating the bounds can introduce new
vulnerabilities.
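
The following sketch illustrates the integer overflow pitfall for a range check
on an access of \mintinline{c}{len} bytes at offset \mintinline{c}{off} into a
buffer of \mintinline{c}{size} bytes. Both function names are hypothetical.

\begin{minted}{c}
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Naive check: off + len can wrap around and then pass the comparison. */
bool in_bounds_naive(size_t off, size_t len, size_t size) {
  return off + len <= size;
}

/* Overflow-safe check: never computes off + len. The subtraction cannot
 * wrap because off <= size is tested first. */
bool in_bounds_safe(size_t off, size_t len, size_t size) {
  return off <= size && len <= size - off;
}
\end{minted}

For \mintinline{c}{off = SIZE_MAX} and \mintinline{c}{len = 2}, the sum wraps
to 1, so the naive check accepts the access for any non-empty buffer, while the
safe variant rejects it.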

Methods that have been shown to be ineffective (e.g.\ can be circumvented
easily) or inefficient (too much runtime overhead)\ldots

\subsection{State of the Art}

What techniques are currently used?

\subsection{Outlook}


\section{Conclusion}\label{ref:conclusion}

While there are many techniques that protect against different types of
\acp{bof}, none of them is effective in every situation. Maybe we have come to a
point where we have to stop using memory-unsafe languages wherever this can be
avoided. There are many modern programming languages that target the same
problem space as C, C++ or Fortran but without the issues stemming
from these languages. If it is feasible to use a garbage collector, Go might
work just fine. If real-time properties are required, Rust could be the way to
go, as it has no language runtime and offers deterministic memory management.
For any other problem, almost any other memory-safe language is better than
using unsafe C.

\section{Sources (Dummy Section for Deadline)}

\begin{itemize}

  \item RAD:\ A Compile-Time Solution to Buffer Overflow Attacks~\cite{Rad2001}
    (might not protect against e.g.\ vtable overrides, \ac{plt} address changes,
    \dots)

  \item Dependent types for low-level programming~\cite{Dep2007}

  \item StackGuard: Automatic Adaptive Detection and Prevention of
    Buffer-Overflow Attacks~\cite{Stackguard1998} (ineffective in combination
    with information leaks)

  \item Type-Assisted Dynamic Buffer Overflow Detection~\cite{TypeAssisted2002}

  \item On the Effectiveness of NX, SSP, RenewSSP, and \ac{aslr} against Stack
    Buffer Overflows~\cite{Effectiveness2014}

  \item What Do We Know About Buffer Overflow Detection?: A Survey on Techniques
    to Detect A Persistent Vulnerability~\cite{Detection2018}

  \item Survey of Attacks and Defenses on Stack-based Buffer Overflow
    Vulnerability~\cite{AtkDef2016}

  \item Beyond stack smashing: recent advances in exploiting buffer
    overruns~\cite{Smashing2004}

  \item Runtime countermeasures for code injection attacks against C and C++
    programs~\cite{Counter2012}

\end{itemize}


\printbibliography{}
% \bibliographystyle{IEEEtran}
% \bibliography{bibliography}
% \printacronyms{}

\end{document}
% vim: set filetype=tex ts=2 sw=2 tw=80 et spell :