\documentclass[conference]{IEEEtran}
\IEEEoverridecommandlockouts{}
% The preceding line is only needed to identify funding in the first footnote.
% If that is unneeded, please comment it out.
\usepackage{csquotes}
\usepackage[style=ieee,backend=biber]{biblatex}
\addbibresource{./bibliography.bib}
\usepackage{amsmath,amssymb,amsfonts}
\usepackage{algorithmic}
\usepackage{booktabs}
\usepackage{graphicx}
\usepackage{textcomp}
\usepackage{xcolor}
\usepackage{caption}
\usepackage{subcaption}

% code listings
\usepackage{minted}
\usepackage{relsize}

% acronyms
\usepackage{acro}
\acsetup{list-long-format=\capitalisewords}

% additional packages
%\usepackage[ngerman]{babel}
\usepackage[utf8]{inputenc}
\usepackage{hyperref}
\usepackage{url}

%% for abbreviations: begin
\usepackage[acronym,hyperfirst = false]{glossaries}
\glsdisablehyper{}
%\usepackage[acronym,acronymlists={main,
%abbreviationlist},shortcuts,toc,description,footnote]{glossaries}
\newglossary[clg]{abbreviationlist}{cyi}{cyg}{List of Abbreviations}
\newglossary[slg]{symbolslist}{syi}{syg}{Symbols}
\renewcommand{\firstacronymfont}[1]{\emph{#1}}
\renewcommand*{\glspostdescription}{} % remove the period at the end of each description
\renewcommand*{\acrnameformat}[2]{#2 (\acronymfont{#1})} % long form of the acronyms
\makeglossaries{}
\date{\today}
\input{glossary}
%% for abbreviations: end

\include{acronyms}

\begin{document}

\title{Overview of Attack Vectors and Countermeasures for Buffer Overflows}

\author{\IEEEauthorblockN{Valentin Brandl}
  \IEEEauthorblockA{\textit{Faculty of Computer Science and Mathematics} \\
    \textit{OTH Regensburg}\\
    Regensburg, Germany \\
    valentin.brandl@st.oth-regensburg.de\\
    MatrNr. 3220018}
}

\maketitle

\begin{abstract}
  TODO
\end{abstract}

\begin{IEEEkeywords}
  Buffer Overflow, Software Security
\end{IEEEkeywords}

\section{Motivation}\label{ref:motivation}

When the first programming languages were designed, memory had to be managed
manually to make the best use of slow hardware.
This opened the door for many kinds of programming errors. Memory can be
deallocated more than once (double free), or the program can read or write
outside the bounds of a buffer (information leaks, \acp{bof}). Affected
languages include, e.g., C, C++ and Fortran. These languages are still used in
critical parts of the world's infrastructure, either because they allow the
implementation of highly performant programs, because they power legacy systems
or for portability reasons. Scientists and software engineers have proposed many
solutions to this problem over the years, and this paper aims to compare them
and give an overview of them.

Reading out of bounds can result in an information leak and is less critical
than a \ac{bof} in most cases, but there are exceptions, e.g.\ the Heartbleed
bug in OpenSSL, which allowed dumping secret keys from memory. Out-of-bounds
writes are almost always critical and result in code execution vulnerabilities
or at least application crashes.

In 2018, 14\% (2368 out of 16556)~\cite{Cve2018} of all software vulnerabilities
that have a CVE assigned were overflow related. This shows that, even though
this type of bug is very old and well known, it is still relevant today.

\section{Background}\label{ref:background}

% TODO: many references

\subsection{Technical Details}

Exploitation of \ac{bof} vulnerabilities almost always works by overwriting the
return address in the current stack frame, so that when the
\mintinline{ASM}{RET} instruction is executed, an attacker-controlled address is
moved into the instruction pointer register and the code this address points to
is executed. Other approaches include overwriting addresses in the \ac{plt} of a
binary so that, if a linked function is called, an attacker-controlled function
is called instead, or (in C++) overwriting the vtable in which the pointers to
an object's methods are stored.
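
The effect of overwriting a saved return address can be illustrated with a small
model that stays within defined behaviour. The following is a minimal sketch,
not an exploit: it assumes a hypothetical frame layout in which a 16-byte buffer
is directly followed by the saved instruction pointer. The names
\mintinline{c}{frame_model}, \mintinline{c}{unchecked_copy} and
\mintinline{c}{BUF_SIZE} are chosen for illustration only; real stack layouts
depend on the compiler and the ABI.

\begin{minted}{c}
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16

/* Hypothetical model of a stack frame: a local buffer followed by the
 * saved instruction pointer. Real layouts are compiler and ABI specific. */
struct frame_model {
  unsigned char buf[BUF_SIZE];
  uintptr_t saved_ip;
};

/* Copies len bytes to the start of the frame without checking len against
 * BUF_SIZE, mirroring the missing bounds check in vulnerable code.
 * The write is clamped to the size of the model, so the simulation itself
 * never leaves the struct object.
 * Returns 1 if the saved instruction pointer changed, 0 otherwise. */
int unchecked_copy(struct frame_model *frame,
                   const unsigned char *input, size_t len) {
  assert(frame != NULL && input != NULL);
  uintptr_t before = frame->saved_ip;
  if (len > sizeof *frame) {
    len = sizeof *frame; /* keep the simulation inside the model */
  }
  memcpy((unsigned char *)frame, input, len);
  return frame->saved_ip != before;
}
\end{minted}

In this model, copying \mintinline{c}{BUF_SIZE} bytes or fewer leaves
\mintinline{c}{saved_ip} untouched, whereas a longer input replaces it with
input bytes. This is the situation an attacker aims for in a real stack frame.
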
A simple vulnerable program might look like this:

\begin{minted}{c}
#include <string.h>

int main(int argc, char **argv) {
  if (argc < 2) return 1;
  char buf[50];
  /* Intentionally vulnerable: the loop is bounded by the input length,
   * not by sizeof(buf), so longer inputs are written past the buffer. */
  for (size_t i = 0; i < strlen(argv[1]); i++) {
    buf[i] = argv[1][i];
  }
  return 0;
}
\end{minted}

A successful exploit places the payload in memory by supplying it as an argument
to the program and eventually overwrites the return address by providing an
input longer than 50 bytes, which is written out of bounds. When the
\mintinline{C}{return} statement is executed, control jumps into the payload and
the attacker's code is executed.

This works because of the way function calls are implemented on CPUs. The stack
frame of the current function lies between the two pointers \ac{bp} and \ac{sp},
as shown in Fig.~\ref{fig:before}. When a function is called, the values of the
\ac{bp}, \ac{sp} and \ac{ip} are pushed onto the stack (Fig.~\ref{fig:call}) and
the \ac{ip} is set to the address of the called function. When the function
returns, the old \ac{ip} is restored from the stack and execution continues
from where the function was called. If an overflow overwrites the old \ac{ip}
(Fig.~\ref{fig:exploit}), execution continues in attacker-controlled code.

\begin{figure}[h!]
  \includegraphics[width=.3\textwidth]{./dot/before.pdf}
  \caption{Stack layout before function call}\label{fig:before}
\end{figure}%
\begin{figure}[h!]
  \includegraphics[width=.3\textwidth]{./dot/call.pdf}
  \caption{Stack layout after function call}\label{fig:call}
\end{figure}%
\begin{figure}[h!]
  \includegraphics[width=.3\textwidth]{./dot/exploit.pdf}
  \caption{Stack layout after overflow}\label{fig:exploit}
\end{figure}

This is only one of several types and exploitation techniques, but the general
idea stays the same: overwrite the return address or some kind of function
pointer (e.g.\ in vtables or the \ac{plt}); once that function is called, the
execution flow is hijacked and the attacker can execute arbitrary code.

The most trivial kind of exploit is known as a \mintinline{ASM}{NOP} sled.
Here the attacker prepends as many \mintinline{ASM}{NOP} instructions as
possible to the shellcode (e.g.\ to invoke \mintinline{shell}{/bin/sh}) and
points the overwritten \ac{ip} somewhere inside the \mintinline{ASM}{NOP}s. The
execution \enquote{slides} through the \mintinline{ASM}{NOP}s until it reaches
the shellcode. Most of the mitigation techniques described in this paper protect
against this kind of exploit, but there are different and more complex ways of
exploiting \acp{bof} that are not as easily mitigated.

\subsection{Implications}

\section{Concept and Methods}\label{ref:concept}

\subsection{Methods}

This paper describes several techniques that have been proposed to fix the
problems introduced by \acp{bof}. The techniques are compared with regard to
their performance impact, their effectiveness (e.g.\ did the technique actually
prevent exploitation of \acp{bof}?) and how realistic it is for developers to
use the technique in real-world code (e.g.\ is incremental introduction into an
existing codebase possible?). The paper closes with a discussion of the current
state.

\subsection{Runtime Bounds Checks}

The easiest and maybe single most effective method to prevent \acp{bof} is to
check whether a read or write operation is out of bounds. This requires storing
the size of a buffer together with the pointer to the buffer and checking at
runtime, for each read or write, whether the access is in bounds. Almost every
language that comes with a runtime uses runtime checks.

For this technique to be effective in general, writes through a raw pointer must
be disallowed. Otherwise the security checks can be circumvented.

\subsection{Preventing or Detecting Return Address Overwrites}

Since most traditional \ac{bof} exploits work by overwriting the return address
in the current stack frame, preventing or at least detecting this can be quite
effective without much overhead at runtime.
\citeauthor{Rad2001} describe a technique that stores a redundant copy of the
return address in a secure memory area that is guarded by read-only memory, so
it cannot be overwritten by overflows. When returning, the copy of the return
address is compared to the one in the current stack frame, and the
\mintinline{ASM}{RET} instruction is only executed if both
match~\cite{Rad2001}. While this is effective against \ac{rop} based exploits,
it does not protect against vtable overwrites.

An older technique from 1998 proposes putting a canary word between the data of
a stack frame and the return address~\cite{Stackguard1998}. When returning, the
canary is checked; if it is no longer intact, a \ac{bof} has occurred. This
technique is used in major operating systems %TODO
but can be defeated if an information leak reveals the canary to the attacker.
The attacker is then able to construct a payload that keeps the canary intact.

\subsection{Restricting Language Features to a Secure Subset}

\subsection{Static Analysis}

\subsection{Type System Solutions}

\citeauthor{Dep2007} propose extending the C type system with dependent types.
These types have an associated value, e.g.\ a pointer type can have the buffer
size associated with it. This prevents indexing into a buffer with out-of-bounds
values. The extension is a superset of C, so any valid C code can be compiled
using the extension and the codebase can be improved incrementally. If the type
extension is advanced enough, the additional information might form the basis
for formal verification.

\subsection{Address Space Layout Randomization}

\Ac{aslr} aims to prevent exploitation of \acp{bof} by placing code at random
locations in memory. That way, it is not trivial to set the return address to
point to the payload in memory.
This is effective against generic exploits, but it is still possible to exploit
\ac{bof} vulnerabilities in combination with information leaks or other
techniques like heap spraying. Also, on 32-bit systems, the address space is
small enough to brute-force the address until the payload in memory is hit.

\subsection{w\^{}x Memory}

w\^{}x (also known as \ac{nx}) makes memory either writable or executable, but
never both. That way, an attacker cannot place arbitrary payloads in memory and
execute them. There are still techniques to circumvent this by reusing existing
executable code. The ret-to-libc technique uses existing calls into the libc
with attacker-controlled parameters, e.g.\ if the program uses the
\mintinline{shell}{system} function, the attacker can plant
\mintinline{shell}{/bin/sh} as parameter on the stack, followed by the address
of \mintinline{shell}{system}, and get a shell on the system.

\ac{rop} (a superset of ret-to-libc exploits) uses so-called \ac{rop} gadgets,
combinations of memory-modifying instructions followed by the
\mintinline{ASM}{RET} instruction, to build instruction chains that execute the
desired shellcode. This is done by placing the desired return addresses in the
right order on the stack, which reuses the existing code to circumvent the
w\^{}x protection. Sequences of such gadgets are called \ac{rop} chains and are
Turing complete~\cite{Rop2007}, so in theory it is possible to implement any
imaginable payload, as long as the exploited program contains enough gadgets and
the overflowing buffer has enough space.

\section{Discussion}\label{ref:discussion}

\subsection{Ineffective or Inefficient}

\subsubsection{\ac{aslr}}

\Ac{aslr} has been really effective and is widely used in production. It is
included in most major operating systems~\cite{FBSDaslr}. Some even use kernel
\ac{aslr}~\cite{Linuxaslr}.
Since this mechanism is active at runtime, it does not require any changes to
the code itself; the program only has to be compiled as a \ac{pie}. On 32-bit
CPUs, only 16 bits of the address are randomized. These 16 bits can be
brute-forced in a few minutes or seconds~\cite{AslrEffective2004}.

\subsubsection{w\^{}x}

With the rise of \ac{rop} techniques, w\^{}x protection has been shown to be
ineffective. It makes vulnerabilities harder to exploit but does not prevent
exploitation.

\subsubsection{Runtime Bounds Checks}

Checking for overflows at runtime is very effective but can have a huge
performance impact, so it is not feasible in every case. It also comes with
other footguns: integer overflows can occur when calculating the bounds, which
might introduce other problems.

Methods that have been shown to be ineffective (e.g.\ can be circumvented
easily) or inefficient (too much runtime overhead)\ldots

\subsection{State of the Art}

What techniques are currently used?

\subsection{Outlook}

\section{Conclusion}\label{ref:conclusion}

While there are many techniques that protect against different types of
\acp{bof}, none of them is effective in every situation. Maybe we have come to a
point where we have to stop using memory-unsafe languages wherever their use is
not unavoidable. There are many modern programming languages that aim for the
same problem space as C, C++ or Fortran but without the issues stemming from
these languages. If it is feasible to use a garbage collector, Go might work
just fine. If real-time properties are required, Rust could be the way to go,
without any language runtime and with deterministic memory management. For any
other problem, almost any other memory-safe language is better than using
unsafe C.

\section{Sources (Dummy Section for Deadline)}

\begin{itemize}
  \item RAD:\ A Compile-Time Solution to Buffer Overflow Attacks~\cite{Rad2001}
    (might not protect against e.g.\ vtable overwrites, \ac{plt} address
    changes, \dots)
  \item Dependent types for low-level programming~\cite{Dep2007}
  \item StackGuard: Automatic Adaptive Detection and Prevention of
    Buffer-Overflow Attacks~\cite{Stackguard1998} (ineffective in combination
    with information leaks)
  \item Type-Assisted Dynamic Buffer Overflow Detection~\cite{TypeAssisted2002}
  \item On the Effectiveness of NX, SSP, RenewSSP, and \ac{aslr} against Stack
    Buffer Overflows~\cite{Effectiveness2014}
  \item What Do We Know About Buffer Overflow Detection?: A Survey on Techniques
    to Detect A Persistent Vulnerability~\cite{Detection2018}
  \item Survey of Attacks and Defenses on Stack-based Buffer Overflow
    Vulnerability~\cite{AtkDef2016}
  \item Beyond stack smashing: recent advances in exploiting buffer
    overruns~\cite{Smashing2004}
  \item Runtime countermeasures for code injection attacks against C and C++
    programs~\cite{Counter2012}
\end{itemize}

\printbibliography{}
% \bibliographystyle{IEEEtran}
% \bibliography{bibliography}

% \printacronyms{}

\end{document}

% vim: set filetype=tex ts=2 sw=2 tw=80 et spell :