Content

2022-04-08 01:43:49 +02:00
parent e7d8c3131d
commit 48eb7891af
8 changed files with 106 additions and 44 deletions
--- a/.gitignore
+++ b/.gitignore
@ -1,4 +1,3 @@
-_minted-paper/
 *.acr
 *.aux
 *.bbl
@ -12,7 +11,9 @@ _minted-paper/
 *.out
 *.pdf
 *.run.xml
+*.tdo
 *.toc
+_minted-paper/

 result
 _minted-report
--- a/acronyms.tex
+++ b/acronyms.tex
@ -90,4 +90,9 @@
  long  = {Message-Digest Algorithm 5},
 }

+\DeclareAcronym{mm}{
+  short = {MM},
+  long  = {Membership Management},
+}
+
 % vim: set filetype=tex ts=2 sw=2 tw=0 et :
--- a/appendix.tex
+++ b/appendix.tex
@ -1,10 +1,5 @@
 \appendix

-% TODO: add to table of contents?
-\printbibliography[heading=bibintoc]{}
-
-\clearpage
-
 % TODO: add to table of contents?
 \addcontentsline{toc}{section}{List of Figures}
 \listoffigures
@ -22,6 +17,11 @@

 \clearpage

+% TODO: add to table of contents?
+\printbibliography[heading=bibintoc]{}
+
+\clearpage
+
 \begin{otherlanguage}{ngerman}
  \makedeclaration{}
 \end{otherlanguage}
--- a/bibliography.bib
+++ b/bibliography.bib
@ -377,4 +377,19 @@
  series = {LEET'12}
 }

+@article{bib:wangCollisions,
+  title={Collisions for hash functions MD4, MD5, HAVAL-128 and RIPEMD},
+  author={Wang, Xiaoyun and Feng, Dengguo and Lai, Xuejia and Yu, Hongbo},
+  journal={Cryptology EPrint Archive},
+  year={2004}
+}
+
+@misc{bib:stevensCollision,
+  author       = {Marc Stevens},
+  title        = {Fast Collision Attack on MD5},
+  howpublished = {Cryptology ePrint Archive, Report 2006/104},
+  year         = {2006},
+  note         = {\url{https://ia.cr/2006/104}},
+}
+
 /* vim: set filetype=bib ts=2 sw=2 tw=0 et :*/
--- a/content.tex
+++ b/content.tex
@ -7,7 +7,7 @@ We use the internet to communicate, shop, handle financial transactions, and muc
 Many personal and professional workflows depend on the internet so heavily that they do not work offline, and the pandemic we are living through has made this dependency even stronger.

 %{{{ motivation
-\subsection{Motivation}
+% \subsection{Motivation}

 In 2021 there were around 10 billion internet connected \ac{iot} devices and this number is estimated to more than double over the next years up to 25 billion in 2030~\cite{bib:statista_iot_2020}.
 Many of these devices run on outdated software, don't receive regular updates, and don't follow general security best practices.
@ -44,26 +44,43 @@ To complicate take-down attempts, botnet operators came up with a number of idea
 %}}}fig:c2vsp2p

 A number of botnet operations were shut down like this~\cite{bib:nadji_beheading_2013} and as the defenders upped their game, so did attackers---the concept of \ac{p2p} botnets emerged.
-The idea is to build a decentralized network without \acp{spof} in the form of \ac{c2} servers as shown in \autoref{fig:p2p}.
+The idea is to build a distributed network without \acp{spof} in the form of \ac{c2} servers as shown in \autoref{fig:p2p}.
 In a \ac{p2p} botnet, each node in the network knows a number of its neighbors and connects to those, each of these neighbors has a list of neighbors on its own, and so on.
 The bot master only needs to join the network to send new commands or receive stolen data.
 Any of the nodes in \autoref{fig:p2p} could be the bot master but they don't even have to be online all the time since the peers will stay connected autonomously.
-In fact there have been arrests of operators of \ac{p2p} botnets but due to the autonomy offered by the decentralized approach, the botnet keeps communicating~\cite{bib:netlab_mozi}.
+In fact there have been arrests of operators of \ac{p2p} botnets but due to the autonomy offered by the distributed approach, the botnet keeps communicating~\cite{bib:netlab_mozi}.
 Especially for worm-like botnets, where each peer tries to find and infect other systems, the network can keep lingering for many years.

 This lack of a \ac{spof} makes \ac{p2p} botnets more resilient to take-down attempts since the communication is not stopped and bot masters can easily rejoin the network and send commands.

-The constantly growing damage produced by botnets has many researchers and law enforcement agencies trying to shut down these operations~\cite{bib:nadji_beheading_2013, bib:nadji_still_2017, bib:dittrich_takeover_2012, bib:fbiTakedown2014}.
-The monetary value of these botnets directly correlates with the amount of effort bot masters are willing to put into implementing defense mechanisms against take-down attempts.
-Some of these countermeasures include deterrence, which limits the number of allowed bots per IP address or subnet to 1; blacklisting, where known crawlers and sensors are blocked from communicating with other bots in the network (mostly IP based); disinformation, when fake bots are placed in the peer lists, which invalidates the data collected by crawlers; and active retaliation like \ac{ddos} attacks against sensors or crawlers~\cite{bib:andriesse_reliable_2015}.
-
-Successful take-downs of a \ac{p2p} botnet requires intricate knowledge over the network topology, protocol characteristics and participating peers.
-This work aims to make the monitoring and information gathering phase more efficient and resilient to detection.
-
 %}}} motivation

+\section{Background}
+
 %{{{ formal model
-\subsection{Formal Model of a \Acs*{p2p} Botnet}
+\subsection{Definition and Formal Model of \Acs*{p2p} Botnets}
+
+Botnets consist of infected computers, so-called \textit{bots}, controlled by a \textit{botmaster}.
+Bots can be split into two distinct groups according to their reachability: publicly reachable peers, also known as \textit{superpeers}, and those that are not (\eg{} because they are behind a \ac{nat} router or firewall).
+In contrast to centralized botnets with a fixed set of \ac{c2} servers, in a \ac{p2p} botnet, every superpeer might take the role of a \ac{c2} server.
+
+As there is no well-known server in a \ac{p2p} botnet, the bots have to coordinate autonomously.
+This is achieved by connecting the bots among each other.
+Bot \textit{B} is considered a \textit{neighbor} of bot \textit{A} if \textit{A} knows and connects to \textit{B}.
+Since bots can become unavailable, they have to permanently update their neighbor lists to avoid losing their connection into the botnet.
+This is achieved by periodically querying their neighbors for their neighbor lists.
+This process is known as \textit{\ac{mm}}.
+
+The concept of \textit{churn} describes when a bot becomes unavailable.
+There are two types of churn:
+
+\begin{itemize}
+
+  \item \textit{IP churn}: A bot becomes unreachable because it got a new IP address assigned. The bot is still available but under another address.
+
+  \item \textit{Device churn}: The device is actually offline, \eg{} because the infection was cleaned, it got shut down or lost its internet connection.
+
+\end{itemize}

 A \ac{p2p} botnet can be modelled as a digraph

@ -75,14 +92,14 @@ With the set of vertices \(V\) describing the peers in the network and the set o

 \(G\) is not required to be a connected graph but might consist of multiple disjoint components~\cite{bib:rossow_sok_2013}. Components consisting of peers, that are infected by the same bot, are considered part of the same graph.

-\(\forall v \in V\), the \textbf{predecessors} \(\text{pred}(v)\) and \textbf{successors} \(\text{succ}(v)\) are defined as:
+For a bot \(v \in V\), the \textit{predecessors} (neighbors) \(\text{pred}(v)\) and \textit{successors} \(\text{succ}(v)\) are defined as:

 \begin{align*}
  \text{succ}(v) &= \{ u \in V \mid (u, v) \in E \} \\
  \text{pred}(v) &= \{ u \in V \mid (v, u) \in E \}
 \end{align*}

-The set of edges \(\text{pred}(v)\) is also called the \textbf{peer list} of \(v\).
+The set of nodes \(\text{pred}(v)\) is also called the \textit{peer list} of \(v\).
 These are the nodes a peer connects to in order to request new commands and other peers.

 For a vertex \(v \in V\), the in and out degree \(\deg^{+}\) and \(\deg^{-}\) describe how many bots know \(v\) or are known by \(v\) respectively.
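The model above can be illustrated with a small data structure. The following is a minimal sketch, not the implementation used in this work; the names \texttt{Graph}, \texttt{AddEdge}, \texttt{PeerList} and \texttt{KnownBy} are hypothetical. It assumes that an edge \((a, b) \in E\) means that bot \(a\) has \(b\) in its peer list, so that \texttt{PeerList(v)} corresponds to \(\text{pred}(v)\) and \texttt{KnownBy(v)} to \(\text{succ}(v)\) as defined above.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph is a directed graph of bots (hypothetical helper type).
// Assumption: an edge (a, b) means that bot a has b in its peer list.
type Graph struct {
	out map[string]map[string]struct{} // v -> {u | (v, u) in E}
	in  map[string]map[string]struct{} // v -> {u | (u, v) in E}
}

// NewGraph returns an empty graph.
func NewGraph() *Graph {
	return &Graph{
		out: make(map[string]map[string]struct{}),
		in:  make(map[string]map[string]struct{}),
	}
}

// AddEdge records that bot from knows bot to. Duplicate edges are ignored.
func (g *Graph) AddEdge(from, to string) {
	if g.out[from] == nil {
		g.out[from] = make(map[string]struct{})
	}
	if g.in[to] == nil {
		g.in[to] = make(map[string]struct{})
	}
	g.out[from][to] = struct{}{}
	g.in[to][from] = struct{}{}
}

// sortedKeys returns the members of a set in a stable order.
func sortedKeys(set map[string]struct{}) []string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// PeerList returns all bots known by v, i.e. pred(v) in the notation above.
func (g *Graph) PeerList(v string) []string { return sortedKeys(g.out[v]) }

// KnownBy returns all bots that know v, i.e. succ(v) in the notation above.
func (g *Graph) KnownBy(v string) []string { return sortedKeys(g.in[v]) }

func main() {
	g := NewGraph()
	g.AddEdge("A", "B")
	g.AddEdge("A", "C")
	g.AddEdge("B", "C")

	// The two degrees of a bot are the sizes of these two sets.
	fmt.Println("peer list of A:", g.PeerList("A"))
	fmt.Println("bots that know C:", g.KnownBy("C"))
}
```

Keeping both directions in separate maps makes either degree of a bot a constant-time lookup, at the cost of storing every edge twice.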
@ -97,12 +114,12 @@ For a vertex \(v \in V\), the in and out degree \(\deg^{+}\) and \(\deg^{-}\) de
 %}}} formal model

 %{{{ detection techniques
-\subsection{Detection Techniques for \Acs*{p2p} Botnets}
+\subsection{Monitoring Techniques for \Acs*{p2p} Botnets}

 There are two distinct methods to map and get an overview of the network topology of a \ac{p2p} botnet:

 %{{{ passive detection
-\subsubsection{Passive Detection}
+\subsubsection{Passive Monitoring}

 For passive detection, traffic flows are analysed in large amounts of collected network traffic (\eg{} from \acp{isp}).
 This has some advantages in that it is not possible for bot masters to detect or prevent data collection of that kind, but it is not trivial to distinguish valid \ac{p2p} application traffic (\eg{} BitTorrent, Skype, cryptocurrencies, \ldots) from \ac{p2p} bots.
@ -120,28 +137,45 @@ As most detection botnet mechanisms, also the passive ones work by building comm
 \end{itemize}
 \todo{no context}

+Passive monitoring is mentioned only for completeness and is not a topic of this thesis.
+
 %}}} passive detection

 %{{{ active detection
-\subsubsection{Active Detection}
+\subsubsection{Active Monitoring}

-In this case, a subset of the botnet protocol are reimplemented to place pseudo-bots or sensors in the network, which will only communicate with other nodes but won't accept or execute commands to perform malicious actions.
-The difference in behaviour from the reference implementation and conspicuous graph properties (\eg{} high \(\deg^{+}\) vs.\ low \(\deg^{-}\)) of these sensors allows bot masters to detect and block the sensor nodes.
+For active detection, a subset of the botnet protocol and behavior is reimplemented to take part in the network.
+To do so, samples of the malware are reverse engineered to understand and recreate the protocol.
+This partial implementation includes the communication part of the botnet but omits the malicious functionality so as not to support or take part in illicit activity.
+% The difference in behaviour from the reference implementation and conspicuous graph properties (\eg{} high \(\deg^{+}\) vs.\ low \(\deg^{-}\)) of these sensors allows bot masters to detect and block the sensor nodes.

-There are three subtypes of active detection:
+There are two subtypes of active detection: \textit{sensors} wait to be contacted by other peers, while \textit{crawlers} actively query known bots and recursively ask for their neighbors~\cite{bib:karuppayah_sensorbuster_2017}.
+Obviously crawlers can only detect superpeers and therefore only see a small subset of the network, while sensors are also contacted by peers in private networks and behind firewalls.
+To accurately monitor a \ac{p2p} botnet, a hybrid approach of crawlers and sensors is required.

-\begin{enumerate}
+A crawler starts with a predefined list of known bots, connects to those and uses the peer exchange mechanism to request other bots.
+Each found bot is crawled again, slowly building the graph of superpeers on the way.
+Every entry \textit{E} in the peer exchange response received from bot \textit{A} represents an edge from \textit{A} to \textit{E} in the graph.
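The crawling procedure described above amounts to a breadth-first search over the superpeers. The following is a minimal sketch under stated assumptions: \texttt{PeerExchange} is a hypothetical stand-in for the botnet-specific peer exchange request, and the network in \texttt{main} is simulated with a map instead of real connections.

```go
package main

import "fmt"

// Edge is one entry of a peer exchange response: bot From reported bot To.
type Edge struct {
	From, To string
}

// PeerExchange is a hypothetical stand-in for the botnet-specific request.
// It returns ok == false if the bot could not be reached (e.g. behind NAT).
type PeerExchange func(addr string) (peers []string, ok bool)

// Crawl starts from a list of known bots and visits every bot it learns
// about exactly once, collecting the edges of the graph on the way.
func Crawl(seeds []string, ask PeerExchange) []Edge {
	visited := make(map[string]bool)
	queue := append([]string(nil), seeds...)
	var edges []Edge

	for len(queue) > 0 {
		bot := queue[0]
		queue = queue[1:]
		if visited[bot] {
			continue
		}
		visited[bot] = true

		peers, ok := ask(bot)
		if !ok {
			continue // not a superpeer or offline: no outgoing edges known
		}
		for _, p := range peers {
			edges = append(edges, Edge{From: bot, To: p})
			if !visited[p] {
				queue = append(queue, p)
			}
		}
	}
	return edges
}

func main() {
	// Simulated peer lists; "n" is not publicly reachable.
	network := map[string][]string{
		"a": {"b", "c"},
		"b": {"a", "c"},
		"c": {"n"},
	}
	ask := func(addr string) ([]string, bool) {
		peers, ok := network[addr]
		return peers, ok
	}

	for _, e := range Crawl([]string{"a"}, ask) {
		fmt.Printf("%s -> %s\n", e.From, e.To)
	}
}
```

The sketch visits bots sequentially and only once; a real crawler would additionally need timeouts, concurrency and periodic re-crawling to cope with churn.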

-  \item Crawlers: recursively ask known bots for their peer lists
-
-  \item Sensors: implement a subset of the botnet protocol and become part of the network without performing malicious actions
-
-  \item Hybrid of crawlers and sensors
-
-\end{enumerate}
+A sensor implements the passive part of the botnet's \ac{mm}.
+Sensors cannot be used to create the botnet graph, since they only observe edges into the sensor node, but they are the only way to enumerate the whole network.

 %}}} active detection

+%{{{ anti-monitoring
+\subsubsection{Anti-Monitoring}
+\todo{good title}
+
+The constantly growing damage produced by botnets has many researchers and law enforcement agencies trying to shut down these operations~\cite{bib:nadji_beheading_2013, bib:nadji_still_2017, bib:dittrich_takeover_2012, bib:fbiTakedown2014}.
+The monetary value of these botnets directly correlates with the amount of effort bot masters are willing to put into implementing defense mechanisms against take-down attempts.
+
+Some of these countermeasures are explored by \citeauthor{bib:andriesse_reliable_2015} in \citetitle{bib:andriesse_reliable_2015} and include deterrence, which limits the number of allowed bots per IP address or subnet to 1; blacklisting, where known crawlers and sensors are blocked from communicating with other bots in the network (mostly IP based); disinformation, when fake bots are placed in the peer lists, which invalidates the data collected by crawlers; and active retaliation like \ac{ddos} attacks against sensors or crawlers~\cite{bib:andriesse_reliable_2015}.
+
+A successful take-down of a \ac{p2p} botnet requires intricate knowledge of the network topology, protocol characteristics and participating peers.
+In this work we try to find ways to make the monitoring and information gathering phase more efficient and resilient to detection.
+
+%}}} anti-monitoring
+
 %}}} detection techniques

 %%{{{ detection criteria
@ -180,7 +214,7 @@ The changes should allow the current sensors to use the new abstraction with as
 The final results should be as general as possible and not depend on any botnet's specific behaviour, but it assumes, that every \ac{p2p} botnet has some kind of \enquote{getPeerList} method in the protocol, that allows other peers to request a list of active nodes to connect to.

 In the current implementation, each crawler will itself visit and monitor each new node it finds.
-The idea for this work is to report newfound nodes back to the \ac{bms} backend first, where the graph of the known network is created, and a fitting worker is selected to archive the goal of the according coordination strategy.
+The idea for this work is to report newfound nodes back to the \ac{bms} backend first, where the graph of the known network is created, and a fitting worker is selected to achieve the goal of the respective coordination strategy.
 That sensor will be responsible to monitor the new node.

 If it is not possible, to select a specific sensor so that the monitoring activity stays inconspicuous, the coordinator can do a complete shuffle of all nodes between the sensors to restore the wanted graph properties or warn if more sensors are required to stay undetected.
@ -248,7 +282,7 @@ The following sharding strategy will be investigated:
 \begin{itemize}
  \item Round Robin. See~\autoref{sec:rr}

-  \item Assuming IP addresses are evenly distributed and so are infections, take the IP address as an \SI{32}{\bit} integer modulo \(\abs{C}\). See~\autoref{sec:ip_part}
+  \item Assuming IP addresses are evenly distributed and so are infections, take the IP address as a \SI{32}{\bit} integer modulo \(\abs{C}\). See~\autoref{sec:ipPart}
    Problem: reassignment if a crawler joins or leaves
 \end{itemize}

@ -340,14 +374,14 @@ for _, peer := range peers {
 \todo{reference for wrr}


-\subsubsection{IP-based Partitioning}\label{sec:ip_part}
+\subsubsection{IP-based Partitioning}\label{sec:ipPart}

 The output of cryptographic hash functions is uniformly distributed---even substrings of the calculated hash hold this property.
-Calculating the hash of an IP address and distributing the work with regard to \(\text{hash}(\text{IP}) \mod \abs{C}\) creates about evenly sized buckets for each worker to handle.
-This gives us the mapping \(m(i) = \text{hash}(i) \mod \abs{C}\) to sort peers into buckets.
+Calculating the hash of an IP address and distributing the work with regard to \(H(\text{IP}) \mod \abs{C}\) creates roughly evenly sized buckets for each worker to handle.
+For any hash function \(H\), this gives us the mapping \(m(i) = H(i) \mod \abs{C}\) to sort peers into buckets.

 Any hash function can be used but since it must be calculated often, a fast function should be used.
-While the \ac{md5} hash function must be considered broken for cryptographic use, it is faster to calculate than hash functions with longer output.
+While the \ac{md5} hash function must be considered broken for cryptographic use~\cite{bib:stevensCollision}, it is faster to calculate than hash functions with longer output.
 For the use case at hand, only the uniform distribution property is required, so \ac{md5} can be used without sacrificing any kind of security.

 This strategy can also be weighted using the crawlers' capabilities by modifying the list of available workers so that a worker can appear multiple times according to its weight.
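The weighting described above can be sketched as follows. This is only an illustration; \texttt{Worker} and \texttt{ExpandByWeight} are hypothetical names, and the weights are assumed to be small positive integers. A bucket index is then taken modulo the length of the expanded list instead of the number of workers.

```go
package main

import "fmt"

// Worker is a crawler with an integer weight describing its capabilities.
type Worker struct {
	Name   string
	Weight int
}

// ExpandByWeight returns a list in which every worker appears Weight times.
// Workers with a weight of zero or less do not appear at all.
func ExpandByWeight(workers []Worker) []string {
	var slots []string
	for _, w := range workers {
		for i := 0; i < w.Weight; i++ {
			slots = append(slots, w.Name)
		}
	}
	return slots
}

func main() {
	slots := ExpandByWeight([]Worker{
		{Name: "c1", Weight: 2},
		{Name: "c2", Weight: 1},
	})
	// A bucket index in [0, len(slots)) now selects c1 twice as often as c2.
	fmt.Println(slots)
}
```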
@ -357,10 +391,11 @@ This strategy can also be weighted using the crawlers capabilities by modifying
  \includegraphics[width=1\linewidth]{./md5_ip_dist.png}
  \caption{Distribution of the lowest byte of \ac{md5} hashes over IPv4}\label{fig:md5IPDist}
 \end{figure}
+\todo{remove this?}

-\ac{md5} returns a \SI{128}{\bit} hash but Go cannot directly work with \SI{128}{\bit} integers.
-It would be possible to implement the modulo operation for arbitrarily sized integers, but the uniform distribution also holds substrings of hashes.
-\autoref{fig:md5IPDist} shows the distribution of the lowest \SI{8}{\bit} for \ac{md5} hashes over all \(2^{32}\) IP addresses in their representation as \SI{32}{\bit} integers.
+\ac{md5} returns a \SI{128}{\bit} hash value.
+The Go standard library includes helpers for arbitrarily sized integers\footnote{\url{https://pkg.go.dev/math/big\#Int}}.
+This helps us in implementing the mapping \(m\) from above.
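A minimal sketch of the mapping \(m\) with \(H\) being \ac{md5} could look as follows. The function name \texttt{Bucket} is hypothetical, and hashing the 4-byte big-endian representation of an IPv4 address is an assumption made for this example; any fixed encoding of the address would work as long as all workers agree on it.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"math/big"
	"net"
)

// Bucket implements m(i) = H(i) mod |C| with H = MD5.
// Assumption: the input of H is the 4-byte representation of an IPv4 address.
func Bucket(addr string, numCrawlers int) (int, error) {
	if numCrawlers <= 0 {
		return 0, fmt.Errorf("need at least one crawler, got %d", numCrawlers)
	}
	ip := net.ParseIP(addr).To4()
	if ip == nil {
		return 0, fmt.Errorf("not an IPv4 address: %q", addr)
	}

	sum := md5.Sum(ip) // 128 bit hash value as [16]byte

	// Interpret the hash as an unsigned big-endian integer and reduce it.
	h := new(big.Int).SetBytes(sum[:])
	m := new(big.Int).Mod(h, big.NewInt(int64(numCrawlers)))
	return int(m.Int64()), nil
}

func main() {
	// Count how many addresses of a small example range land in each bucket.
	const crawlers = 4
	counts := make([]int, crawlers)
	for i := 0; i < 256; i++ {
		b, err := Bucket(fmt.Sprintf("192.0.2.%d", i), crawlers)
		if err != nil {
			panic(err)
		}
		counts[b]++
	}
	fmt.Println(counts)
}
```

Since the result only depends on the address and \(\abs{C}\), every worker can compute the responsible bucket locally; the reassignment problem when a crawler joins or leaves remains.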

 By exploiting the even distribution offered by hashing, the work of each crawler is also evenly distributed over all IP subnets, \ac{as} and geolocations.
 This ensures neighboring peers (\eg{} in the same \ac{as}, geolocation or IP subnet) get visited by different crawlers.
--- a/metadata.tex
+++ b/metadata.tex
@ -1,4 +1,4 @@
-\title{Collaborative Crawling of Fully Distributed Botnets}
+\title{Collaborative Monitoring of Fully Distributed Botnets}
 % \title{Centralized Crawling of Decentralized Botnets\\
 % Collaborative Crawling of Decentralized Botnets\\
 % Centralized Crawling of P2P Botnets}
--- a/references/2006_understanding_churn.pdf
+++ b/references/2006_understanding_churn.pdf
--- a/report.tex
+++ b/report.tex
@ -92,6 +92,8 @@ headsepline,
 % line spacing
 \usepackage[onehalfspacing]{setspace}

+% TODO: start new page with each new section
+\AddToHook{cmd/section/before}{\clearpage}
 % hyperlinks
 \usepackage[pdftex,colorlinks=false]{hyperref}

@ -107,7 +109,7 @@ headsepline,

 \graphicspath{{assets/}}

-\setcounter{tocdepth}{2}
+% \setcounter{tocdepth}{2}

 \begin{document}

@ -117,6 +119,10 @@ headsepline,

 \tableofcontents

+\clearpage{}
+
+\listoftodos{}
+
 \include{content}

 \cleardoublepage{}