The number of connected \ac{iot} devices was around 10 billion in 2021 and is estimated to grow steadily to 25 billion by 2030~\cite{statista_iot_2020}.
Many of these devices run outdated software, do not receive any updates, and do not follow general security best practices.
While in 2016 only 77\% of German households had a broadband connection with a bandwidth of 50 Mbit/s or more, by 2020 this share had risen to 95\%, and 59\% had at least 1000 Mbit/s~\cite{statista_broadband_2021}.
This makes \ac{iot} devices an attractive target for botmasters: they are easy to infect, always online, and sit behind increasingly fast internet connections. Because they are small devices, often without any direct user interaction, an infection can go unnoticed for a long time.
In recent years, \ac{iot} botnets have been responsible for some of the biggest \ac{ddos} attacks ever recorded, creating up to 1 Tbit/s of traffic~\cite{ars_ddos_2016}.
In classic botnets, there are one or more central coordinating hosts called \ac{c2} servers.
These \ac{c2} servers can use anything from \ac{irc} and \ac{http} to Twitter as a communication channel with the infected systems.
The infected systems can be abused for a number of purposes, \eg{} \ac{ddos} attacks, stealing data from victims, acting as proxies to hide the attacker's identity, or sending spam emails.
Analyzing and shutting down a centralized botnet is comparatively easy since every bot knows the IP address, domain name, Twitter handle or \ac{irc} channel the \ac{c2} servers are using.
A targeted operation with help from law enforcement, hosting providers, domain registrars and platform providers could shut down or take over the operation by changing how requests are routed or simply shutting down the controlling servers/accounts.
A number of botnet operations were shut down like this. As the defenders upped their game, so did the attackers, and the idea of \ac{p2p} botnets emerged.
In a \ac{p2p} botnet, each node in the network knows a number of its neighbours and connects to those; each of these neighbours has a list of neighbours of its own, and so on.
This lack of a \ac{spof} makes \ac{p2p} botnets more resilient to take-down attempts, since communication does not stop when individual nodes are removed and botmasters can easily rejoin the network and send commands.
Formally, a \ac{p2p} botnet can be modelled as a directed graph \(G = (V, E)\), with the set of vertices \(V\) describing the bots in the network and the set of edges \(E\) describing the \enquote{is neighbour of} relationships between bots.
For a vertex \(v \in V\), the in-degree \(\deg^{+}(v)=\abs{\{ u \in V \mid(u, v)\in E \}}\) and out-degree \(\deg^{-}(v)=\abs{\{ u \in V \mid(v, u)\in E \}}\) describe how many bots know \(v\) and how many nodes \(v\) knows, respectively.
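As a small worked example of these definitions (the three-bot network is an assumption chosen purely for illustration), let \(V = \{a, b, c\}\) and \(E = \{(a, b), (a, c), (b, c)\}\). Then
\[
  \deg^{+}(c) = \abs{\{a, b\}} = 2, \qquad
  \deg^{-}(c) = \abs{\emptyset} = 0, \qquad
  \deg^{+}(a) = \abs{\emptyset} = 0, \qquad
  \deg^{-}(a) = \abs{\{b, c\}} = 2.
\]
Bot \(c\) is known by two bots but knows none itself, whereas bot \(a\) knows two bots but is known by none.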
% TODO: source for constantly growing, position in text
% TODO: take-down? take down?
The damage caused by botnets has been growing steadily, and many researchers and law enforcement agencies are trying to shut down these operations.
The monetary value of these botnets directly correlates with the amount of effort botmasters are willing to put into implementing defense mechanisms against take-down attempts.
These countermeasures include deterrence, which limits the number of bots allowed per IP address or subnet to one; blacklisting, where known crawlers and sensors are blocked from communicating with other bots in the network (mostly based on IP addresses); disinformation, where fake bots are placed in the neighbourhood lists to invalidate the data collected by crawlers; and active retaliation, such as \ac{ddos} attacks against sensors or crawlers~\cite{andriesse_reliable_2015}.
For passive detection, traffic flows are analyzed in large amounts of collected network traffic (\eg{} from \acp{isp}).
This has the advantage that botmasters cannot detect or prevent this kind of data collection, but it is not trivial to distinguish legitimate \ac{p2p} application traffic (\eg{} BitTorrent, Skype, cryptocurrencies, \ldots) from \ac{p2p} bot traffic.
\citeauthor{zhang_building_2014} propose a system of statistical analysis to solve some of these problems in~\cite{zhang_building_2014}.
In addition, getting access to the required datasets might not be possible for everyone.
\item Large scale network analysis (hard to differentiate from legitimate \ac{p2p} traffic (\eg{} BitTorrent), hard to get data, knowledge of some known bots required)~\cite{zhang_building_2014}
% TODO: BotMiner (in zhang_building_2014)
\item Heuristics: Same traffic patterns, same malicious behaviour
In this case, a subset of the botnet protocol is reimplemented to place pseudo-bots or sensors in the network, which will only communicate with other nodes but will not accept or execute commands to perform malicious actions.
The difference in behaviour from the reference implementation and conspicuous graph properties (\eg{} high \(\deg^{+}\) vs.\ low \(\deg^{-}\)) of these sensors allow botmasters to detect and block the sensor nodes.
The implementation of the concepts of this work will be done as part of \ac{bms}\footnotemark, a monitoring platform for \ac{p2p} botnets described by \citeauthor{bock_poster_2019} in~\cite{bock_poster_2019}.
\Ac{bms} uses a hybrid active approach of crawlers and sensors (reimplementations of the \ac{p2p} protocol of a botnet that will not perform malicious actions) to collect live data from active botnets.
In an earlier project, I implemented different node ranking algorithms (among others \enquote{PageRank}~\cite{page_pagerank_1998}) to detect sensors and crawlers in a botnet, as described in \citetitle{karuppayah_sensorbuster_2017}.
The goal of this work is to make such detection harder for botmasters by centralizing the coordination of the system's crawlers and sensors, thereby reducing the nodes' ranks for specific graph metrics.
The final result should be as general as possible and not depend on any botnet's specific behaviour, but it assumes that every \ac{p2p} botnet has some kind of \enquote{getNeighbourList} method in its protocol that allows other peers to request a list of active nodes to connect to.
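To illustrate why a node ranking algorithm such as PageRank can expose a sensor, the following is a minimal sketch of PageRank by power iteration. It is not the implementation used in the earlier project or in \ac{bms}; the function names \mintinline{go}{pageRank} and \mintinline{go}{ranking}, the damping factor of 0.85, the fixed iteration count, and the toy network are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// pageRank runs a fixed number of power-iteration steps on a directed graph.
// edges maps each node to the nodes it lists as neighbours.
// Damping factor and iteration count are illustrative parameters.
func pageRank(edges map[string][]string, damping float64, iterations int) map[string]float64 {
	// Collect every node that appears as a source or as a target.
	nodes := map[string]bool{}
	for u, outs := range edges {
		nodes[u] = true
		for _, v := range outs {
			nodes[v] = true
		}
	}
	n := float64(len(nodes))

	// Start from a uniform distribution.
	rank := make(map[string]float64, len(nodes))
	for v := range nodes {
		rank[v] = 1 / n
	}

	for i := 0; i < iterations; i++ {
		next := make(map[string]float64, len(nodes))
		dangling := 0.0 // rank held by nodes without outgoing edges
		for u := range nodes {
			outs := edges[u]
			if len(outs) == 0 {
				dangling += rank[u]
				continue
			}
			share := rank[u] / float64(len(outs))
			for _, v := range outs {
				next[v] += share
			}
		}
		// Dangling rank is spread evenly so that the ranks keep summing to 1.
		for v := range nodes {
			next[v] = (1-damping)/n + damping*(next[v]+dangling/n)
		}
		rank = next
	}
	return rank
}

// ranking formats the nodes from highest to lowest rank (ties broken by name).
func ranking(rank map[string]float64) []string {
	names := make([]string, 0, len(rank))
	for v := range rank {
		names = append(names, v)
	}
	sort.Slice(names, func(i, j int) bool {
		if rank[names[i]] != rank[names[j]] {
			return rank[names[i]] > rank[names[j]]
		}
		return names[i] < names[j]
	})
	lines := make([]string, len(names))
	for i, v := range names {
		lines[i] = fmt.Sprintf("%s %.3f", v, rank[v])
	}
	return lines
}

func main() {
	// Hypothetical network: three bots in a ring that all list the sensor "s";
	// the sensor itself lists nobody.
	edges := map[string][]string{
		"b1": {"b2", "s"},
		"b2": {"b3", "s"},
		"b3": {"b1", "s"},
		"s":  {},
	}
	for _, line := range ranking(pageRank(edges, 0.85, 50)) {
		fmt.Println(line)
	}
}
```

In this toy network the sensor is listed by every bot but lists no bot itself, so it ends up with the highest rank. This is the kind of outlier that a coordinated assignment of monitoring tasks is meant to avoid.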
In the current implementation, each sensor will itself visit and monitor each new node it finds.
The idea for this work is to report newly found nodes back to the \ac{bms} backend first, where the graph of the known network is created and a sensor is selected such that the respective ranking algorithm does not yield a suspiciously high or low value for it.
That sensor will be responsible for monitoring the new node.
If it is not possible to select a specific sensor so that the monitoring activity stays inconspicuous, the coordinator can do a complete shuffle of all nodes between the sensors to restore the desired graph properties, or warn if more sensors are required to stay undetected.
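The selection step described above can be sketched as follows. This is only an illustration under simplifying assumptions, not the planned \ac{bms} design: the \mintinline{go}{Sensor} type, the functions \mintinline{go}{selectSensor} and \mintinline{go}{assign}, and the limit \mintinline{go}{maxLoad} are hypothetical, and a plain per-sensor load limit stands in for the actual ranking algorithm.

```go
package main

import (
	"errors"
	"fmt"
)

// Sensor is a hypothetical record the coordinator keeps per registered sensor.
type Sensor struct {
	ID   string
	Load int // number of nodes this sensor currently monitors
}

// ErrNoCandidate signals that no sensor can take another node without
// exceeding the limit, i.e. a reshuffle or additional sensors are needed.
var ErrNoCandidate = errors.New("no sensor below the load limit")

// selectSensor returns the index of the least-loaded sensor whose load stays
// at or below maxLoad after taking one more node.
// maxLoad is a stand-in for "the ranking stays inconspicuous".
func selectSensor(sensors []Sensor, maxLoad int) (int, error) {
	best := -1
	for i, s := range sensors {
		if s.Load+1 > maxLoad {
			continue // this sensor would become conspicuous
		}
		if best == -1 || s.Load < sensors[best].Load {
			best = i
		}
	}
	if best == -1 {
		return -1, ErrNoCandidate
	}
	return best, nil
}

// assign hands a newly reported node to a sensor and updates that sensor's load.
func assign(sensors []Sensor, node string, maxLoad int) (string, error) {
	i, err := selectSensor(sensors, maxLoad)
	if err != nil {
		return "", fmt.Errorf("assigning %s: %w", node, err)
	}
	sensors[i].Load++
	return sensors[i].ID, nil
}

func main() {
	sensors := []Sensor{
		{ID: "s1", Load: 2},
		{ID: "s2", Load: 1},
		{ID: "s3", Load: 3},
	}
	for _, node := range []string{"bot-a", "bot-b", "bot-c", "bot-d"} {
		id, err := assign(sensors, node, 3)
		if err != nil {
			// The coordinator would reshuffle or request more sensors here.
			fmt.Println("warning:", err)
			continue
		}
		fmt.Println(node, "->", id)
	}
}
```

Returning a dedicated error instead of picking an overloaded sensor keeps the fallback decision (reshuffling all nodes or warning that more sensors are required) with the coordinator.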
The improved sensor system should allow new sensors to register themselves and their capabilities (\eg{} bandwidth, geolocation), so the amount of work can be distributed accordingly across hosts.
Further work might even consider autoscaling the monitoring activity using some kind of cloud computing provider.
To validate the result, the old sensor implementation will be compared to the new system using different graph metrics.
% TODO: maybe?
If time allows, \ac{bsf}\footnotemark{} will be used to simulate a botnet, place sensors in the simulated network, and measure the improvement achieved by the coordinated monitoring effort.
\item\mintinline{go}{startCrawling(targets)}: Start crawling a batch of nodes for a specified time or until stopped, where \mintinline{go}{targets} is a list of targets, each consisting of a botnet identifier, IP address, port, bot identifier, and how long and how often this bot should be monitored
\item\mintinline{go}{stopCrawling(targets)}: Stop crawling a batch of nodes