Valentin Brandl 2022-04-25 21:46:41 +02:00
parent 8e03043f8b
commit 7f285cd0ab
4 changed files with 22 additions and 21 deletions

@ -54,8 +54,6 @@ Taking down a \ac{p2p} botnet requires intricate knowledge of the botnet's chara
Just like for centralized and decentralized botnets, to take down a \ac{p2p} botnet, the \ac{c2} channel needs to be identified and disrupted. Just like for centralized and decentralized botnets, to take down a \ac{p2p} botnet, the \ac{c2} channel needs to be identified and disrupted.
By \emph{monitoring} peer activity of known participants in the botnet, this knowledge can be obtained and used to find attack vectors in the botnet protocol. By \emph{monitoring} peer activity of known participants in the botnet, this knowledge can be obtained and used to find attack vectors in the botnet protocol.
\todo{few words about monitoring}
In this work, we will show how a collaborative system of crawlers and sensors can make the monitoring and information gathering phase of a \ac{p2p} botnet more efficient, resilient to detection and how collaborative monitoring can help circumvent anti-monitoring techniques. In this work, we will show how a collaborative system of crawlers and sensors can make the monitoring and information gathering phase of a \ac{p2p} botnet more efficient, resilient to detection and how collaborative monitoring can help circumvent anti-monitoring techniques.
%}}} introduction %}}} introduction
@ -233,7 +231,6 @@ They depend on suspicious graph properties to enumerate candidate peers~\cite{bi
\Ac{bms} is intended for a hybrid active approach of crawlers and sensors (reimplementations of the \ac{p2p} protocol of a botnet, that won't perform malicious actions) to collect live data from active botnets. \Ac{bms} is intended for a hybrid active approach of crawlers and sensors (reimplementations of the \ac{p2p} protocol of a botnet, that won't perform malicious actions) to collect live data from active botnets.
In an earlier project, we implemented different graph ranking algorithms---among others \emph{PageRank}~\cite{bib:page_pagerank_1998} and \emph{SensorRank}---to detect sensor candidates in a botnet, as described in \citetitle{bib:karuppayah_sensorbuster_2017}. In an earlier project, we implemented different graph ranking algorithms---among others \emph{PageRank}~\cite{bib:page_pagerank_1998} and \emph{SensorRank}---to detect sensor candidates in a botnet, as described in \citetitle{bib:karuppayah_sensorbuster_2017}.
In an earlier project, we implemented the ranking algorithms described in \citetitle{bib:karuppayah_sensorbuster_2017} for \ac{bms}.
%%{{{ detection criteria %%{{{ detection criteria
%\subsection{Detection Criteria} %\subsection{Detection Criteria}
@ -366,7 +363,7 @@ To keep the distribution as even as possible, we keep track of the last crawler
For the sake of simplicity, only the bandwidth will be considered as a capability but it can be extended by any shared property between the crawlers, \eg{} available memory or processing power. For the sake of simplicity, only the bandwidth will be considered as a capability but it can be extended by any shared property between the crawlers, \eg{} available memory or processing power.
For a given crawler \(c_i \in C\) let \(cap(c_i)\) be the capability of the crawler. For a given crawler \(c_i \in C\) let \(cap(c_i)\) be the capability of the crawler.
The total available capability is \(B = \sum\limits_{c \in C} cap(c)\). The total available capability is \(B = \sum\limits_{c \in C} cap(c)\).
With \(G\) being the greatest common divisor of all the crawler's capabilities, the weight \(W(c_i) = \frac{cap(c_i)}{G}\). With \(G\) being the greatest common divisor of all the crawler's capabilities, the weight of a crawler is \(W(c_i) = \frac{cap(c_i)}{G}\).
\(\frac{cap(c_i)}{B}\) gives us the percentage of the work a crawler is assigned. \(\frac{cap(c_i)}{B}\) gives us the percentage of the work a crawler is assigned.
% The set of target peers \(P = <p_0, p_1, \ldots, p_{n-1}>\), is partitioned into \(|C|\) subsets according to \(W(c_i)\) and each subset is assigned to its crawler \(c_i\). % The set of target peers \(P = <p_0, p_1, \ldots, p_{n-1}>\), is partitioned into \(|C|\) subsets according to \(W(c_i)\) and each subset is assigned to its crawler \(c_i\).
% The mapping \mintinline{go}{gcd(C)} is the greatest common divisor of all peers in \mintinline{go}{C}, \(\text{maxWeight}(C) = \max \{ \forall c \in C : W(c) \}\). % The mapping \mintinline{go}{gcd(C)} is the greatest common divisor of all peers in \mintinline{go}{C}, \(\text{maxWeight}(C) = \max \{ \forall c \in C : W(c) \}\).
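The weight definition in the hunk above can be illustrated with a small Go sketch. It is not taken from the thesis implementation: the function names `gcd`, `weights` and `shares` are hypothetical, and capabilities are assumed to be positive integers (for example bandwidth in Mbit/s).

```go
package main

import "fmt"

// gcd returns the greatest common divisor of a and b (Euclid's algorithm).
func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// weights returns W(c_i) = cap(c_i) / G for every crawler, where G is the
// greatest common divisor of all capabilities.
// Assumption: capabilities are positive integers.
func weights(caps []int) []int {
	g := 0
	for _, c := range caps {
		g = gcd(g, c) // gcd(0, c) == c, so the first element seeds G
	}
	w := make([]int, len(caps))
	if g == 0 {
		return w // no crawlers, or all capabilities are zero
	}
	for i, c := range caps {
		w[i] = c / g
	}
	return w
}

// shares returns cap(c_i) / B, the fraction of the work assigned to each
// crawler, where B is the sum of all capabilities.
func shares(caps []int) []float64 {
	total := 0
	for _, c := range caps {
		total += c
	}
	s := make([]float64, len(caps))
	if total == 0 {
		return s
	}
	for i, c := range caps {
		s[i] = float64(c) / float64(total)
	}
	return s
}

func main() {
	caps := []int{100, 50, 25} // hypothetical bandwidths
	fmt.Println(weights(caps))
	fmt.Println(shares(caps))
}
```

Dividing by the greatest common divisor keeps the weights as small integers, which is convenient when a worker has to appear in a list once per unit of weight.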
@ -421,7 +418,8 @@ Given the hash function \(H\), calculating the hash of an IP address and distrib
This gives us the mapping \(m(i) = H(i) \mod \abs{C}\) to sort peers into buckets. This gives us the mapping \(m(i) = H(i) \mod \abs{C}\) to sort peers into buckets.
Any hash function can be used but since it must be calculated often, a fast function should be used. Any hash function can be used but since it must be calculated often, a fast function should be used.
While the \ac{md5} hash function must be considered broken for cryptographic use~\cite{bib:stevensCollision}, it is faster to calculate than hash functions with longer output.\todo{md5 crypto broken, distribution not?} While the \ac{md5} hash function must be considered broken for cryptographic use~\cite{bib:stevensCollision}, it is faster to calculate than hash functions with longer output.
Collisions for \ac{md5} have been found but collision resistance is not required.
For the use case at hand, only the uniform distribution property is required so \ac{md5} can be used without sacrificing any kind of security. For the use case at hand, only the uniform distribution property is required so \ac{md5} can be used without sacrificing any kind of security.
This strategy can also be weighted using the crawlers' capabilities by modifying the list of available workers so that a worker can appear multiple times according to its weight. This strategy can also be weighted using the crawlers' capabilities by modifying the list of available workers so that a worker can appear multiple times according to its weight.
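A minimal Go sketch of the mapping \(m(i) = H(i) \mod \abs{C}\) and its weighted variant follows. It is an illustration under stated assumptions, not the thesis implementation: the names `expand` and `bucket` are hypothetical, and reducing the \ac{md5} digest to an integer by reading its first eight bytes is an assumption, since the text does not specify how the digest is interpreted.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// expand builds the weighted worker list: each worker appears as many
// times as its weight, so it receives a proportional number of buckets.
func expand(workers []string, weights []int) []string {
	var slots []string
	for i, w := range workers {
		for j := 0; j < weights[i]; j++ {
			slots = append(slots, w)
		}
	}
	return slots
}

// bucket maps an IP address to an index in [0, n) using H(ip) mod n.
// Assumption: the first 8 bytes of the MD5 digest are read as a
// big-endian unsigned integer. n must be greater than zero.
func bucket(ip string, n int) int {
	sum := md5.Sum([]byte(ip))
	h := binary.BigEndian.Uint64(sum[:8])
	return int(h % uint64(n))
}

func main() {
	// Hypothetical workers and weights.
	slots := expand([]string{"c0", "c1", "c2"}, []int{2, 1, 1})
	for _, ip := range []string{"192.0.2.1", "198.51.100.7", "203.0.113.42"} {
		fmt.Println(ip, "->", slots[bucket(ip, len(slots))])
	}
}
```

Because the mapping depends only on the peer's address and the length of the worker list, every component that knows the list computes the same assignment without coordination. Changing the list changes most assignments.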
@ -554,7 +552,7 @@ While the effective frequency of the whole system is halved compared to~\Fref{fi
\subsection{Creating and Reducing Edges for Sensors} \subsection{Creating and Reducing Edges for Sensors}
\citetitle*{bib:karuppayah_sensorbuster_2017} describes different graph metrics to find sensors in \ac{p2p} botnets. \citetitle*{bib:karuppayah_sensorbuster_2017} describes different graph metrics to find sensors in \ac{p2p} botnets.
These metrics depend on the uneven ratio between incoming and outgoing edges for crawlers. These metrics depend on the uneven ratio between incoming and outgoing edges for sensors.
The \emph{SensorBuster} metric uses \acp{wcc} since naive sensors don't have any edges back to the main network in the graph. The \emph{SensorBuster} metric uses \acp{wcc} since naive sensors don't have any edges back to the main network in the graph.
@ -609,7 +607,7 @@ The following candidates to place on the neighbor list will be investigated:
\textbf{Other Sensors:} Returning all the other sensors when responding to peer list requests, thereby effectively creating a complete graph \(K_{\abs{C}}\) among the workers, creates valid outgoing edges. \textbf{Other Sensors:} Returning all the other sensors when responding to peer list requests, thereby effectively creating a complete graph \(K_{\abs{C}}\) among the workers, creates valid outgoing edges.
The resulting graph will still form a \ac{wcc} with no edges back into the main network. The resulting graph will still form a \ac{wcc} with no edges back into the main network.
Building a complete graph \(G_C = K_{\abs{C}}\) between the sensors by making them return the other known worker on peer list requests would still produce a disconnected component and while being bigger and maybe not as obvious at first glance, it is still easily detectable since there is no path from \(G_C\) back to the main network (see~\Fref{fig:sensorbuster2} and~\Fref{tab:metricsTable}).\todo{where?} Building a complete graph \(G_C = K_{\abs{C}}\) between the sensors by making them return the other known worker on peer list requests would still produce a disconnected component and while being bigger and maybe not as obvious at first glance, it is still easily detectable since there is no path from \(G_C\) back to the main network (see~\Fref{fig:sensorbuster2} and~\Fref{tab:metricsTable}).
%{{{ churned peers %{{{ churned peers
@ -906,7 +904,7 @@ This is good enough for balancing the tasks among workers.
%{{{ eval redu requ freq %{{{ eval redu requ freq
\subsection{Reduction of Request Frequency} \subsection{Reduction of Request Frequency}
To evaluate the request frequency optimization described in \Fref{sec:stratRedReqFreq}, crawl a simulated peer and check if the requests are evenly distributed and how big the deviation from the theoretically optimal result is. To evaluate the request frequency optimization described in \Fref{sec:stratRedReqFreq}, we crawl a simulated peer and check if the requests are evenly distributed and how big the deviation from the theoretically optimal result is.
To get more realistic results, the crawlers and simulated peer are running on different machines so they are not within the same LAN\@. To get more realistic results, the crawlers and simulated peer are running on different machines so they are not within the same LAN\@.
We use the same parameters as in the example above: We use the same parameters as in the example above:
@ -999,7 +997,7 @@ With this experiment, we try to estimate the impact of the latency.
\caption{Average deviation per crawler}\label{tab:perCralwerDeviation} \caption{Average deviation per crawler}\label{tab:perCralwerDeviation}
\end{table} \end{table}
The monitored peer crawler \emph{c0} are located in Falkenstein, Germany, \emph{c1} in Nurnberg, Germany, \emph{c2} is in Helsinki, Finland and \emph{c3} in Ashburn, USA, to have some geographic distribution. The monitored peer and crawler \emph{c0} are located in Falkenstein, Germany, \emph{c1} in Nurnberg, Germany, \emph{c2} is in Helsinki, Finland and \emph{c3} in Ashburn, USA, to have some geographic distribution.
The average deviation per crawler is below \SI{0.002}{\second} even with some outliers due to network latency or server load. The average deviation per crawler is below \SI{0.002}{\second} even with some outliers due to network latency or server load.
The crawler \emph{c3} in the experiment is the furthest away from the monitored host, so its larger deviation due to network latency is expected. The crawler \emph{c3} in the experiment is the furthest away from the monitored host, so its larger deviation due to network latency is expected.
@ -1090,7 +1088,7 @@ SensorBuster relies on the assumption that sensors don't have any outgoing edges
For the \ac{wcc} metric, it is obvious that even a single edge back into the main network is enough to connect the sensor back to the main graph and therefore beat this metric. For the \ac{wcc} metric, it is obvious that even a single edge back into the main network is enough to connect the sensor back to the main graph and therefore beat this metric.
\subsubsection{Effectiveness against Page- and SensorRank} \subsection{Reducing Incoming Edges to Reduce Page- and SensorRank}
In this section, we will evaluate how adding outgoing edges to a sensor impacts its PageRank and SensorRank values. In this section, we will evaluate how adding outgoing edges to a sensor impacts its PageRank and SensorRank values.
Before doing so, we will check the impact of the initial rank by calculating it with different initial values and comparing the value distribution of the result. Before doing so, we will check the impact of the initial rank by calculating it with different initial values and comparing the value distribution of the result.
@ -1268,11 +1266,11 @@ Experiments were performed, in which the incoming edges for the known sensor are
\end{figure} \end{figure}
\end{landscape} \end{landscape}
\Fref{fig:pr0} and \Fref{fig:sr0} show the situation on the base truth without modifications. The graphs with 0 removed edges show the situation on the base truth without modifications.
We can see in \Fref{fig:prFiltered} and \Fref{fig:srFiltered}, that we have to reduce the incoming edges by \SI{20}{\percent} and \SI{30}{\percent} respectively to get average values for SensorRank and PageRank. We can see in \Fref{fig:prFiltered} and \Fref{fig:srFiltered}, that we have to reduce the incoming edges by \SI{20}{\percent} and \SI{30}{\percent} respectively to get average values for SensorRank and PageRank.
This also means that the number of incoming edges for a sensor must be about the same as the average number of incoming edges as can be seen in \Fref{fig:in3}. This also means that the number of incoming edges for a sensor must be about the same as the average number of incoming edges.
Depending on the protocol details of the botnet (\eg{} how many incoming edges are allowed per peer), this means that a large amount of sensors is needed if we want to monitor the whole network. Depending on the protocol details of the botnet (\eg{} how many incoming edges are allowed per peer), this means that a large amount of sensors is needed if we want to monitor the whole network.
@ -1343,7 +1341,7 @@ The server-side part of the system consists of a \ac{grpc} server to handle the
\section{Conclusion} \section{Conclusion}
Collaborative monitoring of \ac{p2p} botnets allows circumventing some anti-monitoring efforts. Collaborative monitoring of \ac{p2p} botnets allows circumventing some anti-monitoring efforts.
It also enables more effective monitoring systems for larger botnets, since each peer can be visited by only one crawler. We were able to show, that it also enables more effective monitoring systems for larger botnets, since each peer can be visited by only one crawler.
The current concept of independent crawlers in \ac{bms} can also use multiple workers but there is no way to ensure a peer is not watched by multiple crawlers thereby using unnecessary resources. The current concept of independent crawlers in \ac{bms} can also use multiple workers but there is no way to ensure a peer is not watched by multiple crawlers thereby using unnecessary resources.
We were able to show, that a collaborative monitoring approach for \ac{p2p} botnets helps to circumvent anti-monitoring and monitoring detection mechanisms and is helpful to improve resource usage when monitoring large botnets. We were able to show, that a collaborative monitoring approach for \ac{p2p} botnets helps to circumvent anti-monitoring and monitoring detection mechanisms and is helpful to improve resource usage when monitoring large botnets.
@ -1363,15 +1361,16 @@ This might bring some performance issues to light which can be solved by investi
Another way to expand on this work is automatically scaling the available crawlers up and down, depending on the botnet size and the number of concurrently online peers. Another way to expand on this work is automatically scaling the available crawlers up and down, depending on the botnet size and the number of concurrently online peers.
Doing so would allow a constant crawl interval for even highly volatile botnets. Doing so would allow a constant crawl interval for even highly volatile botnets.
Autoscaling features offered by many cloud-computing providers can be evaluated to automatically add or remove crawlers based on the monitoring load, a botnet's size, and the number of active peers.
This should also allow the creation of workers with new IP addresses in different geolocations in a fast, easy and automated way.
This also requires investigating hosting providers which allow botnet crawling by their terms of use.
The current backend implementation assumes an immutable set of crawlers.
For autoscaling to work, efficient reassignment of peers has to be implemented to account for added or removed workers.
Placing churned peers or peers with suspicious network activity (those behind carrier-grade \acp{nat}) might just offer another characteristic to flag sensors in a botnet. Placing churned peers or peers with suspicious network activity (those behind carrier-grade \acp{nat}) might just offer another characteristic to flag sensors in a botnet.
The feasibility of this approach should be investigated and maybe there are ways to mitigate this problem. The feasibility of this approach should be investigated and maybe there are ways to mitigate this problem.
Autoscaling features offered by many cloud-computing providers can be evaluated to automatically add or remove crawlers based on the monitoring load, a botnet's size, and the number of active peers.
This should also allow the creation of workers with new IP addresses in different geolocations in a fast, easy and automated way.
The current implementation assumes an immutable set of crawlers.
For autoscaling to work, efficient reassignment of peers has to be implemented to account for added or removed workers.
%}}} further work %}}} further work
%{{{ acknowledgments %{{{ acknowledgments

@ -12,10 +12,12 @@
\studyprogramme{Master Informatik} \studyprogramme{Master Informatik}
%\startingdate{1.\,November 2088} %\startingdate{1.\,November 2088}
%\closingdate{11.\,Dezember 2089} %\closingdate{11.\,Dezember 2089}
\startingdate{2021-12-01}
\closingdate{2022-05-01}
\firstadvisor{Prof.\ Dr.\ Christoph Skornia} \firstadvisor{Prof.\ Dr.\ Christoph Skornia}
\secondadvisor{Prof.\ Dr.\ Thomas Waas} \secondadvisor{Prof.\ Dr.\ Thomas Waas}
%\externaladvisor{Dr. Klara Endlos} \externaladvisor{Leon Böck}
\date{\today} \date{\today}
% \date{} % \date{}

Binary file not shown.

@ -107,7 +107,7 @@ headsepline,
\usepackage[pdftex,colorlinks=false]{hyperref} \usepackage[pdftex,colorlinks=false]{hyperref}
% make overfull hbox warnings prominently visible in document % make overfull hbox warnings prominently visible in document
\overfullrule=2cm % \overfullrule=2cm
\pagestyle{headings} \pagestyle{headings}
@ -134,7 +134,7 @@ headsepline,
\clearpage{} \clearpage{}
\listoftodos{} % \listoftodos{}
\include{content} \include{content}