@@ -1,7 +1,8 @@
\begin{abstract}
Botnets pose a huge risk to general internet infrastructure and services.
Distributed \Acs*{p2p} topologies make it harder to detect and take those botnets offline.
Distributed \Acs*{p2p} topologies make those botnets harder to detect and more resilient to take-down attempts.
To take a \ac{p2p} botnet down, it has to be monitored to estimate its size and learn about the network topology.
% Monitoring requires some kind of participation in the network to
With the growing damage and monetary value produced by such botnets, ideas emerged on how to detect and prevent monitoring activity in the network.
This work explores ways to make monitoring of fully distributed botnets more efficient, resilient, and harder to detect by using a collaborative, coordinated approach.
Further, we show how the coordinated approach helps in circumventing anti-monitoring techniques deployed by botnets.
@@ -1,5 +1,6 @@
#!/usr/bin/env python3

import numpy as np
import statistics
from collections import defaultdict
from typing import Dict
@@ -26,7 +27,8 @@ def plot_devi(data: Dict[datetime, str]):
    # c = 0
    per_diff = defaultdict(list)
    for prev, next in zip(sor, sor[1:]):
        diff = abs(2.5 - (next[0].timestamp() - prev[0].timestamp()))
        # diff = abs(2.5 - (next[0].timestamp() - prev[0].timestamp()))
        diff = ((next[0].timestamp() - prev[0].timestamp()) - 2.5)
        diffs.append(diff)
        per_crawler[prev[1]].append(prev[0])
        per_diff[prev[1]].append(diff)
@@ -72,16 +74,20 @@ def plot_devi(data: Dict[datetime, str]):
        t = per_crawler[c]
        devi = []
        for pre, nex in zip(t, t[1:]):
            devi.append(abs(10 - (nex.timestamp() - pre.timestamp())))
        x = [10 * x for x in range(len(devi))]
            # devi.append(abs(10 - (nex.timestamp() - pre.timestamp())))
            devi.append(((nex.timestamp() - pre.timestamp()) - 10))
        x = np.array([10 * x for x in range(len(devi))])
        devi = np.array(devi)
        fig, ax = plt.subplots()
        ax.scatter(x, devi, s=10)
        m, b = np.polyfit(x, devi, 1)
        plt.plot(x, m*x+b, color='red')
        ax.set_title(f'Time deviation for {c}')
        ax.set_xlabel('Time passed in seconds')
        ax.set_ylabel('Deviation in seconds')
        plt.savefig(f'./time_devi_{c}.png')
        plt.close()
        print(f'{c}: {statistics.mean(devi)}')
        print(f'{c} & \\num{{{statistics.mean(devi)}}} \\\\')
        # for ts in per_crawler[c]:

content.tex
@@ -361,7 +361,7 @@ Load balancing allows scaling out, which can be more cost-effective.

\subsubsection{Round Robin Distribution}\label{sec:rr}

This strategy distributes work evenly among crawlers by either naively assigning tasks to the crawlers rotationally or weighted according to their capabilities\todo{1 -- 2 sentences about naive rr?}.
This strategy distributes work evenly among crawlers by either naively assigning tasks to the crawlers rotationally or weighted according to their capabilities.
To keep the distribution as even as possible, we keep track of the last crawler a task was assigned to and start with the next in line in the subsequent round of assignments.
For the sake of simplicity, only the bandwidth will be considered as a capability, but this can be extended to any shared property of the crawlers, \eg{} available memory or processing power.
For a given crawler \(c_i \in C\), let \(cap(c_i)\) be the capability of the crawler.
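A minimal Python sketch of such a capability-weighted round-robin assignment; the class and method names are illustrative and bandwidth is assumed to be the only capability:

#!/usr/bin/env python3
# Sketch of capability-weighted round-robin task distribution.
# WeightedRoundRobin and assign() are illustrative names, not taken from the implementation.
from typing import Dict, List


class WeightedRoundRobin:
    def __init__(self, capabilities: Dict[str, int]):
        # capabilities maps a crawler id to its capability, e.g. available bandwidth
        self.crawlers = list(capabilities.keys())
        self.capabilities = capabilities
        # remember the last crawler that got a task so the next round
        # of assignments starts with the next one in line
        self.last = -1

    def assign(self, tasks: List[str]) -> Dict[str, List[str]]:
        total = sum(self.capabilities.values())
        assignment: Dict[str, List[str]] = {c: [] for c in self.crawlers}
        for task in tasks:
            # advance through the crawlers, skipping those that already
            # received their capability-proportional share of tasks
            for _ in range(len(self.crawlers)):
                self.last = (self.last + 1) % len(self.crawlers)
                c = self.crawlers[self.last]
                if len(assignment[c]) < len(tasks) * self.capabilities[c] / total:
                    assignment[c].append(task)
                    break
        return assignment


if __name__ == '__main__':
    rr = WeightedRoundRobin({'c0': 100, 'c1': 50, 'c2': 50})
    # c0 has twice the bandwidth and therefore gets twice as many peers to crawl
    print(rr.assign([f'peer{i}' for i in range(8)]))

The naive round robin mentioned above is the special case where all capabilities are equal.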
@@ -551,13 +551,12 @@ While the effective frequency of the whole system is halved compared to~\Fref{fi

%}}} frequency reduction

%{{{ against graph metrics
\subsection{Creating Edges for Crawlers and Sensors}
\subsection{Creating and Reducing Edges for Sensors}

\citetitle*{bib:karuppayah_sensorbuster_2017} describes different graph metrics to find sensors in \ac{p2p} botnets.
These metrics depend on the uneven ratio between incoming and outgoing edges for crawlers.
The \emph{SensorBuster} metric uses \acp{wcc} since naive sensors don't have any edges back to the main network in the graph.

Building a complete graph \(G_C = K_{\abs{C}}\) between the sensors and crawlers by making them return the other known workers on peer list requests would still produce a disconnected component, and while it is bigger and maybe not as obvious at first glance, it is still easily detectable since there is no path from \(G_C\) back to the main network (see~\Fref{fig:sensorbuster2} and~\Fref{tab:metricsTable}).

With \(v \in V\), \(\text{succ}(v)\) being the set of successors of \(v\), and \(\text{pred}(v)\) being the set of predecessors of \(v\), \emph{PageRank} is recursively defined as~\cite{bib:page_pagerank_1998}:
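% The equation below is a sketch of the standard PageRank recursion with damping
% factor \(d\) (commonly \(d = 0.85\)); the exact formulation used in the cited work may differ.
\[
	\text{PR}(v) = d \times \sum_{u \in \text{pred}(v)} \frac{\text{PR}(u)}{\abs{\text{succ}(u)}} + \frac{1 - d}{\abs{V}}
\]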
@@ -605,17 +604,19 @@ The following candidates to place on the neighbor list will be investigated:

% Knowledge of only \num{90} peers leaving due to IP rotation would be enough to make a crawler look average in Sality\todo{repeat analysis, actual number}.
% This number will differ between different botnets, depending on implementation details and size of the network\todo{upper limit for NL size as impl detail}.

\subsubsection{Other Sensors or Crawlers}
% \subsubsection{Other Sensors or Crawlers}

Returning all the other sensors when responding to peer list requests, thereby effectively creating a complete graph \(K_{\abs{C}}\) among the workers, creates valid outgoing edges.
\textbf{Other Sensors:} Returning all the other sensors when responding to peer list requests, thereby effectively creating a complete graph \(K_{\abs{C}}\) among the workers, creates valid outgoing edges.
The resulting graph will still form a \ac{wcc} with no edges back into the main network.

Building a complete graph \(G_C = K_{\abs{C}}\) between the sensors by making them return the other known workers on peer list requests would still produce a disconnected component, and while it is bigger and maybe not as obvious at first glance, it is still easily detectable since there is no path from \(G_C\) back to the main network (see~\Fref{fig:sensorbuster2} and~\Fref{tab:metricsTable}).\todo{where?}

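The following toy Python example (assuming the networkx library; node names and edges are made up for illustration) shows the property that keeps this detectable: there is no directed path from the worker clique back into the main network.

#!/usr/bin/env python3
# Toy example: sensors returning only each other form a clique K_|C| that has
# incoming edges from bots but no directed path back into the main network.
# Node names and edges are invented for illustration.
import networkx as nx

g = nx.DiGraph()
bots = ['b0', 'b1', 'b2', 'b3']
sensors = ['s0', 's1', 's2']
# main botnet network: bots reference each other in their peer lists
g.add_edges_from([('b0', 'b1'), ('b1', 'b2'), ('b2', 'b3'), ('b3', 'b0')])
# the workers return each other on peer list requests -> complete graph among them
g.add_edges_from((a, b) for a in sensors for b in sensors if a != b)
# bots know the sensors, so the sensors only gain incoming edges from the main network
g.add_edges_from([('b0', 's0'), ('b1', 's1'), ('b2', 's2')])

# the property SensorBuster-style metrics can exploit:
# no directed path leads from the worker clique back to any bot
assert not any(nx.has_path(g, s, b) for s in sensors for b in bots)
print('no path from the worker clique back into the main network')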
%{{{ churned peers
\subsubsection{Churned Peers After IP Rotation}
% \subsubsection{Churned Peers After IP Rotation}

Churn describes the dynamics of peer participation in \ac{p2p} systems, \eg{} join and leave events~\cite{bib:stutzbach_churn_2006}.\todo{transition}
Detecting if a peer just left the system, in combination with knowledge about \acp{as}, peers that just left and came from an \ac{as} with dynamic IP allocation (\eg{} many consumer broadband providers in the US and Europe), can be placed into the crawler's peer list.\todo{what is an AS}
\textbf{Churned peers after IP rotation:} Churn describes the dynamics of peer participation in \ac{p2p} systems, \eg{} join and leave events~\cite{bib:stutzbach_churn_2006}.
Using knowledge about \acp{as}, peers that just left the system and came from an \ac{as} with dynamic IP allocation (\eg{} many consumer broadband providers in the US and Europe) can be placed into the crawler's peer list.
If the timing of the churn event correlates with IP rotation in the \ac{as}, it can be assumed that the peer left due to being assigned a new IP address---not due to connectivity issues or going offline---and will not return using the same IP address.
These peers, when placed in the peer list of the crawlers, will introduce paths back into the main network and defeat the \ac{wcc} metric.
It also helps with the PageRank and SensorRank metrics since the crawlers start to look like regular peers without actually supporting the network by relaying messages or propagating active peers.
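A short Python sketch of this selection step; the AS numbers, rotation times, and the five-minute window are placeholders for illustration and not values from the actual implementation.

#!/usr/bin/env python3
# Sketch: pick churned peers from dynamic-IP ASes as fake-neighbor candidates.
# The AS numbers, rotation times and the 5 minute window are illustrative placeholders.
from datetime import datetime, timedelta
from typing import Dict, List, Set

# ASes known to reassign addresses periodically (e.g. consumer broadband providers)
DYNAMIC_ASES: Set[int] = {3320, 7922}
# how close a leave event must be to the AS's rotation time to count as IP rotation
ROTATION_WINDOW = timedelta(minutes=5)


def candidates(leave_events: Dict[str, datetime],
               ip_to_as: Dict[str, int],
               rotation_times: Dict[int, datetime]) -> List[str]:
    """Churned peers whose leave event coincides with IP rotation in a dynamic AS."""
    result = []
    for ip, left_at in leave_events.items():
        asn = ip_to_as.get(ip)
        if asn not in DYNAMIC_ASES:
            continue
        rotation = rotation_times.get(asn)
        if rotation is not None and abs(left_at - rotation) <= ROTATION_WINDOW:
            # the peer most likely received a new address and will not return
            # under this IP, so it can be handed out as a sensor neighbor
            result.append(ip)
    return result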
@@ -623,15 +624,15 @@ It also helps with the PageRank and SensorRank metrics since the crawlers start

%}}} churned peers

%{{{ cg nat
\subsubsection{Peers Behind Carrier-Grade \acs*{nat}}
% \subsubsection{Peers Behind Carrier-Grade \acs*{nat}}

Some peers show behavior, where their IP address changes almost after every request.
\textbf{Peers behind carrier-grade \acs{nat}:} Some peers show behavior where their IP address changes after almost every request.
Those peers can be used as fake neighbors and create valid-looking outgoing edges for the sensor.

%}}} cg nat

\clearpage{}
\todo{clearpage?}
% \clearpage{}
% \todo{clearpage?}
In theory, it would be possible to detect churned peers or peers behind carrier-grade \acs{nat} without coordinating the sensors, but the coordination gives us a few advantages:

\begin{itemize}
@@ -956,11 +957,11 @@ The ideal distribution would be \SI{2.5}{\second} between every two events.

Due to network latency and load from crawling other peers, we expect the actual result to deviate from the optimal value over time.
With this experiment, we try to estimate the impact of the latency.

\begin{figure}[H]
\centering
\includegraphics[width=1\linewidth]{time_devi.png}
\caption{Deviation from the expected interval}\label{fig:timeDevi}
\end{figure}
% \begin{figure}[H]
% \centering
% \includegraphics[width=1\linewidth]{time_devi.png}
% \caption{Deviation from the expected interval}\label{fig:timeDevi}
% \end{figure}

\begin{landscape}
\begin{figure}[H]
@@ -986,21 +987,24 @@ With this experiment, we try to estimate the impact of the latency.
\end{figure}
\end{landscape}

The deviation between crawl events per crawler is below \SI{0.01}{\second} most of the time, with occasional outliers due to network latency or server load.

\begin{table}[H]
\centering
\begin{tabular}{rr}
\begin{tabular}{rS}
\textbf{Crawler} & {\textbf{Average Deviation} (\si{\second})} \\
c0 & \num{0.0005927812081134085} \\
c1 & \num{0.0003700713297978895} \\
c2 & \num{0.0006121075253902246} \\
c3 & \num{0.0020807891511268814} \\
c0 & \num{0.0003166149207321755} \\
c1 & \num{0.0002065727194268201} \\
c2 & \num{0.0003075813840032066} \\
c3 & \num{0.0038056359425696364} \\
\end{tabular}
\caption{Average deviation per crawler}\label{tab:perCralwerDeviation}
\end{table}

The average deviation per crawler is below \SI{0.002}{\second} even with some huge outliers. In general it is below \SI{0.0007}{\second}, which is a surprisingly accurate result.
The monitored peer and crawler \emph{c0} are located in Falkenstein, Germany, \emph{c1} in Nuremberg, Germany, \emph{c2} in Helsinki, Finland, and \emph{c3} in Ashburn, USA, to provide some geographic distribution.

The average deviation per crawler stays below \SI{0.004}{\second} even with some outliers due to network latency or server load.
The crawler \emph{c3} is the furthest away from the monitored host, so the larger deviation due to network latency is expected.

% In general it is below \SI{0.0007}{\second}, which is a surprisingly accurate result.
In real-world scenarios, crawlers will monitor more than a single peer and the scheduling is expected to be less accurate.
Still, the deviation will always stay below the effective frequency \(f\), because after exceeding \(f\), a crawler is overtaken by the next in line.
The impact of the deviation when crawling real-world botnets has to be investigated and, if it turns out to be a problem, the tasks have to be rescheduled periodically to prevent it.