Content
@@ -1,7 +1,8 @@
 \begin{abstract}
 Botnets pose a huge risk to general internet infrastructure and services.
-Distributed \Acs*{p2p} topologies make it harder to detect and take those botnets offline.
+Distributed \Acs*{p2p} topologies make those botnets harder to detect, and more resilient to take-down attempts.
 To take a \ac{p2p} botnet down, it has to be monitored to estimate the size and learn about the network topology.
+% Monitoring requires some kind of participation in the network to
 With the growing damage and monetary value produced by such botnets, ideas emerged on how to detect and prevent monitoring activity in the network.
 This work explores ways to make monitoring of fully distributed botnets more efficient, resilient, and harder to detect, by using a collaborative, coordinated approach.
 Further, we show how the coordinated approach helps in circumventing anti-monitoring techniques deployed by botnets.
(Binary changes: 4 image files updated.)
@@ -1,5 +1,6 @@
 #!/usr/bin/env python3
 
+import numpy as np
 import statistics
 from collections import defaultdict
 from typing import Dict
@@ -26,7 +27,8 @@ def plot_devi(data: Dict[datetime, str]):
     # c = 0
     per_diff = defaultdict(list)
     for prev, next in zip(sor, sor[1:]):
-        diff = abs(2.5 - (next[0].timestamp() - prev[0].timestamp()))
+        # diff = abs(2.5 - (next[0].timestamp() - prev[0].timestamp()))
+        diff = ((next[0].timestamp() - prev[0].timestamp()) - 2.5)
         diffs.append(diff)
         per_crawler[prev[1]].append(prev[0])
         per_diff[prev[1]].append(diff)
@@ -72,16 +74,20 @@ def plot_devi(data: Dict[datetime, str]):
         t = per_crawler[c]
         devi = []
         for pre, nex in zip(t, t[1:]):
-            devi.append(abs(10 - (nex.timestamp() - pre.timestamp())))
-        x = [10 * x for x in range(len(devi))]
+            # devi.append(abs(10 - (nex.timestamp() - pre.timestamp())))
+            devi.append(((nex.timestamp() - pre.timestamp()) - 10))
+        x = np.array([10 * x for x in range(len(devi))])
+        devi = np.array(devi)
         fig, ax = plt.subplots()
         ax.scatter(x, devi, s=10)
+        m, b = np.polyfit(x, devi, 1)
+        plt.plot(x, m*x+b, color='red')
         ax.set_title(f'Timedeviation for {c}')
         ax.set_xlabel('Time passed in seconds')
         ax.set_ylabel('Deviation in seconds')
         plt.savefig(f'./time_devi_{c}.png')
         plt.close()
-        print(f'{c}: {statistics.mean(devi)}')
+        print(f'{c} & \\num{{{statistics.mean(devi)}}} \\\\')
         # for ts in per_crawler[c]:
 
 
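The hunk above switches from absolute to signed deviations and fits a linear trend over them with np.polyfit. A minimal, self-contained version of that plotting step, using synthetic deviations instead of the crawler timestamps (the data and file name here are illustrative only), could look like this:

#!/usr/bin/env python3
# Sketch: fit and plot a linear trend over timing deviations (synthetic data).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.array([10 * i for i in range(100)])    # seconds passed, one sample per 10 s interval
devi = rng.normal(0, 0.001, size=len(x))      # signed deviation from the expected 10 s interval

fig, ax = plt.subplots()
ax.scatter(x, devi, s=10)
m, b = np.polyfit(x, devi, 1)                 # least-squares line: deviation = m * time + b
ax.plot(x, m * x + b, color='red')
ax.set_xlabel('Time passed in seconds')
ax.set_ylabel('Deviation in seconds')
plt.savefig('./time_devi_sketch.png')
plt.close(fig)

The slope m makes it easy to see whether the deviation drifts over time or stays centered around zero, which is the point of the signed (rather than absolute) deviation introduced in this change.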
(Binary changes: 6 image files updated.)
content.tex
@@ -361,7 +361,7 @@ Load balancing allows scaling out, which can be more cost-effective.
 
 \subsubsection{Round Robin Distribution}\label{sec:rr}
 
-This strategy distributes work evenly among crawlers by either naively assigning tasks to the crawlers rotationally or weighted according to their capabilities\todo{1 -- 2 sentences about naive rr?}.
+This strategy distributes work evenly among crawlers by either naively assigning tasks to the crawlers rotationally or weighted according to their capabilities.
 To keep the distribution as even as possible, we keep track of the last crawler a task was assigned to and start with the next in line in the subsequent round of assignments.
 For the sake of simplicity, only the bandwidth will be considered as a capability but it can be extended by any shared property between the crawlers, \eg{} available memory or processing power.
 For a given crawler \(c_i \in C\) let \(cap(c_i)\) be the capability of the crawler.
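For readers unfamiliar with the strategy described here, a weighted round-robin assignment can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name, the capability dictionary, and the peer names are invented, and the carried-over "last crawler" pointer mentioned in the text is omitted for brevity (the naive variant is simply the special case where every capability is 1).

#!/usr/bin/env python3
# Sketch: weighted round-robin task assignment, capability = bandwidth (illustrative only).
from itertools import cycle
from typing import Dict, List

def assign_weighted_rr(tasks: List[str], capabilities: Dict[str, int]) -> Dict[str, List[str]]:
    """Hand out tasks in rotation, proportionally to each crawler's capability."""
    # Repeat each crawler according to its weight, then rotate through that sequence.
    slots = [c for c, cap in capabilities.items() for _ in range(cap)]
    assignment: Dict[str, List[str]] = {c: [] for c in capabilities}
    for task, crawler in zip(tasks, cycle(slots)):
        assignment[crawler].append(task)
    return assignment

if __name__ == '__main__':
    peers = [f'peer{i}' for i in range(10)]
    # c1 has twice the bandwidth, so it receives roughly twice as many peers to crawl.
    print(assign_weighted_rr(peers, {'c0': 1, 'c1': 2, 'c2': 1}))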
@@ -551,13 +551,12 @@ While the effective frequency of the whole system is halved compared to~\Fref{fi
 %}}} frequency reduction
 
 %{{{ against graph metrics
-\subsection{Creating Edges for Crawlers and Sensors}
+\subsection{Creating and Reducing Edges for Sensors}
 
 \citetitle*{bib:karuppayah_sensorbuster_2017} describes different graph metrics to find sensors in \ac{p2p} botnets.
 These metrics depend on the uneven ratio between incoming and outgoing edges for crawlers.
 The \emph{SensorBuster} metric uses \acp{wcc} since naive sensors don't have any edges back to the main network in the graph.
 
-Building a complete graph \(G_C = K_{\abs{C}}\) between the sensors and crawlers by making them return the other known worker on peer list requests would still produce a disconnected component and while being bigger and maybe not as obvious at first glance, it is still easily detectable since there is no path from \(G_C\) back to the main network (see~\Fref{fig:sensorbuster2} and~\Fref{tab:metricsTable}).
 
 With \(v \in V\), \(\text{succ}(v)\) being the set of successors of \(v\) and \(\text{pred}(v)\) being the set of predecessors of \(v\), \emph{PageRank} is recursively defined as~\cite{bib:page_pagerank_1998}:
 
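The PageRank definition itself lies outside the changed lines. For orientation, the commonly cited form of the recursion from the referenced paper, in the notation used above, is shown below; the exact constants in the thesis (damping factor, initial rank) may differ:

\[
  \text{PR}(v) = \frac{1 - d}{\abs{V}} + d \sum_{u \in \text{pred}(v)} \frac{\text{PR}(u)}{\abs{\text{succ}(u)}}
\]

with damping factor \(d\), commonly chosen as \(d = 0.85\). Sensors that only receive requests have large \(\text{pred}(v)\) but empty \(\text{succ}(v)\), which is exactly the asymmetry the metrics above exploit.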
@@ -605,17 +604,19 @@ The following candidates to place on the neighbor list will be investigated:
 % Knowledge of only \num{90} peers leaving due to IP rotation would be enough to make a crawler look average in Sality\todo{repeat analysis, actual number}.
 % This number will differ between different botnets, depending on implementation details and size of the network\todo{upper limit for NL size as impl detail}.
 
-\subsubsection{Other Sensors or Crawlers}
+% \subsubsection{Other Sensors or Crawlers}
 
-Returning all the other sensors when responding to peer list requests, thereby effectively creating a complete graph \(K_{\abs{C}}\) among the workers, creates valid outgoing edges.
+\textbf{Other Sensors:} Returning all the other sensors when responding to peer list requests, thereby effectively creating a complete graph \(K_{\abs{C}}\) among the workers, creates valid outgoing edges.
 The resulting graph will still form a \ac{wcc} with no edges back into the main network.
 
+Building a complete graph \(G_C = K_{\abs{C}}\) between the sensors by making them return the other known workers on peer list requests would still produce a disconnected component and, while being bigger and maybe not as obvious at first glance, it is still easily detectable since there is no path from \(G_C\) back to the main network (see~\Fref{fig:sensorbuster2} and~\Fref{tab:metricsTable}).\todo{where?}
+
 
 %{{{ churned peers
-\subsubsection{Churned Peers After IP Rotation}
+% \subsubsection{Churned Peers After IP Rotation}
 
-Churn describes the dynamics of peer participation in \ac{p2p} systems, \eg{} join and leave events~\cite{bib:stutzbach_churn_2006}.\todo{übergang}
-Detecting if a peer just left the system, in combination with knowledge about \acp{as}, peers that just left and came from an \ac{as} with dynamic IP allocation (\eg{} many consumer broadband providers in the US and Europe), can be placed into the crawler's peer list.\todo{what is an AS}
+\textbf{Churned peers after IP rotation:} Churn describes the dynamics of peer participation in \ac{p2p} systems, \eg{} join and leave events~\cite{bib:stutzbach_churn_2006}.
+By detecting that a peer just left the system, in combination with knowledge about \acp{as}, peers that left an \ac{as} with dynamic IP allocation (\eg{} many consumer broadband providers in the US and Europe) can be placed into the crawler's peer list.
 If the timing of the churn event correlates with IP rotation in the \ac{as}, it can be assumed that the peer left due to being assigned a new IP address---not due to connectivity issues or going offline---and will not return using the same IP address.
 These peers, when placed in the peer list of the crawlers, will introduce paths back into the main network and defeat the \ac{wcc} metric.
 It also helps with the PageRank and SensorRank metrics since the crawlers start to look like regular peers without actually supporting the network by relaying messages or propagating active peers.
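The churn-based candidate selection described in this hunk can be illustrated with a simplified sketch. Everything below is hypothetical: the data structures, the AS lookup, and the rotation window are placeholders and are not taken from the thesis implementation.

#!/usr/bin/env python3
# Sketch: pick churned peers from dynamic-IP ASes as fake neighbors (all names illustrative).
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Set

@dataclass
class ChurnEvent:
    ip: str
    asn: int            # autonomous system the peer's IP belongs to
    left_at: datetime   # when the peer was last seen leaving the network

def candidate_neighbors(events: List[ChurnEvent],
                        dynamic_asns: Set[int],
                        rotation_time: Optional[datetime] = None,
                        rotation_window: timedelta = timedelta(minutes=5)) -> List[str]:
    """Return IPs of churned peers that likely left because of IP rotation in their AS."""
    candidates = []
    for ev in events:
        # Only peers from ASes known to rotate addresses dynamically are of interest.
        if ev.asn not in dynamic_asns:
            continue
        # If a rotation time for the AS is known, require the leave event to fall close to it.
        if rotation_time is not None and abs(ev.left_at - rotation_time) > rotation_window:
            continue
        candidates.append(ev.ip)
    return candidates

Peers selected this way point back into the main network, which is what defeats the \ac{wcc} metric as argued above.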
@@ -623,15 +624,15 @@ It also helps with the PageRank and SensorRank metrics since the crawlers start
 %}}} churned peers
 
 %{{{ cg nat
-\subsubsection{Peers Behind Carrier-Grade \acs*{nat}}
+% \subsubsection{Peers Behind Carrier-Grade \acs*{nat}}
 
-Some peers show behavior, where their IP address changes almost after every request.
+\textbf{Peers behind carrier-grade \acs{nat}:} Some peers show behavior where their IP address changes after almost every request.
 Those peers can be used as fake neighbors and create valid-looking outgoing edges for the sensor.
 
 %}}} cg nat
 
-\clearpage{}
-\todo{clearpage?}
+% \clearpage{}
+% \todo{clearpage?}
 In theory, it would be possible to detect churned peers or peers behind carrier-grade \acs{nat} without coordinating the sensors, but the coordination gives us a few advantages:
 
 \begin{itemize}
@@ -956,11 +957,11 @@ The ideal distribution would be \SI{2.5}{\second} between every two events.
 Due to network latency and load from crawling other peers, we expect the actual result to deviate from the optimal value over time.
 With this experiment, we try to estimate the impact of the latency.
 
-\begin{figure}[H]
-\centering
-\includegraphics[width=1\linewidth]{time_devi.png}
-\caption{Deviation from the expected interval}\label{fig:timeDevi}
-\end{figure}
+% \begin{figure}[H]
+% \centering
+% \includegraphics[width=1\linewidth]{time_devi.png}
+% \caption{Deviation from the expected interval}\label{fig:timeDevi}
+% \end{figure}
 
 \begin{landscape}
 \begin{figure}[H]
@@ -986,21 +987,24 @@ With this experiment, we try to estimate the impact of the latency.
 \end{figure}
 \end{landscape}
 
-The deviation between crawl events per crawler is below \SI{0.01}{\second} most of the time, with occasional outliers due to network latency or server load.
 
 \begin{table}[H]
 \centering
-\begin{tabular}{rr}
+\begin{tabular}{rS}
 \textbf{Crawler} & \textbf{Average Deviation} \\
-c0 & \num{0.0005927812081134085} \\
-c1 & \num{0.0003700713297978895} \\
-c2 & \num{0.0006121075253902246} \\
-c3 & \num{0.0020807891511268814} \\
+c0 & \num{0.0003166149207321755} \\
+c1 & \num{0.0002065727194268201} \\
+c2 & \num{0.0003075813840032066} \\
+c3 & \num{0.0038056359425696364} \\
 \end{tabular}
 \caption{Average deviation per crawler}\label{tab:perCralwerDeviation}
 \end{table}
 
-The average deviation per crawler is below \SI{0.002}{\second} even with some huge outliers. In general it is below \SI{0.0007}{\second}, which is a surprisingly accurate result.
+The monitored peer and crawler \emph{c0} are located in Falkenstein, Germany, \emph{c1} in Nuremberg, Germany, \emph{c2} in Helsinki, Finland, and \emph{c3} in Ashburn, USA, to provide some geographic distribution.
 
+The average deviation per crawler is below \SI{0.004}{\second} even with some outliers due to network latency or server load.
+The crawler \emph{c3} in the experiment is the furthest away from the monitored host; therefore, the larger deviation due to network latency is expected.
+
+% In general it is below \SI{0.0007}{\second}, which is a surprisingly accurate result.
 In real-world scenarios, crawlers will monitor more than a single peer and the scheduling is expected to be less accurate.
 Still, the deviation will always stay below the effective frequency \(f\), because after exceeding \(f\), a crawler is overtaken by the next in line.
 The impact of the deviation when crawling real-world botnets has to be investigated, and if it turns out to be a problem, the tasks have to be rescheduled periodically to prevent this from happening.
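For completeness, the arithmetic behind the expected \SI{2.5}{\second} interval referenced in these hunks: with four crawlers each requesting every \SI{10}{\second} (the interval used in the plotting script above) and offset evenly against each other, the monitored peer sees one request every \(10 / 4 = 2.5\) seconds. A tiny sketch of such a schedule (illustrative only, not the thesis scheduler):

#!/usr/bin/env python3
# Sketch: evenly offset crawlers turn an individual 10 s interval into a 2.5 s effective interval.
n_crawlers = 4
individual_interval = 10.0                              # seconds between requests of one crawler
effective_interval = individual_interval / n_crawlers   # 2.5 s between events overall

# First few planned crawl events as (time, crawler) pairs:
events = sorted(
    (round(c * effective_interval + k * individual_interval, 1), f'c{c}')
    for c in range(n_crawlers)
    for k in range(3)
)
print(effective_interval)   # 2.5
print(events[:6])           # [(0.0, 'c0'), (2.5, 'c1'), (5.0, 'c2'), (7.5, 'c3'), (10.0, 'c0'), (12.5, 'c1')]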