BIN
assets/0.25_1_sr.png
Normal file
After Width: | Height: | Size: 17 KiB |
BIN
assets/0.25_2_sr.png
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
assets/0.25_3_sr.png
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
assets/0.25_4_sr.png
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
assets/0.25_5_sr.png
Normal file
After Width: | Height: | Size: 19 KiB |
BIN
assets/0.50_1_sr.png
Normal file
After Width: | Height: | Size: 17 KiB |
BIN
assets/0.50_2_sr.png
Normal file
After Width: | Height: | Size: 17 KiB |
BIN
assets/0.50_3_sr.png
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
assets/0.50_4_sr.png
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
assets/0.50_5_sr.png
Normal file
After Width: | Height: | Size: 19 KiB |
BIN
assets/0.5_1_sr.png
Normal file
After Width: | Height: | Size: 17 KiB |
BIN
assets/0.5_2_sr.png
Normal file
After Width: | Height: | Size: 17 KiB |
BIN
assets/0.5_3_sr.png
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
assets/0.5_4_sr.png
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
assets/0.5_5_sr.png
Normal file
After Width: | Height: | Size: 19 KiB |
BIN
assets/0.75_1_sr.png
Normal file
After Width: | Height: | Size: 17 KiB |
BIN
assets/0.75_2_sr.png
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
assets/0.75_3_sr.png
Normal file
After Width: | Height: | Size: 19 KiB |
BIN
assets/0.75_4_sr.png
Normal file
After Width: | Height: | Size: 19 KiB |
BIN
assets/0.75_5_sr.png
Normal file
After Width: | Height: | Size: 19 KiB |
149
content.tex
@ -410,10 +410,10 @@ While the effective frequency of the whole system is halved compared to~\autoref
|
||||
%}}} frequency reduction
|
||||
|
||||
%{{{ against graph metrics
|
||||
\todo{useful?}
|
||||
\subsection{Working Against Suspicious Graph Metrics}
|
||||
|
||||
\citetitle*{bib:karuppayah_sensorbuster_2017} describes different graph metrics to find sensors in \ac{p2p} botnets.
|
||||
These metrics depend on the uneven ratio between incoming and outgoing edges for crawlers.
|
||||
One of those, \enquote{SensorBuster}, uses \acp{wcc}, since crawlers do not have any edges back to the main network in the graph.
|
||||
|
||||
Building a complete graph \(G_C = K_{\abs{C}}\) between the crawlers by making them return the other crawlers on peer list requests would still produce a disconnected component. While this component is bigger and perhaps less obvious at first glance, it is still easily detectable, since there is no path from \(G_C\) back to the main network (see~\autoref{fig:sensorbuster2} and~\autoref{fig:metrics_table}).
|
||||
@ -425,50 +425,7 @@ With \(v \in V\), \(\text{succ}(v)\) being the set of successors of \(v\) and \(
|
||||
\text{PR}(v) = \text{dampingFactor} \times \sum\limits_{p \in \text{pred}(v)} \frac{\text{PR}(p)}{\abs{\text{succ}(p)}} + \frac{1 - \text{dampingFactor}}{\abs{V}}
|
||||
\]
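The iteration defined by this formula can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the experiments: the function name \texttt{pagerank}, its parameters, and the edge-list input format are assumptions made for this example.

```python
from collections import defaultdict


def pagerank(edges, initial_rank=0.25, damping=1.0, iterations=3):
    """Iterate the PageRank formula on a directed graph.

    `edges` is an iterable of (source, destination) pairs (assumed format).
    Every node starts with the same `initial_rank`.
    """
    succ = defaultdict(set)  # succ[v]: successors of v
    pred = defaultdict(set)  # pred[v]: predecessors of v
    nodes = set()
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)
        nodes.update((src, dst))

    rank = {v: initial_rank for v in nodes}
    for _ in range(iterations):
        # Each predecessor p hands PR(p) / |succ(p)| to v; the second
        # term is the teleport share, which vanishes for damping = 1.0.
        rank = {
            v: damping * sum(rank[p] / len(succ[p]) for p in pred[v])
            + (1 - damping) / len(nodes)
            for v in nodes
        }
    return rank


if __name__ == "__main__":
    toy_edges = [("a", "c"), ("b", "c"), ("c", "a")]
    print(pagerank(toy_edges, initial_rank=0.25, iterations=1))
```

Every predecessor in \texttt{pred[v]} has at least one successor by construction, so the division is always defined.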
|
||||
|
||||
For the first iteration, the PageRank of all nodes is set to the same initial value. When iterating often enough, any value can be chosen~\cite{bib:page_pagerank_1998}.\todo{how often? experiments!}
|
||||
In our experiments on a snapshot of the Sality~\cite{bib:falliere_sality_2011} botnet exported from \ac{bms} over the span of \daterange{2021-04-22}{2021-04-29}\todo{export timespan}, 3 iterations were enough to get distinct enough values to detect sensors and crawlers.
|
||||
|
||||
\begin{table}[H]
|
||||
\centering
|
||||
\begin{tabular}{lllll}
|
||||
\textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
|
||||
1 & wat? & wut? & wit? & wot? \\
|
||||
2 & wat? & wut? & wit? & wot? \\
|
||||
3 & wat? & wut? & wit? & wot? \\
|
||||
4 & wat? & wut? & wit? & wot? \\
|
||||
5 & wat? & wut? & wit? & wot? \\
|
||||
\end{tabular}
|
||||
\caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.25\)}\label{fig:pr_iter_table}
|
||||
\end{table}
|
||||
\todo{proper table formatting}
|
||||
|
||||
\begin{table}[H]
|
||||
\centering
|
||||
\begin{tabular}{lllll}
|
||||
\textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
|
||||
1 & wat? & wut? & wit? & wot? \\
|
||||
2 & wat? & wut? & wit? & wot? \\
|
||||
3 & wat? & wut? & wit? & wot? \\
|
||||
4 & wat? & wut? & wit? & wot? \\
|
||||
5 & wat? & wut? & wit? & wot? \\
|
||||
\end{tabular}
|
||||
\caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.5\)}\label{fig:pr_iter_table}
|
||||
\end{table}
|
||||
\todo{proper table formatting}
|
||||
|
||||
\begin{table}[H]
|
||||
\centering
|
||||
\begin{tabular}{lllll}
|
||||
\textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
|
||||
1 & wat? & wut? & wit? & wot? \\
|
||||
2 & wat? & wut? & wit? & wot? \\
|
||||
3 & wat? & wut? & wit? & wot? \\
|
||||
4 & wat? & wut? & wit? & wot? \\
|
||||
5 & wat? & wut? & wit? & wot? \\
|
||||
\end{tabular}
|
||||
\caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.75\)}\label{fig:pr_iter_table}
|
||||
\end{table}
|
||||
\todo{proper table formatting}
|
||||
For the first iteration, the PageRank of all nodes is set to the same initial value. When iterating often enough, any value can be chosen~\cite{bib:page_pagerank_1998}.
|
||||
|
||||
When PageRank is used to rank websites in search results, the dampingFactor describes the probability that a person following links on the web continues to do so.
|
||||
For simplicity---and since it is not required to model human behaviour for automated crawling and ranking---a dampingFactor of \(1.0\) will be used, which simplifies the formula to
|
||||
@ -523,14 +480,112 @@ Applying SensorRank PageRank once with an initial rank of \(0.25\) once on the e
|
||||
|
||||
While this works for small networks, the crawlers must account for a significant share of the peers in the network for this change to be noticeable.\todo{for bigger (generated) graphs?}
|
||||
|
||||
\subsubsection{Excursus: Churn}
|
||||
In our experiments on a snapshot of the Sality~\cite{bib:falliere_sality_2011} botnet exported from \ac{bms} over the span of \daterange{2021-04-21}{2021-04-28}\todo{export timespan}, even a single iteration was enough to obtain values distinct enough to detect sensors and crawlers.
|
||||
|
||||
Churn describes the dynamics of peer participation in \ac{p2p} systems, \eg{} join and leave events~\cite{bib:stutzbach_churn_2006}.
|
||||
\begin{table}[H]
|
||||
\centering
|
||||
\begin{tabular}{lllll}
|
||||
\textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
|
||||
1 & 0.24854932 & 0.63277194 & 0.15393478 & 0.56545578 \\
|
||||
2 & 0.24854932 & 0.63277194 & 0.15393478 & 0.56545578 \\
|
||||
3 & 0.24501068 & 0.46486353 & 0.13810930 & 0.41540997 \\
|
||||
4 & 0.24501068 & 0.46486353 & 0.13810930 & 0.41540997 \\
|
||||
5 & 0.24233737 & 0.50602884 & 0.14101354 & 0.45219598 \\
|
||||
\end{tabular}
|
||||
\caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.25\)}\label{fig:pr_iter_table_25}
|
||||
\end{table}
|
||||
|
||||
\begin{figure}[H]
|
||||
\centering
|
||||
\begin{subfigure}[b]{.5\textwidth}
|
||||
\centering
|
||||
\includegraphics[width=1\linewidth]{0.25_1_sr.png}
|
||||
\caption{Distribution after 1 iteration}\label{fig:dist_sr_25_1}
|
||||
\end{subfigure}%
|
||||
\begin{subfigure}[b]{.5\textwidth}
|
||||
\centering
|
||||
\includegraphics[width=1\linewidth]{0.25_5_sr.png}
|
||||
\caption{Distribution after 5 iterations}\label{fig:dist_sr_25_5}
|
||||
\end{subfigure}%
|
||||
\caption{SensorRank distribution with initial rank \(\forall v \in V : \text{PR}(v) = 0.25\)}\label{fig:dist_sr_25}
|
||||
\end{figure}
|
||||
|
||||
\begin{table}[H]
|
||||
\centering
|
||||
\begin{tabular}{lllll}
|
||||
\textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
|
||||
1 & 0.49709865 & 1.26554389 & 0.30786955 & 1.13091156 \\
|
||||
2 & 0.49709865 & 1.26554389 & 0.30786955 & 1.13091156 \\
|
||||
3 & 0.49002136 & 0.92972707 & 0.27621861 & 0.83081993 \\
|
||||
4 & 0.49002136 & 0.92972707 & 0.27621861 & 0.83081993 \\
|
||||
5 & 0.48467474 & 1.01205767 & 0.28202708 & 0.90439196 \\
|
||||
\end{tabular}
|
||||
\caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.5\)}\label{fig:pr_iter_table_5}
|
||||
\end{table}
|
||||
|
||||
\begin{figure}[H]
|
||||
\centering
|
||||
\begin{subfigure}[b]{.5\textwidth}
|
||||
\centering
|
||||
\includegraphics[width=1\linewidth]{0.50_1_sr.png}
|
||||
\caption{Distribution after 1 iteration}\label{fig:dist_sr_50_1}
|
||||
\end{subfigure}%
|
||||
\begin{subfigure}[b]{.5\textwidth}
|
||||
\centering
|
||||
\includegraphics[width=1\linewidth]{0.50_5_sr.png}
|
||||
\caption{Distribution after 5 iterations}\label{fig:dist_sr_50_5}
|
||||
\end{subfigure}%
|
||||
\caption{SensorRank distribution with initial rank \(\forall v \in V : \text{PR}(v) = 0.5\)}\label{fig:dist_sr_50}
|
||||
\end{figure}
|
||||
|
||||
\begin{table}[H]
|
||||
\centering
|
||||
\begin{tabular}{lllll}
|
||||
\textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
|
||||
1 & 0.74564797 & 1.89831583 & 0.46180433 & 1.69636734 \\
|
||||
2 & 0.74564797 & 1.89831583 & 0.46180433 & 1.69636734 \\
|
||||
3 & 0.73503203 & 1.39459060 & 0.41432791 & 1.24622990 \\
|
||||
4 & 0.73503203 & 1.39459060 & 0.41432791 & 1.24622990 \\
|
||||
5 & 0.72701212 & 1.51808651 & 0.42304062 & 1.35658794 \\
|
||||
\end{tabular}
|
||||
\caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.75\)}\label{fig:pr_iter_table_75}
|
||||
\end{table}
|
||||
|
||||
\begin{figure}[H]
|
||||
\centering
|
||||
\begin{subfigure}[b]{.5\textwidth}
|
||||
\centering
|
||||
\includegraphics[width=1\linewidth]{0.75_1_sr.png}
|
||||
\caption{Distribution after 1 iteration}\label{fig:dist_sr_75_1}
|
||||
\end{subfigure}%
|
||||
\begin{subfigure}[b]{.5\textwidth}
|
||||
\centering
|
||||
\includegraphics[width=1\linewidth]{0.75_5_sr.png}
|
||||
\caption{Distribution after 5 iterations}\label{fig:dist_sr_75_5}
|
||||
\end{subfigure}%
|
||||
\caption{SensorRank distribution with initial rank \(\forall v \in V : \text{PR}(v) = 0.75\)}\label{fig:dist_sr_75}
|
||||
\end{figure}
|
||||
|
||||
The distribution graphs in \autoref{fig:dist_sr_25}, \autoref{fig:dist_sr_50} and \autoref{fig:dist_sr_75} show that the initial rank has no effect on the distribution, only on the actual numeric rank values.
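One way to see why, assuming the dampingFactor of \(1.0\) chosen above: the update rule is then linear in the ranks, so scaling the initial rank by a constant \(c\) scales every later rank by the same \(c\). The iteration index \(k\) and the primed notation are introduced here only for illustration.

```latex
\[
    \text{PR}_{k+1}(v) = \sum\limits_{p \in \text{pred}(v)} \frac{\text{PR}_k(p)}{\abs{\text{succ}(p)}}
\]
\[
    \text{PR}'_0 = c \cdot \text{PR}_0 \implies \text{PR}'_k(v) = c \cdot \text{PR}_k(v) \quad \forall v \in V,\ k \geq 0
\]
```

The tables agree with this: the average PR after one iteration is \(0.24854932\) for an initial rank of \(0.25\) and \(0.49709865\) for an initial rank of \(0.5\), which is twice the former value up to rounding.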
|
||||
|
||||
For all combinations of initial value and number of PageRank iterations, the rank of a well-known crawler is in the \nth{95} percentile, so for our use case the choice of these parameters does not matter.
|
||||
|
||||
On average, peers in the analyzed dataset have \num{223} successors over the whole week.
|
||||
Looking at the data in smaller buckets of one hour each, the average number of successors per peer is \num{90}.
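How such averages could be computed is sketched below. The input format (edges as \texttt{(source, destination, timestamp)} triples), the function name \texttt{avg\_successors}, and the choice to average over every (bucket, peer) pair with at least one successor are assumptions for this example; the exact method used on the \ac{bms} export is not shown here.

```python
from collections import defaultdict


def avg_successors(edges, bucket_seconds=None):
    """Average number of distinct successors per peer.

    `edges` holds (source, destination, unix_timestamp) triples (assumed
    format). With `bucket_seconds=None` the whole dataset is one bucket;
    otherwise successors are counted separately per time bucket and the
    average runs over all (bucket, peer) pairs that have a successor.
    """
    succ = defaultdict(set)
    for src, dst, ts in edges:
        bucket = 0 if bucket_seconds is None else int(ts // bucket_seconds)
        succ[(bucket, src)].add(dst)
    if not succ:
        return 0.0
    return sum(len(s) for s in succ.values()) / len(succ)


if __name__ == "__main__":
    toy = [("a", "b", 0), ("a", "c", 10), ("a", "d", 3700), ("b", "a", 20)]
    print(avg_successors(toy))        # whole dataset as one bucket
    print(avg_successors(toy, 3600))  # hourly buckets
```

Smaller buckets can only lower or keep the per-peer successor count, which matches the direction of the difference reported above.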
|
||||
|
||||
Churn describes the dynamics of peer participation in \ac{p2p} systems, \eg{} join and leave events~\cite{bib:stutzbach_churn_2006}.\todo{transition}
|
||||
By detecting that a peer has just left the system and combining this with knowledge about \acp{as}, peers that left and came from an \ac{as} with dynamic IP allocation (\eg{} many consumer broadband providers in the US and Europe) can be placed into the crawler's neighbourhood list.
|
||||
If the timing of the churn event correlates with IP rotation in the \ac{as}, it can be assumed that the peer left because it was assigned a new IP address (not because of connectivity issues or going offline) and will not return using the same IP address.
|
||||
These peers, when placed in the neighbourhood list of the crawlers, will introduce paths back into the main network and defeat the \ac{wcc} metric.
|
||||
It also helps with the PageRank and SensorRank metrics since the crawlers start to look like regular peers without actually supporting the network by relaying messages or propagating active peers.
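The selection step described above could look roughly like the following sketch. All names (\texttt{LeaveEvent}, \texttt{neighbour\_candidates}, \texttt{dynamic\_asns}, \texttt{max\_age}) are hypothetical, and the check against the actual IP rotation schedule of an \ac{as} is reduced to a simple recency window here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaveEvent:
    """A peer that was observed leaving the network (hypothetical record)."""
    ip: str
    asn: int          # autonomous system the address belongs to
    left_at: float    # Unix timestamp of the observed leave event


def neighbour_candidates(events, dynamic_asns, now, max_age=3600.0):
    """Return IPs of peers that recently left from an AS with dynamic IPs.

    `dynamic_asns` is an assumed, externally maintained set of AS numbers
    known to rotate addresses. `max_age` (seconds) stands in for a real
    correlation with the rotation timing of the AS.
    """
    return [
        e.ip
        for e in events
        if e.asn in dynamic_asns and 0 <= now - e.left_at <= max_age
    ]


if __name__ == "__main__":
    events = [
        LeaveEvent("192.0.2.1", 64512, 1000.0),
        LeaveEvent("192.0.2.2", 64513, 1000.0),
        LeaveEvent("192.0.2.3", 64512, 10.0),
    ]
    print(neighbour_candidates(events, {64512}, now=1200.0, max_age=600.0))
```

The returned addresses are the ones a crawler could report in its neighbourhood list; the example uses documentation addresses and private AS numbers only.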
|
||||
|
||||
Knowledge of only \num{90} peers leaving due to IP rotation would be enough to make a crawler look average in Sality.
|
||||
This number will differ between different botnets, depending on implementation details and size of the network.
|
||||
|
||||
Adding edges from the known crawler to \num{90} random peers to simulate the described strategy gives the following rankings:\todo{table, distribution with random edges}
|
||||
|
||||
|
||||
|
||||
%}}} against graph metrics
|
||||
|
||||
%}}} strategies
|
||||
|
BIN
report.pdf
@ -58,6 +58,7 @@ headsepline,
|
||||
|
||||
% formatting numbers
|
||||
\usepackage{nicefrac}
|
||||
\usepackage{nth}
|
||||
% units
|
||||
\usepackage{siunitx}
|
||||
\sisetup{%
|
||||
|