Content

2022-03-22 20:15:30 +01:00 · 2022-03-22 20:15:30 +01:00 · 6ebc297b81
commit 6ebc297b81
parent 5819fa2443
24 changed files with 104 additions and 47 deletions
--- a/assets/0.25_1_sr.png
+++ b/assets/0.25_1_sr.png
--- a/assets/0.25_2_sr.png
+++ b/assets/0.25_2_sr.png
--- a/assets/0.25_3_sr.png
+++ b/assets/0.25_3_sr.png
--- a/assets/0.25_4_sr.png
+++ b/assets/0.25_4_sr.png
--- a/assets/0.25_5_sr.png
+++ b/assets/0.25_5_sr.png
--- a/assets/0.50_1_sr.png
+++ b/assets/0.50_1_sr.png
--- a/assets/0.50_2_sr.png
+++ b/assets/0.50_2_sr.png
--- a/assets/0.50_3_sr.png
+++ b/assets/0.50_3_sr.png
--- a/assets/0.50_4_sr.png
+++ b/assets/0.50_4_sr.png
--- a/assets/0.50_5_sr.png
+++ b/assets/0.50_5_sr.png
--- a/assets/0.5_1_sr.png
+++ b/assets/0.5_1_sr.png
--- a/assets/0.5_2_sr.png
+++ b/assets/0.5_2_sr.png
--- a/assets/0.5_3_sr.png
+++ b/assets/0.5_3_sr.png
--- a/assets/0.5_4_sr.png
+++ b/assets/0.5_4_sr.png
--- a/assets/0.5_5_sr.png
+++ b/assets/0.5_5_sr.png
--- a/assets/0.75_1_sr.png
+++ b/assets/0.75_1_sr.png
--- a/assets/0.75_2_sr.png
+++ b/assets/0.75_2_sr.png
--- a/assets/0.75_3_sr.png
+++ b/assets/0.75_3_sr.png
--- a/assets/0.75_4_sr.png
+++ b/assets/0.75_4_sr.png
--- a/assets/0.75_5_sr.png
+++ b/assets/0.75_5_sr.png
--- a/content.tex
+++ b/content.tex
@@ -410,10 +410,10 @@ While the effective frequency of the whole system is halved compared to~\autoref
 %}}} frequency reduction

 %{{{ against graph metrics
-\todo{sinnvoll?}
 \subsection{Working Against Suspicious Graph Metrics}

 \citetitle*{bib:karuppayah_sensorbuster_2017} describes different graph metrics to find sensors in \ac{p2p} botnets.
+These metrics depend on the uneven ratio between incoming and outgoing edges for crawlers.
 One of those, \enquote{SensorBuster}, uses \acp{wcc} since crawlers don't have any edges back to the main network in the graph.

 Building a complete graph \(G_C = K_{\abs{C}}\) between the crawlers by making them return the other crawlers on peer list requests would still produce a disconnected component. While it is bigger and perhaps less obvious at first glance, it is still easily detectable since there is no path from \(G_C\) back to the main network (see~\autoref{fig:sensorbuster2} and~\autoref{fig:metrics_table}).
@@ -425,50 +425,7 @@ With \(v \in V\), \(\text{succ}(v)\) being the set of successors of \(v\) and \(
  \text{PR}(v) = \text{dampingFactor} \times \sum\limits_{p \in \text{pred}(v)} \frac{\text{PR}(p)}{\abs{\text{succ}(p)}} + \frac{1 - \text{dampingFactor}}{\abs{V}}
 \]

-For the first iteration, the PageRank of all nodes is set to the same initial value. When iterating often enough, any value can be chosen~\cite{bib:page_pagerank_1998}.\todo{how often? experiments!}
-In our experiments on a snapshot of the Sality~\cite{bib:falliere_sality_2011} botnet exported from \ac{bms} over the span of \daterange{2021-04-22}{2021-04-29}\todo{export timespan}, 3 iterations were enough to get distinct enough values to detect sensors and crawlers.
-
-\begin{table}[H]
-  \centering
-\begin{tabular}{lllll}
-  \textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
-  1 & wat? & wut? & wit? & wot? \\
-  2 & wat? & wut? & wit? & wot? \\
-  3 & wat? & wut? & wit? & wot? \\
-  4 & wat? & wut? & wit? & wot? \\
-  5 & wat? & wut? & wit? & wot? \\
-\end{tabular}
-  \caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.25\)}\label{fig:pr_iter_table}
-\end{table}
-\todo{proper table formatting}
-
-\begin{table}[H]
-  \centering
-\begin{tabular}{lllll}
-  \textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
-  1 & wat? & wut? & wit? & wot? \\
-  2 & wat? & wut? & wit? & wot? \\
-  3 & wat? & wut? & wit? & wot? \\
-  4 & wat? & wut? & wit? & wot? \\
-  5 & wat? & wut? & wit? & wot? \\
-\end{tabular}
-  \caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.5\)}\label{fig:pr_iter_table}
-\end{table}
-\todo{proper table formatting}
-
-\begin{table}[H]
-  \centering
-\begin{tabular}{lllll}
-  \textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
-  1 & wat? & wut? & wit? & wot? \\
-  2 & wat? & wut? & wit? & wot? \\
-  3 & wat? & wut? & wit? & wot? \\
-  4 & wat? & wut? & wit? & wot? \\
-  5 & wat? & wut? & wit? & wot? \\
-\end{tabular}
-  \caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.75\)}\label{fig:pr_iter_table}
-\end{table}
-\todo{proper table formatting}
+For the first iteration, the PageRank of all nodes is set to the same initial value. When iterating often enough, any value can be chosen~\cite{bib:page_pagerank_1998}.

 The dampingFactor describes the probability of a person visiting links on the web to continue doing so, when using PageRank to rank websites in search results.
 For simplicity---and since it is not required to model human behaviour for automated crawling and ranking---a dampingFactor of \(1.0\) will be used, which simplifies the formula to
@@ -523,14 +480,112 @@ Applying SensorRank PageRank once with an initial rank of \(0.25\) once on the e

 While this works for small networks, the crawlers must account for a significant amount of peers in the network for this change to be noticeable.\todo{for bigger (generated) graphs?}

-\subsubsection{Excurs: Churn}
+In our experiments on a snapshot of the Sality~\cite{bib:falliere_sality_2011} botnet exported from \ac{bms} over the span of \daterange{2021-04-21}{2021-04-28}\todo{export timespan}, a single iteration was already enough to produce values distinct enough to detect sensors and crawlers.

-Churn describes the dynamics of peer participation of \ac{p2p} systems, \eg{} join and leave events~\cite{bib:stutzbach_churn_2006}.
+\begin{table}[H]
+  \centering
+\begin{tabular}{lllll}
+  \textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
+  1 & 0.24854932 & 0.63277194 & 0.15393478 & 0.56545578 \\
+  2 & 0.24854932 & 0.63277194 & 0.15393478 & 0.56545578 \\
+  3 & 0.24501068 & 0.46486353 & 0.13810930 & 0.41540997 \\
+  4 & 0.24501068 & 0.46486353 & 0.13810930 & 0.41540997 \\
+  5 & 0.24233737 & 0.50602884 & 0.14101354 & 0.45219598 \\
+\end{tabular}
+  \caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.25\)}\label{fig:pr_iter_table_25}
+\end{table}
+
+\begin{figure}[H]
+  \centering
+\begin{subfigure}[b]{.5\textwidth}
+  \centering
+  \includegraphics[width=1\linewidth]{0.25_1_sr.png}
+  \caption{Distribution after 1 iteration}\label{fig:dist_sr_25_1}
+\end{subfigure}%
+\begin{subfigure}[b]{.5\textwidth}
+  \centering
+  \includegraphics[width=1\linewidth]{0.25_5_sr.png}
+  \caption{Distribution after 5 iterations}\label{fig:dist_sr_25_5}
+\end{subfigure}%
+  \caption{SensorRank distribution with initial rank \(\forall v \in V : \text{PR}(v) = 0.25\)}\label{fig:dist_sr_25}
+\end{figure}
+
+\begin{table}[H]
+  \centering
+\begin{tabular}{lllll}
+  \textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
+  1 & 0.49709865 & 1.26554389 & 0.30786955 & 1.13091156 \\
+  2 & 0.49709865 & 1.26554389 & 0.30786955 & 1.13091156 \\
+  3 & 0.49002136 & 0.92972707 & 0.27621861 & 0.83081993 \\
+  4 & 0.49002136 & 0.92972707 & 0.27621861 & 0.83081993 \\
+  5 & 0.48467474 & 1.01205767 & 0.28202708 & 0.90439196 \\
+\end{tabular}
+  \caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.5\)}\label{fig:pr_iter_table_5}
+\end{table}
+
+\begin{figure}[H]
+  \centering
+\begin{subfigure}[b]{.5\textwidth}
+  \centering
+  \includegraphics[width=1\linewidth]{0.50_1_sr.png}
+  \caption{Distribution after 1 iteration}\label{fig:dist_sr_50_1}
+\end{subfigure}%
+\begin{subfigure}[b]{.5\textwidth}
+  \centering
+  \includegraphics[width=1\linewidth]{0.50_5_sr.png}
+  \caption{Distribution after 5 iterations}\label{fig:dist_sr_50_5}
+\end{subfigure}%
+  \caption{SensorRank distribution with initial rank \(\forall v \in V : \text{PR}(v) = 0.5\)}\label{fig:dist_sr_50}
+\end{figure}
+
+\begin{table}[H]
+  \centering
+\begin{tabular}{lllll}
+  \textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
+  1 & 0.74564797 & 1.89831583 & 0.46180433 & 1.69636734 \\
+  2 & 0.74564797 & 1.89831583 & 0.46180433 & 1.69636734 \\
+  3 & 0.73503203 & 1.39459060 & 0.41432791 & 1.24622990 \\
+  4 & 0.73503203 & 1.39459060 & 0.41432791 & 1.24622990 \\
+  5 & 0.72701212 & 1.51808651 & 0.42304062 & 1.35658794 \\
+\end{tabular}
+  \caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.75\)}\label{fig:pr_iter_table_75}
+\end{table}
+
+\begin{figure}[H]
+  \centering
+\begin{subfigure}[b]{.5\textwidth}
+  \centering
+  \includegraphics[width=1\linewidth]{0.75_1_sr.png}
+  \caption{Distribution after 1 iteration}\label{fig:dist_sr_75_1}
+\end{subfigure}%
+\begin{subfigure}[b]{.5\textwidth}
+  \centering
+  \includegraphics[width=1\linewidth]{0.75_5_sr.png}
+  \caption{Distribution after 5 iterations}\label{fig:dist_sr_75_5}
+\end{subfigure}%
+  \caption{SensorRank distribution with initial rank \(\forall v \in V : \text{PR}(v) = 0.75\)}\label{fig:dist_sr_75}
+\end{figure}
+
+The distribution graphs in \autoref{fig:dist_sr_25}, \autoref{fig:dist_sr_50} and \autoref{fig:dist_sr_75} show that the initial rank has no effect on the distribution, only on the actual numeric rank values.
+
+For all combinations of initial value and number of PageRank iterations, the rank of a well-known crawler is in the \nth{95} percentile, so these parameters do not matter for our use case.
+
+On average, peers in the analyzed dataset have \num{223} successors over the whole week.
+Looking at the data in smaller buckets of one hour each, the average number of successors per peer is \num{90}.
+
+Churn describes the dynamics of peer participation in \ac{p2p} systems, \eg{} join and leave events~\cite{bib:stutzbach_churn_2006}.\todo{transition}
 By combining churn detection with knowledge about \acp{as}, peers that just left the system and came from an \ac{as} with dynamic IP allocation (\eg{} many consumer broadband providers in the US and Europe) can be placed into the crawler's neighbourhood list.
 If the timing of the churn event correlates with IP rotation in the \ac{as}, it can be assumed that the peer left due to being assigned a new IP address---not due to connectivity issues or going offline---and will not return using the same IP address.
 These peers, when placed in the neighbourhood list of the crawlers, will introduce paths back into the main network and defeat the \ac{wcc} metric.
 It also helps with the PageRank and SensorRank metrics since the crawlers start to look like regular peers without actually supporting the network by relaying messages or propagating active peers.

+Knowledge of only \num{90} peers leaving due to IP rotation would be enough to make a crawler look average in Sality.
+This number will differ between botnets, depending on implementation details and the size of the network.
+
+Adding edges from the known crawler to \num{90} random peers to simulate the described strategy gives the following rankings:\todo{table, distribution with random edges}
+
+
+
 %}}} against graph metrics

 %}}} strategies
--- a/report.pdf
+++ b/report.pdf
--- a/report.tex
+++ b/report.tex
@@ -58,6 +58,7 @@ headsepline,

 % formatting numbers
 \usepackage{nicefrac}
+\usepackage{nth}
 % units
 \usepackage{siunitx}
 \sisetup{%
--- a/shell.nix
+++ b/shell.nix
@@ -18,6 +18,7 @@ let
    dejavu
    isodate
    latexmk
+    nth
    siunitx
    substr
    todonotes
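
The experiment described in the content.tex hunk above (iterating PageRank from a uniform initial rank, checking the percentile of a known crawler, then adding edges from that crawler to random peers) can be sketched roughly as follows. This is only an illustration and not the code used for the report: the graph representation (a dict mapping each peer to its set of successors) and the helper names `all_nodes`, `pagerank`, `percentile_of` and `add_random_out_edges` are assumptions made for this example. SensorRank is left out because its definition is not part of the lines shown in this diff.

```python
import random


def all_nodes(succ):
    """Collect every node that appears as a source or as a target."""
    nodes = set(succ)
    for targets in succ.values():
        nodes.update(targets)
    return nodes


def pagerank(succ, iterations=1, initial_rank=0.25, damping=1.0):
    """Iterate PR(v) = d * sum_{p in pred(v)} PR(p) / |succ(p)| + (1 - d) / |V|.

    `succ` maps each node to the set of its successors (hypothetical layout).
    """
    nodes = all_nodes(succ)
    rank = {v: initial_rank for v in nodes}
    for _ in range(iterations):
        # Every node starts with the damping-independent base term.
        nxt = {v: (1.0 - damping) / len(nodes) for v in nodes}
        for p, targets in succ.items():
            if targets:
                # Each predecessor splits its rank evenly among its successors.
                share = damping * rank[p] / len(targets)
                for v in targets:
                    nxt[v] += share
        rank = nxt
    return rank


def percentile_of(rank, node):
    """Percentage of nodes whose rank is less than or equal to that of `node`."""
    at_or_below = sum(1 for r in rank.values() if r <= rank[node])
    return 100.0 * at_or_below / len(rank)


def add_random_out_edges(succ, crawler, k, seed=0):
    """Return a copy of the graph with up to k new edges crawler -> random peer."""
    rng = random.Random(seed)
    existing = succ.get(crawler, set())
    candidates = sorted(
        v for v in all_nodes(succ) if v != crawler and v not in existing
    )
    new_succ = {v: set(t) for v, t in succ.items()}
    chosen = rng.sample(candidates, min(k, len(candidates)))
    new_succ.setdefault(crawler, set()).update(chosen)
    return new_succ


if __name__ == "__main__":
    # Toy graph: three peers in a ring that all also point at a crawler "c",
    # which has no outgoing edges of its own.
    graph = {"p1": {"p2", "c"}, "p2": {"p3", "c"}, "p3": {"p1", "c"}}
    before = pagerank(graph, iterations=5, initial_rank=0.25)
    after = pagerank(add_random_out_edges(graph, "c", 2), iterations=5)
    print("crawler percentile before:", percentile_of(before, "c"))
    print("crawler percentile after: ", percentile_of(after, "c"))
```

On a real snapshot, `k` would correspond to the number of successors an average peer has (\num{90} per hourly bucket in the text above), and the resulting rank distribution would be compared against the unmodified graph.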