diff --git a/assets/0.25_1_sr.png b/assets/0.25_1_sr.png new file mode 100644 index 00000000..7209bf19 Binary files /dev/null and b/assets/0.25_1_sr.png differ diff --git a/assets/0.25_2_sr.png b/assets/0.25_2_sr.png new file mode 100644 index 00000000..b9d1d85a Binary files /dev/null and b/assets/0.25_2_sr.png differ diff --git a/assets/0.25_3_sr.png b/assets/0.25_3_sr.png new file mode 100644 index 00000000..7c17fcda Binary files /dev/null and b/assets/0.25_3_sr.png differ diff --git a/assets/0.25_4_sr.png b/assets/0.25_4_sr.png new file mode 100644 index 00000000..349a8592 Binary files /dev/null and b/assets/0.25_4_sr.png differ diff --git a/assets/0.25_5_sr.png b/assets/0.25_5_sr.png new file mode 100644 index 00000000..be6cffea Binary files /dev/null and b/assets/0.25_5_sr.png differ diff --git a/assets/0.50_1_sr.png b/assets/0.50_1_sr.png new file mode 100644 index 00000000..8a6509b9 Binary files /dev/null and b/assets/0.50_1_sr.png differ diff --git a/assets/0.50_2_sr.png b/assets/0.50_2_sr.png new file mode 100644 index 00000000..3b19ea0c Binary files /dev/null and b/assets/0.50_2_sr.png differ diff --git a/assets/0.50_3_sr.png b/assets/0.50_3_sr.png new file mode 100644 index 00000000..f869e673 Binary files /dev/null and b/assets/0.50_3_sr.png differ diff --git a/assets/0.50_4_sr.png b/assets/0.50_4_sr.png new file mode 100644 index 00000000..b0cfe1d9 Binary files /dev/null and b/assets/0.50_4_sr.png differ diff --git a/assets/0.50_5_sr.png b/assets/0.50_5_sr.png new file mode 100644 index 00000000..771ee4a6 Binary files /dev/null and b/assets/0.50_5_sr.png differ diff --git a/assets/0.5_1_sr.png b/assets/0.5_1_sr.png new file mode 100644 index 00000000..8a6509b9 Binary files /dev/null and b/assets/0.5_1_sr.png differ diff --git a/assets/0.5_2_sr.png b/assets/0.5_2_sr.png new file mode 100644 index 00000000..3b19ea0c Binary files /dev/null and b/assets/0.5_2_sr.png differ diff --git a/assets/0.5_3_sr.png b/assets/0.5_3_sr.png new file mode 100644 index 00000000..f869e673 Binary files /dev/null and b/assets/0.5_3_sr.png differ diff --git a/assets/0.5_4_sr.png b/assets/0.5_4_sr.png new file mode 100644 index 00000000..b0cfe1d9 Binary files /dev/null and b/assets/0.5_4_sr.png differ diff --git a/assets/0.5_5_sr.png b/assets/0.5_5_sr.png new file mode 100644 index 00000000..771ee4a6 Binary files /dev/null and b/assets/0.5_5_sr.png differ diff --git a/assets/0.75_1_sr.png b/assets/0.75_1_sr.png new file mode 100644 index 00000000..8b6b1943 Binary files /dev/null and b/assets/0.75_1_sr.png differ diff --git a/assets/0.75_2_sr.png b/assets/0.75_2_sr.png new file mode 100644 index 00000000..ffd7887f Binary files /dev/null and b/assets/0.75_2_sr.png differ diff --git a/assets/0.75_3_sr.png b/assets/0.75_3_sr.png new file mode 100644 index 00000000..fb027f8a Binary files /dev/null and b/assets/0.75_3_sr.png differ diff --git a/assets/0.75_4_sr.png b/assets/0.75_4_sr.png new file mode 100644 index 00000000..01281d5c Binary files /dev/null and b/assets/0.75_4_sr.png differ diff --git a/assets/0.75_5_sr.png b/assets/0.75_5_sr.png new file mode 100644 index 00000000..fe88d2e6 Binary files /dev/null and b/assets/0.75_5_sr.png differ diff --git a/content.tex b/content.tex index 5cbd1f0c..3606b6bd 100644 --- a/content.tex +++ b/content.tex @@ -410,10 +410,10 @@ While the effective frequency of the whole system is halved compared to~\autoref %}}} frequency reduction %{{{ against graph metrics -\todo{sinnvoll?} \subsection{Working Against Suspicious Graph Metrics} 
\citetitle*{bib:karuppayah_sensorbuster_2017} describes different graph metrics to find sensors in \ac{p2p} botnets.
+These metrics rely on the uneven ratio between incoming and outgoing edges that is characteristic for crawlers.
One of those, \enquote{SensorBuster}, uses \acp{wcc} since crawlers don't have any edges back to the main network in the graph.
Building a complete graph \(G_C = K_{\abs{C}}\) between the crawlers by making them return the other crawlers on peer list requests would still produce a disconnected component, and while it is bigger and maybe not as obvious at first glance, it is still easily detectable since there is no path from \(G_C\) back to the main network (see~\autoref{fig:sensorbuster2} and~\autoref{fig:metrics_table}).
@@ -425,50 +425,7 @@ With \(v \in V\), \(\text{succ}(v)\) being the set of successors of \(v\) and \(
\[
	\text{PR}(v) = \text{dampingFactor} \times \sum\limits_{p \in \text{pred}(v)} \frac{\text{PR}(p)}{\abs{\text{succ}(p)}} + \frac{1 - \text{dampingFactor}}{\abs{V}}
\]
-For the first iteration, the PageRank of all nodes is set to the same initial value. When iterating often enough, any value can be chosen~\cite{bib:page_pagerank_1998}.\todo{how often? experiments!}
-In our experiments on a snapshot of the Sality~\cite{bib:falliere_sality_2011} botnet exported from \ac{bms} over the span of \daterange{2021-04-22}{2021-04-29}\todo{export timespan}, 3 iterations were enough to get distinct enough values to detect sensors and crawlers.
-
-\begin{table}[H]
- \centering
-\begin{tabular}{lllll}
- \textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
- 1 & wat? & wut? & wit? & wot? \\
- 2 & wat? & wut? & wit? & wot? \\
- 3 & wat? & wut? & wit? & wot? \\
- 4 & wat? & wut? & wit? & wot? \\
- 5 & wat? & wut? & wit? & wot? \\
-\end{tabular}
- \caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.25\)}\label{fig:pr_iter_table}
-\end{table}
-\todo{proper table formatting}
-
-\begin{table}[H]
- \centering
-\begin{tabular}{lllll}
- \textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
- 1 & wat? & wut? & wit? & wot? \\
- 2 & wat? & wut? & wit? & wot? \\
- 3 & wat? & wut? & wit? & wot? \\
- 4 & wat? & wut? & wit? & wot? \\
- 5 & wat? & wut? & wit? & wot? \\
-\end{tabular}
- \caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.5\)}\label{fig:pr_iter_table}
-\end{table}
-\todo{proper table formatting}
-
-\begin{table}[H]
- \centering
-\begin{tabular}{lllll}
- \textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
- 1 & wat? & wut? & wit? & wot? \\
- 2 & wat? & wut? & wit? & wot? \\
- 3 & wat? & wut? & wit? & wot? \\
- 4 & wat? & wut? & wit? & wot? \\
- 5 & wat? & wut? & wit? & wot? \\
-\end{tabular}
- \caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.75\)}\label{fig:pr_iter_table}
-\end{table}
-\todo{proper table formatting}
+For the first iteration, the PageRank of all nodes is set to the same initial value. When iterating often enough, any value can be chosen~\cite{bib:page_pagerank_1998}. When PageRank is used to rank websites in search results, the dampingFactor describes the probability that a person browsing the web keeps following links.
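+
+To make the iteration scheme concrete, the following Python sketch shows one possible implementation of the formula above over a simple successor map; the dictionary-based graph representation and the function name are illustrative assumptions and not part of the \ac{bms} code base.
+
+\begin{verbatim}
+# Sketch: one possible implementation of the PageRank iteration above.
+# `succ` maps every node to the set of its successors; this dict-based
+# graph representation is an assumption made for illustration only.
+def pagerank(succ, damping_factor=0.85, initial=0.25, iterations=5):
+    nodes = set(succ) | {w for outs in succ.values() for w in outs}
+    pred = {v: set() for v in nodes}
+    for v, outs in succ.items():      # build predecessor sets
+        for w in outs:
+            pred[w].add(v)
+    pr = {v: initial for v in nodes}  # same initial value for all nodes
+    for _ in range(iterations):
+        pr = {v: damping_factor
+                 * sum(pr[p] / len(succ[p]) for p in pred[v])
+                 + (1 - damping_factor) / len(nodes)
+              for v in nodes}
+    return pr
+\end{verbatim}
+
+With a dampingFactor of \(1.0\), the second term vanishes and each node simply redistributes its rank over its successors, which matches the simplification used below.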
For simplicity---and since it is not required to model human behaviour for automated crawling and ranking---a dampingFactor of \(1.0\) will be used, which simplifies the formula to
@@ -523,14 +480,112 @@ Applying SensorRank PageRank once with an initial rank of \(0.25\) once on the e
While this works for small networks, the crawlers must account for a significant share of the peers in the network for this change to be noticeable.\todo{for bigger (generated) graphs?}
-\subsubsection{Excurs: Churn}
+In our experiments on a snapshot of the Sality~\cite{bib:falliere_sality_2011} botnet exported from \ac{bms} over the span of \daterange{2021-04-21}{2021-04-28}\todo{export timespan}, even one iteration was enough to produce values distinct enough to detect sensors and crawlers.
-Churn describes the dynamics of peer participation of \ac{p2p} systems, \eg{} join and leave events~\cite{bib:stutzbach_churn_2006}.
+\begin{table}[H]
+ \centering
+\begin{tabular}{lllll}
+ \textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
+ 1 & 0.24854932 & 0.63277194 & 0.15393478 & 0.56545578 \\
+ 2 & 0.24854932 & 0.63277194 & 0.15393478 & 0.56545578 \\
+ 3 & 0.24501068 & 0.46486353 & 0.13810930 & 0.41540997 \\
+ 4 & 0.24501068 & 0.46486353 & 0.13810930 & 0.41540997 \\
+ 5 & 0.24233737 & 0.50602884 & 0.14101354 & 0.45219598 \\
+\end{tabular}
+ \caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.25\)}\label{fig:pr_iter_table_25}
+\end{table}
+
+\begin{figure}[H]
+ \centering
+\begin{subfigure}[b]{.5\textwidth}
+ \centering
+ \includegraphics[width=1\linewidth]{0.25_1_sr.png}
+ \caption{Distribution after 1 iteration}\label{fig:dist_sr_25_1}
+\end{subfigure}%
+\begin{subfigure}[b]{.5\textwidth}
+ \centering
+ \includegraphics[width=1\linewidth]{0.25_5_sr.png}
+ \caption{Distribution after 5 iterations}\label{fig:dist_sr_25_5}
+\end{subfigure}%
+ \caption{SensorRank distribution with initial rank \(\forall v \in V : \text{PR}(v) = 0.25\)}\label{fig:dist_sr_25}
+\end{figure}
+
+\begin{table}[H]
+ \centering
+\begin{tabular}{lllll}
+ \textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
+ 1 & 0.49709865 & 1.26554389 & 0.30786955 & 1.13091156 \\
+ 2 & 0.49709865 & 1.26554389 & 0.30786955 & 1.13091156 \\
+ 3 & 0.49002136 & 0.92972707 & 0.27621861 & 0.83081993 \\
+ 4 & 0.49002136 & 0.92972707 & 0.27621861 & 0.83081993 \\
+ 5 & 0.48467474 & 1.01205767 & 0.28202708 & 0.90439196 \\
+\end{tabular}
+ \caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.5\)}\label{fig:pr_iter_table_5}
+\end{table}
+
+\begin{figure}[H]
+ \centering
+\begin{subfigure}[b]{.5\textwidth}
+ \centering
+ \includegraphics[width=1\linewidth]{0.50_1_sr.png}
+ \caption{Distribution after 1 iteration}\label{fig:dist_sr_50_1}
+\end{subfigure}%
+\begin{subfigure}[b]{.5\textwidth}
+ \centering
+ \includegraphics[width=1\linewidth]{0.50_5_sr.png}
+ \caption{Distribution after 5 iterations}\label{fig:dist_sr_50_5}
+\end{subfigure}%
+ \caption{SensorRank distribution with initial rank \(\forall v \in V : \text{PR}(v) = 0.5\)}\label{fig:dist_sr_50}
+\end{figure}
+
+\begin{table}[H]
+ \centering
+\begin{tabular}{lllll}
+ \textbf{Iteration} & \textbf{Avg. PR} & \textbf{Crawler PR} & \textbf{Avg. SR} & \textbf{Crawler SR} \\
+ 1 & 0.74564797 & 1.89831583 & 0.46180433 & 1.69636734 \\
+ 2 & 0.74564797 & 1.89831583 & 0.46180433 & 1.69636734 \\
+ 3 & 0.73503203 & 1.39459060 & 0.41432791 & 1.24622990 \\
+ 4 & 0.73503203 & 1.39459060 & 0.41432791 & 1.24622990 \\
+ 5 & 0.72701212 & 1.51808651 & 0.42304062 & 1.35658794 \\
+\end{tabular}
+ \caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.75\)}\label{fig:pr_iter_table_75}
+\end{table}
+
+\begin{figure}[H]
+ \centering
+\begin{subfigure}[b]{.5\textwidth}
+ \centering
+ \includegraphics[width=1\linewidth]{0.75_1_sr.png}
+ \caption{Distribution after 1 iteration}\label{fig:dist_sr_75_1}
+\end{subfigure}%
+\begin{subfigure}[b]{.5\textwidth}
+ \centering
+ \includegraphics[width=1\linewidth]{0.75_5_sr.png}
+ \caption{Distribution after 5 iterations}\label{fig:dist_sr_75_5}
+\end{subfigure}%
+ \caption{SensorRank distribution with initial rank \(\forall v \in V : \text{PR}(v) = 0.75\)}\label{fig:dist_sr_75}
+\end{figure}
+
+The distribution graphs in \autoref{fig:dist_sr_25}, \autoref{fig:dist_sr_50} and \autoref{fig:dist_sr_75} show that the initial rank has no effect on the distribution, only on the actual numeric rank values.
+
+For all combinations of initial value and number of PageRank iterations, the rank of a well-known crawler is in the \nth{95} percentile, so these parameters do not matter for our use case.
+
+On average, peers in the analyzed dataset have \num{223} successors over the whole week.
+Looking at the data in smaller buckets of one hour each, the average number of successors per peer is \num{90}.
+
+Churn can be exploited to find plausible peers for a crawler's neighbourhood list without actually supporting the botnet.
+Churn describes the dynamics of peer participation in \ac{p2p} systems, \eg{} join and leave events~\cite{bib:stutzbach_churn_2006}.
By detecting that a peer just left the system and combining this with knowledge about \acp{as}, peers that left and came from an \ac{as} with dynamic IP allocation (\eg{} many consumer broadband providers in the US and Europe) can be placed into the crawler's neighbourhood list.
If the timing of the churn event correlates with IP rotation in the \ac{as}, it can be assumed that the peer left due to being assigned a new IP address---not due to connectivity issues or going offline---and will not return using the same IP address.
These peers, when placed in the neighbourhood list of the crawlers, will introduce paths back into the main network and defeat the \ac{wcc} metric.
This also helps with the PageRank and SensorRank metrics, since the crawlers start to look like regular peers without actually supporting the network by relaying messages or propagating active peers.
+Knowledge of only \num{90} peers leaving due to IP rotation would be enough to make a crawler look average in Sality.
+This number will differ between different botnets, depending on implementation details and size of the network.
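+
+To illustrate this strategy, the following Python sketch grafts such churned peers onto a crawler's successor set; the successor-map representation, the peer selection and all names are hypothetical and only serve to outline the idea, not an actual implementation.
+
+\begin{verbatim}
+import random
+
+# Sketch: add edges from a crawler to peers that presumably left the
+# network due to IP rotation in their AS, re-introducing paths from
+# the crawler back into the main network.  All names are hypothetical.
+def add_churned_edges(succ, crawler, churned_peers, count=90):
+    candidates = [p for p in churned_peers if p != crawler]
+    picked = random.sample(candidates, min(count, len(candidates)))
+    succ.setdefault(crawler, set()).update(picked)
+    return succ
+\end{verbatim}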
+ +Adding edges from the known crawler to \num{90} random peers to simulate the described strategy gives the following rankings:\todo{table, distribution with random edges} + + + %}}} against graph metrics %}}} strategies diff --git a/report.pdf b/report.pdf index 57d810d7..9465f853 100644 Binary files a/report.pdf and b/report.pdf differ diff --git a/report.tex b/report.tex index bfff6677..a4394ecc 100644 --- a/report.tex +++ b/report.tex @@ -58,6 +58,7 @@ headsepline, % formatting numbers \usepackage{nicefrac} +\usepackage{nth} % units \usepackage{siunitx} \sisetup{% diff --git a/shell.nix b/shell.nix index 947ba118..4e780aa7 100644 --- a/shell.nix +++ b/shell.nix @@ -18,6 +18,7 @@ let dejavu isodate latexmk + nth siunitx substr todonotes