Update
This commit is contained in:
parent
e3142f1938
commit
a7550a0557
68
content.tex
@ -384,14 +384,59 @@ One of those, \enquote{SensorBuster} uses \acp{wcc} since crawlers don't have an
|
|||||||
Building a complete graph \(G_C = K_{\abs{C}}\) between the crawlers by making them return the other crawlers on peer list requests would still produce a disconnected component. While this component is bigger and maybe not as obvious at first glance, it is still easily detectable since there is no path from \(G_C\) back to the main network (see~\autoref{fig:sensorbuster2} and~\autoref{fig:metrics_table}).
|
Building a complete graph \(G_C = K_{\abs{C}}\) between the crawlers by making them return the other crawlers on peer list requests would still produce a disconnected component. While this component is bigger and maybe not as obvious at first glance, it is still easily detectable since there is no path from \(G_C\) back to the main network (see~\autoref{fig:sensorbuster2} and~\autoref{fig:metrics_table}).
|
||||||
|
|
||||||
\todo{rank? deg+ - deg-?}
|
\todo{rank? deg+ - deg-?}
|
||||||
With \(v \in V\), \(\text{rank}(v)\), \(\text{succ}(v)\) being the set of successors of \(v\) and \(\text{pred}(v)\) being the set of predecessors of \(v\), PageRank is defined as~\cite{bib:page_pagerank_1998}:
|
With \(v \in V\), \(\text{succ}(v)\) being the set of successors of \(v\) and \(\text{pred}(v)\) being the set of predecessors of \(v\), PageRank is defined recursively as~\cite{bib:page_pagerank_1998}:
|
||||||
|
|
||||||
\[
|
\[
|
||||||
\text{PR}(v) = \text{dampingFactor} \times \sum\limits_{p \in \text{pred}(v)} \frac{\text{rank}(p)}{\abs{\text{succ}(p)}} + \frac{1 - \text{dampingFactor}}{\abs{V}}
|
\text{PR}(v) = \text{dampingFactor} \times \sum\limits_{p \in \text{pred}(v)} \frac{\text{PR}(p)}{\abs{\text{succ}(p)}} + \frac{1 - \text{dampingFactor}}{\abs{V}}
|
||||||
\]
|
\]
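As a worked instance of this definition, assume a hypothetical graph with \(V = \{a, b, c\}\) and the edges \(a \to b\), \(a \to c\), \(b \to c\) and \(c \to a\), an initial rank of \(0.25\) for every node and, purely for illustration, \(\text{dampingFactor} = 0.85\). Neither the graph nor these values are taken from the evaluation. One iteration then yields:
\[
\begin{aligned}
\text{PR}(a) &= 0.85 \times \frac{0.25}{1} + \frac{0.15}{3} = 0.2625 \\
\text{PR}(b) &= 0.85 \times \frac{0.25}{2} + \frac{0.15}{3} = 0.15625 \\
\text{PR}(c) &= 0.85 \times \left(\frac{0.25}{2} + \frac{0.25}{1}\right) + \frac{0.15}{3} = 0.36875
\end{aligned}
\]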
|
||||||
|
|
||||||
|
For the first iteration, the PageRank of all nodes is set to the same initial value. When iterating often enough, any value can be chosen~\cite{bib:page_pagerank_1998}.\todo{how often? experiments!}
|
||||||
|
In our experiments on a snapshot of the Sality botnet exported from \ac{bms} over the span of\todo{export timespan}, three iterations sufficed to produce values distinct enough to detect sensors and crawlers.
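The iteration described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the names \texttt{Graph}, \texttt{pageRankStep} and \texttt{pageRank} are hypothetical, the graph is a toy example, and every node is assumed to appear as a key of the map, even if it has no successors.

```go
package main

import "fmt"

// Graph maps every node to its successors (outgoing edges).
// Assumption: each node is present as a key, even without successors.
type Graph map[string][]string

// pageRankStep applies the PageRank formula once to all nodes.
func pageRankStep(g Graph, rank map[string]float64, damping float64) map[string]float64 {
	next := make(map[string]float64, len(g))
	base := (1 - damping) / float64(len(g))
	for v := range g {
		next[v] = base
	}
	for p, succ := range g {
		if len(succ) == 0 {
			continue // no outgoing edges, nothing to distribute
		}
		// Each successor receives an equal share of the damped rank of p.
		share := damping * rank[p] / float64(len(succ))
		for _, v := range succ {
			next[v] += share
		}
	}
	return next
}

// pageRank starts from a uniform initial rank and runs a fixed
// number of iterations.
func pageRank(g Graph, initial, damping float64, iterations int) map[string]float64 {
	rank := make(map[string]float64, len(g))
	for v := range g {
		rank[v] = initial
	}
	for i := 0; i < iterations; i++ {
		rank = pageRankStep(g, rank, damping)
	}
	return rank
}

func main() {
	// Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a.
	g := Graph{"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
	rank := pageRank(g, 0.25, 1.0, 1)
	for _, v := range []string{"a", "b", "c"} {
		fmt.Printf("%s: %.4f\n", v, rank[v])
	}
}
```

With a dampingFactor of \(1.0\) and a single iteration, the toy graph yields \(0.25\) for \(a\), \(0.125\) for \(b\) and \(0.375\) for \(c\). Raising \texttt{iterations} allows comparing how quickly the values separate for different initial ranks.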
|
||||||
|
|
||||||
|
\begin{figure}[H]
|
||||||
|
\centering
|
||||||
|
\begin{tabular}{lllll}
|
||||||
|
Iteration & Avg. PR & Crawler PR & Avg. SR & Crawler SR \\
|
||||||
|
1 & wat? & wut? & wit? & wot? \\
|
||||||
|
2 & wat? & wut? & wit? & wot? \\
|
||||||
|
3 & wat? & wut? & wit? & wot? \\
|
||||||
|
4 & wat? & wut? & wit? & wot? \\
|
||||||
|
5 & wat? & wut? & wit? & wot? \\
|
||||||
|
\end{tabular}
|
||||||
|
\caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.25\)}\label{fig:pr_iter_table}
|
||||||
|
\end{figure}
|
||||||
|
\todo{proper table formatting}
|
||||||
|
|
||||||
|
\begin{figure}[H]
|
||||||
|
\centering
|
||||||
|
\begin{tabular}{lllll}
|
||||||
|
Iteration & Avg. PR & Crawler PR & Avg. SR & Crawler SR \\
|
||||||
|
1 & wat? & wut? & wit? & wot? \\
|
||||||
|
2 & wat? & wut? & wit? & wot? \\
|
||||||
|
3 & wat? & wut? & wit? & wot? \\
|
||||||
|
4 & wat? & wut? & wit? & wot? \\
|
||||||
|
5 & wat? & wut? & wit? & wot? \\
|
||||||
|
\end{tabular}
|
||||||
|
\caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.5\)}\label{fig:pr_iter_table_050}
|
||||||
|
\end{figure}
|
||||||
|
\todo{proper table formatting}
|
||||||
|
|
||||||
|
\begin{figure}[H]
|
||||||
|
\centering
|
||||||
|
\begin{tabular}{lllll}
|
||||||
|
Iteration & Avg. PR & Crawler PR & Avg. SR & Crawler SR \\
|
||||||
|
1 & wat? & wut? & wit? & wot? \\
|
||||||
|
2 & wat? & wut? & wit? & wot? \\
|
||||||
|
3 & wat? & wut? & wit? & wot? \\
|
||||||
|
4 & wat? & wut? & wit? & wot? \\
|
||||||
|
5 & wat? & wut? & wit? & wot? \\
|
||||||
|
\end{tabular}
|
||||||
|
\caption{Values for PageRank iterations with initial rank \(\forall v \in V : \text{PR}(v) = 0.75\)}\label{fig:pr_iter_table_075}
|
||||||
|
\end{figure}
|
||||||
|
\todo{proper table formatting}
|
||||||
|
|
||||||
The dampingFactor describes the probability of a person visiting links on the web to continue doing so, when using PageRank to rank websites in search results.
|
The dampingFactor describes the probability of a person visiting links on the web to continue doing so, when using PageRank to rank websites in search results.
|
||||||
For simplicity, and since it is not required to model human behaviour for automated crawling and ranking, a dampingFactor of \(1.0\) will be used, which simplifies the formula to
|
For simplicity---and since it is not required to model human behaviour for automated crawling and ranking---a dampingFactor of \(1.0\) will be used, which simplifies the formula to
|
||||||
|
|
||||||
\[
|
\[
|
||||||
\text{PR}(v) = \sum\limits_{p \in \text{pred}(v)} \frac{\text{PR}(p)}{\abs{\text{succ}(p)}}
|
\text{PR}(v) = \sum\limits_{p \in \text{pred}(v)} \frac{\text{PR}(p)}{\abs{\text{succ}(p)}}
|
||||||
@ -424,20 +469,17 @@ Based on this, SensorRank is defined as
|
|||||||
|
|
||||||
Applying PageRank and SensorRank once with an initial rank of \(0.25\) on the example graphs above results in:
|
Applying PageRank and SensorRank once with an initial rank of \(0.25\) on the example graphs above results in:
|
||||||
|
|
||||||
\todo{pagerank, sensorrank calculations, proper example graphs}
|
\todo{pagerank, sensorrank calculations, proper example graphs, proper table formatting}
|
||||||
\begin{figure}[H]
|
\begin{figure}[H]
|
||||||
\centering
|
\centering
|
||||||
\begin{tabular}{|l|l|l|l|l|l|}
|
\begin{tabular}{llllll}
|
||||||
\hline
|
|
||||||
Node & \(\deg^{+}\) & \(\deg^{-}\) & In \ac{wcc}? & PageRank & SensorRank \\
|
Node & \(\deg^{+}\) & \(\deg^{-}\) & In \ac{wcc}? & PageRank & SensorRank \\
|
||||||
\hline\hline
|
|
||||||
n0 & 0/0 & 4/4 & no & 0.75/0.5625 & 0.3125/0.2344 \\
|
n0 & 0/0 & 4/4 & no & 0.75/0.5625 & 0.3125/0.2344 \\
|
||||||
n1 & 1/1 & 3/3 & no & 0.25/0.1875 & 0.0417/0.0313 \\
|
n1 & 1/1 & 3/3 & no & 0.25/0.1875 & 0.0417/0.0313 \\
|
||||||
n2 & 2/2 & 2/2 & no & 0.5/0.375 & 0.3333/0.25 \\
|
n2 & 2/2 & 2/2 & no & 0.5/0.375 & 0.3333/0.25 \\
|
||||||
c0 & 3/5 & 0/2 & yes (1/3) & 0.0/0.125 & 0.0/0.0104 \\
|
c0 & 3/5 & 0/2 & yes (1/3) & 0.0/0.125 & 0.0/0.0104 \\
|
||||||
c1 & 1/3 & 0/2 & yes (1/3) & 0.0/0.125 & 0.0/0.0104 \\
|
c1 & 1/3 & 0/2 & yes (1/3) & 0.0/0.125 & 0.0/0.0104 \\
|
||||||
c2 & 2/4 & 0/2 & yes (1/3) & 0.0/0.125 & 0.0/0.0104 \\
|
c2 & 2/4 & 0/2 & yes (1/3) & 0.0/0.125 & 0.0/0.0104 \\
|
||||||
\hline
|
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\caption{Values for metrics from~\autoref{fig:sensorbuster} (a/b)}\label{fig:metrics_table}
|
\caption{Values for metrics from~\autoref{fig:sensorbuster} (a/b)}\label{fig:metrics_table}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
@ -450,7 +492,7 @@ While this works for small networks, the crawlers must account for a significant
|
|||||||
|
|
||||||
Churn describes the dynamics of peer participation of \ac{p2p} systems, \eg{} join and leave events~\cite{bib:stutzbach_churn_2006}.
|
Churn describes the dynamics of peer participation of \ac{p2p} systems, \eg{} join and leave events~\cite{bib:stutzbach_churn_2006}.
|
||||||
By detecting that a peer has just left the system and combining this with knowledge about \acp{as}, peers that came from an \ac{as} with dynamic IP allocation (\eg{} many consumer broadband providers in the US and Europe) can be placed into the crawler's neighbourhood list.
|
By detecting that a peer has just left the system and combining this with knowledge about \acp{as}, peers that came from an \ac{as} with dynamic IP allocation (\eg{} many consumer broadband providers in the US and Europe) can be placed into the crawler's neighbourhood list.
|
||||||
If the timing if the churn event correlates with IP rotation in the \ac{as}, it can be assumed, that the peer left due to being assigned a new IP address and not due to connectivity issues or going offline, and will not return using the same IP address.
|
If the timing of the churn event correlates with IP rotation in the \ac{as}, it can be assumed that the peer left due to being assigned a new IP address---not due to connectivity issues or going offline---and will not return using the same IP address.
|
||||||
These peers, when placed in the neighbourhood list of the crawlers, will introduce paths back into the main network and defeat the \ac{wcc} metric.
|
These peers, when placed in the neighbourhood list of the crawlers, will introduce paths back into the main network and defeat the \ac{wcc} metric.
|
||||||
It also helps with the PageRank and SensorRank metrics since the crawlers start to look like regular peers without actually supporting the network by relaying messages or propagating active peers.
|
It also helps with the PageRank and SensorRank metrics since the crawlers start to look like regular peers without actually supporting the network by relaying messages or propagating active peers.
|
||||||
|
|
||||||
@ -505,6 +547,14 @@ Current report possibilities are \mintinline{go}{LoggingReport} to simply log ne
|
|||||||
|
|
||||||
%}}} implementation
|
%}}} implementation
|
||||||
|
|
||||||
|
%{{{ further work
|
||||||
|
\section{Further Work}
|
||||||
|
|
||||||
|
Following this work it should be possible to rewrite the existing crawlers to use the new abstraction.
|
||||||
|
This might bring some performance issues to light, which can be solved by investigating the optimizations from the old implementation and applying them to the new one.
|
||||||
|
|
||||||
|
%}}} further work
|
||||||
|
|
||||||
%{{{ acknowledgments
|
%{{{ acknowledgments
|
||||||
\section*{Acknowledgments}
|
\section*{Acknowledgments}
|
||||||
|
|
||||||
|
BIN
report.pdf
Binary file not shown.