Update
parent 516fca8dbc
commit 5819fa2443
37
content.tex
@@ -253,6 +253,8 @@ type PeerTask struct {
\section{Coordination Strategies}
Let \(C\) be the set of available crawlers.
Unless stated otherwise, we assume that \(C\) is known when \ac{bms} is started and does not change afterward.
That is, no crawlers join or leave at runtime.
%{{{ load balancing
\subsection{Load Balancing}
@@ -261,19 +263,48 @@ This strategy simply splits the work into even chunks and split it between the a
The following sharding conditions come to mind:
\begin{itemize}
\item Assuming IP addresses are evenly distributed and so are infections, take the IP address as an \SI{32}{\bit} integer modulo \(\abs{C}\).
\item Assuming that IP addresses and infections are both evenly distributed, interpret the IP address as a \SI{32}{\bit} integer and take it modulo \(\abs{C}\). See~\autoref{sec:ip_part}.
Problem: peers have to be reassigned whenever a crawler joins or leaves.
\item Maintain an internal counter/list of tasks for each available crawler and assign to the crawler with the most available resources.
\item Maintain an internal counter/list of tasks for each available crawler and assign each new task to the crawler with the most available resources. See~\autoref{sec:ewd}.
Reassignment is easy.
\item Round Robin
\item Round robin. See~\autoref{sec:rr}.
\end{itemize}
Load balancing by itself does not help prevent the detection of crawlers, but it makes better use of the available resources.
No peer is crawled by more than one crawler, and it becomes possible to crawl larger botnets for which the current approach would reach its limit.
That limit could also be worked around by scaling up the machine on which the crawler is executed.
Load balancing instead allows scaling out, which can be more cost-effective.
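The counter-based sharding condition from the list above can be sketched as follows.
This is only a minimal illustration under a simplifying assumption: \enquote{most available resources} is approximated by \enquote{fewest currently assigned tasks}.
The names \texttt{balancer}, \texttt{assign} and \texttt{release} are hypothetical and not taken from \ac{bms}.

```go
package main

import "fmt"

// balancer keeps one task counter per crawler.
// Assumption: a crawler's free resources are approximated by how few
// tasks it currently holds; real capacity is not measured here.
type balancer struct {
	tasks []int // tasks[c] is the number of open tasks of crawler c
}

func newBalancer(numCrawlers int) *balancer {
	return &balancer{tasks: make([]int, numCrawlers)}
}

// assign picks the crawler with the fewest open tasks (lowest index
// wins ties), increments its counter and returns its index.
// It returns -1 if there are no crawlers.
func (b *balancer) assign() int {
	if len(b.tasks) == 0 {
		return -1
	}
	best := 0
	for c, n := range b.tasks {
		if n < b.tasks[best] {
			best = c
		}
	}
	b.tasks[best]++
	return best
}

// release marks one task of crawler c as finished.
func (b *balancer) release(c int) {
	if b.tasks[c] > 0 {
		b.tasks[c]--
	}
}

func main() {
	b := newBalancer(3)
	for i := 0; i < 4; i++ {
		fmt.Printf("task %d -> crawler %d\n", i, b.assign())
	}
	b.release(1) // crawler 1 finished its task and is idle again
	fmt.Printf("next task -> crawler %d\n", b.assign())
}
```

Because the assignment only depends on the current counters, a crawler that joins or leaves only changes the length of the counter list; no existing mapping has to be recomputed.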
\subsubsection{Round Robin Distribution}\label{sec:rr}
\subsubsection{Even Work Distribution}\label{sec:ewd}
\todo{weighted round robin}
Work is distributed between crawlers in proportion to their capabilities.
For the sake of simplicity, we only consider bandwidth as a capability, but the scheme can be extended to any property shared by the crawlers, \eg{} available memory or CPU speed.
For a given crawler \(c \in C\) let \(B_c\) be the total bandwidth of the crawler.
The total available bandwidth is \(B = \sum\limits_{c \in C} B_c\).
The weight \(W_c = \frac{B_c}{B}\)\todo{proper def for weight} defines the fraction of the work that gets assigned to \(c\).
The set of target peers \(P = \langle p_0, p_1, \ldots, p_{n-1} \rangle\) is partitioned into \(\abs{C}\) subsets according to \(W_c\), and each subset is assigned to its crawler \(c\).
\begin{table}[H]
\centering
\begin{tabular}{lll}
Crawler \(c\) & \(B_c\) & \(W_c\) \\
0 & 100 & \(\frac{10}{16}\) \\
1 & 10 & \(\frac{1}{16}\) \\
2 & 50 & \(\frac{5}{16}\) \\
\end{tabular}
\end{table}
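A possible way to turn the weights into an actual partition of \(P\) is sketched below.
It is only an illustration under assumptions: peers are plain strings, bandwidths are integers in an arbitrary common unit, and each crawler receives a contiguous chunk whose end is the rounded-down cumulative share \(n \cdot \sum_{c' \le c} B_{c'} / B\).
The function name \texttt{partitionByBandwidth} is hypothetical.

```go
package main

import "fmt"

// partitionByBandwidth splits peers into len(bandwidths) consecutive
// chunks so that chunk c holds roughly the fraction B_c / B of all peers.
// Assumption: chunk boundaries are the rounded-down cumulative shares,
// which guarantees that every peer lands in exactly one chunk.
func partitionByBandwidth(peers []string, bandwidths []int) ([][]string, error) {
	total := 0
	for _, b := range bandwidths {
		if b < 0 {
			return nil, fmt.Errorf("negative bandwidth %d", b)
		}
		total += b
	}
	if total == 0 {
		return nil, fmt.Errorf("total bandwidth must be positive")
	}

	chunks := make([][]string, len(bandwidths))
	start, cumulative := 0, 0
	for c, b := range bandwidths {
		cumulative += b
		end := len(peers) * cumulative / total // integer division rounds down
		chunks[c] = peers[start:end]
		start = end
	}
	return chunks, nil
}

func main() {
	peers := make([]string, 16)
	for i := range peers {
		peers[i] = fmt.Sprintf("p%d", i)
	}
	// Bandwidths of the three example crawlers from the table above.
	chunks, err := partitionByBandwidth(peers, []int{100, 10, 50})
	if err != nil {
		fmt.Println(err)
		return
	}
	for c, chunk := range chunks {
		fmt.Printf("crawler %d gets %d peers: %v\n", c, len(chunk), chunk)
	}
}
```

With 16 peers and the example bandwidths, the chunk sizes are 10, 1 and 5, which matches the weights \(\frac{10}{16}\), \(\frac{1}{16}\) and \(\frac{5}{16}\).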
\todo{remove me}
\subsubsection{IP-based Partitioning}\label{sec:ip_part}
We assume that the IP addresses in a botnet are evenly distributed with regard to their residue modulo \(\abs{C}\)\todo{source? law of large numbers}.
The mapping \(m(i) = i \bmod \abs{C}\), where \(i\) is the IP address interpreted as an integer, determines which crawler an IP address is assigned to.
This ensures that neighboring IP addresses (\eg{} in the same \ac{as} and/or geolocation) get visited by different crawlers.
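The modulo mapping can be illustrated with a short sketch.
It assumes IPv4 addresses only, interpreted as big-endian \SI{32}{\bit} integers, and crawlers that are identified by the indices \(0, \ldots, \abs{C} - 1\).
The function name \texttt{crawlerFor} is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// crawlerFor returns the index of the crawler responsible for addr,
// computed as m(i) = i mod |C| where i is the IPv4 address read as a
// big-endian 32-bit integer.
// Assumption: only IPv4 is handled; IPv6 addresses are rejected.
func crawlerFor(addr string, numCrawlers int) (int, error) {
	if numCrawlers <= 0 {
		return 0, fmt.Errorf("need at least one crawler, got %d", numCrawlers)
	}
	ip := net.ParseIP(addr).To4() // nil for invalid or non-IPv4 input
	if ip == nil {
		return 0, fmt.Errorf("%q is not an IPv4 address", addr)
	}
	i := binary.BigEndian.Uint32(ip)
	return int(i % uint32(numCrawlers)), nil
}

func main() {
	// Three consecutive addresses from the documentation range 192.0.2.0/24.
	for _, addr := range []string{"192.0.2.10", "192.0.2.11", "192.0.2.12"} {
		c, err := crawlerFor(addr, 3)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> crawler %d\n", addr, c)
	}
}
```

For \(\abs{C} = 3\), the three consecutive addresses map to the crawlers 0, 1 and 2, so direct neighbors end up on different crawlers.
Changing \(\abs{C}\) changes the result for most addresses, which is the reassignment problem noted for this sharding condition.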
%}}} load balancing
%{{{ frequency reduction
BIN
report.pdf
Binary file not shown.