Valentin Brandl 2022-03-18 18:25:54 +01:00
parent 516fca8dbc
commit 5819fa2443
2 changed files with 35 additions and 4 deletions


@ -253,6 +253,8 @@ type PeerTask struct {
\section{Coordination Strategies}
Let \(C\) be the set of available crawlers.
Unless stated otherwise, we assume that \(C\) is known when \ac{bms} is started and does not change afterward.
No crawlers join or leave.
%{{{ load balancing
\subsection{Load Balancing}
@ -261,19 +263,48 @@ This strategy simply splits the work into even chunks and split it between the a
The following sharding conditions come to mind:
\begin{itemize}
\item Assuming IP addresses are evenly distributed and so are infections, take the IP address as a \SI{32}{\bit} integer modulo \(\abs{C}\). See~\autoref{sec:ip_part}.
Problem: peers have to be reassigned if a crawler joins or leaves.
\item Maintain an internal counter/list of tasks for each available crawler and assign new tasks to the crawler with the most available resources. See~\autoref{sec:ewd}.
Reassignment is easy.
\item Round Robin. See~\autoref{sec:rr}.
\end{itemize}
Load balancing in itself does not help prevent the detection of crawlers, but it allows better usage of the available resources.
No peer is crawled by more than one crawler, and bigger botnets can be crawled where the current approach would reach its limit.
That limit could also be worked around by scaling up the machine on which the crawler is executed.
Load balancing allows scaling out instead, which can be more cost-effective.
\subsubsection{Round Robin Distribution}\label{sec:rr}
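The round robin condition from the list above can be sketched as follows.
This is only a minimal illustration, not the implementation used in \ac{bms}: the function name \texttt{roundRobin} and the representation of peers as strings are assumptions made for this example.
Peer \(p_i\) is handed to crawler \(i \bmod \abs{C}\), so the chunk sizes differ by at most one.

```go
package main

import "fmt"

// roundRobin hands out peers to crawlers in turn: the peer at index i
// goes to crawler i mod numCrawlers. It returns one slice per crawler.
// Hypothetical helper for illustration only.
func roundRobin(peers []string, numCrawlers int) [][]string {
	if numCrawlers <= 0 {
		return nil
	}
	assignment := make([][]string, numCrawlers)
	for i, p := range peers {
		c := i % numCrawlers
		assignment[c] = append(assignment[c], p)
	}
	return assignment
}

func main() {
	peers := []string{"p0", "p1", "p2", "p3", "p4"}
	for c, ps := range roundRobin(peers, 2) {
		fmt.Printf("crawler %d: %v\n", c, ps)
	}
}
```

With five peers and two crawlers, crawler 0 receives \(p_0, p_2, p_4\) and crawler 1 receives \(p_1, p_3\).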
\subsubsection{Even Work Distribution}\label{sec:ewd}
\todo{weighted round robin}
Work is evenly distributed between crawlers according to their capabilities.
For the sake of simplicity, we only consider bandwidth as a capability, but the approach can be extended to any property shared between the crawlers, \eg{} available memory or CPU speed.
For a given crawler \(c \in C\) let \(B_c\) be the total bandwidth of the crawler.
The total available bandwidth is \(B = \sum\limits_{c \in C} B_c\).
The weight \(W_c = \frac{B_c}{B}\)\todo{proper def for weight} defines the fraction of the work that gets assigned to \(c\).
The set of target peers \(P = \langle p_0, p_1, \ldots, p_{n-1} \rangle\) is partitioned into \(\abs{C}\) subsets according to \(W_c\), and each subset is assigned to its crawler \(c\).
\begin{table}[H]
\centering
\begin{tabular}{lll}
Crawler \(c\) & \(B_c\) & \(W_c\) \\
0 & 100 & \(\frac{10}{16}\) \\
1 & 10 & \(\frac{1}{16}\) \\
2 & 50 & \(\frac{5}{16}\) \\
\end{tabular}
\end{table}
\todo{remove me}
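The partitioning by bandwidth could look roughly like the following sketch.
It is an illustration under assumptions, not the implementation in \ac{bms}: the function name \texttt{partitionByBandwidth}, the use of contiguous chunks, and integer bandwidth values are choices made for this example.
Each crawler \(c\) receives a share of the peers proportional to \(B_c\) relative to the total bandwidth \(B\); cumulative integer boundaries avoid rounding gaps, so every peer is assigned exactly once.

```go
package main

import "fmt"

// partitionByBandwidth splits peers into one contiguous chunk per crawler.
// The chunk of crawler c holds roughly len(peers) * bandwidth[c] / total
// peers. Boundaries are computed from the cumulative bandwidth so that the
// chunks cover all peers without overlap. Bandwidths are assumed to be
// non-negative. Hypothetical helper for illustration only.
func partitionByBandwidth(peers []string, bandwidth []int) [][]string {
	total := 0
	for _, b := range bandwidth {
		total += b
	}
	chunks := make([][]string, len(bandwidth))
	if total == 0 {
		return chunks
	}
	start, cumulative := 0, 0
	for c, b := range bandwidth {
		cumulative += b
		end := len(peers) * cumulative / total
		chunks[c] = peers[start:end]
		start = end
	}
	return chunks
}

func main() {
	// 16 peers and the bandwidths from the table: 100, 10 and 50.
	peers := make([]string, 16)
	for i := range peers {
		peers[i] = fmt.Sprintf("p%d", i)
	}
	for c, chunk := range partitionByBandwidth(peers, []int{100, 10, 50}) {
		fmt.Printf("crawler %d: %d peers\n", c, len(chunk))
	}
}
```

For the bandwidths 100, 10 and 50 and \(n = 16\) peers, the chunks contain 10, 1 and 5 peers, which matches the shares \(\frac{10}{16}\), \(\frac{1}{16}\) and \(\frac{5}{16}\) in the table.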
\subsubsection{IP-based Partitioning}\label{sec:ip_part}
We assume that the IP addresses in a botnet, interpreted as integers, are evenly distributed over the residue classes modulo \(\abs{C}\)\todo{source? law of large numbers}.
With the crawlers numbered from \(0\) to \(\abs{C} - 1\), the mapping \(m(i) = i \bmod \abs{C}\) determines which crawler the IP address \(i\) is assigned to.
This ensures that neighboring IP addresses (\eg{} in the same \ac{as} and/or geolocation) get visited by different crawlers.
%}}} load balancing
%{{{ frequency reduction
@ -577,4 +608,4 @@ In the end, I would like to thank
%}}} acknowledgments
% vim: set filetype=tex ts=2 sw=2 tw=0 et foldmethod=marker spell :

Binary file not shown.