---
layout: core
permalink: /:collection/:path.html
---

# How Atlas Works
{:.no_toc}

Atlas was designed to overcome the structural weaknesses inherent to all
distributed hash tables. In particular, it uses an unstructured peer network to
maximize resilience against network link failure, and it uses the underlying
blockchain (through BNS) to rate-limit chunk announcements.

This section covers the following topics:

* TOC
{:toc}

## Peer Selection

Atlas peers self-organize into an unstructured peer-to-peer network.
The Atlas peer network is a [random K-regular
graph](https://en.wikipedia.org/wiki/Random_regular_graph). Each node maintains
*K* neighbors chosen at random from the set of Atlas peers.

Atlas nodes select peers by carrying out an unbiased random walk of the peer
graph. When "visiting" a node *N*, a node asks for *N*'s neighbors and then
"steps" to one of them with a probability dependent on *N*'s out-degree and the
neighbor's in-degree.

The sampling algorithm is based on the Metropolis-Hastings (MH) random graph
walk algorithm, but with a couple of key differences. In particular, the
algorithm attempts to calculate an unbiased peer graph sample that accounts
for the fact that most nodes will be short-lived or unreliable, while a few
persistent nodes will remain online for long periods of time. The sampling
algorithm accounts for this with the following tweaks:

* If the neighbors of the visited node *N* are all unresponsive, the random
walk resets to a randomly-chosen known neighbor. There is no back-tracking on
the peer graph in this case.

* The transition probability from *N* to a live neighbor is *NOT* `min(1,
degree(N)/degree(neighbor))` like it is in the vanilla MH algorithm. Instead,
the transition probability discourages backtracking to the previous neighbor
*N_prev*, but in a way that still guarantees that the sampling remains
unbiased.

* A peer does not report its entire neighbor set when queried,
but only reports a random subset of peers that have met a minimum health
threshold.

* A new neighbor is only selected if it belongs to the same [BNS
fork-set]({{site.baseurl}}/core/naming/introduction.html#bns-forks) (i.e. it
reports having a recent valid consensus hash).

The algorithm was adapted from the work of [Lee, Xu, and
Eun](https://arxiv.org/pdf/1204.4140.pdf), published in the proceedings of
ACM SIGMETRICS 2012.
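
To make the walk concrete, here is a minimal sketch of the *vanilla* MH walk
that Atlas's sampling algorithm builds on. It is not the Atlas implementation:
it omits the anti-backtracking transition probabilities, the health-threshold
filtering, and the BNS fork-set check described above, and `get_neighbors` is
a hypothetical callback standing in for the Atlas peer-query RPC.

```python
import random

def mh_walk(start_peer, get_neighbors, steps=1000):
    """Random walk whose stationary distribution is uniform over peers."""
    current = start_peer
    for _ in range(steps):
        neighbors = get_neighbors(current)
        if not neighbors:
            # Atlas instead resets to a randomly-chosen known neighbor here.
            continue
        candidate = random.choice(neighbors)
        candidate_degree = len(get_neighbors(candidate)) or 1
        # Vanilla MH acceptance: step from N to the candidate neighbor with
        # probability min(1, degree(N)/degree(neighbor)).
        if random.random() < min(1.0, float(len(neighbors)) / candidate_degree):
            current = candidate
    return current

# Toy peer graph for demonstration.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}
print(mh_walk("a", lambda peer: graph[peer]))
```
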
## Comparison to DHTs

The reason Atlas uses an unstructured random peer network
instead of a [distributed hash table](https://en.wikipedia.org/wiki/Distributed_hash_table)
(DHT) is that DHTs are susceptible to Sybil attacks. An adaptive adversary can
insert malicious nodes into the DHT in order to stop victims from
resolving chunks or finding honest neighbors.

### Chunk Censorship

In a DHT, an attacker can censor a chunk by inserting nodes into the peers' routing tables
such that the attacker takes control over all of the chunk's hash buckets.
It can do so at any point in time after the chunk was first stored,
because only the peers who maintain the chunk's hash bucket have to store it.
This is a *fundamental* problem with structured overlay networks
that perform request routing based on content hash---they give the attacker
insight as to the path(s) the queries take through the peer graph, and thus
reduce the number of paths the attacker must disrupt in order to censor the
chunk.

Atlas uses an unstructured overlay network combined with a 100% chunk
replication strategy in order to maximize
the amount of work an adversary has to do to censor a chunk.
In Atlas, all peers replicate a chunk, and the paths a chunk takes through the
network are *independent* of the content and *randomized* by the software
(so the paths cannot be predicted in advance). The attacker's only
recourse is to quickly identify the nodes that can serve the chunk and partition them from
the rest of the network in order to carry out a censorship attack.
This requires them to have visibility into the vast majority of network links in
the Atlas network (which is extremely difficult to do, because in practice Atlas
peers maintain knowledge of up to 65536 neighbors and only report 10 random peers
when asked).

### Neighbor Censorship

Another problem with DHTs is that their overlay
network structure is determined by preferential attachment. Not every peer that
contacts a given DHT node has an equal chance of becoming its neighbor.
The node will instead rank a set of peers as being more or less ideal
for being neighbors. In DHTs, the degree of preference a node exhibits to
another node is usually a function of the node's self-given node identifier
(e.g. a node might want to select neighbors based on proximity in the key
space).

The preferential attachment property means that an adaptive adversary can game the node's
neighbor selection algorithm by inserting malicious nodes that do not
forward routing or lookup requests. The attacker does not even have to eclipse
the victim node---the victim node will simply prefer to talk to the attacker's unhelpful nodes
instead of helpful honest nodes. In doing so, the attacker can prevent honest peers from discovering each
other and each other's chunks.

Atlas's neighbor selection strategy does not exhibit preferential attachment
based on any self-reported node properties. A
node is selected as a neighbor only if it is reached through an unbiased random graph
walk, and if it responds to queries correctly. As a result, an attacker is
forced to completely eclipse a set of nodes in order to cut them off from the
rest of the network.

## Chunk Propagation

Atlas nodes maintain an *inventory* of chunks that are known to exist. Each
node independently calculates the chunk inventory from its BNS database.
Because the history of name operations in BNS is linearized, each node can
construct a linearized sub-history of name operations that can set chunk
hashes as their name state. This gives them a linearized sequence of chunks,
and every Atlas peer will independently arrive at the same sequence by reading
the same blockchain.

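To illustrate the linearization, here is a hypothetical sketch of deriving the
chunk-hash sequence from a name-operation history. The `name_ops` list and its
field names are illustrative only; a real node parses these operations out of
the blockchain.

```python
# Hypothetical sketch: derive the linearized chunk-hash sequence from a
# linearized BNS name-operation history. Only operations that carry a
# chunk hash as name state (e.g. NAME_REGISTRATION, NAME_UPDATE,
# NAME_IMPORT) contribute an entry.
name_ops = [
    {"op": "NAME_PREORDER"},
    {"op": "NAME_REGISTRATION", "chunk_hash": "0123abcde"},
    {"op": "NAME_UPDATE",       "chunk_hash": "deadbeef0"},
    {"op": "NAME_TRANSFER"},
    {"op": "NAME_PREORDER"},
    {"op": "NAME_IMPORT",       "chunk_hash": "4567fabcd"},
    {"op": "NAME_TRANSFER"},
]

# Every node reading the same blockchain derives the same sequence.
chunk_sequence = [op["chunk_hash"] for op in name_ops if "chunk_hash" in op]
print(chunk_sequence)  # ['0123abcde', 'deadbeef0', '4567fabcd']
```
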
Atlas peers keep track of which chunks are present and which are absent. They
each construct an *inventory vector* of chunks *V* such that *V[i]* is set to 1
if the node has the chunk whose hash is in the *i*th position in the chunk
sequence (and set to 0 if it is absent).

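A minimal sketch of this construction, continuing the hypothetical example
above (`chunk_sequence` and `local_chunks` are illustrative, not actual Atlas
data structures):

```python
# V[i] = 1 if the node holds the chunk whose hash is i-th in the sequence.
chunk_sequence = ["0123abcde", "deadbeef0", "4567fabcd"]
local_chunks = {"0123abcde", "4567fabcd"}   # hypothetical local chunk store

inventory_vector = [1 if h in local_chunks else 0 for h in chunk_sequence]
print(inventory_vector)  # [1, 0, 1] -- matches Figure 2 below
```
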
Atlas peers exchange their inventory vectors with their neighbors in order to
find out which chunks each of them has. Atlas nodes download chunks from
neighbors in rarest-first order, which prioritizes data replication for the
chunks that are currently most at risk of disappearing due to node failure.
A sketch of this ordering appears after Figure 2 below.

```
 Name operation     | chunk hashes   | chunk data     | Inventory
 history            | as name state  |                | vector

+-------------------+
| NAME_PREORDER     |
+-------------------+----------------+
| NAME_REGISTRATION | chunk hash     | "0123abcde..."     1
+-------------------+----------------+
| NAME_UPDATE       | chunk hash     | (null)             0
+-------------------+----------------+
| NAME_TRANSFER     |
+-------------------+
| NAME_PREORDER     |
+-------------------+----------------+
| NAME_IMPORT       | chunk hash     | "4567fabcd..."     1
+-------------------+----------------+
| NAME_TRANSFER     |
+-------------------+
        .
        .
        .

Figure 2: Relationship between Atlas node chunk inventory and BNS name state.
Some name operations announce name state in the blockchain, which Atlas
interprets as a chunk hash. The Atlas node builds up a vector of which chunks
it has and which ones it does not, and announces it to other Atlas peers so
they can fetch chunks they are missing. In this example, the node's
inventory vector is [1, 0, 1], since the 0th and 2nd chunks are present
but the 1st chunk is missing.
```
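
The following sketch shows one way to compute a rarest-first download order,
assuming the node has already fetched its neighbors' inventory vectors. It
illustrates the ordering only; it is not the actual Atlas download scheduler.

```python
def rarest_first_order(my_inv, neighbor_invs):
    """Return indices of missing chunks, rarest across neighbors first."""
    missing = [i for i, bit in enumerate(my_inv) if bit == 0]

    def availability(i):
        # Number of neighbors that report having chunk i.
        return sum(inv[i] for inv in neighbor_invs)

    # Ignore chunks no neighbor can serve; fetch the rarest ones first.
    fetchable = [i for i in missing if availability(i) > 0]
    return sorted(fetchable, key=availability)

my_inv = [1, 0, 1, 0, 0]
neighbor_invs = [[1, 1, 1, 0, 1], [1, 1, 0, 0, 1], [0, 1, 0, 0, 0]]
print(rarest_first_order(my_inv, neighbor_invs))  # [4, 1]; chunk 3 is unavailable
```
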
## Querying Chunk Inventories

Developers can query a node's inventory vector as follows:

```python
>>> import blockstack
>>> result = blockstack.lib.client.get_zonefile_inventory("https://node.blockstack.org:6263", 0, 524288)
>>> print(len(result['inv']))
11278
>>>
```

The variable `result['inv']` here is a big-endian bit vector, where the *i*th
bit is set to 1 if the *i*th chunk in the chunk sequence is present. The bit at
`i=0` (the earliest chunk) is the leftmost bit, i.e. the most significant bit
of the first byte.

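For example, an illustrative helper (not part of the blockstack client API)
for testing whether the *i*th chunk is present:

```python
def has_chunk(inv, i):
    """True if the i-th chunk is present in the inventory bit vector.

    `inv` is the raw byte string from result['inv']; bit 0 is the
    most-significant bit of the first byte (big-endian bit order).
    """
    byte_index = i // 8
    bit_index = 7 - (i % 8)   # leftmost (most significant) bit first
    return bool(ord(inv[byte_index:byte_index + 1]) & (1 << bit_index))

# b'\xa0' encodes the bits 10100000: chunks 0 and 2 are present, chunk 1
# is missing (the [1, 0, 1] example from Figure 2).
print(has_chunk(b'\xa0', 0), has_chunk(b'\xa0', 1), has_chunk(b'\xa0', 2))
# True False True
```
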
A sample program that inspects a set of Atlas nodes' inventory vectors and determines
which ones are missing which chunks can be found
[here](https://github.com/blockstack/atlas/blob/master/atlas/atlas-test).