
BIP-0158: allow filters to define values for P and M, reparameterize default filter

Branch: master
Olaoluwa Osuntokun committed 6 years ago
commit 1c2ed6dce3 (GPG Key ID: 964EA263DD637C21 — no known key found for this signature in database)
1 changed file: bip-0158.mediawiki (65 lines changed)


@@ -65,11 +65,14 @@ For each block, compact filters are derived containing sets of items associated
 with the block (eg. addresses sent to, outpoints spent, etc.). A set of such
 data objects is compressed into a probabilistic structure called a
 ''Golomb-coded set'' (GCS), which matches all items in the set with probability
-1, and matches other items with probability <code>2^(-P)</code> for some integer
-parameter <code>P</code>.
+1, and matches other items with probability <code>2^(-P)</code> for some
+integer parameter <code>P</code>. We also introduce the parameter <code>M</code>,
+which allows each filter to tune the range that items are hashed onto
+before compressing. Each defined filter also selects distinct parameters for P
+and M.
 
 At a high level, a GCS is constructed from a set of <code>N</code> items by:
-# hashing all items to 64-bit integers in the range <code>[0, N * 2^P)</code>
+# hashing all items to 64-bit integers in the range <code>[0, N * M)</code>
 # sorting the hashed values in ascending order
 # computing the differences between each value and the previous one
 # writing the differences sequentially, compressed with Golomb-Rice coding
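The four construction steps in this hunk can be sketched in a few lines of Python. This is an illustrative sketch only: BLAKE2b stands in for the SipHash PRF the BIP actually specifies (Python's standard library has no SipHash), and the Golomb-Rice step is deferred to the later hunks.

```python
import hashlib

def toy_hash64(item: bytes, key: bytes) -> int:
    # Stand-in for SipHash-2-4: any keyed 64-bit PRF works for illustration.
    return int.from_bytes(
        hashlib.blake2b(item, key=key, digest_size=8).digest(), "big")

def hashed_deltas(items, key, M):
    N = len(items)
    F = N * M
    # Step 1: hash every item into [0, F) via the multiply-and-shift mapping.
    hashed = [(toy_hash64(it, key) * F) >> 64 for it in items]
    # Step 2: sort ascending.
    hashed.sort()
    # Step 3: successive differences (the first delta is taken from 0).
    deltas, prev = [], 0
    for v in hashed:
        deltas.append(v - prev)
        prev = v
    return deltas

key = b"0123456789abcdef"          # hypothetical 128-bit key
deltas = hashed_deltas([b"a", b"b", b"c"], key, 784931)
# The deltas are non-negative and sum to the largest hash, which is < N * M:
print(all(d >= 0 for d in deltas), sum(deltas) < 3 * 784931)
```

Because the hashes are roughly uniform over `[0, N * M)`, the sorted deltas cluster around `M`, which is what makes Golomb-Rice coding (step 4) effective.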
@@ -80,9 +83,13 @@ The following sections describe each step in greater detail.
 
 The first step in the filter construction is hashing the variable-sized raw
 items in the set to the range <code>[0, F)</code>, where <code>F = N *
-2^P</code>. Set membership queries against the hash outputs will have a false
-positive rate of <code>2^(-P)</code>. To avoid integer overflow, the number of
-items <code>N</code> MUST be <2^32 and <code>P</code> MUST be <=32.
+M</code>. Customarily, <code>M</code> is set to <code>2^P</code>. However, if
+one is able to select both parameters independently, then more optimal values
+can be
+selected<ref>https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref>.
+Set membership queries against the hash outputs will have a false positive rate
+of <code>2^(-P)</code>. To avoid integer overflow, the
+number of items <code>N</code> MUST be <2^32 and <code>M</code> MUST be <2^32.
 
 The items are first passed through the pseudorandom function ''SipHash'', which
 takes a 128-bit key <code>k</code> and a variable-sized byte vector and produces
@@ -104,9 +111,9 @@ result.
 hash_to_range(item: []byte, F: uint64, k: [16]byte) -> uint64:
     return (siphash(k, item) * F) >> 64
 
-hashed_set_construct(raw_items: [][]byte, P: uint, k: [16]byte) -> []uint64:
+hashed_set_construct(raw_items: [][]byte, k: [16]byte, M: uint) -> []uint64:
     let N = len(raw_items)
-    let F = N << P
+    let F = N * M
 
     let set_items = []
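The multiply-and-shift mapping in `hash_to_range` can be checked directly in Python, where arbitrary-precision integers make the 128-bit product explicit. The 64-bit inputs below are fixed stand-in values, not real SipHash outputs:

```python
def hash_to_range(hash64: int, F: int) -> int:
    # Maps a uniform 64-bit value onto [0, F) without modulo bias:
    # take the high 64 bits of the 128-bit product hash64 * F.
    return (hash64 * F) >> 64

F = 3 * 784931  # e.g. N = 3 items with M = 784931
assert hash_to_range(0, F) == 0              # smallest hash maps to 0
assert hash_to_range(2**64 - 1, F) == F - 1  # largest hash maps to F - 1
assert 0 <= hash_to_range(0x123456789ABCDEF0, F) < F
print(hash_to_range(2**63, F) == F // 2)     # midpoint of hash space -> F/2
```

This "fast range" reduction replaces a modulo operation with one multiply and one shift, and is why `N` and `M` must each be below `2^32`: their product then fits in 64 bits, so the intermediate product fits in 128 bits.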
@@ -197,8 +204,8 @@ with Golomb-Rice coding. Finally, the bit stream is padded with 0's to the
 nearest byte boundary and serialized to the output byte vector.
 
 <pre>
-construct_gcs(L: [][]byte, P: uint, k: [16]byte) -> []byte:
-    let set_items = hashed_set_construct(L, P, k)
+construct_gcs(L: [][]byte, P: uint, k: [16]byte, M: uint) -> []byte:
+    let set_items = hashed_set_construct(L, k, M)
 
     set_items.sort()
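The Golomb-Rice step used by `construct_gcs` can be sketched as follows. For clarity the sketch emits the code as a string of '0'/'1' characters rather than packed bytes; per the BIP, each delta is split into a quotient (unary: `q` ones then a zero) and a `P`-bit remainder:

```python
def golomb_rice_encode(deltas, P):
    # Toy encoder: returns the bit string; a real implementation
    # would pack these bits into bytes, padding the final byte with 0s.
    bits = []
    for d in deltas:
        q, r = d >> P, d & ((1 << P) - 1)
        bits.append("1" * q + "0")        # quotient, unary-coded
        bits.append(format(r, f"0{P}b"))  # remainder, P fixed bits
    return "".join(bits)

# With P = 2: 5 -> q=1, r=1 -> "10"+"01"; 2 -> q=0, r=2 -> "0"+"10"
print(golomb_rice_encode([5, 2], 2))  # -> "1001010"
```

With `P` chosen so that deltas are usually near `2^P`, the unary quotient is almost always short, giving codes close to the entropy of the geometric delta distribution.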
@@ -224,8 +231,8 @@ against the reconstructed values. Note that querying does not require the entire
 decompressed set be held in memory at once.
 
 <pre>
-gcs_match(key: [16]byte, compressed_set: []byte, target: []byte, P: uint, N: uint) -> bool:
-    let F = N << P
+gcs_match(key: [16]byte, compressed_set: []byte, target: []byte, P: uint, N: uint, M: uint) -> bool:
+    let F = N * M
 
     let target_hash = hash_to_range(target, F, k)
     stream = new_bit_stream(compressed_set)
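The decoding side of the query can be sketched over the same toy bit-string representation (a stand-in for `new_bit_stream`; real implementations stream bits from the compressed bytes and can stop early once the running value exceeds the target):

```python
def golomb_rice_decode_all(bits: str, P: int, N: int):
    # Reconstructs the N sorted hashed values from their delta codes.
    pos, prev, values = 0, 0, []
    for _ in range(N):
        q = 0
        while bits[pos] == "1":  # count the unary quotient
            q += 1
            pos += 1
        pos += 1                 # skip the terminating '0'
        r = int(bits[pos:pos + P], 2)
        pos += P
        prev += (q << P) | r     # delta, accumulated back into a value
        values.append(prev)
    return values

# Decode the two deltas [5, 2] encoded with P = 2 ("1001" + "010"):
values = golomb_rice_decode_all("1001010", 2, 2)
print(values)  # cumulative sums of the deltas: [5, 7]
# A membership query compares the target's hash against each value:
print(7 in values, 6 in values)  # -> True False
```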
@@ -260,6 +267,8 @@ against the decompressed GCS contents. See
 This BIP defines one initial filter type:
 
 * Basic (<code>0x00</code>)
+** <code>M = 784931</code>
+** <code>P = 19</code>
 
 ==== Contents ====
@@ -271,24 +280,27 @@ items for each transaction in a block:
 
 ==== Construction ====
 
-Both the basic and extended filter types are constructed as Golomb-coded sets
-with the following parameters.
+The basic filter type is constructed as a Golomb-coded set with the following
+parameters.
 
-The parameter <code>P</code> MUST be set to <code>20</code>. This value was
-chosen as simulations show that it minimizes the bandwidth utilized, considering
-both the expected number of blocks downloaded due to false positives and the
-size of the filters themselves. The code along with a demo used for the
-parameter tuning can be found
-[https://github.com/Roasbeef/bips/blob/83b83c78e189be898573e0bfe936dd0c9b99ecb9/gcs_light_client/gentestvectors.go here].
+The parameter <code>P</code> MUST be set to <code>19</code>, and the parameter
+<code>M</code> MUST be set to <code>784931</code>. Analysis has shown that if
+one is able to select <code>P</code> and <code>M</code> independently, then
+setting <code>M = 1.497137 * 2^P</code> is close to
+optimal<ref>https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref>.
+Empirical analysis also shows that these parameters minimize the bandwidth
+utilized, considering both the expected number of blocks downloaded due to
+false positives and the size of the filters themselves.
 
 The parameter <code>k</code> MUST be set to the first 16 bytes of the hash of
 the block for which the filter is constructed. This ensures the key is
 deterministic while still varying from block to block.
 
 Since the value <code>N</code> is required to decode a GCS, a serialized GCS
-includes it as a prefix, written as a CompactSize. Thus, the complete
-serialization of a filter is:
+includes it as a prefix, written as a <code>CompactSize</code>. Thus, the
+complete serialization of a filter is:
 
-* <code>N</code>, encoded as a CompactSize
+* <code>N</code>, encoded as a <code>CompactSize</code>
 * The bytes of the compressed filter itself
 
 ==== Signaling ====
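The constants fixed in this hunk can be sanity-checked with one line of arithmetic (a quick check, not part of the BIP itself):

```python
P, M = 19, 784931
# M is the rounded value of 1.497137 * 2^P from the referenced analysis;
# the false positive rate of the filter remains 2^-P.
print(round(1.497137 * 2**P))  # -> 784931
print(2**-P)                   # one-in-524288 false positive rate
```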
@@ -311,7 +323,8 @@ though it requires implementation of the new filters.
 We would like to thank bfd (from the bitcoin-dev mailing list) for bringing the
 basis of this BIP to our attention, Greg Maxwell for pointing us in the
-direction of Golomb-Rice coding and fast range optimization, and Pedro
+direction of Golomb-Rice coding and fast range optimization, Pieter Wuille for
+his analysis of optimal GCS parameters, and Pedro
 Martelletto for writing the initial indexing code for <code>btcd</code>.
 
 We would also like to thank Dave Collins, JJ Jeffrey, and Eric Lombrozo for
@@ -363,8 +376,8 @@ easier to understand.
 === Golomb-Coded Set Multi-Match ===
 
 <pre>
-gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint, N: uint) -> bool:
-    let F = N << P
+gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint, N: uint, M: uint) -> bool:
+    let F = N * M
 
     // Map targets to the same range as the set hashes.
     let target_hashes = []
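Once both the decompressed set values and the target hashes are sorted, the multi-match reduces to a single linear merge. A sketch of that comparison step (the rest of `gcs_match_any` is truncated in this hunk, so the merge shown here is an assumption about the surrounding logic, with plain sorted lists standing in for the bit stream):

```python
def match_any_sorted(set_values, target_hashes):
    # Both inputs sorted ascending: advance whichever side is smaller,
    # reporting a match on the first equal pair.
    i, j = 0, 0
    while i < len(set_values) and j < len(target_hashes):
        if set_values[i] == target_hashes[j]:
            return True
        if set_values[i] < target_hashes[j]:
            i += 1
        else:
            j += 1
    return False

print(match_any_sorted([5, 7, 42], [1, 7]))   # -> True
print(match_any_sorted([5, 7, 42], [6, 43]))  # -> False
```

The merge runs in `O(N + len(targets))` and, like `gcs_match`, never needs the whole decompressed set in memory: the `set_values` side can be generated lazily from the bit stream.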
