updates the bolt version to 6639cef095a2ecc7b8f0c48c6e7f2f906fbfbc58.
this requires us to use the new bolt parser at generate-bolt.py
and updates to all of the type specifications (ie. from u8 -> byte)
We hit the timestamp assert on #2750; it shouldn't happen, but crashing
doesn't leave much information.
Reported-by: @m-schmook
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We seek a certain number of peers at each level of gossip; 3 "flood"
if we're missing gossip, 2 at 24 hours past to catch recent gossip, and
8 with current gossip. The rest are given a filter which causes them
not to gossip to us at all.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The first sign that we're missing gossip is that we get a channel_update
for an unknown channel. The peer might be wrong (or lying), but if it turns
out to be a real channel, we were definitely missing something.
This patch does two things: queries when we get an unknown channel_update,
and then notes that a channel_announcement was from such an update when
it's finally processed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, we'll need to know the short_channel_id if a
channel_update is unknown (implies we're missing a channel), and whether
processing a pending channel_announcement was successful (implies that
the channel was real).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Up until now we only generated these in dev mode for testing. Hoist
into common code, turn counter into a flag (we're only allowed one!)
and note if query is internal or not.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means there's now a semantic difference between the default `fromid`
and setting `fromid` explicitly to our own node_id. In the default case,
it means we don't charge ourselves fees on the route.
This means we can spend the full channel balance.
We still want to consider the pricing of local channels, however:
there's a *reason* to discount one over another, and that is to bias
things. So we add the first-hop fee to the *risk* value instead.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This clarifies things a fair bit: we simply add and remove from the
gossip_store directly.
Before this series: (--disable-developer, -Og)
store_load_msec:20669-20902(20822.2+/-82)
vsz_kb:439704-439712(439706+/-3.2)
listnodes_sec:0.890000-1.000000(0.92+/-0.04)
listchannels_sec:11.960000-13.380000(12.576+/-0.49)
routing_sec:3.070000-5.970000(4.814+/-1.2)
peer_write_all_sec:28.490000-30.580000(29.532+/-0.78)
After: (--disable-developer, -Og)
store_load_msec:19722-20124(19921.6+/-1.4e+02)
vsz_kb:288320
listnodes_sec:0.860000-0.980000(0.912+/-0.056)
listchannels_sec:10.790000-12.260000(11.65+/-0.5)
routing_sec:2.540000-4.950000(4.262+/-0.88)
peer_write_all_sec:17.570000-19.500000(18.048+/-0.73)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we intercept the peer's gossip_timestamp_filter request
in the per-peer subdaemon itself. The rest of the semantics are fairly
simple however.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Keeping the uintmap ordering all the broadcastable messages is expensive:
130MB for the million-channels project. But now we delete obsolete entries
from the store, we can have the per-peer daemons simply read that sequentially
and stream the gossip itself.
This is the most primitive version, where all gossip is streamed;
successive patches will bring back proper handling of timestamp filtering
and initial_routing_sync.
We add a gossip_state field to track what's happening with our gossip
streaming: it's initialized in gossipd, and currently always set, but
once we handle timestamps the per-peer daemon may do it when the first
filter is sent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They already send *us* gossip messages, so they have to be distinct anyway.
Why make us both do extra work?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use the high bit of the length field: this way we can still check
that the checksums are valid on deleted fields.
Once this is done, serially reading the gossip_store file will result
in a complete, ordered, minimal gossip broadcast. Also, the horrible
corner case where we might try to delete things from the store during
load time is completely gone: we only load non-deleted things.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each destructor2 costs 40 bytes, and struct chan is only 120 bytes. So
this drops our memory usage quite a bit:
MCP bench results change:
-vsz_kb:580004-580016(580006+/-4.8)
+vsz_kb:533148
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This has two effects: most importantly, it avoids the problem where
lightningd creates a 800MB JSON blob in response to listchannels,
which causes OOM on the Raspberry Pi (our previous max allocation was
832MB). This is because lightning-cli can start draining the JSON
while we're filling the buffer, so we end up with a max allocation of
68MB.
But despite being less efficient (multiple queries to gossipd), it
actually speeds things up due to the parallelism:
MCP with -O3 -flto before vs after:
-listchannels_sec:8.980000-9.330000(9.206+/-0.14)
+listchannels_sec:7.500000-7.830000(7.656+/-0.11)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now have a test blockchain for MCP which has the correct channels,
so this is not needed.
Also fix a benchmark script bug where 'mv "$DIR"/log
"$DIR"/log.old.$$' would fail if you log didn't exist from a previous run.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of reading the store ourselves, we can just send them an
offset. This saves gossipd a lot of work, putting it where it belongs
(in the daemon responsible for the specific peer).
MCP bench results:
store_load_msec:28509-31001(29206.6+/-9.4e+02)
vsz_kb:580004-580016(580006+/-4.8)
store_rewrite_sec:11.640000-12.730000(11.908+/-0.41)
listnodes_sec:1.790000-1.880000(1.83+/-0.032)
listchannels_sec:21.180000-21.950000(21.476+/-0.27)
routing_sec:2.210000-11.160000(7.126+/-3.1)
peer_write_all_sec:36.270000-41.200000(38.168+/-1.9)
Signficant savings in streaming gossip:
-peer_write_all_sec:48.160000-51.480000(49.608+/-1.1)
+peer_write_all_sec:35.780000-37.980000(36.43+/-0.81)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to store the channel capacity for channel_announcement: hand it
in directly rather than having the gossip_store code do a lookup.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We know exactly how many there will be, so allocate an entire array up-front.
-listnodes_sec:2.540000-2.610000(2.584+/-0.029)
+listnodes_sec:2.100000-2.170000(2.118+/-0.026)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we compact the store, we need to adjust the broadast index for
peers so they know where they're up to.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This requires some trickiness when we want to re-add unannounced channels
to the store after compaction, so we extract a common "copy_message" to
transfer from old store to new.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:36034-37853(37109.8+/-5.9e+02)
vsz_kb:577456
store_rewrite_sec:12.490000-13.250000(12.862+/-0.27)
listnodes_sec:1.250000-1.480000(1.364+/-0.09)
listchannels_sec:30.820000-31.480000(31.068+/-0.24)
routing_sec:26.940000-27.990000(27.616+/-0.39)
peer_write_all_sec:65.690000-68.600000(66.698+/-0.99)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:1202316
+vsz_kb:577456
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The txout_script field is unused; the local_disable only applies to
the handful of local channels, so move that into a hash table.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:39207-45089(41374.6+/-2.2e+03)
vsz_kb:1202316
store_rewrite_sec:15.090000-16.790000(15.654+/-0.63)
listnodes_sec:1.290000-3.790000(1.938+/-0.93)
listchannels_sec:30.190000-32.120000(31.31+/-0.69)
routing_sec:28.220000-31.340000(29.314+/-1.2)
peer_write_all_sec:66.830000-76.850000(71.976+/-3.6)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:35107-37944(36686+/-1e+03)
+store_load_msec:39207-45089(41374.6+/-2.2e+03)
-vsz_kb:1218036
+vsz_kb:1202316
-listchannels_sec:28.510000-30.270000(29.6+/-0.6)
+listchannels_sec:30.190000-32.120000(31.31+/-0.69)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to have a `struct chan` while we're waiting for an update; now we
keep that internally. So a `struct chan` without a channel_announcement
in the store is private, and other is public.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reload them from disk if they do listnodes.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35390-38659(37336.4+/-1.3e+03)
vsz_kb:1780516
store_rewrite_sec:13.800000-16.800000(15.02+/-0.98)
listnodes_sec:1.280000-1.530000(1.382+/-0.096)
listchannels_sec:28.700000-30.440000(29.34+/-0.68)
routing_sec:30.120000-31.080000(30.526+/-0.35)
peer_write_all_sec:65.910000-76.850000(69.462+/-4.1)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:1792996
+vsz_kb:1780516
-listnodes_sec:1.030000-1.120000(1.068+/-0.032)
+listnodes_sec:1.280000-1.530000(1.382+/-0.096)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we need the payload, pull it from the gossip store.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:30189-52561(39416.4+/-8.8e+03)
vsz_kb:1812904
store_rewrite_sec:21.390000-27.070000(23.596+/-2.4)
listnodes_sec:1.120000-1.230000(1.176+/-0.044)
listchannels_sec:38.900000-50.580000(44.716+/-3.9)
routing_sec:45.080000-48.160000(46.814+/-1.1)
peer_write_all_sec:58.780000-87.150000(72.278+/-9.7)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:2288784
+vsz_kb:1812904
-store_rewrite_sec:38.060000-39.130000(38.426+/-0.39)
+store_rewrite_sec:21.390000-27.070000(23.596+/-2.4)
-listnodes_sec:0.750000-0.850000(0.794+/-0.042)
+listnodes_sec:1.120000-1.230000(1.176+/-0.044)
-listchannels_sec:30.740000-31.760000(31.096+/-0.35)
+listchannels_sec:38.900000-50.580000(44.716+/-3.9)
-routing_sec:29.600000-33.560000(30.472+/-1.5)
+routing_sec:45.080000-48.160000(46.814+/-1.1)
-peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)
+peer_write_all_sec:58.780000-87.150000(72.278+/-9.7)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of an arbitrary counter, we can use the file offset for our
partial ordering, removing a field. It takes some care when we compact
the store, however, as this field changes.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:34271-35283(34789.6+/-3.3e+02)
vsz_kb:2288784
store_rewrite_sec:38.060000-39.130000(38.426+/-0.39)
listnodes_sec:0.750000-0.850000(0.794+/-0.042)
listchannels_sec:30.740000-31.760000(31.096+/-0.35)
routing_sec:29.600000-33.560000(30.472+/-1.5)
peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:35685-38538(37090.4+/-9.1e+02)
+store_load_msec:34271-35283(34789.6+/-3.3e+02)
-vsz_kb:2288768
+vsz_kb:2288784
-peer_write_all_sec:51.140000-58.350000(55.69+/-2.4)
+peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is more compact, but also required once we replace the arbitrary
"index" with an actual offset into the gossip store. That will let us
remove the in-memory variants entirely.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35685-38538(37090.4+/-9.1e+02)
vsz_kb:2288768
store_rewrite_sec:35.530000-41.230000(37.904+/-2.3)
listnodes_sec:0.720000-0.810000(0.762+/-0.041)
listchannels_sec:30.750000-35.990000(32.704+/-2)
routing_sec:29.570000-34.010000(31.374+/-1.8)
peer_write_all_sec:51.140000-58.350000(55.69+/-2.4)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:2621808
+vsz_kb:2288768
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used an s64 so we could use -1 and save a check, but that's just
silly as we have adjacent non-u64 fields: wastes 7 bytes per node
and 16 per channel.
Interestingly, this seemed to make us a little slower for some reason.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35569-38776(37169.8+/-1.2e+03)
vsz_kb:2621808
store_rewrite_sec:35.870000-40.290000(38.14+/-1.6)
listnodes_sec:0.740000-0.800000(0.768+/-0.023)
listchannels_sec:29.820000-32.730000(30.972+/-0.99)
routing_sec:30.110000-30.590000(30.346+/-0.18)
peer_write_all_sec:52.420000-59.160000(54.692+/-2.5)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:32825-36365(34615.6+/-1.1e+03)
+store_load_msec:35569-38776(37169.8+/-1.2e+03)
-vsz_kb:2637488
+vsz_kb:2621808
-store_rewrite_sec:35.150000-36.200000(35.59+/-0.4)
+store_rewrite_sec:35.870000-40.290000(38.14+/-1.6)
-listnodes_sec:0.590000-0.710000(0.682+/-0.046)
+listnodes_sec:0.740000-0.800000(0.768+/-0.023)
-peer_write_all_sec:49.020000-52.890000(50.376+/-1.5)
+peer_write_all_sec:52.420000-59.160000(54.692+/-2.5)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can save significant space by combining both sides: so much that we
can reduce the WIRE_LEN_LIMIT to something sane again.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:34467-36764(35517.8+/-7.7e+02)
vsz_kb:2637488
store_rewrite_sec:35.310000-36.580000(35.816+/-0.44)
listnodes_sec:1.140000-2.780000(1.596+/-0.6)
listchannels_sec:55.390000-58.110000(56.998+/-0.99)
routing_sec:30.330000-30.920000(30.642+/-0.19)
peer_write_all_sec:50.640000-53.360000(51.822+/-0.91)
MCP notable changes from previous patch (>1 stddev):
-store_rewrite_sec:34.720000-35.130000(34.94+/-0.14)
+store_rewrite_sec:35.310000-36.580000(35.816+/-0.44)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Don't turn them to/from pubkeys implicitly. This means nodeids in the store
don't get converted, but bitcoin keys still do.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:33934-35251(34531.4+/-5e+02)
vsz_kb:2637488
store_rewrite_sec:34.720000-35.130000(34.94+/-0.14)
listnodes_sec:1.020000-1.290000(1.146+/-0.086)
listchannels_sec:51.110000-58.240000(54.826+/-2.5)
routing_sec:30.000000-33.320000(30.726+/-1.3)
peer_write_all_sec:50.370000-52.970000(51.646+/-1.1)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:46184-47474(46673.4+/-4.5e+02)
+store_load_msec:33934-35251(34531.4+/-5e+02)
-vsz_kb:2638880
+vsz_kb:2637488
-store_rewrite_sec:46.750000-48.280000(47.512+/-0.51)
+store_rewrite_sec:34.720000-35.130000(34.94+/-0.14)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I tried to just do gossipd, but it was uncontainable, so this ended up being
a complete sweep.
We didn't get much space saving in gossipd, even though we should save
24 bytes per node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Makes the next step easier.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:45791-46917(46330.4+/-3.6e+02)
vsz_kb:2641316
store_rewrite_sec:47.040000-48.720000(47.684+/-0.57)
listnodes_sec:1.140000-1.340000(1.2+/-0.072)
listchannels_sec:50.970000-54.250000(52.698+/-1.3)
routing_sec:29.950000-31.010000(30.332+/-0.37)
peer_write_all_sec:51.570000-52.970000(52.1+/-0.54)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets us benchmark without a valid blockchain.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Header from folded patch 'fixup!_gossipd__dev_option_to_allow_unknown_channels.patch':
fixup! gossipd: dev option to allow unknown channels.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For giant nodes, it seems we spend a lot of time memmoving this array.
Normally we'd go for a linked list, but that's actually hard: each
channel has two nodes, so needs two embedded list pointers, and when
iterating there's no good way to figure out which embedded pointer
we'd be using.
So we (ab)use htable; we don't really need an index, but it's good for
cache-friendly iteration (our main operation). We can actually change
to a hybrid later to avoid the extra allocation for small nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically we tell it that every field ending in '_msat' is a struct
amount_msat, and 'satoshis' is an amount_sat. The exceptions are
channel_update's fee_base_msat which is a u32, and
final_incorrect_htlc_amount's incoming_htlc_amt which is also a
'struct amount_msat'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As a side-effect of using amount_msat in gossipd/routing.c, we explicitly
handle overflows and don't need to pre-prune ridiculous-fee channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used a u16, and a 1000 multiplier, which meant we wrapped at
riskfactor 66. We also never undid the multiplier, so we ended up
applying 1000x the riskfactor they specified.
This changes us to pass the riskfactor with a 1M multiplier. The next
patch changes the definition of riskfactor to be more useful.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a seed, which is for (future!) unit testing consistency. This
makes it change every time, so our pay_direct_test is more useful.
I tried restarting the noed around the loop, but it tended to fail
rebinding to the same port for some reason?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>