lightning

Commit Graph

Author	SHA1	Message	Date
Rusty Russell	24cc371cdf	gossipd: gossip_store errors after rewrite are fatal. We can't continue, since we've moved the indexes. We'll just crash anyway, as seen from bugs #2742 and #2743. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	eb5cc47bdd	gossipd: count deleted records correctly when loading gossip_store. The result of an incorrect count was that we failed on next compaction. Fixes: #2743 Fixes: #2742 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	21c920a8e8	gossipd: note if loaded store seems reasonably up-to-date. If not, we can ask peers for full gossip (for now we just set a flag). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	0d2a4830ed	ccan: update to faster and correct crc32c implementation. I decided to try a faster implementation, only to find our crc32c was not correct! Ouch. I removed the crc32c functions from ccan/crc, and added a new crc32c module which has the Mark Adler x86-64-optimized variants. We bump gossip_store version again, since csums have changed. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	628b65fb40	gossip_store: don't leave dangling channel_announce if we truncate. (Or, if we crashed before we got to write out the channel_update). It's a corner case, but one reported by @darosior and reproduced on my test node (both with bad gossip_store due to previous iterations of this patchset!). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	3e733afb2b	gossipd: remove broadcast map altogether. This clarifies things a fair bit: we simply add and remove from the gossip_store directly. Before this series: (--disable-developer, -Og) store_load_msec:20669-20902(20822.2+/-82) vsz_kb:439704-439712(439706+/-3.2) listnodes_sec:0.890000-1.000000(0.92+/-0.04) listchannels_sec:11.960000-13.380000(12.576+/-0.49) routing_sec:3.070000-5.970000(4.814+/-1.2) peer_write_all_sec:28.490000-30.580000(29.532+/-0.78) After: (--disable-developer, -Og) store_load_msec:19722-20124(19921.6+/-1.4e+02) vsz_kb:288320 listnodes_sec:0.860000-0.980000(0.912+/-0.056) listchannels_sec:10.790000-12.260000(11.65+/-0.5) routing_sec:2.540000-4.950000(4.262+/-0.88) peer_write_all_sec:17.570000-19.500000(18.048+/-0.73) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	dd83453b2f	gossipd/gossip_store: fix compacting, don't use broadcast ordering. We have a problem: if we get halfway through writing the compacted store and run out of disk space, we've already changed half the indexes. This changes it so we do nothing until writing is finished: then we iterate through and update indexes. It also weans us off broadcast ordering, which we can now eliminated. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	5161b79bfc	gossipd/gossip_store: keep count of deleted entries, don't use bs->count. We didn't count some records before, so we could compare the two counters. This is much simpler, and avoids reliance on bs. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	948490ec58	gossipd: add timestamp in gossip store header. (We don't increment the gossip_store version, since there are only a few commits since the last time we did this). This lets the reader simply filter messages; this is especially nice since the channel_announcement timestamp is derived, not in the actual message. This also creates a 'struct gossip_hdr' which makes the code a bit clearer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	bad9734dc7	gossip_store: remove redundant copy_message. The single caller can easily use transfer_store_msg instead. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	5591c0b5d8	gossipd: don't send gossip stream, let per-peer daemons read it themselves. Keeping the uintmap ordering all the broadcastable messages is expensive: 130MB for the million-channels project. But now we delete obsolete entries from the store, we can have the per-peer daemons simply read that sequentially and stream the gossip itself. This is the most primitive version, where all gossip is streamed; successive patches will bring back proper handling of timestamp filtering and initial_routing_sync. We add a gossip_state field to track what's happening with our gossip streaming: it's initialized in gossipd, and currently always set, but once we handle timestamps the per-peer daemon may do it when the first filter is sent. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	4399faf57c	gossipd: make writes to gossip_store atomic. There's a corner case where otherwise a reader could see the header and not the body of a message. It could handle that in various ways, but simplest (and most efficient) is to avoid it happening. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	df00f20e4a	gossipd: erase old entries from the store, don't just append. We use the high bit of the length field: this way we can still check that the checksums are valid on deleted fields. Once this is done, serially reading the gossip_store file will result in a complete, ordered, minimal gossip broadcast. Also, the horrible corner case where we might try to delete things from the store during load time is completely gone: we only load non-deleted things. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	696dc6b597	gossipd: disable gossip_store upgrade. We're about to bump version again, and the code to upgrade it was quite hairy (and buggy!). It's not worthwhile for such a poorly-tested path: I will just add code to limit how much incoming gossip we get to avoid flooding when we upgrade, however. I also use a modern gossip_store version in our test_gossip_store_load test, instead of relying on the upgrade path. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	43f2cbd250	gossipd: track gossip_store locations of local channels. We currently don't care, but the next patch means we have to find them again. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	180a552fba	gossip_store: mark private updates separately from normal ones. They're really gossipd-internal, and we don't want per-peer daemons to confuse them with normal updates. I don't bump the gossip_store version; that's coming with another update anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	763697eb4c	gossipd: fix gossip_store calling delete. Now we handle node_announcements properly, we have a failure case where we try to move them when a channel is deleted while loading the store. We're going to remove this soon, in favor of in-place delete, so workaround this for now to avoid an assert() when we try to write to the store while loading. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	c233fc5063	gossipd: fix spurious unused error with gcc-9 -O3. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	c091a4ee40	gossipd: fix spurious gcc warning. It turns out that we don't look at type when we return 0, but gcc isn't quite smart enough for that. Initializing to -1 is good practice anyway for the failure path. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	d1f43d993a	gossipd: use explicit destructor for struct chan. Each destructor2 costs 40 bytes, and struct chan is only 120 bytes. So this drops our memory usage quite a bit: MCP bench results change: -vsz_kb:580004-580016(580006+/-4.8) +vsz_kb:533148 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	85d8848ede	gossipd: neaten insert_broadcast a little. Suggested-by: @cdecker. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	0e37ac2433	common: move gossip_store read routine where subdaemons can access it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	d8db4e871f	gossipd: provide new fd to per-peer daemons when we compact it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	13717c6ebb	gossipd: hand a gossip_store_fd to all subdaemons. This will let them read from the gossip store directly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	89291b930e	gossipd: pass amount into gossip_store, rather than having it fetch. We need to store the channel capacity for channel_announcement: hand it in directly rather than having the gossip_store code do a lookup. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	7ede5aac31	gossip_store: change format so we store raw messages. Save some overhead, plus gets us ready for giving subdaemons direct store access. This is the first time we upgrade the gossip_store, rather than just discarding. The downside is that we need to add an extra message after each channel_announcement, containing the channel capacity. After: store_load_msec:28337-30288(28975+/-7.4e+02) vsz_kb:582304-582316(582306+/-4.8) store_rewrite_sec:11.240000-11.800000(11.55+/-0.21) listnodes_sec:1.800000-1.880000(1.84+/-0.028) listchannels_sec:22.690000-26.260000(23.878+/-1.3) routing_sec:2.280000-9.570000(6.842+/-2.8) peer_write_all_sec:48.160000-51.480000(49.608+/-1.1) Differences: -vsz_kb:582320 +vsz_kb:582316 -listnodes_sec:2.100000-2.170000(2.118+/-0.026) +listnodes_sec:1.800000-1.880000(1.84+/-0.028) -peer_write_all_sec:51.600000-52.550000(52.188+/-0.34) +peer_write_all_sec:48.160000-51.480000(49.608+/-1.1) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	0ca0db765a	gossipd: fix crash if we truncate store. Entries we've already loaded expect to exist in the store. We could go back and remove them all, but instead just truncate at the known-good point. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	b248bb155a	tools/bench-gossipd.sh: make it work (where possible) with DEVELOPER=0 Some tests require dev support, but the rest can run. We simplify the gossip_store output so it's the same in non-dev mode too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	261921dee2	gossipd: adjust peers' broadcast_offset when compacting store. When we compact the store, we need to adjust the broadast index for peers so they know where they're up to. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	fdb42c3170	gossipd: don't keep channel_updates in memory. This requires some trickiness when we want to re-add unannounced channels to the store after compaction, so we extract a common "copy_message" to transfer from old store to new. MCP results from 5 runs, min-max(mean +/- stddev): store_load_msec:36034-37853(37109.8+/-5.9e+02) vsz_kb:577456 store_rewrite_sec:12.490000-13.250000(12.862+/-0.27) listnodes_sec:1.250000-1.480000(1.364+/-0.09) listchannels_sec:30.820000-31.480000(31.068+/-0.24) routing_sec:26.940000-27.990000(27.616+/-0.39) peer_write_all_sec:65.690000-68.600000(66.698+/-0.99) MCP notable changes from previous patch (>1 stddev): -vsz_kb:1202316 +vsz_kb:577456 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	0370ed2eca	gossipd: use pread in the store. The next patch causes us to access the store while loading (we read channel_updates for local peers), which messes up loading due to the lseek involved. Using pread() is atomic with seek & read, and also a bit more efficient. Make the header contiguous too, while we're here. We don't need pwrite: we always open with O_APPEND which means the seek-to-end is implicit. MCP results from 5 runs, min-max(mean +/- stddev): store_load_msec:36771-38289(37529.6+/-5.3e+02) vsz_kb:1202316 store_rewrite_sec:12.460000-13.280000(12.784+/-0.29) listnodes_sec:1.240000-1.410000(1.34+/-0.058) listchannels_sec:29.850000-31.840000(30.908+/-0.69) routing_sec:27.800000-31.790000(28.822+/-1.5) peer_write_all_sec:66.200000-68.720000(67.44+/-0.84) MCP notable changes from previous patch (>1 stddev): -store_load_msec:39207-45089(41374.6+/-2.2e+03) +store_load_msec:36771-38289(37529.6+/-5.3e+02) -store_rewrite_sec:15.090000-16.790000(15.654+/-0.63) +store_rewrite_sec:12.460000-13.280000(12.784+/-0.29) -peer_write_all_sec:66.830000-76.850000(71.976+/-3.6) +peer_write_all_sec:66.200000-68.720000(67.44+/-0.84) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	2135c7a024	gossipd: allow reading from the store during load. When we no longer keep channel_updates in memory, there's a path where we access them on load: when we promote a local channel to an announced channel. This breaks at the moment, since gs->fd == -1; change it to a writable flag instead. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	3280466e19	gossipd: don't keep channel_announcement messages in memory. MCP results from 5 runs, min-max(mean +/- stddev): store_load_msec:35107-37944(36686+/-1e+03) vsz_kb:1218036 store_rewrite_sec:14.060000-17.970000(15.966+/-1.6) listnodes_sec:1.270000-1.350000(1.314+/-0.034) listchannels_sec:28.510000-30.270000(29.6+/-0.6) routing_sec:30.230000-31.510000(30.83+/-0.44) peer_write_all_sec:67.390000-70.710000(68.568+/-1.2) MCP notable changes from previous patch (>1 stddev): -vsz_kb:1780516 +vsz_kb:1218036 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	cb297b0a1b	gossipd: free tmpctx children in gossip_store_load loop. We're accumulating children, and we'll get more in the successive patches. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	6b9069ee28	broadcast: don't keep payload pointer. If we need the payload, pull it from the gossip store. MCP results from 5 runs, min-max(mean +/- stddev): store_load_msec:30189-52561(39416.4+/-8.8e+03) vsz_kb:1812904 store_rewrite_sec:21.390000-27.070000(23.596+/-2.4) listnodes_sec:1.120000-1.230000(1.176+/-0.044) listchannels_sec:38.900000-50.580000(44.716+/-3.9) routing_sec:45.080000-48.160000(46.814+/-1.1) peer_write_all_sec:58.780000-87.150000(72.278+/-9.7) MCP notable changes from previous patch (>1 stddev): -vsz_kb:2288784 +vsz_kb:1812904 -store_rewrite_sec:38.060000-39.130000(38.426+/-0.39) +store_rewrite_sec:21.390000-27.070000(23.596+/-2.4) -listnodes_sec:0.750000-0.850000(0.794+/-0.042) +listnodes_sec:1.120000-1.230000(1.176+/-0.044) -listchannels_sec:30.740000-31.760000(31.096+/-0.35) +listchannels_sec:38.900000-50.580000(44.716+/-3.9) -routing_sec:29.600000-33.560000(30.472+/-1.5) +routing_sec:45.080000-48.160000(46.814+/-1.1) -peer_write_all_sec:49.220000-52.690000(50.892+/-1.3) +peer_write_all_sec:58.780000-87.150000(72.278+/-9.7) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	da845b660b	gossipd: gossip_store_get() to load a single store entry. This will allow us to load on demand, and not keep all messages in memory. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	1f08cfb3e3	gossipd: use file offset within store as broadcast index. Instead of an arbitrary counter, we can use the file offset for our partial ordering, removing a field. It takes some care when we compact the store, however, as this field changes. MCP results from 5 runs, min-max(mean +/- stddev): store_load_msec:34271-35283(34789.6+/-3.3e+02) vsz_kb:2288784 store_rewrite_sec:38.060000-39.130000(38.426+/-0.39) listnodes_sec:0.750000-0.850000(0.794+/-0.042) listchannels_sec:30.740000-31.760000(31.096+/-0.35) routing_sec:29.600000-33.560000(30.472+/-1.5) peer_write_all_sec:49.220000-52.690000(50.892+/-1.3) MCP notable changes from previous patch (>1 stddev): -store_load_msec:35685-38538(37090.4+/-9.1e+02) +store_load_msec:34271-35283(34789.6+/-3.3e+02) -vsz_kb:2288768 +vsz_kb:2288784 -peer_write_all_sec:51.140000-58.350000(55.69+/-2.4) +peer_write_all_sec:49.220000-52.690000(50.892+/-1.3) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	ec50ec6a71	gossipd: make gossip loading stats accurate. They didn't count the header sizes when reporting bytes, which is misleading. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	eb4564c3cd	gossipd: embed broadcast information into each structure. This is more compact, but also required once we replace the arbitrary "index" with an actual offset into the gossip store. That will let us remove the in-memory variants entirely. MCP results from 5 runs, min-max(mean +/- stddev): store_load_msec:35685-38538(37090.4+/-9.1e+02) vsz_kb:2288768 store_rewrite_sec:35.530000-41.230000(37.904+/-2.3) listnodes_sec:0.720000-0.810000(0.762+/-0.041) listchannels_sec:30.750000-35.990000(32.704+/-2) routing_sec:29.570000-34.010000(31.374+/-1.8) peer_write_all_sec:51.140000-58.350000(55.69+/-2.4) MCP notable changes from previous patch (>1 stddev): -vsz_kb:2621808 +vsz_kb:2288768 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	62918fcb3b	gossip_store: avoid gratuitous copy on load. Doesn't make measurable difference, but an obvious optimization. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	91849dddc4	wire: use struct node_id for node ids. Don't turn them to/from pubkeys implicitly. This means nodeids in the store don't get converted, but bitcoin keys still do. MCP results from 5 runs, min-max(mean +/- stddev): store_load_msec:33934-35251(34531.4+/-5e+02) vsz_kb:2637488 store_rewrite_sec:34.720000-35.130000(34.94+/-0.14) listnodes_sec:1.020000-1.290000(1.146+/-0.086) listchannels_sec:51.110000-58.240000(54.826+/-2.5) routing_sec:30.000000-33.320000(30.726+/-1.3) peer_write_all_sec:50.370000-52.970000(51.646+/-1.1) MCP notable changes from previous patch (>1 stddev): -store_load_msec:46184-47474(46673.4+/-4.5e+02) +store_load_msec:33934-35251(34531.4+/-5e+02) -vsz_kb:2638880 +vsz_kb:2637488 -store_rewrite_sec:46.750000-48.280000(47.512+/-0.51) +store_rewrite_sec:34.720000-35.130000(34.94+/-0.14) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	417e1bab7d	gossipd: use iterator helpers for iterating node channels. Makes the next step easier. MCP results from 5 runs, min-max(mean +/- stddev): store_load_msec:45791-46917(46330.4+/-3.6e+02) vsz_kb:2641316 store_rewrite_sec:47.040000-48.720000(47.684+/-0.57) listnodes_sec:1.140000-1.340000(1.2+/-0.072) listchannels_sec:50.970000-54.250000(52.698+/-1.3) routing_sec:29.950000-31.010000(30.332+/-0.37) peer_write_all_sec:51.570000-52.970000(52.1+/-0.54) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	891ee20a59	tools/bench-gossipd.sh: rough benchmark for gossipd and the million channels project Outputs CSV. We add some stats for load times in developer mode, so we can easily read them out. peer_read_all_sec doesn't work, since we seem to reject about half the updates for having bad signatures. It's also very slow... routing fails, for unknown reasons, so that failure is ignored in routing_sec. Results from 5 runs, min-max(mean +/- stddev): store_load_msec,vsz_kb,store_rewrite_sec,listnodes_sec,listchannels_sec,routing_sec,peer_write_all_sec 39275-44779(40466.8+/-2.2e+03),2899248,41.010000-44.970000(41.972+/-1.5),2.280000-2.350000(2.304+/-0.025),49.770000-63.390000(59.178+/-5),33.310000-34.260000(33.62+/-0.35),42.100000-44.080000(43.082+/-0.67) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Header from folded patch 'fixup!_tools-bench-gossipd.sh__rough_benchmark_for_gossipd_and_the_million_channels_project-2.patch': fixup! tools/bench-gossipd.sh: rough benchmark for gossipd and the million channels project Suggested-by: @niftynei Header from folded patch 'fixup!_tools-bench-gossipd.sh__rough_benchmark_for_gossipd_and_the_million_channels_project-1.patch': fixup! tools/bench-gossipd.sh: rough benchmark for gossipd and the million channels project MCP filename change. Header from folded patch 'tools-bench-gossipd.sh__dont_print_csv_by_default.patch': tools/bench-gossipd.sh: don't print CSV by default. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Header from folded patch 'fixup!_tools-bench-gossipd.sh__rough_benchmark_for_gossipd_and_the_million_channels_project.patch': fixup! tools/bench-gossipd.sh: rough benchmark for gossipd and the million channels project Make shellcheck happy. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	2bd7df93c6	gossipd: preserve unannounced channels across store compaction. Otherwise we'd forget them on restart, again. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	7c8f506a0f	dev-compact-store-gossip: specific RPC so we can test gossip_store rewrite. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	3ac0e814d0	daemons: use amount_msat/amount_sat in all internal wire transfers. As a side-effect of using amount_msat in gossipd/routing.c, we explicitly handle overflows and don't need to pre-prune ridiculous-fee channels. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	5c60d7ffb2	gossipd: split wire types into msgs from lightningd and msgs from per-peer daemons This avoids some very ugly switch() statements which mixed the two, but we also take the chance to rename 'towire_gossip_' to 'towire_gossipd_' for those inter-daemon messages; they're messages to gossipd, not gossip messages. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	8455b12781	Revert "gossipd: handle premature node_announcements in the store." This reverts commit `e2f426903d`. With the new store version, this can't happen. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	e2f426903d	gossipd: handle premature node_announcements in the store. These happen after we compact the store; every log I've seen of a restart on a real node has a message about truncating the store, because node_announcements predate channel_announcements. I extracted one such case from testnet, and reduced it to test here. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	317a830e94	devtools: dump-gossipstore. Not very useful by itself, but when combined with decodemsg it can tell us quite a bit. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago

1 2

84 Commits (909f22f117b6986b57060d1c5cc7f74828effe02)