The new connect code revealed an existing race: we tell gossipd to
release the peer, but at the same time it connects in. gossipd fails
the release because the peer is remote, and json_fundchannel fails.
Instead, we catch this race when we get peer_connected() and we were
trying to open a channel. It means keeping a list of fundchannels which
are awaiting a gossipd response though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Without this, we can get errors on shutdown:
Valgrind error file: valgrind-errors.27444
==27444== Invalid read of size 8
==27444== at 0x1950E2: secp256k1_pubkey_load (secp256k1.c:127)
==27444== by 0x19CF87: secp256k1_ec_pubkey_serialize (secp256k1.c:189)
==27444== by 0x14FED9: towire_pubkey (towire.c:59)
==27444== by 0x15AAFB: towire_gossipctl_peer_disconnected (gen_gossip_wire.c:969)
==27444== by 0x1253EF: opening_channel_errmsg (opening_control.c:526)
==27444== by 0x1386A3: destroy_subd (subd.c:589)
==27444== by 0x18222C: notify (tal.c:240)
==27444== by 0x1826E1: del_tree (tal.c:400)
==27444== by 0x182733: del_tree (tal.c:410)
==27444== by 0x182733: del_tree (tal.c:410)
==27444== by 0x182B1F: tal_free (tal.c:511)
==27444== by 0x11FC53: main (lightningd.c:410)
==27444== Address 0x6c3af98 is 72 bytes inside a block of size 216 free'd
==27444== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27444== by 0x1827BC: del_tree (tal.c:421)
==27444== by 0x182B1F: tal_free (tal.c:511)
==27444== by 0x11F3C7: shutdown_subdaemons (lightningd.c:211)
==27444== by 0x11FC27: main (lightningd.c:406)
==27444== Block was alloc'd at
==27444== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27444== by 0x182296: allocate (tal.c:250)
==27444== by 0x182863: tal_alloc_ (tal.c:448)
==27444== by 0x12F2DF: new_peer (peer_control.c:74)
==27444== by 0x125600: new_uncommitted_channel (opening_control.c:576)
==27444== by 0x125870: peer_accept_channel (opening_control.c:668)
==27444== by 0x13032A: peer_sent_nongossip (peer_control.c:427)
==27444== by 0x116B9E: peer_nongossip (gossip_control.c:60)
==27444== by 0x116F2B: gossip_msg (gossip_control.c:172)
==27444== by 0x138323: sd_msg_read (subd.c:503)
==27444== by 0x137C02: read_fds (subd.c:330)
==27444== by 0x175550: next_plan (io.c:59)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
So we know how much counterparty could theoretically steal from us
(msatoshi_to_us - msatoshi_to_us_min) and how much we could
theoretically steal from counterparty (msatoshi_to_us_max -
msatoshi_to_us).
For more piloting goodness.
This bug is a classic case of being lazy:
1. peer_accept_channel() allocated its return off the input message,
rather than taking an explicit allocation context. This concealed the
lifetime nature of the return.
2. The context for sanitize_error was the error itself, rather than the
more obvious tmpctx (connect_failed does not take).
The global tmpctx removes the "efficiency" excuse for grabbing a random
object to use as context, and is also nice and explicit.
All-the-hard-work-by: @ZmnSCPxj
This simplifies things, and means it's always in the database. Our
previous approach to creating it on the fly had holes when it was
created for onchaind, causing us to use another every time we
restarted.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use NULL on the callback to mean "clear the slot", and call it.
We have do this in two places: the old daemon might die, or the new
daemon might start first.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each state (effectively, each daemon) has two slots: a permanent slot
if something permanent happens (usually, a failure), and a transient
slot which summarizes what's happening right now.
Uncommitted channels only have a transient slot, by their very nature.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We always hand in "NULL" (which means use tal_len on the msg), except
for two places which do that manually for no good reason.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also fold opening_got_hsm_funding_sig() into the caller; it was
previously a callback before we decided to always use the HSM
synchronously.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>