We still close the channel if we *send* an error, but we seem to have hit
another case where LND sends an error which seems transient, so this will
make a best-effort attempt to preserve our channel in that case.
Some test have to be modified, since they don't terminate as they did
previously :(
Changelog-Changed: quirks: We'll now reconnect and retry if we get an error on an established channel. This works around lnd sending error messages that may be non-fatal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The spec says to close the channel if they send us an error, but we
need to be more lenient to preserve channels with other
implementations.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Encapsulating the peer state was a win for lightningd; not surprisingly,
it's even more of a win for the other daemons, especially as we want
to add a little gossip information.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently hand the error back to the master, who then stores it for
future connections and hands it back to another openingd to send and exit.
Just send directly; it's more reliable and simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The One Big API is confusing, and has enough corner cases that we should
ditch it rather than add more.
See: https://www.sandimetz.com/blog/2016/1/20/the-wrong-abstraction
In particular, when openingd is changed to chat to peers even when
it's not actively opening a channel, it wants to handle (most) errors
by continuing, not calling peer_failed().
This exposes the constituent parts.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This was sitting in my gossip-enchancement patch queue, but it simplifies
this set too, so I moved it here).
In 94711969f we added an explicit gossip_index so when gossipd gets
peers back from other daemons, it knows what gossip it has sent (since
gossipd can send gossip after the other daemon is already complete).
This solution is insufficient for the more general case where gossipd
wants to send other messages reliably, so replace it with the other
solution: have gossipd drain the "gossip fd" which the daemon returns.
This turns out to be quite simple, and is probably how I should have
done it originally :(
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Several daemons (onchaind, hsm) want to use the status messages, but
don't communicate with peers. The coming changes made them drag in
more code they didn't need, so instead we have a different
non-overlapping type.
We combine the status_received_errmsg and status_sent_errmsg
into a single status_peer_error, with the presence or not of the
'error_for_them' field indicating direction.
We also rename status_fatal_connection_lost() to
peer_failed_connection_lost() to fit in.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We make it a macro, since everyone uses PEER_FD and GOSSIP_FD constants
(they're actually always the same, but this is slightly safer), and
add a gossip_index arg: this is groundwork for when we want to hand
the peer back to master for gossipd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This change is really to allow us to have a --dev-fail-on-subdaemon-fail option
so we can handle failures from subdaemons generically.
It also neatens handling so we can have an explicit callback for "peer
did something wrong" (which matters if we want to close the channel in
that case).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The spec 4af8e1841151f0c6e8151979d6c89d11839b2f65 uses a 32-byte 'channel-id'
field, not to be confused with the 8-byte short ID used by gossip. Rename
appropriately, and update to the new handshake protocol.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>