There's now a potential race: the connect command on the source peer returns,
but on the destination peer the master hasn't yet read the connect message from
connectd, so the peer isn't in listpeers yet.
(Previously the connection stayed in connectd, so there was no such
window.)
This causes occasional failures in a few places.
Note that we take the opportunity to speed up test_disconnectpeer too
while we're there.
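The usual way to close this kind of window in a test is to poll for the condition rather than assert it immediately after connect returns. Below is a minimal Python sketch, assuming a helper named `wait_for` that takes a caller-supplied predicate. The name and signature are illustrative, not necessarily the test framework's API. A predicate would typically check `listpeers` on the destination node.

```python
import time


def wait_for(success, timeout=30, interval=0.25):
    """Poll `success()` until it returns a truthy value.

    Hypothetical helper: raises ValueError if `timeout` seconds pass
    without the condition becoming true.
    """
    deadline = time.monotonic() + timeout
    while not success():
        if time.monotonic() > deadline:
            raise ValueError("Timeout waiting for condition")
        time.sleep(interval)
```

The predicate is re-evaluated on every iteration, so the test passes as soon as the destination's master has processed connectd's message.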
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Prior to this, lightningd would hand uninteresting peers back to connectd,
which would then return them to lightningd if they sent a non-gossip msg,
or if lightningd asked it to release the peer.
Now connectd hands the peer to lightningd once we've done the init
handshake, and lightningd hands it off to openingd.
This is a deep structural change, so we do the minimum here and clean up
in the following patches.
Lightningd:
1. Remove peer_nongossip handling from connect_control and peer_control.
2. Remove the list of outstanding fundchannel commands; it was only needed to
detect the race between us asking connectd to release the peer and it
reconnecting.
3. We can no longer tell if the remote end has started trying to fund a
channel (until it has succeeded): it's very transitory anyway so not
worth fixing.
4. We now always have a struct peer, and allocate an uncommitted_channel
for it, though it may never be used if neither end funds a channel.
5. We wait on messages from openingd: we can get a funder_reply, a fundee,
or an error in response to our request to fund a channel, so we handle
all of them.
6. A new peer_start_openingd() is called after connectd hands us a peer.
7. json_fund_channel just looks through local peers; there are none
hidden in connectd any more.
8. We sometimes start a new openingd just to send an error message.
Openingd:
1. We always have the information we need to accept them funding a channel (it's
in the init message).
2. We have to listen for three fds: peer, gossip and master, so we open-code
the poll.
3. We have an explicit message to start trying to fund a channel.
4. We can be told to send a message in our init message.
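openingd itself is C. The pattern in item 2, one poll over three fds that reports which are ready, can be illustrated with Python's `select.poll`. The sketch below is a hypothetical stand-in: the `wait_readable` name and the name-to-fd dict are assumptions for illustration, not openingd's code.

```python
import select


def wait_readable(fds, timeout_ms=None):
    """Block until at least one fd in `fds` (a name -> fd dict) is readable.

    Returns the sorted names of fds with data or a hangup pending, or an
    empty list if `timeout_ms` expires first (None blocks indefinitely).
    """
    poller = select.poll()
    by_fd = {}
    for name, fd in fds.items():
        poller.register(fd, select.POLLIN)
        by_fd[fd] = name
    ready = poller.poll(timeout_ms)
    return sorted(by_fd[fd] for fd, events in ready
                  if events & (select.POLLIN | select.POLLHUP))
```

A single poll lets the daemon react to whichever of peer, gossip or master speaks first, instead of blocking on a read from just one of them.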
Testing:
1. We don't handle some things gracefully yet, so two tests are disabled.
2. 'hand_back_peer .*: now local again' from connectd is no longer logged;
instead, openingd logs 'Handed peer, entering loop' once it's managing the peer.
3. peer['state'] used to be set to 'GOSSIPING' (otherwise this field didn't
exist; 'state' is now per-channel). It doesn't exist at all now.
4. Some tests now need to turn on IO logging in openingd, not connectd.
5. There's a gap between connecting on one node and having connectd on
the peer hand over the connection to openingd. Our tests sometimes
checked getpeers() on the peer and saw nothing, so line_graph needed
updating.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, I found leftover lightning_openingd processes after running
tests. When we use the dev_disconnect blackhole '0' option, they
stick around until the dev_disconnect file is truncated (there is only
so much you can do with only a file descriptor), so let's truncate it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The following changes revealed a race in this test: it expected listchannels()
to contain two channels immediately after fund_channel().
We also derive the short_channel_id first, so we can search logs for the
exact messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The next patches get better at reconnecting, so if we use dev-allow-localhost,
nodes can often find each other and reconnect before shutting down; only
use that option where we actually need it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Saw this in Travis: technically we return from the dev_set_max_scids...
command after sending it to gossipd, but we should wait for gossipd to log that it processed it.
Adding an internal reply message for a dev command seems overkill.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. If the IPv6 address was public, that changed the wireaddr, so the IPv4 bind
would not be to a wildcard address and would fail.
2. Binding two fds to the same port on both wildcard IPv4 and IPv6 succeeds; we only
fail when we try to listen, so allow an error at that point.
For some reason this triggered on my DigitalOcean machine.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
openingd calculates our reserve based on the channel amount (even if
we're funding, to keep the calculation in one place), but it wasn't
reporting it back to the master daemon. We initialized it to 0 so that
valgrind wouldn't get upset, as it's part of a structure we send over
the wire.
Have openingd report back, and also initialize it to an impossible value
as extra assurance. And remove a stray (harmless but weird) semicolon.
Reported-by: Gálli Zoltán
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This adds one line with the onion and the channel_update we extract from
it. This in turn allows us to check that the channel_update in the onion is not
type-prefixed, and that we patch it correctly before passing it to gossipd.
The easiest way to do this is to play with the 'wallet_tx' semantics
and have 'amount' have meaning even when 'all_funds' is set.
Note that we change the string 'Cannot afford funding transaction' to
'Cannot afford transaction' as this code is also used for withdrawals.
Inspired-by: molz on #c-lightning
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The logs in various Travis failures show that it takes 20 seconds just for
closingd to read the init message. As a result, the close times out (default
is 30 seconds).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were *supposed* to be waiting for the next commitment tx so we
made sure the one we broadcast was old, *but* the 'revoke_and_ack'
we were waiting for could be matched by the completion of the previous
'revoke_and_ack'.
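In a test harness, one way to avoid this kind of stale match is to record how far through the log earlier searches got, and to search only past that point. Below is a minimal Python sketch. The `LogWatcher` class and its methods are hypothetical, not the actual test framework API.

```python
import re


class LogWatcher:
    """Hypothetical sketch: search a growing log only past a saved position."""

    def __init__(self):
        self.lines = []
        self.start = 0  # index of the first line a search may match

    def append(self, line):
        self.lines.append(line)

    def mark(self):
        """Ignore everything logged so far, e.g. before triggering a new
        commitment exchange."""
        self.start = len(self.lines)

    def find(self, pattern):
        """Return the first line at or after `start` that matches `pattern`,
        moving `start` past it; return None if nothing matches yet."""
        regex = re.compile(pattern)
        for i in range(self.start, len(self.lines)):
            if regex.search(self.lines[i]):
                self.start = i + 1
                return self.lines[i]
        return None
```

Calling `mark()` before triggering the next exchange means that a line left over from the previous 'revoke_and_ack' can no longer satisfy the wait.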
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This needs to be done separately from the rest of the daemon, since otherwise we
cannot make sure that it happens before the DB is freed: some destructors may
still need the DB, and must run inside a DB transaction.
That was the cause of the bad gossip order failures: gossipd thought our
channel was live, but the other end didn't receive the message last time.
Now that gossipd doesn't use the fd to kill us (connectd tells the master to do so), we
can implement read_peer_msg_nogossip().
Fixes: #1706
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch guts gossipd of all peer-related functionality, and hands
all the peer-related requests to connectd instead.
gossipd now gets the final announceable addresses in its init msg, since
it doesn't handle socket binding any more.
lightningd now actually starts connectd, and activates it. The init
messages for both gossipd and connectd still contain redundant fields
which need cleaning up.
There are shims to handle the fact that connectd's wire messages are
still (mostly) gossipd messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd combines the information if it knows it, but that's really the
job of 'listnodes'. More importantly, channeld won't have access to
this information.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I saw an error in test_gossip_weirdalias in Travis, where listnodes(nodeid)
returned *BOTH* nodes; it happened to fail because [0] was the wrong one, but
it would have passed if the order had been different.
This helper asserts that we really do only have one element, and should
catch such bugs faster.
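Below is a minimal Python sketch of such a helper. The name `only_one` is used here for illustration. In a test it would wrap a result list, for example the 'nodes' list returned by listnodes(nodeid), instead of indexing it with [0].

```python
def only_one(arr):
    """Return the single element of `arr`, asserting there is exactly one.

    Indexing with [0] silently picks whichever element happens to come
    first; this fails loudly if a query unexpectedly returns several.
    """
    assert len(arr) == 1, "expected exactly one element, got {}".format(len(arr))
    return arr[0]
```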
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I could not figure out why test_announce_address suddenly stopped working:
I had previously been using DEVELOPER=1 on the cmdline for historical
reasons when testing locally.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We weren't waiting for gossipd to actually process the
dev_set_max_scids_encode_size message, so under Travis it sometimes
split the reply before processing that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In this test we tell l3 to disconnect on sending WIRE_CHANNEL_ANNOUNCEMENT.
This is hit by gossipd (to disconnect from l2) but *also* by channeld (to
disconnect from l4). That's OK, because normally by this point l4 has
sent its real channel_update.
However, the next patch introduces a delay in sending channel_updates,
meaning l4 hasn't sent it yet. If l3 doesn't reconnect to l4, we
never get the channel_update and the test which expects l1 to eventually
see both sides of the channel fails.
So we manually reconnect then. Note that we remove the redundant
'dev-no-reconnect' option from l2: it's added automatically as it
doesn't set 'may_reconnect'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd will ignore the second one, but doing it in the front end
gives an explicit error message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>