When gossipd sends a message, have a gossip_index. When it gets back a
peer, the current gossip_index is included, so it can know exactly where
it's up to.
Most of this is mechanical plumbing through openingd, channeld and closingd,
even though openingd and closingd don't (currently) read gossip, so their
gossip_index will be unchanged.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
All peers come from gossipd, and maintain an fd to talk to it. Sometimes
we hand the peer back, but to avoid a race, we always recreated it.
The race was that a daemon closed the gossip_fd, which made gossipd
forget the peer, then master handed the peer back to gossipd. We stop
the race by never closing the gossipfd, but hand it back to gossipd
for closing.
Now gossipd has to accept two fds, but the handling of peers is far
clearer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should also go through and use consistent nomenclature on functions which
are used with a local peer ("lpeer_xxx"?) and those with a remote peer
("rpeer_xxx"?) but this is minimal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will later be used to determine whether or not we should announce
ourselves as a node.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
And we report these through the getpeers JSON RPC again (carefully: in
our reconnect tests we can get duplicates which this patch now filters
out).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In future it will have TOR support, so the name will be awkward.
We collect the to/fromwire functions in common/wireaddr.c, and the
parsing functions in lightningd/netaddress.c.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to derive this from the fd when they connect in, but we already
know it if we're connecting out.
We want this so we can tell (in next few patches) master the peer's address.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
954a3990fa had two errors:
1) We created the handoff message *before* we sent the final packet, meaning
that the cryptostate was out-of-sync.
2) We called io_wait() on the output side of a duplex connection: it has
to be io_wait_out().
This time, stress testing for 2 hours revealed no more problems.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In this case, it was a gossip message half-sent, when we asked the peer
to be released. Fix the problem in general by making send_peer_with_fds()
wait until after the next packet.
test_routing_gossip/lightning-4/log:
b'lightning_openingd(8738): TRACE: First per_commit_point = 02e2ff759ed70c71f154695eade1983664a72546ebc552861f844bff5ea5b933bf'
b'lightning_openingd(8738): TRACE: Failed hdr decrypt with rn=11'
b'lightning_openingd(8738): STATUS_FAIL_PEER_IO: Reading accept_channel: Success'
test_routing_gossip/lightning-5/log:
b'lightning_gossipd(8461): UPDATE WIRE_GOSSIP_PEER_NONGOSSIP'
b'lightning_gossipd(8461): UPDATE WIRE_GOSSIP_PEER_NONGOSSIP'
b'lightningd(8308): Failed to get netaddr for outgoing: Transport endpoint is not connected'
The problem occurs here on release, but could be on any place where we hand
a peer over when using ccan/io. Note the other case (channel.c).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now the flow is much simpler from a lightningd POV:
1. If we want to connect to a peer, just send gossipd `gossipctl_reach_peer`.
2. Every new peer, gossipd hands up to lightningd, with global/local features
and the peer fd and a gossip fd using `gossip_peer_connected`
3. If lightningd doesn't want it, it just hands the peerfd and global/local
features back to gossipd using `gossipctl_handle_peer`
4. If a peer sends a non-gossip msg (eg `open_channel`) the gossipd sends
it up using `gossip_peer_nongossip`.
5. If lightningd wants to fund a channel, it simply calls `release_channel`.
Notes:
* There's no more "unique_id": we use the peer id.
* For the moment, we don't ask gossipd when we're told to list peers, so
connected peers without a channel don't appear in the JSON getpeers API.
* We add a `gossipctl_peer_addrhint` for the moment, so you can connect to
a specific ip/port, but using other sources is a TODO.
* We now (correctly) only give up on reaching a peer after we exchange init
messages, which changes the test_disconnect case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fixes the only case where the master currently has to write directly
to the peer: re-sending an error. We make gossipd do it, by adding
a new gossipctl_fail_peer message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This change is really to allow us to have a --dev-fail-on-subdaemon-fail option
so we can handle failures from subdaemons generically.
It also neatens handling so we can have an explicit callback for "peer
did something wrong" (which matters if we want to close the channel in
that case).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use a negative timestamp as the flag for this, making the test simple.
This allows valgrind to detect that we're accessing them prematurely,
including across the wire on gossip_getchannels_entry.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also, we split the more sophisticated json_add helpers to avoid pulling in
everything into lightning-cli, and unify the routines to print struct
short_channel_id (it's ':', not '/' too).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To avoid everything pulling in HTLCs stuff to the opening daemon, we
split the channel and commit_tx routines into initial_channel and
initial_commit_tx (no HTLC support) and move full HTLC supporting versions
into channeld.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As per lightning-rfc change 956e8809d9d1ee87e31b855923579b96943d5e63
"BOLT 7: add chain_hashes values to channel_update and channel_announcment"
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There was a race condition that would cause an assertion to segfault
if a call to release_peer was interleaved with a fail_peer. The
release_peer was making the peer non-local, which was then causing the
assertion in fail_peer to fail. Now we just have 3 cases: not found,
local, and non-local.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We weren't registering reconnecting peers for broadcasts. Just
starting a timer is enough. Also added an integration test to check
that the gossip sync is being resumed.
At the moment, master simply keeps the gossip fd open when peer
disconnects. That's inefficient, and wrong anyway (it may want a
complete new sync, or may not, but we'll currently send all the
messages including stale ones).
This interface will be required for restart anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had a terrible hack in gossip when a peer didn't exist. Formalize
a pattern when code+200 is a failure (with no fds passed), and use it
here.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I actually hit this very hard to reproduce race: if we haven't process
the channeld message when block #6 comes in, we won't send the gossip
message. We wait for logs, but don't generate new blocks, and timeout
on l1.daemon.wait_for_log('peer_out WIRE_ANNOUNCEMENT_SIGNATURES').
The solution, which also tests that we don't send announcement signatures
immediately, is to generate a single block, wait for CHANNELD_NORMAL,
then (in gossip tests), generate 5 more.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can go to release a gossip peer, and it can fail at the same time.
We work around the problem that the reply must be a gossipctl_release_peer_reply
with two fds, but it's not pretty.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We kill the existing connection if possible; this may mean simply
forgetting the prior peer altogether if it's in an early state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We steal it when we're closing connection, but we normally want to forget
it if connection just dies.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to do this on every connection, whether reconnecting or not,
so it makes sense for the handshake daemon to handle it and return
the feature fields.
Longer term I'm considering having the handshake daemon handle the
listening and connecting, and simply hand the fds back once the peers
are ready.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The single string-based hostname and port has been retired in favor of
having multiple `struct ipaddr`s from the `node_announcement`. This
breaks the hostnames and ports from IRC, but I didn't bother to
backport ipaddr for it since it is only used in the legacy daemon.
Rather a big commit, but I couldn't figure out how to split it
nicely. It introduces a new message from the channel to the master
signaling that the channel has been announced, so that the master can
take care of announcing the node itself. A provisorial announcement is
created and passed to the HSM, which signs it and passes it back to
the master. Finally the master injects it into gossipd which will take
care of broadcasting it.