We seem to be getting spurious reconnect failures on Travis, this
should fix it (15 seconds may be too long at worst case).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The callback on `key_negotiate` was closing the connection under
certain circumstances and would also `free` the key_negotiate, which
would then be freed again once it returns. We steal it off of the
connection during the callback and doing the free manually afterwards
to make sure this can't happen.
Thanks to @jgriffiths for tracking this one down.
Fixes#142
Reported-By: @bjd and @bgorlick
This fails on the old dev-restart tests, so we need to only enable it
for the new tests:
rusty@rusty-XPS-13-9360:~/devel/cvs/lightning (guilt/ping-pong)$ daemon/test/test-basic --restart --verbose
...
{ }
RESTARTING
dev-restart failed!
valgrind: mmap(0x38000000, 2265088) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This just means we put the outgoing_cltv_value where we used to put zeroes.
The old daemon simply ignores this, but the new one should check it as per
BOLT 4.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Appending the channel direction ensures that the directions will be
treated independently. Without this only one direction would be
forwarded to the peers.
The direction bit was computed in several spots and was inconsistent
in some cases. Now we compute it just in routing, and once when
starting up `channeld`, this avoids recomputing it all over the place.
Before exiting, `channeld` constructs and sends a `channel_update`
marking the channel as disabled. This is the pro-active signalling
that the channel may no longer be used.
The JSON-RPC call `getroute` and the functionality to compute the
actual route have been split so that we can reuse it independently of
the JSON-interface. Since this is now a routing-only method I also
moved it into `routing.[ch]` instead of `pay.c`.
The spec 4af8e1841151f0c6e8151979d6c89d11839b2f65 uses a 32-byte 'channel-id'
field, not to be confused with the 8-byte short ID used by gossip. Rename
appropriately, and update to the new handshake protocol.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use a different 'struct peer' in the new daemons, so make sure
the structure isn't assumed in any shared files.
This is a temporary shim.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The peer structure is only for the old daemon; instead move the list
of all outgoing txs for rebroadcasting into struct topology (still
owned by peers, so they are removed when it exits).
One subtlety: on exit, struct topology is free before the peers,
so they end up removing from a freed list. Thus we actually free
every outgoing tx manually on topology free.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There's no reason to think that the seed isn't reproducable from the
output: we don't want to give away our siphash seed and allow hashbombing,
so seed isaac with the SHA of the seed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Usually if we get a packet while closing (onchain event), we're going
through pkt_in which discards it. However, if we're reconnecting, we
simply process the init packet and get upset because they've forgotten
us.
Hard to reproduce, but here's the log (in this case, test-routing --reconnect
and we have just done mutual close):
We reconnect in STATE_MUTUAL_CLOSING, send INIT pkt:
+19.397025114 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Init with ack 1 opens + 9 sigs + 8 revokes + 1 shutdown + 1 closing
While waiting for response, we see the mutual close...
+19.398732602 lightningd(4637):DEBUG: reaped 6370: bitcoin-cli -regtest=1 -datadir=/tmp/bitcoin-lightning2 getblock 2a63b209e17aedc5b1bcc6c2f9e044f97c9c3ca136fc64a719f704d2f632df5f false
+19.401834422 lightningd(4637):DEBUG: Adding block 5fdf32f6d204f719a764fc36a13c9c7cf944e0f9c2c6bcb1c5ed7ae109b2632a
+19.405167334 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Got UTXO spend for 8bb48a:0: 7f5e422f...
+19.412543610 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: anchor_spent: STATE_MUTUAL_CLOSING => STATE_CLOSE_ONCHAIN_MUTUAL
And we also see it buried "forever" (10 blocks in test mode), so we forget peer:
+19.423045014 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Anchor at depth 13
+19.426775063 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: check_for_resolution: STATE_CLOSE_ONCHAIN_MUTUAL => STATE_CLOSED
+19.427613109 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: db_forget_peer(023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898)
+19.428130685 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: db_start_transaction(023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898)
+19.501027511 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: db_commit_transaction(023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898)
Now, we get their reply, but they've forgotten us:
+19.520208608 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Decrypted header len 5
+19.520872035 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Received packet LEN=5, type=PKT__PKT_INIT
+19.520999082 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Our order counter is 19, their ack 0
+19.521078913 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: They acked 0, remote=16 local=15
+19.521447174 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Queued pkt PKT__PKT_OPEN (order=19)
+19.522563794 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Queued pkt PKT__PKT_OPEN_COMMIT_SIG (order=19)
+19.523517319 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:BROKEN: Can't rexmit 2 when local commit 15 and remote 16
+19.524613177 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:UNUSUAL: Sending PKT_ERROR: invalid ack
+19.526638447 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Queued pkt PKT__PKT_ERROR (order=19)
+19.527508022 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: peer_comms_err: STATE_CLOSED => STATE_ERR_BREAKDOWN
We should never transition from STATE_CLOSED to STATE_ERR_BREAKDOWn,
and that's what this check prevents.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Or for blackbox tests --gdb1=<subdaemon> / --gdb2=<subdaemon>.
This makes the subdaemon wait as soon as it's execed, so we can attach
the debugger.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This seems rather easy to fix, the only case we do not want to set
`STATE_SHUTDOWN` us when we have updates which we have not committed
yet, which is handled separately in the other IF-branch.
The `dstate` reference was only an indirection to the `timers`
sub-structure anyway, so removing this indirection allows us to reuse
the timers in the subdaemon arch.
We used to have a permutation map; this reintroduces a variant which
uses the htlc pointers directly.
We need this because we have to send the htlc-tx signatures in output
order as part of the protocol: without two-stage HTLCs we only needed
to wire them up in the unilateral spend case so we simply brute-forced
the ordering.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If type and tag match, then we replace any existing message in the
queue. This allows us to drop old announcements. Special care needs to
be taken so that dependent messages are not reordered, but for gossip
this is the case, since the `channel_announcement` cannot be updated.
Moved the broadcast functionality to broadcast.[ch]. So far this
includes only the enqueuing side of broadcasts, the dequeuing and
actual push to the peer is daemon dependent. This also adds the
broadcast_state to the routing_state and the last broadcast index to
the peer for the legacy daemon.
This was the only time we actually reference non-routing structs in
routing, so moving this out should allow us to get it working in the
new subdaemons.
This allows us to move some legacy functions closer to where they are
actually used, and not worry about them when including routing.h into
the new subdaemons. `struct peer` is the main culprit here.