Usually if we get a packet while closing (onchain event), we're going
through pkt_in which discards it. However, if we're reconnecting, we
simply process the init packet and get upset because they've forgotten
us.
Hard to reproduce, but here's the log (in this case, test-routing --reconnect
and we have just done mutual close):
We reconnect in STATE_MUTUAL_CLOSING, send INIT pkt:
+19.397025114 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Init with ack 1 opens + 9 sigs + 8 revokes + 1 shutdown + 1 closing
While waiting for response, we see the mutual close...
+19.398732602 lightningd(4637):DEBUG: reaped 6370: bitcoin-cli -regtest=1 -datadir=/tmp/bitcoin-lightning2 getblock 2a63b209e17aedc5b1bcc6c2f9e044f97c9c3ca136fc64a719f704d2f632df5f false
+19.401834422 lightningd(4637):DEBUG: Adding block 5fdf32f6d204f719a764fc36a13c9c7cf944e0f9c2c6bcb1c5ed7ae109b2632a
+19.405167334 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Got UTXO spend for 8bb48a:0: 7f5e422f...
+19.412543610 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: anchor_spent: STATE_MUTUAL_CLOSING => STATE_CLOSE_ONCHAIN_MUTUAL
And we also see it buried "forever" (10 blocks in test mode), so we forget peer:
+19.423045014 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Anchor at depth 13
+19.426775063 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: check_for_resolution: STATE_CLOSE_ONCHAIN_MUTUAL => STATE_CLOSED
+19.427613109 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: db_forget_peer(023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898)
+19.428130685 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: db_start_transaction(023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898)
+19.501027511 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: db_commit_transaction(023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898)
Now, we get their reply, but they've forgotten us:
+19.520208608 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Decrypted header len 5
+19.520872035 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Received packet LEN=5, type=PKT__PKT_INIT
+19.520999082 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Our order counter is 19, their ack 0
+19.521078913 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: They acked 0, remote=16 local=15
+19.521447174 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Queued pkt PKT__PKT_OPEN (order=19)
+19.522563794 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Queued pkt PKT__PKT_OPEN_COMMIT_SIG (order=19)
+19.523517319 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:BROKEN: Can't rexmit 2 when local commit 15 and remote 16
+19.524613177 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:UNUSUAL: Sending PKT_ERROR: invalid ack
+19.526638447 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Queued pkt PKT__PKT_ERROR (order=19)
+19.527508022 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: peer_comms_err: STATE_CLOSED => STATE_ERR_BREAKDOWN
We should never transition from STATE_CLOSED to STATE_ERR_BREAKDOWN,
and that's what this check prevents.
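
A minimal sketch of the kind of guard described above. The enum is a simplified subset of the states seen in the log, and both helper names are hypothetical; the actual check may live elsewhere and look different.

```c
#include <stdbool.h>

/* Simplified subset of the peer states named in the log above. */
enum state {
	STATE_MUTUAL_CLOSING,
	STATE_CLOSE_ONCHAIN_MUTUAL,
	STATE_CLOSED,
	STATE_ERR_BREAKDOWN
};

/* Hypothetical helper: true once the close has hit the chain or finished. */
bool state_is_onchain_or_closed(enum state s)
{
	return s == STATE_CLOSE_ONCHAIN_MUTUAL || s == STATE_CLOSED;
}

/* Hypothetical helper: should a reconnect INIT packet be processed?
 * Once we are onchain or closed we discard it, as pkt_in does for
 * ordinary packets, so no comms error can move us out of STATE_CLOSED. */
bool init_pkt_should_be_processed(enum state s)
{
	return !state_is_onchain_or_closed(s);
}
```

With this in place, an INIT reply arriving in STATE_CLOSED is dropped instead of triggering the "invalid ack" error path.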
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
libwally's tools/cleanup.sh doesn't actually remove files if it can't
run make, so do that manually. Also clear some other cruft.
Also, we weren't deleting wire/gen_onion_wire.c in "make clean".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we've tested it:
1. open_channel needs to write response to REQ_FD not STATUS_FD.
2. recv_channel needs to send our next_per_commit, not echo theirs!
3. print the problematic signature if it's wrong, not our own.
Cleanups:
1. Return the message from open_channel/recv_channel for simplicity.
2. Trace signing information.
3. More tracing messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The signing helper was really just for testing, so remove it. But
turn the funding_tx() function into a useful one by making it take the
utxo array.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I made it privkey to prove we owned one key, but without the HSM checking
we have a valid sig for the first commitment transaction, and that
we haven't revealed the revocation secret key, why bother?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Or for blackbox tests --gdb1=<subdaemon> / --gdb2=<subdaemon>.
This makes the subdaemon wait as soon as it's execed, so we can attach
the debugger.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We should check that the peer it says it's returning is under its control,
take back the peer fd, and use the correct conversion routine
for the packet it sends us.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For the moment this is simply handed through to lightningd for
generating the per-peer secrets; eventually the HSM should keep it and
all peer secret key operations would be done via HSM-ops.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Raw crypto_state is what we send across the wire: the peer one is for
use in async crypto io routines (peer_read_message/peer_write_message).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The requirements for accepting the remote config are more complex than
a simple min/max value, as various parameters are related. It turns
out that with a few assumptions, we can boil this down to:
1. The valid feerate range.
2. The minimum effective HTLC throughput we want.
3. The maximum delay we'll accept for us to redeem.
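
The three criteria above could be checked roughly as follows. This is only a sketch: the struct layouts, field names, and `remote_offer_acceptable()` are hypothetical and chosen for illustration, not taken from the code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what the remote side proposed. */
struct remote_offer {
	uint32_t feerate;              /* proposed fee rate */
	uint64_t effective_htlc_msat;  /* HTLC throughput left after their limits */
	uint32_t our_redeem_delay;     /* blocks we must wait to redeem */
};

/* Hypothetical bounds, one per criterion in the list above. */
struct accept_bounds {
	uint32_t min_feerate, max_feerate;
	uint64_t min_effective_htlc_msat;
	uint32_t max_redeem_delay;
};

/* Returns true only if the offer satisfies all three criteria. */
bool remote_offer_acceptable(const struct remote_offer *o,
			     const struct accept_bounds *b)
{
	/* 1. Fee rate must lie within the valid range. */
	if (o->feerate < b->min_feerate || o->feerate > b->max_feerate)
		return false;
	/* 2. Effective HTLC throughput must not fall below our minimum. */
	if (o->effective_htlc_msat < b->min_effective_htlc_msat)
		return false;
	/* 3. Our redeem delay must not exceed the maximum we accept. */
	if (o->our_redeem_delay > b->max_redeem_delay)
		return false;
	return true;
}
```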
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Unless the transaction is confirmed, the UTXOs should be released if
something happens to the peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
wire_sync_write() adds length, but we already have it, so use write_all.
sync_crypto_read() handed an on-stack buffer to cryptomsg_decrypt_header,
which expected a tal() pointer, so use the known length instead.
sync_crypto_read() also failed to read the tag; add that in (no
overflow possible as 16 is an int, len is a u16).
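
The overflow reasoning can be illustrated with a short sketch. `TAG_LEN` and `body_and_tag_len()` are names made up for this example, and the argument assumes a platform where int is wider than 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: size of the tag that follows the body. */
#define TAG_LEN 16

/* Number of bytes to read after the header: body plus tag. */
size_t body_and_tag_len(uint16_t len)
{
	/* len is promoted to int before the addition, so the largest
	 * possible result is 65535 + 16 = 65551, which fits in an int
	 * wider than 16 bits and cannot wrap. */
	return (size_t)(len + TAG_LEN);
}
```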
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This seems rather easy to fix: the only case where we do not want to set
`STATE_SHUTDOWN` is when we have updates which we have not committed
yet, which is handled separately in the other IF-branch.
The peer is woken up every 30 seconds to deliver the backlog of
messages. Additionally, I added the normal message queue to be able to
send non-gossip messages to the peer.
The `dstate` reference was only an indirection to the `timers`
sub-structure anyway, so removing this indirection allows us to reuse
the timers in the subdaemon arch.
Turns out we want to permute transactions for the wallet too, so we
use void ** rather than assume we're shuffling htlc ** (and do inputs,
too!).
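
A sketch of the idea, assuming a simplified `struct output` and a hypothetical `permute_outputs_sketch()`: the parallel map is an opaque `const void **`, so the caller can attach htlc pointers, wallet entries, or anything else to each slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified output; real outputs carry more fields. */
struct output {
	uint64_t amount;
};

/* Swap two outputs and, if a map is supplied, their attached pointers. */
void swap_entries(struct output *outputs, const void **map,
		  size_t a, size_t b)
{
	struct output tmp_out = outputs[a];
	outputs[a] = outputs[b];
	outputs[b] = tmp_out;
	if (map) {
		const void *tmp = map[a];
		map[a] = map[b];
		map[b] = tmp;
	}
}

/* Selection sort by amount, keeping map[i] attached to outputs[i].
 * map may be NULL when the caller has nothing to track. */
void permute_outputs_sketch(struct output *outputs, const void **map,
			    size_t num)
{
	for (size_t i = 0; i + 1 < num; i++) {
		size_t best = i;
		for (size_t j = i + 1; j < num; j++)
			if (outputs[j].amount < outputs[best].amount)
				best = j;
		if (best != i)
			swap_entries(outputs, map, i, best);
	}
}
```

Because the map is untyped, the same routine shape works for inputs: only the comparison and the element struct change.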
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a workaround; eventually libwally will be a nice shared library that
we won't have to bundle, and clashing with internal symbols won't be
a problem.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>