Browse Source

daemon/peer: handle narrow reconnect race on close.

Usually if we get a packet while closing (onchain event), we're going
through pkt_in which discards it.  However, if we're reconnecting, we
simply process the init packet and get upset because they've forgotten
us.

Hard to reproduce, but here's the log (in this case, test-routing --reconnect
and we have just done mutual close):

We reconnect in STATE_MUTUAL_CLOSING, send INIT pkt:

   +19.397025114 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Init with ack 1 opens + 9 sigs + 8 revokes + 1 shutdown + 1 closing

While waiting for response, we see the mutual close...
   +19.398732602 lightningd(4637):DEBUG: reaped 6370: bitcoin-cli -regtest=1 -datadir=/tmp/bitcoin-lightning2 getblock 2a63b209e17aedc5b1bcc6c2f9e044f97c9c3ca136fc64a719f704d2f632df5f false
   +19.401834422 lightningd(4637):DEBUG: Adding block 5fdf32f6d204f719a764fc36a13c9c7cf944e0f9c2c6bcb1c5ed7ae109b2632a
   +19.405167334 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Got UTXO spend for 8bb48a:0: 7f5e422f...

   +19.412543610 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: anchor_spent: STATE_MUTUAL_CLOSING => STATE_CLOSE_ONCHAIN_MUTUAL

And we also see it buried "forever" (10 blocks in test mode), so we forget peer:
   +19.423045014 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Anchor at depth 13
   +19.426775063 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: check_for_resolution: STATE_CLOSE_ONCHAIN_MUTUAL => STATE_CLOSED
   +19.427613109 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: db_forget_peer(023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898)
   +19.428130685 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: db_start_transaction(023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898)
   +19.501027511 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: db_commit_transaction(023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898)

Now, we get their reply, but they've forgotten us:
   +19.520208608 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Decrypted header len 5
   +19.520872035 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Received packet LEN=5, type=PKT__PKT_INIT
   +19.520999082 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Our order counter is 19, their ack 0
   +19.521078913 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: They acked 0, remote=16 local=15
   +19.521447174 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Queued pkt PKT__PKT_OPEN (order=19)
   +19.522563794 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Queued pkt PKT__PKT_OPEN_COMMIT_SIG (order=19)
   +19.523517319 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:BROKEN: Can't rexmit 2 when local commit 15 and remote 16
   +19.524613177 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:UNUSUAL: Sending PKT_ERROR: invalid ack
   +19.526638447 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: Queued pkt PKT__PKT_ERROR (order=19)
   +19.527508022 023ec94fb93c669154ba7b08907276e8c8661b2e65d80fc2c089215d5395574898:DEBUG: peer_comms_err: STATE_CLOSED => STATE_ERR_BREAKDOWN

We should never transition from STATE_CLOSED to STATE_ERR_BREAKDOWn,
and that's what this check prevents.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
ppa-0.6.1
Rusty Russell 8 years ago
parent
commit
293bebbe2d
  1. 4
      daemon/peer.c

4
daemon/peer.c

@ -2496,6 +2496,10 @@ static struct io_plan *init_pkt_in(struct io_conn *conn, struct peer *peer)
goto fail;
}
/* We might have had an onchain event while handshaking! */
if (!state_can_io(peer->state))
goto fail;
if (peer->inpkt->init->has_features) {
size_t i;

Loading…
Cancel
Save