When we get a reconnection, kill the current remote peer, and wait for the
master to tell us it's dead. Then we hand it the new peer.
Previously, we would end up with gossipd holding multiple peers, and
the logging was really hard to interpret; I'm not completely convinced
that we did the right thing when one terminated, either.
Note that this now means we can have peers with neither ->local nor ->remote
populated, so we check that more carefully.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Lifetime of 'struct reaching' now only while we're actively doing connect.
2. Always free after a single attempt: if it's an important peer, retry
on a timer.
3. Have a single response message to master, rather than relying on
peer_connected on success and other msgs on failure.
4. If we are actively connecting and we get another command for the same
id, just increment the counter
The result is much simpler in the master daemon, and much nicer for
reconnection: if they say to connect they get an immediate response,
rather than waiting for 10 retries. Even if it's an important peer,
it fires off another reconnect attempt, unless it's actively
connecting now.
This removes exponential backoff: that's restored in next patch. It
also doesn't handle multiple addresses for a single peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Got some intermittant failures, mainly caused by the tests being slow
enough that the peer reconnected. We should always suppress
reconnection if we can, and not stress too much in the !DEVELOPER case
where we can't.
We should turn off dev-no-reconnect *always* unless told we will
reconnect, and since we can't if !DEVELOPER, don't do the connection
check there.
Instead of adding an option to line_graph, we remove it in favor
of connect (since we only use it with n=2 anyway).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Simplification of the offset calculation to use the rescan parameter, and rename
of `wallet_first_blocknum`. We now use either relative rescan from our last
known location, or absolute if a negative rescan was given. It's all handled in
a single location (except the case in which the blockcount is below our
precomputed offset), so this should reduce surprises.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Careful log examination revealed that we were generating a block before one
of the mutual close txs had entered the mempool. This is rare because it
means that both peers have to be too slow.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I had a weird failure which was caused by an unexpected disconnect and
reconnecct. Since we are prersistend and recover from these, they can
slip through our tests; most tests don't involve reconnection, so we
need to catch this explicitly.
For the connect() helper, we always suppress reconnection; tests which
want it all want other options so don't use this helper anyway. (Actually,
after I said that, test_closing_while_disconnected was added when I
rebased, which did require it, so I had to open-code that one).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also report tx and txid, and whether we closed unilaterally or
bilaterally, if we could close the channel.
Also make a manpage.
Fixes: #1207Fixes: #714Fixes: #622
Mixing the test into the existing test allows us to reduce the execution, but
adds a bit of complexity, so I added two small helpers.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We can have more than one; eg we might offer both bech32 and a p2sh
address, and in future we might offer v1 segwit, etc.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This allows us to have some default options that can then be overridden easily
on a per-test basis.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
In the next commit we remove channels whose outpoint was spent from our network
view, so checking for it will not work anymore.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
This fixes the root cause of https://github.com/ElementsProject/lightning/issues/1212
where we deleted the payment because we wanted to retry, then retry failed
so we had an (old) HTLC without a matching payment. We then fed that
HTLC to onchaind, which tells us it's missing, and we try to fail the
payment and deref a NULL pointer.
Fixes: #1212
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We say "in N blocks" but we actually mean "N blocks after this tx" which is
actually N-1 or less. Change wording and tighten tests which misunderstood
this.
Also, the 'assert not l1.daemon.is_in_log('onchaind complete, forgetting peer')'
are unlikely to work until the daemon has actually seen the block, so add
sync_blockheight before all of those.
These changes reveal some sloppy testing, which we fix.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With the following patch applied, we could clearly see onchaind try to
broadcast the timeout tx one block too early:
sendrawtx exit 26, gave error code: -26?error message:?non-final (code 64)?
This is because of an out-by-one error in calculating the relative
depth required, since the out->tx_blockheight is already 1 before the
current block.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>