I discovered this accidentally when using the `tests/plugins/dblog.py`
plugin on another testcase: tests/test_connection.py::test_fail_unconfirmed
There the plugin/hook crashes because it can't execute the statement:
```json
{
"jsonrpc": "2.0",
"id": 34,
"error": {
"code": -32600,
"message": "Error while processing db_write: unrecognized token: \"174WHERE\"",
"traceback": "Traceback (most recent call last):\n File \"/home/will/projects/lightning.git/contrib/pyln-client/pyln/client/plugin.py\", line 535, in _dispatch_request\n result = self._exec_func(method.func, request)\n File \"/home/will/projects/lightning.git/contrib/pyln-client/pyln/client/plugin.py\", line 520, in _exec_func\n return func(*ba.args, **ba.kwargs)\n File \"/home/will/projects/lightning.git/tests/plugins/dblog.py\", line 45, in db_write\n plugin.conn.execute(c)\nsqlite3.OperationalError: unrecognized token: \"174WHERE\"\n"
}
}
```
Changelog-Fixed: plugin: Regression with SQL statement expansion that could result in invalid statements being passed to the `db_write` hook.
Since we start a new instance of postgres for each test we may end up swamped
and the startup can take a bit longer. So let's loop until we get a success.
lightning-5 can sometimes see itself sweeping the unilateral output resulting
in this weird line:
```log
HTLC already resolved by SELF when we found preimage
```
In both cases the flakyness arises from the destination not knowing about the
modified fees of the forwarding node, thus including the outdated details in
the routehint, and the sender being unlucky and always trying with the
routehint anyway.
The long-term solutions to this is going to be #4111, this commit just reduces
the flakyness to get back to business.
It was really flaky, especially under `test_mpp_interference_2`, most likely
due to multiple calls to `fund_channel`. This commit looks for the specific
txids in the `listfunds` output and the `getrawmempool` output, avoiding
strange artifacts from multiple calls.
We really are just interested in their on-chain footprint, so actually
starting the nodes is pointless overhead, and caused a lot of flakyness due to
the output ordering sometimes not matching up. We now just use the `bitcoind`
API to fund, sign and send a raw transaction that matches the stashed gossip
messages.
Kinda uses an antipattern of `cd`ing into the directory and calling `make`
there, but this keeps the contrib dirs self-contained so we can split them out
eventually.
The 100MB log buffer has been the biggest memory footprint for the daemon.
Keeping 10MB for emergency log dumps seems sufficient.
This has been mentioned in the last developer meeting.
Changelog-Changed: In-memory log buffer reduced from 100MB to 10MB
For performance reasons we were starting one for each session, which caused
the same postgres DB to be re-used for multiple tests (all test run in the
same worker process), but this could lead to interactions if there is a
timeout or a test happens to touch the `db_provider`. It turns out that we
were only saving about 15 seconds on a 1250 second run anyway, which is a
small cost for increased test isolation.
We were not removing the base test directory if we had other files in there,
which was the case for postgres runs. This now explicitly check for `test_*`
directories which are an indicator of a failed test.
We had a couple of issues with workers dying and attempting to re-initialize
the database while it was already initialized. This will look for a free
directory and just start the DB in there, allowing workers to be better
isolated.
While continuing to test all tests is good, it also means we potentially wait
much longer to get the final verdict, and it can truncate logs if there are
too many failures.
I got a corrupt file, which looked like multiple concurrent attempts
to build it. So instead, build it in one command, but also use
VERBOSE so we print correctly with V=1 (and --quiet).
Also move into plugins/ where it logically belongs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We added a conversion of failcodes that do not have sufficient information in
faac4b28ad. That means that a failcode that'd require additional information
in order to be a correct error to return in an onion is mapped to a generic
one since we can't backfill the information.
This tests that the mapping is performed correctly and replicates the
situation in #4070