We had a couple of instances where a plugin would be killed by `lightningd`
because we were returning a result of an exception twice, and it was hard to
trace down the logic error in the user plugin that caused that. This patch
adds a traceback the first time we return a result/exception, and raise an
exception with a stacktrace of the first termination when a second one comes
in.
This can still terminate the plugin, but the programmer gets a clear
indication where the result was set, and can potentially even recover from it.
Changelog-Added: pyln: Plugin method and hook requests prevent the plugin developer from accidentally setting the result multiple times, and will raise an exception detailing where the result was first set.
It is often pretty usefuk to use the builtin logging module to debug things,
including libraries that a plugin may use. This adds a simple
`PluginLogHandler` that maps the python logging levels to the `lightningd`
logging levels, and formats the record in a way that it doesn't clutter up the
`lightningd` logs (no duplicate timestamps and levels).
This allow us to tweak the log level that is reported to `lightningd` simply
using the following
```python3
import logging
logging.basicConfig(level=logging.DEBUG)
```
Notice that in order for the logs to be displayed on the terminal or the
logfile, both the logging level in the plugin _and_ the `--log-level`
`lightningd` is running need to be adjusted (the python logging level only
controls which messages get forwarded to `lightningd`, it does not have the
power to overrule `lightningd` about what to actually display).
I chose `logging.INFO` as the default, since libraries have a tendency to spew
out everything in `logging.DEBUG` mode
Changelog-Added: pyln: Plugins have been integrated with the `logging` module for easier debugging and error reporting.
Since we start a new instance of postgres for each test we may end up swamped
and the startup can take a bit longer. So let's loop until we get a success.
It was really flaky, especially under `test_mpp_interference_2`, most likely
due to multiple calls to `fund_channel`. This commit looks for the specific
txids in the `listfunds` output and the `getrawmempool` output, avoiding
strange artifacts from multiple calls.
For performance reasons we were starting one for each session, which caused
the same postgres DB to be re-used for multiple tests (all test run in the
same worker process), but this could lead to interactions if there is a
timeout or a test happens to touch the `db_provider`. It turns out that we
were only saving about 15 seconds on a 1250 second run anyway, which is a
small cost for increased test isolation.
We were not removing the base test directory if we had other files in there,
which was the case for postgres runs. This now explicitly check for `test_*`
directories which are an indicator of a failed test.
We had a couple of issues with workers dying and attempting to re-initialize
the database while it was already initialized. This will look for a free
directory and just start the DB in there, allowing workers to be better
isolated.
Several times we had issues with plugins not being able to re-encode an RPC
result because they forgot to use the custom encoder class. This allows us to
patch the JSONEncoder when we start the RPC or the plugin and automagically
support classes that provide a `to_json` method.
The next patch perturbed things enough that we suddenly started
getting (with --track-origins=yes):
Valgrind error file: valgrind-errors.120470
==120470== Use of uninitialised value of size 8
==120470== at 0x14EBD5: htable_val (htable.c:150)
==120470== by 0x14EC3C: htable_firstval_ (htable.c:165)
==120470== by 0x14F583: htable_del_ (htable.c:349)
==120470== by 0x11825D: pointer_referenced (memleak.c:65)
==120470== by 0x118485: scan_for_pointers (memleak.c:121)
==120470== by 0x118500: memleak_remove_region (memleak.c:130)
==120470== by 0x118A30: call_memleak_helpers (memleak.c:257)
==120470== by 0x118A8B: call_memleak_helpers (memleak.c:262)
==120470== by 0x118A8B: call_memleak_helpers (memleak.c:262)
==120470== by 0x118B25: memleak_find_allocations (memleak.c:278)
==120470== by 0x10EB12: closing_dev_memleak (closingd.c:584)
==120470== by 0x10F3E2: main (closingd.c:783)
==120470== Uninitialised value was created by a heap allocation
==120470== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==120470== by 0x1604E8: allocate (tal.c:250)
==120470== by 0x160AA9: tal_alloc_ (tal.c:428)
==120470== by 0x119BE0: new_per_peer_state (per_peer_state.c:24)
==120470== by 0x11A101: fromwire_per_peer_state (per_peer_state.c:95)
==120470== by 0x10FB7C: fromwire_closingd_init (closingd_wiregen.c:103)
==120470== by 0x10ED15: main (closingd.c:626)
==120470==
This is because there is uninitialized padding at the end of struct
peer_state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently ``LightningConnection.remote_pubkey`` is set to ``None`` if the node is not the handshake initiator. This sets it to ``rs`` in act three from the receiver side
Since pyln-bolt* specify the 0.8.4 version which we didn't upload, and the
requirements.txt specify ==0.8.4, we need to backfill that version, even if we
could just bump it directly to 0.9.1.
Hooks do not tolerate failures at all. If we return a JSON-RPC error to a hook
call the only thing the main daemon can really do is to crash. This commit
adds a mapping of error to a safe fallback result, including a warning to the
node operator that this should be addressed in the plugin. The warning is
reported as a `**BROKEN**` message, and should therefore fail any testing done
on the plugin.
Changelog-Fixed: pyln: Fixed HTLCs hanging indefinitely if the hook function raises an exception. A safe fallback result is now returned instead.
Fixes: #2679
Changelog-Added: JSON-RPC: New `multiwithdraw` command to batch multiple onchain sends in a single transaction. Note it shuffles inputs and outputs, does not use BIP69.
Re-write start_ln such that we can create up to 10 nodes locally for
testing. Useful for scenarios where more than two nodes are needed
Changelog-Changed: contrib: startup_regtest.sh `startup_ln` now takes a number of nodes to create as a parameter