3.0.12 here:
```
pyln/testing/utils.py:9: error: Cannot find implementation or library stub for module named 'filelock'
pyln/testing/utils.py:9: error: Cannot find implementation or library stub for module named 'filelock'
pyln/testing/utils.py:9: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was causing the following error
```
Exception in thread Thread-553:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/tmp/cirrus-ci-build/contrib/pyln-testing/pyln/testing/utils.py", line 232, in tail
self.err_logs.append(line.rstrip().decode('UTF-8', 'replace')).rstrip()
AttributeError: 'NoneType' object has no attribute 'rstrip'
[gw5] [ 33%] FAILED tests/test_misc.py::test_bitcoin_failure
```
Notice the second call to `.rstrip()` on the return value of `.append()`
Both my machine and apparently the CI tester machines regularly run
into issues with load on the system, causing timeouts (and
unresponsiveness). The throttler throttles the speed with which new
instances of c-lightning get started to avoid overloading. Since the
plugin used for parallelism when testing spawns multiple processes we
need to lock on the fs. Since we have that file open already, we'll
also write a couple of performance metics to it.
We sometimes have very specific sequences of tx broadcasts and blocks
being generated to confirm them. If the confirmation is missed the
test can completely get out of sync. Make debugging this easier by
logging what we confirmed.
Logs may contain non-ASCII and non-UTF8 symbols, which crashes the
tailer. It's better to replace them with a glyph representing
undecodeable symbols instead, and handle the issue further up the
call-chain.
We rely on it when logging, but didn't set it in the class, rather we
did in the sub-classes. This proved to be rather annoying to search
since it'd just fail silently and not log any output.
This will change the current hardwired PostgreSQL binary lookup mechanism to
utilizing the `pg_config --bindir` utility that should work better for most
distributions. The old method caused issues at least on Archlinux.
Changelog-None
Since we start a new instance of postgres for each test we may end up swamped
and the startup can take a bit longer. So let's loop until we get a success.
It was really flaky, especially under `test_mpp_interference_2`, most likely
due to multiple calls to `fund_channel`. This commit looks for the specific
txids in the `listfunds` output and the `getrawmempool` output, avoiding
strange artifacts from multiple calls.
For performance reasons we were starting one for each session, which caused
the same postgres DB to be re-used for multiple tests (all test run in the
same worker process), but this could lead to interactions if there is a
timeout or a test happens to touch the `db_provider`. It turns out that we
were only saving about 15 seconds on a 1250 second run anyway, which is a
small cost for increased test isolation.
We were not removing the base test directory if we had other files in there,
which was the case for postgres runs. This now explicitly check for `test_*`
directories which are an indicator of a failed test.
We had a couple of issues with workers dying and attempting to re-initialize
the database while it was already initialized. This will look for a free
directory and just start the DB in there, allowing workers to be better
isolated.
The next patch perturbed things enough that we suddenly started
getting (with --track-origins=yes):
Valgrind error file: valgrind-errors.120470
==120470== Use of uninitialised value of size 8
==120470== at 0x14EBD5: htable_val (htable.c:150)
==120470== by 0x14EC3C: htable_firstval_ (htable.c:165)
==120470== by 0x14F583: htable_del_ (htable.c:349)
==120470== by 0x11825D: pointer_referenced (memleak.c:65)
==120470== by 0x118485: scan_for_pointers (memleak.c:121)
==120470== by 0x118500: memleak_remove_region (memleak.c:130)
==120470== by 0x118A30: call_memleak_helpers (memleak.c:257)
==120470== by 0x118A8B: call_memleak_helpers (memleak.c:262)
==120470== by 0x118A8B: call_memleak_helpers (memleak.c:262)
==120470== by 0x118B25: memleak_find_allocations (memleak.c:278)
==120470== by 0x10EB12: closing_dev_memleak (closingd.c:584)
==120470== by 0x10F3E2: main (closingd.c:783)
==120470== Uninitialised value was created by a heap allocation
==120470== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==120470== by 0x1604E8: allocate (tal.c:250)
==120470== by 0x160AA9: tal_alloc_ (tal.c:428)
==120470== by 0x119BE0: new_per_peer_state (per_peer_state.c:24)
==120470== by 0x11A101: fromwire_per_peer_state (per_peer_state.c:95)
==120470== by 0x10FB7C: fromwire_closingd_init (closingd_wiregen.c:103)
==120470== by 0x10ED15: main (closingd.c:626)
==120470==
This is because there is uninitialized padding at the end of struct
peer_state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And when it's set, and we're SLOW_MACHINE, simply disable valgrind.
Since Travis (SLOW_MACHINE=1) only does VALGRIND=1 DEVELOPER=1 tests,
and VALGRIND=0 DEVELOPER=0 tests, it was missing tests which needed
DEVELOPER and !VALGRIND.
Instead, this demotes them to non-valgrind tests for SLOW_MACHINEs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can query all the txids at once, rather than one at a time.
Doesn't make any measurable difference to full runtime testing here
though.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We might have funds prior to calling join_nodes(), so testing that
we've all seen the block is better.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is what fund_channel() does, which is more thorough than what
we were doing. But since the order of the logs is undefined, we need
to be a little careful.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reduces VALGRIND=1 node_factory.line_graph(5) time on my laptop from 42s to 36s.
This is simply because forking all the subdaemons just to check the
version is very expensive under valgrind.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Telling `lightningd` to pass a `-datadir` to `bitcoin-cli` so it doesn't go
snooping where it doesn't belong (i.e., the user's home directory and config).
Changelog-None
Suggested-by: Simon Vrouwe <@SimonVrouwe>
Signed-off-by: Christian Decker <@cdecker>