We were getting a couple of starvations, so we need a fair filelock. I
also wasn't too happy with the lock as is, so I hand-coded it quickly.
Should be correct, but the overall timeout will tell us how well we
are doing on CI.
Both my machine and apparently the CI tester machines regularly run
into issues with load on the system, causing timeouts (and
unresponsiveness). The throttler throttles the speed with which new
instances of c-lightning get started to avoid overloading. Since the
plugin used for parallelism when testing spawns multiple processes we
need to lock on the fs. Since we have that file open already, we'll
also write a couple of performance metics to it.
For performance reasons we were starting one for each session, which caused
the same postgres DB to be re-used for multiple tests (all test run in the
same worker process), but this could lead to interactions if there is a
timeout or a test happens to touch the `db_provider`. It turns out that we
were only saving about 15 seconds on a 1250 second run anyway, which is a
small cost for increased test isolation.
We were not removing the base test directory if we had other files in there,
which was the case for postgres runs. This now explicitly check for `test_*`
directories which are an indicator of a failed test.
And when it's set, and we're SLOW_MACHINE, simply disable valgrind.
Since Travis (SLOW_MACHINE=1) only does VALGRIND=1 DEVELOPER=1 tests,
and VALGRIND=0 DEVELOPER=0 tests, it was missing tests which needed
DEVELOPER and !VALGRIND.
Instead, this demotes them to non-valgrind tests for SLOW_MACHINEs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
pytest captures the output by monkey patching out `sys.stdout`. This may
conflict with our use of `sys.stdout` when configuring logging, resulting in
the "Write to closed file" issue that is spamming the logs. By making the
logging configuration a fixture hopefully we always use the correct
stdout (after pytest has monkey-patched it).
For some reason we fail to remove the test directory in some cases. My
hypothesis is that it is a daemon that is not completely shut down yet, and
still writes to the directory. This commit intercepts the error, prints any
files in the directory and re-raises the error. This should allow us to debug
the reappears.
Some tests may not spawn a node at all, so make sure that our assumption that
the directory exists in the fixture cleanup is correct by creating the
directory.
We were hardcoding the chainparams->chain_hash which caused the query to
return an empty result. By parametrizing the test we can make it work on
elements.
In the c-lightning tests we have `tests/conftest.py` which annotates test
function with the outcome. If we use pyln-testing outside of the c-lightning
tree we cannot rely on that annotation being there, so we assume it passed.
Quite a few of the things in the LightningNode class are tailored to their use
in the c-lightning tests, so I decided to split those customizations out into
a sub-class, and adding one more fixture that just serves the class. This
allows us to override the LightningNode implementation in our own tests, while
still having sane defaults for other users.
We'll rewrite the tests to use this infrastructure in the next commit.
Changelog-Added: The new `pyln-testing` package now contains the testing infrastructure so it can be reused to test against c-lighting in external projects
Since elements addresses look quite different from the bitcoin mainnet
addresses I just added a sample to the chainparams fixture. In addition I
extracted some of the fixed strings to reference chainparams instead.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We are checking against chain-dependent constants, so let's make sure we are
using the ones for the correct chain.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We will soon have a postgres backend as well, so we need a way to control the
postgres process and to provision DBs to the nodes. The two interfaces are the
dsn that we pass to the node, and the python query interface needed to query
from tests.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
It seems we spend a lot of time waiting for `bitcoind` and `lightningd` to
talk to disks. This adds the `TEST_DIR` environment variable, allowing for
example to use `/dev/shm`, or a faster disk than the disk `/tmp` is on, as the
root directory for all test-related files.
Testing this on one of our builder machines cut the time to run the entire
suite under valgrind roughly in half (180-200 seconds vs 440-490 seconds).
My machine would accumulate a number of zombie lightningd and bitcoind
processes over time while testing. Investigating this showed that if a fixture
raised an exception during fixture teardown then other fixtures that have not
been torn down would linger around. The issue is that pytest treats exceptions
in fixtures as non-recoverable and therefore will not catch them and call the
remaining ones.
This commit adds a new fixture, that is there just to collect eventual errors
from other fixtures and ensure that anything that needs to clean up something,
e.g., processes started by the fixture, are cleaned up before we raise an
eventual exception. This is achieved by making any fixture that needs cleaning
up dependent on the teardown_checks fixture, which also serves as central
point to collect errors and printer of eventual errors.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
log files were being deleted on memleak errors, since
we weren't marking the node has having an error.
this helper function is designed to exactly handle this, so
we use the helper function and modify it to print any additional
error messages that are handed back from killall.
Throwing an exception while killing all nodes meant that
we aren't cleaning up all the nodes properly. Instead,
collect the errors, and return them back to the upper level,
where we report them and terminate as expected.
Memleaks appear in the logs as 'broken', so the broken log
check captures them as well. This moves broken to after memleak
so we get more informative error messages.
We were checking the test request against the searched for string. This fixes
it by actually looking at the outcome instead and should clean up correctly
if tests do not fail.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
And clean up some dev ones which actually happen (mainly by calling
channel_fail_permanent which logs UNUSUAL, rather than
channel_internal_error which logs BROKEN).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically, any "Bad" message from gossipd is something we should look
at. This covers failures loading the gossip_store, too!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were triggering a second exception in the directory cleanup step by
attempting to access a field that'd only be set upon entering the test code
itself. That error did not contribute to the problem resolution, so now we
check whether that field is set before accessing it.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
We were restarting the with the nodes before, which was causing some
port contention. This is more natural since `bitcoind` will take care
of terminating all proxies it returned.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Since we are planning to release a bug fix release, and the plugin
subsystem is not yet complete, it is better to make plugin support
opt-in while we continue testing.
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Unlike other daemons, closingd doesn't listen to the master, but runs
simply to its own beat. So instead of responding to the JSON dev_memleak
command, we always check for memory leaks, and make sure that the
python tests fail if they see MEMLEAK in the logs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently just ignore them. This is one reason the hsm (in some places)
explicitly calls log_broken so we get some idea.
This was the only subdaemon which had a NULL msgcb and msgname, so eliminate
those checks in subd.c.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a simple reverse proxy that `bitcoin-cli` can talk to when invoked by
`lightningd`. It allows us to trace `bitcoin-cli` calls, and intercept calls to
mock the replies, better than the current bash-script based method.