This was responsible for a huge number of loglines simply because we
log every subprocess start and termination. Moving the logging
upstream to where it is really needed gets rid of the entries for
polling calls that succeed. A highly unscientific test shows a reduction in
loglines produced by lightningd from 17000 to 10000 lines.
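
A minimal sketch of the idea, in Python for brevity (the daemon itself
is C): the helper that spawns the subprocess stays silent, and the caller
logs only when the exit status is non-zero. The names `run_quiet` and
`poll_once` are hypothetical and not taken from lightningd.

```python
import subprocess
import sys


def run_quiet(argv):
    """Run a subprocess without logging its start or termination."""
    return subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)


def poll_once(argv, log):
    """Caller-side logging: only a failed call produces a logline."""
    result = run_quiet(argv)
    if result.returncode != 0:
        log.append("%s exited with status %d" % (argv[0], result.returncode))
    return result.returncode == 0


log = []
# A successful poll leaves no trace in the log (hypothetical commands).
ok = poll_once([sys.executable, "-c", "pass"], log)
# A failing poll is the only thing that gets logged.
failed = poll_once([sys.executable, "-c", "import sys; sys.exit(3)"], log)
```

With this shape, a poll loop that succeeds thousands of times adds
nothing to the log, while failures are still reported where they matter.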
I caught the gossip daemon freeing a message while it was queued to be
written. Using tal_dup_arr() is the Right Thing, as it handles taken()
properly automatically.
------------------------------- Valgrind errors --------------------------------
Valgrind error file: /tmp/lightning-rvc7d5oi/test_forward/lightning-3/valgrind-errors
==11057== Invalid read of size 8
==11057== at 0x1328F2: to_tal_hdr (tal.c:174)
==11057== by 0x133894: tal_len (tal.c:659)
==11057== by 0x11BBE7: do_write_wire (wire_io.c:103)
==11057== by 0x127B95: do_plan (io.c:369)
==11057== by 0x127C31: io_ready (io.c:390)
==11057== by 0x129461: io_loop (poll.c:295)
==11057== by 0x10CBB4: main (gossip.c:722)
==11057== Address 0x55a99d8 is 24 bytes inside a block of size 200 free'd
==11057== at 0x4C2ED5B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11057== by 0x133000: del_tree (tal.c:416)
==11057== by 0x132F77: del_tree (tal.c:405)
==11057== by 0x13333E: tal_free (tal.c:504)
==11057== by 0x1123F1: queue_broadcast (broadcast.c:38)
==11057== by 0x111EB0: handle_node_announcement (routing.c:918)
==11057== by 0x10B166: handle_gossip_msg (gossip.c:170)
==11057== by 0x10B76B: owner_msg_in (gossip.c:335)
==11057== by 0x12712E: next_plan (io.c:59)
==11057== by 0x127BD0: do_plan (io.c:376)
==11057== by 0x127C09: io_ready (io.c:386)
==11057== by 0x129461: io_loop (poll.c:295)
==11057== Block was alloc'd at
==11057== at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11057== by 0x132AE7: allocate (tal.c:245)
==11057== by 0x1330A3: tal_alloc_ (tal.c:443)
==11057== by 0x1332A6: tal_alloc_arr_ (tal.c:491)
==11057== by 0x133FEC: tal_dup_ (tal.c:846)
==11057== by 0x112347: new_queued_message (broadcast.c:20)
==11057== by 0x11240B: queue_broadcast (broadcast.c:43)
==11057== by 0x111EB0: handle_node_announcement (routing.c:918)
==11057== by 0x10B166: handle_gossip_msg (gossip.c:170)
==11057== by 0x10B76B: owner_msg_in (gossip.c:335)
==11057== by 0x12712E: next_plan (io.c:59)
==11057== by 0x127BD0: do_plan (io.c:376)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
wire_io: make a copy in io_write_wire (unless taken()).
I hit a corner case where gossipd freed a duplicate while it was being
sent out; this kind of thing doesn't happen if io_write_wire() makes
a copy by default.
We also do a memcheck() here; this gives us a caller in the backtrace
if there are uninitialized bytes, rather than waiting until the write
which happens later.
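
A rough model of the copy-by-default behaviour, written in Python rather
than C. Python has no manual freeing, so overwriting the caller's buffer
stands in for the premature free; `WriteQueue` and its `taken` flag are
hypothetical names for illustration, not the io_write_wire() API.

```python
from collections import deque


class WriteQueue:
    """Toy stand-in for a queue of messages waiting to be written."""

    def __init__(self):
        self.pending = deque()

    def enqueue(self, msg, taken=False):
        # Copy by default so the caller may reuse or discard its buffer;
        # skip the copy only when the caller hands over ownership.
        self.pending.append(msg if taken else bytes(msg))

    def write_next(self):
        return bytes(self.pending.popleft())


queue = WriteQueue()
buf = bytearray(b"node_announcement")
queue.enqueue(buf)             # the queue now holds its own copy
buf[:] = b"\x00" * len(buf)    # caller clobbers its buffer before the write
sent = queue.write_next()      # still the original bytes

owned = bytearray(b"handed over")
queue.enqueue(owned, taken=True)   # ownership transferred, no copy made
same_object = queue.pending[0] is owned
```

The default path costs one extra copy per message, which is the price
of making the caller's later free harmless.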
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
eg:
test_routing_gossip (__main__.LightningDTests) ... ERROR
======================================================================
ERROR: test_routing_gossip (__main__.LightningDTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_lightningd.py", line 150, in tearDown
err_count += self.printValgrindErrors(node)
File "tests/test_lightningd.py", line 137, in printValgrindErrors
errors, fname = self.getValgrindErrors(node)
File "tests/test_lightningd.py", line 132, in getValgrindErrors
with open(error_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/lightning-l106st0a/test_routing_gossip/lightning-1/valgrind-errors'
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now in sync with 8ee57b97738b1e9467a1342ca8373d40f0c4aca5.
Our tool doesn't need to convert them any more, but we actually had a
mis-typed field in the HSM which needed fixing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The single string-based hostname and port has been retired in favor of
having multiple `struct ipaddr`s from the `node_announcement`. This
breaks the hostnames and ports from IRC, but I didn't bother to
backport ipaddr for it since it is only used in the legacy daemon.
Rather a big commit, but I couldn't figure out how to split it
nicely. It introduces a new message from the channel to the master
signaling that the channel has been announced, so that the master can
take care of announcing the node itself. A provisional announcement is
created and passed to the HSM, which signs it and passes it back to
the master. Finally the master injects it into gossipd which will take
care of broadcasting it.
We alternated between using a sha256 and using a privkey, but there are
numerous places where we have a random 32 bytes which are neither.
This fixes many of them (plus, struct privkey is now defined in terms of
struct secret).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Under stress, the tests can mine blocks too soon, and the funding never
locks. This gives more of a chance, at least.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were getting an assert "!secp256k1_fe_is_zero(&ge->x)", because
an all-zero pubkey is invalid. We allow marshal/unmarshal of NULL for
now, and clean up the error handling.
1. Use status_failed if master sends a bad message.
2. Similarly, kill the gossip daemon if it gives a bad reply.
3. Use an array for returned pubkeys: 0 or 2.
4. Use type_to_string(trc, struct short_channel_id, &scid) for tracing.
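
Points 2 and 3 could look roughly like the following Python sketch. The
function and exception names are hypothetical; the only thing taken from
the list above is that a valid reply carries either no pubkeys or
exactly two, and that anything else is treated as a bad reply.

```python
class BadReply(Exception):
    """Raised when the peer daemon sends something malformed."""


def parse_resolve_reply(keys):
    """Return None for an unknown channel, or the pair of node keys."""
    if len(keys) == 0:
        return None
    if len(keys) == 2:
        return (keys[0], keys[1])
    # Any other count is a protocol violation by the replying daemon.
    raise BadReply("expected 0 or 2 pubkeys, got %d" % len(keys))
```

Using an empty array for "not found" avoids having to marshal an
all-zero placeholder key.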
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I couldn't actually figure out how to just dump them on error, so I
dump all the time. When running 3 lightningd + bitcoind, this separates
the logs nicely.
TODO: We should delete the directories on success!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
But it breaks:
test_forward (__main__.LightningDTests) ... lightningd_channel: Computed MAC does not match expected MAC, the message was modified.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I implemented this because a bug causes us to consider the HTLC malformed,
so I can trivially test it for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we now use the short_channel_id to identify the next hop we need
to resolve the channel_id to the pubkey of the next hop. This is done
by calling out to `gossipd` and stuffing the necessary information
into `htlc_end` and recovering it from there once we receive a reply.
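
The round trip can be sketched as follows, in Python for brevity.
`HtlcEnd`, `FakeGossipd` and the channel identifiers are all made up for
illustration, and picking "the end of the channel that is not us" as the
next hop is an assumption of this sketch.

```python
class HtlcEnd:
    """Hypothetical per-HTLC record that carries state across the request."""

    def __init__(self, htlc_id):
        self.htlc_id = htlc_id
        self.amount_msat = None
        self.next_channel = None


class FakeGossipd:
    """Stand-in for the gossip daemon: answers lookups asynchronously."""

    def __init__(self, channels):
        self.channels = channels  # short_channel_id -> (node_a, node_b)
        self.inflight = []

    def resolve_channel(self, scid, hend, callback):
        self.inflight.append((scid, hend, callback))

    def deliver_replies(self):
        while self.inflight:
            scid, hend, callback = self.inflight.pop(0)
            callback(hend, self.channels.get(scid, ()))


def start_forward(gossipd, hend, scid, amount_msat, on_reply):
    # Stash everything the reply handler will need, then ask gossipd.
    hend.amount_msat = amount_msat
    hend.next_channel = scid
    gossipd.resolve_channel(scid, hend, on_reply)


def make_reply_handler(our_id, results):
    def on_reply(hend, keys):
        if len(keys) != 2:
            results.append((hend.htlc_id, None, None))  # unknown channel
            return
        # Assumption: the next hop is whichever end of the channel is not us.
        next_hop = keys[1] if keys[0] == our_id else keys[0]
        results.append((hend.htlc_id, next_hop, hend.amount_msat))
    return on_reply


results = []
gossipd = FakeGossipd({"103x1x0": ("us", "peer")})
handler = make_reply_handler("us", results)
start_forward(gossipd, HtlcEnd(1), "103x1x0", 50000, handler)
start_forward(gossipd, HtlcEnd(2), "999x9x9", 1000, handler)
gossipd.deliver_replies()
```

Because the context travels with the request, the reply handler needs
no separate lookup table to find out which HTLC it is answering for.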
This was overly complex since it was off-by-one and we were storing
some information elsewhere. Now this just loads the route as is into
structs, extracts some information for our outgoing HTLC, and then
shifts the array of structs by one, and finally fills in the last
instruction, which is the terminal.
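
The shift can be illustrated with a small Python sketch. The field
names, the `TERMINAL_CHANNEL` marker, and the choice to repeat the final
amount and delay in the terminal instruction are assumptions made for
the example, not the actual struct layout.

```python
from dataclasses import dataclass

TERMINAL_CHANNEL = "0"  # hypothetical marker for "no further hop"


@dataclass
class RouteHop:
    node_id: str
    channel_id: str
    amount_msat: int
    delay: int


@dataclass
class HopData:
    channel_id: str
    amount_msat: int
    delay: int


def build_hop_data(route):
    """Return (our outgoing HTLC parameters, per-node instructions)."""
    # Load the route as is into structs.
    hops = [HopData(h.channel_id, h.amount_msat, h.delay) for h in route]
    # The first entry describes the HTLC we send out ourselves.
    outgoing = hops[0]
    # Shift by one: the node at position i is told about hop i + 1.
    instructions = hops[1:]
    # Fill in the last instruction, which is the terminal.
    last = route[-1]
    instructions.append(HopData(TERMINAL_CHANNEL, last.amount_msat, last.delay))
    return outgoing, instructions


route = [
    RouteHop("B", "c1", 1002, 20),
    RouteHop("C", "c2", 1001, 14),
    RouteHop("D", "c3", 1000, 8),
]
outgoing, instructions = build_hop_data(route)
```

For this three-hop route, B is told to forward over `c2`, C over `c3`,
and D receives the terminal instruction, so there is one instruction per
node on the route.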
The new onion uses the `channel_id` instead of the `node_id` of the
next hop to identify where to forward the payment. So we return the
exact channel chosen by the routing algo, to avoid having to look it
up again later.
Mainly switching from the old include to the new include and adjusting
the actual size of the onion packet. It also moves `channel.c` to use
`struct hop_data`.
It introduces a dummy next hop in `channel.c` that will be replaced in
the next commit.