lightning

Commit Graph

Author	SHA1	Message	Date
Rusty Russell	a134ca9659	gossipd: use exponential backoff on reconnect for important peers. We start at 1 second, back off to 5 minutes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	bc4809aa85	gossipd: make sure master only ever sees one active connection. When we get a reconnection, kill the current remote peer, and wait for the master to tell us it's dead. Then we hand it the new peer. Previously, we would end up with gossipd holding multiple peers, and the logging was really hard to interpret; I'm not completely convinced that we did the right thing when one terminated, either. Note that this now means we can have peers with neither ->local nor ->remote populated, so we check that more carefully. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	be1f33b265	gossipd: have master explicitly tell us when peer is disconnected. Currently we intuit it from the fd being closed, but that may happen out of order with when the master thinks it's dead. So now if the gossip fd closes we just ignore it, and we'll get a notification from the master when the peer is disconnected. The notification is slightly ugly in that we have to disable it for a channel when we manually hand the channel back to gossipd. Note: as stands, this is racy with reconnects. See the next patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	1e282ecb7a	subd: record which ones connect to a peer. This comes in useful for the next patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	b68fb24758	read_peer_msg: handle incoming gossip from gossipd. This means that openingd and closingd now forward our gossip. But the real reason we want to do this is that it gives an easy way for gossipd to kill any active daemon, by closing its fd: previously closingd and openingd didn't read the fd, so tended not to notice. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	ab9d9ef3b8	gossipd: drain fd instead of passing around gossip index. (This was sitting in my gossip-enhancement patch queue, but it simplifies this set too, so I moved it here). In `94711969f` we added an explicit gossip_index so when gossipd gets peers back from other daemons, it knows what gossip it has sent (since gossipd can send gossip after the other daemon is already complete). This solution is insufficient for the more general case where gossipd wants to send other messages reliably, so replace it with the other solution: have gossipd drain the "gossip fd" which the daemon returns. This turns out to be quite simple, and is probably how I should have done it originally :( Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	9430a455ff	closing: don't go into temporary failure because we completed negotiation. It only lasts until the next block, but it's weird. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	72c459dd6c	gossipd: keep reaching struct only when we're actively connecting, and don't retry. 1. Lifetime of 'struct reaching' now only while we're actively doing connect. 2. Always free after a single attempt: if it's an important peer, retry on a timer. 3. Have a single response message to master, rather than relying on peer_connected on success and other msgs on failure. 4. If we are actively connecting and we get another command for the same id, just increment the counter. The result is much simpler in the master daemon, and much nicer for reconnection: if they say to connect they get an immediate response, rather than waiting for 10 retries. Even if it's an important peer, it fires off another reconnect attempt, unless it's actively connecting now. This removes exponential backoff: that's restored in next patch. It also doesn't handle multiple addresses for a single peer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	20e3a18af5	gossipd: maintain a separate structure to track important peers. Rather than using a flag in reaching/peer; we make it self-contained as the next patch puts it straight into a timer callback. Also remove unused 'succeeded' field from struct peer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	a1f77cab3c	lightningd: tell gossipd that peers we load from db are important. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	8c2c1fe1c2	openingd: tell gossipd that the peer is important once funding tx in place. And on channel_fail_permanent and closing (the two places we drop to chain), we tell gossipd it's no longer important. Fixes: #1316 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	c9fa9817f6	gossipd: explicitly track which peers are important. These don't have a maximum number of reconnect attempts, and ensure that we try to reconnect when the peer dies. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	b1498f07c5	gossipd: exponential backoff for reconnect (5 minute ceiling). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	5db8454755	test_lightningd.py: make tests more robust by suppressing reconnects. Got some intermittent failures, mainly caused by the tests being slow enough that the peer reconnected. We should always suppress reconnection if we can, and not stress too much in the !DEVELOPER case where we can't. We should turn off dev-no-reconnect always unless told we will reconnect, and since we can't if !DEVELOPER, don't do the connection check there. Instead of adding an option to line_graph, we remove it in favor of connect (since we only use it with n=2 anyway). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
ZmnSCPxj	079778e357	invoice: Check duplicate preimage when explicitly specified. Reported-by: @mcudev	7 years ago
Christian Decker	89ff46f1e6	db: Added DB migrations to get the correct sync height The no-rescan change requires us to rescan one last time from the first_blocknum of our channels (if we have any). The migrations just drop blocks that are higher, then insert a dummy with the first_blocknum, and then clean up after us. If we don't have any channels we don't go back at all. Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	ac9e5581c8	pytest: Start nodes with --rescan=1 This shaves off about 15% of our integration testing suite on my machine. It assumes we never reorg below the first block the node starts with, which is true for all tests, so it's safe. Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	74fa107578	pytest: Add a test for the new --rescan option Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	96352858d6	chaintopology: Simplify rescan offset computation Simplification of the offset calculation to use the rescan parameter, and rename of `wallet_first_blocknum`. We now use either relative rescan from our last known location, or absolute if a negative rescan was given. It's all handled in a single location (except the case in which the blockcount is below our precomputed offset), so this should reduce surprises. Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	0f191f5d4f	opts: Add the --rescan option This is intended to recover from an inconsistent state, involving `onchaind`. Should we for some reason not restore the `onchaind` process correctly we can instruct `lightningd` to go back in time and just replay everything. Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	7406a5b614	wallet: Report current blockheight as the offset to continue from This is a big simplification, we just report the DBs current blockchain height as the point to continue scanning, or the passed in default. No more guessing where to continue from or whether the wallet was used and when it first saw the light of day. Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	4b22760cf9	onchaind: Replay stored channeltxs to restore onchaind state Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	244d4e49e1	onchaind: Store channeltxs so we can restore later Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	8e7ac53f5a	pytest: test onchaind restarts from the DB	7 years ago
Christian Decker	f44ea9f32e	channel: Allow channel lookup by database id Since we reference the channel ID to allow cascades in the database we also need the ability to look up a channel by its database ID. Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	5e505e9c53	onchaind: Add a level of indirection to txwatches and txowatches This will allow us in the next commit to store the transactions that triggered this event in the DB and thus allowing us to replay them later on. Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	876d698f3c	wallet: Add primitives to store onchaind transactions in the DB Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	d2dc93e3cb	wallet: Add a struct to represent an onchaind transaction This will be used to replay transactions that were witnessed in the blockchain during startup, so that onchaind can recreate its state. Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	28feb2eb7d	db: Add table for onchaind transactions These transactions being seen on the blockchain triggered some action in onchaind so we need to replay them when we restore the onchaind. Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	4547afba33	onchaind: Move preimage transfer into onchaind startup We used to queue the preimages to be sent to onchaind only after receiving the onchaind_init_reply. Once we start replaying we might end up in a situation in which we queue the tx that onchaind should react to before providing it with the preimages. This commit just moves the preimages being sent, making it atomic with the init, and without changing the order. Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	b84804009a	gossip: Use the DNS seeds to look up nodes if we don't have an addr Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	a0c1b7af1f	moveonly: Move DNS resolution to wireaddr conversion This is the simple getaddrinfo lookup and conversion into wireaddr. Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
Christian Decker	c635396766	common: Moving some bech32 related utilities to bech32_util These were so far only used for bolt11 construction, but we'll need them for the DNS seed as well, so here we just pull them out into their own unit and prefix them. Signed-off-by: Christian Decker <decker.christian@gmail.com>	7 years ago
ZmnSCPxj	eb42804fcc	invoice: Support providing preimage when making invoice.	7 years ago
Rusty Russell	5ff0d40fed	travis: don't retry failing tests. Retrying gives spurious failures, since we see transactions from previous runs. That makes it near impossible to diagnose the actual problem. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	e2ba1d2290	test_lightningd.py: test_closing_different_fees must wait for txs to hit mempool Careful log examination revealed that we were generating a block before one of the mutual close txs had entered the mempool. This is rare because it means that both peers have to be too slow. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	16d5015d56	lightningd: fix shutdown with unconfirmed channel. We free the peers explicitly, but we don't free the unconfirmed channel: the result is that it gets freed twice. The workaround is to free the unconfirmed channel explicitly, but really the peer should be tal_link'ed as it's basically a reference counted structure. 1.974911451 lightningd(17906):INFO: 03b4bca72572889d4b44cd0f194f73d54972af367e1917579283122ee10fa05f54 chan #1: Owning subdaemon lightning_openingd died (62464) 1.980118094 lightningd(17906):BROKEN: FATAL SIGNAL 6 1.980150447 lightningd(17906):BROKEN: backtrace: common/daemon.c:42 (crashdump) 0x432ba0 1.980161268 lightningd(17906):BROKEN: backtrace: (null):0 ((null)) 0x7faeb18ff4af 1.980167045 lightningd(17906):BROKEN: backtrace: (null):0 ((null)) 0x7faeb18ff428 1.980171271 lightningd(17906):BROKEN: backtrace: (null):0 ((null)) 0x7faeb1901029 1.980175847 lightningd(17906):BROKEN: backtrace: ccan/ccan/tal/tal.c:98 (call_error) 0x47543e 1.980181814 lightningd(17906):BROKEN: backtrace: ccan/ccan/tal/tal.c:170 (check_bounds) 0x4755fb 1.980188065 lightningd(17906):BROKEN: backtrace: ccan/ccan/tal/tal.c:180 (to_tal_hdr) 0x475649 1.980193756 lightningd(17906):BROKEN: backtrace: ccan/ccan/tal/tal.c:504 (tal_free) 0x47600d 1.980199402 lightningd(17906):BROKEN: backtrace: lightningd/peer_control.c:118 (delete_peer) 0x423990 1.980205498 lightningd(17906):BROKEN: backtrace: lightningd/opening_control.c:574 (destroy_uncommitted_channel) 0x419df3 1.980212380 lightningd(17906):BROKEN: backtrace: ccan/ccan/tal/tal.c:240 (notify) 0x4757b0 1.980218052 lightningd(17906):BROKEN: backtrace: ccan/ccan/tal/tal.c:400 (del_tree) 0x475c61 1.980223398 lightningd(17906):BROKEN: backtrace: ccan/ccan/tal/tal.c:511 (tal_free) 0x476093 1.980229174 lightningd(17906):BROKEN: backtrace: lightningd/opening_control.c:549 (opening_channel_errmsg) 0x419d1a 1.980236227 lightningd(17906):BROKEN: backtrace: lightningd/subd.c:590 (destroy_subd) 0x42cf43 1.980242348 lightningd(17906):BROKEN: backtrace: ccan/ccan/tal/tal.c:240 (notify) 0x4757b0 1.980247771 lightningd(17906):BROKEN: backtrace: ccan/ccan/tal/tal.c:400 (del_tree) 0x475c61 1.980252814 lightningd(17906):BROKEN: backtrace: ccan/ccan/tal/tal.c:410 (del_tree) 0x475cb1 1.980258356 lightningd(17906):BROKEN: backtrace: ccan/ccan/tal/tal.c:410 (del_tree) 0x475cb1 1.980263311 lightningd(17906):BROKEN: backtrace: ccan/ccan/tal/tal.c:511 (tal_free) 0x476093 1.980269189 lightningd(17906):BROKEN: backtrace: lightningd/lightningd.c:412 (main) 0x4144ed Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	d2b4e09e27	lightningd: re-allow closing negotiation when CLOSINGD_COMPLETE `d822ba1ee` accidentally removed this case, which is important: if the other side didn't get our final matching closing_signed, it will reconnect and try again. We consider the channel no longer "active" and thus ignore it, and get upset when it sends the `channel_reestablish` message. We could just consider CLOSINGD_COMPLETE to be active, but then we'd have to wait for the closing transaction to be mined before we'd allow another connection. We can't special case it when the peer reconnects, because there could be (in theory) multiple channels for that peer in CLOSINGD_COMPLETE, and we don't know which one to reestablish. So, we need to catch this when they send the reestablish, and hand that msg to closingd to do negotiation again. We already have code to note that we're in CLOSINGD_COMPLETE and thus ignore any result it gives us. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	5551c161ca	gossipd: finish startup before master prints that it's ready. We're about to remove automatic retrying of connect, and that uncovered that we actually print out our "Server started" message before we create the listening socket. Move the init higher (outside the db transaction) and make it a request/response, then loop until it's done. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	8e976150ad	json_fundchannel: fix release vs connect/nongossip race. The new connect code revealed an existing race: we tell gossipd to release the peer, but at the same time it connects in. gossipd fails the release because the peer is remote, and json_fundchannel fails. Instead, we catch this race when we get peer_connected() and we were trying to open a channel. It means keeping a list of fundchannels which are awaiting a gossipd response though. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	bee795ed68	channeld: don't do explicit state update. We missed it in some corner cases where we crashed/were killed between being told of the lockin and sending the channel_normal_operation message. When we were restarted, we were told both sides were locked in already, so we never updated the state. Pull the entire "tell channeld" logic into channel_control.c, and make it clear that we need to keep watching if we can't tell channeld. I think we did get this correct in practice, since funding_announce_cb has the same test, but it's better to be clear. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	22fe2c921f	lightningd: commit short-channel-id to db when we create it. We'd usually commit to the db soon, but there's a window where it could be missed. Also moves loc into the block it's used and make it tmpctx to avoid an explicit free. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	7604f27fb8	lightningd: make sure openingd and uncommitted_channel free each other. Without this, we can get errors on shutdown: Valgrind error file: valgrind-errors.27444 ==27444== Invalid read of size 8 ==27444== at 0x1950E2: secp256k1_pubkey_load (secp256k1.c:127) ==27444== by 0x19CF87: secp256k1_ec_pubkey_serialize (secp256k1.c:189) ==27444== by 0x14FED9: towire_pubkey (towire.c:59) ==27444== by 0x15AAFB: towire_gossipctl_peer_disconnected (gen_gossip_wire.c:969) ==27444== by 0x1253EF: opening_channel_errmsg (opening_control.c:526) ==27444== by 0x1386A3: destroy_subd (subd.c:589) ==27444== by 0x18222C: notify (tal.c:240) ==27444== by 0x1826E1: del_tree (tal.c:400) ==27444== by 0x182733: del_tree (tal.c:410) ==27444== by 0x182733: del_tree (tal.c:410) ==27444== by 0x182B1F: tal_free (tal.c:511) ==27444== by 0x11FC53: main (lightningd.c:410) ==27444== Address 0x6c3af98 is 72 bytes inside a block of size 216 free'd ==27444== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==27444== by 0x1827BC: del_tree (tal.c:421) ==27444== by 0x182B1F: tal_free (tal.c:511) ==27444== by 0x11F3C7: shutdown_subdaemons (lightningd.c:211) ==27444== by 0x11FC27: main (lightningd.c:406) ==27444== Block was alloc'd at ==27444== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==27444== by 0x182296: allocate (tal.c:250) ==27444== by 0x182863: tal_alloc_ (tal.c:448) ==27444== by 0x12F2DF: new_peer (peer_control.c:74) ==27444== by 0x125600: new_uncommitted_channel (opening_control.c:576) ==27444== by 0x125870: peer_accept_channel (opening_control.c:668) ==27444== by 0x13032A: peer_sent_nongossip (peer_control.c:427) ==27444== by 0x116B9E: peer_nongossip (gossip_control.c:60) ==27444== by 0x116F2B: gossip_msg (gossip_control.c:172) ==27444== by 0x138323: sd_msg_read (subd.c:503) ==27444== by 0x137C02: read_fds (subd.c:330) ==27444== by 0x175550: next_plan (io.c:59) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	d0bfd8407a	test_lightningd.py: catch unexpected reconnections. I had a weird failure which was caused by an unexpected disconnect and reconnect. Since we are persistent and recover from these, they can slip through our tests; most tests don't involve reconnection, so we need to catch this explicitly. For the connect() helper, we always suppress reconnection; tests which want it all want other options so don't use this helper anyway. (Actually, after I said that, test_closing_while_disconnected was added when I rebased, which did require it, so I had to open-code that one). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	05ba976a41	lightningd: --dev-no-reconnect needs to always suppress reconnection. It didn't in the restore-from-db case. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
ZmnSCPxj, ZmnSCPxj jxPCSmnZ	d6bf7930b8	Loosen `close` timeout in `test_closing_different_fees`	7 years ago
ZmnSCPxj	0b331a2b60	test_lightningd.py: Clean up some uses of 'close' RPC.	7 years ago
Rusty Russell	68758a5d42	json_close: test that it works while disconnected. It should, indeed, close once they reconnect. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
ZmnSCPxj	2cee1ab20f	peer_control: Make close wait for complete closure, with timeout. Also report tx and txid, and whether we closed unilaterally or bilaterally, if we could close the channel. Also make a manpage. Fixes: #1207 Fixes: #714 Fixes: #622	7 years ago
BT	4673ba6a0a	Add in sudo apt-get update into ubuntu Add in sudo apt-get update into ubuntu instructions because sometimes the process fails if not updated	7 years ago
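
The reconnect commits above (a134ca9659, b1498f07c5) describe an exponential backoff that starts at 1 second and is capped at 5 minutes. A minimal sketch of such a schedule follows. It is not the gossipd implementation: the function names are hypothetical, and the doubling factor is an assumption, since the commit messages give only the starting delay and the ceiling.

```python
# Illustrative sketch only; not code from the lightning repository.
# Assumption: the delay doubles on each failed attempt. The commit
# messages state only "start at 1 second, back off to 5 minutes".

START_SECONDS = 1        # initial delay, from the commit message
CEILING_SECONDS = 300    # 5 minute ceiling, from the commit message


def next_backoff(current_wait, ceiling=CEILING_SECONDS):
    """Return the delay to use after another failed reconnect attempt."""
    return min(current_wait * 2, ceiling)


def backoff_schedule(attempts, start=START_SECONDS, ceiling=CEILING_SECONDS):
    """Return the list of delays (in seconds) for the given number of attempts."""
    waits = []
    wait = start
    for _ in range(attempts):
        waits.append(wait)
        wait = next_backoff(wait, ceiling)
    return waits


if __name__ == "__main__":
    print(backoff_schedule(10))
```

Under the doubling assumption the delay reaches the ceiling on the tenth attempt and stays there, which fits the description of important peers having no maximum number of reconnect attempts.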

3717 Commits (071ef628db543117e607f3d9081873b66d617b6d) All Branches Search
