lightning

Commit Graph

Author	SHA1	Message	Date
Rusty Russell	1d0c433dc4	channeld: treat all incoming errors as "soft", so we retry. We still close the channel if we send an error, but we seem to have hit another case where LND sends an error which seems transient, so this will make a best-effort attempt to preserve our channel in that case. Some tests have to be modified, since they don't terminate as they did previously :( Changelog-Changed: quirks: We'll now reconnect and retry if we get an error on an established channel. This works around lnd sending error messages that may be non-fatal. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	5 years ago
Rusty Russell	6c7a45623e	common: detect "sync error" from lnd. I'm deeply reluctant to do this, as I'd thought this was fixed with recent lnd versions. Logs below show that it continues, with channel loss on almost every restart. At this rate, we risk bifurcating the network. In fact, only four errors my node have ever been NOT "sync error". 2018-09-12T01:21:40.671Z lightningd(1263): 03e50492eab4107a773141bb419e107bda3de3d55652e6e1a41225f06a0bbf2d56 chan #3: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel b7008735ad2425ab92bcffa2f255ba93f63e0b5c685368f308e76ca0d2a30a41: sync error 2018-12-07T06:41:26.209Z lightningd(1215): 03da1c27ca77872ac5b3e568af30673e599a47a5e4497f85c7b5da42048807b3ed chan #1038: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 48858b0d55ae982596932ceb72584d4bb31363b9ecbaa56721b158ca4d18f5f8: sync error 2018-12-07T06:41:43.707Z lightningd(1215): 0219c2f8818bd2124dcc41827b726fd486c13cdfb6edf4e1458194663fb07891c7 chan #2508: Peer permanent failure in CHANNELD_AWAITING_LOCKIN: lightning_channeld: received ERROR channel 388b653e433773d20d74a151c552df647b74e240ef983d21a6d6c5816523b858: sync error 2018-12-07T06:41:45.553Z lightningd(1215): 03e50492eab4107a773141bb419e107bda3de3d55652e6e1a41225f06a0bbf2d56 chan #1044: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel b58e9391383bfbe848da881ab9ddd9a8987c76318d421dac6f552b0d451ff957: sync error 2018-12-07T06:41:46.501Z lightningd(1215): 0390b5d4492dc2f5318e5233ab2cebf6d48914881a33ef6a9c6bcdbb433ad986d0 chan #871: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 91f43cb6a8c37d0be237a7c491f11d9dfad48534699fb4f076b2c0cbde964424: sync error 2018-12-07T06:41:46.985Z lightningd(1215): 03e5ea100e6b1ef3959f79627cb575606b19071235c48b3e7f9808ebcd6d12e87d chan #1026: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 
6cc360db0627b19df146ccd971570c14597b22662bbc0907a233042480e50be7: sync error 2018-12-07T06:41:47.340Z lightningd(1215): 03c2abfa93eacec04721c019644584424aab2ba4dff3ac9bdab4e9c97007491dda chan #1420: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel f363d174390bf819b47e568cb5890c8e432d61c03ba0d38d7c53996679080a74: sync error 2018-12-07T06:41:47.641Z lightningd(1215): 032679fec1213e5b0a23e066c019d7b991b95c6e4d28806b9ebd1362f9e32775cf chan #1058: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 602dc88c7f333ed88f24c6f2c760cb53fa359a4299dfab677f6a81ca33613231: sync error 2019-01-06T10:56:47.332Z lightningd(1202): 02cdf83ef8e45908b1092125d25c68dcec7751ca8d39f557775cd842e5bc127469 chan #2608: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 17b7c895c3feb6ae5b8209ef05044b0aa125629ef1ebc2ce6b2efb27e231533b: sync error 2019-01-06T10:57:08.896Z lightningd(1202): 0219c2f8818bd2124dcc41827b726fd486c13cdfb6edf4e1458194663fb07891c7 chan #2610: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 52d5e3717c7b4f6b06f2b7d55aa8d904a0558706e18be981c82d2c11d4bdf82c: sync error 2019-01-06T10:57:08.950Z lightningd(1202): 02ad6fb8d693dc1e4569bcedefadf5f72a931ae027dc0f0c544b34c1c6f3b9a02b chan #7185: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 245438c15a986b53da7694114c646b77ab663d236d7928732764f5b9251cd2d1: sync error 2019-01-15T09:15:26.882Z lightningd(1191): 03a76b80027d7c067e0da77da95880faaf89e9bf87b73a7d57bd4a3f2a124b764f chan #7430: Peer permanent failure in CHANNELD_AWAITING_LOCKIN: lightning_channeld: received ERROR channel 97c1e01612faf5653af2980abdf382c0f3b24d8a5961b6a3a1eb12444cf9db2e: sync error 2019-05-02T11:32:06.511Z lightningd(14815): 036e8a8efeb26f3cffce99f462839ef6ea3b1691d569d59c402be0d3d6cef9b79c chan #7573: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR 
channel 6766b0b14013de753f9b354ce7a4b6e4756165ef970aae2650aeda990cfe5687: sync error 2019-06-12T10:38:57.503Z lightningd(1264): 024d2387409269f3b79e2708bb39b895c9f4b6a8322153af54eba487d4993bf60f chan #9607: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 1f3111399c670dab87b4e3d7bac22865c29d4c9992df71fdce9e8893666a08bc: sync error 2019-06-12T10:41:00.435Z lightningd(1264): 02809e936f0e82dfce13bcc47c77112db068f569e1db29e7bf98bcdd68b838ee84 chan #9332: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel a31b5252be9b001f573e00310ea9098532c81322389aa8721946185b1b70ca4c: sync error 2019-06-12T10:46:23.097Z lightningd(1264): 02fcdb04f51d61dddc0481c10751173d523e3408ebe3a848a1d6cb34b1f5df6668 chan #7586: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel bd18e98f5bd56ac73e7b2eb7fd70f6dbe3a4dda1e5bebe7bf6484c3a0f6b55e7: sync error 2019-06-12T10:46:24.627Z lightningd(1264): 03bb88ccc444534da7b5b64b4f7b15e1eccb18e102db0e400d4b9cfe93763aa26d chan #9626: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 345e89c2f0100257940aff7413c1e29786d08b0a1ea1e259d577650d18791872: sync error 2019-06-12T10:46:26.381Z lightningd(1264): 0331f80652fb840239df8dc99205792bba2e559a05469915804c08420230e23c7c chan #9677: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel d38752727ed5dab33abb06c5671e9d7d467feb469f0d249aa488f45e304221c1: sync error 2019-06-12T12:12:51.261Z lightningd(1264): 02d3366059edde4179fc0d071828b4bd726effba7225c3851f3d86a6a827f934a2 chan #9804: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel d00c9eb31bb0c1f5794804114117be3cc75a756a1e4c08099b7188a5fd9f7215: sync error 2019-06-13T03:19:28.212Z lightningd(1218): 03e5ea100e6b1ef3959f79627cb575606b19071235c48b3e7f9808ebcd6d12e87d chan #10792: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR 
channel 873a526043bbc680ea4398c7a45b9742762d782dea285c661bb90ab8f165976d: sync error 2019-06-13T06:19:52.486Z lightningd(1230): 030995c0c0217d763c2274aa6ed69a0bb85fa2f7d118f93631550f3b6219a577f5 chan #10743: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 29157b32dd0c13bcf4f785c5527d067159e102d62516e3a00fbf2c0f33bf59ec: sync error 2019-06-14T01:25:37.598Z lightningd(1235): 02cf60741c586aa54ff24381beab1aebf45eda61a8c49b043cf1f6e203e611e581 chan #12786: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 827472a7167ab1fecd680e4f28e1ee74bcd25d04dcdea5d1295ba381b6543661: sync error 2019-07-17T03:37:12.703Z UNUSUAL lightningd(1262): 03021c5f5f57322740e4ee6936452add19dc7ea7ccf90635f95119ab82a62ae268 chan #14764: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 5ff0890d9f1fbb63439a7d793c28cb74c3baef8c9b610c51c64b8a6497237540: sync error 2019-07-17T03:37:14.964Z UNUSUAL lightningd(1262): 030c3f19d742ca294a55c00376b3b355c3c90d61c6b6b39554dbc7ac19b141c14f chan #14839: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 79525ec2c4eaffb5fd6893957f330db81b7383c50d57113d5bf8ffee3c121bdc: sync error 2019-07-17T03:37:16.048Z UNUSUAL lightningd(1262): 028c1da32603fce64118e469ffe2cfeec04d1c4bd88205efb4e8b4208f77a8064e chan #14996: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 6913067c9c89404d9451df25fed1a6cc98b9d9ef801b623d5e8e90aa43ca3077: sync error Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	dd79813a75	common: add peer_error flag to treat this error as "soft". The spec says to close the channel if they send us an error, but we need to be more lenient to preserve channels with other implementations. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	38d2899fbb	common/per_per_state: generalize lightningd/peer_comm Part 1 Encapsulating the peer state was a win for lightningd; not surprisingly, it's even more of a win for the other daemons, especially as we want to add a little gossip information. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	13717c6ebb	gossipd: hand a gossip_store_fd to all subdaemons. This will let them read from the gossip store directly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	b4e6a0fcad	peer_failed: write error message to peer directly. We currently hand the error back to the master, who then stores it for future connections and hands it back to another openingd to send and exit. Just send directly; it's more reliable and simpler. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	09cce4a9c7	common/read_peer_msg: deconstruct into individual helper routines. The One Big API is confusing, and has enough corner cases that we should ditch it rather than add more. See: https://www.sandimetz.com/blog/2016/1/20/the-wrong-abstraction In particular, when openingd is changed to chat to peers even when it's not actively opening a channel, it wants to handle (most) errors by continuing, not calling peer_failed(). This exposes the constituent parts. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	6 years ago
Rusty Russell	ab9d9ef3b8	gossipd: drain fd instead of passing around gossip index. (This was sitting in my gossip-enhancement patch queue, but it simplifies this set too, so I moved it here). In `94711969f` we added an explicit gossip_index so when gossipd gets peers back from other daemons, it knows what gossip it has sent (since gossipd can send gossip after the other daemon is already complete). This solution is insufficient for the more general case where gossipd wants to send other messages reliably, so replace it with the other solution: have gossipd drain the "gossip fd" which the daemon returns. This turns out to be quite simple, and is probably how I should have done it originally :( Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	9cffa03647	peer_failed: set permanent slot when we fail the peer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	dba08f9d1b	peer_failed: don't send error ourselves. gossipd actually does that now, so we don't need this synchronous send hack. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	02d469b3d4	peer_failed: hand fds back to master when we fail. master now hands it back to gossipd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	f76ff90485	status: split off error messages into a new 'peer_status' type. Several daemons (onchaind, hsm) want to use the status messages, but don't communicate with peers. The coming changes made them drag in more code they didn't need, so instead we have a different non-overlapping type. We combine the status_received_errmsg and status_sent_errmsg into a single status_peer_error, with the presence or not of the 'error_for_them' field indicating direction. We also rename status_fatal_connection_lost() to peer_failed_connection_lost() to fit in. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	201d498e39	peer_failed: automatically hand PEER_FD, GOSSIP_FD; add gossip_index We make it a macro, since everyone uses PEER_FD and GOSSIP_FD constants (they're actually always the same, but this is slightly safer), and add a gossip_index arg: this is groundwork for when we want to hand the peer back to master for gossipd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	cc9ca82821	status: separate types for peer failure vs "impossible" failures. Ideally we'd rename status_failed() to status_fatal(), but that's too much churn for now. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	e34ec8da2d	peer_failed: use towire_errorfmtv() which doesn't add nul terminator. This code was actually wrong. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	ef28b6112c	status: use common status codes for all the failures. This change is really to allow us to have a --dev-fail-on-subdaemon-fail option so we can handle failures from subdaemons generically. It also neatens handling so we can have an explicit callback for "peer did something wrong" (which matters if we want to close the channel in that case). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	a37c165cb9	common: move some files out of lightningd/ Basically all files shared by different daemons. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	7 years ago
Rusty Russell	80886cda8a	daemon_conn: fix daemon_conn_sync_flush. We need to set fd to blocking before trying to sync write. Use io_fd_block() elsewhere, too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	8 years ago
Rusty Russell	456fa39380	sync_crypto_write: support take(msg) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	8 years ago
Rusty Russell	7389aae26a	Massive BOLT text underscore and formatting updates. This brings us up to 61b5b3f7b4145c9d6d66973b6bfbf28e6c0a0791. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	8 years ago
Rusty Russell	be9bb5f9cb	lightningd: peer_fail helper to fail/reconnect peer. This will eventually hook into restart logic. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	8 years ago
Rusty Russell	4bf398c4e7	status: move into lightningd/status. It's really a lightningd-only thing, and we're about to do surgery on it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	8 years ago
Rusty Russell	7419fde9a0	Update to new spec: differentiate channel_id and short_channel_id. The spec 4af8e1841151f0c6e8151979d6c89d11839b2f65 uses a 32-byte 'channel-id' field, not to be confused with the 8-byte short ID used by gossip. Rename appropriately, and update to the new handshake protocol. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	8 years ago
Rusty Russell	89af53267b	lightningd/peer_failed: helper to send PKT_ERR and exit daemon. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	8 years ago

17 Commits (c49c869933e7de15ee1971dab9f38af9515761e7)