📅 Original date posted:2021-04-24 📝 Original message:

📅 Original date posted:2021-04-24
📝 Original message:
> On Apr 24, 2021, at 01:56, Rusty Russell <rusty at rustcorp.com.au> wrote:
>
> Matt Corallo <lf-lists at mattcorallo.com> writes:
>> Somehow I missed this thread, but I did note in a previous meeting - these issues are great fodder for fuzzing. We’ve had a fuzzer which aggressively tests for precisely these types of message-non-delivery-and-resending production desync bugs for several years. When it initially landed it forced several rewrites of parts of the state machine, but quickly exhausted the bug fruit (though catches other classes of bugs occasionally as well). The state machine here is really not that big - while I agree simplifying it where possible is nice, ripping things out to replace them with fresh code (which would need similar testing) is probably not the most obvious decrease in complexity.
>
> It's historically had more bugs than anything else in the protocol. We
> literally found another one in feerate negotiation since the last
> c-lightning release :(
>
> I'd rather not have bugs than try to catch them all.

I promise it’s much less work than it sounds like, and avoids having to debug these things based on logs, which is a huge pain :). Definitely less work than a new state machine:).

> You could propose a splice (or update to anchors, or whatever) any time
> when it's your turn, as long as you haven't proposed any other updates.
> That's simple!

I presume you’d need to take it a few steps further - if the last message received required a response CS/RAA, you must still wait until things have settled down. I guess it also depends on the exact semantics of a “turn based” message protocol - if you received some updates and a signature, are you allowed to add more updates after you send your CS/RAA (then you have a good chunk of today’s complexity), or do you have to wait until they send you back their last RAA (in which case presumably they aren’t allowed to include anything else as then they’d be able to monopolize update windows). In the first case you still have the same issues of today, in the second less so, but you’re doing a similar “ok, just pause updates and wait for things to settle “, I think.

> Instead, *both* sides have to send a splice message to synchronize, and
> they can only do so once all in-flight changes have cleared. You have
> to resolve simultaneous splice attempts (we use "highest feerate"
> tiebreak by node_id), and keep track of this stage while you clear
> in-flight changes.

Isn’t that pretty similar? Discard one splice proposal deterministically (ok that’s new) and the loser has to store their proposal in a holding cell for later (which they have to do in turn-based anyway). Logic to check if there’s unsettled things in RAA handling is pretty similar to turn-based, and logic to reject other messages is the same as shutdown handling today.

> Here's the subset of requirements from the draft which relate to this:
>
> The sender:
> - MUST NOT send another splice message while a splice is being negotiated.
> - MUST NOT send a splice message after sending uncommitted changes.
> - MUST NOT send other channel updates until splice negotiation has completed.
>
> The receiver:
> - MUST respond with a `splice` message of its own if it has not already.
> - MUST NOT reply with `splice` until all commitment updates are resolved by both peers.

Probably use “committed” not “resolved”. “Resolved” sounds like “no pending HTLCs left”.

> - MUST use the higher of the two `funding_feerate_perkw` as the feerate for
> the splice.

If we like turn based, why not just deterministic throw out one slice? :)

> - MUST NOT send other channel updates until splice negotiation has completed.
>
> Similar requirements exist for other major channel changes.
>
> Cheers,
> Rusty.
>

Matt Corallo [ARCHIVE] on Nostr: 📅 Original date posted:2021-04-24 📝 Original message: > On Apr 24, 2021, at ...