fix: harden recoverable-error handling so the sync self-heals (#898)#960
Draft
bernardgut wants to merge 1 commit into
Draft
fix: harden recoverable-error handling so the sync self-heals (#898)#960bernardgut wants to merge 1 commit into
bernardgut wants to merge 1 commit into
Conversation
Make the native discover->reconcile->upload loop self-heal on recoverable errors instead of wedging or aborting the whole run (complements the token-wedge fix opencloud-eu#955): - classifyError: a transient network/timeout on one file is now a per-file NormalError + another-pass, not FatalError -> propagator()->abort() (which aborts the ENTIRE run on a single blip over a long, multi-day sync). Genuinely fatal cases (TLS handshake, proxy auth, redirects) keep FatalError. - TUS resume: a 409 Upload-Offset mismatch (opencloud-eu#898) now routes through the existing HEAD-offset-recovery path and resumes from the server's canonical offset, instead of wedging in commonErrorHandling. classifyError also maps 409 to a recoverable SoftError + another-pass. - test/testclassifyerror.cpp: regression coverage (bug-bites verified). Fixes opencloud-eu#898 Authored-By: Bernard Gütermann <bernard.gutermann@sekops.ch>
e823aa6 to
5a8b8ee
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Beyond the OAuth token-refresh wedge (#955), a couple of recoverable error conditions break the desktop client's self-healing during a large sync — a sync should always converge to complete on the next pass / re-login, but these bugs either abort the whole run or wedge a file. They bite hardest on long, multi-day syncs where transient errors are near-certain.
Fixes (mostly one function:
classifyError)1. A transient network error aborts the entire sync run — not just the file
classifyError()(src/libsync/owncloudpropagator_p.h) maps everyQNetworkReplyerror in(NoError, UnknownProxyError]toFatalError, which propagates up topropagator()->abort()and aborts the whole run. On a long sync a single timeout / connection-reset / temporary-failure on one file kills the entire pass, and the expensive discovery has to restart from scratch.Now the recoverable connectivity errors (
Timeout,ConnectionRefused,HostNotFound,TemporaryNetworkFailure,NetworkSessionFailed, proxy refused/closed/timeout) are a per-fileNormalErrorthat setsanotherSyncNeeded, so only that file is retried and the run continues. Genuinely fatal cases (TLS handshake, proxy auth, redirect loops, …) keepFatalError.2. TUS resume wedges on a 409 Upload-Offset mismatch (#898)
A stale/diverged TUS resume gets HTTP 409. It used to fall through to
commonErrorHandlingand the upload wedged (the file hangs near 100%, never completes; recovery needs a manual remove + re-add of the folder).PropagateUploadFileTUS::slotChunkFinished()now routes a 409 through the same HEAD-to-get-current-offset recovery path it already uses for timeouts, resuming from the server's canonicalUpload-Offset.classifyError()also classifies 409 as a recoverableSoftError+anotherSyncNeededas a fallback.Tests
Two new tests, both bug-bites-verified (each fails against the pre-fix code, passes after):
test/testclassifyerror.cpp(ClassifyError) — unit-pins theclassifyErrorcontract: transient connectivity errors →NormalError+ retry (notFatalError);409→SoftError+ retry; genuinely-fatal errors (e.g. TLS handshake) stayFatalError. Pre-fix:Actual FatalError ≠ Expected NormalError,Actual NormalError ≠ Expected SoftError.test/testselfheal.cpp(SelfHeal) — FakeFolder integration test proving the self-heal property end-to-end: a transient network error injected on one file must not abort the whole run; the healthy files queued after it still upload. Pre-fix this test fails (the run aborts on the first file, the healthy files queued behind it never sync).Full client
ctestis green (33/33) on the CI build image (opencloudeu/desktop-client-build:ubuntu-24.04-qt6.10).Fixes #898