
fix: harden recoverable-error handling so the sync self-heals (#898) #960

Draft
bernardgut wants to merge 1 commit into opencloud-eu:main from bernardgut:fix/recoverable-error-hardening

bernardgut commented Jul 1, 2026


Problem

Beyond the OAuth token-refresh wedge (#955), two recoverable error conditions break the desktop client's self-healing during a large sync. A sync should always converge to completion on the next pass or after a re-login, but these bugs either abort the whole run or wedge a file. They bite hardest on long, multi-day syncs, where transient errors are near-certain.

Fixes (mostly one function: classifyError)

1. A transient network error aborts the entire sync run — not just the file

classifyError() (src/libsync/owncloudpropagator_p.h) maps every QNetworkReply error in (NoError, UnknownProxyError] to FatalError, which propagates up to propagator()->abort() and aborts the whole run. On a long sync a single timeout / connection-reset / temporary-failure on one file kills the entire pass, and the expensive discovery has to restart from scratch.

Now the recoverable connectivity errors (Timeout, ConnectionRefused, HostNotFound, TemporaryNetworkFailure, NetworkSessionFailed, proxy refused/closed/timeout) are a per-file NormalError that sets anotherSyncNeeded, so only that file is retried and the run continues. Genuinely fatal cases (TLS handshake, proxy auth, redirect loops, …) keep FatalError.
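The split described above can be sketched roughly as follows. This is a minimal, Qt-free illustration, not the actual classifyError() from owncloudpropagator_p.h: the NetError and Status enums, the Classification struct and the classify() function are hypothetical stand-ins whose names only mirror QNetworkReply::NetworkError and the sync item statuses, and the real function's signature may differ.

```cpp
#include <cassert>

// Hypothetical stand-ins for this sketch only. The enumerator names mirror
// QNetworkReply::NetworkError, but the types and values are not Qt's.
enum class NetError {
    NoError, // here: no transport-level error, look at the HTTP status
    ConnectionRefused,
    HostNotFound,
    Timeout,
    TemporaryNetworkFailure,
    NetworkSessionFailed,
    ProxyConnectionRefused,
    ProxyConnectionClosed,
    ProxyTimeout,
    SslHandshakeFailed,
    ProxyAuthenticationRequired,
    TooManyRedirects,
};

enum class Status { NormalError, SoftError, FatalError };

struct Classification {
    Status status;
    bool anotherSyncNeeded; // true: schedule another pass for this file
};

Classification classify(NetError nerror, int httpCode)
{
    assert(httpCode >= 0); // 0 means "no HTTP response was received"

    switch (nerror) {
    // Recoverable connectivity errors: fail this file only and retry later.
    case NetError::ConnectionRefused:
    case NetError::HostNotFound:
    case NetError::Timeout:
    case NetError::TemporaryNetworkFailure:
    case NetError::NetworkSessionFailed:
    case NetError::ProxyConnectionRefused:
    case NetError::ProxyConnectionClosed:
    case NetError::ProxyTimeout:
        return {Status::NormalError, true};

    // Genuinely fatal: retrying within the same run cannot succeed.
    case NetError::SslHandshakeFailed:
    case NetError::ProxyAuthenticationRequired:
    case NetError::TooManyRedirects:
        return {Status::FatalError, false};

    case NetError::NoError:
        break;
    }

    // Fallback for a TUS offset conflict: recoverable on the next pass.
    if (httpCode == 409)
        return {Status::SoftError, true};

    return {Status::NormalError, false};
}
```

With this shape, classify(NetError::Timeout, 0) yields a per-file NormalError with anotherSyncNeeded set, while classify(NetError::SslHandshakeFailed, 0) stays FatalError.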

2. TUS resume wedges on a 409 Upload-Offset mismatch (#898)

A stale/diverged TUS resume gets HTTP 409. It used to fall through to commonErrorHandling and the upload wedged (the file hangs near 100%, never completes; recovery needs a manual remove + re-add of the folder). PropagateUploadFileTUS::slotChunkFinished() now routes a 409 through the same HEAD-to-get-current-offset recovery path it already uses for timeouts, resuming from the server's canonical Upload-Offset. classifyError() also classifies 409 as a recoverable SoftError + anotherSyncNeeded as a fallback.
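The recovery path can be illustrated with a small simulation. This is a sketch under stated assumptions, not the code in PropagateUploadFileTUS::slotChunkFinished(): FakeTusServer, UploadResult and upload() are hypothetical names, the server is an in-memory stand-in, and the cap of three recoveries is an arbitrary choice for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical in-memory stand-in for a TUS endpoint.
struct FakeTusServer {
    std::size_t offset = 0; // bytes the server has actually stored

    // PATCH: accept the chunk only if the client's Upload-Offset matches.
    int patch(std::size_t clientOffset, std::size_t chunkLen)
    {
        if (clientOffset != offset)
            return 409; // offset mismatch
        offset += chunkLen;
        return 204;
    }

    // HEAD: report the canonical Upload-Offset.
    std::size_t head() const { return offset; }
};

struct UploadResult {
    bool complete;
    int recoveries; // number of HEAD-based offset recoveries performed
};

// Upload fileSize bytes in chunks, starting from a possibly stale offset.
UploadResult upload(FakeTusServer &server, std::size_t fileSize,
                    std::size_t staleOffset, std::size_t chunkSize)
{
    assert(chunkSize > 0);
    std::size_t offset = staleOffset;
    int recoveries = 0;

    while (offset < fileSize) {
        const std::size_t len = std::min(chunkSize, fileSize - offset);
        const int status = server.patch(offset, len);
        if (status == 409) {
            // Do not give up: ask the server where it really is and
            // resume from there (bounded, to avoid looping forever).
            if (++recoveries > 3)
                return {false, recoveries};
            offset = server.head();
            continue;
        }
        offset += len;
    }
    return {server.head() == fileSize, recoveries};
}
```

For example, a server holding 300 bytes and a client that believes it already sent 500 bytes of a 1000-byte file gets one 409, performs one HEAD, and then completes the upload from offset 300.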

Tests

Two new tests, both bug-bites-verified (each fails against the pre-fix code, passes after):

  • test/testclassifyerror.cpp (ClassifyError): unit-pins the classifyError contract: transient connectivity errors → NormalError + retry (not FatalError); 409 → SoftError + retry; genuinely fatal errors (e.g. TLS handshake) stay FatalError. Pre-fix: Actual FatalError ≠ Expected NormalError, Actual NormalError ≠ Expected SoftError.
  • test/testselfheal.cpp (SelfHeal) — FakeFolder integration test proving the self-heal property end-to-end: a transient network error injected on one file must not abort the whole run; the healthy files queued after it still upload. Pre-fix this test fails (the run aborts on the first file, the healthy files queued behind it never sync).
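The property the SelfHeal test checks can be modelled in a few lines. This is a toy model, not the FakeFolder-based test itself: Job, RunResult and runSync() are hypothetical names, and each job's outcome is fixed up front instead of coming from a real network reply.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Status { Success, NormalError, FatalError };

// Hypothetical job: a path plus the status its upload will end with.
struct Job {
    std::string path;
    Status outcome;
};

struct RunResult {
    std::vector<std::string> uploaded;
    bool aborted = false;
    bool anotherSyncNeeded = false;
};

// Process jobs in order. A FatalError aborts the run; a NormalError only
// fails that file and requests another pass.
RunResult runSync(const std::vector<Job> &jobs)
{
    RunResult result;
    for (const Job &job : jobs) {
        assert(!job.path.empty());
        switch (job.outcome) {
        case Status::Success:
            result.uploaded.push_back(job.path);
            break;
        case Status::NormalError:
            result.anotherSyncNeeded = true;
            break;
        case Status::FatalError:
            result.aborted = true;
            return result; // files queued behind this one never sync
        }
    }
    return result;
}
```

If the first job ends in FatalError (the pre-fix classification of a timeout), nothing behind it uploads; if it ends in NormalError, the remaining files upload and anotherSyncNeeded is set so the failed file is retried on the next pass.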

Full client ctest is green (33/33) on the CI build image (opencloudeu/desktop-client-build:ubuntu-24.04-qt6.10).

Fixes #898

Make the native discover->reconcile->upload loop self-heal on recoverable errors
instead of wedging or aborting the whole run (complements the token-wedge fix opencloud-eu#955):

- classifyError: a transient network/timeout on one file is now a per-file
  NormalError + another-pass, not FatalError -> propagator()->abort() (which
  aborts the ENTIRE run on a single blip over a long, multi-day sync). Genuinely
  fatal cases (TLS handshake, proxy auth, redirects) keep FatalError.
- TUS resume: a 409 Upload-Offset mismatch (opencloud-eu#898) now routes through the existing
  HEAD-offset-recovery path and resumes from the server's canonical offset,
  instead of wedging in commonErrorHandling. classifyError also maps 409 to a
  recoverable SoftError + another-pass.
- test/testclassifyerror.cpp: regression coverage (bug-bites verified).

Fixes opencloud-eu#898

Authored-By: Bernard Gütermann <bernard.gutermann@sekops.ch>
@bernardgut bernardgut force-pushed the fix/recoverable-error-hardening branch from e823aa6 to 5a8b8ee Compare July 1, 2026 08:38
Development

Successfully merging this pull request may close these issues.

TUS uploads can get permanently stuck after failed resume; stale uploadinfo is not reset since 3.0
