Fix A/V sync bugs and add sync test infrastructure#1977
Conversation
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
| - macos-latest | ||
| - windows-2022 | ||
| - ubuntu-24.04 | ||
| runs-on: ${{ matrix.runner }} |
There was a problem hiding this comment.
Workflow does not declare
permissions, leaving GITHUB_TOKEN scope to repo defaults
Workflow has no permissions: block at top or job level.
Add permissions: {} at the workflow top and grant least-privilege per job.
AI prompt
Check if this security scanner issue is valid. If so, understand the root cause and fix it. If appropriate, update or add tests. Keep the change focused and preserve intended behavior.
<file name=".github/workflows/sync-tests.yml">
<violation number="1" location=".github/workflows/sync-tests.yml:42">
<priority>P1</priority>
<title>Workflow does not declare `permissions`, leaving `GITHUB_TOKEN` scope to repo defaults</title>
<evidence>The workflow has no `permissions:` block at the top level or job level. The default GITHUB_TOKEN permissions depend on repo and org settings; older repos may default to write-all, giving this test workflow broad access it does not need.</evidence>
<recommendation>Add `permissions: {}` at the workflow top level, then grant each job only the permissions it needs. For example, the `sync-tests` job needs `contents: read` for checkout and `actions: write` for artifact upload.</recommendation>
</violation>
</file>
| runs-on: ${{ matrix.runner }} | ||
| steps: | ||
| - name: Checkout | ||
| uses: actions/checkout@v4 |
There was a problem hiding this comment.
Actions are pinned to mutable tags instead of commit SHAs
uses: lines reference mutable tags instead of immutable commit SHAs.
Pin all actions to full 40-character SHAs with version comments.
AI prompt
Check if this security scanner issue is valid. If so, understand the root cause and fix it. If appropriate, update or add tests. Keep the change focused and preserve intended behavior.
<file name=".github/workflows/sync-tests.yml">
<violation number="1" location=".github/workflows/sync-tests.yml:45">
<priority>P1</priority>
<title>Actions are pinned to mutable tags instead of commit SHAs</title>
<evidence>The workflow references several third-party and first-party actions by mutable tags: `uses: actions/checkout@v4`, `uses: dtolnay/rust-toolchain@1.88.0`, `uses: actions/setup-node@v4`, and `uses: actions/upload-artifact@v4`. Tags can be force-pushed by an attacker who compromises the action repository, silently injecting malicious code into this workflow.</evidence>
<recommendation>Pin every action to the full 40-character commit SHA of the intended version and append a version comment for readability. For example: `uses: actions/checkout@<sha> # v4`.</recommendation>
</violation>
</file>
|
|
||
| if let Some(rx) = pattern_rx { |
There was a problem hiding this comment.
When cap selftest installs the pattern runner and the runtime thread fails to spawn, the main thread enters serve_main_thread(rx) before the spawn error is handled. The sender remains installed, so rx.recv() never closes and the CLI hangs instead of reporting the spawn failure.
Context Used: AGENTS.md (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/cli/src/main.rs
Line: 468-469
Comment:
**Spawn Failure Hangs Selftest**
When `cap selftest` installs the pattern runner and the runtime thread fails to spawn, the main thread enters `serve_main_thread(rx)` before the spawn error is handled. The sender remains installed, so `rx.recv()` never closes and the CLI hangs instead of reporting the spawn failure.
**Context Used:** AGENTS.md ([source](https://app.greptile.com/cap/github/capsoftware/cap/-/custom-context?memory=27801409-c24c-4476-9c6c-180f1ef0a7f2))
How can I resolve this? If you propose a fix, please make it concise.| } else { | ||
| (samples as u128 * clock.sample_rate() as u128 / u128::from(self.sample_rate.max(1))) |
There was a problem hiding this comment.
Fractional Clock Drift Accumulates
For non-48 kHz sources, this converts each buffer to master-clock samples with integer truncation and drops the fractional remainder. A 44.1 kHz device with 1024-sample buffers loses part of a 48 kHz clock sample every buffer, so the master clock can drift from the source timeline and the gap tracker can still insert bogus silence over longer recordings.
Context Used: AGENTS.md (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/recording/src/output_pipeline/core.rs
Line: 672-673
Comment:
**Fractional Clock Drift Accumulates**
For non-48 kHz sources, this converts each buffer to master-clock samples with integer truncation and drops the fractional remainder. A 44.1 kHz device with 1024-sample buffers loses part of a 48 kHz clock sample every buffer, so the master clock can drift from the source timeline and the gap tracker can still insert bogus silence over longer recordings.
**Context Used:** AGENTS.md ([source](https://app.greptile.com/cap/github/capsoftware/cap/-/custom-context?memory=27801409-c24c-4476-9c6c-180f1ef0a7f2))
How can I resolve this? If you propose a fix, please make it concise.| && req.frame.abs_diff(frame_num) > fallback_distance | ||
| { | ||
| let _ = req.sender.send(last.to_decoded_frame()); | ||
| } else if allow_relaxed_fallback | ||
| && let Some(ref first) = *first_ever_frame.borrow() | ||
| { | ||
| let _ = req.sender.send(first.to_decoded_frame()); | ||
| } else { | ||
| unfulfilled_count += 1; | ||
| tracing::debug!( | ||
| req_frame = req.frame, | ||
| nearest_frame = frame_num, | ||
| "serving forward frame across pts gap" | ||
| ); | ||
| } | ||
| let _ = req.sender.send(cached.data().to_decoded_frame()); |
There was a problem hiding this comment.
Forward Fallback Ignores Distance
The fallback_distance check only logs when the nearest frame is after the request and too far away, but the frame is still sent unconditionally. After a seek or sparse decode, an initial request with a 2-frame tolerance can receive a future cached frame far outside that tolerance, producing a visible frame from the wrong timestamp instead of retrying or failing.
Context Used: AGENTS.md (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/rendering/src/decoder/avassetreader.rs
Line: 928-936
Comment:
**Forward Fallback Ignores Distance**
The `fallback_distance` check only logs when the nearest frame is after the request and too far away, but the frame is still sent unconditionally. After a seek or sparse decode, an initial request with a 2-frame tolerance can receive a future cached frame far outside that tolerance, producing a visible frame from the wrong timestamp instead of retrying or failing.
**Context Used:** AGENTS.md ([source](https://app.greptile.com/cap/github/capsoftware/cap/-/custom-context?memory=27801409-c24c-4476-9c6c-180f1ef0a7f2))
How can I resolve this? If you propose a fix, please make it concise.| // what un-edited recordings use. | ||
| let display_media_duration = match s.pipeline.screen.video_timestamp_span { | ||
| Some((first, last)) if display_fps > 0 => { | ||
| (last - first).as_secs_f64() + 1.0 / f64::from(display_fps) |
There was a problem hiding this comment.
last - first will panic on underflow if we ever see out-of-order timestamps (or a missing first). Might be worth making this saturating since it’s just being used for a duration estimate.
| (last - first).as_secs_f64() + 1.0 / f64::from(display_fps) | |
| last.saturating_sub(first).as_secs_f64() + 1.0 / f64::from(display_fps) |
| /// pipeline's timestamp span, the container duration from what the encoder | ||
| /// and muxer wrote. If they disagree by more than the tolerance, timestamps | ||
| /// were mangled between the pipeline and the file — the class of bug that | ||
| /// silently desyncs audio from video. Non-fatal: logs a structured warning |
There was a problem hiding this comment.
Nit: the doc says this “logs a structured warning” but the code logs at error level.
| /// silently desyncs audio from video. Non-fatal: logs a structured warning | |
| /// silently desyncs audio from video. Non-fatal: logs a structured error |
| } else { | ||
| (samples as u128 * clock.sample_rate() as u128 / u128::from(self.sample_rate.max(1))) | ||
| as u64 | ||
| }; |
There was a problem hiding this comment.
Minor: the integer division here will always floor, which introduces a small but systematic bias when converting between rates (eg 44.1k -> 48k). Rounding avoids that drift without changing the overall approach.
| }; | |
| let denom = u128::from(self.sample_rate.max(1)); | |
| let numer = samples as u128 * clock.sample_rate() as u128; | |
| ((numer + denom / 2) / denom) as u64 |
| - "crates/media-info/**" | ||
| - ".github/workflows/sync-tests.yml" | ||
|
|
||
| concurrency: |
There was a problem hiding this comment.
Worth adding an explicit permissions: block here. This workflow checks out code and uploads artifacts, so something like this keeps the token scoped tightly.
| concurrency: | |
| permissions: | |
| contents: read | |
| actions: write | |
| concurrency: |
| } else if let Some((&frame_num, cached)) = cache.range(req.frame..).next() { | ||
| if req.frame.abs_diff(frame_num) > fallback_distance { | ||
| tracing::debug!( | ||
| req_frame = req.frame, | ||
| nearest_frame = frame_num, | ||
| "serving forward frame across pts gap" | ||
| ); | ||
| } | ||
| let _ = req.sender.send(cached.data().to_decoded_frame()); |
There was a problem hiding this comment.
This still sends a forward cached frame even when it’s way outside fallback_distance (it only logs). If allow_relaxed_fallback is true, it seems safer to prefer last_sent_frame/first_ever_frame over showing a future frame from the wrong timestamp.
| } else if let Some((&frame_num, cached)) = cache.range(req.frame..).next() { | |
| if req.frame.abs_diff(frame_num) > fallback_distance { | |
| tracing::debug!( | |
| req_frame = req.frame, | |
| nearest_frame = frame_num, | |
| "serving forward frame across pts gap" | |
| ); | |
| } | |
| let _ = req.sender.send(cached.data().to_decoded_frame()); | |
| } else if let Some((&frame_num, cached)) = cache.range(req.frame..).next() { | |
| if req.frame.abs_diff(frame_num) > fallback_distance && allow_relaxed_fallback { | |
| if let Some(ref last) = *last_sent_frame.borrow() { | |
| let _ = req.sender.send(last.to_decoded_frame()); | |
| continue; | |
| } | |
| if let Some(ref first) = *first_ever_frame.borrow() { | |
| let _ = req.sender.send(first.to_decoded_frame()); | |
| continue; | |
| } | |
| } | |
| if req.frame.abs_diff(frame_num) > fallback_distance { | |
| tracing::debug!( | |
| req_frame = req.frame, | |
| nearest_frame = frame_num, | |
| "serving forward frame across pts gap" | |
| ); | |
| } | |
| let _ = req.sender.send(cached.data().to_decoded_frame()); |
| } else if let Some((&frame_num, cached)) = cache.range(req.frame..).next() { | ||
| if req.frame.abs_diff(frame_num) > fallback_distance { | ||
| tracing::debug!( | ||
| req_frame = req.frame, | ||
| nearest_frame = frame_num, | ||
| "serving forward frame across pts gap" | ||
| ); | ||
| } | ||
| let _ = req.sender.send(cached.data().to_decoded_frame()); |
There was a problem hiding this comment.
Follow-up: the suggestion above was missing the closing } for this branch.
| } else if let Some((&frame_num, cached)) = cache.range(req.frame..).next() { | |
| if req.frame.abs_diff(frame_num) > fallback_distance { | |
| tracing::debug!( | |
| req_frame = req.frame, | |
| nearest_frame = frame_num, | |
| "serving forward frame across pts gap" | |
| ); | |
| } | |
| let _ = req.sender.send(cached.data().to_decoded_frame()); | |
| } else if let Some((&frame_num, cached)) = cache.range(req.frame..).next() { | |
| if req.frame.abs_diff(frame_num) > fallback_distance && allow_relaxed_fallback { | |
| if let Some(ref last) = *last_sent_frame.borrow() { | |
| let _ = req.sender.send(last.to_decoded_frame()); | |
| continue; | |
| } | |
| if let Some(ref first) = *first_ever_frame.borrow() { | |
| let _ = req.sender.send(first.to_decoded_frame()); | |
| continue; | |
| } | |
| } | |
| if req.frame.abs_diff(frame_num) > fallback_distance { | |
| tracing::debug!( | |
| req_frame = req.frame, | |
| nearest_frame = frame_num, | |
| "serving forward frame across pts gap" | |
| ); | |
| } | |
| let _ = req.sender.send(cached.data().to_decoded_frame()); | |
| } |
| let last_before = last_sent_frame | ||
| .borrow() | ||
| .as_ref() | ||
| .filter(|l| l.number <= requested_frame) | ||
| .cloned(); |
There was a problem hiding this comment.
last_sent_frame fallback looks like it can still satisfy a post-seek request with a very old frame (freeze-at-stale-content). If this is meant as “VFR hold”, it seems worth bounding it (except when you’ve positively identified a pts hole via gap_hold).
| let last_before = last_sent_frame | |
| .borrow() | |
| .as_ref() | |
| .filter(|l| l.number <= requested_frame) | |
| .cloned(); | |
| let last_before = last_sent_frame | |
| .borrow() | |
| .as_ref() | |
| .filter(|l| { | |
| l.number <= requested_frame | |
| && requested_frame.saturating_sub(l.number) | |
| <= MAX_FRAME_LOOKBACK_TOLERANCE | |
| }) | |
| .cloned(); |
| echo "## A/V sync matrix — ${{ matrix.runner }}" | ||
| echo "" | ||
| if [ -f "$REPORT" ]; then | ||
| PYTHONIOENCODING=utf-8 python3 - "$REPORT" << 'PYEOF' |
There was a problem hiding this comment.
On windows-2022 the Bash environment usually has python but not always python3. Might be worth making this pick the right binary per runner so the summary generation can’t flake.
| PYTHONIOENCODING=utf-8 python3 - "$REPORT" << 'PYEOF' | |
| PYTHON_BIN=python3 | |
| if [ "$RUNNER_OS" = "Windows" ]; then PYTHON_BIN=python; fi | |
| PYTHONIOENCODING=utf-8 "$PYTHON_BIN" - "$REPORT" << 'PYEOF' |
| if [ "$RUNNER_OS" = "macOS" ]; then | ||
| export CAP_EDITOR_FORCE_FFMPEG_DECODER=1 | ||
| fi | ||
| cargo run --locked -p cap -- --log-level info selftest playback --duration 30 --json |
There was a problem hiding this comment.
Since the comment calls out “default 30 fps”, might be worth pinning it explicitly so this can’t change silently if the CLI default ever shifts.
| cargo run --locked -p cap -- --log-level info selftest playback --duration 30 --json | |
| cargo run --locked -p cap -- --log-level info selftest playback --duration 30 --fps 30 --json |
| contents: read | ||
| steps: | ||
| - name: Checkout | ||
| uses: actions/checkout@v4 |
There was a problem hiding this comment.
actions/checkout leaves credentials persisted in the workspace
Checkout step persists GITHUB_TOKEN in git config by default.
Add persist-credentials: false to the checkout step.
AI prompt
Check if this security scanner issue is valid. If so, understand the root cause and fix it. If appropriate, update or add tests. Keep the change focused and preserve intended behavior.
<file name=".github/workflows/sync-tests.yml">
<violation number="1" location=".github/workflows/sync-tests.yml:50">
<priority>P2</priority>
<title>actions/checkout leaves credentials persisted in the workspace</title>
<evidence>The workflow uses `actions/checkout@v4` without setting `persist-credentials: false`. The default behavior keeps the GITHUB_TOKEN in the local git config, where it can be read by later steps or untrusted build scripts.</evidence>
<recommendation>Add `persist-credentials: false` to the checkout step, or explicitly set it to `true` only if the workflow needs to push back to the repository.</recommendation>
</violation>
</file>
| /// thread from its serve loop. | ||
| pub fn shutdown_main_thread_runner() { | ||
| if let Some(slot) = PATTERN_TX.get() { | ||
| slot.lock().unwrap().take(); |
There was a problem hiding this comment.
Minor robustness: lock().unwrap() here can panic if the mutex is poisoned, which would prevent the sender from being dropped and leave the main thread stuck in rx.recv().
| slot.lock().unwrap().take(); | |
| slot.lock().unwrap_or_else(|e| e.into_inner()).take(); |
| pub async fn request_pattern(spec: PatternSpec) -> Result<PatternReport, String> { | ||
| let tx = PATTERN_TX | ||
| .get() | ||
| .and_then(|slot| slot.lock().unwrap().clone()) |
There was a problem hiding this comment.
Same idea here: avoiding a poisoned-mutex panic makes the selftest failure mode cleaner.
| .and_then(|slot| slot.lock().unwrap().clone()) | |
| .and_then(|slot| Some(slot.lock().unwrap_or_else(|e| e.into_inner()).clone())) |
|
Bugbot is not enabled for your account, so this pull request was not reviewed. Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs. |
| max_abs = max_abs.max((p - s).abs()); | ||
| let rel = ((p - pts[0]) - (s - sent[0])).abs(); | ||
| max_rel = max_rel.max(rel); | ||
| if rel > rel_tolerance { |
There was a problem hiding this comment.
Minor: since rel_tolerance now varies with fps, it’d be helpful to include it (and case.fps) in the failure message so CI flakes are easier to diagnose.
| if rel > rel_tolerance { | |
| return Err(format!( | |
| "frame {i}: relative pts error {rel:.3}s exceeds tolerance {rel_tolerance:.3}s at {fps}fps (pts {p:.3}s vs sent {s:.3}s)", | |
| fps = case.fps | |
| )); |
|
Bugbot is not enabled for your account, so this pull request was not reviewed. Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs. |
| } else { | ||
| unfulfilled_count += 1; | ||
| .map(|(_, c)| c.data().clone()); | ||
| let hold_before = gap_hold.clone().filter(|h| { |
There was a problem hiding this comment.
Minor perf: gap_hold.clone().filter(...) clones every request even when the predicate fails. as_ref() keeps it borrowed until it’s actually selected.
| let hold_before = gap_hold.clone().filter(|h| { | |
| let hold_before = gap_hold.as_ref().filter(|h| { | |
| pts_holes | |
| .get(&h.number) | |
| .is_some_and(|&end| h.number <= req.frame && req.frame < end) | |
| }).cloned(); |
| .as_ref() | ||
| .map(|(n, _)| *n) | ||
| .filter(|&n| n <= requested_frame); | ||
| let hold_before = gap_hold.clone().filter(|h| { |
There was a problem hiding this comment.
Minor perf: same clone pattern here — gap_hold.clone().filter(...) eagerly clones the frame even if it gets rejected.
| let hold_before = gap_hold.clone().filter(|h| { | |
| let hold_before = gap_hold.as_ref().filter(|h| { | |
| pts_holes.get(&h.number).is_some_and(|&end| { | |
| h.number <= requested_frame && requested_frame < end | |
| }) | |
| }).cloned(); |
Fixes eight audio/video sync bugs found through end-to-end testing, and adds the infrastructure to keep sync verified continuously.
Fixes
frame_count / fps, discarding capture timestamps — every dropped/undelivered frame compressed the video timeline against audio, desyncing screen-only studio recordings by hundreds of ms with progressive driftframe_count / fps, truncating VFR recordings; now uses the real muxed timestamp spanBufferedResamplerpanicked with an integer overflow on non-integer rate ratios (44.1↔48kHz)Every finalized recording now cross-checks its display container duration against the muxed timestamp span and logs a sync-invariant violation on mismatch.
Test infrastructure
cap selftest av-sync [--mic]: ~30s end-to-end sync check on any machine — renders a flash+beep pattern, records through the real capture stack, exports, and measures offset/drift in both (with--mic, also verifies the physical input path acoustically). Beep detection uses a 1kHz bandpass with edge-triggered hysteresis, so it works while music plays on the machinecrates/recording/tests/sync_matrix.rs: synthetic device matrix — 21 curated cases (15–120fps, jitter/drops/gaps, 8–96kHz, 1/2/6 channels, both muxers) plus seed-reproducible randomized cases: random fps/resolution/content (including per-frame noise for worst-case encoder load), random sample rates (8–192kHz), 1–8 channels, real-world buffer sizes (3–90ms), source clock drift, and random gap placement — with video+audio legs running concurrently like a real recording. Real frames and samples go through the real encoders and containers and are decoded back for verification. PRs run 6 random cases; the nightly schedule runs 40 (seed printed and stored in the report for exact reproduction viaCAP_SYNC_MATRIX_SEED). This matrix found five of the eight bugs above on its first run.github/workflows/sync-tests.yml: runs the matrix and timestamp property tests on macOS/Windows/Ubuntu with a per-case job-summary table and JSON artifactsVerified: full test suites pass, all matrix cases pass on macOS, Windows and Ubuntu CI, and live selftest runs measure −2 to +18 ms offset with ~0 drift on macOS (recording, microphone, and export legs).
Greptile Summary
This PR improves A/V sync handling and adds continuous sync checks. The main changes are:
cap selftest av-synccommand and measurement pipeline.Confidence Score: 4/5
The sync fixes need follow-up before merging because changed runtime paths can hang or return wrong media frames.
apps/cli/src/main.rs, crates/recording/src/output_pipeline/core.rs, crates/rendering/src/decoder/avassetreader.rs, crates/rendering/src/decoder/ffmpeg.rs
Important Files Changed
Comments Outside Diff (1)
crates/rendering/src/decoder/ffmpeg.rs, line 826-835 (link)This treats any previous
last_sent_framewithnumber <= requested_frameas a valid VFR hold, regardless of how far it is from the request. After a long seek or decoder reset, a frame from seconds earlier can satisfy the new request and silently freeze the preview or export at stale content.Context Used: AGENTS.md (source)
Prompt To Fix With AI
Prompt To Fix All With AI
Reviews (1): Last reviewed commit: "Add randomized combined A/V cases and en..." | Re-trigger Greptile
Context used:
Follow-up hardening (second verification pass)
An adversarial re-review of this branch confirmed several defects in the fixes themselves, all now resolved:
Product fixes
pause-resumecase in the sync matrix guards this end-to-end.Selftest hardening
mad_ms); export-leg WARNs contribute a reason;--durationfloor raised to 14s so the minimum event count is always reachable.New coverage:
cap selftest playback.cap, drives the real editor machinery (EditorInstance, real decoders, real frame scheduling, real audio pipeline) with taps at both presentation boundaries — the renderer frame callback and a headless audio sink that pulls blocks on a device-like schedule through the exact production source pipeline.sync-tests.ymlalongsidecap-renderingunit tests and apermissions: contents: readblock.