Skip to content

fix(moq-video): make NVENC encode correct on hardware (forced IDR, in-band param sets, pitched input)#1997

Merged
kixelated merged 3 commits into
devfrom
nvenc-hw-validation
Jul 2, 2026
Merged

fix(moq-video): make NVENC encode correct on hardware (forced IDR, in-band param sets, pitched input)#1997
kixelated merged 3 commits into
devfrom
nvenc-hw-validation

Conversation

@kixelated

@kixelated kixelated commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Summary

Validated moq-video's NVENC backend on a Linux + NVIDIA box (RTX 3070 Ti, driver 595.71.05) for #1837. NVENC encode was compile-verified only and had never actually run on hardware; doing so surfaced three correctness bugs, all fixed here with hardware-gated regression tests.

  1. Forced keyframes were silently ignored. enable_picture_type_decision makes NVENC choose frame types and ignore the per-frame pictureType, so encode(..., keyframe=true) mid-stream produced a P frame, not an IDR. Now uses the FORCEIDR picture flag, which applies with picture-type decision on. (Masked before because the capture producer only forces a keyframe on the first frame, which is an IDR regardless.)
  2. Only the first IDR carried parameter sets. Periodic/GOP IDRs had no SPS/PPS (nor VPS for HEVC), so a subscriber joining mid-stream couldn't decode at any later keyframe — exactly what avc3/hev1 rely on. Fixed with repeatSPSPPS + idrPeriod = gop.
  3. Every frame was corrupted. NVENC's input-buffer pitch exceeds the width (512 for a 320-wide IYUV buffer, 256-aligned), but the write was flat, shearing the image. The old width % 64 == 0 guard's "aligned is safe" premise was wrong. Fixed with a pitched, plane-by-plane copy at the buffer's real stride. Also removes the width-multiple-of-64 restriction (any even width works now).

Plus a small flake doc fix: the libva devShell comment claimed moq-vaapi dlopens libva with vendored headers; moq-vaapi 0.0.2 actually links it via pkg-config and the binary carries NEEDED libva.so.2 / libva-drm.so.2 (build- and run-time dep). Verified in a fresh nix develop that the vaapi feature builds and the vaapi-linked binary loads without any LD_LIBRARY_PATH override.

Dependency

Fixes 1 and 3 need additions to the nvidia-video-codec-sdk fork (the safe wrapper exposed neither FORCEIDR nor a pitched write — it had a literal TODO: Write pitched?): a force_idr flag and BufferLock::write_rows / pitch(). Already pushed to the dynamic-loading branch (kixelated/nvidia-video-codec-sdk@5434cc8), which this crate's [patch.crates-io] tracks. A cargo update -p nvidia-video-codec-sdk picks it up.

Public API

No public API change to moq-video. decode::backend widened from private to pub(crate) (crate-internal only) so the NVENC round-trip test can decode its own output with the software decoder.

Test plan

  • LD_LIBRARY_PATH=/usr/lib64 cargo test -p moq-video --no-default-features --features nvenc — all pass on the RTX 3070 Ti (the 4 nvenc_* tests are no-ops on a GPU-less host).
  • nvenc_h264/h265_keyframes_carry_param_sets — forced mid-stream IDR carries SPS/PPS (+VPS).
  • nvenc_h264_periodic_idr_at_gop — periodic IDRs land at GOP boundaries and carry SPS/PPS.
  • nvenc_h264_pitched_write_roundtrips — encode→openh264-decode round-trip, MAE ~0.3 vs input (ffmpeg-free); independently confirmed with an ffmpeg decode at 320x240, 300x240, 1280x720.
  • clippy clean, all 18 moq-video tests pass.
  • VAAPI fallback spot-checked (this box has no VA driver): Kind::Named("vaapi") errors cleanly, Kind::Auto falls back to openh264.

Environment note (not a code bug): on a Nix dev-shell the runtime driver probe returns "not found" even with the driver present, because the nix loader doesn't search /usr/lib64 by soname (no ldconfig cache); hence LD_LIBRARY_PATH=/usr/lib64 above. A normal glibc distro resolves it via ldconfig.

(Written by Claude Opus 4.8)

🤖 Generated with Claude Code

https://claude.ai/code/session_01T68neyMTMV7j8qzJohUqfk

…-band param sets, pitched input)

Validated the NVENC backend on a Linux + NVIDIA box (RTX 3070 Ti). It had
never run correctly on hardware before, only compile-verified, and three
bugs surfaced:

- Forced keyframes were ignored. `enable_picture_type_decision` makes NVENC
  choose frame types and ignore the per-frame `pictureType`, so a forced
  `keyframe` produced a P frame, not an IDR. Use the `FORCEIDR` picture flag,
  which applies with picture-type decision on.
- Only the first IDR carried parameter sets. Periodic/GOP IDRs had no
  SPS/PPS (nor VPS for HEVC), so a subscriber joining mid-stream could not
  decode at any later keyframe. Set `repeatSPSPPS` and `idrPeriod = gop`.
- Every frame was corrupted. NVENC's input buffer pitch exceeds the width
  (512 for a 320-wide IYUV buffer), but the write was flat, shearing the
  image. Copy each plane row by row at the buffer's real pitch. This also
  removes the former width-must-be-a-multiple-of-64 restriction.

Adds four hardware-gated tests (no-ops without an NVIDIA driver): forced-IDR
parameter sets for H.264 and H.265, periodic-IDR cadence, and a pitch
round-trip that decodes the NVENC output with the in-crate openh264 decoder
(no ffmpeg). `decode::backend` is now `pub(crate)` for that round-trip.

Requires nvidia-video-codec-sdk (dynamic-loading branch) with the new
`force_idr` flag and `BufferLock::write_rows` / `pitch`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01T68neyMTMV7j8qzJohUqfk

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @kixelated, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

kixelated and others added 2 commits July 1, 2026 20:54
libva was described as dlopen'd at runtime with vendored headers. moq-vaapi
0.0.2 links libva via pkg-config at build time and the binary carries
NEEDED libva.so.2 / libva-drm.so.2, so it is a build- and run-time dep.
Verified in a fresh `nix develop`: the vaapi feature builds and the
vaapi-linked binary loads without any LD_LIBRARY_PATH override. See #1837.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01T68neyMTMV7j8qzJohUqfk
The test comment claimed "320 % 64 == 0, so the flat-write pitch
assumption holds" - a leftover from the removed width guard. There is no
flat write anymore (encode writes pitched), and 320 is just a convenient
even size; pitch correctness is covered by nvenc_h264_pitched_write_roundtrips.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01T68neyMTMV7j8qzJohUqfk
@kixelated kixelated enabled auto-merge (squash) July 2, 2026 04:13
@kixelated kixelated merged commit 3b87a3d into dev Jul 2, 2026
2 checks passed
@kixelated kixelated deleted the nvenc-hw-validation branch July 2, 2026 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant