fix(moq-video): make NVENC encode correct on hardware (forced IDR, in-band param sets, pitched input)#1997
Merged
Merged
Conversation
…-band param sets, pitched input) Validated the NVENC backend on a Linux + NVIDIA box (RTX 3070 Ti). It had never run correctly on hardware before, only compile-verified, and three bugs surfaced: - Forced keyframes were ignored. `enable_picture_type_decision` makes NVENC choose frame types and ignore the per-frame `pictureType`, so a forced `keyframe` produced a P frame, not an IDR. Use the `FORCEIDR` picture flag, which applies with picture-type decision on. - Only the first IDR carried parameter sets. Periodic/GOP IDRs had no SPS/PPS (nor VPS for HEVC), so a subscriber joining mid-stream could not decode at any later keyframe. Set `repeatSPSPPS` and `idrPeriod = gop`. - Every frame was corrupted. NVENC's input buffer pitch exceeds the width (512 for a 320-wide IYUV buffer), but the write was flat, shearing the image. Copy each plane row by row at the buffer's real pitch. This also removes the former width-must-be-a-multiple-of-64 restriction. Adds four hardware-gated tests (no-ops without an NVIDIA driver): forced-IDR parameter sets for H.264 and H.265, periodic-IDR cadence, and a pitch round-trip that decodes the NVENC output with the in-crate openh264 decoder (no ffmpeg). `decode::backend` is now `pub(crate)` for that round-trip. Requires nvidia-video-codec-sdk (dynamic-loading branch) with the new `force_idr` flag and `BufferLock::write_rows` / `pitch`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01T68neyMTMV7j8qzJohUqfk
There was a problem hiding this comment.
Sorry @kixelated, you have reached your weekly rate limit of 500000 diff characters.
Please try again later or upgrade to continue using Sourcery
libva was described as dlopen'd at runtime with vendored headers. moq-vaapi 0.0.2 links libva via pkg-config at build time and the binary carries NEEDED libva.so.2 / libva-drm.so.2, so it is a build- and run-time dep. Verified in a fresh `nix develop`: the vaapi feature builds and the vaapi-linked binary loads without any LD_LIBRARY_PATH override. See #1837. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01T68neyMTMV7j8qzJohUqfk
The test comment claimed "320 % 64 == 0, so the flat-write pitch assumption holds" - a leftover from the removed width guard. There is no flat write anymore (encode writes pitched), and 320 is just a convenient even size; pitch correctness is covered by nvenc_h264_pitched_write_roundtrips. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01T68neyMTMV7j8qzJohUqfk
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Validated
moq-video's NVENC backend on a Linux + NVIDIA box (RTX 3070 Ti, driver 595.71.05) for #1837. NVENC encode was compile-verified only and had never actually run on hardware; doing so surfaced three correctness bugs, all fixed here with hardware-gated regression tests.enable_picture_type_decisionmakes NVENC choose frame types and ignore the per-framepictureType, soencode(..., keyframe=true)mid-stream produced a P frame, not an IDR. Now uses theFORCEIDRpicture flag, which applies with picture-type decision on. (Masked before because the capture producer only forces a keyframe on the first frame, which is an IDR regardless.)repeatSPSPPS+idrPeriod = gop.width % 64 == 0guard's "aligned is safe" premise was wrong. Fixed with a pitched, plane-by-plane copy at the buffer's real stride. Also removes the width-multiple-of-64 restriction (any even width works now).Plus a small flake doc fix: the
libvadevShell comment claimed moq-vaapi dlopens libva with vendored headers; moq-vaapi 0.0.2 actually links it via pkg-config and the binary carriesNEEDED libva.so.2/libva-drm.so.2(build- and run-time dep). Verified in a freshnix developthat the vaapi feature builds and the vaapi-linked binary loads without anyLD_LIBRARY_PATHoverride.Dependency
Fixes 1 and 3 need additions to the
nvidia-video-codec-sdkfork (the safe wrapper exposed neitherFORCEIDRnor a pitched write — it had a literalTODO: Write pitched?): aforce_idrflag andBufferLock::write_rows/pitch(). Already pushed to thedynamic-loadingbranch (kixelated/nvidia-video-codec-sdk@5434cc8), which this crate's[patch.crates-io]tracks. Acargo update -p nvidia-video-codec-sdkpicks it up.Public API
No public API change to
moq-video.decode::backendwidened from private topub(crate)(crate-internal only) so the NVENC round-trip test can decode its own output with the software decoder.Test plan
LD_LIBRARY_PATH=/usr/lib64 cargo test -p moq-video --no-default-features --features nvenc— all pass on the RTX 3070 Ti (the 4nvenc_*tests are no-ops on a GPU-less host).nvenc_h264/h265_keyframes_carry_param_sets— forced mid-stream IDR carries SPS/PPS (+VPS).nvenc_h264_periodic_idr_at_gop— periodic IDRs land at GOP boundaries and carry SPS/PPS.nvenc_h264_pitched_write_roundtrips— encode→openh264-decode round-trip, MAE ~0.3 vs input (ffmpeg-free); independently confirmed with an ffmpeg decode at 320x240, 300x240, 1280x720.Kind::Named("vaapi")errors cleanly,Kind::Autofalls back to openh264.Environment note (not a code bug): on a Nix dev-shell the runtime driver probe returns "not found" even with the driver present, because the nix loader doesn't search
/usr/lib64by soname (noldconfigcache); henceLD_LIBRARY_PATH=/usr/lib64above. A normal glibc distro resolves it vialdconfig.(Written by Claude Opus 4.8)
🤖 Generated with Claude Code
https://claude.ai/code/session_01T68neyMTMV7j8qzJohUqfk