Conversation
rppt
pushed a commit
that referenced
this pull request
Jun 2, 2026
The generic/642 test-case can reproduce the kernel crash: [40243.605254] ------------[ cut here ]------------ [40243.605956] kernel BUG at fs/ceph/xattr.c:918! [40243.607142] Oops: invalid opcode: 0000 [#1] SMP PTI [40243.608067] CPU: 7 UID: 0 PID: 498762 Comm: kworker/7:1 Not tainted 7.0.0-rc7+ #3 PREEMPT(full) [40243.609700] Hardware name: QEMU Ubuntu 25.10 PC v2 (i440FX + PIIX, + 10.1 machine, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [40243.611820] Workqueue: ceph-msgr ceph_con_workfn [40243.612715] RIP: 0010:__ceph_build_xattrs_blob+0x1b8/0x1e0 [40243.613731] Code: 0f 84 82 fe ff ff e9 cf 8e 56 ff 48 8d 65 e8 31 c0 5b 41 5c 41 5d 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 c3 cc cc cc cc <0f> 0b 4c 8b 62 08 41 8b 85 24 07 00 00 49 83 c4 04 41 89 44 24 fc [40243.616888] RSP: 0018:ffffcc80c4d4b688 EFLAGS: 00010287 [40243.617773] RAX: 0000000000010026 RBX: 0000000000000001 RCX: 0000000000000000 [40243.618928] RDX: ffff8a773798dee0 RSI: 0000000000000000 RDI: 0000000000000000 [40243.620158] RBP: ffffcc80c4d4b6a0 R08: 0000000000000000 R09: 0000000000000000 [40243.621573] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a75f3b58000 [40243.622907] R13: ffff8a75f3b58000 R14: 0000000000000080 R15: 000000000000bffd [40243.624054] FS: 0000000000000000(0000) GS:ffff8a787d1b4000(0000) knlGS:0000000000000000 [40243.625331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [40243.626269] CR2: 000072f390b623c0 CR3: 000000011c02a003 CR4: 0000000000372ef0 [40243.627408] Call Trace: [40243.627839] <TASK> [40243.628188] __prep_cap+0x3fd/0x4a0 [40243.628789] ? do_raw_spin_unlock+0x4e/0xe0 [40243.629474] ceph_check_caps+0x46a/0xc80 [40243.630094] ? __lock_acquire+0x4a2/0x2650 [40243.630773] ? find_held_lock+0x31/0x90 [40243.631347] ? handle_cap_grant+0x79f/0x1060 [40243.632068] ? lock_release+0xd9/0x300 [40243.632696] ? __mutex_unlock_slowpath+0x3e/0x340 [40243.633429] ? lock_release+0xd9/0x300 [40243.634052] handle_cap_grant+0xcf6/0x1060 [40243.634745] ceph_handle_caps+0x122b/0x2110 [40243.635415] mds_dispatch+0x5bd/0x2160 [40243.636034] ? ceph_con_process_message+0x65/0x190 [40243.636828] ? lock_release+0xd9/0x300 [40243.637431] ceph_con_process_message+0x7a/0x190 [40243.638184] ? kfree+0x311/0x4f0 [40243.638749] ? kfree+0x311/0x4f0 [40243.639268] process_message+0x16/0x1a0 [40243.639915] ? sg_free_table+0x39/0x90 [40243.640572] ceph_con_v2_try_read+0xf58/0x2120 [40243.641255] ? lock_acquire+0xc8/0x300 [40243.641863] ceph_con_workfn+0x151/0x820 [40243.642493] process_one_work+0x22f/0x630 [40243.643093] ? process_one_work+0x254/0x630 [40243.643770] worker_thread+0x1e2/0x400 [40243.644332] ? __pfx_worker_thread+0x10/0x10 [40243.645020] kthread+0x109/0x140 [40243.645560] ? __pfx_kthread+0x10/0x10 [40243.646125] ret_from_fork+0x3f8/0x480 [40243.646752] ? __pfx_kthread+0x10/0x10 [40243.647316] ? __pfx_kthread+0x10/0x10 [40243.647919] ret_from_fork_asm+0x1a/0x30 [40243.648556] </TASK> [40243.648902] Modules linked in: overlay hctr2 libpolyval chacha libchacha adiantum libnh libpoly1305 essiv intel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit kvm_intel kvm irqbypass joydev ghash_clmulni_intel aesni_intel rapl input_leds mac_hid psmouse vga16fb serio_raw vgastate floppy i2c_piix4 pata_acpi bochs qemu_fw_cfg i2c_smbus sch_fq_codel rbd dm_crypt msr parport_pc ppdev lp parport efi_pstore [40243.654766] ---[ end trace 0000000000000000 ]--- Commit d93231a ("ceph: prevent a client from exceeding the MDS maximum xattr size") moved the required_blob_size computation to before the __build_xattrs() call, introducing a race. __build_xattrs() releases and reacquires i_ceph_lock during execution. In that window, handle_cap_grant() may update i_xattrs.blob with a newer MDS-provided blob and bump i_xattrs.version. When __build_xattrs() detects that index_version < version, it destroys and rebuilds the entire xattr rb-tree from the new blob, potentially increasing count, names_size, and vals_size. The prealloc_blob size check that follows still uses the stale required_blob_size computed before the rebuild, so it passes even when prealloc_blob is too small for the now-larger tree. After __set_xattr() adds one more xattr on top, __ceph_build_xattrs_blob() is called from the cap flush path and hits: BUG_ON(need > ci->i_xattrs.prealloc_blob->alloc_len); Fix this by recomputing required_blob_size after __build_xattrs() returns, using the current tree state. Also re-validate against m_max_xattr_size to fall back to the sync path if the rebuilt tree now exceeds the MDS limit. Cc: stable@vger.kernel.org Fixes: d93231a ("ceph: prevent a client from exceeding the MDS maximum xattr size") Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
rppt
pushed a commit
that referenced
this pull request
Jun 2, 2026
…kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.1, take #3 - Fix ITS EventID sanitisation when restoring an interrupt translation table. - Fix PPI memory leak when failing to initialise a vcpu. - Correctly return an error when the validation of a hypervisor trace descriptor fails, and limit this validation to protected mode only.
rppt
pushed a commit
that referenced
this pull request
Jun 14, 2026
Since the start of the git history, brport_store() always acquired the
bridge lock. Back then this decision made sense: The bridge lock
protects the STP state of the bridge and its ports and at that time the
function was only used by two STP related attributes (cost and
priority).
Nowadays, brport_store() processes a lot more attributes and most of
them do not need the bridge lock:
* Bridge flags: Only require RTNL. Read locklessly by the data path.
Annotations can be added in net-next.
* FDB port flushing: Only requires the FDB lock.
* Multicast attributes: Only require the multicast lock.
* Group forward mask: Only requires RTNL. Read locklessly by the data
path. Annotations can be added in net-next.
* Backup port: Only requires RTNL. Read locklessly by the data path.
This is a problem as the bridge calls dev_set_promiscuity() when certain
bridge port flags change and this function can sleep since the commit
cited below, resulting in a splat such as [1].
Fix this by reducing the scope of the bridge lock and only take it when
processing the two STP related attributes that require it. Remove the
now stale comment from br_switchdev_set_port_flag(). The
SWITCHDEV_F_DEFER flag can be removed in net-next.
[1]
BUG: sleeping function called from invalid context at net/core/dev_addr_lists.c:1262
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 372, name: bash
preempt_count: 201, expected: 0
RCU nest depth: 0, expected: 0
5 locks held by bash/372:
#0: ffff88810c51c3f0 (sb_writers#7){.+.+}-{0:0}, at: ksys_write (fs/read_write.c:740)
#1: ffff888115ce9480 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter (fs/kernfs/file.c:343)
#2: ffff88810b9fd330 (kn->active#37){.+.+}-{0:0}, at: kernfs_fop_write_iter (fs/kernfs/file.c:80 fs/kernfs/file.c:344)
#3: ffffffffa59473a0 (rtnl_mutex){+.+.}-{4:4}, at: brport_store (net/bridge/br_sysfs_if.c:326)
#4: ffff8881099d2d58 (&br->lock){+...}-{3:3}, at: brport_store (./include/linux/spinlock.h:348 net/bridge/br_sysfs_if.c:345)
Preemption disabled at:
0x0
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
__might_resched.cold (kernel/sched/core.c:9163)
netif_rx_mode_run (net/core/dev_addr_lists.c:1262)
netif_rx_mode_sync (net/core/dev_addr_lists.c:1428)
dev_set_promiscuity (net/core/dev_api.c:289)
br_manage_promisc (net/bridge/br_if.c:135 net/bridge/br_if.c:172)
br_port_flags_change (net/bridge/br_if.c:242 net/bridge/br_if.c:747)
store_learning (net/bridge/br_sysfs_if.c:79 net/bridge/br_sysfs_if.c:235)
brport_store (net/bridge/br_sysfs_if.c:346)
kernfs_fop_write_iter (fs/kernfs/file.c:352)
new_sync_write (fs/read_write.c:595)
vfs_write (fs/read_write.c:688)
ksys_write (fs/read_write.c:740)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
Fixes: 78cd408 ("net: add missing instance lock to dev_set_promiscuity")
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260526064818.272516-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rppt
pushed a commit
that referenced
this pull request
Jun 14, 2026
Ido Schimmel says: ==================== bridge: Fix sleep in atomic context Under certain circumstances the bridge driver can call dev_set_promiscuity() while holding the bridge spin lock. This is a problem as dev_set_promiscuity() might sleep. Patches #1-#2 fix the problem in the netlink and sysfs configuration paths by only taking the lock where it is actually needed, thereby avoiding calling dev_set_promiscuity() from an atomic context. Patch #3 adds test cases for both configuration paths in rtnetlink.sh which already includes test cases for similar issues. Note that dev_set_promiscuity() can sleep either when it takes the net device mutex or when calling netif_rx_mode_sync(). I encountered the problem with the latter, but blamed the former since it came earlier. ==================== Link: https://patch.msgid.link/20260526064818.272516-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rppt
pushed a commit
that referenced
this pull request
Jun 14, 2026
Revisit and reorganize the locking and lock coverage of the ap->lock spinlock as used in the two sysfs functions se_bind_store() and se_associate_store(). A kernel run reported a possible deadlock situation, caused by holding the spinlock (ap->lock) while triggering a uevent. The fix rearranges the code protected by the spinlock by excluding the uevent invocation, which does not require protection. Additionally, the start of the protected region is moved earlier to cover more lines, ensuring a consistent view of the AP queue state between reading and updating its struct fields. ===================================================== WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected 7.1.0-20260601.rc6.git12.516b5dbd4d4a.300.fc44.s390x+debug #1 Not tainted ----------------------------------------------------- setupseguest.sh/11034 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire: 000001c991f498e8 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc_cache_noprof+0x5a/0x6d0 and this task is already holding: 000000c4a1a12378 (&aq->lock){+.-.}-{2:2}, at: se_bind_store+0x96/0x3a0 which would create a new lock dependency: (&aq->lock){+.-.}-{2:2} -> (fs_reclaim){+.+.}-{0:0} but this new dependency connects a SOFTIRQ-irq-safe lock: (&aq->lock){+.-.}-{2:2} ... which became SOFTIRQ-irq-safe at: __lock_acquire+0x5ae/0x15a0 lock_acquire+0x14c/0x400 _raw_spin_lock_bh+0x58/0xb0 ap_tasklet_fn+0x72/0xd0 tasklet_action_common+0x174/0x1b0 handle_softirqs+0x180/0x5c0 irq_exit_rcu+0x196/0x200 do_ext_irq+0x12a/0x4d0 ext_int_handler+0xc6/0xf0 folio_zero_user+0x1c6/0x240 folio_zero_user+0x182/0x240 vma_alloc_anon_folio_pmd+0xa0/0x1d0 __do_huge_pmd_anonymous_page+0x3a/0x200 __handle_mm_fault+0x56c/0x590 handle_mm_fault+0xa2/0x370 do_exception+0x292/0x590 __do_pgm_check+0x136/0x3e0 pgm_check_handler+0x114/0x160 to a SOFTIRQ-irq-unsafe lock: (fs_reclaim){+.+.}-{0:0} ... which became SOFTIRQ-irq-unsafe at: ... __lock_acquire+0x5ae/0x15a0 lock_acquire+0x14c/0x400 __fs_reclaim_acquire+0x44/0x50 fs_reclaim_acquire+0xbe/0x100 fs_reclaim_correct_nesting+0x20/0x70 dotest+0x5e/0x148 locking_selftest+0x2854/0x2a88 start_kernel+0x3b2/0x4f0 startup_continue+0x2e/0x40 other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); local_irq_disable(); lock(&aq->lock); lock(fs_reclaim); <Interrupt> lock(&aq->lock); *** DEADLOCK *** 4 locks held by setupseguest.sh/11034: #0: 000000c485d01440 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0x2fc/0x380 #1: 000000c4d2283288 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x12a0x270 #2: 000000c4a1830e48 (kn->active#172){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x1e/0x270 #3: 000000c4a1a12378 (&aq->lock){+.-.}-{2:2}, at: se_bind_store+0x96/0x3a0 the dependencies between SOFTIRQ-irq-safe lock and the holding lock: -> (&aq->lock){+.-.}-{2:2} { HARDIRQ-ON-W at: __lock_acquire+0x5ae/0x15a0 lock_acquire+0x14c/0x400 _raw_spin_lock_bh+0x58/0xb0 ap_queue_init_state+0x2e/0x50 ap_scan_domains+0x5d6/0x620 ap_scan_adapter+0x4c0/0x810 ap_scan_bus+0x70/0x350 ap_scan_bus_wq_callback+0x56/0x80 process_one_work+0x2ba/0x820 worker_thread+0x21a/0x400 kthread+0x164/0x190 __ret_from_fork+0x4c/0x340 ret_from_fork+0xa/0x30 IN-SOFTIRQ-W at: __lock_acquire+0x5ae/0x15a0 lock_acquire+0x14c/0x400 _raw_spin_lock_bh+0x58/0xb0 ap_tasklet_fn+0x72/0xd0 tasklet_action_common+0x174/0x1b0 handle_softirqs+0x180/0x5c0 irq_exit_rcu+0x196/0x200 do_ext_irq+0x12a/0x4d0 ext_int_handler+0xc6/0xf0 folio_zero_user+0x1c6/0x240 folio_zero_user+0x182/0x240 vma_alloc_anon_folio_pmd+0xa0/0x1d0 __do_huge_pmd_anonymous_page+0x3a/0x200 __handle_mm_fault+0x56c/0x590 handle_mm_fault+0xa2/0x370 do_exception+0x292/0x590 __do_pgm_check+0x136/0x3e0 pgm_check_handler+0x114/0x160 INITIAL USE at: __lock_acquire+0x5ae/0x15a0 lock_acquire+0x14c/0x400 _raw_spin_lock_bh+0x58/0xb0 ap_queue_init_state+0x2e/0x50 ap_scan_domains+0x5d6/0x620 ap_scan_adapter+0x4c0/0x810 ap_scan_bus+0x70/0x350 ap_scan_bus_wq_callback+0x56/0x80 process_one_work+0x2ba/0x820 worker_thread+0x21a/0x400 kthread+0x164/0x190 __ret_from_fork+0x4c/0x340 ret_from_fork+0xa/0x30 } ... key at: [<000001c9936e8aa0>] __key.7+0x0/0x10 the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock: -> (fs_reclaim){+.+.}-{0:0} { HARDIRQ-ON-W at: __lock_acquire+0x5ae/0x15a0 lock_acquire+0x14c/0x400 __fs_reclaim_acquire+0x44/0x50 fs_reclaim_acquire+0xbe/0x100 fs_reclaim_correct_nesting+0x20/0x70 dotest+0x5e/0x148 locking_selftest+0x2854/0x2a88 start_kernel+0x3b2/0x4f0 startup_continue+0x2e/0x40 SOFTIRQ-ON-W at: __lock_acquire+0x5ae/0x15a0 lock_acquire+0x14c/0x400 __fs_reclaim_acquire+0x44/0x50 fs_reclaim_acquire+0xbe/0x100 fs_reclaim_correct_nesting+0x20/0x70 dotest+0x5e/0x148 locking_selftest+0x2854/0x2a88 start_kernel+0x3b2/0x4f0 startup_continue+0x2e/0x40 INITIAL USE at: __lock_acquire+0x5ae/0x15a0 lock_acquire+0x14c/0x400 __fs_reclaim_acquire+0x44/0x50 fs_reclaim_acquire+0xbe/0x100 fs_reclaim_correct_nesting+0x20/0x70 dotest+0x5e/0x148 locking_selftest+0x2854/0x2a88 start_kernel+0x3b2/0x4f0 startup_continue+0x2e/0x40 } ... key at: [<000001c991f498e8>] __fs_reclaim_map+0x0/0x30 ... acquired at: check_prev_add+0x178/0xf40 __lock_acquire+0x12aa/0x15a0 lock_acquire+0x14c/0x400 __fs_reclaim_acquire+0x44/0x50 fs_reclaim_acquire+0xbe/0x100 __kmalloc_cache_noprof+0x5a/0x6d0 kobject_uevent_env+0xd4/0x420 ap_send_se_bind_uevent+0x48/0x70 se_bind_store+0x146/0x3a0 kernfs_fop_write_iter+0x18c/0x270 vfs_write+0x23c/0x380 ksys_write+0x88/0x120 __do_syscall+0x170/0x750 system_call+0x72/0x90 stack backtrace: CPU: 6 UID: 0 PID: 11034 Comm: setupseguest.sh Not tainted 7.1.0-20260601.rc6.git2.516b5dbd4d4a.300.fc44.s390x+debug #1 PREEMPT Hardware name: IBM 9175 ME1 701 (KVM/Linux) Call Trace: [<000001c98ffa0a7e>] dump_stack_lvl+0xae/0x108 [<000001c9900a6d7a>] print_bad_irq_dependency+0x47a/0x480 [<000001c9900a7184>] check_irq_usage+0x404/0x4c0 [<000001c9900a73b8>] check_prev_add+0x178/0xf40 [<000001c9900aaf1a>] __lock_acquire+0x12aa/0x15a0 [<000001c9900ab35c>] lock_acquire+0x14c/0x400 [<000001c9903be454>] __fs_reclaim_acquire+0x44/0x50 [<000001c9903be51e>] fs_reclaim_acquire+0xbe/0x100 [<000001c9903cf4ca>] __kmalloc_cache_noprof+0x5a/0x6d0 [<000001c9910ca9d4>] kobject_uevent_env+0xd4/0x420 [<000001c990d84098>] ap_send_se_bind_uevent+0x48/0x70 [<000001c990d87416>] se_bind_store+0x146/0x3a0 [<000001c99057da7c>] kernfs_fop_write_iter+0x18c/0x270 [<000001c99047712c>] vfs_write+0x23c/0x380 [<000001c990477438>] ksys_write+0x88/0x120 [<000001c9910f64e0>] __do_syscall+0x170/0x750 [<000001c99110a412>] system_call+0x72/0x90 INFO: lockdep is turned off. Fixes: 4179c39 ("s390/ap: Implement SE bind and associate uevents") Reported-by: Ingo Franzki <ifranzki@linux.ibm.com> Suggested-by: Finn Callies <fcallies@linux.ibm.com> Reviewed-by: Finn Callies <fcallies@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
rppt
pushed a commit
that referenced
this pull request
Jun 14, 2026
vntb_epf_peer_db_set() raises an MSI interrupt to notify the RC side of a doorbell event. pci_epc_raise_irq(..., PCI_IRQ_MSI, interrupt_num) takes a 1-based MSI interrupt number. The ntb_hw_epf driver reserves MSI #1 for link events, so doorbells would naturally start at MSI #2 (doorbell bit 0 -> MSI #2). However, pci-epf-vntb has historically applied an extra offset and mapped doorbell bit 0 to MSI #3. This matches the legacy behavior of ntb_hw_epf and has been preserved since commit e35f56b ("PCI: endpoint: Support NTB transfer between RC and EP"). This offset has not surfaced as a functional issue because: - ntb_hw_epf typically allocates enough MSI vectors, so the off-by-one still hits a valid MSI vector, and - ntb_hw_epf does not implement .db_vector_count()/.db_vector_mask(), so client drivers such as ntb_transport effectively ignore the vector number and schedule all QPs. Correcting the MSI number would break interoperability with peers running older kernels. Document the legacy offset to avoid confusion when enabling per-db-vector handling. Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260513024923.451765-2-den@valinux.co.jp
rppt
pushed a commit
that referenced
this pull request
Jun 14, 2026
…ernel/git/ath/ath Jeff Johnson says: ================== ath.git patches for v7.2 (PR #3) In ath12k, add driver support for WDS mode. In ath11k and ath12k, a number of cleanups and minor bug fixes. ================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
rppt
pushed a commit
that referenced
this pull request
Jun 14, 2026
strset_add_str_mem() might reallocate the strset data buffer in order to accommodate the provided string 's'. However, if 's' points to a string already present in the buffer, it becomes dangling after the realloc. This leads to a use-after-free when attempting to memcpy() the string into the new buffer. One scenario that triggers this problematic path is when resolve_btfids attempts to patch kfunc prototypes using existing BTF parameter names: | resolve_btfids: function bpf_list_push_back_impl already exists in BTF | Segmentation fault (core dumped) Compiling resolve_btfids with fsanitize=address generates a detailed report of the UAF: | ================================================================= | ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 | ==1507892==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 at pc 0x55d25155a2a8 bp 0x7ffcef879060 sp 0x7ffcef878818 | READ of size 5 at 0x7f4c4a500bd4 thread T0 | #0 0x55d25155a2a7 in memcpy (tools/bpf/resolve_btfids/resolve_btfids+0xcf2a7) | #1 0x55d2515d708e in strset__add_str tools/lib/bpf/strset.c:162:2 | #2 0x55d2515c730b in btf__add_str tools/lib/bpf/btf.c:2109:8 | #3 0x55d2515c9020 in btf__add_func_param tools/lib/bpf/btf.c:3108:14 | #4 0x55d25159f0b5 in process_kfunc_with_implicit_args tools/bpf/resolve_btfids/main.c:1196:9 | #5 0x55d25159e004 in btf2btf tools/bpf/resolve_btfids/main.c:1229:9 | #6 0x55d25159cee7 in main tools/bpf/resolve_btfids/main.c:1535:6 | #7 0x7f4c78e29f76 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 | #8 0x7f4c78e2a026 in __libc_start_main csu/../csu/libc-start.c:360:3 | #9 0x55d2514bb860 in _start (tools/bpf/resolve_btfids/resolve_btfids+0x30860) | | 0x7f4c4a500bd4 is located 13268 bytes inside of 2829000-byte region [0x7f4c4a4fd800,0x7f4c4a7b02c8) | freed by thread T0 here: | #0 0x55d25155b700 in realloc (tools/bpf/resolve_btfids/resolve_btfids+0xd0700) | #1 0x55d2515c426c in libbpf_reallocarray tools/lib/bpf/./libbpf_internal.h:220:9 | #2 0x55d2515c426c in libbpf_add_mem tools/lib/bpf/btf.c:224:13 | | previously allocated by thread T0 here: | #0 0x55d25155b2e3 in malloc (tools/bpf/resolve_btfids/resolve_btfids+0xd02e3) | #1 0x55d2515d6e7d in strset__new tools/lib/bpf/strset.c:58:20 While resolve_btfids could be refactored to avoid this call path, let's instead fix this issue at the source in strset__add_str() and avoid similar scenarios. Let's check if set->strs_data was reallocated and whether 's' points to an internal string within the old strset buffer. In such case, 's' is reconstructed to point to the new buffer. While already here, also fix strset__find_str() which suffers from the same problem by factoring out the common operations into a new helper function strset_str_append(). Fixes: 90d76d3 ("libbpf: Extract internal set-of-strings datastructure APIs") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Suggested-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260523162722.2718940-1-cmllamas@google.com
rppt
pushed a commit
that referenced
this pull request
Jun 14, 2026
The Sennheiser MOMENTUM 3 is a wireless around-ear headphones featuring ANC, which can be connected via Bluetooth or USB-C. When connecting via USB-C, its UAC mixer works fine and precisely corresponds to the reported dB range. However, the mixer's volume GET_CUR method is somehow stubbed and returns a constant value (15dB). Since commit 86aa1ea ("ALSA: usb-audio: Do not expose sticky mixers"), the sticky check considers the mixer to be sticky and unnecessarily disables the mixer. Add a quirk table entry matching VID/PID=0x1377/0x6004 and applying the MIXER_GET_CUR_BROKEN quirk flag, so that the mixer is usable again. Quirky device sample: usb 7-1.4.4.1.1.1: new full-speed USB device number 30 using xhci_hcd usb 7-1.4.4.1.1.1: New USB device found, idVendor=1377, idProduct=6004, bcdDevice=38.85 usb 7-1.4.4.1.1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3 usb 7-1.4.4.1.1.1: Product: MOMENTUM 3 usb 7-1.4.4.1.1.1: Manufacturer: Sennheiser electronic GmbH & Co. KG usb 7-1.4.4.1.1.1: SerialNumber: <REDACTED> usb 7-1.4.4.1.1.1: Found last interface = 0 usb 7-1.4.4.1.1.1: 1:1: add audio endpoint 0x3 usb 7-1.4.4.1.1.1: Creating new data endpoint #3 usb 7-1.4.4.1.1.1: 1:1 Set sample rate 48000, clock 0 usb 7-1.4.4.1.1.1: 6:0: sticky mixer values (0/11520/768 => 3840), disabling usb 7-1.4.4.1.1.1: [6] FU [PCM Playback Volume] skipped due to invalid volume input: Sennheiser electronic GmbH & Co. KG MOMENTUM 3 as /devices/pci0000:00/0000:00:08.3/0000:67:00.4/usb7/7-1/7-1.4/7-1.4.4/7-1.4.4.1/7-1.4.4.1.1/7-1.4.4.1.1.1/7-1.4.4.1.1.1:1.2/0003:1377:6004.002B/input/input208 input: Sennheiser electronic GmbH & Co. KG MOMENTUM 3 Consumer Control as /devices/pci0000:00/0000:00:08.3/0000:67:00.4/usb7/7-1/7-1.4/7-1.4.4/7-1.4.4.1/7-1.4.4.1.1/7-1.4.4.1.1.1/7-1.4.4.1.1.1:1.2/0003:1377:6004.002B/input/input209 hid-generic 0003:1377:6004.002B: input,hiddev99,hidraw12: USB HID v1.11 Device [Sennheiser electronic GmbH & Co. KG MOMENTUM 3] on usb-0000:67:00.4-1.4.4.1.1.1/input2 Signed-off-by: Rong Zhang <i@rong.moe> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20260531-uac-quirk-get-cur-vol-v4-2-ede643dca151@rong.moe
rppt
pushed a commit
that referenced
this pull request
Jun 14, 2026
Richard Fitzgerald <rf@opensource.cirrus.com> says: These are for-next. They are not urgent because it only leaks memory if the driver failed to component_probe or is removed, which wouldn't happen in normal use. This series fixes some memory leaks: - The memory allocated by wm_adsp/cs_dsp was not freed. - If component_probe() failed it didn't clean up. The addition of this cleanup in patch #3 exposes an existing possible double-free of the debugfs, which is fixed in patch #2. Link: https://patch.msgid.link/20260610093432.557375-1-rf@opensource.cirrus.com
rppt
pushed a commit
that referenced
this pull request
Jun 14, 2026
A deadlock occurs in the audit subsystem when duplicating executable-related rules. When a file is moved (e.g., via do_renameat2()), the VFS layer locks the parent directory (I_MUTEX_PARENT), which synchronously triggers an fsnotify_move event. If an existing executable audit rule matches the file being moved, the audit subsystem catches this event and calls audit_dupe_exe() to duplicate the watch and update the rule. Then, audit_alloc_mark() would call kern_path_parent() to resolve the path, leading to a blind attempt to acquire the exact same I_MUTEX_PARENT lock already held by the task, resulting in the following recursive locking deadlock: ============================================ WARNING: possible recursive locking detected 6.12.0-55.27.1.el10_0.x86_64+debug #1 Not tainted -------------------------------------------- mv/5099 is trying to acquire lock: ffff888132845358 (&inode->i_sb->s_type->i_mutex_dir_key/1){+.+.}-{3:3}, at: __kern_path_locked+0x10a/0x2f0 but task is already holding lock: ffff888132846b58 (&inode->i_sb->s_type->i_mutex_dir_key/1){+.+.}-{3:3}, at: lock_two_directories+0x13f/0x2b0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&inode->i_sb->s_type->i_mutex_dir_key/1); lock(&inode->i_sb->s_type->i_mutex_dir_key/1); *** DEADLOCK *** May be due to missing lock nesting notation 6 locks held by mv/5099: #0: ffff888112a9c440 (sb_writers#13) at: do_renameat2+0x34c/0xbc0 #1: ffff888112a9c790 (&type->s_vfs_rename_key#3) at: do_renameat2+0x415/0xbc0 #2: ffff888132846b58 (&inode->i_sb->s_type->i_mutex_dir_key/1) at: lock_two_directories+0x13f/0x2b0 #3: ffff888132845358 (&inode->i_sb->s_type->i_mutex_dir_key/5) at: lock_two_directories+0x175/0x2b0 #4: ffffffffb3a1fb10 (&fsnotify_mark_srcu) at: fsnotify+0x454/0x28a0 #5: ffffffffaf886230 (audit_filter_mutex) at: audit_update_watch+0x36/0x11e0 stack backtrace: Call Trace: <TASK> dump_stack_lvl+0x6f/0xb0 print_deadlock_bug.cold+0xbd/0xca validate_chain+0x83a/0xf00 __lock_acquire+0xcac/0x1d20 lock_acquire.part.0+0x11b/0x360 down_write_nested+0x9f/0x230 __kern_path_locked+0x10a/0x2f0 kern_path_locked+0x26/0x40 audit_alloc_mark+0xfb/0x4f0 audit_dupe_exe+0x6c/0xe0 audit_dupe_rule+0x6c2/0xc00 audit_update_watch+0x4cc/0x11e0 audit_watch_handle_event+0x12c/0x1b0 send_to_group+0x5d0/0x8b0 fsnotify+0x615/0x28a0 fsnotify_move+0x1d8/0x630 vfs_rename+0xdcd/0x1df0 do_renameat2+0x9d4/0xbc0 __x64_sys_renameat+0x192/0x260 do_syscall_64+0x92/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f0491fe8c4e Code: 0f 1f 40 00 48 8b 15 c1 e1 16 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 08 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 89 RSP: 002b:00007ffc7210bf38 EFLAGS: 00000246 ORIG_RAX: 0000000000000108 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0491fe8c4e RDX: 0000000000000003 RSI: 00007ffc7210e6c8 RDI: 00000000ffffff9c RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 R10: 00005575eb2dae2a R11: 0000000000000246 R12: 00005575eb2dae2a R13: 00007ffc7210e6c8 R14: 0000000000000003 R15: 00000000ffffff9c </TASK> The aforementioned deadlock can be consistently reproduced by running the script below: audit-dupe-exe-deadlock.sh -------------------------- #!/bin/bash auditctl -D mkdir -p /tmp/foo touch /tmp/file auditctl -a always,exit -F exe=/tmp/file -F path=/tmp/file -S all -k dr mv /tmp/file /tmp/foo/file rm -Rf /tmp/foo This patch fixes the issue by introducing struct audit_watch_ctx to pass the fsnotify event context down to audit_alloc_mark(). By utilizing the already-resolved directory inode provided by the event, we bypass the kern_path_parent() path resolution entirely, safely avoiding the recursive lock. Furthermore, it explicitly allows duplicate fsnotify marks (allow_dups = 1) during the rename update, allowing the new rule's mark to safely coexist with the old rule's mark until the old rule is freed. P.S.: This issue was identified and reproduced during a comprehensive code coverage analysis of the audit subsystem. The full report is available at the link below: https://people.redhat.com/rrobaina/audit-code-coverage-analysis.pdf P.P.S: With the permission of both Ricardo and Nathan, I've squashed a fixup patch from Nathan that addresses a compile time error when CONFIG_AUDITSYSCALL=n. Cc: stable@kernel.org Fixes: 34d99af ("audit: implement audit by executable") Acked-by: Waiman Long <longman@redhat.com> Acked-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ricardo Robaina <rrobaina@redhat.com> [PM: my link metadata into the msg, apply fix from NC] Signed-off-by: Paul Moore <paul@paul-moore.com>
rppt
pushed a commit
that referenced
this pull request
Jun 14, 2026
Unlocking robust non-PI futexes happens in user space with the following sequence: 1) robust_list_set_op_pending(mutex); 2) robust_list_remove(mutex); lval = 0; 3) lval = atomic_xchg(lock, lval); 4) if (lval & WAITERS) 5) sys_futex(WAKE,....); 6) robust_list_clear_op_pending(); That opens a window between #3 and #6 where the mutex could be acquired by some other task which observes that it is the last user and: A) unmaps the mutex memory B) maps a different file, which ends up covering the same address When the original task exits before reaching #6 then the kernel robust list handling observes the pending op entry and tries to fix up user space. In case that the newly mapped data contains the TID of the exiting thread at the address of the mutex/futex the kernel will set the owner died bit in that memory and therefore corrupting unrelated data. PI futexes have a similar problem both for the non-contented user space unlock and the in kernel unlock: 1) robust_list_set_op_pending(mutex); 2) robust_list_remove(mutex); lval = gettid(); 3) if (!atomic_try_cmpxchg(lock, lval, 0)) 4) sys_futex(UNLOCK_PI,....); 5) robust_list_clear_op_pending(); Address the first part of the problem where the futexes have waiters and need to enter the kernel anyway. Add a new FUTEX_ROBUST_UNLOCK flag, which is valid for the sys_futex() FUTEX_UNLOCK_PI, FUTEX_WAKE, FUTEX_WAKE_BITSET operations. This deliberately omits FUTEX_WAKE_OP from this treatment as it's unclear whether this is needed and there is no usage of it in glibc either to investigate. For the futex2 syscall family this needs to be implemented with a new syscall. The sys_futex() case [ab]uses the @uaddr2 argument to hand the pointer to robust_list_head::list_pending_op into the kernel. This argument is only evaluated when the FUTEX_ROBUST_UNLOCK bit is set and is therefore backward compatible. This is an explicit argument to avoid the lookup of the robust list pointer and retrieving the pending op pointer from there. User space has the pointer already available so it can just put it into the @uaddr2 argument. Aside of that this allows the usage of multiple robust lists in the future without any changes to the internal functions as they just operate on the provided pointer. This requires a second flag FUTEX_ROBUST_LIST32 which indicates that the robust list pointer points to an u32 and not to an u64. This is required for two reasons: 1) sys_futex() has no compat variant 2) The gaming emulators use both both 64-bit and compat 32-bit robust lists in the same 64-bit application As a consequence 32-bit applications have to set this flag unconditionally so they can run on a 64-bit kernel in compat mode unmodified. 32-bit kernels return an error code when the flag is not set. 64-bit kernels will happily clear the full 64 bits if user space fails to set it. In case of FUTEX_UNLOCK_PI this clears the robust list pending op when the unlock succeeded. In case of errors, the user space value is still locked by the caller and therefore the above cannot happen. In case of FUTEX_WAKE* this does the unlock of the futex in the kernel and clears the robust list pending op when the unlock was successful. If not, the user space value is still locked and user space has to deal with the returned error. That means that the unlocking of non-PI robust futexes has to use the same try_cmpxchg() unlock scheme as PI futexes. If the clearing of the pending list op fails (fault) then the kernel clears the registered robust list pointer if it matches to prevent that exit() will try to handle invalid data. That's a valid paranoid decision because the robust list head sits usually in the TLS and if the TLS is not longer accessible then the chance for fixing up the resulting mess is very close to zero. The problem of non-contended unlocks still exists and will be addressed separately. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://patch.msgid.link/20260602090535.670514505@kernel.org
rppt
pushed a commit
that referenced
this pull request
Jun 14, 2026
…unlock race When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes, then the unlock sequence in user space looks like this: 1) robust_list_set_op_pending(mutex); 2) robust_list_remove(mutex); lval = gettid(); 3) if (atomic_try_cmpxchg(&mutex->lock, lval, 0)) 4) robust_list_clear_op_pending(); else 5) sys_futex(OP | FUTEX_ROBUST_UNLOCK, ....); That still leaves a minimal race window between #3 and #4 where the mutex could be acquired by some other task, which observes that it is the last user and: 1) unmaps the mutex memory 2) maps a different file, which ends up covering the same address When then the original task exits before reaching #5 then the kernel robust list handling observes the pending op entry and tries to fix up user space. In case that the newly mapped data contains the TID of the exiting thread at the address of the mutex/futex the kernel will set the owner died bit in that memory and therefore corrupt unrelated data. On X86 this boils down to this simplified assembly sequence: mov %esi,%eax // Load TID into EAX xor %ecx,%ecx // Set ECX to 0 #3 lock cmpxchg %ecx,(%rdi) // Try the TID -> 0 transition .Lstart: jnz .Lend #4 movq %rcx,(%rdx) // Clear list_op_pending .Lend: If the cmpxchg() succeeds and the task is interrupted before it can clear list_op_pending in the robust list head (#4) and the task crashes in a signal handler or gets killed then it ends up in do_exit() and subsequently in the robust list handling, which then might run into the unmap/map issue described above. This is only relevant when user space was interrupted and a signal is pending. The fix-up has to be done before signal delivery is attempted because: 1) The signal might be fatal so get_signal() ends up in do_exit() 2) The signal handler might crash or the task is killed before returning from the handler. At that point the instruction pointer in pt_regs is not longer the instruction pointer of the initially interrupted unlock sequence. The right place to handle this is in __exit_to_user_mode_loop() before invoking arch_do_signal_or_restart() as this covers obviously both scenarios. As this is only relevant when the task was interrupted in user space, this is tied to RSEQ and the generic entry code as RSEQ keeps track of user space interrupts unconditionally even if the task does not have a RSEQ region installed. That makes the decision very lightweight: if (current->rseq.user_irq && within(regs, csr->unlock_ip_range)) futex_fixup_robust_unlock(regs, csr); futex_fixup_robust_unlock() then invokes a architecture specific function to return the pending op pointer or NULL. The function evaluates the register content to decide whether the pending ops pointer in the robust list head needs to be cleared. Assuming the above unlock sequence, then on x86 this decision is the trivial evaluation of the zero flag: return regs->eflags & X86_EFLAGS_ZF ? regs->dx : NULL; Other architectures might need to do more complex evaluations due to LLSC, but the approach is valid in general. The size of the pointer is determined from the matching range struct, which covers both 32-bit and 64-bit builds including COMPAT. The unlock sequence is going to be placed in the VDSO so that the kernel can keep everything synchronized, especially the register usage. The resulting code sequence for user space is: if (__vdso_futex_robust_list$SZ_try_unlock(lock, tid, &pending_op) != tid) err = sys_futex($OP | FUTEX_ROBUST_UNLOCK,....); Both the VDSO unlock and the kernel side unlock ensure that the pending_op pointer is always cleared when the lock becomes unlocked. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://patch.msgid.link/20260602090535.773669210@kernel.org
rppt
pushed a commit
that referenced
this pull request
Jun 14, 2026
When the FUTEX_ROBUST_UNLOCK mechanism is used for unlocking (PI-)futexes, then the unlock sequence in userspace looks like this: 1) robust_list_set_op_pending(mutex); 2) robust_list_remove(mutex); lval = gettid(); 3) if (atomic_try_cmpxchg(&mutex->lock, lval, 0)) 4) robust_list_clear_op_pending(); else 5) sys_futex(OP,...FUTEX_ROBUST_UNLOCK); That still leaves a minimal race window between #3 and #4 where the mutex could be acquired by some other task which observes that it is the last user and: 1) unmaps the mutex memory 2) maps a different file, which ends up covering the same address When then the original task exits before reaching #5 then the kernel robust list handling observes the pending op entry and tries to fix up user space. In case that the newly mapped data contains the TID of the exiting thread at the address of the mutex/futex the kernel will set the owner died bit in that memory and therefore corrupt unrelated data. Provide a VDSO function which exposes the critical section window in the VDSO symbol table. The resulting addresses are updated in the task's mm when the VDSO is (re)map()'ed. The core code detects when a task was interrupted within the critical section and is about to deliver a signal. It then invokes an architecture specific function which determines whether the pending op pointer has to be cleared or not. The unlock assembly sequence on 64-bit is: mov %esi,%eax // Load TID into EAX xor %ecx,%ecx // Set ECX to 0 lock cmpxchg %ecx,(%rdi) // Try the TID -> 0 transition .Lstart: jnz .Lend movq %rcx,(%rdx) // Clear list_op_pending .Lend: ret So the decision can be simply based on the ZF state in regs->flags. The pending op pointer is always in DX independent of the build mode (32/64-bit) to make the pending op pointer retrieval uniform. The size of the pointer is stored in the matching criticial section range struct and the core code retrieves it from there. So the pointer retrieval function does not have to care. It is bit-size independent: return regs->flags & X86_EFLAGS_ZF ? regs->dx : NULL; There are two entry points to handle the different robust list pending op pointer size: __vdso_futex_robust_list64_try_unlock() __vdso_futex_robust_list32_try_unlock() The 32-bit VDSO provides only __vdso_futex_robust_list32_try_unlock(). The 64-bit VDSO provides always __vdso_futex_robust_list64_try_unlock() and when COMPAT is enabled also the list32 variant, which is required to support multi-size robust list pointers used by gaming emulators. The unlock function is inspired by an idea from Mathieu Desnoyers. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Acked-by: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/20260311185409.1988269-1-mathieu.desnoyers@efficios.com Link: https://patch.msgid.link/20260602090535.883796247@kernel.org
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.