[DNM] vregions phase II: Actually use the lifetime allocations and make the memory division between lifetime and interim dynamic#10783
Conversation
|
@lgirdwood , @lyakh , @kv2019i , this PR is finally getting there. Everything works and there is no rigid separation of lifetime and interim memory requirements. There is only one size attirbute, and all vregion memory is used for lifetime allocations until the pipeline is complete, then the remaining memory is used to create interim heap for dynamic allocation. In this scheme a very little interim memory is used. One thing I still need to do is to go and get rid of separate lifetime_bytes and interim_bytes attributes and go back to just heap_bytes. There is some changes that are already merged to FW and I need to edit those, and then I need to change these PRs: |
|
This is finally getting there. I turned this ready for review to get copilot's comments, but it should be taken with a grain of salt. I have not yet even reviewed myself all the copilot generated code. There is also couple of commits, in addition to #10831 and #10842 , that should probably be separated to different PRs (src refactoring and debug_stream related stuff). However, this seems to work well enough with thesofproject/linux#5537 . |
There was a problem hiding this comment.
Pull request overview
This PR evolves the Zephyr “vregions” allocator from fixed lifetime/interim partitions into a single-size vregion that starts in LIFETIME mode and switches to a lazily-created INTERIM heap, then wires this into pipeline/module allocation via a shared mod_alloc_ctx. It also updates IPC4 DP/pipeline memory requirement payloads and topology tokens to a single heap_bytes requirement.
Changes:
- Refactor vregion API to use a single
memsizeand a runtime switch (vregion_set_interim()), with lazy interim heap initialization from remaining lifetime space. - Introduce
mod_alloc_ctxhelpers (sof_ctx_alloc/zalloc/free) and use them for pipeline/module allocations; add pipeline-level vregion support and switch-to-interim atpipeline_complete(). - Simplify IPC4 memory requirement data structures and topology tokens to a single
heap_bytesrequirement; plus small DP scheduler/thread naming and debug-stream syscall additions.
Reviewed changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| zephyr/test/vregion.c | Updates vregion unit test to new API and interim/lifetime switching semantics. |
| zephyr/lib/vregion.c | Core vregion refactor: single-size create, mode tracking, lazy interim heap init, updated allocation/free behavior. |
| zephyr/lib/fast-get.c | Updates vregion allocation call sites to new API (no explicit mem type). |
| zephyr/Kconfig | Adds configurable defaults for DP userspace heap size and minimum DP stack size. |
| zephyr/include/rtos/alloc.h | Adds mod_alloc_ctx and sof_ctx_* helpers using optional vregion. |
| zephyr/CMakeLists.txt | Registers debug stream text msg header for Zephyr syscalls generation. |
| tools/topology/topology2/include/components/widget-common.conf | Replaces lifetime/interim/shared heap requirements with a single heap requirement attribute. |
| tools/topology/topology2/include/common/tokens.conf | Replaces multiple memory requirement tokens with heap_bytes_requirement. |
| tools/topology/topology2/cavs-nocodec.conf | Updates DP SRC_DOMAIN memory requirement settings (currently still uses removed tokens). |
| src/schedule/zephyr_dp_schedule_application.c | Names DP threads and (optionally) records DP thread ID for memory API debug checks. |
| src/lib/objpool.c | Updates vregion alloc calls to new API. |
| src/ipc/ipc4/helper.c | Updates IPC4 logging to reflect new single-heap memory requirement. |
| src/include/user/debug_stream_text_msg.h | Adds syscall declaration for sending text records (userspace-capable). |
| src/include/sof/lib/vregion.h | Updates vregion API and adds vregion_set_interim(); adds INVALID mode. |
| src/include/sof/lib/dai-zephyr.h | Removes obsolete ctx alloc include. |
| src/include/sof/ctx_alloc.h | Removes old ctx allocation helper header (moved to rtos/alloc.h). |
| src/include/sof/audio/pipeline.h | Adds pipeline->alloc pointer for shared allocation context. |
| src/include/sof/audio/module_adapter/module/generic.h | Extends module resource debug info to track DP thread ID. |
| src/include/sof/audio/component.h | Removes obsolete ctx alloc include. |
| src/include/ipc4/pipeline.h | Collapses pipeline memory requirement payload to a single heap_bytes. |
| src/include/ipc4/module.h | Collapses DP module memory requirement payload to a single heap_bytes. |
| src/debug/debug_stream/debug_stream_text_msg.c | Implements ds_send_text_record() with Zephyr syscall verification path and refactors ds_msg sending. |
| src/audio/src/src.c | Refactors SRC init/prepare split to allocate stages earlier via a setup callback. |
| src/audio/src/src_lite.c | Same refactor as src.c for the “lite” coefficients variant. |
| src/audio/src/src_ipc4.c | Adds src_init_stages() and src_prepare_do() behavior tailored for IPC4/DP needs. |
| src/audio/src/src_ipc3.c | Adds IPC3-compatible stubs/behavior for the new SRC init/prepare split. |
| src/audio/src/src_common.h | Declares new SRC helper entry points and setup callback in comp_data. |
| src/audio/src/src_common.c | Adds shared delay-line allocation helper (src_allocate_delay_lines). |
| src/audio/smart_amp/llext/smart_amp.toml | Adds smart_amp llext manifest fragment. |
| src/audio/smart_amp/llext/llext.toml.h | Adds top-level llext TOML include wrapper. |
| src/audio/smart_amp/llext/CMakeLists.txt | Adds build rules for smart_amp llext with Maxim/non-Maxim variants. |
| src/audio/pipeline/pipeline-schedule.c | Allows DP thread stack sizing from IPC, with a configurable minimum. |
| src/audio/pipeline/pipeline-graph.c | Adds pipeline-level alloc context + vregion creation; switches allocations to interim at completion. |
| src/audio/module_adapter/module/generic.c | Updates debug thread check to allow DP thread as resource manager. |
| src/audio/module_adapter/module_adapter.c | Wires pipeline alloc/vregion sharing for LL modules; DP heap sizing via IPC; updates allocation/free paths. |
| src/audio/module_adapter/module_adapter_ipc4.c | Updates IPC4 ext-init logging to new single-heap scheme. |
| src/audio/buffers/ring_buffer.c | Removes obsolete ctx alloc include. |
| src/audio/buffers/comp_buffer.c | Removes obsolete ctx alloc include. |
| posix/include/rtos/alloc.h | Adds a POSIX stub mod_alloc_ctx/sof_ctx_* API (heap-only). |
| if (!vr->interim_initialized) | ||
| interim_heap_init(vr); | ||
|
|
||
| ptr = k_heap_aligned_alloc(&heap->heap, align, size, K_NO_WAIT); | ||
| if (!ptr) |
There was a problem hiding this comment.
This is a static function that is only called from vregion_alloc_align() so this function is never called unless vregion->type == VREGION_MEM_TYPE_INTERIM.
| /* | ||
| * Calling k_heap_init() with too small size causes an assert | ||
| * failure. Exact limit is hard to deduce from sys_heap_init() | ||
| * code, but 1024 seems to be ehough. | ||
| */ | ||
| if (interim_size < 1024) { | ||
| LOG_WRN("Too little memory, %u bytes, left for interim heap", interim_size); | ||
| vr->type = VREGION_MEM_TYPE_INVALID; | ||
| return; | ||
| } |
| /* lifetime linear allocator starts right after the vregion struct */ | ||
| vr->lifetime.base = vregion_base; | ||
| vr->lifetime.size = total_size; | ||
| vr->lifetime.ptr = vregion_base; |
| /* log the new vregion */ | ||
| LOG_INF("new at base %p size %#zx pages %u struct embedded at %p", | ||
| (void *)vr->base, total_size, pages, (void *)vr); |
| /* | ||
| * 3 pages of memsize + struct overhead → 4 pages total. | ||
| * Two 6000-byte lifetime allocs consume ~3 pages, leaving | ||
| * ~1 page for the interim heap. | ||
| */ | ||
| struct vregion *vreg = vregion_create(3 * CONFIG_MM_DRV_PAGE_SIZE); | ||
|
|
There was a problem hiding this comment.
The idea is that that some allocations fail
| #define DS_TEXT_MSG_MAX_LEN 128 | ||
|
|
||
| #if defined(__ZEPHYR__) && defined(CONFIG_USERSPACE) | ||
| __syscall void ds_send_text_record(const char *text, size_t len); | ||
| #else | ||
| void ds_send_text_record(const char *text, size_t len); | ||
| #endif |
| domain_id 0 | ||
| stack_bytes_requirement 2048 | ||
| interim_heap_bytes_requirement "$[(24 * 1024)]" | ||
| lifetime_heap_bytes_requirement 4096 | ||
| shared_bytes_requirement 0 |
| domain_id 0 | ||
| stack_bytes_requirement 2048 | ||
| interim_heap_bytes_requirement "$[(24 * 1024)]" | ||
| lifetime_heap_bytes_requirement 4096 | ||
| shared_bytes_requirement 0 |
| enum vregion_mem_type { | ||
| VREGION_MEM_TYPE_INTERIM, /* interim allocation that can be freed */ | ||
| VREGION_MEM_TYPE_LIFETIME, /* lifetime allocation */ | ||
| VREGION_MEM_TYPE_INVALID, /* Interim heap initializatoin failed */ | ||
| }; |
this sounds to me like this PR shouldn't be merged yet, marking it as such then |
|
Turned back to draft, now that I have copilot's comments. |
ca8651f to
8a50dd0
Compare
Convert mod_data_blob_handler_new() to a Zephyr syscall following the same pattern as mod_alloc_ext(), mod_free(), and mod_fast_get(). This allows modules running in user mode to call the function through the syscall interface. The z_vrfy_ handler validates the processing_module pointer and its heap region before dispatching to the implementation. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Convert mod_balloc_align() to a Zephyr syscall following the same pattern as mod_alloc_ext(), mod_free(), mod_fast_get(), and mod_data_blob_handler_new(). This allows modules running in user mode to allocate aligned buffer memory blocks through the syscall interface. The z_vrfy_ handler validates the processing_module pointer and its heap region before dispatching to the implementation. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Add a k_mutex to struct module_resources and take it around every objpool access in z_impl_mod_balloc_align(), z_impl_mod_alloc_ext(), z_impl_mod_data_blob_handler_new(), z_impl_mod_fast_get(), z_impl_mod_free(), and mod_free_all(). This prevents concurrent access to the resource tracking pool when a module's IPC thread and processing thread operate in parallel (e.g. set_large_config arriving while the module is running). In mod_free_all() the mutex is released before mod_resource_init() re-initializes it, so the lock is not held across re-initialization. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Remove the MEM_API_CHECK_THREAD() macro, CONFIG_MODULE_MEMORY_API_DEBUG Kconfig option, and the rsrc_mngr field from struct module_resources. This single-thread-owner check is now redundant: the previous commit added a spinlock that properly serializes all accesses to the resource pool, making the debug-only thread identity warning unnecessary. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Move pipeline_comp_dp_task_init() call after module private data initializations so that the struct module_config is available already at pipeline_comp_dp_task_init() init time. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Remove unused heap_size parameter from module_adapter_dp_heap_new() function. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Use IPC module init extended data (the dp_data) to determine DP module heap size when available. Add Kconfig option SOF_USERSPACE_DP_DEFAULT_HEAP_SIZE (default 20480) as fallback when extended init data is not present or does not provide heap sizes. Sanity-check the requested sizes (reject values above 64 MB) and log the allocated heap size. Also pass ext_init through module_adapter_mem_alloc() to module_adapter_dp_heap_new() and fix a minor comment typo. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Use IPC module init extended data (dp_data stack_bytes) to set the DP processing thread stack size when available. Fall back to TASK_DP_STACK_SIZE when ext init data is not present or does not provide a stack size. Add Kconfig option ZEPHYR_DP_SCHEDULER_MIN_STACK_SIZE (default 2048) to enforce a minimum stack size regardless of what the IPC payload requests. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Add a dummy Kconfig definition for SOF_USERSPACE_DP_DEFAULT_HEAP_SIZE so the testbench build can resolve the symbol. The testbench does not use Zephyr, where this option is normally defined, so it needs a local fallback. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Add posix stub headers for <zephyr/logging/log.h> and <zephyr/sys/util.h> so that firmware source files using LOG_ERR, LOG_INF, LOG_WRN, LOG_DBG, KB, or MB can compile in the testbench / posix build without modification. Remove the #ifdef __ZEPHYR__ guard around the <zephyr/logging/log.h> include in sof/trace/trace.h and the <zephyr/sys/util.h> include in sof/common.h so the posix stubs are picked up automatically. The redundant LOG_MODULE_REGISTER / LOG_MODULE_DECLARE definitions in trace.h are removed since the posix stub now provides them. Remove the macros and fix couple of format string issues. Also the tools/logger CMakeList.txt requires some tuning to find the now unconditionally included zephyr/sys/util.h. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Create a per-pipeline vregion in pipeline_new() when the IPC4 pipeline extension payload specifies lifetime or interim heap sizes. The vregion size is the sum of both. Store the vregion pointer in struct pipeline. In module_adapter_new_ext() resolve the pipeline pointer before module_adapter_mem_alloc() so the pipeline's vregion can be passed down. LL modules on a pipeline with a vregion use it as their allocation backend via vregion_get(), instead of the driver's default heap. DP modules continue to create their own per-module vregion. Call vregion_set_interim() for the pipeline vregion in pipeline_complete() to switch the allocator to interim mode after all lifetime allocations are done. Release the pipeline vregion with vregion_put() in ipc_pipeline_free() before calling pipeline_free(). Warn if the refcount does not reach zero, indicating a module still holds a reference. Also fix a pre-existing leak in module_adapter_mem_free(): always free the per-module mod_alloc_ctx for LL modules regardless of the vregion refcount, since alloc is allocated from the system heap, not from the vregion. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Replace the bare vregion pointer in struct pipeline with a mod_alloc_ctx object that bundles both the optional vregion and the heap pointer. The alloc context is always created in pipeline_new() and freed symmetrically in pipeline_free(), removing the separate cleanup that was in ipc_pipeline_free(). The pipeline object itself is now allocated through sof_ctx_zalloc() so it resides in the vregion when one is available, falling back to the default heap otherwise. LL modules share the pipeline's alloc context instead of only sharing the raw vregion pointer. A use_ppl_alloc flag gates the sharing to LL modules only, so DP modules continue to create their own vregion and alloc context as before. module_adapter_mem_free() detects whether a module's alloc belongs to its pipeline and either just releases the vregion reference (ppl_alloc case) or tears down the module's own alloc. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Refactor SRC and SRC-lite so that filter stage setup and delay line allocation happen during the module init callback instead of prepare. This ensures the bulk of SRC memory allocation occurs while the vregion allocator is still in its lifetime phase, before the interim heap is created. The allocations then persist across prepare/reset cycles without needing to be re-allocated each time. A new setup_stages() callback is added to struct comp_data, set by each variant (src.c, src_lite.c) to point at its own coefficient tables. The common src_allocate_delay_lines() is factored out of the old prepare path into src_common.c. For IPC4, src_init_stages() calls setup_stages() and src_allocate_delay_lines() at init time. The prepare path (src_prepare_do) only validates rates and sets downstream params. For IPC3, src_init_stages() is a no-op and src_prepare_do() retains the original behavior of doing full setup at prepare time, since IPC3 cannot be tested at this time. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
This is now more like a proof of concept, rather than a proper mergeable implementation.
For the time being the vregions is only used for DP modules, but if I have understood the plans correctly, eventually the LL pipelines should have their own vregions context that is used for all allocation related to the pipeline and modules in it (except the DP modules).
Before this PR the vregions lifetime memory is really not used at all. The only place where it could be used is an SRC DP module (SRC being the only DP capable module we have) , and this PR moves throws the balance of lifetime and interim allocs to the other end. After this PR all the allocations for DP SRC module is made from lifetime heap. SRC and DP infrastucture uses heap only for allocating memory for things that are needed for the whole lifetime of the module, and no interim memory is needed.
The things that I still plan to do for DP case:
As a follow up, take vregions also into use for complete LL pipelines. But before even starting this work we'd better get userspace LL merged.
FYI @lgirdwood , @lyakh , @kv2019i