[AI] Add host model cache API surface for 5.5.22#4325
Conversation
DBImpact Resolves: ZSTAC-81413 Change-Id: I6a7574656467636e686377756f716d67707a7063
XInfiniStorageController.blacklist was a no-op (// todo, comp.success()). When ExternalPrimaryStorageKvmFactory's beforeStartVmOnKvm fell back to blacklist after a deactivate failure, the no-op silently allowed the VM to start while a stale client on another host still held write access, risking a split-brain on the volume. 1. Why? xinfini does not implement volume path isolation yet. The blacklist no-op masked the missing capability and let the caller proceed as if the stale client had been fenced. 2. How? Throw OperationFailureException from blacklist with a clear message. Because beforeStartVmOnKvm invokes blacklist with NopeCompletion (which ignores fail()), only an exception can abort the VM start. This makes the unsupported case explicit and conservative until xinfini supports path isolation. 3. Side effects? VM start is aborted when an old active client cannot be deactivated on xinfini. This is the safe behavior; previously such VMs would start with split-brain risk. # Summary of changes (by module): - plugin/xinfini: throw OperationFailureException in blacklist instead of silently returning success. Related: ZSTAC-84963 Change-Id: Ib0092f39a7b23cf7b06442ac9616cb2ce39240b7 (cherry picked from commit 024c445)
Regenerated by ./runMavenProfile sdk from updated @APIParam.validValues on APIAddLabelToAlarmMsg and APIUpdateAlarmLabelMsg in premium repo (same Jira). Lets SDK clients pass RegexAgainst and NotEqual as the operator field for alarm labels, matching the server-side API interceptor. APIImpact: AddLabelToAlarm/UpdateAlarmLabel SDK Action now accepts RegexAgainst and NotEqual operator values. Resolves: ZSTAC-84454 Change-Id: I71636c616567697566786f667974766a6c757073
<fix>[xinfini]: fail blacklist to prevent split-brain on VM start See merge request zstackio/zstack!9797
<feature>[sdk]: regenerate SDK for RegexAgainst/NotEqual operators See merge request zstackio/zstack!9798
Existing host reconnect can fail before kvmagent echo. Ansible masks libvirt sockets with systemctl during deploy. If host systemd D-Bus is stuck, that optional step times out. Continue only for this known mask timeout on reconnect. New host deploy and other ansible failures still fail. Resolves: ZSTAC-77120 Change-Id: I0ef4a535065ff797c9e4cfae5b39c2daa321a4cc
fix(ZSTAC-77120): continue reconnect on systemd timeout See merge request zstackio/zstack!9810
Remove an unused MariaDB socket before zstack-server start. Restart MariaDB so MN can recover after power loss. Resolves: ZSTAC-83507 Change-Id: Ifb8255f7c9021ef12353f058a0230fe66989b590
…to avoid stale config cache Resolves: ZSTAC-79075 Change-Id: I6868776c0ea5ef4905ec409181fbf2a431f3034a
<feature>[utils]: Network group high availability strategy See merge request zstackio/zstack!9779
<feature>[conf]: recover stale mariadb socket See merge request zstackio/zstack!9809
<fix>[virtualRouter]: skip grayscale upgrade check on auto reconnect to avoid stale config cache See merge request zstackio/zstack!9814
Store VmModelMountVO lastAttachedEpoch in database so restore cleanup can distinguish successful asynchronous attach from stale failure callbacks.\n\nTested: docker verify-case runMavenProfile premium\nTested: docker verify-case VmModelMountCase Resolves: ZSTAC-84246 Change-Id: Ib426797059be9401c3e2556ecc31c1879dd04049
Resolves: ZSTAC-84919 Change-Id: Iff44de2bbb1dad50e75678180cb89d3430c562b1
<fix>[ai]: persist mount restore epoch schema See merge request zstackio/zstack!9833
<fix>[compute]: ZSTAC-84919 avoid stale iso detach NPE See merge request zstackio/zstack!9834
Move the lastAttachedEpoch schema change out of the 5.5.16 upgrade file by restoring that file to match the 5.5.16 release branch. The 5.5.22 upgrade file already carries the ADD_COLUMN migration, which keeps the change scoped to the target release. Constraint: 5.5.16 release schema must remain byte-for-byte aligned with upstream/5.5.16 Rejected: Leave the column in V5.5.16__schema.sql | would mutate an already released upgrade file Confidence: high Scope-risk: narrow Tested: git diff --exit-code upstream/5.5.16 -- conf/db/upgrade/V5.5.16__schema.sql Tested: git diff --exit-code upstream/5.5.22 -- conf/db/upgrade/V5.5.22__schema.sql Tested: git diff --check Resolves: ZSTAC-84246 Change-Id: Ibf18c6665d9d3c90361ed7f57d65fc609f6a1bfb
<fix>[ai]: keep 5.5.16 schema unchanged See merge request zstackio/zstack!9874
APIImpact Resolves: ZSTAC-85167 Change-Id: I6279726f7061786e6f6a66726368647074646c78
<fix>[loadBalancer]: support disabling UDP listener health check See merge request zstackio/zstack!9860
Ensure external primary storage heartbeat volumes in the KVM host connect path before host-storage status checks run. ZBS creates missing heartbeat volumes through the MN/controller side, including all configured pools. Host-side checks remain data-plane reads. Aggregate heartbeat ensure failures so reconnect diagnostics retain all failed primary storage causes. Resolves: ZSTAC-79201 Change-Id: I9b592bf4c48573f6e6951735eef4b39463b57fa6
SshShell.runCommand and runScript now inject ssh client timeout options
into the underlying ssh command line:
-o ConnectTimeout=10
-o ServerAliveInterval=5
-o ServerAliveCountMax=2
Subprocess self-exits in ~10-20s on unreachable or non-responsive
remote, instead of waiting forever in Process.waitFor (the underlying
ShellUtils.runAndReturn has no JVM side timeout).
The three values are tunable via setConnectTimeoutSeconds /
setServerAliveIntervalSeconds / setServerAliveCountMax. Defaults are
deliberately aggressive so existing call sites get protection without
any code change.
Background: in ZSTAC-72837 collect-log jstack, 222 KVMHost reconnect
flows (checkConnectConditions step check-if-host-can-reach-management
-node) had ssh subprocesses stuck in Process.waitFor for 30+ minutes,
exhausting the ThreadFacadeImpl worker pool and contributing to MN
unknown + VIP failover. host/ping HTTP itself already has a 10s
RESTFacade timeout; the gap was on the post ping-fail reconnect path
where SshShell.runCommand had no timeout. Adding ssh client side
timeouts breaks the indefinite wait regardless of upstream trigger
(slow remote sshd / DNS / PAM / TCP handshake).
Resolves: ZSTAC-72837
Change-Id: I1f999a8404352a178910920e92c56bbceb2c1e5f
<fix>[zbs]: ensure heartbeat before host check See merge request zstackio/zstack!9851
<fix>[utils]: SshShell add ssh client timeouts See merge request zstackio/zstack!9859
Squash selected ZCF commits from feature-5.5.22-zcf-temporary. Co-authored-by: shixin.ruan <shixin.ruan@zstack.io> Co-authored-by: shixin.ruan <shixin@shixin.ruan> Co-authored-by: zhangjianjun <jianjun.zhang@zstack.io> Resolves: ZCF-1936 Change-Id: I5560dc42173b48a8bad104543dcc18af4f2ba8a0
Resolves: ZCF-3009 Change-Id: I734d872432d519d4a04fade5d24a9eee040f160f
Add ZIAM_DEFAULT_SCOPE = "openid profile email" to OAuth2PluginConstants for use by IAM2RopcLoginBackend instead of hardcoded string. Resolves: ZCF-3009 Change-Id: I7638e4bb3231fa75b1ad3460863040d43970d9f5
1. Delete-via-POST: detect actionType=="Delete" with httpMethod==POST and generate PostWithRespKey with empty responseKey instead of Post with hardcoded "inventory" key. Handles URL placeholders via buildFullPath. 2. Get-with-inventories: when reply has "inventories" (List) instead of "inventory" (single), use the response struct directly with GetWithRespKey and empty responseKey to unmarshal the full response. Resolves: ZCF-0 Change-Id: I708245d6bd49172fd27488a506dec57d2bfd73ee
Port critical Go SDK generation fixes from MR !9249 (ZSV-11399): GoApiTemplate.groovy: - Add context.Context parameter to all generated Go method signatures - Add allTo response key extraction for PostWithRespKey - Add valid flag + isValid() for safe template initialization - Add reset() to prevent stale state across repeated SDK generation - Add getApiOptPath() for optional path handling - Fix pluralization: only replace y→ies after consonants - Pass allTo to GetWithSpec for correct response unwrapping GoInventory.groovy: - Add reset() + call GoApiTemplate.reset() at start of generation - Use template.isValid() instead of template.at != null - Fix time.Now → time.Now() in generated Go code (4 occurrences) - Remove deprecated generateClientFile() (~400 lines of dead code) base_param_types.go.template: - Fix time.Now → time.Now() (missing parentheses) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous reset() only cleared longJobMappings (static), leaving instance-level caches (allApiTemplates, generatedViewStructs, generatedParamStructs, generatedClientMethods, additionalClasses, etc.) intact across repeated generate() calls on the same instance. This caused skipped or duplicate output when TestGenerateGoSDK ran more than once in the same JVM. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generate Java SDK action/result and ApiHelper entry for the model service launch command API.\n\nConstraint: API definitions live in premium but SDK/APIHelper output is committed in zstack.\nScope-risk: narrow\nTested: ./runMavenProfile sdk\nTested: ./runMavenProfile premium\nTested: ModelServiceCase Resolves: ZSTAC-84940 Change-Id: Ieaaee3e471e8c59052c35ff509ab4704244b9ab6
Type the generated launch command SDK fields and remove the redundant same-package import reported during MR review. Resolves: ZSTAC-84940 Change-Id: I59d7efaddfca3166f57c2b091adbdb2069f58fd7
Commit the output produced by ./runMavenProfile sdk so update_sdk verification stays clean. Resolves: ZSTAC-84940 Change-Id: I27eadd7fc19c37b70b00888a20b0fe1e805eb863
….5.28-aios' ZSTAC-84940 Add launch command SDK bindings See merge request zstackio/zstack!10166
The VM model-cache feature needs persistent schema rows and generated client surfaces before premium can expose scheduling and cache-management controls. Resolves: ZSTAC-0 Change-Id: I424f987899216872309b42439f8ba8a1353ae505
Host model cache identity and reservation lifecycle fields must be non-null to keep database uniqueness and capacity accounting consistent with the premium VO model. Resolves: ZSTAC-0 Change-Id: Ibbc18ef1436e3ec8366b38c055451fc610fff88a
host cache storage and policy rows are keyed by hostUuid/sourceRoot and are removed through host cascade, so the DDL now rejects null key components instead of relying on nullable unique-key behavior. Resolves: ZSTAC-0 Change-Id: If75425fc43b3ba407b807c7440fe1982b376cf9e
Regenerated ApiHelper after adding host model cache API actions. Resolves: ZSTAC-0 Change-Id: Ic04457599d0e3d3e59d26a7a22156b6785d3e913
Regenerated SDK in verify-case with the premium host model cache branch mounted so update_sdk stays clean. Tested: verify-case ./runMavenProfile premium Tested: verify-case ./runMavenProfile sdk Resolves: ZSTAC-0 Change-Id: I42c8e3c0edc320588097236dc8d207ebfcb8cdce
…ture-5.5.28-aios' [AI] Add host model cache API surface See merge request zstackio/zstack!10260
|
Warning
|
| Layer / File(s) | Summary |
|---|---|
数据库与公开类型 conf/db/upgrade/V5.5.22__schema.sql, conf/db/upgrade/V5.5.28__schema.sql, sdk/src/main/java/org/zstack/sdk/*, sdk/src/main/java/org/zstack/sdk/network/zns/*, sdk/src/main/java/SourceClassMap.java |
新增 AI 主机模型缓存、HA 网络组、dGPU、zoneUuid、ZNS 租户/路由/控制器等数据库表与公开 DTO、枚举、映射。 |
SDK action 与生成器 sdk/src/main/java/org/zstack/sdk/*Action.java, rest/src/main/resources/scripts/GoApiTemplate.groovy, rest/src/main/resources/scripts/GoInventory.groovy, rest/src/main/resources/scripts/SdkDataStructureGenerator.groovy, testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy, testlib/src/main/java/org/zstack/testlib/ApiHelperGenerator.groovy |
新增 AI 缓存、HA、dGPU、模型模板、ZNS 相关 action,并调整 Go/SDK 生成与测试封装。 |
ZNS/SDN 与虚拟机生命周期
| Layer / File(s) | Summary |
|---|---|
网卡生命周期与分配 compute/src/main/java/org/zstack/compute/vm/*, header/src/main/java/org/zstack/header/vm/* |
新增网卡生命周期接口、全局配置、管理器与桥接逻辑,并扩展分配、回收、挂载与状态变更流程。 |
ZNS/SDN Controller 与 API network/src/main/java/org/zstack/network/*, plugin/sdnController/src/main/java/org/zstack/sdnController/*, plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/*, plugin/kvm/src/main/java/org/zstack/kvm/*, plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/* |
新增 ZNS、SDN Controller、HA 网络组、负载均衡与相关拦截器、事件、状态枚举和 SDK 类型。 |
虚拟路由、KVM 与 DNS/MTU plugin/virtualRouterProvider/src/main/java/org/zstack/network/service/virtualrouter/*, plugin/kvm/src/main/java/org/zstack/kvm/*, network/src/main/java/org/zstack/network/l3/*, network/src/main/java/org/zstack/network/service/* |
虚拟路由灰度、KVM 重连/心跳/热插拔、DHCP/DNS/MTU、网络服务 attach 跳过逻辑等联动更新。 |
存储重初始化、恢复与快照
| Layer / File(s) | Summary |
|---|---|
数据卷重初始化与快照同步 storage/src/main/java/org/zstack/storage/*, plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/*, plugin/localstorage/src/main/java/org/zstack/storage/primary/local/*, plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/*, plugin/sharedMountPointPrimaryStorage/src/main/java/org/zstack/storage/primary/smp/*, simulator/simulatorImpl/src/main/java/org/zstack/simulator/storage/primary/* |
新增 ReInitDataVolume 消息、回调、拦截校验与快照/回滚/GC 同步处理。 |
启动恢复与安装依赖 conf/install/zstack-server, conf/tools/install.sh, core/src/main/java/org/zstack/core/rest/*, core/src/main/java/org/zstack/core/upgrade/UpgradeChecker.java, plugin/externalStorage/src/main/java/org/zstack/externalStorage/primary/kvm/*, plugin/expon/src/main/java/org/zstack/expon/ExponStorageController.java, plugin/xinfini/src/main/java/org/zstack/xinfini/*, plugin/zbs/src/main/java/org/zstack/storage/zbs/ZbsStorageController.java |
新增 MariaDB 恢复、依赖预检、REST 失败注入与存储 blacklist/retry 行为调整。 |
文档与测试
| Layer / File(s) | Summary |
|---|---|
文档 docs/design/zns-pivot/*, docs/modules/network/* |
新增 ZNS pivot 设计、评审文档与网络资源文档条目。 |
测试 test/src/test/bash/*, test/src/test/groovy/* |
新增 MariaDB 恢复脚本测试、KVM first-connect drain 测试,以及 ISO 分离断言更新。 |
Review Effort
🎯 5 (Critical) | ⏱️ ~120 minutes
Poem
我是小兔来巡山,
一跳跳到 ZNS 边;
缓存、路由、网卡忙,
兔爪轻点卷回旋;
月光下,系统更稳 🐰
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | Docstring coverage is 5.10% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | It clearly summarizes the main change: adding the host model cache API surface for 5.5.22. |
| Description check | ✅ Passed | The description is directly aligned with the PR, covering the host model cache backport and SDK generator fixes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
✨ Finishing Touches
📝 Generate docstrings
- Create stacked PR
- Commit on current branch
🧪 Generate unit tests (beta)
- Create PR with unit tests
- Commit unit tests in branch
sync/ye.zou/feature/ZSTAC-85984-host-cache-core-5.5.22-aios@@2
Warning
There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 ast-grep (0.44.0)
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
plugin/flatNetworkProvider/src/main/java/org/zstack/network/service/flat/FlatDhcpBackend.java
[]
- 3 others
Comment @coderabbitai help to get the list of available commands.
|
Comment from ye.zou: Pushed follow-up fix Local verification with paired premium branch:
|
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@conf/db/upgrade/V5.5.22__schema.sql`:
- Around line 249-272: The uniqueness rule for AiHostCacheStorageVO currently
truncates sourceRoot to 255 characters, which can collide for valid long paths
and cause bad duplicate detection. Update the V5.5.22__schema.sql definition to
avoid using sourceRoot(255) in ukAiHostCacheStorageVOHostRoot; instead follow
the AiHostModelCachePolicyVO pattern by adding a sourceRootIdentity hash column
and building the unique key on that, or explicitly constrain sourceRoot to 255
if that is the intended limit. If this table can already exist in upgrade paths,
add the new column, backfill it from sourceRoot, drop the old unique index, and
create the new one in the upgrade-safe sequence.
In `@core/src/main/java/org/zstack/core/upgrade/UpgradeChecker.java`:
- Around line 330-345: In UpgradeChecker’s version update flow, the no-op check
only compares currentVersion and can skip updates when expectVersion has
changed, leaving stale target versions behind. Update the logic around the
agentVersionVO comparison so the early return in the version update path also
considers expectVersion, and when a change is needed, persist both expectVersion
and currentVersion together before dbf.update(agentVersionVO). Use the existing
UpgradeChecker method and agentVersionVO fields to locate the change.
In `@sdk/src/main/java/org/zstack/sdk/QueryAiHostModelCacheResult.java`:
- Around line 6-11: `QueryAiHostModelCacheResult` still exposes `inventories` as
a raw `List`, which breaks the typed SDK surface expected by
`QueryAiHostModelCacheAction` and
`ApiResult.getResult(QueryAiHostModelCacheResult.class)`. Update the
`inventories` field, its getter, and setter in `QueryAiHostModelCacheResult` to
use the concrete `AiHostModelCacheInventory` element type so deserialization
returns a stable typed collection. If this class is generated, also fix the
corresponding generator template or source annotation so regeneration preserves
the typed collection.
In `@sdk/src/main/java/org/zstack/sdk/UpdateAiHostModelCachePolicyAction.java`:
- Around line 28-29: The path parameter hostUuid in
UpdateAiHostModelCachePolicyAction is still allowed to be an empty string even
though it is inserted into the URL path, so tighten its validation to reject
empty values by changing the `@Param` constraint to a non-empty string check.
Update the hostUuid field annotation in UpdateAiHostModelCachePolicyAction so
callers are blocked before an invalid path is constructed.
In `@testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy`:
- Line 57785: queryAiHostModelCache currently calls a.conditions.collect without
guarding against null, which can trigger a NullPointerException when the closure
does not set conditions. Update the queryAiHostModelCache handling in
ApiHelper.groovy to follow the same null-safe pattern used by the other query
helper methods in this PR, so conditions are only collected when present and
otherwise left untouched.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 707d5bc2-1c48-483d-bba2-c71fd0cbe29b
📒 Files selected for processing (30)
conf/db/upgrade/V5.5.22__schema.sqlcore/src/main/java/org/zstack/core/upgrade/UpgradeChecker.javaheader/src/main/java/org/zstack/header/rest/SDKGeneric.javarest/src/main/resources/scripts/SdkDataStructureGenerator.groovysdk/src/main/java/SourceClassMap.javasdk/src/main/java/org/zstack/sdk/AiHostCacheStorageInventory.javasdk/src/main/java/org/zstack/sdk/AiHostCacheStorageStatus.javasdk/src/main/java/org/zstack/sdk/AiHostModelCacheFailureCode.javasdk/src/main/java/org/zstack/sdk/AiHostModelCacheFailurePhase.javasdk/src/main/java/org/zstack/sdk/AiHostModelCacheInventory.javasdk/src/main/java/org/zstack/sdk/AiHostModelCachePolicyInventory.javasdk/src/main/java/org/zstack/sdk/AiHostModelCacheStatus.javasdk/src/main/java/org/zstack/sdk/CleanAiHostModelCacheAction.javasdk/src/main/java/org/zstack/sdk/CleanAiHostModelCacheResult.javasdk/src/main/java/org/zstack/sdk/DatasetInventory.javasdk/src/main/java/org/zstack/sdk/GetAiHostModelCacheCapacityAction.javasdk/src/main/java/org/zstack/sdk/GetAiHostModelCacheCapacityResult.javasdk/src/main/java/org/zstack/sdk/ModelCenterInventory.javasdk/src/main/java/org/zstack/sdk/ModelInventory.javasdk/src/main/java/org/zstack/sdk/ModelServiceInstanceGroupInventory.javasdk/src/main/java/org/zstack/sdk/ModelServiceInventory.javasdk/src/main/java/org/zstack/sdk/QueryAiHostModelCacheAction.javasdk/src/main/java/org/zstack/sdk/QueryAiHostModelCacheResult.javasdk/src/main/java/org/zstack/sdk/RefreshAiHostModelCacheAction.javasdk/src/main/java/org/zstack/sdk/RefreshAiHostModelCacheResult.javasdk/src/main/java/org/zstack/sdk/UpdateAiHostModelCachePolicyAction.javasdk/src/main/java/org/zstack/sdk/UpdateAiHostModelCachePolicyResult.javasdk/src/main/java/org/zstack/sdk/UpdateModelCenterAction.javasdk/src/main/java/org/zstack/sdk/VmModelMountInventory.javatestlib/src/main/java/org/zstack/testlib/ApiHelper.groovy
| c.delegate = a | ||
| c() | ||
|
|
||
| a.conditions = a.conditions.collect { it.toString() } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win
queryAiHostModelCache 缺少 a.conditions 空值保护,存在 NPE 风险。
与本 PR 在 322-324 行(以及其它 query helper)已采用的空值保护写法不一致:当闭包未设置 conditions 时,a.conditions 为 null,直接调用 .collect 会抛出 NullPointerException。请与既有模式保持一致。
🛡️ 建议补充空值判断
- a.conditions = a.conditions.collect { it.toString() }
+ if (a.conditions != null) {
+ a.conditions = a.conditions.collect { it.toString() }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| a.conditions = a.conditions.collect { it.toString() } | |
| if (a.conditions != null) { | |
| a.conditions = a.conditions.collect { it.toString() } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy` at line 57785,
queryAiHostModelCache currently calls a.conditions.collect without guarding
against null, which can trigger a NullPointerException when the closure does not
set conditions. Update the queryAiHostModelCache handling in ApiHelper.groovy to
follow the same null-safe pattern used by the other query helper methods in this
PR, so conditions are only collected when present and otherwise left untouched.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@conf/db/upgrade/V5.5.22__schema.sql`:
- Around line 280-290: The upgrade logic for AiHostCacheStorageVO currently
drops the old unique key before ensuring the new one exists, which can leave the
schema in an unsafe intermediate state if the ALTER TABLE fails. In the
V5.5.22__schema.sql block around DELETE_INDEX and the dynamic creation of
ukAiHostCacheStorageVOHostRootIdentity, change the order so the new unique index
is created or verified first, and only then remove
ukAiHostCacheStorageVOHostRoot; keep the existing index-existence checks and
PREPARE/EXECUTE flow tied to the new index name.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: bdc7ff21-031f-42a7-824a-28c852a9ed2f
📒 Files selected for processing (6)
conf/db/upgrade/V5.5.22__schema.sqlcore/src/main/java/org/zstack/core/upgrade/UpgradeChecker.javasdk/src/main/java/org/zstack/sdk/QueryAiHostModelCacheResult.javasdk/src/main/java/org/zstack/sdk/UpdateAiHostModelCachePolicyAction.javatestlib/src/main/java/org/zstack/testlib/ApiHelper.groovytestlib/src/main/java/org/zstack/testlib/ApiHelperGenerator.groovy
✅ Files skipped from review due to trivial changes (1)
- sdk/src/main/java/org/zstack/sdk/UpdateAiHostModelCachePolicyAction.java
🚧 Files skipped from review as they are similar to previous changes (1)
- sdk/src/main/java/org/zstack/sdk/QueryAiHostModelCacheResult.java
The VM model-cache feature needs persistent schema rows and generated client surfaces before premium can expose scheduling and cache-management controls. Resolves: ZSTAC-0 Change-Id: I424f987899216872309b42439f8ba8a1353ae505
Host model cache identity and reservation lifecycle fields must be non-null to keep database uniqueness and capacity accounting consistent with the premium VO model. Resolves: ZSTAC-0 Change-Id: Ibbc18ef1436e3ec8366b38c055451fc610fff88a
host cache storage and policy rows are keyed by hostUuid/sourceRoot and are removed through host cascade, so the DDL now rejects null key components instead of relying on nullable unique-key behavior. Resolves: ZSTAC-0 Change-Id: If75425fc43b3ba407b807c7440fe1982b376cf9e
Regenerated ApiHelper after adding host model cache API actions. Resolves: ZSTAC-0 Change-Id: Ic04457599d0e3d3e59d26a7a22156b6785d3e913
Add zoneUuid to ModelCenter-derived AI resource SDK inventories and update the 5.5.28 schema migration to persist, backfill, and constrain the zone relation for models, model services, datasets, and model service instance groups. When existing ModelCenter rows do not have zoneUuid, infer a historical default only from deployed inference service VMs and only when all observed VM zones under the same resource agree on one zone. This keeps ModelCenter as the binding point for new behavior while avoiding an arbitrary zone choice during upgrade. Constraint: Source branch must end with @@2 for the linked MR set Rejected: Default to the first ZoneEO row | arbitrary and can bind AI resources to the wrong physical zone Rejected: Leave all historical derived resources NULL when deployed inference VMs reveal a single zone | loses a reliable existing placement signal Confidence: high Scope-risk: moderate Directive: Keep migration inference conservative; do not backfill from mixed-zone deployed services without an explicit product rule Tested: Docker verify container ./runMavenProfile premium Tested: Docker verify container ./runMavenProfile sdk Tested: Restored 172.20.1.159 MySQL backup 2026-06-03_14-30-01 into MySQL 5.7 and ran beforeMigrate.sql plus V5.5.28__schema.sql twice Resolves: ZSTAC-75429 Change-Id: If0a2176680abb9911d63b0281779616215d9a787
Store VmModelMountVO lastAttachedEpoch in database so restore cleanup can distinguish successful asynchronous attach from stale failure callbacks.\n\nTested: docker verify-case runMavenProfile premium\nTested: docker verify-case VmModelMountCase Resolves: ZSTAC-84246 Change-Id: Ib426797059be9401c3e2556ecc31c1879dd04049
Move the lastAttachedEpoch schema change out of the 5.5.16 upgrade file by restoring that file to match the 5.5.16 release branch. The 5.5.22 upgrade file already carries the ADD_COLUMN migration, which keeps the change scoped to the target release. Constraint: 5.5.16 release schema must remain byte-for-byte aligned with upstream/5.5.16 Rejected: Leave the column in V5.5.16__schema.sql | would mutate an already released upgrade file Confidence: high Scope-risk: narrow Tested: git diff --exit-code upstream/5.5.16 -- conf/db/upgrade/V5.5.16__schema.sql Tested: git diff --exit-code upstream/5.5.22 -- conf/db/upgrade/V5.5.22__schema.sql Tested: git diff --check Resolves: ZSTAC-84246 Change-Id: Ibf18c6665d9d3c90361ed7f57d65fc609f6a1bfb
Tighten host cache SDK typing, client-side watermark validation, and null-safe query helper conditions. Resolves: ZSTAC-85984 Change-Id: I86b1215f6698decbe9dfdb0411ba39f42de36c13
Host cache replies now opt in to typed collection generation through SDKGeneric so generated SDK output remains repeatable. The generated action keeps validation on the server side instead of hand-written SDK code that runMavenProfile sdk overwrites. Resolves: ZSTAC-85984 Change-Id: Ic7d27efe4a1551802f725560b3406544b78efa69
Use a stable sourceRootIdentity for host cache policy uniqueness and backfill existing rows during upgrade. Resolves: ZSTAC-85984 Change-Id: I4ada66f7436c1c120a06adbca4128e99065bb03c
Update the persisted currentVersion from the agent-reported version instead of treating equal expected/current database values as unchanged. This keeps grayscale upgrade checks aligned with older agents reported during host ping.\n\nTested: ./runstablilitycase org.zstack.test.integration.premium.kvm.upgrade.GrayscaleUpgradeCase 1 in verify-host-cache-5522-rebuild2 after ./runMavenProfile premium. Resolves: ZSTAC-85984 Change-Id: Ibdff1c5ab2740ce98ba47bbee1be874dfa564e96
Backport host cache source-root identity handling and regenerate SDK/apihelper output. Resolves: ZSTAC-85984 Change-Id: Ia3a10f087b26a8fa6eaa3513bca04af96f3a8757
Create the new host cache storage sourceRootIdentity unique index before dropping the legacy sourceRoot-prefix index so a failed ALTER cannot leave the upgrade without either uniqueness guard. Resolves: ZSTAC-85984 Change-Id: Ibee237a7062337f9b3bf64dd8450dad26a315822
Regenerate ApiHelper on feature-5.5.28 after rebasing host cache API changes so query helper condition handling stays in sync with ApiHelperGenerator and duplicate host-cache helpers are not carried forward. Resolves: ZSTAC-85984 Change-Id: If9c1d33691a9ca7a146855d18158b4af5f43dd06
12447c7 to
e41dcac
Compare
There was a problem hiding this comment.
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java (1)
254-277: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
trackHost()和stop()之间仍有竞态。
stopped只在 Line 255 做了一次无锁检查,而 Line 436-438 的取消/清理也没有和 tracker 的创建注册串行化。一个线程如果在这次检查之后继续执行,另一个线程同时调用stop(),新的Tracker仍然可能在组件停止后被put并启动,导致后台继续 ping host。建议用同一把锁保护stopped检查、trackers.put()/start*()和stop()的清理流程。Also applies to: 435-438
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java` around lines 254 - 277, `trackHost()` still races with `stop()`: the `stopped` check, `trackers.put()/start*()`, and the cleanup in `stop()` are not serialized, so a new `Tracker` can be created and started after shutdown begins. Protect the `trackHost` registration/start path and the `stop()` cancellation/cleanup path with the same lock or synchronized section, using the existing `stopped` field, `trackers` map, and `Tracker.start()/startRightNow()/cancel()` flow to ensure no tracker is added or started once stopping begins.plugin/externalStorage/src/main/java/org/zstack/externalStorage/primary/kvm/ExternalPrimaryStorageKvmFactory.java (1)
378-423: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win不要在异步
deactivate()调用后立刻判断deactivateErr[0]。
nodeSvc.deactivate(...)通过Completion返回结果,但下面马上读取deactivateErr[0]。只要实现是真异步的,这里就看不到失败,blacklist(...)根本不会执行;而且这个共享数组在多个 client 之间也会互相覆盖。应把 blacklist 逻辑放进fail(ErrorCode)回调里,或改成显式同步流程。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/externalStorage/src/main/java/org/zstack/externalStorage/primary/kvm/ExternalPrimaryStorageKvmFactory.java` around lines 378 - 423, The deactivate-and-blacklist flow in ExternalPrimaryStorageKvmFactory is reading deactivateErr[0] immediately after nodeSvc.deactivate(...), which is unsafe for a Completion-based async call and can also be overwritten across clients. Move the blacklist decision into the Completion.fail(ErrorCode) callback for that deactivate invocation, and keep the success logging in Completion.success(); use the existing nodeSvc.deactivate, nodeSvc.blacklist, and deactivateErr handling area as the main location to update.
🟡 Minor comments (13)
plugin/xinfini/src/main/java/org/zstack/xinfini/XInfiniStorageController.java-1121-1138 (1)
1121-1138: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win不要吞掉线程中断。
Line 1130-1132 捕获
InterruptedException后直接忽略,会清掉中断语义并继续重试;停机、取消任务或线程池回收时,这段代码会比预期多执行几轮。这里至少应恢复中断标记并立即退出重试。建议修改
try { TimeUnit.SECONDS.sleep(3); - } catch (InterruptedException ignore) {} + } catch (InterruptedException ie) { + Thread.currentThread().interrupt(); + throw new RuntimeException("retry interrupted", ie); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/xinfini/src/main/java/org/zstack/xinfini/XInfiniStorageController.java` around lines 1121 - 1138, The retryOrThrow method is swallowing thread interruption when TimeUnit.SECONDS.sleep(3) throws InterruptedException. Update the InterruptedException handling inside retryOrThrow to restore the interrupt status on the current thread and stop retrying immediately instead of continuing the loop. Keep the fix localized to retryOrThrow in XInfiniStorageController so the retry logic respects cancellation and shutdown.sdk/src/main/java/org/zstack/sdk/network/zns/QueryZnsTenantRouterResult.java-6-11 (1)
6-11: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win将
inventories改为带元素类型的集合
inventories现在仍是 rawList,SDK 侧会丢失元素类型信息;这里应生成List<ZnsTenantRouterInventory>,避免公共契约退化成非类型化集合。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdk/src/main/java/org/zstack/sdk/network/zns/QueryZnsTenantRouterResult.java` around lines 6 - 11, The QueryZnsTenantRouterResult.inventories field and its getInventories/setInventories accessors are using a raw List, which drops element type information in the SDK contract. Update the inventories declaration and both accessor signatures in QueryZnsTenantRouterResult to use List<ZnsTenantRouterInventory>, keeping the same property name but preserving the typed inventory element throughout the public API.sdk/src/main/java/org/zstack/sdk/GetModelServiceLaunchCommandsResult.java-6-11 (1)
6-11: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win把
commands改成带泛型的集合。sdk/src/main/java/org/zstack/sdk/GetModelServiceLaunchCommandsResult.java里这里还是原始List,SDK 使用方拿不到元素类型;如果返回项都是ModelServiceLaunchCommandInventory,应改成List<ModelServiceLaunchCommandInventory>,避免后续强转和反序列化歧义。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdk/src/main/java/org/zstack/sdk/GetModelServiceLaunchCommandsResult.java` around lines 6 - 11, Update GetModelServiceLaunchCommandsResult so the commands field and its getter/setter use a typed List<ModelServiceLaunchCommandInventory> instead of a raw List. Fix the declaration and method signatures in the GetModelServiceLaunchCommandsResult class, keeping the same property name but adding the generic element type so SDK consumers can rely on the returned inventory type without casts.header/src/main/java/org/zstack/header/network/sdncontroller/SdnControllerInventory.java-62-82 (1)
62-82: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win补上无端口方括号 IPv6 的归一化分支。
toHostAddress()现在只处理了[ipv6]:port,但像[2001:db8::1]这样的值会直接落到return value,最后把方括号一并塞进ip。Line 41 已经把所有vo.getIp()都改成走这个方法了,所以一旦库里存在这种端点,返回给上层的就不是纯 host address。💡 建议修改
private static String toHostAddress(String endpoint) { if (endpoint == null) { return null; } String value = endpoint.trim(); if (value.startsWith("[")) { int rightBracket = value.indexOf(']'); - if (rightBracket > 1 && rightBracket + 1 < value.length() && value.charAt(rightBracket + 1) == ':') { + if (rightBracket > 1 && (rightBracket == value.length() - 1 + || (rightBracket + 1 < value.length() && value.charAt(rightBracket + 1) == ':'))) { return value.substring(1, rightBracket); } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/network/sdncontroller/SdnControllerInventory.java` around lines 62 - 82, `toHostAddress()` in `SdnControllerInventory` only normalizes bracketed IPv6 when a port is present, so bracketed IPv6 without a port still returns with brackets. Add a branch in `toHostAddress(String endpoint)` to detect values like `[2001:db8::1]` and strip the surrounding brackets before the fallback return, while keeping the existing `[ipv6]:port` and single-port IPv4 logic unchanged. Ensure all callers that now route `vo.getIp()` through `toHostAddress()` receive a plain host address.test/src/test/groovy/org/zstack/test/integration/image/DeleteIsoCase.groovy-175-180 (1)
175-180: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win先把
iso2挂到newVm2,再验证卸载行为。当前
newVm2只挂载了iso1,随后卸载未挂载的iso2;这既可能导致测试失败,也没有覆盖“卸载一个 ISO 后另一个仍保留”的目标场景。建议修改
attachIsoToVmInstance { vmInstanceUuid = newVm2.uuid isoUuid = iso1.uuid } + + attachIsoToVmInstance { + vmInstanceUuid = newVm2.uuid + isoUuid = iso2.uuid + } detachIsoFromVmInstance { vmInstanceUuid = newVm2.uuid isoUuid = iso2.uuid🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/src/test/groovy/org/zstack/test/integration/image/DeleteIsoCase.groovy` around lines 175 - 180, The test in DeleteIsoCase.groovy is detaching iso2 from newVm2 without ever attaching iso2 first, so the scenario is invalid and does not verify the intended behavior. Update the setup around detachIsoFromVmInstance to first attach iso2 to newVm2, then detach iso2, and keep the assertion using IsoOperator.getIsoUuidByVmUuid to confirm the remaining ISO is still present on newVm2.docs/modules/network/pages/networkResource/ZnsIntegration.adoc-700-706 (1)
700-706: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win版本矩阵里的几个路径和正文定义不一致。
这里把
DELETE /zns/api/v1/segments、PATCH /zns/api/v1/segments/{id}/ports、DELETE /zns/api/v1/segments/{id}/ports写成了不带资源 ID 的形式,但文档前面和后面的示例都在使用/segments/{uuid}、/ports/{id}。按这张表实现会直接调用错接口。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/modules/network/pages/networkResource/ZnsIntegration.adoc` around lines 700 - 706, The version matrix entries for the segment and ports endpoints are inconsistent with the rest of the document and should use the same resource-id patterns as the surrounding examples. Update the affected rows in ZnsIntegration.adoc so the DELETE/PATCH ports and DELETE segments paths match the documented endpoint forms used by the API examples, and verify the version columns still align with the intended operations. Use the endpoint labels in the matrix itself to locate the mismatched entries.plugin/kvm/src/main/java/org/zstack/kvm/KVMHotUnplugVmShmemMsg.java-18-20 (1)
18-20: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win不要把
null留给总线路由阶段再报错。
normalize()现在只拒绝空白字符串,不拒绝null,所以hostUuid/vmUuid这类必填标识仍然可以写成null。对HostMessage来说,这会把错误延后到消息路由时才暴露,和这里已经引入的输入校验策略也不一致。🛠️ 建议修改
public void setHostUuid(String hostUuid) { - this.hostUuid = normalize(hostUuid, "hostUuid"); + this.hostUuid = requireNonBlank(hostUuid, "hostUuid"); } @@ public void setVmUuid(String vmUuid) { - this.vmUuid = normalize(vmUuid, "vmUuid"); + this.vmUuid = requireNonBlank(vmUuid, "vmUuid"); } @@ + private String requireNonBlank(String value, String fieldName) { + if (value == null) { + throw new IllegalArgumentException(fieldName + " cannot be null"); + } + return normalize(value, fieldName); + } + private String normalize(String value, String fieldName) { - if (value == null) { - return null; - } - String normalized = value.trim(); if (normalized.isEmpty()) { throw new IllegalArgumentException(fieldName + " cannot be empty"); } return normalized;Also applies to: 26-28, 57-65
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHotUnplugVmShmemMsg.java` around lines 18 - 20, `KVMHotUnplugVmShmemMsg` and the other affected setters are still allowing `null` for required identifiers because `normalize()` only rejects blank strings. Update the input validation in the setters for `hostUuid` and `vmUuid` so they fail immediately on `null` as well as empty/blank values, rather than deferring the error to `HostMessage` routing. Keep the existing validation pattern in the message setters consistent with the rest of the class.plugin/kvm/src/main/java/org/zstack/kvm/KVMHotPlugVmShmemMsg.java-18-20 (1)
18-20: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win不要把
null留给总线路由阶段再报错。
normalize()现在只拒绝空白字符串,不拒绝null,所以hostUuid/vmUuid这类必填标识仍然可以写成null。对HostMessage来说,这会把错误延后到消息路由时才暴露,和这里已经引入的输入校验策略也不一致。🛠️ 建议修改
public void setHostUuid(String hostUuid) { - this.hostUuid = normalize(hostUuid, "hostUuid"); + this.hostUuid = requireNonBlank(hostUuid, "hostUuid"); } @@ public void setVmUuid(String vmUuid) { - this.vmUuid = normalize(vmUuid, "vmUuid"); + this.vmUuid = requireNonBlank(vmUuid, "vmUuid"); } @@ + private String requireNonBlank(String value, String fieldName) { + if (value == null) { + throw new IllegalArgumentException(fieldName + " cannot be null"); + } + return normalize(value, fieldName); + } + private String normalize(String value, String fieldName) { - if (value == null) { - return null; - } - String normalized = value.trim(); if (normalized.isEmpty()) { throw new IllegalArgumentException(fieldName + " cannot be empty"); } return normalized;Also applies to: 26-28, 57-65
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHotPlugVmShmemMsg.java` around lines 18 - 20, `KVMHotPlugVmShmemMsg` 的必填标识校验还允许 `null` 通过,导致 `hostUuid`/`vmUuid` 之类的错误延后到消息路由阶段才暴露;请在 `normalize()` 以及相关 setter(如 `setHostUuid`、`setVmUuid`)里同时拒绝 `null` 和空白值,并确保所有必填字段在 `HostMessage` 构造/赋值时就被校验一致地失败。test/src/test/groovy/org/zstack/test/integration/kvm/host/HostAgentRestartDrainCase.groovy-86-115 (1)
86-115: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win失败路径也要保证释放 stuck 调用。
这 3 个用例都是在断言之后才
countDown()。一旦任一断言失败,模拟器侧会继续阻塞到 120 秒超时,既拖慢失败用例,也会污染后续断言。这里最好统一用try/finally释放。建议模式
void testFirstConnectDrainsStaleCalls() { StuckCall h = sendStuckCall(host.uuid) - overrideConnectResponse(true, System.currentTimeMillis() + 70_000L) - reconnectHost { - uuid = host.uuid - } - assert h.done.await(5, TimeUnit.SECONDS) - assert h.received.get() != null - assert h.received.get().code == SysErrors.OPERATION_ERROR.toString() - h.release.countDown() + try { + overrideConnectResponse(true, System.currentTimeMillis() + 70_000L) + reconnectHost { + uuid = host.uuid + } + assert h.done.await(5, TimeUnit.SECONDS) + assert h.received.get() != null + assert h.received.get().code == SysErrors.OPERATION_ERROR.toString() + } finally { + h.release.countDown() + } }其余两个测试方法建议套用同样模式。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/src/test/groovy/org/zstack/test/integration/kvm/host/HostAgentRestartDrainCase.groovy` around lines 86 - 115, These test methods leave the stuck-call release in the success path only, so a failed assertion can keep the simulator blocked until timeout. Update testFirstConnectDrainsStaleCalls, testNoFirstConnectKeepsStaleCalls, and testDifferentResourceUuidIsolated to wrap the assertions in try/finally and always invoke h.release.countDown() in the finally block, so the StuckCall is released even when a check fails.test/src/test/bash/test_zstack_server_mariadb_recovery.sh-108-112 (1)
108-112: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win这里不要把
PATH整个替换掉。这两个场景把
PATH设成只包含 stub 目录,conf/install/zstack-server里只要再调用一个未 stub 的系统命令,测试就会因为环境缺失失败,而不是因为 MariaDB 恢复逻辑失败。这里更稳妥的是前置 stub 目录:PATH="$sysv_bin_dir:$PATH"/PATH="$no_probe_bin_dir:$PATH"。Also applies to: 138-142
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/src/test/bash/test_zstack_server_mariadb_recovery.sh` around lines 108 - 112, These test invocations are replacing PATH entirely with only the stub directory, which can make the MariaDB recovery tests fail due to missing system commands instead of exercising the recovery logic. Update the start calls in the bash test to prepend the stub directory to the existing PATH in both affected scenarios, using the same pattern around the /bin/sh invocation so conf/install/zstack-server still has access to normal system utilities while picking up the stubs first.docs/modules/network/pages/networkResource/ZStackL2NetworkType.adoc-451-452 (1)
451-452: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
srcPath规则这里自相矛盾。Line 451-452 说
OVN_DPDK和ZNS都会设置srcPath,但 Line 478-480 又写L2GeneveNetwork + ZNS不设置且“仅 OVN_DPDK 设置”。这两处至少有一处不对,建议统一条件,并顺手说明 Geneve 是否是例外路径。Also applies to: 478-480
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/modules/network/pages/networkResource/ZStackL2NetworkType.adoc` around lines 451 - 452, The `srcPath` behavior described in `completeNicInformation()` is inconsistent between the `OVN_DPDK`/`ZNS` note and the `L2GeneveNetwork` exception note. Update the documentation so the condition is stated once, consistently, using the `completeNicInformation()` and `srcPath` descriptions to make clear whether `ZNS` also sets the DPDK vhost-user socket path or is excluded alongside `L2GeneveNetwork`; explicitly call out the Geneve case as an exception if applicable.docs/modules/network/pages/networkResource/ZStackL2NetworkType.adoc-34-41 (1)
34-41: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win文档里的 VmNicType 名称前后不一致。
Line 34-41 的示例一直写
VIRTUAL_NIC,但 Line 343-371 的总表又写成了VNIC/TFVNIC。除非这里想表达“代码常量名”和“展示名”两套概念,否则读者会直接把它们理解成不同类型,排障时也容易和实际 type 字符串对不上。Also applies to: 343-371
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/modules/network/pages/networkResource/ZStackL2NetworkType.adoc` around lines 34 - 41, The VmNicType naming is inconsistent across the document, with one section using VIRTUAL_NIC while the summary table uses VNIC/TFVNIC, which makes the types look different. Update the ZStackL2NetworkType.adoc content so the same VmNicType identifiers are used consistently in both the examples and the total table, or explicitly label one set as display names and the other as code constants. Check the network type examples and the summary table entries together to ensure the names map unambiguously to the same underlying type.header/src/main/java/org/zstack/header/vm/VmNicLifecycleExtensionPoint.java-38-41 (1)
38-41: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win给
setupOnHost(VmNicLifecycleContext, ...)补上 Javadoc。这个默认方法已经是扩展点公开契约的一部分,但现在是本接口里唯一没有方法注释的入口,后续实现方很难知道
context的用途以及它和旧重载的关系。As per path instructions,接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/vm/VmNicLifecycleExtensionPoint.java` around lines 38 - 41, The default `setupOnHost(VmNicLifecycleContext, String, List<VmNicInventory>, Completion)` method in `VmNicLifecycleExtensionPoint` needs a proper Javadoc comment and should clearly explain `context` plus its delegation relationship to the older `setupOnHost(hostUuid, nics, completion)` overload. Update the method-level documentation on this entry point so the interface contract is explicit, and keep the signature free of any unnecessary modifiers beyond `default`.Source: Path instructions
🧹 Nitpick comments (13)
header/src/main/java/org/zstack/header/network/l3/AfterSetL3NetworkMtuExtensionPoint.java (1)
5-6: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win给扩展点方法补上 Javadoc。
这里的接口方法现在没有文档,扩展实现方无法从声明处确认
completion的回调时机和失败语义。As per path instructions, “接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释”。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/network/l3/AfterSetL3NetworkMtuExtensionPoint.java` around lines 5 - 6, The AfterSetL3NetworkMtuExtensionPoint contract is missing Javadoc on afterSetL3NetworkMtu, so add a valid doc comment on the interface method explaining the purpose, the callback timing for completion, and how failures should be reported. Make sure the declaration stays clean for this extension-point style by avoiding any redundant modifiers and keeping the signature centered on AfterSetL3NetworkMtuExtensionPoint.afterSetL3NetworkMtu.Source: Path instructions
plugin/kvm/src/main/java/org/zstack/kvm/VmEventAlarmHandlerExtensionPoint.java (1)
3-4: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win给扩展点方法补上 Javadoc。
这个接口是跨实现类的扩展契约,当前方法缺少文档,事件来源、允许抛错方式和调用方预期都不够清晰。As per path instructions, “接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释”。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/kvm/src/main/java/org/zstack/kvm/VmEventAlarmHandlerExtensionPoint.java` around lines 3 - 4, The extension-point contract in VmEventAlarmHandlerExtensionPoint needs documentation and a clean interface signature: remove any redundant method modifier on handleVmEventAlarm, and add a proper Javadoc comment that explains the event source, expected caller behavior, and whether implementations may throw exceptions. Keep the method easy to locate by name and ensure the interface method follows the path rule requiring valid Javadoc.Source: Path instructions
plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java (1)
15-28: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win给新增接口方法补齐 Javadoc。
resolveStatus(...)和changeSdnControllerStatus(...)都是新的公共接口,但这里没有方法级 Javadoc。建议把事件到状态的映射规则、default分支语义,以及changeSdnControllerStatus()是否负责持久化写清楚。 As per path instructions, “接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释。”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java` around lines 15 - 28, 当前新增的公共接口方法缺少方法级 Javadoc,需在 SdnControllerFactory 中为 resolveStatus(...) 和 changeSdnControllerStatus(...) 补齐注释;请说明 resolveStatus(...) 各事件到 SdnControllerStatus 的映射规则、default 分支返回 vo.getStatus() 的语义,以及 changeSdnControllerStatus(...) 的职责是否包含状态持久化/更新流程,同时保持接口方法不使用多余修饰符。Source: Path instructions
header/src/main/java/org/zstack/header/vm/BeforeAllocateVmNicExtensionPoint.java (1)
12-13: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win给扩展点方法补上方法级 Javadoc。
Line 13 是新的公共扩展回调,但现在只有类型级说明。这里至少要把调用时机、
Completion的回调要求,以及失败后的回滚语义写到方法 Javadoc 里,否则实现方很难准确遵守契约。 As per path instructions, “接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释。”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/vm/BeforeAllocateVmNicExtensionPoint.java` around lines 12 - 13, `BeforeAllocateVmNicExtensionPoint.beforeAllocateVmNic` is a new public extension callback that needs method-level Javadoc and should not declare a redundant `public` modifier. Add Javadoc on `beforeAllocateVmNic(VmNicInventory, VmInstanceSpec, Completion)` describing when it is invoked, how implementers must complete the `Completion`, and what rollback/failure behavior is expected if the callback errors. Keep the interface declaration aligned with the contract by removing any unnecessary method modifier and ensuring the method is documented.Source: Path instructions
compute/src/main/java/org/zstack/compute/vm/VmNicManager.java (1)
24-24: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win给新重载补上 Javadoc。
Line 24 新增的是接口契约方法,但这里没有说明
vmSystemTags的语义,以及它和双参重载在标签缺失时的回退关系,后续实现和调用很容易各自理解。建议把参数用途、优先级和返回约定写进 Javadoc。
As per path instructions, 接口方法必须配有有效的 Javadoc 注释。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/vm/VmNicManager.java` at line 24, The new VmNicManager.getVmNicType overload is missing the required interface Javadoc and the contract for vmSystemTags is unclear. Add Javadoc on this method describing the purpose of vmSystemTags, how it is used to determine the NIC type, and the fallback behavior relative to the two-argument getVmNicType overload when tags are absent or do not match. Keep the return contract explicit so implementers and callers understand the precedence and expected result.Source: Path instructions
network/src/main/java/org/zstack/network/l2/L2NetworkApiInterceptor.java (1)
146-279: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win把
validate(APIChangeL2NetworkVlanIdMsg)按目标类型拆开。这一段现在同时承载了 type 归一化、宿主机/集群前置检查、VLAN/NoVlan 冲突校验,以及 Geneve/Vxlan 的 VNI 校验,分支已经偏深;后面再加一种类型时很容易改漏。建议在归一化
targetType后尽早返回,并拆成validateVlanTarget(...)、validateNoVlanTarget(...)、validateOverlayTarget(...)之类的私有方法。As per path instructions, “应尽量减少 if...else 结构的使用,建议尽早返回(Early Return),将条件判断中的处理逻辑提前结束或抽成独立方法。”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@network/src/main/java/org/zstack/network/l2/L2NetworkApiInterceptor.java` around lines 146 - 279, `validate(APIChangeL2NetworkVlanIdMsg)` is doing too many type-specific checks in one deeply nested flow; split it into small private helpers keyed off `targetType`. After normalizing `msg.getType()` and determining `targetIsVlan`, `targetIsNoVlan`, `targetIsGeneve`, and `targetIsVxlan`, use early returns and delegate to methods like `validateVlanTarget(...)`, `validateNoVlanTarget(...)`, and `validateOverlayTarget(...)` to keep the host/cluster checks and VNI/VLAN validations isolated. Keep the existing behavior in `L2NetworkApiInterceptor.validate(...)`, but reduce the `if/else` chain so future type additions only require touching one focused helper.Source: Path instructions
header/src/main/java/org/zstack/header/storage/addon/primary/PrimaryStorageNodeSvc.java (1)
16-16: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win请补充
blacklist的接口契约说明。这里移除了
Completion,调用方需要明确该方法现在是同步完成、异步触发还是 best-effort;同时接口方法按规范需要有效 Javadoc。建议补充说明
+ /** + * Adds the volume path to the blacklist on the specified host. + */ void blacklist(String installPath, String protocol, HostInventory h);As per path instructions,
接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/storage/addon/primary/PrimaryStorageNodeSvc.java` at line 16, The blacklist method in PrimaryStorageNodeSvc lacks the required interface contract documentation after removing Completion, so add a valid Javadoc to blacklist(String installPath, String protocol, HostInventory h) that clearly states whether it is synchronous, asynchronous, or best-effort and describes its behavior. Also ensure the interface declaration follows the style rule for interface methods by keeping it without extra modifiers such as public.Source: Path instructions
plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerL2.java (1)
41-41: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win请为新的默认扩展点补充 Javadoc。
releaseNicIps的默认 no-op 行为会影响厂商实现是否必须 override,建议在接口上写清楚默认语义和适用场景。As per path instructions,
接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerL2.java` at line 41, The new default extension method releaseNicIps in SdnControllerL2 needs Javadoc and should clearly document that the default implementation is a no-op that immediately succeeds, so vendor implementations know when overriding is optional. Update the interface-level documentation around releaseNicIps to describe its default semantics and intended usage, and ensure the method signature follows interface conventions without any unnecessary modifiers.Source: Path instructions
plugin/kvm/src/main/java/org/zstack/kvm/VmNicLifecycleKvmBridge.java (1)
27-30: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win请避免新增缩写成员名。
pluginRgty和thdf是不必要缩写,建议改为pluginRegistry、threadFacade,避免新代码继续扩散既有缩写风格。建议重命名
`@Autowired` - private PluginRegistry pluginRgty; + private PluginRegistry pluginRegistry; `@Autowired` - private ThreadFacade thdf; + private ThreadFacade threadFacade; private List<VmNicLifecycleExtensionPoint> getExtensions() { - return pluginRgty.getExtensionList(VmNicLifecycleExtensionPoint.class); + return pluginRegistry.getExtensionList(VmNicLifecycleExtensionPoint.class); }- ThreadFacadeImpl.TimeoutTaskReceipt receipt = thdf.submitTimeoutTask(() -> { + ThreadFacadeImpl.TimeoutTaskReceipt receipt = threadFacade.submitTimeoutTask(() -> {As per path instructions,
不允许使用不必要的缩写.Also applies to: 32-34, 80-80
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/kvm/src/main/java/org/zstack/kvm/VmNicLifecycleKvmBridge.java` around lines 27 - 30, The issue is unnecessary abbreviated member names in VmNicLifecycleKvmBridge. Rename the injected fields and any related usages from pluginRgty and thdf to clearer names like pluginRegistry and threadFacade, and update the affected references in the constructor and methods so the class avoids propagating shorthand naming.Source: Path instructions
plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerConstants.java (1)
32-32: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win修正常量名中的拼写错误,避免扩散公共 API。
新增常量使用了
PROTOCL,建议合入前改为HEALTH_CHECK_TARGET_PROTOCOL_NONE;如需兼容既有拼写风格,可额外保留 deprecated alias。As per path instructions, “常量命名:全部大写,使用下划线分隔单词。要求表达清楚,避免使用含糊或不准确的名称。”建议修改
- public static final String HEALTH_CHECK_TARGET_PROTOCL_NONE = "none"; + public static final String HEALTH_CHECK_TARGET_PROTOCOL_NONE = "none"; ... - HEALTH_CHECK_TARGET_PROTOCOLS.add(HEALTH_CHECK_TARGET_PROTOCL_NONE); + HEALTH_CHECK_TARGET_PROTOCOLS.add(HEALTH_CHECK_TARGET_PROTOCOL_NONE);Also applies to: 135-135
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerConstants.java` at line 32, The constant name in LoadBalancerConstants contains a spelling error, so update the public API symbol HEALTH_CHECK_TARGET_PROTOCL_NONE to the correctly spelled HEALTH_CHECK_TARGET_PROTOCOL_NONE. If backward compatibility is needed, keep the old name as a deprecated alias that points to the new constant so callers can migrate without breakage.Source: Path instructions
plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/network/TfPortService.java (1)
123-126: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win把参数名统一成
portUuid。新重载继续用了
portUUid,不符合这里要求的lowerCamelCase,也会把这套拼写继续扩散到新的公共签名里。这个位置现在改名几乎没有成本,建议直接统一。As per path instructions, “方法名、参数名、成员变量和局部变量使用 lowerCamelCase 风格。”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/network/TfPortService.java` around lines 123 - 126, The deleteTfPort API still uses the misspelled parameter name portUUid instead of lowerCamelCase portUuid, and this inconsistent public signature will propagate the typo. Rename the parameter in TfPortService.deleteTfPort to portUuid, update the local usage in StringDSL.transToTfUuid, and keep the surrounding method signature and any overloads aligned with the same naming convention.Source: Path instructions
header/src/main/java/org/zstack/header/rest/RESTFacade.java (1)
42-42: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win给新接口补一段 Javadoc。
这里是接口方法,当前没有说明
cutoffMillis的时间基准、返回值语义,以及err会如何传播,后续实现和调用都容易各自理解。As per path instructions, "接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释。"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/rest/RESTFacade.java` at line 42, Add a Javadoc block for RESTFacade.failPendingCallsForResourceBefore that explains the cutoffMillis time basis, the return value meaning, and how err is used/propagated to failed pending calls; keep the interface method signature unchanged except ensuring it has no redundant modifiers, and make the documentation clear enough for both implementers and callers to understand the semantics.Source: Path instructions
header/src/main/java/org/zstack/header/vm/VmBeforeStartOnHypervisorExtensionPoint.java (1)
5-13: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win给扩展点方法补上有效的 Javadoc。
这里已经是公开扩展契约了,但当前只有空注释块,调用语义和失败约定都不清楚。As per path instructions, "接口方法不应有多余的修饰符(例如
public),且必须配有有效的 Javadoc 注释"。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/vm/VmBeforeStartOnHypervisorExtensionPoint.java` around lines 5 - 13, Add proper Javadoc to VmBeforeStartOnHypervisorExtensionPoint and its public contract methods beforeStartVmOnHypervisor(VmInstanceSpec) and beforeStartVmOnHypervisor(VmInstanceSpec, Completion), describing when each is invoked, what implementers should do, and how success/failure is signaled through Completion; also ensure the interface method signatures follow the project style by avoiding unnecessary public modifiers on interface methods.Source: Path instructions
Summary
Root Cause
./runMavenProfile sdk.Change
@SDKGenericmarker and SDK generator support for typed collection output.Verification
./runMavenProfile premiumpassed after rebuilding from:aithroughtest-premium.AccessKeyBasicCasepassed with this core branch mounted.mvn -P premium -pl premium/plugin-premium/ai -am install -DskipTests -Djacoco.skip=truepassed.mvn -pl sdk -DskipTests -Djacoco.skip=true compilepassed../runMavenProfile sdkpassed; a later rerun after full local install hit a localzstack-iam2VerifyError during reflections scan, not an SDK compile error.Risk
Resolves: ZSTAC-85984
sync from gitlab !10285