fix: reject invalid backup uploads#9130
Conversation
|
Warning Gemini encountered an error creating the review. You can try again by commenting |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- In
upload_complete, ifvalidate_uploaded_backupraises (e.g., invalid zip/manifest), the mergedoutput_pathfile and upload session/chunk directory are left behind; consider mirroring the cleanup logic used inupload_backupso the merged file and session state are removed on validation failure. validate_uploaded_backupcurrently treats any non-dict manifest the same as a missing/invalid manifest; if you expect other JSON shapes (like lists or primitive types) it may be worth logging or surfacing a more specific error to aid debugging rather than silently returningNone.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `upload_complete`, if `validate_uploaded_backup` raises (e.g., invalid zip/manifest), the merged `output_path` file and upload session/chunk directory are left behind; consider mirroring the cleanup logic used in `upload_backup` so the merged file and session state are removed on validation failure.
- `validate_uploaded_backup` currently treats any non-dict manifest the same as a missing/invalid manifest; if you expect other JSON shapes (like lists or primitive types) it may be worth logging or surfacing a more specific error to aid debugging rather than silently returning `None`.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces validation for uploaded backup files by verifying the presence and format of manifest.json in both single-file and chunked upload flows, along with corresponding unit tests. The review feedback identifies a resource leak in chunked uploads where validation failure bypasses session cleanup, warns about potential exception masking during file deletion, and suggests adding assertions to verify session cleanup in tests.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| outfile.write(data_block) | ||
|
|
||
| file_size = os.path.getsize(output_path) | ||
| self.validate_uploaded_backup(output_path) |
There was a problem hiding this comment.
If validate_uploaded_backup raises a BackupServiceError, the execution jumps to the outer except Exception: block, which deletes the merged ZIP file but bypasses await self.cleanup_upload_session(upload_id). This leaves the temporary chunk files and the session in memory until they expire, causing a resource leak. Wrapping the validation in a try...except block ensures the session is cleaned up immediately on validation failure.
| self.validate_uploaded_backup(output_path) | |
| try: | |
| self.validate_uploaded_backup(output_path) | |
| except Exception: | |
| await self.cleanup_upload_session(upload_id) | |
| raise |
There was a problem hiding this comment.
Addressed in commit 525f573. upload_complete() now wraps upload validation so invalid merged archives call cleanup_upload_session(upload_id) before re-raising, and the existing outer failure path still removes the merged ZIP output. The focused regression test now also asserts that the upload session is removed and the temporary chunk directory no longer exists.
There was a problem hiding this comment.
Addressed in commit 525f573. upload_complete() now wraps upload validation so invalid merged archives call cleanup_upload_session(upload_id) before re-raising, and the existing outer failure path still removes the merged ZIP output. The focused regression test now also asserts that the upload session is removed and the temporary chunk directory no longer exists.
| except Exception: | ||
| if os.path.exists(zip_path): | ||
| os.remove(zip_path) | ||
| raise |
There was a problem hiding this comment.
If os.remove(zip_path) raises an exception (such as PermissionError or FileNotFoundError), it will mask the original exception (e.g., BackupServiceError from validation failure). Wrapping the cleanup in a nested try...except block ensures that any secondary cleanup errors are logged but do not obscure the primary error.
except Exception:
try:
if os.path.exists(zip_path):
os.remove(zip_path)
except Exception as e:
logger.warning(f"清理失败的备份文件失败: {e}")
raiseThere was a problem hiding this comment.
Addressed in commit 525f573. Direct upload cleanup now runs in a nested try/except, logs cleanup failures, and then re-raises the original validation/save failure so a secondary os.remove() error does not mask the primary error.
| with pytest.raises(BackupServiceError, match="无效的备份文件"): | ||
| await service.upload_complete({"upload_id": "upload-1"}) | ||
|
|
||
| assert not (backup_dir / "bad.zip").exists() |
There was a problem hiding this comment.
There was a problem hiding this comment.
Added in commit 525f573. The invalid chunked upload regression now asserts both upload-1 not in service.upload_sessions and that the temporary chunk directory was removed.
Uploaded backup files were previously accepted based mainly on the
.zipfilename suffix. That allowed invalid archives to enter the backups directory even when they were not usable AstrBot backups.Modifications / 改动点
What Problem This Solves
Before this change, the backup upload path mainly gated uploaded files by filename shape: a file ending in
.zipcould be saved into the backups directory before AstrBot checked whether it was a usable AstrBot backup archive.In practice, the old acceptance boundary was closer to this shape: the filename suffix was checked, then the upload was persisted.
That is different from a valid backup. The ZIP file is only the container; the minimum AstrBot backup contract is that the archive must contain a readable
manifest.jsonwhose JSON value is an object. When an arbitrary invalid ZIP, or a ZIP without a usable manifest, was accepted by the upload path, later backup consumers could not treat it like a real backup.The existing backup listing path already depends on that manifest contract:
So an invalid upload could appear to succeed because the filename ended with
.zip, but the saved file could then be skipped bylist_backups(), unusable for restore/import paths that rely on the same manifest contract, and left behind as an orphaned archive in the backups directory.This PR moves that failure to the upload boundary. Direct uploads and completed chunked uploads are now validated after the ZIP is saved or merged, but before the upload is accepted:
The validation reuses the existing manifest read path and accepts only a readable JSON object:
Archives without a readable JSON-object
manifest.jsonnow fail fast withBackupServiceErrorinstead of being accepted first and failing later in backup consumers.Changes
BackupService.validate_uploaded_backup()to reject uploaded backup archives whenmanifest.jsonis missing, unreadable, or not a JSON object.get_backup_manifest()so it only returns a readable dictionary manifest and logs non-object manifest payloads at debug level.tests/test_backup.pyfor invalid direct upload cleanup and invalid merged archive cleanup after chunked upload completion, including the chunk session/chunk directory cleanup assertions.Evidence
backup_service.pynow validates uploaded backups through the readable dict manifest path before accepting them.mark_backup_as_uploaded(), removes the upload session and chunk directory on validation failure, and still removes the merged ZIP output through the existing failure cleanup path.tests/test_backup.pycovers both invalid direct ZIP uploads and invalid chunked ZIP uploads, including cleanup assertions for the saved/merged archive and the chunked upload session state.zipfile/jsonmanifest-read path and does not add new dependencies. It verifies the minimum readable-dictmanifest.jsoncontract; it does not attempt to fully validate restore completeness or every backup payload file.Possible call chain / impact
New invalid uploads now fail fast with
BackupServiceErrorinstead of being accepted as unusable backup archives. Existing legacy invalid backup files are still handled by the existinglist_backups()skip behavior; this PR changes the upload boundary, not historical cleanup or full backup restore validation.Screenshots or Test Results / 运行截图或测试结果
Before this change, a file named
bad.zipcontaining invalid ZIP bytes could be saved by the upload path. The archive would later be skipped bylist_backups()because its manifest could not be read, so the upload could appear successful while leaving an unusable backup file behind.After this change, both direct uploads and chunked uploads fail fast with
BackupServiceErrorwhen the saved or merged archive does not contain a readablemanifest.jsonobject. The invalid saved or merged file is removed before returning the error, and invalid chunked uploads also clear their upload session and temporary chunk directory.Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
Not applicable: this is a bug fix, not a new feature.
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Validate uploaded backup archives and ensure invalid ZIP files are rejected and cleaned up for both direct and chunked uploads.
Bug Fixes:
Tests: