[NAS Backup] Suppress Errors in Disk Usage Calculation that Caused Backup to Fail.#13424

Open
daviftorres wants to merge 23 commits into
apache:mainfrom
daviftorres:nas-backup-failed
Conversation

@daviftorres

@daviftorres daviftorres commented Jun 15, 2026

Contributor

Description

This PR prevents a backup job from being marked as failed when the statistics step fails after the backup itself has succeeded.

It also appears to fix some silent failures I previously reported in #11727.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@daviftorres

daviftorres commented Jun 15, 2026

Contributor Author

This is the equivalent command for applying the fix:

sed -i 's_du -sb $dest | cut -f1_du -sb $dest 2>/dev/null | cut -f1 || true_g' /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/nasbackup.sh
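Before editing the script in place, the substitution can be previewed on a sample line. This is only a sketch: the `sample` string and the `preview_fix` helper are hypothetical and stand in for the matching line in nasbackup.sh, whose exact content is assumed here. The expression is the same one as in the command above, run without `-i` so nothing is modified.

```shell
#!/usr/bin/env bash
# Hypothetical sample line; the real line in nasbackup.sh may differ.
sample='du -sb $dest | cut -f1'

# Same sed expression as the in-place command, minus -i, so it only
# prints the rewritten line instead of changing any file.
preview_fix() {
    sed 's_du -sb $dest | cut -f1_du -sb $dest 2>/dev/null | cut -f1 || true_g'
}

printf '%s\n' "$sample" | preview_fix
```

If the printed line is not the expected one, the pattern did not match and the in-place command would leave the script unchanged.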

We haven't confirmed the exact root cause of the du failure yet. As a precaution, we applied this fix to all servers and will monitor backups over the next few days.

I am now running tests with 2>>/var/log/cloudstack/agent/nasbackup.err so that I can see the actual error message.
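A minimal sketch of that diagnostic variant, assuming the size is computed the same way as in the fix: du's stderr is appended to a log file rather than discarded, and a failure is still tolerated. The `measure_size` helper and its `err_log` parameter are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the size of a directory in bytes, appending
# any du error output to a log file instead of sending it to /dev/null.
measure_size() {
    local dest="$1"
    local err_log="$2"   # assumed log path chosen by the caller
    local size
    # A du failure leaves size empty but does not abort the caller.
    size=$(du -sb "$dest" 2>>"$err_log" | cut -f1) || true
    echo -n "$size"
}

# Demo against a path that does not exist, then show what du complained about.
demo_dir=$(mktemp -d)
measure_size "$demo_dir/does-not-exist" "$demo_dir/du.err"
cat "$demo_dir/du.err"
rm -rf "$demo_dir"
```

Appending with `2>>` keeps messages from earlier runs, which helps when the failure is intermittent.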

@daviftorres daviftorres marked this pull request as ready for review June 16, 2026 15:02
Add timeout for unmounting backup mount point and cleanup.
@daviftorres

daviftorres commented Jun 17, 2026

Contributor Author

Proposed Changes Rationale

backup_size=$(du -sb "$dest" 2>/dev/null | cut -f1) || true
  • NFS issues may cause the du command to fail.
  • A size retrieval failure should not invalidate a successful backup.
timeout 60 umount "$mount_point" 2>/dev/null || true
rmdir "$mount_point" 2>/dev/null || true
  • Another process may keep the device busy (e.g., parallel backups).
  • Network issues may cause hangs on NFS.
  • Cleanup failures should not invalidate a successful backup.
echo -n "$backup_size"
  • Outputs the size at the end to confirm the script completed past the potentially problematic commands.
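Taken together, the commands above could be sketched as a single closing step. This is an illustration rather than the actual code in nasbackup.sh: the `finish_backup` function name is hypothetical, and the sketch assumes the surrounding script runs with `set -eo pipefail`, which is why each fallible command is guarded with `|| true`.

```shell
#!/usr/bin/env bash
# Hypothetical closing step of a backup: report size, then clean up.
# The body runs in a subshell so that set -eo pipefail stays local to it.
finish_backup() (
    set -eo pipefail
    dest="$1"
    mount_point="$2"

    # A size retrieval failure (e.g. an NFS error) leaves backup_size empty.
    backup_size=$(du -sb "$dest" 2>/dev/null | cut -f1) || true

    # Give umount at most 60 seconds; a busy or hung mount is tolerated.
    timeout 60 umount "$mount_point" 2>/dev/null || true
    rmdir "$mount_point" 2>/dev/null || true

    # Printed last, so any output confirms the cleanup lines were passed.
    echo -n "$backup_size"
)

# Demo with throwaway directories; nothing is mounted, so umount fails
# and the failure is ignored.
work=$(mktemp -d)
mkdir "$work/dest" "$work/mnt"
printf 'data' > "$work/dest/disk.img"
finish_backup "$work/dest" "$work/mnt"
rm -rf "$work"
```

One trade-off of this approach is that an empty size becomes a valid outcome, so whatever consumes the output has to accept a missing value.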

@daviftorres

Contributor Author

Dear @abh1sar, do you think you could help me with this bug? Regards,

@codecov

codecov Bot commented Jun 18, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 18.94%. Comparing base (82986f6) to head (eb53f2b).

Additional details and impacted files
@@            Coverage Diff            @@
##               main   #13424   +/-   ##
=========================================
  Coverage     18.94%   18.94%           
- Complexity    18363    18366    +3     
=========================================
  Files          6192     6192           
  Lines        556361   556361           
  Branches      67908    67908           
=========================================
+ Hits         105397   105407   +10     
+ Misses       439393   439383   -10     
  Partials      11571    11571           
Flag Coverage Δ
uitests 3.51% <ø> (ø)
unittests 20.15% <ø> (+<0.01%) ⬆️


Copilot AI review requested due to automatic review settings June 18, 2026 20:07

Copilot AI left a comment

Contributor

Pull request overview

This PR adjusts the KVM NAS backup script’s “statistics/cleanup” section so that failures while computing backup disk usage (and related cleanup commands) don’t cause an otherwise successful backup job to be marked as failed.

Changes:

  • Capture du output into backup_size and suppress du stderr to avoid failing the script during size calculation.
  • Add timeout around umount and suppress errors from umount/rmdir.
  • Emit the computed backup size at the end of backup_running_vm().

Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated

@DaanHoogland DaanHoogland left a comment

Contributor

clgtm

@DaanHoogland DaanHoogland requested a review from abh1sar June 23, 2026 06:53
Copilot AI review requested due to automatic review settings June 23, 2026 12:20

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated
Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated
Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated
Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated
Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated
Copilot AI review requested due to automatic review settings June 24, 2026 20:55

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated
Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated

@abh1sar abh1sar left a comment

Contributor

Hi @daviftorres,
would it be possible to test the script changes in your environment, where the issue is reproducible?
Some log output would be nice.

Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated
Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated
Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated
Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated
Co-authored-by: Abhisar Sinha <63767682+abh1sar@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 25, 2026 13:19

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread scripts/vm/hypervisor/kvm/nasbackup.sh Outdated
@daviftorres

Contributor Author

would it be possible to test script changes in your env where it i

2026-06-25 13:22:42,360 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-1:[]) (logid:e4aca261) Request:Seq 218-281474976710814:  { Cmd , MgmtId: 90520736259046, via: 218, Ver: v1, Flags: 100111, [{"org.apache.cloudstack.backup.TakeBackupCommand":{"vmName":"i-84-6493-VM","backupPath":"i-84-6493-VM/2026.06.25.13.22.42","backupRepoType":"nfs","backupRepoAddress":"10.4.2.145:/mnt/VAN3-NAS01-STOR-POOL-A1/VAN3-NAS01-DS-01","wait":"0","bypassHostMaintenance":"false"}}] }
2026-06-25 13:22:42,360 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-1:[]) (logid:e4aca261) Processing command: org.apache.cloudstack.backup.TakeBackupCommand
2026-06-25 13:25:04,120 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-1:[]) (logid:e4aca261) Seq 218-281474976710814:  { Ans: , MgmtId: 90520736259046, via: 218, Ver: v1, Flags: 110, [{"org.apache.cloudstack.backup.BackupAnswer":{"size":"(47.67 GB) 51190241222","result":"true","details":"Job type:         Completed

I can find nothing else in the logs related to the backup job.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 26, 2026 01:49

Copilot AI left a comment

Contributor

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@abh1sar

abh1sar commented Jun 26, 2026

Contributor

@blueorangutan package

@blueorangutan


@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan


Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18383

@DaanHoogland

Contributor

@blueorangutan test

@blueorangutan


@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

5 participants