perf: cache hardware detection and optimize warm-path reuse (2/4) #1252

Merged
inimaz merged 3 commits into master from davidberenstein1957/perf-2-hardware-cache
Jun 29, 2026

@davidberenstein1957

@davidberenstein1957 davidberenstein1957 commented Jun 21, 2026

Collaborator

Summary

Part 2/4 of the tracker performance stack. Depends on #1251.

cc @inimaz — this is the hardware-cache layer of the #1246 split.

Adds process-level hardware reuse and faster detection so repeat tracker runs in the same Python process skip repeated probing:

  • New hardware_cache.py with HardwareKind enum and per-process setup cache
  • Reuse CPU/GPU/RAM detection across tracker instances (get_or_run_setup)
  • Cache probe results via @lru_cache (Power Gadget, PowerMetrics, NVIDIA, ROCm)
  • Platform-aware CPU backend order (Mac ARM prefers fast cpu_load before PowerMetrics)
  • Parallel CPU/GPU setup with ThreadPoolExecutor
  • Global cpu_percent prime once per process for cpu_load mode
  • PowerMetrics sudo check timeout (3 s)
  • FAQ note on warm hardware reuse within one process
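
The process-level reuse and parallel setup in the list above could look roughly like the sketch below. `HardwareKind` and `get_or_run_setup` are named in this description, but the enum members, the function signature, the module-level dictionary, and the `setup_hardware` helper are assumptions made for illustration, not the code in `hardware_cache.py`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Any, Callable, Dict


class HardwareKind(Enum):
    # Member names are illustrative; the PR's enum may differ.
    CPU = "cpu"
    GPU = "gpu"
    RAM = "ram"


# Hypothetical per-process cache: one setup result per hardware kind.
_setup_cache: Dict[HardwareKind, Any] = {}
_setup_lock = threading.Lock()


def get_or_run_setup(kind: HardwareKind, setup: Callable[[], Any]) -> Any:
    """Return the cached result for `kind`, or run `setup` and cache it."""
    with _setup_lock:
        if kind in _setup_cache:
            return _setup_cache[kind]
    # Run the (possibly slow) probe outside the lock so that probes for
    # different hardware kinds can overlap.
    result = setup()
    with _setup_lock:
        # setdefault keeps the first stored result if two threads raced.
        return _setup_cache.setdefault(kind, result)


def setup_hardware(
    probes: Dict[HardwareKind, Callable[[], Any]]
) -> Dict[HardwareKind, Any]:
    """Run one setup per hardware kind in parallel threads."""
    with ThreadPoolExecutor(max_workers=max(1, len(probes))) as pool:
        futures = {
            kind: pool.submit(get_or_run_setup, kind, probe)
            for kind, probe in probes.items()
        }
        return {kind: future.result() for kind, future in futures.items()}
```

With this shape, a second call to `setup_hardware` in the same process returns the stored results without invoking the probes again, which is the behaviour the warm-path numbers rely on.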

Benchmarks (measured locally, offline Mac ARM, 2026-06-21)

Cumulative with #1251; fresh subprocess for cold metrics.

| Metric | master | #1251 only | #1251 + this PR | Δ vs master |
| --- | --- | --- | --- | --- |
| Tracker __init__ p50 | 190 ms | 1.9 ms | 1.2 ms | ~99% faster |
| Cold lifecycle p50 (init→start→stop) | 1,685 ms | 1,767 ms | 408 ms | ~76% faster (~4×) |
| Warm lifecycle best (same process) | 1,565 ms | 1,561 ms | 4.8 ms | ~325× faster |
| CLI monitor subprocess p50 | 850 ms | 786 ms | 694 ms | ~18% faster |
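
The benchmark harness itself is not part of this description. As a rough sketch of how cold (fresh subprocess) and warm (same process) p50 figures like these could be collected, assuming hypothetical helper names and using only the standard library:

```python
import statistics
import subprocess
import sys
import time
from typing import Callable, List


def time_in_process(fn: Callable[[], object], runs: int = 5) -> List[float]:
    """Time repeated calls in this process (warm path), in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def time_in_fresh_subprocess(snippet: str, runs: int = 5) -> List[float]:
    """Time a snippet in a new interpreter per run (cold path), in milliseconds.

    The child process times the snippet itself and prints the result, so
    interpreter startup is excluded from each sample.
    """
    wrapper = (
        "import time\n"
        "_t0 = time.perf_counter()\n"
        f"{snippet}\n"
        "print((time.perf_counter() - _t0) * 1000.0)\n"
    )
    samples = []
    for _ in range(runs):
        out = subprocess.run(
            [sys.executable, "-c", wrapper],
            check=True,
            capture_output=True,
            text=True,
        )
        # The last printed line is the elapsed time reported by the child.
        samples.append(float(out.stdout.strip().splitlines()[-1]))
    return samples


def p50(samples: List[float]) -> float:
    """Median of the collected samples."""
    return statistics.median(samples)
```

The snippet passed to `time_in_fresh_subprocess` would be whatever lifecycle is under test; it is left as a parameter here because the exact calls used for the table are not shown.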

Stack

  1. perf: defer tracker initialization and slim import path (1/4) #1251 — lazy initialization & import slimming
  2. This PR — hardware detection cache & warm-path reuse
  3. perf: defer API run creation until first emission upload (3/4) #1253 — lazy API run creation
  4. perf: speed up CLI monitor startup and fix wrapped commands (4/4) #1254 — CLI monitor startup

CI / quality

Test plan

  • Reviewer: second tracker in same process should reuse cached hardware, not re-probe

Replaces

Split from #1246.

@codecov

codecov Bot commented Jun 21, 2026


Codecov Report

❌ Patch coverage is 97.23502% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.61%. Comparing base (aea5d20) to head (c8aee4e).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| codecarbon/core/resource_tracker.py | 88.00% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1252      +/-   ##
==========================================
+ Coverage   89.39%   89.61%   +0.21%     
==========================================
  Files          47       48       +1     
  Lines        4565     4757     +192     
==========================================
+ Hits         4081     4263     +182     
- Misses        484      494      +10     


@davidberenstein1957

Collaborator Author

@inimaz: #1246 has been split into this 4-PR stack for easier review. Benchmarks in the description were measured locally (Mac ARM, 2026-06-21); pre-commit and tests pass on each branch. Please start with #1251.

@davidberenstein1957 davidberenstein1957 force-pushed the davidberenstein1957/perf-2-hardware-cache branch from 1f759d9 to 9d1c656 Compare June 21, 2026 14:37
@davidberenstein1957

Collaborator Author

Rebased on #1251 + conftest hardening to reset probe lru_cache state between tests.

Verified locally: 530 passed, pre-commit clean.

@davidberenstein1957 davidberenstein1957 force-pushed the davidberenstein1957/perf-2-hardware-cache branch from 9d1c656 to 00817d8 Compare June 22, 2026 07:49
Base automatically changed from davidberenstein1957/perf-1-lazy-init to master June 22, 2026 08:20
davidberenstein1957 and others added 2 commits June 29, 2026 08:49
Add process-level hardware setup cache, probe result caching, platform-aware
CPU backend selection, and parallel CPU/GPU setup for faster repeat runs.

Co-authored-by: Cursor <cursoragent@cursor.com>

Import GPU/CPU probe modules before clearing caches so lru_cache state
does not leak across tests in the full suite.

Co-authored-by: Cursor <cursoragent@cursor.com>
@davidberenstein1957

Collaborator Author

@inimaz: #1251 is merged. This PR has been rebased onto the updated stack. CI and local tests pass, so it is ready for your review.

@davidberenstein1957 davidberenstein1957 force-pushed the davidberenstein1957/perf-2-hardware-cache branch from 00817d8 to 31dc256 Compare June 29, 2026 06:53
Cover scalar GPU id normalization, defensive error paths, power gadget
setup, and fallback tracking branches missing from the PR patch report.

Co-authored-by: Cursor <cursoragent@cursor.com>
@davidberenstein1957

Collaborator Author

Added tests to bring hardware_cache.py and resource_tracker.py to 100% patch coverage locally (537 passed). This should resolve the codecov/project gap.

@inimaz inimaz left a comment

Collaborator

Nice, thanks @davidberenstein1957!

@inimaz inimaz merged commit 473351b into master Jun 29, 2026
13 checks passed
@inimaz inimaz deleted the davidberenstein1957/perf-2-hardware-cache branch June 29, 2026 12:10
@benoit-cty benoit-cty mentioned this pull request Jun 30, 2026