Automate integration tests against the real Tested-with apps #6

Merged
faizollah merged 1 commit into main from 3-realapp-integration-tests on Jun 29, 2026

@faizollah

Closes #3

Automates testing against the real apps in the Tested-with table, in two tiers.

Tier A — real-data integration tests (run on every push, tests workflow)

  • tests/fixtures/<app>/ — the actual captured_metadata.json + metrics_log.json harvested from real flwr runs of:
    • quickstart-pytorch (FedAvg, torch + torchvision)
    • fed-engines (FedProx, torch + HF datasets, anomaly metrics)
    • quickstart-sklearn (FedAvg, scikit-learn)
  • tests/test_real_apps.py — parametrized over all three: builds each crate and asserts the right strategy, frameworks, and metrics for that app.
  • Deterministic and fast — no Flower/Ray/torch needed, so it runs in the existing per-push suite (now 51 tests, ~92% coverage).
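
A minimal sketch of what a Tier A style check could look like, using only the standard library. The expected strategies and frameworks come from the list above; the JSON keys (`strategy`, `frameworks`), the framework spelling `datasets`, and the helper names `load_fixture` and `check_app` are assumptions for illustration, not the project's actual schema or API. The real test also builds each crate, which this sketch does not attempt.

```python
import json
from pathlib import Path

# Expected values per app, taken from the PR description.
# The key names and the "datasets" spelling are hypothetical.
EXPECTED = {
    "quickstart-pytorch": {"strategy": "FedAvg", "frameworks": {"torch", "torchvision"}},
    "fed-engines": {"strategy": "FedProx", "frameworks": {"torch", "datasets"}},
    "quickstart-sklearn": {"strategy": "FedAvg", "frameworks": {"scikit-learn"}},
}


def load_fixture(fixtures_root, app):
    """Read the two harvested JSON files for one app (hypothetical helper)."""
    app_dir = Path(fixtures_root) / app
    metadata = json.loads((app_dir / "captured_metadata.json").read_text())
    metrics = json.loads((app_dir / "metrics_log.json").read_text())
    return metadata, metrics


def check_app(fixtures_root, app):
    """Return a list of human-readable problems; an empty list means the app passes."""
    metadata, metrics = load_fixture(fixtures_root, app)
    expected = EXPECTED[app]
    problems = []
    if metadata.get("strategy") != expected["strategy"]:
        problems.append(
            f"{app}: strategy {metadata.get('strategy')!r} != {expected['strategy']!r}"
        )
    missing = expected["frameworks"] - set(metadata.get("frameworks", []))
    if missing:
        problems.append(f"{app}: missing frameworks {sorted(missing)}")
    if not metrics:
        problems.append(f"{app}: metrics log is empty")
    return problems
```

In a pytest suite the same check would typically be driven by `pytest.mark.parametrize` over the keys of `EXPECTED`, so each app reports as its own test case.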

Tier B — real-app end-to-end (nightly + on demand, new realapps.yml)

  • Scaffolds a real Hub app with flwr new, runs the federation end to end, and validates the produced crate (tests/e2e/validate_crate.py).
  • The only tier that exercises live capture from a running Flower simulation.
  • Handles the no-TTY simulation detach by polling for the crate; uploads the crate as an artifact.
  • Currently covers quickstart-sklearn; the matrix is structured so pytorch/fed-engines are added by appending rows.
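
The polling step mentioned above could be sketched roughly as follows. This is an illustration under stated assumptions, not the workflow's actual code: the function name `wait_for_crate`, the marker file name `ro-crate-metadata.json`, and the timeout and interval defaults are all hypothetical. The idea is that when the simulation detaches (no TTY), the CI job cannot rely on the command's exit, so it waits for the output file instead.

```python
import time
from pathlib import Path


def wait_for_crate(crate_dir, marker="ro-crate-metadata.json",
                   timeout=600.0, interval=5.0):
    """Block until a non-empty marker file appears in crate_dir.

    The marker name and the defaults are assumptions for this sketch.
    Returns the path to the marker file, or raises TimeoutError.
    """
    target = Path(crate_dir) / marker
    deadline = time.monotonic() + timeout
    while True:
        # Require a non-empty file so a half-created file is not accepted.
        if target.is_file() and target.stat().st_size > 0:
            return target
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no {marker} under {crate_dir} after {timeout:.0f}s"
            )
        time.sleep(interval)
```

A caller would run the federation, call `wait_for_crate` on the expected output directory, and only then hand the crate to the validator. Using `time.monotonic()` keeps the deadline unaffected by wall-clock adjustments on the runner.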

Badges

  • Updated existing badge URLs to eScienceLab/... (post-transfer).
  • Added a second real-app e2e badge.

Status

Both workflows are green on this branch ✅ (tests and real-app e2e).

Tier A (per-push, in tests workflow):
- tests/fixtures/<app>/ — real captured_metadata.json + metrics_log.json
  harvested from actual flwr runs of quickstart-pytorch (FedAvg),
  fed-engines (FedProx), quickstart-sklearn (FedAvg)
- tests/test_real_apps.py — parametrized over the three; builds each crate and
  asserts the right strategy, frameworks, and metrics. No Flower/Ray needed.

Tier B (nightly + on demand, new realapps.yml):
- actually 'flwr new' + run a real Hub app end to end and validate the crate
- tests/e2e/ — CI-integrated server_app (fixed /tmp paths) + validate_crate.py
- handles the no-TTY sim detach by polling for the crate

README: 2nd badge (real-app e2e), badge URLs updated to eScienceLab,
two-tier testing section.

Addresses #3.
@faizollah faizollah merged commit b51e93b into main Jun 29, 2026
7 checks passed
@faizollah faizollah deleted the 3-realapp-integration-tests branch June 29, 2026 14:24

Development

Successfully merging this pull request may close these issues.

Add integration tests
