Automate integration tests against the real Tested-with apps #6
Merged
Tier A (per-push, in tests workflow):
- tests/fixtures/<app>/: real captured_metadata.json + metrics_log.json harvested from actual flwr runs of quickstart-pytorch (FedAvg), fed-engines (FedProx), quickstart-sklearn (FedAvg)
- tests/test_real_apps.py: parametrized over the three; builds each crate and asserts the right strategy, frameworks, and metrics. No Flower/Ray needed.

Tier B (nightly + on demand, new realapps.yml):
- actually 'flwr new' + run a real Hub app end to end and validate the crate
- tests/e2e/: CI-integrated server_app (fixed /tmp paths) + validate_crate.py
- handles the no-TTY sim detach by polling for the crate

README: 2nd badge (real-app e2e), badge URLs updated to eScienceLab, two-tier testing section.

Addresses #3.
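The Tier A check in the commit message above (one test body run against each app's fixture directory) could look roughly like the sketch below. It is an illustration only: the JSON keys `strategy` and `frameworks`, the package name `datasets` for HF datasets, and the helpers `load_fixture` and `check_fixture` are all hypothetical, since the PR does not show the fixture schema. The crate-building step is left out because the builder's API is not shown either.

```python
import json
from pathlib import Path

# Expected values per app, taken from the PR description.
# The "datasets" package name for HF datasets is an assumption.
EXPECTED = {
    "quickstart-pytorch": {"strategy": "FedAvg", "frameworks": {"torch", "torchvision"}},
    "fed-engines": {"strategy": "FedProx", "frameworks": {"torch", "datasets"}},
    "quickstart-sklearn": {"strategy": "FedAvg", "frameworks": {"scikit-learn"}},
}


def load_fixture(fixture_dir: Path) -> tuple:
    """Read the two harvested JSON files from one fixture directory."""
    metadata = json.loads((fixture_dir / "captured_metadata.json").read_text())
    metrics = json.loads((fixture_dir / "metrics_log.json").read_text())
    return metadata, metrics


def check_fixture(app: str, fixture_dir: Path) -> list:
    """Return a list of problems found; an empty list means the fixture passes.

    The "strategy" and "frameworks" keys are hypothetical field names.
    """
    metadata, metrics = load_fixture(fixture_dir)
    expected = EXPECTED[app]
    problems = []
    if metadata.get("strategy") != expected["strategy"]:
        problems.append(f"{app}: expected strategy {expected['strategy']}")
    missing = expected["frameworks"] - set(metadata.get("frameworks", []))
    if missing:
        problems.append(f"{app}: missing frameworks {sorted(missing)}")
    if not metrics:
        problems.append(f"{app}: metrics log is empty")
    return problems
```

In a pytest suite, the same function would typically be driven by `@pytest.mark.parametrize("app", sorted(EXPECTED))`, with the test asserting that `check_fixture(app, Path("tests/fixtures") / app)` returns an empty list.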
Closes #3
Automates testing against the real apps in the Tested-with table, in two tiers.
Tier A: real-data integration tests (run on every push, tests workflow)
- tests/fixtures/<app>/: the actual captured_metadata.json + metrics_log.json harvested from real flwr runs of:
  - quickstart-pytorch (FedAvg, torch + torchvision)
  - fed-engines (FedProx, torch + HF datasets, anomaly metrics)
  - quickstart-sklearn (FedAvg, scikit-learn)
- tests/test_real_apps.py: parametrized over all three; builds each crate and asserts the right strategy, frameworks, and metrics for that app.

Tier B: real-app end-to-end (nightly + on demand, new realapps.yml)
- Runs 'flwr new' for a real Hub app, runs the federation end to end, and validates the produced crate (tests/e2e/validate_crate.py).
- quickstart-sklearn; the matrix is structured so pytorch/fed-engines are added by appending rows.

Badges
- eScienceLab/... (post-transfer).

Status
Both workflows are green on this branch ✅ (tests and real-app e2e).
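
The commit message notes that the Tier B run handles the simulation detaching when there is no TTY by polling for the crate. A minimal sketch of such a polling loop follows. The helper name `wait_for_crate`, its defaults, and the idea of watching a single marker file are assumptions for illustration, not the code in tests/e2e/.

```python
import time
from pathlib import Path


def wait_for_crate(marker: Path, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll until `marker` exists and is non-empty, or until `timeout` seconds pass.

    `marker` is a hypothetical file that the detached run writes when the
    crate is complete. Returns True if it appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Require a non-empty regular file so a freshly created, still-empty
        # file is not mistaken for a finished crate.
        if marker.is_file() and marker.stat().st_size > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A CI step could call this with the fixed /tmp path it expects and fail the job when it returns False. The size check is a simple guard only; a writer that streams the file slowly could still be seen mid-write, so validation of the crate contents should happen afterwards.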