Skip to content

[Feature] : API ENDPOINTS PR 1: Foundation and scaffolding#1130

Open
pulk17 wants to merge 3 commits into
CCExtractor:masterfrom
pulk17:api-pr1-scaffolding
Open

[Feature] : API ENDPOINTS PR 1: Foundation and scaffolding#1130
pulk17 wants to merge 3 commits into
CCExtractor:masterfrom
pulk17:api-pr1-scaffolding

Conversation

@pulk17

@pulk17 pulk17 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

[Feature]

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

  • I have never used the project.
  • I have used the project briefly.
  • I have used the project extensively, but have not contributed previously.
  • I am an active contributor to the project.

API foundation and core infrastructure (PR 1/6)

Summary

First of six stacked PRs introducing a JSON REST API for the sample platform
(supersedes #1117). This PR adds only the foundation — blueprint, middleware,
the token model, shared services, and schemas. No route endpoints are added
here; they land in PRs 2–6. Because there are no routes yet, the blueprint
hooks don't fire for real requests, so middleware is tested over HTTP starting
in PR 2 (auth) and PR 3 (system/runs).

Stacking: PRs 2–6 branch off this one. Easiest to review in order (1→6).

What's included

Blueprint & wiringmod_api mounted at /api/v1 (run.py);
ProxyFix(x_for=1) so request.remote_addr reflects the real client IP behind
nginx (needed for IP-based rate limiting).

Middleware — Bearer-token auth with @require_scope/@require_roles (auth
failures are returned after rate limiting, so unauthenticated floods are still
limited); structured JSON errors; fixed-window rate limiting (5/15min token
creation by IP, 20/min writes, 120/min reads by token) with X-RateLimit-*
headers; security headers; declarative request/query/path validation.
Rate-limit state is per-process — see Known caveat #1 (PR 6).

Token model (models/api_token.py) — opaque spci_-prefixed tokens with
256 bits of entropy. Only a SHA-256 hash is stored, verified with
hmac.compare_digest (constant-time). A slow password KDF (argon2/bcrypt) is
deliberately not used: it adds latency to every request and buys nothing against
brute force on a high-entropy random secret — the standard approach for API
tokens. User passwords continue to use passlib/bcrypt in mod_auth.

Status service (services/status.py) — single source of truth for
run/sample status, batch-loaded to avoid N+1. Correctly treats "no output when
output was expected" as a failure (comparing actual result files against the
expected RegressionTestOutput rows) rather than a silent pass.

Schemas & utils — shared pagination/error schemas and response helpers.

Existing files touched

  • run.py: register the API blueprint; ProxyFix(x_for=1).
  • mod_auth/models.py: add user.github_login (used by the API to authorize
    fork-run triggers from PR 3; populated at OAuth login in PR 6).
  • requirements.txt: add marshmallow.
  • .pycodestylerc: exclude .venv*.

Migration

d4f8e2a1b3c7 creates api_token and adds user.github_login. github_login
is bundled here (rather than a later migration) because PR 2 references it.
Chains from the current alembic head c8f3a2b1d4e5.

Testing

Unit tests for the token model, status service, and utils; existing suite
passes; isort/pydocstyle/pycodestyle/mypy --strict clean.

Next

PR 2 (#1131): the /auth/tokens endpoints.

@cfsmp3 cfsmp3 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the two high complexity errors (validation.py and status.py), also see Claude review below.

Reading the rview, clearly H1 must be addressed before we can merge or things will break.

The rest could be addressed (possibly are) later on in the stack.

But let's make sure we can merge and test each PR in order but not at the same time, i.e. merging this diff shouldn't break the system, even if by itself it doesn't do anything useful (since it's scaffolding).

Claude review follows:

HIGH (blocker):

  • H1 — ApiToken model added with NO migration. The api_token table won't exist on real MySQL. Tests pass only because tests/base.py does create_all from models → masks the gap. PR2's auth endpoints break at runtime. #1117 had this migration (d4f8e2a1b3c7); dropped in the split. Must add it, chained off master head c8f3a2b1d4e5.

MEDIUM (carryover, code unchanged):

  • ~770 lines of security middleware merge untested — and with no routes yet, the before_request hooks never even fire in this PR. Deferred to PR2/3. Risk noted.
  • Rate limiter unbounded memory (no hard cap, eviction only every 100 req).
  • Auth timing oracle (no-candidate path skips argon2 verify → leaks whether a prefix exists).
  • _get_client_ip comment wrong (ProxyFix means remote_addr is from XFF).

LOW/NIT: run-level missing→fail path untested; stale is_dummy_row "DEPLOYMENT PREREQUISITE" docstring; N+1 footgun in the batch_get_run_data wrappers; pytest conftest.py inert under nose2; generic 429 handler hardcodes wrong limits.

@pulk17 pulk17 force-pushed the api-pr1-scaffolding branch 8 times, most recently from 8e813fc to 9950853 Compare June 25, 2026 09:57
@pulk17 pulk17 force-pushed the api-pr1-scaffolding branch from 9950853 to beb4fe9 Compare June 25, 2026 10:25
@cfsmp3

cfsmp3 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

New / still open:

  • 🔴 N1 (Medium) — the timing-attack fix doesn't work. They added a dummy argon2 verify on the no-candidate path, but the hardcoded hash is malformed — I measured it: verify fails-fast in 0.0 ms vs ~43 ms for a real verify. So the timing oracle is still open, now hidden behind code that looks like a fix. Cheap fix: _DUMMY_HASH = PasswordHasher().hash('dummy') at import. (Practical risk is low — prefixes are 65-bit — but don't ship dead security code.)
  • 🟡 N2 (Medium) — migration/model drift: the migration adds user.github_login but mod_auth/models.py doesn't declare it → tests (create_all) won't have the column, and a future flask db migrate would propose dropping it. Add the field to the User model here, or defer that migration line to [Feature] : API ENDPOINTS PR 3 : System Status and Run Management Endpoints #1132 where it's used.
  • 🟡 M2 (rate-limit memory cap) still open.

@pulk17 pulk17 force-pushed the api-pr1-scaffolding branch from 214fbd2 to b9a645a Compare June 26, 2026 08:35
@pulk17 pulk17 force-pushed the api-pr1-scaffolding branch from b9a645a to d1e2392 Compare June 26, 2026 19:33
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants