[Feature] : API ENDPOINTS PR 6 : Errors, Logs, and OpenAPI Documentation #1135

Open

pulk17 wants to merge 8 commits into CCExtractor:master from pulk17:api-pr6-swagger

pulk17 commented Jun 24, 2026

[FEATURE]

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

  • I have never used the project.
  • I have used the project briefly.
  • I have used the project extensively, but have not contributed previously.
  • I am an active contributor to the project.

Errors, logs, and OpenAPI documentation (PR 6/6)

Summary

Final part of a six-PR series (supersedes #1117). Adds error and log diagnostics
plus the OpenAPI document for the whole API. Also populates user.github_login at
OAuth login (with a lazy fallback in the run-trigger path), which PR 3's
fork-run permission check depends on.

Stacking: stacked on PR 5 (#1134). Please review #1134 first.

Endpoints (mod_api/routes/errors_logs.py)

  • GET /runs/{id}/errors (results:read): derived test errors, filterable by
    ?type, ?severity, and ?sample_id.
  • GET /runs/{id}/infrastructure-errors (system:read): infrastructure faults
    classified from progress messages (?include_stack is admin/contributor only).
  • GET /runs/{id}/error-summary (results:read): error counts grouped by
    type/severity/sample_id/regression_id.
  • GET /runs/{id}/logs (system:read): cursor-paginated build log (line-offset
    cursor capped at 10M; filters ?level, ?source, ?contains).
  • GET /runs/{id}/samples/{sid}/logs: placeholder that returns 404; per-sample
    logs aren't produced by the CI worker yet (planned with the CI-VM work).
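To make the shape of the error-summary response concrete, here is a minimal sketch of the grouping step. It assumes each derived error is a plain dict carrying type, severity, sample_id and (optionally) regression_id keys; the function name and the response layout are hypothetical and not taken from mod_api/routes/errors_logs.py.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Grouping keys listed for GET /runs/{id}/error-summary.
GROUP_KEYS = ("type", "severity", "sample_id", "regression_id")


def summarize_errors(errors: Iterable[Mapping]) -> Dict:
    """Count errors per distinct value of each grouping key (hypothetical layout)."""
    counters = {key: Counter() for key in GROUP_KEYS}
    total = 0
    for err in errors:
        total += 1
        for key in GROUP_KEYS:
            value = err.get(key)
            if value is not None:  # e.g. errors not tied to a regression test
                counters[key][value] += 1
    return {"total": total, "by": {key: dict(c) for key, c in counters.items()}}
```

A single pass fills all four counters, so the summary costs one scan over the run's errors regardless of how many groupings the client reads.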

Log service (mod_api/services/log_service.py)

Streams the log off disk with itertools.islice (no full-file load) and applies
line-offset cursor pagination + substring filtering.
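The islice-based pagination can be sketched as follows. This is an illustration under assumptions, not the code in log_service.py: the function name and response keys are hypothetical, only the ?contains substring filter is modelled (?level and ?source are omitted), and the cursor is taken to be the raw line offset at which the next page starts.

```python
import io
import itertools
from typing import Iterable, List, Optional

# Hard cap on the line-offset cursor, per the PR description (bounds DoS).
MAX_CURSOR = 10_000_000


def read_log_page(lines: Iterable[str], cursor: int = 0, limit: int = 100,
                  contains: Optional[str] = None) -> dict:
    """Return up to `limit` matching lines starting at raw line offset `cursor`.

    `lines` is any line iterator, e.g. an open text file; islice skips the
    first `cursor` lines lazily, so the file is never loaded whole.
    """
    if not 0 <= cursor <= MAX_CURSOR:
        raise ValueError(f"cursor must be between 0 and {MAX_CURSOR}")
    if limit < 1:
        raise ValueError("limit must be positive")

    page: List[str] = []
    next_cursor: Optional[int] = None
    for offset, line in enumerate(itertools.islice(lines, cursor, None), start=cursor):
        if contains is not None and contains not in line:
            continue  # substring filter: non-matching lines are skipped
        if len(page) == limit:
            # Another match exists, so the next page starts at this raw offset.
            next_cursor = offset
            break
        page.append(line.rstrip("\n"))
    return {"lines": page, "next_cursor": next_cursor}


if __name__ == "__main__":
    log = "a1\nb2\na3\na4\nb5\na6\n"
    print(read_log_page(io.StringIO(log), cursor=0, limit=2, contains="a"))
    # {'lines': ['a1', 'a3'], 'next_cursor': 3}
```

In a service this would be called as `with open(path, encoding="utf-8") as fh: read_log_page(fh, cursor, limit)`. Because the cursor counts raw lines rather than matches, a filtered request still resumes correctly. The cost is that skipping to a large offset rereads the file prefix, which is why byte offsets are listed as a later improvement.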

Auth change (mod_auth/controllers.py)

github_callback now stores github_login, and fetch_username_from_token
gained a request timeout. This is what lets PR 3's fork-run check identify the
caller's fork owner.
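A minimal sketch of the lazy fallback described in the summary, assuming a user object that holds the OAuth token and a nullable github_login. The User dataclass, ensure_github_login and fetch_login are hypothetical stand-ins for the real model and fetch_username_from_token; the sketch uses only the standard library's urllib and GitHub's documented GET /user endpoint, whose JSON response includes a `login` field.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional

GITHUB_USER_URL = "https://api.github.com/user"


def fetch_login(token: str, timeout: float = 10.0) -> str:
    """Fetch the caller's GitHub login; `timeout` bounds the blocking request."""
    req = urllib.request.Request(
        GITHUB_USER_URL,
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["login"]


@dataclass
class User:  # hypothetical stand-in for the real user model
    github_token: Optional[str]
    github_login: Optional[str] = None


def ensure_github_login(user: User,
                        fetch: Callable[[str], str] = fetch_login) -> Optional[str]:
    """Return the stored login, fetching and caching it on first use."""
    if user.github_login:
        return user.github_login
    if not user.github_token:
        return None
    try:
        login = fetch(user.github_token)
    except (OSError, ValueError, KeyError):
        # Network error/timeout (URLError is an OSError), bad JSON, or no
        # "login" key: leave the field unset so a later trigger can retry.
        return None
    user.github_login = login
    return login
```

Injecting `fetch` keeps the fallback testable without network access. Without the timeout, a stalled GitHub response would hang the request thread indefinitely.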

API contract

  • openapi-ci-api.yaml — OpenAPI 3.0.3 document for all ~26 endpoints
    (paths, scopes via x-required-scope, schemas, security); matches the
    implemented routes.
  • scripts/verify_schemathesis.py — property-based contract tests
    (schemathesis/hypothesis). Kept under scripts/ so standard pytest does not
    auto-collect it (those deps aren't in test-requirements.txt); run manually.
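Alongside the schemathesis run, a cheap structural check can confirm that every operation in the spec declares its scope. This sketch assumes x-required-scope sits on each Operation Object, and that the spec has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); the function name is hypothetical.

```python
from typing import Dict, List

# Operation keys a Path Item Object may carry in OpenAPI 3.0.3; other keys
# such as "parameters", "summary" or "servers" are not operations.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def operations_missing_scope(spec: Dict) -> List[str]:
    """List 'METHOD /path' for every operation without x-required-scope."""
    missing = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method in HTTP_METHODS and "x-required-scope" not in operation:
                missing.append(f"{method.upper()} {path}")
    return sorted(missing)
```

Unlike the property-based tests, this needs no extra dependencies beyond a YAML parser, so it could plausibly run under standard pytest.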

Testing

189 pytest tests pass; lint/type clean across mod_api/ and tests/api/.


Known caveats & design decisions (apply across the API)

Conscious tradeoffs at current scale (~250 samples / ~300 tests per run,
single-process deployment):

  1. In-memory rate limiting — dict + lock, per-process; global limit scales
    with worker count. Redis when we scale out.
  2. Log cursor = line offset — simple islice offset, hard-capped at
    10,000,000 lines to bound DoS. Byte offsets deferred.
  3. Status filtering in Python — derived statuses load rows and derive in
    Python rather than re-expressing multi-table logic in SQL. Negligible at ~250
    runs.
  4. N+1 in list_run_samples — per-result lazy loads vs one large eager
    join. Acceptable at ~300 tests/run; eager loading is an easy later win.
  5. Storage status without blob.exists() — list/summary endpoints infer
    ok/degraded from DB state to avoid a per-row GCS call; download/diff
    endpoints (which need the file) do verify.
  6. Token hashing = SHA-256, not a password KDF — API tokens are 256-bit
    random secrets, so SHA-256 + hmac.compare_digest (constant-time) is the
    correct, standard choice; argon2/bcrypt would only add latency. User
    passwords separately use passlib/bcrypt in mod_auth.
  7. @require_roles vs @require_scope — some routes use both deliberately
    (belt-and-suspenders), not a bug.
  8. Response schemas are partly informational — schemas validate requests and
    document the OpenAPI contract; some handlers build response dicts directly for
    simplicity in hot paths.
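Tradeoff 1 can be illustrated with a per-process fixed-window limiter built from a dict and a lock. This is a hypothetical sketch: the class name, the fixed-window policy, and the injectable clock are assumptions, and the PR does not say which windowing scheme it uses.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class InMemoryRateLimiter:
    """Fixed-window limiter: a dict of (window_start, count) guarded by a lock.

    State lives in this process only, so with N worker processes a client
    can make up to N * limit requests per window in total.
    """

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._buckets: Dict[str, Tuple[float, int]] = {}

    def allow(self, key: str) -> bool:
        """Record one request for `key`; return False if it exceeds the limit."""
        now = self._clock()
        with self._lock:
            start, count = self._buckets.get(key, (now, 0))
            if now - start >= self.window:
                start, count = now, 0  # window expired: start a new one
            if count >= self.limit:
                return False
            self._buckets[key] = (start, count + 1)
            return True
```

Keys are never evicted here, so a long-running process would also want periodic pruning of expired buckets. A Redis-backed counter removes both the per-process multiplier and that growth.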
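Tradeoff 6 in code: a sketch of issuing and verifying a 256-bit token with SHA-256 and `hmac.compare_digest`. The function names are hypothetical, and storing the hex digest (rather than the raw token) is an assumption about the schema.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def hash_token(token: str) -> str:
    """SHA-256 hex digest; only this value would be persisted."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def generate_token() -> Tuple[str, str]:
    """Return (plaintext shown once to the user, hash to store)."""
    token = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits of entropy
    return token, hash_token(token)


def verify_token(presented: str, stored_hash: str) -> bool:
    """Constant-time comparison of the presented token's hash with the stored one."""
    return hmac.compare_digest(hash_token(presented), stored_hash)
```

A slow KDF exists to resist guessing low-entropy passwords. A 256-bit random token cannot be brute-forced even with a fast hash, so SHA-256 loses nothing here. It also lets the server look tokens up by their hash on every request without adding latency.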

Conclusion

With this PR the API is structurally complete: runners and bots can orchestrate
the CI pipeline over HTTP/JSON.

pulk17 force-pushed the api-pr6-swagger branch 2 times, most recently from e38cc76 to cfffa04 (June 24, 2026 12:34)
pulk17 force-pushed the api-pr6-swagger branch 2 times, most recently from 7a1ae9a to 439ae42 (June 26, 2026 18:50)
pulk17 force-pushed the api-pr6-swagger branch from 439ae42 to f7dd2fa (June 27, 2026 16:00)