
feat(observability): metrics end-to-end (gateway + UI panel)#5380

Draft
Ma77Ball wants to merge 140 commits into
apache:mainfrom
Ma77Ball:obs/pr6/metrics


@Ma77Ball Ma77Ball commented Jun 5, 2026

Contributor

What changes were proposed in this PR?

Adds workflow metrics served through the gateway, plus an ECharts-based UI panel.

Backend:

  • Metrics query builder and parseMetrics.
  • MetricsResource serving a closed allowlist of named queries (throughput, outcome rates, and duration percentiles), enforced on both the gateway and the client.

Frontend:

  • Metrics panel rendering each named query as a typed series with Apache ECharts. Chart data is bound as typed arrays; backend output is never interpreted as a formatter or template.
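
For readers unfamiliar with the pattern, here is a minimal client-side sketch of a closed allowlist combined with typed series data. The identifiers (`NAMED_METRICS`, `isNamedMetric`, `MetricPoint`, `toSeriesData`) and the metric names are illustrative assumptions, not the names used in this PR.

```typescript
// Hypothetical identifiers: the description names the query families
// (throughput, outcome rates, duration percentiles) but not their keys.
const NAMED_METRICS = ["throughput", "outcomeRates", "durationPercentiles"] as const;
type NamedMetric = (typeof NAMED_METRICS)[number];

// Type guard: any name outside the closed list is rejected before a request is built.
function isNamedMetric(name: string): name is NamedMetric {
  return (NAMED_METRICS as readonly string[]).indexOf(name) !== -1;
}

// Assumed wire shape for one sample returned by the gateway.
interface MetricPoint {
  timestampMs: number;
  value: number;
}

// Convert samples to [time, value] number pairs. Only numbers reach the chart,
// so no backend string can end up being treated as a formatter or template.
function toSeriesData(points: readonly MetricPoint[]): [number, number][] {
  return points
    .filter(p => Number.isFinite(p.timestampMs) && Number.isFinite(p.value))
    .map(p => [p.timestampMs, p.value] as [number, number]);
}
```

A caller would check `isNamedMetric(name)` before issuing the request and pass the result of `toSeriesData` as the data of a chart series; the gateway repeats the same allowlist check independently.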

Any related issues, documentation, or discussions?

Closes: #5372
Part of #4070. Stacked on #5379.

How was this PR tested?

  • Backend specs for the metrics query builder and parser; sbt scalafmtCheckAll passes.
  • Frontend metrics-panel and service specs; prettier-eslint and eslint pass.
  • Compilation and the full test suites run in this PR's CI.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.8 in compliance with ASF

Ma77Ball and others added 6 commits June 5, 2026 04:49
…, SDK bootstrap (default-off)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s panel

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt scope, health, routing)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ca/eBPF profiling

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tracing primitives

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…UI metrics panel (ECharts)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the engine, dependencies, frontend, docs, infra, and common labels Jun 5, 2026

codecov-commenter commented Jun 5, 2026


Codecov Report

❌ Patch coverage is 61.02341% with 716 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.33%. Comparing base (6d31f46) to head (18def0e).
⚠️ Report is 52 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...observability/gateway/ObservabilityResources.scala | 0.00% | 241 Missing ⚠️ |
| ...observability/logs-panel/logs-panel.component.html | 56.71% | 58 Missing ⚠️ |
| ...ala/org/apache/texera/observability/OtelInit.scala | 63.30% | 43 Missing and 8 partials ⚠️ |
| ...apache/texera/web/observability/gateway/dtos.scala | 73.40% | 44 Missing and 6 partials ⚠️ |
| ...texera/web/observability/gateway/AuditLogger.scala | 0.00% | 41 Missing ⚠️ |
| ...xera/web/observability/gateway/ScopeResolver.scala | 19.44% | 27 Missing and 2 partials ⚠️ |
| ...era/web/observability/gateway/GatewayContext.scala | 0.00% | 21 Missing ⚠️ |
| ...ra/web/observability/gateway/ResponseParsers.scala | 77.27% | 9 Missing and 11 partials ⚠️ |
| ...web/observability/gateway/WorkflowRunCounter.scala | 0.00% | 20 Missing ⚠️ |
| ...a/org/apache/texera/web/TexeraWebApplication.scala | 0.00% | 17 Missing ⚠️ |

... and 27 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5380      +/-   ##
============================================
+ Coverage     53.86%   55.33%   +1.46%     
- Complexity     2756     3020     +264     
============================================
  Files          1099     1147      +48     
  Lines         42541    44898    +2357     
  Branches       4577     4985     +408     
============================================
+ Hits          22916    24845    +1929     
- Misses        18290    18570     +280     
- Partials       1335     1483     +148     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| access-control-service | 70.14% <100.00%> (-0.30%) ⬇️ | |
| agent-service | 34.36% <ø> (ø) | Carriedforward from 1a81c33 |
| amber | 57.25% <55.31%> (+2.05%) ⬆️ | |
| computing-unit-managing-service | 0.00% <0.00%> (-1.66%) ⬇️ | |
| config-service | 50.76% <40.00%> (-5.95%) ⬇️ | |
| file-service | 58.88% <68.96%> (+1.81%) ⬆️ | |
| frontend | 49.61% <76.88%> (+1.55%) ⬆️ | |
| notebook-migration-service | 78.57% <ø> (?) | |
| pyamber | 90.15% <100.00%> (+0.02%) ⬆️ | Carriedforward from 1a81c33 |
| python | 90.76% <ø> (-0.04%) ⬇️ | Carriedforward from 1a81c33 |
| workflow-compiling-service | 54.74% <ø> (-3.96%) ⬇️ | |

*This pull request uses carry forward flags. Click here to find out more.


Ma77Ball and others added 13 commits June 23, 2026 03:07
- bin/observability/docker-compose.yml: collector + parca-agent stack
- bin/single-node/docker-compose.yml: mount the otel-collector and parca
  configs and run the parca-agent sidecar

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- dtos: drop the per-signal Signal/maxWindowSeconds enum and the upper
  bound on TimeWindow.validate -- DB-backed counts have no retention
  limit and the backends just return what they retain; BadTimeWindow
  becomes a plain value (only empty/inverted windows are rejected)
- DtoValidationSpec: cover the new unbounded-window behavior
- UI shell: observability route + dashboard navigation entry

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
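
To illustrate the unbounded-window rule from the commit above, here is a minimal sketch, assuming a window carried as two epoch-millisecond bounds. `TimeWindow`, `WindowCheck`, and `validateWindow` are hypothetical TypeScript stand-ins; the actual validation lives in the Scala dtos and may differ in shape.

```typescript
// Hypothetical shape: the real DTO is defined in Scala and may differ.
interface TimeWindow {
  startMs: number;
  endMs: number;
}

type WindowCheck = { ok: true } | { ok: false; reason: "empty" | "inverted" };

// Only empty and inverted windows are rejected. There is deliberately no
// upper bound on the span: a backend simply returns whatever it retains.
function validateWindow(w: TimeWindow): WindowCheck {
  if (w.endMs === w.startMs) return { ok: false, reason: "empty" };
  if (w.endMs < w.startMs) return { ok: false, reason: "inverted" };
  return { ok: true };
}
```

Under this rule a one-year window is as valid as a five-minute one; only the ordering of the bounds matters.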
…ements

- gateway LogsResource: resolve log user ids to display names via UserDao,
  and pull id-field autofill values from VictoriaLogs /field_values (those
  ids are record fields, not stream labels); adapt to the unbounded
  TimeWindow.validate signature
- builders: fix the body filter to the correct `_msg:"..."` phrase form
  (contains_str is not valid LogsQL)
- ResponseParsers: parseFieldValueLongs for the autofill values
- dtos/observability.types: LogSourcesResponse.userNames
- logs panel: user-name dropdown + 7-day default window

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
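
The body-filter fix above can be sketched as a small builder that emits the `_msg:"..."` phrase form. `msgPhraseFilter` is a hypothetical helper name, and the backslash escaping of quotes and backslashes is an assumption that should be checked against the LogsQL documentation rather than taken as the PR's implementation.

```typescript
// Build a LogsQL phrase filter over the message field.
// Assumption: backslashes and double quotes inside the phrase are escaped
// with a backslash so user input cannot terminate the quoted phrase early.
function msgPhraseFilter(text: string): string {
  const escaped = text.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `_msg:"${escaped}"`;
}

console.log(msgPhraseFilter("disk full")); // prints: _msg:"disk full"
```

Escaping at the builder keeps free-text search input from changing the structure of the query sent to the log backend.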
…ilter

- WorkflowRunCounter: exact COUNT over workflow_executions for totalRuns
  (the sampled counter only estimates); wired into GatewayContext
- dtos/MetricsResource: NamedMetric.instant/dbBacked; totalRuns answered
  from the DB as one window-wide scalar; optional userId filter; metrics
  validation adapts to the unbounded window; aggregate-window MetricsQL
- observability.service: drop the step upper bound (panel auto-relaxes for
  large windows), validate userId
- metrics panel: user filter + instant hero stat + loading spinner and
  persisted filter options

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
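
One way a panel could relax the step for large windows, as the commit above describes, is to cap the number of points per series. This is a sketch under stated assumptions: `relaxStep` and the `maxPoints` budget are hypothetical, and the PR does not say how the panel actually picks its step.

```typescript
// Return a step (in seconds) that is at least the requested step and large
// enough that the window yields no more than maxPoints samples per series.
function relaxStep(
  windowSeconds: number,
  requestedStepSeconds: number,
  maxPoints: number
): number {
  if (windowSeconds <= 0 || requestedStepSeconds <= 0 || maxPoints <= 0) {
    throw new RangeError("window, step, and maxPoints must all be positive");
  }
  const minStep = Math.ceil(windowSeconds / maxPoints);
  return Math.max(requestedStepSeconds, minStep);
}

console.log(relaxStep(3600, 60, 500)); // prints: 60 (one hour fits the budget)
console.log(relaxStep(30 * 86400, 60, 500)); // prints: 5184 (30 days is coarsened)
```

With a rule like this, dropping the fixed upper bound on the step is safe for the chart, because the point count stays bounded however large the window grows.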

github-actions Bot commented Jun 26, 2026

Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @bobbai00, @aicam, @Yicong-Huang
    You can notify them by mentioning @bobbai00, @aicam, @Yicong-Huang in a comment.


Labels

common, dependencies, docs, engine, frontend, infra, platform

Development

Successfully merging this pull request may close these issues.

[Observability] Workflow metrics: gateway endpoint and dashboard panel

2 participants