feat(observability): traces end-to-end (gateway + UI panel) #5381

Draft
Ma77Ball wants to merge 165 commits into apache:main from Ma77Ball:obs/pr7/traces

Conversation

@Ma77Ball Ma77Ball commented Jun 5, 2026

Contributor

What changes were proposed in this PR?

Distributed-trace retrieval through the gateway plus a trace-tree UI panel.

Backend:

  • Trace query builder and parseTraces.
  • TracesResource exposing trace retrieval by id, with id validation and scope enforcement at the gateway.

Frontend:

  • Traces panel rendering spans as a tree built from parent-span relationships, with pivot-from-log so a log row can open its trace directly. Span fields are bound with text interpolation only.
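
For reviewers who want a concrete picture of "a tree built from parent-span relationships", here is a minimal sketch of one way such a builder could work. It is not the code in this PR: the `Span` and `SpanNode` shapes, the field names, and the `buildSpanTree` function are all hypothetical, and cyclic parent links are not handled.

```typescript
// Hypothetical span shape; field names are assumptions, not this PR's DTOs.
interface Span {
  spanId: string;
  parentSpanId?: string; // absent or empty for a root span
  name: string;
  startTimeMs: number;
}

interface SpanNode {
  span: Span;
  children: SpanNode[];
}

// Builds a forest from a flat span list. A span whose parent is not in the
// batch is treated as a root, so a partially ingested trace still renders.
function buildSpanTree(spans: Span[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>();
  for (const span of spans) {
    nodes.set(span.spanId, { span, children: [] });
  }

  const roots: SpanNode[] = [];
  nodes.forEach((node) => {
    const parentId = node.span.parentSpanId;
    const parent = parentId ? nodes.get(parentId) : undefined;
    if (parent && parent !== node) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  });

  // Order siblings by start time so the tree reads chronologically.
  const byStart = (a: SpanNode, b: SpanNode) =>
    a.span.startTimeMs - b.span.startTimeMs;
  roots.sort(byStart);
  nodes.forEach((node) => node.children.sort(byStart));
  return roots;
}
```

Indexing spans by id first keeps the build linear in the number of spans (apart from the sibling sorts) and makes it independent of the order in which spans arrive.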

Any related issues, documentation, or discussions?

Closes: #5373
Part of #4070. Stacked on #5380.

How was this PR tested?

  • Backend specs for the trace query builder and parser; sbt scalafmtCheckAll passes.
  • Frontend traces-panel specs, including the span-tree builder; prettier-eslint and eslint pass.
  • Compile and the full test suites run in this PR's CI.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.8 in compliance with ASF

Ma77Ball and others added 7 commits June 5, 2026 04:49
…, SDK bootstrap (default-off)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s panel

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt scope, health, routing)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ca/eBPF profiling

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tracing primitives

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… trace tree panel

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…UI metrics panel (ECharts)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions Bot added the engine, dependencies, frontend, docs, infra, and common labels Jun 5, 2026

codecov-commenter commented Jun 5, 2026

Codecov Report

❌ Patch coverage is 59.74026% with 744 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.32%. Comparing base (6d31f46) to head (ae477e7).
⚠️ Report is 52 commits behind head on main.

Files with missing lines                               Patch %  Lines
...observability/gateway/ObservabilityResources.scala  0.00%    273 Missing ⚠️
...observability/logs-panel/logs-panel.component.html  56.71%   58 Missing ⚠️
...ala/org/apache/texera/observability/OtelInit.scala  63.30%   43 Missing and 8 partials ⚠️
...apache/texera/web/observability/gateway/dtos.scala  76.59%   38 Missing and 6 partials ⚠️
...texera/web/observability/gateway/AuditLogger.scala  0.00%    41 Missing ⚠️
...xera/web/observability/gateway/ScopeResolver.scala  19.44%   27 Missing and 2 partials ⚠️
...ra/web/observability/gateway/ResponseParsers.scala  77.96%   10 Missing and 16 partials ⚠️
...era/web/observability/gateway/GatewayContext.scala  0.00%    21 Missing ⚠️
...web/observability/gateway/WorkflowRunCounter.scala  0.00%    20 Missing ⚠️
...a/org/apache/texera/web/TexeraWebApplication.scala  0.00%    18 Missing ⚠️
... and 26 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5381      +/-   ##
============================================
+ Coverage     53.86%   55.32%   +1.45%     
- Complexity     2756     3015     +259     
============================================
  Files          1099     1149      +50     
  Lines         42541    45065    +2524     
  Branches       4577     5016     +439     
============================================
+ Hits          22916    24932    +2016     
- Misses        18290    18643     +353     
- Partials       1335     1490     +155     
Flag                             Coverage Δ                    *Carryforward flag
access-control-service           70.14% <100.00%> (-0.30%) ⬇️
agent-service                    34.36% <ø> (ø)                Carriedforward from 818e316
amber                            57.19% <55.03%> (+1.99%) ⬆️
computing-unit-managing-service  0.00% <0.00%> (-1.66%) ⬇️
config-service                   50.76% <40.00%> (-5.95%) ⬇️
file-service                     58.88% <68.96%> (+1.81%) ⬆️
frontend                         49.66% <74.86%> (+1.61%) ⬆️
notebook-migration-service       78.57% <ø> (?)
pyamber                          90.15% <100.00%> (+0.02%) ⬆️   Carriedforward from 818e316
python                           90.76% <ø> (-0.04%) ⬇️         Carriedforward from 818e316
workflow-compiling-service       54.74% <ø> (-3.96%) ⬇️

*This pull request uses carry forward flags.

Ma77Ball and others added 5 commits June 23, 2026 03:18
…ements

- gateway LogsResource: resolve log user ids to display names via UserDao,
  and pull id-field autofill values from VictoriaLogs /field_values (those
  ids are record fields, not stream labels); adapt to the unbounded
  TimeWindow.validate signature
- builders: fix the body filter to the correct `_msg:"..."` phrase form
  (contains_str is not valid LogsQL)
- ResponseParsers: parseFieldValueLongs for the autofill values
- dtos/observability.types: LogSourcesResponse.userNames
- logs panel: user-name dropdown + 7-day default window

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
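
As an illustration of the `_msg:"..."` phrase form mentioned in the commit above, a builder for that filter might look like the sketch below. The function name `msgPhraseFilter` is hypothetical, and the escaping (backslash-escaping `\` and `"` inside the quoted phrase) is an assumption that should be checked against the VictoriaLogs LogsQL documentation; it is not taken from the PR's builders.

```typescript
// Hypothetical helper, not the PR's builder: wraps user text in a LogsQL
// phrase filter on the _msg field.
function msgPhraseFilter(text: string): string {
  // Assumption: backslashes and double quotes inside the phrase are
  // backslash-escaped. Backslashes are escaped first so the escapes added
  // for quotes are not doubled.
  const escaped = text.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `_msg:"${escaped}"`;
}

// Example: the search text `connection "reset"` becomes
// _msg:"connection \"reset\""
const exampleFilter = msgPhraseFilter('connection "reset"');
```

Escaping in the builder, rather than rejecting quotes in the UI, keeps user-supplied search text from terminating the phrase early and changing the query.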
…ilter

- WorkflowRunCounter: exact COUNT over workflow_executions for totalRuns
  (the sampled counter only estimates); wired into GatewayContext
- dtos/MetricsResource: NamedMetric.instant/dbBacked; totalRuns answered
  from the DB as one window-wide scalar; optional userId filter; metrics
  validation adapts to the unbounded window; aggregate-window MetricsQL
- observability.service: drop the step upper bound (panel auto-relaxes for
  large windows), validate userId
- metrics panel: user filter + instant hero stat + loading spinner and
  persisted filter options

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
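
The commit above says the panel "auto-relaxes" the step for large windows but does not show how. One plausible approach, sketched here purely as an assumption, is to widen the step until the window yields at most a fixed number of points. `relaxStep`, `STEP_LADDER_S`, and the default of 1000 points are all hypothetical and do not come from the PR.

```typescript
// Hypothetical step ladder in seconds: 15s, 1m, 5m, 15m, 1h, 6h, 1d.
const STEP_LADDER_S: number[] = [15, 60, 300, 900, 3600, 21600, 86400];

// Returns a step (seconds) no smaller than the requested one, chosen so that
// windowSeconds / step stays at or below maxPoints. Prefers a "round" value
// from the ladder and falls back to the raw minimum for very large windows.
function relaxStep(
  windowSeconds: number,
  requestedStepS: number,
  maxPoints: number = 1000
): number {
  const minStep = Math.ceil(windowSeconds / maxPoints);
  const floor = Math.max(requestedStepS, minStep);
  for (const candidate of STEP_LADDER_S) {
    if (candidate >= floor) {
      return candidate;
    }
  }
  return floor;
}
```

Under these assumptions a one-hour window keeps a requested 15-second step, while a 7-day window is widened to 15 minutes.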
Brings the cumulative observability stack (foundations, backend-emit,
deployment, gateway-core, logs, metrics) into pr7, and tightens the
TracesResource @RolesAllowed to ADMIN-only to match the admin-gated
observability panel. BuildersSpec merge keeps both the metrics TotalRuns
test and the traces JaegerQueryBuilder test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 23, 2026

Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @bobbai00, @aicam, @Yicong-Huang
    You can notify them by mentioning @bobbai00, @aicam, @Yicong-Huang in a comment.

Ma77Ball added 24 commits June 23, 2026 15:21

Labels

common, dependencies (Pull requests that update a dependency file), docs (Changes related to documentations), engine, frontend (Changes related to the frontend GUI), infra, platform (Non-amber Scala service paths)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Observability] Distributed traces: gateway endpoint and dashboard panel

2 participants