Schema lineage: cap the EXPLAIN AST fan-out on large schemas #42

Description

@BorisTyshkevich

Problem

loadSchemaLineage (src/net/ch-client.js) resolves a view/MV's source tables by running EXPLAIN AST <as_select> per object, fanned out with a single unbounded Promise.all:

await Promise.all(tables.map(async (t) => {
  if (!t.as_select || (t.engine !== 'View' && t.engine !== 'MaterializedView')) return;
  try {
    const ast = await queryJson(ctx, 'EXPLAIN AST ' + t.as_select);
    t.astTables = parseAstTables((ast.data || []).map((r) => r.explain).join('\n'));
  } catch { /* best-effort */ }
}));

So the number of EXPLAIN AST queries launched simultaneously equals the number of views + materialized views in the database. There is no application-level concurrency limit — the only throttle is the browser's connection pool:

  • HTTP/1.1 — ~6 concurrent per origin (the rest queue).
  • HTTP/2 (what the TLS demo clusters serve) — all multiplexed over one connection, so effectively all fire near-simultaneously.

On a database with hundreds of views/MVs (e.g. github.demo's big schemas), dragging the database onto the results pane can spray hundreds of concurrent EXPLAIN AST queries at ClickHouse in one burst.

Impact

  • Burst load on the ClickHouse server proportional to the view/MV count.
  • Slow time-to-graph for large schemas (mitigated UX-wise by the existing "Loading lineage…" state, but the work itself is still unbounded).

Proposed fix (either, or both)

  1. Bound the fan-out — run the EXPLAIN AST queries through a small worker pool (e.g. 4–8 in flight) instead of Promise.all over everything.
  2. Prefer structured columns first — when dependencies_database/dependencies_table are already populated for a view/MV, skip its EXPLAIN AST entirely and only fall back to parsing for the ones where the structured columns are empty. On modern builds this eliminates most/all of the fan-out; on older builds (e.g. Altinity-antalya 26.3, where dependencies_* is often empty) it falls back as today.

(2) is the bigger win where it applies; (1) is the robust backstop for the builds that still need the parse path.
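
A minimal sketch of option (1), assuming nothing beyond plain promises. `mapWithLimit`, `resolveAstTables`, and `explainOne` are hypothetical names, not existing code: `explainOne(t)` stands in for the `queryJson` + `parseAstTables` pair from the snippet above, and the default limit of 6 is just one value in the suggested 4–8 range.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
async function mapWithLimit(items, limit, worker) {
  let next = 0; // index of the next unclaimed item, shared by all runners

  async function runner() {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two runners take the same item
      await worker(items[i], i);
    }
  }

  // Start up to `limit` runners; each pulls a new item as soon as its previous one settles.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
}

// Hypothetical wrapper showing how the loader could use the pool.
// `explainOne(t)` is assumed to run EXPLAIN AST for one view and return its source tables.
async function resolveAstTables(tables, explainOne, limit = 6) {
  const views = tables.filter(
    (t) => t.as_select && (t.engine === 'View' || t.engine === 'MaterializedView')
  );
  await mapWithLimit(views, limit, async (t) => {
    try {
      t.astTables = await explainOne(t);
    } catch { /* best-effort, same as the current code */ }
  });
}
```

Because each worker swallows its own errors, one failing view does not stop the remaining runners, which matches the best-effort behaviour of the current Promise.all version.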

Context

Introduced with the schema lineage graph (#41). The graph math (src/core/schema-graph.js) is unaffected — this is purely about how the loader gathers source-table data. A cap belongs in loadSchemaLineage; the structured-first short-circuit can be decided per-row there too.
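
The per-row decision for option (2) could look roughly like the sketch below. It assumes each row returned to the loader carries `dependencies_database` and `dependencies_table` as parallel arrays; that row shape is an assumption here, and `hasStructuredDeps` / `needsExplainAst` are hypothetical names.

```javascript
// Assumption: rows expose dependencies_database / dependencies_table as parallel arrays.
// A row counts as "structured" only when both are present, non-empty, and the same length.
function hasStructuredDeps(t) {
  const dbs = t.dependencies_database;
  const tbls = t.dependencies_table;
  return Array.isArray(dbs) && Array.isArray(tbls) &&
    tbls.length > 0 && dbs.length === tbls.length;
}

// True only for views/MVs that still need the EXPLAIN AST parse path.
function needsExplainAst(t) {
  const isView = t.engine === 'View' || t.engine === 'MaterializedView';
  return Boolean(t.as_select) && isView && !hasStructuredDeps(t);
}
```

Filtering `tables` with `needsExplainAst` before the fan-out would leave only the rows whose structured columns are empty, so on builds that populate them the parse path sees few or no rows.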
