
Fix(query): Remove duplicated rows when dataset/workflows shared Publicly in the hub page#5962

Open
Mrudhulraj wants to merge 1 commit into apache:main from Mrudhulraj:fix/dataset-workkflow-fix

Conversation

@Mrudhulraj Mrudhulraj commented Jun 28, 2026

What changes were proposed in this PR?

Issue: duplicate datasets/workflows on the hub landing page and in hub search

Symptom: A user creates a dataset, makes it public, and grants another user explicit access. When the grantee browses the hub, the dataset appears twice in the search results.

Root cause:

DatasetSearchQueryBuilder.constructFromClause produced this SQL:

path: amber\src\main\scala\org\apache\texera\web\resource\dashboard\DatasetSearchQueryBuilder.scala:72

  SELECT DISTINCT ...
  FROM dataset
  LEFT JOIN dataset_user_access dua ON dua.did = dataset.did
  LEFT JOIN "user" ON ...
  WHERE (dua.uid = <ME>) OR (dataset.is_public = true)

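For a dataset that is both public AND explicitly shared with the user, the LEFT JOIN produces one row per dataset_user_access row for that dataset (for example, the owner's grant and the grantee's grant). Because dataset.is_public = true holds on every one of those rows, the OR keeps all of them, not just the current user's. Whenever those rows differ in the selected access columns (such as the privilege), SELECT DISTINCT cannot collapse them.
The same problem affects workflows.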
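The fan-out can be reproduced without a database. The following is a minimal sketch in plain Scala collections (not jOOQ, and not the real Texera schema); the case classes, the uids, and the assumption that the owner also holds a dataset_user_access row are hypothetical, chosen to match the symptom above. It models the old query: join on did only, filter with the OR in WHERE, then SELECT DISTINCT over (did, privilege).

```scala
object FanOutDemo {
  // Hypothetical in-memory rows; not the real Texera schema.
  final case class Dataset(did: Int, isPublic: Boolean)
  final case class Access(did: Int, uid: Int, privilege: String)

  // Assumption: dataset 10 is public, and both the owner (uid 1) and the
  // grantee (uid 2) hold a dataset_user_access row for it.
  val datasets: List[Dataset] = List(Dataset(10, isPublic = true))
  val accesses: List[Access] = List(Access(10, 1, "WRITE"), Access(10, 2, "READ"))

  // LEFT JOIN dataset_user_access dua ON dua.did = dataset.did:
  // one output row per matching access row, or one row with None if none match.
  def leftJoinOnDid: List[(Dataset, Option[Access])] =
    datasets.flatMap { d =>
      val matches = accesses.filter(_.did == d.did)
      if (matches.isEmpty) List((d, Option.empty[Access]))
      else matches.map(a => (d, Option(a)))
    }

  // Old query: WHERE (dua.uid = me) OR (dataset.is_public = true),
  // then SELECT DISTINCT over (did, privilege).
  def oldQuery(me: Int): List[(Int, Option[String])] =
    leftJoinOnDid
      .filter { case (d, a) => a.exists(_.uid == me) || d.isPublic }
      .map { case (d, a) => (d.did, a.map(_.privilege)) }
      .distinct

  def main(args: Array[String]): Unit =
    oldQuery(2).foreach(println)
}
```

In this model, `FanOutDemo.oldQuery(2)` (the grantee's view) returns two rows for dataset 10: one carrying the owner's WRITE grant and one carrying the grantee's READ grant. Since the privilege values differ, `distinct` keeps both.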

Fix 1: DatasetSearchQueryBuilder.constructFromClause

Move the UID filter from the WHERE clause into the JOIN's ON clause, so each dataset produces at most one joined row, and force the ON condition to FALSE when uid == null so the SELECT still references a valid table.

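  // The uid filter lives in the ON clause, so a dataset only joins the
  // current user's dataset_user_access row (if any).
  // For anonymous visitors (uid == null), FALSE keeps the table in FROM
  // while matching no access rows.
  val baseJoin = DATASET
    .leftJoin(DATASET_USER_ACCESS)
    .on(DATASET_USER_ACCESS.DID.eq(DATASET.DID))
    .and(if (uid == null) DSL.falseCondition() else DATASET_USER_ACCESS.UID.eq(uid))
    .leftJoin(USER)
    .on(USER.UID.eq(DATASET.OWNER_UID))

  val condition: Condition =
    if (uid == null) {
      DATASET.IS_PUBLIC.eq(true) // public only
    } else if (includePublic) {
      DATASET.IS_PUBLIC.eq(true).or(DATASET_USER_ACCESS.UID.isNotNull) // public + explicit grants
    } else {
      DATASET_USER_ACCESS.UID.isNotNull // explicit grants only
    }

  baseJoin.where(condition)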

Why AND FALSE for uid == null?
The SELECT references DATASET_USER_ACCESS.PRIVILEGE. If dataset_user_access is dropped from the FROM clause, the database rejects the query with: missing FROM-clause entry for table "dataset_user_access". AND FALSE keeps the table in the FROM clause while guaranteeing the JOIN matches no access rows, so every dataset comes back with NULL access columns, which is the correct meaning of "no explicit grant".

Behavior matrix:

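  uid        includePublic   Matched datasets
  ---------  --------------  ---------------------------------------------------------------
  null       (n/a)           Public datasets only
  not null   false           Only datasets the logged-in user has explicit access to
  not null   true            Public datasets plus those the logged-in user has explicit
                             access to (no duplicates)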
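The behavior matrix can be checked against the same kind of in-memory model. This is a hedged sketch, not the actual jOOQ code: uid is modeled as `Option[Int]` (None standing in for uid == null), the ON-clause filter is `uid.contains(a.uid)` so an absent uid matches nothing (the role `DSL.falseCondition()` plays), and the dataset ids and grants are hypothetical. It also assumes each (did, uid) pair appears at most once in dataset_user_access.

```scala
object HubSearchMatrix {
  // Hypothetical in-memory rows; not the real Texera schema.
  final case class Dataset(did: Int, isPublic: Boolean)
  final case class Access(did: Int, uid: Int, privilege: String)

  // 10: public, shared with uid 2    11: private, shared with uid 2
  // 12: public, no grant for uid 2   13: private, owner (uid 1) only
  val datasets: List[Dataset] =
    List(Dataset(10, true), Dataset(11, false), Dataset(12, true), Dataset(13, false))
  val accesses: List[Access] = List(
    Access(10, 1, "WRITE"), Access(10, 2, "READ"),
    Access(11, 1, "WRITE"), Access(11, 2, "READ"),
    Access(12, 1, "WRITE"), Access(13, 1, "WRITE")
  )

  // The uid filter sits in the join condition, so each dataset joins at most
  // the current user's row; with uid = None, uid.contains(...) is always false,
  // mimicking ON ... AND FALSE.
  def search(uid: Option[Int], includePublic: Boolean): List[(Int, Option[String])] = {
    val joined = datasets.flatMap { d =>
      val matches = accesses.filter(a => a.did == d.did && uid.contains(a.uid))
      if (matches.isEmpty) List((d, Option.empty[String]))
      else matches.map(a => (d, Option(a.privilege)))
    }
    // privilege.isDefined plays the role of DATASET_USER_ACCESS.UID.isNotNull.
    joined
      .filter { case (d, privilege) =>
        if (uid.isEmpty) d.isPublic                               // public only
        else if (includePublic) d.isPublic || privilege.isDefined // public + explicit
        else privilege.isDefined                                  // explicit only
      }
      .map { case (d, privilege) => (d.did, privilege) }
  }

  def main(args: Array[String]): Unit = {
    println(search(None, includePublic = false))
    println(search(Some(2), includePublic = false))
    println(search(Some(2), includePublic = true))
  }
}
```

With these data, `search(Some(2), includePublic = true)` returns datasets 10, 11, and 12 once each; dataset 10, which is both public and shared with uid 2, no longer appears twice.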

Fix 2: WorkflowSearchQueryBuilder.toEntryImpl and WorkflowSearchQueryBuilder.constructFromClause

Apply the same JOIN pattern from Fix 1 to workflows, and add null-safe getters in toEntryImpl for the columns that can now come back NULL:
path: amber\src\main\scala\org\apache\texera\web\resource\dashboard\WorkflowSearchQueryBuilder.scala

  val privilege: String =
    Option(record.get(WORKFLOW_USER_ACCESS.PRIVILEGE, classOf[PrivilegeEnum]))
      .map(_.toString)
      .getOrElse("NONE")

  val ownerName: String =
    Option(record.into(USER).getName).getOrElse("")

  val ownerUid: Integer =
    Option(record.into(USER).getUid).getOrElse(0)
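The getters above rely on wrapping possibly-null Java values in `Option` before applying a default. A minimal standalone sketch of that idiom, using a hypothetical `Row` case class in place of a jOOQ `Record` (the field names are illustrative only):

```scala
object NullSafeGetters {
  // Hypothetical stand-in for the joined jOOQ record; any field may be null
  // when a LEFT JOIN finds no matching row.
  final case class Row(privilege: String, ownerName: String, ownerUid: Integer)

  // Option(x) is None when x is null, so the default applies only to missing values.
  def privilegeOf(r: Row): String = Option(r.privilege).getOrElse("NONE")
  def ownerNameOf(r: Row): String = Option(r.ownerName).getOrElse("")
  def ownerUidOf(r: Row): Int = Option(r.ownerUid).map(_.intValue).getOrElse(0)

  def main(args: Array[String]): Unit = {
    val publicOnly = Row(null, null, null)
    println((privilegeOf(publicOnly), ownerNameOf(publicOnly), ownerUidOf(publicOnly)))
  }
}
```

Unboxing through `map(_.intValue)` keeps `ownerUidOf` returning a Scala `Int` instead of relying on an implicit `Int`-to-`Integer` conversion for the default.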

Any related issues, documentation, discussions?

Fixes #5957

How was this PR tested?

Tested manually with database checks and UI workflow testing.

Was this PR authored or co-authored using generative AI tooling?

No AI tools were used in the process.

@github-actions

Contributor

👋 Thanks for your first contribution to Texera, @Mrudhulraj!

If you're looking for a good place to start, browse issues labeled starter-task; they're scoped to be approachable for newcomers.

You can drive common housekeeping yourself by commenting one of these commands on its own line:

  • Issues. Comment /take to assign an open issue to yourself, or /untake to release it. You can find unclaimed work with the search filter is:issue is:open no:assignee.
  • Sub-issues. To link issues into a parent/child hierarchy, comment /sub-issue #5166 #5222 on the parent to attach those children (or /unsub-issue #5166 #5222 to detach them). From a child issue, comment /parent-issue #5166 to set its parent, or /unparent-issue to clear it (the current parent is detected automatically). References may be written as #5166 or as a bare 5166; cross-repository references are not supported.
  • Pull requests (author only). Comment /request-review @user to request a review from someone, or /unrequest-review @user to withdraw that request.

Each command must match exactly: "/take this" will not work; only "/take" does. For the full contribution flow, see CONTRIBUTING.md.

@github-actions

Contributor

👋 Thanks for opening this pull request, @Mrudhulraj!

It looks like the pull request description doesn't quite follow our template yet:

  • The "What changes were proposed in this PR?" section is empty; please fill it in.

Filling out the template helps reviewers understand and triage your contribution faster. Please edit the description to complete it. This message will disappear automatically once the template is followed.

You can find the template prompts by editing the description, or see CONTRIBUTING.md for the full contribution flow.

@github-actions

Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @xuang7, @aglinxinyuan
    You can notify them by mentioning @xuang7, @aglinxinyuan in a comment.

@carloea2

Contributor

Doesn't SELECT DISTINCT work?


Successfully merging this pull request may close these issues.

Join Fan-out issue: Publicly shared dataset/workflows rows duplicated in the hub. (Has RCA and suggested fix)

2 participants