Fix(query): Remove duplicated rows when dataset/workflows shared Publicly in the hub page #5962
Mrudhulraj wants to merge 1 commit
Doesn't SELECT DISTINCT work?
What changes were proposed in this PR?
Issue - Duplicate datasets/workflows on hub landing page / hub search
Symptom: A user creates a dataset, makes it public, and grants another user explicit access. When the grantee browses the hub, the dataset appears twice in the search results.
Root cause:
DatasetSearchQueryBuilder.constructFromClause produced this SQL:
path:
amber\src\main\scala\org\apache\texera\web\resource\dashboard\DatasetSearchQueryBuilder.scala:72

For a dataset that is both public AND explicitly shared with the user, the LEFT JOIN produces one row per matching dataset_user_access row, and the OR keeps all of them because the public branch is true for every joined row.
This applies similarly to workflows too.
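The fan-out described above can be reproduced with plain Scala collections. This is only a sketch: the query shape (join on did, uid filter in the WHERE), the case classes, and the sample uids are assumptions for illustration, not the actual builder code.

```scala
object DuplicateRowsDemo {
  // Hypothetical, simplified stand-ins for the dataset and dataset_user_access tables.
  final case class Dataset(did: Int, name: String, isPublic: Boolean)
  final case class Access(did: Int, uid: Int, privilege: String)

  // LEFT JOIN semantics: every dataset is kept; a dataset with no matching
  // access row is paired with None (the NULL access columns).
  def leftJoin(datasets: Seq[Dataset], access: Seq[Access])(
      on: (Dataset, Access) => Boolean
  ): Seq[(Dataset, Option[Access])] =
    datasets.flatMap { d =>
      val matches = access.filter(a => on(d, a))
      if (matches.isEmpty) Seq((d, Option.empty[Access]))
      else matches.map(a => (d, Option(a)))
    }

  // Assumed shape of the old query: join on did only, uid checked in the WHERE.
  def oldSearch(
      datasets: Seq[Dataset],
      access: Seq[Access],
      uid: Int
  ): Seq[(Dataset, Option[Access])] = {
    val joined = leftJoin(datasets, access)((d, a) => d.did == a.did)
    joined.filter { case (d, a) => d.isPublic || a.exists(_.uid == uid) }
  }

  // One public dataset with two access rows: owner (uid 10) and grantee (uid 20).
  val datasets: Seq[Dataset] = Seq(Dataset(1, "public-and-shared", isPublic = true))
  val access: Seq[Access] = Seq(Access(1, 10, "WRITE"), Access(1, 20, "READ"))

  def main(args: Array[String]): Unit =
    // The grantee sees the same dataset once per access row.
    oldSearch(datasets, access, 20).foreach(println)
}
```

Because isPublic is true for every joined row, the WHERE cannot discard the owner's access row, so the grantee gets one result per access row.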
Fix 1 — DatasetSearchQueryBuilder.constructFromClause
Move the UID filter from the WHERE clause into the JOIN's ON clause so each dataset produces at most one joined row, and force the JOIN condition to FALSE when uid == null so the SELECT still references a valid table.
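A minimal sketch of the effect of Fix 1, again using plain Scala collections rather than the real query builder. The case classes, the assumption that (did, uid) is unique in the access table, and the WHERE shape (public OR an access row matched) are illustrative guesses, not taken from the PR's diff.

```scala
object JoinFilterDemo {
  // Hypothetical, simplified stand-ins for the dataset and dataset_user_access tables.
  final case class Dataset(did: Int, name: String, isPublic: Boolean)
  final case class Access(did: Int, uid: Int, privilege: String)

  // LEFT JOIN semantics: unmatched datasets are kept with a None access side.
  def leftJoin(datasets: Seq[Dataset], access: Seq[Access])(
      on: (Dataset, Access) => Boolean
  ): Seq[(Dataset, Option[Access])] =
    datasets.flatMap { d =>
      val matches = access.filter(a => on(d, a))
      if (matches.isEmpty) Seq((d, Option.empty[Access]))
      else matches.map(a => (d, Option(a)))
    }

  // The uid filter lives in the ON condition. With uid = None the condition is
  // always false, which plays the role of "AND FALSE": the access side stays
  // in the join but is always empty.
  def fixedSearch(
      datasets: Seq[Dataset],
      access: Seq[Access],
      uid: Option[Int]
  ): Seq[(Dataset, Option[Access])] = {
    val joined = leftJoin(datasets, access) { (d, a) =>
      d.did == a.did && uid.contains(a.uid)
    }
    // Assumed WHERE: the dataset is public or the user has an explicit grant.
    joined.filter { case (d, a) => d.isPublic || a.nonEmpty }
  }

  val datasets: Seq[Dataset] = Seq(
    Dataset(1, "public-and-shared", isPublic = true),
    Dataset(2, "private", isPublic = false)
  )
  // uid 10 owns both datasets; uid 20 was granted READ on dataset 1.
  val access: Seq[Access] = Seq(
    Access(1, 10, "WRITE"),
    Access(1, 20, "READ"),
    Access(2, 10, "WRITE")
  )

  def main(args: Array[String]): Unit = {
    fixedSearch(datasets, access, Some(20)).foreach(println) // grantee
    fixedSearch(datasets, access, None).foreach(println) // anonymous visitor
  }
}
```

In this model the grantee gets dataset 1 exactly once with its READ grant, and an anonymous visitor gets dataset 1 once with an empty access side, which corresponds to the NULL access columns in SQL.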
Why AND FALSE for uid == null?
The SELECT references DATASET_USER_ACCESS.PRIVILEGE. Without dataset_user_access in the FROM, the DB throws: missing FROM-clause entry for table "dataset_user_access".
AND FALSE keeps the table in the FROM while making the JOIN yield NULL access columns, which is the correct semantic for "no explicit grant".

Behavior matrix:
Fix 2 — WorkflowSearchQueryBuilder.toEntryImpl + WorkflowSearchQueryBuilder.constructFromClause
Apply the same JOIN pattern described in Fix 1 to workflows, and add null-safe getters to handle the now-NULL access columns:
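As a rough illustration of what a null-safe getter can look like, the sketch below wraps a possibly-null column value in Option. The Row type, the column names, and the "NONE" default are hypothetical and only stand in for the real record type used by toEntryImpl.

```scala
object NullSafeAccess {
  // Hypothetical stand-in for one fetched row: column name -> value. A LEFT
  // JOIN without a matching access row leaves the access columns as null.
  type Row = Map[String, AnyRef]

  // Option(...) maps null to None, so callers never dereference a null column.
  def privilegeOf(row: Row): Option[String] =
    row.get("privilege").flatMap(v => Option(v)).map(_.toString)

  // Hypothetical default used when there is no explicit grant.
  def privilegeOrNone(row: Row): String =
    privilegeOf(row).getOrElse("NONE")

  // A workflow the user was explicitly granted, and one visible only because it is public.
  val granted: Row = Map[String, AnyRef]("wid" -> Int.box(7), "privilege" -> "READ")
  val publicOnly: Row = Map[String, AnyRef]("wid" -> Int.box(8), "privilege" -> null)

  def main(args: Array[String]): Unit = {
    println(privilegeOrNone(granted))
    println(privilegeOrNone(publicOnly))
  }
}
```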
path:
amber\src\main\scala\org\apache\texera\web\resource\dashboard\WorkflowSearchQueryBuilder.scala

Any related issues, documentation, discussions?
Fixes #5957
How was this PR tested?
Tested manually with database checks and UI workflow testing.
Was this PR authored or co-authored using generative AI tooling?
No AI tools were used in the process.