Skip to content

Bugfix: make jfrog xray impact paths deterministic#15094

Open
paulOsinski wants to merge 15 commits into
DefectDojo:devfrom
paulOsinski:bugfix-jfrog-xray-impact-paths-deterministic
Open

Bugfix: make jfrog xray impact paths deterministic#15094
paulOsinski wants to merge 15 commits into
DefectDojo:devfrom
paulOsinski:bugfix-jfrog-xray-impact-paths-deterministic

Conversation

@paulOsinski

@paulOsinski paulOsinski commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

[sc-13285]

The OSS parser at dojo/tools/jfrog_xray_api_summary_artifact/parser.py picks impact_paths[0] from an unordered list returned by JFrog:

impact_paths = vulnerability.get("impact_path", [])
if len(impact_paths) > 0:
    impact_path = decode_impact_path(impact_paths[0])
...
file_path=impact_paths[0],
description=impact_path.name + ":" + impact_path.version + " -> " + vulnerability["description"],

When JFrog returns the same vulnerability's impact_path list in a different order, file_path and description shift between imports.

This can cause unexpected issues with deduplication because of drifting, unpredictable issue descriptions. This list should be ordered deterministically for consistency.

Fix
Sort impact_paths in the OSS parser so file_path and description are deterministic. Optionally list all paths in the description so developers see every location, not just one.

Change:
impact_paths = sorted(vulnerability.get("impact_path", [])) — every downstream use of impact_paths[0] (file_path, the decoded impact_path feeding unique_id, and the V3 LocationData) is now deterministic.
When more than one impact_path exists, description appends a Impact paths: list so developers see every location, not just one (which is a data model limitation)

DefectDojo release bot and others added 12 commits June 22, 2026 16:58
…x/3.0.100-3.1.0-dev

Release: Merge back 3.0.100 into bugfix from: master-into-bugfix/3.0.100-3.1.0-dev
… same file (DefectDojo#15003)

* fix(xygeni): keep repeated SAST/Secrets occurrences as distinct findings

Dedup on the per-occurrence issueId instead of uniqueHash (which Xygeni reuses
across occurrences of the same value), keeping uniqueHash as vuln_id_from_tool.
SCA is unchanged. Adds regression tests and updates the parser docs.

* docs(xygeni): add 2.59.2 upgrade note for reimport behavior

* docs(xygeni): move upgrade note to 3.0.100
* add organization and product type articles

* Fix table header from 'Assets' to 'Products'

Updated the table to correct 'Assets' to 'Products'.

---------

Co-authored-by: Paul Osinski <42211303+paulOsinski@users.noreply.github.com>
Two partial indexes on dojo_finding backing the heaviest finding-list
queries, which combine sparse per-user visibility with ORDER BY/LIMIT:

* idx_finding_sev_active (severity, numerical_severity DESC) WHERE active --
  severity-filtered lists ordered by severity (e.g. the default active-
  findings views and dashboard widgets). Replaces a backward numerical_severity
  scan that filtered millions of rows away with a bounded index range scan.
* idx_finding_riskaccepted_date (date DESC) WHERE risk_accepted --
  accepted-risks list; replaces a full-table backward scan with a direct
  ordered scan of just the risk-accepted rows.

Built CONCURRENTLY (atomic = False) so the migration takes no
ACCESS EXCLUSIVE lock on the large dojo_finding table.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ility_perf_indexes

Add partial indexes for authorized finding-list queries
…#15059)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…fectDojo#15080)

* Fix SARIF parser: unwrap BlackDuck nested fingerprint dict values

BlackDuck SARIF output uses {"value": "<hash>"} objects as fingerprint
values instead of plain strings as the SARIF 2.1.0 spec requires.

get_fingerprints_hashes() was storing these dicts directly into
unique_id_from_tool. Django's CharField silently coerces them to their
str() repr on save, but they arrive as actual dict objects in memory.

On second reimport the reimporter calls update_candidates_with_saved_finding()
which uses unique_id_from_tool as a Python dict key:

    candidates_by_uid.setdefault(finding.unique_id_from_tool, [])

Dicts are not hashable, so this raises:

    TypeError: cannot use 'dict' as a set element (unhashable type: 'dict')

This crash happens for ALL reimport deduplication algorithms because
update_candidates_with_saved_finding() runs unconditionally after each
finding is saved. Changing the deduplication algorithm is not a workaround.

Fix: unwrap the nested dict at parse time. If the value is a dict, extract
its "value" key (empty string fallback if missing). This produces the
correct plain-string unique_id_from_tool the rest of the pipeline expects.

Adds two unit test cases to test_get_fingerprints_hashes and a new
integration test test_blackduck_nested_fingerprints with a sanitized
5-finding SARIF fixture covering all severity levels.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix docstring format to satisfy ruff D213/D209

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* authorized users improvements

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* authorized users improvements: tests

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* authorized users improvements: registry system check + query coverage

- add a Django system check (dojo.E001, Tags.security) that fails loud if any
  auth-filter key the OS looks up is unregistered (silent fallback / allow-all);
  CRITICAL_AUTH_FILTERS is drift-guarded by the registry unit test
- add a scoped/no-access/superuser coverage matrix across the product-scoped
  object filters (engagements, tests, findings, endpoints, products, ...)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(authz): drop startup system check with hardcoded critical-filter list

Remove the @register(Tags.security) _check_auth_filters_wired system check and
its hardcoded CRITICAL_AUTH_FILTERS tuple. A hardcoded predefined list of
critical methods just rots and drifts from reality. The dynamic
test_all_looked_up_keys_are_registered guard (scans the tree for
get_auth_filter() lookups and asserts each is registered) covers the same
'silent fallback' risk without any hardcoded list, so keep that and drop the
list-matching test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…ad selector (DefectDojo#15063)

* fix(rbac): resolve product-scoped users for engagement lead selector (DefectDojo#15062)

The OS auth-filter impls _get_authorized_users_for_product_type and
_get_authorized_users_for_product_and_product_type ignored their
product_type/product argument and returned users.none() for any
non-staff/non-superuser caller. This emptied the (mandatory) engagement
'Testing Lead' selector for users granted access to a product via the
'authorized users' section, blocking engagement creation (DefectDojo#15062).

Resolve the users authorized on the given product/product type via the
authorized_users M2M (the inverse of _authorized_product_ids), mirroring
the rest of the OS authorization layer. Superuser/staff still bypass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(rbac): also surface superusers as engagement lead candidates (2.58.4 parity)

2.58.4's get_authorized_users_for_product_and_product_type always included
is_superuser users (and global-role users) as candidates. Mirror the OS-
applicable part here so admins remain pickable as a Testing Lead for a
product-scoped caller. Global-role resolution stays a Pro concern (roles are
inert in OS).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(rbac): restore collaborator visibility in get_authorized_users (2.58.4 parity)

The OS _get_authorized_users returned only the calling user (pk=user.pk) for
any non-staff/non-superuser caller, vs 2.58.4 which returned co-members of the
caller's authorized products/types plus global-role users and superusers. This
collapsed the engagement/test 'users' lists and the audit-log actor/user
filters to self-only for scoped users.

Resolve collaborators via the authorized_users M2M (users sharing the caller's
authorized products/product types) plus superusers, mirroring the OS-applicable
part of 2.58.4. Updated the carrier-table 'legacy' tests accordingly (carrier
roles remain inert in OS; superusers now surface) and added a collaborator test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(perf): recalibrate importer query counts for authorized-users resolution

The get_authorized_users_for_product_and_product_type / _for_product_type
resolution now executes the authorized_users subqueries (the DefectDojo#15062 fix),
adding a small fixed overhead (+1 async / +3 sync) to the import notification
path. Recalibrate both the V2 (TestDojoImporterPerformanceSmall) and V3
(TestDojoImporterPerformanceSmallLocations) expected query counts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…ical products exist (DefectDojo#15057)

* fix(metrics): prevent 500 on Critical Asset Metrics page when no critical products exist (DefectDojo#15051)

When no product types are marked critical, the empty critical_prods queryset
is falsy, so the metrics template fell through its '{% if not critical_prods %}'
general-dashboard branch, which references context (max_findings_details,
findings, ...) that the critical_product_metrics view does not supply, raising
VariableDoesNotExist for 'max_findings_details'.

Gate the dashboard branches on 'name != labels.ASSET_METRICS_CRITICAL_LABEL'
too, mirroring the existing header branch, so the critical asset page shows the
'no critical assets' message instead of the general dashboard. Applied to both
the V3 and classic metrics templates.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(e2e): smoke-check /critical_asset_metrics page loads (DefectDojo#15051)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@paulOsinski paulOsinski force-pushed the bugfix-jfrog-xray-impact-paths-deterministic branch from df67647 to 2de1751 Compare June 26, 2026 18:19
@github-actions github-actions Bot removed the docs label Jun 26, 2026
@valentijnscholten

Copy link
Copy Markdown
Member

Sounds like a good plan. It does mean that it might change the hash_code for many existsing findings occurring in reimports/imports leading to a one-time close/create frenzy.

@Maffooch Maffooch left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good plan. It does mean that it might change the hash_code for many existsing findings occurring in reimports/imports leading to a one-time close/create frenzy.

Agreed with Val - description is a default hash code setting for this perser. @paulOsinski can you pleas rebase to the dev branch and add some release notes letting folks know how to rerun their hash calculations?

@Maffooch Maffooch added this to the 3.1.0 milestone Jun 26, 2026
@valentijnscholten

Copy link
Copy Markdown
Member

recalculating the hash won't help as the descripton is not retroactively changed.

@paulOsinski paulOsinski changed the base branch from bugfix to dev June 29, 2026 14:09
@github-actions

Copy link
Copy Markdown
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

…pact-paths-deterministic

# Conflicts:
#	dojo/models.py
#	helm/defectdojo/Chart.yaml
#	unittests/test_importers_performance.py
@github-actions github-actions Bot added New Migration Adding a new migration file. Take care when merging. docs labels Jun 29, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

@github-actions

Copy link
Copy Markdown
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@paulOsinski

Copy link
Copy Markdown
Contributor Author

@Maffooch did you see Val's comment? He's right, the description will change along with the hash code so it's not like recalculating will do anything. This will still create "net new" Findings but those are being created anyway by this parser.

@Maffooch

Copy link
Copy Markdown
Contributor

Okay, so I think that leaves us with two options

  1. Change this parser, and findings will have go through the closed/open loop
  2. Make a v2 of this parser
    It seems like findings are already doing the closed/open loop today, so option 1 may not be that bad

@valentijnscholten would you agree?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants