fix: remove dead code from utils and adaptive_crawler#2042

Open
RajanChavada wants to merge 7 commits into
unclecode:mainfrom
RajanChavada:bugfix/remove-dead-code

Conversation

@RajanChavada

Summary

Please include a summary of the change and/or which issues are fixed.

Removes unreachable and abandoned code that accumulated over time. No behaviour change.

List of files changed and why

  • crawl4ai/adaptive_crawler copy.py: editor artifact committed by mistake; byte-for-byte duplicate of
    adaptive_crawler.py, not imported anywhere. Deleted.
  • crawl4ai/utils.py: two dead normalize_url variants removed:
    • The first normalize_url definition was silently shadowed by the extended definition ~20 lines below it.
      In Python, the last definition bound to a name wins, so the first one was never reachable by callers.
    • normalize_url_tmp had zero callers outside utils.py itself and reimplemented what urllib.parse.urljoin already
      does correctly.
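
For reviewers less familiar with the shadowing behaviour described above, here is a minimal sketch. The two function bodies are hypothetical stand-ins, not the actual crawl4ai code; only the name `normalize_url` is taken from this PR.

```python
# Hypothetical stand-ins; the bodies are NOT the real crawl4ai implementations.
def normalize_url(href, base_url):
    return "first definition"


def normalize_url(href, base_url):  # rebinds the same module-level name
    return "second definition"


# Only the later binding is reachable through the name.
result = normalize_url("page.html", "https://example.com/")
print(result)
```

Python raises no error or warning on the second `def`, which is why a duplicate like this can sit unnoticed in a large module.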

How Has This Been Tested?

Existing test suite passes (pytest). A grep across the full codebase before removal confirmed that no callers of the removed code exist. extract_xml_data_legacy (also "legacy"-named) was left in place because tests/regression/test_reg_utils.py uses it.
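
The caller check here was done with grep. As a complementary sketch (not what was run for this PR), the same question can be asked of the syntax tree, which ignores matches in comments and strings. The helper name `find_references` and the sample source are hypothetical.

```python
import ast


def find_references(source: str, name: str) -> list[int]:
    """Return sorted line numbers where `name` is used or imported (definitions excluded)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == name:
            hits.append(node.lineno)  # bare usage, e.g. normalize_url(...)
        elif isinstance(node, ast.Attribute) and node.attr == name:
            hits.append(node.lineno)  # attribute usage, e.g. utils.normalize_url(...)
        elif isinstance(node, ast.ImportFrom) and any(a.name == name for a in node.names):
            hits.append(node.lineno)  # from utils import normalize_url
    return sorted(hits)


# Hypothetical sample module, used only to exercise the helper.
sample = (
    "from utils import normalize_url\n"
    "def normalize_url_tmp(href, base):\n"
    "    return href\n"
    "print(normalize_url('a', 'b'))\n"
)

used = find_references(sample, "normalize_url")
unused = find_references(sample, "normalize_url_tmp")
```

This does not catch dynamic lookups such as `getattr(utils, "normalize_url_tmp")`, so a plain-text grep remains a useful cross-check.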

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

nightcityblade and others added 7 commits June 5, 2026 11:10
fix(docker): make read-only tmpfs writable
fix: close browser contexts from snapshot
adaptive_crawler copy.py was an editor artifact that was never meant to be
committed but ended up tracked in the repo. It is byte-for-byte identical to
adaptive_crawler.py and is not imported anywhere.
Two unreachable functions in utils.py:

- The first `normalize_url` (plain urljoin wrapper) was silently shadowed
  by the extended `normalize_url` defined ~20 lines later. In Python the
  last definition of a name wins, so the first was never reachable.

- `normalize_url_tmp` was a hand-rolled URL joiner (string split on "/")
  with no callers outside utils.py itself. `urllib.parse.urljoin` already
  covers this correctly.
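
To illustrate the cases `urllib.parse.urljoin` handles that a manual split on "/" tends to get wrong, here is a short sketch. The base URL and hrefs are made-up examples, not taken from the crawl4ai code or tests.

```python
from urllib.parse import urljoin

# Made-up base URL for illustration only.
base = "https://example.com/docs/guide/index.html"

parent_relative = urljoin(base, "../api/ref.html")         # climbs one directory
root_relative = urljoin(base, "/about")                     # replaces the whole path
absolute = urljoin(base, "https://other.example/x")         # absolute href wins outright
scheme_relative = urljoin(base, "//cdn.example.com/lib.js") # inherits only the scheme

for url in (parent_relative, root_relative, absolute, scheme_relative):
    print(url)
```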
@RajanChavada

Author

Requesting a review on this PR (tagging @unclecode) as the lead maintainer :)
