fix: remove dead code from utils and adaptive_crawler by RajanChavada · Pull Request #2042 · unclecode/crawl4ai

RajanChavada · 2026-06-29T15:38:02Z

Summary

Removes unreachable and abandoned code that accumulated over time. No behaviour change.

List of files changed and why

crawl4ai/adaptive_crawler copy.py: editor artifact committed by mistake. It is a byte-for-byte duplicate of adaptive_crawler.py and is not imported anywhere. Deleted.
crawl4ai/utils.py: two dead normalize_url variants removed:
- The first normalize_url definition was silently shadowed by the extended definition ~20 lines below it. In Python the last definition bound to a name wins, so the first one was never callable.
- normalize_url_tmp had no callers outside utils.py itself and reimplemented what urllib.parse.urljoin already does correctly.
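
The shadowing described above can be reproduced in a few lines. This is a minimal sketch: both function bodies and the `strip_fragment` parameter are hypothetical stand-ins for illustration, not the code that was removed from utils.py.

```python
from urllib.parse import urljoin


def normalize_url(href, base_url):
    # First definition: a plain urljoin wrapper (hypothetical stand-in).
    return urljoin(base_url, href)


# Keep a handle on the first function before the name is rebound.
first_definition = normalize_url


def normalize_url(href, base_url, strip_fragment=True):
    # Second definition: rebinds the module-level name `normalize_url`.
    # `strip_fragment` is an illustrative assumption, not the real signature.
    url = urljoin(base_url, href)
    return url.split("#", 1)[0] if strip_fragment else url


# After the second `def`, the name refers only to the second function.
shadowed = normalize_url is not first_definition
result = normalize_url("/docs#intro", "https://example.com/a/b")
```

Without the explicit `first_definition` handle, nothing in the module can reach the first function once the second `def` has run, which is why deleting it does not change behaviour.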

How Has This Been Tested?

Existing test suite passes (pytest). No callers of the removed code exist; this was confirmed by grep across the full codebase before removal. extract_xml_data_legacy (which also has "legacy" in its name) was left in place because tests/regression/test_reg_utils.py uses it.
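
For reviewers who want to repeat the caller check without grep, a rough equivalent could look like the sketch below. The helper name `find_references` and the temporary files are assumptions made for this illustration. It is a plain whole-word text search, so it would not catch dynamic lookups such as `getattr`.

```python
import re
import tempfile
from pathlib import Path


def find_references(root, name):
    """Return (path, line_number, line) for each whole-word match of `name` in *.py files."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


# Demo on a throwaway tree: the only hit is the definition itself,
# which is the signal that a function has no other references.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "utils.py").write_text(
        "def normalize_url_tmp(href, base):\n    return base + href\n"
    )
    Path(tmp, "crawler.py").write_text("from utils import normalize_url\n")
    tmp_hits = find_references(tmp, "normalize_url_tmp")
    demo = [(Path(p).name, n) for p, n, _ in tmp_hits]
```

The word boundary matters here: a search for `normalize_url` does not match `normalize_url_tmp`, and the reverse also holds, so the two names can be checked independently.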

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added/updated unit tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Two unreachable functions in utils.py:
- The first `normalize_url` (plain urljoin wrapper) was silently shadowed by the extended `normalize_url` defined ~20 lines later. In Python the last definition bound to a name wins, so the first definition was never callable.
- `normalize_url_tmp` was a hand-rolled URL joiner (string split on "/") with no callers outside utils.py itself. `urllib.parse.urljoin` already covers this correctly.
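
To illustrate why a split-on-"/" joiner is worth dropping in favour of `urllib.parse.urljoin`, here is a small comparison. `naive_join` is a hypothetical joiner written for this example; it is not the removed `normalize_url_tmp`, and the URLs are placeholders.

```python
from urllib.parse import urljoin


def naive_join(base_url, href):
    # Hypothetical split-on-"/" joiner, NOT the removed normalize_url_tmp.
    if href.startswith(("http://", "https://")):
        return href
    parts = base_url.split("/")  # ['https:', '', 'example.com', 'docs', ...]
    if href.startswith("/"):
        # Root-relative: keep only scheme and host.
        return "/".join(parts[:3]) + href
    # Relative: drop the last path segment and append href verbatim.
    return "/".join(parts[:-1]) + "/" + href


base = "https://example.com/docs/guide/index.html"

# Simple relative links agree with urljoin...
same = naive_join(base, "intro.html") == urljoin(base, "intro.html")

# ...but parent references are left unresolved by the naive version.
naive_parent = naive_join(base, "../api/ref.html")
std_parent = urljoin(base, "../api/ref.html")

# urljoin also handles protocol-relative links.
std_protocol_relative = urljoin(base, "//cdn.example.com/app.js")
```

In this sketch the naive joiner returns `https://example.com/docs/guide/../api/ref.html`, while `urljoin` resolves the dot segments to `https://example.com/docs/api/ref.html`.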

RajanChavada · 2026-06-29T15:40:07Z

Requesting a review on this PR (tagging @unclecode) as the lead maintainer :)

nightcityblade and others added 7 commits June 5, 2026 11:10

fix: close browser contexts from snapshot

777a250

fix(docker): make read-only tmpfs writable

c8aee8c

Merge pull request unclecode#2034 from nightcityblade/fix/issue-2027

9fe0a7d

fix(docker): make read-only tmpfs writable

Merge pull request unclecode#2003 from nightcityblade/fix/issue-1999

e1ec743

fix: close browser contexts from snapshot

fix(html2text): preserve all attributes on table tags when bypass_tab…

511c73c

…les is enabled (unclecode#2007)

chore: remove accidental copy of adaptive_crawler

6a181da

adaptive_crawler copy.py was an uncommitted editor artifact that ended up tracked in the repo. It is byte-for-byte identical to adaptive_crawler.py and is not imported anywhere.
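
The "byte-for-byte identical" claim can be checked with the standard library. The sketch below uses temporary files with placeholder contents (an assumption for the demo, not the real module source); `is_exact_copy` is a hypothetical helper name.

```python
import filecmp
import tempfile
from pathlib import Path


def is_exact_copy(path_a, path_b):
    # shallow=False forces a content comparison instead of trusting
    # os.stat() metadata such as size and modification time.
    return filecmp.cmp(path_a, path_b, shallow=False)


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp, "adaptive_crawler.py")
    duplicate = Path(tmp, "adaptive_crawler copy.py")
    edited = Path(tmp, "edited.py")

    # Placeholder contents, used only to exercise the comparison.
    original.write_text("class AdaptiveCrawler:\n    pass\n")
    duplicate.write_bytes(original.read_bytes())
    edited.write_text("class AdaptiveCrawler:\n    ...\n")

    copy_matches = is_exact_copy(original, duplicate)
    edited_matches = is_exact_copy(original, edited)
```

Run against the two real paths in a checkout that predates this commit, the same helper would be expected to return `True` if the description above holds.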

RajanChavada wants to merge 7 commits into unclecode:main from RajanChavada:bugfix/remove-dead-code
