Fix concurrent capture worker rebuild after a reaped browser#309
Open
probably-jaden wants to merge 1 commit into
Open
Fix concurrent capture worker rebuild after a reaped browser#309probably-jaden wants to merge 1 commit into
probably-jaden wants to merge 1 commit into
Conversation
When the supervisor kills a stalled browser, the worker's rebuild (sync_playwright().start() via the fetcher factory) raised "Playwright Sync API inside the asyncio loop" and killed the whole run: the dead instance's event loop stays registered as running on the worker thread, and _close_quietly can't clear it because Playwright teardown is thread-affine. - pipeline: reset the worker thread's running-loop marker to its pre-fetcher baseline before rebuilding (and at worker teardown), abandoning the poisoned loop instead of touching thread-affine Playwright internals. - pipeline: a failed rebuild no longer propagates out of the worker; the URL keeps its error outcome and the next URL retries the rebuild. - playwright_fetcher: __exit__ still calls playwright.stop() when browser.close() fails on an already-dead browser, so same-thread teardown un-registers the loop properly. - tests: fake sync-Playwright-alike fetcher reproducing the poisoned thread state; both new tests fail on the previous code. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When the stall supervisor kills a wedged browser process, the worker's dead sync-Playwright instance leaves its asyncio loop registered as running on that thread. The rebuild's fresh `sync_playwright().start()` then refuses with "Playwright Sync API inside the asyncio loop", the exception propagates out of the worker, and the whole capture run dies.
Fix: capture the thread's loop-state baseline when a worker starts, and restore it (`asyncio.events._set_running_loop`) after abandoning a dead fetcher, so the rebuilt fetcher can start cleanly. The dead loop is deliberately leaked; its browser processes are already swept by the reaper. Includes regression tests for the rebuild path (browser death mid-run, and a failed rebuild no longer killing the run).
Observed in practice: any capture run large enough to hit one hung page (>~200 URLs at concurrency 5) crashed within minutes; with this fix the same runs complete.
🤖 Generated with Claude Code