feat: state machine for scrape task producer state mgmt #115
Open
extreme4all wants to merge 4 commits into develop from
…factor core logic
added 3 commits
April 10, 2026 23:21
…actor state management
…e test coverage for invalid transitions
Replaces imperative FetchParams mutation with a generic state machine. Good separation: sm.py (generic engine), structs.py (enums/context), states.py (transition declarations), core.py (orchestration). Roughly 120 lines are cut from core.py.
Summary
Refactor scrape task producer state management from mutable FetchParams to an explicit state machine pattern.
Changes
- sm.py: StateMachine[S, E, C] with enum states/events, a context object, and decorator-based transition registration.
- structs.py: ScrapeState enum (NORMAL → POSSIBLE_BAN → CONFIRMED_BAN → DONE), ScrapeEvent enum (FETCH_MORE, REDUCE_DAYS, NEXT_STEP, NEW_DAY), and ScraperCtx dataclass replacing FetchParams.
- states.py: All state transitions and side effects declared via @scraper_sm.transition decorators.
- core.py: Replace determine_fetch_params (mutation) with determine_event (pure) + scraper_sm.handle (transition). Remove run_async/run wrappers, inline main() setup.
State flow
Each state narrows the date window (REDUCE_DAYS) until exhausted, then advances (NEXT_STEP). FETCH_MORE pages via cursor.
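For reviewers who want the flow in concrete form, here is a minimal sketch of how a decorator-registered engine could drive it. This is not the code from the PR: the method signatures, the ScraperCtx fields (max_days, days), the rule inside determine_event, and the run_to_done helper are all assumptions made for illustration. FETCH_MORE and NEW_DAY are declared but left unregistered, since their exact semantics are not spelled out here.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, Generic, List, Tuple, TypeVar

S = TypeVar("S", bound=Enum)
E = TypeVar("E", bound=Enum)
C = TypeVar("C")


class StateMachine(Generic[S, E, C]):
    """Illustrative engine: (state, event) -> (next state, side effect)."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[S, E], Tuple[S, Callable[[C], None]]] = {}

    def transition(self, from_state: S, event: E, to_state: S):
        """Decorator that registers `action` for one (state, event) pair."""
        def register(action: Callable[[C], None]) -> Callable[[C], None]:
            self._table[(from_state, event)] = (to_state, action)
            return action
        return register

    def handle(self, state: S, event: E, ctx: C) -> S:
        """Run the side effect and return the next state; reject unknown pairs."""
        if (state, event) not in self._table:
            raise ValueError(f"invalid transition: {state.name} on {event.name}")
        to_state, action = self._table[(state, event)]
        action(ctx)
        return to_state


class ScrapeState(Enum):
    NORMAL = auto()
    POSSIBLE_BAN = auto()
    CONFIRMED_BAN = auto()
    DONE = auto()


class ScrapeEvent(Enum):
    FETCH_MORE = auto()
    REDUCE_DAYS = auto()
    NEXT_STEP = auto()
    NEW_DAY = auto()


@dataclass
class ScraperCtx:
    max_days: int = 3  # assumed: initial width of the date window
    days: int = 3      # assumed: current width of the date window


scraper_sm: StateMachine[ScrapeState, ScrapeEvent, ScraperCtx] = StateMachine()
St, Ev = ScrapeState, ScrapeEvent


# REDUCE_DAYS keeps the state and narrows the window (one decorator per state).
@scraper_sm.transition(St.NORMAL, Ev.REDUCE_DAYS, St.NORMAL)
@scraper_sm.transition(St.POSSIBLE_BAN, Ev.REDUCE_DAYS, St.POSSIBLE_BAN)
@scraper_sm.transition(St.CONFIRMED_BAN, Ev.REDUCE_DAYS, St.CONFIRMED_BAN)
def reduce_days(ctx: ScraperCtx) -> None:
    ctx.days -= 1


# NEXT_STEP advances to the following state and resets the window (assumed).
@scraper_sm.transition(St.NORMAL, Ev.NEXT_STEP, St.POSSIBLE_BAN)
@scraper_sm.transition(St.POSSIBLE_BAN, Ev.NEXT_STEP, St.CONFIRMED_BAN)
@scraper_sm.transition(St.CONFIRMED_BAN, Ev.NEXT_STEP, St.DONE)
def next_step(ctx: ScraperCtx) -> None:
    ctx.days = ctx.max_days


def determine_event(ctx: ScraperCtx) -> ScrapeEvent:
    """Pure decision (hypothetical rule): reads the context, never mutates it."""
    return Ev.REDUCE_DAYS if ctx.days > 1 else Ev.NEXT_STEP


def run_to_done(ctx: ScraperCtx) -> List[ScrapeState]:
    """Hypothetical driver: loop until DONE, recording each distinct state."""
    state, visited = St.NORMAL, [St.NORMAL]
    while state is not St.DONE:
        state = scraper_sm.handle(state, determine_event(ctx), ctx)
        if state is not visited[-1]:
            visited.append(state)
    return visited
```

Calling run_to_done(ScraperCtx()) visits NORMAL, POSSIBLE_BAN, CONFIRMED_BAN, DONE in that order, and scraper_sm.handle raises ValueError for any unregistered pair such as (DONE, NEXT_STEP). Keeping determine_event free of mutation means all context changes live in the registered actions, which is the separation the PR describes.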
Notes
fetch_more uses three decorators (one per state → self) as a workaround for the missing self-transition concept in the generic SM. Consider adding transition_to_self or allowing to_state=None to mean "stay."
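One possible shape for that suggestion, as a rough sketch only. The transition_to_self signature (event first, then any number of states), the pages_fetched field, and the engine internals are assumptions for illustration and may not match sm.py.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

S = TypeVar("S", bound=Enum)
E = TypeVar("E", bound=Enum)
C = TypeVar("C")


class StateMachine(Generic[S, E, C]):
    """Illustrative engine with first-class self-transitions."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[S, E], Tuple[S, Callable[[C], None]]] = {}

    def transition(self, from_state: S, event: E, to_state: Optional[S] = None):
        """Register an action; to_state=None means "stay in from_state"."""
        target = from_state if to_state is None else to_state

        def register(action: Callable[[C], None]) -> Callable[[C], None]:
            self._table[(from_state, event)] = (target, action)
            return action
        return register

    def transition_to_self(self, event: E, *states: S):
        """Register one action as a self-transition for several states."""
        def register(action: Callable[[C], None]) -> Callable[[C], None]:
            for state in states:
                self.transition(state, event)(action)
            return action
        return register

    def handle(self, state: S, event: E, ctx: C) -> S:
        """Run the side effect and return the next state; reject unknown pairs."""
        if (state, event) not in self._table:
            raise ValueError(f"invalid transition: {state.name} on {event.name}")
        to_state, action = self._table[(state, event)]
        action(ctx)
        return to_state


class ScrapeState(Enum):
    NORMAL = auto()
    POSSIBLE_BAN = auto()
    CONFIRMED_BAN = auto()
    DONE = auto()


class ScrapeEvent(Enum):
    FETCH_MORE = auto()


@dataclass
class ScraperCtx:
    pages_fetched: int = 0  # hypothetical stand-in for the real cursor


scraper_sm: StateMachine[ScrapeState, ScrapeEvent, ScraperCtx] = StateMachine()


# A single decorator replaces the three per-state registrations.
@scraper_sm.transition_to_self(
    ScrapeEvent.FETCH_MORE,
    ScrapeState.NORMAL,
    ScrapeState.POSSIBLE_BAN,
    ScrapeState.CONFIRMED_BAN,
)
def fetch_more(ctx: ScraperCtx) -> None:
    ctx.pages_fetched += 1
```

Resolving None to from_state at registration time keeps handle unchanged, so the lookup table still stores a concrete target for every pair and invalid transitions are rejected exactly as before.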