feat: state machine for scrape task producer state mgmt #115

Open
extreme4all wants to merge 4 commits into develop from f/scrape-task-state-machine

extreme4all commented Apr 10, 2026

Replaces imperative FetchParams mutation with a generic state machine. The code separates cleanly into sm.py (generic engine), structs.py (enums and context), states.py (transition declarations), and core.py (orchestration). About 120 lines are cut from core.py.

Summary

Refactor scrape task producer state management from mutable FetchParams to an explicit state machine pattern.

Changes

  • Add generic state machine (sm.py): StateMachine[S, E, C] with enum states/events, context object, decorator-based transition registration.
  • Extract domain types (structs.py): ScrapeState enum (NORMAL → POSSIBLE_BAN → CONFIRMED_BAN → DONE), ScrapeEvent enum (FETCH_MORE, REDUCE_DAYS, NEXT_STEP, NEW_DAY), ScraperCtx dataclass replacing FetchParams.
  • Declare transitions (states.py): All state transitions and side effects via @scraper_sm.transition decorators.
  • Simplify core (core.py): Replace determine_fetch_params (mutation) with determine_event (pure) + scraper_sm.handle (transition). Remove run_async/run wrappers, inline main() setup.
  • Rewrite tests: Test state transitions and event determination independently.
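
To make the decorator-based registration concrete, here is a minimal sketch of what a generic engine along these lines could look like. It is not the code in sm.py: the `transition` and `handle` signatures, the `days` field, and the `reset_window` handler are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, Generic, Tuple, TypeVar

S = TypeVar("S", bound=Enum)  # state enum
E = TypeVar("E", bound=Enum)  # event enum
C = TypeVar("C")              # context object


class StateMachine(Generic[S, E, C]):
    """Hypothetical engine: a table from (state, event) to (next state, action)."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[S, E], Tuple[S, Callable[[C], None]]] = {}

    def transition(self, from_state: S, event: E, to_state: S):
        """Register the decorated function as the side effect of one transition."""
        def register(action: Callable[[C], None]) -> Callable[[C], None]:
            self._table[(from_state, event)] = (to_state, action)
            return action
        return register

    def handle(self, state: S, event: E, ctx: C) -> S:
        """Run the side effect for (state, event) and return the next state."""
        to_state, action = self._table[(state, event)]
        action(ctx)
        return to_state


class ScrapeState(Enum):
    NORMAL = auto()
    POSSIBLE_BAN = auto()


class ScrapeEvent(Enum):
    NEXT_STEP = auto()


@dataclass
class ScraperCtx:
    days: int = 7  # hypothetical field: width of the date window in days


scraper_sm: StateMachine[ScrapeState, ScrapeEvent, ScraperCtx] = StateMachine()


@scraper_sm.transition(ScrapeState.NORMAL, ScrapeEvent.NEXT_STEP, ScrapeState.POSSIBLE_BAN)
def reset_window(ctx: ScraperCtx) -> None:
    ctx.days = 7  # assumed side effect: enter the next state with a full window


ctx = ScraperCtx(days=1)
state = scraper_sm.handle(ScrapeState.NORMAL, ScrapeEvent.NEXT_STEP, ctx)
```

Keeping the table keyed on `(state, event)` means an unregistered pair raises `KeyError` instead of silently mutating the context, which is one reason to prefer this shape over in-place FetchParams mutation.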

State flow

NORMAL → POSSIBLE_BAN → CONFIRMED_BAN → DONE
  ↑                                    |
  └──────────── NEW_DAY ───────────────┘

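Within each state, REDUCE_DAYS narrows the date window until it is exhausted; NEXT_STEP then advances to the next state. FETCH_MORE pages through results via the cursor without changing state. NEW_DAY returns DONE to NORMAL.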
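
The flow can be sketched as a pure event decision followed by a transition step. This is an illustration under stated assumptions, not the PR's code: `MAX_DAYS`, the `days` field, the `has_more_pages` flag, the "window is exhausted at one day" rule, and the `apply_event` helper are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ScrapeState(Enum):
    NORMAL = auto()
    POSSIBLE_BAN = auto()
    CONFIRMED_BAN = auto()
    DONE = auto()


class ScrapeEvent(Enum):
    FETCH_MORE = auto()
    REDUCE_DAYS = auto()
    NEXT_STEP = auto()
    NEW_DAY = auto()


MAX_DAYS = 2  # hypothetical window size, kept small for the walk-through


@dataclass
class ScraperCtx:
    days: int = MAX_DAYS  # hypothetical field: current date window in days


# Linear progression taken from the state flow diagram.
NEXT_STATE = {
    ScrapeState.NORMAL: ScrapeState.POSSIBLE_BAN,
    ScrapeState.POSSIBLE_BAN: ScrapeState.CONFIRMED_BAN,
    ScrapeState.CONFIRMED_BAN: ScrapeState.DONE,
}


def determine_event(state: ScrapeState, ctx: ScraperCtx, has_more_pages: bool) -> ScrapeEvent:
    """Pure decision: reads its inputs and mutates nothing."""
    if state is ScrapeState.DONE:
        return ScrapeEvent.NEW_DAY
    if has_more_pages:
        return ScrapeEvent.FETCH_MORE
    if ctx.days > 1:  # assumption: the window is exhausted at one day
        return ScrapeEvent.REDUCE_DAYS
    return ScrapeEvent.NEXT_STEP


def apply_event(state: ScrapeState, event: ScrapeEvent, ctx: ScraperCtx) -> ScrapeState:
    """Apply the side effect for one event and return the next state."""
    if event is ScrapeEvent.FETCH_MORE:
        return state  # cursor handling omitted in this sketch
    if event is ScrapeEvent.REDUCE_DAYS:
        ctx.days -= 1
        return state
    ctx.days = MAX_DAYS  # NEXT_STEP and NEW_DAY both start with a full window
    if event is ScrapeEvent.NEW_DAY:
        return ScrapeState.NORMAL
    return NEXT_STATE[state]


ctx = ScraperCtx()
state = ScrapeState.NORMAL
events = []
for _ in range(7):
    event = determine_event(state, ctx, has_more_pages=False)
    state = apply_event(state, event, ctx)
    events.append(event.name)
```

With no further pages available, the seven steps alternate REDUCE_DAYS and NEXT_STEP through the three active states and finish with NEW_DAY, which puts the machine back in NORMAL.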

Notes

fetch_more uses three decorators (one per state → self) as a workaround for the missing self-transition concept in the generic SM. Consider adding transition_to_self, or allowing to_state=None to mean "stay."
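
One way both suggestions could look, as a sketch only: `to_state=None` is stored in the table and resolved to the current state in `handle`, and `transition_to_self` registers one handler for several states at once. The engine below is a stand-in for sm.py, and the `page` field is a hypothetical placeholder for the cursor.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

S = TypeVar("S", bound=Enum)
E = TypeVar("E", bound=Enum)
C = TypeVar("C")


class StateMachine(Generic[S, E, C]):
    """Hypothetical engine where a target state of None means 'stay'."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[S, E], Tuple[Optional[S], Callable[[C], None]]] = {}

    def transition(self, from_state: S, event: E, to_state: Optional[S] = None):
        def register(action: Callable[[C], None]) -> Callable[[C], None]:
            self._table[(from_state, event)] = (to_state, action)
            return action
        return register

    def transition_to_self(self, event: E, *states: S):
        """Register one handler as a self-transition for every listed state."""
        def register(action: Callable[[C], None]) -> Callable[[C], None]:
            for state in states:
                self._table[(state, event)] = (None, action)
            return action
        return register

    def handle(self, state: S, event: E, ctx: C) -> S:
        to_state, action = self._table[(state, event)]
        action(ctx)
        return state if to_state is None else to_state


class ScrapeState(Enum):
    NORMAL = auto()
    POSSIBLE_BAN = auto()
    CONFIRMED_BAN = auto()


class ScrapeEvent(Enum):
    FETCH_MORE = auto()


@dataclass
class ScraperCtx:
    page: int = 0  # hypothetical stand-in for the pagination cursor


scraper_sm: StateMachine[ScrapeState, ScrapeEvent, ScraperCtx] = StateMachine()


@scraper_sm.transition_to_self(
    ScrapeEvent.FETCH_MORE,
    ScrapeState.NORMAL,
    ScrapeState.POSSIBLE_BAN,
    ScrapeState.CONFIRMED_BAN,
)
def fetch_more(ctx: ScraperCtx) -> None:
    ctx.page += 1  # assumed side effect: advance to the next page


ctx = ScraperCtx()
results = [scraper_sm.handle(s, ScrapeEvent.FETCH_MORE, ctx) for s in ScrapeState]
```

A single decorator replaces the three stacked ones, and the "stay" intent is visible in the registration call rather than implied by repeating each state as its own target.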

@extreme4all extreme4all changed the title feat: implement state machine for scrape task state management and re… feat: state machine for scrape task producer state mgmt Apr 10, 2026