Autonomous Curator Agent (ACA)
The Autonomous Curator Agent orchestrates an entire curriculum ingestion sprint — discover → fetch → curate → ingest — in a single command. Instead of processing documents one by one, the ACA manages a persistent Plan that tracks every source URL, handles retries, enforces quality thresholds, and recovers automatically from crashes.
How It Works
    aca plan create --country US --standard AP --subject spanish
            │
            ▼
    AcaPlanController → AutonomousCuratorOrchestrator (@Async)
            │
            ├─ ① BatchDiscoverService
            │      Queries KnownSourcesRegistry + Google CSE
            │      → discovers 10-20 canonical URLs for the standard
            │
            ├─ ② Persist Items
            │      One CuratorPlanItem per URL (status=PENDING)
            │      existsByPlanIdAndSourceUrl → idempotent on resume
            │
            └─ ③ Process Loop (one item at a time)
                   CuratorWorkflow (same LangGraph4j pipeline as headless mode)
                     ├─ fetch_source    → download + SHA-256
                     ├─ gather_context  → HierarchyContextResolver
                     ├─ propose_content → Gem C0 / C1 / C2 (router decides)
                     └─ register_specs  → IngestionService
                            │
                            ▼
                       Quality Gate
                         confidence ≥ 0.75 → DONE (auto-ingested)
                         confidence < 0.75 → NEEDS_REVIEW (HITL queue)
                         error             → FAILED (message logged)

Recovery
A scheduled job (checkAndRecoverStalledPlans) runs every 60 seconds using ShedLock (cluster-safe). If a plan’s last_heartbeat has not been updated for more than 30 minutes, the plan is set to PAUSED — safe to restart with aca plan start.
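The stall rule described above can be sketched as a small pure function. This is an illustrative Python sketch, not the service's actual code: the name is_stalled and the restriction to RUNNING plans are assumptions, and treating a missing heartbeat as a stall follows the "null → stall" comment in the schema listing.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Mirrors the documented 30-minute stall threshold.
STALL_THRESHOLD = timedelta(minutes=30)


def is_stalled(status: str, last_heartbeat: Optional[datetime], now: datetime,
               threshold: timedelta = STALL_THRESHOLD) -> bool:
    """Return True when the recovery job should move a plan to PAUSED."""
    if status != "RUNNING":        # assumption: only RUNNING plans are checked
        return False
    if last_heartbeat is None:     # schema comment: null heartbeat counts as a stall
        return True
    # "More than 30 minutes" is read as a strict comparison.
    return now - last_heartbeat > threshold


now = datetime(2026, 9, 1, 12, 0, tzinfo=timezone.utc)
fresh = is_stalled("RUNNING", now - timedelta(seconds=43), now)
stale = is_stalled("RUNNING", now - timedelta(minutes=31), now)
print(fresh, stale)
```

With the documented 60-second scheduler interval, a stalled plan would be detected at most about a minute after it crosses the threshold.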
Prerequisites
- Backend running with Flyway V9 applied (curator_plans + curator_plan_items tables)
- ce-svc-cli compiled: mvn clean package -pl ce-svc-cli -am -DskipTests
- Optional: GOOGLE_CSE_API_KEY + GOOGLE_CSE_ID for web-based source discovery
Quickstart — AP Spanish US Sprint
- Create the plan

      ./ce-cli.sh --env local \
        "aca plan create \
          --name 'AP Spanish US Sprint — Sept 2026' \
          --country US \
          --standard AP \
          --level HS \
          --subject spanish \
          --gates after_c0,after_c1"

  The CLI prints the planId. Save it for the next steps.

- Start the plan (async)

      ./ce-cli.sh --env local "aca plan start <planId>"

  The ACA begins discovering sources and processing them in the background. The command returns immediately.

- Monitor progress

      ./ce-cli.sh --env local "aca plan status <planId>"

  Example output:

      Plan: AP Spanish US Sprint — Sept 2026
      Status: RUNNING | Items: 12 | Done: 7 | Review: 2 | Failed: 1 | Pending: 2
      Stage: c1_recipes
      Last heartbeat: 43s ago

- Review and approve the C0 gate

  When all C0 (standards/rubrics) items are processed, the plan pauses at the after_c0 gate. Review the generated specs in ce-specs/catalog/ before proceeding to C1/C2.

      # Check new specs committed to ce-specs
      git -C ../../ce-specs log --oneline -5
      # Approve the gate → pipeline resumes with C1 Recipes and C2 Exercises
      ./ce-cli.sh --env local "aca plan approve <planId> --gate after_c0"

- Handle items needing human review

  Items with confidence_score < 0.75 are queued as NEEDS_REVIEW. Review and approve or reject them individually.

      # List items needing review
      ./ce-cli.sh --env local "aca plan list <planId> --status NEEDS_REVIEW"
      # Approve an individual item (triggers manual ingestion)
      ./ce-cli.sh --env local "aca plan approve <planId> --item <itemId>"

- Check the final result

      ./ce-cli.sh --env local "aca plan status <planId>"
      # → COMPLETED — 9 DONE · 2 NEEDS_REVIEW · 1 FAILED
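When scripting around the CLI, the status line shown under "Monitor progress" can be split into fields. The sketch below is a minimal Python example that assumes the line keeps the "Key: value | Key: value" shape of the example output; that format is taken from the sample above and is not a documented contract, and parse_status_line is a hypothetical helper.

```python
# Status line copied from the example output in the Quickstart.
STATUS_LINE = "Status: RUNNING | Items: 12 | Done: 7 | Review: 2 | Failed: 1 | Pending: 2"


def parse_status_line(line: str) -> dict:
    """Split 'Key: value' pairs separated by '|'; numeric values become ints."""
    fields = {}
    for part in line.split("|"):
        key, _, value = part.partition(":")
        value = value.strip()
        fields[key.strip().lower()] = int(value) if value.isdigit() else value
    return fields


summary = parse_status_line(STATUS_LINE)
# Sanity check: the per-status counts should add up to the item total.
accounted = sum(summary[k] for k in ("done", "review", "failed", "pending"))
print(summary["status"], accounted == summary["items"])
```

In the example, the four per-status counts (7 + 2 + 1 + 2) account for all 12 items.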
Plan Lifecycle
    DRAFT ──► RUNNING ──► WAITING_REVIEW ──► RUNNING ──► COMPLETED
                 │               │
                 │               └──► RUNNING (after approve)
                 │
                 ▼
              PAUSED (stall / manual pause)
                 │
                 └──► RUNNING (aca plan start resumes)
                 │
                 ▼
              FAILED (cancel or critical error)

| State | Description |
|---|---|
| DRAFT | Plan created, not yet started |
| RUNNING | Actively processing items |
| WAITING_REVIEW | Paused at an approval gate, waiting for the operator |
| PAUSED | Stall detected by recovery scheduler, or manually paused |
| COMPLETED | All items processed (DONE + NEEDS_REVIEW + FAILED) |
| FAILED | Cancelled or unrecoverable critical error |
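The lifecycle above can be read as a transition table. The following Python sketch is one possible encoding, not the backend's implementation: the ALLOWED mapping and can_transition are hypothetical names, and allowing a cancel (→ FAILED) from every non-terminal state is an assumption, since the diagram does not say which states accept it.

```python
# Allowed plan transitions, read off the lifecycle diagram.
# Assumption: cancel (-> FAILED) is accepted from any non-terminal state.
ALLOWED = {
    "DRAFT": {"RUNNING", "FAILED"},
    "RUNNING": {"WAITING_REVIEW", "PAUSED", "COMPLETED", "FAILED"},
    "WAITING_REVIEW": {"RUNNING", "FAILED"},   # RUNNING after gate approval
    "PAUSED": {"RUNNING", "FAILED"},           # RUNNING after aca plan start
    "COMPLETED": set(),                        # terminal
    "FAILED": set(),                           # terminal (cancel is irreversible)
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the lifecycle permits moving from current to target."""
    return target in ALLOWED.get(current, set())


print(can_transition("PAUSED", "RUNNING"))     # resume after a stall
print(can_transition("COMPLETED", "RUNNING"))  # terminal states do not restart
```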
Item Lifecycle
Each discovered URL becomes a CuratorPlanItem:
| State | Description |
|---|---|
| PENDING | Queued, not yet processed |
| PROCESSING | CuratorWorkflow is actively running for this item |
| DONE | Auto-ingested: confidence_score ≥ threshold |
| NEEDS_REVIEW | confidence_score < threshold, placed in the HITL queue |
| FAILED | Unrecoverable error (encrypted PDF, timeout, parse error…) |
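The per-item outcome rule in the table can be sketched as follows. This is a minimal Python illustration under the documented default threshold of 0.75; classify_item is a hypothetical helper, and giving an error precedence over any confidence score is an assumption.

```python
from typing import Optional


def classify_item(confidence: float, error: Optional[str] = None,
                  threshold: float = 0.75) -> str:
    """Map a processed item to its final status."""
    if error is not None:          # assumption: an error wins over any score
        return "FAILED"
    # The documented rule is inclusive: confidence >= threshold is auto-ingested.
    return "DONE" if confidence >= threshold else "NEEDS_REVIEW"


print(classify_item(0.75))                     # boundary value
print(classify_item(0.61))                     # below the default threshold
print(classify_item(0.90, error="timeout"))    # error path
```

Raising the threshold (for example to 0.80) sends more borderline items to human review rather than auto-ingesting them.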
Full Command Reference
aca plan create
    aca plan create \
      --name "<name>" \
      --country <CC> \
      --standard <STD> \
      [--level <level>] \
      [--subject <subject>] \
      [--gates <gate1,gate2>] \
      [--threshold <0.0-1.0>]

| Flag | Default | Description |
|---|---|---|
| --name | required | Human-readable plan name |
| --country | required | Country code: US, ES, MX… |
| --standard | required | Curricular standard: AP, LOMLOE, IB… |
| --level | none | Educational level: HS, 2 BACH, SL… |
| --subject | none | Subject slug: spanish, lcl, biology… |
| --gates | none | Comma-separated approval gate IDs: after_c0,after_c1 |
| --threshold | 0.75 | Minimum confidence for auto-ingestion |
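For scripted sprints, the command string can be assembled and validated before it is handed to the CLI. The Python sketch below uses only the flags from the table above; build_create_command is a hypothetical helper, and how ce-cli.sh tokenizes the quoted command is not specified here, so the shell-style quoting is an assumption.

```python
import shlex
from typing import Optional, Sequence


def build_create_command(name: str, country: str, standard: str,
                         level: Optional[str] = None,
                         subject: Optional[str] = None,
                         gates: Sequence[str] = (),
                         threshold: Optional[float] = None) -> str:
    """Assemble an 'aca plan create' command from the documented flags."""
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    args = ["aca", "plan", "create",
            "--name", name, "--country", country, "--standard", standard]
    if level:
        args += ["--level", level]
    if subject:
        args += ["--subject", subject]
    if gates:
        args += ["--gates", ",".join(gates)]
    if threshold is not None:
        args += ["--threshold", str(threshold)]
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    return shlex.join(args)


command = build_create_command("AP Sprint", "US", "AP",
                               gates=["after_c0", "after_c1"], threshold=0.8)
print(command)
```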
aca plan start
    aca plan start <planId>
    # POST /api/v1/aca/plans/{id}/start

Starts a DRAFT plan or resumes a PAUSED plan. All PENDING items are re-queued. Items already DONE or FAILED are skipped (idempotent).
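The idempotent resume behaviour can be illustrated with two small helpers. This is a Python sketch under stated assumptions: items are represented as plain dicts, both function names are hypothetical, and urls_to_persist only mimics the effect of the existsByPlanIdAndSourceUrl check mentioned in the pipeline diagram.

```python
from typing import Iterable, List


def items_to_requeue(items: Iterable[dict]) -> List[str]:
    """On start or resume, only PENDING items are queued again."""
    return [item["id"] for item in items if item["status"] == "PENDING"]


def urls_to_persist(existing_urls: Iterable[str],
                    discovered_urls: Iterable[str]) -> List[str]:
    """Skip URLs the plan already tracks, so re-running discovery adds no duplicates."""
    seen = set(existing_urls)
    fresh = []
    for url in discovered_urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


items = [
    {"id": "a1", "status": "DONE"},
    {"id": "b2", "status": "PENDING"},
    {"id": "c3", "status": "FAILED"},
    {"id": "d4", "status": "PENDING"},
]
queue = items_to_requeue(items)
new_urls = urls_to_persist(["https://example.org/a.pdf"],
                           ["https://example.org/a.pdf", "https://example.org/b.pdf"])
print(queue, new_urls)
```

Calling either helper twice with the same inputs yields the same result, which is the property that makes a restart safe.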
aca plan status
    aca plan status <planId>
    # GET /api/v1/aca/plans/{id}

aca plan list

    # List all plans
    aca plan list

    # List items of a specific plan
    aca plan list <planId>

    # Filter by item status
    aca plan list <planId> --status NEEDS_REVIEW
    aca plan list <planId> --status FAILED

aca plan approve
    # Approve an approval gate → unblocks WAITING_REVIEW plan
    aca plan approve <planId> --gate <gateId>

    # Approve an individual NEEDS_REVIEW item → triggers manual ingestion
    aca plan approve <planId> --item <itemId>

aca plan pause
    aca plan pause <planId>
    # POST /api/v1/aca/plans/{id}/pause

Graceful stop: the item currently in PROCESSING finishes, remaining PENDING items stay queued. Resume with aca plan start.
aca plan cancel
    aca plan cancel <planId>
    # POST /api/v1/aca/plans/{id}/cancel → status = FAILED (irreversible)

Quality Gates
Quality gates are approval checkpoints, separate from the per-item confidence check. They pause the plan at a defined stage so an operator can review the generated specs before the pipeline continues.
    # Create a plan with two gates (--country is required by aca plan create)
    aca plan create --name "..." --country US --standard AP \
      --gates after_c0,after_c1

| Gate ID | When it triggers | Typical action |
|---|---|---|
| after_c0 | All C0 (BlockRubric) items processed | Review rubrics; check framework coverage |
| after_c1 | All C1 (Recipe) items processed | Validate pedagogical recipes before generating exercises |
To add custom gates, include the gate IDs in --gates. The plan pauses (WAITING_REVIEW) when all items up to that stage are done.
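The pause condition can be sketched as a check over the plan's items. This Python example rests on several assumptions that the text does not confirm: stages run in the order c0, c1, c2, a gate ID has the form after_<stage>, each item carries a stage label, and "processed" means DONE, NEEDS_REVIEW, or FAILED (as in the COMPLETED state description). The names gate_reached and STAGE_ORDER are hypothetical.

```python
from typing import Iterable

STAGE_ORDER = ["c0", "c1", "c2"]               # assumed stage sequence
PROCESSED = {"DONE", "NEEDS_REVIEW", "FAILED"}  # statuses that count as processed


def gate_reached(gate_id: str, items: Iterable[dict]) -> bool:
    """Return True when every item up to the gate's stage has been processed."""
    if not gate_id.startswith("after_"):
        raise ValueError(f"unsupported gate id: {gate_id}")
    cutoff = STAGE_ORDER.index(gate_id[len("after_"):])
    relevant = [i for i in items if STAGE_ORDER.index(i["stage"]) <= cutoff]
    # An empty stage never triggers a pause in this sketch.
    return bool(relevant) and all(i["status"] in PROCESSED for i in relevant)


items = [
    {"stage": "c0", "status": "DONE"},
    {"stage": "c0", "status": "NEEDS_REVIEW"},
    {"stage": "c1", "status": "PENDING"},
]
print(gate_reached("after_c0", items))  # all C0 items processed
print(gate_reached("after_c1", items))  # a C1 item is still pending
```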
Handling NEEDS_REVIEW Items
Items below the confidence threshold are never auto-ingested. They appear in the HITL queue:
    # See what needs review
    aca plan list <planId> --status NEEDS_REVIEW

    # Example output:
    # [b72e9c01] AP Lit Scoring Guidelines (intro) — conf: 0.61 — NEEDS_REVIEW
    #   source: https://apcentral.collegeboard.org/media/pdf/ap-sl-scoring-intro.pdf
    #   error: "Partial extraction — only cover pages detected"

    # Approve after manual verification
    aca plan approve <planId> --item b72e9c01

    # Or reject (leaves item as FAILED)
    # (currently via direct DB update or future CLI flag)

Recovery from Crashes
If the JVM crashes mid-run:
- The scheduler detects that last_heartbeat has not been updated for more than 30 minutes
- It sets the plan status to PAUSED
- The operator sees the plan in PAUSED state via aca plan status
- The operator resumes cleanly with aca plan start <planId>; only PENDING items are retried
Configuration
    # application.yml (ce-svc-ai-services)
    aca:
      confidence-threshold: 0.75      # Below this → NEEDS_REVIEW
      stall-threshold-minutes: 30     # No heartbeat for this long → PAUSED
      recovery:
        interval-ms: 60000            # Recovery scheduler frequency

Environment overrides:

    export ACA_CONFIDENCE_THRESHOLD=0.80
    export ACA_STALL_THRESHOLD_MINUTES=15

Database Schema (Flyway V9)
    -- curator_plans
    id               UUID PRIMARY KEY DEFAULT gen_random_uuid()
    name             VARCHAR(255) NOT NULL
    country          VARCHAR(10)
    standard         VARCHAR(100)
    status           VARCHAR(50) NOT NULL DEFAULT 'DRAFT'
    total_items      INTEGER NOT NULL DEFAULT 0
    processed_items  INTEGER NOT NULL DEFAULT 0
    failed_items     INTEGER NOT NULL DEFAULT 0
    current_stage    VARCHAR(100)
    config_json      JSONB      -- gates, threshold, model config
    state_json       JSONB      -- checkpoint for resume
    last_heartbeat   TIMESTAMP  -- updated every item; null → stall
    created_at       TIMESTAMP NOT NULL DEFAULT now()
    updated_at       TIMESTAMP NOT NULL DEFAULT now()

    -- curator_plan_items
    id                UUID PRIMARY KEY DEFAULT gen_random_uuid()
    plan_id           UUID REFERENCES curator_plans(id) ON DELETE CASCADE
    source_url        TEXT
    spec_type         VARCHAR(50)
    status            VARCHAR(50) NOT NULL DEFAULT 'PENDING'
    confidence_score  DOUBLE PRECISION
    attempts          INTEGER NOT NULL DEFAULT 0
    output_path       TEXT
    error_message     TEXT
    created_at        TIMESTAMP NOT NULL DEFAULT now()

Curriculum-Aware Planning
The CurriculumArchitectService connects curriculum requirements (defined as YAML spec-as-code files in ce-specs/catalog/requirements/) to the ACA pipeline. Instead of manually choosing what to curate, you let the gap analysis tell you what is missing — then launch a targeted ACA sprint for exactly those gaps.
How It Fits Into the ACA
    ce-specs/catalog/requirements/
      └── ap_spanish_us.yaml    ← spec-as-code: themes, required spec types, quantities
            │
            ▼
    CurriculumGapAnalyzer
      Queries spec_definitions WHERE reference_code LIKE '%{theme}%'
      Compares existing counts vs. required counts per theme + spec type
            │
            ▼
    CurriculumGapReport { coveragePercent, totalRequired, totalExisting, themes[] }
            │  (one gap per theme)
            ▼
    CurriculumArchitectService.createPlanFromGap()
      → Creates CuratorPlan (status=RUNNING)
      → Creates CuratorPlanItem per source URL (status=PENDING)
      → Enqueues via AcaPlanPublisher
            │
            ▼
    AutonomousCuratorOrchestrator (existing ACA engine)
      Processes each item: fetch → curate → validate → ingest

Curriculum Requirements YAML
Requirements for each standard are stored in ce-specs/catalog/requirements/:
    standard: AP_SPANISH
    country: US
    themes:
      - id: simulated_conversation
        name: Simulated Conversation
        specTypes:
          C2_EXERCISES: 10
          INTERACTIVE_LESSON: 5
      - id: email_reply
        name: Email Reply
        specTypes:
          C2_EXERCISES: 8
          INTERACTIVE_LESSON: 4
      - id: cultural_comparison
        name: Cultural Comparison
        specTypes:
          C2_EXERCISES: 6

The property curriculum.requirements.path in application.properties points to this directory.
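The comparison of required against existing counts can be sketched in a few lines. This Python example is illustrative only: the requirements mirror the YAML above, the existing counts are hypothetical, analyze_gaps is not the service's API, and capping surplus specs at the required number is an assumption. Because the counts are made up, the result is not meant to match any CLI output.

```python
# Required counts per theme and spec type, mirroring the YAML above.
REQUIREMENTS = {
    "simulated_conversation": {"C2_EXERCISES": 10, "INTERACTIVE_LESSON": 5},
    "email_reply": {"C2_EXERCISES": 8, "INTERACTIVE_LESSON": 4},
    "cultural_comparison": {"C2_EXERCISES": 6},
}

# Hypothetical counts of specs already in the catalog.
EXISTING = {
    ("simulated_conversation", "C2_EXERCISES"): 2,
    ("email_reply", "C2_EXERCISES"): 4,
    ("cultural_comparison", "C2_EXERCISES"): 6,
}


def analyze_gaps(requirements: dict, existing: dict) -> dict:
    """Compare required vs. existing counts per theme and spec type."""
    rows, total_required, total_existing = [], 0, 0
    for theme, spec_types in requirements.items():
        for spec_type, required in spec_types.items():
            # Assumption: surplus specs do not inflate coverage, so cap at required.
            have = min(existing.get((theme, spec_type), 0), required)
            rows.append((theme, spec_type, required, have, required - have))
            total_required += required
            total_existing += have
    coverage = round(100.0 * total_existing / total_required, 1) if total_required else 100.0
    return {"coveragePercent": coverage, "totalRequired": total_required,
            "totalExisting": total_existing, "rows": rows}


report = analyze_gaps(REQUIREMENTS, EXISTING)
worst = max(report["rows"], key=lambda row: row[4])  # largest gap first
print(report["coveragePercent"], worst)
```

Sorting rows by the gap column is one simple way to pick the theme and spec type for the next targeted sprint.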
Gap-Driven Sprint Workflow
- Analyze gaps for a standard

      ./ce-cli.sh --env local "curriculum gaps --standard AP_SPANISH --country US"

  Example output:

      📊 Curriculum Gap Analysis — AP_SPANISH / US
      Coverage : 42.0% (21 / 50 specs)
      Theme                      SpecType             REQ  HAVE  GAP
      ──────────────────────────────────────────────────────────────
      ❌ simulated_conversation   C2_EXERCISES          10     2    8
      ❌ simulated_conversation   INTERACTIVE_LESSON     5     0    5
      ⚠️ email_reply              C2_EXERCISES           8     4    4
      ✅ cultural_comparison      C2_EXERCISES           6     6    0

- Create an ACA plan for the highest-priority gap

      ./ce-cli.sh --env local \
        "curriculum plan create \
          --standard AP_SPANISH \
          --country US \
          --theme simulated_conversation \
          --spec-type C2_EXERCISES \
          --sources 'https://apcentral.collegeboard.org/media/pdf/ap23-apc-spanish-language.pdf,https://apcentral.collegeboard.org/media/pdf/ap22-apc-spanish-language.pdf'"

  Output:

      ✅ ACA Plan created and launched
      Plan ID : e4a1c230-...
      Items queued : 2
      Status : RUNNING
      Monitor with:
        aca plan status --id e4a1c230-...
        curriculum progress --standard AP_SPANISH --country US

- Monitor ACA progress

      ./ce-cli.sh --env local "aca plan status e4a1c230"
      # → Status: RUNNING | Done: 1 | Review: 0 | Pending: 1

- Check curriculum-level progress

      ./ce-cli.sh --env local "curriculum progress --standard AP_SPANISH --country US"

  Output:

      🚀 Curriculum Progress — AP_SPANISH / US
      Overall : 58.0% complete
      Items : 8 DONE | 1 PENDING | 1 REVIEW | 0 FAILED
      Plan                                   STATUS     STAGE         DONE  PEND  REV  FAIL
      ────────────────────────────────────────────────────────────────────────────
      CURRICULUM-AP_SPANISH-US-simulated...  RUNNING    c2_exercises     4     1    0     0
      CURRICULUM-AP_SPANISH-US-email_reply   COMPLETED  c2_exercises     4     0    1     0
REST API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/curriculum/gaps?standard=&country= | Gap analysis report |
| GET | /api/v1/curriculum/progress?standard=&country= | ACA pipeline progress per standard |
| POST | /api/v1/curriculum/plans | Create and launch an ACA plan from a gap |
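The two GET endpoints take the standard and country as query parameters. The Python sketch below only builds the request URLs and sends nothing; the base URL (host and port) and the helper name curriculum_url are assumptions, and the request body for the POST endpoint is not documented here, so it is left out.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # hypothetical host and port


def curriculum_url(resource: str, standard: str, country: str,
                   base_url: str = BASE_URL) -> str:
    """Build the URL for the gaps or progress endpoint."""
    if resource not in ("gaps", "progress"):
        raise ValueError("resource must be 'gaps' or 'progress'")
    # urlencode escapes values that are not URL-safe.
    query = urlencode({"standard": standard, "country": country})
    return f"{base_url}/api/v1/curriculum/{resource}?{query}"


gaps_url = curriculum_url("gaps", "AP_SPANISH", "US")
progress_url = curriculum_url("progress", "AP_SPANISH", "US")
print(gaps_url)
print(progress_url)
```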
Configuration
    # application.properties (ce-svc-ai-services or ib-svc-parent-app)
    curriculum.requirements.path=/path/to/ce-specs/catalog/requirements

For Cloud Run / Kubernetes, mount the ce-specs repository as a volume and set:

    curriculum.requirements.path=/app/config/ce-specs/catalog/requirements

Known Limitations
| Issue | Workaround |
|---|---|
| Encrypted PDFs (AES-256) | Pre-process with ghostscript -dNOPASSWORD before plan runs |
| Scanned PDFs (no text layer) | Pre-process with tesseract OCR |
| Discovery requires CSE key for web search | Set GOOGLE_CSE_API_KEY + GOOGLE_CSE_ID; known sources (College Board, Madrid EBAU) work without key |
| No UI for NEEDS_REVIEW approval | Use CLI aca plan approve --item (Flutter dashboard planned for v0.8) |
See Also
- Curator Agent Capabilities — All curator modes and pipeline architecture
- CLI Tutorial — End-to-end walkthrough
- Taxonomy Reference — Standard and bundle schemas