Curator Agent Capabilities
The Curator Agent is ColabEdu’s AI-powered content ingestion pipeline. It transforms curricular documents (PDFs, Canvas/Moodle LMS packages) and open educational resources (CK-12, Khan Academy, Procomún, Europeana) into structured pedagogical specs ready for the platform.
What the Curator Does
| Capability | Description |
|---|---|
| Source Discovery | Identifies canonical URLs of official curricular documents and OER/LMS repository APIs. |
| Document Download | Downloads PDFs and LMS packages (.imscc Canvas, .mbz Moodle) to the local specs repository. |
| AI Analysis & Generation | Extracts pedagogical criteria, generates slides/widgets or quality rubrics using Gemini. |
| Gem Routing | curator_router classifies each input and delegates to the specialized Gem (lesson, resource, c2…). |
| YAML Scaffolding | Writes complete OAS InteractiveLesson, ResourceLearning, or ExerciseSpec specs. |
| LMS Package Ingestion | Unzips .imscc/.mbz, uploads assets to GCS, and generates InteractiveLesson with slides. |
| OER Indexing | Indexes open educational resources (OER) with a 3-dimension pedagogical qualityScore. |
| Bulk Ingestion | Spec sync via sync_specs.sh or curator batch. |
| Seed & Bloom | Derives new exercises from ingested seeds (source docs, rubrics) via configurable Bloom strategies. Seeds may be private; derived specs are public. |
Unified Pipeline
    Document (PDF / GCS URI), LMS Package (.imscc / .mbz), or OER Resource (JSON/API)
            │
            ▼
    POST /api/v1/curator/curate
      { source, sourceType, outputType, courseId?, pathwayId?, sandboxMode, ownerEmail?, shared }
            │
            ▼
    CuratorWorkflow (LangGraph4j)
      ├─ [fetch_source]     → download + SHA-256 hash
      │       │
      │   sourceType == LMS_PACKAGE?
      │       ├─ YES ─► [unzip_lms] → LmsPackageProcessor.unzipPackage()
      │       │            └─► [upload_lms_assets] → GCS + assetUrlMap
      │       │                   └─► [gather_context]
      │       └─ NO ──► [gather_context]
      ├─ [gather_context]   → HierarchyContextResolver → CurricularContext
      ├─ [propose_content]  → Gem selected by curator_router
      │       ├─ ExerciseSpec      ← curator_c2 (exam PDFs)
      │       ├─ InteractiveLesson ← curator_lesson (LMS packages)
      │       └─ ResourceLearning  ← curator_resource (OER APIs)
      ├─ [wait_for_review]  → headless=true → auto-approve
      └─ [register_specs]   → IngestionService (visibility: PUBLIC | PRIVATE)

    SEED & BLOOM PIPELINE (Content Derivation)
    ──────────────────────────────────────────
    curator curate --source rubric.pdf --visibility private    ← ingest as seed
      → seed ingested with visibility=PRIVATE
    curator bloom derive --seed ib.spa.textual_analysis_rubric \
        --strategy SOURCE_DERIVATION|PROGRESSION|MISCONCEPTION_REMEDIATION
      → derived specs generated with visibility=PUBLIC + provenance.seedRef

Curricular Context Resolution
Curricular context is resolved by the same HierarchyContextResolver that the Exercise Creation Wizard uses. It tries each source in order, from most to least specific:
    Explicit override (--standard / --subject)
     └─ courseId → CourseClient (standard, level, subject from the course)
         └─ pathwayId → PathwayClient (standard, level from the pathway)
             └─ referenceCode in manifest (field parsing)
                 └─ generic prompt (no context)

Curator Gems
The gem.curator_router classifies each input and delegates to the specialized Gem:
| Gem | Output | Triggers when |
|---|---|---|
| gem.curator_router | n/a (routes) | Always: classifies input and delegates |
| gem.curator_c0 | BlockRubric | Normative curricular documents |
| gem.curator_c1 | Recipe | Complete didactic units |
| gem.curator_c2 | ExerciseSpec | Exam PDFs, FRQ |
| gem.curator_lesson | InteractiveLesson | LMS packages (.imscc Canvas, .mbz Moodle) |
| gem.curator_resource | ResourceLearning | OER APIs: CK-12, Khan Academy, Procomún, Europeana |
Gem YAML files are stored in ce-specs/catalog/personas/curators/.
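As a rough illustration of the routing table above, the sketch below maps an input to a Gem name. It is not the real curator_router, which is itself a Gem and whose classification logic is not documented here. The `route_input` function, the `kind` labels, and the file-extension heuristic are assumptions made for illustration; only the Gem names and output types come from the table.

```python
from pathlib import PurePosixPath
from typing import Optional

# Gem name -> output spec type, copied from the table above.
GEM_OUTPUTS = {
    "gem.curator_c0": "BlockRubric",
    "gem.curator_c1": "Recipe",
    "gem.curator_c2": "ExerciseSpec",
    "gem.curator_lesson": "InteractiveLesson",
    "gem.curator_resource": "ResourceLearning",
}

# Hypothetical content labels (not part of the platform) mapped to Gems.
KIND_TO_GEM = {
    "normative_document": "gem.curator_c0",
    "didactic_unit": "gem.curator_c1",
    "exam_pdf": "gem.curator_c2",
    "oer_api": "gem.curator_resource",
}

LMS_EXTENSIONS = {".imscc", ".mbz"}  # Canvas and Moodle packages


def route_input(source: str, kind: Optional[str] = None) -> str:
    """Return the Gem that would handle `source` (illustrative heuristic only)."""
    # LMS packages are recognised by file extension, case-insensitively.
    if PurePosixPath(source).suffix.lower() in LMS_EXTENSIONS:
        return "gem.curator_lesson"
    if kind in KIND_TO_GEM:
        return KIND_TO_GEM[kind]
    raise ValueError(f"cannot route {source!r} with kind {kind!r}")
```

For example, `route_input("canvas_module.imscc")` selects gem.curator_lesson, while an exam PDF needs an explicit `kind="exam_pdf"` hint in this simplified version.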
Generated Spec Types
The ColabEdu curation pipeline can generate the following OAS-compliant spec types. Each maps to a Gem and a pipeline mode:
Spec Types by Pipeline
| Spec Type | OAS Kind | Gem / Mode | Description |
|---|---|---|---|
| ExerciseSpec | BLOCK_CONTENT | gem.curator_c2 / Classic | Exercises, FRQ, text analysis with achievement bands |
| BlockRubric | BLOCK_RUBRIC | gem.curator_c0 / Classic | Normative rubrics from curricular documents |
| Recipe | RECIPE | gem.curator_c1 / Classic | Complete didactic units with full pedagogical scaffolding |
| PathwayTemplate | PATHWAY_TEMPLATE | gem.curator_c1 / ACA | Learning pathways for a curriculum standard or theme |
| InteractiveLesson | INTERACTIVE_LESSON | gem.curator_lesson / LMS | Interactive slides from LMS packages (Canvas, Moodle) |
| ResourceLearning | RESOURCE_LEARNING | gem.curator_resource / OER | Indexed OER resources with 3-dimensional quality score |
| CurriculumRequirement | spec-as-code YAML | CurriculumGapAnalyzer | Requirement definition files driving gap analysis |
ExerciseSpec (C2_EXERCISES)
The classic Curator spec type: exercises, free-response questions (FRQ), and text analysis with achievement bands. Stored as BLOCK_CONTENT with kind: ExerciseSpec.
BlockRubric (BLOCK_RUBRIC, C0)
Generated by gem.curator_c0 from normative curricular documents (official syllabi, standards frameworks). The C0 rubric defines the competency structure that C1 recipes and C2 exercises must map to.
    kind: BLOCK_RUBRIC
    referenceCode: us.ap.spanish_language.hl.rubric_c0
    metadata:
      standard: AP_SPANISH
      country: US
      competencies:
        - id: simulated_conversation
          weight: 0.30
        - id: email_reply
          weight: 0.25

Recipe / PathwayTemplate (RECIPE, C1)
Generated by gem.curator_c1 from complete didactic units. Contains full pedagogical scaffolding: objectives, activities, assessment criteria, and content sequencing. Used as a template for generating targeted C2 exercises.
    kind: RECIPE
    referenceCode: us.ap.spanish_language.hl.recipe_simulated_conversation
    metadata:
      theme: simulated_conversation
      specType: RECIPE
      provenance:
        seedRef: us.ap.spanish_language.hl.rubric_c0

InteractiveLesson
Generated from LMS packages (Canvas Commons, Moodle Hub). Contains slides, interactive widgets, and learning objectives mapped to curricular competencies. See InteractiveLesson Reference.
ResourceLearning
Indexes OER resources with a 3-dimension qualityScore: alignment, pedagogicalRichness, accessibility. See ResourceLearning Reference.
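The three dimensions can be modelled as a small value object. The sketch below is only an assumption about shape: the dimension names come from the spec, but the 0.0 to 1.0 range and the unweighted mean in `overall()` are hypothetical choices, since the actual aggregation is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityScore:
    # Dimension names come from the ResourceLearning spec;
    # the 0.0-1.0 range is an assumption made for this sketch.
    alignment: float
    pedagogicalRichness: float
    accessibility: float

    def __post_init__(self) -> None:
        # Reject values outside the assumed range.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    def overall(self) -> float:
        """Unweighted mean of the three dimensions (assumed aggregation)."""
        return (self.alignment + self.pedagogicalRichness + self.accessibility) / 3
```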
CurriculumRequirement (spec-as-code YAML)
Not an OAS spec — a requirement definition file stored in ce-specs/catalog/requirements/. Read by the CurriculumGapAnalyzer to determine what specs are needed per standard + country.
    standard: AP_SPANISH
    country: US
    themes:
      - id: simulated_conversation
        name: Simulated Conversation
        specTypes:
          C2_EXERCISES: 10
          INTERACTIVE_LESSON: 5
      - id: email_reply
        name: Email Reply
        specTypes:
          C2_EXERCISES: 8

These YAML files are the spec-as-code curriculum definition: they encode institutional curriculum requirements so the ACA can measure and close coverage gaps automatically.
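The gap computation that such a requirement file drives can be sketched as a comparison of required counts against ingested counts. This is a minimal illustration, not the CurriculumGapAnalyzer itself: the `coverage_gaps` function and the shape of the `existing` counter are assumptions, and the requirement is written as a Python dict that mirrors the YAML example.

```python
from collections import Counter

# Requirement definition mirroring the YAML example above.
requirement = {
    "standard": "AP_SPANISH",
    "country": "US",
    "themes": [
        {"id": "simulated_conversation",
         "specTypes": {"C2_EXERCISES": 10, "INTERACTIVE_LESSON": 5}},
        {"id": "email_reply", "specTypes": {"C2_EXERCISES": 8}},
    ],
}


def coverage_gaps(requirement: dict, existing: Counter) -> dict:
    """Return {(theme_id, spec_type): missing_count} for unmet requirements.

    `existing` counts already-ingested specs keyed by (theme_id, spec_type);
    a missing key counts as zero.
    """
    gaps = {}
    for theme in requirement["themes"]:
        for spec_type, required in theme["specTypes"].items():
            missing = required - existing[(theme["id"], spec_type)]
            if missing > 0:
                gaps[(theme["id"], spec_type)] = missing
    return gaps
```

With 7 simulated-conversation exercises and 8 email-reply exercises already ingested, the function reports 3 missing exercises and 5 missing interactive lessons for simulated_conversation, and nothing for email_reply.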
Supported Sources
The KnownSourcesRegistry registers 14 active sources:
| ID | Type | Standard | Scope |
|---|---|---|---|
| madrid, mec, ebau.cat | EBAU Exams | LOMLOE | 🇪🇸 |
| collegeboard, apcentral | AP Exams | AP | 🇺🇸 |
| web | Google CSE | n/a | 🌐 |
| mec-procomun, mec-recursos | OER | LOMLOE | 🇪🇸 |
| khan-academy, ck12 | OER | n/a | 🌐 |
| europeana-edu | OER | n/a | 🇪🇺 |
| canvas-commons | LMS | n/a | 🌐 |
| moodle-hub, moodle-net | LMS | n/a | 🌐 |
    # List all registered sources
    docs sources

    # Filter by scope
    docs sources --scope es
    docs sources --source canvas-commons

Operation Modes
Mode 1 — Headless / Automated (CI/CD)
    curator curate \
      --source exam.pdf \
      --courseId 142
    # → fetch → context → propose → register (auto-approve)
    # job.shared = true → visible to all CONTENT_CREATOR/TEACHER

Mode 2 — Sandbox + Manual Review
    curator curate \
      --source canvas_module.imscc \
      --source-type LMS_PACKAGE \
      --output-type INTERACTIVE_LESSON \
      --sandbox \
      --output ./draft_specs/
    curator validate --spec ./draft_specs/lesson.yaml --ingest

Mode 3 — LMS/OER Batch
    docs discover --source canvas-commons --subject biology --output canvas.yaml
    docs download --list canvas.yaml --local-dir ce-specs/sources/lms/canvas/
    curator batch --manifest ce-specs/tests/lms_batch_manifest.yaml --parallel 2
    curator test --lms --all --threshold 70

Mode 4 — Shared Pedagogical Review Queue
This mode allows any pedagogue with the CONTENT_CREATOR, TEACHER, or ADMIN role to view and approve shared jobs launched from the CLI, regardless of who launched them.
    # Without --user: job is shared (visible to all pedagogues)
    curator curate \
      --source exam.pdf \
      --referenceCode us.ap.spanish_language.hl
    # → job.shared = true, job.owner_email = null
    # → 🟡 "Shared" badge in the Job Command Center

    # With --user: job is assigned to the specified pedagogue (only they see it)
    curator curate \
      --source exam.pdf \
      --referenceCode us.ap.spanish_language.hl \
      --user pedagogue@colabedu.org
    # → job.shared = false, job.owner_email = "pedagogue@colabedu.org"
    # → visible only to that user in their panel

RBAC Visibility Strategy
| User Type | Own Jobs | Shared Jobs (CLI) |
|---|---|---|
| Standard user | ✅ Yes | ❌ No |
| CONTENT_CREATOR | ✅ Yes | ✅ Yes |
| TEACHER | ✅ Yes | ✅ Yes |
| ADMIN | ✅ Yes | ✅ Yes |
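The visibility rules in this table, together with the effect of the --user flag, can be expressed as a small predicate. The sketch below is an illustration under assumptions: the `Job` and `User` classes and the `job_from_cli` and `can_view` functions are hypothetical names, and the real check presumably lives in the backend services rather than in code like this.

```python
from dataclasses import dataclass
from typing import Optional

# Roles that may see shared CLI jobs, per the table above.
PEDAGOGUE_ROLES = frozenset({"CONTENT_CREATOR", "TEACHER", "ADMIN"})


@dataclass
class Job:
    shared: bool
    owner_email: Optional[str]


@dataclass
class User:
    email: str
    roles: frozenset


def job_from_cli(user_flag: Optional[str] = None) -> Job:
    """Mirror the CLI behaviour: no --user means a shared, ownerless job."""
    return Job(shared=user_flag is None, owner_email=user_flag)


def can_view(user: User, job: Job) -> bool:
    """Own jobs are always visible; shared jobs require a pedagogue role."""
    if job.owner_email is not None and job.owner_email == user.email:
        return True
    return job.shared and bool(PEDAGOGUE_ROLES & user.roles)
```

A job created with `job_from_cli()` is visible to any TEACHER but not to a standard user, while `job_from_cli("pedagogue@colabedu.org")` is visible only to that address.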
Mode 5 — Autonomous Curator Agent (ACA)
The ACA is a fully autonomous pipeline that orchestrates discover → fetch → curate → ingest for an entire curriculum standard in a single command, with automatic recovery from crashes and optional human-in-the-loop quality gates.
    # 1. Create an autonomous plan for AP Spanish (US)
    aca plan create \
      --name "AP Spanish US Sprint" \
      --country US --standard AP --level HS --subject spanish \
      --gates after_c0,after_c1
    # → returns planId: a3f9b821-...

    # 2. Start (fully async; ACA handles everything)
    aca plan start a3f9b821

    # 3. Monitor
    aca plan status a3f9b821

    # 4. Approve gate after_c0 (review C0 specs before generating C1/C2)
    aca plan approve a3f9b821 --gate after_c0

    # 5. Handle items requiring human review
    aca plan list a3f9b821 --status NEEDS_REVIEW
    aca plan approve a3f9b821 --item <itemId>

Plan lifecycle states:
| State | Description |
|---|---|
| DRAFT | Created, awaiting start |
| RUNNING | Actively processing items |
| WAITING_REVIEW | Paused at an approval gate |
| PAUSED | Stall detected or manually paused; resume with aca plan start |
| COMPLETED | All items processed |
| FAILED | Critical error or cancelled |
Quality gating: Items with confidence_score < threshold (default 0.75) are marked NEEDS_REVIEW and routed to the HITL queue instead of being auto-ingested.
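The gating rule amounts to a single comparison per item. In this minimal sketch the `gate_item` function and the AUTO_INGEST label are hypothetical; only the NEEDS_REVIEW status, the strict less-than comparison, and the 0.75 default come from the description above.

```python
DEFAULT_THRESHOLD = 0.75  # default stated in the quality-gating rule


def gate_item(confidence_score: float, threshold: float = DEFAULT_THRESHOLD) -> str:
    """Return the routing decision for one plan item."""
    # Strictly below the threshold goes to human review;
    # a score equal to the threshold passes.
    if confidence_score < threshold:
        return "NEEDS_REVIEW"
    return "AUTO_INGEST"  # hypothetical label for the auto-ingest path
```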
Recovery: A scheduled job (checkAndRecoverStalledPlans) detects plans with no last_heartbeat update for > 30 minutes and sets them to PAUSED for safe restart.
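Stall detection of this kind is usually a heartbeat-age check. The sketch below shows one way it could work and is not the actual checkAndRecoverStalledPlans job: plans are plain dicts, and restricting the check to RUNNING plans is an assumption, since the description does not say which states are inspected.

```python
from datetime import datetime, timedelta

STALL_TIMEOUT = timedelta(minutes=30)  # threshold stated above


def find_stalled_plans(plans: list, now: datetime) -> list:
    """Return ids of RUNNING plans whose heartbeat is older than the timeout."""
    return [
        plan["id"]
        for plan in plans
        if plan["status"] == "RUNNING"  # assumption: only active plans can stall
        and now - plan["last_heartbeat"] > STALL_TIMEOUT
    ]


def recover_stalled_plans(plans: list, now: datetime) -> list:
    """Set stalled plans to PAUSED in place and return their ids, sorted."""
    stalled = set(find_stalled_plans(plans, now))
    for plan in plans:
        if plan["id"] in stalled:
            plan["status"] = "PAUSED"
    return sorted(stalled)
```

Passing `now` explicitly keeps the check deterministic, which makes the 30-minute boundary easy to test.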
Architecture
    ce-svc-cli
    ├── CuratorCommands.java       # curate (--source-type, --output-type, --user)
    ├── CuratorTestCommands.java   # test (--lms), batch
    ├── DocDiscoveryCommands.java  # discover, download, catalog, search, stats
    └── LmsPackageProcessor.java   # IMSCC/MBZ unzip + GCS upload

    ce-svc-ai-services
    ├── CuratorController            # POST /api/v1/curator/curate + /batch
    ├── CuratorWorkflow              # LangGraph4j (conditional LMS branch)
    ├── LmsPackageDiscoveryStrategy  # Canvas Commons, MoodleNet APIs
    ├── HierarchyContextResolver     # curricular context shared with Wizard
    └── IngestionService             # BLOCK_CONTENT kind: Lesson | Resource | Exercise

    ce-svc-model-api (DDL v9)
    ├── jobs.shared BOOLEAN DEFAULT TRUE        # shared queue visibility
    ├── jobs.owner_id NULLABLE                  # allows ownerless jobs
    ├── jobs.owner_email VARCHAR(255) NULLABLE  # CLI launcher email
    ├── curator_plans.*                         # ACA plan state: status, last_heartbeat, config_json
    └── curator_plan_items.*                    # per-item: source_url, confidence_score, attempts, status

    ce-specs/
    ├── catalog/personas/curators/  # Gems: router, lesson, resource, c2
    ├── sources/lms/                # .imscc / .mbz packages
    ├── sources/oer/                # CK-12, Khan JSON resources
    └── tests/lms_curator_test_manifest.yaml

Seed & Bloom — Content Derivation Engine
The Seed & Bloom engine generates original pedagogical exercises from ingested source materials without exposing proprietary content. Source docs (books, rubrics, exam PDFs) are ingested as private seeds; the derived exercises are public specs with full provenance traceability.
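The private-seed, public-derivative split can be illustrated by how a derived spec's metadata might be assembled. This is a sketch under assumptions: `derive_metadata` is a hypothetical helper, and the field names follow the provenance block that derived specs carry. Note that only the seed's reference code is recorded, never its content.

```python
from datetime import datetime, timezone


def derive_metadata(seed_ref: str, strategy: str, derived_ref: str,
                    bloomed_at: datetime) -> dict:
    """Build metadata for a derived spec: always PUBLIC, always traceable.

    `bloomed_at` must be timezone-aware; it is normalised to UTC.
    """
    if bloomed_at.tzinfo is None:
        raise ValueError("bloomed_at must be timezone-aware")
    stamp = bloomed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "referenceCode": derived_ref,
        "visibility": "PUBLIC",  # derived specs are public even if the seed is private
        "provenance": {
            "seedRef": seed_ref,  # pointer to the seed, not a copy of it
            "bloomStrategy": strategy,
            "bloomedAt": stamp,
        },
    }
```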
Architecture
    ce-svc-content-engine
    ├── BloomStrategy (interface)
    │   ├── SourceDerivationBloom          # derives exercises from a private/public source seed
    │   ├── ProgressionBloom               # generates 4 difficulty tiers from any seed
    │   ├── MisconceptionRemediationBloom  # remediation exercises for known misconceptions
    │   └── CurricularPipelineBloom        # C0 rubric → C2 exercise set (planned)
    └── SeedBloomService                   # dispatches to the active strategy

    ce-svc-ai-services
    ├── BloomController   # REST: POST /api/v1/bloom/derive
    └── EduWorkflowTools  # MCP: bloomContent tool

    ce-svc-cli
    └── CuratorCommands  # curator bloom derive

Bloom Strategies
| Strategy | Trigger | Output |
|---|---|---|
| SOURCE_DERIVATION | Teacher provides a seed doc (book, rubric) | N original exercises derived from the source |
| PROGRESSION | Any exercise seed | SIMPLIFIED / STANDARD / EXTENDED / CHALLENGE tiers |
| MISCONCEPTION_REMEDIATION | Grader identifies a misconception | Targeted remediation exercise |
| CURRICULAR_PIPELINE | C0 rubric seed | Full C2 exercise set (planned) |
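Dispatching on the strategy name can be sketched with a simple registry. This is not the SeedBloomService implementation, which is a Java service: the handler functions, the `derive` entry point, and the placeholder reference-code suffixes are all hypothetical, and the handlers return reference codes only instead of generating exercise content. The tier names come from the PROGRESSION row above.

```python
from typing import Callable, Dict, List

# Tier names are taken from the PROGRESSION row of the table.
PROGRESSION_TIERS = ["SIMPLIFIED", "STANDARD", "EXTENDED", "CHALLENGE"]


def progression_bloom(seed_ref: str) -> List[str]:
    """Return one placeholder reference code per difficulty tier."""
    return [f"{seed_ref}.{tier.lower()}" for tier in PROGRESSION_TIERS]


def source_derivation_bloom(seed_ref: str, count: int = 3) -> List[str]:
    """Return `count` placeholder reference codes derived from the seed."""
    return [f"{seed_ref}.derived_{i:03d}" for i in range(1, count + 1)]


# Only two strategies are wired in here; others would register the same way.
STRATEGIES: Dict[str, Callable[[str], List[str]]] = {
    "PROGRESSION": progression_bloom,
    "SOURCE_DERIVATION": source_derivation_bloom,
}


def derive(seed_ref: str, strategy: str) -> List[str]:
    """Dispatch to the handler registered for `strategy`."""
    handler = STRATEGIES.get(strategy)
    if handler is None:
        raise ValueError(f"unsupported or planned strategy: {strategy}")
    return handler(seed_ref)
```

A registry keeps the dispatcher unchanged when a planned strategy such as CURRICULAR_PIPELINE is eventually added; until then, requesting it raises a ValueError in this sketch.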
Provenance Model
Every derived spec includes a metadata.provenance block:
    metadata:
      referenceCode: ib.spa.derived.textual_analysis_ex_001
      visibility: PUBLIC
      provenance:
        seedRef: ib.spa.textual_analysis_rubric    # parent seed
        bloomStrategy: SOURCE_DERIVATION
        bloomedAt: "2025-06-11T15:00:00Z"

Entry Points
| Entry Point | How |
|---|---|
| CLI | curator bloom derive --seed <ref> --strategy SOURCE_DERIVATION |
| REST | POST /api/v1/bloom/derive |
| MCP tool | bloomContent (Teacher Companion, Exercise Wizard) |
| Grading | RemediationEventOrchestrator → MISCONCEPTION_REMEDIATION |
- CLI Tutorial — Complete end-to-end walkthrough
- Autonomous Curator Agent — ACA plan lifecycle, recovery, and quality gates
- Taxonomy Reference — Bundle and standard schemas
- InteractiveLesson Reference — Spec generated from LMS packages
- ResourceLearning Reference — Spec generated from OER resources
- Gem Reference — Curator Gems: router, lesson, resource
- Seed & Bloom Engine — Content derivation concepts and YAML provenance reference