Curator Agent: Capabilities

The Curator Agent is ColabEdu’s AI-powered content ingestion pipeline. It transforms curricular documents (PDFs, Canvas/Moodle LMS packages) and open educational resources (CK-12, Khan Academy, Procomún, Europeana) into structured pedagogical specs ready for the platform.


What the Curator Does

| Capability | Description |
| --- | --- |
| Source Discovery | Identifies canonical URLs of official curricular documents and OER/LMS repository APIs. |
| Document Download | Downloads PDFs and LMS packages (.imscc Canvas, .mbz Moodle) to the local specs repository. |
| AI Analysis & Generation | Extracts pedagogical criteria, generates slides/widgets or quality rubrics using Gemini. |
| Gem Routing | curator_router classifies each input and delegates to the specialized Gem (lesson, resource, c2…). |
| YAML Scaffolding | Writes complete OAS InteractiveLesson, ResourceLearning, or ExerciseSpec specs. |
| LMS Package Ingestion | Unzips .imscc/.mbz, uploads assets to GCS, and generates InteractiveLesson with slides. |
| OER Indexing | Indexes open source resources with a 3-dimension pedagogical qualityScore. |
| Bulk Ingestion | Spec sync via sync_specs.sh or curator batch. |
| Seed & Bloom | Derives new exercises from ingested seeds (source docs, rubrics) via configurable Bloom strategies. Seeds may be private; derived specs are public. |

Unified Pipeline

Document (PDF / GCS URI) — or LMS Package (.imscc / .mbz) — or OER Resource (JSON/API)
POST /api/v1/curator/curate
{ source, sourceType, outputType, courseId?, pathwayId?, sandboxMode,
ownerEmail?, shared }
CuratorWorkflow (LangGraph4j)
├─ [fetch_source] → download + SHA-256 hash
│ │
│ sourceType == LMS_PACKAGE?
│ ├─ YES ─► [unzip_lms] → LmsPackageProcessor.unzipPackage()
│ │ └─► [upload_lms_assets] → GCS + assetUrlMap
│ │ └─► [gather_context]
│ └─ NO ─► [gather_context]
├─ [gather_context] → HierarchyContextResolver → CurricularContext
├─ [propose_content] → Gem selected by curator_router
│ ├─ ExerciseSpec ← curator_c2 (exam PDFs)
│ ├─ InteractiveLesson ← curator_lesson (LMS packages)
│ └─ ResourceLearning ← curator_resource (OER APIs)
├─ [wait_for_review] → headless=true → auto-approve
└─ [register_specs] → IngestionService (visibility: PUBLIC | PRIVATE)

SEED & BLOOM PIPELINE (Content Derivation)
──────────────────────────────────────────
curator curate --source rubric.pdf --visibility private ← ingest as seed
→ seed ingested with visibility=PRIVATE
curator bloom derive --seed ib.spa.textual_analysis_rubric \
--strategy SOURCE_DERIVATION|PROGRESSION|MISCONCEPTION_REMEDIATION
→ derived specs generated with visibility=PUBLIC + provenance.seedRef
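
The request body and the conditional LMS branch in the diagram can be modelled in a few lines of Python. This is an illustrative sketch, not the LangGraph4j implementation: the `CurateRequest` class, the `node_sequence` helper, the field types, and the `sandboxMode` default are assumptions. Only the field names, the node names, and the `LMS_PACKAGE` condition come from the diagram, and `shared` defaults to true as in the `jobs.shared` column.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CurateRequest:
    """Body of POST /api/v1/curator/curate (field names from the diagram)."""
    source: str
    sourceType: str
    outputType: str
    courseId: Optional[int] = None     # optional in the diagram; type is assumed
    pathwayId: Optional[int] = None    # optional in the diagram; type is assumed
    sandboxMode: bool = False          # assumed default
    ownerEmail: Optional[str] = None   # optional in the diagram
    shared: bool = True                # mirrors jobs.shared DEFAULT TRUE

    def to_json(self) -> str:
        # Drop unset optional fields so the payload carries only what was provided.
        return json.dumps({k: v for k, v in asdict(self).items() if v is not None})


def node_sequence(source_type: str) -> list:
    """Return the workflow nodes visited for a given sourceType."""
    nodes = ["fetch_source"]
    if source_type == "LMS_PACKAGE":
        # LMS packages are unzipped and their assets uploaded before context gathering.
        nodes += ["unzip_lms", "upload_lms_assets"]
    nodes += ["gather_context", "propose_content", "wait_for_review", "register_specs"]
    return nodes
```

Calling `node_sequence("LMS_PACKAGE")` yields the two extra LMS nodes between `fetch_source` and `gather_context`; any other source type goes straight from `fetch_source` to `gather_context`.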

Curricular Context Resolution

The same HierarchyContextResolver is shared with the Exercise Creation Wizard. It takes the curricular context from the first available source, in this order of precedence:

Explicit override (--standard / --subject)
└─ courseId → CourseClient (standard, level, subject from the course)
└─ pathwayId → PathwayClient (standard, level from the pathway)
└─ referenceCode in manifest (field parsing)
└─ generic prompt (no context)
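
The fallback chain above can be sketched as a series of early returns. This is a minimal Python illustration, not the actual HierarchyContextResolver: the `COURSES` and `PATHWAYS` dictionaries are hypothetical stand-ins for CourseClient and PathwayClient, their contents are made up, and splitting the referenceCode on dots is an assumption about "field parsing".

```python
from typing import Optional

# Hypothetical in-memory stand-ins for CourseClient and PathwayClient lookups.
COURSES = {142: {"standard": "AP", "level": "HS", "subject": "spanish"}}
PATHWAYS = {7: {"standard": "AP", "level": "HS"}}


def resolve_context(standard: Optional[str] = None,
                    subject: Optional[str] = None,
                    course_id: Optional[int] = None,
                    pathway_id: Optional[int] = None,
                    reference_code: Optional[str] = None) -> dict:
    """Return curricular context from the first source that yields one."""
    if standard or subject:
        # Explicit --standard / --subject overrides win over every lookup.
        return {"via": "override", "standard": standard, "subject": subject}
    if course_id in COURSES:
        return {"via": "courseId", **COURSES[course_id]}
    if pathway_id in PATHWAYS:
        return {"via": "pathwayId", **PATHWAYS[pathway_id]}
    if reference_code:
        # Assumed parsing: split the dotted code into its fields.
        return {"via": "referenceCode", "fields": reference_code.split(".")}
    return {"via": "generic"}  # no context available: use the generic prompt
```

Ordering the checks from most to least specific keeps the precedence explicit: a later source is consulted only when every earlier one yields nothing.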

Curator Gems

The gem.curator_router classifies each input and delegates to the specialized Gem:

| Gem | Output | Triggers when |
| --- | --- | --- |
| gem.curator_router | none (routes) | Always: classifies input and delegates |
| gem.curator_c0 | BlockRubric | Normative curricular documents |
| gem.curator_c1 | Recipe | Complete didactic units |
| gem.curator_c2 | ExerciseSpec | Exam PDFs, FRQ |
| gem.curator_lesson | InteractiveLesson | LMS packages (.imscc Canvas, .mbz Moodle) |
| gem.curator_resource | ResourceLearning | OER APIs: CK-12, Khan Academy, Procomún, Europeana |

Gem YAML files are stored in ce-specs/catalog/personas/curators/.
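
Once the router has classified an input, delegation amounts to a lookup. The sketch below shows that step in Python under stated assumptions: the input class labels in `INPUT_TO_GEM` are hypothetical names for the router's classification result (the actual classification is done by the Gem itself), while the Gem names and output types are the ones listed above.

```python
# Output type produced by each specialized Gem, as listed in the Gems table.
GEM_OUTPUTS = {
    "gem.curator_c0": "BlockRubric",
    "gem.curator_c1": "Recipe",
    "gem.curator_c2": "ExerciseSpec",
    "gem.curator_lesson": "InteractiveLesson",
    "gem.curator_resource": "ResourceLearning",
}

# Hypothetical labels standing in for the router's classification result.
INPUT_TO_GEM = {
    "normative_document": "gem.curator_c0",
    "didactic_unit": "gem.curator_c1",
    "exam_pdf": "gem.curator_c2",
    "lms_package": "gem.curator_lesson",
    "oer_api": "gem.curator_resource",
}


def route(input_class: str) -> tuple:
    """Return (gem, output type) for a classified input, or raise if unknown."""
    try:
        gem = INPUT_TO_GEM[input_class]
    except KeyError:
        raise ValueError(f"no Gem registered for input class {input_class!r}") from None
    return gem, GEM_OUTPUTS[gem]
```

For example, `route("lms_package")` returns `("gem.curator_lesson", "InteractiveLesson")`.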


Generated Spec Types

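The ColabEdu curation pipeline can generate the following spec types. All are OAS-compliant except CurriculumRequirement, which is a requirement definition file rather than an OAS spec. Each OAS spec type maps to a Gem and a pipeline mode: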

Spec Types by Pipeline

| Spec Type | OAS Kind | Gem / Mode | Description |
| --- | --- | --- | --- |
| ExerciseSpec | BLOCK_CONTENT | gem.curator_c2 / Classic | Exercises, FRQ, text analysis with achievement bands |
| BlockRubric | BLOCK_RUBRIC | gem.curator_c0 / Classic | Normative rubrics from curricular documents |
| Recipe | RECIPE | gem.curator_c1 / Classic | Complete didactic units with full pedagogical scaffolding |
| PathwayTemplate | PATHWAY_TEMPLATE | gem.curator_c1 / ACA | Learning pathways for a curriculum standard or theme |
| InteractiveLesson | INTERACTIVE_LESSON | gem.curator_lesson / LMS | Interactive slides from LMS packages (Canvas, Moodle) |
| ResourceLearning | RESOURCE_LEARNING | gem.curator_resource / OER | Indexed OER resources with 3-dimensional quality score |
| CurriculumRequirement | spec-as-code YAML | CurriculumGapAnalyzer | Requirement definition files driving gap analysis |

ExerciseSpec (C2_EXERCISES)

Classic Curator spec: exercises, FRQ, text analysis with achievement bands. Stored as BLOCK_CONTENT kind: ExerciseSpec.

BlockRubric (BLOCK_RUBRIC, C0)

Generated by gem.curator_c0 from normative curricular documents (official syllabi, standards frameworks). The C0 rubric defines the competency structure that C1 recipes and C2 exercises must map to.

kind: BLOCK_RUBRIC
referenceCode: us.ap.spanish_language.hl.rubric_c0
metadata:
  standard: AP_SPANISH
  country: US
competencies:
  - id: simulated_conversation
    weight: 0.30
  - id: email_reply
    weight: 0.25

Recipe / PathwayTemplate (RECIPE, C1)

Generated by gem.curator_c1 from complete didactic units. Contains full pedagogical scaffolding: objectives, activities, assessment criteria, and content sequencing. Used as a template for generating targeted C2 exercises.

kind: RECIPE
referenceCode: us.ap.spanish_language.hl.recipe_simulated_conversation
metadata:
  theme: simulated_conversation
  specType: RECIPE
  provenance:
    seedRef: us.ap.spanish_language.hl.rubric_c0

InteractiveLesson

Generated from LMS packages (Canvas Commons, Moodle Hub). Contains slides, interactive widgets, and learning objectives mapped to curricular competencies. See InteractiveLesson Reference.

ResourceLearning

Indexes OER resources with a 3-dimension qualityScore: alignment, pedagogicalRichness, accessibility. See ResourceLearning Reference.

CurriculumRequirement (spec-as-code YAML)

Not an OAS spec — a requirement definition file stored in ce-specs/catalog/requirements/. Read by the CurriculumGapAnalyzer to determine what specs are needed per standard + country.

# ce-specs/catalog/requirements/ap_spanish_us.yaml
standard: AP_SPANISH
country: US
themes:
  - id: simulated_conversation
    name: Simulated Conversation
    specTypes:
      C2_EXERCISES: 10
      INTERACTIVE_LESSON: 5
  - id: email_reply
    name: Email Reply
    specTypes:
      C2_EXERCISES: 8

These YAML files are the spec-as-code curriculum definition: they encode institutional curriculum requirements so that the Autonomous Curator Agent (ACA) can measure and close coverage gaps automatically.
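
Measuring a coverage gap comes down to subtracting what already exists from what the requirement file asks for. The following Python sketch illustrates the idea; it is not the CurriculumGapAnalyzer itself. The `coverage_gaps` function and the shape of the `existing` counts are assumptions, and `REQUIRED` is the example file above written as a dictionary rather than parsed from YAML.

```python
# Requirements from the example file: theme -> spec type -> required count.
REQUIRED = {
    "simulated_conversation": {"C2_EXERCISES": 10, "INTERACTIVE_LESSON": 5},
    "email_reply": {"C2_EXERCISES": 8},
}


def coverage_gaps(required: dict, existing: dict) -> dict:
    """Return the number of specs still missing per theme and spec type."""
    gaps = {}
    for theme, spec_types in required.items():
        for spec_type, needed in spec_types.items():
            have = existing.get(theme, {}).get(spec_type, 0)
            missing = max(needed - have, 0)  # surplus specs never count as a gap
            if missing:
                gaps.setdefault(theme, {})[spec_type] = missing
    return gaps
```

With a made-up inventory of 4 exercises and 5 interactive lessons for `simulated_conversation` and nothing for `email_reply`, the result is `{"simulated_conversation": {"C2_EXERCISES": 6}, "email_reply": {"C2_EXERCISES": 8}}`.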


Supported Sources

The KnownSourcesRegistry registers 14 active sources:

| ID | Type | Standard | Scope |
| --- | --- | --- | --- |
| madrid, mec, ebau.cat | EBAU Exams | LOMLOE | 🇪🇸 |
| collegeboard, apcentral | AP Exams | AP | 🇺🇸 |
| web | Google CSE | | 🌐 |
| mec-procomun, mec-recursos | OER | LOMLOE | 🇪🇸 |
| khan-academy, ck12 | OER | | 🌐 |
| europeana-edu | OER | | 🇪🇺 |
| canvas-commons | LMS | | 🌐 |
| moodle-hub, moodle-net | LMS | | 🌐 |

# List all registered sources
docs sources
# Filter by scope
docs sources --scope es
# Filter by source
docs sources --source canvas-commons

Operation Modes

Mode 1 — Headless / Automated (CI/CD)

curator curate \
--source exam.pdf \
--courseId 142
# → fetch → context → propose → register (auto-approve)
# job.shared = true → visible to all CONTENT_CREATOR/TEACHER

Mode 2 — Sandbox + Manual Review

curator curate \
--source canvas_module.imscc \
--source-type LMS_PACKAGE \
--output-type INTERACTIVE_LESSON \
--sandbox \
--output ./draft_specs/
curator validate --spec ./draft_specs/lesson.yaml --ingest

Mode 3 — LMS/OER Batch

docs discover --source canvas-commons --subject biology --output canvas.yaml
docs download --list canvas.yaml --local-dir ce-specs/sources/lms/canvas/
curator batch --manifest ce-specs/tests/lms_batch_manifest.yaml --parallel 2
curator test --lms --all --threshold 70

Mode 4 — Shared Pedagogical Review Queue

This mode lets any pedagogue with the CONTENT_CREATOR, TEACHER, or ADMIN role view and approve shared jobs launched from the CLI, regardless of who launched them.

# Without --user: job is shared (visible to all pedagogues)
curator curate \
--source exam.pdf \
--referenceCode us.ap.spanish_language.hl
# → job.shared = true, job.owner_email = null
# → 🟡 "Shared" badge in the Job Command Center
# With --user: job is assigned to the specified pedagogue (only they see it)
curator curate \
--source exam.pdf \
--referenceCode us.ap.spanish_language.hl \
--user pedagogue@colabedu.org
# → job.shared = false, job.owner_email = "pedagogue@colabedu.org"
# → visible only for that user in their panel

RBAC Visibility Strategy

| User Type | Own Jobs | Shared Jobs (CLI) |
| --- | --- | --- |
| Standard user | ✅ Yes | ❌ No |
| CONTENT_CREATOR | ✅ Yes | ✅ Yes |
| TEACHER | ✅ Yes | ✅ Yes |
| ADMIN | ✅ Yes | ✅ Yes |
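
The visibility rules reduce to a single predicate over a job's `shared` and `owner_email` fields. The Python sketch below is one possible reading of the table, not the platform's authorization code: the `can_view` function and the representation of roles as a set of strings are assumptions.

```python
PEDAGOGUE_ROLES = {"CONTENT_CREATOR", "TEACHER", "ADMIN"}


def can_view(job: dict, user_email: str, user_roles: set) -> bool:
    """Own jobs are always visible; shared jobs only to pedagogue roles."""
    if job.get("owner_email") == user_email:
        return True
    # A job assigned with --user has shared = false, so only its owner sees it.
    return bool(job.get("shared")) and bool(PEDAGOGUE_ROLES & user_roles)
```

A shared job with `owner_email = null` is therefore visible to a TEACHER but not to a standard user, while a job assigned with `--user` is visible only to that user.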

Mode 5 — Autonomous Curator Agent (ACA)

The ACA is a fully autonomous pipeline that orchestrates discover → fetch → curate → ingest for an entire curriculum standard in a single command, with automatic recovery from crashes and optional human-in-the-loop quality gates.

# 1. Create an autonomous plan for AP Spanish (US)
aca plan create \
--name "AP Spanish US Sprint" \
--country US --standard AP --level HS --subject spanish \
--gates after_c0,after_c1
# → returns planId: a3f9b821-...
# 2. Start (fully async — ACA handles everything)
aca plan start a3f9b821
# 3. Monitor
aca plan status a3f9b821
# 4. Approve gate after_c0 (review C0 specs before generating C1/C2)
aca plan approve a3f9b821 --gate after_c0
# 5. Handle items requiring human review
aca plan list a3f9b821 --status NEEDS_REVIEW
aca plan approve a3f9b821 --item <itemId>

Plan lifecycle states:

| State | Description |
| --- | --- |
| DRAFT | Created, awaiting start |
| RUNNING | Actively processing items |
| WAITING_REVIEW | Paused at an approval gate |
| PAUSED | Stall detected or manually paused; resume with aca plan start |
| COMPLETED | All items processed |
| FAILED | Critical error or cancelled |

Quality gating: Items with confidence_score < threshold (default 0.75) are marked NEEDS_REVIEW and routed to the HITL queue instead of being auto-ingested.
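
As a minimal Python sketch of this gate: the comparison is strict, so an item scoring exactly at the threshold is not held back. The `route_item` function and the `AUTO_INGEST` label are hypothetical; only `NEEDS_REVIEW`, the `<` comparison, and the 0.75 default come from the description above.

```python
DEFAULT_THRESHOLD = 0.75  # default stated in the quality gating rule


def route_item(confidence_score: float, threshold: float = DEFAULT_THRESHOLD) -> str:
    """Return the next status for a plan item based on its confidence score."""
    # Strictly below the threshold goes to human review; equal or above is ingested.
    return "NEEDS_REVIEW" if confidence_score < threshold else "AUTO_INGEST"
```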

Recovery: A scheduled job (checkAndRecoverStalledPlans) detects plans with no last_heartbeat update for > 30 minutes and sets them to PAUSED for safe restart.
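
The stall check can be sketched as a pass over the plans that compares each heartbeat with the current time. This Python version is illustrative only and is not the checkAndRecoverStalledPlans job: the `recover_stalled_plans` function and the dictionary shape are assumptions, as is the choice to consider only RUNNING plans.

```python
from datetime import datetime, timedelta

STALL_TIMEOUT = timedelta(minutes=30)


def recover_stalled_plans(plans: list, now: datetime) -> list:
    """Pause RUNNING plans whose last heartbeat is older than the timeout.

    Returns the ids of the plans that were paused.
    """
    paused = []
    for plan in plans:
        stalled = now - plan["last_heartbeat"] > STALL_TIMEOUT
        # Assumption: only RUNNING plans are candidates for recovery.
        if plan["status"] == "RUNNING" and stalled:
            plan["status"] = "PAUSED"  # can be resumed with `aca plan start`
            paused.append(plan["id"])
    return paused
```

Passing `now` as a parameter instead of reading the clock inside the function keeps the check deterministic and easy to test.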


Architecture

ce-svc-cli
├── CuratorCommands.java — curate (--source-type, --output-type, --user)
├── CuratorTestCommands.java — test (--lms), batch
├── DocDiscoveryCommands.java — discover, download, catalog, search, stats
└── LmsPackageProcessor.java — IMSCC/MBZ unzip + GCS upload
ce-svc-ai-services
├── CuratorController — POST /api/v1/curator/curate + /batch
├── CuratorWorkflow — LangGraph4j (conditional LMS branch)
├── LmsPackageDiscoveryStrategy — Canvas Commons, MoodleNet APIs
├── HierarchyContextResolver — curricular context shared with Wizard
└── IngestionService — BLOCK_CONTENT kind: Lesson | Resource | Exercise
ce-svc-model-api (DDL v9)
├── jobs.shared BOOLEAN DEFAULT TRUE — shared queue visibility
├── jobs.owner_id NULLABLE — allows ownerless jobs
├── jobs.owner_email VARCHAR(255) NULLABLE — CLI launcher email
├── curator_plans.* — ACA plan state: status, last_heartbeat, config_json
└── curator_plan_items.* — per-item: source_url, confidence_score, attempts, status
ce-specs/
├── catalog/personas/curators/ — Gems: router, lesson, resource, c2
├── sources/lms/ — .imscc / .mbz packages
├── sources/oer/ — CK-12, Khan JSON resources
└── tests/lms_curator_test_manifest.yaml

Seed & Bloom — Content Derivation Engine

The Seed & Bloom engine generates original pedagogical exercises from ingested source materials without exposing proprietary content. Source docs (books, rubrics, exam PDFs) are ingested as private seeds; the derived exercises are public specs with full provenance traceability.

Architecture

ce-svc-content-engine
├── BloomStrategy (interface)
│ ├── SourceDerivationBloom — derives exercises from a private/public source seed
│ ├── ProgressionBloom — generates 4 difficulty tiers from any seed
│ ├── MisconceptionRemediationBloom — remediation exercises for known misconceptions
│ └── CurricularPipelineBloom — C0 rubric → C2 exercise set (planned)
└── SeedBloomService — dispatches to the active strategy
ce-svc-ai-services
├── BloomController — REST: POST /api/v1/bloom/derive
└── EduWorkflowTools — MCP: bloomContent tool
ce-svc-cli
└── CuratorCommands — curator bloom derive

Bloom Strategies

| Strategy | Trigger | Output |
| --- | --- | --- |
| SOURCE_DERIVATION | Teacher provides a seed doc (book, rubric) | N original exercises derived from the source |
| PROGRESSION | Any exercise seed | SIMPLIFIED / STANDARD / EXTENDED / CHALLENGE tiers |
| MISCONCEPTION_REMEDIATION | Grader identifies a misconception | Targeted remediation exercise |
| CURRICULAR_PIPELINE | C0 rubric seed | Full C2 exercise set (planned) |
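
Strategy dispatch can be pictured as a registry keyed by strategy name. The Python sketch below shows the pattern with PROGRESSION only; it is not the SeedBloomService. The `derive` and `progression_bloom` functions are hypothetical, the derived reference naming is invented for illustration, and no exercise content is actually generated.

```python
# Tier names taken from the PROGRESSION strategy.
TIERS = ["SIMPLIFIED", "STANDARD", "EXTENDED", "CHALLENGE"]


def progression_bloom(seed_ref: str) -> list:
    # One derived reference per difficulty tier (naming scheme is hypothetical).
    return [f"{seed_ref}.{tier.lower()}" for tier in TIERS]


# Registry of available strategies; the others would be added the same way.
STRATEGIES = {"PROGRESSION": progression_bloom}


def derive(seed_ref: str, strategy: str) -> list:
    """Dispatch to the strategy registered under the given name."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unsupported bloom strategy: {strategy}")
    return STRATEGIES[strategy](seed_ref)
```

A registry keeps the dispatcher unchanged when a new strategy is added: a planned strategy such as CURRICULAR_PIPELINE would only need a new entry in the dictionary.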

Provenance Model

Every derived spec includes a metadata.provenance block:

metadata:
  referenceCode: ib.spa.derived.textual_analysis_ex_001
  visibility: PUBLIC
  provenance:
    seedRef: ib.spa.textual_analysis_rubric   # parent seed
    bloomStrategy: SOURCE_DERIVATION
    bloomedAt: "2025-06-11T15:00:00Z"
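
A derived spec's metadata can be assembled from the seed reference, the strategy, and a timestamp. The Python sketch below builds the same structure as the example; the `derived_metadata` function is hypothetical, and formatting `bloomedAt` as UTC with a trailing `Z` is inferred from that single example.

```python
from datetime import datetime, timezone


def derived_metadata(reference_code: str, seed_ref: str, strategy: str,
                     bloomed_at: datetime) -> dict:
    """Build the metadata block for a derived spec, including provenance."""
    return {
        "referenceCode": reference_code,
        "visibility": "PUBLIC",  # derived specs are public even when the seed is private
        "provenance": {
            "seedRef": seed_ref,
            "bloomStrategy": strategy,
            # ISO 8601 in UTC with a trailing "Z", matching the example.
            "bloomedAt": bloomed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
    }
```

Because every derived spec carries `seedRef`, it can be traced back to its parent seed without the seed's content being exposed.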

Entry Points

| Entry Point | How |
| --- | --- |
| CLI | curator bloom derive --seed <ref> --strategy SOURCE_DERIVATION |
| REST | POST /api/v1/bloom/derive |
| MCP tool | bloomContent (Teacher Companion, Exercise Wizard) |
| Grading | RemediationEventOrchestrator → MISCONCEPTION_REMEDIATION |