Batch YAML — Reference

Curator Batch YAML: Pipeline as Code

Curator batch YAML files let you define pedagogical content ingestion pipelines in a fully declarative way. They are stored in ce-specs/batches/ and executed with the spec batch run command.

A batch can chain any subset of pipeline stages, listed in the order you want them to run, from automatic source discovery to final ingestion into the OAS catalog.


Full Schema

# ══════════════════════════════════════════════════════════════════════
# Curator Batch Definition — Schema v1
# Path: ce-specs/batches/{name}.yaml
# ══════════════════════════════════════════════════════════════════════
# ── Batch metadata ────────────────────────────────────────────────────
batch:
  id: string                 # Unique identifier (kebab-case slug)
  name: string               # Human-readable operator name
  description: string        # Pedagogical description of the content to ingest
# ── Curricular context ────────────────────────────────────────────────
context:
  standard: string           # Standard key: AP | LOMLOE | IB | EBAU | COMMON_CORE
  country: string            # ISO-3166-1 alpha-2: US | ES | MX | GB
  locale: string             # Language: es | en | fr
  subject: string            # Subject slug: spanish_language_and_culture | lcl | biology
  level: string              # Educational level: "9-12" | "2 BACH" | "HL" | "SL"
  authority: string          # Educational authority (optional): "California Dept. of Education"
  program: string            # Specific program (optional): "AP Spanish Language"
  referenceCode: string      # Base OAS code for generated specs
# ── Stages to execute (in order) ─────────────────────────────────────
operations:
  - taxonomy                 # Ensure taxonomy nodes exist in the backend
  - discover                 # Discover document source URLs
  - fetch                    # Download documents to local staging
  - parse                    # Extract OAS YAML specs via AI (Curator)
  - approval                 # Wait for human review (HITL)
  - ingest                   # Ingest approved specs into the OAS catalog
# ── Per-stage configuration (include only the stages you use) ─────────
taxonomy:
  nodes:                     # Taxonomy nodes to ensure
    - standard: string
      level: string
      subject: string
discover:
  types: [string]            # exams | standards | open_books
  keywords: [string]         # Search terms
  sources: [string]          # Source domains: collegeboard.org, educacion.gob.es…
  maxResults: integer        # Result limit
fetch:
  stagingDir: string         # Local download directory
  parallelDownloads: integer # Parallel downloads (default: 1)
  urls: [string]             # Static URLs (if DISCOVER is skipped)
parse:
  specTypes: [string]        # C0_STANDARDS | C1_RECIPES | C2_EXERCISES | BLOCK_RUBRIC
  sections: [string]         # Document sections to process (empty = full document)
  filePaths: [string]        # Local file paths (if FETCH is skipped)
approval:
  timeoutHours: integer      # Hours before cancelling the wait (default: 48)
  autoApprove: boolean       # true = approve automatically without human review
ingest:
  targetRepo: string         # Target repository (always: "ce-specs")
  branch: string             # GitOps branch: batch/{batch-id}
  specsDir: [string]         # Local dirs with ready-to-ingest YAML files
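
The schema comments imply a few consistency rules that can be checked before a run. The following is a minimal sketch of such a pre-flight check, not the CLI's own validation: `validate_batch` is a hypothetical helper, it assumes the YAML has already been loaded into a Python dict with `batch`, `operations`, and the per-stage blocks as top-level keys, and the requirement that `operations` be non-empty is an assumption.

```python
import re

# Stage names as listed under `operations` in the schema.
KNOWN_STAGES = ["taxonomy", "discover", "fetch", "parse", "approval", "ingest"]
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_batch(doc: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    errors = []

    batch_id = (doc.get("batch") or {}).get("id", "")
    if not KEBAB_CASE.match(batch_id):
        errors.append(f"batch.id must be a kebab-case slug, got {batch_id!r}")

    operations = doc.get("operations") or []
    if not operations:
        # Assumption: a batch with no stages is treated as an error.
        errors.append("operations must list at least one stage")
    for stage in operations:
        if stage not in KNOWN_STAGES:
            errors.append(f"unknown stage {stage!r}")

    # Schema comments: fetch.urls applies when discover is skipped,
    # parse.filePaths applies when fetch is skipped.
    if "fetch" in operations and "discover" not in operations:
        if not (doc.get("fetch") or {}).get("urls"):
            errors.append("fetch without discover needs fetch.urls")
    if "parse" in operations and "fetch" not in operations:
        if not (doc.get("parse") or {}).get("filePaths"):
            errors.append("parse without fetch needs parse.filePaths")

    return errors


example = {
    "batch": {"id": "batch-02-ap-slc-frq-2024"},
    "operations": ["parse", "ingest"],
    "parse": {"filePaths": ["ap_spanish_language_2024_frq.pdf"]},
}
print(validate_batch(example))  # []
```

Returning a list of messages instead of raising on the first problem lets an operator fix every issue in one pass.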

Pipeline stages

| Stage | Description | REST Endpoint |
|---|---|---|
| taxonomy | Ensures taxonomy nodes (standard → level → subject) exist in the backend before processing content | POST /api/v1/curator/taxonomy/ensure |
| discover | Searches educational sources (exam portals, OER repositories) and builds a list of document URLs | POST /api/v1/curator/discover |
| fetch | Downloads discovered documents to a local staging directory with progress reporting | POST /api/v1/curator/fetch |
| parse | Sends documents to the Curator AI service for multimodal OAS spec extraction | POST /api/v1/curator/parse |
| approval | Blocks the pipeline until a human approves the specs in the Approval UI | POST /api/v1/curator/approval/submit, then polling |
| ingest | Pushes approved specs to the spec-manager via the OAS ingestion API | POST /api/v1/curator/ingest |
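
To see how an `operations` list maps onto these endpoints, here is a small sketch that prints a plan for a batch without calling anything. The endpoint paths are copied from the table above; `execution_plan` and its output format are hypothetical and do not reproduce what `--dry-run` prints.

```python
# Stage -> (HTTP method, path), copied from the pipeline stages table.
# The approval stage also polls after submitting; only the submit call is listed.
STAGE_ENDPOINTS = {
    "taxonomy": ("POST", "/api/v1/curator/taxonomy/ensure"),
    "discover": ("POST", "/api/v1/curator/discover"),
    "fetch": ("POST", "/api/v1/curator/fetch"),
    "parse": ("POST", "/api/v1/curator/parse"),
    "approval": ("POST", "/api/v1/curator/approval/submit"),
    "ingest": ("POST", "/api/v1/curator/ingest"),
}


def execution_plan(operations):
    """Return one numbered line per stage, in the order given."""
    plan = []
    for step, stage in enumerate(operations, start=1):
        method, path = STAGE_ENDPOINTS[stage]  # KeyError for an unknown stage
        plan.append(f"{step}. {stage}: {method} {path}")
    return plan


for line in execution_plan(["parse", "ingest"]):
    print(line)
```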

Real examples

batch_02_ap_slc_frq_2024.yaml
batch:
  id: batch-02-ap-slc-frq-2024
  name: "Batch 02 — AP Spanish Language FRQ 2024"
  description: >
    AP Spanish Language and Culture — Free-Response Questions 2024.
    Sections: Task 1 (Email), Task 2 (Essay), Task 3 (Conversation),
    Task 4 (Oral/Written Presentation).
context:
  standard: AP
  country: US
  locale: es
  subject: spanish_language_and_culture
  level: HL
  referenceCode: us.ap.spanish_language_and_culture.ap_spanish_language_2024_frq.v1
operations: [parse, ingest]
parse:
  filePaths:
    - "/path/to/ce-specs/sources/us/ap/c2_exams/ap_spanish_language_2024_frq.pdf"
  specTypes: [C0_STANDARDS, C2_EXERCISES]
  sections:
    - "Task 1"
    - "Task 2"
    - "Task 3"
    - "Task 4"
ingest:
  targetRepo: ce-specs
  branch: batch/batch-02-ap-slc-frq-2024

CLI Commands

# Run a batch
./ce-cli.sh --env local "spec batch run --batch ce-specs/batches/batch_02_ap_slc_frq_2024.yaml"
# Dry-run: show execution plan without running anything
./ce-cli.sh --env local "spec batch run --batch my_batch.yaml --dry-run"
# Resume from a specific stage (after a failure)
./ce-cli.sh --env local "spec batch run --batch my_batch.yaml --from parse"
# Show the status of the last batch run
./ce-cli.sh --env local "spec batch status"
# Generate a batch YAML using AI from a natural language description
./ce-cli.sh --env local \
"spec batch generate 'EBAU Spanish language exams Madrid 2024 and 2025' --out my_batch.yaml"

Available flags

| Flag | Alias | Description |
|---|---|---|
| --batch <file> | -b | Path to the batch YAML definition file |
| --from <stage> | -f | Resume from this stage: taxonomy\|discover\|fetch\|parse\|approval\|ingest |
| --dry-run | -d | Show execution plan without running anything |
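
When batches are launched from a script, the flags above can be assembled programmatically. This sketch builds the same invocation shape used in the CLI examples (`./ce-cli.sh --env local "spec batch run ..."`). `build_run_command` is a hypothetical helper, and it assumes the batch path contains no spaces, since the inner command is passed to the wrapper as one string.

```python
# Stage names accepted by --from, per the flags table.
STAGES = ("taxonomy", "discover", "fetch", "parse", "approval", "ingest")


def build_run_command(batch_file, from_stage=None, dry_run=False, env="local"):
    """Return an argv list suitable for subprocess.run()."""
    if from_stage is not None and from_stage not in STAGES:
        raise ValueError(f"--from must be one of {', '.join(STAGES)}")

    inner = f"spec batch run --batch {batch_file}"
    if from_stage is not None:
        inner += f" --from {from_stage}"
    if dry_run:
        inner += " --dry-run"

    # The wrapper receives the whole `spec ...` command as a single argument.
    return ["./ce-cli.sh", "--env", env, inner]


print(build_run_command("my_batch.yaml", from_stage="parse"))
```

Passing the result to `subprocess.run(cmd, check=True)` would execute it from the directory that contains `ce-cli.sh`.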

Checkpoint and recovery

The CLI persists batch state in .ce-batch-state.json in the working directory. If the pipeline fails at a stage, fix the cause and rerun with --from <stage> to resume from that stage:

# Inspect the current checkpoint
cat .ce-batch-state.json
# Resume from parse (after fixing the error)
./ce-cli.sh --env local "spec batch run --batch my_batch.yaml --from parse"

The state file stores:

  • batchId, status (RUNNING | COMPLETED | FAILED | PAUSED)
  • discoveredUrls[], fetchedFiles[], parsedSpecs[]
  • approvalWorkflowIds[], ingestedReferenceCodes[]
  • log[] with the execution history
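
For a quicker read than `cat`, the checkpoint can be condensed into a few counters. This is a sketch that relies only on the field names listed above. The sample values are invented for illustration, `summarize_state` and `load_state` are hypothetical helpers, and treating FAILED or PAUSED as "resumable" is an assumption about how the statuses are used.

```python
import json
from pathlib import Path


def load_state(path=".ce-batch-state.json"):
    """Read the checkpoint file from the working directory."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_state(state: dict) -> dict:
    """Condense a checkpoint into its status and per-stage item counts."""
    return {
        "batchId": state.get("batchId"),
        "status": state.get("status"),
        "discovered": len(state.get("discoveredUrls", [])),
        "fetched": len(state.get("fetchedFiles", [])),
        "parsed": len(state.get("parsedSpecs", [])),
        "approvals": len(state.get("approvalWorkflowIds", [])),
        "ingested": len(state.get("ingestedReferenceCodes", [])),
        # Assumption: only FAILED or PAUSED runs are worth resuming.
        "resumable": state.get("status") in ("FAILED", "PAUSED"),
    }


# Invented sample checkpoint; real files may carry more fields.
sample = {
    "batchId": "batch-02-ap-slc-frq-2024",
    "status": "FAILED",
    "fetchedFiles": ["ap_spanish_language_2024_frq.pdf"],
    "parsedSpecs": [],
    "log": [],
}
print(summarize_state(sample))
```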

Valid specTypes

| Value | OAS Layer | Description |
|---|---|---|
| C0_STANDARDS | C0 | Regulatory standards ingested from official documents |
| BLOCK_RUBRIC | C0 | Immutable evaluation rubrics |
| C1_RECIPES | C1 | Assembled pedagogical recipes |
| C2_EXERCISES | C2 | Exercises with context |
| INTERACTIVE_LESSON | C2 | Interactive lessons with slides |
| RESOURCE_LEARNING | C2 | Annotated OER pointers |
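
The value-to-layer mapping above is easy to encode when a script needs to check `parse.specTypes` or report which OAS layers a batch touches. A minimal sketch, with `group_by_layer` as a hypothetical helper:

```python
# specType -> OAS layer, copied from the table above.
SPEC_TYPE_LAYER = {
    "C0_STANDARDS": "C0",
    "BLOCK_RUBRIC": "C0",
    "C1_RECIPES": "C1",
    "C2_EXERCISES": "C2",
    "INTERACTIVE_LESSON": "C2",
    "RESOURCE_LEARNING": "C2",
}


def group_by_layer(spec_types):
    """Group spec types by OAS layer; raises KeyError for an unlisted value."""
    grouped = {}
    for spec_type in spec_types:
        layer = SPEC_TYPE_LAYER[spec_type]
        grouped.setdefault(layer, []).append(spec_type)
    return grouped


print(group_by_layer(["C0_STANDARDS", "C2_EXERCISES"]))
# {'C0': ['C0_STANDARDS'], 'C2': ['C2_EXERCISES']}
```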

Where the files live

ce-specs/
└── batches/
    ├── batch_01_ap_slc_frq_2023.yaml
    ├── batch_02_ap_slc_frq_2024.yaml
    ├── batch_03_ap_slit_frq_2024.yaml
    ├── batch_04_ap_slc_scoring_guide.yaml
    ├── batch_06_ebau_madrid_2024_25.yaml
    ├── batch_10_rubricas_lcl_nacional.yaml
    ├── discover_spanish_california.yaml
    └── ...
