The Evaluation Engine and UI

In the ColabEdu ecosystem, the Spec Manager and Recipe Compiler act as the central engines responsible for translating a legal and pedagogical standard into an AI-executable evaluation artifact, without requiring model retraining (Fine-Tuning).

Architectural Magic: Late Binding

One of the invisible problems of scaling EdTech platforms is the “combinatorial explosion” (each teacher assigning dozens of different texts for the same rubric). OAS v1beta1 solves this with Late Binding:

  1. The Hollow Mold: The C1 template (e.g., “Argumentative Essay”) is saved in Git completely empty of contextual content (requires_dynamic_context: true).
  2. The Trigger: When the teacher assigns the task, they simply select or attach the context article (C2 Layer).
  3. Just-in-Time Compilation: No new exercise is created in the database. Only when the student opens the application does the engine dynamically link C2 into the hollow C1 mold.
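
The three steps above can be sketched as follows. This is a minimal illustration, not the actual OAS implementation: the class names, the `{context}` placeholder, and the `bind_at_open` function are hypothetical; only `requires_dynamic_context` comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class C1Template:
    """Hollow mold stored in Git; it carries no contextual content."""
    name: str
    prompt: str  # hypothetical: contains a {context} placeholder
    requires_dynamic_context: bool = True


@dataclass(frozen=True)
class Assignment:
    """What the teacher creates: a reference to C1 plus the attached C2 article."""
    template_name: str
    context_article: Optional[str]  # C2 layer


def bind_at_open(template: C1Template, assignment: Assignment) -> str:
    """Link C2 into the C1 mold when the student opens the task."""
    if template.requires_dynamic_context and not assignment.context_article:
        raise ValueError("template requires a C2 context article")
    return template.prompt.format(context=assignment.context_article or "")


essay = C1Template(
    name="argumentative-essay",
    prompt="Evaluate the essay against this article:\n{context}",
)
task = Assignment(template_name="argumentative-essay", context_article="Article text")
compiled = bind_at_open(essay, task)  # nothing is persisted; the result is built on demand
```

Because the assignment stores only a reference and an article, any number of teachers can reuse the same template without multiplying exercises in storage.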

Deterministic Fusion and Mathematical Forcing

Our platform allows off-the-shelf foundation models to evaluate strictly or empathetically in milliseconds thanks to three fundamental pillars:

A. The Pre-Compiler Node

The system delegates semantic mediation to a micro-agent. If the Law (C0) says “Penalize for each accent mark error” but the Teacher (C3) instructs “Ignore accent marks due to dyslexia”, the pre-compiler node executes the Logical Merge, overrides the punitive instruction, and outputs a unified JSON. The main Evaluator LLM never receives contradictory instructions.
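
The accent-mark example can be sketched as below. The text describes a micro-agent performing semantic mediation; this sketch swaps that for a plain dictionary merge keyed by hypothetical rule IDs, and it assumes that a teacher (C3) instruction always takes precedence over the corresponding C0 rule. The function name and rule fields are illustrative only.

```python
import json


def logical_merge(c0_rules: dict, c3_overrides: dict) -> str:
    """Merge C0 rules with C3 overrides and return one unified JSON string."""
    merged = {}
    for rule_id, rule in c0_rules.items():
        override = c3_overrides.get(rule_id)
        if override is None:
            merged[rule_id] = {**rule, "source": "C0"}
        else:
            # Assumption: the teacher's instruction wins field by field.
            merged[rule_id] = {**rule, **override, "source": "C3"}
    return json.dumps(merged, sort_keys=True)


c0 = {
    "accent_marks": {"action": "penalize", "per_error": True},
    "word_count": {"action": "penalize", "per_error": False},
}
c3 = {"accent_marks": {"action": "ignore", "reason": "dyslexia accommodation"}}

unified = json.loads(logical_merge(c0, c3))
```

The evaluator would then receive only `unified`, in which the punitive accent-mark rule has already been replaced, so no contradiction reaches it.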

B. Mathematical Forcing (Structured Outputs)

We prohibit the model from behaving like a word processor. Through Guided Decoding and Structured Outputs, the inference engine constrains the LLM so that it can return only data conforming to the exact JSON schema (the Compiled Payload or JSON Contract). Because the grammar is enforced at decoding time, the model focuses all its power on evaluation, not on formatting. This guarantees:

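  1. Zero Structural Hallucination (Determinism): the shape of the output is fixed by the schema, although the scores and feedback inside it remain the model’s judgment.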
  2. Guaranteed Structured Output (API Contract).
  3. Computational Efficiency and Massive Token Savings.
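
Guided decoding itself happens inside the inference engine and is not reproduced here. What can be sketched is the contract on the receiving side: a backend check that a payload matches the expected shape. The field names `overall_score` and `constructive_feedback` appear elsewhere in this document; the `CONTRACT` table and `parse_payload` function are hypothetical.

```python
import json

# Hypothetical JSON contract: field name -> accepted Python types.
CONTRACT = {
    "overall_score": (int, float),
    "constructive_feedback": (str,),
}


def parse_payload(raw: str) -> dict:
    """Parse the evaluator output and reject anything outside the contract."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on prose
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    if set(data) != set(CONTRACT):
        raise ValueError("payload keys do not match the contract")
    for key, accepted in CONTRACT.items():
        value = data[key]
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, accepted):
            raise ValueError(f"field {key!r} has the wrong type")
    return data


ok = parse_payload('{"overall_score": 8.5, "constructive_feedback": "Clear thesis."}')
```

With constrained decoding in place this check should never fail; keeping it is a cheap safeguard in case the engine or the schema version changes.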

C. “Needle In A Haystack” (NIAH) and RAG

When Layer C2 injects a complete article into the context, the LLM is not overloaded: the large context window of modern models lets it hold the entire article in its KV cache while it analyzes the student’s text.

Separation of Presentation and Data (Server-Driven UI)

OAS frees the AI from Frontend design by strictly applying the Server-Driven UI pattern through the A2UI (Agent-to-UI) protocol. The AI returns a structured JSON, and the backend deterministically maps those variables into Flutter Widgets.

  • Primitives and Components: pdf_viewer, markdown_editor, forms_mcq, etc.
  • The Chatbot Companion: A “Sidecar” attachable to any widget to offer real-time Socratic tutoring.
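
A minimal sketch of the deterministic mapping step, assuming the backend emits widget descriptors as plain dictionaries for the Flutter client to render. The A2UI wire format is not specified in this document, so the descriptor shape (`widget`, `props`), the registry, and the `render` function are hypothetical; the widget names are taken from the report catalog in this document.

```python
from typing import Callable, Dict, List

WidgetBuilder = Callable[[dict], dict]


def build_score_header(payload: dict) -> dict:
    return {"widget": "score_header_widget", "props": {"score": payload["overall_score"]}}


def build_feedback(payload: dict) -> dict:
    return {
        "widget": "markdown_viewer_widget",
        "props": {"markdown": payload["constructive_feedback"]},
    }


# Registry: widget name -> function that fills it from the AI's JSON payload.
REGISTRY: Dict[str, WidgetBuilder] = {
    "score_header_widget": build_score_header,
    "markdown_viewer_widget": build_feedback,
}


def render(payload: dict, layout: List[str]) -> List[dict]:
    """Map payload variables into widget descriptors, in layout order."""
    return [REGISTRY[name](payload) for name in layout]


tree = render(
    {"overall_score": 7, "constructive_feedback": "Nice structure."},
    ["score_header_widget", "markdown_viewer_widget"],
)
```

The model never chooses a widget; it only supplies values, and the same payload always yields the same descriptor list.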

Parameterization (Key-Value)

The engineer declares the configuration_schema in the ExerciseType, and the pedagogue injects the variables into the C1 Recipe (e.g., word_count_min: 250, enable_companion_chatbot: true).
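
The split between engineer and pedagogue might look like the sketch below. The two parameter names come from the example above; the schema layout, the default values, and the `resolve_config` function are assumptions made for illustration.

```python
# Declared by the engineer in the ExerciseType (hypothetical layout and defaults).
CONFIGURATION_SCHEMA = {
    "word_count_min": {"type": int, "default": 0},
    "enable_companion_chatbot": {"type": bool, "default": False},
}


def resolve_config(schema: dict, recipe_values: dict) -> dict:
    """Validate the pedagogue's values against the schema and fill in defaults."""
    unknown = set(recipe_values) - set(schema)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for key, spec in schema.items():
        value = recipe_values.get(key, spec["default"])
        # Exact type match, so True is not silently accepted as an int.
        if type(value) is not spec["type"]:
            raise TypeError(f"{key} must be of type {spec['type'].__name__}")
        resolved[key] = value
    return resolved


# Injected by the pedagogue in the C1 Recipe.
config = resolve_config(
    CONFIGURATION_SCHEMA,
    {"word_count_min": 250, "enable_companion_chatbot": True},
)
```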

The Reporting Engine: Convention over Configuration

Default Layouts (Zero-Config Report)

Teachers do not program interfaces. The A2UI protocol has default templates. A creator only points to the exerciseTypeRef, and the backend automatically assembles the report. The report catalog includes:

  • Family A (Layout): document_header_widget (protects PII), section_title_widget, divider_widget.
  • Family B (Core Evaluation): score_header_widget, criteria_breakdown_list_widget, spelling_diff_table_widget.
  • Family C (Creative Feedback): markdown_viewer_widget (the “Escape Hatch” for teacher creativity through constructive_feedback, protecting the structured overall_score).

Advanced Override (The Override Pattern)

A school district can alter the default report by explicitly declaring the report_components block in its YAML, reordering widgets, or adding new ones (e.g., competency_radar_chart_widget).
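
The interplay between the default layout and the override can be sketched as a simple resolution rule. The recipe is shown as an already-parsed YAML dictionary; the `DEFAULT_REPORTS` table, the `argumentative-essay` key, and the `resolve_report` function are hypothetical, while `exerciseTypeRef`, `report_components`, and the widget names come from the text.

```python
# Hypothetical default layouts, keyed by exercise type.
DEFAULT_REPORTS = {
    "argumentative-essay": [
        "document_header_widget",
        "score_header_widget",
        "criteria_breakdown_list_widget",
        "markdown_viewer_widget",
    ],
}


def resolve_report(recipe: dict) -> list:
    """Return the widget list: the explicit override if present, else the default."""
    override = recipe.get("report_components")
    if override:  # a missing or empty block falls back to the default
        return list(override)
    return list(DEFAULT_REPORTS[recipe["exerciseTypeRef"]])


zero_config = resolve_report({"exerciseTypeRef": "argumentative-essay"})

district = resolve_report({
    "exerciseTypeRef": "argumentative-essay",
    "report_components": [
        "document_header_widget",
        "competency_radar_chart_widget",
        "score_header_widget",
    ],
})
```

Treating the override as a full replacement keeps the rule easy to audit; a real system might instead merge the two lists.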

Final Benefits of Abstraction

  1. Immaculate UX: Zero code for the teacher.
  2. Token Savings: The LLM returns pure JSON; it does not design Markdown tables.
  3. Data Privacy (PII/FERPA): The LLM never sees the student’s name; the backend injects it later into the document_header_widget.
  4. Forensic Traceability: Fundamental for Government audits (B2G).
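
The privacy benefit in point 3 can be illustrated with a short sketch in which the student's name is dropped before the evaluator call and re-attached afterwards. The field names (`submission_id`, `text`, `student_name`) and both functions are hypothetical, and the sketch does not scrub names that a student may have typed inside the essay itself.

```python
def build_llm_input(submission: dict) -> dict:
    """Keep only the fields the evaluator needs; the student name is left out."""
    return {"submission_id": submission["submission_id"], "text": submission["text"]}


def inject_header(report: list, submission: dict) -> list:
    """Backend step after evaluation: prepend the header that carries the PII."""
    header = {
        "widget": "document_header_widget",
        "props": {"student_name": submission["student_name"]},
    }
    return [header] + report


submission = {"submission_id": "s-001", "student_name": "Ada", "text": "My essay."}
llm_input = build_llm_input(submission)

evaluated = [{"widget": "score_header_widget", "props": {"score": 9}}]
final_report = inject_header(evaluated, submission)
```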