Software Engineering Team CU Dept. of Biomedical Informatics

Typed LLM Outputs with Instructor


These blog posts are intended to provide software tips, concepts, and tools geared towards helping you achieve your goals. Views expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you have any questions or suggestions for blog posts, please don’t hesitate to reach out!

Introduction

LLMs are good at generating natural-language text, but applications usually need structured data. instructor is a small Python library that lets you describe the output you want as a Pydantic model; the model supplies the schema, and instructor validates the LLM response against it automatically. That means less prompt glue, less manual parsing, and clearer types in your code, even when running a local model in-process with llama-cpp-python.
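Before involving any LLM, it helps to see what Pydantic itself contributes. A minimal sketch (the fields mirror the TicketSummary model used later in this post): a well-formed JSON reply validates into a typed object, while a malformed one raises an error instead of silently producing a partial result.

```python
from pydantic import BaseModel, ValidationError


class TicketSummary(BaseModel):
    title: str
    priority: str
    owner: str


# A well-formed reply validates into a typed Python object
ok = TicketSummary.model_validate_json(
    '{"title": "Slow query", "priority": "High", "owner": "Avery"}'
)
print(ok.owner)

# A reply missing required fields raises ValidationError
try:
    TicketSummary.model_validate_json('{"title": "Slow query"}')
except ValidationError as exc:
    print("invalid reply:", len(exc.errors()), "problems")
```

This fail-loudly behavior is exactly what instructor layers on top of the LLM call.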

Without instructor

Without a typing layer, we often end up asking for JSON through prompting alone and hoping the model follows the format closely enough.

from pydantic import BaseModel


# The shape we want back; on the prompt-only path nothing enforces it
class TicketSummary(BaseModel):
    title: str
    priority: str
    owner: str

# `llm` is assumed to be an already-loaded llama_cpp.Llama instance
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": (
                "Extract a support ticket summary from this text. Return title, "
                "priority, and owner. Use the first name only for the owner. "
                "'Our benchmark says the refactor is 10x faster, but only if "
                "we start the timer after the function returns. Assign this to "
                "Avery with high priority.' Return JSON with exactly these "
                "fields: title, priority, owner."
            ),
        }
    ],
    temperature=0,
)

print(response["choices"][0]["message"]["content"])

This is where the prompt-only path often falls apart: even when you ask for JSON, the model may still return headings, prose, or a shape that is close to what you wanted but not quite what your application expects.

Example output from the prompt-only path:

Here is the extracted support ticket summary in JSON format:

{
  "title": "Refactor Performance Issue",
  "priority": "High",
  "owner": "Avery"
}

With instructor

With instructor, the schema becomes part of the request.

import instructor
from pydantic import BaseModel


class TicketSummary(BaseModel):
    title: str
    priority: str
    owner: str

# Patch the in-process llama-cpp-python call so it accepts response_model;
# `llm` is the same llama_cpp.Llama instance as before
create = instructor.patch(
    create=llm.create_chat_completion_openai_v1,
    mode=instructor.Mode.MD_JSON,
)

ticket = create(
    response_model=TicketSummary,
    messages=[
        {
            "role": "user",
            "content": (
                "Extract a support ticket summary from this text. Return title, "
                "priority, and owner. Use the first name only for the owner. "
                "'Our benchmark says the refactor is 10x faster, but only if "
                "we start the timer after the function returns. Assign this to "
                "Avery with high priority.'"
            ),
        }
    ],
    temperature=0,
)

print(ticket)

For this local llama-cpp-python setup, instructor.Mode.MD_JSON proved more reliable in testing than the stricter JSON-schema mode. We use instructor.patch(...) because this example runs the model in-process through llama-cpp-python; if you are talking to an OpenAI-compatible server, instructor.from_provider(...) is often the cleaner option.

Example result:

TicketSummary(
    title="Refactor Optimization Issue",
    priority="High",
    owner="Avery",
)

We do not recommend starting benchmarks after the function returns, no matter how impressive the speedup looks!

How It Works

At a high level, instructor takes your Pydantic model, turns it into output guidance for the LLM call, and then validates what comes back. If the response matches the expected shape, you get a normal Python object. If not, instructor can retry or raise an error instead of quietly leaving you with malformed output text.
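The validate-then-retry loop can be sketched with the standard library alone. This is an illustration, not instructor's actual implementation: `fake_llm` is a hypothetical stand-in for a model call, and the field check stands in for the Pydantic validation instructor really performs.

```python
import json

REQUIRED_FIELDS = {"title", "priority", "owner"}


def fake_llm(attempt):
    # Hypothetical stand-in for a model call: the first reply is
    # malformed prose, the retry returns valid JSON
    replies = [
        "Sure! The ticket is about refactor performance.",
        '{"title": "Refactor", "priority": "High", "owner": "Avery"}',
    ]
    return replies[attempt]


def create_with_retries(max_retries=2):
    for attempt in range(max_retries):
        raw = fake_llm(attempt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            # instructor re-asks the model here, feeding back the
            # validation error so it can correct itself
            continue
        if REQUIRED_FIELDS <= data.keys():
            return data
    raise ValueError("no valid structured output after retries")


ticket = create_with_retries()
print(ticket)
```

The key design point is that failure is explicit: after exhausting retries you get an exception, never a half-parsed object.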

Advantages of instructor

- Typed results: responses arrive as validated Pydantic objects rather than raw text you parse by hand.
- Less prompt glue: the schema travels with the request instead of being restated in the prompt.
- Explicit failure handling: malformed responses can trigger retries or raise an error instead of quietly passing bad text downstream.

Disadvantages

- One more dependency and a patching layer between your code and the model client.
- Mode selection can take experimentation; for this local setup, MD_JSON worked better than the stricter JSON-schema mode.
- Retries on invalid output cost extra model calls, adding latency.

The full demo used for this post is included in this repository as examples/instructor_demo.py.

Run It Locally

This version does not require Ollama or a local model server. On first run, the script downloads a small public GGUF model, may build the native llama-cpp-python backend, and then runs the model in-process:

uv run --with instructor --with llama-cpp-python --with pydantic \
  https://raw.githubusercontent.com/CU-DBMI/set-website/main/examples/instructor_demo.py