Extended Thinking

Extended thinking gives an agent a token-budgeted scratchpad for chain-of-thought reasoning before it produces its final answer. This can improve answer quality on complex, multi-step tasks.

Enabling thinking

Pass a thinking config when creating the agent:

thinking.py
from veska import Agent

agent = Agent(
    name="analyst",
    system_prompt="You analyze complex problems step by step.",
    model="claude-sonnet-4-6",
    thinking={
        "enabled": True,
        "budget_tokens": 10000,   # Maximum tokens the model may spend thinking
        "output": "discard",      # "discard" (default), "log", or "expose"
    },
)
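Because the thinking config is a plain dict, it can help to check it before passing it to Agent. The sketch below is a hypothetical helper, `validate_thinking_config`, that is not part of veska. It checks the keys shown above against the output modes listed on this page and fills in the documented default of "discard". Requiring `enabled` and requiring a positive integer `budget_tokens` are assumptions, not documented rules.

```python
# Hypothetical helper, not part of veska: checks a thinking dict against
# the keys and output modes described on this page.
VALID_OUTPUT_MODES = {"discard", "log", "expose"}


def validate_thinking_config(config: dict) -> dict:
    """Return a copy of config with the output default filled in, or raise ValueError."""
    result = {"output": "discard", **config}  # "discard" is the documented default
    if not isinstance(result.get("enabled"), bool):
        raise ValueError("'enabled' must be a bool")
    if result["enabled"]:
        budget = result.get("budget_tokens")
        # bool is a subclass of int, so reject it explicitly
        if isinstance(budget, bool) or not isinstance(budget, int) or budget <= 0:
            raise ValueError("'budget_tokens' must be a positive int")
    if result["output"] not in VALID_OUTPUT_MODES:
        raise ValueError(f"unknown output mode: {result['output']!r}")
    return result


cfg = validate_thinking_config({"enabled": True, "budget_tokens": 10000})
print(cfg["output"])  # discard
```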

Output modes

Mode      Behavior
discard   Thinking text is used internally but not returned (default)
log       Thinking text is logged via the Logger system
expose    Thinking text is returned in the response for debugging
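To make the three modes concrete, here is a minimal sketch of how a block of thinking text could be routed under each one. It is an illustration of the table's semantics, not veska's implementation: Python's standard `logging` module stands in for veska's Logger system, and `route_thinking` is a hypothetical name.

```python
import logging

# Stand-in for veska's Logger system; the logger name is hypothetical.
logger = logging.getLogger("thinking-demo")


def route_thinking(thinking_text: str, mode: str):
    """Apply one output mode to a block of thinking text.

    Returns the text to include in the response, or None if it is withheld.
    """
    if mode == "discard":
        return None  # used internally only, never returned
    if mode == "log":
        logger.info("thinking: %s", thinking_text)
        return None  # sent to the logger, not returned
    if mode == "expose":
        return thinking_text  # returned with the answer for debugging
    raise ValueError(f"unknown output mode: {mode!r}")


print(route_thinking("Check the base case first.", "expose"))
```

Only "expose" changes what the caller receives; "discard" and "log" differ in whether the text leaves a trace outside the response.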

Exposing thinking

expose.py
from veska import Agent

agent = Agent(
    name="debugger",
    system_prompt="You debug code issues.",
    model="claude-sonnet-4-6",
    thinking={"enabled": True, "budget_tokens": 15000, "output": "expose"},
)

result = agent.run("Why does this function return None?")
# result.output contains the final answer.
# Because output="expose", the thinking text is also returned in the
# response, so you can inspect how the model reasoned about the problem.

Streaming with thinking

When streaming, thinking chunks arrive as thinking_delta events. Pass a callback with stream= to receive streamed text as it is produced:

stream_thinking.py
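# Reuses the `agent` defined in expose.py above.
def on_token(text):
    # Print each streamed chunk as it arrives, without adding newlines
    print(text, end="")

result = agent.run("Solve this math problem", stream=on_token)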
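If you want to keep thinking_delta chunks apart from answer text, a small collector is one option. This is a sketch under assumptions: events are modeled as hypothetical `(event_type, text)` tuples, `text_delta` is an invented name for answer chunks, and `StreamCollector` is not a veska class. Check veska's streaming API for the real callback signature before adapting it.

```python
from dataclasses import dataclass, field


@dataclass
class StreamCollector:
    """Hypothetical collector that keeps thinking and answer chunks apart."""
    thinking: list[str] = field(default_factory=list)
    answer: list[str] = field(default_factory=list)

    def handle(self, event_type: str, text: str) -> None:
        if event_type == "thinking_delta":
            self.thinking.append(text)  # buffer reasoning, do not print it
        else:
            # Treat every other event as part of the final answer
            self.answer.append(text)
            print(text, end="")


# Simulated event stream; in a real run these would come from the agent.
events = [
    ("thinking_delta", "Let x be the unknown. "),
    ("thinking_delta", "Then 2x = 10, so x = 5."),
    ("text_delta", "x = 5"),
]

collector = StreamCollector()
for event_type, text in events:
    collector.handle(event_type, text)
print()  # end the streamed answer line
print("".join(collector.thinking))
```

Buffering the thinking chunks instead of printing them keeps the user-facing stream limited to the answer while still letting you inspect the reasoning afterwards.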

ThinkingHandler

ThinkingHandler holds a thinking configuration and keeps a history of past thinking entries:

handler.py
from veska import ThinkingHandler

handler = ThinkingHandler(
    enabled=True,
    budget_tokens=10000,
    output="log",
)

config = handler.get_config()     # ThinkingConfig for the provider
handler.get_history()             # Past thinking entries
handler.clear_history()           # Clear thinking history
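The toy class below sketches the bookkeeping those three methods imply. It is a hypothetical stand-in, not veska's ThinkingHandler: the names `ToyThinkingHandler`, `ToyThinkingConfig`, and the `record` hook are invented for illustration, and storing history entries as plain strings is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyThinkingConfig:
    """Hypothetical immutable config, standing in for ThinkingConfig."""
    enabled: bool
    budget_tokens: int
    output: str


class ToyThinkingHandler:
    """Hypothetical stand-in that mirrors the three methods shown above."""

    def __init__(self, enabled: bool, budget_tokens: int, output: str = "discard"):
        self._config = ToyThinkingConfig(enabled, budget_tokens, output)
        self._history: list[str] = []

    def get_config(self) -> ToyThinkingConfig:
        return self._config

    def record(self, thinking_text: str) -> None:
        # Hypothetical hook: called once per completed thinking block
        self._history.append(thinking_text)

    def get_history(self) -> list[str]:
        return list(self._history)  # return a copy so callers cannot mutate state

    def clear_history(self) -> None:
        self._history.clear()


handler = ToyThinkingHandler(enabled=True, budget_tokens=10000, output="log")
handler.record("Compared two approaches; chose the iterative one.")
print(len(handler.get_history()))  # 1
handler.clear_history()
print(len(handler.get_history()))  # 0
```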

Provider support

Extended thinking is supported on compatible Claude models. Use provider.supports_thinking() to check if the current model supports it.
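A common pattern is to enable thinking only when the provider reports support for it. The sketch below assumes that a config of `{"enabled": False}` turns thinking off; `StubProvider` and `thinking_config_for` are hypothetical names, and only the `supports_thinking()` method name comes from this page. The supported-model set is illustrative, not an authoritative list.

```python
class StubProvider:
    """Hypothetical provider; only supports_thinking() is named in the docs."""

    def __init__(self, model: str, thinking_models: set[str]):
        self.model = model
        self._thinking_models = thinking_models

    def supports_thinking(self) -> bool:
        return self.model in self._thinking_models


def thinking_config_for(provider, budget_tokens: int = 10000) -> dict:
    """Build a thinking config, enabling it only if the provider supports it."""
    if provider.supports_thinking():
        return {"enabled": True, "budget_tokens": budget_tokens, "output": "discard"}
    return {"enabled": False}


supported = {"claude-sonnet-4-6"}  # illustrative only
print(thinking_config_for(StubProvider("claude-sonnet-4-6", supported)))
# {'enabled': True, 'budget_tokens': 10000, 'output': 'discard'}
print(thinking_config_for(StubProvider("some-other-model", supported)))
# {'enabled': False}
```

Checking support up front lets the same agent definition run against models with and without extended thinking.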