Extended Thinking

Extended thinking gives an agent a token-budgeted scratchpad for chain-of-thought reasoning before it produces its final answer. This tends to improve answer quality on complex, multi-step tasks.

Enabling thinking

Pass a thinking config when creating the agent:

thinking.py

from veska import Agent

agent = Agent(
    name="analyst",
    system_prompt="You analyze complex problems step by step.",
    model="claude-sonnet-4-6",
    thinking={
        "enabled": True,
        "budget_tokens": 10000,   # Max tokens for thinking
        "output": "discard",      # What to do with thinking text
    },
)

Output modes

Mode	Behavior
`discard`	Thinking text is used internally but not returned (default)
`log`	Thinking text is logged via the Logger system
`expose`	Thinking text is returned in the response for debugging
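
To make the three modes concrete, here is a minimal sketch of how thinking text could be routed for each mode. It is not veska's implementation: `route_thinking`, the `thinking_sketch` logger name, and the shape of the returned dict are hypothetical and used only for illustration.

```python
import logging

# Hypothetical logger name; veska's own Logger system is not modeled here.
logger = logging.getLogger("thinking_sketch")

VALID_MODES = ("discard", "log", "expose")


def route_thinking(mode, thinking_text, answer):
    """Hypothetical helper: build a response dict according to the output mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown output mode: {mode!r}")

    response = {"output": answer}
    if mode == "log":
        # Thinking text goes to the log, not to the caller.
        logger.info("thinking: %s", thinking_text)
    elif mode == "expose":
        # Thinking text is returned alongside the final answer.
        response["thinking"] = thinking_text
    # "discard": the thinking text is simply dropped.
    return response
```

In this sketch only `expose` changes what the caller receives; `discard` and `log` return the same response and differ only in the side effect.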

Exposing thinking

expose.py

from veska import Agent

agent = Agent(
    name="debugger",
    system_prompt="You debug code issues.",
    model="claude-sonnet-4-6",
    thinking={"enabled": True, "budget_tokens": 15000, "output": "expose"},
)

result = agent.run("Why does this function return None?")
# result.output contains the final answer.
# With output="expose", the thinking text is also returned in the response.

Streaming with thinking

When streaming, thinking chunks arrive as `thinking_delta` events:

stream_thinking.py

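# Reuses the `agent` created in expose.py above.
def on_token(text):
    # Print each streamed chunk as it arrives, without a trailing newline.
    print(text, end="", flush=True)

result = agent.run("Solve this math problem", stream=on_token)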
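
If you want to keep thinking chunks apart from the answer text while streaming, a collector along the following lines may help. This is a sketch under assumptions: it supposes each streamed event is a dict with `type` and `text` keys, and that answer chunks use a `text_delta` type. Only the `thinking_delta` name comes from this page; the event shape, `text_delta`, and `StreamCollector` are hypothetical, and the events are simulated rather than produced by veska.

```python
class StreamCollector:
    """Hypothetical collector that keeps thinking and answer text apart."""

    def __init__(self):
        self.thinking_parts = []
        self.answer_parts = []

    def on_event(self, event):
        # Assumed event shape: {"type": "...", "text": "..."}
        if event["type"] == "thinking_delta":
            self.thinking_parts.append(event["text"])
        else:
            self.answer_parts.append(event["text"])

    @property
    def thinking(self):
        return "".join(self.thinking_parts)

    @property
    def answer(self):
        return "".join(self.answer_parts)


# Simulated events standing in for a real stream.
simulated_events = [
    {"type": "thinking_delta", "text": "17 * 3 = 51, "},
    {"type": "thinking_delta", "text": "then add 4."},
    {"type": "text_delta", "text": "The answer "},
    {"type": "text_delta", "text": "is 55."},
]

collector = StreamCollector()
for event in simulated_events:
    collector.on_event(event)
```

After the loop, `collector.thinking` holds the joined reasoning text and `collector.answer` holds the joined final answer, so the two can be displayed or stored separately.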

ThinkingHandler

handler.py

from veska import ThinkingHandler

handler = ThinkingHandler(
    enabled=True,
    budget_tokens=10000,
    output="log",
)

config = handler.get_config()     # ThinkingConfig for the provider
handler.get_history()             # Past thinking entries
handler.clear_history()           # Clear thinking history

Provider support

Extended thinking is available only on compatible Claude models. Call `provider.supports_thinking()` to check whether the current model supports it before enabling it.
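
One way to use this check is to build the thinking config conditionally. The sketch below is illustrative only: `build_thinking_config` and `StubProvider` are hypothetical names, the stub stands in for a real provider object, and it assumes that a config of `{"enabled": False}` turns thinking off.

```python
def build_thinking_config(provider, budget_tokens=10000, output="discard"):
    """Hypothetical helper: return a thinking config suited to the provider."""
    if not provider.supports_thinking():
        # Assumption: a config with enabled=False disables thinking.
        return {"enabled": False}
    return {"enabled": True, "budget_tokens": budget_tokens, "output": output}


class StubProvider:
    """Stand-in for a real provider, used only to exercise the helper."""

    def __init__(self, supported):
        self._supported = supported

    def supports_thinking(self):
        return self._supported


with_thinking = build_thinking_config(StubProvider(True), budget_tokens=15000)
without_thinking = build_thinking_config(StubProvider(False))
```

The resulting dict could then be passed as the `thinking` argument when creating the agent, so the same code path works on models with and without support.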