Feature · Streaming

See responses as they happen

Don't make users stare at a loading spinner. Stream tokens in real time with async generators. One parameter turns it on.

Live comparison

The difference is instant

Watch the same request side by side. Without streaming, you wait. With streaming, you see every token as it arrives.

Without Streaming

What is Veska?

Waiting for response...

With Streaming

What is Veska?

How it works

Three steps to real-time responses

User sends a message

Your app calls agent.run() with stream=True. The agent starts processing immediately.

Tokens stream back

The AI provider sends tokens one by one. Veska yields each through an async generator.

App displays instantly

Your frontend renders each token as it arrives. Users watch the response build in real time.
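The three steps above might look like the following sketch. `FakeAgent` is a hypothetical stand-in, not Veska's own class; the only detail taken from this page is `agent.run()` with `stream=True`. That the call returns an async generator directly (rather than needing an `await` first) is an assumption, so check the full documentation for the real signature.

```python
import asyncio
from typing import AsyncIterator, List


class FakeAgent:
    """Hypothetical stand-in for a Veska agent; not the real class."""

    async def _tokens(self, prompt: str) -> AsyncIterator[str]:
        # Pretend the provider emits the reply one token at a time.
        for token in ["Veska ", "streams ", "tokens."]:
            await asyncio.sleep(0)  # yield control, as a network read would
            yield token

    def run(self, prompt: str, stream: bool = False) -> AsyncIterator[str]:
        # Assumption: with stream=True, run() hands back an async generator.
        if not stream:
            raise NotImplementedError("only the streaming path is sketched here")
        return self._tokens(prompt)


async def show_reply(agent: FakeAgent, prompt: str) -> str:
    """Print each token as it arrives and return the assembled text."""
    parts: List[str] = []
    async for token in agent.run(prompt, stream=True):  # steps 1 and 2
        print(token, end="", flush=True)  # step 3: display immediately
        parts.append(token)
    print()
    return "".join(parts)
```

Calling `asyncio.run(show_reply(FakeAgent(), "What is Veska?"))` prints the tokens one by one and returns the full reply as a single string.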

Benefits

Why streaming matters

Faster perceived speed

Users see the first token as soon as the provider produces it, rather than waiting for the full response to finish.
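One way to check this claim in your own app is to time the first token separately from the full reply. The sketch below is illustrative only: `fake_stream` is a hypothetical token source standing in for a streaming agent call, and the delay is made up.

```python
import asyncio
import time
from typing import AsyncIterator, List, Optional, Tuple


async def fake_stream(delay: float = 0.01) -> AsyncIterator[str]:
    """Hypothetical token source standing in for a streaming agent call."""
    for token in ["one ", "two ", "three ", "four"]:
        await asyncio.sleep(delay)  # simulated gap between tokens
        yield token


async def measure(stream: AsyncIterator[str]) -> Tuple[float, float, str]:
    """Return (seconds to first token, seconds to full reply, reply text)."""
    start = time.perf_counter()
    first: Optional[float] = None
    parts: List[str] = []
    async for token in stream:
        if first is None:
            first = time.perf_counter() - start  # time to first token
        parts.append(token)
    total = time.perf_counter() - start
    # An empty stream has no first token; fall back to the total time.
    return (first if first is not None else total), total, "".join(parts)
```

The gap between the two timings is roughly the wait that streaming hides from the user.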

Works with tools

Streaming continues to work when agents use tools mid-response. No interruption.
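To see why a tool call need not break the stream, consider this minimal sketch of the general async-generator pattern. It is not Veska's implementation: `lookup_weather` and `reply_with_tool` are hypothetical names, and how Veska actually surfaces tool calls in the stream is not described on this page.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def lookup_weather(city: str) -> str:
    """Hypothetical tool; a real one would call an external service."""
    await asyncio.sleep(0)
    return "sunny"


async def reply_with_tool(
    tool: Callable[[str], Awaitable[str]],
) -> AsyncIterator[str]:
    """Yield tokens, await a tool, then keep yielding on the same stream."""
    for token in ["The ", "weather ", "is "]:
        yield token
    # The generator suspends here while the tool runs; the consumer's
    # `async for` loop simply waits for the next token.
    result = await tool("Oslo")
    for token in [result, "."]:
        yield token


async def collect(stream: AsyncIterator[str]) -> str:
    """Join every token from the stream into one string."""
    return "".join([token async for token in stream])
```

From the consumer's side there is a single loop and a single stream; the tool call shows up only as a slightly longer pause between two tokens.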

One parameter

Enable streaming by passing stream=True to run(). No config changes, no infrastructure setup.

SSE-ready

Wrap the async generator in Server-Sent Events for real-time web apps. Works with any frontend.
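A framework-free sketch of that wrapping step follows. The helper names `sse_frame` and `to_sse` and the closing `done` event are illustrative assumptions, not part of Veska; the wire format itself (`data:` lines ending in a blank line) comes from the Server-Sent Events standard.

```python
import asyncio
from typing import AsyncIterator, List, Optional


def sse_frame(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events message."""
    lines: List[str] = []
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines must be split across several data: lines.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


async def to_sse(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Turn a token stream into a stream of SSE-formatted strings."""
    async for token in tokens:
        yield sse_frame(token)
    # Hypothetical end marker so the browser knows when to close the stream.
    yield sse_frame("", event="done")
```

Most Python web frameworks can send an async generator like `to_sse(...)` as a streaming response; the response needs the `text/event-stream` content type for the browser's `EventSource` to accept it.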

Architecture

Tokens flow from provider to user

When streaming is enabled, the AI provider sends tokens one at a time. Veska yields each token through an async generator, and your app renders it as soon as it arrives, without waiting for the full response to be buffered.

Async generator — standard Python pattern

Works with both providers

Handles tool calls mid-stream seamlessly

Compatible with SSE, WebSockets, or any transport
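Because the stream is a plain async generator, the transport can stay a detail at the edge of your app. The sketch below assumes only that your transport exposes some awaitable `send` callable (a WebSocket's send method, an SSE writer, and so on); `pump` and `Send` are hypothetical names used for illustration.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# Any coroutine function that delivers one text chunk to the client.
Send = Callable[[str], Awaitable[None]]


async def pump(tokens: AsyncIterator[str], send: Send) -> int:
    """Forward each token to `send` as soon as it arrives; return the count."""
    count = 0
    async for token in tokens:
        await send(token)  # no buffering: one token in, one message out
        count += 1
    return count
```

Switching from SSE to WebSockets then means passing a different `send` callable; the code that consumes the stream stays the same.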

Read the full documentation

Start streaming today

Enable real-time responses with a single parameter.

Get Started

Next: Structured Output