Feature · Streaming

See responses as they happen

Don't make users stare at a loading spinner. Stream tokens in real time through an async generator, enabled with a single parameter.

Live comparison

The difference is instant

Watch the same request side by side. Without streaming, you wait. With streaming, you see every token as it arrives.

[Interactive demo: two chat panels answer "What is Veska?". The "Without Streaming" panel shows "Waiting for response..."; the "With Streaming" panel shows a timer starting at 0.0s.]

How it works

Three steps to real-time responses

1

User sends a message

Your app calls agent.run() with stream=True. The agent starts processing immediately.

2

Tokens stream back

The AI provider sends tokens one by one. Veska yields each through an async generator.

3

App displays instantly

Your frontend renders each token as it arrives. Users see the response building in real time.
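The three steps above can be sketched roughly as follows. This page does not show Veska's actual classes, so `FakeAgent`, its canned reply, and the `collect` helper are hypothetical stand-ins. Only the `run()` call with `stream=True` and the async generator come from the text.

```python
import asyncio
from typing import AsyncIterator, List


class FakeAgent:
    """Hypothetical stand-in for a Veska agent; not the real API."""

    async def run(self, message: str, stream: bool = False) -> AsyncIterator[str]:
        # Canned reply used purely for illustration.
        reply = "Tokens arrive one at a time."
        # With stream=True, emit word-sized chunks; otherwise one big chunk.
        chunks = reply.split(" ") if stream else [reply]
        for i, chunk in enumerate(chunks):
            await asyncio.sleep(0.01)  # simulate provider latency
            yield chunk if i == 0 else " " + chunk


async def collect(agent: FakeAgent, message: str) -> List[str]:
    """Step 1: send the message. Step 2: receive tokens. Step 3: display them."""
    rendered: List[str] = []
    async for token in agent.run(message, stream=True):
        print(token, end="", flush=True)  # render each token immediately
        rendered.append(token)
    print()
    return rendered


if __name__ == "__main__":
    asyncio.run(collect(FakeAgent(), "What is Veska?"))
```

The `async for` loop is the part that should carry over to a real agent: each iteration hands your app one token as soon as it is available.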

Benefits

Why streaming matters

Faster perceived speed

Users see the first token in milliseconds instead of waiting seconds for the full response.
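One way to check this claim in your own app is to time the first token against the full response. The sketch below uses a simulated stream (`fake_stream` is made up, with an arbitrary delay per token), so its numbers say nothing about real provider latency. The `measure` helper should work with any async generator of tokens.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def fake_stream(n_tokens: int = 5, delay: float = 0.02) -> AsyncIterator[str]:
    """Simulated provider: emits one token every `delay` seconds."""
    for i in range(n_tokens):
        await asyncio.sleep(delay)
        yield f"tok{i} "


async def measure(stream: AsyncIterator[str]) -> Tuple[Optional[float], float]:
    """Return (seconds until the first token, seconds until the stream ends)."""
    start = time.perf_counter()
    first: Optional[float] = None
    async for _ in stream:
        if first is None:
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    return first, total


if __name__ == "__main__":
    first, total = asyncio.run(measure(fake_stream()))
    print(f"first token after {first:.3f}s, full response after {total:.3f}s")
```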

Works with tools

Streaming continues to work when agents use tools mid-response. No interruption.

One parameter

Enable streaming by passing stream=True to run(). No config changes, no infrastructure setup.

SSE-ready

Wrap the async generator in Server-Sent Events for real-time web apps. Works with any frontend.
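A minimal sketch of that wrapping, assuming the stream yields plain strings. `to_sse`, `demo_tokens`, and the `done` event name are illustrative choices, not part of Veska. The frame layout (`data:` lines ending in a blank line) follows the Server-Sent Events format.

```python
import asyncio
import json
from typing import AsyncIterator, List


async def to_sse(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap each token in an SSE `data:` frame, then signal completion."""
    async for token in tokens:
        # JSON-encode so a newline inside a token cannot break the framing.
        yield f"data: {json.dumps(token)}\n\n"
    # "done" is a made-up event name the client would listen for.
    yield "event: done\ndata: {}\n\n"


async def demo_tokens() -> AsyncIterator[str]:
    """Stand-in for the agent's token stream."""
    for token in ["Hello", ",", " world", "\n"]:
        yield token


async def collect_frames() -> List[str]:
    return [frame async for frame in to_sse(demo_tokens())]


if __name__ == "__main__":
    for frame in asyncio.run(collect_frames()):
        print(repr(frame))
```

In a web framework, a generator like `to_sse(...)` would typically be returned as a streaming response with the `text/event-stream` content type. On the browser side, an `EventSource` can parse each `data:` payload with `JSON.parse`.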

[Diagram: Your App, Veska Agent, AI Provider. Tokens stream from the provider through the agent to your app, and the user sees them appear in real time.]

Architecture

Tokens flow from provider to user

When streaming is enabled, the AI provider sends tokens one at a time. Veska yields each token through an async generator, and your app renders it as soon as it arrives, with no buffering and no waiting.

Async generator: a standard Python pattern
Works with both providers
Handles tool calls mid-stream seamlessly
Compatible with SSE, WebSockets, or any transport
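Because the stream is an ordinary async generator, the transport can be kept behind a single callable. The sketch below is one possible shape for that. `forward`, `Send`, and `demo_tokens` are hypothetical names, and the list-backed `send` in the demo stands in for a real connection.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List

# Any coroutine function that delivers one string to the client.
Send = Callable[[str], Awaitable[None]]


async def forward(tokens: AsyncIterator[str], send: Send) -> int:
    """Push each token to `send` as soon as it arrives; return the count."""
    count = 0
    async for token in tokens:
        await send(token)  # no buffering: one send per token
        count += 1
    return count


async def demo_tokens() -> AsyncIterator[str]:
    """Stand-in for the agent's token stream."""
    for token in ["Hel", "lo", "!"]:
        yield token


async def demo() -> List[str]:
    received: List[str] = []

    async def send(token: str) -> None:
        # A real app would write to a WebSocket or HTTP response here.
        received.append(token)

    await forward(demo_tokens(), send)
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Swapping transports then means swapping the `send` callable, while the loop that consumes the generator stays the same.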
Read the full documentation

Start streaming today

Enable real-time responses with a single parameter.