The AI Agent

What the agent can do, voice vs text interfaces, and how to configure agents per project.

Ordinary Animator includes an AI agent that can answer questions about your production, navigate to any entity, retrieve data, and help with production tasks. It's accessible via voice or text.

What the agent can do

The agent has a set of tools it can call:

Navigation — open any page (project, episode, scene, shot, character, location, prop) by name or ID
Entity queries — list all characters, find a location by description, retrieve shot context
Data retrieval — read screenplay text, look up voice settings, check what media a shot has
UI commands — trigger actions on the current page
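
The tool categories above can be pictured as a small registry that maps a tool name to a handler. The following is a minimal sketch under stated assumptions, not Ordinary Animator's actual implementation: ToolRegistry, open_page, list_characters, and the category strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Tool:
    name: str
    category: str  # hypothetical labels, e.g. "navigation" or "entity_queries"
    handler: Callable[..., dict]


class ToolRegistry:
    """Hypothetical registry: the agent looks tools up by name and calls them."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, category: str):
        # Decorator that stores the function under the given name and category.
        def decorator(fn: Callable[..., dict]) -> Callable[..., dict]:
            self._tools[name] = Tool(name, category, fn)
            return fn
        return decorator

    def call(self, name: str, **kwargs) -> dict:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].handler(**kwargs)

    def names_in(self, category: str) -> List[str]:
        return sorted(t.name for t in self._tools.values() if t.category == category)


registry = ToolRegistry()


@registry.register("open_page", "navigation")
def open_page(entity_type: str, name_or_id: str) -> dict:
    # Assumed shape of a navigation result; the real payload is not documented here.
    return {"action": "navigate", "entity_type": entity_type, "target": name_or_id}


@registry.register("list_characters", "entity_queries")
def list_characters(project: dict) -> dict:
    return {"characters": sorted(project.get("characters", []))}
```

In this sketch, registry.call("open_page", entity_type="shot", name_or_id="shot-12") returns a navigation instruction that a UI layer could act on.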

The agent is stateless between sessions: it keeps no memory of earlier conversations. Persistent project data lives in Firebase Realtime Database (RTDB), which the agent reads whenever it needs it.
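
To illustrate the split between per-session memory and persistent project data, here is a minimal sketch. It assumes a plain in-memory dictionary as a stand-in for the database; ProjectStore, AgentSession, and the path format are hypothetical and are not part of the product or of the Firebase SDK.

```python
from typing import Any, Dict, List, Optional


class ProjectStore:
    """Stand-in for the persistent database (a plain dict in this sketch)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def write(self, path: str, value: Any) -> None:
        self._data[path] = value

    def read(self, path: str) -> Optional[Any]:
        return self._data.get(path)


class AgentSession:
    """Hypothetical session: history lives only as long as the session object."""

    def __init__(self, store: ProjectStore) -> None:
        self.store = store
        self.history: List[str] = []  # conversation memory, discarded with the session

    def ask(self, path: str) -> Optional[Any]:
        self.history.append(path)
        return self.store.read(path)  # project data always comes from the store


store = ProjectStore()
store.write("characters/kira", {"name": "Kira"})

first = AgentSession(store)
first.ask("characters/kira")

# A new session starts with empty history but sees the same project data.
second = AgentSession(store)
```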

Voice vs text

Voice — The voice agent connects via WebSocket. It uses built-in Voice Activity Detection (VAD) to detect when you're speaking and when you've finished. Responses stream back as audio. This is the primary interface for hands-free production work.

Text — The text interface is a standard chat panel. Useful when you want to read long responses, copy text, or work in a quiet environment.

Both interfaces share the same agent logic and tools.

The voice streaming model

Voice is handled by the agent service, which is deployed to Google Cloud Run. It uses Google ADK's run_live() with a LiveRequestQueue for bidirectional audio streaming. Because all persistent state is in Firebase RTDB, the service can be restarted without losing your project data.
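
The bidirectional pattern described here (the client pushes audio chunks into a queue while responses stream back) can be sketched with the Python standard library alone. This is an illustration of the pattern only and does not use or reproduce the Google ADK API: RequestQueue and run_live_sketch are hypothetical stand-ins, and the "reply" is a trivial echo rather than model output.

```python
import asyncio
from typing import AsyncIterator, List, Optional


class RequestQueue:
    """Hypothetical stand-in for a live request queue (not the ADK class)."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    def send(self, chunk: bytes) -> None:
        self._queue.put_nowait(chunk)

    def close(self) -> None:
        self._queue.put_nowait(None)  # None marks the end of the stream

    async def get(self) -> Optional[bytes]:
        return await self._queue.get()


async def run_live_sketch(requests: RequestQueue) -> AsyncIterator[bytes]:
    """Yield one response chunk per incoming chunk until the queue is closed."""
    while True:
        chunk = await requests.get()
        if chunk is None:
            break
        yield b"reply:" + chunk  # placeholder for streamed audio output


async def demo() -> List[bytes]:
    requests = RequestQueue()

    async def client() -> None:
        # Simulates a microphone pushing audio chunks while replies stream back.
        for chunk in (b"a", b"b"):
            requests.send(chunk)
            await asyncio.sleep(0)  # let the consumer run between chunks
        requests.close()

    sender = asyncio.create_task(client())
    replies = [reply async for reply in run_live_sketch(requests)]
    await sender
    return replies
```

Running asyncio.run(demo()) drives both sides concurrently: the sender and the response stream make progress on the same event loop, which is the essential property of a live, bidirectional session.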

Configuring agents

Each project can have multiple agent configurations, each with:

A name
System instructions — the role and context for this agent
Context documents — additional reference material the agent reads
Tools — which tool categories the agent can use

Agent configurations are managed on the Agents page, accessible from the project menu. Default agents are seeded when a project is created. You can also create specialised agents, for example one focused on story continuity and another on technical production questions.
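
As a rough picture of what one agent configuration holds, the four fields listed above map naturally onto a small record type. The sketch below is an assumption for illustration: AgentConfig, its field names, the category identifiers, and the example values are hypothetical and do not reflect the stored schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical identifiers for the four tool categories described in this document.
TOOL_CATEGORIES = {"navigation", "entity_queries", "data_retrieval", "ui_commands"}


@dataclass
class AgentConfig:
    name: str
    system_instructions: str  # the role and context for this agent
    context_documents: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)  # enabled tool categories

    def __post_init__(self) -> None:
        # Reject categories that are not in the known set.
        unknown = set(self.tools) - TOOL_CATEGORIES
        if unknown:
            raise ValueError(f"unknown tool categories: {sorted(unknown)}")


# Two specialised agents for the same project (example values only).
continuity = AgentConfig(
    name="Story continuity",
    system_instructions="Check scenes and shots for continuity problems.",
    context_documents=["series-bible.md"],
    tools=["entity_queries", "data_retrieval"],
)

technical = AgentConfig(
    name="Technical production",
    system_instructions="Answer questions about media and voice settings.",
    tools=["navigation", "data_retrieval", "ui_commands"],
)
```

Restricting each agent to the tool categories it needs keeps a specialised agent focused; in this sketch the continuity agent can read data but cannot navigate or trigger UI actions.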

Using the agent effectively

The agent works best when your project data is complete and well-organised. It reads character profiles, location descriptions, shot context, and screenplay text to ground its answers. Filling in Story properties on entities gives the agent more to work with.

For navigation, a request such as "Go to the scene where Kira and Marcus meet" works if you've written meaningful scene descriptions. Vague or generic entity names and descriptions make it harder for the agent to find the right target.