
Multiplayer Avatar Battles: Concept Notes

Why turn-based avatar-versus-avatar formats are interesting, and the technical reasons we haven't shipped them yet.

2026-04-30

"Multiplayer Avatar Battles" sounds like a launched feature; it isn't. This article is concept notes — a public sketch of where the format gets interesting and where the engineering gets honest.

The interesting part

A 30-second creator-versus-creator format (your avatar, my avatar, one prompt, two responses) has obvious viral potential. It's the AI-video equivalent of a rap battle round. The judge can be the audience (votes), the platform (engagement score), or a neutral third character.
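The three judging options differ only in where the score comes from, so they can share one interface. Below is a minimal sketch, assuming each response is scored independently and the higher score wins. The `Response` fields, the judge functions and `decide` are hypothetical names for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    """One avatar's answer to the shared prompt (hypothetical fields)."""
    creator: str
    votes: int            # audience votes
    engagement: float     # platform engagement score
    judge_rating: float   # rating given by a neutral third character


# A judge maps a response to a comparable score.
Judge = Callable[[Response], float]


def audience_judge(r: Response) -> float:
    return float(r.votes)


def platform_judge(r: Response) -> float:
    return r.engagement


def character_judge(r: Response) -> float:
    return r.judge_rating


def decide(a: Response, b: Response, judge: Judge) -> Optional[str]:
    """Return the winning creator's name, or None on a tie."""
    score_a, score_b = judge(a), judge(b)
    if score_a == score_b:
        return None
    return a.creator if score_a > score_b else b.creator
```

Swapping the judge is the only change needed to switch formats, and it can flip the result: the same pair of responses can win on votes for one creator and on engagement for the other.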

The engineering reality

Our talking-photo pipeline can't render multiple avatars yet. Each render today is a single speaker on a single background. Putting two avatars in one frame requires either:

1. Compositing two single-speaker renders post-hoc — feasible today, but the lip-sync timing has to be choreographed precisely. We'd need a script-side beat structure that guarantees Speaker A finishes their line before Speaker B starts.

2. A different provider tier. HeyGen's higher-tier flows support multi-character scenes natively, but that tier costs more, renders take longer, and it widens our integration surface.
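Option 1 hinges on a script-side beat structure. A minimal sketch of one, assuming each single-speaker render's duration is known once it has rendered and that lines are placed strictly back to back. The `Beat` type, the 0.3 s default gap and the 30 s limit check are illustrative choices, not parts of our pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Beat:
    speaker: str        # e.g. "A" or "B"
    duration_s: float   # measured length of that speaker's rendered line


def schedule(beats: List[Beat], gap_s: float = 0.3,
             limit_s: float = 30.0) -> List[Tuple[str, float, float]]:
    """Place beats on one timeline so no two lines overlap.

    Returns (speaker, start, end) tuples. Each start is the previous
    end plus `gap_s`. Raises ValueError if a duration is not positive
    or if the round runs past `limit_s`.
    """
    timeline: List[Tuple[str, float, float]] = []
    cursor = 0.0
    for i, beat in enumerate(beats):
        if beat.duration_s <= 0:
            raise ValueError(f"beat {i} has non-positive duration")
        start = cursor
        end = start + beat.duration_s
        timeline.append((beat.speaker, start, end))
        cursor = end + gap_s  # next speaker waits for this line to finish
    if timeline and timeline[-1][2] > limit_s:
        raise ValueError(
            f"round runs {timeline[-1][2]:.1f}s, over the {limit_s:.0f}s limit")
    return timeline
```

A compositor would then place each single-speaker render at its `start` offset. Because every start is the previous end plus the gap, Speaker B cannot begin mid-line.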

We'll know it's time to build this when audience metrics on single-speaker renders plateau and creators start asking for it explicitly. Not before.
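Both conditions have to hold before we build. One way to make that rule checkable, as a sketch: the weekly metric, the four-week window, the 2% growth floor and the 50-request threshold are placeholders chosen for illustration, and `has_plateaued` / `ready_to_build` are hypothetical helpers.

```python
from typing import Sequence


def has_plateaued(weekly_metric: Sequence[float], window: int = 4,
                  min_growth: float = 0.02) -> bool:
    """True if the metric grew by less than `min_growth` (relative)
    over the last `window` weeks."""
    if len(weekly_metric) < window + 1:
        return False  # not enough history to call a plateau
    base = weekly_metric[-window - 1]
    latest = weekly_metric[-1]
    if base <= 0:
        return False  # relative growth is undefined from a zero baseline
    return (latest - base) / base < min_growth


def ready_to_build(weekly_metric: Sequence[float], explicit_requests: int,
                   min_requests: int = 50) -> bool:
    """Both signals must fire: metrics have flattened AND creators are asking."""
    return has_plateaued(weekly_metric) and explicit_requests >= min_requests
```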

The format we're testing first

Before building battles, we're testing a simpler shape: two-scene "reply" reels, where one creator's reel includes a CTA for another creator to render a reply scene. It has the same viral mechanics and needs zero new infrastructure.

What you can do today

If you want to play with the battle format right now, render two reels, hand them to a video editor, and post the cut. We'd genuinely rather you try the format with the tools you already have than wait for us to ship a polished button.
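If your video editor is the command line, ffmpeg's concat demuxer can put two reels back to back. A minimal sketch, assuming both renders share codec, resolution and frame rate (stream copy with `-c copy` requires that). `stitch`, `battle.mp4` and `battle.txt` are hypothetical names. A side-by-side cut would instead need a re-encode with a filter such as `hstack`.

```python
import subprocess
from pathlib import Path
from typing import List


def concat_list(paths: List[str]) -> str:
    """Build the text of an ffmpeg concat-demuxer list file."""
    lines = []
    for p in paths:
        # Inside single quotes, a literal ' is written as '\'' (close, escape, reopen).
        escaped = p.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output: str) -> List[str]:
    """ffmpeg arguments that join the listed clips without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


def stitch(reel_a: str, reel_b: str, output: str = "battle.mp4",
           list_file: str = "battle.txt") -> None:
    """Write the list file, then run ffmpeg (must be installed and on PATH)."""
    Path(list_file).write_text(concat_list([reel_a, reel_b]))
    subprocess.run(concat_command(list_file, output), check=True)
```

For example, `stitch("alice_reel.mp4", "bob_reply.mp4")` writes `battle.txt` and produces `battle.mp4` with Alice's reel first.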