Virtual Avatar Spokesperson, Downloadable Workflow Kit (SBW-007)

PROVENANCE: Jordan, the avatar on this site's homepage, was built live through the HeyGen MCP exactly as documented here. The kit is generalized for any small business: no vendor lock-in, no secrets, adapt freely. The kit's INSTALL.md is a one-shot Claude Code build: unzip, paste one prompt, answer ~5 questions.

TL;DR

This is the literal build behind Jordan on our homepage. You generate one photo of your spokesperson, connect the HeyGen MCP to Claude Code once, paste one prompt, and Claude does the rest: uploads the photo, creates the avatar, renders a talking video in your chosen voice, compresses it for the web, and drops it on your site. One image in, a moving spokesperson out. You approve; the machine assembles.

What you'll have at the end

A 30–45 second photorealistic spokesperson video, a real-looking person who talks, moves, and gestures, playing on your site (click-to-play, captioned, web-light). The same avatar then becomes a reusable asset for UGC ads and, later, a live AI concierge. Jordan is our proof: built with this exact pipeline, in one sitting.

The exact pipeline (what Claude runs for you)

                    HOW JORDAN WAS ACTUALLY BUILT
═══════════════════════════════════════════════════════════════════════

  YOU                          CLAUDE CODE  +  HEYGEN MCP
  ───                          ──────────────────────────
  1. generate one photoreal    ┌──────────────────────────────┐
     still of your person  ──▶ │ upload image as a HeyGen asset│
     (ChatGPT image / MJ)      │  (create_asset_upload → PUT)  │
                               └───────────────┬──────────────┘
  2. connect HeyGen MCP once                   ▼
     (terminal / IDE / desktop) ┌──────────────────────────────┐
                                │ create_photo_avatar → "Jordan"│
  3. paste ONE prompt    ──────▶│  trains in seconds (AI face = │
     ("build my spokesperson")  │  no consent step)             │
                                └───────────────┬──────────────┘
  4. sip tea                                    ▼
                                ┌──────────────────────────────┐
                                │ list_voices → pick warm/upbeat│
                                │ create_video_from_avatar      │
                                │  · Avatar IV, expressiveness  │
                                │    HIGH (real movement)       │
                                │  · 4:5, 1080p, your script    │
                                └───────────────┬──────────────┘
                                                ▼
                                ┌──────────────────────────────┐
                                │ poll get_video → download MP4 │
                                │ ffmpeg 2-pass: 52MB → ~6.5MB  │
                                │ embed click-to-play on site   │
                                └──────────────────────────────┘
═══════════════════════════════════════════════════════════════════════

What you need

Piece	Why	Notes
HeyGen account, Creator plan	Renders the avatar; removes the watermark	$24/mo. Creator is the sweet spot: watermark-free, 1080p, photo avatars. Pro's 4K is unneeded for web.
The HeyGen MCP server	Lets Claude Code drive HeyGen directly	Hosted, OAuth, no API key. Connection steps below.
Claude Code	The operator	Terminal, IDE extension, or desktop app; all work.
One photoreal still	The face	ChatGPT image, Midjourney, or a HeyGen-generated face. Front-on, mouth unobstructed.
ffmpeg	Web compression	HeyGen exports ~50MB; the web needs ~6MB. Free. `scoop install ffmpeg` (Windows) / `brew install ffmpeg` (macOS).

Step 1, Generate the face (one still)

Front-facing, evenly lit, eyes open, natural closed-mouth smile, mouth unobstructed (critical: it's what HeyGen lip-syncs). Chest-up, vertical. One good image is enough; Avatar IV builds the whole talking video from it.

The prompt that made Jordan (adapt the person to your brand):

Photorealistic vertical 4:5 studio portrait of a youthful, approachable
[describe your spokesperson]. Looking directly into camera, relaxed
closed-mouth smile. Business-casual. Setting: a warm, softly-lit space that
fits your brand, shallow depth of field. Face fully visible, evenly lit.
Natural skin texture, 85mm lens, true-to-life color. Not airbrushed, not
over-smoothed. No text, no logos.

Generate 3–4 and pick the most human. Save it; every future ad reuses the same face.

Step 2, Connect the HeyGen MCP (once)

The MCP is hosted at https://mcp.heygen.com/mcp/v1/, uses OAuth (no API key), and works in every Claude Code surface. Pick your environment:

A) Claude Code, terminal (CLI)

claude mcp add --transport http heygen https://mcp.heygen.com/mcp/v1/ -s user

Then run /mcp, select heygen, approve in the browser. (-s user makes it available in every project.)

B) Claude Code, IDE extension (VS Code / Cursor / JetBrains)

Open the MCP servers panel → + Add → choose HTTP → paste https://mcp.heygen.com/mcp/v1/ → name it heygen. It appears with a "Needs Auth" button; click it, approve in the browser, then reload the window so the tools load. (CLI-added servers also appear here after a window reload.)

C) Claude desktop app / claude.ai (Connectors)

Settings → Connectors → Add custom connector → paste https://mcp.heygen.com/mcp/v1/ → authenticate with OAuth. Start a new chat so the tools are available.

One rule across all three: MCP tools load when a session starts. After you connect and authenticate, reload/restart so Claude can see the HeyGen tools.

Step 3, Paste the one-shot prompt

Drop your photo into the project folder, open Claude Code, and paste the prompt from INSTALL.md (ships in this kit). In plain terms it tells Claude:

Here's my spokesperson photo and my 40-second script. Using the HeyGen MCP: upload the image, create a photo avatar, pick a warm upbeat voice, render it on Avatar IV at high expressiveness in 4:5 1080p, poll until done, download it, compress it under 8MB with ffmpeg, and drop it into my site as a click-to-play video.

Then you wait. Claude runs the whole chain, the same calls listed in the diagram above, and reports back with a finished, web-ready file.

Step 4, What Claude does under the hood (so you can trust it)

create_asset_upload → PUT the image bytes to S3 → complete_asset_upload.
create_photo_avatar → your avatar trains (AI faces skip the consent gate; real people require create_avatar_consent, a browser approval).
list_voices → choose a warm, upbeat young voice (filter by gender/language).
create_video_from_avatar → Avatar IV, expressiveness: high, aspectRatio: 4:5, resolution: 1080p, your script + voiceId.
get_video polling until completed → grab video_url + thumbnail_url.
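
The polling step above can be sketched as a bounded loop. This is an illustration under stated assumptions, not HeyGen's API: poll_until_completed is a hypothetical helper, the status command you pass it stands in for whatever reads the render state (in the real build Claude calls get_video through the MCP), and the "failed" status is an assumption, since this kit only names "completed".

```shell
# Hypothetical helper: run a status command until it prints "completed".
# Usage: poll_until_completed MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 on "completed", 1 on "failed" (assumed status name),
# 2 if the status command itself errors, 3 if MAX_TRIES is exhausted.
poll_until_completed() {
  local max_tries="$1" delay_s="$2"
  shift 2
  local i=1 status
  while [ "$i" -le "$max_tries" ]; do
    status=$("$@") || return 2
    case "$status" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    sleep "$delay_s"
    i=$(( i + 1 ))
  done
  return 3
}
```

Capping the number of tries matters more than the delay: a render that never completes should surface as an error instead of a session that waits forever.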

The step everyone forgets: HeyGen's MP4 lands at ~10 Mbps (≈50MB for 40s). Two-pass ffmpeg drops it to ~6.5MB with no visible loss on a talking head (on Windows, replace /dev/null with NUL):

ffmpeg -y -i in.mp4 -c:v libx264 -b:v 1150k -pass 1 -preset slow \
  -profile:v high -pix_fmt yuv420p -an -f mp4 /dev/null
ffmpeg -y -i in.mp4 -c:v libx264 -b:v 1150k -pass 2 -preset slow \
  -profile:v high -pix_fmt yuv420p -c:a aac -b:a 128k \
  -movflags +faststart out.mp4
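
The sizes quoted here are consistent with simple arithmetic: file size is roughly (video bitrate + audio bitrate) × duration. A minimal sketch, assuming decimal megabytes (1 MB = 8000 kilobits) and ignoring container overhead; video_kbps and estimated_mb are hypothetical helper names, not part of the kit:

```shell
# Video bitrate (kbps) that fits SIZE_MB over DURATION_S seconds,
# after reserving AUDIO_KBPS (default 128) for the audio track.
video_kbps() {
  awk -v size="$1" -v dur="$2" -v audio="${3:-128}" \
    'BEGIN { printf "%d\n", size * 8000 / dur - audio }'
}

# Approximate output size in MB for given video/audio bitrates and duration.
estimated_mb() {
  awk -v v="$1" -v a="$2" -v dur="$3" \
    'BEGIN { printf "%.2f\n", (v + a) * dur / 8000 }'
}

video_kbps 6.5 40         # prints 1172
estimated_mb 1150 128 40  # prints 6.39
estimated_mb 10000 0 40   # prints 50.00
```

At 1150k video plus 128k audio, a 40-second clip comes to about 6.4MB before container overhead, which lines up with the ~6.5MB result. For a clip of a different length, recompute the bitrate from your size budget rather than reusing 1150k.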

Embed it click-to-play (poster + captions, preload="none", no autoplay) so the page stays fast.
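
A minimal sketch of such an embed, written as a shell helper that prints the markup. The write_embed name and the file names are hypothetical, and the English captions track is an assumption; the attributes themselves are standard HTML5 video attributes, and the captions file is expected to be WebVTT:

```shell
# Print a click-to-play <video> block: poster image, captions track,
# and preload="none" so no video bytes load until the visitor presses play.
# The autoplay attribute is deliberately absent.
write_embed() {
  local video="$1" poster="$2" captions="$3"
  cat <<EOF
<video controls playsinline preload="none" poster="$poster">
  <source src="$video" type="video/mp4">
  <track kind="captions" src="$captions" srclang="en" label="English" default>
</video>
EOF
}

# Hypothetical file names; paste the printed markup into your page template.
write_embed out.mp4 poster.jpg captions.en.vtt
```

With preload="none" the browser fetches only the poster image on page load, which is why the homepage performance score should not move.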

Avatar IV vs Avatar V

Avatar IV builds a talking, gesturing avatar from one image, which is what you want here. Avatar V is "cross-reference-driven" and needs video footage or multiple angles, so it suits digital-twin avatars, not a single generated still. For a one-photo spokesperson, Avatar IV at high expressiveness is the right engine.

Voice, and an honest limit

list_voices returns hundreds of voices; filter for a warm, upbeat, conversational young voice (the emotion-tagged ones like "Friendly" land best). Set speed near 1.0 so natural pauses breathe. The honest catch: HeyGen's stock TTS can read a touch flat and monotone on longer scripts. For a homepage hero it's fine to start with; for a polished v2, render the voice in ElevenLabs (more lifelike, real intonation, even laughs) and feed that audio to HeyGen as audio_url/audio_asset_id instead of a text script. Same avatar, far better delivery.

Scaling the same avatar (after the homepage video)

UGC ads: one sentence of intent → a new 9:16 script → a new render. Same face, new message, weekly. Flag AI-generated content in the ad platform.
Product showcases: a per-SKU template (intro → three features → CTA).
Live AI concierge: the avatar's face on a chat widget powered by an LLM that knows your catalog and steers customers toward high-ticket items honestly, leading with the premium option.

Compliance

Disclose the AI (Jordan's script does it out loud). Never claim to be human.
AI-content flags ON for Meta/TikTok ads.
Never clone a real person's face or voice without written consent; that is law, not policy. AI-generated faces are fine and skip HeyGen's consent gate.

Numbers to watch

Metric	Healthy	Where
Web file size	< 8 MB per 40s	post-ffmpeg
Homepage perf score	unchanged (click-to-play)	Lighthouse
Cost per ad variant	< $10 once the avatar exists	HeyGen credits
Welcome video play rate	≥ 15% of visitors	analytics
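
The file-size row is easy to check mechanically before you deploy. A minimal sketch, assuming decimal megabytes (1 MB = 1,000,000 bytes); under_size_limit is a hypothetical helper, and out.mp4 is the output name used in the ffmpeg step:

```shell
# Succeed (exit 0) if FILE is smaller than LIMIT_MB megabytes (default 8).
# Returns 1 if the file is too large, 2 if the file does not exist.
under_size_limit() {
  local file="$1" limit_mb="${2:-8}" bytes
  [ -f "$file" ] || return 2
  # Arithmetic expansion strips the padding some wc implementations add.
  bytes=$(( $(wc -c < "$file") ))
  [ "$bytes" -lt $(( limit_mb * 1000000 )) ]
}

# Usage: under_size_limit out.mp4 8 && echo "OK to ship"
```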

Week-one checklist

Photoreal still generated, mouth unobstructed, saved as the canonical face
HeyGen MCP connected + authenticated in your environment; session reloaded
One-shot prompt run; avatar created and video rendered on Avatar IV
MP4 compressed under 8MB (ffmpeg two-pass) and embedded click-to-play
Captions attached; homepage performance unchanged
Same face saved for the next ad

Troubleshooting

MCP tools don't appear: you didn't reload after authenticating. Restart the session.
"Needs Auth" won't clear: remove duplicate entries (claude mcp list), keep one at user scope, re-auth, reload.
Avatar V unavailable: expected for a single-photo avatar; use Avatar IV.
Video is 50MB: you skipped the ffmpeg pass. Never ship the raw HeyGen export.
Voice sounds flat: swap to an ElevenLabs voice fed as audio (see Voice, above).
Hands/mouth look off: regenerate the still front-on with the mouth fully visible.