Small Biz Workflows
FORM SBW-007 · REV B · 5-FILE KIT · Growth

Virtual Avatar Spokesperson

Build a branded, photorealistic AI spokesperson — exactly like Jordan on this site's homepage — by handing one photo to Claude Code and letting it drive HeyGen end to end. Sit back, drink your tea, it ships.

TIME TO VALUE
1 evening for your first talking spokesperson
RUNNING COST
$24/mo HeyGen Creator + your existing site
STACK
HeyGen (Creator plan) · The HeyGen MCP server · Claude Code · One photoreal still (ChatGPT image, Midjourney, or HeyGen) · ffmpeg (free) for web compression

PROVENANCE: Jordan — the avatar on this site's homepage — built live through the HeyGen MCP, exactly as documented here. Generalized for any small business — no vendor lock-in, no secrets, adapt freely. The kit's INSTALL.md is a one-shot Claude Code build: unzip, paste one prompt, answer ~5 questions.

TL;DR

This is the literal build behind Jordan on our homepage. You generate one photo of your spokesperson, connect the HeyGen MCP to Claude Code once, paste one prompt, and Claude does the rest: uploads the photo, creates the avatar, renders a talking video in your chosen voice, compresses it for the web, and drops it on your site. One image in, a moving spokesperson out. You approve; the machine assembles.

What you'll have at the end

A 30–45 second photorealistic spokesperson video — a real-looking person who talks, moves, and gestures — playing on your site (click-to-play, captioned, web-light). The same avatar then becomes a reusable asset for UGC ads and, later, a live AI concierge. Jordan is our proof: built with this exact pipeline, in one sitting.

The exact pipeline (what Claude runs for you)

                    HOW JORDAN WAS ACTUALLY BUILT
═══════════════════════════════════════════════════════════════════════

  YOU                          CLAUDE CODE  +  HEYGEN MCP
  ───                          ──────────────────────────
  1. generate one photoreal    ┌──────────────────────────────┐
     still of your person  ──▶ │ upload image as a HeyGen asset│
     (ChatGPT image / MJ)      │  (create_asset_upload → PUT)  │
                               └───────────────┬──────────────┘
  2. connect HeyGen MCP once                   ▼
     (terminal / IDE / desktop) ┌──────────────────────────────┐
                                │ create_photo_avatar → "Jordan"│
  3. paste ONE prompt    ──────▶│  trains in seconds (AI face = │
     ("build my spokesperson")  │  no consent step)             │
                                └───────────────┬──────────────┘
  4. sip tea                                    ▼
                                ┌──────────────────────────────┐
                                │ list_voices → pick warm/upbeat│
                                │ create_video_from_avatar      │
                                │  · Avatar IV, expressiveness  │
                                │    HIGH (real movement)       │
                                │  · 4:5, 1080p, your script    │
                                └───────────────┬──────────────┘
                                                ▼
                                ┌──────────────────────────────┐
                                │ poll get_video → download MP4 │
                                │ ffmpeg 2-pass: 52MB → ~6.5MB  │
                                │ embed click-to-play on site   │
                                └──────────────────────────────┘
═══════════════════════════════════════════════════════════════════════

What you need

Piece | Why | Notes
HeyGen account, Creator plan | Renders the avatar; removes the watermark | $24/mo. Creator is the sweet spot: watermark-free, 1080p, photo avatars. Pro's 4K is unneeded for web.
The HeyGen MCP server | Lets Claude Code drive HeyGen directly | Hosted, OAuth, no API key. Connection steps below.
Claude Code | The operator | Terminal, IDE extension, or desktop; all work.
One photoreal still | The face | ChatGPT image, Midjourney, or a HeyGen-generated face. Front-on, mouth unobstructed.
ffmpeg | Web compression | HeyGen exports ~50MB; the web needs ~6MB. Free. scoop install ffmpeg / brew install ffmpeg.
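
A quick pre-flight check can confirm the command-line pieces from the table are installed before you start. This is a minimal sketch for a POSIX shell; the helper name missing_tools is hypothetical and not part of the kit, and it only checks that the commands exist on your PATH, not that the HeyGen MCP is connected.

```shell
# Pre-flight sketch: print the names of any required tools not found on PATH.
# missing_tools is a hypothetical helper name, not something the kit ships.
missing_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Trim the leading space; empty output means every tool was found.
  echo "${missing# }"
}

# Check the two local commands this build relies on.
missing_tools ffmpeg claude
```

An empty line of output means both commands were found; otherwise the missing names are printed.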

Step 1 — Generate the face (one still)

Front-facing, evenly lit, eyes open, natural closed-mouth smile, mouth unobstructed (critical — it's what HeyGen lip-syncs). Chest-up, vertical. One good image is enough; Avatar IV builds the whole talking video from it.

The prompt that made Jordan (adapt the person to your brand):

Photorealistic vertical 4:5 studio portrait of a youthful, approachable
[describe your spokesperson]. Looking directly into camera, relaxed
closed-mouth smile. Business-casual. Setting: a warm, softly-lit space that
fits your brand, shallow depth of field. Face fully visible, evenly lit.
Natural skin texture, 85mm lens, true-to-life color. Not airbrushed, not
over-smoothed. No text, no logos.

Generate 3–4, pick the most human. Save it — every future ad reuses the same face.
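
Since the video in this build renders at 4:5, it can help to confirm the chosen still really has that aspect ratio before uploading. A minimal sketch, assuming ffprobe (installed alongside ffmpeg) is available; the function names and the filename spokesperson.png are hypothetical.

```shell
# True when width:height is exactly 4:5 (cross-multiplied to avoid fractions).
is_4x5() {
  [ $(( $1 * 5 )) -eq $(( $2 * 4 )) ]
}

# Print "WIDTHxHEIGHT" for an image or video using ffprobe.
dimensions() {
  ffprobe -v error -select_streams v:0 \
    -show_entries stream=width,height -of csv=s=x:p=0 "$1"
}

# Usage (hypothetical filename):
#   dims=$(dimensions spokesperson.png)
#   is_4x5 "${dims%x*}" "${dims#*x}" && echo "4:5 portrait"
```

The check is deliberately strict; a still that is close to 4:5 but not exact may still work, so treat a failure as a prompt to look at the image, not as a hard stop.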

Step 2 — Connect the HeyGen MCP (once)

The MCP is hosted at https://mcp.heygen.com/mcp/v1/, uses OAuth (no API key), and works in every Claude Code surface. Pick your environment:

A) Claude Code — terminal (CLI)

claude mcp add --transport http heygen https://mcp.heygen.com/mcp/v1/ -s user

Then run /mcp, select heygen, approve in the browser. (-s user makes it available in every project.)

B) Claude Code — IDE extension (VS Code / Cursor / JetBrains)

Open the MCP servers panel → + Add → choose HTTP → paste https://mcp.heygen.com/mcp/v1/ → name it heygen. It appears with a "Needs Auth" button — click it, approve in the browser, then reload the window so the tools load. (CLI-added servers also appear here after a window reload.)

C) Claude desktop app / claude.ai (Connectors)

Settings → Connectors → Add custom connector → paste https://mcp.heygen.com/mcp/v1/ → authenticate with OAuth. Start a new chat so the tools are available.

One rule across all three: MCP tools load when a session starts. After you connect and authenticate, reload/restart so Claude can see the HeyGen tools.

Step 3 — Paste the one-shot prompt

Drop your photo into the project folder, open Claude Code, and paste the prompt from INSTALL.md (ships in this kit). In plain terms it tells Claude:

Here's my spokesperson photo and my 40-second script. Using the HeyGen MCP: upload the image, create a photo avatar, pick a warm upbeat voice, render it on Avatar IV at high expressiveness in 4:5 1080p, poll until done, download it, compress it under 8MB with ffmpeg, and drop it into my site as a click-to-play video.

Then you wait. Claude runs the whole chain — the same calls listed in the diagram above — and reports back with a finished, web-ready file.

Step 4 — What Claude does under the hood (so you can trust it)

  1. create_asset_upload → PUT the image bytes to S3 → complete_asset_upload.

  2. create_photo_avatar → your avatar trains (AI faces skip the consent gate; real people require create_avatar_consent — a browser approval).

  3. list_voices → choose a warm, upbeat young voice (filter by gender/language).

  4. create_video_from_avatar → Avatar IV, expressiveness: high, aspectRatio: 4:5, resolution: 1080p, your script + voiceId.

  5. get_video polling until completed → grab video_url + thumbnail_url.

  6. The step everyone forgets: HeyGen's MP4 lands at ~10 Mbps (≈50MB for 40s). Two-pass ffmpeg drops it to ~6.5MB with no visible loss on a talking head:

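    # Pass 1: analysis only. Stats go to a log file; the video output is discarded.
    # On Windows, replace /dev/null with NUL.
    ffmpeg -y -i in.mp4 -c:v libx264 -b:v 1150k -pass 1 -preset slow \
      -profile:v high -pix_fmt yuv420p -an -f mp4 /dev/null
    # Pass 2: encode using the pass-1 stats and add AAC audio.
    # +faststart moves the index to the front so playback can begin early.
    ffmpeg -y -i in.mp4 -c:v libx264 -b:v 1150k -pass 2 -preset slow \
      -profile:v high -pix_fmt yuv420p -c:a aac -b:a 128k \
      -movflags +faststart out.mp4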
    
  7. Embed it click-to-play (poster + captions, preload="none", no autoplay) so the page stays fast.
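
The bitrate in step 6 can be sanity-checked with a simple size budget: total kilobits divided by duration, minus the audio bitrate. Here is a small sketch of that arithmetic, assuming decimal units (1 KB = 8 kbit) and ignoring container overhead. The function name video_kbps is illustrative and not part of the kit.

```shell
# Video bitrate (kbit/s) that fits a target file size.
# Args: target size in KB, duration in seconds, audio bitrate in kbit/s.
# Assumes decimal kilobytes and ignores container overhead, so leave headroom.
video_kbps() {
  target_kb=$1; duration_s=$2; audio_kbps=$3
  echo $(( target_kb * 8 / duration_s - audio_kbps ))
}

video_kbps 6500 40 128   # prints 1172
video_kbps 8000 40 128   # prints 1472
```

For a 6.5 MB target over 40 seconds with 128k audio this gives 1172 kbit/s, in line with the 1150k used above. For a longer or shorter script, recompute -b:v the same way and round down a little.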

Avatar IV vs Avatar V

Avatar IV builds a talking, gesturing avatar from one image, which is what you want here. Avatar V is "cross-reference-driven" and needs video footage or multiple angles, so it suits digital-twin avatars, not a single generated still. For a one-photo spokesperson, use Avatar IV at high expressiveness.

Voice — and an honest limit

list_voices returns hundreds of voices; filter for a warm, upbeat, conversational young voice (the emotion-tagged ones like "Friendly" land best). Set speed near 1.0 so natural pauses can breathe. The honest catch: HeyGen's stock TTS can read a touch flat on longer scripts. For a homepage hero it's fine to start with; for a polished v2, render the voice in ElevenLabs (more lifelike, real intonation, even laughs) and feed that audio to HeyGen as audio_url/audio_asset_id instead of a text script. Same avatar, far better delivery.

Scaling the same avatar (after the homepage video)

Compliance

Numbers to watch

Metric | Healthy | Where
Web file size | < 8 MB per 40s | post-ffmpeg
Homepage perf score | unchanged (click-to-play) | Lighthouse
Cost per ad variant | < $10 once the avatar exists | HeyGen credits
Welcome video play rate | ≥ 15% of visitors | analytics
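
The file-size row is the easiest metric to check automatically. This is a minimal sketch, assuming decimal megabytes and a POSIX shell. under_mb is a hypothetical helper name, and out.mp4 is the output name from the ffmpeg step.

```shell
# Succeeds when the file exists and is strictly smaller than max_mb
# decimal megabytes (1 MB = 1,000,000 bytes).
under_mb() {
  file=$1; max_mb=$2
  [ -f "$file" ] || return 1
  # Arithmetic expansion strips any padding some wc builds add.
  size_bytes=$(( $(wc -c < "$file") ))
  [ "$size_bytes" -lt $(( max_mb * 1000000 )) ]
}

if under_mb out.mp4 8; then
  echo "out.mp4 fits the 8 MB budget"
else
  echo "out.mp4 is missing or too large"
fi
```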

Week-one checklist

Troubleshooting
