This hack made AI Voice clones 100x better
Create videos in minutes, not hours
I’ve made a quick 1-minute video explaining what I’ve figured out, and the details are below.
Some of you might have already guessed based on the title, but the entire video—including editing—was created using AI. It only took me 2–3 minutes to make (excluding rendering time), which is absolutely wild.
But here’s the strange part: when I first tried Eleven Labs’ instant voice cloning, it sounded nothing like me.
The Secret to Making AI Sound More Like You
There are specific tips and tricks to make your AI-generated voice sound 10–100x better, and I’ll walk you through them.
Step 1: Training Your Voice with ChatGPT
The first thing you’ll need is training data in the form of your voice recordings. The problem with uploading random videos is that they usually don’t cover enough of the words and sounds you actually use for the model to learn effectively.
Here’s how I solved that:
I asked ChatGPT (or Claude; any LLM works) to generate a script filled with the most common words used in YouTube videos. (There’s a sketch of how to script this below.)
Since ChatGPT remembers past conversations, I also asked it to generate words I’m most likely to use, based on my interactions with it.
You can customise this further by generating words from specific domains or areas you often talk about.
The more training data you create, the better your results will be!
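If you’d rather script this step than chat back and forth, here’s a minimal sketch of the same idea using the OpenAI Python SDK. The model name, prompt wording, and topics are my own placeholders, not something from the post, so swap in whatever fits your domain.

```python
# Sketch: generate a voice-training script to read aloud, using an LLM.
# Assumes the OpenAI Python SDK (>=1.0) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Write a 5-minute read-aloud script that naturally uses the most common "
    "words and phrases found in YouTube videos, plus vocabulary from these "
    "topics I often talk about: AI tools, startups, productivity. "
    "Keep the sentences conversational so it sounds like a real video."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any capable chat model works
    messages=[{"role": "user", "content": prompt}],
)

script = response.choices[0].message.content
print(script)  # read this aloud and record it as your training data
```

Run it a few times with different topics and you end up with a stack of scripts to record, which is exactly the coverage the voice model needs.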
Step 2: Creating a Pro Voice with Eleven Labs
For Eleven Labs, you’ll need a Creator-tier subscription (it costs a bit, but it’s worth it for creating a pro voice).
Navigate to the Pro Voice section and upload your training data.
Thirty minutes of audio is a good starting point, but more is better: I used 30 minutes for mine, and I’m confident that 2–3 hours would lead to even greater improvements.
Right now, my voice sounds a little flat, so I’m planning to experiment further.
Once uploaded, your pro voice will be trained in about 1–2 hours.
A quick tip: I noticed that when the "Similarity" setting was maxed out, the voice sounded less like me. After some trial and error, these settings worked best for me:
Stability: 50%
Similarity: 75%
Style Exaggeration: 0%
These settings might differ for you, so don’t be afraid to tweak them!
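If you generate the voiceover programmatically instead of through the dashboard, the three sliders above correspond to the voice settings in the ElevenLabs text-to-speech API (stability, similarity_boost, and style, on a 0–1 scale). Here’s a minimal sketch under that assumption; the voice ID is a placeholder you’d copy from your ElevenLabs dashboard.

```python
# Sketch: synthesize speech with a cloned voice via the ElevenLabs
# text-to-speech REST API, using the slider values from above (50% -> 0.5).
import requests

ELEVENLABS_API_KEY = "your-api-key"
VOICE_ID = "your-pro-voice-id"  # placeholder: copy from the ElevenLabs dashboard

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": ELEVENLABS_API_KEY},
    json={
        "text": "Hey everyone, welcome back to the channel!",
        "voice_settings": {
            "stability": 0.50,         # Stability: 50%
            "similarity_boost": 0.75,  # Similarity: 75%
            "style": 0.0,              # Style Exaggeration: 0%
        },
    },
)
response.raise_for_status()

# The response body is audio (MP3 by default); save it for your video editor.
with open("voiceover.mp3", "wb") as f:
    f.write(response.content)
```

Keeping the settings in code like this makes it easy to regenerate the voiceover with small tweaks until it stops sounding flat.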
Step 3: Post-Production
The post-production process was much simpler compared to training the voice. Here’s what I did:
I used HeyGen’s AI Avatars to create the video, adding my Eleven Labs voiceover.
I ran the video through Opus Clips to quickly add captions and B-roll with just two clicks.
I hope this guide was helpful and blew your mind as much as it did mine! It definitely made my Head of Engineering, Rene, rethink a few things.
Let me know if you try this out—I’d love to hear about your results.
Until next time,
Ajay
Things I actually use 🚀
Here are a few recommendations and ways I can help you. Follow the links if any of these interest you. (FYI: using these links helps support the newsletter through a small kickback!)
🌍 Hiring Global Talent: Find A+ players at lower costs with Athyna
👨‍🎨 Get Unlimited Design Work: Want great designs at a low monthly cost? Try ManyPixels
📰 Build a Newsletter: Want to build a newsletter like me? Try Beehiiv