How to clone voices from sample Audio files

TheITN3RD

Contributor
I have posted the same thread in the Artificial Intelligence & Machine Learning section, as I am not sure which is the right section. Mods, please decide the appropriate section for this post.

Hi all,


As a hobby project I have started making audiobooks; my YouTube channel is Audio Books Galore. I am trying many verticals such as Mental Health, Bitcoin and Blockchain. I have mastered generating the text for audiobooks, generating audio, making YouTube thumbnails and generating subtitles. I use CapCut/Filmora to combine everything and produce my content.

My primary problem is neural / natural-sounding audio with voice modulation. I have tried dozens of AI websites, but all have limitations: my books range from 5 minutes to 1 hr+, and I have yet to find a solution for that. I have used offline tools like Balabolka and 2nd Speech Center, combined with Google TTS and Microsoft TTS. I have tried multiple voices and am now using Microsoft George as my default voice, but I am not satisfied with the quality; it's too robotic. I found a good website, but again it's not free. I am looking for neural voices like this. I have tried cloning it online, but again it's not free.

Is there a way I can clone a voice offline from any audio file, or clone any online voice or one from a Windows app like PlayHT or Minimax, and then use it in offline tools like Balabolka or 2nd Speech Center, or with any offline web interface?

Alternatively, can I get high-quality, free-to-use voices like these? I have sifted through so many projects on GitHub, but all are written in Python. Even after installing Python 3.x or Miniconda, I can't seem to get them to install or work, as I have absolutely zero knowledge of programming or of how to install things using Python.


This is a side hobby project, so I don't want to spend money, as these voices can cost up to £3/$3 each, and I need various styles such as commercial, narration and empathy, with voice modulation for different types of content.

Any help is much appreciated.
 
Not sure if there are good free sources you can access without some basic comfort with code. I think you might be able to clone on ElevenLabs without spending a bomb? You could also try running models on Replicate, and they charge you based on use. The better models might get expensive with sustained use though.
 
https://github.com/rsxdalv/TTS-WebUI if you've got a computer with 12-16GB VRAM.
Or, if you just need to convert EPUB books: https://github.com/aedocw/epub2tts
Well, EPUB can create copyright issues. I will install the WebUI and get back to you.

Update: downloaded the setup.

Now it's asking me to download Visual Studio, which is 6.8 GB. Does this project require Visual Studio?
 
Make sure you've got plenty of space. The WebUI will also download multiple packages: it requires its own Python, VB and Conda environments (which the installer creates). Once you try to run a model, it will also download the required model files, which might take up several GB depending on what you're trying to run.
Also assuming you've got an Nvidia GPU with at least 8GB VRAM.
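
If it helps, here is a rough Python sketch for checking both things before you commit to the full install: free disk space and Nvidia VRAM. It is not part of TTS-WebUI. It assumes the nvidia-smi tool that ships with the Nvidia driver is on your PATH. The 8 GB VRAM figure comes from this thread; the 20 GB disk figure is only my own guess at "plenty of space", so adjust it as you see fit. The function names are made up for this example.

```python
import shutil
import subprocess

MIN_VRAM_GB = 8        # figure mentioned in this thread
MIN_FREE_DISK_GB = 20  # assumption: a rough guess at "plenty of space"


def free_disk_gb(path="."):
    """Return free space on the drive containing `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def parse_vram_csv(text):
    """Turn nvidia-smi CSV lines ("name, memory in MiB") into (name, GiB) pairs."""
    gpus = []
    for line in text.strip().splitlines():
        if "," not in line:
            continue
        name, mem_mib = line.rsplit(",", 1)
        try:
            gpus.append((name.strip(), int(mem_mib.strip()) / 1024))
        except ValueError:
            # Skip lines where the memory field is not a number (e.g. "[N/A]").
            continue
    return gpus


def query_gpus():
    """Ask nvidia-smi for GPU names and total memory; return [] if it is unavailable."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=name,memory.total",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_vram_csv(result.stdout)


if __name__ == "__main__":
    disk = free_disk_gb(".")
    status = "OK" if disk >= MIN_FREE_DISK_GB else "probably too little"
    print(f"Free disk space: {disk:.1f} GB ({status})")

    gpus = query_gpus()
    if not gpus:
        print("No Nvidia GPU found (or nvidia-smi is not installed).")
    for name, vram in gpus:
        status = "OK" if vram >= MIN_VRAM_GB else "below the suggested minimum"
        print(f"{name}: {vram:.1f} GB VRAM ({status})")
```

Save it as something like check_pc.py inside the folder where you plan to install the WebUI and run it with python check_pc.py. It only reads information and does not install or change anything.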