How to clone voices from sample Audio files

TheITN3RD · May 19, 2025

I have posted the same thread in the Artificial Intelligence & Machine Learning section, as I am not sure which is the right section. Mods, please decide the appropriate section for this post.

Hi all,

As a hobby project I have started making audiobooks; my YouTube channel is Audio Books Galore. I am trying many verticals, such as Mental Health, Bitcoin and Blockchain. I have mastered generating the audiobook text, the audio, YouTube thumbnails and subtitles, and I use CapCut/Filmora to put the content together.

My main problem is getting neural, natural-sounding audio with voice modulation. I have tried dozens of AI websites but all have limitations; my books range from 5 minutes to 1hr+ and I have yet to find a solution for that. I have used offline tools like Balabolka and 2nd Speech Center, combined with Google TTS and Microsoft TTS. I have tried multiple voices and am now using Microsoft George as my default voice, but I am not satisfied with the quality; it is too robotic. I found a good website, but again it is not free. I am looking for neural voices like this. I have tried cloning it online, but again it is not free.

Is there a way I can clone a voice offline from any audio file, or clone any online voice or one from a Windows app like PlayHT or Minimax, and then use it in offline tools like Balabolka or 2nd Speech Center, or with any offline web interface?

Alternatively, can I get such high-quality voices that are free to use? I have sifted through many projects on GitHub, but all are written in Python. Even after installing Python 3.x or Miniconda I can't seem to get them to install or work, as I have absolutely zero knowledge of programming or of how to install things using Python.

This is a side hobby project, so I don't want to spend money, as these voices can cost up to 3 £/$ each, and I need various styles such as commercial, narration and empathy, plus voice modulation for different types of content.

Any help is much appreciated.

SirMatterNot · May 19, 2025

Not sure if there are good free sources you can access without some basic comfort with code. I think you might be able to clone on ElevenLabs without spending a bomb? You could also try running models on Replicate, and they charge you based on use. The better models might get expensive with sustained use though.

AK3D · May 20, 2025

https://github.com/rsxdalv/TTS-WebUI if you've got a computer with 12-16GB VRAM.
Or, if you just need to convert EPUB books: https://github.com/aedocw/epub2tts

TheITN3RD · May 20, 2025

AK3D said:
https://github.com/rsxdalv/TTS-WebUI if you've got a computer with 12-16GB VRAM.
Or, if you just need to convert EPUB books: https://github.com/aedocw/epub2tts

Well, EPUB can create copyright issues. I will install the WebUI and get back to you.

Update: I downloaded the setup.

Now it's asking me to download Visual Studio, which is 6.8GB. Does this project require Visual Studio?

AK3D · May 20, 2025

Make sure you've got plenty of space. The WebUI will also download multiple packages; it requires its own Python, VB and Conda environments (which the installer creates). Once you try to run a model, it'll also download the required model files, and they might take up several GB, depending on what you're trying to run.
This also assumes you've got an Nvidia GPU with at least 8GB VRAM.
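
If it helps, here is a rough pre-install check for the two things mentioned above: free disk space and GPU VRAM. It is only a sketch, not part of TTS-WebUI. It assumes Python 3 is installed and that the Nvidia driver's `nvidia-smi` tool is on the PATH; the function names (`parse_gpu_csv`, `query_gpus`, `free_disk_gb`) are made up for illustration, and the 8GB threshold is simply the figure quoted above.

```python
import shutil
import subprocess

MIN_VRAM_GB = 8  # the "at least 8GB VRAM" figure from the post above


def parse_gpu_csv(text):
    """Turn 'name, memory_in_MiB' lines into (name, vram_gb) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        name, _, mem = line.rpartition(",")
        try:
            gpus.append((name.strip(), int(mem.strip()) / 1024))
        except ValueError:
            continue  # skip lines without a numeric memory value
    return gpus


def query_gpus():
    """Ask nvidia-smi for GPU names and total memory; [] if unavailable."""
    cmd = ["nvidia-smi", "--query-gpu=name,memory.total",
           "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []  # no Nvidia driver/tool found, or the query failed
    return parse_gpu_csv(result.stdout)


def free_disk_gb(path="."):
    """Free space, in GB, on the drive that contains `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


if __name__ == "__main__":
    print(f"Free disk space here: {free_disk_gb():.1f} GB")
    gpus = query_gpus()
    if not gpus:
        print("No Nvidia GPU detected via nvidia-smi.")
    for name, vram_gb in gpus:
        verdict = "OK" if vram_gb >= MIN_VRAM_GB else "below the suggested minimum"
        print(f"{name}: {vram_gb:.1f} GB VRAM ({verdict})")
```

Run it from the drive where you plan to install, so the free-space figure refers to the right disk.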

TheITN3RD · May 21, 2025

AK3D said:
Make sure you've got plenty of space. The WebUI will also download multiple packages; it requires its own Python, VB and Conda environments (which the installer creates). Once you try to run a model, it'll also download the required model files, and they might take up several GB, depending on what you're trying to run.
This also assumes you've got an Nvidia GPU with at least 8GB VRAM.

Oh, I thought it was a lightweight setup, so I had installed it on my laptop (i5/16GB). Now that I know the scope, I will install it on my PC (i7/16GB/1650 OC). I guess the i7/1650 can take the load.
