Hi
I'm working with nanoGPT, a repo for training a small GPT-style language model (a small version of ChatGPT), and I have run into a problem.
https://github.com/karpathy/nanoGPT
I have to train it on a large amount of data, but I don't have enough RAM, so I need a way to train the model on one subset of the data first and then move on to the next file.
What I have so far is multiple input files and a script that trains the model, plus a bash script that runs the training script in a loop, feeding it one input file at a time. But when training is supposed to resume, it just doesn't happen. No errors.
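For context, the loop looks roughly like this (the dataset names and config/train_mymodel.py are placeholders, and I'm assuming nanoGPT's --dataset= / --init_from= command-line overrides behave the way I think they do):

```bash
#!/usr/bin/env bash
set -e

first=1
for part in part1 part2 part3; do
    # each part has already been prepared into data/<part>/train.bin and val.bin
    if [ "$first" -eq 1 ]; then
        # first subset: start training from scratch
        python train.py config/train_mymodel.py --dataset="$part" --init_from=scratch
        first=0
    else
        # later subsets: continue from the checkpoint written to out/ckpt.pt
        python train.py config/train_mymodel.py --dataset="$part" --init_from=resume
    fi
done
```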
I'm new to ML and don't fully understand how the training script works. max_iters is 5000, and training resumes at step 5000. Is it because 5000 steps have already been completed that training doesn't continue?
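From what I can tell, train.py loads iter_num from the checkpoint when resuming and stops once it passes max_iters (something like `if iter_num > max_iters: break`), so a checkpoint already at 5000 would exit almost immediately. Would the fix just be to raise max_iters each time I resume? For example (same placeholder names as above):

```bash
# guess: give the resumed run more room by raising the iteration limit
python train.py config/train_mymodel.py --dataset=part2 --init_from=resume --max_iters=10000
```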
Any input will be helpful
Thanks