Problems with Inference on the GPT-NeoX LLM

draglord

Hello

I have pretrained a small model using GPT-NeoX, but I am facing problems with inference. I am using the Docker image. Training runs fine, but when I try inference I get the following error:

Traceback (most recent call last):
  File "generate.py", line 93, in <module>
    main()
  File "generate.py", line 75, in main
    generate_samples_interactive(
  File "/home/mchorse/gpt-neox/megatron/text_generation_utils.py", line 777, in generate_samples_interactive
    for (
  File "/home/mchorse/gpt-neox/megatron/text_generation_utils.py", line 319, in stream_tokens
    logits[:, -1].view(batch_size, -1).contiguous()
TypeError: tuple indices must be integers or slices, not tuple

Can anyone please run this on their machine and see if they get the same error?

Thanks
 
What version of Python are you running? Just ensure it matches whatever GPT-NeoX requires.

It seems to require Python 3.8 specifically, not newer versions...
 
Oh OK - I don't have an NVIDIA GPU to try it myself, but I noticed https://github.com/EleutherAI/gpt-neox/pull/1122, where the Docker images now use Python 3.10 on Ubuntu 22.04.

Are you sure there is nothing running outside the Docker container? Anything local that would require you to also use Python 3.10 in your local env?

Also try updating to the latest versions of all the libraries - PyTorch, TensorFlow, whatnot? (Yeah, I know you are using the Docker image, but still.)
 
Yeah, I'm sure there's nothing running outside the Docker image.

I think you have AMD GPUs, right? There's support for those, I think... or maybe that's Megatron, I don't know.

I also posted an issue on their repo, but they're understaffed, or I think they expect me to figure out the problem by myself.
I will look into the new branch with Python 3.10.
If you think of anything else, let me know.
 
I was using a different container setup locally, based on Mistral and RAG. It worked fine for days and then suddenly started throwing a lot of Python exceptions. No number of fresh instances worked thereafter. It just beats me when that happens.
 
Were you trying to pretrain a model?

Do you have the link?
 
It seems your data changes type when moving away from training mode. That TypeError is about indexing tuples and dicts. Probably not an error induced by the image or by PyPy.
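You can reproduce it outside GPT-NeoX entirely; a minimal sketch:

# logits[:, -1] is syntactic sugar for logits[(slice(None), -1)] - the
# index expression is itself a tuple. Tensors accept that; a plain
# Python tuple does not.
outputs = ("logits placeholder", "extra state")
try:
    outputs[:, -1]
except TypeError as e:
    print(e)  # tuple indices must be integers or slices, not tuple

So if logits arrives at that point as a tuple instead of a tensor, you get exactly this message.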
 
Were you trying to pretrain a model?

Do you have the link?
This was several months ago, for a project. It basically uses a pretrained model and then grounds it with other data sources, like how Bing Copilot works, where it is able to use the Bing knowledge graph along with the pretrained ChatGPT model.

It works fine if you don't want to expend too many resources on creating a fully pretrained model. If you have a very niche use case, however, then you will have to pretrain the model, which I haven't done yet because of the extensive resource requirements.
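The core loop is short; here is a minimal sketch of the grounding pattern (the retriever and generator names are made up for illustration, not code from my project):

# Minimal retrieval-augmented generation loop (all names hypothetical).
def answer(question, retriever, generator, top_k=3):
    # 1) Pull the most relevant passages from the external data source.
    passages = retriever.search(question, top_k=top_k)
    # 2) Ground the prompt with the retrieved text.
    context = "\n".join(p.text for p in passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    # 3) The pretrained model itself is untouched; only the prompt
    #    carries the extra knowledge.
    return generator.generate(prompt)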
 
It seems your data changes type when moving away from training mode. That TypeError is about indexing tuples and dicts. Probably not an error induced by the image or by PyPy.

True, but a couple of points.

1) How come it changes only for my setup? Apparently other people have been able to perform inference properly. Maybe using the new image will help.

2) In the line
logits[:, -1].view(batch_size, -1).contiguous()
the subscripting looks correct for a tensor, so I am not sure where the problem lies. I will try other subscripts as well, e.g. logits[0:-1], but I doubt the existing code is faulty.
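To show what I mean, the same line works on a tensor and fails on a tuple (a quick PyTorch sketch; the unpacking at the end is only a guess on my part, not a verified NeoX fix):

import torch

batch_size, seq_len, vocab = 2, 5, 10
logits = torch.zeros(batch_size, seq_len, vocab)

# Fine on a tensor: [:, -1] desugars to the tuple index (slice(None), -1).
last = logits[:, -1].view(batch_size, -1).contiguous()

# If the forward pass instead returned a tuple like (logits, extras), the
# identical line raises "tuple indices must be integers or slices, not
# tuple". Unpacking first avoids it (an assumption, for illustration):
outputs = (logits, None)
maybe_logits = outputs[0] if isinstance(outputs, tuple) else outputs
last = maybe_logits[:, -1].view(batch_size, -1).contiguous()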
This was several months ago, for a project. It basically uses a pretrained model and then grounds it with other data sources, like how Bing Copilot works, where it is able to use the Bing knowledge graph along with the pretrained ChatGPT model.

It works fine if you don't want to expend too many resources on creating a fully pretrained model. If you have a very niche use case, however, then you will have to pretrain the model, which I haven't done yet because of the extensive resource requirements.
I am just doing this for fun. I tried pretrained models and they work fine; I just wanted to see if I can pretrain a model.

@vishalrao The Python 3.10 branch works. It cannot generate text unconditionally or through an input file, but the interactive window works.
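In case it helps anyone else: the three modes are picked via the text generation config. This excerpt is from memory of the repo's sample text_generation.yml, so treat the exact key names as assumptions and double-check against the repo:

{
  # Text gen type: `input-file`, `unconditional` or `interactive`
  "text-gen-type": "interactive",

  "maximum_tokens": 256,
  "temperature": 1.0,
}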

Thank you so much for your help.
 
Yup, I saw your comment in your GitHub issue - it appears you are using someone's fork of the project - maybe they are contributing a fix in progress.