Problems with Inference on GPT-NeoX LLM

Hello

I have pretrained a small model using GPT-NeoX, but I am facing problems with inference. I am using the Docker image. Training runs fine, but when I try inference I get the following error.

Can anyone please run this on their machine and see if they get the same error?

Thanks

File "generate.py", line 93, in <module>
Traceback (most recent call last):
File "generate.py", line 93, in <module>
main()
File "generate.py", line 75, in main
generate_samples_interactive(
File "/home/mchorse/gpt-neox/megatron/text_generation_utils.py", line 777, in generate_samples_interactive
main()
File "generate.py", line 75, in main
generate_samples_interactive(
File "/home/mchorse/gpt-neox/megatron/text_generation_utils.py", line 777, in generate_samples_interactive
for (
File "/home/mchorse/gpt-neox/megatron/text_generation_utils.py", line 319, in stream_tokens
logits[:, -1].view(batch_size, -1).contiguous()
TypeError: tuple indices must be integers or slices, not tuple
for (
File "/home/mchorse/gpt-neox/megatron/text_generation_utils.py", line 319, in stream_tokens
logits[:, -1].view(batch_size, -1).contiguous()
TypeError: tuple indices must be integers or slices, not tuple
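
If it helps diagnose, here is what that TypeError means in plain Python: a minimal sketch assuming logits is actually a Python tuple at that point (e.g. a (tensor, state) pair returned by the forward pass), not the actual gpt-neox code:

import torch

batch_size = 2

# With a tensor, multi-axis indexing works as expected:
logits = torch.randn(batch_size, 8, 50304)  # (batch, seq_len, vocab)
last = logits[:, -1].view(batch_size, -1).contiguous()

# If the forward pass returns a tuple instead, the same expression passes
# the index (slice(None), -1) -- itself a tuple -- to tuple.__getitem__:
outputs = (logits, None)
try:
    outputs[:, -1]
except TypeError as e:
    print(e)  # tuple indices must be integers or slices, not tuple

# The usual workaround is to unpack the tensor first:
tensor_logits = outputs[0]
last = tensor_logits[:, -1].view(batch_size, -1).contiguous()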
 
What version of Python are you running? Just make sure it matches whatever GPT-NeoX requires.

It seems to require Python 3.8 specifically, not newer versions...
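
For what it's worth, a quick illustrative check of which interpreter is actually running inside (or outside) the container:

import sys

print(sys.version)           # full version string, e.g. "3.8.10 (default, ...)"
print(sys.version_info[:2])  # expect (3, 8) on the older images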
 
Oh OK. I don't have an NVIDIA GPU to try this myself, but I noticed https://github.com/EleutherAI/gpt-neox/pull/1122, where the Docker images now use Python 3.10 on Ubuntu 22.04.

Are you sure there is nothing running outside the Docker container? Anything local that would require you to also use Python 3.10 in your local env?

Also try updating to the latest versions of all the libraries: PyTorch, TensorFlow, and so on. (Yeah, I know you are using the Docker image, but still.)
 
Yeah, I'm sure there's nothing running outside the Docker image.

I think you have AMD GPUs, right? There's support for those, I think... or maybe that's Megatron, I don't know.

I also posted an issue on their repo, but they're understaffed, or maybe they expect me to figure out the problem by myself.
I will look into the new branch with Python 3.10.
If you think of anything else, let me know.
 
I was using a different container setup locally, based on Mistral and RAG. It worked fine for days and then suddenly started throwing a lot of Python exceptions. No new instance worked after that. It just beats me when that happens.
 
Were you trying to pretrain a model?

Do you have the link?
 
It seems your dataset changes when moving away from training mode. That is an enumeration error for tuples and dicts. It is probably not an error induced by the image or by PyPI packages.
 
> Were you trying to pretrain a model? Do you have the link?
This was several months ago, for a project. It basically uses a pretrained model and then grounds it with other data sources, like how Bing Copilot works, where it can use the Bing knowledge graph along with the pretrained ChatGPT model.

It works fine if you don't want to expend too many resources on creating a fully pretrained model. If you have a very niche use case, however, then you will have to pretrain the model, which I haven't done yet because of the extensive resource requirements.
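
As a toy sketch of that grounding idea (the documents and the keyword lookup here are made up for illustration; real setups use embedding-based vector search):

documents = {
    "gpt-neox": "GPT-NeoX is EleutherAI's library for training large language models.",
    "docker": "The project ships Docker images with pinned dependencies.",
}

def retrieve(query: str) -> str:
    # Toy retrieval: return the first document whose key appears in the query.
    for key, text in documents.items():
        if key in query.lower():
            return text
    return ""

def build_grounded_prompt(query: str) -> str:
    # Grounding: prepend retrieved context so the pretrained model can use it.
    context = retrieve(query)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

print(build_grounded_prompt("What is GPT-NeoX?"))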
 
> It seems your dataset changes when moving away from training mode. That is an enumeration error for tuples and dicts.

True, but a couple of points.

1) How come it changes only for my setup? Apparently other people have been able to perform inference properly. Maybe using the new image will help.

2) In the line
logits[:, -1].view(batch_size, -1).contiguous()
the subscripting is being done properly. The indices are integers and slices, not a tuple, so I am not sure where the problem lies. I will try other subscripting as well, e.g. logits[0:-1], but I doubt the existing code is faulty.
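
One quick way to test where it breaks is a debugging snippet around that line in text_generation_utils.py; the isinstance check and the [0] unpacking below are assumptions for illustration, not the project's actual code:

# Just before the failing line, check what the model actually returned:
print(type(logits))              # a tuple here would explain the TypeError
if isinstance(logits, tuple):    # assumption: forward pass returns (tensor, ...)
    logits = logits[0]           # assumption: first element is the logits tensor
logits[:, -1].view(batch_size, -1).contiguous()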
> It basically uses a pretrained model and then grounds it with other data sources, like how Bing Copilot works...
I am just doing this for fun. I tried pretrained models and they work fine; I just wanted to see if I could pretrain a model myself.

@vishalrao The Python 3.10 branch works. It cannot generate text unconditionally or through an input file, but the interactive window works.

Thank you so much for your help.
 
Yup, I saw your comment on your GitHub issue. It appears you are using someone's fork of the project; maybe they are contributing a fix in progress.
 