4 GPUs, PC freezes. Possible PCIe lane saturation?

draglord

Forerunner
Hello

I have an Intel i5-11400F processor and an MSI ACE Z590 GOLD EDITION motherboard, paired with three RTX 3090s and an RTX 4080 Super. I have two SSDs installed and 128 GB of RAM.

I use it for machine learning (on Ubuntu).

The three 3090s are connected through risers, and the 4080 is installed directly in the motherboard.

Each GPU works fine when installed in the motherboard individually.

If I use two specific 3090s with risers that I know are working, the system runs fine.

If I add the third 3090 on a riser, my system hangs. It only hangs when the GPUs are being used (loading models, for example). I have tried 6 risers so far, but none of them were stable.

First question: could I be saturating my PCIe lanes? What would happen if all the PCIe lanes are used? Is that why my system hangs?

Also, how else can I debug what the problem is?

PS: Anyone here who can sell me a riser that worked for them?


Thanks
 
I've had some experience with saturating PCIe slots on both Intel and AMD. In short, it rarely works because there aren't enough lanes.

You have two slots connected to the CPU; all of the other slots are connected to the chipset. Those chipset slots will be throttled, and they will slow down the entire system if they're expected to perform on par with the other two.

You'll need a server platform or a HEDT platform to have more than two slots connected directly to the CPU.

You may be able to get three cards connected to the CPU if you use an M.2 adapter to convert the first SSD slot into an x4 slot for a GPU, but in my experience this only works on AMD; I couldn't get it to work with Intel.

What kind of risers are you using?
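
One way to check what each card actually negotiated, rather than guessing from the slot layout, is to ask the driver. Below is a minimal sketch, assuming the NVIDIA driver and `nvidia-smi` are installed. The query fields are standard `nvidia-smi --query-gpu` fields; the function names are made up for illustration. A card on an x1 riser should show a current width of 1.

```python
import shutil
import subprocess

# Standard nvidia-smi query fields for the PCIe link state of each GPU.
QUERY_FIELDS = [
    "index", "name", "pci.bus_id",
    "pcie.link.gen.current", "pcie.link.width.current",
    "pcie.link.gen.max", "pcie.link.width.max",
]


def parse_link_report(csv_text):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != len(QUERY_FIELDS):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(QUERY_FIELDS, parts)))
    return rows


def degraded_links(rows):
    """Return the GPUs whose current link width is below the reported maximum."""
    degraded = []
    for row in rows:
        try:
            current = int(row["pcie.link.width.current"])
            maximum = int(row["pcie.link.width.max"])
        except ValueError:
            continue  # a field was not numeric (for example "[N/A]")
        if current < maximum:
            degraded.append(row)
    return degraded


def query_nvidia_smi():
    """Run nvidia-smi and return its raw CSV output."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        try:
            rows = parse_link_report(query_nvidia_smi())
        except subprocess.CalledProcessError as exc:
            print("nvidia-smi failed:", exc)
            rows = []
        for row in rows:
            print(
                f"GPU {row['index']} {row['name']} ({row['pci.bus_id']}): "
                f"Gen{row['pcie.link.gen.current']} x{row['pcie.link.width.current']} "
                f"(max Gen{row['pcie.link.gen.max']} x{row['pcie.link.width.max']})"
            )
        for row in degraded_links(rows):
            print(f"GPU {row['index']} is running below its maximum link width")
```

As far as I know, the link generation can drop at idle to save power, so it is worth running this while the GPUs are busy. The width is the more telling number here.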
 
OK, thanks. But do you think it would freeze the system, or just cause a bottleneck?
 
I've seen complete system lockups when I tried to do a software RAID between SSDs on CPU lanes and chipset lanes. The SSD connected to the chipset just couldn't keep up.

Occasional bursts are fine for anything connected to the chipset, but chipset lanes cannot handle sustained throughput.

What kind of risers are you using?
 
I think it might be the risers. I just tried running the system with one card in the motherboard and an additional 3090 on the new risers, and the system froze again.

I am using generic risers bought from Amazon and SP Road, Bangalore.
 
It's possible there are timing glitches with unshielded risers, but it depends on what kind of risers you're using.

Are they shielded, unshielded, x1 or x4 or x16? It all depends.

Do you have a photo or a link for the different kind of risers you've tried?
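
If the riser link is glitching, there is a decent chance the kernel logged something before the freeze. Below is a rough sketch that scans a saved kernel log for NVIDIA Xid messages and PCIe AER (Advanced Error Reporting) lines. The log line formats are what I would expect from the NVIDIA driver and the Linux PCIe port driver, so treat the patterns as assumptions and adjust them to what your log actually shows. The function name and the file name in the usage note are made up for illustration.

```python
import re
import sys

# Assumed format of NVIDIA driver Xid lines, e.g.
#   NVRM: Xid (PCI:0000:05:00): 79, pid=..., GPU has fallen off the bus.
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+)")

# Substrings that usually mark PCIe Advanced Error Reporting messages.
AER_MARKERS = ("AER:", "PCIe Bus Error")


def scan_kernel_log(text):
    """Return a list of dicts describing Xid and AER lines found in the log text."""
    events = []
    for line in text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append({
                "kind": "xid",
                "device": match.group(1),
                "code": int(match.group(2)),
                "line": line.strip(),
            })
        elif any(marker in line for marker in AER_MARKERS):
            events.append({
                "kind": "aer",
                "device": None,
                "code": None,
                "line": line.strip(),
            })
    return events


# Only runs when a log file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as handle:
        for event in scan_kernel_log(handle.read()):
            print(event["kind"], event["code"], event["line"])
```

After a freeze and reboot, something like `journalctl -k -b -1 --no-pager > prev_boot.log` should dump the previous boot's kernel messages (assuming the journal is persistent), and then you can run the script on `prev_boot.log`. Xid 79 is the "GPU has fallen off the bus" error, which would point at the riser or power rather than lane count. A hard lockup may leave nothing in the log at all, so an empty result doesn't clear the risers.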
 
These "risers" are interesting. I don't follow this area much, so I'm asking: were these made/designed for mining? Also, do GPUs work at 100% capacity when using these risers for any kind of work?
 
Ah, those are x1 risers, shielded because of the USB3 cable. They work for mining but they're kind of useless for machine learning. You'll need the full bandwidth of the slot, so you'll need x16 risers.

I'm not sure if you'll see performance degradation by dropping down to x8 or x4 with ML. Maybe other members have experience and can comment.

But yeah, you'll need a HEDT platform if you want more than two x8 slots, which is what all consumer motherboards top out at.
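
To put rough numbers on the x1 versus x16 difference, here is a back-of-the-envelope sketch. It uses the per-lane signalling rates for PCIe Gen3 and Gen4 (8 and 16 GT/s with 128b/130b encoding) and ignores protocol overhead, so real throughput will be lower. The 24 GB figure is simply the VRAM of a 3090, used as an example payload. Which generation an x1 USB-cable riser actually runs at is an assumption you would need to check.

```python
# Per-lane raw rate in GT/s and line-encoding efficiency per PCIe generation.
PCIE_GEN = {
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}


def link_bandwidth_gbps(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gt, efficiency = PCIE_GEN[gen]
    # Each transfer carries one bit per lane; divide by 8 to get bytes.
    return rate_gt * efficiency / 8 * lanes


def transfer_seconds(size_gb, gen, lanes):
    """Best-case time in seconds to move size_gb gigabytes over the link."""
    return size_gb / link_bandwidth_gbps(gen, lanes)


if __name__ == "__main__":
    payload_gb = 24  # example: filling the VRAM of one 3090
    for gen in (3, 4):
        for lanes in (1, 4, 8, 16):
            bandwidth = link_bandwidth_gbps(gen, lanes)
            seconds = transfer_seconds(payload_gb, gen, lanes)
            print(f"Gen{gen} x{lanes:<2}: {bandwidth:6.2f} GB/s, "
                  f"{payload_gb} GB in about {seconds:5.1f} s")
```

By this estimate, a Gen3 x1 link tops out just under 1 GB/s, so loading a full 24 GB takes tens of seconds at best instead of about a second on a Gen4 x16 link. That hurts model loading and any traffic between GPUs, but a slow link on its own would be a bottleneck rather than an obvious reason for a hard freeze.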
 
Yes, they were primarily used for mining. They allow GPUs to be connected to the motherboard externally.

No, they work at x1, so they cause a bottleneck. Some of them work at x16, though.
 
I understand that I cannot use those risers at x16, but I am making do. I need VRAM; speed is not that big of a concern.

Are you sure my setup cannot handle 4 GPUs? I was thinking of using a PCIe extender cable.
I actually did read this before, but I still went for the x1 riser.
 
x1 is not working with the 4th GPU. And yes, I was thinking about buying a PCIe extender cable.

I thought I would first go for a riser that has been proven to work, and then go for a cable like this. Thanks for the suggestion.
 
@omkar13 I mostly work with LLMs. I play around with them, and my company recently asked me to fine-tune Llama 3.1 8B on a product's documentation, which I did using Unsloth.