GPU requirements

#32
by jmoneydw - opened

Perhaps I've missed it, but what are the GPU requirements for this model? I have an A100, and when running it through vLLM it reports that the model took 39 GB. Is that right? The A100 has 40 GB of memory, so that leaves almost nothing for anything else.
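For anyone checking the same thing, here's a quick sketch of how to read actual device usage with plain torch (assumes CUDA is available; this is just how I'm sanity-checking the number vLLM reports):

```python
import torch

# Report how much of the device's memory is currently in use.
# This counts everything resident on the GPU, not just this process.
free, total = torch.cuda.mem_get_info()
print(f"~{(total - free) / 2**30:.1f} GB used of {total / 2**30:.1f} GB")
```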

That does not sound right for the full-precision model: you would need around 300 GB of VRAM to run it at full precision. With 40 GB, however, I think it's impossible to run even with quantization.
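For a rough sense of scale: weight memory is roughly parameter count × bytes per parameter, before KV cache, activations, and CUDA context. A quick sketch, back-solving a hypothetical ~80B parameter count from the ~300 GB fp32 figure above (an assumption, not a published spec for this model):

```python
# Weight-only VRAM estimate: parameters * bytes per parameter.
# KV cache, activations, and the CUDA context all add on top of this.
def weight_vram_gb(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 2**30

params = 80e9  # hypothetical; back-solved from the ~300 GB fp32 figure
for name, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name:10s} ~{weight_vram_gb(params, nbytes):4.0f} GB")
# fp32 ~298 GB, fp16 ~149 GB, int8 ~75 GB, int4 ~37 GB. Even int4
# weights alone nearly fill a 40 GB A100 before any KV cache.
```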

At least with vLLM, it's taking 32 GB per GPU for "model loading", and then PyTorch appears to consume the rest until you hit a CUDA OOM. I tried setting `--dtype half` to quantize it, but it didn't seem to make much difference memory-wise, so I assumed I'd configured it wrong; still reading.
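For what it's worth, `--dtype half` isn't quantization: it just loads the weights in fp16, which for a checkpoint already stored in fp16/bf16 matches vLLM's `auto` default, so it wouldn't change the footprint. Getting below that needs an actually quantized checkpoint. A minimal sketch with vLLM's offline API, assuming a hypothetical AWQ checkpoint exists for this model:

```python
from vllm import LLM

# dtype="half" only casts weights to fp16; for an fp16/bf16 checkpoint
# this is the same as the "auto" default, so it saves no memory.
# Real savings come from a quantized checkpoint plus the quantization flag.
llm = LLM(
    model="org/model-AWQ",        # hypothetical AWQ checkpoint (assumption)
    quantization="awq",           # tell vLLM the weights are AWQ-quantized
    dtype="half",                 # compute dtype; AWQ kernels expect fp16
    gpu_memory_utilization=0.90,  # cap vLLM's slice of the 40 GB card
    max_model_len=4096,           # smaller context window, smaller KV cache
)
print(llm.generate("Hello")[0].outputs[0].text)
```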
