Code snippet incredibly slow - can someone give some tips?

#77 opened by NatanRajch

This code took 3 hours to produce an answer on a machine with 13 GB of RAM. I'm new to the field, so can someone tell me whether this is expected behaviour, or whether I should tweak something in the code?
How do HuggingChat, Replicate, and other chat services get such quick responses?
Thank you in advance for your help!

import transformers
import torch

access_token = 'MY_TOKEN'
model_id = "meta-llama/Meta-Llama-3-8B"

# Load the 8B model in bfloat16; device_map="auto" lets accelerate place it
# on whatever hardware is available (GPU if present, otherwise CPU).
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
    token=access_token,
)

# Run generation on the prompt.
pipeline("Hey how are you doing today?")
