First Word Ignored Issue / Single Word Instruction

#11
by pandora-s - opened

Okay, so this problem might actually be a base model issue too. After some testing, I found that the model completely ignores single-word instructions that are lowercase and in English, or at least a lot of them. From what I tested, the base model shows the same behavior, ignoring single-word completion requests. I'm trying to get feedback from other users to find out what exactly is causing this.

By "ignoring", I mean it answers without taking the word into consideration at all.
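When checking many replies by hand, that definition can be approximated automatically. A minimal sketch, assuming "ignored" simply means the reply never mentions the prompt word (case-insensitive); `ignores_word` is a hypothetical helper, not part of any Mistral tooling, and the heuristic is crude: it counts "screaming" as a mention of "scream", but would also count "crystal" as a mention of "cry".

```python
import re


def ignores_word(word: str, response: str) -> bool:
    """Hypothetical heuristic: a reply 'ignores' the word if the word never
    appears in it. The regex anchors only at the start of a word, so
    inflections ("screaming" for "scream") count as mentions."""
    pattern = r"\b" + re.escape(word)
    return re.search(pattern, response, flags=re.IGNORECASE) is None


if __name__ == "__main__":
    # Abridged from the Mixtral 8x22B reply to "cry" quoted later in this thread.
    reply = 'Hello! How can I help you today? ... related to the word "hello"?'
    print(ignores_word("cry", reply))  # True
    # Made-up reply, for illustration only.
    print(ignores_word("scream", "Screaming can help release stress."))  # False
```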

I've been experimenting and here is what I saw:

  • With the Instruct, 90% of English words in lowercase end up with nonsense responses (words like: sad, jump, high, scream...), while capitalized ones work, "Scream" for example.
  • With the base, no MATTER what word I choose from the English dictionary, it always answers/completes with "Error" + something; with other languages it works fine.

Feedback from someone else (shout-out to Saran Gangster on Discord): https://pastebin.com/a5ENQU8b

100 words, 100 responses (instruct api):
https://pastebin.com/xxtFJ2WA

Forget the uppercase/lowercase theory: it seems to happen with both. It might be related to the number of tokens rather than to words, as one might have guessed.
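One way to check the token-count idea against the casing one is to tabulate how often replies ignore the word, grouped by how many tokens the word turns into. A minimal sketch, assuming you already have a token count per word (from whatever tokenizer the model uses) and an ignored/not-ignored label per reply; `ignore_rate_by_token_count` is a hypothetical helper and does no tokenization itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def ignore_rate_by_token_count(
    records: Iterable[Tuple[str, int, bool]],
) -> Dict[int, float]:
    """Group (word, token_count, ignored) records by token count and return
    {token_count: fraction of replies that ignored the word}."""
    totals: Dict[int, int] = defaultdict(int)
    ignored: Dict[int, int] = defaultdict(int)
    for _word, n_tokens, was_ignored in records:
        totals[n_tokens] += 1
        ignored[n_tokens] += int(was_ignored)
    return {n: ignored[n] / totals[n] for n in sorted(totals)}


if __name__ == "__main__":
    # Placeholder records for illustration only, not measured results.
    records = [("alpha", 1, True), ("beta", 1, False), ("gamma", 2, True)]
    print(ignore_rate_by_token_count(records))  # {1: 0.5, 2: 1.0}
```

If the rate differs sharply between token counts but not between "sad" and "Sad", that would point away from casing.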


I'm using the base on together.ai, so it would be nice if someone could run it locally and confirm my experiments.

I've observed the same behavior; however, I'm using a quantized version of the model and wasn't sure if that had anything to do with it.

Thank you for posting your observations here.

Very surprised nobody else is talking about this issue... Maybe it's just us?

Edit: Yep, it could just be an issue with exl2 quantized models. I don't have this issue with GGUF.

Not sure it's only us, because I and others from Discord used Mistral's API, TogetherAI and other platforms, and from what I understand this seems to be a real issue. Or it's a tokenizer or prompt template issue, but... I kind of doubt it. I'd have to test the base model locally at some point.

Mistral AI org

I'm not sure I follow. Are you saying it's ignoring the first word when it's lowercase? Is the instruction only a single word? Could you send an example prompt so we can try to reproduce it?

Well, it's actually even simpler than that (I was previously wrong): basically, I'm having problems with one-word instructions, as shown here:

100 words, 100 responses (instruct api):
https://pastebin.com/xxtFJ2WA

These were made using the Mistral API.

I'm not sure whether it's together.ai or the base model, but when I played around with single-word completions on the base model, I also saw a few issues.

Mistral AI org

Have you tried Le Chat? Did you see the same behavior with Le Chat?

I only noticed this with Mixtral 8x22.

A good example of this issue with the Instruct model is this one:

User: cry
Mixtral8x22: Hello! How can I help you today? Is there a particular topic or question you have in mind related to the word "hello"? If you're looking for more information about the origin of the word "hello," it actually has an interesting history. The word "hello" was first used in writing in 1826, but it didn't become a common greeting until the invention of the telephone in the late 19th century. The use of "hello" as a telephone greeting was popularized by Thomas Edison, who recommended it as a more formal alternative to "hullo," which was commonly used in England at the time.

If you have any other questions or topics you'd like to discuss, please let me know! I'm here to help with any information or resources you might need.

This never happens with Mixtral8x7 and most other models, so I have a feeling it's not normal behavior.

Mistral AI org

Yeah, seems like an issue. We are looking into it!

Yeah, I noticed a few strange things about this model, like it stopping mid-sentence. It could be the Q4_K_M quant, but the strange thing is that the Q4_K_M quant of wizard-lm-2 is rock solid and has no problems.


Some of my problems were due to missing the space between <s> and [INST]:

<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]

Now that I've added it, it seems to be working fine. Or at least better, but it still cuts off mid-sentence sometimes...
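For anyone building raw prompts by hand, the template quoted above can be rendered from a list of turns so the spacing stays consistent. A minimal sketch, assuming the exact string layout from that post (a space after `<s>` and before each `[INST]`); `build_prompt` is a hypothetical helper, and the official tokenizer or chat template may handle `<s>` and whitespace differently, so treat the spacing as the poster's observation rather than a specification.

```python
from typing import List, Optional, Tuple


def build_prompt(turns: List[Tuple[str, Optional[str]]]) -> str:
    """Hypothetical helper: render (instruction, answer) pairs as
    '<s> [INST] ... [/INST] answer</s> [INST] ... [/INST]'.
    Use None as the last answer to leave the turn open for the model."""
    parts = ["<s>"]
    for instruction, answer in turns:
        # Space before [INST], as in the template quoted in the post.
        parts.append(f" [INST] {instruction} [/INST]")
        if answer is not None:
            parts.append(f" {answer}</s>")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt([("cry", None)]))  # <s> [INST] cry [/INST]
```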

This model doesn't work right at all. It just produces garbage, even with the examples on the main model page.


I don't think this is the same issue. What exactly is the problem? You can test the model online on demo websites and such, and it works nicely; this is a pretty specific issue. Maybe open a new conversation/issue and describe the problem in detail, and someone might be able to help 🔥.
