First Word Ignored Issue / Single Word Instruction

#11

by pandora-s - opened Apr 17

Apr 17

•

Okay, so this problem might also be a base model issue. After some testing, the model completely ignores single-word instructions that are lowercase and in English, or at least A LOT of them. On the base model I saw the same behavior: it ignores single-word completion requests. I'm trying to get feedback from other users to find out what exactly the source of this issue is.

pandora-s

Apr 17

By ignoring, I mean it answers without taking the word into consideration at all.

pandora-s

Apr 17

•

edited Apr 17

I've been experimenting and here is what I saw:

With the Instruct model, 90% of lowercase English words end up with nonsense responses (words like sad, jump, high, scream), while capitalized it works (Scream, for example).
With the base model, no MATTER what word I choose from the English dictionary, it always answers/completes with "Error" + something; with other languages it works fine.

pandora-s

Apr 17

Feedback from someone, hats off to Saran Gangster on Discord: https://pastebin.com/a5ENQU8b

pandora-s

Apr 17

•

edited Apr 18

100 words, 100 responses (instruct api):
https://pastebin.com/xxtFJ2WA

Forget the "uppercase"/"lowercase" theory: it seems to happen with both. It might be related to token count rather than to the words themselves, as one might have guessed.
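
One way to make the 100-word check repeatable is a small harness that sends each word as the whole instruction and flags replies that never mention it. This is only a sketch: `generate` is a hypothetical callable standing in for whichever client is used (Mistral API, together.ai, a local model), and "the reply does not contain the word" is a crude proxy for "ignored", not a definitive test (a reply containing "crying" would not count as mentioning "cry", for instance).

```python
import re
from typing import Callable, Dict, Iterable


def mentions_word(word: str, reply: str) -> bool:
    """Return True if the reply contains the word as a whole word (case-insensitive)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, reply, flags=re.IGNORECASE) is not None


def find_ignored(words: Iterable[str], generate: Callable[[str], str]) -> Dict[str, str]:
    """Send each word as a single-word instruction; collect replies that never mention it."""
    ignored = {}
    for word in words:
        reply = generate(word)
        if not mentions_word(word, reply):
            ignored[word] = reply
    return ignored


if __name__ == "__main__":
    # Stand-in for a real model call (assumption: replace with your own client).
    def fake_generate(prompt: str) -> str:
        canned = {"cry": "Hello! How can I help you today?"}
        return canned.get(prompt, f"Here is some information about {prompt}.")

    for word, reply in find_ignored(["cry", "Scream", "jump"], fake_generate).items():
        print(f"{word!r} ignored -> {reply!r}")
```

Passing the model call in as a function keeps the check independent of any one provider, so the same word list can be replayed against the Instruct API, the base model, and quantized local builds.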

pandora-s

Apr 17

I've been experimenting and here is what I saw:

With the Instruct model, 90% of lowercase English words end up with nonsense responses (words like sad, jump, high, scream), while capitalized it works (Scream, for example).

With the base model, no MATTER what word I choose from the English dictionary, it always answers/completes with "Error" + something; with other languages it works fine.

I'm using the base model on together.ai, so it would be nice if someone could run it locally and confirm my experiments.

Thireus

Apr 18

•

edited Apr 18

I've observed the same behavior; however, I'm using a quantized version of the model and wasn't sure whether that had anything to do with it.

Thank you for posting your observations here.

Thireus

Apr 18

•

edited Apr 18

Very surprised nobody else is talking about this issue... Maybe it's just us?

Edit: Yep, could be just an issue with exl2 quantized models. I don't have this issue with GGUF

pandora-s

Apr 19

Not sure about it being only us, because I and a few others from Discord used Mistral's API, TogetherAI and other platforms, and from what I understand this seems to be an actual issue. Or it's a tokenizer or prompt template issue, but I kind of doubt it; I would have to test the base model locally at some point.

sophiamyang

Mistral AI_ org Apr 19

I'm not sure if I follow. Are you saying it's ignoring the first word when it's lower case? Is the instruction only one single word? Could you send an example prompt that we can try to reproduce?

pandora-s

Apr 19

•

edited Apr 19

Well, it's actually even simpler than that (I was previously wrong). Basically, I'm having problems with single-word instructions, as shown here:

100 words, 100 responses (instruct api):
https://pastebin.com/xxtFJ2WA

These were made using the Mistral API.

pandora-s

Apr 19

And I am not sure if it's together.ai or the base model, but when I played around with the base model using single-word completions I also saw a few issues.

sophiamyang

Mistral AI_ org Apr 19

Have you tried Le Chat? Did you see the same behavior with Le Chat?

pandora-s

Apr 19

I only noticed this with Mixtral 8x22

pandora-s

Apr 19

A good example of this issue with the Instruct model:

User: cry
Mixtral8x22: Hello! How can I help you today? Is there a particular topic or question you have in mind related to the word "hello"? If you're looking for more information about the origin of the word "hello," it actually has an interesting history. The word "hello" was first used in writing in 1826, but it didn't become a common greeting until the invention of the telephone in the late 19th century. The use of "hello" as a telephone greeting was popularized by Thomas Edison, who recommended it as a more formal alternative to "hullo," which was commonly used in England at the time.

If you have any other questions or topics you'd like to discuss, please let me know! I'm here to help with any information or resources you might need.

This is something that never happens with Mixtral8x7 and most other models, so I have a feeling this is not normal.

sophiamyang

Mistral AI_ org Apr 19

yeah seems like an issue. we are looking into it!

jukofyork

Apr 21

Yeah, I noticed a few strange things about this model like it stopping mid sentence. It could be the Q4_K_M quant but the strange thing is the Q4_K_M quant of wizard-lm-2 is rock solid and has no problems?

jukofyork

Apr 23

•

edited Apr 23

Yeah, I noticed a few strange things about this model like it stopping mid sentence. It could be the Q4_K_M quant but the strange thing is the Q4_K_M quant of wizard-lm-2 is rock solid and has no problems?

Some of my problems were due to missing the space between <s> and [INST]:

<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]

~~Now I've added that it seems to be working fine.~~ Working better, but still cutting off mid-sentence sometimes...
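
To avoid hand-typing the template, the prompt can be assembled in code. The sketch below reproduces the format quoted above, including the space after `<s>`. `build_prompt` is a hypothetical helper, not part of any Mistral library, and it assumes the backend does not prepend its own `<s>` token on top of the string.

```python
from typing import List, Optional, Tuple


def build_prompt(turns: List[Tuple[str, Optional[str]]]) -> str:
    """Build a prompt string in the format quoted above.

    Each turn is (instruction, answer); use None as the answer for the
    final turn that the model should complete.
    """
    prompt = "<s>"
    for instruction, answer in turns:
        # Note the leading space: "<s> [INST]" rather than "<s>[INST]".
        prompt += f" [INST] {instruction} [/INST]"
        if answer is not None:
            prompt += f" {answer}</s>"
    return prompt


if __name__ == "__main__":
    print(build_prompt([("Instruction", "Model answer"), ("Follow-up instruction", None)]))
```

For a single-word test, `build_prompt([("cry", None)])` gives `<s> [INST] cry [/INST]`, which makes it easy to compare runs with and without the space.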

clint-greene

Apr 25

This model doesn't work right at all. Just produces garbage even when using their examples on the main model page.

pandora-s

Apr 25

This model doesn't work right at all. Just produces garbage even when using their examples on the main model page.

I don't think this is the same issue. What exactly is the problem? You can test the model online on demo websites and it works nicely; this is a pretty specific issue. Maybe open a new conversation/issue and describe the problem, and someone might be able to help 🔥.

headmyshoulder

26 days ago
