Recent comments in /f/MachineLearning

toothpastespiders t1_jc01mr9 wrote

> BUT someone has already made a webUI like the automatic1111 one!

There's also a subreddit for it over at /r/Oobabooga that deserves more attention. I've only had a little time to play around with it, but it's a pretty sleek system from what I've seen.

> it looked really complicated for me to set up with 4-bit weights

I'd like to say the warnings make it more intimidating than it really is. For me it was just copying and pasting four or five lines into a terminal. Then again, I also couldn't get it to work, so I might be doing something wrong. I'm guessing my weirdo GPU just isn't accounted for somewhere. I'm going to bang my head against it when I've got time, because it's frustrating having tons of VRAM to spare and not getting the most out of it.

6

megacewl t1_jbzts4h wrote

I think so? Converting the model to 8-bit or 4-bit weights literally shrinks it (and, surprisingly, this barely changes the output quality at all), which is why it requires less VRAM to load.
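If it helps to see why the shrinking works, here's a rough numpy sketch of naive round-to-nearest 4-bit quantization. To be clear, this is not what GPTQ actually does (GPTQ minimizes layer-wise error much more carefully); it's just an illustration of why the storage drops to roughly a quarter of fp16 while the reconstructed weights stay close to the originals:

```python
import numpy as np

# One fake fp32 weight matrix standing in for a transformer layer.
w = np.random.randn(4096, 4096).astype(np.float32)

def quantize_rtn(w, bits=4, group=128):
    """Naive round-to-nearest quantization with one scale per group of weights."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit
    w_g = w.reshape(-1, group)
    scale = np.abs(w_g).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w_g / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

q, scale = quantize_rtn(w)
w_hat = (q * scale).reshape(w.shape)                # dequantized approximation

fp16_bytes = w.size * 2                             # 2 bytes per weight
int4_bytes = w.size // 2 + scale.size * 2           # 4 bits per weight + fp16 scales
print(f"fp16: {fp16_bytes / 1e6:.1f} MB   4-bit: {int4_bytes / 1e6:.1f} MB")
print(f"mean |w - w_hat|: {np.abs(w - w_hat).mean():.4f}")
```

The per-weight error stays small relative to the weight scale, which is roughly why the output quality barely moves.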

There are tutorials for setting up llama.cpp with 4-bit converted LLaMA models, which may be worth checking out to help you achieve your goal. llama.cpp is an implementation of LLaMA inference in C++ that runs on the CPU and system RAM. Someone got the 7B model running on a Raspberry Pi 4 with 4GB of RAM, so it's a good option if you're low on VRAM.
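As a back-of-the-envelope check on why that's feasible, the weights-only memory for a roughly 7B-parameter model works out like this (the KV cache and runtime overhead come on top, so treat these numbers as a floor):

```python
# Approximate weights-only memory for a ~7B-parameter model.
params = 7e9
for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1024**3:.1f} GB")
# -> fp16: ~13.0 GB, int8: ~6.5 GB, int4: ~3.3 GB
```

So 4-bit is what gets 7B anywhere near a 4GB board or a small GPU in the first place.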

2

MinaKovacs t1_jbzso7m wrote

One of the few things we know for certain about the human brain is that it is nothing like a binary computer. Ask any neuroscientist and they will tell you we still have no idea how the brain works. The brain operates at a quantum level, manifested in mechanical, chemical, and electromagnetic characteristics, all at the same time. It is not a ball of transistors.

0

remghoost7 t1_jbzqf5m wrote

Most excellent. Thank you so much! I will look into all of these.

Guess I know what I'm doing for the rest of the day. Time to make more coffee! haha.

You are my new favorite person this week.

Also, one final question, if you will. What's so unique about the 4-bit weights, and why would you prefer to run the model that way? Is it just about VRAM requirements...? I'm decently versed in Stable Diffusion, but LLMs are fairly new territory for me.

My question seems to have been answered here, and it is a VRAM limitation. Also, that last link seems to support 4-bit models as well. Doesn't seem too bad to set up... though I installed A1111 when it first came out, so I learned through the garbage of that. Lol. I was wrong. Oh so wrong. haha.

Yet again, thank you for your time and have a wonderful rest of your day. <3

4

Amazing_Painter_7692 OP t1_jbzoq05 wrote

There's an inference engine class if you want to build out your own API:

https://github.com/AmericanPresidentJimmyCarter/yal-discord-bot/blob/main/bot/llama_model/engine.py#L56-L96
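In case it's useful, here's a very rough, hypothetical sketch of what "building out your own API" around an engine like that could look like. The generate() stub is made up for illustration and is not the actual interface in the linked engine.py:

```python
# Hypothetical sketch: a tiny HTTP endpoint wrapped around an inference engine.
# Replace generate() with real calls into the engine class from the linked repo.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    # Stand-in for the engine call; the real class's methods may look different.
    return prompt + " ... (model output would go here)"

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        text = generate(body.get("prompt", ""))
        payload = json.dumps({"text": text}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```

Then you can POST {"prompt": "..."} to it from whatever frontend you like.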

And there's a simple text inference script here:

https://github.com/AmericanPresidentJimmyCarter/yal-discord-bot/blob/main/bot/llama_model/llama_inference.py

Or in the original repo:

https://github.com/qwopqwop200/GPTQ-for-LLaMa

BUT someone has already made a webUI like the automatic1111 one!

https://github.com/oobabooga/text-generation-webui

Unfortunately it looked really complicated for me to set up with 4-bit weights, and I tend to do everything over a Linux terminal. :P

15