Recent comments in /f/MachineLearning
ShowerVagina t1_jbz7ts9 wrote
Reply to comment by pyepyepie in [N] AtMan could solve the biggest problem of ChatGPT by Number_5_alive
So how would this affect real world usage?
Amazing_Painter_7692 OP t1_jbz7hta wrote
Reply to comment by 3deal in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
It's the HuggingFace transformers module version of the weights from Meta/Facebook Research.
RedditLovingSun t1_jbz78cm wrote
Reply to comment by MinaKovacs in [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
Depends on your definition of intelligence, the human brain is nothing but a bunch of neurons passing electrical signals to each other, I don't see why it's impossible for computers to simulate something similar to achieve the same results as a brain does.
pyepyepie t1_jbz766k wrote
Reply to comment by ShowerVagina in [N] AtMan could solve the biggest problem of ChatGPT by Number_5_alive
Correct me if I am wrong, I didn't read the whole paper yet - they mask tokens out and see how it changes the loss, plus some trick I didn't have the energy to dig into. It's not going to change the world. It's similar to this one: https://christophm.github.io/interpretable-ml-book/pixel-attribution.html
Toilet_Assassin t1_jbz6bjm wrote
Reply to comment by TywinASOIAF in [D] Statsmodels ARIMA model predict function not working by ng_guardian
What do you mean when you say it can't handle hour data? I haven't run into any issues with it as of yet.
3deal t1_jbz6b91 wrote
Reply to [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Wait, the https://huggingface.co/decapoda-research/llama-13b-hf-int4/resolve/main/llama-13b-4bit.pt is the Facebook one ?
Is it fully open now ?
ShowerVagina t1_jbz680l wrote
Reply to comment by currentscurrents in [N] AtMan could solve the biggest problem of ChatGPT by Number_5_alive
Can you explain this like I'm 5?
pyepyepie t1_jbz57hd wrote
Reply to comment by currentscurrents in [N] AtMan could solve the biggest problem of ChatGPT by Number_5_alive
To be fair the paper looks interesting, the news title is garbage but it's not the fault of the authors who did a pretty cool job. Anyway, it seems like a nice application of a very well-known idea, which is cool.
By the way, is measuring the perturbation influence on the loss a common idea? Because I am mostly aware of using it to see how the regression value or class probabilities change - and the perturbation is done on the inputs, not params (edit ** incorrect, they do the perturbation on the inputs).
edit: "We follow the results of the studies [Koh and Liang, 2017; Bis et al., 2021] to approximate the perturbation effect directly through the model’s parameters when executing Leaving-One-Out experiments on the input. The influence function estimating the perturbation of an input z is then derived as:" - seems like I misunderstood it due to their notation. Seems like a pretty regular method.
bpw1009 t1_jbz4d57 wrote
Here's ChatGPT's take on it for what it's worth 😂:
Yes, the notation you're looking for is "top k argmax". It's a common notation used in machine learning and optimization.
Formally, if you have a function f(x) over a set X, the top k argmax of f(x) is the set of k elements in X that maximize f(x). The notation is usually written as:
argmax_{x\in X} f(x) = {x_1, x_2, ..., x_k}
where x_1, x_2, ..., x_k are the k elements in X that maximize f(x).
Note that if k=1, then the top k argmax reduces to the usual argmax notation.
bmrheijligers t1_jbz402l wrote
AnAtMan
SpaceCockatoo t1_jbz2mns wrote
Reply to [P] vanilla-llama an hackable plain-pytorch implementation of LLaMA that can be run on any system (if you have enough resources) by poppear
Any plans to implement 4/8-bit quantization?
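For anyone wondering what the 4-bit scheme being asked about boils down to: each float weight gets mapped to one of 16 levels. A minimal uniform-quantization sketch in Python (real implementations like GPTQ are much smarter about minimizing error; the function names here are made up for illustration):

```python
def quantize_4bit(weights):
    """Uniform 4-bit quantization sketch (NOT GPTQ): map floats to 16
    levels between min and max, store integer codes plus scale/offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0  # 16 levels -> 15 steps; avoid div-by-zero
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    # Reconstruct approximate float weights from the 4-bit codes
    return [c * scale + lo for c in codes]

codes, scale, lo = quantize_4bit([-1.0, -0.5, 0.0, 0.5, 1.0])
approx = dequantize(codes, scale, lo)
```

The reconstruction error per weight is bounded by half a step, which is why 4-bit works surprisingly well when the weight range per layer (or per group) is narrow.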
MinaKovacs t1_jbz2gqw wrote
Reply to comment by hebekec256 in [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
I think the math clearly doesn't work out; otherwise, Google would have monetized it already. ChatGPT is not profitable or practical for search. The cost of hardware, power consumption, and slow performance are already at the limits. It will take something revolutionary, beyond binary computing, to make ML anything more than expensive algorithmic pattern recognition.
charlesrwest t1_jbz1w0u wrote
Reply to [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
Isn't that more or less what GPT-3 was? As I recall, most of the really big models cost millions to train?
currentscurrents t1_jbz1hbw wrote
TL;DR they suppress one token at a time and map how it affects the cross-entropy loss. Tokens which have a big impact must have been important for the output. It reminds me of older techniques for image explainability.
Paper link: https://arxiv.org/abs/2301.08110
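That suppress-and-measure loop is easy to sketch. A toy Python version (the `toy_model` here is a made-up stand-in for a language model, not the paper's actual method):

```python
import math

def cross_entropy(probs, target):
    # Negative log-likelihood of the target token
    return -math.log(probs[target])

def toy_model(tokens):
    # Hypothetical stand-in for an LM: returns a distribution over a
    # 3-token vocabulary, biased by which input tokens are present.
    score = [1.0, 1.0, 1.0]
    for t in tokens:
        score[t % 3] += 2.0
    total = sum(score)
    return [s / total for s in score]

def token_importance(tokens, target):
    """Leave-one-out attribution: suppress each input token in turn and
    measure how much the loss on the target token increases."""
    base_loss = cross_entropy(toy_model(tokens), target)
    importances = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + tokens[i + 1:]
        importances.append(cross_entropy(toy_model(perturbed), target) - base_loss)
    return importances

imp = token_importance([0, 1, 2, 0], target=0)
```

A large positive score means removing that token hurt the prediction, i.e. the token was important; negative scores mean the token was actually pulling probability away from the target.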
clementiasparrow t1_jbz0x4x wrote
Reply to comment by WesternLettuce0 in [D] Simple Questions Thread by AutoModerator
I think the standard solution would be concatenating the two embeddings and putting a dense layer on top
hebekec256 OP t1_jbz0mpm wrote
Reply to comment by MinaKovacs in [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
Yes, I understand that. But LLMs and extensions of LLMs (like PaLM-E) are a heck of a lot more than an abacus. I wonder what would happen if Google just said, "screw it", and scaled it from 500B to 50T parameters. I'm guessing there are reasons in the architecture that it would just break, otherwise I can't see why they wouldn't do it, since the risk to reward ratio seems favorable to me
MinaKovacs t1_jbyzv1v wrote
Reply to [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
A binary computer is nothing more than an abacus. It doesn't matter how much you scale up an abacus, it will never achieve anything even remotely like "intelligence."
ML4Bratwurst t1_jbyzell wrote
Reply to [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Can't wait for the 1 bit quantization
Icy-Curve2747 t1_jbyyux1 wrote
Very interesting, thanks for sharing. I can’t wait for explainable AI to catch up with the rest of ML
Philiatrist t1_jbyyqo8 wrote
There’s not a pretty way that I know of. You could just do:
{i | a_i in topk(A, 5)}
where topk is defined
topk(A, 1) = {max(A)}
topk(A, i + 1) = {max(A \ topk(A, i))} U topk(A, i)
You can modify the first expression to deal with duplicates.
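That recursive definition translates almost directly to Python. A quick sketch (assumes distinct values, as noted above; duplicates would need index-based tie-breaking):

```python
def topk(values, k):
    """Set of the k largest values, following the recursive definition:
    topk(A, i + 1) = {max(A \ topk(A, i))} U topk(A, i)."""
    if k == 0:
        return set()
    rest = topk(values, k - 1)
    return rest | {max(set(values) - rest)}

def topk_argmax(values, k):
    # The index set {i | a_i in topk(A, k)}
    chosen = topk(values, k)
    return {i for i, a in enumerate(values) if a in chosen}

idx = topk_argmax([0.1, 0.9, 0.4, 0.7, 0.2], 2)
```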
Bulky_Highlight_3352 t1_jbyxv8s wrote
I've tried Aleph's playground and mostly saw it generate complete garbage. Not sure how they will solve any of ChatGPT's problems.
Illustrious-Bar5621 t1_jbyxpbh wrote
Could just do something like
$ \argmax_{I \subset [n]: |I| = k } \sum_{i \in I} f(i) $ , where $ [n] = \{1,2,\ldots, n\} $.
dhruv-kadam t1_jbyw9t7 wrote
I love reading these geeky comments even though I don't understand a thing. I love this!!
studpufffin t1_jbyw0ah wrote
Probably by supposing the values are ordered and only taking the values with indices ≤ k.
Username912773 t1_jbz7y8x wrote
Reply to [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
LLMs cannot be sentient as they require input to generate an output and do not have initiative. They are essentially giant probabilistic networks that calculate the probability of the next token or word.
As you scale model size up, you not only need more compute to train it but also more time and data. So, why would anyone just say “screw it” and spend potentially millions or billions of dollars on something that may or may not work and would almost certainly have little monetary return?
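For the curious, the "probability of the next token" step described above looks roughly like this (toy distribution with a fixed seed; the `temperature` knob is an extra illustration, not something the comment mentions):

```python
import random

def next_token(probs, temperature=1.0, rng=None):
    """Sample the next token id from a probability distribution -- the core
    step of autoregressive generation. Toy sketch, not a real LM."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    r = rng.random() * total
    for i, w in enumerate(weights):
        r -= w
        if r < 0:
            return i
    return len(probs) - 1  # guard against float rounding

token = next_token([0.1, 0.2, 0.7])
```

Generation is just this step in a loop: append the sampled token to the context, run the model again, repeat. Nothing in that loop initiates anything on its own, which is the commenter's point.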