Recent comments in /f/MachineLearning
ShowerVagina t1_jbz7ts9 wrote
Reply to comment by pyepyepie in [N] AtMan could solve the biggest problem of ChatGPT by Number_5_alive
So how would this affect real world usage?
Amazing_Painter_7692 OP t1_jbz7hta wrote
Reply to comment by 3deal in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
It's the HuggingFace transformers module version of the weights from Meta/Facebook Research.
RedditLovingSun t1_jbz78cm wrote
Reply to comment by MinaKovacs in [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
Depends on your definition of intelligence, the human brain is nothing but a bunch of neurons passing electrical signals to each other, I don't see why it's impossible for computers to simulate something similar to achieve the same results as a brain does.
pyepyepie t1_jbz766k wrote
Reply to comment by ShowerVagina in [N] AtMan could solve the biggest problem of ChatGPT by Number_5_alive
Correct me if I am wrong, I didn't read the whole paper yet - they mask tokens out and see how it changes the loss, plus some trick I didn't have the energy to dig into. It's not going to change the world. It's similar to this one: https://christophm.github.io/interpretable-ml-book/pixel-attribution.html
Toilet_Assassin t1_jbz6bjm wrote
Reply to comment by TywinASOIAF in [D] Statsmodels ARIMA model predict function not working by ng_guardian
What do you mean when you say it can't handle hour data? I haven't run into any issues with it as of yet.
3deal t1_jbz6b91 wrote
Reply to [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Wait, the https://huggingface.co/decapoda-research/llama-13b-hf-int4/resolve/main/llama-13b-4bit.pt is the Facebook one ?
Is it fully open now ?
ShowerVagina t1_jbz680l wrote
Reply to comment by currentscurrents in [N] AtMan could solve the biggest problem of ChatGPT by Number_5_alive
Can you explain this like I'm 5?
pyepyepie t1_jbz57hd wrote
Reply to comment by currentscurrents in [N] AtMan could solve the biggest problem of ChatGPT by Number_5_alive
To be fair the paper looks interesting, the news title is garbage but it's not the fault of the authors who did a pretty cool job. Anyway, it seems like a nice application of a very well-known idea, which is cool.
By the way, is measuring the perturbation influence on the loss a common idea? Because I am mostly aware of using it to see how the regression value or class probabilities change - and the perturbation is done on the inputs, not params (edit ** incorrect, they do the perturbation on the inputs).
edit: "We follow the results of the studies [Koh and Liang, 2017; Bis et al., 2021] to approximate the perturbation effect directly through the model’s parameters when executing Leaving-One-Out experiments on the input. The influence function estimating the perturbation of an input z is then derived as:" - seems like I misunderstood it due to their notation. Seems like a pretty regular method.
bpw1009 t1_jbz4d57 wrote
Here's ChatGPT's take on it for what it's worth 😂:
Yes, the notation you're looking for is "top k argmax". It's a common notation used in machine learning and optimization.
Formally, if you have a function f(x) over a set X, the top k argmax of f(x) is the set of k elements in X that maximize f(x). The notation is usually written as:
argmax_{x\in X} f(x) = {x_1, x_2, ..., x_k}
where x_1, x_2, ..., x_k are the k elements in X that maximize f(x).
Note that if k=1, then the top k argmax reduces to the usual argmax notation.
bmrheijligers t1_jbz402l wrote
AnAtMan
SpaceCockatoo t1_jbz2mns wrote
Reply to [P] vanilla-llama an hackable plain-pytorch implementation of LLaMA that can be run on any system (if you have enough resources) by poppear
Any plans to implement 4/8-bit quantization?
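For anyone wondering what the 4-bit scheme being asked about boils down to: each float weight gets mapped to one of 16 levels. A minimal uniform-quantization sketch in Python (real implementations like GPTQ are much smarter about minimizing error; the function names here are made up for illustration):

```python
def quantize_4bit(weights):
    """Uniform 4-bit quantization sketch (NOT GPTQ): map floats to 16
    levels between min and max, store integer codes plus scale/offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0  # 16 levels -> 15 steps; avoid div-by-zero
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    # Reconstruct approximate float weights from the 4-bit codes
    return [c * scale + lo for c in codes]

codes, scale, lo = quantize_4bit([-1.0, -0.5, 0.0, 0.5, 1.0])
approx = dequantize(codes, scale, lo)
```

The reconstruction error per weight is bounded by half a step, which is why 4-bit works surprisingly well when the weight range per layer (or per group) is narrow.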
MinaKovacs t1_jbz2gqw wrote
Reply to comment by hebekec256 in [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
I think the math clearly doesn't work out; otherwise, Google would have monetized it already. ChatGPT is not profitable or practical for search. The cost of hardware, power consumption, and slow performance are already at the limits. It will take something revolutionary, beyond binary computing, to make ML anything more than expensive algorithmic pattern recognition.
charlesrwest t1_jbz1w0u wrote
Reply to [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
Isn't that more or less what GPT-3 was? As I recall, most of the really big models cost millions to train?
currentscurrents t1_jbz1hbw wrote
TL;DR they suppress one token at a time and map how it affects the cross-entropy loss. Tokens which have a big impact must have been important for the output. It reminds me of older techniques for image explainability.
Paper link: https://arxiv.org/abs/2301.08110
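That suppress-and-measure loop is easy to sketch. A toy Python version (the `toy_model` here is a made-up stand-in for a language model, not the paper's actual method):

```python
import math

def cross_entropy(probs, target):
    # Negative log-likelihood of the target token
    return -math.log(probs[target])

def toy_model(tokens):
    # Hypothetical stand-in for an LM: returns a distribution over a
    # 3-token vocabulary, biased by which input tokens are present.
    score = [1.0, 1.0, 1.0]
    for t in tokens:
        score[t % 3] += 2.0
    total = sum(score)
    return [s / total for s in score]

def token_importance(tokens, target):
    """Leave-one-out attribution: suppress each input token in turn and
    measure how much the loss on the target token increases."""
    base_loss = cross_entropy(toy_model(tokens), target)
    importances = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + tokens[i + 1:]
        importances.append(cross_entropy(toy_model(perturbed), target) - base_loss)
    return importances

imp = token_importance([0, 1, 2, 0], target=0)
```

A large positive score means removing that token hurt the prediction, i.e. the token was important; negative scores mean the token was actually pulling probability away from the target.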
clementiasparrow t1_jbz0x4x wrote
Reply to comment by WesternLettuce0 in [D] Simple Questions Thread by AutoModerator
I think the standard solution would be concatenating the two embeddings and putting a dense layer on top
hebekec256 OP t1_jbz0mpm wrote
Reply to comment by MinaKovacs in [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
Yes, I understand that. But LLMs and extensions of LLMs (like PaLM-E) are a heck of a lot more than an abacus. I wonder what would happen if Google just said, "screw it", and scaled it from 500B to 50T parameters. I'm guessing there are reasons in the architecture that it would just break, otherwise I can't see why they wouldn't do it, since the risk to reward ratio seems favorable to me
MinaKovacs t1_jbyzv1v wrote
Reply to [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
A binary computer is nothing more than an abacus. It doesn't matter how much you scale up an abacus, it will never achieve anything even remotely like "intelligence."
ML4Bratwurst t1_jbyzell wrote
Reply to [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Can't wait for the 1 bit quantization
Icy-Curve2747 t1_jbyyux1 wrote
Very interesting, thanks for sharing. I can’t wait for explainable AI to catch up with the rest of ML
Philiatrist t1_jbyyqo8 wrote
There’s not a pretty way that I know of. You could just do:
{i | a_i in topk(A, 5)}
where topk is defined
topk(A, 1) = {max(A)}
topk(A, i + 1) = {max(A \ topk(A, i))} U topk(A, i)
You can modify the first expression to deal with duplicates.
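That recursive definition translates almost directly to Python. A quick sketch (assumes distinct values, as noted above; duplicates would need index-based tie-breaking):

```python
def topk(values, k):
    """Set of the k largest values, following the recursive definition:
    topk(A, i + 1) = {max(A \ topk(A, i))} U topk(A, i)."""
    if k == 0:
        return set()
    rest = topk(values, k - 1)
    return rest | {max(set(values) - rest)}

def topk_argmax(values, k):
    # The index set {i | a_i in topk(A, k)}
    chosen = topk(values, k)
    return {i for i, a in enumerate(values) if a in chosen}

idx = topk_argmax([0.1, 0.9, 0.4, 0.7, 0.2], 2)
```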
Bulky_Highlight_3352 t1_jbyxv8s wrote
I've tried Aleph's playground and mostly saw it generate complete garbage. Not sure how they will solve any of ChatGPT's problems.
Illustrious-Bar5621 t1_jbyxpbh wrote
Could just do something like
$ \argmax_{I \subset [n]: |I| = k } \sum_{i \in I} f(i) $ , where $ [n] = \{1,2,\ldots, n\} $.
dhruv-kadam t1_jbyw9t7 wrote
I love reading these geeky comments even though I don't understand a thing. I love this!!
studpufffin t1_jbyw0ah wrote
Probably by supposing the values are ordered and only taking the values with indices ≤ k.
Username912773 t1_jbz7y8x wrote
Reply to [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
LLMs cannot be sentient as they require input to generate an output and do not have initiative. They are essentially giant probabilistic networks that calculate the probability of the next token or word.
As you scale model size up, you not only need more compute to train it but also more time and data. So, why would anyone just say “screw it” and spend potentially millions or billions of dollars on something that may or may not work and would almost certainly have little monetary return?
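For the curious, the "probability of the next token" step described above looks roughly like this (toy distribution with a fixed seed; the `temperature` knob is an extra illustration, not something the comment mentions):

```python
import random

def next_token(probs, temperature=1.0, rng=None):
    """Sample the next token id from a probability distribution -- the core
    step of autoregressive generation. Toy sketch, not a real LM."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    r = rng.random() * total
    for i, w in enumerate(weights):
        r -= w
        if r < 0:
            return i
    return len(probs) - 1  # guard against float rounding

token = next_token([0.1, 0.2, 0.7])
```

Generation is just this step in a loop: append the sampled token to the context, run the model again, repeat. Nothing in that loop initiates anything on its own, which is the commenter's point.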