Recent comments in /f/MachineLearning

Username912773 t1_jbz7y8x wrote

LLMs cannot be sentient as they require input to generate an output and do not have initiative. They are essentially giant probabilistic networks that calculate the probability of the next token or word.
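To make that concrete, here is a minimal sketch of what "calculate the probability of the next token" means in practice; the tiny vocabulary and logit values are made up purely for illustration:

```python
import numpy as np

# Toy vocabulary and the raw scores (logits) a model might emit for the next token.
vocab = ["the", "cat", "sat", "mat"]
logits = np.array([1.2, 0.3, 2.5, 0.1])  # hypothetical values, not from any real model

# Softmax turns logits into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# The model then picks (or samples) the next token from that distribution.
next_token = vocab[int(np.argmax(probs))]
print(dict(zip(vocab, np.round(probs, 3))), "->", next_token)
```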

As you scale a model up, you need not only more compute to train it but also more time and data. So why would anyone just say “screw it” and spend potentially millions or billions of dollars on something that may or may not work and would almost certainly have little monetary return?
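Rough back-of-envelope on the cost point, using the common ~6 × parameters × tokens estimate for training FLOPs; the 500B and 50T figures echo the numbers floated in this thread, and the 20-tokens-per-parameter ratio is just a Chinchilla-style assumption:

```python
# Crude training-cost estimate: FLOPs ≈ 6 * N_params * N_tokens (a rule of thumb, not a law).
def train_flops(n_params, n_tokens):
    return 6 * n_params * n_tokens

for n_params in (500e9, 50e12):          # 500B vs. 50T parameters
    n_tokens = 20 * n_params             # assumed compute-optimal-ish data budget
    print(f"{n_params:.0e} params -> ~{train_flops(n_params, n_tokens):.1e} training FLOPs")
```

Under that heuristic, a 100x jump in parameters means roughly a 10,000x jump in training compute, which is the whole cost argument in one line.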

−2

RedditLovingSun t1_jbz78cm wrote

Depends on your definition of intelligence. The human brain is nothing but a bunch of neurons passing electrical signals to each other; I don't see why it's impossible for computers to simulate something similar and achieve the same results a brain does.
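For what it's worth, that "simulate something similar" is basically what an artificial neuron already does, just in a very stripped-down form; a minimal sketch with made-up weights:

```python
import numpy as np

# One artificial "neuron": a weighted sum of incoming signals pushed through a nonlinearity,
# a crude analogue of a biological neuron integrating inputs and firing past a threshold.
def neuron(inputs, weights, bias):
    return 1.0 / (1.0 + np.exp(-(np.dot(inputs, weights) + bias)))  # sigmoid activation

print(neuron(np.array([0.5, 0.2, 0.9]), np.array([1.0, -2.0, 0.5]), bias=0.1))
```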

10

pyepyepie t1_jbz57hd wrote

To be fair, the paper looks interesting. The news title is garbage, but that's not the fault of the authors, who did a pretty cool job. Anyway, it seems like a nice application of a very well-known idea, which is cool.

By the way, is measuring a perturbation's influence on the loss a common idea? I'm mostly aware of using it to see how the regression value or class probabilities change, with the perturbation applied to the inputs, not the params (edit: incorrect, they do apply the perturbation to the inputs).

edit: "We follow the results of the studies [Koh and Liang, 2017; Bis et al., 2021] to approximate the perturbation effect directly through the model’s parameters when executing Leaving-One-Out experiments on the input. The influence function estimating the perturbation  of an input z is then derived as:" - seems like I misunderstood it due to their notation. Seems like a pretty regular method.

1

bpw1009 t1_jbz4d57 wrote

Here's ChatGPT's take on it for what it's worth 😂:

Yes, the notation you're looking for is "top k argmax". It's a common notation used in machine learning and optimization.

Formally, if you have a function f(x) over a set X, the top k argmax of f(x) is the set of k elements in X that maximize f(x). The notation is usually written as:

argmax_{x\in X} f(x) = {x_1, x_2, ..., x_k}

where x_1, x_2, ..., x_k are the k elements in X that maximize f(x).

Note that if k=1, then the top k argmax reduces to the usual argmax notation.
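In case code is more useful than notation here, one common way to get a top-k argmax with numpy (the array values are made up and `top_k_argmax` is just my own helper name):

```python
import numpy as np

def top_k_argmax(scores, k):
    """Indices of the k largest entries of `scores`, ordered by descending score."""
    idx = np.argpartition(scores, -k)[-k:]       # the k largest indices, in no particular order
    return idx[np.argsort(scores[idx])[::-1]]    # sort those k indices by score, descending

scores = np.array([0.1, 0.7, 0.3, 0.9, 0.5])
print(top_k_argmax(scores, k=2))  # [3 1]; with k=1 this reduces to plain argmax
```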

−7

MinaKovacs t1_jbz2gqw wrote

I think the math clearly doesn't work out; otherwise, Google would have monetized it already. ChatGPT is not profitable or practical for search. Hardware cost, power consumption, and slow performance are already pushing the limits. It will take something revolutionary, beyond binary computing, to make ML anything more than expensive algorithmic pattern recognition.

−1

hebekec256 OP t1_jbz0mpm wrote

Yes, I understand that. But LLMs and extensions of LLMs (like PaLM-E) are a heck of a lot more than an abacus. I wonder what would happen if Google just said "screw it" and scaled from 500B to 50T parameters. I'm guessing there are architectural reasons it would just break; otherwise I can't see why they wouldn't do it, since the risk-to-reward ratio seems favorable to me.

0