visarga

visarga t1_iy2uc9o wrote

Young'uns, I still remember 8-bit processors in the 1980s and loading programs from cassette tape. My father was still using IBM-style punched cards at work when I was a child; I messed up a whole stack playing with them. One card was a line of code, and he had to sort the deck back by hand.

I think the biggest driver of change in the last 20 years was the leap in computing and communication speed. It took us from the PC era into the internet era, which meant an explosion in online media and indirectly enabled the collection of the huge datasets being used to train AI today.

The things I've seen. I remember Geoffrey Hinton presenting his pre-deep-learning work on restricted Boltzmann machines around 2005. That instantly got my attention and I started following the topic; back then ML was a pariah. Twelve years later I was working in AI. I have had a front-row seat for every step AI has made since 2012, when things heated up. I read the residual network (ResNet) paper the day it was published and witnessed the birth of the transformer. I have seen GANs come and go, and even talked with their original author, Ian Goodfellow, right here on reddit before he got famous. I got to train many neural nets and play with even more. Much of what I learned is already useless; GPT-3 and SD are so open-ended that projects which used to take years now take just weeks.

Funny thing: when Hinton published the RBM paper he was using unsupervised learning, which I thought was very profound. But in 2012 the big breakthrough was supervised learning (ImageNet), and for about five years supervised learning got all the attention and admiration. In the last five years unsupervised learning has won the spotlight again. How the wheel turns.

5

visarga t1_ixmbq7v wrote

AI is not that creative yet (maybe it will be in the future), but then, how many mathematicians are? It is apparently already able to solve hard problems that are not in its training set:

> Meta AI has built a neural theorem prover that has solved 10 International Math Olympiad (IMO) problems — 5x more than any previous AI system.

> trained on a dataset of successful mathematical proofs and then learns to generalize to new, very different kinds of problems

This is from 3 weeks ago: link

1

visarga t1_ixiec41 wrote

The main idea here is to use:

  • a method to generate solution candidates - a language model

  • a method to filter/rank the candidates - an ensemble of predictions (e.g. majority voting) or running a test, as when testing generated code; see the sketch after the links below

Minerva - https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html

AlphaCode

FLAN-PaLM - https://paperswithcode.com/paper/scaling-instruction-finetuned-language-models (top score on MMLU math problems)

DiVeRSe - https://paperswithcode.com/paper/on-the-advance-of-making-language-models (top score on MetaMath)
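
Roughly, that loop looks like this. A minimal sketch, assuming hypothetical `sample_solutions` (an LLM sampling wrapper) and `passes_tests` (a unit-test harness) helpers; this is not the actual code of any system above:

```python
from collections import Counter

def sample_solutions(problem: str, n: int) -> list[str]:
    """Hypothetical: sample n candidate solutions from a language model."""
    raise NotImplementedError

def passes_tests(candidate: str, tests) -> bool:
    """Hypothetical: run a candidate (e.g. generated code) against unit tests."""
    raise NotImplementedError

def solve(problem: str, tests=None, n: int = 32):
    candidates = sample_solutions(problem, n)
    if tests is not None:
        # Filter by an external check, as when testing generated code.
        survivors = [c for c in candidates if passes_tests(c, tests)]
        return survivors[0] if survivors else None
    # Otherwise rank by agreement: majority vote over the final answer line,
    # in the spirit of self-consistency ensembling.
    votes = Counter(c.strip().splitlines()[-1] for c in candidates if c.strip())
    return votes.most_common(1)[0][0] if votes else None
```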

1

visarga t1_ixh56gc wrote

From AlphaGo to Diplomacy in just six years! People used to say the Go board is simple: everything is visible and there is a short list of possible actions, while the real world has uncertainty, complexity and a much more diverse action space. But Diplomacy has all of that.

5

visarga t1_ixfdcaj wrote

I don't believe that; OpenAI and a slew of other companies can make a buck on cutting-edge language/image models.

My problem with Google is that it often fails to understand the semantics of my queries, replying with totally unrelated content, so I don't believe in their deployed AI. It's dumb as a rock. They might have shiny AI in the labs, but the product is painfully bad. And their research teams almost always block the release of their models and don't even put up demos. What's the point in admiring such a bunch? Where's the access to PaLM, Imagen, Flamingo, and the other toys they dangled in front of us?

Given this situation I don't think they really align themselves with AI advancement; instead they align with short-term profit making, which is to be expected. Am I spinning conspiracies or just saying what we all know: companies work for profit, not for art?

1

visarga t1_ixd6ygt wrote

> I have absolutely no idea what you mean by "1-2 years late", in what way are they late?

GPT-3 was published in May 2020, PaLM in Apr 2022. There were a few other models in-between but they were not on the same level.

Dall-E was published in Jan 2021, Google's Imagen is from May 2022.

> Google is already looking at integrating language models

Yes, they are. But do a search and you'll see how poor the results are in reality. They don't want us to actually find what we're looking for, at least not immediately; they stand to lose money.

Look at Google Assistant: their language models can write convincing prose and handle long dialogues, yet Assistant defaults to a web search for 90% of questions and can't hold much context. Why? Because Assistant cuts into their profits.

I think Google wants to monopolise research while quietly delaying its deployment as much as possible. That way their researchers stay happy and don't leave to build competing products, while we stay happy waiting for upgrades.

1

visarga t1_ixbka8a wrote

They have a bunch of good models but they are 1-2 years late.

Also, Google stands to lose from the next wave of AI from a business perspective. The writing on the wall is that traditional search is on its way out now that more advanced AI can do direct question answering, which means ads won't get displayed. My theory is that they are dragging their feet for this reason. The days of good old web search are numbered.

But hey, you could say they might ask the language model to shill for various products. True, but language models can also run on the edge, so we could have our own models that listen to our priorities and wishes.

That was never possible with web search, but it becomes accessible through AI. The moral of the story is that Google's centralised system is being eroded, and they are losing both control and ad impressions.

1

visarga t1_ixa9md9 wrote

> They even used a Genetic Evolution algorithm to find new proofs, and got a few that no human had thought of before

This shows you haven't been following the state of the art in theorem proving.

> AGI is an illusion, although a very good one that’s useful.

Hahaha. Yes, enjoy your moment until it comes.

2

visarga t1_ixa8oko wrote

Look, you can prompt GPT-3 to give you this kind of advice if that's your thing. It's pretty competent at generating heaps of text like what you wrote.

You can ask it to take any position on any topic, the perspective of anyone you want, and it will happily oblige. It's not one personality but a distribution of personalities, and its message is not "The Message of the AI" but just a random sample from a distribution.
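
A toy illustration of that "distribution of personalities" point, with a hypothetical `complete()` wrapper around any text-completion API (the personas and question are made up):

```python
def complete(prompt: str, temperature: float = 0.9) -> str:
    """Hypothetical wrapper around any text-completion API."""
    raise NotImplementedError

personas = ["a stoic philosopher", "a startup founder", "a skeptical scientist"]
question = "What should I do with my life?"

# Same question, different personas: each answer is just one sample from
# the model's distribution, not "The Message of the AI".
for persona in personas:
    prompt = f"The following is advice from {persona}.\nQ: {question}\nA:"
    print(persona, "->", complete(prompt))
```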

2

visarga t1_ix2wyw6 wrote

There is also prompt tuning, which fine-tunes only a few token embeddings while keeping the model itself frozen. This changes the problem from finding that elusive prompt to collecting a few labeled examples and fine-tuning the soft prompt on them.
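
A minimal sketch of the idea in PyTorch, assuming a HuggingFace-style causal LM that accepts `inputs_embeds` and `labels`; the setup names are placeholders, not a specific library recipe:

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """A few trainable 'virtual token' embeddings prepended to the input."""
    def __init__(self, n_tokens: int, embed_dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_tokens, embed_dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        prefix = self.prompt.unsqueeze(0).expand(token_embeds.size(0), -1, -1)
        return torch.cat([prefix, token_embeds], dim=1)

def prompt_tuning_step(model, embed_layer, soft_prompt, optimizer,
                       input_ids, labels):
    # The LM stays frozen; gradients flow only into the soft prompt.
    with torch.no_grad():
        token_embeds = embed_layer(input_ids)
    embeds = soft_prompt(token_embeds)
    # Mask the loss on the virtual prefix positions (-100 is the usual
    # ignore index for HuggingFace-style causal LM losses).
    prefix_pad = labels.new_full((labels.size(0), soft_prompt.prompt.size(0)), -100)
    out = model(inputs_embeds=embeds, labels=torch.cat([prefix_pad, labels], dim=1))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()

# Setup sketch: freeze the model, train only the prompt.
#   for p in model.parameters(): p.requires_grad_(False)
#   soft_prompt = SoftPrompt(n_tokens=20, embed_dim=model.config.hidden_size)
#   optimizer = torch.optim.Adam(soft_prompt.parameters(), lr=1e-3)
```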

Another approach is to use an LLM to generate prompts and filter them by evaluation. This has also been used to generate step-by-step reasoning traces for datasets that only have input-output pairs; you then train another model on the examples plus the chain of thought for a big jump in accuracy.

There's a relevant paper here: Large Language Models Can Self-Improve. They find that

> fine-tuning on reasoning is critical for self-improvement

I would add that sometimes you can evaluate a result directly, for example when generating math or code, and then learn from the validated outputs of the network. That's basically what let AlphaZero reach super-human level without supervision, but it requires a kind of simulator: a game engine, a Python interpreter, or a symbolic math engine.
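
A small sketch of what "learn from the validated outputs" could look like with a symbolic math engine as the checker; `sample_answers` is a hypothetical LLM wrapper and the equation format is just for illustration:

```python
import sympy as sp

def sample_answers(problem: str, n: int = 8) -> list[str]:
    """Hypothetical: sample n candidate answers from an LLM."""
    raise NotImplementedError

def verify(equation: str, candidate: str) -> bool:
    # Check a candidate root of an equation "lhs = rhs" in the variable x.
    x = sp.Symbol("x")
    lhs, rhs = (sp.sympify(side) for side in equation.split("="))
    return sp.simplify((lhs - rhs).subs(x, sp.sympify(candidate))) == 0

problem = "x**2 - 5*x + 6 = 0"
validated = [(problem, a) for a in sample_answers(problem) if verify(problem, a)]
# The `validated` pairs can then be added to a fine-tuning set, AlphaZero-style.
```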

1

visarga t1_ix2vuys wrote

You mean like this? You just prepend "The following is a conversation with [a very intelligent AI | a human expert]". In image generation the trick is to add artist names to the prompt "in the style of X and Y", also called "style phrases" or "vitamin phrases".

Dall-E 2 was tweaked in a similar way to be more diverse: when you asked for a photo of a CEO or some other profession, they would add various race and gender keywords. People were generally upset about having their prompts modified, but prepending a modifier by default might be useful in some cases.
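
A toy sketch of that kind of default prompt wrapping (the prefix, style phrases and modifier list here are purely illustrative):

```python
import random

TEXT_PREFIX = "The following is a conversation with a very intelligent AI.\n"
STYLE_PHRASES = ["in the style of X and Y", "highly detailed"]
DIVERSITY_TERMS = ["female", "male", "Black", "Asian", "Hispanic", "white"]

def wrap_text_prompt(user_prompt: str) -> str:
    # Prepend the framing sentence by default.
    return TEXT_PREFIX + user_prompt

def wrap_image_prompt(user_prompt: str, diversify: bool = False) -> str:
    prompt = f"{user_prompt}, {', '.join(STYLE_PHRASES)}"
    if diversify:
        # Dall-E-2-style tweak: silently prepend a sampled modifier.
        prompt = f"{random.choice(DIVERSITY_TERMS)} {prompt}"
    return prompt

print(wrap_image_prompt("a photo of a CEO", diversify=True))
```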

If you want to extract a specific style or ability more precisely from a model you can fine-tune it on a small dataset, probably <1000 examples. This is easy to do using the cloud APIs, but not as easy as prompting.

1

visarga t1_ix0ji8d wrote

> it was utter trash and excessively arrogant

Galactica is a great model for citation retrieval. It has innovations in citation learning and beats all other systems at it. Finding good citations is a time-consuming task when writing papers.

It also has a so-called <work> token that triggers external resources such as a calculator or a Python interpreter. This is potentially very powerful, combining neural and symbolic reasoning.
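
A rough sketch of how such a trigger could be wired around a model's output (this only illustrates the idea, it is not Galactica's actual implementation; `generate_until` is a hypothetical helper and the `result` convention is made up):

```python
import re

def generate_until(prompt: str, stop: str) -> str:
    """Hypothetical: generate from the LM until `stop` appears in the output."""
    raise NotImplementedError

def answer_with_tools(question: str) -> str:
    text = generate_until(question, stop="</work>")
    match = re.search(r"<work>(.*?)</work>", text, re.DOTALL)
    if match:
        # Execute the model's working externally (unsafe outside a sandbox),
        # feed the result back, then let the model finish its answer.
        scope: dict = {}
        exec(match.group(1), {}, scope)
        text += f"\nResult: {scope.get('result', '')}\n"
        text += generate_until(text, stop="\n\n")
    return text
```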

Another interesting finding from this paper is that a smaller, very high-quality dataset can replace a much larger, noisy one. So there's a trade-off between quality and quantity, and it's not yet clear which direction has the bigger payoff.

I'd say the paper was singled out for criticism because it comes from Yann LeCun's AI lab. Yann has had some enemies on Twitter for a few years now, and they don't forget or forgive. There's a good video on this topic by Yannic Kilcher.

And by the way, the demo still lives on HuggingFace: https://huggingface.co/spaces/lewtun/galactica-demo

5

visarga OP t1_iwutuf1 wrote

This is a good read; it synthesizes where we are and what's coming soon.

I'd like to add that training LLMs on massive video datasets like YouTube will improve their procedural knowledge, that is, how to do things step by step, with applications in robotics and software automation. We have seen large models trained on text and images, but video adds the time dimension, and there is audio and speech as well. Very multi-modal.

Action-driven models are going to replace more and more human work, far more than the tool-AIs we have today. They will cause big changes in the job market.

10