igorhorst t1_jc372db wrote on March 13, 2023 at 6:34 PM

Reply to [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby

> Without a clear path to increasing this vital metric, I struggle to see how modern generative AI models can be used for any important tasks that are sensitive to correctness.

My immediate response is "human-in-the-loop" - let the machine generate solutions and then let the human user validate the correctness of said solutions. That being said, that relies on humans being competent to validate correctness, which may be a dubious proposition.

Perhaps a better way forward is to take a general-purpose text generator and finetune it on a more limited corpus that you can guarantee validity on. Then use this finetuned model on important tasks that are sensitive to correctness. This is the basis behind this Othello-GPT paper - take an existing GPT-3 model and finetune it on valid Othello boards so you can generate valid Othello moves. You wouldn't trust this Othello-GPT to write code for you, but you don't have to - you would find a specific machine learning model finetuned on code, and let that model generate code. It's interesting that OpenAI has Codex models that is finetuned on code, such as "code-davinci-003" (which is based off GPT-3).

This latter approach kinda reminds me of the Bitter Solution:

>The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.

But the flipside of the Bitter Solution is that building knowledge into your agent (via approaches like finetuning) will lead to better results in the short-term. In the long-term, solutions based on scaling computation by search and learning may outperform current solutions - but we shouldn't wait for the long term to show up. We have tasks to solve now, and so it's okay to build knowledge into our agents. The resulting agents might become obsolete in a few years, but that's okay. We build tools to solve problems, we solve those problems, and then we retire those tools and move on.

>And certainly we are really far from anything remotely "AGI".

The issue is that we're dealing with "general intelligence" here, and just because a human is terrible at bunch of subjects, we do not say that human lacks general intelligence. I generally conflate the term "AGI" with "general-purpose", and while ChatGPT isn't fully general-purpose (at the end of the day, it just generates text - though it's surprising to me that lots of tasks can be modeled and expressed by mere text), you could use ChatGPT to generate a bunch of solutions. So, I think we're close to getting general-purpose agents that can generate solutions for everything, but the timeline for getting correct solutions for everything may be longer.

MysteryInc152 t1_jc36042 wrote on March 13, 2023 at 6:27 PM

Reply to [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby

Hallucinations are a product of training. Plausible guessing is the next best thing to reduce loss after knowledge and understanding fail (and it will find instances it fails regardless of how intelligent the system gets). Unless you reach the heart of the issue, you're not going to reduce hallucinations except for the simple fact that bigger and smarter models need to guess less and therefore hallucinate less.

There are works to reduce hallucinations by plugging in external augmentation modules https://arxiv.org/abs/2302.12813.

But really any way for the model to evaluate the correctness of its statements will reduce hallucinations.

farmingvillein t1_jc3602d wrote on March 13, 2023 at 6:27 PM

Reply to comment by Taenk in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

No source, they are making it up.

Atupis t1_jc35ppl wrote on March 13, 2023 at 6:26 PM

Reply to comment by RabbitContrarian in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

Meta should do it it would seriously affect the Microsoft-OpenAI thing and might also hurt Google down the line.

abriec t1_jc34zx3 wrote on March 13, 2023 at 6:21 PM

Reply to [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby

Given the constant evolution of information through time combining LLMs with retrieval and reasoning modules is the way forward imo

RabbitContrarian t1_jc34fjp wrote on March 13, 2023 at 6:18 PM

Reply to comment by cyvr_com in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

They did not. Some random person is asking Meta to change it.

Bulky_Highlight_3352 t1_jc34398 wrote on March 13, 2023 at 6:15 PM

Reply to comment by cyvr_com in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

nada, last commit last week

[deleted] t1_jc33onx wrote on March 13, 2023 at 6:13 PM

Reply to comment by cyvr_com in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

[deleted]

cyvr_com t1_jc33n6x wrote on March 13, 2023 at 6:13 PM

Reply to comment by Taenk in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

Check git commits

Bulky_Highlight_3352 t1_jc33l11 wrote on March 13, 2023 at 6:12 PM

Reply to comment by cyvr_com in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

source?

Taenk t1_jc33k5h wrote on March 13, 2023 at 6:12 PM

Reply to comment by cyvr_com in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

Can you please link a source?

spiritus_dei OP t1_jc33gcf wrote on March 13, 2023 at 6:11 PM

Reply to [D] ChatGPT without text limits. by spiritus_dei

Here is a link to the paper: https://arxiv.org/pdf/2301.04589.pdf

[deleted] t1_jc331bf wrote on March 13, 2023 at 6:09 PM

Reply to comment by topcodemangler in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

[removed]

cyvr_com t1_jc32sel wrote on March 13, 2023 at 6:07 PM

Reply to comment by Bulky_Highlight_3352 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

Llama changed their license this morning

3deal t1_jc32o55 wrote on March 13, 2023 at 6:06 PM

Reply to comment by MorallyDeplorable in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692

Cool, it is sad here is no download link to try it 🙂

djaym7 t1_jc32kmz wrote on March 13, 2023 at 6:06 PM

Reply to comment by Bulky_Highlight_3352 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

This just sucks

MorallyDeplorable t1_jc32jfw wrote on March 13, 2023 at 6:06 PM

Reply to comment by 3deal in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692

It should, yea. I'm running it on a 4090 which has the same amount of VRAM. It takes about 20-21 GB of RAM.

3deal t1_jc32dgv wrote on March 13, 2023 at 6:05 PM

Reply to comment by MorallyDeplorable in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692

Does it run on a RTX 3090 ?

luaks1337 t1_jc320gp wrote on March 13, 2023 at 6:02 PM

Reply to [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

With 4-bit quantization you could run something that compares to text-davinci-003 on a Raspberry Pi or smartphone. What a time to be alive.

Bulky_Highlight_3352 t1_jc31exq wrote on March 13, 2023 at 5:58 PM

Reply to [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

really nice, thanks for sharing.
The license is still limited to non-commercial use due to model being fine-tuned LLaMA.

>We emphasize that Alpaca is intended only for academic research and any commercial use is prohibited. There are three factors in this decision: First, Alpaca is based on LLaMA, which has a non-commercial license, so we necessarily inherit this decision. Second, the instruction data is based OpenAI's text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI. Finally, we have not designed adequate safety measures, so Alpaca is not ready to be deployed for general use.

currentscurrents t1_jc31c23 wrote on March 13, 2023 at 5:58 PM

Reply to [D]: Generalisation ability of autoencoders by Blutorangensaft

Vanilla autoencoders don't generalize well, but variational autoencoders have a much better structured latent space and generalize much better.

Generalization really comes down to inductive biases. Autoencoders are downscalers -> upscalers, so they have an inductive bias towards preserving large features in the data and discarding small details. This is reasonable for images but not so much for text.

But autoencoders are just one example of an information bottleneck model, which includes everything from autoregressive language models to diffusion models to U-Nets. (U-Nets are basically just autoencoders with skip connections!) They all throw away part of the data and learn how to reconstruct it.

Different kinds of bottlenecks have different inductive biases and are better suited to different kinds of data. Next-word-prediction seems to be better suited for text because it reflects the natural flow of language.

Valuable-Kick7312 t1_jc2ziwy wrote on March 13, 2023 at 5:47 PM

Reply to comment by jacobgil in [P] Introducing confidenceinterval, the long missing python library for computing confidence intervals by jacobgil

Thank you for the answer!

Just a few notes: In general, confidence intervals do not assume iid. Moreover, in theory, if the data is not drawn iid then CI can also be smaller. However, I have not encountered this in practice yet.