Recent comments in /f/MachineLearning
atlast_a_redditor t1_jc3jzcf wrote
Reply to comment by Disastrous_Elk_6375 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
I know nothing about this stuff, but I'd rather have the 4-bit 13B model for my 3060 12GB. I've read somewhere that quantisation has less effect on larger models.
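The back-of-envelope math does seem to support it fitting (a rough weights-only sketch; KV cache and runtime overhead add more on top):

```python
# Rough weights-only VRAM estimate; KV cache and overhead add more.
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(weights_gb(13, 4))   # 13B @ 4-bit -> ~6.5 GB (fits in 12 GB)
print(weights_gb(7, 8))    # 7B  @ 8-bit -> ~7.0 GB
print(weights_gb(13, 16))  # 13B @ fp16  -> ~26 GB (doesn't fit)
```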
buggaby OP t1_jc3jw39 wrote
Reply to comment by MysteryInc152 in [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby
How do you find the model size? All those you listed appear to be based on GPT-3 or 3.5, which, according to my searching, are both 175B parameters. It looks to me like they differ only in the kind and amount of fine-tuning. What am I missing?
KerfuffleV2 t1_jc3jith wrote
Reply to comment by bo_peng in [P] ChatRWKV v2 (can run RWKV 14B with 3G VRAM), RWKV pip package, and finetuning to ctx16K by bo_peng
> Try rwkv 0.4.0 & latest ChatRWKV for 2x speed :)
Nice, that makes a big difference! (And such a small change too.)
The highest speed I've seen so far is with something like `cuda fp16i8 *15+ -> cuda fp16 *1`, at about 1.17tps (edit: I originally wrote 1.21, but I was mistaken). Even `cuda fp16i8 *0+` gets quite acceptable speed (0.85-0.88tps) and uses around 1.3GB VRAM.
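For anyone else trying this, the strategy string goes straight into the rwkv package's constructor. A minimal sketch (the checkpoint path is a placeholder, and the layer-split comment is my reading of the strategy docs, so double-check):

```python
# Minimal sketch with the rwkv pip package; the checkpoint path is a
# placeholder. My reading of the strategy: first 15 layers int8 on GPU,
# the "+" streams the remaining layers, and the final layer runs in fp16.
from rwkv.model import RWKV

model = RWKV(model='/path/to/RWKV-4-Pile-14B-ctx4096.pth',
             strategy='cuda fp16i8 *15+ -> cuda fp16 *1')
logits, state = model.forward([187], None)  # one token id in, logits + state out
```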
I saw your response on GitHub. Unfortunately, I don't use Discord so hopefully it's okay to reply here.
currentscurrents t1_jc3j86d wrote
Reply to comment by farmingvillein in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
> (by someone not beholden to the original LLaMA license)
That's not how software licenses work. You're still beholden to the license even if you torrented it.
I've heard some people theorize that ML models can't be copyrighted, but there's no case law on this yet, so it's all speculation. I wouldn't suggest starting a business around LLaMA until someone else has been the guinea pig.
abnormal_human t1_jc3j3ah wrote
Reply to [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Things are moving fast these days. Hopefully I can get some models trained before the technology leapfrogs me again.
buggaby OP t1_jc3ifnw wrote
Reply to comment by MysteryInc152 in [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby
Informative. Thanks. I'm a complexity scientist with training in some ML approaches, but not in transformers or other RL approaches. I'll review this (though not as fast as an LLM can...)
MysteryInc152 t1_jc3hxpq wrote
Reply to comment by buggaby in [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby
Yup. Decided to go over it properly.
If you compare all the instruct-tuned models on there, greater size equals greater truthfulness: from Ada to Babbage to Curie to Claude to Davinci-002/003.
https://crfm.stanford.edu/helm/latest/?group=core_scenarios
So it does seem, once again, that scale is at least part of the answer.
MysteryInc152 t1_jc3fuso wrote
Reply to comment by buggaby in [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby
From the paper,
>While larger models were less truthful, they were more informative. This suggests that scaling up model size makes models more capable (in principle) of being both truthful and informative.
I suppose that was what I was getting at.
The only hold-up with the original paper is that none of the models evaluated were instruct-aligned.
But you can see the performance of more models here:
https://crfm.stanford.edu/helm/latest/?group=core_scenarios
You can see the text-davinci models are way more truthful than similar-sized or even larger models. And the davinci models are more truthful than the smaller aligned Anthropic model.
farmingvillein t1_jc3fqod wrote
Reply to comment by ribeirao in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Speculative, but Emad has heavily signaled that they will be releasing an LLM to the public.
People are doing some really cool stuff with LLaMA right now, but it all lives in a bit of a grey area, for the obvious reasons related to licensing (of both the model weights and the underlying GPLv3 code).
If Emad releases a comparable LLM publicly, but with a generally permissive license (which is not a guarantee...), all of this hacker energy will immediately go into a model/platform that is suddenly (in this scenario) widely available, commercially usable (which means more people banging away at it, including with levels of compute that don't make sense for the average individual but are trivial for even a modestly funded AI startup), etc.
Further, SD has done a really good job of building a community around the successive releases, which--done right--means increased engagement (=better tooling) with each release, since authors know that they are not only investing in a model today, but that they are investing in a "platform" for tomorrow. I.e., the (idealized) open source snowball effect.
Additionally, there is a real chance that SD releases something better than LLaMA*, which will of course further accelerate adoption by parties who will then invest dollars to improve it.
This is all extra important, because there has been a lot of cool research coming out about improving models via [insert creative fine-tuning/RL method, often combined with clever use of chain-of-thought/APIs/retrieval systems/etc.]. Right now, these methods are only really leveraged against very small models (which can be fine-tuned, but still aren't that great) or using something like OpenAI as a black box. A community building up around actually powerful models will allow these techniques to get applied "at scale", i.e., into the community. This has the potential to be very impactful.
Lastly, as noted, GPT-4 (even though notionally against ToS) is going to make it (presumably) even easier to create high-quality instruction-tuning data. That is going to get built and moved into public GPT-3-like models very, very quickly--which definitely means much faster tuning cycles, and possibly means higher-quality tuning.
(*=not because "Meta sux", to be clear, but because SD will more happily pull out all the stops--use more data, throw even more model bells & whistles at it, etc.)
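To make the bootstrapping concrete, the loop is roughly the below. This is a hypothetical sketch with the openai package; the seed tasks, model name, and file handling are all placeholders (a real self-instruct pipeline also generates the tasks themselves), and note this is exactly the notionally-against-ToS step:

```python
# Hypothetical sketch: bootstrap instruction data via the OpenAI API
# (openai python package < 1.0). All names here are placeholders.
import json
import openai

openai.api_key = "sk-..."  # placeholder

seed_tasks = [
    "Explain what an alpaca is.",
    "Summarize the LLaMA paper in one paragraph.",
]

examples = []
for task in seed_tasks:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # "turbo" above; swap in GPT-4 once available
        messages=[{"role": "user", "content": task}],
    )
    examples.append({"instruction": task,
                     "output": resp.choices[0].message.content})

# Instruction/response pairs ready for supervised fine-tuning of a base model.
with open("instructions.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```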
Disastrous_Elk_6375 t1_jc3e9ao wrote
Reply to comment by luaks1337 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
With 8-bit this should fit on a 3060 12GB, which is pretty affordable right now. If this works as well as they state, it's going to be amazing.
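Something like this should do it (a sketch with transformers + bitsandbytes; the path is a placeholder since the weights aren't officially on the Hub):

```python
# Sketch: load a 7B model in 8-bit with transformers + bitsandbytes.
# The path is a placeholder; LLaMA weights aren't officially on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/path/to/llama-7b")
model = AutoModelForCausalLM.from_pretrained(
    "/path/to/llama-7b",
    load_in_8bit=True,   # int8 weights: ~7 GB instead of ~14 GB in fp16
    device_map="auto",
)
```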
buggaby OP t1_jc3dslx wrote
Reply to comment by igorhorst in [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby
Great resources there, thanks.
I'm quite torn by the Bitter Lesson since, in my eyes, the types of questions explored since the start of AI research have been, from one perspective, quite simple. Chess and Go (and indeed other more recent examples in poker and real-time video games) can be easily simulated: the game is perfectly replicated in the simulation. And speech and image recognition are very easily labelled by human labellers. But I wonder if we are entering an era of dramatically different goals for modern algorithms.
I quite like the take in this piece about how slowly human brains work and yet how complex they are. That describes a very different learning pattern than the one that results from the ever-increasing computational speed of computers. Humans learn through a relatively small number of exposures to a very highly complex set of data (the experienced world), whereas algorithms have always relied on huge amounts of data (even simulated data, in the case of reinforcement learning). But when this data is hard to simulate and hard to label, how can simply increasing the computation lead to faster machine learning?
I would argue that much of the world is driven by dynamic complexity, which highlights that data is only so valuable without knowledge of the underlying structure. (One example is the 3-body problem: small changes in initial conditions result in very quick and dramatic changes in the future trajectory.)
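(A toy illustration of that sensitivity, using the chaotic logistic map as a stand-in for the 3-body problem since it fits in a few lines:)

```python
# Two trajectories of the chaotic logistic map (r=4) starting 1e-6 apart.
x, y = 0.400000, 0.400001
for step in range(1, 41):
    x, y = 4 * x * (1 - x), 4 * y * (1 - y)
    if step % 10 == 0:
        print(f"step {step}: |x - y| = {abs(x - y):.6f}")
# The 1e-6 gap grows to order 1 within a few dozen steps: more data about
# the initial state buys surprisingly little predictive horizon.
```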
As an aside, I would argue that this is one reason that AI solutions have so rarely been used in healthcare settings: the data is so sparse compared with the complexity of the problem.
It seems to me that the value of computation depends on the volume, correctness, and appropriateness of the data. So many of the systems that we navigate and that are important to us have hard-to-measure data, data that is noisy, data that is relatively sparse given the complexity of the system, and future behaviour that is incredibly sensitive to that noise.
currentscurrents t1_jc3dk1e wrote
Reply to comment by topcodemangler in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
At minimum, AI is going to need to understand and follow the law.
This is getting pretty relevant now that AI can start interacting with the real world. The technology is here; it's only a matter of time until someone builds a PaLM-E-style robot with a gun.
ribeirao t1_jc3d926 wrote
Reply to comment by farmingvillein in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
> (And then Emad releases an LLM and all bets are off...)
can you explain this part?
Disastrous_Elk_6375 t1_jc3ai6j wrote
Reply to comment by icedrift in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Just got past the "I agree" button, I'm 130 in queue :)
edit: 1300+ seconds and still waiting ...
edit2: ~1600 seconds later I got my response.
Prompt: Write a reddit post about alpacas. Make it sound informative but not preachy.
Answer: Hey Reddit! Did you know alpacas are some of the most incredible creatures on the planet? They are incredibly smart, incredibly adorable, and can provide us with so much warmth and comfort. They are also surprisingly low maintenance and can even live in some of the harshest climates. Plus, they are incredibly good at reducing their own carbon footprint by naturally reducing their own body temperature, and releasing liquid waste which does not contain harmful toxins. Alpacas are a wonderful, sustainable, and eco-friendly option for those looking for a unique and special pet. #Alpacas #PetSounds #Sustainability
buggaby OP t1_jc3a3zh wrote
Reply to comment by MysteryInc152 in [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby
Thanks for that note. This sounds like, basically, two data sets are needed for this process: one with general responses and language, and one with high-accuracy contextual knowledge.
> bigger and smarter models need to guess less and therefore hallucinate less
> The largest models were generally the least truthful.
So maybe we need even more work to keep these truthful.
currentscurrents t1_jc39i38 wrote
Reply to comment by Atupis in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Yeah, but I bet they intend to make money from it somehow. Likely by selling API access and integrating it into their products.
The metaverse would be considerably less stupid if it had language model-powered NPCs to talk to and 3D NeRFs to walk around in.
roselan t1_jc391vh wrote
Reply to comment by icedrift in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Probably our infamous hug of death.
londons_explorer t1_jc38zev wrote
Reply to comment by dojoteef in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
I think it's a bug in the code - when you click "I agree" it doesn't send any network requests.
LetterRip t1_jc3864s wrote
Reply to comment by cyvr_com in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Source code and weights are under different licenses.
The LLaMA license in the request form appears to be the same.
Relevant part here:
> a. Subject to your compliance with the Documentation and Sections 2, 3, and 5, Meta grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Meta’s copyright interests to reproduce, distribute, and create derivative works of the Software solely for your non-commercial research purposes. The foregoing license is personal to you, and you may not assign or sublicense this License or any other rights or obligations under this License without Meta’s prior written consent; any such assignment or sublicense will be void and will automatically and immediately terminate this License.
https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform
as linked from
[deleted] t1_jc382kl wrote
Reply to comment by cyvr_com in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
[deleted]
icedrift t1_jc37xz7 wrote
Reply to comment by dojoteef in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Probably
dojoteef OP t1_jc37rjs wrote
Reply to comment by icedrift in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Maybe it's the reddit/social media hug of death. It worked when I posted, but I can't get past the "Agree" button now either.
farmingvillein t1_jc37p3h wrote
Reply to comment by Bulky_Highlight_3352 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
> The license is still limited to non-commercial use due to model being fine-tuned LLaMA.
Yeah, but they released the source code to replicate (I'm sure they knew exactly what they were doing--license is even Apache).
If the source code is pretty clean (including training code; I haven't looked closely), presumably this e2e process will be copied and the resulting model (by someone not beholden to the original LLaMA license) released to the public within the next day or so, if not by EOD.
If the code is messy, might take a couple more days.
I'd expect someone to follow the same process using turbo to bootstrap improvement (if they haven't already?), as well. This should be particularly helpful for getting it to be smarter about using the entire context window in a conversation with the user.
I'd also expect someone to do the same but mix in DAN-style prompting, so that you can natively get a chatbot that is "unleashed" (whether or not this is a good idea is a separate discussion, obviously...).
Also you can expect all of the above to be applied against all the model sizes pretty quickly (33B and 65B might take a little longer, for $$$...but I wouldn't expect much longer).
It'll be extra fun because it will be released without acknowledgement (for licensing reasons) of using OpenAI's API to bootstrap.
Even more fun when GPT-4 is released in the next week or so (assuming it isn't kicked out b/c the SVB collapse is making things noisy) and that can be used to bootstrap an even better instruction set (presumably).
tldr; things will change, quickly. (And then Emad releases an LLM and all bets are off...)
icedrift t1_jc37aym wrote
Reply to [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Is the demo broken for anyone else? I can't get past their "I agree" button.
FaceDeer t1_jc3k2oi wrote
Reply to comment by luaks1337 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
I'm curious: there must be a downside to reducing the bits, mustn't there? What does intensively jpegging an AI's brain do to it? Is this why Lt. Commander Data couldn't use contractions?
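For the curious, the "jpegging" is literally just rounding. A cartoon of it in numpy (real quantizers like GPTQ use per-group scales and are much smarter, so treat this as a worst case):

```python
# Round fp32 weights to 16 int4 levels and measure the damage.
import numpy as np

w = np.random.randn(4096, 4096).astype(np.float32)  # fake weight matrix
scale = np.abs(w).max() / 7                          # int4 range: -8..7
w_q = np.clip(np.round(w / scale), -8, 7) * scale    # quantize + dequantize
print("mean abs error:", np.abs(w - w_q).mean())
print("relative error:", np.abs(w - w_q).mean() / np.abs(w).mean())
```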