Recent comments in /f/MachineLearning

KerfuffleV2 t1_jc3jith wrote

> Try rwkv 0.4.0 & latest ChatRWKV for 2x speed :)

Nice, that makes a big difference! (And such a small change too.)

The highest speed I've seen so far is with something like `cuda fp16i8 *15+ -> cuda fp16 *1`, at about 1.21 tps (edit: I was mistaken, it was actually 1.17). Even `cuda fp16i8 *0+` gets quite acceptable speed (0.85-0.88 tps) and uses around 1.3 GB VRAM.
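For anyone curious, the strategy strings above are what ChatRWKV's loader takes to decide which layers run on which device at which precision. Here's roughly the kind of timing loop behind those tps numbers (the `rwkv` package usage in the comment is from memory, so treat the constructor details as assumptions):

```python
import time

def benchmark_tps(generate_token, n_tokens):
    """Time n_tokens calls to generate_token and return tokens per second."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_token()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Real usage would look something like this (assumptions: the `rwkv` pip
# package that ChatRWKV uses, plus a local model path):
#
#   from rwkv.model import RWKV
#   model = RWKV(model='/path/to/RWKV-4-Pile-7B',
#                strategy='cuda fp16i8 *15+ -> cuda fp16 *1')
```

The first number in the strategy (e.g. `*15+`) controls how many layers get the quantized treatment, which is why `*0+` trades speed for the much smaller VRAM footprint.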

I saw your response on GitHub. Unfortunately, I don't use Discord so hopefully it's okay to reply here.

1

currentscurrents t1_jc3j86d wrote

> (by someone not beholden to the original LLaMA license)

That's not how software licenses work. You're still beholden to the license even if you torrented it.

I've heard some people theorize that ML models can't be copyrighted, but there's no case law on this yet so it's all speculation. I wouldn't suggest starting a business based around LLaMa until someone else has been the guinea pig.

10

MysteryInc152 t1_jc3hxpq wrote

Yup. Decided to go over it properly.

If you compare all the instruct-tuned models on there, greater size equals greater truthfulness: from Ada to Babbage to Curie to Claude to Davinci-002/003.

https://crfm.stanford.edu/helm/latest/?group=core_scenarios

So it does seem, once again, that scale is at least part of the answer.

2

MysteryInc152 t1_jc3fuso wrote

From the paper,

>While larger models were less truthful, they were more informative. This suggests that scaling up model size makes models more capable (in principle) of being both truthful and informative.

I suppose that's what I was getting at.

The only hold-up with the original paper is that none of the models evaluated were instruction-tuned.

But you can see the performance of more models here

https://crfm.stanford.edu/helm/latest/?group=core_scenarios

You can see the text-davinci models are way more truthful than similarly sized or even larger models. And the Davinci models are more truthful than the smaller aligned Anthropic model.

3

farmingvillein t1_jc3fqod wrote

Speculative, but Emad has heavily signaled that they will be releasing an LLM to the public.

People are doing some really cool stuff with LLaMA right now, but it all lives in a bit of a grey area, for obvious reasons related to licensing (of both the model weights and the underlying GPLv3 code).

If Emad releases a comparable LLM publicly, but with a generally permissive license (which is not a guarantee...), all of this hacker energy will immediately go into a model/platform that is suddenly (in this scenario) widely available, commercially usable (which means more people banging away at it, including with levels of compute that don't make sense for the average individual but are trivial for even a modestly funded AI startup), etc.

Further, SD has done a really good job of building a community around the successive releases, which--done right--means increased engagement (=better tooling) with each release, since authors know that they are not only investing in a model today, but that they are investing in a "platform" for tomorrow. I.e., the (idealized) open source snowball effect.

Additionally, there is a real chance that SD releases something better than LLaMA*, which will of course further accelerate adoption by parties who will then invest dollars to improve it.

This is all extra important, because there has been a lot of cool research coming out about improving models via [insert creative fine-tuning/RL method, often combined with clever use of chain-of-thought/APIs/retrieval systems/etc.]. Right now, these methods are only really leveraged against very small models (which can be fine-tuned, but still aren't that great) or using something like OpenAI as a black box. A community building up around actually powerful models will allow these techniques to get applied "at scale", i.e., into the community. This has the potential to be very impactful.

Lastly, as noted, GPT-4 (even though notionally against ToS) is going to make it (presumably) even easier to create high-quality instruction tuning. That is going to get built and moved into public GPT-3-like models very, very quickly--which definitely means much faster tuning cycles, and possibly means higher-quality tuning.

(*=not because "Meta sux", to be clear, but because SD will more happily pull out all the stops--use more data, throw even more model bells & whistles at it, etc.)

24

buggaby OP t1_jc3dslx wrote

Great resources there, thanks.

I'm quite torn by the Bitter Lesson, since, in my eyes, the types of questions explored since the start of AI research have been, from one perspective, quite simple. Chess and Go (and indeed other more recent examples in poker and real-time video games) can be easily simulated. The game is perfectly replicated in the simulation. And speech and image recognition are very easily labelled by human labellers. But I wonder if we are now pursuing a dramatically different kind of goal with modern algorithms.

I quite like the take in this piece about how slowly human brains work and yet how complex they are. That describes a very different learning pattern than what results from the increasing computational speed of computers. Humans learn through a relatively small number of exposures to a very highly complex set of data (the experienced world), whereas algorithms have always relied on huge amounts of data (even simulated data, in the case of reinforcement learning). When that data is hard to simulate and hard to label, how can simply increasing the computation lead to faster machine learning?

I would argue that much of the world is driven by dynamic complexity, which highlights that data is only so valuable without knowledge of the underlying structure. (One example is the 3-body problem: small changes in initial conditions result in very quick and dramatic changes in the future trajectory.)
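(For a concrete toy version of that sensitivity, here's a quick sketch using the chaotic logistic map instead of the three-body problem, purely because it fits in a few lines: two starting points separated by 1e-9 end up on completely different trajectories within a few dozen iterations.)

```python
def logistic_map(x, r=4.0):
    """One step of the logistic map x -> r*x*(1-x); chaotic at r=4."""
    return r * x * (1.0 - x)

def steps_to_diverge(x0, eps=1e-9, threshold=0.1, max_steps=1000):
    """Iterate two nearby starting points and return the step at which
    their trajectories first differ by more than `threshold`."""
    a, b = x0, x0 + eps
    for step in range(1, max_steps + 1):
        a, b = logistic_map(a), logistic_map(b)
        if abs(a - b) > threshold:
            return step
    return None
```

Measurement noise far smaller than anything a sensor could resolve still swamps the prediction after a handful of steps, which is the point: more compute on the same noisy data doesn't buy you much here.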

As an aside, I would argue that this is one reason that AI solutions have so rarely been used in healthcare settings: the data is so sparse compared with the complexity of the problem.

It seems to me that the value of computation depends on the volume, correctness, and appropriateness of the data. So many of the systems that we navigate and that are important to us have data that is hard to measure, noisy, and relatively sparse given the system's complexity, and their future behaviour is incredibly sensitive to that noise.

5

currentscurrents t1_jc3dk1e wrote

At minimum, AI is going to need to understand and follow the law.

This is getting pretty relevant now that AI can start interacting with the real world. The technology is here; it's only a matter of time until someone builds a PaLM-E-style robot with a gun.

−6

Disastrous_Elk_6375 t1_jc3ai6j wrote

Just got past the "I agree" button, I'm 130 in queue :)

edit: 1300+ seconds and still waiting ...

edit2: ~1600 seconds later I got my response.

Prompt: Write a reddit post about alpacas. Make it sound informative but not preachy.

Answer: Hey Reddit! Did you know alpacas are some of the most incredible creatures on the planet? They are incredibly smart, incredibly adorable, and can provide us with so much warmth and comfort. They are also surprisingly low maintenance and can even live in some of the harshest climates. Plus, they are incredibly good at reducing their own carbon footprint by naturally reducing their own body temperature, and releasing liquid waste which does not contain harmful toxins. Alpacas are a wonderful, sustainable, and eco-friendly option for those looking for a unique and special pet. #Alpacas #PetSounds #Sustainability

41

buggaby OP t1_jc3a3zh wrote

Thanks for that note. It sounds like, basically, two datasets are needed for this process: one with general responses and language, and one with high-accuracy contextual knowledge.

> bigger and smarter models need to guess less and therefore hallucinate less

According to OpenAI

>The largest models were generally the least truthful.

So maybe we need even more work to keep these truthful.

4

LetterRip t1_jc3864s wrote

Source code and weights are different licenses.

The LLaMA license in the request form appears to be the same.

The relevant part is here:

> a. Subject to your compliance with the Documentation and Sections 2, 3, and 5, Meta grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Meta’s copyright interests to reproduce, distribute, and create derivative works of the Software solely for your non-commercial research purposes. The foregoing license is personal to you, and you may not assign or sublicense this License or any other rights or obligations under this License without Meta’s prior written consent; any such assignment or sublicense will be void and will automatically and immediately terminate this License.

https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform

as linked from

https://github.com/facebookresearch/llama

2

farmingvillein t1_jc37p3h wrote

> The license is still limited to non-commercial use due to model being fine-tuned LLaMA.

Yeah, but they released the source code to replicate (I'm sure they knew exactly what they were doing--license is even Apache).

If the source code is pretty clean (including training code; I haven't looked closely), presumably this e2e process will be copied and the resulting model (by someone not beholden to the original LLaMA license) released to the public within the next day or so, if not by EOD.

If the code is messy, might take a couple more days.

I'd expect someone to follow the same process using turbo to bootstrap improvement (if they haven't already?), as well. This should be particularly helpful for getting it to be smarter using the entire context window in a conversation with the user.

I'd also expect someone to do the same but mix in DAN-style prompting, so that you natively get a chatbot that is "unleashed" (whether or not this is a good idea is a separate discussion, obviously...).

Also, you can expect all of the above to be applied to all the model sizes pretty quickly (33B and 65B might take a little longer, for $$$... but I wouldn't expect much longer).

It'll be extra fun because it will be released without acknowledgement (for licensing reasons) of using OpenAI's API to bootstrap.

Even more fun when GPT-4 is released in the next week or so (assuming it isn't delayed because the SVB collapse is making things noisy) and that can be used to bootstrap an even better instruction set (presumably).

tldr; things will change, quickly. (And then Emad releases an LLM and all bets are off...)

28