Recent comments in /f/MachineLearning

pm_me_your_pay_slips t1_jcatwi5 wrote

They don't release that information because they don't want to lose their competitive advantage to other companies. It's a race towards AGI/transformative AI. It could also be a race for resources: e.g., convincing the US government to concentrate its funding on the leading AI project alone. This means any release of details may come only once OpenAI knows that training for the next generation of models is running without problems.

This is likely based on the idea that newer models can be used to design/build/train the next generation of models, leading to an exponential amplification of capabilities over time that makes any lead time over the competition a decisive factor.

6

Deep-Station-1746 t1_jcamy6n wrote

Patenting dropout feels a lot like NFTs - it's useless. So why bother?

Edit:

What I don't understand is how anyone can prove that someone is multiplying matrices together in a particular way, as long as they don't admit to it themselves.

That's like someone patenting a thought. If you think about a particular patented pair of pants™, can you be sued for propagating a patented neural activity through your bio network? It's absurd.

−13

trnka t1_jcalqfm wrote

Converting the text to fixed-size windows is done to make training more efficient: it lets you combine multiple examples into a single batch, which becomes an additional dimension on your tensors. Inputs shorter than the window are padded up to the correct length with null tokens; longer ones are clipped. It's a common technique even for LSTMs/CNNs.
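
Something like this, assuming a PyTorch-style pipeline (`window_size` and `pad_id` here are just illustrative names, not from any particular library):

```python
import torch

def make_batch(token_id_lists, window_size, pad_id=0):
    """Pad or clip variable-length token sequences to a fixed window
    so they can be stacked into one (batch, window_size) tensor."""
    batch = []
    for ids in token_id_lists:
        ids = ids[:window_size]                          # clip long inputs
        ids = ids + [pad_id] * (window_size - len(ids))  # pad short ones
        batch.append(ids)
    return torch.tensor(batch)

# Three examples of different lengths -> one (3, 8) batch
x = make_batch([[5, 2, 9], [1] * 12, [7, 7, 7, 7, 7]], window_size=8)
print(x.shape)  # torch.Size([3, 8])
```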

It's often possible to take the trained model and apply it to variable-length test data, so long as you're dealing with a single example at a time rather than a batch. But keep in mind that with transformers, attention does N^2 comparisons, where N is the number of tokens, so it doesn't scale well to long texts.
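
For intuition on the N^2 part, here's a toy sketch: the attention score matrix alone has N x N entries, so doubling the text length quadruples the work.

```python
import torch

N, d = 1024, 64          # sequence length, head dimension
q = torch.randn(N, d)    # queries
k = torch.randn(N, d)    # keys
scores = q @ k.T         # every token compared against every token
print(scores.shape)      # torch.Size([1024, 1024]) -> N^2 entries
```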

It's possible that the positional encoding is specific to the input length, depending on the transformer implementation. For instance, in Karpathy's GPT recreation video he made the positional encoding learnable by position, so it wouldn't have defined values for longer sequences.
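
A minimal sketch of that failure mode, assuming a learned positional-embedding table like the one in that video (`block_size` is an illustrative name):

```python
import torch
import torch.nn as nn

block_size = 8                          # max length seen during training
pos_emb = nn.Embedding(block_size, 32)  # one learned vector per position

ok = pos_emb(torch.arange(8))           # fine: positions 0..7 were trained
# pos_emb(torch.arange(9))              # IndexError: position 8 has no learned vector
```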

One common alternative in training is to create batches from examples that are mostly the same text length, then pad each batch to its own max length. You can get training speedups that way, but it takes a bit of extra code.
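
Roughly like this, assuming the same list-of-token-ids setup as above (my own sketch of the bucketing idea):

```python
import torch

def length_bucketed_batches(token_id_lists, batch_size, pad_id=0):
    """Group similar-length examples so each batch pads only to its own max."""
    by_len = sorted(token_id_lists, key=len)   # similar lengths end up adjacent
    for i in range(0, len(by_len), batch_size):
        chunk = by_len[i:i + batch_size]
        max_len = max(len(ids) for ids in chunk)
        yield torch.tensor(
            [ids + [pad_id] * (max_len - len(ids)) for ids in chunk]
        )

batches = list(length_bucketed_batches([[1], [2, 3], [4, 5, 6, 7]], batch_size=2))
print([b.shape for b in batches])  # [torch.Size([2, 2]), torch.Size([1, 4])]
```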

2

ScientiaEtVeritas t1_jcahkze wrote

I think we should value much more what Meta & Google are doing. While they too sometimes don't release models at all (see Google's PaLM, LaMDA) or release them only under non-commercial licenses after request (see Meta's OPT, LLaMA), they are at least very transparent when it comes to ideas, architectures, training, and so on.

OpenAI itself changed a lot, from being open to being closed, but what's worse is that OpenAI could change the whole culture around AI research as well, which is sad and pretty ironic considering its name. That's why I'm generally not very supportive of OpenAI. So, as a research community, we should largely ignore OpenAI -- in fact, they proactively opted out of it -- and instead value and amplify open research from Meta, Google, Huggingface, Stability AI, real non-profits (e.g., EleutherAI), and universities. We need a counterbalance now.

580

MrTacobeans t1_jcagwir wrote

I don't know anything about this side of AI, but when it's boiled down it's all fancy algorithms. Can those be patented?

Maybe that's the driving force behind the "open" nature of AI. An algorithm can't be patented, but a product based on one can be. Kinda like how LLaMA has a non-commercial license, but if a community rebuilt it under a permissive license, that'd be totally kosher.

This may be why OpenAI is being hush-hush about their innovations: if it's published, someone else can copy it without the legal woes.

2

KingsmanVince t1_jcaf167 wrote

Sorry for my lack of knowledge, but what do you mean by patents? Which things would the patents apply to? The model's weights? The model's source code? The model's theory (white papers)?

Researchers reuse others' ideas and rethink others' work all the time. So if people want to work against each other, they should just stop releasing white papers.

4

NoScallion2450 t1_jcaci1w wrote

Well, that depends on whether OpenAI can prove Google is deriving commercial value from OpenAI's patented research. On the other hand, for OpenAI I can see a clear case of using ideas from other labs (Google's "Attention Is All You Need").

But just to clarify, I'm not on one side or the other. It's definitely a bit sad for AI research going forward, but I'd be interested in seeing how the landscape changes.

4