Recent comments in /f/MachineLearning

VelveteenAmbush t1_jcd6opg wrote

You could patent your algorithm and offer some sort of GPL-like patent license, but no one respects software patents anyway (for good reason IMO) and you'd be viewed as a patent troll if you tried to sue to enforce it.

GPL itself is a copyright license and does you no good if OpenAI is using your ideas but not your code. (Plus you'd actually want AGPL to force code release for an API-gated service, but that's a separate issue.)

8

noiseinvacuum t1_jcd41kc wrote

From a research perspective, imo, it's 1000x better to follow the Meta model of releasing code + architecture openly and sharing weights with researchers than to be completely closed and call yourself Open. I understand that there are genuine risks with weights being available to adversaries, but I think it's still better for the progress of the very young field of AI.

4

suflaj t1_jcd4131 wrote

> I’ve been using ChatGPT to write all of my sales emails for difficult clients lately, and it has been fantastic. It took what should have been another staff member’s job at my company and made it into a proofreading duty I can handle while working on other things.

I fail to see the point you're making.

> Also… hate to say it, but the fact that you’re using the words “humiliated” and “jailbroken” in this context doesn’t exactly cast a very good light on your understanding of the situation.

I also fail to see what you're saying. How else would you describe events in which you show how stupid ChatGPT actually is, and those where you trick it into bypassing all of its security filters?

−8

noiseinvacuum t1_jcd2xzt wrote

Completely agree with you on this. This will get much worse IMO, especially with the big investment from Microsoft in OpenAI and the fact that MS is now openly and directly challenging Google. This whole aggressive AI alpha posturing from Satya Nadella has put Google in a difficult spot; I can't see how Google will continue to justify sharing its research openly to its investors.

6

LegacyAngel t1_jcd1idr wrote

>But without OpenAI, who would have spent the billions of dollars they have burned through creating and then actually giving people access to models like GPT-3 and now GPT-4?

Other companies are providing access. OpenAI is just being reckless.

usual disclaimer here

3

E_Snap t1_jcd16ks wrote

I’ve been using ChatGPT to write all of my sales emails for difficult clients lately, and it has been fantastic. It took what should have been another staff member’s job at my company and made it into a proofreading duty I can handle while working on other things.

Also… hate to say it, but the fact that you’re using the words “humiliated” and “jailbroken” in this context doesn’t exactly cast a very good light on your understanding of the situation.

9

twilight-actual t1_jcd0wcs wrote

What exactly would that pushback be? Boycott? Post mean things?

About the only thing that could potentially prevent this is if the algorithms that we put into the public domain are protected by a license like the GPL, or something similar.

I haven't been following code releases, so I don't know if that's being done. And to be honest, I doubt most of the information flow happens through code; rather, it's in the papers.

Is there a way to protect papers with a "GPL"? I honestly doubt it, because at that level we're dealing strictly with ideas, and the only way to protect an idea is to patent it.

Perhaps the community, as a whole, should start patenting all their ideas, and then assigning the patents to a public trust that ensures any derivative technology is published freely, too, under the same terms.

21

HyperModerate t1_jcd0lnn wrote

The way AI is used to launder copyright and licensing is concerning. Copyrighted data is used to train a model. The model’s output, now also licensed, is used to finetune a second model, also separately licensed. Finally, this highly licensed model is considered for public release.

The attitude is basically the same as pirating, but there is no similar legal precedent.

To be clear, I think AI research should be open.

2

sam__izdat t1_jccyxl4 wrote

As a spectator, it's the standard story that's played out a million times now. I see ML as pre-scientific. If capital is allowed to take full control and call all the shots, it's not moving past that "pre" any time soon. It'll be a digital Taylorist discipline for PR industry surveillance and optimizing Amazon packers' pee breaks, and the brief flurry of actually useful progress is probably done.

4

camp4climber t1_jccy0qa wrote

Yea that's a fair point. These kinds of examples certainly exist and often come from large research labs at the very edge of the state of the art, where the interesting narrative point is scale. The context of specific benchmarks or applications certainly matters.

I still think my point stands in the general case, at least for most of us independent researchers. Ultimately research is about revealing novel insights, and "train for longer" is not that interesting. But an LLM that fits onto a single GPU, contains 13B parameters, and is capable of outperforming a 175B parameter model is certainly interesting.

2

Daos-Lies t1_jcctn72 wrote

Could I pick you up on your point about it not being interesting enough for a paper?

A comprehensive and properly conducted hyperparameter sweep of a selection of state-of-the-art models would provide useful information to the community at large. It would be valuable to know what settings are ideal for training any particular model architecture (or checkpoint of that architecture) on any particular type of dataset.

There would be variation in the exact hyperparameters that are best for training on the particular dataset of cat pictures the paper used, rather than your own dataset of cat pictures, but the best hyperparameters for any set of cat pictures, on that particular model, are probably going to be quite similar.

And so it is useful to have that knowledge, presented in this hypothetical paper, to refer to when you start training a model on cat pictures.


---


I have a tendency to treat epochs and learning rate like an accelerator on a car, pumping them up when you want to go faster and bringing them down when you want more control and the ability to check where you're going so you don't miss your exit.


Whereas with a hyperparameter like C in an SVM, I'm much more likely to actually bother formally looping through values and finding the 'right' C than just trying a few and going with it.
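
To make that concrete, the kind of formal sweep I mean looks roughly like the sketch below (a minimal example using scikit-learn on a toy dataset; the grid values are placeholders, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy stand-in for a real dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Exhaustive sweep over candidate C values with 5-fold cross-validation.
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```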


And the key point there is that SVMs tend to train much, much faster than NNs, so I don't bother taking the massive extra time it would take to find the 'right' epoch count and learning rate. (Also, epochs and LR are quite intuitive in what they actually mean, which does make them a bit easier to guess at.)

But if someone had already put the effort in to find the 'right' epoch and LR, even if I was aware that they'd only be approximately 'right' for my particular problem, I'd definitely use that as my starting point.


---


Ok, and I've written quite a lot here already, but I'm going to end by mentioning that in the paper that accompanied the GPT-4 release, they had a whole section on predicting the loss that would be achieved at a certain point in GPT-4's training procedure. When you get to training at that scale, it's pretty costly to guess at your training procedure, so any metrics you have at all on how to get it right the first time are valuable just in terms of the cost of compute time.
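
(The rough idea behind that kind of prediction, very much simplified and with made-up numbers rather than anything from the actual paper, is to fit a scaling-law curve to losses from small runs and extrapolate to the big one:)

```python
import numpy as np
from scipy.optimize import curve_fit

# Made-up (compute, loss) pairs from hypothetical small-scale runs.
compute = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
loss = np.array([3.2, 2.9, 2.6, 2.4, 2.2])

# Common scaling-law form: loss falls off as a power law plus an irreducible floor.
def power_law(c, a, b, floor):
    return a * c ** (-b) + floor

params, _ = curve_fit(power_law, compute, loss, p0=[5.0, 0.2, 1.5], maxfev=10000)

# Extrapolate to a (hypothetical) much larger run.
print("predicted loss at 1e8 compute:", power_law(1e8, *params))
```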


So yes, u/TheWittyScreenName, it is worth a paper, and my recommendation would be to focus it around conducting and presenting a solid and systematic analysis.


Edit: Well gosh, I've just reread your comment u/camp4climber and you are basically saying the same thing. But maybe my fleshing it out is useful for OP.

2

neato5000 t1_jccszzz wrote

I've had jobs that were similar to what you describe. My current job involves less in the way of tiny tweaks to massive DL models and more feature engineering (and engineering in general), which suits me better.

My slightly warm take is that DL at the coal face in industry feels very random, very time consuming, and as a result a bit demoralising. More power to you if you have the knack for it and enjoy it; it's just not super my bag.

1

suflaj t1_jccsipl wrote

You mean the same type of foresight with GPT-3, when people (or rather "people", given that it was mostly journalists) got baited into spreading hysteria over the authors' claims that the technology is world-ending? Or ChatGPT, which was humiliated and jailbroken within 36 hours of its public release?

It has been a day now, and I've heard the same concerns that it's ultimately biased. Definitely not career-ending.

−7

E_Snap t1_jccs0vy wrote

I thought Reddit’s patented lack of foresight regarding technology was mostly located in /r/technology, and yet…

The way I see it, with the pace at which this field moves, those sorts of objections aren’t worth the energy required to type them. They’ll be obsolete and irrelevant by the time you finish writing them.

9

farmingvillein t1_jccqy2i wrote

> Generally it would be unfair to claim that you beat benchmark results if you train for 8x more epochs than other methods. Benchmarks exist to ensure that methods are on a somewhat level playing field. There's certainly some wiggle room depending on the task, but in this case I don't believe that a lower learning rate and more epochs is novel or interesting enough to warrant a full paper.

Although the Llama paper is a bit of a rejoinder here, since training longer is (arguably) their core contribution.

3