Recent comments in /f/MachineLearning

PhysZhongli t1_jccbh4w wrote

Hi everyone, I am a novice trying to learn ML and AI. I am trying to train a CNN model to classify 9000+ images across 100 labels. The images are flower patterns/leaves from what I can tell. The catch is that the actual test dataset has 101 labels, and when the model detects an image not in the original 100 labels it has to assign it to the 101st label. What would be the best way to go about doing this?

I have used ResNet50 with ImageNet weights and made some of the pretrained layers trainable to fine-tune the model. I followed it with a GlobalAveragePooling layer, a 1024-node dense layer with L2 regularization, batch norm, dropout, and a softmax layer as the classifier. I am using the Adam optimizer with a batch size of 16 and a learning rate of 0.0001. I then assign a threshold value of 0.6, and if the model's top prediction is below the threshold it assigns the image the 101st label. Currently I have ~90% test accuracy.
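In case it helps, here's a minimal sketch of my setup (simplified; the exact unfreeze depth, activation, dropout rate, and L2 strength are just illustrative placeholders, not necessarily what I'm running):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

NUM_KNOWN_CLASSES = 100
UNKNOWN_LABEL = 100   # index used for the 101st "unknown" class
THRESHOLD = 0.6       # confidence cutoff for assigning the unknown label

# Pretrained backbone; only the last blocks are left trainable for fine-tuning
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
for layer in base.layers[:-30]:  # unfreeze depth is a guess here
    layer.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1024, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(NUM_KNOWN_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

def predict_with_unknown(model, images, threshold=THRESHOLD):
    """Assign the 101st label when max softmax probability is below threshold."""
    probs = model.predict(images)
    preds = np.argmax(probs, axis=1)
    preds[np.max(probs, axis=1) < threshold] = UNKNOWN_LABEL
    return preds
```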

Are there any obvious things I should be doing better or changing, and how can I go about optimising the threshold value? Or is there a better way to handle the 101st label? Should I be using ResNet or something else for flower patterns and leaves, given my training dataset of 9000+ images?

1

KerfuffleV2 t1_jccb5v1 wrote

Sounds good! The 4-bit stuff seems pretty exciting too.

By the way, not sure if you saw it, but it looks like PyTorch 2.0 is close to being released: https://www.reddit.com/r/MachineLearning/comments/11s58n4/n_pytorch_20_our_next_generation_release_that_is/

They seem to be claiming you can just drop in torch.compile() and see benefits with no code changes.
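For reference, the one-liner looks something like this (toy model just to illustrate the drop-in claim, not anything from the release notes):

```python
import torch
import torch.nn as nn

# Any existing model works; this toy MLP is just a placeholder
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# The claimed drop-in: wrap the model, leave the rest of the code unchanged
compiled_model = torch.compile(model)

x = torch.randn(32, 128)
out = compiled_model(x)  # first call triggers compilation; later calls reuse it
```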

1

Purplekeyboard t1_jcc7cuo wrote

But without OpenAI, who would have spent the billions of dollars they've burned through to create, and then actually give people access to, models like GPT-3 and now GPT-4?

You can use GPT-3, and even versions of GPT-4, today. Or you can stand and look up at the fortress of solitude that is Google's secret mountain lair where models are created and then hoarded forever.

0

Mikkelisk t1_jcc4spe wrote

> Coming up with architectures that randomly work/don't work, tuning parameters, waiting for days till the model is trained... the level of uncertainty is just too high for me.

Good news! You say you work in computer vision. There's a high chance that in practice you'll mostly use off-the-shelf solutions, and most of your actual time will be spent gathering data :)

16

VelveteenAmbush t1_jcc4mvf wrote

Transformers aren't products, they're technology. Search, Maps, Ads, Translation, etc. -- those were the products. Those products had their own business models and competitive moats that had nothing to do with the technical details of the transformer.

Whereas GPT-4 is the product. Access to it is what OpenAI is selling, and its proprietary technology is the only thing that prevents others from commoditizing it. They'd be crazy to open up those secrets.

−4