Recent comments in /f/MachineLearning
googler_ooeric t1_jc7q62q wrote
Reply to comment by sebzim4500 in [D] ChatGPT without text limits. by spiritus_dei
I'd say it depends: in my experience DALL-E is better at photorealistic stuff and more consistent, but Stable Diffusion is way more versatile and can actually replicate famous IPs.
BellRock99 t1_jc7oki4 wrote
Reply to comment by chaotycmunkey in [D] Comparing models implemented in PyTorch and Tensorflow by chaotycmunkey
You could try asking the authors for their implementation.
foreignEnigma t1_jc7n9mq wrote
IMO, first, it's the fault of the AC, who sent the paper to the wrong set of reviewers. Second, you may need to differentiate the work better and explain why it departs from the current trend. Third, good luck :)
pyepyepie t1_jc7l2m4 wrote
Reply to [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Write me a poem about right-wing vs left-wing:
Right wing, left wing,
Politics divide us and make us see.
Different views, different opinions,
One thing is clear, it’s a battle royale.

The right with its conservative views,
Firm in its beliefs, it sticks to the rules.
The left, ever so liberal,
In favor of social justice, it won’t relent.

Both sides struggle to gain the upper hand,
Each thinking their way is the right way.
One thing is clear, we can’t agree,
So let us compromise and find a way.

In the end, it’s the people that matter,
Not the right or the left, that will make the matter.
Let us come together and make a stance,
Unity and respect will be our gain.

So right and left, let us find a way,
To come together and make a new day.
ChatGPT:

Right-wing and left-wing, two sides of a coin
Each with its own ideology to join
One touting tradition, the other progress
Each thinks their way is the best

The right-wing values individuality
Free markets and small governmentality
They believe in personal responsibility
And defend their rights and liberty

The left-wing sees society as a whole
Inequality and injustice they want to control
They promote social welfare and public good
And want to create a fairer neighbourhood

Right-wing and left-wing, different as can be
Both with their own views, but not easy to agree
Their debates can be fierce, their opinions strong
But a balance between them is where we belong

So let's listen to each other, let's try to understand
For a democracy to flourish, we need both sides at hand
We can't dismiss one, for the other to prevail
We need both wings to make our society sail.
chaotycmunkey OP t1_jc7ki7b wrote
Reply to comment by BellRock99 in [D] Comparing models implemented in PyTorch and Tensorflow by chaotycmunkey
My goal is to test their model on a new dataset that I believe it will perform poorly on, and my proposed model is supposed to be an improvement. As such, I have to run their model myself.
suflaj t1_jc7jibo wrote
Reply to comment by spiritus_dei in [D] ChatGPT without text limits. by spiritus_dei
> The same could have been said of Deep Learning until the Image Net breakthrough. The improvement process is evolutionary, and this may be a step in that process.
This is not comparable at all. ImageNet is a database for a competition - it is not a model, architecture or technique. When it was "beaten", it was beaten not by a philosophy or a set of ideas, but by a proven implementation of a mathematically sound idea.
This work, by contrast, is neither evaluated on a concrete dataset nor explored in any mathematical depth. It is a preprint of an idea that someone fiddled with using an LLM.
> As for reinforcement learning, it has been successfully applied in many real-world scenarios, including robotics, game playing, and autonomous driving.
My point is that so has the 6-year-old DNC. The thing is, however, that none of those use generic reinforcement learning - they're very specifically tuned for the exact problem they're dealing with. If you actually look at what is available for DRL, you will see that aside from very poor framework support (probably the best we have is Gym), the biggest issue is how to even get the environment set up to enable learning - that is, making the actual task easy enough for the agent to even start learning. Knowing how to memorize or recall is an incredibly hard task, and we humans don't understand memory well enough to construct problem formulations for those two.
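To illustrate the problem-formulation issue, here is a toy custom Gym environment - entirely my own sketch, using the classic pre-0.26 Gym API - for a delayed-recall task. Notice how sparse the reward signal is:

```python
import random

import gym
from gym import spaces

# Toy delayed-recall task: the agent is shown a symbol once, sits through
# blank steps, then must reproduce it. Reward is binary and arrives only at
# the very end, so the learning signal is about as sparse as it gets.
class DelayedRecallEnv(gym.Env):
    def __init__(self, delay: int = 5, n_symbols: int = 4):
        super().__init__()
        self.delay = delay
        self.n_symbols = n_symbols
        self.observation_space = spaces.Discrete(n_symbols + 1)  # +1 = blank
        self.action_space = spaces.Discrete(n_symbols)

    def reset(self):
        self.symbol = random.randrange(self.n_symbols)
        self.t = 0
        return self.symbol  # the symbol is shown exactly once

    def step(self, action):
        self.t += 1
        done = self.t > self.delay
        # Rewarded only on the final step, and only for correct recall.
        reward = float(done and action == self.symbol)
        return self.n_symbols, reward, done, {}  # blank observation
```

Even in something this tiny, the agent gets no feedback about *what* to remember or *when* - and that is the easy version of the problem.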
Whatever technique you come up with, if you can't reproduce it for other problems or models, you will just end up with a specific model. I mean - look at what you are saying. You're mentioning AlphaGo. Why are you mentioning a specific model/architecture for a specific task? Why not a family of models/architectures? AlphaGo, AlphaZero and MuZero may sound similar, but they're all very, very different, and there is no real generalization of them, even though they all represent reinforcement learning.
> This is one path and other methods could be incorporated such as capsule networks, which aim to address the limitations of traditional convolutional neural networks by explicitly modeling the spatial relationships between features.
And those have basically long been shown to be a scam - well, maybe not fundamentally a scam, but definitely dead. Do you know what essentially killed them? Transformers. And do you know why Transformers are responsible for almost killing the rest of the DL architectures? Because they showed actual results. The paper that is the topic of this thread fails to isolate the contribution of its method from the massive Transformer used alongside it. If you are trying to show the benefits of a memory-augmented system, why not simply use a CNN or an LSTM as the controller? Are the authors implying that the memory system they're proposing needs a massive Transformer to even be usable? Everything about it is just so unfinished and rough.
> Another approach is to use memory augmented networks to store and update embeddings of entities and their relationships over time, and use capsule networks to decode and interpret these embeddings to make predictions. This approach can be especially useful for tasks that involve sequential data, such as language modeling and time-series forecasting.
Are you aware that exactly this has been done by Graves et al., where the external memory is essentially a list of embeddings that is 1D-convolved over? The problem, as I mentioned, is that this kind of process is barely differentiable. Even if you do fuzzy search (Graves et al. use a sort of attention based on access frequency alongside the similarity-based one), your gradients are so sparse that your network basically doesn't learn anything. Furthermore, the output of your model is tied to this external memory. If you do not optimize the memory, you are severely limiting the performance of your model. If you do, then what you're doing is nothing novel: you have just arbitrarily declared part of your monolithic network to be memory, even though it's all one thing.
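To make the "barely differentiable" point concrete, here is a toy PyTorch sketch - my own construction, not Graves et al.'s actual code - contrasting a fuzzy (soft) read against an exact (hard) one:

```python
import torch
import torch.nn.functional as F

N, D = 128, 64                 # memory slots, embedding width
memory = torch.randn(N, D)     # external memory: a list of embeddings
query = torch.randn(1, D, requires_grad=True)  # read key from the controller

# Fuzzy (soft) read: cosine similarity -> softmax -> weighted sum.
# Differentiable, but the softmax concentrates nearly all of its mass on a
# few slots, so most of the memory receives near-zero gradient.
scores = F.cosine_similarity(query, memory, dim=-1)  # (N,)
weights = F.softmax(scores * 10.0, dim=-1)           # temperature = sharpness
soft_read = weights @ memory                         # (D,) gradient flows

# Exact (hard) read: indexing with argmax cuts the gradient path through
# the addressing entirely - nothing about "where to look" can be learned.
hard_read = memory[scores.argmax()]

soft_read.sum().backward()
print(query.grad.abs().mean())  # tiny but nonzero: sparse gradients
```

Run it and the soft path gives you gradients that are technically there but vanishingly small, while the hard path gives you none at all - which is exactly the dilemma above.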
BellRock99 t1_jc7fsag wrote
Trust the implementation, or simply use the metrics presented in their papers on the standard datasets. The latter is more correct in my opinion, since even your implementation could be wrong.
[deleted] t1_jc7ffe5 wrote
[deleted]
ivalm t1_jc7e22p wrote
Reply to comment by modeless in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Yup, catastrophically failed all my medical reasoning prompts (that davinci-2/3/ChatGPT get right)
spiritus_dei OP t1_jc7cu8c wrote
Reply to comment by spiritus_dei in [D] ChatGPT without text limits. by spiritus_dei
Here is more information on capsule networks: https://arxiv.org/abs/1710.09829
spiritus_dei OP t1_jc7ccww wrote
Reply to comment by suflaj in [D] ChatGPT without text limits. by spiritus_dei
>I have skimmed over it before writing this. They have what working? Synthetic toy examples? Great, Graves et al. had even more practically relevant problems solved 6 years ago. The thing is, it never translated into solving real-world problems, and the paper and follow-up work didn't really manage to demonstrate how it could actually be used.
>
>So, until this paper results in some metrics on known datasets, model frameworks and weights, I'm afraid there's nothing really to talk about. Memory augmented networks are nasty in the sense that they require transfer learning or reinforcement learning to even work. Memorizing things with external memory is not exactly a compression task, which DNNs and gradient descent solve.
The same could have been said of Deep Learning until the ImageNet breakthrough. The improvement process is evolutionary, and this may be a step in that process.
You make a valid point. While the paper demonstrates the computational universality of memory-augmented language models, it does not provide concrete metrics on known datasets or model frameworks. Additionally, as you mentioned, memory-augmented networks can be challenging to train and require transfer learning or reinforcement learning to work effectively.
Regarding the concern about transfer learning, it is true that transferring knowledge from one task to another can be challenging. However, recent research has shown that transfer learning can be highly effective for certain tasks, such as natural language processing and computer vision. For example, the BERT model has achieved state-of-the-art performance on many natural language processing benchmarks using transfer learning. Similarly, transfer learning has been used to improve object recognition in computer vision tasks.
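As a concrete illustration, the standard BERT transfer-learning recipe is only a few lines with the HuggingFace transformers API (a minimal sketch, with two toy examples standing in for a real dataset):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Reuse pretrained BERT weights; only the small classification head on top
# is new, and the whole stack is fine-tuned on the downstream task.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

texts = ["great movie", "terrible film"]   # toy labeled data
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)    # returns loss and logits
outputs.loss.backward()
optimizer.step()
```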
As for reinforcement learning, it has been successfully applied in many real-world scenarios, including robotics, game playing, and autonomous driving. For example, AlphaGo, the computer program that defeated a world champion in the game of Go, was developed using reinforcement learning.
This is one path, and other methods could be incorporated, such as capsule networks, which aim to address the limitations of traditional convolutional neural networks by explicitly modeling the spatial relationships between features. For example, capsule networks could be used in tandem with memory-augmented networks: the capsule networks encode information about entities and their relationships, and the memory-augmented networks store and retrieve this information as needed for downstream tasks. This approach can be especially useful for tasks that involve complex reasoning, such as question answering and knowledge graph completion.
Another approach is to use memory augmented networks to store and update embeddings of entities and their relationships over time, and use capsule networks to decode and interpret these embeddings to make predictions. This approach can be especially useful for tasks that involve sequential data, such as language modeling and time-series forecasting.
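A hypothetical sketch of the memory half of that idea (my own toy construction, not taken from any paper): one memory slot per entity, updated by a GRU cell as new observations arrive, so the stored embeddings evolve over time:

```python
import torch
import torch.nn as nn

class EntityMemory(nn.Module):
    """One slot per entity; a GRU cell folds new observations into a slot."""
    def __init__(self, dim: int):
        super().__init__()
        self.update = nn.GRUCell(dim, dim)

    def write(self, memory: torch.Tensor, entity_id: int,
              observation: torch.Tensor) -> torch.Tensor:
        # Replace one slot functionally so gradients flow through updates.
        new_slot = self.update(observation.unsqueeze(0),
                               memory[entity_id].unsqueeze(0))[0]
        return torch.cat([memory[:entity_id],
                          new_slot.unsqueeze(0),
                          memory[entity_id + 1:]], dim=0)

    def read(self, memory: torch.Tensor, entity_id: int) -> torch.Tensor:
        return memory[entity_id]

dim = 32
mem = torch.zeros(5, dim)  # 5 entities, initially empty slots
module = EntityMemory(dim)
mem = module.write(mem, entity_id=2, observation=torch.randn(dim))
print(module.read(mem, entity_id=2).shape)  # torch.Size([32])
```

A capsule-network decoder over these slots would then be a separate module; this only sketches the store-and-update part.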
Jeffy29 t1_jc7a2af wrote
Reply to comment by Disastrous_Elk_6375 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
> and releasing liquid waste which does not contain harmful toxins
Gonna add this to my CV as one of my skills.
Jeffy29 t1_jc79t9p wrote
Reply to comment by modeless in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Yep, I tried it using some of the prompts from my ChatGPT history and it was way worse. At best it performed slightly worse on simple prompts, but it failed completely on more complex ones and on code analysis. Still good for a 7B model, but nothing like ChatGPT.
LetterRip t1_jc79qjb wrote
Reply to comment by farmingvillein in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Stability.AI has been funding RWKV's training.
xKraazY t1_jc79i02 wrote
Reply to comment by big_ol_tender in [D] ChatGPT without text limits. by spiritus_dei
Don't use external libraries because they abstract important concepts (talking about langchain and llama-index). They're great for starting out, but at the rate everything is moving, these libraries become obsolete in 2-3 months.
Anjz t1_jc758w9 wrote
Reply to [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Blows my mind, they used a large language model to train a small one.
>Fine-tuning a 7B LLaMA model took 3 hours on 8 80GB A100s, which costs less than $100 on most cloud compute providers.
Now imagine what's possible with GPT-4 training a smaller language model on a bigger instruction sample, with corporate backing to run hundreds of A100s for days at a time.
We're already within reach of exponential growth for low-powered devices; it's not going to take years like people have predicted.
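The generation half of that recipe is conceptually simple. Here is a rough sketch of the idea - the prompt, seed tasks, and filename are all made up, and the real self-instruct pipeline is much more involved:

```python
import json
import openai  # reads OPENAI_API_KEY from the environment

# Use a large "teacher" model to produce instruction-following examples,
# which then become supervised fine-tuning data for a small model.
seed_instructions = [
    "Explain why the sky is blue.",
    "Write a haiku about spring.",
]

pairs = []
for instruction in seed_instructions:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Instruction: {instruction}\nResponse:",
        max_tokens=256,
    )
    pairs.append({"instruction": instruction,
                  "output": response.choices[0].text.strip()})

with open("distilled_instructions.json", "w") as f:
    json.dump(pairs, f, indent=2)
```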
suflaj t1_jc73bnx wrote
Reply to comment by [deleted] in [D] ChatGPT without text limits. by spiritus_dei
I have skimmed over it before writing this. They have what working? Synthetic toy examples? Great, Graves et al. had even more practically relevant problems solved 6 years ago. The thing is, it never translated into solving real-world problems, and the paper and follow-up work didn't really manage to demonstrate how it could actually be used.
So, until this paper results in some metrics on known datasets, model frameworks and weights, I'm afraid there's nothing really to talk about. Memory augmented networks are nasty in the sense that they require transfer learning or reinforcement learning to even work. It's hard to devise a scheme where you can punish bad memorization or recall, because it's hard to link the outcome of some recall + processing to the process that caused such recall.
Part of the reason for bad associative memorization and recall is the data itself. So naturally, it follows that you should just be able to optimize the memorized data, no? Well, it sounds trivial, but it ends up either non-differentiable (because of an exact choice rather than a fuzzy one) or hard to train (vanishing or sparse gradients). And you have just created a set of neural networks rather than a monolithic one. That might be an advantage, but it is nowhere near as exciting as this paper would lead you to believe. And it would not be novel at all: hooking up a pretrained ResNet with a classifier would be semantically the same, if you consider the ResNet a memory bank - a 7-year-old technique at this point.
Memorizing things with external memory is not exactly a compression task, which DNNs and gradient descent solve, so it makes sense that it's hard in a traditional DL setting.
[deleted] t1_jc732z3 wrote
Reply to comment by suflaj in [D] ChatGPT without text limits. by spiritus_dei
[deleted]
madebyollin t1_jc7306u wrote
Reply to [R] Training Small Diffusion Model by crappr
some "full-power" repos from well-known developers:
- https://github.com/huggingface/diffusers
- https://github.com/lucidrains/denoising-diffusion-pytorch
- https://github.com/crowsonkb/k-diffusion
I also posted my own tiny standalone code for 64x64 diffusion model training here: https://github.com/madebyollin/dino-diffusion - though it doesn't have text conditioning.
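For intuition, the core training step in all of these is tiny. Here's a rough noise-prediction sketch, where `model` stands in for any UNet-like network and `images` for a batch scaled to [-1, 1]:

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # DDPM noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal level

def diffusion_loss(model, images):
    b = images.shape[0]
    t = torch.randint(0, T, (b,))                        # random timesteps
    noise = torch.randn_like(images)
    a = alphas_bar[t].view(b, 1, 1, 1)
    noisy = a.sqrt() * images + (1 - a).sqrt() * noise   # forward process
    return F.mse_loss(model(noisy, t), noise)            # predict the noise
```

Everything else in those repos - the UNet, conditioning, samplers, EMA - is elaboration around that loss.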
suflaj t1_jc7119l wrote
Reply to [D] ChatGPT without text limits. by spiritus_dei
This is not something new. It was already present 6 years ago, pioneered by Graves et al. (https://www.nature.com/articles/nature20101). The takeaway was that it's hard, if not impossible, to train.
The paper did not present any benchmarks on known datasets. Until that happens, sadly, there is nothing really to discuss. Neat idea, but DL is all about results nowadays.
I was working on a full neural memory system myself - I built the whole framework for it, only to find out it wouldn't train on even a toy task. Graves' original work required curriculum learning to work on even toy tasks, and I am not aware of any significant achievement using his Differentiable Neural Computers.
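To give a feel for what that curriculum looks like, here is a made-up sketch of the schedule (the model, thresholds, and the simplified same-step "copy" objective are all illustrative - the real copy task delays reproduction until after the input ends):

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=8, hidden_size=8, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

seq_len, max_len, threshold = 2, 32, 0.01
for step in range(10_000):
    x = torch.rand(16, seq_len, 8)            # batch of random sequences
    y_pred, _ = model(x)
    loss = nn.functional.mse_loss(y_pred, x)  # "copy" = reproduce the input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Advance the curriculum only once the current length is mastered.
    if loss.item() < threshold and seq_len < max_len:
        seq_len += 1
```

Without the gradual lengthening, training on long sequences from the start simply never gets off the ground.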
apluskale t1_jc6zl9r wrote
Reply to comment by sebzim4500 in [D] ChatGPT without text limits. by spiritus_dei
You have to remember that DALL-E is worse only because there's little interest and money in it. Text is much more useful/hyped compared to images.
mikonvergence t1_jc6yxlo wrote
Reply to [R] Training Small Diffusion Model by crappr
This has examples of both low- and high-resolution data, all from scratch, with accompanying videos! No text-to-image case though, as it only focuses on image modalities.
PriestOfFern t1_jc6x37m wrote
Reply to comment by v_krishna in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Take it from someone who spent a long time working on a davinci support bot: it's not that easy. It doesn't matter how much time you spend working on the prompt; GPT will, no matter what, find some way to randomly hallucinate something.
Sure, it might get rid of the majority of hallucinations, but not enough of them to be reliable. Fine-tuning might fix this (citation needed), but I haven't played around with it enough to tell you confidently.
Hameliton t1_jc6pz1h wrote
Reply to comment by rpnewc in [R] Training Small Diffusion Model by crappr
Phil Wang is the goat as always
No_Complaint_1304 t1_jc7szbg wrote
Reply to [D] Simple Questions Thread by AutoModerator
Complete beginner looking for insight
I made an extremely efficient algorithm in C that skims through a database and searches for words. I want to add a feature where, if a word is not found, the program can somehow understand the context, predict the word that was actually intended, and conjugate verbs accordingly. I have no idea whether what I'm describing is crazy hard to implement or can easily be done by someone with experience. This field interests me a lot and I will definitely come back to this sub sooner or later, but right now I don't have time to dig into the subject; I just want to finish this project, slap on a good-looking GUI, and get it over with. Can I achieve what I stated above in a week, or am I just dreaming? If it is possible, what resources should I be looking at? Ty :>