Recent comments in /f/MachineLearning
CKtalon t1_jc9hm91 wrote
Reply to [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr
I don't think a 40K budget can get you a machine with 256GB of VRAM. It's barely enough for 8x RTX 6000 Ada, and that's ignoring the high-end workstation/server-grade CPU and motherboard you'd need to support 8 cards.
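A quick back-of-envelope check of that claim (card price is a rough assumption from around that time, not a quote):

```python
# RTX 6000 Ada: 48 GB VRAM per card, roughly $7,000 each (assumed price).
cards = 8
vram_per_card_gb = 48
price_per_card = 7_000

total_vram_gb = cards * vram_per_card_gb   # 384 GB across 8 cards
gpu_cost = cards * price_per_card          # GPUs alone, before CPU/motherboard/PSU

print(f"{total_vram_gb} GB VRAM, ${gpu_cost:,} for the GPUs alone")
```

Even the GPUs by themselves land above a $40K budget before the platform costs are counted.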
bo_peng OP t1_jc9gf72 wrote
Reply to comment by KerfuffleV2 in [P] ChatRWKV v2 (can run RWKV 14B with 3G VRAM), RWKV pip package, and finetuning to ctx16K by bo_peng
Update to ChatRWKV v2 and the rwkv pip package (0.5.0), and set os.environ["RWKV_CUDA_ON"] = '1' for 1.5x speed at f16i8 (and 10% less VRAM: now 14686MB for 14B instead of 16462MB, so you can put more layers on the GPU).
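A minimal sketch of how that flag is used; the model path and strategy string below are placeholders for your own setup, not values from this comment:

```python
import os

# The flag must be set BEFORE the rwkv package is imported, so the
# custom CUDA kernel is compiled and used.
os.environ["RWKV_CUDA_ON"] = "1"

# Hypothetical usage with the rwkv pip package (>= 0.5.0):
# from rwkv.model import RWKV
# model = RWKV(model="/path/to/RWKV-4-Pile-14B", strategy="cuda fp16i8")
```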
SnooHesitations8849 t1_jc9ge3v wrote
Reply to [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr
Lambda Labs is cheaper
Disastrous_Elk_6375 t1_jc9g5v0 wrote
Reply to [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr
This is the best article for you right now - https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/
Follow the GPU recommendation chart, and check out the formulas for figuring out if buying or renting is worth it for you. Tim probably has you covered for what you need.
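The buy-vs-rent question boils down to a break-even estimate. A toy version (all numbers here are illustrative assumptions, not quotes; a real comparison should also count power, depreciation, and admin time):

```python
# If a local machine costs $40,000 and a comparable cloud instance
# rents for $12/hour, how many GPU-hours until buying wins?
purchase_cost = 40_000        # local machine, USD (assumed)
cloud_rate_per_hour = 12.0    # comparable cloud instance, USD/hour (assumed)

break_even_hours = purchase_cost / cloud_rate_per_hour
print(f"~{break_even_hours:.0f} hours (~{break_even_hours / 24:.0f} days of 24/7 use)")
```

If your expected utilization is well below that, renting usually comes out ahead.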
__ingeniare__ t1_jc9f2xb wrote
Reply to comment by EricLee8 in [N] Baidu to Unveil Conversational AI ERNIE Bot on March 16 (Live) by kizumada
No, it will provide an answer that aligns with the CCP's agenda
EricLee8 t1_jc9epdi wrote
Reply to comment by Sinkencronge in [N] Baidu to Unveil Conversational AI ERNIE Bot on March 16 (Live) by kizumada
It will definitely refuse to answer this question.
YouAgainShmidhoobuh t1_jc9a44k wrote
Reply to comment by respeckKnuckles in [D] On research directions being "out of date" by redlow0992
Not so sure about this. It seems like a tempting argument, but GPT-4 has no public description of its model architecture or training approach at all, so there is no way to make a fair comparison of any kind.
kizumada OP t1_jc99iwj wrote
Reply to comment by jakderrida in [N] Baidu to Unveil Conversational AI ERNIE Bot on March 16 (Live) by kizumada
ERNIE 3.0 (and 3.0 Titan) is more like BERT + GPT: it fuses an auto-regressive network with an auto-encoding network
[deleted] t1_jc98890 wrote
Reply to comment by Disastrous_Elk_6375 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
[deleted]
Sinkencronge t1_jc983mf wrote
It would be interesting to see the alignment work they did. ERNIE, what's your opinion on Taiwan?
jakderrida t1_jc911o0 wrote
Reply to comment by currentscurrents in [N] Baidu to Unveil Conversational AI ERNIE Bot on March 16 (Live) by kizumada
More like their BERT. Get it??
Abradolf--Lincler t1_jc8ynrt wrote
Reply to [D] Simple Questions Thread by AutoModerator
Learning about language transformers and I’m a bit confused.
It seems like tutorials on transformers always make the input sequences the same length (e.g., text files batched into 100-word windows) to help with batching.
Doesn't that mean the model will only work with that exact sequence length? How do you efficiently train a model to handle any sequence length, such as shorter sequences with no padding, or sequences longer than the batched length?
I see attention models advertised as having an infinite window; are there any good resources/tutorials that explain how to build a model like that?
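For what it's worth, the usual answer to the padding part of this question: fixed-length batches are just a training convenience. Variable lengths are handled by padding each batch to its longest sequence and passing an attention mask so padded positions are ignored. A dependency-free sketch with made-up token ids (this mirrors what PyTorch's pad_sequence and Hugging Face attention masks do):

```python
# Three "sentences" of different lengths, as made-up token ids.
seqs = [[5, 2, 9], [7, 1], [3, 8, 4, 6, 2]]

PAD = 0  # the pad id is arbitrary; it just must not collide with real tokens
max_len = max(len(s) for s in seqs)

# Pad every sequence to the longest one in the batch.
batch = [s + [PAD] * (max_len - len(s)) for s in seqs]

# Attention mask: 1 for real tokens, 0 for padding, so attention scores
# over pad positions can be masked out inside the model.
mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]

print(batch)  # [[5, 2, 9, 0, 0], [7, 1, 0, 0, 0], [3, 8, 4, 6, 2]]
print(mask)   # [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0], [1, 1, 1, 1, 1]]
```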
respeckKnuckles t1_jc8xver wrote
"Not using gpt4" is going to be in all NLP conference paper reviews for the next six months.
kryatoshi t1_jc8suy9 wrote
Reply to [P] vanilla-llama an hackable plain-pytorch implementation of LLaMA that can be run on any system (if you have enough resources) by poppear
You can fit a 4-bit quantized 65B model on an M1 Max with 64GB RAM; it takes about 40GB of unified memory. See https://twitter.com/tljstewart/status/1635326012346048528?s=20
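The rough arithmetic behind that number (an estimate, not a measurement):

```python
# 4-bit quantization stores half a byte per parameter.
params = 65e9             # 65B parameters
bytes_per_param = 0.5     # 4 bits

weights_gib = params * bytes_per_param / 2**30
print(f"~{weights_gib:.1f} GiB for the weights alone")
# The gap up to the ~40 GB observed goes to the KV cache, activations,
# and runtime overhead.
```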
bhagy7 t1_jc8rdbj wrote
Reply to [R] Training Small Diffusion Model by crappr
Yes, it is possible to train a small diffusion model conditioned on text captions from scratch on 64x64 images or even smaller. Depending on the complexity of the model and the number of GPUs you are using, it could take anywhere from a few hours to several days. If you are
baffo32 t1_jc8jvoo wrote
Reply to [D] ChatGPT without text limits. by spiritus_dei
start some code! invite contributors! :)
baffo32 t1_jc8jgd4 wrote
Reply to comment by xKraazY in [D] ChatGPT without text limits. by spiritus_dei
i’m thinking, with practice and research, these abstractions could be done in dynamic ways that can pivot and diversify to new norms
baffo32 t1_jc8j4w0 wrote
Reply to comment by big_ol_tender in [D] ChatGPT without text limits. by spiritus_dei
thoughts: each approach has generally something unique that can make it useful, and approaches usually have ways in which they can merge
AllowFreeSpeech t1_jc8fcdz wrote
Reply to comment by spiritus_dei in [D] ChatGPT without text limits. by spiritus_dei
Here is a link to the abstract page: https://arxiv.org/abs/2301.04589
trnka t1_jc8csxm wrote
Reply to comment by Neeraj666 in [D] Simple Questions Thread by AutoModerator
If you have significant data, I'd suggest starting with BERT (and including some basic baselines).
If you only have a small amount of data, you might be able to use GPT models with a fair amount of prompt engineering.
Also, you'll probably face different challenges depending on whether the candidate types the response or an interviewer summarizes it. If it's an interviewer's notes, you might find simple proxies, like certain interviewers typing more for good candidates.
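On the "basic baselines" point: the simplest one is a majority-class predictor, which any BERT or GPT approach should beat before it's worth keeping. A sketch with made-up labels (the data here is hypothetical):

```python
from collections import Counter

# Toy interview-response labels: 1 = strong candidate, 0 = not.
train_labels = [1, 0, 1, 1, 0, 1, 1, 0]

# Majority-class baseline: always predict the most common training label.
majority = Counter(train_labels).most_common(1)[0][0]

test_labels = [1, 0, 1, 1]
accuracy = sum(majority == y for y in test_labels) / len(test_labels)
print(majority, accuracy)
```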
currentscurrents t1_jc8a641 wrote
This is basically China's GPT-3, right?
CatalyzeX_code_bot t1_jc88kq2 wrote
Found relevant code at https://github.com/lonePatient/ERNIE-text-classification-pytorch + all code implementations here
--
Found relevant code at https://github.com/PaddlePaddle/ERNIE + all code implementations here
--
To opt out from receiving code links, DM me
therentedmule t1_jc86u50 wrote
Reply to comment by femboyxx98 in [R] Training Small Diffusion Model by crappr
Many repos are not usable and have click-bait names (e.g., palm-Rlhf)
v_krishna t1_jc7wzmx wrote
Reply to comment by PriestOfFern in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
I don't doubt it. I've only been using it for workflow aids (Copilot-style stuff, and generating unit tests to capture error-handling conditions, etc.), and now we're piloting our first generative-text products, but very human-in-the-loop: customer data feeds into a prompt, but the output then goes to an editor for a human to proof and update before anything is done with it. The number of totally fake webinars hosted by totally fake people it has hallucinated is wild (the content and agendas sound great and are sensible, but none of it exists!).
PK_thundr OP t1_jc9ikiv wrote
Reply to comment by Disastrous_Elk_6375 in [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr
I'm aware of this article; my question is more about whether it's better to build our own server or to use existing cloud providers.