Recent comments in /f/MachineLearning
CKtalon t1_jc9hm91 wrote
Reply to [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr
I don't think a 40K budget can get you a machine with 256GB of VRAM. It's barely enough for 8x RTX 6000 Ada, and that's ignoring the high-end workstation/server-grade CPU and motherboard you'd need to support 8 cards.
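A quick back-of-envelope check of that claim (card price is a rough assumption from around that time, not a quote):

```python
# RTX 6000 Ada: 48 GB VRAM per card, roughly $7,000 each (assumed price).
cards = 8
vram_per_card_gb = 48
price_per_card = 7_000

total_vram_gb = cards * vram_per_card_gb   # 384 GB across 8 cards
gpu_cost = cards * price_per_card          # GPUs alone, before CPU/motherboard/PSU

print(f"{total_vram_gb} GB VRAM, ${gpu_cost:,} for the GPUs alone")
```

Even the GPUs by themselves land above a $40K budget before the platform costs are counted.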
bo_peng OP t1_jc9gf72 wrote
Reply to comment by KerfuffleV2 in [P] ChatRWKV v2 (can run RWKV 14B with 3G VRAM), RWKV pip package, and finetuning to ctx16K by bo_peng
Update to ChatRWKV v2 and the rwkv pip package (0.5.0), and set os.environ["RWKV_CUDA_ON"] = '1' for 1.5x speed at f16i8 (and 10% less VRAM: now 14686MB for 14B instead of 16462MB, so you can put more layers on the GPU).
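A minimal sketch of how that flag is used; the model path and strategy string below are placeholders for your own setup, not values from this comment:

```python
import os

# The flag must be set BEFORE the rwkv package is imported, so the
# custom CUDA kernel is compiled and used.
os.environ["RWKV_CUDA_ON"] = "1"

# Hypothetical usage with the rwkv pip package (>= 0.5.0):
# from rwkv.model import RWKV
# model = RWKV(model="/path/to/RWKV-4-Pile-14B", strategy="cuda fp16i8")
```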
SnooHesitations8849 t1_jc9ge3v wrote
Reply to [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr
Lambda Labs is cheaper
Disastrous_Elk_6375 t1_jc9g5v0 wrote
Reply to [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr
This is the best article for you right now - https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/
Follow the GPU recommendation chart, and check out the formulas for figuring out if buying or renting is worth it for you. Tim probably has you covered for what you need.
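The buy-vs-rent question boils down to a break-even estimate. A toy version (all numbers here are illustrative assumptions, not quotes; a real comparison should also count power, depreciation, and admin time):

```python
# If a local machine costs $40,000 and a comparable cloud instance
# rents for $12/hour, how many GPU-hours until buying wins?
purchase_cost = 40_000        # local machine, USD (assumed)
cloud_rate_per_hour = 12.0    # comparable cloud instance, USD/hour (assumed)

break_even_hours = purchase_cost / cloud_rate_per_hour
print(f"~{break_even_hours:.0f} hours (~{break_even_hours / 24:.0f} days of 24/7 use)")
```

If your expected utilization is well below that, renting usually comes out ahead.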
__ingeniare__ t1_jc9f2xb wrote
Reply to comment by EricLee8 in [N] Baidu to Unveil Conversational AI ERNIE Bot on March 16 (Live) by kizumada
No, it will provide an answer that aligns with the CCP's agenda
EricLee8 t1_jc9epdi wrote
Reply to comment by Sinkencronge in [N] Baidu to Unveil Conversational AI ERNIE Bot on March 16 (Live) by kizumada
It will definitely refuse to answer this question.
YouAgainShmidhoobuh t1_jc9a44k wrote
Reply to comment by respeckKnuckles in [D] On research directions being "out of date" by redlow0992
Not so sure about this. It seems like a tempting argument, but GPT-4 has no public description of its model architecture or training approach at all, so there is no way to make a fair comparison of any kind.
kizumada OP t1_jc99iwj wrote
Reply to comment by jakderrida in [N] Baidu to Unveil Conversational AI ERNIE Bot on March 16 (Live) by kizumada
ERNIE 3.0 (and 3.0 Titan) is more like BERT + GPT: it fuses an auto-regressive network with an auto-encoding network
[deleted] t1_jc98890 wrote
Reply to comment by Disastrous_Elk_6375 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
[deleted]
Sinkencronge t1_jc983mf wrote
It would be interesting to see the alignment work they did. ERNIE, what's your opinion on Taiwan?
jakderrida t1_jc911o0 wrote
Reply to comment by currentscurrents in [N] Baidu to Unveil Conversational AI ERNIE Bot on March 16 (Live) by kizumada
More like their BERT. Get it??
Abradolf--Lincler t1_jc8ynrt wrote
Reply to [D] Simple Questions Thread by AutoModerator
Learning about language transformers and I’m a bit confused.
It seems like tutorials on transformers always make the input sequences the same length (e.g., text files batched into 100-word windows) to help with batching.
Doesn't that mean the model will only work with that exact sequence length? How do you efficiently train a model to handle any sequence length, such as shorter sequences with no padding, or sequences longer than the batched length?
I see attention models advertised as having an infinite window; are there any good resources/tutorials that explain how to build a model like that?
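For what it's worth, the usual answer to the padding part of this question: fixed-length batches are just a training convenience. Variable lengths are handled by padding each batch to its longest sequence and passing an attention mask so padded positions are ignored. A dependency-free sketch with made-up token ids (this mirrors what PyTorch's pad_sequence and Hugging Face attention masks do):

```python
# Three "sentences" of different lengths, as made-up token ids.
seqs = [[5, 2, 9], [7, 1], [3, 8, 4, 6, 2]]

PAD = 0  # the pad id is arbitrary; it just must not collide with real tokens
max_len = max(len(s) for s in seqs)

# Pad every sequence to the longest one in the batch.
batch = [s + [PAD] * (max_len - len(s)) for s in seqs]

# Attention mask: 1 for real tokens, 0 for padding, so attention scores
# over pad positions can be masked out inside the model.
mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]

print(batch)  # [[5, 2, 9, 0, 0], [7, 1, 0, 0, 0], [3, 8, 4, 6, 2]]
print(mask)   # [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0], [1, 1, 1, 1, 1]]
```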
respeckKnuckles t1_jc8xver wrote
"Not using gpt4" is going to be in all NLP conference paper reviews for the next six months.
kryatoshi t1_jc8suy9 wrote
Reply to [P] vanilla-llama an hackable plain-pytorch implementation of LLaMA that can be run on any system (if you have enough resources) by poppear
You can fit a 4-bit quantized 65B model on an M1 Max with 64GB RAM; it takes about 40GB of unified memory. See https://twitter.com/tljstewart/status/1635326012346048528?s=20
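The rough arithmetic behind that number (an estimate, not a measurement):

```python
# 4-bit quantization stores half a byte per parameter.
params = 65e9             # 65B parameters
bytes_per_param = 0.5     # 4 bits

weights_gib = params * bytes_per_param / 2**30
print(f"~{weights_gib:.1f} GiB for the weights alone")
# The gap up to the ~40 GB observed goes to the KV cache, activations,
# and runtime overhead.
```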
bhagy7 t1_jc8rdbj wrote
Reply to [R] Training Small Diffusion Model by crappr
Yes, it is possible to train a small diffusion model conditioned on text captions from scratch on 64x64 images or even smaller. Depending on the complexity of the model and the number of GPUs you are using, it could take anywhere from a few hours to several days. If you are
baffo32 t1_jc8jvoo wrote
Reply to [D] ChatGPT without text limits. by spiritus_dei
start some code! invite contributors! :)
baffo32 t1_jc8jgd4 wrote
Reply to comment by xKraazY in [D] ChatGPT without text limits. by spiritus_dei
i’m thinking, with practice and research, these abstractions could be done in dynamic ways that can pivot and diversify to new norms
baffo32 t1_jc8j4w0 wrote
Reply to comment by big_ol_tender in [D] ChatGPT without text limits. by spiritus_dei
thoughts: each approach has generally something unique that can make it useful, and approaches usually have ways in which they can merge
AllowFreeSpeech t1_jc8fcdz wrote
Reply to comment by spiritus_dei in [D] ChatGPT without text limits. by spiritus_dei
Here is a link to the abstract page: https://arxiv.org/abs/2301.04589
trnka t1_jc8csxm wrote
Reply to comment by Neeraj666 in [D] Simple Questions Thread by AutoModerator
If you have significant data, I'd suggest starting with BERT (and including some basic baselines).
If you only have a small amount of data, you might be able to use GPT models with a fair amount of prompt engineering.
Also, you'll probably face different challenges depending on whether the candidate types the response or an interviewer summarizes it. If it's an interviewer's notes, you might find simple proxies, like certain interviewers typing more for good candidates.
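On the "basic baselines" point: the simplest one is a majority-class predictor, which any BERT or GPT approach should beat before it's worth keeping. A sketch with made-up labels (the data here is hypothetical):

```python
from collections import Counter

# Toy interview-response labels: 1 = strong candidate, 0 = not.
train_labels = [1, 0, 1, 1, 0, 1, 1, 0]

# Majority-class baseline: always predict the most common training label.
majority = Counter(train_labels).most_common(1)[0][0]

test_labels = [1, 0, 1, 1]
accuracy = sum(majority == y for y in test_labels) / len(test_labels)
print(majority, accuracy)
```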
currentscurrents t1_jc8a641 wrote
This is basically China's GPT-3, right?
CatalyzeX_code_bot t1_jc88kq2 wrote
Found relevant code at https://github.com/lonePatient/ERNIE-text-classification-pytorch + all code implementations here
--
Found relevant code at https://github.com/PaddlePaddle/ERNIE + all code implementations here
--
To opt out from receiving code links, DM me
therentedmule t1_jc86u50 wrote
Reply to comment by femboyxx98 in [R] Training Small Diffusion Model by crappr
Many repos are not usable and have click-bait names (e.g., palm-Rlhf)
v_krishna t1_jc7wzmx wrote
Reply to comment by PriestOfFern in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
I don't doubt it. I've only been using it for workflow aids (Copilot-style stuff, and generating unit tests to capture error-handling conditions, etc.), and now we're piloting our first generative-text products, but very human-in-the-loop: customer data feeds into a prompt, but the output then goes to an editor for a human to proof and update before anything is done with it. The number of totally fake webinars hosted by totally fake people it has hallucinated is wild (the content and agendas sound great and are sensible, but none of it exists!).
PK_thundr OP t1_jc9ikiv wrote
Reply to comment by Disastrous_Elk_6375 in [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr
I'm aware of this article; my question is more about whether it's better to build our own server or to use existing cloud providers.