Recent comments in /f/MachineLearning
Kinexity t1_jc1lwah wrote
Reply to comment by light24bulbs in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
That is fast. We are literally talking about a high end laptop CPU from 5 years ago running a 30B LLM.
EcstaticStruggle t1_jc1jts4 wrote
Reply to [D] Simple Questions Thread by AutoModerator
How do you combine hyper parameter optimization with early stopping in cross-validation for LightGBM?
Do you:
1. Use the same validation set for hyperparameter performance estimation and for early stopping (e.g., 80% training, 20% combined early stopping + validation set).
2. Create a separate split within each cross-validation fold for early stopping (e.g., 80% training, 10% early stopping, 10% validation).
3. Set aside a different dataset altogether (like a test set) that is used for early stopping across all cross-validation folds.
In the case of 1) and 2), how would you use early stopping once you identified optimal hyperparameters? Normally, you would re-fit on the entire dataset with the best hyperparameters, but this removes the early stopping data.
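Not an authoritative answer, but option 2 is a common pattern. A minimal sketch of the per-fold split, plus one common way to handle the refit question: record the best iteration found in each fold and refit on the full dataset with `n_estimators` fixed to a summary of those (the fold sizes and iteration counts below are purely illustrative):

```python
from statistics import median

def nested_fold_split(indices, es_frac=0.1, val_frac=0.1):
    """Split one CV fold's indices into train / early-stopping / validation
    subsets (option 2 above: roughly 80/10/10)."""
    n = len(indices)
    n_val = max(1, int(n * val_frac))
    n_es = max(1, int(n * es_frac))
    val = indices[:n_val]                # hyperparameter scoring
    es = indices[n_val:n_val + n_es]     # early-stopping monitor
    train = indices[n_val + n_es:]       # model fitting
    return train, es, val

# Refitting on the full dataset removes the early-stopping set, so one
# workaround is to freeze the tree count at a summary (e.g. the median)
# of each fold's best iteration. Hypothetical per-fold values:
best_iters = [120, 135, 128, 140, 131]
final_n_estimators = int(median(best_iters))
```

With LightGBM specifically, `final_n_estimators` would then be passed as `n_estimators` for the final fit, with no early-stopping callback.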
KerfuffleV2 t1_jc1jtg5 wrote
Reply to [P] ChatRWKV v2 (can run RWKV 14B with 3G VRAM), RWKV pip package, and finetuning to ctx16K by bo_peng
I didn't want to clutter up the issue here: https://github.com/BlinkDL/ChatRWKV/issues/30#issuecomment-1465226569
In case this information is useful for you:
| strategy | time (s) | tokens/s | tokens |
|---|---|---|---|
| cuda fp16 *0+ -> cuda fp16 *10 | 45.44 | 1.12 | 51 |
| cuda fp16 *0+ -> cuda fp16 *5 | 43.73 | 0.94 | 41 |
| cuda fp16 *0+ -> cuda fp16 *1 | 52.7 | 0.83 | 44 |
| cuda fp16 *0+ -> cpu fp32 *1 | 59.06 | 0.81 | 48 |
| cuda fp16i8 *12 -> cuda fp16 *0+ -> cpu fp32 *1 | 65.41 | 0.69 | 45 |
I ran the tests using this frontend: https://github.com/oobabooga/text-generation-webui
It was definitely using rwkv version 0.3.1
env RWKV_JIT_ON=1 python server.py \
--rwkv-cuda-on \
--rwkv-strategy STRATEGY_HERE \
--model RWKV-4-Pile-7B-20230109-ctx4096.pth
For each test, I let it generate a few tokens first to let it warm up, then stopped it and let it generate a decent number. Hardware is a Ryzen 5 1600, 32GB RAM, GeForce GTX 1060 6GB VRAM.
Surprisingly, streaming everything as fp16 was still faster than putting 12 fp16i8 layers in VRAM. A 1060 is a pretty old card, so maybe it has unusual behavior dealing with that format. I'm not sure.
Necessary_Ad_9800 t1_jc1j36g wrote
Reply to comment by MorallyDeplorable in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Where can I see stuff generated from this model?
Guitargamer57 t1_jc1hhjz wrote
I tested it using Japanese and it seems like it misses punctuation for the most part. But, overall, seems to be doing a good job getting the words.
ihexx t1_jc1g3bd wrote
Reply to comment by boyetosekuji in [R] Introducing Ursa from Speechmatics | 25% improvement over Whisper by jplhughes
my guess is model size
I1onza t1_jc1g0u9 wrote
Reply to [D] Simple Questions Thread by AutoModerator
I'm a materials engineering student and an outsider to the ML and AI community. During my studies I take notes on my laptop and don't have a quick, reliable way to copy down simple graphs. With the recent publicity of AI models, I was wondering if someone has already tried to train a model to draw graphs from natural language. DALL-E does it quite horribly (cf. picture). If you haven't heard of such a thing, maybe it's a project you might find interesting to make.
filisterr t1_jc1c8lp wrote
So is this post kind of a hidden advertisement or what?
filisterr t1_jc1c7ss wrote
Reply to comment by Deep-Station-1746 in [R] Introducing Ursa from Speechmatics | 25% improvement over Whisper by jplhughes
Yes, that's probably just cherry-picked marketing.
filisterr t1_jc1c3aa wrote
Reply to comment by Bulky_Highlight_3352 in [R] Introducing Ursa from Speechmatics | 25% improvement over Whisper by jplhughes
Typical: basing your research on open source projects and then making a commercial product on top of other people's work. Great achievement.
firecz t1_jc1bwjs wrote
I would love to see this as a (Windows) GUI, similar to what some Stable Diffusion solutions do (nmkd, grisk...) - the entire code running offline on your PC, not sending anything to Discord or elsewhere.
This would open it to the masses, which in turn would pour more money into research.
Jean-Porte t1_jc1axno wrote
Reply to [N] Man beats machine at Go in human victory over AI : « It shows once again we’ve been far too hasty to ascribe superhuman levels of intelligence to machines. » by fchung
- Machine finds a strategy to beat machine
- Human implements the strategy and beats machine
- Therefore, human beats machine
Deep-Station-1746 t1_jc196cg wrote
>25% improvement over Whisper
>Not open source
>doubt.jpeg
gaybooii t1_jc1950l wrote
Reply to comment by brandonZappy in [R] Introducing Ursa from Speechmatics | 25% improvement over Whisper by jplhughes
Lmaoo
KerfuffleV2 t1_jc18f6a wrote
Reply to comment by Select_Beautiful8 in [P] ChatRWKV v2 (can run RWKV 14B with 3G VRAM), RWKV pip package, and finetuning to ctx16K by bo_peng
Huh, that's weird. You can try reducing the first one from 7 to 6 or maybe even 5:
cuda fp16 *6 -> cuda fp16 *0+ -> cpu fp32 *1
Also, be sure to double-check for typos. :) Any incorrect numbers/punctuation will probably cause problems, especially the "+" in the second part.
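Since typos in the strategy string are such an easy mistake, a quick sanity check can catch them before launch. This is a hypothetical helper: the pattern below is inferred from the example strings in this thread (device, dtype, `*N` with optional `+`, segments joined by `->`), not taken from the official ChatRWKV parser:

```python
import re

# One segment looks like "cuda fp16 *6" or "cuda fp16 *0+" (the trailing
# "+" marks the open-ended streaming segment).
SEGMENT = re.compile(r"^(cuda|cpu)\s+(fp16|fp16i8|fp32)\s+\*\d+\+?$")

def looks_valid(strategy: str) -> bool:
    """Rough shape check for an RWKV strategy string; a False result
    means a likely typo, not that a True string is guaranteed to work."""
    parts = strategy.split("->")
    return all(SEGMENT.match(part.strip()) for part in parts)
```

For example, `looks_valid("cuda fp16 *6 -> cuda fp16 *0+ -> cpu fp32 *1")` passes, while a string with a missing `*` fails.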
CashyJohn t1_jc184r4 wrote
Wav2vec2 is still SOTA. As long as this isn't open source, it's kinda useless lmao
Balance- t1_jc16bi6 wrote
Reply to [P] Introducing confidenceinterval, the long missing python library for computing confidence intervals by jacobgil
Looks awesome!
I would also post at r/Python and/or r/DataScience
AsIAm t1_jc168cw wrote
Reply to comment by Taenk in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
It is. But that doesn't mean 1-bit neural nets are impossible. Even Turing himself toyed with such networks – https://www.npl.co.uk/getattachment/about-us/History/Famous-faces/Alan-Turing/80916595-Intelligent-Machinery.pdf?lang=en-GB
[deleted] t1_jc1652y wrote
[removed]
Viacheslav_Varenia t1_jc13mkn wrote
Does it support Ukrainian and Russian?
nucLeaRStarcraft t1_jc1334g wrote
Why is this tagged [R]? This is a commercial project at best. Where's the paper? Where's the code? Can we use it today on our PC like Whisper? This really isn't 'research'.
kuraisle t1_jc11j00 wrote
Reply to comment by Simusid in [D] Simple Questions Thread by AutoModerator
That's really helpful, thank you!
HotRecognition0121 t1_jc10odk wrote
Are there WER numbers for other languages, like on the GitHub page for Whisper? I want to compare the performance in other languages.
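If the vendor doesn't publish per-language numbers, WER is simple enough to compute yourself from reference and hypothesis transcripts. A minimal sketch (plain word-level edit distance; note that published benchmarks like Whisper's also apply text normalization first, which this skips):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic single-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            temp = d[j]
            d[j] = min(d[j] + 1,                                  # deletion
                       d[j - 1] + 1,                              # insertion
                       prev_diag + (ref[i - 1] != hyp[j - 1]))    # substitution
            prev_diag = temp
    return d[-1] / max(1, len(ref))
```

For example, `wer("a b c", "a x c")` gives 1/3: one substitution out of three reference words.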
futilehabit t1_jc10obb wrote
Reply to comment by cr125rider in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Guess hospice is pretty boring
Raise_Fickle t1_jc1p9x5 wrote
Reply to [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Anyone having any luck finetuning LLaMA in a multi-GPU setup?