Recent comments in /f/MachineLearning

EcstaticStruggle t1_jc1jts4 wrote

How do you combine hyperparameter optimization with early stopping in cross-validation for LightGBM?

Do you:

  1. Use the same validation set for both hyperparameter performance estimation and early stopping evaluation (e.g., 80% training, 20% combined early stopping + validation set).
  2. Create a separate fold within cross-validation for early stopping evaluation (e.g., 80% training, 10% early stopping, 10% validation).
  3. Set aside a different dataset altogether (like a test set) that is reused for early stopping evaluation across all cross-validation folds.

In the case of 1) and 2), how would you use early stopping once you have identified the optimal hyperparameters? Normally you would re-fit on the entire dataset with the best hyperparameters, but that removes the early stopping data.
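To make option 2 concrete, here is a rough sketch using LightGBM's scikit-learn API (the regressor, the split ratios, and the idea of keeping the median best iteration for the final refit are illustrative assumptions on my part, not an established recipe):

    import numpy as np
    import lightgbm as lgb
    from sklearn.model_selection import KFold, train_test_split

    def cv_with_early_stopping(params, X, y, n_splits=5):
        """Option 2: inside each CV fold, carve a small early-stopping set out of
        the training portion, so the validation fold only scores the final model."""
        scores, best_iters = [], []
        for train_idx, val_idx in KFold(n_splits, shuffle=True, random_state=0).split(X):
            # Roughly 80% fit / 10% early stopping / 10% validation overall.
            X_fit, X_es, y_fit, y_es = train_test_split(
                X[train_idx], y[train_idx], test_size=0.125, random_state=0)
            model = lgb.LGBMRegressor(n_estimators=5000, **params)
            model.fit(X_fit, y_fit,
                      eval_set=[(X_es, y_es)],
                      callbacks=[lgb.early_stopping(50, verbose=False)])
            best_iters.append(model.best_iteration_)
            scores.append(model.score(X[val_idx], y[val_idx]))
        # One workaround for the refit question: remember the (median) best
        # iteration found during CV and refit on the full data with it fixed.
        return float(np.mean(scores)), int(np.median(best_iters))

With that, the final model would be refit on the whole dataset with n_estimators set to the returned iteration count, trading early stopping for a fixed number of trees.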

1

KerfuffleV2 t1_jc1jtg5 wrote

/u/bo_peng

I didn't want to clutter up the issue here: https://github.com/BlinkDL/ChatRWKV/issues/30#issuecomment-1465226569

In case this information is useful for you:

strategy                                            time (s)   tps (tok/s)   tokens
cuda fp16 *0+ -> cuda fp16 *10                      45.44      1.12          51
cuda fp16 *0+ -> cuda fp16 *5                       43.73      0.94          41
cuda fp16 *0+ -> cuda fp16 *1                       52.7       0.83          44
cuda fp16 *0+ -> cpu fp32 *1                        59.06      0.81          48
cuda fp16i8 *12 -> cuda fp16 *0+ -> cpu fp32 *1     65.41      0.69          45

I ran the tests using this frontend: https://github.com/oobabooga/text-generation-webui

It was definitely using rwkv version 0.3.1

env RWKV_JIT_ON=1 python server.py \
  --rwkv-cuda-on \ 
  --rwkv-strategy STRATEGY_HERE \ 
  --model RWKV-4-Pile-7B-20230109-ctx4096.pth 

For each test, I let it generate a few tokens first to let it warm up, then stopped it and let it generate a decent number. Hardware is a Ryzen 5 1600, 32GB RAM, GeForce GTX 1060 6GB VRAM.

Surprisingly, streaming everything as fp16 was still faster than putting 12 fp16i8 layers in VRAM. A 1060 is a pretty old card, so maybe it has unusual behavior dealing with that format. I'm not sure.

1

I1onza t1_jc1g0u9 wrote

I'm a materials engineering student and an outsider to the ML and AI community. During my studies I take notes on my laptop and don't have a quick, reliable way to copy down simple graphs. With the recent publicity around AI models, I was wondering if someone has already tried to train a model to draw graphs from natural language. DALL-E does it quite horribly (cf. picture). If you haven't heard of such a thing, maybe it's a project you would find interesting to build.

0

firecz t1_jc1bwjs wrote

I would love to see this as a (Windows) GUI, similar to what some Stable Diffusion solutions do (nmkd, grisk...): the entire thing running offline on your PC, not sending anything to Discord or elsewhere.
This would open it up to the masses, which in turn would pour more money into research.

1

KerfuffleV2 t1_jc18f6a wrote

Huh, that's weird. You can try reducing the first one from 7 to 6 or maybe even 5:

cuda fp16 *6 -> cuda fp16 *0+ -> cpu fp32 *1

Also, be sure to double-check for typos. :) Any incorrect numbers or punctuation will probably cause problems, especially the "+" in the second part.
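If it helps to rule the webui in or out, the same strategy string can also be tried by loading the model directly with the rwkv package. Just a rough sketch, with the model path given without the .pth suffix as in the ChatRWKV examples:

    # Rough sketch: test a strategy string outside the webui by loading the
    # model directly with the rwkv package. Adjust the path to your setup.
    import os
    os.environ["RWKV_JIT_ON"] = "1"

    from rwkv.model import RWKV

    model = RWKV(
        model="RWKV-4-Pile-7B-20230109-ctx4096",  # path without the .pth suffix
        strategy="cuda fp16 *6 -> cuda fp16 *0+ -> cpu fp32 *1",
    )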

2