Poll: Were my instructions clear enough?
Yes: 11 votes (36.67%)
No, I need more detail (feel free to ask): 19 votes (63.33%)
Yes, too much so: 0 votes (0%)
Total: 30 votes

Personal Pregnancy-Specialized RP AI - RELEASE
LadyBabyloin
Hello all! I made a pair of pregnancy-fiction-focused LLMs, specialized in RP in particular.

Eileithyia-7B is a 7 billion parameter model based on Mistral 7B. Eileithyia-13B is a 13 billion parameter model based on Llama 2 (via TiefighterLR).

Don't let the jargon spook you: these days, running your own LLM at home is easier than ever, and I set things up so you don't even need a graphics card anymore! The skill ceiling is still high, but the floor is a lot lower.

Just install Oobabooga WebUI by downloading this ZIP. Run 'start_windows.bat', follow the simple prompts about your hardware, and you're installed. It will start a program you can access in your browser at 127.0.0.1:7860 (or the link given in the CMD box).

Now you need to download one of the models. If you're new to this sort of thing, I recommend the smaller 7B model at the 5-bit ("q5_k_m") size. This basically means it has been shrunk to about a quarter of its original size with minimal quality loss. It will use about 10GB of system RAM (or VRAM).
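
If you'd rather script the download than click through the UI, a minimal sketch with huggingface_hub might look like the following; the repo_id and filename here are placeholder guesses, not the model's confirmed upload path, so check the actual model card.

[code]
# Hypothetical download sketch -- repo_id and filename are assumptions,
# so substitute the real names from the Eileithyia-7B model card.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="LadyBabyloin/Eileithyia-7B-GGUF",  # assumed repo name
    filename="eileithyia-7b.q5_k_m.gguf",       # assumed file name
    local_dir="text-generation-webui/models",   # Ooba's models folder
)
print(f"Saved to {path}")
[/code]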

The BLUE box on the guide shows the model tab. When it loads, the UI will be in chat mode, so click into the Model tab to see the screen in the example. The GREEN box on the guide shows how to download the 5 bit 7B model. Be sure to fill out both boxes, because without the bottom box you'll download all 3 sizes!

Once the download is complete, use the "Model" dropdown (RED box) to select the model. At this point the screen should look just like the demo, and you can click "Load" to the right of the dropdown to load the model (WHITE box). Then the AI will speak back to you on the first page, though you may want to make some adjustments over time. If you have a graphics card, you can increase "n-gpu-layers" (YELLOW box) to move part or all of the model (remember, ~10GB of RAM/VRAM total) into your GPU's VRAM, speeding the AI up by a very, VERY large amount.
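
For the curious, here is a minimal sketch of roughly what that Load step amounts to under the hood: the WebUI's GGUF loader wraps llama.cpp, and the n-gpu-layers slider maps to the n_gpu_layers argument in llama-cpp-python. The file path and prompt format below are assumptions, not confirmed details of this model.

[code]
# Sketch of the equivalent load via llama-cpp-python (which the WebUI's
# llama.cpp loader wraps). Path and prompt template are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="models/eileithyia-7b.q5_k_m.gguf",  # assumed path
    n_ctx=4096,        # context window
    n_gpu_layers=0,    # 0 = CPU only; raise this to offload layers to VRAM
)

out = llm("You are a storyteller.\nUSER: Say hello.\nASSISTANT:", max_tokens=64)
print(out["choices"][0]["text"])
[/code]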
loki80
Already downloading!
Any chance of a GPTQ version of the 13B model?
LadyBabyloin
My hardware really makes GPTQ out of reach, and Colab is being a PITA.

I can do EXL2 if you want? It's my preference for inference anyhow, and it works with most things GPTQ is compatible with.
loki80
(Edited)
I haven't tried EXL2 before (I'll need to update my drivers and Ooba for a newer CUDA), but I'd like to try it.
I have a 3060 with 12GB, and ExLlama lets me run 13B models (GPTQ ones) entirely in VRAM, stable and fast. But GGUF at Q5 doesn't fit; as a result it's slow with CPU offload and unstable, hitting OOMs as the context grows...

But I've already tried both models, and I like them. The 7B is not bad (even pretty good as a story writer), but the 13B is exactly what I want from such a thematic model. Storywriting aside, it's very good for RP and ERP; it looks like the finetune sticks well to the Tiefighter base.
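
As a rough back-of-the-envelope on why a 4-bit GPTQ fits a 12GB card while the Q5 GGUF struggles: weight memory is roughly parameters times bits-per-weight divided by 8, before the KV cache piles on top. The bits-per-weight figures in this sketch are approximate averages, not exact values.

[code]
# Rough estimate of weight memory for a 13B model at common quant levels.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8  # GB, ignoring KV cache/overhead

for name, bpw in [("GPTQ 4-bit", 4.0), ("GGUF q5_k_m", 5.5), ("GGUF q8_0", 8.5)]:
    print(f"13B {name}: ~{weight_gb(13, bpw):.1f} GB")
# GPTQ 4-bit: ~6.5 GB  -> fits 12 GB VRAM with room for context
# GGUF q5_k_m: ~8.9 GB -> tight once the growing context's KV cache is added
[/code]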
Feunski
I'm new to using LLMs. Which one would work best with an RTX 4090 and 64 GB of system memory?
yosukebellyhunter
It keeps giving me a connection error.
LadyBabyloin
(November 15, 2023, 5:47 pm)loki80 I haven't tried EXL2 before (I'll need to update my drivers and Ooba for a newer CUDA), but I'd like to try it.
I have a 3060 with 12GB, and ExLlama lets me run 13B models (GPTQ ones) entirely in VRAM, stable and fast. But GGUF at Q5 doesn't fit; as a result it's slow with CPU offload and unstable, hitting OOMs as the context grows...

But I've already tried both models, and I like them. The 7B is not bad (even pretty good as a story writer), but the 13B is exactly what I want from such a thematic model. Storywriting aside, it's very good for RP and ERP; it looks like the finetune sticks well to the Tiefighter base.

Thank you very much! Once my dev cycle cools down, I'll get EXL2s available as well.

(November 16, 2023, 3:13 am)Feunski I'm new to using LLMs. Which one would work best with an RTX 4090 and 64 GB of system memory?

Unquestionably the 13B model; you can run the q5_k_m fast, and probably the slightly better q8_0 as well. I'm working on an even bigger 20B model that's probably going to be the sweet spot for you.
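
A toy illustration of that sizing logic, using the same rough params-times-bits estimate as above; the size figures and the 4 GB overhead margin are assumptions for the sketch, not measured numbers, and it assumes a CUDA GPU is present.

[code]
# Hypothetical helper: pick the largest quant that leaves VRAM headroom.
import torch

candidates = [  # (name, rough weight size in GB)
    ("13B q8_0", 13.8),   # ~13B * 8.5 bits / 8
    ("13B q5_k_m", 8.9),  # ~13B * 5.5 bits / 8
    ("7B q5_k_m", 4.8),   # ~7B  * 5.5 bits / 8
]

vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
for name, est_gb in candidates:
    if est_gb + 4 <= vram_gb:  # keep ~4 GB free for KV cache and overhead
        print(f"{name} (~{est_gb} GB) should run comfortably in {vram_gb:.0f} GB")
        break
[/code]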

(November 19, 2023, 11:21 am)yosukebellyhunter It keeps giving me a connection error.

Strange! My advice is to check the command window that opens for a link and try that instead; it *should* work.
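
If the browser still can't connect, a quick stdlib check like the one below will tell you whether anything is actually listening on the default port; 7860 is an assumption here, so use whatever address the CMD window prints.

[code]
# Quick sanity check: is anything listening on the default WebUI port?
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(2)
    if s.connect_ex(("127.0.0.1", 7860)) == 0:
        print("Server is up -- try http://127.0.0.1:7860 in your browser")
    else:
        print("Nothing on port 7860 -- check the CMD window for errors or another link")
[/code]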
LadyBabyloin
The 20B version of the model is now available; strongly suggested if you have >12GB of VRAM. It's a huge improvement!
genius531
Thank you for this. This is one of the coolest things I've ever seen on this forum, and the newest models especially are just outstanding. Great work!
gohstwolf
I've tried it, but the problem is that it's really slow. I love the idea, but I don't want to wait so long for a response.
I have a 3090 with 32GB of RAM, so my PC shouldn't be the big problem. Do you have any tips to make it faster?
