Anybody train a personal LLM? Any suggestions?

Drop table users;

Wise, Aged Ars Veteran
545
Whatup nerds. I am looking for a personal LLM that I can feed my documents so I can retrieve information conversationally. The goal is to run it in a VM on my Proxmox host, feed it documentation, and make it easier to find what I'm looking for. There are a number of personal LLMs in existence, but I don't want to spend the next six months researching and deploying shitty solutions until I find the right one.
So I was hoping some of you have already done this and can help me.

Thanks in advance.

OrangeCream

Ars Legatus Legionis
55,362
The term you’re looking for, I think, is RAG: retrieval-augmented generation.

Is that what you’re looking for: a well-reviewed, well-received program you can build, install, and configure, then use? I have no personal experience, so all I can suggest is searching GitHub for RAG.

If you’ve got a 30xx GPU you can give Nvidia's Chat with RTX a spin. Ars has a small review of it, which also called it rough around the edges.

Having never tried any of them, all I can provide is pointers. Good luck; this is an interesting development in software, and one I'm curious about as well.
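
For what it's worth, the core RAG loop is small enough to sketch. Here's a minimal illustration in Python, assuming the sentence-transformers package; the sample documents, the embedding model, and the final LLM call are all placeholders you'd swap for your own setup:

```python
# Minimal RAG sketch: embed the documents once, then for each question
# retrieve the most similar chunks and prepend them to the prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Proxmox backups are configured under Datacenter -> Backup.",
    "The VM's NIC is bridged through vmbr0.",
    "ZFS scrubs run on the second Sunday of the month.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k document chunks most similar to the question."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec  # cosine similarity, since vectors are normalized
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

question = "When do ZFS scrubs happen?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# Send `prompt` to whatever local model you run (Ollama, llama.cpp, etc.).
print(prompt)
```

The point is that the model never ingests your whole document set; it only sees the few chunks that scored highest for each question, which is what makes this practical on modest hardware.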

ShuggyCoUk

Ars Tribunus Angusticlavius
9,975
Subscriptor++
There is RAG, but you could also fine-tune with something like LoRA instead. I believe the latter is computationally much more expensive, and to a degree impractical without access to a serious GPU. My knowledge of successful fine-tunes of highly quantised models is horribly out of date, though, so maybe it is now within range of a 16 GB consumer card rather than a 40 GB+ data-centre card (or indeed a cluster of them).
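
To put a rough shape on the LoRA route: below is a sketch of what a quantised-base fine-tune looks like with Hugging Face's peft and bitsandbytes libraries. Treat it as illustrative only; the model name is a placeholder, the hyperparameters are typical defaults rather than recommendations, and the training loop itself is omitted:

```python
# QLoRA-style setup: load the base model in 4-bit, then attach small
# trainable low-rank adapters instead of updating the full weights.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=16,                                 # rank of the adapter matrices
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # the usual attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the base model

# From here you train with an ordinary Trainer/SFT loop on your documents;
# model.save_pretrained("my-adapter") then writes only the small adapter.
```

Since only the adapters train, the memory bill is mostly the 4-bit base model plus optimiser state for a sliver of parameters, which is what could put a 7B fine-tune within reach of a 16 GB card.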