I’m currently shopping around for something a bit faster than ollama, mainly because I could not get it to use a different context and output length, which seems to be a known and long-ignored issue. Everything I’ve tried so far has been missing one or more critical features, like:
- “Hot” model replacement, so loading and unloading models on demand
- Function calling
- Support of most models
- OpenAI API compatibility (to work well with Open WebUI)
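For context, by OpenAI API compatibility I mean I can just point a standard OpenAI client (or Open WebUI) at the server. A minimal sketch of what that looks like – the base URL, port, and model name here are placeholders, not any particular backend:

```python
# Minimal sketch: talk to a local inference server through the standard OpenAI client.
# base_url and model are placeholders; adjust them for whatever backend you run.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local server endpoint
    api_key="not-needed-locally",         # most local servers ignore the key
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # whichever model the server has loaded
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Anything that can serve that endpoint (and ideally tools/function calling over the same API) would fit what I’m after.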
I’d be happy about any recommendations!
vLLM unfortunately doesn’t support switching models without a restart.
I would not recommend LocalAI. Their documentation is somewhat lacking, and it’s an all-in-one utility with many moving parts. Those parts also tend to break fairly often.