vAquilla

Deploy local LLMs with smart and auto GPU management

vAquilla is an open-source AI model inference manager. It combines the simplicity of a CLI with the production performance of vLLM and the isolation of Docker, with smart, automated GPU management. It orchestrates everything for you: like an eagle soaring over your infrastructure, it analyzes your GPU state in real time, calculates the right memory ratio, and deploys the vLLM Docker container invisibly and securely.
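The description above does not spell out how the memory ratio is calculated, so here is a minimal sketch of what that step could look like: read free GPU memory (e.g. from `nvidia-smi`), reserve some headroom, and pass the resulting fraction to vLLM's real `--gpu-memory-utilization` flag on the official `vllm/vllm-openai` Docker image. The function names, the 1 GiB headroom, and the 0.95 cap are illustrative assumptions, not vAquilla's actual algorithm.

```python
import shlex

def gpu_memory_utilization(total_mib: float, used_mib: float,
                           headroom_mib: float = 1024) -> float:
    """Fraction of total GPU memory it seems safe to hand to vLLM.

    Assumption: leave `headroom_mib` free for the driver and other
    processes, and never claim more than 95% of the card.
    """
    free = total_mib - used_mib - headroom_mib
    return max(0.0, min(0.95, round(free / total_mib, 2)))

def vllm_docker_command(model: str, ratio: float, port: int = 8000) -> str:
    """Build a `docker run` command for the official vLLM OpenAI-compatible image."""
    return " ".join([
        "docker run --rm --gpus all",
        f"-p {port}:8000",
        "vllm/vllm-openai:latest",
        f"--model {shlex.quote(model)}",
        f"--gpu-memory-utilization {ratio}",
    ])

# Example: a 24 GiB card with 4 GiB already in use
# (in practice the totals would come from `nvidia-smi --query-gpu=...`).
ratio = gpu_memory_utilization(24576, 4096)
print(ratio)                                            # 0.79
print(vllm_docker_command("meta-llama/Llama-3.1-8B-Instruct", ratio))
```

Keeping the ratio computation as a pure function makes it easy to test without a GPU attached; only the final `docker run` needs real hardware.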


