llama.jar

Local AI inference in Minecraft based on llama.cpp

46 Downloads

Fabric, Forge, Library


About this Mod

llama.jar 🦙

llama.jar is a lightweight, high-performance Minecraft mod that runs local Large Language Models (LLMs) directly inside the game using llama.cpp.

It runs completely offline on your own hardware, with no API keys, no subscriptions, and zero external network requests.

Warning: This is an alpha release and has not been fully tested in all environments. Use with caution and please report any bugs or issues you encounter.

Key Features

Offline Local LLM Inference: Run your favorite open-source models (such as LLaMA-3, Qwen-2.5, Phi-3, Mistral, and more) directly inside the Minecraft process.
Hardware Acceleration & GPU Support: Runs on multi-core CPUs and can offload layers to the GPU (CUDA, Metal, OpenCL/CLBlast) for hardware-accelerated generation, including multi-GPU configurations.
Built-in Game Integration: Prompt your loaded models in real time using in-game commands, or listen to interactions.
Modding Library Base: Designed to serve as a library mod. You can depend on llama.jar in your own mods to add AI-driven NPCs, dynamic quest generation, or intelligent automated assistants.
Wide Version Compatibility: Supports both Forge and Fabric platforms across multiple Minecraft versions:
- 1.20.1
- 1.21.1
- 1.21.11

How to Set Up

Download the Mod: Download the correct jar matching your mod loader (Forge or Fabric) and Minecraft version.
Add GGUF Models: Download any GGUF format model (e.g. llama-3-8b-instruct.Q4_K_M.gguf or smaller models like qwen-2.5-1.5b-instruct) and place the .gguf file inside your .minecraft/models/ folder.
Launch Minecraft: The mod will automatically set up the workspace on startup.
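The folder convention from step 2 can be illustrated with a short sketch. It shows one way to discover `.gguf` files in a models directory, similar in spirit to what a `/model list` command needs to do. The class and method names (`GgufModelFinder`, `filterGguf`, `listModels`) are hypothetical and are not part of llama.jar's documented API, and the case-insensitive extension match is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not llama.jar's actual implementation. */
final class GgufModelFinder {
    private GgufModelFinder() {}

    /** Keeps only names ending in .gguf (case-insensitive) and sorts them. */
    static List<String> filterGguf(Collection<String> fileNames) {
        return fileNames.stream()
                .filter(name -> name.toLowerCase(Locale.ROOT).endsWith(".gguf"))
                .sorted()
                .toList();
    }

    /** Lists GGUF files directly inside modelsDir; returns an empty list if the folder is missing. */
    static List<String> listModels(Path modelsDir) throws IOException {
        if (!Files.isDirectory(modelsDir)) {
            return List.of();
        }
        try (Stream<Path> entries = Files.list(modelsDir)) {
            List<String> names = entries
                    .filter(p -> Files.isRegularFile(p))
                    .map(p -> p.getFileName().toString())
                    .toList();
            return filterGguf(names);
        }
    }
}
```

Calling `GgufModelFinder.listModels(Path.of(".minecraft", "models"))` would return the model file names a player could then pass to a load command. Partially downloaded files with another suffix are skipped by the extension check.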

In-Game Commands

Manage models and run prompts on the fly with the following commands (subject to command permissions):

/model list — Lists all available GGUF files in your models/ directory.
/model load <filename> — Loads the selected model into memory.
/model status — Displays information about the currently active model.
/llama <prompt> — Submits a prompt to the loaded model and streams the output directly to chat.
/llama stop — Immediately halts the current text generation.
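How the mod batches streamed tokens into chat messages is not documented here. As a rough sketch of the idea, the class below collects streamed text pieces and releases a chat line whenever a newline arrives or a length limit is reached. `ChatLineBuffer` and its `maxLineLength` parameter are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer that turns streamed text pieces into chat-sized lines. */
final class ChatLineBuffer {
    private final int maxLineLength;
    private final StringBuilder pending = new StringBuilder();

    ChatLineBuffer(int maxLineLength) {
        if (maxLineLength < 1) {
            throw new IllegalArgumentException("maxLineLength must be positive");
        }
        this.maxLineLength = maxLineLength;
    }

    /** Adds one streamed piece and returns the lines that became complete. */
    List<String> append(String piece) {
        List<String> ready = new ArrayList<>();
        for (int i = 0; i < piece.length(); i++) {
            char c = piece.charAt(i);
            if (c == '\n') {
                // A newline closes the current line (which may be empty).
                ready.add(pending.toString());
                pending.setLength(0);
            } else {
                pending.append(c);
                if (pending.length() == maxLineLength) {
                    // Length limit reached: emit the line and start a new one.
                    ready.add(pending.toString());
                    pending.setLength(0);
                }
            }
        }
        return ready;
    }

    /** Returns the unfinished remainder, for example when generation ends or is stopped. */
    String flush() {
        String rest = pending.toString();
        pending.setLength(0);
        return rest;
    }
}
```

A caller would send each returned line to chat as it appears, then call `flush()` once generation finishes or a stop command arrives so the last partial line is not lost.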

Configuration

A common configuration file (llamajar-common.toml or llamajar.json) is generated in your config directory. It exposes the following settings:

modelName: Name of the GGUF model to load automatically on startup (leave empty to load a model manually).
systemPrompt: Default system prompt that sets the model's behavior, personality, or guidelines (leave empty for no system prompt).
gpuLayers: Number of model layers to offload to the GPU (higher values move more of the model into GPU VRAM).
threads: Number of CPU threads to allocate to model inference (defaults to the number of physical CPU cores).
contextLength: Maximum context window size, in tokens, for conversations.
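To make the meaning of these settings concrete, here is a minimal sketch of how a consuming mod might hold and normalize them. The record `InferenceSettings` is hypothetical and does not mirror the mod's real config class. It also assumes that a non-positive `threads` value means "use the default". Note that Java's `Runtime.availableProcessors()` reports logical processors, so it only approximates the physical-core default described above.

```java
/** Hypothetical holder for the five documented settings; not llama.jar's actual config class. */
record InferenceSettings(String modelName,
                         String systemPrompt,
                         int gpuLayers,
                         int threads,
                         int contextLength) {

    InferenceSettings {
        // Treat a missing name or prompt the same as an empty one.
        modelName = modelName == null ? "" : modelName.trim();
        systemPrompt = systemPrompt == null ? "" : systemPrompt;
        // Assumption: zero or negative means "pick a default thread count".
        if (threads <= 0) {
            threads = Runtime.getRuntime().availableProcessors();
        }
        if (contextLength < 1) {
            throw new IllegalArgumentException("contextLength must be at least 1 token");
        }
    }

    /** True when a model should be loaded automatically on startup. */
    boolean autoLoadsModel() {
        return !modelName.isEmpty();
    }

    /** True when a non-blank system prompt is configured. */
    boolean hasSystemPrompt() {
        return !systemPrompt.isBlank();
    }
}
```

For example, `new InferenceSettings("", "", 0, 0, 4096)` describes a CPU-only setup with manual model loading, no system prompt, and a 4096-token context window.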

Developer Integration (Library Mod Usage)

To use llama.jar as a foundation for your own AI-enabled mod, add it to your development environment.

Gradle Setup

repositories {
    mavenLocal() // After publishing llama.jar locally
}

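dependencies {
    // Use only the line that matches your loader, not both.

    // For Fabric development (Loom)
    modImplementation "com.popr4x.llamajar:llamajar-fabric-1.20.1:alpha-1.0"

    // For Forge development (ForgeGradle projects on 1.20.1 usually wrap mod dependencies in fg.deobf(...))
    implementation "com.popr4x.llamajar:llamajar-forge-1.20.1:alpha-1.0"
}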

Accessing llama.cpp context in Java

import com.llamajar.LlamaJar;
import de.kherud.llama.LlamaModel;
import de.kherud.llama.LlamaIterator;

// Check if a model is currently loaded
if (LlamaJar.isModelLoaded()) {
    LlamaModel model = LlamaJar.getModel();
    // Perform custom inferences, register custom listeners, or manage model state
}
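Text generation can take seconds, so a consuming mod will probably want to keep it off the game thread. The sketch below shows one way to do that with a single background worker. `AsyncInference` is a hypothetical class, and the `Function<String, String>` it takes is a stand-in for whatever call your mod makes on the loaded `LlamaModel`; the mod's own threading model is not documented here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical wrapper that runs prompt-to-text calls on one background thread. */
final class AsyncInference implements AutoCloseable {
    // One worker thread, so prompts run one at a time and never on the game thread.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "llama-inference");
        thread.setDaemon(true); // Do not keep the JVM alive on shutdown.
        return thread;
    });

    // Stand-in for the real model call (assumption: prompt in, full text out).
    private final Function<String, String> generate;

    AsyncInference(Function<String, String> generate) {
        this.generate = generate;
    }

    /** Queues a prompt and returns a future that completes with the generated text. */
    CompletableFuture<String> submit(String prompt) {
        return CompletableFuture.supplyAsync(() -> generate.apply(prompt), worker);
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

In a real mod, the function would wrap the model obtained from `LlamaJar.getModel()` after an `isModelLoaded()` check. The completed result should then be handed back to the game thread before it touches any world or player state.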

License

This project is licensed under the AGPLv3. Under the hood, it uses llama.cpp and the Java JNI bindings from de.kherud:llama, licensed under Apache-2.0.

Available Versions

- llama.jar alpha-1.0 (alpha), MC 1.21.11, Forge, May 19, 2026
- llama.jar alpha-1.0 (alpha), MC 1.21.1, Forge, May 19, 2026
- llama.jar alpha-1.0 (alpha), MC 1.20.1, Forge, May 19, 2026
- llama.jar alpha-1.0 (alpha), MC 1.21.11, Fabric, May 19, 2026
- llama.jar alpha-1.0 (alpha), MC 1.21.1, Fabric, May 19, 2026

How to Install llama.jar on Your Server

Order Server

Order a Minecraft Java server with at least 3 GB RAM (4 GB recommended).

Set Fabric Loader

In the panel under "Egg", select the Fabric loader and the matching Minecraft version (1.21.11).

Install Mod

Open the mod browser in the dashboard and search for "llama.jar". Click "Install" – done! Alternatively, upload the .jar via SFTP to the /mods folder.

Compatibility

Mod Loaders

Fabric, Forge

Minecraft Versions

1.21.11, 1.21.1, 1.20.1

Server-side

✓ Required

Recommended RAM

4 GB (min. 3 GB)

Frequently Asked Questions

llama.jar server crashes on startup – what to do?

The most common causes are a wrong Fabric version or insufficient RAM. Check the server log (latest.log) for "OutOfMemoryError" or "Mixin" errors. With Mado Hosting: ensure at least 3 GB RAM is allocated and the loader matches the mod version (1.21.11). You can switch loaders with one click in the panel.

Is llama.jar compatible with Fabric and Forge?

llama.jar officially supports Fabric and Forge on Minecraft 1.21.11, 1.21.1, and 1.20.1. Note: Forge and Fabric mods are NOT cross-compatible, so pick one loader and stick with it. The Mado dashboard automatically detects incompatible loader combinations.

Server lagging with llama.jar – how to optimize performance?

Recommended RAM: 4 GB (per 8 players). Use /spark profiler (requires the spark mod) to check whether llama.jar consumes the most tick time. Common fixes: reduce the server view-distance to 8-10, or install "performant" or "starlight" as supplementary mods on Forge. With Mado Hosting, your server runs on NVMe SSDs with dedicated CPU cores for minimal latency.