Ever wondered if you could run an AI model locally without paying for cloud GPUs? I recently hosted a language model on my Raspberry Pi.
Why Do This?
- A great way to learn about LLMs and APIs
- An easy and fun mini project
- No monthly fees
- Full control over your data
- Runs entirely offline
What I Used
Hardware: Raspberry Pi 5 (8GB RAM) running Raspberry Pi OS.
Software:
- Hugging Face (hosts the pre-trained model; you’ll need an account)
- Python (tested with Python 3.11)
- FastAPI (for building a simple API)
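For reference, the dependencies boil down to a handful of packages. This is an illustrative, unpinned sketch; the repo’s requirements.txt is the source of truth:
fastapi        # web framework for the API
uvicorn        # ASGI server that runs the app
transformers   # Hugging Face pipeline for summarization
torch          # backend the transformers pipeline runs on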
Step-by-Step Guide
1. Update Your Raspberry Pi. First, make sure your Pi is fully updated:
sudo apt update
sudo apt full-upgrade
2. Clone the Project Repo. I’ve uploaded all the code to GitHub so you can follow along easily:
git clone https://github.com/franciscosanchezoliver/self_hosting_llm_in_raspberry.git
cd self_hosting_llm_in_raspberry
3. Create a Virtual Environment. We’ll isolate the project’s dependencies from the system’s Python packages:
python -m venv .venv
source .venv/bin/activate
4. Install Python Dependencies. Install all required libraries from requirements.txt:
pip install -r requirements.txt
5. Run the API. Start the FastAPI server. The first time you run it, the model will be downloaded from Hugging Face and cached locally:
uvicorn main:app --host 0.0.0.0 --port 9999
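That first download is the slow part: facebook/bart-large-cnn weighs in at roughly 1.6 GB. If you’d rather warm the cache before starting the server, this optional one-off Python snippet mirrors what main.py does at startup:
# Download and cache the summarization model ahead of time
from transformers import pipeline

pipeline(task="summarization", model="facebook/bart-large-cnn")
Once the server is up, FastAPI also serves interactive docs at http://<your-pi-address>:9999/docs, which is a convenient way to try the endpoint from a browser.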

6. Test the Summarizer API. Let’s send a request to the API and get a summarized response:
curl -X POST http://localhost:9999/summarize \
-H "Content-Type: application/json" \
-d '{"text": "The Industrial Revolution, which began in the 18th century, was a period of major industrialization and innovation that fundamentally changed the way goods were produced and societies were organized. Originating in Britain, it spread rapidly across Europe and North America. Key innovations included the steam engine, mechanized textile production, and improved methods of iron smelting. These changes led to the growth of cities, the rise of factory-based economies, and significant shifts in population from rural to urban areas. However, the revolution also brought social challenges, such as poor working conditions, child labor, and environmental degradation."}'

Under the Hood: How It Works
Here’s a high-level overview of what the code does (a quick client example follows):
- Creates an API endpoint using FastAPI.
- Loads the summarization model from Hugging Face.
- Passes user input from the API to the model.
- Returns a summary as JSON.
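To see that flow from the client side, here’s a minimal Python client. It uses the requests library (an extra pip install, not part of the repo’s requirements) and assumes the server from step 5 is running locally:
import requests

# Send text to the summarizer endpoint and print the result
response = requests.post(
    "http://localhost:9999/summarize",
    json={"text": "Long text you want summarized goes here..."},
)
response.raise_for_status()
print(response.json()["summary"])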

Full Source Code
Here’s the core logic (in case you didn’t clone the repo):
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import pipeline

# Load the summarization pipeline once, when the module is imported
summarizer = pipeline(task="summarization", model="facebook/bart-large-cnn")

app = FastAPI()

class TextToSummarize(BaseModel):
    text: str
    max_length: int = 50

@app.post("/summarize")
def summarize_text(data: TextToSummarize):
    """
    Summarizes the given text using a pre-trained model.

    Parameters:
    - data: TextToSummarize object containing the text to summarize and max_length.

    Returns:
    - A dictionary containing the summary.
    """
    try:
        summary = summarizer(data.text, max_length=data.max_length)
        return {"summary": summary[0]["summary_text"]}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
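Two details worth noting. First, the pipeline is created at import time, so the model is loaded once when uvicorn starts rather than on every request. Second, TextToSummarize gives max_length a default of 50, so the field is optional in requests, but you can override it to control the summary length:
curl -X POST http://localhost:9999/summarize \
-H "Content-Type: application/json" \
-d '{"text": "Paste a longer passage here to summarize.", "max_length": 30}'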
Final Thoughts
Running an AI model locally is not only possible but practical, educational, and fun. Whether you’re interested in privacy, cost saving, or just playing with LLMs, hosting your own summarization API on a Raspberry Pi is a great place to start.
Let’s Connect!
If you liked this post or built something similar, feel free to comment below or reach out!