🏠 Getting Started with Local LLMs [02] Choosing Models
Local LLM Setup Series
- [01] Learning the Basics
- [02] Choosing Models
- [03] Installing Ollama and Checking It Works
- [04] Installing Open WebUI and Using It Across the LAN
👀 Table of Contents
- 🍓 Models for Raspberry Pi Connection Experiments
- 📚 Models for ZINE Writing and Structure Support
- 🤔 Comparing the Tendencies of Each AI’s Suggestions
- 🎯 Narrowing Down the First Models to Try
- ✍️ Summary
🍓 Models for Raspberry Pi Connection Experiments
As a first step in trying local LLMs, I asked several generative AIs to suggest models suited for
“connecting with a Raspberry Pi and checking whether it can respond.”
My PC setup is an RTX 4070 SUPER 12GB / 64GB RAM / Ryzen 7 7800X3D.
With this configuration, the realistic starting point seems to be models in roughly the 4B to 9B parameter range.
Models Recommended by ChatGPT
An RTX 4070 SUPER 12GB / 64GB RAM / 7800X3D
is already a solid machine to use as a local LLM host for experimentation.
That said, the realistic target range is mainly 4B to 9B. If you aim too high and jump straight to large models, the experiments become harder to move forward.
| No. | Model Name | Notes |
|---|---|---|
| 1 | Qwen3.5 4B | Good for checking the basic setup first. Fast and unlikely to fail. |
| 2 | Qwen3.5 9B | Main candidate. Good for personality tuning, long conversations, and design discussion. |
| 3 | Gemma 4 E4B | Useful for comparing a different model family and seeing differences in tone and personality. |
| 4 | Qwen3.5 27B | A heavier comparison test, mainly to see where the boundary is for practical daily use. |
Models Recommended by Copilot
RTX 4070 SUPER (12GB) + 7800X3D + 64GB RAM is a well-balanced setup that comfortably covers lightweight to mid-sized local LLMs.
| No. | Model Name | Notes |
|---|---|---|
| 1 | Qwen2.5 7B | Stable personality and strong Japanese performance. |
| 2 | Qwen2.5 3B | Good for high-speed experiments, especially Raspberry Pi connection tests. |
| 3 | Phi-3.5 Mini / Phi-3 Medium | Adds more depth to the worldbuilding side. |
| 4 | Llama 3.1 8B | A candidate for future expansion, especially for longer text and reasoning support. |
Models Recommended by Gemini
Since the GPU has 12GB of VRAM, the best experience will likely come from fully loading models in the 8B to 10B class into VRAM and running them at high speed.
| No. | Model Name | Notes |
|---|---|---|
| 1 | Llama-3.1-8B (GGUF) | A high-performance standard model. Also good at Japanese. |
| 2 | Gemma-2-9B-it | A lightweight model developed by Google. |
Gemini's explanations for its first and second choices were also very practical.
First choice: Llama-3.1-8B-Instruct (GGUF / Q8_0 or Q6_K)
- Reason: With 12GB of VRAM, even a higher-bit quantization should still fit comfortably.
- Size: Around 6GB to 8GB
Second choice: Gemma-2-9B-it (GGUF / Q6_K)
- Reason: Its Japanese tone tends to feel softer.
- Size: Around 8GB
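The size figures above can be sanity-checked with a rough calculation: parameter count times bits per weight, divided by 8. The sketch below is only an estimate under stated assumptions. The bits-per-weight values are approximate figures for these GGUF quantization types, the parameter counts are rounded, and `overhead_gb` is a hypothetical allowance for the KV cache and runtime buffers, not a measured value.

```python
# Rough size estimate for quantized model weights (a sketch, not a measurement).
# Approximate bits per weight for common GGUF quantization types (assumption).
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
}


def estimate_weights_gb(params_billion: float, quant: str) -> float:
    """Return the approximate weight size in GB (10^9 bytes)."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    # billions of parameters * bytes per parameter = gigabytes
    return params_billion * bits / 8


def fits_in_vram(params_billion: float, quant: str,
                 vram_gb: float = 12.0, overhead_gb: float = 1.5) -> bool:
    """True if the weights plus an assumed overhead fit in the given VRAM."""
    return estimate_weights_gb(params_billion, quant) + overhead_gb <= vram_gb


# Rounded parameter counts (assumption) for the two models discussed above.
candidates = [
    ("Llama-3.1-8B", 8.0, "Q8_0"),
    ("Llama-3.1-8B", 8.0, "Q6_K"),
    ("Gemma-2-9B", 9.2, "Q6_K"),
]

for name, params, quant in candidates:
    size = estimate_weights_gb(params, quant)
    verdict = "fits" if fits_in_vram(params, quant) else "does not fit"
    print(f"{name} {quant}: ~{size:.1f} GB weights, {verdict} in 12 GB")
```

Longer context windows increase the KV cache, so the real headroom depends on how the model is run.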
A More Numerical View of What Seems Likely to Run Well
Copilot also summarized the practical range in table form. The figures are Copilot's estimates rather than measurements on this machine.
| Model | Recommended Quantization | VRAM Usage | On RTX 4070 SUPER | Response Speed from RasPi |
|---|---|---|---|---|
| Qwen2.5 3B | Q4_K_M | 3–4GB | Plenty of room | Very fast |
| Qwen2.5 7B | Q4_K_M | 6–7GB | Plenty of room | Fast |
| Phi-3.5 Mini | Q4_K_M | 3GB | Plenty of room | Fast |
| Phi-3 Medium | Q4_K_M | 6GB | Plenty of room | Medium |
| Llama 3.1 8B | Q4_K_M | 8–9GB | Tight but workable | Medium |
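Labels like "Very fast" and "Medium" in the table can later be replaced with measured numbers. As a minimal sketch, assuming the models are served through Ollama (installed in the next article), a non-streaming response from its `/api/generate` endpoint includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds), which give a tokens-per-second figure. The sample values below are made up for illustration.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed computed from the timing fields of an Ollama response."""
    eval_count = response["eval_count"]            # number of generated tokens
    eval_duration_ns = response["eval_duration"]   # generation time in nanoseconds
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


# Made-up sample values: 120 tokens generated in 3 seconds.
sample = {"eval_count": 120, "eval_duration": 3_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")
```

This measures generation speed on the PC only. The response time seen from the Raspberry Pi also includes network latency and model load time, so a wall-clock measurement on the Pi side would complement it.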
📚 Models for ZINE Writing and Structure Support
Next, I also asked about models suited for supporting ZINE writing and structure.
What I want from a local LLM is more than casual chat.
I also want to see whether it can help with drafting, chapter structure, organizing key points, and adjusting sentence endings and tone.
Models Recommended by ChatGPT
The conclusion first: for ZINE writing and structure, mid-sized instruct models that organize long text naturally are more likely to be useful than oversized, reasoning-heavy models.
| Model Name | Notes |
|---|---|
| Qwen3 14B | Good for chapter structure, heading organization, and turning ideas into bullet points. |
| Gemma 3 12B | Good at reading long text, restructuring it, and unifying tone. |
| LLM-jp 8B | Useful for comparing Japanese phrasing. |
Models Recommended by Copilot
To put it simply: local LLMs that really help with the “structure” of ZINE writing do exist.
| Model Name | Notes |
|---|---|
| Qwen2.5 14B | Strong at outlining, organizing key points, and structuring content. |
| Llama 3.1 8B | Good at making writing easier to read. |
| Gemma 9B | Good at adding a ZINE-like sense of warmth and tone. |
Models Recommended by Gemini
To make good use of this PC setup (RTX 4070 SUPER / 12GB VRAM) locally, several models are strong at brainstorming structure and revising drafts.
| Model Name | Notes |
|---|---|
| Llama-3-Swallow-8B-v0.1 (or Instruct) | Good balance of Japanese ability and structural strength. |
| Gemma-2-9B-IT | Also balanced in Japanese ability and structural support. |
| Command R | Stronger for context understanding and long-text consistency. |
| Mistral-Nemo-12B-Instruct-v1 | A good balance of speed and lightness. |
🤔 Comparing the Tendencies of Each AI’s Suggestions
Looking at them side by side, each generative AI had its own distinct tendency.
- Copilot tended to give practical suggestions separated clearly by use case
- ChatGPT tended to suggest candidates with an overall balance in mind
- Gemini tended to narrow things down based on what best fits the PC specs
For the Raspberry Pi connection experiment, the suggestions broadly followed the same flow:
- First, check the connection with a lightweight model
- Next, evaluate conversation quality with something in the 7B to 9B range
- Then, if needed, compare with heavier models
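The first step of that flow, checking the connection with a lightweight model, could look like the following when run on the Raspberry Pi. This is a minimal sketch assuming the PC runs Ollama and is reachable over the LAN. The address `192.168.1.10`, the helper names, and the Ollama-style model tags are placeholders for illustration, not values from this setup.

```python
import json
import urllib.request

# Hypothetical LAN address of the PC; 11434 is Ollama's default port.
PC_BASE_URL = "http://192.168.1.10:11434"

# Candidates ordered lightest first (Ollama-style tags, assumed for illustration).
STAGES = ["qwen2.5:3b", "qwen2.5:7b", "llama3.1:8b"]


def build_generate_request(model: str, prompt: str,
                           base_url: str = PC_BASE_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, base_url: str = PC_BASE_URL,
        timeout: float = 60.0) -> str:
    """Send one prompt to the PC and return the generated text."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

Calling `ask(STAGES[0], "Hello")` from the Pi would then confirm that the lightest model responds before moving on to the 7B to 9B candidates. Ollama listens only on localhost by default, so the PC side has to be configured to accept LAN connections first.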
For ZINE writing support, on the other hand, the division seemed closer to each phase of the work:
- Models that are good for drafting
- Models that are good at organizing long text
- Models that are good at adjusting sentence endings and tone
So it looks like the strengths differ depending on the production step.
🎯 Narrowing Down the First Models to Try
Looking over all of this, these seem like the best candidates to start with.
1. First candidates for Raspberry Pi connection experiments
- Qwen2.5 3B
- Qwen2.5 7B
- Llama 3.1 8B
These seem to have a good balance of lightness, response speed, and stable Japanese output,
which makes them well suited to the purpose of
“getting it running,” “connecting it,” and “seeing how it responds.”
2. First candidates for ZINE writing support
- Llama 3.1 8B / Llama-3-Swallow-8B family
- Gemma 9B / 12B family
- Qwen 14B family
For this purpose, it seems better not to put every model into real use at once, but to switch models by production phase:
- Drafting, rough text layouts, and bullet points
- Turning ideas into fuller prose
- Adjusting sentence endings and tone
If I compare that to background art, programming, or video production, it might look something like this:
| ZINE | Background Art | Programming | Video |
|---|---|---|---|
| Drafts, rough text layouts, bullet points | Color planning | Front-end design / initial planning | Storyboard |
| Prose expansion and long-form writing | Paint-in / build-up | Implementation and back-end design | Video storyboard / rough edit |
| Adjusting endings and tone | Final polish | Cleanup and refactoring | Final compositing |
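Switching models by production phase can be kept explicit with a small lookup table. The sketch below is one possible assignment, loosely based on the notes in the recommendation tables above. The phase names, the function name, and the Ollama-style model tags are assumptions for illustration, and the pairing would need adjusting after real trials.

```python
# One possible phase-to-model assignment (an assumption, not a tested result).
PHASE_MODELS = {
    "draft": "qwen2.5:14b",    # outlines, rough layouts, bullet points
    "expand": "llama3.1:8b",   # turning ideas into fuller prose
    "polish": "gemma2:9b",     # adjusting sentence endings and tone
}


def model_for_phase(phase: str) -> str:
    """Return the model tag assigned to a production phase."""
    try:
        return PHASE_MODELS[phase]
    except KeyError:
        valid = ", ".join(sorted(PHASE_MODELS))
        raise ValueError(f"unknown phase {phase!r}; expected one of: {valid}") from None


for phase in ("draft", "expand", "polish"):
    print(f"{phase}: {model_for_phase(phase)}")
```

Keeping the mapping in one place makes it easy to swap a model for a single phase without touching the rest of the workflow.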
✍️ Summary
What became clear this time is that choosing local LLM models gets easier once I pick
an entry-point model for each purpose,
rather than searching for a single “strongest” model.
At least in the beginning, it seems more practical to separate them into:
- Lightweight models for connection experiments
- Mid-sized models for writing and structure support
That way, the direction of trial and error becomes much easier to see.
Next, I’ll actually install Ollama and start building a local LLM environment by running a lightweight model first.