🏠 Getting Started with Local LLMs [02] Choosing Models

P-chan
LLM DevLog PochomLab

Local LLM Setup Series


🍓 Models for Raspberry Pi Connection Experiments

As a first step in trying local LLMs, I asked several generative AIs to suggest models suited for
“connecting with a Raspberry Pi and checking whether it can respond.”

My PC setup is an RTX 4070 SUPER 12GB / 64GB RAM / Ryzen 7 7800X3D.
With this configuration, it seems most realistic to begin with models in roughly the 4B to 9B range.

With an RTX 4070 SUPER 12GB / 64GB RAM / 7800X3D,
this is already a solid machine to use as a local LLM host for experimentation.
That said, the realistic target range is mainly 4B to 9B. If you aim too high and jump straight to large models, the experiments become harder to move forward.

| No. | Model Name | Notes |
| --- | --- | --- |
| 1 | Qwen3.5 4B | Good for checking the basic setup first. Fast and unlikely to fail. |
| 2 | Qwen3.5 9B | Main candidate. Good for personality tuning, long conversations, and design discussion. |
| 3 | Gemma 4 E4B | Useful for comparing a different model family and seeing differences in tone and personality. |
| 4 | Qwen3.5 27B | A heavier comparison test, mainly to see where the boundary is for practical daily use. |

RTX 4070 SUPER (12GB) + 7800X3D + 64GB RAM is a well-balanced setup that comfortably covers lightweight to mid-sized local LLMs.

| No. | Model Name | Notes |
| --- | --- | --- |
| 1 | Qwen2.5 7B | Stable personality and strong Japanese performance. |
| 2 | Qwen2.5 3B | Good for high-speed experiments, especially Raspberry Pi connection tests. |
| 3 | Phi-3.5 Mini / Medium | Adds more depth to the worldbuilding side. |
| 4 | Llama 3.1 8B | A candidate for future expansion, especially for longer text and reasoning support. |

Since the GPU has 12GB of VRAM, the best experience will likely come from fully loading models in the 8B to 10B class into VRAM and running them at high speed.

| No. | Model Name | Notes |
| --- | --- | --- |
| 1 | Llama-3.1-8B (GGUF) | A high-performance standard model. Also good at Japanese. |
| 2 | Gemma-2-9B-it | A lightweight model developed by Google. |

The explanations for the first and second choices were also very practical.

  • First choice: Llama-3.1-8B-Instruct (GGUF / Q8_0 or Q6_K)

    • Reason: With 12GB of VRAM, even a higher-bit quantization should still fit comfortably.
    • Size: Around 6GB to 8GB
  • Second choice: Gemma-2-9B-it (GGUF / Q6_K)

    • Reason: Its Japanese tone tends to feel softer.
    • Size: Around 8GB
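The size figures above can be sanity-checked with simple arithmetic: weight size is roughly parameter count times bits per weight, divided by 8. The sketch below is only a rough estimate. The bits-per-weight values are approximate averages I am assuming for each GGUF quantization type, and the 1.5GB overhead for context and runtime buffers is a placeholder, not a measured number.

```python
# Approximate bits per weight for common GGUF quantization types.
# These are rough assumed averages, not exact values for any specific model file.
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
}


def estimate_weight_gb(params_billion: float, quant: str) -> float:
    """Estimate the size of the quantized weights in GB (10^9 bytes)."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return params_billion * bits / 8


def fits_in_vram(params_billion: float, quant: str,
                 vram_gb: float = 12.0, overhead_gb: float = 1.5) -> bool:
    """Check whether the weights plus an assumed overhead fit in VRAM."""
    return estimate_weight_gb(params_billion, quant) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for quant in ("Q6_K", "Q8_0"):
        size = estimate_weight_gb(8.0, quant)
        print(f"8B at {quant}: ~{size:.1f} GB, fits in 12GB: {fits_in_vram(8.0, quant)}")
```

For an 8B model this gives roughly 6.6GB at Q6_K and 8.5GB at Q8_0, which is in the same neighborhood as the sizes quoted in the suggestion. Actual usage also depends on context length, so treat the result as a first filter only.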

A More Numerical View of What Seems Likely to Run Well

Copilot also summarized the practical range in table form.

| Model | Recommended Quantization | VRAM Usage | On RTX 4070 SUPER | Response Speed from RasPi |
| --- | --- | --- | --- | --- |
| Qwen2.5 3B | Q4_K_M | 3–4GB | Plenty of room | Very fast |
| Qwen2.5 7B | Q4_K_M | 6–7GB | Plenty of room | Fast |
| Phi-3.5 Mini | Q4_K_M | 3GB | Plenty of room | Fast |
| Phi-3 Medium | Q4_K_M | 6GB | Plenty of room | Medium |
| Llama 3.1 8B | Q4_K_M | 8–9GB | Tight but workable | Medium |
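To make the "plenty of room" versus "tight" labels a little more concrete, here is a small sketch that computes the VRAM left over on a 12GB card for each row. It uses the upper end of each VRAM range exactly as Copilot reported it. I have not measured these figures myself, and the helper names are my own.

```python
VRAM_TOTAL_GB = 12  # RTX 4070 SUPER

# (model, quantization, upper end of the reported VRAM range in GB)
TABLE_ROWS = [
    ("Qwen2.5 3B", "Q4_K_M", 4),
    ("Qwen2.5 7B", "Q4_K_M", 7),
    ("Phi-3.5 Mini", "Q4_K_M", 3),
    ("Phi-3 Medium", "Q4_K_M", 6),
    ("Llama 3.1 8B", "Q4_K_M", 9),
]


def headroom_gb(vram_used_gb: float, vram_total_gb: float = VRAM_TOTAL_GB) -> float:
    """VRAM left over after loading the model, in GB."""
    return vram_total_gb - vram_used_gb


def rank_by_headroom(rows):
    """Sort rows so the model leaving the most free VRAM comes first."""
    return sorted(rows, key=lambda row: headroom_gb(row[2]), reverse=True)


if __name__ == "__main__":
    for model, quant, used in rank_by_headroom(TABLE_ROWS):
        print(f"{model} ({quant}): {headroom_gb(used)} GB free")
```

The leftover VRAM matters because longer conversations need more memory for context, so a model that only just fits may slow down or spill out of VRAM as the conversation grows.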

📚 Models for ZINE Writing and Structure Support

Next, I also asked about models suited for supporting ZINE writing and structure.

What I want from a local LLM is not just casual chat.
I also wanted to see whether it could help with drafting, chapter structure, organizing key points, and adjusting sentence endings and tone.

To put the conclusion first: for ZINE writing and structure, mid-sized instruct models that can naturally organize long text are more likely to be useful than oversized reasoning-heavy models.

| Model Name | Notes |
| --- | --- |
| Qwen3 14B | Good for chapter structure, heading organization, and turning ideas into bullet points. |
| Gemma 3 12B | Good at reading long text, restructuring it, and unifying tone. |
| LLM-jp 8B | Useful for comparing Japanese phrasing. |

To put it simply: local LLMs that really help with the “structure” of ZINE writing do exist.

| Model Name | Notes |
| --- | --- |
| Qwen2.5 14B | Strong at outlining, organizing key points, and structuring content. |
| Llama 3.1 8B | Good at making writing easier to read. |
| Gemma 9B | Good at adding a ZINE-like sense of warmth and tone. |

If you want to make use of this PC setup locally (RTX 4070 SUPER / 12GB VRAM), there are several models that are strong for brainstorming structure and revising drafts.

| Model Name | Notes |
| --- | --- |
| Llama-3-Swallow-8B-v0.1 (or Instruct) | Good balance of Japanese ability and structural strength. |
| Gemma-2-9B-IT | Also balanced in Japanese ability and structural support. |
| Command R | Stronger for context understanding and long-text consistency. |
| Mistral-Nemo-12B-Instruct-v1 | A good balance of speed and lightness. |

🤔 Comparing the Tendencies of Each AI’s Suggestions

Looking at them side by side, each generative AI had its own distinct tendency.

  • Copilot tended to give practical suggestions separated clearly by use case
  • ChatGPT tended to suggest candidates with an overall balance in mind
  • Gemini tended to narrow things down based on what best fits the PC specs

For the Raspberry Pi connection experiment, the suggestions broadly followed the same flow:

  • First, check the connection with a lightweight model
  • Next, evaluate conversation quality with something in the 7B to 9B range
  • Then, if needed, compare with heavier models
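The first step, checking the connection with a lightweight model, could look something like the sketch below when run on the Raspberry Pi. It assumes the PC is running Ollama on its default port 11434 and has been configured to accept connections from the local network (by default Ollama listens only on localhost). The IP address and model tag in the usage note are placeholders, not values from my actual setup.

```python
import json
import time
import urllib.request


def build_generate_request(host: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"http://{host}:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_connection(host: str, model: str, prompt: str = "Hello",
                     timeout: float = 60.0):
    """Send one prompt to the host and return (reply_text, elapsed_seconds)."""
    request = build_generate_request(host, model, prompt)
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    elapsed = time.perf_counter() - start
    return body.get("response", ""), elapsed
```

Calling `check_connection("192.168.1.50", "qwen2.5:3b")` with your own host address and an installed model tag returns the reply text and the round-trip time. Running the same call against a 7B to 9B model afterward gives a simple way to compare response speed between the steps.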

For ZINE writing support, on the other hand, the division seemed closer to each phase of the work:

  • Models that are good for drafting
  • Models that are good at organizing long text
  • Models that are good at adjusting sentence endings and tone

So it looks like the strengths differ depending on the production step.


🎯 Narrowing Down the First Models to Try

Looking over all of this, these seem like the best candidates to start with.

1. First candidates for Raspberry Pi connection experiments

  • Qwen2.5 3B
  • Qwen2.5 7B
  • Llama 3.1 8B

These seem to have a good balance of lightness, response speed, and stable Japanese output,
which makes them well suited to the purpose of
“getting it running,” “connecting it,” and “seeing how it responds.”

2. First candidates for ZINE writing support

  • Llama 3.1 8B / Llama-3-Swallow-8B family
  • Gemma 9B / 12B family
  • Qwen 14B family

For this purpose, it seems better not to put every model into real use at once, but to switch models depending on the production phase:

  • Drafting, rough text layouts, and bullet points
  • Turning ideas into fuller prose
  • Adjusting sentence endings and tone

If I compare that to background art, programming, or video production, it might look something like this:

| ZINE | Background Art | Programming | Video |
| --- | --- | --- | --- |
| Drafts, rough text layouts, bullet points | Color planning | Front-end design / initial planning | Storyboard |
| Prose expansion and long-form writing | Paint-in / build-up | Implementation and back-end design | Video storyboard / rough edit |
| Adjusting endings and tone | Final polish | Cleanup and refactoring | Final compositing |
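Switching models by production phase can be written down as a small lookup table. The sketch below is one possible assignment, loosely following the notes the AIs gave for each family (Qwen for outlining, Llama for readable prose, Gemma for tone). The model tags are Ollama-style names I am assuming for illustration. I have not settled on this mapping yet.

```python
from enum import Enum


class Phase(Enum):
    DRAFT = "drafting, rough text layouts, and bullet points"
    PROSE = "turning ideas into fuller prose"
    TONE = "adjusting sentence endings and tone"


# Hypothetical phase-to-model assignment; swap the tags for whatever is installed.
MODEL_FOR_PHASE = {
    Phase.DRAFT: "qwen2.5:14b",
    Phase.PROSE: "llama3.1:8b",
    Phase.TONE: "gemma2:9b",
}


def model_for_phase(phase: Phase) -> str:
    """Return the model tag assigned to a production phase."""
    return MODEL_FOR_PHASE[phase]


if __name__ == "__main__":
    for phase in Phase:
        print(f"{phase.value}: {model_for_phase(phase)}")
```

Keeping the mapping in one place makes it easy to swap a model for a single phase after comparing results, without touching the rest of the workflow.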

✍️ Summary

What became clear this time is that choosing models for local LLMs seems easier if I decide on
entry-point models for each purpose,
rather than trying to find one single “strongest” model.

At least in the beginning, it seems more practical to separate them into:

  • Lightweight models for connection experiments
  • Mid-sized models for writing and structure support

That way, the direction of trial and error becomes much easier to see.

Next, I’ll actually install Ollama and start building a local LLM environment by running a lightweight model first.