Parameters — Local AI Guide

What is a parameter?

A neural network is a large graph of simple math operations. At each connection in that graph sits a number, called a weight, that controls how strongly one node influences another. These weights, plus a smaller number of related values such as biases, are the parameters. During training, billions of examples are fed through the network. After each step the weights are nudged slightly so that the network's predictions move closer to the desired outputs.
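The "adjust slightly each time" loop can be sketched with a single weight. This is a minimal illustration that assumes plain gradient descent on mean squared error. Real LLM training runs optimizers such as Adam over billions of weights at once. The toy data (the rule y = 3x) and the `learning_rate` value are hypothetical choices made for this example.

```python
# Hypothetical toy data: the "desired outputs" follow the rule y = 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0               # the single parameter, starting from an arbitrary value
learning_rate = 0.01  # hypothetical step size

for step in range(200):
    # Gradient of the mean squared error, mean((w*x - y)^2), with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Nudge the weight a small amount against the gradient.
    w -= learning_rate * grad

print(f"learned weight: {w:.4f}")
```

Each pass shrinks the gap between `w` and 3.0 by a constant factor, so after 200 steps the learned weight is very close to 3. When training ends, `w` is "frozen" in exactly the sense described below.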

After training is done, the weights are frozen. The model file you download is essentially just a very large list of these frozen numbers.
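To make "a very large list of frozen numbers" concrete, here is a toy sketch that writes four hypothetical weights to disk as raw 16-bit floats using Python's standard `struct` module, where format character `e` is IEEE 754 half precision. Real formats such as GGUF or safetensors also store a header with tensor names, shapes and metadata. This file holds only the raw numbers.

```python
import os
import struct
import tempfile

# Hypothetical "trained" weights; a real model has billions of these.
weights = [0.12, -1.5, 0.003, 2.25]

# Pack each weight as a little-endian half-precision float (2 bytes each).
blob = struct.pack(f"<{len(weights)}e", *weights)

path = os.path.join(tempfile.mkdtemp(), "toy_model.bin")
with open(path, "wb") as f:
    f.write(blob)

print(os.path.getsize(path))  # 4 weights * 2 bytes = 8 bytes

# Reading the file back simply recovers the list of numbers
# (values that FP16 cannot represent exactly come back slightly rounded).
with open(path, "rb") as f:
    data = f.read()
restored = struct.unpack(f"<{len(data) // 2}e", data)
print(restored)
```

The file size is just the number of parameters times 2 bytes. That same arithmetic drives the VRAM figures in the table below.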

Analogy: Think of parameters as the volume knobs on a mixing board with billions of channels. Training gradually turns each knob to the right setting. Once trained, the board is locked in — and the "sound" it produces is the model's intelligence.

Common model sizes and what they mean

Size	Example models	FP16 weights (≈ VRAM needed)	Use case
1–4B	Phi-3 Mini 3.8B, Gemma 2 2B	~2–8 GB	On-device, fast, light tasks
7–8B	Llama 3.1 8B, Mistral 7B	~14–16 GB	Great all-rounder for everyday use
13B	Llama 2 13B, Vicuna 13B	~26 GB	Better reasoning, still efficient
27–34B	Gemma 2 27B, Qwen 2.5 32B	~54–68 GB	Coding, long documents, nuanced tasks
70B	Llama 3.3 70B	~140 GB	Near-frontier quality locally
405B	Llama 3.1 405B	~810 GB	Multi-node or cloud territory
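The FP16 column follows from one multiplication: billions of parameters × 2 bytes each gives billions of bytes, or decimal GB. The sketch below reproduces several rows from the nominal sizes in the model names. The helper `fp16_weight_gb` is a hypothetical name. The results cover weights only. The context cache and runtime overhead add to these figures in practice.

```python
BYTES_PER_PARAM_FP16 = 2  # a 16-bit float occupies 2 bytes


def fp16_weight_gb(params_billions: float) -> float:
    """Approximate FP16 weight size in decimal GB (weights only)."""
    # billions of parameters * bytes per parameter = billions of bytes = GB
    return params_billions * BYTES_PER_PARAM_FP16


rows = [
    ("Phi-3 Mini", 3.8),
    ("Llama 3.1 8B", 8),
    ("Llama 2 13B", 13),
    ("Qwen 2.5 32B", 32),
    ("Llama 3.3 70B", 70),
    ("Llama 3.1 405B", 405),
]
for name, size_b in rows:
    print(f"{name:>15}: ~{fp16_weight_gb(size_b):.0f} GB")
```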

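The good news: quantization (the next topic) stores each weight in fewer bits. That cuts these numbers by about 2× at 8-bit, 4× at 4-bit, and up to 8× at 2-bit. A 70B model drops to ~35 GB at 4-bit, or roughly 20 GB at 2-bit. At that aggressive end, it fits on a single high-end consumer GPU, at some cost in quality.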
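The reduction factor is 16 divided by the new bit width. This sketch applies it to a 70B model. The helper name `weight_gb` is hypothetical, and the figures are nominal. Real quantized formats store extra per-block scale values, so an "4-bit" file typically averages somewhat more than 4 bits per weight and comes out a little larger than shown.

```python
FP16_BITS = 16


def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Nominal weights-only size in decimal GB (ignores per-block scale metadata)."""
    return params_billions * bits_per_weight / 8


for bits in (16, 8, 4, 2):
    size = weight_gb(70, bits)
    print(f"{bits:>2}-bit: {size:6.1f} GB, {FP16_BITS / bits:.0f}x reduction vs FP16")
```

Going from 16 to 4 bits gives the 4× figure, and 2 bits gives 8×.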

More parameters ≠ always better

Architecture improvements matter enormously. A well-designed 8B model trained on high-quality data can outperform a poorly trained 70B model on specific tasks. The parameter count is the starting point, not the whole story. Pay attention to benchmarks and community reports, not just the "B" number.

How this maps to your RTX / Spark

Parameter count maps almost directly to VRAM. Each parameter stored at FP16 (16-bit half precision, the usual release format) takes 2 bytes, so an 8B model needs roughly 16 GB for its weights alone. An RTX 4060 Ti 16GB can barely hold that and leaves essentially no room for anything else. Quantized to 4-bit, the weights drop to ~4–5 GB, freeing room for context and activations.
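The headroom arithmetic for the 8B-on-16GB case can be sketched as follows. `headroom_gb` is a hypothetical helper. It uses round decimal-GB numbers and ignores driver and runtime overhead, so real headroom will differ somewhat.

```python
def headroom_gb(vram_gb: float, params_billions: float, bits_per_weight: float) -> float:
    """VRAM left after loading the weights (hypothetical helper; ignores runtime overhead)."""
    weights = params_billions * bits_per_weight / 8
    return vram_gb - weights


VRAM_GB = 16  # RTX 4060 Ti 16GB, treated as a round 16 GB
for label, bits in (("FP16", 16), ("4-bit", 4)):
    left = headroom_gb(VRAM_GB, 8, bits)
    print(f"8B @ {label}: {left:.1f} GB left for context cache and activations")
```

At FP16 the weights consume the whole budget. At 4-bit, about three quarters of the card stays free for longer contexts.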

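The DGX Spark's 128 GB of unified memory cannot hold a 70B model at FP16 (~140 GB). An 8-bit (Q8) quantization at ~70 GB fits easily, with room to spare for context. In terms of capacity, 30B–70B models load on the Spark as comfortably as an 8B model does on a desktop GPU. They still generate tokens more slowly than a small model on a fast desktop card.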
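Choosing a format for a fixed memory budget amounts to picking the highest precision whose weights still fit. This is a minimal sketch under stated assumptions. The format labels are simplified nominal bit widths, not specific file types. The 16 GB `reserve_gb` is a hypothetical allowance for the OS, which shares unified memory, and for the context cache. `best_fit` is a hypothetical helper name.

```python
from typing import Optional

# Candidate formats, highest precision first (nominal bits per weight).
FORMATS = [("FP16", 16), ("Q8", 8), ("Q4", 4)]


def best_fit(params_billions: float, memory_gb: float, reserve_gb: float = 16) -> Optional[str]:
    """Return the highest-precision format whose weights fit, keeping reserve_gb free."""
    budget = memory_gb - reserve_gb
    for name, bits in FORMATS:
        if params_billions * bits / 8 <= budget:
            return name
    return None  # nothing fits, even at 4-bit


print(best_fit(70, 128))   # FP16 needs 140 GB (too big); Q8 needs 70 GB
print(best_fit(32, 128))   # a 32B model fits at FP16 (64 GB)
print(best_fit(405, 128))  # even 4-bit (~203 GB) is too large
```

Under these assumptions the 70B model lands on Q8 and a 32B model can stay at FP16. The 405B model does not fit at all, which matches the table's "multi-node or cloud territory" label.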