What is a parameter?

A neural network is a large graph of simple math operations. At each connection in that graph sits a number — a weight — that controls how strongly one node influences another. These weights are the parameters. During training, billions of examples are fed through the network, and the weights are adjusted slightly each time until the network's predictions match the desired outputs.
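To make "adjusted slightly each time" concrete, here is a minimal sketch of training a model with a single parameter. The toy data, learning rate, and epoch count are illustrative assumptions, and real networks use automatic differentiation over billions of weights rather than a hand-written gradient.

```python
# Toy model: prediction = w * x. The goal is to learn w so it matches y = 3 * x.
# The examples, learning rate, and epoch count are illustrative assumptions.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

w = 0.0               # the model's only parameter, starting untrained
learning_rate = 0.05

for epoch in range(200):
    for x, y in examples:
        prediction = w * x
        error = prediction - y
        # Gradient of the squared error (w*x - y)^2 with respect to w.
        gradient = 2 * error * x
        # Nudge the weight a small step in the direction that reduces the error.
        w -= learning_rate * gradient

print(round(w, 3))
```

Each pass moves `w` a little closer to 3, the value that makes every prediction match its target. Once the loop ends, `w` is the "frozen" weight.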

After training is done, the weights are frozen. The model file you download is essentially just a very large list of these frozen numbers.

Analogy: Think of parameters as the volume knobs on a mixing board with billions of channels. Training gradually turns each knob to the right setting. Once trained, the board is locked in — and the "sound" it produces is the model's intelligence.

[Figure: a small network (5 params: few connections, limited capacity) next to a large network (billions of params: billions of weighted connections, vast knowledge encoded).]

Common model sizes and what they mean

| Size | Example models | FP16 VRAM needed | Use case |
|---|---|---|---|
| 1–4B | Phi-3 Mini 3.8B, Gemma 2 2B | ~3–8 GB | On-device, fast, light tasks |
| 7–8B | Llama 3.1 8B, Mistral 7B | ~14–16 GB | Great all-rounder for everyday use |
| 13B | Llama 2 13B, Vicuna 13B | ~26 GB | Better reasoning, still efficient |
| 27–34B | Gemma 2 27B, Qwen 2.5 32B | ~54–68 GB | Coding, long documents, nuanced tasks |
| 70B | Llama 3.3 70B | ~140 GB | Near-frontier quality locally |
| 405B | Llama 3.1 405B | ~810 GB | Multi-node or cloud territory |

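The good news: quantization (the next topic) can cut these VRAM numbers by roughly 2–4× (8-bit halves them, 4-bit quarters them). A 70B model at 4-bit still needs about 35 GB for its weights, which is more than a single 24 GB consumer GPU holds, but it becomes runnable across two such cards or with part of the model offloaded to system RAM.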
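The arithmetic behind those savings can be sketched in a few lines. This is a rough estimate of weight storage only; the function name is hypothetical, 1 GB is taken as 10^9 bytes, and real quantized formats add some overhead for scaling factors that is ignored here.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    # Billions of parameters times bytes per parameter gives gigabytes directly.
    return params_billions * bytes_per_param

for bits in (16, 8, 4):
    size = weight_memory_gb(70, bits)
    factor = 16 / bits
    print(f"70B at {bits}-bit: ~{size:.0f} GB ({factor:.0f}x smaller than FP16)")
```

Calling `weight_memory_gb(8, 16)` gives the 16 GB figure for an 8B model at FP16; context and activations come on top of whatever this returns.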

More parameters ≠ always better

Architecture and training data matter enormously. A well-designed 8B model trained on high-quality data can outperform a poorly trained 70B model on specific tasks. The parameter count is the starting point, not the whole story. Pay attention to benchmarks and community reports, not just the "B" number.

How this maps to your RTX / Spark

Parameter count maps almost directly to VRAM: each parameter stored at FP16 takes 2 bytes, so an 8B model needs roughly 16 GB for its weights alone. An RTX 4060 Ti 16GB has room for those weights and nothing else, so in practice the model will not run at FP16 on that card. Quantized to 4-bit, the weights drop to ~4 GB, freeing room for context and activations.

The DGX Spark's 128 GB of unified memory cannot hold a 70B model at FP16 (about 140 GB of weights), but the same model quantized to Q8 (~70 GB) fits easily, with room left for context. In memory terms, that makes 30B–70B models about as comfortable on the Spark as an 8B model is on a desktop GPU.
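A quick way to sanity-check a model against a given device is to compare the weight size with the memory left after reserving some headroom. The sketch below is a rule of thumb, not a precise predictor: the `fits` helper is hypothetical, and the 20% headroom reserved for context and activations is an assumption that varies with context length and runtime.

```python
def fits(params_billions: float, bits_per_param: int,
         memory_gb: float, headroom: float = 0.2) -> bool:
    """Return True if the weights fit after reserving a share of memory.

    headroom is the assumed fraction of memory kept free for context
    (KV cache) and activations; 0.2 is an illustrative default.
    """
    weights_gb = params_billions * bits_per_param / 8
    usable_gb = memory_gb * (1 - headroom)
    return weights_gb <= usable_gb

# (label, parameters in billions, bits per parameter, device memory in GB)
cases = [
    ("8B FP16 on 16 GB GPU", 8, 16, 16),
    ("8B 4-bit on 16 GB GPU", 8, 4, 16),
    ("70B FP16 on 128 GB unified memory", 70, 16, 128),
    ("70B 8-bit on 128 GB unified memory", 70, 8, 128),
]

for label, params, bits, memory in cases:
    verdict = "fits" if fits(params, bits, memory) else "does not fit"
    print(f"{label}: {verdict}")
```

With these assumptions, only the quantized configurations pass, which matches the arithmetic above.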