Weights and Biases
Every connection between nodes in a neural network has a weight (denoted w). Weights are the primary learnable parameters of a network — they determine how much influence each input has on a node's output.
Think of a weight as a volume knob on a signal: a large positive weight amplifies an input's contribution, a weight near zero mutes it, and a negative weight inverts it. When you train a neural network, you are essentially tuning millions of these knobs so that the combined signal produces the right output.
Mathematically, a node computes a weighted sum of its inputs before passing the result through an activation function:
z = w₁x₁ + w₂x₂ + … + wₙxₙ
Each weight wᵢ scales the corresponding input xᵢ. Weights start at small random values and are updated during training via backpropagation — the network nudges each weight in whichever direction reduces the error.
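The weighted sum can be sketched in a few lines of Python. The specific weight and input values below are made up for illustration:

```python
import random

def weighted_sum(weights, inputs):
    """z = w1*x1 + w2*x2 + ... + wn*xn"""
    return sum(w * x for w, x in zip(weights, inputs))

# Weights typically start at small random values before training:
weights = [random.uniform(-0.1, 0.1) for _ in range(3)]

# Each weight scales its input: 0.5*1.0 + (-1.0)*2.0 + 2.0*0.5 = -0.5
z = weighted_sum([0.5, -1.0, 2.0], [1.0, 2.0, 0.5])  # -0.5
```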
Every node in a neural network also has a second learnable parameter called the bias (denoted b, or sometimes β).
Two Different Meanings of 'Bias'
The word "bias" means two very different things depending on context:
- Statistical/fairness bias: Systematic errors or prejudiced outcomes in data. This kind of bias is bad and we work hard to eliminate it.
- Neural network bias (this section): A learned parameter that makes your model more flexible. This kind is good.
Weights control the shape of the activation function, specifically its steepness. If you had only weights, you could stretch or compress the activation curve, but its midpoint would always stay anchored at x = 0. You could not shift the entire curve left or right along the input axis.
The bias term does exactly that: it shifts the entire activation function horizontally. Without a bias, your model is constrained to represent relationships that pass through the origin. With a bias, your model can fit any relationship regardless of where it sits on the input scale.
Mathematically, the full computation at a node looks like:
output = φ(w₁x₁ + w₂x₂ + … + wₙxₙ + b)
where φ is the activation function, w are the weights, x are the inputs, and b is the bias. The bias is learned during training via backpropagation, just like the weights.
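A minimal sketch of this computation, assuming a sigmoid activation for φ (a common choice, though any activation works the same way):

```python
import math

def sigmoid(z):
    """phi(z) = 1 / (1 + e^(-z))"""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(weights, inputs, bias):
    """output = phi(w1*x1 + ... + wn*xn + b)"""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# With no bias, the sigmoid's midpoint (output = 0.5) is stuck at x = 0:
print(node_output([1.0], [0.0], bias=0.0))   # 0.5
# A bias of -2 shifts the curve right: the midpoint now sits at x = 2:
print(node_output([1.0], [2.0], bias=-2.0))  # 0.5
```

Note that the bias enters the sum exactly like a weight attached to a constant input of 1, which is why backpropagation updates it the same way.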
Interactive demo: adjust the bias and weight to see how they shift and steepen the sigmoid curve. The demo starts from output = σ(1.0·x + 0.0), whose midpoint (output = 0.5) sits at x = 0.00; the bias shifts this midpoint left or right, while the weight controls the steepness.
Real-World Application
If you are working with data where the relevant feature ranges are shifted far from zero — for instance, predicting housing prices based on square footage, where inputs range from 800 to 5,000 square feet — the bias term is doing a lot of work to anchor the model appropriately. If you accidentally disable biases in your architecture (most frameworks have this as an option), you may see inexplicably poor performance, especially when your features are not zero-centered.
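The housing example can be made concrete with a least-squares fit, with and without an intercept (the bias). All numbers here are made up for the sketch; the data points lie exactly on price = 200·sqft + 50,000:

```python
# (sqft, price) pairs, deliberately far from zero on the input axis
data = [(800, 210_000.0), (1500, 350_000.0), (2500, 550_000.0),
        (4000, 850_000.0), (5000, 1_050_000.0)]
xs = [x for x, _ in data]
ys = [y for _, y in data]
n = len(data)

# Without a bias, the best line is forced through the origin:
# w = sum(x*y) / sum(x^2)
w_no_bias = sum(x * y for x, y in data) / sum(x * x for x in xs)

# With a bias, ordinary least squares recovers both slope and intercept:
mean_x = sum(xs) / n
mean_y = sum(ys) / n
w = sum((x - mean_x) * (y - mean_y) for x, y in data) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - w * mean_x  # recovers 200 and 50_000

# The no-bias model badly mis-prices an 800 sqft home (true price 210_000):
pred_no_bias = w_no_bias * 800
pred_with_bias = w * 800 + b
```

Even on perfectly linear data, the origin-constrained model cannot match the true relationship, because that relationship does not pass through (0, 0). A neural network without bias terms suffers the same constraint at every node.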
A colleague builds a neural network without bias terms. They find that the model always predicts near zero for inputs with small magnitudes, even though the true outputs for those inputs are large positive values. What most likely explains this?