Activation Functions

You can think of the activation function as the network's decision-maker at each node. After computing the weighted sum of its inputs, a neuron asks: "How much of this signal should I pass forward?" The activation function answers that question.
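As a minimal sketch of that two-step process (weighted sum, then activation), the snippet below models a single neuron in plain Python. The function name `neuron`, the example inputs, weights, and bias are all hypothetical values chosen for illustration, and `math.tanh` stands in for whichever activation a real network would use.

```python
import math

def neuron(inputs, weights, bias, activation=math.tanh):
    """Compute the weighted sum of inputs plus bias, then apply the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values, chosen only for illustration.
inputs = [0.5, -1.0]
weights = [2.0, 1.0]
bias = 0.5

# Weighted sum: 2.0*0.5 + 1.0*(-1.0) + 0.5 = 0.5
raw = neuron(inputs, weights, bias, activation=lambda z: z)  # no activation
out = neuron(inputs, weights, bias)                          # tanh applied

print(raw)  # 0.5
print(out)  # tanh(0.5), a value strictly between 0 and 1
```

Passing the activation as a parameter keeps the weighted sum and the "how much to pass forward" decision visibly separate.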

Without activation functions, no matter how many layers you stack, your network would still be computing a linear function. A linear function of a linear function is still linear. Activation functions introduce the non-linearity that makes deep networks capable of approximating complex patterns.
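The collapse can be checked numerically. The sketch below uses two small, hypothetical weight matrices (`W1`, `W2`) and omits biases for brevity. It shows that two stacked linear layers give the same result as one layer whose weights are the product `W2 · W1`, and that inserting a non-linearity between them (ReLU is used here as an example) breaks that equivalence.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu(v):
    """Element-wise ReLU: negative entries become 0."""
    return [max(0.0, vi) for vi in v]

# Hypothetical weights and input, chosen only for illustration.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [1.0, 1.0]

two_layers = matvec(W2, matvec(W1, x))       # layer 1, then layer 2
one_layer = matvec(matmul(W2, W1), x)        # single layer with merged weights
with_relu = matvec(W2, relu(matvec(W1, x)))  # non-linearity between layers

print(two_layers)  # [-0.5, 5.5]
print(one_layer)   # [-0.5, 5.5]  (identical: the stack collapsed)
print(with_relu)   # [1.5, 4.5]   (differs: no single matrix reproduces this for all x)
```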

[Interactive: Activation Functions. Functions shown: Sigmoid, Tanh, ReLU, Leaky ReLU. Input slider from −10 to 10; at x = 0.00 the Sigmoid output is 0.5000.]

Sigmoid: σ(x) = 1 / (1 + e^(−x)). Maps any real input to the interval (0, 1). Used as an output activation for binary classification.
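To compare the four activations named above (Sigmoid, Tanh, ReLU, Leaky ReLU) without the interactive widget, here is a small sketch in plain Python. Only the sigmoid formula comes from the text; the other three use their standard textbook definitions, and the Leaky ReLU slope `alpha=0.01` is an assumed, commonly used value rather than one stated in this lesson.

```python
import math

def sigmoid(x: float) -> float:
    """1 / (1 + e^(-x)), branched so that exp() never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)

def tanh(x: float) -> float:
    """Hyperbolic tangent: output lies between -1 and 1."""
    return math.tanh(x)

def relu(x: float) -> float:
    """Passes positive inputs unchanged, outputs 0 otherwise."""
    return max(0.0, x)

def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Like ReLU, but scales negative inputs by alpha (assumed 0.01 here)."""
    return x if x > 0 else alpha * x

# Sample the same input range as the slider: -10 to 10.
print(f"{'x':>6} {'sigmoid':>9} {'tanh':>9} {'relu':>9} {'leaky':>9}")
for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"{x:6.2f} {sigmoid(x):9.4f} {tanh(x):9.4f} "
          f"{relu(x):9.4f} {leaky_relu(x):9.4f}")
```

At x = 0 the sigmoid column reads 0.5000, matching the value shown by the widget.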


Checkpoint

You are building a classifier that identifies one of ten different manufacturing defect types from sensor readings. Which activation function should you use in the output layer?
