Introduction to Neural Networks in AI, Machine Learning

  • 12 Sep, 2025
  • AI

An Introduction to Neural Networks

A Neural Network is a fundamental concept for anyone who wants to learn about deep learning. This architecture is also known by other names, such as a Feedforward Network, a Multi-Layer Perceptron (MLP), or, when it has many hidden layers, a Deep Neural Network (DNN).


[Diagram: Neural Networks in AI, a simple network with one input layer, one hidden layer, and one output layer]


Let's start by assuming a simple network with:

  • One input layer
  • One hidden layer
  • One output layer

As you can see in the diagram, we have three input nodes, represented by X₁, X₂, and X₃. We have two nodes in the hidden layer: H₁ and H₂. The final node produces the output, which we will call Ŷ (Y-hat).

Each input is connected to every node in the hidden layer by a value called a 'weight'. This "everything-to-everything" connection pattern is what defines a dense or fully connected network. The hidden layer nodes H₁ and H₂ are then also connected to the output node by another set of weights.
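
As a rough illustration of this fully connected structure, here is how the same 3-2-1 layout could be declared in PyTorch. The layer sizes come from the example above; the framework choice and variable names are my own:

```python
import torch.nn as nn

# A minimal sketch of the 3-2-1 network described above (assumed framework: PyTorch).
model = nn.Sequential(
    nn.Linear(3, 2),   # input layer -> hidden layer: 3 x 2 weights plus 2 biases
    nn.Sigmoid(),      # activation applied to H1 and H2
    nn.Linear(2, 1),   # hidden layer -> output node: 2 weights plus 1 bias
)

# "Everything-to-everything": each hidden node receives a weight from every input,
# so the first layer's weight matrix has one row per hidden node.
print(model[0].weight.shape)  # torch.Size([2, 3])
```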

What a Neural Network Does (In Simple Terms)

  • Traditional Problem-Solving: Here, we have 'Input', a predefined 'Equation/Logic', and we get an 'Output'.
  • Neural Network Problem-Solving: We have 'Input' and the corresponding 'Output' data. We train the neural network so it can discover the underlying pattern or logic that connects the inputs to the outputs. Once trained, the network can apply this learned logic to new inputs to produce accurate predictions.

Key Components Explained

Weights:
Weights are the core parameters of the network that are continuously updated during training. They represent the strength and direction (positive or negative) of the connection between neurons. The process of learning is essentially the process of finding the optimal values for these weights.

Bias:
Bias is an additional parameter that provides the model with more flexibility. It allows the network to fit the data better by shifting the activation function to the left or right. You can think of it as an adjustment value that helps the model make accurate predictions even when all inputs are zero. It accounts for any constant offset in the data.

Activation Function:
By itself, the calculation inside a neuron (a weighted sum of inputs plus a bias) is linear. However, most real-world data requires non-linear relationships to be modeled accurately. An activation function is applied to this linear sum to introduce non-linearity into the network, enabling it to learn complex patterns.

Some common activation functions are:

  • Sigmoid: Compresses any input value into a range between 0 and 1. It is often used for output layers in binary classification. (Note: Its use in hidden layers is now less common).
  • ReLU (Rectified Linear Unit): Outputs the input directly if it is positive; otherwise, it outputs zero. It is the most widely used activation function for hidden layers due to its computational efficiency.
  • Leaky ReLU: A variant of ReLU that allows a small, non-zero output for negative inputs. This helps prevent the "dying ReLU" problem where neurons can become permanently inactive.
  • Tanh (Hyperbolic Tangent): Compresses any input value into a range between -1 and 1. It is often preferred over the sigmoid function for hidden layers as it produces outputs that are zero-centered.
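
As a quick sketch, here is how these four activation functions might be written in plain Python with NumPy (the function names are only illustrative):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive values through unchanged; negatives become 0.
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but negative inputs keep a small slope instead of becoming 0.
    return np.where(z > 0, z, alpha * z)

def tanh(z):
    # Squashes any real value into the range (-1, 1), centered at 0.
    return np.tanh(z)

print(sigmoid(0.0))       # 0.5
print(relu(-3.0))         # 0.0
print(leaky_relu(-3.0))   # -0.03
print(tanh(0.0))          # 0.0
```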

Forward Propagation Calculation

Connections:

  • Input X₁ is connected to H₁ (Weight W₁) and H₂ (Weight W₄).
  • Input X₂ is connected to H₁ (Weight W₂) and H₂ (Weight W₅).
  • Input X₃ is connected to H₁ (Weight W₃) and H₂ (Weight W₆).

Therefore:

  • Hidden node H₁ is connected by weights W₁, W₂, and W₃.
  • Hidden node H₂ is connected by weights W₄, W₅, and W₆.

Given Values:
Inputs: X₁ = 5, X₂ = 1, X₃ = 13

Step 1: Calculate the value for Hidden Node H₁

  • Weights to H₁: W₁=2, W₂=0.3, W₃=5
  • Bias for H₁: B₁ = -8
  • Z₁ = (W₁ * X₁) + (W₂ * X₂) + (W₃ * X₃) + B₁
  • Z₁ = (2 * 5) + (0.3 * 1) + (5 * 13) + (-8)
  • Z₁ = 10 + 0.3 + 65 - 8 = 67.3
  • Apply the Sigmoid activation function to get the output G₁:
    • G₁ = σ(Z₁) = 1 / (1 + e^(-Z₁))
    • G₁ = 1 / (1 + e^(-67.3))
    • Since e^(-67.3) is an extremely small number (effectively 0), G₁ ≈ 1.0

Step 2: Calculate the value for Hidden Node H₂

  • Weights to H₂: W₄=0.12, W₅=2, W₆=0.32
  • Bias for H₂: B₂ = -7
  • Z₂ = (W₄ * X₁) + (W₅ * X₂) + (W₆ * X₃) + B₂
  • Z₂ = (0.12 * 5) + (2 * 1) + (0.32 * 13) + (-7)
  • Z₂ = 0.6 + 2 + 4.16 - 7 = -0.24
  • Apply the Sigmoid activation function to get the output G₂:
    • G₂ = σ(Z₂) = 1 / (1 + e^(-Z₂))
    • G₂ = 1 / (1 + e^(-(-0.24))) = 1 / (1 + e^(0.24))
    • e^(0.24) ≈ 1.271, so G₂ = 1 / (1 + 1.271) ≈ 1 / 2.271 ≈ 0.44

Step 3: Calculate the Final Output (Ŷ)

  • The outputs G₁ and G₂ become the inputs to the output layer.
  • Weights to Output: W₇=4, W₈=9
  • Bias for Output: B₃=7
  • Ŷ = (W₇ * G₁) + (W₈ * G₂) + B₃
  • Ŷ = (4 * 1.0) + (9 * 0.44) + 7
  • Ŷ = 4 + 3.96 + 7 = 14.96
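
The three steps above can be reproduced with a few lines of NumPy. This is only a sketch of the worked example; the array layout (one row of weights per hidden node) is a convention I am choosing for convenience:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Inputs X1, X2, X3 from the example.
x = np.array([5.0, 1.0, 13.0])

# Hidden layer: one row of weights per hidden node (H1, H2).
W_hidden = np.array([[2.0, 0.3, 5.0],     # W1, W2, W3 (into H1)
                     [0.12, 2.0, 0.32]])  # W4, W5, W6 (into H2)
b_hidden = np.array([-8.0, -7.0])         # B1, B2

# Output layer: weights from G1 and G2 to Y-hat, plus bias B3.
W_out = np.array([4.0, 9.0])              # W7, W8
b_out = 7.0                               # B3

z_hidden = W_hidden @ x + b_hidden        # [67.3, -0.24]
g_hidden = sigmoid(z_hidden)              # [~1.0, ~0.44]
y_hat = W_out @ g_hidden + b_out          # ~14.96

print(z_hidden, g_hidden, y_hat)
```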

This process of calculating an output from an input by passing it through the network is called forward propagation. The next step in training is backpropagation, where the network's prediction (14.96) is compared to the true value, the error is calculated, and all the weights and biases are adjusted to reduce this error.
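
Continuing from the forward-pass sketch above, a single gradient-descent update might look like the following. The target value and learning rate are made-up numbers, and the gradient formulas assume this exact architecture (sigmoid hidden layer, linear output node) with a squared-error loss:

```python
# Hypothetical target and learning rate (not from the article).
y_true = 10.0
lr = 0.01

# Derivative of the squared error (y_hat - y_true)**2 with respect to y_hat.
d_out = 2.0 * (y_hat - y_true)

# Gradients for the output layer's weights and bias.
grad_W_out = d_out * g_hidden
grad_b_out = d_out

# Backpropagate through the sigmoid into the hidden layer.
d_hidden = d_out * W_out * g_hidden * (1.0 - g_hidden)
grad_W_hidden = np.outer(d_hidden, x)
grad_b_hidden = d_hidden

# Nudge every parameter against its gradient to reduce the error.
W_out = W_out - lr * grad_W_out
b_out = b_out - lr * grad_b_out
W_hidden = W_hidden - lr * grad_W_hidden
b_hidden = b_hidden - lr * grad_b_hidden
```

Repeating this forward-then-backward cycle over many training examples is what gradually drives the error down.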

Various neural network architectures are extensively employed in computer vision and generative AI, enabling tasks such as image recognition, object detection, and the creation of highly realistic synthetic content.