Bayesian Neural Networks - Capturing The Uncertainty Of The Real World
Life is inherently uncertain and probabilistic, and Bayesian Neural Networks (BNNs) are designed to capture and quantify that uncertainty.
In many real-world applications, it's not sufficient to make a prediction; you also want to know how confident you are in that prediction. For example, in healthcare, a model that says a patient has a 70% chance of having a particular disease is less informative than one that reports the same 70% chance together with a margin of error of ±10 percentage points.
BNNs tend to be less prone to overfitting, can be more data efficient because they can incorporate prior knowledge, and can output a probability distribution for each prediction rather than a single value. Knowing how uncertain a particular prediction is helps build trust and confidence with business users.
So how do Bayesian Neural Networks work?
The core idea is to replace the fixed weights w in a standard neural network with probability distributions P(w) over those weights.
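To make the idea of weights as distributions concrete, here is a minimal sketch in plain Python. It assumes a hypothetical one-neuron model and Gaussian distributions with made-up means and standard deviations; nothing here is learned from data, and a real BNN would have many more parameters.

```python
import math
import random

# Hypothetical one-neuron "network": y = sigmoid(w * x + b).
# Each parameter is described by a Gaussian (mean, std) instead of a fixed value.
# These numbers are illustrative assumptions, not learned from data.
WEIGHT_DIST = {"w": (1.5, 0.4), "b": (-0.5, 0.2)}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_forward(x, rng):
    """Draw one set of parameters from their distributions and run a forward pass."""
    w = rng.gauss(*WEIGHT_DIST["w"])
    b = rng.gauss(*WEIGHT_DIST["b"])
    return sigmoid(w * x + b)


rng = random.Random(0)  # fixed seed so the run is repeatable
outputs = [sample_forward(2.0, rng) for _ in range(1000)]

mean = sum(outputs) / len(outputs)
std = math.sqrt(sum((o - mean) ** 2 for o in outputs) / len(outputs))
print(f"mean prediction: {mean:.3f}, spread (std): {std:.3f}")
```

Because every forward pass uses a different draw of w and b, the same input produces a spread of outputs. That spread is what a fixed-weight network cannot give you.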
The famous equation from Bayes is:
P(A∣B) = P(B∣A) × P(A) / P(B)
In the context of BNNs:
A is the model parameters (weights and biases).
B is the observed data.
P(A∣B) is the posterior distribution of the parameters given the data.
P(B∣A) is the likelihood of the data given the parameters.
P(A) is the prior distribution of the parameters.
P(B) is the evidence, a normalizing constant that is usually intractable to compute exactly for neural networks.
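The four terms above can be computed directly when there is only one parameter. The sketch below is an illustration under stated assumptions: a hypothetical model y = w × x with a single weight, a small made-up dataset, an assumed noise level, and a grid of candidate weights standing in for the continuous distribution.

```python
import math


def gaussian_pdf(value, mean, std):
    return math.exp(-0.5 * ((value - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


# Hypothetical toy data roughly following y = 2x (illustrative only).
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
noise_std = 0.5  # assumed observation noise

# Candidate values for the single weight w (A in Bayes' theorem), from -1.0 to 4.0.
grid = [i * 0.05 - 1.0 for i in range(101)]

# P(A): Gaussian prior centred on 0, normalized over the grid.
prior = [gaussian_pdf(w, 0.0, 1.0) for w in grid]
prior_total = sum(prior)
prior = [p / prior_total for p in prior]

# P(B|A): likelihood of all observations for each candidate weight.
likelihood = [
    math.prod(gaussian_pdf(y, w * x, noise_std) for x, y in data) for w in grid
]

# P(B): evidence, the sum of likelihood times prior over every candidate.
evidence = sum(l * p for l, p in zip(likelihood, prior))

# P(A|B): posterior over the weight after seeing the data.
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

best_w = grid[posterior.index(max(posterior))]
print(f"most probable weight: {best_w:.2f}")
```

A grid like this only works for a handful of parameters. With millions of weights the evidence term cannot be summed this way, which is why practical BNNs rely on approximations.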
Prior Distribution - You start with a prior distribution P(w) over the weights. This represents your initial belief about the model parameters before seeing any data.
Posterior Distribution - The goal is to compute the posterior distribution P(w∣D), which represents the updated belief about the weights after observing data D. Because the exact posterior is rarely tractable, Bayes' theorem is combined with approximation methods such as variational inference or Markov chain Monte Carlo (MCMC) sampling to estimate it.
Prediction - Finally, to make a prediction for a new input x, you average over all possible weights, weighted by their posterior probabilities:
P(y∣x,D)=∫P(y∣x,w)×P(w∣D)dw
This gives you not just a point estimate but a distribution over the possible outputs y, capturing the model's uncertainty.
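In practice the integral is usually approximated by sampling: draw weights from the (approximate) posterior, make a prediction with each draw, and look at the resulting collection of outputs. The sketch below assumes a hypothetical single-weight regression model y = w × x plus noise, and an assumed Gaussian posterior with made-up parameters, such as an approximation method might return.

```python
import random

# Assumed approximate posterior P(w | D) over a single weight: Gaussian(mean, std).
# The values are illustrative, not the result of fitting a real model.
POSTERIOR_MEAN, POSTERIOR_STD = 2.0, 0.13
NOISE_STD = 0.5  # assumed observation noise in P(y | x, w)


def predictive_samples(x, n_samples, rng):
    """Approximate P(y | x, D): sample w ~ P(w | D), then y ~ P(y | x, w)."""
    samples = []
    for _ in range(n_samples):
        w = rng.gauss(POSTERIOR_MEAN, POSTERIOR_STD)
        samples.append(rng.gauss(w * x, NOISE_STD))
    return samples


rng = random.Random(42)  # fixed seed so the run is repeatable
ys = sorted(predictive_samples(4.0, 5000, rng))

mean = sum(ys) / len(ys)
# Empirical 95% interval taken from the sorted samples.
lower = ys[int(0.025 * len(ys))]
upper = ys[int(0.975 * len(ys))]
print(f"prediction: {mean:.2f}, 95% interval: [{lower:.2f}, {upper:.2f}]")
```

The mean of the samples plays the role of the point estimate, while the interval summarizes how much the prediction could plausibly vary given both the uncertainty in w and the noise in the data.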
For example, BNNs can be applied to a dataset of MRI scans where each scan is labeled either "Cancer" or "No Cancer." The goal is to build a model that can predict these labels for new, unlabeled MRI scans. A BNN can say, "I'm 80% sure this is cancer, but there's a 20% chance it's not," and it can also indicate how much that estimate varies across plausible weight settings, which is valuable information for clinicians.
BNNs are useful wherever uncertainty quantification is important, including disease diagnosis, risk assessment, energy forecasting, and real-time decision-making.