Foundations of Algorithmic Decision Making: Probabilistic Reasoning & Representation
Uncertainty is everywhere. Even systems governed by precise physical laws — like satellite trajectories — can drift because small imprecisions compound over time. Measurements contain noise. Sensors fail. Humans behave unpredictably.
A decision-making system that ignores uncertainty is fragile. One that models it explicitly becomes powerful. Probability gives us a structured language for representing uncertainty and reasoning under it.
1. Degrees of Belief
Probability can be interpreted as a degree of belief — a numerical way of expressing how plausible we think a proposition is.
To behave rationally, our beliefs must follow two core assumptions:
- Universal Comparability — Any two events can be ranked.
- Transitivity — If $$ A \succeq B $$ and $$ B \succeq C $$, then $$ A \succeq C $$.
Example Use: Medical Triage Prioritization
A clinician assigns degrees of belief to “Patient X will deteriorate within 2 hours” (e.g., P = 0.82) versus “Patient Y will deteriorate within 2 hours” (e.g., P = 0.47). This quantified belief directly informs who receives urgent attention — turning subjective judgment into auditable, consistent prioritization.
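Because beliefs are numeric and transitively ordered, prioritization reduces to a simple sort. A minimal sketch, using the hypothetical probabilities from the triage example:

```python
# Hypothetical degrees of belief for "patient deteriorates within 2 hours".
beliefs = {"Patient X": 0.82, "Patient Y": 0.47}

# Ranking by belief yields a consistent, auditable priority order;
# transitivity guarantees the order is well-defined for any number of patients.
priority = sorted(beliefs, key=beliefs.get, reverse=True)
print(priority)  # → ['Patient X', 'Patient Y']
```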
2. Discrete Probability Distributions
A discrete distribution assigns probabilities to outcomes from a countable set — like dice rolls or categories. All probabilities must be non-negative and sum to 1:

$$ \sum_{x} P(x) = 1, \qquad P(x) \ge 0 $$
As you increase the bias toward face 6, the probabilities of the other faces decrease proportionally so that the total still sums to 1.
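The renormalization described above can be sketched directly. This toy function (not from the original text) puts probability `bias_on_six` on face 6 and splits the remaining mass evenly among the other five faces:

```python
def bias_die(bias_on_six):
    """Distribution over faces 1-6: face 6 gets `bias_on_six`,
    the rest of the mass is split evenly among faces 1-5."""
    rest = (1.0 - bias_on_six) / 5
    return [rest] * 5 + [bias_on_six]

fair = bias_die(1 / 6)
loaded = bias_die(0.5)  # face 6 now takes half the mass

# The other faces shrink proportionally so the total still sums to 1.
assert abs(sum(fair) - 1.0) < 1e-12
assert abs(sum(loaded) - 1.0) < 1e-12
print(loaded)  # → [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]
```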
Example Use: Spam Email Classification
An email classifier outputs a discrete distribution over classes: P(spam) = 0.93, P(personal) = 0.04, P(work) = 0.02, P(newsletter) = 0.01. The sum is exactly 1. This allows not just binary “spam/ham” decisions, but calibrated risk assessment — e.g., routing high-confidence spam to quarantine, while low-confidence cases go to human review.
3. Continuous Probability Distributions
When variables can take infinitely many values, we use probability density functions (PDFs). The Gaussian (Normal) distribution is the most fundamental example:

$$ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$
Example Use: Sensor Fusion in Autonomous Vehicles
A self-driving car fuses lidar and radar distance readings into a single Gaussian distribution: 𝒩(μ = 12.4 m, σ² = 0.09 m²). This continuous representation captures both the best estimate (12.4 m) and its precision (±0.3 m), enabling safe path planning — e.g., braking only when P(distance < 10 m) > 0.99.
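The braking probability in this example follows directly from the Gaussian CDF, which has a closed form via the error function. A sketch using the stated estimate 𝒩(12.4, 0.09), i.e. σ = 0.3 m:

```python
import math

def normal_cdf(x, mu, sigma):
    # Gaussian CDF via the error function (standard closed form).
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Fused estimate from the example: mean 12.4 m, variance 0.09 m^2 -> sigma = 0.3 m.
p_too_close = normal_cdf(10.0, 12.4, 0.3)

# 10 m is eight standard deviations below the mean, so this probability is
# vanishingly small and the braking rule P(distance < 10 m) > 0.99 does not fire.
print(p_too_close)
```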
4. Conditional Probability
Often, we want to update beliefs given evidence. Bayes’ Rule formalizes this transition from prior to posterior:

$$ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} $$
Example Use: Fraud Detection in Real-Time Payments
A bank computes P(fraud | transaction amount = $4,892, location = Tokyo, time = 2:17 AM) using Bayes’ rule. It combines the prior fraud rate (e.g., 0.001), the likelihood of that pattern given fraud (e.g., 0.21), and the overall probability of that pattern (e.g., 0.004). Result: P(fraud | evidence) ≈ 0.052 — triggering step-up authentication, not blocking.
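The posterior in this example is a one-line application of Bayes’ rule, using the illustrative numbers given above:

```python
prior = 0.001       # P(fraud): base fraud rate
likelihood = 0.21   # P(pattern | fraud): this pattern, given fraud
evidence = 0.004    # P(pattern): overall probability of this pattern

# Bayes' rule: posterior = likelihood * prior / evidence
posterior = prior * likelihood / evidence
print(round(posterior, 4))  # → 0.0525
```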
5. Sigmoid Model (Classification)
For binary decisions, we often use a smooth transition. Instead of a hard threshold, the sigmoid provides a confidence level:

$$ \sigma(z) = \frac{1}{1 + e^{-z}} $$
Example Use: Credit Approval Scoring
A lending model maps a credit score y (e.g., FICO 680) through a sigmoid: P(approval | y=680) = 0.74. This isn’t just “yes/no” — it enables dynamic pricing (higher interest for lower P(approval)) and explains rejections (“Your score places you at 74% approval likelihood; raising it by 20 points would increase that to 91%”).
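A sketch of this mapping, assuming a hypothetical linear calibration of the score before the sigmoid. The weight `w` and bias `b` here are illustrative values chosen to roughly reproduce the numbers in the example; a real lending model would fit them from data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical calibration (NOT a real scoring model): score -> logit -> probability.
w, b = 0.0634, -42.06

p_680 = sigmoid(w * 680 + b)  # ≈ 0.74 approval likelihood
p_700 = sigmoid(w * 700 + b)  # ≈ 0.91 after a 20-point improvement
print(round(p_680, 2), round(p_700, 2))
```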
6. Bayesian Networks
Real systems involve many interacting variables. Bayesian networks represent these relationships using a directed acyclic graph, factoring the joint distribution into local conditional distributions:

$$ P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{pa}(x_i)) $$

where $$ \mathrm{pa}(x_i) $$ denotes the parents of $$ x_i $$ in the graph.
This factorization works because of conditional independence—the key insight that makes large systems computationally tractable.
Example Use: Diagnostic Reasoning in Healthcare AI
A Bayesian network models symptoms (fever, cough), test results (PCR+, chest X-ray), and diseases (flu, pneumonia, COVID-19). Given fever + cough + PCR+, it computes P(COVID-19 | evidence) = 0.87 while automatically accounting for dependencies (e.g., fever is less specific if flu is present). Clinicians see not just the top diagnosis, but full posterior distributions across all conditions.
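A full diagnostic network is beyond a short example, but the core mechanics — factorize the joint, condition on evidence, renormalize — fit in a minimal two-node sketch (Disease → Fever). All CPT numbers below are made up for illustration:

```python
# Illustrative, made-up priors and conditional probability table.
p_disease = {"covid": 0.02, "flu": 0.05, "none": 0.93}
p_fever_given = {"covid": 0.80, "flu": 0.70, "none": 0.02}

# Joint factorizes as P(Disease, Fever) = P(Disease) * P(Fever | Disease).
joint_fever = {d: p_disease[d] * p_fever_given[d] for d in p_disease}

# Condition on the evidence "fever observed" and renormalize.
evidence = sum(joint_fever.values())
posterior = {d: p / evidence for d, p in joint_fever.items()}
print(posterior)  # full posterior over all conditions, not just the top diagnosis
```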
Putting It All Together
Probabilistic reasoning provides the backbone of intelligent behavior. Whether you're building recommender systems or autonomous agents, these representations are the foundation of modern AI.