Jun 10, 2026 · 14 min read
LINEAR REGRESSION
The foundational supervised learning algorithm for predicting continuous values. Learn how the best-fit line is found, what its assumptions and limitations are, and why it remains the essential baseline for forecasting, pricing, and trend analysis.
Linear regression is one of the simplest and most widely used supervised learning algorithms for predicting continuous numbers.
It finds the straight line (or hyperplane in higher dimensions) that best describes the relationship between your input features and a numeric target. Once the line is found, you can plug in new inputs and get a prediction.
It powers house price estimators, demand forecasts, and salary models, and it is usually the first model practitioners try.
How Linear Regression Actually Works
The algorithm does not "guess" the line. It solves a precise optimization problem:
It chooses the slope and intercept that minimize the sum of squared vertical distances between every real data point and the line.
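In symbols, for n observations (x_i, y_i) with intercept β₀ and slope β₁, the objective and its standard single-feature solution are shown below (x̄ and ȳ are the sample means). The notation is ours; the article itself does not define symbols.

```latex
\min_{\beta_0,\,\beta_1} \; \sum_{i=1}^{n} \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```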

Figure: Blue points are real observations (e.g., house size vs. price), the red line is the model's prediction, and the vertical gray lines are the errors (residuals) the algorithm minimizes.
This approach is called Ordinary Least Squares (OLS). The minimization has a closed-form solution (the normal equations), so fitting stays very fast on datasets with many rows, as long as the number of features is moderate.
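The sketch below shows the closed-form fit in NumPy. It is a minimal illustration, not a production implementation. `fit_ols` is a hypothetical helper, and the house data is made up and lies exactly on a line, so the result is easy to check. It calls `np.linalg.lstsq`, which returns the same least-squares solution as the normal equations β̂ = (XᵀX)⁻¹Xᵀy (when XᵀX is invertible) without forming an explicit inverse.

```python
import numpy as np


def fit_ols(X, y):
    """Hypothetical helper: return (intercept, coefficients) that minimize the
    sum of squared residuals for the model y ~ intercept + X @ coefficients."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept is estimated alongside the slopes.
    X1 = np.column_stack([np.ones(len(X)), X])
    # lstsq returns the least-squares solution directly. With full column rank it
    # equals the normal-equations solution, but avoids an explicit matrix inverse.
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[0], beta[1:]


# Made-up data: size in hundreds of sq ft vs price in $1000s, lying exactly on
# price = 50 + 20 * size, so the recovered line is easy to verify.
sizes = np.array([[8.0], [10.0], [12.0], [15.0], [20.0]])
prices = 50.0 + 20.0 * sizes[:, 0]

intercept, coefs = fit_ols(sizes, prices)

# Predict for a new 1,300 sq ft house by plugging it into the fitted line.
new_size = np.array([13.0])
predicted_price = intercept + new_size @ coefs
print(intercept, coefs, predicted_price)
```

The fitted slope of about 20 reads as "each additional 100 sq ft adds about $20,000". This is the kind of coefficient interpretation that makes the model easy to explain.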
Watch: Linear Regression Explained with Beautiful Animations
VIDEO — STATQUEST (JOSH STARMER)
8-minute masterclass. Watch how the line "settles" into the position that minimizes total squared error. Essential viewing.
Key Assumptions (and Why They Matter)
Linear regression makes several assumptions. When they are badly violated, the predictions, the coefficient estimates, or their confidence intervals can become unreliable:
- Linearity: The relationship between the features and the target is approximately linear.
- Independence: Observations (and therefore their errors) are independent of one another.
- Homoscedasticity: The spread of errors is roughly constant across the range of predictions.
- Normality (for statistical inference): Residuals are normally distributed.
In practice, you check these with residual plots after fitting. Many real-world datasets only roughly satisfy them, yet the model remains useful as a baseline.
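The usual check is a plot of residuals against fitted values. The sketch below computes the same information numerically with NumPy, using made-up data whose noise deliberately grows with x (a homoscedasticity violation). The half-split "spread ratio" is an illustrative heuristic chosen for this example. It is not a formal test; formal alternatives include the Breusch-Pagan test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, deliberately heteroscedastic: the noise scale grows with x.
x = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * x + rng.normal(0.0, 0.2 + 0.5 * x)

# Fit a straight line (np.polyfit with deg=1 is an ordinary least-squares fit).
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
residuals = y - fitted

# With an intercept in the model, OLS residuals average to (numerically) zero.
mean_resid = residuals.mean()

# Numeric stand-in for eyeballing a residuals-vs-fitted plot: compare the
# residual spread in the lower and upper halves of the fitted values.
order = np.argsort(fitted)
low_half, high_half = np.array_split(residuals[order], 2)
spread_ratio = high_half.std() / low_half.std()

print(f"mean residual: {mean_resid:.2e}")
print(f"spread ratio (high/low fitted values): {spread_ratio:.2f}")
```

A ratio well above 1 is the numeric counterpart of the "funnel" shape you would see in the plot. With constant-variance noise, the ratio would stay close to 1.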
Real-World Applications
- Pricing & valuation — Real estate, used cars, insurance premiums
- Demand forecasting — Retail inventory, staffing, cloud compute
- Risk & finance — Credit scoring (as a component), sales projections
- Science & engineering — Sensor calibration, physics simulations
Whenever the output is a number and you want something fast and explainable to stakeholders, start here.
Strengths and Limitations
Strengths
- Extremely fast to train and predict
- Coefficients are directly interpretable ("for every additional square foot, price goes up $X, holding the other features fixed")
- Works surprisingly well as a baseline even when data is messy
- Closed-form solution, so the basic version needs no hyperparameter tuning
Limitations
- Assumes linear relationships (struggles with curves, interactions, thresholds)
- Sensitive to outliers (they pull the entire line)
- Cannot capture complex patterns without feature engineering (polynomials, interactions)
- Multicollinearity between features can make coefficients unstable
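The feature-engineering workaround for curves can be sketched in a few lines of NumPy. This is a minimal illustration on made-up, noise-free quadratic data, and `r_squared` is a hypothetical helper. Adding an x² column lets the same least-squares algorithm fit the curve, because the model is still linear in its coefficients.

```python
import numpy as np


def r_squared(y, y_hat):
    """Fraction of the target's variance explained by the predictions."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Made-up, noise-free curved relationship: y = 1 + 0.5x + 2x^2
x = np.linspace(-3, 3, 50)
y = 1.0 + 0.5 * x + 2.0 * x ** 2

# Plain linear regression on the raw feature: columns [1, x]
X_line = np.column_stack([np.ones_like(x), x])
b_line, *_ = np.linalg.lstsq(X_line, y, rcond=None)

# Same least-squares fit with an engineered x^2 column: [1, x, x^2].
# The model is still linear in its coefficients, so OLS applies unchanged.
X_quad = np.column_stack([np.ones_like(x), x, x ** 2])
b_quad, *_ = np.linalg.lstsq(X_quad, y, rcond=None)

r2_line = r_squared(y, X_line @ b_line)
r2_quad = r_squared(y, X_quad @ b_quad)
print(f"R^2, straight line:  {r2_line:.3f}")
print(f"R^2, with x^2 added: {r2_quad:.3f}")
print("recovered coefficients:", np.round(b_quad, 3))
```

Interaction terms work the same way: append a product column such as x1 * x2 to the design matrix.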
Pro Tip: Always Start with Linear Regression
Even when you ultimately plan to use a more complex model, fit a simple linear regression first. It will tell you:
- Whether there is any signal at all in your features.
- Which features matter most (via coefficient size on standardized features, and statistical significance).
- How much performance you actually gain by moving to a more complex model.
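Here is one way to run that first check, sketched in NumPy under stated assumptions. The data is synthetic: three standardized features, only two of which carry signal. The split is a simple holdout, and the helper `with_intercept` is hypothetical. The sketch compares the linear model's held-out error against a "no features" baseline that always predicts the training mean. It does not compute significance tests; libraries such as statsmodels report those.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: three standardized features, only the first two carry signal.
n = 300
X = rng.normal(size=(n, 3))
y = 4.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=n)

# Simple holdout split: first 200 rows for fitting, last 100 for evaluation.
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]


def with_intercept(X):
    """Prepend a column of ones so OLS also estimates an intercept."""
    return np.column_stack([np.ones(len(X)), X])


beta, *_ = np.linalg.lstsq(with_intercept(X_train), y_train, rcond=None)

# Baseline with no features at all: always predict the training mean.
mse_baseline = np.mean((y_test - y_train.mean()) ** 2)
mse_linear = np.mean((y_test - with_intercept(X_test) @ beta) ** 2)

print(f"test MSE, mean-only baseline: {mse_baseline:.2f}")
print(f"test MSE, linear regression:  {mse_linear:.2f}")
print("coefficients [intercept, x0, x1, x2]:", np.round(beta, 2))
```

A large drop from the baseline error indicates real signal. Because the features share a scale, the near-zero coefficient on x2 flags it as unimportant. A more complex model should then be judged against `mse_linear`, not just `mse_baseline`.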
In many "advanced" projects, a well-tuned linear model turns out to be nearly as accurate as the neural network, and far easier to trust in production.
Next Steps
- Read the full guide: 5 Essential Machine Learning Algorithms Explained Simply
- Build one yourself: Building Your First ML Model
- Understand the bigger picture: Machine Learning Concepts
Linear regression is not just a beginner algorithm — it is the reference point against which all more powerful methods are judged. Master it first.
Part of the knowledge graph at The Best Blog Ever — reference definitions for ideas that matter.