📈 Linear regression & correlation

Ever noticed that taller players tend to score more in basketball? Or that the more you practice, the better you get? Data often follows patterns like this — and linear regression is the math tool that helps you find, describe, and use those patterns.

Use Next and Previous at the bottom to move through each page. Each page has one idea, real examples, and sometimes a graph.

Here’s what you’ll learn:

What correlation means — when two things tend to go up or down together
The correlation coefficient r — a number that tells you how strong the pattern is
What a line of best fit is and how to use it to make predictions

Watch these (open in a new tab):

Fitting a line to data (Khan Academy)
Covariance and the regression line (Khan Academy)

Quick check

What does linear regression help you do? (linear regression = the method from this lesson)

Count how many data points you have
Find and use patterns between two variables
Draw bar graphs

What is correlation?

Correlation describes whether two things tend to move together. When one goes up, does the other usually go up too? Go down? Or is there no connection at all?

Here are three types of patterns you might see:

Three possible patterns

Positive: Both go up together, like hours of sleep and how energized you feel.
Negative: One goes up while the other goes down, like hours of screen time before bed and how rested you feel in the morning.
No pattern: No real connection, like shoe size and favorite color.

Quick check

If one variable goes up and the other usually goes down, the correlation is: (correlation = how x and y move together)

Negative
Positive
Zero

The correlation coefficient r

Scientists summarize correlation with a single number called r. It’s always between −1 and +1. You don’t calculate it by hand — you use it to read and describe how strong a pattern is.

What does r mean?

r	What it means	What the graph looks like
close to +1	Strong positive — when x goes up, y usually goes up too	Dots cluster near a line going up-right
close to 0	No clear straight-line pattern: knowing x tells you very little about y	Dots scattered everywhere, no clear line
close to −1	Strong negative — when x goes up, y usually goes down	Dots cluster near a line going down-right

r ≈ +1 → strong positive | r ≈ −1 → strong negative | r ≈ 0 → no pattern

Think of r as a "pattern score." The closer to +1 or −1, the stronger and clearer the pattern.
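You won't compute r by hand, but a computer can do it in a few lines. Here is a minimal sketch of how the "pattern score" could be calculated, assuming r means the standard (Pearson) correlation coefficient. The function name and the data are made up for illustration and are not from the lesson.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Return the correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much each one varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up example data (an assumption, not real results):
hours = [1, 2, 3, 4, 5]           # hours studied
scores = [52, 74, 83, 101, 115]   # test scores

r = correlation_coefficient(hours, scores)
print(round(r, 2))  # prints 0.99, a strong positive pattern
```

Reversing the order of the scores (so more hours go with lower scores) would give an r close to −1 instead.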

Quick check

When r is close to 0, what does that mean? (r = correlation coefficient)

Strong positive pattern
Strong negative pattern
No clear linear pattern

Scatter plot: positive correlation

A scatter plot shows two things plotted against each other — each dot is one data point. This one shows hours studied (x) vs test score (y). You can see right away that more study time tends to mean a higher score.

Dots going up and to the right = positive correlation. The pattern doesn’t need to be a perfect line — as long as the trend is clear, it counts.

Quick check

On a scatter plot, dots going up and to the right suggest: (scatter plot = graph of (x,y) points)

Negative correlation
Positive correlation
No correlation

Line of best fit

The line of best fit is a straight line drawn through the middle of your scatter plot — it’s the line that gets as close as possible to all the dots at once. We use it to make predictions.

Its equation looks like y = mx + b, where m is the slope (how much y changes when x goes up by 1) and b is the y-intercept (the value of y when x = 0, where the line crosses the y-axis).

Example: If the line is score = 15 × hours + 40, then 3 hours of studying predicts a score of 15(3) + 40 = 85. Pretty useful!
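The same prediction can be written as a tiny function. This is only a sketch: the name predict_score is made up, and the slope 15 and intercept 40 come from the example line above.

```python
def predict_score(hours, slope=15, intercept=40):
    """Predict a test score using score = slope * hours + intercept."""
    return slope * hours + intercept

print(predict_score(3))  # prints 85, matching 15(3) + 40
print(predict_score(2))  # prints 70
```

Plugging in 0 hours gives 40, which is the y-intercept b.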

The red line is the line of best fit. The closer the dots are to the line, the stronger the correlation — and the more reliable your predictions.
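How does a computer find that line? A common approach is "least squares," which picks the m and b that make the squared vertical distances from the dots to the line as small as possible. The sketch below assumes that is what "as close as possible" means here. The function name and the data are made up for illustration.

```python
def line_of_best_fit(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b

# Made-up example data (an assumption, not real results):
hours = [1, 2, 3, 4, 5]
scores = [52, 74, 83, 101, 115]

m, b = line_of_best_fit(hours, scores)
print(round(m, 1), round(b, 1))  # prints 15.3 39.1
```

For this made-up data the fitted line is roughly score = 15.3 × hours + 39.1, close to the example line used in this lesson.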

Quick check

What is the line of best fit used for? (line of best fit = the straight line through the data)

Making predictions from the pattern
Counting how many dots there are
Drawing bar charts

When r is close to 0

Sometimes two things have no real connection. The dots on the scatter plot are all over the place — no clear direction up or down. That’s when r is close to 0.

Real-life example: Hours you spend on a hobby vs your math grade. For most people, there’s no consistent link — one doesn’t predict the other.

You can still draw a line of best fit, but it won’t mean much. A nearly flat line surrounded by scattered dots is the graph version of “I don’t see a pattern here.”
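To see what "a nearly flat line surrounded by scattered dots" looks like in numbers, here is a small sketch. The data is made up so that the grades bounce around with no upward or downward trend. The helper name is hypothetical, and the calculation assumes the standard correlation coefficient and a least-squares line.

```python
from math import sqrt

def r_and_slope(xs, ys):
    """Return (r, slope) for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy), sxy / sxx

# Made-up example data (an assumption, not real results):
hobby_hours = [1, 2, 3, 4, 5]
math_grades = [70, 85, 65, 85, 70]

r, slope = r_and_slope(hobby_hours, math_grades)
print(r, slope)  # prints 0.0 0.0
```

Both r and the slope come out to 0 for this data: the best-fit line is completely flat, so knowing the hobby hours does not help predict the grade.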

Important: just because two things happen at the same time doesn’t mean one causes the other. Correlation isn’t causation!

Quick check

Correlation tells you that two variables are related. Does that mean one causes the other? (causation = one thing causing the other)

Yes, always
No, correlation is not causation
Only when r is 1

Nice work — here's a recap

Correlation: Two things moving together — positive (both up), negative (one up, one down), or no pattern at all.
Correlation coefficient r: A number from −1 to +1. Close to ±1 = strong pattern. Close to 0 = no clear linear pattern.
Line of best fit: The straight line closest to all the dots. Equation: y = mx + b. Use it to predict y when you know x.
Use a scatter plot to see the relationship. Use r to measure how strong it is. Use the line to make predictions.

Head to the quiz and see what you remember!

Quick check

To predict y when you know x, you use: (x and y = the variables from the lesson)

The correlation coefficient r only
The number of data points
The line of best fit (e.g. y = mx + b)
