Data and fitted lines: scroll to zoom · drag to pan · double-click to reset
Your data points are grey. The black line is the best fit. Solid coloured lines are lines you tested; dashed coloured lines show where each gradient-descent run ended. Tick "show errors" to see the vertical gaps between the points and the best-fit line. These gaps are the errors that the cost squares and averages.
3D cost surface: left-drag to rotate · right-drag to pan · scroll to zoom · double-click to reset
The cost drawn as a landscape. Each spot on the floor is one possible line (its slope and intercept), and the height above it is that line's cost. The bottom of the bowl, marked with the black dot, is the best-fitting line. Coloured stems mark your test lines; coloured curves are the descent paths. Drag to rotate, scroll to zoom.
Cost surface J(slope, intercept): click to add a test · scroll to zoom · drag to pan
A top-down map of the same bowl. Every point is one line; the colour and the contour rings show its cost (warm colours = high cost, cool = low). The white dot is the best fit, diamonds are your test lines, and the curves are the route gradient descent takes. The route bends because the bowl is a long, narrow valley: descent first drops across the valley, then crawls along its floor to the lowest point. Clicking anywhere adds that line as a test.
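The map described above can be imitated in a few lines of Python. This is only a sketch under stated assumptions, not the explorer's own code: the four data points, the grid spacing, and the names `cost`, `surface`, and `lowest` are all made up for illustration, and the cost is assumed to be the plain average of squared errors.

```python
def cost(m, b, xs, ys):
    """Average squared vertical gap between the line y = m*x + b and the points."""
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data lying exactly on y = 2x + 1 (illustration only)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# A coarse grid of candidate lines: every (slope, intercept) pair is one line
slopes = [i * 0.5 for i in range(0, 9)]        # 0.0, 0.5, ..., 4.0
intercepts = [j * 0.5 for j in range(-2, 7)]   # -1.0, -0.5, ..., 3.0

# surface[r][c] is the cost of the line with intercepts[r] and slopes[c]
surface = [[cost(m, b, xs, ys) for m in slopes] for b in intercepts]

# The lowest cell of the grid plays the role of the white "best fit" dot
lowest = min(
    (surface[r][c], slopes[c], intercepts[r])
    for r in range(len(intercepts))
    for c in range(len(slopes))
)
print(lowest)  # (0.0, 2.0, 1.0): cost 0 at slope 2, intercept 1

# Two test lines at the same distance from the best fit (slope +0.5, intercept -1 or +1)
along_valley = cost(2.5, 0.0, xs, ys)    # 0.375: raising the slope while lowering the intercept
across_valley = cost(2.5, 2.0, xs, ys)   # 3.375: raising both at once
print(along_valley, across_valley)
```

For this toy data the two test lines are equally far from the best fit on the map, yet one costs nine times as much as the other. That lopsidedness is the long, narrow valley: the cost rises steeply in one direction and gently in the other.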
Gradient descent convergence: cost vs iteration · scroll to zoom · drag to pan
The cost after each step of gradient descent (the vertical scale counts in powers of ten). A falling curve means the line is getting better; levelling out onto the dashed "optimum" line means it has reached the best fit. A curve that climbs instead means the learning rate is too large and the method is diverging.
Best fit (least squares)
Data
Test a line
Gradient descent
Lines and runs
Reading the plots
How it works
A line is written y = m x + b, where m is the slope and b is the intercept. For each data point the "error" is the vertical gap between the line and the point. The cost squares every error and averages them, so one big miss counts a lot:
J(m, b) = (1/n) · Σ (m·xᵢ + b − yᵢ)²
where n is the number of data points and the sum runs over all points (xᵢ, yᵢ).
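As a rough sketch in Python (not the explorer's own code), the cost described above could be computed as follows. The function name `cost` and the data points are hypothetical, and the plain average with no extra factor of 1/2 is an assumption based on the wording "squares every error and averages them".

```python
def cost(m, b, xs, ys):
    """Average of the squared vertical gaps between y = m*x + b and the data."""
    errors = [(m * x + b) - y for x, y in zip(xs, ys)]
    return sum(e * e for e in errors) / len(errors)

# Hypothetical data lying exactly on y = 2x + 1 (illustration only)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

perfect = cost(2.0, 1.0, xs, ys)       # the line passes through every point
off_by_one = cost(2.0, 2.0, xs, ys)    # the line misses every point by 1
one_big_miss = cost(2.0, 1.0, xs, [1.0, 3.0, 5.0, 11.0])  # one point missed by 4

print(perfect, off_by_one, one_big_miss)  # 0.0 1.0 4.0
```

In the last two cases the gaps add up to the same total of 4, but the single miss of 4 costs four times as much as four misses of 1. This is the squaring at work.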
Gradient descent starts from a guess and repeatedly nudges m and b a little in the direction that lowers the cost. How big each nudge is depends on the learning rate α:
m ← m − α · ∂J/∂m
b ← b − α · ∂J/∂b
A small α learns slowly but safely; too large an α overshoots and the cost blows up. Because this cost valley is long and narrow, the path naturally bends rather than heading straight to the bottom.
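The update rule and the effect of the learning rate can be tried out with a minimal Python sketch. It assumes the cost is the plain average of squared errors, so that ∂J/∂m = (2/n) · Σ error · x and ∂J/∂b = (2/n) · Σ error. The data, the starting guess of m = 0 and b = 0, the two values of α, and all function names are hypothetical choices for illustration, not taken from the explorer.

```python
def cost(m, b, xs, ys):
    """Average squared error of the line y = m*x + b."""
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(m, b, xs, ys):
    """Partial derivatives of the average squared error with respect to m and b."""
    n = len(xs)
    errors = [m * x + b - y for x, y in zip(xs, ys)]
    dJ_dm = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    dJ_db = 2.0 / n * sum(errors)
    return dJ_dm, dJ_db

def gradient_descent(xs, ys, alpha, steps, m=0.0, b=0.0):
    """Repeat the two updates; return the final m, b and the cost after each step."""
    history = [cost(m, b, xs, ys)]
    for _ in range(steps):
        dJ_dm, dJ_db = gradient(m, b, xs, ys)
        # Both parameters are updated together from the same gradient
        m, b = m - alpha * dJ_dm, b - alpha * dJ_db
        history.append(cost(m, b, xs, ys))
    return m, b, history

# Hypothetical data lying exactly on y = 2x + 1 (illustration only)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# A small learning rate: slow but safe
m_ok, b_ok, hist_ok = gradient_descent(xs, ys, alpha=0.05, steps=2000)
print(round(m_ok, 6), round(b_ok, 6))  # close to slope 2 and intercept 1

# A learning rate that is too large for this data: the cost grows instead of shrinking
m_bad, b_bad, hist_bad = gradient_descent(xs, ys, alpha=0.5, steps=20)
print(hist_bad[0], hist_bad[-1])
```

The `history` list is the quantity a convergence plot would show: it falls towards zero for the small α and climbs for the large one. Which values of α count as "too large" depends on the data, so the 0.05 and 0.5 used here apply only to this toy example.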
Run it in MATLAB
This explorer began life as a MATLAB GUI. Download the original script linear_regression_cost_explorer.m to open and run it in MATLAB.