Any $m \times n$ matrix $A$ defines four subspaces. The column space and left null space live in $\mathbb{R}^m$. The row space and null space live in $\mathbb{R}^n$. The column space and left null space are orthogonal complements (together they span all of $\mathbb{R}^m$), and likewise for the row space and null space in $\mathbb{R}^n$.
The rank $r$ of the matrix sets the dimensions: column space and row space are each $r$-dimensional, the null space is $(n-r)$-dimensional, and the left null space is $(m-r)$-dimensional.
The singular value decomposition factors $A$ as

$$A = U \Sigma V^T,$$

where $U$ is $m \times m$ orthogonal, $\Sigma$ is $m \times n$ and diagonal with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ (all other entries zero), and $V$ is $n \times n$ orthogonal. The columns of $U$ are the left singular vectors; the columns of $V$ are the right singular vectors. These aren't arbitrary orthonormal bases: they are matched pairs that diagonalize the action of $A$:

$$A v_i = \sigma_i u_i, \qquad i = 1, \dots, r.$$
Each right singular vector maps to the corresponding left singular vector, scaled by the singular value. The right singular vectors with nonzero $\sigma$ form an orthonormal basis for the row space. The left singular vectors with nonzero $\sigma$ form an orthonormal basis for the column space. The remaining $n-r$ right singular vectors span the null space, and the remaining $m-r$ left singular vectors span the left null space.
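The subspace bookkeeping above can be checked numerically. The following is a minimal sketch using NumPy, assuming a small random rank-2 matrix as a stand-in for $A$; the variable names and the rank tolerance are illustrative choices, not part of the original discussion.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 5, 4, 2
# Build a random m x n matrix of rank r as a product of two thin factors.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# full_matrices=True returns U (m x m) and Vt (n x n); s holds min(m, n) values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
V = Vt.T

# Numerical rank: count singular values above a relative tolerance (assumed here).
rank = int(np.sum(s > 1e-10 * s[0]))

col_space = U[:, :rank]    # orthonormal basis of the column space, in R^m
left_null = U[:, rank:]    # orthonormal basis of the left null space, in R^m
row_space = V[:, :rank]    # orthonormal basis of the row space, in R^n
null_space = V[:, rank:]   # orthonormal basis of the null space, in R^n

# Matched pairs: A v_i = sigma_i u_i for each nonzero singular value.
for i in range(rank):
    assert np.allclose(A @ V[:, i], s[i] * U[:, i])

# The leftover vectors are annihilated by A and by A^T respectively.
assert np.allclose(A @ null_space, 0)
assert np.allclose(A.T @ left_null, 0)

print(rank, col_space.shape[1], row_space.shape[1],
      null_space.shape[1], left_null.shape[1])
```

The printed dimensions should follow the pattern $r$, $r$, $r$, $n-r$, $m-r$ described above.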
Applying $A^T$ gives the same pairing and gains, but the arrows reverse:

$$A^T u_i = \sigma_i v_i, \qquad i = 1, \dots, r.$$
Right-multiplying maps from input (row) space to output (column) space. Left-multiplying (via $A^T$) reverses the direction. The pairing and gains are invariant — only the arrow flips. Toggle the diagram below to see the reversal:
The SVD lets you write $A$ as a sum of rank-1 matrices:

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T.$$
The Eckart-Young theorem says the best rank-$k$ approximation (in Frobenius or operator norm) is the SVD truncation $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$: keep the $k$ terms with largest $\sigma$. Each term says: project the input onto row-space direction $v_i$, scale by $\sigma_i$, output in column-space direction $u_i$. The residual error is:

$$\|A - A_k\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}, \qquad \|A - A_k\|_2 = \sigma_{k+1}.$$
If singular values decay rapidly, the data is effectively low-dimensional and the truncation loses almost nothing.
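To make the truncation concrete, here is a short NumPy sketch, assuming a random test matrix and a hypothetical helper named `truncate`. It checks that the Frobenius error of the rank-$k$ truncation equals the root sum of squares of the discarded singular values, and that the operator-norm error equals $\sigma_{k+1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))   # arbitrary test matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)


def truncate(U, s, Vt, k):
    """Sum of the first k rank-1 terms sigma_i * u_i * v_i^T."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


k = 2
A_k = truncate(U, s, Vt, k)

fro_err = np.linalg.norm(A - A_k, "fro")
op_err = np.linalg.norm(A - A_k, 2)

# Frobenius error: root sum of squares of the discarded singular values.
assert np.isclose(fro_err, np.sqrt(np.sum(s[k:] ** 2)))
# Operator-norm error: the first discarded singular value (s[k] is sigma_{k+1}).
assert np.isclose(op_err, s[k])

# Share of the total squared Frobenius norm kept by the first k terms.
print(np.sum(s[:k] ** 2) / np.sum(s ** 2))
```

The printed ratio is one simple way to judge how quickly the singular values decay; values near 1 mean the truncation discards very little.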
This is where the subspace geometry connects to demography. The key reference is Clark (2015), which develops a general SVD-based component model for age-correlated demographic quantities.
Organize a matrix $A$ so that each column is one population-year's age schedule and each row is an age group. Each row is a point in $\mathbb{R}^H$ (where $H$ = number of population-years). The SVD identifies the primary directions in this cloud of age-group points.
The critical observation: because mortality (or fertility) at each age is similar across population-years, every age-group point has nearly equal coordinates, so the points cluster along a line through the origin with slope $\approx 1$ in each pair of coordinates. The first right singular vector $v_1$ points from the origin through the cloud and captures level. How far each age-group point sits along $v_1$ encodes the canonical age schedule $u_1$.
For log mortality: old-age points (age 85, high mortality, small negative log) are near the origin; childhood points (age 5, low mortality, large negative log) are far from the origin. The spacing along $v_1$ traces the bathtub curve of mortality.
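The geometry described here can be illustrated on synthetic data. The sketch below is a toy under stated assumptions: the age schedule, the age sensitivities, the level drift, and the noise level are all made-up numbers chosen to resemble log mortality, not real demographic data or values from Clark (2015).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: 18 age groups (0, 5, ..., 85) by 30 population-years.
ages = np.arange(0, 90, 5)
n_years = 30

# Made-up log-mortality schedule: Gompertz-style rise plus an infant bump.
a_x = -9.5 + 0.09 * ages + 3.5 * np.exp(-ages / 2.0)
b_x = np.linspace(1.3, 0.7, ages.size)    # made-up age sensitivity
k_t = np.linspace(0.3, -0.3, n_years)     # made-up level drift across years
noise = 0.02 * rng.standard_normal((ages.size, n_years))

# Rows are age groups, columns are population-years.
A = a_x[:, None] + np.outer(b_x, k_t) + noise

# Uncentered SVD of the raw log rates.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
u1, v1 = U[:, 0], Vt[0, :]
if v1.sum() < 0:          # singular vectors are defined up to a joint sign flip
    u1, v1 = -u1, -v1

# Entries of v1 are nearly equal: the cloud lies close to the all-ones direction.
spread = (v1.max() - v1.min()) / v1.mean()

# A v1 = sigma_1 u1; dividing by sqrt(H) puts it back on the log-rate scale.
schedule = s[0] * u1 / np.sqrt(n_years)

print("relative spread of v1 entries:", spread)
print("age nearest the origin:", ages[np.argmax(schedule)])
print("age farthest from the origin:", ages[np.argmin(schedule)])
```

On this toy matrix the recovered `schedule` stays close to the made-up `a_x`, with the oldest age group nearest the origin and the childhood group farthest from it.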
Toggle the visualization above between "Uncentered" and "Centered" to see the core difference.
Put precisely: the Lee-Carter component $b_x \cdot k_t$ plays the role of the second SVD-Comp component, the dominant separable departure from the canonical age schedule. The correspondence is approximate rather than exact, because centering subtracts the row means $a_x$ instead of the first singular component. The full demographic rate structure lives in the SVD-Comp decomposition but is split across two unrelated objects ($a_x$ and the SVD residual) in Lee-Carter.
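The contrast between the centered and uncentered decompositions can be tried side by side. This is a hedged sketch on synthetic data: the generating values (`true_a`, `true_b`, `true_k`, the noise level) are invented, and the normalization $\sum_x b_x = 1$ is the usual Lee-Carter convention, assumed here rather than taken from the text.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical toy log-mortality matrix: rows = ages, columns = years.
ages = np.arange(0, 90, 5)
n_years = 30
true_a = -9.5 + 0.09 * ages + 3.5 * np.exp(-ages / 2.0)   # made-up schedule
true_b = np.linspace(1.3, 0.7, ages.size)                 # made-up sensitivity
true_k = np.linspace(0.3, -0.3, n_years)                  # made-up drift
noise = 0.02 * rng.standard_normal((ages.size, n_years))
A = true_a[:, None] + np.outer(true_b, true_k) + noise

# Lee-Carter style (centered): subtract the age-specific means, then SVD.
a_x = A.mean(axis=1)
Uc, sc, Vtc = np.linalg.svd(A - a_x[:, None], full_matrices=False)
scale = Uc[:, 0].sum()
b_x = Uc[:, 0] / scale               # normalized so that sum(b_x) = 1
k_t = sc[0] * Vtc[0, :] * scale      # sums to ~0 because the rows were centered
lc_fit = a_x[:, None] + np.outer(b_x, k_t)

# SVD-Comp style (uncentered): keep the first two matched pairs of the raw matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
comp_fit = (U[:, :2] * s[:2]) @ Vt[:2, :]

# Alignment between the centered first age pattern and the uncentered second one.
cosine = abs(Uc[:, 0] @ U[:, 1])

print("Lee-Carter fit error:", np.linalg.norm(A - lc_fit))
print("two-component SVD-Comp fit error:", np.linalg.norm(A - comp_fit))
print("|cos| between centered u_1 and uncentered u_2:", cosine)
```

Both fits reproduce the toy matrix to within the noise. The printed cosine shows how closely the two age patterns line up on this particular data; it is not guaranteed to be near 1 in general.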
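Each column (population-year) of the data matrix can be written:

$$A_{\cdot h} = \sum_{i=1}^{r} v_{h,i} \, \Lambda_i,$$

where $A_{\cdot h}$ is column $h$ of $A$, $v_{h,i}$ is the $h$-th entry of $v_i$, and $\Lambda_i = s_i \cdot u_i$ are the age-varying components ($s_i$ is the $i$-th singular value, written $\sigma_i$ elsewhere in this text). Taking the first $c$ terms gives a $c$-parameter model. The weights $v_{h,i}$ are the parameters: typically just 2 or 3 numbers capture an entire age schedule.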
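As an illustration of the component model, the sketch below rebuilds one column from its first $c$ weights. The helper names `svd_components` and `reconstruct_column` are hypothetical, and the matrix is a random placeholder rather than real age schedules.

```python
import numpy as np


def svd_components(A):
    """Return (Lambda, weights): Lambda[:, i] = s_i * u_i, weights[h, i] = v_{h,i}."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U * s, Vt.T


def reconstruct_column(Lambda, weights, h, c):
    """Approximate column h of A as the sum of its first c weighted components."""
    return Lambda[:, :c] @ weights[h, :c]


rng = np.random.default_rng(5)
# Placeholder matrix (6 "ages" by 4 "years") with negative, log-rate-like entries.
A = -rng.uniform(1.0, 9.0, size=(6, 4))

Lambda, weights = svd_components(A)
h = 2
full = reconstruct_column(Lambda, weights, h, c=4)     # all components: exact
approx = reconstruct_column(Lambda, weights, h, c=2)   # two-parameter version

print("error with c=4:", np.linalg.norm(A[:, h] - full))
print("error with c=2:", np.linalg.norm(A[:, h] - approx))
```

With all components the column is recovered exactly (up to rounding). With $c = 2$ the whole schedule is described by the two numbers `weights[h, :2]`.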
Right-multiplying $A$ by $v_1$ (projecting the data onto the dominant year-direction) gives $\sigma_1 u_1$ — the canonical age schedule, scaled by overall level. You're asking: "across all years, what is the dominant age pattern of mortality?"
Left-multiplying $A$ by $u_1^T$, or equivalently applying $A^T$ to $u_1$ (projecting the data onto the canonical age pattern), gives $\sigma_1 v_1$: the year-to-year level index. You're asking: "given this canonical age shape, how does overall mortality level track across years?"
Same question, two directions. The first lives in age-space, the second in year-space, and $\sigma_1$ is the coupling strength between them.
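Both directions can be verified in a few lines. This is a minimal sketch with a placeholder matrix standing in for an ages-by-years table of log rates; the names `age_pattern` and `level_index` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
# Placeholder "ages x years" matrix with negative, log-rate-like entries.
A = -rng.uniform(1.0, 9.0, size=(6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
u1, v1, sigma1 = U[:, 0], Vt[0, :], s[0]

age_pattern = A @ v1      # lives in age-space: equals sigma_1 * u_1
level_index = A.T @ u1    # lives in year-space: equals sigma_1 * v_1

assert np.allclose(age_pattern, sigma1 * u1)
assert np.allclose(level_index, sigma1 * v1)

# The gain coupling the two directions is recovered as u_1^T A v_1.
coupling = u1 @ A @ v1
print(coupling, sigma1)
```

The two printed numbers agree, which is the sense in which $\sigma_1$ couples the age-space and year-space views.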
The SVD of a matrix $A$ simultaneously provides orthonormal bases for all four fundamental subspaces and pairs the row-space basis with the column-space basis through the singular values. Each row-space direction $v_i$ has a partner column-space direction $u_i$, and $\sigma_i$ is the gain along that pair. This pairing is the key insight: the SVD ties an orthonormal input basis to an orthonormal output basis, one gain per pair, for any rectangular matrix.
For demographic age schedules:
SVD-Comp (uncentered) preserves the full geometric structure. The first matched pair captures level × canonical age structure. Subsequent pairs capture separable age-specific deviations from canonical. Each column of the data matrix is a weighted sum of the $u_i$ components, with weights from the corresponding $v_i$. Typically 2–3 weights suffice.
Lee-Carter (centered) discards the level information by subtracting the mean. The SVD then operates on residuals, so its first component captures the dominant deviation pattern rather than the canonical structure. This is fine for forecasting $k_t$ as a time series, but it loses the clean separation between "what mortality looks like on average" and "how it's changing differentially by age."