The SVD as subspace geometry

From row-space / column-space projections to demographic age schedules.
How SVD-Comp (Clark, 2015) differs from Lee-Carter (1992) — and why it matters for interpretation.
See also Clark (2019) for the full SVD-Comp mortality model.

Samuel J. Clark
March 2026

1. The four fundamental subspaces

Any $m \times n$ matrix $A$ defines four subspaces. The column space and left null space live in $\mathbb{R}^m$. The row space and null space live in $\mathbb{R}^n$. The column space and left null space are orthogonal complements (together they span all of $\mathbb{R}^m$), and likewise for the row space and null space in $\mathbb{R}^n$.

The rank $r$ of the matrix sets the dimensions: column space and row space are each $r$-dimensional, the null space is $(n-r)$-dimensional, and the left null space is $(m-r)$-dimensional.
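The dimension counts can be checked numerically. The following is a minimal sketch using NumPy; the helper name `subspace_dimensions` and the example matrix are illustrative choices, not part of the original discussion.

```python
import numpy as np

def subspace_dimensions(A, tol=1e-10):
    """Return the dimensions of the four fundamental subspaces of A."""
    m, n = A.shape
    r = int(np.linalg.matrix_rank(A, tol=tol))
    return {
        "column space": r,        # lives in R^m
        "left null space": m - r, # lives in R^m
        "row space": r,           # lives in R^n
        "null space": n - r,      # lives in R^n
    }

# Illustrative 4 x 3 matrix of rank 2:
# row 2 = 2 * row 1, and row 1 = row 3 + 2 * row 4
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

dims = subspace_dimensions(A)
print(dims)
```

The two dimensions in $\mathbb{R}^m$ add up to $m = 4$ and the two in $\mathbb{R}^n$ add up to $n = 3$, as the orthogonal-complement statement requires.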

2. What SVD gives you — matched pairs

$$A = U \Sigma V^T$$

$U$ is $m \times m$ orthogonal, $\Sigma$ is $m \times n$ diagonal with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, and $V$ is $n \times n$ orthogonal. The columns of $U$ are the left singular vectors; the columns of $V$ are the right singular vectors. These aren't arbitrary orthonormal bases — they are matched pairs that diagonalize the action of $A$:

$$Av_i = \sigma_i u_i$$

Each right singular vector maps to the corresponding left singular vector, scaled by the singular value. The right singular vectors with nonzero $\sigma$ form an orthonormal basis for the row space. The left singular vectors with nonzero $\sigma$ form an orthonormal basis for the column space. The remaining singular vectors fill out the null space and left null space respectively.

The geometric picture: $A$ takes the unit sphere in $\mathbb{R}^n$ to a hyperellipsoid in $\mathbb{R}^m$. The $v_i$ are the input sphere's axes that map onto the output ellipsoid's axes (the $u_i$). The singular values $\sigma_i$ are the ellipsoid's axis lengths — how much $A$ stretches along each matched direction.
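A small numerical sketch of the matched pairs and the ellipsoid picture follows. The matrix is random and the seed is arbitrary; both are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed
A = rng.normal(size=(5, 3))              # illustrative 5 x 3 matrix

# full_matrices=True returns U (5 x 5), s (length 3), Vt (3 x 3).
# Rows of Vt are the right singular vectors v_i.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Matched pairs: A v_i = sigma_i u_i for each singular value
pair_residuals = [np.linalg.norm(A @ Vt[i] - s[i] * U[:, i])
                  for i in range(len(s))]

# Ellipsoid picture: the image of a unit vector has length
# between the smallest and largest singular value
x = rng.normal(size=(3, 1000))
x /= np.linalg.norm(x, axis=0)           # 1000 points on the unit sphere in R^3
lengths = np.linalg.norm(A @ x, axis=0)

print("pair residuals:", pair_residuals)
print("image lengths range:", lengths.min(), lengths.max())
print("singular values:", s)
```

The residuals should be at floating-point noise level, and every sampled image length should fall between `s[-1]` and `s[0]`, the shortest and longest axes of the ellipsoid.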

3. The directional flip

The same pairing and gains, but the arrows reverse:

$$Av_i = \sigma_i u_i \qquad \text{(row space} \to \text{column space)}$$ $$A^T u_i = \sigma_i v_i \qquad \text{(column space} \to \text{row space)}$$

Applying $A$ to a vector maps from input (row) space to output (column) space. Applying $A^T$ reverses the direction; equivalently, left-multiplying $A$ by a row vector gives $u_i^T A = \sigma_i v_i^T$. The pairing and gains are invariant; only the arrow flips.
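Both directions of the flip can be verified for all pairs at once in matrix form. This is a sketch on an arbitrary random matrix, not tied to any particular dataset.

```python
import numpy as np

rng = np.random.default_rng(2)                      # arbitrary seed
A = rng.normal(size=(4, 6))                         # illustrative 4 x 6 matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)    # U: 4x4, s: 4, Vt: 4x6
V = Vt.T

forward = A @ V       # column i is A v_i      (row space -> column space)
backward = A.T @ U    # column i is A^T u_i    (column space -> row space)
row_form = U.T @ A    # row i is u_i^T A, the left-multiplication form

forward_ok = np.allclose(forward, U * s)            # A v_i = sigma_i u_i
backward_ok = np.allclose(backward, V * s)          # A^T u_i = sigma_i v_i
row_form_ok = np.allclose(row_form, s[:, None] * Vt)

print(forward_ok, backward_ok, row_form_ok)
```

The same vector `s` of gains appears in all three checks, which is the sense in which only the arrow changes.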

4. The rank-$k$ approximation

The SVD lets you write $A$ as a sum of rank-1 matrices:

$$A = \sum_{i=1}^{r} \sigma_i \, u_i \, v_i^T$$

The Eckart-Young theorem says the best rank-$k$ approximation (in Frobenius or operator norm) is the SVD truncation $A_k = \sum_{i=1}^{k} \sigma_i \, u_i \, v_i^T$: keep the $k$ terms with largest $\sigma$. Each term says: project the input onto row-space direction $v_i$, scale by $\sigma_i$, output in column-space direction $u_i$. The squared Frobenius error of the truncation is:

$$\|A - A_k\|_F^2 = \sigma_{k+1}^2 + \sigma_{k+2}^2 + \cdots + \sigma_r^2$$

If singular values decay rapidly, the data is effectively low-dimensional and the truncation loses almost nothing.
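The error formula can be checked directly. The sketch below uses a random matrix and a hypothetical helper `truncated_svd`; it also checks the operator-norm counterpart, where the error equals $\sigma_{k+1}$.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k SVD truncation of A and all singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k]    # sum of the first k rank-1 terms
    return A_k, s

rng = np.random.default_rng(1)           # arbitrary seed
A = rng.normal(size=(6, 4))              # illustrative matrix
k = 2
A_k, s = truncated_svd(A, k)

frob_err_sq = np.linalg.norm(A - A_k, "fro") ** 2
tail_sq = np.sum(s[k:] ** 2)             # sigma_{k+1}^2 + ... + sigma_r^2
op_err = np.linalg.norm(A - A_k, 2)      # operator norm of the residual

# Cumulative share of squared Frobenius norm captured by the first terms
explained = np.cumsum(s ** 2) / np.sum(s ** 2)

print(frob_err_sq, tail_sq)
print(op_err, s[k])
print(explained)
```

A random matrix has slowly decaying singular values, so `explained` rises gradually here; for data that is effectively low-dimensional it would approach 1 after the first few terms.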

5. SVD-Comp — the uncentered SVD for demographic age schedules

This is where the subspace geometry connects to demography. The key reference is Clark (2015), which develops a general SVD-based component model for age-correlated demographic quantities.

Organize a matrix $A$ so that each column is one population-year's age schedule and each row is an age group. Each row is a point in $\mathbb{R}^H$ (where $H$ = number of population-years). The SVD identifies the primary directions in this cloud of age-group points.

Why the uncentered cloud works for demography

The critical observation: because mortality (or fertility) at each age is similar across population-years, the age-group points cluster on a line from the origin with slope $\approx 1$. The first right singular vector $v_1$ points from the origin through the cloud — it captures level. How far each age-group point sits along $v_1$ encodes the canonical age schedule $u_1$.

For log mortality: old-age points (age 85, high mortality, small negative log) are near the origin; childhood points (age 5, low mortality, large negative log) are far from the origin. The spacing along $v_1$ traces the bathtub curve of mortality.

Figure: each point is an age group. X = log mortality in year 1, Y = log mortality in year 2. Points cluster on a line from the origin (slope ≈ 1) because mortality at each age is similar across years. $v_1$ points from the origin through the cloud; this is the level direction.
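The geometry can be reproduced on synthetic data. In the sketch below, the generator `make_log_mortality`, its age schedule, its decline pattern, and its noise level are all made-up assumptions chosen only to resemble log mortality; they are not real data and not taken from Clark (2015).

```python
import numpy as np

def make_log_mortality(n_years=20, seed=0):
    """Hypothetical log-mortality matrix: rows = age groups, columns = years."""
    rng = np.random.default_rng(seed)
    # Made-up log death rates at ages 0, 5, 15, 25, ..., 85 (not real data)
    base = np.array([-3.5, -7.5, -7.0, -6.5, -6.0, -5.3, -4.5, -3.7, -2.8, -1.9])
    # Made-up age pattern of decline, larger at young ages
    decline = np.array([1.0, 1.0, 0.8, 0.6, 0.5, 0.4, 0.4, 0.3, 0.3, 0.2])
    trend = np.linspace(0.0, -1.0, n_years)   # mortality falls over time
    noise = 0.02 * rng.normal(size=(base.size, n_years))
    return base[:, None] + np.outer(decline, trend) + noise

A = make_log_mortality()
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # uncentered SVD

# Singular vectors are defined up to sign; make v_1 positive so that
# u_1 carries the (negative) sign of log mortality
if Vt[0].sum() < 0:
    U[:, 0], Vt[0] = -U[:, 0], -Vt[0]
u1, v1 = U[:, 0], Vt[0]

share = s[0] ** 2 / np.sum(s ** 2)        # weight of the first component
spread = v1.max() / v1.min()              # near 1 means a near-constant v_1
corr = np.corrcoef(u1, A.mean(axis=1))[0, 1]

print("share of first component:", share)
print("max/min entry of v_1:", spread)
print("corr(u_1, mean age schedule):", corr)
```

In this synthetic setting the first component carries almost all of the squared norm, the entries of $v_1$ are all positive and of similar size (the slope ≈ 1 line), and $u_1$ tracks the mean age schedule.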

6. Lee-Carter vs. SVD-Comp — what centering costs you

The core difference shows up in the same cloud of age-group points, viewed uncentered versus centered.

Lee-Carter subtracts the mean age profile $a_x$ before applying SVD. This translates the cloud to the origin. Now $v_1$ aligns with within-cloud variance (the dominant deviation pattern), not level. The origin-to-cloud information is absorbed into $a_x$ and lost from the decomposition. The Lee-Carter model is:

$$\ln(m_{x,t}) = a_x + b_x k_t + \varepsilon_{x,t}$$

where $b_x$ and $k_t$ come from the rank-1 SVD of the residual after subtracting $a_x$.

SVD-Comp works on the uncentered matrix. The first term $\sigma_1 u_1 v_1^T$ is level × canonical age structure. Subsequent terms are separable age-specific deviations from canonical. Each column of the data matrix is a weighted sum of the $u_i$ components (Equation 10 of Clark 2015), with weights from the corresponding $v_i$. Typically 2–3 weights suffice to reproduce an entire age schedule.

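Put more precisely: the Lee-Carter component $b_x \cdot k_t$ corresponds to the second SVD-Comp component, the dominant separable departure from the canonical age schedule. The correspondence is exact when $v_1$ is constant across population-years, because subtracting $a_x$ then removes exactly the first term $\sigma_1 u_1 v_1^T$; otherwise it is approximate. The full demographic rate structure lives in the SVD-Comp decomposition but is split across two unrelated objects ($a_x$ and the SVD residual) in Lee-Carter.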
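The relationship between the two decompositions can be illustrated on synthetic data. The sketch below is deliberately engineered: it assumes a deviation pattern orthogonal to the base age schedule and a time index that sums to zero, which forces $v_1$ to be exactly constant across years. Under that assumption, centering removes exactly the first uncentered term and the Lee-Carter pair coincides with the second SVD-Comp pair; with real data $v_1$ is only approximately constant and the match is approximate. All numbers are made up, and the Lee-Carter normalization (which only rescales $b_x$ and $k_t$) is omitted.

```python
import numpy as np

# Made-up age schedule and deviation pattern (not real data)
base = np.array([-3.5, -7.5, -7.0, -6.5, -6.0, -5.3, -4.5, -3.7, -2.8, -1.9])
dev = np.array([1.0, 1.0, 0.8, 0.6, 0.5, 0.4, 0.4, 0.3, 0.3, 0.2])

# ASSUMPTION: remove from dev its projection onto base, and use a time
# index that sums to zero. Together these make v_1 exactly constant.
dev = dev - (dev @ base) / (base @ base) * base
k_true = np.linspace(0.5, -0.5, 20)
A = base[:, None] + np.outer(dev, k_true)     # rows = ages, columns = years

# SVD-Comp style: SVD of the uncentered matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Lee-Carter style: subtract the mean age profile a_x, then rank-1 SVD
a_x = A.mean(axis=1)
Uc, sc, Vtc = np.linalg.svd(A - a_x[:, None], full_matrices=False)
b_x, k_t = Uc[:, 0], sc[0] * Vtc[0]           # unnormalized b_x and k_t

def abs_cos(p, q):
    """Absolute cosine between two vectors (sign-invariant alignment)."""
    return abs(p @ q) / (np.linalg.norm(p) * np.linalg.norm(q))

v1_constant = np.allclose(Vt[0], Vt[0][0])
age_alignment = abs_cos(b_x, U[:, 1])         # b_x versus u_2
time_alignment = abs_cos(k_t, Vt[1])          # k_t versus v_2

print("v_1 constant:", v1_constant)
print("|cos(b_x, u_2)|:", age_alignment)
print("|cos(k_t, v_2)|:", time_alignment)
print("sigma_2 (uncentered) vs sigma_1 (centered):", s[1], sc[0])
```

Removing the orthogonalization step, or adding noise, gives a way to see how far the alignments drift from 1 when $v_1$ is no longer exactly constant.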

The column reconstruction — Equation 10 of Clark (2015)

Each column (population-year) of the data matrix can be written:

$$x_h = \sum_{i=1}^{\rho} v_{h,i} \cdot s_i \, u_i = \sum_{i=1}^{\rho} v_{h,i} \cdot \Lambda_i$$

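where $\Lambda_i = s_i \cdot u_i$ are the age-varying components, $s_i$ is the singular value written $\sigma_i$ above, $\rho$ is the rank (written $r$ above), and $v_{h,i}$ is the $h$-th entry of $v_i$. Taking the first $c$ terms gives a $c$-parameter model. The weights $v_{h,i}$ are the parameters: typically just 2 or 3 numbers capture an entire age schedule.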
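The column reconstruction can be sketched as follows. The data generator `make_log_mortality` and the helper `reconstruct_column` are hypothetical names introduced for illustration, and the synthetic schedule is made up rather than real mortality data.

```python
import numpy as np

def make_log_mortality(n_years=20, seed=0):
    """Hypothetical log-mortality matrix: rows = age groups, columns = years."""
    rng = np.random.default_rng(seed)
    # Made-up log death rates at ages 0, 5, 15, 25, ..., 85 (not real data)
    base = np.array([-3.5, -7.5, -7.0, -6.5, -6.0, -5.3, -4.5, -3.7, -2.8, -1.9])
    decline = np.array([1.0, 1.0, 0.8, 0.6, 0.5, 0.4, 0.4, 0.3, 0.3, 0.2])
    trend = np.linspace(0.0, -1.0, n_years)
    noise = 0.02 * rng.normal(size=(base.size, n_years))
    return base[:, None] + np.outer(decline, trend) + noise

A = make_log_mortality()
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Lambda = U * s                        # column i is Lambda_i = s_i * u_i

def reconstruct_column(h, c):
    """First c terms of x_h = sum_i v_{h,i} * Lambda_i."""
    weights = Vt[:c, h]               # the c parameters v_{h,1..c}
    return Lambda[:, :c] @ weights

h = 7                                 # arbitrary population-year (column index)
errors = [np.linalg.norm(A[:, h] - reconstruct_column(h, c))
          for c in range(1, s.size + 1)]

for c, err in enumerate(errors[:4], start=1):
    print(f"c = {c}: reconstruction error {err:.4f}")
```

Because the $\Lambda_i$ are mutually orthogonal, the error for a single column cannot increase as terms are added, and it reaches zero (up to rounding) once all $\rho$ terms are included.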

Figure: data and reconstruction for a selected population-year, with components $\Lambda_1$ (canonical), $\Lambda_2$ (deviation), and $\Lambda_3$ (residual).

7. The directional flip in demographic terms

Right-multiplying $A$ by $v_1$ (projecting the data onto the dominant year-direction) gives $\sigma_1 u_1$ — the canonical age schedule, scaled by overall level. You're asking: "across all years, what is the dominant age pattern of mortality?"

Applying $A^T$ to $u_1$ (projecting the data onto the canonical age pattern; equivalently, left-multiplying $A$ by $u_1^T$) gives $\sigma_1 v_1$, the year-to-year level index. You're asking: "given this canonical age shape, how does overall mortality level track across years?"

Same question, two directions. The first lives in age-space, the second in year-space, and $\sigma_1$ is the coupling strength between them.
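Both projections can be computed on synthetic data to see what each returns. As before, `make_log_mortality` is a hypothetical generator with made-up numbers. One detail to keep in mind is the sign convention: with $v_1$ chosen positive, $u_1$ is negative like log mortality itself, so the index $\sigma_1 v_1$ rises as log mortality falls.

```python
import numpy as np

def make_log_mortality(n_years=20, seed=0):
    """Hypothetical log-mortality matrix: rows = age groups, columns = years."""
    rng = np.random.default_rng(seed)
    # Made-up log death rates at ages 0, 5, 15, 25, ..., 85 (not real data)
    base = np.array([-3.5, -7.5, -7.0, -6.5, -6.0, -5.3, -4.5, -3.7, -2.8, -1.9])
    decline = np.array([1.0, 1.0, 0.8, 0.6, 0.5, 0.4, 0.4, 0.3, 0.3, 0.2])
    trend = np.linspace(0.0, -1.0, n_years)   # mortality falls over time
    noise = 0.02 * rng.normal(size=(base.size, n_years))
    return base[:, None] + np.outer(decline, trend) + noise

A = make_log_mortality()
U, s, Vt = np.linalg.svd(A, full_matrices=False)
if Vt[0].sum() < 0:                   # fix the sign so v_1 is positive
    U[:, 0], Vt[0] = -U[:, 0], -Vt[0]
u1, v1 = U[:, 0], Vt[0]

age_pattern = A @ v1                  # lives in age-space, equals sigma_1 * u_1
level_index = A.T @ u1                # lives in year-space, equals sigma_1 * v_1

# Compare the level index with a crude per-year level: mean log mortality
mean_log_m = A.mean(axis=0)
corr = np.corrcoef(level_index, mean_log_m)[0, 1]

print("age pattern matches sigma_1 u_1:", np.allclose(age_pattern, s[0] * u1))
print("level index matches sigma_1 v_1:", np.allclose(level_index, s[0] * v1))
print("corr(level index, mean log mortality):", corr)
```

In this synthetic example the correlation is strongly negative only because of the sign convention noted above; in absolute terms the index tracks the overall level closely.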

In Lee-Carter terms: what LC calls $b_x$ and $k_t$ correspond to the $u_2$ and $v_2$ of SVD-Comp (the second matched pair; exactly so when $v_1$ is constant across years, approximately otherwise), because the first pair was absorbed into $a_x$ by centering. This means $k_t$ is not the overall mortality level; it is the time-varying weight on the dominant deviation from the mean age schedule. The actual level is hidden in $a_x$.

8. Summary

The SVD of a matrix $A$ simultaneously provides orthonormal bases for all four fundamental subspaces and pairs the row-space basis with the column-space basis through the singular values. Each row-space direction $v_i$ has a partner column-space direction $u_i$, and $\sigma_i$ is the gain along that pair. This pairing, not just the bases themselves, is the deep insight.

For demographic age schedules:

SVD-Comp (uncentered) preserves the full geometric structure. The first matched pair captures level × canonical age structure. Subsequent pairs capture separable age-specific deviations from canonical. Each column of the data matrix is a weighted sum of the $u_i$ components, with weights from the corresponding $v_i$. Typically 2–3 weights suffice.

Lee-Carter (centered) removes the level information from the SVD by subtracting the mean age profile $a_x$. The SVD then operates on residuals, so its first component captures the dominant deviation pattern rather than the canonical structure. This is fine for forecasting $k_t$ as a time series, but it loses the clean separation between "what mortality looks like on average" and "how it's changing differentially by age."


References
Clark, S.J. (2015). A singular value decomposition-based factorization and parsimonious component model of demographic quantities correlated by age. arXiv:1504.02057.
Clark, S.J. (2019). A general age-specific mortality model with an example indexed by child mortality or both child and adult mortality. Demography, 56(3), 1131–1159.
Lee, R.D. & Carter, L.R. (1992). Modeling and forecasting U.S. mortality. Journal of the American Statistical Association, 87(419), 659–671.
Strang, G. (2009). Introduction to Linear Algebra, 4th ed. Wellesley-Cambridge Press.