A principal surface (Hastie and Stuetzle, 1989) is a smooth two-dimensional manifold fitted through the middle of a \(p\)-dimensional point cloud, estimated by alternating an expectation step (the coordinate functions are loess smooths of each variable on the surface coordinates) and a projection step (each sample is projected onto its nearest point of the current surface). Where a principal component analysis flattens the data onto the best-fitting plane, a principal surface is allowed to bend, so it recovers curved structure that a plane must smear out.
The display problem is that a curved surface has no linear biplot axes. This package supplies the missing reading rules. Each variable’s fitted values over the surface form a smooth scalar field; the gradient-flow axis for a variable is the steepest-ascent trajectory of that field, started from the centre of the samples. It crosses the variable’s contours orthogonally, carries a monotone calibration in the variable’s original units, and is read exactly like a classical biplot axis: project a sample onto the axis, read off the value. A variable whose field is not axis-readable — because it has an interior extremum (Criterion 1) or because the constructed axis spans too little of the variable’s range (Criterion 2) — is deferred and read from its contour lines instead.
The workflow is prinsurf() to fit, plot()
to display, and predict() / predictivity() /
axis_predictive_error() to quantify how well the display
can be read.
We start with data simulated on a blanket-shaped surface: two latent coordinates observed almost directly, and a third variable that bends parabolically in one of them.
set.seed(1)
n <- 200
s <- runif(n, -1, 1) # latent coordinate 1
t <- runif(n, -1, 1) # latent coordinate 2
e <- function() runif(n, -0.05, 0.05)
x1 <- s + e()
x2 <- t + e()
x3 <- 0.7 * s + 0.9 * t^2 + e() # curved (parabolic) contours in (s,t)
X <- cbind(x1, x2, x3)
colnames(X) <- c("x", "y", "z")fit_b <- prinsurf(X, verbose = FALSE)
fit_b
#> Principal surface fit: 200 samples, 3 variables
#> loess span 0.60, converged in 9 iterations
#> variables: x, y, zThe default display draws the samples inside a plain frame (the
surface coordinates carry no metric meaning, so there are no coordinate
ticks) and a calibrated gradient-flow axis for every variable that
passes the two criteria. plot() invisibly returns the names
of the deferred variables.
The x and y axes are close to straight,
because those variables are close to linear in the latent coordinates.
The interest is in z: its contours over the surface are
curved, and its gradient-flow axis bends smoothly so as to stay
orthogonal to them. To see the field itself, suppress the axes and ask
for the contours, coloured by their marker value:
With col_by_level = TRUE each contour line is coloured
by the value it represents (low to high through a sequential palette),
so colour is the calibration — the contour analogue of tick
marks on an axis. Only the calibrated lines are drawn; the space between
them is left blank, so the display never colours a region where the
fitted value would be an extrapolation.
Contour levels and axis calibrations are always labelled in the
variable’s original units, matching what predict() returns.
Where the contours stop short of the frame, the surface simply has no
local data support there — the grid is masked to the region the samples
cover, and refuses to draw where the fitted value would be an
extrapolation.
How much better is the bent axis than a straight one?
kink_max() measures the largest turning angle along an axis
(small for a smooth flow), and axis_predictive_error()
gives the root-mean-square error, per axis, of reading each sample by
orthogonal projection — the standard predictive error of the Gower
biplot tradition, in working (standardised) units.
The second example is data on a spherical cap. One coordinate — the one along the cap’s axis — attains its maximum at the pole, in the interior of the surface. No monotone axis can represent such a variable, and this is exactly what Criterion 1 detects.
set.seed(05012014)
data1 <- function(n, side) {
x <- 2 * runif(n) - 1
theta <- 2 * pi * runif(n) - pi
y <- sin(theta) * sqrt(1 - x^2)
z <- cos(theta) * sqrt(1 - x^2)
X <- matrix(c(x, y, z), nrow = n, ncol = 3, byrow = FALSE)
X[X[, side] > -0.4, ] # keep cap (> half ball) along axis `side`
}
X <- data1(200, 1)
colnames(X) <- c("x", "y", "z")
cat(sprintf("half-sphere: n = %d after the cap filter\n", nrow(X)))
#> half-sphere: n = 138 after the cap filterThe deferred variable gets no axis. Reading it is still possible — from its contours, which on a cap are concentric rings around the pole:
psaxis() reports the deferral decision for any single
variable, and predict() respects it: for a monotone
variable it projects onto the calibrated axis, while for a deferred
variable it returns the surface value \(\hat
f_j(\lambda_i)\) (the contour reading), with a message. Either
way the result is in the variable’s original units.
psaxis(fit_s, deferred[1])
#> psaxis for 'z': deferred to contour reading (coverage 0.36)
head(predict(fit_s, deferred[1]))
#> 'z' is deferred (no axis); returning surface values f_hat(lambda).
#> 1 2 3 4 5 6
#> 0.57804689 -0.25506006 -0.01820171 0.15460037 -0.01820171 0.86127929If you want an axis regardless — for instance to demonstrate
why deferral is the right call — switch the criteria off with
defer = FALSE in psaxis() or in
plot(), and inspect the coverage the forced axis
achieves.
Fisher’s iris data has four measured variables and a species factor.
Variables are standardised by default (scale = TRUE), which
is recommended whenever the variables are on unequal scales.
fit_i <- prinsurf(as.matrix(iris[, 1:4]), verbose = FALSE)
deferred <- plot(fit_i, group = iris$Species)The group argument colours the samples and adds a
legend; the axes are read in the usual way, in centimetres. To study one
variable’s behaviour over the whole surface, overlay its contours on the
biplot together with its axis, or drop the axes altogether:
Passing contours = TRUE switches to a panel display, one
panel per variable with its contours coloured by level — the quickest
overall diagnostic view of the fitted surface:
Two complementary summaries. Sample predictivity asks, per sample, what proportion of its squared length the surface reconstructs, \(1 - \lVert x_i - \hat f(\lambda_i)\rVert^2 / \lVert x_i \rVert^2\); values near 1 are well represented, and negative values flag samples the surface cannot reach. Axis predictive error asks, per variable, how accurately the axis-projection reading recovers the true values.
pred <- predictivity(fit_i)
attr(pred, "overall")
#> [1] 0.9377464
summary(as.numeric(pred))
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.5423 0.9193 0.9792 0.9377 0.9947 0.9998
axis_predictive_error(fit_i)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 0.4381697 0.2340688 0.1707248 0.3199863
#> attr(,"overall")
#> [1] 0.2907374And the direct check — predicted against observed for one variable:
plot(iris$Petal.Length, predict(fit_i, "Petal.Length"),
xlab = "observed", ylab = "read from the biplot axis")
abline(0, 1, col = "grey50")predict() offers two rules. The default,
rule = "axis", projects each sample onto the calibrated
axis — the value a reader recovers by eye. rule = "contour"
instead returns the surface value \(\hat
f_j(\lambda_i)\) for every variable, the reading from the
contours; this carries no contour-curvature loss (see the theory
vignette, Proposition 4) and is the comparison the method turns on. The
two coincide only where the contours are straight, and diverge in
proportion to their curvature:
ax <- predict(fit_i, "Petal.Length", rule = "axis")
cont <- predict(fit_i, "Petal.Length", rule = "contour")
obs <- iris$Petal.Length
c(axis = sqrt(mean((obs - ax)^2)),
contour = sqrt(mean((obs - cont)^2)))
#> axis contour
#> 0.3013802 0.2186153The contour reading is at least as accurate everywhere, and strictly better where the field bends — but it is not a drawn axis a reader can follow, which is the trade the mixed biplot makes.
Span. The single smoothing parameter is the loess
span (default 0.6). Smaller values let the surface bend
more; larger values push it towards the principal-component plane. If
the fitted contours look implausibly wiggly, increase it.
Deferral tuning. Criterion 1 uses delta
(the margin, as a fraction of the value range, by which an interior
extremum must exceed the boundary) and Criterion 2 uses
cover_min (the minimum fraction of the variable’s range the
axis must span, default 0.55). Both are arguments to
psaxis() and to plot(). Deferral is a display
decision, not a verdict on the variable: a deferred variable is still
fitted, still predictable, and often the most interesting one on the
plot.
Colouring by level. With
col_by_level = TRUE, contour lines are coloured by the
value they represent, so colour encodes the calibration directly. Only
the calibrated curves are drawn; unlike a filled surface, nothing is
painted in the space between them, so the display never suggests a value
where the surface merely interpolates. Set n_contours for
the number of levels and level_pal for the palette.
Blank regions. The contours are evaluated on a grid masked to the data-support region: cells with no sample nearby are dropped. Where the contours stop short of the frame, the configuration has a genuine void there — between clusters, or at a sparse edge — and the smooth surface would otherwise extrapolate a value with no data behind it.
Units. Displayed calibrations, contour levels, and
everything returned by predict() are in the original units
of each variable. The internal computations,
predictivity(), and axis_predictive_error()
operate in the working (centred, optionally standardised) units.
Gower, J.C. and Hand, D.J. (1996). Biplots. Chapman & Hall, London.
Gower, J.C., Lubbe, S. and le Roux, N. (2011). Understanding Biplots. Wiley, Chichester.
Hastie, T. and Stuetzle, W. (1989). Principal curves. Journal of the American Statistical Association, 84, 502–516.