---
title: "Principal surface biplots with prinsurf"
author: "Raeesa Ganey"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Principal surface biplots with prinsurf}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE, comment = "#>",
  fig.width = 7, fig.height = 5.5, fig.align = "center"
)
library(prinsurf)
```

# Introduction

A principal surface (Hastie and Stuetzle, 1989) is a smooth two-dimensional
manifold fitted through the middle of a $p$-dimensional point cloud, estimated
by alternating an expectation step (the coordinate functions are loess smooths
of each variable on the surface coordinates) and a projection step (each sample
is projected onto its nearest point of the current surface). Where a principal
component analysis flattens the data onto the best-fitting plane, a principal
surface is allowed to bend, so it recovers curved structure that a plane must
smear out.

The display problem is that a curved surface has no linear biplot axes. This
package supplies the missing reading rules. Each variable's fitted values over
the surface form a smooth scalar field; the **gradient-flow axis** for a
variable is the steepest-ascent trajectory of that field, started from the
centre of the samples. It crosses the variable's contours orthogonally, carries
a monotone calibration in the variable's original units, and is read exactly
like a classical biplot axis: project a sample onto the axis, read off the
value. A variable whose field is not axis-readable — because it has an
interior extremum (Criterion 1) or because the constructed axis spans too
little of the variable's range (Criterion 2) — is **deferred** and read from
its contour lines instead.

The workflow is `prinsurf()` to fit, `plot()` to display, and
`predict()` / `predictivity()` / `axis_predictive_error()` to quantify how
well the display can be read.

# A curved sheet

We start with data simulated on a blanket-shaped surface: two latent
coordinates observed almost directly, and a third variable that bends
parabolically in one of them.

```{r blanket-data}
set.seed(1)
n  <- 200
s  <- runif(n, -1, 1)          # latent coordinate 1
t  <- runif(n, -1, 1)          # latent coordinate 2
e  <- function() runif(n, -0.05, 0.05)
x1 <- s + e()
x2 <- t + e()
x3 <- 0.7 * s + 0.9 * t^2 + e()   # curved (parabolic) contours in (s,t)
X  <- cbind(x1, x2, x3)
colnames(X) <- c("x", "y", "z")
```

```{r blanket-fit}
fit_b <- prinsurf(X, verbose = FALSE)
fit_b
```

The default display draws the samples inside a plain frame (the surface
coordinates carry no metric meaning, so there are no coordinate ticks) and a
calibrated gradient-flow axis for every variable that passes the two criteria.
`plot()` invisibly returns the names of the deferred variables.

```{r blanket-plot}
deferred <- plot(fit_b)
deferred
```

The `x` and `y` axes are close to straight, because those variables are close
to linear in the latent coordinates. The interest is in `z`: its contours over
the surface are curved, and its gradient-flow axis bends smoothly so as to stay
orthogonal to them. To see the field itself, suppress the axes and ask for the
contours, coloured by their marker value:

```{r blanket-contour}
plot(fit_b, vars = NULL, contours = "z", col_by_level = TRUE)
```

With `col_by_level = TRUE` each contour line is coloured by the value it
represents (low to high through a sequential palette), so colour *is* the
calibration — the contour analogue of tick marks on an axis. Only the
calibrated lines are drawn; the space between them is left blank, so the
display never colours a region where the fitted value would be an
extrapolation.

Contour levels and axis calibrations are always labelled in the variable's
original units, matching what `predict()` returns. Where the contours stop
short of the frame, the surface simply has no local data support there — the
grid is masked to the region the samples cover, and refuses to draw where the
fitted value would be an extrapolation.

How much better is the bent axis than a straight one? `kink_max()` measures
the largest turning angle along an axis (small for a smooth flow), and
`axis_predictive_error()` gives the root-mean-square error, per axis, of
reading each sample by orthogonal projection — the standard predictive error
of the Gower biplot tradition, in working (standardised) units.

```{r blanket-diagnostics}
kink_max(psaxis(fit_b, "z")$axis)
axis_predictive_error(fit_b)
```

# A half-sphere cap: deferral in action

The second example is data on a spherical cap. One coordinate — the one along
the cap's axis — attains its maximum at the pole, in the *interior* of the
surface. No monotone axis can represent such a variable, and this is exactly
what Criterion 1 detects.

```{r sphere-data}
set.seed(05012014)
data1 <- function(n, side) {
  x <- 2 * runif(n) - 1
  theta <- 2 * pi * runif(n) - pi
  y <- sin(theta) * sqrt(1 - x^2)
  z <- cos(theta) * sqrt(1 - x^2)
  X <- matrix(c(x, y, z), nrow = n, ncol = 3, byrow = FALSE)
  X[X[, side] > -0.4, ]                 # keep cap (> half ball) along axis `side`
}
X <- data1(200, 1)
colnames(X) <- c("x", "y", "z")
cat(sprintf("half-sphere: n = %d after the cap filter\n", nrow(X)))
```

```{r sphere-fit}
fit_s <- prinsurf(X, verbose = FALSE)
deferred <- plot(fit_s)
deferred
```

The deferred variable gets no axis. Reading it is still possible — from its
contours, which on a cap are concentric rings around the pole:

```{r sphere-contour}
plot(fit_s, vars = NULL, contours = deferred, col_by_level = TRUE)
```

`psaxis()` reports the deferral decision for any single variable, and
`predict()` respects it: for a monotone variable it projects onto the
calibrated axis, while for a deferred variable it returns the surface value
$\hat f_j(\lambda_i)$ (the contour reading), with a message. Either way the
result is in the variable's original units.

```{r sphere-predict}
psaxis(fit_s, deferred[1])
head(predict(fit_s, deferred[1]))
```

If you want an axis regardless — for instance to demonstrate *why* deferral is
the right call — switch the criteria off with `defer = FALSE` in `psaxis()`
or in `plot()`, and inspect the coverage the forced axis achieves.

# Iris: a real dataset with groups

Fisher's iris data has four measured variables and a species factor. Variables
are standardised by default (`scale = TRUE`), which is recommended whenever
the variables are on unequal scales.

```{r iris-fit}
fit_i <- prinsurf(as.matrix(iris[, 1:4]), verbose = FALSE)
deferred <- plot(fit_i, group = iris$Species)
deferred
```

The `group` argument colours the samples and adds a legend; the axes are read
in the usual way, in centimetres. To study one variable's behaviour over the
whole surface, overlay its contours on the biplot together with its axis, or
drop the axes altogether:

```{r iris-one-variable}
plot(fit_i, vars = "Petal.Length", contours = "Petal.Length",
     group = iris$Species)
```

Passing `contours = TRUE` switches to a panel display, one panel per variable
with its contours coloured by level — the quickest overall diagnostic view of
the fitted surface:

```{r iris-panels, fig.height = 7}
plot(fit_i, contours = TRUE, col_by_level = TRUE, group = iris$Species)
```

## How well can the display be read?

Two complementary summaries. **Sample predictivity** asks, per sample, what
proportion of its squared length the surface reconstructs,
$1 - \lVert x_i - \hat f(\lambda_i)\rVert^2 / \lVert x_i \rVert^2$; values
near 1 are well represented, and negative values flag samples the surface
cannot reach. **Axis predictive error** asks, per variable, how accurately the
axis-projection reading recovers the true values.

```{r iris-diagnostics}
pred <- predictivity(fit_i)
attr(pred, "overall")
summary(as.numeric(pred))

axis_predictive_error(fit_i)
```

And the direct check — predicted against observed for one variable:

```{r iris-predict}
plot(iris$Petal.Length, predict(fit_i, "Petal.Length"),
     xlab = "observed", ylab = "read from the biplot axis")
abline(0, 1, col = "grey50")
```

## Two reading rules

`predict()` offers two rules. The default, `rule = "axis"`, projects each
sample onto the calibrated axis — the value a reader recovers by eye.
`rule = "contour"` instead returns the surface value $\hat f_j(\lambda_i)$ for
every variable, the reading from the contours; this carries no
contour-curvature loss (see the theory vignette, Proposition 4) and is the
comparison the method turns on. The two coincide only where the contours are
straight, and diverge in proportion to their curvature:

```{r iris-rules}
ax   <- predict(fit_i, "Petal.Length", rule = "axis")
cont <- predict(fit_i, "Petal.Length", rule = "contour")
obs  <- iris$Petal.Length
c(axis    = sqrt(mean((obs - ax)^2)),
  contour = sqrt(mean((obs - cont)^2)))
```

The contour reading is at least as accurate everywhere, and strictly better
where the field bends — but it is not a drawn axis a reader can follow, which
is the trade the mixed biplot makes.

# Practical notes

**Span.** The single smoothing parameter is the loess span (default `0.6`).
Smaller values let the surface bend more; larger values push it towards the
principal-component plane. If the fitted contours look implausibly wiggly,
increase it.

**Deferral tuning.** Criterion 1 uses `delta` (the margin, as a fraction of
the value range, by which an interior extremum must exceed the boundary) and
Criterion 2 uses `cover_min` (the minimum fraction of the variable's range the
axis must span, default `0.55`). Both are arguments to `psaxis()` and to
`plot()`. Deferral is a display decision, not a verdict on the variable: a
deferred variable is still fitted, still predictable, and often the most
interesting one on the plot.

**Colouring by level.** With `col_by_level = TRUE`, contour lines are coloured
by the value they represent, so colour encodes the calibration directly. Only
the calibrated curves are drawn; unlike a filled surface, nothing is painted in
the space between them, so the display never suggests a value where the surface
merely interpolates. Set `n_contours` for the number of levels and `level_pal`
for the palette.

**Blank regions.** The contours are evaluated on a grid masked to the
data-support region: cells with no sample nearby are dropped. Where the
contours stop short of the frame, the configuration has a genuine void there —
between clusters, or at a sparse edge — and the smooth surface would otherwise
extrapolate a value with no data behind it.

**Units.** Displayed calibrations, contour levels, and everything returned by
`predict()` are in the original units of each variable. The internal
computations, `predictivity()`, and `axis_predictive_error()` operate in the
working (centred, optionally standardised) units.

# References

Gower, J.C. and Hand, D.J. (1996). *Biplots*. Chapman & Hall, London.

Gower, J.C., Lubbe, S. and le Roux, N. (2011). *Understanding Biplots*.
Wiley, Chichester.

Hastie, T. and Stuetzle, W. (1989). Principal curves. *Journal of the
American Statistical Association*, **84**, 502–516.