Overview
Click on the menu bar items to navigate to chapters
Click here for PDF version
Constrained Functions
Functional Data Analysis - Chapter 6
CEDAS FDA-VDS Seminar 2026
Julian Rakuschek
11.06.2026
We are going to cover chapter 6 in Functional Data Analysis on Constrained Functions
Focus:
Counts, for example page views
Strictly speaking: This would be non-negative
Probability functions
Many quantities can only be positive
Energy
Measures
Information Theoretic
Image Source: Kong, Sangwon & Lee, Kyung & Kim, Junho & Jang, Seong. (2014). The Effect of Two Different Hand Exercises on Grip Strength, Forearm Circumference, and Vascular Maturation in Patients Who Underwent Arteriovenous Fistula Surgery. Annals of Rehabilitation Medicine. 38. 648. 10.5535/arm.2014.38.5.648.
Chapter 5 – smoothing with roughness penalty:
\[ \text{PENSSE}_\lambda(x|\mathbf{y}) = \underbrace{[\mathbf{y}-x(\mathbf{t})]^T \mathbf{W} [\mathbf{y}-x(\mathbf{t})]^2}_{\text{Fit of the data}} + \lambda \underbrace{\int (D^2 x(s))^2 ds}_{\text{Function roughness based on curvature}} \]Remember, $x(t)$ is a linear expansion
\[ x(t) = \sum_{k=1}^K c_k \phi_k (t) \]fdParobj <- fdPar(
fdobj = basis,
Lfdobj = 2,
lambda = 1e-8
)
fit <- smooth.basis(
argvals = x_values,
y = y_values,
fdParobj = fdParobj
)
Instead of
\[ x(t) = \sum_{k=1}^K c_k \phi_k (t) \]we now define
\[ x(t) = e^{\sum_{k=1}^K c_k \phi_k (t)} = e^{W(t)} \]Positive smoothing function $x$ can always be defined on the exponential of an unconstrained function $W$.
Any other basis can be used as well!
Nota bene: This function can never be exactly zero. It can only get very close to zero.
now becomes
\[ \text{PENSSE}_\lambda(\textcolor{red}{W}|\mathbf{y}) = [\mathbf{y}-\textcolor{red}{e^{W(\mathbf{t})}}]^T \mathbf{W} [\mathbf{y}-\textcolor{red}{e^{W(\mathbf{t})}}]^2 + \lambda \int (D^2 \textcolor{red}{W(t)})^2 dt \]Needs to be solved through numerical methods. But these converge quickly because the expression is only mildly nonlinear.
fdParobj <- fdPar(
fdobj = basis,
Lfdobj = 2,
lambda = 1e-8
)
fit <- smooth.pos(
argvals = x_values,
y = y_values,
WfdParobj = fdParobj,
dbglev = 0
)
Unconstrained
Constrained
Growth
Growth
Volume is monotone increasing in the radius.
\[ V(r) = \frac{4}{3} \pi r^3 \]Cumulative Distribution Functions
The first derivative must be positive everywhere.
\[ Dx(t) = e^{W(t)} \] Integrate both sides: \[ x(t) = c + \int_{t_0}^t e^{W(u)} du \]$c$ is a constant estimated from the data.
Depending on $c$, the function will also be positive everywhere once it crosses the 0-level.
But: A zero increase is not possible, the function is always strictly monotone.
now becomes
\[ \text{PENSSE}_\lambda(\textcolor{red}{W}|\mathbf{y}) = \\ \left[\mathbf{y}-\textcolor{red}{\left(c + \int_{t_0}^t e^{W(u)} du\right)}\right]^T \mathbf{W} \left[\mathbf{y}-\textcolor{red}{\left(c + \int_{t_0}^t e^{W(u)} du\right)}\right]^2 \\ + \lambda \int (D^2 \textcolor{red}{W(t)})^2 dt \]Needs to be solved through numerical methods.
fdParobj <- fdPar(
fdobj = basis,
Lfdobj = 2,
lambda = 1e-8
)
fit <- smooth.monotone(
argvals = x_values,
y = y_values,
WfdParobj = fdParobj
)
Unconstrained
Constrained
Each $t$ has a probability of $p(t)$ to occur.
This is our density model.
The goal is to find an optimal coefficient vector $c$ for the model $p(t) = \frac{e^{W(t)}}{\int e^{W(t)} dt}$
Using maximum likelihood estimation is the way to go!
Likelihood for all $N$ observations $t_1, \ldots, t_N$:
\[ L(W|c) = \prod_{i=1}^N p(t_i) \]Maximum likelihood means finding a density that assigns high probability density to the observed samples.
Therefore we apply the log to convert it to a sum:
\[ \log L(W|c) = \sum_{i = 1}^N \log p(t_i) \]Apply $\log \left( \frac{A}{B} \right) = \log A - \log B$
\[ = \sum_{i = 1}^N \left[ \log \left(e^{W(t)}\right) - \log \left( \int e^{W(t)} dt \right) \right] \\ = \sum_{i = 1}^N \left[ W(t) \right] - N \log \left( \int e^{W(t)} dt \right) \]
82 observatons of the velocities (in 1000 km/second) of distant galaxies diverging from our own
Preliminaries
library(fda)
library(MASS)
data(galaxies)
dataset <- galaxies
N <- length(dataset)
Construct a B-Spline basis
Wknots <- dataset[round(N * seq(1 / N, 1, length.out = 11), 0)]
Wnbasis <- length(Wknots) + 2
Wbasis <- create.bspline.basis(
rangeval = range(dataset),
nbasis = Wnbasis,
norder = 4,
breaks = Wknots
)
Then we compute the fit using density.fd
Wlambda <- 0
Wfd0 <- fd(matrix(0, Wnbasis, 1), Wbasis)
WfdPar <- fdPar(Wfd0, Lfdobj = 2, lambda = Wlambda)
# Fit the density with density.fd.
# The result contains:
# - Wfdobj: fitted log-density component W(x)
# - C: normalizing constant
densityList <- density.fd(dataset, WfdPar)
Attention: density.fd is only available up until version 6.2.0. It has been removed in version 6.3.0.
Evaluate the fitted fda density on a fine grid.
Wfd <- densityList$Wfdobj
Cconst <- densityList$C
Zfine <- seq(min(dataset), max(dataset), length.out = 401)
Wfine <- eval.fd(Zfine, Wfd)
Pfine <- exp(Wfine) / Cconst
densityList only gives you $W(t)$ and the normalization constant, but not $p(t)$, so you need to stitch it together yourself.
Finally plot it.
plot(
Zfine, Pfine,
type = "l",
lwd = 2,
xlab = "Velocity of galaxy moving away from Earth (km/s divided by 1000)",
ylab = "Estimated density",
main = "Galaxy Dataset: density.fd vs KDE"
)
lines(kde_fit$x, kde_fit$y, lwd = 2, lty = 2)
No smoothness control!
$\text{PEN}_3(W)$ penalizes changes in curvature:
$\lambda$ is the smoothing parameter (similar to bandwidth in KDE).
Find coefficient vector $\mathbf{c}$ via numerical optimization.
Adjust Wlambda accordingly
Wlambda <- 3e+9
Wfd0 <- fd(matrix(0, Wnbasis, 1), Wbasis)
WfdPar <- fdPar(Wfd0, Lfdobj = 2, lambda = Wlambda)
# Fit the density with density.fd.
# The result contains:
# - Wfdobj: fitted log-density component W(x)
# - C: normalizing constant
densityList <- density.fd(dataset, WfdPar)
$\lambda = 0$
$\lambda = 3 \times 10^9$
intensity.fd is essentially the same as density.fd, but does not normalize.
# Fit the intensity with intensity.fd.
# The result contains:
# - Wfdobj: fitted log-intensity component W(x)
densityList <- intensity.fd(dataset, WfdPar)
Wfd <- densityList$Wfdobj
# Evaluate the fitted intensity on a fine grid.
Zfine <- seq(min(dataset), max(dataset), length.out = 401)
Wfine <- eval.fd(Zfine, Wfd)
Ifine <- as.numeric(exp(Wfine))
intensity.fd does not provide us a normalization constant.
Why would we use this method instead of simply computing the usual KDE?