Modelization — User Guide

First published on Wednesday, Jun 17, 2026 and last modified on Thursday, Jun 18, 2026 by François Chaplais.

François Chaplais, WebMagic

\[ \hat Y = f(X) \]

\[ f(x) = \bigl(f_1(x), \dots, f_n(x)\bigr), \]

\[ \tilde x_r = \frac{x_r - \hat\mu_r}{\hat\sigma_r}, ~~ \tilde y_j = \frac{y_j - \hat\mu_j^Y}{\hat\sigma_j^Y}. \]
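A minimal sketch of the standardization above for a single column. Whether the tool divides by N or by N-1 when estimating the standard deviation is not stated here; the population form (divide by N) is an assumption, and `standardize` is a hypothetical helper name.

```python
import math

def standardize(column):
    """Return (z_scores, mean, std) for one column (population std, an assumption)."""
    n = len(column)
    mu = sum(column) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in column) / n)
    if sigma == 0.0:
        raise ValueError("a constant column cannot be standardized")
    return [(v - mu) / sigma for v in column], mu, sigma

# Illustrative data only.
z, mu, sigma = standardize([2.0, 4.0, 6.0, 8.0])
print(mu, sigma, z)
```

The stored mean and standard deviation must be reused unchanged when the model is evaluated on new rows, otherwise the fitted coefficients no longer apply to the same coordinates.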

\[ t_0 = \dots = t_d < \xi_1 < \dots < \xi_K < t_{K+d+1} = \dots = t_{K+2d+1}. \]

\[ B = K + d + 1 = K + 4. \]

\[ N_{i,0}(x) = \left\{ \begin{array}{ll}1 & t_i \le x < t_{i+1}\\ 0 & \text{otherwise,}\end{array}\right. \]

\[ N_{i,d}(x) = \frac{x - t_i}{t_{i+d} - t_i} N_{i,d-1}(x) + \frac{t_{i+d+1} - x}{t_{i+d+1} - t_{i+1}} N_{i+1,d-1}(x), \]

\[ \sum_{i=1}^B N_{i,d}(x) = 1 ~ \forall\, x \in [t_d,\, t_{K+d+1}). \]
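The recursion and the partition-of-unity property can be checked numerically. The sketch below is an illustration, not the tool's implementation: the function names are hypothetical, terms with a zero denominator are taken as zero (the usual convention for repeated knots), and closing the last knot span on the right so that the upper boundary is covered is an added assumption.

```python
def clamped_knots(interior, lo, hi, degree=3):
    """Knot vector with degree + 1 copies of each boundary knot."""
    return [lo] * (degree + 1) + list(interior) + [hi] * (degree + 1)

def bspline_basis(x, knots, degree=3):
    """Evaluate all B = len(knots) - degree - 1 basis functions at x."""
    n = len(knots)
    # Degree 0: indicator of the half-open knot span containing x.
    vals = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n - 1)]
    if x == knots[-1]:
        # Assumption: close the last non-empty span so x == hi is covered.
        last = max(i for i in range(n - 1) if knots[i] < knots[i + 1])
        vals[last] = 1.0
    # Cox-de Boor recursion, raising the degree one step at a time.
    for d in range(1, degree + 1):
        new = []
        for i in range(n - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * vals[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - x) / right_den * vals[i + 1] if right_den > 0 else 0.0
            new.append(left + right)
        vals = new
    return vals

# K = 4 interior knots and degree 3 give B = K + 4 = 8 basis functions.
knots = clamped_knots([0.2, 0.4, 0.6, 0.8], 0.0, 1.0)
values = bspline_basis(0.37, knots)
print(len(values), sum(values))
```

At most four of the eight values are non-zero at any point, which is what keeps each spline block of the design matrix sparse.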

\[ \Delta x_{\min} \approx \frac{\text{input range}}{K}. \]

\[ A = \bigl[1 \;\big|\; \Phi_1 \;\big|\; \Phi_2 \;\big|\; \dots \;\big|\; \Phi_m\bigr] \in \mathbb{R}^{N \times P}, \]

\[ P = 1 + \sum_{r=1}^m B_r. \]

\[ \min_{C \in \mathbb{R}^{P \times n}} \;\|AC - Y\|_F^2 + \lambda\, \mathrm{tr}\bigl(C^\top \Omega\, C\bigr), \]

\[ \bigl(A^\top A + \lambda\, \Omega\bigr) C = A^\top Y. \]
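As an illustration of the normal equations above, here is a small pure-Python solve with \( \Omega = \mathrm{diag}(0,1,\dots,1)\), so that the intercept column is left unpenalized. The names `ridge_fit` and `solve` are hypothetical, and the dense Gauss-Jordan elimination is a stand-in chosen for self-containment; nothing here is claimed to match the tool's solver.

```python
def solve(G, R):
    """Solve G X = R by Gauss-Jordan elimination with partial pivoting."""
    p = len(G)
    M = [list(G[i]) + list(R[i]) for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [[M[i][p + j] / M[i][i] for j in range(len(R[0]))] for i in range(p)]

def ridge_fit(A, Y, lam):
    """Return C with (A^T A + lam * Omega) C = A^T Y, Omega = diag(0, 1, ..., 1)."""
    N, P, n_out = len(A), len(A[0]), len(Y[0])
    G = [[sum(A[k][i] * A[k][j] for k in range(N)) for j in range(P)] for i in range(P)]
    for i in range(1, P):  # column 0 is the intercept and is not penalized
        G[i][i] += lam
    R = [[sum(A[k][i] * Y[k][j] for k in range(N)) for j in range(n_out)] for i in range(P)]
    return solve(G, R)

# Illustrative data: y = 1 + 2x exactly, design matrix [1 | x].
xs = [-1.0, 0.0, 1.0, 2.0]
A = [[1.0, x] for x in xs]
Y = [[1.0 + 2.0 * x] for x in xs]
C_plain = ridge_fit(A, Y, 0.0)   # recovers intercept 1 and slope 2
C_ridge = ridge_fit(A, Y, 1.0)   # slope is shrunk toward zero
print(C_plain, C_ridge)
```

With \( \lambda = 1\) the slope drops from 2 to 5/3, while the intercept is free to adjust because its diagonal entry in \( \Omega\) is zero.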

\[ \lambda \in \{10^{-4},\; 10^{-3},\; 10^{-2},\; 10^{-1},\; 1,\; 10,\; 100\} \]

\[ G_\lambda = A^\top A + \lambda\, \Omega \]

\[ \kappa = \frac{\sigma_{\max}(G_\lambda)}{\sigma_{\min}(G_\lambda)}, \]
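Because \( G_\lambda\) is symmetric and, for \( \lambda > 0\) with a full-rank design, positive definite, its singular values coincide with its eigenvalues, so \( \kappa\) is the ratio of the largest to the smallest eigenvalue. The sketch below estimates both by power iteration (the smallest one through a spectral shift). It is meant for small illustrative matrices only; a library SVD routine is the more robust choice in practice, and the function names are hypothetical.

```python
import math

def matvec(G, v):
    return [sum(g * x for g, x in zip(row, v)) for row in G]

def dominant_eigenvalue(G, iters=500):
    """Largest-magnitude eigenvalue of a symmetric matrix by power iteration."""
    v = [1.0 + 0.1 * i for i in range(len(G))]  # arbitrary non-zero start vector
    lam = 0.0
    for _ in range(iters):
        w = matvec(G, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(G, v)))  # Rayleigh quotient
    return lam

def condition_number(G):
    """kappa = sigma_max / sigma_min for a symmetric positive-definite G."""
    s_max = dominant_eigenvalue(G)
    p = len(G)
    # s_max * I - G has eigenvalues s_max - s_i, so its dominant one is s_max - s_min.
    shifted = [[(s_max if i == j else 0.0) - G[i][j] for j in range(p)] for i in range(p)]
    s_min = s_max - dominant_eigenvalue(shifted)
    return s_max / s_min

# Illustrative 2 x 2 matrix with eigenvalues 8 and 3.
kappa = condition_number([[4.0, 2.0], [2.0, 7.0]])
print(kappa)
```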

\[ f_j(x) = \beta_j. \]

\[ f_j(x) = \beta_j + \sum_{r=1}^m w_{jr} x_r. \]

\[ f_j(x) = \beta_j + \sum_{r=1}^m \sum_{\ell=1}^{B} c_{jr\ell}\, N_{\ell,3}(\tilde x_r), \]

\[ f_j(x) = f_j^{M3}(x) + \sum_{\ell=1}^{K^2} d_{j\ell}\, \bigl[N_{\cdot,3}(\tilde x_r) \otimes N_{\cdot,3}(\tilde x_s)\bigr]_\ell, \]

\[ \tilde X = U \Sigma V^\top, \]

\[ Z_q = \tilde X\, V_{:,1:q} \in \mathbb{R}^{N \times q}. \]

\[ d = \sqrt{\left(\frac{\text{cvRmse}}{1.0}\right)^2 + \left(\frac{\log_{10}\kappa}{8.0}\right)^2}. \]
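A short sketch of how the ranking metric could be applied to a list of fitted candidates. Two assumptions are made here: that a smaller \( d\) ranks first, and that candidates with \( \kappa > 10^8\) are excluded beforehand, as the condition-number table indicates. The candidate values are invented for illustration and are not results.

```python
import math

KAPPA_LIMIT = 1e8  # disqualification threshold from the condition-number table

def ranking_distance(cv_rmse, kappa):
    """d = sqrt((cvRmse / 1.0)^2 + (log10(kappa) / 8.0)^2)."""
    return math.hypot(cv_rmse / 1.0, math.log10(kappa) / 8.0)

# Hypothetical candidates: (name, cvRmse, kappa).
candidates = [("M1", 0.60, 1e2), ("M2", 0.30, 1e4), ("M4", 0.25, 1e9)]
eligible = [c for c in candidates if c[2] <= KAPPA_LIMIT]
ranked = sorted(eligible, key=lambda c: ranking_distance(c[1], c[2]))
for name, rmse, kappa in ranked:
    print(name, round(ranking_distance(rmse, kappa), 3))
```

The scale factors 1.0 and 8.0 put the two axes on comparable footing: a CV RMSE of 1.0 (no skill) and a condition number of \( 10^8\) (the disqualification limit) each contribute one unit.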

\[ \text{EDF} = \mathrm{tr}(H) = P - \lambda\,\mathrm{tr}(G_\lambda^{-1}\Omega) \in [1, P]. \]
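The second equality follows directly from the definition of \( G_\lambda\). A short derivation, assuming \( H\) denotes the hat matrix \( H = A\, G_\lambda^{-1} A^\top\) (it is not defined explicitly in this summary):

\[ \mathrm{tr}(H) = \mathrm{tr}\bigl(A\, G_\lambda^{-1} A^\top\bigr) = \mathrm{tr}\bigl(G_\lambda^{-1} A^\top A\bigr) = \mathrm{tr}\bigl(G_\lambda^{-1} (G_\lambda - \lambda\, \Omega)\bigr) = \mathrm{tr}(I_P) - \lambda\,\mathrm{tr}(G_\lambda^{-1}\Omega) = P - \lambda\,\mathrm{tr}(G_\lambda^{-1}\Omega). \]

At \( \lambda = 0\), with \( A^\top A\) invertible, this gives \( \text{EDF} = P\). As \( \lambda \to \infty\) every penalized coefficient is shrunk to zero and only the unpenalized intercept remains, so \( \text{EDF} \to 1\).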

Rung	Basis functions per input	Total P (m inputs)
M2	\( B = K + 4\)	\( 1 + Bm\)
M3 (refined)	\( B' = 2K + 3\)	\( 1 + B'm\)
M4 interaction block	—	\( P_{M3} + K^2\)
M5 interaction block	—	\( P_{M4} + K^2\)

K	B (M2)	P (M2, m=3)	P (M2, m=5)	P (M4, m=5)
2	6	19	31	40
4	8	25	41	72
8	12	37	61	160
12	16	49	81	280
16	20	61	101	432

CV RMSE	Interpretation
\( \approx 1.0\)	No skill — model does no better than predicting the training mean
\( < 0.5\)	Reasonable predictive skill
\( < 0.1\)	Strong fit

\( \kappa\)	Interpretation
\( < 10^4\)	Very well-conditioned
\( 10^4\) to \( 10^8\)	Acceptable
\( > 10^8\)	Disqualified — coefficient estimates are numerically unstable

Model	P (general)	P (m=5, K=4)	P (m=5, K=8)
M0	\( 1\)	1	1
M1	\( 1 + m\)	6	6
M2	\( 1 + (K+4)m\)	41	61
M3	\( 1 + (2K+3)m\)	56	96
M4	\( P_{M3} + K^2\)	72	160
M5	\( P_{M4} + K^2\)	88	224
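The general formulas in the "P (general)" column can be evaluated mechanically for any \( m\) and \( K\). A small sketch (the function name `param_counts` is illustrative and not part of the tool):

```python
def param_counts(m, K):
    """Parameters per output for each Branch A rung, from the general formulas."""
    B = K + 4              # basis functions per coarse spline term
    B_ref = 2 * K + 3      # basis functions per refined spline term
    p_m3 = 1 + B_ref * m
    return {
        "M0": 1,
        "M1": 1 + m,
        "M2": 1 + B * m,
        "M3": p_m3,
        "M4": p_m3 + K ** 2,       # one interaction block
        "M5": p_m3 + 2 * K ** 2,   # two interaction blocks
    }

for K in (4, 8):
    print(K, param_counts(5, K))
```

Comparing \( P\) with the number of rows \( N\) before fitting gives a quick sense of how much each rung asks of the data.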

Model	Description	P (q PCs, K=4)
B\( q\)-lin	Linear on \( q\) PCs	\( 1 + q\)
B\( q\)-spl	Additive spline on \( q\) PCs	\( 1 + 8q\)
B\( q\)-int\( _1\)	Spline + \( \text{PC}_0 \times \text{PC}_1\) interaction	\( 1 + 11q + 16\)
B\( q\)-int\( _2\)	Spline + 2 interactions	\( 1 + 11q + 32\)

Format	File	Use case
JSON	`model.json`	Model archive, custom loaders
JavaScript	`model.js`	Browser or Node, drop-in `predict()` function
Python	`model.py`	Pure Python ≥ 3.8, no NumPy required
C	`model.c`	Embedded systems, ANSI C89/C99
R	`model.R`	Base R, no packages required

Symbol	Meaning
\( N\)	Number of data rows
\( m\)	Number of input columns
\( n\)	Number of output columns
\( K\)	Number of interior knots per spline term — user-configurable (default 4, range 2–20)
\( K'\)	Refined knot count after one midpoint insertion: \( K' = 2K-1\)
\( B\)	Basis functions per coarse spline term: \( B = K + 4\)
\( B'\)	Basis functions per refined spline term: \( B' = 2K + 3\)
\( P\)	Total number of model parameters per output
\( A \in \mathbb{R}^{N \times P}\)	Design matrix
\( C \in \mathbb{R}^{P \times n}\)	Coefficient matrix
\( \lambda\)	Ridge regularization parameter
\( \Omega\)	Ridge penalty matrix: \( \mathrm{diag}(0,1,\dots,1)\)
\( \kappa\)	Condition number of \( A^\top A + \lambda\Omega\)
\( \text{EDF}\)	Effective degrees of freedom: \( \mathrm{tr}(H) = P - \lambda\,\mathrm{tr}(G_\lambda^{-1}\Omega)\)
\( \text{cvRmse}\)	Mean cross-validated RMSE in standardized output space
\( q\)	Number of principal components (Branch B)
\( V\)	PCA loading matrix (columns = PC directions)

Table of contents

1 What the tool does

2 Data preparation

2.1 Column selection

2.2 Row order matters

3 Standardization

4 B-spline basis

4.1 Why B-splines

4.2 Knot vector and degree

4.3 Cox–de Boor recursion

4.4 Knot placement

4.5 Choosing K

4.5.1 What K controls

4.5.2 Effect on the model ladder

4.5.3 The P/N constraint

4.5.4 Reading the diagnostics

4.5.5 When to increase K

4.5.6 When to keep K small

5 Design matrix and ridge regression

5.1 Additive model (Branch A)

5.2 Ridge regression objective

5.3 λ grid search

6 Cross-validation and scoring

6.1 k-fold protocol

6.2 RMSE interpretation

7 Condition number

8 Branch A — original-coordinate ladder

8.1 M0 — Intercept only

8.2 M1 — Linear

8.3 M2 — Additive spline

8.4 M3 — Additive spline, refined knots

8.5 M4 — M3 + one greedy interaction

8.6 M5 — M4 + second greedy interaction

9 Branch B — PCA coordinate ladder

9.1 PCA preprocessing

9.2 Branch B model classes

9.3 When to use Branch B

10 Model selection

10.1 Stopping criterion

10.2 Accuracy vs. robustness scatter plot

10.3 Ranking metric

10.4 The complexity–robustness tradeoff and the EDF signal

11 Export formats

12 Practical advice

12.1 High CV RMSE (≈ 1.0)

12.2 High condition number

12.3 Row order and temporal data

12.4 Choosing between Branch A and Branch B

12.5 Confounded inputs and the meaningfulness of the model

12.6 Effect of filtering

13 Mathematical notation summary
