From ea59d06d8be3da8bd1f0badff3fcee18bf914ad6 Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Thu, 10 Oct 2024 16:56:42 -0500
Subject: [PATCH 01/22] ht/initial hglm

---
 h2o-docs/src/product/data-science/hglm.rst | 52 ++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 h2o-docs/src/product/data-science/hglm.rst

diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst
new file mode 100644
index 000000000000..b46b13e60ea7
--- /dev/null
+++ b/h2o-docs/src/product/data-science/hglm.rst
@@ -0,0 +1,52 @@
+Hierarchical Generalized Linear Model (HGLM)
+============================================
+
+Hierarchical linear models (HLM) are used in situations where measurements are taken on clusters of data and there are cluster effects that can affect the coefficient values of the GLM. For instance, if we measure students' performance across multiple schools along with other predictors like family annual income, student health, and school type (public, private, religious, etc.), we suspect that students from the same school will have more similar performance than students from different schools. Therefore, we can denote the coefficient for predictor :math:`m \text{ as } \beta_{mj}`, where :math:`j` denotes the school index in our example. :math:`\beta_{0j}` denotes the intercept associated with school :math:`j`.
+
+
+Defining an HGLM model
+----------------------
+Parameters are optional unless specified as *required*.
+
+Algorithm-specific parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **gen_syn_data**
+- **group_column**
+- **random_intercept**
+- **em_epsilon**
+- **method**
+- **random_columns**
+- **tau_e_var_init**
+- **tau_u_var_init**
+- **initial_t_matrix**
+- **initial_random_effects**
+- **initial_fixed_effects**
+
+GLM-family parameters
+~~~~~~~~~~~~~~~~~~~~~
+
+- family
+- rand_family
+- plug_values
+
+
+Common parameters
+~~~~~~~~~~~~~~~~~
+
+- max_iterations
+- standardize
+- missing_values_handling
+- seed
+- score_values_handling
+- score_each_iteration
+- custom_metric_func
+- max_runtime_secs
+- weights_column
+- offset_column
+- ignore_const_cols
+- ignored_columns
+- response_column
+- validation_frame
+- training_frame
+- model_id
\ No newline at end of file

From 284651938d2800677e8ce78cea5a1a4c8ca7b170 Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Thu, 10 Oct 2024 17:43:03 -0500
Subject: [PATCH 02/22] ht/added descriptions, ordered, values

---
 h2o-docs/src/product/data-science/hglm.rst | 111 +++++++++++++++------
 1 file changed, 82 insertions(+), 29 deletions(-)

diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst
index b46b13e60ea7..0d84f5f69e10 100644
--- a/h2o-docs/src/product/data-science/hglm.rst
+++ b/h2o-docs/src/product/data-science/hglm.rst
@@ -11,42 +11,95 @@ Parameters are optional unless specified as *required*.
 Algorithm-specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-- **gen_syn_data**
-- **group_column**
-- **random_intercept**
-- **em_epsilon**
-- **method**
-- **random_columns**
-- **tau_e_var_init**
-- **tau_u_var_init**
-- **initial_t_matrix**
-- **initial_random_effects**
-- **initial_fixed_effects**
+- **em_epsilon**: (Only available for the EM method) Converge if the beta/ubeta/tmat/tauEVar values change less than this EM epsilon (measured using the L-infinity norm) (defaults to ``0.001``).
+
+- **gen_syn_data**: If enabled, adds Gaussian noise with the variance specified in ``tau_e_var_init`` (defaults to ``False``).
+
+- **group_column**: The categorical column used to generate the groups in HGLM (defaults to ``None``).
+
+- **initial_fixed_effects**: An array that contains the initial values of the fixed effects coefficients (defaults to ``None``).
+
+- **initial_random_effects**: An H2OFrame ID that contains the initial values of the random effects coefficients. The row names should be the random coefficient names (defaults to ``None``).
+
+   .. note::
+
+       If you aren't sure what the random coefficient names are, build the HGLM model with ``max_iterations=0`` and check the model output field ``random_coefficient_names``. The number of rows of this frame should be the number of level-2 units. To figure this out, build the HGLM model with ``max_iterations=0`` and check the model output field ``group_column_names``; the number of rows should equal the length of ``group_column_names``.
+
+- **initial_t_matrix**: An H2OFrame ID that contains the initial values of the T matrix. It should be a symmetric positive definite matrix (defaults to ``None``).
+
+- **method**: The method used to obtain the fixed and random coefficients as well as the various variances (defaults to ``"em"``).
+
+- `random_columns `__: An array of random column indices to be used for ``HGLM``.
+
+- **random_intercept**: If enabled, allows a random component to the GLM coefficients (defaults to ``True``).
+
+- **tau_e_var_init**: Initial variance of the random noise. If set, this value should be > 0.0. If not set, it will be set randomly during the model building process (defaults to ``0.0``).
+
+- **tau_u_var_init**: Initial variance of the random coefficient effects. If set, this value should be > 0.0. If not set, it will be set randomly during the model building process (defaults to ``0.0``).
+
+GLM-family parameters
+~~~~~~~~~~~~~~~~~~~~~
+
+- `family `__: Specify the model type.
+
+  - If the family is ``gaussian``, the response must be numeric (**Real** or **Int**).
+  - If the family is ``binomial``, the response must be categorical with 2 levels/classes or binary (**Enum** or **Int**).
+  - If the family is ``fractionalbinomial``, the response must be numeric between 0 and 1.
+  - If the family is ``multinomial``, the response can be categorical with more than two levels/classes (**Enum**).
+  - If the family is ``ordinal``, the response must be categorical with at least 3 levels.
+  - If the family is ``quasibinomial``, the response must be numeric.
+  - If the family is ``poisson``, the response must be numeric and non-negative (**Int**).
+  - If the family is ``negativebinomial``, the response must be numeric and non-negative (**Int**).
+  - If the family is ``gamma``, the response must be numeric, continuous, and positive (**Real** or **Int**).
+  - If the family is ``tweedie``, the response must be numeric and continuous (**Real**) and non-negative.
+  - If the family is ``AUTO`` (default),
+
+    - and the response is **Enum** with cardinality = 2, then the family is automatically determined as ``binomial``.
+    - and the response is **Enum** with cardinality > 2, then the family is automatically determined as ``multinomial``.
+    - and the response is numeric (**Real** or **Int**), then the family is automatically determined as ``gaussian``.
+
+- `rand_family `__: The Random Component Family specified as an array. You must include one family for each random component. Currently only ``rand_family=["gaussian"]`` is supported.
+
+- `plug_values `__: (Applicable only if ``missing_values_handling="PlugValues"``) Specify a single-row frame containing values that will be used to impute missing values of the training/validation frame.
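For example, a minimal Python sketch of putting these parameters together (illustrative only: it assumes the estimator is exposed in Python as ``H2OHGLMEstimator``, and the ``students.csv`` dataset, its column names, and the parameter values are hypothetical):

.. code-block:: python

    # A sketch, not a definitive example: H2OHGLMEstimator, students.csv,
    # and the column names below are assumptions for illustration.
    import h2o
    from h2o.estimators import H2OHGLMEstimator

    h2o.init()
    train = h2o.import_file("students.csv")       # hypothetical dataset
    train["school"] = train["school"].asfactor()  # level-2 (cluster) column

    hglm_model = H2OHGLMEstimator(group_column="school",
                                  random_columns=["income"],  # predictors given random coefficients
                                  random_intercept=True,
                                  method="em",
                                  em_epsilon=0.001,
                                  seed=1234)
    hglm_model.train(x=["income", "health", "school_type"],
                     y="performance",
                     training_frame=train)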
+
+
+Common parameters
+~~~~~~~~~~~~~~~~~
+
+- `custom_metric_func `__: Specify a custom evaluation function.
+
+- `ignore_const_cols `__: Enable this option to ignore constant training columns, since no information can be gained from them. This option defaults to ``True`` (enabled).
+
+- `ignored_columns `__: (Python and Flow only) Specify the column or columns to be excluded from the model. In Flow, click the checkbox next to a column name to add it to the list of columns excluded from the model. To add all columns, click the **All** button. To remove a column from the list of ignored columns, click the X next to the column name. To remove all columns from the list of ignored columns, click the **None** button. To search for a specific column, type the column name in the **Search** field above the column list. To only show columns with a specific percentage of missing values, specify the percentage in the **Only show columns with more than 0% missing values** field. To change the selections for the hidden columns, use the **Select Visible** or **Deselect Visible** buttons.
+
+- `max_iterations `__: Specify the number of training iterations. This option defaults to ``-1``.
+
+- `max_runtime_secs `__: Maximum allowed runtime in seconds for model training. Use ``0`` (default) to disable.
+
+- `missing_values_handling `__: Specify how to handle missing values. One of: ``Skip``, ``MeanImputation`` (default), or ``PlugValues``.
+
+- `model_id `__: Specify a custom name for the model to use as a reference. By default, H2O automatically generates a destination key.
+
+- `offset_column `__: Specify a column to use as the offset; the value cannot be the same as the value for the ``weights_column``.
+
+   **Note**: Offsets are per-row "bias values" that are used during model training. For Gaussian distributions, they can be seen as simple corrections to the response (``y``) column. Instead of learning to predict the response (y-row), the model learns to predict the (row) offset of the response column. For other distributions, the offset corrections are applied in the linearized space before applying the inverse link function to get the actual response values.
+
+- `score_each_iteration `__: Enable this option to score during each iteration of the model training. This option defaults to ``False`` (disabled).
 - score_values_handling
+- `seed `__: Specify the random number generator (RNG) seed for algorithm components dependent on randomization. The seed is consistent for each H2O instance so that you can create models with the same starting conditions in alternative configurations. This option defaults to ``-1`` (time-based random number).
+- `standardize `__: Specify whether to standardize the numeric columns to have a mean of zero and unit variance. Standardization is highly recommended; if you do not use standardization, the results can include components that are dominated by variables that appear to have larger variances relative to other attributes as a matter of scale, rather than true contribution.
This option defaults to ``True`` (enabled).
+- `training_frame `__: *Required* Specify the dataset used to build the model. **NOTE**: In Flow, if you click the **Build a model** button from the ``Parse`` cell, the training frame is entered automatically.
+- `validation_frame `__: Specify the dataset used to evaluate the accuracy of the model.
+- `weights_column `__: Specify a column to use for the observation weights, which are used for bias correction. The specified ``weights_column`` must be included in the specified ``training_frame``.
+
+   *Python only*: To use a weights column when passing an H2OFrame to ``x`` instead of a list of column names, the specified ``training_frame`` must contain the specified ``weights_column``.
+
+   **Note**: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more due to the larger loss function pre-factor.
+
+- `x `__: Specify a vector containing the names or indices of the predictor variables to use when building the model. If ``x`` is missing, then all columns except ``y`` are used.
+
+- `y `__: *Required* Specify the column to use as the dependent variable.
+
+  - For a regression model, this column must be numeric (**Real** or **Int**).
+  - For a classification model, this column must be categorical (**Enum** or **String**). If the family is ``Binomial``, the dataset cannot contain more than two levels.
+

From 352db78a8a48c194467e6c6f2e4241c060c7cc75 Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Thu, 10 Oct 2024 17:44:47 -0500
Subject: [PATCH 03/22] ht/references

---
 h2o-docs/src/product/data-science/hglm.rst | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst
index 0d84f5f69e10..b624018afac3 100644
--- a/h2o-docs/src/product/data-science/hglm.rst
+++ b/h2o-docs/src/product/data-science/hglm.rst
@@ -103,3 +103,13 @@ Common parameters
 - For a regression model, this column must be numeric (**Real** or **Int**).
 - For a classification model, this column must be categorical (**Enum** or **String**). If the family is ``Binomial``, the dataset cannot contain more than two levels.
+References
+----------
+
+Ruppert, David, M. P. Wand, and R. J. Carroll. (2003). Semiparametric Regression, Chapter 4. Cambridge University Press.
+
+Raudenbush, Stephen W., and Anthony S. Bryk. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods, Second Edition. Sage Publications.
+
+Rao, C. R. (1973). Linear Statistical Inference and Its Applications. New York: Wiley.
+
+Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38.
\ No newline at end of file

From 173fa2a6d9d75648923b0da34bb426f06b4dcbca Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Thu, 10 Oct 2024 18:07:37 -0500
Subject: [PATCH 04/22] ht/pulled old intro; defining hlm to own section

---
 h2o-docs/src/product/data-science/hglm.rst | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst
index b624018afac3..b95621918a58 100644
--- a/h2o-docs/src/product/data-science/hglm.rst
+++ b/h2o-docs/src/product/data-science/hglm.rst
@@ -1,8 +1,7 @@
 Hierarchical Generalized Linear Model (HGLM)
 ============================================
 
-Hierarchical linear models (HLM) are used in situations where measurements are taken on clusters of data and there are cluster effects that can affect the coefficient values of the GLM. For instance, if we measure students' performance across multiple schools along with other predictors like family annual income, student health, and school type (public, private, religious, etc.), we suspect that students from the same school will have more similar performance than students from different schools. Therefore, we can denote the coefficient for predictor :math:`m \text{ as } \beta_{mj}`, where :math:`j` denotes the school index in our example. :math:`\beta_{0j}` denotes the intercept associated with school :math:`j`.
-
+HGLM fits generalized linear models with random effects, where the random effect can come from a conjugate exponential-family distribution (for example, Gaussian). HGLM lets you specify both fixed and random effects, which allows for fitting correlation to random effects as well as random regression models. HGLM can be used for linear mixed models and for generalized linear mixed models with random effects for a variety of links and a variety of distributions for both the outcomes and the random effects.
 
 Defining an HGLM model
 ----------------------
@@ -103,6 +102,11 @@ Common parameters
 - For a regression model, this column must be numeric (**Real** or **Int**).
 - For a classification model, this column must be categorical (**Enum** or **String**). If the family is ``Binomial``, the dataset cannot contain more than two levels.
 
+Defining an HLM
+-----------------
+
+Hierarchical linear models (HLM) are used in situations where measurements are taken on clusters of data and there are cluster effects that can affect the coefficient values of the GLM. For instance, if we measure students' performance across multiple schools along with other predictors like family annual income, student health, and school type (public, private, religious, etc.), we suspect that students from the same school will have more similar performance than students from different schools. Therefore, we can denote the coefficient for predictor :math:`m \text{ as } \beta_{mj}`, where :math:`j` denotes the school index in our example. :math:`\beta_{0j}` denotes the intercept associated with school :math:`j`.
+

From fe4d7eef7894060fdca319d627e0acad4bdc5ac6 Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Thu, 10 Oct 2024 21:09:24 -0500
Subject: [PATCH 05/22] ht/thru pg2

---
 h2o-docs/src/product/data-science/hglm.rst | 40 ++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst
index b95621918a58..28b1f81c3d44 100644
--- a/h2o-docs/src/product/data-science/hglm.rst
+++ b/h2o-docs/src/product/data-science/hglm.rst
@@ -107,6 +107,46 @@ Defining an HLM
 
 Hierarchical linear models (HLM) are used in situations where measurements are taken on clusters of data and there are cluster effects that can affect the coefficient values of the GLM. For instance, if we measure students' performance across multiple schools along with other predictors like family annual income, student health, and school type (public, private, religious, etc.), we suspect that students from the same school will have more similar performance than students from different schools. Therefore, we can denote the coefficient for predictor :math:`m \text{ as } \beta_{mj}`, where :math:`j` denotes the school index in our example. :math:`\beta_{0j}` denotes the intercept associated with school :math:`j`.
+ References ---------- From fe4d7eef7894060fdca319d627e0acad4bdc5ac6 Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Thu, 10 Oct 2024 21:09:24 -0500 Subject: [PATCH 05/22] ht/thru pg2 --- h2o-docs/src/product/data-science/hglm.rst | 40 ++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst index b95621918a58..28b1f81c3d44 100644 --- a/h2o-docs/src/product/data-science/hglm.rst +++ b/h2o-docs/src/product/data-science/hglm.rst @@ -107,6 +107,46 @@ Definining an HLM Hierarchical linear models (HLM) is used in situations where measurements are taken with clusters of data and there are effects of the cluster that can affect the coefficient values of GLM. For instance, if we measure the students' performances from multiple schools along with other predictors like family annual incomes, students' health, school type (public, private, religious, etc.), and etc., we suspect that students from the same school will have similar performances than students from different schools. Therefore, we can denote a coefficient for predictor :math:`m \text{ as } \beta_{mj}` where :math:`j` denotes the school index in our example. :math:`\beta_{0j}` denotes the intercept associated with school :math:`j`. +A level-1 HLM can be expressed as: + +.. math:: + + y_{ij} = \beta_{0j} + \sum_{m=1}^{p-1} x_{mij} \beta{mj} + \varepsilon_{ij} \quad \text{ equation 1} + +The level-2 model can be expressed as: + +.. math:: + + \beta_{0j} = \beta_{00} + u_{0j}, \beta_{mj} = \beta_{m0} + u_{mj} \quad \text{ equation 2} + +where: + +- :math:`j(=[1,2,...,J])` denotes the cluster (level-2 variable) the measurement is taken from (e.g. the school index); +- :math:`i(=1,2,...,n_j)` denotes the data index taken from within cluster :math:`j`; +- :math:`\beta_{00}` is the fixed intercept; +- :math:`\beta_{0j}` is the random intercept; +- :math:`\beta_{m0}` is the fixed coefficient for predictor :math:`m`; +- The dimension of fixed effect coefficients is :math:`p` which includes the intercept; +- :math:`u_{mj}` is the random coefficient for predictor :math:`m`. For predictors without a random coefficient, :math:`u_{mj} = 0`; +- The dimension of the random effect coefficients is :math:`q` which can include the intercept. Note that :math:`q \leq p`; +- :math:`\varepsilon_{ij} \sim N(0, \delta_e^2)`; +- :math:`u_{ij} \sim N(0, \delta_u^2)`: +- :math:`\varepsilon_{ij}, u_{mj}` are independent; +- :math:`u_{mj}, u_{m,j}` are independent if :math:`m \neq m`. + +We need to solve the following parameters: :math:`\beta_{00}, \beta_{0j}, \beta_{m0}, u_{mj}, \delta_e^2, \delta_u^2`. To do this, we use the standard linear mixed model expressed with vectors and matrices: + +.. 
math:: + + Y = X\beta + Z u + e \quad \text{ equation 3} + +where: + +- :math:`Y = \begin{bmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n_{1}1} \\ y_{12} \\ y_{22} \\ \vdots \y_{n_{2}2} \\ \vdots \\ y_{1J} \\ y_{2J} \\ \vdots \\ y_{n_{J}J} \\\end{bmatrix}` is a :math:`n(= \sum^J_{j=1} n_j)` by 1 vector where :math:`n` is the number of all independent and identically distributed observations across all clusters; +- :math:`X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_J \\\end{bmatrix}` where :math:`X_j = \begin{bmatrix} 1 & x_{11j} & x_{21j} & \cdots & x_{(p-1)1j} \\ 1 & x_{12j} & x_{22j} & \cdots & x_{(p-1)2j} \\ 1 & x_{13j} & x_{23j} & \cdots & x_{(p-1)3j} \\ \vdots & \vdots & \ddots & \cdots & \vdots \\ 1 & x_{1n_{j}j} & x_{2n_{j}j} & \cdots & x_{(p-1)n_{j}j} \\\end{bmatrix} = \begin{bmatrix} x^T_{j1} \\ x^T_{j2} \\ x^T_{j3} \\ \vdots \\ x^T_{jn_j} \\\end{bmatrix}`. We are just stacking all the :math:`X_j` across all the clusters; +- :math:`\beta = \start{bmatrix} \beta_{00} \\ \beta_{10} \\ \vdots \\ \beta_{(p-1)0}` is :math:`p` by 1 fixed coefficients vector including the intercept; +- :math:`Z = \begin{bmatrix} Z_1 & 0_{12} & 0_{13} & \cdots & 0_{1J} \\ 0_{21} & Z_2 & 0_{23} & \cdots & 0_{2J} \\ 0_{31} & 0_{32} & Z_3 & \cdots & 0_{3J} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0_{J1} & 0_{J2} & 0_{J3} & \cdots & Z_J \\\end{bmatrix}` where :math:`Z_J \text{ is a } n_j \times q` matrix, and :math:`0_{ij} n_i \times q` is a zero matrix. Therefore, :math:`Z` is a :math:`n \times (J * q)` matrix containing blocks of non-zero sub-matrices across its diagonal; + References ---------- From cba131d7004144b9a561450e388f50cc23d3e1b4 Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Fri, 11 Oct 2024 13:13:36 -0500 Subject: [PATCH 06/22] ht/first thru HLM section --- h2o-docs/src/product/data-science/hglm.rst | 23 ++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst index 28b1f81c3d44..e152375f35bf 100644 --- a/h2o-docs/src/product/data-science/hglm.rst +++ b/h2o-docs/src/product/data-science/hglm.rst @@ -146,6 +146,29 @@ where: - :math:`X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_J \\\end{bmatrix}` where :math:`X_j = \begin{bmatrix} 1 & x_{11j} & x_{21j} & \cdots & x_{(p-1)1j} \\ 1 & x_{12j} & x_{22j} & \cdots & x_{(p-1)2j} \\ 1 & x_{13j} & x_{23j} & \cdots & x_{(p-1)3j} \\ \vdots & \vdots & \ddots & \cdots & \vdots \\ 1 & x_{1n_{j}j} & x_{2n_{j}j} & \cdots & x_{(p-1)n_{j}j} \\\end{bmatrix} = \begin{bmatrix} x^T_{j1} \\ x^T_{j2} \\ x^T_{j3} \\ \vdots \\ x^T_{jn_j} \\\end{bmatrix}`. We are just stacking all the :math:`X_j` across all the clusters; - :math:`\beta = \start{bmatrix} \beta_{00} \\ \beta_{10} \\ \vdots \\ \beta_{(p-1)0}` is :math:`p` by 1 fixed coefficients vector including the intercept; - :math:`Z = \begin{bmatrix} Z_1 & 0_{12} & 0_{13} & \cdots & 0_{1J} \\ 0_{21} & Z_2 & 0_{23} & \cdots & 0_{2J} \\ 0_{31} & 0_{32} & Z_3 & \cdots & 0_{3J} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0_{J1} & 0_{J2} & 0_{J3} & \cdots & Z_J \\\end{bmatrix}` where :math:`Z_J \text{ is a } n_j \times q` matrix, and :math:`0_{ij} n_i \times q` is a zero matrix. 
Therefore, :math:`Z` is a :math:`n \times (J * q)` matrix containing blocks of non-zero sub-matrices across its diagonal; +- :math:`u = \begin{bmatrix} u_{01} \\ u_{11} \\ u_{(q-1)1} \\ u_{02} \\ u_{12} \\ \vdots \\ u_{(q-1)2} \\ \vdots \\ u_{0J} \\ u_{1J} \\ \vdots \\ u_{(q-1)J} \\\end{bmatrix} \text{ if } J * q` by 1 random effects vector and some coefficients may not have a random effect; +- :math:`e \sim N(0, \delta^2_e I_n), u \sim N (0, \delta^2_u I_{(J*q)}) \text{ where } I_n \text{ is an } n \times n \text{ and } I_{(J*q)} \text{ is an } (J*q) \times (J*q)` identity matrix; +- :math:`e,u` are independent; +- :math:`E \begin{bmatrix} u \\ e \\\end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\\end{bmatrix} , cov \begin{bmatrix} u \\ e \\\end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \\\end{bmatrix} , G = \delta^2_u I_{(J*q)} , R = \delta^2_e I_{n \cdot} E \begin{bmatrix} u \\ e \\\end{bmatrix} \text{ is a size } (J * q + n) \text{ vector }, cov \begin{bmatrix} u \\ e \\\end{bmatrix} \text{ is a } (J * q + n) \times (J * q + n)` matrix. + +In addition, we also consider the following alternate form: + +.. math:: + + Y = X\beta + e^*, e^* = Zu + e \quad \text{ equation 4} + +where: + +.. math:: + + cov(e^*) = V = ZGZ^T + R = \delta^2_u ZZ^T + \delta^2_e I_n \quad \text{ equation 5} + +We solve for :math:`\beta, u, \delta^2_u, \text{ and } \delta^2_e`. + +Estimation of parameters using machine learning estimation via EM +----------------------------------------------------------------- + +The EM algorithm addresses References ---------- From 96c8f1a00af15831a14c8fab5d6e7c41e5b9d18f Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Fri, 11 Oct 2024 14:17:51 -0500 Subject: [PATCH 07/22] ht/next section --- h2o-docs/src/product/data-science.rst | 1 + h2o-docs/src/product/data-science/hglm.rst | 45 +++++++++++++++++++--- 2 files changed, 40 insertions(+), 6 deletions(-) diff --git a/h2o-docs/src/product/data-science.rst b/h2o-docs/src/product/data-science.rst index bf6fef75163f..b7b16ff0403e 100644 --- a/h2o-docs/src/product/data-science.rst +++ b/h2o-docs/src/product/data-science.rst @@ -42,6 +42,7 @@ H2O-3 supports the following supervised algorithms: data-science/model_selection data-science/gam data-science/anova_glm + data-science/hglm data-science/gbm data-science/naive-bayes data-science/rulefit diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst index e152375f35bf..58630a45566c 100644 --- a/h2o-docs/src/product/data-science/hglm.rst +++ b/h2o-docs/src/product/data-science/hglm.rst @@ -81,7 +81,9 @@ Common parameters - `offset_column `__: Specify a column to use as the offset; the value cannot be the same as the value for the ``weights_column``. - **Note**: Offsets are per-row "bias values" that are used during model training. For Gaussian distributions, they can be seen as simple corrections to the response (``y``) column. Instead of learning to predict the response (y-row), the model learns to predict the (row) offset of the response column. For other distributions, the offset corrections are applied in the linearized space before applying the inverse link function to get the actual response values. + .. note:: + + Offsets are per-row "bias values" that are used during model training. For Gaussian distributions, they can be seen as simple corrections to the response (``y``) column. Instead of learning to predict the response (y-row), the model learns to predict the (row) offset of the response column. 
For other distributions, the offset corrections are applied in the linearized space before applying the inverse link function to get the actual response values. - `score_each_iteration `__: Enable this option to score during each iteration of the model training. This option defaults to ``False`` (disabled). - score_values_handling @@ -93,7 +95,9 @@ Common parameters *Python only*: To use a weights column when passing an H2OFrame to ``x`` instead of a list of column names, the specified ``training_frame`` must contain the specified ``weights_column``. - **Note**: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more due to the larger loss function pre-factor. + .. note:: + + Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more due to the larger loss function pre-factor. - `x `__: Specify a vector containing the names or indices of the predictor variables to use when building the model. If ``x`` is missing, then all columns except ``y`` are used. @@ -142,11 +146,11 @@ We need to solve the following parameters: :math:`\beta_{00}, \beta_{0j}, \beta_ where: -- :math:`Y = \begin{bmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n_{1}1} \\ y_{12} \\ y_{22} \\ \vdots \y_{n_{2}2} \\ \vdots \\ y_{1J} \\ y_{2J} \\ \vdots \\ y_{n_{J}J} \\\end{bmatrix}` is a :math:`n(= \sum^J_{j=1} n_j)` by 1 vector where :math:`n` is the number of all independent and identically distributed observations across all clusters; +- :math:`Y = \begin{bmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n_{1}1} \\ y_{12} \\ y_{22} \\ \vdots \y_{n_{2}2} \\ \vdots \\ y_{1J} \\ y_{2J} \\ \vdots \\ y_{n_{J}J} \\\end{bmatrix}` is a :math:`n(= \sum^J_{j=1} n_j)` by 1 vector where :math:`n` is the number of all independent and identically distributed (i.i.d.) observations across all clusters; - :math:`X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_J \\\end{bmatrix}` where :math:`X_j = \begin{bmatrix} 1 & x_{11j} & x_{21j} & \cdots & x_{(p-1)1j} \\ 1 & x_{12j} & x_{22j} & \cdots & x_{(p-1)2j} \\ 1 & x_{13j} & x_{23j} & \cdots & x_{(p-1)3j} \\ \vdots & \vdots & \ddots & \cdots & \vdots \\ 1 & x_{1n_{j}j} & x_{2n_{j}j} & \cdots & x_{(p-1)n_{j}j} \\\end{bmatrix} = \begin{bmatrix} x^T_{j1} \\ x^T_{j2} \\ x^T_{j3} \\ \vdots \\ x^T_{jn_j} \\\end{bmatrix}`. We are just stacking all the :math:`X_j` across all the clusters; -- :math:`\beta = \start{bmatrix} \beta_{00} \\ \beta_{10} \\ \vdots \\ \beta_{(p-1)0}` is :math:`p` by 1 fixed coefficients vector including the intercept; +- :math:`\beta = \start{bmatrix} \beta_{00} \\ \beta_{10} \\ \vdots \\ \beta_{(p-1)0}` is a :math:`p` by 1 fixed coefficients vector including the intercept; - :math:`Z = \begin{bmatrix} Z_1 & 0_{12} & 0_{13} & \cdots & 0_{1J} \\ 0_{21} & Z_2 & 0_{23} & \cdots & 0_{2J} \\ 0_{31} & 0_{32} & Z_3 & \cdots & 0_{3J} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0_{J1} & 0_{J2} & 0_{J3} & \cdots & Z_J \\\end{bmatrix}` where :math:`Z_J \text{ is a } n_j \times q` matrix, and :math:`0_{ij} n_i \times q` is a zero matrix. 
Therefore, :math:`Z` is a :math:`n \times (J * q)` matrix containing blocks of non-zero sub-matrices across its diagonal;
-- :math:`u = \begin{bmatrix} u_{01} \\ u_{11} \\ u_{(q-1)1} \\ u_{02} \\ u_{12} \\ \vdots \\ u_{(q-1)2} \\ \vdots \\ u_{0J} \\ u_{1J} \\ \vdots \\ u_{(q-1)J} \\\end{bmatrix} \text{ if } J * q` by 1 random effects vector and some coefficients may not have a random effect;
+- :math:`u = \begin{bmatrix} u_{01} \\ u_{11} \\ u_{(q-1)1} \\ u_{02} \\ u_{12} \\ \vdots \\ u_{(q-1)2} \\ \vdots \\ u_{0J} \\ u_{1J} \\ \vdots \\ u_{(q-1)J} \\\end{bmatrix} \text{ is a } J * q` by 1 random effects vector and some coefficients may not have a random effect;
 - :math:`e \sim N(0, \delta^2_e I_n), u \sim N (0, \delta^2_u I_{(J*q)}) \text{ where } I_n \text{ is an } n \times n \text{ and } I_{(J*q)} \text{ is an } (J*q) \times (J*q)` identity matrix;
 - :math:`e,u` are independent;
 - :math:`E \begin{bmatrix} u \\ e \\\end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\\end{bmatrix} , cov \begin{bmatrix} u \\ e \\\end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \\\end{bmatrix} , G = \delta^2_u I_{(J*q)} , R = \delta^2_e I_{n \cdot} E \begin{bmatrix} u \\ e \\\end{bmatrix} \text{ is a size } (J * q + n) \text{ vector }, cov \begin{bmatrix} u \\ e \\\end{bmatrix} \text{ is a } (J * q + n) \times (J * q + n)` matrix.

 We solve for :math:`\beta, u, \delta^2_u, \text{ and } \delta^2_e`.
 
 Estimation of parameters using machine learning estimation via EM
 -----------------------------------------------------------------
 
-The EM algorithm addresses
+The EM algorithm addresses the problem of maximizing the likelihood by conceiving this as a problem with missing data.
+
+Model setup
+~~~~~~~~~~~
+
+Consider a combined model for each unit :math:`j`:
+
+.. math::
+
+ Y_j = A_{fj} \theta_f + A_{rj} \theta_{rj} + r_j, \theta_{rj} \sim N(0,T_j), r_j \sim N(0, \sigma^2I) \quad \text{ equation 6}
+
+where:
+
+- :math:`A_{fj} = \begin{bmatrix} x^T_{j1} \\ x^T_{j2} \\ x^T_{j3} \\ \vdots \\ x^T_{jn_j} \\\end{bmatrix}` is a known :math:`n_j \text{ by } p` matrix of level-1 predictors and :math:`x_{ji} = \begin{bmatrix} x^1_{ji} \\ x^2_{ji} \\ \vdots \\ x^{p-1}_{ji} \\ 1 \\\end{bmatrix}`;
+
+   .. note::
+
+       In general, you can place the intercept at the beginning or the end of each row of data, but we chose to put it at the end for our implementation.
+
+- :math:`\theta_f \text{ is a } p` by 1 vector of fixed coefficients;
+- :math:`A_{rj}` is usually denoted by :math:`Z_j \text{ where } Z_j = \begin{bmatrix} z^T_{j1} \\ z^T_{j2} \\ z^T_{j3} \\ \vdots \\ z^T_{jn_j} \\\end{bmatrix}`;
+
+   .. note::
+
+       We included a term for the random intercept here. However, there are cases where we do not have a random intercept, and the last element of 1 will not be there for :math:`z_{ji}`.
+
+- :math:`\theta_{rj}` represents the random coefficient and is a :math:`q` by 1 vector;
+- :math:`r_j \text{ is an } n_j`by 1 vector of level-1 random effects assumed multivariate normal in distribution with 0 mean vector, covariance matrix :math:`\sigma^2 I_n_{j\times nj} \text{ where } I_n_{j \times nj}` is the identity matrix, :math:`n_j \text{ by } n_j`;
+- :math:`j` denotes the level-2 units where :math:`j = 1,2, \cdots , J`;
+- :math:`T_j` is a symmetric positive definite matrix of size :math:`n_j \text{ by } n_j`. For simplicity, all :math:`T_j` are the same. We assume that :math:`T_j` is the same for all :math:`j = 1,2, \cdots , J`. However, we can assume that the fixed coefficients are i.i.d.
:math:`\sim N (0, \sigma^2_u I_{n_j \times n_j})` for simplicity initially and keep :math:`T_j` to be symmetric positive definite matrix as the iteration continues. References ---------- From ae2565221f45299272c3876ef9e9fc194d056b2f Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Fri, 11 Oct 2024 14:28:38 -0500 Subject: [PATCH 08/22] ht/added to available algo-params --- .../src/product/data-science/algo-params/custom_metric_func.rst | 2 +- h2o-docs/src/product/data-science/algo-params/family.rst | 2 +- .../src/product/data-science/algo-params/ignore_const_cols.rst | 2 +- .../src/product/data-science/algo-params/ignored_columns.rst | 2 +- .../src/product/data-science/algo-params/max_iterations.rst | 2 +- .../src/product/data-science/algo-params/max_runtime_secs.rst | 2 +- .../data-science/algo-params/missing_values_handling.rst | 2 +- h2o-docs/src/product/data-science/algo-params/model_id.rst | 2 +- h2o-docs/src/product/data-science/algo-params/offset_column.rst | 2 +- h2o-docs/src/product/data-science/algo-params/plug_values.rst | 2 +- h2o-docs/src/product/data-science/algo-params/rand_family.rst | 2 +- .../src/product/data-science/algo-params/random_columns.rst | 2 +- .../product/data-science/algo-params/score_each_iteration.rst | 2 +- h2o-docs/src/product/data-science/algo-params/seed.rst | 2 +- h2o-docs/src/product/data-science/algo-params/standardize.rst | 2 +- .../src/product/data-science/algo-params/training_frame.rst | 2 +- .../src/product/data-science/algo-params/validation_frame.rst | 2 +- .../src/product/data-science/algo-params/weights_column.rst | 2 +- h2o-docs/src/product/data-science/algo-params/x.rst | 2 +- h2o-docs/src/product/data-science/algo-params/y.rst | 2 +- 20 files changed, 20 insertions(+), 20 deletions(-) diff --git a/h2o-docs/src/product/data-science/algo-params/custom_metric_func.rst b/h2o-docs/src/product/data-science/algo-params/custom_metric_func.rst index f58bf37805ce..2895d9d27a8d 100644 --- a/h2o-docs/src/product/data-science/algo-params/custom_metric_func.rst +++ b/h2o-docs/src/product/data-science/algo-params/custom_metric_func.rst @@ -3,7 +3,7 @@ ``custom_metric_func`` ---------------------- -- Available in: GBM, GLM, DRF, Deeplearning, Stacked Ensembles, XGBoost +- Available in: GBM, GLM, HGLM, DRF, Deeplearning, Stacked Ensembles, XGBoost - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/family.rst b/h2o-docs/src/product/data-science/algo-params/family.rst index 29a5b28c60ba..91e6572f52a5 100644 --- a/h2o-docs/src/product/data-science/algo-params/family.rst +++ b/h2o-docs/src/product/data-science/algo-params/family.rst @@ -1,7 +1,7 @@ ``family`` ---------- -- Available in: GLM, GAM +- Available in: GLM, GAM, HGLM - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/ignore_const_cols.rst b/h2o-docs/src/product/data-science/algo-params/ignore_const_cols.rst index bb1f3865a9a9..a940534d60b3 100644 --- a/h2o-docs/src/product/data-science/algo-params/ignore_const_cols.rst +++ b/h2o-docs/src/product/data-science/algo-params/ignore_const_cols.rst @@ -1,7 +1,7 @@ ``ignore_const_cols`` --------------------- -- Available in: GBM, DRF, Deep Learning, GLM, GAM, PCA, GLRM, Naïve-Bayes, K-Means, XGBoost, Aggregator, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, XGBoost, Aggregator, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost - Hyperparameter: no 
Description diff --git a/h2o-docs/src/product/data-science/algo-params/ignored_columns.rst b/h2o-docs/src/product/data-science/algo-params/ignored_columns.rst index 3bdd8ed63e64..decc242aae60 100644 --- a/h2o-docs/src/product/data-science/algo-params/ignored_columns.rst +++ b/h2o-docs/src/product/data-science/algo-params/ignored_columns.rst @@ -1,7 +1,7 @@ ``ignored_columns`` ------------------- -- Available in: GBM, DRF, Deep Learning, GLM, GAM, PCA, GLRM, Naïve-Bayes, K-Means, XGBoost, Aggregator, CoxPH, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, XGBoost, Aggregator, CoxPH, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/max_iterations.rst b/h2o-docs/src/product/data-science/algo-params/max_iterations.rst index 1d15c6240ade..c30afb10eb3d 100644 --- a/h2o-docs/src/product/data-science/algo-params/max_iterations.rst +++ b/h2o-docs/src/product/data-science/algo-params/max_iterations.rst @@ -1,7 +1,7 @@ ``max_iterations`` ------------------ -- Available in: GLM, GAM, PCA, GLRM, K-Means, CoxPH +- Available in: GLM, GAM, HGLM, PCA, GLRM, K-Means, CoxPH - Hyperparameter: yes Description diff --git a/h2o-docs/src/product/data-science/algo-params/max_runtime_secs.rst b/h2o-docs/src/product/data-science/algo-params/max_runtime_secs.rst index 40803a1e173e..bd1f3848e7b0 100644 --- a/h2o-docs/src/product/data-science/algo-params/max_runtime_secs.rst +++ b/h2o-docs/src/product/data-science/algo-params/max_runtime_secs.rst @@ -3,7 +3,7 @@ ``max_runtime_secs`` ----------------------- -- Available in: GBM, DRF, Deep Learning, GLM, PCA, GLRM, Naïve-Bayes, K-Means, AutoML, XGBoost, Word2vec, Isolation Forest, Stacked Ensembles, Uplift DRF +- Available in: GBM, DRF, Deep Learning, GLM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, AutoML, XGBoost, Word2vec, Isolation Forest, Stacked Ensembles, Uplift DRF - Hyperparameter: yes Description diff --git a/h2o-docs/src/product/data-science/algo-params/missing_values_handling.rst b/h2o-docs/src/product/data-science/algo-params/missing_values_handling.rst index 4785d6cad8a8..bc3d18080e82 100644 --- a/h2o-docs/src/product/data-science/algo-params/missing_values_handling.rst +++ b/h2o-docs/src/product/data-science/algo-params/missing_values_handling.rst @@ -1,7 +1,7 @@ ``missing_values_handling`` --------------------------- -- Available in: Deep Learning, GLM, GAM +- Available in: Deep Learning, GLM, GAM, HGLM - Hyperparameter: yes Description diff --git a/h2o-docs/src/product/data-science/algo-params/model_id.rst b/h2o-docs/src/product/data-science/algo-params/model_id.rst index 222a4188c5b9..ce49357b1fd5 100644 --- a/h2o-docs/src/product/data-science/algo-params/model_id.rst +++ b/h2o-docs/src/product/data-science/algo-params/model_id.rst @@ -1,7 +1,7 @@ ``model_id`` ------------ -- Available in: GBM, DRF, Deep Learning, GLM, GAM, PCA, GLRM, Naïve-Bayes, K-Means, Word2Vec, Stacked Ensembles, XGBoost, Aggregator, CoxPH, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, Word2Vec, Stacked Ensembles, XGBoost, Aggregator, CoxPH, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/offset_column.rst 
b/h2o-docs/src/product/data-science/algo-params/offset_column.rst index 999c763a6694..584d7a00c390 100644 --- a/h2o-docs/src/product/data-science/algo-params/offset_column.rst +++ b/h2o-docs/src/product/data-science/algo-params/offset_column.rst @@ -1,7 +1,7 @@ ``offset_column`` ----------------- -- Available in: GBM, Deep Learning, GLM, GAM, CoxPH, XGBoost, Stacked Ensembles +- Available in: GBM, Deep Learning, GLM, GAM, HGLM, CoxPH, XGBoost, Stacked Ensembles - Hyperparameter: no diff --git a/h2o-docs/src/product/data-science/algo-params/plug_values.rst b/h2o-docs/src/product/data-science/algo-params/plug_values.rst index 9c60330da220..550662d65618 100644 --- a/h2o-docs/src/product/data-science/algo-params/plug_values.rst +++ b/h2o-docs/src/product/data-science/algo-params/plug_values.rst @@ -1,7 +1,7 @@ ``plug_values`` --------------- -- Available in: GLM, GAM +- Available in: GLM, GAM, HGLM - Hyperparameter: yes Description diff --git a/h2o-docs/src/product/data-science/algo-params/rand_family.rst b/h2o-docs/src/product/data-science/algo-params/rand_family.rst index d02187b3ffa3..93039511d074 100644 --- a/h2o-docs/src/product/data-science/algo-params/rand_family.rst +++ b/h2o-docs/src/product/data-science/algo-params/rand_family.rst @@ -1,7 +1,7 @@ ``rand_family`` --------------- -- Available in: GLM +- Available in: HGLM - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/random_columns.rst b/h2o-docs/src/product/data-science/algo-params/random_columns.rst index 7e7c53406ab5..01076d51a91f 100644 --- a/h2o-docs/src/product/data-science/algo-params/random_columns.rst +++ b/h2o-docs/src/product/data-science/algo-params/random_columns.rst @@ -1,7 +1,7 @@ ``random_columns`` ------------------ -- Available in: GLM +- Available in: HGLM - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/score_each_iteration.rst b/h2o-docs/src/product/data-science/algo-params/score_each_iteration.rst index 9a759131e6aa..28543b73f03f 100644 --- a/h2o-docs/src/product/data-science/algo-params/score_each_iteration.rst +++ b/h2o-docs/src/product/data-science/algo-params/score_each_iteration.rst @@ -3,7 +3,7 @@ ``score_each_iteration`` ------------------------ -- Available in: GBM, DRF, Deep Learning, GLM, GAM, PCA, GLRM, Naïve-Bayes, K-Means, XGBoost, Isolation Forest, Extended Isolation Forest, Uplift DRF +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, XGBoost, Isolation Forest, Extended Isolation Forest, Uplift DRF - Hyperparameter: no diff --git a/h2o-docs/src/product/data-science/algo-params/seed.rst b/h2o-docs/src/product/data-science/algo-params/seed.rst index 334db7479a1e..96f918db857f 100644 --- a/h2o-docs/src/product/data-science/algo-params/seed.rst +++ b/h2o-docs/src/product/data-science/algo-params/seed.rst @@ -1,7 +1,7 @@ ``seed`` -------- -- Available in: GBM, DRF, Deep Learning, GLM, GAM, PCA, GLRM, Naïve-Bayes, K-Means, AutoML, XGBoost, Stacked Ensembles, Isolation Forest, Target Encoding, Extended Isolation Forest, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, AutoML, XGBoost, Stacked Ensembles, Isolation Forest, Target Encoding, Extended Isolation Forest, Uplift DRF, AdaBoost - Hyperparameter: yes Description diff --git a/h2o-docs/src/product/data-science/algo-params/standardize.rst b/h2o-docs/src/product/data-science/algo-params/standardize.rst index 946fbd60c8f7..0402dea6ffa2 100644 --- 
a/h2o-docs/src/product/data-science/algo-params/standardize.rst +++ b/h2o-docs/src/product/data-science/algo-params/standardize.rst @@ -1,7 +1,7 @@ ``standardize`` --------------- -- Available in: Deep Learning, GLM, GAM, K-Means +- Available in: Deep Learning, GLM, GAM, HGLM, K-Means - Hyperparameter: yes Description diff --git a/h2o-docs/src/product/data-science/algo-params/training_frame.rst b/h2o-docs/src/product/data-science/algo-params/training_frame.rst index 76d9dd4ad91b..52929dfbb314 100644 --- a/h2o-docs/src/product/data-science/algo-params/training_frame.rst +++ b/h2o-docs/src/product/data-science/algo-params/training_frame.rst @@ -1,7 +1,7 @@ ``training_frame`` ------------------ -- Available in: GBM, DRF, Deep Learning, GLM, GAM, PCA, GLRM, Naïve-Bayes, K-Means, Word2Vec, Stacked Ensembles, AutoML, XGBoost, Aggregator, CoxPH, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, Word2Vec, Stacked Ensembles, AutoML, XGBoost, Aggregator, CoxPH, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/validation_frame.rst b/h2o-docs/src/product/data-science/algo-params/validation_frame.rst index 4725a02883be..9ca03bc88479 100644 --- a/h2o-docs/src/product/data-science/algo-params/validation_frame.rst +++ b/h2o-docs/src/product/data-science/algo-params/validation_frame.rst @@ -1,7 +1,7 @@ ``validation_frame`` -------------------- -- Available in: GBM, DRF, Deep Learning, GLM, GAM, PCA, GLRM, Naïve-Bayes, K-Means, Stacked Ensembles, AutoML, XGBoost, Uplift DRF +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, Stacked Ensembles, AutoML, XGBoost, Uplift DRF - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/weights_column.rst b/h2o-docs/src/product/data-science/algo-params/weights_column.rst index 3ecbda1ca3b6..ac06c64ef4a2 100644 --- a/h2o-docs/src/product/data-science/algo-params/weights_column.rst +++ b/h2o-docs/src/product/data-science/algo-params/weights_column.rst @@ -1,7 +1,7 @@ ``weights_column`` ------------------ -- Available in: GBM, DRF, Deep Learning, GLM, GAM, AutoML, XGBoost, CoxPH, Stacked Ensembles, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, AutoML, XGBoost, CoxPH, Stacked Ensembles, AdaBoost - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/x.rst b/h2o-docs/src/product/data-science/algo-params/x.rst index 98e27e8a5bc9..a172239d03fa 100644 --- a/h2o-docs/src/product/data-science/algo-params/x.rst +++ b/h2o-docs/src/product/data-science/algo-params/x.rst @@ -1,7 +1,7 @@ ``x`` ----- -- Available in: GBM, DRF, Deep Learning, GLM, GAM, PCA, GLRM, Naïve-Bayes, K-Means, Stacked Ensembles, AutoML, XGBoost, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, Stacked Ensembles, AutoML, XGBoost, Uplift DRF, AdaBoost - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/y.rst b/h2o-docs/src/product/data-science/algo-params/y.rst index 82c7276707d1..841f3dcacf75 100644 --- a/h2o-docs/src/product/data-science/algo-params/y.rst +++ b/h2o-docs/src/product/data-science/algo-params/y.rst @@ -1,6 +1,6 @@ ``y`` ----- -- Available in: GBM, DRF, Deep Learning, GLM, GAM, Naïve-Bayes, Stacked Ensembles, AutoML, XGBoost, Aggregator, Uplift DRF, 
AdaBoost
+- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, Naïve-Bayes, Stacked Ensembles, AutoML, XGBoost, Aggregator, Uplift DRF, AdaBoost
 - Hyperparameter: no
 
 Description

From 771edca91bceb591e4594bc1e0f11ddb9982cd0e Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Fri, 11 Oct 2024 15:13:26 -0500
Subject: [PATCH 09/22] ht/next section draft

---
 h2o-docs/src/product/data-science/hglm.rst | 44 ++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst
index 58630a45566c..549dfa0ea8d2 100644
--- a/h2o-docs/src/product/data-science/hglm.rst
+++ b/h2o-docs/src/product/data-science/hglm.rst
@@ -203,6 +203,50 @@ where:
 - :math:`j` denotes the level-2 units where :math:`j = 1,2, \cdots , J`;
 - :math:`T_j` is a symmetric positive definite matrix of size :math:`n_j \text{ by } n_j`. For simplicity, all :math:`T_j` are the same. We assume that :math:`T_j` is the same for all :math:`j = 1,2, \cdots , J`. However, we can assume that the fixed coefficients are i.i.d. :math:`\sim N (0, \sigma^2_u I_{n_j \times n_j})` for simplicity initially and keep :math:`T_j` to be symmetric positive definite matrix as the iteration continues.
+
+M step
+~~~~~~
+
+Expectation-Maximization (EM) conceives of :math:`Y_j` as the observed data with :math:`\theta_{rj}` as the missing data. Therefore, the complete data are :math:`(Y_j, \theta_{rj}), j=1, \cdots, J \text{ while } \theta_f, \sigma^2, \text{ and } T_j` are the parameters that need to be estimated. If the complete data were observed, finding the ML estimates would be simple. To estimate :math:`\theta_f`, subtract :math:`A_{rj} \theta_{rj}` from both sides of *equation 6*:
+
+.. math::
+
+ Y_j - A_{rj} \theta_{rj} = A_{fj} \theta_f + r_j \quad \text{ equation 7}
+
+and justifying the ordinary least squares (OLS) estimate:
+
+.. math::
+
+ \hat{\theta_f} = \Big( \sum^J_{j=1} A^T_{fj} A_{fj} \Big)^{-1} \sum^J_{j=1} A^T_{fj} (Y_j - A_{rj} \theta_{rj}) \quad \text{ equation 8}
+
+*Equation 8* can also be solved by multiplying *equation 7* with :math:`A^T_{fj}` and summing across the level-2 units :math:`j`.
+
+   .. note::
+
+       :math:`\sum^J_{j=1} A^T_{fj} r_j \sim 0`; rearranging the terms gives *equation 8*.
+
+Next, ML estimators for :math:`T_j` and :math:`\sigma^2` are straightforward:
+
+.. math::
+
+ \hat{T_j} = J^{-1} \sum^J_{j=1} \theta_{rj} \theta^T_{rj} \quad \text{ equation 9}
+
+.. math::
+
+ \hat{\sigma^2} = N^{-1} \sum^J_{j=1} \hat{r^T_j} \hat{r_j} = N^{-1} \sum^J_{j=1} \big( Y_j - A_{fj} \hat{\theta_f} - A_{rj} \theta_{rj} \big)^T \big( Y_j - A_{fj} \hat{\theta_{f}} - A_{rj} \theta_{rj} \big) \quad \text{ equation 10}
+
+where :math:`N = \sum^J_{j=1} n_j`.
+
+   .. note::
+
+       This reasoning defines certain complete-data sufficient statistics (CDSS), that is, statistics that would be sufficient to estimate :math:`\theta_f, T, \text{ and } \sigma^2` if the complete data were observed. These are:
+
+       .. math::
+
+           \sum^J_{j=1} A^T_{fj} A_{rj} \theta_{rj}, \sum^J_{j=1} \theta_{rj} \theta^T_{rj}, \sum^J_{j=1} Y^T_j A_{rj} \theta_{rj}, \sum^J_{j=1} \theta^T_{rj} A^T_{rj} A_{rj} \theta_{rj} \quad \text{ equation 11}.
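To make the M-step concrete, the following NumPy sketch evaluates *equations 8-10*. This is a sketch only: it assumes the random effects :math:`\theta_{rj}` are in hand, as the complete-data formulation supposes, and all names are illustrative.

.. code-block:: python

    # Sketch of the M-step formulas (equations 8-10), assuming the "missing"
    # random effects theta_r[j] are known; all names are illustrative.
    import numpy as np

    def m_step(Y, A_f, A_r, theta_r):
        """Y, A_f, A_r, theta_r are lists indexed by the J level-2 units."""
        J = len(Y)
        N = sum(len(y) for y in Y)
        # Equation 8: OLS of (Y_j - A_rj theta_rj) on A_fj for the fixed effects
        lhs = sum(A_f[j].T @ A_f[j] for j in range(J))
        rhs = sum(A_f[j].T @ (Y[j] - A_r[j] @ theta_r[j]) for j in range(J))
        theta_f = np.linalg.solve(lhs, rhs)
        # Equation 9: ML estimate of T from the random effects
        T = sum(np.outer(theta_r[j], theta_r[j]) for j in range(J)) / J
        # Equation 10: ML estimate of sigma^2 from the level-1 residuals
        resid = [Y[j] - A_f[j] @ theta_f - A_r[j] @ theta_r[j] for j in range(J)]
        sigma2 = sum(r @ r for r in resid) / N
        return theta_f, T, sigma2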
+ +E step +~~~~~~ + References ---------- From 0b3b2902ab6564b5038788d825db4fdc1a3123c3 Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Fri, 11 Oct 2024 15:42:44 -0500 Subject: [PATCH 10/22] ht/equation fixes --- h2o-docs/src/product/data-science/hglm.rst | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst index 549dfa0ea8d2..529cc198c9d8 100644 --- a/h2o-docs/src/product/data-science/hglm.rst +++ b/h2o-docs/src/product/data-science/hglm.rst @@ -146,9 +146,9 @@ We need to solve the following parameters: :math:`\beta_{00}, \beta_{0j}, \beta_ where: -- :math:`Y = \begin{bmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n_{1}1} \\ y_{12} \\ y_{22} \\ \vdots \y_{n_{2}2} \\ \vdots \\ y_{1J} \\ y_{2J} \\ \vdots \\ y_{n_{J}J} \\\end{bmatrix}` is a :math:`n(= \sum^J_{j=1} n_j)` by 1 vector where :math:`n` is the number of all independent and identically distributed (i.i.d.) observations across all clusters; +- :math:`Y = \begin{bmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n_{1}1} \\ y_{12} \\ y_{22} \\ \vdots \\ y_{n_{2}2} \\ \vdots \\ y_{1J} \\ y_{2J} \\ \vdots \\ y_{n_{J}J} \\\end{bmatrix}` is a :math:`n(= \sum^J_{j=1} n_j)` by 1 vector where :math:`n` is the number of all independent and identically distributed (i.i.d.) observations across all clusters; - :math:`X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_J \\\end{bmatrix}` where :math:`X_j = \begin{bmatrix} 1 & x_{11j} & x_{21j} & \cdots & x_{(p-1)1j} \\ 1 & x_{12j} & x_{22j} & \cdots & x_{(p-1)2j} \\ 1 & x_{13j} & x_{23j} & \cdots & x_{(p-1)3j} \\ \vdots & \vdots & \ddots & \cdots & \vdots \\ 1 & x_{1n_{j}j} & x_{2n_{j}j} & \cdots & x_{(p-1)n_{j}j} \\\end{bmatrix} = \begin{bmatrix} x^T_{j1} \\ x^T_{j2} \\ x^T_{j3} \\ \vdots \\ x^T_{jn_j} \\\end{bmatrix}`. We are just stacking all the :math:`X_j` across all the clusters; -- :math:`\beta = \start{bmatrix} \beta_{00} \\ \beta_{10} \\ \vdots \\ \beta_{(p-1)0}` is a :math:`p` by 1 fixed coefficients vector including the intercept; +- :math:`\beta = \begin{bmatrix} \beta_{00} \\ \beta_{10} \\ \vdots \\ \beta_{(p-1)0} \\\end{bmatrix}` is a :math:`p` by 1 fixed coefficients vector including the intercept; - :math:`Z = \begin{bmatrix} Z_1 & 0_{12} & 0_{13} & \cdots & 0_{1J} \\ 0_{21} & Z_2 & 0_{23} & \cdots & 0_{2J} \\ 0_{31} & 0_{32} & Z_3 & \cdots & 0_{3J} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0_{J1} & 0_{J2} & 0_{J3} & \cdots & Z_J \\\end{bmatrix}` where :math:`Z_J \text{ is a } n_j \times q` matrix, and :math:`0_{ij} n_i \times q` is a zero matrix. Therefore, :math:`Z` is a :math:`n \times (J * q)` matrix containing blocks of non-zero sub-matrices across its diagonal; - :math:`u = \begin{bmatrix} u_{01} \\ u_{11} \\ u_{(q-1)1} \\ u_{02} \\ u_{12} \\ \vdots \\ u_{(q-1)2} \\ \vdots \\ u_{0J} \\ u_{1J} \\ \vdots \\ u_{(q-1)J} \\\end{bmatrix} \text{ is a } J * q` by 1 random effects vector and some coefficients may not have a random effect; - :math:`e \sim N(0, \delta^2_e I_n), u \sim N (0, \delta^2_u I_{(J*q)}) \text{ where } I_n \text{ is an } n \times n \text{ and } I_{(J*q)} \text{ is an } (J*q) \times (J*q)` identity matrix; @@ -199,7 +199,7 @@ where: We included a term for the random intercept here. However, there are cases where we do not have a random intercept, and the last element of 1 will not be there for :math:`z_{ji}`. 
- :math:`\theta_{rj}` represents the random coefficient and is a :math:`q` by 1 vector; -- :math:`r_j \text{ is an } n_j`by 1 vector of level-1 random effects assumed multivariate normal in distribution with 0 mean vector, covariance matrix :math:`\sigma^2 I_n_{j\times nj} \text{ where } I_n_{j \times nj}` is the identity matrix, :math:`n_j \text{ by } n_j`; +- :math:`r_j \text{ is an } n_j` by 1 vector of level-1 random effects assumed multivariate normal in distribution with 0 mean vector, covariance matrix :math:`\sigma^2 I_{n_{j\times nj}} \text{ where } I_{n_{j \times nj}}` is the identity matrix, :math:`n_j \text{ by } n_j`; - :math:`j` denotes the level-2 units where :math:`j = 1,2, \cdots , J`; - :math:`T_j` is a symmetric positive definite matrix of size :math:`n_j \text{ by } n_j`. For simplicity, all :math:`T_j` are the same. We assume that :math:`T_j` is the same for all :math:`j = 1,2, \cdots , J`. However, we can assume that the fixed coefficients are i.i.d. :math:`\sim N (0, \sigma^2_u I_{n_j \times n_j})` for simplicity initially and keep :math:`T_j` to be symmetric positive definite matrix as the iteration continues. @@ -247,6 +247,10 @@ where :math:`N = \sum^J_{j=1} n_j`. E step ~~~~~~ + + + + References ---------- From e509afb3442a9357a59f41142fffc3d71c56628d Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Fri, 11 Oct 2024 16:10:48 -0500 Subject: [PATCH 11/22] ht/next section --- h2o-docs/src/product/data-science/hglm.rst | 31 +++++++++++++++++++--- 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst index 529cc198c9d8..dda4f6898ff8 100644 --- a/h2o-docs/src/product/data-science/hglm.rst +++ b/h2o-docs/src/product/data-science/hglm.rst @@ -172,7 +172,7 @@ We solve for :math:`\beta, u, \delta^2_u, \text{ and } \delta^2_e`. Estimation of parameters using machine learning estimation via EM ----------------------------------------------------------------- -The EM algorithm addresses the problem of maximizing the likelihood by conceiving this as a problem with missing data. +The Expectation-Maximization (EM) algorithm addresses the problem of maximizing the likelihood by conceiving this as a problem with missing data. Model setup ~~~~~~~~~~~ @@ -203,10 +203,10 @@ where: - :math:`j` denotes the level-2 units where :math:`j = 1,2, \cdots , J`; - :math:`T_j` is a symmetric positive definite matrix of size :math:`n_j \text{ by } n_j`. For simplicity, all :math:`T_j` are the same. We assume that :math:`T_j` is the same for all :math:`j = 1,2, \cdots , J`. However, we can assume that the fixed coefficients are i.i.d. :math:`\sim N (0, \sigma^2_u I_{n_j \times n_j})` for simplicity initially and keep :math:`T_j` to be symmetric positive definite matrix as the iteration continues. -M step +M-step ~~~~~~ -Expectation-Maximization (EM) conceives of :math:`Y_j` as the observed data with :math:`\theta_{rj}` as the missing data. Therefore, the complete data are :math:`(Y_j, \theta_{rj}), j=1, \cdots, J \text{ while } \theta_f, \sigma^2, \text{ and } T_j` are the parameters that need to be estimated. If the complete data were observed, finding the ML estimates will be simple. To estimate :math:`\theta_f`, subtract :math:`A_{rj} \theta_{rj}` from both sides of *equation 6*: +EM conceives of :math:`Y_j` as the observed data with :math:`\theta_{rj}` as the missing data. 
Therefore, the complete data are :math:`(Y_j, \theta_{rj}), j=1, \cdots, J \text{ while } \theta_f, \sigma^2, \text{ and } T_j` are the parameters that need to be estimated. If the complete data were observed, finding the ML estimates would be simple. To estimate :math:`\theta_f`, subtract :math:`A_{rj} \theta_{rj}` from both sides of *equation 6*:
 
 .. math::
 
@@ -244,10 +244,33 @@ where :math:`N = \sum^J_{j=1} n_j`.
 
     \sum^J_{j=1} A^T_{fj} A_{rj} \theta_{rj}, \sum^J_{j=1} \theta_{rj} \theta^T_{rj}, \sum^J_{j=1} Y^T_j A_{rj} \theta_{rj}, \sum^J_{j=1} \theta^T_{rj} A^T_{rj} A_{rj} \theta_{rj} \quad \text{ equation 11}.
 
-E step
+E-step
 ~~~~~~
 
+While the CDSS are not observed, they can be estimated by their conditional expectations given the data :math:`Y` and parameter estimates from the previous iterations. `Dempster et al. <#references>`__ showed that substituting the expected CDSS into the M-step formulas produces new parameter estimates with a higher likelihood than the current estimates.
+To find :math:`E(CDSS | Y, \theta_f, T, \sigma^2)` requires deriving the conditional distribution of the missing data :math:`\theta_r`, given :math:`Y, \theta_f, T, \sigma^2`. From *equation 6*, the joint distribution of the complete data is:
+
+.. math::
+
+   \begin{pmatrix} Y_j \\ \theta_{rj} \\\end{pmatrix} \sim N \Bigg[ \begin{pmatrix} A_{fj} \theta_{f} \\ 0 \\\end{pmatrix} , \begin{pmatrix} A_{rj}T_jA^T_{rj} + \sigma^2 I_{n_j} & A_{rj}T_j \\ T_j A^T_{rj} & T_j \\\end{pmatrix} \Bigg] \quad \text{ equation 12}
+
+From *equation 12*, we can derive the conditional distribution of the missing data given the complete data as follows:
+
+.. math::
+
+   \theta_{rj} | Y, \theta_f, T_j, \sigma^2 \sim N (\theta^*_{rj}, \sigma^2 C_j^{-1}) \quad \text{ equation 13}
+
+with
+
+.. math::
+
+   \theta^*_{rj} = C^{-1}_j A^T_{rj} (Y_j - A_{fj} \theta_f) \quad \text{ equation 14}
+
+   C_j = A^T_{rj} A_{rj} + \sigma^2 T^{-1}_j \quad \text{ equation 15}
+
+Generate a random positive definite matrix
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From 993ad2883873a32b28433070b04ef9e51c1f13bb Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Fri, 11 Oct 2024 17:04:16 -0500
Subject: [PATCH 12/22] ht/weekend pause

---
 h2o-docs/src/product/data-science/hglm.rst | 48 ++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst
index dda4f6898ff8..97e3d33f099c 100644
--- a/h2o-docs/src/product/data-science/hglm.rst
+++ b/h2o-docs/src/product/data-science/hglm.rst
@@ -272,6 +272,54 @@ Generate a random positive definite matrix
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+To randomly generate a symmetric positive definite matrix, do the following:
+
+1. Generate an :math:`n \text{ by } n \text{ matrix } B` with uniform numbers from -1 to 1;
+2. Set :math:`A = 0.5 * (B + B^T)`.
+
+The final symmetric positive definite matrix is :math:`T = A + 2 * I_n`.
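+
+The following NumPy sketch illustrates this recipe (an editorial example only; the helper name is not part of any H2O API):
+
+.. code-block:: python
+
+   import numpy as np
+
+   def random_spd_matrix(n, seed=None):
+       """Random symmetric matrix, shifted so it is positive definite."""
+       rng = np.random.default_rng(seed)
+       B = rng.uniform(-1.0, 1.0, size=(n, n))  # step 1: entries uniform in [-1, 1]
+       A = 0.5 * (B + B.T)                      # step 2: symmetrize
+       return A + 2.0 * np.eye(n)               # T = A + 2 * I_n
+
+   T = random_spd_matrix(4, seed=42)
+   # For the small dimensions used for T, the +2*I_n shift keeps the
+   # eigenvalues positive in practice; inspect them to confirm:
+   print(np.linalg.eigvalsh(T))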
+
+Log-likelihood for HGLM
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The model for level-2 unit :math:`j` can be written as:
+
+.. math::
+
+   Y_j = A_{fj} \theta_f + d_j = X_j \theta_f + d_j, \quad d_j \sim N(0,V_j)
+
+where:
+
+- :math:`Y_j \text{ is an } n_j` by 1 outcome vector;
+- :math:`A_{fj} / X_j = \begin{bmatrix} x^T_{j1} \\ x^T_{j2} \\ x^T_{j3} \\ \vdots \\ x^T_{jn_{j}} \\\end{bmatrix}` is a known :math:`n_j \text{ by } p` matrix of level-1 predictors and :math:`x_{ji} = \begin{bmatrix} x^1_{ji} \\ x^2_{ji} \\ \vdots \\ x^{p-1}_{ji} \\ 1 \\\end{bmatrix}`;
+- :math:`\theta_f \text{ is a } p` by 1 vector of fixed effects;
+- :math:`d_j = A_{rj} \theta_{rj} + r_j = Z_j \theta_{rj} + r_j , A_{rj} / Z_j \text{ is } n_j \text{ by } q`;
+- :math:`\theta_{rj} \sim N(0,T), \theta_{rj} \text{ is } q` by 1, :math:`T \text{ is } q \text{ by } q`;
+- :math:`r_j \sim N(0, \sigma^2 I_{n_j}), I_{n_j} \text{ is } n_j \text{ by } n_j`;
+- :math:`V_j = A_{rj} TA^T_{rj} + \sigma^2 I_{n_j} = Z_j TZ^T_j + \sigma^2 I_{n_j} \text{ is } n_j \text{ by } n_j`.
+
+For each level-2 unit :math:`j`, the likelihood can be written as:
+
+.. math::
+
+   L(Y_j; \theta_f, \sigma^2, T_j) = (2 \pi)^{-n_{j} /2} |V_j |^{-1/2} \exp \{ -\frac{1}{2} d^T_j V^{-1}_j d_j\}
+
+The log-likelihood is:
+
+.. math::
+
+   ll(Y_j; \theta_f, \sigma^2 , T_j) = -\frac{1}{2} \Big( n_j \log{(2 \pi)} + \log{(|V_j|)} + (Y_j - X_j \theta_f)^T V^{-1}_j (Y_j - X_j \theta_f) \Big)
+
+Since we assume that random effects are i.i.d., the total log-likelihood is just the sum of the log-likelihoods of the level-2 units. Let :math:`T=T_j`:
+
+.. math::
+
+   ll(Y; \theta_f, \sigma^2, T) \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad
+
+   = \sum^J_{j=1} \Big\{ - \frac{1}{2} \big( n_j \log{(2 \pi)} + \log{(|V_j|)} + (Y_j - X_j \theta_f)^T V^{-1}_j (Y_j - X_j \theta_f) \big) \Big\} =
+
+   -\frac{1}{2} n \log{(2 \pi)} -\frac{1}{2} \Big\{ \sum^J_{j=1} \big( \log{(|V_j|)} + (Y_j - X_j \theta_f)^T V^{-1}_j (Y_j - X_j \theta_f) \big) \Big\}

From 79a5b389cafb293a85e7a478f46f4b42c91e902 Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Mon, 14 Oct 2024 13:11:32 -0500
Subject: [PATCH 13/22] ht/draft log-likelihood

---
 h2o-docs/src/product/data-science/hglm.rst | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst
index 97e3d33f099c..13fb892509ba 100644
--- a/h2o-docs/src/product/data-science/hglm.rst
+++ b/h2o-docs/src/product/data-science/hglm.rst
@@ -320,7 +320,27 @@ Since we assume that random effects are i.i.d., the total log-likelihood is just
 
    -\frac{1}{2} n \log{(2 \pi)} -\frac{1}{2} \Big\{ \sum^J_{j=1} \big( \log{(|V_j|)} + (Y_j - X_j \theta_f)^T V^{-1}_j (Y_j - X_j \theta_f) \big) \Big\}
 
+:math:`|V_j|` can be calculated as:
 
+.. math::
+
+   |V_j| = \Big|Z_j TZ^T_j + \sigma^2 I_{n_j} \Big| = \Big|T^{-1} + \frac{1}{\sigma^2} Z^T_j Z_j \Big| |T| \Big| \sigma^2 I_{n_j} \Big| = \sigma^{2n_j} \Big| T^{-1} + \frac{1}{\sigma^2} Z^T_j Z_j \Big| |T|
+
+where: :math:`V^{-1}_j = \frac{1}{\sigma^2} I_{n_j} - \frac{1}{\sigma^4} Z_j \Big( T^{-1} + \frac{1}{\sigma^2} Z^T_j Z_j \Big)^{-1} Z^T_j`
+
+:math:`(Y_j - X_j \theta_f)^T V_j^{-1} (Y_j - X_j \theta_f)` can be calculated as:
+
+.. math::
+
+   (Y_j - X_j \theta_f)^T V_j^{-1} (Y_j - X_j \theta_f) = \frac{1}{\sigma^2} (Y_j - X_j \theta_f)^T (Y_j - X_j \theta_f) - \frac{1}{\sigma^4} (Y_j - X_j \theta_f)^T Z_j (T^{-1} + \frac{1}{\sigma^2} Z^T_j Z_j)^{-1} Z^T_j (Y_j - X_j \theta_f)
+
+The final log-likelihood is:
+
+.. math::
+
+   ll(Y; \theta_f, \sigma^2, T) = - \frac{1}{2} n \log{(2 \pi)} - \frac{1}{2} \Big\{ \sum^J_{j=1} \big( \log{(|V_j|)} + \frac{1}{\sigma^2} (Y_j - X_j \theta_f)^T (Y_j - X_j \theta_f)
+
+   - \frac{1}{\sigma^4} (Y_j - X_j \theta_f)^T Z_j \big(T^{-1} + \frac{1}{\sigma^2} Z^T_j Z_j \big)^{-1} Z^T_j (Y_j - X_j \theta_f) \big) \Big\} \quad \quad \quad \quad

From 3e8d086a2505934f0d53c0d66cab48451ea98115 Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Mon, 14 Oct 2024 15:27:59 -0500
Subject: [PATCH 14/22] ht/alternate log-likelihood & updated refs

---
 h2o-docs/src/product/data-science/hglm.rst | 67 ++++++++++++++++++++--
 1 file changed, 63 insertions(+), 4 deletions(-)

diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst
index 13fb892509ba..4945e2068d04 100644
--- a/h2o-docs/src/product/data-science/hglm.rst
+++ b/h2o-docs/src/product/data-science/hglm.rst
@@ -342,13 +342,72 @@ The final log-likelihood is:
 
    - \frac{1}{\sigma^4} (Y_j - X_j \theta_f)^T Z_j \big(T^{-1} + \frac{1}{\sigma^2} Z^T_j Z_j \big)^{-1} Z^T_j (Y_j - X_j \theta_f) \big) \Big\} \quad \quad \quad \quad
 
+Alternate log-likelihood for HGLM
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+According to *equation 3*, if you write :math:`Y = X \beta + e^* \text{ where } e^* = ZU + \varepsilon`, then you will have :math:`cov(Y) = cov(e^*) = V = ZGZ^T + R = ZGZ^T + \delta^2_e I_n \text{ and } E(Y) = X\beta`. Note that:
+
+.. math::
+
+   G_{Jq \times Jq} = \begin{bmatrix} T_j & 0_{q \times q} & 0_{q \times q} & \cdots & 0_{q \times q} \\ 0_{q \times q} & T_j & 0_{q \times q} & \cdots & 0_{q \times q} \\ 0_{q \times q} & \cdots & T_j & \cdots & 0_{q \times q} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0_{q \times q} & 0_{q \times q} & 0_{q \times q} & \cdots & T_j \\\end{bmatrix}
+
+The log-likelihood is:
+
+.. math::
+
+   l(\theta_f, V) = - \frac{1}{2} \Big\{ n \log{(2 \pi)} + \log{(|V|)} + (Y - X \theta_f)^T V^{-1} (Y - X \theta_f) \Big\} \quad \text{ equation 16}
+
+.. note::
+
+   You need to find :math:`|V| \text{ and } V^{-1}` (which are gigantic matrices). Therefore, you need to use the matrix determinant lemma to calculate :math:`|V|` and the Woodbury matrix identity to calculate :math:`V^{-1}`.
+
+Matrix determinant lemma
+''''''''''''''''''''''''
+
+From `[5] <#references>`__, :math:`|V|` can be calculated as:
+
+.. math::
+
+   |V| = |ZGZ^T + R| = |G^{-1} + Z^T R^{-1}Z| |G| |R| = \sigma^{2n}_e \Big| G^{-1} + \frac{1}{\sigma^2_e} Z^T Z \Big| |G|
+
+Woodbury matrix identity
+''''''''''''''''''''''''
+
+From `[6] <#references>`__, :math:`V^{-1}` can be calculated as:
+
+.. math::
+
+   V^{-1} = R^{-1} - R^{-1} Z(G^{-1} + Z^T R^{-1}Z)^{-1} Z^T R^{-1} = \frac{1}{\sigma^2_e} I_{n \times n} - \frac{1}{\sigma^4_e} Z \Big( G^{-1} + \frac{1}{\sigma^2_e} Z^T Z \Big)^{-1} Z^T
+
+Combine the Woodbury matrix identity and matrix determinant lemma
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Substitute these two equations into *equation 16* to be able to calculate the log-likelihood. Let's walk through the most difficult part to calculate: :math:`(Y - X\theta_f)^T V^{-1} (Y - X\theta_f)`:
+
+.. math::
+
+   (Y - X\theta_f)^T V^{-1} (Y - X \theta_f) = \frac{1}{\sigma^2_e} (Y - X \theta_f)^T (Y - X\theta_f) - \frac{1}{\sigma^4_e} (Y - X\theta_f)^T Z \Big( G^{-1} + \frac{1}{\sigma^2_e} Z^T Z \Big)^{-1} Z^T (Y - X \theta_f)
+
+During the calculation process, you need to calculate the following:
+
+- :math:`(Y - X\theta_f)^T (Y - X\theta_f) = \sum^n_{i=1} (y_i - x^T_i \theta_f)^2`;
+- :math:`(Y - X\theta_f)^T Z = \Big[ \sum^{n_1}_{i=1} \big( y_{1i} - x^T_{1i} \theta_f \big) z^T_{1i} \quad \sum^{n_2}_{i=1} \big( y_{2i} - x^T_{2i} \theta_f \big) z^T_{2i} \quad \sum^{n_3}_{i=1} \big( y_{3i} - x^T_{3i} \theta_f \big) z^T_{3i} \quad \cdots \quad \sum^{n_J}_{i=1} \big( y_{Ji} - x^T_{Ji} \theta_f \big) z^T_{Ji} \Big]`;
+- :math:`Z^TZ = \begin{bmatrix} \sum^{n_1}_{i=1} z_{1i} z^T_{1i} & 0_{q \times q} & \cdots & 0_{q \times q} \\ 0_{q \times q} & \sum^{n_2}_{i=1} z_{2i} z^T_{2i} & \cdots & 0_{q \times q} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{q \times q} & 0_{q \times q} & 0_{q \times q} & \sum^{n_J}_{i=1} z_{Ji} z^T_{Ji} \\\end{bmatrix}`
+
+   - where :math:`\sum^J_{i=1} n_i = n`.
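+
+As an editorial illustration (not part of the H2O API), the following NumPy sketch evaluates *equation 16* using the matrix determinant lemma and the Woodbury identity, so only a :math:`Jq \times Jq` matrix is ever inverted:
+
+.. code-block:: python
+
+   import numpy as np
+
+   def hglm_log_likelihood(Y, X, Z, theta_f, G, sigma2_e):
+       """Equation 16 via the determinant lemma and Woodbury identity."""
+       n = Y.shape[0]
+       r = Y - X @ theta_f                        # residual Y - X*theta_f
+       M = np.linalg.inv(G) + (Z.T @ Z) / sigma2_e
+       # log|V| = n*log(sigma_e^2) + log|G^-1 + Z'Z/sigma_e^2| + log|G|
+       logdet_V = (n * np.log(sigma2_e)
+                   + np.linalg.slogdet(M)[1]
+                   + np.linalg.slogdet(G)[1])
+       # r' V^-1 r = r'r/sigma_e^2 - (Z'r)' M^-1 (Z'r)/sigma_e^4 (Woodbury)
+       Ztr = Z.T @ r
+       quad = (r @ r) / sigma2_e - (Ztr @ np.linalg.solve(M, Ztr)) / sigma2_e**2
+       return -0.5 * (n * np.log(2.0 * np.pi) + logdet_V + quad)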
 
 References
 ----------
 
-David Ruppert, M. P. Wand and R. J. Carroll, Semiparametric Regression, Chapter 4, Cambridge University Press, 2003.
+[1] David Ruppert, M. P. Wand and R. J. Carroll, Semiparametric Regression, Chapter 4, Cambridge University Press, 2003.
+
+[2] Stephen W. Raudenbush, Anthony S. Bryk, Hierarchical Linear Models Applications and Data Analysis Methods, Second Edition, Sage Publications, 2002.
+
+[3] Rao, C. R. (1973). Linear Statistical Inference and Its Applications. New York: Wiley.
 
-Stephen w. Raudenbush, Anthony S. Bryk, Hierarchical Linear Models Applications and Data Analysis Methods, Second Edition, Sage Publications, 2002.
+[4] Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-8.
 
-Rao, C. R. (1973). Linear Statistical Inference and Its Applications. New York: Wiley.
+[5] Matrix determinant lemma: https://en.wikipedia.org/wiki/Matrix_determinant_lemma.
 
-Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Seires B, 39, 1-8. \ No newline at end of file
+[6] Woodbury matrix identity: https://en.wikipedia.org/wiki/Woodbury_matrix_identity. \ No newline at end of file

From bbae761df0f57ea3bfd025413dcf199929102b74 Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Mon, 14 Oct 2024 15:56:15 -0500
Subject: [PATCH 15/22] ht/pulling hglm from algo-params & glm page

---
 .../product/data-science/algo-params/hglm.rst |  81 ------
 h2o-docs/src/product/data-science/glm.rst     | 253 ------------------
 h2o-docs/src/product/data-science/hglm.rst    |  17 +-
 3 files changed, 13 insertions(+), 338 deletions(-)
 delete mode 100644 h2o-docs/src/product/data-science/algo-params/hglm.rst

diff --git a/h2o-docs/src/product/data-science/algo-params/hglm.rst b/h2o-docs/src/product/data-science/algo-params/hglm.rst
deleted file mode 100644
index 24cbf686b56e..000000000000
--- a/h2o-docs/src/product/data-science/algo-params/hglm.rst
+++ /dev/null
@@ -1,81 +0,0 @@
-``HGLM``
---------
-
-- Available in: GLM
-- Hyperparameter: no
-
-Description
-~~~~~~~~~~~
-
-Generalized Linear Models (GLM) estimate regression models for outcomes following exponential distributions.
Hierarchical GLM (HGLM) fits generalized linear models with random effects, where the random effect can come from a conjugate exponential-family distribution (for example, Gaussian). HGLM allows you to specify both fixed and random effects, which allows fitting correlated to random effects as well as random regression models. - -HGLM produces estimates for fixed effects, random effects, variance components and their standard errors. It also produces diagnostics, such as variances and leverages. - -The ``HGLM`` option allows you to build a hierarchical generalized linear model. This option is disabled by default. - -**Note**: This initial release of HGLM supports only Gaussian for ``family`` and ``rand_family``. - -Related Parameters -~~~~~~~~~~~~~~~~~~ - -- `random_columns `__ -- `rand_family `__ - -Example -~~~~~~~ - -.. tabs:: - .. code-tab:: r R - - library(h2o) - h2o.init() - - # Import the semiconductor dataset - h2odata <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/semiconductor.csv") - - # Set the response, predictor, and random columns - yresp <- "y" - xlist <- c("x1", "x3", "x5", "x6") - z <- c(1) - - # Convert the "Device" column to a factor - h2odata$Device <- h2o.asfactor(h2odata$Device) - - # Train and view the model - m11H2O <- h2o.glm(x = xlist, - y = yresp, - family = "gaussian", - rand_family = c("gaussian"), - rand_link = c("identity"), - training_frame = h2odata, - HGLM = TRUE, - random_columns = z, - calc_like = TRUE) - print(m11H2O) - - .. code-tab:: python - - import h2o - from h2o.estimators.glm import H2OGeneralizedLinearEstimator - h2o.init() - - # Import the semiconductor dataset - h2o_data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/semiconductor.csv") - - # Set the response, predictor, and random columns - y = "y" - x = ["x1","x3","x5","x6"] - z = 0 - - # Convert the "Device" column to a factor - h2o_data["Device"] = h2o_data["Device"].asfactor() - - # Train and view the model - h2o_glm = H2OGeneralizedLinearEstimator(HGLM=True, - family="gaussian", - rand_family=["gaussian"], - random_columns=[z], - rand_link=["identity"], - calc_like=True) - h2o_glm.train(x=x, y=y, training_frame=h2o_data) - print(h2o_glm) diff --git a/h2o-docs/src/product/data-science/glm.rst b/h2o-docs/src/product/data-science/glm.rst index 565648f504db..61a20c054ce1 100644 --- a/h2o-docs/src/product/data-science/glm.rst +++ b/h2o-docs/src/product/data-science/glm.rst @@ -69,18 +69,6 @@ Algorithm-specific parameters - `upload_custom_metric `__: Upload a custom metric into a running H2O cluster. -HGLM parameters -''''''''''''''' - -- `HGLM `__: If enabled, then an HGLM model will be built. If disabled (default), then a GLM model will be built. - -- **rand_link**: The link function for random component in HGLM specified as an array. Available options include ``identity`` and ``family_default``. - -- `random_columns `__: An array of random column indices to be used for ``HGLM``. - -- **startval**: The initial starting values for fixed and randomized coefficients in ``HGLM`` specified as a double array. - - Shared GLM family parameters '''''''''''''''''''''''''''' @@ -952,247 +940,6 @@ While not converged: i. 
Theta :math:`\gets` Maximum Likelihood estimate using Newton’s method with learning rate estimated using Golden section search -Hierarchical GLM -~~~~~~~~~~~~~~~~ - -Introduced in 3.28.0.1, Hierarchical GLM (HGLM) fits generalized linear models with random effects, where the random effect can come from a conjugate exponential-family distribution (for example, Gaussian). HGLM allows you to specify both fixed and random effects, which allows fitting correlated to random effects as well as random regression models. HGLM can be used for linear mixed models and for generalized linear mixed models with random effects for a variety of links and a variety of distributions for both the outcomes and the random effects. - -**Note**: The initial release of HGLM supports only the Gaussian family and random family. - -Gaussian Family and Random Family in HGLM -''''''''''''''''''''''''''''''''''''''''' - -To build an HGLM, we need the hierarchical log-likelihood (h-likelihood) function. The h-likelihood function can be expressed as (equation 1): - -.. math:: - - h(\beta, \theta, u) = \log(f (y|u)) + \log (f(u)) - -for fixed effects :math:`\beta`, variance components :math:`\theta`, and random effects :math:`u`. - -A standard linar mixed model can be expressed as (equation 2): - -.. math:: - - y = X\beta + Zu + e - -where - - - :math:`e \text ~ N(0, I_n, \delta_e^2), u \text ~ N(0, I_k, \delta_u^2)` - - :math:`e, u` are independent, and :math:`u` represents the random effects - - :math:`n` is the number of i.i.d observations of :math:`y` with mean :math:`0` - - :math:`q` is the number of values :math:`Z` can take - -Then rewriting equation 2 as :math:`e = X\beta + Zu - y` and derive the h-likelihood as: - -.. figure:: ../images/h-likelihood.png - :align: center - -where :math:`C_1 = - \frac{n}{2} \log(2\pi), C_2 = - \frac{q}{2} \log(2\pi)` - -In principal, the HGLM model building involves the following main steps: - -1. Set the initial values to :math:`\delta_u^2, \delta_e^2, u, \beta` -2. Estimate the fixed (:math:`\beta`) and random effects (:math:`u`) by solving for :math:`\frac{\partial h}{\partial \beta} = 0, \frac{\partial h}{\partial u} = 0` -3. Estimate variance components using the adjusted profile likelihood: - - .. math:: - - h_p = \big(h + \frac{1}{2} log \big| 2 \pi D^{-1}\big| \big)_{\beta=\hat \beta, u=\hat u} - - and solving for - - .. math:: - - \frac{\partial h_p}{\partial \theta} = 0 - - Note that :math:`D` is the matrix of the second derivatives of :math:`h` around :math:`\beta = \hat \beta, u = \hat u, \theta = (\delta_u^2, \delta_e^2)`. - -H2O Implementation -'''''''''''''''''' - -In reality, Lee and Nelder (see References) showed that linear mixed models can be fitted using a hierarchy of GLM by using an augmented linear model. The linear mixed model will be written as: - -.. math:: - - y = X\beta + Zu + e \\ - v = ZZ^T\sigma_u^2 + R\sigma_e^2 - -where :math:`R` is a diagonal matrix with elements given by the estimated dispersion model. The dispersion model refers to the variance part of the fixed effect model with error :math:`e`. There are cases where the dispersion model is modeled itself as :math:`exp(x_d, \beta_d)`. However, in our current version, the variance is just a constant :math:`\sigma_e^2`, and hence :math:`R` is just a scalar value. It is initialized to be the identity matrix. The model can be written as an augmented weighted linear model: - -.. math:: - - y_a = T_a \delta + e_a - -where - -.. 
figure:: ../images/hglm_augmentation.png - :align: center - -Note that :math:`q` is the number of columns in :math:`Z, 0_q` is a vector of :math:`q` zeroes, :math:`I_q` is the :math:`qxq` identity matrix. The variance-covariance matrix of the augmented residual matrix is - -.. figure:: ../images/hglm_variance_covariance.png - :align: center - -Fixed and Random Coefficients Estimation -'''''''''''''''''''''''''''''''''''''''' - -The estimates for :math:`\delta` from weighted least squares are given by solving - -.. math:: - - T_a^T W^{-1} T_a \delta=T_a^T W^{-1} y_a - -where - -.. math:: - - W= V(e_a ) - -The two variance components are estimated iteratively by applying a gamma GLM to the residuals :math:`e_i^2,u_i^2`. Because we are not using a dispersion model, there is only an intercept terms in the linear predictors. The leverages :math:`h_i` for these models are calculated from the diagonal elements of the hat matrix: - -.. math:: - - H_a=T_a (T_a^T W^{-1} T_a )^{-1} T_a^T W^{-1} - -Estimation of Fixed Effect Dispersion Parameter/Variance -'''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -A gamma GLM is used to fit the dispersion part of the model with response -:math:`y_{d,i}=(e_i^2)⁄(1-h_i )` where :math:`E(y_d )=u_d` and :math:`u_d≡\phi` (i.e., :math:`\delta_e^2` for a Gaussian response). The GLM model for the dispersion parameter is then specified by the link function :math:`g_d (.)` and the linear predictor :math:`X_d \beta_d` with prior weights for :math:`(1-h_i )⁄2` for :math:`g_d (u_d )=X_d \beta_d`. Because we are not using a dispersion model, :math:`X_d \beta_d` will only contain the intercept term. - -Estimation of Random Effect Dispersion Parameter/Variance -''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -Similarly, a gamma GLM is fitted to the dispersion term :math:`alpha` (i.e., :math:`\delta_e^2` for a GLM) for the random effect :math:`v`, with :math:`y_\alpha,j = u_j^2⁄(1-h_{n+j}), j=1,2,…,q` and :math:`g_\alpha (u_\alpha )=\lambda`, where the prior weights are :math:`(1-h_{n+j} )⁄2`, and the estimated dispersion term for the random effect is given by :math:`\hat \alpha = g_α^{-1}(\hat \lambda)`. - -Fitting Algorithm Overview -'''''''''''''''''''''''''' - -The following fitting algorithm from "Generalized linear models with random effects" (Y. Lee, J. A. Nelder and Y. Pawitan; see References) is used to build our HGLM. Let :math:`n` be the number of observations and :math:`k` be the number of levels in the random effect. The algorithm that was implemented here at H2O will perform the following: - -1. Initialize starting values either from user by setting parameter startval or by the system if startval is left unspecified. -2. Construct an augmented model with response :math:`y_{aug}= {y \choose {E(u)}}`. -3. Use a GLM to estimate :math:`\delta={\beta \choose u}` given the dispersion :math:`\phi` and :math:`\lambda`. Save the deviance components and leverages from the fitted model. -4. Use a gamma GLM to estimate the dispersion parameter for :math:`\phi` (i.e. :math:`\delta_e^2` for a Gaussian response). -5. Use a similar GLM as in step 4 to estimate :math:`\lambda` from the last :math:`k` deviance components and leverages obtained from the GLM in step 3. -6. Iterate between steps 3-5 until convergence. Note that the convergence measure here is either a timeout event or the following condition has been met: :math:`\frac {\Sigma_i{(\text{eta}. i - \text{eta}.o)^2}} {\Sigma_i(\text{eta}.i)^2 \text{<} 1e - 6}`. 
- -A timeout event can be defined as the following: - -1. Maximum number of iterations have been reached -2. Model building run time exceeds what is specified in ``max_runtime_secs`` -3. A user has clicked on stop model button or similar from Flow. - -For families and random families other than Gaussian, link functions are used to translate from the linear space to the model the mean output. - -Linear Mixed Model with Correlated Random Effect -'''''''''''''''''''''''''''''''''''''''''''''''' - -Let :math:`A` be a matrix with known elements that describe the correlation among the random effects. The model is now given by: - -.. figure:: ../images/hglm_linear_mixed_model1.png - :align: center - -where :math:`N` is normal distribution and :math:`MVN` is multi-variable normal. This can be easily translated to: - -.. figure:: ../images/hglm_linear_mixed_model2.png - :align: center - -where :math:`Z^* = ZL` and :math:`L` is the Cholesky factorization of :math:`A`. Hence, if you have correlated random effects, you can first perform the transformation to your data before using our HGLM implementation here. - -HGLM Model Metrics -'''''''''''''''''' - -H2O provides the following model metrics at the end of each HGLM experiment: - -- fixef: fixed effects coefficients -- ranef: random effects coefficients -- randc: vector of random column indices -- varfix: dispersion parameter of the mean model -- varranef: dispersion parameter of the random effects -- converge: true if algorithm has converge, otherwise false -- sefe: standard errors of fixed effects -- sere: standard errors of random effects -- dfrefe: deviance degrees of freedom for the mean part of model -- sumvc1: estimates and standard errors of linear predictor in the dispersion model -- summvc2: estimates and standard errors of the linear predictor for the dispersion parameter of the random effects -- likelihood: if ``calc_like`` is true, the following four values are returned: - - - hlik: log-h-likelihood; - - pvh: adjusted profile log-likelihood profiled over the random effects; - - pbvh: adjusted profile log-likelihood profiled over fixed and random effects; - - caic: conditional AIC. - -- bad: row index of the most influential observation. - -Mapping of Fitting Algorithm to the H2O-3 Implementation -'''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -This mapping is done in four steps: - -1. Initialize starting values by the system. -2. Estimate :math:`\delta =` :math:`\beta \choose u`. -3. Estimate :math:`\delta_e^2(\text {tau})`. -4. Estimate :math:`\delta_u^2(\text {phi})`. - -**Step 1**: Initialize starting values by the system. - -Following the implementation from R, when a user fails to specify starting values for psi, :math:`\beta`, :math:`\mu`, :math:`\delta_e^2`, :math:`\delta_u^2`, we will do it for the users as follows: - - 1. A GLM model is built with just the fixed columns and response. - 2. Next init_sig_e(:math:`\delta_e^2`)/tau is set to 0.6*residual_deviance()/residual_degrees_of_freedom(). - 3. init_sig_u(:math:`\delta_u^2`) is set to 0.66*init_sig_e. - 4. For numerical stability, we restrict the magnitude to init_sig_e and init_sig_u to >= 0.1. - 5. Set phi = vector of length number of random columns of value init_sig_u/(number of random columns). - 6. Set :math:`\beta` to the GLM model coefficients, :math:`\mu` to be a zero vector. - 7. Set psi to be a zero vector. - -**Step 2**: Estimate :math:`\delta =` :math:`\beta \choose u`. 
- -Given the current values of :math:`\delta_e^2, \delta_u^2`, we will solve for :math:`\delta =` :math:`\beta \choose u`. Instead of solving :math:`\delta` from :math:`T_a^T W^{-1} T_a \delta=T_a^T W^{-1} y_a`, a different set of formulae are used. A loop is used to solve for the coefficients: - - 1. The following variables are generated: - - - :math:`v.i= g_r^{-1} (u_i)` where :math:`u_i` are the random coefficients of the random effects/columns and :math:`g_r^{-1}` can be considered as the inverse link function. - - :math:`tau` is a vector of length number of data containing init.sig.e; - - :math:`eta.i=X_i \beta+offset` and store the previous :math:`eta.i` as :math:`eta.o`. - - :math:`mu.i=g^{-1} (eta.i)`. - - dmu_deta is derivative of :math:`g^{-1} (eta.i)` with respect to :math:`eta.i`, which is 1 for identity link. - - :math:`z_i=eta.i-offset+(y_i-mu.i)/\text {dmu_deta}` - - :math:`zmi= \text{psi}` - - :math:`augZ =` :math:`zi \choose zmi`. - - du_dv is the derivative of :math:`g_r^{-1} (u_i)` with respect to :math:`v.i.` Again, for identity link, this is 1. - - The weight :math:`W =` :math:`wdata \choose wpsi` where :math:`wdata = \frac {d \text{mu_deta}^2}{\text {prior_weight*family}\$\text{variance}(mu.i)*tau}` and :math:`wpsi = \frac {d \text{u_dv}^2}{\text {prior_weight*family}\$\text{variance(psi)*phi}}` - - 2. Finally the following formula is used to solve for the parameters: :math:`augXZ \cdot \delta=augZW` where :math:`augXZ=T_a \cdot W` and :math:`augZW=augZ \cdot W`: - - - Use QR decomposition to augXZ and obtain: :math:`QR \delta = augZW`. - - Use backward solve to obtain the coefficients :math:`\delta` from :math:`R \delta = Q^T augZW`. - - Calculate :math:`hv=\text{rowsum}(Q)` of length n+number of expanded and store in returnFrame. - - Calculate :math:`dev =` :math:`prior weight*(y_i-mu.i)^2 \choose (psi -u_i )^2` of length n+number of expanded random columns and store in returnFrame. - - Calculate :math:`resid= \frac {(y-mu.i)} {\sqrt \frac {sum(dev)(1-hv)}{n-p}}` of length n and store in returnFrame. - - Go back to step 1 unless :math:`\Sigma_i(eta.i-eta.o)^2 / \Sigma_i(eta.i)^2<1e-6` or a timeout event has occurred. - -**Step 3**: Estimate :math:`\delta_e^2(\text {tau})` - -With the newly estimated fixed and random coefficients, we will estimate the dispersion parameter for the fixed effects/columns by building a gamma GLM: - - 1. Generate a training frame with constant predictor column of 1 to force glm model to generate only the intercept term: - - - Response column as :math:`dev/(1-hv)`. - - Weight column as :math:`(1-hv)/2`. - - Predictor column of ones. - - The length of the training frame is the number of data rows. - - 2. Build a gamma GLM with ``family=gamma`` and ``link=log``. - 3. Set :math:`tau = \text {exp (intercept value)}`. - 4. Assign estimation standard error and sigma from the GLM standard error calculation for coefficients. - -**Step 4**: Estimate :math:`\delta_u^2(\text {phi})`. - -Again, a gamma GLM model is used here. In addition, the error estimates are generated for each random column. Exactly the same steps are used here as in Step 3. The only difference is that we are looking at the :math:`dev,hv` corresponding to the expanded random columns/effects. - .. 
_regularization: Regularization diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst index 4945e2068d04..f7149064baba 100644 --- a/h2o-docs/src/product/data-science/hglm.rst +++ b/h2o-docs/src/product/data-science/hglm.rst @@ -314,7 +314,7 @@ Since we assume that random effects are i.i.d., the total log-likelihood is just .. math:: - ll(Y; \theta_f, \sigma^2, T) \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad + ll(Y; \theta_f, \sigma^2, T) \\ = \sum^J_{j=1} \Big\{ - \frac{1}{2} \big( n_j \log{(2 \pi)} + \log{(|V_j|)} + (Y_j - X_j \theta_f)^T V^{-1}_j (Y_j - X_j \theta_f) \big) \Big\} = @@ -338,9 +338,7 @@ The final log-likelihood is: .. math:: - ll(Y; \theta_f, \sigma^2, T) = - \frac{1}{2} n \log{(2 \pi)} - \frac{1}{2} \Big\{ \sum^J_{j=1} \big( \log{(|V_j|)} + \frac{1}{\sigma^2} (Y_j - X_j \theta_f)^T (Y_j - X_j \theta_f) - - - \frac{1}{\sigma^4} (Y_j - X_j \theta_f)^T Z_j \big(T^{-1} + \frac{1}{\sigma^2} Z^T_j Z_j \big)^{-1} Z^T_j (Y_j - X_j \theta_f) \big) \Big\} \quad \quad \quad \quad + ll(Y; \theta_f, \sigma^2, T) = - \frac{1}{2} n \log{(2 \pi)} - \frac{1}{2} \Big\{ \sum^J_{j=1} \big( \log{(|V_j|)} + \frac{1}{\sigma^2} (Y_j - X_j \theta_f)^T (Y_j - X_j \theta_f) \\ - \frac{1}{\sigma^4} (Y_j - X_j \theta_f)^T Z_j \big(T^{-1} + \frac{1}{\sigma^2} Z^T_j Z_j \big)^{-1} Z^T_j (Y_j - X_j \theta_f) \big) \Big\} \quad \quad \quad Alternate log-likelihood for HGLM ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -396,6 +394,17 @@ During the calculation process, you need to calculate the following: - where :math:`\sum^J_{i=1} n_i = n`. +Complete the EM algorithm +~~~~~~~~~~~~~~~~~~~~~~~~~ + +The complete EM algorithm is as follows: + +1. Initialization: randomly assign some small values to :math:`\theta_f, \sigma^2, T_j`; +2. Estimation: estimate the CDSS: + + .. math:: + + E \big( \sum^J_{j=1} A^T_{fj} \theta_{rj} \theta_{rj} | Y, \theta_f, T_j, \sigma^2 \big) = \sum^J_{j=1} A^T_{fj} A_{rj} \theta^*_{rj} \\ E \big( \sum^J_{j=1} \theta_{rj} \theta^T_{rj} | Y, \theta_f, T_j, \sigma^2 \big) = \sum^J_{j=1} \theta^*_{rj} \theta^{*T}_{rj} + \sigma^2 \sum^J_{j=1} C^{-1}_j & \quad \text{ equation 17} \\ E \big( \sum^J_{j=1} r^T_j r_j \big) = \sum^J_{j=1} r^{*T}_j r^*_j + \sigma^2 \sum^J_{j=1} tr(C^{-1}_j A^T_{rj} A_{rj}) References ---------- From 4dd844477021a3cbe79f3e9bd496b596b6192591 Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Mon, 14 Oct 2024 16:28:35 -0500 Subject: [PATCH 16/22] ht/end piece - example stand-in - glm family image add-in for shared algo-params --- h2o-docs/src/product/data-science/glm.rst | 13 +++-- h2o-docs/src/product/data-science/hglm.rst | 57 +++++++++++---------- h2o-docs/src/product/images/HGLM.png | Bin 0 -> 9054 bytes h2o-docs/src/product/parameters.rst | 1 - 4 files changed, 39 insertions(+), 32 deletions(-) create mode 100644 h2o-docs/src/product/images/HGLM.png diff --git a/h2o-docs/src/product/data-science/glm.rst b/h2o-docs/src/product/data-science/glm.rst index 61a20c054ce1..8c0267e226d7 100644 --- a/h2o-docs/src/product/data-science/glm.rst +++ b/h2o-docs/src/product/data-science/glm.rst @@ -87,7 +87,12 @@ Shared GLM family parameters :scale: 5% :align: middle -**GLM Family**: |GAM| `Generalized Additive Models `__ (GAM) |MS| `ModelSelection `__ |ANOVA| `ANOVA GLM `__ +.. 
|HGLM| image:: ../images/HGLM.png + :alt: HGLM + :scale: 5% + :align: middle + +**GLM Family**: |GAM| `Generalized Additive Models `__ (GAM) |MS| `ModelSelection `__ |ANOVA| `ANOVA GLM `__ |HGLM| `Hierarchical Generalized Linear Model `__ (HGLM) - `alpha `__: |GAM| |MS| |ANOVA| Specify the regularization distribution between L1 and L2. A value of ``1`` produces LASSO regression; a value of ``0`` produces Ridge regression. The default value of ``alpha`` is ``0`` when ``SOLVER = 'L-BFGS'``; otherwise it is ``0.5`` to specify a mixing between LASSO and Ridge regression. @@ -101,7 +106,7 @@ Shared GLM family parameters - `compute_p_values `__: |GAM| |MS| |ANOVA| Request computation of p-values. P-values can be computed with or without regularization. Setting ``remove_collinear_columns`` is recommended. H2O will return an error if p-values are requested and there are collinear columns and ``remove_collinear_columns`` flag is not enabled. Note that this option is not available for ``family="multinomial"`` or ``family="ordinal"``; ``IRLSM`` solver requried. This option defaults to ``False`` (disabled). -- `family `__: |GAM| |MS| |ANOVA| Specify the model type. +- `family `__: |GAM| |MS| |ANOVA| |HGLM| Specify the model type. - If the family is ``gaussian``, the response must be numeric (**Real** or **Int**). - If the family is ``binomial``, the response must be categorical 2 levels/classes or binary (**Enum** or **Int**). @@ -161,7 +166,7 @@ Shared GLM family parameters - `objective_epsilon `__: |GAM| If the objective value is less than this threshold, then the model is converged. If ``lambda_search=True``, then this value defaults to ``.0001``. If ``lambda_search=False`` and ``lambda`` is equal to zero, then this value defaults to ``.000001``. For any other value of ``lambda``, the default value of ``objective_epsilon`` is set to ``.0001``. The default value is ``-1``. -- `plug_values `__: |GAM| |MS| |ANOVA| (Applicable only if ``missing_values_handling="PlugValues"``) Specify a single row frame containing values that will be used to impute missing values of the training/validation frame. +- `plug_values `__: |GAM| |MS| |ANOVA| |HGLM| (Applicable only if ``missing_values_handling="PlugValues"``) Specify a single row frame containing values that will be used to impute missing values of the training/validation frame. - `prior `__: |GAM| |MS| |ANOVA| Specify prior probability for :math:`p(y==1)`. Use this parameter for logistic regression if the data has been sampled and the mean of response does not reflect reality. This value defaults to ``-1`` and must be a value in the range (0,1). @@ -169,7 +174,7 @@ Shared GLM family parameters - `remove_collinear_columns `__: |GAM| |MS| Specify whether to automatically remove collinear columns during model-building. When enabled, collinear columns will be dropped from the model and will have 0 coefficient in the returned model. This option defaults to ``False`` (disabled). -- **score_iteration_interval**: |MS| Perform scoring for every ``score_iteration_interval`` iteration. This option defaults to ``-1``. +- **score_iteration_interval**: |MS| |HGLM| Perform scoring for every ``score_iteration_interval`` iteration. This option defaults to ``-1``. - `solver `__: |GAM| |MS| |ANOVA| Specify the solver to use. 
One of:

diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst
index f7149064baba..037a4b90eea3 100644
--- a/h2o-docs/src/product/data-science/hglm.rst
+++ b/h2o-docs/src/product/data-science/hglm.rst
@@ -30,38 +30,14 @@ Algorithm-specific parameters
 
 - `random_columns `__: An array of random column indices to be used for ``HGLM``.
 
+- `rand_family `__: The Random Component Family specified as an array. You must include one family for each random component. Currently only ``rand_family=["gaussian"]`` is supported.
+
 - **random_intercept**: If enabled, will allow random component to the GLM coefficients (defaults to ``True``).
 
 - **tau_e_var_init**: Initial variance of random noise. If set, this should provide a value > 0.0. If not set, this will be randomly set during the model building process (defaults to ``0.0``).
 
 - **tau_u_var_init**: Initial variance of random coefficient effects. If set, should provide a value > 0.0. If not set, this will be randomly set during the model building process (defaults to ``0.0``).
-
-GLM-family parameters
-~~~~~~~~~~~~~~~~~~~~~
-
-- `family `__: Specify the model type.
-
-  - If the family is ``gaussian``, the response must be numeric (**Real** or **Int**).
-  - If the family is ``binomial``, the response must be categorical 2 levels/classes or binary (**Enum** or **Int**).
-  - If the family is ``fractionalbinomial``, the response must be a numeric between 0 and 1.
-  - If the family is ``multinomial``, the response can be categorical with more than two levels/classes (**Enum**).
-  - If the family is ``ordinal``, the response must be categorical with at least 3 levels.
-  - If the family is ``quasibinomial``, the response must be numeric.
-  - If the family is ``poisson``, the response must be numeric and non-negative (**Int**).
-  - If the family is ``negativebinomial``, the response must be numeric and non-negative (**Int**).
-  - If the family is ``gamma``, the response must be numeric and continuous and positive (**Real** or **Int**).
-  - If the family is ``tweedie``, the response must be numeric and continuous (**Real**) and non-negative.
-  - If the family is ``AUTO`` (default),
-
-   - and the response is **Enum** with cardinality = 2, then the family is automatically determined as ``binomial``.
-   - and the response is **Enum** with cardinality > 2, then the family is automatically determined as ``multinomial``.
-   - and the response is numeric (**Real** or **Int**), then the family is automatically determined as ``gaussian``.
-
-- `rand_family `__: The Random Component Family specified as an array. You must include one family for each random component. Currently only ``rand_family=["gaussisan"]`` is supported.
-
-- `plug_values `__: (Applicable only if ``missing_values_handling="PlugValues"``) Specify a single row frame containing values that will be used to impute missing values of the training/validation frame.
-
 Common parameters
 ~~~~~~~~~~~~~~~~~
@@ -86,11 +62,15 @@ Common parameters
 
 Offsets are per-row "bias values" that are used during model training. For Gaussian distributions, they can be seen as simple corrections to the response (``y``) column. Instead of learning to predict the response (y-row), the model learns to predict the (row) offset of the response column. For other distributions, the offset corrections are applied in the linearized space before applying the inverse link function to get the actual response values.
- `score_each_iteration `__: Enable this option to score during each iteration of the model training. This option defaults to ``False`` (disabled).
-- score_values_handling
+
 - `seed `__: Specify the random number generator (RNG) seed for algorithm components dependent on randomization. The seed is consistent for each H2O instance so that you can create models with the same starting conditions in alternative configurations. This option defaults to ``-1`` (time-based random number).
+
 - `standardize `__: Specify whether to standardize the numeric columns to have a mean of zero and unit variance. Standardization is highly recommended; if you do not use standardization, the results can include components that are dominated by variables that appear to have larger variances relative to other attributes as a matter of scale, rather than true contribution. This option defaults to ``True`` (enabled).
+
 - `training_frame `__: *Required* Specify the dataset used to build the model. **NOTE**: In Flow, if you click the **Build a model** button from the ``Parse`` cell, the training frame is entered automatically.
+
 - `validation_frame `__: Specify the dataset used to evaluate the accuracy of the model.
+
 - `weights_column `__: Specify a column to use for the observation weights, which are used for bias correction. The specified ``weights_column`` must be included in the specified ``training_frame``. *Python only*: To use a weights column when passing an H2OFrame to ``x`` instead of a list of column names, the specified ``training_frame`` must contain the specified ``weights_column``.
 
@@ -406,6 +386,29 @@ The complete EM algorithm is as follows:
 
      E \big( \sum^J_{j=1} A^T_{fj} \theta_{rj} \theta_{rj} | Y, \theta_f, T_j, \sigma^2 \big) = \sum^J_{j=1} A^T_{fj} A_{rj} \theta^*_{rj} \\ E \big( \sum^J_{j=1} \theta_{rj} \theta^T_{rj} | Y, \theta_f, T_j, \sigma^2 \big) = \sum^J_{j=1} \theta^*_{rj} \theta^{*T}_{rj} + \sigma^2 \sum^J_{j=1} C^{-1}_j & \quad \text{ equation 17} \\ E \big( \sum^J_{j=1} r^T_j r_j \big) = \sum^J_{j=1} r^{*T}_j r^*_j + \sigma^2 \sum^J_{j=1} tr(C^{-1}_j A^T_{rj} A_{rj})
 
+   where: :math:`r^*_j = Y_j - A_{fj} \theta_f - A_{fj} \theta^*_{rj}, \theta^*_{rj} = C^{-1}_j A^T_{rj} (Y_j - A_{fj} \theta_f), C_j = A^T_{rj} A_{rj} + \sigma^2 T^{-1} \text{ and } \theta_f, \sigma^2, T` are based on the previous iteration or from initialization;
+
+3. Substitution: substitute the estimated CDSS from *equation 17* into the M-step formulas (*equations 8, 9,* and *10*);
+4. Processing: feed the new estimates of :math:`\theta_f, \sigma^2, T_j` into step 2;
+5. Cycling: continue steps 2, 3, and 4 until the following stopping conditions are satisfied:
+
+   a. Changes in the log-likelihood (*equation 16*) become sufficiently small, or
+   b. The largest change in the value of any of the parameters is sufficiently small.
+
+Examples
+--------
+
+The following are simple HGLM examples in Python and R.
+
+.. tabs::
+   .. code-tab:: python
+
+      blah
+
+   .. 
code-tab:: r R + + blah + References ---------- diff --git a/h2o-docs/src/product/images/HGLM.png b/h2o-docs/src/product/images/HGLM.png new file mode 100644 index 0000000000000000000000000000000000000000..a21e438b9172088444de21e4ba8199fe5e322329 GIT binary patch literal 9054 zcmeHtc{tSlzxRkN*;7%n@5#Q8EsQ-&D3N_e_8G?5g%CxOEfo{WE_=u>DwJd=`xay0 z*U0&d?qBD=@83DkAI~}0xz2MvKi4&v>HGaG@AdV5z2ADEud7aZ>cS}q1VX8)0W*L= z2sZH_QeyDrbKh75_(kTbVd4&fP`$^02om_I7$Fd%B?qIMn43CRp>ULopfv(zV=L(G z;tEznAadSN@YTf@W6kRA;*4~Mddu@1uYiK@_{Ty#3a42A_>n^+ZbJ=VYJV&R|B~mi z$6#EcLPB0%UV>g?f+)0|kcfS3n1M&z)62S3TEpLM?1J; z98gGB{L$7nC=ZN056|yazeGa7e_!M3fp$K=1OXSab+&b}MPl5ALbL9U{&=D4jnPCX2#X_bOWWGo3P>ZYB?QC~A|e9PGUCz#GU9O2+cuJ7A~v>& z-)H^W@=qtJ*?@y2L?uKeC1gazMWkdTq=kR)Iez(1TZ~X1aC^L33ZlPP{`<3kU;dxC z`LE{shpK-y>wlA@UH_CzcrPC(Dc--}1@N!fU;GQ6L6tl(_9(Q1hBXptZO>}t;Oy*T zjg-SL|HrGpsrO$GMgSCliNxPBdRx2xQw)N|(BpVPp^Z=|X9X2&q^C72uYoNRVT-m! z^T`SQeZ~Jt-hUv_-_`&u`cpg${l+qd&#@>Y2!xql6Q*S3oxIqqMA;xr(@<2v`be>x z@;&o)4o0N_pW8F$kcQsvIS6E2QH3N3@~_XoZ1BG%27;&|kfWo;5kLL2vwI|~LJqrlcE;ui%-d8KY9~lQz-{rA8HNLLCz9FUrP+MFuJN8eF?eL>Xav ziZ+tCCe`NB(=$T65d&6M`pXq#EZFHpaU1FC6`5+gy@1=vSL0_#n#Is{a@Bs@wyAWq zv!)EW8a>A367;9>4C%%6_ zNli@+*28ffX+HM^4YiMv#|Jt>?+KGZ`Z{D2w@Zm#te*W3=I<~Eym6*{MAM?7r zva++?nL16iemg{%jDYj!&p#D`hb?-T-ns?UIX4O3N}@T&iT z?aF5uPS=cE{cQQuU{3NXGa;5VsH{vPI5>EGWhELf_IShHh)wI;w~1L;Sax@JSw%%p zSD?n!l#~d7YuvtnI;-l-{{6D)8&Wqnx9pspYgK1MOe>H~ZEbDw$foJFOzHXl7k9Qz zoWGTvOcXR%zIv77;Naj4`3BG0`nocZaeaN=brW}uh=|ClBUMSOLRHPi);8nWXXDzn zjg6+ZXJQFR6*>l7kC2a_9|srLNxSxBC2w!(#l^*VpT63;pm0h*}J(|JfYN;?hzTAiybd@`;rdRBU%aTpZoYSFhZ* z7fqZ8iU|7p`m`!w#NytoR-T#LR6}7WghPL*8>wP)q77dCq$4$u~3>hkSQE_z4<<43krMMtY z3IyWihWmPf`-W{;Sz~a)gR$@F=iwh8zKh{dKO^5e+}nGvIhy5FK>-ytivkQ}XH`|z znT6)IHj2^FQQY$Kjn^cn`Gti$(uFwI1}i64I}6jv6ciMsUB`nukr#4va+FL=&KCdx z$j!YpT;)ODUudZghfmSfkM{MQ14oQ4cS$<-@b2#KHx3OArFOetzn(CYBHlL!1hx9| zTw-^MV&L{IzY1a)9ThIS&6hnAxa93Ik}XAIw%`Jz?P zWd}@?+)nwhzoQjlVrpu$vpf>J;}ZCsgqqn31eo{4J^CV#g&QHEdq<8=PAa;(R3NEW zyR?PaI5-G_@$aT7fb`$(mme(bH)pjh2v7Ff$$AnGX+5|WIe-D}_{xjhP0 zNp8DO@qFGpBVPCN`@6eC?)=!cZcmoEb4rIRS;n2%eWp#x#>S?QC1hcF`QyhZ#%m@f z4%5pTeC6fkz<)+nQCaKT+aC}1HWHApEI~49ibzS}Q+BOQ>?^5BVQno_{oxX-*t&_h zwLdQ^qqLL`#5^e}srRJBVSN0V>FH_Sj`Mtc($Wl|Mrl=a<{hVtS|bRrl7K^Bxub&( za*7Oyoqy`cXOKQ1mV@0Hq}Gm8=Nq8OZhuZ4Et_BXq| zU26`T)VS8wwKdx9fzM%K!~i5yQ&X?B_%7CWDb5fT!r*&cL?b@cP~eVmlU|CVD34s7&P&PNoV2Y%Q` zkEn1Ms6UVxoJVpe#Z-O12Y3N1>!np-S?>HSH*+&PoA>6;n;N-?vx-L#gz9#cUDXGh zfq7_{m;?~~JuXA#Y>JAOSJYvXOwz7b)~W41Jw0_|pW#`0DQa$R#TL|tFOPuz=JU-e z4Yn}>E2HwSE$c1x)nQ?!s4)d2V`J^yvWkju-7LD{m;f_#^R@kjLhXX+=xF)@og?79 z>!qN8gF=8PECqpUQ6b4?5C?QkPfyn&|_>`_)iy42k zPfZNX8FVVF<=*=;!ML}#=RbbxS^S$f;ypb*8o5k>1K1c=MMjdJIB|lr-yNC=fH-%1 zKCkdo?#ldpDSME#0L-WNJ391LGdCri`m9#R-^Dt9O;L=FCh~@ z-Jbk_fH3`t4v?gYB%)`p`}%Gyek)Z+AA56QRlx$!ou@C z^!Tp6o*YBlUdu}K$wVQ`b3K$XYZ;lDlZr=s(dnijDFKfqUBU#v$1dA9z-ktylJHl`=p_x^P%4|P$0*1cXa{)1pRO)b#yVJ^>pi2t^DGH zyT=lCg_6-BdM&Fxb=E0WVUFv#PL6MeRXt8acZwZ*FSU02sH&b+G`K`Vh&6JUZiy+G z2W04^hjMxiNaUXWe(hWj3`Tq)9*IIl0~*@8CF{9}+xjYyFoU2+3?e7^Z@i|dihPE}G6jc(aP{+WamjCemBC+&gn(pt zD((>5YA*x0zW3}EqFpN-50JBhfkGVVz3xa^8ER8&R&2^9OB|eeQPc*F0`jlvsVVpwqq5cSi;`0x{0%Qw9=hqZf-6%T1O8c zN3YM!Gdo1@#=eY$xsA=sX%;g8Rvs=oqDt;`-|8q0gURK-k(^e=D(#oo{MtD)!q!z9}!C zH=i9JXPtWrimM7JmFcGQjJmjVD7ecWg-SFNn%%D+1jTp$P<#12wL#YIuJ@a=GGI(M zw@$w)(vkFJ5GpFF*c2;OgjQ~Dt}CvCt#u2EEzmRE2d?E*q}qQMLNZY6SCMYoMGag# zw>(lE>-43qZG38qqll4xD&-Bk>7{u-M3j>+Yog=ohv(t1^^xvh)$;mOe#UZ~2cOrKE zMX{CUBoVD`ZJ|j?NjA-ij5q*fYG`yxD)N4hfS@2%Vcq_vwl7{YprIE19=H})R%ts)|OXhiWlXB^-pj4G(XoF37lb^Cl}hyE0l4FgAN<=Xf){V>e1jNTk)&nDt!% 
zCHxFP!hOJn#&_4IVi$}H%=T)X=H9(~7b{dPAuetO0zTG{KsrtP9SGb@0s;xw6+su1 zHM*#hE&-fclDP>O-U$dXC4N;4f)uX-a5$%_jT?KE9aiE92uqpnW5s57RhQ zI682jo}C4y>fyuC%1W6h0=I}k3#{oRE-87ck}csEgqmjQVjBUOwariKo)fl$96d8U zqGe^7w~FgSP$7uF&U`Y}x^ zn9LA0h*KKNJ6j3UzOlz;E;{glu&k`Cq|epEXXmwn=P<{(l$2;NW*{@~ ze%vk7KxTf582T)^V$@d~SNamD@#x^dzb#qDaNv2EFmhNLuo%7(;%B^_Ec~Cpd{G5m zc!z{0Gk(iZ1&RX-HC>mC(VLu{3p$$;-T!|+pddcty@*e!xN`J&EwdW$PF5$q zr@{neWMtBQ+m|Js2OuaED&!&wWO#VE$al*T6#EH~&w%GYE?YQz4KIHXZzz}c6INMSM$qbtw20Ky)U1Muz+{xvXhQ7;SO)&Y({UL`M@P_~8G~s- zOKa=3D#{Q|Ev;D0U=3(A2BJI^!+yaD8OT~-iYvapRq3rNV1h#akWq>X&ul)ey?Ni( zcD2`z^LXa-E>&S4#{ouQ=U%BtFT3mL=}~97Crn9Tx*pEYJMRAYF;)6i!LaJ=^_c3d z%MN;8UQ)4faa*}EKI@!X+S-&ooz%ZTMLYvEeUaw`8bPclAXVLf2Q@S`C&U)!9)Khp zm~AdcL!~I?zUw~k?(Dn^EIkQg=zc`Rd}7L2o%<}+sP`1*nX_m4Mo#4zflEexW|^)g zNiM0^mxBQWKx8pxo2E-jPSwFc=K0ygbuN{gJ{2 zM#cnWxKTk-5zW1O_p~aMf>+~Z9R`65MV$IBPtVL;8i{y^mGe0b93eXDyC4uI^Gxnn zwV&zj@8`IDnF2(OnukYO*AnT@-fRj7FE8ocyLYp*v-PW(*bPVtvHMYgAN^gHr4hX30TB0HQKxQ6Dftfc3-Ys?+l5Be_LI{R~u_>FG@#W=` z7Zn38g4sY5Xk=p&6SFYdu=LYlbz9 zfw=Q1J1-A3IhO!y1%F+m0_l4d3q4E6qx*j2t0Fbvy5H#hL4UeDTp6;yjt1x82c}>k zgIioYvsu9^>Oz4zd=|wZ5?pv8hPmy_7ZLQd#={E|r(V2x0SMlKr*Ro7fZ&R@HaR{W ze$9aaxncqHg|}`~Vcd!epsTR?HCJCj|35l@-&F+vx!~w%srN1g_|m`{xZvl{71Kdr zbp;0zX2n1lXrK@ZQYHv-C>4ZInIwoHjj$o;U$6gVmj8ipxC_~eeR2BrWl|&vO^Bwd LF04S=D)`?3B0vth literal 0 HcmV?d00001 diff --git a/h2o-docs/src/product/parameters.rst b/h2o-docs/src/product/parameters.rst index e785975a42fc..edd607a41b6d 100644 --- a/h2o-docs/src/product/parameters.rst +++ b/h2o-docs/src/product/parameters.rst @@ -53,7 +53,6 @@ This Appendix provides detailed descriptions of parameters that can be specified data-science/algo-params/fold_column data-science/algo-params/gainslift_bins data-science/algo-params/gradient_epsilon - data-science/algo-params/hglm data-science/algo-params/histogram_type data-science/algo-params/huber_alpha data-science/algo-params/ignore_const_cols From c76d3a7d5cd8a9b4175f886f769a4cf254c1131a Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Tue, 15 Oct 2024 14:03:29 -0500 Subject: [PATCH 17/22] ht/requested updates --- h2o-docs/src/product/data-science/hglm.rst | 276 ++++++++++----------- 1 file changed, 130 insertions(+), 146 deletions(-) diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst index 037a4b90eea3..712286e3cf8f 100644 --- a/h2o-docs/src/product/data-science/hglm.rst +++ b/h2o-docs/src/product/data-science/hglm.rst @@ -3,6 +3,71 @@ Hierarchical Generalized Linear Model (HGLM) HGLM fits generalized linear models with random effects, where the random effect can come from a conjugate exponential-family distribution (for example, gaussian). HGLM lets you specify both fixed and random effects, which allows for fitting correlation to random effects as well as random regression models. HGLM can be used for linear mixed models and for generalized linear mixed models with random effects for a variety of links and a variety of distributions for both the outcomes and the random effects. +Introduction +------------ + +Hierarchical linear models (HLM) are used in situations where measurements are taken with clusters of data and there are effects of the cluster that can affect the coefficient values of GLM. 
+
+For instance, if we measure the performance of students from multiple schools along with other predictors like family annual incomes, students' health, school type (public, private, religious, etc.), we would suspect that students from the same school will have more similar performances to one another than to students from different schools. Therefore, we can denote a coefficient for predictor :math:`m \text{ as } \beta_{mj}` where :math:`j` denotes the school index in our example. :math:`\beta_{0j}` denotes the intercept associated with school :math:`j`.
+
+A level-1 HLM can be expressed as:
+
+.. math::
+
+   y_{ij} = \beta_{0j} + \sum_{m=1}^{p-1} x_{mij} \beta_{mj} + \varepsilon_{ij} \quad \text{ equation 1}
+
+The level-2 model can be expressed as:
+
+.. math::
+
+   \beta_{0j} = \beta_{00} + u_{0j}, \beta_{mj} = \beta_{m0} + u_{mj} \quad \text{ equation 2}
+
+where:
+
+- :math:`j(=[1,2,...,J])` denotes the cluster (level-2 variable) the measurement is taken from (e.g. the school index);
+- :math:`i(=1,2,...,n_j)` denotes the data index taken from within cluster :math:`j`;
+- :math:`\beta_{00}` is the fixed intercept;
+- :math:`\beta_{0j}` is the random intercept;
+- :math:`\beta_{m0}` is the fixed coefficient for predictor :math:`m`;
+- The dimension of fixed effect coefficients is :math:`p` which includes the intercept;
+- :math:`u_{mj}` is the random coefficient for predictor :math:`m`. For predictors without a random coefficient, :math:`u_{mj} = 0`;
+- The dimension of the random effect coefficients is :math:`q` which can include the intercept. Note that :math:`q \leq p`;
+- :math:`\varepsilon_{ij} \sim N(0, \delta_e^2)`;
+- :math:`u_{mj} \sim N(0, \delta_u^2)`;
+- :math:`\varepsilon_{ij}, u_{mj}` are independent;
+- :math:`u_{mj}, u_{m'j}` are independent if :math:`m \neq m'`.
+
+We need to solve the following parameters: :math:`\beta_{00}, \beta_{0j}, \beta_{m0}, u_{mj}, \delta_e^2, \delta_u^2`. To do this, we use the standard linear mixed model expressed with vectors and matrices:
+
+.. math::
+
+   Y = X\beta + Z u + e \quad \text{ equation 3}
+
+where:
+
+- :math:`Y = \begin{bmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n_{1}1} \\ y_{12} \\ y_{22} \\ \vdots \\ y_{n_{2}2} \\ \vdots \\ y_{1J} \\ y_{2J} \\ \vdots \\ y_{n_{J}J} \\\end{bmatrix}` is an :math:`n(= \sum^J_{j=1} n_j)` by 1 vector where :math:`n` is the number of all independent and identically distributed (i.i.d.) observations across all clusters;
+- :math:`X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_J \\\end{bmatrix}` where :math:`X_j = \begin{bmatrix} 1 & x_{11j} & x_{21j} & \cdots & x_{(p-1)1j} \\ 1 & x_{12j} & x_{22j} & \cdots & x_{(p-1)2j} \\ 1 & x_{13j} & x_{23j} & \cdots & x_{(p-1)3j} \\ \vdots & \vdots & \ddots & \cdots & \vdots \\ 1 & x_{1n_{j}j} & x_{2n_{j}j} & \cdots & x_{(p-1)n_{j}j} \\\end{bmatrix} = \begin{bmatrix} x^T_{j1} \\ x^T_{j2} \\ x^T_{j3} \\ \vdots \\ x^T_{jn_j} \\\end{bmatrix}`. We are just stacking all the :math:`X_j` across all the clusters;
+- :math:`\beta = \begin{bmatrix} \beta_{00} \\ \beta_{10} \\ \vdots \\ \beta_{(p-1)0} \\\end{bmatrix}` is a :math:`p` by 1 fixed coefficients vector including the intercept;
+- :math:`Z = \begin{bmatrix} Z_1 & 0_{12} & 0_{13} & \cdots & 0_{1J} \\ 0_{21} & Z_2 & 0_{23} & \cdots & 0_{2J} \\ 0_{31} & 0_{32} & Z_3 & \cdots & 0_{3J} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0_{J1} & 0_{J2} & 0_{J3} & \cdots & Z_J \\\end{bmatrix}` where :math:`Z_j \text{ is an } n_j \times q` matrix, and :math:`0_{ij}` is an :math:`n_i \times q` zero matrix.
Therefore, :math:`Z` is an :math:`n \times (J * q)` matrix containing blocks of non-zero sub-matrices across its diagonal; +- :math:`u = \begin{bmatrix} u_{01} \\ u_{11} \\ u_{(q-1)1} \\ u_{02} \\ u_{12} \\ \vdots \\ u_{(q-1)2} \\ \vdots \\ u_{0J} \\ u_{1J} \\ \vdots \\ u_{(q-1)J} \\\end{bmatrix} \text{ is a } J * q` by 1 random effects vector and some coefficients may not have a random effect; +- :math:`e \sim N(0, \delta^2_e I_n), u \sim N (0, \delta^2_u I_{(J*q)}) \text{ where } I_n \text{ is an } n \times n \text{ and } I_{(J*q)} \text{ is a } (J*q) \times (J*q)` identity matrix; +- :math:`e,u` are independent; +- :math:`E \begin{bmatrix} u \\ e \\\end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\\end{bmatrix} , cov \begin{bmatrix} u \\ e \\\end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \\\end{bmatrix} , G = \delta^2_u I_{(J*q)} , R = \delta^2_e I_{n \cdot} E \begin{bmatrix} u \\ e \\\end{bmatrix} \text{ is a size } (J * q + n) \text{ vector }, cov \begin{bmatrix} u \\ e \\\end{bmatrix} \text{ is a } (J * q + n) \times (J * q + n)` matrix. + +In addition, we also consider the following alternate form: + +.. math:: + + Y = X\beta + e^*, e^* = Zu + e \quad \text{ equation 4} + +where: + +.. math:: + + cov(e^*) = V = ZGZ^T + R = \delta^2_u ZZ^T + \delta^2_e I_n \quad \text{ equation 5} + +We solve for :math:`\beta, u, \delta^2_u, \text{ and } \delta^2_e`. + Defining an HGLM model ---------------------- Parameters are optional unless specified as *required*. @@ -18,11 +83,11 @@ Algorithm-specific parameters - **initial_fixed_effects**: An array that contains the initial values of the fixed effects coefficient (defaults to ``None``). -- **initial_random_effects**: An H2OFrame ID tht contains the initial values of the random effects coefficietn. The row names should be the random coefficient names (defaults to ``None``). +- **initial_random_effects**: An H2OFrame ID that contains the initial values of the random effects coefficient. The row names should be the random coefficient names (defaults to ``None``). .. note:: - If you aren't sure wha the random coefficient names are, then build the HGLM model with ``max_iterations=0`` and check out the model output field ``random_coefficient_names``. The number of rows of this frame should be the number of level 2 units. To figure this out, build the HGLM model with ``max_iterations=0`` and check out the model output field ``group_column_names``. The number of rows should equal the length of the ``group_column_names``. + If you aren't sure what the random coefficient names are, then build the HGLM model with ``max_iterations=0`` and check out the model output field ``random_coefficient_names``. The number of rows of this frame should be the number of level 2 units. To figure this out, build the HGLM model with ``max_iterations=0`` and check out the model output field ``group_column_names``. The number of rows should equal the length of the ``group_column_names``. - **initial_t_matrix**: An H2OFrame ID that contains the initial values of the T matrix. It should be a positive symmetric matrix (defaults to ``None``). @@ -86,68 +151,6 @@ Common parameters - For a regression model, this column must be numeric (**Real** or **Int**). - For a classification model, this column must be categorical (**Enum** or **String**). If the family is ``Binomial``, the dataset cannot contain more than two levels. 
-Definining an HLM ------------------ - -Hierarchical linear models (HLM) is used in situations where measurements are taken with clusters of data and there are effects of the cluster that can affect the coefficient values of GLM. For instance, if we measure the students' performances from multiple schools along with other predictors like family annual incomes, students' health, school type (public, private, religious, etc.), and etc., we suspect that students from the same school will have similar performances than students from different schools. Therefore, we can denote a coefficient for predictor :math:`m \text{ as } \beta_{mj}` where :math:`j` denotes the school index in our example. :math:`\beta_{0j}` denotes the intercept associated with school :math:`j`. - -A level-1 HLM can be expressed as: - -.. math:: - - y_{ij} = \beta_{0j} + \sum_{m=1}^{p-1} x_{mij} \beta{mj} + \varepsilon_{ij} \quad \text{ equation 1} - -The level-2 model can be expressed as: - -.. math:: - - \beta_{0j} = \beta_{00} + u_{0j}, \beta_{mj} = \beta_{m0} + u_{mj} \quad \text{ equation 2} - -where: - -- :math:`j(=[1,2,...,J])` denotes the cluster (level-2 variable) the measurement is taken from (e.g. the school index); -- :math:`i(=1,2,...,n_j)` denotes the data index taken from within cluster :math:`j`; -- :math:`\beta_{00}` is the fixed intercept; -- :math:`\beta_{0j}` is the random intercept; -- :math:`\beta_{m0}` is the fixed coefficient for predictor :math:`m`; -- The dimension of fixed effect coefficients is :math:`p` which includes the intercept; -- :math:`u_{mj}` is the random coefficient for predictor :math:`m`. For predictors without a random coefficient, :math:`u_{mj} = 0`; -- The dimension of the random effect coefficients is :math:`q` which can include the intercept. Note that :math:`q \leq p`; -- :math:`\varepsilon_{ij} \sim N(0, \delta_e^2)`; -- :math:`u_{ij} \sim N(0, \delta_u^2)`: -- :math:`\varepsilon_{ij}, u_{mj}` are independent; -- :math:`u_{mj}, u_{m,j}` are independent if :math:`m \neq m`. - -We need to solve the following parameters: :math:`\beta_{00}, \beta_{0j}, \beta_{m0}, u_{mj}, \delta_e^2, \delta_u^2`. To do this, we use the standard linear mixed model expressed with vectors and matrices: - -.. math:: - - Y = X\beta + Z u + e \quad \text{ equation 3} - -where: - -- :math:`Y = \begin{bmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n_{1}1} \\ y_{12} \\ y_{22} \\ \vdots \\ y_{n_{2}2} \\ \vdots \\ y_{1J} \\ y_{2J} \\ \vdots \\ y_{n_{J}J} \\\end{bmatrix}` is a :math:`n(= \sum^J_{j=1} n_j)` by 1 vector where :math:`n` is the number of all independent and identically distributed (i.i.d.) observations across all clusters; -- :math:`X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_J \\\end{bmatrix}` where :math:`X_j = \begin{bmatrix} 1 & x_{11j} & x_{21j} & \cdots & x_{(p-1)1j} \\ 1 & x_{12j} & x_{22j} & \cdots & x_{(p-1)2j} \\ 1 & x_{13j} & x_{23j} & \cdots & x_{(p-1)3j} \\ \vdots & \vdots & \ddots & \cdots & \vdots \\ 1 & x_{1n_{j}j} & x_{2n_{j}j} & \cdots & x_{(p-1)n_{j}j} \\\end{bmatrix} = \begin{bmatrix} x^T_{j1} \\ x^T_{j2} \\ x^T_{j3} \\ \vdots \\ x^T_{jn_j} \\\end{bmatrix}`. 
We are just stacking all the :math:`X_j` across all the clusters; -- :math:`\beta = \begin{bmatrix} \beta_{00} \\ \beta_{10} \\ \vdots \\ \beta_{(p-1)0} \\\end{bmatrix}` is a :math:`p` by 1 fixed coefficients vector including the intercept; -- :math:`Z = \begin{bmatrix} Z_1 & 0_{12} & 0_{13} & \cdots & 0_{1J} \\ 0_{21} & Z_2 & 0_{23} & \cdots & 0_{2J} \\ 0_{31} & 0_{32} & Z_3 & \cdots & 0_{3J} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0_{J1} & 0_{J2} & 0_{J3} & \cdots & Z_J \\\end{bmatrix}` where :math:`Z_J \text{ is a } n_j \times q` matrix, and :math:`0_{ij} n_i \times q` is a zero matrix. Therefore, :math:`Z` is a :math:`n \times (J * q)` matrix containing blocks of non-zero sub-matrices across its diagonal; -- :math:`u = \begin{bmatrix} u_{01} \\ u_{11} \\ u_{(q-1)1} \\ u_{02} \\ u_{12} \\ \vdots \\ u_{(q-1)2} \\ \vdots \\ u_{0J} \\ u_{1J} \\ \vdots \\ u_{(q-1)J} \\\end{bmatrix} \text{ is a } J * q` by 1 random effects vector and some coefficients may not have a random effect; -- :math:`e \sim N(0, \delta^2_e I_n), u \sim N (0, \delta^2_u I_{(J*q)}) \text{ where } I_n \text{ is an } n \times n \text{ and } I_{(J*q)} \text{ is an } (J*q) \times (J*q)` identity matrix; -- :math:`e,u` are independent; -- :math:`E \begin{bmatrix} u \\ e \\\end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\\end{bmatrix} , cov \begin{bmatrix} u \\ e \\\end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \\\end{bmatrix} , G = \delta^2_u I_{(J*q)} , R = \delta^2_e I_{n \cdot} E \begin{bmatrix} u \\ e \\\end{bmatrix} \text{ is a size } (J * q + n) \text{ vector }, cov \begin{bmatrix} u \\ e \\\end{bmatrix} \text{ is a } (J * q + n) \times (J * q + n)` matrix. - -In addition, we also consider the following alternate form: - -.. math:: - - Y = X\beta + e^*, e^* = Zu + e \quad \text{ equation 4} - -where: - -.. math:: - - cov(e^*) = V = ZGZ^T + R = \delta^2_u ZZ^T + \delta^2_e I_n \quad \text{ equation 5} - -We solve for :math:`\beta, u, \delta^2_u, \text{ and } \delta^2_e`. Estimation of parameters using machine learning estimation via EM ----------------------------------------------------------------- @@ -249,15 +252,26 @@ with C_j = A^T_{rj} A_{rj} + \sigma^2 T^{-1}_j \quad \text{ equation 15} -Generate a random positive definite matrix -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Complete the EM algorithm +~~~~~~~~~~~~~~~~~~~~~~~~~ -To randomly generate a symmetric positive definite matrix, do the following +The complete EM algorithm is as follows: -1. Generate :math:`n \text{ by } n \text{ matrix } B` with uniform numbers from -1 to 1; -2. Set :math:`A = 0.5 * (B + B^T)`; +1. Initialization: randomly assign some small values to :math:`\theta_f, \sigma^2, T_j`; +2. Estimation: estimate the CDSS: + + .. math:: -The final symmetric mpositive definite matrix is :math:`T = A + 2 * I_n`. + E \big( \sum^J_{j=1} A^T_{fj} A_{rj} \theta_{rj} | Y, \theta_f, T_j, \sigma^2 \big) = \sum^J_{j=1} A^T_{fj} A_{rj} \theta^*_{rj} \\ E \big( \sum^J_{j=1} \theta_{rj} \theta^T_{rj} | Y, \theta_f, T_j, \sigma^2 \big) = \sum^J_{j=1} \theta^*_{rj} \theta^{*T}_{rj} + \sigma^2 \sum^J_{j=1} C^{-1}_j \quad \text{ equation 17} \\ E \big( \sum^J_{j=1} r^T_j r_j \big) = \sum^J_{j=1} r^{*T}_j r^*_j + \sigma^2 \sum^J_{j=1} tr(C^{-1}_j A^T_{rj} A_{rj}) + + where: :math:`r^*_j = Y_j - A_{fj} \theta_f - A_{rj} \theta^*_{rj}, \theta^*_{rj} = C^{-1}_j A^T_{rj} (Y_j - A_{fj} \theta_f), C_j = A^T_{rj} A_{rj} + \sigma^2 T^{-1} \text{ and } \theta_f, \sigma^2, T` are based on the previous iteration or from initialization; + +3.
Substitution: substitute the estimated CDSS from *equation 17* into the M-step formulas (*equations 8, 9,* and *10*); +4. Processing: feed the new estimates of :math:`\theta_f, \sigma^2, T_j` into step 2; +5. Cycling: continue steps 2, 3, and 4 until the following stopping conditions are satisfied: + + a. Changes in the log-likelihood (*equation 16*) become sufficiently small, or + b. The largest change in the value of any of the parameters is sufficiently small. Log-likelihood for HGLM ~~~~~~~~~~~~~~~~~~~~~~~ @@ -320,81 +334,6 @@ The final log-likelihood is: ll(Y; \theta_f, \sigma^2, T) = - \frac{1}{2} n \log{(2 \pi)} - \frac{1}{2} \Big\{ \sum^J_{j=1} \big( \log{(|V_j|)} + \frac{1}{\sigma^2} (Y_j - X_j \theta_f)^T (Y_j - X_j \theta_f) \\ - \frac{1}{\sigma^4} (Y_j - X_j \theta_f)^T Z_j \big(T^{-1} + \frac{1}{\sigma^2} Z^T_j Z_j \big)^{-1} Z^T_j (Y_j - X_j \theta_f) \big) \Big\} \quad \quad \quad -Alternate log-likelihood for HGLM -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -According to *equation 3*, f you write :math:`Y = X \beta + e^* \text{ where } e^* = ZU + \varepsilon`, then you will have :math:`cov(Y) = cov(e^*) = V = ZGZ^T + R = ZGZ^T + \delta^2_e I_n \text{ and } E(Y) = X\beta`. Note that: - -.. math:: - - G_{Jq \times Jq} = \begin{bmatrix} T_j & 0_{q \times q} & 0_{q \times q} & \cdots & 0_{q \times q} \\ 0_{q \times q} & T_j & 0_{q \times q} & \cdots & 0_{q \times q} \\ 0_{q \times q} & \cdots & T_j & \cdots & 0_{q \times q} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0_{q \times q} & 0_{q \times q} & 0_{q \times q} & \cdots & T_j \\\end{bmatrix} - -The log-likelihood is: - -.. math:: - - l(\theta_f, V) = - \frac{1}{2} \Big\{ n \log{(n \pi)} + \log{(|V|)} + (Y - X \theta_f)^T V^{-1} (Y -X \theta_f) \Big\} \quad \text{ equation 16} - -.. note:: - - You need to find :math:`|V| \text{ and } V^{-1}` (which are gigantic matrices). Therefore, you need to use the matrix determinant lemma to calculate :math:`|V|` and the Woodbury matrix identity to calculate :math:`V^{-1}`. - -Matrix determinant lemma -'''''''''''''''''''''''' - -From `[5] <#references>`__, :math:`|V|` can be calculated as: - -.. math:: - - |V| = |ZGZ^T + R| = |G^{-1} + Z^T R^{-1}Z| |G| |R| = \sigma^2_e \Big| G^{-1} + \frac{1}{\sigma^2_e} Z^T Z \Big| |G| - -Woodbury matrix identity -'''''''''''''''''''''''' - -From `[6] <#references>`__, :math:`V^{-1}` can be calculated as: - -.. math:: - - V^{-1} = R^{-1} - R^{-1} Z(G^{-1} + Z^T R^{-1}Z)^{-1} Z^T R^{-1} = \frac{1}{\sigma^2_e} I_{n \times n} - \frac{1}{\sigma^4_e} Z \Big( G^{-1} + {1}{\sigma^2_e} Z^T Z \Big)^{-1} Z^T - -Combine the Woodbury matrix identity and matrix determinant lemma -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Substitute these two equations into *equation 16* to be able to calculate the log-likelihood. Let's walk through the most difficult part to calculate: :math:`(Y - X\theta_f)^T V^{-1} (Y - X\theta_f)`: - -..
math:: - - (Y - X\theta_f)^T V^{-1} (Y - X \theta_f) = \frac{1}{\sigma^2_e} (Y - X \theta_f)^T (Y - X\theta_f) - \frac{1}{\sigma^4_e} (Y - X\theta_f)^T Z \Big( G^{-1} + \frac{1}{\sigma^2_e} Z^T Z \Big)^{-1} Z^T (Y - X \theta_f) - -During the calculation process, you need to calculate the following: - -- :math:`(Y - X\theta_f)^T (Y - X\theta_f) = \sum^n_{i=1} (y_i - x^T_i \theta_f)^2`; -- :math:`(Y - X\theta_f)^T Z = \Big[ \sum^{n_1}_{i=1} \big( y_{1i} - x^T_{1i} \theta_f \big) z^T_{1i} \sum^{n_2}_{i=1} \big( y_{2i} - x^T_{2i} \theta_f \big) z^T_{2i} \sum^{n_3}_{i=1} \big( y_{3i} - x^T_{3i} \theta_f \big) z^T_{3i} \cdots \sum^{n_J}_{i=1} \big( y_{Ji} - X^T_{Jj} \theta_f \big) z^T_{Jj} \Big]`; -- :math:`Z^TZ = \begin{bmatrix} \sum^{n1}_{i=1} z_{1i} z^T_{1i} & 0_{q \times q} & \cdots & 0_{q \times q} \\ 0_{q \times q} & \sum^{n_2}_{i=1} z_{2i} z^T_{2i} & \cdots & 0_{q \times q} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{q \times q} & 0_{q \times q} & 0_{q \times q} & \sum^{n_J}_{i=1} z_{Ji}Z^T_{Ji} \\\end{bmatrix}` - - - where :math:`\sum^J_{i=1} n_i = n`. - -Complete the EM algorithm -~~~~~~~~~~~~~~~~~~~~~~~~~ - -The complete EM algorithm is as follows: - -1. Initialization: randomly assign some small values to :math:`\theta_f, \sigma^2, T_j`; -2. Estimation: estimate the CDSS: - - .. math:: - - E \big( \sum^J_{j=1} A^T_{fj} \theta_{rj} \theta_{rj} | Y, \theta_f, T_j, \sigma^2 \big) = \sum^J_{j=1} A^T_{fj} A_{rj} \theta^*_{rj} \\ E \big( \sum^J_{j=1} \theta_{rj} \theta^T_{rj} | Y, \theta_f, T_j, \sigma^2 \big) = \sum^J_{j=1} \theta^*_{rj} \theta^{*T}_{rj} + \sigma^2 \sum^J_{j=1} C^{-1}_j & \quad \text{ equation 17} \\ E \big( \sum^J_{j=1} r^T_j r_j \big) = \sum^J_{j=1} r^{*T}_j r^*_j + \sigma^2 \sum^J_{j=1} tr(C^{-1}_j A^T_{rj} A_{rj}) - - where: :math:`r^*_j = Y_j - A_{fj} \theta_f - A_{fj} \theta^*_{rj}, \theta^*_{rj} = C^{-1}_j A^T_{rj} (Y_j - A_{fj} \theta_f), C_j = A^T_{rj} A_{rj} + \sigma^2 T^{-1} \text{ and } \theta_f, \sigma^2, T` are based on the previous iteration or from initialization; - -3. Substitution: substitute the estimated CDSS from *equation 17* into the M-step forumulas (*equations 8, 9,* and *10*); -4. Processing: feed the new estimates of :math:`\theta_f, \sigma^2, T_j` into step 2; -5. Cycling: continue steps 2, 3, and 4 until the following stopping conditions are satisfied: - - a. Changes in the log-likelihood (*equation 16*) become sufficiently small, or - b. The largest change in the value of any of the parameters is sufficiently small. - Examples -------- @@ -403,7 +342,52 @@ The following are simple HGLM examples in Python and R. .. tabs:: .. 
code-tab:: python - blah + # Initialize H2O-3 and import the HGLM estimator: + import h2o + h2o.init() + from h2o.estimators import H2OHGLMEstimator as hglm + + # Import the Gaussian wintercept dataset: + h2o_data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/hglm_test/gaussian_0GC_678R_6enum_5num_p05oise_p08T_wIntercept_standardize.gz") + + # Split the data into training and validation sets: + train, valid = h2o_data.split_frame(ratios = [.8], seed = 1234) + + # Define the predictors and response: + y = "response" + x = h2o_data.names + x.remove("response") + x.remove("C1") + + # Set the random columns: + random_columns = ["C10","C20","C30"] + + # Build and train the model: + hglm_model = hglm(random_columns=random_columns, + group_column = "C1", + score_each_iteration=True, + seed=12345, + em_epsilon = 0.000005) + hglm_model.train(x=x, y=y, training_frame=train, validation_frame=valid) + + # Grab various metrics (model metrics, scoring history coefficients, etc.): + modelMetrics = hglm_model.training_model_metrics() + scoring_history = hglm_model.scoring_history(as_data_frame=False) + scoring_history_valid = hglm_model.scoring_history_valid(as_data_frame=False) + model_summary = hglm_model.summary() + coef = hglm_model.coef() + coef_norm = hglm_model.coef_norm() + coef_names = hglm_model.coef_names() + coef_random = hglm_model.coefs_random() + coef_random_names = hglm_model.coefs_random_names() + coef_random_norm = hglm_model.coefs_random_norm() + coef_random_names_norm = hglm_model.coefs_random_names_norm() + t_mat = hglm_model.matrix_T() + residual_var = hglm_model.residual_variance() + mse = hglm_model.mse() + mse_fixed = hglm_model.mean_residual_fixed() + mse_fixed_valid = hglm_model.mean_residual_fixed(train=False) + icc = hglm_model.icc() .. code-tab:: r R From 649b5220476e95458696fd811e38a7b3c33bd3e1 Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Tue, 15 Oct 2024 14:20:03 -0500 Subject: [PATCH 18/22] ht/removed intro --- h2o-docs/src/product/data-science/hglm.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst index 712286e3cf8f..1878024f35ce 100644 --- a/h2o-docs/src/product/data-science/hglm.rst +++ b/h2o-docs/src/product/data-science/hglm.rst @@ -1,8 +1,6 @@ Hierarchical Generalized Linear Model (HGLM) ============================================ -HGLM fits generalized linear models with random effects, where the random effect can come from a conjugate exponential-family distribution (for example, gaussian). HGLM lets you specify both fixed and random effects, which allows for fitting correlation to random effects as well as random regression models. HGLM can be used for linear mixed models and for generalized linear mixed models with random effects for a variety of links and a variety of distributions for both the outcomes and the random effects. 
- Introduction ------------ From a0dd5ba8a7b00a7ff1092fd702d287ceb3a17ff4 Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Mon, 21 Oct 2024 14:26:54 -0500 Subject: [PATCH 19/22] ht/requested updates --- .../algo-params/ignore_const_cols.rst | 2 +- .../algo-params/ignored_columns.rst | 2 +- .../algo-params/max_iterations.rst | 2 +- .../algo-params/max_runtime_secs.rst | 2 +- .../algo-params/missing_values_handling.rst | 2 +- .../data-science/algo-params/model_id.rst | 2 +- .../algo-params/offset_column.rst | 2 +- .../data-science/algo-params/plug_values.rst | 2 +- .../algo-params/random_columns.rst | 2 +- .../algo-params/score_each_iteration.rst | 2 +- .../product/data-science/algo-params/seed.rst | 2 +- .../data-science/algo-params/standardize.rst | 2 +- .../algo-params/training_frame.rst | 2 +- .../algo-params/validation_frame.rst | 2 +- .../algo-params/weights_column.rst | 2 +- .../product/data-science/algo-params/x.rst | 2 +- .../product/data-science/algo-params/y.rst | 2 +- h2o-docs/src/product/data-science/hglm.rst | 47 ++++--------------- 18 files changed, 26 insertions(+), 55 deletions(-) diff --git a/h2o-docs/src/product/data-science/algo-params/ignore_const_cols.rst b/h2o-docs/src/product/data-science/algo-params/ignore_const_cols.rst index a940534d60b3..118ed290d864 100644 --- a/h2o-docs/src/product/data-science/algo-params/ignore_const_cols.rst +++ b/h2o-docs/src/product/data-science/algo-params/ignore_const_cols.rst @@ -1,7 +1,7 @@ ``ignore_const_cols`` --------------------- -- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, XGBoost, Aggregator, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, XGBoost, Aggregator, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost, Decision Tree, ANOVAGLM, ModelSelection - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/ignored_columns.rst b/h2o-docs/src/product/data-science/algo-params/ignored_columns.rst index decc242aae60..7f2ac060544d 100644 --- a/h2o-docs/src/product/data-science/algo-params/ignored_columns.rst +++ b/h2o-docs/src/product/data-science/algo-params/ignored_columns.rst @@ -1,7 +1,7 @@ ``ignored_columns`` ------------------- -- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, XGBoost, Aggregator, CoxPH, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, XGBoost, Aggregator, CoxPH, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost, Decision Tree, ANOVAGLM, ModelSelection - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/max_iterations.rst b/h2o-docs/src/product/data-science/algo-params/max_iterations.rst index c30afb10eb3d..d791368f1798 100644 --- a/h2o-docs/src/product/data-science/algo-params/max_iterations.rst +++ b/h2o-docs/src/product/data-science/algo-params/max_iterations.rst @@ -1,7 +1,7 @@ ``max_iterations`` ------------------ -- Available in: GLM, GAM, HGLM, PCA, GLRM, K-Means, CoxPH +- Available in: GLM, GAM, HGLM, PCA, GLRM, K-Means, CoxPH, ANOVAGLM, ModelSelection - Hyperparameter: yes Description diff --git a/h2o-docs/src/product/data-science/algo-params/max_runtime_secs.rst b/h2o-docs/src/product/data-science/algo-params/max_runtime_secs.rst index bd1f3848e7b0..0f7273c0a575 100644 --- 
a/h2o-docs/src/product/data-science/algo-params/max_runtime_secs.rst +++ b/h2o-docs/src/product/data-science/algo-params/max_runtime_secs.rst @@ -3,7 +3,7 @@ ``max_runtime_secs`` ----------------------- -- Available in: GBM, DRF, Deep Learning, GLM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, AutoML, XGBoost, Word2vec, Isolation Forest, Stacked Ensembles, Uplift DRF +- Available in: GBM, DRF, Deep Learning, GLM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, AutoML, XGBoost, Word2vec, Isolation Forest, Stacked Ensembles, Uplift DRF, ANOVAGLM, ModelSelection - Hyperparameter: yes Description diff --git a/h2o-docs/src/product/data-science/algo-params/missing_values_handling.rst b/h2o-docs/src/product/data-science/algo-params/missing_values_handling.rst index bc3d18080e82..01cb03a8482a 100644 --- a/h2o-docs/src/product/data-science/algo-params/missing_values_handling.rst +++ b/h2o-docs/src/product/data-science/algo-params/missing_values_handling.rst @@ -1,7 +1,7 @@ ``missing_values_handling`` --------------------------- -- Available in: Deep Learning, GLM, GAM, HGLM +- Available in: Deep Learning, GLM, GAM, HGLM, ANOVAGLM, ModelSelection - Hyperparameter: yes Description diff --git a/h2o-docs/src/product/data-science/algo-params/model_id.rst b/h2o-docs/src/product/data-science/algo-params/model_id.rst index ce49357b1fd5..f1074afe934a 100644 --- a/h2o-docs/src/product/data-science/algo-params/model_id.rst +++ b/h2o-docs/src/product/data-science/algo-params/model_id.rst @@ -1,7 +1,7 @@ ``model_id`` ------------ -- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, Word2Vec, Stacked Ensembles, XGBoost, Aggregator, CoxPH, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, Word2Vec, Stacked Ensembles, XGBoost, Aggregator, CoxPH, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost, Decision Tree, ANOVAGLM, ModelSelection - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/offset_column.rst b/h2o-docs/src/product/data-science/algo-params/offset_column.rst index 584d7a00c390..a7b7bce8c91e 100644 --- a/h2o-docs/src/product/data-science/algo-params/offset_column.rst +++ b/h2o-docs/src/product/data-science/algo-params/offset_column.rst @@ -1,7 +1,7 @@ ``offset_column`` ----------------- -- Available in: GBM, Deep Learning, GLM, GAM, HGLM, CoxPH, XGBoost, Stacked Ensembles +- Available in: GBM, Deep Learning, GLM, GAM, HGLM, CoxPH, XGBoost, Stacked Ensembles, ANOVAGLM, ModelSelection - Hyperparameter: no diff --git a/h2o-docs/src/product/data-science/algo-params/plug_values.rst b/h2o-docs/src/product/data-science/algo-params/plug_values.rst index 550662d65618..051df8dc2044 100644 --- a/h2o-docs/src/product/data-science/algo-params/plug_values.rst +++ b/h2o-docs/src/product/data-science/algo-params/plug_values.rst @@ -1,7 +1,7 @@ ``plug_values`` --------------- -- Available in: GLM, GAM, HGLM +- Available in: GLM, GAM, HGLM, ANOVAGLM, ModelSelection - Hyperparameter: yes Description diff --git a/h2o-docs/src/product/data-science/algo-params/random_columns.rst b/h2o-docs/src/product/data-science/algo-params/random_columns.rst index 01076d51a91f..ee2e621e114c 100644 --- a/h2o-docs/src/product/data-science/algo-params/random_columns.rst +++ b/h2o-docs/src/product/data-science/algo-params/random_columns.rst @@ -2,7 +2,7 @@ ------------------ - Available in: HGLM -- Hyperparameter: no +- Hyperparameter: yes Description 
~~~~~~~~~~~ diff --git a/h2o-docs/src/product/data-science/algo-params/score_each_iteration.rst b/h2o-docs/src/product/data-science/algo-params/score_each_iteration.rst index 28543b73f03f..5d2426a4a7ac 100644 --- a/h2o-docs/src/product/data-science/algo-params/score_each_iteration.rst +++ b/h2o-docs/src/product/data-science/algo-params/score_each_iteration.rst @@ -3,7 +3,7 @@ ``score_each_iteration`` ------------------------ -- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, XGBoost, Isolation Forest, Extended Isolation Forest, Uplift DRF +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, XGBoost, Isolation Forest, Extended Isolation Forest, Uplift DRF, ANOVAGLM, ModelSelection - Hyperparameter: no diff --git a/h2o-docs/src/product/data-science/algo-params/seed.rst b/h2o-docs/src/product/data-science/algo-params/seed.rst index 96f918db857f..1e5c20ed6afc 100644 --- a/h2o-docs/src/product/data-science/algo-params/seed.rst +++ b/h2o-docs/src/product/data-science/algo-params/seed.rst @@ -1,7 +1,7 @@ ``seed`` -------- -- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, AutoML, XGBoost, Stacked Ensembles, Isolation Forest, Target Encoding, Extended Isolation Forest, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, AutoML, XGBoost, Stacked Ensembles, Isolation Forest, Target Encoding, Extended Isolation Forest, Uplift DRF, AdaBoost, Decision Tree, ANOVAGLM, ModelSelection - Hyperparameter: yes Description diff --git a/h2o-docs/src/product/data-science/algo-params/standardize.rst b/h2o-docs/src/product/data-science/algo-params/standardize.rst index 0402dea6ffa2..27bf489429ab 100644 --- a/h2o-docs/src/product/data-science/algo-params/standardize.rst +++ b/h2o-docs/src/product/data-science/algo-params/standardize.rst @@ -1,7 +1,7 @@ ``standardize`` --------------- -- Available in: Deep Learning, GLM, GAM, HGLM, K-Means +- Available in: Deep Learning, GLM, GAM, HGLM, K-Means, ANOVAGLM, ModelSelection - Hyperparameter: yes Description diff --git a/h2o-docs/src/product/data-science/algo-params/training_frame.rst b/h2o-docs/src/product/data-science/algo-params/training_frame.rst index 52929dfbb314..aa75fc63e10f 100644 --- a/h2o-docs/src/product/data-science/algo-params/training_frame.rst +++ b/h2o-docs/src/product/data-science/algo-params/training_frame.rst @@ -1,7 +1,7 @@ ``training_frame`` ------------------ -- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, Word2Vec, Stacked Ensembles, AutoML, XGBoost, Aggregator, CoxPH, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, Word2Vec, Stacked Ensembles, AutoML, XGBoost, Aggregator, CoxPH, Isolation Forest, Extended Isolation Forest, Uplift DRF, AdaBoost, Decision Tree, ANOVAGLM, ModelSelection - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/validation_frame.rst b/h2o-docs/src/product/data-science/algo-params/validation_frame.rst index 9ca03bc88479..66b9ce8a4a72 100644 --- a/h2o-docs/src/product/data-science/algo-params/validation_frame.rst +++ b/h2o-docs/src/product/data-science/algo-params/validation_frame.rst @@ -1,7 +1,7 @@ ``validation_frame`` -------------------- -- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, Stacked Ensembles, AutoML, 
XGBoost, Uplift DRF +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, Stacked Ensembles, AutoML, XGBoost, Uplift DRF, ModelSelection - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/weights_column.rst b/h2o-docs/src/product/data-science/algo-params/weights_column.rst index ac06c64ef4a2..f1a64286915b 100644 --- a/h2o-docs/src/product/data-science/algo-params/weights_column.rst +++ b/h2o-docs/src/product/data-science/algo-params/weights_column.rst @@ -1,7 +1,7 @@ ``weights_column`` ------------------ -- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, AutoML, XGBoost, CoxPH, Stacked Ensembles, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, AutoML, XGBoost, CoxPH, Stacked Ensembles, AdaBoost, ANOVAGLM, ModelSelection - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/x.rst b/h2o-docs/src/product/data-science/algo-params/x.rst index a172239d03fa..06020ab907a0 100644 --- a/h2o-docs/src/product/data-science/algo-params/x.rst +++ b/h2o-docs/src/product/data-science/algo-params/x.rst @@ -1,7 +1,7 @@ ``x`` ----- -- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, Stacked Ensembles, AutoML, XGBoost, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, PCA, GLRM, Naïve-Bayes, K-Means, Stacked Ensembles, AutoML, XGBoost, Uplift DRF, AdaBoost, Decision Tree, ANOVAGLM, ModelSelection - Hyperparameter: no Description diff --git a/h2o-docs/src/product/data-science/algo-params/y.rst b/h2o-docs/src/product/data-science/algo-params/y.rst index 841f3dcacf75..46c1f07c9061 100644 --- a/h2o-docs/src/product/data-science/algo-params/y.rst +++ b/h2o-docs/src/product/data-science/algo-params/y.rst @@ -1,6 +1,6 @@ ``y`` ----- -- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, Naïve-Bayes, Stacked Ensembles, AutoML, XGBoost, Aggregator, Uplift DRF, AdaBoost +- Available in: GBM, DRF, Deep Learning, GLM, GAM, HGLM, Naïve-Bayes, Stacked Ensembles, AutoML, XGBoost, Aggregator, Uplift DRF, AdaBoost, Decision Tree, ANOVAGLM, ModelSelection - Hyperparameter: no diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst index 1878024f35ce..7b560d3917f4 100644 --- a/h2o-docs/src/product/data-science/hglm.rst +++ b/h2o-docs/src/product/data-science/hglm.rst @@ -35,36 +35,7 @@ where: - :math:`\varepsilon_{ij}, u_{mj}` are independent; - :math:`u_{mj}, u_{m,j}` are independent if :math:`m \neq m`. -We need to solve the following parameters: :math:`\beta_{00}, \beta_{0j}, \beta_{m0}, u_{mj}, \delta_e^2, \delta_u^2`. To do this, we use the standard linear mixed model expressed with vectors and matrices: - -.. math:: - - Y = X\beta + Z u + e \quad \text{ equation 3} - -where: - -- :math:`Y = \begin{bmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n_{1}1} \\ y_{12} \\ y_{22} \\ \vdots \\ y_{n_{2}2} \\ \vdots \\ y_{1J} \\ y_{2J} \\ \vdots \\ y_{n_{J}J} \\\end{bmatrix}` is an :math:`n(= \sum^J_{j=1} n_j)` by 1 vector where :math:`n` is the number of all independent and identically distributed (i.i.d.) 
observations across all clusters; -- :math:`X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_J \\\end{bmatrix}` where :math:`X_j = \begin{bmatrix} 1 & x_{11j} & x_{21j} & \cdots & x_{(p-1)1j} \\ 1 & x_{12j} & x_{22j} & \cdots & x_{(p-1)2j} \\ 1 & x_{13j} & x_{23j} & \cdots & x_{(p-1)3j} \\ \vdots & \vdots & \ddots & \cdots & \vdots \\ 1 & x_{1n_{j}j} & x_{2n_{j}j} & \cdots & x_{(p-1)n_{j}j} \\\end{bmatrix} = \begin{bmatrix} x^T_{j1} \\ x^T_{j2} \\ x^T_{j3} \\ \vdots \\ x^T_{jn_j} \\\end{bmatrix}`. We are just stacking all the :math:`X_j` across all the clusters; -- :math:`\beta = \begin{bmatrix} \beta_{00} \\ \beta_{10} \\ \vdots \\ \beta_{(p-1)0} \\\end{bmatrix}` is a :math:`p` by 1 fixed coefficients vector including the intercept; -- :math:`Z = \begin{bmatrix} Z_1 & 0_{12} & 0_{13} & \cdots & 0_{1J} \\ 0_{21} & Z_2 & 0_{23} & \cdots & 0_{2J} \\ 0_{31} & 0_{32} & Z_3 & \cdots & 0_{3J} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0_{J1} & 0_{J2} & 0_{J3} & \cdots & Z_J \\\end{bmatrix}` where :math:`Z_J \text{ is an } n_j \times q` matrix, and :math:`0_{ij} n_i \times q` is a zero matrix. Therefore, :math:`Z` is an :math:`n \times (J * q)` matrix containing blocks of non-zero sub-matrices across its diagonal; -- :math:`u = \begin{bmatrix} u_{01} \\ u_{11} \\ u_{(q-1)1} \\ u_{02} \\ u_{12} \\ \vdots \\ u_{(q-1)2} \\ \vdots \\ u_{0J} \\ u_{1J} \\ \vdots \\ u_{(q-1)J} \\\end{bmatrix} \text{ is a } J * q` by 1 random effects vector and some coefficients may not have a random effect; -- :math:`e \sim N(0, \delta^2_e I_n), u \sim N (0, \delta^2_u I_{(J*q)}) \text{ where } I_n \text{ is an } n \times n \text{ and } I_{(J*q)} \text{ is a } (J*q) \times (J*q)` identity matrix; -- :math:`e,u` are independent; -- :math:`E \begin{bmatrix} u \\ e \\\end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\\end{bmatrix} , cov \begin{bmatrix} u \\ e \\\end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \\\end{bmatrix} , G = \delta^2_u I_{(J*q)} , R = \delta^2_e I_{n \cdot} E \begin{bmatrix} u \\ e \\\end{bmatrix} \text{ is a size } (J * q + n) \text{ vector }, cov \begin{bmatrix} u \\ e \\\end{bmatrix} \text{ is a } (J * q + n) \times (J * q + n)` matrix. - -In addition, we also consider the following alternate form: - -.. math:: - - Y = X\beta + e^*, e^* = Zu + e \quad \text{ equation 4} - -where: - -.. math:: - - cov(e^*) = V = ZGZ^T + R = \delta^2_u ZZ^T + \delta^2_e I_n \quad \text{ equation 5} - -We solve for :math:`\beta, u, \delta^2_u, \text{ and } \delta^2_e`. +We need to solve the following parameters: :math:`\beta_{00}, \beta_{0j}, \beta_{m0}, u_{mj}, \delta_e^2, \delta_u^2`. Defining an HGLM model ---------------------- @@ -75,9 +46,9 @@ Algorithm-specific parameters - **em_epsilon**: (Only available for EM method) Converge if beta/ubeta/tmat/tauEVar changes less (using L-infinity norm) than EM epsilon (defaults to ``0.001``). -- **gen_syn_data**: If enabled, will add gaussian noise with variance specified in ``tau_e_var_init`` (defaults to ``False``). +- **gen_syn_data**: If enabled, it will generate synthetic HGLM data with the fixed coefficients specified in ``initial_fixed_effects`` and the random coefficients taken from ``initial_random_effects`` (or randomly generated if ``initial_random_effects`` isn't provided). In particular, it will generate the following output: :math:`Y_j = A_{fj} \theta_f + A_{rj} \theta_{rj} + r_j`. The gaussian noise is generated with the variance specified in ``tau_e_var_init``.
If the random coefficients are to be randomly generated, they are generated from a gaussian distribution with the variance specified in ``tau_u_var_init``. -- **group_column**: The column that is categorical and used to generate the groups in HGLM (defaults to ``None``). +- **group_column**: Specify the level-2 variable name, which is categorical and used to generate the groups in HGLM (defaults to ``None``). - **initial_fixed_effects**: An array that contains the initial values of the fixed effects coefficient (defaults to ``None``). @@ -85,7 +56,7 @@ Algorithm-specific parameters .. note:: - If you aren't sure what the random coefficient names are, then build the HGLM model with ``max_iterations=0`` and check out the model output field ``random_coefficient_names``. The number of rows of this frame should be the number of level 2 units. To figure this out, build the HGLM model with ``max_iterations=0`` and check out the model output field ``group_column_names``. The number of rows should equal the length of the ``group_column_names``. + If you aren't sure what the random coefficient names are, then build the HGLM model with ``max_iterations=0`` and check out the model output field ``random_coefficient_names``. The number of rows of this frame should be the number of level 2 units. Check out the model output field ``group_column_names``. The number of rows should equal the length of the ``group_column_names``. - **initial_t_matrix**: An H2OFrame ID that contains the initial values of the T matrix. It should be a positive symmetric matrix (defaults to ``None``). @@ -93,13 +64,13 @@ Algorithm-specific parameters - `random_columns `__: An array of random column indices to be used for ``HGLM``. -- `rand_family `__: The Random Component Family specified as an array. You must include one family for each random component. Currently only ``rand_family=["gaussisan"]`` is supported. +- `rand_family `__: Specify the distribution of the random effects. Currently only ``rand_family=["gaussian"]`` is supported. -- **random_intercept**: If enabled, will allow random component to the GLM coefficients (defaults to ``True``). +- **random_intercept**: If enabled, will generate a random intercept as part of the random effects coefficients (defaults to ``True``). -- **tau_e_var_init**: Initial varience of random noise. If set, this should provide a value of > 0.0. If not set, this will be randomly set during the model building process (defaults to ``0.0``). +- **tau_e_var_init**: Initial variance estimate of random noise (residual noise). If set, this should provide a value > 0.0. If not set, this will be randomly set during the model building process (defaults to ``0.0``). -- **tau_u_var_init**: Initial variance of random coefficient effects. If set, should provide a value > 0.0. If not set, this will be randomly set during the model building process (defaults to ``0.0``). +- **tau_u_var_init**: Initial variance estimate of random effects. If set, should provide a value > 0.0. If not set, this will be randomly set during the model building process (defaults to ``0.0``).
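As a concrete illustration of the initialization options above, the following hedged Python sketch passes explicit starting values for the two variance estimates. The parameter names come from this section, while the dataset and numeric values are illustrative assumptions borrowed from the Examples section below; ``initial_fixed_effects``, ``initial_random_effects``, and ``initial_t_matrix`` follow the same keyword pattern.

.. code-block:: python

    # Hedged sketch: explicit starting values for the EM iteration.
    import h2o
    from h2o.estimators import H2OHGLMEstimator

    h2o.init()
    train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/hglm_test/gaussian_0GC_678R_6enum_5num_p05oise_p08T_wIntercept_standardize.gz")
    x = [c for c in train.names if c not in ("response", "C1")]

    model = H2OHGLMEstimator(group_column="C1",
                             random_columns=["C10", "C20", "C30"],
                             random_intercept=True,  # random intercept joins the random effects
                             tau_e_var_init=0.5,     # starting residual-noise variance (> 0.0)
                             tau_u_var_init=0.5,     # starting random-effects variance (> 0.0)
                             em_epsilon=1e-5,
                             seed=12345)
    model.train(x=x, y="response", training_frame=train)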
Common parameters ~~~~~~~~~~~~~~~~~ @@ -182,7 +153,7 @@ where: - :math:`\theta_{rj}` represents the random coefficient and is a :math:`q` by 1 vector; - :math:`r_j \text{ is an } n_j` by 1 vector of level-1 random effects assumed multivariate normal in distribution with 0 mean vector, covariance matrix :math:`\sigma^2 I_{n_{j\times nj}} \text{ where } I_{n_{j \times nj}}` is the identity matrix, :math:`n_j \text{ by } n_j`; - :math:`j` denotes the level-2 units where :math:`j = 1,2, \cdots , J`; -- :math:`T_j` is a symmetric positive definite matrix of size :math:`n_j \text{ by } n_j`. For simplicity, all :math:`T_j` are the same. We assume that :math:`T_j` is the same for all :math:`j = 1,2, \cdots , J`. However, we can assume that the fixed coefficients are i.i.d. :math:`\sim N (0, \sigma^2_u I_{n_j \times n_j})` for simplicity initially and keep :math:`T_j` to be symmetric positive definite matrix as the iteration continues. +- :math:`T_j` is a symmetric positive definite matrix of size :math:`n_j \text{ by } n_j`. We assume that :math:`T_j` is the same for all :math:`j = 1,2, \cdots , J`, and it is kept to be symmetric positive definite throughout the whole model building process. M-step ~~~~~~ From d402f15fc01392bf801f56ccc6a725461026a2dc Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Wed, 23 Oct 2024 12:50:08 -0500 Subject: [PATCH 20/22] ht/added r examples --- h2o-docs/src/product/data-science/hglm.rst | 25 +++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst index 7b560d3917f4..bde7fb8fb38a 100644 --- a/h2o-docs/src/product/data-science/hglm.rst +++ b/h2o-docs/src/product/data-science/hglm.rst @@ -360,7 +360,30 @@ The following are simple HGLM examples in Python and R. .. code-tab:: r R - blah + # Import the Gaussian wintercept dataset: + h2odata <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/hglm_test/gaussian_0GC_allRC_2enum2numeric_p5oise_p08T_wIntercept_standardize.gz") + + # Set the predictors and response: + yresp <- "response" + predictor <- c("C2", "C3", "C4", "C5") + + # Set the random and group columns: + random_columns <- c("C2", "C3", "C4", "C5") + group_column <- "C1" + + # Build and train the model: + hglm_model <- h2o.hglm(x = predictor, + y = yresp, + training_frame = h2odata, + group_column = group_column, + random_columns = random_columns, + seed = 12345, + max_iterations = 10, + em_epsilon = 0.0000001, + random_intercept = TRUE) + + # Find the coefficient: + coeff <- h2o.coef(hglm_model) References ---------- From a710c2e2a26110862a29a8d72669b881a93a4ea9 Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Wed, 23 Oct 2024 13:11:16 -0500 Subject: [PATCH 21/22] ht/requested changes rd.2 --- h2o-docs/src/product/data-science/hglm.rst | 33 +++++++++------------- 1 file changed, 13 insertions(+), 20 deletions(-) diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst index bde7fb8fb38a..ec9ea62597e8 100644 --- a/h2o-docs/src/product/data-science/hglm.rst +++ b/h2o-docs/src/product/data-science/hglm.rst @@ -33,7 +33,7 @@ where: - :math:`\varepsilon_{ij} \sim N(0, \delta_e^2)`; - :math:`u_{ij} \sim N(0, \delta_u^2)`: - :math:`\varepsilon_{ij}, u_{mj}` are independent; -- :math:`u_{mj}, u_{m,j}` are independent if :math:`m \neq m`. +- :math:`u_{mj}, u_{m,j}` are independent if :math:`m \neq m'`. 
We need to solve the following parameters: :math:`\beta_{00}, \beta_{0j}, \beta_{m0}, u_{mj}, \delta_e^2, \delta_u^2`. @@ -62,7 +62,7 @@ Algorithm-specific parameters - **method**: Obtains the fixed and random coefficients as well as the various variances (defaults to ``"em"``). -- `random_columns `__: An array of random column indices to be used for ``HGLM``. +- `random_columns `__: An array of random column names from which random effects coefficients will be generated in the model building process. - `rand_family `__: Specify the distribution of the random effects. Currently only ``rand_family=["gaussian"]`` is supported. @@ -144,37 +144,31 @@ where: In general, you can place the intercept at the beginning or the end of each row of data, but we chose to put it at the end for our implementation. - :math:`\theta_f \text{ is a } p` by 1 vector of fixed coefficients; -- :math:`A_{rj}` is usually denoted by :math:`Z_j \text{ where } Z_j = \begin{bmatrix} z^T_{j1} \\ z^T_{j2} \\ z^T_{j3} \\ \vdots \\ z^T_{jn_j} \\\end{bmatrix}`; +- :math:`A_{rj}` is usually denoted by :math:`Z_{rj} \text{ where } Z_{rj} = \begin{bmatrix} z^T_{j1} \\ z^T_{j2} \\ z^T_{j3} \\ \vdots \\ z^T_{jn_j} \\\end{bmatrix}`; .. note:: We included a term for the random intercept here. However, there are cases where we do not have a random intercept, and the last element of 1 will not be there for :math:`z_{ji}`. - :math:`\theta_{rj}` represents the random coefficient and is a :math:`q` by 1 vector; -- :math:`r_j \text{ is an } n_j` by 1 vector of level-1 random effects assumed multivariate normal in distribution with 0 mean vector, covariance matrix :math:`\sigma^2 I_{n_{j\times nj}} \text{ where } I_{n_{j \times nj}}` is the identity matrix, :math:`n_j \text{ by } n_j`; +- :math:`r_j \text{ is an } n_j` by 1 vector of level-1 residual noise assumed multivariate normal in distribution with 0 mean vector, covariance matrix :math:`\sigma^2 I_{n_{j}\times n_{j}} \text{ where } I_{n_{j}\times n_{j}}` is the identity matrix, :math:`n_j \text{ by } n_j`; - :math:`j` denotes the level-2 units where :math:`j = 1,2, \cdots , J`; - :math:`T_j` is a symmetric positive definite matrix of size :math:`n_j \text{ by } n_j`. We assume that :math:`T_j` is the same for all :math:`j = 1,2, \cdots , J`, and it is kept to be symmetric positive definite throughout the whole model building process. M-step ~~~~~~ -EM conceives of :math:`Y_j` as the observed data with :math:`\theta_{rj}` as the missing data. Therefore, the complete data are :math:`(Y_j, \theta_{rj}), j=1, \cdots, J \text{ while } \theta_f, \sigma^2, \text{ and } T_j` are the parameters that need to be estimated. If the complete data were observed, finding the ML estimates will be simple. To estimate :math:`\theta_f`, subtract :math:`A_{rj} \theta_{rj}` from both sides of *equation 6*: +EM conceives of :math:`Y_j` as the observed data with :math:`\theta_{rj}` as the missing data. Therefore, the complete data are :math:`(Y_j, \theta_{rj}), j=1, \cdots, J \text{ while } \theta_f, \sigma^2, \text{ and } T_j` are the parameters that need to be estimated. If the complete data were observed, finding the ML estimates would be simple. To estimate :math:`\theta_f`, subtract :math:`A_{rj} \theta_{rj}` from both sides of *equation 6*, yielding: .. math:: Y_j - A_{rj} \theta_{rj} = A_{fj} \theta_f + r_j \quad \text{ equation 7} -and justifying the ordinary least squares (OLS) estimate: +Next, multiply *equation 7* by :math:`A^T_{fj}` and sum across the level-2 unit :math:`j`.
Note that :math:`\sum^J_{j=1} A^T_{fj} r_j \sim 0`. Re-arrange the terms and you will get *equation 8*, which is also the ordinary least squares (OLS) estimate: .. math:: - - \hat{\theta_f} = \Big( \sum^J_{j=1} A^T_{fj} A_{fj} \Big)^{-1} \sum^J_{j=1} A^T_{fj} (Y_j - A_{rj} \theta_{rj}) \quad \text{ equation 8} -*Equation 8* can also be solved by multipying *equation 7* with :math:`A^T_{fj}` and sum across the level-2 unit :math:`j`. - -.. note:: - - :math:`\sum^J_{j=1} A^T_{fj} r_j \sim 0` and rearrange the terms and you get *equation 8*. + \hat{\theta_f} = \Big( \sum^J_{j=1} A^T_{fj} A_{fj} \Big)^{-1} \sum^J_{j=1} A^T_{fj} (Y_j - A_{rj} \theta_{rj}) \quad \text{ equation 8} Next, ML estimators for :math:`T_j` and :math:`\sigma^2` are straightforward: @@ -199,7 +193,7 @@ where :math:`N = \sum^J_{j=1} n_j`. E-step ~~~~~~ -While the CDSS are not observed, they can be estimated by their conditional expectations given the data :math:`Y` and parameter estimates from the previous iterations. `Dempster et al. <#references>`__ showed that substituting the expected CDSS for the M-step formulas would produce new parameter estimates having a higher likelihood than the current estimates. +While the CDSS are not observed, they can be estimated by their conditional expectations given the data :math:`Y` and parameter estimates from the previous iterations. `Dempster et al. [4] <#references>`__ showed that substituting the expected CDSS for the M-step formulas would produce new parameter estimates having a higher likelihood than the current estimates. To find :math:`E(CDSS | Y, \theta_f, T, \sigma^2)` requires deriving the conditional distribution of the missing data :math:`\theta_r`, given :math:`Y, \theta_f, T, \sigma^2`. From *equation 6*, the joint distribution of the complete data is: .. math:: \begin{pmatrix} Y_j \\ \theta_{rj} \\\end{pmatrix} \sim N \Bigg[ \begin{pmatrix} A_{fj} \theta_{f} \\ 0 \\\end{pmatrix} , \begin{pmatrix} A_{rj}T_jA^T_{rj} + \sigma^2 & A_{rj}T_j \\ T_j A^T_{rj} & T_j \\\end{pmatrix} \Bigg] \quad \text{ equation 12} -From *equation 12*, we can dervie the conditional distribution of the missing data given the complete data as follows: +From *equation 12*, we can obtain the conditional distribution of the missing data given the complete data as follows: .. math:: @@ -221,7 +215,7 @@ with C_j = A^T_{rj} A_{rj} + \sigma^2 T^{-1}_j \quad \text{ equation 15} -Complete the EM algorithm +The complete EM algorithm ~~~~~~~~~~~~~~~~~~~~~~~~~ The complete EM algorithm is as follows: @@ -237,10 +231,9 @@ The complete EM algorithm is as follows: 3. Substitution: substitute the estimated CDSS from *equation 17* into the M-step formulas (*equations 8, 9,* and *10*); 4. Processing: feed the new estimates of :math:`\theta_f, \sigma^2, T_j` into step 2; -5. Cycling: continue steps 2, 3, and 4 until the following stopping conditions are satisfied: - - a. Changes in the log-likelihood (*equation 16*) become sufficiently small, or - b. The largest change in the value of any of the parameters is sufficiently small. +5. Cycling: continue steps 2, 3, and 4 until the following stopping condition is satisfied: + + - The largest change in the value of any of the parameters is sufficiently small.
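The cycle above maps almost directly onto array code. The following NumPy sketch runs one EM cycle on dense, in-memory data. It is an illustration only, not H2O-3's distributed implementation, and it assumes the standard CDSS-average forms for the :math:`T` and :math:`\sigma^2` updates that *equations 9* and *10* denote (those equations are not reproduced verbatim on this page).

.. code-block:: python

    import numpy as np

    def em_cycle(Y, Af, Ar, theta_f, sigma2, T):
        """One E-step + M-step; Y, Af, Ar are lists indexed by level-2 unit j."""
        J = len(Y)
        N = sum(len(y) for y in Y)
        T_inv = np.linalg.inv(T)
        theta_r_star, C_inv = [], []
        for j in range(J):
            C_j = Ar[j].T @ Ar[j] + sigma2 * T_inv                    # equation 15
            C_inv.append(np.linalg.inv(C_j))
            theta_r_star.append(C_inv[j] @ Ar[j].T @ (Y[j] - Af[j] @ theta_f))
        # Fixed effects: equation 8 with the expected CDSS plugged in.
        lhs = sum(Af[j].T @ Af[j] for j in range(J))
        rhs = sum(Af[j].T @ (Y[j] - Ar[j] @ theta_r_star[j]) for j in range(J))
        theta_f_new = np.linalg.solve(lhs, rhs)
        # T update: assumed average of E(theta_rj theta_rj^T) from equation 17.
        T_new = sum(np.outer(t, t) for t in theta_r_star)
        T_new = (T_new + sigma2 * sum(C_inv)) / J
        # sigma^2 update: assumed average of E(r_j^T r_j) from equation 17.
        rss = 0.0
        for j in range(J):
            r_star = Y[j] - Af[j] @ theta_f - Ar[j] @ theta_r_star[j]
            rss += r_star @ r_star + sigma2 * np.trace(C_inv[j] @ Ar[j].T @ Ar[j])
        return theta_f_new, rss / N, T_new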
Log-likelihood for HGLM ~~~~~~~~~~~~~~~~~~~~~~~ From bf94373de52b774a66f9062c323b31a369b6311 Mon Sep 17 00:00:00 2001 From: Hannah Tillman Date: Thu, 24 Oct 2024 11:48:00 -0500 Subject: [PATCH 22/22] ht/requetsed changed rd.3 --- h2o-docs/src/product/data-science/hglm.rst | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst index ec9ea62597e8..1a98e63ad3ca 100644 --- a/h2o-docs/src/product/data-science/hglm.rst +++ b/h2o-docs/src/product/data-science/hglm.rst @@ -33,7 +33,7 @@ where: - :math:`\varepsilon_{ij} \sim N(0, \delta_e^2)`; - :math:`u_{ij} \sim N(0, \delta_u^2)`: - :math:`\varepsilon_{ij}, u_{mj}` are independent; -- :math:`u_{mj}, u_{m,j}` are independent if :math:`m \neq m'`. +- :math:`u_{mj}, u_{m^{'}j}` are independent if :math:`m \neq m^{'}`. We need to solve the following parameters: :math:`\beta_{00}, \beta_{0j}, \beta_{m0}, u_{mj}, \delta_e^2, \delta_u^2`. @@ -64,7 +64,7 @@ Algorithm-specific parameters - `random_columns `__: An array of random column names from which random effects coefficients will be generated in the model building process. -- `rand_family `__: Specify the distribution of the random effects. Currently only ``rand_family=["gaussian"]`` is supported. +- `rand_family `__: Specify the distribution of the random effects. Currently only ``rand_family="gaussian"`` is supported. - **random_intercept**: If enabled, will generate a random intercept as part of the random effects coefficients (defaults to ``True``). @@ -99,8 +99,6 @@ Common parameters - `seed `__: Specify the random number generator (RNG) seed for algorithm components dependent on randomization. The seed is consistent for each H2O instance so that you can create models with the same starting conditions in alternative configurations. This option defaults to ``-1`` (time-based random number). -- `standardize `__: Specify whether to standardize the numeric columns to have a mean of zero and unit variance. Standardization is highly recommended; if you do not use standardization, the results can include components that are dominated by variables that appear to have larger variances relative to other attributes as a matter of scale, rather than true contribution. This option defaults to ``True`` (enabled). - - `training_frame `__: *Required* Specify the dataset used to build the model. **NOTE**: In Flow, if you click the **Build a model** button from the ``Parse`` cell, the training frame is entered automatically. - `validation_frame `__: Specify the dataset used to evaluate the accuracy of the model.
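As a final numeric cross-check, the log-likelihood from the *Log-likelihood for HGLM* section above can be evaluated naively on small data by forming each :math:`V_j = Z_j T Z^T_j + \sigma^2 I_{n_j}` directly; the documented formula instead expands :math:`V^{-1}_j` to avoid building :math:`V_j`. The two agree via the Woodbury identity. This hedged NumPy sketch (the function name is hypothetical) shows the direct version:

.. code-block:: python

    import numpy as np

    def hglm_loglik(Y, X, Z, theta_f, sigma2, T):
        """Naive log-likelihood check; Y, X, Z are lists indexed by level-2 unit j."""
        n = sum(len(y) for y in Y)
        ll = -0.5 * n * np.log(2.0 * np.pi)
        for Yj, Xj, Zj in zip(Y, X, Z):
            Vj = Zj @ T @ Zj.T + sigma2 * np.eye(len(Yj))  # cov(Y_j)
            ej = Yj - Xj @ theta_f                         # residual vs. fixed effects
            _, logdet = np.linalg.slogdet(Vj)
            ll -= 0.5 * (logdet + ej @ np.linalg.solve(Vj, ej))
        return ll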