Package 'sae.prop'

Title: Small Area Estimation using Fay-Herriot Models with Additive Logistic Transformation
Description: Implements Additive Logistic Transformation (alr) for Small Area Estimation under Fay Herriot Model. Small Area Estimation is used to borrow strength from auxiliary variables to improve the effectiveness of a domain sample size. This package uses Empirical Best Linear Unbiased Prediction (EBLUP) estimator. The Additive Logistic Transformation (alr) are based on transformation by Aitchison J (1986). The covariance matrix for multivariate application is base on covariance matrix used by Esteban M, Lombardía M, López-Vizcaíno E, Morales D, and Pérez A <doi:10.1007/s11749-019-00688-w>. The non-sampled models are modified area-level models based on models proposed by Anisa R, Kurnia A, and Indahwati I <doi:10.9790/5728-10121519>, with univariate model using model-3, and multivariate model using model-1. The MSE are estimated using Parametric Bootstrap approach. For non-sampled cases, MSE are estimated using modified approach proposed by Haris F and Ubaidillah A <doi:10.4108/eai.2-8-2019.2290339>.
Authors: M. Rijalus Sholihin [aut, cre], Cucu Sumarni [aut]
Maintainer: M. Rijalus Sholihin <[email protected]>
License: GPL-3
Version: 0.1.1
Built: 2026-06-05 09:51:23 UTC
Source: https://github.com/mrijalussholihin/sae.prop

Help Index


Data generated based on Multivariate Fay Herriot Model with Additive Logistic Transformation

Description

This data is generated based on multivariate Fay-Herriot model and then transformed by using inverse Additive Logistic Transformation (alr). The steps are as follows:

  1. Set these following variables:

    • q=4q = 4

    • r1=r2=r3=2,r=6r_{1} = r_{2} = r_{3} = 2, r = 6

    • β1=(β11,β12)=(1,1),β2=(β21,β22)=(1,1),β3=(β31,β32)=(1,1)\beta_{1} = (\beta_{11}, \beta_{12})' = (1, 1)', \beta_{2} = (\beta_{21}, \beta_{22})' = (1, 1)', \beta_{3} = (\beta_{31}, \beta_{32})' = (1, 1)'

    • μx1=μx2=μx3\mu_{x1} = \mu_{x2} = \mu_{x3} and σx11=1,σx22=3/2,σx33=2\sigma_{x11} = 1, \sigma_{x22} = 3/2, \sigma_{x33} = 2

    • for k=1,2,,q1k = 1, 2, \dots, q -1 and d=1,,Dd = 1, \dots, D, generate Xd=diag(xd1,xd2,xd3)(q1)×rX_{d} = diag(x_{d1}, x_{d2}, x_{d3})_{(q-1) \times r}, where:

      • xd1=(xd11,xd11)x_{d1} = (x_{d11}, x_{d11})

      • xd1=(xd21,xd22)x_{d1} = (x_{d21}, x_{d22})

      • xd1=(xd31,xd31)x_{d1} = (x_{d31}, x_{d31})

      • xd11=xd21=xd31=1x_{d11} = x_{d21} = x_{d31} = 1

      • UdkU(0,1)U_{dk} \sim U(0, 1)

      • xd12=μx1+σx111/2Ud1x_{d12} = \mu_{x1} + \sigma_{x11}^{1/2}U_{d1}

      • xd22=μx2+σx221/2Ud2x_{d22} = \mu_{x2} + \sigma_{x22}^{1/2}U_{d2}

      • xd32=μx3+σx331/2Ud3x_{d32} = \mu_{x3} + \sigma_{x33}^{1/2}U_{d3}

  2. For random effects uu, udNq1(0,Vud)u_{d} \sim N_{q-1}(0, V_{ud}), where θ1=1,θ2=3/2,θ3=2,θ4=1/2,θ5=1/2,θ6=0\theta_{1} = 1, \theta_{2} = 3/2, \theta_{3} = 2, \theta_{4} = -1/2, \theta_{5} = -1/2, \theta_{6} = 0

  3. For sampling errors ee, edNq1(0,Ved)e_{d} \sim N_{q-1}(0, V_{ed}), where c=1/4c = -1/4

  4. The generated data is transformed using inverse alr transformation, so the data will be within the range of proportion.

Auxiliary variables X1,X2,X3X_{1}, X_{2}, X_{3}, direct estimation Y1,Y2,Y3Y_{1}, Y_{2}, Y_{3}, and sampling variance-covariance v1,v2,v3,v12,v13,v23v_{1}, v_{2}, v_{3}, v_{12}, v_{13}, v_{23} are combined into a data frame called datasaem. For more details about the structure of covariance matrix, it is available in supplementary materials of Reference.

Usage

datasaem

Format

A data frame with 30 rows and 12 columns:

Y1

Direct Estimation of Y1

Y2

Direct Estimation of Y2

Y3

Direct Estimation of Y3

X1

Auxiliary variable of X1

X2

Auxiliary variable of X2

X3

Auxiliary variable of X3

v1

Sampling Variance of Y1

v2

Sampling Variance of Y2

v3

Sampling Variance of Y3

v12

Sampling Covariance of Y1 and Y2

v13

Sampling Covariance of Y1 and Y3

v23

Sampling Covariance of Y2 and Y3

Reference

Esteban, M. D., Lombardía, M. J., López-Vizcaíno, E., Morales, D., & Pérez, A. (2020). Small area estimation of proportions under area-level compositional mixed models. Test, 29(3), 793–818. https://doi.org/10.1007/s11749-019-00688-w.


Data generated based on Multivariate Fay Herriot Model with Additive Logistic Transformation with Non-Sampled Cases

Description

This data is generated based on multivariate Fay-Herriot model and then transformed by using inverse Additive Logistic Transformation (alr). Then some domain would be edited to be non-sampled. The steps are as follows:

  1. Set these following variables:

    • q=4q = 4

    • r1=r2=r3=2,r=6r_{1} = r_{2} = r_{3} = 2, r = 6

    • β1=(β11,β12)=(1,1),β2=(β21,β22)=(1,1),β3=(β31,β32)=(1,1)\beta_{1} = (\beta_{11}, \beta_{12})' = (1, 1)', \beta_{2} = (\beta_{21}, \beta_{22})' = (1, 1)', \beta_{3} = (\beta_{31}, \beta_{32})' = (1, 1)'

    • μx1=μx2=μx3\mu_{x1} = \mu_{x2} = \mu_{x3} and σx11=1,σx22=3/2,σx33=2\sigma_{x11} = 1, \sigma_{x22} = 3/2, \sigma_{x33} = 2

    • for k=1,2,,q1k = 1, 2, \dots, q -1 and d=1,,Dd = 1, \dots, D, generate Xd=diag(xd1,xd2,xd3)(q1)×rX_{d} = diag(x_{d1}, x_{d2}, x_{d3})_{(q-1) \times r}, where:

      • xd1=(xd11,xd11)x_{d1} = (x_{d11}, x_{d11})

      • xd1=(xd21,xd22)x_{d1} = (x_{d21}, x_{d22})

      • xd1=(xd31,xd31)x_{d1} = (x_{d31}, x_{d31})

      • xd11=xd21=xd31=1x_{d11} = x_{d21} = x_{d31} = 1

      • UdkU(0,1)U_{dk} \sim U(0, 1)

      • xd12=μx1+σx111/2Ud1x_{d12} = \mu_{x1} + \sigma_{x11}^{1/2}U_{d1}

      • xd22=μx2+σx221/2Ud2x_{d22} = \mu_{x2} + \sigma_{x22}^{1/2}U_{d2}

      • xd32=μx3+σx331/2Ud3x_{d32} = \mu_{x3} + \sigma_{x33}^{1/2}U_{d3}

  2. For random effects uu, udNq1(0,Vud)u_{d} \sim N_{q-1}(0, V_{ud}), where θ1=1,θ2=3/2,θ3=2,θ4=1/2,θ5=1/2,θ6=0\theta_{1} = 1, \theta_{2} = 3/2, \theta_{3} = 2, \theta_{4} = -1/2, \theta_{5} = -1/2, \theta_{6} = 0

  3. For sampling errors ee, edNq1(0,Ved)e_{d} \sim N_{q-1}(0, V_{ed}), where c=1/4c = -1/4

  4. The generated data is transformed using inverse alr transformation, so the data will be within the range of proportion.

  5. Domain 3, 15, and 25 are set to be examples of non-sampled cases (0, 1, or NA).

  6. c1c1, c2c2, and c3c3 are clusters performed using k-medoids algorithm with pamk.

Auxiliary variables X1,X2,X3X_{1}, X_{2}, X_{3}, direct estimation Y1,Y2,Y3Y_{1}, Y_{2}, Y_{3}, sampling variance-covariance v1,v2,v3,v12,v13,v23v_{1}, v_{2}, v_{3}, v_{12}, v_{13}, v_{23}, and cluster c1,c2,c3c1, c2, c3 are combined into a data frame called datasaem.ns. For more details about the structure of covariance matrix, it is available in supplementary materials of Reference.

Usage

datasaem.ns

Format

A data frame with 30 rows and 15 columns:

Y1

Direct Estimation of Y1

Y2

Direct Estimation of Y2

Y3

Direct Estimation of Y3

X1

Auxiliary variable of X1

X2

Auxiliary variable of X2

X3

Auxiliary variable of X3

v1

Sampling Variance of Y1

v2

Sampling Variance of Y2

v3

Sampling Variance of Y3

v12

Sampling Covariance of Y1 and Y2

v13

Sampling Covariance of Y1 and Y3

v23

Sampling Covariance of Y2 and Y3

c1

Cluster of Y1

c2

Cluster of Y2

c3

Cluster of Y3

Reference

Esteban, M. D., Lombardía, M. J., López-Vizcaíno, E., Morales, D., & Pérez, A. (2020). Small area estimation of proportions under area-level compositional mixed models. Test, 29(3), 793–818. https://doi.org/10.1007/s11749-019-00688-w.


Data generated based on Univariate Fay Herriot Model with Additive Logistic Transformation

Description

This data is generated based on univariate Fay-Herriot model and then transformed by using inverse Additive Logistic Transformation (alr). The steps are as follows:

  1. β\beta are set to be β0=β1=β2=1\beta_{0} = \beta_{1} = \beta_{2} = 1

  2. Auxiliary variables are set as follows:

    • x1N(0,1)x_{1} \sim N(0, 1)

    • x2N(0.5,1)x_{2} \sim N(0.5, 1)

  3. For random effects, uN(0,Vu)u \sim N(0, V_{u}), where Vu=1V_{u} = 1.

  4. For sampling errors eN(0,Ved)e \sim N(0, V_{ed}), where VedV_{ed} is generated VedInvGamma(50,0.5)V_{ed} \sim InvGamma(50, 0.5).

  5. The generated data is transformed using inverse alr transformation, so the data will be within the range of proportion.

Auxiliary variables x1,x2x_{1}, x_{2}, direct estimation yy, and sampling variance vardirvardir are combined into a data frame called datasaeu.

Usage

datasaeu

Format

A data frame with 30 rows and 4 columns:

y

Direct Estimation of y

x1

Auxiliary variable of x1

x2

Auxiliary variable of x2

vardir

Sampling Variance of y


Data generated based on Univariate Fay Herriot Model with Additive Logistic Transformation with Non-Sampled Cases

Description

This data is generated based on univariate Fay-Herriot model and then transformed by using inverse Additive Logistic Transformation (alr). Then some domain would be edited to be non-sampled. The steps are as follows:

  1. β\beta are set to be β0=β1=β2=1\beta_{0} = \beta_{1} = \beta_{2} = 1

  2. Auxiliary variables are set as follows:

    • x1N(0,1)x_{1} \sim N(0, 1)

    • x2N(0.5,1)x_{2} \sim N(0.5, 1)

  3. For random effects, uN(0,Vu)u \sim N(0, V_{u}), where Vu=1V_{u} = 1.

  4. For sampling errors eN(0,Ved)e \sim N(0, V_{ed}), where VedV_{ed} is generated VedInvGamma(50,0.5)V_{ed} \sim InvGamma(50, 0.5).

  5. The generated data is transformed using inverse alr transformation, so the data will be within the range of proportion.

  6. Domain 3, 15, and 25 are set to be examples of non-sampled cases (0, 1, or NA).

  7. clustercluster is cluster performed using k-medoids algorithm with pamk.

Auxiliary variables x1,x2x_{1}, x_{2}, direct estimation yy, and sampling variance vardirvardir are combined into a data frame called datasaeu.

Usage

datasaeu.ns

Format

A data frame with 30 rows and 5 columns:

y

Direct Estimation of y

x1

Auxiliary variable of x1

x2

Auxiliary variable of x2

vardir

Sampling Variance of y

cluster

Cluster of y


Parametric Bootstrap Mean Squared Error of EBLUPs based on a Multivariate Fay Herriot model with Additive Logistic Transformation

Description

This function gives the MSE of transformed EBLUP and Empirical Best Predictor (EBP) based on a multivariate Fay-Herriot model with modified parametric bootstrap approach proposed by Gonzalez-Manteiga.

Usage

mseFH.mprop(
  formula,
  vardir,
  MAXITER = 100,
  PRECISION = 1e-04,
  L = 1000,
  B = 400,
  data
)

Arguments

formula

an object of class formula that describe the fitted model.

vardir

sampling variances of direct estimations. If data is defined, it is a vector containing names of sampling variance columns. If data is not defined, it should be a data frame of sampling variances of direct estimators. The order is var1,var2,,var(q1),cov12,,cov1(q1),cov23,,cov(q2)(q1)var1, var2, \dots, var(q-1), cov12, \dots, cov1(q-1), cov23, \dots, cov(q-2)(q-1).

MAXITER

maximum number of iterations allowed in the Fisher-scoring algorithm, Default: 100.

PRECISION

convergence tolerance limit for the Fisher-scoring algorithm, Default: 1e-4.

L

number of Monte Carlo iterations in calculating Empirical Best Predictor (EBP), Default: 1000.

B

number of Bootstrap iterations in calculating MSE, Default: 400.

data

optional data frame containing the variables named in formula and vardir.

Value

The function returns a list with the following objects:

est

a list containing the following objects:

  • PC : data frame containing transformed EBLUP estimators using inverse alr for each category.

  • EBP : data frame containing Empirical Best Predictor using Monte Carlo for each category.

fit

a list containing the following objects (model is fitted using REML):

  • convergence : logical value equal to TRUE if Fisher-scoring algorithm converges in less than MAXITER iterations.

  • iterations : number of iterations performed by the Fisher-scoring algorithm.

  • estcoef : data frame that contains the estimated model coefficients, standard errors, t-statistics, and p-values of each coefficient.

  • refvar : estimated covariance matrix of random effects.

components

a list containing the following objects:

  • random.effects : data frame containing estimated random effect values of the fitted model for each category.

  • residuals : data frame containing residuals of the fitted model for each category.

mse

a list containing estimated MSE of the estimators.

  • PC : estimated MSE of plugin (PC) estimators for each category.

  • EBP : estimated MSE of EBP estimators for each category.

Examples

## Load dataset
data(datasaem)

## If data is defined
Fo = list(Y1 ~ X1,
          Y2 ~ X2,
          Y3 ~ X3)
vardir = c("v1", "v2", "v3", "v12", "v13", "v23")
MSE.data <- mseFH.mprop(Fo, vardir, data = datasaem, B = 10)

## If data is undefined
Fo = list(datasaem$Y1 ~ datasaem$X1,
          datasaem$Y2 ~ datasaem$X2,
          datasaem$Y3 ~ datasaem$X3)
vardir = datasaem[, c("v1", "v2", "v3", "v12", "v13", "v23")]
MSE <- mseFH.mprop(Fo, vardir, B = 10)

## See the estimators
MSE$mse

## NOTE:
## B = 10 is just for examples.
## Please choose a proper number for Bootstrap iterations in real calculation.

Parametric Bootstrap Mean Squared Error of EBLUPs based on a Multivariate Fay Herriot model with Additive Logistic Transformation for Non-Sampled Data

Description

This function gives the MSE of transformed EBLUP based on a multivariate Fay-Herriot model. For sampled domains, MSE is estimated using modified parametric bootstrap approach proposed by Gonzalez-Manteiga. For non-sampled domains, MSE is estimated using modified approach by using average sampling variance of sampled domain in each cluster.

Usage

mseFH.ns.mprop(
  formula,
  vardir,
  MAXITER = 100,
  PRECISION = 1e-04,
  cluster = "auto",
  B = 400,
  data
)

Arguments

formula

an object of class formula that describe the fitted model.

vardir

sampling variances of direct estimations. If data is defined, it is a vector containing names of sampling variance columns. If data is not defined, it should be a data frame of sampling variances of direct estimators. The order is var1,var2,,var(q1),cov12,,cov1(q1),cov23,,cov(q2)(q1)var1, var2, \dots, var(q-1), cov12, \dots, cov1(q-1), cov23, \dots, cov(q-2)(q-1).

MAXITER

maximum number of iterations allowed in the Fisher-scoring algorithm, Default: 100.

PRECISION

convergence tolerance limit for the Fisher-scoring algorithm, Default: 1e-4.

cluster

Default: "auto". If cluster = "auto", then the clustering will be performed by the function by finding optimal number of cluster. If cluster is a vector containing numbers of cluster for each category, then clustering will be performed based on the chosen number of cluster. If cluster is a data frame or matrix containing cluster information, then the vector will be used directly to find average of random effects. Clustering is performed with k-medoids algorithms using the function pamk. If "auto" is chosen, krange are set to 2:(nrow(data)-1).

B

number of Bootstrap iterations in calculating MSE, Default: 400.

data

optional data frame containing the variables named in formula and vardir.

Value

The function returns a list with the following objects:

est

a data frame containing values of the estimators for each domains.

  • PC : transformed EBLUP estimators using inverse alr for each categoory.

  • status : status of corresponding domain, whether sampled or non-sampled.

fit

a list containing the following objects (model is fitted using REML):

  • convergence : a logical value equal to TRUE if Fisher-scoring algorithm converges in less than MAXITER iterations.

  • iterations : number of iterations performed by the Fisher-scoring algorithm.

  • estcoef : a data frame that contains the estimated model coefficients, standard errors, t-statistics, and p-values of each coefficient.

  • refvar : estimated covariance matrix of random effects.

  • cluster : cluster of each category.

  • cluster.information : a list containing data frames with average random effects of sampled domain in each cluster.

components

a list containing the following objects:

  • random.effects : data frame containing estimated random effect values of the fitted model for each category and their status whether sampled or non-sampled.

  • residuals : data frame containing residuals of the fitted model for each category and their status whether sampled or non-sampled.

mse

data frame containing estimated MSE of the estimators.

  • PC : estimated MSE of plugin (PC) estimators for each category.

  • status : status of domain, whether sampled or non-sampled.

Examples

## Load dataset
data(datasaem.ns)

## If data is defined
Fo = list(Y1 ~ X1,
          Y2 ~ X2,
          Y3 ~ X3)
vardir = c("v1", "v2", "v3", "v12", "v13", "v23")
MSE.ns <- mseFH.ns.mprop(Fo, vardir, data = datasaem.ns, B = 10)

## If data is undefined (and option for cluster arguments)
Fo = list(datasaem.ns$Y1 ~ datasaem.ns$X1,
          datasaem.ns$Y2 ~ datasaem.ns$X2,
          datasaem.ns$Y3 ~ datasaem.ns$X3)
vardir = datasaem.ns[, c("v1", "v2", "v3", "v12", "v13", "v23")]

### "auto"
MSE.ns1 <- mseFH.ns.mprop(Fo, vardir, cluster = "auto", B = 10)

### number of clusters
MSE.ns2 <- mseFH.ns.mprop(Fo, vardir, cluster = c(3, 2, 2), B = 10)

### data frame or matrix containing cluster for each domain
MSE.ns3 <- mseFH.ns.mprop(Fo, vardir, cluster = datasaem.ns[, c("c1", "c2", "c3")], B = 10)

## See the estimators
MSE.ns$mse

## NOTE:
## B = 10 is just for examples.
## Please choose a proper number for Bootstrap iterations in real calculation.

Parametric Bootstrap Mean Squared Error of EBLUPs based on a Univariate Fay Herriot model with Additive Logistic Transformation for Non-Sampled Data

Description

This function gives the MSE of transformed EBLUP based on a univariate Fay-Herriot model. For sampled domains, MSE is estimated using modified parametric bootstrap approach proposed by Butar & Lahiri. For non-sampled domains, MSE is estimated using modified approach proposed by Haris & Ubaidillah.

Usage

mseFH.ns.uprop(
  formula,
  vardir,
  MAXITER = 100,
  PRECISION = 1e-04,
  cluster = "auto",
  B = 1000,
  data
)

Arguments

formula

an object of class formula that describe the fitted model.

vardir

vector containing the sampling variances of direct estimators for each domain. The values must be sorted as the variables in formula.

MAXITER

maximum number of iterations allowed in the Fisher-scoring algorithm, Default: 100.

PRECISION

convergence tolerance limit for the Fisher-scoring algorithm, Default: 1e-4.

cluster

Default: "auto". If cluster = "auto", then the clustering will be performed by the function by finding optimal number of cluster. If cluster is a number, then clustering will be performed based on the chosen number of cluster. If cluster is a vector containing cluster information, then the vector will be used directly to find average of random effects. Clustering is performed with k-medoids algorithms using the function pamk. If "auto" is chosen, krange are set to 2:(nrow(data)-1).

B

number of Bootstrap iterations in calculating MSE, Default: 1000.

data

optional data frame containing the variables named in formula and vardir.

Value

The function returns a list with the following objects:

est

a data frame containing values of the estimators for each domains.

  • PC : transformed EBLUP estimators using inverse alr.

  • status : status of corresponding domain, whether sampled or non-sampled.

  • cluster : cluster of corresponding domain.

fit

a list containing the following objects (model is fitted using REML):

  • convergence : a logical value equal to TRUE if Fisher-scoring algorithm converges in less than MAXITER iterations.

  • iterations : number of iterations performed by the Fisher-scoring algorithm.

  • estcoef : a data frame that contains the estimated model coefficients, standard errors, t-statistics, and p-values of each coefficient.

  • refvar : estimated random effects variance.

  • cluster.information : a data frame containing average random effects of sampled domain in each cluster.

components

a data frame containing the following columns:

  • random.effects : estimated random effect values of the fitted model.

  • residuals : residuals of the fitted model.

  • status : status of corresponding domain, whether sampled or non-sampled.

mse

a data frame containing estimated MSE of the estimators.

  • PC : estimated MSE of plugin (PC) estimators.

  • status : status of domain, whether sampled or non-sampled.

Examples

## Load dataset
data(datasaeu.ns)

## If data is defined
Fo = y ~ x1 + x2
vardir = "vardir"
MSE.ns <- mseFH.ns.uprop(Fo, vardir, data = datasaeu.ns)

## If data is undefined (and option for cluster arguments)
Fo = datasaeu.ns$y ~ datasaeu.ns$x1 + datasaeu.ns$x2
vardir = datasaeu.ns$vardir

### "auto"
MSE.ns1 <- mseFH.ns.uprop(Fo, vardir, cluster = "auto")

### number of clusters
MSE.ns2 <- mseFH.ns.uprop(Fo, vardir, cluster = 2)

### vector containing cluster for each domain
MSE.ns3 <- mseFH.ns.uprop(Fo, vardir, cluster = datasaeu.ns$cluster)

## See the estimators
MSE.ns$mse

Parametric Bootstrap Mean Squared Error of EBLUPs based on a Univariate Fay Herriot model with Additive Logistic Transformation

Description

This function gives the MSE of transformed EBLUP and Empirical Best Predictor (EBP) based on a univariate Fay-Herriot model with modified parametric bootstrap approach proposed by Butar & Lahiri.

Usage

mseFH.uprop(
  formula,
  vardir,
  MAXITER = 100,
  PRECISION = 1e-04,
  L = 1000,
  B = 1000,
  data
)

Arguments

formula

an object of class formula that describe the fitted model.

vardir

vector containing the sampling variances of direct estimators for each domain. The values must be sorted as the variables in formula.

MAXITER

maximum number of iterations allowed in the Fisher-scoring algorithm, Default: 100.

PRECISION

convergence tolerance limit for the Fisher-scoring algorithm, Default: 1e-4.

L

number of Monte Carlo iterations in calculating Empirical Best Predictor (EBP), Default: 1000.

B

number of Bootstrap iterations in calculating MSE, Default: 1000.

data

optional data frame containing the variables named in formula and vardir.

Value

The function returns a list with the following objects:

est

a data frame containing values of the estimators for each domains.

  • PC : transformed EBLUP estimators using inverse alr.

  • EBP : Empirical Best Predictor using Monte Carlo.

fit

a list containing the following objects (model is fitted using REML):

  • convergence : a logical value equal to TRUE if Fisher-scoring algorithm converges in less than MAXITER iterations.

  • iterations : number of iterations performed by the Fisher-scoring algorithm.

  • estcoef : a data frame that contains the estimated model coefficients, standard errors, t-statistics, and p-values of each coefficient.

  • refvar : estimated random effects variance.

components

a data frame containing the following columns:

  • random.effects : estimated random effect values of the fitted model.

  • residuals : residuals of the fitted model.

mse

a data frame containing estimated MSE of the estimators.

  • PC : estimated MSE of plugin (PC) estimators.

  • EBP : estimated MSE of EBP estimators.

Examples

## Load dataset
data(datasaeu)

## If data is defined
Fo = y ~ x1 + x2
vardir = "vardir"
MSE.data <- mseFH.uprop(Fo, vardir, data = datasaeu)

## If data is undefined
Fo = datasaeu$y ~ datasaeu$x1 + datasaeu$x2
vardir = datasaeu$vardir
MSE <- mseFH.uprop(Fo, vardir)

## See the estimators
MSE$mse

sae.prop : Small Area Estimation for Proportion using Fay Herriot Models with Additive Logistic Transformation

Description

Implements Additive Logistic Transformation (alr) for Small Area Estimation under Fay Herriot Model. Small Area Estimation is used to borrow strength from auxiliary variables to improve the effectiveness of a domain sample size. This package uses Empirical Best Linear Unbiased Prediction (EBLUP) estimator. The Additive Logistic Transformation (alr) are based on transformation by Aitchison J (1986). The covariance matrix for multivariate application is base on covariance matrix used by Esteban M, Lombardía M, López-Vizcaíno E, Morales D, and Pérez A <doi:10.1007/s11749-019-00688-w>. The non-sampled models are modified area-level models based on models proposed by Anisa R, Kurnia A, and Indahwati I <doi:10.9790/5728-10121519>, with univariate model using model-3, and multivariate model using model-1. The MSE are estimated using Parametric Bootstrap approach. For non-sampled cases, MSE are estimated using modified approach proposed by Haris F and Ubaidillah A <doi:10.4108/eai.2-8-2019.2290339>.

Author(s)

M. Rijalus Sholihin, Cucu Sumarni

Maintainer: M. Rijalus Sholihin [email protected]

Functions

saeFH.uprop

EBLUPs based on a Univariate Fay Herriot model with Additive Logistic Transformation

mseFH.uprop

Parametric Bootstrap Mean Squared Error of EBLUPs based on a Univariate Fay Herriot model with Additive Logistic Transformation

saeFH.ns.uprop

EBLUPs based on a Univariate Fay Herriot model with Additive Logistic Transformation for Non-Sampled Data

mseFH.ns.uprop

Parametric Bootstrap Mean Squared Error of EBLUPs based on a Univariate Fay Herriot model with Additive Logistic Transformation for Non-Sampled Data

saeFH.mprop

EBLUPs based on a Multivariate Fay Herriot model with Additive Logistic Transformation

mseFH.mprop

Parametric Bootstrap Mean Squared Error of EBLUPs based on a Multivariate Fay Herriot model with Additive Logistic Transformation

saeFH.ns.mprop

EBLUPs based on a Multivariate Fay Herriot model with Additive Logistic Transformation for Non-Sampled Data

mseFH.ns.mprop

Parametric Bootstrap Mean Squared Error of EBLUPs based on a Multivariate Fay Herriot model with Additive Logistic Transformation for Non-Sampled Data

Reference

  • Rao, J.N.K & Molina. (2015). Small Area Estimation 2nd Edition. New York: John Wiley and Sons, Inc.

  • Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Springer Netherlands.

  • Esteban, M. D., Lombardía, M. J., López-Vizcaíno, E., Morales, D., & Pérez, A. (2020). Small area estimation of proportions under area-level compositional mixed models. Test, 29(3), 793–818. https://doi.org/10.1007/s11749-019-00688-w.

  • Anisa, R., Kurnia, A., & Indahwati, I. (2014). Cluster Information of Non-Sampled Area In Small Area Estimation. IOSR Journal of Mathematics, 10(1), 15–19. https://doi.org/10.9790/5728-10121519.

  • Haris, F., & Ubaidillah, A. (2020, January 21). Mean Square Error of Non-Sampled Area in Small Area Estimation. https://doi.org/10.4108/eai.2-8-2019.2290339.


EBLUPs based on a Multivariate Fay Herriot model with Additive Logistic Transformation

Description

This function gives the transformed EBLUP and Empirical Best Predictor (EBP) based on a multivariate Fay-Herriot model. This function is used for multinomial compositional data. If data has PP as proportion and total of qq categories (P1+P2++Pq=1)(P_{1} + P_{2} + \dots + P_{q} = 1), then function should be used to estimate P1,P2,,Pq1{P_{1}, P_{2}, \dots, P_{q-1}}.

Usage

saeFH.mprop(formula, vardir, MAXITER = 100, PRECISION = 1e-04, L = 1000, data)

Arguments

formula

an object of class formula that describe the fitted model.

vardir

sampling variances of direct estimations. If data is defined, it is a vector containing names of sampling variance columns. If data is not defined, it should be a data frame of sampling variances of direct estimators. The order is var1,var2,,var(q1),cov12,,cov1(q1),cov23,,cov(q2)(q1)var1, var2, \dots, var(q-1), cov12, \dots, cov1(q-1), cov23, \dots, cov(q-2)(q-1).

MAXITER

maximum number of iterations allowed in the Fisher-scoring algorithm, Default: 100.

PRECISION

convergence tolerance limit for the Fisher-scoring algorithm, Default: 1e-4.

L

number of Monte Carlo iterations in calculating Empirical Best Predictor (EBP), Default: 1000.

data

optional data frame containing the variables named in formula and vardir.

Value

The function returns a list with the following objects:

est

a list containing data frame of the estimators for each domains.

  • PC : transformed EBLUP estimators using inverse alr for each category.

  • EBP : Empirical Best Predictor using Monte Carlo for each category.

fit

a list containing the following objects (model is fitted using REML):

  • convergence : a logical value equal to TRUE if Fisher-scoring algorithm converges in less than MAXITER iterations.

  • iterations : number of iterations performed by the Fisher-scoring algorithm.

  • estcoef : a data frame that contains the estimated model coefficients, standard errors, t-statistics, and p-values of each coefficient.

  • refvar : estimated covariance matrix of random effects.

components

a list containing the following objects:

  • random.effects : data frame containing estimated random effect values of the fitted model for each category.

  • residuals : data frame containing residuals of the fitted model for each category.

Examples

## Load dataset
data(datasaem)

## If data is defined
Fo = list(Y1 ~ X1,
          Y2 ~ X2,
          Y3 ~ X3)
vardir = c("v1", "v2", "v3", "v12", "v13", "v23")
model.data <- saeFH.mprop(Fo, vardir, data = datasaem)

Fo = list(datasaem$Y1 ~ datasaem$X1,
          datasaem$Y2 ~ datasaem$X2,
          datasaem$Y3 ~ datasaem$X3)
vardir = datasaem[, c("v1", "v2", "v3", "v12", "v13", "v23")]
model <- saeFH.mprop(Fo, vardir)

## See the estimators
model$est

EBLUPs based on a Multivariate Fay Herriot model with Additive Logistic Transformation for Non-Sampled Data

Description

This function gives the transformed EBLUP based on a multivariate Fay-Herriot model. Random effects for sampled domains are from the fitted model and random effects for non-sampled domains are from cluster information. This function is used for multinomial compositional data. If data has PP as proportion and total of qq categories (P1+P2++Pq=1)(P_{1} + P_{2} + \dots + P_{q} = 1), then function should be used to estimate P1,P2,,Pq1{P_{1}, P_{2}, \dots, P_{q-1}}.

Usage

saeFH.ns.mprop(
  formula,
  vardir,
  MAXITER = 100,
  PRECISION = 1e-04,
  cluster = "auto",
  data
)

Arguments

formula

an object of class formula that describe the fitted model.

vardir

sampling variances of direct estimations. If data is defined, it is a vector containing names of sampling variance columns. If data is not defined, it should be a data frame of sampling variances of direct estimators. The order is var1,var2,,var(q1),cov12,,cov1(q1),cov23,,cov(q2)(q1)var1, var2, \dots, var(q-1), cov12, \dots, cov1(q-1), cov23, \dots, cov(q-2)(q-1).

MAXITER

maximum number of iterations allowed in the Fisher-scoring algorithm, Default: 100.

PRECISION

convergence tolerance limit for the Fisher-scoring algorithm, Default: 1e-4.

cluster

Default: "auto". If cluster = "auto", then the clustering will be performed by the function by finding optimal number of cluster. If cluster is a vector containing numbers of cluster for each category, then clustering will be performed based on the chosen number of cluster. If cluster is a data frame or matrix containing cluster information, then the vector will be used directly to find average of random effects. Clustering is performed with k-medoids algorithms using the function pamk. If "auto" is chosen, krange are set to 2:(nrow(data)-1).

data

optional data frame containing the variables named in formula and vardir.

Value

The function returns a list with the following objects:

est

a data frame containing values of the estimators for each domains.

  • PC : transformed EBLUP estimators using inverse alr for each categoory.

  • status : status of corresponding domain, whether sampled or non-sampled.

fit

a list containing the following objects (model is fitted using REML):

  • convergence : a logical value equal to TRUE if Fisher-scoring algorithm converges in less than MAXITER iterations.

  • iterations : number of iterations performed by the Fisher-scoring algorithm.

  • estcoef : a data frame that contains the estimated model coefficients, standard errors, t-statistics, and p-values of each coefficient.

  • refvar : estimated covariance matrix of random effects.

  • cluster : cluster of each category.

  • cluster.information : a list containing data frames with average random effects of sampled domain in each cluster.

components

a list containing the following objects:

  • random.effects : data frame containing estimated random effect values of the fitted model for each category and their status whether sampled or non-sampled.

  • residuals : data frame containing residuals of the fitted model for each category and their status whether sampled or non-sampled.

Examples

## Load dataset
data(datasaem.ns)

## If data is defined
Fo = list(Y1 ~ X1,
          Y2 ~ X2,
          Y3 ~ X3)
vardir = c("v1", "v2", "v3", "v12", "v13", "v23")
model.ns <- saeFH.ns.mprop(Fo, vardir, data = datasaem.ns)

## If data is undefined (and option for cluster arguments)
Fo = list(datasaem.ns$Y1 ~ datasaem.ns$X1,
          datasaem.ns$Y2 ~ datasaem.ns$X2,
          datasaem.ns$Y3 ~ datasaem.ns$X3)
vardir = datasaem.ns[, c("v1", "v2", "v3", "v12", "v13", "v23")]

### "auto"
model.ns1 <- saeFH.ns.mprop(Fo, vardir, cluster = "auto")

### number of clusters
model.ns2 <- saeFH.ns.mprop(Fo, vardir, cluster = c(3, 2, 2))

### data frame or matrix containing cluster for each domain
model.ns3 <- saeFH.ns.mprop(Fo, vardir, cluster = datasaem.ns[, c("c1", "c2", "c3")])

## See the estimators
model.ns$est

EBLUPs based on a Univariate Fay Herriot model with Additive Logistic Transformation for Non-Sampled Data

Description

This function gives the transformed EBLUP based on a univariate Fay-Herriot model. Random effects for sampled domains are from the fitted model and random effects for non-sampled domains are from cluster information.

Usage

saeFH.ns.uprop(
  formula,
  vardir,
  MAXITER = 100,
  PRECISION = 1e-04,
  cluster = "auto",
  data
)

Arguments

formula

an object of class formula that describe the fitted model.

vardir

vector containing the sampling variances of direct estimators for each domain. The values must be sorted as the variables in formula.

MAXITER

maximum number of iterations allowed in the Fisher-scoring algorithm, Default: 100.

PRECISION

convergence tolerance limit for the Fisher-scoring algorithm, Default: 1e-4.

cluster

Default: "auto". If cluster = "auto", then the clustering will be performed by the function by finding optimal number of cluster. If cluster is a number, then clustering will be performed based on the chosen number of cluster. If cluster is a vector containing cluster information, then the vector will be used directly to find average of random effects. Clustering is performed with k-medoids algorithms using the function pamk. If "auto" is chosen, krange are set to 2:(nrow(data)-1).

data

optional data frame containing the variables named in formula and vardir.

Value

The function returns a list with the following objects:

est

a data frame containing values of the estimators for each domains.

  • PC : transformed EBLUP estimators using inverse alr.

  • status : status of corresponding domain, whether sampled or non-sampled.

  • cluster : cluster of corresponding domain.

fit

a list containing the following objects (model is fitted using REML):

  • convergence : a logical value equal to TRUE if Fisher-scoring algorithm converges in less than MAXITER iterations.

  • iterations : number of iterations performed by the Fisher-scoring algorithm.

  • estcoef : a data frame that contains the estimated model coefficients, standard errors, t-statistics, and p-values of each coefficient.

  • refvar : estimated random effects variance.

  • cluster.information : a data frame containing average random effects of sampled domain in each cluster.

components

a data frame containing the following columns:

  • random.effects : estimated random effect values of the fitted model.

  • residuals : residuals of the fitted model.

  • status : status of corresponding domain, whether sampled or non-sampled.

Examples

## Load dataset
data(datasaeu.ns)

## If data is defined
Fo = y ~ x1 + x2
vardir = "vardir"
model.ns <- saeFH.ns.uprop(Fo, vardir, data = datasaeu.ns)

## If data is undefined (and option for cluster arguments)
Fo = datasaeu.ns$y ~ datasaeu.ns$x1 + datasaeu.ns$x2
vardir = datasaeu.ns$vardir

### "auto"
model.ns1 <- saeFH.ns.uprop(Fo, vardir, cluster = "auto")

### number of clusters
model.ns2 <- saeFH.ns.uprop(Fo, vardir, cluster = 2)

### vector containing cluster for each domain
model.ns3 <- saeFH.ns.uprop(Fo, vardir, cluster = datasaeu.ns$cluster)

## See the estimators
model.ns$est

EBLUPs based on a Univariate Fay Herriot model with Additive Logistic Transformation

Description

This function gives the transformed EBLUP and Empirical Best Predictor (EBP) based on a univariate Fay-Herriot model.

Usage

saeFH.uprop(formula, vardir, MAXITER = 100, PRECISION = 1e-04, L = 1000, data)

Arguments

formula

an object of class formula that describe the fitted model.

vardir

vector containing the sampling variances of direct estimators for each domain. The values must be sorted as the variables in formula.

MAXITER

maximum number of iterations allowed in the Fisher-scoring algorithm, Default: 100.

PRECISION

convergence tolerance limit for the Fisher-scoring algorithm, Default: 1e-4.

L

number of Monte Carlo iterations in calculating Empirical Best Predictor (EBP), Default: 1000.

data

optional data frame containing the variables named in formula and vardir.

Value

The function returns a list with the following objects:

est

a data frame containing values of the estimators for each domains.

  • PC : transformed EBLUP estimators using inverse alr.

  • EBP : Empirical Best Predictor using Monte Carlo.

fit

a list containing the following objects (model is fitted using REML):

  • convergence : a logical value equal to TRUE if Fisher-scoring algorithm converges in less than MAXITER iterations.

  • iterations : number of iterations performed by the Fisher-scoring algorithm.

  • estcoef : a data frame that contains the estimated model coefficients, standard errors, t-statistics, and p-values of each coefficient.

  • refvar : estimated random effects variance.

components

a data frame containing the following columns:

  • random.effects : estimated random effect values of the fitted model.

  • residuals : residuals of the fitted model.

Examples

## Load dataset
data(datasaeu)

## If data is defined
Fo = y ~ x1 + x2
vardir = "vardir"
model.data <- saeFH.uprop(Fo, vardir, data = datasaeu)

## If data is undefined
Fo = datasaeu$y ~ datasaeu$x1 + datasaeu$x2
vardir = datasaeu$vardir
model <- saeFH.uprop(Fo, vardir)

## See the estimators
model$est