| Title: | Approximate Marginal Inference for Regression-Scale Models |
|---|---|
| Description: | Implements likelihood inference based on higher order approximations for linear nonnormal regression models. |
| Authors: | Alessandra R. Brazzale [aut, cre] (author of original code for S, author of R port following earlier work by Douglas Bates) |
| Maintainer: | Alessandra R. Brazzale <[email protected]> |
| License: | GPL (>= 2) | file LICENCE |
| Version: | 1.2-4 |
| Built: | 2026-05-10 08:37:35 UTC |
| Source: | https://github.com/cran/marg |
Likelihood inference based on higher order approximations for linear nonnormal regression models
| Package: | marg |
| Version: | 1.2-0 |
| Date: | 2009-10-03 |
| Depends: | R (>= 2.6.0), statmod, survival |
| Suggests: | boot, cond, csampling, nlreg |
| License: | GPL (>= 2) |
| URL: | http://www.r-project.org, http://statwww.epfl.ch/AA/ |
| LazyLoad: | yes |
| LazyData: | yes |
Index:
Functions:
=========
cond Approximate Conditional Inference - Generic
Function
cond.rsm Approximate Conditional Inference in
Regression-Scale Models
dHuber Huber's Least Favourable Distribution
family.rsm Use family() on a "rsm" object
family.rsm.object Family Object for Regression-Scale Models
logLik.rsm Compute the Log Likelihood for
Regression-Scale Models
marg.object Approximate Marginal Inference Object
plot.marg Generate Plots for an Approximate Marginal
Inference Object
print.summary.marg Use print() on a "summary.marg" object
residuals.rsm Compute Residuals for Regression-Scale Models
rsm Fit a Regression-Scale Model
rsm.diag Diagnostics for Regression-Scale Models
rsm.diag.plots Diagnostic Plots for Regression-Scale Models
rsm.families Generate a RSM Family Object
rsm.fit Fit a Regression-Scale Model Without Computing
the Model Matrix
rsm.null Fit an Empty Regression-Scale Model
rsm.object Regression-Scale Model Object
rsm.surv Fit a Regression-Scale Model Without Computing
the Model Matrix
summary.marg Summary Method for Objects of Class "marg"
summary.rsm Summary Method for Regression-Scale Models
update.rsm Update and Re-fit a RSM Model Call
vcov.rsm Calculate Variance-Covariance Matrix for a
Fitted RSM Model
Datasets:
========
darwin Darwin's Data on Growth Rates of Plants
houses House Price Data
nuclear Nuclear Power Station Data
venice Sea Level Data
Further information is available in the following vignettes:
Rnews-paper |
hoa: An R Package Bundle for Higher Order Likelihood Inference (source, pdf) |
S original by Alessandra R. Brazzale <[email protected]>. R port by Alessandra R. Brazzale <[email protected]>, following earlier work by Douglas Bates.
Maintainer: Alessandra R. Brazzale <[email protected]>
Performs approximate conditional inference.
cond(object, offset, ...)cond(object, offset, ...)
object |
a fitted model object. Families supported are binomial and
Poisson with canonical link function (class |
offset |
the covariate occurring in the model formula whose coefficient
represents the parameter of interest. May be numerical or a
two-level factor. In case of a two-level factor, it must be
coded by contrasts and not appear as two dummy variables in the
model. Can also be a call to a mathematical function (such as
|
... |
absorbs any additional arguments. See |
This function is generic (see methods); method
functions can be written to handle specific classes of data.
Classes which already have methods for this function include:
glm and rsm.
The returned value is an approximate conditional inference
object. Classes already supported are cond and
marg depending on whether the fitted model object passed
through the object argument has class glm or
rsm. See cond.object or
marg.object for more details.
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne. Chapter 6.
cond.glm, cond.rsm,
cond.object, marg.object
## Urine Data ## Not run: data(urine) urine.glm <- glm(r ~ gravity + ph + osmo + cond + urea + log(calc), family = binomial, data = urine) ## ## function call as offset variable labels(coef(urine.glm)) cond(urine.glm, log(calc)) ## ## large estimate of regression coefficient urine.glm <- glm(r ~ gravity + ph + osmo + cond + urea + calc, family = binomial, data = urine) coef(urine.glm) urine.glm <- glm(r ~ I(gravity * 100) + ph + osmo + cond + urea + calc, family = binomial, data = urine) coef(urine.glm) urine.cond <- cond(urine.glm, I(gravity * 100)) plot(urine.cond, which = 4) ## End(Not run) ## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## ## parameter of interest: scale parameter houses.marg <- cond(houses.rsm, scale) plot(houses.marg, which = 2)## Urine Data ## Not run: data(urine) urine.glm <- glm(r ~ gravity + ph + osmo + cond + urea + log(calc), family = binomial, data = urine) ## ## function call as offset variable labels(coef(urine.glm)) cond(urine.glm, log(calc)) ## ## large estimate of regression coefficient urine.glm <- glm(r ~ gravity + ph + osmo + cond + urea + calc, family = binomial, data = urine) coef(urine.glm) urine.glm <- glm(r ~ I(gravity * 100) + ph + osmo + cond + urea + calc, family = binomial, data = urine) coef(urine.glm) urine.cond <- cond(urine.glm, I(gravity * 100)) plot(urine.cond, which = 4) ## End(Not run) ## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## ## parameter of interest: scale parameter houses.marg <- cond(houses.rsm, scale) plot(houses.marg, which = 2)
Performs approximate conditional inference on a scalar parameter of
interest in regression-scale models. The output is stored in an
object of class marg.
## S3 method for class 'rsm' cond(object, offset, formula = NULL, family = NULL, dispersion = NULL, data = sys.frame(sys.parent()), pts = 20, n = max(100, 2*pts), tms = 0.6, from = NULL, to = NULL, control = glm.control(...), trace = FALSE, ...)## S3 method for class 'rsm' cond(object, offset, formula = NULL, family = NULL, dispersion = NULL, data = sys.frame(sys.parent()), pts = 20, n = max(100, 2*pts), tms = 0.6, from = NULL, to = NULL, control = glm.control(...), trace = FALSE, ...)
object |
a |
offset |
either the covariate occurring in the model formula whose
coefficient represents the parameter of interest or |
formula |
a formula expression (only if no |
family |
a |
dispersion |
argument only to be used if no |
data |
an optional data frame in which to interpret the variables
occurring in the formula (only if no |
pts |
number of output points (minimum 10) that are calculated exactly; the default is 20. |
n |
approximate number of output points (minimum 50) produced by the
spline interpolation. The default is the maximum between 100 and
twice |
tms |
defines the range MLE +/- |
from |
starting value of the sequence that contains the values of the parameter of interest for which output points are calculated exactly. The default is MLE - 3.5 * s.e. |
to |
ending value of the sequence that contains the values of the parameter of interest for which output points are calculated exactly. The default is MLE + 3.5 * s.e. |
control |
a list of iteration and algorithmic constants that control the
|
trace |
if |
... |
additional arguments, such as |
This function is a method for the generic function cond
for class rsm. It can be invoked by calling cond for
an object of the appropriate class, or directly by calling
cond.rsm regardless of the class of the object.
cond.rsm has also to be used if the rsm object is not
provided throught the object argument but specified by
formula and family.
The function cond.rsm implements several small sample
asymptotic methods for approximate conditional inference in
regression-scale models. Approximations for both the modified/marginal
log likelihood function and approximate conditional/marginal tail
probabilities are
available (see marg.object for details). Attention is
restricted to a scalar parameter of interest, either a regression
coefficient or the scale parameter. In the first case, the
associated covariate may be either numerical or a two-level factor.
Approximate conditional (or equivalently marginal) inference is performed
by either updating a
fitted regression-scale model or defining the model formula and
family. All approximations are calculated exactly for pts
equally spaced points ranging from from to to. A
spline interpolation is used to extend them over the whole interval
of interest, except for the range of values defined by MLE
+/- tms * s.e. where the spline interpolation is
replaced by a higher order polynomial interpolation. This is done
in order to avoid numerical instabilities which are likely to occur
for values of the parameter of interest close to the MLE.
Results
are stored in an object of class marg. Method functions
like print, summary and
plot can be used to examine the output or
represent it graphically. Components can be extracted using
coef, formula and
family.
Main references for the methods considered are the papers by Barndorff-Nielsen (1991), DiCiccio, Field and Fraser (1990) and DiCiccio and Field (1991). The theory and statistics used are summarized in Brazzale (2000, Chapters 2 and 3). More details of the implementation are given in Brazzale (1999; 2000, Section 6.3.1).
The returned value is an object of class marg; see
marg.object for details.
If the parameter of interest is the scale parameter, all calculations are performed on the logarithmic scale, though most results are reported on the original scale.
In rare occasions, cond.rsm dumps because of non-convergence
of the function rsm which is used to refit the model
for a fixed value of the parameter of interest. This happens for
instance if this value is too extreme. The arguments from
and to may then be used to limit the default range of
MLE +/- 3.5 * s.e. A further possibility is to
fine-tuning the constants (number of iterations, convergence
threshold) that control the rsm fit through the
control argument.
cond.rsm may also dump if the estimate of the parameter of
interest is large (tipically > 400) in absolute value. This may be
avoided by reparametrizing the model.
Barndorff-Nielsen, O. E. (1991) Modified signed log likelihood ratio. Biometrika, 78, 557–564.
Brazzale, A. R. (1999) Approximate conditional inference for logistic and loglinear models. J. Comput. Graph. Statist., 8, 653–661.
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.
DiCiccio, T. J., Field, C. A. and Fraser, D. A. S. (1990) Approximations of marginal tail probabilities and inference for scalar parameters. Biometrika, 77, 77–95.
DiCiccio, T. J. and Field, C. A. (1991) An accurate method for approximate conditional and Bayesian inference about linear regression models from censored data. Biometrika, 78, 903–910.
marg.object, summary.marg,
plot.marg, rsm
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) ## ## quadratic model fitted to the sea level, includes 18.62-year ## astronomical tidal cycle and 11-year sunspot cycle venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) names(coef(venice.rsm)) ## "(Intercept)" "Year" "I(Year^2)" "c11" "s11" "c19" "s19" ## ## variable of interest: quadratic term venice.marg <- cond(venice.rsm, I(Year^2)) ## detach() ## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## ## parameter of interest: scale parameter houses.marg <- cond(houses.rsm, scale)## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) ## ## quadratic model fitted to the sea level, includes 18.62-year ## astronomical tidal cycle and 11-year sunspot cycle venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) names(coef(venice.rsm)) ## "(Intercept)" "Year" "I(Year^2)" "c11" "s11" "c19" "s19" ## ## variable of interest: quadratic term venice.marg <- cond(venice.rsm, I(Year^2)) ## detach() ## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## ## parameter of interest: scale parameter houses.marg <- cond(houses.rsm, scale)
The darwin data frame has 15 rows and 3 columns.
Charles Darwin conducted an experiment to examine the superiority of cross-fertilized plants over self-fertilized plants. 15 pairs of plants were used. Each pair consisted of one cross-fertilized plant and one self-fertilized plant which germinated at the same time and grew in the same pot. The heights of the plants were measured at a fixed time after planting. Four different pots were used.
data(darwin)data(darwin)
This data frame contains the following columns:
crossthe heights of the cross-fertilized plants (in inches);
selfthe heights of the self-fertilized plants (in inches);
pota factor variable for the pot number.
The data were obtained from
Andrews, D. F. and Herzberg, A. M. (1985) Data: A Collection of Problems From Many Fields for the Student and Research Worker (Chapter 2). New York: Springer-Verlag.
Darwin, C. (1878) The Effects of Cross and Self Fertilisation in the Vegetable Kingdom (2nd ed.). London: John Murray.
data(darwin) plot(cross - self ~ pot, data = darwin)data(darwin) plot(cross - self ~ pot, data = darwin)
This is a method for the function family() for objects
from which a familyRsm object can be extracted. Typically
a fitted rsm model object. See family for
the general behaviour of this function.
## S3 method for class 'rsm' family(object, ...)## S3 method for class 'rsm' family(object, ...)
object |
any object from which a |
... |
absorbs any additional argument. |
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) family(venice.rsm) detach() ## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) family(houses.rsm)## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) family(venice.rsm) detach() ## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) family(houses.rsm)
Class of objects that characterize the error distribution of a regression-scale model.
This class of objects is returned by a call to a family.rsm
generator function. See rsm.families for the
distributions which are supported by the marg package. The
object includes a list of functions and expressions that
characterize the error distribution of a regression-scale model.
These are used by the IRLS algorithm implemented in the
rsm fitting routine. New families can be added to the
ones already supported. See the demonstration file
‘margdemo.R’ that ships with the package. There is a
print method for familyRsm objects which
produces a simple summary without any detail; use
unclass(familyRsm.object) to see the contents.
The following components, with the corresponding functionality,
are required for a familyRsm object:
familya character vector giving the family name.
g0a function that yields minus the log density of the error distribution in the regression-scale model.
g1a function that yields the first derivative of minus the log density.
g2a function that yields the second derivative of minus the log density.
dfargument with NULL value; must be included to guarantee
compatibility with the existing code.
kargument with NULL value; must be included to guarantee
compatibility with the existing code.
For the sake of compatibility, the g0, g1 and
g2 functions of a user-defined family can only take two
arguments: y representing an observation and the
... argument which absorbes any additional arguments.
The houses data frame has 26 rows and 5 columns.
Ms. Terry Tasch of Long-Kogan Realty, Chicago, provides data on the selling prices of houses and on different variables describing their status. This data frame contains the prices and a subset of the covariates.
data(houses)data(houses)
This data frame contains the following columns:
priceselling price (in thousands of dollars);
bdroomnumber of bedrooms;
floorfloor space (in square feet);
roomstotal number of rooms;
frontfront footage of lot (in feet).
The data were obtained from
Sen, A. and Srivastava, M. (1990) Regression Analysis: Theory, Methods and Applications (Exhibit 2.2, page 32). New York: Springer-Verlag.
data(houses) summary(houses) pairs(houses)data(houses) summary(houses) pairs(houses)
Density, cumulative distribution, quantiles and random number generator for Huber's least favourable distribution.
dHuber(x, k = 1.345) pHuber(q, k = 1.345) qHuber(p, k = 1.345) rHuber(n, k = 1.345)dHuber(x, k = 1.345) pHuber(q, k = 1.345) qHuber(p, k = 1.345) rHuber(n, k = 1.345)
x |
vector of quantiles. Missing values ( |
q |
vector of quantiles. Missing values ( |
p |
vector of probabilities. Missing values ( |
n |
sample size. If |
k |
tuning constant. Values should preferably lie between 1 and 1.5. The default is 1.345, which gives 95% efficiency at the Normal. |
Inversion of the cumulative distribution function is used to generate deviates from Huber's least favourable distribution.
Density (dHuber), probability (pHuber),
quantile (qHuber), or random sample (rHuber)
for Huber's least favourable distribution with tuning constant
k. If values are missing, NAs will be returned.
The function rHuber causes creation of the dataset
.Random.seed if it does not already exist; otherwise its
value is updated.
Huber's least favourable distribution is a compound distribution
with gaussian behaviour in the interval (-k,k) and
double exponential tails. It is strongly related to Huber's
M-estimator, which represents the maximum likelihood estimator of
the location parameter.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986) Robust Statistics: The Approach Based on Influence Functions. New York: Wiley.
pHuber(0.5) ## 0.680374 pHuber(0.5, k = 1.5) ## 0.6842623pHuber(0.5) ## 0.680374 pHuber(0.5, k = 1.5) ## 0.6842623
Computes the log likelihood for regression-scale models.
## S3 method for class 'rsm' logLik(object, ...)## S3 method for class 'rsm' logLik(object, ...)
object |
an object inheriting from class |
... |
absorbs any additional argument. |
This is a method for the function logLik() for objects
inheriting from class rsm.
Returns an object class logLik which is a number
with attributes, attr(r, "df") (degrees of freedom)
giving the number of parameters (regression coefficients plus
scale parameter, if not fixed) in the model.
The default
print method for logLik objects is used.
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## logLik(venice.rsm) detach()## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## logLik(venice.rsm) detach()
Class of objects returned when performing approximate conditional inference for regression-scale models.
Objects of class marg are implemented as a list. The
following components are included:
workspace |
a list whose elements are the spline interpolations of several first order and higher order statistics. They are used to implement the following likelihood quantities: - the profile and modified profile/approximate marginal log likelihoods; - the Wald pivots from the unconditional and conditional/approximate marginal MLEs; - the profile and modified/approximate marginal likelihood roots; - the conditional/approximate marginal Lugannani-Rice tail area approximation; - the correction term used in the higher order statistics; - the conditional/marginal information and nuisance parameter aspects. Method functions work mainly on this part of the object. In order to avoid errors in the calculation of confidence intervals and tail probabilities, this part of the object should not be modified. |
coefficients |
a |
call |
the function call that created the |
formula |
the model formula. |
family |
the name of the error distribution. |
offset |
the covariate occurring in the model formula whose coefficient
represents the parameter of interest or |
diagnostics |
diagnostics related to the decomposition of the higher order adjustments into an information and a nuisance parameters term. |
n.approx |
the number of output points for which the statistics have been calculated exactly. |
omitted.val |
the range of values omitted in the spline interpolation of some of the higher order statistics considered. The aim is to avoid numerical instabilities around the maximum likelihood estimate. |
is.scalar |
a logical value indicating whether there are any nuisance
parameters. If |
Main references for the methods considered are the papers by Barndorff-Nielsen (1991), DiCiccio, Field and Fraser (1990) and DiCiccio and Field (1991). The theory and statistics used are summarized in Brazzale (2000, Chapters 2 and 3). More details of the implementation and of the methods considered are given in Brazzale (1999; 2000, Section 6.3.1).
This class of objects is returned from calls to the function
cond.rsm.
The class marg has methods for summary,
plot, print,
coef and family, among
others.
If the parameter of interest is the scale parameter, all calculations are performed on the logarithmic scale, though most results are reported on the original scale.
Barndorff-Nielsen, O. E. (1991) Modified signed log likelihood ratio. Biometrika, 78, 557–564.
Brazzale, A. R. (1999) Approximate conditional inference for logistic and loglinear models. J. Comput. Graph. Statist., 8, 653–661.
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.
DiCiccio, T. J., Field, C. A. and Fraser, D. A. S. (1990) Approximations of marginal tail probabilities and inference for scalar parameters. Biometrika, 77, 77–95.
DiCiccio, T. J. and Field, C. A. (1991) An accurate method for approximate conditional and Bayesian inference about linear regression models from censored data. Biometrika, 78, 903–910.
cond.rsm, summary.marg,
plot.marg
This data frame contains data on the construction of 32 light water reactors constructed in the USA.
data(nuclear)data(nuclear)
A data frame with 32 observations on the following 11 variables.
costcost of construction (in billions of dollars adjusted to a 1976 base)
datedate at which the construction permit was issued
T1time between application for and issue of permit
T2time between issue of operating license and construction permit
cappower plant capacity (in MWe)
PR1 if light water reactor already present on site
NE1 if constructed in north-east region of USA
CT1 if cooling tower used
BW1 if nuclear steam supply system manufactured by
Babcock-Wilcox
Ncumulative number of power plants constructed by each architect-engineer
PT1 if partial turnkey plant
The data were obtained from
Cox, D.R. and Snell, E.J. (1981). Applied Statistics (page 81, Example G). Chapman and Hall, London.
data(nuclear)data(nuclear)
Creates a set of plots for an object of class marg.
## S3 method for class 'marg' plot(x = stop("nothing to plot"), from = x.axis[1], to = x.axis[n], which = NULL, alpha = 0.05, add.leg = TRUE, loc.leg = FALSE, add.labs = TRUE, cex = 0.7, cex.lab = 1, cex.axis = 1, cex.main = 1, lwd1 = 1, lwd2 = 2, lty1 = "solid", lty2 = "dashed", col1 = "black", col2 = "blue", tck = 0.02, las = 1, adj = 0.5, lab = c(15, 15, 5), ...)## S3 method for class 'marg' plot(x = stop("nothing to plot"), from = x.axis[1], to = x.axis[n], which = NULL, alpha = 0.05, add.leg = TRUE, loc.leg = FALSE, add.labs = TRUE, cex = 0.7, cex.lab = 1, cex.axis = 1, cex.main = 1, lwd1 = 1, lwd2 = 2, lty1 = "solid", lty2 = "dashed", col1 = "black", col2 = "blue", tck = 0.02, las = 1, adj = 0.5, lab = c(15, 15, 5), ...)
x |
a |
from |
the starting value for the x-axis range. The default value has
been set by |
to |
the ending value for the x-axis range. The default value has been
set by |
which |
which plot to print. Admissible values are |
alpha |
the level used to read off confidence intervals; the default is 5%. |
add.leg |
if |
loc.leg |
if |
add.labs |
if |
cex, cex.lab, cex.axis, cex.main
|
the character expansions relative to the standard size of the
device to be used for printing text, labels, axes and main title.
See |
lwd1, lwd2
|
the line widths used to compare different curves in the same plot;
default is |
lty1, lty2
|
line type used to compare different curves in the same plot;
default is |
col1, col2
|
colors used to compare different curves in the same plot; default
is |
tck, las, adj, lab
|
further graphical parameters. See |
... |
optional graphical parameters; see |
Several plots are produced for an object of class marg. A
menu lists all the plots that can be produced. They may be one or
all of the following ones:
Make a plot selection (or 0 to exit) 1: All 2: Profile and modified profile log likelihoods 3: Profile and modified profile likelihood ratios 4: Profile and modified likelihood root 5: Lugannani-Rice approximation 6: Confidence intervals 7: Diagnostics based on INF/NP decomposition Selection:
If no nuisance parameters are presented, a subset of the above pictures is produced. A message is printed if this is the case. More details and examples are given in Brazzale (2000, Sections 6.5 and 5.3.2).
This function is a method for the generic function plot() for
class marg. It can be invoked by calling plot or
directly plot.marg for an object of the appropriate class.
A plot is created on the current graphics device.
The current device is cleared. When add.leg is
TRUE, a legend is added to each plot. Furthermore, if
loc.leg is TRUE, the location of the legend can
be set by the user. All screens are closed, but not cleared, on
termination of the function.
If the parameter of interest is the scale parameter, all calculations are performed on the log scale, though most results are reported on the original scale.
Four diagnostic plots are provided. The two panels on the right trace the information and nuisance correction terms, INF and NP, against the likelihood root statistic. These are generally smooth functions and used to approximate the information and nuisance parameter aspects as a function of the parameter of interest, as shown in the two panels on the left. This procedure has the advantage of largely eliminating the numerical instabilities that affect the statistics around the MLE. All four pictures are intended to give an idea of the order of magnitude of the two correction terms while trying to deal with the numerical problems that likely occur for these kinds of data.
More details can be found in Brazzale (2000, Appendix B.2).
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.
cond.rsm, marg.object,
summary.marg
# Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) # # quadratic model fitted to the sea level, includes 18.62-year # astronomical tidal cycle and 11-year sunspot cycle venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) venice.marg <- cond(venice.rsm, I(Year^2)) plot(venice.marg, which = 4) ## detach()# Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) # # quadratic model fitted to the sea level, includes 18.62-year # astronomical tidal cycle and 11-year sunspot cycle venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) venice.marg <- cond(venice.rsm, I(Year^2)) plot(venice.marg, which = 4) ## detach()
This is a method for the function print() for objects of
class summaryMarg. See print
and print.default for the general behaviour of
this function and for the interpretation of digits.
## S3 method for class 'summaryMarg' print(x, all = x$all, Coef = x$cf, int = x$int, test = x$hyp, digits = if(!is.null(x$digits)) x$digits else max(3, getOption("digits")-3), ...) ## S3 method for class 'summaryMarg' print(x, all, Coef, int, test, digits, ...)## S3 method for class 'summaryMarg' print(x, all = x$all, Coef = x$cf, int = x$int, test = x$hyp, digits = if(!is.null(x$digits)) x$digits else max(3, getOption("digits")-3), ...) ## S3 method for class 'summaryMarg' print(x, all, Coef, int, test, digits, ...)
x |
a |
all |
if |
Coef |
if |
int |
if |
test |
if |
digits |
the number of significant digits to be printed. The default
depends on the value of |
... |
additional arguments. |
Changing the default values of all, Coef, int
and test allows only a subset of the information in the
summaryMarg object to be printed. With all = FALSE,
one-sided confidence intervals and the Lugannani-Rice tail area
approximation are omitted. See summary.marg for more
details.
If the parameter of interest is the scale parameter, all calculations are performed on the log scale, though most results are reported on the original scale.
The amount of information printed may vary depending on whether there are any nuisance parameters. A message is printed if there are none.
## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) houses.cond <- cond(houses.rsm, front) print(summary(houses.cond), digits = 4) print(summary(houses.cond), Coef = FALSE)## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) houses.cond <- cond(houses.rsm, front) print(summary(houses.cond), digits = 4) print(summary(houses.cond), Coef = FALSE)
Computes one of the six types of residuals available for regression-scale models.
## S3 method for class 'rsm' residuals(object, type = c("deviance", "pearson", "response", "r.star", "prob", "deletion"), weighting = "observed", ...)## S3 method for class 'rsm' residuals(object, type = c("deviance", "pearson", "response", "r.star", "prob", "deletion"), weighting = "observed", ...)
object |
an object inheriting from class |
type |
character string; defines the type of residuals, with choices
|
weighting |
character string; defines the weight matrix that should be used in
the calculation of the residuals and diagnostics. Possible
choices are |
... |
absorbs any additional argument. |
This is a method for the function residuals() for objects
inheriting from class rsm. As several types of residuals are
available for rsm objects, there is an additional optional
argument type. The "deviance", "pearson",
"r.star", "prob" and "deletion" residuals are
derived from the final IRLS fit. The "response" residuals
are standardized residuals on the scale of the response, the
"prob" residuals are on the scale,
whereas the remaining ones follow approximately the standard normal
distribution.
The default weighting scheme used is "observed". The weights
used are the values stored in the q2 component of the
rsm object. Some of the IRLS weights
returned by rsm may be negative if the error distribution
is Student's t or user-defined. In order to avoid missing values
in the residuals, the default weighting scheme used is then
"score" unless otherwise specified. The "score"
weights are also used by default if Huber's least favourable error
distribution is used.
More details, in particular of the use of these residuals, are given in Brazzale (2000, Section 6.3.1).
A numeric vector of residuals. See Davison and Snell (1991) for detailed definitions of each type of residual.
The summary method for rsm objects produces
response residuals. The residuals component of a rsm
object contains the response residuals.
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.
Davison, A. C. and Snell, E. J. (1991) Residuals and diagnostics. In Statistical Theory and Modelling: In Honour of Sir David Cox (eds. D.V. Hinkley, N. Reid, and E.J. Snell), 83–106. London: Chapman & Hall.
Davison, A. C. and Tsai, C.-L. (1992) Regression model diagnostics. Int. Stat. Rev., 60, 337–353.
Jorgensen, B. (1984). The delta algorithm and GLIM. Int. Stat. Rev., 52, 283–300.
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## residuals(venice.rsm) ## deviance residuals with observed weights residuals(venice.rsm, type = "r.star", weighting = "score") ## r* residuals with score weights detach()## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## residuals(venice.rsm) ## deviance residuals with observed weights residuals(venice.rsm, type = "r.star", weighting = "score") ## r* residuals with score weights detach()
Produces an object of class rsm which is a regression-scale
model fit of the data.
rsm(formula = formula(data), family = gaussian, data = sys.frame(sys.parent()), dispersion = NULL, weights = NULL, subset = NULL, na.action = na.fail, offset = NULL, method = "rsm.surv", control = glm.control(maxit=100, trace=FALSE), model = FALSE, x = FALSE, y = TRUE, contrasts = NULL, ...)rsm(formula = formula(data), family = gaussian, data = sys.frame(sys.parent()), dispersion = NULL, weights = NULL, subset = NULL, na.action = na.fail, offset = NULL, method = "rsm.surv", control = glm.control(maxit=100, trace=FALSE), model = FALSE, x = FALSE, y = TRUE, contrasts = NULL, ...)
formula |
a formula expression as for other linear regression models, of the
form |
family |
a |
data |
an optional data frame in which to interpret the variables
occurring in the model formula, or in the |
dispersion |
if |
weights |
the optional weights for the fitting criterion. If supplied, the
response variable and the covariates are multiplied by the weights
in the IRLS algorithm. The length of the |
subset |
expression saying which subset of the rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default. |
na.action |
a function to filter missing data. This is applied to the model
frame after any |
offset |
this can be used to specify an a priori known component to
be included in the linear predictor during fitting. An
|
method |
the fitting method to be used; the default is |
control |
a list of iteration and algorithmic constants. See
|
model |
if |
x |
if |
y |
if |
contrasts |
a list of contrasts to be used for some or all of the factors appearing as variables in the model formula. The names of the list should be the names of the corresponding variables, and the elements should either be contrast-type matrices (matrices with as many rows as levels of the factor and with columns linearly independent of each other and of a column of one's), or else they should be functions that compute such contrast matrices. |
... |
absorbs any additional argument. |
The model is fitted using Iteratively Reweighted Least
Squares, IRLS for short (Green, 1984,
Jorgensen, 1984). The working response and iterative
weights are computed using the functions contained in the
family.rsm object.
The two workhorses of rsm are rsm.fit and
rsm.surv, which expect an X and Y
argument rather then a formula. The first function is used for the
families student with df 3 and
Huber;
the second one, based on the survreg.fit
routine for fitting parametric survival models, is used in case of
extreme, logistic, logWeibull,
logExponential, logRayleigh and student (with
df > 2) error distributions. In the presence of a
user-defined error distribution the rsm.fit routine is used.
The rsm.null function is invoked to fit an empty (null)
model.
The details are given in Brazzale (2000, Section 6.3.1).
an object of class rsm is returned which inherits from
glm and lm. See rsm.object for details.
The output can be examined by print,
summary, rsm.diag.plots and
anova. Components can be extracted using
fitted, residuals,
formula and family. It can
be modified using update. It has most of the
components of a glm object, with a few more. Use
rsm.object for further details.
In case of extreme, logistic, logWeibull,
logExponential, logRayleigh and student (with
df > 2) error distributions, both methods,
rsm.fit (default choice) and
rsm.surv, can be used to fit the model.
There are, however, examples where one of the two algorithms (most
likely the one invoked by rsm.surv) breaks
down. If this is the case, try and refit the model with the
alternative choice.
The message "negative iterative weights returned!" is
returned if some of the iterative weights (q2 component of
the fitted rsm object) are negative. These would be used by
default by the rsm.diag routine for the definition of
residuals and regression diagnostics. In order to avoid missing
values (NAs), the default weighting scheme "observed"
automatically switches to "score" unless otherwise specified.
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.
Green, P. J. (1984) Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives (with Discussion). J. R. Statist. Soc. B, 46, 149–192.
Jorgensen, B. (1984) The delta algorithm and GLIM. Int. Stat. Rev., 52, 283–300.
rsm.object, rsm.fit,
rsm.surv, rsm.null,
rsm.families
## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## model fit including all covariates houses.rsm <- rsm(price ~ ., family = student(5), data = houses, method = "rsm.fit", control = glm.control(trace = TRUE)) ## prints information about the iterative procedure at each iteration update(houses.rsm, ~ . - bdroom + offset(7 * bdroom)) ## "bdroom" is included as offset variable with fixed (= 7) coefficient ## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 venice.2.rsm <- rsm(sea ~ Year + I(Year^2), family = extreme) ## quadratic model fitted to sea level data venice.1.rsm <- update(venice.2.rsm, ~. - I(Year^2)) ## linear model fit ## c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## includes 18.62-year astronomical tidal cycle and 11-year sunspot cycle venice.11.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11, family = extreme) venice.19.rsm <- rsm(sea ~ Year + I(Year^2) + c19 + s19, family = extreme) ## includes either astronomical cycle ## ## comparison of linear, quadratic and periodic (11-year, 19-year) models plot(year, sea, ylab = "sea level") lines(year, fitted(venice.1.rsm)) lines(year, fitted(venice.2.rsm), col="red") lines(year, fitted(venice.11.rsm), col="blue") lines(year, fitted(venice.19.rsm), col="green") ## detach() ## Darwin's Data on Growth Rates of Plants data(darwin) darwin.rsm <- rsm(cross - self ~ pot - 1, family = student(3), data = darwin) ## Maximum likelihood estimates darwin.rsm <- rsm(cross - self ~ pot - 1, family = Huber, data = darwin) ## M-estimates## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## model fit including all covariates houses.rsm <- rsm(price ~ ., family = student(5), data = houses, method = "rsm.fit", control = glm.control(trace = TRUE)) ## prints information about the iterative procedure at each iteration update(houses.rsm, ~ . - bdroom + offset(7 * bdroom)) ## "bdroom" is included as offset variable with fixed (= 7) coefficient ## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 venice.2.rsm <- rsm(sea ~ Year + I(Year^2), family = extreme) ## quadratic model fitted to sea level data venice.1.rsm <- update(venice.2.rsm, ~. - I(Year^2)) ## linear model fit ## c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## includes 18.62-year astronomical tidal cycle and 11-year sunspot cycle venice.11.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11, family = extreme) venice.19.rsm <- rsm(sea ~ Year + I(Year^2) + c19 + s19, family = extreme) ## includes either astronomical cycle ## ## comparison of linear, quadratic and periodic (11-year, 19-year) models plot(year, sea, ylab = "sea level") lines(year, fitted(venice.1.rsm)) lines(year, fitted(venice.2.rsm), col="red") lines(year, fitted(venice.11.rsm), col="blue") lines(year, fitted(venice.19.rsm), col="green") ## detach() ## Darwin's Data on Growth Rates of Plants data(darwin) darwin.rsm <- rsm(cross - self ~ pot - 1, family = student(3), data = darwin) ## Maximum likelihood estimates darwin.rsm <- rsm(cross - self ~ pot - 1, family = Huber, data = darwin) ## M-estimates
Calculates different types of residuals, Cook's distance and the leverages for a regression-scale model.
rsm.diag(rsmfit, weighting = "observed")rsm.diag(rsmfit, weighting = "observed")
rsmfit |
an |
weighting |
character string; defines the weight matrix that should be used
in the calculation of the residuals and diagnostics. Possible
choices are |
If the weighting scheme is "observed", the weights used are
the values stored in the q2 component of the rsm
object rsmfit. Otherwise, they are calculated by
rsm.diag. Some of the IRLS weights returned by
rsm may be negative if the error distribution is Student's
t or user-defined. In order to avoid missing values in the
residuals and regression diagnostics, the default weighting scheme
used in rsm.diag switches automatically from
"observed" to "score" unless otherwise specified. The
"score" weights are also used by default if Huber's least
favourable error distribution is used.
There are three types of residuals. The response residuals are
taken on the response scale, whereas the probability transform
residuals are on the scale. The remaining
ones follow approximately the standard normal distribution.
More details and in particular the definitions of the above residuals and diagnostics can be found in Brazzale (2000, Section 6.3.1).
Returns a list with the following components:
resid |
the response residuals on the response scale. |
rd |
the standardized deviance residuals from the IRLS fit. |
rp |
the standardized Pearson residuals from the IRLS fit. |
rg |
the deletion residuals from the IRLS fit. |
rs |
the |
rcs |
the probability transform residuals from the IRLS fit. |
cook |
Cook's distance. |
h |
the leverages of the observations. |
dispersion |
the value of the scale parameter. |
This function is based on A.J. Canty's function glm.diag
contained in the package boot.
Huber's least favourable distribution represents a special case.
The regression diagnostics are only meaningful if the errors
truly follow a Huber-type distribution. This no longer holds
if the option family = Huber in rsm is used to
obtain the M-estimates of the parameters in place or the maximum
likelihood estimates.
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.
Jorgensen, B. (1984) The delta algorithm and GLIM. Int. Stat. Rev., 52, 283–300.
Davison, A. C. and Snell, E. J. (1991) Residuals and diagnostics. In Statistical Theory and Modelling: In Honour of Sir David Cox (eds. D. V. Hinkley, N. Reid, and E. J. Snell), 83–106. London: Chapman & Hall.
Davison, A. C. and Tsai, C.-L. (1992) Regression model diagnostics. Int. Stat. Rev., 60, 337–353.
rsm.diag.plots, rsm.object,
summary.rsm
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) venice.diag <- rsm.diag(venice.rsm) ## observed weights detach() ## Darwin's Data on Growth Rates of Plants data(darwin) darwin.rsm <- rsm(cross-self ~ pot - 1, family = Huber, data = darwin) darwin.diag <- rsm.diag(darwin.rsm) ## score weights## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) venice.diag <- rsm.diag(venice.rsm) ## observed weights detach() ## Darwin's Data on Growth Rates of Plants data(darwin) darwin.rsm <- rsm(cross-self ~ pot - 1, family = Huber, data = darwin) darwin.diag <- rsm.diag(darwin.rsm) ## score weights
Generates diagnostic plots for a regression-scale model using different types of residuals, Cook's distance and the leverages.
rsm.diag.plots(rsmfit, rsmdiag = NULL, weighting = NULL, which = NULL, subset = NULL, iden = FALSE, labels = NULL, ret = FALSE, ...) ## S3 method for class 'rsm' plot(x, ...)rsm.diag.plots(rsmfit, rsmdiag = NULL, weighting = NULL, which = NULL, subset = NULL, iden = FALSE, labels = NULL, ret = FALSE, ...) ## S3 method for class 'rsm' plot(x, ...)
rsmfit, x
|
a |
rsmdiag |
the object returned by a call to |
weighting |
character string; defines the weight matrix that should be used in
the calculation of the residuals and diagnostics. Possible
choices are |
which |
which plot to print. Admissible values are |
subset |
subset of data used in the original |
iden |
logical argument. If |
labels |
a vector of labels for use with |
ret |
logical argument indicating if |
... |
additional arguments such as graphical parameters. |
The diagnostics required for the plots are calculated by
rsm.diag. These are then used to produce the plots
on the current graphics device.
A menu lists all the plots that can be produced. They may be one or all of the following:
Make a plot selection (or 0 to exit) 1: All 2: Response residuals against fitted values 3: Deviance residuals against fitted values 4: QQ-plot of deviance residuals 5: Normal QQ-plot of r* residuals 6: Cook statistic against h/(1-h) 7: Cook statistic against observation number Selection:
In the normal scores plots, the dotted line represents the expected line if the residuals are normally distributed, i.e. it is the line with intercept 0 and slope 1.
In general, when plotting Cook's distance against the standardized
leverages, there will be two dotted lines on the plot. The
horizontal line is at , where is
the number of observations and is the number of
estimated parameters. Points above this line may be points with
high influence on the model. The vertical line is at
and points to the right of this line have
high leverage compared to the variance of the raw residual at that
point. If all points are below the horizontal line or to the left
of the vertical line then the line is not shown.
Use of iden = TRUE is encouraged for proper exploration of
these plots as a guide to how well the model fits the data and
whether certain observations have an unduly large effect on
parameter estimates.
If ret is TRUE then the value of rsmdiag
is returned, otherwise there is no returned value.
The current device is cleared. If iden = TRUE, interactive
identification of points is enabled. All screens are closed, but
not cleared, on termination of the function.
This function is based on A. J. Canty's function
glm.diag.plots contained in the package boot.
Davison, A. C. and Snell, E. J. (1991) Residuals and diagnostics. In Statistical Theory and Modelling: In Honour of Sir David Cox (eds. D. V. Hinkley, N. Reid, and E. J. Snell), 83–106. London: Chapman & Hall, London.
Davison, A. C. and Tsai, C.-L. (1992) Regression model diagnostics. Int. Stat. Rev., 60, 337–353.
Jorgensen, B. (1984) The Delta Algorithm and GLIM. Int. Stat. Rev., 52, 283–300.
rsm.diag, rsm.object,
identify
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## Not run: rsm.diag.plots(venice.rsm, which = 3) ## End(Not run) ## or ## Not run: plot(venice.rsm) ## End(Not run) ## menu-driven ## rsm.diag.plots(venice.rsm, which = 5, las = 1) ## normal QQ-plot of r* residuals ## Not run: rsm.diag.plots(venice.rsm, which = 7, iden = T, labels = paste(1931:1981)) ## End(Not run) ## year 1932 highly influential detach()## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## Not run: rsm.diag.plots(venice.rsm, which = 3) ## End(Not run) ## or ## Not run: plot(venice.rsm) ## End(Not run) ## menu-driven ## rsm.diag.plots(venice.rsm, which = 5, las = 1) ## normal QQ-plot of r* residuals ## Not run: rsm.diag.plots(venice.rsm, which = 7, iden = T, labels = paste(1931:1981)) ## End(Not run) ## year 1932 highly influential detach()
Generates a familyRsm object containing a list of functions
and expressions used by rsm.
extreme() Huber(k = 1.345) logistic() logWeibull() student(df = stop("Argument \"df\" is missing, with no default"))extreme() Huber(k = 1.345) logistic() logWeibull() student(df = stop("Argument \"df\" is missing, with no default"))
k |
the tuning constant in Huber's least favourable distribution. |
df |
the degrees of freedom in Student's t distribution. |
Each of the names are associated with a member of the class of error
distributions for regression-scale models. Users can construct
their own families, as long as they have components compatible with
those given in rsm.distributions. The demonstration file
‘margdemo.R’ that accompanies the package shows how to
create a new generator function. When passed as an argument to
rsm with the default setting, the empty parentheses
() can be omitted. There is a print method for the
class familyRsm.
A familyRsm object, which is a list of functions and
expressions used by rsm in the iteratively reweighed
least-squares algorithm. See familyRsm.object for
details.
familyRsm.object, family.rsm,
rsm, Huber
student(df = 3) ## generates Student's t error distribution with 3 d.f. ## Not run: rsm(formula = value, data = value, family = extreme) ## End(Not run)student(df = 3) ## generates Student's t error distribution with 3 d.f. ## Not run: rsm(formula = value, data = value, family = extreme) ## End(Not run)
Fits a rsm model without computing the model matrix of the
response vector.
rsm.fit(X, Y, offset, family, dispersion, score.dispersion, maxit, epsilon, trace, ...)rsm.fit(X, Y, offset, family, dispersion, score.dispersion, maxit, epsilon, trace, ...)
X |
the model matrix (design matrix). |
Y |
the response vector. |
dispersion |
if |
score.dispersion |
must default to |
offset |
optional offset added to the linear predictor. |
family |
a |
maxit |
maximum number of iterations allowed. |
epsilon |
convergence threshold. |
trace |
if |
... |
not used, but do absorb any redundant argument. |
The rsm.fit function is called internally by the
rsm routine to do the actual model fitting. Although
it is not intended to be used directly by the user, it may be useful
when the same data frame is used over and over again. It might save
computational time, since the model matrix is not created. No
formula needs to be specified as an argument. As no weights
argument is available, the response Y and the model matrix
X must already include the weights if weighting is desired.
an object which is a subset of a rsm object.
The rsm.fit function is the workhorse of the rsm
fitting routine for the student (with df
2), Huber and user-defined error
distributions. It receives X and Y data rather than a
formula, but still uses the family.rsm object to define the
IRLS steps. Users can write
their own versions of rsm.fit, and pass the name of their
function via the method argument to rsm. Care should
be taken to include as many of the arguments as feasible, but
definitely the ... argument, which will absorb any
additional argument given in the call from rsm.
rsm, rsm.surv, rsm.null,
rsm.object, rsm.families
Fits a rsm model with empty model matrix.
rsm.null(X = NULL, Y, offset, family, dispersion, score.dispersion, maxit, epsilon, trace, ...)rsm.null(X = NULL, Y, offset, family, dispersion, score.dispersion, maxit, epsilon, trace, ...)
X |
defaults to |
Y |
the response vector. |
dispersion |
either |
score.dispersion |
must default to |
offset |
optional offset added to the linear predictor. |
family |
a |
maxit |
maximum number of iterations allowed. |
epsilon |
convergence threshold. |
trace |
if |
... |
not used, but do absorb any redundant argument. |
The rsm.null function is called internally by the
rsm routine to do the actual model fitting in case of an
empty model. It is not intended to be used directly by the user. As
no weights argument is available, the response Y and
the model matrix X must already include the weights if
weighting is desired.
an object which is a subset of a rsm object.
rsm, rsm.surv, rsm.fit,
rsm.object, rsm.families
Class of objects returned when fitting a regression-scale model.
The following components must be included in a rsm object:
coefficients |
the coefficients of the linear predictor, which multiply the columns of the model matrix. The names of the coefficients are the names of the single-degree-of-freedom effects (the columns of the model matrix). If the model is over-determined there will be missing values in the coefficients corresponding to inestimable coefficients. |
dispersion |
the (estimated or known) value of the scale parameter. |
fixed |
a logical value. If |
residuals |
the response residuals from the fit. If weights were used, they
are not taken into account. If you need other kinds of
residuals, use the |
fitted.values |
the fitted values from the fit. If weights were used, the fitted values are not adjusted for the weights. |
loglik |
the log likelihood from the fit. |
q1 |
the value of the first derivative of minus the log density for each observation. |
q2 |
the value of the second derivative of minus the log density for each observation. |
rank |
the computed rank (number of linearly independent columns in the model matrix). |
R |
the unscaled observed information matrix. |
score.dispersion |
a list containing the value of the objective function, its gradient and the convergence diagnostic, that result from estimating the scale parameter. |
iter |
the number of IRLS iterations used to compute the estimates. |
weights |
the (optional) weights used for the fit. |
assign |
the list of assignments of coefficients (and effects) to the terms in the model. The names of this list are the names of the terms. The ith element of the list is the vector saying which coefficients correspond to the ith term. It may be of length 0 if there were no estimable effects for the term. |
df.residuals |
the number of degrees of freedom for residuals. |
family |
the entire |
user.def |
a logical value. If |
dist |
a character string representing the name of the error distribution. |
formula |
the model formula. |
data |
the data frame in which to interpret the variables occurring in
the model formula, or in the |
terms |
an object of mode |
contrasts |
a list containing sufficient information to construct the contrasts used to fit any factors occurring in the model. The list contains entries that are either matrices or character vectors. When a factor is coded by contrasts, the corresponding contrast matrix is stored in this list. Factors that appear only as dummy variables and variables in the model that are matrices correspond to character vectors in the list. The character vector has the level names for a factor or the column labels for a matrix. |
control |
a list of iteration and algorithmic constants used in |
call |
an image of the call that produced the object, but with the arguments all named and with the actual formula included as the formula argument. |
y |
optionally the response, if |
x |
optionally the model matrix, if |
model |
optionally the model frame, if |
This class of objects is returned by the rsm function
to represent a fitted regression-scale model. Class rsm
inherits from classes glm and
lm, since it is fitted by iteratively reweighted
least squares. The object returned has all the components of a
weighted least squares object.
Objects of this class have methods for the functions
print, summary,
anova and fitted among
others.
The residuals, fitted values and coefficients should be extracted by
the generic functions of the same name, rather than by the $
operator.
Fits a rsm model without computing the model matrix of the
response vector.
rsm.surv(X, Y, offset, family, dispersion, score.dispersion, maxit, epsilon, trace, ...)rsm.surv(X, Y, offset, family, dispersion, score.dispersion, maxit, epsilon, trace, ...)
X |
the model matrix (design matrix). |
Y |
the response vector. |
offset |
optional offset added to the linear predictor. |
family |
a |
dispersion |
if |
score.dispersion |
must default to |
maxit |
maximum number of iterations. |
epsilon |
convergence threshold. |
trace |
if |
... |
not used, but do absorb any redundant argument. |
The rsm.surv function is called internally by the
rsm routine to do the actual model fitting. Although
it is not intended to be used directly by the user, it may be useful
when the same data frame is used over and over again. It might save
computational time, since the model matrix is not created. No
formula needs to be specified as an argument. As no weights
argument is available, the response Y and the model matrix
X must already include the weights if weighting is desired.
an object, which is a subset of a rsm object.
The rsm.surv function is the default option for
rsm for the extreme, logistic,
logWeibull, logExponential, logRayleigh and
student (with df larger than 2) error distributions.
It makes use of the survreg.fit routine to
estimate parametric survival models. It receives X and
Y data rather than a formula, but still uses the
family.rsm object to define the IRLS steps. The
rsm.surv routine cannot be used for Huber-type and
user-defined error distributions.
rsm, rsm.fit, rsm.null,
rsm.object, rsm.families
Returns a summary list for objects of class marg.
## S3 method for class 'marg' summary(object, alpha = 0.05, test = NULL, all = FALSE, coef = TRUE, int = ifelse((is.null(test) || all), TRUE, FALSE), digits = NULL, ...)## S3 method for class 'marg' summary(object, alpha = 0.05, test = NULL, all = FALSE, coef = TRUE, int = ifelse((is.null(test) || all), TRUE, FALSE), digits = NULL, ...)
object |
a |
alpha |
a vector of levels for confidence intervals; the default is 5%. |
test |
a vector of values of the parameter of interest one wants to test
for. If |
all |
logical value; if |
coef |
logical value; if |
int |
logical value; if |
digits |
the number of significant digits to be printed. The default depends
on the value of |
... |
absorbs any additional argument. |
This function is a method for the generic function summary()
for objects of class marg. It can be invoked by calling
summary or directly summary.marg for an object of the
appropriate class.
A list is returned with the following components:
coefficients |
a |
conf.int |
a matrix containing, for each level given in |
signif.tests |
a list with two elements. The first ( |
call |
the function call that created the |
formula |
the model formula. |
family |
the name of the error distribution. |
offset |
the covariate occurring in the model formula whose coefficient
represents the parameter of interest or |
alpha |
the vector of levels used to compute the confidence intervals. |
hypotheses |
the values for the parameter of interest that have been tested for. |
diagnostics |
the information and nuisance parameters aspects; see
|
n.approx |
the number of output points that have been calculated exactly. |
all |
logical value; if |
cf |
logical value; if |
int |
logical value; if |
is.scalar |
a logical value indicating whether there are any nuisance
parameters. If |
digits |
the number of significant digits to be printed. |
If the parameter of interest is the scale parameter, all calculations are performed on the log scale, though most results are reported on the original scale.
The amount of information calculated may vary depending on whether there are any nuisance parameters. A message is printed if there are none.
## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) houses.marg <- cond(houses.rsm, floor) summary(houses.marg, test = 0, coef = FALSE)## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) houses.marg <- cond(houses.rsm, floor) summary(houses.marg, test = 0, coef = FALSE)
Returns a summary list for a fitted regression-scale model.
## S3 method for class 'rsm' summary(object, correlation = FALSE, digits = NULL, ...)## S3 method for class 'rsm' summary(object, correlation = FALSE, digits = NULL, ...)
object |
a fitted |
correlation |
logical argument. If |
digits |
a non-null value specifies the minimum number of significant
digits to be printed in values. If |
... |
absorbs any additional argument. |
This function is a method for the generic function
summary() for class rsm. It can be invoked by
calling summary for an object of the appropriate class, or
directly by calling summary.rsm regardless of the class of
the object.
A list is returned with the following components:
coefficients |
a matrix with four columns, containing the coefficients, their
standard errors, the |
dispersion |
the value of the scale parameter used in the computations. |
fixed |
a logical value. If |
residuals |
the response residuals. |
cov.unscaled |
the unscaled covariance matrix, i.e, a matrix such that multiplying it by the squared scale parameter, or an estimate thereof, produces an estimated (asymptotic) covariance matrix for the coefficients. |
correlation |
the computed correlation matrix for the coefficients in the model. |
family |
the entire |
loglik |
the computed log likelihood. |
terms |
an object of mode |
df |
the number of degrees of freedom for the model and for the residuals. |
iter |
the number of IRLS iterations used to compute the estimates. |
nas |
a logical vector indicating which, if any, coefficients are missing. |
call |
an image of the call that produced the |
digits |
the value of the |
## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) summary(houses.rsm)## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) summary(houses.rsm)
update.rsm is used to update a rsm model
formulae. This typically involves adding or dropping terms, but
updates can be more general.
## S3 method for class 'rsm' update(object, formula., ..., evaluate = TRUE)## S3 method for class 'rsm' update(object, formula., ..., evaluate = TRUE)
object |
a model of class |
formula. |
changes to the formula – see |
... |
additional arguments to the call, or arguments with changed
values. Use |
evaluate |
if |
If evaluate = TRUE the fitted object, otherwise the updated
call.
Based upon update.default.
update, update.default,
update.formula
data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## model fit including all covariates ## houses.rsm <- update(houses.rsm, method = "rsm.fit", control = glm.control(trace = TRUE)) ## prints information about the iterative procedure at each iteration ## update(houses.rsm, ~ . - bdroom + offset(7 * bdroom)) ## "bdroom" is included as offset variable with fixed (= 7) coefficientdata(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## model fit including all covariates ## houses.rsm <- update(houses.rsm, method = "rsm.fit", control = glm.control(trace = TRUE)) ## prints information about the iterative procedure at each iteration ## update(houses.rsm, ~ . - bdroom + offset(7 * bdroom)) ## "bdroom" is included as offset variable with fixed (= 7) coefficient
Returns the variance-covariance matrix of the parameters of a
fitted rsm model object.
## S3 method for class 'rsm' vcov(object, correlation = FALSE, ...)## S3 method for class 'rsm' vcov(object, correlation = FALSE, ...)
object |
a fitted model object of class |
correlation |
if |
... |
absobs any additional argument. |
This is a method for function vcov for objects
of class rsm.
A matrix of the estimated covariances between the parameter
estimates of a fitted regression-scale model, or, if
dispersion = TRUE the correlation matrix.
vcov, rsm.object,
rsm, summary.rsm
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## vcov(venice.rsm) vcov(venice.rsm, corr = TRUE) ## detach()## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## vcov(venice.rsm) vcov(venice.rsm, corr = TRUE) ## detach()
The venice data frame has 51 rows and 2 columns.
Pirazzoli (1982) collected the ten largest values of sea
levels in Venice (with a few exceptions) for the years 1887–1981.
The venice data frame contains the maxima for the years
1931–1981.
data(venice)data(venice)
This data frame contains the following columns:
yearthe years;
seathe sea levels (in cm).
The data were obtained from
Smith, R. L. (1986) Extreme value theory based on the
-largest annual events. Journal of Hydrology ,
86, 27–43.
Pirazzoli, P. (1982) Maree estreme a Venezia (periodo 1872–1981). Acqua Aria, 10, 1023–1039.
data(venice) attach(venice) # plot(sea ~ year, ylab = "sea level") ## Year <- 1:51/51 venice.l <- rsm(sea ~ Year + I(Year^2), family = extreme) lines(year, fitted(venice.l)) ## c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.p <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) lines(year, fitted(venice.p), col = "red") ## detach()data(venice) attach(venice) # plot(sea ~ year, ylab = "sea level") ## Year <- 1:51/51 venice.l <- rsm(sea ~ Year + I(Year^2), family = extreme) lines(year, fitted(venice.l)) ## c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.p <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) lines(year, fitted(venice.p), col = "red") ## detach()