model.frame.Formula.Rd
Computation of model frames, model matrices, and model responses for extended formulas of class Formula.
# S3 method for Formula
model.frame(formula, data = NULL, ...,
lhs = NULL, rhs = NULL, dot = "separate")
# S3 method for Formula
model.matrix(object, data = environment(object), ...,
lhs = NULL, rhs = 1, dot = "separate")
# S3 method for Formula
terms(x, ...,
lhs = NULL, rhs = NULL, dot = "separate")
model.part(object, ...)
# S3 method for Formula
model.part(object, data, lhs = 0, rhs = 0,
drop = FALSE, terms = FALSE, dot = NULL, ...)
formula, object, x: an object of class Formula.
data: a data.frame, list or environment containing the variables in formula. For model.part it needs to be the model.frame.
lhs, rhs: indexes specifying which elements of the left- and right-hand side, respectively, should be employed. NULL corresponds to all parts, 0 to none. At least one lhs or one rhs has to be specified.
dot: character specifying how to process formula parts with a dot (.) on the right-hand side. This can be "separate", so that each formula part is expanded separately; "sequential", so that the parts are expanded sequentially conditional on all prior parts; or "previous", so that the part is expanded to the previous part.
drop: logical. Should the data.frame property be dropped for single-column data frames?
terms: logical. Should the "terms" attribute (corresponding to the model.part extracted) be added?
...: further arguments passed to the respective formula methods.
All three model computations leverage the corresponding standard methods. Additionally, they allow specification of the part(s) of the left- and right-hand side (LHS and RHS) that should be included in the computation.
The idea underlying all three model computations is to extract a suitable formula from the more general Formula and then call the standard model.frame, model.matrix, and terms methods.
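The following is a minimal sketch of that idea, not the package's internal code. It assumes the Formula package is installed and that its formula() method accepts a collapse argument; the data frame d and the object names Fx, f, mf_std, and mf_ext are made up for illustration.

```r
library("Formula")

## hypothetical toy data, for illustration only
d <- data.frame(
  y = c(1.2, 0.7, 2.1, 1.5),
  x = c(0.1, 0.4, 0.3, 0.9),
  z = c(2, 5, 3, 4)
)

## extended formula with a two-part RHS
Fx <- Formula(y ~ x | z)

## step 1: extract an ordinary formula, collapsing the RHS parts into one
f <- formula(Fx, collapse = TRUE)
f

## step 2: hand the ordinary formula to the standard method
mf_std <- model.frame(f, data = d)

## the Formula method is expected to produce the same variables in one call
mf_ext <- model.frame(Fx, data = d)
names(mf_std)
names(mf_ext)
```

Comparing names(mf_std) with names(mf_ext) is a quick way to check that the manual two-step route and the Formula method agree on which variables enter the model frame.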
More specifically, if the Formula has multiple parts on the RHS, they are collapsed, essentially replacing | by +. If there is only a single response on the LHS, then it is kept on the LHS. Otherwise all parts of the formula are collapsed on the RHS (because formula objects cannot have multiple responses). Hence, for multi-response Formula objects, the (non-generic) model.response does not give the correct results. To avoid confusion, a new generic model.part with a suitable Formula method is provided, which can always be used instead of model.response. Note, however, that it has a different syntax: it requires the Formula object in addition to the readily processed model.frame supplied in data (and optionally the lhs). Also, it returns either a data.frame with multiple columns or a single column (dropping the data.frame property), depending on whether multiple responses are employed or not.
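As a small illustration of the different calling convention, the sketch below extracts responses from a two-response Formula with model.part. It assumes the Formula package is installed; the data frame d and all object names are hypothetical.

```r
library("Formula")

## hypothetical toy data with two responses
d <- data.frame(
  y1 = c(1.2, 0.7, 2.1),
  y2 = c(10, 20, 30),
  x  = c(0.1, 0.4, 0.3)
)

## two LHS parts, one RHS part
F_multi <- Formula(y1 | y2 ~ x)
mf <- model.frame(F_multi, data = d)

## model.part() needs both the Formula and the processed model frame;
## lhs selects the response part, drop = TRUE returns a plain vector
resp1 <- model.part(F_multi, data = mf, lhs = 1, drop = TRUE)
resp2 <- model.part(F_multi, data = mf, lhs = 2, drop = TRUE)

## lhs = NULL requests all LHS parts: a data.frame with one column each
resp_all <- model.part(F_multi, data = mf, lhs = NULL)
names(resp_all)
```

Requesting the parts by index keeps the code the same whether a Formula has one response or several, which is the main reason to prefer model.part over model.response here.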
If the formula contains one or more dots (.), some care has to be taken to process these correctly, especially if the LHS contains transformations (such as log, sqrt, cbind, Surv, etc.). Calling the terms method with the original data (untransformed, if any) resolves all dots (by default separately for each part, otherwise sequentially) and also includes the original and updated formula as part of the terms. When calling model.part, either the original untransformed data should be provided along with a dot specification, or the transformed model.frame from the same formula should be provided without another dot specification (in which case the dot is inferred from the terms of the model.frame).
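The three dot settings can be compared side by side with a short sketch. It assumes the Formula package is installed; the data frame d, the Formula Fd, and the helper second_part() are hypothetical names introduced only for this illustration.

```r
library("Formula")

## hypothetical toy data: one response, four regressors
d <- data.frame(
  y  = c(1.2, 0.7, 2.1),
  x1 = c(0.1, 0.4, 0.3),
  x2 = c(2.0, 5.0, 3.0),
  x3 = c(0.5, 0.2, 0.9),
  x4 = c(7.0, 8.0, 6.0)
)

## first RHS part: everything except x3 and x4; second RHS part: a bare dot
Fd <- Formula(y ~ . - x3 - x4 | .)

## hypothetical helper: variable names in the second RHS part for a given
## dot setting
second_part <- function(dot) {
  mf <- model.frame(Fd, data = d, dot = dot)
  ## no dot argument here: it is inferred from the terms of mf
  names(model.part(Fd, data = mf, rhs = 2))
}

sep  <- second_part("separate")
seqn <- second_part("sequential")
prev <- second_part("previous")
```

Following the descriptions of the three settings, one would expect sep to hold all four regressors, seqn only x3 and x4 (those not used in the first part), and prev to repeat x1 and x2 from the first part.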
Zeileis A, Croissant Y (2010). Extended Model Formulas in R: Multiple Parts and Multiple Responses. Journal of Statistical Software, 34(1), 1-13. doi:10.18637/jss.v034.i01
## load the package providing Formula() and model.part()
library("Formula")
## artificial example data
set.seed(1090)
dat <- as.data.frame(matrix(round(runif(21), digits = 2), ncol = 7))
colnames(dat) <- c("y1", "y2", "y3", "x1", "x2", "x3", "x4")
for(i in c(2, 6:7)) dat[[i]] <- factor(dat[[i]] > 0.5, labels = c("a", "b"))
dat$y2[1] <- NA
dat
#> y1 y2 y3 x1 x2 x3 x4
#> 1 0.82 <NA> 0.27 0.09 0.22 b a
#> 2 0.70 b 0.17 0.26 0.46 a a
#> 3 0.65 a 0.28 0.03 0.37 b b
######################################
## single response and two-part RHS ##
######################################
## set up Formula
F1 <- Formula(log(y1) ~ x1 + x2 | I(x1^2))
length(F1)
#> [1] 1 2
## set up model frame
mf1 <- model.frame(F1, data = dat)
mf1
#> log(y1) x1 x2 I(x1^2)
#> 1 -0.1984509 0.09 0.22 0.0081
#> 2 -0.3566749 0.26 0.46 0.0676
#> 3 -0.4307829 0.03 0.37 9e-04
## extract single response
model.part(F1, data = mf1, lhs = 1, drop = TRUE)
#> 1 2 3
#> -0.1984509 -0.3566749 -0.4307829
model.response(mf1)
#> 1 2 3
#> -0.1984509 -0.3566749 -0.4307829
## model.response() works as usual
## extract model matrices
model.matrix(F1, data = mf1, rhs = 1)
#> (Intercept) x1 x2
#> 1 1 0.09 0.22
#> 2 1 0.26 0.46
#> 3 1 0.03 0.37
#> attr(,"assign")
#> [1] 0 1 2
model.matrix(F1, data = mf1, rhs = 2)
#> (Intercept) I(x1^2)
#> 1 1 0.0081
#> 2 1 0.0676
#> 3 1 0.0009
#> attr(,"assign")
#> [1] 0 1
#########################################
## multiple responses and multiple RHS ##
#########################################
## set up Formula
F2 <- Formula(y1 + y2 | log(y3) ~ x1 + I(x2^2) | 0 + log(x1) | x3 / x4)
length(F2)
#> [1] 2 3
## set up full model frame
mf2 <- model.frame(F2, data = dat)
mf2
#> y1 y2 log(y3) x1 I(x2^2) log(x1) x3 x4
#> 2 0.70 b -1.771957 0.26 0.2116 -1.347074 a a
#> 3 0.65 a -1.272966 0.03 0.1369 -3.506558 b b
## extract responses
model.part(F2, data = mf2, lhs = 1)
#> y1 y2
#> 2 0.70 b
#> 3 0.65 a
model.part(F2, data = mf2, lhs = 2)
#> log(y3)
#> 2 -1.771957
#> 3 -1.272966
## model.response(mf2) does not give correct results!
## extract model matrices
model.matrix(F2, data = mf2, rhs = 1)
#> (Intercept) x1 I(x2^2)
#> 2 1 0.26 0.2116
#> 3 1 0.03 0.1369
#> attr(,"assign")
#> [1] 0 1 2
model.matrix(F2, data = mf2, rhs = 2)
#> log(x1)
#> 2 -1.347074
#> 3 -3.506558
#> attr(,"assign")
#> [1] 1
model.matrix(F2, data = mf2, rhs = 3)
#> (Intercept) x3b x3a:x4b x3b:x4b
#> 2 1 0 0 0
#> 3 1 1 0 1
#> attr(,"assign")
#> [1] 0 1 2 2
#> attr(,"contrasts")
#> attr(,"contrasts")$x3
#> [1] "contr.treatment"
#>
#> attr(,"contrasts")$x4
#> [1] "contr.treatment"
#>
#######################
## Formulas with '.' ##
#######################
## set up Formula with a single '.'
F3 <- Formula(y1 | y2 ~ .)
mf3 <- model.frame(F3, data = dat)
## without y1 or y2
model.matrix(F3, data = mf3)
#> (Intercept) y3 x1 x2 x3b x4b
#> 2 1 0.17 0.26 0.46 0 0
#> 3 1 0.28 0.03 0.37 1 1
#> attr(,"assign")
#> [1] 0 1 2 3 4 5
#> attr(,"contrasts")
#> attr(,"contrasts")$x3
#> [1] "contr.treatment"
#>
#> attr(,"contrasts")$x4
#> [1] "contr.treatment"
#>
## without y1 but with y2
model.matrix(F3, data = mf3, lhs = 1)
#> (Intercept) y2b y3 x1 x2 x3b x4b
#> 2 1 1 0.17 0.26 0.46 0 0
#> 3 1 0 0.28 0.03 0.37 1 1
#> attr(,"assign")
#> [1] 0 1 2 3 4 5 6
#> attr(,"contrasts")
#> attr(,"contrasts")$y2
#> [1] "contr.treatment"
#>
#> attr(,"contrasts")$x3
#> [1] "contr.treatment"
#>
#> attr(,"contrasts")$x4
#> [1] "contr.treatment"
#>
## without y2 but with y1
model.matrix(F3, data = mf3, lhs = 2)
#> (Intercept) y1 y3 x1 x2 x3b x4b
#> 2 1 0.70 0.17 0.26 0.46 0 0
#> 3 1 0.65 0.28 0.03 0.37 1 1
#> attr(,"assign")
#> [1] 0 1 2 3 4 5 6
#> attr(,"contrasts")
#> attr(,"contrasts")$x3
#> [1] "contr.treatment"
#>
#> attr(,"contrasts")$x4
#> [1] "contr.treatment"
#>
## set up Formula with multiple '.'
F3 <- Formula(y1 | y2 | log(y3) ~ . - x3 - x4 | .)
## process both '.' separately (default)
mf3 <- model.frame(F3, data = dat, dot = "separate")
## only x1-x2
model.part(F3, data = mf3, rhs = 1)
#> x1 x2
#> 2 0.26 0.46
#> 3 0.03 0.37
## all x1-x4
model.part(F3, data = mf3, rhs = 2)
#> x1 x2 x3 x4
#> 2 0.26 0.46 a a
#> 3 0.03 0.37 b b
## process the '.' sequentially, i.e., the second RHS conditional on the first
mf3 <- model.frame(F3, data = dat, dot = "sequential")
## only x1-x2
model.part(F3, data = mf3, rhs = 1)
#> x1 x2
#> 2 0.26 0.46
#> 3 0.03 0.37
## only x3-x4
model.part(F3, data = mf3, rhs = 2)
#> x3 x4
#> 2 a a
#> 3 b b
## process the second '.' using the previous RHS element
mf3 <- model.frame(F3, data = dat, dot = "previous")
## only x1-x2
model.part(F3, data = mf3, rhs = 1)
#> x1 x2
#> 2 0.26 0.46
#> 3 0.03 0.37
## x1-x2 again
model.part(F3, data = mf3, rhs = 2)
#> x1 x2
#> 2 0.26 0.46
#> 3 0.03 0.37
##############################
## Process multiple offsets ##
##############################
## set up Formula
F4 <- Formula(y1 ~ x3 + offset(x1) | x4 + offset(log(x2)))
mf4 <- model.frame(F4, data = dat)
## model.part can be applied as above and includes offset!
model.part(F4, data = mf4, rhs = 1)
#> x3 offset(x1)
#> 1 b 0.09
#> 2 a 0.26
#> 3 b 0.03
## additionally, the corresponding terms can be included
model.part(F4, data = mf4, rhs = 1, terms = TRUE)
#> x3 offset(x1)
#> 1 b 0.09
#> 2 a 0.26
#> 3 b 0.03
## hence model.offset() can be applied to extract offsets
model.offset(model.part(F4, data = mf4, rhs = 1, terms = TRUE))
#> [1] 0.09 0.26 0.03
model.offset(model.part(F4, data = mf4, rhs = 2, terms = TRUE))
#> [1] -1.5141277 -0.7765288 -0.9942523