How to obtain regression coefficients and model adjustments using a correlation or covariance matrix instead of the database using R?

advertisements

I want to be able to regression coefficients from multiple linear regression by supplying a correlation or covariance matrix instead of a data.frame. I realise you lose some information relevant to determining the intercept and so on, but it should even the correlation matrix should be sufficient for getting standardised coefficients and estimates of variance explained.

So for example, if you had the following data

# get some data
library(MASS)
data("Cars93")
x <- Cars93[,c("EngineSize", "Horsepower", "RPM")]

You could run a regression as follows:

lm(EngineSize ~ Horsepower + RPM, x)

but what if instead of having data you had the correlation matrix or the covariance matrix:

corx <- cor(x)
covx <- cov(x)

  • What function in R allows you to run a regression based on the correlation or covariance matrix? Ideally it should be similar to lm so that you can easily obtain things like r-squared, adjusted r-squared, predicted values and so on. Presumably, for some of these things, you would need to also provide the sample size and possibly a vector of means. But that would also be fine.

I.e., something like:

lm(EngineSize ~ Horsepower + RPM, cov = covx) # obviously this doesn't work

Note that this answer on Stats.SE provides a theoretical explanation for why it's possible, and provides an example of some custom R code for calculating coefficients?


Using lavaan you could do the following:

library(MASS)
data("Cars93")
x <- Cars93[,c("EngineSize", "Horsepower", "RPM")]

lav.input<- cov(x)
lav.mean <- colMeans(x)

library(lavaan)
m1 <- 'EngineSize ~ Horsepower+RPM'
fit <- sem(m1, sample.cov = lav.input,sample.nobs = nrow(x), meanstructure = TRUE, sample.mean = lav.mean)
summary(fit, standardize=TRUE)

Results are:

Regressions:
                   Estimate    Std.Err  Z-value  P(>|z|)   Std.lv    Std.all
  EngineSize ~
    Horsepower          0.015    0.001   19.889    0.000      0.015    0.753
    RPM                -0.001    0.000  -15.197    0.000     -0.001   -0.576

Intercepts:
                  Estimate    Std.Err  Z-value  P(>|z|)   Std.lv    Std.all
   EngineSize          5.805    0.362   16.022    0.000      5.805    5.627

Variances:
                  Estimate    Std.Err  Z-value  P(>|z|)   Std.lv    Std.all
    EngineSize          0.142    0.021    6.819    0.000      0.142    0.133