At the onset, this was strictly an excercise of my own curiosity and I didn't imagine writing this down in any form at all. As someone who has done some modelling work in the past, I'm embarrassed to say that I had never fully grasped how one can gauge the error of a model output without having to do some sort of Monte Carlo simulation whereby the model parameters are repeatedly randomized within a given confidence interval. Its relatively easy to imagine that a model containing many parameters, each with an associated error, will tend to propagate these errors throughout. Without getting to far over my head here, I will just say that there are defined methods for calculating the error of a variable if one knows the underlying error of the functions that define them (and I have tried out only a very simple one here!).
In the example below, I have three main variables (x, y, and z) and two functions that define the relationships y~x and z~y. The question is, given these functions, what would be the error of a predicted z value given an initial x value? The most general rule seems to be:
error(z~x)^2 = error(y~x)^2 + error(z~y)^2
However, correlated errors require additional terms (see Wikipedia: Propagation of uncertainty). The following example does just that by simulating correlated error terms using the MASS package's function mvrnorm().
In the example below, I have three main variables (x, y, and z) and two functions that define the relationships y~x and z~y. The question is, given these functions, what would be the error of a predicted z value given an initial x value? The most general rule seems to be:
error(z~x)^2 = error(y~x)^2 + error(z~y)^2
However, correlated errors require additional terms (see Wikipedia: Propagation of uncertainty). The following example does just that by simulating correlated error terms using the MASS package's function mvrnorm().
example:
library(MASS) #package required for generating correlated random numbers set.seed(1111) #set the seed of the random number generator n=10000 #number of values to generate for the variable x e1.sd <- 3 #the standard deviation of the error term in the relationship y~a+b*x+e1 e2.sd <- 10 #the standard deviation of the error term in the relationship z~c+d*y+e2 e.cor <- matrix(c(1,0.3,0.3,1),2) #the correlation matrix of the two error terms (i.e. in this example they are correlated at r=0.3) e.cov <- e.cor*as.matrix(c(e1.sd,e2.sd))%*%t(as.matrix(c(e1.sd,e2.sd))) # the covariance matrix of the error terms e <- mvrnorm(n,c(0,0),e.cov) #generate the error terms givin the defined correlation e1 <- e[,1] #error term #1 e2 <- e[,2] #error term #2 #cor(e) #shows that we are very close to what we input into the mvrnorm function #cov(e) x <- runif(n, min=0, max=20) #randomly generated x values #the first model defining y~x a <- 10 b <- 2 y <- a + b*x + e1 #the second model defining z~y c <- 20 d <- 12 z <- c + d*y + e2 #implied relationship defining z~x z.plus.e <- c + d*a + d*b*x + d*e1 + e2 #The two error terms are "d*e1" and "e2" #relationship without error (i.e. the z-values that would be calulated with the linear models) z.minus.e <- c + d*a + d*b*x #Comparison of the derived z values as calculated by the models versus the actual z values fit.z <- lm(z ~ z.minus.e - 1) summary(fit.z) plot(z ~ z.minus.e, pch=".", cex=4, col=rgb(0,0,1,0.2)) abline(0,1,col=rgb(0.5,0.5,0.5,0.5), lwd=10) abline(fit.z,col=2, lwd=2) #Comparison of modelled z error versus that calculated from the two introduced error terms sd(resid(fit.z)) sqrt(sd(d*e1)^2 + sd(e2)^2 + 2*cov(d*e1,e2))
Have you tried the 'propagate' function in my qpcR package. It does Monte Carlo (also using mvrnorm and covariance structire), classical first-order Taylor expansion and permutation with ties. If you try it, please make some comments on how to improve it.
ReplyDeleteGreets,
Andrej
Hi Marc. This is not a comment on your post. However, I have a question regarding how to generate correlated outcomes between groups. If I have 3 groups (Control, Treatment1 and Treatment2) and each has an outcome (Y), but the outcomes between groups are correlated by .80. I was wondering how you generate these data. Thanks!
ReplyDeleteHi - you would need to have a matrix (e.g. like e.cor), where the diagonal is 1 and all other values are 0.8. Since you don't mention covariance, then this correlation matrix could go directly into the mvrnorm function:
Deletelibrary(MASS)
n <- 10000
e.cor <- matrix(0.8, 3, 3)
diag(e.cor) <- 1
e <- mvrnorm(n, c(0,0,0), e.cor)
cor(e) # see results