First CCA pattern of Sea Level Pressure (SLP) and Sea Surface Temperature (SST) monthly anomalies for the region between -180 °W to -70 °W and +30 °N to -30 °S. |
This particular method was illustrated by Barnett and Preisendorfer (1997) - it constructs a CCA model based on a truncated subset of EOF coefficients (i.e. "principle components") instead of using the original field (as with MCA). This truncation has several benefits for the fitting of the model - First, one reduces the amount of noise in the problem by eliminating the higher EOF modes, which represent poorly organized, small-scale features of the fields. Second, by using orthogonal functions, the algebra of the problem is simplified (see von Storch and Zweiers 1999 for details). Bretherton etal. (1992) reviewed several techniques for diagnosing coupled patterns and found the Barnett and Preisendorfer method (hereafter "BPCCA") and MCA to be the most robust.
I have used the same data fields (SLP and SST for the equatorial Pacific) as for my prior post on MCA. In the first step of BPCCA, the SLP monthly anomaly field (predictor, 'x') was truncated to 3 EOFs, explaining 73% of the field's variance, and the SST monthly anomaly field (predictant, 'y') was truncated to 4 EOFs, explaining 70% of the field's variance.
For the purpose of the example, I have performed an EOF on the fields for the period of 1950-2004, although only EOF coefficients from 1950-1990 will be included in the BPCCA model. The remaining part of the signal will be used for the demonstration of field prediction. The BPCCA model is then calculated using the "bp.cca" function below, which is based on the excellent recipe for CCA within the book by Wilks (2006). Another helpful resource was the book of von Storch and Zweiers (1999).
The CCA model produces 4 hierarchical variates of decreasing correlation as revealed by the canonical correlation coefficents (rho):
> bpcca$rho
[1] 0.7232763 0.3749655 0.2005449 0.1071221
Using the p.asym function from the CCP package, the first three canonical correlation coefficients are deemed significant at the p < 0.05 level.
Pillai-Bartlett Trace, using F-approximation:
stat approx df1 df2 p.value
1 to 4: 0.71542120 20.648593 20 1896 0.000000e+00
2 to 4: 0.19229257 8.012806 12 1904 8.104628e-15
3 to 4: 0.05169341 4.172160 6 1912 3.568567e-04
4 to 4: 0.01147515 2.761959 2 1920 6.341890e-02
The resulting CCA variates can also be transformed into the original spatial coordinates to produce a map as in the figure at the top. You can see that the 1st CCA pattern largely resembles the 1st MCA mode.
Based on the CCA model, one can predict values for the 4 SST EOF coefficients based on the SLP EOF coefficients (plot below). Furthermore, the model can predict the values after 1990 even though they were not included in the original model. Only the 1st SST EOF coefficient is reproduced with any degree of success, but this is substantial considering the that this leading pattern was responsible for 49% of the SST field's variance.
Finally, given these predicted EOF coefficeients, one can reconstruct the entire field using the spatial EOF patterns. The figure below shows the correlation of this predicted SST field to that of the original. Another often used measure of the predicted field is to calculate a skill score that relates the sum of squared differences of the predicted field (e.g. "SSpred") to that of a reference field (e.g. "SSref") that has been reconstructed from the same truncated EOFs used in the BPCCA model. For example, SSpred / SSref * 100 would calculate the percent increase in sum of squared differences over that of the reference field.
In order to reproduce the analysis shown here, you will need the bp.cca function below. This function is written to work with the output objects of the eof.mca function. Other functions found within this blog are needed to reproduce the example. (color.palette, anomaly, val2col, lon.lat.filter, image.scale, exp.mat, cov4gappy, color.palette) .
References:
- Barnett, T.P., Preisendorfer, R., 1987. Origins and levels of monthly and seasonal forecast skill for united-states surface air temperatures determined by canonical correlation-analysis. Monthly Weather Review 115 (9), 1825-1850.
- Bretherton, C.S., Smith, C., Wallace, J.M., 1992. An intercomparison of methods for finding coupled patterns in climate data. Journal of Climate 5 (6), 541-560.
- Storch, H.V., Zwiers, F.W., 1999. Statistical analysis in climate research. Cambridge University Press, Cambridge.
- Wilks, D.S., 2006. Statistical methods in the atmospheric sciences, 2nd ed. Academic Press, Amsterdam.
the bpcca function:
bp.cca <- function(eofx,eofy, n_pcx_incl=eofx$n_sig, n_pcy_incl=eofy$n_sig, rowx_incl=1:length(eofx$A[,1]), rowy_incl=1:length(eofy$A[,1]), y_lag=0){ require(CCP) #since these are PCs no centering is needed x <- eofx$A[rowx_incl,1:n_pcx_incl] %*% exp.mat(diag(eofx$Lambda[1:n_pcx_incl]), -0.5) y <- eofy$A[rowy_incl,1:n_pcy_incl] %*% exp.mat(diag(eofy$Lambda[1:n_pcy_incl]), -0.5) if(sign(y_lag) != 0){ if(sign(y_lag)==1){ x <- x[1:(length(x[,1])-y_lag),] y <- y[(1+y_lag):length(y[,1]),] } if(sign(y_lag)==-1){ x <- x[(1-y_lag):length(x[,1]),] y <- y[1:(length(y[,1])+y_lag),] } } xdim <- dim(x) ydim <- dim(y) mindim=min(xdim[2], ydim[2]) if(xdim[1]!=ydim[1]) stop("unequal number of rows in x and y") xcols <- 1:xdim[2] ycols <- (xdim[2]+1):(xdim[2]+ydim[2]) Sxx <- cov4gappy(x) Sxy <- cov4gappy(x,F2=y) Syy <- cov4gappy(y) ############################################ #Canonical vectors ("patterns" or loadings)# ############################################ if(xdim[2] >= ydim[2]){ svd1 <- svd(exp.mat(Sxx, -0.5) %*% Sxy %*% exp.mat(Syy, -0.5)) A <- exp.mat(Sxx, -0.5) %*% svd1$u B <- exp.mat(Syy, -0.5) %*% svd1$v } else { svd1 <- svd(exp.mat(Syy, -0.5) %*% t(Sxy) %*% exp.mat(Sxx, -0.5)) A <- exp.mat(Sxx, -0.5) %*% svd1$v B <- exp.mat(Syy, -0.5) %*% svd1$u } #################### #Canonical variates# #################### V <- x %*% A W <- y %*% B ######################## #Canonical correlations# ######################## rho=NA*1:mindim for(i in 1:mindim){ rho[i] <- cor(as.vector(W[,i]), as.vector(V[,i]), use="pairwise") } #A test of CCA significance cca.sig <- p.asym(rho, xdim[1], xdim[2], ydim[2], tstat = "Pillai") #out list( rho=rho, stat=cca.sig$stat, df1=cca.sig$df1, df2=cca.sig$df2, approx=cca.sig$approx,p.value=cca.sig$p.value, mindim=mindim, n_pcx_incl=n_pcx_incl, n_pcy_incl=n_pcy_incl, rowx_incl=rowx_incl, rowy_incl=rowy_incl, A=A, B=B, V=V, W=W, x_ts=rownames(x), y_ts=rownames(y), y_lag=y_lag, x_dim=eofx$F1_dim, y_dim=eofy$F1_dim, x_cols_incl=eofx$F1_cols_incl, y_cols_incl=eofy$F1_cols_incl, x_center=eofx$F1_center, x_scale=eofx$F1_scale, y_center=eofy$F1_center, y_scale=eofy$F1_scale ) }
the code to produce the examples above:
###Load libraries library(maps) library(mapproj) library(ncdf) library(CCP) ###Required functions (from "www.menugget.blogspot.com") source("val2col.R") source("lon.lat.filter.R") source("image.scale.R") source("eof.mca.R") source("cov4gappy.R") source("color.palette.R") source("anomaly.R", sep="")) source("bp.cca.R") source("exp.mat.R") ###Load data #Hadley SLP monthly mean dataset #http://www.esrl.noaa.gov/psd/gcos_wgsp/Gridded/data.hadslp2.html #ftp://ftp.cdc.noaa.gov/Datasets.other/hadslp/slp.mnmean.nc #Kaplan SST monthly anomaly dataset #http://www.esrl.noaa.gov/psd/thredds/catalog/Datasets/kaplan_sst/catalog.html #http://www.esrl.noaa.gov/psd/thredds/fileServer/Datasets/kaplan_sst/sst.mon.anom.nc #slp nc <- open.ncdf("slp.mnmean.nc") # opens nc file slp.lon <- get.var.ncdf( nc, "lon") slp.lat <- get.var.ncdf( nc, "lat") slp.t <- get.var.ncdf( nc, "time") slp.raw <- get.var.ncdf(nc, "slp") close.ncdf(nc) slp.t <- as.Date(slp.t, origin="1800-01-01") temp <- which(slp.lon>180) slp.lon[temp] <- slp.lon[temp]-360 slp.grd <- expand.grid(slp.lon, slp.lat) colnames(slp.grd) <- c("lon", "lat") slp <- matrix(c(slp.raw), nrow=length(slp.t), ncol<-length(slp.lon)*length(slp.lat), byrow=TRUE) row.names(slp) <- as.character(slp.t) dim(slp) slp <- anomaly(slp, as.POSIXlt(slp.t), level="monthly") #sst nc <- open.ncdf("sst.mon.anom.nc") # opens nc file sst.lon <- get.var.ncdf( nc, "lon") sst.lat <- get.var.ncdf( nc, "lat") sst.t <- get.var.ncdf( nc, "time") sst.raw <- get.var.ncdf(nc, "sst") close.ncdf(nc) sst.t <- as.Date(sst.t, origin="1800-01-01") temp <- which(sst.lon>180) sst.lon[temp] <- sst.lon[temp]-360 sst.grd <- expand.grid(sst.lon, sst.lat) colnames(sst.grd) <- c("lon", "lat") sst<- matrix(c(sst.raw), nrow=length(sst.t), ncol<-length(sst.lon)*length(sst.lat), byrow=TRUE) row.names(sst) <- as.character(sst.t) dim(sst) ###Make polygons for the data grids #SLP spacing=5 # degrees slp.poly <- vector(mode="list", dim(slp.grd)[1]) for(i in seq(slp.poly)){ x=c(slp.grd[i,1]-spacing/2, slp.grd[i,1]+spacing/2, slp.grd[i,1]+spacing/2, slp.grd[i,1]-spacing/2) y=c(slp.grd[i,2]-spacing/2, slp.grd[i,2]-spacing/2, slp.grd[i,2]+spacing/2, slp.grd[i,2]+spacing/2) slp.poly[[i]] <- data.frame(x=x, y=y) } #SST spacing=5 # degrees sst.poly <- vector(mode="list", dim(sst.grd)[1]) for(i in seq(sst.poly)){ x=c(sst.grd[i,1]-spacing/2, sst.grd[i,1]+spacing/2, sst.grd[i,1]+spacing/2, sst.grd[i,1]-spacing/2) y=c(sst.grd[i,2]-spacing/2, sst.grd[i,2]-spacing/2, sst.grd[i,2]+spacing/2, sst.grd[i,2]+spacing/2) sst.poly[[i]] <- data.frame(x=x, y=y) } ###Select data for analysis #time selection range(slp.t) range(sst.t) t.min <- as.Date("1950-01-01") t.max1 <- as.Date("1989-12-31") t.max2 <- as.Date("2004-12-31") slp.t.incl <- which(slp.t >= t.min & slp.t <= t.max2) sst.t.incl <- which(sst.t >= t.min & sst.t <= t.max2) #space selection lon.lim <- c(-180, -70) lat.lim <- c(-30, 30) slp.grd.incl <- lon.lat.filter(slp.grd[,1], slp.grd[,2], lon.lim[1], lon.lim[2], lat.lim[2], lat.lim[1]) sst.grd.incl <- lon.lat.filter(sst.grd[,1], sst.grd[,2], lon.lim[1], lon.lim[2], lat.lim[2], lat.lim[1]) ###BPCCA #Calculate EOFs eof.slp <- eof.mca(slp[slp.t.incl, slp.grd.incl], centered=TRUE, nu=20) eof.sst <- eof.mca(sst[sst.t.incl, sst.grd.incl], centered=TRUE, nu=20) dim(eof.slp$A) dim(eof.sst$A) #cumulative variance explained by the "significant" EOFs heights=c(5) widths=c(5,5) res=200 #figure png(filename="cum_expl_var.png", width = sum(widths), height = sum(heights), units="in", res=res) par(mfcol=c(1,2)) plot(cumsum(eof.slp$expl_var[1:20])*100, ylab="cum.expl.var.[%]", xlab="EOF", cex=2, main="Significant EOFs - SLP") grid() points(cumsum(eof.slp$expl_var[1:eof.slp$n_sig])*100, pch=20, col=2, cex=2) text(x=eof.slp$n_sig, y=sum(eof.slp$expl_var[1:eof.slp$n_sig])*100, labels=paste(round(sum(eof.slp$expl_var[1:eof.slp$n_sig])*100),"%"), pos=4) plot(cumsum(eof.sst$expl_var[1:20])*100, ylab="cum.expl.var.[%]", xlab="EOF", cex=2, main="Significant EOFs - SST") grid() points(cumsum(eof.sst$expl_var[1:eof.sst$n_sig])*100, pch=20, col=2, cex=2) text(x=eof.sst$n_sig, y=sum(eof.sst$expl_var[1:eof.sst$n_sig])*100, labels=paste(round(sum(eof.sst$expl_var[1:eof.sst$n_sig])*100),"%"), pos=4) dev.off() #CCA model will be built on a smaller time subset of the PCs (eof$A) eof.slp.match <- which(as.Date(rownames(eof.slp$A)) >= t.min & as.Date(rownames(eof.slp$A)) <= t.max1) eof.sst.match <- which(as.Date(rownames(eof.sst$A)) >= t.min & as.Date(rownames(eof.sst$A)) <= t.max1) #The BPCCA model based on SLP and SST EOFS bpcca <- bp.cca(eof.slp, eof.sst, n_pcx_incl=eof.slp$n_sig, n_pcy_incl=eof.sst$n_sig, rowx_incl=eof.slp.match, rowy_incl=eof.sst.match) ###Map of CCA slp.cca.modes <- eof.slp$u[,1:bpcca$n_pcx_incl] %*% bpcca$A sst.cca.modes <- eof.sst$u[,1:bpcca$n_pcy_incl] %*% bpcca$B MODE=1 zran <- range(slp.cca.modes[,MODE], sst.cca.modes[,MODE]) zlim <- c(-max(abs(zran)), max(abs(zran))) heights=c(3,2) widths=c(4,4,0.5) pal=color.palette(c("red", "yellow", "white", "cyan", "blue"), c(10,1,1,10)) ncol=100 res=200 colorvalues1 <- val2col(slp.cca.modes[,MODE], zlim, col=pal(ncol)) #color levels for the polygons colorvalues2 <- val2col(sst.cca.modes[,MODE], zlim, col=pal(ncol)) #color levels for the polygons #mapproj settings project="fisheye" orientation=c(mean(lat.lim), mean(lon.lim), 0) PAR=1 #figure png(filename=paste("map_cca_mode", MODE, ".png", sep=""), width = sum(widths), height = sum(heights), units="in", res=res) #x11(width = sum(widths), height = sum(heights)) par(omi=c(0.5, 0.5, 0.5, 0.5), ps=12) layout(matrix(c(1,2,3,4,4,4),nrow=2,ncol=3,byrow=TRUE), widths = widths, heights = heights, respect=TRUE) layout.show(4) #plot of map1 par(mai=c(0.1, 0.1, 0.1, 0.1)) map("world",project=project, orientation=orientation, par=PAR, ylim=lat.lim, xlim=lon.lim) for(i in seq(slp.grd.incl)){ polygon(mapproject(x=slp.poly[[slp.grd.incl[i]]][,1], y=slp.poly[[slp.grd.incl[i]]][,2]), col=colorvalues1[i], border=colorvalues1[i], lwd=0.3) } map("world",project=project, orientation=orientation, par=PAR, fill=FALSE, add=TRUE, col="black") map.grid(c(-180, 180, -90, 90), nx=36, ny=18, labels=FALSE, col="grey", lwd=1) box() mtext(paste("SLP Monthly Anomaly CCA", MODE), side=3, line=1, col=3) #plot of map2 par(mai=c(0.1, 0.1, 0.1, 0.1)) map("world",project=project, orientation=orientation, par=PAR, ylim=lat.lim, xlim=lon.lim, xaxs="i", yaxs="i") for(i in seq(sst.grd.incl)){ polygon(mapproject(x=sst.poly[[sst.grd.incl[i]]][,1], y=sst.poly[[sst.grd.incl[i]]][,2]), col=colorvalues2[i], border=colorvalues2[i], lwd=0.3) } map("world",project=project, orientation=orientation, par=PAR, fill=FALSE, add=TRUE, col="black") map.grid(c(-180, 180, -90, 90), nx=36, ny=18, labels=FALSE, col="grey", lwd=1) box() mtext(paste("SST Monthly Anomaly CCA", MODE), side=3, line=1, col=4) #add scale par(mai=c(0.1, 0.1, 0.1, 0.1)) image.scale(1, zlim, col=pal(ncol), horiz=FALSE, yaxt="n") axis(4) box() #add ts YLIM <- range(c(bpcca$V[,MODE], bpcca$W[,MODE])) XLIM <- range(c(as.Date(bpcca$x_ts), as.Date(bpcca$y_ts))) par(mai=c(0.1, 0.1, 0.5, 0.1)) plot(as.Date(bpcca$x_ts), bpcca$V[,MODE], t="n", ylim=YLIM, xlim=XLIM, ylab="", xlab="") abline(h=seq(-4,4,2), col=8, lty=3) abline(v=seq(as.Date("1900-01-01"),as.Date("2100-01-01"),by="5 years"), col=8, lty=3) lines(as.Date(bpcca$x_ts), bpcca$V[,MODE],col=3) lines(as.Date(bpcca$y_ts), bpcca$W[,MODE],col=4) mtext(paste("CCA", MODE, "Variates; R =", round(bpcca$rho[MODE],2)), side=3, line=1) dev.off() ###CCA model prediction of EOF coefficients #use only significant CCAs cca_use=which(bpcca$p.value<0.05) # 1. create extended V using the full PCs of slp (to 2004) V_pred <- as.matrix(eof.slp$A[,1:bpcca$n_pcx_incl]) %*% exp.mat(diag(eof.slp$Lambda[1:bpcca$n_pcx_incl],bpcca$n_pcx_incl), -0.5) %*% as.matrix(bpcca$A) # 2. create predicted W W_pred <- as.matrix(V_pred[,cca_use]) %*% diag(bpcca$rho[cca_use], length(cca_use)) # 3. create predicted PCs PCy_pred <- W_pred %*% as.matrix(t(bpcca$B[,cca_use])) # 4. add units back to predicted PCs PCy_pred_norm <- PCy_pred %*% exp.mat(diag(eof.sst$Lambda[1:bpcca$n_pcy_incl],bpcca$n_pcy_incl), 0.5) #figure of original vs. predicted EOF coefficients LWD=c(1,1,1) LTY=c(1,1,1) COL=c(1,8,rgb(0,1,0,0.7)) png(filename="cca_pred_sst_pcs.png", width = 6, height = 6, units="in", res=400) par(mfcol=c(dim(PCy_pred_norm )[2]+1, 1), omi=c(0.2,0.2,0.2,0.2)) par(mar=c(0.5,0.5,0.5,0.5)) plot(1,1,bty="n", t="n", ylab="", xlab="", yaxt="n", xaxt="n") legend( "center", inset = c(0.01,0.01), legend=c( paste("SST PC - portion", "used in CCA model"), paste("SST PC - portion", "not used in CCA model"), paste("Predicted SST PC") ), col=COL, lty=LTY, lwd=LWD, ) par(mar=c(3,5,0.5,0.5)) for(i in seq(dim(PCy_pred_norm )[2])){ MODE=i XLIM <- range(c(as.Date(rownames(PCy_pred_norm)), as.Date(rownames(eof.sst$A)))) YLIM <- range(c(PCy_pred_norm[,MODE],eof.sst$A[,MODE])) plot(as.Date(rownames(PCy_pred_norm)),PCy_pred_norm[,MODE], t="n", ylim=YLIM, xlim=XLIM, ylab="", xlab="") abline(h=seq(par()$yaxp[1],par()$yaxp[2],,par()$yaxp[3]+1), col=8, lty=3, lwd=0.5) abline(v=seq(as.Date("1900-01-01"),as.Date("2100-01-01"),by="5 years"), col=8, lty=3, lwd=0.5) lines(as.Date(rownames(eof.sst$A)[eof.sst.match]),eof.sst$A[eof.sst.match,MODE], col=COL[1], lty=LTY[1], lwd=LWD[1]) lines(as.Date(rownames(eof.sst$A)[-eof.sst.match]),eof.sst$A[-eof.sst.match,MODE], col=COL[2], lty=LTY[2], lwd=LWD[2]) lines(as.Date(rownames(PCy_pred_norm)),PCy_pred_norm[,MODE], col=COL[3], lty=LTY[3], lwd=LWD[3]) mtext(paste("PC ", MODE, " (", round(eof.sst$expl_var[i]*100), " %)", sep=""), side=2, line=3, cex=0.7) } dev.off() #5. reconstruct field field.pred <- PCy_pred_norm %*% t(eof.sst$u[,1:bpcca$n_pcy_incl]) dim(field.pred) dim(sst[sst.t.incl, sst.grd.incl]) #6. a possible map - correlation between original and predicted COR <- NaN*seq(dim(field.pred)[2]) for(i in seq(dim(field.pred)[2])){ COR[i] <- cor(sst[sst.t.incl, sst.grd.incl[i]], field.pred[,i], use="pairwise") } zlim <- c(-1,1) pal=color.palette(c("red", "yellow", "cyan", "blue")) ncol=20 colorvalues1 <- val2col(COR, zlim, col=pal(ncol)) #color levels for the polygons #device settings heights=c(3) widths=c(4,0.5) res=200 #mapproj settings project="fisheye" orientation=c(mean(lat.lim), mean(lon.lim), 0) PAR=1 #figure png(filename="sst_pred_vs_orig_field.png", width = sum(widths), height = sum(heights), res=res, units="in") #x11(width = sum(widths), height = sum(heights)) par(omi=c(0.1, 0.1, 0.5, 0.5), ps=12) layout(matrix(c(1,2),nrow=1,ncol=2,byrow=TRUE), widths = widths, heights = heights, respect=TRUE) layout.show(2) #plot of map par(mai=c(0.1, 0.1, 0.1, 0.1)) map("world",project=project, orientation=orientation, par=PAR, ylim=lat.lim, xlim=lon.lim) for(i in seq(slp.grd.incl)){ polygon(mapproject(x=sst.poly[[sst.grd.incl[i]]][,1], y=sst.poly[[sst.grd.incl[i]]][,2]), col=colorvalues1[i], border=colorvalues1[i], lwd=0.3) } map("world",project=project, orientation=orientation, par=PAR, fill=FALSE, add=TRUE, col="black") map.grid(c(-180, 180, -90, 90), nx=36, ny=18, labels=FALSE, col="grey", lwd=1) box() mtext("Correlation of original vs. predicted SST fields", side=3, line=1) #add scale par(mai=c(0.1, 0.1, 0.1, 0.1)) image.scale(1, zlim, col=pal(ncol), horiz=FALSE, yaxt="n") axis(4) box() dev.off()
No comments:
Post a Comment