Maximum Correlation Analysis (MCA) is similar to Empirical Orthogonal Function Analysis (EOF) in that they both deal with the decomposition of a covariance matrix. In EOF, this is a covariance matrix based on a single spatio-temporal field, while MCA is based on the decomposition of a "cross-covariance" matrix derived from two fields.
In the computation of cross-covariance matrix using R, the fields need not have the same number of columns (e.g. spatial points), although the row dimension (e.g. time) must be equivalent. Since the resulting matrix is not necessarily square, a singular value decomposition (SVD) is appropriate, and in fact some authors refer to MCA as the "SVD" method (Björnsson and Venegas, 1997).
The above figure shows the first MCA mode calculated between the monthly anomaly fields of Sea Surface Pressure (SLP) and Sea Surface Temperature (SST) for the region of the equatorial Pacific. Mode 1 explains a large portion of the squared covariance (94%) and shows the strong large scale coupling between SLP and SST anomalies (e.g. areas with positive MCA values in SLP appear to coincide with negative SST). This is what we would expect of the El Niño Southern Oscillation (ENSO) - High SLP anomalies in the western Pacific is typical of the warm El Niño phase, which results in warmer SST anomalies throughout the equatorial Pacific. The opposite La Niña phase results from low SLP in the western Pacific.
In order to reproduce the analysis you will need several functions that I have posted earlier (val2col, lon.lat.filter, image.scale, eof.mca, cov4gappy, color.palette) plus an additional function for the calculation of anomalies ("anomaly"; below)
Following my post of EOF, there was a comment on which statistic is correct to present with each EOF mode: "explained variance" or "squared covariance fraction". It seems that explained variance is more often reported along with EOF and squared covariance fraction with MCA, although neither is a measure of significance in and of itself. For a nice review of this topic, those interested should read the tutorial on EOF by Norris. Additional information on the testing of significance can be found in the book by von Storch and Zwiers (1999).
In the computation of cross-covariance matrix using R, the fields need not have the same number of columns (e.g. spatial points), although the row dimension (e.g. time) must be equivalent. Since the resulting matrix is not necessarily square, a singular value decomposition (SVD) is appropriate, and in fact some authors refer to MCA as the "SVD" method (Björnsson and Venegas, 1997).
The above figure shows the first MCA mode calculated between the monthly anomaly fields of Sea Surface Pressure (SLP) and Sea Surface Temperature (SST) for the region of the equatorial Pacific. Mode 1 explains a large portion of the squared covariance (94%) and shows the strong large scale coupling between SLP and SST anomalies (e.g. areas with positive MCA values in SLP appear to coincide with negative SST). This is what we would expect of the El Niño Southern Oscillation (ENSO) - High SLP anomalies in the western Pacific is typical of the warm El Niño phase, which results in warmer SST anomalies throughout the equatorial Pacific. The opposite La Niña phase results from low SLP in the western Pacific.
In order to reproduce the analysis you will need several functions that I have posted earlier (val2col, lon.lat.filter, image.scale, eof.mca, cov4gappy, color.palette) plus an additional function for the calculation of anomalies ("anomaly"; below)
Following my post of EOF, there was a comment on which statistic is correct to present with each EOF mode: "explained variance" or "squared covariance fraction". It seems that explained variance is more often reported along with EOF and squared covariance fraction with MCA, although neither is a measure of significance in and of itself. For a nice review of this topic, those interested should read the tutorial on EOF by Norris. Additional information on the testing of significance can be found in the book by von Storch and Zwiers (1999).
example to produce the figure...
#Hadley SLP monthly mean dataset #http://www.esrl.noaa.gov/psd/gcos_wgsp/Gridded/data.hadslp2.html #ftp://ftp.cdc.noaa.gov/Datasets.other/hadslp/slp.mnmean.nc #Kaplan SST monthly anomaly dataset #http://www.esrl.noaa.gov/psd/thredds/catalog/Datasets/kaplan_sst/catalog.html #http://www.esrl.noaa.gov/psd/thredds/fileServer/Datasets/kaplan_sst/sst.mon.anom.nc ###Required functions and libraries source("val2col.R") source("lon.lat.filter.R") source("image.scale.R") source("eof.mca.R") source("cov4gappy.R") source("color.palette.R") source("anomaly.R") library(maps) library(mapproj) library(ncdf) ###load datasets #slp nc <- open.ncdf("slp.mnmean.nc") # opens nc file slp.lon <- get.var.ncdf( nc, "lon") slp.lat <- get.var.ncdf( nc, "lat") slp.t <- get.var.ncdf( nc, "time") slp.raw <- get.var.ncdf(nc, "slp") close.ncdf(nc) slp.t <- as.Date(slp.t, origin="1800-01-01") temp <- which(slp.lon>180) slp.lon[temp] <- slp.lon[temp]-360 slp.grd <- expand.grid(slp.lon, slp.lat) colnames(slp.grd) <- c("lon", "lat") slp <- matrix(c(slp.raw), nrow=length(slp.t), ncol<-length(slp.lon)*length(slp.lat), byrow=TRUE) row.names(slp) <- as.character(slp.t) dim(slp) slp <- anomaly(slp, as.POSIXlt(slp.t), level="monthly") #calculates the monthly anomaly #sst nc <- open.ncdf("sst.mon.anom.nc") # opens nc file sst.lon <- get.var.ncdf( nc, "lon") sst.lat <- get.var.ncdf( nc, "lat") sst.t <- get.var.ncdf( nc, "time") sst.raw <- get.var.ncdf(nc, "sst") close.ncdf(nc) sst.t <- as.Date(sst.t, origin="1800-01-01") temp <- which(sst.lon>180) sst.lon[temp] <- sst.lon[temp]-360 sst.grd <- expand.grid(sst.lon, sst.lat) colnames(sst.grd) <- c("lon", "lat") sst<- matrix(c(sst.raw), nrow=length(sst.t), ncol<-length(sst.lon)*length(sst.lat), byrow=TRUE) row.names(sst) <- as.character(sst.t) dim(sst) ###Create polygons for spatial grids of datasets #slp spacing=5 # degrees slp.poly <- vector(mode="list", dim(slp.grd)[1]) for(i in seq(slp.poly)){ x=c(slp.grd[i,1]-spacing/2, slp.grd[i,1]+spacing/2, slp.grd[i,1]+spacing/2, slp.grd[i,1]-spacing/2) y=c(slp.grd[i,2]-spacing/2, slp.grd[i,2]-spacing/2, slp.grd[i,2]+spacing/2, slp.grd[i,2]+spacing/2) slp.poly[[i]] <- data.frame(x=x, y=y) } #sst spacing=5 # degrees sst.poly <- vector(mode="list", dim(sst.grd)[1]) for(i in seq(sst.poly)){ x=c(sst.grd[i,1]-spacing/2, sst.grd[i,1]+spacing/2, sst.grd[i,1]+spacing/2, sst.grd[i,1]-spacing/2) y=c(sst.grd[i,2]-spacing/2, sst.grd[i,2]-spacing/2, sst.grd[i,2]+spacing/2, sst.grd[i,2]+spacing/2) sst.poly[[i]] <- data.frame(x=x, y=y) } ###Select space and time to include #time selection range(slp.t) range(sst.t) t.min <- as.Date("1950-01-01") t.max <- as.Date("1999-12-31") slp.t.incl <- which(slp.t > t.min & slp.t < t.max) sst.t.incl <- which(sst.t > t.min & sst.t < t.max) #space selection lon.lim <- c(-180, -70) lat.lim <- c(-30, 30) slp.grd.incl <- lon.lat.filter(slp.grd[,1], slp.grd[,2], lon.lim[1], lon.lim[2], lat.lim[2], lat.lim[1]) sst.grd.incl <- lon.lat.filter(sst.grd[,1], sst.grd[,2], lon.lim[1], lon.lim[2], lat.lim[2], lat.lim[1]) ###MCA mca <- eof.mca(slp[slp.t.incl, slp.grd.incl], F2=sst[sst.t.incl, sst.grd.incl], centered=TRUE, scaled=TRUE) ###FIGURE #settings MODE=1 zran <- range(mca$u[,MODE], mca$v[,MODE]) zlim <- c(-max(abs(zran)), max(abs(zran))) heights=c(4,2) widths=c(4,4,0.5) pal=color.palette(c("red", "yellow", "white", "cyan", "blue"), c(10,1,1,10)) ncol=100 res=200 colorvalues1 <- val2col(mca$u[,MODE], zlim, col=pal(ncol)) #color levels for the polygons colorvalues2 <- val2col(mca$v[,MODE], zlim, col=pal(ncol)) #color levels for the polygons #mapproj settings project="fisheye" orientation=c(mean(lat.lim), mean(lon.lim), 0) PAR=1 #figure png(filename=paste("map_mca_mode", MODE, ".png", sep=""), width = sum(widths), height = sum(heights), units="in", res=res) par(omi=c(0.5, 0.5, 0.5, 0.5), ps=12) layout(matrix(c(1,2,3,4,4,4),nrow=2,ncol=3,byrow=TRUE), widths = widths, heights = heights, respect=TRUE) layout.show(4) #plot of map1 par(mai=c(0.1, 0.1, 0.1, 0.1)) map("world",project=project, orientation=orientation, par=PAR, ylim=lat.lim, xlim=lon.lim) for(i in seq(slp.grd.incl)){ polygon(mapproject(x=slp.poly[[slp.grd.incl[i]]][,1], y=slp.poly[[slp.grd.incl[i]]][,2]), col=colorvalues1[i], border=colorvalues1[i], lwd=0.3) } map("world",project=project, orientation=orientation, par=PAR, fill=FALSE, add=TRUE, col="black") map.grid(c(-180, 180, -90, 90), nx=36, ny=18, labels=FALSE, col="grey", lwd=1) box() mtext(paste("SLP Monthly Anomaly MCA Mode", MODE), side=3, line=1, col=3) #plot of map2 par(mai=c(0.1, 0.1, 0.1, 0.1)) map("world",project=project, orientation=orientation, par=PAR, ylim=lat.lim, xlim=lon.lim, xaxs="i", yaxs="i") for(i in seq(sst.grd.incl)){ polygon(mapproject(x=sst.poly[[sst.grd.incl[i]]][,1], y=sst.poly[[sst.grd.incl[i]]][,2]), col=colorvalues2[i], border=colorvalues2[i], lwd=0.3) } map("world",project=project, orientation=orientation, par=PAR, fill=FALSE, add=TRUE, col="black") map.grid(c(-180, 180, -90, 90), nx=36, ny=18, labels=FALSE, col="grey", lwd=1) box() mtext(paste("SST Monthly Anomaly MCA Mode", MODE), side=3, line=1, col=4) #add scale par(mai=c(0.1, 0.1, 0.1, 0.1)) image.scale(1, zlim, col=pal(ncol), horiz=FALSE, yaxt="n") axis(4) box() #add ts YLIM <- range(c(mca$A[,MODE]/sqrt(mca$Lambda[MODE]), mca$B[,MODE]/sqrt(mca$Lambda[MODE]))) par(mai=c(0.1, 0.1, 0.5, 0.1)) plot(slp.t[slp.t.incl], mca$A[,MODE]/sqrt(mca$Lambda[MODE]), t="l", col=3, ylim=YLIM, ylab="", xlab="") lines(sst.t[sst.t.incl], mca$B[,MODE]/sqrt(mca$Lambda[MODE]),col=4) abline(h=0, col=8, lty=2) box() mtext(paste("MCA Mode", MODE, "Coefficients; SCF =", round(mca$sq_cov_frac[MODE],2)), side=3, line=1) dev.off()
the anomaly function...
anomaly<-function(y, x, level="daily"){ #y is a vector or matrix of measurements #x is a time series for the vector measurements in POSIXlt format #level is "daily" or "monthly" y <- as.matrix(y) if(level=="monthly"){levs=unique(x$mon)} if(level=="daily"){levs=unique(x$yday)} levs_lookup=vector("list", length(levs)) names(levs_lookup)<-levs for(i in 1:length(levs)){ if(level=="monthly"){levs_lookup[[i]]<-which(x$mon == names(levs_lookup[i]))} if(level=="daily"){levs_lookup[[i]]<-which(x$yday == names(levs_lookup[i]))} } for(j in 1:length(levs)){ #for every time level y[levs_lookup[[j]],] <- t(t(as.matrix(y[levs_lookup[[j]],])) - apply(as.matrix(y[levs_lookup[[j]],]), 2, mean, na.rm=TRUE)) } y }
References
Björnsson, H. and Venegas, S.A. (1997). "A manual for EOF and SVD analyses of climate data", McGill University, CCGCR Report No. 97-1, Montréal, Québec, 52pp.
Norris, J. EOF Tutorial. homepage.
von Storch, H, Zwiers, F.W. (1999). Statistical analysis in climate research. Cambridge University Press.
hi, very useful your post anyway I would like to ask you what kind of assumptions must be made when you try to use MCA, is it a non-parametric method? thank you in advance
ReplyDeleteError in sst[sst.t.incl, sst.grd.incl] : subscript out of bounds
ReplyDeletekeep getting this error message at the MCA section (around line 86 when ran)
Please help anyone??
Error in sst[sst.t.incl, sst.grd.incl] : subscript out of bounds
ReplyDeletekeep getting this error message in the MCA section of the code. Please help anyone??
Hi,
ReplyDeleteHow to cite your works?
Thank you,
This comment has been removed by the author.
DeleteGenerally, you could now cite the sinkr package (https://github.com/marchtaylor/sinkr), which contains most of the functions that I have written on this blog. Unfortunately, MCA is not yet included, so I suppose you could just cite the webpage itself if that is the function you are using.
DeleteThanks for your nice Maximum Covariance Analysis (MCA) program. I am running your example program, but I get some error message. I am new to R program.
ReplyDeleteThis is the error.
Error in cov4gappy(F1[, F1_cols_incl], F2 = F2[, F2_cols_incl]) :
object 'cov_mat' not found