Tuesday, December 13, 2011

Maximum Covariance Analysis (MCA)

Maximum Covariance Analysis (MCA) (Mode 1; scaled) of Sea Level Pressure (SLP) and Sea Surface Temperature (SST) monthly anomalies for the region between -180 °W to -70 °W and +30 °N to -30 °S.  MCA coefficients (scaled) are below. The mode represents 94% of the squared covariance fraction (SCF).

Maximum Correlation Analysis (MCA) is similar to Empirical Orthogonal Function Analysis (EOF) in that they both deal with the decomposition of a covariance matrix. In EOF, this is a covariance matrix based on a single spatio-temporal field, while MCA is based on the decomposition of a "cross-covariance" matrix derived from two fields.

In the computation of cross-covariance matrix using R, the fields need not have the same number of columns (e.g. spatial points), although the row dimension (e.g. time) must be equivalent. Since the resulting matrix is not necessarily square, a singular value decomposition (SVD) is appropriate, and in fact some authors refer to MCA as the "SVD" method (Björnsson and Venegas, 1997).

The above figure shows the first MCA mode calculated between the monthly anomaly fields of Sea Surface Pressure (SLP) and Sea Surface Temperature (SST) for the region of the equatorial Pacific. Mode 1 explains a large portion of the squared covariance (94%) and shows the strong large scale coupling between SLP and SST anomalies (e.g. areas with positive MCA values in SLP appear to coincide with negative SST). This is what we would expect of the El Niño Southern Oscillation (ENSO) - High SLP anomalies in the western Pacific is typical of the warm El Niño phase, which results in warmer SST anomalies throughout the equatorial Pacific. The opposite La Niña phase results from low SLP in the western Pacific.

In order to reproduce the analysis you will need several functions that I have posted earlier (val2col, lon.lat.filter, image.scale, eof.mca, cov4gappy, color.palette) plus an additional function for the calculation of anomalies ("anomaly"; below)

Following my post of EOF, there was a comment on which statistic is correct to present with each EOF mode: "explained variance" or "squared covariance fraction". It seems that explained variance is more often reported along with EOF and squared covariance fraction with MCA, although neither is a measure of significance in and of itself. For a nice review of this topic, those interested should read the tutorial on EOF by Norris. Additional information on the testing of significance can be found in the book by von Storch and Zwiers (1999).

example to produce the figure...
#Hadley SLP monthly mean dataset
#Kaplan SST monthly anomaly dataset
###Required functions and libraries
###load datasets
nc <- open.ncdf("slp.mnmean.nc")   # opens nc file
slp.lon <- get.var.ncdf( nc, "lon")
slp.lat <- get.var.ncdf( nc, "lat")
slp.t <- get.var.ncdf( nc, "time")
slp.raw <- get.var.ncdf(nc, "slp")
slp.t <- as.Date(slp.t, origin="1800-01-01")
temp <- which(slp.lon>180)
slp.lon[temp] <- slp.lon[temp]-360
slp.grd <- expand.grid(slp.lon, slp.lat)
colnames(slp.grd) <- c("lon", "lat")
slp <- matrix(c(slp.raw), nrow=length(slp.t), ncol<-length(slp.lon)*length(slp.lat), byrow=TRUE)
row.names(slp) <- as.character(slp.t)
slp <- anomaly(slp, as.POSIXlt(slp.t), level="monthly") #calculates the monthly anomaly
nc <- open.ncdf("sst.mon.anom.nc")   # opens nc file 
sst.lon <- get.var.ncdf( nc, "lon")
sst.lat <- get.var.ncdf( nc, "lat")
sst.t <- get.var.ncdf( nc, "time")
sst.raw <- get.var.ncdf(nc, "sst")
sst.t <- as.Date(sst.t, origin="1800-01-01")
temp <- which(sst.lon>180)
sst.lon[temp] <- sst.lon[temp]-360
sst.grd <- expand.grid(sst.lon, sst.lat)
colnames(sst.grd) <- c("lon", "lat")
sst<- matrix(c(sst.raw), nrow=length(sst.t), ncol<-length(sst.lon)*length(sst.lat), byrow=TRUE)
row.names(sst) <- as.character(sst.t)
###Create polygons for spatial grids of datasets
spacing=5 # degrees
slp.poly <- vector(mode="list", dim(slp.grd)[1])
for(i in seq(slp.poly)){
 x=c(slp.grd[i,1]-spacing/2, slp.grd[i,1]+spacing/2, slp.grd[i,1]+spacing/2, slp.grd[i,1]-spacing/2)
 y=c(slp.grd[i,2]-spacing/2, slp.grd[i,2]-spacing/2, slp.grd[i,2]+spacing/2, slp.grd[i,2]+spacing/2)
 slp.poly[[i]] <- data.frame(x=x, y=y)
spacing=5 # degrees
sst.poly <- vector(mode="list", dim(sst.grd)[1])
for(i in seq(sst.poly)){
 x=c(sst.grd[i,1]-spacing/2, sst.grd[i,1]+spacing/2, sst.grd[i,1]+spacing/2, sst.grd[i,1]-spacing/2)
 y=c(sst.grd[i,2]-spacing/2, sst.grd[i,2]-spacing/2, sst.grd[i,2]+spacing/2, sst.grd[i,2]+spacing/2)
 sst.poly[[i]] <- data.frame(x=x, y=y)
###Select space and time to include
#time selection
t.min <- as.Date("1950-01-01")
t.max <- as.Date("1999-12-31")
slp.t.incl <- which(slp.t > t.min & slp.t < t.max)
sst.t.incl <- which(sst.t > t.min & sst.t < t.max)
#space selection
lon.lim <- c(-180, -70)
lat.lim <- c(-30, 30)
slp.grd.incl <- lon.lat.filter(slp.grd[,1], slp.grd[,2], lon.lim[1], lon.lim[2], lat.lim[2], lat.lim[1])
sst.grd.incl <- lon.lat.filter(sst.grd[,1], sst.grd[,2], lon.lim[1], lon.lim[2], lat.lim[2], lat.lim[1])
mca <- eof.mca(slp[slp.t.incl, slp.grd.incl], F2=sst[sst.t.incl, sst.grd.incl],
centered=TRUE, scaled=TRUE)
zran <- range(mca$u[,MODE], mca$v[,MODE])
zlim <- c(-max(abs(zran)), max(abs(zran)))
pal=color.palette(c("red", "yellow", "white", "cyan", "blue"), c(10,1,1,10))
colorvalues1 <- val2col(mca$u[,MODE], zlim, col=pal(ncol)) #color levels for the polygons
colorvalues2 <- val2col(mca$v[,MODE], zlim, col=pal(ncol)) #color levels for the polygons
#mapproj settings
orientation=c(mean(lat.lim), mean(lon.lim), 0)
png(filename=paste("map_mca_mode", MODE, ".png", sep=""), width = sum(widths), height = sum(heights), units="in", res=res)
par(omi=c(0.5, 0.5, 0.5, 0.5), ps=12)
layout(matrix(c(1,2,3,4,4,4),nrow=2,ncol=3,byrow=TRUE), widths = widths, heights = heights, respect=TRUE)
#plot of map1
par(mai=c(0.1, 0.1, 0.1, 0.1))
map("world",project=project, orientation=orientation, par=PAR, ylim=lat.lim, xlim=lon.lim)
for(i in seq(slp.grd.incl)){
 polygon(mapproject(x=slp.poly[[slp.grd.incl[i]]][,1], y=slp.poly[[slp.grd.incl[i]]][,2]), col=colorvalues1[i], border=colorvalues1[i], lwd=0.3)
map("world",project=project, orientation=orientation, par=PAR, fill=FALSE, add=TRUE, col="black")
map.grid(c(-180, 180, -90, 90), nx=36, ny=18, labels=FALSE, col="grey", lwd=1)
mtext(paste("SLP Monthly Anomaly MCA Mode", MODE), side=3, line=1, col=3)
#plot of map2
par(mai=c(0.1, 0.1, 0.1, 0.1))
map("world",project=project, orientation=orientation, par=PAR, ylim=lat.lim, xlim=lon.lim, xaxs="i", yaxs="i")
for(i in seq(sst.grd.incl)){
 polygon(mapproject(x=sst.poly[[sst.grd.incl[i]]][,1], y=sst.poly[[sst.grd.incl[i]]][,2]), col=colorvalues2[i], border=colorvalues2[i], lwd=0.3)
map("world",project=project, orientation=orientation, par=PAR, fill=FALSE, add=TRUE, col="black")
map.grid(c(-180, 180, -90, 90), nx=36, ny=18, labels=FALSE, col="grey", lwd=1)
mtext(paste("SST Monthly Anomaly MCA Mode", MODE), side=3, line=1, col=4)
#add scale
par(mai=c(0.1, 0.1, 0.1, 0.1))
image.scale(1, zlim, col=pal(ncol), horiz=FALSE, yaxt="n")
#add ts
YLIM <- range(c(mca$A[,MODE]/sqrt(mca$Lambda[MODE]), mca$B[,MODE]/sqrt(mca$Lambda[MODE])))
par(mai=c(0.1, 0.1, 0.5, 0.1))
plot(slp.t[slp.t.incl], mca$A[,MODE]/sqrt(mca$Lambda[MODE]), t="l", col=3, ylim=YLIM, ylab="", xlab="")
lines(sst.t[sst.t.incl], mca$B[,MODE]/sqrt(mca$Lambda[MODE]),col=4)
abline(h=0, col=8, lty=2)
mtext(paste("MCA Mode", MODE, "Coefficients; SCF =", round(mca$sq_cov_frac[MODE],2)), side=3, line=1)
Created by Pretty R at inside-R.org

the anomaly function...
anomaly<-function(y, x, level="daily"){ 
#y is a vector or matrix of measurements
#x is a time series for the vector measurements in POSIXlt format
#level is "daily" or "monthly"
 y <- as.matrix(y)
 levs_lookup=vector("list", length(levs))
 for(i in 1:length(levs)){
  if(level=="monthly"){levs_lookup[[i]]<-which(x$mon == names(levs_lookup[i]))}
  if(level=="daily"){levs_lookup[[i]]<-which(x$yday == names(levs_lookup[i]))}
 for(j in 1:length(levs)){     #for every time level
  y[levs_lookup[[j]],] <- t(t(as.matrix(y[levs_lookup[[j]],])) - apply(as.matrix(y[levs_lookup[[j]],]), 2, mean, na.rm=TRUE))
Created by Pretty R at inside-R.org


Björnsson, H. and Venegas, S.A. (1997). "A manual for EOF and SVD analyses of climate data", McGill University, CCGCR Report No. 97-1, Montréal, Québec, 52pp.

Norris, J. EOF Tutorial. homepage.

von Storch, H, Zwiers, F.W. (1999). Statistical analysis in climate research. Cambridge University Press.


  1. hi, very useful your post anyway I would like to ask you what kind of assumptions must be made when you try to use MCA, is it a non-parametric method? thank you in advance

  2. Error in sst[sst.t.incl, sst.grd.incl] : subscript out of bounds

    keep getting this error message at the MCA section (around line 86 when ran)

    Please help anyone??

  3. Error in sst[sst.t.incl, sst.grd.incl] : subscript out of bounds

    keep getting this error message in the MCA section of the code. Please help anyone??

  4. Hi,

    How to cite your works?

    Thank you,

    1. This comment has been removed by the author.

    2. Generally, you could now cite the sinkr package (https://github.com/marchtaylor/sinkr), which contains most of the functions that I have written on this blog. Unfortunately, MCA is not yet included, so I suppose you could just cite the webpage itself if that is the function you are using.

  5. Thanks for your nice Maximum Covariance Analysis (MCA) program. I am running your example program, but I get some error message. I am new to R program.
    This is the error.
    Error in cov4gappy(F1[, F1_cols_incl], F2 = F2[, F2_cols_incl]) :
    object 'cov_mat' not found