In a simple example that performs SVD on a field of sea level pressure of the equatorial Pacific, svds outperforms irlba both in speed and correlation of singular vector (e.g. $u) to the output of the base svd function. One can see in the following graph, that trailing vectors break down in their correlation while svds maintains nearly perfect correlation. Interestingly, this artifact is removed by first centering the data field.
While the effect looks dramatic above, it should be noted that the trailing vectors usually carry only a small fraction of information, and thus contribute only marginally to errors in field reconstruction. Below is a figure showing the reconstruction error of svd, svds, and irlba with increasing levels of truncation.
Finally, both methods were compared in their performance within dineof. With the non-centered field both approaches arrive to a similar RMS, but svds converges with less iterations and EOFs than irlba. With the centered data both methods produce nearly identical results.
Code to reproduce:
# packages ---------------------------------------------------------------- library(RSpectra) library(irlba) library(sinkr) library(plyr) # data -------------------------------------------------------------------- # full field data(slp) F <- slp$field Fs <- scale(F, center = TRUE, scale = FALSE) # gappy field gaps <- sample(length(F), length(F)*0.5) Fg <- replace(F, gaps, NaN) Fsg <- replace(Fs, gaps, NaN) k <- 20 df <- expand.grid(nu=seq(k), method=c("svd", "svds", "irlba")) df$cor_w_svd <- NaN df$recon_rms <- NaN head(df) k <- 30 system.time(F.svd <- svd(F)) system.time(F.svds <- svds(F, k=k)) system.time(F.irlba <- irlba(F, nv=k, nu=k)) system.time(Fs.svd <- svd(Fs)) system.time(Fs.svds <- svds(Fs, k=k)) system.time(Fs.irlba <- irlba(Fs, nv=k, nu=k)) png("corSVD.png", width=7, height=3.5, units="in", type="cairo", res=400, family="Arial") op <- par(mfrow=c(1,2), mar=c(3,3,2,1), mgp=c(2,0.5,0), tcl=-0.25, ps=10) # Un-scaled plot(abs(diag(cor(F.svd$u[,seq(k)], F.svds$u[,seq(k)]))), t="l", ylim=c(0,1), ylab="Correlation to svd [abs(R)]", xlab="Singular vectors [u]", main="Un-centered field [F]" ) lines(abs(diag(cor(F.svd$u[,seq(k)], F.irlba$u[,seq(k)]))), t="l", col=2, lty=2 ) # Scaled plot(abs(diag(cor(Fs.svd$u[,seq(k)], Fs.svds$u[,seq(k)]))), t="l", ylim=c(0,1), ylab="Correlation to svd [abs(R)]", xlab="Singular vectors [u]", main="Centered field [Fs]" ) lines(abs(diag(cor(Fs.svd$u[,seq(k)], Fs.irlba$u[,seq(k)]))), t="l", col=2, lty=2 ) # legend legend("bottomright", legend=c("svds", "irlba"), col=1:2, lty=1:2, bty="n") par(op) dev.off() obj.svd<- c("F.svd", "F.svds", "F.irlba") ref.svd <- "F.svd" ref.field <- "F" df <- expand.grid(nu=seq(k), obj=obj.svd, stringsAsFactors = FALSE) df$method <- substr(df$obj, 3, 10) df$svd_cor <- NaN df$recon_rms <- NaN # df <- data.frame(nu=seq(k), svd.cor=NaN, svds.cor=NaN, irlba.cor=NaN, svd.rms=NaN, svds.rms=NaN, irlba.rms=NaN) for(i in seq(nrow(df))){ OBJ <- get(df$obj[i]) NU <- df$nu[i] df$svd_cor[i] <- abs(cor(get(ref.svd)$u[,df$nu[i]], OBJ$u[,NU])) RECON <- (OBJ$u[,seq(NU)]) %*% diag(OBJ$d[seq(NU)],NU,NU) %*% t(OBJ$v[,seq(NU)]) df$recon_rms[i] <- sqrt(mean((get(ref.field) - RECON)^2, na.rm=TRUE)) } df png("summary_rms.png", width=3.5, height=3.5, units="in", type="cairo", res=400, family="Arial") op <- par(mar=c(3,3,3,1), mgp=c(2,0.5,0), tcl=-0.25, ps=10) # RMS with F df$method <- factor(df$method) levs <- levels(df$method) col <- addAlpha(jetPal(length(levs)),0.5) lty <- seq(length(levs)) lwd <- rep(2,length(levs)) for(i in seq(length(levs))){ if(i == 1){ plot(recon_rms~nu, df, t="n", log="y", ylab="RMSE", xlab="Cumulative singular vectors [u]", main="Reconstruction error\nof un-centered field [F]") } lines(recon_rms~nu, df, subset=c(method==levs[i]), col=col[i], lty=lty[i], lwd=lwd[i] ) legend("topright", legend=levs, col=col, lty=lty, lwd=lwd, bty="n") } par(op) dev.off() delta.rms <- 1e-3 set.seed(1); system.time(Fg.din.irlba <- dineof(Fg, method = "irlba", delta.rms = delta.rms)) set.seed(1); system.time(Fg.din.svds <- dineof(Fg, method = "svds", delta.rms = delta.rms)) set.seed(1); system.time(Fsg.din.irlba <- dineof(Fsg, method = "irlba", delta.rms = delta.rms)) set.seed(1); system.time(Fsg.din.svds <- dineof(Fsg, method = "svds", delta.rms = delta.rms)) png("dineofRMS.png", width=7, height=3.5, units="in", type="cairo", res=400, family="Arial") op <- par(mfrow=c(1,2), mar=c(3,3,3,3), mgp=c(2,0.5,0), tcl=-0.25, ps=10) # Un-scaled plot(Fg.din.svds$RMS, log="y", t="l", xlim=range(seq(Fg.din.irlba$RMS), seq(Fg.din.svds$RMS)), ylim=range(Fg.din.irlba$RMS, Fg.din.svds$RMS), ylab="RMSE", xlab="Iteration (n)", main="DINEOF performance on\nun-centered gappy field [Fg]" ) lines(Fg.din.irlba$RMS, t="l", col=2, lty=2 ) par(new=TRUE) plot(Fg.din.svds$NEOF, log="", t="l", xlim=range(seq(Fg.din.irlba$NEOF), seq(Fg.din.svds$NEOF)), ylim=range(Fg.din.irlba$NEOF, Fg.din.svds$NEOF), axes=FALSE, xlab="", ylab="" ) lines(Fg.din.irlba$NEOF, t="l", col=2, lty=2 ) axis(4); mtext("EOFs [n]", side=4, line=2) legend("top", legend=c("svds", "irlba"), lty=1:2, col=1:2, bty="n") # Scaled plot(Fsg.din.svds$RMS, log="y", t="l", xlim=range(seq(Fsg.din.irlba$RMS), seq(Fsg.din.svds$RMS)), ylim=range(Fsg.din.irlba$RMS, Fsg.din.svds$RMS), ylab="RMSE", xlab="Iteration (n)", main="DINEOF performance on\ncentered gappy field [Fsg]" ) lines(Fsg.din.irlba$RMS, t="l", col=2, lty=2 ) par(new=TRUE) plot(Fsg.din.svds$NEOF, log="", t="l", xlim=range(seq(Fsg.din.irlba$NEOF), seq(Fsg.din.svds$NEOF)), ylim=range(Fsg.din.irlba$NEOF, Fsg.din.svds$NEOF), axes=FALSE, xlab="", ylab="" ) lines(Fsg.din.irlba$NEOF, t="l", col=2, lty=2 ) axis(4); mtext("EOFs [n]", side=4, line=2) legend("top", legend=c("svds", "irlba"), lty=1:2, col=1:2, bty="n") par(op) dev.off()
Hi Marc,
ReplyDeleteJust to let you know that I submitted a new package RSpectra (https://cran.r-project.org/web/packages/RSpectra/index.html) to CRAN,
which is essentially the same as rARPACK but with the new name to avoid confusion (since rARPACK no longer relies on ARPACK). Would you please also change that in your package?
Thank you so much!
Best,
Yixuan
Will do!
DeleteCheers, Marc
Thanks!
Delete