The Globcolour project (http://www.globcolour.info/) provides relatively easy access to ocean color remote sensing data. Data is provided at http://hermes.acri.fr/ and the following parameters are available:
· Chlorophyll-a (CHL1 and CHL2)
· Fully normalised water leaving radiances at 412, 443, 490, 510, 531, 550-565, 620, 665-670, 681 and 709 nm (Lxxx)
· Coloured dissolved and detrital organic materials absorption coefficient (CDM)
· Diffuse attenuation coefficient (Kd(490))
· Particulate back-scattering coefficient (bbp)
· Total Suspended Matter (TSM)
· Relative excess of radiance at 555 nm (EL555)
· Photosynthetic Available Radiation (PAR)
· Heated layer depth (ZHL)
· Secchi disk depth (ZSD)
· Primary production (PP)
· Aerosol optical thickness over water (T865)
· Cloud Fraction (CF)
Of particular interest to ecologists are the estimates of Chlorophyll a (chla), which combine data from several satellites (SeaWiFS and MODIS from NASA, MERIS from ESA) for better coverage. Data is available at several temporal (daily, 8-day, and monthly averages) and spatial (4.63 km, 0.25°, and 1°) resolutions for the global domain. Several merged products are available: simple averaging (AV), weighted averaging (AVW), and the Garver, Siegel, Maritorena model (GSM) [for more information see the Product User Guide].
Due to the gappy nature of the data (e.g. due to land and clouds), many of the data products only provide values at grids where estimation was possible. For high-resolution data, such as the 4.63 km resolution daily estimates, the number of grids with values is often far smaller than the total number of ISIN grids (n=23,761,676) used by the product. This saves space in the files for download, but you may need to reconstruct the full field (with NaNs included for grids without observations) for some analyses.
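As a minimal sketch of that reconstruction step (with made-up values, not actual Globcolour output):

#rebuild a full field from sparse (grid, value) pairs, with NaN
#wherever no observation exists (made-up values for illustration)
obs <- data.frame(grd=c(2, 5, 9), chl=c(0.21, 0.35, 0.18)) #observed grids only
full <- rep(NaN, 10) #hypothetical field of 10 grids
full[obs$grd] <- obs$chl #fill in the observed grids
full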
The following example shows how to retrieve Globcolour data and process it using R. Global data is available, but I have provided instructions for processing a smaller area of 4.63 km resolution chla data from the Galapagos archipelago. One can define lat/lon limits for the desired area on the http://hermes.acri.fr/ interface; when the order has been processed, an FTP address with the location of the data is sent via email.
Several steps are presented in the example below:
In a UNIX-type console:
1. Download and unzip the .nc files (I use the "wget" command within a cygwin console on my PC)
In R:
2. Open successive .nc files and extract the chla estimates with their corresponding ISIN grid column and row identifiers. Because the size of each file varies, the data are stored in lists of undefined size.
3. Unlist the variables into vectors and create a data.frame
4. Make a sparse matrix with the available data
5. (optional) Convert to a matrix object for some analyses
6. Obtain ISIN grid data using the function "isin.convert" found below (grid identification, lat/lon positions, polygons for mapping)
7. Calculate statistics and plot
I have had some issues with projection-type plots for small areas due to problems with the mapproject function when given small lat/lon limits. Therefore, the example just creates a simple plot using Cartesian coordinates (although the dimensions of the plot have been set to approximate the geographic distances of the area). You will need several other functions found within this blog to reproduce the example (val2col, color.palette, image.scale, earth.dist, map.frame). Several packages are also required (ncdf, Matrix, maps, mapdata, PBSmapping).
The isin.convert function:
#This function converts information regarding the ISIN grid
#used by Globcolour and constructs the associated polygons
#for use in mapping.
#The raw Globcolour .nc files come with column and row pointers
#to the grid's location. For 4.63 km resolution data, this
#translates to 4320 latitudinal rows with a varying number of
#associated longitudinal columns depending on the latitudinal
#circumference.
#Input must be either a vector of grid numbers ["grd"] or a dataframe
#with column and row identifiers ["coord", e.g. columns in coord$col and
#rows in coord$row].
#When the argument "polygons=FALSE" (default), the function outputs
#a dataframe object containing grid information (gridnumber ["output$grd"],
#column ["output$col"], row ["output$row"], longitude ["output$lon"], and
#latitude ["output$lat"]).
#If the argument "polygons=TRUE", then the output will be a list with
#polygon shapes in dataframes (longitudinal coordinates of corners
#["[[i]]$x"], latitudinal coordinates of corners ["[[i]]$y"]).
isin.convert <- function(grd=NULL, coord=NULL, polygons=FALSE){
    ###ISIN grid definition
    Re=6378.145 #Earth radius [km]
    Nlat=4320 #NUMROWS
    dr=(pi*Re)/Nlat #lat bin width [km]
    dPHI=pi/Nlat #lat bin width [radians]
    PHI = -(pi/2)+(1:Nlat)*dPHI-(dPHI/2) #latitude of row centres [radians]
    p=2*pi*Re*cos(PHI) #circumference at each row [km]
    Nlon=round(p/dr, 0) #number of lon bins in each row
    dlon=p/Nlon #lon bin width in each row [km]
    dphi=(2*pi)/Nlon #lon bin width in each row [radians]
    Ntot=sum(Nlon) #total number of grids
    lat=PHI*180/pi #latitude of row centres [degrees]

    if(!missing(grd)){
        #calc. coord
        cumNlon <- c(0, cumsum(Nlon))
        g.row <- sapply(grd, function(x){max(which(cumNlon < x))})
        g.col <- grd - cumNlon[g.row]
        #calc. lonlat
        g.lat = lat[g.row]
        Nlon_row = Nlon[g.row]
        g.lon <- (360*(g.col-0.5)/Nlon_row) - 180
    }
    if(!missing(coord)){
        #calc. coord
        g.row <- coord$row
        g.col <- coord$col
        #calc. lonlat
        g.lat = lat[g.row]
        Nlon_row = Nlon[g.row]
        g.lon <- (360*(g.col-0.5)/Nlon_row) - 180
        #calc. grd
        cumNlon <- c(0, cumsum(Nlon))
        grd <- cumNlon[g.row] + g.col
    }
    if(polygons){
        Nlon_row = Nlon[g.row]
        g.width <- 360/Nlon_row
        g.height <- 180/Nlat
        tmp <- data.frame(g.lon=g.lon, g.lat=g.lat, g.width=g.width, g.height=g.height)
        polys <- vector(mode="list", length(grd))
        for(i in seq(polys)){
            xs <- c(tmp$g.lon[i]-tmp$g.width[i]/2, tmp$g.lon[i]-tmp$g.width[i]/2,
                tmp$g.lon[i]+tmp$g.width[i]/2, tmp$g.lon[i]+tmp$g.width[i]/2)
            ys <- c(tmp$g.lat[i]-tmp$g.height[i]/2, tmp$g.lat[i]+tmp$g.height[i]/2,
                tmp$g.lat[i]+tmp$g.height[i]/2, tmp$g.lat[i]-tmp$g.height[i]/2)
            polys[[i]] <- data.frame(x=xs, y=ys)
        }
        return(polys)
    }
    if(!polygons){
        data.frame(grd=grd, col=g.col, row=g.row, lon=g.lon, lat=g.lat)
    }
}
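A quick usage check (the grid numbers here are arbitrary): convert grid numbers to coordinates and back again.

#convert arbitrary grid numbers to col/row/lon/lat...
ex <- isin.convert(grd=c(1, 1000000, 5000000))
ex
#...and back again via the coord argument (should return the same grid numbers)
isin.convert(coord=data.frame(col=ex$col, row=ex$row))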
To reproduce the example above:
###required libraries
library(ncdf)
library(Matrix)
library(maps)
library(mapdata)
library(PBSmapping)

#required functions (from "www.menugget.blogspot.com")
source("val2col.R")
source("isin.convert.R")
source("color.palette.R")
source("image.scale.R")
source("earth.dist.R")
source("map.frame.R")

###to get ordered products and unzip them
###get all 2003 weighted averaging (AVW) files
#wget -r -nd --ftp-user=hermes --ftp-password=hermes% 'ftp://hermes.acri.fr/922733645/L3b_2003*AVW*gz'
###unzip .nc files
#gunzip *.gz

#define path of .nc files
#may need to adjust slash orientation depending on UNIX or Windows
#raw_path <- "C:/cygwin/home/ME/"
raw_path <- "C:\\cygwin\\home\\ME\\globcolour\\galapagos\\"

#Define years to include
YEAR_min=1997
YEAR_max=2012

#Import data
grd <- list()
chl <- list()
date <- list()
iter=0
for(i in seq(YEAR_max-YEAR_min+1)){
    #tmp = system(paste("ls ", raw_path), intern=TRUE) #for UNIX
    tmp = unlist(strsplit(shell(paste("dir", raw_path), intern=TRUE), " ")) #for Windows
    YEAR <- c(YEAR_min:YEAR_max)[i]
    tmp <- tmp[grep(".nc", tmp, fixed=TRUE)]
    tmp <- tmp[grep(paste("L3b_", YEAR, sep=""), tmp)]
    #additional filters if needed
    #tmp <- tmp[grep("GSM", tmp)] #only GSM products
    #tmp <- tmp[grep("MO", tmp)] #only monthly products
    list_obs = data.frame(file_name=tmp, date=format(strptime(substr(tmp, 5, 12), "%Y%m%d")))
    for(j in 1:length(list_obs[,1])){ #loop_2: for each date j
        iter <- iter+1
        nc <- open.ncdf(paste(raw_path, as.character(list_obs$file_name[j]), sep="")) #opens the Globcolour data file
        idxrow = get.var.ncdf(nc, "row")+1
        idxcol = get.var.ncdf(nc, "col")+1
        val = get.var.ncdf(nc, "CHL1_mean")
        close.ncdf(nc)
        rm("nc")
        date[[iter]] <- as.character(rep(list_obs$date[j], length(val)))
        grd[[iter]] <- isin.convert(coord=data.frame(col=idxcol, row=idxrow))$grd
        chl[[iter]] <- val
        print(as.character(list_obs[j,2]))
    }
}

#unite data into dataframe
DB <- data.frame(
    date=unlist(date),
    grd=unlist(grd),
    chl=unlist(chl)
)
head(DB)
tail(DB)
dim(DB)

###Fast way of filling sparse data into a field
#create lookup tables for unique dates and grids
#lookup for dates
u.date <- unique(DB$date)
u.date <- u.date[order(u.date)]
date.lut <- data.frame(date=u.date, id=seq(u.date))
date.lut
DB$dateid <- date.lut$id[match(DB$date, date.lut$date)]
#lookup for grids
u.grd <- unique(DB$grd)
u.grd <- u.grd[order(u.grd)]
grd.lut <- data.frame(grd=u.grd, id=seq(u.grd))
grd.lut
DB$grdid <- grd.lut$id[match(DB$grd, grd.lut$grd)]

#Create sparse matrix
field <- sparseMatrix(i=DB$dateid, j=DB$grdid, x=DB$chl)
dim(field)

#Convert field object to class matrix and give appropriate row and column names
field <- as.matrix(field)
field[which(field==0)] <- NaN
rownames(field) <- as.character(date.lut$date)
colnames(field) <- grd.lut$grd

#grid info using isin.convert()
isin <- isin.convert(grd=as.numeric(colnames(field)))
head(isin)
polys <- isin.convert(grd=as.numeric(colnames(field)), polygons=TRUE)

###Explorations of data
#fraction of dates with an observation; retain only grids with greater than 20%
frac.obs <- colSums(!is.na(field))/dim(field)[1]
grd.incl <- which(frac.obs > 0.20) #grids to include - fraction observed > 0.2
length(grd.incl)

#Monthly mean values
MON <- as.POSIXlt(rownames(field))$mon
mon.mean <- matrix(NaN, nrow=12, ncol=dim(field)[2])
rownames(mon.mean) <- 1:12; colnames(mon.mean) <- colnames(field)
for(i in 1:12){
    mon.mean[i,grd.incl] <- colMeans(field[which(MON == i-1),grd.incl], na.rm=TRUE)
}

#Yearly means
spatial.mean <- colMeans(mon.mean, na.rm=TRUE)

#Plot of map without projection
val <- spatial.mean
pal=color.palette(rev(c("red", "yellow", "cyan", "blue")), n.steps.between=c(2,5,10))
COLS <- val2col(val, col=pal(100))
YLIM <- c(min(isin$lat), max(isin$lat))
XLIM <- c(min(isin$lon), max(isin$lon))
x.dist <- earth.dist(XLIM[1], mean(YLIM), XLIM[2], mean(YLIM))
y.dist <- earth.dist(mean(XLIM), YLIM[1], mean(XLIM), YLIM[2])
heights=c(6)
widths=c(heights[1]*x.dist/y.dist, 1.5)
res=400
png(filename="spatial.mean.no.projection.png", width=sum(widths), height=sum(heights), units="in", res=res)
par(omi=c(0.1, 0.1, 0.1, 0.1), ps=12)
layout(matrix(c(1,2), nrow=1, ncol=2, byrow=TRUE), widths=widths, heights=heights, respect=TRUE)
layout.show(2)
par(mai=c(0.5, 0.5, 0.1, 0.1))
plot(isin$lon, isin$lat, t="n", xlim=XLIM, ylim=YLIM, xaxs="i", yaxs="i")
for(i in seq(polys)){
    polygon(polys[[i]], col=COLS[i], border=COLS[i], lwd=0.001)
}
map("worldHires", fill=TRUE, col="grey", add=TRUE)
abline(h=seq(-90,90,0.5), lty=3, col="grey")
abline(v=seq(-180,180,0.5), lty=3, col="grey")
map.frame(deg.ext=0.5)
#add scale
par(mai=c(0.5, 0.1, 0.1, 1))
image.scale(val, col=pal(100), horiz=FALSE, yaxt="n", ylab="", xlab="")
axis(4)
mtext(paste("yearly mean chl a [mg m^-3, ", YEAR_min, "-", YEAR_max, "]", sep=""), side=4, line=3)
box()
dev.off()
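As a small follow-up, here is a sketch (building on the "field" matrix and "grd.incl" vector created above) of the time series of the spatially averaged chla:

#time series of spatially averaged chl a over the retained grids
ts.mean <- rowMeans(field[,grd.incl], na.rm=TRUE)
plot(as.POSIXct(rownames(field)), ts.mean, t="o", pch=20, cex=0.5,
    xlab="date", ylab="mean chl a [mg m^-3]")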
This is fantastic. Thanks! I'm re-writing this in Python and will report back once I have it all together. Just curious, what is it you are using `dphi` for in the `isin.convert` function?
Glad you find it helpful. `dphi` is the longitudinal angular discretisation of each row (in radians); it is computed as part of the full ISIN grid definition, although it is not used further within the function. You can find a full description of all of the ISIN grid definitions in the "Product User's Guide" .pdf, p. 69 (see link at the end of the second paragraph). Cheers