me nugget: 2012

Sunday, December 30, 2012

Spirograph with R

Just had to figure out how to replicate this old toy of mine with R! I had no idea how long it's been around:

Finding a pin in a haystack - PCA image filtering

I found the following post regarding the anomalous metal object observed in a Curiosity Rover photo to be fascinating - specifically, the clever ways that some programmers used for filtering the image for the object. The following answer on mathematica.stackexchange.com was especially illuminating for its use of a multivariate distribution to describe the color channels for a test region of "sand". This distribution was subsequently used to assess if the rest of the image colors belonged to the same distribution.

DINEOF (Data Interpolating Empirical Orthogonal Functions)

I finally got around to reproducing the DINEOF method (Beckers and Rixon, 2003) for optimizing EOF analysis on gappy data fields - it is especially useful for remote sensing data where cloud cover can result in large gaps in data. Their paper gives a nice overview of some of the various methods that have been used for such data sets. One of these approaches, which I have written about before, involves deriving EOFs from a covariance matrix as calculated from available data. Unfortunately, as the author's point out, such covariance matrices are no longer positive-definite, which can lead to several problems. The DINEOF method seems to overcome several of these issues.

Create polygons from a matrix

The following function matrix.poly allows for the addition of polygons to a plot based on a matrix and defined matrix positions. I have used this function on occasion to highlight specific matrix locations (e.g. in the above figure). You can do the same by overlaying another image (left in above plot) but with this function you will have all other polygon plotting possibilities (e.g. borders etc.).

Adding a transparent image layer to a plot

The following example shows how to add a transparent image-type layer to a plot. The add.alpha function (below) simply adds transparency to a vector of colors which is then introduced in the "col" argument of an image plot.

Working with Globcolour data

[UPDATE: new example can be found here. Functions have also been added to the sinkr package.]

The Globcolour project (http://www.globcolour.info/) provides relatively easy access to ocean color remote sensing data. Data is provided at http://hermes.acri.fr/ and the following parameters are available:

· Chlorophyll-a (CHL1 and CHL2)

· Fully normalised water leaving radiances at 412, 443, 490, 510, 531, 550-565, 620, 665-

670, 681 and 709 nm (Lxxx)

· Coloured dissolved and detrital organic materials absorption coefficient (CDM)

· Diffuse attenuation coefficient (Kd(490))

· Particulate back-scattering coefficient (bbp)

· Total Suspended Matter (TSM)

· Relative excess of radiance at 555 nm (EL555)

· Photosynthetic Available Radiation (PAR)

· Heated layer depth (ZHL)

· Secchi disk depth (ZSD)

· Primary production (PP)

· Aerosol optical thickness over water (T865)

· Cloud Fraction (CF)

Of particular interest to ecologists are the estimates of Chlorophyll a (chla) , which combines data from several satellites for better coverage - SeaWiFS (NASA), MODIS (NASA), MERIS (ESA). Data is available at several temporal (daily, 8-days, and monthly averages) and spatial (4.63 km, 0.25°, and 1°) resolutions for the global domain. Several merged products are available: simple averaging (AV), weighted averaging (AVW), and Garver, Siegel, Maritorena Model (GSM) [for more information see the Product User Guide].

Due to the gappy nature of the data (e.g. due to land and clouds), many of the data products only provide values at grids where estimation was possible. For high resolution data, such as in 4.63 km resolution daily estimates, grids with values are often far fewer than the total number of ISIN grids (n=23,761,676) used by the product. This saves space in the files for download, but you may need to reconstruct the field (with NaN's included for grids without observations) for some analyses.

The following example shows how to retrieve Globcolour data and process it using R. Global data is available, but I have provided instructions for processing a smaller area of 4.63 km resolution chla data from the Galapagos archipelago. One can define lat/lon limits for the desired area on the http://hermes.acri.fr/ interface. An ftp address will be sent via Email as to the location of the data when finished.

Here is a function that adds a frame of alternating colors to a map (un-projected). One defines the extension of each bar (in degrees) and an optional width of the bars (in inches). It uses the "joinPolys" function of the package to trim the bars near the map corners where the axes meet.

the map.frame function:

Canonical Correlation Analysis for finding patterns in coupled fields

First CCA pattern of Sea Level Pressure (SLP) and Sea Surface Temperature (SST) monthly anomalies for the region between -180 °W to -70 °W and +30 °N to -30 °S.

The following post demonstrates the use of Canonical Correlation Analysis (CCA) for diagnosing coupled patterns in climate fields. The method produces similar results to that of Maximum Covariance Analysis (MCA), but patterns reflect maximum correlation rather than maximum covariance. Furthermore, the output of the model is a combination of linear models that can be used for field prediction.

This particular method was illustrated by Barnett and Preisendorfer (1997) - it constructs a CCA model based on a truncated subset of EOF coefficients (i.e. "principle components") instead of using the original field (as with MCA). This truncation has several benefits for the fitting of the model - First, one reduces the amount of noise in the problem by eliminating the higher EOF modes, which represent poorly organized, small-scale features of the fields. Second, by using orthogonal functions, the algebra of the problem is simplified (see von Storch and Zweiers 1999 for details). Bretherton etal. (1992) reviewed several techniques for diagnosing coupled patterns and found the Barnett and Preisendorfer method (hereafter "BPCCA") and MCA to be the most robust.

Exponentiation of a matrix (including pseudoinverse)

The following function "exp.mat" allows for the exponentiation of a matrix (i.e. calculation of a matrix to a given power). The function follows three steps:
1) Singular Value Decomposition (SVD) of the matrix
2) Exponentiation of the singular values
3) Re-calculation of the matrix with the new singular values

The most common case where the method is applied is in the calculation of a "Moore–Penrose pseudoinverse", or a matrix raised to the power of -1. In this case, the function returns the same result as the "pseudoinverse" function from the corpcor package (although maybe not as nicely implemented). This should also be analogous to the pinv function in Matlab.

Less common is the need to calculate other powers of matrices. For example, I use this function to calculate the square root of a matrix (and square root of the inverse of a matrix) within Canonical Correlation Analysis (CCA). I hope to write a post shortly on the use of CCA in identifying coupled patterns in climate data.

Below is the code for the exp.mat function as well as some demonstrations of its use in R.

A ridiculous proof of concept: xyz interpolation

Ridiculous Orb

This is really the last one on this theme for a while... I had alluded to a combination of methods regarding xyz interpolation at the end of my last post and wanted to demonstrate this in a final example.

The ridiculousness that you see above involved two interpolation steps. First, a thin plate spline interpolation ("Tps" function of the fields package) is applied to the original random xyz field of distance to Mecca. This fitted model is then used to predict values at a new grid of 2° resolution. Finally, in order to avoid plotting polygons for each grid (which can be slow for fine grids), I obtain their projected coordinates with the mapproject function. Using these projected coordinates and their respective z values, a second interpolation is done with the "interp" function of the akima package onto a relatively fine grid of 1000x1000 positions. The result is a smooth field that can then be overlayed on the map using the "image" function (very fast).

So you may ask - When is this even necessary? I would say that it really only makes sense for projecting a filled.contour-type plot for relatively sparse geographic data. Be warned - for large amounts of xyz data, the interpolation algorithms can take a long time.

A couple of functions, found within this blog, are needed to reproduce the plot (earth.dist, color.palette).

the code to reproduce the figure...

XYZ geographic data interpolation, part 3

This will be probably be a final posting on interpolation of xyz data as I believe I have come to some conclusions to my original issues. I show three methods of xyz interpolation:
1. The quick and dirty method of interpolating projected xyz points (bi-linear)
2. Interpolation using Cartesian coordinates (bi-linear)
3. Interpolation using spherical coordinates and geographic distances (thin plate spline)

XYZ geographic data interpolation, part 2

Having recently received a comment on a post regarding geographic xyz data interpolation, I decided to return to my original "xyz.map" function and open it up for easier interpretation. This should make the method easier to adapt and follow.

The above graph shows the distance to Mecca as interpolated from 1000 randomly generated lat/lon data using the "interp" function of the akima package. Several functions, found within this blog, are needed to reproduce the plot (pos2coord, earth.dist, new.lon.lat, color.palette, val2col, image.scale). One thing you will notice is the strip of uninterpolated area within the stereographic projection. This is a problem that I have yet to resolve and has to do with the fact that the interpolation is not considering the connection along the 180° longitude line. This will probably require some other type of interpolation based on geographic distances rather than Cartesian coordinates.

R code to produce the above graph...

me nugget

Sunday, December 30, 2012

Spirograph with R

Tuesday, December 4, 2012

Finding a pin in a haystack - PCA image filtering

Tuesday, October 30, 2012

DINEOF (Data Interpolating Empirical Orthogonal Functions)

Friday, April 27, 2012

Create polygons from a matrix

Thursday, April 19, 2012

Adding a transparent image layer to a plot

Monday, April 2, 2012

Working with Globcolour data

Add a frame to a map

Sunday, March 25, 2012

Canonical Correlation Analysis for finding patterns in coupled fields

Thursday, March 22, 2012

Exponentiation of a matrix (including pseudoinverse)

Wednesday, March 14, 2012

A ridiculous proof of concept: xyz interpolation

Monday, March 12, 2012

XYZ geographic data interpolation, part 3

Wednesday, February 29, 2012

XYZ geographic data interpolation, part 2

NPR Topics: Science

WWF - Environmental News