Thursday, April 25, 2013

A plea for less word clouds

Word cloud of DOMA hearing transcripts

I must admit, there is something appealing about the word cloud - that is, until you try to understand what it actually means...

Word clouds are pervasive - even in the science world. I was spurred to write this by the incredibly wasteful summaries of the EGU General Assembly survey results, which include several useless word clouds (link to document). Capitalization of words isn't even considered; e.g. "Nice" and "nice" appear as separate words. I have been hesitant to equate word clouds to the hilariously labeled "mullets of the internet" but, on second thought, the comparison is entirely appropriate. They were once a fad, but seem reluctant to die...
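The capitalization problem is at least easy to avoid before any counting happens. Below is a minimal Python sketch of case-folded word counting; the function name `word_counts` and the sample text are made up for illustration and have nothing to do with how the EGU summaries were actually produced.

```python
import re
from collections import Counter


def word_counts(text, fold_case=True):
    """Count words; optionally fold case so "Nice" and "nice" are merged."""
    words = re.findall(r"[A-Za-z']+", text)
    if fold_case:
        words = [w.lower() for w in words]
    return Counter(words)


# Made-up survey responses, purely for illustration
responses = "Nice venue. nice talks, Nice posters"

raw = word_counts(responses, fold_case=False)  # "Nice" and "nice" kept apart
folded = word_counts(responses)                # both merged into "nice"
```

Without folding, the same word is split across two entries, so each one looks smaller in the cloud than it should.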

Oh, and yes, a "tag cloud" is a type of word cloud - I have fallen into the trap myself by including such a thing on this blog! I honestly didn't make the connection at first because it at least had the function of showing the relative importance of terms that I personally defined as topics - not an arbitrary puking up of all the words that I have ever written here. Nevertheless, I think it must be removed now - I can't tell you how many times I have wanted to go to a specific blog post by clicking on a tag, only to be forced to search the nether regions of (extremely) small font size. A simple alphabetical arrangement probably makes more sense.

There are some attempts at making word clouds with R (most notably the "wordcloud" package), but they don't seem to be as visually appealing as those easily produced by sites such as Wordle. Nevertheless, you continue to see such things produced - just do a search for "word cloud" on R-bloggers for many examples.

I decided to give Wordle a try, and chose the Defense of Marriage Act (DOMA) hearing transcripts as a source for text. The above word cloud shows the results (with some beautiful patriotic colonial-looking font to boot!). It doesn't reveal much to me. An initial attempt caught me off-guard in that the dominant word was "justice" (below), which might have been insightful if it hadn't been an artifact of the prevalence of the speakers' titles (e.g. "Justice Kagan"):

An even more worthless word cloud of DOMA hearing transcripts
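For anyone determined to count words in a transcript anyway, the speaker titles can be stripped first. The following is only a rough Python sketch: it assumes titles appear in the form "Justice Kagan" or "Chief Justice Roberts" (the real transcripts may be formatted differently), and the pattern `TITLE`, the helper names, and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Assumed format: an optional "Chief", then "Justice", then a capitalized surname.
# Only the title is removed; the surname itself is left in the text.
TITLE = re.compile(r"\b(?:Chief\s+)?Justice\s+(?=[A-Z][a-z]+)")


def strip_titles(text):
    """Remove speaker titles such as "Justice " in front of a surname."""
    return TITLE.sub("", text)


def word_counts(text):
    """Case-folded word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Made-up transcript fragment, purely for illustration
sample = (
    "Justice Kagan: Is that justice? "
    "Justice Scalia: I think not. "
    "Chief Justice Roberts: Thank you."
)

with_titles = word_counts(sample)["justice"]
without_titles = word_counts(strip_titles(sample))["justice"]
```

In this fragment only the one genuine use of "justice" survives the stripping. The pattern is crude (a sentence that happens to begin with "Justice" followed by a capitalized word would also be trimmed), so treat it as a starting point rather than a proper parser.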

Anyway, I'm glad I'm not alone in this thinking - I have come across many discussions along the same lines; in particular, the nice article by Jacob Harris. Unfortunately, it seems word clouds are here to stay, and I will just have to learn to better avert my eyes from their alluring power in the future...