
R is the easiest language to speak badly

I am amazed by the number of comments I received on my recent blog entry about "by", "apply" and friends. I started that post by pointing out that R is a language. Indeed, I have come to the conclusion that it is a language with lots of irregular expressions and dialects. It feels a bit like German or French, where you have to learn and memorise the different articles. German has three singular definite articles: der (masculine), die (feminine) and das (neuter); French has two: le (masculine) and la (feminine). Of course there is no mapping between them, and how do you explain that a girl in German is neuter (das Mädchen), while manhood is feminine (die Männlichkeit)?

Back to R. As I found out, there are lots of different ways to calculate means on subsets of data. I am beginning to wonder why so many different interfaces and functions have been developed over the years, and also why I didn't use the aggregate function more often in the past.
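
To give a feel for that variety, here is a minimal sketch that computes the same group means four ways. It uses only base R and the built-in iris data; the variable names are made up for illustration.

```r
# Mean sepal width per species, computed four ways on the built-in iris data.
m_tapply <- tapply(iris$Sepal.Width, iris$Species, mean)              # named 1-d array
m_sapply <- sapply(split(iris$Sepal.Width, iris$Species), mean)       # named vector
m_by     <- by(iris$Sepal.Width, iris$Species, mean)                  # object of class "by"
m_agg    <- aggregate(Sepal.Width ~ Species, data = iris, FUN = mean) # data frame

# The numbers agree; only the shape of the result differs.
print(m_agg)
```

Which one is "right" depends mostly on what you want to do with the result next: aggregate hands back a data frame, while the others return vectors or arrays indexed by group.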

Can we blame internet search engines? Why should I learn a programming language properly when I can find approximate answers to my problem online? I may not end up with the best answer, but with something that works after all: "Don't know why, but it works."

And sometimes the help files can be more difficult to understand than the code in the examples. Hence I end up playing around with the example code until it works, and only then do I try to figure out how it works. That was my experience with reshape.

Maybe this is a bit harsh. It is always up to the individual to improve their language skills, but then again you can get drunk in a pub knowing only how to order a beer. I think it was George Bernard Shaw who said: "R is the easiest language to speak badly." No, actually he said: "English is the easiest language to speak badly." Maybe that explains the success of English and R?

Reading helps. More and more books on R have been published over the last few years, and not only in English. But which should you pick? Xi'an's review of The Art of R Programming suggests that it might be a good start.

Back to aggregate. Has anyone noticed that the formula interface of aggregate is different from that of summaryBy?

aggregate(cbind(Sepal.Width, Petal.Width) ~ Species, data=iris, FUN=mean)
     Species Sepal.Width Petal.Width
1     setosa       3.428       0.246
2 versicolor       2.770       1.326
3  virginica       2.974       2.026

versus

library(doBy)
summaryBy(Sepal.Width + Petal.Width ~ Species, data=iris, FUN=mean)
     Species Sepal.Width.mean Petal.Width.mean
1     setosa            3.428            0.246
2 versicolor            2.770            1.326
3  virginica            2.974            2.026

And another slightly more complex example:
aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, FUN=sum)
summaryBy(ncases + ncontrols ~ alcgp + tobgp, data = esoph, FUN=sum)
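
The difference matters because aggregate does not reject the summaryBy-style left-hand side; it just reads it differently. The sketch below uses base R only and illustrative variable names. As far as I understand the formula method, a `+` on the left-hand side of aggregate is ordinary arithmetic, so the two columns are added together into a single response before the mean is taken.

```r
# cbind() on the left-hand side keeps the two responses as separate columns.
separate <- aggregate(cbind(Sepal.Width, Petal.Width) ~ Species,
                      data = iris, FUN = mean)

# `+` on the left-hand side is evaluated as arithmetic by aggregate():
# the result has ONE response column, the mean of the row-wise sum.
summed <- aggregate(Sepal.Width + Petal.Width ~ Species,
                    data = iris, FUN = mean)

print(separate)
print(summed)
```

So copying a summaryBy formula into aggregate will run without complaint but give one column per group instead of two, which is easy to miss.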


Say it in R with "by", "apply" and friends

Iris versicolor 
By Danielle Langlois
License: CC-BY-SA

R is a language, as Luis Apiolaza pointed out in his recent post. This is absolutely true, and learning a programming language is not much different from learning a foreign language. It takes time and a lot of practice to become proficient in it. I started using R when I moved to the UK, and I wonder whether I now have a better understanding of English or of R.

Languages are full of surprises, in particular for non-native speakers. The other day I learned that there is courtesy and there is curtsey. The two words sounded very similar to me, and of course it caused some laughter when I mixed them up in an email.

With languages you can get into the habit of using certain words and phrases, but sometimes you see or hear something that shakes you up again. The following two lines of R did that to me:


f <- function(x) x^2
sapply(1:10, f)
 [1]   1   4   9  16  25  36  49  64  81 100