Beginning R

R is a unique object-oriented language designed specifically for statistics. It’s also extremely versatile and has a large community dedicated to developing open-source content, so its usefulness extends beyond statistics to other mathematical applications, graphical representation of data, and interfacing with other programming languages. If you’re not familiar with coding, there’s a bit of a learning curve ahead of you (but probably no more than you would face with any other language). If you are completely unwilling to learn coding but need to perform statistics, I’d encourage you to look into some of the GUIs (graphical user interfaces) available for R before defaulting to other software (SPSS, SAS, etc.). There are many reasons to use R instead of other software, but my biggest reasons are as follows:

  1. It’s free: As a poor student this has been a huge benefit to me, but the benefits reach further than cost. In R’s case, free has also meant high quality, because free software that’s actively developed in a scientific community is constantly being improved. There are a lot of packages you can download for R that are the product of someone’s PhD. In fact, the most difficult part of using R for me has been sifting through the large amount of content available to find the best tool for what I’m doing.

  2. Increases good research: In my opinion this is a product not only of increased reproducibility but also of the fact that people can’t blindly click on menus until a screen with statistical results pops up. The first statistical software I ever learned was SPSS, and I ended up with a lot of meaningless results in my introductory courses using it (although they did look impressive on paper). R avoids much of this because the way statistical formulas are written requires at least an idea of what you’re doing. It doesn’t stop people from doing bad statistics, but it does have enough of a learning curve to get rid of some who are too lazy to do it right.
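To make the point about formulas concrete, here is a minimal sketch of how a model is written in R. It uses the built-in mtcars data set and two arbitrary predictors, chosen only for illustration; nothing here comes from a real analysis.

```r
# mtcars ships with R, so no extra packages are needed.
# The formula reads: model mpg as a function of weight (wt) and horsepower (hp).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# An interaction has to be asked for explicitly (wt * hp expands to
# wt + hp + wt:hp), so you have to decide whether you actually want one.
fit_interaction <- lm(mpg ~ wt * hp, data = mtcars)

# Coefficients, standard errors, and fit statistics for the first model
summary(fit)
```

Writing the formula out forces you to state which variable is the outcome and which terms belong in the model, and that is exactly the step a menu lets you skip.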

Learning R

If you’re looking for a basic introduction to the R interface, there’s a brief course called “swirl” you can go through on your own. To start it, type the following into your R console.

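# Install swirl from CRAN (only needed once)
install.packages("swirl")
# Load swirl and download the course
library(swirl)
install_from_swirl("R Programming Alt")
# Start the interactive lessons
swirl()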

Going through this should help you get more comfortable with what you’re actually doing when typing code into R.

Quick-R has proven to be a great resource for learning the rudimentary aspects of statistics in R. It covers some basic topics on the coding interface but concentrates more on the actual statistics. Topics range from beginner (e.g. simple regression) to more advanced (e.g. bootstrapping). There is an accompanying book to the webpage with far more content. My advice for all books that teach R is as follows: ask yourself whether saving some money by spending 30 extra minutes searching for similar content online outweighs the convenience of having the book. To be clear, I’m not endorsing illegal downloading. But most good packages have tutorial information out there somewhere. Oftentimes it’s available as a vignette (a tutorial that ships with the package). Vignettes can be accessed either through the installed package or on CRAN (the Comprehensive R Archive Network). If you just search for <package_name> cran, the package’s CRAN page is usually one of the first results.
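As a rough sketch of reaching vignettes from inside R, the base function vignette() can list and open them for any installed package. The grid package is used here only because it comes with the standard R distribution; substitute whichever package you are learning.

```r
# List the vignettes that ship with an installed package.
# "grid" is just an example; it is part of a standard R installation.
available <- vignette(package = "grid")

# The result stores one row per vignette; show the names and titles.
head(available$results[, c("Item", "Title"), drop = FALSE])

# To open a specific vignette in an interactive session, pass its name:
# vignette("grid", package = "grid")

# To browse every vignette of a package in your web browser:
# browseVignettes("grid")
```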

The other resource is Google. I know that sounds obvious, but you wouldn’t believe how much you can find online that isn’t useless drivel if you actually put in some effort. One gem I found a couple of years ago was R-Tutorials. It was a great way of learning how to do statistics right, or at least with a critical eye. If the topic you’re interested in is commonly part of a statistics course, there are probably plentiful resources available on its application in R.

Coding in R

If you’re interested in learning more about how R works and how to write code for R, I’d encourage you to look at Advanced R by Hadley Wickham. Not only is it a good reference for the important aspects of R’s internal workings, but it also provides an introduction to the C++ and C interfaces that exist in R (both of which are useful if you’re interested in writing extremely fast code or interfacing with existing C++/C libraries). It’s also a good idea to look at the Rcpp homepage if you’re going to be doing a lot of C++ in R. If you’re ultimately writing code that is to be shared with someone else, it’s probably a good idea to write it as a package (even if it’s not going to be part of an official repository). Once again, Hadley Wickham provides some great material in R Packages for understanding how packages work.
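For a taste of what the C++ interface looks like, here is a minimal sketch using Rcpp::cppFunction(). It assumes Rcpp and a working C++ compiler are installed, and sum_squares is a made-up function name used only for illustration.

```r
# Only run the example if Rcpp is available on this machine.
if (requireNamespace("Rcpp", quietly = TRUE)) {
  # Compile a small C++ function and expose it to R under the same name.
  # sum_squares is a hypothetical example, not part of any package.
  Rcpp::cppFunction("
    double sum_squares(NumericVector x) {
      double total = 0;
      for (int i = 0; i < x.size(); ++i) {
        total += x[i] * x[i];
      }
      return total;
    }
  ")

  # Call it like any other R function.
  sum_squares(c(1, 2, 3))
}
```

For a loop this small the plain R version, sum(x^2), is just as fast. The C++ route starts to pay off when a computation can’t be vectorized.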

The alternative to the Hadley Wickham resources is The R Manual. It consists of several big PDFs that provide a fairly comprehensive resource on how R works. I personally have only skimmed through them because every time I looked through them my will to live was slowly sucked away. They are very boring and very long. Only read these if you want to be an expert or experience extreme depression.

Great Packages

As I previously mentioned, there are a lot of packages out there and sorting through them can be overwhelming. Here are some good packages to have ready.

  • MASS: provides some additional statistical functionality to R and extends R with some MATLAB-like functions.

  • lme4: A great resource for fitting mixed models

  • multcomp: A resource for looking at different contrasts within statistical models. Personally, I find it much easier to use than the native system for doing contrasts in R.

  • devtools: There are a lot of packages on GitHub that are worth using, and this provides an interface for installing them into R directly from GitHub. It also has a lot of other tools, but I mostly use it for that.

  • glmnet: Useful for fitting lasso and elastic-net regularized generalized linear models

  • knitr: Converts .Rnw, .Rmd, and LaTeX documents into PDF, markdown, or HTML documents. Surprisingly easy to use given how difficult a task it performs (although it is not without some flaws).

  • Matrix: A lot of packages use this and it provides sparse matrix tools

  • Rcpp: R interface to C++

  • dplyr: tools for manipulating and cleaning up data

  • ggplot2: Visualizing data. Widely used and extensively documented.

  • rgl: renders 3-dimensional interactive models

  • misc3d: A convenient way of rendering surfaces of 3-dimensional structures

  • ANTsR: The R interface to the Advanced Normalization Tools (note: only available on GitHub). The best available resource (in my opinion) for working with medical images.

  • magrittr: Provides additional syntax for more readable and condensed coding. It’s worth looking at the magrittr abstract.

There are many more packages that are also useful but these are the ones I use the most.
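As a small illustration of the “more readable and condensed” syntax magrittr offers, here is a sketch comparing nested calls with the %>% pipe. It assumes magrittr is installed, and the numbers are arbitrary.

```r
library(magrittr)

# Nested calls have to be read from the inside out.
nested <- round(mean(sqrt(c(1, 4, 9, 16))), 1)

# The pipe passes the left-hand result in as the first argument of the
# next call, so the same steps read left to right.
piped <- c(1, 4, 9, 16) %>%
  sqrt() %>%
  mean() %>%
  round(1)

# Both approaches compute the same value.
identical(nested, piped)
```

The same operator is re-exported by dplyr, so pipelines like this are the usual way of chaining its data manipulation steps.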
