Intro to R Statistics and Data Analysis

There are a lot of statistics programs that can help with your data analysis. Each one is unique and has something special to offer. But one program stands out above the rest: R. R is a statistics program that is capable of very powerful analyses, but its language can be daunting and downright confusing at times. Let’s discuss some of the biggest points about R statistics and R data analysis.

The Program

Probably one of the best things about R is that it is free to download and use. Not only that, thousands of programmers and statisticians are constantly tweaking it and adding to it for our benefit. This means that R is constantly being updated and gaining features. R can handle just about any statistical method that you need, and if it can’t, someone is probably working on a way for it to do that.

Now, the R program itself is free and available from the Comprehensive R Archive Network (CRAN). Personally, I download the most recent version every few months because the contributors to R are constantly updating it. Once you download the version that you need (Mac, PC, or Linux), you just open it up and you are ready to go.

Once you open it, you see a command console similar to a terminal in any operating system. That is essentially all there is to R: no buttons or drop-down menus to mess with. This bare-bones style makes R look sleek and cool, but it can also be very confusing. That is why I use something called R Studio.

R Studio

R Studio is a user interface that keeps everything in easy-to-find places. It is a separate program, but it opens up R in the background so you don’t have to have 2 open programs at once. There are a few key parts that you should be familiar with. For our example, I have loaded some of my notes from a class I took in item response theory (my second favorite set of statistical methods).

The console is where all the real work is done. In this section, you type the commands that you want to run. Everything that is run in R passes through the console.

The section above the console is the script editor. You can type your code up here and run it. The benefit of the script editor is that you can easily manipulate the code that you are writing and change things without having to type commands in the console over and over again. Another benefit of the script editor is that you can add notes to yourself. For example, I have typed some of my initial notes about R directly into the script, which I save for later use.
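As a small illustration of keeping notes in a script, here is a sketch of what a commented script might look like. The data and variable names are made up for this example. In R, anything after a # on a line is treated as a comment and is not run.

```r
# Notes to self: exam scores from a made-up class (hypothetical data)
scores <- c(72, 85, 90, 66, 78)

# Average score; R ignores everything after the # symbol
avg_score <- mean(scores)
print(avg_score)
```

In R Studio, you can send the current line or a highlighted block from the script editor to the console with Ctrl+Enter (Cmd+Return on a Mac).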

The next part is the global environment section in the upper right-hand corner. This shows you all of the data sets and variables that you are using. If you do not use R Studio, you have to keep track of these things yourself. This section also contains a history of every command that you have written and run, which is very useful when you are picking a statistics session back up.
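Without R Studio, one way to keep track of your objects is to ask R directly. The sketch below uses two made-up objects (x and my_data are hypothetical names) to show the built-in ls() and rm() functions.

```r
# Create two objects (names are hypothetical)
x <- 10
my_data <- data.frame(id = 1:3, score = c(4, 7, 9))

# ls() lists the names of the objects in the current environment
print(ls())

# rm() removes an object that you no longer need
rm(x)

# Confirm that x is gone from the current environment
print(exists("x", inherits = FALSE))
```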

In the bottom right-hand corner, there is the window with the files, plots, and packages tabs. The files tab allows you to look through the files in the working directory or on the rest of your computer. The packages tab is where you load the packages that contain the specific statistical tests you may need during the all-night stats-a-thon you’ll be doing. The plots tab is where you can view the plots that you generate as part of your analyses.

Objects

R is an object-oriented language. This means that it works with objects that you have loaded into it or created within it. A variable is an example of an object, but so is a matrix or a vector. In fact, you can load a whole data set as an object.
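To make the idea of objects concrete, here is a minimal sketch that creates one of each kind mentioned above. All of the names and values are invented for illustration.

```r
# A single value stored in a variable
n_students <- 30

# A vector: several values of the same type
ages <- c(19, 21, 20, 22)

# A matrix: values arranged in rows and columns
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame: a whole data set held in one object
students <- data.frame(name = c("Ana", "Ben", "Cy", "Dee"), age = ages)

# class() reports what kind of object each one is
print(class(n_students))
print(class(ages))
print(class(m))
print(class(students))
```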

The benefit of an object-oriented language is that you can perform operations between objects. The feature that I find myself using most is loading multiple data sets at once to analyze them for patterns. In my experience, no other program handles multiple data sets as well as R does.
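Here is a rough sketch of working with two data sets at once. Both data frames are made up, and the shared id column is an assumption of the example. The base function merge() joins them, and cor() then operates on columns that came from different sources.

```r
# Two small, made-up data sets that share an id column
scores <- data.frame(id = c(1, 2, 3), score = c(88, 92, 75))
hours  <- data.frame(id = c(1, 2, 3), hours_studied = c(10, 12, 4))

# merge() joins them on the shared column
combined <- merge(scores, hours, by = "id")
print(combined)

# An operation between objects: correlate the two columns
r <- cor(combined$score, combined$hours_studied)
print(r)
```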

Packages

R does some analyses on its own, such as linear regressions and tests like t-tests or ANOVA. But it doesn’t come with dedicated functions for analyses like mediation or moderation. To do these higher-level analyses, you have to install and load packages.
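As a quick sketch of the analyses that R can run without any extra packages, the example below uses mtcars, a data set that ships with R. The particular variables chosen are just for illustration.

```r
# Linear regression: miles per gallon predicted from weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Two-sample t-test: mpg for automatic (am = 0) vs. manual (am = 1) cars
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)

# One-way ANOVA: mpg across the number of cylinders
av <- aov(mpg ~ factor(cyl), data = mtcars)
print(summary(av))
```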

A package is a bundle of functions built for a specific need. For example, when running analyses for item response theory, I use packages called ‘CTT’ and ‘psychometric’. These packages contain the functions that are required to conduct these analyses. There are packages for almost every statistical method. The best places to find out which packages you need are sites like R Bloggers and Stack Overflow.
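Installing and loading a package usually takes two commands: install.packages() once, then library() in each session. The sketch below leaves the install line commented out because it needs an internet connection, and it only loads a package if it is already installed.

```r
# The two packages mentioned above
pkgs <- c("CTT", "psychometric")

# Run this once to download and install them (needs internet access)
# install.packages(pkgs)

# Check which of the packages are installed on this machine
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Load a package for the current session only if it is available
if (installed[["CTT"]]) {
  library(CTT)
}
```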

Programming

The most intimidating part of R is the lack of preset menus and commands like those in SPSS or Stata. You have to be explicit in telling R what you want to do. This means that you will have to write commands, much as you would when writing code in a language like Python.

Sometimes the coding is simple:

V1 <- 5 * (1/5)

In R you can make extensive use of programming commands such as ifelse or the dreaded but useful loops. For example, changing a variable is a simple matter of a command like

V_new <- ifelse(V1 == 1, 5, NA)

This is interpreted as "Create a new variable, V_new, such that if V1 equals 1, V_new is set to 5. If V1 does not equal 1, V_new is marked as NA." NA is how R represents a missing value.
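ifelse really earns its keep when the variable holds many values, because it checks each element in turn. Here is a small sketch with a made-up vector, reusing the names V1 and V_new purely for illustration.

```r
# A made-up vector of values
V1 <- c(1, 2, 1, 3)

# Where V1 equals 1 the result is 5; everywhere else it is NA
V_new <- ifelse(V1 == 1, 5, NA)
print(V_new)

# is.na() flags the missing values; sum() counts them
n_missing <- sum(is.na(V_new))
print(n_missing)
```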

Loops are created in similar ways.

for (year in c(2010, 2011, 2012, 2013, 2014, 2015)) {
  print(paste("The year is", year))
}

This is a simple loop that prints a message for each year in the list. However, just as in other programming languages, you must be careful not to create loops that never stop or that fail to do exactly what you want. We are not going to get deep into the programming language itself, but the resources mentioned at the end of this post are a good place to start.
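Loops that never stop are mostly a risk with while loops, which keep going until a condition becomes false. One common safeguard is a cap on the number of passes. The sketch below is only an illustration; the max_steps cap and the doubling task are assumptions of the example.

```r
# Count how many times 1 can be doubled before it passes 1000
value <- 1
steps <- 0
max_steps <- 100  # safety cap so the loop cannot run forever

while (value <= 1000) {
  value <- value * 2
  steps <- steps + 1
  # Stop early if the cap is reached
  if (steps >= max_steps) {
    break
  }
}
print(steps)
```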

The Takeaways

R is an amazingly expansive program that is capable of many statistical techniques, from simple correlations to structural equation modeling and more. It is a free program (so is R Studio) that you can almost infinitely customize to meet your needs. Because R is an object-oriented language, you can manipulate multiple data sets and objects with ease. Packages allow you to access almost any statistical test that you may ever need. While the initial learning curve for the programming language is pretty steep, it is amazingly versatile once you get it. I hope this post helps give you an idea of what R is and what it can do. I look forward to seeing your questions below. Happy statistics!

P.S. I need to let you know that I am not some sort of red-belt in R. I know a lot about it and how to use it, but this post is only meant to serve as an introduction to what R is and what it can do. If you are really interested in what it can do and how to use it, I recommend R Bloggers and books like the R Cookbook.
