Data Visualization: ggplot2 in R Basics

No matter what kind of data work you do, at the end of every task your audience will want to see some graphs. Even people with the most sophisticated data and programming skills want a simple illustration that delivers the message immediately. That is why data visualization is so important.

I use R as my primary language for analyzing data. R is very powerful, and for most people without programming experience it is easier to pick up than most other languages. I base this on my own experience of teaching myself Python and C++.

R has a very powerful package called ggplot2. The "gg" stands for Grammar of Graphics, an idea developed by Leland Wilkinson, who wrote a book of that name explaining it. ggplot2 itself, written by Hadley Wickham, implements that idea in R. If you are interested in the more complex ideas behind this grammar, you can find the book online.

Here I will just give a simple illustration of what typical ggplot2 code looks like and how each part works.

First of all, one should have a basic understanding of the principle behind ggplot2. It treats a graph as layers of information. For those who have worked with Photoshop or Premiere, think of these as the layers of your photograph or the special-effect layers of your video. When we draw a graph by hand, we follow a certain routine: coordinate axes, points, lines, and so on. ggplot2 works the same way and treats each of these elements as a layer. A good thing about this is that one can customize many things in the graph and edit each of them in its own layer. Let's look at an example.
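As a minimal sketch of the layer idea, assuming ggplot2 is already installed, the snippet below builds one plot from the built-in mtcars data in three steps. The object names p_axes, p_points and p_trend are only illustrative.

```r
library(ggplot2)

# Step 0: data plus the coordinate system (mpg on x, hp on y); no shapes are drawn yet
p_axes <- ggplot(mtcars, aes(x = mpg, y = hp))

# Step 1: add a layer with one point per car
p_points <- p_axes + geom_point()

# Step 2: add a second layer, a straight trend line on top of the points
p_trend <- p_points + geom_smooth(method = "lm", se = FALSE)

# Printing a plot object draws it with every layer added so far
print(p_trend)
```

Because each step is stored in its own object, you can print p_axes or p_points to see the graph before the later layers were added.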

If this is your first time using ggplot2, remember to install the package. You can either install it from the lower right window of RStudio or simply run install.packages("ggplot2").

[Figure ggplot2_1]

Every time you use a package, remember to load it with library(package_name). Certain functions are only available in certain packages. If you don't load the package in each R script, R will look for the function among the basic R commands and most likely return an error message.
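The install and load steps can be sketched as follows. This is one possible pattern, not the only one: it installs ggplot2 only when it is missing (which needs an internet connection and a configured CRAN mirror), and it shows that a function can also be reached as package::function without calling library() first. The names p and q are just placeholders.

```r
# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Without library(), a function can still be reached by naming its package
p <- ggplot2::ggplot(mtcars, ggplot2::aes(x = mpg, y = hp))

# After library(), the short names work for the rest of the script
library(ggplot2)
q <- ggplot(mtcars, aes(x = mpg, y = hp))
```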

I am using the built-in data set mtcars. To streamline the process, here is my basic data import code, which you can copy and paste.

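setwd("your file location")   # replace with your own folder; not needed for mtcars, which ships with R
data <- mtcars                # copy the built-in data set into a variable called data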

library(ggplot2)

ggplot(data, aes(x = mpg, y = hp)) +   # specify what goes on the x-axis and the y-axis
  geom_bar(stat = "identity")          # specify the shape (bars here) and what the bars measure

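Notice that everything after a # is a comment; the rest is the actual code. Also note that the + sits at the end of the first line, because R would otherwise treat that line as a finished command. This is a simple graph showing the horsepower (hp) of the cars at each value of miles per gallon (mpg). Besides geom_bar, the bar chart, there are many more shapes to explore in ggplot2. You can find them in the ggplot2 documentation online.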

[Figure ggplot2_2: the resulting bar chart]
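To give a feel for the other shapes, here is a small sketch that keeps the same mtcars data and only swaps the geom. The three geoms shown are just a sample, and the object names are illustrative.

```r
library(ggplot2)

# Shared starting point: mpg on the x-axis, hp on the y-axis
base <- ggplot(mtcars, aes(x = mpg, y = hp))

scatter <- base + geom_point()   # one point per car
path    <- base + geom_line()    # the same values joined in order of mpg

# A histogram needs only an x variable, so it gets its own aes()
histo <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)
```

Print any of the three objects, for example print(scatter), to draw it.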

The code can be read as: ggplot(data, aes(x-axis variable, y-axis variable)) + shape/other characteristics. Note that aes() is where every data-related or variable-related statement goes. If you want to control something with a variable, whether it is a fill color or anything else, that statement needs to go inside aes().
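The difference between putting a statement inside or outside aes() can be sketched like this. The choice of cyl as the fill variable and "steelblue" as the fixed color is arbitrary and only for illustration.

```r
library(ggplot2)

# Inside aes(): fill follows a variable, so each cylinder count gets its own color
p_mapped <- ggplot(mtcars, aes(x = mpg, y = hp, fill = factor(cyl))) +
  geom_bar(stat = "identity")

# Outside aes(): fill is a fixed value, so every bar has the same color
p_fixed <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_bar(stat = "identity", fill = "steelblue")
```

The first version also adds a legend for cyl, because ggplot2 has to explain which color belongs to which value. The second needs no legend, since the color carries no information.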

It is always a good habit to look up a command's defaults, before or after you run into crazy graphs. For example, if you are not sure what should go into geom_bar(), check the documentation for its default settings. By default, geom_bar() returns a bar chart that counts how often each value occurs. If you map a y variable and do not specify stat = "identity", you will probably run into an error message like this:

[Figure ggplot2_3: the error message]

Why does this happen, you may ask. Think of the default frequency setting: if the command ran, the y-axis would show the frequency of each observation. But we have already defined the y-axis ourselves, so the default conflicts with the axis we specified. One therefore has to specify stat = "identity", which simply tells R to draw each bar as it is, in this case with the height given by the y variable.
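A small sketch of the conflict and two ways around it follows. It assumes a reasonably recent ggplot2, where the problem only surfaces when the plot is built, so the build is wrapped in tryCatch() to keep the script running. The exact wording of the error differs between versions and is not reproduced here. geom_col() is a ggplot2 shortcut for bars whose heights come straight from the data.

```r
library(ggplot2)

# y is mapped, but geom_bar() still uses its default counting statistic
bad <- ggplot(mtcars, aes(x = mpg, y = hp)) + geom_bar()

# Build the plot inside tryCatch() and record whether it failed
outcome <- tryCatch(
  {
    ggplot_build(bad)
    "built without error"
  },
  error = function(e) "error"
)

# Two equivalent fixes: state the statistic, or use geom_col()
fixed_bar <- ggplot(mtcars, aes(x = mpg, y = hp)) + geom_bar(stat = "identity")
fixed_col <- ggplot(mtcars, aes(x = mpg, y = hp)) + geom_col()
```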

So if we look at the following command,

ggplot(data, aes(mpg)) + geom_bar()

[Figure ggplot2_4: the frequency chart of mpg]

We get the frequency chart of mpg. Notice that the y-axis now shows count, the default statistic of geom_bar().
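To check what geom_bar() is counting, one option is to pull the computed values out of the plot and compare them with a plain frequency table. This is only a sketch: layer_data() is a ggplot2 helper that returns the data behind a layer, and the names bars and freq are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg)) + geom_bar()

# The values ggplot2 computed for the first (and only) layer
bars <- layer_data(p)

# The same frequencies computed directly in base R
freq <- table(mtcars$mpg)
```

The count column of bars holds one value per distinct mpg, and those counts add up to the number of cars in the data set, just like the entries of freq.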

This is an introduction to how ggplot2 works. I will add more content later about how to make graphics look appealing and useful.
