Wednesday, October 28, 2015

R Summary from Open Data Science Conference

Gabriela de Queiroz (Sharethrough, USA) kicked off the first workshop of the weekend with an Introduction to R. She recommended three R packages as especially useful: ggplot2, tidyr, and dplyr. She went on to say that everything is a data frame and that, if you’re familiar with SQL, dplyr will feel very similar.

Ms. de Queiroz also went over how to summarize data using the following three methods:
i. summarise(data, avg = mean(col_a)): summarises data into a single row of values.
ii. summarise_each(data, funs(mean)): applies a summary function to each column.
iii. count(data, vars_to_group_by, wt = col_a): counts the number of rows with each unique value of the grouping variables (with or without weights).
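
As a rough illustration of the three summary methods above, here is a minimal sketch. The data frame flights_small and its values are invented for this example and are not from the workshop. It also assumes a recent dplyr (1.0 or later), where summarise_each() and funs() are deprecated, so across() stands in for them.

```r
library(dplyr)

# Hypothetical data, made up for illustration only
flights_small <- data.frame(
  carrier   = c("AA", "AA", "UA", "UA", "UA"),
  dep_delay = c(5, 15, 0, 10, 20),
  arr_delay = c(2, 18, -4, 12, 30)
)

# i. Collapse the whole data frame into a single row of values
overall <- summarise(flights_small, avg = mean(dep_delay))

# ii. Apply one summary function to each numeric column
#     (across() is the newer counterpart of summarise_each())
per_column <- summarise(flights_small, across(where(is.numeric), mean))

# iii. Count rows per unique carrier, without and with weights
n_rows     <- count(flights_small, carrier)
n_weighted <- count(flights_small, carrier, wt = dep_delay)
```

Here n_rows holds the number of rows per carrier, while n_weighted sums dep_delay within each carrier instead of counting rows.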

One of the more important concepts she introduced was the pipe. Gabriela said that pipes make one’s code more efficient and legible. A pipe is written with the characters %>%, as in [filename] %>%.
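
To show what the pipe changes in practice, here is a small sketch comparing nested calls with piped calls. The data frame delays and the filter condition are assumptions made up for this example, not something shown in the workshop.

```r
library(dplyr)

# Hypothetical data, made up for illustration only
delays <- data.frame(
  carrier   = c("AA", "AA", "UA", "UA", "UA"),
  dep_delay = c(5, 15, 0, 10, 20)
)

# Without pipes: nested calls have to be read from the inside out
nested <- summarise(filter(delays, dep_delay > 0), avg = mean(dep_delay))

# With pipes: each step receives the result of the previous one,
# so the code reads top to bottom
piped <- delays %>%
  filter(dep_delay > 0) %>%
  summarise(avg = mean(dep_delay))
```

Both versions compute the same result; the piped form only changes how the steps are written.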

Ms. de Queiroz continued with a tutorial on using the ggplot2 package in R to visualize data. To build a ggplot, the coder needs to:
i. bind the plot to a specific data frame using the data argument
    a. ggplot(data = [filename])
ii. define aesthetics (aes) that map variables in the data to axes on the plot or to plotting size, shape, and color
iii. add geoms
    a. ggplot(data = [filename], aes(x = dep_delay, y = arr_delay)) +
    b. geom_point(alpha = 0.1, na.rm = TRUE)
    c. add colors

    d. add a boxplot
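
The steps above can be sketched roughly as follows. The data frame flights_small is invented for this example. Mapping carrier to color and putting carrier on the x axis of the boxplot are assumptions about what "add colors" and "add a boxplot" looked like in the tutorial.

```r
library(ggplot2)

# Hypothetical data, made up for illustration only
flights_small <- data.frame(
  carrier   = c("AA", "AA", "UA", "UA", "UA", "UA"),
  dep_delay = c(5, 15, 0, 10, 20, NA),
  arr_delay = c(2, 18, -4, 12, 30, NA)
)

# Steps i to iii: bind the data, map aesthetics, add a point geom
p_points <- ggplot(data = flights_small, aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.1, na.rm = TRUE)

# Add colors (assumed here to mean mapping a variable to the color aesthetic)
p_color <- ggplot(data = flights_small,
                  aes(x = dep_delay, y = arr_delay, color = carrier)) +
  geom_point(na.rm = TRUE)

# Add a boxplot (assumed here: arrival delay per carrier)
p_box <- ggplot(data = flights_small, aes(x = carrier, y = arr_delay)) +
  geom_boxplot(na.rm = TRUE)
```

Printing any of these objects, for example print(p_points), draws the plot. The na.rm = TRUE argument drops rows with missing delays without a warning.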
