Gabriela de Queiroz (Sharethrough, USA) kicked off the first
workshop of the weekend with an Introduction to R. She recommended
three R packages as especially useful: ggplot2, tidyr,
and dplyr. She went on to say that everything is a data frame and that, if
you’re familiar with SQL, dplyr will feel very similar.
Ms. de Queiroz also went over how to summarize data using
the following 3 dplyr functions:
i. summarise(data, avg = mean(col_a)): summarises
data into a single row of values.
ii. summarise_each(data, funs(mean)): applies a
summary function to each column.
iii. count(data, vars_to_group_by, wt = col_a):
counts the number of rows with each unique value of the variables (with or without
weights).
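As a rough illustration of the three summaries above, here is a minimal sketch on a made-up data frame. The data frame flights_toy and its values are hypothetical and were not part of the workshop. It also assumes dplyr 1.0.0 or later, where across() stands in for summarise_each() and funs(), both of which are deprecated in recent dplyr releases.

```r
library(dplyr)

# Hypothetical toy data; names and values are illustrative only.
flights_toy <- data.frame(
  carrier   = c("AA", "AA", "UA", "UA", "UA"),
  dep_delay = c(10, 20, 5, 15, 25),
  arr_delay = c(12, 18, 0, 20, 30),
  stringsAsFactors = FALSE
)

# i. Collapse the data into a single row holding the mean of one column.
overall <- summarise(flights_toy, avg = mean(dep_delay))

# ii. Apply the same summary function to every numeric column.
#     across() is used here in place of the older summarise_each(data, funs(mean)).
col_means <- summarise(flights_toy, across(where(is.numeric), mean))

# iii. Count rows per carrier, first unweighted, then weighted by dep_delay
#      (the weighted count sums dep_delay within each carrier).
n_rows     <- count(flights_toy, carrier)
n_weighted <- count(flights_toy, carrier, wt = dep_delay)

print(overall)
print(col_means)
print(n_rows)
print(n_weighted)
```

In both count() results the output column is named n; with wt supplied it holds the sum of the weight column rather than the number of rows.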
One of the more important concepts she introduced was pipes. Gabriela said that
pipes make one’s code more efficient and legible. A pipe is written with the %>%
operator placed after the data frame it passes along: [filename] %>%.
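To make the pipe idea concrete, the sketch below writes the same hypothetical analysis twice: once as nested function calls and once with %>%. The data frame delays and the chosen steps are assumptions for illustration, not taken from the workshop. The two versions should produce the same result, so the benefit shown here is mainly in how the code reads.

```r
library(dplyr)

# Hypothetical toy data; names and values are illustrative only.
delays <- data.frame(
  carrier   = c("AA", "AA", "AA", "UA", "UA", "UA"),
  dep_delay = c(10, -5, 20, 5, 15, -2),
  stringsAsFactors = FALSE
)

# Nested form: has to be read from the innermost call outward.
nested <- arrange(
  summarise(
    group_by(filter(delays, dep_delay > 0), carrier),
    avg = mean(dep_delay)
  ),
  desc(avg)
)

# Piped form: each step receives the result of the previous one
# as its first argument, so the code reads top to bottom.
piped <- delays %>%
  filter(dep_delay > 0) %>%
  group_by(carrier) %>%
  summarise(avg = mean(dep_delay)) %>%
  arrange(desc(avg))

print(piped)
```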
Ms. de Queiroz continued with a tutorial on using the
ggplot2 package to visualize data. To build a ggplot, the
coder needs to:
i. bind the plot to a specific data frame using the
data argument
   a. ggplot(data = [filename])
ii. define aesthetics (aes) that map variables in
the data to axes on the plot or to plotting size, shape, and color
iii. add geoms (geometric objects), which are layered onto the plot with +
   a. ggplot(data = [filename], aes(x = dep_delay, y =
   arr_delay))
   b. geom_point(alpha = 0.1, na.rm = TRUE)
   c. add colors
   d. add a boxplot
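The steps above can be sketched roughly as follows. The data frame delays is made up for illustration, and the specific ways of adding color and a boxplot are assumptions, since the recap does not record which variables were used for those two steps.

```r
library(ggplot2)

# Hypothetical toy data with a few missing values; illustrative only.
delays <- data.frame(
  carrier   = c("AA", "AA", "UA", "UA", "DL", "DL"),
  dep_delay = c(5, 30, NA, 12, 45, 0),
  arr_delay = c(2, 35, 10, NA, 50, -3),
  stringsAsFactors = FALSE
)

# Steps i to iii: bind the data, map variables to the axes, add a point geom.
# alpha makes points semi-transparent; na.rm = TRUE drops missing rows silently.
scatter <- ggplot(data = delays, aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.1, na.rm = TRUE)

# Adding color (assumed here to mean mapping a variable to the color aesthetic).
colored <- ggplot(data = delays,
                  aes(x = dep_delay, y = arr_delay, color = carrier)) +
  geom_point(alpha = 0.5, na.rm = TRUE)

# Adding a boxplot (assumed here to mean arrival delay by carrier).
boxes <- ggplot(data = delays, aes(x = carrier, y = arr_delay)) +
  geom_boxplot(na.rm = TRUE)

# Each object is drawn when printed, e.g. print(scatter).
```

Storing each plot in a variable keeps the construction separate from the drawing, which makes it easy to add further layers with + later on.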