---
title: "Introduction to R Markdown"
author: "Taylor N. Carlson"
date: '\today'
geometry: margin=2cm
header-includes:
- \usepackage{placeins}
output:
  pdf_document:
    fig_caption: yes
    keep_tex: yes
    toc: yes
    toc_depth: 3
  html_document:
    fig_caption: yes
    highlight: pygments
    keep_md: no
    lib_dir: libs
    mathjax: null
    template: default.html
    theme: default
    toc: yes
---

```{r global_options, include=FALSE, cache=FALSE}
rm(list=ls())
knitr::opts_chunk$set(fig.width=6, fig.height=6, fig.path='figures/',
                      warning=FALSE, message=FALSE, fig.retina=NULL,
                      cache=FALSE, autodep=TRUE, echo=FALSE)
library(kfigr)     # figure referencing for markdown
library(stargazer) # pretty tables
library(xtable)    # pretty tables
library(knitr)     # knitting
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# What is R Markdown?

R Markdown is a fantastic tool that allows you to write your code and analysis all in one document. Creating data reports to help you get to know your data is much simpler with it, largely because it keeps everything organized. It also helps you be more transparent in your research, since you can show the code you used to generate your results right next to them if desired. If you make a coding error (we've all been there ...) you don't have to go back and redo every single table, figure, or in-text description. R Markdown will update it all for you!

With R Markdown, you can write out to HTML, PDF, or MS Word. The syntax is a little bit different for each output. My personal preference is PDF. You will need to have LaTeX installed for this to work. If you don't have LaTeX installed, you should still be able to follow along and have your output be HTML instead of a PDF.

# The Basics

## Header / Preamble

What you see at the top of this .Rmd file is a preamble (a YAML header). It tells R Markdown how we want our final document to be formatted. For example, we can see the title, author, and date of the document up at the top.
After that, we can see how big we want our margins to be (2cm in this case). We can also see a bunch of features under "output." Note on the PDF output that we can choose whether to have a table of contents (toc) or not, as well as how many levels of that table of contents we want to have (3 in our case). The table of contents is automatically hyperlinked, which is awesome.

## Writing R Code

Any time you want to start writing your R code, you'll create a "code chunk" by using three tick marks (backticks, the key above the tab key) and curly brackets like this:

```{r intro, echo=TRUE, eval=TRUE}
# Here's where I'd start my R Code
# I'm using the pound sign (or "hashtag" for you millennials)
# to write comments -- just like I would if I were writing in a .R file
# You end the code chunk with three more tick marks
```

The R Markdown syntax is fairly straightforward. Here's what's going on inside the curly brackets in the example above:

* r --- tells Markdown that we're going to be writing R code.
* intro, --- this is optional, but it gives the code chunk a name. This is helpful for when you start making figures and tables, so you can refer to them in the text. The name can only be one word, but you can use underscores or periods as separators (e.g. intro_chunk,). Follow the name with a comma before any options.
* echo=TRUE --- this says that we want our PDF to "echo" (include) our code. If echo=TRUE, our PDF will include everything in the code chunk. This is useful if you want to show someone the code you used to generate a result (e.g. on a problem set, when working with a collaborator, or when you're having trouble with your code and want help from someone). Usually, you will not want this code in your data report, so most of the time you'll want echo=FALSE.
* eval=TRUE --- this says that we want to actually run the code in that chunk. Sometimes you write code to generate a figure that you end up cutting from the paper, but you don't want to delete the code altogether.
  You can just set eval=FALSE to tell R to ignore that whole code chunk.
* You can add other information between these curly brackets for more advanced options. For example:
    + results="asis" --- this is useful when you generate a table with xtable() or stargazer().
    + fig.cap="blah blah blah" --- this is where you'd type in the caption to a figure (presumably you'd write something more informative than blah blah blah, but you do you).
    + fig.height=5 --- this is how you can customize the height of a figure (here I've set it to 5 inches).
    + fig.width=5 --- this is how you can customize the width of a figure (here I've set it to 5 inches).

## Example Loading in Data

```{r load_data, echo=FALSE, eval=TRUE}
# Set working directory
setwd("/Users/Taylor/Desktop/Williamsburg Trip/")

# Read in dataset (ANES 2016 data saved as a .csv)
#d <- read.csv("ANES_2016.csv")
## For the purposes of this exercise, you can also read it in from here
d <- read.csv("http://pages.ucsd.edu/~tfeenstr/research/ANES_2016.csv")

# Let's look at how big our dataset is:
dim(d)
# Try knitting the pdf while dim(d) is not commented and when it is commented. What do you see?
```

Now if we want to describe our data a bit, we **could** just type in the numbers we saw in our R Console, like this:

There are 3,649 rows and 106 columns.

But what if we later realize that some of these rows need to be dropped? Or that some columns were included in error? We'd have to go back and re-type the correct numbers. What a pain! R Markdown can help fix this by letting you write flexible results. We can write:

There are `r dim(d)[1]` rows and `r dim(d)[2]` columns in our dataset.

Here, we've put a single tick mark on each side of our code in the text. We start with the letter r to tell Markdown we're writing R code. Then we just write our code as normal. Much cleaner, eh? Notice also that I used asterisks to **bold** something.

# Creating Figures

We could have a whole session on how to create figures in R.
Instead of spending time doing that, the purpose of this section is to show you how to generate figures in R Markdown that are embedded within your data report or paper.

Just as before, we begin by creating a code chunk with three tick marks. In this case, we're going to add the options referenced above to set the height and width (in inches) of the figure we're going to produce. In this specific example, I know that my figure is going to be wide, so I'm going to set the width to be greater than the height. I also add a caption for my figure, keeping it in quotes.

Next I write my R code just as I normally would. In this case, I'm creating a barplot that shows the average feeling thermometer toward Trump (my DV) for each value of strength of partisanship (my IV).

```{r feeling_pid, echo=FALSE, eval=TRUE, fig.height=5, fig.width=8, fig.cap="Feeling thermometer toward President Trump by strength of party ID."}
# Create a vector of the mean feeling thermometer toward Trump (V17) by party ID (V36)
feeling.pid.vec <- c(mean(d$V17[d$V36=="1. Strong Democrat"], na.rm=TRUE),
                     mean(d$V17[d$V36=="2. Not very strong Democract"], na.rm=TRUE),
                     mean(d$V17[d$V36=="3. Independent-Democrat"], na.rm=TRUE),
                     mean(d$V17[d$V36=="4. Independent"], na.rm=TRUE),
                     mean(d$V17[d$V36=="5. Independent-Republican"], na.rm=TRUE),
                     mean(d$V17[d$V36=="6. Not very strong Republican"], na.rm=TRUE),
                     mean(d$V17[d$V36=="7. Strong Republican"], na.rm=TRUE))

# Create the bar graph
barplot(feeling.pid.vec,
        main="Feelings toward Trump by Partisanship",
        xlab="Partisanship",
        ylab="Feeling Thermometer (Coldest-Warmest)",
        names.arg=c("Strong\nDemocrat", "Not Strong\nDemocrat", "Independent\nDemocrat",
                    "Independent", "Independent\nRepublican", "Not Strong\nRepublican",
                    "Strong\nRepublican"),
        cex.names=.65)
```

\FloatBarrier

Now if I want to reference the figure I just created, I'm going to write: Figure `r figr("feeling_pid", type="Figure")`.
If you're reading the .pdf, the code I typed was [r figr("feeling_pid", type="Figure")], but with a tick mark replacing each bracket. I use "feeling_pid" because that's what I named that code chunk. Now I can write things like "As shown in Figure `r figr("feeling_pid", type="Figure")`, Strong Republicans had much warmer feelings toward President Trump than Strong Democrats."

# Creating Tables

Tables are also a useful way to show your results. There are some awesome R packages that help create pretty tables for you: stargazer and xtable. If you don't have these packages installed, you should install them by running the following R code:

    install.packages("stargazer")
    install.packages("xtable")

If you already have them installed, you'll just need to load them, which I've already done in our first code chunk up at the top.

## Tables with xtable

```{r partyid_gender, echo=FALSE, eval=TRUE, results="asis"}
# Here I'm going to make a cross-tab table that shows the relationship between gender and party ID
# Each row will be a party ID (Strong Democrat to Strong Republican)
# Each column will be a gender (male, female)
# I want to know what percentage of men are Strong Democrats, what percentage of women
# are Strong Democrats, etc. So, each column will sum to 100%
partyid_gender <- prop.table(table(d$V36, d$V1)[2:8,2:3], 2)

# Now I want to set it up to be pretty in the report
options(xtable.comment=FALSE)
print(xtable(partyid_gender, caption="Party ID by Gender"),
      type="latex", caption.placement="top")

# In the code above, we've first told R to suppress a comment that otherwise appears above the latex table
# Then we're telling it to print a table that we create with the xtable package.
# The table is what we stored our table as before (partyid_gender).
# caption="..." tells xtable() what we want our caption to be
# type="latex" tells print() that we want latex code for our formatting to generate the pretty table
# caption.placement="top" tells print() that we want the caption on top of the table instead of underneath it
```

We can then refer to our table in much the same way we referred to our figure before. The key difference now is that instead of writing type="Figure", we're going to write type="Table". This is to make sure that when R Markdown is automatically numbering everything for us (thanks!) it doesn't combine or mix up tables and figures. If we had left this table as type="Figure", it would list it as Figure 2, even though it's Table 1. So, if I want to refer to my table, I will write: As shown in Table `r figr("partyid_gender", type="Table")`.

\FloatBarrier

## Tables with Stargazer

We might also want to include a regression table in our results. To include a regression table that looks nice, we'll use the stargazer package. You'll need to store your regression in an object and then use stargazer on that object (see code below).

```{r trump_clinton_feeling_messy, echo=FALSE, eval=TRUE, results="asis"}
# First, I'm going to run a very basic OLS (ordinary least squares) regression
# My IV is feelings toward Hillary Clinton and my DV is feelings toward Donald Trump.
# My hypothesis is that there will be a negative association between feelings toward Clinton
# and feelings toward Trump.
trump_clinton_feeling_mod <- lm(V17 ~ V16, data=d)

# Now I want to write this out in stargazer to make it look nice:
stargazer(trump_clinton_feeling_mod, header=FALSE)
```

\FloatBarrier

This looks a little sloppy. Our variable names don't make any sense to the average reader, and there isn't a clear title. This is not going to be helpful. See the next code chunk for how we can add in variable names and titles.
```{r trump_clinton_feeling, echo=FALSE, eval=TRUE, results="asis"}
trump_clinton_feeling_mod2 <- lm(V17 ~ V16, data=d)

# Now I want to write this out in stargazer to make it look nice:
stargazer(trump_clinton_feeling_mod2, header=FALSE,
          title="Relationship Between Feelings toward Clinton and Feelings toward Trump",
          dep.var.labels="Feelings toward Trump",
          covariate.labels="Feelings toward Clinton")
```

\FloatBarrier

This looks **much** nicer! But now let's say we want to do a multivariate regression in which we control for feelings toward Obama (V15) and feelings toward Big Business (V75B).

```{r multivariate, echo=FALSE, eval=TRUE, results="asis"}
# First, write the code for my multivariate regression:
multivariate <- lm(V17 ~ V16 + V15 + V75B, data=d)

# Now write the stargazer code just as before
# The key difference is that now we want to specify our covariate labels so that they
# don't appear as just V16, V15, and V75B.
# It is *very* important to keep these labels in order! You want them to be in the same
# order in which you listed the variables in the model
stargazer(multivariate, header=FALSE,
          title="Relationship Between Feelings toward Clinton and Feelings toward Trump",
          dep.var.labels="Feelings toward Trump",
          covariate.labels=c("Feelings toward Clinton", "Feelings toward Obama",
                             "Feelings toward Big Business"))
```

\FloatBarrier

Let's say we wanted to put our bivariate regression in the same table as our multivariate regression. See the code chunk below.

```{r bivariate_multivariate, echo=FALSE, eval=TRUE, results="asis"}
# Now write the stargazer code just as before
# The key difference is that now we want to specify our covariate labels so that they
# don't appear as just V16, V15, and V75B.
# It is *very* important to keep these labels in order! You want them to be in the same
# order in which you listed the variables in the model
# Notice that now we've just told stargazer that we want it to organize both our
# trump_clinton_feeling_mod2 model and our multivariate model in the same table
stargazer(trump_clinton_feeling_mod2, multivariate, header=FALSE,
          title="Relationship Between Feelings toward Clinton and Feelings toward Trump",
          dep.var.labels="Feelings toward Trump",
          covariate.labels=c("Feelings toward Clinton", "Feelings toward Obama",
                             "Feelings toward Big Business"))
```

\FloatBarrier

# Resources

* https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
* https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf
* https://rmarkdown.rstudio.com/lesson-1.html
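
If you want to practice the flexible-results idea from the "Example Loading in Data" section without downloading the ANES file, here is a minimal sketch. It assumes R's built-in `mtcars` data as a stand-in for `d`; the names `d_demo`, `d_small`, `n_rows`, `n_cols`, `sentence`, and `sentence_small` are made up for illustration and are not part of the workshop code.

```r
# Stand-in data: the built-in mtcars data frame (an assumption for illustration;
# the workshop itself uses the ANES 2016 file)
d_demo <- mtcars

# The same calls used inline in the text: dim(d)[1] and dim(d)[2]
n_rows <- dim(d_demo)[1]
n_cols <- dim(d_demo)[2]

# Build the sentence from the data instead of typing the numbers by hand
sentence <- sprintf("There are %d rows and %d columns in our dataset.", n_rows, n_cols)
print(sentence)

# Drop some rows (keep only 4-cylinder cars) and rebuild the sentence;
# the numbers update on their own, with no retyping
d_small <- d_demo[d_demo$cyl == 4, ]
sentence_small <- sprintf("There are %d rows and %d columns in our dataset.",
                          dim(d_small)[1], dim(d_small)[2])
print(sentence_small)
```

In an .Rmd file you would normally skip `sprintf()` and put the inline expressions straight into your sentence, as in the data-loading section. The sketch only shows that the numbers come from the data, so they follow along when the data change.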