Introduction: A Blog to Simplify Learning Quantitative Analyses

Something I saw a lot in graduate school, among my peers, my undergraduate students, and at one point in my own case, is that many students initially struggle with the technical side of coding and with foundational statistics.

In my own case, I started off with little to no statistical knowledge beyond standard undergraduate courses. As a result, during my first year of graduate school I struggled to learn Stata and R, as well as the fundamentals of statistics for my discipline. I recall that many of my peers and I would see Greek letters and really not understand what to do or what was going on (and in my own case this felt somewhat worse, since I actually studied Ancient Greek as an undergraduate!). Because of this, I believe many students, and especially those from underprivileged, first-generation, or low-income backgrounds (such as myself), very quickly lose enthusiasm for learning the art of quantitative analysis. While I persevered, it took me several years to really understand the fundamentals, then the more advanced methodologies used in Political Science, and how to actually code in analytical software such as Stata or R.

So I started to wonder: maybe I could help spread this knowledge in a way that simplifies the process for these students? Maybe there is something I can do to help clarify the harder parts? We all know how difficult it is to teach a methods course to undergraduates or even graduate students, so perhaps I can provide (at least) a supplement that will not take someone one or more 10-to-15-week terms to work through.

I want that to be the purpose of this blog: to give anyone who wants to learn an opportunity to quickly pick up the “tricks of the trade,” so to speak. I feel this can give students from underprivileged, first-generation, or low-income backgrounds a simple-to-understand resource for data analysis. It is worth noting, however, that any such approach will be limited if it does not also provide the foundational knowledge behind why something works the way it does. Thus, this blog will take two approaches:

  • Provide step-by-step instructions for students who want to learn to analyze data using Stata and R (beginning with Stata, followed by R in future posts)

  • Provide the foundational knowledge behind such analyses (i.e., the important assumptions, inner workings, and limitations)

I have been thinking about this for quite a while, and I think these two approaches are best set up and explained separately, in parts. Specifically, I will be working on weekly blog posts that provide a step-by-step guide to using Stata and R (Stata first, then R), each followed by a post that explains the assumptions and knowledge behind the technique(s) in question.

I believe that doing this will help anyone within the social sciences (and especially in Political Science!) who wants to learn how to analyze and interpret data in a quick and easy manner. There are, of course, some limitations here: I can only go into so much detail in these posts, as there are books upon books (upon books!) that explain how different statistics work. As such, in the blog posts that follow in the coming weeks, I will start with the basics and keep things as simple as possible when discussing statistics and different forms of regression.

The technical posts where I explain Stata/R will be very detailed, especially at first: from learning about the UI, to learning where variables are and how to edit them, to calling variables into Stata/R, to entering commands into the software, to exporting and interpreting results, etc. What this means is that if you already know and are used to much of this, the first few blog posts may seem repetitive (and possibly even a bit boring…). However, if you are new and have just opened Stata or R for the first time, you may find yourself completely lost as to what is going on, how things work, and whether typing in a command that looks like gibberish to you will mess up your computer (hint: it typically will not). These technical posts are meant to give new users the knowledge and skills they need to successfully analyze data in the software in question.

Similarly, the foundational knowledge posts that follow the technical posts will attempt to simplify the probability and statistics behind the actions taken in the technical posts. Thus, if you already understand probability and statistics, some of the first few posts in this series may also seem a bit repetitive (or, once more, possibly even a bit boring…). However, if you are new to the field and α, β, and ε really just look like gibberish to you, then these posts are meant to help you start deciphering what those symbols mean, why they matter, and how they relate to the technical posts.

I hope that this blog will eventually evolve to cover other programs and techniques used in Political Science (such as typesetting in LaTeX, for example), but for now I just hope that any students reading this (or any faculty who may want to give their students an additional resource to help them get started) will be able to use the posts that follow on this blog.

If you have any suggestions or questions, please feel free to leave feedback in the comments.