Udacity recently added a new online course on exploratory data analysis, created with three members of the Data Science team at Facebook. The course is free to take, and I’d definitely recommend having a play around with it if you have an interest in data analysis or statistics. It teaches you the basics of exploratory data analysis using R, and provides some interesting data sets to play around with. Coincidentally, I’ve spent the last few days searching for online data sets to practise R with, so this seemed absolutely perfect.
I thought this would be a good opportunity to create a new blog dedicated to conducting some exploratory analysis of open data and other random tasks. I’ll include code and outputs where possible. Some posts might require a basic understanding of programming, statistical analysis or other topics to follow along. Feel free to get in touch if you have any questions or suggestions.
Given that my PhD has heavily involved small-scale analysis of social networking sites (A/B testing with a prototype), one of the files provided with the Udacity course was of particular interest to me: a data set containing 99,000 simulated Facebook users with some basic information, such as age, gender, number of friends, likes and number of days since the account was created. The course creators were quick to mention that the data file was not produced from real user data; however, similar patterns apparently exist in both the data set and real user data. Since this is not real Facebook data, it’s important to realise that any findings from analysing it are not necessarily an accurate representation of how real Facebook users behave.
I’ll create new blog entries soon to focus on some R analysis.