Finding conjunctions in groups of interest on Facebook

I recently wanted to mine data from Facebook to find out how many people are liking in different groups of interest on Facebook. Let’s say you want to know how many people give likes to different political parties or different companies. The following blog post shows how to use the Rfacebook package, which provides an interface to the Facebook API, to find those conjunctions. I never encountered a limit for requests to the API, my biggest run with the script were about 12 hours. I mined 10 million likes from 73.000 posts in this time. For the reader interested in german politics: I used this script to find conjunctions between the political parties of the current parliament (+ a party called AfD) and right-wing extremists. You can find the (german) blog post here.

Get the pages

To get access to the API you need to register to the Facebook developer program and create a dummy application. Here you can find a very good tutorial on how to get R connected to Facebook including a workaround for recent changes Facebook made to the API. In this tutorial you are going to generate a token for authentication (fb_token). I would recommend to save this token so you can use it just by loading the file. I saved the token with save(fb_token, file = 'home/MyFolder'). Ok, so first let’s load the packages we need for the analysis and define our working directory.

For each group of interest I created a file in a folder which contains the URL’s for one group line wise. Make sure to use the top level domain of the sites which is usually: https://www.facebook.com/SiteName. In my project I wanted to check political parties in Germany and I decided to use the sites from all party associations of our federal states and the capitals of those states. That’s adding up to 32 sites for each party. In the next step we are defining this folder and load the above mentioned authentication token. Additionally we are defining the time span from which we want to collect the likes given to the posts of each page.

The Last step before we connect to facebook is to read the names of the sites from the URL’s in our files. In the next code chunk we are going through each file and each URL in this file and extract the name of the site. It also handles the encoding of german mutated vowels. Facebook adds an ID to names with those vowels to avoid them in the URL.

Get the post IDs and likes

So let’s start mining some data from Facebook! The next code chunk goes through each group and each site within this group and extracts all the posts of this site in the given time span. As you can see I limited the maximum number of posts in the getPage call to n = 1000. I’m also dumping the console output, because I wanted to have a simple status print out for each site. Since at this point I just want to look up who liked the posts I’m only extracting the IDs of the post. The rest of the post is dumped after that.

PagePosts contains the IDs now in the following form: Groups –> Sites –> PostIDs. I only wanted to find links between the complete groups, so I collapsed this list to Groups –> PostIDs with the following line:

With the IDs we can extract all the likes given to each post. For each like I’m going to extract the unique user ID of the person who liked the post. Again I’m dumping the console output and generate a custom message for each post.

For each group of sites PostLikes now contains all the ID’S of the users that liked a post in our previously defined time span. Currently those ID’s are stored in lists for each post. So we again have to collapse those lists to get all the ID’s for each group in one list per group. I’m also going to delete duplicate ID’s that occur in one group, because I want to find out if someone liked posts in more than one group and not how often. The remaining code in the next chunk calculates some aggregated values like the overall number of acquired posts.

Find the conjunctions

The last thing to do is to find the intersection between the groups. Who liked in more than one of the groups? To find that out I simply concatenate two groups and count duplicated ID’s. The result is a diagonal square matrix with the number of intersecting ID’s.

Plot the results

A very nice way to show the intersections between the groups are chord diagrams. The package chorddiag provides interactive D3 chord diagrams. The diagram under the next code chunk is the diagram I created for my project.

Please keep in mind that it is very hard to tell if the data gathered for a certain group is representative for this group. It really depends on how you collect the sites to represent the groups. Also there is certainly more stuff one could do with the collected data!


Also published on Medium.

Leave a Reply

Your email address will not be published. Required fields are marked *