Towards contextual consent in social media health research
Health researchers increasingly turn to social media for participants. The data found there can be highly insightful and can be used to infer a wide variety of medical conditions. It is often freely accessible, and the profiles of hundreds of thousands, or even millions, of people can be scraped programmatically. The trouble is that this is often done without the express consent, or perhaps even the knowledge, of the people being studied.
There is a need to give people more control over their data: not just over whether they are willing to participate, but also over what data should be accessible and to whom. Giving this control, however, comes at the cost of burdening participants with repeated requests to determine what is an acceptable flow of data. We think machine learning could be used to predict participants' consent decisions, easing this burden.
We are currently conducting a web-based study to collect a dataset of participant consent decisions. The web app asks each participant to authenticate with Facebook, then presents them with real posts from their own Facebook profile and asks whether they would share each one with hypothetical stakeholders. We collect the participant’s response alongside metadata about the post, but not the content of the post itself.
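To make the data collection step concrete, the following is a minimal sketch of what a single stored record might look like. The field names, the metadata chosen, and the `make_record` helper are all assumptions made for illustration; the study's actual schema is not described here. The one property the sketch is meant to show is that the decision and post metadata are kept while the post text is dropped.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ConsentRecord:
    # All fields are hypothetical examples of "metadata about the post".
    participant_id: str   # pseudonymous participant identifier
    stakeholder: str      # who the post would hypothetically be shared with
    post_type: str        # e.g. "status", "photo", "link"
    post_age_days: int    # age of the post when it was shown
    has_location: bool    # whether the post carried a location tag
    would_share: bool     # the participant's consent decision


def make_record(participant_id, post, stakeholder, would_share):
    """Build a record from a post dict, keeping metadata and dropping the text."""
    return ConsentRecord(
        participant_id=participant_id,
        stakeholder=stakeholder,
        post_type=post["type"],
        post_age_days=post["age_days"],
        has_location=post.get("location") is not None,
        would_share=would_share,
    )


# Made-up example post; the "message" field is never copied into the record.
post = {"type": "status", "age_days": 412, "location": None,
        "message": "text that is never stored"}
record = make_record("p001", post, "university researchers", False)
print(asdict(record))
```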
We repeat this up to 100 times per participant, yielding a large dataset of consent decisions made under a broad set of circumstances and contexts. We then use machine learning algorithms to train and evaluate a classification model for predicting consent.
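As a rough illustration of the train-and-evaluate step, the sketch below uses a deliberately simple stand-in classifier: it predicts the majority decision seen for each (stakeholder, post type) context and falls back to the overall majority for unseen contexts. This is not the model used in the study, whose algorithm and features are not specified here; the records, field names, and the choice to hold out a whole participant for testing are assumptions.

```python
from collections import Counter, defaultdict


def rec(pid, stakeholder, post_type, would_share):
    """Helper for building made-up example records."""
    return {"participant": pid, "stakeholder": stakeholder,
            "post_type": post_type, "would_share": would_share}


def train(records):
    """Learn the majority decision for each (stakeholder, post_type) context."""
    votes = defaultdict(Counter)
    overall = Counter()
    for r in records:
        votes[(r["stakeholder"], r["post_type"])][r["would_share"]] += 1
        overall[r["would_share"]] += 1
    table = {ctx: c.most_common(1)[0][0] for ctx, c in votes.items()}
    default = overall.most_common(1)[0][0]
    return table, default


def predict(model, record):
    """Predict a decision, using the overall majority for unseen contexts."""
    table, default = model
    return table.get((record["stakeholder"], record["post_type"]), default)


def evaluate(model, records):
    """Fraction of records whose decision is predicted correctly."""
    correct = sum(predict(model, r) == r["would_share"] for r in records)
    return correct / len(records)


# Made-up data: train on two participants, test on a third, unseen one.
train_set = [
    rec("p1", "researchers", "status", True),
    rec("p1", "researchers", "photo", False),
    rec("p1", "advertisers", "status", False),
    rec("p2", "researchers", "status", True),
    rec("p2", "advertisers", "photo", False),
    rec("p2", "researchers", "photo", False),
]
test_set = [
    rec("p3", "researchers", "status", True),
    rec("p3", "researchers", "photo", True),
    rec("p3", "advertisers", "status", False),
    rec("p3", "insurers", "status", False),
]

model = train(train_set)
print(evaluate(model, test_set))
```

Holding out entire participants, rather than individual posts, is one way to estimate how well a model might carry over to people it has never seen; whether the study splits its data this way is not stated.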
A preliminary analysis of data from 64 participants suggests that our current model has a generalizable accuracy of about 75.9%. Additionally, in circumstances where a Facebook post should not be shared, our model correctly withholds it 82.9% of the time. The study is still ongoing, and we intend to recruit more participants over the next month.
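The two figures above measure different things, which a short sketch can make explicit. It assumes "share" is treated as the positive class, so that the withholding figure corresponds to the true-negative rate (specificity); that reading is an interpretation, and the labels below are made up rather than taken from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of all decisions predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def withhold_rate(y_true, y_pred):
    """Among posts that should not be shared, fraction the model withholds."""
    preds_for_private = [p for t, p in zip(y_true, y_pred) if not t]
    return sum(not p for p in preds_for_private) / len(preds_for_private)


# True = share, False = do not share (illustrative labels only).
y_true = [True, True, False, False, False, True, False, False]
y_pred = [True, False, False, False, True, True, False, False]

print(accuracy(y_true, y_pred))       # 6 of 8 decisions are correct
print(withhold_rate(y_true, y_pred))  # 4 of 5 private posts are withheld
```

Reporting the withholding rate separately matters in this setting because wrongly sharing a post a participant wanted kept private is arguably the more harmful kind of error.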
These preliminary results are promising, and we hope that more data will allow us to perform more sophisticated feature extraction and have more confidence in our findings. We intend to publish the full results of this work in the first half of 2018.