Should we trust in the results of computational social science?
My struggle to reproduce and replicate studies that used unsupervised methods for image analysis.
Recently, I attempted to reproduce and replicate three papers that used unsupervised learning methods for clustering images. The purpose of this exercise was to see whether unsupervised methods could be employed to imitate the social scientist’s gaze (i.e., determine whether the clusters could group images using the kinds of concepts that are meaningful to social scientists). The experience left me feeling rather pessimistic, not only about unsupervised image analysis, but about the current state of computational methods in the social sciences more generally.
Broadly speaking, the papers I was reproducing and replicating were advocating for the uptake of specific unsupervised methods for image clustering. This meant that, while each paper had a use case (a set of images to cluster) and offered some validation that the resulting groupings were meaningful, the focus of each article was on the method itself and on convincing others of its promise for the social sciences.
With that in mind, I think it was particularly important that an interested reader (such as myself) felt empowered to apply these approaches to their own data after reading the article and its associated replication materials. I did not. Instead, I experienced the kinds of struggles that anyone who has tried to reproduce or replicate a study will undoubtedly recognise.
During the reproduction phase, where I used code and data provided by the authors in an attempt to reproduce the results reported in the articles, I encountered difficulties including:
Code that was provided (such as through a GitHub repo) did not run 'out of the box' and required more than just simple changes (e.g., pointing to my working directory) to run. These changes were not clearly outlined in READMEs or code comments, and I am not sure whether my 'fixes' fundamentally changed the process from the one the authors implemented.
Data were not provided even after I contacted the authors (despite their note that data would be shared with anyone who emailed them).
Operating system and development environment requirements were not clearly identified and were difficult to reproduce.
Hyperparameter choices were not justified, and some papers appear simply to have used default values without comment.
There were further code-quality issues; for example, some values that are defined as variables at the top of a script are then hardcoded again further down, so that changing the variable silently has no effect.
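To make that last issue concrete, here is a minimal, hypothetical sketch (not taken from any of the papers; the function name and values are invented for illustration) of the pattern I kept running into: a parameter is defined at the top of a script, but a literal copy of it is hardcoded further down.

```python
# Hypothetical sketch of the code-quality issue described above.

N_CLUSTERS = 8  # set at the top of the script...

def run_clustering(n_clusters):
    """Stand-in for the actual clustering call."""
    return f"clustered into {n_clusters} groups"

# ...but the literal is hardcoded here, so editing N_CLUSTERS above
# changes nothing about this call:
result_buggy = run_clustering(10)

# The fix is simply to reference the variable everywhere it applies:
result_fixed = run_clustering(N_CLUSTERS)
```

A reader adapting such a script to their own data can easily change the top-level variable, rerun the code, and never realise their change was ignored.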
During the replication phase, where I adapted the authors’ code and methods and applied it to new data, I had significant setbacks including:
Hyperparameter tuning was not explained by the authors (or perhaps was not done at all), which meant it was unclear how to adapt the code to suit my new data.
Requirements for the new data (format, file location, etc.) were not clearly identified.
Additionally, there was one specific unsupervised method, promised to be fully traceable and transparent because it is based on a fixed mathematical formula, that was applied in two different papers. When I applied each paper's reproduction code for this method to my data, I obtained two significantly different clusterings and did not understand why. It was unclear to me whether one paper had applied the method correctly, or whether neither had, and, therefore, whether I could have faith in any of my results.
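The papers' method is not named here, but even a method with a fixed mathematical objective can yield divergent clusterings when unstated choices differ between implementations. The sketch below uses k-means purely as an illustrative stand-in, with synthetic data in place of image embeddings: two pipelines that nominally apply "the same" method disagree because one standardises the features first and the two use different (unreported) initialisation seeds.

```python
# Illustration (k-means as a stand-in, synthetic data) of how two
# scripts applying "the same" clustering method can disagree.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(42)
features = rng.normal(size=(300, 32))  # stand-in for image embeddings

# "Paper A": clusters the raw features with one seed.
labels_a = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)

# "Paper B": same method on paper, but standardises the features first
# and happens to use a different initialisation seed.
scaled = StandardScaler().fit_transform(features)
labels_b = KMeans(n_clusters=5, n_init=10, random_state=1).fit_predict(scaled)

# An adjusted Rand index of 1.0 would mean identical clusterings;
# here the two pipelines typically disagree substantially.
print(adjusted_rand_score(labels_a, labels_b))
```

Neither pipeline is "wrong" by its own lights, which is exactly the problem: without the preprocessing and initialisation choices documented, a reader cannot tell which result to trust, or whether either matches what the authors originally ran.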
I worry that, in addition to authors not (yet) being incentivised to provide their reproduction code in a usable format with sufficient documentation, the hype around new computational methods means that we as computational social scientists get away with never really demonstrating that we understand the methods we employ in our research!
This is concerning. How can we as computational social scientists expect others to trust our results, to invest significant time in learning to replicate an approach, or to believe that our findings are robust (that is, that the phenomenon we found would be identifiable using other methods) if we cannot convince them that we understand how those results came about?