Data Activism

We are data activists. In the information society, whoever has the power to disclose or hide data can also shape our understanding of society, and therefore limit our freedom. Facebook won't help us here: that's why we must act.

As the research that exposed the Volkswagen pollution scandal revealed, how data is collected is so fundamental that it can make the difference between a critical analysis and a corporate press release. We, as free thinkers and researchers, want to be able to collect data from the bottom up: as users, from users, for users. In other words, not just for ourselves. We build tools that allow other researchers, or users themselves, to understand how the algorithm impacts their lives: in this challenge, the deception is personalized, and our solution is collective.

So you heard Facebook should provide some data to some researchers...

In the first week of May 2019, news media published stories about Facebook “opening its data to academics for the first time”. In fact, this was not the first time, so it is nothing new. And once again the dataset is limited: it will allow researchers to study the social network's effects on elections only from 2017 onward. In the official announcement, it is noticeable how many of the projects are concerned with the impact of Facebook on political discourse. Again, this is nothing new: other research groups have obtained Facebook data and used it to try to answer similar questions about the impact social networks have on society. In our view, this announcement should be seen as an attempt by Facebook to meet the goals it promised while under public scrutiny, namely to offer a better comprehension of the abuses on the platform during electoral campaigns.

Data from Facebook is not neutral.

Well, no data is neutral, but the problem with the data Facebook has disclosed in the past is that it has always focused exclusively on the interactions that happen on its platform. What we want to address in this text is that research so far has not looked at the abuses of the platform, but only at those happening on the platform. The subject of our investigation, instead, is Facebook itself. The object of research, according to Facebook, should be the people who use it improperly (fake news, propaganda, spam, hate speech, and so on). We believe that the data Facebook provides carries a fundamental problem: it will not allow an understanding of the role of the platform itself in these phenomena, and therefore will not allow attribution of the proper responsibilities to each actor involved.

This binds researchers to look at Facebook with a corporate-friendly framing

Engagement is measured in terms of likes, comments, reactions and other activities captured by apps (how much of this metadata is protected is something we cannot know). The interactions that happen on Facebook are the result of what people want, PLUS what the algorithm and the interface suggest they do. Facebook constantly changes these parameters, to maximize the time spent by users or to run experiments (like the sadly infamous “Detecting Emotional Contagion in Massive Social Networks”). We are not able to know the algorithms that “created” the data of the subjects of the experiment: this variable is mixed with the others and it becomes impossible to isolate them. This implies we are not understanding “Facebook”, nor are we looking at “society”, but just at “how people used Facebook during that period”. But why would we care about that?

We should understand the dynamics that influence the masses, and how the powers involved create a specific imaginary, or the diffusion of a specific belief (true or not). Understanding this mechanism is necessary to attribute the right responsibilities.

The medium is the message, said Marshall McLuhan. Indeed, Facebook's UX drives actions through emotional messages, usually in a visual way, and especially insofar as they stimulate reactions that are measurable as “engagement”. This pushes in the direction of seeing “active interaction” as meaningful, while excluding the possibility that passive exposure to information plays a role. Nobody can tell whether information that has been “viewed” has also been “consumed”. Naturally, Facebook could have access not just to interaction data, but also to the behavior of users. We cannot expect Facebook to provide this data because that would imply a violation of the GDPR (users wouldn't be informed beforehand that their personal data could be shared with third parties, as the regulation requires). Publishing papers that use engagement as the exclusive metric of success or failure of a message or advertisement reinforces the narrative used by Cambridge Analytica and other data brokers. It stresses the idea that to succeed on the social network one needs to engage its users, and this sustains and legitimizes the existence of bots, as well as of a market for profiles and likes. Moreover, it reinforces the power of Facebook which, as a monopoly, sells a promise of engagement to whoever buys its sole product: advertising, or “users' attention”. Engagement is just one metric among many, prone to (automated) manipulation, and the result of different variables.

Our independent approach provides unique data

We are researchers, and as such, we should have a critical approach. That's why we cannot forget that the main objective of Facebook is to maximize users' attention (or time spent) and to “lock them into” the platform. This doesn't necessarily imply providing good quality data to researchers. That's why we want to take a “braver path”, one that some have hoped for but somehow didn't succeed in taking. We, as third parties, collect data independently from Facebook.

Our data aims to protect individuals, but make phenomena public!

This allows researchers to elaborate new research questions about these social, public, mass phenomena. These questions should allow us, as a society, to understand social networks without being limited to the data Facebook is deliberately willing to share. The interactions that take place on the platform are the result of three components of heterogeneous nature: content producers (who can be disinformers, commentators, friends with whom you're sharing content, bots, influencers, and so on); the algorithms, or platform logic (which decides what becomes viral, what should be seen at the top of the timeline, and in general what is relevant for you and what is not); and “society”, of which we are part. As such, we want to protect some categories and hold others responsible, understanding which behaviors are useful to our growth. Unfortunately, it is difficult to take all of these factors into account at the same time.

We work on a technology that allows us to isolate the algorithms, or the platform logic of Facebook, so that we can differentiate the responsibilities of the company from individual ones. The analysis of the algorithm concerns the techno- (or socio-) political sphere, a field of study that has been developing in recent years and that is focused on the social impact of technologies. We think that, by separating the variables, social scientists and other experts are the right people to judge the most human factors at play. Being able to provide data to them in order to support data-driven policy-making is our objective. That's why we must recognize the multidisciplinary nature of the analysis that should be done and develop technologies that facilitate the reuse of data, as much as the protection of personal data (which constitutes the foundation of our technology) allows.

As soon as we have data that finally shows us the effects of the algorithm, we are able to discover new metrics and formulate different and more contemporary questions about social networks. In research over the past year, for example, we developed two ways of measuring the algorithm: one is the percentage of content by media type (text, video, pictures) a profile gets exposed to; the other is the number of times a post is shown to the same user. It is clear that if a certain piece of content is shown again and again, the chance that the user will interact with it (and therefore engage with it) rises, while this repetition deprives the user of the possibility of being exposed to more diverse information.
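As a minimal sketch (the field names and sample data below are hypothetical, not the actual fbTREX schema), these two metrics could be computed from a list of observed timeline impressions along these lines:

```python
from collections import Counter

# Hypothetical structure: each impression is one post observed on a user's
# timeline, with a post identifier and the media type of the content.
impressions = [
    {"post_id": "p1", "media_type": "video"},
    {"post_id": "p2", "media_type": "text"},
    {"post_id": "p1", "media_type": "video"},  # same post shown again
    {"post_id": "p3", "media_type": "photo"},
]

# Metric 1: percentage of content, by media type, the profile was exposed to.
by_type = Counter(i["media_type"] for i in impressions)
total = sum(by_type.values())
media_share = {t: round(100 * n / total, 1) for t, n in by_type.items()}

# Metric 2: how many times the same post was shown to the same user.
repetitions = Counter(i["post_id"] for i in impressions)

print(media_share)   # {'video': 50.0, 'text': 25.0, 'photo': 25.0}
print(repetitions)   # Counter({'p1': 2, 'p2': 1, 'p3': 1})
```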

We documented our tests, published research, released open data, and inspired new publications. It is interesting to note that the data Facebook released to its researchers never allowed for such an analysis.

The fight against the abuse of political ads should have been fought in 2015; now what?

How to ensure public accountability but protect individuals' privacy?

Data should be protected because it is clear that, if it were publicly accessible, it could be employed in further abuses. Furthermore, we need to protect it because most of the companies and businesses around data analysis are focused on studying the behavior of users for targeted advertisement or marketing purposes in general. But the way Facebook discloses its data to researchers is not acceptable either. How can anybody make sure that the data is unbiased and untampered with? How can it be evaluated by the subjects themselves? Which procedure of refinement or selection did Facebook apply when extracting the dataset? And moreover, why should we trust a single group of researchers to elaborate inputs that will inform the development of public policy? According to the “European Data Commons” (section 2.2.2 of DiEM25's Technological Sovereignty paper), data are available to whoever wants to use them for research, as long as the logic for querying the database is privacy-preserving. It is not easy to “formalize” this as code, so it requires an impact assessment for every method of querying the database. What must be guaranteed is the possibility to analyze phenomena, but not individuals. We succeeded in building these mechanics, although we must consolidate a method to guarantee this protection and at the same time promote research without interfering with it.
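One way to make “phenomena, not individuals” concrete is to answer only aggregate queries above a minimum group size. The sketch below is illustrative only, assuming hypothetical field names and an arbitrary threshold; it is not the actual fbTREX query layer:

```python
# Minimal sketch of an aggregate-only query gate. The input structure, field
# names and threshold are hypothetical, chosen purely for illustration.
MIN_GROUP_SIZE = 20  # never release statistics computed on fewer profiles

def aggregate_exposure(observations, group_by="media_type"):
    """Count how many distinct (pseudonymous) profiles were exposed to each
    kind of content, releasing only groups too large to single anyone out."""
    groups = {}
    for obs in observations:
        key = obs[group_by]
        groups.setdefault(key, set()).add(obs["pseudonymous_profile_id"])
    return {
        key: len(profiles)
        for key, profiles in groups.items()
        if len(profiles) >= MIN_GROUP_SIZE
    }
```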

The conflict is freedom of choice

With a bit of abstraction, we understand that the field of conflict is users' freedom to exercise their will: a problem of self-determination. We have become used to accepting that a small part of our information is advertised content, but what we want to say with our analysis of the Newsfeed algorithm is that all of its content is organized around its advertisement system. As a consequence, we should not consider it the product of our will. Even if we freely chose our friends and our pages, Facebook's freedom to show us what it wants is far more prominent. We should apply these considerations to every platform that “personalizes the experience”. The field of struggle becomes personalization algorithms. Hate speech, political propaganda, misinformation: these are only a few of the possible objects of research that we can use to address the root cause. That's why we cannot be surprised that other mechanics (such as friend recommendations, content going viral, the number and types of comments selected to appear) could be controlled by Facebook or by other actors (with their own political agenda) that have the knowledge and the technological means to do so.

Our creative answers to the problem

Play with alternative methods to access information and public discourse

We built fbtrexRSS, a system that allows seeing content in relation to its context (or semantic topics), as long as it is “of public interest”. This is done horizontally and completely independently of any filter bubble (whatever its effect, this lets you jump out of it).

The algorithm, even if a really simple one, is under the control of the readers. Even if this is out of the scope of research, we want to promote creative re-use of data, as long as it is privacy-preserving. This guarantees a wider variety of interests and therefore more diversity in the representative sample (a limit also recalled in “There aren't any rules on how social scientists use private data, here is why we need them”).
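As a usage sketch (the feed URL below is hypothetical; the real endpoints are documented by the project), such a topic feed can be consumed like any other RSS feed:

```python
import feedparser  # pip install feedparser

# Hypothetical URL of a semantic-topic feed; the actual endpoint may differ.
FEED_URL = "https://example.org/feeds/topic/climate.xml"

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # Each entry is a piece of public-interest content related to the topic,
    # independent of any single profile's filter bubble.
    print(entry.title, entry.link)
```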

Note: the RSS application is not officially part of our sponsored project, ALEX, but rather part of our free software project growing alongside it. If the software is protected by a collective license, it is easier for it to guarantee the collectivization of this data and a collaborative, more complete revision of it.

Join the analysis

By installing the fbTREX browser extension, you can get insights into how Facebook's algorithms present different realities to us. We, as researchers, commit to analyzing social phenomena and not individual behaviour. We all need to understand how algorithms interact with us. Because of their personalized nature, and the opacity of Facebook, only through collective, distributed observation can we get a grasp of them.