Twitter’s Acquisition Raises Red Flags for Scientific Community

Why scientists and data scientists are concerned

image by Alexander Shatov at unsplash.com

For years, Twitter has been a useful tool for both the scientific and data science communities. Today, several researchers are sounding the alarm: the recent acquisition may have changed things.

The importance of Twitter in science and data science

It is fairly safe to say that almost every data scientist or artificial intelligence practitioner follows Yann LeCun on Twitter. He is not the only prominent member of the artificial intelligence community on the platform; in fact, most professors and researchers have Twitter accounts.

To begin with, Twitter has proven over the years to have both local and global relevance. For example, it has been vital during natural disasters, both for issuing warnings and for coordinating the response. In addition, it has allowed dissident voices to evade censorship. Wired noted that Black Twitter has been a community capable of driving socio-cultural change, and that there is currently no replacement for it.

For science, Twitter has first and foremost been an inexhaustible source of data. For example, analysis of Tweets has been used to understand people’s perceptions of climate change. Many social science studies have used Twitter to understand the spread of opinions, social movements, activism, and political ideas, and how they evolve. This is because Twitter has allowed researchers to access and analyze data in a way that no other social network has.

Twitter enabled this by building a robust API that made researchers' work easier. In fact, there are more than a thousand Twitter datasets on Kaggle alone. These datasets have been used both by researchers for scientific articles and by authors of various tutorials and blog posts. For example, Twitter data have been used extensively in the development of models such as Graph Neural Networks, allowing deep learning models to be tested on static and dynamic, small and large graphs obtained from the social network.
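
To make the graph idea concrete, here is a minimal Python sketch of how a mention graph could be derived from tweets. The records are made up for illustration, and `mention_edges` and `toy_tweets` are hypothetical names, not part of any Twitter API or published dataset. A real pipeline would start from API responses or a downloaded dataset.

```python
import re
from collections import Counter

# Matches @handles in tweet text (simplified; real handles have more rules).
MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Return a Counter mapping (author, mentioned_user) to a mention count."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target != author:  # ignore self-mentions
                edges[(author, target)] += 1
    return edges


# Toy, made-up (author, text) records; not real Twitter data.
toy_tweets = [
    ("alice", "Great GNN paper by @bob and @carol"),
    ("bob", "Thanks @alice!"),
    ("carol", "@alice @bob see you at the conference"),
    ("alice", "@bob follow-up question"),
]

edges = mention_edges(toy_tweets)
for (src, dst), weight in sorted(edges.items()):
    print(src, "->", dst, weight)
```

A weighted edge list like this is the kind of input a graph library or a GNN framework typically expects. Adding a timestamp to each record would turn the same idea into a dynamic graph.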

Second, Twitter has made it possible to set up collaborations, announce open positions, and contact prospective employers. In the scientific field, there are both accounts and hashtags dedicated to announcing PhD and postdoc positions (not to mention that some of them are dedicated to promoting PhD diversity in STEM faculties).

“Twitter is a tool that facilitates decentralization in science; you are able to present yourself to the community, to develop your personal brand, to set up a dialogue with people inside and outside your research field and to create or join professional environment in your field without mediators such as your direct boss.” — source

As noted by Science, Twitter has become the premier place for sharing ideas. During the Coronavirus outbreak, information about the virus and sequence publications traveled much faster on Twitter than on traditional channels, giving a boost to research.

In addition, articles that have been tweeted have been shown to receive more citations (demonstrated in several studies: here and here). Others have noted that articles that were highly shared during the first three days were subsequently accepted either in journals with higher impact factors or at more prestigious conferences.

DeepMind, OpenAI, Google, and others regularly announce their latest models on Twitter. This not only spreads the publication and serves as free publicity: many people also test the models and post their impressions on Twitter. The authors thus receive feedback on how the model behaves, its limitations, potential errors, and biases.

For example, after the release of ChatGPT, the hashtag #chatGPT reached more than 100,000 Tweets. Many users were enthusiastic, and from the start they also exposed the chatbot's limitations. This lets researchers find the corner cases where their model does not work properly much more quickly.

How Twitter’s recent acquisition has concerned the scientific community

Two letters on the subject were recently published in Nature (here and here), a sign that the scientific community is concerned about recent developments at Twitter.

The first concern is that it is unclear whether Twitter will still allow researchers free access to data or whether access will be restricted in some way. Second, if researchers migrate to other platforms, both the data and the community will fragment. Third, part of the value of Twitter’s data is that it is longitudinal (allowing trends and their evolution to be analyzed); if Twitter is abandoned, the data will no longer be usable for time series analysis.

Twitter is an important source of data precisely because the data are reliable. Some researchers are concerned that this is no longer the case. Musk recently reduced Twitter’s team of moderators and has always spoken out in defense of ‘freedom of speech’. After draconian staff cuts, moderation on Twitter leans on automation. Many researchers fear this will lead to a proliferation of hate speech, bots, and misinformation.

In addition, Elon Musk announced that he plans to delete more than 1 billion inactive accounts. It is not clear how long an account must be idle before it is declared inactive and risks deletion, although recent hints suggest it could be a year. Deleting these accounts is likely to destroy historical information and make researchers’ studies more difficult.

Also, as mentioned above, Twitter has been used to discuss the limitations of models, and by researchers to connect before important conferences (e.g. during NeurIPS 2022 many researchers were discussing their work on Twitter or arranging to meet). Unfortunately, there is currently no alternative (Mastodon is not a replica of Twitter and is divided into several servers or communities).

Conclusions

For several years, Twitter has been a place of discussion for many researchers. Its policy has enabled the collection of datasets that have been used for important research. In addition, almost all data scientists, while learning to program, have tried their hand at at least one dataset obtained from Twitter (such as the famous Twitter sentiment analysis).

Twitter, even more than LinkedIn, has been the first place to announce a new article or a new artificial intelligence model. Twitter users have always been very active in raising concerns about bias in models, corner cases, and so on.

Researchers have commented with concern on Twitter’s recent acquisition, both because Twitter may no longer be a base for collecting reliable data and because relevant information may be lost in a tide of misinformation. What do you think? Let me know in the comments.
