Twitter's acquisition has raised concerns within the scientific and data science communities due to potential changes in data access, fragmentation of the community, and loss of historical information.
Abstract
The article discusses the importance of Twitter in the scientific and data science communities, highlighting its role as a tool for collaboration, data source, and platform for sharing ideas. However, recent developments, including Twitter's acquisition and changes in its policies, have raised concerns about the future of the platform as a reliable data source and community hub. The article mentions letters written to Nature expressing these concerns and discusses potential issues such as data fragmentation, loss of historical information, and proliferation of misinformation.
Bullet points
Twitter has been a valuable tool for the scientific and data science communities, serving as a platform for collaboration, data source, and idea sharing.
Recent developments, including Twitter's acquisition and changes in its policies, have raised concerns about its future as a reliable data source and community hub.
Letters written to Nature express concerns about potential issues such as data fragmentation, loss of historical information, and proliferation of misinformation.
The article discusses the importance of Twitter's longitudinal data and its role in facilitating the spread of research findings and discussions about the limitations of models.
The scientific community is concerned about the potential loss of historical information due to the deletion of inactive accounts and the proliferation of misinformation due to reduced moderation.
There is currently no alternative platform that can fully replace Twitter's role in the scientific and data science communities.
The article concludes by asking readers for their thoughts on the matter and providing links to the author's other articles and resources.
Twitter’s Acquisition Raises Red Flags for Scientific Community
For years, Twitter has been a useful tool for both the scientific and data science communities. Today, several researchers are sounding the alarm: the recent acquisition may have changed things.
The importance of Twitter in science and data science
It is almost safe to say that nearly every data scientist or artificial intelligence practitioner follows Yann LeCun on Twitter. He is not the only prominent member of the artificial intelligence community on the platform; in fact, most professors and researchers have Twitter accounts.
To begin with, we can say that Twitter has proven over the years to have both local and global relevance. For example, it has been vital during natural disaster response, both in facilitating response and warning. In addition, it has allowed dissident voices to evade censorship. Wired noted how Black Twitter has been a community capable of pushing socio-cultural change and that there is currently no replacement.
For science, first and foremost it has been an inexhaustible source of data. For example, analysis of Tweets has been used to understand people’s perceptions of climate change. Many social science studies have used Twitter to understand the spread of opinions, social movements, activism, political ideas, and their evolution. This is because Twitter has allowed researchers to access and analyze data in a way that no other social network has allowed.
Twitter enabled this by building a robust API, facilitating the work of researchers. In fact, there are more than a thousand datasets on Kaggle alone. These datasets have since been used by researchers for scientific articles and also for various tutorials and blog articles. For example, Twitter data have been used extensively in the development of models such as Graph Neural Networks, allowing deep learning models to be tested on static and dynamic, small and large graphs obtained from the social network.
Second, Twitter has made it possible to create collaborations, announce position openings, or contact prospective employers. In the scientific field, there are both accounts and hashtags dedicated to announcing PhD and postdoc positions (not to mention that some of them are dedicated to promoting PhD diversity in STEM faculties).
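To give a rough idea of how a graph can be obtained from tweets, here is a minimal sketch in plain Python. It assumes a tiny hand-made list of records with hypothetical `author` and `text` fields (not the format of any real Twitter export) and builds a directed "who mentions whom" graph of the kind that could later be fed to a graph model.

```python
import re

# Hypothetical toy records; a real dataset would come from an archive or export.
tweets = [
    {"author": "alice", "text": "Great paper by @bob and @carol on GNNs"},
    {"author": "bob", "text": "Thanks @alice!"},
    {"author": "carol", "text": "New preprint is out"},
]

# Simplified pattern for @mentions (an assumption, not Twitter's exact rules).
MENTION = re.compile(r"@(\w+)")


def build_mention_graph(records):
    """Return a directed adjacency map: user -> set of users they mention."""
    graph = {}
    for record in records:
        author = record["author"].lower()
        targets = graph.setdefault(author, set())
        for mentioned in MENTION.findall(record["text"]):
            mentioned = mentioned.lower()
            # Mentioned users become nodes even if they never tweet themselves.
            graph.setdefault(mentioned, set())
            targets.add(mentioned)
    return graph


graph = build_mention_graph(tweets)

# In-degree: how many distinct users mention each node.
in_degree = {
    user: sum(user in targets for targets in graph.values())
    for user in graph
}

for user in sorted(graph):
    print(user, "->", sorted(graph[user]), "| mentioned by", in_degree[user])
```

Adding a timestamp to each edge would turn this static graph into a dynamic one, which is one reason why longitudinal Twitter data is so useful for this kind of model.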
“Twitter is a tool that facilitates decentralization in science; you are able to present yourself to the community, to develop your personal brand, to set up a dialogue with people inside and outside your research field and to create or join professional environment in your field without mediators such as your direct boss.” — source
As noted by Science, Twitter has become the premier place for sharing ideas. During the Coronavirus outbreak, information about the virus and sequence publications traveled much faster on Twitter than on traditional channels, giving a boost to research.
In addition, articles that have been tweeted have been shown to receive more citations (demonstrated in several studies: here and here). Others have noted that articles that were highly shared during the first three days were subsequently accepted either in journals with higher impact factors or at more prestigious conferences.
DeepMind, OpenAI, Google, and others regularly announce their latest models on Twitter. This not only helps them spread the publication and serves as free publicity. It also means that many people test the algorithms and publish their impressions on Twitter, which gives the authors information about how the model works, its limitations, potential errors, and biases.
For example, after the release of ChatGPT, the hashtag #chatGPT reached more than 100,000 Tweets. Many users have shown enthusiasm, and from the start they have also pointed out the limitations of the chatbot. This allows researchers to find corner cases where their model does not work properly much more quickly.
How Twitter’s recent acquisition has concerned the scientific community
As a demonstration of some concern, two letters were written to Nature recently (here and here). Their publication is a sign that the scientific community is concerned about recent Twitter developments.
The first reason is that it is unclear whether Twitter will still allow researchers free access to data or whether this will be restricted in some way. Second, if researchers migrate to other platforms, this will fragment the data and the community. Third, one reason for the value of Twitter’s data is that it is longitudinal (allowing trends and their evolution to be analyzed), and if Twitter is abandoned, the series will stop growing and the data will lose much of its value for time series analysis.
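To make the point about longitudinal data concrete, here is a minimal sketch using only the Python standard library. The `(date, text)` pairs are invented for illustration, and the helper names are hypothetical. The sketch counts tweets mentioning a keyword per month and zero-fills missing months, so that a gap in collection shows up explicitly in the series.

```python
from collections import Counter
from datetime import date

# Invented (date, text) pairs; not real data.
tweets = [
    (date(2022, 9, 3), "climate march today"),
    (date(2022, 9, 20), "new climate report"),
    (date(2022, 10, 5), "heatwave and climate"),
    (date(2022, 12, 1), "climate talks end"),
]


def monthly_counts(records, keyword):
    """Count records containing the keyword, grouped by (year, month)."""
    counts = Counter()
    for day, text in records:
        if keyword in text.lower():
            counts[(day.year, day.month)] += 1
    return counts


def fill_months(counts):
    """Return a continuous, zero-filled series from the first to the last month."""
    if not counts:
        return []
    (year, month), last = min(counts), max(counts)
    series = []
    while (year, month) <= last:
        series.append(((year, month), counts.get((year, month), 0)))
        # Move to the next calendar month, rolling over the year in December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return series


series = fill_months(monthly_counts(tweets, "climate"))
for (year, month), count in series:
    print(f"{year}-{month:02d}: {count}")
```

In this toy series November appears with a count of zero. With real data, it would be hard to tell whether such a hole reflects a real drop in interest or simply users leaving the platform, which is exactly the problem fragmentation would create.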
Twitter is an important source of data precisely because the data are reliable. Some researchers are concerned that this is no longer the case. Musk recently reduced Twitter’s team of moderators and has always spoken out in defense of ‘freedom of speech’. After draconian staff cuts, moderation on Twitter leans on automation. Many researchers fear this means a proliferation of hate speech, bots, and misinformation.
In addition, Elon Musk announced that he plans to delete more than 1 billion inactive accounts. It is not clear after how long an account is declared inactive and risks deletion, although recent hints suggest it could be after a year. The deletion of these accounts is likely to lead to the loss of historical information and make researchers’ studies more problematic.
Also, as mentioned before, Twitter has been used to discuss the limitations of models, and by researchers to connect before an important conference (e.g. during NeurIPS 2022 many researchers were discussing their work on Twitter or arranging to meet). Unfortunately, there is currently no alternative (Mastodon is not a replica of Twitter and is divided into several servers or communities).
Twitter for several years has been a place of discussion for many researchers. Its policy has enabled the collection of datasets that have been used for important research. In addition, almost all data scientists, while learning to program, have tried their hand at at least one dataset obtained from Twitter (such as the famous Twitter sentiment analysis).
Twitter, even more than LinkedIn, was the first venue in which to announce a new article or a new artificial intelligence model. Users on Twitter have always been very active in raising concerns about the presence of bias in models, corner cases, and so on.
Researchers have commented with concern on Twitter’s recent acquisition, both because Twitter may no longer be a base for collecting reliable data and because relevant information may be lost in a tide of misinformation. What do you think? Let me know in the comments.
If you have found it interesting:
You can look for my other articles, subscribe to get notified when I publish articles, and connect with or reach me on LinkedIn. Thanks for your support!
Here is the link to my GitHub repository, where I am planning to collect code and many resources related to machine learning, artificial intelligence, and more.