
A paper claiming smartphones created a mental health crisis in young girls should be retracted
A psychologist warns about the dangers of young girls having smartphones and unsupervised time on social media. One of the key scientific papers she cites as evidence actually offers zero support for the warning.
Ambitious academic psychologists quickly discover a startling reality. They can make more money and enjoy more name recognition, even among other academics, by marketing advice merchandise in the form of books, workshops, and lucrative corporate speaking gigs than they can by putting an equivalent amount of time into developing the strength and quality of their research.
There can be a conflict between psychologists making the kinds of strong, definitive statements that appeal to nonprofessional audiences and being true to the often messy and contradictory findings of their research.
Many of us see this as a conflict of interest that should be prominently declared in our scientific papers. This is not a confession of any wrongdoing, but only a courtesy to alert readers who feel that the information needs to be taken into account. Even journals that have adopted such a policy of disclosure of interests do not consistently enforce the requirement.
Over the years, I have earned a reputation for debunking simplistic, extravagant, and outright false claims in the research of psychologists who aim to appeal to lay audiences. I have shown how easily this task can be accomplished, especially with studies that are lavishly praised in places like the Atlantic, the New York Times, or the Guardian. What I do is often simply a matter of checking the consistency of what is said in the text and tables of scientific papers, without having to perform re-analyses of data.
In this article, I am back to my old tricks. I take a critical look at a scientific paper in Clinical Psychological Science that frequently comes up in discussions of the dangers to young girls of having smartphones and unsupervised time on social media. A number of respected scientists have pointed to serious problems with this paper. I know of no one who asks for a retraction, but I call for one at the end of this article.
The first author of the paper is Professor Jean Twenge, who is in the Department of Psychology at San Diego State University. She is widely sought after for her views about the psychological vulnerability of what she calls an iGen generation of preteens who were born between 1995 and 2012.
According to Twenge’s account, young girls in this age group, but not boys, are at particular risk for negative mental health outcomes, including suicide, because of their excessive dependence on smartphones and their unsupervised exposure to social media. Twenge says her concerns are based on solid scientific evidence from both her own studies with colleagues and studies from other researchers.
Let’s hold Twenge to what she and her colleagues say in a key research paper, not what she says to lay audiences.
Why I think we should request a retraction of this paper: A drive-by, cursory review
Twenge JM, Joiner TE, Rogers ML, Martin GN. Increases in depressive symptoms, suicide-related outcomes, and suicide rates among US adolescents after 2010 and links to increased new media screen time. Clinical Psychological Science. 2018 Jan;6(1):3–17.
The abstract of the article announces that the study involves:
Two nationally representative surveys of U.S. adolescents in grades 8 through 12 (N = 506,820) and national statistics on suicide deaths for those ages 13 to 18, adolescents’ depressive symptoms, suicide-related outcomes, and suicide rates.
It is not clear from the abstract, but the two surveys will be entered into a single set of analyses, despite the surveys being quite different and the sample completing one or the other survey having no overlap. There is no study 1 and study 2 ahead in this paper. This is unusual. Many readers will be confused.
One data set comes from publicly available surveys of 8th, 10th, and 12th graders conducted every year since 1991 and has n = 388,275. The other data set, also publicly available, is a subsample of students who completed other surveys of 9th, 10th, 11th, and 12th graders every other year since 1991 with n = 118,545.
I will refer to the first survey as MtF [Monitoring the Future] and the second as YRBSS [Youth Risk Behavior Surveillance System].
A study with over half a million participants may seem impressive but has come about by combining participants who received either the MtF or the YRBSS survey, not a complete set of questions. All the data that is available is cross-sectional and correlational. There is no accessing what any participant in the study said at another time — or a chance to check if they later died by suicide. Some guesswork is involved in making any comparisons within this data. The authors will be on shaky ground.
Survey researchers know that they are in an undesirable position if they have only cross-sectional data. They are limited in what they can say about causality because they cannot rule out reverse causality. In this case, that would mean that being depressed and suicidal could just as well have caused participants to spend more time on smartphones and social media, rather than vice versa, as the authors want to demonstrate.
This study is different from a panel study, in which the same participants are followed so that the authors can compare what participants said at one time with what they said at another time.
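To see why cross-sectional data cannot settle the direction of causality, here is a minimal simulation sketch in Python. The function names, the effect size of 0.3, and the sample size are illustrative assumptions and have nothing to do with the actual survey data. It builds one world in which screen time drives low mood and another in which low mood drives screen time, then computes the same correlation in each.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(screen_time_is_cause, n=20000, effect=0.3, seed=0):
    """Return the screen-time/low-mood correlation in a made-up world.

    If screen_time_is_cause is True, screen time pushes low mood;
    otherwise low mood pushes screen time. All values are hypothetical.
    """
    rng = random.Random(seed)
    screen, mood = [], []
    for _ in range(n):
        if screen_time_is_cause:
            s = rng.gauss(0, 1)
            m = effect * s + rng.gauss(0, 1)
        else:
            m = rng.gauss(0, 1)
            s = effect * m + rng.gauss(0, 1)
        screen.append(s)
        mood.append(m)
    return pearson(screen, mood)


r_screen_causes_mood = simulate(screen_time_is_cause=True)
r_mood_causes_screen = simulate(screen_time_is_cause=False)
print(round(r_screen_causes_mood, 2), round(r_mood_causes_screen, 2))
```

Under these assumed settings both worlds produce essentially the same correlation, so the coefficient by itself cannot say which way the arrow points.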
Having the suicide data would seem to be an exceptional strength of the study, but it also comes with serious limitations. The official statistics collected by the government are anonymized and cannot be linked to the survey data. Thus, the authors cannot use the surveys to predict which young participants later died by suicide.
Assessments
Under the heading of Assessments, the authors list the variables they included in their analyses. In a few instances, the authors specify how each variable was measured on each of the two surveys. Mostly, the authors can only point to one survey on which the variables occurred. A relaxed reader might once again trust that surely the data were more comparable across studies, even if the authors did not say so explicitly.
The authors’ claim about depressive symptoms increasing over time makes having a measure of depressive symptoms from every participant especially crucial. Yet a measure is available only for participants who received the MtF. Worse, the authors are limited to six items, such as participants having lost a sense of meaning or purpose. We don’t know if they endorsed most of the other symptoms needed for a diagnosis of depression, or even scored above the cutpoint on a standard checklist so that we could say they screened positive for possible depression.
Comparing the results of this study with the huge amount of depression screening checklist data available in other studies is just not possible. The authors cannot readily assign a cutoff or a meaningful interpretation to their scores that could be related to screening checklists or inventories available elsewhere. Readers cannot be confident that some arbitrary distinction the authors make like “upper third of youth” is comparable to the upper third of youth in another study.
Facing this situation, many survey researchers would perform a small study with another sample in which they administered both the measure with which they were stuck in this study and a few conventional measures. They would use a hopefully high correlation among the measures to soften the blow of having only one measure in this study. There is no indication that this was done.
Another crucial variable, suicide-related outcomes, is assessed with only four items on the other survey, the YRBSS, so participants providing this information did not provide a measure of depressive symptoms.
Anyone experienced in basic survey methodology would expect serious problems in creating an overall picture of participants because a complete set of questions is not available for any individual participant. Some data nerds might be screaming at this point: “How can the authors get away with this?”
Bizarrely, one odd item that the authors consider a “suicide-related outcome” is actually a composite of what are usually two separate items on brief screening checklists for depression. The item asks about participants experiencing either sadness, hopelessness, or loss of interest in doing things, with a request for a simple “yes” or “no” response.
“During the past 12 months, did you ever feel so sad or hopeless almost every day for two weeks or more in a row that you stopped doing some usual activities?”
A second item requires a yes/no answer to whether a participant had seriously considered attempting suicide.
The third required a yes/no response to whether the participant had made a suicide attempt.
The fourth item inquired about the “number of times suicide was attempted,” but the scoring of the item reduced the four response options to a simple yes/no, so information about whether there was one attempt or numerous attempts is lost.
Experienced survey researchers are familiar with the problem and would be hesitant to put together a measure in this fashion.
Perhaps realizing they were in trouble with these survey data, the authors made an extraordinary decision that guaranteed a disaster. They rescored the summed total of the four items as 0/1 or yes/no. A survey researcher would recognize that by far the most common way of getting a positive 1 or yes score would be participants endorsing a two-week mood disturbance at least once in a year. That is not what most researchers or policymakers would consider a meaningful “suicide-related outcome.” It also does not allow attention to a much smaller, but more interesting number of participants who obtained that score having had multiple suicide attempts.
We might want to know if participants had thoughts of hurting themselves, or if they made a plan, or had made multiple attempts to die by suicide. These different scenarios would be treated as if they were identical.
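The loss of information can be made concrete with a short Python sketch. It follows the scoring rule as described above (four yes/no items summed, then collapsed to 0/1); the function and argument names are hypothetical and are not taken from the paper or the survey codebook.

```python
def any_suicide_related_outcome(sad_two_weeks, considered, attempted, attempts_yes_no):
    """Collapse four yes/no items (1 = yes, 0 = no) into a single 0/1 score."""
    total = sad_two_weeks + considered + attempted + attempts_yes_no
    return int(total > 0)


# A participant who only endorsed the two-week mood item
mood_only = any_suicide_related_outcome(1, 0, 0, 0)

# A participant who endorsed every item, including a suicide attempt
all_items = any_suicide_related_outcome(1, 1, 1, 1)

print(mood_only, all_items)  # both participants receive the same score
```

Once the sum is collapsed, the two participants are indistinguishable in any later analysis.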
Electronic Device Use
So far, this study is in serious trouble. The authors were stuck with the limited questions that were asked of only some participants. Obviously, the surveys were not originally constructed with their key research questions in mind. Their situation gets worse when we get to their distinctive concern with young girls being harmed by having their own smartphones and unsupervised time spent on social media.
The only available question about cell phones or smartphones was asked on the YRBSS and only in 2013 and 2015. Participants were asked a complex question but required to give a simple summary answer:
“On an average school day, how many hours do you play video or computer games or use a computer for something that is not school work? (Count time spent on things such as Xbox, PlayStation, an iPod, an iPad or other tablet, a smartphone, YouTube, Facebook or other social networking tools, and the Internet.)” Response choices were recoded as follows: “I do not play video or computer games or use a computer for something that is not school work” = 0; “less than 1 hour per day” = .5; “1 hour per day” = 1; “2 hours per day” = 2; “3 hours per day” = 3; “4 hours per day” = 4; and “5 or more hours per day” = 6.
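As a rough Python sketch of the recoding just quoted (the dictionary and function names are hypothetical), note what the single number can and cannot carry: every device and activity is folded into one value, and anything from five hours up is scored as 6.

```python
# Response labels and numeric codes as quoted from the paper
HOURS_RECODE = {
    "I do not play video or computer games or use a computer "
    "for something that is not school work": 0.0,
    "less than 1 hour per day": 0.5,
    "1 hour per day": 1.0,
    "2 hours per day": 2.0,
    "3 hours per day": 3.0,
    "4 hours per day": 4.0,
    "5 or more hours per day": 6.0,
}


def recode_hours(response):
    """Map a response label to its numeric code."""
    return HOURS_RECODE[response]


# Five hours of Facebook and nine hours of Xbox would both be coded 6.0
print(recode_hours("5 or more hours per day"))
```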
Note that it would be impossible to tease out whether or not these participants even possessed a phone or if any restrictions were placed on accessing it, like whether they were answering the question with respect to the school year or summer or when they had a job.
Participants who received this question on the YRBSS were also asked about “suicide-related outcomes,” which, as we just saw, is not a good measure of suicide-related outcomes.
Twenge makes lots of fuss in the media about young girls’ unsupervised exposure to social media sites, but the authors cannot say much because participants were asked about that only on the MtF. They were not the same participants who were asked about suicide-related outcomes, but they were asked about depressive symptoms.
The question about social media was only asked starting in 2009:
“How often do you do each of the following? Visit social networking websites (like Facebook).” Response choices were never = 1, a few times a year = 2, once or twice a month = 3, at least once a week = 4, and almost every day = 5.
If you have followed me this far, you are special. You are aware of things that most readers do not notice, probably even the reviewers who recommended the publication of the paper. Your level of skepticism has been raised and you are now prepared to discover things about other studies that most readers miss. Lots of surprises and disappointments await you.
Results
After these problems have been revealed, you can expect that anything that looks interesting in the results is probably not accurate and interpretable.
Take the opening two sentences:
Depressive symptoms, suicide-related outcomes, and suicide deaths among adolescents all rose during the 2010s. These increases follow a period when mental health issues were declining or stable (see Table 1). Between 2009/2010 and 2015, 33% more adolescents exhibited high levels of depressive symptoms (item mean of 3 or over; 16.13% in 2010, 21.48% in 2015), 12% more reported at least one suicide-related outcome (31.93% in 2009, 35.80% in 2015; 5% more since 2011, 34.21%), and 31% more died by suicide (5.38 per 100,000 population in 2010, 7.04 in 2015).
Any illusion of meaningfulness is dispelled when we recall the poor quality of cross-sectional survey data that is used to calculate these results. We cannot really tell the clinical or practical significance of the increase in depressive symptoms from 2010 to 2015 or whether it is related to fluctuations in the rates of suicide.
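It helps to separate the relative increases quoted above from the absolute changes behind them. The following Python sketch uses only the figures in the quoted passage; the helper names are my own.

```python
def relative_increase(before, after):
    """Percent change relative to the starting value."""
    return (after - before) / before * 100


def absolute_change(before, after):
    """Simple difference, in the same units as the inputs."""
    return after - before


# High depressive symptoms: 16.13% in 2010, 21.48% in 2015
print(round(relative_increase(16.13, 21.48)), round(absolute_change(16.13, 21.48), 2))

# At least one suicide-related outcome: 31.93% in 2009, 35.80% in 2015
print(round(relative_increase(31.93, 35.80)), round(absolute_change(31.93, 35.80), 2))

# Suicide deaths per 100,000: 5.38 in 2010, 7.04 in 2015
print(round(relative_increase(5.38, 7.04)), round(absolute_change(5.38, 7.04), 2))
```

The "33% more" corresponds to a rise of about 5.35 percentage points, and the "31% more" to 1.66 additional deaths per 100,000.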
If participants were given the measure of depressive symptoms, they were not asked about suicide-related outcomes. In turn, most participants who are considered as having at least one suicide-related outcome got classified that way because they only endorsed what is basically a depression screening question. No participant who was asked an imprecise question about devices was asked about the use of social media platforms. No one was asked about depressive symptoms, suicide-related outcomes, devices, and the vague question about how often they accessed social media platforms.
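The pattern of who was asked what can be laid out in a few lines of Python. The mapping below encodes only what is described in this article about where each variable comes from; the variable labels and the function name are hypothetical shorthand, and the real surveys contain many other items.

```python
from itertools import combinations

# Source of each key variable, as described in this article
ASKED_ON = {
    "depressive symptoms": "MtF",
    "social media use": "MtF",
    "suicide-related outcomes": "YRBSS",
    "electronic device use": "YRBSS",
    "suicide deaths": "national statistics",
}


def linkable_pairs(asked_on):
    """Pairs of variables that come from the same source, and so could
    be related to each other at the level of individual participants."""
    names = sorted(asked_on)
    return [(a, b) for a, b in combinations(names, 2) if asked_on[a] == asked_on[b]]


pairs = linkable_pairs(ASKED_ON)
for a, b in pairs:
    print(a, "<->", b)
print(len(pairs), "of", len(list(combinations(ASKED_ON, 2))), "pairs")
```

Of the ten possible pairings of these five variables, only two involve answers from the same individuals.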
What is reported about deaths by suicide cannot be connected to depressive symptoms or suicide-related outcomes. I do not assume most readers will be aware that the suicide data does not come from participants who completed the measure of depressive symptoms. These readers will be primed to be misled by seeing the two variables mentioned in the same sentence.
Problems are compounded when authors take two seriously limited measures and calculate a simple correlation. The score for each measure consists of what the authors intended to measure mixed in with junk or error. If authors calculate the correlation, the error in each measure gets multiplied:
(What-we-want-to-know1 — junk1) x (What-we-want-to-know2 — junk2).
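One standard way to formalize this point, which is not the notation used above, comes from classical test theory: each observed score is a true score plus error, and the expected observed correlation equals the true correlation multiplied by the square root of the product of the two measures' reliabilities. The Python sketch below simulates that under assumed values; the true correlation of .5 and the reliabilities of .5 are purely illustrative and are not estimates for the MtF or YRBSS items.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def noisy_correlation(true_r, reliability_x, reliability_y, n=20000, seed=1):
    """Correlation between two error-laden measures of correlated true scores."""
    rng = random.Random(seed)
    # Error SD chosen so that var(true) / var(observed) equals the reliability
    error_sd_x = ((1 - reliability_x) / reliability_x) ** 0.5
    error_sd_y = ((1 - reliability_y) / reliability_y) ** 0.5
    xs, ys = [], []
    for _ in range(n):
        true_x = rng.gauss(0, 1)
        true_y = true_r * true_x + (1 - true_r ** 2) ** 0.5 * rng.gauss(0, 1)
        xs.append(true_x + rng.gauss(0, error_sd_x))
        ys.append(true_y + rng.gauss(0, error_sd_y))
    return pearson(xs, ys)


clean = noisy_correlation(0.5, 1.0, 1.0)   # no measurement error
noisy = noisy_correlation(0.5, 0.5, 0.5)   # half of each score is error
expected_noisy = 0.5 * (0.5 * 0.5) ** 0.5  # attenuation formula gives 0.25
print(round(clean, 2), round(noisy, 2), expected_noisy)
```

With two poor measures, the observed coefficient says less and less about whatever relationship exists between the things the authors actually wanted to measure.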
Twenge and co-author Thomas Joiner had sharp exchanges with various critics in letters in Clinical Psychological Science, as well as in other scientific and social media outlets, including on Twitter. Twenge in particular gets defensive and even combative and does not yield much ground to critics. In almost all instances where statistical questions arise, I side with critics who point out how many basic conventions are violated in the original article.
I think we can put these issues aside and make some final, decisive points just by eyeballing Table 2.* You can see immediately that all numbers are quite small, even minuscule, regardless of any statistical adjustment. The associations between depressive symptoms and screen activities range from .00 to .05, less than the association between depressive symptoms and exercise. We can only examine the association between suicide-related outcomes and the use of social media. We can’t be sure what it means or in which direction causality is most likely.
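One common yardstick for how small these numbers are is the squared correlation, which gives the share of variance in one measure that is linearly accounted for by the other. A minimal Python sketch, using the upper end of the range cited above (the function name is my own):

```python
def variance_explained_percent(r):
    """Share of variance accounted for by a correlation r, in percent."""
    return r ** 2 * 100


# Largest association cited above between depressive symptoms and screen activities
print(round(variance_explained_percent(0.05), 2))  # 0.25
```

Even the largest of these correlations accounts for about a quarter of one percent of the variance in depressive symptoms.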
When critics comment that these are all small numbers, Twenge counters that even small numbers can be important when we are studying something so important as the mental health and life and death of young girls. I do not think that is the point. Rather, small numbers can come about because there is nothing there and the numbers reflect something other than what authors wanted to know.
It becomes clear that Twenge and her co-authors are very confused about the fundamental purpose of doing this kind of research when they interpret their findings in the results section and then discuss the overall pattern of findings in the discussion section. Twenge seems intent on rejecting the null hypothesis that none of the findings they expected are actually there (p<.05). Actually, Twenge’s goal should be findings so strong (numbers so large) that she should rush out and tell laypersons in press releases, books, workshops, and lucrative corporate speaking gigs.
Why should this paper be retracted?
Twenge JM, Joiner TE, Rogers ML, Martin GN. Increases in depressive symptoms, suicide-related outcomes, and suicide rates among US adolescents after 2010 and links to increased new media screen time. Clinical Psychological Science. 2018 Jan;6(1):3–17.
This paper should be retracted because it should not have been published in the first place and it is a serious source of misinformation about the dangers of smartphones and social media. I am open to such warnings being appropriate, but I find nothing in this paper to warrant that, except the authors’ preconceptions.
This paper should be retracted because the authors started with a set of data that is totally inadequate to answer their research questions and found nothing important that was interpretable.
A lack of results could simply be due to the authors asking the wrong questions or to their measures being so poorly constructed.
The authors already had headaches, which they tried to cure by smashing their heads with a hammer. They provided the most defensive display of confirmatory bias (We actually got what we expected, but you cannot see it.) that I have ever seen.
For those who are indignant that I would say such things about a published paper, I will give an analogy. Suppose I went to a new restaurant and ordered the house specialty of fresh tomato soup with oil of chives. My experience was spoiled by two cockroaches that were alive enough to struggle to swim out of the bowl. I think most people would be grateful for me to reveal this and not beat around the bush with “Unexpected bits of crunchiness were such a delightful surprise that I screamed for more.”
Scientific papers are in many ways not like soup, except that consumers might want to be prepared for what they are getting.

Sign up at [email protected] for news about new stuff — new articles elsewhere, new media, new event — that will be coming soon from James C. Coyne AKA Coyne of the Realm.
*TABLE 2






