Last month I took a quick look at a month’s worth of Trumpian tweetage (user ID 25073877) using text analysis. Using a similar methodology, I have now prepared and shared a CSV file containing Tweet IDs and other metadata for 3,805 Tweets from user ID 25073877, posted publicly between Thursday February 25 2016 16:35:12 +0000 and Monday April 03 2017 12:51:01 +0000. I deposited the file on figshare, including notes on motivation and methodology, here:
3805 Tweet IDs from User 25073877 [Thu Feb 25 16:35:12 +0000 2016 to Mon Apr 03 12:51:01 +0000 2017].
The dataset allows us to count the sources for each Tweet (i.e. the application used to publish each Tweet, according to the data provided by the Twitter Search API). The resulting counts are:
| Source | Tweets |
|---|---|
| Twitter for iPhone | 1816 |
| Twitter for Android | 1672 |
| Twitter Web Client | 287 |
| Twitter for iPad | 22 |
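For anyone wanting to reproduce a count like this from a CSV file, here is a minimal sketch in Python. It is not the method I used to produce the figures above. The column name `source` and the tiny inline sample are assumptions made for illustration only; the column names in the deposited file may well differ.

```python
import csv
import io
from collections import Counter


def count_sources(csv_file, column="source"):
    """Count how many rows name each publishing application.

    `column` is a hypothetical header name; adjust it to match the real file.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[column].strip() for row in reader)


# Made-up three-row sample standing in for the real CSV file.
sample = io.StringIO(
    "id_str,source\n"
    "1,Twitter for iPhone\n"
    "2,Twitter for Android\n"
    "3,Twitter for iPhone\n"
)

# Print sources from most to least frequent.
for source, count in count_sources(sample).most_common():
    print(f"{source}\t{count}")
```

With a real file, the same function could be called on an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.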
As we have seen in previous posts, the account has alternated between iPhone and Android since the Inauguration. I wanted to look at relative trends throughout the dataset. Having prepared the main dataset, I ran a text analysis on a document comprising the source listing arranged in chronological order by date and time of Tweet publication; the listing corresponds to Tweets published between 25 February 2016 and Monday 3 April 2017. Using the Trends tool in Voyant, I divided the document into 25 segments, intending to roughly represent each monthly period covered in the listing and to highlight the relative frequency trends of each source per segment.
The Trends tool shows a line graph depicting the distribution of a word’s occurrence across a corpus or document; in this case each word represents the source of a Tweet in the document. Each line in the graph is coloured according to the word it represents; at the top of the graph, a legend displays which words are associated with which colours. I only included the most-used sources, leaving iPad there as a reference.
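The segmentation idea can be approximated outside Voyant. The sketch below is only an illustration of the general approach, not Voyant's own implementation: it splits a chronologically ordered list of source labels into roughly equal segments and computes each source's share within each segment. Treating every entry as a single label, and the function name `segment_shares`, are my assumptions here.

```python
from collections import Counter


def segment_shares(sources, n_segments=25):
    """Split a chronological list of source labels into n roughly equal
    segments and return, for each segment, each source's relative frequency."""
    total = len(sources)
    shares = []
    for i in range(n_segments):
        # Integer arithmetic keeps segment sizes within one item of each other.
        start = i * total // n_segments
        end = (i + 1) * total // n_segments
        segment = sources[start:end]
        counts = Counter(segment)
        size = len(segment)
        # An empty segment (fewer items than segments) yields an empty dict.
        shares.append({s: c / size for s, c in counts.items()} if size else {})
    return shares


# Made-up listing of eight entries, split into two segments of four.
listing = ["Android", "Android", "Android", "iPhone",
           "iPhone", "iPhone", "iPhone", "Android"]
for number, share in enumerate(segment_shares(listing, n_segments=2), start=1):
    print(number, share)
```

Note that equal-sized segments only approximate calendar periods: a busy month contributes more Tweets, and so spreads over more segments, than a quiet one.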
The resulting graph looks like this:
I enjoyed this article by Christopher Ingraham (Washington Post Weblog, 3 April 2017), and I envy the access to the whole Trumpian tweetage dataset, which would be essential to any attempt to reproduce the analysis presented. The piece focuses on the use of exclamation marks (something I took an initial look at in my 6 February 2017 post), but it would be useful to take a closer look at any potentially significant correlations between the use of language in specific Tweets and the sources used to post those Tweets.
The article also has an embedded video titled ‘When it’s actually Trump tweeting, it’s way angrier’, repeating claims that there is a clear difference between the Tweets the account in question published from an iPhone and those published from an Android. I briefly referred to this issue in my 15 March 2017 post already, and I have not yet seen evidence that it is a staffer who actually posts from Twitter for iPhone from the account. I may be completely wrong, but I am still not convinced there is data-backed evidence to say for certain that Tweets from different sources are always tweeted by two or more different people, or that the differences in language per source are predictable and reliably attributable to a single specific person (the same people can, after all, tweet from the same account using different devices and applications, and indeed potentially use different language/discourse/tone). Anecdotal, I know, but I have noticed that sometimes my tweetage from the Android mobile app is different from my tweetage from TweetDeck on my Mac, though no regular patterns can be inferred there.
I do not necessarily doubt there is more than one person using the account, nor that the language used may vary significantly depending on the Tweets’ source. What I’d like to see, however, is more robust studies demonstrating and highlighting correlations between language use in Tweets’ texts and Tweets’ sources from the account in question, taking into consideration that the same users can own different devices and use different language strategies depending on a series of contextual variables. Access to the source data of said studies should be considered essential for any assessment of any results or conclusions provided. Limitations on and opposition to more open sharing of Twitter data for research reproducibility are just one hurdle on the way to more scholarship in this area.