#DH2018 and #DH2019 Twitter Archive Counts: A Comparison


My interest in documenting scholarly activity on Twitter through conference hashtags is not new; in the case of the digital humanities I have been looking into it since 2010. Searching this blog or googling related keywords may bring up some results for those interested in the background. I have been archiving conference hashtags for a while now, often depositing the archives as part of the scholarly record, and blogging and giving workshops about my objectives and methodologies.

I like sharing results in real time while conferences are taking place, or shortly after. Any results shared are therefore always-already provisional, perfectible and unfinished. I have always believed that a signal is better than no signal, or than having to wait three years for one. So I insist on sharing whatever quick insights I can get, rather than not sharing them at all or waiting until I miraculously find the time to do it differently (which is unlikely, so I'd rather take any opportunity I have to share something). Hopefully someone finds it helpful in some way.

As I have noted before, I have been critical of the metrication of scholarly activity, so the fact that I share quantitative data from the collected archives does not mean I think this metrication is always-already something to aspire to, or that it means anything in particular. I see it as an ethnographic means to document the existence of scholarly activity on Twitter around academic conferences in specific fields; perhaps as an entry point to assess academic and public engagement on Twitter with academic hashtags and the events they represent; and/or as a way to detect any increase, decrease or transformation in this type of activity on Twitter. For example, it is possible to gain insights into Twitter users' settings preferences, such as the language users have set up, as I looked into in this post on user_lang in #DH2018 tweets.
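As a rough illustration of the user-settings insight mentioned above, here is a minimal Python sketch that tallies user_lang values from a CSV export of a hashtag archive. It is not the script behind the user_lang post; it assumes a header row with columns named from_user and user_lang (modelled on a TAGS archive sheet), and the function names are hypothetical. Because user_lang is an account setting, the sketch counts each account once by default, so prolific accounts do not dominate the tally.

```python
import csv
import io
from collections import Counter


def read_archive(path):
    """Read a CSV export of a hashtag archive into a list of row dicts.

    Hypothetical: assumes a header row with 'from_user' and 'user_lang' columns.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def count_user_langs(rows, per_account=True):
    """Tally user_lang values; by default each account (from_user) counts once."""
    if per_account:
        first_row_per_account = {}
        for row in rows:
            # Keep the first row seen for each account name.
            first_row_per_account.setdefault(row.get("from_user", ""), row)
        rows = first_row_per_account.values()
    # Blank or missing values are grouped under 'unknown'.
    return Counter((row.get("user_lang") or "").strip() or "unknown" for row in rows)


if __name__ == "__main__":
    # Invented sample rows standing in for a real export (not #DH data).
    sample = io.StringIO("id_str,from_user,user_lang\n1,a,en\n2,b,es\n3,a,en\n4,d,\n")
    for lang, n in count_user_langs(csv.DictReader(sample)).most_common():
        print(lang, n)
```

Passing per_account=False counts every tweet instead, which answers a different question (how much of the traffic came from accounts with each language setting).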

The Methods

The metrics compared here result from two collection methods, used as a way of checking the validity of the collected data. I used a Python script to collect both archives, setting the same parameters as those of the archives I collected using TAGS (see Priego 2018). Even if the collected data still needs refining, when the counts from both methods are the same or very similar I get a degree of certainty that the data collected via TAGS from the Twitter Search API is about as reliable as it could be.
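A cross-check of this kind can go one step further than comparing totals. The following minimal sketch assumes each collection method can be reduced to a list of tweet IDs (for example the id_str column of a TAGS sheet and the IDs saved by a separate script); the function name and the use of Jaccard similarity as an "agreement" score are illustrative choices, not the procedure actually used for these archives.

```python
def compare_collections(ids_a, ids_b):
    """Compare two collections of tweet IDs gathered by independent methods.

    Returns the size of each set, their overlap, the IDs found by only one
    method, and the Jaccard similarity len(A & B) / len(A | B).
    """
    a, b = set(ids_a), set(ids_b)
    shared = a & b
    union = a | b
    return {
        "count_a": len(a),
        "count_b": len(b),
        "shared": len(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Two empty collections are treated as being in full agreement.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Invented IDs for illustration only.
    tags_ids = ["101", "102", "103", "104"]
    script_ids = ["102", "103", "104", "105"]
    print(compare_collections(tags_ids, script_ids))
```

Comparing IDs rather than raw counts shows not only whether the totals match but also which tweets each method missed, which is useful when the counts are "very similar" rather than identical.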

For 2018 and 2019 I managed to get the settings and timings right to achieve what looks like a complete set of #DH2018 and #DH2019 tweets. Below I share a comparative table of the main metrics. As indicated in the table, there are important differences, mainly in a) the number of days before and after the conference included in each archive, and b) the number of days each conference was held, according to their respective web pages and programmes. (I seem to remember the Mexico City conference had activities at least one day before the date indicated on the main website, but I may be misremembering; this needs checking.)

The Basic Counts

Needless to say, the most interesting or useful insights from these archives would come from qualitative data, not necessarily from quantitative data such as that presented here. The RT and @ reply counts can give an indication of the level of interaction between accounts, and the number of accounts tweeting with each hashtag each year could be seen as an indication of interest in the conference or hashtag. This indication may be misleading because of spam or confusion caused by hashtag overlap, and of course one would need to know which accounts are and are not included in each archive.
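The basic counts compared in this post can be approximated from a list of tweet records. The sketch below is not the script that produced the published figures. It assumes hypothetical field names modelled loosely on a TAGS export (from_user, text, in_reply_to_status_id_str, in_reply_to_screen_name); it treats any tweet whose text starts with "RT @" as a retweet, which is only one possible heuristic; and it approximates links by counting tweets containing at least one http(s) URL.

```python
import re

URL_RE = re.compile(r"https?://\S+")


def basic_counts(tweets):
    """Compute rough archive metrics from a list of tweet dicts.

    Assumed (hypothetical) keys: 'from_user', 'text',
    'in_reply_to_status_id_str', 'in_reply_to_screen_name'.
    """
    return {
        "tweets": len(tweets),
        # Heuristic: retweet text usually begins with "RT @".
        "rts": sum(1 for t in tweets if t.get("text", "").startswith("RT @")),
        # Empty strings and missing keys both count as "not a reply".
        "in_reply_ids": sum(1 for t in tweets if t.get("in_reply_to_status_id_str")),
        "in_reply_ats": sum(1 for t in tweets if t.get("in_reply_to_screen_name")),
        "tweets_with_links": sum(1 for t in tweets if URL_RE.search(t.get("text", ""))),
        # Lower-cased, since Twitter handles are not case-sensitive.
        "unique_accounts": len({t["from_user"].lower() for t in tweets if t.get("from_user")}),
    }


if __name__ == "__main__":
    # Invented records for illustration only.
    sample = [
        {"from_user": "alice", "text": "Great keynote #DH2019 https://example.org/x"},
        {"from_user": "Bob", "text": "RT @alice: Great keynote #DH2019"},
        {"from_user": "bob", "text": "@alice agreed!",
         "in_reply_to_status_id_str": "1", "in_reply_to_screen_name": "alice"},
    ]
    print(basic_counts(sample))
```

As the paragraph above warns, a figure like unique_accounts says nothing about who those accounts are; spam or unrelated accounts would be counted just the same.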

There is a series of analyses that can be run on the full collected data, and now that I have a more solid longitudinal dataset of yearly archives I hope to be able to run them more robustly soon. In the meantime, for what they are worth, here are the main archive stats for last year and this year, compared.


Metric | #DH2018 | #DH2019 | Notes
------ | ------- | ------- | -----
First conference day according to programme | 26/06/2018 | 08/07/2019 |
Last conference day according to programme | 29/06/2018 | 12/07/2019 |
First tweet collected in archive | 24/06/2018 06:19 | 29/06/2019 02:13 | Local conference time zone
Last tweet collected in archive | 30/06/2018 06:17 | 14/07/2019 22:56 | Local conference time zone
Days collected | 6 days | 16 days |
Number of collected tweets (includes RTs) | 13858 | 14101 | Data might require refining and deduplication
In reply IDs | 564 | 1091 |
In reply @s | 747 | 812 |
Number of links | 4312 | 9061 |
Number of RTs | 8656 | 8650 | Estimate based on occurrences of RTs
Number of unique accounts | 2329 | 2157 |
Conference location | Mexico City, Mexico | Utrecht, the Netherlands |
Priego, E. (2019): #DH2018 and #DH2019 Twitter Archive Counts. Summary Comparative Data Table. figshare. Dataset. https://doi.org/10.6084/m9.figshare.8918810
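To make the difference in collection windows explicit, here is a small sketch that works only from the dates and times in the table (local conference time), taking the last #DH2018 programme day as 29 June 2018. The function name and output keys are hypothetical; "days collected" is computed as elapsed time between the first and last collected tweet, rounded to whole days, which happens to reproduce the table's figures of 6 and 16 days.

```python
from datetime import date, datetime

# Dates and times copied from the comparison table (local conference time).
ARCHIVES = {
    "#DH2018": {
        "conf_start": date(2018, 6, 26),
        "conf_end": date(2018, 6, 29),
        "first_tweet": datetime(2018, 6, 24, 6, 19),
        "last_tweet": datetime(2018, 6, 30, 6, 17),
    },
    "#DH2019": {
        "conf_start": date(2019, 7, 8),
        "conf_end": date(2019, 7, 12),
        "first_tweet": datetime(2019, 6, 29, 2, 13),
        "last_tweet": datetime(2019, 7, 14, 22, 56),
    },
}


def collection_window(a):
    """Summarise how an archive's collection period relates to its conference."""
    span = a["last_tweet"] - a["first_tweet"]
    return {
        # Programme days, counting both the first and last day.
        "conference_days": (a["conf_end"] - a["conf_start"]).days + 1,
        # Calendar days between the first collected tweet and the first programme day.
        "days_before": (a["conf_start"] - a["first_tweet"].date()).days,
        # Calendar days between the last programme day and the last collected tweet.
        "days_after": (a["last_tweet"].date() - a["conf_end"]).days,
        # Elapsed collection time rounded to whole days.
        "days_collected": round(span.total_seconds() / 86400),
    }


for tag, archive in ARCHIVES.items():
    print(tag, collection_window(archive))
```

Run on the table's values, this shows #DH2018 as a 4-day programme collected from 2 days before to 1 day after, and #DH2019 as a 5-day programme collected from 9 days before to 2 days after: most of the ten extra collection days in 2019 fall before the conference.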



Even though I collected #DH2019 over a longer period (ten days more than the #DH2018 archive), fewer unique user accounts used #DH2019 than #DH2018. Since the #DH2019 archive covered more collection days, and therefore offered more opportunity for interaction, it also showed more replies, mentions and links than the #DH2018 one. The numbers of tweets and RTs in both archives (again, bearing in mind the difference in collection days) remained very close. It could be argued that the Twitter activity indicates neither an increase nor a reduction in engagement with the conference hashtag (as manifested through tweets or RTs), while showing that fewer accounts participated this year.

The next steps are to refine and deduplicate the data if required, to limit both archives to equivalent data collection timings, to revise these initial insights, and then to perform qualitative text and account analysis in order to determine, amongst other things, whether any differences in the unique accounts using the hashtag were relevant to the field or were simply bots or other unrelated accounts such as spam. That qualitative refining could give us greater certainty about any changes in the demographic engaging with the conference hashtags over the years. This needs to be done carefully and following ethical standards.
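One way to approach the deduplication and timing steps just described is sketched below: drop repeated tweet IDs, then keep only tweets inside a window defined relative to each conference's programme days, so both archives cover comparable periods. This is a hypothetical sketch, not a procedure from this post. The record layout (an id_str string plus a created_at naive datetime already in local conference time) and the default window of one day either side of the programme are assumptions chosen for illustration.

```python
from datetime import date, datetime, timedelta


def dedupe(tweets):
    """Keep the first occurrence of each tweet ID (assumed key: 'id_str')."""
    seen = set()
    unique = []
    for t in tweets:
        if t["id_str"] not in seen:
            seen.add(t["id_str"])
            unique.append(t)
    return unique


def restrict_window(tweets, conf_start, conf_end, days_before=1, days_after=1):
    """Keep tweets from midnight `days_before` days before conf_start up to
    (but excluding) midnight at the end of the day `days_after` days after conf_end.

    Assumes 'created_at' is a naive datetime in local conference time.
    """
    start = datetime.combine(conf_start - timedelta(days=days_before), datetime.min.time())
    end = datetime.combine(conf_end + timedelta(days=days_after + 1), datetime.min.time())
    return [t for t in tweets if start <= t["created_at"] < end]


if __name__ == "__main__":
    # Invented records for illustration only.
    sample = [
        {"id_str": "1", "created_at": datetime(2019, 7, 6, 23, 59)},  # too early
        {"id_str": "2", "created_at": datetime(2019, 7, 7, 0, 0)},    # first minute of window
        {"id_str": "2", "created_at": datetime(2019, 7, 7, 0, 0)},    # duplicate ID
        {"id_str": "3", "created_at": datetime(2019, 7, 13, 23, 59)}, # last minute of window
        {"id_str": "4", "created_at": datetime(2019, 7, 14, 0, 0)},   # too late
    ]
    kept = restrict_window(dedupe(sample), date(2019, 7, 8), date(2019, 7, 12))
    print([t["id_str"] for t in kept])
```

Applying the same days_before and days_after to both archives would make the counts more directly comparable, although the conferences themselves still differ in length (four programme days in 2018, five in 2019).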

A Polite Request

If you are interested in this topic and you read this, please do not disregard this output simply because it has not been published in a peer-reviewed journal. If you get any inspiration, value or motivation from this post, my tweets about it or any of my other blog posts about Twitter archiving, please cite these outputs: not only is it good academic practice, it is also a way for us to learn about other responses to the same issues and to continue building knowledge together.


Priego, E. (2018): Archiving Small Twitter Datasets for Text Analysis: A Workshop Tutorial for Beginners. figshare. https://doi.org/10.6084/m9.figshare.6686798
Priego, E. (2019): #DH2018 and #DH2019 Twitter Archive Counts. Summary Comparative Data Table. figshare. Dataset. https://doi.org/10.6084/m9.figshare.8918810