
82nd IFLA General Conference and Assembly
13–19 August 2016, Columbus, Ohio, USA. Copyright by IFLA, CC BY 4.0.
This is part IV. For necessary context, methodology, limitations, please see here (part 1), here (part 2), and here (part 3).
I may have edited this post since it was first published and shared; I often come back to posts after publication to revise them.
—
While performing the day-by-day text analysis I became aware of other limitations to take into account, and I have revised part 3 accordingly.
Summary
Here’s a summary of the counts of the source (unrefined) #WLIC2016 archive I collected:
Number of Links | 12435 |
Number of RTs (estimate based on occurrence of RT) | 14570 |
Number of Tweets | 23552 |
Unique Tweets (used to monitor quality of archive) | 23421 |
First Tweet in Archive | 14/08/2016 11:29:03 EDT |
Last Tweet in Archive | 22/08/2016 04:20:53 EDT |
In Reply Ids | 270 |
In Reply @s | 429 |
Number of Tweeters | 3035 |
As previously indicated the Tweet count includes RTs. This count might require further deduplication and it might include bots’ Tweets and possibly some unrelated Tweets.
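As a rough illustration of how counts like those above can be derived, here is a minimal Python sketch. It is not the tool used to build the archive; the `archive` rows, the usernames and the `summarise` function are hypothetical stand-ins, and the RT estimate simply looks for the token "RT" in the text, as the table describes.

```python
import re

# Hypothetical miniature archive: (tweet_id, username, text) tuples.
# The real #WLIC2016 archive is not reproduced here.
archive = [
    (1, "alice", "Great session on privacy #WLIC2016 http://example.org/a"),
    (2, "bob", "RT @alice: Great session on privacy #WLIC2016 http://example.org/a"),
    (3, "carol", "@bob thanks for sharing! #WLIC2016"),
    (3, "carol", "@bob thanks for sharing! #WLIC2016"),  # duplicated row
]


def summarise(rows):
    """Return simple counts similar in spirit to the summary table."""
    texts = [text for _, _, text in rows]
    return {
        # Every row collected, duplicates included.
        "tweets": len(rows),
        # Distinct Tweet ids; a gap between this and "tweets" signals duplicates.
        "unique_tweets": len({tweet_id for tweet_id, _, _ in rows}),
        # Estimate only: counts texts containing the standalone token "RT".
        "rt_estimate": sum(1 for t in texts if re.search(r"\bRT\b", t)),
        # Total number of http(s) links found across all texts.
        "links": sum(len(re.findall(r"https?://\S+", t)) for t in texts),
        # Distinct usernames.
        "tweeters": len({user for _, user, _ in rows}),
    }


summary = summarise(archive)
print(summary)
```

Comparing the Tweet count with the unique Tweet count is the quality check mentioned in the table: in the source archive the two figures differ by 131 (23552 versus 23421), which suggests some duplicated rows.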
Here’s a summary of the Tweet count of the #WLIC2016 dataset I refined from the complete archive. As I explained in part 3 I organised the Tweets into conference days, from Sunday 14 to Thursday 18 August. Each day was a different corpus to analyse. I also analysed the whole set as a single corpus to ensure the totals replicated.
Day | Tweet count |
Sunday 14 August 2016 | 2543 |
Monday 15 August 2016 | 6654 |
Tuesday 16 August 2016 | 4861 |
Wednesday 17 August 2016 | 4468 |
Thursday 18 August 2016 | 3801 |
Sunday – Thursday (total) | 22327 |
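The check that the daily corpora add up to the whole set can be sketched in a few lines of Python. The daily figures are the ones in the table above; the `count_by_day` helper and its timestamp format (day/month/year, as in the archive summary but without the time zone suffix) are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime

# Daily Tweet counts as reported in the table above.
daily_counts = {
    "Sunday 14 August 2016": 2543,
    "Monday 15 August 2016": 6654,
    "Tuesday 16 August 2016": 4861,
    "Wednesday 17 August 2016": 4468,
    "Thursday 18 August 2016": 3801,
}

# The five daily corpora should add up to the single Sun-Thu corpus.
total = sum(daily_counts.values())
print(total)  # 22327


def count_by_day(timestamps):
    """Group timestamps like '14/08/2016 11:29:03' by calendar day.

    Hypothetical helper: assumes all timestamps are already in one time zone.
    """
    days = Counter()
    for ts in timestamps:
        day = datetime.strptime(ts, "%d/%m/%Y %H:%M:%S").date()
        days[day.isoformat()] += 1
    return dict(days)


sample = ["14/08/2016 11:29:03", "14/08/2016 23:59:59", "15/08/2016 00:00:01"]
by_day = count_by_day(sample)
print(by_day)
```

Note that the day a Tweet falls on depends on the time zone used, so Tweets posted around midnight could be assigned to different days by different tools.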
The Most Frequent Terms
The text analysis involved analysing each corpus, first obtaining a ‘raw’ output of the 300 most frequent terms and their counts. As described in previous posts, I then applied an edited English stop-word list, followed by manual editing of the top 100 most frequent terms (for the shared dataset) and of the top 50 for this post. Unlike before, in this case I removed ‘barack’ and ‘obama’ from Monday’s and Thursday’s corpora, and tried to remove usernames and hashtags, though it’s possible that further disambiguation and refining might be needed in those top 100 and top 50 lists.
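For readers who want to try something similar, the general procedure (count terms, apply a stop-word list, drop usernames, hashtags and manually chosen terms) can be approximated in plain Python. This is only a sketch under stated assumptions, not the tool used for this post: the tiny `STOP_WORDS` set, the sample Tweets and the `top_terms` function are all hypothetical, and a real stop-word list would be much longer.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list is far longer).
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "for", "is", "on", "rt"}


def top_terms(texts, n=50, stop_words=STOP_WORDS, extra_removals=()):
    """Return the n most frequent terms after basic filtering."""
    counts = Counter()
    for text in texts:
        # Lowercase and strip links so URL fragments are not counted as terms.
        text = re.sub(r"https?://\S+", " ", text.lower())
        # Remove @usernames and #hashtags.
        text = re.sub(r"[@#]\w+", " ", text)
        # Keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", text)
        counts.update(
            t for t in tokens
            if t not in stop_words and t not in extra_removals
        )
    return counts.most_common(n)


# Hypothetical sample Tweets, not taken from the archive.
tweets = [
    "Libraries need open access #WLIC2016",
    "RT @someone: Libraries and privacy in the library http://example.org",
    "Obama message to libraries",
]

# Manual removals (such as 'barack' and 'obama') go in extra_removals.
result = top_terms(tweets, n=50, extra_removals={"obama"})
print(result[0])
```

Different tools tokenise and filter differently, so counts produced this way should not be expected to match the figures reported here exactly.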
The text analysis of the Sun-Thu Tweets as a single corpus gave us the following Top 50:
#WLIC2016 Sun-Thu Top 50 Most Frequent Terms (stop-words applied; edited)
Rank | Term | Count |
1 | libraries | 2895 |
2 | library | 2779 |
3 | librarians | 1713 |
4 | session | 1467 |
5 | access | 872 |
6 | world | 832 |
7 | public | 774 |
8 | copyright | 766 |
9 | people | 757 |
10 | need | 750 |
11 | data | 746 |
12 | make | 733 |
13 | privacy | 674 |
14 | digital | 629 |
15 | new | 615 |
16 | wikipedia | 602 |
17 | indigenous | 593 |
18 | use | 574 |
19 | information | 555 |
20 | great | 539 |
21 | knowledge | 512 |
22 | literacy | 502 |
23 | internet | 481 |
24 | work | 428 |
25 | thanks | 419 |
26 | message | 416 |
27 | future | 412 |
28 | change | 379 |
29 | social | 378 |
30 | open | 369 |
31 | just | 354 |
32 | research | 353 |
33 | know | 330 |
34 | community | 323 |
35 | important | 319 |
36 | oclc | 317 |
37 | collections | 312 |
38 | books | 300 |
39 | learn | 300 |
40 | opening | 291 |
41 | read | 289 |
42 | impact | 287 |
43 | place | 282 |
44 | good | 280 |
45 | services | 277 |
46 | national | 276 |
47 | best | 272 |
48 | latest | 269 |
49 | report | 267 |
50 | users | 266 |
As mentioned above I also analysed each day as a single corpus. I refined the ‘raw’ 300 most frequent terms per day to a top 100 after stop words and manual editing. I then laid them all out as a single table for comparison.
#WLIC2016 Top 50 Most Frequent Terms per Day Comparison (stop-words applied; edited)
Rank | Sun 14 Aug | Mon 15 Aug | Tue 16 Aug | Wed 17 Aug | Thu 18 Aug |
1 | libraries | library | library | libraries | libraries |
2 | library | libraries | privacy | library | library |
3 | librarians | librarians | libraries | librarians | librarians |
4 | session | session | librarians | indigenous | public |
5 | access | copyright | session | session | session |
6 | world | wikipedia | people | knowledge | need |
7 | public | digital | data | access | data |
8 | copyright | make | indigenous | data | impact |
9 | people | world | make | literacy | new |
10 | need | internet | access | need | digital |
11 | data | access | wikipedia | great | world |
12 | make | new | use | people | thanks |
13 | privacy | need | information | research | access |
14 | digital | use | world | public | value |
15 | new | public | public | new | national |
16 | wikipedia | future | knowledge | marketing | change |
17 | indigenous | people | copyright | general | privacy |
18 | use | message | homeless | open | great |
19 | information | collections | literacy | world | work |
20 | great | information | oclc | archives | research |
21 | knowledge | content | great | just | use |
22 | literacy | open | homelessness | national | people |
23 | internet | report | need | assembly | knowledge |
24 | work | space | freedom | place | social |
25 | thanks | trend | like | make | using |
26 | message | great | thanks | read | know |
27 | future | net | internet | community | make |
28 | change | work | info | social | services |
29 | social | neutrality | latest | reading | skills |
30 | open | making | experiencing | work | award |
31 | just | update | theft | information | information |
32 | research | books | important | use | learning |
33 | know | collection | just | learn | users |
34 | community | social | subject | share | book |
35 | important | design | change | matters | user |
36 | oclc | data | guidelines | key | best |
37 | collections | thanks | digital | know | collections |
38 | books | librarian | students | global | academic |
39 | learn | know | know | government | measure |
40 | opening | shaping | online | life | poland |
41 | read | protect | thanks | community | |
42 | impact | change | working | important | learn |
43 | place | literacy | statement | development | outcomes |
44 | good | just | work | love | share |
45 | services | technology | future | impact | time |
46 | national | online | read | archivist | media |
47 | best | poster | award | good | section |
48 | latest | info | create | books | important |
49 | report | working | services | cultural | service |
50 | users | law | good | help | closing |
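One simple way to read a comparison table like this is to ask which terms appear in one day's list but not in the others. The Python sketch below does that for the first five ranks of each column only; the `distinctive_terms` function is a hypothetical helper, and the result applies to this five-term excerpt, not to the full top 50, where several of these terms also appear on other days at lower ranks.

```python
# First five ranks per day, copied from the comparison table above.
top_by_day = {
    "Sun 14 Aug": ["libraries", "library", "librarians", "session", "access"],
    "Mon 15 Aug": ["library", "libraries", "librarians", "session", "copyright"],
    "Tue 16 Aug": ["library", "privacy", "libraries", "librarians", "session"],
    "Wed 17 Aug": ["libraries", "library", "librarians", "indigenous", "session"],
    "Thu 18 Aug": ["libraries", "library", "librarians", "public", "session"],
}


def distinctive_terms(lists_by_day):
    """For each day, list the terms that appear in no other day's list."""
    result = {}
    for day, terms in lists_by_day.items():
        # Collect every term used on any other day.
        others = set()
        for other_day, other_terms in lists_by_day.items():
            if other_day != day:
                others.update(other_terms)
        # Keep the original rank order of the remaining terms.
        result[day] = [t for t in terms if t not in others]
    return result


distinctive = distinctive_terms(top_by_day)
for day, terms in distinctive.items():
    print(day, terms)
```

The same function can be run on the full per-day lists from the shared spreadsheet, bearing in mind that the lists were manually edited and are indicative only.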
I have shared on figshare a dataset containing the summaries above, as well as the raw top 300 most frequent terms for the whole set and for each day. The dataset also includes the top 100 most frequent terms lists per day, which I manually edited after applying the edited English stop-word filter.
You can download the spreadsheet from figshare:
Priego, Ernesto (2016): #WLIC2016 Most Frequent Terms Roundup. figshare.
https://dx.doi.org/10.6084/m9.figshare.3749367.v2
Please bear in mind that as refining was done manually and the Terms tool does not always seem to apply stop words evenly there might be errors. This is why the raw output was shared as well. This data should be taken to be indicative only.
As is increasingly recommended for data sharing, the CC-0 license has been applied to the resulting output in the repository. It is important, however, to bear in mind that some terms appearing in the dataset might be individually licensed under different terms; copyright of the source Tweets (and sometimes of individual terms) belongs to their authors. Authorial/curatorial/collection work has been performed on the shared file as a curated dataset resulting from analysis, in order to make it available as part of the scholarly record. If this dataset is consulted, attribution is always welcome.
Ideally, for proper reproducibility and to encourage other studies, the whole archive dataset should be available. Those wishing to obtain the complete Tweets should still be able to collect them themselves via text and data mining methods.
Conclusions
Indeed, for us today there is absolutely nothing surprising about the term ‘libraries’ being the most frequent word in Tweets coming from IFLA’s World Library and Information Congress. Looking at the whole dataset, however, provides an insight into other frequent terms used by Library and Information professionals in the context of libraries. These terms might not remain frequent for long, and might not have been frequent words in the past (I can only hypothesise; having evidence would be nice).
A key hypothesis for me guiding this exercise has been that perhaps by looking at the words appearing in social media outputs discussing and reporting from a professional association’s major congress, we can get a vague idea of where a sector’s concerns are/were.
I guess it can be safely said that words become meaningful in context. In an age in which repetition and frequency are key to public constructions of cultural relevance (‘trending topics’ increasingly define the news agenda… and what people talk about and how they talk about things) the repetition and frequency of key terms might provide a type of meaningful evidence in itself. Evidence, however, is just the beginning: further interpretation and analysis must indeed follow.
One cannot obtain the whole picture from decomposing a collectively, socially, publicly created textual corpus (or perhaps any corpus, unless it is a list of words from the start) into its constituent parts. It could also be said that many tools and methods often tell us more about themselves (and those using them) than about the objects of study.
So far text analysis (Rockwell 2003) and ‘distant reading’ through automated methods have focused on working with books (Ramsay 2014). However, I’d like to suggest that this kind of text analysis can be another way of reading social media texts, and another way to contribute to the assessment of their cultural relevance as living documents of a particular setting and moment in time. Who knows, they might also be telling us something about the present perception and activity of a professional field, and might help us to compare it with those in the future.
Other Considerations
Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón et al. 2012).
Apart from the filters and limitations already declared, it cannot be guaranteed that each and every Tweet tagged with #WLIC2016 during the indicated period was analysed. The dataset was shared for archival, comparative and indicative educational research purposes only.
Only content from public accounts, obtained from the Twitter Search API, was analysed. The source data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps without the need of a Twitter account.
These posts and the resulting dataset contain the results of analyses of Tweets that were published openly on the Web with the queried hashtag; the content of the Tweets is the responsibility of the original authors. Original Tweets are likely to be copyright of their individual authors, but please check individually.
This work is shared to archive, document and encourage open educational research into scholarly activity on Twitter. The resulting dataset does not contain complete Tweets nor Twitter metadata. No private personal information was shared. The collection, analysis and sharing of the data has been enabled and allowed by Twitter’s Privacy Policy. The sharing of the results complies with Twitter’s Developer Rules of the Road.
A hashtag is metadata users choose freely to use so their content is associated, directly linked to and categorised with the chosen hashtag. The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case). Tweets published publicly by scholars or other professionals during academic conferences are often publicly tagged (labeled) with a hashtag dedicated to the conference in question. This practice used to be confined to a few ‘niche’ fields; it is increasingly becoming the norm rather than the exception.
Though every reason for Tweeters’ use of hashtags cannot be generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour.
In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter’s Privacy and data sharing policies.
Professional associations like the Modern Language Association and the American Psychological Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter’s search API has well-known temporal limitations for retrospective historical search and collection.
Beyond individual Tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. Though this work has limitations and might not be thoroughly systematic, it is hoped it can contribute to developing new insights into a discipline’s public concerns as expressed on Twitter over time.
References
González-Bailón, Sandra, Wang, Ning, Rivero, Alejandro, Borge-Holthoefer, Javier and Moreno, Yamir (2012) “Assessing the Bias in Samples of Large Online Networks” (December 4, 2012). Available at SSRN: http://dx.doi.org/10.2139/ssrn.2185134
Priego, Ernesto (2016) #WLIC2016 Most Frequent Terms Roundup. figshare.
https://dx.doi.org/10.6084/m9.figshare.3749367.v2
Ramsay, Stephen (2014) “The Hermeneutics of Screwing Around; or What You Do with a Million Books.” In Pastplay: Teaching and Learning History with Technology, edited by Kevin Kee, 111-20. Ann Arbor: University of Michigan Press, 2014. Also available at http://quod.lib.umich.edu/d/dh/12544152.0001.001/1:5/–pastplay-teaching-and-learning-history-with-technology?g=dculture;rgn=div1;view=fulltext;xc=1
Rockwell, Geoffrey (2003) “What is Text Analysis, Really? [PDF]” preprint, Literary and Linguistic Computing, vol. 18, no. 2, 2003, p. 209-219.