Libraries! Most Frequent Terms in #WLIC2016 Tweets (part IV)

IFLA World Library and Information Congress, 82nd IFLA General Conference and Assembly, 13–19 August 2016, Columbus, Ohio, USA. Copyright by IFLA, CC BY 4.0.

This is part IV. For necessary context, methodology and limitations, please see part 1, part 2 and part 3.

Since this post was first published and shared I may have made further edits. I often come back to posts after publication to revise them.

—

Throughout the process of performing the day by day text analysis I became aware of other limitations to take into account and I have revised part 3 accordingly.

Summary

Here’s a summary of the counts of the source (unrefined) #WLIC2016 archive I collected:

Number of Links	12435
Number of RTs estimate based on occurrence of RT	14570
Number of Tweets	23552
Unique Tweets <-used to monitor quality of archive	23421
First Tweet in Archive	14/08/2016 11:29:03 EDT
Last Tweet in Archive	22/08/2016 04:20:53 EDT
In Reply Ids	270
In Reply @s	429
Number of Tweeters	3035

As previously indicated the Tweet count includes RTs. This count might require further deduplication and it might include bots’ Tweets and possibly some unrelated Tweets.
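
The kind of counts listed above can be approximated with a few lines of Python. This is a minimal sketch and not the procedure used to produce the figures in this post: the dictionary keys (`id`, `user`, `text`), the sample Tweets and the function name are all hypothetical, and the RT estimate simply looks for the token "RT" in the text, as the summary row describes.

```python
import re

def summarise_archive(tweets):
    """Summarise a list of Tweets given as dicts with the
    hypothetical keys 'id', 'user' and 'text'."""
    texts = [t["text"] for t in tweets]
    return {
        # Every row in the archive, duplicates included
        "tweets": len(tweets),
        # Distinct Tweet ids, useful for spotting duplicated rows
        "unique_tweets": len({t["id"] for t in tweets}),
        # Rough RT estimate: Tweets containing the standalone token "RT"
        "rt_estimate": sum(1 for x in texts if re.search(r"\bRT\b", x)),
        # Number of http(s) links found across all Tweets
        "links": sum(len(re.findall(r"https?://\S+", x)) for x in texts),
        # Distinct usernames, compared case-insensitively
        "tweeters": len({t["user"].lower() for t in tweets}),
    }

# Invented sample data, including one duplicated row (id "2")
sample = [
    {"id": "1", "user": "alice", "text": "Great session on privacy #WLIC2016 https://example.org/a"},
    {"id": "2", "user": "bob", "text": "RT @alice: Great session on privacy #WLIC2016 https://example.org/a"},
    {"id": "2", "user": "bob", "text": "RT @alice: Great session on privacy #WLIC2016 https://example.org/a"},
    {"id": "3", "user": "Alice", "text": "Libraries need open data #WLIC2016"},
]
summary = summarise_archive(sample)
print(summary)
```

A gap between the Tweet count and the unique Tweet count, as in the summary above, suggests duplicated rows that may need removing before analysis.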

Here’s a summary of the Tweet count of the #WLIC2016 dataset I refined from the complete archive. As I explained in part 3 I organised the Tweets into conference days, from Sunday 14 to Thursday 18 August. Each day was a different corpus to analyse. I also analysed the whole set as a single corpus to ensure the totals replicated.

Day	Tweet count
Sunday 14 August 2016	2543
Monday 15 August 2016	6654
Tuesday 16 August 2016	4861
Wednesday 17 August 2016	4468
Thursday 18 August 2016	3801
Sunday – Thursday	22327
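
As a hedged illustration of how Tweets might be sorted into conference days, the sketch below groups timestamps by calendar date and checks that the five daily counts in the table add up to the total in its last row. The timestamp format is assumed from the first and last Tweet entries in the archive summary (day/month/year, with an optional "EDT" label); the function name and sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

def tweets_per_day(timestamps, fmt="%d/%m/%Y %H:%M:%S"):
    """Count Tweets per calendar day from timestamp strings.
    Assumes all timestamps are already in the same time zone."""
    counts = Counter()
    for ts in timestamps:
        # Drop the time zone label, if present, before parsing
        cleaned = ts.replace(" EDT", "").strip()
        counts[datetime.strptime(cleaned, fmt).date()] += 1
    return counts

# Invented sample timestamps
sample = ["14/08/2016 11:29:03 EDT", "14/08/2016 23:59:59 EDT", "15/08/2016 00:00:01 EDT"]
counts = tweets_per_day(sample)

# Daily counts as published in the table above
published = {"Sun 14": 2543, "Mon 15": 6654, "Tue 16": 4861, "Wed 17": 4468, "Thu 18": 3801}
total = sum(published.values())
print(total)  # 22327
```

Analysing each day separately and then the whole set as one corpus gives a simple consistency check: the per-day totals should reproduce the overall total.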

The Most Frequent Terms

The text analysis involved analysing each corpus, first obtaining a ‘raw’ output of the 300 most frequent terms and their counts. As described in previous posts, I then applied an edited English stop words list, followed by manual editing of the top 100 most frequent terms (for the shared dataset) and of the top 50 for this post. Unlike before, in this case I removed ‘barack’ and ‘obama’ from Thursday and Monday’s corpora, and tried to remove usernames and hashtags, though it’s possible that further disambiguation and refining might be needed in those top 100 and top 50.
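
The general steps described above (count terms, apply a stop word list, exclude selected terms, keep the top N) can be sketched in Python. This is an illustration under stated assumptions rather than the tool or settings used for this post: the tokenisation rule, the tiny stop word list, the sample Tweets and the function name are all hypothetical.

```python
import re
from collections import Counter

def most_frequent_terms(texts, stop_words=frozenset(), exclude=frozenset(), top_n=50):
    """Return the top_n (term, count) pairs across all texts."""
    counts = Counter()
    for text in texts:
        text = text.lower()
        # Remove links, then usernames and hashtags
        text = re.sub(r"https?://\S+", " ", text)
        text = re.sub(r"[@#]\w+", " ", text)
        # Simplistic tokenisation: runs of ASCII letters only
        tokens = re.findall(r"[a-z]+", text)
        counts.update(t for t in tokens if t not in stop_words and t not in exclude)
    return counts.most_common(top_n)

# Invented sample corpus and a deliberately tiny stop word list
texts = [
    "RT @ifla: Libraries need open data #WLIC2016 https://example.org/x",
    "The libraries and the library are open",
    "Obama on libraries",
]
stop_words = {"rt", "the", "and", "are", "on"}
exclude = {"obama"}  # manual removals, analogous to the editing step

top = most_frequent_terms(texts, stop_words, exclude, top_n=2)
print(top)  # [('libraries', 3), ('open', 2)]
```

Note that ‘libraries’ and ‘library’ are counted as separate terms here, as they are in the tables in this post; no stemming or lemmatisation is applied.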

The text analysis of the Sun-Thu Tweets as a single corpus gave us the following Top 50:

#WLIC2016 Sun-Thu Top 50 Most Frequent Terms (stop-words applied; edited)

Rank	Term	Count
1	libraries	2895
2	library	2779
3	librarians	1713
4	session	1467
5	access	872
6	world	832
7	public	774
8	copyright	766
9	people	757
10	need	750
11	data	746
12	make	733
13	privacy	674
14	digital	629
15	new	615
16	wikipedia	602
17	indigenous	593
18	use	574
19	information	555
20	great	539
21	knowledge	512
22	literacy	502
23	internet	481
24	work	428
25	thanks	419
26	message	416
27	future	412
28	change	379
29	social	378
30	open	369
31	just	354
32	research	353
33	know	330
34	community	323
35	important	319
36	oclc	317
37	collections	312
38	books	300
39	learn	300
40	opening	291
41	read	289
42	impact	287
43	place	282
44	good	280
45	services	277
46	national	276
47	best	272
48	latest	269
49	report	267
50	users	266

As mentioned above I also analysed each day as a single corpus. I refined the ‘raw’ 300 most frequent terms per day to a top 100 after stop words and manual editing. I then laid them all out as a single table for comparison.

#WLIC2016 Top 50 Most Frequent Terms per Day Comparison (stop-words applied; edited)

Rank	Sun 14 Aug	Mon 15 Aug	Tue 16 Aug	Wed 17 Aug	Thu 18 Aug
1	libraries	library	library	libraries	libraries
2	library	libraries	privacy	library	library
3	librarians	librarians	libraries	librarians	librarians
4	session	session	librarians	indigenous	public
5	access	copyright	session	session	session
6	world	wikipedia	people	knowledge	need
7	public	digital	data	access	data
8	copyright	make	indigenous	data	impact
9	people	world	make	literacy	new
10	need	internet	access	need	digital
11	data	access	wikipedia	great	world
12	make	new	use	people	thanks
13	privacy	need	information	research	access
14	digital	use	world	public	value
15	new	public	public	new	national
16	wikipedia	future	knowledge	marketing	change
17	indigenous	people	copyright	general	privacy
18	use	message	homeless	open	great
19	information	collections	literacy	world	work
20	great	information	oclc	archives	research
21	knowledge	content	great	just	use
22	literacy	open	homelessness	national	people
23	internet	report	need	assembly	knowledge
24	work	space	freedom	place	social
25	thanks	trend	like	make	using
26	message	great	thanks	read	know
27	future	net	internet	community	make
28	change	work	info	social	services
29	social	neutrality	latest	reading	skills
30	open	making	experiencing	work	award
31	just	update	theft	information	information
32	research	books	important	use	learning
33	know	collection	just	learn	users
34	community	social	subject	share	book
35	important	design	change	matters	user
36	oclc	data	guidelines	key	best
37	collections	thanks	digital	know	collections
38	books	librarian	students	global	academic
39	learn	know	know	government	measure
40	opening	shaping	online	life	poland
41	read	google	protect	thanks	community
42	impact	change	working	important	learn
43	place	literacy	statement	development	outcomes
44	good	just	work	love	share
45	services	technology	future	impact	time
46	national	online	read	archivist	media
47	best	poster	award	good	section
48	latest	info	create	books	important
49	report	working	services	cultural	service
50	users	law	good	help	closing
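
One simple way to read a comparison table like this is to ask which terms appear on every day and which appear on one day only. The sketch below does this for the top five rows of the table above; the function name is hypothetical, and the same approach would apply to the full top 50 or top 100 lists.

```python
from collections import Counter

def compare_days(days):
    """Given {day: [terms]}, return the terms shared by all days
    and, for each day, the terms that appear on that day only."""
    # Count in how many days each term appears
    day_counts = Counter(term for terms in days.values() for term in set(terms))
    shared = sorted(t for t, n in day_counts.items() if n == len(days))
    unique = {day: [t for t in terms if day_counts[t] == 1] for day, terms in days.items()}
    return shared, unique

# Top five terms per day, copied from the table above
days = {
    "Sun 14 Aug": ["libraries", "library", "librarians", "session", "access"],
    "Mon 15 Aug": ["library", "libraries", "librarians", "session", "copyright"],
    "Tue 16 Aug": ["library", "privacy", "libraries", "librarians", "session"],
    "Wed 17 Aug": ["libraries", "library", "librarians", "indigenous", "session"],
    "Thu 18 Aug": ["libraries", "library", "librarians", "public", "session"],
}
shared, unique = compare_days(days)
print(shared)  # ['librarians', 'libraries', 'library', 'session']
print(unique)
```

Within the top five, four terms are shared by all days and each day has exactly one term of its own, which hints at the day's distinctive topic.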

I have shared on figshare a dataset containing the summaries above, together with the raw top 300 most frequent terms for the whole set and for each day. The dataset also includes the top 100 most frequent terms lists per day that I manually edited after having applied the edited English stop word filter.

You can download the spreadsheet from figshare:

Priego, Ernesto (2016): #WLIC2016 Most Frequent Terms Roundup. figshare.
https://dx.doi.org/10.6084/m9.figshare.3749367.v2

Please bear in mind that as refining was done manually and the Terms tool does not always seem to apply stop words evenly there might be errors. This is why the raw output was shared as well. This data should be taken to be indicative only.

As is increasingly recommended for data sharing, the CC-0 license has been applied to the resulting output in the repository. It is important however to bear in mind that some terms appearing in the dataset might be individually licensed under different terms; copyright of the source Tweets (and sometimes of individual terms) belongs to their authors. Authorial/curatorial/collection work has been performed on the shared file as a curated dataset resulting from analysis, in order to make it available as part of the scholarly record. If this dataset is consulted attribution is always welcome.

Ideally, for proper reproducibility and to encourage other studies, the whole archive dataset should be available. Those wishing to obtain the complete Tweets should still be able to get them themselves via text and data mining methods.

Conclusions

Indeed, for us today there is absolutely nothing surprising about the term ‘libraries’ being the most frequent word in Tweets coming from IFLA’s World Library and Information Congress. Looking at the whole dataset, however, provides an insight into other frequent terms used by Library and Information professionals in the context of libraries. These terms might not remain frequent for long, and might not have been frequent words in the past (I can only hypothesise; having evidence would be nice).

A key hypothesis for me guiding this exercise has been that perhaps by looking at the words appearing in social media outputs discussing and reporting from a professional association’s major congress, we can get a vague idea of where a sector’s concerns are/were.

I guess it can be safely said that words become meaningful in context. In an age in which repetition and frequency are key to public constructions of cultural relevance (‘trending topics’ increasingly define the news agenda… and what people talk about and how they talk about things) the repetition and frequency of key terms might provide a type of meaningful evidence in itself. Evidence, however, is just the beginning: further interpretation and analysis must indeed follow.

One cannot obtain the whole picture from decomposing a collectively, socially, publicly created textual corpus (or perhaps any corpus, unless it is a list of words from the start) into its constituent parts. It could also be said that many tools and methods often tell us more about themselves (and those using them) than about the objects of study.

So far text analysis (Rockwell 2003) and ‘distant reading’ through automated methods have focused on working with books (Ramsay 2014). However, I’d like to suggest that this kind of text analysis can be another way of reading social media texts and offer another way to contribute to the assessment of their cultural relevance as living documents of a particular setting and moment in time. Who knows, they might also be telling us something about the present perception and activity of a professional field, and might help us to compare it with those in the future.

Other Considerations

Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailon et al. 2012).

Apart from the filters and limitations already declared, it cannot be guaranteed that each and every Tweet tagged with #WLIC2016 during the indicated period was analysed. The dataset was shared for archival, comparative and indicative educational research purposes only.

Only content from public accounts, obtained from the Twitter Search API, was analysed. The source data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps without the need of a Twitter account.

These posts and the resulting dataset contain the results of analyses of Tweets that were published openly on the Web with the queried hashtag; the content of the Tweets is the responsibility of the original authors. Original Tweets are likely to be the copyright of their individual authors but please check individually.

This work is shared to archive, document and encourage open educational research into scholarly activity on Twitter. The resulting dataset contains neither complete Tweets nor Twitter metadata. No private personal information was shared. The collection, analysis and sharing of the data have been enabled and allowed by Twitter’s Privacy Policy. The sharing of the results complies with Twitter’s Developer Rules of the Road.

A hashtag is metadata users choose freely to use so their content is associated, directly linked to and categorised with the chosen hashtag. The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case). Tweets published publicly by scholars or other professionals during academic conferences are often publicly tagged (labeled) with a hashtag dedicated to the conference in question. This practice used to be confined to a few ‘niche’ fields; it is increasingly becoming the norm rather than the exception.

Though every reason for Tweeters’ use of hashtags cannot be generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour.

In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter’s Privacy and data sharing policies.

Professional associations like the Modern Language Association and the American Psychological Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter’s search API has well-known temporal limitations for retrospective historical search and collection.

Beyond individual Tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. Though this work has limitations and might not be thoroughly systematic, it is hoped it can contribute to developing new insights into a discipline’s public concerns as expressed on Twitter over time.

References

González-Bailon, Sandra, Wang, Ning, Rivero, Alejandro, Borge-Holthoefer, Javier and Moreno, Yamir (2012) Assessing the Bias in Samples of Large Online Networks (December 4, 2012). Available at SSRN: http://dx.doi.org/10.2139/ssrn.2185134

Priego, Ernesto (2016) #WLIC2016 Most Frequent Terms Roundup. figshare.
https://dx.doi.org/10.6084/m9.figshare.3749367.v2

Ramsay, Stephen (2014) “The Hermeneutics of Screwing Around; or What You Do with a Million Books.” In Pastplay: Teaching and Learning History with Technology, edited by Kevin Kee, 111-20. Ann Arbor: University of Michigan Press. Also available at http://quod.lib.umich.edu/d/dh/12544152.0001.001/1:5/--pastplay-teaching-and-learning-history-with-technology?g=dculture;rgn=div1;view=fulltext;xc=1

Rockwell, Geoffrey (2003) “What is Text Analysis, Really?” Preprint. Literary and Linguistic Computing, vol. 18, no. 2, pp. 209-219.

Published by Ernesto Priego