Libraries! Most Frequent Terms in #WLIC2016 Tweets (part IV)

IFLA World Library and Information Congress
82nd IFLA General Conference and Assembly
13–19 August 2016, Columbus, Ohio, USA. Copyright by IFLA, CC BY 4.0.

 


 

This is part IV. For necessary context, methodology and limitations, please see here (part 1), here (part 2), and here (part 3).

I may have made new edits since this post was first published and shared; I often come back to posts after publication to revise them.

Throughout the process of performing the day-by-day text analysis I became aware of other limitations to take into account, and I have revised part 3 accordingly.

Summary

Here’s a summary of the counts of the source (unrefined) #WLIC2016 archive I collected:

| Metric | Value |
|---|---|
| Number of Links | 12435 |
| Number of RTs (estimate based on occurrence of RT) | 14570 |
| Number of Tweets | 23552 |
| Unique Tweets (used to monitor quality of archive) | 23421 |
| First Tweet in Archive | 14/08/2016 11:29:03 EDT |
| Last Tweet in Archive | 22/08/2016 04:20:53 EDT |
| In Reply Ids | 270 |
| In Reply @s | 429 |
| Number of Tweeters | 3035 |

As previously indicated the Tweet count includes RTs. This count might require further deduplication and it might include bots’ Tweets and possibly some unrelated Tweets.
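To illustrate how counts like these can be derived, here is a minimal Python sketch. It is not the workflow used for this post: the function name `summarise_archive` and the sample Tweets are invented for illustration, and it assumes uniqueness is judged by Tweet text (an archive might instead deduplicate by Tweet ID).

```python
import re

def summarise_archive(tweets):
    """Return basic counts for a list of Tweet texts (hypothetical helper)."""
    rt_pattern = re.compile(r"\bRT\b")          # rough RT estimate: the token "RT"
    link_pattern = re.compile(r"https?://\S+")  # anything that looks like a URL
    return {
        "tweets": len(tweets),
        "unique_tweets": len(set(tweets)),      # assumption: unique by text
        "rt_estimate": sum(1 for t in tweets if rt_pattern.search(t)),
        "links": sum(len(link_pattern.findall(t)) for t in tweets),
    }

# Invented sample data, not taken from the archive.
sample = [
    "RT @someone: Libraries matter #WLIC2016 https://example.org/a",
    "Great session on privacy #WLIC2016",
    "Great session on privacy #WLIC2016",
]
summary = summarise_archive(sample)
```

A gap between the Tweet count and the unique count, as in the summary above, is the kind of signal that suggests further deduplication might be needed.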

Here’s a summary of the Tweet count of the #WLIC2016 dataset I refined from the complete archive. As I explained in part 3, I organised the Tweets into conference days, from Sunday 14 to Thursday 18 August. Each day was a different corpus to analyse. I also analysed the whole set as a single corpus to ensure the totals replicated.

| Day | Tweet count |
|---|---|
| Sunday 14 August 2016 | 2543 |
| Monday 15 August 2016 | 6654 |
| Tuesday 16 August 2016 | 4861 |
| Wednesday 17 August 2016 | 4468 |
| Thursday 18 August 2016 | 3801 |
| Sunday – Thursday (total) | 22327 |
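The check that the daily corpora add up to the whole set can be sketched in a few lines of Python. The daily figures are the ones reported above; the helper `count_by_day` and its sample timestamps are hypothetical, and assume timestamps in the day/month/year format shown in the archive summary.

```python
from collections import Counter
from datetime import datetime

# Per-day Tweet counts as reported in this post.
daily_counts = {
    "Sunday 14 August 2016": 2543,
    "Monday 15 August 2016": 6654,
    "Tuesday 16 August 2016": 4861,
    "Wednesday 17 August 2016": 4468,
    "Thursday 18 August 2016": 3801,
}
total = sum(daily_counts.values())  # should replicate the single-corpus total

def count_by_day(timestamps, fmt="%d/%m/%Y %H:%M:%S"):
    """Group timestamp strings by calendar date (hypothetical helper)."""
    return Counter(datetime.strptime(ts, fmt).date().isoformat() for ts in timestamps)

# Invented sample timestamps, for illustration only.
sample_counts = count_by_day([
    "14/08/2016 11:29:03",
    "14/08/2016 18:02:10",
    "15/08/2016 09:15:00",
])
```

Here `total` comes to 22327, which matches the Sunday to Thursday figure in the table.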

 

The Most Frequent Terms

The text analysis involved analysing each corpus, first obtaining a ‘raw’ output of the 300 most frequent terms and their counts. As described in previous posts, I then applied an edited English stop words list, followed by manual editing of the top 100 most frequent terms (for the shared dataset) and of the top 50 for this post. Unlike before, in this case I removed ‘barack’ and ‘obama’ from Thursday’s and Monday’s corpora, and tried to remove usernames and hashtags, though it’s possible that further disambiguation and refining might be needed in those top 100 and top 50.
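For readers who want to try something similar, the general idea (count terms, apply a stop words list, then remove further terms by hand) can be sketched in Python as follows. This is an assumption-laden illustration rather than the tool or the stop words list used for this post: `top_terms`, the tiny `STOP_WORDS` set and the sample texts are all invented.

```python
import re
from collections import Counter

# A deliberately tiny stop words list; a real one would be much longer.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "on", "at", "rt"}

def top_terms(texts, n=50, stop_words=STOP_WORDS, extra_removals=()):
    """Return the n most frequent terms after filtering (hypothetical helper)."""
    removed = set(stop_words) | {w.lower() for w in extra_removals}
    counts = Counter()
    for text in texts:
        # Drop URLs, @usernames and #hashtags before tokenising.
        cleaned = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", cleaned)
        counts.update(t for t in tokens if t not in removed)
    return counts.most_common(n)

# Invented sample texts, for illustration only.
texts = [
    "Libraries need open access #WLIC2016",
    "RT @user: Open access for libraries https://example.org",
    "Obama message to libraries",
]
result = top_terms(texts, n=3, extra_removals=["obama"])
```

The `extra_removals` argument stands in for the manual editing step, such as removing ‘barack’ and ‘obama’ from a given day's corpus.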

The text analysis of the Sun-Thu Tweets as a single corpus gave us the following Top 50:

#WLIC2016 Sun-Thu Top 50 Most Frequent Terms (stop-words applied; edited)

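| Rank | Term | Count |
|---|---|---|
| 1 | libraries | 2895 |
| 2 | library | 2779 |
| 3 | librarians | 1713 |
| 4 | session | 1467 |
| 5 | access | 872 |
| 6 | world | 832 |
| 7 | public | 774 |
| 8 | copyright | 766 |
| 9 | people | 757 |
| 10 | need | 750 |
| 11 | data | 746 |
| 12 | make | 733 |
| 13 | privacy | 674 |
| 14 | digital | 629 |
| 15 | new | 615 |
| 16 | wikipedia | 602 |
| 17 | indigenous | 593 |
| 18 | use | 574 |
| 19 | information | 555 |
| 20 | great | 539 |
| 21 | knowledge | 512 |
| 22 | literacy | 502 |
| 23 | internet | 481 |
| 24 | work | 428 |
| 25 | thanks | 419 |
| 26 | message | 416 |
| 27 | future | 412 |
| 28 | change | 379 |
| 29 | social | 378 |
| 30 | open | 369 |
| 31 | just | 354 |
| 32 | research | 353 |
| 33 | know | 330 |
| 34 | community | 323 |
| 35 | important | 319 |
| 36 | oclc | 317 |
| 37 | collections | 312 |
| 38 | books | 300 |
| 39 | learn | 300 |
| 40 | opening | 291 |
| 41 | read | 289 |
| 42 | impact | 287 |
| 43 | place | 282 |
| 44 | good | 280 |
| 45 | services | 277 |
| 46 | national | 276 |
| 47 | best | 272 |
| 48 | latest | 269 |
| 49 | report | 267 |
| 50 | users | 266 |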

As mentioned above I also analysed each day as a single corpus. I refined the ‘raw’ 300 most frequent terms per day to a top 100 after stop words and manual editing. I then laid them all out as a single table for comparison.

#WLIC2016 Top 50 Most Frequent Terms per Day Comparison (stop-words applied; edited)

| Rank | Sun 14 Aug | Mon 15 Aug | Tue 16 Aug | Wed 17 Aug | Thu 18 Aug |
|---|---|---|---|---|---|
| 1 | libraries | library | library | libraries | libraries |
| 2 | library | libraries | privacy | library | library |
| 3 | librarians | librarians | libraries | librarians | librarians |
| 4 | session | session | librarians | indigenous | public |
| 5 | access | copyright | session | session | session |
| 6 | world | wikipedia | people | knowledge | need |
| 7 | public | digital | data | access | data |
| 8 | copyright | make | indigenous | data | impact |
| 9 | people | world | make | literacy | new |
| 10 | need | internet | access | need | digital |
| 11 | data | access | wikipedia | great | world |
| 12 | make | new | use | people | thanks |
| 13 | privacy | need | information | research | access |
| 14 | digital | use | world | public | value |
| 15 | new | public | public | new | national |
| 16 | wikipedia | future | knowledge | marketing | change |
| 17 | indigenous | people | copyright | general | privacy |
| 18 | use | message | homeless | open | great |
| 19 | information | collections | literacy | world | work |
| 20 | great | information | oclc | archives | research |
| 21 | knowledge | content | great | just | use |
| 22 | literacy | open | homelessness | national | people |
| 23 | internet | report | need | assembly | knowledge |
| 24 | work | space | freedom | place | social |
| 25 | thanks | trend | like | make | using |
| 26 | message | great | thanks | read | know |
| 27 | future | net | internet | community | make |
| 28 | change | work | info | social | services |
| 29 | social | neutrality | latest | reading | skills |
| 30 | open | making | experiencing | work | award |
| 31 | just | update | theft | information | information |
| 32 | research | books | important | use | learning |
| 33 | know | collection | just | learn | users |
| 34 | community | social | subject | share | book |
| 35 | important | design | change | matters | user |
| 36 | oclc | data | guidelines | key | best |
| 37 | collections | thanks | digital | know | collections |
| 38 | books | librarian | students | global | academic |
| 39 | learn | know | know | government | measure |
| 40 | opening | shaping | online | life | poland |
| 41 | read | google | protect | thanks | community |
| 42 | impact | change | working | important | learn |
| 43 | place | literacy | statement | development | outcomes |
| 44 | good | just | work | love | share |
| 45 | services | technology | future | impact | time |
| 46 | national | online | read | archivist | media |
| 47 | best | poster | award | good | section |
| 48 | latest | info | create | books | important |
| 49 | report | working | services | cultural | service |
| 50 | users | law | good | help | closing |
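One simple way to read a comparison like this is to ask which terms are shared by every day and which are distinctive to a single day. The Python sketch below does that for the top five terms per day only, copied from the table above; the variable names are illustrative, and the same approach could be applied to the full top 50 or top 100 lists.

```python
# Top five terms per day, taken from the comparison table in this post.
top5 = {
    "Sun 14 Aug": ["libraries", "library", "librarians", "session", "access"],
    "Mon 15 Aug": ["library", "libraries", "librarians", "session", "copyright"],
    "Tue 16 Aug": ["library", "privacy", "libraries", "librarians", "session"],
    "Wed 17 Aug": ["libraries", "library", "librarians", "indigenous", "session"],
    "Thu 18 Aug": ["libraries", "library", "librarians", "public", "session"],
}

# Terms that appear in every day's top five.
common = set.intersection(*(set(terms) for terms in top5.values()))

# Terms that appear in one day's top five and in no other day's.
distinctive = {}
for day, terms in top5.items():
    others = set()
    for other_day, other_terms in top5.items():
        if other_day != day:
            others.update(other_terms)
    distinctive[day] = sorted(set(terms) - others)
```

With the top five only, the shared terms are the unsurprising ones (‘libraries’, ‘library’, ‘librarians’, ‘session’), while each day contributes one distinctive term.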

I have shared on figshare a dataset containing the summaries above as well as the raw top 300 most frequent terms, both for the whole set and divided per day. The dataset also includes the top 100 most frequent terms lists per day that I manually edited after applying the edited English stop word filter.

You can download the spreadsheet from figshare:

Priego, Ernesto (2016): #WLIC2016 Most Frequent Terms Roundup. figshare.
https://dx.doi.org/10.6084/m9.figshare.3749367.v2

Please bear in mind that as refining was done manually and the Terms tool does not always seem to apply stop words evenly there might be errors. This is why the raw output was shared as well. This data should be taken to be indicative only.

As is increasingly recommended for data sharing, the CC-0 license has been applied to the resulting output in the repository. It is important, however, to bear in mind that some terms appearing in the dataset might be individually licensed in different ways; copyright of the source Tweets (and sometimes of individual terms) belongs to their authors. Authorial/curatorial/collection work has been performed on the shared file as a curated dataset resulting from analysis, in order to make it available as part of the scholarly record. If this dataset is consulted, attribution is always welcome.

Ideally, for proper reproducibility and to encourage other studies, the whole archive dataset should be available. Those wishing to obtain the complete Tweets should still be able to collect them themselves via text and data mining methods.

Conclusions

Indeed, for us today there is absolutely nothing surprising about the term ‘libraries’ being the most frequent word in Tweets coming from IFLA’s World Library and Information Congress. Looking at the whole dataset, however, provides an insight into other frequent terms used by Library and Information professionals in the context of libraries. These terms might not remain frequent for long, and might not have been frequent words in the past (I can only hypothesise; having evidence would be nice).

A key hypothesis for me guiding this exercise has been that perhaps by looking at the words appearing in social media outputs discussing and reporting from a professional association’s major congress, we can get a vague idea of where a sector’s concerns are/were.

I guess it can be safely said that words become meaningful in context. In an age in which repetition and frequency are key to public constructions of cultural relevance (‘trending topics’ increasingly define the news agenda, as well as what people talk about and how they talk about things) the repetition and frequency of key terms might provide a type of meaningful evidence in itself. Evidence, however, is just the beginning: further interpretation and analysis must indeed follow.

One cannot obtain the whole picture from decomposing a collectively, socially, publicly created textual corpus (or perhaps any corpus, unless it is a list of words from the start) into its constituent parts. It could also be said that many tools and methods often tell us more about themselves (and those using them) than about the objects of study.

So far, text analysis (Rockwell 2003) and ‘distant reading’ through automated methods have focused on working with books (Ramsay 2014). However, I’d like to suggest that this kind of text analysis can be another way of reading social media texts and offer another way to contribute to the assessment of their cultural relevance as living documents of a particular setting and moment in time. Who knows, they might also be telling us something about the present perception and activity of a professional field, and might help us to compare it with those in the future.

Other Considerations

Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailon et al. 2012).

Apart from the filters and limitations already declared, it cannot be guaranteed that each and every Tweet tagged with #WLIC2016 during the indicated period was analysed. The dataset was shared for archival, comparative and indicative educational research purposes only.

Only content from public accounts, obtained from the Twitter Search API, was analysed. The source data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web clients and mobile apps without the need of a Twitter account.

These posts and the resulting dataset contain the results of analyses of Tweets that were published openly on the Web with the queried hashtag; the content of the Tweets is the responsibility of the original authors. Original Tweets are likely to be the copyright of their individual authors, but please check individually.

This work is shared to archive, document and encourage open educational research into scholarly activity on Twitter. The resulting dataset does not contain complete Tweets nor Twitter metadata. No private personal information was shared. The collection, analysis and sharing of the data has been enabled and allowed by Twitter’s Privacy Policy. The sharing of the results complies with Twitter’s Developer Rules of the Road.

A hashtag is metadata users choose freely to use so their content is associated, directly linked to and categorised with the chosen hashtag. The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case). Tweets published publicly by scholars or other professionals during academic conferences are often publicly tagged (labeled) with a hashtag dedicated to the conference in question. This practice used to be confined to a few ‘niche’ fields; it is increasingly becoming the norm rather than the exception.

Though every reason for Tweeters’ use of hashtags cannot be generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour.

In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter’s Privacy and data sharing policies.

Professional associations like the Modern Language Association and the American Psychological Association recognise Tweets as citable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter’s search API has well-known temporal limitations for retrospective historical search and collection.

Beyond individual Tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. Though this work has limitations and might not be thoroughly systematic, it is hoped it can contribute to developing new insights into a discipline’s public concerns as expressed on Twitter over time.

References

González-Bailon, Sandra; Wang, Ning; Rivero, Alejandro; Borge-Holthoefer, Javier; and Moreno, Yamir (2012) Assessing the Bias in Samples of Large Online Networks (December 4, 2012). Available at SSRN: http://dx.doi.org/10.2139/ssrn.2185134

Priego, Ernesto (2016) #WLIC2016 Most Frequent Terms Roundup. figshare.
https://dx.doi.org/10.6084/m9.figshare.3749367.v2

Ramsay, Stephen (2014) “The Hermeneutics of Screwing Around; or What You Do with a Million Books.” In Pastplay: Teaching and Learning History with Technology, edited by Kevin Kee, 111-20. Ann Arbor: University of Michigan Press, 2014. Also available at http://quod.lib.umich.edu/d/dh/12544152.0001.001/1:5/–pastplay-teaching-and-learning-history-with-technology?g=dculture;rgn=div1;view=fulltext;xc=1

Rockwell, Geoffrey (2003) “What is Text Analysis, Really?” Preprint. Literary and Linguistic Computing, vol. 18, no. 2, pp. 209-219.
