What Library Folk Live Tweet About: Most Frequent Terms in #WLIC2016 Tweets

IFLA World Library and Information Congress 82nd IFLA General Conference and Assembly 13–19 August 2016, Columbus, Ohio, USA
IFLA World Library and Information Congress. Logo copyright by IFLA, CC BY 4.0.

Part 2 is here, part 3 here and the final, fourth part is here.

IFLA stands for The International Federation of Library Associations and Institutions.

The IFLA World Library and Information Congress 2016 and 82nd IFLA General Conference and Assembly, ‘Connections. Collaboration. Community’, is currently taking place (13–19 August 2016) at the Greater Columbus Convention Center (GCCC) in Columbus, Ohio, United States.

The official hashtag of the conference is #WLIC2016. Earlier, I shared a searchable, live archive of the hashtag here. (The page may be slow to load, depending on bandwidth.)

I have looked at the text from 4,945 Tweets published with #WLIC2016 from 14/08/2016 to 15/08/2016 11:16:06 (EDT, Columbus Ohio time). Only accounts with at least 1 follower were included. I collected them with Martin Hawksey’s TAGS.

According to Voyant Tools this corpus had 82,809 total words and 7,506 unique word forms.
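For readers who would rather reproduce these two figures in code than in Voyant Tools, here is a minimal sketch in Python. The tokenisation rule (lowercase, runs of word characters and apostrophes) and the `corpus_stats` name are my own assumptions; Voyant tokenises differently, so the numbers would not match exactly.

```python
import re


def corpus_stats(tweets):
    """Return (total_words, unique_word_forms) for a list of Tweet texts."""
    tokens = []
    for text in tweets:
        # Assumed tokenisation: lowercase, then keep runs of word
        # characters and apostrophes. Voyant Tools may split differently.
        tokens.extend(re.findall(r"[\w']+", text.lower()))
    return len(tokens), len(set(tokens))


# Made-up sample texts, not taken from the archive.
sample = [
    "Libraries and copyright at #WLIC2016",
    "Copyright exceptions for libraries",
]
total, unique = corpus_stats(sample)
print(total, unique)  # 9 7
```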

I applied an English stop word list which I edited to include Twitter-specific terms (https, t.co, amp (&) etc.), proper names (Barack Obama, other personal usernames) and some French stop words (mainly personal pronouns). I also edited the stop word list to include some dataset-specific terms such as the conference hashtag and other common hashtags, ‘ifla’, etc. (I left others that could also be considered dataset-specific terms, such as ‘session’ though).

The result was a listing of 800 frequent terms (the least frequent terms in the list had been repeated 5 times). I then cleaned the data of any dataset-specific stop words that the stop word list had not filtered and created an edited, ordered listing of the 50 most frequent terms. I left in organisations’ Twitter user names (including @potus), as well as other terms that may not seem that meaningful on their own (but who knows, they may be).
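The filtering and counting steps described above can be sketched in Python as follows. This is not what Voyant Tools does internally; the stop word set is a hypothetical, heavily shortened stand-in for the edited list, and `top_terms` and its parameters are names I made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, shortened stop word list. The real edited list was much
# longer and also included proper names and other common hashtags.
STOP_WORDS = {
    "the", "a", "and", "of", "to", "in", "for", "at", "is",  # English
    "https", "amp",                                           # Twitter-specific
    "je", "tu", "il", "elle", "nous", "vous",                 # French pronouns
    "wlic2016", "ifla",                                       # dataset-specific
}


def top_terms(tweets, stop_words=STOP_WORDS, min_count=1, limit=50):
    """Return up to `limit` (term, count) pairs, most frequent first."""
    counts = Counter()
    for text in tweets:
        # Remove whole URLs so that 't.co' fragments never become tokens.
        text = re.sub(r"https?://\S+", " ", text.lower())
        for token in re.findall(r"[a-z0-9_']+", text):
            if token not in stop_words:
                counts[token] += 1
    return [(t, c) for t, c in counts.most_common(limit) if c >= min_count]


# Made-up sample texts, not taken from the archive.
sample = [
    "Libraries fight for copyright exceptions #WLIC2016 https://t.co/abc",
    "Copyright and libraries: the future of access &amp; IFLA",
    "Libraries of the world",
]
print(top_terms(sample, limit=2))  # [('libraries', 3), ('copyright', 2)]
```

Calling `top_terms(tweets, min_count=5)` would mimic the cut-off mentioned above, where the least frequent terms kept had been repeated 5 times.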

It must be taken into account that the corpus included Retweets. Each RT counted as a separate Tweet, so the text of a Tweet that was retweeted many times was counted once per Retweet. The term counts in the list reflect that repetition.
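To get a feel for how much Retweets can inflate a count, here is a small sketch that counts one term with and without them. It assumes that a Retweet's archived text begins with 'RT @', which is a heuristic of mine rather than something verified against this dataset; `count_term` and the sample texts are made up.

```python
import re


def count_term(tweets, term, include_retweets=True):
    """Count whole-word occurrences of `term` across Tweet texts."""
    pattern = re.compile(rf"\b{re.escape(term.lower())}\b")
    total = 0
    for text in tweets:
        # Assumption: archived Retweets start with 'RT @username:'.
        if not include_retweets and text.startswith("RT @"):
            continue
        total += len(pattern.findall(text.lower()))
    return total


# Made-up sample: one original Tweet and two Retweets of it.
sample = [
    "Copyright matters for libraries",
    "RT @example_user: Copyright matters for libraries",
    "RT @example_user: Copyright matters for libraries",
]
print(count_term(sample, "copyright"))                          # 3
print(count_term(sample, "copyright", include_retweets=False))  # 1
```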

If for some reason you are curious about what the most frequent words in #WLIC2016 Tweets were during this initial period (see above), here’s the top 50:

Term          Count
libraries     543
copyright     517
librarians    484
library       406
session       374
world         326
message       271
opening       249
access        226
make          204
digital       195
internet      162
future        161
information   157
new           146
use           141
people        138
president     131
potus         125
literacy      118
need          117
oclc          114
ceremony      113
dpla          109
poster        105
thanks        103
collections   102
public        100
delegates     99
cilipinfo     98
countries     95
iflatrends    95
google        93
shaping       91
work          89
drag          83
report        83
create        81
open          81
data          79
content       78
learn         78
latest        77
making        77
fight         76
ifla_arl      75
read          74
info          73
exceptions    69
great         68

So, for what it’s worth, those were the 50 most frequent terms in the corpus.

I, for one, not being present at the Congress, found it interesting that ‘copyright’ was the second most frequent term, after ‘libraries’. Unsurprisingly, the list also includes some key terms (such as ‘access’, ‘internet’, ‘digital’, ‘open’ and ‘data’) that have concerned library and information professionals of late.

Were these the terms you’d have expected to make a ‘top 50’ in almost 5,000 Tweets from this initial phase of this particular conference?

The conference hasn’t finished yet of course. But so far, for a libraries and information world congress, which terms would you say are noticeable by their absence in this list? ;-)
