Improving online market research, customer relations management, public reaction analysis and search portal development through social sentiment analysis
Submitting Institution
University of Wolverhampton
Unit of Assessment
Communication, Cultural and Media Studies, Library and Information Management
Summary Impact Type
Societal
Research Subject Area(s)
Information and Computing Sciences: Artificial Intelligence and Image Processing
Language, Communication and Culture: Linguistics
Summary of the impact
The Statistical Cybermetrics Research Group (SCRG) has developed social
science sentiment analysis methods that estimate the strength of positive
and negative sentiment in short informal social web text. These methods
are encapsulated in the SentiStrength software, which is sold
commercially, used commercially to develop socially useful computing
applications (e.g., question answering systems, customer relations
management systems), used to engage the public in science-related
entertaining events, and used for data journalism to inform the public
about specific news events. The research includes the development and
evaluation of new sentiment analysis techniques that can detect informal
expressions of sentiment in social web texts and that can detect the strength
of positive and negative sentiment and not just its polarity. The research
also includes the development of commercially viable software that
includes the sentiment analysis methods.
The research has economic impact by enhancing the performance of
commercial software systems, benefitting the owners of these systems
(e.g., Yahoo!, Inbenta, Gemius, New Cities Foundation). The research also
has economic impact by enhancing the customer relations of
companies that use sentiment-enhanced customer relations management systems
and, through the traffic congestion detection system, by helping people to get to
work on time. It has wide public services impact by helping people
to find answers to their questions (via Yahoo! Answers). It has societal
impact by supporting newsworthy analyses of social phenomena for the
media. It has enhanced cultural life by driving spectacular
lightshows during the London Olympics.
Underpinning research
The field of sentiment analysis is concerned with developing computerised
methods to identify sentiment in written texts. Significant research into
sentiment analysis methods has been conducted by computational linguists,
typically focusing on the commercially relevant domain of product reviews,
with the goal of helping market research by automatically extracting
consumer opinions about clients' products. In contrast, the SCRG developed
sentiment analysis software from an explicitly social perspective,
focussing on general social web texts rather than product reviews. The
SCRG's key contributions are the following.
* Developing a new dual positive-negative sentiment strength
classification scheme that gives each text a simultaneous positive and
negative score. This is based upon the social psychology of emotions,
which holds that humans can feel positive and negative sentiment
simultaneously, and on our own observation that positive and negative
sentiment are frequently expressed simultaneously in the social web, even
in short phrases such as "miss you".
* Developing a set of relatively domain independent sentiment analysis
techniques (i.e., not specific to one type of data or type of web site)
targeted at the short informal text typical of the social web. As an
example, one rule specifies that more than one additional repeated letter in a
sentiment word increases the strength of sentiment in the word:
"haaaapy" is treated as equivalent to "very happy".
* Encapsulating the sentiment analysis methods within a commercial
product, SentiStrength, that is fast enough to handle large volumes of
text. The Windows version of SentiStrength is available free online at
sentistrength.wlv.ac.uk, and there is also a Java version, which is the
commercial product.
* Evaluating SentiStrength against a range of alternatives, showing that
SentiStrength gives comparable accuracy with a fraction of the effort.
These alternatives are a range of standard machine learning methods with
various different feature sets - mainly 1-3 grams with feature selection,
a total of 690 main variations.
* Modifying the initial version of SentiStrength to make it more
language-independent and supporting the development of different language
versions by ourselves and others (currently English, Dutch, German,
Spanish, Finnish, and Russian). The modifications are twofold: ensuring
that all language-specific resources are in external plain text files for
ease of adaptation, and adding parameters that can be customised by
language. For instance, the negation position parameter allows negating
words to modify preceding sentiment words in languages like German in
which negating words occur after sentiment words (e.g., the formulation "I
am happy not" is acceptable in German but not in English).
* Proving the value of SentiStrength through its application to
understand the role of sentiment in different social web domains,
including extended studies of Twitter and YouTube.
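The dual scoring scheme, the repeated-letter rule and the negation position parameter described above can be illustrated with a minimal sketch. This is not the SentiStrength implementation: the toy lexicon, the strength values, the 1 to 5 and -1 to -5 ranges, and the names `lookup`, `score` and `negation_position` are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon: term -> (positive strength 1..5, negative strength -1..-5).
LEXICON = {
    "happy": (3, -1),
    "love": (4, -1),
    "miss": (2, -2),  # mixed sentiment, as in "miss you"
    "awful": (1, -4),
}
NEGATORS = {"not"}
REPEATED = re.compile(r"(.)\1{2,}")  # a letter occurring three or more times in a row


def lookup(token):
    """Return (pos, neg) for a token, boosting terms spelled with repeated letters."""
    if token in LEXICON:
        return LEXICON[token]
    if REPEATED.search(token):
        # Try collapsing each long run to one letter, then to two letters.
        for keep in (r"\1", r"\1\1"):
            base = REPEATED.sub(keep, token)
            if base in LEXICON:
                pos, neg = LEXICON[base]
                # Strengthen whichever polarity the term actually carries.
                if pos > 1:
                    pos = min(pos + 1, 5)
                if neg < -1:
                    neg = max(neg - 1, -5)
                return pos, neg
    return 1, -1  # neutral on both scales


def score(text, negation_position="before"):
    """Return a simultaneous (positive, negative) strength pair for a short text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    best_pos, best_neg = 1, -1
    for i, token in enumerate(tokens):
        pos, neg = lookup(token)
        # Look for a negating word just before or just after the sentiment word.
        j = i - 1 if negation_position == "before" else i + 1
        if 0 <= j < len(tokens) and tokens[j] in NEGATORS:
            pos, neg = -neg, -pos  # negation swaps the two strengths
        best_pos = max(best_pos, pos)
        best_neg = min(best_neg, neg)
    return best_pos, best_neg


print(score("miss you"))                                   # (2, -2)
print(score("I looove it"))                                # (5, -1)
print(score("I am not happy"))                             # (1, -3)
print(score("I am happy not", negation_position="after"))  # (1, -3)
```

In this sketch the language-specific parts (the lexicon, the negator list and the negation position) are kept separate from the scoring logic, mirroring the idea of holding language resources outside the program so that they can be adapted.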
The main researchers are Mike Thelwall, Kevan Buckley, Pardeep Sud and
Georgios Paltoglou. The research was conducted by these researchers and Di
Cai, all at the University of Wolverhampton. Other co-authors of the publications
listed in section 3 provided theoretical background from psychology
(Kappas) and a classification idea for YouTube texts (Vis).
References to the research
Quality of evidence: Five of the references are in the top journal of
library and information science, with the sixth being in a new ACM journal
with a low acceptance rate (14%). The core paper [ref 1] had been cited
162 times, according to Google Scholar, by November 2013. This is an
indicator of the quality of this research from the perspective of the
academic community.
1. [SentiStrength initial development] Thelwall, M., Buckley, K.,
Paltoglou, G., Cai, D., & Kappas, A. (2010). Sentiment strength
detection in short informal text. Journal of the American Society for
Information Science and Technology, 61(12), 2544-2558.
2. [SentiStrength improvement]. Thelwall, M., Buckley, K., &
Paltoglou, G. (2012). Sentiment strength detection for the social Web. Journal
of the American Society for Information Science and Technology,
63(1), 163-173.
3. [Further SentiStrength improvement]. Thelwall, M., & Buckley, K.
(2013). Topic-based sentiment analysis for the Social Web: The role of
mood and issue-related words. Journal of the American Society for
Information Science and Technology, 64(8), 1608-1617.
4. [SentiStrength application to Twitter] Thelwall, M., Buckley, K.,
& Paltoglou, G. (2011). Sentiment in Twitter events. Journal of
the American Society for Information Science and Technology, 62(2),
406-418.
5. [SentiStrength application to YouTube] Thelwall, M., Sud, P., &
Vis, F. (2012). Commenting on YouTube videos: From Guatemalan rock to El
Big Bang. Journal of the American Society for Information Science and
Technology, 63(3), 616-629.
6. [Assessing alternative sentiment analysis metrics]. Paltoglou, G., &
Thelwall, M. (2012). Twitter, MySpace, Digg: Unsupervised sentiment
analysis in social media. ACM Transactions on Intelligent Systems and
Technology, Special Issue on Search and Mining User Generated
Contents, 3(4), Article 66. (Acceptance rate: 14%).
Details of the impact
Economic prosperity: SentiStrength enabled Yahoo! to
develop an improved question answering service. Yahoo! incorporates a
question and answer forum (Yahoo! Answers) in which users ask questions
and other users suggest answers. Yahoo! then presents these suggested
answers back to the original questioner and anyone else interested in the
question. Questions often elicit many suggested answers, some of which are
useful and others are unhelpful, inaccurate or spam. In response, Yahoo!
ranks the answers and returns them in order, with the most promising
answers being returned first. Unfortunately, promising answers can be very
difficult to detect automatically so answer-ranking is a difficult
problem. Using SentiStrength, Yahoo! has improved the ranking of the
suggested answers to questions by detecting sentiment in the answers and
also by detecting sentiment in feedback sent to the answerers (e.g.,
gratitude or criticism). SentiStrength, rather than other sentiment
analysis software, made this possible because of its speed, its ability
to process informal text, and its capability to process general web text
effectively. The improved performance of Yahoo! Answers is an
economic benefit to Yahoo! Evidence: see cited articles. [refs 1-3]
Public services: The improved performance of Yahoo! Answers
incorporates a significant public services benefit of an information
seeking type relevant to the library and information science discipline of
the research group. According to Alexa.com, Yahoo.com was the fourth most
visited website in the world, and Yahoo! has claimed 200 million users for
the Yahoo! Answers service [ref 11]. The potential number of
beneficiaries of an improved Yahoo! Answers is therefore 200 million per year. More
specifically, these 200 million people are more likely to find good
answers to questions that they think are important enough to pose to
Yahoo! Answers. These questions are likely to cover a wide variety of
issues from work-related to relationship advice. The improvement of a
service to satisfy a wide variety of information needs for hundreds of
millions of people generates a huge benefit to society. Evidence:
see cited articles. [refs 1-3]
Economic prosperity and public services: SentiStrength
enabled the New Cities Foundation to develop a system to automatically
detect traffic congestion by monitoring the location and sentiment of
Tweets and predicting that congestion might be occurring in areas with
negative tweets. Evidence: see cited article. [ref 4]
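As a rough illustration of the idea of flagging areas with negative tweets, the sketch below groups tweets by area and reports areas where a large share are negative. It is not the New Cities Foundation system: the area labels, the thresholds `min_tweets` and `min_negative_share`, and the rule that a negative strength of -2 or lower counts as a negative tweet are all assumptions.

```python
from collections import defaultdict


def congested_areas(tweets, min_tweets=3, min_negative_share=0.5):
    """Return areas whose share of negative tweets reaches a threshold.

    `tweets` is an iterable of (area, negative_strength) pairs, where the
    negative strength is assumed to lie on a -1 (none) to -5 (strong) scale.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for area, negative_strength in tweets:
        totals[area] += 1
        if negative_strength <= -2:  # assumed cut-off for a "negative" tweet
            negatives[area] += 1
    return sorted(
        area
        for area, count in totals.items()
        if count >= min_tweets and negatives[area] / count >= min_negative_share
    )


sample = [
    ("A", -3), ("A", -4), ("A", -1),  # two of three tweets are negative
    ("B", -1), ("B", -1), ("B", -2),  # one of three tweets is negative
    ("C", -5),                        # too few tweets to judge
]
print(congested_areas(sample))  # ['A']
```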
Economic prosperity (sales): The SentiStrength software has
been sold to 7 businesses for £1000 each (including CompanyBook, Norway;
sosolimited, USA) and has been given free to partner organisations (Yahoo!
Barcelona; Inbenta, Barcelona; and Gemius SA, Warsaw) and start-up
companies (Tweetsport, Edinburgh; Sam and Jo, Australia; ComplaintLink,
USA), to be paid for once profitable. These businesses have used it for
marketing, customer relations management and search portal development. Evidence:
sales and sales agreements. [ref 5]
Cultural life: SentiStrength was used to detect sentiment
in Tweets related to the London Olympics, with the results illustrated
with a light display on the London Eye every evening during the Olympics.
Twitter was monitored for Olympic-related tweets from the start of the
torch relay until the end of the Olympics and Paralympics. Each tweet was
classified for sentiment by SentiStrength and the proportion of positive
and negative tweets on each day was highlighted on the London Eye, turning
it into a huge pie chart by lighting up bulbs on the outside of the wheel
and then continuing into a light show for 30 minutes. This entertaining
use, funded by EDF Energy, was accompanied by a control room at the base
of the London Eye, open to the public, that explained how the technology worked.
This initiative reached an
audience of millions via major UK and world news outlets (e.g., BBC News,
Time Magazine, The Telegraph, The Mirror), Olympics attendees, web
pages and a Twitter App. Evidence: see a sample of cited articles and web pages.
[refs 6-9] N.B. SentiStrength is part of an additional similar
project that is secret but will be described on the SentiStrength website
in February 2014.
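The daily proportion of positive and negative tweets could be computed along the following lines. This is a sketch under stated assumptions rather than the method used for the London Eye display: it assumes each tweet already carries a (positive, negative) strength pair, counts a tweet as positive or negative according to whichever strength is larger in magnitude, and skips ties. The function name and the sample data are hypothetical.

```python
from collections import Counter


def daily_sentiment_shares(tweets):
    """Return {day: (positive_share, negative_share)} for classified tweets.

    `tweets` is an iterable of (day, positive_strength, negative_strength),
    with strengths assumed to be on 1..5 and -1..-5 scales respectively.
    """
    counts = {}
    for day, pos, neg in tweets:
        day_counts = counts.setdefault(day, Counter())
        if pos > -neg:
            day_counts["positive"] += 1
        elif -neg > pos:
            day_counts["negative"] += 1
        # Tweets with equal strengths are left out of the proportions.
    shares = {}
    for day, day_counts in counts.items():
        total = day_counts["positive"] + day_counts["negative"]
        if total:
            shares[day] = (
                day_counts["positive"] / total,
                day_counts["negative"] / total,
            )
    return shares


sample_tweets = [
    ("2012-07-27", 4, -1),
    ("2012-07-27", 3, -1),
    ("2012-07-27", 5, -1),
    ("2012-07-27", 1, -3),
    ("2012-07-27", 2, -2),  # tie, not counted
]
print(daily_sentiment_shares(sample_tweets))  # {'2012-07-27': (0.75, 0.25)}
```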
Economic prosperity (improved customer relations and marketing):
SentiStrength has improved marketing and customer relations management for
businesses and their clients from 2010 onwards. These businesses are in
the online market intelligence sector and have their own text gathering
and analysis software that they offer as a service to their clients (e.g.,
Inbenta's multilingual Social Media Monitoring and Social Media Management
tools). SentiStrength is superior to traditional sentiment analysis
programs for (a) its generic ability to give good performance across a
range of texts, (b) its capability to process the informal text typical of
the social web, (c) its speed, and (d) its transparency and ease of
customisation for particular customer requirements, including language.
These businesses use SentiStrength to provide sentiment analysis of
the text that they gather for their clients, which is typically free-text customer feedback
about the clients' products (e.g., tweets, forum posts). Businesses using these
sentiment-enhanced customer relations management services can quickly
identify when their products or brands are starting to attract negative or
positive comments in the social web and can react accordingly, such as by
making product modifications or by emailing individual unhappy customers
to offer advice. This impact has been realised in the users of customer
relations and marketing services that incorporate SentiStrength. Services
like CompanyBook extend this capability to sentiment analysis of other
businesses with social media data. Some users have also developed
SentiStrength for different languages (Atbrox: Finnish, Inbenta: Spanish,
French, Portuguese), extending its scope of impact. Evidence:
sales and sales agreements.
Public discourse: SentiStrength aided public understanding
of social phenomena in 2012. SentiStrength was used to analyse tweets
relevant to political and other events to give insights to the public
about these events. For example, analyses of tweets relating to the UK
riots were published in The Guardian newspaper as part of a wider
investigation of the riots led by the University of Manchester.
Evidence: see cited Guardian article. [ref 10]
Sources to corroborate the impact
- [Evidence of the commercial impact of SentiStrength: Paper written by
Yahoo! describing developments using SentiStrength to identify the best
answers to questions] Kucuktunc, O., Cambazoglu, B.B., Weber, I., &
Ferhatosmanoglu, H. (2012). A large-scale sentiment analysis for Yahoo!
Answers, Proceedings of the 5th ACM International Conference on Web
Search and Data Mining. (see also evidence of a claim of 200 million
users for Yahoo! Answers
http://yanswersblog.com/index.php/archives/2009/12/14/yahoo-answers-hits-200-million-visitors-worldwide/)
- [Evidence of the commercial impact of SentiStrength: Paper written by
Yahoo! describing developments using SentiStrength to identify the best
answers to questions] Weber, I., Ukkonen, A., & Gionis, A. (2012).
Answers, not links: extracting tips from Yahoo! Answers to address
how-to web queries, Proceedings of the fifth ACM international
conference on Web search and data mining (WSDM '12).
- [Evidence of the commercial impact of SentiStrength: Paper written in
collaboration with Yahoo! describing developments using SentiStrength
for effective sentiment-focused web crawling] Vural, G., Cambazoglu, B.B.,
& Senkul, P. (2012). Sentiment-focused web crawling, Proceedings of
the 21st ACM International Conference on Information and Knowledge
Management, pp. 2020-2024.
- [Evidence of the commercial and social impact of SentiStrength:
SentiStrength being used to help automatically detect congestion via
Tweets] Greg Merritt: New Cities Foundation (2012), Connected Commuting:
Research and Analysis on the New Cities Foundation Task Force in San
Jose.
http://www.newcitiesfoundation.org/index.php/2012/12/new-cities-foundation-unveils-results-of-landmark-study-on-commuting-and-social-networks/
(SentiStrength is mentioned on page 16). See also Crowdsourcing your
Commute (New York Times): http://rendezvous.blogs.nytimes.com/2012/12/10/crowdsourcing-your-commute/
- [Evidence of the commercial uptake of SentiStrength by Gemius SA,
Poland]: Gemius (2012). Do you know that Poles are most grumpy in the
Twitter posts on Wednesday?
https://www.facebook.com/GemiusGroup/posts/505238729527205
and extended version in Polish https://www.gemius.pl/pl/aktualnosci/2013-01-28/01
- [Evidence of the public impact of the EDF Energy London Eye Olympics
project] Grossman, S. (2012). Want to Light Up the London Eye? Just
Tweet That the Olympics Are 'Totes Amazeballs', Time Magazine
July 27, 2012. http://olympics.time.com/2012/07/27/want-to-light-up-the-london-eye-just-tweet-that-the-olympics-are-totes-amazeballs/
See also How it All Worked: http://www.edfenergy.com/brand/energy-of-the-nation/how-it-works.shtml
or
http://wayback.archive.org/web/*/http://www.edfenergy.com/brand/energy-of-the-nation/how-it-works.shtml
- [Evidence of the impact of the EDF Energy London Eye Olympics project]
UK Daily Telegraph article, p. 27, 19 July 2012, "Happy Olympic tweeters
to light up London Eye" in "the world's first social media driven light
show".
http://www.telegraph.co.uk/technology/news/9408783/Happy-Olympic-tweeters-to-light-up-London-Eye.html
- [Evidence of the impact of the EDF Energy London Eye Olympics project]
BBC News article: 20 July 2012, London Eye Olympic Twitter
positivity lightshow launched.
http://www.bbc.co.uk/news/uk-england-london-18918318
- [Evidence of the impact of the EDF Energy London Eye Olympics project]
UK Daily Mirror article, 20 July 2012, The mood of the nation:
Tweets to power spectacular London 2012 light show.
http://www.mirror.co.uk/sport/other-sports/london-2012-tweets-to-power-spectacular-1149197
- [Evidence of the social impact of the riots research: Guardian riots
article, online version] How 2.6m tweets were analysed to
understand reaction to the riots.
http://www.guardian.co.uk/uk/2011/dec/07/how-tweets-analysed-understand-riots
- [Evidence of the popularity of the Yahoo! Answers system] Yahoo!
Answers hits 200 million visitors - December 14, 2009. Yanswersblog.com.
http://yanswersblog.com/index.php/archives/2009/12/14/yahoo-answers-hits-200-million-visitors-worldwide/
and http://www.alexa.com/topsites