Improving online market research, customer relations management, public reaction analysis and search portal development through social sentiment analysis

Submitting Institution

University of Wolverhampton

Unit of Assessment

Communication, Cultural and Media Studies, Library and Information Management 

Summary Impact Type

Societal

Research Subject Area(s)

Information and Computing Sciences: Artificial Intelligence and Image Processing
Language, Communication and Culture: Linguistics


Summary of the impact

The Statistical Cybermetrics Research Group (SCRG) has developed social science sentiment analysis methods that estimate the strength of positive and negative sentiment in short informal social web text. These methods are encapsulated in the SentiStrength software, which is sold commercially, used commercially to develop socially useful computing applications (e.g., question answering systems, customer relations management systems), used to engage the public in science-related entertaining events, and used for data journalism to inform the public about specific news events. The research includes the development and evaluation of new sentiment analysis techniques that can detect informal expressions of sentiment in social web texts and that can detect the strength of positive and negative sentiment and not just its polarity. The research also includes the development of commercially viable software that includes the sentiment analysis methods.

The research has economic impact by enhancing the performance of commercial software systems, benefitting the owners of these systems (e.g., Yahoo!, Inbenta, Gemius, New Cities Foundation). It also has economic impact by enhancing the customer relations of companies that use sentiment-enhanced customer relations management systems, and through a traffic congestion detection system that helps people to get to work on time. It has wide public services impact by helping people to find answers to their questions (via Yahoo! Answers). It has societal impact by supporting newsworthy analyses of social phenomena for the media. It has enhanced cultural life by driving spectacular lightshows during the London Olympics.

Underpinning research

The field of sentiment analysis is concerned with developing computerised methods to identify sentiment in written texts. Significant research into sentiment analysis methods has been conducted by computational linguists, typically focusing on the commercially relevant domain of product reviews, with the goal of helping market research by automatically extracting consumer opinions about clients' products. In contrast, the SCRG developed sentiment analysis software from an explicitly social perspective, focussing on general social web texts rather than product reviews. The SCRG's key contributions are the following.

* Developing a new dual positive-negative sentiment strength classification scheme that gives each text simultaneous positive and negative scores. This is based upon research in the social psychology of emotions indicating that humans can feel positive and negative sentiment simultaneously, and on our own observation that positive and negative sentiment are frequently expressed together in the social web, even in short phrases such as "miss you".

* Developing a set of relatively domain-independent sentiment analysis techniques (i.e., not specific to one type of data or type of web site) targeted at the short informal text typical of the social web. For example, one rule specifies that more than one additional repeated letter in a sentiment word increases the strength of its sentiment: "haaaapy" is equivalent to "very happy".

* Encapsulating the sentiment analysis methods within a commercial product, SentiStrength, that is fast enough to handle large volumes of text. The Windows version of SentiStrength is available free online at sentistrength.wlv.ac.uk, and there is also a Java version, which is the commercial product.

* Evaluating SentiStrength against a range of alternatives, showing that SentiStrength gives comparable accuracy with a fraction of the effort. These alternatives are a range of standard machine learning methods with various different feature sets - mainly 1-3 grams with feature selection, a total of 690 main variations.

* Modifying the initial version of SentiStrength to make it more language-independent and supporting the development of different language versions by ourselves and others (currently English, Dutch, German, Spanish, Finnish, and Russian). The modifications are twofold: ensuring that all language-specific resources are in external plain text files for ease of adaptation, and adding parameters that can be customised by language. For instance, the negation position parameter allows negating words to modify preceding sentiment words in languages like German, in which negating words occur after sentiment words (e.g., the formulation "I am happy not" is acceptable in German but not in English).

* Proving the value of SentiStrength through its application to understand the role of sentiment in different social web domains, including extended studies of Twitter and YouTube.
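
The dual scoring scheme, the repeated-letter rule and the negation position parameter described in the list above can be illustrated with a minimal sketch. This is not the SentiStrength implementation: the tiny lexicon and its values, the function names, the `negation_after` flag, the 1 to 5 and -1 to -5 scales, and the simple sign flip used for negation are all assumptions made for illustration.

```python
import re

# Toy lexicon (hypothetical values): word -> sentiment strength.
LEXICON = {"happy": 3, "love": 4, "hate": -4, "sad": -3}
NEGATORS = {"not", "never"}


def squeeze(word):
    """Collapse every run of a repeated letter to a single letter."""
    return re.sub(r"(.)\1+", r"\1", word)


# Lookup from squeezed spelling to lexicon word, so that stretched
# spellings such as "haaaapy" still match "happy".
SQUEEZED = {squeeze(word): word for word in LEXICON}


def word_strength(token):
    """Return the strength of one token, or 0 if it carries no sentiment."""
    base = SQUEEZED.get(squeeze(token))
    if base is None:
        return 0
    strength = LEXICON[base]
    # More than one extra letter is read as emphasis rather than a typo.
    if len(token) - len(base) > 1:
        strength += 1 if strength > 0 else -1
    return max(-5, min(5, strength))


def score(text, negation_after=False):
    """Return a (positive, negative) pair: 1..5 and -1..-5 (assumed scales)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positive, negative = 1, -1  # 1 and -1 mean "no sentiment detected"
    for i, token in enumerate(tokens):
        strength = word_strength(token)
        if strength == 0:
            continue
        # The negating word is expected before the sentiment word by default,
        # or after it when negation_after is set (hypothetical parameter).
        j = i + 1 if negation_after else i - 1
        if 0 <= j < len(tokens) and tokens[j] in NEGATORS:
            strength = -strength  # simplification: negation flips polarity
        if strength > 0:
            positive = max(positive, strength)
        else:
            negative = min(negative, strength)
    return positive, negative


print(score("I love you but hate mondays"))
print(score("haaaapy"))
print(score("I am happy not", negation_after=True))
```

Under these assumptions a single text can receive a strong positive and a strong negative score at once, a stretched spelling scores one point higher than the plain word, and the position of the negating word is a per-language setting rather than hard-coded logic.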

The main researchers are Mike Thelwall, Kevan Buckley, Pardeep Sud and Georgios Paltoglou. The research was conducted by these researchers and Di Cai, all at Wolverhampton University. Other co-authors of the publications listed in section 3 provided theoretical background from psychology (Kappas) and a classification idea for YouTube texts (Vis).

References to the research

Quality of evidence: Five of the references are in the top journal of library and information science, with the sixth being in a new ACM journal with a low acceptance rate (14%). The core paper [ref 1] had been cited 162 times, according to Google Scholar, by November 2013. This is an indicator of the quality of this research from the perspective of the academic community.

1. [SentiStrength initial development] Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2010). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology, 61(12), 2544-2558.

2. [SentiStrength improvement] Thelwall, M., Buckley, K., & Paltoglou, G. (2012). Sentiment strength detection for the social Web. Journal of the American Society for Information Science and Technology, 63(1), 163-173.

3. [Further SentiStrength improvement] Thelwall, M., & Buckley, K. (2013). Topic-based sentiment analysis for the Social Web: The role of mood and issue-related words. Journal of the American Society for Information Science and Technology, 64(8), 1608-1617.

4. [SentiStrength application to Twitter] Thelwall, M., Buckley, K., & Paltoglou, G. (2011). Sentiment in Twitter events. Journal of the American Society for Information Science and Technology, 62(2), 406-418.

5. [SentiStrength application to YouTube] Thelwall, M., Sud, P., & Vis, F. (2012). Commenting on YouTube videos: From Guatemalan rock to El Big Bang. Journal of the American Society for Information Science and Technology, 63(3), 616-629.

6. [Assessing alternative sentiment analysis metrics] Paltoglou, G., & Thelwall, M. (2012). Twitter, MySpace, Digg: Unsupervised sentiment analysis in social media. ACM Transactions on Intelligent Systems and Technology, Special Issue on Search and Mining User Generated Contents, 3(4), Article 66. (Acceptance rate: 14%).

Details of the impact

Economic prosperity: SentiStrength enabled Yahoo! to develop an improved question answering service. Yahoo! incorporates a question and answer forum (Yahoo! Answers) in which users ask questions and other users suggest answers. Yahoo! then presents these suggested answers back to the original questioner and anyone else interested in the question. Questions often elicit many suggested answers, some of which are useful while others are unhelpful, inaccurate or spam. In response, Yahoo! ranks the answers and returns them in order, with the most promising answers first. Unfortunately, promising answers can be very difficult to detect automatically, so answer-ranking is a difficult problem. Using SentiStrength, Yahoo! has improved the ranking of the suggested answers to questions by detecting sentiment in the answers and also by detecting sentiment in feedback sent to the answerers (e.g., gratitude or criticism). SentiStrength, rather than other sentiment analysis software, made this possible because of its speed, its ability to process informal text, and its capability to process general web text effectively. The improved performance of Yahoo! Answers is an economic benefit to the company. Evidence: see cited articles. [refs 1-3]

Public services: The improved performance of Yahoo! Answers brings a significant public services benefit, of an information seeking type relevant to the library and information science discipline of the research group. According to Alexa.com, Yahoo.com was the fourth most visited website in the world, and Yahoo! has claimed 200 million users for the Yahoo! Answers service [ref 11], so the potential number of beneficiaries of an improved Yahoo! Answers is 200 million per year. More specifically, these 200 million people are more likely to find good answers to questions that they think are important enough to pose to Yahoo! Answers. These questions are likely to cover a wide variety of issues, from work-related to relationship advice. The improvement of a service to satisfy a wide variety of information needs for hundreds of millions of people generates a huge benefit to society. Evidence: see cited articles. [refs 1-3]

Economic prosperity and public services: SentiStrength enabled the New Cities Foundation to develop a system to automatically detect traffic congestion by monitoring the location and sentiment of Tweets and predicting that congestion might be occurring in areas with negative tweets. Evidence: see cited article. [ref 4]

Economic prosperity (sales): The SentiStrength software has been sold to 7 businesses for £1000 each (including CompanyBook, Norway; sosolimited, USA) and has been given free to partner organisations (Yahoo! Barcelona; Inbenta, Barcelona; Gemius SA, Warsaw) and start-up companies (Tweetsport, Edinburgh; Sam and Jo, Australia; ComplaintLink, USA), to be paid for once profitable. These businesses have used it for marketing, customer relations management and search portal development. Evidence: sales and sales agreements. [ref 5]

Cultural life: SentiStrength was used to detect sentiment in tweets related to the London Olympics, with the results illustrated by a light display on the London Eye every evening during the Olympics. Twitter was monitored for Olympic-related tweets from the start of the torch relay until the end of the Olympics and Paralympics. Each tweet was classified for sentiment by SentiStrength, and the proportion of positive and negative tweets on each day was displayed on the London Eye by lighting up bulbs on the outside of the wheel, turning it into a huge pie chart, followed by a 30-minute light show. This entertaining use, funded by EDF Energy, was accompanied by a control room at the base of the London Eye, open to the public, to explain how the technology worked. The initiative reached an audience of millions via major UK and world news outlets (e.g., BBC News, Time Magazine, The Telegraph, The Mirror), web pages and a Twitter App, in addition to Olympics attendees. Evidence: see a sample of cited articles and web pages. [refs 6-9] N.B. SentiStrength is part of an additional similar project that is secret but will be described on the SentiStrength website in February 2014.
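
The daily proportion of positive and negative tweets described above could be derived from per-tweet scores along the following lines. This is only a sketch: how the project actually turned scores into proportions is not stated here, so the rule for labelling a tweet (the stronger of its two scores wins, ties are ignored), the function name and the sample data are all assumptions.

```python
from collections import defaultdict


def daily_positive_share(scored_tweets):
    """Map each day to the share of positive tweets among polarised tweets.

    scored_tweets is an iterable of (day, positive, negative) tuples, with
    positive in 1..5 and negative in -5..-1 (assumed dual-score scales).
    """
    counts = defaultdict(lambda: [0, 0])  # day -> [positive, negative]
    for day, positive, negative in scored_tweets:
        if positive > -negative:
            counts[day][0] += 1
        elif -negative > positive:
            counts[day][1] += 1
        # Tweets with equal positive and negative strength are skipped.
    return {day: pos / (pos + neg) for day, (pos, neg) in counts.items()}


# Made-up scores for two days, purely for illustration.
sample = [
    ("day 1", 4, -1),
    ("day 1", 3, -1),
    ("day 1", 2, -1),
    ("day 1", 1, -4),
    ("day 1", 2, -2),  # tie: ignored
    ("day 2", 1, -3),
]
print(daily_positive_share(sample))
```

Each resulting share corresponds to the positive slice of that day's pie chart, with the remainder being the negative slice.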

Economic prosperity (improved customer relations and marketing): SentiStrength has improved marketing and customer relations management for businesses and their clients from 2010 onwards. These businesses are in the online market intelligence sector and have their own text gathering and analysis software that they offer as a service to their clients (e.g., Inbenta's multilingual Social Media Monitoring and Social Media Management tools). SentiStrength is superior to traditional sentiment analysis programs in (a) its generic ability to give good performance across a range of texts, (b) its capability to process the informal text typical of the social web, (c) its speed, and (d) its transparency and ease of customisation for particular customer requirements, including language. SentiStrength is used to provide sentiment analysis for their clients of the text that they gather, which is typically free text customer feedback about their products (e.g., tweets, forum posts). Businesses using these sentiment-enhanced customer relations management services can quickly identify when their products or brands are starting to attract negative or positive comments in the social web and can react accordingly, such as by making product modifications or by emailing individual unhappy customers to offer advice. This impact has been realised in the users of customer relations and marketing services that incorporate SentiStrength. Services like CompanyBook extend this capability to sentiment analysis of other businesses with social media data. Some users have also developed SentiStrength for different languages (Atbrox: Finnish; Inbenta: Spanish, French, Portuguese), extending its scope of impact. Evidence: sales and sales agreements.

Public discourse: SentiStrength aided public understanding of social phenomena in 2012. SentiStrength was used to analyse tweets relevant to political and other events to give the public insights into these events. For example, analyses of tweets relating to the UK riots were published in The Guardian newspaper as part of a wider investigation of the riots led by the University of Manchester. Evidence: see cited Guardian article. [ref 10]

Sources to corroborate the impact

  1. [Evidence of the commercial impact of SentiStrength: Paper written by Yahoo! describing developments using SentiStrength to identify the best answers to questions] Kucuktunc, O., Cambazoglu, B.B., Weber, I., & Ferhatosmanoglu, H. (2012). A large-scale sentiment analysis for Yahoo! Answers, Proceedings of the 5th ACM International Conference on Web Search and Data Mining. (see also evidence of a claim of 200 million users for Yahoo! Answers
    http://yanswersblog.com/index.php/archives/2009/12/14/yahoo-answers-hits-200-million-visitors-worldwide/)
  2. [Evidence of the commercial impact of SentiStrength: Paper written by Yahoo! describing developments using SentiStrength to identify the best answers to questions] Weber, I., Ukkonen, A., & Gionis, A. (2012). Answers, not links: extracting tips from Yahoo! Answers to address how-to web queries, Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM '12).
  3. [Evidence of the commercial impact of SentiStrength: Paper written in collaboration with Yahoo! describing developments using SentiStrength for effective sentiment-focused web crawling] Vural, G., Cambazoglu, B.B., & Senkul, P. (2012). Sentiment-focused web crawling, Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 2020-2024.
  4. [Evidence of the commercial and social impact of SentiStrength: SentiStrength being used to help automatically detect congestion via Tweets] Greg Merritt: New Cities Foundation (2012), Connected Commuting: Research and Analysis on the New Cities Foundation Task Force in San Jose http://www.newcitiesfoundation.org/index.php/2012/12/new-cities-foundation-unveils-results-of-landmark-study-on-commuting-and-social-networks/ (SentiStrength is mentioned on page 16). See also Crowdsourcing your Commute (New York Times) http://rendezvous.blogs.nytimes.com/2012/12/10/crowdsourcing-your-commute/ .
  5. [Evidence of the commercial uptake of SentiStrength by Gemius SA, Poland]: Gemius (2012). Do you know that Poles are most grumpy in the Twitter posts on Wednesday?
    https://www.facebook.com/GemiusGroup/posts/505238729527205 and extended version in Polish https://www.gemius.pl/pl/aktualnosci/2013-01-28/01
  6. [Evidence of the public impact of the EDF Energy London Eye Olympics project] Grossman, S. (2012). Want to Light Up the London Eye? Just Tweet That the Olympics Are 'Totes Amazeballs', Time Magazine July 27, 2012. http://olympics.time.com/2012/07/27/want-to-light-up-the-london-eye-just-tweet-that-the-olympics-are-totes-amazeballs/ See also How it All Worked
    http://www.edfenergy.com/brand/energy-of-the-nation/how-it-works.shtml or
    http://wayback.archive.org/web/*/http://www.edfenergy.com/brand/energy-of-the-nation/how-it-works.shtml
  7. [Evidence of the impact of the EDF Energy London Eye Olympics project] UK Daily Telegraph article, p. 27, 19 July 2012, "Happy Olympic tweeters to light up London Eye" in "the world's first social media driven light show".
    http://www.telegraph.co.uk/technology/news/9408783/Happy-Olympic-tweeters-to-light-up-London-Eye.html
  8. [Evidence of the impact of the EDF Energy London Eye Olympics project] BBC News article: 20 July 2012, London Eye Olympic Twitter positivity lightshow launched.
    http://www.bbc.co.uk/news/uk-england-london-18918318
  9. [Evidence of the impact of the EDF Energy London Eye Olympics project] UK Daily Mirror article, 20 July 2012, The mood of the nation: Tweets to power spectacular London 2012 light show. http://www.mirror.co.uk/sport/other-sports/london-2012-tweets-to-power-spectacular-1149197
  10. [Evidence of the social impact of the riots research: Guardian riots article, online version] How 2.6m tweets were analysed to understand reaction to the riots.
    http://www.guardian.co.uk/uk/2011/dec/07/how-tweets-analysed-understand-riots
  11. [Evidence of the popularity of the Yahoo! Answers system] Yahoo! Answers hits 200 million visitors - December 14, 2009. Yanswersblog.com.
    http://yanswersblog.com/index.php/archives/2009/12/14/yahoo-answers-hits-200-million-visitors-worldwide/ and http://www.alexa.com/topsites