Can We Use Big Social Data To Predict The Election Outcome?

As social media continues to grow and mature, more and more people are using it to predict future trends and, most recently, to forecast who will win the upcoming UK General Election.

But how can social media be used to predict the winner?

Sentiment analysis has become a valuable tool for companies that want to understand how customers feel about their products, marketing campaigns, brand or the company as a whole.

Sentiment analysis is an automated process for analysing conversations taking place online, and it is the basis on which many analysts are offering opinions on who might win the General Election.

However, we must ask how accurate this analysis really is. After all, the automated process relies heavily on algorithms, a fixed set of recognised keywords, and the ability of software to detect irony, sarcasm and similar nuances.

For instance, can technology tell whether the tweet “Thanks Ed Miliband for running a fantastic election campaign – not” is positive or negative? In all likelihood it would be classified as positive, simply because of the words it uses, even though it is sarcastic and therefore negative.
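To see why, here is a minimal sketch of the simplest kind of classifier: a keyword lexicon that counts positive and negative words. The word lists are hypothetical and far smaller than any real lexicon, and this is not how Demos or any particular tool works; it only shows how a sarcastic trailing “not” slips past word counting.

```python
import re

# Hypothetical word lists, for illustration only; real lexicons are far larger.
POSITIVE_WORDS = {"thanks", "fantastic", "great", "brilliant", "love"}
NEGATIVE_WORDS = {"terrible", "awful", "hate", "disaster", "boring"}

def lexicon_sentiment(text: str) -> str:
    """Label text by counting keywords; no context, so no sarcasm detection."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tweet = "Thanks Ed Miliband for running a fantastic election campaign – not"
print(lexicon_sentiment(tweet))  # positive ("thanks" and "fantastic" outweigh nothing)
```

Simply adding a negation rule would not rescue this example either: such rules usually flip the polarity of the words that follow “not”, and here “not” comes last.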

A number of organisations have recently been set up to examine this growing trend in sentiment analysis, such as the Centre for the Analysis of Social Media, established by Carl Miller and his team at the think tank Demos.

Carl and his team have access to Twitter’s ‘firehose’, which is essentially the only way to monitor 100% of tweets in real time. The firehose lets them monitor tweets from all around the globe, which proved especially useful during the recent televised general election debates.

The Demos sentiment algorithm (based on technology developed by the Text Analytics Group at the University of Sussex) picked up 420,000 relevant tweets, which were then classified as “cheers” (positive) or “boos” (negative). Nicola Sturgeon (SNP) led the way with 83% cheers, Nick Clegg (Lib Dem) was distinctly average with 48%, and David Cameron (Conservative) brought up the rear with 32%.

Dr Jeremy Reffin at the University of Sussex explained how the sentiment algorithm and wider system work, describing the process as follows:

  • 1. Humans identify relevant hashtags.
  • 2. The algorithm is taught to classify each tweet as positive, negative or neutral, using Natural Language Processing technology that helps it learn to distinguish opinions from factual statements.
  • 3. Assisted machine learning is then used: the computer classifies example tweets itself and checks with humans whether its sentiment decisions are correct. When it makes a mistake, it learns from that mistake for next time.
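The three steps can be sketched as a small Python toy. It is not the Sussex/Demos system: the hashtag set, the seed tweets and the stand-in for a human annotator are hypothetical, and a simple Naive Bayes classifier substitutes for the real Natural Language Processing model. What it does show is the loop in step 3, where a human’s correction becomes new training data.

```python
import math
import re
from collections import Counter, defaultdict

def tokens(text):
    """Lower-case words and hashtags."""
    return re.findall(r"[a-z#']+", text.lower())

# Step 1: humans pick the hashtags that mark a tweet as relevant (hypothetical choice).
RELEVANT_HASHTAGS = {"#leadersdebate"}

def is_relevant(tweet):
    return any(tok in RELEVANT_HASHTAGS for tok in tokens(tweet))

class NaiveBayes:
    """Step 2 stand-in: multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def learn(self, text, label):
        self.label_counts[label] += 1
        for tok in tokens(text):
            self.word_counts[label][tok] += 1
            self.vocab.add(tok)

    def classify(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(count / total)  # log prior for this label
            for tok in tokens(text):  # smoothed log likelihood of each token
                score += math.log((self.word_counts[label][tok] + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def assisted_learning(model, tweets, ask_human):
    """Step 3: the model labels each tweet, a human checks it, and mistakes become training data."""
    corrections = 0
    for tweet in tweets:
        guess = model.classify(tweet)
        truth = ask_human(tweet, guess)
        if truth != guess:
            model.learn(tweet, truth)
            corrections += 1
    return corrections

model = NaiveBayes()
for text, label in [
    ("Great answer from Sturgeon #leadersdebate", "positive"),
    ("Awful dodge on the deficit #leadersdebate", "negative"),
    ("The debate starts at 8pm #leadersdebate", "neutral"),
]:
    model.learn(text, label)

stream = ["Awful answer #leadersdebate", "Off to the shops"]
relevant = [t for t in stream if is_relevant(t)]
human_labels = {"Awful answer #leadersdebate": "negative"}  # stand-in for a human annotator
print(assisted_learning(model, relevant, lambda tweet, guess: human_labels[tweet]))  # 1
print(model.classify("Awful answer #leadersdebate"))  # negative
```

With only three seed examples, the model first reads “Awful answer” as positive, largely because “answer” appeared only in its positive example; a single human correction is enough to flip the decision.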

As with all new technology, there are kinks to work out! As mentioned earlier, computers find it difficult to identify specific tones in text, such as irony or sarcasm, and Dr Reffin agrees that “computers have a real problem with sarcasm.” Even seemingly random statements cause trouble, such as “Ad-break. Time for a kitten in a hat. #leadersdebate”. This post was classified as a cheer, even though it says nothing positive about the debate itself.

There are many more examples of such mistakes, and not only in the leaders’ debates. A tweet can be entirely positive yet be classified as negative by the algorithm, and vice versa. Mr Wibberley, a doctoral student at the university, argues that the system should not be judged case by case but at scale, where he says it is far more accurate.
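A little arithmetic shows why individual errors can matter less in aggregate. Assume, purely for illustration, that the classifier mislabels 10% of tweets and that mistakes are equally likely in either direction. If 60% of tweets are truly positive, the expected share labelled positive is 0.6 × 0.9 + 0.4 × 0.1 = 0.58, so a system that gets one tweet in ten wrong still lands within two points of the true share, and if the error rate is known the distortion can be undone. The numbers and the symmetric-error assumption are hypothetical; this is a standard misclassification correction, not a calculation Mr Wibberley is quoted as making.

```python
def observed_share(true_share, error_rate):
    """Expected share labelled positive when each tweet is misclassified with
    probability error_rate in either direction (a simplifying assumption)."""
    return true_share * (1 - error_rate) + (1 - true_share) * error_rate

def corrected_share(observed, error_rate):
    """Invert observed_share to recover the true share; requires error_rate < 0.5."""
    return (observed - error_rate) / (1 - 2 * error_rate)

obs = observed_share(0.60, 0.10)
print(round(obs, 2))                          # 0.58
print(round(corrected_share(obs, 0.10), 2))   # 0.6
```

These are expected values, so they describe large samples; with only a handful of tweets, random variation swamps the calculation.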

Another factor to take into account, especially in an event such as the General Election, is the stream of tweets flying back and forth between journalists and political professionals. Studying these exchanges falls under network analysis, and it adds yet another layer of complexity to the sentiment system.
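Network analysis typically starts by turning tweets into a graph of who addresses whom. This minimal sketch, using hypothetical account names and tweets, builds directed @-mention edges, counts how often each account is mentioned, and picks out pairs of accounts that mention each other, one simple way to spot back-and-forth exchanges. Real network analysis goes much further (communities, influence, retweet cascades).

```python
import re
from collections import Counter

def mention_edges(tweets):
    """Turn (author, text) pairs into counted author -> mentioned-account edges."""
    edges = Counter()
    for author, text in tweets:
        for handle in re.findall(r"@(\w+)", text):
            edges[(author.lower(), handle.lower())] += 1
    return edges

def most_mentioned(edges):
    """In-degree per account: how often each account is mentioned by others."""
    indegree = Counter()
    for (_, target), n in edges.items():
        indegree[target] += n
    return indegree.most_common()

def reciprocal_pairs(edges):
    """Pairs of accounts that mention each other: the back-and-forth exchanges."""
    return {tuple(sorted((a, b))) for (a, b) in edges if (b, a) in edges}

# Hypothetical accounts and tweets, for illustration only.
tweets = [
    ("journalist_a", "@party_press_office will you confirm the figures? #leadersdebate"),
    ("journalist_b", "Agree with @journalist_a, no straight answer from @party_press_office"),
    ("party_press_office", "@journalist_a the figures are in our manifesto"),
]
edges = mention_edges(tweets)
print(most_mentioned(edges))
print(reciprocal_pairs(edges))
```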

Of course, social data comes from more than just Twitter. We also have to consider Facebook, whose user base still dwarfs Twitter’s. As on Twitter, people share their opinions about the political parties frequently and openly, which gives data analysts plenty to work with.

For instance, results showed that UKIP received 9.7 million interactions, the Conservatives 8.2 million, Labour 6.6 million, and the Liberal Democrats and the SNP 1.3 million each. However, Elizabeth Linder, a Facebook politics specialist, advises caution when using such data, for fear of over-interpreting it.
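Expressed as shares of the 27.1 million interactions across these five parties, UKIP’s 9.7 million is roughly 36% of the total. The short calculation below uses only the figures quoted above; bear in mind that interactions measure engagement, not support, which is why Linder’s caution matters.

```python
def interaction_shares(counts):
    """Each party's fraction of the combined interactions."""
    total = sum(counts.values())
    return {party: n / total for party, n in counts.items()}

# Facebook interactions in millions, as reported above.
interactions = {
    "UKIP": 9.7,
    "Conservatives": 8.2,
    "Labour": 6.6,
    "Liberal Democrats": 1.3,
    "SNP": 1.3,
}

for party, share in interaction_shares(interactions).items():
    print(f"{party}: {share:.1%}")
```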

After all, just because someone shares a positive post about the Liberal Democrats does not mean they agree with its sentiment, so such data could easily skew the analysis.

Facebook data analysis has come on in leaps and bounds in recent years; however, many social scientists remain wary of placing too much faith in any direct relationship Facebook might draw between a user’s likes and their political views. Carl Miller certainly recognises this, stating that “It’ll be quite some time before [big data] can stand shoulder to shoulder with the social sciences in terms of how rigorous it is”.

So, can social data be used to predict the outcome of the General Election? The answer is probably a non-committal “not just yet”. Many traditionalists may prefer to stick to the opinion polls, but who knows: perhaps social data will be the norm by the time the next General Election comes round!
