Exploring the value of big data analysis of Twitter tweets and share prices

PhD Thesis

Wlodarczak, Peter. 2017. Exploring the value of big data analysis of Twitter tweets and share prices. PhD Thesis, Doctor of Philosophy. University of Southern Queensland. https://doi.org/10.26192/5c05cde0d30cc

Type: PhD Thesis
Author: Wlodarczak, Peter
Supervisors: Ally, Mustafa; Soar, Jeffrey
Institution of Origin: University of Southern Queensland
Qualification Name: Doctor of Philosophy
Number of Pages: 166
Digital Object Identifier (DOI): https://doi.org/10.26192/5c05cde0d30cc

Over the past decade, the use of social media (SM) such as Facebook, Twitter, Pinterest and Tumblr has increased dramatically. Using SM, millions of users create large amounts of data every day. According to some estimates, ninety per cent of the content on the Internet is now user generated. SM can be seen as a distributed content creation and sharing platform based on Web 2.0 technologies. SM sites make it very easy for their users to publish text, pictures, links, messages or videos without needing to know how to program. Users post reviews of products and services they bought, write about their interests and intentions, or give their opinions and views on political subjects. SM has also been a key factor in mass movements such as the Arab Spring and the Occupy Wall Street protests and is used for humanitarian aid and disaster relief (HADR).

Organisations are increasingly interested in SM analysis for detecting new trends, gathering user opinions on their products and services, or assessing their online reputation. Companies such as Amazon or eBay use SM data for their recommendation engines and to generate more business. TV stations buy data from Facebook about opinions on their programs to gauge the popularity of a given show. Companies such as Topsy, Gnip, DataSift and Zoomph have built their entire business models around SM analysis.

The purpose of this thesis is to explore the economic value of Twitter tweets. The economic value is determined by attempting to predict the share price of a company. If the share price of a company can be predicted using SM data, it should be possible to deduce a monetary value. There is limited research on determining the economic value of SM data for “nowcasting” (predicting the present) and for forecasting. This study aims to determine the monetary value of Twitter by correlating the daily frequencies of positive and negative tweets about Apple Inc. and some of its most popular products with the development of the Apple Inc. share price. If the number of positive tweets about Apple increases and the share price follows this development, the tweets contain predictive information about the share price.
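
As an illustration of the idea in the preceding paragraph, the following minimal Python sketch aggregates labelled tweets into a daily net-sentiment series and computes its Pearson correlation with daily share price changes. It is not the framework used in the thesis: the tweet labels, dates and price changes are invented for illustration, and the names `daily_net_sentiment` and `pearson` are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical, hand-made data: one (date, sentiment label) pair per tweet.
tweets = [
    ("2015-03-02", "pos"), ("2015-03-02", "pos"), ("2015-03-02", "neg"),
    ("2015-03-03", "neg"), ("2015-03-03", "neg"),
    ("2015-03-04", "pos"), ("2015-03-04", "pos"), ("2015-03-04", "pos"),
    ("2015-03-05", "pos"), ("2015-03-05", "neg"),
]

# Hypothetical daily share price changes for the same days (made-up values).
price_change = {
    "2015-03-02": 0.4,
    "2015-03-03": -1.1,
    "2015-03-04": 0.9,
    "2015-03-05": 0.2,
}


def daily_net_sentiment(labelled_tweets):
    """Return {date: number of positive tweets minus number of negative tweets}."""
    pos, neg = Counter(), Counter()
    for day, label in labelled_tweets:
        (pos if label == "pos" else neg)[day] += 1
    return {day: pos[day] - neg[day] for day in sorted(set(pos) | set(neg))}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


net = daily_net_sentiment(tweets)
days = sorted(net)
r = pearson([net[d] for d in days], [price_change[d] for d in days])
print(f"net sentiment per day: {net}")
print(f"Pearson r = {r:.3f}")
```

A same-day correlation of this kind only shows that the two series move together. Whether tweets carry predictive information requires comparing sentiment on one day with price movements on later days.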

A literature review found growing interest across different industries in analysing SM data. Much research studies SM from various perspectives. Many studies try to determine the impact of online marketing campaigns or to quantify the value of social capital. Others, in the area of behavioural economics, focus on the influence of SM on decision-making. There are studies that try to predict financial indicators such as the Dow Jones Industrial Average (DJIA). However, the literature review indicated that no study correlates the sentiment polarity of tweets about products and companies with the share price of the company.

The theoretical framework used in this study is based on Computational Social Science (CSS) and Big Data. Supporting theories of CSS are Social Media Mining (SMM) and sentiment analysis. Supporting theories of Big Data are Data Mining (DM) and Predictive Analysis (PA). Machine learning (ML) techniques have been adopted to analyse and classify the tweets.

In the first stage of the study, a corpus of tweets was collected, pre-processed and then analysed for sentiment polarity towards Apple Inc., the iPad and the iPhone. Several datasets were created using different pre-processing and analysis methods. The tweet frequencies were then represented as time series. Each tweet time series was analysed against the share price time series using the Granger causality test to determine whether it has predictive information about the share price time series over the same period of time. For this study, several PA techniques were evaluated on the tweets to predict the Apple share price.
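
To make the Granger causality step concrete, here is a minimal pure-Python sketch, not the thesis implementation (which used R). It assumes a single lag and computes only the F-statistic that compares a model of the effect series on its own past with a model that also includes the past of the candidate cause. The data are synthetic, and the names `ols_rss`, `granger_f`, `sentiment` and `returns` are hypothetical.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit via the normal equations."""
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * w for v, w in zip(a[row], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(cause, effect, lag=1):
    """F-statistic for the hypothesis that `cause` Granger-causes `effect`."""
    steps = range(lag, len(effect))
    targets = [effect[t] for t in steps]
    # Restricted model: intercept plus the effect's own lagged values.
    restricted = [[1.0] + [effect[t - j] for j in range(1, lag + 1)]
                  for t in steps]
    # Unrestricted model: additionally the lagged values of the cause.
    unrestricted = [row + [cause[t - j] for j in range(1, lag + 1)]
                    for row, t in zip(restricted, steps)]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic data: returns follow the previous day's sentiment plus small noise.
rng = random.Random(0)
sentiment = [rng.gauss(0, 1) for _ in range(200)]
returns = [0.0] + [0.8 * sentiment[t - 1] + rng.gauss(0, 0.1)
                   for t in range(1, 200)]

f_forward = granger_f(sentiment, returns)   # sentiment -> returns
f_reverse = granger_f(returns, sentiment)   # returns -> sentiment
print(f"F(sentiment -> returns) = {f_forward:.1f}")
print(f"F(returns -> sentiment) = {f_reverse:.1f}")
```

On this constructed data the forward statistic is far larger than the reverse one, because the dependence was built in. On real tweet and price series the statistic would be compared against the F distribution with (lag, dof) degrees of freedom to obtain a p-value, which statistical packages report directly.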

To collect and analyse the data, a framework was developed based on the LingPipe (LingPipe 2015) Natural Language Processing (NLP) toolkit for sentiment analysis and on R, a language and environment for statistical computing, for correlation analysis. Twitter provides an API (Application Programming Interface) for accessing and collecting its data programmatically.

Although no clear correlation could be determined, at least one dataset was shown to have some predictive information on the development of the Apple share price. The other datasets showed no predictive capability. There are many data analysis and PA techniques, and those applied in this study did not indicate a direct correlation. However, some results suggest that this is due to noise or asymmetric distributions in the datasets.

The study contributes to the literature by providing a quantitative analysis of SM data, for example tweets about Apple and its most popular products, the iPad and iPhone. It shows how SM data can be used for PA. It contributes to the literature on Big Data and SMM by showing how SM data can be collected, analysed and classified, and by exploring whether the share price of a company can be determined based on sentiment time series. It may ultimately lead to better decision-making, for instance for investments or share buybacks.

Keywords: social media; Twitter; economic value
ANZSRC Field of Research 2020: 469999. Other information and computing sciences not elsewhere classified; 460908. Information systems organisation and management
Byline Affiliations: School of Management and Enterprise
