Electronic Thesis and Dissertation Repository


Master of Science


Computer Science


Dr. Charles Ling


The task of forecasting stock performance is well studied with clear monetary motivations for those wishing to invest. A large amount of research in the area of stock performance prediction has already been done, and multiple existing results have shown that data derived from textual sources related to the stock market can be successfully used towards forecasting. These existing approaches have mostly focused on short term forecasting, used relatively simple sentiment analysis techniques, or had little data available. In this thesis, we prepare over ten years worth of stock data and propose a solution which combines features from textual yearly and quarterly filings with fundamental factors for long term stock performance forecasting. Additionally, we develop a method of text feature extraction and apply feature selection aided by a novel evaluation function. We work with investment company Highstreet Inc. and create a set of models with our technique allowing us to compare the performance to their own models. Our results show that feature selection is able to greatly improve the validation and test performance when compared to baseline models. We also show that for 2015, our method produces models which perform comparably to Highstreet's hand-made models while requiring no expert knowledge beyond data preparation, making the model an attractive aid for constructing investment portfolios. Highstreet has decided to continue to work with us on this research, and our machine learning models can potentially be used in actual portfolio selection in the near future.