Electronic Thesis and Dissertation Repository

Thesis Format



Doctor of Philosophy


Applied Mathematics


Today, many brokerage firms use computer algorithms to make trading decisions, submit orders, and manage orders after submission. Such algorithmic trading is needed to maximize execution speed and thereby minimize the cost, market impact, and risk associated with trading large volumes of securities. Traders place limit orders to buy or sell a given amount of a security at a specific price on an exchange. These buy and sell orders accumulate in the `order book' until they either find a counter-party for execution or are canceled. All participants can also issue market orders to buy or sell at the best available prices; these orders are executed immediately on a `first come, first served' basis.

Using high-frequency trading (HFT) data from the Toronto Stock Exchange, provided by the TMX Group, we explore a data-driven model to detect a form of high-frequency price manipulation known as spoofing. A spoofer manipulates prices by placing limit orders that they do not intend to have executed, in order to mislead other traders about the available volume of shares. The hope is that this will cause prices to move in the spoofer's favor. We show that a generalized form of volume imbalance is associated with price movements and that it can be manipulated by spoofing strategies. The literature argues that spoofing strategies are detrimental to the integrity of markets and that regulators need new models to combat them.

The data sets we use are large enough to qualify for the moniker `Big Data'. The limit order book must be reconstructed each time an order arrives for a particular stock. This process is implemented on a distributed data system using PySpark, since it could not be done efficiently on a local machine. We discuss some issues and complications that arise from working with very large data sets of this type.
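To make the reconstruction step concrete, the following is a minimal single-machine sketch in plain Python, not the distributed PySpark implementation described above. The event layout `(action, order_id, side, price, size)`, the function names `rebuild_book` and `best_quotes`, and the use of integer prices in cents are all assumptions made for illustration; they are not the TMX data format, and executions and order modifications are omitted for brevity.

```python
from collections import defaultdict


def rebuild_book(events):
    """Replay add/cancel events and return resting volume per price level.

    `events` is an iterable of (action, order_id, side, price, size) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    orders = {}  # order_id -> (side, price, size) for orders still resting
    book = {"buy": defaultdict(int), "sell": defaultdict(int)}
    for action, order_id, side, price, size in events:
        if action == "add":
            orders[order_id] = (side, price, size)
            book[side][price] += size
        elif action == "cancel" and order_id in orders:
            # A cancel refers to a stored order, so its own fields are ignored.
            o_side, o_price, o_size = orders.pop(order_id)
            book[o_side][o_price] -= o_size
            if book[o_side][o_price] == 0:
                del book[o_side][o_price]  # drop empty price levels
    return book


def best_quotes(book):
    """Return (best bid, best ask), using None for an empty side."""
    best_bid = max(book["buy"]) if book["buy"] else None
    best_ask = min(book["sell"]) if book["sell"] else None
    return best_bid, best_ask


# Prices are integer cents to avoid floating-point keys.
events = [
    ("add", 1, "buy", 1000, 100),
    ("add", 2, "sell", 1002, 200),
    ("add", 3, "buy", 999, 50),
    ("cancel", 1, None, None, None),
]
book = rebuild_book(events)
print(best_quotes(book))  # (999, 1002)
```

Because the state of the book after each event depends on every earlier event for that stock, the replay is sequential within a stock; stocks can be processed independently of one another, which is one reason such a task lends itself to a distributed system.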

We define a generalized volume imbalance as the weight in a convex combination of two price change distributions, which forms our price change model. Price changes for different stocks happen on different time scales. We remedy this issue by comparing stocks over time intervals on which they all have the same variance in their price change distributions. Statistical and goodness-of-fit tests using Cramér's V statistic and the Kullback–Leibler divergence, respectively, are implemented to validate our model across a large collection of stocks. The model is then used to test the sensitivity of the limit order book to spoofing and to derive relationships between the spoofer's constraints and their optimal decisions. Regulators could use these results to flag periods of the trading day in which market conditions make spoofing possible, and so improve market surveillance.
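The structure of the price change model can be sketched as follows. This is a toy illustration only: the two component distributions, the three-point support of price changes (-1, 0, +1 ticks), the weight value, and the names `mix` and `kl_divergence` are assumptions, and the direction in which the Kullback–Leibler divergence is taken is likewise an assumption rather than the choice made in this work.

```python
import math


def mix(weight, p_a, p_b):
    """Convex combination weight * p_a + (1 - weight) * p_b of two
    discrete distributions defined on the same support."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(p_a) != len(p_b):
        raise ValueError("distributions must share the same support")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_a, p_b)]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i == 0.0:
                return math.inf  # p puts mass where q has none
            total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical component distributions over price changes of -1, 0, +1 ticks.
p_up = [0.1, 0.3, 0.6]    # component favoring upward moves
p_down = [0.6, 0.3, 0.1]  # component favoring downward moves

imbalance = 0.75  # illustrative weight in [0, 1]
model = mix(imbalance, p_up, p_down)

# Goodness of fit against a made-up empirical distribution.
empirical = [0.25, 0.30, 0.45]
print(model, kl_divergence(empirical, model))
```

In this sketch, a weight near 1 pulls the predicted price change distribution toward the first component and a weight near 0 pulls it toward the second, which is why inflating the apparent volume on one side of the book could shift the predicted direction of the next price change.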

Summary for Lay Audience

Price manipulation is detrimental to the integrity of financial markets. Price manipulation strategies have always existed, but since the adoption of computer systems, new forms of price manipulation have been emerging. In the past, traders manipulated prices by injecting false or misleading information into the market in order to capitalize on the resulting price movements, and high-frequency trading is not immune to these tactics. Traders can `spoof' the market by strategically submitting orders to an exchange to buy or sell a set number of shares while never actually intending to allow those orders to be executed. The idea is that other traders see these spoofing orders, act on the misleading information, and move prices in the spoofer's favour. Using high-frequency order data from the Toronto Stock Exchange, provided by the TMX Group, we explore a data-driven stock price model that is influenced by the orders arriving at the exchange. From our model we can calculate the average costs associated with a spoofer's optimal decisions to manipulate the market. We analyze this decision process to gain insight into how regulators can combat this type of illegal trading behaviour.