Doctor of Philosophy
Statistics and Actuarial Sciences
This thesis introduces a novel and interpretable index of increase, defined mathematically through the distance between a given function and the set of non-increasing functions. Unlike widely used traditional statistical methods for analyzing relationships between variables, the index does not rely on assumptions such as linearity, normality, and monotonicity, which may not be satisfied. Hence, it can be applied directly to pairs of data points to measure and compare non-linear, asymmetric, and non-monotonic relationships between two variables.
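To make the idea concrete, the following is a minimal sketch of one simple discrete form that an index of this kind can take: the share of the total up-and-down movement of a sequence that is upward. This form is an assumption made for illustration and is not taken from the thesis text, so the definition developed in the thesis may differ in its details. The function name `index_of_increase` is hypothetical, and the input is assumed to be a list of observed values already ordered by the explanatory variable.

```python
def index_of_increase(values):
    """Share of the total movement of `values` that is upward.

    Illustrative assumption, not the thesis's stated definition:
    `values` are observations ordered by the explanatory variable.
    Returns 1.0 for a non-decreasing, non-constant sequence and
    0.0 for a non-increasing, non-constant sequence.
    """
    # Consecutive differences between neighbouring observations.
    increments = [b - a for a, b in zip(values, values[1:])]
    # Total upward movement: the sum of the positive increments.
    upward = sum(d for d in increments if d > 0)
    # Total movement in either direction.
    total = sum(abs(d) for d in increments)
    if total == 0:
        raise ValueError("constant sequence: the index is undefined")
    return upward / total


print(index_of_increase([0, 2, 1, 3]))  # increments 2, -1, 2 -> 4 / 5 = 0.8
```

Under this sketch, values near 1 indicate a mostly increasing pattern and values near 0 a mostly decreasing one, with no linearity or monotonicity assumed.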
We begin with a review of the literature and background knowledge in Chapter 2.
In Chapter 3, we propose a distance-based index of increase, describe its properties in detail, and demonstrate its benefits by applying it to an educational dataset. The application shows how the index can be used and interpreted. We also propose several modifications for different scenarios, such as subgroup analysis. Lastly, we provide a step-by-step implementation guide for researchers and practitioners without a statistical background.
In Chapter 4, we investigate two extensions of the index of increase, which quantify the interchangeability between variables. We discuss their use in the context of developing curricula, accompanied by extensive graphical and numerical illustrations.
In Chapter 5, we introduce and explore an empirical index of increase that works in both deterministic and random environments, thus allowing us to assess the monotonicity of functions that are prone to random measurement errors. We prove consistency of the empirical index and show how its rate of convergence is influenced by the deterministic and random parts of the data. In particular, the obtained results suggest the frequency at which observations should be taken in order to reach any pre-specified level of estimation precision. We illustrate the index using data arising from purely deterministic and error-contaminated functions, which may or may not be monotonic.
Finally, in Chapter 6, we summarize our main results and outline potential future work.
Summary for Lay Audience
Traditional statistical methods for analyzing relationships between variables often rely on assumptions such as linearity, normality, and monotonicity, which may not be satisfied. This is the case, for example, with curves depicting sales versus prices or exports versus economic growth, which are hardly monotonic, let alone linear. Thus, the use of traditional statistical tools becomes problematic. Furthermore, random noise or random measurement errors frequently contaminate data, blurring the true relationships and leading to misrepresented results.
In this thesis, we explore an index of increase and its estimator, which work in both deterministic and random environments, thus enabling the assessment of monotonicity of functions that might be exposed to random noise. The index and its estimator allow us to quantify non-linear, asymmetric, and non-monotonic relationships between variables. We illustrate the theoretical results using data arising from deterministic and error-contaminated functions, which may or may not be monotonic. We also apply the index of increase, with suitable modifications, to educational datasets to illustrate its use cases and potential extensions. Finally, we summarize the contributions of this thesis and outline potential future work.
Chen, Lingzhi, "Making Sense of Noisy Data: Theory and Applications" (2021). Electronic Thesis and Dissertation Repository. 7826.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License