Asymptotic Theory for Random Forest

Master's thesis defense: Peter Lund Andersen

Title: Asymptotic Theory for Random Forest
Based upon subsampling and honesty

Abstract: This thesis explores recent developments in asymptotic theory for random forests, combining classical statistical methods with the analysis of tree-based predictors to establish asymptotic results. We start with a brief overview of decision trees and the ensemble methods built from them. The first major objective is to develop asymptotic results for a large class of estimators called U-statistics, which we then extend to random forests based on subsampling. The next part of the thesis is proof-heavy and investigates the properties of decision trees that lead to asymptotic normality for random forests. With regularization and a particular type of independence in the trees, referred to as "honesty", we establish both asymptotic normality and a centered sampling distribution. Finally, we verify the asymptotic distribution results in a simulation study, which demonstrates asymptotic normality across a wide range of random forests. The simulation study also investigates inference with random forests and its potential limitations.

Supervisor: Munir Hiabu
External examiner: Sören Möller