Abstracts

Jun Yang (Department of Mathematical Sciences, University of Copenhagen): 

Title: Complexity results for MCMC derived from quantitative bounds. 

Abstract: This paper considers how to obtain quantitative MCMC convergence bounds which can be translated into tight complexity bounds in high-dimensional settings. We propose a modified drift-and-minorization approach, which establishes generalized drift conditions defined on subsets of the state space. The subsets are called the “large sets”, and are chosen to rule out some “bad” states which have poor drift properties when the dimension of the state space gets large. Using the “large sets” together with a “fitted family of drift functions”, a quantitative bound can be obtained which can be translated into a tight complexity bound. As a demonstration, we analyze several Gibbs samplers and obtain complexity upper bounds for the mixing time. In particular, for one example of a Gibbs sampler related to the James–Stein estimator, we show that the number of iterations required for the Gibbs sampler to converge is constant under certain conditions on the observed data and the initial state. It is our hope that this modified drift-and-minorization approach can be employed in many other specific examples to obtain complexity bounds for high-dimensional Markov chains. Joint work with Jeffrey S. Rosenthal.
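
For readers less familiar with drift-and-minorization arguments, the following is a minimal sketch of a classical geometric drift condition restricted to a subset; the generalized drift conditions of the talk are more refined, and the notation below is assumed rather than taken from the paper. One asks for a drift function V ≥ 1, constants 0 < λ < 1 and b < ∞, and a small set C such that

\[
  \mathbb{E}\left[ V(X_{n+1}) \mid X_n = x \right] \;\le\; \lambda\, V(x) + b\, \mathbf{1}_{C}(x) \qquad \text{for all } x \in R,
\]

where R denotes the “large set” on which the bound is required to hold, chosen to exclude the “bad” states whose drift behaviour deteriorates as the dimension grows.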

Anne Helby Petersen (Section of Biostatistics, University of Copenhagen)

Title: Causal discovery for observational life course studies

Canceled: Therese Graversen (IT University of Copenhagen)

Title: Designing a Data Science Education

Abstract: The first bachelor programme in Data Science in Denmark opened at the IT University (ITU) in 2017 and has become highly popular – at the time of writing we have 93 students in their first year. I took over as head of the programme at the beginning of 2020, just in time to graduate the first cohort, and I have since had the pleasure of evaluating the programme and implementing major revisions. In this talk I will share some thoughts on designing good data science education programmes. I will also touch upon the pertinent question of “what is data science, actually?”, particularly from a statistician’s point of view.

Jonas Wallin (Department of Statistics, Lund University): 

Title: Gaussian Whittle–Matérn fields on metric graphs

Abstract: Random fields are popular models in statistics and machine learning for spatially dependent data on Euclidean domains. However, in many applications, data is observed on non-Euclidean domains such as street networks or river networks. In this case, it is much more difficult to construct valid random field models. In this talk, we discuss some recent approaches to modeling data in this setting, and in particular define a new class of Gaussian processes on compact metric graphs. The proposed models, the Whittle–Matérn fields, are defined via a stochastic partial differential equation on the compact metric graph and are a natural extension of Gaussian fields with Matérn covariance functions on Euclidean domains to the non-Euclidean metric graph setting. We discuss various properties of the models and show how to use them for statistical inference. Finally, we illustrate the model via an application to modeling traffic data. If time permits, we will also discuss how the processes can be modified so that they apply to directional networks.
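
As a rough illustration of the construction described above (a sketch under assumed notation, not a statement of the talk's exact formulation), the Whittle–Matérn field u on a compact metric graph Γ can be defined as the solution of a fractional-order stochastic partial differential equation of the form

\[
  (\kappa^2 - \Delta_{\Gamma})^{\alpha/2} (\tau u) = \mathcal{W} \quad \text{on } \Gamma,
\]

where the Laplacian is defined on the graph with suitable vertex conditions, W is Gaussian white noise, kappa > 0 controls the correlation range, tau > 0 the variance, and alpha the smoothness. On Euclidean domains the analogous equation characterizes Gaussian fields with Matérn covariance, which is what makes the graph construction a natural extension.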

Susanne Ditlevsen (Department of Mathematical Sciences, University of Copenhagen)

Title: Estimation of time to a tipping point

Abstract: In recent years there has been an increasing awareness of the risks of collapse or tipping points in a wide variety of complex systems, ranging from human medical conditions, pandemics and ecosystems to climate, finance and society. They are characterized by variations on multiple spatial and temporal scales, leading to incomplete understanding or uncertainty in modelling of the dynamics. Even in systems where the governing equations are known, such as the atmospheric flow, predictability is limited by the chaotic nature of the system and by the limited resolution of observations and computer simulations. In order to make progress in analyzing these complex systems, treating unresolved scales and chaotic dynamics beyond the horizon of prediction as stochastic has proven efficient and successful. When complex systems undergo critical transitions by changing a control parameter through a critical value, a structural change in the dynamics occurs: the previously statistically stable state ceases to exist and the system moves to a different statistically stable state. To establish under which conditions an early warning of tipping can be given, we consider a simple stochastic model, which can be regarded as a generic representative of many complex two-state systems. We show how this provides a robust statistical method for predicting the time of tipping. The method is used to give a warning of a forthcoming collapse of the Atlantic meridional overturning circulation.

References: Peter D. Ditlevsen and Susanne Ditlevsen (2023), Warning of a forthcoming collapse of the Atlantic meridional overturning circulation. Nat Commun 14, 4254
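
For intuition only, and as an assumption on my part rather than the precise model of the talk or the cited paper, a generic stochastic model for a system approaching a tipping point is a saddle-node normal form with a slowly drifting control parameter lambda(t):

\[
  dX_t = \left( \lambda(t) + a\,(X_t - m)^2 \right) dt + \sigma\, dB_t, \qquad a > 0,
\]

where a stable and an unstable state coexist while lambda(t) < 0 and collide when lambda(t) reaches zero, so estimating the time at which lambda crosses zero amounts to estimating the tipping time; increasing variance and autocorrelation of X_t as lambda(t) approaches zero provide the associated early-warning signals.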

Christian Pipper (Novo Nordisk A/S): 

Title: Properties of a confirmatory two-stage adaptive procedure for assessing average bioequivalence

Abstract: We investigate a confirmatory two-stage adaptive procedure for assessing average bioequivalence and provide some insights into its theoretical properties. Effectively, we perform Two One-Sided Tests (TOST) to reach an overall decision about each of the two traditional null hypotheses involved in declaring average bioequivalence. The tests are performed as combination tests, separately for each hypothesis, based on the corresponding pair of stagewise p-values. Features of the procedure include a built-in futility rule, sample size reassessment, and the ability to simultaneously assess average bioequivalence with respect to multiple endpoints while controlling the familywise error rate. To facilitate inference at the end of the trial, we consider confidence limits that match the decision reached on each one-sided hypothesis and provide theory ensuring their appropriateness. The performance is assessed by simulation in the context of planning a study to compare two different administrations of an antibody treatment.
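
The combination-test step can be illustrated with a short sketch. The code below assumes the inverse-normal combination function with equal, prespecified weights and ignores early stopping, futility and sample size reassessment, so it illustrates only the p-value combination for one endpoint, not the full procedure described in the talk.

    import numpy as np
    from scipy.stats import norm

    def inverse_normal_combination(p1, p2, w1=0.5, w2=0.5):
        # Inverse-normal combination of two stagewise one-sided p-values;
        # the weights w1, w2 (summing to 1) are fixed at the design stage.
        z = np.sqrt(w1) * norm.isf(p1) + np.sqrt(w2) * norm.isf(p2)
        return norm.sf(z)

    def tost_two_stage(p_stage1, p_stage2, alpha=0.05):
        # p_stage1, p_stage2: (p_lower, p_upper) pairs of one-sided p-values
        # from stage 1 and stage 2 for one endpoint. Average bioequivalence
        # is declared only if BOTH combined one-sided tests reject at level alpha.
        p_lower = inverse_normal_combination(p_stage1[0], p_stage2[0])
        p_upper = inverse_normal_combination(p_stage1[1], p_stage2[1])
        return (p_lower <= alpha) and (p_upper <= alpha)

    # Example: both one-sided hypotheses show moderate evidence in each stage.
    print(tost_two_stage((0.04, 0.03), (0.02, 0.05)))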

Erin Evelyn Gabriel (Section of Biostatistics, University of Copenhagen)

Title: Propensity score weighting plus an adjusted proportional hazards model does not equal doubly robust away from the null

Abstract: Recently, applied works have combined commonly used survival analysis modeling methods, such as the multivariable Cox model, with propensity score weighting, via the weighting of the (partial) scores, with the intention of forming a doubly robust estimator that is consistent when either the survival outcome model or the propensity score model is correctly specified. This combination does not, in general, produce a doubly robust estimator, even after regression standardization, when there is truly a causal effect. This lack of robustness for the semiparametric Cox model, the Weibull proportional hazards model, and a simple proportional hazards flexible parametric model can easily be demonstrated via simulation. However, when there is truly no causal effect, these models, as well as more general proportional hazards models, will be consistent when weighted via the correctly specified propensity score, even if the outcome model is misspecified. We prove that inverse probability of treatment (IPT) weighting of the (partial) scores from a general proportional hazards survival model is consistent, both for the fitted log hazard and for the standardized survival difference, under the null of no causal effect and under particular censoring mechanisms, if either the propensity score model or the outcome model is correctly specified and contains all confounders.
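
To make the weighting of the (partial) scores concrete, here is a minimal sketch in assumed notation (not the general form treated in the talk): with binary treatment A_i, confounders L_i and estimated propensity score e(L_i), a common choice of inverse probability of treatment weight is

\[
  w_i = \frac{A_i}{\hat e(L_i)} + \frac{1 - A_i}{1 - \hat e(L_i)},
\]

and the weighted Cox partial score for the treatment effect beta solves

\[
  \sum_{i} w_i\, \delta_i \left\{ A_i - \frac{\sum_j w_j\, Y_j(t_i)\, A_j\, e^{\beta A_j}}{\sum_j w_j\, Y_j(t_i)\, e^{\beta A_j}} \right\} = 0,
\]

where delta_i is the event indicator, t_i the observed time and Y_j(t) the at-risk indicator. The result described above concerns when the solution of such weighted score equations is consistent under the null of no causal effect.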

Munir Hiabu (Department of Mathematical Sciences, University of Copenhagen)

Title: Unifying local and global model explanations by functional decomposition of low dimensional structures

Abstract: We consider a global representation of a regression or classification function by decomposing it into the sum of main and interaction components of arbitrary order. We propose a new identification constraint that allows for the extraction of interventional SHAP values and partial dependence plots, thereby unifying local and global explanations. With our proposed identification, a feature's partial dependence plot corresponds to the main effect term plus the intercept. The interventional SHAP value of feature k is a weighted sum of the main component and all interaction components that include k, with the weights given by the reciprocal of the component's dimension. This brings a new perspective to local explanations such as SHAP values, which were previously motivated by game theory only. We show that the decomposition can be used to reduce direct and indirect bias by removing all components that include a protected feature. Lastly, we motivate a new measure of feature importance. In principle, our proposed functional decomposition can be applied to any machine learning model, but exact calculation is only feasible for low-dimensional structures or ensembles of those. We provide an algorithm and an efficient implementation for gradient-boosted trees (xgboost) and random planted forest. The proposed methods are implemented in an R package, available at https://github.com/PlantedML/glex. This is joint work with Joseph T. Meyer and Marvin N. Wright.
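
In assumed notation (a sketch restating the statements above, not necessarily the exact conventions of the paper), the decomposition writes the prediction function as

\[
  \hat f(x) = m_0 + \sum_{\emptyset \neq S \subseteq \{1, \dots, p\}} m_S(x_S),
\]

and, under the proposed identification, the partial dependence plot of feature k corresponds to m_0 + m_{\{k\}}(x_k), while the interventional SHAP value of feature k is

\[
  \phi_k(x) = \sum_{S \ni k} \frac{m_S(x_S)}{|S|}.
\]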

Claudia Strauch (Department of Mathematics, Aarhus University)

Title: On the statistical analysis of high-dimensional stochastic processes

Abstract: Rapidly increasing computational power has led to a massive increase in research into the development of efficient algorithms for various tasks based on high-dimensional and complex data. While there is now a fairly well-established theory in high-dimensional statistics for i.i.d. data, results for more involved data-generating mechanisms are relatively scarce. This is particularly true for the statistical analysis of high-dimensional stochastic processes. To fill this gap in our understanding, it is necessary to derive new statistical methods and estimators suitable for the analysis of high-dimensional stochastic processes. Furthermore, quantifying the performance of the estimators requires making connections to basic probabilistic concepts, such as concentration inequalities or rates of ergodicity, and developing extensions of them to overcome the particular challenges posed by high-dimensional stochastic processes. In this talk, we will give an overview of existing results and useful tools for developing new techniques, and we will discuss future developments. Based on joint work with Niklas Dexheimer and Lukas Trottner.