Program - Statistics in Complex Systems

Wednesday, April 11

Time Speaker Title
9.00-9.10 Michael Sørensen Welcome
9.10-10.00 Piotr Zwiernik Totally positive exponential families, graphical models, and convex optimization
10.00-10.50 Martin Jaggi Learning in a Distributed and Heterogeneous Environment
10.50-11.10 Coffee break
11.10-12.00 Caroline Uhler Maximum likelihood estimation for totally positive log-concave densities
12.00-16.00 Lunch and break
16.00-16.50 Kasper Daniel Hansen Co-expression networks are associated with the role of the epigenetic machinery in neurological dysfunction
17.10-18.00 Cecilia Holmgren Split trees and Galton-Watson trees: Two important classes of random trees
18.00- Symposium dinner

Thursday, April 12

Time Speaker Title
9.00-9.50 Markus Reiss A nonparametric problem for SPDEs
10.00-10.50 Subhash Lele Covariance models for spatio-temporal processes on a regular grid: Flexible, yet computationally simple
10.50-11.10 Coffee break
11.10-12.00 Sonja Greven Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains
12.00-13.00 Lunch
13.00- Louisiana

Friday, April 13

Time Speaker Title
9.00-9.50 Rajen Shah Low-priced lunch in conditional independence testing
10.00-10.25 Søren Wengel Mogensen Markov equivalence in graphical models for partially observed stochastic processes
10.25-10.50 Jacob Østergaard Inferring network connectivity using cointegration analysis
10.50-11.10 Coffee break
11.10-12.00 Marc Hoffmann Nonparametric estimation in age-dependent models in a large population limit
12.00-14.00 Lunch and break
14.00-14.50 Mathias Drton Causal Discovery in Linear Non-Gaussian Models

Abstracts

Piotr Zwiernik, Totally positive exponential families, graphical models, and convex optimization

Probability distributions that are multivariate totally positive of order 2 (MTP2) appeared in the theory of positive dependence and in statistical physics through the celebrated FKG inequality. The MTP2 property is stable under marginalization and conditioning, and it appears naturally in various probabilistic graphical models with hidden variables. Exponential family models with the MTP2 property admit a unique maximum likelihood estimator. In the Gaussian case, the MLE exists even in high-dimensional settings where p >> n, and it leads to sparse solutions.
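
As an illustrative sketch (not from the talk): a Gaussian distribution is MTP2 exactly when its precision matrix K = Sigma^{-1} is an M-matrix, i.e. all off-diagonal entries of K are non-positive. The function name below is mine, chosen for illustration.

```python
import numpy as np

def is_gaussian_mtp2(sigma, tol=1e-10):
    """Check the MTP2 property for a Gaussian with covariance `sigma`:
    the inverse covariance must have non-positive off-diagonal entries."""
    k = np.linalg.inv(sigma)
    off_diag = k - np.diag(np.diag(k))
    return bool(np.all(off_diag <= tol))

# A covariance whose inverse is an M-matrix: MTP2 holds.
sigma_pos = np.array([[2.0, 1.0, 0.5],
                      [1.0, 2.0, 1.0],
                      [0.5, 1.0, 2.0]])

# Negative correlation: the precision has a positive off-diagonal entry,
# so the distribution is not MTP2.
sigma_neg = np.array([[1.0, -0.5],
                      [-0.5, 1.0]])
```

This sign constraint on the precision matrix is what produces sparsity of the Gaussian MLE mentioned in the abstract.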


Martin Jaggi, Learning in a Distributed and Heterogeneous Environment

We discuss recent directions in optimization algorithms used for the training of machine learning systems, such as generalized linear models (regression, classification) and deep learning. For distributed optimization across many machines, as well as for integrated compute devices with varying compute and memory capacities (such as GPUs paired with regular compute nodes), we present ideas from convex optimization which help accelerate training. In particular, we study importance sampling methods and primal-dual gap techniques for these purposes.


Caroline Uhler, Maximum likelihood estimation for totally positive log-concave densities

We consider the problem of non-parametric density estimation under the shape constraint that the logarithm of the density function is concave and supermodular (i.e., the density is log-concave and totally positive). Given n independent and identically distributed random vectors, we first prove that the MLE under these shape constraints exists with probability one. We then characterize the domain of the MLE and show that it is in general larger than the convex hull of the observations. If the observations are 2-dimensional or binary, we show that the logarithm of the MLE is a tent function and we provide a conditional gradient method for computing the MLE. In the more general setting where d>2, we provide a conditional gradient method under a more restrictive shape constraint known as translation supermodularity.


Kasper Daniel Hansen, Co-expression networks are associated with the role of the epigenetic machinery in neurological dysfunction

I will discuss a recent application of networks to gain insight into a complex biological system. We focus on the epigenetic machinery, by which we mean genes encoding proteins that are directly involved in DNA methylation and histone modifications. These two biological processes are fundamental to mammalian development and reflect a molecular program which is active in every cell in the human body. We use large-scale studies of genetic and expression patterns in humans to increase our understanding of these basic processes, in particular their involvement in human disease. Using the GTEx resource, a large study of gene expression in human tissues, we used co-expression networks to show a surprising and intriguing association between co-expression of components of the epigenetic machinery and neurological dysfunction. Specifically, we constructed co-expression networks separately for each tissue and found that a subset of the epigenetic machinery is present in the same network across tissues. This subset is strikingly enriched for genes for which neurological dysfunction is a Mendelian phenotype, suggesting that this network is particularly important for normal neurological development.


Cecilia Holmgren, Split trees and Galton-Watson trees: Two important classes of random trees

I will talk about two important classes of random trees, split trees and Galton-Watson trees. Split trees were introduced by Devroye (1998) to unify many important random trees of logarithmic height. They are interesting not least because of their usefulness as models of sorting algorithms in computer science; for instance, the well-known Quicksort algorithm (introduced by Hoare in 1960) can be depicted as a binary search tree. Galton-Watson trees were introduced as early as 1875 to describe under which conditions a (noble) family name would die out or survive forever. The conditioned Galton-Watson trees (also called simply generated trees) are conditioned on a given total number of vertices and represent important random trees of non-logarithmic height. Examples are ordered (plane) trees, Cayley trees and binary trees. I will give a brief general introduction to the field; the main focus of my talk will then be to discuss some of my own results for these large classes of random trees (e.g., on the total path length, the number of cuttings, the number of inversions, and bootstrap percolation) and some of the methods that I have used (e.g., renewal theory and Aldous's Brownian continuum random tree).
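
As a small illustrative sketch (not from the talk): the binary search tree built from a uniformly random permutation, which is the tree underlying Quicksort's recursion, has logarithmic height. The helper below is mine, written only to illustrate this.

```python
import random

def bst_height(keys):
    """Insert `keys` in order into a binary search tree; return its height
    (number of nodes on the longest root-to-leaf path)."""
    root = None  # each node is a list: [key, left_child, right_child]
    height = 0
    for k in keys:
        depth = 1
        if root is None:
            root = [k, None, None]
        else:
            node = root
            while True:
                idx = 1 if k < node[0] else 2
                if node[idx] is None:
                    node[idx] = [k, None, None]
                    depth += 1
                    break
                node = node[idx]
                depth += 1
        height = max(height, depth)
    return height

random.seed(0)
n = 10_000
keys = random.sample(range(10 * n), n)  # a uniformly random permutation
h = bst_height(keys)
# The height of a random BST concentrates around c * ln(n) with c ~ 4.31,
# i.e. roughly 40 here -- in contrast to conditioned Galton-Watson trees,
# whose height is of order sqrt(n).
```

Inserting sorted keys instead yields a degenerate chain of height n, showing why the randomness of the permutation is essential to Quicksort's average-case behaviour.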


Markus Reiss, A nonparametric problem for SPDEs

In the first part, drift estimation for stochastic ordinary and partial differential equations (SODEs/SPDEs) will be reviewed. The basic difference is that simple drift parameters can be identified from continuous SPDE observations, but not from SODE observations. In the second part, we consider the specific problem of estimating the space-dependent (nonparametric) diffusivity of a stochastic heat equation from time-continuous observations with space resolution h. The rather counterintuitive result and its efficiency as h -> 0 will be discussed. This is joint work with Randolf Altmeyer.


Subhash Lele, Covariance models for spatio-temporal processes on a regular grid: Flexible, yet computationally simple

The mountain pine beetle (MPB) is a major forest pest in North America. Due to climate change and other anthropogenic factors, MPBs are spreading across large regions of Canada, substantially affecting forestry and the forestry-related economy in British Columbia and Alberta. Understanding the spread of MPB is important for devising biological control systems. The data on MPB are available on a regular grid across space and time. Two important impediments to modelling large spatial or spatio-temporal data are (a) specification of the spatial covariance structure, and (b) conducting likelihood inference, which involves computing the determinant and inverse of large matrices. We utilize a flexible class of covariance models, called separable covariance models, to model the dependence structure much more flexibly than the commonly used isotropic models. These models allow us to reduce the computational complexity by several orders of magnitude and, at the same time, to increase model flexibility by allowing geometric and other kinds of anisotropies. We will discuss the statistical and computational implications of separable covariance models for analyzing large amounts of spatio-temporal Gaussian and non-Gaussian data. This is joint work with Dean Koch and Mark Lewis.
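
A minimal sketch of the computational point, under stated assumptions and not the speakers' code: a separable space-time covariance is a Kronecker product Sigma = T kron S, so the determinant and inverse needed for the Gaussian likelihood can be computed from the small factors rather than the full (nt*ns) x (nt*ns) matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    """Generate a random symmetric positive-definite matrix (for illustration)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

nt, ns = 4, 6                      # number of time points and spatial grid cells
T, S = random_spd(nt), random_spd(ns)
sigma = np.kron(T, S)              # full 24 x 24 separable covariance

# Log-determinant from the factors: logdet(T kron S) = ns*logdet(T) + nt*logdet(S)
logdet_fast = ns * np.linalg.slogdet(T)[1] + nt * np.linalg.slogdet(S)[1]
logdet_full = np.linalg.slogdet(sigma)[1]

# Inverse from the factors: inv(T kron S) = inv(T) kron inv(S)
inv_fast = np.kron(np.linalg.inv(T), np.linalg.inv(S))
```

For a grid with thousands of cells and time points, factorizing two small matrices instead of one huge one is the source of the orders-of-magnitude speed-up mentioned above.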


Sonja Greven, Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains

We consider principal component analysis for multivariate functional data on different domains that may differ in dimension. This allows us to investigate the main joint modes of variation and joint dimension reduction for several curves and images. The theoretical basis for the approach is given in terms of a Karhunen-Loève Theorem. For the practically relevant case of a finite Karhunen-Loève representation, a relationship between univariate and multivariate functional principal component analysis is established. This offers an estimation strategy to calculate multivariate functional principal components and scores based on their univariate counterparts. For the resulting estimators, asymptotic results are derived. The approach can be extended to finite univariate expansions in general, not necessarily orthonormal bases. It is also applicable to sparse functional data or data with measurement error, as naturally occur in longitudinal studies. The new method is shown to be competitive with existing approaches for the special case of densely observed functional data on a common one-dimensional domain. The motivating application is a neuroimaging study, where the goal is to explore how longitudinal trajectories of a neuropsychological test score covary with FDG-PET brain scans at baseline. A flexible R implementation in the packages MFPCA and funData is available on CRAN. This is joint work with Clara Happ.


Rajen Shah, Low-priced lunch in conditional independence testing

It is a common saying that testing for conditional independence, i.e., testing whether X is independent of Y given Z, is a hard statistical problem if Z is a continuous random variable. We provide a formalisation of this statement and show that a test with correct size uniformly over all null distributions cannot have power against any alternative. Given the non-existence of a uniformly valid conditional independence test, we argue that tests must be designed so that their suitability for a particular problem setting may be judged easily. To address this need, we propose to nonlinearly regress X on Z and Y on Z, and then compute a test statistic based on the sample covariance between the residuals, which we call the generalised covariance measure (GCM). We prove that the validity of this form of test relies almost entirely on the weak requirement that the regression procedures are able to estimate the conditional means of X given Z and of Y given Z at a slow rate. While our general procedure can be tailored to the setting at hand by combining it with any regression technique, we develop theoretical guarantees for kernel ridge regression. A simulation study shows that the test based on the GCM is competitive with state-of-the-art conditional independence tests.
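
A minimal sketch of the GCM idea, with my own hand-rolled kernel ridge regression and function names chosen for illustration: regress X on Z and Y on Z, then form the normalised sample covariance of the residual products, which is asymptotically standard normal under the null hypothesis that X is independent of Y given Z.

```python
import numpy as np

def kernel_ridge_fit_predict(z, target, lam=1e-2, bandwidth=1.0):
    """Fit an RBF kernel ridge regression of `target` on `z`; return fitted values."""
    d2 = (z[:, None] - z[None, :]) ** 2
    K = np.exp(-d2 / (2 * bandwidth ** 2))
    alpha = np.linalg.solve(K + lam * np.eye(len(z)), target)
    return K @ alpha

def gcm_statistic(x, y, z):
    """Generalised covariance measure: normalised mean of residual products."""
    rx = x - kernel_ridge_fit_predict(z, x)   # residuals of X regressed on Z
    ry = y - kernel_ridge_fit_predict(z, y)   # residuals of Y regressed on Z
    r = rx * ry                               # products of residuals
    return np.sqrt(len(r)) * r.mean() / r.std()

rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal(n)
x = np.sin(z) + 0.3 * rng.standard_normal(n)   # X depends on Z only
y = z ** 2 + 0.3 * rng.standard_normal(n)      # Y depends on Z only
t_null = gcm_statistic(x, y, z)                # H0 true: |t| should be modest

y_alt = y + 0.5 * x                            # add a direct X -> Y dependence
t_alt = gcm_statistic(x, y_alt, z)             # H0 false: |t| should be large
```

Comparing |t| to a standard normal quantile then gives the test; any sufficiently accurate regression method can be substituted for the kernel ridge step.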


Søren Wengel Mogensen, Markov equivalence in graphical models for partially observed stochastic processes

When studying multivariate stochastic processes, one can ask if a coordinate process is independent of another process. However, this ignores the temporal component of the model. Instead, one can ask if the past of a coordinate process is predictive of or influences the present of another process. This can be formalized using the notion of (conditional) local independence. This notion of independence resembles conditional independence of random variables, but in contrast to conditional independence, local independence treats past and present differently which means that B can be locally independent of A given C without A being locally independent of B given C. To make a graphical representation of local independence models, earlier work has used directed graphs in which each vertex corresponds to a coordinate process and introduced the notion of delta-separation in these graphs. We extend this work and introduce a larger class of graphs along with mu-separation to represent local independence models in partially observed systems of multivariate stochastic processes, i.e., settings where unmeasured and unknown processes may act as "confounders" of measured processes. We discuss properties of the Markov equivalence classes of these graphs, in particular a central maximality property which gives a characterization of Markov equivalence. This characterization is constructive in the sense that it straightforwardly outlines a learning algorithm in the oracle setting assuming that the graph and the local independence model are Markov and faithful. The maximality property also allows us to define another graphical object that concisely represents the equivalence class.


Jacob Østergaard, Inferring network connectivity using cointegration analysis

Understanding how the brain works is a challenging question that has puzzled neuroscientists for ages. Contemporary research efforts attempt to explain the structure of neural networks by mapping the connectivity among neurons. A particularly challenging task is to derive the so-called uni-directional couplings, i.e. one-way communication channels, due to the requirement of asymmetrical measures. In this talk we discuss how to infer network structures by utilizing analysis techniques from the area of cointegration. This statistical toolbox has been celebrated among econometricians for decades, where it has found applications for inference in multivariate non-stationary stochastic processes. We set the stage using the backdrop of classical cointegration in discrete time and discuss how to adapt this framework to continuous-time processes. Along the way we argue that a system of coupled neurons can be interpreted as a cointegrated system, and we present some simple examples, including uni-directional couplings, which can be analyzed using these techniques. Finally, we introduce some ideas and challenges with respect to inference in high-dimensional settings.


Marc Hoffmann, Nonparametric estimation in age-dependent models in a large population limit

Motivated by improving mortality tables from human demography databases, we investigate inference of an age-evolving population density driven by time-inhomogeneous mortality and fertility. Asymptotics are taken as the size of the population grows within a limited time horizon: the observation gets closer to the solution of a PDE (an inhomogeneous version of the McKendrick-von Foerster equation), and the difficulty lies in simultaneously controlling the approximation to the limiting PDE in a suitable sense together with an appropriate parametrisation of the anisotropic solution. In this setting, we prove new concentration inequalities that enable us to implement the Goldenshluger-Lepski algorithm and derive oracle inequalities. Minimax lower bounds are investigated, and links to inverse problems for fertility rate estimation are identified. This is joint work with A. Boumezoued and P. Jeunesse.


Mathias Drton, Causal Discovery in Linear Non-Gaussian Models

We consider the problem of inferring the causal graph underlying a structural equation model from an i.i.d. sample.  It is well known that this graph is identifiable only under special assumptions.  We consider one such set of assumptions, namely, linear structural equations with non-Gaussian errors, and discuss inference of the causal graph in high-dimensional settings as well as in the presence of latent confounders.  Joint work with Y. Samuel Wang.