$\def\conf{\mathtt{conf}} \def\cov{\mathtt{cov}} \def\Int{\mathbb{Z}} \def\Real{\mathbb{R}} \def\Comp{\mathbb{C}} \def\ZN{\Int_N} \def\one{{\mathbf 1}} \def\attr{{\mathfrak{a}}} \def\U{{\mathcal U}} \def\Ed{{\mathcal E}} \def\Rd{{\Real}^d} \def\cN{{\mathbf N}} \def\OO{{\mathcal O}} \def\DD{{\mathcal D}} \def\HH{{\mathcal H}} \def\NN{{\mathcal N}} \def\VV{{\mathcal V}} \def\domain{{\mathcal{B}}} \def\ii{{\mathbf i}} \def\T{{\mathbb T}} \def\S{{\mathbb S}} \def\mod{{\mathit mod}} \def\Cycle{{\mathbf \sigma}} \def\prob{{\mathbb P}} \def\ex{{\mathbb E}} \def\codim{{\mathit{codim}}} \def\paths{{\mathcal P}} \def\sens{{\mathcal S}} \def\measurements{{\mathcal E}} \def\indep{{\mathcal I}} \def\conf{\mathcal{C}} \def\bx{\mathbf{x}} \def\array{\mathbb{A}} \def\plex{\mathbf{P}} \def\nodes{\mathbf{N}} \def\image{\mathtt{Im}\,} \def\ker{\mathtt{Ker}\,} \def\ph{\mathbf{P}H} \def\bbr{\mathtt{Br}} \def\bm{\mathtt{B}} \def\bmv{\bm^v} \def\phpp{\mathbf{b}} \def\dimph{\mathrm{dim}_{\ph}} \def\class{\mathcal{F}}$

### persistent homology of Brownian motions

##### Yuliy Baryshnikov, UIUC
with support of AFOSR

#### Introduction

This talk deals with the topology of random functions. Random functions enter data analysis as distance functions to a sample. Gaussian random functions are the key model for the null hypothesis in many image-analysis algorithms.

More generally, understanding the topology of random sets seems to be a necessary step beyond the analysis of the random sets themselves.

In all these cases, one studies the excursion sets $M_c=\{f\leq c\}, f\in C(\Real^D), c\in\Real$.

#### Example: point clouds

To determine the topology of a point cloud $X\subset \Real^D$, one thickens the points or, equivalently, considers the excursion sets of the distance function to $X$. The hope (often justified, see Niyogi-Smale-Weinberger) is that if $X$ is a (slightly) noisy sample from a submanifold $N\subset \Real^D$, the union of balls around the points of $X$ has the same topology as $N$.
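A minimal sketch of the thickening step, restricted to connected components only. The excursion set $\{\mathrm{dist}(\cdot,X)\leq r\}$ is the union of closed balls of radius $r$, and two such balls meet exactly when their centres are within $2r$. The function name `count_components` and the test cloud `circle` are illustrative assumptions, not part of the talk.

```python
import math

def count_components(points, r):
    """Number of connected components of the union of closed balls of radius r."""
    n = len(points)
    parent = list(range(n))  # union-find forest over the sample points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # two closed balls of radius r meet iff their centres are within 2r
            if math.dist(points[i], points[j]) <= 2 * r:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

# hypothetical sample: 12 equally spaced points on the unit circle
circle = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
          for k in range(12)]
```

Neighbouring points of `circle` are about $0.518$ apart, so the count drops from 12 to 1 once $2r$ exceeds that distance. Detecting the loop itself would require first homology, which this sketch does not compute.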

#### Example: Gaussian random fields

Real-life data (MRI scans, WMAP, ...) are inherently noisy, and statistical analysis always requires understanding the null hypothesis: that what is observed is merely noise (say, a Gaussian random field). Hence the need to understand the topology of Gaussian random fields, a well-tended field (see Adler-Taylor's corpus of works).

#### Topology of noisy functions: persistence

Typically, whichever level one chooses, the excursion set has a lot of noise: small components, small holes, handles, etc. Changing the level removes them but introduces new ones. One needs the features that are there for a long time, that persist; this is where persistence comes in. One defines the persistent homology $\ph_k(a,b)$ as $$\image H_a^b, \,\mathrm{where}\, H_a^b:H_k(M_a)\to H_k(M_b), a\lt b,$$ is the map induced on homology (here assumed to have coefficients in a field) by the natural inclusion $i(a,b):M_a\to M_b$.

#### Barcodes and persistence diagrams

Persistence allows one to record the birth and death of homology classes. Long-lived ones represent important features; short-lived ones represent noise. One uses either barcodes or persistence diagrams to represent these data.
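To make births and deaths concrete, here is a sketch for the simplest case: degree-0 persistence of the sublevel sets of a function sampled on an interval (treated as a path graph). It sweeps the values upward with a union-find structure and applies the usual elder rule, so the younger of two merging components dies at the merge level. The name `sublevel_bars_1d` and the discretisation are assumptions made for illustration.

```python
def sublevel_bars_1d(values):
    """Bars (birth, death) of 0-dim sublevel-set persistence of a sampled function.

    The samples are the vertices of a path graph; the oldest component never
    dies and is reported with death = inf. Zero-length bars are dropped.
    """
    n = len(values)
    if n == 0:
        return []
    parent = [None] * n   # None marks a vertex not yet in the sublevel set
    birth = [0.0] * n     # birth level, valid at component roots

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    bars = []
    for i in sorted(range(n), key=lambda j: values[j]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # elder rule: the component born later dies at level values[i]
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < values[i]:
                    bars.append((birth[young], values[i]))
                parent[young] = old
    bars.append((min(values), float("inf")))  # the component of the global minimum
    return sorted(bars)
```

For example, `sublevel_bars_1d([0, 2, 1, 3, 0.5])` returns `[(0, inf), (0.5, 3), (1, 2)]`: the local minima at levels 1 and 0.5 each start a component, and these die at the local maxima 2 and 3 respectively.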

#### persistence diagrams

For a random function, the natural object to study seems to be the $k$-th persistence point process $\phpp_k$, which places a dot of multiplicity $\mu$ (i.e. $\mu \delta_{(x,y)}$) on the $k$-th sheet of the persistence diagram if the rank of $\ph_k(x,y)$ equals $\mu$.

Here we will be mostly interested in the intensity density $\beta_k=\ex \phpp_k$ of this point process.

#### persistence for Brownian trajectories

The persistence toolbox aims at discarding homological noise generated by small wrinkles of a mapping. Understanding the nature of noise for random functions is an essential component of statistical topology.

Here we address the most basic type of random function, 1D Brownian motion. The fractal nature of the trajectories leads to an interesting structure of the persistent homology (only $\beta_0$ is nontrivial, for dimensional reasons).

In fact, the naive definition of $\ph$ is a bit awkward; it is better to use the identity $$\image H_a^b\cong \ker(H_k(M_b)\to H_k(M_b,M_a));$$ for $k=0$ this means that we count only those components of $M_b$ that contain points of $M_a$.

#### persistence for Brownian trajectories, cont'd

Let $\bbr:[0,1]\to \Real$ be a sample trajectory of the Brownian bridge (Brownian motion conditioned on $\bm(0)=\bm(1)=0$, or, equivalently, Brownian motion on the unit circle).
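A discretised sample of such a trajectory can be sketched as follows, assuming the standard construction $\bbr_t = W_t - tW_1$ from a Brownian motion $W$ on a uniform grid. The function name `brownian_bridge` and the grid size are illustrative choices, and only the standard library is used.

```python
import math
import random

def brownian_bridge(n, rng):
    """Sample a Brownian bridge at the times k/n, k = 0..n."""
    dt = 1.0 / n
    w = [0.0]
    for _ in range(n):
        # independent Gaussian increments with variance dt
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    # subtracting t * W_1 pins the endpoint at 0
    return [w[k] - (k / n) * w[n] for k in range(n + 1)]

path = brownian_bridge(1000, random.Random(0))  # fixed seed for reproducibility
```

The sampled path only resolves bars longer than the typical increment, of order $n^{-1/2}$, so the shortest bars of the true trajectory are invisible at any fixed grid size.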

#### fractal nature of persistence homology for Brownian motion

Denote by $N_k^f(a,b)$ the number of bars in the $k$-th persistent homology of $f$ overlapping the interval $[a,b]$, and let $B(x,y)=\ex N_0^\bbr (x,y)$. Then

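• for $0 \leq x \lt y$ one has $B(x,y)=\sum_{n\geq 1} e^{-2(n(y-x)+x)^2}$;
• when $x$ is close to $y$ (with $x>0$), the series is a Riemann sum with step $y-x$, so the expected number of bars overlapping $[x,y]$ is close to $(y-x)^{-1}\int_x^\infty e^{-2u^2}\,du=(y-x)^{-1}\,\tfrac{1}{2}\sqrt{\pi/2}\;\mathtt{erfc}(\sqrt{2}\,x)$.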
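The near-diagonal behaviour of the series can be checked numerically. The sketch below sums $\sum_{n\geq 1} e^{-2(n(y-x)+x)^2}$ directly and compares it with its Riemann-sum limit $(y-x)^{-1}\int_x^\infty e^{-2u^2}\,du$, where the integral is evaluated as $\sqrt{\pi/8}\,\mathrm{erfc}(\sqrt{2}\,x)$. The function names and the truncation tolerance are assumptions of this illustration.

```python
import math

def expected_bars(x, y, tol=1e-16):
    """Sum the series B(x, y) = sum_{n>=1} exp(-2 (n (y - x) + x)^2)."""
    total, n = 0.0, 1
    while True:
        term = math.exp(-2.0 * (n * (y - x) + x) ** 2)
        total += term
        if term < tol:  # terms decrease monotonically in n for 0 <= x < y
            return total
        n += 1

def riemann_limit(x, y):
    """(y - x)^{-1} * integral_x^inf exp(-2 u^2) du, using the erfc closed form."""
    return math.sqrt(math.pi / 8.0) * math.erfc(math.sqrt(2.0) * x) / (y - x)

ratio = expected_bars(0.3, 0.301) / riemann_limit(0.3, 0.301)
```

The ratio approaches 1 as $y-x\to 0$; the number of terms summed grows like $(y-x)^{-1}$, so very small gaps make the direct sum slow.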

#### fractal nature of persistence homology for Brownian motion, cont'd

In other words, the density $\beta_0$ of the persistence point process blows up near the diagonal like $$\beta_0^{\bbr}(x,y)\sim (y-x)^{-3}$$ as $y\to x$.
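The exponent $-3$ can be traced back to the $(y-x)^{-1}$ growth of the expected bar count. This is a sketch under an assumption: $B(x,y)$ is read as the expected number of bars $[u,v]$ with $u\leq x$ and $y\leq v$, and $c(x)$ denotes the (unspecified here) coefficient of the leading term. Then

$$B(x,y)=\int_{-\infty}^{x}\!\!\int_{y}^{\infty}\beta_0^{\bbr}(u,v)\,dv\,du \quad\Longrightarrow\quad \beta_0^{\bbr}(x,y)=-\frac{\partial^2 B}{\partial x\,\partial y},$$

and inserting $B(x,y)\approx c(x)\,(y-x)^{-1}$ gives

$$\beta_0^{\bbr}(x,y)\approx \frac{2\,c(x)}{(y-x)^{3}}+\frac{c'(x)}{(y-x)^{2}},$$

whose leading term as $y\to x$ is of order $(y-x)^{-3}$.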

#### fractal nature, cont'd

Similarly, for the Brownian motion with drift, $$\bmv_t=\sigma\bm_t+vt,$$ the number of bars overlapping any given interval is finite a.s. and is geometrically distributed with parameter $$\exp(-v(y-x)/\sigma^2).$$

As a corollary, if $f$ is smooth, then $$\lim_{\sigma\to 0} \frac{1}{\sigma^2} \lim_{\Delta\to 0} \Delta \beta_0^{f+\sigma \bm}(x, x+\Delta)=\sum_{s:f(s)=x} \frac{1}{|f'(s)|}=\frac{f_*(dx)}{dx},$$ the density at $x$ of the pushforward of the Lebesgue measure under $f$.

In particular, the short bars of a small perturbation of a polynomial determine the polynomial up to shifts and reflections.

#### Persistence dimension

Somewhat more generally, one can define the persistence dimension of a function $f$ on a manifold $M$ as $$\dimph(f):=\inf\{k: \langle (y-x)^k \cdot \phpp \rangle <\infty\},$$ and the persistence dimension of a class of functions $\class$ as $\sup_{f\in \class} \dimph(f)$.

Some recent results by Cohen-Steiner-Edelsbrunner-Harer-Mileyko suggest that for Lipschitz functions on a smooth manifold $M$ of dimension $d$ the persistence dimension is $d$; the results above make it plausible that for almost all Brownian trajectories the persistence dimension is $2$.

In general, it is natural to conjecture that for the class of $\alpha$-Hölder functions on $M$, the persistence dimension is $d/\alpha$.
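As a heuristic consistency check (at the level of expectations, which is weaker than an almost-sure statement), take an intensity behaving like $(y-x)^{-3}$ near the diagonal and write $\Delta=y-x$. The contribution of short bars to the pairing is then governed by

$$\int_0^1 \Delta^{k}\,\Delta^{-3}\,d\Delta<\infty \iff k>2,$$

so the infimum in the definition equals $2$. This agrees with the conjectured value $d/\alpha$: Brownian trajectories live on a domain of dimension $d=1$ and are $\alpha$-Hölder for every $\alpha<1/2$, so $d/\alpha$ decreases to $2$ as $\alpha\to 1/2$.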

#### questions

• The big question is to understand the persistence dimension for large classes of functions and underlying topological spaces. It is unknown already for smooth or Lipschitz functions on smooth manifolds.
• Correlation functions for $\phpp_k$ for Brownian motions.