Galaxy Survey Cosmology, part 1
Hannu Kurki-Suonio
8.2.2021
Contents
1 Statistical measures of a density field
1.1 Ergodicity and statistical homogeneity and isotropy
1.2 Density 2-point autocorrelation function
1.3 Fourier expansion
1.4 Fourier transform
1.5 Power spectrum
1.6 Bessel functions
1.7 Spherical Bessel functions
1.8 Power-law spectra
1.9 Scales of interest and window functions
2 Distribution of galaxies
2.1 The average number density of galaxies
2.2 Galaxy 2-point correlation function
2.3 Poisson distribution
2.4 Counts in cells
2.5 Fourier transform for a discrete set of objects
2.5.1 Poisson distribution again
3 Subspaces of lower dimension
3.1 Skewers
3.2 Slices
4 Angular correlation function for small angles
4.1 Relation to the 3D correlation function
4.1.1 Selection function
4.1.2 Small-angle limit
4.1.3 Power law
4.2 Power spectrum for flat sky
4.2.1 Relation to the 3D power spectrum
5 Spherical sky
5.1 Angular correlation function and angular power spectrum
5.2 Legendre polynomials
5.3 Spherical harmonics
5.4 Euler angles
5.5 Wigner D-functions
5.6 Relation to the 3D power spectrum
6 Dynamics
6.1 Linear perturbation theory
6.2 Nonlinear growth
7 Redshift space
7.1 Redshift space as a distortion of real space
7.2 Linear perturbations and the power spectrum
7.3 Correlation function
7.3.1 Linear growing mode
7.3.2 Projected correlation function
7.4 Small scales
7.5 Redshift space in Friedmann–Robertson–Walker universe
7.6 Alcock–Paczyński effect
8 Measuring the correlation function
8.1 Bias and variance of different estimators
9 Power spectrum estimation
9.1 Shot noise
9.2 Mask
9.3 Selection function
9.4 Weights
10 Baryon acoustic oscillation scale as a standard ruler
11 Higher-order statistics
11.1 N-point correlation function and (N-1)-spectrum
11.2 Three-point correlation function and bispectrum
11.3 Measuring the three-point correlation function
11.4 Higher-order statistics in cosmology
11.5 Results from galaxy surveys
12 Galaxy surveys
12.1 Sloan Digital Sky Survey
12.1.1 SDSS-II
12.1.2 SDSS-III
12.1.3 SDSS-IV
Preface
These are the lecture notes of the first part (GSC1) of my Galaxy Survey Cosmology course
lectured at the University of Helsinki in spring 2017. The second part of the course discussed
gravitational lensing with the focus on weak lensing and cosmic shear. The lecture notes for the
second part (GSC2) are based on the lectures by Schneider in the textbook Schneider, Kochanek,
and Wambsganss (Gravitational Lensing: Strong, Weak and Micro; Springer 2006), and they
are available in hand-written form only. This was the first time I lectured this course, and
consequently the lecture notes are a bit raw. In the future I hope to add more material about
the practical aspects and cosmological results from galaxy surveys.
– Hannu Kurki-Suonio, May 2017
Preface for 2019
The current version of these lecture notes contains no introduction, but jumps directly to the
mathematical formulation of correlation functions and power spectra, the main tools in galaxy
survey cosmology. For a 4-page introduction to the field, read Sec. 2.7 of [2]. I also gave a
(different) introduction during the first lecture. The amount of calculus in these lecture notes
may seem formidable to some students. I have aimed for completeness so these notes can be
used as a reference for results that may be needed, but the student need not absorb all of
the mathematical results. This year I have added new material including recent observational
results, and correspondingly some of the older material (Sec. 3 and the latter half of Sec. 5) in
these notes was not covered in the course.
– Hannu Kurki-Suonio, February 2019
Preface for 2021
These lecture notes will be updated as the course progresses. The current version is essentially
as it was at the end of the 2019 course, except some typos and errors have been fixed. I thank
Elina Keihänen for finding some of them.
– Hannu Kurki-Suonio, January 2021
1 STATISTICAL MEASURES OF A DENSITY FIELD 1
1 Statistical measures of a density field
We begin by discussing statistical measures of a density field ρ(x) in Euclidean d-dimensional
space. We start with a general treatment where we do not specify in more detail what density
we are talking about. It may refer to number density of objects such as galaxies (which will be
the main application) or just mass density, but we treat ρ(x) as a continuous quantity for now.
In d dimensions the volume element corresponding to radial distance between r and r + dr
is¹
$$ dV = C_d\, r^{d-1}\, dr , \quad \text{where} \quad C_d = \frac{2\pi^{d/2}}{\Gamma(d/2)} , \tag{1.1} $$
and the volume within distance R is
$$ V(R) = \frac{C_d}{d}\, R^d . \tag{1.2} $$
We will have applications for d = 1, 2, and 3, for which
C1 = 2 , C2 = 2π , C3 = 4π . (1.3)
The main application is d = 3 (3D), but d = 1 (1D) corresponds to, e.g., a pencil-beam survey
with a very long exposure of a small field on the sky with distance (redshift) determinations for
a large number of galaxies along this line of sight. The main 2D application is the distribution
of galaxies on the sky, without distance determinations, when the sky is approximated as a flat
plane, but it also corresponds to a redshift survey along a great circle on the sky (e.g., the
equator, see Fig. 1).
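The constants in Eqs. (1.1)–(1.3) are easy to check numerically. The following is a minimal sketch using only the Python standard library; the function names are illustrative choices and not part of these notes.

```python
import math

def sphere_surface_coefficient(d: int) -> float:
    """C_d = 2 pi^(d/2) / Gamma(d/2), as in Eq. (1.1)."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

def ball_volume(radius: float, d: int) -> float:
    """V(R) = (C_d / d) R^d, as in Eq. (1.2)."""
    return sphere_surface_coefficient(d) / d * radius ** d

# Print C_d and the volume of the unit ball for the three cases used in the notes.
for d in (1, 2, 3):
    print(d, sphere_surface_coefficient(d), ball_volume(1.0, d))
```

For d = 1, 2, 3 the first printed column should reproduce 2, 2π and 4π, and for d = 3 the volume reduces to the familiar 4πR³/3.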
We assume that the density variations originate from a statistically isotropic and homoge-
neous ergodic random process, and we are really interested in the statistics of this random process
rather than in that of a particular realization of ρ(x).
It is currently thought that initial density perturbations in the Universe² were produced
during inflation in the very early universe by quantum fluctuations of the inflaton field, which
is a random process that in standard models of inflation satisfies these properties of isotropy,
homogeneity, and ergodicity. The density field then evolved until today through determinis-
tic physics, which modified its various statistical measures, but maintained these fundamental
properties.
We follow [1] and [2]. Sec. 2.7 of [2] gives a 4-page introduction to the field. I recommend
reading it at this point.
1.1 Ergodicity and statistical homogeneity and isotropy
Statistical properties are typically defined as averages of some quantities. We will deal with two
kinds of averages: volume average and ensemble average.³
The volume average applies to a particular realization (and to some volume V in it). We
denote the volume average of a quantity f(x) with the overbar, f̄, and it is defined as
$$ \bar{f} \equiv \frac{1}{V} \int_V d^dx\, f(\mathbf{x}) . \tag{1.4} $$
¹ Here Γ(x) = (x − 1)! is the gamma function, with values Γ(1/2) = √π, Γ(1) = 1, Γ(3/2) = √π/2, etc. Other values
follow easily from the recursion formula Γ(x + 1) = xΓ(x).
² ‘Universe’ with a capital U refers to the universe we live in, whereas ‘universe’ refers to the theoretical concept,
or any hypothetical universe we may consider.
³ We are redoing here material from Cosmology II, Section 8.1 (2018 version), but with a different approach.
In Cosmo II the approach was theoretical, so we assumed the volume V was very large so that ρ̄ = 〈ρ〉. Now the
volume V is related to the volume of a galaxy survey, and for accurate treatment we need to take into account
that ρ̄ ≠ 〈ρ〉.
Figure 1: Distribution of galaxies according to the Sloan Digital Sky Survey (SDSS). This figure shows
galaxies that are within 2° of the equator and closer than 858 Mpc (assuming H0 = 71 km/s/Mpc).
Figure from astro-ph/0310571 [12].
The ensemble average refers to the random process. We assume that the observed density
field is just one of an ensemble of an infinite number of possible realizations that could have
resulted from the random process. To know the random process means to know the probability
distribution Prob(γ) of the quantities γ produced by it. (At this stage we use the abstract
notation of γ to denote the infinite number of these quantities. They could be the values of
the density field ρ(x) at every location, or its Fourier coefficients ρ_k.) The ensemble average
of a quantity f depending on these quantities γ as f(γ) is denoted by 〈f〉 and defined as the
(possibly infinite-dimensional) integral
$$ \langle f \rangle \equiv \int d\gamma\, \mathrm{Prob}(\gamma)\, f(\gamma) . \tag{1.5} $$
Here f could be, e.g., the value of ρ(x) at some location x. The ensemble average is also
called the expectation value. Thus the ensemble represents a probability distribution. And the
properties of the density field we will discuss (e.g., statistical homogeneity and isotropy, and
ergodicity, see below) will be properties of this ensemble.
Statistical homogeneity means that the expectation value 〈f(x)〉 must be the same at all x,
and thus we can write it as 〈f〉. Statistical isotropy means that for quantities which involve
a direction, the statistical properties are independent of the direction. For example, for vector
quantities v, all directions must be equally probable. This implies that 〈v〉 = 0.
If theoretical properties are those of an ensemble, and we can only observe one realization
(the Universe) from that ensemble, how can we compare theory and observation? It seems
reasonable that the statistics we get by comparing different parts of a large volume should be
similar to the statistics of a given part over different realizations, i.e., that they provide a fair
sample of the probability distribution. This is called ergodicity. Fields f(x) that satisfy
f̄ → 〈f〉 as V →∞ (1.6)
are called ergodic. We assume that the density field is ergodic. It can be shown that a statistically
homogeneous and isotropic Gaussian random process is ergodic⁴ (but we do not make the
assumption of Gaussianity here).
Because of the ergodicity assumption, the concepts of volume and ensemble average are not
always kept clearly separate in the literature, so that the notation 〈·〉 is used without specifying
which one it refers to, but we shall distinguish between these concepts.⁵ The equality of f̄
with 〈f〉 does not hold for a finite volume V ; the difference is called sample variance or cosmic
variance. The larger the volume, the smaller is the difference. Since cosmological theory predicts
〈f〉, whereas observations probe f̄ for a limited volume, cosmic variance limits how accurately
we can compare theory with observations.
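The shrinking of sample variance with volume can be illustrated with a toy random process. The sketch below assumes a deliberately simple case that is not taken from these notes: a 1D field of independent Gaussian cells with known ensemble mean, where the "volume" is just the number of cells. It uses NumPy, and all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
ensemble_mean = 1.0      # <f>, known because we define the toy process ourselves
n_realizations = 2000

def scatter_of_volume_average(n_cells: int) -> float:
    """Standard deviation of the volume average f-bar over many realizations."""
    # Each row is one realization: n_cells independent cells with mean <f>
    # and unit variance.
    fields = ensemble_mean + rng.normal(0.0, 1.0, size=(n_realizations, n_cells))
    volume_averages = fields.mean(axis=1)
    return float(volume_averages.std())

scatter_small = scatter_of_volume_average(16)     # small survey volume
scatter_large = scatter_of_volume_average(1024)   # 64 times larger volume
print(scatter_small, scatter_large)
```

For independent cells the scatter of f̄ around 〈f〉 falls as one over the square root of the number of cells; a correlated field would approach 〈f〉 more slowly, but the qualitative behaviour of Eq. (1.6) is the same.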
1.2 Density 2-point autocorrelation function
We define the density perturbation field as
$$ \delta(\mathbf{x}) \equiv \frac{\rho(\mathbf{x}) - \langle\rho\rangle}{\langle\rho\rangle} . \tag{1.7} $$
Since ρ ≥ 0, necessarily δ ≥ −1.
From statistical homogeneity,
〈ρ(x)〉 = 〈ρ〉 ⇒ 〈δ〉 = 0 . (1.8)
⁴ Liddle & Lyth [7] make this statement on p. 73, but do not give a reference to an actual proof.
⁵ The 〈·〉 notation is more convenient than ·̄ for complicated expressions, so we may sometimes use 〈·〉_V for
volume average.
Thus we cannot use 〈δ〉 as a measure of the inhomogeneity. Instead we can use the square of δ,
which is necessarily nonnegative everywhere, so it cannot average out like δ did. Its expectation
value 〈δ²〉 is the variance of the density perturbation, and the square root of the variance,
$$ \delta_{\rm rms} \equiv \sqrt{\langle\delta^2\rangle} , \tag{1.9} $$
the root-mean-square (rms) density perturbation, is a typical expected absolute value of δ at an
arbitrary location.⁶ It tells us about how strong the inhomogeneity is, but nothing about the
shapes or sizes of the inhomogeneities. To get more information, we introduce the correlation
function ξ.
We define the density 2-point autocorrelation function (often called just correlation function)
as
$$ \xi(\mathbf{x}_1, \mathbf{x}_2) \equiv \langle \delta(\mathbf{x}_1)\, \delta(\mathbf{x}_2) \rangle . \tag{1.10} $$
It is positive if the density perturbation is expected to have the same sign at both x₁ and x₂,
and negative for an overdensity at one and underdensity at the other. Thus it probes how
density perturbations at different locations are correlated with each other. Due to statistical
homogeneity, ξ(x₁, x₂) can only depend on the difference (separation) r ≡ x₂ − x₁, so we redefine
ξ as
$$ \xi(\mathbf{r}) \equiv \langle \delta(\mathbf{x})\, \delta(\mathbf{x} + \mathbf{r}) \rangle . \tag{1.11} $$
From statistical isotropy, ξ(r) is independent of direction, i.e., spherically symmetric (we use this
as a generic term for arbitrary d – we might also say ‘isotropic’ – i.e., for d = 2, read ‘circularly
symmetric’, and for d = 1, read ‘even’),
$$ \xi(\mathbf{r}) = \xi(r) , \quad r \equiv |\mathbf{r}| . \tag{1.12} $$
The correlation function is large and positive for r smaller than the size of a typical over- or
underdense region, and becomes small for larger separations.
The correlation function at zero separation gives the variance of the density perturbation,
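$$ \langle\delta^2\rangle \equiv \langle\delta(\mathbf{x})\,\delta(\mathbf{x})\rangle \equiv \xi(0) . \tag{1.13} $$
We define the volume average of ξ up to a distance R as
$$ \bar{\xi}(R) \equiv \frac{1}{V(R)} \int_0^R \xi(r)\, C_d\, r^{d-1}\, dr . \tag{1.14} $$
For d = 3 this becomes
$$ \bar{\xi}(R) \equiv \frac{3}{R^3} \int_0^R \xi(r)\, r^2\, dr \equiv \frac{3}{R^3}\, J_3(R) , \tag{1.15} $$
where
$$ J_3(R) \equiv \int_0^R \xi(r)\, r^2\, dr \tag{1.16} $$
is called the “J₃ integral” (not a Bessel function; see Sec. 7.3.1 for more on J_ℓ and K_ℓ integrals).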
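As an illustration of Eqs. (1.15) and (1.16), the sketch below evaluates J₃(R) and ξ̄(R) numerically. The correlation function ξ(r) = exp(−r/r₀) used here is a toy assumption chosen only because its J₃ integral has a closed form to compare against; it is not a model from these notes, and the function names are hypothetical.

```python
import math

R0 = 5.0   # toy correlation length (arbitrary units)

def xi_toy(r: float) -> float:
    """Toy correlation function exp(-r / R0); an assumption for illustration."""
    return math.exp(-r / R0)

def j3_integral(R: float, n_steps: int = 20000) -> float:
    """J3(R) = int_0^R xi(r) r^2 dr, by the composite trapezoidal rule."""
    h = R / n_steps
    total = 0.5 * xi_toy(R) * R * R      # the integrand vanishes at r = 0
    for i in range(1, n_steps):
        r = i * h
        total += xi_toy(r) * r * r
    return total * h

def xi_bar(R: float) -> float:
    """Volume-averaged correlation function for d = 3, Eq. (1.15)."""
    return 3.0 / R**3 * j3_integral(R)

R = 10.0
u = R / R0
# Closed form for the toy model: J3 = R0^3 [2 - exp(-u) (u^2 + 2u + 2)]
j3_exact = R0**3 * (2.0 - math.exp(-u) * (u * u + 2.0 * u + 2.0))
print(j3_integral(R), j3_exact, xi_bar(R), xi_toy(R))
```

Because this toy ξ(r) decreases with r, the volume average ξ̄(R) is larger than ξ(R) itself: it is dominated by the stronger correlations at small separations.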
Exercise: Integral constraint for a single realization. For a single realization and finite volume
(so that ρ̄ ≠ 〈ρ〉) we can define
$$ \hat{\delta}(\mathbf{x}) \equiv \frac{\rho(\mathbf{x}) - \bar{\rho}}{\bar{\rho}} \quad \text{and} \quad \hat{\xi}(\mathbf{r}) \equiv \frac{1}{V} \int_V d^dx\, \hat{\delta}(\mathbf{x})\, \hat{\delta}(\mathbf{x} + \mathbf{r}) . \tag{1.17} $$
⁶ In other words, δ_rms is the standard deviation of ρ/〈ρ〉.
Figure 2: The 2-point correlation function ξ(r) from galaxy surveys. Left: Small scales shown in a
log-log plot. The circles with error bars show the observational determination from the APM galaxy
survey [4]. The different lines are theoretical predictions by [5] (this is Fig. 9 from [5]). Right: Large
scales shown in a linear plot. Red circles with error bars show the observational determination from the
CMASS Data Release 9 (DR9) sample of the Baryon Oscillation Spectroscopic Survey (BOSS). The
dashed line is a theoretical prediction from the ΛCDM model. The bump near 100 h⁻¹ Mpc is the baryon
acoustic oscillation (BAO) peak that will be discussed in Sec. 10. This is Fig. 2a from [6].
1. Theoretical approach: assume periodic boundary conditions. This also makes ξ̂(r) periodic. All
integrals, including (1.18), are to be taken over the volume V. Show that
$$ \int_V d^dr\, \hat{\xi}(\mathbf{r}) = 0 \tag{1.18} $$
(the integral constraint). Thus the positive values of ξ̂ at small separations must be compensated
by negative values at larger r. Note that here we do not need any statistical assumptions (like
statistical homogeneity or ergodicity). If ξ̂(r) → 0 for large r fast enough, for large volumes the
boundary conditions do not matter.
2. Practical approach: To avoid using boundary conditions and going outside the volume, redefine
$$ \hat{\xi}(\mathbf{r}) \equiv \frac{1}{V} \int d^dx\, \hat{\delta}(\mathbf{x})\, \hat{\delta}(\mathbf{x} + \mathbf{r}) \tag{1.19} $$
so that the integral for each r goes over only those values of x for which both x and x + r are
within the volume. This is what one does with real galaxy surveys. Show that
$$ \int d^dr\, \hat{\xi}(\mathbf{r}) = 0 , \tag{1.20} $$
where the integral goes over those values of r for which ξ̂(r) is defined by (1.19), i.e., r separates
two points inside the volume.
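The integral constraint of the theoretical (periodic) approach can be seen numerically before it is proven. The sketch below is a discrete 1D illustration, not a solution of the exercise: it assumes a lognormal toy density on a periodic grid, which is an arbitrary choice, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells = 256
# Any positive density works; a lognormal toy field is an arbitrary choice.
rho = np.exp(rng.normal(0.0, 0.5, size=n_cells))

delta_hat = rho / rho.mean() - 1.0   # Eq. (1.17), using the volume average rho-bar

# xi_hat(r) = volume average of delta_hat(x) * delta_hat(x + r), periodic in x.
# np.roll(a, -shift)[i] equals a[(i + shift) % n_cells], i.e. the field at x + r.
xi_hat = np.array([np.mean(delta_hat * np.roll(delta_hat, -shift))
                   for shift in range(n_cells)])

# Discrete analogue of (1/V) times the integral of xi_hat over all separations.
constraint = xi_hat.sum() / n_cells
print(constraint, xi_hat[0], xi_hat.min())
```

The sum vanishes to rounding error for any realization, because δ̂ has zero volume average by construction. Since ξ̂(0) is the (positive) variance of δ̂, some other separations necessarily have negative ξ̂.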
1.3 Fourier expansion
Fourier analysis is a method for separating out the different distance scales, so that the depen-
dence of the physics on distance scale becomes clear and easy to handle.
For a Fourier analysis of the density field we consider a cubic volume V = L^d and assume
periodic boundary conditions.⁷
We can now expand any function of space f(x) as a Fourier series
$$ f(\mathbf{x}) = \sum_{\mathbf{k}} f_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} , \tag{1.21} $$
where the wave vectors k = (k₁, …, k_d) take values
$$ k_i = n_i\, \frac{2\pi}{L} , \quad n_i = 0, \pm 1, \pm 2, \ldots \tag{1.22} $$
(Note that we use the Fourier conventions of [2], not those of [1].)
The Fourier coefficients f_k are obtained as
$$ f_{\mathbf{k}} = \frac{1}{V} \int_V f(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, d^dx . \tag{1.23} $$
The term k = 0 gives the mean value,
$$ f_{\mathbf{0}} = \bar{f} . \tag{1.24} $$
The Fourier coefficients are complex numbers even though we are dealing with real quantities⁸
f(x). From the reality condition f(x)* = f(x) it follows that
$$ f_{-\mathbf{k}} = f_{\mathbf{k}}^* . \tag{1.25} $$
Thus Fourier modes come in pairs
$$ f_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} + f_{\mathbf{k}}^*\, e^{-i\mathbf{k}\cdot\mathbf{x}} = 2\, \mathrm{Re} f_{\mathbf{k}} \cos \mathbf{k}\cdot\mathbf{x} - 2\, \mathrm{Im} f_{\mathbf{k}} \sin \mathbf{k}\cdot\mathbf{x} , \tag{1.26} $$
and only the real part of each, Re f_k cos k·x − Im f_k sin k·x, survives; so to visualize a Fourier
mode, just visualize this real part. The size of the Fourier coefficients depends on the volume
V – increasing V tends to make the f_k smaller to compensate for the denser sampling of k in
Fourier space.
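On a grid, the Fourier coefficients of Eq. (1.23) can be obtained from a fast Fourier transform. The following is a minimal 1D sketch assuming NumPy; the mapping between `np.fft.fft` and the convention of these notes (division by the number of cells) is spelled out in the comments, and the random test field is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cells = 64
box_length = 10.0
x = np.arange(n_cells) * box_length / n_cells
f = rng.normal(size=n_cells)            # an arbitrary real field sampled on a grid

# np.fft.fft uses exp(-i k x), the same sign as Eq. (1.23); dividing by the
# number of cells turns the sum into the volume average (1/V) int ... dx.
f_k = np.fft.fft(f) / n_cells
k = 2.0 * np.pi * np.fft.fftfreq(n_cells, d=box_length / n_cells)  # k = 2 pi n / L

mean_matches = np.isclose(f_k[0].real, f.mean())              # Eq. (1.24)
# Index -n (mod N) holds the coefficient of -k: reverse, then shift by one
# so that the k = 0 entry stays in place.
f_minus_k = np.roll(f_k[::-1], 1)
reality_holds = np.allclose(f_minus_k, np.conj(f_k))          # Eq. (1.25)

# Rebuild the field from the series (1.21) by summing over all modes.
f_rebuilt = (f_k[:, None] * np.exp(1j * k[:, None] * x[None, :])).sum(axis=0)
print(mean_matches, reality_holds, np.allclose(f_rebuilt.real, f))
```

The explicit sum in the last step is written out for clarity; in practice `np.fft.ifft(f_k) * n_cells` gives the same result faster.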
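The Fourier expansion is an expansion in terms of plane waves e^{ik·x}, which form an orthogonal
and complete (closed) set of functions in the Euclidean volume V. (We will later encounter other
such expansions in terms of other functions.) Thus they satisfy the orthogonality relation
$$ \int dV \left(e^{i\mathbf{k}\cdot\mathbf{x}}\right)^* \left(e^{i\mathbf{k}'\cdot\mathbf{x}}\right) = \int dV\, e^{i(\mathbf{k}'-\mathbf{k})\cdot\mathbf{x}} = V \delta_{\mathbf{k}\mathbf{k}'} , \tag{1.27} $$
where δ_kk′ is the Kronecker delta (δ_kk′ = 1 for k = k′, and δ_kk′ = 0 otherwise), and the closure
(completeness) relation
$$ \frac{1}{V} \sum_{\mathbf{k}} \left(e^{i\mathbf{k}\cdot\mathbf{x}}\right)^* \left(e^{i\mathbf{k}\cdot\mathbf{x}'}\right) = \frac{1}{V} \sum_{\mathbf{k}} e^{i\mathbf{k}\cdot(\mathbf{x}'-\mathbf{x})} = \delta_D^d(\mathbf{x}' - \mathbf{x}) , \tag{1.28} $$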
⁷ This does not imply that the density field should be periodic in reality; we are interested only in the
density field within the volume, and so we can replace the part outside the volume with a periodic replication
of the volume. This will introduce discontinuities at the volume boundary. The expansion (1.21) itself does not
assume anything about f(x) outside V (the expansion will be correct inside V and outside V it will represent
such a periodic extension; the discontinuities imply that the expansion will contain high-k modes); but some of
the discussion below, like convolution with a window function, makes use of this periodicity. Near the boundary
of the volume the window function will extend outside the volume; and thus in reality this will introduce edge
effects as the real universe is not periodic. This will have to be treated (sometime later) together with the fact
that an actual survey does not cover exactly a cubic volume.
For theoretical work one can also take V to be much larger than the observable universe so that the boundaries
are so far away that they do not matter.
⁸ In GSC1 we deal only with real quantities. In GSC2, where we discuss gravitational lensing, we introduce
the complex shear, so these reality conditions do not apply to its Fourier coefficients/transform.
where δ_D^d(x′ − x) is the d-dimensional Dirac delta function.⁹ Do not confuse the Dirac and
Kronecker deltas with the density perturbation δ(x) or its Fourier coefficient δ_k! Thus the
functions
$$ \left\{ \frac{1}{\sqrt{V}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \right\} \tag{1.30} $$
form an orthonormal set.
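The point of a completeness relation¹⁰ for orthogonal functions is that any function can
indeed be expanded in them. Here
$$ \sum_{\mathbf{k}} f_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} = \frac{1}{V} \sum_{\mathbf{k}} \int_V d^dx'\, f(\mathbf{x}')\, e^{-i\mathbf{k}\cdot\mathbf{x}'}\, e^{i\mathbf{k}\cdot\mathbf{x}} = \int d^dx'\, f(\mathbf{x}')\, \delta_D^d(\mathbf{x} - \mathbf{x}') = f(\mathbf{x}) . \tag{1.31} $$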
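The convolution theorem states that convolution in coordinate space becomes just multiplication
in Fourier space (exercise):
$$ (f * g)(\mathbf{x}) \equiv \int_V d^dx'\, f(\mathbf{x}')\, g(\mathbf{x} - \mathbf{x}') = \int_V d^dx'\, f(\mathbf{x} - \mathbf{x}')\, g(\mathbf{x}') = V \sum_{\mathbf{k}} f_{\mathbf{k}}\, g_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} , \tag{1.32} $$
and multiplication in coordinate space becomes convolution in Fourier space (exercise):
$$ \frac{1}{V} \int_V f(\mathbf{x})\, g(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, d^dx = \sum_{\mathbf{q}} f_{\mathbf{q}}\, g_{\mathbf{k}-\mathbf{q}} . \tag{1.33} $$
The Plancherel formula states (exercise):
$$ \frac{1}{V} \int_V d^dx\, f(\mathbf{x})\, g(\mathbf{x}) = \sum_{\mathbf{k}} f_{\mathbf{k}}^*\, g_{\mathbf{k}} . \tag{1.34} $$
With g = f this becomes the Parseval formula:
$$ \frac{1}{V} \int_V d^dx\, f(\mathbf{x})^2 = \sum_{\mathbf{k}} |f_{\mathbf{k}}|^2 . \tag{1.35} $$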
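The normalization factors in (1.32), (1.34) and (1.35) are easy to get wrong, so a discrete check is useful. The sketch below is a 1D illustration assuming NumPy and random test fields (arbitrary choices); it is a numerical sanity check, not a proof of the exercises.

```python
import numpy as np

rng = np.random.default_rng(3)
n_cells = 128
box_length = 4.0
dx = box_length / n_cells
f = rng.normal(size=n_cells)
g = rng.normal(size=n_cells)

f_k = np.fft.fft(f) / n_cells    # Fourier coefficients in the convention of Eq. (1.23)
g_k = np.fft.fft(g) / n_cells

# Left side of (1.32): the periodic convolution integral as a Riemann sum.
conv_direct = np.array([
    dx * sum(f[j] * g[(n - j) % n_cells] for j in range(n_cells))
    for n in range(n_cells)
])
# Right side of (1.32): V sum_k f_k g_k exp(i k x). np.fft.ifft returns
# (1/N) sum_k (...) exp(+i k x), hence the extra factor n_cells.
conv_fourier = box_length * n_cells * np.fft.ifft(f_k * g_k)

# Plancherel (1.34) and Parseval (1.35): volume averages versus mode sums.
plancherel_lhs = np.mean(f * g)
plancherel_rhs = np.sum(np.conj(f_k) * g_k).real
parseval_lhs = np.mean(f**2)
parseval_rhs = np.sum(np.abs(f_k)**2)
print(np.allclose(conv_direct, conv_fourier.real),
      plancherel_lhs, plancherel_rhs, parseval_lhs, parseval_rhs)
```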
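A great benefit of Fourier analysis is that differentiation is replaced by multiplication:
$$ \mathbf{g}(\mathbf{x}) \equiv \nabla f(\mathbf{x}) = \nabla \sum_{\mathbf{k}} f_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} = \sum_{\mathbf{k}} i\mathbf{k}\, f_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \quad \Rightarrow \quad \mathbf{g}_{\mathbf{k}} = i\mathbf{k}\, f_{\mathbf{k}} . \tag{1.36} $$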
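Equation (1.36) is the basis of spectral differentiation. A minimal 1D sketch, assuming NumPy and a smooth periodic test function chosen only so that the exact derivative is known:

```python
import numpy as np

n_cells = 64
box_length = 2.0 * np.pi
x = np.arange(n_cells) * box_length / n_cells
f = np.sin(3.0 * x) + 0.5 * np.cos(5.0 * x)     # periodic test function (arbitrary choice)

f_k = np.fft.fft(f) / n_cells
k = 2.0 * np.pi * np.fft.fftfreq(n_cells, d=box_length / n_cells)

g_k = 1j * k * f_k                               # Eq. (1.36) in one dimension
g = (np.fft.ifft(g_k) * n_cells).real            # sum_k g_k exp(i k x)

g_exact = 3.0 * np.cos(3.0 * x) - 2.5 * np.sin(5.0 * x)
print(np.max(np.abs(g - g_exact)))
```

The agreement is at the level of rounding error here because the test function contains only modes that the grid resolves; for a field with power at the highest resolved wave number, the treatment of that mode needs more care.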
1.4 Fourier transform
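The separation of neighboring k_i values is Δk_i = 2π/L, so we can write
$$ f(\mathbf{x}) = \sum_{\mathbf{k}} f_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \left(\frac{L}{2\pi}\right)^d \Delta k_1 \ldots \Delta k_d \approx \frac{1}{(2\pi)^d} \int f(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{x}}\, d^dk , \tag{1.37} $$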
⁹ The Dirac delta function is not a true function but rather an operator (the correct mathematical term is
‘distribution’) defined by its action on a function f(x) under an integral:
$$ \int_V \delta_D^d(\mathbf{x}' - \mathbf{x})\, f(\mathbf{x}')\, d^dx' \equiv f(\mathbf{x}) . \tag{1.29} $$
It can be thought of as a limit of a set of functions that have large values very near 0 and are close to zero
elsewhere.
¹⁰ For some reason these closure relations are rarely given in standard sources for mathematical methods.
(Mathematicians do not like the Dirac delta function?) For example, I could not find Eq. (1.28) anywhere.
where
$$ f(\mathbf{k}) \equiv L^d f_{\mathbf{k}} , \tag{1.38} $$
replacing the Fourier series with the Fourier integral.
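In the limit V → ∞, the approximation in (1.37) becomes exact, and we have the Fourier
transform pair
$$ f(\mathbf{x}) = \frac{1}{(2\pi)^d} \int f(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{x}}\, d^dk , \qquad f(\mathbf{k}) = \int f(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, d^dx . \tag{1.39} $$
Note that this assumes that the integrals converge, which requires that f(x) → 0 for |x| → ∞.¹¹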
Thus we don’t use this for, e.g., δ(x), but for, e.g., the correlation function ξ(x) the Fourier
transform is appropriate.
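A special case is f = 1 (which does not satisfy f(x) → 0, so it does not lead to a true
function), whose Fourier transform is the Dirac delta function:
$$ \int d^dx\, e^{-i\mathbf{k}\cdot\mathbf{x}} = (2\pi)^d\, \delta_D^d(\mathbf{k}) . \tag{1.40} $$
Writing k − k′ in place of k we get
$$ \int d^dx\, e^{i(\mathbf{k}'-\mathbf{k})\cdot\mathbf{x}} = (2\pi)^d\, \delta_D^d(\mathbf{k} - \mathbf{k}') , \tag{1.41} $$
the orthogonality relation of plane waves for the infinite volume. The orthonormal set is thus
$$ \left\{ \frac{1}{(2\pi)^{d/2}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \right\} . \tag{1.42} $$
The closure relation is the same except x and k change places:
$$ \int d^dk\, e^{i\mathbf{k}\cdot(\mathbf{x}'-\mathbf{x})} = (2\pi)^d\, \delta_D^d(\mathbf{x}' - \mathbf{x}) . \tag{1.43} $$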
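The convolution theorem becomes (exercise):
$$ \begin{aligned} (f * g)(\mathbf{x}) &\equiv \int d^dx'\, f(\mathbf{x}')\, g(\mathbf{x} - \mathbf{x}') = \frac{1}{(2\pi)^d} \int d^dk\, f(\mathbf{k})\, g(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{x}} \\ (f * g)(\mathbf{k}) &\equiv \int d^dk'\, f(\mathbf{k}')\, g(\mathbf{k} - \mathbf{k}') = (2\pi)^d \int d^dx\, f(\mathbf{x})\, g(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}} , \end{aligned} \tag{1.44} $$
so that the Fourier transform of (f ∗ g)(x) is f(k)g(k) and the Fourier transform of (f ∗ g)(k)
is (2π)^d f(x)g(x).
The Plancherel theorem (exercise) is
$$ \int d^dx\, f(\mathbf{x})\, g(\mathbf{x}) = \frac{1}{(2\pi)^d} \int d^dk\, f^*(\mathbf{k})\, g(\mathbf{k}) \tag{1.45} $$
and the Parseval theorem is
$$ \int d^dx\, f(\mathbf{x})^2 = \frac{1}{(2\pi)^d} \int d^dk\, |f(\mathbf{k})|^2 . \tag{1.46} $$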
¹¹ The condition is tighter than this, but the condition that ∫ |f(x)| d^dx over the infinite volume is finite, assumed
sometimes in literature (for 1D), seems too tight, as it is not satisfied by any power law, and yet we transform
power laws successfully in Sec. 1.8.
Even with a finite V we can use the Fourier integral as an approximation. Often it is
conceptually simpler to work first with the Fourier series (so that one can, e.g., use the Kronecker
delta δ_kk′ instead of the Dirac delta function δ_D^d(k − k′)), replacing it with the integral in the
end, when it needs to be calculated. The recipe for going from the series to the integral is
$$ \left(\frac{2\pi}{L}\right)^d \sum_{\mathbf{k}} \to \int d^dk , \qquad L^d f_{\mathbf{k}} \to f(\mathbf{k}) , \qquad \left(\frac{L}{2\pi}\right)^d \delta_{\mathbf{k}\mathbf{k}'} \to \delta_D^d(\mathbf{k} - \mathbf{k}') . \tag{1.47} $$
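The second line of the recipe, L^d f_k → f(k), can be tested on a function whose continuous transform is known. The sketch below assumes a 1D Gaussian f(x) = exp(−x²/2), whose Fourier transform in the convention of (1.39) is √(2π) exp(−k²/2), placed in a box much wider than the Gaussian; the box size and grid are arbitrary choices, and NumPy is assumed.

```python
import numpy as np

box_length = 40.0
n_cells = 256
dx = box_length / n_cells
x = (np.arange(n_cells) - n_cells // 2) * dx     # box centred on the origin
f = np.exp(-0.5 * x**2)                          # toy function, decays well inside the box

k = 2.0 * np.pi * np.fft.fftfreq(n_cells, d=dx)  # allowed wave numbers 2 pi n / L

# Fourier coefficients, Eq. (1.23), as an explicit Riemann sum over the box.
phases = np.exp(-1j * k[:, None] * x[None, :])
f_k = (phases * f[None, :]).sum(axis=1) * dx / box_length

f_of_k = box_length * f_k                        # Eq. (1.38): f(k) = L^d f_k with d = 1
f_of_k_exact = np.sqrt(2.0 * np.pi) * np.exp(-0.5 * k**2)   # continuous transform of f
print(np.max(np.abs(f_of_k - f_of_k_exact)))
```

The coefficients f_k themselves shrink as 1/L when the box is enlarged, while the product L f_k stays fixed, which is why f(k) rather than f_k is the natural quantity in the infinite-volume limit.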
Exercise: CMB lensing. For a small part of the sky we can use the flat-sky approximation, treating
it as a 2D plane. As is common in this context, denote the 2D coordinate by θ and the corresponding 2D
wave vector by l. Gravitational lensing deflects the CMB photons so that a photon originating from θ is
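seen coming from θ + ∇ψ(θ), where ψ(θ) is the lensing potential. Thus the observed (T_obs) and unlensed
(T) CMB temperatures are related by
$$ T_{\rm obs}(\boldsymbol{\theta}) = T[\boldsymbol{\theta} + \nabla\psi(\boldsymbol{\theta})] \approx T(\boldsymbol{\theta}) + \nabla\psi(\boldsymbol{\theta}) \cdot \nabla T(\boldsymbol{\theta}) . \tag{1.48} $$
Express T_obs(l) in terms of T(l) and ψ(l).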
1.5 Power spectrum
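We now expand the density perturbation as a Fourier series (assuming a large cubic box V = L^d
with periodic boundary conditions)
$$ \delta(\mathbf{x}) = \sum_{\mathbf{k}} \delta_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} , \tag{1.49} $$
with
$$ \delta_{\mathbf{k}} = \frac{1}{V} \int_V \delta(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, d^dx \tag{1.50} $$
and δ_−k = δ_k*. Note that
$$ \langle \delta(\mathbf{x}) \rangle = 0 \quad \Rightarrow \quad \langle \delta_{\mathbf{k}} \rangle = 0 . \tag{1.51} $$
The Fourier coefficients of the density field ρ(x) and the density perturbation δ(x) are related
by
$$ \rho_{\mathbf{k}} = \langle\rho\rangle\, \delta_{\mathbf{k}} \quad \text{for } \mathbf{k} \neq 0 , \tag{1.52} $$
since the k ≠ 0 coefficients vanish for the homogeneous part, and
$$ \rho_{\mathbf{0}} = \bar{\rho} = \langle\rho\rangle (1 + \delta_{\mathbf{0}}) = \langle\rho\rangle (1 + \bar{\delta}) , \tag{1.53} $$
where
$$ \bar{\delta} = \frac{\bar{\rho} - \langle\rho\rangle}{\langle\rho\rangle} \tag{1.54} $$
(see Eq. 1.7) is the mean density perturbation within the volume V indicating whether the
volume is over- or underdense.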
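In analogy with the correlation function ξ(x, x′), we may ask what is the corresponding
correlation in Fourier space, 〈δ_k* δ_k′〉. Note that due to the mathematics of complex numbers,
correlations of Fourier coefficients are defined with the complex conjugate *. This way the
correlation of δ_k with itself, 〈δ_k* δ_k〉 = 〈|δ_k|²〉, is a real (and nonnegative) quantity, the expectation
value of the absolute value (modulus) of δ_k squared, i.e., the variance of δ_k. Calculating
$$ \begin{aligned} \langle \delta_{\mathbf{k}}^*\, \delta_{\mathbf{k}'} \rangle &= \frac{1}{V^2} \int d^dx\, e^{i\mathbf{k}\cdot\mathbf{x}} \int d^dx'\, e^{-i\mathbf{k}'\cdot\mathbf{x}'}\, \langle \delta(\mathbf{x})\, \delta(\mathbf{x}') \rangle \\ &= \frac{1}{V^2} \int d^dx\, e^{i\mathbf{k}\cdot\mathbf{x}} \int d^dr\, e^{-i\mathbf{k}'\cdot(\mathbf{x}+\mathbf{r})}\, \langle \delta(\mathbf{x})\, \delta(\mathbf{x}+\mathbf{r}) \rangle \\ &= \frac{1}{V^2} \int d^dr\, e^{-i\mathbf{k}'\cdot\mathbf{r}}\, \xi(\mathbf{r}) \int d^dx\, e^{i(\mathbf{k}-\mathbf{k}')\cdot\mathbf{x}} \\ &= \frac{1}{V}\, \delta_{\mathbf{k}\mathbf{k}'} \int d^dr\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, \xi(\mathbf{r}) \equiv \frac{1}{V}\, \delta_{\mathbf{k}\mathbf{k}'}\, P(\mathbf{k}) , \end{aligned} \tag{1.55} $$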
where we used 〈δ(x)δ(x + r)〉 = ξ(r), which results from statistical homogeneity. Note that
here δ_kk′ is the Kronecker delta, not a density perturbation! Thus, from statistical homogeneity
it follows that the Fourier coefficients δ_k are uncorrelated. The quantity
$$ P(\mathbf{k}) \equiv V \langle |\delta_{\mathbf{k}}|^2 \rangle = \int d^dr\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, \xi(\mathbf{r}) , \tag{1.56} $$
which gives the variance of δ_k, is called the power spectrum of δ(x). Since the correlation
function → 0 for large separations, we can replace the integration volume V in (1.56) with an
infinite volume. Thus the power spectrum and correlation function form a d-dimensional Fourier
transform pair, so that
$$ \xi(\mathbf{r}) = \frac{1}{(2\pi)^d} \int d^dk\, e^{i\mathbf{k}\cdot\mathbf{r}}\, P(\mathbf{k}) . \tag{1.57} $$
Unlike the correlation function, the power spectrum P(k) is nonnegative everywhere.
The correlation function is a dimensionless quantity, whereas the power spectrum P(k) has
the dimension of volume (δ(x) and δ_k are dimensionless). We noted earlier that the magnitude
of Fourier coefficients depends on the volume V. From (1.56) we see that the typical magnitude
of δ_k goes down with volume as ∝ 1/√V. Although the density of k-modes increases ∝ V,
neighboring δ_k are uncorrelated, so they add up incoherently, so that, e.g., 4 times as many
k modes bring only a factor of 2 increase in Σ_k δ_k e^{ik·x}, to be compensated by the δ_k being a
factor of 2 smaller.
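Both the definition P(k) = V〈|δ_k|²〉 and the 1/√V scaling of δ_k can be illustrated with an ensemble of toy realizations. The sketch below assumes the simplest possible process, which is not a model from these notes: a 1D field of independent Gaussian cells of size dx and rms σ, for which ξ vanishes at nonzero separation and the power spectrum is the constant σ² dx. NumPy is assumed and all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 0.3            # rms of delta in each cell (toy value)
dx = 0.5               # cell size; the box length is L = n_cells * dx
n_realizations = 4000

def mean_mode_power(n_cells: int) -> float:
    """Ensemble average of |delta_k|^2, also averaged over all modes with k != 0."""
    delta = rng.normal(0.0, sigma, size=(n_realizations, n_cells))
    delta_k = np.fft.fft(delta, axis=1) / n_cells        # Eq. (1.50)
    return float(np.mean(np.abs(delta_k[:, 1:])**2))

power_small = mean_mode_power(64)
power_large = mean_mode_power(256)     # 4 times larger box, same cell size

# P(k) = V <|delta_k|^2>, Eq. (1.56); for uncorrelated cells it equals sigma^2 dx.
p_small = 64 * dx * power_small
p_large = 256 * dx * power_large
print(p_small, p_large, sigma**2 * dx)
print(np.sqrt(power_small / power_large))   # ratio of rms |delta_k| between the boxes
```

The estimated P(k) is the same for both boxes, while the rms of δ_k is about a factor of 2 smaller in the box that is 4 times larger, in line with the 1/√V scaling discussed above.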
From statistical isotropy
$$ \xi(\mathbf{r}) = \xi(r) \quad \Rightarrow \quad P(\mathbf{k}) = P(k) \tag{1.58} $$
(the Fourier transform of a spherically symmetric function is also spherically symmetric), so
that the variance of δ_k depends only on the magnitude k of the wave vector k, i.e., on the
corresponding distance scale. Since small distance scales correspond to large k and vice versa,
to avoid confusion it is better to use the words high and low instead of “large” and “small” for
k, i.e., small scales correspond to high k, and large scales to low k.
Using the recipe (1.47) for going from Fourier coefficients to Fourier transform, (1.55) gives
$$ \langle \delta(\mathbf{k})^*\, \delta(\mathbf{k}') \rangle \equiv (2\pi)^d\, \delta_D^d(\mathbf{k} - \mathbf{k}')\, P(k) . \tag{1.59} $$
Notice that with δ_k we can write P(k) ≡ V〈δ_k* δ_k〉 (without having to use δ_kk′ in the equation),
but with δ(k) we need to use the δ_D-function in the definition of P(k).
The correlation function is more closely connected to observations, whereas theoretical pre-
dictions come more naturally in terms of P (k), especially at large distance scales, where the
density perturbations are small and closer to their primordial state. In principle, when we have
determined one of ξ(r) and P (k) from observations, we get the other by Fourier transform. In
practice, observational errors make this inaccurate, and it is better to determine each one sep-
arately with a method optimized for it. Especially for large separations, where ξ(r) is small, it
is difficult to determine it accurately, if at all. For these reasons, density perturbations at large
distance scales (low k) are more commonly discussed in terms of P (k) and for small distance
scales (small r) in terms of ξ(r).
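For the density variance we get
$$ \begin{aligned} \langle\delta^2\rangle \equiv \xi(0) &= \frac{1}{(2\pi)^d} \int d^dk\, P(k) = \frac{C_d}{(2\pi)^d} \int_0^\infty P(k)\, k^{d-1}\, dk \\ &= \frac{C_d}{(2\pi)^d} \int_0^\infty k^d P(k)\, \frac{dk}{k} \equiv \int_{-\infty}^{\infty} \mathcal{P}(k)\, d\ln k , \end{aligned} \tag{1.60} $$
where we have defined
$$ \mathcal{P}(k) \equiv \frac{C_d\, k^d}{(2\pi)^d}\, P(k) = \frac{k}{\pi} P(k) , \quad \frac{k^2}{2\pi} P(k) , \quad \frac{k^3}{2\pi^2} P(k) \quad \text{for } d = 1, 2, 3 . \tag{1.61} $$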
Another common notation for 𝒫(k) is Δ²(k).¹² The term “power spectrum” is used to refer to
both P(k) and 𝒫(k). Of these two, 𝒫(k) has the more obvious physical meaning: it gives the
contribution of a logarithmic interval of scales, i.e., from k to ek, to the density variance. 𝒫(k)
is dimensionless, whereas P(k) has the dimension of (d-dimensional) volume.
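The statement that 𝒫(k) gives the contribution per logarithmic interval to the variance can be checked on a toy model. The sketch below assumes a Gaussian correlation function ξ(r) = exp(−r²/2) in d = 3 (an illustrative assumption, not from these notes), whose power spectrum is P(k) = (2π)^{3/2} exp(−k²/2), and integrates 𝒫(k) over ln k. It uses only the standard library, and the function names are hypothetical.

```python
import math

def sphere_surface_coefficient(d: int) -> float:
    """C_d = 2 pi^(d/2) / Gamma(d/2), Eq. (1.1)."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

def dimensionless_power(k: float, power: float, d: int) -> float:
    """Eq. (1.61): C_d k^d P(k) / (2 pi)^d."""
    return sphere_surface_coefficient(d) * k**d * power / (2.0 * math.pi) ** d

def toy_power_3d(k: float) -> float:
    """P(k) of the toy correlation function xi(r) = exp(-r^2 / 2) in d = 3."""
    return (2.0 * math.pi) ** 1.5 * math.exp(-0.5 * k * k)

# Variance as an integral over ln k, Eq. (1.60), by the trapezoidal rule.
# The limits are chosen so that the integrand is negligible outside them.
ln_k_min, ln_k_max, n_steps = -12.0, 3.0, 6000
h = (ln_k_max - ln_k_min) / n_steps
values = []
for i in range(n_steps + 1):
    k = math.exp(ln_k_min + i * h)
    values.append(dimensionless_power(k, toy_power_3d(k), 3))
variance = h * (sum(values) - 0.5 * (values[0] + values[-1]))
print(variance)   # for this toy model the result should be close to xi(0) = 1
```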
See Fig. 3 for the observed power spectrum from the Sloan Digital Sky Survey.
The counterpart of (1.60) is
$$ P(0) = \lim_{k \to 0} P(k) = \int d^dr\, \xi(r) . \tag{1.62} $$
If¹³ P(k) → 0 as k → 0 we get the integral constraint
$$ \int d^dr\, \xi(r) = 0 . \tag{1.63} $$
Therefore ξ(r) must become negative for some r, so that at such a separation from an overdense
region we are more likely to find an underdense region. (Going to ever larger separations, ξ as
a function of r may oscillate around zero, the oscillation becoming ever smaller in amplitude.
Most of the interest in ξ(r) is for the smaller r within the initial positive region.)
For isotropic ξ(r) and P (k) we can switch to polar, spherical, etc. coordinates and do the
angular integrals to rewrite (1.56) and (1.57) as 1-dimensional integrals (exercise):
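\[
\begin{aligned}
&\text{(1D)}\quad P(k) = \int_0^\infty \xi(r)\cos kr\; 2\,dr
\quad\text{or}\quad \mathcal{P}(k) = \frac{2k}{\pi}\int_0^\infty \xi(r)\cos kr\,dr \\
&\text{(2D)}\quad P(k) = \int_0^\infty \xi(r)J_0(kr)\; 2\pi r\,dr
\quad\text{or}\quad \mathcal{P}(k) = k^2\int_0^\infty \xi(r)J_0(kr)\,r\,dr \\
&\text{(3D)}\quad P(k) = \int_0^\infty \xi(r)\,\frac{\sin kr}{kr}\; 4\pi r^2\,dr
\quad\text{or}\quad \mathcal{P}(k) = \frac{2k^3}{\pi}\int_0^\infty \xi(r)\,\frac{\sin kr}{kr}\,r^2\,dr
\end{aligned} \tag{1.64}
\]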
and
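\[
\begin{aligned}
&\text{(1D)}\quad \xi(r) = \frac{1}{\pi}\int_0^\infty P(k)\cos kr\,dk
= \int_0^\infty \mathcal{P}(k)\cos kr\,\frac{dk}{k} \\
&\text{(2D)}\quad \xi(r) = \frac{1}{2\pi}\int_0^\infty P(k)J_0(kr)\,k\,dk
= \int_0^\infty \mathcal{P}(k)J_0(kr)\,\frac{dk}{k} \\
&\text{(3D)}\quad \xi(r) = \frac{1}{(2\pi)^3}\int_0^\infty P(k)\,\frac{\sin kr}{kr}\,4\pi k^2\,dk
= \int_0^\infty \mathcal{P}(k)\,\frac{\sin kr}{kr}\,\frac{dk}{k} ,
\end{aligned} \tag{1.65}
\]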
¹² The notation $\mathcal{P}(k)$ and calling it “power spectrum” is common among cosmologists. Astronomers seem to
use the notation $\Delta^2(k)$ for it, and reserve the word “power spectrum” for $P(k)$.
¹³ From (1.56), $P(0) = V\langle(\delta_0)^2\rangle$, where $\delta_0 = \bar\delta = (\bar\rho - \langle\rho\rangle)/\langle\rho\rangle$. While $\langle\bar\delta\rangle = 0$, $\langle(\bar\delta)^2\rangle \neq 0$ for a finite $V$. We’ll
come back to this later.
[Figure 3 plots. Top panel: $k^3 P(k)/2\pi^2$ against $k$ [Mpc$^{-1}$]. Bottom panel: $P(k)$ [Mpc$^3$] against $k$ [Mpc$^{-1}$]. Both axes are logarithmic.]
Figure 3: The matter power spectrum from the SDSS obtained using luminous red galaxies [13]. The
top figure shows $\mathcal{P}(k)$ and the bottom figure $P(k)$. A Hubble constant value $H_0 = 71.4$ km/s/Mpc has
been assumed for this figure. (These galaxy surveys only obtain the scales up to the Hubble constant,
and therefore the observed $P(k)$ is usually shown in units of $h\,\mathrm{Mpc}^{-1}$ for $k$ and $h^{-3}\,\mathrm{Mpc}^3$ for $P(k)$, so
that no value for $H_0$ needs to be assumed.) The black bars are the observations and the red curve is a
theoretical fit, from linear perturbation theory, to the data. The bend in $P(k)$ at $k_{\rm eq} \sim 0.01\ \mathrm{Mpc}^{-1}$ is
clearly visible in the bottom figure. Linear perturbation theory fails when $\mathcal{P}(k) \gtrsim 1$, and therefore the
data points do not follow the theoretical curve to the right of the dashed line (representing an estimate
of how far linear theory can be trusted). Figure by R. Keskitalo.
where J0 is a Bessel function and the sin kr/kr = j0(kr) is a spherical Bessel function. In 3D,
for the volume-averaged ξ̄(R) defined in (1.15) we get (exercise):
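\[
\bar\xi(R) = \int_0^\infty \mathcal{P}(k)\left[\frac{3(\sin kR - kR\cos kR)}{(kR)^3}\right]\frac{dk}{k}
= \int_0^\infty \mathcal{P}(k)\left[\frac{3}{kR}\,j_1(kR)\right]\frac{dk}{k} . \tag{1.66}
\]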
“The factor in brackets dies off faster with increasing k than the (sin kr/kr) in (1.65), so ξ̄(R)
gives a cleaner measure of the power spectrum at k ∼ 1/R than does ξ(R).” (MBW[2], p. 263.)
(This is a comparison between j0(x) and (3/x)j1(x), where the jn(x) are spherical Bessel func-
tions. These are elementary functions, so we do not discuss them more at this stage, but we will
meet them later.)
Example: We do the 2D case of (1.64), since it involves the non-elementary function J0:
\[
P(k) = \int d^2r\, e^{-i\mathbf{k}\cdot\mathbf{r}}\,\xi(r)
= \int_0^\infty r\,dr\,\xi(r)\int_0^{2\pi} d\varphi\, e^{-ikr\cos\varphi} , \tag{1.67}
\]
where the angular integral gives
\[
\int_0^{2\pi} d\varphi\, e^{-ikr\cos\varphi} = 2\int_0^{\pi} d\varphi\,\cos(kr\cos\varphi) = 2\pi J_0(kr) , \tag{1.68}
\]
where we used the integral representation of the Bessel function ([8], p. 680)
\[
J_0(x) = \frac{2}{\pi}\int_0^{\pi/2}\cos(x\sin\varphi)\,d\varphi = \frac{2}{\pi}\int_0^{\pi/2}\cos(x\cos\varphi)\,d\varphi . \tag{1.69}
\]
These two integrals are equal since $\cos\varphi$ and $\sin\varphi$ go over the same values at the same rate. Since (the
outer) cos is an even function, the integrals give the same result over each quadrant, i.e., we could as well
integrate from $\pi/2$ to $\pi$.
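As a numerical cross-check of (1.68), the following sketch compares the angular integral with $2\pi J_0(kr)$. It is not part of the derivation: $J_0$ is evaluated here from its standard power series $\sum_m (-1)^m (x/2)^{2m}/(m!)^2$, and the function names and step counts are arbitrary choices.

```python
import math

def j0_series(x, terms=40):
    """J0(x) from the power series sum_m (-1)^m (x/2)^(2m) / (m!)^2."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((m + 1) ** 2)
    return total

def angular_integral(kr, steps=2000):
    """Midpoint-rule estimate of the integral of cos(kr cos(phi)) over 0..2pi.

    The imaginary part of exp(-i kr cos(phi)) integrates to zero, so only
    the cosine is kept.
    """
    h = 2.0 * math.pi / steps
    return h * sum(math.cos(kr * math.cos((i + 0.5) * h)) for i in range(steps))

for kr in (0.5, 2.0, 5.0):
    print(kr, angular_integral(kr), 2.0 * math.pi * j0_series(kr))
```

The midpoint rule is used because the integrand is smooth and periodic, for which equally spaced sampling converges very quickly.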
Exercise: Continuation of an earlier exercise: Fourier expand the “observed” δ̂(x) of Eq. (1.17) to
get its Fourier coefficients δ̂k. Show that
\[
\hat\xi(\mathbf{r}) = \frac{V}{(2\pi)^d}\int d^dk\, |\hat\delta_{\mathbf{k}}|^2 e^{i\mathbf{k}\cdot\mathbf{r}} , \tag{1.70}
\]
for this single realization. Note that here we do not need any statistical assumptions (like statistical
homogeneity or ergodicity). Contrast this result with (1.57).
1.6 Bessel functions
The 2D Fourier transform brings in Bessel functions $J_n(x)$, with $n = 0, 1, 2, \ldots$ They are mostly
used on the positive real axis, where they are oscillating functions, whose amplitude decreases
with increasing $x$, asymptotically as $x^{-1/2}$. See Fig. 4. We list some of their properties.
J0(0) = 1 and Jn(0) = 0 for n = 1, 2, . . . (1.71)
Their power series begins
\[
J_n(x) = \frac{x^n}{2^n n!} - \frac{x^{n+2}}{2^{n+2}(n+1)!} + \ldots \tag{1.72}
\]
They have the integral representations
\[
J_n(x) = \frac{1}{\pi}\int_0^\pi \cos(n\varphi - x\sin\varphi)\,d\varphi
= \frac{(-i)^n}{\pi}\int_0^\pi e^{ix\cos\varphi}\cos n\varphi\,d\varphi . \tag{1.73}
\]
Figure 4: The first three Bessel functions: J0 (blue), J1 (green), and J2 (red).
Both integrals are even in ϕ, and periodic over 2π, so one can replace
\[
\frac{1}{\pi}\int_0^\pi = \frac{1}{2\pi}\int_{-\pi}^{\pi} = \frac{1}{2\pi}\int_a^{a+2\pi} . \tag{1.74}
\]
A number of recursion formulae relate them and their derivatives to each other:
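\[
\begin{aligned}
J_{n-1}(x) + J_{n+1}(x) &= \frac{2n}{x}\,J_n(x) \\
J_{n-1}(x) - J_{n+1}(x) &= 2J_n'(x) \\
J_{n-1}(x) &= \frac{n}{x}\,J_n(x) + J_n'(x) \\
J_{n+1}(x) &= \frac{n}{x}\,J_n(x) - J_n'(x) .
\end{aligned} \tag{1.75}
\]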
As a special case of (1.75b),
J ′0(x) = −J1(x) . (1.76)
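The recursion formulae can be checked numerically. The sketch below evaluates $J_n$ directly from the first integral representation in (1.73) with a midpoint rule (an illustrative choice, not an efficient way to compute Bessel functions) and tests (1.75a) and (1.76); the function name, step count and sample point are arbitrary.

```python
import math

def bessel_j(n, x, steps=4000):
    """J_n(x) from (1/pi) * integral over 0..pi of cos(n phi - x sin phi)."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        phi = (i + 0.5) * h
        total += math.cos(n * phi - x * math.sin(phi))
    return total * h / math.pi

x = 2.5
# (1.75a) with n = 1: J_0 + J_2 = (2/x) J_1
lhs = bessel_j(0, x) + bessel_j(2, x)
rhs = 2.0 / x * bessel_j(1, x)

# (1.76): J_0'(x) = -J_1(x), with the derivative taken by central difference
eps = 1.0e-4
dJ0 = (bessel_j(0, x + eps) - bessel_j(0, x - eps)) / (2.0 * eps)
print(lhs - rhs, dJ0 + bessel_j(1, x))
```

Both printed differences should be close to zero, limited by the finite-difference step in the second case.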
The Bessel function closure relation applies to Bessel functions with the same $n$ but different
wavelengths:
\[
\int_0^\infty J_n(\alpha x)J_n(\alpha' x)\,x\,dx = \frac{1}{\alpha}\,\delta_D(\alpha - \alpha') . \tag{1.77}
\]
A somewhat similar (note dx instead of x dx) integral with neighboring (n and n − 1) Bessel
functions gives (Gradshteyn&Ryzhik[9] 6.512.3)
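\[
\int_0^\infty J_n(\alpha x)J_{n-1}(\beta x)\,dx = \frac{\beta^{n-1}}{\alpha^n}\,\Theta(\alpha - \beta) =
\begin{cases}
0 & (\alpha < \beta) \\[2pt]
\dfrac{1}{2\alpha} & (\alpha = \beta) \\[6pt]
\dfrac{\beta^{n-1}}{\alpha^n} & (\alpha > \beta) .
\end{cases} \tag{1.78}
\]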
1.7 Spherical Bessel functions
The spherical Bessel functions jn(x) (of integer order) are related to ordinary Bessel functions
of half-integer order:
\[
j_n(x) = \sqrt{\frac{\pi}{2x}}\;J_{n+1/2}(x) . \tag{1.79}
\]
Like $J_n$, they are mostly used on the positive real axis, where they are oscillating functions; their
amplitude decreases faster, asymptotically as $x^{-1}$. See Fig. 5. Unlike $J_n$, they are elementary
functions (for integer $n$), see Table 1.
Figure 5: The first three spherical Bessel functions: j0 (blue), j1 (green), and j2 (red).
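Spherical Bessel functions
\[
\begin{aligned}
j_0(x) &= \frac{\sin x}{x} \\
j_1(x) &= \frac{\sin x}{x^2} - \frac{\cos x}{x} \\
j_2(x) &= \left(\frac{3}{x^3} - \frac{1}{x}\right)\sin x - \frac{3}{x^2}\cos x
\end{aligned}
\]
Table 1: Spherical Bessel functions.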
We list some of their properties.
j0(0) = 1 and jn(0) = 0 for n = 1, 2, . . . (1.80)
Their power series begins
\[
j_n(x) = \frac{2^n n!\,x^n}{(2n+1)!} - \frac{2^n (n+1)!\,x^{n+2}}{(2n+3)!} + \ldots \tag{1.81}
\]
They have recursion formulae relating them and their derivatives to each other:
\[
\begin{aligned}
j_{n-1}(x) + j_{n+1}(x) &= \frac{2n+1}{x}\,j_n(x) \\
n\,j_{n-1}(x) - (n+1)\,j_{n+1}(x) &= (2n+1)\,j_n'(x) .
\end{aligned} \tag{1.82}
\]
As a special case of (1.82b),
j′0(x) = −j1(x) . (1.83)
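Since the spherical Bessel functions are elementary, the properties above are easy to test numerically. The sketch below uses the closed forms of Table 1 (the function names are illustrative) and checks the recursion (1.82a) for $n = 1$, the derivative relation (1.83), and the leading terms of the series (1.81).

```python
import math

def sph_j0(x):
    return math.sin(x) / x

def sph_j1(x):
    return math.sin(x) / x ** 2 - math.cos(x) / x

def sph_j2(x):
    return (3.0 / x ** 3 - 1.0 / x) * math.sin(x) - 3.0 / x ** 2 * math.cos(x)

x = 1.7
# (1.82a) with n = 1: j_0 + j_2 = (3/x) j_1
recursion_residual = sph_j0(x) + sph_j2(x) - 3.0 / x * sph_j1(x)

# (1.83): j_0' = -j_1, derivative by central difference
eps = 1.0e-5
derivative_residual = (sph_j0(x + eps) - sph_j0(x - eps)) / (2.0 * eps) + sph_j1(x)

# (1.81) for small x: j_1 ~ x/3 - x^3/30 and j_2 ~ x^2/15
small = 0.1
series_j1 = small / 3.0 - small ** 3 / 30.0
series_j2 = small ** 2 / 15.0
print(recursion_residual, derivative_residual, sph_j1(small) - series_j1)
```

Note that the closed forms lose precision for very small $x$ because of cancellation; there the series (1.81) is the safer choice.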
1.8 Power-law spectra
For certain ranges of scales, ξ(r) and P (k) can be approximated by a power-law form,
\[
\xi(r) \propto r^{-\gamma} \quad\text{or}\quad P(k) \propto k^n . \tag{1.84}
\]
Note the minus sign for ξ – we expect correlations to decrease with increasing separation, so
this makes γ positive. When plotted on a log-log scale, such functions appear as straight lines
with slope −γ and n:
log ξ = −γ log r + const and logP = n log k + const . (1.85)
Figure 6: Top panel: The correlation function from the 2dFGRS galaxy survey in log-log scale. The
dashed line is the best-fit power law ($r_0 = 5.05\,h^{-1}\,\mathrm{Mpc}$, $\gamma = 1.67$). The inset shows the same in linear
scale. Bottom panel: 2dFGRS data (solid circles with error bars) divided by the power-law fit. The solid
line is the result from the APM survey and the dashed line from an N-body simulation. This is Fig. 11
from [10].
The proportionality constant can be given in terms of a reference scale. For ξ(r) we usually
choose the scale r0 where ξ(r0) = 1, so that
\[
\xi(r) = \left(\frac{r}{r_0}\right)^{-\gamma} . \tag{1.86}
\]
See Fig. 6. For P (k) we may write
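\[
P(k) = A^2\left(\frac{k}{k_p}\right)^{n} \quad\text{or}\quad \mathcal{P}(k) = A^2\left(\frac{k}{k_p}\right)^{n+d} , \tag{1.87}
\]
where $k_p$ is called a pivot scale (whose choice depends on the application) and $A \equiv \sqrt{P(k_p)}$ or
$\sqrt{\mathcal{P}(k_p)}$ is the amplitude of the power spectrum at the pivot scale.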
We define the spectral index n(k) as
\[
n(k) \equiv \frac{d\ln P}{d\ln k} . \tag{1.88}
\]
It gives the slope of $P(k)$ on a log-log plot. For a power-law $P(k)$, $n(k) = \mathrm{const} = n$. We can
study power-law $\xi(r)$ and $P(k)$ as a playground to get a feeling for what different values of the
spectral index mean and, e.g., how $\gamma$ and $n$ are related.
The Fourier transform of a power law is a power law. For the correlation function of (1.86)
we get (exercise)
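\[
\begin{aligned}
&\text{1D:}\quad P(k) = \frac{2}{k}\,\Gamma(1-\gamma)\,\sin\!\left(\tfrac12\gamma\pi\right)(kr_0)^\gamma
\qquad (0 < \gamma < 1 \;\Rightarrow\; -1 < n < 0) \\
&\text{2D:}\quad P(k) = \frac{2\pi}{k^2}\,2^{1-\gamma}\,\frac{\Gamma(1-\tfrac12\gamma)}{\Gamma(\tfrac12\gamma)}\,(kr_0)^\gamma
\qquad (\tfrac12 < \gamma < 2 \;\Rightarrow\; -\tfrac32 < n < 0) \\
&\text{3D:}\quad P(k) = \frac{4\pi}{k^3}\,\Gamma(2-\gamma)\,\sin\frac{(2-\gamma)\pi}{2}\,(kr_0)^\gamma
\qquad (1 < \gamma < 3 \;\Rightarrow\; -2 < n < 0) ,
\end{aligned} \tag{1.89}
\]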
so that γ and n are related by
n = γ − d . (1.90)
For P(k) these read
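\[
\begin{aligned}
&\text{1D:}\quad \mathcal{P}(k) = \frac{2}{\pi}\,\Gamma(1-\gamma)\,\sin\!\left(\tfrac12\gamma\pi\right)(kr_0)^\gamma \\
&\text{2D:}\quad \mathcal{P}(k) = 2^{1-\gamma}\,\frac{\Gamma(1-\tfrac12\gamma)}{\Gamma(\tfrac12\gamma)}\,(kr_0)^\gamma \\
&\text{3D:}\quad \mathcal{P}(k) = \frac{2}{\pi}\,\Gamma(2-\gamma)\,\sin\frac{(2-\gamma)\pi}{2}\,(kr_0)^\gamma .
\end{aligned} \tag{1.91}
\]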
In the 3D case these expressions are undefined for γ = 2, and we have the simpler result
\[
P(k) = \frac{2\pi^2}{k^3}\,(kr_0)^2 \quad\text{and}\quad \mathcal{P}(k) = (kr_0)^2 . \tag{1.92}
\]
Observationally, the 3D correlation function has γ ≈ 1.8 for small separations, corresponding
to n ≈ −1.2 for high k. See Fig. 3 for the observed power spectrum from the Sloan Digital Sky
Survey.
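To get a feeling for the numbers, the coefficients in (1.91) can be evaluated directly. The sketch below is only an illustration with a made-up function name; it returns the factor multiplying $(kr_0)^\gamma$ and checks that the 3D expression approaches the simpler result (1.92) as $\gamma \to 2$.

```python
import math

def calP_coefficient(gamma, d):
    """Coefficient a in calP(k) = a (k r0)^gamma, read off from Eq. (1.91)."""
    if d == 1:
        return 2.0 / math.pi * math.gamma(1.0 - gamma) * math.sin(0.5 * gamma * math.pi)
    if d == 2:
        return 2.0 ** (1.0 - gamma) * math.gamma(1.0 - 0.5 * gamma) / math.gamma(0.5 * gamma)
    if d == 3:
        return (2.0 / math.pi * math.gamma(2.0 - gamma)
                * math.sin(0.5 * (2.0 - gamma) * math.pi))
    raise ValueError("d must be 1, 2 or 3")

gamma = 1.8
spectral_index = gamma - 3               # n = gamma - d, Eq. (1.90), for d = 3
coefficient_3d = calP_coefficient(gamma, 3)
near_two = calP_coefficient(2.0 - 1.0e-6, 3)   # should be close to 1, cf. (1.92)
print(spectral_index, coefficient_3d, near_two)
```

The value exactly at $\gamma = 2$ cannot be evaluated from (1.91) because $\Gamma(0)$ is a pole, which is why the limit is approached from below here.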
Note that a power-law correlation function is everywhere positive. This is possible when
$\lim_{k\to0} P(k) \neq 0$, which is indeed the case for the allowed spectral indices, $n < 0$, above. (In
this case, there is sufficient structure at ever larger scales to maintain positive correlation at
ever larger distances.) In reality, the power spectrum bends at large scales so that its spectral
index becomes positive for low k, and therefore also the correlation function changes shape and
will have also negative values at large enough r.
Outside these values of spectral indices the Fourier transform integrals diverge in the small-
or large-scale limit (I guess); but this does not prevent ξ(r) or P (k) from having also such
power-law forms over some limited range of scales.
The variance
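\[
\langle\delta^2\rangle = \xi(0) = \int_0^\infty \mathcal{P}(k)\,\frac{dk}{k}
\propto \int_0^\infty k^{n+d-1}\,dk = \frac{1}{n+d}\Big[k^{n+d}\Big]_0^\infty
\qquad \text{for } n \neq -d \tag{1.93}
\]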
diverges at large scales (low k) for n ≤ −d and at small scales (high k) for n ≥ −d. Thus we
should have n > −d for k → 0 and n < −d for k →∞. The large scales are not an issue (since
indeed n > −d) in cosmology. At small scales, (1.86) forces ξ(r)→∞ as r → 0 for any positive
γ. The solution of this issue is that arbitrarily small scales are not relevant; in practice we
have a finite resolution that cuts off the smallest scales. This can be implemented with window
functions, discussed in Sec. 1.9.
The case n = −d is a scale-invariant spectrum, P(k) = const . Such a spectrum would mean
that the universe would appear equally inhomogeneous at arbitrarily large scales – no asymptotic
homogeneity. Note that here we discuss the spectrum of density. References to a scale-invariant
or nearly scale-invariant spectrum in cosmology refer usually to the spectrum of gravitational
potential (Newtonian treatment) or spacetime curvature perturbations (GR treatment). Their
spectral index is lower by 4 so that such a scale-invariant spectrum will have a density spectral
index n = 1 (in 3D).
The boundary case n = 0 has the same 〈|δk|2〉 at all scales. From (1.90) γ → d as n → 0.
For γ < d, the integration from 0 to R in (1.14) for ξ̄(R) smooths over the small-scale divergence
of ξ(r); but for γ ≥ d the integral (1.14) diverges. Actually, also (1.89c) and (1.91c) diverge
(Γ(−1) =∞) so that a finite ξ would give infinite P (k). Instead, n = 0 corresponds to the case
of no correlations, ξ = 0. This is called white noise or a Poisson distribution (to be discussed
in Sec. 2.3).
The larger $n$ is, the more the structure is concentrated at small scales. Peacock comments
that $n \geq 0$ spectra would seem to indicate that any large-scale structure is ‘accidental’, “reflecting
the low-k Fourier coefficients of some small-scale process”, whereas $n < 0$ means that
large-scale structure is ‘real’ ([1], p. 499).
Example: To do the 1D and 3D cases in (1.89) is standard FYMM I stuff, but for the 2D case I had
to resort to integral tables:
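\[
P(k) = \int_0^\infty \xi(r)J_0(kr)\,2\pi r\,dr = 2\pi r_0^\gamma\int_0^\infty r^{1-\gamma}J_0(kr)\,dr
= \frac{2\pi(kr_0)^\gamma}{k^2}\int_0^\infty x^{1-\gamma}J_0(x)\,dx . \tag{1.94}
\]
From Gradshteyn & Ryzhik ([9], formula 6.561.14) we find that
\[
\int_0^\infty x^\mu J_\nu(x)\,dx = 2^\mu\,\frac{\Gamma(\tfrac12 + \tfrac12\nu + \tfrac12\mu)}{\Gamma(\tfrac12 + \tfrac12\nu - \tfrac12\mu)}
\qquad \text{for } -\nu - 1 < \mu < \tfrac12 . \tag{1.95}
\]
We have $\nu = 0$ and $\mu = 1 - \gamma$, so the condition becomes $\tfrac12 < \gamma < 2$ and the result
\[
\int_0^\infty x^{1-\gamma}J_0(x)\,dx = 2^{1-\gamma}\,\frac{\Gamma(1-\tfrac12\gamma)}{\Gamma(\tfrac12\gamma)} . \tag{1.96}
\]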
1.9 Scales of interest and window functions
In (1.60) we integrated over all scales, from the infinitely large (k = 0 and ln k = −∞) to the
infinitely small (k = ∞ and ln k = ∞) to get the density variance. Perhaps this is not really
what we want. The average matter density today is $3\times10^{-27}$ kg/m³. The density of the Earth
is $5.5\times10^{3}$ kg/m³ and that of an atomic nucleus $2\times10^{17}$ kg/m³, corresponding to $\delta \approx 2\times10^{30}$
and $\delta \approx 10^{44}$. Probing the density of the universe at such small scales finds a huge variance in
it, but this is no longer the topic of cosmology - we are not interested here in planetary science
or nuclear physics.
Even the study of the structure of individual galaxies is not considered to belong to cos-
mology, so the smallest (comoving) scale of cosmological interest, at least when we discuss the
present universe, is that of a typical separation between neighboring galaxies, of the order 1 Mpc.
To exclude scales smaller than R (r < R or k > R−1) we can filter the density field with a
window function (sometimes called a filter function). This can be done in k-space or x-space.
The filtering in x-space is done by convolution. We introduce a (usually spherically symmetric)
window function $W(\mathbf{r};R)$ such that
\[
\int d^dr\, W(\mathbf{r};R) = 1 \tag{1.97}
\]
(normalization) and $W \sim 0$ for $|\mathbf{r}| \gg R$ and define the filtered density field
\[
\delta(\mathbf{x};R) \equiv (\delta * W)(\mathbf{x}) \equiv \int d^dx'\, \delta(\mathbf{x}')\,W(\mathbf{x} - \mathbf{x}') . \tag{1.98}
\]
Here δ(x;R) and W (x;R) are considered as functions of x and R denotes the chosen resolution.
To simplify notation, we write hereafter W (x;R) as W (x), leaving the scale R implicit. We now
also assume W is spherically symmetric, so we can write just W (r).
Denote the Fourier coefficients of δ(x;R) by δk(R). We use the Fourier series for δ(x) and
δ(x;R), but since W (r) vanishes for large r we can use the Fourier transform W (k) for it. Thus
we need a mixed form of the convolution theorem. Let’s do it explicitly:
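\[
\begin{aligned}
\delta_{\mathbf{k}}(R) &= \frac{1}{V}\int_V d^dx\, \delta(\mathbf{x};R)\,e^{-i\mathbf{k}\cdot\mathbf{x}}
= \frac{1}{V}\int_V d^dx\,d^dx'\, \delta(\mathbf{x}')\,W(\mathbf{x}-\mathbf{x}')\,e^{-i\mathbf{k}\cdot\mathbf{x}} \\
&= \frac{1}{V}\int_V d^dx'\, \delta(\mathbf{x}')\,e^{-i\mathbf{k}\cdot\mathbf{x}'}\int d^dr\, W(\mathbf{r})\,e^{-i\mathbf{k}\cdot\mathbf{r}}
= W(\mathbf{k})\,\delta_{\mathbf{k}} ,
\end{aligned} \tag{1.99}
\]
where
\[
W(\mathbf{k}) = \int d^dr\, W(\mathbf{r})\,e^{-i\mathbf{k}\cdot\mathbf{r}} \tag{1.100}
\]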
is the Fourier transform of $W(\mathbf{r})$.¹⁴ With our normalization, $W(\mathbf{r})$ has dimension $1/V$ and
$W(\mathbf{k})$ is dimensionless with $W(\mathbf{k} = 0) = 1$. Since $W(\mathbf{r}) = W(r)$ is spherically symmetric, so is
$W(\mathbf{k}) = W(k)$. Since $W(-\mathbf{r}) = W(\mathbf{r})$, $W(\mathbf{k})$ is real.
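The mixed form of the convolution theorem can be illustrated on a small periodic 1D grid. The following is a toy sketch, not a realistic density field: the grid size, the test signal, and the discrete top-hat are arbitrary choices, and the Fourier coefficients carry the $1/N$ factor (the discrete analogue of $1/V$) while the window transform does not.

```python
import cmath
import math

N = 32   # number of grid points in the periodic box (toy value)
R = 3    # top-hat half-width in grid units (toy value)

# Toy density contrast with zero mean
delta = [math.sin(2 * math.pi * 3 * x / N) + 0.5 * math.cos(2 * math.pi * 7 * x / N)
         for x in range(N)]

# Symmetric discrete top-hat, normalized so that sum_r W(r) = 1, cf. (1.97)
W = [0.0] * N
for r in range(-R, R + 1):
    W[r % N] = 1.0 / (2 * R + 1)

def convolve_periodic(f, w):
    """Filtered field: sum over x' of f(x') w(x - x') with periodic wrapping."""
    n = len(f)
    return [sum(f[xp] * w[(x - xp) % n] for xp in range(n)) for x in range(n)]

def fourier_coefficient(f, m):
    """Series coefficient (1/N) sum_x f(x) exp(-2 pi i m x / N)."""
    n = len(f)
    return sum(f[x] * cmath.exp(-2j * math.pi * m * x / n) for x in range(n)) / n

def window_transform(w, m):
    """Transform sum_r w(r) exp(-2 pi i m r / N), without the 1/N factor."""
    n = len(w)
    return sum(w[r] * cmath.exp(-2j * math.pi * m * r / n) for r in range(n))

filtered = convolve_periodic(delta, W)
max_error = max(abs(fourier_coefficient(filtered, m)
                    - window_transform(W, m) * fourier_coefficient(delta, m))
                for m in range(N))
print(max_error)
```

The periodic wrapping plays the role of the periodic boundary conditions mentioned in the footnote; the window transform is real because the window is symmetric.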
For the correlations of these filtered Fourier coefficients we get
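\[
\langle\delta^*_{\mathbf{k}}(R)\,\delta_{\mathbf{k}'}(R)\rangle = W(\mathbf{k})^* W(\mathbf{k}')\,\langle\delta^*_{\mathbf{k}}\delta_{\mathbf{k}'}\rangle
= \frac{1}{V}\,\delta_{\mathbf{k}\mathbf{k}'}\,W(k)^2 P(k) \tag{1.101}
\]
so the filtered power spectra are
\[
W(k)^2 P(k) \quad\text{and}\quad W(k)^2\,\mathcal{P}(k) . \tag{1.102}
\]
The filtered correlation function is
\[
\xi(\mathbf{r};R) \equiv \langle\delta(\mathbf{x};R)\,\delta(\mathbf{x}-\mathbf{r};R)\rangle
= \frac{1}{(2\pi)^d}\int d^dk\, e^{i\mathbf{k}\cdot\mathbf{r}}\,W(k)^2 P(k) , \tag{1.103}
\]
and the variance of the filtered density field is
\[
\sigma^2(R) \equiv \langle\delta(\mathbf{x};R)^2\rangle = \xi(0;R) = \int_0^\infty W(k)^2\,\mathcal{P}(k)\,\frac{dk}{k} . \tag{1.104}
\]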
Considered as a function of R, it provides another measure of structure at different scales.
Writing W (k) and P (k) in terms of their Fourier transforms, we get (exercise)
\[
\sigma^2(R) = \frac{1}{(2\pi)^d}\int d^dk\, W(k)^2 P(k)
= \int d^dx\,d^dx'\, \xi(|\mathbf{x}' - \mathbf{x}|)\,W(\mathbf{x})\,W(\mathbf{x}') . \tag{1.105}
\]
Spectral moments. More generally, we define the spectral moments
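\[
\sigma_\ell^2(R) \equiv \int_0^\infty k^{2\ell}\,\mathcal{P}(k)\,W(k)^2\,\frac{dk}{k} , \tag{1.106}
\]
so that $\sigma^2(R) = \sigma_0^2(R)$ is the zeroth moment. The relation $\sigma^2(R) = \xi(0;R)$ can be generalized to higher
moments and derivatives of $\xi(r;R)$ at $r = 0$, since, e.g.,
\[
\begin{aligned}
\nabla^2\xi(\mathbf{r};R) &= \frac{1}{(2\pi)^d}\int d^dk\, (-k^2)\,e^{i\mathbf{k}\cdot\mathbf{r}}\,W(k)^2 P(k) \\
\Rightarrow\quad \nabla^2\xi(0;R) &= \frac{1}{(2\pi)^d}\int d^dk\, (-k^2)\,W(k)^2 P(k) = -\sigma_1^2(R) .
\end{aligned} \tag{1.107}
\]
(The unfiltered $\xi(r)$ and its derivatives may diverge at $r = 0$, but $\xi(0;R)$ has been smoothed by the
window function.) Peacock [1], p. 500 has
\[
\xi^{(2\ell)}(0;R) = (-1)^\ell\,\frac{\sigma_\ell^2(R)}{2\ell+1} \tag{1.108}
\]
which probably holds as such only for $d = 3$, since I get
\[
\xi''(0;R) = -\frac{\sigma_1^2(R)}{d} \tag{1.109}
\]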
¹⁴ When $\mathbf{x}$ is closer than $R$ to the edge of the volume $V$, the window function collects a contribution from outside
$V$. In this convolution theorem we used periodic boundary conditions. In real applications one needs to consider
edge effects.
(I did not try to do the higher moments).
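To get (1.109) from (1.107) we need to relate $\nabla^2\xi(r;R)$ to $\xi''(r;R)$ at $r = 0$. Expand
\[
\xi(r;R) = \sum_{n=0}^\infty a_n r^n , \tag{1.110}
\]
where $a_1 = 0$ so that $\xi(r;R)$ is smooth at $r = 0$. Now
\[
\partial_i r^n = n r^{n-1}\,\partial_i r = n r^{n-2} x_i
\quad\Rightarrow\quad \partial_i\xi(r;R) = \sum_{n=2}^\infty a_n n\, r^{n-2} x_i \tag{1.111}
\]
and
\[
\partial_j\partial_i\xi(r;R) = \sum_{n=2}^\infty a_n n\left[(n-2)\,r^{n-4} x_i x_j + r^{n-2}\,\partial_j x_i\right] \tag{1.112}
\]
where $\partial_j x_i = \delta_{ij}$. Thus
\[
\nabla^2\xi(r;R) \equiv \sum_i \partial_i\partial_i\xi(r;R) = \sum_{n=2}^\infty a_n n\left[(n-2)\,r^{n-2} + d\,r^{n-2}\right] \to 2d\,a_2 \tag{1.113}
\]
as $r \to 0$, whereas
\[
\xi''(r;R) = \sum_{n=2}^\infty a_n n(n-1)\,r^{n-2} \to 2a_2 , \tag{1.114}
\]
so that
\[
\nabla^2\xi(0;R) = d\cdot\xi''(0;R) . \tag{1.115}
\]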
The simplest window function is the top-hat window function
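\[
W_T(\mathbf{r}) \equiv \frac{1}{V(R)} \qquad \text{for } |\mathbf{r}| \leq R \tag{1.116}
\]
and $W_T(\mathbf{r}) = 0$ elsewhere, i.e., $\delta(\mathbf{x})$ is filtered by replacing it with its mean value within the
distance $R$. Its Fourier transform¹⁵ is (exercise):
\[
\begin{aligned}
&\text{1D:}\quad W_T(k) = \frac{1}{kR}\,\sin kR \\
&\text{2D:}\quad W_T(k) = \frac{2}{kR}\,J_1(kR) \\
&\text{3D:}\quad W_T(k) = \frac{3}{(kR)^3}\,(\sin kR - kR\cos kR) = \frac{3}{kR}\,j_1(kR)
\end{aligned} \tag{1.117}
\]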
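The 3D case of (1.117) can be checked numerically by doing the angular integrals as in (1.64), which leaves the radial integral $\frac{1}{V(R)}\int_0^R \frac{\sin kr}{kr}\,4\pi r^2\,dr$. The sketch below compares this with the closed form; Simpson's rule and the function names are illustrative choices, and $V(R) = 4\pi R^3/3$ is assumed for the 3D top-hat.

```python
import math

def tophat_3d(k, R):
    """Closed form from (1.117), 3D: 3 (sin kR - kR cos kR) / (kR)^3."""
    y = k * R
    return 3.0 * (math.sin(y) - y * math.cos(y)) / y ** 3

def tophat_3d_numeric(k, R, steps=2000):
    """Radial integral of (sin kr / kr) 4 pi r^2 over 0..R divided by V(R).

    Uses Simpson's rule; steps must be even.
    """
    h = R / steps

    def integrand(r):
        return 4.0 * math.pi * r * math.sin(k * r) / k

    total = integrand(0.0) + integrand(R)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    volume = 4.0 * math.pi * R ** 3 / 3.0
    return total * h / 3.0 / volume

for k in (0.3, 1.0, 4.0):
    print(k, tophat_3d(k, 1.0), tophat_3d_numeric(k, 1.0))
```

For $kR \to 0$ both expressions approach 1, as required by the normalization (1.97).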
Mathematically more convenient is the Gaussian window function
\[
W_G(\mathbf{r}) \equiv \frac{1}{V_G(R)}\,e^{-\frac12 r^2/R^2} , \tag{1.118}
\]
where
\[
V_G(R) \equiv \int d^dr\, e^{-\frac12 |\mathbf{r}|^2/R^2} \tag{1.119}
\]
¹⁵ Note the emerging pattern with the Bessel functions: trigonometric functions are “Bessel functions for 1D”,
cos and sin corresponding to $J_0$ and $J_1$; the ordinary Bessel functions $J_n$ are “for 2D”; and the spherical Bessel
functions are “for 3D”. All are oscillating functions; trigonometric functions have constant amplitude; $J_n$ decay
as $x^{-1/2}$ for large $x$, and $j_n(x)$ decay as $x^{-1}$.
is the volume of $W_G$. The volume of a window function is defined as what $\int d^dr\, W(\mathbf{r})$ would be
if $W$ were normalized so that $W(0) = 1$, instead of the normalization we chose in (1.97).
The volume of $W_G$ is (exercise)
\[
V_G(R) = (2\pi)^{d/2} R^d , \tag{1.120}
\]
and its Fourier transform is, for all $d$, (exercise)
\[
W_G(k) = e^{-\frac12 (kR)^2} . \tag{1.121}
\]
(The 1D case was done in FYMM Ib. From that it’s easy to generalize to arbitrary d.)
We can also define the k-space top-hat window function
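\[
W_k(\mathbf{k}) \equiv 1 \qquad \text{for } k \leq 1/R \tag{1.122}
\]
and $W_k(\mathbf{k}) = 0$ elsewhere. In x-space this becomes (exercise)
\[
\text{3D:}\quad W_k(\mathbf{r}) = \frac{1}{2\pi^2 R^3}\,\frac{\sin y - y\cos y}{y^3}
= \frac{1}{6\pi^2 R^3}\,\frac{3 j_1(y)}{y} , \qquad \text{where } y \equiv |\mathbf{r}|/R , \tag{1.123}
\]
and
\[
\text{3D:}\quad V_k(R) = 6\pi^2 R^3 . \tag{1.124}
\]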
For this window function the density variance is simply
\[
\sigma^2(R) = \frac{1}{(2\pi)^d}\int_0^{R^{-1}} C_d\,k^{d-1} P(k)\,dk = \int_{-\infty}^{-\ln R} \mathcal{P}(k)\,d\ln k . \tag{1.125}
\]
Note that the volumes of the different window functions are quite different. See Fig. 7. In
3D:
\[
V_T(R) = \frac{4\pi}{3}R^3 = 4.189\,R^3 , \qquad V_G(R) = (2\pi)^{3/2}R^3 = 15.75\,R^3 , \qquad V_k(R) = 6\pi^2 R^3 = 59.22\,R^3 . \tag{1.126}
\]
The values of R that make the volumes equal are RG = 0.6431RT and Rk = 0.4136RT . Thus a
given R corresponds to a somewhat different effective scale for the different window functions.
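The numbers in (1.126) and the equal-volume radii follow from a few lines of arithmetic. In this small sketch the variable names are illustrative; equal volumes means $V_G(R_G) = V_T(R_T)$ and $V_k(R_k) = V_T(R_T)$, so the radius ratios are cube roots of volume ratios.

```python
import math

# Volumes of the 3D window functions in units of R^3, Eq. (1.126)
V_T = 4.0 * math.pi / 3.0
V_G = (2.0 * math.pi) ** 1.5
V_k = 6.0 * math.pi ** 2

# Radii that give the same volume as a top-hat of radius R_T
R_G_over_R_T = (V_T / V_G) ** (1.0 / 3.0)
R_k_over_R_T = (V_T / V_k) ** (1.0 / 3.0)

print(round(V_T, 3), round(V_G, 2), round(V_k, 2))
print(round(R_G_over_R_T, 4), round(R_k_over_R_T, 4))
```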
The different window functions also give quite different $\sigma^2(R)$. Observationally, the 3D
galaxy distribution has ([1], p. 501; [2], p. 83)
\[
\sigma_T^2(R) \approx 1.0 \qquad \text{for } R = 8\,h^{-1}\,\mathrm{Mpc} . \tag{1.127}
\]
Near these scales the slope of the correlation function is
\[
\gamma \approx 1.8 \quad\text{corresponding to}\quad n = -1.2 . \tag{1.128}
\]
This slope does not hold at larger scales, and at $R = 30\,h^{-1}\,\mathrm{Mpc}$, $\sigma_T^2(R)$ is already down to $10^{-2}$
($\sigma \sim 0.1$; [2], p. 83). See also Fig. 3.
One may also ask, whether scales larger than the observed universe (i.e., the lower limit
k = 0 or ln k = −∞ in the k integrals) are relevant, since we cannot observe the inhomogeneity
at such scales. Due to such very-large-scale inhomogeneities, the average density in the observed
universe may deviate from the average density of the entire universe. Inhomogeneities at scales
somewhat larger than the observed universe could appear as an anisotropy in the observed
universe. The importance of such large scales depends on how strong the inhomogeneities at
these scales are, i.e., how the power spectrum behaves as k → 0.
Figure 7: The 3D window functions W (r), top-hat (green), Gaussian (red), and k (blue), for R = 1.
[Figure 8 plot. Horizontal axis: $\gamma = n + d$.]
Figure 8: The ratio of $\sigma^2(R)$ to $\mathcal{P}(R^{-1})$ in the case of a power-law spectrum $\mathcal{P}(k) \propto k^{n+d}$ for the three
different window functions: Gaussian (red), k (blue), 1D top-hat (green dashed), 2D top-hat (green), and
3D top-hat (green with dots). They all diverge in the limit $n \to -d$ ($\gamma \to 0$) due to the contributions of
ever larger scales ($\ln k \to -\infty$). The divergence at $n \to 1$ for the top-hat window functions is a trickier
thing. It has to do with their Fourier transform not dying off at high $k$ fast enough.
Exercise: We defined σ2(R) as an expectation value over the ensemble. Define σ̂2(R) as the volume
average over a realization and show that
\[
\hat\sigma^2(R) \equiv \frac{1}{V}\int_V d^dx\, \hat\delta(\mathbf{x};R)^2 = \frac{V}{(2\pi)^d}\int d^dk\, |\hat\delta_{\mathbf{k}}|^2\,W(k)^2 . \tag{1.129}
\]
Exercise: For a power-law spectrum and a Gaussian window function, show that
\[
\sigma_G^2(R) = \frac12\,\Gamma\!\left(\frac{n+d}{2}\right)\mathcal{P}(R^{-1}) . \tag{1.130}
\]
Exercise: For a power-law spectrum and k-space top-hat window function, show that
\[
\sigma_k^2(R) = \frac{1}{n+d}\,\mathcal{P}(R^{-1}) . \tag{1.131}
\]
Example: I wanted to do the same also for the top-hat window function, especially in view of (1.127).
The cases seem different for different $d$, so try first the 3D case:
\[
\begin{aligned}
\sigma_T^2(R) &= \int_0^\infty \mathcal{P}(k)\,W_T(k)^2\,\frac{dk}{k}
= A^2 k_p^{-(n+3)}\int_0^\infty k^{n+3}\left(\frac{3}{kR}\right)^2 j_1^2(kR)\,\frac{dk}{k} \\
&= 9A^2 (k_pR)^{-(n+3)}\int_0^\infty x^n j_1^2(x)\,dx = 9 I_n\,\mathcal{P}(R^{-1}) ,
\end{aligned} \tag{1.132}
\]
where
\[
I_n = \int_0^\infty x^n j_1^2(x)\,dx . \tag{1.133}
\]
I couldn’t integrate $I_n$ and didn’t find it in integral tables. Wolfram Alpha said “computation time
exceeded” for both $I_n$ and $I_{-n}$, but for $n = -1.2$ it gave the remarkable result
\[
I_{-1.2} = \frac{125\sqrt{5+\sqrt5}\;\Gamma(4/5)}{1386\times2^{3/10}} \approx 0.229418 . \tag{1.134}
\]
This would give $\sigma_T^2(R) = 2.0648\,\mathcal{P}(R^{-1})$ for $n = -1.2$ ($\gamma = 1.8$), which appears surprisingly large, since
the other two window functions give $\sigma_G^2(R) = 0.5343\,\mathcal{P}(R^{-1})$ and $\sigma_k^2(R) = 0.5556\,\mathcal{P}(R^{-1})$. Actually, I
was equally surprised that $\sigma_G^2$ and $\sigma_k^2$ came so close to each other although the volumes of the two window
functions are quite different. I would have been content if $\sigma_G^2$ had been intermediate between $\sigma_k^2$ and $\sigma_T^2$.
I think the explanation lies in the Fourier transform $W_T(k)$ not dying off fast enough for high $k$, so for
large $n + d$, where there is lots of power at small scales, scales $\ll R$ keep contributing to $\sigma_T^2(R)$. When
this is applied to galaxy number density, there will be another cut-off due to the finite distances between
galaxies, so the effect of this high-k tail may not be fully realized. . .
The solution for doing the 3D case with Wolfram Alpha turned out to be to restrict the range of n
to n < 0 and n < −1. This gives
\[
I_n = \int_0^\infty x^n j_1^2(x)\,dx = 2^{-n}\,\sin\frac{n\pi}{2}\;\frac{(n+1)\,\Gamma(n-1)}{n-3}
\qquad \text{for } -3 < n < 0 . \tag{1.135}
\]
(I just assume that this result holds also for 0 < n < 1; it diverges at n→ 1, see Fig. 8.)
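The numbers quoted in this example can be reproduced by evaluating the closed forms (1.130), (1.131), (1.134) and (1.135) directly. This is only a sketch with made-up function names; it relies on `math.gamma` accepting negative non-integer arguments, and it does not attempt the integral (1.133) itself.

```python
import math

def ratio_gaussian(n, d=3):
    """sigma_G^2 / calP(1/R) from Eq. (1.130)."""
    return 0.5 * math.gamma(0.5 * (n + d))

def ratio_k_tophat(n, d=3):
    """sigma_k^2 / calP(1/R) from Eq. (1.131)."""
    return 1.0 / (n + d)

def I_n_3d(n):
    """Eq. (1.135), quoted for -3 < n < 0.

    The expression cannot be evaluated at n = -1 (Gamma(-2) is a pole)
    or at n = -2; those values would need a limit.
    """
    return (2.0 ** (-n) * math.sin(0.5 * n * math.pi)
            * (n + 1.0) * math.gamma(n - 1.0) / (n - 3.0))

def ratio_tophat_3d(n):
    """sigma_T^2 / calP(1/R) = 9 I_n, Eq. (1.132)."""
    return 9.0 * I_n_3d(n)

n = -1.2
I_closed = 125.0 * math.sqrt(5.0 + math.sqrt(5.0)) * math.gamma(0.8) / (1386.0 * 2.0 ** 0.3)
print(I_n_3d(n), I_closed)
print(ratio_tophat_3d(n), ratio_gaussian(n), ratio_k_tophat(n))
```

That (1.135) at $n = -1.2$ agrees with (1.134) is a useful consistency check between the two Wolfram Alpha results.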
Then the 1D case:
\[
\sigma_T^2(R) = A^2 (k_pR)^{-(n+1)}\int_0^\infty x^{n-2}\sin^2 x\,dx = I_n\,\mathcal{P}(R^{-1}) , \tag{1.136}
\]
where
\[
I_n = \int_0^\infty x^{n-2}\sin^2 x\,dx , \tag{1.137}
\]
which diverges at x → 0 for n ≤ −1 and at x → ∞ for n ≥ +1. Gradshteyn&Ryzhik[9] 3.821.9 gives
I0 = π/2. Wolfram Alpha gives
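\[
I_n = 2^{-n}\,\sin\frac{n\pi}{2}\;\frac{\Gamma(n)}{1-n} \quad (0 < n < 1)
\qquad\text{and}\qquad
I_n = 2^{-n}\,\sin\!\left(-\frac{n\pi}{2}\right)\frac{\Gamma(n+1)}{n(n-1)} \quad (-1 < n < 0) . \tag{1.138}
\]
For $n + 1 = 1.8$ this gives $\sigma_T^2(R) = 3.1797\,\mathcal{P}(R^{-1})$, which is even more than I got for the 3D case.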
Then the 2D case:
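\[
\sigma_T^2(R) = 4 I_n\,\mathcal{P}(R^{-1}) , \qquad\text{where}\quad I_n = \int_0^\infty x^{n-1} J_1(x)^2\,dx . \tag{1.139}
\]
Wolfram Alpha computation time was exceeded but Gradshteyn & Ryzhik [9] 6.574.2 gives
\[
I_n = \frac{\Gamma(1-n)\,\Gamma(1+n/2)}{2^{1-n}\,\Gamma(1-n/2)^2\,\Gamma(2-n/2)} \qquad \text{for } n < 1 . \tag{1.140}
\]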
These results for the different window functions are compared in Fig. 8.
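The 1D and 2D top-hat ratios can be evaluated from (1.138) and (1.140) in the same way. In this sketch (function names are illustrative) the first branch of (1.138) is used for both sign ranges, since $\Gamma(n+1) = n\,\Gamma(n)$ makes the two branches algebraically identical; the point $n = 0$ itself has to be taken as a limit.

```python
import math

def ratio_tophat_1d(n):
    """sigma_T^2 / calP(1/R) = I_n from Eq. (1.138), for -1 < n < 1, n != 0."""
    return 2.0 ** (-n) * math.sin(0.5 * n * math.pi) * math.gamma(n) / (1.0 - n)

def ratio_tophat_2d(n):
    """sigma_T^2 / calP(1/R) = 4 I_n with I_n from Eq. (1.140), for n < 1.

    The integrand of (1.139) behaves as x^(n+1)/4 near x = 0, so n > -2 is
    also needed for convergence there.
    """
    I_n = (math.gamma(1.0 - n) * math.gamma(1.0 + 0.5 * n)
           / (2.0 ** (1.0 - n) * math.gamma(1.0 - 0.5 * n) ** 2
              * math.gamma(2.0 - 0.5 * n)))
    return 4.0 * I_n

print(ratio_tophat_1d(0.8))     # 1D with n + 1 = 1.8
print(ratio_tophat_1d(1.0e-6))  # close to I_0 = pi/2
print(ratio_tophat_2d(-0.2))    # 2D with n + 2 = 1.8
```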
2 DISTRIBUTION OF GALAXIES 25
Figure 9: Evolution of the comoving total galaxy number density $\phi_T$ as a function of redshift and time
[18]. The symbols with error bars are results from different surveys. The solid line on the right panel is
a fit to the data points, and the dashed line is a fit of a galaxy merger model to them. The plot assumes
$h = 0.7$, so to convert into units of $h^3\,\mathrm{Mpc}^{-3}$, multiply the vertical scale numbers (which are for $\phi_T$, not
$\log\phi_T$) by $h^{-3} = 2.9$.
2 Distribution of galaxies
Instead of a continuous density ρ(x), we now consider a distribution of discrete objects. Their
number density ρ(x) is then only defined with a finite resolution.16 To make the discussion sound
less abstract we call these objects galaxies, although they could also be other cosmological objects
(e.g., clusters of galaxies), and another application is in numerical methods where a continuous
density field is represented by a distribution of point masses.
2.1 The average number density of galaxies
It is often said that the observable universe contains about 200 billion (2 × 1011) galaxies.
If the observable universe is taken to mean everything until the last scattering surface (the
origin of the cosmic microwave background) at z = 1090, which lies at comoving distance
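$r \approx 3.1\,H_0^{-1} = 9300\,h^{-1}\,\mathrm{Mpc}$, then its comoving volume is
\[
V = \frac{4\pi}{3}\,r^3 \approx 3400\,h^{-3}\,\mathrm{Gpc}^3 = 3.4\times10^{12}\,h^{-3}\,\mathrm{Mpc}^3 . \tag{2.1}
\]
However, in this context the observable universe is taken to mean up to $z = 8$, which lies at
$r \approx 2.05\,H_0^{-1} = 6100\,h^{-1}\,\mathrm{Mpc}$, giving
\[
V_{\rm obs} \approx 970\,h^{-3}\,\mathrm{Gpc}^3 = 9.7\times10^{11}\,h^{-3}\,\mathrm{Mpc}^3 . \tag{2.2}
\]
With $N_g = 2\times10^{11}$ galaxies this gives comoving mean galaxy number density
\[
\bar\rho_g = \frac{N_g}{V_{\rm obs}} \approx \frac{0.21}{h^{-3}\,\mathrm{Mpc}^3} . \tag{2.3}
\]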
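The arithmetic behind (2.1)–(2.3) is sketched below. The only input not taken from the text is the Hubble distance $c/H_0 \approx 2997.9\,h^{-1}$ Mpc, which is assumed here in order to convert the distances given in units of $H_0^{-1}$; variable names are illustrative.

```python
import math

HUBBLE_DISTANCE = 2997.9   # c/H0 in h^-1 Mpc (assumed standard value)

def comoving_volume(r):
    """Volume of a sphere of comoving radius r, in (units of r)^3."""
    return 4.0 * math.pi / 3.0 * r ** 3

r_lss = 3.1 * HUBBLE_DISTANCE    # last scattering surface, z = 1090
r_obs = 2.05 * HUBBLE_DISTANCE   # z = 8

V_lss = comoving_volume(r_lss)   # h^-3 Mpc^3
V_obs = comoving_volume(r_obs)   # h^-3 Mpc^3

N_g = 2.0e11
mean_density = N_g / V_obs       # galaxies per h^-3 Mpc^3

print(V_lss / 1e9, V_obs / 1e9, mean_density)   # volumes in h^-3 Gpc^3
```

The results agree with the rounded values quoted in (2.1)–(2.3) to within the rounding of the input distances.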
16MBW[2], Sec. 6.1.2 takes a heavier approach here. They consider a two-step random process, where the first
random process generates a continuous density field ρ(x) and a second random process generates a point mass
representation of it. The advantage of this is that there is no worry about dV being smaller than the resolution
of ρ(x). They also invoke ergodicity, but this does not seem necessary, if one refers to 〈ρ〉 instead of ρ̄.
Our past light cone. Recently I read 17 that the number of galaxies in the observable universe is
10 times larger, since in the early universe galaxies were much smaller, and later they merged to form
larger galaxies so the comoving galaxy number density went down. Thus the above density applies to the
late universe, but in the early universe it was more than 10 times larger. The page links to a draft of an
article by Conselice et al.[18]. Fig. 9 is from that article.
In (2.2) and (2.3), I restricted the observable universe to z ≤ 8, and ignored evolution effects (higher
redshifts correspond to earlier times) to get a homogeneous galaxy mean number density to correspond
to some recent t = const , but more appropriate would be to define the observable universe to correspond
to our past light cone, all the way to z = 1090. The galaxies it contains are then those, whose world lines
intersect this light cone. The comoving galaxy number density then increases towards higher z at first,
because of the above evolution effect, but then begins to fall (probably near z ∼ 8, but we don’t have
good data at such high redshifts) when we get to times when most galaxies had not yet formed.
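The baryon density is $\bar\rho_b = \Omega_b\,\rho_{\rm crit0} = \omega_b h^{-2}\rho_{\rm crit0}$, where $\omega_b = 0.022$ [19], giving
\[
\bar\rho_b = 6.1\times10^{9}\,m_\odot/\mathrm{Mpc}^3 . \tag{2.4}
\]
With $h = 0.7$ this gives
\[
\frac{\bar\rho_b}{\bar\rho_g} = 8.6\times10^{10}\,m_\odot = 1.7\times10^{41}\ \mathrm{kg}\ \text{baryonic matter per galaxy} \tag{2.5}
\]
in the late universe. The total matter density parameter is $\omega_m = 0.14$ [19], so this gives
$5.5\times10^{11}\,m_\odot$ total matter (baryonic + cold dark matter) per galaxy.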
From [2], p.62 (Table 2.6) there are about 10 times as many dwarf galaxies in the local part
of the universe as there are spiral galaxies; and the number of other types of galaxies is about
half of that of spiral galaxies. The dwarf galaxies (defined as those with absolute magnitude
$M_B \gtrsim -18$) thus make up most of the number of galaxies although they contain a relatively
small fraction of all stars ([2], p. 57). The further out we look the larger the absolute luminosity
(the smaller the absolute magnitude) of the galaxy has to be for us to be able to observe it.
Thus the number density of observable galaxies is smaller than (2.3) and falls with distance.
For the 2D galaxy number density on the sky I can give a more definite number: The Euclid
wide survey will cover (Ω =) 15 000 square degrees (= 36.36% of the sky) and is expected to
observe 1.5 billion ($N_g = 1.5\times10^{9}$) galaxies (with apparent magnitude $m < 24.5$, observed
sufficiently well for their observed shapes to be used for weak lensing statistics) [20]. This corresponds
to a 2D density of
\[
\bar\rho_{g,\rm Euclid} = \frac{N_g}{\Omega} \approx \frac{30}{\mathrm{arcmin}^2} . \tag{2.6}
\]
Most of these galaxies are at $z \lesssim 2$, corresponding to $r \lesssim 2H_0^{-1}$ or $V \approx 190\,h^{-3}\,\mathrm{Gpc}^3$, 1/5 of the
volume to $z = 8$. Defining the “Euclid volume” as 15 000 square degrees of sky up to $z = 2$, we
have
\[
V_{\rm Euclid} \approx 70\,h^{-3}\,\mathrm{Gpc}^3 , \tag{2.7}
\]
which should contain about 14 billion galaxies, so Euclid will miss most of them.
Let’s check if these numbers from the three sources[18, 2, 20] appear consistent with each other:
Comparing the absolute magnitude of the brightest dwarf galaxies, M = −18, to the Euclid wide survey
depth, m = 24.5, we conclude that even the brightest dwarf galaxies will be missed beyond distance
modulus $m - M = 42.5 = -5 + 5\lg d_L[\mathrm{pc}] \Rightarrow d_L = 10^{9.5}\,\mathrm{pc} \approx 3.16$ Gpc. With $h = 0.7$ this is
$d_L = 2.2\,h^{-1}\,\mathrm{Gpc} = 0.74\,H_0^{-1}$ corresponding to $z \approx 0.55$. (In a flat universe the luminosity distance and
comoving distance are related by $d_L = (1 + z)r$, so this corresponds to $r \approx 0.48\,H_0^{-1}$.) Thus the Euclid
¹⁷ https://www.nasa.gov/feature/goddard/2016/hubble-reveals-observable-universe-contains-10-times-more-galaxies-than-previously-thought
wide survey should see some dwarf galaxies at z < 0.55 and none at z > 0.55, beyond which it will miss
also some of the larger galaxies. This seems consistent with Euclid observing 1.5 billion out of a total of
14 billion galaxies in VEuclid.
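The distance-modulus step of this consistency check can be written out as follows. This is a sketch with illustrative names: the redshift $z \approx 0.55$ is taken from the text rather than computed from a cosmological model, and the Hubble distance $c/H_0 \approx 2.998\,h^{-1}$ Gpc is an assumed standard value.

```python
def luminosity_distance_pc(m, M):
    """Invert the distance modulus m - M = -5 + 5 lg(d_L / pc)."""
    return 10.0 ** ((m - M + 5.0) / 5.0)

h = 0.7
hubble_distance_gpc = 2.998 / h           # c/H0 in Gpc (assumed value of c/H0)

d_L_gpc = luminosity_distance_pc(24.5, -18.0) / 1.0e9
d_L_hubble = d_L_gpc / hubble_distance_gpc    # in units of 1/H0
d_L_h_gpc = d_L_gpc * h                       # in h^-1 Gpc

z = 0.55                                      # quoted in the text, not derived here
r_hubble = d_L_hubble / (1.0 + z)             # comoving distance, flat universe

print(d_L_gpc, d_L_h_gpc, d_L_hubble, r_hubble)
```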
2.2 Galaxy 2-point correlation function
We treat individual galaxies as mathematical points, so that each galaxy has a coordinate value
x. We define the galaxy 2-point correlation function ξ(r) as the excess probability of finding a
galaxy at separation r from another galaxy:
dP ≡ 〈ρ〉 [1 + ξ(r)] dV (2.8)
where 〈ρ〉 is the mean (ensemble average) galaxy number density, dV is a volume element that
is a separation r away from a chosen reference galaxy, and dP is the probability that there is a
galaxy within dV .
The probability of finding a galaxy in volume dV1 at a random location x is
dP1 = 〈ρ(x)〉dV1 = 〈ρ〉〈1 + δ(x)〉dV1 = 〈ρ〉dV1 . (2.9)
The probability of finding a galaxy pair at x and x + r is
dP12 = 〈ρ(x)ρ(x + r)〉dV1dV2 = 〈ρ〉2〈[1 + δ(x)][1 + δ(x + r)]〉dV1dV2
= 〈ρ〉2 [1 + 〈δ(x)〉+ 〈δ(x + r)〉+ 〈δ(x)δ(x + r)〉] dV1dV2
= 〈ρ〉2 [1 + 〈δ(x)δ(x + r)〉] dV1dV2 , (2.10)
since 〈δ(x)〉 = 〈δ(x + r)〉 = 0. Dividing dP12 with dP1 we get the probability dP2 of finding the
second galaxy once we have found the first one
dP2 = 〈ρ〉 [1 + 〈δ(x)δ(x + r)〉] dV2 . (2.11)
Comparing (2.11) to (2.8) we see that our new definition of ξ agrees with the old one
ξ(r) = 〈δ(x)δ(x + r)〉 . (2.12)
Thus, for any galaxy, $\langle\rho\rangle[1 + \xi(r)]\,dV$ is the expectation number of galaxies in a volume
element $dV$ at separation $r$ and the mean number of neighbors within a spherical shell is
\[
dN(r) = \langle\rho\rangle\,[1 + \xi(r)]\,C_d\,r^{d-1}\,dr \tag{2.13}
\]
and the mean number of neighbors within distance $R$ is
\[
N(R) = \langle\rho\rangle V(R) + \langle\rho\rangle\int_0^R \xi(r)\,C_d\,r^{d-1}\,dr = \langle\rho\rangle V(R)\left[1 + \bar\xi(R)\right] . \tag{2.14}
\]
Thus 1 + ξ(r) can be interpreted as the mean (expected) density profile around each galaxy.
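As an illustration of (2.14), the sketch below evaluates the mean number of neighbors in 3D for a power-law $\xi(r) = (r/r_0)^{-\gamma}$, for which the volume average is $\bar\xi(R) = \frac{3}{3-\gamma}(R/r_0)^{-\gamma}$ when $\gamma < 3$. The inputs are borrowed from elsewhere in these notes purely for illustration (the 2dFGRS fit of Fig. 6 and the mean density of (2.3)); they refer to different galaxy samples, so the resulting number is not a prediction.

```python
import math

def xi_bar_power_law(R, r0, gamma):
    """Volume-averaged power-law correlation function in 3D (gamma < 3)."""
    return 3.0 / (3.0 - gamma) * (R / r0) ** (-gamma)

def xi_bar_numeric(R, r0, gamma, steps=20000):
    """Same quantity from (3/R^3) * integral of xi(r) r^2 dr, midpoint rule."""
    h = R / steps
    integral = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        integral += (r / r0) ** (-gamma) * r ** 2
    return 3.0 * integral * h / R ** 3

def mean_neighbours(R, density, r0, gamma):
    """N(R) = <rho> V(R) [1 + xi_bar(R)], Eq. (2.14) with d = 3."""
    volume = 4.0 * math.pi / 3.0 * R ** 3
    return density * volume * (1.0 + xi_bar_power_law(R, r0, gamma))

r0, gamma = 5.05, 1.67   # h^-1 Mpc and slope, from the caption of Fig. 6
density = 0.21           # h^3 Mpc^-3, from (2.3); illustrative only
R = 10.0                 # h^-1 Mpc

n_unclustered = density * 4.0 * math.pi / 3.0 * R ** 3
n_clustered = mean_neighbours(R, density, r0, gamma)
print(n_unclustered, n_clustered)
```

The midpoint rule is used in the numerical check because it never evaluates the integrand at $r = 0$, where the power law diverges.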
2.3 Poisson distribution
A Poisson distribution is an uncorrelated distribution of galaxies, which we get when we assign
each galaxy i a random location xi (with uniform probability density within V ) independently.
(This process is called a Poisson process.) Then
dP12 = 〈ρ〉2dV1dV2 ⇒ ξ(r) = 0 . (2.15)
Figure 10: Poisson distribution of N = 250 galaxies. The 2D volume is divided into M = 25 cells, so
that on average a cell should contain 10 galaxies. There are two cells with just 5 galaxies.
Divide the volume $V$ into $M$ subvolumes (cells) $\Delta V$. Assign now $N$ galaxies into $V$ with a
Poisson process. See Fig. 10. Each galaxy lands in a particular cell with probability $p = 1/M$,
and somewhere else with probability $1 - p = 1 - 1/M$. The probability of the first $n$ galaxies
landing in this cell and the remaining $N - n$ elsewhere is thus $(1/M)^n(1 - 1/M)^{N-n}$. Since there
are
\[
\binom{N}{n} \equiv \frac{N!}{n!\,(N-n)!} \tag{2.16}
\]
ways of choosing n galaxies out of N , the probability of getting exactly n galaxies in a particular
cell is
$$ P(n) = \binom{N}{n}\left(\frac{1}{M}\right)^n\left(1-\frac{1}{M}\right)^{N-n} . \tag{2.17} $$
This is the nth term of the binomial expansion
$$ (a+b)^N = \sum_{n=0}^{N}\binom{N}{n}a^n b^{N-n} \tag{2.18} $$
so we can easily check that the total probability is
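$$ \sum_{n=0}^{N}P(n) = \sum_{n=0}^{N}\binom{N}{n}\left(\frac{1}{M}\right)^n\left(1-\frac{1}{M}\right)^{N-n} = \left[\frac{1}{M}+\left(1-\frac{1}{M}\right)\right]^N = 1^N = 1 . \tag{2.19} $$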
This probability distribution (2.17) of integers is called the binomial distribution B(N, p),
where p = 1/M . The Poisson limit theorem states that: if N → ∞ and p → 0 (M → ∞) so
that Np = N/M → λ, then
$$ P(n) = \binom{N}{n}p^n(1-p)^{N-n} \;\to\; \frac{\lambda^n e^{-\lambda}}{n!} = \left(\frac{N}{M}\right)^n\frac{e^{-N/M}}{n!} . \tag{2.20} $$
This probability distribution of integers is called the Poisson distribution.
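The approach to the Poisson limit can be checked numerically. The following is a minimal sketch (the function names are illustrative, not from the text) that compares the binomial probabilities (2.17) with the Poisson probabilities (2.20) at fixed λ = N/M = 10 while M grows:

```python
from math import comb, exp, factorial


def binomial_pmf(n, N, M):
    """Eq. (2.17): probability of exactly n of N galaxies in one of M cells."""
    p = 1.0 / M
    return comb(N, n) * p**n * (1.0 - p) ** (N - n)


def poisson_pmf(n, lam):
    """Right-hand side of Eq. (2.20) with lam = N/M."""
    return lam**n * exp(-lam) / factorial(n)


lam = 10.0  # mean number of galaxies per cell, as in Fig. 10
for M in (25, 250, 2500):
    N = int(lam * M)
    # Largest pointwise difference over the range where P(n) is non-negligible
    max_diff = max(abs(binomial_pmf(n, N, M) - poisson_pmf(n, lam)) for n in range(41))
    print(f"M = {M:5d}, N = {N:6d}: max |binomial - Poisson| = {max_diff:.2e}")
```

The first line of output corresponds to the setup of Fig. 10 (N = 250, M = 25), where the binomial distribution is still slightly narrower than its Poisson limit.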
For the Poisson distribution we have the expectation values
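$$ \begin{aligned}
\langle n\rangle &= \sum_{n=0}^{\infty} nP(n) = e^{-\lambda}\sum_{n=1}^{\infty}\frac{\lambda^n}{(n-1)!} = \lambda e^{-\lambda}\sum_{n=0}^{\infty}\frac{\lambda^n}{n!} = \lambda = N/M \\
\langle n^2\rangle &= \sum_{n=0}^{\infty} n^2P(n) = \ldots = \lambda+\lambda^2 \\
\langle(\Delta n)^2\rangle &\equiv \langle(n-\langle n\rangle)^2\rangle = \langle n^2\rangle-\langle n\rangle^2 = \lambda .
\end{aligned} \tag{2.21} $$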
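One way to fill in the step left out in the second moment of (2.21) is to go through the factorial moment ⟨n(n−1)⟩, which cancels against the n! in the same way as in the calculation of ⟨n⟩. This is a sketch of one possible route, not necessarily the one intended by the text:

```latex
\begin{aligned}
\langle n(n-1)\rangle &= \sum_{n=0}^{\infty} n(n-1)\,\frac{\lambda^n e^{-\lambda}}{n!}
  = \lambda^2 e^{-\lambda}\sum_{n=2}^{\infty}\frac{\lambda^{n-2}}{(n-2)!} = \lambda^2 , \\
\langle n^2\rangle &= \langle n(n-1)\rangle + \langle n\rangle = \lambda^2 + \lambda .
\end{aligned}
```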
The galaxy density in a cell is ρ(x) = n/∆V and the density perturbation is δ(x) = (n−〈n〉)/〈n〉,
so that
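$$ \langle\delta^2\rangle = \frac{\langle(n-\langle n\rangle)^2\rangle}{\langle n\rangle^2} = \frac{1}{\lambda} = \frac{1}{\langle n\rangle} . \tag{2.22} $$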
(We will later, in Sec. 8, refer to this as Poisson variance: the relative variance is 1/expected
number of points.) Thus, although for a Poisson distribution $\xi(r) = 0$ for $r \neq 0$, we have
$$ \xi(0) = \langle\delta^2\rangle = \frac{M}{N} = \frac{1}{\langle n\rangle} \tag{2.23} $$
(in the limit of very large $N$ and $M$). This density variance depends on the resolution, since
increasing the number of cells $M$ (decreasing $\Delta V$) makes it larger.
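The resolution dependence of ⟨δ²⟩ is easy to see in a small simulation. The sketch below assumes a 2D unit square and a regular grid of cells (as in Fig. 10, but with more galaxies); `cell_counts` is an illustrative helper, not something defined in the text:

```python
import numpy as np


def cell_counts(N, cells_per_side, rng):
    """Throw N points uniformly into the unit square and count them on a regular grid."""
    xy = rng.random((N, 2))
    edges = np.linspace(0.0, 1.0, cells_per_side + 1)
    counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=[edges, edges])
    return counts.ravel()


rng = np.random.default_rng(1)
N = 100_000
for cells_per_side in (50, 100, 200):
    n = cell_counts(N, cells_per_side, rng)
    mean_n = n.mean()                 # equals N/M exactly
    delta = (n - mean_n) / mean_n     # density perturbation per cell
    print(f"M = {cells_per_side**2:6d}  <n> = {mean_n:6.2f}  var(n) = {n.var():6.2f}  "
          f"<delta^2> = {np.mean(delta**2):.4f}  1/<n> = {1.0 / mean_n:.4f}")
```

With N fixed, the counts in a cell are strictly binomial, so the variance is λ(1 − 1/M) rather than λ; for the values of M used here the difference is far below the sampling noise.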
We could continue with this approach to specify that the volume V is cubic and the division into M
cells forms a rectangular grid, replacing M with $M^d$ and doing a discrete Fourier transform to find the
power spectrum of the Poisson distribution, finding that
$$ P(k) = \text{const} , \tag{2.24} $$
but since this seems not to be the usual approach in the literature, I skip this, avoiding the discussion of
discrete Fourier transforms. However, we make some comments on the result: The Poisson distribution
has the power spectrum of white noise, n = 0, the amplitude depending on the resolution. We noted in
Sec. 1.8 that as n → 0, γ → d. Now we have that $\xi(r) = 0$ for $r \neq 0$, but $\xi(0) \propto (\Delta r)^{-d}$ (for fixed V and
N), where $\Delta r$ is the side of the cell, i.e., $\Delta V = (\Delta r)^d$.
Exercise: Cox process. A Cox process refers to a combination of two Poisson processes. Consider
the following examples:
1. Infinitely long lines are placed randomly (the first Poisson process) into an infinite volume. Galaxies
are then assigned randomly (the second Poisson process) on these lines, so that the mean (expectation
value of) linear number density on these lines is λ. What is ξ(r), given in terms of λ and
the resulting mean (expectation value of) 3D galaxy number density 〈ρ〉?
2. Like the previous case, but now the line segments have finite length L (all have the same length).
What is ξ(r), in terms of λ, L, and the number density ns of line segments?
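Answer:
1.
$$ \xi(r) = \frac{\lambda}{2\pi\langle\rho\rangle}\,r^{-2} \tag{2.25} $$
2.
$$ \xi(r) = \frac{1}{2\pi n_s L r^2}\left(1-\frac{r}{L}\right) \quad\text{for } r \leq L ; \qquad \xi(r) = 0 \quad\text{for } r > L \tag{2.26} $$
Note that this does not depend on λ ($\langle\rho\rangle = n_s L\lambda$, so it cancels in the ratio $\lambda/\langle\rho\rangle$).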
Note: The Cox process has been used to generate “mock” catalogs for testing computer codes to
estimate ξ(r). It is actually not easy to generate test catalogs which have exactly some known, but
non-zero, correlation function ξ(r) that the estimate could be compared to. Cox is one way to do this.
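As an illustration of how such a mock catalog might be generated, here is a minimal sketch of the finite-segment Cox process of case 2. It assumes a periodic cubic box of side `box` with L much smaller than the box, and all names and parameter values are hypothetical choices for the example, not those of any particular code:

```python
import numpy as np


def cox_segment_mock(n_s, L, lam, box, rng):
    """Segment Cox process in a periodic cube of side `box` (assumes L << box).

    n_s: number density of segments, L: segment length,
    lam: mean number of galaxies per unit length along a segment.
    """
    n_seg = rng.poisson(n_s * box**3)                # first Poisson process: segments
    start = rng.random((n_seg, 3)) * box             # one end point of each segment
    direction = rng.normal(size=(n_seg, 3))          # isotropic random directions
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    n_gal = rng.poisson(lam * L, size=n_seg)         # second Poisson process: galaxies per segment
    seg_index = np.repeat(np.arange(n_seg), n_gal)   # segment on which each galaxy sits
    t = rng.random(seg_index.size) * L               # position along the segment
    pos = start[seg_index] + t[:, None] * direction[seg_index]
    return pos % box                                 # periodic wrap


rng = np.random.default_rng(3)
n_s, L, lam, box = 1e-3, 5.0, 2.0, 50.0              # illustrative values only
galaxies = cox_segment_mock(n_s, L, lam, box, rng)
print("galaxies generated:", len(galaxies))
print("expected <rho> V  :", n_s * L * lam * box**3)
```

The mean density of the catalog is ⟨ρ⟩ = n_s L λ, as noted after (2.26); the measured ξ(r) of such a mock can then be compared with (2.26).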
Exercise: Cox process with binning. Continuation of previous exercise: When measuring the
correlation function ξ(r) from data, one has to count it for bins of finite width $\Delta r$. So instead of asking
what is the excess probability $\xi(r)\langle\rho\rangle dV$ for another galaxy at separation $r$ (so that $dV = 4\pi r^2 dr$), for
comparing data with theory we ask what is the excess probability $\xi(r_1,r_2)\langle\rho\rangle\Delta V$ for a separation between
$r_1$ and $r_2$ (i.e., here $\Delta r = r_2 - r_1$). Find $\xi(r_1,r_2)$ for $r_1 < r_2 \leq L$. ($\xi = 0$ for $r_1 \geq L$, and the answer for
$r_1 < L < r_2$ is more complicated but rarely needed.)
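Answer:
1.
$$ \xi(r_1,r_2) = \frac{\lambda}{2\pi\langle\rho\rangle}\,\frac{3(r_2-r_1)}{r_2^3-r_1^3} = \frac{\lambda}{2\pi\langle\rho\rangle}\,\frac{3}{r_1^2+r_1r_2+r_2^2} \tag{2.27} $$
2.
$$ \xi(r_1,r_2) = \frac{1}{2\pi n_s L}\,\frac{3(r_2-r_1)}{r_2^3-r_1^3}\left(1-\frac{r_1+r_2}{2L}\right) = \frac{1}{2\pi n_s L}\,\frac{3}{r_1^2+r_1r_2+r_2^2}\left(1-\frac{r_1+r_2}{2L}\right) \tag{2.28} $$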
Table 2: The Cox process correlation function for $L = 500$, $n_s = 8\times10^{-7}$, and $\Delta r = 1$ binning. On the
left: a log-log plot of $\xi(r)$. On the right: lin-lin plot of $r^2\xi(r)$.
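The binned result (2.28) is the shell-volume average of (2.26), which can be verified by direct quadrature. A minimal sketch, using the parameters quoted in the caption above (the function names are illustrative):

```python
import numpy as np


def xi_segment(r, n_s, L):
    """Eq. (2.26) for r <= L."""
    return (1.0 - r / L) / (2.0 * np.pi * n_s * L * r**2)


def xi_binned_formula(r1, r2, n_s, L):
    """Eq. (2.28), valid for r1 < r2 <= L."""
    return (3.0 / (2.0 * np.pi * n_s * L * (r1**2 + r1 * r2 + r2**2))
            * (1.0 - (r1 + r2) / (2.0 * L)))


def xi_binned_numeric(r1, r2, n_s, L, n_steps=2001):
    """Average of xi(r) over the shell r1 < r < r2, weighted by 4 pi r^2 dr (trapezoidal rule)."""
    r = np.linspace(r1, r2, n_steps)
    f = xi_segment(r, n_s, L) * 4.0 * np.pi * r**2
    integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r))
    shell_volume = 4.0 * np.pi / 3.0 * (r2**3 - r1**3)
    return integral / shell_volume


L, n_s = 500.0, 8e-7
for r1 in (1.0, 10.0, 100.0):
    r2 = r1 + 1.0
    print(r1, r2, xi_binned_formula(r1, r2, n_s, L), xi_binned_numeric(r1, r2, n_s, L))
```

Note that the binned value differs noticeably from ξ evaluated at the bin center only for the innermost bins, where ξ(r) varies strongly across the bin.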
2.4 Counts in cells
One of the first methods to measure the clustering properties of galaxies was to divide the survey
volume V into cells (subvolumes) of equal size $\Delta V$ and shape, and to count the number $n$ of galaxies
in each cell. Defining
$$ \Delta n \equiv n-\langle n\rangle , \tag{2.29} $$
the variance
$$ \mu_2 \equiv \langle(\Delta n)^2\rangle \tag{2.30} $$
and skewness
$$ \mu_3 \equiv \langle(\Delta n)^3\rangle , \tag{2.31} $$
we have that for a completely random (i.e., Poisson) distribution of galaxies
$$ \mu_2 = \mu_3 = \langle n\rangle . \tag{2.32} $$
A clustered distribution will have a larger variance. We define
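$$ y \equiv \frac{\mu_2-\langle n\rangle}{\langle n\rangle^2} \tag{2.33} $$
as a measure of clustering. It can be shown that
$$ \langle y\rangle = \frac{\int_{\Delta V}\int_{\Delta V}\xi(\mathbf{x}_1-\mathbf{x}_2)\,d^dx_1\,d^dx_2}{\Delta V^2} . \tag{2.34} $$
Likewise we define
$$ z \equiv \frac{\mu_3-3\mu_2+2\langle n\rangle}{\langle n\rangle^3} \tag{2.35} $$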
as a measure of excess skewness of the clustering. It measures nonlinear effects in structure
growth and non-Gaussianity of primordial perturbations.
For an actual survey we define corresponding quantities µ̂2, µ̂3, ŷ, and ẑ, by replacing
expectation values with survey averages. Note that these are biased estimators, $\langle\hat y\rangle \neq y$, $\langle\hat z\rangle \neq z$,
since taking expectation values does not commute with raising to second or third power and
division. One can study scale dependence of structure by using larger or smaller cells. This
method is better than correlation function estimation for detecting structure at large scales,
where the correlation function is small.
An important improvement of this method is to, instead of using disjoint cells, assign a much
larger number of cells of the same size ∆V to random locations within the survey, allowing them
to overlap. This oversampling does not change the expected variance, which is determined by
(2.34), but the measured variance will be closer to this expectation value. [14, 15]
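The estimators ŷ and ẑ are straightforward to compute from a list of cell counts. The sketch below is one possible implementation; the "clustered" test case, in which each cell gets a gamma-distributed Poisson intensity, is purely a hypothetical toy model chosen because its relative intensity variance (0.25) is known:

```python
import numpy as np


def counts_in_cells_stats(counts):
    """Return (y_hat, z_hat) of Eqs. (2.33) and (2.35), using averages over the cells."""
    counts = np.asarray(counts, dtype=float)
    mean_n = counts.mean()
    dn = counts - mean_n
    mu2 = np.mean(dn**2)
    mu3 = np.mean(dn**3)
    y_hat = (mu2 - mean_n) / mean_n**2
    z_hat = (mu3 - 3.0 * mu2 + 2.0 * mean_n) / mean_n**3
    return y_hat, z_hat


rng = np.random.default_rng(4)
n_cells = 100_000

# Unclustered case: Poisson counts with <n> = 10, so y and z should be near zero
poisson_counts = rng.poisson(10.0, size=n_cells)

# Toy clustered case (assumption): cell intensity with mean 10 and relative variance 0.25
intensity = rng.gamma(shape=4.0, scale=2.5, size=n_cells)
clustered_counts = rng.poisson(intensity)

print("Poisson  : y_hat, z_hat =", counts_in_cells_stats(poisson_counts))
print("clustered: y_hat, z_hat =", counts_in_cells_stats(clustered_counts))
```

In the toy clustered case ŷ recovers the variance of the underlying relative intensity, because the Poisson (shot-noise) contribution ⟨n⟩ has been subtracted in (2.33).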
2.5 Fourier transform for a discrete set of objects
When we replaced the continuous density field with a set of discrete objects, the resolution of
our description was reduced. The standard approach of a discrete Fourier transform, where
one introduces a rectangular grid with finite resolution, introduces another, independent, loss
of resolution, which is unnecessary. We would lose the information on the exact locations of the
galaxies if we just assigned them into finite cells. The only discreteness we need to introduce
is that inherent in the problem, that of the discrete point set. This is done by introducing
microcells. We start as if we were going to do a normal discrete Fourier transform, dividing the
volume into cells. But now we make the cells extremely small, so that the probability of there
being more than one galaxy in a cell becomes zero, and specifying in which cell a galaxy i is,
specifies its exact location xi. Denote the volume of such a microcell with δV . Most microcells
will be empty. The galaxy number density in microcell $j$ is $n_j/\delta V$, where $n_j = 0$ or 1. This
means that $n_j^2 = n_j$, which will be very helpful later.
The Fourier coefficients of the density field become
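$$ \rho_{\mathbf{k}} = \frac{1}{V}\int_V\rho(\mathbf{x})\,e^{-i\mathbf{k}\cdot\mathbf{x}}\,d^dx = \frac{1}{V}\sum_j(n_j/\delta V)\,e^{-i\mathbf{k}\cdot\mathbf{x}_j}\,\delta V = \frac{1}{V}\sum_j n_j\,e^{-i\mathbf{k}\cdot\mathbf{x}_j} , \tag{2.36} $$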
where the sum is over microcells, and $\mathbf{x}_j$ is the location of microcell $j$. Since $n_j = 0$ for all
the empty microcells, only those terms survive where the microcell contains a galaxy,
$n_j = 1$, and the sum becomes a sum over galaxies
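$$ \rho_{\mathbf{k}} = \frac{1}{V}\sum_i e^{-i\mathbf{k}\cdot\mathbf{x}_i} , \tag{2.37} $$
where $\mathbf{x}_i$ is the location of galaxy $i$. Thus
$$ \begin{aligned}
\delta_{\mathbf{k}} &= \frac{\rho_{\mathbf{k}}}{\langle\rho\rangle} = \frac{1}{\langle\rho\rangle V}\sum_i e^{-i\mathbf{k}\cdot\mathbf{x}_i} = \frac{1}{\langle N\rangle}\sum_{i=1}^{N} e^{-i\mathbf{k}\cdot\mathbf{x}_i} \quad\text{for } \mathbf{k} \neq 0 \\
\delta_0 &= \bar\delta = \frac{\bar\rho}{\langle\rho\rangle}-1 = \frac{N-\langle N\rangle}{\langle N\rangle} ,
\end{aligned} \tag{2.38} $$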
where N is the total number of galaxies in the volume V , and 〈N〉 = 〈ρ〉V is its expectation
value.
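In code, (2.38) is a direct sum over galaxy positions with no gridding involved. A minimal sketch (the function name and the example box are illustrative assumptions):

```python
import numpy as np


def delta_k(positions, k, expected_N):
    """Eq. (2.38) for k != 0: delta_k = (1/<N>) * sum_i exp(-i k.x_i).

    positions: array of shape (N, d); k: wave vector of length d.
    """
    positions = np.atleast_2d(np.asarray(positions, dtype=float))
    phase = positions @ np.asarray(k, dtype=float)   # k . x_i for every galaxy
    return np.sum(np.exp(-1j * phase)) / expected_N


# Example: two galaxies half a box apart cancel exactly in the fundamental mode along x
box = 10.0
k_fund = 2.0 * np.pi / box * np.array([1.0, 0.0, 0.0])
pair = np.array([[0.0, 0.0, 0.0], [box / 2.0, 0.0, 0.0]])
print(abs(delta_k(pair, k_fund, expected_N=2)))
```

The cost is proportional to the number of galaxies times the number of wave vectors, which is why grid-based FFT methods are often preferred for large catalogs despite the loss of resolution discussed above.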
2.5.1 Poisson distribution again
We apply now our new Fourier method to the Poisson distribution, where the locations xi are
independent random numbers, so that the complex numbers e−ik·xi in (2.38) are distributed
randomly on the unit circle of the complex plane. Doing the sum $\sum_{i=1}^{N} e^{-i\mathbf{k}\cdot\mathbf{x}_i}$ thus executes a
random walk on the complex plane, with step length 1.
To get the power spectrum,
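$$ \begin{aligned}
|\delta_{\mathbf{k}}|^2 = \delta_{\mathbf{k}}^*\delta_{\mathbf{k}} &= \frac{1}{\langle N\rangle^2}\sum_{ij} e^{i\mathbf{k}\cdot(\mathbf{x}_i-\mathbf{x}_j)} = \frac{1}{\langle N\rangle^2}\left[\sum_{i\neq j} e^{i\mathbf{k}\cdot(\mathbf{x}_i-\mathbf{x}_j)} + \sum_i 1\right] \\
&= \frac{1}{\langle N\rangle^2}\left[2\sum_{\text{pairs}}\cos\left(\mathbf{k}\cdot(\mathbf{x}_i-\mathbf{x}_j)\right) + N\right] .
\end{aligned} \tag{2.39} $$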
There is an equal probability for the cos(k·(xi−xj)) to be positive or negative, so the expectation
value of the first term vanishes, and we get
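$$ P(k) = V\langle|\delta_{\mathbf{k}}|^2\rangle = V\frac{\langle N\rangle}{\langle N\rangle^2} = \frac{V}{\langle N\rangle} = \frac{1}{\langle\rho\rangle} , \tag{2.40} $$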
which is independent of k, i.e., the spectral index is n = 0.
In Sec. 9.1 we redo this for a correlated distribution, where we see that the effect of having
a discrete set of galaxies instead of a continuous density adds this V/〈N〉 = 1/〈ρ〉 term to the
power spectrum. This added term is called shot noise. The higher the density of points (galaxies
included in the survey) the smaller is this shot noise. For an estimate of the power spectrum of
the underlying mass distribution we subtract this shot noise.
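The flat shot-noise spectrum can be reproduced by averaging $V|\delta_{\mathbf{k}}|^2$ over many random realizations. The following sketch assumes a cubic box, a fixed number N of points per realization (so that ⟨N⟩ = N), and wave vectors on the fundamental grid of the box; these are choices made for the example:

```python
import numpy as np


def poisson_power(N, box, n_vec, n_real, rng):
    """Monte Carlo estimate of P(k) = V <|delta_k|^2> for N uniformly random points.

    n_vec is an integer vector; the wave vector is k = 2*pi*n_vec/box.
    """
    k = 2.0 * np.pi * np.asarray(n_vec, dtype=float) / box
    x = rng.random((n_real, N, 3)) * box            # n_real independent realizations
    delta = np.exp(-1j * (x @ k)).sum(axis=1) / N   # Eq. (2.38) with <N> = N
    return box**3 * np.mean(np.abs(delta) ** 2)


rng = np.random.default_rng(5)
N, box = 200, 2.0
print("expected 1/<rho> =", box**3 / N)
for n_vec in ((1, 0, 0), (3, 2, 0), (5, 5, 5)):
    print(n_vec, poisson_power(N, box, n_vec, n_real=2000, rng=rng))
```

The three estimates scatter around V/N with no trend in |k|, which is the white-noise behavior n = 0 of (2.40).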
Probability distribution of $\hat P(k)$: We will later discuss estimation of power spectrum from galaxy
surveys in more detail, but let us already consider how individual realizations differ from the expectation
value, i.e., how is
$$ \hat P(k) \equiv V|\hat\delta_{\mathbf{k}}|^2 \tag{2.41} $$
distributed around the expectation value $P(k)$. We assume a fixed $N$ (which may be different from $\langle N\rangle$),
i.e., we do not here fold in the probability distribution of $N$. Consider the complex number $\hat\delta_{\mathbf{k}}$ as a 2D
vector
$$ \hat\delta_{\mathbf{k}} = \frac{1}{N}\sum_{i=1}^{N} e^{-i\mathbf{k}\cdot\mathbf{x}_i} = \frac{1}{N}\left(\sum_i\cos\mathbf{k}\cdot\mathbf{x}_i\,,\; -\sum_i\sin\mathbf{k}\cdot\mathbf{x}_i\right) , \tag{2.42} $$
which points to the endpoint of the random walk (or $N\hat\delta_{\mathbf{k}}$ does). Consider the real part of $\hat\delta_{\mathbf{k}}$:
$$ \frac{1}{N}\sum_i\cos\mathbf{k}\cdot\mathbf{x}_i . \tag{2.43} $$
Here the terms in the sum are independent random variables with a nonuniform probability distribution
($\mathbf{k}\cdot\mathbf{x}_i$ has a uniform probability distribution; see footnote 18).
We apply the central limit theorem: The sum (or mean) of independent
random variables approaches the normal (Gaussian) distribution,
$$ P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2}(x-\mu)^2/\sigma^2} , \tag{2.44} $$
where µ is the expectation value and σ2 the variance, as N →∞, regardless
of the probability distribution of the individual variables. If each variable has the same probability
distribution, with expectation value µ and variance σ2, then their mean will have expectation value µ
and variance σ2/N .
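The expectation value of $\cos\mathbf{k}\cdot\mathbf{x}_i$ is zero and the variance is $\frac{1}{2\pi}\int_0^{2\pi}\cos^2x\,dx = \frac{1}{2}$. Thus $\mathrm{Re}\,\hat\delta_{\mathbf{k}}$ has
the probability distribution
$$ P\left(\mathrm{Re}\,\hat\delta_{\mathbf{k}} = x\right) = \sqrt{\frac{N}{\pi}}\,e^{-Nx^2} . \tag{2.45} $$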
The imaginary part has the same probability distribution. Now clearly the real and imaginary parts are
not independent. The individual terms determine each other up to a sign, since $\sin = \pm\sqrt{1-\cos^2}$. Some of this dependence
remains for the sums, especially in the large-$|\hat\delta_{\mathbf{k}}|$ tail of the probability distribution. Clearly, if the real
part is close to its possible maximum value ($\cos\mathbf{k}\cdot\mathbf{x}_i$ has mostly landed near 1), then the imaginary part
has to be small, and vice versa. However, far from the tail we can expect the dependence between the
sums (the two components of the random walk) to be negligible. Making this approximation (see footnote 19), we get the
2D probability distribution
$$ P\left(\hat\delta_{\mathbf{k}}\right) = \frac{N}{\pi}\,e^{-N|\hat\delta_{\mathbf{k}}|^2} . \tag{2.46} $$
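To convert this into a probability distribution for $|\hat\delta_{\mathbf{k}}|^2$, we need to integrate over the phase of $\hat\delta_{\mathbf{k}}$. First,
$$ P\left(|\hat\delta_{\mathbf{k}}| = r\right)dr = Ne^{-Nr^2}\,2r\,dr . \tag{2.47} $$
This is known as the Rayleigh distribution. For $s = |\hat\delta_{\mathbf{k}}|^2 = r^2$, $ds = d(r^2) = 2r\,dr$, so
$$ P\left(|\hat\delta_{\mathbf{k}}|^2 = s\right)ds = Ne^{-Ns}\,ds , \tag{2.48} $$
and
$$ P\left(|\hat\delta_{\mathbf{k}}|^2 > s\right) = N\int_s^\infty e^{-Ns'}\,ds' = e^{-Ns} . \tag{2.49} $$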
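The mean of this distribution is $1/N$ and the variance is $1/N^2$ (the second moment $\langle s^2\rangle$ is $2/N^2$), i.e.,
$$ \langle\hat P(k)\rangle = \frac{V}{N} = \frac{1}{\bar\rho} \quad\text{and}\quad \left\langle\left(\hat P(k)-\langle\hat P(k)\rangle\right)^2\right\rangle = \left(\frac{V}{N}\right)^2 = \frac{1}{\bar\rho^2} . \tag{2.50} $$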
Thus, although the expectation value of P̂ (k) agrees with P (k), the variance around it is large and
the most probable value is actually P̂ (k) = 0 ! Note, however, that this is for an individual Fourier mode
k, and to estimate P (k) from a survey one would take the mean over a large number of Fourier modes
for which |k| falls between k and k + dk, and the variance of this mean is then much lower.
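The exponential law (2.48)–(2.49) for a single mode can be checked by brute force. The sketch below draws many realizations of N random points in a unit cube (an assumption for the example, with k on the fundamental grid) and prints N·mean and N²·variance of $s = |\hat\delta_{\mathbf{k}}|^2$, which for an exponential distribution $Ne^{-Ns}$ should both be close to 1, together with the measured and predicted exceedance probabilities:

```python
import numpy as np


def sample_mode_power(N, n_real, rng, n_vec=(1, 1, 1)):
    """Draw n_real values of s = |delta_k|^2 for N random points in the unit cube."""
    k = 2.0 * np.pi * np.asarray(n_vec, dtype=float)
    x = rng.random((n_real, N, 3))
    delta = np.exp(-1j * (x @ k)).sum(axis=1) / N   # Eq. (2.42)
    return np.abs(delta) ** 2


rng = np.random.default_rng(6)
N = 100
s = sample_mode_power(N, 20_000, rng)
print("N * mean       :", s.mean() * N)
print("N^2 * variance :", s.var() * N**2)
for t in (0.5, 1.0, 2.0):
    # fraction of realizations with |delta_k|^2 > t/N versus exp(-t) from Eq. (2.49)
    print(f"P(s > {t}/N): measured {np.mean(s > t / N):.4f}, predicted {np.exp(-t):.4f}")
```

Since the example uses a 3D wave vector, it also serves as a numerical check of the remark in footnote 18.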
18 Comment by Elina Keihänen: I do not think $\mathbf{k}\cdot\mathbf{x}$ has uniform distribution, except in 1D. It is a sum of three
terms, $k_1x_1 + k_2x_2 + k_3x_3$, each of which separately is uniformly distributed, but their sum is not. Surprisingly,
(2.45) still seems to hold. I checked this numerically.
19 Peacock [1] does not mention any such approximation (maybe it comes as part of the $N \gg 1$ limit), but I need
to make it to get his Eq. (16.115), i.e., our (2.49).
We get the probability distribution of N by just applying (2.20) but considering the full volume V
as one subvolume of an infinite universe, i.e., replacing n by N ,
$$ P(N) = \frac{\langle N\rangle^N e^{-\langle N\rangle}}{N!} . \tag{2.51} $$
3 Subspaces of lower dimension
Consider now how the statistics of a 1D or 2D subspace are related to the statistics in the full 3D
space. A key starting point here is that because of statistical isotropy, the correlation function
ξ(r) = 〈δ(x)δ(x + r)〉 (3.1)
is the same, whether x and x + r are restricted on a 1D line or 2D plane, or not.
3.1 Skewers
Consider the power spectrum P1D(k) along a straight line (‘skewer’) going through 3D space.
Since
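$$ \mathcal{P}_{1D}(k) = \frac{2k}{\pi}\int_0^\infty\xi(r)\cos kr\,dr \quad\text{and}\quad \xi(r) = \int_0^\infty\mathcal{P}_{3D}(k)\,\frac{\sin kr}{kr}\,\frac{dk}{k} , \tag{3.2} $$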
we have
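$$ \mathcal{P}_{1D}(k) = \frac{k}{\pi}\int_0^\infty\frac{dq}{q^2}\,\mathcal{P}_{3D}(q)\int_0^\infty dr\,\frac{2\cos kr\,\sin qr}{r} = k\int_k^\infty\frac{dq}{q^2}\,\mathcal{P}_{3D}(q) , \tag{3.3} $$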
since (exercise)
$$ \int_0^\infty dr\,\frac{2\cos kr\,\sin qr}{r} = \pi\Theta(q-k) , \tag{3.4} $$
where $\Theta(q-k)$ is the step function, 1 for $q > k$, 0 for $q < k$. Shorter wavelength $q > k$ modes in 3D contribute to the
observed power in 1D at $k$, since when $\mathbf{q}$ is not parallel to the line, the intersection of the 3D plane wave with the line has
a longer wavelength. In terms of $P_{1D}(k) = (\pi/k)\mathcal{P}_{1D}(k)$ and $P_{3D}(k) = (2\pi^2/k^3)\mathcal{P}_{3D}(k)$ this reads
$$ P_{1D}(k) = \frac{1}{2\pi}\int_k^\infty\frac{dq}{q}\,q^2P_{3D}(q) . \tag{3.5} $$
This means that if $P_{3D}$ has a spectral index $n \geq -2$ ($\mathcal{P}_{3D}(q) \propto q$ or steeper), $P_{1D}(k)$ at a
given scale $k$ will be dominated by much smaller-scale (higher-$k$) 3D structure, and the true
larger-scale 3D structure cannot be seen in a 1D survey.
From (3.5) we see that
$$ \frac{dP_{1D}(k)}{dk} = -\frac{1}{2\pi}\,kP_{3D}(k) , \tag{3.6} $$
so that $P_{1D}(k)$ is necessarily monotonically decreasing. Thus we always have $n_{1D} < 0$, even if
$n_{3D} > 0$.
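The projection integral (3.5) is simple to evaluate numerically. The sketch below uses two hypothetical test spectra (not taken from the text): a Gaussian, for which (3.5) gives the closed form $e^{-k^2}/4\pi$, and a spectrum rising as $q$ at small $q$ ($n_{3D} = 1$), to illustrate that $P_{1D}$ still decreases monotonically. The upper limit of the integral is truncated at `q_max`, which is adequate only because both test spectra fall off rapidly:

```python
import numpy as np


def p1d_from_p3d(p3d, k, q_max=20.0, n_q=200_001):
    """Eq. (3.5): P_1D(k) = (1/2pi) * int_k^infinity q P_3D(q) dq, truncated at q_max."""
    q = np.linspace(k, q_max, n_q)
    f = q * p3d(q)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(q)) / (2.0 * np.pi)  # trapezoidal rule


def gaussian_p3d(q):
    """Hypothetical test spectrum with a known 1D projection."""
    return np.exp(-q**2)


def rising_p3d(q):
    """Hypothetical spectrum with n_3D = 1 at small q and a Gaussian cutoff."""
    return q * np.exp(-q**2)


for k in (0.1, 0.5, 1.0, 2.0):
    exact = np.exp(-k**2) / (4.0 * np.pi)
    print(f"k = {k}: Gaussian numeric {p1d_from_p3d(gaussian_p3d, k):.6e}, exact {exact:.6e}, "
          f"rising-spectrum P_1D {p1d_from_p3d(rising_p3d, k):.6e}")
```

For the rising spectrum, $P_{1D}(k)$ is nearly constant at small $k$: the 1D power there comes almost entirely from 3D modes near the peak of $q^2P_{3D}(q)$, not from 3D structure at the scale $k$ itself.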
Example: Let’s redo the calculation of (3.5) in a way that is maybe easier to generalize to other
cases: Start from the d-dimensional integrals
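$$ P_{1D}(k) = \int_{-\infty}^{\infty}dr\,e^{-i\mathbf{k}\cdot\mathbf{r}}\,\xi(r) \quad\text{and}\quad \xi(r) = \frac{1}{(2\pi)^3}\int d^3q\,e^{i\mathbf{q}\cdot\mathbf{r}}\,P_{3D}(q) , \tag{3.7} $$
so that
$$ P_{1D}(k) = \frac{1}{(2\pi)^3}\int dr\,d^3q\,e^{-i\mathbf{k}\cdot\mathbf{r}}e^{i\mathbf{q}\cdot\mathbf{r}}\,P_{3D}(q) , \tag{3.8} $$
where the vectors $\mathbf{r}$ and $\mathbf{k}$ lie along the 1D line (which we can take as the $x$ axis). Thus in the exponentials
$(\mathbf{q}-\mathbf{k})\cdot\mathbf{r} = (q_x-k)r$ and the $r$ integral gives
$$ \frac{1}{2\pi}\int dr\,e^{i(q_x-k)r} = \delta_D^1(k-q_x) , \tag{3.9} $$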
forcing qx = k. Write now q = (k,h) where h is 2-dimensional. Now
$$ P_{1D}(k) = \frac{1}{(2\pi)^2}\int d^2h\,P_{3D}(q) = \frac{1}{2\pi}\int h\,dh\,P_{3D}(q) . \tag{3.10} $$
Since $q^2 = k^2+h^2$, we have $h\,dh = q\,dq$ with $q \geq k$, and we recover (3.5).
For the case of discrete objects (‘galaxies’) we have to allow a
finite thickness for the skewer (an infinitely thin line will catch
zero galaxies, if they are treated as points). Consider thus a