Galaxy Survey Cosmology, part 1

Hannu Kurki-Suonio

8.2.2021

Contents

1 Statistical measures of a density field
    1.1 Ergodicity and statistical homogeneity and isotropy
    1.2 Density 2-point autocorrelation function
    1.3 Fourier expansion
    1.4 Fourier transform
    1.5 Power spectrum
    1.6 Bessel functions
    1.7 Spherical Bessel functions
    1.8 Power-law spectra
    1.9 Scales of interest and window functions
2 Distribution of galaxies
    2.1 The average number density of galaxies
    2.2 Galaxy 2-point correlation function
    2.3 Poisson distribution
    2.4 Counts in cells
    2.5 Fourier transform for a discrete set of objects
        2.5.1 Poisson distribution again
3 Subspaces of lower dimension
    3.1 Skewers
    3.2 Slices
4 Angular correlation function for small angles
    4.1 Relation to the 3D correlation function
        4.1.1 Selection function
        4.1.2 Small-angle limit
        4.1.3 Power law
    4.2 Power spectrum for flat sky
        4.2.1 Relation to the 3D power spectrum
5 Spherical sky
    5.1 Angular correlation function and angular power spectrum
    5.2 Legendre polynomials
    5.3 Spherical harmonics
    5.4 Euler angles
    5.5 Wigner D-functions
    5.6 Relation to the 3D power spectrum
6 Dynamics
    6.1 Linear perturbation theory
    6.2 Nonlinear growth
7 Redshift space
    7.1 Redshift space as a distortion of real space
    7.2 Linear perturbations and the power spectrum
    7.3 Correlation function
        7.3.1 Linear growing mode
        7.3.2 Projected correlation function
    7.4 Small scales
    7.5 Redshift space in Friedmann–Robertson–Walker universe
    7.6 Alcock–Paczyński effect
8 Measuring the correlation function
    8.1 Bias and variance of different estimators
9 Power spectrum estimation
    9.1 Shot noise
    9.2 Mask
    9.3 Selection function
    9.4 Weights
10 Baryon acoustic oscillation scale as a standard ruler
11 Higher-order statistics
    11.1 N-point correlation function and (N-1)-spectrum
    11.2 Three-point correlation function and bispectrum
    11.3 Measuring the three-point correlation function
    11.4 Higher-order statistics in cosmology
    11.5 Results from galaxy surveys
12 Galaxy surveys
    12.1 Sloan Digital Sky Survey
        12.1.1 SDSS-II
        12.1.2 SDSS-III
        12.1.3 SDSS-IV

Preface

These are the lecture notes of the first part (GSC1) of my Galaxy Survey Cosmology course lectured at the University of Helsinki in spring 2017. The second part of the course discussed gravitational lensing with the focus on weak lensing and cosmic shear.
The lecture notes for the second part (GSC2) are based on the lectures by Schneider in the textbook Schneider, Kochanek, and Wambsganss (Gravitational Lensing: Strong, Weak and Micro; Springer 2006), and they are available in hand-written form only. This was the first time I lectured this course, and consequently the lecture notes are a bit raw. In the future I hope to add more material about the practical aspects and cosmological results from galaxy surveys.

– Hannu Kurki-Suonio, May 2017

Preface for 2019

The current version of these lecture notes contains no introduction, but jumps directly to the mathematical formulation of correlation functions and power spectra, the main tools in galaxy survey cosmology. For a 4-page introduction to the field, read Sec. 2.7 of [2]. I also gave a (different) introduction during the first lecture.

The amount of calculus in these lecture notes may seem formidable to some students. I have aimed for completeness so these notes can be used as a reference for results that may be needed, but the student need not absorb all of the mathematical results.

This year I have added new material including recent observational results, and correspondingly some of the older material in these notes (Sec. 3 and the latter half of Sec. 5) was not covered in the course.

– Hannu Kurki-Suonio, February 2019

Preface for 2021

These lecture notes will be updated as the course progresses. The current version is essentially as it was at the end of the 2019 course, except that some typos and errors have been fixed. I thank Elina Keihänen for finding some of them.

– Hannu Kurki-Suonio, January 2021

1 Statistical measures of a density field

We begin by discussing statistical measures of a density field $\rho(\mathbf{x})$ in Euclidean $d$-dimensional space. We start with a general treatment where we do not specify in more detail what density we are talking about.
It may refer to the number density of objects such as galaxies (which will be the main application) or just mass density, but we treat $\rho(\mathbf{x})$ as a continuous quantity for now.

In $d$ dimensions the volume element corresponding to radial distance between $r$ and $r + dr$ is$^{1}$
$$dV = C_d\, r^{d-1}\, dr \,, \quad\text{where}\quad C_d = \frac{2\pi^{d/2}}{\Gamma(d/2)} \,,$$ (1.1)
and the volume within distance $R$ is
$$V(R) = \frac{C_d}{d}\, R^d \,.$$ (1.2)
We will have applications for $d = 1$, 2, and 3, for which
$$C_1 = 2 \,, \quad C_2 = 2\pi \,, \quad C_3 = 4\pi \,.$$ (1.3)
The main application is $d = 3$ (3D), but $d = 1$ (1D) corresponds to, e.g., a pencil-beam survey with a very long exposure of a small field on the sky with distance (redshift) determinations for a large number of galaxies along this line of sight. The main 2D application is the distribution of galaxies on the sky, without distance determinations, when the sky is approximated as a flat plane, but it also corresponds to a redshift survey along a great circle on the sky (e.g., the equator, see Fig. 1).

We assume that the density variations originate from a statistically isotropic and homogeneous ergodic random process, and we are really interested in the statistics of this random process rather than in that of a particular realization of $\rho(\mathbf{x})$.

It is currently thought that initial density perturbations in the Universe$^{2}$ were produced during inflation in the very early universe by quantum fluctuations of the inflaton field, which is a random process that in standard models of inflation satisfies these properties of isotropy, homogeneity, and ergodicity. The density field then evolved until today through deterministic physics, which modified its various statistical measures, but maintained these fundamental properties.

We follow [1] and [2]. Sec. 2.7 of [2] gives a 4-page introduction to the field. I recommend reading it at this point.

1.1 Ergodicity and statistical homogeneity and isotropy

Statistical properties are typically defined as averages of some quantities.
We will deal with two kinds of averages: volume average and ensemble average.$^{3}$ The volume average applies to a particular realization (and to some volume $V$ in it). We denote the volume average of a quantity $f(\mathbf{x})$ with the overbar, $\bar f$, and it is defined as
$$\bar f \equiv \frac{1}{V}\int_V d^dx\, f(\mathbf{x}) \,.$$ (1.4)

$^{1}$ Here $\Gamma(x) = (x-1)!$ is the gamma function, with values $\Gamma(\tfrac12) = \sqrt{\pi}$, $\Gamma(1) = 1$, $\Gamma(\tfrac32) = \sqrt{\pi}/2$, etc. Other values follow easily from the recursion formula $\Gamma(x+1) = x\,\Gamma(x)$.

$^{2}$ 'Universe' with a capital U refers to the universe we live in, whereas 'universe' refers to the theoretical concept, or any hypothetical universe we may consider.

$^{3}$ We are redoing here material from Cosmology II, Section 8.1 (2018 version), but with a different approach. In Cosmo II the approach was theoretical, so we assumed the volume $V$ was very large so that $\bar\rho = \langle\rho\rangle$. Now the volume $V$ is related to the volume of a galaxy survey, and for accurate treatment we need to take into account that $\bar\rho \neq \langle\rho\rangle$.

Figure 1: Distribution of galaxies according to the Sloan Digital Sky Survey (SDSS). This figure shows galaxies that are within $2^\circ$ of the equator and closer than 858 Mpc (assuming $H_0 = 71$ km/s/Mpc). Figure from astro-ph/0310571 [12].

The ensemble average refers to the random process. We assume that the observed density field is just one of an ensemble of an infinite number of possible realizations that could have resulted from the random process. To know the random process means to know the probability distribution $\mathrm{Prob}(\gamma)$ of the quantities $\gamma$ produced by it. (At this stage we use the abstract notation $\gamma$ to denote the infinite number of these quantities. They could be the values of the density field $\rho(\mathbf{x})$ at every location, or its Fourier coefficients $\rho_{\mathbf{k}}$.)
The ensemble average of a quantity $f$ depending on these quantities $\gamma$ as $f(\gamma)$ is denoted by $\langle f\rangle$ and defined as the (possibly infinite-dimensional) integral
$$\langle f\rangle \equiv \int d\gamma\, \mathrm{Prob}(\gamma)\, f(\gamma) \,.$$ (1.5)
Here $f$ could be, e.g., the value of $\rho(\mathbf{x})$ at some location $\mathbf{x}$. The ensemble average is also called the expectation value. Thus the ensemble represents a probability distribution. And the properties of the density field we will discuss (e.g., statistical homogeneity and isotropy, and ergodicity, see below) will be properties of this ensemble.

Statistical homogeneity means that the expectation value $\langle f(\mathbf{x})\rangle$ must be the same at all $\mathbf{x}$, and thus we can write it as $\langle f\rangle$. Statistical isotropy means that for quantities which involve a direction, the statistical properties are independent of the direction. For example, for vector quantities $\mathbf{v}$, all directions must be equally probable. This implies that $\langle\mathbf{v}\rangle = 0$.

If theoretical properties are those of an ensemble, and we can only observe one realization (the Universe) from that ensemble, how can we compare theory and observation? It seems reasonable that the statistics we get by comparing different parts of a large volume should be similar to the statistics of a given part over different realizations, i.e., that they provide a fair sample of the probability distribution. This is called ergodicity. Fields $f(\mathbf{x})$ that satisfy
$$\bar f \to \langle f\rangle \quad\text{as}\quad V \to \infty$$ (1.6)
are called ergodic. We assume that the density field is ergodic. It can be shown that a statistically homogeneous and isotropic Gaussian random process is ergodic$^{4}$ (but we do not here make the assumption of Gaussianity). Because of the ergodicity assumption, the concepts of volume and ensemble average are not always kept clearly separate in the literature, so that the notation $\langle\cdot\rangle$ is used without specifying which one it refers to, but we shall distinguish between these concepts.$^{5}$

The equality of $\bar f$ with $\langle f\rangle$ does not hold for a finite volume $V$; the difference is called sample variance or cosmic variance.
The larger the volume, the smaller the difference. Since cosmological theory predicts $\langle f\rangle$, whereas observations probe $\bar f$ for a limited volume, cosmic variance limits how accurately we can compare theory with observations.

1.2 Density 2-point autocorrelation function

We define the density perturbation field as
$$\delta(\mathbf{x}) \equiv \frac{\rho(\mathbf{x}) - \langle\rho\rangle}{\langle\rho\rangle} \,.$$ (1.7)
Since $\rho \ge 0$, necessarily $\delta \ge -1$. From statistical homogeneity,
$$\langle\rho(\mathbf{x})\rangle = \langle\rho\rangle \quad\Rightarrow\quad \langle\delta\rangle = 0 \,.$$ (1.8)

$^{4}$ Liddle & Lyth [7] make this statement on p. 73, but do not give a reference to an actual proof.

$^{5}$ The $\langle\cdot\rangle$ notation is more convenient than the overbar for complicated expressions, so we may sometimes use $\langle\cdot\rangle_V$ for volume average.

Thus we cannot use $\langle\delta\rangle$ as a measure of the inhomogeneity. Instead we can use the square of $\delta$, which is necessarily nonnegative everywhere, so it cannot average out like $\delta$ did. Its expectation value $\langle\delta^2\rangle$ is the variance of the density perturbation, and the square root of the variance,
$$\delta_{\rm rms} \equiv \sqrt{\langle\delta^2\rangle} \,,$$ (1.9)
the root-mean-square (rms) density perturbation, is a typical expected absolute value of $\delta$ at an arbitrary location.$^{6}$ It tells us how strong the inhomogeneity is, but nothing about the shapes or sizes of the inhomogeneities. To get more information, we introduce the correlation function $\xi$.

We define the density 2-point autocorrelation function (often called just correlation function) as
$$\xi(\mathbf{x}_1, \mathbf{x}_2) \equiv \langle\delta(\mathbf{x}_1)\,\delta(\mathbf{x}_2)\rangle \,.$$ (1.10)
It is positive if the density perturbation is expected to have the same sign at both $\mathbf{x}_1$ and $\mathbf{x}_2$, and negative for an overdensity at one and underdensity at the other. Thus it probes how density perturbations at different locations are correlated with each other. Due to statistical homogeneity, $\xi(\mathbf{x}_1, \mathbf{x}_2)$ can only depend on the difference (separation) $\mathbf{r} \equiv \mathbf{x}_2 - \mathbf{x}_1$, so we redefine $\xi$ as
$$\xi(\mathbf{r}) \equiv \langle\delta(\mathbf{x})\,\delta(\mathbf{x} + \mathbf{r})\rangle \,.$$
(1.11)
From statistical isotropy, $\xi(\mathbf{r})$ is independent of direction, i.e., spherically symmetric (we use this as a generic term for arbitrary $d$, and might also say 'isotropic'; i.e., for $d = 2$, read 'circularly symmetric', and for $d = 1$, read 'even'),
$$\xi(\mathbf{r}) = \xi(r) \,.$$ (1.12)
The correlation function is large and positive for $r$ smaller than the size of a typical over- or underdense region, and becomes small for larger separations.

The correlation function at zero separation gives the variance of the density perturbation,
$$\langle\delta^2\rangle \equiv \langle\delta(\mathbf{x})\,\delta(\mathbf{x})\rangle \equiv \xi(0) \,.$$ (1.13)
We define the volume average of $\xi$ up to a distance $R$ as
$$\bar\xi(R) \equiv \frac{1}{V(R)}\int_0^R \xi(r)\, C_d\, r^{d-1}\, dr \,.$$ (1.14)
For $d = 3$ this becomes
$$\bar\xi(R) \equiv \frac{3}{R^3}\int_0^R \xi(r)\, r^2\, dr \equiv \frac{3}{R^3}\, J_3(R) \,,$$ (1.15)
where
$$J_3(R) \equiv \int_0^R \xi(r)\, r^2\, dr$$ (1.16)
is called the "$J_3$ integral" (not a Bessel function; see Sec. 7.3.1 for more on $J_\ell$ and $K_\ell$ integrals).

$^{6}$ In other words, $\delta_{\rm rms}$ is the standard deviation of $\rho/\langle\rho\rangle$.

Figure 2: The 2-point correlation function $\xi(r)$ from galaxy surveys. Left: Small scales shown in a log-log plot. The circles with error bars show the observational determination from the APM galaxy survey [4]. The different lines are theoretical predictions by [5] (this is Fig. 9 from [5]). Right: Large scales shown in a linear plot. Red circles with error bars show the observational determination from the CMASS Data Release 9 (DR9) sample of the Baryon Oscillation Spectroscopic Survey (BOSS). The dashed line is a theoretical prediction from the ΛCDM model. The bump near $100\,h^{-1}$ Mpc is the baryon acoustic oscillation (BAO) peak that will be discussed in Sec. 10. This is Fig. 2a from [6].

Exercise: Integral constraint for a single realization. For a single realization and finite volume (so that $\bar\rho \neq \langle\rho\rangle$) we can define
$$\hat\delta(\mathbf{x}) \equiv \frac{\rho(\mathbf{x}) - \bar\rho}{\bar\rho} \quad\text{and}\quad \hat\xi(\mathbf{r}) \equiv \frac{1}{V}\int_V d^dx\, \hat\delta(\mathbf{x})\,\hat\delta(\mathbf{x} + \mathbf{r}) \,.$$ (1.17)

1. Theoretical approach: assume periodic boundary conditions. This also makes $\hat\xi(\mathbf{r})$ periodic.
All integrals, also (1.18), are to be taken over the volume $V$. Show that
$$\int_V d^dr\, \hat\xi(\mathbf{r}) = 0$$ (1.18)
(the integral constraint). Thus the positive values of $\hat\xi$ at small separations must be compensated by negative values at larger $r$. Note that here we do not need any statistical assumptions (like statistical homogeneity or ergodicity). If $\hat\xi(\mathbf{r}) \to 0$ for large $r$ fast enough, for large volumes the boundary conditions do not matter.

2. Practical approach: To avoid using boundary conditions and going outside the volume, redefine
$$\hat\xi(\mathbf{r}) \equiv \frac{1}{V}\int d^dx\, \hat\delta(\mathbf{x})\,\hat\delta(\mathbf{x} + \mathbf{r})$$ (1.19)
so that the integral for each $\mathbf{r}$ goes over only those values of $\mathbf{x}$ for which both $\mathbf{x}$ and $\mathbf{x} + \mathbf{r}$ are within the volume. This is what one does with real galaxy surveys. Show that
$$\int d^dr\, \hat\xi(\mathbf{r}) = 0 \,,$$ (1.20)
where the integral goes over those values of $\mathbf{r}$ for which $\hat\xi(\mathbf{r})$ is defined by (1.19), i.e., $\mathbf{r}$ separates two points inside the volume.

1.3 Fourier expansion

Fourier analysis is a method for separating out the different distance scales, so that the dependence of the physics on distance scale becomes clear and easy to handle.

For a Fourier analysis of the density field we consider a cubic volume $V = L^d$ and assume periodic boundary conditions.$^{7}$ We can now expand any function of space $f(\mathbf{x})$ as a Fourier series
$$f(\mathbf{x}) = \sum_{\mathbf{k}} f_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \,,$$ (1.21)
where the wave vectors $\mathbf{k} = (k_1, \ldots, k_d)$ take values
$$k_i = n_i\,\frac{2\pi}{L} \,, \quad n_i = 0, \pm 1, \pm 2, \ldots$$ (1.22)
(Note that we use the Fourier conventions of [2], not those of [1].) The Fourier coefficients $f_{\mathbf{k}}$ are obtained as
$$f_{\mathbf{k}} = \frac{1}{V}\int_V f(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, d^dx \,.$$ (1.23)
The term $\mathbf{k} = 0$ gives the mean value,
$$f_0 = \bar f \,.$$ (1.24)
The Fourier coefficients are complex numbers even though we are dealing with real quantities$^{8}$ $f(\mathbf{x})$. From the reality condition $f(\mathbf{x})^* = f(\mathbf{x})$ it follows that
$$f_{-\mathbf{k}} = f_{\mathbf{k}}^* \,.$$
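(1.25)
Thus Fourier modes come in pairs
$$f_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} + f_{\mathbf{k}}^*\, e^{-i\mathbf{k}\cdot\mathbf{x}} = 2\,\mathrm{Re} f_{\mathbf{k}} \cos\mathbf{k}\cdot\mathbf{x} - 2\,\mathrm{Im} f_{\mathbf{k}} \sin\mathbf{k}\cdot\mathbf{x} \,,$$ (1.26)
and only the real part of each, $\mathrm{Re} f_{\mathbf{k}} \cos\mathbf{k}\cdot\mathbf{x} - \mathrm{Im} f_{\mathbf{k}} \sin\mathbf{k}\cdot\mathbf{x}$, survives; so to visualize a Fourier mode, just visualize this real part.

The size of the Fourier coefficients depends on the volume $V$: increasing $V$ tends to make the $f_{\mathbf{k}}$ smaller to compensate for the denser sampling of $\mathbf{k}$ in Fourier space.

The Fourier expansion is an expansion in terms of plane waves $e^{i\mathbf{k}\cdot\mathbf{x}}$, which form an orthogonal and complete (closed) set of functions in the Euclidean volume $V$. (We will later encounter other such expansions in terms of other functions.) Thus they satisfy the orthogonality relation
$$\int dV \left(e^{i\mathbf{k}\cdot\mathbf{x}}\right)^* \left(e^{i\mathbf{k}'\cdot\mathbf{x}}\right) = \int dV\, e^{i(\mathbf{k}'-\mathbf{k})\cdot\mathbf{x}} = V\,\delta_{\mathbf{k}\mathbf{k}'} \,,$$ (1.27)
where $\delta_{\mathbf{k}\mathbf{k}'}$ is the Kronecker delta ($\delta_{\mathbf{k}\mathbf{k}'} = 1$ for $\mathbf{k} = \mathbf{k}'$, and $\delta_{\mathbf{k}\mathbf{k}'} = 0$ otherwise), and the closure (completeness) relation
$$\frac{1}{V}\sum_{\mathbf{k}} \left(e^{i\mathbf{k}\cdot\mathbf{x}}\right)^* \left(e^{i\mathbf{k}\cdot\mathbf{x}'}\right) = \frac{1}{V}\sum_{\mathbf{k}} e^{i\mathbf{k}\cdot(\mathbf{x}'-\mathbf{x})} = \delta_D^d(\mathbf{x}' - \mathbf{x}) \,,$$ (1.28)
where $\delta_D^d(\mathbf{x}' - \mathbf{x})$ is the $d$-dimensional Dirac delta function.$^{9}$ Do not confuse the Dirac and Kronecker deltas with the density perturbation $\delta(\mathbf{x})$ or its Fourier coefficient $\delta_{\mathbf{k}}$!

$^{7}$ This does not imply that the density field should be periodic in reality; we are interested only in the density field within the volume, and so we can replace the part outside the volume with a periodic replication of the volume. This will introduce discontinuities at the volume boundary. The expansion (1.21) itself does not assume anything about $f(\mathbf{x})$ outside $V$ (the expansion will be correct inside $V$, and outside $V$ it will represent such a periodic extension; the discontinuities imply that the expansion will contain high-$k$ modes); but some of the discussion below, like convolution with a window function, makes use of this periodicity. Near the boundary of the volume the window function will extend outside the volume, and thus in reality this will introduce edge effects, as the real universe is not periodic. This will have to be treated (sometime later) together with the fact that an actual survey does not cover exactly a cubic volume. For theoretical work one can also take $V$ to be much larger than the observable universe so that the boundaries are so far away that they do not matter.

$^{8}$ In GSC1 we deal only with real quantities. In GSC2, where we discuss gravitational lensing, we introduce the complex shear, so these reality conditions do not apply to its Fourier coefficients/transform.

Thus the functions
$$\left\{ \frac{1}{\sqrt{V}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \right\}$$ (1.30)
form an orthonormal set.

The point of a completeness relation$^{10}$ for orthogonal functions is that any function can indeed be expanded in them. Here
$$\sum_{\mathbf{k}} f_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} = \frac{1}{V}\sum_{\mathbf{k}} \int_V d^dx'\, f(\mathbf{x}')\, e^{-i\mathbf{k}\cdot\mathbf{x}'} e^{i\mathbf{k}\cdot\mathbf{x}} = \int d^dx'\, f(\mathbf{x}')\, \delta_D^d(\mathbf{x} - \mathbf{x}') = f(\mathbf{x}) \,.$$ (1.31)

The convolution theorem states that convolution in coordinate space becomes just multiplication in Fourier space (exercise):
$$(f * g)(\mathbf{x}) \equiv \int_V d^dx'\, f(\mathbf{x}')\, g(\mathbf{x} - \mathbf{x}') = \int_V d^dx'\, f(\mathbf{x} - \mathbf{x}')\, g(\mathbf{x}') = V \sum_{\mathbf{k}} f_{\mathbf{k}}\, g_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \,,$$ (1.32)
and multiplication in coordinate space becomes convolution in Fourier space (exercise):
$$\frac{1}{V}\int_V f(\mathbf{x})\, g(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, d^dx = \sum_{\mathbf{q}} f_{\mathbf{q}}\, g_{\mathbf{k}-\mathbf{q}} \,.$$ (1.33)
The Plancherel formula states (exercise):
$$\frac{1}{V}\int_V d^dx\, f(\mathbf{x})\, g(\mathbf{x}) = \sum_{\mathbf{k}} f_{\mathbf{k}}^*\, g_{\mathbf{k}} \,.$$ (1.34)
With $g = f$ this becomes the Parseval formula:
$$\frac{1}{V}\int_V d^dx\, f(\mathbf{x})^2 = \sum_{\mathbf{k}} |f_{\mathbf{k}}|^2 \,.$$ (1.35)
A great benefit of Fourier analysis is that differentiation is replaced by multiplication:
$$\mathbf{g}(\mathbf{x}) \equiv \nabla f(\mathbf{x}) = \nabla \sum_{\mathbf{k}} f_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} = \sum_{\mathbf{k}} i\mathbf{k}\, f_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \quad\Rightarrow\quad \mathbf{g}_{\mathbf{k}} = i\mathbf{k}\, f_{\mathbf{k}} \,.$$ (1.36)

1.4 Fourier transform

The separation of neighboring $k_i$ values is $\Delta k_i = 2\pi/L$, so we can write
$$f(\mathbf{x}) = \sum_{\mathbf{k}} f_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \left(\frac{L}{2\pi}\right)^d \Delta k_1 \ldots \Delta k_d \approx \frac{1}{(2\pi)^d}\int f(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{x}}\, d^dk \,,$$ (1.37)
where
$$f(\mathbf{k}) \equiv L^d f_{\mathbf{k}} \,,$$ (1.38)
replacing the Fourier series with the Fourier integral.

$^{9}$ The Dirac delta function is not a true function but rather an operator (the correct mathematical term is 'distribution') defined by its action on a function $f(\mathbf{x})$ under an integral:
$$\int_V \delta_D^d(\mathbf{x}' - \mathbf{x})\, f(\mathbf{x}')\, d^dx' \equiv f(\mathbf{x}) \,.$$ (1.29)
It can be thought of as a limit of a set of functions that have large values very near 0 and are close to zero elsewhere.

$^{10}$ For some reason these closure relations are rarely given in standard sources for mathematical methods. (Mathematicians do not like the Dirac delta function?) For example, I could not find Eq. (1.28) anywhere.

In the limit $V \to \infty$, the approximation in (1.37) becomes exact, and we have the Fourier transform pair
$$f(\mathbf{x}) = \frac{1}{(2\pi)^d}\int f(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{x}}\, d^dk \,, \qquad f(\mathbf{k}) = \int f(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, d^dx \,.$$ (1.39)
Note that this assumes that the integrals converge, which requires that $f(\mathbf{x}) \to 0$ for $|\mathbf{x}| \to \infty$.$^{11}$ Thus we don't use this for, e.g., $\delta(\mathbf{x})$, but for, e.g., the correlation function $\xi(\mathbf{x})$ the Fourier transform is appropriate.

A special case is $f = 1$ (which does not satisfy $f(\mathbf{x}) \to 0$, so it does not lead to a true function), whose Fourier transform is the Dirac delta function:
$$\int d^dx\, e^{-i\mathbf{k}\cdot\mathbf{x}} = (2\pi)^d\, \delta_D^d(\mathbf{k}) \,.$$ (1.40)
Writing $\mathbf{k} - \mathbf{k}'$ in place of $\mathbf{k}$ we get
$$\int d^dx\, e^{i(\mathbf{k}'-\mathbf{k})\cdot\mathbf{x}} = (2\pi)^d\, \delta_D^d(\mathbf{k} - \mathbf{k}') \,,$$ (1.41)
the orthogonality relation of plane waves for the infinite volume. The orthonormal set is thus
$$\left\{ \frac{1}{(2\pi)^{d/2}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \right\} \,.$$ (1.42)
The closure relation is the same except $\mathbf{x}$ and $\mathbf{k}$ change places:
$$\int d^dk\, e^{i\mathbf{k}\cdot(\mathbf{x}'-\mathbf{x})} = (2\pi)^d\, \delta_D^d(\mathbf{x}' - \mathbf{x}) \,.$$ (1.43)
The convolution theorem becomes (exercise):
$$\begin{aligned}
(f * g)(\mathbf{x}) &\equiv \int d^dx'\, f(\mathbf{x}')\, g(\mathbf{x} - \mathbf{x}') = \frac{1}{(2\pi)^d}\int d^dk\, f(\mathbf{k})\, g(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{x}} \\
(f * g)(\mathbf{k}) &\equiv \int d^dk'\, f(\mathbf{k}')\, g(\mathbf{k} - \mathbf{k}') = (2\pi)^d \int d^dx\, f(\mathbf{x})\, g(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}} \,,
\end{aligned}$$ (1.44)
so that the Fourier transform of $(f * g)(\mathbf{x})$ is $f(\mathbf{k})\,g(\mathbf{k})$ and the Fourier transform of $(f * g)(\mathbf{k})$ is $(2\pi)^d f(\mathbf{x})\,g(\mathbf{x})$. The Plancherel theorem (exercise) is
$$\int d^dx\, f(\mathbf{x})\, g(\mathbf{x}) = \frac{1}{(2\pi)^d}\int d^dk\, f^*(\mathbf{k})\, g(\mathbf{k})$$ (1.45)
and the Parseval theorem is
$$\int d^dx\, f(\mathbf{x})^2 = \frac{1}{(2\pi)^d}\int d^dk\, |f(\mathbf{k})|^2 \,.$$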
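(1.46)

$^{11}$ The condition is tighter than this, but the condition that $\int |f(\mathbf{x})|\, d^dx$ over the infinite volume is finite, assumed sometimes in the literature (for 1D), seems too tight, as it is not satisfied by any power law, and yet we transform power laws successfully in Sec. 1.8.

Even with a finite $V$ we can use the Fourier integral as an approximation. Often it is conceptually simpler to work first with the Fourier series (so that one can, e.g., use the Kronecker delta $\delta_{\mathbf{k}\mathbf{k}'}$ instead of the Dirac delta function $\delta_D^d(\mathbf{k} - \mathbf{k}')$), replacing it with the integral in the end, when it needs to be calculated. The recipe for going from the series to the integral is
$$\left(\frac{2\pi}{L}\right)^d \sum_{\mathbf{k}} \to \int d^dk \,, \qquad L^d f_{\mathbf{k}} \to f(\mathbf{k}) \,, \qquad \left(\frac{L}{2\pi}\right)^d \delta_{\mathbf{k}\mathbf{k}'} \to \delta_D^d(\mathbf{k} - \mathbf{k}') \,.$$ (1.47)

Exercise: CMB lensing. For a small part of the sky we can use the flat-sky approximation, treating it as a 2D plane. As is common in this context, denote the 2D coordinate by $\boldsymbol{\theta}$ and the corresponding 2D wave vector by $\mathbf{l}$. Gravitational lensing deflects the CMB photons so that a photon originating from $\boldsymbol{\theta}$ is seen coming from $\boldsymbol{\theta} + \nabla\psi(\boldsymbol{\theta})$, where $\psi(\boldsymbol{\theta})$ is the lensing potential. Thus the observed ($T_{\rm obs}$) and unlensed ($T$) CMB temperatures are related by
$$T_{\rm obs}(\boldsymbol{\theta}) = T[\boldsymbol{\theta} + \nabla\psi(\boldsymbol{\theta})] \approx T(\boldsymbol{\theta}) + \nabla\psi(\boldsymbol{\theta}) \cdot \nabla T(\boldsymbol{\theta}) \,.$$ (1.48)
Express $T_{\rm obs}(\mathbf{l})$ in terms of $T(\mathbf{l})$ and $\psi(\mathbf{l})$.

1.5 Power spectrum

We now expand the density perturbation as a Fourier series (assuming a large cubic box $V = L^d$ with periodic boundary conditions)
$$\delta(\mathbf{x}) = \sum_{\mathbf{k}} \delta_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \,,$$ (1.49)
with
$$\delta_{\mathbf{k}} = \frac{1}{V}\int_V \delta(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, d^dx$$ (1.50)
and $\delta_{-\mathbf{k}} = \delta_{\mathbf{k}}^*$. Note that
$$\langle\delta(\mathbf{x})\rangle = 0 \quad\Rightarrow\quad \langle\delta_{\mathbf{k}}\rangle = 0 \,.$$ (1.51)
The Fourier coefficients of the density field $\rho(\mathbf{x})$ and the density perturbation $\delta(\mathbf{x})$ are related by
$$\rho_{\mathbf{k}} = \langle\rho\rangle\, \delta_{\mathbf{k}} \quad\text{for}\quad \mathbf{k} \neq 0 \,,$$ (1.52)
since the $\mathbf{k} \neq 0$ coefficients vanish for the homogeneous part, and
$$\rho_0 = \bar\rho = \langle\rho\rangle (1 + \delta_0) = \langle\rho\rangle (1 + \bar\delta) \,,$$ (1.53)
where
$$\bar\delta = \frac{\bar\rho - \langle\rho\rangle}{\langle\rho\rangle}$$ (1.54)
(see Eq. 1.7) is the mean density perturbation within the volume $V$, indicating whether the volume is over- or underdense.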
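The relations (1.50) to (1.54) can be checked numerically on a small grid. The following is a minimal sketch for $d = 1$, assuming a toy density field of uniform random values, a grid size `N`, a box length `L`, and an ensemble mean `rho_mean` that is simply put in by hand; none of these numbers come from the notes. A naive discrete Fourier sum stands in for the integral (1.23) to keep the example dependency-free; an FFT routine would normally be used instead.

```python
import cmath
import math
import random


def fourier_coefficients(f, L):
    """Grid version of f_k = (1/V) * integral_V f(x) exp(-i k x) dx for d = 1.

    Returns {n: f_k} with k = 2*pi*n/L and n = -N/2+1, ..., N/2 (N even).
    """
    N = len(f)
    dx = L / N
    coeffs = {}
    for n in range(-(N // 2) + 1, N // 2 + 1):
        k = 2.0 * math.pi * n / L
        total = sum(f[p] * cmath.exp(-1j * k * p * dx) for p in range(N))
        coeffs[n] = total * dx / L
    return coeffs


random.seed(1)
N, L = 16, 100.0          # hypothetical grid size and box length
rho_mean = 2.0            # assumed ensemble mean <rho>, chosen by hand
rho = [rho_mean * (1.0 + 0.3 * random.uniform(-1.0, 1.0)) for _ in range(N)]
delta = [(r - rho_mean) / rho_mean for r in rho]      # Eq. (1.7)

rho_k = fourier_coefficients(rho, L)
delta_k = fourier_coefficients(delta, L)
rho_bar = sum(rho) / N        # volume average of rho
delta_bar = sum(delta) / N    # volume average of delta, Eq. (1.54)

# Reality condition: delta_{-k} = conj(delta_k)
reality_err = max(abs(delta_k[-n] - delta_k[n].conjugate())
                  for n in range(1, N // 2))
# Eq. (1.52): rho_k = <rho> delta_k for k != 0
scaling_err = max(abs(rho_k[n] - rho_mean * delta_k[n])
                  for n in rho_k if n != 0)
# Eq. (1.53): rho_0 = rho_bar = <rho> (1 + delta_bar)
mean_err = max(abs(rho_k[0] - rho_bar),
               abs(rho_bar - rho_mean * (1.0 + delta_bar)))

print(reality_err, scaling_err, mean_err)
```

All three printed numbers should be at the level of floating-point rounding. In this toy box $\bar\rho$ differs from the assumed $\langle\rho\rangle$, so $\delta_0 = \bar\delta$ is nonzero, which is the finite-volume effect described above.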
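In analogy with the correlation function $\xi(\mathbf{x}, \mathbf{x}')$, we may ask what is the corresponding correlation in Fourier space, $\langle\delta_{\mathbf{k}}^*\, \delta_{\mathbf{k}'}\rangle$. Note that due to the mathematics of complex numbers, correlations of Fourier coefficients are defined with the complex conjugate $^*$. This way the correlation of $\delta_{\mathbf{k}}$ with itself, $\langle\delta_{\mathbf{k}}^*\, \delta_{\mathbf{k}}\rangle = \langle|\delta_{\mathbf{k}}|^2\rangle$, is a real (and nonnegative) quantity, the expectation value of the absolute value (modulus) of $\delta_{\mathbf{k}}$ squared, i.e., the variance of $\delta_{\mathbf{k}}$. Calculating
$$\begin{aligned}
\langle\delta_{\mathbf{k}}^*\, \delta_{\mathbf{k}'}\rangle &= \frac{1}{V^2}\int d^dx\, e^{i\mathbf{k}\cdot\mathbf{x}} \int d^dx'\, e^{-i\mathbf{k}'\cdot\mathbf{x}'} \langle\delta(\mathbf{x})\,\delta(\mathbf{x}')\rangle \\
&= \frac{1}{V^2}\int d^dx\, e^{i\mathbf{k}\cdot\mathbf{x}} \int d^dr\, e^{-i\mathbf{k}'\cdot(\mathbf{x}+\mathbf{r})} \langle\delta(\mathbf{x})\,\delta(\mathbf{x} + \mathbf{r})\rangle \\
&= \frac{1}{V^2}\int d^dr\, e^{-i\mathbf{k}'\cdot\mathbf{r}}\, \xi(\mathbf{r}) \int d^dx\, e^{i(\mathbf{k}-\mathbf{k}')\cdot\mathbf{x}} \\
&= \frac{1}{V}\,\delta_{\mathbf{k}\mathbf{k}'} \int d^dr\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, \xi(\mathbf{r}) \equiv \frac{1}{V}\,\delta_{\mathbf{k}\mathbf{k}'}\, P(\mathbf{k}) \,,
\end{aligned}$$ (1.55)
where we used $\langle\delta(\mathbf{x})\,\delta(\mathbf{x} + \mathbf{r})\rangle = \xi(\mathbf{r})$, which results from statistical homogeneity. Note that here $\delta_{\mathbf{k}\mathbf{k}'}$ is the Kronecker delta, not a density perturbation! Thus, from statistical homogeneity it follows that the Fourier coefficients $\delta_{\mathbf{k}}$ are uncorrelated. The quantity
$$P(\mathbf{k}) \equiv V \langle|\delta_{\mathbf{k}}|^2\rangle = \int d^dr\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, \xi(\mathbf{r}) \,,$$ (1.56)
which gives the variance of $\delta_{\mathbf{k}}$, is called the power spectrum of $\delta(\mathbf{x})$. Since the correlation function $\to 0$ for large separations, we can replace the integration volume $V$ in (1.56) with an infinite volume. Thus the power spectrum and correlation function form a $d$-dimensional Fourier transform pair, so that
$$\xi(\mathbf{r}) = \frac{1}{(2\pi)^d}\int d^dk\, e^{i\mathbf{k}\cdot\mathbf{r}}\, P(\mathbf{k}) \,.$$ (1.57)
Unlike the correlation function, the power spectrum $P(\mathbf{k})$ is positive everywhere. The correlation function is a dimensionless quantity, whereas the power spectrum $P(\mathbf{k})$ has the dimension of volume ($\delta(\mathbf{x})$ and $\delta_{\mathbf{k}}$ are dimensionless).

We noted earlier that the magnitude of Fourier coefficients depends on the volume $V$. From (1.56) we see that the typical magnitude of $\delta_{\mathbf{k}}$ goes down with volume as $\propto 1/\sqrt{V}$.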
Although the density of $\mathbf{k}$-modes increases $\propto V$, neighboring $\delta_{\mathbf{k}}$ are uncorrelated, so they add up incoherently, so that, e.g., 4 times as many $\mathbf{k}$ modes bring only a factor of 2 increase in $\sum_{\mathbf{k}} \delta_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}}$, to be compensated by the $\delta_{\mathbf{k}}$ being a factor of 2 smaller.

From statistical isotropy
$$\xi(\mathbf{r}) = \xi(r) \quad\Rightarrow\quad P(\mathbf{k}) = P(k)$$ (1.58)
(the Fourier transform of a spherically symmetric function is also spherically symmetric), so that the variance of $\delta_{\mathbf{k}}$ depends only on the magnitude $k$ of the wave vector $\mathbf{k}$, i.e., on the corresponding distance scale. Since small distance scales correspond to large $k$ and vice versa, to avoid confusion it is better to use the words high and low instead of "large" and "small" for $k$, i.e., small scales correspond to high $k$, and large scales to low $k$.

Using the recipe (1.47) for going from Fourier coefficients to Fourier transform, (1.55) gives
$$\langle\delta(\mathbf{k})^*\, \delta(\mathbf{k}')\rangle \equiv (2\pi)^d\, \delta_D^d(\mathbf{k} - \mathbf{k}')\, P(k) \,.$$ (1.59)
Notice that with $\delta_{\mathbf{k}}$ we can write $P(k) \equiv V \langle\delta_{\mathbf{k}}^*\, \delta_{\mathbf{k}}\rangle$ (without having to use $\delta_{\mathbf{k}\mathbf{k}'}$ in the equation), but with $\delta(\mathbf{k})$ we need to use the $\delta_D$-function in the definition of $P(k)$.

The correlation function is more closely connected to observations, whereas theoretical predictions come more naturally in terms of $P(k)$, especially at large distance scales, where the density perturbations are small and closer to their primordial state. In principle, when we have determined one of $\xi(r)$ and $P(k)$ from observations, we get the other by Fourier transform. In practice, observational errors make this inaccurate, and it is better to determine each one separately with a method optimized for it. Especially for large separations, where $\xi(r)$ is small, it is difficult to determine it accurately, if at all. For these reasons, density perturbations at large distance scales (low $k$) are more commonly discussed in terms of $P(k)$ and for small distance scales (small $r$) in terms of $\xi(r)$.
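The Fourier-pair relation between $\xi$ and $P$ has an exact single-realization analogue on a periodic grid, which can be sketched in a few lines. The example below is an illustration only, assuming $d = 1$, a toy field of uniform random values, and hypothetical values for the grid size `N` and box length `L`. It replaces the ensemble average by a volume average, so `P_hat` is an estimate from one realization and not the ensemble $P(k)$ of (1.56). Following the recipe (1.47), the integral $(2\pi)^{-d}\int d^dk$ in (1.57) becomes the sum $V^{-1}\sum_{\mathbf{k}}$.

```python
import cmath
import math
import random


def dft_coefficients(delta):
    """delta_k = (1/N) sum_p delta_p exp(-2 pi i n p / N): grid form of (1.50)."""
    N = len(delta)
    return [sum(delta[p] * cmath.exp(-2j * math.pi * n * p / N) for p in range(N)) / N
            for n in range(N)]


def xi_direct(delta):
    """Volume average of delta(x) delta(x + r) with periodic wrapping."""
    N = len(delta)
    return [sum(delta[p] * delta[(p + m) % N] for p in range(N)) / N
            for m in range(N)]


def xi_from_spectrum(P_hat, L):
    """xi(r_m) = (1/V) sum_k P(k) exp(i k r_m): series form of (1.57)."""
    N = len(P_hat)
    return [sum(P_hat[n] * cmath.exp(2j * math.pi * n * m / N) for n in range(N)).real / L
            for m in range(N)]


random.seed(2)
N, L = 32, 200.0      # hypothetical grid size and box length (V = L for d = 1)
delta = [random.uniform(-0.5, 0.5) for _ in range(N)]   # toy perturbation field

delta_k = dft_coefficients(delta)
# Single-realization estimate of P(k) = V <|delta_k|^2>: the ensemble
# average is dropped, so this is noisy but nonnegative by construction.
P_hat = [L * abs(dk) ** 2 for dk in delta_k]

xi_a = xi_direct(delta)
xi_b = xi_from_spectrum(P_hat, L)
max_diff = max(abs(a - b) for a, b in zip(xi_a, xi_b))
variance = sum(d * d for d in delta) / N   # volume-averaged delta^2

print(max_diff, xi_a[0], variance)
```

The two routes to the correlation function should agree to rounding error, and the zero-separation value should equal the volume-averaged $\delta^2$, the grid counterpart of $\xi(0) = \langle\delta^2\rangle$. Averaging `P_hat` over many independent realizations would be one way to approach the ensemble power spectrum.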
For the density variance we get
$$\langle\delta^2\rangle \equiv \xi(0) = \frac{1}{(2\pi)^d}\int d^dk\, P(k) = \frac{C_d}{(2\pi)^d}\int_0^\infty P(k)\, k^{d-1}\, dk = \frac{C_d}{(2\pi)^d}\int_0^\infty k^d P(k)\, \frac{dk}{k} \equiv \int_{-\infty}^{\infty} \mathcal{P}(k)\, d\ln k \,,$$ (1.60)
where we have defined
$$\mathcal{P}(k) \equiv \frac{C_d\, k^d}{(2\pi)^d}\, P(k) = \frac{k}{\pi}\, P(k) \,, \quad \frac{k^2}{2\pi}\, P(k) \,, \quad \frac{k^3}{2\pi^2}\, P(k) \quad\text{for}\quad d = 1, 2, 3 \,.$$ (1.61)
Another common notation for $\mathcal{P}(k)$ is $\Delta^2(k)$.$^{12}$ The term "power spectrum" is used to refer to both $P(k)$ and $\mathcal{P}(k)$. Of these two, $\mathcal{P}(k)$ has the more obvious physical meaning: it gives the contribution of a logarithmic interval of scales, i.e., from $k$ to $ek$, to the density variance. $\mathcal{P}(k)$ is dimensionless, whereas $P(k)$ has the dimension of ($d$-dimensional) volume. See Fig. 3 for the observed power spectrum from the Sloan Digital Sky Survey.

$^{12}$ The notation $\mathcal{P}(k)$ and calling it "power spectrum" is common among cosmologists. Astronomers seem to use the notation $\Delta^2(k)$ for it, and reserve the term "power spectrum" for $P(k)$.

Figure 3: The matter power spectrum from the SDSS obtained using luminous red galaxies [13]. The top figure shows $\mathcal{P}(k) = k^3 P(k)/2\pi^2$ and the bottom figure $P(k)$ [Mpc$^3$], both as a function of $k$ [Mpc$^{-1}$]. A Hubble constant value $H_0 = 71.4$ km/s/Mpc has been assumed for this figure. (These galaxy surveys only obtain the scales up to the Hubble constant, and therefore the observed $P(k)$ is usually shown in units of $h\,$Mpc$^{-1}$ for $k$ and $h^{-3}\,$Mpc$^3$ for $P(k)$, so that no value for $H_0$ needs to be assumed.) The black bars are the observations and the red curve is a theoretical fit, from linear perturbation theory, to the data. The bend in $P(k)$ at $k_{\rm eq} \sim 0.01$ Mpc$^{-1}$ is clearly visible in the bottom figure. Linear perturbation theory fails when $\mathcal{P}(k) \gtrsim 1$, and therefore the data points do not follow the theoretical curve to the right of the dashed line (representing an estimate of how far linear theory can be trusted). Figure by R. Keskitalo.

The pair of (1.60) is
$$P(0) = \lim_{k \to 0} P(k) = \int d^dr\, \xi(r) \,.$$ (1.62)
If$^{13}$ $P(k) \to 0$ as $k \to 0$ we get the integral constraint
$$\int d^dr\, \xi(r) = 0 \,.$$ (1.63)
Therefore $\xi(r)$ must become negative for some $r$, so that at such a separation from an overdense region we are more likely to find an underdense region. (Going to ever larger separations, $\xi$ as a function of $r$ may oscillate around zero, the oscillation becoming ever smaller in amplitude. Most of the interest in $\xi(r)$ is for the smaller $r$ within the initial positive region.)

$^{13}$ From (1.56), $P(0) = V \langle(\delta_0)^2\rangle$, where $\delta_0 = \bar\delta = (\bar\rho - \langle\rho\rangle)/\langle\rho\rangle$. While $\langle\bar\delta\rangle = 0$, $\langle(\bar\delta)^2\rangle \neq 0$ for a finite $V$. We'll come back to this later.

For isotropic $\xi(r)$ and $P(k)$ we can switch to polar, spherical, etc. coordinates and do the angular integrals to rewrite (1.56) and (1.57) as 1-dimensional integrals (exercise):
$$\begin{aligned}
P(k) &= \int_0^\infty \xi(r) \cos kr\; 2\,dr &&\text{or}\quad \mathcal{P}(k) = \frac{2k}{\pi}\int_0^\infty \xi(r) \cos kr\, dr &&\text{(1D)} \\
P(k) &= \int_0^\infty \xi(r)\, J_0(kr)\, 2\pi r\, dr &&\text{or}\quad \mathcal{P}(k) = k^2 \int_0^\infty \xi(r)\, J_0(kr)\, r\, dr &&\text{(2D)} \\
P(k) &= \int_0^\infty \xi(r)\, \frac{\sin kr}{kr}\, 4\pi r^2\, dr &&\text{or}\quad \mathcal{P}(k) = \frac{2k^3}{\pi}\int_0^\infty \xi(r)\, \frac{\sin kr}{kr}\, r^2\, dr &&\text{(3D)}
\end{aligned}$$ (1.64)
and
$$\begin{aligned}
\xi(r) &= \frac{1}{\pi}\int_0^\infty P(k) \cos kr\, dk = \int_0^\infty \mathcal{P}(k) \cos kr\, \frac{dk}{k} &&\text{(1D)} \\
\xi(r) &= \frac{1}{2\pi}\int_0^\infty P(k)\, J_0(kr)\, k\, dk = \int_0^\infty \mathcal{P}(k)\, J_0(kr)\, \frac{dk}{k} &&\text{(2D)} \\
\xi(r) &= \frac{1}{(2\pi)^3}\int_0^\infty P(k)\, \frac{\sin kr}{kr}\, 4\pi k^2\, dk = \int_0^\infty \mathcal{P}(k)\, \frac{\sin kr}{kr}\, \frac{dk}{k} &&\text{(3D)} \,,
\end{aligned}$$ (1.65)
where $J_0$ is a Bessel function and $\sin kr/kr = j_0(kr)$ is a spherical Bessel function.

In 3D, for the volume-averaged $\bar\xi(R)$ defined in (1.15) we get (exercise):
$$\bar\xi(R) = \int_0^\infty \mathcal{P}(k) \left[ \frac{3(\sin kR - kR \cos kR)}{(kR)^3} \right] \frac{dk}{k} = \int_0^\infty \mathcal{P}(k) \left[ \frac{3}{kR}\, j_1(kR) \right] \frac{dk}{k} \,.$$ (1.66)
"The factor in brackets dies off faster with increasing $k$ than the $(\sin kr/kr)$ in (1.65), so $\bar\xi(R)$ gives a cleaner measure of the power spectrum at $k \sim 1/R$ than does $\xi(R)$." (MBW [2], p. 263.) (This is a comparison between $j_0(x)$ and $(3/x)\,j_1(x)$, where the $j_n(x)$ are spherical Bessel functions.
These are elementary functions, so we do not discuss them more at this stage, but we will meet them later.)

Example: We do the 2D case of (1.64), since it involves the non-elementary function $J_0$:

$$P(k) = \int d^2r\,e^{-i\mathbf k\cdot\mathbf r}\xi(r) = \int_0^\infty r\,dr\,\xi(r)\int_0^{2\pi}d\varphi\,e^{-ikr\cos\varphi}\,,$$ (1.67)

where the angular integral gives

$$\int_0^{2\pi}d\varphi\,e^{-ikr\cos\varphi} = 2\int_0^{\pi}d\varphi\,\cos(kr\cos\varphi) = 2\pi J_0(kr)\,,$$ (1.68)

where we used the integral representation of the Bessel function ([8], p. 680)

$$J_0(x) = \frac{2}{\pi}\int_0^{\pi/2}\cos(x\sin\varphi)\,d\varphi = \frac{2}{\pi}\int_0^{\pi/2}\cos(x\cos\varphi)\,d\varphi\,.$$ (1.69)

These two integrals are equal since $\cos\varphi$ and $\sin\varphi$ go over the same values at the same rate. Since (the outer) cos is an even function, the integrals give the same result over each quadrant, i.e., we could as well integrate from $\pi/2$ to $\pi$.

Exercise: Continuation of an earlier exercise: Fourier expand the "observed" $\hat\delta(\mathbf x)$ of Eq. (1.17) to get its Fourier coefficients $\hat\delta_{\mathbf k}$. Show that

$$\hat\xi(\mathbf r) = \frac{V}{(2\pi)^d}\int d^dk\,|\hat\delta_{\mathbf k}|^2e^{i\mathbf k\cdot\mathbf r}\,,$$ (1.70)

for this single realization. Note that here we do not need any statistical assumptions (like statistical homogeneity or ergodicity). Contrast this result with (1.57).

1.6 Bessel functions

The 2D Fourier transform brings in Bessel functions $J_n(x)$, with $n = 0, 1, 2, \ldots$ They are mostly used on the positive real axis, where they are oscillating functions, whose amplitude decreases with increasing $x$, asymptotically as $x^{-1/2}$. See Fig. 4. We list some of their properties.

$$J_0(0) = 1 \quad\text{and}\quad J_n(0) = 0 \quad\text{for } n = 1, 2, \ldots$$ (1.71)

Their power series begins

$$J_n(x) = \frac{x^n}{2^nn!} - \frac{x^{n+2}}{2^{n+2}(n+1)!} + \ldots$$ (1.72)

They have the integral representations

$$J_n(x) = \frac{1}{\pi}\int_0^\pi\cos(n\varphi - x\sin\varphi)\,d\varphi = \frac{(-i)^n}{\pi}\int_0^\pi e^{ix\cos\varphi}\cos n\varphi\,d\varphi\,.$$ (1.73)

Figure 4: The first three Bessel functions: $J_0$ (blue), $J_1$ (green), and $J_2$ (red).

Both integrands are even in $\varphi$ and periodic over $2\pi$, so one can replace

$$\frac{1}{\pi}\int_0^\pi = \frac{1}{2\pi}\int_{-\pi}^{\pi} = \frac{1}{2\pi}\int_a^{a+2\pi}\,.$$
(1.74)

A number of recursion formulae relate them and their derivatives to each other:

$$J_{n-1}(x) + J_{n+1}(x) = \frac{2n}{x}J_n(x)$$
$$J_{n-1}(x) - J_{n+1}(x) = 2J_n'(x)$$
$$J_{n-1}(x) = \frac{n}{x}J_n(x) + J_n'(x)$$
$$J_{n+1}(x) = \frac{n}{x}J_n(x) - J_n'(x)\,.$$ (1.75)

As a special case of (1.75b),

$$J_0'(x) = -J_1(x)\,.$$ (1.76)

The Bessel function closure relation applies to Bessel functions with the same $n$ but different wavelengths:

$$\int_0^\infty J_n(\alpha x)J_n(\alpha'x)\,x\,dx = \frac{1}{\alpha}\,\delta_D(\alpha-\alpha')\,.$$ (1.77)

A somewhat similar (note $dx$ instead of $x\,dx$) integral with neighboring ($n$ and $n-1$) Bessel functions gives (Gradshteyn & Ryzhik [9] 6.512.3)

$$\int_0^\infty J_n(\alpha x)J_{n-1}(\beta x)\,dx = \frac{\beta^{n-1}}{\alpha^n}\,\Theta(\alpha-\beta) = \begin{cases} 0 & (\alpha < \beta) \\ \dfrac{1}{2\alpha} & (\alpha = \beta) \\ \dfrac{\beta^{n-1}}{\alpha^n} & (\alpha > \beta)\,. \end{cases}$$ (1.78)

1.7 Spherical Bessel functions

The spherical Bessel functions $j_n(x)$ (of integer order) are related to ordinary Bessel functions of half-integer order:

$$j_n(x) = \sqrt{\frac{\pi}{2x}}\,J_{n+1/2}(x)\,.$$ (1.79)

Like $J_n$, they are mostly used on the positive real axis, where they are oscillating functions; their amplitude decreases faster, asymptotically as $x^{-1}$. See Fig. 5. Unlike $J_n$, they are elementary functions (for integer $n$), see Table 1.

Figure 5: The first three spherical Bessel functions: $j_0$ (blue), $j_1$ (green), and $j_2$ (red).

Table 1: Spherical Bessel functions.

$$j_0(x) = \frac{\sin x}{x}$$
$$j_1(x) = \frac{\sin x}{x^2} - \frac{\cos x}{x}$$
$$j_2(x) = \left(\frac{3}{x^3} - \frac{1}{x}\right)\sin x - \frac{3}{x^2}\cos x$$

We list some of their properties.

$$j_0(0) = 1 \quad\text{and}\quad j_n(0) = 0 \quad\text{for } n = 1, 2, \ldots$$ (1.80)

Their power series begins

$$j_n(x) = \frac{2^nn!\,x^n}{(2n+1)!} - \frac{2^n(n+1)!\,x^{n+2}}{(2n+3)!} + \ldots$$ (1.81)

They have recursion formulae relating them and their derivatives to each other:

$$j_{n-1}(x) + j_{n+1}(x) = \frac{2n+1}{x}\,j_n(x)$$
$$n\,j_{n-1}(x) - (n+1)\,j_{n+1}(x) = (2n+1)\,j_n'(x)\,.$$ (1.82)

As a special case of (1.82b),

$$j_0'(x) = -j_1(x)\,.$$ (1.83)

1.8 Power-law spectra

For certain ranges of scales, $\xi(r)$ and $P(k)$ can be approximated by a power-law form,

$$\xi(r) \propto r^{-\gamma} \quad\text{or}\quad P(k) \propto k^n\,.$$
(1.84)

Note the minus sign for $\xi$: we expect correlations to decrease with increasing separation, so this makes $\gamma$ positive. When plotted on a log-log scale, such functions appear as straight lines with slope $-\gamma$ and $n$:

$$\log\xi = -\gamma\log r + \text{const} \quad\text{and}\quad \log P = n\log k + \text{const}\,.$$ (1.85)

Figure 6: Top panel: The correlation function from the 2dFGRS galaxy survey in log-log scale. The dashed line is the best-fit power law ($r_0 = 5.05\,h^{-1}$Mpc, $\gamma = 1.67$). The inset shows the same in linear scale. Bottom panel: 2dFGRS data (solid circles with error bars) divided by the power-law fit. The solid line is the result from the APM survey and the dashed line from an N-body simulation. This is Fig. 11 from [10].

The proportionality constant can be given in terms of a reference scale. For $\xi(r)$ we usually choose the scale $r_0$ where $\xi(r_0) = 1$, so that

$$\xi(r) = \left(\frac{r}{r_0}\right)^{-\gamma}\,.$$ (1.86)

See Fig. 6. For $P(k)$ we may write

$$P(k) = A^2\left(\frac{k}{k_p}\right)^n \quad\text{or}\quad \mathcal{P}(k) = A^2\left(\frac{k}{k_p}\right)^{n+d}\,,$$ (1.87)

where $k_p$ is called a pivot scale (whose choice depends on the application) and $A \equiv \sqrt{P(k_p)}$ or $\sqrt{\mathcal{P}(k_p)}$ is the amplitude of the power spectrum at the pivot scale.

We define the spectral index $n(k)$ as

$$n(k) \equiv \frac{d\ln P}{d\ln k}\,.$$ (1.88)

It gives the slope of $P(k)$ on a log-log plot. For a power-law $P(k)$, $n(k) = \text{const} = n$.

We can study power-law $\xi(r)$ and $P(k)$ as a playground to get a feeling for what different values of the spectral index mean, and, e.g., how $\gamma$ and $n$ are related. The Fourier transform of a power law is a power law. For the correlation function of (1.86) we get (exercise)

$$\text{1D:}\quad P(k) = \frac{2}{k}\,\Gamma(1-\gamma)\sin(\tfrac12\gamma\pi)\,(kr_0)^\gamma \qquad (0 < \gamma < 1 \;\Rightarrow\; -1 < n < 0)$$
$$\text{2D:}\quad P(k) = \frac{2\pi}{k^2}\,2^{1-\gamma}\,\frac{\Gamma(1-\tfrac12\gamma)}{\Gamma(\tfrac12\gamma)}\,(kr_0)^\gamma \qquad (\tfrac12 < \gamma < 2 \;\Rightarrow\; -\tfrac32 < n < 0)$$
$$\text{3D:}\quad P(k) = \frac{4\pi}{k^3}\,\Gamma(2-\gamma)\sin\frac{(2-\gamma)\pi}{2}\,(kr_0)^\gamma \qquad (1 < \gamma < 3 \;\Rightarrow\; -2 < n < 0)\,,$$ (1.89)

so that $\gamma$ and $n$ are related by

$$n = \gamma - d\,.$$
(1.90)

For $\mathcal{P}(k)$ these read

$$\text{1D:}\quad \mathcal{P}(k) = \frac{2}{\pi}\,\Gamma(1-\gamma)\sin(\tfrac12\gamma\pi)\,(kr_0)^\gamma$$
$$\text{2D:}\quad \mathcal{P}(k) = 2^{1-\gamma}\,\frac{\Gamma(1-\tfrac12\gamma)}{\Gamma(\tfrac12\gamma)}\,(kr_0)^\gamma$$
$$\text{3D:}\quad \mathcal{P}(k) = \frac{2}{\pi}\,\Gamma(2-\gamma)\sin\frac{(2-\gamma)\pi}{2}\,(kr_0)^\gamma\,.$$ (1.91)

In the 3D case these expressions are undefined for $\gamma = 2$, and we have the simpler result

$$P(k) = \frac{2\pi^2}{k^3}(kr_0)^2 \quad\text{and}\quad \mathcal{P}(k) = (kr_0)^2\,.$$ (1.92)

Observationally, the 3D correlation function has $\gamma \approx 1.8$ for small separations, corresponding to $n \approx -1.2$ for high $k$. See Fig. 3 for the observed power spectrum from the Sloan Digital Sky Survey.

Note that a power-law correlation function is everywhere positive. This is possible when $\lim_{k\to0}P(k) \neq 0$, which is indeed the case for the allowed spectral indices, $n < 0$, above. (In this case, there is sufficient structure at ever larger scales to maintain positive correlation at ever larger distances.) In reality, the power spectrum bends at large scales so that its spectral index becomes positive for low $k$, and therefore the correlation function also changes shape and will have negative values at large enough $r$.

Outside these values of spectral indices the Fourier transform integrals diverge in the small- or large-scale limit (I guess); but this does not prevent $\xi(r)$ or $P(k)$ from also having such power-law forms over some limited range of scales.

The variance

$$\langle\delta^2\rangle = \xi(0) = \int_0^\infty\mathcal{P}(k)\,\frac{dk}{k} \propto \int_0^\infty k^{n+d-1}dk = \frac{1}{n+d}\Big[k^{n+d}\Big]_0^\infty \qquad\text{for } n \neq -d$$ (1.93)

diverges at large scales (low $k$) for $n \le -d$ and at small scales (high $k$) for $n \ge -d$. Thus we should have $n > -d$ for $k \to 0$ and $n < -d$ for $k \to \infty$. The large scales are not an issue (since indeed $n > -d$) in cosmology. At small scales, (1.86) forces $\xi(r) \to \infty$ as $r \to 0$ for any positive $\gamma$. The solution of this issue is that arbitrarily small scales are not relevant; in practice we have a finite resolution that cuts off the smallest scales. This can be implemented with window functions, discussed in Sec. 1.9.

The case $n = -d$ is a scale-invariant spectrum, $\mathcal{P}(k) = \text{const}$.
Such a spectrum would mean that the universe would appear equally inhomogeneous at arbitrarily large scales, with no asymptotic homogeneity. Note that here we discuss the spectrum of density. References to a scale-invariant or nearly scale-invariant spectrum in cosmology usually refer to the spectrum of gravitational potential (Newtonian treatment) or spacetime curvature perturbations (GR treatment). Their spectral index is lower by 4 so that such a scale-invariant spectrum will have a density spectral index $n = 1$ (in 3D).

The boundary case $n = 0$ has the same $\langle|\delta_{\mathbf k}|^2\rangle$ at all scales. From (1.90) $\gamma \to d$ as $n \to 0$. For $\gamma < d$, the integration from 0 to $R$ in (1.14) for $\bar\xi(R)$ smooths over the small-scale divergence of $\xi(r)$; but for $\gamma \ge d$ the integral (1.14) diverges. Actually, also (1.89c) and (1.91c) diverge ($\Gamma(-1) = \infty$) so that a finite $\xi$ would give infinite $P(k)$. Instead, $n = 0$ corresponds to the case of no correlations, $\xi = 0$. This is called white noise or a Poisson distribution (to be discussed in Sec. 2.3).

The larger $n$ is, the more the structure is concentrated at small scales. Peacock comments that $n \ge 0$ spectra would seem to indicate that any large-scale structure is 'accidental', "reflecting the low-k Fourier coefficients of some small-scale process", whereas $n < 0$ means that large-scale structure is 'real' ([1], p. 499).

Example: To do the 1D and 3D cases in (1.89) is standard FYMM I stuff, but for the 2D case I had to resort to integral tables:

$$P(k) = \int_0^\infty\xi(r)J_0(kr)\,2\pi r\,dr = 2\pi r_0^\gamma\int_0^\infty r^{1-\gamma}J_0(kr)\,dr = \frac{2\pi(kr_0)^\gamma}{k^2}\int_0^\infty x^{1-\gamma}J_0(x)\,dx\,.$$ (1.94)

From Gradshteyn & Ryzhik ([9], formula 6.561.14) we find that

$$\int_0^\infty x^\mu J_\nu(x)\,dx = 2^\mu\,\frac{\Gamma(\tfrac12+\tfrac12\nu+\tfrac12\mu)}{\Gamma(\tfrac12+\tfrac12\nu-\tfrac12\mu)} \qquad\text{for } -\nu-1 < \mu < \tfrac12\,.$$ (1.95)

We have $\nu = 0$ and $\mu = 1-\gamma$, so the condition becomes $\tfrac12 < \gamma < 2$ and the result

$$\int_0^\infty x^{1-\gamma}J_0(x)\,dx = 2^{1-\gamma}\,\frac{\Gamma(1-\tfrac12\gamma)}{\Gamma(\tfrac12\gamma)}\,.$$
(1.96)

1.9 Scales of interest and window functions

In (1.60) we integrated over all scales, from the infinitely large ($k = 0$ and $\ln k = -\infty$) to the infinitely small ($k = \infty$ and $\ln k = \infty$) to get the density variance. Perhaps this is not really what we want. The average matter density today is $3\times10^{-27}$ kg/m$^3$. The density of the Earth is $5.5\times10^3$ kg/m$^3$ and that of an atomic nucleus $2\times10^{17}$ kg/m$^3$, corresponding to $\delta \approx 2\times10^{30}$ and $\delta \approx 10^{44}$. Probing the density of the universe at such small scales finds a huge variance in it, but this is no longer the topic of cosmology; we are not interested here in planetary science or nuclear physics.

Even the study of the structure of individual galaxies is not considered to belong to cosmology, so the smallest (comoving) scale of cosmological interest, at least when we discuss the present universe, is that of a typical separation between neighboring galaxies, of the order 1 Mpc.

To exclude scales smaller than $R$ ($r < R$ or $k > R^{-1}$) we can filter the density field with a window function (sometimes called a filter function). This can be done in k-space or x-space.

The filtering in x-space is done by convolution. We introduce a (usually spherically symmetric) window function $W(\mathbf r;R)$ such that

$$\int d^dr\,W(\mathbf r;R) = 1$$ (1.97)

(normalization) and $W \sim 0$ for $|\mathbf r| \gg R$, and define the filtered density field

$$\delta(\mathbf x;R) \equiv (\delta * W)(\mathbf x) \equiv \int d^dx'\,\delta(\mathbf x')\,W(\mathbf x-\mathbf x')\,.$$ (1.98)

Here $\delta(\mathbf x;R)$ and $W(\mathbf x;R)$ are considered as functions of $\mathbf x$, and $R$ denotes the chosen resolution. To simplify notation, we write hereafter $W(\mathbf x;R)$ as $W(\mathbf x)$, leaving the scale $R$ implicit. We now also assume $W$ is spherically symmetric, so we can write just $W(r)$.

Denote the Fourier coefficients of $\delta(\mathbf x;R)$ by $\delta_{\mathbf k}(R)$. We use the Fourier series for $\delta(\mathbf x)$ and $\delta(\mathbf x;R)$, but since $W(r)$ vanishes for large $r$ we can use the Fourier transform $W(\mathbf k)$ for it. Thus we need a mixed form of the convolution theorem.
Let's do it explicitly:

$$\begin{aligned}
\delta_{\mathbf k}(R) &= \frac{1}{V}\int_V d^dx\,\delta(\mathbf x;R)\,e^{-i\mathbf k\cdot\mathbf x} = \frac{1}{V}\int_V d^dx\,d^dx'\,\delta(\mathbf x')\,W(\mathbf x-\mathbf x')\,e^{-i\mathbf k\cdot\mathbf x} \\
&= \frac{1}{V}\int_V d^dx'\,\delta(\mathbf x')\,e^{-i\mathbf k\cdot\mathbf x'}\int d^dr\,W(\mathbf r)\,e^{-i\mathbf k\cdot\mathbf r} = W(\mathbf k)\,\delta_{\mathbf k}\,,
\end{aligned}$$ (1.99)

where

$$W(\mathbf k) = \int d^dr\,W(\mathbf r)\,e^{-i\mathbf k\cdot\mathbf r}$$ (1.100)

is the Fourier transform of $W(\mathbf r)$.^14 With our normalization, $W(\mathbf r)$ has dimension $1/V$ and $W(\mathbf k)$ is dimensionless with $W(\mathbf k = 0) = 1$. Since $W(\mathbf r) = W(r)$ is spherically symmetric, so is $W(\mathbf k) = W(k)$. Since $W(-\mathbf r) = W(\mathbf r)$, $W(\mathbf k)$ is real.

^14 When $\mathbf x$ is closer than $R$ to the edge of the volume $V$, the window function collects a contribution outside $V$. In this convolution theorem we used periodic boundary conditions. In real applications one needs to consider edge effects.

For the correlations of these filtered Fourier coefficients we get

$$\langle\delta_{\mathbf k}^*(R)\,\delta_{\mathbf k'}(R)\rangle = W(\mathbf k)^*W(\mathbf k')\langle\delta_{\mathbf k}^*\delta_{\mathbf k'}\rangle = \frac{1}{V}\,\delta_{\mathbf k\mathbf k'}\,W(k)^2P(k)$$ (1.101)

so the filtered power spectra are

$$W(k)^2P(k) \quad\text{and}\quad W(k)^2\mathcal{P}(k)\,.$$ (1.102)

The filtered correlation function is

$$\xi(\mathbf r;R) \equiv \langle\delta(\mathbf x;R)\,\delta(\mathbf x-\mathbf r;R)\rangle = \frac{1}{(2\pi)^d}\int d^dk\,e^{i\mathbf k\cdot\mathbf r}\,W(k)^2P(k)\,,$$ (1.103)

and the variance of the filtered density field is

$$\sigma^2(R) \equiv \langle\delta(\mathbf x;R)^2\rangle = \xi(0;R) = \int_0^\infty W(k)^2\mathcal{P}(k)\,\frac{dk}{k}\,.$$ (1.104)

Considered as a function of $R$, it provides another measure of structure at different scales. Writing $W(k)$ and $P(k)$ in terms of their Fourier transforms, we get (exercise)

$$\sigma^2(R) = \frac{1}{(2\pi)^d}\int d^dk\,W(k)^2P(k) = \int d^dx\,d^dx'\,\xi(|\mathbf x'-\mathbf x|)\,W(\mathbf x)\,W(\mathbf x')\,.$$ (1.105)

Spectral moments. More generally, we define the spectral moments

$$\sigma_\ell^2(R) \equiv \int_0^\infty k^{2\ell}\,\mathcal{P}(k)\,W(k)^2\,\frac{dk}{k}\,,$$ (1.106)

so that $\sigma^2(R) = \sigma_0^2(R)$ is the zeroth moment. The relation $\sigma^2(R) = \xi(0;R)$ can be generalized to higher moments and derivatives of $\xi(r;R)$ at $r = 0$, since, e.g.,

$$\nabla^2\xi(\mathbf r;R) = \frac{1}{(2\pi)^d}\int d^dk\,(-k^2)\,e^{i\mathbf k\cdot\mathbf r}\,W(k)^2P(k) \quad\Rightarrow\quad \nabla^2\xi(0;R) = \frac{1}{(2\pi)^d}\int d^dk\,(-k^2)\,W(k)^2P(k) = -\sigma_1^2(R)\,.$$ (1.107)

(The unfiltered $\xi(r)$ and its derivatives may diverge at $r = 0$, but $\xi(0;R)$ has been smoothed by the window function.) Peacock [1], p. 500 has

$$\xi^{(2\ell)}(0;R) = (-1)^\ell\,\frac{\sigma_\ell^2(R)}{2\ell+1}$$ (1.108)

which probably holds as such only for $d = 3$, since I get

$$\xi''(0;R) = -\frac{\sigma_1^2(R)}{d}$$ (1.109)

(I did not try to do the higher moments).

To get (1.109) from (1.107) we need to relate $\nabla^2\xi(r;R)$ to $\xi''(r;R)$ at $r = 0$. Expand

$$\xi(r;R) = \sum_{n=0}^\infty a_nr^n\,,$$ (1.110)

where $a_1 = 0$ so that $\xi(r;R)$ is smooth at $r = 0$. Now

$$\partial_ir^n = nr^{n-1}\partial_ir = nr^{n-2}x_i \quad\Rightarrow\quad \partial_i\xi(r;R) = \sum_{n=2}^\infty a_nn\,r^{n-2}x_i$$ (1.111)

and

$$\partial_j\partial_i\xi(r;R) = \sum_{n=2}^\infty a_nn\left[(n-2)\,r^{n-4}x_ix_j + r^{n-2}\partial_jx_i\right]$$ (1.112)

where $\partial_jx_i = \delta_{ij}$. Thus

$$\nabla^2\xi(r;R) \equiv \sum_i\partial_i\partial_i\xi(r;R) = \sum_{n=2}^\infty a_nn\left[(n-2)\,r^{n-2} + d\,r^{n-2}\right] \to 2d\,a_2$$ (1.113)

as $r \to 0$, whereas

$$\xi''(r;R) = \sum_{n=2}^\infty a_nn(n-1)\,r^{n-2} \to 2a_2\,,$$ (1.114)

so that

$$\nabla^2\xi(0;R) = d\cdot\xi''(0;R)\,.$$ (1.115)

The simplest window function is the top-hat window function

$$W_T(\mathbf r) \equiv \frac{1}{V(R)} \quad\text{for } |\mathbf r| \le R$$ (1.116)

and $W_T(\mathbf r) = 0$ elsewhere, i.e., $\delta(\mathbf x)$ is filtered by replacing it with its mean value within the distance $R$. Its Fourier transform^15 is (exercise):

$$\text{1D:}\quad W_T(k) = \frac{1}{kR}\sin kR$$
$$\text{2D:}\quad W_T(k) = \frac{2}{kR}\,J_1(kR)$$
$$\text{3D:}\quad W_T(k) = \frac{3}{(kR)^3}\,(\sin kR - kR\cos kR) = \frac{3}{kR}\,j_1(kR)$$ (1.117)

^15 Note the emerging pattern with the Bessel functions: trigonometric functions are "Bessel functions for 1D", cos and sin corresponding to $J_0$ and $J_1$; the ordinary Bessel functions $J_n$ are "for 2D"; and the spherical Bessel functions are "for 3D". All are oscillating functions; trigonometric functions have constant amplitude; $J_n$ decay as $x^{-1/2}$ for large $x$, and $j_n(x)$ decay as $x^{-1}$.

Mathematically more convenient is the Gaussian window function

$$W_G(\mathbf r) \equiv \frac{1}{V_G(R)}\,e^{-\frac12r^2/R^2}\,,$$ (1.118)

where

$$V_G(R) \equiv \int d^dr\,e^{-\frac12|\mathbf r|^2/R^2}$$ (1.119)

is the volume of $W_G$. The volume of a window function is defined as what $\int d^dr\,W(\mathbf r)$ would be if $W$ were normalized so that $W(0) = 1$, instead of the normalization we chose in (1.97). The volume of $W_G$ is (exercise)

$$V_G(R) = (2\pi)^{d/2}R^d\,,$$ (1.120)

and its Fourier transform is, for all $d$, (exercise)

$$W_G(k) = e^{-\frac12(kR)^2}\,.$$ (1.121)

(The 1D case was done in FYMM Ib. From that it's easy to generalize to arbitrary $d$.)
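As a numerical cross-check of the 3D entry of (1.117), the following minimal sketch (not part of the original notes; the function names and the choice of Simpson's rule are illustrative assumptions) evaluates the radial Fourier integral of the normalized top-hat, $W_T(k) = V(R)^{-1}\int_0^R (\sin kr/kr)\,4\pi r^2dr$, and compares it with the closed form.

```python
import math

def tophat_window_3d_exact(k, R):
    """Closed form of (1.117), 3D: W_T(k) = 3 (sin kR - kR cos kR) / (kR)^3."""
    y = k * R
    return 3.0 * (math.sin(y) - y * math.cos(y)) / y**3

def tophat_window_3d_numeric(k, R, n=2000):
    """Radial Fourier integral of the normalized 3D top-hat via Simpson's rule.

    W(k) = (1/V) * int_0^R [sin(kr)/(kr)] * 4 pi r^2 dr, with V = 4 pi R^3 / 3.
    n must be even.
    """
    def integrand(r):
        if r == 0.0:
            return 0.0  # the factor r^2 kills the integrand at the origin
        return math.sin(k * r) / (k * r) * 4.0 * math.pi * r * r

    h = R / n
    total = integrand(0.0) + integrand(R)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    volume = 4.0 * math.pi * R**3 / 3.0
    return total * h / 3.0 / volume

for k in (0.5, 2.0, 7.0):
    print(k, tophat_window_3d_exact(k, 1.0), tophat_window_3d_numeric(k, 1.0))
```

The two columns should agree to many digits, and both approach 1 as $k \to 0$, consistent with the normalization $W(k = 0) = 1$.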
We can also define the k-space top-hat window function

$$W_k(\mathbf k) \equiv 1 \quad\text{for } k \le 1/R$$ (1.122)

and $W_k(\mathbf k) = 0$ elsewhere. In x-space this becomes (exercise)

$$\text{3D:}\quad W_k(\mathbf r) = \frac{1}{2\pi^2R^3}\,\frac{\sin y - y\cos y}{y^3} = \frac{1}{6\pi^2R^3}\,\frac{3j_1(y)}{y}\,,\quad\text{where } y \equiv |\mathbf r|/R\,,$$ (1.123)

and

$$\text{3D:}\quad V_k(R) = 6\pi^2R^3\,.$$ (1.124)

For this window function the density variance is simply

$$\sigma^2(R) = \frac{1}{(2\pi)^d}\int_0^{R^{-1}}C_dk^{d-1}P(k)\,dk = \int_{-\infty}^{-\ln R}\mathcal{P}(k)\,d\ln k\,.$$ (1.125)

Note that the volumes of the different window functions are quite different. See Fig. 7. In 3D:

$$V_T(R) = \frac{4\pi}{3}R^3 = 4.189\,R^3\,,\qquad V_G(R) = (2\pi)^{3/2}R^3 = 15.75\,R^3\,,\qquad V_k(R) = 6\pi^2R^3 = 59.22\,R^3\,.$$ (1.126)

The values of $R$ that make the volumes equal are $R_G = 0.6431\,R_T$ and $R_k = 0.4136\,R_T$. Thus a given $R$ corresponds to a somewhat different effective scale for the different window functions.

The different window functions also give quite different $\sigma^2(R)$. Observationally, the 3D galaxy distribution has ([1], p. 501; [2], p. 83)

$$\sigma_T^2(R) \approx 1.0 \quad\text{for } R = 8\,h^{-1}\mathrm{Mpc}\,.$$ (1.127)

Near these scales the slope of the correlation function is

$$\gamma \approx 1.8 \quad\text{corresponding to}\quad n = -1.2\,.$$ (1.128)

This slope does not hold at larger scales, and at $R = 30\,h^{-1}$Mpc, $\sigma_T^2(R)$ is already down to $10^{-2}$ ($\sigma \sim 0.1$; [2], p. 83). See also Fig. 3.

One may also ask whether scales larger than the observed universe (i.e., the lower limit $k = 0$ or $\ln k = -\infty$ in the $k$ integrals) are relevant, since we cannot observe the inhomogeneity at such scales. Due to such very-large-scale inhomogeneities, the average density in the observed universe may deviate from the average density of the entire universe. Inhomogeneities at scales somewhat larger than the observed universe could appear as an anisotropy in the observed universe. The importance of such large scales depends on how strong the inhomogeneities at these scales are, i.e., how the power spectrum behaves as $k \to 0$.

Figure 7: The 3D window functions $W(r)$, top-hat (green), Gaussian (red), and k (blue), for $R = 1$.
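The numbers in (1.126) and the equal-volume radii quoted after it can be retraced with a few lines of arithmetic. This is a minimal sketch for illustration only; the function names are hypothetical and not from the notes.

```python
import math

def window_volumes_3d(R=1.0):
    """Volumes of the 3D top-hat, Gaussian and k-space top-hat windows, cf. (1.126)."""
    v_tophat = 4.0 * math.pi / 3.0 * R**3
    v_gauss = (2.0 * math.pi) ** 1.5 * R**3
    v_kspace = 6.0 * math.pi**2 * R**3
    return v_tophat, v_gauss, v_kspace

def equal_volume_radius_ratios():
    """Ratios R_G/R_T and R_k/R_T for which the window volumes equal V_T(R_T)."""
    v_t, v_g, v_k = window_volumes_3d(1.0)
    # V scales as R^3, so equal volumes require (R_X / R_T)^3 = V_T(1) / V_X(1).
    return (v_t / v_g) ** (1.0 / 3.0), (v_t / v_k) ** (1.0 / 3.0)

v_t, v_g, v_k = window_volumes_3d()
rg_over_rt, rk_over_rt = equal_volume_radius_ratios()
print(f"V_T = {v_t:.3f}, V_G = {v_g:.2f}, V_k = {v_k:.2f}")
print(f"R_G/R_T = {rg_over_rt:.4f}, R_k/R_T = {rk_over_rt:.4f}")
```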
[Figure 8. Horizontal axis: $\gamma = n + d$.]

Figure 8: The ratio of $\sigma^2(R)$ to $\mathcal{P}(R^{-1})$ in the case of a power-law spectrum $\mathcal{P}(k) \propto k^{n+d}$ for the different window functions: Gaussian (red), k (blue), 1D top-hat (green dashed), 2D top-hat (green), and 3D top-hat (green with dots). They all diverge in the limit $n \to -d$ ($\gamma \to 0$) due to the contributions of ever larger scales ($\ln k \to -\infty$). The divergence at $n \to 1$ for the top-hat window functions is a trickier thing. It has to do with their Fourier transform not dying off at high $k$ fast enough.

Exercise: We defined $\sigma^2(R)$ as an expectation value over the ensemble. Define $\hat\sigma^2(R)$ as the volume average over a realization and show that

$$\hat\sigma^2(R) \equiv \frac{1}{V}\int_V d^dx\,\hat\delta(\mathbf x,R)^2 = \frac{V}{(2\pi)^d}\int d^dk\,|\hat\delta_{\mathbf k}|^2\,W(k)^2\,.$$ (1.129)

Exercise: For a power-law spectrum and a Gaussian window function, show that

$$\sigma_G^2(R) = \frac12\,\Gamma\!\left(\frac{n+d}{2}\right)\mathcal{P}(R^{-1})\,.$$ (1.130)

Exercise: For a power-law spectrum and k-space top-hat window function, show that

$$\sigma_k^2(R) = \frac{1}{n+d}\,\mathcal{P}(R^{-1})\,.$$ (1.131)

Example: I wanted to do the same also for the top-hat window function, especially because of (1.127). The cases seem different for different $d$, so try first the 3D:

$$\sigma_T^2(R) = \int_0^\infty\mathcal{P}(k)\,W_T(k)^2\,\frac{dk}{k} = A^2k_p^{-(n+3)}\int_0^\infty k^{n+3}\left(\frac{3}{kR}\right)^2j_1^2(kR)\,\frac{dk}{k} = 9A^2(k_pR)^{-(n+3)}\int_0^\infty x^nj_1^2(x)\,dx = 9I_n\,\mathcal{P}(R^{-1})\,,$$ (1.132)

where

$$I_n = \int_0^\infty x^nj_1^2(x)\,dx\,.$$ (1.133)

I couldn't integrate $I_n$ and didn't find it in integral tables. Wolfram Alpha said "computation time exceeded" for both $I_n$ and $I_{-n}$, but for $n = -1.2$ it gave the remarkable result

$$I_{-1.2} = \frac{125\sqrt{5+\sqrt5}\;\Gamma(4/5)}{1386\times2^{3/10}} \approx 0.229418\,.$$ (1.134)

This would give $\sigma_T^2(R) = 2.0648\,\mathcal{P}(R^{-1})$ for $n = -1.2$ ($\gamma = 1.8$), which appears surprisingly large, since the other two window functions give $\sigma_G^2(R) = 0.5343\,\mathcal{P}(R^{-1})$ and $\sigma_k^2(R) = 0.5556\,\mathcal{P}(R^{-1})$. Actually, I was equally surprised that $\sigma_G^2$ and $\sigma_k^2$ came so close to each other although the volumes of the two window functions are quite different.
I would have been content if $\sigma_G^2$ had been intermediate between $\sigma_k^2$ and $\sigma_T^2$. I think the explanation lies in the Fourier transform $W_T(k)$ not dying off fast enough for high $k$, so for large $n + d$, where there is lots of power at small scales, scales $\ll R$ keep contributing to $\sigma_T^2(R)$. When this is applied to galaxy number density, there will be another cut-off due to the finite distances between galaxies, so the effect of this high-k tail may not be fully realized...

The solution for doing the 3D case with Wolfram Alpha turned out to be to restrict the range of $n$ to $n < 0$ and $n < -1$. This gives

$$I_n = \int_0^\infty x^nj_1^2(x)\,dx = 2^{-n}\sin\frac{n\pi}{2}\,\frac{(n+1)\,\Gamma(n-1)}{n-3} \qquad\text{for } -3 < n < 0\,.$$ (1.135)

(I just assume that this result holds also for $0 < n < 1$; it diverges at $n \to 1$, see Fig. 8.)

Then the 1D case:

$$\sigma_T^2(R) = A^2(k_pR)^{-(n+1)}\int_0^\infty x^{n-2}\sin^2x\,dx = I_n\,\mathcal{P}(R^{-1})\,,$$ (1.136)

where

$$I_n = \int_0^\infty x^{n-2}\sin^2x\,dx\,,$$ (1.137)

which diverges at $x \to 0$ for $n \le -1$ and at $x \to \infty$ for $n \ge +1$. Gradshteyn & Ryzhik [9] 3.821.9 gives $I_0 = \pi/2$. Wolfram Alpha gives

$$I_n = 2^{-n}\sin\frac{n\pi}{2}\,\frac{\Gamma(n)}{1-n} \quad (0 < n < 1) \qquad\text{and}\qquad I_n = 2^{-n}\sin\!\left(-\frac{n\pi}{2}\right)\frac{\Gamma(n+1)}{n(n-1)} \quad (-1 < n < 0)\,.$$ (1.138)

For $n + 1 = 1.8$ this gives $\sigma_T^2(R) = 3.1797\,\mathcal{P}(R^{-1})$, which is even more than I got for the 3D case.

Then the 2D case:

$$\sigma_T^2(R) = 4I_n\,\mathcal{P}(R^{-1})\,,\quad\text{where}\quad I_n = \int_0^\infty x^{n-1}J_1(x)^2\,dx\,.$$ (1.139)

Wolfram Alpha computation time was exceeded but Gradshteyn & Ryzhik [9] 6.574.2 gives

$$I_n = \frac{\Gamma(1-n)\,\Gamma(1+n/2)}{2^{1-n}\,\Gamma(1-n/2)^2\,\Gamma(2-n/2)} \qquad\text{for } n < 1\,.$$ (1.140)

These results for the different window functions are compared in Fig. 8.

2 Distribution of galaxies

Instead of a continuous density $\rho(\mathbf x)$, we now consider a distribution of discrete objects. Their number density $\rho(\mathbf x)$ is then only defined with a finite resolution.^16 To make the discussion sound less abstract we call these objects galaxies, although they could also be other cosmological objects (e.g., clusters of galaxies), and another application is in numerical methods where a continuous density field is represented by a distribution of point masses.

^16 MBW[2], Sec. 6.1.2 takes a heavier approach here. They consider a two-step random process, where the first random process generates a continuous density field $\rho(\mathbf x)$ and a second random process generates a point mass representation of it. The advantage of this is that there is no worry about $dV$ being smaller than the resolution of $\rho(\mathbf x)$. They also invoke ergodicity, but this does not seem necessary, if one refers to $\langle\rho\rangle$ instead of $\bar\rho$.

Figure 9: Evolution of the comoving total galaxy number density $\phi_T$ as a function of redshift and time [18]. The symbols with error bars are results from different surveys. The solid line on the right panel is a fit to the data points, and the dashed line is a fit of a galaxy merger model to them. The plot assumes $h = 0.7$, so to convert into units of $h^3\,\mathrm{Mpc}^{-3}$, multiply the vertical scale numbers (which are for $\phi_T$, not $\log\phi_T$) by $h^{-3} = 2.9$.

2.1 The average number density of galaxies

It is often said that the observable universe contains about 200 billion ($2\times10^{11}$) galaxies. If the observable universe is taken to mean everything until the last scattering surface (the origin of the cosmic microwave background) at $z = 1090$, which lies at comoving distance $r \approx 3.1\,H_0^{-1} = 9300\,h^{-1}$Mpc, then its comoving volume is

$$V = \frac{4\pi}{3}r^3 \approx 3400\,h^{-3}\,\mathrm{Gpc}^3 = 3.4\times10^{12}\,h^{-3}\,\mathrm{Mpc}^3\,.$$ (2.1)

However, in this context the observable universe is taken to mean up to $z = 8$, which lies at $r \approx 2.05\,H_0^{-1} = 6100\,h^{-1}$Mpc, giving

$$V_{\rm obs} \approx 970\,h^{-3}\,\mathrm{Gpc}^3 = 9.7\times10^{11}\,h^{-3}\,\mathrm{Mpc}^3\,.$$ (2.2)

With $N_g = 2\times10^{11}$ galaxies this gives the comoving mean galaxy number density

$$\bar\rho_g = \frac{N_g}{V_{\rm obs}} \approx 0.21\,h^3\,\mathrm{Mpc}^{-3}\,.$$ (2.3)

Our past light cone.
Recently I read^17 that the number of galaxies in the observable universe is 10 times larger, since in the early universe galaxies were much smaller, and later they merged to form larger galaxies so the comoving galaxy number density went down. Thus the above density applies to the late universe, but in the early universe it was more than 10 times larger. The page links to a draft of an article by Conselice et al. [18]. Fig. 9 is from that article.

In (2.2) and (2.3), I restricted the observable universe to $z \le 8$, and ignored evolution effects (higher redshifts correspond to earlier times) to get a homogeneous galaxy mean number density to correspond to some recent $t = \text{const}$, but more appropriate would be to define the observable universe to correspond to our past light cone, all the way to $z = 1090$. The galaxies it contains are then those whose world lines intersect this light cone. The comoving galaxy number density then increases towards higher $z$ at first, because of the above evolution effect, but then begins to fall (probably near $z \sim 8$, but we don't have good data at such high redshifts) when we get to times when most galaxies had not yet formed.

The baryon density is $\bar\rho_b = \Omega_b\rho_{\rm crit0} = \omega_bh^{-2}\rho_{\rm crit0}$, where $\omega_b = 0.022$ [19], giving

$$\bar\rho_b = 6.1\times10^9\,m_\odot/\mathrm{Mpc}^3\,.$$ (2.4)

With $h = 0.7$ this gives

$$\frac{\bar\rho_b}{\bar\rho_g} = 8.6\times10^{10}\,m_\odot = 1.7\times10^{41}\ \text{kg baryonic matter per galaxy}$$ (2.5)

in the late universe. The total matter density parameter is $\omega_m = 0.14$ [19], so this gives $5.5\times10^{11}\,m_\odot$ total matter (baryonic + cold dark matter) per galaxy.

From [2], p. 62 (Table 2.6) there are about 10 times as many dwarf galaxies in the local part of the universe as there are spiral galaxies; and the number of other types of galaxies is about half of that of spiral galaxies. The dwarf galaxies (defined as those with absolute magnitude $M_B \gtrsim -18$) thus make up most of the number of galaxies although they contain a relatively small fraction of all stars ([2], p. 57).
The further out we look the larger the absolute luminosity (the smaller the absolute magnitude) of the galaxy has to be for us to be able to observe it. Thus the number density of observable galaxies is smaller than (2.3) and falls with distance.

For the 2D galaxy number density on the sky I can give a more definite number: The Euclid wide survey will cover ($\Omega =$) 15 000 square degrees (= 36.36% of the sky) and is expected to observe 1.5 billion ($N_g = 1.5\times10^9$) galaxies (with apparent magnitude $m < 24.5$, observed sufficiently well for their observed shapes to be used for weak lensing statistics) [20]. This corresponds to a 2D density of

$$\bar\rho_{g,\rm Euclid} = \frac{N_g}{\Omega} \approx 30\ \mathrm{arcmin}^{-2}\,.$$ (2.6)

Most of these galaxies are at $z \lesssim 2$, corresponding to $r \lesssim 1.2\,H_0^{-1}$ or $V \approx 190\,h^{-3}\,\mathrm{Gpc}^3$, 1/5 of the volume to $z = 8$. Defining the "Euclid volume" as 15 000 square degrees of sky up to $z = 2$, we have

$$V_{\rm Euclid} \approx 70\,h^{-3}\,\mathrm{Gpc}^3\,,$$ (2.7)

which should contain about 14 billion galaxies, so Euclid will miss most of them.

Let's check if these numbers from the three sources [18, 2, 20] appear consistent with each other: Comparing the absolute magnitude of the brightest dwarf galaxies, $M = -18$, to the Euclid wide survey depth, $m = 24.5$, we conclude that even the brightest dwarf galaxies will be missed beyond distance modulus

$$m - M = 42.5 = -5 + 5\lg d_L[\mathrm{pc}] \quad\Rightarrow\quad d_L = 10^{9.5}\,\mathrm{pc} \approx 3.16\ \mathrm{Gpc}\,.$$

With $h = 0.7$ this is $d_L = 2.2\,h^{-1}\mathrm{Gpc} = 0.74\,H_0^{-1}$, corresponding to $z \approx 0.55$. (In a flat universe the luminosity distance and comoving distance are related by $d_L = (1+z)r$, so this corresponds to $r \approx 0.48\,H_0^{-1}$.) Thus the Euclid wide survey should see some dwarf galaxies at $z < 0.55$ and none at $z > 0.55$, beyond which it will miss also some of the larger galaxies. This seems consistent with Euclid observing 1.5 billion out of a total of 14 billion galaxies in $V_{\rm Euclid}$.

^17 https://www.nasa.gov/feature/goddard/2016/hubble-reveals-observable-universe-contains-10-times-more-galaxies-than-previously-thought
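The distance-modulus arithmetic in the consistency check above can be retraced as follows. This is only an illustrative sketch: the variable and function names are made up, the Hubble distance $c/H_0 = 2997.92\,h^{-1}$Mpc is the standard value for $H_0 = 100\,h$ km/s/Mpc, and the redshift $z \approx 0.55$ is simply taken from the text rather than computed from a cosmological model.

```python
# Hubble distance c/H0 in units of h^-1 Mpc, for H0 = 100 h km/s/Mpc.
C_OVER_H0_HMPC = 2997.92458

def luminosity_distance_pc(m, M):
    """Invert the distance modulus m - M = -5 + 5 lg(d_L / pc)."""
    return 10.0 ** ((m - M + 5.0) / 5.0)

h = 0.7                 # value assumed in the text
z = 0.55                # redshift quoted in the text (not derived here)

d_l_gpc = luminosity_distance_pc(24.5, -18.0) / 1e9       # 10**9.5 pc in Gpc
d_l_hubble = d_l_gpc * 1e3 * h / C_OVER_H0_HMPC           # in units of 1/H0
r_hubble = d_l_hubble / (1.0 + z)                         # flat universe: d_L = (1+z) r

print(f"d_L = {d_l_gpc:.2f} Gpc = {d_l_hubble:.2f} / H0, r = {r_hubble:.2f} / H0")
```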
2.2 Galaxy 2-point correlation function

We treat individual galaxies as mathematical points, so that each galaxy has a coordinate value $\mathbf x$. We define the galaxy 2-point correlation function $\xi(\mathbf r)$ as the excess probability of finding a galaxy at separation $\mathbf r$ from another galaxy:

$$dP \equiv \langle\rho\rangle\,[1+\xi(\mathbf r)]\,dV$$ (2.8)

where $\langle\rho\rangle$ is the mean (ensemble average) galaxy number density, $dV$ is a volume element that is a separation $\mathbf r$ away from a chosen reference galaxy, and $dP$ is the probability that there is a galaxy within $dV$.

The probability of finding a galaxy in volume $dV_1$ at a random location $\mathbf x$ is

$$dP_1 = \langle\rho(\mathbf x)\rangle\,dV_1 = \langle\rho\rangle\langle1+\delta(\mathbf x)\rangle\,dV_1 = \langle\rho\rangle\,dV_1\,.$$ (2.9)

The probability of finding a galaxy pair at $\mathbf x$ and $\mathbf x+\mathbf r$ is

$$\begin{aligned}
dP_{12} &= \langle\rho(\mathbf x)\rho(\mathbf x+\mathbf r)\rangle\,dV_1dV_2 = \langle\rho\rangle^2\langle[1+\delta(\mathbf x)][1+\delta(\mathbf x+\mathbf r)]\rangle\,dV_1dV_2 \\
&= \langle\rho\rangle^2\left[1+\langle\delta(\mathbf x)\rangle+\langle\delta(\mathbf x+\mathbf r)\rangle+\langle\delta(\mathbf x)\delta(\mathbf x+\mathbf r)\rangle\right]dV_1dV_2 \\
&= \langle\rho\rangle^2\left[1+\langle\delta(\mathbf x)\delta(\mathbf x+\mathbf r)\rangle\right]dV_1dV_2\,,
\end{aligned}$$ (2.10)

since $\langle\delta(\mathbf x)\rangle = \langle\delta(\mathbf x+\mathbf r)\rangle = 0$. Dividing $dP_{12}$ by $dP_1$ we get the probability $dP_2$ of finding the second galaxy once we have found the first one

$$dP_2 = \langle\rho\rangle\left[1+\langle\delta(\mathbf x)\delta(\mathbf x+\mathbf r)\rangle\right]dV_2\,.$$ (2.11)

Comparing (2.11) to (2.8) we see that our new definition of $\xi$ agrees with the old one

$$\xi(\mathbf r) = \langle\delta(\mathbf x)\delta(\mathbf x+\mathbf r)\rangle\,.$$ (2.12)

Thus, for any galaxy, $\langle\rho\rangle[1+\xi(\mathbf r)]\,dV$ is the expected number of galaxies in a volume element $dV$ at separation $\mathbf r$, the mean number of neighbors within a spherical shell is

$$dN(r) = \langle\rho\rangle\,[1+\xi(r)]\,C_dr^{d-1}dr$$ (2.13)

and the mean number of neighbors within distance $R$ is

$$N(R) = \langle\rho\rangle V(R) + \langle\rho\rangle\int_0^R\xi(r)\,C_dr^{d-1}dr = \langle\rho\rangle V(R)\left[1+\bar\xi(R)\right]\,.$$ (2.14)

Thus $1+\xi(r)$ can be interpreted as the mean (expected) density profile around each galaxy.

2.3 Poisson distribution

A Poisson distribution is an uncorrelated distribution of galaxies, which we get when we assign each galaxy $i$ a random location $\mathbf x_i$ (with uniform probability density within $V$) independently. (This process is called a Poisson process.) Then

$$dP_{12} = \langle\rho\rangle^2dV_1dV_2 \quad\Rightarrow\quad \xi(\mathbf r) = 0\,.$$ (2.15)

Figure 10: Poisson distribution of $N = 250$ galaxies. The 2D volume is divided into $M = 25$ cells, so that on average a cell should contain 10 galaxies. There are two cells with just 5 galaxies.

Divide the volume $V$ into $M$ subvolumes (cells) $\Delta V$. Assign now $N$ galaxies into $V$ with a Poisson process. See Fig. 10. Each galaxy lands in a particular cell with probability $p = 1/M$, and somewhere else with probability $1-p = 1-1/M$. The probability of the $n$ first galaxies landing in this cell and the remaining $N-n$ elsewhere is thus $(1/M)^n(1-1/M)^{N-n}$. Since there are

$$\binom{N}{n} \equiv \frac{N!}{n!\,(N-n)!}$$ (2.16)

ways of choosing $n$ galaxies out of $N$, the probability of getting exactly $n$ galaxies in a particular cell is

$$P(n) = \binom{N}{n}\left(\frac{1}{M}\right)^n\left(1-\frac{1}{M}\right)^{N-n}\,.$$ (2.17)

This is the $n$th term of the binomial expansion

$$(a+b)^N = \sum_{n=0}^N\binom{N}{n}a^nb^{N-n}$$ (2.18)

so we can easily check that the total probability is

$$\sum_{n=0}^NP(n) = \sum_{n=0}^N\binom{N}{n}\left(\frac{1}{M}\right)^n\left(1-\frac{1}{M}\right)^{N-n} = \left[\frac{1}{M}+\left(1-\frac{1}{M}\right)\right]^N = 1^N = 1\,.$$ (2.19)

This probability distribution (2.17) of integers is called the binomial distribution $B(N,p)$, where $p = 1/M$.

The Poisson limit theorem states that: if $N \to \infty$ and $p \to 0$ ($M \to \infty$) so that $Np = N/M \to \lambda$, then

$$P(n) = \binom{N}{n}p^n(1-p)^{N-n} \to \frac{\lambda^ne^{-\lambda}}{n!} = \left(\frac{N}{M}\right)^n\frac{e^{-N/M}}{n!}\,.$$ (2.20)

This probability distribution of integers is called the Poisson distribution. For the Poisson distribution we have the expectation values

$$\langle n\rangle = \sum_{n=0}^\infty n\,P(n) = e^{-\lambda}\sum_{n=1}^\infty\frac{\lambda^n}{(n-1)!} = \lambda e^{-\lambda}\sum_{n=0}^\infty\frac{\lambda^n}{n!} = \lambda = N/M$$
$$\langle n^2\rangle = \sum_{n=0}^\infty n^2P(n) = \ldots = \lambda+\lambda^2$$
$$\langle(\Delta n)^2\rangle \equiv \langle(n-\langle n\rangle)^2\rangle = \langle n^2\rangle-\langle n\rangle^2 = \lambda\,.$$ (2.21)

The galaxy density in a cell is $\rho(\mathbf x) = n/\Delta V$ and the density perturbation is $\delta(\mathbf x) = (n-\langle n\rangle)/\langle n\rangle$, so that

$$\langle\delta^2\rangle = \frac{\langle(n-\langle n\rangle)^2\rangle}{\langle n\rangle^2} = \frac{1}{\lambda} = \frac{1}{\langle n\rangle}\,.$$ (2.22)

(We will later, in Sec. 8, refer to this as Poisson variance: the relative variance is 1/expected number of points.) Thus, although for a Poisson distribution $\xi(\mathbf r) = 0$ for $\mathbf r \neq 0$, we have

$$\xi(0) = \langle\delta^2\rangle = \frac{M}{N} = \frac{1}{\langle n\rangle}$$ (2.23)

(in the limit of very large $N$ and $M$).
This density variance depends on the resolution, since increasing the number of cells $M$ (decreasing $\Delta V$) makes it larger.

We could continue with this approach to specify that the volume $V$ is cubic and the division into $M$ cells forms a rectangular grid, replacing $M$ with $M^d$ and doing a discrete Fourier transform to find the power spectrum of the Poisson distribution, finding that

$$ P(k) = \text{const} , \tag{2.24} $$

but since this seems not to be the usual approach in literature, I skip this, avoiding the discussion of discrete Fourier transforms. However, we make some comments on the result: The Poisson distribution has the power spectrum of white noise, $n = 0$, the amplitude depending on the resolution. We noted in Sec. 1.8 that as $n \to 0$, $\gamma \to d$. Now we have that $\xi(r) = 0$ for $r \neq 0$, but $\xi(0) \propto (\Delta r)^{-d}$ (for fixed $V$ and $N$), where $\Delta r$ is the side of the cell, i.e., $\Delta V = (\Delta r)^d$.

Exercise: Cox process. A Cox process refers to a combination of two Poisson processes. Consider the following examples:

1. Infinitely long lines are placed randomly (the first Poisson process) into an infinite volume. Galaxies are then assigned randomly (the second Poisson process) on these lines, so that the mean (expectation value of) linear number density on these lines is $\lambda$. What is $\xi(r)$, given in terms of $\lambda$ and the resulting mean (expectation value of) 3D galaxy number density $\langle\rho\rangle$?

2. Like the previous case, but now the line segments have finite length $L$ (all have the same length). What is $\xi(r)$, in terms of $\lambda$, $L$, and the number density $n_s$ of line segments?

Answer:

1. $$ \xi(r) = \frac{\lambda}{2\pi\langle\rho\rangle}\,r^{-2} \tag{2.25} $$

2. $$ \xi(r) = \frac{1}{2\pi n_s L r^2}\left(1 - \frac{r}{L}\right) \ \text{for } r \le L ; \qquad \xi(r) = 0 \ \text{for } r > L \tag{2.26} $$

Note that this does not depend on $\lambda$ ($\langle\rho\rangle = n_s L\lambda$, so it cancels in the ratio $\lambda/\langle\rho\rangle$).

Note: The Cox process has been used to generate "mock" catalogs for testing computer codes to estimate $\xi(r)$.
It is actually not easy to generate test catalogs which have exactly some known, but non-zero, correlation function $\xi(r)$ that the estimate could be compared to. The Cox process is one way to do this.

Exercise: Cox process with binning. Continuation of the previous exercise: When measuring the correlation function $\xi(r)$ from data, one has to count it for bins of finite width $\Delta r$. So instead of asking what is the excess probability $\xi(r)\langle\rho\rangle dV$ for another galaxy at separation $r$ (so that $dV = 4\pi r^2 dr$), for comparing data with theory we ask what is the excess probability $\xi(r_1,r_2)\langle\rho\rangle\Delta V$ for a separation between $r_1$ and $r_2$ (i.e., here $\Delta r = r_2 - r_1$). Find $\xi(r_1,r_2)$ for $r_1 < r_2 \le L$. ($\xi = 0$ for $r_1 \ge L$, and the answer for $r_1 < L < r_2$ is more complicated but rarely needed.)

Answer:

1. $$ \xi(r_1,r_2) = \frac{\lambda}{2\pi\langle\rho\rangle}\,\frac{3(r_2-r_1)}{r_2^3-r_1^3} = \frac{\lambda}{2\pi\langle\rho\rangle}\,\frac{3}{r_1^2+r_1r_2+r_2^2} \tag{2.27} $$

2. $$ \xi(r_1,r_2) = \frac{1}{2\pi n_s L}\,\frac{3(r_2-r_1)}{r_2^3-r_1^3}\left(1-\frac{r_1+r_2}{2L}\right) = \frac{1}{2\pi n_s L}\,\frac{3}{r_1^2+r_1r_2+r_2^2}\left(1-\frac{r_1+r_2}{2L}\right) \tag{2.28} $$

Table 2: The Cox process correlation function for $L = 500$, $n_s = 8\times 10^{-7}$, and $\Delta r = 1$ binning. On the left: a log-log plot of $\xi(r)$. On the right: lin-lin plot of $r^2\xi(r)$.

2.4 Counts in cells

One of the first methods to measure the clustering properties of galaxies was dividing the survey volume $V$ into cells (subvolumes) of equal size $\Delta V$ and shape, and counting the number $n$ of galaxies in each cell. Defining

$$ \Delta n \equiv n - \langle n\rangle , \tag{2.29} $$

the variance

$$ \mu_2 \equiv \langle(\Delta n)^2\rangle \tag{2.30} $$

and skewness

$$ \mu_3 \equiv \langle(\Delta n)^3\rangle , \tag{2.31} $$

we have that for a completely random (i.e., Poisson) distribution of galaxies

$$ \mu_2 = \mu_3 = \langle n\rangle . \tag{2.32} $$

A clustered distribution will have a larger variance. We define

$$ y \equiv \frac{\mu_2 - \langle n\rangle}{\langle n\rangle^2} \tag{2.33} $$

as a measure of clustering. It can be shown that

$$ \langle y\rangle = \frac{\int_{\Delta V}\int_{\Delta V}\xi(\mathbf{x}_1-\mathbf{x}_2)\,d^dx_1\,d^dx_2}{\Delta V^2} . \tag{2.34} $$

Likewise we define

$$ z \equiv \frac{\mu_3 - 3\mu_2 + 2\langle n\rangle}{\langle n\rangle^3} \tag{2.35} $$

as a measure of excess skewness of the clustering. It measures nonlinear effects in structure growth and non-Gaussianity of primordial perturbations.
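As a rough illustration of (2.33) and (2.35), the sketch below evaluates $y$ and $z$ from a list of cell counts, with sample averages over the cells standing in for the expectation values. The helper names (`counts_in_cells`, `drop_into_cells`) and the "paired" toy catalog, in which galaxies are dropped two at a time into the same cell, are assumptions introduced here for demonstration; they are not taken from the notes.

```python
import random

def counts_in_cells(counts):
    """Sample versions of y (2.33) and z (2.35) from per-cell counts."""
    m = len(counts)
    mean = sum(counts) / m
    mu2 = sum((n - mean) ** 2 for n in counts) / m   # variance (2.30)
    mu3 = sum((n - mean) ** 3 for n in counts) / m   # skewness (2.31)
    y = (mu2 - mean) / mean ** 2
    z = (mu3 - 3.0 * mu2 + 2.0 * mean) / mean ** 3
    return y, z

def drop_into_cells(n_objects, n_cells, per_object, rng):
    """Place n_objects at random; each adds per_object galaxies to its cell."""
    counts = [0] * n_cells
    for _ in range(n_objects):
        counts[rng.randrange(n_cells)] += per_object
    return counts

rng = random.Random(42)
# Unclustered catalog: 20000 galaxies in 2000 cells, <n> = 10
poisson_counts = drop_into_cells(20000, 2000, 1, rng)
# Hypothetical toy clustering: 10000 pairs, both members in the same cell
paired_counts = drop_into_cells(10000, 2000, 2, rng)

y_poisson, z_poisson = counts_in_cells(poisson_counts)
y_paired, z_paired = counts_in_cells(paired_counts)
print(y_poisson, z_poisson)
print(y_paired, z_paired)
```

For the unclustered catalog both statistics should scatter around zero, in line with (2.32). In the paired toy model the count in a cell is twice a Poisson variable with mean $\langle n\rangle/2$, so $\mu_2 = 2\langle n\rangle$ and $y$ should come out near $1/\langle n\rangle = 0.1$.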
For an actual survey we define corresponding quantities $\hat{\mu}_2$, $\hat{\mu}_3$, $\hat{y}$, and $\hat{z}$, by replacing expectation values with survey averages. Note that these are biased estimators, $\langle\hat{y}\rangle \neq y$, $\langle\hat{z}\rangle \neq z$, since taking expectation values does not commute with raising to second or third power and division.

One can study scale dependence of structure by using larger or smaller cells. This method is better than correlation function estimation for detecting structure at large scales, where the correlation function is small.

An important improvement of this method is to, instead of using disjoint cells, assign a much larger number of cells of the same size $\Delta V$ to random locations within the survey, allowing them to overlap. This oversampling does not change the expected variance, which is determined by (2.34), but the measured variance will be closer to this expectation value. [14, 15]

2.5 Fourier transform for a discrete set of objects

When we replaced the continuous density field with a set of discrete objects, the resolution of our description was reduced. The standard approach of a discrete Fourier transform, where one introduces a rectangular grid with finite resolution, introduces another, independent, loss of resolution, which is unnecessary. We would lose the information on the exact locations of the galaxies if we just assigned them into finite cells. The only discreteness we need to introduce is that inherent in the problem, that of the discrete point set.

This is done by introducing microcells. We start as if we were going to do a normal discrete Fourier transform, dividing the volume into cells. But now we make the cells extremely small, so that the probability of there being more than one galaxy in a cell becomes zero, and specifying in which cell a galaxy $i$ is, specifies its exact location $\mathbf{x}_i$. Denote the volume of such a microcell with $\delta V$. Most microcells will be empty. The galaxy number density in microcell $j$ is $n_j/\delta V$, where $n_j = 0$ or $1$.
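This means that $n_j^2 = n_j$, which will be very helpful later. The Fourier coefficients of the density field become

$$ \rho_{\mathbf{k}} = \frac{1}{V}\int_V \rho(\mathbf{x})\,e^{-i\mathbf{k}\cdot\mathbf{x}}\,d^dx = \frac{1}{V}\sum_j (n_j/\delta V)\,e^{-i\mathbf{k}\cdot\mathbf{x}_j}\,\delta V = \frac{1}{V}\sum_j n_j\,e^{-i\mathbf{k}\cdot\mathbf{x}_j} , \tag{2.36} $$

where the sum is over microcells, and $\mathbf{x}_j$ is the location of the microcell. Since $n_j = 0$ for all the empty microcells, only those terms survive where the microcell contains a galaxy, $n_j = 1$, and the sum becomes a sum over galaxies

$$ \rho_{\mathbf{k}} = \frac{1}{V}\sum_i e^{-i\mathbf{k}\cdot\mathbf{x}_i} , \tag{2.37} $$

where $\mathbf{x}_i$ is the location of galaxy $i$. Thus

$$ \begin{aligned}
\delta_{\mathbf{k}} &= \frac{\rho_{\mathbf{k}}}{\langle\rho\rangle} = \frac{1}{\langle\rho\rangle V}\sum_i e^{-i\mathbf{k}\cdot\mathbf{x}_i} = \frac{1}{\langle N\rangle}\sum_{i=1}^{N} e^{-i\mathbf{k}\cdot\mathbf{x}_i} \qquad\text{for } \mathbf{k}\neq 0 \\
\delta_{0} &= \bar{\delta} = \frac{\bar{\rho}}{\langle\rho\rangle} - 1 = \frac{N-\langle N\rangle}{\langle N\rangle} ,
\end{aligned} \tag{2.38} $$

where $N$ is the total number of galaxies in the volume $V$, and $\langle N\rangle = \langle\rho\rangle V$ is its expectation value.

2.5.1 Poisson distribution again

We apply now our new Fourier method to the Poisson distribution, where the locations $\mathbf{x}_i$ are independent random numbers, so that the complex numbers $e^{-i\mathbf{k}\cdot\mathbf{x}_i}$ in (2.38) are distributed randomly on the unit circle of the complex plane. Doing the sum $\sum_{i=1}^{N} e^{-i\mathbf{k}\cdot\mathbf{x}_i}$ thus executes a random walk on the complex plane, with step length 1. To get the power spectrum, we write

$$ \begin{aligned}
|\delta_{\mathbf{k}}|^2 = \delta_{\mathbf{k}}^*\delta_{\mathbf{k}} &= \frac{1}{\langle N\rangle^2}\sum_{ij} e^{i\mathbf{k}\cdot(\mathbf{x}_i-\mathbf{x}_j)} = \frac{1}{\langle N\rangle^2}\left[\sum_{i\neq j} e^{i\mathbf{k}\cdot(\mathbf{x}_i-\mathbf{x}_j)} + \sum_i 1\right] \\
&= \frac{1}{\langle N\rangle^2}\left[2\sum_{\text{pairs}}\cos\big(\mathbf{k}\cdot(\mathbf{x}_i-\mathbf{x}_j)\big) + N\right] .
\end{aligned} \tag{2.39} $$

There is an equal probability for the $\cos(\mathbf{k}\cdot(\mathbf{x}_i-\mathbf{x}_j))$ to be positive or negative, so the expectation value of the first term vanishes, and we get

$$ P(k) = V\langle|\delta_{\mathbf{k}}|^2\rangle = \frac{V\langle N\rangle}{\langle N\rangle^2} = \frac{V}{\langle N\rangle} = \frac{1}{\langle\rho\rangle} , \tag{2.40} $$

which is independent of $k$, i.e., the spectral index is $n = 0$.

In Sec. 9.1 we redo this for a correlated distribution, where we see that the effect of having a discrete set of galaxies instead of a continuous density adds this $V/\langle N\rangle = 1/\langle\rho\rangle$ term to the power spectrum. This added term is called shot noise. The higher the density of points (galaxies included in the survey), the smaller this shot noise is. For an estimate of the power spectrum of the underlying mass distribution we subtract this shot noise.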
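The shot-noise level (2.40) can be verified directly from (2.38) by summing phases over random points. The following is a minimal sketch under assumptions made here: a 2D unit box ($V = 1$), a fixed number of galaxies $N = \langle N\rangle = 200$ in every realization, wave vectors that are integer multiples of $2\pi$ per component (so that a full number of wavelengths fits in the box), and the hypothetical helper names `delta_k` and `mean_power`.

```python
import cmath
import math
import random

def delta_k(points, k, n_expected):
    """Fourier coefficient (2.38) for k != 0: (1/<N>) sum_i exp(-i k.x_i)."""
    total = 0j
    for x in points:
        phase = sum(kc * xc for kc, xc in zip(k, x))
        total += cmath.exp(-1j * phase)
    return total / n_expected

def mean_power(n_gal, k, box=1.0, n_real=400, seed=1):
    """Average of V |delta_k|^2 over n_real random catalogs in a box of side `box`."""
    rng = random.Random(seed)
    dim = len(k)
    volume = box ** dim
    acc = 0.0
    for _ in range(n_real):
        points = [[rng.uniform(0.0, box) for _ in range(dim)]
                  for _ in range(n_gal)]
        acc += volume * abs(delta_k(points, k, n_gal)) ** 2
    return acc / n_real

n_gal = 200
shot_noise = 1.0 / n_gal          # V/<N> with V = 1
k_fund = 2.0 * math.pi            # fundamental wavenumber of the unit box
p_low = mean_power(n_gal, (k_fund, 0.0))
p_high = mean_power(n_gal, (3 * k_fund, 2 * k_fund), seed=2)
print(p_low / shot_noise, p_high / shot_noise)
```

Both ratios should come out close to 1, with a scatter of a few percent from the finite number of realizations, and with no systematic dependence on $k$.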
Probability distribution of $\hat{P}(k)$: We will later discuss estimation of the power spectrum from galaxy surveys in more detail, but let us already consider how individual realizations differ from the expectation value, i.e., how

$$ \hat{P}(\mathbf{k}) \equiv V|\hat{\delta}_{\mathbf{k}}|^2 \tag{2.41} $$

is distributed around the expectation value $P(k)$. We assume a fixed $N$ (which may be different from $\langle N\rangle$), i.e., we do not here fold in the probability distribution of $N$. Consider the complex number $\hat{\delta}_{\mathbf{k}}$ as a 2D vector

$$ \hat{\delta}_{\mathbf{k}} = \frac{1}{N}\sum_{i=1}^{N} e^{-i\mathbf{k}\cdot\mathbf{x}_i} = \frac{1}{N}\left(\sum_i \cos\mathbf{k}\cdot\mathbf{x}_i ,\; -\sum_i \sin\mathbf{k}\cdot\mathbf{x}_i\right) , \tag{2.42} $$

which points to the endpoint of the random walk (or $N\hat{\delta}_{\mathbf{k}}$ does). Consider the real part of $\hat{\delta}_{\mathbf{k}}$:

$$ \frac{1}{N}\sum_i \cos\mathbf{k}\cdot\mathbf{x}_i . \tag{2.43} $$

Here the terms in the sum are independent random variables with a nonuniform probability distribution ($\mathbf{k}\cdot\mathbf{x}_i$ has a uniform probability distribution$^{18}$). We apply the central limit theorem: The sum (or mean) of independent random variables approaches the normal (Gaussian) distribution,

$$ P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2}(x-\mu)^2/\sigma^2} , \tag{2.44} $$

where $\mu$ is the expectation value and $\sigma^2$ the variance, as $N\to\infty$, regardless of the probability distribution of the individual variables. If each variable has the same probability distribution, with expectation value $\mu$ and variance $\sigma^2$, then their mean will have expectation value $\mu$ and variance $\sigma^2/N$. The expectation value of $\cos\mathbf{k}\cdot\mathbf{x}_i$ is zero and the variance is $\frac{1}{2\pi}\int_0^{2\pi}\cos^2 x\,dx = \frac{1}{2}$, so the mean has variance $1/(2N)$. Thus $\mathrm{Re}\,\hat{\delta}_{\mathbf{k}}$ has the probability distribution

$$ P\left(\mathrm{Re}\,\hat{\delta}_{\mathbf{k}} = x\right) = \sqrt{\frac{N}{\pi}}\,e^{-Nx^2} . \tag{2.45} $$

The imaginary part has the same probability distribution.

Now clearly the real and imaginary parts are not independent. The individual terms are completely dependent, since $\sin = \pm\sqrt{1-\cos^2}$. Some of this dependence remains for the sums, especially in the large-$|\hat{\delta}_{\mathbf{k}}|$ tail of the probability distribution. Clearly, if the real part is close to its possible maximum value ($\cos\mathbf{k}\cdot\mathbf{x}_i$ has mostly landed near 1), then the imaginary part has to be small, and vice versa.
However, far from the tail we can expect the dependence between the sums (the two components of the random walk) to be negligible. Making this approximation,$^{19}$ we get the 2D probability distribution

$$ P\left(\hat{\delta}_{\mathbf{k}}\right) = \frac{N}{\pi}\,e^{-N|\hat{\delta}_{\mathbf{k}}|^2} . \tag{2.46} $$

To convert this into a probability distribution for $|\hat{\delta}_{\mathbf{k}}|^2$, we need to integrate. First, integrating over the direction in the complex plane,

$$ P\left(|\hat{\delta}_{\mathbf{k}}| = r\right)dr = N e^{-Nr^2}\,2r\,dr . \tag{2.47} $$

This is known as the Rayleigh distribution. For $s = |\hat{\delta}_{\mathbf{k}}|^2 = r^2$, $ds = 2r\,dr$, so

$$ P\left(|\hat{\delta}_{\mathbf{k}}|^2 = s\right)ds = N e^{-Ns}\,ds , \tag{2.48} $$

and

$$ P\left(|\hat{\delta}_{\mathbf{k}}|^2 > s\right) = N\int_s^{\infty} e^{-Ns'}\,ds' = e^{-Ns} . \tag{2.49} $$

This is an exponential distribution: its mean is $1/N$, its mean square is $2/N^2$, and its variance is therefore $1/N^2$, i.e.,

$$ \langle\hat{P}(\mathbf{k})\rangle = \frac{V}{N} = \frac{1}{\bar{\rho}} \qquad\text{and}\qquad \left\langle\left(\hat{P}(\mathbf{k}) - \langle\hat{P}(\mathbf{k})\rangle\right)^2\right\rangle = \left(\frac{V}{N}\right)^2 = \frac{1}{\bar{\rho}^2} . \tag{2.50} $$

Thus, although the expectation value of $\hat{P}(\mathbf{k})$ agrees with $P(k)$, the scatter around it is as large as the expectation value itself, and the most probable value is actually $\hat{P}(\mathbf{k}) = 0$! Note, however, that this is for an individual Fourier mode $\mathbf{k}$, and to estimate $P(k)$ from a survey one would take the mean over a large number of Fourier modes for which $|\mathbf{k}|$ falls between $k$ and $k+dk$, and the variance of this mean is then much lower.

We get the probability distribution of $N$ by just applying (2.20) but considering the full volume $V$ as one subvolume of an infinite universe, i.e., replacing $n$ by $N$,

$$ P(N) = \frac{\langle N\rangle^N \exp(-\langle N\rangle)}{N!} . \tag{2.51} $$

$^{18}$ Comment by Elina Keihänen: I do not think $\mathbf{k}\cdot\mathbf{x}$ has uniform distribution, except in 1D. It is a sum of three terms, $k_1x_1 + k_2x_2 + k_3x_3$, each of which separately is uniformly distributed, but their sum is not. Surprisingly, (2.45) still seems to hold. I checked this numerically.

$^{19}$ Peacock [1] does not mention any such approximation (maybe it comes as part of the $N \gg 1$ limit), but I need to make it to get his Eq. (16.115), i.e., our (2.49).

3 Subspaces of lower dimension

Consider now how the statistics of a 1D or 2D subspace are related to the statistics in the full 3D space.
A key starting point here is that because of statistical isotropy, the correlation function

$$ \xi(r) = \langle\delta(\mathbf{x})\delta(\mathbf{x}+\mathbf{r})\rangle \tag{3.1} $$

is the same, whether $\mathbf{x}$ and $\mathbf{x}+\mathbf{r}$ are restricted to a 1D line or 2D plane, or not.

3.1 Skewers

Consider the power spectrum $P_{1D}(k)$ along a straight line ('skewer') going through 3D space. Since

$$ \mathcal{P}_{1D}(k) = \frac{2k}{\pi}\int_0^{\infty}\xi(r)\cos kr\,dr \qquad\text{and}\qquad \xi(r) = \int_0^{\infty}\mathcal{P}_{3D}(k)\,\frac{\sin kr}{kr}\,\frac{dk}{k} , \tag{3.2} $$

we have

$$ \mathcal{P}_{1D}(k) = \frac{k}{\pi}\int_0^{\infty}\frac{dq}{q^2}\,\mathcal{P}_{3D}(q)\int_0^{\infty}dr\,\frac{2\cos kr\,\sin qr}{r} = k\int_k^{\infty}\frac{dq}{q^2}\,\mathcal{P}_{3D}(q) , \tag{3.3} $$

since (exercise)

$$ \int_0^{\infty}dr\,\frac{2\cos kr\,\sin qr}{r} = \pi\,\Theta(q-k) , \tag{3.4} $$

where $\Theta(q-k)$ is the step function, 1 for $q > k$, 0 for $q < k$. Shorter-wavelength ($q > k$) modes in 3D contribute to the observed power in 1D at $k$, since when $\mathbf{q}$ is not parallel to the line, the intersection of the 3D plane wave with the line has a longer wavelength.

In terms of $P_{1D}(k) = (\pi/k)\,\mathcal{P}_{1D}(k)$ and $P_{3D}(k) = (2\pi^2/k^3)\,\mathcal{P}_{3D}(k)$ this reads

$$ P_{1D}(k) = \frac{1}{2\pi}\int_k^{\infty}\frac{dq}{q}\,q^2 P_{3D}(q) . \tag{3.5} $$

This means that if $P_{3D}$ has a spectral index $n \ge -2$ ($\mathcal{P}_{3D}(q) \propto q$ or steeper), $P_{1D}(k)$ at a given scale $k$ will be dominated by much smaller-scale (higher-$k$) 3D structure, and the true larger-scale 3D structure cannot be seen in a 1D survey. From (3.5) we see that

$$ \frac{dP_{1D}(k)}{dk} = -\frac{1}{2\pi}\,k\,P_{3D}(k) , \tag{3.6} $$

so that $P_{1D}(k)$ is necessarily monotonically decreasing. Thus we always have $n_{1D} < 0$, even if $n_{3D} > 0$.

Example: Let's redo the calculation of (3.5) in a way that is maybe easier to generalize to other cases. Start from the $d$-dimensional integrals

$$ P_{1D}(k) = \int_{-\infty}^{\infty}dr\,e^{-i\mathbf{k}\cdot\mathbf{r}}\,\xi(r) \qquad\text{and}\qquad \xi(r) = \frac{1}{(2\pi)^3}\int d^3q\,e^{i\mathbf{q}\cdot\mathbf{r}}\,P_{3D}(q) , \tag{3.7} $$

so that

$$ P_{1D}(k) = \frac{1}{(2\pi)^3}\int dr\,d^3q\,e^{-i\mathbf{k}\cdot\mathbf{r}}\,e^{i\mathbf{q}\cdot\mathbf{r}}\,P_{3D}(q) , \tag{3.8} $$

where the vectors $\mathbf{r}$ and $\mathbf{k}$ lie along the 1D line (which we can take as the $x$ axis). Thus in the exponentials $(\mathbf{q}-\mathbf{k})\cdot\mathbf{r} = (q_x - k)\,r$ and the $r$ integral gives

$$ \frac{1}{2\pi}\int dr\,e^{i(q_x-k)r} = \delta_D^1(k-q_x) , \tag{3.9} $$

forcing $q_x = k$. Write now $\mathbf{q} = (k,\mathbf{h})$ where $\mathbf{h}$ is 2-dimensional. Now

$$ P_{1D}(k) = \frac{1}{(2\pi)^2}\int d^2h\,P_{3D}(q) = \frac{1}{2\pi}\int h\,dh\,P_{3D}(q) . \tag{3.10} $$

Since $q^2 = k^2 + h^2$, we have $h\,dh = q\,dq$ and $q \ge k$, and we recover (3.5).

For the case of discrete objects ('galaxies') we have to allow a finite thickness for the skewer (an infinitely thin line will catch zero galaxies, if they are treated as points). Consider thus a