Topological Data Analysis in Cosmological Structure Formation

The large-scale structure (LSS) of the Universe, a complex network of galaxies and matter, often termed the "cosmic web," provides a fundamental testing ground for cosmological models. This web is characterized by dense clusters, elongated filaments, vast, sparsely populated voids, and sheet-like walls. Quantifying the intricate topology and morphology of these structures is crucial for understanding their formation and evolution under gravity, and for extracting cosmological information.
Traditional statistical methods, primarily based on N-point correlation functions, capture certain aspects of this structure but often fall short of fully describing its topological and geometric features. Topological Data Analysis (TDA) has emerged as an increasingly used framework to address this gap, offering a suite of tools to characterize the shape, connectivity, and multi-scale nature of the cosmic web. This article reviews the application of TDA to cosmological structure formation, highlights key findings, and speculates on future directions in which TDA could yield new knowledge about the Universe.

Fundamental TDA Tools for Large-Scale Structure
At the heart of TDA’s application to LSS is the idea of describing data not just by point-wise distributions but by its underlying topological structure at various scales. Cosmological data, whether from N-body simulations (dark matter particles, halos) or galaxy surveys (galaxy positions and properties), are typically represented as point clouds. TDA techniques transform these point clouds into a sequence of nested topological spaces, often simplicial complexes (e.g., Vietoris-Rips complexes, alpha complexes, or Delaunay tessellations derived from frameworks like the Delaunay Tessellation Field Estimator, DTFE). This nested sequence is called a filtration, and it is commonly indexed by a scale parameter such as the radius of balls centered on the data points or a density threshold.
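As an illustration of how a point cloud becomes a filtration, the following minimal sketch builds a Vietoris-Rips filtration up to triangles using only the Python standard library. The function name `rips_filtration`, the `max_scale` cutoff, and the four-point toy cloud are assumptions made for this example; production analyses would normally rely on a dedicated TDA library and far larger inputs.

```python
from itertools import combinations
from math import dist

def rips_filtration(points, max_scale):
    """Return (scale, simplex) pairs up to dimension 2, sorted by scale.

    A simplex enters the Vietoris-Rips complex at the largest pairwise
    distance among its vertices; vertices themselves enter at scale 0.
    """
    n = len(points)
    simplices = [(0.0, (i,)) for i in range(n)]
    # Pairwise distances, kept only if within the maximum scale.
    d = {}
    for i, j in combinations(range(n), 2):
        r = dist(points[i], points[j])
        if r <= max_scale:
            d[(i, j)] = r
            simplices.append((r, (i, j)))
    # A triangle is present once all three of its edges are present.
    for i, j, k in combinations(range(n), 3):
        if (i, j) in d and (i, k) in d and (j, k) in d:
            scale = max(d[(i, j)], d[(i, k)], d[(j, k)])
            simplices.append((scale, (i, j, k)))
    # Sort by scale, then dimension, so faces precede their cofaces.
    simplices.sort(key=lambda s: (s[0], len(s[1])))
    return simplices

# Toy "point cloud": the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
filtration = rips_filtration(square, max_scale=2.0)
for scale, simplex in filtration:
    print(f"{scale:.3f}  {simplex}")
```

The brute-force enumeration scales as the cube of the number of points, which is one reason alpha complexes or density-based filtrations on a tessellation are usually preferred for cosmological catalogues.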
Persistent homology is a cornerstone of this approach. As the scale parameter varies, topological features like connected components (0-dimensional holes), loops/tunnels (1-dimensional holes), and voids/cavities (2-dimensional holes) appear and disappear. Persistent homology tracks these features, recording their "birth" and "death" scales. The persistence of a feature, defined as the range of scales over which it survives in the filtration, is taken as a measure of its significance. Births and deaths are summarized in persistence diagrams or barcodes. From these, various topological statistics can be derived, such as the Betti numbers (β0, β1, β2, etc.), which count the independent k-dimensional holes at a given scale, and the Euler characteristic, which is their alternating sum (χ = β0 − β1 + β2 in three dimensions). Minkowski functionals, which originate from integral geometry, provide a related set of morphological descriptors. In three dimensions they quantify the volume, surface area, integrated mean curvature, and Euler characteristic of an excursion set, and they are often used in conjunction with, or as an alternative to, persistent homology.
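The birth and death bookkeeping is easiest to see in dimension 0. The sketch below computes the persistence pairs of connected components for a Rips-type filtration with a union-find structure: every point is born at scale 0, and each edge that merges two components records one death. The function name `h0_persistence` and the four-point cloud are illustrative assumptions, and higher-dimensional features require a full boundary-matrix reduction that is not shown here.

```python
from itertools import combinations
from math import dist, inf

def h0_persistence(points):
    """Birth/death pairs of connected components in a Rips filtration.

    All components are born at scale 0. Processing edges by increasing
    length, each edge that joins two different components kills one.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    pairs = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri          # merge the two components
            pairs.append((0.0, r))   # one of them dies at scale r
    pairs.append((0.0, inf))         # the last component never dies
    return pairs

# Two tight pairs of points separated by a large gap.
cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
diagram = h0_persistence(cloud)
lifetimes = [death - birth for birth, death in diagram]
print(diagram)
```

In this toy case the two short-lived components (death at scale 1) reflect the small separations inside each pair, while the component dying at scale 9 reflects the gap between the pairs, which is the sense in which persistence ranks features by significance.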

Characterizing the Cosmic Web: Voids, Filaments, and Clusters
TDA is well suited to identifying and quantifying the primary components of the cosmic web. Voids, the large underdense regions, are naturally captured as 2-dimensional topological features (enclosed cavities, counted by β2 in 3D) or, alternatively, as connected components of underdense regions. Persistent homology can delineate their hierarchical structure, sizes, and shapes, offering insights into their formation and their impact on surrounding galaxies and the matter distribution. Studies utilizing persistence have explored the full topology of voids, revealing their non-trivial connectivity and evolution.
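The second description of voids mentioned above, as connected components of underdense regions, can be sketched with a flood fill on a gridded density field. This is a simplified stand-in rather than a β2 computation: the function name `underdense_regions`, the 6-connectivity, the non-periodic boundaries, and the toy grid are all assumptions made for illustration.

```python
from collections import deque

# Face-sharing neighbours of a cell in a 3D grid.
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
              (0, -1, 0), (0, 0, 1), (0, 0, -1))

def underdense_regions(density, threshold):
    """Sizes of 6-connected regions with density below `threshold`.

    `density` is a nested list indexed as density[x][y][z]. Returns the
    region sizes in cells, largest first. Boundaries are not periodic.
    """
    nx, ny, nz = len(density), len(density[0]), len(density[0][0])
    seen = set()
    sizes = []
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                if (x, y, z) in seen or density[x][y][z] >= threshold:
                    continue
                # Breadth-first flood fill of one underdense region.
                queue = deque([(x, y, z)])
                seen.add((x, y, z))
                size = 0
                while queue:
                    cx, cy, cz = queue.popleft()
                    size += 1
                    for dx, dy, dz in NEIGHBOURS:
                        px, py, pz = cx + dx, cy + dy, cz + dz
                        if (0 <= px < nx and 0 <= py < ny and 0 <= pz < nz
                                and (px, py, pz) not in seen
                                and density[px][py][pz] < threshold):
                            seen.add((px, py, pz))
                            queue.append((px, py, pz))
                sizes.append(size)
    return sorted(sizes, reverse=True)

# Toy field: uniform density with two carved-out underdense pockets.
grid = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
grid[0][0][0] = grid[0][0][1] = 0.1
grid[3][3][3] = 0.2
sizes = underdense_regions(grid, threshold=0.5)
print(sizes)
```

Repeating the call over a range of thresholds shows regions appearing and merging, which is the sublevel-set filtration that persistent homology summarizes in a single diagram.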
Filaments, the bridge-like structures connecting clusters, and sheets, the wall-like boundaries of voids, leave their imprint on 1-dimensional features (loops formed by filaments, counted by β1) and 2-dimensional features (cavities enclosed by walls, counted by β2) respectively, and are often studied through the density field or the connectivity of halos. The DisPerSE algorithm (Discrete Persistent Structures Extractor), for instance, leverages concepts from discrete Morse theory, closely related to TDA, to identify the filamentary network. Persistent homology characterizes these structures across all scales at once, so no single density or linking-length threshold has to be fixed in advance, although the choice of filtration remains a modelling decision. The statistical properties of these topological features, such as their number, length, and connectivity, serve as valuable probes of cosmological models.
The distribution and clustering of galaxy clusters, the densest nodes of the web, can also be informed by TDA. Although TDA is rarely used to study the internal topology of individual clusters, their arrangement and their connections via filaments contribute to the overall topological signature of the LSS. The multi-scale nature of TDA is crucial here, as it distinguishes small, local voids and tunnels from large, significant structures.

TDA for Cosmological Parameter Estimation and Model Comparison
A significant promise of TDA lies in its potential to extract cosmological information and differentiate between various cosmological models. Topological statistics, such as Betti curves (Betti numbers as a function of scale/density threshold) or summaries of persistence diagrams, are sensitive to cosmological parameters like the matter density (Ωm), the amplitude of matter fluctuations (σ8), and parameters related to dark energy or neutrino mass. For example, the number and size distribution of voids, or the connectivity of the filamentary network, evolve differently under varying cosmological models.
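A Betti curve can be read directly off a persistence diagram by counting the features alive at each scale, and the Euler characteristic curve follows as the alternating sum. The sketch below assumes diagrams stored as lists of (birth, death) pairs; the numerical values are hypothetical and chosen only to make the counting easy to follow.

```python
from math import inf

def betti_curve(diagram, scales):
    """Number of features alive at each scale s, i.e. birth <= s < death."""
    return [sum(1 for b, d in diagram if b <= s < d) for s in scales]

def euler_curve(diagrams_by_dim, scales):
    """Euler characteristic chi(s) = sum over k of (-1)^k * beta_k(s)."""
    curves = [betti_curve(dgm, scales) for dgm in diagrams_by_dim]
    return [sum((-1) ** k * curve[i] for k, curve in enumerate(curves))
            for i in range(len(scales))]

# Hypothetical persistence diagrams for dimensions 0, 1 and 2.
dgms = [
    [(0.0, 1.0), (0.0, 2.0), (0.0, inf)],  # connected components
    [(1.5, 3.0), (2.0, 2.5)],              # loops
    [(2.8, 4.0)],                          # cavities
]
scales = [0.5, 1.5, 2.2, 2.9, 5.0]
b0 = betti_curve(dgms[0], scales)   # [3, 2, 1, 1, 1]
b1 = betti_curve(dgms[1], scales)   # [0, 1, 2, 1, 0]
b2 = betti_curve(dgms[2], scales)   # [0, 0, 0, 1, 0]
chi = euler_curve(dgms, scales)     # [3, 1, -1, 1, 1]
```

Sampled on a fixed grid of scales, these curves become fixed-length vectors, which is the form in which they are typically compared across simulations with different cosmological parameters.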
Several studies have demonstrated that TDA-derived statistics can provide constraints on cosmological parameters comparable to, or complementary to, traditional methods. The ability of TDA to capture information beyond two-point statistics is particularly valuable, as this higher-order information can help break degeneracies between parameters. For instance, some modified gravity theories predict subtle alterations to the cosmic web structure—such as different void ellipticities or filament thicknesses—that TDA might be uniquely positioned to detect. Comparing TDA statistics from simulations of ΛCDM with those from alternative models (e.g., f(R) gravity or models with massive neutrinos) can thus serve as a powerful test of fundamental physics.
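One simple way to quantify how well a topological summary separates two models is a chi-squared distance between ensemble means. The sketch below is a deliberately crude version under stated assumptions: it treats bins as independent, whereas neighbouring bins of a real Betti curve are strongly correlated and a full covariance matrix would be needed in practice. The function name `chi2_between_models` and the input numbers are hypothetical.

```python
from statistics import mean, variance

def chi2_between_models(curves_a, curves_b):
    """Diagonal chi^2 between mean summary vectors of two ensembles.

    Each argument is a list of realisations; each realisation is a list
    of summary values (for example a sampled Betti curve). Bin-to-bin
    covariance is ignored, which is an assumption of this sketch.
    """
    n_bins = len(curves_a[0])
    chi2 = 0.0
    for k in range(n_bins):
        a = [curve[k] for curve in curves_a]
        b = [curve[k] for curve in curves_b]
        # Variance of the difference of the two ensemble means.
        var = variance(a) / len(a) + variance(b) / len(b)
        if var > 0:
            chi2 += (mean(a) - mean(b)) ** 2 / var
    return chi2

# Hypothetical two-bin summaries from two realisations of each model.
model_a = [[10, 5], [12, 7]]
model_b = [[14, 5], [16, 7]]
print(chi2_between_models(model_a, model_b))  # 8.0
```

Here only the first bin differs between the models, so it alone contributes to the statistic; with realistic ensembles one would use many more realisations than bins.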

Bridging Gaps and Future Frontiers: Generative Insights from TDA
The intersection of TDA with other advanced computational techniques, particularly machine learning (ML), is a rapidly expanding frontier. TDA features, such as persistence images or Betti curves, can serve as robust, low-dimensional input summaries for ML algorithms, enhancing cosmological parameter inference, model comparison, or even anomaly detection in large datasets. This synergy allows for the exploitation of the complex, multi-scale information captured by TDA in a computationally efficient manner.
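To show how a persistence diagram becomes a fixed-length ML input, the following sketch rasterises a diagram into a small persistence image. It assumes a linear persistence weighting and an unnormalised isotropic Gaussian kernel, which are common but not unique choices; the function names and parameter values are illustrative only.

```python
from math import exp, inf

def persistence_image(diagram, resolution, extent, sigma):
    """Rasterise a diagram onto a resolution x resolution grid.

    Each finite point (birth, death) is mapped to (birth, persistence),
    weighted by its persistence, and smoothed with an unnormalised
    Gaussian evaluated at pixel centres.
    `extent` is (birth_min, birth_max, pers_min, pers_max).
    """
    b_min, b_max, p_min, p_max = extent
    db = (b_max - b_min) / resolution
    dp = (p_max - p_min) / resolution
    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, death in diagram:
        if death == inf:
            continue  # essential features have no finite persistence
        pers = death - birth
        for row in range(resolution):
            p_c = p_min + (row + 0.5) * dp
            for col in range(resolution):
                b_c = b_min + (col + 0.5) * db
                r2 = (b_c - birth) ** 2 + (p_c - pers) ** 2
                image[row][col] += pers * exp(-r2 / (2 * sigma ** 2))
    return image

def flatten(image):
    """Turn the 2D image into a feature vector for an ML model."""
    return [value for row in image for value in row]

diagram = [(0.25, 1.0), (0.0, inf)]
image = persistence_image(diagram, resolution=2,
                          extent=(0.0, 1.0, 0.0, 1.0), sigma=0.1)
features = flatten(image)
```

Because the output size depends only on `resolution`, diagrams with different numbers of points map to vectors of the same length, which is what makes them convenient inputs for regression or classification models.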
A speculative yet exciting avenue involves using TDA to probe the very early Universe. Primordial non-Gaussianities, for instance, could leave subtle topological imprints on the LSS that are not easily captured by standard statistics but might be discernible through higher-order Betti numbers or specific patterns in persistence diagrams. Furthermore, the topological evolution of the cosmic web over time, if precisely characterizable via TDA across different redshift epochs, could provide a novel "standard clock" or reveal deviations from expected structure growth rates, hinting at new physics.
One underexplored area is the application of multi-parameter persistent homology to LSS. Instead of filtering by a single parameter (e.g., density), one could simultaneously filter by density and another field, such as gas temperature, metallicity, or peculiar velocity magnitude. This could reveal correlations in the topological features of these different fields and provide a more holistic view of baryonic processes within the cosmic web. For example, are the "hottest" filaments topologically distinct from cooler ones? Do voids defined by galaxy underdensity show different internal velocity field topologies?
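A first taste of two-parameter filtering can be had without any specialised machinery by counting connected components of the region that exceeds both a density threshold and a temperature threshold, for every pair of thresholds. The resulting table of β0 values is sometimes called the Hilbert function of the bifiltration in degree 0; it is a much weaker invariant than full multi-parameter persistence. The 2D toy fields, function names, and thresholds below are assumptions for illustration.

```python
def count_components(mask):
    """Number of 4-connected components of True cells in a 2D grid."""
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    count = 0
    for y in range(n_rows):
        for x in range(n_cols):
            if not mask[y][x] or seen[y][x]:
                continue
            count += 1
            stack = [(y, x)]
            seen[y][x] = True
            while stack:  # depth-first flood fill of one component
                cy, cx = stack.pop()
                for py, px in ((cy + 1, cx), (cy - 1, cx),
                               (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= py < n_rows and 0 <= px < n_cols
                            and mask[py][px] and not seen[py][px]):
                        seen[py][px] = True
                        stack.append((py, px))
    return count

def betti0_surface(density, temperature, d_thresholds, t_thresholds):
    """beta_0 of {density >= a and temperature >= b} for each (a, b)."""
    surface = []
    for a in d_thresholds:
        row = []
        for b in t_thresholds:
            mask = [[density[y][x] >= a and temperature[y][x] >= b
                     for x in range(len(density[0]))]
                    for y in range(len(density))]
            row.append(count_components(mask))
        surface.append(row)
    return surface

# Two dense "filaments" (rows 0 and 2); only the upper one is hot.
density = [[2, 2, 2], [0, 0, 0], [2, 2, 2]]
temperature = [[5, 5, 5], [1, 1, 1], [1, 1, 1]]
surface = betti0_surface(density, temperature, [0, 1], [0, 3])
print(surface)  # [[1, 1], [2, 1]]
```

In the toy output, raising only the density threshold separates two filaments, while adding the temperature cut leaves just the hot one, which is the kind of distinction a single-parameter filtration cannot express.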
However, significant challenges remain. Computational scalability of TDA algorithms is a concern for upcoming datasets from surveys like Euclid, the Vera C. Rubin Observatory, and the Square Kilometre Array, which will map billions of galaxies. Developing faster algorithms and efficient summary statistics is crucial. Handling observational systematics, such as complex survey geometries, selection functions, redshift-space distortions, and galaxy bias, also requires TDA-aware methodologies to ensure that extracted topological features are cosmological in origin and not artifacts of the observation process. Techniques such as the fast topological signal identification and persistent cohomological cycle matching of García-Redondo, Monod, and Song (2024) are promising steps toward reducing computational costs.
Hypothesis: The "lifetime" distribution of topological features (voids, loops) within persistence diagrams, when appropriately normalized, might follow universal statistical patterns that are only perturbed by cosmological parameters or new physics. Deviations from such a "null hypothesis" derived from standard ΛCDM could be a sensitive indicator of exotic phenomena.
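One way such a hypothesis could be tested, sketched here under strong assumptions, is to normalise the finite lifetimes of two diagrams and compare their empirical distributions with a two-sample Kolmogorov-Smirnov statistic. Dividing by the mean lifetime is only one possible normalisation and is an assumption of this example, as are the function names and the toy diagrams.

```python
from math import inf

def normalised_lifetimes(diagram):
    """Finite lifetimes divided by their mean, sorted ascending."""
    life = [d - b for b, d in diagram if d != inf]
    mean_life = sum(life) / len(life)
    return sorted(t / mean_life for t in life)

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past the value x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max

# Two toy diagrams whose lifetimes differ only by an overall scale.
dgm_a = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0), (0.0, inf)]
dgm_b = [(0.0, 2.0), (0.0, 4.0), (0.0, 6.0)]
stat = ks_statistic(normalised_lifetimes(dgm_a),
                    normalised_lifetimes(dgm_b))
print(stat)  # 0.0, since the normalised distributions coincide
```

A statistic near zero across many ΛCDM realisations, and a significantly larger value for an alternative model, would be the kind of signal the hypothesis describes; calibrating significance would require an ensemble of simulations.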
Future Direction: Exploring the topology of not just the matter distribution, but also derived fields like the tidal tensor field or the velocity shear tensor field using TDA could reveal anisotropic information about structure formation and collapse that is currently overlooked.

Conclusion
Topological Data Analysis offers a rich, mathematically rigorous, and physically insightful framework for exploring the large-scale structure of the Universe. It moves beyond traditional N-point statistics to provide a multi-scale characterization of the cosmic web's components—voids, filaments, sheets, and clusters. Its application has already yielded significant results in quantifying these structures and in constraining cosmological parameters.
The true generative potential of TDA in cosmology, however, may lie in its combination with machine learning, its application to new types of cosmological data (e.g., intensity mapping, kinetic Sunyaev-Zel'dovich maps), and the exploration of more advanced TDA concepts like multi-parameter persistence and higher-order topological features. Provocative questions remain: Can TDA robustly distinguish between dark energy models that produce similar expansion histories but different structure growth? Could TDA applied to early universe simulations reveal topological precursors to the observed LSS? As computational methods advance and new, vast cosmological datasets become available, TDA is poised to play an increasingly crucial role in not just describing the Universe, but in uncovering new physical principles governing its evolution and fundamental nature. The challenge and opportunity lie in developing TDA tools that are not only descriptive but also predictive and generative of new hypotheses.

References
- Sousbie, T. (2011). The persistent cosmic web and its filamentary structure. Monthly Notices of the Royal Astronomical Society, 414(1), 350-383. https://doi.org/10.1111/j.1365-2966.2011.18394.x
- Pranav, P., Feldbrugge, J., Hidding, J., van de Weygaert, R., Vegter, G., & Welling, M. (2019). The topology of voids in the cosmic web: persistence with a density-based filtration. Monthly Notices of the Royal Astronomical Society, 485(3), 3204–3220. https://doi.org/10.1093/mnras/stz607
- Cole, A., & Shiu, G. (2019). Topological data analysis for the string landscape. Journal of High Energy Physics, 2019(3), 54. https://doi.org/10.1007/JHEP03(2019)054
- van de Weygaert, R., Vegter, G., Edelsbrunner, H., Jones, B. J. T., Pranav, P., Park, C., ... & Hidding, J. (2011). Alpha, Betti and the Megaparsec Universe: on the Topology of Large-Scale Structure. Transactions on Computational Science XIV, 60-101. https://doi.org/10.1007/978-3-642-25249-5_3
- Xu, Y., Chen, Y.-C., Ho, S., & Cisewski-Kehe, J. (2019). Topology of the Cosmic Web in the Excursion Set Theory. The Astrophysical Journal, 876(1), 40. https://doi.org/10.3847/1538-4357/ab140d
- Feldbrugge, J., van de Weygaert, R., Hidding, J., & Pranav, P. (2019). Topological signatures of modified gravity. arXiv:1908.09401 [astro-ph.CO]. https://arxiv.org/abs/1908.09401
- Biagetti, M., Cole, A., Shiu, G., & Torrado, J. (2021). Cosmological parameter inference with the cosmic web. Journal of Cosmology and Astroparticle Physics, 2021(07), 053. https://doi.org/10.1088/1475-7516/2021/07/053
- Wilding, M., van de Weygaert, R., Hidding, J., Vegter, G., & Hellwing, W. A. (2022). Persistent Homology of the Cosmic Dark Matter Web. Universe, 8(11), 561. https://doi.org/10.3390/universe8110561
- Heydenreich, S., Brück, B., & Harnois-Déraps, J. (2021). Minkowski functionals and persistent homology for weak lensing convergence maps. Monthly Notices of the Royal Astronomical Society, 508(4), 4948–4963. https://doi.org/10.1093/mnras/stab2860
- Park, C., Kim, Y.-R., & Chingangbam, P. (2013). Betti numbers of Gaussian random fields. Journal of the Korean Mathematical Society, 50(4), 843-862. https://doi.org/10.4134/JKMS.2013.50.4.843
- List, P. F., Banerjee, A., Lukić, Z., & Pope, A. (2023). Cosmological inference from the cosmic web with deep learning. Monthly Notices of the Royal Astronomical Society, 522(2), 2265–2278. https://doi.org/10.1093/mnras/stad1101
- Angulo, R. E., & Hahn, O. (2022). Large-scale dark matter simulations. Living Reviews in Computational Astrophysics, 8(1), 1. https://doi.org/10.1007/s41115-021-00013-z
- García-Redondo, I., Monod, A., & Song, A. (2024). Fast topological signal identification and persistent cohomological cycle matching. Journal of Applied and Computational Topology, 8(1), 137-179. https://doi.org/10.1007/s41468-024-00179-4
- Elbers, W., & van de Weygaert, R. (2019). Stochastic topology of the cosmic web. Physical Review D, 100(10), 103522. https://doi.org/10.1103/PhysRevD.100.103522
- Makarenko, N., Kiseleva, L., & Melnyk, O. (2021). Topology of the Observed LSS and its Connections with Galaxy Properties. Universe, 7(9), 326. https://doi.org/10.3390/universe7090326
- Codis, S., Pichon, C., & Pogosyan, D. (2018). The cosmic web: an imprint of primordial initial conditions. Reports on Progress in Physics, 81(6), 066901. https://doi.org/10.1088/1361-6633/aab723
- Cisewski-Kehe, J., Bendich, P., Marron, J.S., Nath, M., & Priebe, C.E. (2018). Persistent Homology for Random Fields and Its Application to Cosmic Web Skeletonization. arXiv:1810.04891 [astro-ph.CO]. https://arxiv.org/abs/1810.04891
- Cueli, M., Martínez, V. J., & Saar, E. (2006). Topology of the Luminous Red Galaxy Sample of the Sloan Digital Sky Survey. The Astrophysical Journal, 646(2), 786-795. https://doi.org/10.1086/505048
- Poulenard, S., & Mewes, V. (2018). Topological analysis of the cosmic web using the Lovász-Rosza method. Monthly Notices of the Royal Astronomical Society, 479(1), 144–159. https://doi.org/10.1093/mnras/sty1423
- Bobrowski, O., & Skraba, P. (2023). A universal null-distribution for topological data analysis. Scientific Reports, 13(1), 10812. https://doi.org/10.1038/s41598-023-37842-2
- Cole, A., Biagetti, M., & Shiu, G. (2022). Beyond L_p: A new generation of summary statistics for the cosmic web. Journal of Cosmology and Astroparticle Physics, 2022(06), 017. https://doi.org/10.1088/1475-7516/2022/06/017
- Xu, Frank Y., Biagetti, M., Brainerd, T. G., & Shiu, G. (2023). Probing the nature of dark matter with the cosmic web. arXiv:2311.17050 [astro-ph.CO]. https://arxiv.org/abs/2311.17050
- Bermejo, R., & Pantz, O. (2010). Delaunay tessellation field estimator for point patterns in d-dimensional Euclidean space. Monthly Notices of the Royal Astronomical Society, 407(2), 1231-1246. https://doi.org/10.1111/j.1365-2966.2010.17017.x
- Gispert-Navarro, R. et al. (2023). The effect of galaxy bias on cosmic web topology from persistent homology. Astronomy & Astrophysics, 678, A11. https://doi.org/10.1051/0004-6361/202346025
- Heydenreich, S. et al. (2021). Persistent homology as a probe for cosmology: distinguishing between ΛCDM and f(R) gravity with mocks of the DESI survey. Monthly Notices of the Royal Astronomical Society, 503(1), 1159-1171. https://doi.org/10.1093/mnras/stab407
- Uyttendaele, M. et al. (2023). The topological structure of the cosmic web in hydrodynamic simulations. Monthly Notices of the Royal Astronomical Society, 525(4), 5999-6014. https://doi.org/10.1093/mnras/stad2656
- Vacca, V. et al. (2021). Unveiling the topological properties of the cosmic web with Betti numbers. Monthly Notices of the Royal Astronomical Society, 507(4), 5776–5791. https://doi.org/10.1093/mnras/stab2448
- Kono, K. et al. (2023). The first application of persistent homology to the Subaru Hyper Suprime-Cam survey: A new method for finding pristine high-z quasars. Publications of the Astronomical Society of Japan, 75(1), 243-262. https://doi.org/10.1093/pasj/psac099
- Shivshankar, N. et al. (2022). Cosmological constraints from the topology of the DESI Legacy Imaging Surveys DR9. Monthly Notices of the Royal Astronomical Society, 517(4), 5187-5202. https://doi.org/10.1093/mnras/stac2954
- Zhao, Y. et al. (2019). Probing the cosmic web with fast radio bursts. Monthly Notices of the Royal Astronomical Society, 486(4), 5094-5103. https://doi.org/10.1093/mnras/stz1151
- Codis, S. et al. (2015). Minkowski functionals of the weak lensing convergence field: a new probe of the dark Universe. Monthly Notices of the Royal Astronomical Society, 452(4), 3369-3387. https://doi.org/10.1093/mnras/stv1469
- Wiegand, A., & Hahn, O. (2018). Delineating the cosmic web with the discrete persistent structure extractor (DisPerSE) and its application to the SDSS DR7 galaxy distribution. Monthly Notices of the Royal Astronomical Society, 477(2), 2551-2564. https://doi.org/10.1093/mnras/sty773
- Parroni, A. et al. (2023). The topology of the IllustrisTNG cosmic web. Monthly Notices of the Royal Astronomical Society, 521(1), 1130-1146. https://doi.org/10.1093/mnras/stad501
- Fang, T. et al. (2020). A topological view of the Zwicky Transient Facility DR3 galaxy survey. The Astrophysical Journal, 898(2), 125. https://doi.org/10.3847/1538-4357/ab9c8c
- James, J. B. et al. (2020). The topology of the cosmic web in the presence of massive neutrinos. arXiv:2001.04957 [astro-ph.CO]. https://arxiv.org/abs/2001.04957