The In Situ Reduction, Post Hoc Exploration paradigm: How can data be reduced to a form small enough to save to disk while retaining enough accuracy to be useful for analysis?

Overview
An important consideration for in situ processing is whether the desired visualizations are known a priori. When they are known a priori, the visualizations can be specified ahead of time and carried out as the data is generated. When they are not known a priori, in situ processing is more complicated, since it is not clear which visualizations to carry out. One approach is to forgo in situ processing, i.e., to do post hoc processing instead, although saving data less frequently. The problem with this approach is that I/O constraints may make the saved data so sparse temporally that important phenomena are lost (i.e., a phenomenon begins after one time slice is saved and ends before the next one) or that features cannot be tracked over time. Another approach when lacking a priori knowledge, and the research challenge this webpage focuses on, is to use a combination of in situ and post hoc processing. Specifically, data is transformed and reduced in situ, the resulting reduced data is saved to disk, and then this data is explored later in the traditional post hoc manner. The benefit of this idea is that the reduced data may be small enough to store at sufficient temporal frequency even in the face of severe I/O constraints. The remainder of this section is organized into three parts: (1) Lagrangian flow, (2) wavelet compression, and (3) additional results.
Lagrangian Flow

While many researchers have considered the benefits of viewing flow from the Lagrangian perspective, our work was the first to envision using Lagrangian basis flows as an in situ data reduction operator. The idea is to extract Lagrangian basis flows (i.e., pathlines) in situ, store the basis flows to disk, and then do post hoc exploration by interpolating from the basis flows. The Lagrangian approach has an advantage over the traditional (Eulerian) approach, as its basis flows are calculated in situ, giving it access to all spatiotemporal data. As a result, the basis flows accurately capture an interval in time. This contrasts with the Eulerian method, where time steps of vector data are stored and post hoc analysis requires temporal interpolation between these time steps, introducing error. In all, Lagrangian methods have the potential to represent more information per byte than the Eulerian approach, enabling more accurate analysis for the same storage or the same accuracy with less storage.

The initial work, a collaboration by UC Davis Ph.D. student Alexy Agranovsky, Hank Childs, and others, was on the viability of the approach. Example results from this work included 12X accuracy improvements using the same storage, and the same accuracy with 64X less storage. This work was awarded Best Paper at LDAV14.

After this initial work, CDUX alumnus Sudhanshu Sane pursued improvements to the Lagrangian approach as part of his dissertation. Sudhanshu started by performing more extensive evaluations of the initial work, in order to provide a better baseline for future comparisons. His second work extended the possibilities for increasing efficiency. While the initial work placed pathlines at regular seeding locations and terminated them at regular time intervals, which we subsequently termed "fixed duration / fixed placement" (FDFP), Sudhanshu introduced "variable duration / variable placement" (VDVP) basis flows, which led to factor-of-two improvements. That said, his key contribution here was an interpolation scheme that can handle VDVP basis flows, opening the door to new strategies that go beyond FDFP's more regimented approach. Sudhanshu's third work optimized the time it takes to calculate basis flows in situ, reducing the burden on simulation codes. His main idea was to terminate basis flow calculation at block boundaries, creating a communication-free approach. While this approach clearly improves performance, the surprising aspect was how little error it introduced. This work was awarded Best Paper at EGPGV21. Finally, Sudhanshu's fourth work on Lagrangian flow put together all of his previous results and showed viability for both cosmology and seismology simulations on Oak Ridge's Summit supercomputer. This work was awarded Best Paper for ICCS21's Main Track, which received 650 submissions.
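To make the post hoc step concrete, here is a minimal Python sketch of reconstructing new trajectories from stored basis flows. The synthetic data, the array names, and the use of RBF-based scattered interpolation are illustrative assumptions for a simple FDFP-style setting, not the actual interpolation schemes from the papers (in particular, not the VDVP scheme).

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Hypothetical reduced output: each basis flow is a pathline traced in situ
# over the interval [t0, t1], stored as its start and end positions.
rng = np.random.default_rng(0)
starts = rng.uniform(0.0, 1.0, size=(1000, 3))          # seed positions at t0
ends = starts + 0.01 * rng.standard_normal((1000, 3))   # positions at t1

# Post hoc: approximate the flow map over [t0, t1] by scattered-data
# interpolation of the stored basis flows.
flow_map = RBFInterpolator(starts, ends, neighbors=8)

# Advance new, user-chosen seeds across the interval, even though the raw
# vector field was never written to disk.
new_seeds = np.array([[0.25, 0.50, 0.75]])
print(flow_map(new_seeds))
```

The accuracy of such a reconstruction depends on the density and duration of the basis flows, which is precisely the trade-off the FDFP and VDVP strategies navigate.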
Wavelet Compression

Our group has produced several results considering the viability of wavelet compression as an in situ reduction operator, and this topic was the primary dissertation question of Samuel Li. Sam's first research thrust was evaluating the integrity of scientific data sets after they undergo a wavelet transformation. In this work, Sam also demonstrated the benefit of bringing modern wavelet approaches (CDF kernels and prioritized coefficients) to scientific data. Sam's second research thrust was on how to carry out wavelet transforms on exascale computers. Sam had two works on this front, one on portably performant approaches and another on achieving better accuracy by leveraging deep memory hierarchies to embrace temporal coherence. Sam's final research thrust considered whether wavelet transformations could be accomplished within simulation codes' time budgets and evaluated the resulting savings at scale. Sam's work on this topic included runs of up to 1000 MPI ranks on NCAR's Cheyenne supercomputer. These works combined to answer his dissertation question: yes, wavelet compression is a viable reduction operator for exascale computing.

Finally, Nicole Marsaglia published an extension to this work, dynamically adapting the I/O budget among the wavelet compressors on each node so that it goes where it will do the most good.
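As an illustration of the reduction operator itself, here is a minimal sketch using PyWavelets: a multi-level transform with the CDF 9/7 kernel ("bior4.4" in PyWavelets), followed by coefficient prioritization, i.e., keeping only the largest coefficients. The field, the 5% budget, and the thresholding rule are hypothetical choices, not the configurations from Sam's papers.

```python
import numpy as np
import pywt

# A hypothetical smooth 3D scalar field from one simulation time step.
axis = np.linspace(0.0, np.pi, 64)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
field = np.sin(x) * np.sin(y) * np.sin(z)

# Multi-level wavelet transform with the CDF 9/7 kernel.
coeffs = pywt.wavedecn(field, wavelet="bior4.4", level=3)
arr, slices = pywt.coeffs_to_array(coeffs)

# Prioritized coefficients: keep only the largest 5% by magnitude; in an
# in situ setting, only these survivors would be written to disk.
threshold = np.quantile(np.abs(arr), 0.95)
arr[np.abs(arr) < threshold] = 0.0

# Post hoc: reconstruct and measure the error the reduction introduced.
recon = pywt.waverecn(
    pywt.array_to_coeffs(arr, slices, output_format="wavedecn"),
    wavelet="bior4.4",
)
recon = recon[: field.shape[0], : field.shape[1], : field.shape[2]]
rmse = np.sqrt(np.mean((field - recon) ** 2))
print(f"RMSE after keeping 5% of coefficients: {rmse:.3e}")
```

With a smooth field like this one, most of the signal concentrates in a few coefficients, so heavy truncation introduces little error; the prioritization step is what directs the I/O budget to the coefficients that matter most.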
Additional Results
While our in situ reduction research is deepest on Lagrangian flow and wavelets, we have also considered other approaches. First, Nicole Marsaglia looked at using temporal intervals, which can be useful for simulations that evolve slowly in time. Second, James Kress considered binning, and showed its applicability for fusion simulations; a simple sketch of the binning idea appears below. Third, Matt Larsen contributed a new technique for accelerating the rendering of multiple images simultaneously, motivated by the in situ reduction approach taken by the Cinema project (and others), where in situ processing is used to generate thousands of images rather than saving out mesh and field data. Finally, while not a research result, Samuel Li produced a very useful survey on data reduction operators for his Area Exam Paper, which was ultimately published in Computer Graphics Forum.
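The sketch below illustrates binning as a reduction operator by accumulating a hypothetical particle quantity onto a coarse regular grid; the data, the grid resolution, and the choice of a mean-value statistic are all assumptions for illustration, not the scheme from the fusion work.

```python
import numpy as np

# Hypothetical per-particle data from one simulation time step.
rng = np.random.default_rng(1)
positions = rng.uniform(0.0, 1.0, size=(1_000_000, 3))
energies = rng.uniform(0.0, 1.0, size=1_000_000)

# Binning as in situ reduction: accumulate onto a coarse 32^3 grid and
# save the small grid instead of the million raw particles.
bins = 32
e_sum, _ = np.histogramdd(positions, bins=bins, weights=energies)
count, _ = np.histogramdd(positions, bins=bins)
mean_energy = np.divide(e_sum, count, out=np.zeros_like(e_sum),
                        where=count > 0)
```

Even at this coarse resolution, the binned grid holds orders of magnitude fewer values than the raw particle data, which is the property that makes binning attractive under severe I/O constraints.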