The CDUX group pursues research at the intersection of HPC systems, AI, scientific visualization and analysis, scientific data reduction, computational science, and computer graphics.

High-Performance Computing

High-performance computing (HPC) harnesses supercomputers—machines built from thousands of interconnected nodes—to solve problems far beyond the reach of any single computer. CDUX studies how to run scientific workloads efficiently at extreme scale, addressing the challenges that arise from massive parallelism, complex communication patterns, deep memory hierarchies, and the increasingly heterogeneous architectures of modern leadership-class systems.

Our work spans the full range of the machine, from extracting performance on an individual node to coordinating computation across an entire supercomputer. The overarching goal is to enable scientific discovery on the world's largest computers, where data and computation are too large to be handled by conventional means.
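Coordinating computation across a whole machine usually starts with dividing the problem among many processes. The sketch below shows a generic one-dimensional block decomposition, one common way to split a domain among MPI-style ranks. It is an illustration, not CDUX code. The function name `block_range` is hypothetical, and real codes typically decompose in 2D or 3D and use a message-passing library for communication between blocks.

```python
def block_range(n_cells: int, n_ranks: int, rank: int) -> tuple[int, int]:
    """Return the half-open range [start, stop) of cells owned by `rank`.

    Hypothetical helper for a 1D block decomposition: the first
    (n_cells % n_ranks) ranks each receive one extra cell, so block
    sizes differ by at most one and every cell is owned exactly once.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("rank must lie in [0, n_ranks)")
    base, extra = divmod(n_cells, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop
```

For example, 10 cells over 4 ranks yields the blocks (0, 3), (3, 6), (6, 8), and (8, 10). Spreading the remainder one cell at a time keeps the load balanced, which matters when thousands of ranks must all finish a step before any can proceed.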

GPU Computing

Modern supercomputers derive most of their computational power from graphics processing units (GPUs), which deliver enormous parallelism but require fundamentally different programming approaches than traditional CPUs. CDUX develops techniques for running visualization, analysis, and data-reduction algorithms efficiently across diverse many-core architectures.

A central theme of this work is performance portability: writing software once and having it run well on GPUs from different vendors—as well as on multi-core CPUs—without rewriting code for each platform. This makes it possible to deploy the same scientific software across the constantly changing landscape of supercomputing hardware.
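The write-once idea can be sketched in miniature. The per-element computation (a "worklet", here a vector magnitude) is written once, and a dispatcher chooses which backend executes it. This is only a Python analogy under stated assumptions. The serial loop and thread pool stand in for real device backends such as CUDA, HIP, or OpenMP, and none of the names (`magnitude`, `dispatch`, `BACKENDS`) correspond to an actual library API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# The per-element "worklet" is written once and knows nothing about
# which backend will execute it.
def magnitude(v: tuple[float, float, float]) -> float:
    return (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5

# Stand-in backends; real portable codes would target GPUs and multi-core CPUs.
def run_serial(worklet: Callable, data: Sequence) -> list:
    return [worklet(x) for x in data]

def run_threaded(worklet: Callable, data: Sequence, workers: int = 4) -> list:
    # Executor.map preserves input order, so results line up with data.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worklet, data))

BACKENDS = {"serial": run_serial, "threads": run_threaded}

def dispatch(worklet: Callable, data: Sequence, backend: str = "serial") -> list:
    # Backend selection is a runtime choice; the worklet itself is unchanged.
    return BACKENDS[backend](worklet, data)
```

For instance, `dispatch(magnitude, [(3.0, 4.0, 0.0)], backend="threads")` returns `[5.0]`, which is the same result the serial backend produces. Adding a new platform means adding a backend, not rewriting the worklet.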

Artificial Intelligence & Machine Learning

Artificial intelligence and machine learning are transforming how scientists analyze and understand data. CDUX explores the intersection of AI/ML with high-performance computing and scientific visualization—using learned models to accelerate simulation and analysis, to guide automated decision-making during in situ processing, and to extract insight from massive scientific data sets.

We are also interested in the systems challenges of training and deploying models efficiently at scale, and in how AI techniques can help address long-standing problems in visualization and analysis when there is no human in the loop.

What Is Scientific Visualization?

Visualization is the branch of computer science devoted to analyzing data by visual means. The field is frequently divided into two sub-disciplines: scientific visualization and information visualization. The primary distinction between them is that scientific visualization data normally has an implied spatial layout, while information visualization data does not. Scientific visualization techniques can therefore exploit the spatial properties of the data (meshes, grids, vectors, tensors, etc.) and use the three-dimensional capabilities of today's graphics engines to present it visually. The data associated with scientific visualization is often "scientific" in nature: engineering, climate, medical, etc. Information visualization data has no accompanying spatial information, so the resulting visualizations have more flexibility in how the data is arranged. The data associated with information visualization is diverse: tax records, Twitter feeds, network usage statistics, etc. For both scientific and information visualization, however, the goal is to enable insight into data.
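To make "implied spatial layout" concrete, consider a uniform grid, one common scientific data layout. A point's position is never stored. It follows from the point's index, the grid origin, and the spacing. The class below is a hypothetical minimal sketch, not any particular library's data model. A tax record, by contrast, has no index-to-position mapping at all.

```python
from dataclasses import dataclass

@dataclass
class UniformGrid:
    """Hypothetical 3D uniform grid: positions are implied, not stored."""
    dims: tuple[int, int, int]           # number of points along x, y, z
    origin: tuple[float, float, float]   # position of point (0, 0, 0)
    spacing: tuple[float, float, float]  # distance between neighboring points

    def point(self, i: int, j: int, k: int) -> tuple[float, float, float]:
        # The coordinate is computed from the index on demand.
        return (self.origin[0] + i * self.spacing[0],
                self.origin[1] + j * self.spacing[1],
                self.origin[2] + k * self.spacing[2])

    def flat_index(self, i: int, j: int, k: int) -> int:
        # Position of point (i, j, k) in a flat array of field values,
        # with x varying fastest.
        nx, ny, _ = self.dims
        return i + nx * (j + ny * k)
```

With `UniformGrid((4, 3, 2), (0.0, 0.0, 0.0), (0.5, 0.5, 1.0))`, point (2, 1, 1) lies at (1.0, 0.5, 1.0), and its value sits at index 18 of a flat field array. Neighbors are likewise implied by the indices, which is the spatial structure that visualization algorithms exploit.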

General Challenges for Large-Data Scientific Visualization

Large data visualization is the area of visualization devoted to very large and complex data sets. There are two key challenges with large-data visualization:

  • The first challenge is obvious: how do we process data at this scale? How can we load a terabyte (or petabyte, or exabyte), apply an algorithm to it, and render the result?
  • The second challenge is more subtle: how do we gain insight from large data? That is, even if techniques exist for processing very large data sets, how do we ensure that the resulting images do not contain more information than the human visual system can interpret?
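A quick back-of-the-envelope calculation shows why both challenges matter. The numbers here are illustrative assumptions, not measurements from any particular simulation: a 4096³-point grid, one 32-bit float per point, and a 4K display. Under these assumptions, a single variable at a single time step already occupies 0.25 TiB, and there are more than 8,000 grid points for every pixel on the screen.

```python
def data_size_bytes(points: int, bytes_per_value: int = 4, variables: int = 1) -> int:
    # Raw storage for `variables` scalar fields at one time step.
    return points * bytes_per_value * variables

def points_per_pixel(points: int, width: int = 3840, height: int = 2160) -> float:
    # How many data points must, on average, share one screen pixel.
    return points / (width * height)

if __name__ == "__main__":
    n = 4096 ** 3  # hypothetical grid size
    print(f"{data_size_bytes(n) / 2**40:.2f} TiB per variable per time step")
    print(f"{points_per_pixel(n):,.0f} grid points per 4K pixel")
```

Multiplying by dozens of variables and thousands of time steps makes the first challenge clear. The points-per-pixel ratio shows the second: without careful reduction or selection, most of the data cannot be shown at all.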

Challenges for Visualization on Supercomputers: In Situ Processing, Massive Parallelism, and Many-Core Nodes

While large scientific data sets can come from experimental or observational sources, such as medical instrumentation or sensor networks, they often come from simulations, and specifically from simulations running on supercomputers. In this case, the data sets are usually stored on the supercomputer where they were generated, and the typical approach is to process them with parallel techniques on that same machine.

Traditionally, visualization on supercomputers has been post hoc: a simulation saves its data to disk, and later a visualization program loads the data and operates on it. However, supercomputing trends favor compute capability over I/O bandwidth: disk speeds are improving, but much more slowly than the ability to generate data. Worse, visualization performance on supercomputers was already frequently limited by the time needed to read data from disk, so longer load times significantly increase overall execution time. Taken together, these trends are pushing our community away from post hoc processing and toward in situ processing, i.e., visualizing data as it is generated.

Finally, leading-edge supercomputers are made up of many nodes, each containing multiple GPUs, and with in situ processing our visualization algorithms must run on these architectures. Achieving high efficiency in this complex setting requires innovation.
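The difference between post hoc and in situ processing can be sketched as a toy time-stepping loop. Everything here is hypothetical: the solver update, the field, the `analyze` callback, and the `every` frequency parameter. Production in situ infrastructures provide much richer interfaces, but the core idea is the same. Analysis runs on data that is already in memory, which avoids a round trip through the file system.

```python
import math
from typing import Callable

def simulate(n_steps: int,
             analyze: Callable[[int, list[float]], None],
             every: int = 10) -> None:
    """Toy simulation loop with an in situ hook (hypothetical interface).

    Instead of writing each step to disk for later (post hoc) processing,
    the loop hands its in-memory state to `analyze` every `every` steps.
    """
    field = [math.sin(0.1 * i) for i in range(1000)]  # hypothetical initial state
    for step in range(1, n_steps + 1):
        field = [0.99 * v for v in field]             # stand-in for a solver update
        if step % every == 0:
            analyze(step, field)                      # in situ: no disk round trip

results: list[tuple[int, float]] = []

def record_max(step: int, field: list[float]) -> None:
    # Keep only a tiny summary rather than the full field.
    results.append((step, max(field)))

simulate(50, record_max, every=10)
```

After the run, `results` holds one small summary per analyzed step (steps 10 through 50) instead of fifty full snapshots on disk. The `every` parameter is the knob that trades analysis detail against added run time.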

These supercomputing trends have led to many interesting research questions for visualization:

Finally, CDUX personnel have contributed to several resources which are useful for learning more about in situ processing on supercomputers:

Scientific Data Reduction

Scientific simulations now generate data far faster than it can be stored to disk or moved across a network. As a result, data reduction has become essential to modern science. CDUX develops compression and reduction techniques—including error-bounded lossy compression—that dramatically shrink data while preserving the features scientists care about.

By controlling exactly how much error is introduced, these methods let scientists save, transfer, and analyze their data within the tight I/O and storage budgets of today's supercomputers, without sacrificing scientific fidelity. Data reduction is tightly connected to in situ processing, where reduced data must be produced as a simulation runs.
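The simplest way to guarantee an error bound is uniform scalar quantization. Each value is replaced by the index of a bin of width 2·ε, and it is reconstructed at the bin's center, so no value moves by more than ε. This is a generic sketch, not CDUX's compressor. Practical error-bounded compressors typically add prediction and lossless encoding on top of this step, because the small integer codes compress far better than raw floats. The function names are hypothetical.

```python
def compress(values: list[float], error_bound: float) -> list[int]:
    """Quantize each value to an integer bin of width 2 * error_bound."""
    if error_bound <= 0:
        raise ValueError("error_bound must be positive")
    return [round(v / (2 * error_bound)) for v in values]

def decompress(codes: list[int], error_bound: float) -> list[float]:
    # Reconstruct each value at the center of its bin, so that
    # |original - reconstructed| <= error_bound (up to float rounding).
    return [c * 2 * error_bound for c in codes]
```

For example, with `error_bound=0.01`, every reconstructed value lies within 0.01 of the original. The scientist chooses the bound, and the compressor spends bits only where that bound requires them.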

Performance Analysis

Achieving good performance on supercomputers requires understanding where time and resources are spent. CDUX studies performance analysis and modeling for visualization and analysis workloads—characterizing how algorithms behave across different architectures, identifying bottlenecks, and building models that predict performance and guide tuning.
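One widely used performance model is the roofline model. It bounds a kernel's attainable throughput by either the machine's peak compute rate or its memory bandwidth multiplied by the kernel's arithmetic intensity (operations per byte moved). It appears here as a generic example of performance modeling, not as the specific models CDUX builds. The hardware numbers below are hypothetical.

```python
def roofline(peak_flops: float, peak_bandwidth: float,
             arithmetic_intensity: float) -> float:
    """Attainable FLOP/s under the roofline model.

    peak_bandwidth is in bytes/s; arithmetic_intensity is FLOPs per byte
    moved between memory and the processor.
    """
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)

def is_memory_bound(peak_flops: float, peak_bandwidth: float,
                    arithmetic_intensity: float) -> bool:
    # Below the "ridge point" peak_flops / peak_bandwidth, bandwidth limits speed.
    return arithmetic_intensity < peak_flops / peak_bandwidth
```

On a hypothetical device with 20 TFLOP/s of peak compute and 2 TB/s of bandwidth, the ridge point is 10 FLOPs per byte. A kernel doing 0.5 FLOPs per byte can attain at most 1 TFLOP/s, so it is memory bound. Visualization and analysis kernels often have low arithmetic intensity, so reducing data movement usually helps them more than adding compute.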

This understanding is key to using expensive computing resources efficiently and to making informed design decisions, particularly for in situ processing, where the cost of visualization must be weighed against the cost of the simulation it accompanies.
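One simple way to weigh those costs is to fix an overhead budget and derive how often analysis may run. The heuristic below is a hypothetical sketch, not a CDUX method. It assumes constant per-step simulation and per-invocation analysis times and ignores any overlap between them.

```python
import math

def analysis_interval(sim_step_seconds: float, analysis_seconds: float,
                      overhead_budget: float) -> int:
    """Smallest interval k (in steps) keeping in situ overhead within budget.

    overhead_budget is the allowed ratio of analysis time to simulation
    time, e.g. 0.25 means analysis may add at most 25% to the run.
    Solves analysis_seconds / (k * sim_step_seconds) <= overhead_budget.
    """
    if sim_step_seconds <= 0 or overhead_budget <= 0:
        raise ValueError("step time and budget must be positive")
    return max(1, math.ceil(analysis_seconds / (overhead_budget * sim_step_seconds)))
```

For example, if a simulation step takes 2 s, one analysis pass takes 3 s, and analysis may add at most 25%, then `analysis_interval(2.0, 3.0, 0.25)` gives 6, so analysis should run every sixth step. A performance model that predicts the analysis time is what makes such a choice possible before the run starts.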