The CDUX group pursues research in scientific visualization, high-performance computing, computational science, and computer graphics, and especially focuses on problems where these areas intersect. That said, much of our group's work to date has focused on visualization on supercomputers. This page provides more background on this topic, and concludes with links to pages summarizing some research highlights from CDUX members.

What Is Scientific Visualization?

Visualization is the branch of computer science devoted to analyzing data by visual means. The field is frequently divided into two sub-disciplines: scientific visualization and information visualization. The primary distinction between the two is that scientific visualization data normally has an implied spatial layout, while information visualization data does not. With scientific visualization, techniques can exploit the spatial properties of the data (meshes, grids, vectors, tensors, etc.) and utilize the three-dimensional capabilities of today’s graphics engines to present the analysis visually. The types of data associated with scientific visualization are often "scientific" in nature: engineering, climate, medical, etc. With information visualization, the data has no accompanying spatial information, so the resulting visualizations have more flexibility in how the data is arranged. The types of data associated with information visualization are diverse: tax records, Twitter feeds, network usage statistics, etc. For both sub-disciplines, however, the goal is the same: to enable insight into data.

General Challenges for Large-Data Scientific Visualization

Large-data visualization is the area of visualization devoted to very large and complex data sets. There are two key challenges:

  • The first challenge is obvious: how do we process data at this scale? How do we load a terabyte (or petabyte, or exabyte) of data, apply an algorithm to it, and render the result?
  • The second challenge is more subtle: how do we gain insight from large data? That is, assuming techniques do exist for processing very large data sets, how do we ensure that the resulting images are not beyond what the human visual system can interpret?

Challenges for Visualization on Supercomputers: In Situ Processing, Massive Parallelism, and Many-Core Nodes

While large scientific data sets can come from experimental or observational sources, such as medical instrumentation or sensor networks, they often come from simulation, and specifically from simulations on supercomputers. In this case, data sets are often stored on the supercomputers where they were generated, and the typical processing approach is to use parallel techniques on the supercomputer itself. Traditionally, the processing paradigm for visualization on supercomputers has been post hoc, i.e., a simulation program saves its data to disk and, later, a visualization program loads this data and operates on it. However, supercomputing trends favor compute capability over I/O bandwidth: disk speeds are improving, but not nearly as fast as the ability to generate data. Worse, visualization performance on supercomputers is already frequently limited by the time it takes to read data from disk, so longer data load times significantly worsen overall execution time. Taken together, these trends mean post hoc processing is falling out of favor with our community. Instead, we are moving towards in situ processing, i.e., visualizing data as it is generated. Finally, leading-edge supercomputers are made up of many nodes, each of which contains multiple GPUs, and, with in situ processing, our visualization algorithms need to run on these architectures. This complex setting requires innovation to achieve high efficiency.
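
To make the distinction between the two paradigms concrete, the sketch below places them side by side inside a toy simulation loop. The field, the file format, and the saveToDisk and visualizeInSitu routines are hypothetical stand-ins rather than the interface of any particular simulation code or visualization library; real in situ infrastructures offer far richer hooks, but the essential control flow is the same: with in situ processing the visualization runs while the data is still in memory, rather than after a round trip through the file system.

    // Toy sketch contrasting post hoc and in situ visualization.
    // The "simulation," the file format, and the visualizeInSitu routine are
    // hypothetical stand-ins, not the interface of any real library.
    #include <cstdio>
    #include <vector>

    struct Field {
        std::vector<double> values;   // a toy simulation field
    };

    // Post hoc: write the field to disk; a separate visualization program
    // reads it back later. The cost of this I/O grows with the data size.
    void saveToDisk(const Field& f, int step) {
        char name[64];
        std::snprintf(name, sizeof(name), "step_%04d.dat", step);
        std::FILE* fp = std::fopen(name, "wb");
        if (fp) {
            std::fwrite(f.values.data(), sizeof(double), f.values.size(), fp);
            std::fclose(fp);
        }
    }

    // In situ: analyze the data while it is still in memory, alongside the
    // simulation. The summary below stands in for isosurfacing, rendering, etc.
    void visualizeInSitu(const Field& f, int step) {
        double maxVal = 0.0;
        for (double v : f.values)
            if (v > maxVal) maxVal = v;
        std::printf("step %d: max value = %g\n", step, maxVal);
    }

    int main() {
        Field f;
        f.values.assign(1000, 0.0);

        const bool postHoc = false;   // flip to compare the two workflows
        const int numSteps = 100;
        const int outputEvery = 10;

        for (int step = 0; step < numSteps; ++step) {
            // Advance the toy "simulation" (stand-in for real physics).
            for (double& v : f.values)
                v += 0.001;

            if (step % outputEvery == 0) {
                if (postHoc)
                    saveToDisk(f, step);       // visualize later, from files on disk
                else
                    visualizeInSitu(f, step);  // visualize now, no round trip to disk
            }
        }
        return 0;
    }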

These supercomputing trends have led to many interesting research questions for visualization:

Finally, CDUX personnel have contributed to several resources that are useful for learning more about in situ processing on supercomputers: