Tech Project

CATALYST - Data Analytics for HPC
CATALYST advances computing by converging HPC and Big Data, enabling workflows that combine simulation and analytics.
From Oct. 1, 2016 to Sept. 1, 2019

Description of the challenges faced by the Tech Project

The challenge in the project is twofold. The first is to find the right way to extract the interesting information from the available data. The second is considerably harder: finding and extracting relevant information from a data pool, often combining several data sources, without knowing in advance what one is looking for. The available hardware and the specific software stack described below are the tools for tackling these challenges. For example, one CATALYST task tries to classify turbulence in flow data sets, while another tries to extract additional information from real-time public transport data.
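To make the turbulence-classification task concrete, here is a deliberately simplified sketch that is not the project's actual method: it labels flow samples by a single Reynolds number using the classic pipe-flow thresholds (laminar below roughly 2300, turbulent above roughly 4000), whereas the real task works on full flow data sets.

```python
# Illustrative sketch only: coarse flow-regime labels from a single
# Reynolds number, using the classic pipe-flow thresholds. The actual
# CATALYST task classifies turbulence in complete flow data sets.

def classify_flow(reynolds: float) -> str:
    """Return a coarse flow-regime label for a Reynolds number."""
    if reynolds < 2300:
        return "laminar"
    if reynolds <= 4000:
        return "transitional"
    return "turbulent"

samples = [1200.0, 3100.0, 25000.0]
print([classify_flow(re) for re in samples])
# ['laminar', 'transitional', 'turbulent']
```

The point of the simplification is the shape of the problem: mapping raw numerical data to a small set of physically meaningful classes, which at scale becomes a data-analytics task rather than a lookup.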

Brief description of technology

HLRS operates two Cray Urika-GX systems, which combine highly specialised hardware components with a tailored software stack to enable High Performance Data Analytics. Within the scope of the project, the artist will get access to 64 system nodes, each consisting of two Intel Xeon processors, 512 GB of memory, and a local 1.6 TB PCIe SSD scratch storage for data caching. In addition, access to a 240 TB Lustre file system is provided for large-scale data. Together with the hardware, the artist will be able to use the optimised Cray Data Analytics software stack, which encompasses state-of-the-art tools such as Apache Hadoop, Apache Spark, and the Cray Graph Engine, among others. The Urika-GX system is built on CentOS 7 Linux and can be accessed via standard remote shell connections (SSH). For simplified system usage, tools for programming and system management, such as Jupyter Notebooks or an adapted OpenStack dashboard, are available as well.
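Analytics jobs on such a stack are typically written against Apache Spark, whose RDD API follows a map/reduce vocabulary (flatMap, map, reduceByKey). The sketch below illustrates that pattern with a word count in pure standard-library Python, so it runs without a Spark cluster; it is an illustration of the programming model, not code from the project.

```python
# Illustrative only: a word count in the map/reduce style that
# Apache Spark's RDD API (flatMap -> map -> reduceByKey) follows.
# Standard-library Python, so no Spark installation is required.
from collections import Counter

def word_count(lines):
    """Count lower-cased word occurrences across an iterable of lines."""
    words = (w.lower() for line in lines for w in line.split())  # flatMap
    return Counter(words)  # plays the role of map + reduceByKey

doc = ["Big Data meets HPC", "HPC simulation and Big Data analytics"]
print(word_count(doc)["big"])  # 2
```

In actual PySpark the same computation would distribute the per-line work across the 64 nodes; the shape of the code stays close to this sketch.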

What the project is looking to gain from the collaboration and what kind of artist would be suitable

Because Big Data methods can be applied to all kinds of subjects and topics, we are quite open about the context of the artwork; this includes engineering challenges as well as current social or ecological problems. We imagine that a cooperation with an artist can bring new and innovative ideas to our fields and directions of research, especially new insights into the different fields and contexts of Big Data. As we are dealing with a complex and abstract topic, we would also welcome an artwork that visualizes and explains our research outcomes in an understandable manner. Artists should have knowledge of programming and computing, and ideally some experience with High Performance Computing. A mathematical background would also be favourable.

Resources available to the artist

The residency will take place at the High Performance Computing Center Stuttgart (HLRS). HLRS operates leading-edge High Performance Computing (HPC) systems and engages in teaching, general research, and industry collaborations in the field of HPC. Its current largest system, the supercomputer Hazel Hen, has a peak performance of 7.42 petaflops. We will offer the artist a permanent workplace, including an internet connection, at HLRS for the period of the residency. The artist will be integrated into the different work fields of the project contributors at HLRS and supported by the involved staff members according to their work schedule and workload.