HPC Meets GPUs: A Love Story

When HPC first encountered the GPU, the two appeared to have very little in common. HPC was all about parallel computing, using powerful supercomputers and Linux cluster management to solve complex computational problems, whereas GPUs were computer hardware that had found a niche in speeding up gaming graphics. Fast forward to today: software-defined HPC is becoming an integral part of computing, and the roles of CPUs and GPUs are changing.

Full article here

ANF: Performance Evaluation of HPC Codes

Groupe Calcul, in partnership with the European HPC centres of excellence EoCoE and POP, is offering a national training course (Action Nationale de Formation, ANF) dedicated to the performance evaluation of HPC computing codes, from 16 to 20 September 2019 at the Observatoire de Haute Provence.

Two tracks:

» “Tools basics” track, 2 days, open to 25-30 participants: introduction to the Scalasca and Paraver tools (lectures + hands-on sessions).

» “HPC diagnostics” track, 3 additional days, open to 10 participants: integrating the tools into your codes, producing and analysing performance metrics, and drawing up an optimization roadmap.

Courses, accommodation and meals are covered.

Because the number of places is limited, applications will be evaluated on your motivation, your project, and the computing code you wish to work on, including its maturity and its impact on scientific communities.

Pre-registration deadline.

17 May 2019

Registration and information.

ISC 2019 Registration Now Open

Reap Early Bird Savings!

By registering between now and May 8 for the full conference pass, you can save over 45 percent off the onsite registration rates.

Passes are available under four different categories: tutorial, conference, exhibition and workshop, depending on your individual requirements. For the complete fee structure and pass descriptions, click here. Please note that tutorials and workshops require separate registrations.

ISC Conference Keynote: THE ALGORITHMS OF LIFE – SCIENTIFIC COMPUTING FOR SYSTEMS BIOLOGY

Monday, June 17

We are excited to announce Professor Ivo Sbalzarini as the 2019 conference opening keynote speaker. Under the title, The Algorithms of Life – Scientific Computing for Systems Biology, Sbalzarini will discuss how HPC is being used as a tool for scientific investigation and for hypothesis testing, as well as a more fundamental way to think about problems in systems biology. As the Chair of Scientific Computing for Systems Biology on the faculty of computer science of TU Dresden, Sbalzarini is fostering innovative ways to apply scientific computing to the life science domain. He is also the Senior Research Group Leader at the Max Planck Institute of Molecular Cell Biology and Genetics, as well as a member of the Faculty of Mathematics of TU Dresden.

Plan now to attend all three ISC keynotes. They are inspiring as well as educational.

Attendee Info & Agenda Planner

After long consideration, we have decided to go paper-free starting in 2019. All program-related information will be available in our Agenda Planner, and all general information can be found on the Attendee Info page.

We encourage you to actively use these two links to prepare for ISC 2019.

The Machine Learning Day

This year’s Machine Learning Day is an exciting track, headed by two experienced data scientists. Dr. Yu Wang is a senior AI scientist at the Leibniz Supercomputing Centre and Dr. Azalia Mirhoseini is a senior researcher at Google Brain, leading ML for Systems Moonshot. The full program is available on the agenda planner.

In a few weeks you will also be able to find the Industrial Day program online.

All ISC Workshops Now Online

We are pleased to let you know that all ISC 2019 workshops are now online. We are hosting a total of 22 workshops. Some are full-day sessions, while the others are half-day sessions taking place in either the morning or the afternoon of Thursday, June 20.

Please note that if you book a full conference pass, you can save 50 euros off your workshop registration.

ISC Pre-Conference Party – Reserve Your Spot!

We are organizing a pre-conference party this year! We are looking forward to hosting 400 attendees at this event. This party is a great way to start the conference. For sign-up information, click here.

19th PRACE Call for Proposals for Project Access

The 19th PRACE Call for Proposals for Project Access is open until 30 April 2019. Awarded resources on Tier-0 systems are allocated for one year at a time, with utilisation starting in October 2019.

» Call for proposals: 5 March 2019 – 30 April 2019, 10:00 AM CEST
» Allocation period for awarded proposals: 1 October 2019 – 30 September 2020
» Available resources: Piz Daint (ETH Zürich/CSCS, Switzerland), Joliot-Curie SKL and Joliot-Curie KNL (GENCI@CEA, France), JUWELS (GCS@JSC, Germany), MareNostrum (BSC, Spain), HAWK (GCS@HLRS, Germany), Marconi-Broadwell and Marconi KNL (CINECA, Italy).

GENCI has set up an information support service to help you submit your PRACE project. Please send your requests to appels-prace@genci.fr.

More information is available at the PRACE web site

Liger Outage

Scheduled Quarterly Maintenance

Please note that a scheduled quarterly maintenance of our infrastructure (see the mesocentre calendar) will take place on Thursday, March 21 between 1:30 pm and 5:30 pm CEST, instead of March 12-13 as shown on the calendar. It takes advantage of a general power outage of the site on the same day, recently scheduled by the École for audit tests.

As a result, the LIGER cluster and its related services will be unreachable during this period, but submitted jobs will be preserved. To prevent new submissions, all compute nodes will be drained from 8:00 am CEST on March 21 and, exceptionally, all queued jobs will be lost.

We apologize in advance for the service disruption and the inconvenience.

Topics Related to HPC in 2019

Algorithms

The development, evaluation and optimization of scalable, general-purpose, high performance algorithms.

» Algorithmic techniques to improve energy and power efficiency
» Algorithmic techniques to improve load balance
» Data-intensive parallel algorithms
» Discrete and combinatorial problems
» Fault-tolerant algorithms
» Graph and network algorithms
» Hybrid/heterogeneous/accelerated algorithms
» Numerical methods and algebraic systems
» Scheduling algorithms
» Uncertainty quantification
» Other high performance algorithms

Applications

The development and enhancement of algorithms, parallel implementations, models, software and problem solving environments for specific applications that require high performance resources.

» Bioinformatics and computational biology
» Computational earth and atmospheric sciences
» Computational materials science and engineering
» Computational astrophysics/astronomy, chemistry, and physics
» Computational fluid dynamics and mechanics
» Computation and data-enabled social science
» Computational design optimization for aerospace, energy, manufacturing, and industrial applications
» Computational medicine and bioengineering
» Other high performance applications
» Use of uncertainty quantification, statistical, and machine-learning techniques to improve a specific HPC application
» Improved models, algorithms, performance or scalability of specific applications and respective software

Architecture and Networks

All aspects of high performance hardware including the optimization and evaluation of processors and networks.

» Memory systems: caches, memory technology, non-volatile memory, memory system architecture (to include address translation for cores and accelerators)
» I/O architecture/hardware and emerging storage technologies
» Network protocols, quality of service, congestion control, collective communication
» Scalable and composable coherence (for cores and accelerators)
» Multi-processor architecture and micro-architecture (e.g. reconfigurable, vector, stream, dataflow, GPUs, and custom/novel architecture)
» Interconnect technologies, topology, switch architecture, optical networks, software-defined networks
» Architectures to support extremely heterogeneous composable systems (e.g., chiplets)
» Secure architectures, side-channel attacks, and mitigation
» Power-efficient design and power-management strategies
» Resilience, error correction, high availability architectures
» Software/hardware co-design, domain specific language support
» Evaluation and measurement on testbed or production hardware systems
» Hardware acceleration of containerization and virtualization mechanisms for HPC

Clouds and Distributed Computing

All software aspects of clouds and distributed computing that are related to HPC systems, including software architecture, configuration, optimization and evaluation.

» Compute and storage cloud architectures including many-core computing and accelerators in the cloud
» HPC and cloud convergence at infrastructure and software level
» Innovative methods for using cloud-based systems for HPC applications
» Support and tuning of Big Data cloud data ecosystems on HPC infrastructures
» Parallel programming models and tools at the intersection of cloud and HPC
» Virtualization and containerization for HPC, virtualized high performance I/O network interconnects, parallel and distributed file systems in virtual environments
» Cloud workflow, data, and resource management including dynamic resource provisioning
» Methods, systems, and architectures for scalable data stream processing
» Scheduling, load balancing, resource provisioning, energy efficiency, fault tolerance, and reliability for cloud computing
» Self-configuration, management, information services, and monitoring
» Service-oriented architectures and tools for integration of clouds, clusters, and distributed computing
» Cloud security and identity management
» Science case studies on cloud infrastructure
» Machine learning for science in the cloud

Data Analytics, Visualization, and Storage

All aspects of data analytics, visualization, storage, and storage I/O related to HPC systems. Submissions on work done at scale are highly favored.

» Databases and scalable structured storage for HPC
» Data mining, analysis, and visualization for modeling and simulation
» Data analytics and frameworks supporting data analytics
» Ensemble analysis and visualization
» I/O performance tuning, benchmarking, and middleware
» Scalable storage systems
» Next-generation storage systems and media
» Parallel file, object, key-value, campaign, and archival systems
» Provenance, metadata, and data management
» Reliability and fault tolerance in HPC storage
» Scalable storage, metadata, namespaces, and data management
» Storage tiering, entirely on-premise internal tiering as well as tiering between on-premise and cloud
» Storage innovations using machine learning such as predictive tiering, failure, etc.
» Storage networks
» Cloud-based storage

Machine Learning and HPC

The development and enhancement of algorithms, systems, and software for scalable machine learning utilizing high-performance and cloud computing platforms.

» Machine learning and optimization models for extreme scale systems
» Enhancing applicability of machine learning in HPC (e.g. usability)
» Learning large models / optimizing hyperparameters (e.g. deep learning, representation learning)
» Facilitating very large ensembles in extreme scale systems
» Training machine learning models on large datasets and scientific data
» Overcoming the machine learning problems inherent to large datasets (e.g. noisy labels, missing data, scalable ingest)
» Large scale machine learning applications utilizing HPC
» Future research challenges for machine learning at large scale
» Hybrid machine learning algorithms for hybrid HPC compute architectures
» Systems, compilers, and languages for machine learning at scale

Performance Measurement, Modeling, and Tools

Novel methods and tools for measuring, evaluating, and/or analyzing performance for large scale systems.

» Analysis, modeling, prediction, or simulation methods
» Empirical measurement techniques on HPC systems
» Scalable tools and instrumentation infrastructure for measurement, monitoring, and/or visualization of performance
» Novel and broadly applicable performance optimization techniques
» Methodologies, metrics, and formalisms for performance analysis and tools
» Workload characterization and benchmarking techniques
» Performance studies of HPC subsystems such as processor, network, memory, accelerators, and storage
» System-design tradeoffs between different measures of performance (e.g., performance and resilience, performance and security)

Programming Systems

Technologies that support parallel programming for large-scale systems as well as smaller-scale components that will plausibly serve as building blocks for next-generation HPC architectures.

» Parallel programming languages, libraries, models, and notations
» Programming language and compilation techniques for reducing energy and data movement (e.g., precision allocation, use of approximations, tiling)
» Solutions for parallel-programming challenges (e.g., interoperability, memory consistency, determinism, race detection, work stealing, or load balancing)
» Parallel application frameworks
» Tools for parallel program development (e.g., debuggers and integrated development environments)
» Program analysis, synthesis, and verification to enhance cross-platform portability, maintainability, result reproducibility, resilience (e.g., combined static and dynamic analysis methods, testing, formal methods)
» Compiler analysis and optimization; program transformation
» Runtime systems as they interact with programming systems

State of the Practice

All R&D aspects of the pragmatic practices of HPC, including operational IT infrastructure, services, facilities, large-scale application executions and benchmarks.

» Bridging of cloud data centers and supercomputing centers
» Comparative system benchmarking over a wide spectrum of workloads
» Deployment experiences of large-scale infrastructures and facilities
» Facilitation of “big data” associated with supercomputing
» Long-term infrastructural management experiences
» Pragmatic resource management strategies and experiences
» Procurement, technology investment and acquisition best practices
» Quantitative results of education, training and dissemination activities
» User support experiences with large-scale and novel machines
» Infrastructural policy issues, especially international experiences
» Software engineering best practices for HPC

System Software

Operating system (OS), runtime system and other low-level software research & development that enables allocation and management of hardware resources for HPC applications and services.

» Alternative and specialized parallel operating systems and runtime systems
» Approaches for enabling adaptive and introspective system software
» Communication optimization
» Software distributed shared memory systems
» System-software support for global address spaces
» OS and runtime system enhancements for attached and integrated accelerators
» Interactions among the OS, runtime, compiler, middleware, and tools
» Parallel/networked file system integration with the OS and runtime
» Resource management
» Runtime and OS management of complex memory hierarchies
» System software strategies for controlling energy and temperature
» Support for fault tolerance and resilience
» Virtualization and virtual machines