Given recent initiatives from funding agencies and a push to move academic research to be more openly accessible, managing research data has become a critical part of the research process. This tutorial will discuss how to adequately manage your data to ensure optimum visibility for you and your project, peer-reviewed publication options, and how to be more competitive when applying for research grants. Topics will include: data storage, metadata, writing a successful data management plan, accessibility, and ways to use data to promote your research.
The goal of this session is to give newcomers to Linux the skills and knowledge to work confidently at the Linux command line. We'll cover basic concepts, logging in, the Linux shell, useful commands, navigating the directory structure, creating and editing files, running programs, and a few handy tricks and tips. If you would like to follow along with the examples, please bring a laptop that a) runs Linux, or b) allows you to log in to a Linux server using ssh.
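To give a flavor of the session, here is the sort of command sequence we will practice (the directory and file names below are illustrative, not part of the course material):

```shell
# A few of the basic commands covered in the session:
pwd                       # print the current working directory
mkdir demo                # create a new directory
cd demo                   # move into it
echo "hello" > notes.txt  # create a file containing one line
cat notes.txt             # display the file's contents
ls -l                     # list files with details
cd ..                     # return to the parent directory
```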
Computational efficiency, with an increased focus on performance per energy cost, has become the overarching driver behind architectural considerations for Exascale HPC. Developing scientific and engineering application software that realizes the full potential of such Exascale systems will require a level of hardware/software co-design that is a relatively new approach in conventional HPC. NVIDIA GPUs are a centerpiece of the first pre-Exascale systems announced in the USA under the U.S. Department of Energy CORAL partnership, and are a key component of co-design collaborations that will strive for accelerated application readiness by system deployment in 2017. This presentation will first examine the motivation and progress of GPU-based heterogeneous system architectures for the pre-Exascale phase of HPC. The second topic will introduce the requirements for extracting fine-grain parallelism from application software and the current state of GPU-accelerated modeling and simulation applications, with a review of the programming strategies deployed. Selected examples will provide relevance to science-scale HPC practice that quantifies the benefits of heterogeneous vs. CPU-only computing. In addition to GPU use with x86 CPUs starting from the mid-2000s, the POWER and ARM-64 CPU architectures have become available alternatives starting in 2014. The third and final topic will provide roadmaps of GPU hardware and system software, and of interoperability with these new host CPU platforms.
Globus is software-as-a-service for research data management, used at dozens of institutions and national facilities for moving and sharing big data. Recent additions to Globus are services for data publication and discovery that enable: publication of large research data sets with appropriate policies for all types of institutions and researchers; the ability to publish data using your own storage or cloud storage that you manage, without third party publishers; extensible metadata that describe the specific attributes for your field of research; publication and curation workflows that can be easily tailored to meet institutional requirements; public and restricted collections that give you complete control over who may access your published data; a rich discovery model that allows others to search and use your published data. This presentation will give an overview and demonstration of these services, as well as case studies that illustrate how institutions are using Globus for data publication and discovery.
Scalable systems management might appear to be a new problem that needs to be solved as cloud and hyperscale solutions become more mainstream. However, scalable management has needed a solution in high performance computing for almost 20 years, beginning with the emergence of Beowulf clusters. Several open source and proprietary toolkits exist today, each differing in its ability to quickly and easily scale. One such toolkit, the Extreme Cluster Administration Toolkit (xCAT: http://sourceforge.net/p/xcat/wiki/Main_Page/), first came on the systems management scene in the early 2000s with version 1.x and has since matured to its current 2.9.1 version. xCAT is a scalable cluster management and provisioning tool that provides hardware control, discovery, and diskful/diskless OS deployment for all types of scalable systems beyond HPC. xCAT is open source today, will continue to be in the future, and is strategic to our scalable systems management strategy at Lenovo. Come listen to this presentation to learn how the xCAT team is working to solve tomorrow's systems management problems with the emergence of the new Confluent module. This module is poised to simplify extreme scalability and ease-of-use issues so users can minimize downtime and increase the productivity of their systems.
This talk will discuss how to publish datasets, as well as peer-reviewed publications, and how to use these to advance your academic career.
Cyberinfrastructure is an intensely local activity with regional and national impacts, as more and more researchers collaborate with peers across the region and nation. This session explores networking, storage, and computational infrastructures, identifying current and anticipated choke points that inhibit broader collaboration. Conversely, speakers will explore areas where the cyberinfrastructure is ripe for exploitation.
Managing folders, files, and the data in them is a key skill for researchers in Linux-based computing environments. We'll cover creating, moving, searching, and manipulating files and data using Linux commands and utilities. A major goal will be increasing the productivity and efficiency of your workflows. If you would like to follow along with the examples, please bring a laptop that a) runs Linux or Mac OSX, or b) allows you to log in to a Linux server using ssh. Some previous experience with the Linux command line would be helpful.
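The examples we work through will resemble the following sketch (the project layout and file names here are hypothetical, chosen only to illustrate the commands):

```shell
# Illustrative file-management workflow:
mkdir -p project/data                                  # create nested directories
echo "sample,42" > project/data/results.csv            # create a small data file
cp project/data/results.csv project/data/backup.csv    # copy a file
mv project/data/backup.csv project/data/results.bak    # rename/move it
find project -name "*.csv"                             # search for files by name
grep "sample" project/data/results.csv                 # search file contents
wc -l project/data/results.csv                         # count lines in a file
```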
This tutorial will demonstrate how to use and set up the IPython notebook (used frequently in the meetup presentations) both locally (on your laptop) and remotely (on a supercomputer).
For both the OpenMP and MPI tutorials we assume no expertise in parallel programming. It is expected that you are familiar with a compiled language like C, C++, or Fortran. These tutorials are hands-on; please bring a sufficiently recent (multi-core) laptop so as to be able to participate.
This course introduces the fundamentals of shared memory programming. It will teach you how to code using OpenMP, providing hands-on experience of parallel computing geared towards numerical applications.
Topics:
Introduction to OpenMP
Creating Threads
Parallel Loops
Synchronization
Memory model
Tasks
We will be facilitating live data analysis on the Janus supercomputer for this tutorial. If you wish to participate, please bring a laptop with a configured ssh client.
Realistic models of the natural world often require large computer models, commonly addressed using high-performance computing (HPC) with supercomputers – the subject of much of this RMACC Symposium. However, many large models are better suited for the complementary approach of high-throughput computing (HTC). Problems suited for HTC are those that, when broken down into their component parts, have computational requirements small enough to fit on a standard desktop computer, but where a very large number of runs of those component parts are needed (for example, Monte Carlo analysis, or genome mapping using Grid computing). Put another way, an HPC problem can be addressed with one very large run, whereas an HTC problem requires many runs to be fully answered. The computational capabilities of desktop computers have greatly increased over the last three decades – including multiple cores on a single CPU. Such advances present new opportunities for solving societal problems using HTC approaches. This workshop will cover the elements of an HTC problem, examples of open-source HTC software, and a demonstration of an HTC run.
The popular revision control system “Git” still has a reputation for being difficult to learn, but the underlying system is relatively simple once you have a little background knowledge. Through a series of real-world workflow examples, we'll demonstrate not only how to use Git to track changes and collaborate with others, but also how to understand what Git is doing to track, record, and protect the history of your data.
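The workflows we walk through will look something like the following (the repository and file names are hypothetical, chosen just to show the shape of a basic track-and-inspect cycle):

```shell
# Create a repository and record one change in its history:
git init demo-repo
cd demo-repo
git config user.email "you@example.com"    # identity needed for committing
git config user.name  "Your Name"
echo "First draft" > paper.txt
git add paper.txt                          # stage the new file
git commit -m "Add first draft"            # record it in history
echo "Second draft" > paper.txt
git diff                                   # see what changed since the commit
git log --oneline                          # view the recorded history
```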
Matplotlib is a 2D plotting library for Python which produces publication-quality figures in a wide variety of formats and environments. Matplotlib tries to make easy things easy and hard things possible with just a few lines of code. In the Matplotlib tutorial, we will demonstrate how to produce quality images for basic plots, scatter plots, vector plots, histograms, streamlines, contours, and others. We will also present some extensions such as the basemap toolkit, a Python library for plotting 2D data on maps.
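As a taste of the "few lines of code" the tutorial starts from, here is a minimal line-plus-scatter figure saved to a file (assuming matplotlib and NumPy are installed; the Agg backend lets it run without a display):

```python
# A minimal figure: a line plot and a scatter plot saved to a PNG file.
import matplotlib
matplotlib.use("Agg")  # render off-screen, e.g. on a remote server
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.scatter(x[::10], np.cos(x[::10]), color="red", label="cos(x) samples")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("example_plot.png", dpi=150)
```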
The Linux shell is much more than just a way to enter individual commands. In this session, we'll learn to use bash's built-in programming elements, including loops, tests and conditions, variables, and functions. With the full power of the shell at your fingertips, your efficiency and productivity will skyrocket! If you would like to follow along with the examples, please bring a laptop that a) runs Linux or Mac OSX, or b) allows you to log in to a Linux server using ssh. Previous experience with the Linux command line would be helpful.
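The session builds toward scripts combining these elements; a small taste, using a function, a loop, a test, and command substitution (file names are illustrative):

```shell
# A function that reports line counts for each file given as an argument.
count_lines() {
    for f in "$@"; do                 # loop over all arguments
        if [ -f "$f" ]; then          # test: does the file exist?
            echo "$f: $(wc -l < "$f")"
        else
            echo "$f: not found"
        fi
    done
}

printf 'a\nb\nc\n' > sample.txt       # create a three-line file
count_lines sample.txt missing.txt
```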
This talk will be geared toward Matlab users who are interested in learning Python. We will discuss ways to achieve in Python the same goals (such as reading in data, plotting, etc.) that you already know how to accomplish in Matlab. It is intended to be a high-level overview.
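Many common Matlab operations have close NumPy equivalents; a small illustrative sample (assuming NumPy is installed):

```python
# Matlab-to-Python correspondences using NumPy (illustrative):
import numpy as np

# Matlab: x = 0:0.5:2       -> evenly spaced vector (end point handling differs)
x = np.arange(0, 2.5, 0.5)

# Matlab: A = zeros(3,3)    -> 3x3 matrix of zeros
A = np.zeros((3, 3))

# Matlab: A(1,1) = 5        -> note Python uses 0-based indexing
A[0, 0] = 5

# Matlab: y = x.^2          -> elementwise operations
y = x ** 2

# Matlab: mean(y)           -> reductions
m = y.mean()
print(m)
```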
This course introduces the fundamentals of distributed memory programming, using the Message Passing Interface (MPI) standard. Similar to the OpenMP tutorial, we will be using a hands-on approach.
Topics:
Introduction to MPI
Point to Point Communication
Collective Communication
Virtual Topologies
Debugging Parallel Programs
There are many recent additions to Python that make it an excellent programming language for data analysis. This tutorial has two goals. First, we introduce several of the recent Python modules for data analysis. We provide hands-on exercises for manipulating and analyzing data using pandas, scikit-learn, and other modules. Second, we execute examples using the IPython notebook, a web-based interactive development environment that facilitates documentation, sharing, and remote execution. Together these tools create a powerful, new way to approach scientific workflows for data analysis on HPC systems.
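As a small taste of the pandas portion of the tutorial, here is a typical split-apply-combine operation (the dataset below is made up purely for illustration):

```python
# Group-and-aggregate with pandas on a small, made-up dataset.
import pandas as pd

df = pd.DataFrame({
    "experiment": ["A", "A", "B", "B", "B"],
    "runtime_s":  [1.2, 1.4, 2.0, 2.2, 2.4],
})

# Mean runtime per experiment: split by label, apply mean, combine results.
summary = df.groupby("experiment")["runtime_s"].mean()
print(summary)
```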