ALLOCATIONS

Getting an Allocation on Frontera
NSF Petascale Computing Resource Allocation Program

All activities of the Frontera project support the overriding project goal of serving "as a national resource for providing predictable and sustained long-term leadership computing capabilities for science and engineering to push the frontiers of knowledge...". In the context of Frontera, we believe this means enabling computation-based scientific discovery, with a particular focus on "big science." At least 80% of the capacity of Frontera, or about 55 million node-hours, will be made available to scientists and engineers around the country through NSF's Petascale Computing Resource Allocation (PRAC) program.

In general, proposers must show compelling science or engineering challenges that require multi-petascale computing resources, and they must be prepared to demonstrate that they can effectively exploit the multi-petascale computing capabilities offered by Frontera. Proposals from or including junior researchers are encouraged, as one of the goals of this solicitation is to build a community capable of using petascale computing. The Frontera team will offer consulting support and assistance to each project team that is granted access through this solicitation.

A description of the most recent call for allocations can be found here, as a guide to researchers considering future proposals.

Pathways for Computational Growth

For many users, making the transition from more general-purpose resources, such as Stampede2, to a system that emphasizes large-scale computing can be difficult. It is here that latent scaling bugs (such as race conditions, hard-coded array bounds, and so on) are often encountered for the first time. Frontera's future allocation options will include "Pathways" allocations for projects that believe they are ready to begin scaling up but have not yet fully tested their applications at scale. When instituted, we expect that Pathways allocations will typically account for about 15% of system capacity.
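To make the idea of a latent scaling bug concrete, the hypothetical sketch below shows a hard-coded array bound of the kind mentioned above. It uses mpi4py and NumPy purely for illustration; none of the names or numbers are Frontera-specific, and this is not drawn from any particular user code.

    # Hypothetical example of a latent scaling bug: a hard-coded array bound
    # that works at small scale but fails once a job runs on more ranks than
    # the author ever tested with. (mpi4py/NumPy are illustrative only.)
    from mpi4py import MPI
    import numpy as np

    MAX_RANKS = 1024                    # hard-coded bound: fine in testing...
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    local_result = np.zeros(MAX_RANKS)  # ...but too small beyond 1,024 ranks
    local_result[rank] = rank * 2.0     # IndexError once size > MAX_RANKS

    # Sizing the buffer from the communicator removes the latent bug:
    # local_result = np.zeros(size)
    totals = np.zeros_like(local_result)
    comm.Reduce(local_result, totals, op=MPI.SUM, root=0)
    if rank == 0:
        print("sum of contributions:", totals.sum())

Bugs of this kind surface only when the job size crosses the hidden limit, which is exactly the situation a Pathways-scale run is likely to expose.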

Check back here, and watch "user news" on Frontera, for more information as it becomes available.

Emerging Critical Needs

The remaining 5% of Frontera's resources are reserved for allocation by TACC's executive director to address areas of urgent national need (e.g., response to natural disasters) and industry projects.

Estimating Node Hours for Allocation Requests

As with other TACC systems, the fundamental allocation unit on Frontera is the node, and allocations are awarded in node-hours. This is true for both the Xeon-based and single-precision accelerator-based nodes. A project is charged one node-hour for the use of a node irrespective of how much work is done on the node (i.e., a job that uses only half the cores in a node is still charged one node-hour for an hour of computation). As is typical of most HPC systems, compute nodes are allocated exclusively to a single job, which has sole access to all the compute resources of the node(s) assigned to it. 38 PFLOPS of the system's total capability will be provided by more than 8,000 nodes of Intel's next-generation Xeon processors; approximately 8 PF of additional capability will be provided by a single-precision partition of the system. To assist in writing allocation proposals, we advise teams to assume that their application will run between 10% and 15% faster on Frontera than on Stampede2 for a fixed node count.

The service unit to use in requests for Frontera is "node-hours", simply representing a wall-clock hour on a single physical node. This is the same unit used on the Blue Waters system.
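As a worked illustration of this arithmetic, the sketch below (in Python) converts a hypothetical Stampede2 workload into a Frontera node-hour request. The job counts, node counts, and run times are invented inputs, and the 10-15% speedup is simply the planning assumption suggested above; this is not an official estimation tool.

    # Rough node-hour estimate for an allocation request (illustrative only).
    # The workload numbers below are hypothetical; the 10-15% speedup over
    # Stampede2 at a fixed node count is the planning assumption given above.

    def frontera_node_hours(nodes, hours_on_stampede2, speedup=0.10):
        """Node-hours charged on Frontera for one job, assuming the same
        node count and a runtime reduced by `speedup` (10-15%)."""
        # Charging is per node per wall-clock hour, regardless of how many
        # cores in each node the job actually uses.
        return nodes * hours_on_stampede2 * (1.0 - speedup)

    # Hypothetical annual workload: 200 production runs on 512 nodes,
    # each taking 12 hours on Stampede2.
    runs, nodes, hours = 200, 512, 12.0

    low  = runs * frontera_node_hours(nodes, hours, speedup=0.15)
    high = runs * frontera_node_hours(nodes, hours, speedup=0.10)
    print(f"Estimated request: {low:,.0f} - {high:,.0f} node-hours")
    # -> roughly 1,044,480 - 1,105,920 node-hours

Rounding the upper end of the resulting range to a convenient figure gives a straightforward, defensible request size.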

Storage

Frontera will have multiple filesystems. In addition to a home directory, users will be assigned to one of several scratch filesystems on a rotating basis. Each scratch filesystem will have a usable disk capacity of approximately 15 petabytes, and each individual filesystem is expected to sustain a bandwidth of more than 100 GB/s; total scratch capacity will exceed 50 PB. Users with very high bandwidth or IOPS requirements will be able to request an allocation on an all-NVMe filesystem with an approximate capacity of 3 PB and a bandwidth of roughly 1.2 TB/s. We intend to limit the number of simultaneous users on this "solid state" scratch component. Users may also request an allocation on the "/work" filesystem, a medium-term filesystem (retention longer than scratch, but shorter than home or archive) that is shared among all TACC platforms. Work is intended for migrating data between systems or for "semi-persistent" storage needs (e.g., a copy of reference genomes that is infrequently updated but used by compute jobs over 1-2 years). Storage system hardware is provided by DataDirect Networks.

Users may also request archive space on Ranch, the TACC archive system. Ranch is currently undergoing upgrades which will, among other things, dramatically increase the archive's disk cache to approximately 30 PB. The goal of this upgrade is to keep on disk all archive data that has been used in recent months, and to keep small files on disk for up to several years if they are occasionally re-used. Behind the disk storage will be multiple tape libraries (with capacity scalable to an exabyte or more); one of the tape options will offer encryption and the appropriate compliance for storing controlled unclassified information (users who require CUI storage should specify this in their allocation request).