Managing Files on Frontera

This section describes Frontera's file systems, how to navigate the Global Shared File System (Stockyard), and how to transfer your files to and from Frontera.

File Systems on Frontera

Table 2. Frontera File Systems

File System   Quota                           Key Features
-----------   -----------------------------   ------------------------------------------------
$HOME         25GB, 400,000 files             Not intended for parallel or high-intensity
                                              file operations.
                                              Backed up regularly.
                                              Defaults: 1 stripe, 1MB stripe size.
                                              Not purged.

$WORK         1TB, 3,000,000 files across     Not intended for high-intensity file operations
              all TACC systems, regardless    or jobs involving very large files.
              of where on the file system     On the Global Shared File System that is mounted
              the files reside.               on most TACC systems.
                                              Defaults: 1 stripe, 1MB stripe size.
                                              Not backed up.
                                              Not purged.

$SCRATCH1     no quota                        Overall capacity 44 PB.
                                              Defaults: 1 stripe, 1MB stripe size.
                                              Not backed up.
                                              Subject to purge if access time* is more than
                                              10 days old.

*The operating system updates a file's access time when that file is modified on a login or compute node. Reading or executing a file/script on a login node does not update the access time, but reading or executing on a compute node does update the access time. This approach helps us distinguish between routine management tasks (e.g. tar, scp) and production use. Use the command ls -ul to view access times.
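As a rough way to see which files might be purge candidates, the sketch below lists files under a directory whose access time is older than a given number of days. The function name list_stale and the example subdirectory are hypothetical, and this only approximates the purge policy; it is not the tool TACC uses. Point it at a specific subdirectory rather than your entire scratch tree to limit metadata load.

```shell
# list_stale DIR [DAYS]: print files under DIR last accessed more than DAYS days ago.
# Hypothetical helper; the 10-day default mirrors the purge window in Table 2.
list_stale() {
    dir=$1
    days=${2:-10}
    # find rounds file ages down to whole days, so "-atime +10" matches
    # files whose last access was at least 11 full days ago.
    find "$dir" -type f -atime +"$days" -print
}

# Example (myproject is a placeholder directory name):
#   list_stale "$SCRATCH/myproject"
```

Pair the output with ls -ul on individual files to confirm their access times before deciding what to archive.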

Frontera mounts three Lustre file systems that are shared across all nodes: the home, work, and scratch file systems. Frontera will also have a fourth file system, FLASH, an allocatable resource that supports applications with very high bandwidth or IOPS requirements. Frontera's startup mechanisms define corresponding account-level environment variables $HOME, $SCRATCH and $WORK that store the paths to directories that you own on each of these file systems. Consult the Frontera File Systems table above for the basic characteristics of these file systems, and the Good Citizenship sections for guidance on file system etiquette.

Frontera's home and scratch file systems are mounted only on Frontera, but the work file system mounted on Frontera is the Global Shared File System hosted on Stockyard. This is the same work file system that is currently available on Lonestar5, Stampede2 and several other TACC resources.

The $STOCKYARD environment variable points to the highest-level directory that you own on the Global Shared File System. The definition of the $STOCKYARD environment variable is of course account-specific, but you will see the same value on all TACC systems that provide access to the Global Shared File System (see Table 3). This directory is an excellent place to store files you want to access regularly from multiple TACC resources.

Figure 3. Stockyard File System. Account-level directories on the work file system (Global Shared File System hosted on Stockyard), shown for fictitious user bjones. All directories are usable from all systems. Subdirectories (e.g. lonestar5, maverick2) exist only if you have allocations on the associated system.

Your account-specific $WORK environment variable varies from system to system and is a subdirectory of $STOCKYARD (Figure 3). The subdirectory name corresponds to the associated TACC resource. The $WORK environment variable on Frontera points to the $STOCKYARD/frontera subdirectory, a convenient location for files you use and jobs you run on Frontera. Remember, however, that all subdirectories contained in your $STOCKYARD directory are available to you from any system that mounts the file system. If you have accounts on both Frontera and Stampede2, for example, the $STOCKYARD/frontera directory is available from your Stampede2 account, and the $STOCKYARD/stampede2 directory is available from your Frontera account. Your quota and reported usage on the Global Shared File System reflect all files that you own on Stockyard, regardless of their actual location on the file system.

Note that resource-specific subdirectories of $STOCKYARD are simply convenient ways to manage your resource-specific files. You have access to any such subdirectory from any TACC resource. If you are logged into Frontera, for example, executing the alias cdw (equivalent to cd $WORK) will take you to the resource-specific subdirectory $STOCKYARD/frontera. But you can access this directory from other TACC systems as well by executing cd $STOCKYARD/frontera. These commands allow you to share files across TACC systems. In fact, several convenient account-level aliases make it even easier to navigate across the directories you own in the shared file systems:

Table 3. Built-in Account Level Aliases

Alias Command
cd or cdh cd $HOME
cdw cd $WORK
cds cd $SCRATCH
cdy or cdg cd $STOCKYARD
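To see which resource-specific subdirectories you actually have under $STOCKYARD, a small helper like the one below may be useful. It is a minimal sketch: the function name list_resource_dirs is hypothetical, and it assumes only that $STOCKYARD is set as described above.

```shell
# list_resource_dirs: print the name of each subdirectory of $STOCKYARD,
# e.g. frontera or stampede2 if you have allocations on those systems.
list_resource_dirs() {
    for d in "$STOCKYARD"/*/; do
        # Skip the unexpanded pattern when $STOCKYARD has no subdirectories.
        if [ -d "$d" ]; then
            basename "$d"
        fi
    done
}

# Example: list_resource_dirs
```

Each name printed corresponds to a directory you can reach with cd $STOCKYARD/<name> from any system that mounts the Global Shared File System.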

Transferring your Files

Transferring Files with scp

You can transfer files between Frontera and Linux-based systems using either scp or rsync. Both scp and rsync are available in the Mac Terminal app. Windows SSH clients typically include scp-based file transfer capabilities.

The Linux scp (secure copy) utility is a component of the OpenSSH suite. Assuming your Frontera username is bjones, a simple scp transfer that pushes a file named myfile from your local Linux system to Frontera $HOME would look like this:

localhost$ scp ./myfile bjones@frontera.tacc.utexas.edu:  # note colon after net address

You can use wildcards, but you need to be careful about when and where you want wildcard expansion to occur. An unescaped wildcard is expanded by your local shell before scp runs. For example, to push all files ending in .txt from the current directory on your local machine to /work/01234/bjones/frontera on Frontera:

localhost$ scp *.txt bjones@frontera.tacc.utexas.edu:/work/01234/bjones/frontera

To delay wildcard expansion until reaching Frontera, use a backslash (\) as an escape character before the wildcard. For example, to pull all files ending in .txt from /work/01234/bjones/frontera on Frontera to the current directory on your local system:

localhost$ scp bjones@frontera.tacc.utexas.edu:/work/01234/bjones/frontera/\*.txt .
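The difference between the two wildcard forms can be checked locally without contacting Frontera. The sketch below creates two throwaway files in a temporary directory (the file names are placeholders) and shows what the local shell hands to a command in each case.

```shell
# Work in a temporary directory containing two .txt files.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a.txt b.txt

# Unescaped: the local shell replaces the pattern with the matching local files.
set -- *.txt
local_count=$#
echo "unescaped: $local_count argument(s): $*"

# Escaped: the pattern is passed through unchanged, as scp would receive it.
set -- \*.txt
remote_pattern=$1
echo "escaped:   $# argument(s): $*"
```

In the unescaped case the command sees the two local file names; in the escaped case it sees the single literal pattern, which is then left for the remote side to interpret.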

You can of course use shell or environment variables in your calls to scp. For example:

localhost$ destdir="/work/01234/bjones/frontera/data"
localhost$ scp ./myfile bjones@frontera.tacc.utexas.edu:$destdir

You can also issue scp commands on your local client that use Frontera environment variables like $HOME, $WORK, and $SCRATCH. To do so, use a backslash (\) as an escape character before the $; this ensures that expansion occurs after establishing the connection to Frontera:

localhost$ scp ./myfile bjones@frontera.tacc.utexas.edu:\$WORK/data   # Note backslash
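To make the effect of the backslash concrete, the sketch below prints the destination string that scp would receive in each case. WORK is set to an arbitrary local value here purely for illustration; nothing is transferred.

```shell
# A local value for WORK, set only to make the difference visible.
WORK="/local/value"
host="bjones@frontera.tacc.utexas.edu"

unescaped="$host:$WORK/data"    # local shell substitutes its own $WORK
escaped="$host:\$WORK/data"     # backslash keeps $WORK literal for the remote side
quoted=$host:'$WORK/data'       # single quotes have the same effect as the backslash

echo "$unescaped"
echo "$escaped"
echo "$quoted"
```

One caveat, stated as an assumption about your local client rather than about Frontera: OpenSSH 9.0 and later run scp over the SFTP protocol by default, which does not pass remote paths through a remote shell, so escaped variables and wildcards may not expand as described. The scp -O option selects the legacy protocol if you need that behavior.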

Avoid using scp for recursive transfers of directories that contain nested directories of many small files:

localhost$ scp -r ./mydata     bjones@frontera.tacc.utexas.edu:\$WORK  # DON'T DO THIS

Instead, use tar to create an archive of the directory, then transfer the directory as a single file:

localhost$ tar cvf ./mydata.tar mydata                                  # create archive
localhost$ scp     ./mydata.tar bjones@frontera.tacc.utexas.edu:\$WORK  # transfer archive
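The full round trip (create the archive, check its contents, unpack it at the destination) can be sketched as follows. The example runs entirely in a temporary local directory with a placeholder tree standing in for ./mydata; on a real transfer the final extraction step would run on Frontera after the scp.

```shell
# Build a small stand-in for ./mydata in a temporary directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/mydata/run1"
echo "sample" > "$workdir/mydata/run1/out.txt"

# Pack the directory into a single archive; -C keeps stored paths relative.
tar -cf "$workdir/mydata.tar" -C "$workdir" mydata

# List the archive contents to confirm what will be transferred.
tar -tf "$workdir/mydata.tar"

# After the transfer, unpack in the destination directory (simulated locally).
mkdir "$workdir/unpacked"
tar -xf "$workdir/mydata.tar" -C "$workdir/unpacked"
```

Listing the archive before sending it is a cheap check that the single file you transfer really contains the tree you expect.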

Transferring Files with rsync

The rsync (remote synchronization) utility is a great way to synchronize files that you maintain on more than one system: when you transfer files using rsync, the utility copies only the changed portions of individual files. As a result, rsync is especially efficient when you only need to update a small fraction of a large dataset. The basic syntax is similar to scp:

localhost$ rsync       mybigfile bjones@frontera.tacc.utexas.edu:\$WORK/data
localhost$ rsync -avtr mybigdir  bjones@frontera.tacc.utexas.edu:\$WORK/data

The options on the second transfer are typical and appropriate when syncing a directory: archive mode (-a) preserves symbolic links, permissions, and other metadata, and -v gives verbose feedback. Archive mode already implies a recursive update (-r) and preserved time stamps (-t), so listing those two options explicitly is harmless but redundant. Because rsync only transfers changes, recursive updates with rsync may be less demanding than an equivalent recursive transfer with scp.
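Before synchronizing a large directory, it can help to preview what rsync would send. The sketch below wraps rsync's dry-run option (-n) in a helper; the function name preview_sync is hypothetical, and the example destination simply reuses the path from the commands above.

```shell
# preview_sync SRC DEST: report what rsync would transfer without copying anything.
preview_sync() {
    # -a archive mode (implies -r and -t), -v verbose, -n dry run (no changes made)
    rsync -avn "$1" "$2"
}

# Example:
#   preview_sync mybigdir bjones@frontera.tacc.utexas.edu:\$WORK/data
```

Once the listed files look right, run the same command without -n to perform the transfer.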

See Good Citizenship for additional important advice about striping the receiving directory when transferring large files; watching your quota on $HOME and $WORK; and limiting the number of simultaneous transfers. Remember also that $STOCKYARD (and your $WORK directory on each TACC resource) is available from several other TACC systems: there's no need for scp when both the source and destination involve subdirectories of $STOCKYARD.
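Because every resource-specific directory lives under the same $STOCKYARD tree, moving a file between them is an ordinary local copy. A minimal sketch, with a hypothetical function name and placeholder file name:

```shell
# copy_across_resources FILE FROM TO
# Copy FILE from $STOCKYARD/FROM to $STOCKYARD/TO with a plain cp.
# FROM and TO are resource subdirectory names such as stampede2 or frontera.
copy_across_resources() {
    # -p preserves the file's mode and time stamps
    cp -p "$STOCKYARD/$2/$1" "$STOCKYARD/$3/$1"
}

# Example: copy_across_resources input.dat stampede2 frontera
```

Keep in mind that both copies count toward the same $WORK quota, since the quota covers everything you own on Stockyard.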

Sharing Files with Collaborators

If you wish to share files and data with collaborators in your project, see Sharing Project Files on TACC Systems for step-by-step instructions. Project managers or delegates can use Unix group permissions and commands to create read-only or read-write shared workspaces that function as data repositories and provide a common work area to all project members.

Striping Large Files

Before transferring large files to Frontera, or creating new large files, be sure to set an appropriate default stripe count on the receiving directory. To avoid exceeding your fair share of any given OST, a good rule of thumb is to allow at least one stripe for each 100GB in the file. For example, to set the default stripe count on the current directory to 30 (a plausible stripe count for a directory receiving a file approaching 3TB in size), execute:

$ lfs setstripe -c 30 $PWD

Note that an lfs setstripe command always sets both stripe count and stripe size, even if you explicitly specify only one or the other. Since the example above does not explicitly specify stripe size, the command will set the stripe size on the directory to Frontera's system default (1MB). In general there's no need to customize stripe size when creating or transferring files.
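The rule of thumb of one stripe per 100GB can be turned into a small calculation. The sketch below assumes decimal gigabytes (100GB = 100,000,000,000 bytes) and uses a hypothetical function name; it does not check the result against the number of OSTs available on the file system, which limits the usable stripe count.

```shell
# stripe_count BYTES: suggest one stripe per 100 GB of file size, rounded up.
stripe_count() {
    per_stripe=100000000000     # 100 GB in bytes (decimal units; an assumption)
    count=$(( ($1 + per_stripe - 1) / per_stripe ))
    # Never suggest fewer than one stripe.
    if [ "$count" -lt 1 ]; then
        count=1
    fi
    echo "$count"
}

# Example, matching the 3TB case above:
#   stripe_count 3000000000000      prints 30
#   lfs setstripe -c "$(stripe_count 3000000000000)" "$PWD"
```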

Remember that it's not possible to change the striping on a file that already exists. Moreover, the mv command has no effect on a file's striping if the source and destination directories are on the same file system. You can, of course, use the cp command to create a second copy with different striping; to do so, copy the file to a directory with the intended stripe parameters.
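The copy-based approach described above might look like the following. This is a sketch under stated assumptions: the function name restripe_copy and its arguments are hypothetical, and it relies only on the lfs setstripe and lfs getstripe commands.

```shell
# restripe_copy FILE COUNT NEWDIR
# Create NEWDIR with a default stripe count of COUNT, then copy FILE into it
# so the new copy picks up the directory's striping.
restripe_copy() {
    file=$1
    count=$2
    newdir=$3
    mkdir -p "$newdir" || return 1
    lfs setstripe -c "$count" "$newdir" || return 1   # default striping for new files
    cp "$file" "$newdir/" || return 1                  # the copy inherits that striping
    lfs getstripe "$newdir/$(basename "$file")"        # report the resulting layout
}

# Example: restripe_copy bigfile.dat 30 ./striped
```

After confirming the new layout in the lfs getstripe output, you can remove the original file to avoid counting both copies against your quota.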