HPC Operations Engineer

Santa Fe, NM
Engineering
Full-time
We’re looking for an HPC Operations Engineer who is interested in supporting large-scale parallel workloads on modern HPC hardware. You will work with our team of scientists to deploy and monitor our HPC infrastructure and to respond as it changes.
50% - Managing day-to-day operations of our HPC infrastructure and workloads.
25% - Aggregating runtime and performance statistics for further analysis.
25% - Assisting in evaluating new hardware and software opportunities for our workloads.

Requirements

    • 2 years of experience working with HPC centers
    • Extensive knowledge of the Linux operating system
    • Experience with job schedulers (MOAB, SLURM, etc.)
    • Strong oral and written communication skills

Pluses

    • Considerable knowledge of computing hardware and software used in scientific computing environments.
    • Experience with front-line customer support.
    • Experience with shell/scripting languages such as Bash, Python, and Perl.
    • Knowledge of high-performance networks and parallel file systems (Lustre, NFS).
    • Knowledge of log analysis and aggregation for distributed systems.