HPC Operations Engineer
Santa Fe, NM
We’re looking for an HPC Operations Engineer who is interested in supporting large-scale parallel workloads on modern HPC hardware. You will work with our team of scientists to deploy and monitor our HPC infrastructure and to respond as it changes.
50% - Managing day-to-day operations of our HPC infrastructure and workloads
25% - Aggregating runtime and performance statistics for further analysis
25% - Assisting in evaluating new hardware and software opportunities for our workloads
- 2 years of experience working with HPC centers
- Extensive knowledge of the Linux operating system
- Experience with job schedulers (Moab, Slurm, etc.)
- Strong oral and written communication skills
- Considerable knowledge of computing hardware and software used in scientific computing environments
- Experience with front-line customer support
- Experience with shell/scripting languages such as Bash, Python, and Perl
- Knowledge of high-performance networks and of parallel and network file systems (Lustre, NFS)
- Knowledge of log analysis and aggregation for distributed systems