Senior Linux HPC Systems Engineer

2025-11-12 | Cadre5 | Knoxville, TN

Description

Cadre5, founded in 1999 in the Smoky Mountains of East Tennessee, provides innovative technical solutions to customers locally and nationally. Our Cadre5 Lab Partners division has partnered with the Information Technology Services Directorate (ITSD) at Oak Ridge National Laboratory (ORNL) to recruit a Senior Linux HPC Systems Engineer to design, operate, and maintain the clusters, servers, and workstations that support ORNL's scientific services.

ORNL delivers scientific discoveries and technical breakthroughs to realize solutions in energy and national security while providing economic benefit to the nation. Located in Oak Ridge, TN, near Knoxville, this premier research institution addresses national needs through impactful research and world-leading centers.

This is a full-time, permanent position with a hybrid schedule requiring a minimum of 3 days on-site per week.

Why Cadre5?

  • Working with highly talented team members
  • 3 weeks' vacation
  • Excellent medical insurance, up to 100% employer-paid, plus employer contributions to HSA plans

Job Responsibilities

  • Advocate for and promote HPC and clustered computing services to researchers who process large data sets and/or develop code as part of their projects.
  • Ensure the availability, performance, scalability, and security of production systems.
  • Leverage automation and monitoring solutions to minimize day-to-day maintenance, and identify opportunities to improve system management practices or performance.
  • Collaborate with technical points of contact (POCs) for the programs we support to install scientific toolsets and help tune their performance.
  • Optimize workflows and monitoring solutions to take advantage of our 24/7 operations staff, reducing off-hours support. Use email, Jira, Confluence, Teams, Slack, and other collaboration tools to stay in contact.
  • Deliver ORNL's mission by aligning behaviors, priorities, and interactions with core values of Impact, Integrity, Teamwork, Safety, and Service. Promote equal opportunity by fostering a respectful workplace.

Basic Qualifications

  • A BS degree in computer science, computer engineering, information technology, information systems, science, engineering, business, or related discipline with eight to twelve years of aligned professional experience (equivalents considered).
  • Master's holders: seven to ten years of relevant experience. PhD holders: four to six years.
  • Five or more years managing UNIX/Linux systems.
  • Three or more years of experience with configuration management and automation tools (Git, Jenkins, Ansible, Puppet).
  • Moderate proficiency in at least one scripting language (Bash, Python, or similar).
  • Experience performing advanced troubleshooting and system administration with Linux servers.
  • Experience supporting large data systems.
  • Strong desire to innovate and communicate potential benefits of new technologies to the team and research partners.
  • Collaborative and proactive approach; ability to build trust and credibility with research teams.
  • Ability to obtain and maintain a Department of Energy Q clearance (US Citizenship required).

Preferred Qualifications

  • An active DOE Q, DoD Top Secret, or DoD TS/SCI clearance is strongly preferred.
  • Understanding of multiple operating systems and cluster technologies.
  • Experience with Rocky/CentOS/RHEL, Ubuntu, VMware.
  • Knowledge of HPC platforms that use SLURM, including job submission and troubleshooting.
  • Experience building and running containerized applications in an HPC environment.
  • Experience with provisioning and deployment methods such as diskless booting, PXE boot, Warewulf, Cobbler, and/or Bright Cluster Manager.
  • Experience managing NVIDIA and AMD GPU clusters for AI/ML or image processing.
  • Knowledge of networking fundamentals, including TCP/IP, traffic analysis, protocols, and diagnostics.
  • Experience with InfiniBand networks and diagnostics.
  • Extensive experience with high-performance parallel file systems (Lustre, WEKA, GPFS, etc.).
  • Experience with performance and diagnostic tools for benchmarking and tuning systems, networking, and storage.
  • Experience with monitoring tools such as Grafana, CheckMK, Nagios, Zabbix, SolarWinds, Ganglia, or similar.
  • Experience in government or other highly technical environments.
  • Good documentation skills, including writing simple web documentation.

Benefits

Cadre5 offers excellent pay and benefits, including full medical, dental, and vision coverage, a 401(k) match, 15 days of PTO, and 10 holidays.

Cadre5 is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. Cadre5 is an E-Verify Employer.

Seniority level

  • Mid-Senior level

Employment type

  • Full-time

Job function

  • Information Technology

Industries

  • Software Development
