System Administrator – HPC / EDA Infrastructure

Muscat, Oman

Full-Time

Position Overview

We are seeking a highly skilled System Administrator to design, administer, and optimize large-scale Red Hat Enterprise Linux (RHEL)-based High-Performance Computing (HPC) and Electronic Design Automation (EDA) infrastructure. The ideal candidate will have strong experience in Linux system administration, workload orchestration, virtualization, storage integration, performance tuning, and infrastructure automation within a regulated enterprise data center environment. This role supports engineering workloads and compute-intensive environments used for semiconductor design, simulation, and verification.

Key Responsibilities

Linux & HPC Infrastructure

  • Administer and maintain large-scale RHEL-based HPC environments

  • Perform system tuning for high CPU, memory, and I/O workloads

  • Manage OS patch lifecycle, upgrades, and security compliance

  • Implement system hardening aligned with CIS security baselines


Workload Scheduling & Compute Farm Optimization

  • Configure and manage IBM LSF or Slurm workload schedulers

  • Monitor cluster performance and optimize job scheduling efficiency

  • Implement resource allocation policies and quota management


Virtualization & VDI

  • Manage VMware clusters, including HA, DRS, and resource pools

  • Support VDI environments for engineering workloads (CPU/GPU)

  • Maintain golden images and provisioning workflows for VDI users


EDA Infrastructure

  • Deploy and maintain EDA tool environments from vendors including Cadence, Synopsys, and Siemens

  • Perform tool performance tuning and patch compatibility validation

  • Manage license servers (FlexLM / RLM), including three-server (triad) redundancy configurations


Storage & Data Management

  • Configure and maintain NFSv4 and parallel file systems

  • Ensure proper file permissions, locking, and secure access to design data

  • Support high-throughput storage infrastructure for compute workloads


Automation & DevOps

  • Develop automation workflows using Ansible, shell scripting, or Python

  • Automate system provisioning, patching, and monitoring tasks

  • Support emerging container technologies (Docker / Podman) for EDA workloads


Monitoring, Logging & Troubleshooting

  • Monitor infrastructure health and perform capacity planning

  • Analyze system logs and resolve performance or reliability issues

  • Integrate centralized logging into enterprise SIEM platforms


Backup, Recovery & Resilience

  • Manage backup policies and validate recovery procedures

  • Conduct disaster recovery (DR) and high availability (HA) testing


User Environment & Access Management

  • Provision user environments and manage access lifecycle

  • Support engineers using shared compute infrastructure

  • Maintain documentation and follow enterprise change management processes

Qualifications

Required Qualifications

  • 5+ years of Linux system administration experience in enterprise environments

  • Strong expertise in Red Hat Enterprise Linux (RHEL)

  • Experience with HPC cluster management or large compute farms

  • Hands-on experience with LSF, Slurm, or similar job schedulers

  • Experience with VMware virtualization

  • Knowledge of NFS storage environments

  • Scripting experience (Bash, Python, or similar)

  • Strong troubleshooting and performance optimization skills


Required Certifications

  • RHCSA or RHCE


Preferred Qualifications

  • Experience supporting EDA environments

  • MCSE or equivalent Microsoft infrastructure experience

  • AWS certification

  • VMware certification (VCP or higher)

  • Experience with GPU computing or VDI platforms

  • Familiarity with container environments (Docker / Podman)


Soft Skills

  • Strong analytical and problem-solving skills

  • Excellent documentation and communication skills

  • Ability to work in cross-functional engineering environments

  • Experience working in regulated or compliance-driven environments
