System Administrator – HPC / EDA Infrastructure
Muscat, Oman
Full-Time
Position Overview
We are seeking a highly skilled System Administrator to design, administer, and optimize large-scale Red Hat Enterprise Linux (RHEL)-based High-Performance Computing (HPC) and Electronic Design Automation (EDA) infrastructure. The ideal candidate will have strong experience in Linux system administration, workload orchestration, virtualization, storage integration, performance tuning, and infrastructure automation within a regulated enterprise data center environment. The role supports compute-intensive engineering workloads for semiconductor design, simulation, and verification.
Key Responsibilities
Linux & HPC Infrastructure
Administer and maintain large-scale RHEL-based HPC environments
Perform system tuning for high CPU, memory, and I/O workloads
Manage OS patch lifecycle, upgrades, and security compliance
Implement system hardening aligned with CIS Benchmarks security baselines
Workload Scheduling & Compute Farm Optimization
Configure and manage IBM LSF or Slurm workload schedulers
Monitor cluster performance and optimize job scheduling efficiency
Implement resource allocation policies and quota management
Virtualization & VDI
Manage VMware clusters, including HA, DRS, and resource pools
Support VDI environments for CPU- and GPU-based engineering workloads
Maintain golden images and provisioning workflows for VDI users
EDA Infrastructure
Deploy and maintain EDA tool environments, including Cadence, Synopsys, and Siemens
Perform tool performance tuning and patch compatibility validation
Manage license servers (FlexLM / RLM), including three-server (triad) redundancy configurations
Storage & Data Management
Configure and maintain NFSv4 and parallel file systems
Ensure proper file permissions, locking, and secure access to design data
Support high-throughput storage infrastructure for compute workloads
Automation & DevOps
Develop automation workflows using Ansible, shell scripting, or Python
Automate system provisioning, patching, and monitoring tasks
Support emerging container technologies (Docker / Podman) for EDA workloads
Monitoring, Logging & Troubleshooting
Monitor infrastructure health and perform capacity planning
Analyze system logs and resolve performance or reliability issues
Integrate centralized logging into enterprise SIEM platforms
Backup, Recovery & Resilience
Manage backup policies and validate recovery procedures
Conduct disaster recovery (DR) and high availability (HA) testing
User Environment & Access Management
Provision user environments and manage access lifecycle
Support engineers using shared compute infrastructure
Maintain documentation and follow enterprise change management processes
Qualifications
Required Qualifications
5+ years of Linux system administration experience in enterprise environments
Strong expertise in Red Hat Enterprise Linux (RHEL)
Experience with HPC cluster management or large compute farms
Hands-on experience with LSF, Slurm, or similar job schedulers
Experience with VMware virtualization
Knowledge of NFS storage environments
Scripting experience (Bash, Python, or similar)
Strong troubleshooting and performance optimization skills
Required Certifications
RHCSA or RHCE
Preferred Qualifications
Experience supporting EDA environments
MCSE or equivalent Microsoft infrastructure experience
AWS certification
VMware certification (VCP or higher)
Experience with GPU computing or VDI platforms
Familiarity with container environments (Docker / Podman)
Soft Skills
Strong analytical and problem-solving skills
Excellent documentation and communication skills
Ability to work in cross-functional engineering environments
Experience working in regulated or compliance-driven environments
