HPC Engineer/Architect
TekWissen® - New York City, NY
Job Description
Job Title: HPC Engineer/Architect
Work Location: New York, NY 10001
Job Type: Contract
Work Type: Hybrid
Duration: 6+ Months
Pay Rate: $70.00 per hour

Overview:
TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan that offers strategic talent solutions to our clients worldwide. Our client is an American multinational information technology services and consulting company and a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world's leading companies build stronger businesses.

Job Summary:
You will support day-to-day operations of large-scale parallel file systems, deploy and maintain Linux HPC infrastructure across multiple data centers, and assist HPC engineers and architects with day-to-day operations and tickets.

Experience:
16 to 20 years

Required Skills:
- Linux operating systems (RHEL/CentOS)
- Parallel file system (GPFS)
- Job schedulers (LSF, Slurm)
- Ansible, Python, shell scripting
- GPU-based compute infrastructure (including CUDA)
- CentOS 4.5
- HPCC

Responsibilities:
- Design, architect, and oversee implementation of Linux-based HPC clusters and storage
- Deploy physical hardware using HPC deployment tools and configuration and orchestration tools (Ansible)
- Tune, monitor, and troubleshoot the parallel file system (GPFS)
- Perform systems benchmarking and develop automated tests for the HPC environment, ensuring the reliability and efficiency of our computational infrastructure
- Maintain and troubleshoot the InfiniBand network
- Automate and monitor the HPC user lifecycle process
- Install, configure, tune, and troubleshoot Slurm
- Plan, design, and implement a transition from the LSF scheduler to Slurm
- Manage the Slurm scheduler and translate research policies into scheduler configurations
- Consult with faculty and students to develop research pipelines for use on the HPC cluster
- Develop and maintain the user lifecycle software suite in Python and implement a CI/CD pipeline
- Test and automate upgrades of critical system applications using Ansible and shell scripts
- Communicate effectively with clinicians, researchers, and other team members to develop technological solutions

Qualifications:
- Experience working in a large-scale, research-based HPC environment
- Proven experience working with distributed file storage solutions (e.g., GPFS)
- Experience deploying and troubleshooting Linux operating systems (RHEL/CentOS)
- Experience with scripting and automation (Ansible, Python, shell scripting)
- Solid understanding of job schedulers (LSF/Slurm)
- Experience with GPU-based compute infrastructure (including CUDA)

TekWissen® Group is an equal opportunity employer supporting workforce diversity.
Created: 2025-02-22