HPC/Storage/GPU Engineer
Tower Research Capital LLC - New York City, NY
Job Description
Job Responsibilities

- Supporting, maintaining, and enhancing the firm's trading Linux infrastructure
- Supporting, maintaining, and enhancing the firm's HPC infrastructure for research
- Providing support specifically for the Linux and HPC environments, including:
  - Emergency response
  - Execution of planned changes, updates, and deployment projects within the Linux server infrastructure
- Managing HPC systems to support trading operations and the Condor job scheduler
- Advanced profiling and troubleshooting of performance issues specifically within the Linux server environment
- Contributing to the development and refinement of tools and systems to automate provisioning, configuration, and monitoring of thousands of Linux servers
- Managing essential core services such as DHCP, LDAP, DNS, and NFS for on-prem and hosted data centers as well as public clouds
- Participating in an on-call rotation and occasional weekend shifts
- Engaging in daily direct communication with trading teams and core engineering
- Staying up to date with the latest technologies and best practices in HPC, storage, and GPU computing

Qualifications

- Experience in maintenance, operation, and administration of a sufficiently advanced Linux environment
- Daily use of and contribution to developing automation and monitoring tools
- Comprehensive understanding of Linux OS concepts and internals
- Working knowledge of Intel-based hardware and server components
- Good knowledge of Python and expert knowledge of Bash for scripting and automation tasks in a Linux environment
- Understanding of Linux server-side networking and typical network protocols
- Understanding of Linux configuration management, source control, CI/CD, and automated deployment
- Strong communication skills and the ability to work effectively in a team
- Participation in open source or personal projects is a plus

Preferred Qualifications

- Experience with containerization and orchestration tools (e.g., Docker, Kubernetes)
- Familiarity with cloud computing platforms and hybrid cloud environments
- Knowledge of parallel file systems (e.g., GPFS), batch systems (e.g., Slurm, Grid Engine, Condor), and high-performance network interconnects
- Experience with VAST and Weka storage solutions is highly desirable
- Solid understanding of trading infrastructure and low-latency systems
- Excellent problem-solving skills and the ability to work in a fast-paced, dynamic environment
- Skills in managing hybrid cloud/on-premises environments
- Experience proposing and implementing Infrastructure as Code (IaC) practices from the ground up

Tower's office is located in Downtown Montreal and is easily accessible by public transportation. While we work hard, Tower's cubicle-free workplace, jeans-clad workforce, and well-stocked kitchens reflect the premium the firm places on quality of life.

Benefits include:

- Competitive salary and discretionary bonuses
- 5 weeks of paid vacation per year
- Lunch and snacks on a daily basis
- Reimbursement for health and wellness expenses
- Free events and workshops

Tower Research Capital is an equal opportunity employer.
Created: 2025-03-01