Staff Site Reliability Engineer - Incident Response
Zscaler - Seattle, WA
Job Description
About Zscaler

Serving thousands of enterprise customers around the world including 40% of Fortune 500 companies, Zscaler (NASDAQ: ZS) was founded in 2007 with a mission to make the cloud a safe place to do business and a more enjoyable experience for enterprise users. As the operator of the world's largest security cloud, Zscaler accelerates digital transformation so enterprises can be more agile, efficient, resilient, and secure. The pioneering, AI-powered Zscaler Zero Trust Exchange platform protects thousands of enterprise customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location.

Named a Best Workplace in Technology by Fortune and others, Zscaler fosters an inclusive and supportive culture that is home to some of the brightest minds in the industry. If you thrive in an environment that is fast-paced and collaborative, and you are passionate about building and innovating for the greater good, come make your next move with Zscaler.

Our Engineering team built the world's largest cloud security platform from the ground up, and we keep building. With more than 100 patents and big plans for enhancing services and increasing our global footprint, the team has made us and our multitenant architecture today's cloud security leader, with more than 15 million users in 185 countries. Bring your vision and passion to our team of cloud architects, software engineers, security experts, and more who are enabling organizations worldwide to harness speed and agility with a cloud-first strategy.

NOTE: U.S. citizenship is required for this position due to the nature of the customers assigned to this role.

We're looking for an experienced Staff Site Reliability Engineer - Incident Response to join our Shared Platform Engineer team.
Reporting to the Director, Cloud Operations and Incident Management, you'll be responsible for:
- Leading and advocating for the transformation to a world-leading SRE organization, promoting SRE principles within the Engineering Department.
- Providing expert leadership during critical outages, coordinating multiple teams to ensure streamlined decision-making and quick resolution.
- Promoting a customer-focused approach by addressing and mitigating global customer environment issues, and fostering a culture of continuous learning and technical excellence within the SRE team.
- Developing and implementing scalable process frameworks and observability strategies to ensure rapid problem diagnosis, response, and service reliability.
- Collaborating with product teams to thoroughly analyze failures and integrate insights to improve service reliability, scalability, and operational efficiency.

What We're Looking for (Minimum Qualifications)
- 5+ years of experience as a Site Reliability Engineer, with relevant experience in an Operations or Engineering environment.
- Hands-on experience troubleshooting Linux-based systems.
- Networking knowledge and the ability to troubleshoot TCP/IP, SSL/TLS, DNSSEC, IPsec, and BGP issues.
- Coding experience (preferably Python) building tools, scripting, or automation.
- Bachelor's degree in Computer Science, a related technical field involving computer systems engineering, or equivalent practical experience.

What Will Make You Stand Out (Preferred Qualifications)
- Experience supporting High/Moderate FedRAMP environments.
- Understanding of observability practices and tools (Grafana, Datadog, Splunk, etc.).
- Experience leading major incidents in large-scale, high-uptime environments.

This role offers a remote work option.

Zscaler's salary ranges are benchmarked and are determined by role and level.
The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations and could be higher or lower based on a multitude of factors, including job-related skills, experience, an
Created: 2024-10-25