Staff Site Reliability Engineer, PLM Operations
Tesla - Stanford, CA
Job Description
This position can be based in Palo Alto, CA, San Diego, CA or Austin, TX.

Every day, thousands of Tesla Engineers around the world use a variety of software tools and data stores to design mechanical, electrical, electronic, and software systems. The PLM/CAD Operations team, POPS for short, maintains and improves these systems as technologies evolve so that Tesla Engineers have access to reliable and performant engineering design tools.

Due to the breadth of technology used by Tesla, the members of the POPS team are expected to be technical generalists - with a deeper well in a few areas, e.g. database, networking or cluster management. As SREs, we replace toil with automation. We develop tooling in Go, but we encounter plenty of Java, Python, JS frameworks, Tcl, and even some VB. We manage clusters above the node allocation layer, managing, for example, our own kubelet upgrades and Windows nodes.

Responsibilities
Define SLOs around latency, traffic, errors and saturation. Reliability and performance are the team's deliverables.
Maintain Tesla-custom Helm Charts to deploy highly customized and evolving 3DExperience (Dassault Systèmes) services running on on-prem Kubernetes
Modernize our deployment infrastructure using custom GitHub Actions, ArgoCD, Atlantis, and Terraform
Achieve high-performance service using tools like Prometheus, Grafana, Catchpoint, Splunk and OpsGenie
Be in an on-call rotation, manage incidents as Incident Commander, write actionable incident reports
Manage tasks via Jira for observability and human capacity planning. Maintain excellent Jira hygiene.
Write and review design docs - testing frameworks, deployment models, environment definitions, etc.

Requirements
Deep networking experience, e.g. experience troubleshooting outages from L7 to L3, experience contributing to infra or networking GitHub repos or publications
Deep Oracle Database experience, e.g. indexing deltas, schema migrations
Docker/Kubernetes, e.g. performed kubelet upgrades in-situ, used skopeo or CRI-O intentionally, configured containerd
Diagnosing problems in legacy enterprise Java stacks
Installing, managing or using 3DExperience, or similar experience with other PLM software
Outstanding experience with scientific computing or LIMS
Deep understanding of hypervisor technology (VMware)

Compensation and Benefits

Benefits
Along with competitive pay, as a full-time Tesla employee, you are eligible for the following benefits at day 1 of hire:
Aetna PPO and HSA plans: 2 medical plan options with $0 payroll deduction
Family-building, fertility, adoption and surrogacy benefits
Dental (including orthodontic coverage) and vision plans, both of which have options with a $0 paycheck contribution
Company-paid Health Savings Account (HSA) contribution when enrolled in the High Deductible Aetna medical plan with HSA
Healthcare and Dependent Care Flexible Spending Accounts (FSA)
LGBTQ+ care concierge services
401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
Company-paid Basic Life, AD&D, short-term and long-term disability insurance
Employee Assistance Program
Sick and Vacation time (Flex time for salary positions), and Paid Holidays
Back-up childcare and parenting support resources
Voluntary benefits to include: critical illness, hospital indemnity, accident insurance, theft & legal services, and pet insurance
Weight Loss and Tobacco Cessation Programs
Tesla Babies program
Commuter benefits
Employee discounts and perks program

Expected Compensation
$140,000 - $252,000/annual salary + cash and stock awards + benefits
Pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The total compensation package for this position may also include other elements dependent on the position offered. Details of participation in these benefit plans will be provided if an employee receives an offer of employment.
Created: 2024-10-07