Site Reliability Engineer (SRE) - Java Troubleshooting ...
Altimetrik - Mountain View, CA
Job Description
Job Title: Site Reliability Engineer (SRE) - Java Troubleshooting Expert (L2/L3)
Job Location: Mountain View, CA
Job Type: Full-Time

About the Role:
We are seeking a talented and experienced SRE Support Engineer (L2/L3) to join our dynamic team. This role involves providing operational support, troubleshooting, and ensuring the smooth functioning of our systems. The ideal candidate will have strong expertise in Java, Python, DevOps tools, Groovy scripting, and AWS Lambda.

Key Responsibilities:

L2/L3 Support
- Provide advanced troubleshooting for production systems and applications.
- Resolve complex technical issues escalated from L1 support teams.
- Perform root cause analysis and implement permanent fixes.

Site Reliability Engineering (SRE)
- Monitor system performance and proactively address potential issues.
- Enhance system reliability, availability, and scalability through automation.
- Design and implement robust incident management processes.

Development & Scripting
- Write and maintain Java and Python scripts to support operations.
- Develop Groovy scripts for CI/CD pipelines and automation.
- Build tools and scripts for system performance optimization.

Cloud & Infrastructure
- Design and maintain solutions using AWS Lambda and other AWS services.
- Troubleshoot cloud-based environments and applications.
- Optimize cloud infrastructure for cost and performance.

Required Skills & Experience:
- Programming Languages: Strong knowledge of Java and Python.
- Scripting: Hands-on experience with Groovy for automation and CI/CD.
- DevOps: Familiarity with tools such as Jenkins, Docker, Kubernetes, Git, and Terraform.
- AWS Expertise: Strong experience with AWS Lambda, EC2, S3, CloudWatch, and IAM.
- Troubleshooting: Proficient in diagnosing and resolving complex system issues.
- AIOps: Experience with AIOps tools (e.g., Dynatrace, AppDynamics, Splunk) is a plus.
- Soft Skills: Strong problem-solving, communication, and collaboration skills.

Preferred Qualifications:
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- Experience in incident management and on-call rotations.
- Familiarity with Agile and DevOps methodologies.
- Certifications in AWS, DevOps, or relevant technologies are a plus.
Created: 2025-02-13