Site Reliability Engineer
VeeAR Projects Inc. - Sunnyvale, CA
Job Description
Job Summary:
We are seeking a highly skilled Site Reliability Engineer (SRE) with expertise in Kubernetes and Google Cloud's POD Spanner to join our team. The ideal candidate will be responsible for maintaining, optimizing, and automating our cloud infrastructure, ensuring high availability, scalability, and performance for our mission-critical applications.

Key Responsibilities:

Reliability & Performance Optimization:
- Design, implement, and manage highly available, scalable, and resilient distributed systems.
- Optimize database performance and availability using Google Cloud Spanner (POD Spanner).
- Implement SLOs, SLIs, and SLAs to monitor and maintain system reliability.

Kubernetes & Cloud Infrastructure Management:
- Deploy, manage, and scale Kubernetes clusters across cloud environments.
- Automate infrastructure provisioning and configuration using Terraform, Helm, or similar tools.
- Monitor and troubleshoot Kubernetes workloads, networking, and storage issues.

Automation & CI/CD:
- Develop and maintain CI/CD pipelines to streamline deployments.
- Automate repetitive operational tasks using Python, Go, or Bash scripting.
- Implement GitOps workflows using ArgoCD, Flux, or similar tools.

Incident Response & Monitoring:
- Implement observability best practices using Prometheus, Grafana, OpenTelemetry, or Google Cloud Operations Suite.
- Proactively monitor system health and troubleshoot incidents using logging and tracing tools.
- Participate in on-call rotations and incident post-mortem reviews.

Required Skills & Experience:
- 3+ years of experience as an SRE, DevOps Engineer, or similar role.
- Proficiency with Kubernetes (EKS, GKE, or AKS) and container orchestration.
- Hands-on experience with Google Cloud Spanner (POD Spanner) or other distributed SQL databases.
- Strong experience with Terraform, Helm, and Kubernetes operators.
- Knowledge of networking, load balancing, and security best practices in cloud environments.
- Proficiency in Python, Go, or Bash scripting for automation.
- Experience with observability tools (Prometheus, Grafana, OpenTelemetry, Google Cloud Logging/Monitoring).
- Familiarity with GitOps, CI/CD pipelines (Jenkins, GitHub Actions, ArgoCD, or Spinnaker).
- Strong troubleshooting skills for distributed systems and cloud infrastructure.

Preferred Qualifications:
- Experience with Google Kubernetes Engine (GKE) and Google Cloud services.
- Knowledge of Istio, Envoy, or service mesh technologies.
- Familiarity with Kafka, Redis, or other cloud-native databases and messaging systems.
- Certification in Google Cloud (e.g., Professional Cloud DevOps Engineer, Kubernetes Certification).
Created: 2025-02-15