Observability Engineer
Sands Corp - Dallas, TX
Job Description
The primary responsibility of the Observability Engineer is implementing and developing infrastructure focused on monitoring and observability within Sands. The Observability Engineer will develop solutions for infrastructure operations teams and ensure they have effective tools to monitor, observe, and operate systems and platforms within the framework of large enterprise compliance and governance needs. The Observability Engineer will develop, maintain, and execute infrastructure-as-code scripts and playbooks that automate deployment and maintenance tasks to ensure the availability, reliability, and efficient operation of enterprise systems. The position demands someone who is highly technically competent, detail-oriented, and driven to stay current with evolving technologies.

All duties are to be performed in accordance with departmental and Las Vegas Sands Corp.'s policies, practices, and procedures. All Las Vegas Sands Corp. Team Members are expected to conduct and carry themselves in a professional manner at all times. Team Members are required to observe the Company's standards, work requirements, and rules of conduct.

- Work with the Lead Observability Engineer to decide on and execute the required priorities for monitoring, alerting, and observability KPIs.
- Develop solutions to observability demands.
- Deliver broad services that cover the following domains:
  - Log Collection and Analysis
  - Operational Metrics
  - Build, Test, and Deployment Automation
  - Platform reliability engineering monitoring
- Design, develop, and maintain automation solutions to support observability and operations, focusing on improving system monitoring, alerting, and reporting capabilities.
- Provide technology and/or process solutions to high-impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards.
- Be accountable for execution in support of observability policies, processes, and architectural decisions.
- Ensure operational methods, procedures, facilities, and tools are developed in accordance with policies and are well documented and maintained.
- Monitor and research emerging observability trends and technologies with the potential to improve efficiency, security, and business capabilities.
- Develop and execute proof-of-concept projects to evaluate new solutions for potential adoption.
- Develop documentation (e.g., data flow diagrams, logical diagrams, and physical diagrams) and training in compliance with standards.
- Apply enterprise design principles and best practices for implementing and supporting observability services.
- Operate with a limited level of direct supervision and exercise independence of judgment and autonomy.
- Consistently share standard methodologies and improve processes within and across teams.
- Perform job duties in a safe manner.
- Attend work as scheduled on a consistent and regular basis.
- Perform other related duties as assigned.

Minimum Qualifications
- At least 21 years of age.
- Proof of authorization to work in the United States.
- Bachelor's degree in Computer Science, Engineering, or a related discipline required. Advanced degree in technology or engineering is a plus.
- Must be able to obtain and maintain any certification or license, as required by law or policy.
- 5+ years of proven experience developing monitoring and observability solutions for on-premises IT infrastructure, applications, and private and public cloud environments. Experience with ITRS Geneos and Opsview is a plus.
- Strong expertise in Python, Java, and RESTful services, with a focus on building high-throughput, high-volume distributed systems.
- Strong expertise in Linux/Unix, container orchestration (e.g., Kubernetes), container runtimes, and optimization.
- Strong understanding of Site Reliability Engineering and DevOps principles.
- Strong technical acumen in Cloud Architecture, Performance Benchmarking, and Capacity Planning.
- Proficiency in project management and work item management tools such as Azure DevOps and Portfolio.
- Strong knowledge of logging systems and experience with ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, or similar platforms.
- Experience with tools like Harness, GitLab, Terraform, Ansible, or CloudFormation for managing and monitoring infrastructure.
- Demonstrated experience diagnosing performance bottlenecks and other system issues using observability data.
- Demonstrated understanding of and respect for IT service management practices (e.g., change, release, incident, problem management).
- Able to multi-task and handle various types of requests from different people/areas.
- Strong analytical and problem-solving skills.
- Effective written and verbal communication skills in English.

Physical Requirements
Must be able to:
- Physically access assigned workspace areas with or without reasonable accommodation.
- Work indoors and be exposed to various environmental factors such as, but not limited to, CRT, noise, and dust.
- Utilize laptop and standard keyboard to perform essential functions of the job.
Created: 2025-02-13