Site Reliability Engineer

Graphite GTC - Philadelphia, PA

Job Description

Since our inception in 2017, Graphite GTC has been on a mission to redefine the landscape of software development. Our groundbreaking No-Code development platform has transformed the way software is conceived and created, democratizing the process and making it accessible to all. Our mantra, "Better. Faster. Cheaper.™", is not just a slogan; it's the reality we deliver through our innovative platform.

Graphite GTC is a beacon of innovation in the no-code application development sphere. Our vision is to provide equal access to cutting-edge technology for a diverse range of clients, from startups to enterprises. We achieve this vision by moving away from traditional hand-coding methods and embracing a visually driven approach to application design, enabling anyone to craft sustainable, enterprise-class applications.

Our proprietary software stands as a testament to our innovative spirit and technical prowess. This intellectual property has not only set us apart in the market but has also been the cornerstone of our service offerings. We have evolved into a full-service IT and consulting powerhouse, catering to an impressive roster of clients including the U.S. government, leading pharmaceutical companies, educational institutions, and giants in the construction and sustainability sectors.

We are seeking an experienced Site Reliability Engineer (SRE) to join our dynamic team. In this role, you will focus on ensuring the reliability, scalability, and performance of the applications developed by Graphite GTC using our no-code platform. You will collaborate closely with our development teams to design, implement, and maintain the infrastructure and systems that support these applications for our diverse client base. Your responsibilities will include developing automated solutions for operational aspects such as deployment, monitoring, and incident response for client applications. You will proactively identify and resolve potential issues in application architectures to prevent downtime and performance degradation. This role is ideal for a detail-oriented engineer with a passion for ensuring application reliability and a background in both software development and systems engineering.

Key Responsibilities

Application Reliability and Performance: Ensure the reliability, availability, and optimal performance of applications developed for our clients across various industries.
Infrastructure Management: Design, implement, and manage scalable infrastructure solutions that support client applications in production environments.
Automation and Tooling: Develop and maintain automation tools for application deployment, monitoring, and management to improve efficiency and reduce manual intervention.
Monitoring and Incident Response: Implement comprehensive monitoring solutions for client applications to detect and respond to issues promptly. Participate in on-call rotations to troubleshoot and resolve incidents.
Collaboration: Work closely with development teams to integrate reliability best practices into the application development lifecycle.
Capacity Planning: Analyze application performance data to plan for future growth and ensure scalability.
Security and Compliance: Ensure applications and associated systems comply with security policies and industry regulations, particularly those relevant to federal clients.
Continuous Improvement: Identify areas for improvement in application architecture and operational processes. Propose and implement solutions to enhance reliability and performance.
Documentation: Create and maintain detailed documentation of application configurations, processes, and procedures.

Qualifications

Education: Bachelor's degree in Computer Science, Engineering, or a related field; a Master's degree is preferred.
Experience: Minimum of 5 years of experience in site reliability engineering, systems engineering, or a related field. Proven experience supporting applications in production environments. Strong background in software development and scripting languages (e.g., Python, Go, Bash).
Technical Skills: Expertise in automation and configuration management tools (e.g., Ansible, Terraform, Chef, Puppet). Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and containerization technologies (e.g., Docker, Kubernetes). Proficient in monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack). Solid understanding of networking concepts, databases, and operating systems. Experience with CI/CD pipelines and DevOps practices.
Location Requirement: Must be able to work 100% in-person at our Bryn Mawr, PA location.

Created: 2024-10-19
