Hello folks,

Share your resume to: [Upgrade to PRO to see contact] or DM me directly.

Role: Site Reliability Engineer (SRE) – Senior Analyst (manufacturing domain)
Location: Atlanta, Georgia (onsite)
Experience: 10+ years required

Job Description

Key Responsibilities

· System Reliability & Availability: Ensure system uptime, availability, and performance across all services.
· Incident Response & Troubleshooting: Act as a first responder for incidents, perform root cause analysis, and drive long-term resolution.
· Automation & Tooling: Build and maintain automation scripts and tools to improve operational efficiency.
· Monitoring & Alerting: Set up, maintain, and optimize monitoring, alerting, and logging systems using tools like Prometheus, Grafana, CloudWatch, or Datadog.
· Performance Optimization: Analyze system performance and recommend enhancements to improve reliability and efficiency.
· Configuration Management: Use tools like Ansible, Puppet, or Chef to automate system configuration and deployments.
· Service Level Objectives (SLOs): Define and monitor SLOs, SLAs, and error budgets to drive reliability goals.
· Collaboration & Support: Work closely with software developers, QA teams, and other stakeholders to improve system reliability.
· Documentation: Create and maintain clear documentation for incident response, system architecture, and operational procedures.

Required Skills & Qualifications

· Education: Bachelor’s degree in Computer Science, Information Technology, or a related field.
· Experience: 5-7 years of experience as a Site Reliability Engineer (SRE), DevOps Engineer, or Systems Engineer.
· Operating Systems: Strong knowledge of Linux/Unix system administration.
· Cloud Platforms: Hands-on experience with AWS, Azure, or GCP.
· Monitoring & Alerting: Experience with tools like Prometheus, Grafana, CloudWatch, or Datadog.
· Scripting & Automation: Proficiency in Python, Shell, or Bash scripting.
· Infrastructure as Code (IaC): Familiarity with Terraform, CloudFormation, or similar tools.
· CI/CD Pipelines: Hands-on experience with Jenkins, GitLab CI/CD, or GitHub Actions.
· Incident Response: Strong problem-solving and incident response skills.
· Version Control: Experience with Git and source control best practices.

Preferred Skills

· Cloud Certifications: AWS Certified Solutions Architect, Azure Administrator, or GCP Professional Cloud Engineer.
· ITSM/ITIL Knowledge: Understanding of IT Service Management (ITSM) and ITIL processes.
· Containerization: Familiarity with Docker and Kubernetes.
· Soft Skills: Excellent communication, problem-solving, and teamwork skills.

Ramprasad Vaddepally

Site Reliability Engineer
