Hello folks,
Share your resume to: [Upgrade to PRO to see contact]
Or DM me directly
Role: Site Reliability Engineer (SRE) - Senior Analyst (manufacturing domain)
Location: Atlanta, Georgia (onsite)
Experience: 10+ years required
Job Description
Key Responsibilities
· System Reliability & Availability: Ensure system uptime, availability, and performance across all services.
· Incident Response & Troubleshooting: Act as a first responder for incidents, perform root cause analysis, and drive long-term resolution.
· Automation & Tooling: Build and maintain automation scripts and tools to improve operational efficiency.
· Monitoring & Alerting: Set up, maintain, and optimize monitoring, alerting, and logging systems using tools like Prometheus, Grafana, CloudWatch, or Datadog.
· Performance Optimization: Analyze system performance and recommend enhancements to improve reliability and efficiency.
· Configuration Management: Use tools like Ansible, Puppet, or Chef to automate system configuration and deployments.
· Service Level Objectives (SLOs): Define and monitor SLOs, SLAs, and error budgets to drive reliability goals.
· Collaboration & Support: Work closely with software developers, QA teams, and other stakeholders to improve system reliability.
· Documentation: Create and maintain clear documentation for incident response, system architecture, and operational procedures.
Required Skills & Qualifications
· Education: Bachelor's degree in Computer Science, Information Technology, or a related field.
· Experience: 5-7 years of experience as a Site Reliability Engineer (SRE), DevOps Engineer, or Systems Engineer.
· Operating Systems: Strong knowledge of Linux/Unix system administration.
· Cloud Platforms: Hands-on experience with AWS, Azure, or GCP.
· Monitoring & Alerting: Experience with tools like Prometheus, Grafana, CloudWatch, or Datadog.
· Scripting & Automation: Proficiency in Python, Shell, or Bash scripting.
· Infrastructure as Code (IaC): Familiarity with Terraform, CloudFormation, or similar tools.
· CI/CD Pipelines: Hands-on experience with Jenkins, GitLab CI/CD, or GitHub Actions.
· Incident Response: Strong problem-solving and incident response skills.
· Version Control: Experience with Git and source control best practices.
Preferred Skills
· Cloud Certifications: AWS Certified Solutions Architect, Azure Administrator, or GCP Professional Cloud Engineer.
· ITSM/ITIL Knowledge: Understanding of IT Service Management (ITSM) and ITIL processes.
· Containerization: Familiarity with Docker and Kubernetes.
· Soft Skills: Excellent communication, problem-solving, and teamwork skills.