Job Title: DevOps Engineer (Cloud Infrastructure – AWS/GCP)
Department: Technology
Reports To: DevOps Lead
Job Summary
We are looking for a skilled DevOps Engineer to design, build, and manage scalable cloud infrastructure on AWS and/or GCP. The role covers automation, system reliability, CI/CD pipelines, and secure, efficient cloud operations. You will work closely with engineering teams to streamline deployments and improve system performance.
Key Responsibilities
Design, implement, and manage cloud infrastructure on AWS and/or GCP
Build and maintain Infrastructure as Code (IaC) using tools like Terraform or CloudFormation
Develop and manage CI/CD pipelines for automated build, test, and deployment processes
Monitor system performance, availability, and reliability using observability tools
Manage containerized applications using Docker and Kubernetes
Enforce security best practices across IAM, network policies, and secrets management
Optimize cloud cost, performance, and scalability
Automate operational tasks and reduce manual intervention
Troubleshoot production issues and perform root cause analysis
Collaborate with development teams to improve release cycles and system resilience
Maintain documentation of infrastructure and processes
Requirements
Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
3+ years of experience in DevOps, SRE, or cloud engineering roles
Strong hands-on experience with AWS and/or GCP services
Experience with Infrastructure as Code (Terraform preferred)
Proficiency in scripting (Python, Bash, or similar)
Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins, etc.)
Solid understanding of networking concepts (VPC, DNS, load balancing)
Experience with containerization (Docker) and orchestration (Kubernetes)
Key Skills
Experience with multi-cloud or hybrid cloud environments
Familiarity with monitoring tools (Prometheus, Grafana, Datadog, CloudWatch, Google Cloud Monitoring, formerly Stackdriver)
Knowledge of security best practices and compliance standards
Experience with service mesh (Istio, Linkerd)
Exposure to serverless architectures (AWS Lambda, GCP Cloud Functions)
Soft Skills
Strong problem-solving and analytical thinking
Ability to work independently and take ownership of infrastructure
Effective communication and collaboration skills
Proactive mindset with a focus on automation and improvement
Key Metrics for Success
System uptime and reliability (SLA/SLO adherence)
Deployment frequency and lead time
Incident response and resolution time
Infrastructure cost efficiency
Automation coverage