About You
This is a highβimpact, highβexpectation senior Azure DevOps/SRE role supporting a Private Equity Group (PEG) platform. The team is new and the infrastructure is new. You must operate with strong autonomy, influence engineering decisions, challenge assumptions, and serve as a thought partner rather than an orderβtaker. You will work directly with VPβlevel stakeholders in a technical environment where reliability, deployment safety, and clarity in communication are essential.
This position requires someone who can own DevOps architecture endβtoβend, enable safe and frequent deployments, and establish operational excellence from day zero.
Note: This position is offered under a contractor model for a period of 6 months with the possibility of extension.
You Bring to Applaudo the Following Competencies
β’ Proven ownership of production-grade CI/CD pipelines using GitHub Actions reusable workflows and GitOps automation with ArgoCD.
β’ Expert-level Kubernetes and AKS operations, including cluster management, scaling, RBAC, and networking.
β’ Production-scale Terraform expertise, including Azure backend (Storage Accounts / remote state), and PR-driven workflows via Atlantis.
β’ Strong reliability engineering experience, including SLO/SLI design, alerting strategies, dashboards, incident response, and post-incident reviews.
β’ Hands-on experience with secrets management solutions such as HashiCorp Vault or Azure Key Vault (experience integrating Key Vault with Kubernetes is highly preferred).
β’ Experience implementing supply-chain security controls, including image scanning and signing, SBOM generation, and policy enforcement with OPA/Gatekeeper.
β’ Strong experience with observability stacks, including Prometheus, Grafana, Loki, Tempo, and Alertmanager.
β’ Experience with service mesh technologies such as Istio, including traffic management, mTLS, AuthorizationPolicies, and circuit breaking.
β’ Scripting ability using Python and Bash for automation and operational tooling.
β’ Active use of AI-assisted engineering tools such as Cursor, GitHub Copilot, or Cloud Code to accelerate IaC development, incident response, and runbook generation.
β’ Strong communication skills, with the ability to communicate clearly and confidently with VP-level stakeholders during operational incidents.
β’ Advanced English proficiency, as you will work directly with US-based clients.
You Will Be Accountable for the Following Responsibilities
β’ Design and maintain GitHub Actions reusable workflows across a multi-repository ecosystem.
β’ Own GitOps deployments through ArgoCD, including promotion workflows, sync policies, drift detection, and automated rollback strategies.
β’ Implement deployment safety mechanisms such as environment protections, concurrency rules, and verification gates.
β’ Operate and manage Kubernetes clusters in Azure (AKS), including scaling strategies, networking configuration, and cluster upgrades.
β’ Maintain Terraform-driven infrastructure and enforce PR-driven workflows through Atlantis.
β’ Define and maintain SLOs, SLIs, alerting rules, and monitoring dashboards across platform services.
β’ Lead incident response, coordinate recovery efforts, and execute structured post-incident reviews.
β’ Participate in an on-call rotation and contribute to improving operational processes.
β’ Operate and manage secrets using Vault or Azure Key Vault, including access policies and secure integration with workloads.
β’ Implement supply-chain security controls, including Trivy scanning, Cosign signing, SBOM generation, and OPA/Gatekeeper enforcement.
β’ Partner with Security Engineering on network policies, egress controls, and compliance standards.
β’ Automate repetitive tasks and maintain proactive runbooks to reduce operational risk.
β’ Use AI tools to improve infrastructure automation, documentation, and deployment safety validation.
β’ Collaborate with product teams to strengthen SLOs and deployment safety practices.
β’ Challenge technical assumptions and advocate for scalable, secure DevOps architectures.