REGENT LP
Regent is a global private holding company focused on investing in and transforming businesses across a broad spectrum of industries including automotive, technology, consumer products, retail, industrial, and media. Known for driving innovation and operational excellence, Regent partners with management teams to unlock long-term value.
ROLE OVERVIEW
We are seeking an experienced Lead Engineer for Cloud Platform Operations to join a global technology team in Krakow. This area lead role carries full accountability for provisioning, securing, and continuously improving a multi-cloud estate, with AWS as the primary platform, alongside OCI and Azure. The role spans IaaS and PaaS operations, Infrastructure-as-Code engineering, cloud-native security and networking, Kubernetes/container platforms, SRE practices, and FinOps - partnering closely with application, security, and infrastructure teams to deliver a reliable, governed, and cost-efficient cloud platform. The successful candidate combines deep technical breadth across cloud disciplines with the leadership capability to mentor engineers and drive continuous improvement.*
*Please note that this role is on-site 5 days per week.
AREA CONTEXT
Cloud Platform Operations is responsible for the provisioning, management, and optimisation of cloud infrastructure and services. This area supports the organisation's digital transformation by enabling rapid deployment of resources, automation of operational tasks, and integration with DevOps workflows. It covers public cloud platforms - AWS (primary), Oracle Cloud Infrastructure (OCI), and Azure - as well as IaaS, PaaS, and cloud-native offerings. The area addresses challenges related to cost management, scalability, and compliance, ensuring that cloud resources are used efficiently and securely.
KEY RESPONSIBILITIES
• Lead the provisioning, management, and optimisation of cloud infrastructure and services across AWS (primary), OCI, and Azure, covering IaaS, PaaS, and cloud-native offerings.
• Oversee the deployment and configuration of public cloud resources, ensuring security, scalability, cost efficiency, and alignment with landing-zone guardrails (tagging, naming, quota, region standards).
• Curate and operate a Cloud Service Catalog of approved blueprints for common stacks, enabling governed self-service and faster time-to-value for application teams.
• Implement an Infrastructure-as-Code approach using Terraform for all cloud infrastructure deployments; maintain drift detection and auto-remediation where safe.
• Integrate IaC pipelines with pre-merge security and compliance testing (OPA/Conftest, static analysis, terraform validate/plan gates) and manage controlled promotion across environments; own access controls, secrets hygiene, and security configuration across CI/CD tooling (Jenkins, GitHub).
• Develop and maintain automation scripts and tools (CLI, PowerShell, Python, Bash) for cloud resource management, health checks, and operational toil reduction; experience developing AI-assisted automation or autonomous agents to accelerate operational workflows is a strong advantage.
• Establish and maintain cloud landing zones with policy-as-code guardrails (Azure Policy/Defender for Cloud, AWS Organizations/Control Tower SCPs, OCI Policies).
• Own identity and access standards: enforce least privilege, SSO, role mapping, privileged access break-glass, workload identities, and key/secrets management (KMS/HSM, rotation SLAs, secret scanning).
• Define and operate cloud network reference architectures (hub-and-spoke, private endpoints, egress controls, DNS, global load balancing, cross-cloud connectivity) with security baselines.
• Maintain golden images and patch pipelines for compute and container runtimes; ensure vulnerability management and CIS/NIST benchmark alignment.
• Lead container and Kubernetes platform operations (EKS/AKS/OKE): cluster lifecycle, node pools, autoscaling, admission control, image provenance, and supply chain security.
• Implement observability at scale (centralised logs, metrics, traces); integrate with SIEM/SOAR and enforce runbook-driven incident response and post-incident reviews.
• Embed SRE practices (SLOs, error budgets, capacity policies, toil reduction) and automate health checks, drift detection, and remediations.
• Own FinOps operations: cost allocation/chargeback, budgets and alerts, rightsizing, Reserved Instances/Savings Plans/Flexible commitments, and lifecycle policies for idle or orphaned resources.
• Oversee backup, disaster recovery, and business continuity planning for cloud environments; define RTO/RPO targets and participate in restore drills.
• Ensure CMDB/ITSM integration (auto-discovery, service mapping), event enrichment, and change automation (standard changes) with audit-ready evidence.
• Drive integration with DevOps workflows, supporting rapid deployment and continuous delivery across cloud platforms.
• Ensure compliance with organisational policies and regulatory requirements; support audit activities and lead the response to audit findings with timely remediation.
• Participate in governance, reporting, and service review meetings; conduct regular reviews of cloud resource utilisation and performance.
• Lead cloud migration projects, ensuring minimal disruption and robust risk management across AWS, OCI, and Azure.
• Manage vendor relationships and coordinate with third-party cloud providers (AWS, Oracle, Microsoft); stay current with emerging cloud technologies.
• Mentor and develop engineers in cloud build standards, IaC patterns, and troubleshooting; manage team workload and capacity across concurrent workstreams, maintain and prioritise the team's Jira ticket queue, and organise the on-call rota for the Cloud Platform Operations area.
• Drive continuous improvement in cloud platform operations processes and standardise patterns into reusable blueprints.
QUALIFICATIONS
Experience
• 7+ years of enterprise cloud engineering and operations experience, with at least 2 years in a team lead or managerial role, including team workload management, ticket queue prioritisation (Jira), and hands-on development of engineers.
• Deep, hands-on production expertise in AWS is essential; experience across at least one additional platform (OCI and/or Azure) covering both IaaS and PaaS services at enterprise scale.
• Demonstrable Terraform expertise: authored, maintained, and reviewed IaC codebases in a team environment with CI/CD pipelines (Jenkins, GitHub) and policy gates.
• Proven track record delivering cloud migration projects and complex cloud transformation initiatives with minimal disruption.
• Experience designing and operating cloud IAM frameworks: federation/SSO, workload identities, JIT/PAM, least-privilege design, and KMS/HSM secret management.
• Hands-on experience with cloud networking: VNet/VPC design, private endpoints, hub-and-spoke architectures, DNS, global load balancing, and egress controls.
• Experience operating Kubernetes clusters (EKS, AKS, or OKE) in production: node pools, autoscaling, admission control, and image supply chain.
• Background in FinOps practices: cost allocation, rightsizing, commitment planning, anomaly detection, and showback/chargeback.
• Experience supporting or leading cloud audit and compliance activities (ISO 27001, SOC 2, PCI-DSS, or equivalent) with evidenced remediation.
• Proficiency with ITSM and project tracking platforms, namely Jira (primary) and ServiceNow or equivalent, for incident management, ticket queue management, CMDB, and change automation.
Technical Skillset
• Multi-cloud platform management (AWS primary, OCI, Azure): secure provisioning, tenancy hygiene, landing-zone design, and quota/region governance.
• Infrastructure-as-Code (Terraform): module design, state management, drift control, and CI/CD pipeline integration (Jenkins, GitHub) with policy/test gates (OPA/Conftest); access controls, secrets hygiene, and security governance across CI/CD tooling.
• Policy-as-code and guardrails: Azure Policy/Defender for Cloud, AWS SCPs/Config/Control Tower, OCI Policies & Cloud Guard.
• Deep IAM skills: federation/SSO, workload identities, conditional access, JIT/PAM, least-privilege design patterns, KMS/HSM, and secret lifecycle management.
• Cloud networking patterns: VNet/VPC design, private links/endpoints, service endpoints, routing/peering, DNS, global load balancing, egress control, and cross-cloud connectivity.
• Kubernetes/container operations (EKS/AKS/OKE): cluster lifecycle, admission controllers, image signing (SBOM), registry governance, and autoscaling.
• SRE and operability: SLOs, error budgets, toil reduction, runbook authoring, incident command, and post-incident review facilitation.
• Security posture and compliance: CSPM/CWPP tooling, CIS/NIST/ISO mapping, vulnerability management, patch baselines, and workload hardening.
• FinOps tooling: budget management, anomaly detection, commitment planning, showback/chargeback, cost allocation tags, and lifecycle policies.
• Observability and ITSM automation: centralised log/metrics/trace pipelines, SIEM/SOAR integration, auto-discovery, service mapping, and event enrichment.
• Backup, disaster recovery, and geo-redundancy for cloud environments; restore drill planning and RTO/RPO definition.
• Automation scripting (Python, PowerShell, Bash, CLI) for bulk operations, health checks, and compliance reporting.
• Nutanix (advantageous): familiarity with Nutanix HCI/cloud platform is a strong differentiator, particularly for hybrid workload integration and private-to-public cloud migration scenarios.
• Strong documentation and coaching skills; ability to standardise patterns into reusable blueprints and service catalog items.
• AI-assisted automation and agent development (advantageous): ability to work with AI tooling and develop autonomous agents to accelerate operational and engineering workflows; candidates with hands-on experience building AI agents or integrating LLM-based automation into platform operations will be strongly preferred.