Keyloop bridges the gap between dealers, manufacturers, technology suppliers and car buyers.
We empower car dealers and manufacturers to fully embrace digital transformation. How? By creating innovative technology that makes selling cars better for our customers, and buying and owning cars better for theirs.
We use cutting-edge technology to link our clients’ systems, departments and sites. We provide an open technology platform that’s shaping the industry for the future. We use data to help clients become more efficient, increase profitability and give more customers an amazing experience. Want to be part of it?
About the Role
We’re seeking a highly experienced Azure Cloud SME (IC3) to serve as the senior technical authority for Azure platform operations across multiple tenants and subscriptions (3) for a single product.
This role owns production stability for Azure PaaS and infrastructure services, leads high-severity incident response, drives governance and security enforcement, optimises cost and performance, and mentors junior engineers.
You will operate with minimal supervision, act as the technical escalation point for complex platform issues, and partner closely with Engineering, Security, FinOps, and the 24/7 Operations Centre to ensure operational excellence.
Key Responsibilities
Azure Platform Ownership
• Own Azure PaaS operations including SQL Databases (Elastic Pools), App Services, Function Apps, Storage Accounts, Key Vault, App Configuration, and Service Bus.
• Manage lifecycle activities including provisioning, optimisation, decommissioning, and capacity planning.
• Lead troubleshooting of complex performance, scaling, availability, and resilience issues.
• Manage edge controls, including Application Gateway (WAF) and Traffic Manager, and support Azure Front Door planning where required.
Critical Incident Leadership (P1/P2)
• Lead the response to high-severity incidents raised by monitoring platforms.
• Coordinate cross-functional technical teams during outages.
• Provide structured, executive-ready communications during incidents.
• Conduct detailed Root Cause Analysis (RCA) with corrective and preventive actions.
• Identify recurring patterns and drive permanent fixes.
App Services & Web Services – Advanced Support
• Diagnose scaling failures, memory leaks, deployment failures, and performance bottlenecks.
• Support rollback strategies and change validation.
• Troubleshoot connectivity issues involving Application Gateway, Traffic Manager, VNets, and Private Endpoints.
• Partner with application teams to optimise service performance and reliability.
Azure SQL & Elastic Pool Operations
• Investigate performance degradation, blocking, and query inefficiencies.
• Analyse DTU/vCore consumption and pool resource contention.
• Validate backup strategies and lead restore testing exercises.
• Support high availability and failover troubleshooting.
Monitoring & Observability
• Own Azure Monitor, Log Analytics, and Application Insights configurations.
• Define SLIs/SLOs and tune alert thresholds to reduce operational noise.
• Build dashboards and proactive health checks.
• Drive automation of alert remediation and operational runbooks.
Identity, Access & Security
• Manage operational IAM using Entra ID, RBAC, and Privileged Identity Management (PIM).
• Enforce least-privilege access and conduct access reviews.
• Lead security vulnerability remediation and hardening initiatives.
• Manage Key Vault access policies and the secrets lifecycle.
• Support audit, compliance, and security review activities.
Governance & Azure Policy
• Implement and maintain Azure Policy definitions and initiatives.
• Enforce tagging, naming conventions, and compliance standards.
• Contribute to governance maturity and subscription-level controls.
Cost Optimisation
• Identify optimisation opportunities across Elastic Pools, App Service Plans, Storage Accounts, and Reserved Instances.
• Provide cost insights and recommendations to stakeholders.
• Partner with FinOps to safely implement optimisation initiatives via standard change controls.
Backup & Disaster Recovery
• Own backup configuration and validation across SQL Databases, Storage Accounts, and App Services.
• Maintain documented RTO/RPO definitions.
• Ensure restore testing and DR validation are performed and evidenced.
Automation & Continuous Improvement
• Develop and maintain Terraform configurations for infrastructure provisioning.
• Write PowerShell scripts to automate administrative tasks.
• Contribute to CI/CD pipelines using Azure DevOps and GitHub Actions.
• Identify and eliminate repetitive manual operational tasks.
Networking
• Manage private networking including VNets, NSGs, Private Endpoints, and Private DNS Zones.
• Troubleshoot connectivity issues across subscriptions and environments.
• Ensure network design aligns with security and least-privilege principles.
Mentorship & Technical Escalation
• Mentor IC1 and IC2 engineers.
• Develop knowledge articles, runbooks, and technical documentation.
• Conduct peer reviews and knowledge-sharing sessions.
• Act as the Azure technical escalation point within the team.
Essential Skills & Experience
• 8-10+ years of experience in Cloud, DevOps, or SRE roles, including 8+ years of hands-on Azure experience.
• Strong operational experience with Azure App Services and Function Apps.
• Azure SQL Database and Elastic Pool performance tuning, backup, and restore expertise.
• Proven experience handling P1/P2 incidents independently.
• Deep knowledge of Azure Monitor, Log Analytics, and Application Insights.
• Strong networking fundamentals: VNet, NSG, Private Endpoints, Application Gateway, WAF, Traffic Manager.
• Entra ID, RBAC, and Privileged Identity Management (PIM).
• Azure Policy and governance controls.
• Backup and Disaster Recovery strategy and validation.
• Infrastructure as Code using Terraform.
• PowerShell scripting and automation.
• CI/CD experience with Azure DevOps and/or GitHub Actions.
• Cost optimisation and FinOps collaboration experience.
Highly Desirable
• Azure Front Door experience.
• Automation using Bash and Azure CLI.
• Familiarity with the Azure Well-Architected Framework.
• Experience with ITSM tools (Jira, JSM, ServiceNow).
• Exposure to AWS within a multi-cloud environment.
Additional Information:
• This position follows a hybrid work model, requiring in-office presence on days defined by your manager. Occasional out-of-hours engagement may be required for major incidents or critical escalations in coordination with the 24/7 Operations Centre.
Why join us?
We’re on a journey to become market leaders in our space – and with that comes some incredible opportunities. Collaborate with and learn from industry experts from all over the globe. Work with game-changing products and services. Get the training and support you need to try new things, adapt to quick changes and explore different paths. Join Keyloop and progress your career, your way.
An inclusive environment to thrive
We’re committed to fostering an inclusive work environment. One that respects all dimensions of diversity. We promote an inclusive culture within our business, and we celebrate different employees and lifestyles – not just on key days, but every day.
Be rewarded for your efforts
We believe people should be paid based on their performance, so our pay and benefits reflect this and are designed to attract the very best talent. We encourage everyone in our organisation to explore opportunities that help them grow their career, both through investment in their development and by working in a culture that fosters support and unbridled collaboration.
Keyloop doesn’t require academic qualifications for this position. We select based on experience and potential, not credentials.
We are also an equal opportunity employer committed to building a diverse and inclusive workforce. We value diversity and encourage candidates of all backgrounds to apply.