REGENT LP
Regent is a global private holding company focused on investing in and transforming businesses across a broad spectrum of industries including automotive, technology, consumer products, retail, industrial, and media. Known for driving innovation and operational excellence, Regent partners with management teams to unlock long-term value.
ROLE OVERVIEW
We are seeking an experienced Lead Engineer specialising in Windows Infrastructure to join a global technology team. This area lead role has full accountability for the stability, security, and continuous improvement of an enterprise Windows server estate - spanning on-premises data centres, VMware virtualisation, and VMC on AWS. This role manages the lifecycle of Windows servers, ensuring the stability, security, and performance of business-critical applications. The successful candidate combines deep technical expertise with the ability to mentor junior team members and collaborate across application, network, storage, and cloud teams.*
*Please note that this role is on-site 5 days per week.
AREA CONTEXT
The Windows area is dedicated to the management of server operating systems, ensuring the stability, security, and performance of business-critical applications. It covers the deployment, patching, and monitoring of Windows servers across data centres and cloud environments. The area addresses operational challenges such as end-of-life hardware, legacy systems, and the need for rapid response to incidents. It supports business continuity through robust backup and recovery processes and aligns with best practices for system administration.
KEY RESPONSIBILITIES
• Build, provision, and maintain Windows servers, VM clusters, and VMs; apply domain join, security baselines, and GPOs in line with organisational standards.
• Execute monthly patching cycles using WSUS/SCCM or equivalent; validate application services post-patch and record evidence for compliance.
• Monitor OS and VM health (services, event logs, performance counters) and resolve faults; implement and tune monitoring thresholds and dashboards to reduce alert noise.
• Troubleshoot authentication, GPO application, and group membership issues; investigate CPU, RAM, disk, and network performance bottlenecks using perfmon and standard tooling.
• Validate and renew TLS/certificate bindings for Windows services (IIS, RDP, LDAP); enforce least-privilege local admin practices and secure service account usage.
• Ensure antivirus/EDR agents are installed, healthy, and reporting across the Windows estate; remediate coverage gaps in line with security policy.
• Manage and administer Rubrik (the primary backup solution) across the Windows estate: maintain backup agents, validate policy coverage, and run periodic restore tests; capture and document DR evidence; participate in DR tests (site failover, recovery) and document outcomes.
• Track EOL OS and virtualisation layer versions; plan and coordinate in-place upgrades or migrations, managing risk with application and business stakeholders.
• Lead incident response and root-cause analysis for Windows server-related issues; drive lasting remediation and produce post-incident reports.
• Oversee, maintain, and support the Hypervisor/Virtualisation layer (VMware, Hyper-V, or Nutanix) used within on-premises data centres and VMC on AWS.
• Maintain and support Windows Failover Clusters across on-premises data centres and VMC on AWS; ensure HA policies and DRS configurations are current and tested.
• Support application teams with prerequisites, port configurations, and service dependencies; collaborate with network, storage, and cloud teams for seamless operations.
• Keep server runbooks, diagrams, and CMDB attributes current and accurate; maintain comprehensive documentation of architectures, configurations, and operational procedures.
• Provide weekly operational KPIs (availability, incidents, patch compliance, backup success) and present findings in governance, reporting, and service review meetings.
• Ensure alignment with SLA requirements; lead the response to audit findings, ensuring timely remediation and evidenced closure.
• Manage vendor relationships and coordinate with third-party Windows support providers (Microsoft Premier, hardware OEMs); oversee secure decommissioning of EOL hardware.
• Mentor and develop engineers, fostering skill development and knowledge sharing; manage team workload and capacity across concurrent workstreams, maintain and prioritise the team's Jira ticket queue, and organise the On-Call rota for the Windows area.
• Drive continuous improvement in Windows server management processes and stay current with Microsoft technology roadmaps and emerging best practices.
• Support business continuity and disaster recovery planning for Windows server environments.
QUALIFICATIONS
Experience
• 7+ years of enterprise Windows Server administration and engineering, with at least 2 years in a team lead or leadership role, including team workload management, ticket queue prioritisation (Jira), and hands-on development of engineers.
• Demonstrated experience managing large-scale Windows estates (500+ servers) across hybrid on-premises and cloud environments.
• Hands-on experience with VMware vSphere/ESXi, Hyper-V, or Nutanix in a production environment, including Windows Failover Cluster and HA/DRS management.
• Proven track record delivering complex Windows OS migrations, EOL remediations, and platform consolidation projects on time and within risk tolerance.
• Deep experience with Active Directory design, GPO strategy, tiered admin models (PAW/LAPS), and secure privileged access management.
• Hands-on experience with patch orchestration tooling (WSUS, SCCM/MECM, or equivalent) and troubleshooting failed updates at enterprise scale.
• Strong PowerShell scripting and automation capability - this is a critical requirement for this role; candidates must demonstrate proven ability to write, maintain, and deploy PowerShell scripts for operational automation, compliance reporting, and configuration management. Bash scripting is an additional advantage.
• Hands-on experience with Rubrik (primary backup solution) for Windows server backup administration, policy management, and restore operations.
• Experience managing TLS/PKI certificate lifecycles, IIS and RDP certificate bindings, and certificate renewal automation.
• Background supporting or leading audit and compliance activities (ISO 27001, SOC 2, PCI-DSS, or equivalent) with evidenced remediation.
• Proficiency with ITSM and project tracking platforms - Jira (primary) and ServiceNow or equivalent - for incident management, ticket queue management, CMDB, and change control.
Technical Skillset
• Windows Server administration: AD join, GPO, services, registry, security baseline enforcement, and CIS hardening at enterprise scale.
• Virtualisation layer administration: VMware vSphere/ESXi, Nutanix, or Hyper-V, including VM lifecycle, HA clusters, DRS, and capacity planning.
• Patch management tooling and troubleshooting of failed Windows updates (WSUS, SCCM/MECM, or equivalent).
• TLS/PKI on Windows roles (IIS, RDP, LDAP): certificate lifecycle management and renewal automation.
• Performance triage using perfmon, event logs, Process Monitor, and IOPS analysis.
• PowerShell scripting and automation (critical requirement): proven ability to write, maintain, and deploy PowerShell scripts for operational automation, compliance reporting, configuration management, and toil reduction. Bash scripting is an additional advantage.
• EDR/AV agent operations: health monitoring, coverage gap remediation, and security policy alignment (CrowdStrike, Defender for Endpoint, or equivalent).
• Backup and disaster recovery: Rubrik (primary backup platform) administration, agent management, restore testing, and DR test participation; familiarity with Veeam or Commvault is an advantage.
• EOL remediation: in-place upgrade planning, application re-platforming co-ordination, and hardware decommissioning.
• CMDB/ITSM proficiency: Jira for team ticket queue management and workload tracking; runbook maintenance, diagram upkeep, and accurate CMDB attribute management (ServiceNow or equivalent).
• Mentoring and developing engineers in Windows server operations and troubleshooting methodology; team workload management and Jira ticket queue prioritisation.
• AI-assisted automation and agent development (advantageous): ability to work with AI tooling and develop autonomous agents to accelerate Windows infrastructure operations; candidates with hands-on experience building AI agents or integrating LLM-based automation into infrastructure workflows will be strongly preferred.