Infinite pl is a digital-led tech firm driven to become a digital logistics pioneer by harnessing the power of people, data, and platforms. We draw on in-house, external, network, and other investment capabilities to orchestrate and build innovative platforms that tackle complex problems within logistics and adjacent sectors.
Infinite pl’s mission is nothing short of a logistics revolution! We're here to enrich the experiences of governments, businesses, and residents around the world through cutting-edge digital solutions.
"We're not just players; we're game-changers."
Job Summary:
• We are seeking an experienced Operations Manager to lead the technical operations team responsible for running a mission-critical platform with a target availability of 99.99%. The role will ensure service stability, SLA achievement, ITSM process compliance, security and regulatory compliance, and continuous improvement across operations. The Operations Manager will own day-to-day service operations, major incident leadership, operational readiness, vendor coordination, and reporting to senior stakeholders.
Key Objectives:
• Maintain stable platform operations at the 99.99% availability target.
• Enforce and continuously improve ITSM processes (Incident, Problem, Change, Request, Release, Knowledge, CMDB).
• Ensure SLA / SLO compliance, operational readiness, and performance reporting.
• Maintain strong security posture and ensure adherence to applicable compliance requirements.
Key Responsibilities:
• 1) Service Operations Leadership
• Oversee the platform operations team (NOC/Operations Engineers/SRE-like functions as applicable) to ensure reliable, secure, and high-performing services.
• Maintain clear operating rhythms: daily ops reviews, weekly service health checks, monthly SLA reviews, and quarterly service improvement plans.
• Drive on-call readiness, shift coverage, escalation paths, and decision-making during critical events.
• 2) ITSM Process Ownership & Compliance
• Own and enforce ITSM processes end-to-end.
• Audit operational adherence and drive corrective actions for non-compliance.
• 3) SLA, Availability, and Reliability Management
• Ensure continuous tracking and achievement of SLA/SLO targets (availability, response time, resolution time, performance).
• Manage availability and resilience practices: redundancy validation, capacity planning, proactive monitoring, and performance tuning.
• Lead post-incident reviews and drive measurable improvements.
• 4) Security & Compliance
• Partner with security teams to ensure timely patching and remediation, secure configuration baselines, audit readiness and evidence collection, and incident response alignment and reporting.
• Enforce least-privilege access and conduct periodic access reviews.
• 5) Monitoring, Observability, and Operational Tooling
• Ensure comprehensive monitoring and alerting coverage for infrastructure, applications, APIs, databases, integrations, and security events.
• Ensure operational toolchain effectiveness (ITSM tool, monitoring, CI/CD visibility, CMDB, asset management).
• 6) Stakeholder & Vendor Management
• Act as the primary operations interface for internal stakeholders and external partners/vendors.
• Manage vendor SLAs and ensure effective collaboration for incident resolution, patching, upgrades, and service improvements.
• Provide clear operational communications during incidents and planned maintenance.
• 7) Reporting & Governance
• Produce weekly/monthly service reports including SLA performance, availability, incidents, trends, risks, and improvement actions.
• Maintain an operational risk register and ensure mitigation plans are executed.
• Present service health and improvement plans to leadership.
Required Qualifications:
• Bachelor’s degree in Computer Science, Information Systems, Engineering, or equivalent experience.
• 5+ years in IT operations / production support roles, with 2+ years leading teams for critical services.
• Strong hands-on understanding of operating high-availability platforms (24/7 environments).
• Proven experience implementing and running ITSM processes in production (ITIL-aligned).
Technical & Professional Skills:
• Deep understanding of incident/problem/change management, operational readiness, and service governance.
• Experience with cloud and modern platform operations (e.g., cloud infrastructure, APIs, containerized services) is preferred.
• Ability to define, track, and improve operational KPIs and reliability metrics.
• Strong stakeholder management, structured communication, and decision-making under pressure.
Preferred Certifications:
• ITIL Foundation / ITIL Managing Professional (or equivalent ITSM certification)
• ISO 27001 awareness/certification or security-related certifications
• Cloud certifications (GCP) are a plus
Working Model:
• Full-time; includes on-call leadership and participation in major incident bridges as required.
Infinite pl ♾️ - where innovation meets logistics, and the journey is Infinitely boundless!
Let's disrupt logistics together and explore infinite opportunities!