Help us use technology to make a big green dent in the universe! Kraken powers some of the most innovative global developments in energy. We’re a technology company focused on creating a smart, sustainable energy system. From optimising renewable generation and creating a more intelligent grid to enabling utilities to provide excellent customer experiences, our operating system for energy is transforming the industry around the world in a way that benefits everyone. It’s a really exciting time in energy. Help us make a real impact on shaping a better, more sustainable future.

Our Global Platform Engineering Reliability group is responsible for architecting, developing, and maintaining the resilient and scalable infrastructure that powers and supports our platform. We are looking for a Senior Platform Engineer to join our Product Reliability team. You will operate as a senior individual contributor, owning complex, cross-cutting reliability problems that span teams and services. You’ll be expected not just to solve issues, but to identify where the system itself needs to improve and drive those changes across the organisation.

Your core focus will be improving the availability, performance, and resilience of customer-facing systems across Kraken. You will work across multiple product services and domains, partnering with engineering and product teams to identify systemic reliability risks, drive platform-wide improvements, and help teams build and operate reliable services with confidence.

This is a high-autonomy, hands-on role. You will lead reliability initiatives end-to-end - from identifying risks and debugging production issues to shaping long-term fixes, improving incident practices, and influencing how systems are designed and operated. You will be expected to navigate ambiguity, build alignment across teams, and raise the bar for operational excellence through both technical depth and strong collaboration.
You will have the opportunity to make a direct, visible impact across engineering by reducing repeat incidents, improving service performance, strengthening observability, and creating scalable patterns that teams across the organisation can adopt. You will work closely with teams across the reliability group and broader platform function, including Foundational Observability, Quality Engineering, Solutions, Systems, and Product Engineering teams.

Our current stack includes AWS, Terraform, Python, Datadog, Grafana, Kubernetes, PostgreSQL, and Rootly.

What You’ll Do:
• Lead reliability improvements across multiple product services and domains
• Identify systemic reliability risks and drive cross-team initiatives to address them
• Partner closely with product and engineering teams to influence system design, operational practices, and prioritisation
• Improve observability, incident management, and service performance across critical systems
• Lead incident investigations and follow-up, ensuring root causes are addressed and long-term fixes are driven through to completion
• Help standardise incident management practices, tooling usage, and operational guardrails across teams
• Contribute hands-on through debugging, code changes, automation, and system design
• Identify common reliability patterns and implement scalable solutions that can be reused across teams
• Establish and promote best practices for building and operating reliable systems
• Support broader platform engineering work where needed across infrastructure, release, developer enablement, and resilience initiatives
• Help solve complex and ambiguous problems in a fast-moving environment

What You'll Have:
• Strong experience operating and improving production systems at scale
• Proven track record of leading reliability or platform initiatives across teams, with measurable impact
• Deep understanding of distributed systems and common failure modes
• Strong debugging and problem-solving skills in complex production environments
• Hands-on experience with cloud infrastructure, with AWS preferred; strong GCP or Azure experience is also valued
• Experience working with infrastructure tooling such as Terraform
• Ability to read, write, review, and improve production-grade code; Python experience is highly valued
• Experience with incident management tooling such as Rootly, PagerDuty, Incident.io, or Datadog
• Experience leading incident investigations, post-incident follow-up, and long-term remediation
• Strong communication skills, including explaining technical concepts and trade-offs clearly to different audiences
• Experience working cross-functionally with product and engineering teams in distributed environments
• Strong interpersonal skills, empathy, and the ability to influence teams constructively
• Comfort operating with high autonomy in small, accountable teams
• Comfortable working in a Kanban environment

What Will Help:
• Kubernetes and container orchestration
• Observability tooling such as Datadog, Grafana, and Prometheus
• CI/CD and release engineering practices
• Event-driven systems and messaging platforms
• Experience in SRE, platform engineering, or other reliability-focused roles
• Familiarity with PostgreSQL or Amazon RDS at scale
• Experience defining reusable standards, guardrails, or golden paths across teams

Kraken is a certified Great Place to Work in France, Germany, Spain, Japan and Australia. In the UK we are one of the Best Workplaces on Glassdoor with a score of 4.5 and in Germany we rate 4.7 on Kununu as a Top Company. Check out our Welcome to the Jungle site (FR/EN) to learn more about our teams and culture.

Are you ready for a career with us? We want to ensure you have all the tools and environment you need to unleash your potential.
If you have any specific accommodations or a unique preference, please contact us and we'll do what we can to customise your interview process for comfort and maximum magic!

Studies have shown that some groups of people, like women, are less likely to apply to a role unless they meet 100% of the job requirements. Whoever you are, if you like one of our jobs, we encourage you to apply as you might just be the candidate we hire. Across Kraken, we're looking for genuinely decent people who are honest and empathetic. Our people are our strongest asset and the unique skills and perspectives people bring to the team are the driving force of our success.

As an equal opportunity employer, we do not discriminate on the basis of any protected attribute. We consider all applicants without regard to race, colour, religion, national origin, age, sex, gender identity or expression, sexual orientation, marital or veteran status, disability, or any other legally protected status. U.S.-based candidates can learn more about their EEO rights here.

Our (i) Applicant and Candidate Privacy Notice and Artificial Intelligence (AI) Notice, (ii) Website Privacy Notice and (iii) Cookie Notice govern the collection and use of your personal data in connection with your application and use of our website. These policies explain how we handle your data and outline your rights under applicable laws, including, but not limited to, the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Depending on your location, you may have the right to access, correct, or delete your information, object to processing, or withdraw consent. By applying, you acknowledge that you’ve read, understood and consent to these terms.

Senior Platform Engineer - Product Reliability at kraken123
