LIFE AT UIPATH

The people at UiPath believe in the transformative power of automation to change how the world works. We’re committed to creating category-leading enterprise software that unleashes that power.

To make that happen, we need people who are curious, self-propelled, generous, and genuine. People who love being part of a fast-moving, fast-thinking growth company. And people who care about each other, about UiPath, and about our larger purpose.

Could that be you?

YOUR MISSION

UiPath is seeking a Principal Site Reliability Engineer to redefine how reliability is engineered using AI. This role focuses on building intelligent reliability platforms and tooling that leverage AI/ML to improve the reliability of our services, reduce operational toil for developers, and accelerate incident response across large-scale, cloud-native systems.

You will operate at the intersection of SRE, distributed systems, and applied AI, designing systems that transform raw telemetry into actionable insights, enable predictive reliability, and introduce self-healing capabilities into production environments. You will build the next generation of reliability systems, where detection, diagnosis, and remediation are increasingly automated and data-driven.

You will help define how reliability is architected, scaled, measured, and automated across our large-scale, cloud-native systems. This role requires broad technical judgment, platform thinking, and the ability to influence reliability outcomes across engineering and platform teams.

WHAT YOU'LL DO AT UIPATH

Intelligent Automation and Self-Healing Systems - Design and implement self-healing mechanisms, including automated remediation workflows and intelligent retry and fallback strategies.
Reliability Platform Tooling - Build internal systems that enable engineering teams to debug faster using AI-assisted tooling and to proactively identify and mitigate reliability risks.
End-to-End Reliability Strategy - Define and evolve reliability strategy using predictive reliability models (capacity, failure forecasting, reliability scoring) and embed intelligent reliability practices across the engineering teams.
AI-Assisted Incident Response & RCA - Build AI-powered systems that determine impact and use historical data to improve detection and response over time.
Technical Leadership & Org Impact - Influence standards for building AI-driven tooling, mentor junior and senior engineers, and elevate the focus on reliability across the organization.

WHAT YOU'LL BRING TO THE TEAM

Engineering & Reliability Experience
• 7+ years of experience in SRE, platform, or cloud infrastructure engineering roles, with a track record of building internal tooling to improve reliability.
• Strong conceptual understanding of distributed systems, performance bottlenecks, failure modes, and the trade-offs inherent to large-scale systems.

AI/ML Application to Systems & Operations
• Experience building applications or internal tools using LLMs to automate non-trivial workflows (e.g., AIOps, automated code reviews, automated flagging of reliability risks).
• Hands-on experience building agents/copilots using modern ML frameworks (PyTorch, vLLM, or equivalent) in a production setting.

Scripting & Tooling
• Proficiency in at least one programming language (e.g., Python, Go, or similar). Experience with Infrastructure as Code (e.g., Terraform, Pulumi) and container orchestration (e.g., Kubernetes).

Cloud & Infrastructure Expertise
• Hands-on experience working with one or more major cloud providers (Azure, AWS, GCP), with practical knowledge of networking, deployments, and scaling.

Observability & Operational Practices
• Proven experience with monitoring/observability stacks (metrics, logs, traces) and building meaningful dashboards and alerts that improve reliability signals.

Incident Response & Post-Incident Learning
• Experience participating in and improving incident response, running blameless postmortems, and implementing systemic fixes rather than symptomatic patches.

Collaboration & Influence
• Ability to partner with product, infrastructure, and engineering teams to influence architecture and reliability practices without direct authority.

Maybe you don’t tick all the boxes above but still think you’d be great for the job? Go ahead, apply anyway. Please. Because we know that experience comes in all shapes and sizes, and passion can’t be learned.

Many of our roles allow for flexibility in when and where work gets done. Depending on the needs of the business and the role, the number of hybrid, office-based, and remote workers will vary from team to team.

Applications are assessed on a rolling basis and there is no fixed deadline for this requisition. The application window may change depending on the volume of applications received or may close immediately if a qualified candidate is selected.

We value a range of diverse backgrounds, experiences, and ideas. We pride ourselves on our diverse and inclusive workplace, which provides equal opportunities to all persons regardless of age, race, color, religion, sex, sexual orientation, gender identity and expression, national origin, disability, neurodiversity, military and/or veteran status, or any other protected class. Additionally, UiPath provides reasonable accommodations for candidates on request and respects applicants' privacy rights. To review these and other legal disclosures, visit our privacy policy.

Principal Site Reliability Engineer at UiPath
