About Gridware
Gridware is a San Francisco-based technology company dedicated to protecting and enhancing the electrical grid. We pioneered a groundbreaking new class of grid management called Active Grid Response (AGR), focused on monitoring the electrical, physical, and environmental aspects of the grid that affect reliability and safety. Gridware's advanced Active Grid Response platform uses high-precision sensors to detect potential issues early, enabling proactive maintenance and fault mitigation. This comprehensive approach helps improve safety, reduce outages, and ensure the grid operates efficiently. The company is backed by climate-tech and Silicon Valley investors. For more information, please visit www.Gridware.io.
Responsibilities
• Lead the design, build, and rollout of an internal developer platform on top of AWS, EKS, Argo CD, and GitHub Actions that lets engineers create, deploy, and operate services with minimal friction.
• Own and evolve our service templates, Helm chart conventions, and Argo CD App-of-Apps patterns so that adding or migrating a service is a guided, low-risk experience.
• Build and maintain reusable GitHub Actions workflows (build / push / scan, frontend build / deploy, SonarQube scans, semantic release) and improve CI feedback loops, build times, and caching.
• Define and enforce platform standards for observability: structured logs into Loki, metrics into Prometheus / Mimir, dashboards in Grafana, and SLOs / alerts wired in by default.
• Build self-service tooling around environments, secrets, feature flags, and access, so that the right thing is easy and the wrong thing is hard to do by accident.
• Own the developer-facing aspects of identity and access (Auth0, IdP integrations, Tailscale access, IRSA / service accounts) and keep onboarding and offboarding smooth.
• Partner with DevOps on infrastructure changes, with Cloud Security on guardrails, and with backend / frontend / data / firmware teams to understand their pain points and prioritize platform investments.
• Mentor engineers across the org on platform conventions, lead design reviews for new services, and push back on patterns that don't scale.
• Treat the platform as a product: gather feedback, define roadmaps, write docs, and measure adoption and reliability.
Required Skills
• 5+ years in Platform Engineering, DevOps, or SRE roles, including significant experience building and shipping developer-facing tooling for other engineering teams.
• Track record of owning and delivering platform initiatives end-to-end, from design through adoption, with limited day-to-day supervision.
• Strong working knowledge of Kubernetes (EKS or similar) and GitOps workflows with Argo CD or Flux.
• Hands-on experience with Infrastructure as Code using Terraform; comfort with Terragrunt or a similar wrapper.
• Solid experience with CI/CD systems, ideally GitHub Actions, including reusable / composable workflows and release automation.
• Working knowledge of AWS core services (EKS, EC2, RDS, S3, IAM, VPC, ECR) and how to compose them into reliable, secure platforms.
• Experience designing developer abstractions (Helm charts, service templates, internal CLIs, scaffolding tools, or Backstage-style portals) that other engineering teams can use easily.
• Strong programming skills in Python, Bash, or TypeScript for building tooling and automation.
• Experience integrating observability (Grafana, Loki, Prometheus / Mimir, OpenTelemetry, or similar) as a default rather than an afterthought.
• Strong written communication skills, with a habit of writing docs, runbooks, and wikis that engineers can actually use.
Bonus Skills
• Experience building or operating Apollo Router / GraphQL federation gateways and supporting subgraph development workflows.
• Experience with Backstage or a comparable internal developer portal.
• Experience integrating Argo Workflows or similar Kubernetes-native job / pipeline runners into a developer platform.
• Familiarity with Databricks or ML Ops pipelines and the developer experience around data / model deployment.
• Experience with Tailscale, Auth0, EntraID, or other identity / zero-trust networking tooling.
• Familiarity with cloud architectures supporting IoT / embedded systems and distributed, low-power devices.
• Experience in high-growth startup environments where you must wear many hats.
This describes the ideal candidate; many of us have picked up this expertise along the way. Even if you meet only part of this list, we encourage you to apply!
Benefits
Health, Dental & Vision (Gold and Platinum, with some providers' plans fully covered)
Paid parental leave
Alternating day off (every other Monday)
"Off the Grid", a two-week paid break each year for all employees
Commuter allowance
Company-paid training