WHAT IS EVEROPS?
Some of the world’s most advanced and innovative global enterprise software and tech companies struggle to find engineering partners capable of performing the highly complex deliveries and services that meet their rigorous standards. These teams need a partner that can co-own problems from within their own development environment. Enter EverOps – the premier Embedded Service Provider. We partner directly with our customers’ engineering and operations teams to help them assess and address a variety of delivery- and service-related challenges in the DevOps space.
THE CHALLENGE
EverOps is looking for a Senior DevOps Data Engineer with deep expertise in data platform architecture, disaster recovery design, and infrastructure-level data operations. This role is not about data analytics or content—it’s about building and operating the infrastructure that makes data systems reliable, resilient, and scalable. You’ll own the architectural decisions around data platform availability, cutover workflows, replication topologies, and backup/restore strategies across enterprise cloud environments.
THE MISSION
As a Senior DevOps Data Engineer at EverOps, you will join our U.S.-based Virtual Operating Center (your home office), working with a team of dynamic engineers to architect and operate data infrastructure across multiple customers’ production cloud environments. You’ll bring a data architect’s lens to DevOps: designing DR strategies, planning database migrations and cutovers, and ensuring data platform resilience at scale. Our existing team of engineers has a deep understanding of our customer environments and is eager to empower, ramp up, and mentor each new hire to set them up for success.
WHAT YOU’LL DO
• Design, implement, and validate disaster recovery architectures for relational, NoSQL, and managed data services across AWS, Azure, or GCP
• Plan and execute database migration cutovers including blue-green database swaps, read-replica promotion, and zero-downtime schema migration workflows
• Architect replication topologies (cross-region, cross-account, active-passive, active-active) and validate RPO/RTO targets through runbook-driven DR drills
• Build and maintain Infrastructure as Code for data platform provisioning (RDS, Aurora, DynamoDB, ElastiCache, Redshift, managed Kafka/MSK, etc.) using Terraform, Atlantis, and/or CloudFormation
• Design backup, snapshot, and point-in-time recovery strategies with automated validation and alerting
• Develop automation tooling for data platform operations: failover orchestration, health checks, capacity scaling, and credential rotation
• Implement observability for data infrastructure—replication lag monitoring, connection pool health, query performance baselines, and storage growth forecasting
• Support production workload migrations including data tier cutovers with rollback plans and data integrity verification
• Contribute to multi-tenant Kubernetes platform operations where data services intersect (e.g., External Secrets Operator for DB credentials, sidecar patterns for connection pooling)
• Participate in regular customer and internal EverOps scrums, providing data architecture guidance and operational status
• Document runbooks, architecture decision records (ADRs), and operational playbooks for data platform operations
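To make the runbook-driven DR drills above concrete, here is a minimal illustrative sketch (not EverOps tooling; all names and thresholds are hypothetical) of how a drill's measured data loss and downtime might be validated against RPO/RTO targets:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DrillResult:
    """Timestamps captured during a disaster-recovery drill."""
    last_replicated_write: datetime  # newest write present on the replica at failure time
    failure_time: datetime           # simulated primary failure
    service_restored: datetime       # replica promoted and traffic cut over

def validate_drill(result: DrillResult, rpo: timedelta, rto: timedelta) -> dict:
    """Compare measured data loss and downtime against RPO/RTO targets."""
    data_loss = result.failure_time - result.last_replicated_write
    downtime = result.service_restored - result.failure_time
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }
```

In practice the timestamps would be pulled from replication metadata and cutover automation logs rather than recorded by hand, and the report would feed the documented RPO/RTO validation the runbook requires.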
YOU HAVE
• 5+ years of professional experience as a DevOps Engineer, Data Platform Engineer, Database Reliability Engineer, or Site Reliability Engineer with a data infrastructure focus
• Deep hands-on experience designing and operating disaster recovery architectures for production databases (failover, replication, backup/restore, cross-region DR)
• Production experience planning and executing database cutover workflows—blue-green database swaps, read-replica promotions, DMS-based migrations, and zero-downtime schema changes
• Strong experience with AWS managed data services: RDS/Aurora (Multi-AZ, Global Database, cross-region replicas), DynamoDB (Global Tables, PITR, on-demand backup), ElastiCache, Redshift, and/or MSK
• Hands-on experience with Infrastructure as Code (Terraform + Atlantis and/or CloudFormation) for data platform provisioning and lifecycle management
• Hands-on experience and deep understanding of Linux
• Strong professional experience with at least one of: Python, Golang, Bash, or Rust for automation and tooling
• Production experience with Amazon EKS including understanding of how data workloads intersect with Kubernetes (StatefulSets, PVCs, External Secrets Operator, connection pooling)
• Experience with HashiCorp Vault for secrets management, particularly database credential rotation and dynamic secrets
• Understanding of GitOps workflows, repository structures, and governance patterns
• Experience with CI/CD tools like Jenkins, GitHub Actions, ArgoCD, etc.
• Experience with monitoring tools such as Datadog, Splunk, ELK, or Prometheus/Grafana—specifically for data infrastructure observability (replication lag, connection health, query latency)
• Relational database experience with PostgreSQL or MySQL including operational knowledge of replication, failover, and performance tuning
• NoSQL experience with at least one of: DynamoDB, Cassandra, or MongoDB including understanding of consistency models and partition strategies
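As a hedged illustration of the replication-lag observability called out above (the thresholds are hypothetical; real lag samples would come from a source such as CloudWatch's ReplicaLag metric or PostgreSQL's pg_stat_replication view), an alert-classification check might look like:

```python
from statistics import mean

def lag_alert(lag_samples_seconds: list[float],
              warn_threshold: float = 30.0,
              crit_threshold: float = 120.0) -> str:
    """Classify recent replication-lag samples into an alert level.

    Averaging over a short window avoids paging on a single transient spike.
    """
    if not lag_samples_seconds:
        return "unknown"  # no data is itself worth investigating
    avg_lag = mean(lag_samples_seconds)
    if avg_lag >= crit_threshold:
        return "critical"
    if avg_lag >= warn_threshold:
        return "warning"
    return "ok"
```

A check like this would typically run as a scheduled monitor, with the thresholds tuned to the replica's RPO commitments.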
EXTRA AWESOME
• Experience designing and executing DR drills with documented RPO/RTO validation
• AWS Database Migration Service (DMS) or Schema Conversion Tool (SCT) experience for heterogeneous migrations
• Data platform consolidation or multi-tenant data architecture experience
• Streaming data infrastructure experience (Kafka/MSK, Kinesis) including replication and DR patterns
• FinOps experience including Reserved Instance planning, storage tiering optimization, and cost allocation tagging for data services
• Experience with database proxy or connection pooling solutions (RDS Proxy, PgBouncer, ProxySQL)
• Argo Rollouts or similar progressive delivery tooling adapted for data-tier changes
• Gaming infrastructure or high-MAU consumer platform experience with significant data tier complexity
• Change Data Capture (CDC) experience for migration or replication workflows
• Industry certifications (AWS Database Specialty, AWS Solutions Architect, HashiCorp, etc.)
• Strong desire to learn new technologies
BENEFITS
• 100% remote workplace – We’ve been remote since Day 1!
• Unlimited Paid Time Off
• Equity – If you display ownership of the work you’re doing, you’ll become a true owner of the company
• 401K with company contribution
• Company sponsored healthcare
• Competitive compensation
• Opportunities to accelerate professional growth with access to training and certification programs