Lead Platform Engineer
Relish
Software Engineering
Remote
USD 130k-170k / year
Job details
Pay
- $130,000 - $170,000 a year
Job type
- Full-time
Work setting
- Remote
Benefits
- Referral program
- 401(k)
- Health insurance
- 401(k) matching
- Vision insurance
- Dental insurance
Full job description
About Us
We are a high-impact startup in the procurement space, growing 50% year over year. We process large volumes of sensitive invoice and supplier data using advanced OCR, structured data extraction, matching, validation, and anomaly detection. Because we handle critical financial data for our clients, security, compliance, and trust are foundational to everything we build. As we extend our product into Agentic AI workflows, we are overhauling our infrastructure to support the next phase of growth securely and efficiently.
The Opportunity
We are looking for a Lead Platform Engineer to spearhead the modernization of our cloud infrastructure. Today, we operate heavily in an AWS Serverless environment (Lambdas, Step Functions, DynamoDB). Under your leadership, we will migrate to a robust, highly scalable, and compliant platform built on Kubernetes, Temporal.io, and a mix of modern databases (PostgreSQL, MySQL, NoSQL).
In this role, you will own the transition, lay down the foundational infrastructure using Terraform CDK, and build modern CI/CD pipelines from scratch. A major mandate is to design the new platform to be as cloud-agnostic as possible, preventing vendor lock-in while maintaining high performance. You will ensure the platform is secure by design, meeting requirements such as SOC 2 and GDPR, while designing the infrastructure required to run Agentic AI workloads.
Key Responsibilities
· Architect the Future: Lead the migration from our current AWS Serverless stack to a modern, cloud-agnostic Kubernetes architecture capable of supporting diverse database workloads (PostgreSQL, MySQL, NoSQL).
· Security & Compliance by Design: Build the new infrastructure to strictly adhere to compliance frameworks (e.g., SOC 2, ISO 27001, GDPR). Implement DevSecOps principles, zero-trust networking, secure secrets management, and robust IAM policies.
· Workflow Orchestration: Implement and manage Temporal.io to replace AWS Step Functions, ensuring highly reliable, scalable, and resilient distributed workflows.
· Advanced Observability: Design and implement comprehensive observability using OpenTelemetry and leading log management/APM platforms (e.g., Datadog, Splunk, Elastic, Grafana Loki). You will ensure we have deep, distributed tracing and visibility across microservices, Temporal workflows, and AI agents.
· Infrastructure as Code (IaC): Fully automate our cloud infrastructure provisioning and management using Terraform CDK, ensuring everything is version-controlled, auditable, repeatable, and easily portable across cloud providers.
· CI/CD Pipeline Design: Build and maintain fast, automated deployment pipelines using GitHub Actions or Azure DevOps, with automated security, testing, and compliance guardrails built in.
· AI Infrastructure: Collaborate closely with our Data and AI engineering teams to provision and optimize secure infrastructure for Agentic AI workflows (e.g., managing GPU compute, vector databases, and AI model deployments).
· Cloud FinOps: Actively monitor, analyze, and optimize infrastructure costs without sacrificing performance, security, or reliability.
What We're Looking For
· Experience: 7+ years of experience in DevOps, Cloud Infrastructure, or Platform Engineering, with a proven track record of leading large-scale architectural migrations.
· Cloud & Agnosticism: Deep expertise in AWS (EKS, networking, IAM), paired with a strong architectural mindset for building portable, cloud-agnostic systems that abstract away underlying provider dependencies.
· Security & Compliance: Proven experience designing and managing cloud environments subject to rigorous compliance audits (SOC 2, GDPR, etc.). You know how to secure a Kubernetes cluster and manage sensitive data at scale.
· Kubernetes Expert: Extensive hands-on experience designing, deploying, securing, and managing production Kubernetes clusters.
· Modern IaC: Strong proficiency in Infrastructure as Code. Experience with Terraform CDK (using TypeScript or Python) is highly preferred.
· Observability Champions: Deep understanding of distributed tracing, metrics, and logging. Hands-on experience with OpenTelemetry and major observability platforms.
· Programming Skills: You are a strong coder, not just a scripter. Proficiency in TypeScript, Python, or Go is required to use Terraform CDK (CDKTF) effectively and to support our engineering teams.
· Database Knowledge: Experience with database migrations and optimizing a mix of relational (PostgreSQL, MySQL) and modern NoSQL databases for high-throughput, data-intensive workloads.
Bonus Points
· Hands-on experience with Temporal.io or similar workflow orchestration engines (e.g., Airflow, Cadence).
· Experience supporting AI/ML infrastructure, MLOps, or running LLM agents in production.
· Background in B2B SaaS, fintech, or procurement data processing.
Experience:
- Infrastructure as code: 3 years (Required)
- Platform Engineering: 1 year (Required)
- Terraform CDK: 3 years (Required)
- AWS: 4 years (Required)
Work Location: Remote