Senior Infrastructure & Security Engineer

Koda Health
$160,000 - $170,000 · Remote

About The Position

Koda Health is looking for a Senior Infrastructure & Security Engineer to own the reliability, security, and operational health of our production systems. You'll be responsible for keeping our platform running, secure, and observable, owning everything from AWS infrastructure and deployment pipelines to incident response, security compliance, and production monitoring. You'll work directly with the CTO and a small engineering team.

This is a hands-on, high-ownership role. We run a multi-region healthcare platform on AWS with real uptime requirements, HIPAA obligations, and SOC 2 compliance. You'll inherit a mature CDK codebase and be expected to extend it, harden it, and build the monitoring and incident management layer.

We also want someone who can contribute to the codebase and automate operational work. You won't be a full-time software engineer, but you should be comfortable using AI coding tools like Claude Code to make small TypeScript PRs, triage Sentry errors, fix production bugs, and set up automated monitoring, triage, and recurring infrastructure health checks.

Expect roughly:

  • 60–70% infrastructure, architecture, reliability, and monitoring
  • 10–20% security, compliance, and vendor questionnaires
  • 5–10% TypeScript contributions (bug fixes, small features, Sentry triage)

Requirements

  • 6+ years building and operating production systems on AWS
  • Strong experience with AWS CDK (we use CDK in TypeScript)
  • Deep knowledge of core AWS services: Lambda, ECS, S3, CloudWatch, SNS, SQS, IAM, VPC, WAF
  • Experience setting up and managing monitoring, alerting, and incident management
  • Experience with security hardening and compliance in regulated environments (HIPAA, SOC 2, or similar)
  • Working knowledge of TypeScript or Node.js - enough to read the codebase, make PRs, and debug production issues
  • Experience with CI/CD pipelines (CodePipeline, GitHub Actions, or similar)
  • Comfortable owning production systems end-to-end in a small team where you're the expert
  • Strong English fluency in written and verbal communication (e.g., security questionnaire responses)
  • US-based, able to work CST/EST hours (contractual requirement)

Nice To Haves

  • Healthcare industry experience (FHIR, HL7v2, Epic/Cerner integrations)
  • Experience with multi-region AWS architectures and disaster recovery
  • Experience with MongoDB operations and performance
  • Experience with cost optimization in AWS
  • Familiarity with AI-assisted development tools (e.g., Claude Code)

Responsibilities

  • Own the operational health of production across two AWS regions
  • Investigate production issues, lead root-cause analysis, and drive resolution
  • Build and maintain dashboards that give real-time visibility into application health, queue depths, API latency, and error rates
  • Monitor SQS/SNS queue health, dead-letter queues, and event processing pipelines
  • Expand observability beyond CloudWatch - evaluate and implement distributed tracing, APM, and log aggregation
  • Oversee weekly deployments to production
  • Own cost monitoring and alerting (AWS Budgets alerts, Cost Explorer)
  • Improve automated uptime and SLA reporting
  • Own and evolve all AWS infrastructure defined in CDK
  • Lead the migration to capturing 100% of cloud infrastructure in CDK
  • Manage and improve services: Lambda, ECS Fargate, Elastic Beanstalk, S3, CloudFront, SNS, SQS, EventBridge, WAF, Cognito
  • Support multi-region uptime, disaster recovery planning, and backup/restore practices
  • Improve cross-region replication and automated failover
  • Own deployment pipelines, release processes, and database migration safety
  • Support and evolve data pipelines used for analytics and product features
  • Set standards for how we ship, deploy, and operate software at scale
  • Maintain and harden AWS infrastructure with a strong security mindset
  • Own vulnerability remediation and SLA timelines
  • Help respond to security questionnaires and vendor assessments
  • Own and improve WAF rules, security groups, IAM policies, and network configuration
  • Own Security Hub, AWS Config, VPC Flow Logs, and CloudTrail
  • Support GuardDuty malware scanning and S3 upload security
  • Ensure SOC 2 and HIPAA compliance across infrastructure
  • Manage secrets, key rotation, and access controls
  • Conduct periodic security reviews of infrastructure and application configuration
  • Triage and fix production errors surfaced by Sentry
  • Make small TypeScript PRs to backend services
  • Debug complex production issues that span infrastructure and application code
  • Participate in architecture discussions, especially around infrastructure and deployment concerns

Benefits

  • Base salary of $160,000 - $170,000 per year
  • Fully remote role (US-based)
  • Flexible, Unlimited Paid Time Off
  • Great medical, dental, and vision coverage
  • 401k options
  • Yearly personal development budget that can be used for books, courses, trainings, and more
  • Office setup budget
  • Annual company and team events
  • Latest MacBook + enterprise tooling (e.g., Claude Code)
  • Opportunity to gain exposure to applied RL and SFT work on foundational AI models
  • Clear growth paths for ICs (Staff/Principal) and managers (EM/Director)