Platform Developer - Network Architecture & Site Reliability

Judi Health • New York, NY

About The Position

Join our team as a Platform Engineer focused on network architecture and site reliability. In this role, you will own the design and implementation of our cloud network architecture, ensuring our platform is resilient, secure, and scalable across multiple AWS accounts, environments, and regions. You'll architect solutions for complex networking challenges such as hierarchical CIDR allocation across accounts and integrations, implement disaster recovery and regional redundancy strategies, and establish the reliability practices that keep our healthcare platform running. Working closely with leadership and cross-functional teams, you'll build the foundational infrastructure that supports our rapidly growing platform, so that it handles failures gracefully and recovers quickly.

Requirements

5+ years of infrastructure engineering, DevOps, or site reliability engineering experience.
Experience leading projects or teams: Demonstrated ability to lead technical initiatives, drive architectural decisions, or mentor other engineers.
Extensive AWS expertise: Deep production experience with VPC, subnets, routing, security groups, Transit Gateway, Direct Connect, Route 53, and other AWS networking services.
Network architecture experience: Proven track record designing and implementing complex network architectures including CIDR planning, subnet design, and multi-account strategies.
Disaster recovery expertise: Experience designing and testing disaster recovery procedures, backup strategies, and multi-region failover capabilities.
Infrastructure as code: Strong experience with Terraform for managing cloud infrastructure at scale.
Site reliability practices: Understanding of SLIs, SLOs, error budgets, incident management, and reliability engineering principles.
Strong development skills: Proficiency in Python or similar languages for automation and tooling.
Security mindset: Understanding of network security principles, encryption, IAM, and compliance requirements in regulated industries.

Nice To Haves

Multi-cloud experience: Hands-on experience with Azure, GCP, or Oracle Cloud networking and cross-cloud connectivity.
Rust development experience or interest in learning Rust for infrastructure tooling.
Advanced AWS certifications: AWS Certified Solutions Architect - Professional, AWS Certified Advanced Networking - Specialty, or similar.
Container networking: Experience with ECS, EKS, or Kubernetes networking, service mesh, and load balancing.
Database networking: Understanding of database replication, Aurora Global Databases, and cross-region data synchronization.
CDN and edge networking: Experience with CloudFront, edge locations, and content delivery optimization.
Cost optimization: Track record of reducing infrastructure costs while maintaining or improving reliability.
Previous Pharmacy Benefits Manager (PBM) or healthcare technology experience.

Responsibilities

Design network architecture: Architect and implement hierarchical CIDR allocation strategies across multiple AWS accounts, environments, and external integrations, ensuring proper subnet organization and IP address management.
Build multi-region capabilities: Design and implement regional redundancy and multi-region failover strategies to support disaster recovery requirements and improve platform availability.
Manage cloud networking: Own VPC design, subnets, route tables, security groups, network ACLs, VPC peering, Transit Gateway, and other AWS networking components across our infrastructure.
Implement disaster recovery: Establish and regularly test disaster recovery procedures, backup strategies, and failover mechanisms to ensure business continuity.
Evaluate multi-cloud solutions: Assess and prototype solutions across AWS, Azure, GCP, and Oracle Cloud to determine optimal approaches for specific use cases and integrations.
Monitor platform health: Define and implement platform health indicators, SLIs, SLOs, and monitoring that provide early warning of infrastructure issues and track reliability improvements.
Ensure security and compliance: Work with security teams to implement network security best practices, maintain compliance with HIPAA, SOC 2, and FedRAMP requirements, and conduct security reviews.
Drive infrastructure reliability: Participate in on-call rotations, lead incident response for infrastructure issues, conduct postmortems, and implement improvements to prevent recurrence.
Automate infrastructure management: Build infrastructure-as-code using Terraform to manage network resources, implement self-service capabilities, and ensure consistency across environments.
