EOP - Site Reliability Engineer - TS/SCI Required

cFocus Software Incorporated, Washington, DC
Remote

About The Position

cFocus Software seeks a Site Reliability Engineer to join our program supporting the United States Secret Service (USSS). This position is remote and requires the ability to obtain a TS/SCI clearance.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent experience).
  • Minimum of 2 years of experience in systems engineering, DevOps, or Site Reliability Engineering roles.
  • Strong proficiency with Linux/Unix operating systems.
  • Experience with scripting and automation using Python, Bash, or similar languages.
  • Experience with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or equivalent.
  • Experience supporting CI/CD tools such as GitLab, Jenkins, or ArgoCD.
  • Experience with containerization and orchestration platforms including Docker and Kubernetes.
  • Understanding of SRE principles including SLIs, SLOs, and error budgets.
  • Strong troubleshooting, problem-solving, and documentation skills.
  • Ability to obtain a TS/SCI clearance.

Responsibilities

  • Monitor system health, availability, and performance using centralized monitoring and logging tools.
  • Respond to, troubleshoot, and resolve incidents in production environments and provide root cause analysis.
  • Conduct after-action reporting and post-incident reviews to improve system resilience.
  • Automate repetitive operational tasks including deployments, monitoring, and incident response.
  • Administer user accounts, access controls, and authentication mechanisms.
  • Maintain and configure workflow templates, user fields, and application configurations.
  • Maintain test environments that mirror production and support pre-deployment testing.
  • Design and maintain backup, high availability (HA), and disaster recovery (DR) solutions.
  • Develop and maintain incident response and disaster recovery plans for supported applications.
  • Configure and support integrations with complementary enterprise systems.
  • Architect, build, and maintain on-premises and cloud infrastructure supporting applications.
  • Administer production, staging, and development environments.
  • Manage system logs and monitor for security and operational events.
  • Maintain and improve CI/CD pipelines and DevSecOps processes.
  • Apply configuration management disciplines including patching, hardening, and documentation.
  • Create and maintain dashboards, SLIs, SLOs, and service health metrics.
  • Support operational readiness boards and weekly service reviews.
  • Provide on-call support for outages, upgrades, and emergency maintenance as required.
  • Support surge activities, including Presidential Transition-related data analysis if required.