Senior Machine Learning Engineer

Docusign
San Francisco, CA (Hybrid)

About The Position

Docusign brings agreements to life. Over 1.5 million customers and more than a billion people in over 180 countries use Docusign solutions to accelerate the process of doing business and simplify people’s lives. With intelligent agreement management, Docusign unleashes business-critical data that is trapped inside documents. Until now, that data was disconnected from business systems of record, costing businesses time, money, and opportunity. Using Docusign’s Intelligent Agreement Management platform, companies can create, commit, and manage agreements with solutions created by the #1 company in e-signature and contract lifecycle management (CLM).

We are looking for a Senior Machine Learning Engineer to redefine how we operate our global services. You won't just be building dashboards; you will be building the "brain" of our infrastructure. We are moving beyond simple anomaly detection: we are building a self-healing ecosystem where Multi-Agent Systems and Reinforcement Learning (RL) loops work in tandem with Large Language Models (LLMs) not only to detect incidents in real time but also to troubleshoot and resolve them autonomously.

If you are passionate about applying complex AI architectures to massive datasets (billions of telemetry points) to solve real-world reliability challenges, this is the role for you. This is an individual contributor role reporting to the Sr. Director, Software Engineering.

Requirements

  • 8+ years of professional experience in Machine Learning Engineering or Data Science
  • Experience with PyTorch or TensorFlow, specifically regarding Time Series analysis (forecasting/anomaly detection) and NLP
  • Experience building applications using LLMs (RAG pipelines, LangChain, vector databases) specifically for technical domains (code analysis, log parsing)
  • Experience with RL concepts (policies, rewards, agents) and experience applying them to optimization or control problems
  • Experience with distributed data processing and streaming technologies (Apache Spark, Kafka, Flink)
  • Experience with software engineering fundamentals (Python, C++, or Go), CI/CD for ML, and deploying models via APIs (FastAPI, Triton Inference Server)

Nice To Haves

  • Familiarity with the "three pillars" (Logs, Metrics, Traces) and tools like Prometheus, Grafana, OpenTelemetry, or Jaeger
  • Experience with frameworks like AutoGen, CrewAI, or Ray RLlib
  • Deep experience with AWS/GCP/Azure and Kubernetes (K8s) orchestration
  • A background in control theory or causal inference

Responsibilities

  • Design and implement autonomous multi-agent systems using Reinforcement Learning (RL) loops that can interact with our infrastructure to perform safe, automated remediation actions
  • Build GenAI agents capable of digesting logs, traces, and metrics to provide "Human-in-the-loop" root cause analysis and conversational debugging for our SREs
  • Develop and deploy deep learning models (Transformers, LSTMs, etc.) for forecasting and anomaly detection on high-cardinality, high-volume time series data
  • Optimize inference pipelines to run with low latency on streaming telemetry data (Kafka/Flink), ensuring we catch issues the moment they happen
  • Own the lifecycle of your models—from feature engineering on petabyte-scale datasets to training, deployment, and monitoring in production Kubernetes environments
  • Collaborate with Applied Scientists to translate bleeding-edge research (e.g., causal inference, decision transformers) into production-hardened AIOps tools

Benefits

  • Bonus: Sales personnel are eligible for variable incentive pay dependent on their achievement of pre-established sales goals. Non-Sales roles are eligible for a company bonus plan, which is calculated as a percentage of eligible wages and dependent on company performance.
  • Stock: This role is eligible to receive Restricted Stock Units (RSUs).
  • Global benefits provide options for the following:
    • Paid Time Off: earned time off, as well as paid company holidays based on region
    • Paid Parental Leave: take up to six months off with your child after birth, adoption, or foster care placement
    • Full Health Benefits Plans: options for 100% employer-paid and minimum-employee-contribution health plans from day one of employment
    • Retirement Plans: select retirement and pension programs with potential for employer contributions
    • Learning and Development: options for coaching, online courses, and education reimbursements
    • Compassionate Care Leave: paid time off following the loss of a loved one and other life-changing events