Staff Data Engineer (Agent Systems)

Faraday Future, Gardena, CA

About The Position

As a Staff Data Engineer (Agent Systems) in our Crypto projects, you will design, deliver, and operate the data platform that powers our agentic products—real-time ingestion (on-chain/market/social), feature stores, vectorization & retrieval for RAG, time-series/streaming computation, and ML observability. You’ll define data contracts and SLAs, ensure offline–online consistency, and partner closely with AI Agents, Backend/BFF, and Security/Compliance.

Requirements

  • Bachelor’s degree or above in CS/EE/Math/Stats, or a related field.
  • 7+ years in data engineering with 3+ years building streaming pipelines/feature stores for production systems.
  • Proficient in Python and SQL, plus one of Java/Scala/Go; strong data-modeling and performance-tuning skills.
  • Streaming & batch: Kafka, Flink/Spark (stateful ops, event-time, watermarking, exactly-once), Airflow/Dagster, dbt.
  • Storage: PostgreSQL/MySQL, ClickHouse/BigQuery/Snowflake, NoSQL (MongoDB/DynamoDB/Bigtable/Firestore), Redis, and lakehouse on Amazon S3 or Google Cloud Storage (GCS) (Parquet + Iceberg/Delta).
  • Feature platforms (e.g., Feast) and online feature serving; offline–online consistency validation.
  • Vector retrieval: embeddings pipelines and vector stores (pgvector/FAISS/Milvus); relevance & recency metrics.
  • Ops/Observability: Docker/K8s; data quality/lineage (OpenLineage/Marquez or similar); cost & throughput optimization.
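Several of the retrieval skills above reduce to nearest-neighbor search over embeddings. A minimal, dependency-free sketch of the core idea (a production system would delegate this to pgvector/FAISS/Milvus with HNSW/IVF indexes; the corpus and ids here are hypothetical):

```python
import math
from typing import Dict, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: List[float], corpus: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k corpus ids most similar to the query embedding."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], corpus, k=2))  # doc_a ranks first, then doc_b
```

A real pipeline would also track the relevance and recency metrics the role calls for, e.g. by storing an indexed-at timestamp alongside each vector.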

Nice To Haves

  • Crypto-signal ingestion (order books, trades, on-chain events); precision arithmetic and idempotent metrics.
  • Privacy/compliance (GDPR/CCPA), tokenization/pseudonymization strategies.
  • Cost/perf tuning (autoscaling, compaction/retention, caching) and SRE collaboration.
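The "precision arithmetic and idempotent metrics" item can be illustrated with a small sketch (class and field names are hypothetical): deduplicate trade events by id so at-least-once replays don't double-count, and accumulate with `Decimal` rather than `float` to avoid binary rounding drift on monetary amounts:

```python
from decimal import Decimal

class IdempotentVolume:
    """Sums trade volume; replayed events (same event_id) are no-ops."""
    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.total = Decimal("0")

    def apply(self, event_id: str, amount: str) -> None:
        if event_id in self._seen:
            return  # duplicate delivery (at-least-once replay): ignore
        self._seen.add(event_id)
        self.total += Decimal(amount)  # exact decimal arithmetic

vol = IdempotentVolume()
for eid, amt in [("t1", "0.1"), ("t2", "0.2"), ("t1", "0.1")]:  # t1 replayed
    vol.apply(eid, amt)
print(vol.total)  # 0.3 exactly, unlike float's 0.30000000000000004
```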

Responsibilities

  • Platform Architecture: Author data contracts and schemas; produce ADRs; design tiered storage (OLTP/OLAP, lakehouse) with governance and lineage.
  • Streaming & Batch Pipelines: Build low-latency streams (Kafka/Flink or equivalent) and robust batch ETL (Airflow/Dagster + dbt); support CDC, replay/backfill, and schema evolution.
  • Feature Store & Online Serving: Provide point-in-time-correct features (near-real-time/time-series); guarantee offline–online parity and latency SLOs.
  • RAG Data Plane: Orchestrate embedding pipelines, chunking/routing, vector DB (pgvector/FAISS/Milvus), HNSW/IVF indexes, and reindexing/TTL strategies.
  • Evaluation & ML Ops: Materialize canonical eval datasets/labels; wire A/B hooks; manage model/feature registries and CI/CD for ML; enable canary rollouts.
  • Data Quality & Observability: Monitor freshness, completeness, duplication, drift/decay; implement lineage and cost/performance guardrails.
  • Security & Compliance: Enforce PII handling, retention, and auditability; implement least-privilege access to datasets and secrets.
  • Collaboration: Work hand-in-hand with DS/Agents/Backend on interfaces, SLAs, and incident RCAs; document playbooks and standards.
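Point-in-time-correct features (from the feature-store responsibility above) mean each training example sees only the latest feature value written at or before its event timestamp, which prevents label leakage. A minimal stdlib sketch, assuming a feature history kept as a list of `(ts, value)` pairs sorted by timestamp:

```python
import bisect
from typing import List, Optional, Tuple

def point_in_time(history: List[Tuple[int, float]], ts: int) -> Optional[float]:
    """Return the feature value as of ts, or None if nothing exists yet.

    Never returns a value written after ts, so offline training joins
    match what online serving would have seen at that moment.
    """
    idx = bisect.bisect_right(history, (ts, float("inf"))) - 1
    return history[idx][1] if idx >= 0 else None

history = [(100, 1.0), (200, 2.0), (300, 3.0)]
print(point_in_time(history, 250))  # 2.0 — the value at ts=300 is still in the future
```

The same lookup rule, applied identically in the offline backfill and the online store, is what makes offline-online parity checks meaningful.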

Benefits

  • Healthcare + dental + vision benefits (free for you; discounted for family)
  • Casual dress code + relaxed work environment
  • Culturally diverse, progressive atmosphere