Principal Associate, Data Scientist - US Card (Applied GenAI)

Capital One • McLean, VA

About The Position

Data is at the center of everything we do. As a startup, we disrupted the credit card industry by individually personalizing every credit card offer using statistical modeling and the relational database, cutting-edge technology in 1988! Fast-forward a few years, and this little innovation and our passion for data have propelled us to become a Fortune 200 company and a leader in the world of data-driven decision-making. As a Data Scientist at Capital One, you’ll be part of a team that’s leading the next wave of disruption at a whole new scale, using the latest computing and machine learning technologies and operating across billions of customer records to unlock the big opportunities that help everyday people save money, time, and agony in their financial lives.

Team Description: The Servicing Intelligence team delivers data science solutions that capture value from unstructured, multimodal data sources: text, image, and audio. We operate as an applied data science team, building with open-source generative AI models and tooling but prioritizing application over research to scale the adoption of AI through in-market solutions. You will sit on a team of data scientists that collaborates daily with product, tech, and business teams to embed AI in varied domains, including frontline agent servicing, back-office document processing, AI for regulatory compliance, and overall customer experience. Your work will apply generative AI to millions of inputs, from extracting key information from unstructured documents to analyzing call transcripts to resolve the root cause of customer friction.

Role Description: In this role, you will:
Apply expertise in unstructured data (text, image) to harness the power of open-source large language models (LLMs) and visual language models (VLMs)
Leverage a broad stack of technologies, including LangGraph, LlamaIndex, Weights and Biases Weave, Hugging Face, PyTorch, and AWS, to automate workflows using huge volumes of text and vision data
Build machine learning and NLP models through all phases of development, from design through training, evaluation, and validation, and partner with engineering teams to operationalize them in scalable, resilient production systems that serve 80+ million customers
Assess GenAI or LLM-powered application architectures in production, including best practices for generative AI development and deployment
Define requirements for AI observability, focusing on the traceability of autonomous decisions and comprehensive system audit trails
Evaluate the dynamic behavior of AI systems and oversee the development of key continuous monitoring controls and testing, ensuring that non-deterministic outputs and autonomous actions remain within risk appetite
Get into the weeds of internal business processes and data operations by guiding annotators to curate high-quality, consistent datasets for model training, evaluation, and ongoing AI monitoring
Collaborate on a team of data scientists through all phases of project development, from design through training, evaluation, validation, implementation, and maintenance
Interact with a variety of internal stakeholders to ensure data science solutions align with business outcomes

The Ideal Candidate is:
Customer first. You love the process of analyzing and creating, but also share our passion to do the right thing. You know that at the end of the day it’s about making the right decision for our customers.
Innovative. You continually research and evaluate emerging technologies. You stay current on published state-of-the-art methods, technologies, and applications and seek out opportunities to apply them.
Creative. You thrive on bringing definition to big, undefined problems. You love asking questions and pushing hard to find answers. You’re not afraid to share a new idea.
A leader. You challenge conventional thinking and work with stakeholders to identify opportunities to improve the status quo. You’re passionate about talent development for your own team and beyond.
Technical. You’re comfortable with open-source languages and are passionate about developing your skills further. You have hands-on experience developing data science solutions using open-source tools and cloud computing platforms.
Statistically minded. You’ve built models, validated them, and backtested them. You know how to interpret a confusion matrix or an ROC curve. You have experience with clustering, classification, sentiment analysis, time series, and deep learning.
A data guru. “Big data” doesn’t faze you. You have the skills to retrieve, combine, and analyze data from a variety of sources and structures. You know that understanding the data is often the key to great data science.

Requirements

Currently has, or is in the process of obtaining, one of the following, with the expectation that the required degree will be obtained on or before the scheduled start date:
A Bachelor's Degree in a quantitative field (Statistics, Economics, Operations Research, Analytics, Mathematics, Computer Science, or a related quantitative field) plus 5 years of experience performing data analytics
A Master's Degree in a quantitative field (Statistics, Economics, Operations Research, Analytics, Mathematics, Computer Science, or a related quantitative field) or an MBA with a quantitative concentration plus 3 years of experience performing data analytics
A PhD in a quantitative field (Statistics, Economics, Operations Research, Analytics, Mathematics, Computer Science, or a related quantitative field)

Nice To Haves

Master’s Degree in “STEM” field (Science, Technology, Engineering, or Mathematics) plus 3 years of experience in data analytics, or PhD in “STEM” field (Science, Technology, Engineering, or Mathematics)
At least 1 year of experience working with AWS
At least 3 years’ experience in Python, Scala, or R
At least 3 years’ experience with machine learning
At least 3 years’ experience with SQL
At least 2 years’ experience with relational databases
At least 2 years’ experience with AI/ML tools and ecosystems, such as LangGraph, LlamaIndex, Weights and Biases Weave, PyTorch, or Hugging Face

Benefits

Capital One offers a comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being.
Learn more at the Capital One Careers website.
This role is also eligible to earn performance-based incentive compensation, which may include cash bonus(es) and/or long-term incentives (LTI).