NLP Data Engineering Intern

iManage
Chicago, IL
Onsite

About The Position

iManage U provides students the chance to experience a dynamic, rapid-growth technology company firsthand. iManage will provide a structured program that delivers project-based activities, improved knowledge of business fundamentals, complex problem solving, collaboration, team building, and some fun experiences along the way! This year, our paid internship program will kick off on Monday, June 8th and will run through Thursday, August 13th. This internship will be based out of our downtown Chicago office, with activities requiring in-person presence.

Goals of the Program:

  • iM Making An Impact: Leave your mark on your team by owning and completing assigned projects
  • iM A Mentee: Learn from teammates across departments & gain perspectives from a diversity of people
  • iM A Connector: Meet & connect with as many interns and iManage employees as possible
  • iM Inspired: Learn from our leadership team and ask questions during our lunch and learns
  • iM Social: Enjoy intern events, and everything iManage has to offer this summer

Being an NLP Data Engineering intern at iManage means…

You are excited about transforming unstructured text into meaningful insights that power AI and machine learning solutions. You thrive at the intersection of data engineering and natural language processing and are eager to contribute to the pipelines and datasets that fuel generative AI applications, agentic systems, and other NLP-driven capabilities across iManage.

As an NLP Data Engineering Intern on the AI and knowledge engineering team, you will get hands-on experience designing, building, and optimizing text data pipelines that power AI/ML and Generative AI solutions for our customers. You’ll collaborate with knowledge engineering, applied AI, and product teams to help prepare, enrich, and integrate document data. Your contributions will be essential to enabling intelligent, AI-powered features across the iManage platform.

Requirements

  • Current enrollment in a Master’s or PhD program in Computer Science, Data Engineering, Data Science, Applied Mathematics, Computational Linguistics, or a related quantitative field
  • Proficiency in Python and experience using it to extract, structure, classify, and analyze text data
  • Foundational understanding of NLP concepts such as tokenization, embeddings, and semantic search
  • Familiarity with standard NLP libraries such as spaCy, Hugging Face Datasets, or NLTK
  • Solid knowledge of data structures, algorithms, and statistics
  • Proficiency with Git and collaborative development workflows
  • A passion to learn and improve, and an eagerness to share knowledge with colleagues
  • Problem-solving, creativity, curiosity, and a collaborative mindset

Nice To Haves

  • Exposure to Microsoft Azure services such as Fabric, ADLS, AI Foundry, or Azure ML
  • Experience with data pipeline orchestration or workflow automation tools like Databricks
  • Familiarity with knowledge graphs or semantic data modeling

Responsibilities

  • Performing exploratory analyses on large text corpora and developing preprocessing pipelines for training and evaluation data
  • Supporting the design of automated workflows for text normalization, deduplication, language identification, PII redaction, and metadata enrichment
  • Assisting with building automated data validation processes to ensure accuracy and consistency of NLP datasets
  • Contributing to dataset curation, prompt dataset preparation, labeling coordination, and text quality validation to support model fine-tuning, semantic search, and Gen AI evaluations
  • Partnering with the Applied AI team to understand data requirements and help build data interfaces for machine learning systems
  • Learning and applying data lineage best practices and data privacy, security, and governance principles
  • Maintaining the highest quality standards through processes that identify and correct mistakes and inconsistencies