NLP Data Engineering Intern

iManage•Chicago, IL

Onsite

About The Position

iManage U gives students the chance to experience a dynamic, rapidly growing technology company firsthand. iManage provides a structured program built around project-based activities, business fundamentals, complex problem solving, collaboration, and team building, with some fun experiences along the way! This year, our paid internship program will kick off on Monday, June 8th and run through Thursday, August 13th. The internship is based in our downtown Chicago office, with activities that require in-person presence.

Goals of the Program:

iM Making An Impact: Leave your mark on your team by owning and completing assigned projects.
iM A Mentee: Learn from teammates across departments and gain perspectives from a diverse group of people.
iM A Connector: Meet and connect with as many interns and iManage employees as possible.
iM Inspired: Learn from our leadership team and ask questions during our lunch-and-learns.
iM Social: Enjoy intern events and everything iManage has to offer this summer.

Being an NLP Data Engineering intern at iManage means…

You are excited about transforming unstructured text into meaningful insights that power AI and machine learning solutions. You thrive at the intersection of data engineering and natural language processing, and you are eager to contribute to the pipelines and datasets that fuel generative AI applications, agentic systems, and other NLP-driven capabilities across iManage.

As an NLP Data Engineering Intern on the AI and knowledge engineering team, you will get hands-on experience designing, building, and optimizing text data pipelines that power AI/ML and generative AI solutions for our customers. You’ll collaborate with knowledge engineering, applied AI, and product teams to prepare, enrich, and integrate document data. Your contributions will be essential to enabling intelligent, AI-powered features across the iManage platform.

Requirements

Current enrollment in a Master’s or PhD program in Computer Science, Data Engineering, Data Science, Applied Mathematics, Computational Linguistics, or a related quantitative field
Proficiency in Python and experience using it to extract, structure, classify, and analyze text data
Foundational understanding of NLP concepts such as tokenization, embeddings, and semantic search
Familiarity with standard NLP libraries such as spaCy, Hugging Face Datasets, or NLTK
Solid knowledge of data structures, algorithms, and statistics
Proficiency with Git and collaborative development workflows
A passion to learn and improve, and an eagerness to share knowledge with colleagues
Problem-solving, creativity, curiosity, and a collaborative mindset

Nice To Haves

Exposure to Microsoft Azure services such as Fabric, ADLS, AI Foundry, or Azure ML
Experience with data pipeline orchestration or workflow automation tools such as Databricks
Familiarity with knowledge graphs or semantic data modeling

Responsibilities

Performing exploratory analyses on large text corpora and developing preprocessing pipelines for training and evaluation data
Supporting the design of automated workflows for text normalization, deduplication, language identification, PII redaction, and metadata enrichment
Assisting with building automated data validation processes to ensure accuracy and consistency of NLP datasets
Contributing to dataset curation, prompt dataset preparation, labeling coordination, and text quality validation to support model fine-tuning, semantic search, and generative AI evaluations
Partnering with the Applied AI team to understand data requirements and help build data interfaces for machine learning systems
Learning and applying data lineage best practices and data privacy, security, and governance principles
Maintaining the highest quality standards through processes that identify and correct mistakes and inconsistencies
