Research Aide-Crabtree - MCS - Patil, Vipul - 3.13.26.

Argonne National Laboratory•Lemont, IL

2d•$25 - $34

About The Position

Multi-modal vector databases (such as LanceDB) are quickly emerging to handle the explosive growth of unstructured data across diverse formats and modalities, which is especially relevant for scientific applications at DOE. The majority of state-of-the-art efforts are dedicated to optimizing read queries against such databases, notably approximate nearest neighbor (ANN) search. How to quickly insert new items belonging to different modalities and update indices to maintain fast lookup performance without compromising accuracy remains a relatively open question. Part of the challenge is the need to run a non-trivial two-stage pipeline: AI models first compute embeddings (for single modalities or for multiple modalities that end up in the same embedding space), then various techniques are used to insert the embeddings into vector databases. This project will study the overheads involved at each stage, characterize bottlenecks and opportunities for overlap, and finally design end-to-end asynchronous techniques that exploit those opportunities to optimize the two-stage pipeline. At a technical level, it will bridge the high-level (Python/Torch) and low-level (C++/Rust) abstractions needed to implement the pipeline efficiently.
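For candidates who want a concrete picture of the two-stage pipeline, the following is a minimal sketch of overlapping embedding computation with insertion, using only the Python standard library. It is an illustration under stated assumptions, not the project's design: `embed_batch` and `ToyIndex` are hypothetical stand-ins for an AI embedding model and a vector database insert path, and the sleep calls merely simulate per-stage latency.

```python
import queue
import threading
import time


def embed_batch(items):
    """Hypothetical stand-in for stage 1 (an AI model computing embeddings)."""
    time.sleep(0.02)  # simulated model latency, an assumption
    return [[float(len(s)), float(sum(map(ord, s)) % 97)] for s in items]


class ToyIndex:
    """Hypothetical stand-in for stage 2 (inserting into a vector database)."""

    def __init__(self):
        self.vectors = []

    def insert(self, vectors):
        time.sleep(0.02)  # simulated insert/index-update latency, an assumption
        self.vectors.extend(vectors)


def run_sequential(batches, index):
    # Baseline: each batch is embedded, then inserted, with no overlap.
    for batch in batches:
        index.insert(embed_batch(batch))


def run_overlapped(batches, index, depth=2):
    # A bounded queue lets stage 1 run ahead of stage 2 by at most `depth` batches.
    handoff = queue.Queue(maxsize=depth)

    def producer():
        for batch in batches:
            handoff.put(embed_batch(batch))
        handoff.put(None)  # sentinel: no more batches

    worker = threading.Thread(target=producer)
    worker.start()
    while True:
        vectors = handoff.get()
        if vectors is None:
            break
        index.insert(vectors)
    worker.join()


def timed(fn, *args):
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


batches = [[f"item-{i}-{j}" for j in range(4)] for i in range(8)]
seq_index, ovl_index = ToyIndex(), ToyIndex()
seq_time = timed(run_sequential, batches, seq_index)
ovl_time = timed(run_overlapped, batches, ovl_index)
print(f"sequential: {seq_time:.2f}s, overlapped: {ovl_time:.2f}s")
```

In this toy setup both variants insert the same vectors in the same order; the overlapped variant simply hides part of one stage's latency behind the other. The bounded queue is one possible design choice for limiting memory use when the embedding stage is faster than insertion.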

Requirements

The entirety of the appointment must be conducted within the United States.
Applicants must be:
Currently enrolled in undergraduate or graduate studies at an accredited institution; or
Graduated from an accredited institution within the past 3 months; or
Actively enrolled in a graduate program at an accredited institution.
Must be 18 years or older at the time the appointment begins.
Must possess a cumulative GPA of at least 3.0 on a 4.0 scale.
If accepting an offer, candidates may be required to complete pre-employment drug testing based on appointment length. All students remain subject to applicable drug testing policies.
Must complete a satisfactory background check.