About The Position

Do you want to build a career that is truly worthwhile? The World Bank Group is a unique global partnership of five institutions driven by a bold vision to create a world free of poverty on a livable planet. As one of the largest sources of funding and knowledge for developing countries, we help solve the world’s greatest development challenges. When you join the World Bank Group, you become part of a dynamic, diverse organization with 189 member countries and more than 120 offices worldwide. We work with public and private sector partners, invest in groundbreaking projects, and use data, research, and technology to bring tangible and transformative changes around the globe. For more information, visit www.worldbank.org.

The Development Economics Vice Presidency (DEC) is the World Bank’s central reservoir of fresh insights into the most pressing challenges of development. It is led by the Chief Economist and Senior Vice President of the World Bank Group, who advises the President and senior managers, serves as the community leader for the WBG’s economists, and helps to keep the institution at the forefront of thinking about development policy. DEC constitutes one of the world’s largest teams of economists focused on policy solutions for developing economies.

Official development data, covering poverty, jobs, education, climate, and other critical domains, is foundational to evidence-based policymaking and informed public discourse. Despite its growing availability, many users continue to face challenges in discovering, interpreting, and trusting this information. At the same time, increased reliance on AI systems has amplified the risk of misinformation, as models often draw from secondary or unofficial sources, widening the authority gap.

To address this challenge, the AI for Data – Data for AI Team in the Development Data Group (DECDG) and the Office of the World Bank Group Chief Statistician, with funding support from the World Bank Group Innovation Awards and the Global Data Facility, is developing an open-source, MCP-centric, AI-driven Data Chat application for Data360. This initiative leverages the Model Context Protocol (MCP) to connect large language models (LLMs) directly to authoritative, AI-ready development data and metadata, enabling trustworthy, transparent, and auditable AI-mediated access. MCP is emerging as a foundational standard for exposing trusted resources to AI systems. DECDG is therefore investing not only in technical implementation, but also in evaluation frameworks, governance patterns, and reusable guidance that can support broader institutional adoption across the World Bank and partner organizations.

To advance this work, DECDG seeks a Full Stack AI and Machine Learning Engineer to contribute to the implementation of the Data360 MCP, development of the Data360 Chat application, technical evaluation, and the documentation of processes and guides related to MCP for Development Data. The objective of this Extended Term Consultant (ETC) assignment is to provide sustained full-stack AI and machine learning engineering expertise to advance MCP-enabled systems and applied AI use cases for development data, with a focus on:

• End-to-end design, implementation, and maturation of MCP integrations for Data360 Chat.
• Development of robust evaluation (“evals”) infrastructure for LLM-based systems using MCP.
• Institutionalization of best practices through reusable components, documentation, and guidance.
• Supporting Innovation Awards deliverables while laying foundations for long-term operational use.

The ETC will serve as a core technical contributor and steward for MCP-based AI systems and AI for Data use cases within DECDG.
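To make the MCP integration concrete, here is a minimal sketch of a single MCP tool that an LLM client could invoke to fetch an indicator observation together with its source metadata. It uses the FastMCP interface from the official MCP Python SDK; the endpoint URL, parameter names, and returned fields are illustrative assumptions only and do not describe the actual Data360 API or the project's business logic.

    # Minimal MCP server sketch: one tool returning an indicator value plus its
    # source metadata. Assumes the official MCP Python SDK (pip install mcp) and
    # httpx; the endpoint and field names are placeholders, not the Data360 API.
    import httpx
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("data360-sketch")

    DATA_API = "https://example.org/api/indicators"  # hypothetical authoritative endpoint

    @mcp.tool()
    def get_indicator(indicator_code: str, country_iso3: str, year: int) -> dict:
        """Return a single indicator observation with its source metadata."""
        response = httpx.get(
            DATA_API,
            params={"indicator": indicator_code, "country": country_iso3, "year": year},
            timeout=30,
        )
        response.raise_for_status()
        payload = response.json()
        # Returning source metadata alongside the value is what lets the chat
        # application trace an AI-mediated answer back to the authoritative series.
        return {
            "value": payload.get("value"),
            "unit": payload.get("unit"),
            "source": payload.get("source"),
            "last_updated": payload.get("last_updated"),
        }

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default; an MCP-aware LLM client calls the tool

A chat client connected to a server of this shape can cite the returned source and last-updated fields directly in its answers, which is the transparency and traceability property the initiative emphasizes.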

Requirements

  • At least 5–7 years of professional experience in AI/ML engineering, applied machine learning, and backend system development, with demonstrated ownership of production-grade systems.
  • Bachelor’s or master’s degree in Actuarial Science, Business Administration, Computer Science, Finance, Mathematics, Physics, or another analytical field.
  • Demonstrated hands-on experience with:
      • Protocol-based system design and API integration, including secure, modular, and well-documented service interfaces.
      • Retrieval-augmented and context-augmented LLM systems, including embeddings, vector databases, and structured context construction (see the retrieval sketch after this list).
      • LLM tooling, agent architectures, and/or MCP-like frameworks, including tool invocation, orchestration, and context governance.
      • Application of synthetic data generation methods for data-scarce use cases.
  • Strong proficiency in Python, with experience across the modern AI/ML stack, including:
      • Transformers and LLM ecosystems.
      • Vector databases and semantic retrieval.
      • Human-in-the-loop and agentic AI workflows.
  • Proven experience designing, implementing, and maintaining model evaluation frameworks, including:
      • Automated and human-in-the-loop evals.
      • Model performance monitoring, error analysis, and iterative improvement.
      • Responsible AI guardrails (accuracy, robustness, transparency).
  • Solid experience working with metadata-rich and structured data platforms, including integrating heterogeneous data sources into ML or AI systems.
  • Demonstrated ability to translate complex technical systems into clear technical documentation, guidance, and implementation notes.
  • Strong collaboration skills and experience working effectively with multidisciplinary teams, including data specialists, engineers, product owners, and non-technical stakeholders.
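The retrieval sketch referenced above is a compact illustration of the embeddings-plus-vector-database pattern: indicator metadata is embedded, indexed, and retrieved to build structured context for an LLM. The model name, toy corpus, and library choices (sentence-transformers with FAISS) are assumptions made only for this example.

    # Illustrative retrieval-augmented context construction. Assumes the
    # sentence-transformers and faiss-cpu packages; the three metadata strings
    # form a toy corpus standing in for a real indicator catalog.
    import faiss
    from sentence_transformers import SentenceTransformer

    corpus = [
        "SP.POP.TOTL: Population, total (annual, World Development Indicators).",
        "SI.POV.DDAY: Poverty headcount ratio at $2.15 a day (2017 PPP), % of population.",
        "SE.PRM.ENRR: School enrollment, primary (% gross).",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(corpus, normalize_embeddings=True)

    # Inner product on normalized vectors is cosine similarity.
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)

    query = "What share of people live in extreme poverty?"
    query_vec = model.encode([query], normalize_embeddings=True)
    scores, ids = index.search(query_vec, 2)

    # The retrieved metadata snippets become the structured context passed to the LLM.
    context = "\n".join(corpus[i] for i in ids[0])
    print(context)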

Nice To Haves

  • Experience working with international development data, public-sector data, or large-scale statistical systems.
  • Familiarity with authoritative data platforms such as Data360, UN data systems, National Statistical Office (NSO) environments, or similar.
  • Experience with user-centered or usability-driven evaluation of AI systems, including prompt testing, qualitative assessment, or UX-informed iteration.
  • Hands-on experience with cloud infrastructure and DevOps practices, including:
      • Containerization (Docker).
      • Orchestration platforms (e.g., Kubernetes).
      • Cloud platforms (AWS, Azure, or GCP).
      • CI/CD or ML lifecycle tooling (e.g., MLflow or equivalents; see the tracking sketch after this list).
  • Experience deploying and maintaining production ML systems, including model tracking, monitoring, and retraining workflows.
  • Demonstrated engagement with the open-source ecosystem, including contributions to AI, ML, or data infrastructure projects.
  • Prior experience mentoring or guiding junior engineers or data scientists is an asset.
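The tracking sketch referenced above is a small, assumed example of logging an eval run with MLflow. The experiment name, parameters, and metric names are invented for illustration and do not reflect an established DECDG convention.

    # Minimal MLflow experiment tracking for one eval run; all names and values
    # below are invented for the example.
    import mlflow

    mlflow.set_experiment("data360-chat-evals")

    with mlflow.start_run(run_name="baseline-mcp-config"):
        mlflow.log_param("llm_backend", "example-llm-v1")
        mlflow.log_param("mcp_tools_enabled", True)
        mlflow.log_metric("grounding_accuracy", 0.87)
        mlflow.log_metric("tool_call_success_rate", 0.93)
        mlflow.log_metric("p95_latency_seconds", 4.2)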

Responsibilities

  • MCP Architecture, Integration, and System Development. The consultant will contribute to:
      • Designing, implementing, and maintaining MCP-based integrations with Data360 data and metadata.
      • Developing and refining MCP tools and resources that expose authoritative development data through clearly defined business logic.
      • Supporting the evolution of Data360 Chat from prototype to production-ready architecture.
      • Ensuring MCP implementations support transparency, traceability, and responsible AI principles.
      • Collaborating closely with platform engineers, data teams, and product owners to ensure alignment with Data360 standards and APIs.
  • LLM Evaluation, Evals Infrastructure, and Continuous Improvement. The consultant will design and operationalize evaluation frameworks for MCP-enabled LLM systems and related applications, including:
      • Automated eval pipelines that assess:
          • Grounding in authoritative data.
          • Correctness of tool usage and MCP compliance.
          • Robustness, latency, and failure modes.
      • Development of synthetic and real-world prompt corpora representative of development data use cases.
      • Human-in-the-loop evaluation workflows using structured rubrics and LLM-as-a-Judge (see the judge sketch after this list).
      • Iterative optimization of MCP configurations, prompts, tools, and resources informed by evaluation results.
      • Documentation of evaluation findings suitable for internal governance, Innovation Awards reporting, and future reuse.
  • Knowledge Assets, Guidance, and Institutionalization. The consultant will contribute to the long-term sustainability and scalability of MCP adoption by:
      • Contributing to the development and maintenance of an “MCP for Development Data” Handbook, covering:
          • Architectural patterns.
          • Governance considerations.
          • Evaluation methodologies.
          • Example implementations.
      • Developing reusable MCP templates, sample tools, and reference deployments.
      • Supporting internal capacity building through documentation, presentations, and knowledge-transfer sessions.
      • Advising other World Bank teams exploring MCP-based AI solutions.
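The judge sketch referenced above is a minimal LLM-as-a-Judge grounding check, one building block of an automated eval pipeline. The OpenAI Python SDK is used here only as one possible backend, and the rubric, model name, and example record are illustrative assumptions rather than the project's actual evaluation design.

    # Sketch of an automated LLM-as-a-Judge check: does the assistant's answer stay
    # grounded in the retrieved authoritative data? Assumes the OpenAI Python SDK
    # and an OPENAI_API_KEY in the environment; rubric and model name are examples.
    import json
    from openai import OpenAI

    client = OpenAI()

    RUBRIC = (
        "You are grading a data assistant. Given retrieved official data and the "
        "assistant's answer, return JSON {\"grounded\": true|false, \"reason\": \"...\"}. "
        "Mark grounded=false if any figure in the answer is unsupported by the data."
    )

    def judge_grounding(retrieved_data: str, answer: str) -> dict:
        """Ask a judge model whether the answer is supported by the retrieved data."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": RUBRIC},
                {"role": "user", "content": f"DATA:\n{retrieved_data}\n\nANSWER:\n{answer}"},
            ],
            response_format={"type": "json_object"},
        )
        return json.loads(response.choices[0].message.content)

    if __name__ == "__main__":
        verdict = judge_grounding(
            retrieved_data="SP.POP.TOTL, Kenya, 2022: 54.0 million (illustrative record).",
            answer="Kenya's population in 2022 was about 54 million.",
        )
        print(verdict)  # e.g. {"grounded": true, "reason": "..."}

Verdicts of this kind can be aggregated across a prompt corpus and logged alongside tool-usage and latency metrics, which is how the automated pipeline feeds the iterative optimization and governance reporting described above.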