Keyword search, the default entry point for most digital collections, struggles with a corpus like the Islam West Africa Collection. The term "hadj" retrieves both articles about the pilgrimage to Mecca and those mentioning individuals bearing the honorific "El Hadj." A single imam may appear under more than twenty spelling variants. Searches miss synonyms and conceptual relationships: a query for "hajj" will not surface documents discussing "pilgrimage to Mecca" unless those exact words appear.

This Model Context Protocol (MCP) server and its companion agent skill address these barriers by giving AI assistants structured, read-only access to the entire IWAC. MCP is an open standard that does for AI what the International Image Interoperability Framework (IIIF) does for images: it defines a decentralised interface that preserves institutional data sovereignty while enabling discovery.

What the server provides

Sixteen read-only tools let any MCP-compatible AI assistant — Claude, ChatGPT, or open-source models — search and retrieve data from the collection without API credentials. The tools cover:

  • Article search: Find newspaper articles by keyword, country, newspaper, subject, date range, or any combination
  • Full article retrieval: Access complete metadata and OCR text for individual articles
  • Authority records: Query the index of persons, places, organisations, events, and subjects, with frequency statistics and date ranges
  • Sentiment analysis: Retrieve and compare polarity, centrality, and subjectivity assessments from three AI models (Gemini, ChatGPT, Mistral) for any article
  • Collection statistics: Get aggregate data on corpus size, newspaper coverage, and cross-country comparisons
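To make the combinable-filter behaviour of the article-search tool concrete, here is a minimal, self-contained sketch. The function name, parameter names, and sample records are illustrative assumptions, not the server's actual API; the point is that every filter is optional and filters combine with AND:

```python
from datetime import date

# Illustrative records standing in for IWAC article metadata (not real data).
ARTICLES = [
    {"id": 1, "title": "Le hadj 1985", "country": "Niger",
     "newspaper": "Le Sahel", "date": date(1985, 7, 1),
     "subjects": ["pilgrimage"]},
    {"id": 2, "title": "El Hadj Oumarou honoré", "country": "Benin",
     "newspaper": "La Nation", "date": date(1992, 3, 15),
     "subjects": ["honorifics"]},
]

def search_articles(keyword=None, country=None, newspaper=None,
                    subject=None, date_from=None, date_to=None):
    """Hypothetical analogue of the article-search tool: every
    filter is optional, and supplied filters combine with AND."""
    results = []
    for a in ARTICLES:
        if keyword and keyword.lower() not in a["title"].lower():
            continue
        if country and a["country"] != country:
            continue
        if newspaper and a["newspaper"] != newspaper:
            continue
        if subject and subject not in a["subjects"]:
            continue
        if date_from and a["date"] < date_from:
            continue
        if date_to and a["date"] > date_to:
            continue
        results.append(a)
    return results

# A keyword-only query matches both the pilgrimage article and the
# honorific, reproducing the "hadj" ambiguity described above.
print([a["id"] for a in search_articles(keyword="hadj")])  # [1, 2]
# Adding a subject filter disambiguates.
print([a["id"] for a in search_articles(keyword="hadj",
                                        subject="pilgrimage")])  # [1]
```

A subject filter on top of a keyword query is how a researcher would separate pilgrimage coverage from articles that merely mention an "El Hadj" honorific.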

The agent skill: from plumbing to intelligence

A server alone provides plumbing — raw access to data. The companion agent skill, a modular instruction set written in Markdown, encodes the domain knowledge that transforms tool access into methodical research. It structures the AI's reasoning into a five-phase workflow inspired by archival research practice:

  1. Scoping: Assess collection coverage for a given topic across countries, newspapers, and time periods
  2. Systematic searching: Deploy French-language variants and Arabic-Islamic transliteration alternatives to overcome the vocabulary problem
  3. Deep reading: Retrieve and analyse full-text articles, comparing sentiment assessments across AI models
  4. Triangulation: Cross-reference findings across subsets — articles, index entries, publications, references — to build a fuller picture
  5. Synthesis: Produce structured findings with source attribution and confidence grading
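The five phases above can be sketched as an orchestration over tool calls. Everything here is schematic: the `call_tool` stub stands in for real MCP invocations, and the tool names and payloads are assumptions for illustration, not the server's actual interface:

```python
def call_tool(name, **params):
    """Stub standing in for a real MCP tool invocation."""
    return {"tool": name, "params": params}

def run_workflow(topic, spelling_variants, article_ids):
    calls = []
    # Phase 1 -- Scoping: assess collection coverage for the topic.
    calls.append(call_tool("collection_statistics"))
    # Phase 2 -- Systematic searching: one query per spelling variant,
    # covering French and Arabic-Islamic transliteration alternatives.
    for variant in spelling_variants:
        calls.append(call_tool("search_articles", keyword=variant))
    # Phase 3 -- Deep reading: full text plus cross-model sentiment.
    for aid in article_ids:
        calls.append(call_tool("get_article", article_id=aid))
        calls.append(call_tool("get_sentiment", article_id=aid))
    # Phase 4 -- Triangulation: cross-reference the authority index.
    calls.append(call_tool("search_authorities", query=topic))
    # Phase 5 -- Synthesis happens in the model's reasoning, not in a
    # tool call; the returned list is the auditable trail of queries.
    return calls

trail = run_workflow("hajj",
                     ["hajj", "hadj", "pèlerinage à la Mecque"],
                     [101])
print(len(trail))  # 7 calls: 1 scoping + 3 searches + 2 reads + 1 index
```

The returned trail illustrates a side benefit of the tool-based design: every query the AI issued is inspectable after the fact, which supports the source attribution demanded in the synthesis phase.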

The skill also documents collection biases — which countries and decades have stronger coverage, which newspapers are over- or under-represented — so that the AI can contextualise its findings rather than present them as exhaustive.

Why MCP rather than fine-tuning?

Fine-tuning requires surrendering data to external training pipelines, and retrieval-augmented generation typically means copying it into a separately hosted index. MCP takes a different path: the collection stays on its own infrastructure while AI assistants query it through defined, inspectable tools. For African digital collections, historically subject to extractive knowledge practices, this matters. The institution retains full control over its data. The skill layer makes every assumption transparent and versionable. And because MCP is an open standard, any AI model — commercial or open-source — can use the same tools without vendor lock-in.

View on GitHub