An AI knowledge base lets you search your documentation by meaning, not keywords. Set one up with Yavy by creating a project, adding a source (website, GitHub, or Notion), waiting for indexing, and running your first query. The whole process takes under five minutes.
What Makes an AI Knowledge Base Different
Standard search matches words. An AI knowledge base matches meaning.
Ask "how do I reset my password?" and a keyword search only finds pages containing those exact words. An AI knowledge base finds "recovering account access" too, because it understands the intent behind the query.
Here's why that matters: your users rarely phrase questions the way your docs are written. They search like humans. An AI knowledge base meets them there.
Yavy builds this by converting each page into a vector embedding, a numerical representation of its meaning. When a query comes in, Yavy finds pages with similar meanings rather than similar words.
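The ranking step can be sketched in a few lines. This is not Yavy's implementation, just an illustration of how embedding similarity ranks pages by meaning: the vectors here are toy values, and a real system would produce them with an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings; in practice these come from an embedding model.
pages = {
    "reset-password": [0.9, 0.1, 0.0],
    "recover-account": [0.8, 0.2, 0.1],
    "billing-faq": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # "how do I reset my password?"

# Rank pages by how close their meaning is to the query's meaning.
ranked = sorted(pages, key=lambda p: cosine_similarity(query, pages[p]), reverse=True)
print(ranked)
```

Note that "recover-account" ranks right alongside "reset-password" even though they share no words with each other: that is the keyword-versus-meaning difference in miniature.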
Step 1: Create Your First Project
Sign in and click "New Project." Give it a name tied to what you're indexing: "API Docs," "Help Center," or "Internal Wiki" all work. The name is for your reference only and has no effect on search results.
You can run as many projects as you need. Most teams create one project per product or documentation site, so results stay focused. Mixing unrelated content into a single project degrades result quality because the search has to rank across a wider, less coherent set of pages.
Step 2: Add a Source to Your AI Knowledge Base
Yavy supports three source types:
- Website - crawls any public URL. Works with custom domains, GitBook, ReadTheDocs, and most documentation platforms.
- GitHub - pulls markdown files directly from a repository. The right choice for docs-as-code workflows.
- Notion - syncs pages from a Notion workspace. Best for internal wikis your team maintains in Notion.
For your first project, start with a website source. Paste your documentation URL and Yavy begins crawling immediately. No configuration required.
Step 3: Watch the Indexer Work
Indexing speed depends on page count. A 50-page site finishes in under two minutes. A 500-page site takes around ten.
The project dashboard shows live progress: pages discovered, pages indexed, and any errors. Common errors include pages that returned 404 or pages where the crawler was blocked.
What that means: if you see a high error count, your site is probably blocking the crawler via robots.txt or rate limiting. Check the crawl log for repeated failures on the same URLs.
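You can check how a site's robots.txt treats a crawler before pointing Yavy at it. The sketch below uses Python's standard library against an inline robots.txt; `YavyBot` is a placeholder user-agent, since the crawler's real user-agent string is whatever your Yavy dashboard reports.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt contents; in practice, fetch https://yoursite.com/robots.txt
robots_txt = """\
User-agent: *
Disallow: /search
Disallow: /admin
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# "YavyBot" is a placeholder; use the user-agent your dashboard reports.
for path in ["/docs/getting-started", "/search?q=reset", "/admin/users"]:
    allowed = rp.can_fetch("YavyBot", f"https://yoursite.com{path}")
    print(path, "->", "allowed" if allowed else "blocked")
```

If documentation paths come back blocked, fixing the robots.txt rule is faster than debugging crawl errors one by one.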
Step 4: Run Your First Query
Once indexing completes, open the built-in search on the project dashboard. Type a question in plain English.
Good first queries to try:
- A question a new user would ask on day one
- A feature description instead of a feature name ("how do I send an email" instead of "SMTP configuration")
- A vague query, to see how well the results hold up without exact wording
Basically: you want to stress-test the index before you connect it to anything. A query that returns three highly relevant results is a strong signal the indexing worked. A query that returns the homepage and two nav pages is a signal to clean up what got indexed.
If results look off, check what the crawler indexed. Navigation pages and search pages sometimes get picked up and dilute results. Use the URL exclusion filters to block those patterns, then re-index.
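Exclusion filters are configured in the dashboard, but the effect is easy to model: each crawled URL is tested against a set of patterns and skipped on a match. The glob patterns below are illustrative only; check Yavy's source settings for the exact pattern syntax it accepts.

```python
from fnmatch import fnmatch

# Example exclusion patterns; Yavy's actual syntax may differ.
exclude_patterns = ["*/search*", "*/tags/*"]

crawled = [
    "https://docs.example.com/getting-started",
    "https://docs.example.com/search?q=smtp",
    "https://docs.example.com/tags/email",
]

# Keep only URLs that match none of the exclusion patterns.
kept = [u for u in crawled if not any(fnmatch(u, p) for p in exclude_patterns)]
print(kept)
```

After tightening the patterns, re-index and rerun the same test queries to confirm the nav and search pages dropped out of the results.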
How to Connect Your Knowledge Base to an AI
A working knowledge base is useful on its own. Connected to an AI assistant, it becomes something your team actually relies on daily.
Yavy gives you two connection options:
MCP server. Exposes your knowledge base as a tool that Claude or another AI can call during a conversation. The AI retrieves relevant pages in real time before generating a response. This is the right starting point for most teams.
Skills package. A portable, offline bundle. No internet connection required at query time. Useful for air-gapped environments or when you want a self-contained deployment.
Both are covered in their own guides. Start with the MCP server unless you have a specific reason to go offline.
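For orientation, MCP servers are typically registered in the AI client's configuration file. The entry below shows the general shape of such a registration for Claude Desktop; the server name, command, and environment variable are placeholders, not Yavy's actual setup, which its own guide covers.

```json
{
  "mcpServers": {
    "yavy-docs": {
      "command": "npx",
      "args": ["-y", "yavy-mcp-server"],
      "env": { "YAVY_API_KEY": "<your-api-key>" }
    }
  }
}
```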
Fixing Common First-Run Problems
The crawler found 0 pages. Your site likely requires JavaScript to render content. Enable the JavaScript rendering option in source settings and re-trigger indexing.
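A quick way to confirm the diagnosis is to look at the raw HTML an ordinary HTTP fetch returns: a JavaScript-rendered app usually ships a near-empty body plus script tags. The heuristic below is a rough sketch for checking your own pages, not what Yavy's crawler actually does.

```python
import re

def looks_js_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Rough heuristic: little visible text outside <script>/<style> blocks."""
    stripped = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", stripped)  # drop remaining tags
    visible = " ".join(text.split())              # collapse whitespace
    return len(visible) < min_text_chars

app_shell = "<html><body><div id='root'></div><script src='app.js'></script></body></html>"
static_page = ("<html><body><h1>Reset your password</h1><p>"
               + "Step-by-step instructions. " * 20 + "</p></body></html>")

print(looks_js_rendered(app_shell))    # content arrives via JS
print(looks_js_rendered(static_page))  # content is already in the HTML
```

If your pages look like the app-shell case, enabling JavaScript rendering in the source settings is the fix.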
Results are irrelevant. Too much boilerplate is being indexed alongside your content: navigation bars, footers, cookie banners. Add URL exclusion patterns to skip those sections.
Indexing never finishes. Check the crawl log for repeated errors on the same domains. Your site is probably blocking the crawler via robots.txt or returning rate-limit responses.
If the dashboard does not explain the issue, the error log usually does. Support chat is available if you are still stuck after checking both.