Semantic Tool Selection in Practice: A Step-by-Step Guide with MCP Connect Inspector UI
Set up semantic tool selection from scratch using MCP Connect Inspector UI, Neo4j, and vector search. Reduce token usage by 89% and response times by 62% while maintaining 100% accuracy.

When you connect an AI agent to an MCP server with dozens of tools, every request sends the entire tool catalog to your LLM. GitHub's MCP server alone exposes 93 operations. Ask Claude to "create an issue," and it receives definitions for fork_repository, get_code_scanning_alert, list_commits—tools it will never use for that request.
This isn't just inefficient. It's expensive, slow, and can degrade response quality as models struggle with information overload.
MCP Connect Inspector UI solves this with semantic tool selection—a vector-based approach that matches your prompt to the most relevant tools before sending anything to the LLM. MCP Connect Inspector UI is a free, open-source developer GUI for working with MCP servers, and semantic tool search is built right in. Under the hood, it uses MCP-RAG, our open-source library for semantic tool discovery (read more in our deep-dive blog post). In our benchmarks, this reduced token usage by 89% and response times by 62%, while maintaining 100% accuracy.
In this guide, you'll set up semantic tool selection from scratch and see the results in real-time.
What You'll Need
- MCP Connect Inspector UI (free, open-source, running in your browser)
- An MCP server with tools to test (we'll use GitHub's server with 93 tools)
- Neo4j Aura account (free tier works) or local Neo4j instance
- OpenAI API key (for generating embeddings)
Getting the Prerequisites Running
Before diving in, you'll need the GitHub MCP server and Neo4j running locally. We have dedicated guides for both:
- How to MCP Connect to GitHub — Set up the GitHub MCP server with all 93 tools
- How to MCP Connect to Neo4j — Configure Neo4j for vector search
Following these guides will get all the dependencies running locally on your machine.
Running MCP Connect Inspector UI Locally
While you can use mcp.rconnect.tech directly, we recommend running MCP Connect Inspector UI locally to avoid CORS issues when connecting to local MCP servers:
```bash
npx @mcpconnect/cli
```
This starts the inspector on http://localhost:3001 and opens your browser automatically. Local servers like the GitHub MCP server connect without any CORS configuration needed.
How It Works (30-Second Overview)
Traditional tool selection sends everything:
```
User: "Show me the README for rocket-connect/mcp-rag"
  ↓
LLM receives: 93 tool definitions (47,000+ tokens)
  ↓
LLM picks: get_file_contents
```
Semantic selection is smarter:
```
User: "Show me the README for rocket-connect/mcp-rag"
  ↓
Vector search: Find tools semantically similar to prompt
  ↓
Top matches: get_file_contents, get_repository, list_commits
  ↓
LLM receives: 3 tool definitions (~1,000 tokens)
  ↓
LLM picks: get_file_contents
```
The magic happens in that vector search step. MCP Connect Inspector UI embeds your prompt and compares it against pre-computed embeddings for every tool, parameter, and return type. Neo4j handles the similarity search at scale.
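To make that step concrete, here's a toy sketch of similarity ranking in TypeScript. The three-dimensional vectors are invented purely for illustration (real text-embedding-3-small vectors have 1,536 dimensions), and in practice Neo4j performs this search with a vector index rather than a linear scan:

```typescript
// Toy sketch of the ranking step. The vectors below are made up for
// illustration; real embeddings come from text-embedding-3-small.
type Embedded = { name: string; vector: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK tool names ranked by similarity to the prompt embedding.
function rankTools(prompt: number[], tools: Embedded[], topK: number): string[] {
  return tools
    .map((t) => ({ name: t.name, score: cosineSimilarity(prompt, t.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((t) => t.name);
}

const toyTools: Embedded[] = [
  { name: 'get_file_contents', vector: [0.9, 0.1, 0.1] },
  { name: 'list_commits', vector: [0.2, 0.9, 0.1] },
  { name: 'fork_repository', vector: [0.1, 0.2, 0.9] },
];

// A prompt embedding near get_file_contents should rank it first.
const top = rankTools([0.8, 0.2, 0.1], toyTools, 2);
```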
But Doesn't This Add Latency?
At first glance, adding a vector search layer before each LLM call seems like it would slow things down. You're adding an extra network hop, right?
For a simple one-off operation, the difference is negligible either way. But consider real-time, dynamic use cases: an MCP server connected to a live phone call, a video stream, or any continuous interaction where your agent processes dozens or hundreds of requests in sequence.
In these scenarios, token reduction compounds dramatically. Every request with fewer tokens means lower costs, faster responses, and critically—reduced chance of hallucinations from information overload. When your agent is handling a 30-minute customer support call, those cumulative improvements translate directly into better experiences for your users.
The benchmark results demonstrate the impact clearly: significant token reduction with maintained accuracy across all test cases. The semantic layer pays for itself quickly, especially in high-frequency, real-time applications.
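The compounding arithmetic is easy to sketch. Using the approximate numbers from this guide (a ~47,000-token catalog versus roughly 1,500 tokens after selection), a hypothetical 100-request session looks like this:

```typescript
// Back-of-the-envelope compounding, using this guide's approximate
// per-request prompt sizes. The request count is illustrative.
const baselineTokens = 47_000; // full 93-tool catalog on every request
const semanticTokens = 1_500;  // midpoint of the ~1,000-2,000 range

function cumulativeSavings(requests: number): number {
  return (baselineTokens - semanticTokens) * requests;
}

// A 30-minute call with a request every ~18 seconds is ~100 requests.
const saved = cumulativeSavings(100); // 4,550,000 prompt tokens avoided
```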
Step 1: Connect Your MCP Server
Open MCP Connect Inspector UI and connect to your MCP server. For this guide, we'll assume you're using a server with multiple tools—GitHub's MCP server is ideal for demonstrating the impact.
Once connected, you should see your tools listed in the interface. Note the count—this is what would normally be sent to your LLM on every request.
Step 2: Configure OpenAI
Important: Semantic tool selection requires OpenAI as your LLM provider. MCP Connect Inspector UI uses OpenAI's text-embedding-3-small model to generate the vector embeddings that power semantic search.
- In Settings, find the LLM Configuration section
- Select OpenAI as your provider
- Enter your OpenAI API key
This key is used for two purposes:
- Generating embeddings when you sync tools
- Chat completions when you interact with tools
Once OpenAI is configured, you'll see the Semantic Tool Selection panel appear in the left sidebar. This is where you'll configure Neo4j and manage your vector sync.
Step 3: Configure Neo4j
Click on the Semantic Tool Selection panel in the sidebar to open the settings modal.
- Enter your Neo4j connection details:
  - URI: Your Neo4j connection string (e.g., `neo4j+s://xxxxx.databases.neo4j.io` for Aura)
  - Username: `neo4j` (the default)
  - Password: Your Neo4j password
  - Database: Leave blank for the default
- Click Test Connection to verify your credentials
- Once the connection test passes, you're ready to sync
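As a rough illustration of what a connection test guards against, here's a hypothetical pre-flight check on the URI scheme. The helper is invented for this sketch; the schemes themselves are Neo4j's real connection schemes:

```typescript
// Hypothetical pre-flight check mirroring the "Test Connection" step:
// reject obviously malformed URIs before handing credentials to the driver.
// neo4j://, neo4j+s://, bolt://, and bolt+s:// are Neo4j's real URI schemes.
const VALID_SCHEMES = ['neo4j://', 'neo4j+s://', 'bolt://', 'bolt+s://'];

function isValidNeo4jUri(uri: string): boolean {
  return VALID_SCHEMES.some((scheme) => uri.startsWith(scheme));
}
```

For example, `isValidNeo4jUri('neo4j+s://xxxxx.databases.neo4j.io')` passes, while a plain `https://` URL does not.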
Step 4: Vectorize Your Schema
With Neo4j connected, click the Sync button. MCP Connect Inspector UI will:
- Convert each tool to a standardized format
- Generate embeddings for tool names, descriptions, parameters, and return types
- Store everything in Neo4j with vector indexes
You'll see progress as tools are processed. For 93 GitHub tools, this typically takes 30-60 seconds on first sync.
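As an illustration of the conversion step, here's one plausible way to flatten a tool into embeddable text passages. The shapes and wording are assumptions for this sketch, not MCP Connect Inspector UI's actual internal format:

```typescript
// A sketch of the "standardized format" step: flatten each tool into one
// text passage per embeddable unit (the tool itself, then each parameter).
interface ToolSchema {
  name: string;
  description: string;
  params: { name: string; description: string }[];
}

function toEmbeddableTexts(tool: ToolSchema): string[] {
  const texts = [`Tool ${tool.name}: ${tool.description}`];
  for (const p of tool.params) {
    texts.push(`Parameter ${p.name} of ${tool.name}: ${p.description}`);
  }
  return texts;
}

const texts = toEmbeddableTexts({
  name: 'get_file_contents',
  description: 'Read a file from a repository',
  params: [{ name: 'path', description: 'Path to the file' }],
});
```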
Once complete, the status shows:
- ✅ Synced
- Tool count: Number of tools indexed
- Hash: A fingerprint of your schema (used to detect changes)
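The hash can be sketched as a digest over the serialized schema: identical schemas produce identical fingerprints, so re-syncs can be skipped when nothing changed. The exact scheme MCP Connect Inspector UI uses may differ; this shows the general idea:

```typescript
import { createHash } from 'node:crypto';

// Sketch of a schema fingerprint: hash the serialized tool list so a
// changed schema produces a new hash. Assumes stable JSON key order.
function schemaHash(tools: object[]): string {
  return createHash('sha256').update(JSON.stringify(tools)).digest('hex');
}

const before = schemaHash([{ name: 'get_file_contents' }]);
const after = schemaHash([
  { name: 'get_file_contents' },
  { name: 'list_commits' },
]);
// `before` and `after` differ, so a re-sync would be triggered.
```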
What's Being Stored?
MCP Connect Inspector UI uses a decomposed approach to indexing. Instead of one embedding per tool, it creates embeddings for:
- The tool itself (name + description)
- Each parameter (name + description)
- Return types
This granular indexing means a query like "add a comment" matches not just tools with "comment" in their name, but also tools with comment-related parameters.
In Neo4j, this creates a graph:
```
(ToolSet)-[:HAS_TOOL]->(Tool)
(Tool)-[:HAS_PARAM]->(Parameter)
(Tool)-[:RETURNS]->(ReturnType)
```
Each node has its own vector embedding.
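Against that graph, a similarity query could look like the sketch below. Neo4j 5's `db.index.vector.queryNodes` procedure is real; the index name `tool_embeddings` and the exact traversal back to the owning Tool are assumptions for this sketch:

```typescript
// Builds a Cypher query you could run with neo4j-driver: find the topK
// most similar embedded nodes, then resolve each hit (tool, parameter,
// or return type) back to its owning Tool.
function buildVectorSearchQuery(topK: number): string {
  return `
    CALL db.index.vector.queryNodes('tool_embeddings', ${topK}, $promptEmbedding)
    YIELD node, score
    MATCH (t:Tool)-[:HAS_PARAM|RETURNS*0..1]->(node)
    RETURN t.name AS tool, max(score) AS relevance
    ORDER BY relevance DESC
  `;
}

const query = buildVectorSearchQuery(5);
```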
Step 5: Test Semantic Selection
Now for the fun part. Open a chat with your connected MCP server and try a natural language query:
Query: "What's in the README of rocket-connect/mcp-rag?"
Watch what happens:
- Semantic search starts — MCP Connect Inspector UI embeds your prompt
- Vector search runs — Neo4j finds similar tool embeddings
- Results appear — You see which tools were selected and their relevance scores
- LLM responds — Using only the selected tools
The UI shows a Vector Search Tool Card with:
- Selected tools ranked by relevance
- Relevance percentages (e.g., `get_file_contents: 94%`)
- Search duration
- Context reduction (e.g., "Reduced context by 89%")
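The context-reduction figure is simple arithmetic over prompt tokens. A sketch, with illustrative token counts:

```typescript
// How a "Reduced context by N%" figure is derived: compare the tokens
// for the selected tools against the full catalog. Counts are illustrative.
function contextReduction(fullCatalogTokens: number, selectedTokens: number): string {
  const pct = Math.round((1 - selectedTokens / fullCatalogTokens) * 100);
  return `Reduced context by ${pct}%`;
}

const message = contextReduction(47_000, 5_170);
```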
Try Different Queries
| Query | Expected Top Tools |
|---|---|
| "Get pull request #1 from rocket-connect/mcp-rag" | get_pull_request |
| "List all open issues in rocket-connect/mcpconnect" | list_issues, search_issues |
| "Show me the recent commits on mcp-rag" | list_commits |
| "What's in the README?" | get_file_contents |
| "Get the repo details for rocket-connect/mcp-rag" | get_repository |
Step 6: Compare Before and After
To really appreciate the improvement, compare metrics with and without semantic selection.
Without Semantic Selection (Baseline)
| Metric | Value |
|---|---|
| Tools sent to LLM | 93 |
| Prompt tokens | ~47,000 |
| Response time | ~4,500ms |
With Semantic Selection
| Metric | Value |
|---|---|
| Tools sent to LLM | 5-10 |
| Prompt tokens | ~1,000-2,000 |
| Response time | ~1,700ms |
Results:
- 89% reduction in token usage
- 62% faster responses
- 100% accuracy maintained
These aren't theoretical—they're from our benchmark suite running real queries against the full GitHub toolset. View the complete benchmark results on GitHub.
Development vs Production
MCP Connect Inspector UI is a free, open-source local development tool. It's designed to help you experiment with MCP servers, debug tool interactions, and test semantic selection—all from your browser during development.
When you're juggling multiple MCP servers with dozens of tools each, semantic selection in MCP Connect Inspector UI helps you:
- Validate that your prompts match the right tools
- Fine-tune max tool counts and relevance thresholds
- Debug why certain tools aren't being selected
- Measure token savings before committing to production
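Those two knobs interact in a predictable way, and are easy to reason about in isolation. Here's a sketch of threshold-plus-cap selection; the function and field names are illustrative, not MCP Connect Inspector UI's actual API:

```typescript
// Sketch of the two tunable knobs: drop tools below a relevance
// threshold, then cap the survivors at a maximum count.
type Scored = { name: string; relevance: number };

function selectTools(scored: Scored[], threshold: number, maxTools: number): string[] {
  return scored
    .filter((t) => t.relevance >= threshold)
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, maxTools)
    .map((t) => t.name);
}

const selected = selectTools(
  [
    { name: 'get_file_contents', relevance: 0.94 },
    { name: 'get_repository', relevance: 0.81 },
    { name: 'fork_repository', relevance: 0.42 },
  ],
  0.6,
  10,
);
```

Lowering the threshold admits more marginal tools; raising it risks filtering out the right one. The inspector lets you see this trade-off on real queries before you commit to values in production.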
Taking It to Production
For production deployments, use MCP-RAG directly. It's the same open-source library that powers MCP Connect Inspector UI's semantic selection, but designed to wrap your AI SDK in your own applications.
```typescript
import { createMCPRag } from '@mcp-rag/client';
import { openai } from '@ai-sdk/openai';

const rag = createMCPRag({
  model: openai('gpt-4'),
  neo4j: driver,        // your neo4j-driver instance
  tools: yourMCPTools,  // tools discovered from your MCP server(s)
});

// Sync once at startup
await rag.sync();

// Use exactly like the AI SDK, but with semantic selection
const result = await rag.generateText({
  prompt: userQuery,
  maxActiveTools: 10,
});
```
MCP-RAG is a drop-in wrapper around the AI SDK. Your existing code stays the same—you just get automatic semantic tool filtering on every request.
Conclusion
If you're building with MCP and your agents have access to more than a handful of tools, semantic selection isn't optional—it's essential. The token savings alone justify the setup, but the real win is faster, more focused responses.
Use MCP Connect Inspector UI during development to experiment and validate. When you're ready for production, wrap your AI SDK with MCP-RAG and deploy with confidence.
Try it now: MCP Connect Inspector UI
Questions? Find us on Twitter @dan_starns or open an issue on GitHub.
Resources
- MCP Connect Inspector UI — Try semantic tool selection in your browser
- MCP Connect GitHub — Source code for MCP Connect Inspector UI
- MCP-RAG Repository — Open source library for production deployments
- Benchmark Results — View the raw performance data
- Neo4j Aura — Free cloud vector database
- Semantic Tool Discovery Blog — Deep-dive into the MCP-RAG architecture
Dan Starns is the Founder & CTO of Rocket Connect, former core contributor to Neo4j, and builder of developer tools that make AI agents more efficient. Currently based in Southeast Asia, organizing developer events and pushing the boundaries of MCP tooling.