Configuration
ContextMCP is configured through a single config.yaml file in your project root.
Overview
The configuration file has four main sections (an additional reindex section is covered under Reindex Settings):
- vectordb - Where to store embeddings
- embeddings - Which model to use
- sources - What documentation to index
- chunking - How to split content
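A quick way to catch a misspelled or missing section is to check the top-level keys after the file has been parsed. The following is a minimal sketch, not part of ContextMCP: `check_sections` is a hypothetical helper, it assumes the YAML has already been loaded into a Python dict by a parser of your choice, it treats all four main sections as required, and it assumes top-level keys starting with `x-` are only used to hold YAML anchors.

```python
# Hypothetical helper: sanity-check the top-level keys of a parsed config.yaml.
# Assumption: all four main sections are required; the docs do not state this.
MAIN_SECTIONS = {"vectordb", "embeddings", "sources", "chunking"}
KNOWN_SECTIONS = MAIN_SECTIONS | {"reindex"}


def check_sections(config: dict) -> list[str]:
    """Return human-readable problems found in the top-level keys."""
    problems = []
    present = set(config)
    for name in sorted(MAIN_SECTIONS - present):
        problems.append(f"missing section: {name}")
    for key in sorted(present - KNOWN_SECTIONS):
        # Assumption: x- prefixed keys only hold shared YAML anchors.
        if not key.startswith("x-"):
            problems.append(f"unknown top-level key: {key}")
    return problems


if __name__ == "__main__":
    parsed = {"vectordb": {}, "embedding": {}, "sources": [], "chunking": {}}
    for problem in check_sections(parsed):
        print(problem)
```

An empty list means every main section is present and no unexpected top-level key was found.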
Full Example
vectordb:
  provider: pinecone
  indexName: my-company-docs
  namespace: production
  pinecone:
    cloud: aws
    region: us-east-1

embeddings:
  provider: openai
  model: text-embedding-3-large
  dimensions: 3072

sources:
  - name: main-docs
    displayName: "Documentation"
    type: github
    repository: myorg/docs
    branch: main
    parser: mdx
    baseUrl: https://docs.example.com
    skipDirs:
      - node_modules
      - .git

  - name: api-reference
    displayName: "API Reference"
    type: github
    repository: myorg/api
    path: openapi
    parser: openapi
    baseUrl: https://docs.example.com/api

chunking:
  maxChunkSize: 2000
  minChunkSize: 250
  idealChunkSize: 1000

reindex:
  clearBeforeReindex: true
  batchSize: 100

Vector Database
ContextMCP stores your embeddings in a vector database. Pinecone is currently the only supported provider.
vectordb:
  provider: pinecone
  indexName: my-docs        # Your Pinecone index name
  namespace: production     # Optional: namespace within index
  pinecone:
    cloud: aws              # aws or gcp
    region: us-east-1       # Your Pinecone region

Embeddings
Configure the model used to generate vector embeddings.
embeddings:
  provider: openai
  model: text-embedding-3-large   # or text-embedding-3-small
  dimensions: 3072                # 3072 for large, 1536 for small

Cost note: text-embedding-3-large costs ~$0.13 per 1M tokens. For a typical docs site (~500 files), expect ~$0.50-1.00 per full reindex.
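The cost note can be checked with simple arithmetic. The sketch below uses the ~$0.13 per 1M tokens price quoted above; the function names and the 10,000-token average file size are hypothetical, chosen only for illustration, so substitute your own measurements.

```python
# Rough reindex cost arithmetic for text-embedding-3-large.
PRICE_PER_MILLION_TOKENS = 0.13  # USD, taken from the cost note above


def estimate_reindex_cost(num_files: int, avg_tokens_per_file: int) -> float:
    """Return the estimated USD cost of embedding every file once."""
    total_tokens = num_files * avg_tokens_per_file
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS


def tokens_for_budget(budget_usd: float) -> int:
    """Return how many tokens a given budget buys at the price above."""
    return round(budget_usd / PRICE_PER_MILLION_TOKENS * 1_000_000)


if __name__ == "__main__":
    # Assumption: 500 files averaging 10,000 tokens each (5M tokens total).
    print(f"${estimate_reindex_cost(500, 10_000):.2f}")
    # Token volume implied by the quoted $0.50-1.00 range.
    print(tokens_for_budget(0.50), tokens_for_budget(1.00))
```

Read the other way, the quoted $0.50-1.00 range corresponds to roughly 3.8M to 7.7M tokens per full reindex, or about 7,700 to 15,400 tokens per file across 500 files.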
Sources
Define where your documentation lives. You can have multiple sources.
GitHub Repository
sources:
  - name: docs                  # Unique identifier
    displayName: "My Docs"      # Human-readable name
    type: github
    repository: owner/repo      # GitHub repo
    branch: main                # Optional, defaults to main
    path: docs/                 # Optional, subdirectory
    parser: mdx                 # mdx, markdown, or openapi
    baseUrl: https://docs.example.com
    skipDirs:
      - node_modules
      - .git
      - images
    skipFiles:
      - CHANGELOG.md

Parser Types
| Parser | Use For | Features |
|---|---|---|
| mdx | MDX/Markdown docs | Extracts frontmatter, preserves code blocks |
| markdown | Plain markdown, READMEs | Simple parsing, language hints |
| openapi | Swagger/OpenAPI specs | Generates docs from API definitions |
Language Hints
When indexing SDK repositories, add a language hint:
- name: python-sdk
  repository: myorg/python-sdk
  parser: markdown
  language: python   # Helps AI understand context

Chunking
Control how documents are split for indexing.
chunking:
  maxChunkSize: 2000     # Maximum characters per chunk
  minChunkSize: 250      # Minimum (avoids tiny chunks)
  idealChunkSize: 1000   # Target size

Note: ContextMCP uses AST-aware chunking. Code blocks and tables are never split mid-content, regardless of size limits.
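To make the interaction between the three sizes and the "never split" rule concrete, here is a simplified sketch. It is not ContextMCP's implementation, which is AST-aware: this version is line-based, only recognizes fenced code blocks (not tables), and keeps overlong prose paragraphs whole instead of splitting them. The names `split_blocks` and `chunk_text` are hypothetical, and the defaults mirror the values shown above.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to avoid a literal fence here


def split_blocks(text: str) -> list[str]:
    """Split text into paragraphs, keeping each fenced code block as one block."""
    blocks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in text.splitlines():
        is_fence = line.lstrip().startswith(FENCE)
        if is_fence and not in_fence:
            if current:  # close the paragraph that precedes the code block
                blocks.append("\n".join(current))
            current = [line]
            in_fence = True
        elif is_fence:  # closing fence: the code block is complete
            current.append(line)
            blocks.append("\n".join(current))
            current = []
            in_fence = False
        elif in_fence or line.strip():
            current.append(line)
        elif current:  # a blank line outside a fence ends the paragraph
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk_text(text: str, max_size: int = 2000, min_size: int = 250,
               ideal_size: int = 1000) -> list[str]:
    """Greedily pack blocks into chunks near ideal_size without splitting a block."""
    if not 0 < min_size <= ideal_size <= max_size:
        raise ValueError("expected 0 < min_size <= ideal_size <= max_size")
    chunks: list[str] = []
    current = ""
    for block in split_blocks(text):
        candidate = f"{current}\n\n{block}" if current else block
        too_big = len(candidate) > max_size
        past_ideal = len(candidate) > ideal_size and len(current) >= min_size
        if current and (too_big or past_ideal):
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        # Merge a short tail into the previous chunk when the result fits max_size.
        fits = bool(chunks) and len(chunks[-1]) + 2 + len(current) <= max_size
        if len(current) < min_size and fits:
            chunks[-1] = f"{chunks[-1]}\n\n{current}"
        else:
            chunks.append(current)
    return chunks
```

With the defaults, a 3,000-character code block comes back as a single chunk even though it exceeds maxChunkSize, while ordinary paragraphs are grouped until adding another would pass idealChunkSize. In this sketch the minimum is a soft target: a short paragraph next to an oversized code block still ends up as its own small chunk.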
Reindex Settings
reindex:
  clearBeforeReindex: true   # Clear index before reindexing
  batchSize: 100             # Vectors uploaded per batch

Using YAML Anchors
YAML anchors (&name) and aliases (*name) let you define shared configuration once and reuse it across sources:
x-common-skip: &commonSkip
  - node_modules
  - .git
  - dist
  - __pycache__

sources:
  - name: docs
    skipDirs: *commonSkip
  - name: sdk
    skipDirs: *commonSkip