This is a Model Context Protocol (MCP) server for Crawlab, allowing AI applications to interact with Crawlab's functionality.
The MCP server provides a standardized way for AI applications to access Crawlab's features, including:
- Spider management (create, read, update, delete)
- Task management (run, cancel, restart)
- File management (read, write)
- Resource access (spiders, tasks)
The MCP Server/Client architecture facilitates communication between AI applications and Crawlab:
```mermaid
graph TB
    User[User] --> Client[MCP Client]
    Client --> LLM[LLM Provider]
    Client <--> Server[MCP Server]
    Server <--> Crawlab[Crawlab API]

    subgraph "MCP System"
        Client
        Server
    end

    subgraph "Crawlab System"
        Crawlab
        DB[(Database)]
        Crawlab <--> DB
    end

    %% Flow annotations
    LLM -.-> |Tool calls| Client
    Client -.-> |Executes tool calls| Server
    Server -.-> |API requests| Crawlab
    Crawlab -.-> |API responses| Server
    Server -.-> |Tool results| Client
    Client -.-> |Human-readable response| User

    classDef external fill:#f9f9f9,stroke:#333,stroke-width:1px;
    classDef internal fill:#d9edf7,stroke:#31708f,stroke-width:1px;
    class User,LLM,Crawlab,DB external;
    class Client,Server internal;
```
1. User Query: The user sends a natural language query to the MCP Client
2. LLM Processing: The Client forwards the query to an LLM provider (e.g., Claude, OpenAI)
3. Tool Selection: The LLM identifies the necessary tools and generates tool calls
4. Tool Execution: The Client sends the tool calls to the MCP Server
5. API Interaction: The Server executes the corresponding Crawlab API requests
6. Response Generation: Results flow back through the Server and the Client to the LLM
7. User Response: The Client delivers the final human-readable response to the user
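This round trip can also be exercised programmatically. Below is a minimal sketch using the official `mcp` Python SDK, assuming the server is reachable at `http://localhost:8000` with a standard SSE endpoint (the `/sse` path is an assumption); the `spiders` tool name comes from the tool reference at the end of this document:

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main():
    # Connect to the MCP server over SSE (the /sse path is an assumption).
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools the server exposes (spiders, tasks, ...).
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # Execute a tool call, as the LLM would after tool selection.
            result = await session.call_tool("spiders", {})
            print(result.content)


asyncio.run(main())
```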
You can install the MCP server as a Python package, which provides a convenient CLI:
```bash
# Install from source
pip install -e .

# Or install from GitHub (when available)
# pip install git+https://github.com/crawlab-team/crawlab-mcp-server.git
```
After installation, you can use the CLI:
```bash
# Start the MCP server
crawlab_mcp-mcp server [--spec PATH_TO_SPEC] [--host HOST] [--port PORT]

# Start the MCP client
crawlab_mcp-mcp client SERVER_URL
```
- Python 3.8+
- Crawlab instance running and accessible
- API token from Crawlab
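The token is the same one Crawlab's own REST API expects, so you can sanity-check it before setting anything else up. A quick sketch with `requests`; the `Authorization` header format and the `/spiders` endpoint are assumptions based on a typical Crawlab deployment:

```python
import os

import requests

# The same values you will put in .env in the setup steps below.
base_url = os.environ["CRAWLAB_API_BASE_URL"]  # e.g. http://your-crawlab-instance:8080/api
token = os.environ["CRAWLAB_API_TOKEN"]

# Crawlab expects the raw token in the Authorization header (assumption).
resp = requests.get(f"{base_url}/spiders", headers={"Authorization": token})
resp.raise_for_status()
print(resp.json())
```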
1. Copy the `.env.example` file to `.env`:

   ```bash
   cp .env.example .env
   ```

2. Edit the `.env` file with your Crawlab API details:

   ```
   CRAWLAB_API_BASE_URL=http://your-crawlab-instance:8080/api
   CRAWLAB_API_TOKEN=your_api_token_here
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Run the server:

   ```bash
   python server.py
   ```
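The server reads these values from the environment at startup. If you want to reuse the same `.env` in your own scripts, here is a minimal sketch assuming `python-dotenv` is installed (it is not guaranteed to be in `requirements.txt`):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is available

load_dotenv()  # reads .env from the current working directory

print(os.getenv("CRAWLAB_API_BASE_URL"))
print(os.getenv("CRAWLAB_API_TOKEN") is not None)  # avoid echoing the secret
```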
1. Build the Docker image:

   ```bash
   docker build -t crawlab-mcp-server .
   ```

2. Run the container:

   ```bash
   docker run -p 8000:8000 --env-file .env crawlab-mcp-server
   ```
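You can confirm the container started and watch its output with standard Docker commands (the filter matches the image tag used above):

```bash
docker ps --filter ancestor=crawlab-mcp-server
docker logs -f $(docker ps -q --filter ancestor=crawlab-mcp-server)
```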
To add the MCP server to your existing Crawlab Docker Compose setup, add the following service to your `docker-compose.yml`:

```yaml
services:
  # ... existing Crawlab services

  mcp-server:
    build: ./backend/mcp-server
    ports:
      - "8000:8000"
    environment:
      - CRAWLAB_API_BASE_URL=http://backend:8000/api
      - CRAWLAB_API_TOKEN=your_api_token_here
    depends_on:
      - backend
```
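Then build and start just the new service, and tail its logs to confirm it can reach the backend:

```bash
docker compose up -d --build mcp-server
docker compose logs -f mcp-server
```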
The MCP server enables AI applications to interact with Crawlab through natural language. Following the architecture diagram above, here's how to use the MCP system:
- Start the MCP Server: Make sure your MCP server is running and accessible
- Configure the AI Client: Connect your AI application to the MCP server. For example, in Claude Desktop:
  - Open Claude Desktop
  - Go to Settings > MCP Servers
  - Add a new server with the URL of your MCP server (e.g., `http://localhost:8000`)
- In a conversation with Claude, you can now use Crawlab functionality by describing what you want to do in natural language
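If you are not using Claude Desktop, the CLI client from the installation section above can connect to the same server:

```bash
crawlab_mcp-mcp client http://localhost:8000
```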
Based on our architecture, here are example interactions with the system:
Create a Spider:

```
User: "Create a new spider named 'Product Scraper' for the e-commerce project"
  ↓
LLM identifies intent and calls the create_spider tool
  ↓
MCP Server executes the API call to Crawlab
  ↓
Spider is created and details are returned to the user
```

Run a Task:

```
User: "Run the 'Product Scraper' spider on all available nodes"
  ↓
LLM calls the run_spider tool with appropriate parameters
  ↓
MCP Server sends the command to Crawlab API
  ↓
Task is started and confirmation is returned to the user
```
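Under the hood, the second example reduces to a single tool call. Here is a minimal sketch of that call with the `mcp` Python SDK; the SSE endpoint path and the `spider_id` parameter name are assumptions, so check the schema returned by `list_tools` for the real signature:

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def run_product_scraper():
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # The programmatic equivalent of the LLM's tool call above.
            # "spider_id" is a hypothetical parameter name for illustration.
            result = await session.call_tool(
                "run_spider", {"spider_id": "<your-spider-id>"}
            )
            print(result.content)


asyncio.run(run_product_scraper())
```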
You can interact with the system using natural language commands like:
- "List all my spiders"
- "Create a new spider with these specifications..."
- "Show me the code for the spider named X"
- "Update the file main.py in spider X with this code..."
- "Run spider X and notify me when it's complete"
- "Show me the results of the last run of spider X"
These are the underlying tools that power the natural language interactions:
Resources:

- `spiders`: List all spiders
- `tasks`: List all tasks

Spider management:

- `get_spider`: Get details of a specific spider
- `create_spider`: Create a new spider
- `update_spider`: Update an existing spider
- `delete_spider`: Delete a spider

Task management:

- `get_task`: Get details of a specific task
- `run_spider`: Run a spider
- `cancel_task`: Cancel a running task
- `restart_task`: Restart a task
- `get_task_logs`: Get logs for a task

File management:

- `get_spider_files`: List files for a spider
- `get_spider_file`: Get content of a specific file
- `save_spider_file`: Save content to a file
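For orientation, this is roughly how one of these tools maps to a Crawlab API request on the server side. A sketch using the MCP SDK's `FastMCP` helper and `httpx`, not the actual implementation; the Crawlab endpoint and `Authorization` header format are assumptions:

```python
import os

import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crawlab")

BASE_URL = os.environ["CRAWLAB_API_BASE_URL"]
TOKEN = os.environ["CRAWLAB_API_TOKEN"]


@mcp.tool()
def get_spider(spider_id: str) -> dict:
    """Get details of a specific spider from the Crawlab API."""
    resp = httpx.get(
        f"{BASE_URL}/spiders/{spider_id}",
        headers={"Authorization": TOKEN},  # header format is an assumption
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    mcp.run(transport="sse")
```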