MCPMark: Stress-Testing Comprehensive MCP Use

MCP servers are shaping the future of software. MCPMark is a comprehensive stress-testing benchmark: a collection of diverse, verifiable tasks designed to evaluate model and agent capabilities in real-world MCP use.

A project initiated by EVAL SYS

NUS TRAIL × LobeHub

Figure: Average task resolution success rate for top and select models on MCPMark's dataset of 127 tasks.

MCPMark tasks


Playwright · Reddit

Create a sports analytics account, collect NBA player statistics from forum discussions, analyze basketball performance metrics, and compile a comprehensive statistical report with community insights.

User Interaction, Data Extraction, Comparative Analysis, Content Submission
Created by Fanqing Meng
2025-08-12
Filesystem · Desktop Template

Extract contact details from files in various formats on the desktop, then reason over the collected relationship data.

Data Extraction, Cross-Referencing
Created by Lingjun Chen
2025-08-14
Playwright · Eval Web

Navigate websites protected by Cloudflare Turnstile, handle security challenges, bypass bot-detection mechanisms, and access the protected content through automated browser interactions.

User Interaction
Created by Allison Zhan
2025-07-27
Notion · Toronto Guide

Navigate to the Toronto Guide page and change all pink-colored elements to different colors.

Visual Formatting, Conditional Filtering
Created by Xiangyan Liu
2025-08-14
Postgres · Lego

Create a PostgreSQL function that handles inventory part transfers between LEGO sets, with validation and audit logging.

Transactional Operations, Stored Procedures and Functions, Audit and Compliance
Created by Jiawei Wang
2025-08-16
GitHub · MCPMark CI/CD

Set up an ESLint workflow that enforces code quality on all pull requests, with proper CI integration.

CI/CD Automation, PR Workflows
Created by Zijian Wu
2025-08-15
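The LEGO inventory-transfer task above combines three ideas: input validation, an atomic multi-row update, and an audit trail. The sketch below illustrates that pattern only. It is not MCPMark's reference solution, and it does not reflect the benchmark's actual database schema. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The tables `inventory_parts` and `transfer_audit` and the function `transfer_parts` are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical, simplified schema standing in for a LEGO inventory database.
SCHEMA = """
CREATE TABLE inventory_parts (
    set_num  TEXT NOT NULL,
    part_num TEXT NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity >= 0),
    PRIMARY KEY (set_num, part_num)
);
CREATE TABLE transfer_audit (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    from_set TEXT NOT NULL,
    to_set   TEXT NOT NULL,
    part_num TEXT NOT NULL,
    quantity INTEGER NOT NULL,
    status   TEXT NOT NULL,  -- 'success' or 'rejected'
    detail   TEXT            -- rejection reason, NULL on success
);
"""


def transfer_parts(conn, from_set, to_set, part_num, quantity):
    """Move `quantity` of `part_num` from `from_set` to `to_set` (hypothetical helper).

    All validation happens before any write. The decrement, the increment and
    the success audit row share one transaction. Rejected requests raise
    ValueError and are logged in a separate transaction.
    """
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            if quantity <= 0:
                raise ValueError("quantity must be positive")
            if from_set == to_set:
                raise ValueError("source and target sets must differ")
            row = conn.execute(
                "SELECT quantity FROM inventory_parts WHERE set_num = ? AND part_num = ?",
                (from_set, part_num),
            ).fetchone()
            if row is None or row[0] < quantity:
                raise ValueError("insufficient quantity in source set")
            conn.execute(
                "UPDATE inventory_parts SET quantity = quantity - ? "
                "WHERE set_num = ? AND part_num = ?",
                (quantity, from_set, part_num),
            )
            # Upsert: create the target row or add to its existing quantity.
            conn.execute(
                "INSERT INTO inventory_parts (set_num, part_num, quantity) VALUES (?, ?, ?) "
                "ON CONFLICT (set_num, part_num) "
                "DO UPDATE SET quantity = quantity + excluded.quantity",
                (to_set, part_num, quantity),
            )
            conn.execute(
                "INSERT INTO transfer_audit (from_set, to_set, part_num, quantity, status) "
                "VALUES (?, ?, ?, ?, 'success')",
                (from_set, to_set, part_num, quantity),
            )
    except ValueError as exc:
        # Nothing was written for the rejected request; record the attempt.
        with conn:
            conn.execute(
                "INSERT INTO transfer_audit "
                "(from_set, to_set, part_num, quantity, status, detail) "
                "VALUES (?, ?, ?, ?, 'rejected', ?)",
                (from_set, to_set, part_num, quantity, str(exc)),
            )
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:
        conn.execute("INSERT INTO inventory_parts VALUES ('set-A', 'brick-2x4', 10)")
    transfer_parts(conn, "set-A", "set-B", "brick-2x4", 4)
    print(conn.execute("SELECT * FROM inventory_parts ORDER BY set_num").fetchall())
    # prints [('set-A', 'brick-2x4', 6), ('set-B', 'brick-2x4', 4)]
```

In PostgreSQL, the same steps would more naturally be written as a PL/pgSQL function. The `sqlite3` version is used here only because it runs without a database server. The `ON CONFLICT ... DO UPDATE` upsert requires SQLite 3.24 or later.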