Extraction Table

L3 · Model Context Protocol · Playwright · Eval Web

Extract structured data from complex web tables, parse multi-level headers, handle dynamic content loading, transform data formats, and export comprehensive datasets.

Created by Arvin Xu
2025-08-18
Data Extraction

Model Ranking

| Provider | Model | Run Results | Avg Time | Avg Turns | Input Tokens | Output Tokens | Total Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OpenAI | gpt-5-high | 4/4 | 1695.1s | 28.5 | 1,148,807 | 45,328 | 1,194,135 |
| OpenAI | gpt-5-low | 4/4 | 485.0s | 20.3 | 652,775 | 21,704 | 674,479 |
| OpenAI | gpt-5-medium | 4/4 | 646.5s | 24.0 | 735,790 | 24,495 | 760,285 |
| Z.ai | glm-4-5 | 3/4 | 186.1s | 9.5 | 140,267 | 4,488 | 144,756 |
| OpenAI | o3 | 1/4 | 97.7s | 10.3 | 134,942 | 5,502 | 140,443 |
| Qwen | qwen-3-coder-plus | 1/4 | 97.4s | 16.5 | 386,393 | 2,579 | 388,972 |
| Claude | claude-opus-4-1 | 0/1 | 151.3s | 4.0 | 33,113 | 2,322 | 35,435 |
| Claude | claude-sonnet-4 | 0/4 | 108.2s | 6.3 | 81,135 | 4,716 | 85,850 |
| Claude | claude-sonnet-4-high | 0/4 | 109.7s | 7.8 | 108,713 | 5,164 | 113,876 |
| Claude | claude-sonnet-4-low | 0/4 | 112.3s | 7.0 | 94,946 | 5,501 | 100,447 |
| DeepSeek | deepseek-chat | 0/4 | 33.7s | 3.0 | 15,364 | 258 | 15,622 |
| Gemini | gemini-2-5-flash | 0/4 | 23.6s | 4.0 | 30,500 | 2,988 | 33,488 |
| Gemini | gemini-2-5-pro | 0/4 | 130.9s | 13.3 | 388,505 | 7,930 | 396,435 |
| OpenAI | gpt-4-1 | 0/4 | 22.2s | 6.3 | 48,455 | 800 | 49,255 |
| OpenAI | gpt-4-1-mini | 0/4 | 13.2s | 3.3 | 13,144 | 393 | 13,536 |
| OpenAI | gpt-4-1-nano | 0/4 | 17.3s | 7.0 | 43,488 | 840 | 44,328 |
| OpenAI | gpt-5-mini-high | 0/4 | 21.3s | 2.0 | 5,632 | 2,232 | 7,864 |
| OpenAI | gpt-5-mini-low | 0/4 | 11.0s | 2.0 | 5,568 | 657 | 6,225 |
| OpenAI | gpt-5-mini-medium | 0/4 | 15.5s | 2.0 | 5,632 | 1,448 | 7,080 |
| OpenAI | gpt-5-nano-high | 0/4 | 118.3s | 3.0 | 18,263 | 23,766 | 42,029 |
| OpenAI | gpt-5-nano-low | 0/4 | 17.2s | 2.0 | 5,568 | 1,771 | 7,339 |
| OpenAI | gpt-5-nano-medium | 0/4 | 36.5s | 2.5 | 8,633 | 7,421 | 16,053 |
| OpenAI | gpt-oss-120b | 0/4 | 6.8s | 2.0 | 7,157 | 485 | 7,642 |
| Grok | grok-4 | 0/4 | 184.8s | 9.5 | 139,984 | 8,514 | 148,497 |
| Grok | grok-code-fast-1 | 0/4 | 301.9s | 5.3 | 58,726 | 8,013 | 66,739 |
| MoonshotAI | kimi-k2-0711 | 0/4 | 157.1s | 8.5 | 138,475 | 3,745 | 142,220 |
| MoonshotAI | kimi-k2-0905 | 0/4 | 47.2s | 3.3 | 17,648 | 544 | 18,192 |
| OpenAI | o4-mini | 0/4 | 95.8s | 6.0 | 62,237 | 7,381 | 69,618 |
| Qwen | qwen-3-max | 0/4 | 49.9s | 7.3 | 175,382 | 1,180 | 176,562 |

Instruction

Web Data Extraction Task

Use Playwright MCP tools to extract all data from the specified website and present it in CSV format.

Requirements:

  1. Navigate to https://eval-web.mcpmark.ai/extraction
  2. Wait for the page to fully load
  3. Extract all data content from the page, including:
    • Title
    • Rating
    • Likes
    • Views
    • Replies
  4. Organize the extracted data into CSV format
  5. Ensure data completeness and accuracy
  6. Output ONLY the complete CSV formatted data (no additional text or explanations)

CSV Data Example:

CSV
Title, Rating, Likes, Views, Replies
SEO Optimization, 4.6, 756, 10123, 72
Vue 3 Composition API, 4.5, 743, 9876, 67
Advanced TypeScript Types Guide, 4.9, 924, 15432, 102
Node.js Performance Optimization, 4.2, 567, 8765, 45
Frontend Engineering Best Practices, 4.7, 812, 11234, 78

Notes:

  • Ensure extraction of all visible data rows
  • Maintain data format consistency
  • All numeric data (Rating, Likes, Views, Replies) should NOT have quotes; only text data containing commas should be wrapped in quotes
  • Wait for the page to fully load before starting data extraction
  • Verify the quantity and format of extracted data are correct
  • IMPORTANT: Final output must contain ONLY CSV data - no explanatory text, descriptions, or other content
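The quoting rule in the notes above (numeric fields stay bare; text fields are quoted only when they contain a comma) can be sketched in Python. The `format_field`/`to_csv` helpers and the row data are illustrative, not part of the task:

```python
def format_field(value):
    """Quote a field only if it is text that contains a comma."""
    text = str(value)
    if not isinstance(value, str):
        return text          # numeric data: never quoted
    if "," in text:
        return f'"{text}"'   # text containing commas: wrap in quotes
    return text


def to_csv(rows, header=("Title", "Rating", "Likes", "Views", "Replies")):
    """Join rows into the comma-space-separated format shown in the example."""
    lines = [", ".join(header)]
    for row in rows:
        lines.append(", ".join(format_field(v) for v in row))
    return "\n".join(lines)


# Illustrative rows; the second title contains commas and so gets quoted.
rows = [
    ("SEO Optimization", 4.6, 756, 10123, 72),
    ("HTML, CSS, and JS Basics", 4.5, 743, 9876, 67),
]
print(to_csv(rows))
```

This yields one unquoted row and one row whose title is wrapped in quotes, matching both the example and the quoting note.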


Verify

*.py
Python
#!/usr/bin/env python3
"""
Verification script for checking Playwright web data extraction tasks.

This script verifies whether the model successfully extracted CSV format data from web pages
by checking the last assistant message in messages.json.
"""

import sys
import json
import os
import re
import csv
from io import StringIO

# Expected CSV header (must match exactly, including spaces)
EXPECTED_HEADER_LINE = "Title, Rating, Likes, Views, Replies"
EXPECTED_HEADERS = ["Title", "Rating", "Likes", "Views", "Replies"]
# Exact number of data rows (must match data.csv exactly)
EXPECTED_DATA_ROWS = 97


def get_model_response():
    """
    Get the model's response from the MCP_MESSAGES environment variable.
    Returns the last assistant message text.
    """
    messages_path = os.getenv("MCP_MESSAGES")
    print(f"| MCP_MESSAGES: {messages_path}", file=sys.stderr)
    if not messages_path:
        print("| Warning: MCP_MESSAGES environment variable not set", file=sys.stderr)
        return None

    try:
        with open(messages_path, 'r') as f:
            messages = json.load(f)

        # Find the last assistant message with status completed
        for message in reversed(messages):
            if (message.get('role') == 'assistant' and
                message.get('status') == 'completed' and
                message.get('type') == 'message'):
                content = message.get('content', [])
                # Extract text from content
                if isinstance(content, list):
                    for item in content:
                        if isinstance(item, dict) and item.get('type') in ['text', 'output_text']:
                            return item.get('text', '')
                elif isinstance(content, str):
                    return content

        print("| Warning: No completed assistant message found", file=sys.stderr)
        return None
    except Exception as e:
        print(f"| Error reading messages file: {str(e)}", file=sys.stderr)
        return None


def extract_csv_from_response(response):
    """
    Extract CSV data from model response.
    """
    # Look for CSV code blocks
    csv_pattern = r'```(?:csv)?\s*\n(.*?)\n```'
    matches = re.findall(csv_pattern, response, re.DOTALL | re.IGNORECASE)

    if matches:
        return matches[-1].strip()  # Return the last CSV block

    # If no code block found, try to find CSV data starting with header
    lines = response.split('\n')
    csv_start = -1

    # Stricter header matching: look for lines containing "Title" and "Rating"
    for i, line in enumerate(lines):
        if "Title" in line and "Rating" in line and "Likes" in line:
            csv_start = i
            break

    if csv_start >= 0:
        # Extract from header until empty line or non-CSV format line
        csv_lines = []
        for line in lines[csv_start:]:
            line = line.strip()
            if not line or ',' not in line:
                if csv_lines:  # If we already have data, stop at empty line
                    break
                continue
            csv_lines.append(line)
            if len(csv_lines) > 100:  # Prevent extracting too many rows
                break

        return '\n'.join(csv_lines)

    return None


def validate_csv_data(csv_text):
    """
    Validate CSV data format and content, must match data.csv exactly.
    """
    if not csv_text:
        return False, "CSV data not found"

    try:
        lines = csv_text.strip().split('\n')

        # Check total number of rows (1 header row + data rows)
        expected_total_rows = EXPECTED_DATA_ROWS + 1
        if len(lines) != expected_total_rows:
            return False, f"CSV total row count mismatch, expected: {expected_total_rows} rows, actual: {len(lines)} rows"

        # Check header row format (must match exactly)
        header_line = lines[0].strip()
        if header_line != EXPECTED_HEADER_LINE:
            return False, f"Header format mismatch, expected: '{EXPECTED_HEADER_LINE}', actual: '{header_line}'"

        # Parse CSV to validate structure
        csv_reader = csv.reader(StringIO(csv_text))
        rows = list(csv_reader)

        # Check column count for each row
        expected_columns = len(EXPECTED_HEADERS)
        for i, row in enumerate(rows):
            if len(row) != expected_columns:
                return False, f"Row {i+1} column count incorrect, expected: {expected_columns} columns, actual: {len(row)} columns"

        # Validate data row format
        valid_rows = 0
        for i, row in enumerate(rows[1:], 2):  # Skip header, start from row 2
            # Check if each column has data
            if not all(cell.strip() for cell in row):
                return False, f"Row {i} contains empty data"

            # Check numeric column format (Rating, Likes, Views, Replies should not have quotes)
            for col_idx, col_name in [(1, "Rating"), (2, "Likes"), (3, "Views"), (4, "Replies")]:
                value = row[col_idx].strip()

                # Check for quotes (should not have any)
                if value.startswith('"') and value.endswith('"'):
                    return False, f"Row {i} {col_name} should not have quotes, actual: {value}"

                # Check numeric format
                if col_name == "Rating":
                    try:
                        float(value)
                    except ValueError:
                        return False, f"Row {i} {col_name} should be a number, actual: {value}"
                else:
                    if not value.isdigit():
                        return False, f"Row {i} {col_name} should be pure digits, actual: {value}"

            valid_rows += 1

        # Validate number of data rows
        if valid_rows != EXPECTED_DATA_ROWS:
            return False, f"Valid data row count mismatch, expected: {EXPECTED_DATA_ROWS} rows, actual: {valid_rows} rows"

        return True, f"CSV validation successful: format matches data.csv exactly, {valid_rows} valid data rows"

    except Exception as e:
        return False, f"CSV format parsing error: {str(e)}"


def verify():
    """
    Verify if the model's response contains correct CSV data extraction results.
    """
    # Get model response
    model_response = get_model_response()

    if not model_response:
        print("| Model response not found", file=sys.stderr)
        return False

    print(f"|\n| Model response (first 500 characters): {model_response[:500]}...", file=sys.stderr)

    # Extract CSV data from response
    csv_data = extract_csv_from_response(model_response)

    if not csv_data:
        print("|\n| ✗ CSV data not found in response", file=sys.stderr)
        return False

    print(f"|\n| Found CSV data (first 300 characters):\n| {csv_data[:300]}...", file=sys.stderr)

    # Validate CSV data
    is_valid, message = validate_csv_data(csv_data)

    if is_valid:
        print(f"|\n| ✓ {message}", file=sys.stderr)
        return True
    else:
        print(f"|\n| ✗ CSV validation failed: {message}", file=sys.stderr)
        return False


def main():
    """
    Executes the verification process and exits with a status code.
    """
    result = verify()
    sys.exit(0 if result else 1)


if __name__ == "__main__":
    main()