Extraction Table

PlaywrightEval Web

Extract structured data from complex web tables, parse multi-level headers, handle dynamic content loading, transform data formats, and export comprehensive datasets.

Created by Arvin Xu

2025-08-18

Data Extraction

Model Ranking

Click on the dots to view the trajectory of each task run

Model	Run Results	Pass@4	Pass^4	Avg Time	Avg Turns	Input Tokens	Output Tokens	Total Tokens
Model	Run Results	Pass@4	Pass^4	Avg Time	Avg Turns	Input Tokens	Output Tokens	Total Tokens
gpt-5-high	4 /4			1695.1s	28.5	1,148,807	45,328	1,194,135
gpt-5-low	4 /4			485.0s	20.3	652,775	21,704	674,479
gpt-5-medium	4 /4			646.5s	24.0	735,790	24,495	760,285
glm-4-5	3 /4			186.1s	9.5	140,267	4,488	144,756
o3	1 /4			97.7s	10.3	134,942	5,502	140,443
qwen-3-coder-plus	1 /4			97.4s	16.5	386,393	2,579	388,972
claude-opus-4-1	0 /1	-	-	151.3s	4.0	33,113	2,322	35,435
claude-sonnet-4	0 /4			108.2s	6.3	81,135	4,716	85,850
claude-sonnet-4-high	0 /4			109.7s	7.8	108,713	5,164	113,876
claude-sonnet-4-low	0 /4			112.3s	7.0	94,946	5,501	100,447
deepseek-chat	0 /4			33.7s	3.0	15,364	258	15,622
gemini-2-5-flash	0 /4			23.6s	4.0	30,500	2,988	33,488
gemini-2-5-pro	0 /4			130.9s	13.3	388,505	7,930	396,435
gpt-4-1	0 /4			22.2s	6.3	48,455	800	49,255
gpt-4-1-mini	0 /4			13.2s	3.3	13,144	393	13,536
gpt-4-1-nano	0 /4			17.3s	7.0	43,488	840	44,328
gpt-5-mini-high	0 /4			21.3s	2.0	5,632	2,232	7,864
gpt-5-mini-low	0 /4			11.0s	2.0	5,568	657	6,225
gpt-5-mini-medium	0 /4			15.5s	2.0	5,632	1,448	7,080
gpt-5-nano-high	0 /4			118.3s	3.0	18,263	23,766	42,029
gpt-5-nano-low	0 /4			17.2s	2.0	5,568	1,771	7,339
gpt-5-nano-medium	0 /4			36.5s	2.5	8,633	7,421	16,053
gpt-oss-120b	0 /4			6.8s	2.0	7,157	485	7,642
grok-4	0 /4			184.8s	9.5	139,984	8,514	148,497
grok-code-fast-1	0 /4			301.9s	5.3	58,726	8,013	66,739
kimi-k2-0711	0 /4			157.1s	8.5	138,475	3,745	142,220
kimi-k2-0905	0 /4			47.2s	3.3	17,648	544	18,192
o4-mini	0 /4			95.8s	6.0	62,237	7,381	69,618
qwen-3-max	0 /4			49.9s	7.3	175,382	1,180	176,562

Task State

eval-web.mcpmark.ai

view this website to see the original task state

Instruction

Web Data Extraction Task

Use Playwright MCP tools to extract all data from the specified website and present it in CSV format.

Requirements:

Navigate to https://eval-web.mcpmark.ai/extraction
Wait for the page to fully load
Extract all data content from the page, including:
- Title
- Rating
- Likes
- Views
- Replies
Organize the extracted data into CSV format
Ensure data completeness and accuracy
Output ONLY the complete CSV formatted data (no additional text or explanations)

CSV Data Example:

CSV

Title, Rating, Likes, Views, Replies
SEO Optimization, "4.6", 756, 10123, 72
Vue 3 Composition API, "4.5", 743, 9876, 67
Advanced TypeScript Types Guide, "4.9", 924, 15432, 102
Node.js Performance Optimization, "4.2", 567, 8765, 45
Frontend Engineering Best Practices, "4.7", 812, 11234, 78

Notes:

Ensure extraction of all visible data rows
Maintain data format consistency
All numeric data (Rating, Likes, Views, Replies) should NOT have quotes, only text data containing commas should be wrapped in quotes
Wait for the page to fully load before starting data extraction
Verify the quantity and format of extracted data are correct
IMPORTANT: Final output must contain ONLY CSV data - no explanatory text, descriptions, or other content

Verify

Python

#!/usr/bin/env python3
"""
Verification script for checking Playwright web data extraction tasks.

This script verifies whether the model successfully extracted CSV format data from web pages
by checking the last assistant message in messages.json.
"""

import sys
import json
import os
import re
import csv
from io import StringIO

# Expected CSV header (must match exactly, including spaces)
EXPECTED_HEADER_LINE = "Title, Rating, Likes, Views, Replies"
EXPECTED_HEADERS = ["Title", "Rating", "Likes", "Views", "Replies"]
# Exact number of data rows (must match data.csv exactly)
EXPECTED_DATA_ROWS = 97


def get_model_response():
    """
    Get the model's response from the MCP_MESSAGES environment variable.
    Returns the last assistant message text.
    """
    messages_path = os.getenv("MCP_MESSAGES")
    print(f"| MCP_MESSAGES: {messages_path}")
    if not messages_path:
        print("| Warning: MCP_MESSAGES environment variable not set", file=sys.stderr)
        return None

    try:
        with open(messages_path, 'r') as f:
            messages = json.load(f)

        # Find the last assistant message with status completed
        for message in reversed(messages):
            if (message.get('role') == 'assistant' and
                message.get('status') == 'completed' and
                message.get('type') == 'message'):
                content = message.get('content', [])
                # Extract text from content
                if isinstance(content, list):
                    for item in content:
                        if isinstance(item, dict) and item.get('type') in ['text', 'output_text']:
                            return item.get('text', '')
                elif isinstance(content, str):
                    return content

        print("| Warning: No completed assistant message found", file=sys.stderr)
        return None
    except Exception as e:
        print(f"| Error reading messages file: {str(e)}", file=sys.stderr)
        return None


def extract_csv_from_response(response):
    """
    Extract CSV data from model response.
    """
    # Look for CSV code blocks
    csv_pattern = r'```(?:csv)?\s*\n(.*?)\n```'
    matches = re.findall(csv_pattern, response, re.DOTALL | re.IGNORECASE)

    if matches:
        return matches[-1].strip()  # Return the last CSV block

    # If no code block found, try to find CSV data starting with header
    lines = response.split('\n')
    csv_start = -1

    # Stricter header matching: look for lines containing "Title" and "Rating"
    for i, line in enumerate(lines):
        if "Title" in line and "Rating" in line and "Likes" in line:
            csv_start = i
            break

    if csv_start >= 0:
        # Extract from header until empty line or non-CSV format line
        csv_lines = []
        for line in lines[csv_start:]:
            line = line.strip()
            if not line or not (',' in line):
                if csv_lines:  # If we already have data, stop at empty line
                    break
                continue
            csv_lines.append(line)
            if len(csv_lines) > 100:  # Prevent extracting too many rows
                break

        return '\n'.join(csv_lines)

    return None


def validate_csv_data(csv_text):
    """
    Validate CSV data format and content, must match data.csv exactly.
    """
    if not csv_text:
        return False, "CSV data not found"

    try:
        lines = csv_text.strip().split('\n')

        # Check total number of rows (1 header row + data rows)
        expected_total_rows = EXPECTED_DATA_ROWS + 1
        if len(lines) != expected_total_rows:
            return False, f"| CSV total row count mismatch, expected: {expected_total_rows} rows, actual: {len(lines)} rows"

        # Check header row format (must match exactly)
        header_line = lines[0].strip()
        if header_line != EXPECTED_HEADER_LINE:
            return False, f"| Header format mismatch, expected: '{EXPECTED_HEADER_LINE}', actual: '{header_line}'"

        # Parse CSV to validate structure
        csv_reader = csv.reader(StringIO(csv_text))
        rows = list(csv_reader)

        # Check column count for each row
        expected_columns = len(EXPECTED_HEADERS)
        for i, row in enumerate(rows):
            if len(row) != expected_columns:
                return False, f"| Row {i+1} column count incorrect, expected: {expected_columns} columns, actual: {len(row)} columns"

        # Validate data row format
        valid_rows = 0
        for i, row in enumerate(rows[1:], 2):  # Skip header, start from row 2
            # Check if each column has data
            if not all(cell.strip() for cell in row):
                return False, f"| Row {i} contains empty data"

            # Check numeric column format (Rating, Likes, Views, Replies should not have quotes)
            for col_idx, col_name in [(1, "Rating"), (2, "Likes"), (3, "Views"), (4, "Replies")]:
                value = row[col_idx].strip()

                # Check for quotes (should not have any)
                if value.startswith('"') and value.endswith('"'):
                    return False, f"| Row {i} {col_name} should not have quotes, actual: {value}"

                # Check numeric format
                if col_name == "Rating":
                    try:
                        float(value)
                    except ValueError:
                        return False, f"| Row {i} {col_name} should be a number, actual: {value}"
                else:
                    if not value.isdigit():
                        return False, f"| Row {i} {col_name} should be pure digits, actual: {value}"

            valid_rows += 1

        # Validate number of data rows
        if valid_rows != EXPECTED_DATA_ROWS:
            return False, f"| Valid data row count mismatch, expected: {EXPECTED_DATA_ROWS} rows, actual: {valid_rows} rows"

        return True, f"| CSV validation successful: format matches data.csv exactly, {valid_rows} valid data rows"

    except Exception as e:
        return False, f"| CSV format parsing error: {str(e)}"


def verify():
    """
    Verify if the model's response contains correct CSV data extraction results.
    """
    # Get model response
    model_response = get_model_response()

    if not model_response:
        print("| Model response not found", file=sys.stderr)
        return False

    print(f"|\n| Model response (first 500 characters): {model_response[:500]}...", file=sys.stderr)

    # Extract CSV data from response
    csv_data = extract_csv_from_response(model_response)

    if not csv_data:
        print("|\n| ✗ CSV data not found in response", file=sys.stderr)
        return False

    print(f"|\n| Found CSV data (first 300 characters):\n| {csv_data[:300]}...", file=sys.stderr)

    # Validate CSV data
    is_valid, message = validate_csv_data(csv_data)

    if is_valid:
        print(f"|\n| ✓ {message}", file=sys.stderr)
        return True
    else:
        print(f"|\n| ✗ CSV validation failed: {message}", file=sys.stderr)
        return False


def main():
    """
    Executes the verification process and exits with a status code.
    """
    result = verify()
    sys.exit(0 if result else 1)


if __name__ == "__main__":
    main()