Label Color Standardization

L3
ModelContextProtocol, GitHub, Claude Code

Standardize label colors from default gray to a comprehensive color scheme for better visual organization and issue triage.

Created by Zijian Wu
2025-08-15
Issue Management, Workflow Automation

Model Ranking

| Provider | Model | Run Results | Pass@4 | Pass^4 | Avg Time | Avg Turns | Input Tokens | Output Tokens | Total Tokens |
|----------|-------|-------------|--------|--------|----------|-----------|--------------|---------------|--------------|
| OpenAI | gpt-5-high | 3/4 | | | 725.5s | 11.5 | 640,236 | 26,273 | 666,509 |
| OpenAI | gpt-5-medium | 3/4 | | | 450.4s | 11.8 | 728,130 | 18,883 | 747,013 |
| DeepSeek | deepseek-v3-2-thinking | 2/4 | | | 442.5s | 34.5 | 2,243,450 | 8,188 | 2,251,637 |
| Gemini | gemini-3-pro-low | 2/4 | | | 127.9s | 13.0 | 1,011,140 | 5,473 | 1,016,613 |
| OpenAI | gpt-5-mini-medium | 2/4 | | | 172.2s | 16.8 | 1,362,541 | 8,866 | 1,371,407 |
| Claude | claude-opus-4-5-high | 1/4 | | | 167.0s | 9.3 | 690,649 | 6,066 | 696,716 |
| Claude | claude-sonnet-4-5 | 1/4 | | | 177.3s | 12.8 | 1,190,577 | 5,100 | 1,195,677 |
| Claude | claude-sonnet-4-high | 1/4 | | | 159.5s | 11.3 | 914,485 | 4,093 | 918,578 |
| Claude | claude-sonnet-4-low | 1/4 | | | 161.3s | 10.8 | 764,372 | 4,128 | 768,500 |
| DeepSeek | deepseek-v3-2-chat | 1/4 | | | 233.1s | 15.3 | 751,512 | 4,507 | 756,019 |
| Gemini | gemini-3-pro-high | 1/4 | | | 104.4s | 9.8 | 488,461 | 4,233 | 492,694 |
| OpenAI | gpt-5-mini-high | 1/4 | | | 427.8s | 16.5 | 1,893,422 | 29,915 | 1,923,337 |
| Claude | claude-opus-4-1 | 0/1 | -- | | 337.8s | 13.0 | 651,976 | 3,885 | 655,861 |
| Claude | claude-sonnet-4 | 0/4 | | | 190.5s | 12.3 | 1,062,279 | 4,634 | 1,066,913 |
| DeepSeek | deepseek-chat | 0/4 | | | 177.9s | 8.5 | 472,677 | 1,704 | 474,381 |
| DeepSeek | deepseek-v3-1-terminus | 0/4 | | | 125.0s | 6.5 | 187,007 | 1,308 | 188,315 |
| DeepSeek | deepseek-v3-1-terminus-thinking | 0/4 | | | 428.5s | 2.3 | 107,949 | 9,523 | 117,472 |
| Gemini | gemini-2-5-flash | 0/4 | | | 657.6s | 15.8 | 4,405,352 | 8,004 | 4,413,356 |
| Gemini | gemini-2-5-pro | 0/4 | | | 98.1s | 8.0 | 302,850 | 6,415 | 309,265 |
| Z.ai | glm-4-5 | 0/4 | | | 149.8s | 9.0 | 447,554 | 4,118 | 451,671 |
| OpenAI | gpt-4-1 | 0/4 | | | 452.1s | 26.5 | 2,402,979 | 1,928 | 2,404,906 |
| OpenAI | gpt-4-1-mini | 0/4 | | | 143.4s | 25.8 | 1,980,181 | 2,598 | 1,982,779 |
| OpenAI | gpt-4-1-nano | 0/4 | | | 37.3s | 7.8 | 309,046 | 901 | 309,947 |
| OpenAI | gpt-5-low | 0/4 | | | 310.1s | 11.8 | 598,326 | 14,163 | 612,489 |
| OpenAI | gpt-5-mini-low | 0/4 | | | 66.4s | 9.0 | 427,263 | 2,273 | 429,535 |
| OpenAI | gpt-5-nano-high | 0/4 | | | 213.6s | 11.8 | 564,364 | 35,655 | 600,019 |
| OpenAI | gpt-5-nano-low | 0/4 | | | 54.1s | 9.5 | 171,711 | 3,242 | 174,953 |
| OpenAI | gpt-5-nano-medium | 0/4 | | | 162.5s | 13.3 | 1,211,367 | 19,104 | 1,230,471 |
| OpenAI | gpt-oss-120b | 0/4 | | | 9.3s | 2.3 | 19,446 | 658 | 20,104 |
| Grok | grok-4 | 0/4 | | | 365.1s | 18.8 | 1,082,077 | 1,746 | 1,091,706 |
| Grok | grok-4-fast | 0/4 | | | 168.1s | 10.5 | 589,141 | 12,055 | 601,196 |
| Grok | grok-code-fast-1 | 0/4 | | | 235.3s | 13.8 | 1,007,379 | 6,225 | 1,013,604 |
| MoonshotAI | kimi-k2-0711 | 0/4 | | | 367.9s | 6.5 | 373,829 | 1,426 | 375,255 |
| MoonshotAI | kimi-k2-0905 | 0/4 | | | 571.4s | 15.5 | 688,316 | 6,439 | 694,755 |
| OpenAI | o3 | 0/4 | | | 84.9s | 9.0 | 220,935 | 3,295 | 224,229 |
| OpenAI | o4-mini | 0/4 | | | 187.4s | 8.8 | 126,634 | 11,205 | 137,839 |
| Qwen | qwen-3-coder-plus | 0/4 | | | 80.8s | 9.0 | 741,470 | 1,375 | 742,844 |
| Qwen | qwen-3-max | 0/4 | | | 161.4s | 15.5 | 1,102,726 | 2,017 | 1,104,743 |

Instruction

I need you to implement a comprehensive label documentation and organization workflow for the repository.

Step 1: Create Label Documentation Issue

Create a new issue with:

  • Title containing: "Document label organization for better visual organization" and "label guide"
  • Body must include:
    • A "## Problem" heading describing the need for better label documentation
    • A "## Proposed Solution" heading about creating a comprehensive label guide for different label categories
    • A "## Benefits" heading listing improved visual organization and easier issue triage
    • Keywords: "label documentation", "visual organization", "label guide", "organization"
  • Labels: Initially add "enhancement" and "documentation" labels to the issue
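
The issue described in Step 1 could also be created through the GitHub REST API (`POST /repos/{owner}/{repo}/issues`). The sketch below only builds the request payload and checks it against the required headings and keywords. `build_issue_payload` is a hypothetical helper, and the title and body wording are illustrative assumptions, not text taken from the task.

```python
from typing import Any, Dict, List

REQUIRED_SECTIONS: List[str] = ["## Problem", "## Proposed Solution", "## Benefits"]
REQUIRED_KEYWORDS: List[str] = [
    "label documentation",
    "visual organization",
    "label guide",
    "organization",
]


def build_issue_payload() -> Dict[str, Any]:
    """Build a JSON payload for POST /repos/{owner}/{repo}/issues (illustrative wording)."""
    body = "\n".join(
        [
            "## Problem",
            "The repository has no label documentation, which slows down triage.",
            "",
            "## Proposed Solution",
            "Create a comprehensive label guide covering every label category.",
            "",
            "## Benefits",
            "- Improved visual organization",
            "- Easier issue triage",
        ]
    )
    return {
        "title": "Document label organization for better visual organization (label guide)",
        "body": body,
        "labels": ["enhancement", "documentation"],
    }


payload = build_issue_payload()
# Collect any required heading or keyword that the body does not contain.
missing = [
    item
    for item in REQUIRED_SECTIONS + REQUIRED_KEYWORDS
    if item.lower() not in payload["body"].lower()
]
print(missing)  # an empty list means nothing is missing
```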

Step 2: Create Feature Branch

Create a new branch called 'feat/label-color-guide' from main.
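
With the GitHub REST API, a branch is created by posting a new Git reference that points at the current commit of `main` (that SHA can be read from `GET /repos/{owner}/{repo}/git/ref/heads/main`). The sketch below builds only the payload for `POST /repos/{owner}/{repo}/git/refs`. `build_branch_ref_payload` is a hypothetical helper and the SHA is a placeholder.

```python
from typing import Dict


def build_branch_ref_payload(branch: str, base_sha: str) -> Dict[str, str]:
    """Build the payload for POST /repos/{owner}/{repo}/git/refs."""
    return {"ref": f"refs/heads/{branch}", "sha": base_sha}


# Placeholder SHA for illustration; in practice use the SHA reported for main.
example = build_branch_ref_payload("feat/label-color-guide", "0" * 40)
print(example["ref"])  # refs/heads/feat/label-color-guide
```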

Step 3: Create Label Documentation

On the feature branch, create the file docs/LABEL_COLORS.md with:

  • A "# Label Organization Guide" title
  • A "## Label Categories" section with a table that MUST follow this exact format:
| Label Name | Category | Description |
|------------|----------|-------------|

The table must include ALL existing labels in the repository. For each label:

    • Group labels by category (e.g., issue-type, platform, area, status, performance)
    • Include a description for each label

  • A "## Usage Guidelines" section explaining when to use each label category
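
One way to produce the table in the exact format above is to render it from a list of labels. This is a minimal sketch: `render_label_table` is a hypothetical helper, and the sample categories and descriptions are illustrative assumptions rather than the repository's real label descriptions.

```python
from typing import List, Tuple

HEADER = "| Label Name | Category | Description |"
SEPARATOR = "|------------|----------|-------------|"


def render_label_table(labels: List[Tuple[str, str, str]]) -> str:
    """Render (name, category, description) tuples as a table grouped by category."""
    # Sorting by category first keeps labels of the same category adjacent.
    rows = sorted(labels, key=lambda item: (item[1], item[0]))
    lines = [HEADER, SEPARATOR]
    lines += [f"| {name} | {category} | {desc} |" for name, category, desc in rows]
    return "\n".join(lines)


sample = [
    ("platform:linux", "platform", "Issue is specific to Linux"),
    ("bug", "issue-type", "Something is not working"),
    ("area:mcp", "area", "Related to MCP integration"),
]
table = render_label_table(sample)
print(table)
```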

Step 4: Apply ALL Labels to the Documentation Issue

Update the issue you created in Step 1 by adding ALL existing labels from the repository. This serves as a visual demonstration of the label organization. The issue should have every single label that exists in the repository applied to it.
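
Applying every label first requires the complete label list. The REST endpoint `GET /repos/{owner}/{repo}/labels` is paginated, so a client has to walk the pages until one comes back empty; the names can then be sent to `POST /repos/{owner}/{repo}/issues/{issue_number}/labels`. The sketch below shows only the pagination loop. `fetch_page` is a hypothetical callable standing in for the HTTP request, which keeps the example runnable offline.

```python
from typing import Callable, Dict, List


def list_all_label_names(fetch_page: Callable[[int], List[Dict]]) -> List[str]:
    """Collect label names from every page until an empty page is returned."""
    names: List[str] = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        names.extend(label["name"] for label in batch)
        page += 1
    return names


# Fake two-page response used in place of real API calls.
fake_pages = {
    1: [{"name": "bug"}, {"name": "enhancement"}],
    2: [{"name": "area:core"}],
}
all_names = list_all_label_names(lambda page: fake_pages.get(page, []))
add_labels_payload = {"labels": all_names}  # body for the add-labels request
print(add_labels_payload)
```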

Step 5: Create Pull Request

Create a pull request from 'feat/label-color-guide' to 'main' with:

  • Title containing: "Add label organization guide" and "visual organization"
  • Body must include:
    • A "## Summary" heading explaining the label organization documentation
    • A "## Changes" heading with a bullet list of what was added
    • "Fixes #[ISSUE_NUMBER]" pattern linking to your created issue
    • A "## Verification" section stating that all labels have been documented
    • Keywords: "label documentation", "organization guide", "visual improvement", "documentation"
  • Labels: Add a reasonable subset of labels to the PR (at least 5, and up to about 10, from different categories)
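
A pull request body that meets the requirements above might be assembled as follows. `build_pr_body` is a hypothetical helper and the wording is an illustrative assumption; only the headings, the "Fixes #" pattern, and the keywords come from the task.

```python
from typing import List

PR_SECTIONS: List[str] = ["## Summary", "## Changes", "## Verification"]
PR_KEYWORDS: List[str] = [
    "label documentation",
    "organization guide",
    "visual improvement",
    "documentation",
]


def build_pr_body(issue_number: int) -> str:
    """Assemble a PR body with the required headings, keywords, and issue link."""
    return "\n".join(
        [
            "## Summary",
            "Adds label documentation as an organization guide in "
            "docs/LABEL_COLORS.md, a visual improvement for issue triage.",
            "",
            "## Changes",
            "- Added docs/LABEL_COLORS.md with a table of labels and categories",
            "- Added usage guidelines for each label category",
            "",
            f"Fixes #{issue_number}",
            "",
            "## Verification",
            "All labels have been documented.",
        ]
    )


pr_body = build_pr_body(42)  # 42 is a placeholder issue number
pr_missing = [
    item for item in PR_SECTIONS + PR_KEYWORDS if item.lower() not in pr_body.lower()
]
print(pr_missing)  # an empty list means nothing is missing
```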

Step 6: Document Changes in Issue

Add a comment to the original issue with:

  • Confirmation that the label documentation has been created
  • Total count of labels documented
  • Reference to the PR using "PR #[NUMBER]" pattern
  • Keywords: "documentation created", "label guide complete", "organization complete"
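
The closing comment can be generated from the PR number and the label count so that both stay consistent with what was actually created. `build_completion_comment` is a hypothetical helper with illustrative wording; the numbers passed in below are placeholders.

```python
def build_completion_comment(pr_number: int, label_count: int) -> str:
    """Build the issue comment with the PR reference, label count, and keywords."""
    return "\n".join(
        [
            "Documentation created: label guide complete, organization complete.",
            f"Total labels documented: {label_count}",
            f"See PR #{pr_number} for the changes.",
        ]
    )


comment = build_completion_comment(pr_number=12, label_count=22)
print(comment)
```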


Verify

import sys
import os
import requests
from typing import Dict, List, Optional, Tuple
from dotenv import load_dotenv


def _get_github_api(
    endpoint: str, headers: Dict[str, str], org: str, repo: str = "claude-code"
) -> Tuple[bool, Optional[Dict]]:
    """Make a GET request to GitHub API and return (success, response)."""
    url = f"https://api.github.com/repos/{org}/{repo}/{endpoint}"
    try:
        # A timeout keeps the verifier from hanging on a stalled connection
        response = requests.get(url, headers=headers, timeout=30)
        if response.status_code == 200:
            return True, response.json()
        elif response.status_code == 404:
            return False, None
        else:
            print(f"API error for {endpoint}: {response.status_code}", file=sys.stderr)
            return False, None
    except Exception as e:
        print(f"Exception for {endpoint}: {e}", file=sys.stderr)
        return False, None



def _check_branch_exists(
    branch_name: str, headers: Dict[str, str], org: str, repo: str = "claude-code"
) -> bool:
    """Verify that a branch exists in the repository."""
    success, _ = _get_github_api(f"branches/{branch_name}", headers, org, repo)
    return success


def _check_file_content(
    branch: str,
    file_path: str,
    headers: Dict[str, str],
    org: str,
    repo: str = "claude-code",
) -> Optional[str]:
    """Get file content from a branch."""
    import base64

    success, result = _get_github_api(
        f"contents/{file_path}?ref={branch}", headers, org, repo
    )
    if not success or not result:
        return None

    if result.get("content"):
        try:
            content = base64.b64decode(result.get("content", "")).decode("utf-8")
            return content
        except Exception as e:
            print(f"Content decode error for {file_path}: {e}", file=sys.stderr)
            return None

    return None


def _parse_label_table(content: str) -> List[str]:
    """Parse the label table from markdown content and return label names."""
    documented_labels = []

    # Find the table in the content
    lines = content.split("\n")
    in_table = False

    for line in lines:
        # Skip header and separator lines
        if "| Label Name | Category |" in line:
            in_table = True
            continue
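        # Skip the separator row, including spaced variants such as "| --- | :---: |"
        if in_table and line.startswith("|") and set(line.strip()) <= set("|-: "):
            continue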

        # Parse table rows
        if in_table and line.startswith("|"):
            parts = [p.strip() for p in line.split("|")]
            if len(parts) >= 3:  # Should have at least label, category
                label_name = parts[1].strip()
                if label_name:
                    documented_labels.append(label_name)

        # Stop at end of table
        if in_table and line and not line.startswith("|"):
            break

    return documented_labels


def _find_issue_by_title_keywords(
    title_keywords: List[str],
    headers: Dict[str, str],
    org: str,
    repo: str = "claude-code",
) -> Optional[Dict]:
    """Find an issue by title keywords and return the issue data."""
    for state in ["open", "closed"]:
        success, issues = _get_github_api(
            f"issues?state={state}&per_page=100", headers, org, repo
        )
        if success and issues:
            for issue in issues:
                # Skip pull requests
                if "pull_request" in issue:
                    continue
                title = issue.get("title", "").lower()
                if all(keyword.lower() in title for keyword in title_keywords):
                    return issue
    return None


def _find_pr_by_title_keywords(
    title_keywords: List[str],
    headers: Dict[str, str],
    org: str,
    repo: str = "claude-code",
) -> Optional[Dict]:
    """Find a PR by title keywords and return the PR data."""
    for state in ["open", "closed"]:
        success, prs = _get_github_api(
            f"pulls?state={state}&per_page=100", headers, org, repo
        )
        if success and prs:
            for pr in prs:
                title = pr.get("title", "").lower()
                if all(keyword.lower() in title for keyword in title_keywords):
                    return pr
    return None


def _get_issue_comments(
    issue_number: int, headers: Dict[str, str], org: str, repo: str = "claude-code"
) -> List[Dict]:
    """Get all comments for an issue."""
    success, comments = _get_github_api(
        f"issues/{issue_number}/comments", headers, org, repo
    )
    if success and comments:
        return comments
    return []




def verify() -> bool:
    """
    Programmatically verify that the label color standardization workflow meets the
    requirements described in description.md.
    """
    # Load environment variables from .mcp_env
    load_dotenv(".mcp_env")

    # Get GitHub token and org
    github_token = os.environ.get("MCP_GITHUB_TOKEN")
    github_org = os.environ.get("GITHUB_EVAL_ORG")

    if not github_token:
        print("Error: MCP_GITHUB_TOKEN environment variable not set", file=sys.stderr)
        return False

    if not github_org:
        print("Error: GITHUB_EVAL_ORG environment variable not set", file=sys.stderr)
        return False

    # Configuration constants
    BRANCH_NAME = "feat/label-color-guide"

    # Issue requirements
    ISSUE_TITLE_KEYWORDS = ["Document label organization", "label guide"]
    ISSUE_KEYWORDS = [
        "label documentation",
        "visual organization",
        "label guide",
        "organization",
    ]

    # PR requirements
    PR_TITLE_KEYWORDS = ["label organization guide", "visual organization"]
    PR_KEYWORDS = [
        "label documentation",
        "organization guide",
        "visual improvement",
        "documentation",
    ]

    # All expected labels in the repository that are actually used/discoverable via MCP tools
    # Note: Excludes 'wontfix', 'invalid', 'good first issue', 'help wanted' as they exist
    # in the repository but are not used by any issues (not discoverable via MCP search)
    ALL_EXPECTED_LABELS = [
        "bug",
        "enhancement",
        "duplicate",
        "question",
        "documentation",
        "platform:macos",
        "platform:linux",
        "platform:windows",
        "area:core",
        "area:tools",
        "area:tui",
        "area:ide",
        "area:mcp",
        "area:api",
        "area:security",
        "area:model",
        "area:auth",
        "area:packaging",
        "has repro",
        "memory",
        "perf:memory",
        "external",
    ]

    headers = {
        "Authorization": f"token {github_token}",
        "Accept": "application/vnd.github.v3+json",
    }

    # Run verification checks
    print("Verifying label color standardization workflow completion...")

    # 1. Check that feature branch exists
    print("1. Verifying feature branch exists...")
    if not _check_branch_exists(BRANCH_NAME, headers, github_org):
        print(f"Error: Branch '{BRANCH_NAME}' not found", file=sys.stderr)
        return False

    # 2. Check documentation file exists and has correct format
    print("2. Verifying label documentation file...")
    doc_content = _check_file_content(
        BRANCH_NAME, "docs/LABEL_COLORS.md", headers, github_org
    )
    if not doc_content:
        print("Error: docs/LABEL_COLORS.md not found", file=sys.stderr)
        return False

    # Parse the label table from documentation
    documented_labels = _parse_label_table(doc_content)
    if len(documented_labels) < 20:
        print(
            f"Error: Documentation table incomplete, found only {len(documented_labels)} labels",
            file=sys.stderr,
        )
        return False

    # 3. Report the expected label set (coverage itself is checked in step 8)
    print("3. Loading expected labels...")
    print(f"  {len(ALL_EXPECTED_LABELS)} expected labels defined for verification")

    # 4. Find the created issue
    print("4. Verifying issue creation...")
    issue = _find_issue_by_title_keywords(ISSUE_TITLE_KEYWORDS, headers, github_org)
    if not issue:
        print(
            "Error: Issue with title containing required keywords not found",
            file=sys.stderr,
        )
        return False

    issue_number = issue.get("number")
    # The API returns null for an empty body, so fall back to an empty string
    issue_body = issue.get("body") or ""

    # Check issue content has required sections and keywords
    issue_required_sections = ["## Problem", "## Proposed Solution", "## Benefits"]
    for section in issue_required_sections:
        if section not in issue_body:
            print(f"Error: Issue body missing required section: {section}", file=sys.stderr)
            return False

    # Check issue has required keywords
    if not all(keyword.lower() in issue_body.lower() for keyword in ISSUE_KEYWORDS):
        missing_keywords = [kw for kw in ISSUE_KEYWORDS if kw.lower() not in issue_body.lower()]
        print(f"Error: Issue body missing required keywords: {missing_keywords}", file=sys.stderr)
        return False

    # Check issue has initial required labels (enhancement and documentation)
    issue_label_names = [label["name"] for label in issue.get("labels", [])]
    initial_required_labels = ["enhancement", "documentation"]
    for required_label in initial_required_labels:
        if required_label not in issue_label_names:
            print(f"Error: Issue missing initial required label: {required_label}", file=sys.stderr)
            return False

    # 5. Find the created PR
    print("5. Verifying pull request creation...")
    pr = _find_pr_by_title_keywords(PR_TITLE_KEYWORDS, headers, github_org)
    if not pr:
        print(
            "Error: PR with title containing required keywords not found",
            file=sys.stderr,
        )
        return False

    pr_number = pr.get("number")
    # The API returns null for an empty body, so fall back to an empty string
    pr_body = pr.get("body") or ""
    pr_labels = pr.get("labels", [])

    # Check PR references issue with correct pattern
    if f"Fixes #{issue_number}" not in pr_body and f"fixes #{issue_number}" not in pr_body:
        print(f"Error: PR does not contain 'Fixes #{issue_number}' pattern", file=sys.stderr)
        return False

    # Check PR body has required sections and keywords
    pr_required_sections = ["## Summary", "## Changes", "## Verification"]
    for section in pr_required_sections:
        if section not in pr_body:
            print(f"Error: PR body missing required section: {section}", file=sys.stderr)
            return False

    # Check PR has required keywords
    if not all(keyword.lower() in pr_body.lower() for keyword in PR_KEYWORDS):
        missing_keywords = [kw for kw in PR_KEYWORDS if kw.lower() not in pr_body.lower()]
        print(f"Error: PR body missing required keywords: {missing_keywords}", file=sys.stderr)
        return False

    # Check PR has sufficient labels (at least 5; category spread is not checked)
    if len(pr_labels) < 5:
        print(f"Error: PR has only {len(pr_labels)} labels, needs at least 5", file=sys.stderr)
        return False

    # 6. Verify issue has ALL expected/usable labels applied (demonstrates organization)
    print("6. Verifying issue has all expected labels applied...")
    issue_label_names = [label["name"] for label in issue.get("labels", [])]
    # Use our expected labels list instead of all repo labels (excludes unused labels)
    expected_labels_to_check = ALL_EXPECTED_LABELS
    missing_labels = []

    for expected_label in expected_labels_to_check:
        if expected_label not in issue_label_names:
            missing_labels.append(expected_label)

    if missing_labels:
        print(
            f"Error: Issue missing {len(missing_labels)} expected labels: {missing_labels[:5]}...",
            file=sys.stderr,
        )
        return False

    print(f"  ✓ Issue has all {len(expected_labels_to_check)} expected labels applied")

    # 7. Verify issue has comment documenting changes
    print("7. Verifying issue comment with documentation...")
    issue_comments = _get_issue_comments(issue_number, headers, github_org)

    found_update_comment = False
    comment_required_keywords = ["documentation created", "label guide complete", "organization complete"]
    
    for comment in issue_comments:
        # The API may return a null body, so fall back to an empty string
        body = comment.get("body") or ""
        # Check for the PR reference, a completion keyword, and a label count
        if (
            f"PR #{pr_number}" in body
            and any(keyword.lower() in body.lower() for keyword in comment_required_keywords)
            and "total" in body.lower()
            and "labels" in body.lower()
        ):
            found_update_comment = True
            break

    if not found_update_comment:
        print("Error: Issue missing comment documenting changes with required content", file=sys.stderr)
        print("  Comment should include: PR reference, label count, and completion keywords", file=sys.stderr)
        return False

    # 8. Final verification of complete workflow
    print("8. Final verification of workflow completion...")
    
    # Skip repository label existence check - we trust that our expected labels 
    # are the ones actually discoverable/usable via MCP tools

    # Ensure expected labels are documented (not all repo labels, since some are unused)
    documented_label_count = len(documented_labels)
    expected_label_count = len(ALL_EXPECTED_LABELS)

    if documented_label_count < expected_label_count:
        print(
            f"Error: Documentation incomplete - {documented_label_count} documented vs {expected_label_count} expected",
            file=sys.stderr,
        )
        return False

    # Check that all expected labels are documented
    missing_documented_labels = []
    for expected_label in ALL_EXPECTED_LABELS:
        if expected_label not in documented_labels:
            missing_documented_labels.append(expected_label)

    if missing_documented_labels:
        print(
            f"Error: Documentation missing expected labels: {missing_documented_labels}",
            file=sys.stderr,
        )
        return False

    print(f"  ✓ All {expected_label_count} expected labels present and documented")

    print("\n✓ All verification checks passed!")
    print("Label documentation workflow completed successfully:")
    print(
        f"  - Issue #{issue_number}: {issue.get('title')} (with all {len(issue_label_names)} labels)"
    )
    print(f"  - PR #{pr_number}: {pr.get('title')}")
    print(f"  - Branch: {BRANCH_NAME}")
    print("  - Documentation: docs/LABEL_COLORS.md")
    print(f"  - {expected_label_count} labels documented for better organization")
    return True


if __name__ == "__main__":
    success = verify()
    sys.exit(0 if success else 1)
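
Before pushing docs/LABEL_COLORS.md, the file can be checked locally with parsing rules similar to the verifier's. The sketch below is a standalone approximation, not part of the original script: `extract_label_names` is a hypothetical name, and the sample document and expected-label set are illustrative.

```python
from typing import List


def extract_label_names(markdown: str) -> List[str]:
    """Return first-column values of the table headed '| Label Name | Category |'."""
    names: List[str] = []
    in_table = False
    for line in markdown.split("\n"):
        if "| Label Name | Category |" in line:
            in_table = True
            continue
        if not in_table:
            continue
        if not line.startswith("|"):
            if line:
                break  # the first non-empty, non-table line ends the table
            continue
        if set(line.strip()) <= set("|-: "):
            continue  # separator row
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) >= 3 and cells[1]:
            names.append(cells[1])
    return names


sample_doc = """# Label Organization Guide

## Label Categories

| Label Name | Category | Description |
|------------|----------|-------------|
| bug | issue-type | Something is not working |
| area:mcp | area | Related to MCP integration |

## Usage Guidelines
"""
expected = {"bug", "area:mcp", "documentation"}
found = extract_label_names(sample_doc)
not_documented = sorted(expected - set(found))
print(found)           # ['bug', 'area:mcp']
print(not_documented)  # ['documentation']
```

Running a check like this against the full expected label list shows which labels still need a table row before the verifier is run.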