Employee Retention Analysis

L3
ModelContextProtocolPostgresEmployees

Analyze retention patterns identifying turnover factors and high-risk employees to develop targeted retention strategies.

Created by Lingxiao Du
2025-08-15
Reporting And AnalyticsStatistical AggregationAudit And Compliance

Model Ranking

Click on the dots to view the trajectory of each task run
Model
Run Results
Pass@4
Pass^4
Avg Time
Avg Turns
Input Tokens
Output Tokens
Total Tokens
OpenAI
gpt-5-low
3
/4
244.5s
14.5
64,172
17,875
82,047
OpenAI
gpt-5-high
2
/4
1087.8s
15.0
94,078
33,400
127,478
OpenAI
gpt-5-medium
2
/4
283.0s
12.8
66,641
15,313
81,954
Qwen
qwen-3-max
1
/4
50.0s
13.3
72,722
1,562
74,284
Claude
claude-opus-4-1
0
/1
--
758.7s
31.0
375,875
15,307
391,182
Claude
claude-sonnet-4
0
/4
216.3s
24.5
185,711
5,820
191,532
Claude
claude-sonnet-4-high
0
/4
139.1s
17.5
188,373
5,862
194,235
Claude
claude-sonnet-4-low
0
/4
138.0s
19.0
206,086
5,685
211,771
DeepSeek
deepseek-chat
0
/4
327.6s
25.8
159,693
4,117
163,809
Gemini
gemini-2-5-flash
0
/4
36.5s
7.8
20,652
5,132
25,784
Gemini
gemini-2-5-pro
0
/4
118.5s
9.3
45,762
11,958
57,720
Z.ai
glm-4-5
0
/4
142.7s
17.8
94,920
5,439
100,359
OpenAI
gpt-4-1
0
/4
19.7s
8.3
17,681
539
18,220
OpenAI
gpt-4-1-mini
0
/4
29.8s
13.5
37,110
890
38,000
OpenAI
gpt-4-1-nano
0
/4
13.4s
5.8
10,281
388
10,670
OpenAI
gpt-5-mini-high
0
/4
303.6s
19.5
166,075
32,032
198,107
OpenAI
gpt-5-mini-low
0
/4
62.8s
7.3
22,670
5,444
28,113
OpenAI
gpt-5-mini-medium
0
/4
123.0s
18.3
153,174
11,885
165,059
OpenAI
gpt-5-nano-high
0
/4
424.7s
12.5
104,721
67,158
171,878
OpenAI
gpt-5-nano-low
0
/4
98.9s
7.0
19,754
21,262
41,016
OpenAI
gpt-5-nano-medium
0
/4
226.3s
13.3
99,602
33,636
133,237
OpenAI
gpt-oss-120b
0
/4
8.0s
3.8
8,686
210
8,896
Grok
grok-4
0
/4
313.9s
21.5
236,824
11,018
247,842
Grok
grok-code-fast-1
0
/4
46.8s
13.0
89,863
4,583
94,446
MoonshotAI
kimi-k2-0711
0
/4
349.0s
31.3
207,382
4,764
212,146
MoonshotAI
kimi-k2-0905
0
/4
313.5s
27.8
259,726
3,763
263,489
OpenAI
o3
0
/4
69.3s
9.5
29,536
4,164
33,700
OpenAI
o4-mini
0
/4
70.1s
5.3
11,957
5,638
17,595
Qwen
qwen-3-coder-plus
0
/4
82.5s
29.5
312,328
3,376
315,704

Task State

Enum "employees"."employee_gender" { "M" "F" } Table "employees"."department" { "id" bpchar(4) [pk, not null] "dept_name" varchar(40) [unique, not null] } Table "employees"."department_employee" { "employee_id" int8 [not null] "department_id" bpchar(4) [not null] "from_date" date [not null] "to_date" date [not null] Indexes { (employee_id, department_id) [type: btree, name: "idx_16982_primary"] department_id [type: btree, name: "idx_16982_dept_no"] } } Table "employees"."department_manager" { "employee_id" int8 [not null] "department_id" bpchar(4) [not null] "from_date" date [not null] "to_date" date [not null] Indexes { (employee_id, department_id) [type: btree, name: "idx_16985_primary"] department_id [type: btree, name: "idx_16985_dept_no"] } } Table "employees"."employee" { "id" int8 [pk, not null, increment] "birth_date" date [not null] "first_name" varchar(14) [not null] "last_name" varchar(16) [not null] "gender" employees.employee_gender [not null] "hire_date" date [not null] } Table "employees"."salary" { "employee_id" int8 [not null] "amount" int8 [not null] "from_date" date [not null] "to_date" date [not null] Indexes { (employee_id, from_date) [type: btree, name: "idx_16991_primary"] } } Table "employees"."title" { "employee_id" int8 [not null] "title" varchar(50) [not null] "from_date" date [not null] "to_date" date Indexes { (employee_id, title, from_date) [type: btree, name: "idx_16994_primary"] } } Ref "dept_emp_ibfk_1":"employees"."employee"."id" < "employees"."department_employee"."employee_id" [update: restrict, delete: cascade] Ref "dept_emp_ibfk_2":"employees"."department"."id" < "employees"."department_employee"."department_id" [update: restrict, delete: cascade] Ref "dept_manager_ibfk_1":"employees"."employee"."id" < "employees"."department_manager"."employee_id" [update: restrict, delete: cascade] Ref "dept_manager_ibfk_2":"employees"."department"."id" < "employees"."department_manager"."department_id" [update: restrict, delete: cascade] Ref "salaries_ibfk_1":"employees"."employee"."id" < "employees"."salary"."employee_id" [update: restrict, delete: cascade] Ref "titles_ibfk_1":"employees"."employee"."id" < "employees"."title"."employee_id" [update: restrict, delete: cascade]

Instruction

Analyze employee retention patterns and identify factors contributing to turnover across the organization. The HR leadership team needs comprehensive insights to develop targeted retention strategies and reduce costly employee attrition.

Your Tasks:

  1. Create the retention analysis table — build a table called employee_retention_analysis in the employees schema with these exact columns:

    • department_name (varchar) — the department name
    • total_employees_ever (integer) — total number of employees who have ever worked in this department
    • current_employees (integer) — number of current employees in the department
    • former_employees (integer) — number of employees who left the department
    • retention_rate (decimal) — percentage of employees still with the company (current/total * 100)
  2. Create the high-risk employee identification table — build a table called high_risk_employees in the employees schema with:

    • employee_id (bigint) — the employee's ID
    • full_name (varchar) — concatenated first and last name
    • current_department (varchar) — current department name
    • tenure_days (integer) — days with the company
    • current_salary (integer) — current salary amount
    • risk_category (varchar) — risk level ('high_risk', 'medium_risk', 'low_risk')

    Note: Analyze only current employees (those with active salary records where to_date = '9999-01-01').

  3. Create the turnover trend analysis table — build a table called turnover_trend_analysis in the employees schema with:

    • departure_year (integer) — year when employees left (extract from to_date of salary records)
    • departures_count (integer) — number of employees who left that year
    • avg_tenure_days (decimal) — average tenure in days for employees who left that year
    • avg_final_salary (decimal) — average final salary of departed employees that year
  4. Apply risk assessment criteria for current employees:

    • High risk: Employees in departments with retention rate < 80% AND tenure < 1095 days (3 years)
    • Medium risk: Employees in departments with retention rate < 85% AND tenure < 1825 days (5 years)
    • Low risk: All other current employees
  5. Analyze departure trends — examine employees who left between 1985-2002, grouping by departure year.

  6. Handle final salary selection — when calculating avg_final_salary, if an employee has multiple salary records with the same departure date, select the record with the latest start date. If there are still ties, select the record with the highest salary amount.

  7. Focus appropriately — use current employees for risk analysis, all historical data for retention rates, and former employees for trend analysis.

The comprehensive analysis will help identify retention patterns, at-risk employees, and historical turnover trends to guide strategic workforce planning.



Verify

*.py
Python
"""
Verification script for PostgreSQL Task 2: Employee Retention Analysis
"""

import os
import sys
import psycopg2
from decimal import Decimal

def rows_match(actual_row, expected_row):
    """
    Compare two rows with appropriate tolerance.
    For Decimal types: allows 0.1 tolerance
    For other types: requires exact match
    """
    if len(actual_row) != len(expected_row):
        return False
    
    for actual, expected in zip(actual_row, expected_row):
        if isinstance(actual, Decimal) and isinstance(expected, Decimal):
            if abs(float(actual) - float(expected)) > 0.1:
                return False
        elif actual != expected:
            return False
    
    return True

def get_connection_params() -> dict:
    """Get database connection parameters."""
    return {
        "host": os.getenv("POSTGRES_HOST", "localhost"),
        "port": int(os.getenv("POSTGRES_PORT", 5432)),
        "database": os.getenv("POSTGRES_DATABASE"),
        "user": os.getenv("POSTGRES_USERNAME"),
        "password": os.getenv("POSTGRES_PASSWORD")
    }

def verify_retention_analysis_results(conn) -> bool:
    """Verify the employee retention analysis results."""
    with conn.cursor() as cur:
        # Get actual results from the created table
        cur.execute("""
            SELECT department_name, total_employees_ever, current_employees, 
                   former_employees, retention_rate
            FROM employees.employee_retention_analysis
            ORDER BY department_name
        """)
        actual_results = cur.fetchall()
        
        # Execute ground truth query
        cur.execute("""
            SELECT
            d.dept_name AS department_name,
            COUNT(DISTINCT de.employee_id) AS total_employees_ever,
            COUNT(DISTINCT de.employee_id) FILTER (WHERE de.to_date = DATE '9999-01-01') AS current_employees,
            (COUNT(DISTINCT de.employee_id)
            - COUNT(DISTINCT de.employee_id) FILTER (WHERE de.to_date = DATE '9999-01-01')) AS former_employees,
            (COUNT(DISTINCT de.employee_id) FILTER (WHERE de.to_date = DATE '9999-01-01'))::DECIMAL
                / NULLIF(COUNT(DISTINCT de.employee_id), 0) * 100 AS retention_rate
            FROM employees.department d
            LEFT JOIN employees.department_employee de
            ON d.id = de.department_id
            GROUP BY d.id, d.dept_name
            ORDER BY d.dept_name
        """)
        expected_results = cur.fetchall()

        if len(actual_results) != len(expected_results):
            print(f"❌ Expected {len(expected_results)} retention analysis results, got {len(actual_results)}")
            return False

        mismatches = 0
        for i, (actual, expected) in enumerate(zip(actual_results, expected_results)):
            if not rows_match(actual, expected):
                if mismatches < 5:  # Only show first 5 mismatches
                    print(f"❌ Row {i+1} mismatch: expected {expected}, got {actual}")
                mismatches += 1

        if mismatches > 0:
            print(f"❌ Total mismatches: {mismatches}")
            return False

        print(f"✅ Employee retention analysis results are correct ({len(actual_results)} records)")
        return True

def verify_high_risk_results(conn) -> bool:
    """Verify the high risk employee analysis results."""
    with conn.cursor() as cur:
        # Get actual results from the created table
        cur.execute("""
            SELECT employee_id, full_name, current_department, tenure_days, 
                   current_salary, risk_category
            FROM employees.high_risk_employees
            ORDER BY employee_id
        """)
        actual_results = cur.fetchall()
        
        # Execute ground truth query - only current employees
        cur.execute("""
            WITH current_salary AS (
            SELECT employee_id, amount AS current_amount
            FROM (
                SELECT s.*,
                    ROW_NUMBER() OVER (PARTITION BY s.employee_id
                                        ORDER BY s.from_date DESC, s.amount DESC) AS rn
                FROM employees.salary s
                WHERE s.to_date = DATE '9999-01-01'
            ) x
            WHERE rn = 1
            ),
            current_dept AS (
            SELECT employee_id, department_id
            FROM (
                SELECT de.*,
                    ROW_NUMBER() OVER (PARTITION BY de.employee_id
                                        ORDER BY de.from_date DESC, de.department_id) AS rn
                FROM employees.department_employee de
                WHERE de.to_date = DATE '9999-01-01'
            ) x
            WHERE rn = 1
            ),
            dept_retention AS (
            SELECT
                d.id   AS department_id,
                d.dept_name,
                COUNT(DISTINCT de.employee_id) AS total_employees_ever,
                COUNT(DISTINCT de.employee_id) FILTER (WHERE de.to_date = DATE '9999-01-01') AS current_employees,
                (COUNT(DISTINCT de.employee_id) FILTER (WHERE de.to_date = DATE '9999-01-01'))::NUMERIC
                / NULLIF(COUNT(DISTINCT de.employee_id), 0) * 100 AS retention_rate
            FROM employees.department d
            LEFT JOIN employees.department_employee de
                    ON de.department_id = d.id
            GROUP BY d.id, d.dept_name
            )
            SELECT
            e.id AS employee_id,
            CONCAT(e.first_name, ' ', e.last_name) AS full_name,
            d.dept_name AS current_department,
            (CURRENT_DATE - e.hire_date)::INTEGER AS tenure_days,
            cs.current_amount::INTEGER AS current_salary,
            CASE
                WHEN dr.retention_rate < 80  AND (CURRENT_DATE - e.hire_date) < 1095 THEN 'high_risk'
                WHEN dr.retention_rate < 85  AND (CURRENT_DATE - e.hire_date) < 1825 THEN 'medium_risk'
                ELSE 'low_risk'
            END AS risk_category
            FROM employees.employee e
            JOIN current_salary cs ON cs.employee_id = e.id
            JOIN current_dept   cd ON cd.employee_id = e.id
            JOIN employees.department d ON d.id = cd.department_id
            JOIN dept_retention dr ON dr.department_id = d.id
            ORDER BY e.id;
        """)
        expected_results = cur.fetchall()

        if len(actual_results) != len(expected_results):
            print(f"❌ Expected {len(expected_results)} high risk analysis results, got {len(actual_results)}")
            return False

        mismatches = 0
        for i, (actual, expected) in enumerate(zip(actual_results, expected_results)):
            if not rows_match(actual, expected):
                if mismatches < 5:  # Only show first 5 mismatches
                    print(f"❌ Row {i+1} mismatch: expected {expected}, got {actual}")
                mismatches += 1

        if mismatches > 0:
            print(f"❌ Total mismatches: {mismatches}")
            return False

        print(f"✅ High risk employee analysis results are correct ({len(actual_results)} records)")
        return True

def verify_turnover_trend_results(conn) -> bool:
    """Verify the turnover trend analysis results."""
    with conn.cursor() as cur:
        # Get actual results from the created table
        cur.execute("""
            SELECT departure_year, departures_count, avg_tenure_days, avg_final_salary
            FROM employees.turnover_trend_analysis
            ORDER BY departure_year
        """)
        actual_results = cur.fetchall()
        
        # Execute ground truth query - simplified version
        cur.execute("""
            WITH last_non_current_salary AS (
            SELECT
                s.employee_id,
                s.to_date      AS departure_date,
                s.amount       AS final_salary,
                ROW_NUMBER() OVER (
                PARTITION BY s.employee_id
                ORDER BY s.to_date DESC, s.from_date DESC, s.amount DESC
                ) AS rn
            FROM employees.salary s
            WHERE s.to_date <> DATE '9999-01-01'
                AND NOT EXISTS (
                SELECT 1
                FROM employees.salary s_cur
                WHERE s_cur.employee_id = s.employee_id
                    AND s_cur.to_date = DATE '9999-01-01'
                )
            ),
            departed AS (
            SELECT employee_id, departure_date, final_salary
            FROM last_non_current_salary
            WHERE rn = 1
            ),
            with_tenure AS (
            SELECT
                e.id AS employee_id,
                d.departure_date,
                d.final_salary,
                (d.departure_date - e.hire_date)::INTEGER AS tenure_days
            FROM employees.employee e
            JOIN departed d ON d.employee_id = e.id
            )
            SELECT
            EXTRACT(YEAR FROM departure_date)::INTEGER AS departure_year,
            COUNT(*)::INTEGER                         AS departures_count,
            AVG(tenure_days)                          AS avg_tenure_days,
            AVG(final_salary)                         AS avg_final_salary
            FROM with_tenure
            WHERE departure_date BETWEEN DATE '1985-01-01' AND DATE '2002-12-31'
            GROUP BY EXTRACT(YEAR FROM departure_date)
            ORDER BY departure_year;
        """)
        expected_results = cur.fetchall()

        if len(actual_results) != len(expected_results):
            print(f"❌ Expected {len(expected_results)} turnover trend results, got {len(actual_results)}")
            return False

        mismatches = 0
        for i, (actual, expected) in enumerate(zip(actual_results, expected_results)):
            if not rows_match(actual, expected):
                if mismatches < 5:  # Only show first 5 mismatches
                    print(f"❌ Row {i+1} mismatch: expected {expected}, got {actual}")
                mismatches += 1

        if mismatches > 0:
            print(f"❌ Total mismatches: {mismatches}")
            return False

        print(f"✅ Turnover trend analysis results are correct ({len(actual_results)} records)")
        return True

def main():
    """Main verification function."""
    print("=" * 50)

    # Get connection parameters
    conn_params = get_connection_params()

    if not conn_params["database"]:
        print("❌ No database specified")
        sys.exit(1)

    try:
        # Connect to database
        conn = psycopg2.connect(**conn_params)

        # Verify all three analysis results
        success = (
            verify_retention_analysis_results(conn) and 
            verify_high_risk_results(conn) and 
            verify_turnover_trend_results(conn)
        )

        conn.close()

        if success:
            print("\n🎉 Task verification: PASS")
            sys.exit(0)
        else:
            print("\n❌ Task verification: FAIL")
            sys.exit(1)

    except psycopg2.Error as e:
        print(f"❌ Database error: {e}")
        sys.exit(1)
    except Exception as e:
        print(f"❌ Verification error: {e}")
        sys.exit(1)

if __name__ == "__main__":
    main()