Employee Retention Analysis

PostgresEmployees

Analyze retention patterns identifying turnover factors and high-risk employees to develop targeted retention strategies.

Created by Lingxiao Du

2025-08-15

Reporting And AnalyticsStatistical AggregationAudit And Compliance

Model Ranking

Click on the dots to view the trajectory of each task run

Model	Run Results	Pass@4	Pass^4	Avg Time	Avg Turns	Input Tokens	Output Tokens	Total Tokens
Model	Run Results	Pass@4	Pass^4	Avg Time	Avg Turns	Input Tokens	Output Tokens	Total Tokens
gpt-5-low	3 /4			244.5s	14.5	64,172	17,875	82,047
gemini-3-pro-high	2 /4			304.0s	15.3	239,278	9,208	248,486
gpt-5-high	2 /4			1087.8s	15.0	94,078	33,400	127,478
gpt-5-medium	2 /4			283.0s	12.8	66,641	15,313	81,954
grok-4-fast	2 /4			74.1s	17.0	160,228	7,118	167,345
qwen-3-max	1 /4			50.0s	13.3	72,722	1,562	74,284
claude-opus-4-1	0 /1	-	-	758.7s	31.0	375,875	15,307	391,182
claude-opus-4-5-high	0 /4			146.4s	18.8	273,806	5,929	279,735
claude-sonnet-4	0 /4			216.3s	24.5	185,711	5,820	191,532
claude-sonnet-4-5	0 /4			148.2s	19.5	321,098	6,670	327,768
claude-sonnet-4-high	0 /4			139.1s	17.5	188,373	5,862	194,235
claude-sonnet-4-low	0 /4			138.0s	19.0	206,086	5,685	211,771
deepseek-chat	0 /4			327.6s	25.8	159,693	4,117	163,809
deepseek-v3-1-terminus	0 /4			230.0s	19.5	177,070	3,463	180,533
deepseek-v3-1-terminus-thinking	0 /4			243.2s	21.0	195,704	3,596	199,301
deepseek-v3-2-chat	0 /4			277.5s	29.5	308,191	5,615	313,805
deepseek-v3-2-thinking	0 /4			370.7s	30.3	387,092	9,026	396,118
gemini-2-5-flash	0 /4			36.5s	7.8	20,652	5,132	25,784
gemini-2-5-pro	0 /4			118.5s	9.3	45,762	11,958	57,720
gemini-3-pro-low	0 /4			143.8s	12.3	122,515	8,200	130,715
glm-4-5	0 /4			142.7s	17.8	94,920	5,439	100,359
gpt-4-1	0 /4			19.7s	8.3	17,681	539	18,220
gpt-4-1-mini	0 /4			29.8s	13.5	37,110	890	38,000
gpt-4-1-nano	0 /4			13.4s	5.8	10,281	388	10,670
gpt-5-2-high	0 /4			785.4s	16.3	237,687	30,114	267,801
gpt-5-mini-high	0 /4			303.6s	19.5	166,075	32,032	198,107
gpt-5-mini-low	0 /4			62.8s	7.3	22,670	5,444	28,113
gpt-5-mini-medium	0 /4			123.0s	18.3	153,174	11,885	165,059
gpt-5-nano-high	0 /4			424.7s	12.5	104,721	67,158	171,878
gpt-5-nano-low	0 /4			98.9s	7.0	19,754	21,262	41,016
gpt-5-nano-medium	0 /4			226.3s	13.3	99,602	33,636	133,237
gpt-oss-120b	0 /4			8.0s	3.8	8,686	210	8,896
grok-4	0 /4			313.9s	21.5	236,824	11,018	247,842
grok-code-fast-1	0 /4			46.8s	13.0	89,863	4,583	94,446
kimi-k2-0711	0 /4			349.0s	31.3	207,382	4,764	212,146
kimi-k2-0905	0 /4			313.5s	27.8	259,726	3,763	263,489
o3	0 /4			69.3s	9.5	29,536	4,164	33,700
o4-mini	0 /4			70.1s	5.3	11,957	5,638	17,595
qwen-3-coder-plus	0 /4			82.5s	29.5	312,328	3,376	315,704

Task State

Enum "employees"."employee_gender" { "M" "F" } Table "employees"."department" { "id" bpchar(4) [pk, not null] "dept_name" varchar(40) [unique, not null] } Table "employees"."department_employee" { "employee_id" int8 [not null] "department_id" bpchar(4) [not null] "from_date" date [not null] "to_date" date [not null] Indexes { (employee_id, department_id) [type: btree, name: "idx_16982_primary"] department_id [type: btree, name: "idx_16982_dept_no"] } } Table "employees"."department_manager" { "employee_id" int8 [not null] "department_id" bpchar(4) [not null] "from_date" date [not null] "to_date" date [not null] Indexes { (employee_id, department_id) [type: btree, name: "idx_16985_primary"] department_id [type: btree, name: "idx_16985_dept_no"] } } Table "employees"."employee" { "id" int8 [pk, not null, increment] "birth_date" date [not null] "first_name" varchar(14) [not null] "last_name" varchar(16) [not null] "gender" employees.employee_gender [not null] "hire_date" date [not null] } Table "employees"."salary" { "employee_id" int8 [not null] "amount" int8 [not null] "from_date" date [not null] "to_date" date [not null] Indexes { (employee_id, from_date) [type: btree, name: "idx_16991_primary"] } } Table "employees"."title" { "employee_id" int8 [not null] "title" varchar(50) [not null] "from_date" date [not null] "to_date" date Indexes { (employee_id, title, from_date) [type: btree, name: "idx_16994_primary"] } } Ref "dept_emp_ibfk_1":"employees"."employee"."id" < "employees"."department_employee"."employee_id" [update: restrict, delete: cascade] Ref "dept_emp_ibfk_2":"employees"."department"."id" < "employees"."department_employee"."department_id" [update: restrict, delete: cascade] Ref "dept_manager_ibfk_1":"employees"."employee"."id" < "employees"."department_manager"."employee_id" [update: restrict, delete: cascade] Ref "dept_manager_ibfk_2":"employees"."department"."id" < "employees"."department_manager"."department_id" [update: restrict, delete: cascade] Ref "salaries_ibfk_1":"employees"."employee"."id" < "employees"."salary"."employee_id" [update: restrict, delete: cascade] Ref "titles_ibfk_1":"employees"."employee"."id" < "employees"."title"."employee_id" [update: restrict, delete: cascade]

Instruction

Analyze employee retention patterns and identify factors contributing to turnover across the organization. The HR leadership team needs comprehensive insights to develop targeted retention strategies and reduce costly employee attrition.

Your Tasks:

Create the retention analysis table — build a table called employee_retention_analysis in the employees schema with these exact columns:
- department_name (varchar) — the department name
- total_employees_ever (integer) — total number of employees who have ever worked in this department
- current_employees (integer) — number of current employees in the department
- former_employees (integer) — number of employees who left the department
- retention_rate (decimal) — percentage of employees still with the company (current/total * 100)
Create the high-risk employee identification table — build a table called high_risk_employees in the employees schema with:
- employee_id (bigint) — the employee's ID
- full_name (varchar) — concatenated first and last name
- current_department (varchar) — current department name
- tenure_days (integer) — days with the company
- current_salary (integer) — current salary amount
- risk_category (varchar) — risk level ('high_risk', 'medium_risk', 'low_risk')
Note: Analyze only current employees (those with active salary records where to_date = '9999-01-01').
Create the turnover trend analysis table — build a table called turnover_trend_analysis in the employees schema with:
- departure_year (integer) — year when employees left (extract from to_date of salary records)
- departures_count (integer) — number of employees who left that year
- avg_tenure_days (decimal) — average tenure in days for employees who left that year
- avg_final_salary (decimal) — average final salary of departed employees that year
Apply risk assessment criteria for current employees:
- High risk: Employees in departments with retention rate < 80% AND tenure < 1095 days (3 years)
- Medium risk: Employees in departments with retention rate < 85% AND tenure < 1825 days (5 years)
- Low risk: All other current employees
Analyze departure trends — examine employees who left between 1985-2002, grouping by departure year.
Handle final salary selection — when calculating avg_final_salary, if an employee has multiple salary records with the same departure date, select the record with the latest start date. If there are still ties, select the record with the highest salary amount.
Focus appropriately — use current employees for risk analysis, all historical data for retention rates, and former employees for trend analysis.

The comprehensive analysis will help identify retention patterns, at-risk employees, and historical turnover trends to guide strategic workforce planning.

Verify

Python

"""
Verification script for PostgreSQL Task 2: Employee Retention Analysis
"""

import os
import sys
import psycopg2
from decimal import Decimal

def rows_match(actual_row, expected_row):
    """
    Compare two rows with appropriate tolerance.
    For Decimal types: allows 0.1 tolerance
    For other types: requires exact match
    """
    if len(actual_row) != len(expected_row):
        return False
    
    for actual, expected in zip(actual_row, expected_row):
        if isinstance(actual, Decimal) and isinstance(expected, Decimal):
            if abs(float(actual) - float(expected)) > 0.1:
                return False
        elif actual != expected:
            return False
    
    return True

def get_connection_params() -> dict:
    """Get database connection parameters."""
    return {
        "host": os.getenv("POSTGRES_HOST", "localhost"),
        "port": int(os.getenv("POSTGRES_PORT", 5432)),
        "database": os.getenv("POSTGRES_DATABASE"),
        "user": os.getenv("POSTGRES_USERNAME"),
        "password": os.getenv("POSTGRES_PASSWORD")
    }

def verify_retention_analysis_results(conn) -> bool:
    """Verify the employee retention analysis results."""
    with conn.cursor() as cur:
        # Get actual results from the created table
        cur.execute("""
            SELECT department_name, total_employees_ever, current_employees, 
                   former_employees, retention_rate
            FROM employees.employee_retention_analysis
            ORDER BY department_name
        """)
        actual_results = cur.fetchall()
        
        # Execute ground truth query
        cur.execute("""
            SELECT
            d.dept_name AS department_name,
            COUNT(DISTINCT de.employee_id) AS total_employees_ever,
            COUNT(DISTINCT de.employee_id) FILTER (WHERE de.to_date = DATE '9999-01-01') AS current_employees,
            (COUNT(DISTINCT de.employee_id)
            - COUNT(DISTINCT de.employee_id) FILTER (WHERE de.to_date = DATE '9999-01-01')) AS former_employees,
            (COUNT(DISTINCT de.employee_id) FILTER (WHERE de.to_date = DATE '9999-01-01'))::DECIMAL
                / NULLIF(COUNT(DISTINCT de.employee_id), 0) * 100 AS retention_rate
            FROM employees.department d
            LEFT JOIN employees.department_employee de
            ON d.id = de.department_id
            GROUP BY d.id, d.dept_name
            ORDER BY d.dept_name
        """)
        expected_results = cur.fetchall()

        if len(actual_results) != len(expected_results):
            print(f"❌ Expected {len(expected_results)} retention analysis results, got {len(actual_results)}")
            return False

        mismatches = 0
        for i, (actual, expected) in enumerate(zip(actual_results, expected_results)):
            if not rows_match(actual, expected):
                if mismatches < 5:  # Only show first 5 mismatches
                    print(f"❌ Row {i+1} mismatch: expected {expected}, got {actual}")
                mismatches += 1

        if mismatches > 0:
            print(f"❌ Total mismatches: {mismatches}")
            return False

        print(f"✅ Employee retention analysis results are correct ({len(actual_results)} records)")
        return True

def verify_high_risk_results(conn) -> bool:
    """Verify the high risk employee analysis results."""
    with conn.cursor() as cur:
        # Get actual results from the created table
        cur.execute("""
            SELECT employee_id, full_name, current_department, tenure_days, 
                   current_salary, risk_category
            FROM employees.high_risk_employees
            ORDER BY employee_id
        """)
        actual_results = cur.fetchall()
        
        # Execute ground truth query - only current employees
        cur.execute("""
            WITH current_salary AS (
            SELECT employee_id, amount AS current_amount
            FROM (
                SELECT s.*,
                    ROW_NUMBER() OVER (PARTITION BY s.employee_id
                                        ORDER BY s.from_date DESC, s.amount DESC) AS rn
                FROM employees.salary s
                WHERE s.to_date = DATE '9999-01-01'
            ) x
            WHERE rn = 1
            ),
            current_dept AS (
            SELECT employee_id, department_id
            FROM (
                SELECT de.*,
                    ROW_NUMBER() OVER (PARTITION BY de.employee_id
                                        ORDER BY de.from_date DESC, de.department_id) AS rn
                FROM employees.department_employee de
                WHERE de.to_date = DATE '9999-01-01'
            ) x
            WHERE rn = 1
            ),
            dept_retention AS (
            SELECT
                d.id   AS department_id,
                d.dept_name,
                COUNT(DISTINCT de.employee_id) AS total_employees_ever,
                COUNT(DISTINCT de.employee_id) FILTER (WHERE de.to_date = DATE '9999-01-01') AS current_employees,
                (COUNT(DISTINCT de.employee_id) FILTER (WHERE de.to_date = DATE '9999-01-01'))::NUMERIC
                / NULLIF(COUNT(DISTINCT de.employee_id), 0) * 100 AS retention_rate
            FROM employees.department d
            LEFT JOIN employees.department_employee de
                    ON de.department_id = d.id
            GROUP BY d.id, d.dept_name
            )
            SELECT
            e.id AS employee_id,
            CONCAT(e.first_name, ' ', e.last_name) AS full_name,
            d.dept_name AS current_department,
            (CURRENT_DATE - e.hire_date)::INTEGER AS tenure_days,
            cs.current_amount::INTEGER AS current_salary,
            CASE
                WHEN dr.retention_rate < 80  AND (CURRENT_DATE - e.hire_date) < 1095 THEN 'high_risk'
                WHEN dr.retention_rate < 85  AND (CURRENT_DATE - e.hire_date) < 1825 THEN 'medium_risk'
                ELSE 'low_risk'
            END AS risk_category
            FROM employees.employee e
            JOIN current_salary cs ON cs.employee_id = e.id
            JOIN current_dept   cd ON cd.employee_id = e.id
            JOIN employees.department d ON d.id = cd.department_id
            JOIN dept_retention dr ON dr.department_id = d.id
            ORDER BY e.id;
        """)
        expected_results = cur.fetchall()

        if len(actual_results) != len(expected_results):
            print(f"❌ Expected {len(expected_results)} high risk analysis results, got {len(actual_results)}")
            return False

        mismatches = 0
        for i, (actual, expected) in enumerate(zip(actual_results, expected_results)):
            if not rows_match(actual, expected):
                if mismatches < 5:  # Only show first 5 mismatches
                    print(f"❌ Row {i+1} mismatch: expected {expected}, got {actual}")
                mismatches += 1

        if mismatches > 0:
            print(f"❌ Total mismatches: {mismatches}")
            return False

        print(f"✅ High risk employee analysis results are correct ({len(actual_results)} records)")
        return True

def verify_turnover_trend_results(conn) -> bool:
    """Verify the turnover trend analysis results."""
    with conn.cursor() as cur:
        # Get actual results from the created table
        cur.execute("""
            SELECT departure_year, departures_count, avg_tenure_days, avg_final_salary
            FROM employees.turnover_trend_analysis
            ORDER BY departure_year
        """)
        actual_results = cur.fetchall()
        
        # Execute ground truth query - simplified version
        cur.execute("""
            WITH last_non_current_salary AS (
            SELECT
                s.employee_id,
                s.to_date      AS departure_date,
                s.amount       AS final_salary,
                ROW_NUMBER() OVER (
                PARTITION BY s.employee_id
                ORDER BY s.to_date DESC, s.from_date DESC, s.amount DESC
                ) AS rn
            FROM employees.salary s
            WHERE s.to_date <> DATE '9999-01-01'
                AND NOT EXISTS (
                SELECT 1
                FROM employees.salary s_cur
                WHERE s_cur.employee_id = s.employee_id
                    AND s_cur.to_date = DATE '9999-01-01'
                )
            ),
            departed AS (
            SELECT employee_id, departure_date, final_salary
            FROM last_non_current_salary
            WHERE rn = 1
            ),
            with_tenure AS (
            SELECT
                e.id AS employee_id,
                d.departure_date,
                d.final_salary,
                (d.departure_date - e.hire_date)::INTEGER AS tenure_days
            FROM employees.employee e
            JOIN departed d ON d.employee_id = e.id
            )
            SELECT
            EXTRACT(YEAR FROM departure_date)::INTEGER AS departure_year,
            COUNT(*)::INTEGER                         AS departures_count,
            AVG(tenure_days)                          AS avg_tenure_days,
            AVG(final_salary)                         AS avg_final_salary
            FROM with_tenure
            WHERE departure_date BETWEEN DATE '1985-01-01' AND DATE '2002-12-31'
            GROUP BY EXTRACT(YEAR FROM departure_date)
            ORDER BY departure_year;
        """)
        expected_results = cur.fetchall()

        if len(actual_results) != len(expected_results):
            print(f"❌ Expected {len(expected_results)} turnover trend results, got {len(actual_results)}")
            return False

        mismatches = 0
        for i, (actual, expected) in enumerate(zip(actual_results, expected_results)):
            if not rows_match(actual, expected):
                if mismatches < 5:  # Only show first 5 mismatches
                    print(f"❌ Row {i+1} mismatch: expected {expected}, got {actual}")
                mismatches += 1

        if mismatches > 0:
            print(f"❌ Total mismatches: {mismatches}")
            return False

        print(f"✅ Turnover trend analysis results are correct ({len(actual_results)} records)")
        return True

def main():
    """Main verification function."""
    print("=" * 50)

    # Get connection parameters
    conn_params = get_connection_params()

    if not conn_params["database"]:
        print("❌ No database specified")
        sys.exit(1)

    try:
        # Connect to database
        conn = psycopg2.connect(**conn_params)

        # Verify all three analysis results
        success = (
            verify_retention_analysis_results(conn) and 
            verify_high_risk_results(conn) and 
            verify_turnover_trend_results(conn)
        )

        conn.close()

        if success:
            print("\n🎉 Task verification: PASS")
            sys.exit(0)
        else:
            print("\n❌ Task verification: FAIL")
            sys.exit(1)

    except psycopg2.Error as e:
        print(f"❌ Database error: {e}")
        sys.exit(1)
    except Exception as e:
        print(f"❌ Verification error: {e}")
        sys.exit(1)

if __name__ == "__main__":
    main()