Posted on 2025/12/08
Remote Data Scientist - AI Trainer ($100-$120 per hour)
Mercor
San Diego, CA, United States
Full-time
Full Description
Job Description: AI Task Evaluation & Statistical Analysis Specialist
## Role Overview

We're seeking a data-driven analyst to conduct comprehensive failure analysis on AI agent performance across finance-sector tasks. You'll identify patterns, root causes, and systemic issues in our evaluation framework by analyzing task performance across multiple dimensions (task types, file types, criteria, etc.).

## Key Responsibilities

- Statistical Failure Analysis: Identify patterns in AI agent failures across task components (prompts, rubrics, templates, file types, tags)
- Root Cause Analysis: Determine whether failures stem from task design, rubric clarity, file complexity, or agent limitations
- Dimension Analysis: Analyze performance variations across finance sub-domains, file types, and task categories
- Reporting & Visualization: Create dashboards and reports highlighting failure clusters, edge cases, and improvement opportunities
- Quality Framework: Recommend improvements to task design, rubric structure, and evaluation criteria based on statistical findings
- Stakeholder Communication: Present insights to data labeling experts and technical teams

## Required Qualifications

- Statistical Expertise: Strong foundation in statistical analysis, hypothesis testing, and pattern recognition
- Programming: Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for data analysis
- Data Analysis: Experience with exploratory data analysis and creating actionable insights from complex datasets
- AI/ML Familiarity: Understanding of LLM evaluation methods and quality metrics
- Tools: Comfortable working with Excel, data visualization tools (Tableau/Looker), and SQL

## Preferred Qualifications

- Experience with AI/ML model evaluation or quality assurance
- Background in finance or willingness to learn finance domain concepts
- Experience with multi-dimensional failure analysis
- Familiarity with benchmark datasets and evaluation frameworks
- 2-4 years of relevant experience