Posted on 2026/04/10

Remote AI Quality Evaluator (Japanese)

Turing

San Antonio, TX, United States

Full-time

Job description

About Turing:

Turing is the premier research accelerator for cutting-edge AI labs and a trusted collaborator for global businesses implementing sophisticated AI solutions.

We offer support in two key areas: enhancing frontier research with high-quality data, advanced training frameworks, and elite AI researchers skilled in coding, reasoning, STEM, multilingualism, multimodality, and agent-based systems; and transforming AI from concept to proprietary intelligence to ensure reliable performance, significant impact, and lasting benefits on profit and loss.

Role Overview:

As an AI Quality Evaluator, you will examine a novel personalization feature for Gemini.

Your task will be to assess how effectively the model utilizes information from your previous Gemini conversations, Gmail, Google Search, and YouTube activity to provide more relevant and helpful responses.

This position demands a unique combination of creativity and analytical precision.

You will craft prompts drawn from your own experiences and employ your analytical skills to evaluate the quality of the model's personalized replies, focusing on aspects such as Grounding, Integration, and Helpfulness.

Key Qualifications:

• Japanese Proficiency: Ability to read and write in Japanese fluently, as Japanese is the primary language for this project.

• Personal Account Usage: Willingness to utilize your main personal Google account (not a testing account) and enable personal data sources for an authentic evaluation.

• Schedule Flexibility: Availability for full-time work in your local time zone is essential. We require coverage from a global, 24-hour operations team.

• Exceptional Analytical Thinking: Capability to evaluate nuanced and ambiguous AI responses, specifically judging the quality of personalization.

• Creative Prompt Engineering: Experience in developing innovative, multi-turn starting prompts based on personal context to rigorously test the model's capabilities.

• Strong Evaluation Acumen: Familiarity with personalization concepts, including the ability to spot incorrect personalization, flawed inferences, and forced connections.

• Meticulous Attention to Detail: Skill in reviewing Side-by-Side (SxS) model responses and identifying subtle nuances in naturalness and excessive narration.

• Excellent Written Communication: Strong ability to articulate clear and concise rationales for model rankings, explicitly citing specific turn numbers.

• Feedback: Capacity to provide constructive critiques and detailed annotations.

• Communication: Outstanding communication and collaboration abilities.

• Independence: Self-motivated and capable of working autonomously in a remote environment.

• Technical Setup: A desktop or laptop with a reliable internet connection is required.

Description:

• Join a dynamic team dedicated to evaluating the quality of personalized AI interactions.

Your Day-to-Day Responsibilities Will Include:

• Creating and executing multi-turn conversational prompts (typically 1-5 turns) that compel the AI to draw upon your personal information and experiences.

• Assessing model responses based on your initial intent, verifying the appropriateness of the personalization.

• Analyzing responses for Grounding issues, ensuring that claims about you are substantiated by evidence rather than flawed reasoning or hallucinations.

• Evaluating Integration quality to ensure that personal data is seamlessly incorporated into the response without robotic or excessive narration.

• Thoroughly comparing and ranking two model responses side-by-side (SxS) to identify which is more helpful, user-friendly, and enjoyable.

• Providing clear and defensible justifications for your evaluations, explicitly highlighting issues or commendable aspects of the conversations.

• Extracting and confirming "Debug Info" from the model to ensure that chat summaries and data sources were effectively utilized.

• Upholding rigorous data hygiene by deleting evaluation conversations to avoid contaminating your future chat history.

Education & Experience:

• A BS/BA degree or equivalent experience in a relevant field (e.g., Policy, Law, Ethics, Linguistics, Journalism, Computer Science, or a related analytical area).

• Preferred experience in data annotation, AI quality assessment, content moderation, or a related field.

Offer Details:

• Commitments Required: A minimum of 4 hours per day and 30 hours per week, including 4 hours of overlap with Pacific Standard Time (PST).

• Two options for time commitment are available: 30 hrs/week or 40 hrs/week.

• Engagement Type: Contractor.

• Engagement Length: 3 months.

After applying, you will receive an email with a login link to complete your profile on the portal.

Know exceptional talent?

Refer them to earn money from your network.
