Project Sigma
Maker Onboarding
Agenda
1. Project Overview
2. Project Objective
3. Mathematics Dataset
4. Formatting
5. Question Creation
6. Stumping the Model
7. Answer Creation
8. Explanation Creation
9. Quality Rubric
The following documentation provides a comprehensive approach to processing tasks within Project
Sigma. While your academic training has equipped you with exceptional analytical skills, these guidelines
ensure consistency across our diverse contributor base while maintaining the high standards expected of
this project.
We recommend:
● 1. Reading the guidelines thoroughly before beginning any tasks
● 2. Referring back to specific sections when questions arise
● 3. Applying your domain expertise within the framework provided
● 4. Noting any areas where your specialized knowledge might suggest improvements
Project Overview
The project's purpose is to create and curate high-quality, diverse math reasoning datasets that enable the training and evaluation
of AI models in solving complex mathematical problems. The project aims to ensure that the data
challenges models to demonstrate logical reasoning, step-by-step problem-solving, and accurate
application of mathematical concepts, ultimately improving the AI’s ability to understand, explain, and solve
math problems in a manner consistent with human reasoning.
Project Objective
Questions should be original, non-routine problems that require analytical skills and a deep understanding
of mathematical principles, focusing on reasoning ability. Questions must not be copied from or found in
public archives. Here are some examples of sufficiently complex questions:
Mathematics Dataset
Formatting
Contributors are required to write all mathematical expressions and equations using Markdown with
LaTeX syntax where applicable. The tool provides a built-in Markdown editor to assist with both question
and answer writing, ensuring proper formatting and readability.
● Contributors may use iMathEQ to create complex mathematical equations, then copy and paste the
generated LaTeX code into the Markdown editor, enclosed with “$” symbols as prefix and suffix, to
ensure proper rendering.
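To illustrate the “$” delimiters described above, here is a minimal sketch of how a question might look in the Markdown editor. Both prompts are hypothetical and are shown only to demonstrate formatting; they are not taken from the project dataset and are not meant to indicate the required difficulty.

```latex
Find all positive integers $n$ such that $n^2 + 3n + 5$ is divisible by $121$.

Evaluate $\int_0^1 x^2 \, dx$ and express the result as a fraction $\frac{p}{q}$ in lowest terms.
```

Each expression sits between an opening and a closing “$”, so variables, exponents, integrals, and fractions all render as mathematics rather than plain text.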
Note: The following math question is too simple to stump the real model.
Stumping the Model
● You should write questions with the goal of stumping the model. Stumping the model means that when you click the
“Test Model Response” button, the model should generate an incorrect answer.
● The more complex your question is, the more likely you are to stump the model.
● If you successfully stump the model, then you will write the correct answer and the chain-of-thought explanation to
demonstrate how to actually find the answer.
● If the model gets the correct answer, then you will need to write a more complex question and try to stump the
model again.
● Utilize questions like the examples provided in the previous slides only as benchmarks and
inspiration for question style, complexity, and the type of reasoning required.
● Create unique, original and complex reasoning questions and test them against the provided LLM
model for benchmarking (“Test Model Response” button in Task UI).
Question Creation
● If the model is unable to answer the question correctly, the maker should proceed and add a short answer, a detailed
explanation, and the chain of thought required to answer the question.
● If the model is able to answer the question correctly, the maker is required to modify the question or create a new
one to increase its complexity. If instead the model's solution contains logical errors, the maker can go ahead with
the short answer and the detailed explanation, making sure the chain of thought added is accurate.
● Finally, the maker can proceed with filling in the mandatory attributes and adding reference images and citations,
if applicable.
Question Creation
01 Create complex reasoning questions and test them against the provided LLM for benchmarking.
02 If the model is unable to answer the question correctly: add a short answer, a detailed explanation,
and the chain of thought required to answer the question.
03 If the model is able to answer the question correctly: modify the question or create a new question
to increase its complexity.
04 If the model's solution contains logical errors: go ahead with the short answer and the detailed
explanation, making sure the chain of thought added is accurate.
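The decision flow above can be sketched in Python. This is only an illustration of the logic: the outcome names and the returned action strings are hypothetical labels chosen for this example, and nothing here is part of the task tool itself.

```python
from enum import Enum


class ModelOutcome(Enum):
    """Possible results of clicking "Test Model Response" (illustrative names)."""
    INCORRECT_ANSWER = "incorrect_answer"   # the model was stumped
    CORRECT_ANSWER = "correct_answer"       # the model solved the question
    FLAWED_REASONING = "flawed_reasoning"   # the model's solution has logical errors


def next_step(outcome: ModelOutcome) -> str:
    """Return the maker's next action for a given model outcome."""
    if outcome is ModelOutcome.CORRECT_ANSWER:
        # The question was not hard enough: revise it or write a new one.
        return "revise_question"
    # Incorrect answer or flawed reasoning: write the short answer,
    # the detailed explanation, and an accurate chain of thought.
    return "write_answer_and_explanation"
```

For example, `next_step(ModelOutcome.CORRECT_ANSWER)` returns `"revise_question"`, while the other two outcomes lead to writing the answer and explanation.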
Answer Creation
● The answer is the final result obtained by solving the question. It should directly address the question.
● The answer should just be the answer. Any explanation for how to get the answer should go in the
following explanation section.
● Accuracy and Factual Correctness: All information should be factually accurate. Contributors
should cross-verify details with reliable sources or textbooks before submission, if needed.
● Relevance to the question: Answers must directly address the question without
diverging from the core topic. Each answer should fully resolve the query in a concise and focused
manner.
Explanation Creation
● Developing the Chain-of-Thought Explanation: Along with the correct answer, contributors must
provide a complete Chain-of-Thought explanation, which needs to be detailed, providing the
step-by-step logical process, reasoning, and calculations to arrive at the correct answer from the
information given in the question. The explanation should be clear, comprehensive, and accurately
reflect the reasoning required.
● Clarity and Coherence: Use clear, structured language and present information in a logical
sequence. The tone and explanation should be appropriate, ensuring ease of understanding.
Metadata
● The following information should be provided for each job:
● Subdomain: Tag the question with its subdomain within the domain of mathematics
(for example, Algebra, Geometry, or Calculus). You can find the list of the most common
subdomains in the guidelines. If the drop-down does not cover the subdomain, you may
select the “others” option and then fill in the open text field.
● Source: An optional piece of metadata. If there was a paper that served as the inspiration for the
question or contained vital information related to the question, please provide information on that
paper.
Live Feedback
● REMEMBER: The Live Feedback button uses a Large Language Model to provide feedback.
● The Live Feedback is almost always wrong on “correctness” categories.
● The Live Feedback is better at noticing if incorrect formatting is used.
● The Live Feedback rubric results have no bearing on how the Reviewer or QC steps will grade your
task.
Quality Rubric - Prompt
Complexity: To what extent does the prompt require multi-step logical reasoning and the application of advanced mathematical concepts?
● 5_Very High Complexity:
○ The prompt requires a highly complex and creative chain of reasoning that is nonstandard. It involves the application of advanced
mathematical theorems or a novel synthesis of multiple complex concepts (typical of math Olympiad problems). The solution
demands significant ingenuity and deep mathematical insight.
○ Two or more distinct areas of advanced mathematics are required to determine the correct final answer.
● 4_High Complexity:
○ The prompt requires a robust chain of logical deductions and the application of advanced concepts, which may be from early
university-level mathematics (e.g., calculus, introductory number theory, vectors). The solution demands significant insight to connect
different ideas or a creative approach to problem solving.
○ Two or more distinct areas of advanced mathematics are required to determine the correct final answer.
● 3_Moderate Complexity:
○ Either the prompt correctly requires multiple, integrated logical steps and the connection of two or more distinct mathematical
concepts, but the concepts are typically from advanced high school mathematics (e.g., trigonometry, logarithms, sequences and
series).
○ Or the prompt only requires one area of advanced mathematics instead of involving multiple distinct areas, but the mathematics is
suitably advanced.
● 2_Low Complexity:
○ The prompt requires a short and predictable sequence of logical steps (2-3 steps). The reasoning follows a common, procedural
method.
○ It involves the application of standard, high school–level concepts (e.g., solving linear equations, basic geometry theorems).
● 1_Very Low Complexity:
○ The prompt requires a single, straightforward step or the direct application of a basic, well-known formula. The reasoning is minimal,
and the concepts involved are elementary (e.g., basic arithmetic, simple algebraic manipulation). The solution path is immediately
obvious.
Quality Rubric - Prompt
Domain Match: How closely does the selected domain in the metadata match the prompt?
● 5_Perfectly Aligned: The selected domain is the primary and essential field required to solve the prompt. The core concepts, theorems, and
solution methods are central to this domain. It is the most accurate and specific classification possible.
○ Example: The prompt is about finding the number of integer solutions to an equation, and the selected domain is Number Theory >
Diophantine Equations.
● 4_Closely Aligned: The selected domain is a major component of the prompt's solution. It is either one of two equally important domains
required to solve the problem, or it is a correct broader category where a more specific subdomain would have been the perfect choice.
○ Example: The prompt requires both Combinatorics and Probability, and Combinatorics is selected. Or, the prompt is about Prime
Factorization, and the broader domain of Number Theory is selected.
● 3_Somewhat Aligned: The selected domain is relevant and plays a secondary or supporting role in the solution, but it is not the primary
domain. A more central or specific domain could have been chosen to better represent the prompt's main focus.
○ Example: The prompt involves using trigonometric identities to solve a geometric problem (Geometry), but the selected domain is
Trigonometry.
● 2_Poorly Aligned: The selected domain has only a minor or tangential connection to the prompt. While a concept from this
domain might be mentioned or used in a trivial way, it is not at all central to the required reasoning or solution method. The choice is a
significant misrepresentation of the prompt's core challenge.
○ Example: The prompt is a complex Diophantine equation (Number Theory) that happens to involve a quadratic, but the selected
domain is Algebra > Theory of Equations.
● 1_Not Aligned: The selected domain has no connection to the mathematical concepts or reasoning required by the prompt. The selection is
factually incorrect and does not reflect any part of the problem or its solution.
○ Example: The prompt is about Euclidean Geometry, but the selected domain is Number Theory.
Quality Rubric - Prompt
Well Defined: Is the prompt unambiguous, self-contained (no missing assumptions), and clearly formulated?
● Yes_the prompt:
○ Unambiguous: All terms, conditions, and requirements are clearly defined with no room for multiple interpretations.
○ Self-contained: All necessary information to solve the problem is provided within the prompt, with no unstated or implied
assumptions required.
○ Clearly formulated: The question or task is presented in a logical, well-structured manner that is easy to understand.
○ Complete: All relevant variables, constraints, and conditions are explicitly stated.
○ Consistent: There are no contradictions or conflicting information within the prompt.
○ Precise language: Mathematical terms and concepts are used accurately and appropriately.
● No_the prompt:
○ Contains ambiguities: Some terms, conditions, or requirements are open to multiple interpretations.
○ Missing information: Unstated or implied assumptions are necessary to solve the problem.
○ Unclear formulation: The question is presented in a confusing or illogical manner.
○ Incomplete: Some relevant variables, constraints, or conditions are not explicitly stated.
○ Inconsistent: Contains contradictions or conflicting information.
○ Imprecise language: Mathematical terms or concepts are used inaccurately or inappropriately.
Quality Rubric - Prompt
Formatting: Is LaTeX formatting used consistently and correctly for the prompts?
● Yes:
○ LaTeX formatting is applied consistently throughout the prompt, ensuring uniformity in the presentation of mathematical expressions
and symbols.
○ All mathematical expressions, equations, and symbols are correctly formatted using LaTeX, with no errors or inconsistencies.
● No:
○ LaTeX formatting is not applied consistently, leading to variations in the presentation of mathematical expressions and symbols.
○ There are errors or inconsistencies in the LaTeX formatting of mathematical expressions, equations, or symbols.
Naturalness: Is the prompt grammatical and natural sounding, and written using an academic register?
● 5_Excellent: The language is clear, precise, and natural-sounding, with no grammatical errors. It maintains a consistent and appropriate
academic register, using precise mathematical terminology correctly.
● 4_Good / Mostly Correct: The prompt is free from grammatical errors. The language is for the most part clear and natural sounding and is
easy to comprehend. The academic register is appropriate and consistent throughout the prompt, requiring minimal or no editing.
● 3_Moderate: The prompt is mostly understandable but contains noticeable minor grammatical or stylistic errors. The language is mostly
natural but may have some awkward sentences that could be improved. The academic register is generally maintained but may have minor
inconsistencies. The prompt is functional but requires editing.
● 2_Below Average: The prompt contains significant grammatical errors that impact readability. The language is frequently unnatural or
confusing. The academic register may be inconsistent. While the core mathematical question is somewhat usable, the text requires substantial
editing for clarity and correctness.
● 1_Poor: The prompt contains numerous critical grammatical errors that make it difficult to understand. The language is unnatural and
awkward. The register may also not be appropriate for a mathematical context. The prompt is fundamentally flawed and requires a complete
rewrite.
Quality Rubric - Final Answer
Correctness: Is the final answer the mathematically-correct answer to the prompt?
● Yes:
○ The final answer provided is accurate and has been verified and validated against the prompt, ensuring that it is
the correct solution.
● No:
○ The answer does not match the expected solution based on the prompt.
Formatting: Is LaTeX formatting used correctly and consistently for the final answers?
● Yes:
○ LaTeX formatting is applied correctly in the final answer, ensuring uniformity in the presentation of mathematical
expressions and symbols.
○ All mathematical expressions, equations, and symbols are correctly formatted using LaTeX, with no errors or
inconsistencies.
● No:
○ LaTeX formatting is not applied correctly in the final answer, leading to variations in the presentation of
mathematical expressions and symbols.
○ There are errors or inconsistencies in the LaTeX formatting of mathematical expressions, equations, or symbols.
Quality Rubric - Explanation
Chain-of-Thought Reasoning: To what extent does the explanation follow clear, logical steps to demonstrate the reasoning
process?
● 5_Perfect Reasoning: The explanation perfectly lays out all of the logical steps required to solve the prompt. Every
step is clearly, logically, and elegantly presented.
● 4_Well Reasoned: The explanation does a good job laying out all of the logical steps required to solve the prompt.
Each step is clear and logical, but minor improvements could be made.
● 3_Minor Gaps: The explanation generally does a good job of laying out the logical steps required to solve the prompt,
but there are minor gaps in the reasoning process. There may also be minor problems with how the reasoning steps
connect to one another. The explanation is functional but requires editing.
● 2_Major Gaps: The explanation contains major gaps in explaining the logical steps required to solve the prompt.
There may also be major issues with how each step connects to the next. The explanation
requires major revision.
● 1_Poor: The explanation completely fails to explain the logical steps required to solve the prompt. A complete rewrite
of the explanation is needed.
Quality Rubric - Explanation
Naturalness: Is the explanation grammatical and natural sounding, and written using an academic register?
● 5_Excellent: The language of the explanation is clear, precise, and natural-sounding, with no grammatical errors. It
maintains a consistent and appropriate academic register, using precise mathematical terminology correctly.
● 4_Good / Mostly Correct: The explanation is free from grammatical errors. The language is for the most part clear
and natural sounding and is easy to comprehend. The academic register is appropriate and consistent throughout the
explanation, requiring minimal or no editing.
● 3_Moderate: The explanation is mostly understandable but contains noticeable minor grammatical or stylistic errors.
The language is mostly natural but may have some awkward sentences that could be improved. The academic
register is generally maintained but may have minor inconsistencies. The explanation is functional but requires editing.
● 2_Below Average: The explanation contains significant grammatical errors that impact readability. The language is
frequently unnatural or confusing. The academic register may be inconsistent. While the explanation is somewhat
usable, the text requires substantial editing for clarity and correctness.
● 1_Poor: The explanation contains numerous critical grammatical errors that make it difficult to understand. The
language is unnatural and awkward. The register may also not be appropriate for a mathematical context. The
explanation is fundamentally flawed and requires a complete rewrite.
Quality Rubric - Explanation
Use of Proper Notation: To what extent are mathematical symbols, formulas, and terminology used consistently and
correctly in the explanation?
● 5_Excellent: The explanation is completely correct and consistent in its use of proper mathematical notation.
● 4_Correct / Mostly Consistent: The explanation is completely correct in its use of proper mathematical notation. It
may contain minor inconsistencies with its mathematical notation (for example, mixing different versions of
terminology referring to the same thing) that do not affect correctness.
● 3_Mostly Correct / Inconsistent: The explanation is mostly correct in its use of proper mathematical notation. Either
minor errors are present or there are major inconsistencies in how mathematical notation is used. The explanation is
functional but requires editing.
● 2_Incorrect: The explanation contains major errors in proper mathematical notation. It may also have issues in
consistency of how mathematical notation is used. While the explanation is somewhat usable, the text requires
substantial editing for proper mathematical notation.
● 1_Poor: The explanation contains significant errors in proper mathematical notation. The explanation is fundamentally
flawed and requires a complete rewrite.
Quality Rubric - Explanation
Formatting: Is LaTeX formatting used consistently and correctly in the explanation?
● Yes:
○ LaTeX formatting is applied correctly in the explanation, ensuring uniformity in the presentation of mathematical expressions and
symbols.
○ All mathematical expressions, equations, and symbols are correctly formatted using LaTeX, with no errors or inconsistencies.
● No:
○ LaTeX formatting is not applied correctly in the explanation, leading to variations in the presentation of mathematical expressions and
symbols.
○ There are errors or inconsistencies in the LaTeX formatting of mathematical expressions, equations, or symbols.
Completeness: To what extent does the explanation fully solve all parts of the problem?
● 5_Complete: The explanation completely answers the prompt. There is nothing missing from the process of finding the solution, and all parts
of the prompt are answered completely.
● 4_Mostly Complete: The explanation for the most part completely answers the prompt. The process for finding the solution could possibly be
better fleshed out, but all parts of the prompt are answered completely.
● 3_Almost Complete: The explanation comes close to completely answering the prompt, but either is lacking some major parts of explaining
the process of finding the solution or fails to answer some minor parts of the prompt. The explanation is functional but requires editing.
● 2_Incomplete: The explanation fails to answer major parts of the prompt and may also lack major parts of explaining the process for finding
the solution. While the explanation is somewhat usable, the text is missing significant parts of what is needed for the explanation to be
complete.
● 1_Poor: The explanation completely fails to answer the prompt. The explanation requires a complete rewrite.
Quality Rubric - Explanation
No Hallucination: Is the explanation free from invented definitions, theorems, or data not provided in the prompt or
commonly known?
● Yes:
○ The solution is free from invented definitions, theorems, or data not provided in the prompt or commonly known.
● No:
○ The solution contains invented definitions, theorems, or data not provided in the prompt or commonly known.
Conciseness: To what extent does the explanation avoid overly verbose explanations or redundant steps?
● 5_Concise: The explanation contains precisely as much information as is needed to answer the prompt. There is no
irrelevant or redundant information contained in the explanation. Syntax and word choice are spot on to elegantly and
concisely explain how the correct answer is found.
● 4_Mostly Concise: The explanation is concise and does not contain irrelevant or redundant information. It could
potentially be written more elegantly.
● 3_Almost Concise: The explanation is almost concise but does contain some irrelevant or redundant information. The
explanation is functional but requires editing.
● 2_Verbose: The explanation contains a lot of irrelevant or redundant information. Major edits are needed to fix this
explanation.
● 1_Poor: The explanation is almost entirely redundant or irrelevant information. A complete rewrite of the explanation is
needed.
Example
● Subdomain: Graph Theory
● Source: “”
● Correct Final Answer: 5
● Explanation (Chain of Thought): see next slides
Example (continued)
Quality Control & Continuous Improvement Notice
As part of our continuous improvement expectations, we are implementing strict quality
control measures.
● If your failure rate exceeds 25%, you will be pulled from production and assigned
to re-training.
● You will then have 5 business days to demonstrate improvement following the
training.
● Failure to show improvement after this period will result in removal from the project.
We strongly encourage you to make use of the feedback provided by the QC team to
improve continuously throughout the project.
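As a rough illustration of the 25% threshold above, the check can be sketched as follows. This sketch assumes the rate is computed as failed tasks divided by reviewed tasks; the notice does not define the denominator, and the function name and parameters are hypothetical.

```python
def needs_retraining(failed_tasks: int, reviewed_tasks: int, threshold: float = 0.25) -> bool:
    """Return True when the share of failed tasks is strictly above the threshold.

    Assumption: rate = failed_tasks / reviewed_tasks. The project may
    compute the rate differently.
    """
    if reviewed_tasks <= 0:
        # No reviewed tasks yet, so there is nothing to evaluate.
        return False
    return failed_tasks / reviewed_tasks > threshold
```

Under this assumption, 3 failed tasks out of 10 reviewed (30%) would trigger re-training, while 1 failed task out of 4 (exactly 25%) would not, because the notice says "exceeds".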
Rework of Tasks
To prevent multiple rework loops and ensure efficient use of contributor efforts, we are
implementing the following restrictions moving forward:
● Only one rework per task is allowed at the maker stage. If a job exceeds the rework
limit and is rejected a second time at the review step, it will be cancelled and not
accounted for.
● Read reviewer feedback carefully and fix all errors during the first rework attempt.
● Makers are expected to understand quality guidelines and improve task quality to remain
eligible for the project.
● If you disagree with the reviewer’s feedback or wish to challenge a failed decision,
submit the Job ID to community_support@telusinternational.ai, so the project team can
investigate further.
Please note: your performance is tracked based on the number of reworks and cancelled jobs, as
well as the quality of your tasks.
Time-out Tasks
We observed many tasks timing out at the maker step; timed-out tasks are not reassigned
to the same maker. Please remember:
● FTS will time out if left idle (maximum of 3 hours).
● Exceeding this limit will result in automatic exit and loss of progress, so plan your
work accordingly.
● You should open, save, and exit tasks properly to keep your workspace clean and
avoid data loss.
Unique Prompts
● The content created for each task must be unique and originally created by you.
● Using AI models or internet websites to create content is NOT allowed for this
project.
● The use of AI to create content, or copying content from websites, will result in
disqualification and removal from the project.
🔍 Task Description:
The minimum expected commitment is 5 hours a week, with no maximum limit.
● Tasks are available on a first-come, first-served basis.
● If you don’t see tasks when you log in, it likely means others have already picked
them up. Keep checking!
● 📌 Tip: FTS does not send notifications, so we recommend checking the queue daily
to stay active and maximize your opportunities.
📊 Performance Expectations
Once in production, we will track your performance under live production standards,
including:
● Average Handling Time (AHT): 70 minutes. This is monitored closely.
● Note: Consistently exceeding AHT expectations may lead to removal from the project.
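For reference, an average handling time can be checked against the 70-minute expectation as in the sketch below. It assumes AHT is the arithmetic mean of per-task durations in minutes; how the platform actually measures and aggregates handling time is not stated here, and all names are hypothetical.

```python
from statistics import mean

AHT_TARGET_MINUTES = 70  # expectation stated in the onboarding material


def average_handling_time(durations_minutes: list[float]) -> float:
    """Arithmetic mean of task durations, in minutes (assumed definition of AHT)."""
    return mean(durations_minutes)


def exceeds_aht(durations_minutes: list[float], target: float = AHT_TARGET_MINUTES) -> bool:
    """Return True when the average duration is strictly above the target."""
    return average_handling_time(durations_minutes) > target
```

With this definition, two tasks of 60 and 90 minutes average 75 minutes and exceed the target, while tasks of 60 and 80 minutes average exactly 70 minutes and do not. The list must contain at least one duration, since `mean` raises an error on empty input.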
Annotation Platform
You will be required to complete your tasks using FTS Studio for this project. You can access
your tasks using this link:
https://fts-app.playment.io/
Additionally, we’ve emailed you a how-to document to further assist you in navigating the
tool. The document provides step-by-step instructions for logging in to the tool and proceeding
with your assigned task.
Guidelines
To be successful in this project, you must study and deeply understand the project
guidelines. This is the best way to ensure success and that your tasks are compensated.
📓 Guidelines:
There will be some minor updates to the guidelines completed later today, so please
reread the guidelines over the next few days.
Next Steps
● Project Team Intro & Q&A Session
● Application & Pre-qualification
● Task Completion
● Simulation / Qualification
● Project Training & Guidelines Review
● Production
Pay Rate or Portal Questions
For any questions regarding your pay rate or the AI Community Portal, you
may write to payments_support@telusinternational.ai or contact your sourcing
representative.
LIVE DEMONSTRATION
with Reza
Q&A with you!
Thank you for your attention!

Guidelines about how to train a neural network

  • 1.
  • 2.
    Agenda 1. Project Overview 2.Project Objective 3. Mathematics Data Set 4. Formatting 5. Question Creation 6. Stumping the Model 7. Answer Creation 8. Explanation Creation 9. Quality Rubric
  • 3.
    The following documentationprovides a comprehensive approach to processing tasks within Project Sigma. While your academic training has equipped you with exceptional analytical skills, these guidelines ensure consistency across our diverse contributor base while maintaining the high standards expected of this project. We recommend: ● 1. Reading the guidelines thoroughly before beginning any tasks ● 2. Referring back to specific sections when questions arise ● 3. Applying your domain expertise within the framework provided ● 4. Noting any areas where your specialized knowledge might suggest improvements Project Overview
  • 4.
    To create andcurate high-quality, diverse math reasoning datasets that enable the training and evaluation of AI models in solving complex mathematical problems. The project aims to ensure that the data challenges models to demonstrate logical reasoning, step-by-step problem-solving, and accurate application of mathematical concepts, ultimately improving the AI’s ability to understand, explain, and solve math problems in a manner consistent with human reasoning. Project Objective
  • 5.
    Questions should beoriginal, non-routine problems that require analytical skills and a deep understanding of Mathematical principles, focusing on reasoning ability. Questions must not be copied from or found in public archives. Here are some examples of sufficiently complex questions: Mathematics Dataset
  • 6.
  • 7.
    Formatting Contributors are requiredto write all mathematical expressions and equations using Markdown with LaTeX syntax where applicable. The tool provides a built-in Markdown editor to assist with both question and answer writing, ensuring proper formatting and readability. ● Contributors may use iMathEQ to create complex mathematical equations, then copy and paste the generated LaTeX code into the Markdown editor, enclosed with “$” symbols as prefix and suffix, to ensure proper rendering.
  • 8.
    Note: The followingmath question is too simple to stump the real model. Stumping the Model
  • 9.
    ● You shouldwrite questions with the goal of stumping the model. Stumping the model means that when you click the “Test Model Response” button, the model should generate an incorrect answer. ● The more complex your question is, the more likely you are to stump the model. ● If you successfully stump the model, then you will write the correct answer and the chain-of-thought explanation to demonstrate how to actually find the answer. ● If the model gets the correct answer, then you will need to write a more complex question to try to stump the model with. Stumping the Model
  • 10.
    ● Utilize questionslike the examples provided in the previous slides only as benchmarks and inspiration for question style, complexity, and the type of reasoning required. ● Create unique, original and complex reasoning questions and test them against the provided LLM model for benchmarking (“Test Model Response” button in Task UI). Question Creation
  • 11.
    ● If themodel is unable to answer the question correctly, the user should proceed and add a short answer, a detailed explanation, and the chain of thought required to answer the question. ● But if the model is able to answer the question correctly, the user is required to modify/create a new question to increase its complexity; else, if the model has made logical issues for the solution of the question, the user can go ahead with the short answer and the detailed explanation, where the chain-of-thought added should be accurate. ● And finally, the maker can proceed with filling in the mandatory attributes and adding reference images and citations, if applicable. Question Creation If the model has made logical issues for the solution of the question 04 The user can go ahead with the short answer and the detailed explanation, where the chain-of-thought added should be accurate If the model is able to answer the question correctly 03 The user is required to modify/create a new question to increase its complexity If the model is unable to answer the question correctly 02 Add a short answer, a detailed explanation, and the chain of thought required to answer the question. Create complex reasoning questions and test them against the provided LLM model for benchmarking 01
  • 12.
  • 13.
    ● The answerwill be the final answer derived from the answer. It should directly address the question. ● The answer should just be the answer. Any explanation for how to get the answer should go in the following explanation section. ● Accuracy and Factual Correctness: All information should be factually accurate. Contributors should cross-verify details with reliable sources or textbooks before submission, if needed. ● Relevance to the question: Answers must directly address the question or question without diverging from the core topic. Each answer should fully resolve the query in a concise and focused manner. Answer Creation
  • 14.
  • 15.
    Explanation Creation
    ● Developing the Chain-of-Thought Explanation: Along with the correct answer, contributors must provide a complete Chain-of-Thought explanation. It must be detailed, setting out the step-by-step logical process, reasoning, and calculations that lead from the information given in the question to the correct answer. The explanation should be clear, comprehensive, and an accurate reflection of the reasoning required.
    ● Clarity and Coherence: Use clear, structured language and present information in a logical sequence. The tone and explanation should be appropriate, ensuring ease of understanding.
  • 16.
  • 17.
    Metadata
    ● The following information should be provided for each job:
    ● Subdomain: Tag the question with its subdomain within mathematics (for example, Algebra, Geometry, Calculus). The guidelines list the most common subdomains. If the drop-down does not cover the subdomain, select the “others” option and fill in the open text field.
    ● Source: An optional piece of metadata. If a paper served as the inspiration for the question or contained vital information related to it, please provide information on that paper.
  • 18.
  • 19.
  • 20.
    Live Feedback
    ● REMEMBER: The Live Feedback button uses a Large Language Model to provide feedback.
    ● The Live Feedback is almost always wrong on “correctness” categories.
    ● The Live Feedback is better at noticing whether incorrect formatting is used.
    ● The Live Feedback rubric results have no bearing on how the Reviewer or QC steps will grade your task.
  • 21.
  • 22.
    Quality Rubric - Prompt
    Complexity: To what extent does the prompt require multi-step logical reasoning and the application of advanced mathematical concepts?
    ● 5_Very High Complexity:
      ○ The prompt requires a highly complex and creative chain of reasoning that is nonstandard. It involves the application of advanced mathematical theorems or a novel synthesis of multiple complex concepts (typical of math Olympiad problems). The solution demands significant ingenuity and deep mathematical insight.
      ○ Two or more distinct areas of advanced mathematics are required to determine the correct final answer.
    ● 4_High Complexity:
      ○ The prompt requires a robust chain of logical deductions and the application of advanced concepts, which may be from early university-level mathematics (e.g., calculus, introductory number theory, vectors). The solution demands significant insight to connect different ideas or a creative approach to problem solving.
      ○ Two or more distinct areas of advanced mathematics are required to determine the correct final answer.
    ● 3_Moderate Complexity:
      ○ Either the prompt requires multiple, integrated logical steps and the connection of two or more distinct mathematical concepts, but the concepts are typically from advanced high school mathematics (e.g., trigonometry, logarithms, sequences and series).
      ○ Or the prompt requires only one area of advanced mathematics instead of multiple distinct areas, but the mathematics is suitably advanced.
    ● 2_Low Complexity:
      ○ The prompt requires a short and predictable sequence of logical steps (2-3 steps). The reasoning follows a common, procedural method.
      ○ It involves the application of standard, high school–level concepts (e.g., solving linear equations, basic geometry theorems).
    ● 1_Very Low Complexity:
      ○ The prompt requires a single, straightforward step or the direct application of a basic, well-known formula. The reasoning is minimal, and the concepts involved are elementary (e.g., basic arithmetic, simple algebraic manipulation). The solution path is immediately obvious.
  • 23.
    Quality Rubric - Prompt
    Domain Match: How closely does the selected domain in the metadata match the prompt?
    ● 5_Perfectly Aligned: The selected domain is the primary and essential field required to solve the prompt. The core concepts, theorems, and solution methods are central to this domain. It is the most accurate and specific classification possible.
      ○ Example: The prompt is about finding the number of integer solutions to an equation, and the selected domain is Number Theory > Diophantine Equations.
    ● 4_Closely Aligned: The selected domain is a major component of the prompt's solution. It is either one of two equally important domains required to solve the problem, or it is a correct broader category where a more specific subdomain would have been the perfect choice.
      ○ Example: The prompt requires both Combinatorics and Probability, and Combinatorics is selected. Or, the prompt is about Prime Factorization, and the broader domain of Number Theory is selected.
    ● 3_Somewhat Aligned: The selected domain is relevant and plays a secondary or supporting role in the solution, but it is not the primary domain. A more central or specific domain could have been chosen to better represent the prompt's main focus.
      ○ Example: The prompt involves using trigonometric identities to solve a geometric problem (Geometry), but the selected domain is Trigonometry.
    ● 2_Poorly Aligned: The selected domain has only a minor or tangential connection to the prompt. While a concept from this domain might be mentioned or used in a trivial way, it is not at all central to the required reasoning or solution method. The choice is a significant misrepresentation of the prompt's core challenge.
      ○ Example: The prompt is a complex Diophantine equation (Number Theory) that happens to involve a quadratic, but the selected domain is Algebra > Theory of Equations.
    ● 1_Not Aligned: The selected domain has no connection to the mathematical concepts or reasoning required by the prompt. The selection is factually incorrect and does not reflect any part of the problem or its solution.
      ○ Example: The prompt is about Euclidean Geometry, but the selected domain is Number Theory.
  • 24.
    Quality Rubric - Prompt
    Well Defined: Is the prompt unambiguous, self-contained (no missing assumptions), and clearly formulated?
    ● Yes_the prompt:
      ○ Unambiguous: All terms, conditions, and requirements are clearly defined with no room for multiple interpretations.
      ○ Self-contained: All necessary information to solve the problem is provided within the prompt, with no unstated or implied assumptions required.
      ○ Clearly formulated: The question or task is presented in a logical, well-structured manner that is easy to understand.
      ○ Complete: All relevant variables, constraints, and conditions are explicitly stated.
      ○ Consistent: There are no contradictions or conflicting information within the prompt.
      ○ Precise language: Mathematical terms and concepts are used accurately and appropriately.
    ● No_the prompt:
      ○ Contains ambiguities: Some terms, conditions, or requirements are open to multiple interpretations.
      ○ Missing information: Unstated or implied assumptions are necessary to solve the problem.
      ○ Unclear formulation: The question is presented in a confusing or illogical manner.
      ○ Incomplete: Some relevant variables, constraints, or conditions are not explicitly stated.
      ○ Inconsistent: Contains contradictions or conflicting information.
      ○ Imprecise language: Mathematical terms or concepts are used inaccurately or inappropriately.
  • 25.
    Quality Rubric - Prompt
    Formatting: Is LaTeX formatting used consistently and correctly for the prompts?
    ● Yes:
      ○ LaTeX formatting is applied consistently throughout the prompt, ensuring uniformity in the presentation of mathematical expressions and symbols.
      ○ All mathematical expressions, equations, and symbols are correctly formatted using LaTeX, with no errors or inconsistencies.
    ● No:
      ○ LaTeX formatting is not applied consistently, leading to variations in the presentation of mathematical expressions and symbols.
      ○ There are errors or inconsistencies in the LaTeX formatting of mathematical expressions, equations, or symbols.
    Naturalness: Is the prompt grammatical and natural sounding, and written using an academic register?
    ● 5_Excellent: The language is clear, precise, and natural-sounding, with no grammatical errors. It maintains a consistent and appropriate academic register, using precise mathematical terminology correctly.
    ● 4_Good / Mostly Correct: The prompt is free from grammatical errors. The language is for the most part clear and natural sounding and is easy to comprehend. The academic register is appropriate and consistent throughout the prompt, requiring minimal or no editing.
    ● 3_Moderate: The prompt is mostly understandable but contains noticeable minor grammatical or stylistic errors. The language is mostly natural but may have some awkward sentences that could be improved. The academic register is generally maintained but may have minor inconsistencies. The prompt is functional but requires editing.
    ● 2_Below Average: The prompt contains significant grammatical errors that impact readability. The language is frequently unnatural or confusing. The academic register may be inconsistent. While the core mathematical question is somewhat usable, the text requires substantial editing for clarity and correctness.
    ● 1_Poor: The prompt contains numerous critical grammatical errors that make it difficult to understand. The language is unnatural and awkward. The register may also not be appropriate for a mathematical context. The prompt is fundamentally flawed and requires a complete rewrite.
  • 26.
    Quality Rubric - Final Answer
  • 27.
    Quality Rubric - Final Answer
    Correctness: Is the final answer the mathematically correct answer to the prompt?
    ● Yes:
      ○ The final answer provided is accurate and has been verified and validated against the prompt, ensuring that it is the correct solution.
    ● No:
      ○ The answer does not match the expected solution based on the prompt.
    Formatting: Is LaTeX formatting used correctly and consistently for the final answers?
    ● Yes:
      ○ LaTeX formatting is applied correctly in the final answer, ensuring uniformity in the presentation of mathematical expressions and symbols.
      ○ All mathematical expressions, equations, and symbols are correctly formatted using LaTeX, with no errors or inconsistencies.
    ● No:
      ○ LaTeX formatting is not applied correctly in the final answer, leading to variations in the presentation of mathematical expressions and symbols.
      ○ There are errors or inconsistencies in the LaTeX formatting of mathematical expressions, equations, or symbols.
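    As a rough illustration of the formatting criterion, the snippet below contrasts a consistently formatted final answer with an inconsistent one. The answer itself is hypothetical and is not taken from the project dataset; the `$...$` delimiters assume the Markdown-with-LaTeX convention described in the Formatting section.

```latex
% Consistent: every mathematical expression is typeset in LaTeX math mode.
The minimum value is $\frac{3\sqrt{2}}{4}$, attained at $x = \frac{\pi}{4}$.

% Inconsistent: the same kind of expression appears once as plain text
% and once in LaTeX, so the presentation varies within a single answer.
The minimum value is 3sqrt(2)/4, attained at $x = \frac{\pi}{4}$.
```

    Under the rubric above, the first form would likely be rated “Yes” and the second “No”, since the plain-text fraction breaks uniformity.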
  • 28.
  • 29.
    Quality Rubric - Explanation
    Chain-of-Thought Reasoning: To what extent does the explanation follow clear, logical steps to demonstrate the reasoning process?
    ● 5_Perfect Reasoning: The explanation perfectly lays out all of the logical steps required to solve the prompt. Every step is clearly, logically, and elegantly presented.
    ● 4_Well Reasoned: The explanation does a good job laying out all of the logical steps required to solve the prompt. Each step is clear and logical, but minor improvements could be made.
    ● 3_Minor Gaps: The explanation generally does a good job of laying out the logical steps required to solve the prompt, but there are minor gaps in the reasoning process. There may also be minor problems with how the reasoning steps connect to one another. The explanation is functional but requires editing.
    ● 2_Major Gaps: The explanation contains major gaps in explaining the logical steps required to solve the prompt. There may also be major issues with how each step connects logically to the next. The explanation requires major revision.
    ● 1_Poor: The explanation completely fails to explain the logical steps required to solve the prompt. A complete rewrite of the explanation is needed.
  • 30.
    Quality Rubric - Explanation
    Naturalness: Is the explanation grammatical and natural sounding, and written using an academic register?
    ● 5_Excellent: The language of the explanation is clear, precise, and natural-sounding, with no grammatical errors. It maintains a consistent and appropriate academic register, using precise mathematical terminology correctly.
    ● 4_Good / Mostly Correct: The explanation is free from grammatical errors. The language is for the most part clear and natural sounding and is easy to comprehend. The academic register is appropriate and consistent throughout the explanation, requiring minimal or no editing.
    ● 3_Moderate: The explanation is mostly understandable but contains noticeable minor grammatical or stylistic errors. The language is mostly natural but may have some awkward sentences that could be improved. The academic register is generally maintained but may have minor inconsistencies. The explanation is functional but requires editing.
    ● 2_Below Average: The explanation contains significant grammatical errors that impact readability. The language is frequently unnatural or confusing. The academic register may be inconsistent. While the explanation is somewhat usable, the text requires substantial editing for clarity and correctness.
    ● 1_Poor: The explanation contains numerous critical grammatical errors that make it difficult to understand. The language is unnatural and awkward. The register may also not be appropriate for a mathematical context. The explanation is fundamentally flawed and requires a complete rewrite.
  • 31.
    Quality Rubric - Explanation
    Use of Proper Notation: To what extent are mathematical symbols, formulas, and terminology used consistently and correctly in the explanation?
    ● 5_Excellent: The explanation is completely correct and consistent in its use of proper mathematical notation.
    ● 4_Correct / Mostly Consistent: The explanation is completely correct in its use of proper mathematical notation. It may contain minor inconsistencies in its mathematical notation (for example, mixing different versions of terminology referring to the same thing) that do not affect correctness.
    ● 3_Mostly Correct / Inconsistent: The explanation is mostly correct in its use of proper mathematical notation. Either minor errors are present or there are major inconsistencies in how mathematical notation is used. The explanation is functional but requires editing.
    ● 2_Incorrect: The explanation contains major errors in proper mathematical notation. It may also have issues in the consistency of how mathematical notation is used. While the explanation is somewhat usable, the text requires substantial editing for proper mathematical notation.
    ● 1_Poor: The explanation contains significant errors in proper mathematical notation. The explanation is fundamentally flawed and requires a complete rewrite.
  • 32.
    Quality Rubric - Explanation
    Formatting: Is LaTeX formatting used consistently and correctly in the explanation?
    ● Yes:
      ○ LaTeX formatting is applied correctly in the explanation, ensuring uniformity in the presentation of mathematical expressions and symbols.
      ○ All mathematical expressions, equations, and symbols are correctly formatted using LaTeX, with no errors or inconsistencies.
    ● No:
      ○ LaTeX formatting is not applied correctly in the explanation, leading to variations in the presentation of mathematical expressions and symbols.
      ○ There are errors or inconsistencies in the LaTeX formatting of mathematical expressions, equations, or symbols.
    Completeness: To what extent does the explanation fully solve all parts of the problem?
    ● 5_Complete: The explanation completely answers the prompt. There is nothing missing from the process of finding the solution, and all parts of the prompt are answered completely.
    ● 4_Mostly Complete: The explanation for the most part completely answers the prompt. The process for finding the solution could possibly be better fleshed out, but all parts of the prompt are answered completely.
    ● 3_Almost Complete: The explanation comes close to completely answering the prompt, but either is lacking some major parts of explaining the process of finding the solution or fails to answer some minor parts of the prompt. The explanation is functional but requires editing.
    ● 2_Incomplete: The explanation fails to answer major parts of the prompt and may also lack major parts of explaining the process for finding the solution. While the explanation is somewhat usable, the text is missing significant parts of what is needed for the explanation to be complete.
    ● 1_Poor: The explanation completely fails to answer the prompt. The explanation requires a complete rewrite.
  • 33.
    Quality Rubric - Explanation
    No Hallucination: Is the explanation free from invented definitions, theorems, or data not provided in the prompt or commonly known?
    ● Yes:
      ○ The solution is free from invented definitions, theorems, or data not provided in the prompt or commonly known.
    ● No:
      ○ The solution contains invented definitions, theorems, or data not provided in the prompt or commonly known.
    Conciseness: To what extent does the explanation avoid overly verbose explanations or redundant steps?
    ● 5_Concise: The explanation contains precisely as much information as is needed to answer the prompt. There is no irrelevant or redundant information contained in the explanation. Syntax and word choice are spot on to elegantly and concisely explain how the correct answer is found.
    ● 4_Mostly Concise: The explanation is concise and does not contain irrelevant or redundant information. It could potentially be written more elegantly.
    ● 3_Almost Concise: The explanation is almost concise but does contain some irrelevant or redundant information. The explanation is functional but requires editing.
    ● 2_Verbose: The explanation contains a lot of irrelevant or redundant information. Major edits are needed to fix this explanation.
    ● 1_Poor: The explanation is almost entirely redundant or irrelevant information. A complete rewrite of the explanation is needed.
  • 34.
    Example
    ● Subdomain: Graph Theory
    ● Source: “”
    ● Correct Final Answer: 5
    ● Explanation (Chain of Thought): see next slides
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
    Quality Control & Continuous Improvement Notice
    As part of our continuous improvement expectations, we are implementing strict quality control measures.
    ● If your failed count exceeds 25%, you will be pulled from production and assigned to re-training.
    ● You will then have 5 business days to demonstrate improvement following the training.
    ● Failure to show improvement after this period will result in removal from the project.
    We strongly encourage you to make use of the feedback provided by the QC team to improve continuously throughout the project.
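    The 25% threshold can be checked with simple arithmetic. The sketch below assumes the percentage is the number of failed tasks divided by the number of reviewed tasks; the notice does not state the denominator, so treat that as an assumption. The function name `exceeds_fail_threshold` is hypothetical.

```python
def exceeds_fail_threshold(failed: int, reviewed: int, threshold: float = 0.25) -> bool:
    """Return True if the failed share of reviewed tasks is strictly above the threshold.

    Assumes the rate is failed / reviewed; the notice does not define the denominator.
    """
    if reviewed <= 0:
        # No reviewed tasks yet, so there is no rate to compare.
        return False
    return failed / reviewed > threshold
```

    For example, 3 failed tasks out of 10 reviewed is a 30% rate and exceeds the threshold, while 25 out of 100 is exactly 25% and does not, because the notice says "exceeds".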
  • 40.
    Rework of Tasks
    To prevent multiple rework loops and ensure efficient use of contributor effort, we are implementing the following restrictions moving forward:
    ● Only one rework per task is allowed at the maker stage. If a job exceeds the rework limit and is rejected a second time at the review step, it will be cancelled and not counted.
    ● Read reviewer feedback carefully and fix all errors during the first rework attempt.
    ● Makers are expected to understand the quality guidelines and improve task quality to remain eligible for the project.
    ● If you disagree with the reviewer’s feedback or wish to challenge a failed decision, submit the Job ID via [email protected], so the project team can investigate further.
    Please note: your performance is tracked based on the number of reworks and cancelled jobs as well as the quality of your tasks.
  • 41.
    Time-out Tasks
    We observed many tasks timing out at the maker step; as a result, they were not reassigned to the same maker. Please remember:
    ● FTS will time out if left idle (maximum of 3 hours).
    ● Exceeding this limit will result in automatic exit and loss of progress, so plan your work accordingly.
    ● You should open, save, and exit tasks properly to keep your workspace clean and avoid data loss.
  • 42.
    Unique Prompts
    ● The content created for each task must be unique and originally created by you.
    ● Using AI models or internet websites to create content is NOT allowed for this project.
    ● The use of AI to create content, or copying content from websites, will result in disqualification and removal from the project.
  • 43.
    🔍 Task Description:
    The minimum expected commitment is 5 hours a week, with no maximum limit.
    ● Tasks are available on a first-come, first-served basis.
    ● If you don’t see tasks when you log in, it likely means others have already picked them up, so keep checking!
    ● 📌 Tip: FTS does not send notifications, so we recommend checking the queue daily to stay active and maximize your opportunities.
  • 44.
    📊 Performance Expectations
    Once in production, we will track your performance under live production standards, including:
    ● Average Handling Time (AHT): 70 minutes. This is monitored closely.
    ● Note: Consistently exceeding AHT expectations may lead to removal from the project.
  • 45.
    Annotation Platform
    You will be required to complete tasks using FTS Studio for this project. You can access your task using this link: https://fts-app.playment.io/
    Additionally, we’ve emailed you a how-to document to further assist you in navigating the tool. The document provides step-by-step instructions for logging on to the tool and proceeding with your assigned task.
  • 46.
    📓 Guidelines:
    To be successful in this project, you must study and deeply understand the project guidelines. This is the best way to ensure success and compensated tasks.
    Some minor updates to the guidelines will be completed later today, so please reread the guidelines over the next few days.
  • 47.
    Next Steps
    ● Project Team Intro & Q&A Session
    ● Application & Pre-qualification Task Completion
    ● Simulation / Qualification
    ● Project Training & Guidelines Review
    ● Production
  • 48.
    Pay Rate or Portal Questions
    For any questions regarding your pay rate or the AI Community Portal, you may address them to [email protected] or to your sourcing representative.
  • 49.
  • 50.
  • 51.
    Thank you for your attention!