Google's Gemini Makes Contractors Rate AI Beyond Their Expertise

Generative AI has taken the world by storm, captivating our imaginations with its seemingly magical capabilities. But beneath the surface sits a complex ecosystem of workers at companies like Google and OpenAI. These "prompt engineers" and analysts rate the accuracy of chatbots' outputs to improve the AI's performance. Now, a recent internal guideline has raised concerns about Gemini's accuracy on sensitive topics.

Internal Guideline and Its Impact

A new internal guideline that Google passed to contractors working on Gemini has changed how responses are evaluated. Previously, contractors could "skip" prompts outside their domain expertise. For example, a contractor working with GlobalLogic, an outsourcing firm owned by Hitachi, could skip a niche cardiology question if they had no scientific background. Now, contractors may no longer skip such prompts, regardless of their expertise.

Internal correspondence shows the guidance has changed from "If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task" to "You should not skip prompts that require specialized domain knowledge." Instead, contractors are told to "rate the parts of the prompt you understand" and add a note flagging their lack of domain knowledge. The change could affect Gemini's accuracy on certain topics, since contractors are sometimes asked to evaluate highly technical AI responses, such as those about rare diseases, with no relevant background.

One contractor asked in internal correspondence, "I thought the point of skipping was to increase accuracy by giving it to someone better?" The remark captures the confusion and concern the new guideline has caused among contractors.

Currently, contractors can skip prompts in only two cases: if they are "completely missing information," such as the full prompt or response, or if they contain harmful content that requires special consent forms to evaluate.

The Importance of Prompt Engineers

Prompt engineers are the unsung heroes of generative AI. They design and refine the prompts that guide a model's responses, and their work is central to making those responses accurate and useful.

At Google, for example, prompt engineers analyze and refine the prompts used by chatbots, testing variations and evaluating how effectively each one elicits accurate responses. This iterative process steadily improves the model's performance.

Similarly, at OpenAI, prompt engineers work with the development team to tailor prompts to different applications, weighing factors such as the complexity of the task, the target audience, and the desired outcome.

The Role of Contractors

Contractors also play a vital role in the development of generative AI. They evaluate AI-generated responses on factors like "truthfulness," which helps surface inaccuracies or biases so they can be corrected.

For instance, contractors working with GlobalLogic regularly evaluate AI-written responses, drawing on their domain knowledge to rate them and feed results back to the development team. This collaboration is essential to the continuous improvement of the AI.

However, the recent guideline change puts additional pressure on contractors: they must now evaluate prompts outside their expertise, which may undermine the accuracy of their ratings.

The Future of Generative AI

The guideline change raises important questions about the future of generative AI. How will it affect the accuracy and reliability of AI systems? Will contractors be able to handle the increased workload and still provide accurate evaluations?

Generative AI is clearly here to stay and will continue to evolve. Addressing these issues, then, is essential to ensuring the accuracy and reliability of AI systems, and may require further investment in prompt engineering and in contractor training.

Only by working together can we unlock the full potential of generative AI and ensure that it benefits society in the best possible way.