Microsoft’s Medprompt+ with GPT-4 Beats Gemini Ultra, Reclaims Benchmark Throne on MMLU

Well, that was quick! Google DeepMind’s Gemini Ultra held its purported benchmark lead over OpenAI’s GPT-4 for only a short time.

Microsoft Research has just published a blog post showing that its Medprompt+ approach, running on GPT-4, retakes the benchmark throne from Gemini Ultra, which Google announced only a week ago.

Medprompt and its Evolution: Developed by a team at Microsoft Research and originally aimed at medical question-answering benchmarks, Medprompt is a prompting strategy that combines several techniques to elicit specialist-level answers from a generalist model, with no fine-tuning involved. It has since been extended into a more robust version, Medprompt+, which pairs simple, direct prompts with more complex ones. This combination produced the state-of-the-art (SoTA) results reported on several benchmarks.

Performance on MMLU Benchmark: The Massive Multitask Language Understanding (MMLU) benchmark is a broad test of the general knowledge and reasoning abilities of large language models, built from multiple-choice questions spanning 57 subjects. Medprompt performs strongly on it, and the extended Medprompt+ reaches a record score of 90.10%, surpassing the score reported for Google’s Gemini Ultra.

Promptbase, a Resource Hub: Alongside the results, Microsoft has introduced Promptbase, a GitHub repository that collects information and tools for getting the best performance out of foundation models. It includes scripts for replicating results with the Medprompt methodology, and more resources are planned.

Techniques Behind Medprompt: Medprompt combines several strategies:
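- Dynamic Few-Shot Selection: Instead of using a fixed set of examples, Medprompt picks few-shot examples per question, choosing the training examples most similar to the question at hand (via a nearest-neighbor search over text embeddings), so the demonstrations are as relevant as possible.
- Self-Generated Chain of Thought (CoT): Rather than relying on human-written rationales, GPT-4 writes its own step-by-step reasoning for the few-shot examples, and these rationales serve as demonstrations that improve performance on complex reasoning.
- Majority Vote Ensembling: The model is queried several times and the most frequent answer wins. The answer options are shuffled between runs (choice shuffling) to reduce sensitivity to the order in which they appear.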
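The dynamic few-shot and majority-vote steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Promptbase code: `toy_model` is a hypothetical stand-in for a GPT-4 call, the two-dimensional embeddings are invented for the demo, and self-generated CoT is left out (in Medprompt the selected exemplars would carry GPT-4-written rationales). The hypothetical `select_few_shots` ranks a pool by cosine similarity (k-nearest neighbors), and the hypothetical `choice_shuffle_vote` shuffles the options on every run, maps each pick back to its original position, and returns the majority answer.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical model interface: given a question, an ordered list of options,
# and few-shot exemplars, return the index of the chosen option.
Model = Callable[[str, Sequence[str], Sequence[Dict]], int]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_few_shots(query_vec: Sequence[float], pool: List[Dict], k: int) -> List[Dict]:
    """Dynamic few-shot (sketch): return the k pool examples closest to the query embedding."""
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex["embedding"]), reverse=True)
    return ranked[:k]


def choice_shuffle_vote(model: Model, question: str, options: Sequence[str],
                        exemplars: Sequence[Dict], runs: int = 5, seed: int = 0) -> int:
    """Ask the model `runs` times with shuffled options; return the majority answer's original index."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(runs):
        order = list(range(len(options)))
        rng.shuffle(order)                 # order[j] = original index shown at position j
        shown = [options[i] for i in order]
        picked = model(question, shown, exemplars)
        votes[order[picked]] += 1          # map the shuffled position back to the original option
    return votes.most_common(1)[0][0]


def toy_model(question: str, options: Sequence[str], exemplars: Sequence[Dict]) -> int:
    """Hypothetical stand-in for a GPT-4 call: picks the option mentioning Paris, else position 0."""
    for i, opt in enumerate(options):
        if "Paris" in opt:
            return i
    return 0


pool = [
    {"id": "a", "embedding": [1.0, 0.0]},
    {"id": "b", "embedding": [0.0, 1.0]},
    {"id": "c", "embedding": [0.9, 0.1]},
]
shots = select_few_shots([1.0, 0.0], pool, k=2)
print([ex["id"] for ex in shots])  # ['a', 'c']
answer = choice_shuffle_vote(toy_model, "What is the capital of France?",
                             ["Berlin", "Paris", "Rome", "Madrid"], shots)
print(answer)  # 1
```

Mapping each pick back through `order` is what makes shuffling useful: a model with a position bias (say, one that always picks the first option shown) would have its votes spread across different original options instead of piling onto one, so only answers the model prefers regardless of position accumulate a clear majority.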

Extending Medprompt: Medprompt+ adds simple, direct prompts alongside the more elaborate CoT-based ones and decides, question by question, which approach to rely on for the final answer. This leads to better performance across MMLU’s diverse subjects.

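The article does not spell out how Medprompt+ chooses between the two prompt styles, so the sketch below uses a hypothetical stand-in policy, not Microsoft’s: each strategy is represented by its ensemble’s vote counts (for example, from a choice-shuffled majority vote), and the final answer comes from whichever strategy agrees with itself more. The names `top_answer` and `combine` are illustrative only.

```python
from collections import Counter
from typing import Tuple


def top_answer(votes: Counter) -> Tuple[int, float]:
    """Return the most-voted answer and the share of votes it received."""
    answer, count = votes.most_common(1)[0]
    return answer, count / sum(votes.values())


def combine(cot_votes: Counter, simple_votes: Counter) -> int:
    """Hypothetical Medprompt+-style policy: trust whichever strategy agrees with itself more."""
    cot_answer, cot_share = top_answer(cot_votes)
    simple_answer, simple_share = top_answer(simple_votes)
    # Ties go to the CoT-based Medprompt answer (an arbitrary choice for this sketch).
    return simple_answer if simple_share > cot_share else cot_answer


# CoT ensemble is split 3-2; the simple prompt is unanimous, so its answer wins.
print(combine(Counter({2: 3, 1: 2}), Counter({1: 5})))  # 1
# CoT ensemble agrees 4-1, more than the simple prompt's 3-2, so the CoT answer wins.
print(combine(Counter({2: 4, 1: 1}), Counter({1: 3, 3: 2})))  # 2
```

Vote share is used here only because it is the simplest confidence signal available from an ensemble; a real policy could weigh the strategies quite differently.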
Future Directions: Medprompt and Medprompt+ demonstrate what generalist models like GPT-4 can achieve in specialist domains through prompting alone, and they point toward more nuanced and efficient prompting strategies. The field is moving quickly, and resources like Promptbase give the AI community a shared place to build on these advances and collaborate on prompt engineering.

Gemini Ultra Controversies


