Compare AI Chat models

| Model | Output Speed (Tokens/sec) | Artificial Analysis Quality Index | Reasoning & Knowledge (MMLU) | Scientific Reasoning & Knowledge (GPQA) | Quantitative Reasoning (MATH) | Coding (HumanEval) | Maths (MGSM) |
|---|---|---|---|---|---|---|---|
| ChatGPT v3.5 | 83 | 53 | 68 | 30 | 39 | 69 | 52 |
| ChatGPT 4o Mini | 145 | 71.4 | 82 | 43 | 75 | 86 | 87 |
| ChatGPT 4o | 95 | 77 | 89 | 51 | 78 | 90 | 90 |
| ChatGPT v4 Plus | 95 | 77 | 89 | 51 | 78 | 90 | 90 |
| ChatGPT o1 Mini (beta) | 73 | 81.6 | 85 | 58 | 90 | 93 | 90 |
| ChatGPT o1 Preview | 10 | 86 | 91 | 67 | 92 | 96 | 91 |
| Gemini 1.5 Pro | 61 | 80 | 86 | 61 | 85 | 87 | 76 |
| Gemini 1.5 Flash | 204 | 73 | 81 | 50 | 79 | 84 | 76 |
| Gemini 2.0 Flash (Experimental) | 169 | - | - | - | - | - | - |
| Claude V3.5 Haiku | 64 | 69 | 81 | 37 | 73 | 85 | 71 |
| Claude V3 Sonnet | 54 | 57 | 77 | 37 | 46 | 69 | 84 |
| Claude V3.5 Sonnet | 59 | 80 | 89 | 58 | 79 | 93 | 92 |
| Jamba 1.5 Mini | 164 | 46 | 63 | 26 | 32 | 61 | 30 |
| Jamba 1.5 Large | 57 | 64 | 80 | 41 | 60 | 74 | 74 |
| Llama 3.2 11B | 117 | 54 | 72 | 26 | 50 | 67 | 67 |
| Llama 3.2 90B | 40 | 66 | 83 | 43 | 61 | 75 | 83 |
| Llama 3.1 405B (Large) | 28 | 72 | 87 | 50 | 69 | 82 | 83 |
| Mistral 7B Instruct | 108 | 24 | 34 | 19 | 14 | 31 | 23 |
| Mistral 8X7B Instruct | 86 | 42 | 63 | 30 | 33 | 41 | 30 |
| Mistral Large 2 | 37 | 73 | 85 | 48 | 72 | 87 | 87 |
| xAI Grok | 57 | 70 | 85 | 43 | 69 | 85 | - |
| Nova Micro | 205 | 66 | 76 | 38 | 69 | 80 | - |
| Nova Pro | 95 | 75 | 84 | 48 | 79 | 88 | - |