Compare AI Chat models

| Model | Output Speed (Tokens/sec) | Artificial Analysis Quality Index | Reasoning & Knowledge (MMLU) | Scientific Reasoning & Knowledge (GPQA) | Quantitative Reasoning (MATH) | Coding (HumanEval) | Maths (MGSM) |
|---|---|---|---|---|---|---|---|
| ChatGPT v3.5 | 83 | 53 | 68 | 30 | 39 | 69 | 52 |
| ChatGPT 4o Mini | 145 | 71.4 | 82 | 43 | 75 | 86 | 87 |
| ChatGPT 4o | 95 | 77 | 89 | 51 | 78 | 90 | 90 |
| ChatGPT v4 Plus | 95 | 77 | 89 | 51 | 78 | 90 | 90 |
| ChatGPT o1 Mini (beta) | 73 | 81.6 | 85 | 58 | 90 | 93 | 90 |
| Gemini 1.5 Pro | 61 | 80 | 86 | 61 | 85 | 87 | 76 |
| Gemini 1.5 Flash | 204 | 73 | 81 | 50 | 79 | 84 | 76 |
| Claude V3 Haiku | 132 | 54 | 71 | 33 | 41 | 72 | 71 |
| Claude V3 Sonnet | 54 | 57 | 77 | 37 | 46 | 69 | 84 |
| Claude V3.5 | 76 | 77 | 88 | 56 | 74 | 90 | 92 |
| Jamba 1.5 Mini | 164 | 46 | 63 | 26 | 32 | 61 | 30 |
| Jamba 1.5 Large | 57 | 64 | 80 | 41 | 60 | 74 | 74 |
| Llama 3.2 11B | 117 | 54 | 72 | 26 | 50 | 67 | 67 |
| Llama 3.2 90B | 40 | 66 | 83 | 43 | 61 | 75 | 83 |
| Llama 3.1 405B (Large) | 28 | 72 | 87 | 50 | 69 | 82 | 83 |
| Mistral 7B Instruct | 108 | 24 | 34 | 19 | 14 | 31 | 23 |
| Mistral 8X7B Instruct | 86 | 42 | 63 | 30 | 33 | 41 | 30 |
| Mistral Large 2 | 37 | 73 | 85 | 48 | 72 | 87 | 87 |
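
One way to read the table is to pick a minimum quality level and then look for the fastest model that meets it. The sketch below illustrates that with a handful of rows copied from the table above (model, output speed, Artificial Analysis Quality Index). The helper name `fastest_above` and the threshold of 75 are assumptions chosen for illustration; they are not part of the original comparison.

```python
# A few rows copied from the table: (model, output speed in tokens/sec, quality index).
ROWS = [
    ("ChatGPT 4o Mini", 145, 71.4),
    ("ChatGPT 4o", 95, 77),
    ("ChatGPT o1 Mini (beta)", 73, 81.6),
    ("Gemini 1.5 Pro", 61, 80),
    ("Gemini 1.5 Flash", 204, 73),
    ("Claude V3.5", 76, 77),
    ("Llama 3.1 405B (Large)", 28, 72),
    ("Mistral Large 2", 37, 73),
]


def fastest_above(rows, min_quality):
    """Hypothetical helper: keep rows whose quality index is at least
    min_quality, then sort them from fastest to slowest."""
    keep = [row for row in rows if row[2] >= min_quality]
    return sorted(keep, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # The threshold of 75 is an arbitrary example value.
    for model, speed, quality in fastest_above(ROWS, 75):
        print(f"{model}: {speed} tokens/sec, quality index {quality}")
```

Raising or lowering the threshold shows the trade-off directly: the fastest models in the table tend to sit below the highest quality scores.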