Llama3.370B超越GPT-4o和Claude3.5 Sonnet

AI领域日新月异，最近AI 领域发生了太多事情，本文就语言大模型Llama 3.3 70B、GPT-4o 和 Claude 3.5 Sonnet进行对比。
12月7日，Meta今年的最终AI模型将要来了。Meta12月6日发布了Llama 3.3，拥有700亿个参数，但其性能与拥有4050亿个参数的Llama 3.1相当。

Meta强调，Llama 3.3型号效率更高、成本更低，可在标准工作站上运行，在提供高质量文本AI解决方案的同时降低运营成本。

Llama 3.3机型优化了多语言支持，支持英语、德语、法语、意大利语、葡萄牙语、印地语、西班牙语、泰语八种语言。
Llama 3.3 的上下文长度为 128K，支持多种工具格式，可以与外部工具和服务集成，扩展模型的功能。

在本文指南中，我们将介绍什么是 Llama 3.3、如何在本地使用 Llama 3.3 以及为什么 Llama 3.3 比 GPT-4o 和 Claude 3.5 Sonnet 强大。

什么是 Llama 3.3

Llama 3.3 是 Meta AI 推出的拥有 70B 参数的大型多语言预训练语言模型。该模型在性能上与 405B 参数的 Llama 3.1 相当，并针对多语言对话进行了优化，支持英语、德语、法语、意大利语、葡萄牙语、印地语、西班牙语和泰语。
Llama 3.3 具有更长的上下文窗口、多语言输入输出功能以及与第三方工具集成以扩展其功能的能力，使其适合商业和研究用途。

Llama 3.3 的主要功能

效率与成本：Llama 3.3模型效率更高、成本更低，可运行在标准工作站上，在提供高质量文本AI解决方案的同时，降低运营成本。
多语言支持：支持英语、德语、法语、意大利语、葡萄牙语、印地语、西班牙语和泰语等8种语言，并能处理这些语言的输入和输出。
长上下文窗口：模型支持128K的上下文长度。
集成第三方工具：与第三方工具和服务集成，扩展功能和应用场景。

Llama 3.3 与 Llama 3.2 之间的比较

与Llama 3.2 相比，Llama 3.3 实现了显着的性能提升，尤其是在基于文本的任务方面。 Llama 3.2 专注于提供适合部署在边缘设备上的轻量级模型（1B 和3B 参数）和多模态模型（11B Vision 和90B Vision）。相比之下，Llama 3.3 将参数范围缩小到70B，并专注于提高文本生成的质量。

调用3.3

Llama 3.3 使用 Meta 定制开发的训练库、定制的 GPU 集群和生产基础设施进行预训练。微调、注释和评估都在同一生产基础设施上进行。在 H100–80GB（TDP 700W）硬件上，训练累计耗费 39.3M GPU，即 700 万个 GPU 小时。

如何在本地使用 Llama 3.3？

Llama 3.3 使用与 Llama 3.1 相同的提示格式。因此，为 Llama 3.1 创建的提示将在 Llama 3.3 中保持不变。这对于从 Llama 3.1 迁移的用户来说是一个很大的优势。Llama 3.3 支持与 Llama 3.1 相同的代码解释器和工具调用功能。这些功能允许您的模型利用外部工具和信息。
Ollama 是启动和运行本地语言模型的最快方法我们建议尝试 Llama 3.3 70B，它的大小令人印象深刻，并且在大多数硬件上都能表现良好。
1.从此处下载 Ollama（它将引导您完成其余步骤）2.打开终端并运行ollama run llama3.3
在这里插入图片描述

Llama 3.3 70B vs Claude 3.5 Sonnet vs GPT-4o

现在来比较一下 Claude 3.5 Sonnet 和GPT-4o ，并验证 Llama 3.3 70B 的性能如何。
本次将核实以下几项内容：

编码能力

在过去的两年里，我广泛使用 Python 作为我的主要编程语言。我解决了一些受现实问题启发的棘手 Python 问题。

给定一个由正整数组成的数组nums 。从score = 0开始，
应用以下算法：选择数组中未标记的最小整数。如果存在平局，则选择索引最小的整数。将所选整数的值添加到score 中。标记所选元素及其两个相邻元素（如果存在）。重复此操作，直到所有数组元素都被标记。返回应用上述算法后获得的分数。
示例1 ：
输入： nums = [ 2 , 1 , 3 , 4 , 5 , 2 ]
输出：7
解释：对元素进行如下标记：- 1是最小的未标记元素，因此我们标记它和它的两个相邻元素： [ 2 , 1 , 3 , 4 , 5 , 2 ]。- 2是最小的未标记元素，因此我们标记它和其左相邻元素： [ 2 ，1 ，3 ，4 ，5 ，2 ]。- 4是唯一剩余的未标记元素，因此我们标记它： [ 2 ，1 ，3 ，4 ，5 ，2 ]。我们的分数是1 + 2 + 4 = 7。

LLama 3.3

def find_score(nums):"""Calculate the score by applying the given algorithm.Args:nums (list): A list of positive integers.Returns:int: The calculated score."""score = 0marked = [False] * len(nums)while False in marked:min_val = float('inf')min_idx = -1# Find the smallest unmarked integerfor i, num in enumerate(nums):if not marked[i] and num < min_val:min_val = nummin_idx = i# Add the value of the chosen integer to scorescore += min_val# Mark the chosen element and its two adjacent elements if they existfor i in range(max(0, min_idx - 1), min(len(nums), min_idx + 2)):marked[i] = Truereturn score# Example usage
nums = [2, 1, 3, 4, 5, 2]
print(find_score(nums))  # Output: 7

GPT-4o

def get_score(nums):n = len(nums)marked = [False] * n  # Array to track marked elementsscore = 0while True:# Find the smallest unmarked elementsmallest_value = float('inf')smallest_index = -1for i in range(n):if not marked[i] and nums[i] < smallest_value:smallest_value = nums[i]smallest_index = iif smallest_index == -1:  # If no unmarked elements are left, breakbreak# Add the value of the chosen element to the scorescore += nums[smallest_index]# Mark the chosen element and its two adjacent elementsmarked[smallest_index] = Trueif smallest_index > 0:marked[smallest_index - 1] = Trueif smallest_index < n - 1:marked[smallest_index + 1] = Truereturn score
# Example usage
nums = [2, 1, 3, 4, 5, 2]
print(get_score(nums))

Claude 3.5 Sonnet

def markElements(nums):# Create a copy of the input array to track marked elementsmarked = [False] * len(nums)score = 0while not all(marked):# Find the smallest unmarked elementmin_val = float('inf')min_index = -1# Find the smallest unmarked element with the smallest indexfor i in range(len(nums)):if not marked[i] and nums[i] < min_val:min_val = nums[i]min_index = i# Add the chosen element's value to scorescore += nums[min_index]# Mark the chosen elementmarked[min_index] = True# Mark left adjacent element if it existsif min_index > 0:marked[min_index - 1] = True# Mark right adjacent element if it existsif min_index < len(nums) - 1:marked[min_index + 1] = Truereturn score
# Test cases
test_cases = [[2,1,3,4,5,2],  # Example from the problem statement[1,2,3,4,5],    # Consecutive increasing numbers[5,4,3,2,1]     # Consecutive decreasing numbers
]
for nums in test_cases:print(f"Input: {nums}")print(f"Output: {markElements(nums)}\n")

在 Visual Studio Code 上运行代码时，这三个解决方案都是正确的。但是，LLama 3.3 的代码很清晰，但使用了循环while False in marked，这不如直观while not all(marked)。它还重复了查找最小未标记元素的逻辑，而没有使用辅助函数，这使得它更难维护。

GPT-4o 具有最易读的while True循环和明确的中断条件、简洁的注释和清晰的逻辑，避免了冗余，使其成为可读性和可维护性的可靠选择。

Claude 3.5 Sonnet 也很清晰并且使用了not all(marked)循环条件，但是它的冗长的风格和额外的测试用例虽然有帮助，但却使其不如 GPT-4o 版本那么集中。

总体而言，GPT-4o 的版本是最好的，因为它的结构清晰，注释运用良好，设计直观，有效地平衡了可读性、功能性和可维护性。

数学能力

比较并验证数学能力，我们会让参与者回答概率问题，很多人害怕进入数据科学，因为他们害怕高中和大学的代数、矩阵和组合学。

求5次抛硬币中正好出现两次反面的概率

LLama3.3

步骤 1：定义样本空间样本空间包含抛 5 枚硬币的所有可能结果。每枚硬币落地的方式有两种：正面（H）或反面（T）。因此，样本空间有 2^5 = 32 种可能的结果。步骤 2：确定有利结果我们想要找到恰好出现 2 次反面的概率。这意味着我们需要计算出 2 次反面和 3 次正面的结果的数量。我们可以使用二项式系数来计算：
从 5 次抛掷中选择 2 次反面的方式数 = C(5, 2) = 5 / (2 × 3!) = 10
因此，恰好出现 2 次反面的有利结果共有 10 个。步骤 3：计算概率恰好得到 2 个反面的概率是有利结果的数量除以可能结果的总数：
P（恰好 2 个反面）= 有利结果的数量 / 结果总数
= 10 / 32 
= 5/16
答案在 5 枚硬币抛掷中恰好得到 2 个反面的概率为 5/16 或大约 0.3125 (31.25%)。
用数学符号来说，这可以表示为：
P （ X = 2 ） = （5选2 ） × （1 / 2 ）^ 2 × （1 / 2 ）^ 3 = 10 / 32 = 5 / 16 ，
其中X是表示反面数量的随机变量，（5选2）是表示在5次抛掷中选择2次反面的方法数量的二项式系数。