Llama 2 7B

Llama 2 7B is an open-weight large language model developed by Meta AI. Released in July 2023, it is the smallest of the three Llama 2 sizes (7B, 13B, and 70B) and offers a good balance of capability and resource requirements.

Specifications

  • Parameters: 7 billion
  • Context Length: 4,096 tokens
  • Training Data: pretraining cutoff September 2022 (some fine-tuning data is more recent)
  • Architecture: decoder-only Transformer (RoPE positional embeddings, SwiGLU activations, RMSNorm)
  • License: Llama 2 Community License (free for most uses, with some restrictions; not an OSI-approved open-source license)
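The 4,096-token context window bounds the prompt and completion combined, so the prompt length directly limits how much the model can generate. A minimal sketch of budgeting completion length against the window (the token counts here are illustrative, not produced by a real tokenizer):

```python
LLAMA2_CONTEXT = 4096  # Llama 2 context window in tokens

def max_completion_tokens(prompt_tokens: int, context: int = LLAMA2_CONTEXT) -> int:
    """Tokens left for the completion after the prompt fills part of the window."""
    if prompt_tokens >= context:
        raise ValueError("prompt alone exceeds the context window")
    return context - prompt_tokens

# A 1,000-token prompt leaves 3,096 tokens for generation
print(max_completion_tokens(1000))  # → 3096
```

In practice an inference stack enforces this for you, but exceeding the window silently truncates or errors depending on the runtime, so it is worth checking up front.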

Performance

Llama 2 7B shows impressive capabilities for its size:

  • Solid performance on common-sense reasoning benchmarks
  • Basic coding ability (the Code Llama variants are stronger for code)
  • Effective instruction following with the Chat variant
  • Reasonable factual accuracy, though hallucinations still occur

Hardware Requirements

Minimum requirements for running Llama 2 7B:

  • RAM: 8GB+ (16GB recommended)
  • GPU: ~14GB VRAM for FP16 weights, ~7GB for INT8, ~4GB for INT4 quantization
  • CPU-only: Possible but slow (2-5 tokens/second)
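The GPU figures above follow from parameter count times bytes per weight; activations and the KV cache add overhead on top, which this back-of-the-envelope estimator ignores:

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# 7B parameters at FP16 (16 bits), INT8, and INT4:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(7, bits):.1f} GiB")
# 16-bit ≈ 13.0 GiB, 8-bit ≈ 6.5 GiB, 4-bit ≈ 3.3 GiB
```

Budget roughly 1-2GB extra for the KV cache and runtime overhead, more at long context lengths.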

Quantization Options

Llama 2 7B quantizes well; quality loss grows as precision drops:

  • GGUF 4-bit: ~4GB VRAM, ~10-20% quality reduction
  • GGUF 5-bit: ~5GB VRAM, ~5-10% quality reduction
  • GGUF 8-bit: ~8GB VRAM, minimal quality reduction
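GGUF quantization schemes mix precisions within the file, so the effective bits per weight is fractional rather than a round 4, 5, or 8. A sketch of estimating file size from approximate bits-per-weight figures for common quantization types (the bpw values below are rough community estimates, not exact spec numbers):

```python
# Approximate average bits per weight for common GGUF quantization types
# (rough estimates; actual values depend on the model's tensor shapes)
GGUF_BPW = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5}

def gguf_size_gb(params_billion: float, quant: str) -> float:
    """Rough GGUF file size in GiB for a model of the given parameter count."""
    bits = GGUF_BPW[quant]
    return params_billion * 1e9 * bits / 8 / 2**30

for q in GGUF_BPW:
    print(f"{q}: ~{gguf_size_gb(7, q):.1f} GiB")
# Q4_K_M ≈ 4.0 GiB, Q5_K_M ≈ 4.6 GiB, Q8_0 ≈ 6.9 GiB
```

These estimates line up with the ~4/5/8GB figures above once runtime overhead is included.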

Benchmark Results

Our benchmarks show the following generation throughput on different hardware (figures vary with quantization level, batch size, and sequence length):

  • NVIDIA RTX 4090: 180-220 tokens/second
  • NVIDIA RTX 3080: 120-150 tokens/second
  • AMD RX 7900 XTX: 80-100 tokens/second
  • Apple M2 Ultra: 60-80 tokens/second
  • Intel Core i9-13900K: 15-20 tokens/second
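Throughput translates directly into wall-clock latency for a response, which is often the more intuitive number. A quick calculation using rates from the table above:

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to generate a fixed number of tokens at a steady rate."""
    return tokens / tokens_per_second

# A 500-token response at an RTX 4090 rate vs a CPU-only rate from the table:
print(f"RTX 4090 (200 tok/s): {generation_seconds(500, 200):.1f}s")  # 2.5s
print(f"CPU-only (18 tok/s):  {generation_seconds(500, 18):.1f}s")   # 27.8s
```

This ignores prompt-processing (prefill) time, which adds a fixed delay before the first token, so real latency is slightly higher.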

Use Cases

Llama 2 7B is well-suited for:

  • Personal assistants
  • Code completion
  • Content generation
  • Text summarization
  • Educational applications