Llama 2 7B
Llama 2 7B is an openly available large language model developed by Meta AI and released in July 2023. It is the smallest model in the Llama 2 family and offers a good balance of capability and resource requirements.
Specifications
- Parameters: 7 billion
- Context Length: 4,096 tokens
- Training Data: Pretraining cutoff of September 2022 (some fine-tuning data is more recent)
- Architecture: Decoder-only transformer (RoPE positional embeddings, SwiGLU activations, RMSNorm)
- License: Llama 2 Community License (permits commercial use, with restrictions for very large-scale services)
Performance
Llama 2 7B performs well for its size:
- Solid performance on reasoning benchmarks relative to other 7B-class models
- Usable coding assistance, though dedicated code models do better
- Effective instruction following in the fine-tuned Chat variant
- Reasonable factual accuracy, with the usual risk of hallucination
Hardware Requirements
Minimum requirements for running Llama 2 7B:
- RAM: 8GB+ (16GB recommended)
- GPU: ~14GB VRAM for FP16, ~8GB for INT8, ~4GB for INT4 quantization
- CPU-only: Possible but slow (roughly 2-5 tokens/second for an unquantized model; quantized models run faster)
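The GPU figures above follow directly from the parameter count: each FP16 weight takes 2 bytes, each INT8 weight 1 byte, and each INT4 weight half a byte. A minimal sketch of that arithmetic (weights only; the KV cache and activations add a further 1-2GB in practice):

```python
# Rough VRAM needed for just the weights of a 7B-parameter model
# at different precisions. Activation and KV-cache overhead come
# on top, which is why practical requirements are higher.
PARAMS = 7_000_000_000

def weight_gb(bytes_per_param: float) -> float:
    """Weight memory in GB (decimal) for the given precision."""
    return PARAMS * bytes_per_param / 1e9

print(f"FP16: {weight_gb(2):.1f} GB")   # 2 bytes/weight
print(f"INT8: {weight_gb(1):.1f} GB")   # 1 byte/weight
print(f"INT4: {weight_gb(0.5):.1f} GB") # 0.5 bytes/weight
```

This is why full-precision inference of a 7B model is out of reach for most consumer GPUs, while 4-bit quantization fits comfortably in 6-8GB cards.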
Quantization Options
Llama 2 7B quantizes well, trading modest quality loss for large memory savings:
- GGUF 4-bit: ~4GB VRAM, ~10-20% quality reduction
- GGUF 5-bit: ~5GB VRAM, ~5-10% quality reduction
- GGUF 8-bit: ~8GB VRAM, minimal quality reduction
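The VRAM figures above line up with the effective bits per weight of the classic GGUF formats: each block of quantized weights also stores a small scale, so Q4_0, for example, costs about 4.5 bits per weight rather than 4. A rough size estimate (the bit-widths below are approximations for the Q4_0/Q5_0/Q8_0 formats; K-quant variants differ slightly):

```python
# Approximate file/VRAM size of a 7B model under common GGUF
# quantization formats. Effective bits per weight include the
# per-block scale overhead (e.g. Q4_0 stores 32 4-bit weights
# plus a 16-bit scale, giving ~4.5 bits/weight).
PARAMS = 7_000_000_000

BITS_PER_WEIGHT = {"Q4_0": 4.5, "Q5_0": 5.5, "Q8_0": 8.5}

def quantized_gb(fmt: str) -> float:
    """Approximate quantized model size in GB (decimal)."""
    return PARAMS * BITS_PER_WEIGHT[fmt] / 8 / 1e9

for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{quantized_gb(fmt):.1f} GB")
```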
Benchmark Results
Our benchmarks show the following generation throughput (tokens/second; figures vary with quantization level, batch size, and inference backend):
- NVIDIA RTX 4090: 180-220 tokens/second
- NVIDIA RTX 3080: 120-150 tokens/second
- AMD RX 7900 XTX: 80-100 tokens/second
- Apple M2 Ultra: 60-80 tokens/second
- Intel Core i9-13900K: 15-20 tokens/second
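To translate throughput into perceived latency, divide the response length by the tokens-per-second figure. A small illustration using mid-range values picked from the table above (prompt-processing time, which is usually much faster per token than generation, is ignored):

```python
# Rough wall-clock time to generate a response of a given length
# at a given generation throughput.
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate `tokens` at the given throughput."""
    return tokens / tokens_per_second

# Mid-range throughput values taken from the benchmark table above.
for hw, tps in [("RTX 4090", 200), ("M2 Ultra", 70), ("i9-13900K", 17)]:
    print(f"{hw}: ~{generation_seconds(500, tps):.1f} s for 500 tokens")
```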
Use Cases
Llama 2 7B is well-suited for:
- Personal assistants
- Code completion
- Content generation
- Text summarization
- Educational applications