Running DeepSeek-R1 efficiently on AWS requires selecting the right instance types based on the specific model variant and the desired performance. Here's a detailed overview of recommended AWS instances for different DeepSeek-R1 models:
DeepSeek-R1 (Full Model)
The full DeepSeek-R1 model, with 671 billion parameters, requires substantial computational resources, and a multi-GPU setup with data-center GPUs such as the NVIDIA A100 is recommended. On AWS, A100 GPUs are available in EC2 P4 instances such as `p4d.24xlarge` (8x A100 with 40 GB each), though even that instance cannot hold the full 671B-parameter weights in GPU memory, so multi-node sharding or aggressive quantization is required. AWS Inferentia2-based instances such as `inf2.48xlarge` are another option for serving the model, though they are geared toward inference acceleration rather than training a model of this size[4].
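As a quick sanity check before provisioning anything, a short boto3 script can compare the total GPU memory of candidate instance types against the model's weight footprint. This is only a sketch: the candidate list, region, and memory estimate are illustrative assumptions, not recommendations from the cited posts.

```python
# Sketch: compare GPU memory of candidate EC2 instance types against the
# roughly 671 GB needed just for 8-bit weights of the full model (before
# activation and KV-cache overhead). Candidates and region are assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
candidates = ["p4d.24xlarge", "p4de.24xlarge", "p5.48xlarge", "g6e.48xlarge"]

resp = ec2.describe_instance_types(InstanceTypes=candidates)
for it in resp["InstanceTypes"]:
    gpu_info = it.get("GpuInfo")
    if gpu_info:
        total_gib = gpu_info["TotalGpuMemoryInMiB"] / 1024
        print(f'{it["InstanceType"]}: {total_gib:.0f} GiB total GPU memory')
```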
DeepSeek-R1 Distilled Models
For the distilled versions of DeepSeek-R1, which are more efficient and require less VRAM, different AWS instances can be used:
- DeepSeek-R1-Distill-Qwen-1.5B: This model runs efficiently on a single-GPU instance; `ml.g5.xlarge` is recommended for hosting it[3].
- DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B: These models perform well on instances like `ml.g6e.xlarge`, which offers a good balance of GPU power and cost. The `ml.g5.2xlarge` and `ml.g5.xlarge` instances are also viable options[3].
- DeepSeek-R1-Distill-Qwen-14B: This model needs a more capable GPU. The `g4dn.xlarge` instance, with a single NVIDIA T4 (16 GB VRAM), is generally insufficient for the 14B model at 16-bit precision, so consider instances in the `ml.g6`/`ml.g6e` family (NVIDIA L4 and L40S GPUs) or a custom setup with higher-end GPUs if available[1][2].
- DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Llama-70B: These larger models need substantially more GPU memory. Consumer GPUs such as the NVIDIA RTX 4090, often cited in local-deployment guides, are not offered on EC2; on AWS, multi-GPU instances (for example, the larger `ml.g5`/`ml.g6e` sizes) or `inf2.48xlarge` for accelerated inference are the practical equivalents[4][6]. A deployment sketch for the distilled models follows this list.
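As a concrete starting point, here is a minimal sketch of hosting one of the distilled models with the Hugging Face TGI container on SageMaker, in the spirit of the blog post in [3]. The token limits, timeout, and prompt are assumptions to adjust for your use case, and the script assumes a SageMaker execution role is available in the environment.

```python
# Sketch: host DeepSeek-R1-Distill-Qwen-7B with the Hugging Face TGI container
# on SageMaker. Token limits and timeout are assumptions; adjust per model size.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()                      # assumes an execution role exists
image_uri = get_huggingface_llm_image_uri("huggingface")   # TGI LLM inference container

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
        "SM_NUM_GPUS": "1",            # single-GPU instance
        "MAX_INPUT_TOKENS": "4096",
        "MAX_TOTAL_TOKENS": "8192",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g6e.xlarge",     # or ml.g5.2xlarge / ml.g5.xlarge
    container_startup_health_check_timeout=900,
)

print(predictor.predict({"inputs": "What is 17 * 24? Think step by step."}))
```

For the larger 32B and 70B distilled variants, the main changes are a multi-GPU `instance_type` and a higher `SM_NUM_GPUS` value.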
CPU-Based Deployment
For batch processing tasks where latency is not critical, AWS Graviton4-based instances can offer a cost-effective solution. The `c8g.16xlarge` instance, with its high core count and memory bandwidth, is suitable for running models like DeepSeek-R1-Distill-Llama-70B in a CPU-only environment[6].
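One common way to run such a CPU-only deployment is a quantized GGUF build of the model served with llama.cpp. The sketch below uses the `llama-cpp-python` bindings; the model path, quantization level, and prompts are placeholders rather than settings taken from [6].

```python
# Sketch: CPU-only batch inference with llama-cpp-python on a Graviton instance.
# Assumes a GGUF quantization of DeepSeek-R1-Distill-Llama-70B is already on disk;
# the path and quantization level are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="/opt/models/deepseek-r1-distill-llama-70b-q4_k_m.gguf",
    n_ctx=4096,      # context window
    n_threads=64,    # c8g.16xlarge exposes 64 vCPUs
)

prompts = [
    "Summarize the main idea of chain-of-thought prompting.",
    "List three trade-offs of CPU-only LLM inference.",
]
for prompt in prompts:
    result = llm(prompt, max_tokens=256)
    print(result["choices"][0]["text"])
```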
Fully Managed Solutions
For users who prefer not to manage infrastructure, DeepSeek-R1 is also available as a fully managed serverless model in Amazon Bedrock. This option lets you use the model without handling the underlying infrastructure[9]; a minimal invocation sketch appears after the summary below.

In summary, the choice of AWS instance for running DeepSeek-R1 efficiently depends on the specific model variant, the required performance level, and whether GPU acceleration is necessary. For most distilled models, instances with capable GPUs are recommended, while CPU-based instances can suit batch processing workloads.
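For the Bedrock option, invocation goes through the Bedrock runtime API rather than any instance selection. The sketch below uses the boto3 Converse API; the model identifier and region are assumptions that should be checked against the Bedrock model catalog in your account.

```python
# Sketch: calling DeepSeek-R1 as a serverless model in Amazon Bedrock.
# The model ID below is an assumption -- verify it in the Bedrock model catalog.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock.converse(
    modelId="us.deepseek.r1-v1:0",  # placeholder; confirm the exact identifier
    messages=[
        {"role": "user", "content": [{"text": "Explain why 0.1 + 0.2 != 0.3 in floating point."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.6},
)
print(response["output"]["message"]["content"][0]["text"])
```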
Citations:
[1] https://community.aws/content/2sEuHQlpyIFSwCkzmx585JckSgN/deploying-deepseek-r1-14b-on-amazon-ec2?lang=en
[2] https://dev.to/askyt/deepseek-r1-architecture-training-local-deployment-and-hardware-requirements-3mf8
[3] https://aws.amazon.com/blogs/machine-learning/optimize-hosting-deepseek-r1-distilled-models-with-hugging-face-tgi-on-amazon-sagemaker-ai/
[4] https://huggingface.co/blog/deepseek-r1-aws
[5] https://dev.to/askyt/deepseek-r1-671b-complete-hardware-requirements-optimal-deployment-setup-2e48
[6] https://community.aws/content/2rhRJI6cxBa1Ib5f3TjsfPadpXs/deploying-deepseek-r1-distill-llama-70b-for-batch-inference-on-aws-graviton4?lang=en
[7] https://aws.amazon.com/blogs/machine-learning/deploy-deepseek-r1-distilled-models-on-amazon-sagemaker-using-a-large-model-inference-container/
[8] https://www.reddit.com/r/LocalLLM/comments/1i6j3ih/how_to_install_deepseek_what_models_and/
[9] https://aws.amazon.com/blogs/aws/deepseek-r1-now-available-as-a-fully-managed-serverless-model-in-amazon-bedrock/