Show HN: How We Run 60 Hugging Face Models on 2 GPUs https://ift.tt/rVzBu9g
Most open-source LLM deployments assume one model per GPU. That works if traffic is steady. In practice, many workloads are long-tail or intermittent, which means GPUs sit idle most of the time.

We experimented with a different approach. Instead of pinning one model to one GPU, we:

• Stage model weights on fast local disk
• Load models into GPU memory only when requested
• Keep a small working set resident
• Evict inactive models aggressively
• Route everything through a single OpenAI-compatible endpoint

In our recent test setup (2×A6000, 48 GB each), we made ~60 Hugging Face text models available for activation. Only a few are resident in VRAM at any given time; the rest are restored when needed.

Cold starts still exist, and larger models take seconds to restore. But by avoiding warm pools and dedicated GPUs per model, overall utilization improves significantly for light workloads.

Short demo here: https://m.youtube.com/watch?v=IL7mBoRLHZk
Live demo to play with: https://ift.tt/WXKzONw

If anyone here is running multi-model inference and wants to benchmark this approach with their own models, I'm happy to provide temporary access for testing.

January 31, 2026 at 04:43AM
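The working-set idea above can be sketched as a small cache that loads models on demand and evicts the least recently used one when capacity is reached. This is a minimal illustration, not the project's actual code: the class name `ModelCache`, the `loader` callback, and the LRU policy are all assumptions (the post only says eviction is "aggressive", not which policy it uses).

```python
from collections import OrderedDict

class ModelCache:
    """Hypothetical sketch: keep at most `capacity` models resident,
    load on first request, evict the least recently used on overflow."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader            # e.g. restores weights from local disk
        self.resident = OrderedDict()   # model_id -> loaded model, in LRU order

    def get(self, model_id):
        if model_id in self.resident:
            # Warm hit: mark as most recently used and return immediately.
            self.resident.move_to_end(model_id)
            return self.resident[model_id]
        if len(self.resident) >= self.capacity:
            # Working set full: evict the least recently used model.
            self.resident.popitem(last=False)
        # Cold start: restore the model (this is the seconds-long step).
        model = self.loader(model_id)
        self.resident[model_id] = model
        return model
```

A single OpenAI-compatible endpoint would then call `cache.get(request.model)` per request, so routing and residency management stay in one place.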