{"id":4090,"date":"2024-11-22T09:43:04","date_gmt":"2024-11-22T00:43:04","guid":{"rendered":"https:\/\/derrylab.com\/?p=4090"},"modified":"2024-11-22T09:43:05","modified_gmt":"2024-11-22T00:43:05","slug":"how-to-easily-deploy-pixtral-large-using-docker-vllm-for-self-hosting-with-one-liner-command","status":"publish","type":"post","link":"https:\/\/blog.derrylab.com\/index.php\/2024\/11\/22\/how-to-easily-deploy-pixtral-large-using-docker-vllm-for-self-hosting-with-one-liner-command\/","title":{"rendered":"How To Easily Deploy Pixtral Large Using Docker VLLM For Self Hosting With One Liner Command"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Recently, Mistral has released a powerful multimodal model with 123B parameters. In this blog post, we will first understand what Pixtral is. Then, we will quickly get into the process of deploying <a href=\"https:\/\/mistral.ai\/news\/pixtral-large\/\" data-type=\"link\" data-id=\"https:\/\/mistral.ai\/news\/pixtral-large\/\" target=\"_blank\" rel=\"noreferrer noopener\">Pixtral Large<\/a> using <a href=\"https:\/\/github.com\/vllm-project\/vllm\" data-type=\"link\" data-id=\"https:\/\/github.com\/vllm-project\/vllm\" target=\"_blank\" rel=\"noreferrer noopener\">VLLM<\/a>.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h3 class=\"wp-block-heading\">Understanding Pixtral<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Pixtral is Mistral&#8217;s family of multimodal models. These models take both images and text as input and produce text as output, which makes them suited to tasks such as understanding documents, charts, and natural images.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Pixtral Large is the largest model in the family, following the earlier Pixtral 12B. According to Mistral&#8217;s announcement, it is built on top of Mistral Large 2 and combines a 123B-parameter multimodal decoder with a 1B-parameter vision encoder. 
A model of this size needs far more memory than a single H100 provides, so self-hosting it means sharding it across several GPUs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Deploying Pixtral Large Using VLLM<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Now that we have a basic understanding of Pixtral, let&#8217;s move on to the deployment process using VLLM. VLLM is an open-source inference and serving engine for large language models, and its Docker image exposes an OpenAI-compatible HTTP API.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">At the time of writing, neither the Pixtral release page nor the Hugging Face model page has a tutorial for deploying with VLLM in Docker. So here we go.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Prerequisites<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Before we begin, ensure you have the following prerequisites:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Docker<\/strong>: Make sure Docker is installed on your system, along with the NVIDIA Container Toolkit so that containers can access the GPUs.<\/li>\n\n\n\n<li><strong>NVIDIA GPUs<\/strong>: You need NVIDIA GPUs with enough combined memory to hold the model. Here I am using 8xH100 GPUs.<\/li>\n\n\n\n<li><strong>Hugging Face Token<\/strong>: You need a valid Hugging Face Hub token to access the model. 
You can create one on your Hugging Face token page.<\/li>\n<\/ol>\n\n\n\n<h4 class=\"wp-block-heading\">Deployment Steps<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Pull the VLLM Docker Image<\/strong>:<br>First, pull the latest VLLM Docker image, which supports the Pixtral Large model.<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>docker pull vllm\/vllm-openai:latest<\/code><\/pre>\n\n\n\n<ol start=\"2\" class=\"wp-block-list\">\n<li><strong>Run the Docker Container<\/strong>:<br>Use the following command to run the Docker container with the necessary configuration:<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>docker run --runtime nvidia --rm -d --gpus '\"device=0,1,2,3,4,5,6,7\"' --name=vllm-pixtral124 -v \/raid\/huggingface:\/root\/.cache\/huggingface --env \"HUGGING_FACE_HUB_TOKEN=&lt;HF_TOKEN>\" -p 8888:8000 --ipc=host vllm\/vllm-openai:latest --model mistralai\/Pixtral-Large-Instruct-2411 --config-format mistral --load-format mistral --tokenizer_mode mistral --limit_mm_per_prompt 'image=10' --tensor-parallel-size 8<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Breaking down the command:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>--runtime nvidia<\/code>: Uses the NVIDIA container runtime.<\/li>\n\n\n\n<li><code>--rm<\/code>: Automatically removes the container when it exits.<\/li>\n\n\n\n<li><code>-d<\/code>: Runs the container in detached mode.<\/li>\n\n\n\n<li><code>--gpus '\"device=0,1,2,3,4,5,6,7\"'<\/code>: Selects the GPUs exposed to the container. I am using all eight H100 GPUs here.<\/li>\n\n\n\n<li><code>--name=vllm-pixtral124<\/code>: Names the container for easy reference.<\/li>\n\n\n\n<li><code>-v \/raid\/huggingface:\/root\/.cache\/huggingface<\/code>: Mounts the host&#8217;s Hugging Face cache directory into the container. 
If you already have a Hugging Face cache on your host, I strongly suggest mounting it so that the model is downloaded once and reused on later runs.<\/li>\n\n\n\n<li><code>--env \"HUGGING_FACE_HUB_TOKEN=&lt;HF_TOKEN>\"<\/code>: Sets the Hugging Face Hub token environment variable. Replace <code>&lt;HF_TOKEN><\/code> with your own token.<\/li>\n\n\n\n<li><code>-p 8888:8000<\/code>: Maps port 8888 on the host to port 8000 on the container.<\/li>\n\n\n\n<li><code>--ipc=host<\/code>: Uses the host&#8217;s IPC namespace, which gives the container access to the host&#8217;s shared memory for communication between the tensor-parallel worker processes.<\/li>\n\n\n\n<li><code>vllm\/vllm-openai:latest<\/code>: Specifies the Docker image to use.<\/li>\n\n\n\n<li><code>--model mistralai\/Pixtral-Large-Instruct-2411<\/code>: Specifies the model to serve. In this case we use Pixtral Large Instruct 2411.<\/li>\n\n\n\n<li><code>--config-format mistral<\/code>: Reads the model configuration in Mistral&#8217;s native format.<\/li>\n\n\n\n<li><code>--load-format mistral<\/code>: Loads the weights in Mistral&#8217;s native format as well.<\/li>\n\n\n\n<li><code>--tokenizer_mode mistral<\/code>: Uses Mistral&#8217;s own tokenizer, following the guidelines from the Mistral team.<\/li>\n\n\n\n<li><code>--limit_mm_per_prompt 'image=10'<\/code>: Limits the number of images per prompt to 10 to reduce the risk of running out of memory (OOM).<\/li>\n\n\n\n<li><code>--tensor-parallel-size 8<\/code>: Shards the model across 8 GPUs with tensor parallelism. This should match the number of GPUs passed to <code>--gpus<\/code>.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Now, to monitor the logs and check that everything is running smoothly, use the following command:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>docker logs -f vllm-pixtral124<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This command streams real-time logs from the running container, allowing you to troubleshoot any issues that may arise.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The first start may be very slow because VLLM has to download the model weights (roughly 200GB) from Hugging Face.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Conclusion<\/h3>\n\n\n\n<p 
class=\"wp-block-paragraph\">Deploying Pixtral Large using VLLM is a straightforward process. By following the steps outlined above, you can efficiently deploy this awesome multimodal model for your applications.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If you encounter any issues or have further questions, feel free to reach out to me. Happy deploying! \ud83d\ude42<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently, Mistral has released a powerful multimodal model with 123B parameters. In this blog post, we will first understand what Pixtral is. Then, we will quickly get into the process of deploying Pixtral Large using VLLM.<\/p>\n","protected":false},"author":1,"featured_media":4091,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[217,4],"tags":[18,238,36,232,234,235,236,64],"class_list":["post-4090","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","category-linux","tag-development","tag-docker","tag-linux","tag-llmops","tag-mistral","tag-pixtral","tag-pixtral-large","tag-tutorial"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2024\/11\/image.png?fit=1024%2C768&ssl=1","jetpack-related-posts":[{"id":127,"url":"https:\/\/blog.derrylab.com\/index.php\/2020\/11\/17\/how-to-fix-raspberry-pi-ssh-hangs-or-not-responding\/","url_meta":{"origin":4090,"position":0},"title":"How to Fix Raspberry 
Pi SSH Hangs or Not Responding","author":"derry","date":"November 17, 2020","format":false,"excerpt":"I just set up a Raspberry PI 4 Model B in the laboratory to automatically connect to the lab's router. I found that each random minutes the SSH is hangs and not responding. Adding IPQoS cs0 cs0 line to the end of \/etc\/ssh\/sshd_config file will fix the issue. :)","rel":"","context":"In &quot;linux&quot;","block_context":{"text":"linux","link":"https:\/\/blog.derrylab.com\/index.php\/category\/linux\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2538,"url":"https:\/\/blog.derrylab.com\/index.php\/2023\/01\/31\/how-to-install-alfa-awus036nh-driver-for-kali-linux\/","url_meta":{"origin":4090,"position":1},"title":"How to Install Alfa AWUS036NH Driver for Kali Linux","author":"derry","date":"January 31, 2023","format":false,"excerpt":"This wifi adapter is an important weapon for penetration testers due to its feature that supports monitor and packet injection mode on a 2.4Ghz network. However, even if Alfa said it is already supported out of the box in Kali Linux, I found it unstable out of the box. 
Sometimes\u2026","rel":"","context":"In &quot;Hardware&quot;","block_context":{"text":"Hardware","link":"https:\/\/blog.derrylab.com\/index.php\/category\/hardware\/"},"img":{"alt_text":"Computer desktop","src":"https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/01\/czNmcy1wcml2YXRlL3Jhd3BpeGVsX2ltYWdlcy93ZWJzaXRlX2NvbnRlbnQvbHIvZnJ3bGFuX3dpZmlfbmV0d29ya19tb2RlbS1pbWFnZS1reWJjZThvZC5qcGc.jpg?fit=1200%2C800&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/01\/czNmcy1wcml2YXRlL3Jhd3BpeGVsX2ltYWdlcy93ZWJzaXRlX2NvbnRlbnQvbHIvZnJ3bGFuX3dpZmlfbmV0d29ya19tb2RlbS1pbWFnZS1reWJjZThvZC5qcGc.jpg?fit=1200%2C800&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/01\/czNmcy1wcml2YXRlL3Jhd3BpeGVsX2ltYWdlcy93ZWJzaXRlX2NvbnRlbnQvbHIvZnJ3bGFuX3dpZmlfbmV0d29ya19tb2RlbS1pbWFnZS1reWJjZThvZC5qcGc.jpg?fit=1200%2C800&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/01\/czNmcy1wcml2YXRlL3Jhd3BpeGVsX2ltYWdlcy93ZWJzaXRlX2NvbnRlbnQvbHIvZnJ3bGFuX3dpZmlfbmV0d29ya19tb2RlbS1pbWFnZS1reWJjZThvZC5qcGc.jpg?fit=1200%2C800&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/01\/czNmcy1wcml2YXRlL3Jhd3BpeGVsX2ltYWdlcy93ZWJzaXRlX2NvbnRlbnQvbHIvZnJ3bGFuX3dpZmlfbmV0d29ya19tb2RlbS1pbWFnZS1reWJjZThvZC5qcGc.jpg?fit=1200%2C800&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":2789,"url":"https:\/\/blog.derrylab.com\/index.php\/2023\/04\/25\/how-to-ssh-with-proxyjump-in-linux\/","url_meta":{"origin":4090,"position":2},"title":"How to SSH with ProxyJump in Linux","author":"derry","date":"April 25, 2023","format":false,"excerpt":"Secure Shell (SSH) is a widely used protocol for remotely connecting to a computer system, typically over a network. It provides encrypted communication and authentication to ensure secure access to a remote machine. 
In this article, we will discuss how to SSH with ProxyJump in Linux. ProxyJump is a feature\u2026","rel":"","context":"In &quot;linux&quot;","block_context":{"text":"linux","link":"https:\/\/blog.derrylab.com\/index.php\/category\/linux\/"},"img":{"alt_text":"photo of jumping man","src":"https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/04\/pexels-photo-2736809.jpeg?fit=1200%2C800&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/04\/pexels-photo-2736809.jpeg?fit=1200%2C800&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/04\/pexels-photo-2736809.jpeg?fit=1200%2C800&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/04\/pexels-photo-2736809.jpeg?fit=1200%2C800&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/04\/pexels-photo-2736809.jpeg?fit=1200%2C800&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":2780,"url":"https:\/\/blog.derrylab.com\/index.php\/2023\/04\/25\/how-to-replace-string-in-files-without-text-editor-in-linux\/","url_meta":{"origin":4090,"position":3},"title":"How to Replace String in Files without Text Editor in Linux","author":"derry","date":"April 25, 2023","format":false,"excerpt":"As a Linux user, it's important to know how to modify text files through the command line. One common scenario is changing a single line in a text file, such as enabling or disabling a feature. But what if the OS doesn't have any text editor installed at all? 
In\u2026","rel":"","context":"In &quot;linux&quot;","block_context":{"text":"linux","link":"https:\/\/blog.derrylab.com\/index.php\/category\/linux\/"},"img":{"alt_text":"pencil shavings","src":"https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/04\/pexels-photo-1237647.jpeg?fit=1200%2C800&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/04\/pexels-photo-1237647.jpeg?fit=1200%2C800&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/04\/pexels-photo-1237647.jpeg?fit=1200%2C800&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/04\/pexels-photo-1237647.jpeg?fit=1200%2C800&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2023\/04\/pexels-photo-1237647.jpeg?fit=1200%2C800&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":135,"url":"https:\/\/blog.derrylab.com\/index.php\/2020\/11\/18\/how-to-start-a-fresh-raspberry-pi-without-monitor\/","url_meta":{"origin":4090,"position":4},"title":"How to Start a Fresh Raspberry Pi without Monitor","author":"derry","date":"November 18, 2020","format":false,"excerpt":"I mean using SSH because HDMI, mouse, keyboard, and monitor will eat up the whole space in my desk. Prepare the Raspberry Pi OS Firstly we need to get our microSD card with bootable Raspberry Pi OS. I recommend using Raspberry Pi Imager to make this process easier. 
We just\u2026","rel":"","context":"In &quot;linux&quot;","block_context":{"text":"linux","link":"https:\/\/blog.derrylab.com\/index.php\/category\/linux\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2020\/11\/image.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2020\/11\/image.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.derrylab.com\/wp-content\/uploads\/2020\/11\/image.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":92,"url":"https:\/\/blog.derrylab.com\/index.php\/2020\/07\/18\/how-to-create-identical-image-of-usb-device\/","url_meta":{"origin":4090,"position":5},"title":"How to Create Identical Image of USB Device","author":"derry","date":"July 18, 2020","format":false,"excerpt":"Hi, currently I am backing up my micro SD card contents and I'm using dd for that. You can start listing your usb device using: $ sudo fdisk -l After you get the USB address, for example mine is \/dev\/sdb, you can start creating a copy images. 
Here I tried\u2026","rel":"","context":"In &quot;linux&quot;","block_context":{"text":"linux","link":"https:\/\/blog.derrylab.com\/index.php\/category\/linux\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/blog.derrylab.com\/index.php\/wp-json\/wp\/v2\/posts\/4090","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.derrylab.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.derrylab.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.derrylab.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.derrylab.com\/index.php\/wp-json\/wp\/v2\/comments?post=4090"}],"version-history":[{"count":2,"href":"https:\/\/blog.derrylab.com\/index.php\/wp-json\/wp\/v2\/posts\/4090\/revisions"}],"predecessor-version":[{"id":4093,"href":"https:\/\/blog.derrylab.com\/index.php\/wp-json\/wp\/v2\/posts\/4090\/revisions\/4093"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.derrylab.com\/index.php\/wp-json\/wp\/v2\/media\/4091"}],"wp:attachment":[{"href":"https:\/\/blog.derrylab.com\/index.php\/wp-json\/wp\/v2\/media?parent=4090"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.derrylab.com\/index.php\/wp-json\/wp\/v2\/categories?post=4090"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.derrylab.com\/index.php\/wp-json\/wp\/v2\/tags?post=4090"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}