Linux (Ubuntu 22.04) Setup
Complete GPU server setup including NVIDIA drivers, CUDA toolkit, and Python AI environment.
1 Connect to Your Server via SSH
SSH CONNECTION
# Replace with your credentials from the portal
ssh root@YOUR_SERVER_IP -p 22 -i ~/.ssh/b8n6_key
2 Update System & Install Dependencies
SYSTEM SETUP
apt update && apt upgrade -y
apt install -y build-essential git curl wget htop nvtop \
python3 python3-pip python3-venv software-properties-common
3 Install NVIDIA Drivers & CUDA 12.x
NVIDIA CUDA SETUP
# Add NVIDIA repo
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb && apt update
apt install -y cuda-toolkit-12-6 nvidia-driver-550
nvidia-smi # verify
4 Set Up Python Environment
PYTHON VENV
python3 -m venv ~/ai-env
source ~/ai-env/bin/activate
pip install --upgrade pip torch torchvision \
--index-url https://download.pytorch.org/whl/cu124
python3 -c "import torch; print(torch.cuda.is_available())"
⚡ Add to ~/.bashrc: export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
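To confirm the exported variables actually took effect in a new shell, a small stdlib-only Python check can be handy. This is a sketch, assuming the default toolkit location `/usr/local/cuda`; adjust the substring if your install path differs.

```python
import os

def cuda_env_ok(env=None):
    """Return True if PATH and LD_LIBRARY_PATH both reference a CUDA install."""
    env = os.environ if env is None else env
    in_path = any("cuda" in p for p in env.get("PATH", "").split(":"))
    in_ld = any("cuda" in p for p in env.get("LD_LIBRARY_PATH", "").split(":"))
    return in_path and in_ld

if __name__ == "__main__":
    print("CUDA env configured:", cuda_env_ok())
```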
Windows Server 2022
Setup for Windows Server with WSL2, NVIDIA GPU passthrough, and AI framework support.
1 Connect via RDP or SSH
POWERSHELL
ssh Administrator@YOUR_SERVER_IP
# Or RDP: mstsc /v:YOUR_SERVER_IP
2 Enable WSL2 & Ubuntu
POWERSHELL — AS ADMIN
wsl --install
wsl --set-default-version 2
wsl --install -d Ubuntu-22.04
wsl -l -v # verify
3 Install NVIDIA CUDA for Windows
POWERSHELL
winget install NVIDIA.CUDA
wsl nvidia-smi # GPU accessible from WSL2
4 Install Python & PyTorch
POWERSHELL
winget install Python.Python.3.11
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
python -c "import torch; print(torch.cuda.is_available())"
ComfyUI Setup
Node-based image generation with FLUX 1.1 Pro and Stable Diffusion. Access via browser tunnel.
1 Clone & Install ComfyUI
BASH
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI && pip install -r requirements.txt
# Install ComfyUI Manager
cd custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
2 Download FLUX / SDXL Models
BASH — MODEL DOWNLOAD
cd ~/ComfyUI/models/checkpoints
# FLUX.1-dev is a gated model — accept the license on Hugging Face first and pass your access token
wget --header="Authorization: Bearer YOUR_HF_TOKEN" https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors
cd ~/ComfyUI/models/vae
wget --header="Authorization: Bearer YOUR_HF_TOKEN" https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/ae.safetensors
3 Launch & Tunnel
LAUNCH COMFYUI
python3 main.py --listen 0.0.0.0 --port 8188
# SSH tunnel from local machine:
ssh -L 8188:localhost:8188 root@YOUR_SERVER_IP -N
# Open: http://localhost:8188
⚡ Use --lowvram on A100 40GB. H200 runs full-precision FLUX without flags.
Open WebUI + Ollama
ChatGPT-style interface for self-hosted models via Docker. One-command deployment.
1 Install Docker + NVIDIA Container Toolkit
BASH
curl -fsSL https://get.docker.com | sh
systemctl enable --now docker
# The toolkit lives in NVIDIA's own apt repo — add it first
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt update && apt install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
2 Deploy Open WebUI + Ollama (One Command)
DOCKER
docker run -d \
--name open-webui \
--gpus all \
-p 3000:8080 \
-v open-webui:/app/backend/data \
-v ollama:/root/.ollama \
-e OLLAMA_BASE_URL=http://localhost:11434 \
--restart always \
ghcr.io/open-webui/open-webui:ollama
# Access: http://YOUR_SERVER_IP:3000
3 Pull Models via Ollama
BASH
docker exec open-webui ollama pull llama4:scout
docker exec open-webui ollama pull deepseek-r1:70b
docker exec open-webui ollama pull qwen3:32b
⚡ Connect Claude/Gemini/DeepSeek APIs in Settings → Connections for a unified interface.
Ollama Setup
Run LLMs locally on your B8N6 GPU. CLI-based model management with auto GPU acceleration.
BASH
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
BASH — MODEL MANAGEMENT
ollama pull llama4:scout
ollama pull deepseek-r1:70b
ollama pull qwen3:32b
ollama run deepseek-r1:70b
ollama list
API
OLLAMA_HOST=0.0.0.0 ollama serve &
curl http://localhost:11434/api/generate \
-d '{"model":"llama4:scout","prompt":"Hello!"}'
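By default `/api/generate` streams newline-delimited JSON, one token chunk per line, with a final object carrying `"done": true`. A stdlib-only sketch that stitches the stream back into the full completion:

```python
import json

def collect_ollama_stream(lines):
    """Join the 'response' fragments from Ollama's NDJSON stream into one string.

    Each line is a JSON object like {"response": "Hel", "done": false};
    the final object has "done": true.
    """
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

Feed it the response body line by line (e.g. iterating over a `urllib.request.urlopen` response object) to reassemble the completion.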
Anthropic Claude API
Integrate Claude Opus 4.6 and Sonnet 4.6 via the Anthropic Python SDK.
BASH
pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-YOUR_KEY_HERE"
PYTHON
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-opus-4-6",
max_tokens=1024,
messages=[{"role":"user","content":"Hello!"}]
)
print(message.content[0].text)
⚡ Models: claude-opus-4-6 · claude-sonnet-4-6 · claude-haiku-4-5-20251001
Docs: https://docs.anthropic.com
Google Gemini API
Access Gemini 3 Pro and Gemini 2.5 Flash, including 1M-token context and multimodal inputs.
BASH
pip install google-generativeai
export GOOGLE_API_KEY="AIzaSy-YOUR_KEY"
PYTHON
import os
import google.generativeai as genai
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-3.0-pro")
response = model.generate_content("Explain NVLink 4.0")
print(response.text)
⚡ Models: gemini-3.0-pro · gemini-2.5-flash · gemini-2.5-flash-lite
Docs: https://ai.google.dev
DeepSeek API
Use DeepSeek V3.2 and R1 via their OpenAI-compatible API, or self-host via vLLM.
PYTHON — OPENAI COMPATIBLE
from openai import OpenAI
client = OpenAI(
api_key="YOUR_DEEPSEEK_KEY",
base_url="https://api.deepseek.com"
)
response = client.chat.completions.create(
model="deepseek-chat",
messages=[{"role":"user","content":"Hello!"}]
)
print(response.choices[0].message.content)
BASH — VLLM SELF-HOST
pip install vllm
python -m vllm.entrypoints.openai.api_server \
--model deepseek-ai/DeepSeek-V3 \
--tensor-parallel-size 8 \
--host 0.0.0.0 --port 8000 \
--gpu-memory-utilization 0.95
⚡ Models: deepseek-chat (V3.2) · deepseek-reasoner (R1) · Docs: https://api-docs.deepseek.com
OpenAI API
Access GPT-5.2, o4-mini, GPT-4o and GPT-oss open-weight models.
BASH
pip install openai
export OPENAI_API_KEY="sk-YOUR_KEY"
PYTHON
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-5.2",
messages=[{"role":"user","content":"Hello!"}]
)
print(response.choices[0].message.content)
# Reasoning model
response = client.chat.completions.create(
model="o4-mini",
messages=[{"role":"user","content":"Solve step by step..."}],
reasoning_effort="high"
)
⚡ Models: gpt-5.2 · gpt-5 · o4-mini · o3 · gpt-4o · gpt-oss-120b
Docs: https://platform.openai.com/docs
Rocky Linux 9 Setup
Enterprise-grade RHEL-compatible OS. Ideal for GPU servers requiring maximum stability and CUDA support.
1 Initial Server Access & System Update
SSH + UPDATE
ssh root@YOUR_SERVER_IP
dnf update -y
dnf install -y epel-release
dnf groupinstall -y "Development Tools"
2 Install NVIDIA Drivers & CUDA
CUDA ON ROCKY 9
dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
dnf install -y cuda-toolkit-12-6
dnf module install -y nvidia-driver:latest-dkms
reboot
nvidia-smi # verify after reboot
3 Install Python & PyTorch
PYTHON SETUP
dnf install -y python3.11 python3.11-pip
python3.11 -m venv ~/ai-env
source ~/ai-env/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
⚡ Rocky Linux 9 = RHEL9-compatible. Great for enterprise environments requiring SELinux and corporate compliance.
Debian 12 (Bookworm) Setup
Rock-solid Debian 12 with NVIDIA GPU and CUDA support. Preferred by many AI researchers for its stability.
1 System Update & Dependencies
DEBIAN SETUP
apt update && apt upgrade -y
apt install -y build-essential linux-headers-$(uname -r) curl wget git python3 python3-pip python3-venv
2 NVIDIA Drivers via apt
NVIDIA DEBIAN
apt install -y software-properties-common
add-apt-repository -y contrib non-free non-free-firmware
apt update && apt install -y nvidia-driver firmware-misc-nonfree
reboot
nvidia-smi # verify
3 CUDA Toolkit
CUDA ON DEBIAN
wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt update && apt install -y cuda-toolkit-12-6
AlmaLinux 9 Setup
AlmaLinux 9: Community RHEL binary-compatible distro. Excellent for production GPU workloads.
1 Initial Setup
ALMALINUX 9
dnf update -y
dnf install -y epel-release
dnf groupinstall "Development Tools" -y
dnf install -y wget curl git htop
2 CUDA Repository & Drivers
CUDA ALMALINUX
dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
dnf install -y cuda-toolkit-12-6
dnf module install -y nvidia-driver:latest-dkms
reboot && nvidia-smi
⚡ AlmaLinux is 1:1 binary compatible with RHEL9. Ideal choice if you're migrating from CentOS.
Kubernetes (K8s) + GPU
Deploy GPU-accelerated Kubernetes clusters for AI inference workloads. Includes NVIDIA Device Plugin and GPU Operator setup.
1 Install kubeadm, kubelet, kubectl
K8S INSTALL
mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /" | tee /etc/apt/sources.list.d/kubernetes.list
apt update && apt install -y kubelet kubeadm kubectl
kubeadm init --pod-network-cidr=192.168.0.0/16
2 Install NVIDIA GPU Operator
GPU OPERATOR
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator
3 Deploy GPU AI Workload
GPU POD
# Minimal GPU test pod (reconstructed heredoc — adjust image/limits to your workload)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ai-inference
spec:
  restartPolicy: Never
  containers:
  - name: cuda-test
    image: nvidia/cuda:12.6.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
kubectl logs ai-inference
Docker Swarm Cluster
Multi-host Docker Swarm for distributed AI service deployment with GPU support.
1 Install Docker & Initialize Swarm
SWARM INIT
curl -fsSL https://get.docker.com | sh
docker swarm init --advertise-addr YOUR_MANAGER_IP
# On worker nodes:
docker swarm join --token SWMTKN-xxx MANAGER_IP:2377
2 Deploy AI Stack
SWARM DEPLOY
docker stack deploy -c docker-compose.yml ai-stack
docker service ls
docker stack ps ai-stack
Nginx Reverse Proxy
Set up Nginx as a reverse proxy for your AI services with SSL termination and load balancing.
1 Install Nginx & Certbot
NGINX + SSL
apt install -y nginx certbot python3-certbot-nginx
certbot --nginx -d your.domain.com
2 Configure Proxy for AI API
NGINX CONFIG
server {
    listen 443 ssl;
    server_name your.domain.com;

    location /api/ {
        proxy_pass http://127.0.0.1:8000/;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 300s;
    }
}
Postfix + Dovecot Mail
Production mail server with Postfix (SMTP) and Dovecot (IMAP/POP3). Supports TLS and modern authentication.
1 Install Postfix & Dovecot
MAIL SERVER
apt install -y postfix dovecot-core dovecot-imapd dovecot-pop3d spamassassin opendkim
# Choose: Internet Site during postfix setup
2 Configure Postfix
/ETC/POSTFIX/MAIN.CF
myhostname = mail.yourdomain.com
mydomain = yourdomain.com
inet_interfaces = all
smtpd_tls_cert_file = /etc/ssl/certs/cert.pem
smtpd_tls_key_file = /etc/ssl/private/key.pem
smtpd_use_tls = yes
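To smoke-test the server end to end, a short smtplib sketch — the hostname, addresses, and port 587 are placeholders/assumptions (submission on 587 requires the submission service enabled in master.cf):

```python
import smtplib
from email.message import EmailMessage

def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Compose a minimal plain-text test email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Postfix test"
    msg.set_content("If you can read this, SMTP delivery works.")
    return msg

def send_test(host: str = "mail.yourdomain.com", port: int = 587) -> None:
    msg = build_test_message("admin@yourdomain.com", "you@example.com")
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()  # upgrade the connection to TLS before sending
        # smtp.login("admin@yourdomain.com", "password")  # if SASL auth is enabled
        smtp.send_message(msg)
```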
⚡ Set up SPF, DKIM, and DMARC DNS records for email deliverability. Contact support for DNS help.
Mailcow Dockerized
Full-featured mail server suite via Docker. Includes webmail (SOGo), antispam, DKIM, and admin panel.
1 Clone & Configure
MAILCOW SETUP
git clone https://github.com/mailcow/mailcow-dockerized
cd mailcow-dockerized
./generate_config.sh
# Enter: mail.yourdomain.com when prompted
2 Start Mailcow
DOCKER COMPOSE
docker compose pull
docker compose up -d
# Admin UI: https://mail.yourdomain.com/admin
# Default admin password in mailcow.conf
Nextcloud Self-Hosted Cloud
Deploy Nextcloud for private cloud storage, collaboration, and file sync — a DigitalOcean Spaces or Dropbox alternative on your own server.
1 Install Dependencies
NEXTCLOUD STACK
apt install -y apache2 mariadb-server php8.2 php8.2-{curl,gd,mbstring,xml,zip,mysql,intl,bcmath,gmp}
systemctl enable --now apache2 mariadb
2 Deploy via Docker (Recommended)
DOCKER NEXTCLOUD
docker run -d --name nextcloud \
  -p 8080:80 \
  -v nextcloud:/var/www/html \
  -e MYSQL_HOST=db \
  -e NEXTCLOUD_ADMIN_USER=admin \
  --restart always \
  nextcloud:latest
⚡ B8N6 servers are perfect for Nextcloud — NVMe SSD ensures fast file access. Enable GPU transcoding for photos/videos.
GitLab Community Edition
Self-hosted GitLab CE for private code repos, CI/CD pipelines, and team collaboration.
1 Install GitLab CE
GITLAB INSTALL
curl -fsSL https://packages.gitlab.com/install/repositories/gitlab/gitlab-ce/script.deb.sh | bash
EXTERNAL_URL="https://gitlab.yourdomain.com" apt install -y gitlab-ce
2 Configure & Start
GITLAB CONFIGURE
gitlab-ctl reconfigure
gitlab-ctl status
# Get initial root password:
cat /etc/gitlab/initial_root_password
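Once GitLab is up, its REST API (v4) is available with a personal access token. A stdlib-only sketch — the base URL and token are placeholders:

```python
import json
import urllib.request

def gitlab_request(path: str, token: str,
                   base: str = "https://gitlab.yourdomain.com") -> urllib.request.Request:
    """Build an authenticated request against GitLab's REST API (v4)."""
    return urllib.request.Request(
        f"{base}/api/v4/{path}",
        headers={"PRIVATE-TOKEN": token},
    )

def gitlab_version(token: str, base: str = "https://gitlab.yourdomain.com") -> dict:
    """Query /version to confirm the instance is reachable and the token works."""
    with urllib.request.urlopen(gitlab_request("version", token, base)) as resp:
        return json.load(resp)
```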
Grafana + Prometheus
Full observability stack for monitoring GPU utilization, AI model performance, and server health.
1 Install Prometheus
PROMETHEUS
# Pin a release — check https://github.com/prometheus/prometheus/releases for the latest version
PROM_VERSION=2.53.0
wget https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz
tar -xvf prometheus-${PROM_VERSION}.linux-amd64.tar.gz
cd prometheus-${PROM_VERSION}.linux-amd64
./prometheus --config.file=prometheus.yml &
# GPU exporter:
docker run -d --gpus all -p 9835:9835 utkuozdemir/nvidia_gpu_exporter:1.2.0
2 Install Grafana
GRAFANA
# Grafana isn't in Ubuntu's default repos — add Grafana's apt repo first
mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor -o /etc/apt/keyrings/grafana.gpg
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | tee /etc/apt/sources.list.d/grafana.list
apt update && apt install -y grafana
systemctl enable --now grafana-server
# Access: http://YOUR_SERVER_IP:3000
# Import NVIDIA GPU dashboard ID: 14574
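With both services running, GPU metrics can be pulled straight from Prometheus' HTTP API. A stdlib-only sketch — the metric name `nvidia_smi_utilization_gpu_ratio` is what the exporter above typically exposes, but verify against `/metrics` on port 9835:

```python
import json
import urllib.parse
import urllib.request

def prom_query_url(query: str, base: str = "http://localhost:9090") -> str:
    """Build an instant-query URL for Prometheus' HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": query})

def gpu_utilization(base: str = "http://localhost:9090") -> list[tuple[str, float]]:
    """Return (gpu_uuid, utilization) pairs from the exporter's gauge."""
    url = prom_query_url("nvidia_smi_utilization_gpu_ratio", base)
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return [(r["metric"].get("uuid", "gpu"), float(r["value"][1]))
            for r in data["data"]["result"]]
```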
vLLM Inference Server
High-throughput LLM inference engine. Serves DeepSeek, Llama 4, Mistral and any HuggingFace model with OpenAI-compatible API.
1 Install vLLM
VLLM INSTALL
pip install vllm
# Or with Docker:
docker pull vllm/vllm-openai:latest
2 Serve a Model
SERVE MODEL
python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 8 \
  --host 0.0.0.0 --port 8000 \
  --gpu-memory-utilization 0.95 \
  --max-model-len 32768
3 Query the API
API CALL
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
response = client.chat.completions.create(
model="deepseek-ai/DeepSeek-V3",
messages=[{"role":"user","content":"Hello!"}]
)
print(response.choices[0].message.content)
⚡ Use --tensor-parallel-size equal to your GPU count. H200×8 can serve 70B+ models at full speed.