In the era of generative AI, self-hosting large language models (LLMs) gives developers full control over data privacy and model customization. OpenLLM emerges as a powerful toolkit for deploying models like Llama 3 or Mistral locally, while Pinggy enables secure internet exposure without complex infrastructure. This guide walks you through self-hosting an LLM endpoint with a public URL, making it accessible and shareable in minutes.
Install OpenLLM & Deploy a Model
pip install openllm
openllm serve llama3.2:1b-instruct-ggml-fp16-linux
Replace llama3.2:1b-instruct-ggml-fp16-linux with the model you want to run.
Expose API via Pinggy
Tunnel port 3000:
ssh -p 443 -R0:localhost:3000 a.pinggy.io
With growing concerns about data privacy and API costs, tools like OpenLLM have become essential for running LLMs locally. However, keeping the endpoint confined to your local network limits its usefulness. By sharing it online, you can collaborate with distributed teams, power remote applications, and give peers access to your models without exposing internal infrastructure.
Pinggy simplifies port forwarding by creating secure tunnels from a single SSH command, with no extra client to install, which sets it apart from alternatives like ngrok.
First, install OpenLLM:
pip install openllm
Next, launch llama3.2:1b-instruct-ggml-fp16-linux. You can also choose a different model from the list of supported models below.
openllm serve llama3.2:1b-instruct-ggml-fp16-linux
Available models include:
mistral
falcon
qwen
dolly-v2
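Before tunneling, it is worth confirming that the server responds locally. A minimal sanity check, assuming OpenLLM exposes its OpenAI-compatible API on the default port 3000:
curl http://localhost:3000/v1/models
This should return a JSON listing that includes the model you just launched.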
While OpenLLM is running on port 3000, create the tunnel in a second terminal:
ssh -p 443 -R0:localhost:3000 a.pinggy.io
Command Breakdown:
-p 443: Connects over port 443 (normally open for HTTPS) for firewall compatibility.
-R0:localhost:3000: Forwards OpenLLM's local port 3000 through the tunnel.
a.pinggy.io: Pinggy's tunneling endpoint.
You'll receive a public URL like https://xyz123.pinggy.link. Test the endpoint:
curl https://xyz123.pinggy.link/
curl https://xyz123.pinggy.link/chat
curl https://xyz123.pinggy.link/v1/models
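Beyond these health checks, you can send a full chat completion through the tunnel. This is a sketch assuming OpenLLM's OpenAI-compatible /v1/chat/completions route; the model field should match an id returned by /v1/models:
curl https://xyz123.pinggy.link/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.2:1b-instruct-ggml-fp16-linux",
        "messages": [{"role": "user", "content": "Give me a one-line summary of OpenLLM."}]
      }'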
Enable Basic Authentication in Pinggy:
Secure your tunnel by appending a username and password to your SSH command:
ssh -p 443 -R0:localhost:3000 -t a.pinggy.io b:username:password
You can also configure multiple username-password pairs for enhanced access control. For more details, refer to the official documentation.
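With authentication enabled, every request must carry the credentials. For example, using curl's standard HTTP basic auth flag with the same username and password as above:
curl -u username:password https://xyz123.pinggy.link/v1/models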
With Pinggy Pro (3 USD/month), you can set up a custom domain for your tunnels. This enhances branding and improves accessibility.
For a step-by-step guide on setting up a custom domain, refer to the Pinggy Custom Domain Documentation.
Distributed teams can test and iterate against the same model endpoint without each member running their own copy.
Expose OpenLLM's API to power remote applications such as chatbots and NLP pipelines, as sketched below.
Researchers can securely share access to proprietary models with peers without exposing internal infrastructure.
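Because the endpoint speaks the OpenAI API format, most OpenAI-compatible clients can be pointed at it through environment variables. A hypothetical setup; the exact variable names depend on your client, though recent OpenAI SDKs read OPENAI_BASE_URL and OPENAI_API_KEY:
# Point an OpenAI-compatible client at the public tunnel (variable names may vary by client)
export OPENAI_BASE_URL="https://xyz123.pinggy.link/v1"
export OPENAI_API_KEY="dummy-key"  # placeholder; a self-hosted OpenLLM server typically doesn't validate it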
Model Fails to Load
Try a quantized build to reduce the memory footprint:
openllm run llama3.2:1b-instruct-ggml-fp16-linux --quantize int4
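Load failures are often caused by insufficient memory, which is what the quantized build above works around. On Linux, you can check free RAM before retrying (a general diagnostic, not an OpenLLM-specific command):
free -h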
Connection Timeouts
If the tunnel drops, restart it automatically with a simple reconnect loop:
while true; do
  ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:3000 a.pinggy.io;
  sleep 10;
done
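As a complementary tweak, standard OpenSSH keep-alive options help the client detect a dropped tunnel sooner so the loop can restart it; these are generic SSH settings, not Pinggy-specific flags:
ssh -p 443 -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -R0:localhost:3000 a.pinggy.io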
Combining OpenLLM’s flexible model serving with Pinggy’s secure tunneling provides a quick and easy way to deploy AI models accessible from anywhere. Whether you’re prototyping chatbots or testing NLP pipelines, this stack simplifies remote access without the complexity of traditional deployments.
Ready to deploy? Start with:
pip install openllm && openllm serve llama3.2:1b-instruct-ggml-fp16-linux
For advanced configurations, explore OpenLLM Documentation and Pinggy Features.