How to Easily Share OpenLLM API Online
Updated on Feb 10, 2025 · 3 mins read
In the era of generative AI, self-hosting large language models (LLMs) gives developers full control over data privacy and model customization. OpenLLM emerges as a powerful toolkit for deploying models like Llama 3 or Mistral locally, while Pinggy enables secure internet exposure without complex infrastructure. This guide walks you through self-hosting an LLM endpoint with a public URL, making it accessible and shareable in minutes.

Summary
- Install OpenLLM & Deploy a Model:
  pip install openllm
  openllm serve llama3.2:1b-instruct-ggml-fp16-linux
  Replace llama3.2:1b-instruct-ggml-fp16-linux with the model you want to run.
- Expose the API via Pinggy by tunneling port 3000:
  ssh -p 443 -R0:localhost:3000 a.pinggy.io
- Share the generated URL for ChatGPT-like access to your LLMs.
Why Self-Host LLMs with OpenLLM?
The Rise of Local AI Deployment
With growing concerns about data privacy and API costs, tools like OpenLLM have become essential for running LLMs locally. However, limiting access to your local network restricts their utility. By sharing them online, you can:
- Collaborate remotely with team members or clients.
- Integrate AI into web/mobile apps via OpenLLM’s API.
- Demo projects without deploying to the cloud.
- Reduce latency by keeping inference local while enabling remote access.
Why Choose Pinggy for Tunneling?
Pinggy simplifies port forwarding by creating secure tunnels. Unlike alternatives like ngrok, it offers:
- Free HTTPS URLs with no installation required.
- No rate limits on the free tier.
- No bandwidth limit with the Pinggy Pro plan.
- Web Debugger to observe and monitor requests coming to your server.
Step-by-Step Guide to Sharing OpenLLM
Step 1: Install OpenLLM & Deploy Models
Prerequisites
- Python
- pip package manager
Install OpenLLM
pip install openllm

Start a Model Server
Launch llama3.2:1b-instruct-ggml-fp16-linux, or choose a different model from the list below. Once the server is running, you can test the API locally (see the Python sketch after the model list).
openllm serve llama3.2:1b-instruct-ggml-fp16-linux
Available models include:
- mistral
- falcon
- qwen
- dolly-v2
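With the server running, you can give it a quick local smoke test before exposing it to the internet. The sketch below assumes OpenLLM's OpenAI-compatible /v1 routes on port 3000 (the same routes listed later in this guide) and the requests package; adjust the model name to whatever you served.

```python
# Minimal local smoke test for the OpenLLM server started above.
# Assumes an OpenAI-compatible API on http://localhost:3000 and the
# `requests` package (pip install requests).
import requests

BASE_URL = "http://localhost:3000"  # default port used in this guide
MODEL = "llama3.2:1b-instruct-ggml-fp16-linux"  # must match the served model

# List the models the server knows about.
print(requests.get(f"{BASE_URL}/v1/models", timeout=30).json())

# Send a simple chat completion request.
payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}
resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If this works locally, the same calls will work through the public URL created in Step 2.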

Step 2: Expose OpenLLM API via Pinggy
Create Secure Tunnel
While OpenLLM is running on port 3000, open the tunnel:
ssh -p 443 -R0:localhost:3000 a.pinggy.io
Command Breakdown:
- -p 443: Connects via HTTPS for firewall compatibility.
- -R0:localhost:3000: Forwards OpenLLM's local port to Pinggy.
- a.pinggy.io: Pinggy's tunneling endpoint.

You’ll receive a public URL like https://xyz123.pinggy.link.

API Endpoints
- OpenLLM Swagger Docs:
curl https://xyz123.pinggy.link/

- OpenLLM WebUI:
curl https://xyz123.pinggy.link/chat

- List Models:
curl https://xyz123.pinggy.link/v1/models
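Remote clients can use the same OpenAI-compatible routes through the public URL. Here is a hedged sketch using the openai Python client; the base URL is the placeholder from above, and the dummy api_key is an assumption (the server as configured here does not check one unless you add protection such as the basic auth described below).

```python
# Query the shared OpenLLM API through the Pinggy URL.
# Assumes the `openai` package (pip install openai) and the
# OpenAI-compatible /v1 routes listed above.
from openai import OpenAI

client = OpenAI(
    base_url="https://xyz123.pinggy.link/v1",  # replace with your Pinggy URL
    api_key="not-needed",  # placeholder; no key is checked in this setup (assumption)
)

response = client.chat.completions.create(
    model="llama3.2:1b-instruct-ggml-fp16-linux",  # the model served in Step 1
    messages=[{"role": "user", "content": "Summarize why self-hosting LLMs matters."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```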

Advanced Configuration and Security Best Practices
Secure Your Deployment
- Enable Basic Authentication in Pinggy:
  Secure your tunnel by appending a username and password to your SSH command (a client-side sketch follows below):
  ssh -p 443 -R0:localhost:3000 -t a.pinggy.io b:username:password
  You can also configure multiple username-password pairs for enhanced access control. For more details, refer to the official documentation.
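Once basic auth is enabled, clients must send those credentials with every request. A minimal sketch using requests with HTTP Basic Auth, assuming the placeholder URL and the username/password pair configured above:

```python
# Call a tunnel protected by Pinggy basic authentication.
# Assumes the username/password pair set via b:username:password above.
import requests
from requests.auth import HTTPBasicAuth

resp = requests.get(
    "https://xyz123.pinggy.link/v1/models",      # replace with your Pinggy URL
    auth=HTTPBasicAuth("username", "password"),  # credentials from the SSH command
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```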
Custom Domains and Performance Optimization
With Pinggy Pro (3 USD/month), you can set up a custom domain for your tunnels. This enhances branding and improves accessibility.
For a step-by-step guide on setting up a custom domain, refer to the Pinggy Custom Domain Documentation.
Real-World Use Cases for Remote AI Access
Collaborative Development
Distributed teams can:
- Share a single OpenLLM instance for code review and documentation generation.
- Train custom models collaboratively using Open WebUI’s RAG features.
Customer-Facing Applications
Expose OpenLLM’s API to power:
- AI chatbots for 24/7 customer support.
- Content generation tools for blogs, social media, or product descriptions.
Academic and Research Workflows
Researchers can securely share access to proprietary models with peers without exposing internal infrastructure.
Troubleshooting
Model Fails to Load
- Verify hardware requirements (RAM/GPU)
- Check quantization compatibility:
openllm run llama3.2:1b-instruct-ggml-fp16-linux --quantize int4
Connection Timeouts
- For unstable networks, keep the tunnel alive with an auto-reconnect loop (a client-side health check is sketched after the command):
Note: This command requires Pinggy Pro.
while true; do ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:3000 a.pinggy.io; sleep 10; done
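Independently of the reconnect loop, a small client-side health check can tell you when the tunnel has dropped. This is a hedged sketch (not a Pinggy feature) that simply polls the /v1/models route shown earlier:

```python
# Simple client-side health check for the shared endpoint.
# Polls /v1/models and reports when the tunnel stops responding.
import time
import requests

URL = "https://xyz123.pinggy.link/v1/models"  # replace with your Pinggy URL

while True:
    try:
        requests.get(URL, timeout=10).raise_for_status()
        print("tunnel OK")
    except requests.RequestException as exc:
        print(f"tunnel unreachable: {exc}")
    time.sleep(60)  # check once a minute
```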
Conclusion
Combining OpenLLM’s flexible model serving with Pinggy’s secure tunneling provides a quick and easy way to deploy AI models accessible from anywhere. Whether you’re prototyping chatbots or testing NLP pipelines, this stack simplifies remote access without the complexity of traditional deployments, giving you:
- Complete data ownership
- Cost-effective scaling
- Developer-friendly tooling
Ready to deploy? Start with:
pip install openllm && openllm serve llama3.2:1b-instruct-ggml-fp16-linux
For advanced configurations, explore OpenLLM Documentation and Pinggy Features.