How to Easily Share OpenLLM API Online


Updated on Feb 10, 2025
· 4 mins read
OpenLLM Pinggy Self-Hosted AI LLM Deployment AI Tunneling

In the era of generative AI, self-hosting large language models (LLMs) gives developers full control over data privacy and model customization. OpenLLM emerges as a powerful toolkit for deploying models like Llama 3 or Mistral locally, while Pinggy enables secure internet exposure without complex infrastructure. This guide walks you through self-hosting an LLM endpoint with a public URL, making it accessible and shareable in minutes.


Summary

  1. Install OpenLLM & Deploy a Model

    pip install openllm
    openllm serve llama3.2:1b-instruct-ggml-fp16-linux
    

    Replace llama3.2:1b-instruct-ggml-fp16-linux with the model you want to run.

  2. Expose API via Pinggy
    Tunnel port 3000:

    ssh -p 443 -R0:localhost:3000 a.pinggy.io
    
    • Share the generated URL for ChatGPT-like access to your LLMs.

Why Self-Host LLMs with OpenLLM?

The Rise of Local AI Deployment

With growing concerns about data privacy and API costs, tools like OpenLLM have become essential for running LLMs locally. However, limiting access to your local network restricts their utility. By sharing them online, you can:

  • Collaborate remotely with team members or clients.
  • Integrate AI into web/mobile apps via OpenLLM’s API.
  • Demo projects without deploying to the cloud.
  • Reduce latency by keeping inference local while enabling remote access.

Why Choose Pinggy for Tunneling?

Pinggy simplifies port forwarding by creating secure tunnels. Unlike alternatives like ngrok, it offers:

  • Free HTTPS URLs with no installation required.
  • No rate limits on the free tier.
  • No bandwidth limit with the Pinggy Pro plan.
  • Web Debugger to observe and monitor requests coming to your server.

Step-by-Step Guide to Sharing OpenLLM

Step 1: Install OpenLLM & Deploy Models

Prerequisites

  • Python
  • pip package manager
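
Optionally, create a virtual environment first so the OpenLLM installation stays isolated from your system Python. This is standard Python practice rather than an OpenLLM requirement, and the environment name below is just an example:

python -m venv openllm-env          # create an isolated environment (any name works)
source openllm-env/bin/activate     # activate it (on Windows: openllm-env\Scripts\activate)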

Install openllm

pip install openllm

Start a Model Server

Launch llama3.2:1b-instruct-ggml-fp16-linux, or choose a different model from OpenLLM's supported model list (a few options are shown below).

openllm serve llama3.2:1b-instruct-ggml-fp16-linux

Available models include:

  • mistral
  • falcon
  • qwen
  • dolly-v2
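
Before exposing anything, it is worth confirming the server answers locally. Assuming OpenLLM is listening on its default port 3000 (the same port tunneled in the next step), a quick check looks like this:

curl http://localhost:3000/v1/models   # should return JSON describing the loaded model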

Step 2: Expose OpenLLM API via Pinggy

Create Secure Tunnel

While OpenLLM runs on port 3000:

ssh -p 443 -R0:localhost:3000 a.pinggy.io

Command Breakdown:

  • -p 443: Connects over port 443 so the tunnel works through firewalls that only allow HTTPS traffic.
  • -R0:localhost:3000: Forwards OpenLLM’s local port 3000 to Pinggy.
  • a.pinggy.io: Pinggy’s tunneling endpoint.

You’ll receive a public URL like https://xyz123.pinggy.link.


API Endpoints

  • OpenLLM Swagger Docs (open in a browser):
    https://xyz123.pinggy.link/

  • OpenLLM WebUI (open in a browser):
    https://xyz123.pinggy.link/chat

  • List Models:
    curl https://xyz123.pinggy.link/v1/models
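
OpenLLM serves OpenAI-compatible endpoints, so you can also send a chat completion request through the public URL. A minimal sketch, using the placeholder URL from above; the model name must match whatever you passed to openllm serve (check /v1/models if unsure):

curl https://xyz123.pinggy.link/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.2:1b-instruct-ggml-fp16-linux",
        "messages": [{"role": "user", "content": "Summarize what OpenLLM does in one sentence."}]
      }'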

Advanced Configuration and Security Best Practices

Secure Your Deployment

  • Enable Basic Authentication in Pinggy:
    Secure your tunnel by appending a username and password to your SSH command:

    ssh -p 443 -R0:localhost:3000 -t a.pinggy.io b:username:password
    

You can also configure multiple username-password pairs for enhanced access control. For more details, refer to the official documentation.
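
Once basic authentication is enabled, clients have to supply those credentials with each request. For example, with curl (placeholder URL and credentials):

curl -u username:password https://xyz123.pinggy.link/v1/models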

Custom Domains and Performance Optimization

With Pinggy Pro (3 USD/month), you can set up a custom domain for your tunnels. This enhances branding and improves accessibility.

For a step-by-step guide on setting up a custom domain, refer to the Pinggy Custom Domain Documentation.

Real-World Use Cases for Remote AI Access

Collaborative Development

Distributed teams can:

  • Share a single OpenLLM instance for code review and documentation generation.
  • Train custom models collaboratively using Open WebUI’s RAG features.

Customer-Facing Applications

Expose OpenLLM’s API to power:

  • AI chatbots for 24/7 customer support.
  • Content generation tools for blogs, social media, or product descriptions.

Academic and Research Workflows

Researchers can securely share access to proprietary models with peers without exposing internal infrastructure.

Troubleshooting

Model Fails to Load

  • Verify hardware requirements (RAM/GPU); a quick check is shown below.
  • Check quantization compatibility:
    openllm run llama3.2:1b-instruct-ggml-fp16-linux --quantize int4
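
On Linux, a quick way to verify the hardware side is with standard system tools (not OpenLLM-specific); nvidia-smi applies only if you have an NVIDIA GPU:

free -h       # available system RAM
nvidia-smi    # GPU model, VRAM, and current utilization (NVIDIA only)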
    

Connection Timeouts

  • Use an auto-reconnecting loop to keep the tunnel alive on unstable networks (ServerAliveInterval is a standard SSH keep-alive option that detects dropped connections sooner):
    while true; do
      ssh -p 443 -o StrictHostKeyChecking=no -o ServerAliveInterval=30 -R0:localhost:3000 a.pinggy.io
      sleep 10
    done

    Note: Running a persistent tunnel like this requires Pinggy Pro.

Conclusion

Combining OpenLLM’s flexible model serving with Pinggy’s secure tunneling provides a quick and easy way to deploy AI models accessible from anywhere. Whether you’re prototyping chatbots or testing NLP pipelines, this stack simplifies remote access without the complexity of traditional deployments.

Key benefits of this setup:

  • Complete data ownership
  • Cost-effective scaling
  • Developer-friendly tooling

Ready to deploy? Start with:

pip install openllm && openllm serve llama3.2:1b-instruct-ggml-fp16-linux

For advanced configurations, explore OpenLLM Documentation and Pinggy Features.