Access the Inference Service APIs¶

To access the APIs, there are two different methods.

First method: Using Open WebUI as a proxy to the inference back-ends.
- This allows you to make API calls from anywhere on the internet, but it requires you to use an authenticated API token.
- Open WebUI exposes a more limited API surface compared to vLLM or SGLang.
- To generate an API token from within the chat UI:
  - First log in with your ACCESS account.
  - Then click your user ID (lower-left corner), then Settings, then Account, then API keys, them create a new secret key.
  - Copy out the resulting key.
  - Treat this key like a password, do not share it.
Second method: Direct connections to the vLLM and SGLang inference servers.
- This allows you to make API calls with no token at all, but to prevent abuse, access is limited to Jetstream2 or IU Research Cloud networks and instances.
- If you try to connect from anywhere else, the server will return an HTTP 401 (unauthorized) response.
- It is possible to tunnel connections from a different computer through a Jetstream2 instance; instructions follow.
- vLLM and SGLang expose a more featureful API surface than Open WebUI exposes.

Generally, any of the API connection options (Open WebUI proxy or direct to vLLM/SGLang) expose OpenAI-compatible APIs, but there may be nuances that apply to your chosen connection option in Open WebUI, vLLM, or SGLang documentation.

If you are connecting directly to vLLM or SGLang without an API token, but your application insists that you provide one anyway, any non-empty string should work.

API Endpoints¶

If connecting via OpenWebUI proxy (first method above), use https://llm.jetstream-cloud.org/api/.
- You must pass an Authorization header containing your API token (e.g. -H "Authorization: bearer your-token-here" if using curl).
To access DeepSeek R1 directly (second method above),
- Use https://llm.jetstream-cloud.org/sglang/v1/.
- Specify model ID DeepSeek-R1.
To access Llama 4 Scout directly (second method above),
- Use https://llm.jetstream-cloud.org/llama-4-scout/v1/.
- Specify model ID llama-4-scout.

You can exchange these in the examples below to access different models.

Accessing APIs From a Jetstream2 instance¶

curl or otherwise connect to https://llm.jetstream-cloud.org/llama-4-scout/v1/. An example query directly to vLLM:

curl https://llm.jetstream-cloud.org/llama-4-scout/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "llama-4-scout",
        "prompt": "What is the difference between SSH and SSL",
        "max_tokens": 64,
        "temperature": 0.7
      }'

Accessing APIs from your own computer (via Open WebUI proxy)¶

curl or otherwise connect to https://llm.jetstream-cloud.org/api/, passing your API token as an argument. An example query:

curl https://llm.jetstream-cloud.org/api/chat/completions \
  -H "Authorization: bearer your-token-here" \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "llama-4-scout",
        "messages": [
          {
            "role": "user",
            "content": "What is the difference between SSH and SSL"
          }
        ],
        "max_tokens": 64
      }'

Accessing APIs from your own computer (tunnel to access vLLM / SGLang directly)¶

You can make connections to vLLM and SGLang from a computer that is not a Jetstream2 or IU Research Cloud instance, but you must tunnel the connection through an existing Jetstream2 or IU Research Cloud instance that you have access to.

There are several ways to do this; here are two examples. The sshuttle method is simpler but requires installing software (sshuttle) on the client computer. The port forwarding method requires root access on the client computer, but requires no additional client-side software.

Tunneling via sshuttle¶

First, install sshuttle if you haven’t already. (sudo apt install sshuttle on Ubuntu, brew install sshuttle or sudo port install sshuttle on Mac OS).

Then, run this command:

sshuttle -r exouser@your-instance-floating-ip-here 149.165.156.93/32

This directs sshuttle to connect to your instance, and forwards all connections to 149.165.156.93 (the inference server) through the instancel.

Now you can connect to the API at (e.g.) https://llm.jetstream-cloud.org/llama-4-scout/v1. Note that you must leave the sshuttle connection open while you’re using the inference service.

Tunneling via SSH Port Forwarding¶

First, add this to your local computer’s /etc/hosts file:

127.0.0.1 llm.jetstream-cloud.org

This directs your computer to resolve network connections to llm.jetstream-cloud.org to itself (the loopback address). Note that you usually need to become the root user (i.e. sudo) in order to modify your computer’s /etc/hosts file.

Next, create an SSH connection with TCP port forwarding:

ssh -L 1234:149.165.156.93:443 exouser@your-instance-floating-ip-here

In this example, we’re forwarding local TCP port 1234 (on your computer) through the SSH server (i.e. your instance) to the destination 149.165.156.93:443 (i.e. the inference server). You do not need to use the shell inside this SSH session, but you must leave the connection open while you’re using the inference service. (If the connection closes or breaks, e.g. because you close your laptop and go somewhere else, you must re-start it in order to continue using the service.)

Now you can connect to the API at (e.g.) https://llm.jetstream-cloud.org:1234/llama-4-scout/v1.