Inference Service API Usage Examples

Info

First read Access the APIs for guidance on how to connect to the APIs.

Python example (from Jetstream2 instance or tunnelled connection)

Install the OpenAI Python client with pip install openai, then create a Python script with the following contents and run it.

from openai import OpenAI

# The API key is not checked by the service, but the client requires a value.
client = OpenAI(base_url="https://llm.jetstream-cloud.org/vllm/v1", api_key="empty")

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is the difference between SSH and SSL",
        }
    ],
    model="Llama-3.3-70B-Instruct-FP8-Dynamic",
)

print(chat_completion.choices[0].message.content)
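
The same endpoint also supports streamed responses through the standard OpenAI client interface. Below is a minimal sketch, assuming the same base URL and model name as above; it prints tokens as they are generated instead of waiting for the full reply.

from openai import OpenAI

client = OpenAI(base_url="https://llm.jetstream-cloud.org/vllm/v1", api_key="empty")

# Ask for a streamed response and print each piece of content as it arrives.
stream = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Explain SSH key pairs in two sentences.",
        }
    ],
    model="Llama-3.3-70B-Instruct-FP8-Dynamic",
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()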

Command line example (from Jetstream2 instance or tunnelled connection)

You can also use the llm command-line tool to access the LLM from the command line. This is particularly convenient because you can combine it with other shell commands.

First, install llm in your favorite Python virtual environment:

pip install llm

Then, find where the configuration files are located:

dirname "$(llm logs path)"

Add a file named extra-openai-models.yaml to the directory that was printed by the previous command, with the following content:

- model_id: llama3.370B
  model_name: "Llama-3.3-70B-Instruct-FP8-Dynamic"
  api_base: "https://llm.jetstream-cloud.org/vllm/v1/"

Then set it as the default:

llm models default llama3.370B

Finally, you can use it, for example by piping in text to summarize (-s sets the system prompt):

curl https://docs.jetstream-cloud.org/general/inference-service/ | html2text | llm -s "make a 1 paragraph summary"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 69412  100 69412    0     0   105k      0 --:--:-- --:--:-- --:--:--  105k
Here is a 1-paragraph summary of the Jetstream2 Large Language Model Inference Service documentation:

**Summary**: Jetstream2 offers a free, unlimited-use Large Language Model (LLM) Inference Service, powered by Llama 3.3, for its community. The service provides an OpenAI-compatible API and a browser-based chat interface (Open WebUI) for tasks like programming assistance, literature reviews, brainstorming, and writing aid. Access is restricted to Jetstream2 or IU Research Cloud networks and instances, but can be tunneled through from external computers. The service runs on an NVIDIA Grace Hopper server with an H100 GPU, supporting up to 4 simultaneous requests, and is subject to Jetstream2's acceptable use policies, primarily for research, education, or learning purposes.

or you can start a chat session on the command line (-c continues the conversation):

llm chat -c
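
For reference, the -s system prompt used above corresponds to a message with the "system" role in the OpenAI-compatible API. A minimal sketch of the equivalent call in Python, assuming the same endpoint and model as in the earlier example:

from openai import OpenAI

client = OpenAI(base_url="https://llm.jetstream-cloud.org/vllm/v1", api_key="empty")

# A "system" message plays the same role as llm's -s option.
chat_completion = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "make a 1 paragraph summary"},
        {"role": "user", "content": "<text to summarize>"},
    ],
    model="Llama-3.3-70B-Instruct-FP8-Dynamic",
)

print(chat_completion.choices[0].message.content)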

Using API with Your IDE

Using with Cline extension for VSCode or VSCodium (from Jetstream2 instance or tunnelled connection)

Cline is a semi-autonomous coding agent with two modes (“plan” and “act”). Cline can also use your terminal to test and debug its own code changes.

  • Install the “Cline” extension, if you haven’t already. Then, in the extension pane:
  • If you’re starting the extension for the first time, choose “Use your own API key”. Otherwise, go to the settings part of the Extension pane (gear icon).
  • Choose “OpenAI Compatible” API Provider
  • For Base URL, enter
    • https://llm.jetstream-cloud.org/sglang/v1 to use DeepSeek R1
      • (slower but higher-quality model, verbose thinking output in extension pane)
    • https://llm.jetstream-cloud.org/v1 to use Llama 3.3
      • (faster but lower-quality model, more concise output in extension pane)
  • For API key, enter anything (like a space character)
  • For Model ID, enter
    • DeepSeek-R1 if using DeepSeek R1
    • Llama-3.3-70B-Instruct if using Llama 3.3
  • Click “let’s go!” or “Done”

You should now be able to give Cline a task to do.
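
If you are unsure which Model ID each endpoint currently serves, you can query the models listing of the OpenAI-compatible API. The following is a minimal sketch using the openai Python client; the base URLs are the DeepSeek R1 and Llama endpoints used elsewhere on this page, and the API key value is arbitrary.

from openai import OpenAI

# Print the model IDs currently served on each OpenAI-compatible endpoint.
for base_url in (
    "https://llm.jetstream-cloud.org/sglang/v1",  # DeepSeek R1
    "https://llm.jetstream-cloud.org/vllm/v1",    # Llama 3.3
):
    client = OpenAI(base_url=base_url, api_key="empty")
    for model in client.models.list():
        print(base_url, "->", model.id)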

Using with Continue extension for VSCode or VSCodium (from Jetstream2 instance or tunnelled connection)

Continue is an in-editor AI assistant.

Install the Continue extension. In the extension’s config.json, set the models like so:

  "models": [
    {
      "provider": "openai",
      "title": "Jetstream2 Inference Service",
      "apiBase": "https://llm.jetstream-cloud.org/vllm/v1/",
      "model": "Llama-3.3-70B-Instruct-FP8-Dynamic",
      "useLegacyCompletionsEndpoint": true
    }
  ],

The chat pane should now work.
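
The useLegacyCompletionsEndpoint option tells Continue to call the older completions endpoint rather than the chat endpoint. If you want to check outside the editor that this endpoint responds, here is a minimal sketch with the openai Python client, assuming the same base URL and model as in the config above:

from openai import OpenAI

client = OpenAI(base_url="https://llm.jetstream-cloud.org/vllm/v1/", api_key="empty")

# Call the legacy (non-chat) completions endpoint with a code-style prompt.
completion = client.completions.create(
    model="Llama-3.3-70B-Instruct-FP8-Dynamic",
    prompt="def fibonacci(n):",
    max_tokens=64,
)

print(completion.choices[0].text)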

Using with JupyterLab via JupyterAI (from Jetstream2 instance or tunnelled connection)

Install the jupyter-ai package, version 2.29.1 or higher, and langchain-openai. In JupyterLab, open the JupyterAI settings, and configure:

  • Completion model = OpenRouter :: *
  • API Base url = https://llm.jetstream-cloud.org/sglang/v1/ for DeepSeek or https://llm.jetstream-cloud.org/vllm/v1/ for Llama.
  • Local model ID = currently DeepSeek R1 or Llama-3.3-70B-Instruct-FP8-Dynamic. You can find the currently available models by appending models to the API Base url and checking the output in your browser, for example https://llm.jetstream-cloud.org/vllm/v1/models for vLLM.
  • OPENROUTER_API_KEY = “EMPTY”

Now you should be able to use the JupyterLab chat and the code assistant in the notebooks.
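
Since JupyterAI reaches the service through langchain-openai, you can also verify connectivity from a notebook cell before (or after) configuring the settings. A minimal sketch, assuming the Llama endpoint and model ID listed above:

from langchain_openai import ChatOpenAI

# Point langchain-openai at the same endpoint JupyterAI will use.
chat = ChatOpenAI(
    base_url="https://llm.jetstream-cloud.org/vllm/v1/",
    api_key="EMPTY",
    model="Llama-3.3-70B-Instruct-FP8-Dynamic",
)

print(chat.invoke("Say hello in one short sentence.").content)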