A Concise Ollama Tutorial

Ollama is an open-source tool for managing large language models. It makes it easy to manage models locally and speeds up model deployment.

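The sections below cover installing Ollama, deploying several popular open-source models such as Llama3.2, Qwen2.5, and Gemma2, and building a web UI with Gradio for interaction.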

Installing Ollama

curl -fsSL https://ollama.com/install.sh | sh

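After a successful install, the ollama service starts automatically. To manage the service manually, use the following commands:

systemctl start   ollama.service # start the service
systemctl stop    ollama.service # stop the service
systemctl restart ollama.service # restart the service
systemctl enable  ollama.service # start on boot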

Ollama usage

ollama help

Usage:
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Downloading and running a model

#ollama run llama3.2
#ollama run gemma2
#ollama run qwen2.5
ollama run deepseek-r1

>>> 你是谁
您好！我是由中国的深度求索（DeepSeek）公司开发的智能助手DeepSeek-R1。如您有任何任何问题，我会尽我所能为您提供帮助。

>>> /bye

ollama list

NAME                  ID              SIZE      MODIFIED
qwen2.5:latest        845dbda0ea48    4.7 GB    1 days ago
llama3.2:latest       a80c4f17acd5    2.0 GB    1 days ago
deepseek-r1:latest    0a8c26691023    4.7 GB    1 days ago
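The same listing can also be read programmatically. The sketch below assumes the server is reachable at http://127.0.0.1:11434 and uses the REST endpoint /api/tags; the helper names fetch_tags and summarize_models are made up for this illustration, and the sizes are converted to decimal gigabytes as an approximation of what the CLI prints.

```python
import json
import urllib.request

def fetch_tags(host='http://127.0.0.1:11434', timeout=5):
    """Return the parsed JSON from GET /api/tags (needs a running server)."""
    with urllib.request.urlopen(f'{host}/api/tags', timeout=timeout) as resp:
        return json.load(resp)

def summarize_models(tags):
    """Turn the /api/tags payload into (name, size in GB) pairs."""
    return [
        (m['name'], round(m['size'] / 1e9, 1))
        for m in tags.get('models', [])
    ]
```

With a server running, summarize_models(fetch_tags()) gives a list of name and size pairs that is easy to print or filter.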

Model storage location

/usr/share/ollama/.ollama/

For other models, see: https://ollama.com/search

deepseek-r1
qwen2.5
qwen2.5-coder
llama3.2
llama3.2-vision
gemma2

Python interface

Once started, the ollama service listens on local port 11434 and serves API calls.

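To allow access from other devices, add the environment variable below to the [Service] section of the unit file, then run systemctl daemon-reload and restart the service.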

/etc/systemd/system/ollama.service

Environment="OLLAMA_HOST=0.0.0.0:11434"
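After changing the listen address, it helps to confirm that the server actually answers. The following is a minimal sketch using only the standard library; the function name is_ollama_up is hypothetical, and it only checks that an HTTP GET on the server root returns status 200.

```python
import urllib.request

def is_ollama_up(base_url='http://127.0.0.1:11434', timeout=3):
    """Return True if an HTTP GET on the server root succeeds with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, refused connections and timeouts are all OSError subclasses
        return False
```

From another device, call it with the server's LAN address, for example is_ollama_up('http://192.168.1.20:11434'). A False result usually points to the service not running, OLLAMA_HOST not being applied, or a firewall blocking the port.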

Install the ollama library

pip install ollama

Python test code

import ollama

def chat_ollama(question, model='qwen2.5'):
    text = ''
    stream = True
    ollama_host = 'http://127.0.0.1:11434'
    client = ollama.Client(host=ollama_host)
    
    response = client.chat(model=model, stream=stream, messages=[
        {'role': 'user', 'content': question},
    ])
    
    if stream:
        for chunk in response:
            content = chunk['message']['content']
            text += content
            print(content, end='', flush=True)
    else:
        content = response['message']['content']
        text += content
        print(content)
    
    print('\n')
    return text

if __name__ == '__main__':
    chat_ollama('你是谁')

Output

我是Qwen，一个由阿里云开发的超大规模语言模型。我被设计用来回答问题、提供信息、参与对话以及帮助用户解决各种问题。如果你有任何疑问或需要帮助，都可以尝试和我说话哦！
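The test code above sends a single question, so the model has no memory of earlier turns. One possible way to hold a multi-turn conversation is to keep a message list and resend it on every call. The sketch below assumes a client object with the same chat(model=..., messages=...) method used above; chat_turn is a hypothetical helper, not part of the ollama library.

```python
def chat_turn(client, history, question, model='qwen2.5'):
    """Send one question together with the accumulated history.

    `history` is a plain list of {'role', 'content'} dicts that is
    updated in place with the user question and the model's answer.
    """
    history.append({'role': 'user', 'content': question})
    response = client.chat(model=model, messages=history)
    answer = response['message']['content']
    history.append({'role': 'assistant', 'content': answer})
    return answer

# Usage (requires the ollama package and a running server):
#   import ollama
#   client = ollama.Client(host='http://127.0.0.1:11434')
#   history = []
#   print(chat_turn(client, history, '你是谁'))
#   print(chat_turn(client, history, '请再说一遍'))
```

Passing the client in as a parameter keeps the helper independent of how the connection is created, which also makes it easy to try with a stand-in object.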

API access

curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.2",
  "stream": false,
  "prompt":"天空为什么是蓝的"
}'

curl http://192.168.1.20:11434/api/generate -d '{
  "model": "llama3.2",
  "stream": false,
  "prompt":"十进制1111转十六进制过程"
}'

curl http://192.168.1.20:11434/api/chat -d '{
  "model": "llama3.2",
  "stream": false,
  "messages": [
    { "role": "user", "content": "鲁迅和周树人什么关系" }
  ]
}'
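The first curl call can also be made from Python without the ollama package. This is a rough standard-library equivalent for /api/generate with streaming turned off; build_generate_request and generate are names invented for this sketch, and the host and model defaults simply mirror the curl example.

```python
import json
import urllib.request

def build_generate_request(prompt, model='llama3.2', host='http://127.0.0.1:11434'):
    """Build the POST request for /api/generate without sending it."""
    payload = {'model': model, 'prompt': prompt, 'stream': False}
    return urllib.request.Request(
        f'{host}/api/generate',
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        # With "stream": false the reply is one JSON object; the text is in "response"
        return json.load(resp)['response']
```

For example, generate('天空为什么是蓝的') should return the answer as a single string once the model has finished.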

Creating a web UI with Gradio

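Gradio is an open-source Python library for quickly building an interactive web UI for large language models, with no need to know web technologies such as HTML, CSS, or JavaScript.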

Installation and test

Install the gradio library

pip install gradio

Building on the previous example, add a UI created with gradio

import ollama
import gradio as gr

def chat_ollama(question, model='qwen2.5'):
    text = ''
    stream = True
    ollama_host = 'http://127.0.0.1:11434'
    client = ollama.Client(host=ollama_host)
    
    response = client.chat(model=model, stream=stream, messages=[
        {'role': 'user', 'content': question},
    ])
    
    if stream:
        for chunk in response:
            content = chunk['message']['content']
            text += content
            print(content, end='', flush=True)
            yield text
    else:
        content = response['message']['content']
        text += content
        print(content)
        yield text
    
    print('\n')
    return text

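def chat_response(message, history):
    # Gradio passes the new message and the chat history (unused here).
    # Each yielded string replaces the reply shown so far, giving a streaming effect.
    yield from chat_ollama(message)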

def webui():
    demo = gr.ChatInterface(fn=chat_response, type='messages', examples=['你好', '你是谁'])
    demo.launch(server_name='0.0.0.0')

if __name__ == '__main__':
    webui()

Access URL

http://127.0.0.1:7860

The result is shown in the figure below
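In the UI code above, chat_response ignores the history argument, so each question is answered without context. A possible extension is to convert the Gradio history into the message list that the chat API expects. The sketch below assumes type='messages', where each history entry is a dict with role and content keys; build_messages is a hypothetical helper.

```python
def build_messages(message, history):
    """Combine Gradio chat history and the new message into a chat message list.

    Only the 'role' and 'content' keys are kept, and entries whose
    content is not plain text (for example file uploads) are skipped.
    """
    messages = [
        {'role': turn['role'], 'content': turn['content']}
        for turn in history
        if isinstance(turn.get('content'), str)
    ]
    messages.append({'role': 'user', 'content': message})
    return messages
```

To use it, chat_ollama would need to accept a ready-made message list and pass it as messages= to client.chat, instead of building a single-entry list from the question.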