
Deploying Ollama + the Qwen2.5-7B Large Language Model with Docker

1. Installation

docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
To use a GPU, add --gpus all:
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Open http://ip:11434/ in a browser; it should respond with "Ollama is running".

Enter the container: docker exec -it ollama bash
Then run: ollama run qwen2.5:7b (qwen2.5:7b is the model name; available models are listed at https://ollama.com/library). The first run downloads the model file; once the download finishes, the model starts and you can chat with it.

2. Ollama Commands

Pull a model:
ollama pull <model_name>
e.g.: ollama pull qwen2.5:7b

Run a model:
ollama run <model_name>

List installed models:
ollama list

Remove a model:
ollama rm <model_name>

Show running models:
ollama ps

Stop a running model:
ollama stop <model_name>
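
These commands can also be run from the host without entering the container, e.g. docker exec ollama ollama list. The same information is exposed over HTTP: GET /api/tags returns the locally installed models, the equivalent of ollama list. A minimal sketch using the JDK's built-in HTTP client (Java 11+); "ip" is a placeholder for your host and the class name is illustrative:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ListModelsExample {
    public static void main(String[] args) throws Exception {
        // GET /api/tags returns the locally installed models as JSON,
        // the HTTP counterpart of `ollama list`.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://ip:11434/api/tags")).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}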

3. HTTP API

Note that Ollama's native API expects sampling parameters inside an "options" object; num_predict caps the number of generated tokens.


POST http://ip:11434/api/generate HTTP/1.1
Content-Type: application/json

{
    "model": "qwen2.5:7b",
    "stream": false,
    "prompt": "Briefly introduce yourself",
    "options": {
        "num_predict": 1000,
        "temperature": 0.01,
        "top_p": 0.01
    }
}
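
The same request can be sent from plain Java with the JDK's built-in HTTP client (Java 11+; the text block needs Java 15+). A minimal sketch; "ip" is a placeholder for your host and the class name is illustrative:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaGenerateExample {
    public static void main(String[] args) throws Exception {
        // Same request body as above; sampling parameters sit in "options".
        String json = """
                {
                  "model": "qwen2.5:7b",
                  "stream": false,
                  "prompt": "Briefly introduce yourself",
                  "options": { "num_predict": 1000, "temperature": 0.01, "top_p": 0.01 }
                }""";

        HttpRequest request = HttpRequest.newBuilder(URI.create("http://ip:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The generated text is in the "response" field of the returned JSON.
        System.out.println(response.body());
    }
}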

###
POST http://ip:11434/api/chat HTTP/1.1
Content-Type: application/json

{
  "model": "qwen2.5:7b",
  "stream": false,
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" },
    { "role": "assistant", "content": "Air molecules scatter short (blue) wavelengths of sunlight more strongly, so the sky looks blue." },
    { "role": "user", "content": "What was my previous question?" }
  ]
}
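
With "stream": true, the chat endpoint returns one JSON object per line instead of a single body. A minimal sketch of consuming that line stream with the same JDK client; it simply prints each raw chunk rather than parsing the JSON:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaChatStreamExample {
    public static void main(String[] args) throws Exception {
        String json = """
                {
                  "model": "qwen2.5:7b",
                  "stream": true,
                  "messages": [ { "role": "user", "content": "Why is the sky blue?" } ]
                }""";

        HttpRequest request = HttpRequest.newBuilder(URI.create("http://ip:11434/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

        // Each line is one JSON chunk; "message.content" carries the next piece
        // of text and the final chunk has "done": true.
        HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofLines())
                .body()
                .forEach(System.out::println);
    }
}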

4. Calling the Ollama API from Spring Boot

  • Dependency:
<dependency>
    <groupId>io.springboot.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0</version>
</dependency>
  • application.yml configuration:
spring:
  ai:
    ollama:
      base-url: http://ip:11434
      chat:
        options:
          model: "qwen2.5:7b"
  • Java call:

    // Inject an Ollama chat client
    @Autowired
    private OllamaChatClient ollamaChatClient;

    private String modelName = "qwen2.5:7b";

    /**
     * Chat
     */
    @PostMapping("/chat")
    public ResData<String> chat(@RequestParam(value = "msg") String msg) {
        List<Message> messageList = new ArrayList<>();
        messageList.add(new UserMessage(msg));
        //messageList.add(new AssistantMessage(msg));
        //messageList.add(new SystemMessage(msg));
        Prompt prompt = new Prompt(
                messageList, // passing the raw msg String also works
                OllamaOptions.create()
                        .withModel(modelName)
                        .withTemperature(0.1F));

        ChatResponse response = ollamaChatClient.call(prompt);
        String content = response.getResult().getOutput().getContent();

        // Streaming variant (see the full SseEmitter sketch below):
        //Flux<ChatResponse> flux = ollamaChatClient.stream(prompt);
        //flux.subscribe(item -> push(emitter, item.getResult().getOutput().getContent()));
        //flux.blockLast();

        return ResData.okData(content);
    }
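
The commented-out streaming code above needs something to push chunks to; one option in Spring MVC is SseEmitter. A minimal sketch, assuming the same injected ollamaChatClient and modelName; the /chatStream path and the no-timeout setting are illustrative choices, not from the original code:

    @PostMapping("/chatStream")
    public SseEmitter chatStream(@RequestParam(value = "msg") String msg) {
        SseEmitter emitter = new SseEmitter(0L); // 0 = never time out
        Prompt prompt = new Prompt(
                List.of(new UserMessage(msg)),
                OllamaOptions.create().withModel(modelName).withTemperature(0.1F));

        // Forward each chunk to the client as it arrives, then close the stream.
        ollamaChatClient.stream(prompt).subscribe(
                item -> {
                    try {
                        emitter.send(item.getResult().getOutput().getContent());
                    } catch (IOException e) {
                        emitter.completeWithError(e);
                    }
                },
                emitter::completeWithError,
                emitter::complete);
        return emitter;
    }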
