Deploying Ollama + the Qwen2.5-7B large language model with Docker
1. Installation
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
To use a GPU, add the --gpus all flag:
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Open http://ip:11434/ in a browser to check that the service is up; Ollama answers with "Ollama is running".
Enter the container: docker exec -it ollama bash
Then run: ollama run qwen2.5:7b (qwen2.5:7b is the model name; available models are listed at https://ollama.com/library). The first run downloads the model files; once the download finishes the model starts and you can chat with it.
2. Ollama commands
Pull a model:
ollama pull <model_name>
e.g. ollama pull qwen2.5:7b
Run a model:
ollama run <model_name>
List available (downloaded) models:
ollama list
Remove a model:
ollama rm <model_name>
Show the models currently running:
ollama ps
Stop a running model:
ollama stop <model_name>
3. HTTP API
POST http://ip:11434/api/generate HTTP/1.1
Content-Type: application/json
{
    "model": "qwen2.5:7b",
    "stream": false,
    "prompt": "Briefly introduce yourself",
    "options": {
        "num_predict": 1000,
        "temperature": 0.01,
        "top_p": 0.01
    }
}
Note: in the Ollama REST API, sampling parameters such as temperature, top_p and the token limit (num_predict) go inside the "options" object; top-level "max_tokens"/"temperature" fields are ignored.
###
POST http://ip:11434/api/chat HTTP/1.1
Content-Type: application/json
{
    "model": "qwen2.5:7b",
    "stream": false,
    "messages": [
        { "role": "user", "content": "Why is the sky blue?" },
        { "role": "assistant", "content": "The LA Dodgers won in 2020." },
        { "role": "user", "content": "What was my previous question?" }
    ]
}
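For reference, when "stream" is false, /api/chat returns the whole answer in one JSON object. A trimmed example of its shape (field names per the Ollama REST API; the content itself is illustrative):

{
    "model": "qwen2.5:7b",
    "created_at": "2024-01-01T00:00:00Z",
    "message": { "role": "assistant", "content": "Your previous question was why the sky is blue." },
    "done": true
}

/api/generate is shaped the same way, except the generated text comes back in a top-level "response" string instead of a "message" object.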
4. Calling the Ollama API from Spring Boot
- Dependency
<dependency>
<groupId>io.springboot.ai</groupId>
<artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
<version>1.0.0</version>
</dependency>
- application.yml configuration:
spring:
  ai:
    ollama:
      base-url: http://ip:11434
      chat:
        options:
          model: "qwen2.5:7b"
- Java call
// Inject an Ollama chat client
@Autowired
private OllamaChatClient ollamaChatClient;

private String modelName = "qwen2.5:7b";

/**
 * Chat (blocking call)
 */
@PostMapping("/chat")
public ResData<String> chat(@RequestParam(value = "msg") String msg) {
    List<Message> messageList = new ArrayList<>();
    messageList.add(new UserMessage(msg));
    //messageList.add(new AssistantMessage(msg));
    //messageList.add(new SystemMessage(msg));

    Prompt prompt = new Prompt(
            messageList, // passing the msg String directly also works
            OllamaOptions.create()
                    .withModel(modelName)
                    .withTemperature(0.1F));

    ChatResponse response = ollamaChatClient.call(prompt);
    String content = response.getResult().getOutput().getContent();

    // For a streaming response, use ollamaChatClient.stream(prompt),
    // which returns a Flux<ChatResponse>; see the sketch below.
    return ResData.okData(content);
}
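The stream(...) call mentioned in the comment above returns a Flux<ChatResponse> that emits the answer in chunks. Below is a minimal sketch of exposing that as a server-sent-events endpoint with Spring's SseEmitter; the /chat/stream path and the zero (no-timeout) constructor argument are my own choices, not something from the original setup.

// Additional imports assumed:
// import java.io.IOException;
// import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;
// import reactor.core.publisher.Flux;

/**
 * Chat with a streaming (SSE) response
 */
@PostMapping("/chat/stream")
public SseEmitter chatStream(@RequestParam(value = "msg") String msg) {
    SseEmitter emitter = new SseEmitter(0L); // 0 = no timeout

    Prompt prompt = new Prompt(
            List.of(new UserMessage(msg)),
            OllamaOptions.create()
                    .withModel(modelName)
                    .withTemperature(0.1F));

    Flux<ChatResponse> flux = ollamaChatClient.stream(prompt);
    flux.subscribe(
            item -> {
                String chunk = item.getResult().getOutput().getContent();
                if (chunk != null) {
                    try {
                        emitter.send(chunk); // push each chunk to the client as it arrives
                    } catch (IOException e) {
                        emitter.completeWithError(e);
                    }
                }
            },
            emitter::completeWithError, // propagate errors to the client
            emitter::complete);         // close the stream when generation finishes

    return emitter;
}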