AI-ML
Ollama v0.18.4
Detailed Description
What's Changed
- ggml: force flash attention off for grok by @rick-github
- mlx: fix KV cache snapshot memory leak by @jessegross
- mlxrunner: schedule periodic snapshots during prefill by @jessegross
- doc: update vscode doc by @hoyyeva
Ollama v0.18.4 update with fixes and improvements.
- Flash attention is now forced off for Grok models (ggml).
- Fixed a memory leak in KV cache snapshots (MLX).
- Periodic snapshots are now scheduled during prefill (mlxrunner).
- Updated the VS Code documentation.
Who should care
Anyone using Ollama, particularly those running Grok models or the MLX runner.
AI-generated · may contain errors
Related Releases
Ollama v0.30.10
## What's Changed
* models: add Cohere2MoE model by @jmorganca in https://github.com/ollama/ollama/pull/16670
* llama: update llama.cpp to b9672 by @pdevine in https://github.com/ollama/ollama/pull/16775

**Full Changelog**: https://github.com/ollama/ollama/compare/v0.30.9...v0.30.10-rc0
Ollama v0.30.9
## What's Changed
* Support for Cohere2Moe architecture
* Fixed LFM2 parser/render for cases where thinking was not emitted
* Fixed issue where `ollama launch claude` and other coding agent or assistant use cases would only output one token
* Ollama will now return an error if a single message i…