Mirror of https://github.com/geoffsee/open-gsio.git (synced 2025-09-08 22:56:46 +00:00)
Add scripts and documentation for local inference configuration with Ollama and mlx-omni-server
- Introduced `configure_local_inference.sh` to automatically set `.dev.vars` based on active local inference services.
- Updated `start_inference_server.sh` to handle both Ollama and mlx-omni-server server types.
- Enhanced `package.json` to include new commands for starting and configuring inference servers.
- Refined README to include updated instructions for running and adding models for local inference.
- Minor cleanup in `MessageBubble.tsx`.
Committed by: Geoff Seemueller
Parent: f2d91e2752
Commit: 9e8b427826
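Taken together, the intent of the changes below is a three-step local setup. The sketch that follows is assembled from the `package.json` scripts and README instructions introduced in this commit, not a verbatim excerpt:

~~~bash
# 1. Start a local inference backend (pick one)
bun run openai:local:ollama   # runs the Ollama container via start_inference_server.sh
bun run openai:local:mlx      # starts mlx-omni-server (Apple Silicon only)

# 2. Detect the running service and write OPENAI_API_KEY / OPENAI_API_ENDPOINT into .dev.vars
bun run openai:local:configure

# 3. (Re)start the dev server so it picks up the new values
bun run server:dev
~~~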
README.md (47 lines changed)
@@ -6,17 +6,18 @@
 <img src="https://github.com/user-attachments/assets/620d2517-e7be-4bb0-b2b7-3aa0cba37ef0" width="250" />
 </p>

-## Project Status: Testing

 ## Table of Contents

 - [Stack](#stack)
 - [Installation](#installation)
 - [Deployment](#deployment)
 - [Local Inference](#local-inference)
 - [Ollama](#ollama)
+- [Adding models for local inference (ollama)](#adding-models-for-local-inference-ollama)
 - [mlx-omni-server (Apple Silicon Only)](#mlx-omni-server-apple-silicon-only)
 - [Adding models for local inference (Apple Silicon)](#adding-models-for-local-inference-apple-silicon)
 - [Testing](#testing)
+- [Troubleshooting](#troubleshooting)
 - [History](#history)
 - [License](#license)
@@ -51,26 +52,33 @@
 > Local inference is achieved by overriding the `OPENAI_API_KEY` and `OPENAI_API_ENDPOINT` environment variables. See below.

 ### Ollama

 ~~~bash
-docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama ## Run Ollama (Can also be installed natively)
-bun run openai:local # Start OpenAI-compatible server
-sed -i '' '/^OPENAI_API_KEY=/d' .dev.vars; echo >> .dev.vars; echo 'OPENAI_API_KEY=required-but-not-used' >> .dev.vars # Reset API key
-sed -i '' '/^OPENAI_API_ENDPOINT=/d' .dev.vars; echo >> .dev.vars; echo 'OPENAI_API_ENDPOINT=http://localhost:11434' >> .dev.vars # Reset endpoint
-bun run server:dev # Start dev server
+bun run openai:local ollama # Start ollama server
+bun run openai:local:enable # Configure connection
+bun run server:dev # Restart server
 ~~~
+
+#### Adding models for local inference (ollama)
+
+~~~bash
+# See https://ollama.com/library for available models
+MODEL_TO_ADD=gemma3
+docker exec -it ollama ollama run ${MODEL_TO_ADD}
+~~~

 ### mlx-omni-server (Apple Silicon Only)

 ~~~bash
-brew tap seemueller-io/tap # Add seemueller-io tap
-brew install seemueller-io/tap/mlx-omni-server # Install mlx-omni-server
-bun run openai:local # Start OpenAI-compatible server
-sed -i '' '/^OPENAI_API_KEY=/d' .dev.vars; echo >> .dev.vars; echo 'OPENAI_API_KEY=required-but-not-used' >> .dev.vars # Reset API key
-sed -i '' '/^OPENAI_API_ENDPOINT=/d' .dev.vars; echo >> .dev.vars; echo 'OPENAI_API_ENDPOINT=http://localhost:10240' >> .dev.vars # Reset endpoint
-bun run server:dev # Start dev server
+# (prereq) install mlx-omni-server
+brew tap seemueller-io/tap
+brew install seemueller-io/tap/mlx-omni-server
+bun run openai:local mlx-omni-server # Start mlx-omni-server
+bun run openai:local:enable # Configure connection
+bun run server:dev # Restart server
 ~~~

 #### Adding models for local inference (Apple Silicon)

 ~~~bash
-# ensure mlx-omni-server is running in the background
+# ensure mlx-omni-server is running
+
+# See https://huggingface.co/mlx-community for available models
 MODEL_TO_ADD=mlx-community/gemma-3-4b-it-8bit

 curl http://localhost:10240/v1/chat/completions \
@@ -81,15 +89,20 @@ curl http://localhost:10240/v1/chat/completions \
 }"
 ~~~

 ## Testing

 Tests are located in `__tests__` directories next to the code they test. Testing is incomplete at this time.

 > `bun run test` will run all tests

+## Troubleshooting
+1. `bun run clean`
+1. `bun i`
+1. `bun server:dev`
+1. `bun client:dev`
+1. Submit an issue

 History
 ---
 A high-level overview for the development history of the parent repository, [geoff-seemueller-io](https://geoff.seemueller.io), is provided in [LEGACY.md](./LEGACY.md).
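For reference, once the configuration step above has run against an Ollama instance on its default port, `.dev.vars` should contain values along these lines (the endpoint becomes `http://localhost:10240` in the mlx-omni-server case). This is an illustrative excerpt based on what `configure_local_inference.sh` writes, not part of the diff:

~~~bash
# .dev.vars (excerpt)
OPENAI_API_KEY=required-but-not-used
OPENAI_API_ENDPOINT=http://localhost:11434
~~~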
package.json

@@ -19,6 +19,9 @@
     "tail:analytics-service": "wrangler tail -c workers/analytics/wrangler-analytics.toml",
     "tail:session-proxy": "wrangler tail -c workers/session-proxy/wrangler-session-proxy.toml --env production",
     "openai:local": "./scripts/start_inference_server.sh",
+    "openai:local:mlx": "./scripts/start_inference_server.sh mlx-omni-server",
+    "openai:local:ollama": "./scripts/start_inference_server.sh ollama",
+    "openai:local:configure": "scripts/configure_local_inference.sh",
     "test": "vitest run",
     "test:watch": "vitest",
     "test:coverage": "vitest run --coverage.enabled=true"
scripts/configure_local_inference.sh (new executable file, 49 lines)

@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
+
+
+# Ensure .dev.vars file exists.
+# This prevents errors if sed tries to edit a non-existent file and ensures '>>' appends.
+# Function to configure .dev.vars with the specified API key and endpoint
+configure_dev_vars() {
+  local endpoint_url=$1
+  local api_key="required-but-not-used"
+
+  echo "Configuring .dev.vars for endpoint: ${endpoint_url}"
+
+  # Configure OPENAI_API_KEY
+  # 1. Remove any existing OPENAI_API_KEY line
+  sed -i '' '/^OPENAI_API_KEY=/d' .dev.vars
+  # 2. Append a blank line (ensures the new variable is on a new line and adds spacing)
+  # 3. Append the new OPENAI_API_KEY line
+  echo "OPENAI_API_KEY=${api_key}" >> .dev.vars
+
+  # Configure OPENAI_API_ENDPOINT
+  # 1. Remove any existing OPENAI_API_ENDPOINT line
+  sed -i '' '/^OPENAI_API_ENDPOINT=/d' .dev.vars
+  # 3. Append the new OPENAI_API_ENDPOINT line
+  echo "OPENAI_API_ENDPOINT=${endpoint_url}" >> .dev.vars
+
+  echo "Local inference is configured for $endpoint_url"
+}
+
+echo "Checking for local inference services..."
+
+# Check for Ollama on port 11434
+# nc -z -w1 localhost 11434:
+#   -z: Zero-I/O mode (port scanning)
+#   -w1: Timeout after 1 second
+#   >/dev/null 2>&1: Suppress output from nc
+if nc -z -w1 localhost 11434 >/dev/null 2>&1; then
+  echo "Ollama service detected on port 11434."
+  configure_dev_vars "http://localhost:11434"
+# Else, check for mlx-omni-server on port 10240
+elif nc -z -w1 localhost 10240 >/dev/null 2>&1; then
+  echo "mlx-omni-server service detected on port 10240."
+  configure_dev_vars "http://localhost:10240"
+else
+  echo "No active local inference service (Ollama or mlx-omni-server) found on default ports (11434, 10240)."
+  echo "If a service is running on a different port, .dev.vars may need manual configuration."
+  echo ".dev.vars was not modified by this script for OpenAI local inference settings."
+fi
+
+echo "Script finished."
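Because detection relies only on `nc` port probes, the same checks can be run by hand when debugging why the script chose (or skipped) a backend. A minimal sketch mirroring the script's own logic:

~~~bash
# probe the default ports the script checks
nc -z -w1 localhost 11434 && echo "Ollama is listening on 11434"
nc -z -w1 localhost 10240 && echo "mlx-omni-server is listening on 10240"
~~~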
scripts/start_inference_server.sh

@@ -1,8 +1,12 @@
 #!/usr/bin/env bash

-SERVER_TYPE="mlx-omni-server"
-printf "Starting Inference Server: %s\n" ${SERVER_TYPE}
-mlx-omni-server --log-level debug
+if [ "$1" = "mlx-omni-server" ]; then
+  printf "Starting Inference Server: %s\n" "$1"
+  mlx-omni-server --log-level debug
+elif [ "$1" = "ollama" ]; then
+  echo "starting ollama"
+  docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+else
+  printf "Error: First argument must be 'mlx-omni-server'\n"
+  exit 1
+fi
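The script is normally reached through the new `openai:local:*` package.json entries, but it can also be invoked directly; for example:

~~~bash
./scripts/start_inference_server.sh mlx-omni-server   # runs mlx-omni-server --log-level debug
./scripts/start_inference_server.sh ollama            # docker-runs the ollama container in the background
./scripts/start_inference_server.sh                   # any other (or missing) argument prints the error and exits 1
~~~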
MessageBubble.tsx

@@ -1,5 +1,4 @@
 import React, { useEffect, useRef, useState } from "react";
-import { motion } from "framer-motion";
 import { Box, Flex, Text } from "@chakra-ui/react";
 import MessageRenderer from "./ChatMessageContent";
 import { observer } from "mobx-react-lite";

@@ -65,14 +64,7 @@ const MessageBubble = observer(({ msg, scrollRef }) => {
   };

   useEffect(() => {
-    if (
-      clientChatStore.items.length > 0 &&
-      clientChatStore.isLoading &&
-      UserOptionsStore.followModeEnabled
-    ) {
-      console.log(
-        `${clientChatStore.items.length}/${clientChatStore.isLoading}/${UserOptionsStore.followModeEnabled}`,
-      );
+    if (clientChatStore.items.length > 0 && clientChatStore.isLoading && UserOptionsStore.followModeEnabled) { // Refine condition
       scrollRef.current?.scrollTo({
         top: scrollRef.current.scrollHeight,
         behavior: "auto",