mirror of https://github.com/geoffsee/predict-otron-9001.git
synced 2025-09-08 22:46:44 +00:00

update docs
.gitignore (vendored): 2 changes
@@ -74,8 +74,6 @@ venv/
 # Backup files
 *.bak
 *.backup
-*~
-/scripts/cli
 !/scripts/cli.ts
 /**/.*.bun-build
 /AGENTS.md
README.md: 33 changes
@@ -42,7 +42,7 @@ The system supports both CPU and GPU acceleration (CUDA/Metal), with intelligent
 
 ### Workspace Structure
 
-The project uses a 7-crate Rust workspace plus TypeScript components:
+The project uses a 9-crate Rust workspace plus TypeScript components:
 
 ```
 crates/
@@ -51,9 +51,10 @@ crates/
 ├── gemma-runner/       # Gemma model inference via Candle (Rust 2021)
 ├── llama-runner/       # Llama model inference via Candle (Rust 2021)
 ├── embeddings-engine/  # FastEmbed embeddings service (Rust 2024)
-├── leptos-app/         # WASM web frontend (Rust 2021)
+├── chat-ui/            # WASM web frontend (Rust 2021)
 ├── helm-chart-tool/    # Kubernetes deployment tooling (Rust 2024)
-└── scripts/
+└── cli/                # CLI client crate (Rust 2024)
+    └── package/
     └── cli.ts          # TypeScript/Bun CLI client
 ```
 
@@ -61,7 +62,7 @@ crates/
 
 - **Main Server** (port 8080): Orchestrates inference and embeddings services
 - **Embeddings Service** (port 8080): Standalone FastEmbed service with OpenAI API compatibility
-- **Web Frontend** (port 8788): cargo leptos SSR app
+- **Web Frontend** (port 8788): chat-ui WASM app
 - **CLI Client**: TypeScript/Bun client for testing and automation
 
 ### Deployment Modes
@@ -144,26 +145,26 @@ cargo build --bin embeddings-engine --release
 
 #### Web Frontend (Port 8788)
 ```bash
-cd crates/leptos-app
+cd crates/chat-ui
 ./run.sh
 ```
-- Serves Leptos WASM frontend on port 8788
+- Serves chat-ui WASM frontend on port 8788
 - Sets required RUSTFLAGS for WebAssembly getrandom support
 - Auto-reloads during development
 
 #### TypeScript CLI Client
 ```bash
 # List available models
-bun run scripts/cli.ts --list-models
+cd crates/cli/package && bun run cli.ts --list-models
 
 # Chat completion
-bun run scripts/cli.ts "What is the capital of France?"
+cd crates/cli/package && bun run cli.ts "What is the capital of France?"
 
 # With specific model
-bun run scripts/cli.ts --model gemma-3-1b-it --prompt "Hello, world!"
+cd crates/cli/package && bun run cli.ts --model gemma-3-1b-it --prompt "Hello, world!"
 
 # Show help
-bun run scripts/cli.ts --help
+cd crates/cli/package && bun run cli.ts --help
 ```
 
 ## API Usage
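The CLI commands above all drive the same OpenAI-compatible endpoint. As a point of reference, a minimal Bun/TypeScript sketch of such a chat-completion call (hypothetical; it assumes the standard `/v1/chat/completions` route and response shape rather than mirroring the actual `cli.ts`):

```typescript
// Minimal sketch of an OpenAI-style chat completion against the local server.
// Assumes the conventional /v1/chat/completions route; the real cli.ts may differ.
const response = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma-3-1b-it",
    messages: [{ role: "user", content: "What is the capital of France?" }],
  }),
});

const data = await response.json();
// OpenAI-shaped responses carry the reply in choices[0].message.content.
console.log(data.choices[0].message.content);
```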
@@ -279,7 +280,7 @@ cargo test --workspace
 
 **End-to-end test script:**
 ```bash
-./smoke_test.sh
+./scripts/smoke_test.sh
 ```
 
 This script:
@@ -368,7 +369,7 @@ All services include Docker metadata in `Cargo.toml`:
 - Port: 8080
 
 **Web Frontend:**
-- Image: `ghcr.io/geoffsee/leptos-app:latest`
+- Image: `ghcr.io/geoffsee/chat-ui:latest`
 - Port: 8788
 
 **Docker Compose:**
@@ -427,7 +428,7 @@ For Kubernetes deployment details, see the [ARCHITECTURE.md](docs/ARCHITECTURE.m
 **Symptom:** WASM compilation failures
 **Solution:**
 1. Install required targets: `rustup target add wasm32-unknown-unknown`
-2. Check RUSTFLAGS in leptos-app/run.sh
+2. Check RUSTFLAGS in chat-ui/run.sh
 
 ### Network/Timeout Issues
 **Symptom:** First-time model downloads timing out
@@ -458,18 +459,18 @@ curl -s http://localhost:8080/v1/models | jq
 
 **CLI client test:**
 ```bash
-bun run scripts/cli.ts "What is 2+2?"
+cd crates/cli/package && bun run cli.ts "What is 2+2?"
 ```
 
 **Web frontend:**
 ```bash
-cd crates/leptos-app && ./run.sh &
+cd crates/chat-ui && ./run.sh &
 # Navigate to http://localhost:8788
 ```
 
 **Integration test:**
 ```bash
-./smoke_test.sh
+./scripts/smoke_test.sh
 ```
 
 **Cleanup:**
@@ -1,2 +1,41 @@
 # chat-ui
-This is served by the predict-otron-9000 server. This needs to be built before the server.
+
+A WASM-based web chat interface for the predict-otron-9000 AI platform.
+
+## Overview
+
+The chat-ui provides a real-time web interface for interacting with language models through the predict-otron-9000 server. Built with Leptos and compiled to WebAssembly, it offers a modern chat experience with streaming response support.
+
+## Features
+
+- Real-time chat interface with the inference server
+- Streaming response support
+- Conversation history
+- Responsive web design
+- WebAssembly-powered for optimal performance
+
+## Building and Running
+
+### Prerequisites
+- Rust toolchain with WASM target: `rustup target add wasm32-unknown-unknown`
+- The predict-otron-9000 server must be running on port 8080
+
+### Development Server
+```bash
+cd crates/chat-ui
+./run.sh
+```
+
+This starts the development server on port 8788 with auto-reload capabilities.
+
+### Usage
+1. Start the predict-otron-9000 server: `./scripts/run_server.sh`
+2. Start the chat-ui: `cd crates/chat-ui && ./run.sh`
+3. Navigate to `http://localhost:8788`
+4. Start chatting with your AI models!
+
+## Technical Details
+- Built with Leptos framework
+- Compiled to WebAssembly for browser execution
+- Communicates with predict-otron-9000 API via HTTP
+- Sets required RUSTFLAGS for WebAssembly getrandom support
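The streaming support this README describes comes down to an OpenAI-style streamed HTTP response. The chat-ui itself is Rust/WASM, but a TypeScript sketch illustrates the wire contract it consumes (assumed SSE-style `data:` chunks with `stream: true`, and naive line handling; the actual UI code may differ):

```typescript
// Hypothetical sketch of a streaming chat request against the local server.
// Assumes OpenAI-style SSE chunks ("data: {...}" lines terminated by [DONE]).
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma-3-1b-it",
    messages: [{ role: "user", content: "Hello!" }],
    stream: true,
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Simplification: assumes each chunk contains whole "data: ..." lines.
  for (const line of decoder.decode(value, { stream: true }).split("\n")) {
    if (!line.startsWith("data: ") || line.includes("[DONE]")) continue;
    const delta = JSON.parse(line.slice(6)).choices?.[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
  }
}
```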
@@ -3,7 +3,7 @@
 A Rust/Typescript Hybrid
 
 ```console
-./cli [options] [prompt]
+bun run cli.ts [options] [prompt]
 
 Simple CLI tool for testing the local OpenAI-compatible API server.
 
@@ -14,10 +14,11 @@ Options:
   --help          Show this help message
 
 Examples:
-  ./cli "What is the capital of France?"
-  ./cli --model gemma-3-1b-it --prompt "Hello, world!"
-  ./cli --prompt "Who was the 16th president of the United States?"
-  ./cli --list-models
+  cd crates/cli/package
+  bun run cli.ts "What is the capital of France?"
+  bun run cli.ts --model gemma-3-1b-it --prompt "Hello, world!"
+  bun run cli.ts --prompt "Who was the 16th president of the United States?"
+  bun run cli.ts --list-models
 
 The server must be running at http://localhost:8080
 ```
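For orientation, `--list-models` reduces to a GET against the `/v1/models` endpoint the main README already exercises with curl. A minimal sketch (the OpenAI-style `{ data: [{ id }] }` response shape is an assumption):

```typescript
// Sketch: list available models from the local OpenAI-compatible server.
const res = await fetch("http://localhost:8080/v1/models");
const { data } = await res.json();
// Assumes the conventional OpenAI list shape: { object: "list", data: [...] }.
for (const model of data ?? []) console.log(model.id);
```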
@@ -1,4 +1,100 @@
 # Embeddings Engine
 
-A high-performance text embeddings service that generates vector representations of text using state-of-the-art models.
-This crate wraps the fastembed crate to provide embeddings and partially adapts the openai specification.
+A high-performance text embeddings service that generates vector representations of text using state-of-the-art models. This crate wraps the FastEmbed library to provide embeddings with OpenAI-compatible API endpoints.
+
+## Overview
+
+The embeddings-engine provides a standalone service for generating text embeddings that can be used for semantic search, similarity comparisons, and other NLP tasks. It's designed to be compatible with OpenAI's embeddings API format.
+
+## Features
+
+- **OpenAI-Compatible API**: `/v1/embeddings` endpoint matching OpenAI's specification
+- **FastEmbed Integration**: Powered by the FastEmbed library for high-quality embeddings
+- **Multiple Model Support**: Support for various embedding models
+- **High Performance**: Optimized for fast embedding generation
+- **Standalone Service**: Can run independently or as part of the predict-otron-9000 platform
+
+## Building and Running
+
+### Prerequisites
+- Rust toolchain
+- Internet connection for initial model downloads
+
+### Standalone Server
+```bash
+cargo run --bin embeddings-engine --release
+```
+
+The service will start on port 8080 by default.
+
+## API Usage
+
+### Generate Embeddings
+
+**Endpoint**: `POST /v1/embeddings`
+
+**Request Body**:
+```json
+{
+  "input": "Your text to embed",
+  "model": "nomic-embed-text-v1.5"
+}
+```
+
+**Response**:
+```json
+{
+  "object": "list",
+  "data": [
+    {
+      "object": "embedding",
+      "index": 0,
+      "embedding": [0.1, 0.2, 0.3, ...]
+    }
+  ],
+  "model": "nomic-embed-text-v1.5",
+  "usage": {
+    "prompt_tokens": 0,
+    "total_tokens": 0
+  }
+}
+```
+
+### Example Usage
+
+**Using cURL**:
+```bash
+curl -s http://localhost:8080/v1/embeddings \
+  -H "Content-Type: application/json" \
+  -d '{
+    "input": "The quick brown fox jumps over the lazy dog",
+    "model": "nomic-embed-text-v1.5"
+  }' | jq
+```
+
+**Using Python OpenAI Client**:
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/v1",
+    api_key="dummy"  # Not validated but required by client
+)
+
+response = client.embeddings.create(
+    input="Your text here",
+    model="nomic-embed-text-v1.5"
+)
+
+print(response.data[0].embedding)
+```
+
+## Configuration
+
+The service can be configured through environment variables:
+- `SERVER_PORT`: Port to run on (default: 8080)
+- `RUST_LOG`: Logging level (default: info)
+
+## Integration
+
+This service is designed to work seamlessly with the predict-otron-9000 main server, but can also be deployed independently for dedicated embeddings workloads.
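Since the new README pitches the service for semantic search and similarity comparisons, here is a short sketch of that workflow using the request/response shapes documented above (endpoint and model name as documented; the `embed` helper and cosine function are illustrative, not part of the crate):

```typescript
// Fetch an embedding vector using the documented /v1/embeddings contract.
async function embed(input: string): Promise<number[]> {
  const res = await fetch("http://localhost:8080/v1/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input, model: "nomic-embed-text-v1.5" }),
  });
  const json = await res.json();
  return json.data[0].embedding;
}

// Cosine similarity: dot(a, b) / (|a| * |b|), the usual similarity measure.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const [a, b] = await Promise.all([embed("cat"), embed("kitten")]);
console.log("similarity:", cosine(a, b));
```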
@@ -137,7 +137,7 @@ Parsing workspace at: ..
 Output directory: ../generated-helm-chart
 Chart name: predict-otron-9000
 Found 4 services:
-  - leptos-app: ghcr.io/geoffsee/leptos-app:latest (port 8788)
+  - chat-ui: ghcr.io/geoffsee/chat-ui:latest (port 8788)
   - inference-engine: ghcr.io/geoffsee/inference-service:latest (port 8080)
   - embeddings-engine: ghcr.io/geoffsee/embeddings-service:latest (port 8080)
   - predict-otron-9000: ghcr.io/geoffsee/predict-otron-9000:latest (port 8080)
@@ -52,7 +52,7 @@ graph TB
 
 ## Workspace Structure
 
-The project uses a 7-crate Rust workspace with TypeScript tooling, designed for maximum flexibility in deployment configurations.
+The project uses a 9-crate Rust workspace with TypeScript tooling, designed for maximum flexibility in deployment configurations.
 
 ```mermaid
 graph TD
@@ -69,18 +69,15 @@ graph TD
     end
 
     subgraph "Frontend"
-        D[leptos-app<br/>Edition: 2021<br/>Port: 3000/8788<br/>WASM/SSR]
+        D[chat-ui<br/>Edition: 2021<br/>Port: 8788<br/>WASM UI]
     end
 
     subgraph "Tooling"
        L[helm-chart-tool<br/>Edition: 2024<br/>K8s deployment]
+       E[cli<br/>Edition: 2024<br/>TypeScript/Bun CLI]
     end
     end
 
-    subgraph "External Tooling"
-        E[scripts/cli.ts<br/>TypeScript/Bun<br/>OpenAI SDK]
-    end
-
     subgraph "Dependencies"
         A --> B
         A --> C
@@ -193,7 +190,7 @@ graph TB
     end
 
     subgraph "Frontend"
-        D[leptos-app Pod<br/>:8788<br/>ClusterIP Service]
+        D[chat-ui Pod<br/>:8788<br/>ClusterIP Service]
     end
 
     subgraph "Ingress"