mirror of https://github.com/geoffsee/predict-otron-9001.git (synced 2025-09-08 22:46:44 +00:00)

Commit: update docs

.gitignore (vendored): 2 changes

@@ -74,8 +74,6 @@ venv/
 # Backup files
 *.bak
 *.backup
 *~
-/scripts/cli
-!/scripts/cli.ts
 /**/.*.bun-build
 /AGENTS.md

README.md: 35 changes

@@ -42,7 +42,7 @@ The system supports both CPU and GPU acceleration (CUDA/Metal), with intelligent
 
 ### Workspace Structure
 
-The project uses a 7-crate Rust workspace plus TypeScript components:
+The project uses a 9-crate Rust workspace plus TypeScript components:
 
 ```
 crates/
@@ -51,17 +51,18 @@ crates/
 ├── gemma-runner/      # Gemma model inference via Candle (Rust 2021)
 ├── llama-runner/      # Llama model inference via Candle (Rust 2021)
 ├── embeddings-engine/ # FastEmbed embeddings service (Rust 2024)
-├── leptos-app/        # WASM web frontend (Rust 2021)
+├── chat-ui/           # WASM web frontend (Rust 2021)
 ├── helm-chart-tool/   # Kubernetes deployment tooling (Rust 2024)
-└── scripts/
-    └── cli.ts         # TypeScript/Bun CLI client
+└── cli/               # CLI client crate (Rust 2024)
+    └── package/
+        └── cli.ts     # TypeScript/Bun CLI client
 ```
 
 ### Service Architecture
 
 - **Main Server** (port 8080): Orchestrates inference and embeddings services
 - **Embeddings Service** (port 8080): Standalone FastEmbed service with OpenAI API compatibility
-- **Web Frontend** (port 8788): cargo leptos SSR app
+- **Web Frontend** (port 8788): chat-ui WASM app
 - **CLI Client**: TypeScript/Bun client for testing and automation
 
 ### Deployment Modes
@@ -144,26 +145,26 @@ cargo build --bin embeddings-engine --release
 
 #### Web Frontend (Port 8788)
 ```bash
-cd crates/leptos-app
+cd crates/chat-ui
 ./run.sh
 ```
-- Serves Leptos WASM frontend on port 8788
+- Serves chat-ui WASM frontend on port 8788
 - Sets required RUSTFLAGS for WebAssembly getrandom support
 - Auto-reloads during development
 
 #### TypeScript CLI Client
 ```bash
 # List available models
-bun run scripts/cli.ts --list-models
+cd crates/cli/package && bun run cli.ts --list-models
 
 # Chat completion
-bun run scripts/cli.ts "What is the capital of France?"
+cd crates/cli/package && bun run cli.ts "What is the capital of France?"
 
 # With specific model
-bun run scripts/cli.ts --model gemma-3-1b-it --prompt "Hello, world!"
+cd crates/cli/package && bun run cli.ts --model gemma-3-1b-it --prompt "Hello, world!"
 
 # Show help
-bun run scripts/cli.ts --help
+cd crates/cli/package && bun run cli.ts --help
 ```
 
 ## API Usage
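
For orientation, the relocated CLI is a thin wrapper over the server's OpenAI-compatible HTTP API. A minimal TypeScript sketch of the equivalent chat-completion call, assuming the standard OpenAI `/v1/chat/completions` route and payload shape (the route itself is not shown in this diff); run with Bun:

```typescript
// Hypothetical direct call to the OpenAI-compatible endpoint.
// Assumes the standard OpenAI chat-completions request/response schema.
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma-3-1b-it",
    messages: [{ role: "user", content: "What is the capital of France?" }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```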
@@ -279,7 +280,7 @@ cargo test --workspace
 
 **End-to-end test script:**
 ```bash
-./smoke_test.sh
+./scripts/smoke_test.sh
 ```
 
 This script:
@@ -368,7 +369,7 @@ All services include Docker metadata in `Cargo.toml`:
 - Port: 8080
 
 **Web Frontend:**
-- Image: `ghcr.io/geoffsee/leptos-app:latest`
+- Image: `ghcr.io/geoffsee/chat-ui:latest`
 - Port: 8788
 
 **Docker Compose:**
@@ -427,7 +428,7 @@ For Kubernetes deployment details, see the [ARCHITECTURE.md](docs/ARCHITECTURE.m
 **Symptom:** WASM compilation failures
 **Solution:**
 1. Install required targets: `rustup target add wasm32-unknown-unknown`
-2. Check RUSTFLAGS in leptos-app/run.sh
+2. Check RUSTFLAGS in chat-ui/run.sh
 
 ### Network/Timeout Issues
 **Symptom:** First-time model downloads timing out
@@ -458,18 +459,18 @@ curl -s http://localhost:8080/v1/models | jq
 
 **CLI client test:**
 ```bash
-bun run scripts/cli.ts "What is 2+2?"
+cd crates/cli/package && bun run cli.ts "What is 2+2?"
 ```
 
 **Web frontend:**
 ```bash
-cd crates/leptos-app && ./run.sh &
+cd crates/chat-ui && ./run.sh &
 # Navigate to http://localhost:8788
 ```
 
 **Integration test:**
 ```bash
-./smoke_test.sh
+./scripts/smoke_test.sh
 ```
 
 **Cleanup:**

crates/chat-ui/README.md

@@ -1,2 +1,41 @@
 # chat-ui
-This is served by the predict-otron-9000 server. This needs to be built before the server.
+
+A WASM-based web chat interface for the predict-otron-9000 AI platform.
+
+## Overview
+
+The chat-ui provides a real-time web interface for interacting with language models through the predict-otron-9000 server. Built with Leptos and compiled to WebAssembly, it offers a modern chat experience with streaming response support.
+
+## Features
+
+- Real-time chat interface with the inference server
+- Streaming response support
+- Conversation history
+- Responsive web design
+- WebAssembly-powered for optimal performance
+
+## Building and Running
+
+### Prerequisites
+- Rust toolchain with WASM target: `rustup target add wasm32-unknown-unknown`
+- The predict-otron-9000 server must be running on port 8080
+
+### Development Server
+```bash
+cd crates/chat-ui
+./run.sh
+```
+
+This starts the development server on port 8788 with auto-reload capabilities.
+
+### Usage
+1. Start the predict-otron-9000 server: `./scripts/run_server.sh`
+2. Start the chat-ui: `cd crates/chat-ui && ./run.sh`
+3. Navigate to `http://localhost:8788`
+4. Start chatting with your AI models!
+
+## Technical Details
+- Built with Leptos framework
+- Compiled to WebAssembly for browser execution
+- Communicates with predict-otron-9000 API via HTTP
+- Sets required RUSTFLAGS for WebAssembly getrandom support
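
Since the README above advertises streaming support, here is a minimal consumer sketch in TypeScript (not the chat-ui's actual Rust/WASM client code), assuming the server emits standard OpenAI-style server-sent events: `data: {json}` lines terminated by `data: [DONE]`. Run with Bun:

```typescript
// Hypothetical SSE consumer for a streaming chat completion.
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma-3-1b-it",
    messages: [{ role: "user", content: "Tell me a short joke." }],
    stream: true, // assumed flag, per the OpenAI schema
  }),
});

const decoder = new TextDecoder();
let buf = "";
for await (const chunk of res.body!) {
  buf += decoder.decode(chunk, { stream: true });
  const lines = buf.split("\n");
  buf = lines.pop() ?? ""; // keep any partial line for the next chunk
  for (const line of lines) {
    if (!line.startsWith("data: ") || line.includes("[DONE]")) continue;
    const delta = JSON.parse(line.slice(6)).choices?.[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
  }
}
```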

crates/cli/README.md

@@ -3,7 +3,7 @@
 A Rust/Typescript Hybrid
 
 ```console
-./cli [options] [prompt]
+bun run cli.ts [options] [prompt]
 
 Simple CLI tool for testing the local OpenAI-compatible API server.
 
@@ -14,10 +14,11 @@ Options:
 --help Show this help message
 
 Examples:
-./cli "What is the capital of France?"
-./cli --model gemma-3-1b-it --prompt "Hello, world!"
-./cli --prompt "Who was the 16th president of the United States?"
-./cli --list-models
+cd crates/cli/package
+bun run cli.ts "What is the capital of France?"
+bun run cli.ts --model gemma-3-1b-it --prompt "Hello, world!"
+bun run cli.ts --prompt "Who was the 16th president of the United States?"
+bun run cli.ts --list-models
 
 The server must be running at http://localhost:8080
 ```

crates/embeddings-engine/README.md

@@ -1,4 +1,100 @@
 # Embeddings Engine
 
-A high-performance text embeddings service that generates vector representations of text using state-of-the-art models.
-This crate wraps the fastembed crate to provide embeddings and partially adapts the openai specification.
+A high-performance text embeddings service that generates vector representations of text using state-of-the-art models. This crate wraps the FastEmbed library to provide embeddings with OpenAI-compatible API endpoints.
+
+## Overview
+
+The embeddings-engine provides a standalone service for generating text embeddings that can be used for semantic search, similarity comparisons, and other NLP tasks. It's designed to be compatible with OpenAI's embeddings API format.
+
+## Features
+
+- **OpenAI-Compatible API**: `/v1/embeddings` endpoint matching OpenAI's specification
+- **FastEmbed Integration**: Powered by the FastEmbed library for high-quality embeddings
+- **Multiple Model Support**: Support for various embedding models
+- **High Performance**: Optimized for fast embedding generation
+- **Standalone Service**: Can run independently or as part of the predict-otron-9000 platform
+
+## Building and Running
+
+### Prerequisites
+- Rust toolchain
+- Internet connection for initial model downloads
+
+### Standalone Server
+```bash
+cargo run --bin embeddings-engine --release
+```
+
+The service will start on port 8080 by default.
+
+## API Usage
+
+### Generate Embeddings
+
+**Endpoint**: `POST /v1/embeddings`
+
+**Request Body**:
+```json
+{
+  "input": "Your text to embed",
+  "model": "nomic-embed-text-v1.5"
+}
+```
+
+**Response**:
+```json
+{
+  "object": "list",
+  "data": [
+    {
+      "object": "embedding",
+      "index": 0,
+      "embedding": [0.1, 0.2, 0.3, ...]
+    }
+  ],
+  "model": "nomic-embed-text-v1.5",
+  "usage": {
+    "prompt_tokens": 0,
+    "total_tokens": 0
+  }
+}
+```
+
+### Example Usage
+
+**Using cURL**:
+```bash
+curl -s http://localhost:8080/v1/embeddings \
+  -H "Content-Type: application/json" \
+  -d '{
+    "input": "The quick brown fox jumps over the lazy dog",
+    "model": "nomic-embed-text-v1.5"
+  }' | jq
+```
+
+**Using Python OpenAI Client**:
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/v1",
+    api_key="dummy"  # Not validated but required by client
+)
+
+response = client.embeddings.create(
+    input="Your text here",
+    model="nomic-embed-text-v1.5"
+)
+
+print(response.data[0].embedding)
+```
+
+## Configuration
+
+The service can be configured through environment variables:
+- `SERVER_PORT`: Port to run on (default: 8080)
+- `RUST_LOG`: Logging level (default: info)
+
+## Integration
+
+This service is designed to work seamlessly with the predict-otron-9000 main server, but can also be deployed independently for dedicated embeddings workloads.
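
The overview above names similarity comparison as a primary use case. A small TypeScript sketch pairing this endpoint with cosine similarity, assuming the request/response shapes documented in the diff; run with Bun:

```typescript
// Embed two strings via the service and compare them with cosine similarity.
async function embed(input: string): Promise<number[]> {
  const res = await fetch("http://localhost:8080/v1/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input, model: "nomic-embed-text-v1.5" }),
  });
  return (await res.json()).data[0].embedding;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const [x, y] = await Promise.all([
  embed("The quick brown fox jumps over the lazy dog"),
  embed("A fast auburn fox leaps over a sleepy hound"),
]);
console.log(`cosine similarity: ${cosine(x, y).toFixed(3)}`);
```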

crates/helm-chart-tool/README.md

@@ -137,7 +137,7 @@ Parsing workspace at: ..
 Output directory: ../generated-helm-chart
 Chart name: predict-otron-9000
 Found 4 services:
-  - leptos-app: ghcr.io/geoffsee/leptos-app:latest (port 8788)
+  - chat-ui: ghcr.io/geoffsee/chat-ui:latest (port 8788)
   - inference-engine: ghcr.io/geoffsee/inference-service:latest (port 8080)
   - embeddings-engine: ghcr.io/geoffsee/embeddings-service:latest (port 8080)
   - predict-otron-9000: ghcr.io/geoffsee/predict-otron-9000:latest (port 8080)

docs/ARCHITECTURE.md

@@ -52,7 +52,7 @@ graph TB
 
 ## Workspace Structure
 
-The project uses a 7-crate Rust workspace with TypeScript tooling, designed for maximum flexibility in deployment configurations.
+The project uses a 9-crate Rust workspace with TypeScript tooling, designed for maximum flexibility in deployment configurations.
 
 ```mermaid
 graph TD
@@ -69,18 +69,15 @@ graph TD
         end
 
         subgraph "Frontend"
-            D[leptos-app<br/>Edition: 2021<br/>Port: 3000/8788<br/>WASM/SSR]
+            D[chat-ui<br/>Edition: 2021<br/>Port: 8788<br/>WASM UI]
         end
 
         subgraph "Tooling"
             L[helm-chart-tool<br/>Edition: 2024<br/>K8s deployment]
+            E[cli<br/>Edition: 2024<br/>TypeScript/Bun CLI]
         end
     end
 
-    subgraph "External Tooling"
-        E[scripts/cli.ts<br/>TypeScript/Bun<br/>OpenAI SDK]
-    end
-
     subgraph "Dependencies"
         A --> B
         A --> C
@@ -193,7 +190,7 @@ graph TB
     end
 
     subgraph "Frontend"
-        D[leptos-app Pod<br/>:8788<br/>ClusterIP Service]
+        D[chat-ui Pod<br/>:8788<br/>ClusterIP Service]
     end
 
     subgraph "Ingress"