From 06fdfcf8988d3c4e9783b622c046b79f68b4ecb5 Mon Sep 17 00:00:00 2001
From: Geoff Seemueller <28698553+geoffsee@users.noreply.github.com>
Date: Sat, 30 Aug 2025 08:23:38 -0400
Subject: [PATCH] clarify project intent

---
 README.md | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 35e84c1..c165aae 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,20 @@
-# predict-otron-9000
-
-A comprehensive multi-service AI platform built around local LLM inference, embeddings, and web interfaces.
-
+<div align="center">
+    <h1>predict-otron-9000</h1>
+    <p>Powerful local AI inference with OpenAI-compatible APIs</p>
+</div>
+
+> This project is an educational aid for bootstrapping my understanding of language model inference at the lowest levels I can, serving as a "rubber-duck" solution for Kubernetes-based, performance-oriented inference capabilities on air-gapped networks.
+
+> By isolating application behaviors in components at the crate level, development reduces to a short feedback loop for validation and integration, ultimately smoothing the learning curve for scalable AI systems.
+Stability is currently best effort. Many models require unique configuration. When stability is achieved, this project will be promoted to the seemueller-io GitHub organization under a different name.
+
+A comprehensive multi-service AI platform built around local LLM inference, embeddings, and web interfaces.
+
+
 ## Project Overview
 
 The predict-otron-9000 is a flexible AI platform that provides:
@@ -24,7 +33,7 @@ The system supports both CPU and GPU acceleration (CUDA/Metal), with intelligent
 - **Text Embeddings**: Generate high-quality text embeddings using FastEmbed
 - **Text Generation**: Chat completions with OpenAI-compatible API using Gemma and Llama models (various sizes including instruction-tuned variants)
 - **Performance Optimized**: Efficient caching and platform-specific optimizations for improved throughput
-- **Web Chat Interface**: Leptos-based WebAssembly (WASM) chat interface for browser-based interaction
+- **Web Chat Interface**: Leptos chat interface
 - **Flexible Deployment**: Run as monolithic service or microservices architecture
 
 ## Architecture Overview
@@ -50,7 +59,7 @@ crates/
 
 - **Main Server** (port 8080): Orchestrates inference and embeddings services
 - **Embeddings Service** (port 8080): Standalone FastEmbed service with OpenAI API compatibility
-- **Web Frontend** (port 8788): Leptos WASM chat interface served by Trunk
+- **Web Frontend** (port 8788): cargo leptos SSR app
 - **CLI Client**: TypeScript/Bun client for testing and automation
 
 ### Deployment Modes
@@ -497,4 +506,4 @@ For networked tests and full functionality, ensure Hugging Face authentication i
 4. Ensure all tests pass: `cargo test`
 5. Submit a pull request
 
-_Warning: Do NOT use this in production unless you are cool like that._
\ No newline at end of file
+_Warning: Do NOT use this in production unless you are cool like that._
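The patch advertises an OpenAI-compatible chat completions API on the main server (port 8080). As a minimal sketch of what a client request might look like, the snippet below builds such a payload; the endpoint path and model id are illustrative assumptions, not taken from the patch:

```python
import json

# Hypothetical request against the OpenAI-compatible chat API the README
# describes; the URL path and model id below are assumptions for illustration.
url = "http://localhost:8080/v1/chat/completions"
payload = {
    "model": "gemma-3-1b-it",  # assumed model id
    "messages": [
        {"role": "user", "content": "Say hello."},
    ],
    "stream": False,
}

# Serialize the request body as it would be POSTed to the server.
body = json.dumps(payload)
print(body)
```

With the server running locally, the same body could be sent with `curl` or any OpenAI client library pointed at the local base URL.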