Commit Graph

18 Commits

Author SHA1 Message Date
geoffsee
8d2b85b0b9 update docs 2025-08-31 19:27:15 -04:00
geoffsee
44e4f9e5e1 put proof in the pudding 2025-08-31 18:54:20 -04:00
geoffsee
38d51722f2 Update configuration loading with Cargo.toml path and clean up .gitignore
---

This commit message concisely communicates the key changes:

1. The code now builds an absolute path to the `Cargo.toml` file, enhancing clarity in configuration loading.
2. The addition of `PathBuf` usage improves type safety.
3. The removal of unnecessary entries from `.gitignore` helps maintain a clean project structure.

These updates reflect improvements in both functionality and project organization.
2025-08-31 14:06:44 -04:00
geoffsee
f5d2a85f2e cleanup, add ci 2025-08-31 10:31:20 -04:00
Geoff Seemueller
419e1c2ea7 fix Kubernetes spelling 2025-08-30 08:24:24 -04:00
Geoff Seemueller
06fdfcf898 clarify project intent 2025-08-30 08:23:38 -04:00
geoffsee
315ef17605 supports small llama and gemma models
Refactor inference

dedicated crates for llama and gemma inferencing, not integrated
2025-08-29 20:00:41 -04:00
geoffsee
62dcc8f5bb ai generated README.md 2025-08-28 16:04:45 -04:00
geoffsee
6b709b8ec5 remove weird art 2025-08-28 12:56:07 -04:00
geoffsee
d04340d9ac update docs 2025-08-28 12:54:09 -04:00
geoffsee
0488bddfdb Create ARCHITECTURE.md - update stale references to old chat crate 2025-08-28 12:22:05 -04:00
geoffsee
45d7cd8819 - Introduced ServerConfig for handling deployment modes and services.
- Added HighAvailability mode for proxying requests to external services.
- Maintained Local mode for embedded services.
- Updated `README.md` and included `SERVER_CONFIG.md` for detailed documentation.
2025-08-28 09:55:39 -04:00
geoffsee
956d00f596 Add CLEANUP.md with identified documentation and code issues. Update README files to fix repository URL, unify descriptions, and clarify Gemma model usage. 2025-08-28 07:24:14 -04:00
geoffsee
8338750beb Refactor apply_cached_repeat_penalty for optimized caching and reuse, add extensive unit tests, and integrate special handling for gemma-specific models.
Removed `test_request.sh`, deprecated functionality, and unused imports; introduced a new CLI tool (`cli.ts`) for testing inference engine and adjusted handling of non-streaming/streaming chat completions.

- Add CPU fallback support for text generation when primary device is unsupported
- Introduce `execute_with_fallback` method to handle device compatibility and shape mismatch errors
- Extend unit tests to reproduce tensor shape mismatch errors specific to model configurations
- Increase HTTP timeout limits in `curl_chat_stream.sh` script for reliable API testing

chat completion endpoint functions with gemma3 (no streaming)

Add benchmarking guide with HTML reporting, Leptos chat crate, and middleware for metrics tracking
2025-08-27 16:15:01 -04:00
geoffsee
7dd23213c9 fix image path again 2025-08-16 20:11:15 -04:00
geoffsee
dff09dc4d0 fix image path 2025-08-16 20:09:28 -04:00
geoffsee
83f2a8b295 add an image to the readme 2025-08-16 20:08:35 -04:00
geoffsee
2aa6d4cdf8 Introduce predict-otron-9000: Unified server combining embeddings and inference engines. Includes OpenAI-compatible APIs, full documentation, and example scripts. 2025-08-16 19:11:35 -04:00