predict-otron-9001

mirror of https://github.com/geoffsee/predict-otron-9001.git synced 2025-09-08 22:46:44 +00:00

Author	SHA1	Message	Date
geoffsee	8d2b85b0b9	update docs	2025-08-31 19:27:15 -04:00
geoffsee	44e4f9e5e1	put proof in the pudding	2025-08-31 18:54:20 -04:00
geoffsee	38d51722f2	Update configuration loading with Cargo.toml path and clean up `.gitignore` --- This commit message concisely communicates the key changes: 1. The code now builds an absolute path to the `Cargo.toml` file, enhancing clarity in configuration loading. 2. The addition of `PathBuf` usage improves type safety. 3. The removal of unnecessary entries from `.gitignore` helps maintain a clean project structure. These updates reflect improvements in both functionality and project organization.	2025-08-31 14:06:44 -04:00
geoffsee	f5d2a85f2e	cleanup, add ci	2025-08-31 10:31:20 -04:00
Geoff Seemueller	419e1c2ea7	fix Kubernetes spelling	2025-08-30 08:24:24 -04:00
Geoff Seemueller	06fdfcf898	clarify project intent	2025-08-30 08:23:38 -04:00
geoffsee	315ef17605	supports small llama and gemma models. Refactor inference dedicated crates for llama and gemma inferencing, not integrated	2025-08-29 20:00:41 -04:00
geoffsee	62dcc8f5bb	ai generated README.md	2025-08-28 16:04:45 -04:00
geoffsee	6b709b8ec5	remove weird art	2025-08-28 12:56:07 -04:00
geoffsee	d04340d9ac	update docs	2025-08-28 12:54:09 -04:00
geoffsee	0488bddfdb	Create ARCHITECTURE.md - update stale references to old chat crate	2025-08-28 12:22:05 -04:00
geoffsee	45d7cd8819	- Introduced `ServerConfig` for handling deployment modes and services. - Added HighAvailability mode for proxying requests to external services. - Maintained Local mode for embedded services. - Updated `README.md` and included `SERVER_CONFIG.md` for detailed documentation.	2025-08-28 09:55:39 -04:00
geoffsee	956d00f596	Add `CLEANUP.md` with identified documentation and code issues. Update README files to fix repository URL, unify descriptions, and clarify Gemma model usage.	2025-08-28 07:24:14 -04:00
geoffsee	8338750beb	Refactor `apply_cached_repeat_penalty` for optimized caching and reuse, add extensive unit tests, and integrate special handling for gemma-specific models. Removed `test_request.sh`, deprecated functionality, and unused imports; introduced a new CLI tool (`cli.ts`) for testing inference engine and adjusted handling of non-streaming/streaming chat completions. - Add CPU fallback support for text generation when primary device is unsupported - Introduce `execute_with_fallback` method to handle device compatibility and shape mismatch errors - Extend unit tests to reproduce tensor shape mismatch errors specific to model configurations - Increase HTTP timeout limits in `curl_chat_stream.sh` script for reliable API testing. Chat completion endpoint functions with gemma3 (no streaming). Add benchmarking guide with HTML reporting, Leptos chat crate, and middleware for metrics tracking.	2025-08-27 16:15:01 -04:00
geoffsee	7dd23213c9	fix image path again	2025-08-16 20:11:15 -04:00
geoffsee	dff09dc4d0	fix image path	2025-08-16 20:09:28 -04:00
geoffsee	83f2a8b295	add an image to the readme	2025-08-16 20:08:35 -04:00
geoffsee	2aa6d4cdf8	Introduce predict-otron-9000: Unified server combining embeddings and inference engines. Includes OpenAI-compatible APIs, full documentation, and example scripts.	2025-08-16 19:11:35 -04:00

18 Commits