Files
predict-otron-9001/crates/predict-otron-9000/Cargo.toml
geoffsee 8338750beb Refactor apply_cached_repeat_penalty for optimized caching and reuse, add extensive unit tests, and integrate special handling for gemma-specific models.
Removed `test_request.sh`, deprecated functionality, and unused imports; introduced a new CLI tool (`cli.ts`) for testing inference engine and adjusted handling of non-streaming/streaming chat completions.

- Add CPU fallback support for text generation when primary device is unsupported
- Introduce `execute_with_fallback` method to handle device compatibility and shape mismatch errors
- Extend unit tests to reproduce tensor shape mismatch errors specific to model configurations
- Increase HTTP timeout limits in `curl_chat_stream.sh` script for reliable API testing

chat completion endpoint functions with gemma3 (no streaming)

Add benchmarking guide with HTML reporting, Leptos chat crate, and middleware for metrics tracking
2025-08-27 16:15:01 -04:00

27 lines
721 B
TOML

[package]
name = "predict-otron-9000"
version = "0.1.0"
edition = "2024"
[[bin]]
name = "predict-otron-9000"
path = "src/main.rs"
[dependencies]
# Axum web framework
axum = "0.8.4"
tokio = { version = "1.45.1", features = ["full"] }
tower = "0.5.2"
tower-http = { version = "0.6.6", features = ["trace", "cors"] }
serde = { version = "1.0.219", features = ["derive"] }
serde_json = "1.0.140"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
uuid = { version = "1.7.0", features = ["v4"] }
# Dependencies for embeddings functionality
embeddings-engine = { path = "../embeddings-engine" }
# Dependencies for inference functionality
inference-engine = { path = "../inference-engine" }