---
This commit message concisely communicates the key changes:
1. The code now builds an absolute path to the `Cargo.toml` file, enhancing clarity in configuration loading.
2. The addition of `PathBuf` usage improves type safety.
3. The removal of unnecessary entries from `.gitignore` helps maintain a clean project structure.
These updates reflect improvements in both functionality and project organization.
- Added HighAvailability mode for proxying requests to external services.
- Maintained Local mode for embedded services.
- Updated `README.md` and included `SERVER_CONFIG.md` for detailed documentation.
- Introduced `reset_state` for clearing cached state between requests.
- Enhanced chat UI with model selector and dynamic model fetching.
- Improved error logging and detailed debug messages for chat request flows.
- Added fresh instantiation of `TextGeneration` to prevent tensor shape mismatches.
Removed `test_request.sh`, deprecated functionality, and unused imports; introduced a new CLI tool (`cli.ts`) for testing inference engine and adjusted handling of non-streaming/streaming chat completions.
- Add CPU fallback support for text generation when primary device is unsupported
- Introduce `execute_with_fallback` method to handle device compatibility and shape mismatch errors
- Extend unit tests to reproduce tensor shape mismatch errors specific to model configurations
- Increase HTTP timeout limits in `curl_chat_stream.sh` script for reliable API testing
chat completion endpoint functions with gemma3 (no streaming)
Add benchmarking guide with HTML reporting, Leptos chat crate, and middleware for metrics tracking