Introduce predict-otron-9000: Unified server combining embeddings and inference engines. Includes OpenAI-compatible APIs, full documentation, and example scripts.

geoffsee
2025-08-16 19:11:35 -04:00
commit 2aa6d4cdf8
28 changed files with 16595 additions and 0 deletions

crates/inference-engine/test.sh (executable file, 17 lines added)

@@ -0,0 +1,17 @@
#!/usr/bin/env bash
set -euo pipefail

PROMPT='Who was the 16th president'
# Pulls gemma-3-1b-it and runs the prompt.
cargo run -- --prompt "${PROMPT}"

# Sample output from one run (verbatim):
# avx: false, neon: true, simd128: false, f16c: false
# temp: 0.00 repeat-penalty: 1.10 repeat-last-n: 64
# retrieved the files in 1.388209ms
# loaded the model in 321.509333ms
# user
# Who was the 16th president
# model
# The 16th President of the United States was **Abraham Lincoln**. He served from March 4, 1861, to March 4, 1865.
# 40 tokens generated (31.85 token/s)
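
The sample output recorded in test.sh ends with a throughput summary, which is the figure most worth tracking between runs. A minimal sketch of how that number could be pulled out of captured output follows. The function name `extract_tokens_per_second` is hypothetical and not part of this commit, and the sketch assumes the summary line keeps the `(<number> token/s)` shape shown in the sample and reaches whichever stream you capture.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the commit: reads text on stdin and prints
# the number that precedes " token/s)" on any matching line.
extract_tokens_per_second() {
  # -n suppresses default output; the trailing p prints only lines where the
  # substitution succeeded, so non-matching input produces no output.
  sed -n 's/.*(\([0-9][0-9.]*\) token\/s).*/\1/p'
}

# Usage with the summary line recorded in the sample output:
printf '%s\n' '40 tokens generated (31.85 token/s)' | extract_tokens_per_second
```

In practice the helper would sit at the end of a pipeline fed by the `cargo run` invocation. Whether the summary is written to stdout or stderr is not stated in the commit, so the capture may need `2>&1`.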