This demo shows how to use BLIP for conditional (prompt-guided) or unconditional image captioning.
## Quick Start

```bash
cargo run -r --example blip
```
Or you can set it up manually:
1. Download the BLIP ONNX models

   - blip-visual-base
   - blip-textual-base
2. Specify the ONNX model paths in `main.rs`

   ```rust
   // visual
   let options_visual = Options::default()
       .with_model("VISUAL_MODEL") // <= modify this
       .with_profile(false);

   // textual
   let options_textual = Options::default()
       .with_model("TEXTUAL_MODEL") // <= modify this
       .with_profile(false);
   ```
3. Then, run

   ```bash
   cargo run -r --example blip
   ```
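For orientation, here is a minimal sketch of what the example does end to end. The `Blip::new` constructor, `DataLoader::try_read` helper, and `caption` method signature are assumptions for illustration; check `examples/blip/main.rs` for the crate's actual API.

```rust
// A minimal sketch (not the exact example code) of the end-to-end flow.
use usls::{models::Blip, DataLoader, Options};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Point both branches at the ONNX files downloaded in step 1.
    let options_visual = Options::default()
        .with_model("blip-visual-base.onnx")
        .with_profile(false);
    let options_textual = Options::default()
        .with_model("blip-textual-base.onnx")
        .with_profile(false);

    // Hypothetical constructor combining the visual and textual branches.
    let mut model = Blip::new(options_visual, options_textual)?;

    // Hypothetical helper for loading an image from disk.
    let image = DataLoader::try_read("./assets/bus.jpg")?;

    // Unconditional captioning passes no prompt; conditional captioning
    // seeds generation with a text prompt (hypothetical signature).
    let unconditional = model.caption(&image, None)?;
    let conditional = model.caption(&image, Some("a picture of"))?;
    println!("{unconditional:?}\n{conditional:?}");
    Ok(())
}
```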
## Results

```text
[Unconditional image captioning]: a group of people walking around a bus
[Conditional image captioning]: three man walking in front of a bus
```
## TODO

- Text decoding with top-p sampling (see the sketch below)
- VQA
- Retrieval
- TensorRT support for the textual model
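The first TODO item refers to a standard decoding technique. For reference, here is a self-contained sketch of top-p (nucleus) sampling over a token probability distribution; it uses the `rand` crate and is not wired into usls.

```rust
use rand::Rng;

/// Sample a token index from `probs` with top-p (nucleus) sampling:
/// keep the smallest set of highest-probability tokens whose cumulative
/// probability exceeds `p`, renormalize, and sample from that set.
/// Assumes `probs` is a valid, non-empty probability distribution.
fn top_p_sample(probs: &[f32], p: f32) -> usize {
    // Sort token indices by probability, descending.
    let mut indexed: Vec<(usize, f32)> = probs.iter().copied().enumerate().collect();
    indexed.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    // Truncate to the nucleus: the smallest prefix whose cumulative
    // probability exceeds p.
    let mut cumulative = 0.0;
    let mut nucleus_len = indexed.len();
    for (i, &(_, prob)) in indexed.iter().enumerate() {
        cumulative += prob;
        if cumulative > p {
            nucleus_len = i + 1;
            break;
        }
    }
    let nucleus = &indexed[..nucleus_len];

    // Renormalize over the nucleus and draw one sample.
    let total: f32 = nucleus.iter().map(|&(_, prob)| prob).sum();
    let mut r = rand::thread_rng().gen_range(0.0..total);
    for &(idx, prob) in nucleus {
        if r < prob {
            return idx;
        }
        r -= prob;
    }
    nucleus.last().unwrap().0
}
```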