
This demo showcases how to use CLIP to compute similarity between texts and images, which can be employed for image-to-text or text-to-image retrieval tasks.

Quick Start

cargo run -r --example clip

Or you can run it manually:

1. Download the CLIP ONNX models

clip-b32-visual
clip-b32-textual

2. Specify the ONNX model paths in main.rs

    // visual
    let options_visual = Options::default()
        .with_model("VISUAL_MODEL")  // <= modify this
        .with_i00((1, 1, 4).into())  // input 0, dim 0 (batch): min=1, opt=1, max=4
        .with_profile(false);        // disable per-run profiling output

    // textual
    let options_textual = Options::default()
        .with_model("TEXTUAL_MODEL")  // <= modify this
        .with_i00((1, 1, 4).into())   // dynamic batch range, as above
        .with_profile(false);
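
For context, these two option sets feed a single Clip model. The sketch below mirrors how this example wires them together; Clip::new, encode_texts, and encode_images follow the usls API used here, but exact signatures may vary between versions, and image loading via the image crate is an assumption, so treat this as a sketch rather than a drop-in main.rs:

    use usls::{models::Clip, Options};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Build the visual and textual options as shown above.
        let options_visual = Options::default().with_model("VISUAL_MODEL");
        let options_textual = Options::default().with_model("TEXTUAL_MODEL");

        // One Clip instance wraps both ONNX sessions.
        let mut model = Clip::new(options_visual, options_textual)?;

        // Encode prompts and images into the shared embedding space.
        let texts = vec!["several carrots".to_string()];
        let feats_text = model.encode_texts(&texts)?;
        let image = image::open("./examples/clip/images/carrot.jpg")?;
        let feats_image = model.encode_images(&[image])?;

        // Pairwise scores come from dot products of these embeddings;
        // see the Results section below.
        let _ = (feats_text, feats_image);
        Ok(())
    }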

3. Then run

cargo run -r --example clip

Results

(82.24775%) ./examples/clip/images/carrot.jpg => 几个胡萝卜 (several carrots)
[0.06708972, 0.0067733657, 0.0019306632, 0.8224775, 0.003044935, 0.083962336, 0.014721389]

(85.56889%) ./examples/clip/images/doll.jpg => There is a doll with red hair and a clock on a table 
[0.0786363, 0.0004783095, 0.00060898095, 0.06286741, 0.0006842306, 0.8556889, 0.0010357979]

(90.03625%) ./examples/clip/images/peoples.jpg => Some people holding wine glasses in a restaurant 
[0.07473288, 0.0027821448, 0.0075673857, 0.010874652, 0.003041679, 0.0006387719, 0.9003625]
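
Each bracketed row is a softmax over that image's similarity to all of the text prompts (seven in this run), so the entries sum to 1 and the reported percentage is simply the largest entry. A minimal sketch of that post-processing step:

    // Convert one image's similarity logits (scaled dot products between
    // the image embedding and each text embedding) into the probability
    // row printed above.
    fn softmax(logits: &[f32]) -> Vec<f32> {
        // Subtract the max for numerical stability before exponentiating.
        let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        exps.iter().map(|&e| e / sum).collect()
    }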

TODO

  • TensorRT support for textual model