# usls
⭐️ Star if helpful! ⭐️
usls is an evolving Rust library focused on inference for advanced vision and vision-language models, along with practical vision utilities.
- SOTA Model Inference: Supports a wide range of state-of-the-art vision and multi-modal models (typically with fewer than 1B parameters).
- Multi-backend Acceleration: Supports CPU, CUDA, TensorRT, and CoreML.
- Easy Data Handling: Read images, video streams, and folders, with iterator support.
- Rich Result Types: Built-in containers for common vision outputs like bounding boxes (Hbb, Obb), polygons, masks, etc.
- Annotation & Visualization: Draw and display inference results directly, similar to OpenCV's `imshow()`.
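A typical workflow chains these utilities together. The following is a minimal sketch assembled only from the APIs shown in the examples further down; the `anyhow` error type and the hard-coded box are assumptions for illustration (in real use, the box would come from a model's output):

```rust
use usls::{Annotator, DataLoader, Hbb};

fn main() -> anyhow::Result<()> {
    // Read one image with the built-in data utilities
    let image = DataLoader::try_read_one("./assets/bus.jpg")?;

    // Hard-coded for illustration; normally produced by a model
    let hbb = Hbb::default()
        .with_xyxy(669.5, 395.4, 809.0, 878.8)
        .with_name("person")
        .with_confidence(0.87);

    // Draw the detection onto the image
    let _annotated = Annotator::default().annotate(&image, &hbb)?;
    Ok(())
}
```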
## 🧩 Supported Models
- YOLO Models: YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, YOLO11, YOLOv12
- SAM Models: SAM, SAM2, MobileSAM, EdgeSAM, SAM-HQ, FastSAM
- Vision Models: RT-DETR, RTMO, Depth-Anything, DINOv2, MODNet, Sapiens, DepthPro, FastViT, BEiT, MobileOne
- Vision-Language Models: CLIP, jina-clip-v1, BLIP, GroundingDINO, YOLO-World, Florence2, Moondream2
- OCR-Related Models: FAST, DB(PaddleOCR-Det), SVTR(PaddleOCR-Rec), SLANet, TrOCR, DocLayout-YOLO
**Full list of supported models:**
| Model | Task / Description | Example | CoreML | CUDA FP32 | CUDA FP16 | TensorRT FP32 | TensorRT FP16 |
|---|---|---|---|---|---|---|---|
| BEiT | Image Classification | demo | ✅ | ✅ | ✅ | ||
| ConvNeXt | Image Classification | demo | ✅ | ✅ | ✅ | ||
| FastViT | Image Classification | demo | ✅ | ✅ | ✅ | ||
| MobileOne | Image Classification | demo | ✅ | ✅ | ✅ | ||
| DeiT | Image Classification | demo | ✅ | ✅ | ✅ | ||
| DINOv2 | Vision Embedding | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv5 | Image Classification<br />Object Detection<br />Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv6 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv7 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv8<br />YOLO11 | Object Detection<br />Instance Segmentation<br />Image Classification<br />Oriented Object Detection<br />Keypoint Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv9 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv10 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv12 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| RT-DETR | Object Detection | demo | ✅ | ✅ | ✅ | ||
| RF-DETR | Object Detection | demo | ✅ | ✅ | ✅ | ||
| PP-PicoDet | Object Detection | demo | ✅ | ✅ | ✅ | ||
| DocLayout-YOLO | Object Detection | demo | ✅ | ✅ | ✅ | ||
| D-FINE | Object Detection | demo | ✅ | ✅ | ✅ | ||
| DEIM | Object Detection | demo | ✅ | ✅ | ✅ | ||
| RTMO | Keypoint Detection | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| SAM | Segment Anything | demo | ✅ | ✅ | ✅ | ||
| SAM2 | Segment Anything | demo | ✅ | ✅ | ✅ | ||
| MobileSAM | Segment Anything | demo | ✅ | ✅ | ✅ | ||
| EdgeSAM | Segment Anything | demo | ✅ | ✅ | ✅ | ||
| SAM-HQ | Segment Anything | demo | ✅ | ✅ | ✅ | ||
| FastSAM | Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLO-World | Open-Set Detection With Language | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| GroundingDINO | Open-Set Detection With Language | demo | ✅ | ✅ | ✅ | ||
| CLIP | Vision-Language Embedding | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| jina-clip-v1 | Vision-Language Embedding | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| BLIP | Image Captioning | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| DB(PaddleOCR-Det) | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| FAST | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| LinkNet | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| SVTR(PaddleOCR-Rec) | Text Recognition | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| SLANet | Table Recognition | demo | ✅ | ✅ | ✅ | | |
| TrOCR | Text Recognition | demo | ✅ | ✅ | ✅ | ||
| YOLOPv2 | Panoptic Driving Perception | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| DepthAnything v1<br />DepthAnything v2 | Monocular Depth Estimation | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| DepthPro | Monocular Depth Estimation | demo | ✅ | ✅ | ✅ | ||
| MODNet | Image Matting | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sapiens | Foundation for Human Vision Models | demo | ✅ | ✅ | ✅ | ||
| Florence2 | A Variety of Vision Tasks | demo | ✅ | ✅ | ✅ | | |
| Moondream2 | Open-Set Object Detection<br />Open-Set Keypoint Detection<br />Image Captioning<br />Visual Question Answering | demo | ✅ | ✅ | ✅ | | |
| OWLv2 | Open-Set Object Detection | demo | ✅ | ✅ | ✅ | ||
| SmolVLM(256M, 500M) | Visual Question Answering | demo | ✅ | ✅ | ✅ | ||
| RMBG(1.4, 2.0) | Image Segmentation | demo | ✅ | ✅ | ✅ | | |
## 🛠️ Installation
To get started, you'll need:
1. Protocol Buffers Compiler (`protoc`)

   Required for building the project. See the official installation guide.

   ```bash
   # Linux (apt)
   sudo apt install -y protobuf-compiler

   # macOS (Homebrew)
   brew install protobuf

   # Windows (Winget)
   winget install protobuf

   # Verify installation
   protoc --version  # Should be 3.x or higher
   ```
2. Rust Toolchain

   ```bash
   # Install Rust and Cargo
   curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
   ```
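   After installation, open a new shell and confirm the toolchain is available (these are standard rustup-installed tools, nothing usls-specific):

   ```bash
   # Verify the Rust toolchain
   rustc --version
   cargo --version
   ```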
3. Add usls to Your Project

   Add the following to your `Cargo.toml`:

   ```toml
   [dependencies]
   # Recommended: use the GitHub version
   usls = { git = "https://github.com/jamjamjon/usls" }

   # Alternative: use the crates.io version
   usls = "latest-version"
   ```

   > Note: The GitHub version is recommended as it contains the latest updates.
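   If you use the git dependency and want reproducible builds, Cargo can pin it to a branch, tag, or commit. This is standard Cargo syntax rather than a usls feature, and the placeholder below is yours to fill in:

   ```toml
   # Pin the git dependency for reproducible builds (standard Cargo options)
   usls = { git = "https://github.com/jamjamjon/usls", rev = "<commit-sha>" }
   ```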
## ⚡ Cargo Features
- ONNXRuntime-related features (enabled by default) provide model inference and model zoo support:
  - `ort-download-binaries` (default): Automatically downloads prebuilt `ONNXRuntime` binaries for supported platforms. Provides core model loading and inference using the `CPU` execution provider.
  - `ort-load-dynamic`: Dynamic linking. You'll need to compile `ONNXRuntime` from source or download a precompiled package, then link it manually. See the guide here.
  - `cuda`: Enables the NVIDIA `CUDA` provider. Requires the `CUDA` toolkit and `cuDNN`.
  - `trt`: Enables the NVIDIA `TensorRT` provider. Requires `TensorRT` libraries.
  - `mps`: Enables the Apple `CoreML` provider for macOS.
- If you only need basic utilities (image/video reading, result visualization, etc.), disable the default features to minimize dependencies:

  ```toml
  usls = { git = "https://github.com/jamjamjon/usls", default-features = false }
  ```
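For example, to target an NVIDIA GPU you would keep the default features and add `cuda` (a sketch based on the feature list above; it assumes the CUDA toolkit and cuDNN are installed):

```toml
[dependencies]
usls = { git = "https://github.com/jamjamjon/usls", features = ["cuda"] }
```

This matches the `-F cuda` flag used in the example commands below.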
## ✨ Example
- Model Inference

  ```bash
  cargo run -r --example yolo                            # CPU
  cargo run -r -F cuda --example yolo -- --device cuda:0 # GPU
  ```
- Reading Images

  ```rust
  // Read a single image
  let image = DataLoader::try_read_one("./assets/bus.jpg")?;

  // Read multiple images
  let images = DataLoader::try_read_n(&["./assets/bus.jpg", "./assets/cat.png"])?;

  // Read all images in a folder
  let images = DataLoader::try_read_folder("./assets")?;

  // Read images matching a pattern (glob)
  let images = DataLoader::try_read_pattern("./assets/*.Jpg")?;

  // Load images and iterate in batches
  let dl = DataLoader::new("./assets")?.with_batch(2).build()?;
  for images in dl.iter() {
      // Code here
  }
  ```
- Reading Video

  ```rust
  let dl = DataLoader::new("http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4")?
      .with_batch(1)
      .with_nf_skip(2)
      .with_progress_bar(true)
      .build()?;

  for images in dl.iter() {
      // Code here
  }
  ```
- Annotate

  ```rust
  let annotator = Annotator::default();
  let image = DataLoader::try_read_one("./assets/bus.jpg")?;

  // Horizontal bounding box (Hbb)
  let hbb = Hbb::default()
      .with_xyxy(669.5233, 395.4491, 809.0367, 878.81226)
      .with_id(0)
      .with_name("person")
      .with_confidence(0.87094545);
  let _ = annotator.annotate(&image, &hbb)?;

  // Keypoints
  let keypoints: Vec<Keypoint> = vec![
      Keypoint::default()
          .with_xy(139.35767, 443.43655)
          .with_id(0)
          .with_name("nose")
          .with_confidence(0.9739332),
      Keypoint::default()
          .with_xy(147.38545, 434.34055)
          .with_id(1)
          .with_name("left_eye")
          .with_confidence(0.9098319),
      Keypoint::default()
          .with_xy(128.5701, 434.07516)
          .with_id(2)
          .with_name("right_eye")
          .with_confidence(0.9320564),
  ];
  let _ = annotator.annotate(&image, &keypoints)?;
  ```
- Visualizing Inference Results and Exporting Video

  ```rust
  let dl = DataLoader::new(args.source.as_str())?.build()?;
  let mut viewer = Viewer::default().with_window_scale(0.5);

  for images in &dl {
      // Check if the window exists and is open
      if viewer.is_window_exist() && !viewer.is_window_open() {
          break;
      }

      // Show image in window
      viewer.imshow(&images[0])?;

      // Handle key events and delay
      if let Some(key) = viewer.wait_key(1) {
          if key == usls::Key::Escape {
              break;
          }
      }

      // Your custom code here

      // Write video frame (requires video feature)
      // if args.save_video {
      //     viewer.write_video_frame(&images[0])?;
      // }
  }
  ```
All examples are located in the examples directory.
## ❓ FAQ
See issues or open a new discussion.
## 🤝 Contributing
Contributions are welcome! If you have suggestions, bug reports, or want to add new features or models, feel free to open an issue or submit a pull request.
## 📜 License
This project is licensed under the terms of the LICENSE file.