Add YOLOv8-OBB and some bug fixes (#9)

* Add YOLOv8-Obb & Refactor outputs

* Update README.md
Jamjamjon
2024-04-21 17:06:58 +08:00
committed by GitHub
parent 91049fc18a
commit beda8ef803
109 changed files with 2542 additions and 1940 deletions


@ -40,3 +40,4 @@ indicatif = "0.17.8"
image = "0.25.1"
imageproc = { version = "0.24" }
ab_glyph = "0.2.23"
geo = "0.28.0"


@ -1,42 +1,65 @@
# usls
A Rust library integrated with **ONNXRuntime**, providing a collection of **Computer Vision** and **Vision-Language** models including [YOLOv8](https://github.com/ultralytics/ultralytics), [YOLOv9](https://github.com/WongKinYiu/yolov9), [RTDETR](https://arxiv.org/abs/2304.08069), [CLIP](https://github.com/openai/CLIP), [DINOv2](https://github.com/facebookresearch/dinov2), [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM), [YOLO-World](https://github.com/AILab-CVC/YOLO-World), [BLIP](https://arxiv.org/abs/2201.12086), [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) and others.
A Rust library integrated with **ONNXRuntime**, providing a collection of **Computer Vision** and **Vision-Language** models including [YOLOv5](https://github.com/ultralytics/yolov5), [YOLOv8](https://github.com/ultralytics/ultralytics), [YOLOv9](https://github.com/WongKinYiu/yolov9), [RTDETR](https://arxiv.org/abs/2304.08069), [CLIP](https://github.com/openai/CLIP), [DINOv2](https://github.com/facebookresearch/dinov2), [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM), [YOLO-World](https://github.com/AILab-CVC/YOLO-World), [BLIP](https://arxiv.org/abs/2201.12086), [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) and others.
## Recently Updated
| YOLOP-v2 | Face-Parsing | Text-Detection |
| :----------------------------: | :------------------------------: | :------------------------------: |
|<img src='examples/yolop/demo.png' height="240px">| <img src='examples/face-parsing/demo.png' height="240px"> | <img src='examples/db/demo.png' height="240px"> |
| YOLOv8-Obb |
| :----------------------------: |
|<img src='examples/yolov8/demo-obb-2.png' width="800px">|
## Supported Models
| Model | Task / Type | Example | CUDA<br />f32 | CUDA<br />f16 | TensorRT<br />f32 | TensorRT<br />f16 |
| :---------------------------------------------------------------: | :------------------------------------------------------------------------: | :----------------------: | :-----------: | :-----------: | :------------------------: | :-----------------------: |
| [YOLOv8-detection](https://github.com/ultralytics/ultralytics) | Object Detection | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv8-pose](https://github.com/ultralytics/ultralytics) | Keypoint Detection | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv8-classification](https://github.com/ultralytics/ultralytics) | Classification | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv8-segmentation](https://github.com/ultralytics/ultralytics) | Instance Segmentation | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv9](https://github.com/WongKinYiu/yolov9) | Object Detection | [demo](examples/yolov9) | ✅ | ✅ | ✅ | ✅ |
| [RT-DETR](https://arxiv.org/abs/2304.08069) | Object Detection | [demo](examples/rtdetr) | ✅ | ✅ | ✅ | ✅ |
| [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM) | Instance Segmentation | [demo](examples/fastsam) | ✅ | ✅ | ✅ | ✅ |
| [YOLO-World](https://github.com/AILab-CVC/YOLO-World) | Object Detection | [demo](examples/yolo-world) | ✅ | ✅ | ✅ | ✅ |
| [DINOv2](https://github.com/facebookresearch/dinov2) | Vision-Self-Supervised | [demo](examples/dinov2) | ✅ | ✅ | ✅ | ✅ |
| [CLIP](https://github.com/openai/CLIP) | Vision-Language | [demo](examples/clip) | | ✅ | ✅ visual<br />❌ textual | ✅ visual<br />❌ textual |
| [BLIP](https://github.com/salesforce/BLIP) | Vision-Language | [demo](examples/blip) | ✅ | ✅ | ✅ visual<br />❌ textual | ✅ visual<br />❌ textual |
| [DB](https://arxiv.org/abs/1911.08947) | Text Detection | [demo](examples/db) | ✅ | ✅ | ✅ | ✅ |
| [SVTR](https://arxiv.org/abs/2205.00159) | Text Recognition | [demo](examples/svtr) | ✅ | ✅ | ✅ | ✅ |
| [RTMO](https://github.com/open-mmlab/mmpose/tree/main/projects/rtmo) | Keypoint Detection | [demo](examples/rtmo) | ✅ | ✅ | | |
| [YOLOPv2](https://arxiv.org/abs/2208.11434) | Panoptic driving Perception | [demo](examples/yolop) | ✅ | ✅ | | |
| Model | Task / Type | Example | CUDA<br />f32 | CUDA<br />f16 | TensorRT<br />f32 | TensorRT<br />f16 |
| :---------------------------------------------------------------: | :-------------------------: | :----------------------: | :-----------: | :-----------: | :------------------------: | :-----------------------: |
| [YOLOv8-obb](https://github.com/ultralytics/ultralytics) | Oriented Object Detection | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv8-detection](https://github.com/ultralytics/ultralytics) | Object Detection | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv8-pose](https://github.com/ultralytics/ultralytics) | Keypoint Detection | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv8-classification](https://github.com/ultralytics/ultralytics) | Classification | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv8-segmentation](https://github.com/ultralytics/ultralytics) | Instance Segmentation | [demo](examples/yolov8) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv9](https://github.com/WongKinYiu/yolov9) | Object Detection | [demo](examples/yolov9) | ✅ | ✅ | ✅ | ✅ |
| [RT-DETR](https://arxiv.org/abs/2304.08069) | Object Detection | [demo](examples/rtdetr) | ✅ | ✅ | ✅ | ✅ |
| [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM) | Instance Segmentation | [demo](examples/fastsam) | ✅ | ✅ | ✅ | ✅ |
| [YOLO-World](https://github.com/AILab-CVC/YOLO-World) | Object Detection | [demo](examples/yolo-world) | ✅ | ✅ | ✅ | ✅ |
| [DINOv2](https://github.com/facebookresearch/dinov2) | Vision-Self-Supervised | [demo](examples/dinov2) | | | | ✅ |
| [CLIP](https://github.com/openai/CLIP) | Vision-Language | [demo](examples/clip) | ✅ | ✅ | ✅ visual<br />❌ textual | ✅ visual<br />❌ textual |
| [BLIP](https://github.com/salesforce/BLIP) | Vision-Language | [demo](examples/blip) | ✅ | ✅ | ✅ visual<br />❌ textual | ✅ visual<br />❌ textual |
| [DB](https://arxiv.org/abs/1911.08947) | Text Detection | [demo](examples/db) | ✅ | ✅ | ✅ | ✅ |
| [SVTR](https://arxiv.org/abs/2205.00159) | Text Recognition | [demo](examples/svtr) | ✅ | ✅ | | |
| [RTMO](https://github.com/open-mmlab/mmpose/tree/main/projects/rtmo) | Keypoint Detection | [demo](examples/rtmo) | ✅ | ✅ | | |
| [YOLOPv2](https://arxiv.org/abs/2208.11434) | Panoptic driving Perception | [demo](examples/yolop) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv5-classification](https://github.com/ultralytics/yolov5) | Object Detection | [demo](examples/yolov5) | ✅ | ✅ | ✅ | ✅ |
| [YOLOv5-segmentation](https://github.com/ultralytics/yolov5) | Instance Segmentation | [demo](examples/yolov5) | ✅ | ✅ | ✅ | ✅ |
## Solution Models
Additionally, this repo also provides some solution models.
| Model | Example | Result |
| :------------------------------------------------------------: | :------------------------------: | :------------------------------: |
| Lane Line Segmentation<br /> Drivable Area Segmentation<br />Car Detection<br />车道线-可行驶区域-车辆检测 | [demo](examples/yolov8-plastic-bag) |<img src='examples/yolop/demo.png' width="220px" height="140px">|
| Face Parsing<br /> 人脸解析 | [demo](examples/face-parsing) |<img src='examples/face-parsing/demo.png' width="220px" height="200px"> |
| Text Detection<br />(PPOCR-det v3, v4)<br />通用文本检测 | [demo](examples/db) |<img src='examples/db/demo.jpg' width="250px" height="200px">|
| Text Recognition<br />(PPOCR-rec v3, v4)<br />中英文-文本识别 | [demo](examples/svtr) ||
| Face-Landmark Detection<br />人脸 & 关键点检测 | [demo](examples/yolov8-face) |<img src='examples/yolov8-face/demo.jpg' width="220px" height="180px">|
| Head Detection<br /> 人头检测 | [demo](examples/yolov8-head) |<img src='examples/yolov8-head/demo.jpg' width="220px" height="180px">|
| Fall Detection<br /> 摔倒检测 | [demo](examples/yolov8-falldown) | <img src='examples/yolov8-falldown/demo.jpg' width="220px" height="180px">|
| Trash Detection<br /> 垃圾检测 | [demo](examples/yolov8-plastic-bag) |<img src='examples/yolov8-trash/demo.jpg' width="250px" height="180px">|
<details close>
<summary>Additionally, this repo also provides some solution models.</summary>
| Model | Example | Result |
| :---------------------------------------------------------------------------------------------------------: | :------------------------------: | :-----------------------------------------------------------------------------: |
| Lane Line Segmentation<br /> Drivable Area Segmentation<br />Car Detection<br />车道线-可行驶区域-车辆检测 | [demo](examples/yolov8-plastic-bag) | <img src='examples/yolop/demo.png' width="220px" height="140px"> |
| Face Parsing<br /> 人脸解析 | [demo](examples/face-parsing) | <img src='examples/face-parsing/demo.png' width="220px" height="200px"> |
| Text Detection<br />(PPOCR-det v3, v4)<br />通用文本检测 | [demo](examples/db) | <img src='examples/db/demo.png' width="250px" height="200px"> |
| Text Recognition<br />(PPOCR-rec v3, v4)<br />中英文-文本识别 | [demo](examples/svtr) | |
| Face-Landmark Detection<br />人脸 & 关键点检测 | [demo](examples/yolov8-face) | <img src='examples/yolov8-face/demo.png' width="220px" height="180px"> |
| Head Detection<br /> 人头检测 | [demo](examples/yolov8-head) | <img src='examples/yolov8-head/demo.png' width="220px" height="180px"> |
| Fall Detection<br /> 摔倒检测 | [demo](examples/yolov8-falldown) | <img src='examples/yolov8-falldown/demo.png' width="220px" height="180px"> |
| Trash Detection<br /> 垃圾检测 | [demo](examples/yolov8-plastic-bag) | <img src='examples/yolov8-trash/demo.png' width="250px" height="180px"> |
</details>
## Demo
@ -59,8 +82,9 @@ check **[ort guide](https://ort.pyke.io/setup/linking)**
</details>
## Integrate into your own project
<details close>
<summary>Check Here</summary>
#### 1. Add `usls` as a dependency to your project's `Cargo.toml`
@ -126,3 +150,4 @@ let y = model.run(&x)?;
let annotator = Annotator::default().with_saveout("YOLOv8");
annotator.annotate(&x, &y);
```
</details>

BIN
assets/2.jpg Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 176 KiB

BIN
assets/dota.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 680 KiB


@ -17,10 +17,12 @@ cargo run -r --example blip
```shell
[Unconditional image captioning]: a group of people walking around a bus
[Conditional image captioning]: three man walking in front of a bus
Some(["three man walking in front of a bus"])
```
## TODO
* [ ] Multi-batch inference for image caption
* [ ] VQA
* [ ] Retrieval
* [ ] TensorRT support for textual model


@ -1,4 +1,4 @@
use usls::{models::Blip, Options};
use usls::{models::Blip, DataLoader, Options};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// visual
@ -22,9 +22,11 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
// build model
let mut model = Blip::new(options_visual, options_textual)?;
// image caption
model.caption("./assets/bus.jpg", None)?; // unconditional
model.caption("./assets/bus.jpg", Some("three man"))?; // conditional
// image caption (this demo uses batch_size=1)
let x = vec![DataLoader::try_read("./assets/bus.jpg")?];
let _y = model.caption(&x, None, true)?; // unconditional
let y = model.caption(&x, Some("three man"), true)?; // conditional
println!("{:?}", y[0].texts());
Ok(())
}


@ -1,4 +1,4 @@
use usls::{models::Clip, ops, DataLoader, Options};
use usls::{models::Clip, DataLoader, Options};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// visual
@ -39,7 +39,7 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let feats_image = model.encode_images(&images).unwrap();
// use image to query texts
let matrix = ops::dot2(&feats_image, &feats_text)?; // [m, n]
let matrix = feats_image.dot2(&feats_text)?;
// summary
for i in 0..paths.len() {
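The `dot2` call above produces an `[m, n]` score matrix between `m` image embeddings and `n` text embeddings, as the removed line's comment notes. A minimal sketch of that idea follows, assuming plain `Vec<Vec<f32>>` embeddings; the free functions `dot2` and `argmax` here are hypothetical stand-ins for illustration, not the `usls` implementation.

```rust
/// Pairwise dot products between two sets of embeddings.
/// `a` holds m vectors and `b` holds n vectors of the same length;
/// entry [i][j] of the result is a[i] . b[j].
/// Hypothetical stand-in for the `dot2` call in the diff.
fn dot2(a: &[Vec<f32>], b: &[Vec<f32>]) -> Result<Vec<Vec<f32>>, String> {
    let mut out = Vec::with_capacity(a.len());
    for va in a {
        let mut row = Vec::with_capacity(b.len());
        for vb in b {
            if va.len() != vb.len() {
                return Err(format!("dimension mismatch: {} vs {}", va.len(), vb.len()));
            }
            row.push(va.iter().zip(vb.iter()).map(|(x, y)| x * y).sum::<f32>());
        }
        out.push(row);
    }
    Ok(out)
}

/// Index of the largest score in a row, or `None` for an empty row.
fn argmax(row: &[f32]) -> Option<usize> {
    row.iter()
        .enumerate()
        .max_by(|(_, x), (_, y)| x.total_cmp(y))
        .map(|(i, _)| i)
}
```

Using an image to query texts then amounts to taking `argmax` over row `i` of the matrix to find the best-matching text for image `i`.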


@ -20,4 +20,4 @@ cargo run -r --example db
## Results
![](./demo.jpg)
![](./demo.png)

Binary file not shown.

Before

Width:  |  Height:  |  Size: 165 KiB

BIN
examples/db/demo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 35 KiB


@ -15,18 +15,21 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut model = DB::new(&options)?;
// load image
let x = vec![DataLoader::try_read("./assets/db.png")?];
let x = vec![
DataLoader::try_read("./assets/db.png")?,
// DataLoader::try_read("./assets/2.jpg")?,
];
// run
let y = model.run(&x)?;
// annotate
let annotator = Annotator::default()
.without_name(true)
.without_polygons(false)
.with_mask_alpha(0)
.without_bboxes(false)
.with_saveout("DB-Text-Detection");
.without_bboxes(true)
.with_masks_alpha(60)
.with_polygon_color([255, 105, 180, 255])
.without_mbrs(true)
.with_saveout("DB");
annotator.annotate(&x, &y);
Ok(())

Binary file not shown.

Before

Width:  |  Height:  |  Size: 448 KiB

After

Width:  |  Height:  |  Size: 105 KiB


@ -9,7 +9,6 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
.with_i03((416, 640, 800).into())
// .with_trt(0)
// .with_fp16(true)
// .with_dry_run(10)
.with_confs(&[0.5]);
let mut model = YOLO::new(&options)?;
@ -21,10 +20,10 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
// annotate
let annotator = Annotator::default()
.without_conf(true)
.without_name(true)
.without_polygons(false)
.without_bboxes(true)
.without_bboxes_conf(true)
.without_bboxes_name(true)
.without_polygons(false)
.with_masks_name(false)
.with_saveout("Face-Parsing");
annotator.annotate(&x, &y);


@ -20,4 +20,4 @@ cargo run -r --example fastsam
## Results
![](./demo.jpg)
![](./demo.png)

Binary file not shown.

Before

Width:  |  Height:  |  Size: 302 KiB

BIN
examples/fastsam/demo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 321 KiB


@ -18,4 +18,4 @@ cargo run -r --example rtdetr
## Results
![](./demo.jpg)
![](./demo.png)

Binary file not shown.

Before

Width:  |  Height:  |  Size: 258 KiB

BIN
examples/rtdetr/demo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 439 KiB


@ -1,11 +1,11 @@
use usls::{models::RTDETR, Annotator, DataLoader, Options, COCO_NAMES_80};
use usls::{coco, models::RTDETR, Annotator, DataLoader, Options};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// build model
let options = Options::default()
.with_model("../models/rtdetr-l-f16.onnx")
.with_confs(&[0.4, 0.15]) // person: 0.4, others: 0.15
.with_names(&COCO_NAMES_80);
.with_names(&coco::NAMES_80);
let mut model = RTDETR::new(&options)?;
// load image
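The `with_confs(&[0.4, 0.15])` line carries the comment "person: 0.4, others: 0.15", which suggests per-class thresholds where the last value covers all remaining classes. The sketch below illustrates that reading only; the fallback rule and the helper names `conf_for` and `keep` are assumptions, not confirmed `usls` behaviour.

```rust
/// Confidence threshold for a class id, reusing the last entry for ids past the end.
/// The fallback rule is an assumption inferred from the comment in the diff.
fn conf_for(confs: &[f32], class_id: usize) -> Option<f32> {
    confs.get(class_id).or(confs.last()).copied()
}

/// Keep only the (class_id, score) pairs that reach their class threshold.
fn keep(dets: &[(usize, f32)], confs: &[f32]) -> Vec<(usize, f32)> {
    dets.iter()
        .copied()
        .filter(|&(id, score)| conf_for(confs, id).map_or(false, |t| score >= t))
        .collect()
}
```

With `confs = [0.4, 0.15]`, class 0 is filtered at 0.4 and every other class at 0.15; an empty threshold list keeps nothing in this sketch.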


@ -15,4 +15,4 @@ cargo run -r --example rtmo
## Results
![](./demo.jpg)
![](./demo.png)

Binary file not shown.

Before

Width:  |  Height:  |  Size: 242 KiB

BIN
examples/rtmo/demo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 455 KiB


@ -1,10 +1,10 @@
use usls::{models::RTMO, Annotator, DataLoader, Options, COCO_SKELETON_17};
use usls::{coco, models::RTMO, Annotator, DataLoader, Options};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// build model
let options = Options::default()
.with_model("../rtmo-l-dyn-f16.onnx")
.with_i00((1, 2, 8).into())
.with_model("../rtmo-s-dyn.onnx")
.with_i00((1, 1, 8).into())
.with_nk(17)
.with_confs(&[0.3])
.with_kconfs(&[0.5]);
@ -19,7 +19,7 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
// // annotate
let annotator = Annotator::default()
.with_saveout("RTMO")
.with_skeletons(&COCO_SKELETON_17);
.with_skeletons(&coco::SKELETONS_16);
annotator.annotate(&x, &y);
Ok(())


@ -24,9 +24,13 @@ cargo run -r --example svtr
## Results
```shell
[Texts] from the background, but also separate text instances which
[Texts] are closely jointed. Some examples are illustrated in Fig.7.
[Texts] 你有这么高速运转的机械进入中国,记住我给出的原理
[Texts] 110022345
[Texts] 冀B6G000
```
["./examples/svtr/images/5.png"]: Some(["are closely jointed. Some examples are illustrated in Fig.7."])
["./examples/svtr/images/6.png"]: Some(["小菊儿胡同71号"])
["./examples/svtr/images/4.png"]: Some(["我在南锣鼓捣猫呢"])
["./examples/svtr/images/1.png"]: Some(["你有这么高速运转的机械进入中国,记住我给出的原理"])
["./examples/svtr/images/2.png"]: Some(["冀B6G000"])
["./examples/svtr/images/9.png"]: Some(["from the background, but also separate text instances which"])
["./examples/svtr/images/8.png"]: Some(["110022345"])
["./examples/svtr/images/3.png"]: Some(["粤A·68688"])
["./examples/svtr/images/7.png"]: Some(["Please lower your volume"])
```


Before

Width:  |  Height:  |  Size: 14 KiB

After

Width:  |  Height:  |  Size: 14 KiB

BIN
examples/svtr/images/2.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 13 KiB

BIN
examples/svtr/images/3.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 59 KiB

BIN
examples/svtr/images/4.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 15 KiB


Before

Width:  |  Height:  |  Size: 17 KiB

After

Width:  |  Height:  |  Size: 17 KiB

BIN
examples/svtr/images/6.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 10 KiB

BIN
examples/svtr/images/7.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 13 KiB


Before

Width:  |  Height:  |  Size: 24 KiB

After

Width:  |  Height:  |  Size: 24 KiB


Before

Width:  |  Height:  |  Size: 9.0 KiB

After

Width:  |  Height:  |  Size: 9.0 KiB


@ -5,23 +5,20 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let options = Options::default()
.with_i00((1, 2, 8).into())
.with_i03((320, 960, 1600).into())
.with_confs(&[0.4])
.with_confs(&[0.2])
.with_vocab("../ppocr_rec_vocab.txt")
.with_model("../models/ppocr-v4-svtr-ch-dyn.onnx");
let mut model = SVTR::new(&options)?;
// load image
let xs = vec![
DataLoader::try_read("./examples/svtr/text1.png")?,
DataLoader::try_read("./examples/svtr/text2.png")?,
DataLoader::try_read("./examples/svtr/text3.png")?,
DataLoader::try_read("./examples/svtr/text4.png")?,
DataLoader::try_read("./examples/svtr/text5.png")?,
];
// load images
let dl = DataLoader::default()
.with_batch(1)
.load("./examples/svtr/images")?;
// run
for text in model.run(&xs)?.into_iter() {
println!("[Texts] {text}")
for (xs, paths) in dl {
let ys = model.run(&xs)?;
println!("{paths:?}: {:?}", ys[0].texts())
}
Ok(())

Binary file not shown.

Before

Width:  |  Height:  |  Size: 14 KiB


@ -40,4 +40,4 @@ cargo run -r --example yolo-world
## Results
![](./demo.jpg)
![](./demo.png)

Binary file not shown.

Before

Width:  |  Height:  |  Size: 216 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 453 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 922 KiB

After

Width:  |  Height:  |  Size: 296 KiB


@ -5,8 +5,6 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let options = Options::default()
.with_model("../models/yolopv2-dyn-480x800.onnx")
.with_i00((1, 1, 8).into())
// .with_trt(0)
// .with_fp16(true)
.with_confs(&[0.3]);
let mut model = YOLOPv2::new(&options)?;
@ -18,7 +16,7 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
// annotate
let annotator = Annotator::default()
.with_masks_name(false)
.with_masks_name(true)
.with_saveout("YOLOPv2");
annotator.annotate(&x, &y);

BIN
examples/yolov5/demo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 395 KiB

32
examples/yolov5/main.rs Normal file

@ -0,0 +1,32 @@
use usls::{
models::{YOLOTask, YOLO},
Annotator, DataLoader, Options,
};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// build model
let options = Options::default()
.with_conf_independent(true)
.with_anchors_first(true)
.with_yolo_task(YOLOTask::Segment)
.with_model("../models/yolov5s-seg.onnx")
.with_trt(0)
.with_fp16(true)
.with_i00((1, 1, 4).into())
.with_i02((224, 640, 800).into())
.with_i03((224, 640, 800).into())
.with_dry_run(3);
let mut model = YOLO::new(&options)?;
// load image
let x = vec![DataLoader::try_read("./assets/bus.jpg")?];
// run
let y = model.run(&x)?;
// annotate
let annotator = Annotator::default().with_saveout("YOLOv5");
annotator.annotate(&x, &y);
Ok(())
}
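Calls such as `.with_i00((1, 1, 4).into())` and `.with_i02((224, 640, 800).into())` convert a three-element tuple into a range for one dynamic input axis. Reading the tuple as (min, opt, max) is an assumption here, and the `AxisRange` type below is a hypothetical illustration of how such a conversion can be written with `From`, not the crate's own type.

```rust
/// Hypothetical (min, opt, max) range for one dynamic input axis.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AxisRange {
    min: u32,
    opt: u32,
    max: u32,
}

/// Implementing `From` for the tuple is what makes `(1, 1, 4).into()` work.
impl From<(u32, u32, u32)> for AxisRange {
    fn from((min, opt, max): (u32, u32, u32)) -> Self {
        Self { min, opt, max }
    }
}

impl AxisRange {
    /// A range is usable only if min <= opt <= max.
    fn is_valid(&self) -> bool {
        self.min <= self.opt && self.opt <= self.max
    }

    /// Whether a concrete size falls inside the allowed range.
    fn contains(&self, x: u32) -> bool {
        (self.min..=self.max).contains(&x)
    }
}
```

Under this reading, `(1, 1, 4)` would allow batch sizes 1 through 4, and `(224, 640, 800)` would allow heights or widths between 224 and 800.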


@ -10,4 +10,4 @@ cargo run -r --example yolov8-face
## Results
![](./demo.jpg)
![](./demo.png)

Binary file not shown.

Before

Width:  |  Height:  |  Size: 129 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 285 KiB


@ -11,4 +11,4 @@ cargo run -r --example yolov8-falldown
## Results
![](./demo.jpg)
![](./demo.png)

Binary file not shown.

Before

Width:  |  Height:  |  Size: 37 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 57 KiB


@ -2,9 +2,7 @@ use usls::{models::YOLO, Annotator, DataLoader, Options};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// build model
let options = Options::default()
.with_model("../models/yolov8-falldown-f16.onnx")
.with_confs(&[0.3]);
let options = Options::default().with_model("../models/yolov8-falldown-f16.onnx");
let mut model = YOLO::new(&options)?;
// load image


@ -11,4 +11,4 @@ cargo run -r --example yolov8-head
## Results
![](./demo.jpg)
![](./demo.png)

Binary file not shown.

Before

Width:  |  Height:  |  Size: 134 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 291 KiB


@ -2,9 +2,7 @@ use usls::{models::YOLO, Annotator, DataLoader, Options};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// build model
let options = Options::default()
.with_model("../models/yolov8-head-f16.onnx")
.with_confs(&[0.3]);
let options = Options::default().with_model("../models/yolov8-head-f16.onnx");
let mut model = YOLO::new(&options)?;
// load image


@ -13,4 +13,4 @@ cargo run -r --example yolov8-trash
## Results
![](./demo.jpg)
![](./demo.png)

Binary file not shown.

Before

Width:  |  Height:  |  Size: 214 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 367 KiB


@ -4,7 +4,6 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
// 1.build model
let options = Options::default()
.with_model("../models/yolov8-plastic-bag-f16.onnx")
.with_confs(&[0.3])
.with_names(&["trash"]);
let mut model = YOLO::new(&options)?;


@ -14,19 +14,22 @@ yolo export model=yolov8m.pt format=onnx simplify dynamic
yolo export model=yolov8m-cls.pt format=onnx simplify dynamic
yolo export model=yolov8m-pose.pt format=onnx simplify dynamic
yolo export model=yolov8m-seg.pt format=onnx simplify dynamic
yolo export model=yolov8m-obb.pt format=onnx simplify dynamic
# export onnx model with fixed shapes
yolo export model=yolov8m.pt format=onnx simplify
yolo export model=yolov8m-cls.pt format=onnx simplify
yolo export model=yolov8m-pose.pt format=onnx simplify
yolo export model=yolov8m-seg.pt format=onnx simplify
yolo export model=yolov8m-obb.pt format=onnx simplify
```
## Result
| Task | Annotated image |
| :-------------------: | --------------------- |
| Obb | ![img](./demo-obb.png) |
| Instance Segmentation | ![img](./demo-seg.png) |
| Classification | ![img](./demo-cls.jpg) |
| Classification | ![img](./demo-cls.png) |
| Detection | ![img](./demo-det.png) |
| Pose | ![img](./demo-pose.png) |
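
For the Obb task, each detection is a rotated rectangle rather than an axis-aligned box. As a rough sketch of the geometry, the function below turns a centre, size and rotation into four corner points. The `(cx, cy, w, h, angle)` layout with the angle in radians is an assumption for illustration, and `obb_corners` is a hypothetical helper, not part of `usls`.

```rust
/// Four corners of a rotated rectangle given centre, size and rotation.
/// `angle` is in radians; the parameter layout is an assumption for this sketch.
fn obb_corners(cx: f32, cy: f32, w: f32, h: f32, angle: f32) -> [(f32, f32); 4] {
    let (sin, cos) = angle.sin_cos();
    let (hw, hh) = (w / 2.0, h / 2.0);
    // Corner offsets in the box's own (unrotated) frame.
    let local = [(-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)];
    // Rotate each offset by `angle`, then translate to the centre.
    local.map(|(dx, dy)| (cx + dx * cos - dy * sin, cy + dx * sin + dy * cos))
}
```

With `angle = 0.0` this reduces to the usual axis-aligned corners, which is a quick sanity check when wiring it into a drawing routine.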

Binary file not shown.

Before

Width:  |  Height:  |  Size: 221 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 453 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.8 MiB

After

Width:  |  Height:  |  Size: 451 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 546 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 552 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.8 MiB

After

Width:  |  Height:  |  Size: 457 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.6 MiB

After

Width:  |  Height:  |  Size: 387 KiB


@ -1,38 +1,70 @@
use usls::{
models::YOLO, Annotator, DataLoader, Options, COCO_KEYPOINT_NAMES_17, COCO_SKELETON_17,
};
use usls::{coco, models::YOLO, Annotator, DataLoader, Options};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// build model
let options = Options::default()
.with_model("../models/yolov8m-dyn-f16.onnx")
// .with_trt(0) // cuda by default
// .with_model("../models/yolov8m.onnx")
// .with_model("../models/yolov8m-dyn-f16.onnx")
// .with_model("../models/yolov8m-pose-dyn-f16.onnx")
// .with_model("../models/yolov8m-seg-dyn-f16.onnx")
.with_model("../models/yolov8s-cls.onnx")
// .with_model("../models/yolov8s-obb.onnx")
// .with_trt(0)
// .with_fp16(true)
.with_i00((1, 1, 4).into())
.with_i02((224, 640, 800).into())
.with_i03((224, 640, 800).into())
.with_i02((224, 1024, 1024).into())
.with_i03((224, 1024, 1024).into())
// .with_i02((224, 640, 800).into())
// .with_i03((224, 640, 800).into())
.with_confs(&[0.4, 0.15]) // person: 0.4, others: 0.15
.with_names2(&COCO_KEYPOINT_NAMES_17)
.with_profile(false)
.with_dry_run(3);
.with_names2(&coco::KEYPOINTS_NAMES_17)
.with_profile(true)
.with_dry_run(10);
let mut model = YOLO::new(&options)?;
// build dataloader
let dl = DataLoader::default()
.with_batch(1)
.load("./assets/bus.jpg")?;
// .load("./assets/dota.png")?;
// build annotator
let annotator = Annotator::default()
.with_skeletons(&COCO_SKELETON_17)
.without_conf(false)
.without_name(false)
.with_keypoints_name(false)
.with_keypoints_conf(false)
.with_masks_name(false)
.without_masks(false)
.without_polygons(false)
.without_bboxes(false)
// .with_probs_topk(10)
// // bboxes
// .without_bboxes(false)
// .without_bboxes_conf(false)
// .without_bboxes_name(false)
// .without_bboxes_text_bg(false)
// .with_bboxes_text_color([255, 255, 255, 255])
// .with_bboxes_text_bg_alpha(255)
// // keypoints
// .without_keypoints(false)
// .with_keypoints_palette(&COCO_KEYPOINT_COLORS_17)
.with_skeletons(&coco::SKELETONS_16)
// .with_keypoints_name(false)
// .with_keypoints_conf(false)
// .without_keypoints_text_bg(false)
// .with_keypoints_text_color([255, 255, 255, 255])
// .with_keypoints_text_bg_alpha(255)
// .with_keypoints_radius(4)
// // masks
// .without_masks(false)
// .with_masks_alpha(190)
// .without_polygons(false)
// // .with_polygon_color([0, 255, 255, 255])
// .with_masks_conf(false)
// .with_masks_name(true)
// .with_masks_text_bg(true)
// .with_masks_text_color([255, 255, 255, 255])
// .with_masks_text_bg_alpha(10)
// // mbrs
// .without_mbrs(false)
// .without_mbrs_conf(false)
// .without_mbrs_name(false)
// .without_mbrs_text_bg(false)
// .with_mbrs_text_color([255, 255, 255, 255])
// .with_mbrs_text_bg_alpha(70)
.with_saveout("YOLOv8");
// run & annotate


@ -26,4 +26,4 @@ cargo run -r --example yolov9
## Results
![](./demo.jpg)
![](./demo.png)

Binary file not shown.

Before

Width:  |  Height:  |  Size: 232 KiB

BIN
examples/yolov9/demo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 450 KiB


@ -7,8 +7,7 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
.with_i00((1, 1, 4).into())
.with_i02((416, 640, 800).into())
.with_i03((416, 640, 800).into())
.with_confs(&[0.4, 0.15]) // person: 0.4, others: 0.15
.with_profile(false);
.with_confs(&[0.4, 0.15]); // person: 0.4, others: 0.15
let mut model = YOLO::new(&options)?;
// load image


@ -1,64 +1,135 @@
use crate::{auto_load, string_now, Bbox, Embedding, Keypoint, Mask, Ys, CHECK_MARK, CROSS_MARK};
use crate::{auto_load, string_now, Bbox, Keypoint, Mask, Mbr, Prob, CHECK_MARK, CROSS_MARK, Y};
use ab_glyph::{FontVec, PxScale};
use anyhow::Result;
use image::{DynamicImage, Rgba, RgbaImage};
/// Annotator for struct `Y`
#[derive(Debug)]
pub struct Annotator {
font: ab_glyph::FontVec,
scale_: f32, // Cope with ab_glyph & imageproc=0.24.0
skeletons: Option<Vec<(usize, usize)>>,
font: FontVec,
_scale: f32, // Cope with ab_glyph & imageproc=0.24.0
scale_dy: f32,
saveout: Option<String>,
mask_alpha: u8,
polygon_color: Rgba<u8>,
without_conf: bool,
without_name: bool,
// About mbrs
without_mbrs: bool,
without_mbrs_conf: bool,
without_mbrs_name: bool,
without_mbrs_text_bg: bool,
mbrs_text_color: Rgba<u8>,
// About bboxes
without_bboxes: bool,
without_bboxes_conf: bool,
without_bboxes_name: bool,
without_bboxes_text_bg: bool,
bboxes_text_color: Rgba<u8>,
// About keypoints
without_keypoints: bool,
with_keypoints_conf: bool,
with_keypoints_name: bool,
with_masks_name: bool,
without_bboxes: bool,
without_keypoints_text_bg: bool,
keypoints_text_color: Rgba<u8>,
skeletons: Option<Vec<(usize, usize)>>,
keypoints_radius: usize,
keypoints_palette: Option<Vec<(u8, u8, u8, u8)>>,
// About masks
without_masks: bool,
without_polygons: bool,
without_keypoints: bool,
keypoint_radius: usize,
with_masks_conf: bool,
with_masks_name: bool,
with_masks_text_bg: bool,
masks_text_color: Rgba<u8>,
masks_alpha: u8,
polygon_color: Rgba<u8>,
// About probs
probs_topk: usize,
}
impl Default for Annotator {
fn default() -> Self {
Self {
font: Self::load_font(None).unwrap(),
scale_: 6.666667,
mask_alpha: 179,
polygon_color: Rgba([255, 255, 255, 255]),
skeletons: None,
_scale: 6.666667,
scale_dy: 28.,
masks_alpha: 179,
saveout: None,
without_conf: false,
without_name: false,
without_bboxes: false,
without_bboxes_conf: false,
without_bboxes_name: false,
bboxes_text_color: Rgba([0, 0, 0, 255]),
without_bboxes_text_bg: false,
without_mbrs: false,
without_mbrs_conf: false,
without_mbrs_name: false,
without_mbrs_text_bg: false,
mbrs_text_color: Rgba([0, 0, 0, 255]),
without_keypoints: false,
with_keypoints_conf: false,
with_keypoints_name: false,
with_masks_name: false,
without_bboxes: false,
keypoints_radius: 3,
skeletons: None,
keypoints_palette: None,
without_keypoints_text_bg: false,
keypoints_text_color: Rgba([0, 0, 0, 255]),
without_masks: false,
without_polygons: false,
without_keypoints: false,
keypoint_radius: 3,
polygon_color: Rgba([255, 255, 255, 255]),
with_masks_name: false,
with_masks_conf: false,
with_masks_text_bg: false,
masks_text_color: Rgba([255, 255, 255, 255]),
probs_topk: 5usize,
}
}
}
impl Annotator {
pub fn with_keypoint_radius(mut self, x: usize) -> Self {
self.keypoint_radius = x;
pub fn without_bboxes(mut self, x: bool) -> Self {
self.without_bboxes = x;
self
}
pub fn without_conf(mut self, x: bool) -> Self {
self.without_conf = x;
pub fn without_bboxes_conf(mut self, x: bool) -> Self {
self.without_bboxes_conf = x;
self
}
pub fn without_name(mut self, x: bool) -> Self {
self.without_name = x;
pub fn without_bboxes_name(mut self, x: bool) -> Self {
self.without_bboxes_name = x;
self
}
pub fn without_bboxes_text_bg(mut self, x: bool) -> Self {
self.without_bboxes_text_bg = x;
self
}
pub fn with_bboxes_text_bg_alpha(mut self, x: u8) -> Self {
self.bboxes_text_color.0[3] = x;
self
}
pub fn with_bboxes_text_color(mut self, rgba: [u8; 4]) -> Self {
self.bboxes_text_color = Rgba(rgba);
self
}
pub fn without_keypoints(mut self, x: bool) -> Self {
self.without_keypoints = x;
self
}
pub fn with_skeletons(mut self, x: &[(usize, usize)]) -> Self {
self.skeletons = Some(x.to_vec());
self
}
pub fn with_keypoints_palette(mut self, x: &[(u8, u8, u8, u8)]) -> Self {
self.keypoints_palette = Some(x.to_vec());
self
}
pub fn with_keypoints_radius(mut self, x: usize) -> Self {
self.keypoints_radius = x;
self
}
@ -72,13 +143,48 @@ impl Annotator {
self
}
pub fn with_masks_name(mut self, x: bool) -> Self {
self.with_masks_name = x;
pub fn with_keypoints_text_color(mut self, rgba: [u8; 4]) -> Self {
self.keypoints_text_color = Rgba(rgba);
self
}
pub fn without_bboxes(mut self, x: bool) -> Self {
self.without_bboxes = x;
pub fn without_keypoints_text_bg(mut self, x: bool) -> Self {
self.without_keypoints_text_bg = x;
self
}
pub fn with_keypoints_text_bg_alpha(mut self, x: u8) -> Self {
self.keypoints_text_color.0[3] = x;
self
}
pub fn without_mbrs(mut self, x: bool) -> Self {
self.without_mbrs = x;
self
}
pub fn without_mbrs_conf(mut self, x: bool) -> Self {
self.without_mbrs_conf = x;
self
}
pub fn without_mbrs_name(mut self, x: bool) -> Self {
self.without_mbrs_name = x;
self
}
pub fn without_mbrs_text_bg(mut self, x: bool) -> Self {
self.without_mbrs_text_bg = x;
self
}
pub fn with_mbrs_text_color(mut self, rgba: [u8; 4]) -> Self {
self.mbrs_text_color = Rgba(rgba);
self
}
pub fn with_mbrs_text_bg_alpha(mut self, x: u8) -> Self {
// NOTE: this writes the alpha channel of `mbrs_text_color` (the text color itself).
self.mbrs_text_color.0[3] = x;
self
}
@ -92,8 +198,33 @@ impl Annotator {
self
}
pub fn with_mask_alpha(mut self, x: u8) -> Self {
self.mask_alpha = x;
pub fn with_masks_conf(mut self, x: bool) -> Self {
self.with_masks_conf = x;
self
}
pub fn with_masks_name(mut self, x: bool) -> Self {
self.with_masks_name = x;
self
}
pub fn with_masks_text_bg(mut self, x: bool) -> Self {
self.with_masks_text_bg = x;
self
}
pub fn with_masks_text_color(mut self, rgba: [u8; 4]) -> Self {
self.masks_text_color = Rgba(rgba);
self
}
pub fn with_masks_alpha(mut self, x: u8) -> Self {
self.masks_alpha = x;
self
}
pub fn with_masks_text_bg_alpha(mut self, x: u8) -> Self {
// NOTE: this writes the alpha channel of `masks_text_color` (the text color itself).
self.masks_text_color.0[3] = x;
self
}
@ -102,8 +233,8 @@ impl Annotator {
self
}
pub fn without_keypoints(mut self, x: bool) -> Self {
self.without_keypoints = x;
pub fn with_probs_topk(mut self, x: usize) -> Self {
self.probs_topk = x;
self
}
@ -112,11 +243,6 @@ impl Annotator {
self
}
pub fn with_skeletons(mut self, skeletons: &[(usize, usize)]) -> Self {
self.skeletons = Some(skeletons.to_vec());
self
}
pub fn with_font(mut self, path: &str) -> Self {
self.font = Self::load_font(Some(path)).unwrap();
self
@ -135,36 +261,44 @@ impl Annotator {
}
}
pub fn annotate(&self, imgs: &[DynamicImage], ys: &[Ys]) {
pub fn annotate(&self, imgs: &[DynamicImage], ys: &[Y]) {
for (img, y) in imgs.iter().zip(ys.iter()) {
let mut img_rgb = img.to_rgba8();
// masks
if !self.without_polygons {
if let Some(xs) = &y.masks {
self.plot_polygons(&mut img_rgb, xs)
if !self.without_masks {
if let Some(xs) = &y.masks() {
self.plot_masks_and_polygons(&mut img_rgb, xs)
}
}
// bboxes
if !self.without_bboxes {
if let Some(xs) = &y.bboxes {
if let Some(xs) = &y.bboxes() {
self.plot_bboxes(&mut img_rgb, xs)
}
}
// mbrs
if !self.without_mbrs {
if let Some(xs) = &y.mbrs() {
self.plot_mbrs(&mut img_rgb, xs)
}
}
// keypoints
if !self.without_keypoints {
if let Some(xs) = &y.keypoints {
if let Some(xs) = &y.keypoints() {
self.plot_keypoints(&mut img_rgb, xs)
}
}
// probs
if let Some(xs) = &y.probs {
if let Some(xs) = &y.probs() {
self.plot_probs(&mut img_rgb, xs)
}
// save
if let Some(saveout) = &self.saveout {
self.save(&img_rgb, saveout);
}
@ -173,127 +307,149 @@ impl Annotator {
pub fn plot_bboxes(&self, img: &mut RgbaImage, bboxes: &[Bbox]) {
for bbox in bboxes.iter() {
// bboxes
imageproc::drawing::draw_hollow_rect_mut(
img,
imageproc::rect::Rect::at(bbox.xmin().round() as i32, bbox.ymin().round() as i32)
.of_size(bbox.width().round() as u32, bbox.height().round() as u32),
image::Rgba(self.get_color(bbox.id()).into()),
image::Rgba(self.get_color(bbox.id() as usize).into()),
);
// texts
let mut legend = String::new();
if !self.without_name {
if !self.without_bboxes_name {
legend.push_str(&bbox.name().unwrap_or(&bbox.id().to_string()).to_string());
}
if !self.without_conf {
if !self.without_name {
if !self.without_bboxes_conf {
if !self.without_bboxes_name {
legend.push_str(&format!(": {:.4}", bbox.confidence()));
} else {
legend.push_str(&format!("{:.4}", bbox.confidence()));
}
}
if !legend.is_empty() {
let scale_dy = img.width().max(img.height()) as f32 / 40.0;
let scale = PxScale::from(scale_dy);
let (text_w, text_h) = imageproc::drawing::text_size(scale, &self.font, &legend); // u32
let text_h = text_h + text_h / 3;
let top = if bbox.ymin() > text_h as f32 {
(bbox.ymin().round() as u32 - text_h) as i32
} else {
(text_h - bbox.ymin().round() as u32) as i32
};
let mut left = bbox.xmin() as i32;
if left + text_w as i32 > img.width() as i32 {
left = img.width() as i32 - text_w as i32;
}
imageproc::drawing::draw_filled_rect_mut(
img,
imageproc::rect::Rect::at(left, top).of_size(text_w, text_h),
image::Rgba(self.get_color(bbox.id()).into()),
);
imageproc::drawing::draw_text_mut(
img,
image::Rgba([0, 0, 0, 255]),
left,
top - (scale_dy / self.scale_).floor() as i32 + 2,
scale,
&self.font,
&legend,
);
}
self.put_text(
img,
legend.as_str(),
bbox.xmin(),
bbox.ymin(),
image::Rgba(self.get_color(bbox.id() as usize).into()),
self.bboxes_text_color,
self.without_bboxes_text_bg,
);
}
}
pub fn plot_polygons(&self, img: &mut RgbaImage, masks: &[Mask]) {
pub fn plot_mbrs(&self, img: &mut RgbaImage, mbrs: &[Mbr]) {
for mbr in mbrs.iter() {
// mbrs
for i in 0..mbr.vertices().len() {
let p1 = mbr.vertices()[i];
let p2 = mbr.vertices()[(i + 1) % mbr.vertices().len()];
imageproc::drawing::draw_line_segment_mut(
img,
(p1.x.round() as f32, p1.y.round() as f32),
(p2.x.round() as f32, p2.y.round() as f32),
image::Rgba(self.get_color(mbr.id() as usize).into()),
);
}
// text
let mut legend = String::new();
if !self.without_mbrs_name {
legend.push_str(&mbr.name().unwrap_or(&mbr.id().to_string()).to_string());
}
if !self.without_mbrs_conf {
if !self.without_mbrs_name {
legend.push_str(&format!(": {:.4}", mbr.confidence()));
} else {
legend.push_str(&format!("{:.4}", mbr.confidence()));
}
}
self.put_text(
img,
legend.as_str(),
mbr.top().x as f32,
mbr.top().y as f32,
image::Rgba(self.get_color(mbr.id() as usize).into()),
self.mbrs_text_color,
self.without_mbrs_text_bg,
);
}
}
pub fn plot_masks_and_polygons(&self, img: &mut RgbaImage, masks: &[Mask]) {
let mut convas = img.clone();
for mask in masks.iter() {
// mask
let mut polygon_i32 = mask
.polygon
.points
.iter()
.map(|p| imageproc::point::Point::new(p.x as i32, p.y as i32))
// masks
let polygon_i32 = mask
.polygon()
.exterior()
.points()
.take(if mask.is_closed() {
mask.count() - 1
} else {
mask.count()
})
.map(|p| imageproc::point::Point::new(p.x() as i32, p.y() as i32))
.collect::<Vec<_>>();
if polygon_i32.first() == polygon_i32.last() {
polygon_i32.pop();
}
let mut mask_color = self.get_color(mask.id);
mask_color.3 = self.mask_alpha;
let mut mask_color = self.get_color(mask.id() as usize);
mask_color.3 = self.masks_alpha;
imageproc::drawing::draw_polygon_mut(
&mut convas,
&polygon_i32,
Rgba(mask_color.into()),
);
// contour
let polygon_f32 = mask
.polygon
.points
.iter()
.map(|p| imageproc::point::Point::new(p.x, p.y))
.collect::<Vec<_>>();
imageproc::drawing::draw_hollow_polygon_mut(img, &polygon_f32, self.polygon_color);
// text
let mut legend = String::new();
if self.with_masks_name {
legend.push_str(&mask.name().unwrap_or(&mask.id().to_string()).to_string());
}
if !legend.is_empty() {
let scale_dy = img.width().max(img.height()) as f32 / 60.0;
let scale = PxScale::from(scale_dy);
let (text_w, text_h) = imageproc::drawing::text_size(scale, &self.font, &legend); // u32
let text_h = text_h + text_h / 3;
let bbox = mask.polygon.find_min_rect();
let top = (bbox.cy().round() as u32 - text_h) as i32;
let mut left = (bbox.cx() as i32 - text_w as i32 / 2).max(0);
if left + text_w as i32 > img.width() as i32 {
left = img.width() as i32 - text_w as i32;
}
imageproc::drawing::draw_filled_rect_mut(
&mut convas,
imageproc::rect::Rect::at(left, top).of_size(text_w, text_h),
image::Rgba(self.get_color(mask.id()).into()),
);
imageproc::drawing::draw_text_mut(
&mut convas,
image::Rgba([0, 0, 0, 255]),
left,
top - (scale_dy / self.scale_).floor() as i32 + 2,
scale,
&self.font,
&legend,
);
// contours(polygons)
if !self.without_polygons {
let polygon_f32 = mask
.polygon()
.exterior()
.points()
.take(if mask.is_closed() {
mask.count() - 1
} else {
mask.count()
})
.map(|p| imageproc::point::Point::new(p.x() as f32, p.y() as f32))
.collect::<Vec<_>>();
imageproc::drawing::draw_hollow_polygon_mut(img, &polygon_f32, self.polygon_color);
}
}
image::imageops::overlay(img, &convas, 0, 0);
// text on top
for mask in masks.iter() {
if let Some((x, y)) = mask.centroid() {
let mut legend = String::new();
if self.with_masks_name {
legend.push_str(&mask.name().unwrap_or(&mask.id().to_string()).to_string());
}
if self.with_masks_conf {
if self.with_masks_name {
legend.push_str(&format!(": {:.4}", mask.confidence()));
} else {
legend.push_str(&format!("{:.4}", mask.confidence()));
}
}
self.put_text(
img,
legend.as_str(),
x,
y,
image::Rgba(self.get_color(mask.id() as usize).into()),
self.masks_text_color,
!self.with_masks_text_bg,
);
}
}
}
pub fn plot_probs(&self, img: &mut RgbaImage, probs: &Embedding) {
let topk = 5usize;
pub fn plot_probs(&self, img: &mut RgbaImage, probs: &Prob) {
let (x, mut y) = (img.width() as i32 / 20, img.height() as i32 / 20);
for k in probs.topk(topk).iter() {
for k in probs.topk(self.probs_topk).iter() {
let legend = format!("{}: {:.4}", k.2.as_ref().unwrap_or(&k.0.to_string()), k.1);
let scale_dy = img.width().max(img.height()) as f32 / 30.0;
let scale = PxScale::from(scale_dy);
let scale = PxScale::from(self.scale_dy);
let (text_w, text_h) = imageproc::drawing::text_size(scale, &self.font, &legend);
let text_h = text_h + text_h / 3;
y += text_h as i32;
@ -306,7 +462,7 @@ impl Annotator {
img,
image::Rgba([0, 0, 0, 255]),
x,
y - (scale_dy / self.scale_).floor() as i32 + 2,
y - (self.scale_dy / self._scale).floor() as i32 + 2,
scale,
&self.font,
&legend,
@ -320,12 +476,20 @@ impl Annotator {
if kpt.confidence() == 0.0 {
continue;
}
// keypoints
let color = match &self.keypoints_palette {
None => self.get_color(i + 10),
Some(keypoints_palette) => keypoints_palette[i],
};
imageproc::drawing::draw_filled_circle_mut(
img,
(kpt.x() as i32, kpt.y() as i32),
self.keypoint_radius as i32,
image::Rgba(self.get_color(i + 10).into()),
self.keypoints_radius as i32,
image::Rgba(color.into()),
);
// text
let mut legend = String::new();
if self.with_keypoints_name {
legend.push_str(&kpt.name().unwrap_or(&kpt.id().to_string()).to_string());
@ -337,37 +501,15 @@ impl Annotator {
legend.push_str(&format!("{:.4}", kpt.confidence()));
}
}
if !legend.is_empty() {
let scale_dy = img.width().max(img.height()) as f32 / 80.0;
let scale = PxScale::from(scale_dy);
let (text_w, text_h) =
imageproc::drawing::text_size(scale, &self.font, &legend); // u32
let text_h = text_h + text_h / 3;
let top = if kpt.y() > text_h as f32 {
(kpt.y().round() as u32 - text_h - self.keypoint_radius as u32) as i32
} else {
(text_h - self.keypoint_radius as u32 - kpt.y().round() as u32) as i32
};
let mut left =
(kpt.x() as i32 - self.keypoint_radius as i32 - text_w as i32 / 2).max(0);
if left + text_w as i32 > img.width() as i32 {
left = img.width() as i32 - text_w as i32;
}
imageproc::drawing::draw_filled_rect_mut(
img,
imageproc::rect::Rect::at(left, top).of_size(text_w, text_h),
image::Rgba(self.get_color(kpt.id() as usize).into()),
);
imageproc::drawing::draw_text_mut(
img,
image::Rgba([0, 0, 0, 255]),
left,
top - (scale_dy / self.scale_).floor() as i32 + 2,
scale,
&self.font,
&legend,
);
}
self.put_text(
img,
legend.as_str(),
kpt.x(),
kpt.y(),
image::Rgba(self.get_color(kpt.id() as usize).into()),
self.keypoints_text_color,
self.without_keypoints_text_bg,
);
}
// draw skeleton
@ -389,6 +531,53 @@ impl Annotator {
}
}
#[allow(clippy::too_many_arguments)]
fn put_text(
&self,
img: &mut RgbaImage,
legend: &str,
x: f32,
y: f32,
color: Rgba<u8>,
text_color: Rgba<u8>,
without_text_bg: bool,
) {
if !legend.is_empty() {
let scale = PxScale::from(self.scale_dy);
let (text_w, text_h) = imageproc::drawing::text_size(scale, &self.font, legend);
let text_h = text_h + text_h / 3;
let top = if y > text_h as f32 {
(y.round() as u32 - text_h) as i32
} else {
(text_h - y.round() as u32) as i32
};
let mut left = x as i32;
if left + text_w as i32 > img.width() as i32 {
left = img.width() as i32 - text_w as i32;
}
// text bbox
if !without_text_bg {
imageproc::drawing::draw_filled_rect_mut(
img,
imageproc::rect::Rect::at(left, top).of_size(text_w, text_h),
color,
);
}
// text
imageproc::drawing::draw_text_mut(
img,
text_color,
left,
top - (self.scale_dy / self._scale).floor() as i32 + 2,
scale,
&self.font,
legend,
);
}
}
fn load_font(path: Option<&str>) -> Result<FontVec> {
let path_font = match path {
None => auto_load("Arial.ttf")?,
@ -402,28 +591,28 @@ impl Annotator {
Self::color_palette()[n % Self::color_palette().len()]
}
fn color_palette() -> Vec<(u8, u8, u8, u8)> {
vec![
(0, 255, 0, 255),
(255, 128, 0, 255),
(0, 0, 255, 255),
(255, 153, 51, 255),
(255, 0, 0, 255),
(255, 51, 255, 255),
(102, 178, 255, 255),
(51, 153, 255, 255),
(255, 51, 51, 255),
(153, 255, 153, 255),
(102, 255, 102, 255),
(153, 204, 255, 255),
(255, 153, 153, 255),
(255, 178, 102, 255),
(230, 230, 0, 255),
(255, 153, 255, 255),
(255, 102, 255, 255),
(255, 102, 102, 255),
(51, 255, 51, 255),
(255, 255, 255, 255),
fn color_palette() -> [(u8, u8, u8, u8); 20] {
[
(0, 255, 127, 255), // spring green
(255, 105, 180, 255), // hot pink
(255, 99, 71, 255), // tomato
(255, 215, 0, 255), // gold
(188, 143, 143, 255), // rosy brown
(0, 191, 255, 255), // deep sky blue
(143, 188, 143, 255), // dark sea green
(238, 130, 238, 255), // violet
(154, 205, 50, 255), // yellow green
(205, 133, 63, 255), // peru
(30, 144, 255, 255), // dodger blue
(112, 128, 144, 255), // slate gray
(127, 255, 212, 255), // aqua marine
(51, 153, 255, 255), // blue
(0, 255, 255, 255), // cyan
(138, 43, 226, 255), // blue violet
(165, 42, 42, 255), // brown
(216, 191, 216, 255), // thistle
(240, 255, 255, 255), // azure
(95, 158, 160, 255), // cadet blue
]
}
}
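
The label placement used by `put_text` above can be sketched in isolation. This is a minimal, hypothetical helper (`label_origin` is not part of the library) that reproduces only the top/left arithmetic, taking the text size and image width as plain arguments instead of measuring a font:

```rust
/// Hypothetical helper mirroring the placement arithmetic in `put_text`:
/// returns the (left, top) corner of a label anchored at (x, y).
fn label_origin(x: f32, y: f32, text_w: u32, text_h: u32, img_w: u32) -> (i32, i32) {
    // Put the label above the anchor when there is room; otherwise use
    // `text_h - y`, which keeps the result non-negative near the top edge.
    let top = if y > text_h as f32 {
        (y.round() as u32 - text_h) as i32
    } else {
        (text_h - y.round() as u32) as i32
    };
    // Shift the label left when it would run past the right image border.
    let mut left = x as i32;
    if left + text_w as i32 > img_w as i32 {
        left = img_w as i32 - text_w as i32;
    }
    (left, top)
}
```

For example, a 40x12 label anchored at (620, 100) in a 640-pixel-wide image is shifted left so that it ends at the right border.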

View File

@ -1,78 +0,0 @@
use crate::Rect;
#[derive(Clone, PartialEq, Default)]
pub struct Bbox {
rect: Rect,
id: usize,
confidence: f32,
name: Option<String>,
}
impl std::fmt::Debug for Bbox {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("Bbox")
.field("xmin", &self.rect.xmin())
.field("ymin", &self.rect.ymin())
.field("xmax", &self.rect.xmax())
.field("ymax", &self.rect.ymax())
.field("id", &self.id)
.field("name", &self.name)
.field("confidence", &self.confidence)
.finish()
}
}
impl Bbox {
pub fn new(rect: Rect, id: usize, confidence: f32, name: Option<String>) -> Self {
Self {
rect,
id,
confidence,
name,
}
}
pub fn width(&self) -> f32 {
self.rect.width()
}
pub fn height(&self) -> f32 {
self.rect.height()
}
pub fn xmin(&self) -> f32 {
self.rect.xmin()
}
pub fn ymin(&self) -> f32 {
self.rect.ymin()
}
pub fn xmax(&self) -> f32 {
self.rect.xmax()
}
pub fn ymax(&self) -> f32 {
self.rect.ymax()
}
pub fn id(&self) -> usize {
self.id
}
pub fn name(&self) -> Option<&String> {
self.name.as_ref()
}
pub fn confidence(&self) -> f32 {
self.confidence
}
pub fn area(&self) -> f32 {
self.rect.area()
}
pub fn iou(&self, other: &Bbox) -> f32 {
self.rect.intersect(&other.rect) / self.rect.union(&other.rect)
}
}
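
`Bbox::iou` above divides the intersection area by the union area of two rectangles. A minimal standalone sketch of that computation follows; the `Xyxy` type and the zero-union guard are assumptions for illustration, not the library's `Rect`:

```rust
/// Hypothetical axis-aligned box in (xmin, ymin, xmax, ymax) form.
#[derive(Clone, Copy, Debug)]
struct Xyxy {
    xmin: f32,
    ymin: f32,
    xmax: f32,
    ymax: f32,
}

impl Xyxy {
    fn area(&self) -> f32 {
        (self.xmax - self.xmin) * (self.ymax - self.ymin)
    }

    /// Overlap area; zero when the boxes do not overlap.
    fn intersect(&self, other: &Xyxy) -> f32 {
        let w = (self.xmax.min(other.xmax) - self.xmin.max(other.xmin)).max(0.0);
        let h = (self.ymax.min(other.ymax) - self.ymin.max(other.ymin)).max(0.0);
        w * h
    }

    /// Intersection over union; returns 0.0 for two empty boxes (an added guard).
    fn iou(&self, other: &Xyxy) -> f32 {
        let inter = self.intersect(other);
        let union = self.area() + other.area() - inter;
        if union > 0.0 {
            inter / union
        } else {
            0.0
        }
    }
}
```

Two 2x2 boxes offset by one pixel in each direction overlap in a 1x1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.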

View File

@ -5,12 +5,12 @@ use std::collections::VecDeque;
use std::path::{Path, PathBuf};
use walkdir::{DirEntry, WalkDir};
/// Dataloader for loading images
#[derive(Debug, Clone)]
pub struct DataLoader {
// source could be single image path, folder with images (TODO: video, stream)
pub paths: VecDeque<PathBuf>,
pub recursive: bool,
pub batch: usize,
pub paths: VecDeque<PathBuf>,
}
impl Iterator for DataLoader {

View File

@ -4,7 +4,7 @@ pub enum Device {
Cuda(usize),
Trt(usize),
CoreML(usize),
Cann(usize),
// Cann(usize),
// Acl(usize),
// Rocm(usize),
// Rknpu(usize),

View File

@ -1,5 +1,6 @@
use std::ops::Index;
/// Dynamic Confidences
#[derive(Clone, PartialEq, PartialOrd)]
pub struct DynConf {
confs: Vec<f32>,

View File

@ -8,6 +8,7 @@ use ort::{
use crate::{config_dir, Device, MinOptMax, Options, CHECK_MARK, CROSS_MARK, SAFE_CROSS_MARK};
/// ONNXRuntime Backend
#[derive(Debug)]
pub struct OrtEngine {
session: Session,
@ -145,8 +146,7 @@ impl OrtEngine {
Device::Cpu(_) => {
println!("{CHECK_MARK} Using CPU");
ort::CPUExecutionProvider::default().build()
}
_ => todo!(),
} // _ => todo!(),
};
let session = builder
.with_optimization_level(ort::GraphOptimizationLevel::Level3)?

View File

@ -1,63 +0,0 @@
use crate::Point;
#[derive(PartialEq, Clone)]
pub struct Keypoint {
pub point: Point,
confidence: f32,
id: isize,
name: Option<String>,
}
impl Default for Keypoint {
fn default() -> Self {
Self {
id: -1,
confidence: 0.0,
point: Point::default(),
name: None,
}
}
}
impl std::fmt::Debug for Keypoint {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("Keypoint")
.field("x", &self.point.x)
.field("y", &self.point.y)
.field("confidence", &self.confidence)
.field("id", &self.id)
.field("name", &self.name)
.finish()
}
}
impl Keypoint {
pub fn new(point: Point, confidence: f32, id: isize, name: Option<String>) -> Self {
Self {
point,
confidence,
id,
name,
}
}
pub fn x(&self) -> f32 {
self.point.x
}
pub fn y(&self) -> f32 {
self.point.y
}
pub fn confidence(&self) -> f32 {
self.confidence
}
pub fn id(&self) -> isize {
self.id
}
pub fn name(&self) -> Option<&String> {
self.name.as_ref()
}
}

View File

@ -1,6 +1,7 @@
use anyhow::Result;
use rand::distributions::{Distribution, WeightedIndex};
/// Logits Sampler
#[derive(Debug)]
pub struct LogitsSampler {
temperature: f32,

View File

@ -1,28 +0,0 @@
use crate::Polygon;
#[derive(Default, Clone, PartialEq)]
pub struct Mask {
pub polygon: Polygon,
pub id: usize,
pub name: Option<String>,
}
impl std::fmt::Debug for Mask {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("Mask")
.field("polygons(num_points)", &self.polygon.points.len())
.field("id", &self.id)
.field("name", &self.name)
.finish()
}
}
impl Mask {
pub fn id(&self) -> usize {
self.id
}
pub fn name(&self) -> Option<&String> {
self.name.as_ref()
}
}

View File

@ -1,3 +1,4 @@
/// A value composed of Min-Opt-Max
#[derive(Debug, Clone)]
pub struct MinOptMax {
pub min: isize,

View File

@ -1,45 +1,22 @@
mod annotator;
mod bbox;
mod dataloader;
mod device;
mod dynconf;
mod embedding;
mod engine;
mod keypoint;
mod logits_sampler;
mod mask;
mod metric;
mod min_opt_max;
pub mod ops;
mod options;
mod point;
mod polygon;
mod rect;
mod rotated_rect;
mod tokenizer_stream;
mod utils;
mod ys;
pub use annotator::Annotator;
pub use bbox::Bbox;
pub use dataloader::DataLoader;
pub use device::Device;
pub use dynconf::DynConf;
pub use embedding::Embedding;
pub use engine::OrtEngine;
pub use keypoint::Keypoint;
pub use logits_sampler::LogitsSampler;
pub use mask::Mask;
pub use metric::Metric;
pub use min_opt_max::MinOptMax;
pub use options::Options;
pub use point::Point;
pub use polygon::Polygon;
pub use rect::Rect;
pub use rotated_rect::RotatedRect;
pub use tokenizer_stream::TokenizerStream;
pub use utils::{
auto_load, config_dir, download, string_now, COCO_KEYPOINT_NAMES_17, COCO_NAMES_80,
COCO_SKELETON_17,
};
pub use ys::Ys;

View File

@ -1,7 +1,6 @@
use crate::{Mask, Polygon};
use anyhow::Result;
use image::{DynamicImage, GenericImageView, GrayImage, ImageBuffer};
use ndarray::{Array, Axis, Ix2, IxDyn};
use image::{DynamicImage, GenericImageView, ImageBuffer};
use ndarray::{Array, Axis, IxDyn};
pub fn standardize(xs: Array<f32, IxDyn>, mean: &[f32], std: &[f32]) -> Array<f32, IxDyn> {
let mean = Array::from_shape_vec((1, mean.len(), 1, 1), mean.to_vec()).unwrap();
@ -22,18 +21,6 @@ pub fn norm2(xs: &Array<f32, IxDyn>) -> Array<f32, IxDyn> {
xs / std_
}
pub fn dot2(query: &Array<f32, IxDyn>, gallery: &Array<f32, IxDyn>) -> Result<Vec<Vec<f32>>> {
// (m, ndim) * (n, ndim).t => (m, n)
let query = query.to_owned().into_dimensionality::<Ix2>()?;
let gallery = gallery.to_owned().into_dimensionality::<Ix2>()?;
let matrix = query.dot(&gallery.t());
let exps = matrix.mapv(|x| x.exp());
let stds = exps.sum_axis(Axis(1));
let matrix = exps / stds.insert_axis(Axis(1));
let matrix: Vec<Vec<f32>> = matrix.axis_iter(Axis(0)).map(|row| row.to_vec()).collect();
Ok(matrix)
}
pub fn scale_wh(w0: f32, h0: f32, w1: f32, h1: f32) -> (f32, f32, f32) {
let r = (w1 / w0).min(h1 / h0);
(r, (w0 * r).round(), (h0 * r).round())
@ -61,6 +48,7 @@ pub fn letterbox(
width: u32,
bg: f32,
) -> Result<Array<f32, IxDyn>> {
// TODO: refactor
let mut ys = Array::ones((xs.len(), 3, height as usize, width as usize)).into_dyn();
ys.fill(bg);
for (idx, x) in xs.iter().enumerate() {
@ -121,26 +109,3 @@ pub fn descale_mask(mask: DynamicImage, w0: f32, h0: f32, w1: f32, h1: f32) -> D
let mask = mask.crop(0, 0, w as u32, h as u32);
mask.resize_exact(w1 as u32, h1 as u32, image::imageops::FilterType::Triangle)
}
pub fn get_masks_from_image(
mask: GrayImage,
thresh: u8,
id: usize,
name: Option<String>,
) -> Vec<Mask> {
// let mask = mask.into_luma8();
let contours: Vec<imageproc::contours::Contour<i32>> =
imageproc::contours::find_contours_with_threshold(&mask, thresh);
let mut masks: Vec<Mask> = Vec::new();
contours.iter().for_each(|contour| {
// contour.border_type == imageproc::contours::BorderType::Outer &&
if contour.points.len() > 2 {
masks.push(Mask {
polygon: Polygon::from_contour(contour),
id,
name: name.to_owned(),
});
}
});
masks
}

View File

@ -1,5 +1,6 @@
use crate::{auto_load, Device, MinOptMax};
use crate::{auto_load, models::YOLOTask, Device, MinOptMax};
/// Options for building models
#[derive(Debug, Clone)]
pub struct Options {
pub onnx_path: String,
@ -47,11 +48,15 @@ pub struct Options {
pub tokenizer: Option<String>,
pub vocab: Option<String>,
pub names: Option<Vec<String>>, // names
pub names2: Option<Vec<String>>, // names2, could be keypoints names
pub anchors_first: bool, // output format: [bs, anchors/na, pos+nc+nm]
pub names2: Option<Vec<String>>, // names2: could be keypoints names
pub names3: Option<Vec<String>>, // names3
pub min_width: Option<f32>,
pub min_height: Option<f32>,
pub unclip_ratio: f32, // DB
pub yolo_task: Option<YOLOTask>,
pub anchors_first: bool, // yolo model output format like: [batch_size, anchors, xywh_clss_xxx]
pub conf_independent: bool, // xywh_conf_clss
pub apply_probs_softmax: bool,
}
impl Default for Options {
@ -99,10 +104,14 @@ impl Default for Options {
vocab: None,
names: None,
names2: None,
anchors_first: false,
names3: None,
min_width: None,
min_height: None,
unclip_ratio: 1.5,
yolo_task: None,
anchors_first: false,
conf_independent: false,
apply_probs_softmax: false,
}
}
}
@ -143,6 +152,21 @@ impl Options {
self
}
pub fn with_yolo_task(mut self, x: YOLOTask) -> Self {
self.yolo_task = Some(x);
self
}
pub fn with_conf_independent(mut self, x: bool) -> Self {
self.conf_independent = x;
self
}
pub fn apply_probs_softmax(mut self, x: bool) -> Self {
self.apply_probs_softmax = x;
self
}
pub fn with_profile(mut self, profile: bool) -> Self {
self.profile = profile;
self
@ -158,6 +182,11 @@ impl Options {
self
}
pub fn with_names3(mut self, names: &[&str]) -> Self {
self.names3 = Some(names.iter().map(|x| x.to_string()).collect::<Vec<String>>());
self
}
pub fn with_vocab(mut self, vocab: &str) -> Self {
self.vocab = Some(auto_load(vocab).unwrap());
self
@ -183,8 +212,8 @@ impl Options {
self
}
pub fn with_anchors_first(mut self) -> Self {
self.anchors_first = true;
pub fn with_anchors_first(mut self, x: bool) -> Self {
self.anchors_first = x;
self
}
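
The setters in this file follow a consuming-builder style: each takes `mut self`, updates one field, and returns `Self` so calls can be chained. A minimal sketch of that pattern, using a hypothetical `Opts` struct that carries only three of the fields shown above:

```rust
/// Hypothetical, trimmed-down stand-in for `Options`.
#[derive(Debug, Default, Clone, PartialEq)]
struct Opts {
    anchors_first: bool,
    conf_independent: bool,
    names3: Option<Vec<String>>,
}

impl Opts {
    // Taking the flag as an argument (rather than hard-coding `true`)
    // lets callers switch the option off as well as on.
    fn with_anchors_first(mut self, x: bool) -> Self {
        self.anchors_first = x;
        self
    }

    fn with_conf_independent(mut self, x: bool) -> Self {
        self.conf_independent = x;
        self
    }

    fn with_names3(mut self, names: &[&str]) -> Self {
        self.names3 = Some(names.iter().map(|x| x.to_string()).collect());
        self
    }
}
```

A call such as `Opts::default().with_anchors_first(true).with_names3(&["a", "b"])` leaves every other field at its default.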

View File

@ -1,194 +0,0 @@
use std::ops::{Add, Div, Mul, Sub};
#[derive(Default, Debug, PartialOrd, PartialEq, Clone, Copy)]
pub struct Point {
pub x: f32,
pub y: f32,
}
impl Add for Point {
type Output = Self;
fn add(self, other: Self) -> Self::Output {
Self {
x: self.x + other.x,
y: self.y + other.y,
}
}
}
impl Add<f32> for Point {
type Output = Self;
fn add(self, other: f32) -> Self::Output {
Self {
x: self.x + other,
y: self.y + other,
}
}
}
impl Sub for Point {
type Output = Self;
fn sub(self, other: Self) -> Self::Output {
Self {
x: self.x - other.x,
y: self.y - other.y,
}
}
}
impl Sub<f32> for Point {
type Output = Self;
fn sub(self, other: f32) -> Self::Output {
Self {
x: self.x - other,
y: self.y - other,
}
}
}
impl Mul<f32> for Point {
type Output = Self;
fn mul(self, other: f32) -> Self::Output {
Self {
x: self.x * other,
y: self.y * other,
}
}
}
impl Mul for Point {
type Output = Self;
fn mul(self, other: Self) -> Self::Output {
Self {
x: self.x * other.x,
y: self.y * other.y,
}
}
}
impl Div for Point {
type Output = Self;
fn div(self, other: Self) -> Self::Output {
Self {
x: self.x / other.x,
y: self.y / other.y,
}
}
}
impl Div<f32> for Point {
type Output = Self;
fn div(self, other: f32) -> Self::Output {
Self {
x: self.x / other,
y: self.y / other,
}
}
}
impl From<(f32, f32)> for Point {
fn from((x, y): (f32, f32)) -> Self {
Self { x, y }
}
}
impl From<Point> for (f32, f32) {
fn from(Point { x, y }: Point) -> Self {
(x, y)
}
}
impl From<[f32; 2]> for Point {
fn from([x, y]: [f32; 2]) -> Self {
Self { x, y }
}
}
impl From<Point> for [f32; 2] {
fn from(Point { x, y }: Point) -> Self {
[x, y]
}
}
impl Point {
pub fn new(x: f32, y: f32) -> Self {
Self { x, y }
}
pub fn coord(&self) -> [f32; 2] {
[self.x, self.y]
}
pub fn is_origin(&self) -> bool {
self.x == 0.0_f32 && self.y == 0.0_f32
}
pub fn distance_from(&self, other: &Point) -> f32 {
((self.x - other.x).powf(2.0) + (self.y - other.y).powf(2.0)).sqrt()
}
pub fn distance_from_origin(&self) -> f32 {
(self.x.powf(2.0) + self.y.powf(2.0)).sqrt()
}
pub fn sum(&self) -> f32 {
self.x + self.y
}
pub fn perpendicular_distance(&self, start: &Point, end: &Point) -> f32 {
let numerator = ((end.y - start.y) * self.x - (end.x - start.x) * self.y + end.x * start.y
- end.y * start.x)
.abs();
let denominator = ((end.y - start.y).powi(2) + (end.x - start.x).powi(2)).sqrt();
numerator / denominator
}
pub fn cross(&self, other: &Point) -> f32 {
self.x * other.y - self.y * other.x
}
}
#[cfg(test)]
mod tests_points {
use super::Point;
#[test]
fn new() {
let origin1 = Point::from((0.0f32, 0.0f32));
let origin2 = Point::from([0.0f32, 0.0f32]);
let origin3 = (0.0f32, 0.0f32).into();
let origin4 = [0.0f32, 0.0f32].into();
let origin5 = Point::new(1.0f32, 2.0f32);
let origin6 = Point {
x: 1.0f32,
y: 2.0f32,
};
assert_eq!(origin1, origin2);
assert_eq!(origin2, origin3);
assert_eq!(origin3, origin4);
assert_eq!(origin5, origin6);
assert!(origin1.is_origin());
assert!(origin2.is_origin());
assert!(origin3.is_origin());
assert!(origin4.is_origin());
assert!(!origin5.is_origin());
assert!(!origin6.is_origin());
}
#[test]
fn into_tuple_array() {
let point = Point::from((1.0, 2.0));
let tuple: (f32, f32) = point.into();
let array: [f32; 2] = point.into();
assert_eq!(tuple, (1.0, 2.0));
assert_eq!(array, [1.0, 2.0]);
}
}
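
The `perpendicular_distance` method above measures how far a point lies from the infinite line through `start` and `end`. A standalone sketch on plain tuples is given below; the fallback for a zero-length segment is an addition for illustration and is not in the removed code, which would divide by zero in that case:

```rust
/// Distance from `p` to the line through `start` and `end`.
fn perpendicular_distance(p: (f32, f32), start: (f32, f32), end: (f32, f32)) -> f32 {
    let (dx, dy) = (end.0 - start.0, end.1 - start.1);
    let len = (dx * dx + dy * dy).sqrt();
    if len == 0.0 {
        // Assumption: for a degenerate segment, use the point-to-point distance.
        return ((p.0 - start.0).powi(2) + (p.1 - start.1).powi(2)).sqrt();
    }
    // |dy * px - dx * py + end.x * start.y - end.y * start.x| / |end - start|
    (dy * p.0 - dx * p.1 + end.0 * start.1 - end.1 * start.0).abs() / len
}
```

This is the distance that the Ramer-Douglas-Peucker simplification in the polygon code compares against `epsilon`.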

View File

@ -1,239 +0,0 @@
use crate::{Point, Rect};
#[derive(Default, Debug, Clone, PartialEq)]
pub struct Polygon {
pub points: Vec<Point>,
}
impl From<Vec<Point>> for Polygon {
fn from(points: Vec<Point>) -> Self {
Self { points }
}
}
impl Polygon {
pub fn new() -> Self {
Self::default()
}
pub fn from_contour(contour: &imageproc::contours::Contour<i32>) -> Self {
let points = contour
.points
.iter()
.map(|p| Point::new(p.x as f32, p.y as f32))
.collect::<Vec<_>>();
Self { points }
}
pub fn to_imageproc_points(&self) -> Vec<imageproc::point::Point<i32>> {
self.points
.iter()
.map(|p| imageproc::point::Point::new(p.x as i32, p.y as i32))
.collect::<Vec<_>>()
}
pub fn from_imageproc_points(points: &[imageproc::point::Point<i32>]) -> Self {
let points = points
.iter()
.map(|p| Point::new(p.x as f32, p.y as f32))
.collect::<Vec<_>>();
Self { points }
}
pub fn with_points(mut self, points: &[Point]) -> Self {
self.points = points.to_vec();
// return the updated polygon so the builder call can be chained
self
}
pub fn area(&self) -> f32 {
// make sure points are already sorted
let mut area = 0.0;
let n = self.points.len();
for i in 0..n {
let j = (i + 1) % n;
area += self.points[i].x * self.points[j].y;
area -= self.points[j].x * self.points[i].y;
}
area.abs() / 2.0
}
pub fn center(&self) -> Point {
let rect = self.find_min_rect();
rect.center()
}
pub fn find_min_rect(&self) -> Rect {
let (mut min_x, mut min_y, mut max_x, mut max_y) = (f32::MAX, f32::MAX, f32::MIN, f32::MIN);
for point in self.points.iter() {
if point.x <= min_x {
min_x = point.x
}
if point.x > max_x {
max_x = point.x
}
if point.y <= min_y {
min_y = point.y
}
if point.y > max_y {
max_y = point.y
}
}
((min_x - 1.0, min_y - 1.0), (max_x + 1.0, max_y + 1.0)).into()
}
pub fn perimeter(&self) -> f32 {
let mut perimeter = 0.0;
let n = self.points.len();
for i in 0..n {
let j = (i + 1) % n;
perimeter += self.points[i].distance_from(&self.points[j]);
}
perimeter
}
pub fn offset(&self, delta: f32, width: f32, height: f32) -> Self {
let num_points = self.points.len();
let mut new_points = Vec::with_capacity(self.points.len());
for i in 0..num_points {
let prev_idx = if i == 0 { num_points - 1 } else { i - 1 };
let next_idx = (i + 1) % num_points;
let edge_vector = Point {
x: self.points[next_idx].x - self.points[prev_idx].x,
y: self.points[next_idx].y - self.points[prev_idx].y,
};
let normal_vector = Point {
x: -edge_vector.y,
y: edge_vector.x,
};
let normal_length = (normal_vector.x.powi(2) + normal_vector.y.powi(2)).sqrt();
if normal_length.abs() < 1e-6 {
new_points.push(self.points[i]);
} else {
let normalized_normal = Point {
x: normal_vector.x / normal_length,
y: normal_vector.y / normal_length,
};
let new_x = self.points[i].x + normalized_normal.x * delta;
let new_y = self.points[i].y + normalized_normal.y * delta;
let new_x = new_x.max(0.0).min(width);
let new_y = new_y.max(0.0).min(height);
new_points.push(Point { x: new_x, y: new_y });
}
}
Self { points: new_points }
}
pub fn resample(&self, num_samples: usize) -> Polygon {
let mut points = Vec::new();
for i in 0..self.points.len() {
let start_point = self.points[i];
let end_point = self.points[(i + 1) % self.points.len()];
points.push(start_point);
let dx = end_point.x - start_point.x;
let dy = end_point.y - start_point.y;
for j in 1..num_samples {
let t = (j as f32) / (num_samples as f32);
let new_x = start_point.x + t * dx;
let new_y = start_point.y + t * dy;
points.push(Point { x: new_x, y: new_y });
}
}
Self { points }
}
pub fn simplify(&self, epsilon: f32) -> Self {
let mask = self.rdp_iter(epsilon);
let points = self
.points
.iter()
.enumerate()
.filter_map(|(i, &point)| if mask[i] { Some(point) } else { None })
.collect();
Self { points }
}
#[allow(clippy::needless_range_loop)]
fn rdp_iter(&self, epsilon: f32) -> Vec<bool> {
let mut stk = Vec::new();
let mut indices = vec![true; self.points.len()];
stk.push((0, self.points.len() - 1));
while let Some((start_index, last_index)) = stk.pop() {
let mut dmax = 0.0;
let mut index = start_index;
for i in (start_index + 1)..last_index {
let d = self.points[i]
.perpendicular_distance(&self.points[start_index], &self.points[last_index]);
if d > dmax {
index = i;
dmax = d;
}
}
if dmax > epsilon {
stk.push((start_index, index));
stk.push((index, last_index));
} else {
for j in (start_index + 1)..last_index {
indices[j] = false;
}
}
}
indices
}
pub fn convex_hull(&self) -> Self {
let mut points = self.points.clone();
points.sort_by(|a, b| {
a.x.partial_cmp(&b.x)
.unwrap()
.then(a.y.partial_cmp(&b.y).unwrap())
});
let mut hull: Vec<Point> = Vec::new();
// Lower hull
for &point in &points {
while hull.len() >= 2 {
let last = hull.len() - 1;
let second_last = hull.len() - 2;
let vec_a = hull[last] - hull[second_last];
let vec_b = point - hull[second_last];
if vec_a.cross(&vec_b) <= 0.0 {
hull.pop();
} else {
break;
}
}
hull.push(point);
}
// Upper hull
let lower_hull_size = hull.len();
for &point in points.iter().rev().skip(1) {
while hull.len() > lower_hull_size {
let last = hull.len() - 1;
let second_last = hull.len() - 2;
let vec_a: Point = hull[last] - hull[second_last];
let vec_b = point - hull[second_last];
if vec_a.cross(&vec_b) <= 0.0 {
hull.pop();
} else {
break;
}
}
hull.push(point);
}
// Remove duplicate points
hull.dedup();
if hull.len() > 1 && hull.first() == hull.last() {
hull.pop();
}
Self { points: hull }
}
}
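
`Polygon::area` above is the shoelace formula: it sums the cross products of consecutive vertices and halves the absolute value. A minimal standalone sketch on `(x, y)` tuples (the free function `shoelace_area` is hypothetical) is:

```rust
/// Area of a simple polygon whose vertices are listed in order
/// (clockwise or counter-clockwise) around its boundary.
fn shoelace_area(points: &[(f32, f32)]) -> f32 {
    let n = points.len();
    let mut acc = 0.0f32;
    for i in 0..n {
        let (x0, y0) = points[i];
        // Wrap around so the last vertex pairs with the first.
        let (x1, y1) = points[(i + 1) % n];
        acc += x0 * y1 - x1 * y0;
    }
    acc.abs() / 2.0
}
```

The result is only meaningful when the vertices are ordered along the boundary, which is what the "make sure points are already sorted" comment in the removed code refers to.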

View File

@ -1,206 +0,0 @@
use crate::Point;
#[derive(Default, PartialOrd, PartialEq, Clone, Copy)]
pub struct Rect {
top_left: Point,
bottom_right: Point,
}
impl std::fmt::Debug for Rect {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("Rectangle")
.field("xmin", &self.xmin())
.field("ymin", &self.ymin())
.field("xmax", &self.xmax())
.field("ymax", &self.ymax())
.finish()
}
}
impl<P: Into<Point>> From<(P, P)> for Rect {
fn from((top_left, bottom_right): (P, P)) -> Self {
Self {
top_left: top_left.into(),
bottom_right: bottom_right.into(),
}
}
}
impl<P: Into<Point>> From<[P; 2]> for Rect {
fn from([top_left, bottom_right]: [P; 2]) -> Self {
Self {
top_left: top_left.into(),
bottom_right: bottom_right.into(),
}
}
}
impl Rect {
pub fn new(top_left: Point, bottom_right: Point) -> Self {
Self {
top_left,
bottom_right,
}
}
pub fn from_xywh(x: f32, y: f32, w: f32, h: f32) -> Self {
Self {
top_left: Point::new(x, y),
bottom_right: Point::new(x + w, y + h),
}
}
pub fn from_xyxy(x1: f32, y1: f32, x2: f32, y2: f32) -> Self {
Self {
top_left: Point::new(x1, y1),
bottom_right: Point::new(x2, y2),
}
}
pub fn from_cxywh(cx: f32, cy: f32, w: f32, h: f32) -> Self {
Self {
top_left: Point::new(cx - w / 2.0, cy - h / 2.0),
bottom_right: Point::new(cx + w / 2.0, cy + h / 2.0),
}
}
pub fn width(&self) -> f32 {
(self.bottom_right - self.top_left).x
}
pub fn height(&self) -> f32 {
(self.bottom_right - self.top_left).y
}
pub fn xmin(&self) -> f32 {
self.top_left.x
}
pub fn ymin(&self) -> f32 {
self.top_left.y
}
pub fn xmax(&self) -> f32 {
self.bottom_right.x
}
pub fn ymax(&self) -> f32 {
self.bottom_right.y
}
pub fn cx(&self) -> f32 {
(self.bottom_right.x + self.top_left.x) / 2.0
}
pub fn cy(&self) -> f32 {
(self.bottom_right.y + self.top_left.y) / 2.0
}
pub fn tl(&self) -> Point {
self.top_left
}
pub fn br(&self) -> Point {
self.bottom_right
}
pub fn tr(&self) -> Point {
Point::new(self.bottom_right.x, self.top_left.y)
}
pub fn bl(&self) -> Point {
Point::new(self.top_left.x, self.bottom_right.y)
}
pub fn center(&self) -> Point {
(self.bottom_right + self.top_left) / 2.0
}
pub fn area(&self) -> f32 {
self.height() * self.width()
}
pub fn perimeter(&self) -> f32 {
(self.height() + self.width()) * 2.0
}
pub fn is_empty(&self) -> bool {
self.area() == 0.0
}
pub fn is_squre(&self) -> bool {
self.width() == self.height()
}
pub fn intersect(&self, other: &Rect) -> f32 {
let l = self.xmin().max(other.xmin());
let r = (self.xmin() + self.width()).min(other.xmin() + other.width());
let t = self.ymin().max(other.ymin());
let b = (self.ymin() + self.height()).min(other.ymin() + other.height());
(r - l).max(0.) * (b - t).max(0.)
}
pub fn union(&self, other: &Rect) -> f32 {
self.area() + other.area() - self.intersect(other)
}
pub fn iou(&self, other: &Rect) -> f32 {
self.intersect(other) / self.union(other)
}
pub fn contains(&self, other: &Rect) -> bool {
self.xmin() <= other.xmin()
&& self.xmax() >= other.xmax()
&& self.ymin() <= other.ymin()
&& self.ymax() >= other.ymax()
}
pub fn expand(&mut self, x: f32, y: f32, max_x: f32, max_y: f32) -> Self {
Self::from_xyxy(
(self.xmin() - x).max(0.0f32).min(max_x),
(self.ymin() - y).max(0.0f32).min(max_y),
(self.xmax() + x).max(0.0f32).min(max_x),
(self.ymax() + y).max(0.0f32).min(max_y),
)
}
}
#[cfg(test)]
mod tests {
use super::Rect;
use crate::Point;
#[test]
fn new() {
let rect1 = Rect {
top_left: Point {
x: 0.0f32,
y: 0.0f32,
},
bottom_right: Point {
x: 5.0f32,
y: 5.0f32,
},
};
let rect2 = Rect {
top_left: (0.0f32, 0.0f32).into(),
bottom_right: [5.0f32, 5.0f32].into(),
};
let rect3 = Rect::new([0.0, 0.0].into(), [5.0, 5.0].into());
let rect4: Rect = ((0.0, 0.0), (5.0, 5.0)).into();
let rect5: Rect = [(0.0, 0.0), (5.0, 5.0)].into();
let rect6: Rect = ([0.0, 0.0], [5.0, 5.0]).into();
let rect7: Rect = Rect::from(([0.0, 0.0], [5.0, 5.0]));
let rect8: Rect = Rect::from([[0.0, 0.0], [5.0, 5.0]]);
let rect9: Rect = Rect::from([(0.0, 0.0), (5.0, 5.0)]);
let rect10: Rect = Rect::from_xyxy(0.0, 0.0, 5.0, 5.0);
let rect11: Rect = Rect::from_xywh(0.0, 0.0, 5.0, 5.0);
assert_eq!(rect1, rect2);
assert_eq!(rect3, rect4);
assert_eq!(rect5, rect6);
assert_eq!(rect7, rect8);
assert_eq!(rect9, rect8);
assert_eq!(rect10, rect11);
}
}
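The removed `Rect` type computes IoU as intersection area over union area, where the union is `area(a) + area(b) - intersection`. A minimal sketch of that arithmetic is below, using a hypothetical `Aabb` struct with explicit min/max fields. Unlike the removed code, it returns `0.0` when the union is zero rather than dividing by zero; that guard is an addition.

```rust
/// Axis-aligned box described by its min and max corners (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Aabb {
    xmin: f32,
    ymin: f32,
    xmax: f32,
    ymax: f32,
}

impl Aabb {
    fn area(&self) -> f32 {
        (self.xmax - self.xmin) * (self.ymax - self.ymin)
    }

    /// Overlap area; zero when the boxes do not touch.
    fn intersect(&self, other: &Aabb) -> f32 {
        let w = (self.xmax.min(other.xmax) - self.xmin.max(other.xmin)).max(0.0);
        let h = (self.ymax.min(other.ymax) - self.ymin.max(other.ymin)).max(0.0);
        w * h
    }

    /// Intersection over union, in [0, 1].
    fn iou(&self, other: &Aabb) -> f32 {
        let inter = self.intersect(other);
        let union = self.area() + other.area() - inter;
        if union > 0.0 {
            inter / union
        } else {
            0.0
        }
    }
}
```

For two 2x2 boxes offset by one unit in both axes, the overlap is 1 and the union is 4 + 4 - 1 = 7, giving an IoU of 1/7.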


@ -1,155 +0,0 @@
use crate::Point;
#[derive(Default, PartialOrd, PartialEq, Clone, Copy)]
pub struct RotatedRect {
center: Point,
width: f32,
height: f32,
rotation: f32, // in radians, within (0, π/2), i.e. 0 to 90 degrees
}
impl std::fmt::Debug for RotatedRect {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("RotatedRectangle")
.field("height", &self.height)
.field("width", &self.width)
.field("center", &self.center)
.field("rotation", &self.rotation)
.field("vertices", &self.vertices())
.finish()
}
}
impl RotatedRect {
pub fn new(center: Point, width: f32, height: f32, rotation: f32) -> Self {
Self {
center,
width,
height,
rotation,
}
}
pub fn vertices(&self) -> [Point; 4] {
// [cos -sin]
// [sin cos]
let m = [
[
self.rotation.cos() * 0.5 * self.width,
-self.rotation.sin() * 0.5 * self.height,
],
[
self.rotation.sin() * 0.5 * self.width,
self.rotation.cos() * 0.5 * self.height,
],
];
let v1 = self.center + Point::new(m[0][0] + m[0][1], m[1][0] + m[1][1]);
let v2 = self.center + Point::new(m[0][0] - m[0][1], m[1][0] - m[1][1]);
let v3 = self.center * 2.0 - v1;
let v4 = self.center * 2.0 - v2;
[v1, v2, v3, v4]
}
pub fn height(&self) -> f32 {
self.height
}
pub fn width(&self) -> f32 {
self.width
}
pub fn center(&self) -> Point {
self.center
}
pub fn area(&self) -> f32 {
self.height * self.width
}
// pub fn contain_point(&self, point: Point) -> bool {
// // ray casting
// todo!()
// }
}
#[test]
fn test1() {
let pi = std::f32::consts::PI;
let rt = RotatedRect::new(
Point::new(0.0f32, 0.0f32),
2.0f32,
4.0f32,
pi / 180.0 * 90.0,
);
assert_eq!(
rt.vertices(),
[
Point {
x: -2.0,
y: 0.99999994,
},
Point {
x: 2.0,
y: 1.0000001,
},
Point {
x: 2.0,
y: -0.99999994,
},
Point {
x: -2.0,
y: -1.0000001,
},
]
);
}
#[test]
fn test2() {
let pi = std::f32::consts::PI;
let rt = RotatedRect::new(
Point::new(0.0f32, 0.0f32),
2.0f32.sqrt(),
2.0f32.sqrt(),
pi / 180.0 * 45.0,
);
assert_eq!(
rt.vertices(),
[
Point {
x: 0.0,
y: 0.99999994
},
Point {
x: 0.99999994,
y: 0.0
},
Point {
x: 0.0,
y: -0.99999994
},
Point {
x: -0.99999994,
y: 0.0
},
]
);
}
// #[test]
// fn contain_point() {
// let pi = std::f32::consts::PI;
// let rt = RotatedRect::new(
// Point::new(0.0f32, 0.0f32),
// 1.0f32.sqrt(),
// 1.0f32.sqrt(),
// pi / 180.0 * 45.0,
// );
// assert!(rt.contain_point(Point::new(0.0, 0.0)));
// assert!(rt.contain_point(Point::new(0.5, 0.0)));
// assert!(rt.contain_point(Point::new(0.0, 0.5)));
// }
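`RotatedRect::vertices` rotates the two half-extent vectors of the rectangle by the 2x2 rotation matrix and then obtains the remaining corners by reflecting through the center. The sketch below restates that computation as a free function over `(f32, f32)` tuples; the function name and the tuple representation are assumptions for illustration.

```rust
/// Corners of a rectangle of size `width` x `height`, rotated by `rotation`
/// radians about `center`. Order matches the removed implementation:
/// [c + u + v, c + u - v, c - u - v, c - u + v].
fn rotated_rect_vertices(
    center: (f32, f32),
    width: f32,
    height: f32,
    rotation: f32,
) -> [(f32, f32); 4] {
    let (sin, cos) = rotation.sin_cos();
    // Half-extent vectors along the rotated width (u) and height (v) axes.
    let u = (cos * 0.5 * width, sin * 0.5 * width);
    let v = (-sin * 0.5 * height, cos * 0.5 * height);
    let v1 = (center.0 + u.0 + v.0, center.1 + u.1 + v.1);
    let v2 = (center.0 + u.0 - v.0, center.1 + u.1 - v.1);
    // Opposite corners are reflections through the center.
    let v3 = (2.0 * center.0 - v1.0, 2.0 * center.1 - v1.1);
    let v4 = (2.0 * center.0 - v2.0, 2.0 * center.1 - v2.1);
    [v1, v2, v3, v4]
}
```

Because `sin` and `cos` of angles such as π/2 are not exact in `f32`, comparisons against expected corners are better done with a small tolerance than with exact equality.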


@ -1,79 +0,0 @@
use crate::{Bbox, Embedding, Keypoint, Mask};
#[derive(Clone, PartialEq, Default)]
pub struct Ys {
// Results for each frame
pub probs: Option<Embedding>,
pub bboxes: Option<Vec<Bbox>>,
pub keypoints: Option<Vec<Vec<Keypoint>>>,
pub masks: Option<Vec<Mask>>,
}
impl std::fmt::Debug for Ys {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("Results")
.field("Probabilities", &self.probs)
.field("BoundingBoxes", &self.bboxes)
.field("Keypoints", &self.keypoints)
.field("Masks", &self.masks)
.finish()
}
}
impl Ys {
pub fn with_probs(mut self, probs: Embedding) -> Self {
self.probs = Some(probs);
self
}
pub fn with_bboxes(mut self, bboxes: &[Bbox]) -> Self {
self.bboxes = Some(bboxes.to_vec());
self
}
pub fn with_keypoints(mut self, keypoints: &[Vec<Keypoint>]) -> Self {
self.keypoints = Some(keypoints.to_vec());
self
}
pub fn with_masks(mut self, masks: &[Mask]) -> Self {
self.masks = Some(masks.to_vec());
self
}
pub fn probs(&self) -> Option<&Embedding> {
self.probs.as_ref()
}
pub fn keypoints(&self) -> Option<&Vec<Vec<Keypoint>>> {
self.keypoints.as_ref()
}
pub fn masks(&self) -> Option<&Vec<Mask>> {
self.masks.as_ref()
}
pub fn bboxes(&self) -> Option<&Vec<Bbox>> {
self.bboxes.as_ref()
}
pub fn non_max_suppression(xs: &mut Vec<Bbox>, iou_threshold: f32) {
xs.sort_by(|b1, b2| b2.confidence().partial_cmp(&b1.confidence()).unwrap());
let mut current_index = 0;
for index in 0..xs.len() {
let mut drop = false;
for prev_index in 0..current_index {
let iou = xs[prev_index].iou(&xs[index]);
if iou > iou_threshold {
drop = true;
break;
}
}
if !drop {
xs.swap(current_index, index);
current_index += 1;
}
}
xs.truncate(current_index);
}
}
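The removed `non_max_suppression` is a greedy filter: sort by confidence in descending order, then keep a box only if its IoU with every already-kept box does not exceed the threshold. The sketch below shows the same rule on a hypothetical `Detection` struct. It returns a new vector rather than compacting in place with `swap` and `truncate`, and it sorts with `total_cmp`, both of which are choices made here rather than taken from the diff.

```rust
/// Illustrative detection: corners as [xmin, ymin, xmax, ymax] plus a score.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Detection {
    xyxy: [f32; 4],
    confidence: f32,
}

/// Intersection over union of two [xmin, ymin, xmax, ymax] boxes.
fn iou(a: &[f32; 4], b: &[f32; 4]) -> f32 {
    let w = (a[2].min(b[2]) - a[0].max(b[0])).max(0.0);
    let h = (a[3].min(b[3]) - a[1].max(b[1])).max(0.0);
    let inter = w * h;
    let union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter;
    if union > 0.0 {
        inter / union
    } else {
        0.0
    }
}

/// Greedy NMS: highest-confidence boxes win; later boxes are dropped when
/// they overlap any kept box by more than `iou_threshold`.
fn non_max_suppression(mut dets: Vec<Detection>, iou_threshold: f32) -> Vec<Detection> {
    dets.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));
    let mut kept: Vec<Detection> = Vec::new();
    for det in dets {
        if kept.iter().all(|k| iou(&k.xyxy, &det.xyxy) <= iou_threshold) {
            kept.push(det);
        }
    }
    kept
}
```

The cost is quadratic in the number of candidates in the worst case, which is usually acceptable after confidence filtering has already reduced the candidate set.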


@ -1,8 +1,8 @@
mod core;
pub mod models;
pub use core::*;
mod utils;
mod ys;
const GITHUB_ASSETS: &str = "https://github.com/jamjamjon/assets/releases/download/v0.0.1";
const CHECK_MARK: &str = "✅";
const CROSS_MARK: &str = "❌";
const SAFE_CROSS_MARK: &str = "❎";
pub use core::*;
pub use utils::*;
pub use ys::*;


@ -4,7 +4,7 @@ use ndarray::{s, Array, Axis, IxDyn};
use std::io::Write;
use tokenizers::Tokenizer;
use crate::{ops, LogitsSampler, MinOptMax, Options, OrtEngine, TokenizerStream};
use crate::{ops, Embedding, LogitsSampler, MinOptMax, Options, OrtEngine, TokenizerStream, Y};
#[derive(Debug)]
pub struct Blip {
@ -42,7 +42,7 @@ impl Blip {
})
}
pub fn encode_images(&self, xs: &[DynamicImage]) -> Result<Array<f32, IxDyn>> {
pub fn encode_images(&self, xs: &[DynamicImage]) -> Result<Embedding> {
let xs_ = ops::resize(xs, self.height.opt as u32, self.width.opt as u32)?;
let xs_ = ops::normalize(xs_, 0.0, 255.0);
let xs_ = ops::standardize(
@ -51,24 +51,31 @@ impl Blip {
&[0.26862954, 0.2613026, 0.2757771],
);
let ys: Vec<Array<f32, IxDyn>> = self.visual.run(&[xs_])?;
let ys = ys[0].to_owned();
Ok(ys)
// let ys = ys[0].to_owned();
Ok(Embedding::new(ys[0].to_owned()))
// Ok(ys)
}
pub fn caption(&mut self, path: &str, prompt: Option<&str>) -> Result<()> {
// this demo uses batch_size=1
let x = image::io::Reader::open(path)?.decode()?;
let image_embeds = self.encode_images(&[x])?;
pub fn caption(
&mut self,
x: &[DynamicImage],
prompt: Option<&str>,
show: bool,
) -> Result<Vec<Y>> {
let mut ys: Vec<Y> = Vec::new();
let image_embeds = self.encode_images(x)?;
let image_embeds_attn_mask: Array<f32, IxDyn> =
Array::ones((1, image_embeds.shape()[1])).into_dyn();
Array::ones((1, image_embeds.embedding().shape()[1])).into_dyn();
let mut y_text = String::new();
// conditional
let mut input_ids = match prompt {
None => {
print!("[Unconditional]: ");
if show {
print!("[Unconditional]: ");
}
vec![0.0f32]
}
Some(prompt) => {
let encodings = self.tokenizer.tokenizer().encode(prompt, false);
let ids: Vec<f32> = encodings
@ -77,7 +84,10 @@ impl Blip {
.iter()
.map(|x| *x as f32)
.collect();
print!("[Conditional]: {} ", prompt);
if show {
print!("[Conditional]: {} ", prompt);
}
y_text.push_str(&format!("{} ", prompt));
ids
}
};
@ -91,7 +101,7 @@ impl Blip {
let y = self.textual.run(&[
input_ids_nd,
input_ids_attn_mask,
image_embeds.to_owned(),
image_embeds.embedding().to_owned(),
image_embeds_attn_mask.to_owned(),
])?; // N, length, vocab_size
let y = y[0].slice(s!(0, -1.., ..));
@ -106,16 +116,20 @@ impl Blip {
// streaming generation
if let Some(t) = self.tokenizer.next_token(token_id as u32)? {
print!("{t}");
y_text.push_str(&t);
if show {
print!("{t}");
// std::thread::sleep(std::time::Duration::from_millis(5));
}
std::io::stdout().flush()?;
}
// sleep for test
std::thread::sleep(std::time::Duration::from_millis(5));
}
println!();
if show {
println!();
}
self.tokenizer.clear();
Ok(())
ys.push(Y::default().with_texts(&[y_text]));
Ok(ys)
}
pub fn batch_visual(&self) -> usize {


@ -1,7 +1,7 @@
use crate::{ops, MinOptMax, Options, OrtEngine};
use crate::{ops, Embedding, MinOptMax, Options, OrtEngine};
use anyhow::Result;
use image::DynamicImage;
use ndarray::{Array, Array2, Axis, IxDyn};
use ndarray::{Array, Array2, IxDyn};
use tokenizers::{PaddingDirection, PaddingParams, PaddingStrategy, Tokenizer};
#[derive(Debug)]
@ -52,7 +52,7 @@ impl Clip {
})
}
pub fn encode_images(&self, xs: &[DynamicImage]) -> Result<Array<f32, IxDyn>> {
pub fn encode_images(&self, xs: &[DynamicImage]) -> Result<Embedding> {
let xs_ = ops::resize(xs, self.height.opt as u32, self.width.opt as u32)?;
let xs_ = ops::normalize(xs_, 0.0, 255.0);
let xs_ = ops::standardize(
@ -61,11 +61,10 @@ impl Clip {
&[0.26862954, 0.2613026, 0.2757771],
);
let ys: Vec<Array<f32, IxDyn>> = self.visual.run(&[xs_])?;
let ys = ys[0].to_owned();
Ok(ys)
Ok(Embedding::new(ys[0].to_owned()))
}
pub fn encode_texts(&self, texts: &[String]) -> Result<Array<f32, IxDyn>> {
pub fn encode_texts(&self, texts: &[String]) -> Result<Embedding> {
let encodings = self
.tokenizer
.encode_batch(texts.to_owned(), false)
@ -76,23 +75,7 @@ impl Clip {
.collect();
let xs = Array2::from_shape_vec((texts.len(), self.context_length), xs)?.into_dyn();
let ys = self.textual.run(&[xs])?;
let ys = ys[0].to_owned();
Ok(ys)
}
pub fn get_similarity(
&self,
images_feats: &Array<f32, IxDyn>,
texts_feats: &Array<f32, IxDyn>,
) -> Result<Vec<Vec<f32>>> {
let images_feats = images_feats.clone().into_dimensionality::<ndarray::Ix2>()?;
let texts_feats = texts_feats.clone().into_dimensionality::<ndarray::Ix2>()?;
let matrix = images_feats.dot(&texts_feats.t()); // [M, N]
let exps = matrix.mapv(|x| x.exp()); //[M, N]
let stds = exps.sum_axis(Axis(1)); //[M, 1]
let matrix = exps / stds.insert_axis(Axis(1)); // [M, N]
let similarity: Vec<Vec<f32>> = matrix.axis_iter(Axis(0)).map(|row| row.to_vec()).collect();
Ok(similarity)
Ok(Embedding::new(ys[0].to_owned()))
}
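The `get_similarity` method removed in this hunk turns the image-by-text dot-product matrix into per-image probabilities with a row-wise softmax (`exp`, sum along axis 1, divide). A dependency-free sketch of that step is below, assuming nested `Vec<f32>` rows instead of `ndarray` arrays. Subtracting the row maximum before `exp` is an addition for numerical stability and is not in the removed code; it leaves the result mathematically unchanged.

```rust
/// Row-wise softmax over a matrix of logits (one row per image,
/// one column per text in the CLIP use case).
fn softmax_rows(logits: &[Vec<f32>]) -> Vec<Vec<f32>> {
    logits
        .iter()
        .map(|row| {
            // Shift by the row maximum so that exp() cannot overflow.
            let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = row.iter().map(|x| (x - max).exp()).collect();
            let sum: f32 = exps.iter().sum();
            exps.into_iter().map(|e| e / sum).collect()
        })
        .collect()
}
```

Each output row sums to one, so the largest entry in a row can be read as the best-matching text for that image.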
pub fn batch_visual(&self) -> usize {


@ -1,6 +1,6 @@
use crate::{ops, Bbox, DynConf, Mask, MinOptMax, Options, OrtEngine, Polygon, Ys};
use crate::{ops, DynConf, Mask, Mbr, MinOptMax, Options, OrtEngine, Y};
use anyhow::Result;
use image::{DynamicImage, ImageBuffer};
use image::DynamicImage;
use ndarray::{Array, Axis, IxDyn};
#[derive(Debug)]
@ -44,19 +44,20 @@ impl DB {
})
}
pub fn run(&mut self, xs: &[DynamicImage]) -> Result<Vec<Ys>> {
pub fn run(&mut self, xs: &[DynamicImage]) -> Result<Vec<Y>> {
let xs_ = ops::letterbox(xs, self.height.opt as u32, self.width.opt as u32, 144.0)?;
let xs_ = ops::normalize(xs_, 0.0, 255.0);
let xs_ = ops::standardize(xs_, &[0.485, 0.456, 0.406], &[0.229, 0.224, 0.225]);
let ys = self.engine.run(&[xs_])?;
let ys = self.postprocess(ys, xs)?;
Ok(ys)
self.postprocess(ys, xs)
}
pub fn postprocess(&self, xs: Vec<Array<f32, IxDyn>>, xs0: &[DynamicImage]) -> Result<Vec<Ys>> {
pub fn postprocess(&self, xs: Vec<Array<f32, IxDyn>>, xs0: &[DynamicImage]) -> Result<Vec<Y>> {
let mut ys = Vec::new();
for (idx, luma) in xs[0].axis_iter(Axis(0)).enumerate() {
let mut y_bbox = Vec::new();
let mut y_masks: Vec<Mask> = Vec::new();
let mut y_mbrs: Vec<Mbr> = Vec::new();
// reshape
let h = luma.dim()[1];
@ -64,15 +65,13 @@ impl DB {
let luma = luma.into_shape((h, w, 1))?.into_owned();
// build image from ndarray
let raw_vec = luma
let v = luma
.into_raw_vec()
.iter()
.map(|x| if x <= &self.binary_thresh { 0.0 } else { *x })
.collect::<Vec<_>>();
let mask_im: ImageBuffer<image::Luma<_>, Vec<f32>> =
ImageBuffer::from_raw(w as u32, h as u32, raw_vec)
.expect("Failed to create image from ndarray");
let mut mask_im = image::DynamicImage::from(mask_im);
let mut mask_im =
ops::build_dyn_image_from_raw(v, self.height() as u32, self.width() as u32);
// input image
let image_width = xs0[idx].width() as f32;
@ -94,37 +93,45 @@ impl DB {
imageproc::contours::find_contours_with_threshold(&mask_im, 1);
// loop
let mut y_masks: Vec<Mask> = Vec::new();
for contour in contours.iter() {
if contour.points.len() <= 1 {
if contour.border_type == imageproc::contours::BorderType::Hole
&& contour.points.len() <= 2
{
continue;
}
let polygon = Polygon::from_imageproc_points(&contour.points);
let perimeter = polygon.perimeter();
let delta = polygon.area() * ratio.round() * self.unclip_ratio / perimeter;
let polygon = polygon
// .simplify(6e-4 * perimeter)
.offset(delta, image_width, image_height)
let mask = Mask::default().with_points_imageproc(&contour.points);
let delta = mask.area() * ratio.round() as f64 * self.unclip_ratio as f64
/ mask.perimeter();
let mask = mask
.unclip(delta, image_width as f64, image_height as f64)
.resample(50)
// .simplify(6e-4)
.convex_hull();
let rect = polygon.find_min_rect();
if rect.height() < self.min_height || rect.width() < self.min_width {
continue;
}
let confidence = polygon.area() / rect.area();
if confidence < self.confs[0] {
continue;
}
y_bbox.push(Bbox::new(rect, 0, confidence, None));
y_masks.push(Mask {
polygon,
id: 0,
name: None,
});
}
ys.push(Ys::default().with_bboxes(&y_bbox).with_masks(&y_masks));
}
if let Some(bbox) = mask.bbox() {
if bbox.height() < self.min_height || bbox.width() < self.min_width {
continue;
}
let confidence = mask.area() as f32 / bbox.area();
if confidence < self.confs[0] {
continue;
}
y_bbox.push(bbox.with_confidence(confidence).with_id(0));
if let Some(mbr) = mask.mbr() {
y_mbrs.push(mbr.with_confidence(confidence).with_id(0));
}
y_masks.push(mask.with_id(0));
} else {
continue;
}
}
ys.push(
Y::default()
.with_bboxes(&y_bbox)
.with_masks(&y_masks)
.with_mbrs(&y_mbrs),
);
}
Ok(ys)
}


@ -15,5 +15,5 @@ pub use dinov2::Dinov2;
pub use rtdetr::RTDETR;
pub use rtmo::RTMO;
pub use svtr::SVTR;
pub use yolo::YOLO;
pub use yolo::{YOLOTask, YOLO};
pub use yolop::YOLOPv2;


@ -3,7 +3,7 @@ use image::DynamicImage;
use ndarray::{s, Array, Axis, IxDyn};
use regex::Regex;
use crate::{ops, Bbox, DynConf, MinOptMax, Options, OrtEngine, Rect, Ys};
use crate::{ops, Bbox, DynConf, MinOptMax, Options, OrtEngine, Y};
#[derive(Debug)]
pub struct RTDETR {
@ -55,15 +55,14 @@ impl RTDETR {
})
}
pub fn run(&mut self, xs: &[DynamicImage]) -> Result<Vec<Ys>> {
pub fn run(&mut self, xs: &[DynamicImage]) -> Result<Vec<Y>> {
let xs_ = ops::letterbox(xs, self.height() as u32, self.width() as u32, 144.0)?;
let xs_ = ops::normalize(xs_, 0.0, 255.0);
let ys = self.engine.run(&[xs_])?;
let ys = self.postprocess(ys, xs)?;
Ok(ys)
self.postprocess(ys, xs)
}
pub fn postprocess(&self, xs: Vec<Array<f32, IxDyn>>, xs0: &[DynamicImage]) -> Result<Vec<Ys>> {
pub fn postprocess(&self, xs: Vec<Array<f32, IxDyn>>, xs0: &[DynamicImage]) -> Result<Vec<Y>> {
const CXYWH_OFFSET: usize = 4; // cxcywh
let preds = &xs[0];
@ -98,20 +97,20 @@ impl RTDETR {
let y = (bbox[1] - bbox[3] / 2.) * self.height() as f32 / ratio;
let w = bbox[2] * self.width() as f32 / ratio;
let h = bbox[3] * self.height() as f32 / ratio;
let y_bbox = Bbox::new(
Rect::from_xywh(
x.max(0.0f32).min(width_original),
y.max(0.0f32).min(height_original),
w,
h,
),
id,
confidence,
self.names.as_ref().map(|names| names[id].clone()),
);
y_bboxes.push(y_bbox)
y_bboxes.push(
Bbox::default()
.with_xywh(
x.max(0.0f32).min(width_original),
y.max(0.0f32).min(height_original),
w,
h,
)
.with_confidence(confidence)
.with_id(id as isize)
.with_name(self.names.as_ref().map(|names| names[id].to_owned())),
)
}
ys.push(Ys::default().with_bboxes(&y_bboxes));
ys.push(Y::default().with_bboxes(&y_bboxes));
}
Ok(ys)
}
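The RTDETR post-processing above maps a normalized `cxcywh` prediction back to the original image: scale by the model input size, divide by the letterbox ratio, and clamp the top-left corner to the image. The sketch below isolates that arithmetic. It assumes `ratio` is the usual letterbox factor `min(input_w / image_w, input_h / image_h)`, as computed in the YOLO code of this commit, and that the resized image sits at the top-left of the padded canvas (no offset is subtracted). The function name and tuple parameters are illustrative.

```rust
/// Convert a normalized (cx, cy, w, h) box in letterboxed model space to
/// (x, y, w, h) in original-image pixels, where (x, y) is the top-left corner.
fn rescale_cxcywh(bbox: [f32; 4], input_wh: (f32, f32), image_wh: (f32, f32)) -> [f32; 4] {
    // Scale factor that letterboxing applied to the original image.
    let ratio = (input_wh.0 / image_wh.0).min(input_wh.1 / image_wh.1);
    let x = (bbox[0] - bbox[2] / 2.0) * input_wh.0 / ratio;
    let y = (bbox[1] - bbox[3] / 2.0) * input_wh.1 / ratio;
    let w = bbox[2] * input_wh.0 / ratio;
    let h = bbox[3] * input_wh.1 / ratio;
    // Only the corner is clamped, mirroring the code in the diff.
    [
        x.max(0.0).min(image_wh.0),
        y.max(0.0).min(image_wh.1),
        w,
        h,
    ]
}
```

For a 1280x720 image and a 640x640 input the ratio is 0.5, so every model-space pixel corresponds to two image pixels.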


@ -2,7 +2,7 @@ use anyhow::Result;
use image::DynamicImage;
use ndarray::{Array, Axis, IxDyn};
use crate::{ops, Bbox, DynConf, Keypoint, MinOptMax, Options, OrtEngine, Ys};
use crate::{ops, Bbox, DynConf, Keypoint, MinOptMax, Options, OrtEngine, Y};
#[derive(Debug)]
pub struct RTMO {
@ -38,15 +38,14 @@ impl RTMO {
})
}
pub fn run(&mut self, xs: &[DynamicImage]) -> Result<Vec<Ys>> {
pub fn run(&mut self, xs: &[DynamicImage]) -> Result<Vec<Y>> {
let xs_ = ops::letterbox(xs, self.height() as u32, self.width() as u32, 114.0)?;
let ys = self.engine.run(&[xs_])?;
let ys = self.postprocess(ys, xs)?;
Ok(ys)
self.postprocess(ys, xs)
}
pub fn postprocess(&self, xs: Vec<Array<f32, IxDyn>>, xs0: &[DynamicImage]) -> Result<Vec<Ys>> {
let mut ys: Vec<Ys> = Vec::new();
pub fn postprocess(&self, xs: Vec<Array<f32, IxDyn>>, xs0: &[DynamicImage]) -> Result<Vec<Y>> {
let mut ys: Vec<Y> = Vec::new();
let (preds_bboxes, preds_kpts) = if xs[0].ndim() == 3 {
(&xs[0], &xs[1])
} else {
@ -78,20 +77,18 @@ impl RTMO {
if confidence < self.confs[0] {
continue;
}
let y_bbox = Bbox::new(
(
(
y_bboxes.push(
Bbox::default()
.with_xyxy(
x1.max(0.0f32).min(width_original),
y1.max(0.0f32).min(height_original),
),
(x2, y2),
)
.into(),
0,
confidence,
Some(String::from("Person")),
x2,
y2,
)
.with_confidence(confidence)
.with_id(0isize)
.with_name(Some(String::from("Person"))),
);
y_bboxes.push(y_bbox);
// keypoints
let mut kpts_ = Vec::new();
@ -102,21 +99,20 @@ impl RTMO {
if c < self.kconfs[i] {
kpts_.push(Keypoint::default());
} else {
kpts_.push(Keypoint::new(
(
x.max(0.0f32).min(width_original),
y.max(0.0f32).min(height_original),
)
.into(),
c,
i as isize,
None, // Name
));
kpts_.push(
Keypoint::default()
.with_id(i as isize)
.with_confidence(c)
.with_xy(
x.max(0.0f32).min(width_original),
y.max(0.0f32).min(height_original),
),
);
}
}
y_kpts.push(kpts_);
}
ys.push(Ys::default().with_bboxes(&y_bboxes).with_keypoints(&y_kpts));
ys.push(Y::default().with_bboxes(&y_bboxes).with_keypoints(&y_kpts));
}
Ok(ys)
}


@ -1,8 +1,9 @@
use crate::{ops, DynConf, MinOptMax, Options, OrtEngine};
use anyhow::Result;
use image::DynamicImage;
use ndarray::{Array, Axis, IxDyn};
use crate::{ops, DynConf, MinOptMax, Options, OrtEngine, Y};
#[derive(Debug)]
pub struct SVTR {
engine: OrtEngine,
@ -41,18 +42,17 @@ impl SVTR {
})
}
pub fn run(&mut self, xs: &[DynamicImage]) -> Result<Vec<String>> {
pub fn run(&mut self, xs: &[DynamicImage]) -> Result<Vec<Y>> {
let xs_ =
ops::resize_with_fixed_height(xs, self.height.opt as u32, self.width.opt as u32, 0.0)?;
let xs_ = ops::normalize(xs_, 0.0, 255.0);
let ys: Vec<Array<f32, IxDyn>> = self.engine.run(&[xs_])?;
let ys = ys[0].to_owned();
self.postprocess(&ys)
}
pub fn postprocess(&self, output: &Array<f32, IxDyn>) -> Result<Vec<String>> {
let mut texts: Vec<String> = Vec::new();
pub fn postprocess(&self, output: &Array<f32, IxDyn>) -> Result<Vec<Y>> {
let mut ys: Vec<Y> = Vec::new();
for batch in output.axis_iter(Axis(0)) {
let preds = batch
.axis_iter(Axis(0))
@ -72,7 +72,6 @@ impl SVTR {
}
if idx == 0 || idx == self.vocab.len() - 1 {
text_ids.push(*text_id);
return text_ids;
}
@ -85,9 +84,9 @@ impl SVTR {
.map(|idx| self.vocab[idx].to_owned())
.collect::<String>();
texts.push(text);
ys.push(Y::default().with_texts(&[text]))
}
Ok(texts)
Ok(ys)
}
}


@ -4,20 +4,18 @@ use image::DynamicImage;
use ndarray::{s, Array, Axis, IxDyn};
use regex::Regex;
use crate::{
ops, Bbox, DynConf, Embedding, Keypoint, Mask, MinOptMax, Options, OrtEngine, Point, Rect, Ys,
};
use crate::{ops, Bbox, DynConf, Keypoint, Mask, Mbr, MinOptMax, Options, OrtEngine, Prob, Y};
const CXYWH_OFFSET: usize = 4;
const KPT_STEP: usize = 3;
#[derive(Debug, Clone, ValueEnum)]
enum YOLOTask {
pub enum YOLOTask {
Classify,
Detect,
Pose,
Segment,
Obb, // TODO
Obb,
}
#[derive(Debug)]
@ -37,6 +35,8 @@ pub struct YOLO {
names_kpt: Option<Vec<String>>,
apply_nms: bool,
anchors_first: bool,
conf_independent: bool,
apply_probs_softmax: bool,
}
impl YOLO {
@ -47,16 +47,21 @@ impl YOLO {
engine.height().to_owned(),
engine.width().to_owned(),
);
let task = match engine
.try_fetch("task")
.unwrap_or("detect".to_string())
.as_str()
{
"classify" => YOLOTask::Classify,
"detect" => YOLOTask::Detect,
"pose" => YOLOTask::Pose,
"segment" => YOLOTask::Segment,
x => todo!("{:?} is not supported for now!", x),
let task = match &options.yolo_task {
Some(task) => task.to_owned(),
None => match engine
.try_fetch("task")
.unwrap_or("detect".to_string())
.as_str()
{
"classify" => YOLOTask::Classify,
"detect" => YOLOTask::Detect,
"pose" => YOLOTask::Pose,
"segment" => YOLOTask::Segment,
"obb" => YOLOTask::Obb,
x => todo!("{:?} is not supported for now!", x),
},
};
// try from custom class names, and then model metadata
@ -119,219 +124,275 @@ impl YOLO {
names,
names_kpt,
anchors_first: options.anchors_first,
conf_independent: options.conf_independent,
apply_probs_softmax: options.apply_probs_softmax,
})
}
pub fn run(&mut self, xs: &[DynamicImage]) -> Result<Vec<Ys>> {
let xs_ = ops::letterbox(xs, self.height() as u32, self.width() as u32, 144.0)?;
pub fn run(&mut self, xs: &[DynamicImage]) -> Result<Vec<Y>> {
let xs_ = match self.task {
YOLOTask::Classify => ops::resize(xs, self.height() as u32, self.width() as u32)?,
_ => ops::letterbox(xs, self.height() as u32, self.width() as u32, 114.0)?,
};
let xs_ = ops::normalize(xs_, 0.0, 255.0);
let ys = self.engine.run(&[xs_])?;
let ys = self.postprocess(ys, xs)?;
Ok(ys)
self.postprocess(ys, xs)
}
pub fn postprocess(&self, xs: Vec<Array<f32, IxDyn>>, xs0: &[DynamicImage]) -> Result<Vec<Ys>> {
if let YOLOTask::Classify = self.task {
let mut ys = Vec::new();
for batch in xs[0].axis_iter(Axis(0)) {
ys.push(
Ys::default()
.with_probs(Embedding::new(batch.into_owned(), self.names.to_owned())),
);
}
Ok(ys)
} else {
let (preds, protos) = if xs.len() == 2 {
if xs[0].ndim() == 3 {
(&xs[0], Some(&xs[1]))
} else {
(&xs[1], Some(&xs[0]))
}
} else {
(&xs[0], None)
};
pub fn postprocess(&self, xs: Vec<Array<f32, IxDyn>>, xs0: &[DynamicImage]) -> Result<Vec<Y>> {
let mut ys = Vec::new();
let protos = if xs.len() == 2 { Some(&xs[1]) } else { None };
for (idx, preds) in xs[0].axis_iter(Axis(0)).enumerate() {
let image_width = xs0[idx].width() as f32;
let image_height = xs0[idx].height() as f32;
let mut ys = Vec::new();
for (idx, anchor) in preds.axis_iter(Axis(0)).enumerate() {
let width_original = xs0[idx].width() as f32;
let height_original = xs0[idx].height() as f32;
let ratio = (self.width() as f32 / width_original)
.min(self.height() as f32 / height_original);
#[allow(clippy::type_complexity)]
let mut data: Vec<(Bbox, Option<Vec<Keypoint>>, Option<Vec<f32>>)> = Vec::new();
for pred in anchor.axis_iter(if self.anchors_first { Axis(0) } else { Axis(1) }) {
// split preds for different tasks
let bbox = pred.slice(s![0..CXYWH_OFFSET]);
let clss = pred.slice(s![CXYWH_OFFSET..CXYWH_OFFSET + self.nc]);
let kpts = {
if let YOLOTask::Pose = self.task {
Some(pred.slice(s![pred.len() - KPT_STEP * self.nk..]))
} else {
None
}
};
let coefs = {
if let YOLOTask::Segment = self.task {
Some(pred.slice(s![pred.len() - self.nm..]).to_vec())
} else {
None
}
// decode
match self.task {
YOLOTask::Classify => {
let y = if self.apply_probs_softmax {
let exps = preds.mapv(|x| x.exp());
let stds = exps.sum_axis(Axis(0));
exps / stds
} else {
preds.into_owned()
};
// confidence and index
let (id, &confidence) = clss
.into_iter()
.enumerate()
.reduce(|max, x| if x.1 > max.1 { x } else { max })
.unwrap();
// confidence filter
if confidence < self.confs[id] {
continue;
}
// bbox re-scale
let cx = bbox[0] / ratio;
let cy = bbox[1] / ratio;
let w = bbox[2] / ratio;
let h = bbox[3] / ratio;
let x = cx - w / 2.;
let y = cy - h / 2.;
let y_bbox = Bbox::new(
Rect::from_xywh(
x.max(0.0f32).min(width_original),
y.max(0.0f32).min(height_original),
w,
h,
ys.push(
Y::default().with_probs(
Prob::default()
.with_probs(&y.into_raw_vec())
.with_names(self.names.to_owned()),
),
id,
confidence,
self.names.as_ref().map(|names| names[id].to_owned()),
);
}
YOLOTask::Obb => {
let mut y_mbrs: Vec<Mbr> = Vec::new();
let ratio = (self.width() as f32 / image_width)
.min(self.height() as f32 / image_height);
for pred in preds.axis_iter(if self.anchors_first { Axis(0) } else { Axis(1) })
{
// pred layout: cx, cy, w, h, class scores, rotation (radians)
let xywh = pred.slice(s![0..CXYWH_OFFSET]);
let clss = pred.slice(s![CXYWH_OFFSET..CXYWH_OFFSET + self.nc]);
let radians = pred[pred.len() - 1];
let (id, &confidence) = clss
.into_iter()
.enumerate()
.max_by(|a, b| a.1.total_cmp(b.1))
.unwrap();
if confidence < self.confs[id] {
continue;
}
// kpts
let y_kpts = {
if let Some(kpts) = kpts {
let mut kpts_ = Vec::new();
for i in 0..self.nk {
let kx = kpts[KPT_STEP * i] / ratio;
let ky = kpts[KPT_STEP * i + 1] / ratio;
let kconf = kpts[KPT_STEP * i + 2];
if kconf < self.kconfs[i] {
kpts_.push(Keypoint::default());
} else {
kpts_.push(Keypoint::new(
Point::new(
kx.max(0.0f32).min(width_original),
ky.max(0.0f32).min(height_original),
),
kconf,
i as isize,
self.names_kpt.as_ref().map(|names| names[i].to_owned()),
));
}
}
Some(kpts_)
// re-scale
let cx = xywh[0] / ratio;
let cy = xywh[1] / ratio;
let w = xywh[2] / ratio;
let h = xywh[3] / ratio;
let (w, h, radians) = if w > h {
(w, h, radians)
} else {
None
(h, w, radians + std::f32::consts::PI / 2.)
};
let radians = radians % std::f32::consts::PI;
y_mbrs.push(
Mbr::from_cxcywhr(
cx as f64,
cy as f64,
w as f64,
h as f64,
radians as f64,
)
.with_confidence(confidence)
.with_id(id as isize)
.with_name(self.names.as_ref().map(|names| names[id].to_owned())),
);
}
ys.push(Y::default().with_mbrs(&y_mbrs).apply_mbrs_nms(self.iou));
}
_ => {
let mut y_bboxes: Vec<Bbox> = Vec::new();
let ratio = (self.width() as f32 / image_width)
.min(self.height() as f32 / image_height);
// bboxes
for (i, pred) in preds
.axis_iter(if self.anchors_first { Axis(0) } else { Axis(1) })
.enumerate()
{
let bbox = pred.slice(s![0..CXYWH_OFFSET]);
let (conf_, clss) = if self.conf_independent {
(
pred[CXYWH_OFFSET],
pred.slice(s![CXYWH_OFFSET + 1..CXYWH_OFFSET + self.nc + 1]),
)
} else {
(1.0, pred.slice(s![CXYWH_OFFSET..CXYWH_OFFSET + self.nc]))
};
let (id, &confidence) = clss
.into_iter()
.enumerate()
.max_by(|a, b| a.1.total_cmp(b.1))
.unwrap();
let confidence = confidence * conf_;
if confidence < self.confs[id] {
continue;
}
};
// merged
data.push((y_bbox, y_kpts, coefs));
}
// nms
if self.apply_nms {
Self::non_max_suppression(&mut data, self.iou);
}
// decode
let mut y_bboxes: Vec<Bbox> = Vec::new();
let mut y_kpts: Vec<Vec<Keypoint>> = Vec::new();
let mut y_masks: Vec<Mask> = Vec::new();
for elem in data.into_iter() {
if let Some(kpts) = elem.1 {
y_kpts.push(kpts)
// re-scale
let cx = bbox[0] / ratio;
let cy = bbox[1] / ratio;
let w = bbox[2] / ratio;
let h = bbox[3] / ratio;
let x = cx - w / 2.;
let y = cy - h / 2.;
let x = x.max(0.0).min(image_width);
let y = y.max(0.0).min(image_height);
let y_bbox = Bbox::default()
.with_xywh(x, y, w, h)
.with_confidence(confidence)
.with_id(id as isize)
.with_id_born(i as isize)
.with_name(self.names.as_ref().map(|names| names[id].to_owned()));
y_bboxes.push(y_bbox);
}
// decode masks
if let Some(coefs) = elem.2 {
let proto = protos.unwrap().slice(s![idx, .., .., ..]);
let (nm, nh, nw) = proto.dim();
// nms
let mut y = Y::default().with_bboxes(&y_bboxes);
if self.apply_nms {
y = y.apply_bboxes_nms(self.iou);
}
// coefs * proto -> mask
let coefs = Array::from_shape_vec((1, nm), coefs)?; // (n, nm)
let proto = proto.to_owned().into_shape((nm, nh * nw))?; // (nm, nh*nw)
let mask = coefs.dot(&proto).into_shape((nh, nw, 1))?; // (nh, nw, n)
// keypoints
if let YOLOTask::Pose = self.task {
if let Some(bboxes) = y.bboxes() {
let mut y_kpts: Vec<Vec<Keypoint>> = Vec::new();
for bbox in bboxes.iter() {
let pred = if self.anchors_first {
preds.slice(s![
bbox.id_born(),
preds.shape()[1] - KPT_STEP * self.nk..,
])
} else {
preds.slice(s![
preds.shape()[0] - KPT_STEP * self.nk..,
bbox.id_born(),
])
};
// build image from ndarray
let mask_im = ops::build_dyn_image_from_raw(
mask.into_raw_vec(),
nw as u32,
nh as u32,
);
// rescale masks
let mask_original = ops::descale_mask(
mask_im,
nw as f32,
nh as f32,
width_original,
height_original,
);
// crop mask with bbox
let mut mask_original = mask_original.into_luma8();
for y in 0..height_original as usize {
for x in 0..width_original as usize {
if x < elem.0.xmin() as usize
|| x > elem.0.xmax() as usize
|| y < elem.0.ymin() as usize
|| y > elem.0.ymax() as usize
{
mask_original.put_pixel(x as u32, y as u32, image::Luma([0u8]));
let mut kpts_: Vec<Keypoint> = Vec::new();
for i in 0..self.nk {
let kx = pred[KPT_STEP * i] / ratio;
let ky = pred[KPT_STEP * i + 1] / ratio;
let kconf = pred[KPT_STEP * i + 2];
if kconf < self.kconfs[i] {
kpts_.push(Keypoint::default());
} else {
kpts_.push(
Keypoint::default()
.with_id(i as isize)
.with_confidence(kconf)
.with_name(
self.names_kpt
.as_ref()
.map(|names| names[i].to_owned()),
)
.with_xy(
kx.max(0.0f32).min(image_width),
ky.max(0.0f32).min(image_height),
),
);
}
}
y_kpts.push(kpts_);
}
y = y.with_keypoints(&y_kpts);
}
// get masks from image
let masks = ops::get_masks_from_image(
mask_original,
1,
elem.0.id(),
elem.0.name().cloned(),
);
y_masks.extend(masks);
}
y_bboxes.push(elem.0);
// masks
if let YOLOTask::Segment = self.task {
if let Some(bboxes) = y.bboxes() {
let mut y_masks: Vec<Mask> = Vec::new();
for bbox in bboxes.iter() {
let coefs = if self.anchors_first {
preds
.slice(s![bbox.id_born(), preds.shape()[1] - self.nm..])
.to_vec()
} else {
preds
.slice(s![preds.shape()[0] - self.nm.., bbox.id_born()])
.to_vec()
};
let proto = protos.unwrap().slice(s![idx, .., .., ..]);
// coefs * proto -> mask
let (nm, nh, nw) = proto.dim();
let coefs = Array::from_shape_vec((1, nm), coefs)?; // (n, nm)
let proto = proto.to_owned().into_shape((nm, nh * nw))?; // (nm, nh*nw)
let mask = coefs.dot(&proto).into_shape((nh, nw, 1))?; // (nh, nw, n)
// build image from ndarray
let mask_im = ops::build_dyn_image_from_raw(
mask.into_raw_vec(),
nw as u32,
nh as u32,
);
// rescale masks
let mask_original = ops::descale_mask(
mask_im,
nw as f32,
nh as f32,
image_width,
image_height,
);
// crop mask with bbox
let mut mask_original = mask_original.into_luma8();
for y in 0..image_height as usize {
for x in 0..image_width as usize {
if x < bbox.xmin() as usize
|| x > bbox.xmax() as usize
|| y < bbox.ymin() as usize
|| y > bbox.ymax() as usize
{
mask_original.put_pixel(
x as u32,
y as u32,
image::Luma([0u8]),
);
}
}
}
// get masks from image
let mut masks: Vec<Mask> = Vec::new();
let contours: Vec<imageproc::contours::Contour<i32>> =
imageproc::contours::find_contours_with_threshold(
&mask_original,
1,
);
contours.iter().for_each(|contour| {
if contour.points.len() > 2 {
masks.push(
Mask::default()
.with_id(bbox.id())
.with_points_imageproc(&contour.points)
.with_name(bbox.name().cloned()),
);
}
});
y_masks.extend(masks);
}
y = y.with_masks(&y_masks);
}
}
ys.push(y);
}
// save result
ys.push(
Ys::default()
.with_bboxes(&y_bboxes)
.with_keypoints(&y_kpts)
.with_masks(&y_masks),
);
}
Ok(ys)
}
}
fn fetch_names(engine: &OrtEngine) -> Option<Vec<String>> {
// fetch class names from onnx metadata
// String format: `{0: 'person', 1: 'bicycle', 2: 'sports ball', ..., 27: "yellow_lady's_slipper"}`
engine.try_fetch("names").map(|names| {
let re = Regex::new(r#"(['"])([-()\w '"]+)(['"])"#).unwrap();
let mut names_ = vec![];
for (_, [_, name, _]) in re.captures_iter(&names).map(|x| x.extract()) {
names_.push(name.to_string());
}
names_
})
Ok(ys)
}
pub fn batch(&self) -> isize {
@ -346,28 +407,16 @@ impl YOLO {
self.height.opt
}
#[allow(clippy::type_complexity)]
fn non_max_suppression(
xs: &mut Vec<(Bbox, Option<Vec<Keypoint>>, Option<Vec<f32>>)>,
iou_threshold: f32,
) {
xs.sort_by(|b1, b2| b2.0.confidence().partial_cmp(&b1.0.confidence()).unwrap());
let mut current_index = 0;
for index in 0..xs.len() {
let mut drop = false;
for prev_index in 0..current_index {
let iou = xs[prev_index].0.iou(&xs[index].0);
if iou > iou_threshold {
drop = true;
break;
}
fn fetch_names(engine: &OrtEngine) -> Option<Vec<String>> {
// fetch class names from onnx metadata
// String format: `{0: 'person', 1: 'bicycle', 2: 'sports ball', ..., 27: "yellow_lady's_slipper"}`
engine.try_fetch("names").map(|names| {
let re = Regex::new(r#"(['"])([-()\w '"]+)(['"])"#).unwrap();
let mut names_ = vec![];
for (_, [_, name, _]) in re.captures_iter(&names).map(|x| x.extract()) {
names_.push(name.to_string());
}
if !drop {
xs.swap(current_index, index);
current_index += 1;
}
}
xs.truncate(current_index);
names_
})
}
}
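
The `non_max_suppression` helper removed in the hunk above is a greedy, confidence-sorted NMS. A minimal standalone sketch of the same idea follows. `SimpleBox` and `greedy_nms` are hypothetical names used only for illustration and are not the crate's `Bbox` or `apply_bboxes_nms` API. `total_cmp` is swapped in for `partial_cmp(...).unwrap()` so a NaN confidence cannot panic.

```rust
/// Hypothetical axis-aligned box, used only for this sketch (not the crate's `Bbox`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SimpleBox {
    pub xmin: f32,
    pub ymin: f32,
    pub xmax: f32,
    pub ymax: f32,
    pub confidence: f32,
}

impl SimpleBox {
    /// Area of the box; degenerate boxes count as zero.
    fn area(&self) -> f32 {
        (self.xmax - self.xmin).max(0.0) * (self.ymax - self.ymin).max(0.0)
    }

    /// Intersection over union with another box.
    pub fn iou(&self, other: &SimpleBox) -> f32 {
        let w = (self.xmax.min(other.xmax) - self.xmin.max(other.xmin)).max(0.0);
        let h = (self.ymax.min(other.ymax) - self.ymin.max(other.ymin)).max(0.0);
        let inter = w * h;
        let union = self.area() + other.area() - inter;
        if union > 0.0 {
            inter / union
        } else {
            0.0
        }
    }
}

/// Greedy NMS: sort by descending confidence, then keep a box only if its IoU
/// with every already-kept box is at most `iou_threshold`.
pub fn greedy_nms(boxes: &mut Vec<SimpleBox>, iou_threshold: f32) {
    boxes.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));
    let mut kept = 0;
    for index in 0..boxes.len() {
        // Compare the candidate against the boxes kept so far (stored at 0..kept).
        let suppressed = (0..kept).any(|k| boxes[k].iou(&boxes[index]) > iou_threshold);
        if !suppressed {
            boxes.swap(kept, index);
            kept += 1;
        }
    }
    boxes.truncate(kept);
}
```

Kept boxes are compacted to the front of the vector in place, so no second allocation is needed; the final `truncate` drops the suppressed tail.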


@ -2,7 +2,7 @@ use anyhow::Result;
use image::DynamicImage;
use ndarray::{s, Array, Axis, IxDyn};
use crate::{ops, Bbox, DynConf, MinOptMax, Options, OrtEngine, Rect, Ys};
use crate::{ops, Bbox, DynConf, Mask, MinOptMax, Options, OrtEngine, Y};
#[derive(Debug)]
pub struct YOLOPv2 {
@ -36,17 +36,16 @@ impl YOLOPv2 {
})
}
pub fn run(&mut self, xs: &[DynamicImage]) -> Result<Vec<Ys>> {
pub fn run(&mut self, xs: &[DynamicImage]) -> Result<Vec<Y>> {
let xs_ = ops::letterbox(xs, self.height() as u32, self.width() as u32, 114.0)?;
let xs_ = ops::normalize(xs_, 0.0, 255.0);
let ys = self.engine.run(&[xs_])?;
let ys = self.postprocess(ys, xs)?;
Ok(ys)
self.postprocess(ys, xs)
}
pub fn postprocess(&self, xs: Vec<Array<f32, IxDyn>>, xs0: &[DynamicImage]) -> Result<Vec<Ys>> {
pub fn postprocess(&self, xs: Vec<Array<f32, IxDyn>>, xs0: &[DynamicImage]) -> Result<Vec<Y>> {
let mut ys: Vec<Y> = Vec::new();
let (xs_da, xs_ll, xs_det) = (&xs[0], &xs[1], &xs[2]);
let mut ys: Vec<Ys> = Vec::new();
for (idx, ((x_det, x_ll), x_da)) in xs_det
.axis_iter(Axis(0))
.zip(xs_ll.axis_iter(Axis(0)))
@ -63,7 +62,7 @@ impl YOLOPv2 {
);
// Vehicle
let mut ys_bbox = Vec::new();
let mut y_bboxes = Vec::new();
for x in x_det.axis_iter(Axis(0)) {
let bbox = x.slice(s![0..4]);
let clss = x.slice(s![5..]).to_owned();
@ -83,19 +82,15 @@ impl YOLOPv2 {
let h = bbox[3] / ratio;
let x = cx - w / 2.;
let y = cy - h / 2.;
ys_bbox.push(Bbox::new(
Rect::from_xywh(
x.max(0.0f32).min(image_width),
y.max(0.0f32).min(image_height),
w,
h,
),
id,
conf,
None,
));
let x = x.max(0.0).min(image_width);
let y = y.max(0.0).min(image_height);
y_bboxes.push(
Bbox::default()
.with_xywh(x, y, w, h)
.with_confidence(conf)
.with_id(id as isize),
);
}
Ys::non_max_suppression(&mut ys_bbox, self.iou);
// Drivable area
let x_da_0 = x_da.slice(s![0, .., ..]).to_owned();
@ -119,8 +114,21 @@ impl YOLOPv2 {
image_height,
);
let mask_da = mask_da.into_luma8();
let mut y_masks =
ops::get_masks_from_image(mask_da, 1, 0, Some("Drivable area".to_string()));
let mut y_masks: Vec<Mask> = Vec::new();
let contours: Vec<imageproc::contours::Contour<i32>> =
imageproc::contours::find_contours_with_threshold(&mask_da, 1);
contours.iter().for_each(|contour| {
if contour.border_type == imageproc::contours::BorderType::Outer
&& contour.points.len() > 2
{
y_masks.push(
Mask::default()
.with_id(0)
.with_points_imageproc(&contour.points)
.with_name(Some("Drivable area".to_string())),
);
}
});
// Lane line
let x_ll = x_ll
@ -141,9 +149,30 @@ impl YOLOPv2 {
image_height,
);
let mask_ll = mask_ll.into_luma8();
let masks = ops::get_masks_from_image(mask_ll, 1, 5, Some("Lane line".to_string()));
let contours: Vec<imageproc::contours::Contour<i32>> =
imageproc::contours::find_contours_with_threshold(&mask_ll, 1);
let mut masks: Vec<Mask> = Vec::new();
contours.iter().for_each(|contour| {
if contour.border_type == imageproc::contours::BorderType::Outer
&& contour.points.len() > 2
{
masks.push(
Mask::default()
.with_id(1)
.with_points_imageproc(&contour.points)
.with_name(Some("Lane line".to_string())),
);
}
});
y_masks.extend(masks);
ys.push(Ys::default().with_bboxes(&ys_bbox).with_masks(&y_masks));
// save
ys.push(
Y::default()
.with_bboxes(&y_bboxes)
.with_masks(&y_masks)
.apply_bboxes_nms(self.iou),
);
}
Ok(ys)
}
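
The vehicle branch of `postprocess` above converts a centre-format box in model space into a top-left box in image space and clamps the corner. A minimal sketch of that arithmetic, assuming `ratio` is the letterbox scale factor (model size divided by image size). The function name `descale_cxcywh` is hypothetical. As in the diff, only the top-left corner is clamped, while width and height are left as computed.

```rust
/// Convert a centre-format box (cx, cy, w, h) in model space to a top-left
/// box (x, y, w, h) in original-image space.
///
/// `ratio` is assumed to be the letterbox scale (model size / image size).
pub fn descale_cxcywh(
    cx: f32,
    cy: f32,
    w: f32,
    h: f32,
    ratio: f32,
    image_width: f32,
    image_height: f32,
) -> (f32, f32, f32, f32) {
    // Undo the letterbox scaling.
    let (cx, cy, w, h) = (cx / ratio, cy / ratio, w / ratio, h / ratio);
    // Centre -> top-left corner.
    let x = cx - w / 2.0;
    let y = cy - h / 2.0;
    // Clamp the corner to the image bounds; width and height are untouched.
    let x = x.max(0.0).min(image_width);
    let y = y.max(0.0).min(image_height);
    (x, y, w, h)
}
```

For example, a box centred at (320, 320) with size 100x50 and `ratio = 0.5` maps to a 200x100 box whose top-left corner is (540, 590).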

src/utils/coco.rs Normal file

@ -0,0 +1,121 @@
pub const SKELETONS_16: [(usize, usize); 16] = [
(0, 1),
(0, 2),
(1, 3),
(2, 4),
(5, 6),
(5, 11),
(6, 12),
(11, 12),
(5, 7),
(6, 8),
(7, 9),
(8, 10),
(11, 13),
(12, 14),
(13, 15),
(14, 16),
];
pub const KEYPOINTS_NAMES_17: [&str; 17] = [
"nose",
"left_eye",
"right_eye",
"left_ear",
"right_ear",
"left_shoulder",
"right_shoulder",
"left_elbow",
"right_elbow",
"left_wrist",
"right_wrist",
"left_hip",
"right_hip",
"left_knee",
"right_knee",
"left_ankle",
"right_ankle",
];
pub const NAMES_80: [&str; 80] = [
"person",
"bicycle",
"car",
"motorcycle",
"airplane",
"bus",
"train",
"truck",
"boat",
"traffic light",
"fire hydrant",
"stop sign",
"parking meter",
"bench",
"bird",
"cat",
"dog",
"horse",
"sheep",
"cow",
"elephant",
"bear",
"zebra",
"giraffe",
"backpack",
"umbrella",
"handbag",
"tie",
"suitcase",
"frisbee",
"skis",
"snowboard",
"sports ball",
"kite",
"baseball bat",
"baseball glove",
"skateboard",
"surfboard",
"tennis racket",
"bottle",
"wine glass",
"cup",
"fork",
"knife",
"spoon",
"bowl",
"banana",
"apple",
"sandwich",
"orange",
"broccoli",
"carrot",
"hot dog",
"pizza",
"donut",
"cake",
"chair",
"couch",
"potted plant",
"bed",
"dining table",
"toilet",
"tv",
"laptop",
"mouse",
"remote",
"keyboard",
"cell phone",
"microwave",
"oven",
"toaster",
"sink",
"refrigerator",
"book",
"clock",
"vase",
"scissors",
"teddy bear",
"hair drier",
"toothbrush",
];
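
Each pair in `SKELETONS_16` indexes into `KEYPOINTS_NAMES_17`. A minimal sketch of resolving the pairs into named edges, with both constants copied from the file above so the block stands alone. The helper `skeleton_edge_names` is a hypothetical name and not part of the crate.

```rust
pub const SKELETONS_16: [(usize, usize); 16] = [
    (0, 1), (0, 2), (1, 3), (2, 4),
    (5, 6), (5, 11), (6, 12), (11, 12),
    (5, 7), (6, 8), (7, 9), (8, 10),
    (11, 13), (12, 14), (13, 15), (14, 16),
];

pub const KEYPOINTS_NAMES_17: [&str; 17] = [
    "nose",
    "left_eye",
    "right_eye",
    "left_ear",
    "right_ear",
    "left_shoulder",
    "right_shoulder",
    "left_elbow",
    "right_elbow",
    "left_wrist",
    "right_wrist",
    "left_hip",
    "right_hip",
    "left_knee",
    "right_knee",
    "left_ankle",
    "right_ankle",
];

/// Resolve every skeleton edge to the names of the two keypoints it connects.
pub fn skeleton_edge_names() -> Vec<(&'static str, &'static str)> {
    SKELETONS_16
        .iter()
        .map(|&(a, b)| (KEYPOINTS_NAMES_17[a], KEYPOINTS_NAMES_17[b]))
        .collect()
}
```

The first edge resolves to `("nose", "left_eye")` and the last to `("right_knee", "right_ankle")`. A renderer could use the same index pairs to draw a line between two keypoints whenever both are present.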

Some files were not shown because too many files have changed in this diff.