mirror of
https://github.com/mii443/tokenizers.git
synced 2025-12-06 04:38:23 +00:00
Add Strip normalizer (#140)
* WIP strip.
  Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Rust StripNormalizer
  Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Allow to specify strip direction
  Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Renamed StripNormalizer to Strip
  Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added Python binding.
  Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Makes Strip python compatible with pythonic constructor.
  Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Run RustFmt
  Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Clippy next ofc.
  Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Move lstrip and rstrip on NormalizedString
  Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Implement strip() for normalizer + unittests.
  Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Add some more unittests on edge cases.
  Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* clippy and fmt.
  Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Simplify strip and fix offsets
* Python - Update strip bindings with default values

Co-authored-by: MOI Anthony <xn1t0x@gmail.com>
@@ -76,6 +76,7 @@ fn normalizers(_py: Python, m: &PyModule) -> PyResult<()> {
     m.add_class::<normalizers::NFKC>()?;
     m.add_class::<normalizers::Sequence>()?;
     m.add_class::<normalizers::Lowercase>()?;
+    m.add_class::<normalizers::Strip>()?;
     Ok(())
 }
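The commit message mentions a strip direction and corrected offsets, but this page shows only the Python registration line. As a rough illustration of that behavior, here is a minimal standalone sketch. It is not the crate's implementation: the `Strip` struct below, its `strip_left` and `strip_right` fields, and the `normalize` signature are hypothetical stand-ins. It works on a plain `&str` rather than the crate's `NormalizedString`, and it reports the surviving byte range in place of real offset tracking.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the normalizer added in this commit.
/// Field names and method signature are assumptions, not the crate's API.
#[derive(Debug, Clone, Copy)]
pub struct Strip {
    pub strip_left: bool,
    pub strip_right: bool,
}

impl Strip {
    pub fn new(strip_left: bool, strip_right: bool) -> Self {
        Strip { strip_left, strip_right }
    }

    /// Returns the stripped slice together with the byte range it
    /// occupied in `input`, so callers can map back to the original text.
    pub fn normalize<'a>(&self, input: &'a str) -> (&'a str, Range<usize>) {
        // Strip the left side first and record how many bytes were dropped.
        let after_left = if self.strip_left {
            input.trim_start()
        } else {
            input
        };
        let start = input.len() - after_left.len();

        // Strip the right side of what remains, so an all-whitespace input
        // yields an empty range instead of start > end.
        let kept = if self.strip_right {
            after_left.trim_end()
        } else {
            after_left
        };
        let end = start + kept.len();

        (kept, start..end)
    }
}
```

With this sketch, `Strip::new(true, false)` drops only leading whitespace and `Strip::new(false, true)` only trailing whitespace. Applying the right strip to the already left-stripped remainder keeps the returned range valid for all-whitespace input.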