* Added lookup table model mapping each string to its id in a vocab map.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* RustFmt
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Formatting.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix invalid void return on Rust side.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Python binding for LookupTable model
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Enable loading from Python's side.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Renamed LookupTable to WordLevel
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* RustFmt happy now.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* clippy happy now.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing mismatching names.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing mismatching names (one missing).
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix invalid method bindings on Python side.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Introduce a factory function to create a normalizer instance from the name of a Unicode normalizer.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Rename BPETokenizer to CharBPETokenizer for clarity
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Give more flexibility in the way CharBPETokenizer handles normalizer creation.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Change .pyi file to reflect the Normalizer hierarchy
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Make ByteLevelBPE as flexible for normalization as CharBPE.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added RobertaProcessor on Rust side.
Required to match the double separator token that RoBERTa places between the two sequences of a pair.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix typo in RobertaProcessing method declaration
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Correctly include RobertaProcessor in the Python binding
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Roberta doesn't use token_type_ids so let's set everything to 0
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Attempt to make it work on the Node side too.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* fix js bindings / `npm run lint`
* Make RustFmt happy.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>