Char based delimiter splitting - TransfoXL (#114)

* WIP delimiter splitter

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bind on Python side.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add missing delimiter parameter in CharDelimiterSplit constructor.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Attempt to provide CharDelimiterSplit for node.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Apply Rust formatting.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* fix bindings node

Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
This commit is contained in:
Funtowicz Morgan
2020-02-04 16:23:00 +00:00
committed by GitHub
parent 3adb220973
commit 6165910ca6
9 changed files with 112 additions and 7 deletions

View File

@@ -40,6 +40,7 @@ fn pre_tokenizers(_py: Python, m: &PyModule) -> PyResult<()> {
m.add_class::<pre_tokenizers::WhitespaceSplit>()?;
m.add_class::<pre_tokenizers::BertPreTokenizer>()?;
m.add_class::<pre_tokenizers::Metaspace>()?;
m.add_class::<pre_tokenizers::CharDelimiterSplit>()?;
Ok(())
}