* Rust - add a CTCDecoder as a separate mod
* Adding bindings to Node + Python.
* Clippy update.
* Stub.
* Fixing roberta.json URLs.
* Moving test files to hf.co.
* Update cargo check and clippy to 1.52.
* Inner ':' is actually used for domains in Sphinx.
Making `domain` work correctly was just too much work, so I went the easy
way and used global roles for the custom Rust extension.
* Update struct naming and docs
* Update changelog
Co-authored-by: Thomaub <github.thomaub@gmail.com>
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
* Draft functionality of visualization
* Added comments to make code more intelligible
* polish the styles
* Ensure colors are stable and comment the css
* Code clean up
* Made visualizer importable and added some docs
* Fix styling
* address review comments from PR
* Fixed the regex for UNK tokens and examples in notebook
* Converted docs to google format
* Added a notebook showing multiple languages and tokenizers
* Added visual indication of chars that are tokenized with >1 token
* Reorganize things a bit and fix import
* Update docs
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
* start playing around
* make a first version
* refactor
* apply make format
* add python bindings
* add some python binding tests
* correct pre-tokenizers
* update auto-generated bindings
* lint python bindings
* add code node
* add split to docs
* refactor python binding a bit
* cargo fmt
* clippy and fmt in node
* quick updates and fixes
* Oops
* Update node typings
* Update changelog
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>