Mirror of https://github.com/mii443/tokenizers.git (synced 2025-08-22 16:25:30 +00:00)
Add a visualization utility to render tokens and annotations in a notebook (#508)
* Draft functionality of visualization
* Added comments to make code more intelligible
* Polish the styles
* Ensure colors are stable and comment the CSS
* Code clean up
* Made visualizer importable and added some docs
* Fix styling
* Implement comments from PR
* Fixed the regex for UNK tokens and examples in notebook
* Converted docs to Google format
* Added a notebook showing multiple languages and tokenizers
* Added visual indication of chars that are tokenized with >1 token
* Reorganize things a bit and fix import
* Update docs

Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
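One item above adds a visual indication for characters that are tokenized with more than one token. The sketch below shows one way such characters can be detected from per-token character offsets. It assumes offsets are half-open `(start, end)` character spans like those exposed by `Encoding.offsets` in `tokenizers`; the helper name `multi_token_chars` is hypothetical and this is not the visualizer's actual implementation.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def multi_token_chars(offsets: Iterable[Tuple[int, int]]) -> List[int]:
    """Return sorted indices of characters covered by more than one token.

    Hypothetical helper: assumes each offset is a half-open (start, end)
    character span. Special tokens with (0, 0) offsets cover no characters
    and are therefore ignored.
    """
    coverage = Counter()
    for start, end in offsets:
        for i in range(start, end):
            coverage[i] += 1
    return sorted(i for i, count in coverage.items() if count > 1)


# Assumed example: a byte-level tokenizer may split one multi-byte character
# (here at index 3) into several tokens that report the same character span.
print(multi_token_chars([(0, 3), (3, 4), (3, 4), (4, 5)]))  # [3]
```

Counting coverage per character, rather than comparing neighbouring tokens only, also catches overlaps between non-adjacent tokens.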
@@ -41,6 +41,7 @@ setup(
         "tokenizers.processors",
         "tokenizers.trainers",
         "tokenizers.implementations",
+        "tokenizers.tools",
     ],
     package_data={
         "tokenizers": ["py.typed", "__init__.pyi"],
@@ -51,6 +52,7 @@ setup(
         "tokenizers.processors": ["py.typed", "__init__.pyi"],
         "tokenizers.trainers": ["py.typed", "__init__.pyi"],
         "tokenizers.implementations": ["py.typed"],
+        "tokenizers.tools": ["py.typed", "visualizer-styles.css"],
     },
     zip_safe=False,
 )
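The second hunk ships `visualizer-styles.css` as package data of `tokenizers.tools`, so the stylesheet is installed alongside the module instead of being looked up in the source tree. The sketch below reads such a file at runtime with `importlib.resources` (Python 3.9+). The helper `read_package_text` is hypothetical, and this is not necessarily how the visualizer itself loads its stylesheet.

```python
from importlib import resources


def read_package_text(package: str, resource: str, encoding: str = "utf-8") -> str:
    """Read a text data file shipped inside an installed package.

    Hypothetical helper: works for files that end up next to the package's
    modules, e.g. files declared through `package_data` in setup().
    """
    return resources.files(package).joinpath(resource).read_text(encoding=encoding)


if __name__ == "__main__":
    try:
        css = read_package_text("tokenizers.tools", "visualizer-styles.css")
        print(css[:200])
    except (ModuleNotFoundError, FileNotFoundError):
        # tokenizers is not installed, or the installed version lacks the file
        print("visualizer-styles.css is not available")
```

Without the `package_data` entry, a non-Python file like this CSS is typically left out of built wheels, and the lookup above would raise `FileNotFoundError`.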