Commit Graph

30 Commits

SHA1 Message Date
2419c14e42 ByteLevel is also a Decoder 2019-11-21 11:52:55 -05:00
56e37475c3 Add Decoder to Tokenizer 2019-11-21 11:51:43 -05:00
3ec26b332c Add Tokenizer token_to_id/id_to_token 2019-11-20 17:28:28 -05:00
8b3d7d1aa0 Add vocab/merge arguments to example.py 2019-11-20 16:47:02 -05:00
98323d1f21 Update readme and fix example 2019-11-19 19:38:57 -05:00
351d526e1e Basic python bindings 2019-11-19 19:31:37 -05:00
39afc64e13 impl PreTokenizer for Whitespace 2019-11-19 19:31:37 -05:00
2d7c5f04f8 Fix readme indentation 2019-11-18 16:34:13 -05:00
1b32560067 Update readme with simple example 2019-11-18 16:31:35 -05:00
872aa86b71 Basic cli for testing 2019-11-18 15:47:35 -05:00
4e5106989f Ability to load a BPE model from files 2019-11-18 10:00:53 -05:00
0b450d62ff Add ByteLevel pre tokenizer 2019-11-17 00:40:22 -05:00
a55dccafb5 Add BPE training 2019-11-17 00:28:36 -05:00
1c7dcebca7 Add BPE tokenization 2019-11-17 00:27:30 -05:00
b2ba864248 Move whitespace pre tokenizer 2019-11-16 22:42:02 -05:00
1294f400dc Add folder structure 2019-11-16 22:40:51 -05:00
7b8b765269 Add Tokenizer interface 2019-11-16 22:36:44 -05:00
195423fe11 Rust install 2019-11-01 19:45:00 -04:00
9f15d2c165 Node readme 2019-11-01 19:44:44 -04:00
05cbb32eca Python readme 2019-11-01 19:42:36 -04:00
6d91bf4005 Update node bindings 2019-11-01 19:23:22 -04:00
fd7ec39367 Update python bindings 2019-11-01 18:56:55 -04:00
9fd10ca1c5 Simple whitespace tokenizer 2019-11-01 18:31:05 -04:00
57a1ce7e1d Node bindings backbone 2019-11-01 16:39:03 -04:00
8448d50e6f Quick improvement over python bindings 2019-11-01 16:08:10 -04:00
5d37cfde7f Python bindings backbone 2019-11-01 15:02:19 -04:00
5f57ee9f0e Global gitignore 2019-11-01 14:55:08 -04:00
7dbee7157f add gitignore 2019-11-01 13:56:08 -04:00
2b72a5737f Basic bin+lib setup 2019-11-01 13:54:17 -04:00
b9b519c84a Initial commit 2019-11-01 13:52:44 -04:00