d1b6b14bd7
Attempt to fix workflows
2019-11-29 19:28:49 -05:00
989e9b03ca
Ignore some python files
2019-11-27 12:22:01 -05:00
428890d6e0
Basic python setuptools
2019-11-27 12:21:37 -05:00
e49abab747
Python - Add Decoder/PreTokenizer standalone capabilities
2019-11-26 17:52:19 -05:00
d565bbf309
Container - Add ability to execute
2019-11-26 17:51:26 -05:00
5c6834f363
Added GitHub Action workflow for Rust
This allows for automated build & test of the library.
2019-11-26 09:47:48 +00:00
f4369b312d
Python - Add ability to create custom Decoder
2019-11-25 19:14:07 -05:00
d7ba6802df
Update gitignore
2019-11-25 15:35:54 -05:00
512e85dfda
Update python README
2019-11-24 00:55:13 -05:00
bafdc5e157
Code style
2019-11-24 00:52:48 -05:00
6437c40235
Python - PoC Custom PreTokenizer
2019-11-24 00:52:13 -05:00
b081e6ca04
Python - Also expose default classes
2019-11-24 00:35:05 -05:00
bd1aa80d8a
Python - Custom PreTokenizer backbone
2019-11-23 23:59:33 -05:00
891fc12de2
Python - Update example with new format
2019-11-22 21:09:17 -05:00
8fbe3c2662
Python - Add decoders
2019-11-22 21:08:57 -05:00
e44f52024c
Python - Set a PreTokenizer in a model
2019-11-22 21:01:52 -05:00
9b71c8f8de
Python - BPE construction
2019-11-22 20:57:54 -05:00
f6a9b57b5b
Python - Add pre_tokenizers module
2019-11-22 20:56:50 -05:00
39a6d04c53
Improve Python bindings
This is an attempt at actually exposing the same structure that we use in the Rust lib. This will allow Python to instantiate Model/PreTokenizer/... with their own arguments, combining everything without relying on parsed kwargs.
2019-11-22 17:57:36 -05:00
663644e041
Fix ByteLevel Decoder
The join was done after replacing bytes and building subwords, which prevented bytes spanning these subwords from being merged correctly. We need to join first.
2019-11-21 16:50:25 -05:00
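A minimal Python sketch of why the ByteLevel Decoder fix above joins first. It is not the library's Rust implementation. It assumes a GPT-2 style byte-to-character table, and every name in it (byte_to_char_table, encode_bytes, decode_join_first, decode_per_token) is hypothetical. When a multi-byte UTF-8 character is split across two subwords, decoding each subword separately yields replacement characters, while joining first recovers the original text.

```python
def byte_to_char_table():
    """Map each of the 256 byte values to a printable character (GPT-2 style, assumed)."""
    # Bytes that are already printable map to themselves.
    printable = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    table = {b: chr(b) for b in printable}
    # Remaining bytes are shifted to code points starting at 256.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


BYTE_TO_CHAR = byte_to_char_table()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def encode_bytes(text):
    """Byte-level representation of text: one printable character per UTF-8 byte."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def decode_join_first(tokens):
    """Join all tokens, then map characters back to bytes and decode once."""
    joined = "".join(tokens)
    raw = bytes(CHAR_TO_BYTE[c] for c in joined)
    return raw.decode("utf-8", errors="replace")


def decode_per_token(tokens):
    """Decode each token separately, then join (the order described as buggy)."""
    return "".join(
        bytes(CHAR_TO_BYTE[c] for c in token).decode("utf-8", errors="replace")
        for token in tokens
    )


# "\u00e9" is two UTF-8 bytes; split the byte-level string between them.
encoded = encode_bytes("caf\u00e9")
tokens = [encoded[:4], encoded[4:]]

print(decode_join_first(tokens) == "caf\u00e9")
print(decode_per_token(tokens) == "caf\u00e9")
```

With the split falling inside the two-byte character, only the join-first order round-trips the text.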
634415c098
Add a parallel capable cache for BPE
This allows for some performance improvement in best-case scenarios (up to 40% in some tests).
2019-11-21 16:09:07 -05:00
070fd08583
Update python example
2019-11-21 11:57:57 -05:00
c28a83cdc4
Update python bindings
2019-11-21 11:55:07 -05:00
6853e6c904
Tokenizer decoding
2019-11-21 11:54:54 -05:00
2419c14e42
ByteLevel is also a Decoder
2019-11-21 11:52:55 -05:00
56e37475c3
Add Decoder to Tokenizer
2019-11-21 11:51:43 -05:00
3ec26b332c
Add Tokenizer token_to_id/id_to_token
2019-11-20 17:28:28 -05:00
8b3d7d1aa0
Add vocab/merge arguments to example.py
2019-11-20 16:47:02 -05:00
98323d1f21
Update readme and fix example
2019-11-19 19:38:57 -05:00
351d526e1e
Basic python bindings
2019-11-19 19:31:37 -05:00
39afc64e13
impl PreTokenizer for Whitespace
2019-11-19 19:31:37 -05:00
2d7c5f04f8
Fix readme indentation
2019-11-18 16:34:13 -05:00
1b32560067
Update readme with simple example
2019-11-18 16:31:35 -05:00
872aa86b71
Basic cli for testing
2019-11-18 15:47:35 -05:00
4e5106989f
Ability to load a BPE model from files
2019-11-18 10:00:53 -05:00
0b450d62ff
Add ByteLevel pre tokenizer
2019-11-17 00:40:22 -05:00
a55dccafb5
Add BPE training
2019-11-17 00:28:36 -05:00
1c7dcebca7
Add BPE tokenization
2019-11-17 00:27:30 -05:00
b2ba864248
Move whitespace pre tokenizer
2019-11-16 22:42:02 -05:00
1294f400dc
Add folder structure
2019-11-16 22:40:51 -05:00
7b8b765269
Add Tokenizer interface
2019-11-16 22:36:44 -05:00
195423fe11
Rust install
2019-11-01 19:45:00 -04:00
9f15d2c165
Node readme
2019-11-01 19:44:44 -04:00
05cbb32eca
Python readme
2019-11-01 19:42:36 -04:00
6d91bf4005
Update node bindings
2019-11-01 19:23:22 -04:00
fd7ec39367
Update python bindings
2019-11-01 18:56:55 -04:00
9fd10ca1c5
Simple whitespace tokenizer
2019-11-01 18:31:05 -04:00
57a1ce7e1d
Node bindings backbone
2019-11-01 16:39:03 -04:00
8448d50e6f
Quick improvement over python bindings
2019-11-01 16:08:10 -04:00
5d37cfde7f
Python bindings backbone
2019-11-01 15:02:19 -04:00