Bjarte Johansen
f32e0c09fc
Implement __new__ for PostProcessors
Allows PostProcessors to be instantiated through the Python class constructor.
2020-02-10 10:43:53 +01:00
Bjarte Johansen
03508826cb
Implement __new__ on Decoders
Allows Decoders to be initialized from Python using the class
constructor.
2020-02-10 10:43:53 +01:00
Bjarte Johansen
4971e9608d
Implement __new__ on Trainers
__new__ allows Trainers to be initialized in the normal Python
fashion.
2020-02-10 10:43:29 +01:00
Bjarte Johansen
0e5d81b400
Implement __new__ on Normalizers
__new__ allows Normalizers to be initialized as normal Python
objects. This also means that Normalizers are given the correct class
name.
2020-02-10 10:43:19 +01:00
Pierric Cistac
be67d51185
node: add more info in package.json
2020-02-05 18:07:39 -05:00
Pierric Cistac
3df188dc27
node: version 0.4.0
2020-02-05 17:38:59 -05:00
Pierric Cistac
cb8585bc4e
Merge pull request #126 from huggingface/node-bindings
node: expose tokenizer configuration / truncation / padding
2020-02-05 16:53:24 -05:00
Pierric Cistac
3adf199a0c
fix pad calls
2020-02-05 14:49:47 -05:00
Pierric Cistac
41fee6de3d
rust: derive Copy for PaddingDirection
2020-02-05 14:44:07 -05:00
Pierric Cistac
10e2d286ca
node: fix bert special tokens
2020-02-05 14:40:03 -05:00
Pierric Cistac
02ab624050
node: expose truncation/padding getters on base tokenizer
2020-02-05 14:28:53 -05:00
Pierric Cistac
51cc581f32
node: setTruncation and setPadding return the complete config
2020-02-05 14:28:53 -05:00
Pierric Cistac
a54d5f05fa
node: expose tokenizers config
fix tokenizers config types
2020-02-05 14:28:53 -05:00
Pierric Cistac
2bcd47440c
node: add enums for padding and truncation strategies
2020-02-05 14:28:53 -05:00
Anthony MOI
3b2414c200
Fix indentation in README for consistency
2020-02-05 14:15:25 -05:00
Anthony MOI
32e6856c6c
Ignore rust-toolchain when publishing
2020-02-05 14:12:28 -05:00
Anthony MOI
e2e9cff606
Add rust-toolchain
2020-02-05 14:10:46 -05:00
Anthony MOI
9745786b89
Bump versions for release
2020-02-05 13:55:51 -05:00
Anthony MOI
89f6db28f0
update cargo.lock for indicatif
2020-02-05 13:38:12 -05:00
Anthony MOI
8decd020cb
Python - Provide mapping to original offsets
As requested in #81
2020-02-05 13:33:19 -05:00
Anthony MOI
42c4691e4d
Python - Update Bert default special tokens
Closes #106
2020-02-05 12:55:01 -05:00
MOI Anthony
a1284f6220
Merge pull request #128 from huitseeker/warts
Maintenance: simplifications & update
2020-02-05 12:28:22 -05:00
Funtowicz Morgan
8200112e9b
Introduce WordLevel model for TransformerXL (#125)
* Added lookup table model mapping string to id present in a vocab map.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* RustFmt
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Formatting.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix invalid void return on Rust side.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Python binding for LookupTable model
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Enable loading from Python's side.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Renamed LookupTable to WordLevel
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* RustFmt happy now.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* clippy happy now.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing mismatching names.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing mismatching names (one missing).
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-05 16:51:35 +00:00
François Garillot
d4f71e50ad
update indicatif
2020-02-05 07:11:47 -08:00
François Garillot
42bc3cb21f
Simplify a few Option / Result pattern-matches
2020-02-05 07:11:47 -08:00
Pierric Cistac
9770be5661
node: fix encodinggetSpecialTokensMask type
2020-02-04 16:59:46 -05:00
Funtowicz Morgan
6165910ca6
Char-based delimiter splitting - TransfoXL (#114)
* WIP delimiter splitter
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Bind on Python side.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Add missing delimiter parameter in CharDelimiterSplit constructor.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Attempt to provide CharDelimiterSplit for node.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Apply Rust formatting.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* fix bindings node
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
2020-02-04 16:23:00 +00:00
MOI Anthony
3adb220973
Merge pull request #124 from huggingface/fix-overflowing-padding
rust: fix padding on overflowings
2020-02-03 19:01:07 -05:00
Pierric Cistac
bd0f52f3d1
rust: fix padding on overflowings
Shadowing of `pad_length` made it useless on overflowings.
2020-02-03 18:32:49 -05:00
Pierric Cistac
7051480c33
node: expose more methods in base tokenizer
2020-02-03 17:51:53 -05:00
Pierric Cistac
220bd0d9df
node: uniformize test semantics
2020-02-03 17:51:53 -05:00
Pierric Cistac
acef252dac
node: add special tokens in tokenizers implementations
2020-02-03 17:49:51 -05:00
Anthony MOI
53637d4d88
Python - Also add missing special tokens for SentencePiece
2020-02-03 12:52:39 -05:00
Anthony MOI
9e0b971f20
Python - Add missing special tokens in implementations classes
2020-02-03 12:49:40 -05:00
Pierric Cistac
4940f26b65
node: fix build error handling
2020-02-03 12:07:49 -05:00
MOI Anthony
a48b337d7b
Merge pull request #99 from kdexd/get-vocab-size
Expose get_vocab_size in tokenizer python API.
2020-02-03 11:52:29 -05:00
MOI Anthony
0094393610
Merge pull request #77 from huggingface/improve-truncation
Improve truncation
2020-02-03 11:49:46 -05:00
Pierric Cistac
e55905126d
Fix js overflowing tests
2020-02-03 11:41:09 -05:00
Anthony MOI
9fd64a7863
Update bert processing and padding
2020-02-03 11:38:52 -05:00
Anthony MOI
81457c0241
Node - Actually keep the previous name
2020-02-03 11:38:52 -05:00
Anthony MOI
b90104e705
Update Python bindings
2020-02-03 11:38:52 -05:00
Anthony MOI
ffda63cd33
Update node bindings
2020-02-03 11:38:52 -05:00
Anthony MOI
c2978457ae
Handle merging two Encoding and their overflowings
2020-02-03 11:38:52 -05:00
Anthony MOI
4a5d2b1053
Handle padding of the overflowings
2020-02-03 11:38:52 -05:00
Anthony MOI
68f99bb822
Improve the truncation of an Encoding
2020-02-03 11:38:52 -05:00
Pierric Cistac
78e26905a7
Merge pull request #109 from huggingface/node-bindings
node: add tokenizer truncation / padding bindings
2020-02-03 11:38:05 -05:00
Pierric Cistac
75f56a0adc
node: add some padding / truncation tests
2020-02-03 11:31:30 -05:00
Pierric Cistac
680eed15e7
node: add basic test on tokenizer methods
2020-02-03 11:31:30 -05:00
Pierric Cistac
461052c06f
node: add disablePadding and disableTruncation in Tokenizer
2020-02-03 11:31:30 -05:00
Pierric Cistac
7e36239d74
node: add setPadding in Tokenizer
2020-02-03 11:31:30 -05:00