Mirror of https://github.com/mii443/AzooKeyKanaKanjiConverter.git, synced 2025-08-22 15:05:26 +00:00
Merge pull request #223 from azooKey/feat/surface_based_indexing
feat(breaking): Change Lattice operations so that they can be handled with two kinds of index, a convertTarget-based index and an input-based index
@@ -1,6 +1,6 @@
 # Conversion Algorithms

-This document gives a rough overview of the complex implementations used inside azooKey.
+This document gives a rough overview of the complex implementations used inside AzooKeyKanaKanjiConverter.

 ## Kana-Kanji Conversion

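@@ -10,9 +10,9 @@ This document gives a rough overview of the complex implementations used inside azooKey

 A distinctive feature of the algorithm is that, after splitting the input into bunsetsu (phrase) units, it computes an additional cost that could be called a "content-word bigram". With this cost, a candidate scores higher when words that tend to co-occur appear together, and lower when words that rarely co-occur appear together.

-## 入力管理
+## 入力管理(Input Management)

-Input management looks simple but is in fact a very complex problem. In azooKey it is handled inside `ComposingText`.
+Input management is the mechanism that tracks the history of the user's key input and applies transformations such as romaji-to-kana conversion accordingly. It looks simple but is in fact a very complex problem. In AzooKeyKanaKanjiConverter it is handled mainly inside `ComposingText`.

 A typical edge case is switching to an English keyboard in the middle of romaji input, typing a Latin letter, then returning to the Japanese keyboard and continuing to type. In other words, the following two cases must be distinguishable.
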
@@ -26,7 +26,7 @@ This document gives a rough overview of the complex implementations used inside azooKey
 input a (Japanese) // → kあ
 ```

-azooKey's `ComposingText` has the following structure. Holding `input` in this way is how it deals with this problem.
+AzooKeyKanaKanjiConverter's `ComposingText` has the following structure. Holding `input` in this way is how it deals with this problem.

 ```swift
 struct ComposingText {
@@ -76,7 +76,7 @@ ComposingText(
 1. じゅあ
 1. Give up and leave the editing state

-Option 1 is the most intuitive, and azooKey takes this approach. In this case, `input` has to be modified. So in azooKey, when "u" is typed in romaji input, `ComposingText` changes as follows.
+Option 1 is the most intuitive, and AzooKeyKanaKanjiConverter takes this approach. In this case, `input` has to be modified. In AzooKeyKanaKanjiConverter, when "u" is typed in romaji input, `ComposingText` changes as follows.

 ```swift
 ComposingText(
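@@ -90,7 +90,7 @@ ComposingText(
 )
 ```

-The standard iOS romaji input, on the other hand, chooses option 2. This is a clean approach in a sense: it treats each unit that was entered "at once" during romaji input as inviolable, which eliminates the change described above. If azooKey took this approach, the state would change as follows. However, this behavior is also unintuitive.
+The standard iOS romaji input, on the other hand, chooses option 2. This is a clean approach in a sense: it treats each unit that was entered "at once" during romaji input as inviolable, which eliminates the change described above. If AzooKeyKanaKanjiConverter took this approach, the state would change as follows. However, this behavior is also unintuitive.

 ```swift
 ComposingText(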
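@@ -106,26 +106,31 @@ ComposingText(

 As far as we know, no system chooses option 3, "じゅあ". This approach works when "u" is inserted into "ja / じゃ", but when "u" is inserted between the "ち" and the "ゃ" of "cha / ちゃ", the question of how to decide the insertion position remains. (Would the result be "chua"?)

-Option 4 is the straightforward position in a sense, and implementations that take a "who cares about that" stance often end up this way. It is reasonable. azooKey also implements it this way during live conversion, where cursor movement is given up.
+Option 4 is the straightforward position in a sense, and implementations that take a "who cares about that" stance often end up this way. It is reasonable. AzooKeyKanaKanjiConverter also implements it this way during live conversion, where cursor movement is given up.

 As these examples show, input has a wide variety of edge cases. Handling such cases is why input management inevitably becomes complex.

-## 誤り訂正
+## 誤り訂正(Typo Correction)
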
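-Typo correction is an ad hoc implementation built on the `ComposingText` described above.
+Typo correction in AzooKeyKanaKanjiConverter is implemented as substitutions on `ComposingText.input`. For example, if the sequence "ts" is present, it may be reread as "た" with a certain penalty, and if the sequence "た" is present, it may be reread as "だ" with a certain penalty. These rules are defined in advance at the source-code level.

-Specifically, for each part of `ComposingText`,
+When typo correction is implemented naively, the combinatorial explosion of correction candidates becomes a problem. For example, if the input is "たたたたたたたたたた", choosing whether or not to apply the rule to each "た" yields 1024 candidates.

-* if "た" is present, "だ" is also allowed
-* if "ts" is present, it is replaced with "た"
+To deal with this problem, AzooKeyKanaKanjiConverter introduces techniques for efficient typo correction. Because the penalty accumulates every time a substitution is applied, an upper limit is placed on it, and candidates that exceed the limit are excluded from enumeration. This greatly reduces the number of patterns.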
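The effect of the penalty cap can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `Rule` type, the penalty values, and the function name `enumerateCorrections` are hypothetical and are not the library's API. The sketch enumerates rereadings of an input and abandons a branch as soon as its accumulated penalty exceeds the cap.

```swift
/// Hypothetical substitution rule: reread `from` as `to` at a fixed penalty.
struct Rule {
    let from: Character
    let to: Character
    let penalty: Int
}

/// Enumerates every rereading of `input` whose accumulated penalty stays within `maxPenalty`.
func enumerateCorrections(_ input: [Character], rules: [Rule], maxPenalty: Int) -> [(text: String, penalty: Int)] {
    var results: [(text: String, penalty: Int)] = []
    func visit(_ index: Int, _ prefix: String, _ penalty: Int) {
        // Prune: once the cap is exceeded, nothing below this branch is enumerated.
        if penalty > maxPenalty { return }
        if index == input.count {
            results.append((prefix, penalty))
            return
        }
        let ch = input[index]
        // Keep the character as typed (no penalty).
        visit(index + 1, prefix + String(ch), penalty)
        // Or apply each matching rule and pay its penalty.
        for rule in rules where rule.from == ch {
            visit(index + 1, prefix + String(rule.to), penalty + rule.penalty)
        }
    }
    visit(0, "", 0)
    return results
}

let input = Array(String(repeating: "た", count: 10))
let rules = [Rule(from: "た", to: "だ", penalty: 1)]
// A cap of 10 never triggers for ten characters, so all 2^10 combinations survive.
let uncapped = enumerateCorrections(input, rules: rules, maxPenalty: 10)
// A cap of 3 keeps only the candidates with at most three substitutions.
let capped = enumerateCorrections(input, rules: rules, maxPenalty: 3)
print(uncapped.count) // 1024
print(capped.count)   // 176
```

With a unit penalty and a cap of 3, the surviving candidates are those with zero to three substitutions, that is 1 + 10 + 45 + 120 = 176 of the original 1024.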
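
-rules enumerated in advance, such as these, are applied.
+In addition, since the v0.9 series, AzooKeyKanaKanjiConverter performs typo correction and dictionary lookup in tandem. In a trie-based dictionary lookup, if no node corresponds to a given string, then no string that has it as a prefix is registered in the dictionary. Using this property, typo corrections that would produce impossible candidates are pruned early, which greatly reduces the cost of enumeration.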
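The pruning idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the converter's implementation: `TrieNode`, `insert`, `lookupWithCorrection`, and the `alternatives` rule table are hypothetical names, and a plain dictionary-based trie stands in for the real dictionary structure. The walk descends the trie and the correction alternatives together, so a correction whose prefix has no trie node is never expanded.

```swift
/// Minimal trie; a stand-in for the real dictionary structure (assumption).
final class TrieNode {
    var children: [Character: TrieNode] = [:]
    var isWord = false
}

func insert(_ word: String, into root: TrieNode) {
    var node = root
    for ch in word {
        if let next = node.children[ch] {
            node = next
        } else {
            let next = TrieNode()
            node.children[ch] = next
            node = next
        }
    }
    node.isWord = true
}

/// Finds dictionary words that match a prefix of `input`, allowing penalized rereadings.
/// `alternatives` is a hypothetical rule table: typed character -> [(reading, penalty)].
func lookupWithCorrection(
    _ input: [Character],
    root: TrieNode,
    alternatives: [Character: [(Character, Int)]],
    maxPenalty: Int
) -> (matches: [String], visited: Int) {
    var matches: [String] = []
    var visited = 0
    func visit(_ index: Int, _ node: TrieNode, _ prefix: String, _ penalty: Int) {
        visited += 1
        if node.isWord { matches.append(prefix) }
        guard index < input.count else { return }
        let typed = input[index]
        let options = [(typed, 0)] + (alternatives[typed] ?? [])
        for (ch, cost) in options where penalty + cost <= maxPenalty {
            // No child node means no dictionary entry has this prefix: skip the whole branch.
            guard let next = node.children[ch] else { continue }
            visit(index + 1, next, prefix + String(ch), penalty + cost)
        }
    }
    visit(0, root, "", 0)
    return (matches, visited)
}

let root = TrieNode()
insert("だた", into: root)
insert("たた", into: root)
let result = lookupWithCorrection(
    Array("たたたた"),
    root: root,
    alternatives: ["た": [("だ", 1)]],
    maxPenalty: 4
)
print(result.matches) // ["たた", "だた"]
print(result.visited) // 5
```

Without the trie check, the four-character input would expand into 16 rereadings; here only five trie nodes are visited because every other branch dies at a missing child.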
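
-However, if rules can be applied any number of times, an input such as "たたたたたたたたたた" yields 1024 candidates, depending on whether the rule is applied to each "た". Since that is unacceptable, in practice a constraint such as "rules may be applied at most 3 times" is imposed to prevent the combinatorial explosion.
+## 二重インデックスラティス (Dual-Indexed Lattice)

-In addition, when a rule is applied, a cost is added to the candidate to express that "if a candidate still ranks high even after paying a certain cost, the input is likely to be a typo". This cost is decided by hand, with some adjustments such as making it higher for pairs of particles like "か" and "が".
+Since the v0.9 series, AzooKeyKanaKanjiConverter has made a major change to the lattice structure used for kana-kanji conversion and introduced a structure called the "dual-indexed lattice".

-## 学習
+Previously, the lattice for kana-kanji conversion was represented as a two-dimensional array (an array of node arrays) indexed by the positions of `ComposingText.input`. Each node-array element stores the nodes that start at the corresponding position in `input` and end at some position in `input`. With this approach, for the input "ittai", partial input strings are built in a form such as `[[i, it, itt, itta, ittai], [tt, tta, ttai], [ta, tai], [a, ai], [i]]`, a dictionary lookup is performed for each of them, and the actual lattice nodes are created.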
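The input-indexed enumeration described above can be sketched in a few lines. The function name `partialInputs` is hypothetical, and the sketch lists every substring, including single characters; the actual implementation may skip some of them.

```swift
/// For each start index, lists every substring of `input` that begins there.
func partialInputs(_ input: String) -> [[String]] {
    let chars = Array(input)
    return chars.indices.map { start in
        (start ..< chars.count).map { end in String(chars[start ... end]) }
    }
}

let parts = partialInputs("ittai")
print(parts[0]) // ["i", "it", "itt", "itta", "ittai"]
print(parts[4]) // ["i"]
```

Each inner array is what would be looked up in the dictionary for nodes starting at that input position.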
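+
+This implementation works without problems for the most part, but it fails in exceptional cases. Specifically, it cannot produce an input that corresponds to a string such as "イッ". This is because no substring of the romaji input sequence "ittai" corresponds exactly to "イッ": "it" on its own is "イt", and "itt" on its own is "イッt". In the dictionary, however, there are many words, such as the "行っ" of "行った", that can only be produced by looking up "イッ".
+
+To deal with this problem, AzooKeyKanaKanjiConverter introduced a structure that mixes a "surface-string-level index", that is, an index based on the string "イッタイ", with an "internal-string-level index", that is, an index based on the input history "ittai". Normal conversion always uses the surface-string-level index, while typo correction uses the internal-string-level index. Handling the correspondence between the two appropriately solves the problem above.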
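How the two index spaces interleave can be sketched as follows. The loop follows the `LatticeDualIndexMap.indices(inputCount:surfaceCount:)` function added in this commit, rewritten as a free function so that it is self-contained. The alignment table `ittaiMap` is a hand-written assumption for "ittai" → "イッタイ"; in the commit the table comes from `ComposingText.inputIndexToSurfaceIndexMap()`, whose exact output is not shown here.

```swift
enum DualIndex: Equatable {
    case inputIndex(Int)
    case surfaceIndex(Int)
    case bothIndex(inputIndex: Int, surfaceIndex: Int)
}

/// Lists input-only, surface-only, and shared start positions in order.
func indices(map: [Int: Int], inputCount: Int, surfaceCount: Int) -> [DualIndex] {
    var result: [DualIndex] = []
    var surfacePointer = 0
    for i in 0 ..< inputCount {
        if let s = map[i] {
            // Surface positions skipped so far have no input counterpart.
            for j in surfacePointer ..< s {
                result.append(.surfaceIndex(j))
            }
            result.append(.bothIndex(inputIndex: i, surfaceIndex: s))
            surfacePointer = s + 1
        } else {
            result.append(.inputIndex(i))
        }
    }
    for j in surfacePointer ..< surfaceCount {
        result.append(.surfaceIndex(j))
    }
    return result
}

// Assumed alignment: input "i|t|t|a|i" against surface "イ|ッ|タ|イ".
// Input position 1 (after "i") matches surface position 1 (after "イ"),
// and input position 4 (after "itta") matches surface position 3 (after "イッタ").
let ittaiMap: [Int: Int] = [0: 0, 1: 1, 4: 3]
let dual = indices(map: ittaiMap, inputCount: 5, surfaceCount: 4)
for index in dual {
    print(index)
}
```

Under this assumed alignment, the position after "イッ" appears only as `.surfaceIndex(2)`: it exists on the surface side but has no input-side counterpart, which is exactly the position the input-indexed lattice could not express.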
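+
+## 学習(Learning)

 Learning uses two kinds of data: "temporary memory" (from opening the keyboard until closing it) and "long-term memory" (semi-persistent). Temporary memory exists only in volatile memory, while long-term memory is saved as a file in non-volatile storage.
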
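@@ -171,11 +176,11 @@ ComposingText(

 If an error occurs while step 3 is running, the `.pause` file is still present, so learning is put into a stopped state the next time the keyboard is opened. Step 3 is then run again at an appropriate time, which allows all files to be updated safely.

-In azooKeyKanaKanjiConverter, if a `.pause` file remains when the converter is opened, it automatically attempts a merge with an empty temporary memory, deletes `.pause`, and restores the learning feature.
+In AzooKeyKanaKanjiConverter, if a `.pause` file remains when the converter is opened, it automatically attempts a merge with an empty temporary memory, deletes `.pause`, and restores the learning feature.

-## 変換候補の並び順
+## 変換候補の並び順(Candidate Ordering)

-Deciding the order of conversion candidates is a very hard problem. In azooKey it works roughly as follows. `Converter.swift` decides the order, but the implementation is very complicated and we would like to improve it.
+Deciding the order of conversion candidates is a very hard problem. In AzooKeyKanaKanjiConverter it works roughly as follows. `Converter.swift` decides the order, but the implementation is very complicated and we would like to improve it.

 ```
 First 5 items: exact match, prediction, or romaji-to-English conversion (at least one exact match is included within the top 3)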
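@@ -183,11 +188,11 @@ In azooKeyKanaKanjiConverter, when the converter is opened, a `.pause` file
 After that: conversions such as all-hiragana, all-katakana, and all-uppercase, plus dictionary data for prefix matches, shown longest first and highest score first (special conversions such as Unicode conversion, Western/Japanese calendar year conversion, email address conversion, and decorated characters are inserted around the 5th position)
 ```

-## ライブ変換
+## ライブ変換(Live Conversion)

 Live conversion is realized with a fairly simple idea. Conversion candidates are requested just as when live conversion is off, and the highest-ranked exact-match conversion (not a prediction) is displayed.

-## 予測変換
+## 予測変換(Prediction)

 Prediction is implemented differently for "during input (mid composition)" and "after commit (post composition)".
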
@ -146,15 +146,15 @@ extension Subcommands {
|
||||
|
||||
/// User dictionary
|
||||
var user_dictionary: [InputUserDictionaryItem]? = nil
|
||||
}
|
||||
|
||||
struct InputUserDictionaryItem: Codable {
|
||||
/// Kanji (written form)
|
||||
var word: String
|
||||
/// Reading
|
||||
var reading: String
|
||||
/// Hint
|
||||
var hint: String? = nil
|
||||
}
|
||||
struct InputUserDictionaryItem: Codable {
|
||||
/// Kanji (written form)
|
||||
var word: String
|
||||
/// Reading
|
||||
var reading: String
|
||||
/// Hint
|
||||
var hint: String? = nil
|
||||
}
|
||||
|
||||
struct EvaluateResult: Codable {
|
||||
|
@ -30,6 +30,8 @@ extension Subcommands {
|
||||
var reportScore = false
|
||||
@Flag(name: [.customLong("roman2kana")], help: "Use roman2kana input.")
|
||||
var roman2kana = false
|
||||
@Option(name: [.customLong("config_user_dictionary")], help: "User Dictionary JSON file path")
|
||||
var configUserDictionary: String? = nil
|
||||
@Option(name: [.customLong("config_zenzai_inference_limit")], help: "inference limit for zenzai.")
|
||||
var configZenzaiInferenceLimit: Int = .max
|
||||
@Flag(name: [.customLong("config_zenzai_rich_n_best")], help: "enable rich n_best generation for zenzai.")
|
||||
@ -70,6 +72,15 @@ extension Subcommands {
|
||||
}
|
||||
}
|
||||
|
||||
private func parseUserDictionaryFile() throws -> [InputUserDictionaryItem] {
|
||||
guard let configUserDictionary else {
|
||||
return []
|
||||
}
|
||||
let url = URL(fileURLWithPath: configUserDictionary)
|
||||
let data = try Data(contentsOf: url)
|
||||
return try JSONDecoder().decode([InputUserDictionaryItem].self, from: data)
|
||||
}
|
||||
|
||||
@MainActor mutating func run() async {
|
||||
if self.zenzV1 || self.zenzV2 {
|
||||
print("\(bold: "We strongly recommend to use zenz-v3 models")")
|
||||
@ -80,6 +91,11 @@ extension Subcommands {
|
||||
if !self.zenzWeightPath.isEmpty && (!self.zenzV1 && !self.zenzV2 && !self.zenzV3) {
|
||||
print("zenz version is not specified. By default, zenz-v3 will be used.")
|
||||
}
|
||||
|
||||
let userDictionary = try! self.parseUserDictionaryFile().map {
|
||||
DicdataElement(word: $0.word, ruby: $0.reading.toKatakana(), cid: CIDData.固有名詞.cid, mid: MIDData.一般.mid, value: -10)
|
||||
}
|
||||
|
||||
let learningType: LearningType = if self.readOnlyMemoryPath != nil {
|
||||
// Read-only
|
||||
.onlyOutput
|
||||
@ -107,6 +123,7 @@ extension Subcommands {
|
||||
converter.sendToDicdataStore(
|
||||
.setRequestOptions(requestOptions(learningType: learningType, memoryDirectory: memoryDirectory, leftSideContext: nil))
|
||||
)
|
||||
converter.sendToDicdataStore(.importDynamicUserDict(userDictionary))
|
||||
var composingText = ComposingText()
|
||||
let inputStyle: InputStyle = self.roman2kana ? .roman2kana : .direct
|
||||
var lastCandidates: [Candidate] = []
|
||||
@ -220,7 +237,7 @@ extension Subcommands {
|
||||
print("Submit \(candidate.text)")
|
||||
converter.setCompletedData(candidate)
|
||||
converter.updateLearningData(candidate)
|
||||
composingText.prefixComplete(correspondingCount: candidate.correspondingCount)
|
||||
composingText.prefixComplete(composingCount: candidate.composingCount)
|
||||
if composingText.isEmpty {
|
||||
composingText.stopComposition()
|
||||
converter.stopComposition()
|
||||
|
@ -6,6 +6,7 @@
|
||||
// Copyright © 2020 ensan. All rights reserved.
|
||||
//
|
||||
|
||||
import Algorithms
|
||||
import Foundation
|
||||
import SwiftUtils
|
||||
|
||||
@ -28,11 +29,36 @@ extension Kana2Kanji {
|
||||
/// (4) Update the nodes and return them.
|
||||
func kana2lattice_all(_ inputData: ComposingText, N_best: Int, needTypoCorrection: Bool) -> (result: LatticeNode, lattice: Lattice) {
|
||||
debug("新規に計算を行います。inputされた文字列は\(inputData.input.count)文字分の\(inputData.convertTarget)")
|
||||
let count: Int = inputData.input.count
|
||||
let result: LatticeNode = LatticeNode.EOSNode
|
||||
let lattice: Lattice = Lattice(nodes: (.zero ..< count).map {dicdataStore.getLOUDSDataInRange(inputData: inputData, from: $0, needTypoCorrection: needTypoCorrection)})
|
||||
let inputCount: Int = inputData.input.count
|
||||
let surfaceCount = inputData.convertTarget.count
|
||||
let indexMap = LatticeDualIndexMap(inputData)
|
||||
let latticeIndices = indexMap.indices(inputCount: inputCount, surfaceCount: surfaceCount)
|
||||
let rawNodes = latticeIndices.map { index in
|
||||
let inputRange: (startIndex: Int, endIndexRange: Range<Int>?)? = if let iIndex = index.inputIndex {
|
||||
(iIndex, nil)
|
||||
} else {
|
||||
nil
|
||||
}
|
||||
let surfaceRange: (startIndex: Int, endIndexRange: Range<Int>?)? = if let sIndex = index.surfaceIndex {
|
||||
(sIndex, nil)
|
||||
} else {
|
||||
nil
|
||||
}
|
||||
return dicdataStore.lookupDicdata(
|
||||
composingText: inputData,
|
||||
inputRange: inputRange,
|
||||
surfaceRange: surfaceRange,
|
||||
needTypoCorrection: needTypoCorrection
|
||||
)
|
||||
}
|
||||
let lattice: Lattice = Lattice(
|
||||
inputCount: inputCount,
|
||||
surfaceCount: surfaceCount,
|
||||
rawNodes: rawNodes
|
||||
)
|
||||
// For the nodes that start at the i-th character
|
||||
for (i, nodeArray) in lattice.enumerated() {
|
||||
for (isHead, nodeArray) in lattice.indexedNodes(indices: latticeIndices) {
|
||||
// For each node
|
||||
for node in nodeArray {
|
||||
if node.prevs.isEmpty {
|
||||
@ -43,20 +69,20 @@ extension Kana2Kanji {
|
||||
}
|
||||
// Get the occurrence probability.
|
||||
let wValue: PValue = node.data.value()
|
||||
if i == 0 {
|
||||
if isHead {
|
||||
// Update values
|
||||
node.values = node.prevs.map {$0.totalValue + wValue + self.dicdataStore.getCCValue($0.data.rcid, node.data.lcid)}
|
||||
} else {
|
||||
// Update values
|
||||
node.values = node.prevs.map {$0.totalValue + wValue}
|
||||
}
|
||||
// Number of characters converted
|
||||
let nextIndex: Int = node.inputRange.endIndex
|
||||
// Index of the following nodes (normalized)
|
||||
let nextIndex = indexMap.dualIndex(for: node.range.endIndex)
|
||||
// Register the node if the number of characters equals count
|
||||
if nextIndex == count {
|
||||
if nextIndex.inputIndex == inputCount && nextIndex.surfaceIndex == surfaceCount {
|
||||
self.updateResultNode(with: node, resultNode: result)
|
||||
} else {
|
||||
self.updateNextNodes(with: node, nextNodes: lattice[inputIndex: nextIndex], nBest: N_best)
|
||||
self.updateNextNodes(with: node, nextNodes: lattice[index: nextIndex], nBest: N_best)
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -70,7 +96,7 @@ extension Kana2Kanji {
|
||||
}
|
||||
}
|
||||
/// Updates the destination nodes while computing the N-best quickly
|
||||
func updateNextNodes(with node: LatticeNode, nextNodes: [LatticeNode], nBest: Int) {
|
||||
func updateNextNodes(with node: LatticeNode, nextNodes: some Sequence<LatticeNode>, nBest: Int) {
|
||||
for nextnode in nextNodes {
|
||||
if self.dicdataStore.shouldBeRemoved(data: nextnode.data) {
|
||||
continue
|
||||
|
@ -1,3 +1,4 @@
|
||||
import Algorithms
|
||||
import Foundation
|
||||
import SwiftUtils
|
||||
|
||||
@ -20,11 +21,36 @@ extension Kana2Kanji {
|
||||
/// (4) Update the nodes and return them.
|
||||
func kana2lattice_all_with_prefix_constraint(_ inputData: ComposingText, N_best: Int, constraint: PrefixConstraint) -> (result: LatticeNode, lattice: Lattice) {
|
||||
debug("新規に計算を行います。inputされた文字列は\(inputData.input.count)文字分の\(inputData.convertTarget)。制約は\(constraint)")
|
||||
let count: Int = inputData.input.count
|
||||
let result: LatticeNode = LatticeNode.EOSNode
|
||||
let lattice: Lattice = Lattice(nodes: (.zero ..< count).map {dicdataStore.getLOUDSDataInRange(inputData: inputData, from: $0, needTypoCorrection: false)})
|
||||
let inputCount: Int = inputData.input.count
|
||||
let surfaceCount = inputData.convertTarget.count
|
||||
let indexMap = LatticeDualIndexMap(inputData)
|
||||
let latticeIndices = indexMap.indices(inputCount: inputCount, surfaceCount: surfaceCount)
|
||||
let rawNodes = latticeIndices.map { index in
|
||||
let inputRange: (startIndex: Int, endIndexRange: Range<Int>?)? = if let iIndex = index.inputIndex {
|
||||
(iIndex, nil)
|
||||
} else {
|
||||
nil
|
||||
}
|
||||
let surfaceRange: (startIndex: Int, endIndexRange: Range<Int>?)? = if let sIndex = index.surfaceIndex {
|
||||
(sIndex, nil)
|
||||
} else {
|
||||
nil
|
||||
}
|
||||
return dicdataStore.lookupDicdata(
|
||||
composingText: inputData,
|
||||
inputRange: inputRange,
|
||||
surfaceRange: surfaceRange,
|
||||
needTypoCorrection: false
|
||||
)
|
||||
}
|
||||
let lattice: Lattice = Lattice(
|
||||
inputCount: inputCount,
|
||||
surfaceCount: surfaceCount,
|
||||
rawNodes: rawNodes
|
||||
)
|
||||
// For the nodes that start at the i-th character
|
||||
for (i, nodeArray) in lattice.enumerated() {
|
||||
for (isHead, nodeArray) in lattice.indexedNodes(indices: latticeIndices) {
|
||||
// For each node
|
||||
for node in nodeArray {
|
||||
if node.prevs.isEmpty {
|
||||
@ -32,7 +58,7 @@ extension Kana2Kanji {
|
||||
}
|
||||
// Get the occurrence probability.
|
||||
let wValue: PValue = node.data.value()
|
||||
if i == 0 {
|
||||
if isHead {
|
||||
// Update values
|
||||
node.values = node.prevs.map {$0.totalValue + wValue + self.dicdataStore.getCCValue($0.data.rcid, node.data.lcid)}
|
||||
} else {
|
||||
@ -40,9 +66,9 @@ extension Kana2Kanji {
|
||||
node.values = node.prevs.map {$0.totalValue + wValue}
|
||||
}
|
||||
// Number of characters converted
|
||||
let nextIndex: Int = node.inputRange.endIndex
|
||||
let nextIndex = indexMap.dualIndex(for: node.range.endIndex)
|
||||
// Register the node if the number of characters equals count
|
||||
if nextIndex == count {
|
||||
if nextIndex.inputIndex == inputCount && nextIndex.surfaceIndex == surfaceCount {
|
||||
for index in node.prevs.indices {
|
||||
let newnode: RegisteredNode = node.getRegisteredNode(index, value: node.values[index])
|
||||
// Pass through entries that come from learning data or the user dictionary
|
||||
@ -61,7 +87,7 @@ extension Kana2Kanji {
|
||||
Array(($0.data.reduce(into: "") { $0.append(contentsOf: $1.word)} + node.data.word).utf8)
|
||||
}
|
||||
// For every possible nextnode that node can connect to
|
||||
for nextnode in lattice[inputIndex: nextIndex] {
|
||||
for nextnode in lattice[index: nextIndex] {
|
||||
// Compute the class connection probability.
|
||||
let ccValue: PValue = self.dicdataStore.getCCValue(node.data.rcid, nextnode.data.lcid)
|
||||
// For every prevnode that node holds
|
||||
|
@ -14,7 +14,7 @@ extension Kana2Kanji {
|
||||
return Candidate(
|
||||
text: left.text + right.text,
|
||||
value: left.value + right.value,
|
||||
correspondingCount: left.correspondingCount + right.correspondingCount,
|
||||
composingCount: .composite(left.composingCount, right.composingCount),
|
||||
lastMid: right.lastMid,
|
||||
data: left.data + right.data
|
||||
)
|
||||
@ -26,7 +26,7 @@ extension Kana2Kanji {
|
||||
return Candidate(
|
||||
text: left.text + right.text,
|
||||
value: newValue,
|
||||
correspondingCount: left.correspondingCount + right.correspondingCount,
|
||||
composingCount: .composite(left.composingCount, right.composingCount),
|
||||
lastMid: right.lastMid,
|
||||
data: left.data + right.data
|
||||
)
|
||||
@ -57,7 +57,7 @@ extension Kana2Kanji {
|
||||
prefixCandidate.data = prefixCandidateData
|
||||
|
||||
prefixCandidate.text = prefixCandidateData.reduce(into: "") { $0 += $1.word }
|
||||
prefixCandidate.correspondingCount = prefixCandidateData.reduce(into: 0) { $0 += $1.ruby.count }
|
||||
prefixCandidate.composingCount = .surfaceCount(prefixCandidateData.reduce(into: 0) { $0 += $1.ruby.count })
|
||||
}
|
||||
|
||||
totalWord.insert(contentsOf: element.word, at: totalWord.startIndex)
|
||||
|
@ -6,6 +6,7 @@
|
||||
// Copyright © 2020 ensan. All rights reserved.
|
||||
//
|
||||
|
||||
import Algorithms
|
||||
import Foundation
|
||||
import SwiftUtils
|
||||
|
||||
@ -17,29 +18,32 @@ extension Kana2Kanji {
|
||||
/// (2) Then recompute to obtain good candidates.
|
||||
func kana2lattice_afterComplete(_ inputData: ComposingText, completedData: Candidate, N_best: Int, previousResult: (inputData: ComposingText, lattice: Lattice), needTypoCorrection: Bool) -> (result: LatticeNode, lattice: Lattice) {
|
||||
debug("確定直後の変換、前は:", previousResult.inputData, "後は:", inputData)
|
||||
let count = inputData.input.count
|
||||
let inputCount = inputData.input.count
|
||||
let surfaceCount = inputData.convertTarget.count
|
||||
// TODO: More checks are actually needed. Specifically, the suffixes of both input and convertTarget must match
|
||||
let convertedInputCount = previousResult.inputData.input.count - inputCount
|
||||
let convertedSurfaceCount = previousResult.inputData.convertTarget.count - surfaceCount
|
||||
// (1)
|
||||
let start = RegisteredNode.fromLastCandidate(completedData)
|
||||
let lattice = previousResult.lattice.suffix(count)
|
||||
for (i, nodeArray) in lattice.enumerated() {
|
||||
if i == .zero {
|
||||
for node in nodeArray {
|
||||
node.prevs = [start]
|
||||
// Shift inputRange by the count of the committed part
|
||||
node.inputRange = node.inputRange.startIndex - completedData.correspondingCount ..< node.inputRange.endIndex - completedData.correspondingCount
|
||||
}
|
||||
let indexMap = LatticeDualIndexMap(inputData)
|
||||
let latticeIndices = indexMap.indices(inputCount: inputCount, surfaceCount: surfaceCount)
|
||||
let lattice = previousResult.lattice.suffix(inputCount: inputCount, surfaceCount: surfaceCount)
|
||||
for (isHead, nodeArray) in lattice.indexedNodes(indices: latticeIndices) {
|
||||
let prevs: [RegisteredNode] = if isHead {
|
||||
[start]
|
||||
} else {
|
||||
for node in nodeArray {
|
||||
node.prevs = []
|
||||
// Shift inputRange by the count of the committed part
|
||||
node.inputRange = node.inputRange.startIndex - completedData.correspondingCount ..< node.inputRange.endIndex - completedData.correspondingCount
|
||||
}
|
||||
[]
|
||||
}
|
||||
for node in nodeArray {
|
||||
node.prevs = prevs
|
||||
// Shift inputRange by the count of the committed part
|
||||
node.range = node.range.offseted(inputOffset: -convertedInputCount, surfaceOffset: -convertedSurfaceCount)
|
||||
}
|
||||
}
|
||||
// (2)
|
||||
let result = LatticeNode.EOSNode
|
||||
|
||||
for (i, nodeArray) in lattice.enumerated() {
|
||||
for (isHead, nodeArray) in lattice.indexedNodes(indices: latticeIndices) {
|
||||
for node in nodeArray {
|
||||
if node.prevs.isEmpty {
|
||||
continue
|
||||
@ -49,7 +53,7 @@ extension Kana2Kanji {
|
||||
}
|
||||
// Get the occurrence probability.
|
||||
let wValue = node.data.value()
|
||||
if i == 0 {
|
||||
if isHead {
|
||||
// Update values
|
||||
node.values = node.prevs.map {$0.totalValue + wValue + self.dicdataStore.getCCValue($0.data.rcid, node.data.lcid)}
|
||||
} else {
|
||||
@ -57,11 +61,11 @@ extension Kana2Kanji {
|
||||
node.values = node.prevs.map {$0.totalValue + wValue}
|
||||
}
|
||||
// Number of characters converted
|
||||
let nextIndex = node.inputRange.endIndex
|
||||
if nextIndex != count {
|
||||
self.updateNextNodes(with: node, nextNodes: lattice[inputIndex: nextIndex], nBest: N_best)
|
||||
} else {
|
||||
let nextIndex = indexMap.dualIndex(for: node.range.endIndex)
|
||||
if nextIndex.inputIndex == inputCount || nextIndex.surfaceIndex == surfaceCount {
|
||||
self.updateResultNode(with: node, resultNode: result)
|
||||
} else {
|
||||
self.updateNextNodes(with: node, nextNodes: lattice[index: nextIndex], nBest: N_best)
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -6,6 +6,7 @@
|
||||
// Copyright © 2020 ensan. All rights reserved.
|
||||
//
|
||||
|
||||
import Algorithms
|
||||
import Foundation
|
||||
import SwiftUtils
|
||||
|
||||
@ -24,28 +25,59 @@ extension Kana2Kanji {
|
||||
///
|
||||
/// (5) Update the nodes and return them.
|
||||
|
||||
func kana2lattice_changed(_ inputData: ComposingText, N_best: Int, counts: (deleted: Int, added: Int), previousResult: (inputData: ComposingText, lattice: Lattice), needTypoCorrection: Bool) -> (result: LatticeNode, lattice: Lattice) {
|
||||
func kana2lattice_changed(
|
||||
_ inputData: ComposingText,
|
||||
N_best: Int,
|
||||
counts: (deletedInput: Int, addedInput: Int, deletedSurface: Int, addedSurface: Int),
|
||||
previousResult: (inputData: ComposingText, lattice: Lattice),
|
||||
needTypoCorrection: Bool
|
||||
) -> (result: LatticeNode, lattice: Lattice) {
|
||||
// (0)
|
||||
let count = inputData.input.count
|
||||
let commonCount = previousResult.inputData.input.count - counts.deleted
|
||||
debug("kana2lattice_changed", inputData, counts, previousResult.inputData, count, commonCount)
|
||||
let inputCount = inputData.input.count
|
||||
let surfaceCount = inputData.convertTarget.count
|
||||
let commonInputCount = previousResult.inputData.input.count - counts.deletedInput
|
||||
let commonSurfaceCount = previousResult.inputData.convertTarget.count - counts.deletedSurface
|
||||
debug("kana2lattice_changed", inputData, counts, previousResult.inputData, inputCount, commonInputCount)
|
||||
|
||||
// (1)
|
||||
var lattice = previousResult.lattice.prefix(commonCount)
|
||||
let indexMap = LatticeDualIndexMap(inputData)
|
||||
let latticeIndices = indexMap.indices(inputCount: inputCount, surfaceCount: surfaceCount)
|
||||
var lattice = previousResult.lattice.prefix(inputCount: commonInputCount, surfaceCount: commonSurfaceCount)
|
||||
|
||||
let terminalNodes: Lattice
|
||||
if counts.added == 0 {
|
||||
terminalNodes = Lattice(nodes: lattice.map {
|
||||
var terminalNodes = Lattice(
|
||||
inputCount: inputCount,
|
||||
surfaceCount: surfaceCount,
|
||||
rawNodes: lattice.map {
|
||||
$0.filter {
|
||||
$0.inputRange.endIndex == count
|
||||
$0.range.endIndex == .input(inputCount) || $0.range.endIndex == .surface(surfaceCount)
|
||||
}
|
||||
})
|
||||
} else {
|
||||
}
|
||||
)
|
||||
if !(counts.addedInput == 0 && counts.addedSurface == 0) {
|
||||
// (2)
|
||||
let addedNodes: Lattice = Lattice(nodes: (0..<count).map {(i: Int) in
|
||||
self.dicdataStore.getLOUDSDataInRange(inputData: inputData, from: i, toIndexRange: max(commonCount, i) ..< count, needTypoCorrection: needTypoCorrection)
|
||||
})
|
||||
|
||||
let rawNodes = latticeIndices.map { index in
|
||||
let inputRange: (startIndex: Int, endIndexRange: Range<Int>?)? = if let iIndex = index.inputIndex, max(commonInputCount, iIndex) < inputCount {
|
||||
(iIndex, max(commonInputCount, iIndex) ..< inputCount)
|
||||
} else {
|
||||
nil
|
||||
}
|
||||
let surfaceRange: (startIndex: Int, endIndexRange: Range<Int>?)? = if let sIndex = index.surfaceIndex, max(commonSurfaceCount, sIndex) < surfaceCount {
|
||||
(sIndex, max(commonSurfaceCount, sIndex) ..< surfaceCount)
|
||||
} else {
|
||||
nil
|
||||
}
|
||||
return self.dicdataStore.lookupDicdata(
|
||||
composingText: inputData,
|
||||
inputRange: inputRange,
|
||||
surfaceRange: surfaceRange,
|
||||
needTypoCorrection: needTypoCorrection
|
||||
)
|
||||
}
|
||||
let addedNodes: Lattice = Lattice(
|
||||
inputCount: inputCount,
|
||||
surfaceCount: surfaceCount,
|
||||
rawNodes: rawNodes
|
||||
)
|
||||
// (3)
|
||||
for nodeArray in lattice {
|
||||
for node in nodeArray {
|
||||
@ -56,12 +88,14 @@ extension Kana2Kanji {
|
||||
continue
|
||||
}
|
||||
// Number of characters converted
|
||||
let nextIndex = node.inputRange.endIndex
|
||||
self.updateNextNodes(with: node, nextNodes: addedNodes[inputIndex: nextIndex], nBest: N_best)
|
||||
let nextIndex = indexMap.dualIndex(for: node.range.endIndex)
|
||||
if nextIndex != .bothIndex(inputIndex: inputCount, surfaceIndex: surfaceCount) {
|
||||
self.updateNextNodes(with: node, nextNodes: addedNodes[index: nextIndex], nBest: N_best)
|
||||
}
|
||||
}
|
||||
}
|
||||
lattice.merge(addedNodes)
|
||||
terminalNodes = addedNodes
|
||||
terminalNodes.merge(addedNodes)
|
||||
}
|
||||
|
||||
// (3)
|
||||
@ -86,11 +120,11 @@ extension Kana2Kanji {
|
||||
// Update values
|
||||
node.values = node.prevs.map {$0.totalValue + wValue}
|
||||
}
|
||||
let nextIndex = node.inputRange.endIndex
|
||||
if count == nextIndex {
|
||||
let nextIndex = indexMap.dualIndex(for: node.range.endIndex)
|
||||
if nextIndex.inputIndex == inputCount && nextIndex.surfaceIndex == surfaceCount {
|
||||
self.updateResultNode(with: node, resultNode: result)
|
||||
} else {
|
||||
self.updateNextNodes(with: node, nextNodes: terminalNodes[inputIndex: nextIndex], nBest: N_best)
|
||||
self.updateNextNodes(with: node, nextNodes: terminalNodes[index: nextIndex], nBest: N_best)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
@ -6,6 +6,7 @@
|
||||
// Copyright © 2022 ensan. All rights reserved.
|
||||
//
|
||||
|
||||
import Algorithms
|
||||
import Foundation
|
||||
import SwiftUtils
|
||||
|
||||
@ -26,12 +27,13 @@ extension Kana2Kanji {
|
||||
|
||||
func kana2lattice_no_change(N_best: Int, previousResult: (inputData: ComposingText, lattice: Lattice)) -> (result: LatticeNode, lattice: Lattice) {
|
||||
debug("キャッシュから復元、元の文字は:", previousResult.inputData.convertTarget)
|
||||
let count = previousResult.inputData.input.count
|
||||
let inputCount = previousResult.inputData.input.count
|
||||
let surfaceCount = previousResult.inputData.convertTarget.count
|
||||
// (1)
|
||||
let result = LatticeNode.EOSNode
|
||||
|
||||
for nodeArray in previousResult.lattice {
|
||||
for node in nodeArray where node.inputRange.endIndex == count {
|
||||
for node in nodeArray where node.range.endIndex == .input(inputCount) || node.range.endIndex == .surface(surfaceCount) {
|
||||
if node.prevs.isEmpty {
|
||||
continue
|
||||
}
|
||||
|
@ -34,11 +34,16 @@ struct Kana2Kanji {
|
||||
let text = data.clauses.map {$0.clause.text}.joined()
|
||||
let value = data.clauses.last!.value + mmValue.value
|
||||
let lastMid = data.clauses.last!.clause.mid
|
||||
let correspondingCount = data.clauses.reduce(into: 0) {$0 += $1.clause.inputRange.count}
|
||||
|
||||
let composingCount: ComposingCount = data.clauses.reduce(into: .inputCount(0)) {
|
||||
for range in $1.clause.ranges {
|
||||
$0 = .composite($0, range.count)
|
||||
}
|
||||
}
|
||||
return Candidate(
|
||||
text: text,
|
||||
value: value,
|
||||
correspondingCount: correspondingCount,
|
||||
composingCount: composingCount,
|
||||
lastMid: lastMid,
|
||||
data: data.data
|
||||
)
|
||||
|
@ -1,49 +1,261 @@
|
||||
struct Lattice: Sequence {
|
||||
typealias Element = [LatticeNode]
|
||||
typealias Iterator = IndexingIterator<[[LatticeNode]]>
|
||||
import Algorithms
|
||||
import SwiftUtils
|
||||
|
||||
init(nodes: [[LatticeNode]] = []) {
|
||||
self.nodes = nodes
|
||||
struct LatticeNodeArray: Sequence {
|
||||
typealias Element = LatticeNode
|
||||
|
||||
var inputIndexedNodes: [LatticeNode]
|
||||
var surfaceIndexedNodes: [LatticeNode]
|
||||
|
||||
func makeIterator() -> Chain2Sequence<[LatticeNode], [LatticeNode]>.Iterator {
|
||||
inputIndexedNodes.chained(surfaceIndexedNodes).makeIterator()
|
||||
}
|
||||
}
|
||||
|
||||
struct LatticeDualIndexMap: Sendable {
|
||||
private var inputIndexToSurfaceIndexMap: [Int: Int]
|
||||
init(_ composingText: ComposingText) {
|
||||
self.inputIndexToSurfaceIndexMap = composingText.inputIndexToSurfaceIndexMap()
|
||||
}
|
||||
|
||||
private var nodes: [[LatticeNode]]
|
||||
enum DualIndex: Sendable, Equatable, Hashable {
|
||||
case inputIndex(Int)
|
||||
case surfaceIndex(Int)
|
||||
case bothIndex(inputIndex: Int, surfaceIndex: Int)
|
||||
|
||||
func prefix(_ k: Int) -> Lattice {
|
||||
var lattice = Lattice(nodes: self.nodes.prefix(k).map {(nodes: [LatticeNode]) in
|
||||
nodes.filter {$0.inputRange.endIndex <= k}
|
||||
})
|
||||
while lattice.nodes.last?.isEmpty ?? false {
|
||||
lattice.nodes.removeLast()
|
||||
var inputIndex: Int? {
|
||||
switch self {
|
||||
case .inputIndex(let index), .bothIndex(let index, _):
|
||||
index
|
||||
case .surfaceIndex:
|
||||
nil
|
||||
}
|
||||
}
|
||||
return lattice
|
||||
}
|
||||
|
||||
func suffix(_ count: Int) -> Lattice {
|
||||
Lattice(nodes: self.nodes.suffix(count))
|
||||
}
|
||||
|
||||
mutating func merge(_ lattice: Lattice) {
|
||||
for (index, nodeArray) in lattice.nodes.enumerated() where index < self.nodes.endIndex {
|
||||
self.nodes[index].append(contentsOf: nodeArray)
|
||||
}
|
||||
if self.nodes.endIndex < lattice.nodes.endIndex {
|
||||
for nodeArray in lattice.nodes[self.nodes.endIndex...] {
|
||||
self.nodes.append(nodeArray)
|
||||
var surfaceIndex: Int? {
|
||||
switch self {
|
||||
case .inputIndex:
|
||||
nil
|
||||
case .surfaceIndex(let index), .bothIndex(_, let index):
|
||||
index
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
subscript(inputIndex i: Int) -> [LatticeNode] {
|
||||
get {
|
||||
self.nodes[i]
|
||||
func dualIndex(for latticeIndex: Lattice.LatticeIndex) -> DualIndex {
|
||||
switch latticeIndex {
|
||||
case .input(let iIndex):
|
||||
if let sIndex = self.inputIndexToSurfaceIndexMap[iIndex] {
|
||||
.bothIndex(inputIndex: iIndex, surfaceIndex: sIndex)
|
||||
} else {
|
||||
.inputIndex(iIndex)
|
||||
}
|
||||
case .surface(let sIndex):
|
||||
if let iIndex = self.inputIndexToSurfaceIndexMap.filter({ $0.value == sIndex}).first?.key {
|
||||
.bothIndex(inputIndex: iIndex, surfaceIndex: sIndex)
|
||||
} else {
|
||||
.surfaceIndex(sIndex)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func makeIterator() -> IndexingIterator<[[LatticeNode]]> {
|
||||
self.nodes.makeIterator()
|
||||
func indices(inputCount: Int, surfaceCount: Int) -> [DualIndex] {
|
||||
var indices: [DualIndex] = []
|
||||
var sIndexPointer = 0
|
||||
for i in 0 ..< inputCount {
|
||||
if let sIndex = self.inputIndexToSurfaceIndexMap[i] {
|
||||
for j in sIndexPointer ..< sIndex {
|
||||
indices.append(.surfaceIndex(j))
|
||||
}
|
||||
indices.append(.bothIndex(inputIndex: i, surfaceIndex: sIndex))
|
||||
sIndexPointer = sIndex + 1
|
||||
} else {
|
||||
indices.append(.inputIndex(i))
|
||||
}
|
||||
}
|
||||
for j in sIndexPointer ..< surfaceCount {
|
||||
indices.append(.surfaceIndex(j))
|
||||
}
|
||||
return indices
|
||||
}
|
||||
}
|
||||
|
||||
struct Lattice: Sequence {
|
||||
typealias Element = LatticeNodeArray
|
||||
|
||||
init() {
|
||||
self.inputIndexedNodes = []
|
||||
self.surfaceIndexedNodes = []
|
||||
}
|
||||
|
||||
init(inputCount: Int, surfaceCount: Int, rawNodes: [[LatticeNode]]) {
|
||||
self.inputIndexedNodes = .init(repeating: [], count: inputCount)
|
||||
self.surfaceIndexedNodes = .init(repeating: [], count: surfaceCount)
|
||||
|
||||
for nodes in rawNodes {
|
||||
guard let first = nodes.first else { continue }
|
||||
switch first.range.startIndex {
|
||||
case .surface(let i):
|
||||
self.surfaceIndexedNodes[i].append(contentsOf: nodes)
|
||||
case .input(let i):
|
||||
self.inputIndexedNodes[i].append(contentsOf: nodes)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
private init(inputIndexedNodes: [[LatticeNode]], surfaceIndexedNodes: [[LatticeNode]]) {
|
||||
self.inputIndexedNodes = inputIndexedNodes
|
||||
self.surfaceIndexedNodes = surfaceIndexedNodes
|
||||
}
|
||||
|
||||
private var inputIndexedNodes: [[LatticeNode]]
|
||||
private var surfaceIndexedNodes: [[LatticeNode]]
|
||||
|
||||
func prefix(inputCount: Int, surfaceCount: Int) -> Lattice {
|
||||
let filterClosure: (LatticeNode) -> Bool = { (node: LatticeNode) -> Bool in
|
||||
switch node.range.endIndex {
|
||||
case .input(let value):
|
||||
value <= inputCount
|
||||
case .surface(let value):
|
||||
value <= surfaceCount
|
||||
}
|
||||
}
|
||||
let newInputIndexedNodes = self.inputIndexedNodes.prefix(inputCount).map {(nodes: [LatticeNode]) in
|
||||
nodes.filter(filterClosure)
|
||||
}
|
||||
let newSurfaceIndexedNodes = self.surfaceIndexedNodes.prefix(surfaceCount).map {(nodes: [LatticeNode]) in
|
||||
nodes.filter(filterClosure)
|
||||
}
|
||||
|
||||
return Lattice(inputIndexedNodes: newInputIndexedNodes, surfaceIndexedNodes: newSurfaceIndexedNodes)
|
||||
}
|
||||
|
||||
func suffix(inputCount: Int, surfaceCount: Int) -> Lattice {
|
||||
Lattice(
|
||||
inputIndexedNodes: self.inputIndexedNodes.suffix(inputCount),
|
||||
surfaceIndexedNodes: self.surfaceIndexedNodes.suffix(surfaceCount)
|
||||
)
|
||||
}
|
||||
|
||||
mutating func merge(_ lattice: Lattice) {
|
||||
for (index, nodeArray) in lattice.inputIndexedNodes.enumerated() where index < self.inputIndexedNodes.endIndex {
|
||||
self.inputIndexedNodes[index].append(contentsOf: nodeArray)
|
||||
}
|
||||
if self.inputIndexedNodes.endIndex < lattice.inputIndexedNodes.endIndex {
|
||||
for nodeArray in lattice.inputIndexedNodes[self.inputIndexedNodes.endIndex...] {
|
||||
self.inputIndexedNodes.append(nodeArray)
|
||||
}
|
||||
}
|
||||
for (index, nodeArray) in lattice.surfaceIndexedNodes.enumerated() where index < self.surfaceIndexedNodes.endIndex {
|
||||
self.surfaceIndexedNodes[index].append(contentsOf: nodeArray)
|
||||
}
|
||||
if self.surfaceIndexedNodes.endIndex < lattice.surfaceIndexedNodes.endIndex {
|
||||
for nodeArray in lattice.surfaceIndexedNodes[self.surfaceIndexedNodes.endIndex...] {
|
||||
self.surfaceIndexedNodes.append(nodeArray)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
subscript(index index: LatticeDualIndexMap.DualIndex) -> LatticeNodeArray {
|
||||
get {
|
||||
let iNodes: [LatticeNode] = if let iIndex = index.inputIndex { self.inputIndexedNodes[iIndex] } else { [] }
|
||||
let sNodes: [LatticeNode] = if let sIndex = index.surfaceIndex { self.surfaceIndexedNodes[sIndex] } else { [] }
|
||||
return LatticeNodeArray(inputIndexedNodes: iNodes, surfaceIndexedNodes: sNodes)
|
||||
}
|
||||
}
|
||||
|
||||
func indexedNodes(indices: [LatticeDualIndexMap.DualIndex]) -> some Sequence<(isHead: Bool, nodes: LatticeNodeArray)> {
|
||||
indices.lazy.map { index in
|
||||
return (index.inputIndex == 0 && index.surfaceIndex == 0, self[index: index])
|
||||
}
|
||||
}
|
||||
|
||||
struct Iterator: IteratorProtocol {
|
||||
init(lattice: Lattice) {
|
||||
self.lattice = lattice
|
||||
self.indices = (0, lattice.surfaceIndexedNodes.endIndex, 0, lattice.inputIndexedNodes.endIndex)
|
||||
}
|
||||
|
||||
typealias Element = LatticeNodeArray
|
||||
let lattice: Lattice
|
||||
var indices: (currentSurfaceIndex: Int, surfaceEndIndex: Int, currentInputIndex: Int, inputEndIndex: Int)
|
||||
|
||||
mutating func next() -> LatticeNodeArray? {
|
||||
if self.indices.currentSurfaceIndex < self.indices.surfaceEndIndex {
|
||||
defer {
|
||||
self.indices.currentSurfaceIndex += 1
|
||||
}
|
||||
return .init(inputIndexedNodes: [], surfaceIndexedNodes: self.lattice.surfaceIndexedNodes[self.indices.currentSurfaceIndex])
|
||||
} else if self.indices.currentInputIndex < self.indices.inputEndIndex {
|
||||
defer {
|
||||
self.indices.currentInputIndex += 1
|
||||
}
|
||||
return .init(inputIndexedNodes: self.lattice.inputIndexedNodes[self.indices.currentInputIndex], surfaceIndexedNodes: [])
|
||||
} else {
|
||||
return nil
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func makeIterator() -> Iterator {
|
||||
Iterator(lattice: self)
|
||||
}
|
||||
|
||||
var isEmpty: Bool {
|
||||
self.nodes.isEmpty
|
||||
self.inputIndexedNodes.isEmpty && self.surfaceIndexedNodes.isEmpty
|
||||
}
|
||||
|
||||
enum LatticeIndex: Sendable, Equatable, Hashable {
|
||||
case surface(Int)
|
||||
case input(Int)
|
||||
|
||||
var isZero: Bool {
|
||||
self == .surface(0) || self == .input(0)
|
||||
}
|
||||
}
|
||||
|
||||
enum LatticeRange: Sendable, Equatable, Hashable {
|
||||
static var zero: Self {
|
||||
.input(from: 0, to: 0)
|
||||
}
|
||||
case surface(from: Int, to: Int)
|
||||
case input(from: Int, to: Int)
|
||||
|
||||
var count: ComposingCount {
|
||||
switch self {
|
||||
case .surface(let from, let to):
|
||||
.surfaceCount(to - from)
|
||||
case .input(let from, let to):
|
||||
.inputCount(to - from)
|
||||
}
|
||||
}
|
||||
|
||||
var startIndex: LatticeIndex {
|
||||
switch self {
|
||||
case .surface(let from, _):
|
||||
.surface(from)
|
||||
case .input(let from, _):
|
||||
.input(from)
|
||||
}
|
||||
}
|
||||
|
||||
var endIndex: LatticeIndex {
|
||||
switch self {
|
||||
case .surface(_, let to):
|
||||
.surface(to)
|
||||
case .input(_, let to):
|
||||
.input(to)
|
||||
}
|
||||
}
|
||||
|
||||
func offseted(inputOffset: Int, surfaceOffset: Int) -> Self {
|
||||
switch self {
|
||||
case .surface(from: let from, to: let to):
|
||||
.surface(from: from + surfaceOffset, to: to + surfaceOffset)
|
||||
case .input(from: let from, to: let to):
|
||||
.input(from: from + inputOffset, to: to + inputOffset)
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
}
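Reviewer note: `LatticeRange.offseted(inputOffset:surfaceOffset:)` applies only the offset that matches the axis of the range. The sketch below re-declares a reduced `LatticeRange` (a hypothetical copy with only the two cases and this one method) to show that behavior in isolation; the offsets are arbitrary sample values.

```swift
// Reduced, standalone copy of `Lattice.LatticeRange` for illustration only.
enum LatticeRange: Equatable {
    case surface(from: Int, to: Int)
    case input(from: Int, to: Int)

    // Shifts the range by the offset that belongs to its own axis; the other offset is ignored.
    func offseted(inputOffset: Int, surfaceOffset: Int) -> Self {
        switch self {
        case .surface(let from, let to):
            return .surface(from: from + surfaceOffset, to: to + surfaceOffset)
        case .input(let from, let to):
            return .input(from: from + inputOffset, to: to + inputOffset)
        }
    }
}

let shiftedInput = LatticeRange.input(from: 0, to: 2).offseted(inputOffset: 3, surfaceOffset: 1)
let shiftedSurface = LatticeRange.surface(from: 0, to: 2).offseted(inputOffset: 3, surfaceOffset: 1)
// shiftedInput == .input(from: 3, to: 5), shiftedSurface == .surface(from: 1, to: 3)
```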
|
||||
|
@ -17,29 +17,29 @@ public final class LatticeNode {
|
||||
/// Score data corresponding to each element of `prevs`.
|
||||
var values: [PValue] = []
|
||||
/// The range within `inputData.input`.
|
||||
var inputRange: Range<Int>
|
||||
var range: Lattice.LatticeRange
|
||||
|
||||
/// The node corresponding to `EOS`.
|
||||
static var EOSNode: LatticeNode {
|
||||
LatticeNode(data: DicdataElement.EOSData, inputRange: 0..<0)
|
||||
LatticeNode(data: DicdataElement.EOSData, range: .zero)
|
||||
}
|
||||
|
||||
init(data: DicdataElement, inputRange: Range<Int>) {
|
||||
init(data: DicdataElement, range: Lattice.LatticeRange) {
|
||||
self.data = data
|
||||
self.values = [data.value()]
|
||||
self.inputRange = inputRange
|
||||
self.range = range
|
||||
}
|
||||
|
||||
/// Creates a `RegisteredNode` that reflects the information held by this `LatticeNode`.
/// A `LatticeNode` can have multiple previous nodes, whereas a `RegisteredNode` has only one.
|
||||
func getRegisteredNode(_ index: Int, value: PValue) -> RegisteredNode {
|
||||
RegisteredNode(data: self.data, registered: self.prevs[index], totalValue: value, inputRange: self.inputRange)
|
||||
RegisteredNode(data: self.data, registered: self.prevs[index], totalValue: value, range: self.range)
|
||||
}
|
||||
|
||||
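/// Recursively walks back through the nodes and builds `CandidateData`.
/// - Returns: A list of conversion candidate data carrying clause-level segmentation information.
/// - Note: This API is intended to be called on the `EOS` node at the end.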
|
||||
func getCandidateData() -> [CandidateData] {
|
||||
self.prevs.map {$0.getCandidateData()}
|
||||
return self.prevs.map {$0.getCandidateData()}
|
||||
}
|
||||
}
|
||||
|
@ -36,7 +36,7 @@ public struct PostCompositionPredictionCandidate {
|
||||
candidate.data.append(data)
|
||||
}
|
||||
candidate.value = self.value
|
||||
candidate.correspondingCount = candidate.data.reduce(into: 0) { $0 += $1.ruby.count }
|
||||
candidate.composingCount = .surfaceCount(candidate.rubyCount)
|
||||
candidate.lastMid = data.last(where: DicdataStore.includeMMValueCalculation)?.mid ?? candidate.lastMid
|
||||
return candidate
|
||||
case .replacement(let targetData, let replacementData):
|
||||
@ -45,7 +45,7 @@ public struct PostCompositionPredictionCandidate {
|
||||
candidate.text = candidate.data.reduce(into: "") {$0 += $1.word}
|
||||
candidate.value = self.value
|
||||
candidate.lastMid = candidate.data.last(where: DicdataStore.includeMMValueCalculation)?.mid ?? MIDData.BOS.mid
|
||||
candidate.correspondingCount = candidate.data.reduce(into: 0) { $0 += $1.ruby.count }
|
||||
candidate.composingCount = .surfaceCount(candidate.rubyCount)
|
||||
return candidate
|
||||
}
|
||||
}
|
||||
|
@ -22,9 +22,17 @@ extension Kana2Kanji {
|
||||
/// - note:
|
||||
/// The role of this function is to take semantic connection into account.
|
||||
func getPredictionCandidates(composingText: ComposingText, prepart: CandidateData, lastClause: ClauseDataUnit, N_best: Int) -> [Candidate] {
|
||||
debug("getPredictionCandidates", composingText, lastClause.inputRange, lastClause.text)
|
||||
let lastRuby = ComposingText.getConvertTarget(for: composingText.input[lastClause.inputRange]).toKatakana()
|
||||
let lastRubyCount = lastClause.inputRange.count
|
||||
debug(#function, composingText, lastClause.ranges, lastClause.text)
|
||||
let lastRuby = lastClause.ranges.reduce(into: "") {
|
||||
let ruby = switch $1 {
|
||||
case let .input(left, right):
|
||||
ComposingText.getConvertTarget(for: composingText.input[left..<right]).toKatakana()
|
||||
case let .surface(left, right):
|
||||
String(composingText.convertTarget.dropFirst(left).prefix(right - left)).toKatakana()
|
||||
}
|
||||
$0.append(ruby)
|
||||
}
|
||||
let lastRubyCount = lastRuby.count
|
||||
let datas: [DicdataElement]
|
||||
do {
|
||||
var _str = ""
|
||||
@ -42,11 +50,11 @@ extension Kana2Kanji {
|
||||
|
||||
let osuserdict: [DicdataElement] = dicdataStore.getPrefixMatchDynamicUserDict(lastRuby)
|
||||
|
||||
let lastCandidate: Candidate = prepart.isEmpty ? Candidate(text: "", value: .zero, correspondingCount: 0, lastMid: MIDData.EOS.mid, data: []) : self.processClauseCandidate(prepart)
|
||||
let lastCandidate: Candidate = prepart.isEmpty ? Candidate(text: "", value: .zero, composingCount: .inputCount(0), lastMid: MIDData.EOS.mid, data: []) : self.processClauseCandidate(prepart)
|
||||
let lastRcid: Int = lastCandidate.data.last?.rcid ?? CIDData.EOS.cid
|
||||
let nextLcid: Int = prepart.lastClause?.nextLcid ?? CIDData.EOS.cid
|
||||
let lastMid: Int = lastCandidate.lastMid
|
||||
let correspoindingCount: Int = lastCandidate.correspondingCount + lastRubyCount
|
||||
let composingCount: ComposingCount = .composite(lastCandidate.composingCount, .surfaceCount(lastRubyCount))
|
||||
let ignoreCCValue: PValue = self.dicdataStore.getCCValue(lastRcid, nextLcid)
|
||||
|
||||
let inputStyle = composingText.input.last?.inputStyle ?? .direct
|
||||
@ -63,10 +71,10 @@ extension Kana2Kanji {
|
||||
break
|
||||
}
|
||||
let possibleNexts: [Substring] = DicdataStore.possibleNexts[String(roman), default: []].map {ruby + $0}
|
||||
debug("getPredictionCandidates", lastRuby, ruby, roman, possibleNexts, prepart, lastRubyCount)
|
||||
debug(#function, lastRuby, ruby, roman, possibleNexts, prepart, lastRubyCount)
|
||||
dicdata = possibleNexts.flatMap { self.dicdataStore.getPredictionLOUDSDicdata(key: $0) }
|
||||
} else {
|
||||
debug("getPredicitonCandidates", lastRuby, roman)
|
||||
debug(#function, lastRuby, "roman == \"\"")
|
||||
dicdata = self.dicdataStore.getPredictionLOUDSDicdata(key: lastRuby)
|
||||
}
|
||||
}
|
||||
@ -91,7 +99,7 @@ extension Kana2Kanji {
|
||||
let candidate: Candidate = Candidate(
|
||||
text: lastCandidate.text + data.word,
|
||||
value: newValue,
|
||||
correspondingCount: correspoindingCount,
|
||||
composingCount: composingCount,
|
||||
lastMid: includeMMValueCalculation ? data.mid:lastMid,
|
||||
data: nodedata
|
||||
)
|
||||
|
@ -14,7 +14,7 @@ protocol RegisteredNodeProtocol {
|
||||
var data: DicdataElement {get}
|
||||
var prev: (any RegisteredNodeProtocol)? {get}
|
||||
var totalValue: PValue {get}
|
||||
var inputRange: Range<Int> {get}
|
||||
var range: Lattice.LatticeRange {get}
|
||||
}
|
||||
|
||||
struct RegisteredNode: RegisteredNodeProtocol {
|
||||
@ -25,19 +25,19 @@ struct RegisteredNode: RegisteredNodeProtocol {
|
||||
/// The cost from the start node to this node.
|
||||
let totalValue: PValue
|
||||
/// The corresponding range in the `input` of `composingText`.
|
||||
let inputRange: Range<Int>
|
||||
let range: Lattice.LatticeRange
|
||||
|
||||
init(data: DicdataElement, registered: RegisteredNode?, totalValue: PValue, inputRange: Range<Int>) {
|
||||
init(data: DicdataElement, registered: RegisteredNode?, totalValue: PValue, range: Lattice.LatticeRange) {
|
||||
self.data = data
|
||||
self.prev = registered
|
||||
self.totalValue = totalValue
|
||||
self.inputRange = inputRange
|
||||
self.range = range
|
||||
}
|
||||
|
||||
/// Creates the start node.
/// - Returns: The data of the start node.
|
||||
static func BOSNode() -> RegisteredNode {
|
||||
RegisteredNode(data: DicdataElement.BOSData, registered: nil, totalValue: 0, inputRange: 0 ..< 0)
|
||||
RegisteredNode(data: DicdataElement.BOSData, registered: nil, totalValue: 0, range: .zero)
|
||||
}
|
||||
|
||||
/// Creates a start node that takes into account the part already committed during input.
|
||||
@ -47,7 +47,7 @@ struct RegisteredNode: RegisteredNodeProtocol {
|
||||
data: DicdataElement(word: "", ruby: "", lcid: CIDData.BOS.cid, rcid: candidate.data.last?.rcid ?? CIDData.BOS.cid, mid: candidate.lastMid, value: 0),
|
||||
registered: nil,
|
||||
totalValue: 0,
|
||||
inputRange: 0 ..< 0
|
||||
range: .zero
|
||||
)
|
||||
}
|
||||
}
|
||||
@ -59,7 +59,7 @@ extension RegisteredNodeProtocol {
|
||||
guard let prev else {
|
||||
let unit = ClauseDataUnit()
|
||||
unit.mid = self.data.mid
|
||||
unit.inputRange = self.inputRange
|
||||
unit.ranges = [self.range]
|
||||
return CandidateData(clauses: [(clause: unit, value: .zero)], data: [])
|
||||
}
|
||||
var lastcandidate = prev.getCandidateData()    // process the data of each registered node leading to this node
|
||||
@ -75,7 +75,7 @@ extension RegisteredNodeProtocol {
|
||||
if lastClause.text.isEmpty || !DicdataStore.isClause(prev.data.rcid, self.data.lcid) {
|
||||
// Not a clause boundary, so append to the last clause.
|
||||
lastClause.text.append(self.data.word)
|
||||
lastClause.inputRange = lastClause.inputRange.startIndex ..< self.inputRange.endIndex
|
||||
lastClause.ranges.append(self.range)
|
||||
// Handles the case where this is the first element.
|
||||
if (lastClause.mid == 500 && self.data.mid != 500) || DicdataStore.includeMMValueCalculation(self.data) {
|
||||
lastClause.mid = self.data.mid
|
||||
@ -88,7 +88,7 @@ extension RegisteredNodeProtocol {
|
||||
else {
|
||||
let unit = ClauseDataUnit()
|
||||
unit.text = self.data.word
|
||||
unit.inputRange = self.inputRange
|
||||
unit.ranges.append(self.range)
|
||||
if DicdataStore.includeMMValueCalculation(self.data) {
|
||||
unit.mid = self.data.mid
|
||||
}
|
||||
|
@ -65,7 +65,7 @@ extension Kana2Kanji {
|
||||
var constraint = zenzaiCache?.getNewConstraint(for: inputData) ?? PrefixConstraint([])
|
||||
debug("initial constraint", constraint)
|
||||
let eosNode = LatticeNode.EOSNode
|
||||
var lattice: Lattice = Lattice(nodes: [])
|
||||
var lattice: Lattice = Lattice()
|
||||
var constructedCandidates: [(RegisteredNode, Candidate)] = []
|
||||
var insertedCandidates: [(RegisteredNode, Candidate)] = []
|
||||
defer {
|
||||
|
@ -17,28 +17,28 @@ final class ClauseDataUnit {
|
||||
/// The text of the unit.
|
||||
var text: String = ""
|
||||
/// The range of the unit in input text.
|
||||
var inputRange: Range<Int> = 0 ..< 0
|
||||
var ranges: [Lattice.LatticeRange] = []
|
||||
|
||||
/// Merge the given unit to this unit.
|
||||
/// - Parameter:
|
||||
/// - unit: The unit to merge.
|
||||
func merge(with unit: ClauseDataUnit) {
|
||||
self.text.append(unit.text)
|
||||
self.inputRange = self.inputRange.startIndex ..< unit.inputRange.endIndex
|
||||
self.ranges.append(contentsOf: unit.ranges)
|
||||
self.nextLcid = unit.nextLcid
|
||||
}
|
||||
}
|
||||
|
||||
extension ClauseDataUnit: Equatable {
|
||||
static func == (lhs: ClauseDataUnit, rhs: ClauseDataUnit) -> Bool {
|
||||
lhs.mid == rhs.mid && lhs.nextLcid == rhs.nextLcid && lhs.text == rhs.text && lhs.inputRange == rhs.inputRange
|
||||
lhs.mid == rhs.mid && lhs.nextLcid == rhs.nextLcid && lhs.text == rhs.text && lhs.ranges == rhs.ranges
|
||||
}
|
||||
}
|
||||
|
||||
#if DEBUG
|
||||
extension ClauseDataUnit: CustomDebugStringConvertible {
|
||||
var debugDescription: String {
|
||||
"ClauseDataUnit(mid: \(mid), nextLcid: \(nextLcid), text: \(text), inputRange: \(inputRange))"
|
||||
"ClauseDataUnit(mid: \(mid), nextLcid: \(nextLcid), text: \(text), ranges: \(ranges))"
|
||||
}
|
||||
}
|
||||
#endif
|
||||
@ -67,14 +67,35 @@ public enum CompleteAction: Equatable, Sendable {
|
||||
case moveCursor(Int)
|
||||
}
|
||||
|
||||
public enum ComposingCount: Equatable, Sendable {
|
||||
/// The number of corresponding characters in `composingText.input`.
|
||||
case inputCount(Int)
|
||||
/// The number of corresponding characters in `composingText.convertTarget`.
|
||||
case surfaceCount(Int)
|
||||
|
||||
/// The concatenation of multiple counts.
|
||||
indirect case composite(lhs: Self, rhs: Self)
|
||||
|
||||
static func composite(_ lhs: Self, _ rhs: Self) -> Self {
|
||||
switch (lhs, rhs) {
|
||||
case (.inputCount(let l), .inputCount(let r)):
|
||||
.inputCount(l + r)
|
||||
case (.surfaceCount(let l), .surfaceCount(let r)):
|
||||
.surfaceCount(l + r)
|
||||
default:
|
||||
.composite(lhs: lhs, rhs: rhs)
|
||||
}
|
||||
}
|
||||
}
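Reviewer note: the `composite(_:_:)` helper above merges two counts only when both sit on the same axis and otherwise falls back to the nested `composite(lhs:rhs:)` case. The sketch below is a standalone copy of that enum (without `Sendable` and `public`, and with explicit `return`s) plus a made-up fold over three counts, to illustrate what a mixed sequence collapses to.

```swift
// Standalone copy of `ComposingCount` for illustration only.
enum ComposingCount: Equatable {
    case inputCount(Int)
    case surfaceCount(Int)
    indirect case composite(lhs: ComposingCount, rhs: ComposingCount)

    // Adds counts on the same axis; keeps mixed-axis counts as a nested pair.
    static func composite(_ lhs: Self, _ rhs: Self) -> Self {
        switch (lhs, rhs) {
        case (.inputCount(let l), .inputCount(let r)):
            return .inputCount(l + r)
        case (.surfaceCount(let l), .surfaceCount(let r)):
            return .surfaceCount(l + r)
        default:
            return .composite(lhs: lhs, rhs: rhs)
        }
    }
}

// Made-up per-range counts: two input-based ranges followed by one surface-based range.
let counts: [ComposingCount] = [.inputCount(2), .inputCount(1), .surfaceCount(3)]
let total = counts.reduce(ComposingCount.inputCount(0)) { ComposingCount.composite($0, $1) }
// total == .composite(lhs: .inputCount(3), rhs: .surfaceCount(3))
```

Note that the merge only looks at the two top-level operands: appending another `.surfaceCount` to an already composite value nests further rather than adding to the inner surface count.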
|
||||
|
||||
/// Data for a conversion candidate.
|
||||
public struct Candidate: Sendable {
|
||||
/// The text to be input.
|
||||
public var text: String
|
||||
/// The evaluation score.
|
||||
public var value: PValue
|
||||
/// The number of corresponding characters in `composingText.input`.
|
||||
public var correspondingCount: Int
|
||||
|
||||
public var composingCount: ComposingCount
|
||||
/// The last mid (used for predictive conversion).
|
||||
public var lastMid: Int
|
||||
/// The sequence of `DicdataElement` values.
|
||||
@ -86,14 +107,18 @@ public struct Candidate: Sendable {
|
||||
/// - note: A flag added for displaying the character count.
|
||||
public let inputable: Bool
|
||||
|
||||
public init(text: String, value: PValue, correspondingCount: Int, lastMid: Int, data: [DicdataElement], actions: [CompleteAction] = [], inputable: Bool = true) {
|
||||
/// The number of ruby characters.
|
||||
public let rubyCount: Int
|
||||
|
||||
public init(text: String, value: PValue, composingCount: ComposingCount, lastMid: Int, data: [DicdataElement], actions: [CompleteAction] = [], inputable: Bool = true) {
|
||||
self.text = text
|
||||
self.value = value
|
||||
self.correspondingCount = correspondingCount
|
||||
self.composingCount = composingCount
|
||||
self.lastMid = lastMid
|
||||
self.data = data
|
||||
self.actions = actions
|
||||
self.inputable = inputable
|
||||
self.rubyCount = self.data.reduce(into: 0) { $0 += $1.ruby.count }
|
||||
}
|
||||
/// Creates a copy with `action` added afterwards.
|
||||
/// - parameters:
|
||||
@ -138,7 +163,7 @@ public struct Candidate: Sendable {
|
||||
/// Creates a `Candidate` corresponding to the clause that forms the prefix when the input is treated as a sentence.
|
||||
public static func makePrefixClauseCandidate(data: some Collection<DicdataElement>) -> Candidate {
|
||||
var text = ""
|
||||
var correspondingCount = 0
|
||||
var composingCount = 0
|
||||
var lastRcid = CIDData.BOS.cid
|
||||
var lastMid = 501
|
||||
var candidateData: [DicdataElement] = []
|
||||
@ -148,7 +173,7 @@ public struct Candidate: Sendable {
|
||||
break
|
||||
}
|
||||
text.append(item.word)
|
||||
correspondingCount += item.ruby.count
|
||||
composingCount += item.ruby.count
|
||||
lastRcid = item.rcid
|
||||
// Handles the case where this is the first element.
|
||||
if item.mid != 500 && DicdataStore.includeMMValueCalculation(item) {
|
||||
@ -159,7 +184,7 @@ public struct Candidate: Sendable {
|
||||
return Candidate(
|
||||
text: text,
|
||||
value: -5,
|
||||
correspondingCount: correspondingCount,
|
||||
composingCount: .surfaceCount(composingCount),
|
||||
lastMid: lastMid,
|
||||
data: candidateData
|
||||
)
|
||||
|
@ -28,8 +28,9 @@ public struct ConvertRequestOptions: Sendable {
|
||||
/// - textReplacer: Specifies the replacer used for predictive conversion.
/// - specialCandidateProviders: Injects the conversion functions that perform special conversions.
/// - metadata: Specifies metadata. See `ConvertRequestOptions.Metadata` for details.
|
||||
public init(N_best: Int = 10, requireJapanesePrediction: Bool, requireEnglishPrediction: Bool, keyboardLanguage: KeyboardLanguage, englishCandidateInRoman2KanaInput: Bool = false, fullWidthRomanCandidate: Bool = false, halfWidthKanaCandidate: Bool = false, learningType: LearningType, maxMemoryCount: Int = 65536, shouldResetMemory: Bool = false, dictionaryResourceURL: URL, memoryDirectoryURL: URL, sharedContainerURL: URL, textReplacer: TextReplacer, specialCandidateProviders: [any SpecialCandidateProvider]?, zenzaiMode: ZenzaiMode = .off, preloadDictionary: Bool = false, metadata: ConvertRequestOptions.Metadata?) {
|
||||
public init(N_best: Int = 10, needTypoCorrection: Bool? = nil, requireJapanesePrediction: Bool, requireEnglishPrediction: Bool, keyboardLanguage: KeyboardLanguage, englishCandidateInRoman2KanaInput: Bool = false, fullWidthRomanCandidate: Bool = false, halfWidthKanaCandidate: Bool = false, learningType: LearningType, maxMemoryCount: Int = 65536, shouldResetMemory: Bool = false, dictionaryResourceURL: URL, memoryDirectoryURL: URL, sharedContainerURL: URL, textReplacer: TextReplacer, specialCandidateProviders: [any SpecialCandidateProvider]?, zenzaiMode: ZenzaiMode = .off, preloadDictionary: Bool = false, metadata: ConvertRequestOptions.Metadata?) {
|
||||
self.N_best = N_best
|
||||
self.needTypoCorrection = needTypoCorrection
|
||||
self.requireJapanesePrediction = requireJapanesePrediction
|
||||
self.requireEnglishPrediction = requireEnglishPrediction
|
||||
self.keyboardLanguage = keyboardLanguage
|
||||
@ -86,6 +87,7 @@ public struct ConvertRequestOptions: Sendable {
|
||||
specialCandidateProviders.append(.commaSeparatedNumber)
|
||||
|
||||
self.N_best = N_best
|
||||
self.needTypoCorrection = nil
|
||||
self.requireJapanesePrediction = requireJapanesePrediction
|
||||
self.requireEnglishPrediction = requireEnglishPrediction
|
||||
self.keyboardLanguage = keyboardLanguage
|
||||
@ -157,6 +159,7 @@ public struct ConvertRequestOptions: Sendable {
|
||||
public var requireJapanesePrediction: Bool
|
||||
public var requireEnglishPrediction: Bool
|
||||
public var keyboardLanguage: KeyboardLanguage
|
||||
public var needTypoCorrection: Bool?
|
||||
// Used for injecting KeyboardSetting values.
|
||||
public var englishCandidateInRoman2KanaInput: Bool
|
||||
public var fullWidthRomanCandidate: Bool
|
||||
@ -183,6 +186,7 @@ public struct ConvertRequestOptions: Sendable {
|
||||
static var `default`: Self {
|
||||
Self(
|
||||
N_best: 10,
|
||||
needTypoCorrection: nil,
|
||||
requireJapanesePrediction: true,
|
||||
requireEnglishPrediction: true,
|
||||
keyboardLanguage: .ja_JP,
|
||||
|
@ -168,7 +168,7 @@ import EfficientNGram
|
||||
var textIndex = [String: Int]()
|
||||
for candidate in candidates where !candidate.text.isEmpty && !seenCandidates.contains(candidate.text) {
|
||||
if let index = textIndex[candidate.text] {
|
||||
if result[index].value < candidate.value || result[index].correspondingCount < candidate.correspondingCount {
|
||||
if result[index].value < candidate.value || result[index].rubyCount < candidate.rubyCount {
|
||||
result[index] = candidate
|
||||
}
|
||||
} else {
|
||||
@ -219,7 +219,7 @@ import EfficientNGram
|
||||
let candidate: Candidate = Candidate(
|
||||
text: ruby,
|
||||
value: penalty,
|
||||
correspondingCount: inputData.input.count,
|
||||
composingCount: .inputCount(inputData.input.count),
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: data
|
||||
)
|
||||
@ -232,7 +232,7 @@ import EfficientNGram
|
||||
let candidate: Candidate = Candidate(
|
||||
text: word,
|
||||
value: value,
|
||||
correspondingCount: inputData.input.count,
|
||||
composingCount: .inputCount(inputData.input.count),
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: data
|
||||
)
|
||||
@ -251,7 +251,7 @@ import EfficientNGram
|
||||
let candidate: Candidate = Candidate(
|
||||
text: ruby,
|
||||
value: penalty,
|
||||
correspondingCount: inputData.input.count,
|
||||
composingCount: .inputCount(inputData.input.count),
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: data
|
||||
)
|
||||
@ -264,7 +264,7 @@ import EfficientNGram
|
||||
let candidate: Candidate = Candidate(
|
||||
text: word,
|
||||
value: value,
|
||||
correspondingCount: inputData.input.count,
|
||||
composingCount: .inputCount(inputData.input.count),
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: data
|
||||
)
|
||||
@ -368,7 +368,7 @@ import EfficientNGram
|
||||
private func getAdditionalCandidate(_ inputData: ComposingText, options: ConvertRequestOptions) -> [Candidate] {
|
||||
var candidates: [Candidate] = []
|
||||
let string = inputData.convertTarget.toKatakana()
|
||||
let correspondingCount = inputData.input.count
|
||||
let composingCount: ComposingCount = .inputCount(inputData.input.count)
|
||||
do {
|
||||
// カタカナ
|
||||
let value = -14 * getKatakanaScore(string)
|
||||
@ -376,7 +376,7 @@ import EfficientNGram
|
||||
let katakana = Candidate(
|
||||
text: string,
|
||||
value: value,
|
||||
correspondingCount: correspondingCount,
|
||||
composingCount: composingCount,
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [data]
|
||||
)
|
||||
@ -390,7 +390,7 @@ import EfficientNGram
|
||||
let hiragana = Candidate(
|
||||
text: hiraganaString,
|
||||
value: -14.5,
|
||||
correspondingCount: correspondingCount,
|
||||
composingCount: composingCount,
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [data]
|
||||
)
|
||||
@ -403,7 +403,7 @@ import EfficientNGram
|
||||
let uppercasedLetter = Candidate(
|
||||
text: word,
|
||||
value: -14.6,
|
||||
correspondingCount: correspondingCount,
|
||||
composingCount: composingCount,
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [data]
|
||||
)
|
||||
@ -416,7 +416,7 @@ import EfficientNGram
|
||||
let fullWidthLetter = Candidate(
|
||||
text: word,
|
||||
value: -14.7,
|
||||
correspondingCount: correspondingCount,
|
||||
composingCount: composingCount,
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [data]
|
||||
)
|
||||
@ -429,7 +429,7 @@ import EfficientNGram
|
||||
let halfWidthKatakana = Candidate(
|
||||
text: word,
|
||||
value: -15,
|
||||
correspondingCount: correspondingCount,
|
||||
composingCount: composingCount,
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [data]
|
||||
)
|
||||
@ -472,7 +472,7 @@ import EfficientNGram
|
||||
return Candidate(
|
||||
text: first.clause.text,
|
||||
value: first.value,
|
||||
correspondingCount: first.clause.inputRange.count,
|
||||
composingCount: first.clause.ranges.reduce(into: .inputCount(0)) { $0 = .composite($0, $1.count) },
|
||||
lastMid: first.clause.mid,
|
||||
data: Array(candidateData.data[0...count])
|
||||
)
|
||||
@ -529,21 +529,21 @@ import EfficientNGram
|
||||
var seenCandidate: Set<String> = full_candidate.mapSet {$0.text}
|
||||
// Pattern that converts only the clause (top 5).
|
||||
let clause_candidates = self.getUniqueCandidate(clauseCandidates, seenCandidates: seenCandidate).min(count: 5) {
|
||||
if $0.correspondingCount == $1.correspondingCount {
|
||||
if $0.rubyCount == $1.rubyCount {
|
||||
$0.value > $1.value
|
||||
} else {
|
||||
$0.correspondingCount > $1.correspondingCount
|
||||
$0.rubyCount > $1.rubyCount
|
||||
}
|
||||
}
|
||||
seenCandidate.formUnion(clause_candidates.map {$0.text})
|
||||
|
||||
// The first dictionary data.
|
||||
let dicCandidates: [Candidate] = result.lattice[inputIndex: 0]
|
||||
let dicCandidates: [Candidate] = result.lattice[index: .bothIndex(inputIndex: 0, surfaceIndex: 0)]
|
||||
.map {
|
||||
Candidate(
|
||||
text: $0.data.word,
|
||||
value: $0.data.value(),
|
||||
correspondingCount: $0.inputRange.count,
|
||||
composingCount: $0.range.count,
|
||||
lastMid: $0.data.mid,
|
||||
data: [$0.data]
|
||||
)
|
||||
@ -554,8 +554,8 @@ import EfficientNGram
|
||||
// Sort by string length, and within each length by descending score.
|
||||
var word_candidates: [Candidate] = self.getUniqueCandidate(dicCandidates.chained(additionalCandidates), seenCandidates: seenCandidate)
|
||||
.sorted {
|
||||
let count0 = $0.correspondingCount
|
||||
let count1 = $1.correspondingCount
|
||||
let count0 = $0.rubyCount
|
||||
let count1 = $1.rubyCount
|
||||
return count0 == count1 ? $0.value > $1.value : count0 > count1
|
||||
}
|
||||
seenCandidate.formUnion(word_candidates.map {$0.text})
|
||||
@ -589,13 +589,17 @@ import EfficientNGram
|
||||
item.parseTemplate()
|
||||
}
|
||||
// Pattern that converts only the clause (top 5).
|
||||
let firstClauseResults = self.getUniqueCandidate(clauseCandidates).min(count: 5) {
|
||||
if $0.correspondingCount == $1.correspondingCount {
|
||||
var firstClauseResults = self.getUniqueCandidate(clauseCandidates).min(count: 5) {
|
||||
if $0.rubyCount == $1.rubyCount {
|
||||
$0.value > $1.value
|
||||
} else {
|
||||
$0.correspondingCount > $1.correspondingCount
|
||||
$0.rubyCount > $1.rubyCount
|
||||
}
|
||||
}
|
||||
firstClauseResults.mutatingForEach { item in
|
||||
item.withActions(self.getAppropriateActions(item))
|
||||
item.parseTemplate()
|
||||
}
|
||||
return ConversionResult(mainResults: result, firstClauseResults: firstClauseResults)
|
||||
}
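Reviewer note: after this change the clause candidates are ranked by `rubyCount` (longer reading first) and then by `value` (higher score first), instead of by `correspondingCount`. The sketch below illustrates that ordering on made-up data. `ClauseCandidate` and `topClauseCandidates` are hypothetical names, and a plain `sorted` followed by `prefix` stands in for the `min(count:)` helper used in the diff, whose definition is not shown here.

```swift
// Hypothetical, reduced candidate type: only the fields the comparator looks at.
struct ClauseCandidate: Equatable {
    let text: String
    let rubyCount: Int
    let value: Double
}

// Orders by reading length (descending), breaking ties by score (descending),
// then keeps at most `limit` candidates.
func topClauseCandidates(_ candidates: [ClauseCandidate], limit: Int = 5) -> [ClauseCandidate] {
    let ordered = candidates.sorted { lhs, rhs in
        if lhs.rubyCount == rhs.rubyCount {
            return lhs.value > rhs.value
        }
        return lhs.rubyCount > rhs.rubyCount
    }
    return Array(ordered.prefix(limit))
}

// Made-up sample data.
let sample = [
    ClauseCandidate(text: "今日", rubyCount: 3, value: -8),
    ClauseCandidate(text: "今日は", rubyCount: 4, value: -12),
    ClauseCandidate(text: "京", rubyCount: 3, value: -10),
]
let top = topClauseCandidates(sample, limit: 2)
// top.map(\.text) == ["今日は", "今日"]
```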
|
||||
|
||||
@ -605,7 +609,7 @@ import EfficientNGram
|
||||
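/// - N_best: The number of candidates kept during computation. This differs from the number of candidates actually returned.
/// - Returns:
/// The resulting lattice node and the full set of computed nodes.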
|
||||
private func convertToLattice(_ inputData: ComposingText, N_best: Int, zenzaiMode: ConvertRequestOptions.ZenzaiMode) -> (result: LatticeNode, lattice: Lattice)? {
|
||||
private func convertToLattice(_ inputData: ComposingText, N_best: Int, zenzaiMode: ConvertRequestOptions.ZenzaiMode, needTypoCorrection: Bool) -> (result: LatticeNode, lattice: Lattice)? {
|
||||
if inputData.convertTarget.isEmpty {
|
||||
return nil
|
||||
}
|
||||
@ -625,11 +629,6 @@ import EfficientNGram
|
||||
self.previousInputData = inputData
|
||||
return (result, nodes)
|
||||
}
|
||||
#if os(iOS)
|
||||
let needTypoCorrection = true
|
||||
#else
|
||||
let needTypoCorrection = false
|
||||
#endif
|
||||
|
||||
guard let previousInputData else {
|
||||
debug("\(#function): 新規計算用の関数を呼びますA")
|
||||
@ -662,7 +661,7 @@ import EfficientNGram
|
||||
let diff = inputData.differenceSuffix(to: previousInputData)
|
||||
|
||||
debug("\(#function): 最後尾文字置換用の関数を呼びます、差分は\(diff)")
|
||||
let result = converter.kana2lattice_changed(inputData, N_best: N_best, counts: (diff.deleted, diff.addedCount), previousResult: (inputData: previousInputData, lattice: self.lattice), needTypoCorrection: needTypoCorrection)
|
||||
let result = converter.kana2lattice_changed(inputData, N_best: N_best, counts: diff, previousResult: (inputData: previousInputData, lattice: self.lattice), needTypoCorrection: needTypoCorrection)
|
||||
self.previousInputData = inputData
|
||||
return result
|
||||
}
|
||||
@ -698,7 +697,14 @@ import EfficientNGram
|
||||
// Notify DicdataStore of the request options.
|
||||
self.sendToDicdataStore(.setRequestOptions(options))
|
||||
|
||||
guard let result = self.convertToLattice(inputData, N_best: options.N_best, zenzaiMode: options.zenzaiMode) else {
|
||||
#if os(iOS)
|
||||
let needTypoCorrection = options.needTypoCorrection ?? true
|
||||
#else
|
||||
let needTypoCorrection = options.needTypoCorrection ?? false
|
||||
#endif
|
||||
|
||||
|
||||
guard let result = self.convertToLattice(inputData, N_best: options.N_best, zenzaiMode: options.zenzaiMode, needTypoCorrection: needTypoCorrection) else {
|
||||
return ConversionResult(mainResults: [], firstClauseResults: [])
|
||||
}
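Reviewer note: the hunk above turns the previously hard-coded platform switch into a default that `options.needTypoCorrection` can override. A minimal sketch of that resolution rule follows; `resolveTypoCorrection` is a hypothetical helper name used only for this illustration and does not exist in the diff.

```swift
// Hypothetical helper: an explicit option wins, otherwise fall back to the platform default.
func resolveTypoCorrection(_ override: Bool?) -> Bool {
    #if os(iOS)
    let platformDefault = true
    #else
    let platformDefault = false
    #endif
    return override ?? platformDefault
}

let forcedOn = resolveTypoCorrection(true)    // explicit value is used on every platform
let forcedOff = resolveTypoCorrection(false)  // explicit value is used on every platform
let fallback = resolveTypoCorrection(nil)     // depends on the platform the code is built for
```

Keeping the option as `Bool?` rather than `Bool` lets existing callers that never set it retain the old per-platform behavior.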
|
||||
|
||||
|
@ -21,7 +21,7 @@ extension KanaKanjiConverter {
|
||||
return result.map {[Candidate(
|
||||
text: $0,
|
||||
value: -15,
|
||||
correspondingCount: inputData.input.count,
|
||||
composingCount: .inputCount(inputData.input.count),
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [DicdataElement(word: $0, ruby: string, cid: CIDData.固有名詞.cid, mid: MIDData.一般.mid, value: -15)]
|
||||
)]} ?? []
|
||||
@ -116,7 +116,7 @@ extension KanaKanjiConverter {
|
||||
Candidate(
|
||||
text: $0,
|
||||
value: -18,
|
||||
correspondingCount: inputData.input.count,
|
||||
composingCount: .inputCount(inputData.input.count),
|
||||
lastMid: MIDData.年.mid,
|
||||
data: [DicdataElement(word: $0, ruby: string, cid: CIDData.一般名詞.cid, mid: MIDData.年.mid, value: -18)]
|
||||
)
|
||||
@ -125,7 +125,7 @@ extension KanaKanjiConverter {
|
||||
Candidate(
|
||||
text: $0,
|
||||
value: -19,
|
||||
correspondingCount: inputData.input.count,
|
||||
composingCount: .inputCount(inputData.input.count),
|
||||
lastMid: MIDData.年.mid,
|
||||
data: [DicdataElement(word: $0, ruby: string, cid: CIDData.一般名詞.cid, mid: MIDData.年.mid, value: -19)]
|
||||
)
|
||||
|
@ -38,7 +38,7 @@ extension KanaKanjiConverter {
|
||||
let candidate = Candidate(
|
||||
text: result,
|
||||
value: -10,
|
||||
correspondingCount: inputData.input.count,
|
||||
composingCount: .inputCount(inputData.input.count),
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [DicdataElement(word: result, ruby: ruby, cid: CIDData.固有名詞.cid, mid: MIDData.一般.mid, value: -10)]
|
||||
)
|
||||
|
@ -46,7 +46,7 @@ extension KanaKanjiConverter {
|
||||
Candidate(
|
||||
text: address,
|
||||
value: baseValue - PValue(i),
|
||||
correspondingCount: inputData.input.count,
|
||||
composingCount: .inputCount(inputData.input.count),
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [DicdataElement(word: address, ruby: string, cid: .zero, mid: MIDData.一般.mid, value: baseValue - PValue(i))]
|
||||
)
|
||||
|
@ -37,7 +37,7 @@ extension KanaKanjiConverter {
|
||||
Candidate(
|
||||
text: $0,
|
||||
value: -15,
|
||||
correspondingCount: inputData.input.count,
|
||||
composingCount: .inputCount(inputData.input.count),
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [DicdataElement(word: $0, ruby: string, cid: CIDData.固有名詞.cid, mid: MIDData.一般.mid, value: -15)]
|
||||
)
|
||||
|
@ -17,7 +17,7 @@ extension KanaKanjiConverter {
|
||||
let candidate = Candidate(
|
||||
text: timeExpression,
|
||||
value: -10,
|
||||
correspondingCount: numberString.count,
|
||||
composingCount: .surfaceCount(numberString.count),
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [DicdataElement(word: timeExpression, ruby: numberString, cid: CIDData.固有名詞.cid, mid: MIDData.一般.mid, value: -10)]
|
||||
)
|
||||
@ -31,7 +31,7 @@ extension KanaKanjiConverter {
|
||||
let candidate = Candidate(
|
||||
text: timeExpression,
|
||||
value: -10,
|
||||
correspondingCount: numberString.count,
|
||||
composingCount: .surfaceCount(numberString.count),
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [DicdataElement(word: timeExpression, ruby: numberString, cid: CIDData.固有名詞.cid, mid: MIDData.一般.mid, value: -10)]
|
||||
)
|
||||
|
@ -22,7 +22,7 @@ extension KanaKanjiConverter {
|
||||
Candidate(
|
||||
text: char,
|
||||
value: value0,
|
||||
correspondingCount: inputData.input.count,
|
||||
composingCount: .inputCount(inputData.input.count),
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [DicdataElement(word: char, ruby: string, cid: .zero, mid: MIDData.一般.mid, value: value0)]
|
||||
)
|
||||
|
@ -20,7 +20,7 @@ extension KanaKanjiConverter {
|
||||
return [Candidate(
|
||||
text: versionString,
|
||||
value: -30,
|
||||
correspondingCount: inputData.input.count,
|
||||
composingCount: .inputCount(inputData.input.count),
|
||||
lastMid: MIDData.一般.mid,
|
||||
data: [DicdataElement(word: versionString, ruby: inputData.convertTarget.toKatakana(), cid: CIDData.固有名詞.cid, mid: MIDData.一般.mid, value: -30)]
|
||||
)]
|
||||
|
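The hunks above replace `correspondingCount:` with `composingCount:`, using `.inputCount(...)` where the candidate is measured in `input` elements and `.surfaceCount(...)` where it is measured in `convertTarget` characters. A minimal sketch of how such a count could be modelled, assuming a recursive enum with a `.composite` case (as matched in `prefixComplete` later in this diff); the type below is a hypothetical stand-in, not the library's definition, and `leaves(of:)` is an illustrative helper:

```swift
// Hypothetical stand-in for the ComposingCount type referenced in the diff.
indirect enum ComposingCount: Equatable {
    case inputCount(Int)      // counted in elements of `input`
    case surfaceCount(Int)    // counted in characters of `convertTarget`
    case composite(ComposingCount, ComposingCount)
}

/// Flattens a count into its leaf counts, left side first.
/// This mirrors the order in which a recursive consumer would apply them.
func leaves(of count: ComposingCount) -> [ComposingCount] {
    switch count {
    case .inputCount, .surfaceCount:
        return [count]
    case .composite(let left, let right):
        return leaves(of: left) + leaves(of: right)
    }
}

let count = ComposingCount.composite(
    .inputCount(2),
    .composite(.surfaceCount(1), .inputCount(3))
)
let order = leaves(of: count)
```

Keeping the two units as separate cases, rather than converting everything to one unit up front, lets each consumer decide how to interpret the count against its own view of the text.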
@ -242,20 +242,93 @@ public final class DicdataStore {
|
||||
return [louds.searchNodeIndex(chars: charIDs)].compactMap {$0}
|
||||
}
|
||||
|
||||
private struct UnifiedGenerator {
|
||||
struct SurfaceGenerator {
|
||||
var surface: [Character] = []
|
||||
var range: TypoCorrectionGenerator.ProcessRange
|
||||
var currentIndex: Int
|
||||
|
||||
init(surface: [Character], range: TypoCorrectionGenerator.ProcessRange) {
|
||||
self.surface = surface
|
||||
self.range = range
|
||||
self.currentIndex = range.rightIndexRange.lowerBound
|
||||
}
|
||||
|
||||
mutating func setUnreachablePath<C: Collection<Character>>(target: C) where C.Indices == Range<Int> {
|
||||
if self.surface[self.range.leftIndex...].hasPrefix(target) {
|
||||
// Compute the new upper bound
|
||||
let currentLowerBound = self.range.rightIndexRange.lowerBound
|
||||
let currentUpperBound = self.range.rightIndexRange.upperBound
|
||||
let targetUpperBound = self.range.leftIndex + target.indices.upperBound
|
||||
self.range.rightIndexRange = min(currentLowerBound, targetUpperBound) ..< min(currentUpperBound, targetUpperBound)
|
||||
}
|
||||
}
|
||||
|
||||
mutating func next() -> ([Character], (endIndex: Lattice.LatticeIndex, penalty: PValue))? {
|
||||
if self.surface.indices.contains(self.currentIndex), self.currentIndex < self.range.rightIndexRange.upperBound {
|
||||
defer {
|
||||
self.currentIndex += 1
|
||||
}
|
||||
let characters = Array(self.surface[self.range.leftIndex ... self.currentIndex])
|
||||
return (characters, (.surface(self.currentIndex), 0))
|
||||
}
|
||||
return nil
|
||||
}
|
||||
}
|
||||
|
||||
var typoCorrectionGenerator: TypoCorrectionGenerator? = nil
|
||||
var surfaceGenerator: SurfaceGenerator? = nil
|
||||
|
||||
mutating func register(_ generator: TypoCorrectionGenerator) {
|
||||
self.typoCorrectionGenerator = generator
|
||||
}
|
||||
mutating func register(_ generator: SurfaceGenerator) {
|
||||
self.surfaceGenerator = generator
|
||||
}
|
||||
mutating func setUnreachablePath<C: Collection<Character>>(target: C) where C.Indices == Range<Int> {
|
||||
self.typoCorrectionGenerator?.setUnreachablePath(target: target)
|
||||
self.surfaceGenerator?.setUnreachablePath(target: target)
|
||||
}
|
||||
mutating func next() -> ([Character], (endIndex: Lattice.LatticeIndex, penalty: PValue))? {
|
||||
if let next = self.surfaceGenerator?.next() {
|
||||
return next
|
||||
}
|
||||
if let next = self.typoCorrectionGenerator?.next() {
|
||||
return next
|
||||
}
|
||||
return nil
|
||||
}
|
||||
}
|
||||
|
||||
func movingTowardPrefixSearch(
|
||||
inputs: [ComposingText.InputElement],
|
||||
leftIndex: Int,
|
||||
rightIndexRange: Range<Int>,
|
||||
composingText: ComposingText,
|
||||
inputProcessRange: TypoCorrectionGenerator.ProcessRange?,
|
||||
surfaceProcessRange: TypoCorrectionGenerator.ProcessRange?,
|
||||
useMemory: Bool,
|
||||
needTypoCorrection: Bool
|
||||
) -> (
|
||||
stringToInfo: [[Character]: (endIndex: Int, penalty: PValue)],
|
||||
stringToInfo: [[Character]: (endIndex: Lattice.LatticeIndex, penalty: PValue)],
|
||||
indices: [(key: String, indices: [Int])],
|
||||
temporaryMemoryDicdata: [DicdataElement]
|
||||
) {
|
||||
var generator = TypoCorrectionGenerator(inputs: inputs, leftIndex: leftIndex, rightIndexRange: rightIndexRange, needTypoCorrection: needTypoCorrection)
|
||||
var generator = UnifiedGenerator()
|
||||
if let surfaceProcessRange {
|
||||
let surfaceGenerator = UnifiedGenerator.SurfaceGenerator(
|
||||
surface: Array(composingText.convertTarget.toKatakana()),
|
||||
range: surfaceProcessRange
|
||||
)
|
||||
generator.register(surfaceGenerator)
|
||||
}
|
||||
if let inputProcessRange {
|
||||
let typoCorrectionGenerator = TypoCorrectionGenerator(
|
||||
inputs: composingText.input,
|
||||
range: inputProcessRange,
|
||||
needTypoCorrection: needTypoCorrection
|
||||
)
|
||||
generator.register(typoCorrectionGenerator)
|
||||
}
|
||||
var targetLOUDS: [String: LOUDS.MovingTowardPrefixSearchHelper] = [:]
|
||||
var stringToInfo: [([Character], (endIndex: Int, penalty: PValue))] = []
|
||||
var stringToInfo: [([Character], (endIndex: Lattice.LatticeIndex, penalty: PValue))] = []
|
||||
// Data retrieved from the dynamic dictionaries (temporary learning data, dynamic user dictionary)
|
||||
var dynamicDicdata: [Int: [DicdataElement]] = [:]
|
||||
// Iterate over the generator
|
||||
@ -332,8 +405,25 @@ public final class DicdataStore {
|
||||
}
|
||||
let minCount = stringToInfo.map {$0.0.count}.min() ?? 0
|
||||
return (
|
||||
Dictionary(stringToInfo, uniquingKeysWith: {$0.penalty < $1.penalty ? $1 : $0}),
|
||||
targetLOUDS.map { ($0.key, $0.value.indicesInDepth(depth: minCount - 1 ..< .max) )},
|
||||
Dictionary(
|
||||
stringToInfo,
|
||||
uniquingKeysWith: { (lhs, rhs) in
|
||||
if lhs.penalty < rhs.penalty {
|
||||
return lhs
|
||||
} else if lhs.penalty == rhs.penalty {
|
||||
return switch (lhs.endIndex, rhs.endIndex) {
|
||||
case (.input, .input), (.surface, .surface): lhs // either is fine
|
||||
case (.surface, .input): lhs // prefer the surface index
|
||||
case (.input, .surface): rhs // prefer the surface index
|
||||
}
|
||||
} else {
|
||||
return rhs
|
||||
}
|
||||
}
|
||||
),
|
||||
targetLOUDS.map {
|
||||
($0.key, $0.value.indicesInDepth(depth: minCount - 1 ..< .max))
|
||||
},
|
||||
dynamicDicdata.flatMap {
|
||||
minCount < $0.key + 1 ? $0.value : []
|
||||
}
|
||||
@ -375,30 +465,69 @@ public final class DicdataStore {
|
||||
}
|
||||
return data
|
||||
}
|
||||
|
||||
/// Referenced from kana2lattice.
|
||||
|
||||
/// Looks up dictionary data.
|
||||
/// - Parameters:
|
||||
/// - inputData: The input data
|
||||
/// - from: The start index
|
||||
/// - toIndexRange: Performs the dictionary lookup over the range `from ..< (toIndexRange)`.
|
||||
public func getLOUDSDataInRange(inputData: ComposingText, from fromIndex: Int, toIndexRange: Range<Int>? = nil, needTypoCorrection: Bool = true) -> [LatticeNode] {
|
||||
let toIndexLeft = toIndexRange?.startIndex ?? fromIndex
|
||||
let toIndexRight = min(toIndexRange?.endIndex ?? inputData.input.count, fromIndex + self.maxlength)
|
||||
if fromIndex > toIndexLeft || toIndexLeft >= toIndexRight {
|
||||
debug(#function, "index is wrong")
|
||||
/// - composingText: The current input state
|
||||
/// - inputRange: The range of `composingText.input` used for the lookup.
|
||||
/// - surfaceRange: The range of `composingText.convertTarget` used for the lookup.
|
||||
/// - needTypoCorrection: Whether to perform typo correction
|
||||
/// - Returns: The dictionary entries found, wrapped as `LatticeNode` instances.
|
||||
public func lookupDicdata(
|
||||
composingText: ComposingText,
|
||||
inputRange:(startIndex: Int, endIndexRange: Range<Int>?)? = nil,
|
||||
surfaceRange: (startIndex: Int, endIndexRange: Range<Int>?)? = nil,
|
||||
needTypoCorrection: Bool = true
|
||||
) -> [LatticeNode] {
|
||||
|
||||
let inputProcessRange: TypoCorrectionGenerator.ProcessRange?
|
||||
if let inputRange {
|
||||
let toInputIndexLeft = inputRange.endIndexRange?.startIndex ?? inputRange.startIndex
|
||||
let toInputIndexRight = min(
|
||||
inputRange.endIndexRange?.endIndex ?? composingText.input.count,
|
||||
inputRange.startIndex + self.maxlength
|
||||
)
|
||||
if inputRange.startIndex > toInputIndexLeft || toInputIndexLeft >= toInputIndexRight {
|
||||
debug(#function, "index is wrong", inputRange)
|
||||
return []
|
||||
}
|
||||
inputProcessRange = .init(leftIndex: inputRange.startIndex, rightIndexRange: toInputIndexLeft ..< toInputIndexRight)
|
||||
} else {
|
||||
inputProcessRange = nil
|
||||
}
|
||||
|
||||
let surfaceProcessRange: TypoCorrectionGenerator.ProcessRange?
|
||||
if let surfaceRange {
|
||||
let toSurfaceIndexLeft = surfaceRange.endIndexRange?.startIndex ?? surfaceRange.startIndex
|
||||
let toSurfaceIndexRight = min(
|
||||
surfaceRange.endIndexRange?.endIndex ?? composingText.convertTarget.count,
|
||||
surfaceRange.startIndex + self.maxlength
|
||||
)
|
||||
if surfaceRange.startIndex > toSurfaceIndexLeft || toSurfaceIndexLeft >= toSurfaceIndexRight {
|
||||
debug(#function, "index is wrong", surfaceRange)
|
||||
return []
|
||||
}
|
||||
surfaceProcessRange = .init(leftIndex: surfaceRange.startIndex, rightIndexRange: toSurfaceIndexLeft ..< toSurfaceIndexRight)
|
||||
} else {
|
||||
surfaceProcessRange = nil
|
||||
}
|
||||
if inputProcessRange == nil && surfaceProcessRange == nil {
|
||||
debug(#function, "either of inputProcessRange and surfaceProcessRange must not be nil")
|
||||
return []
|
||||
}
|
||||
|
||||
let segments = (fromIndex ..< toIndexRight).reduce(into: []) { (segments: inout [String], rightIndex: Int) in
|
||||
segments.append((segments.last ?? "") + String(inputData.input[rightIndex].character.toKatakana()))
|
||||
}
|
||||
// MARK: Enumerate the typo-correction targets. This is a very expensive step.
|
||||
var (stringToInfo, indices, dicdata) = self.movingTowardPrefixSearch(inputs: inputData.input, leftIndex: fromIndex, rightIndexRange: toIndexLeft ..< toIndexRight, useMemory: self.learningManager.enabled, needTypoCorrection: needTypoCorrection)
|
||||
var (stringToInfo, indices, dicdata) = self.movingTowardPrefixSearch(
|
||||
composingText: composingText,
|
||||
inputProcessRange: inputProcessRange,
|
||||
surfaceProcessRange: surfaceProcessRange,
|
||||
useMemory: self.learningManager.enabled,
|
||||
needTypoCorrection: needTypoCorrection
|
||||
)
|
||||
// MARK: Fetch the actual dictionary data from the indices obtained by the search
|
||||
for (identifier, value) in indices {
|
||||
let result: [DicdataElement] = self.getDicdataFromLoudstxt3(identifier: identifier, indices: value).compactMap { (data) -> DicdataElement? in
|
||||
let rubyArray = Array(data.ruby)
|
||||
let penalty = stringToInfo[rubyArray, default: (0, .zero)].penalty
|
||||
let penalty = stringToInfo[rubyArray]?.penalty ?? 0
|
||||
if penalty.isZero {
|
||||
return data
|
||||
}
|
||||
@ -413,34 +542,39 @@ public final class DicdataStore {
|
||||
dicdata.append(contentsOf: result)
|
||||
}
|
||||
|
||||
for i in toIndexLeft ..< toIndexRight {
|
||||
do {
|
||||
let result = self.getWiseDicdata(convertTarget: segments[i - fromIndex], inputData: inputData, inputRange: fromIndex ..< i + 1)
|
||||
// Generate some of the data mechanically
|
||||
if let surfaceProcessRange {
|
||||
let chars = Array(composingText.convertTarget.toKatakana())
|
||||
var segment = String(chars[surfaceProcessRange.leftIndex ..< surfaceProcessRange.rightIndexRange.lowerBound])
|
||||
for i in surfaceProcessRange.rightIndexRange {
|
||||
segment.append(String(chars[i]))
|
||||
let result = self.getWiseDicdata(
|
||||
convertTarget: segment,
|
||||
inputData: composingText,
|
||||
surfaceRange: surfaceProcessRange.leftIndex ..< i + 1
|
||||
)
|
||||
for item in result {
|
||||
stringToInfo[Array(item.ruby)] = (i, 0)
|
||||
stringToInfo[Array(item.ruby)] = (.surface(i), 0)
|
||||
}
|
||||
dicdata.append(contentsOf: result)
|
||||
}
|
||||
}
|
||||
if fromIndex == .zero {
|
||||
let result: [LatticeNode] = dicdata.compactMap {
|
||||
guard let endIndex = stringToInfo[Array($0.ruby)]?.endIndex else {
|
||||
return nil
|
||||
}
|
||||
let node = LatticeNode(data: $0, inputRange: fromIndex ..< endIndex + 1)
|
||||
let needBOS = inputRange?.startIndex == .zero || surfaceRange?.startIndex == .zero
|
||||
let result: [LatticeNode] = dicdata.compactMap {
|
||||
guard let endIndex = stringToInfo[Array($0.ruby)]?.endIndex else {
|
||||
return nil
|
||||
}
|
||||
let range: Lattice.LatticeRange = switch endIndex {
|
||||
case .input(let endIndex): .input(from: (inputRange?.startIndex)!, to: endIndex + 1)
|
||||
case .surface(let endIndex): .surface(from: (surfaceRange?.startIndex)!, to: endIndex + 1)
|
||||
}
|
||||
let node = LatticeNode(data: $0, range: range)
|
||||
if needBOS {
|
||||
node.prevs.append(RegisteredNode.BOSNode())
|
||||
return node
|
||||
}
|
||||
return result
|
||||
} else {
|
||||
let result: [LatticeNode] = dicdata.compactMap {
|
||||
guard let endIndex = stringToInfo[Array($0.ruby)]?.endIndex else {
|
||||
return nil
|
||||
}
|
||||
return LatticeNode(data: $0, inputRange: fromIndex ..< endIndex + 1)
|
||||
}
|
||||
return result
|
||||
return node
|
||||
}
|
||||
return result
|
||||
}
|
||||
|
||||
func getZeroHintPredictionDicdata(lastRcid: Int) -> [DicdataElement] {
|
||||
@ -510,35 +644,27 @@ public final class DicdataStore {
|
||||
/// - convertTarget: The string, already converted to katakana
|
||||
/// - note
|
||||
/// - Note that converting the entire input to katakana or hiragana is handled on the Converter side.
|
||||
func getWiseDicdata(convertTarget: String, inputData: ComposingText, inputRange: Range<Int>) -> [DicdataElement] {
|
||||
func getWiseDicdata(convertTarget: String, inputData: ComposingText, surfaceRange: Range<Int>) -> [DicdataElement] {
|
||||
print(#function, convertTarget, inputData, surfaceRange)
|
||||
var result: [DicdataElement] = []
|
||||
result.append(contentsOf: self.getJapaneseNumberDicdata(head: convertTarget))
|
||||
if inputData.input[..<inputRange.startIndex].last?.character.isNumber != true && inputData.input[inputRange.endIndex...].first?.character.isNumber != true, let number = Int(convertTarget) {
|
||||
if inputData.convertTarget.prefix(surfaceRange.lowerBound).last?.isNumber != true,
|
||||
inputData.convertTarget.dropFirst(surfaceRange.upperBound).first?.isNumber != true,
|
||||
let number = Int(convertTarget) {
|
||||
result.append(DicdataElement(ruby: convertTarget, cid: CIDData.数.cid, mid: MIDData.小さい数字.mid, value: -14))
|
||||
if Double(number) <= 1E12 && -1E12 <= Double(number), let kansuji = self.numberFormatter.string(from: NSNumber(value: number)) {
|
||||
result.append(DicdataElement(word: kansuji, ruby: convertTarget, cid: CIDData.数.cid, mid: MIDData.小さい数字.mid, value: -16))
|
||||
}
|
||||
}
|
||||
|
||||
// Add convertTarget as an English-word candidate
|
||||
if requestOptions.keyboardLanguage == .en_US && convertTarget.onlyRomanAlphabet {
|
||||
result.append(DicdataElement(ruby: convertTarget, cid: CIDData.固有名詞.cid, mid: MIDData.英単語.mid, value: -14))
|
||||
}
|
||||
|
||||
// For romaji input, also add candidates that convert the segment on its own to hiragana and katakana
|
||||
if requestOptions.keyboardLanguage != .en_US && inputData.input[inputRange].allSatisfy({$0.inputStyle == .roman2kana}) {
|
||||
let roman = String(inputData.input[inputRange].map(\.character))
|
||||
if let katakana = Roman2Kana.katakanaChanges[roman], let hiragana = Roman2Kana.hiraganaChanges[Array(roman)] {
|
||||
result.append(DicdataElement(word: String(hiragana), ruby: katakana, cid: CIDData.固有名詞.cid, mid: MIDData.一般.mid, value: -13))
|
||||
result.append(DicdataElement(ruby: katakana, cid: CIDData.固有名詞.cid, mid: MIDData.一般.mid, value: -14))
|
||||
}
|
||||
}
|
||||
|
||||
// Add candidates that convert the whole input to hiragana and katakana
|
||||
// When convertTarget is a single character, add its hiragana and katakana conversions as candidates
|
||||
if convertTarget.count == 1 {
|
||||
let katakana = convertTarget.toKatakana()
|
||||
let hiragana = convertTarget.toHiragana()
|
||||
if convertTarget == katakana && katakana == hiragana {
|
||||
if katakana == hiragana {
|
||||
// When the katakana and hiragana forms are identical (symbols, etc.)
|
||||
let element = DicdataElement(ruby: katakana, cid: CIDData.固有名詞.cid, mid: MIDData.一般.mid, value: -14)
|
||||
result.append(element)
|
||||
@ -550,7 +676,6 @@ public final class DicdataStore {
|
||||
result.append(katakanaElement)
|
||||
}
|
||||
}
|
||||
|
||||
// Symbol conversion
|
||||
if convertTarget.count == 1, let first = convertTarget.first {
|
||||
var value: PValue = -14
|
||||
|
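The `uniquingKeysWith` closure in the hunk above resolves duplicate readings by keeping the entry with the smaller penalty and, on a tie, preferring a surface-based end index over an input-based one. A minimal, self-contained sketch of that rule follows; `LatticeIndex`, `Info`, and `preferred` are hypothetical stand-ins introduced for illustration (the real code uses `Lattice.LatticeIndex` and `PValue`), and the sample data is made up:

```swift
// Hypothetical stand-in for Lattice.LatticeIndex.
enum LatticeIndex: Equatable {
    case input(Int)
    case surface(Int)
}

// Float is an assumption standing in for PValue.
typealias Info = (endIndex: LatticeIndex, penalty: Float)

/// Chooses between two entries that share the same key:
/// smaller penalty wins; on a tie, a surface index beats an input index.
func preferred(_ lhs: Info, _ rhs: Info) -> Info {
    if lhs.penalty < rhs.penalty { return lhs }
    if lhs.penalty > rhs.penalty { return rhs }
    switch (lhs.endIndex, rhs.endIndex) {
    case (.input, .surface):
        return rhs
    default:
        return lhs
    }
}

let pairs: [([Character], Info)] = [
    (Array("カ"), (endIndex: .input(0), penalty: 0)),
    (Array("カ"), (endIndex: .surface(0), penalty: 0)),
    (Array("キ"), (endIndex: .input(1), penalty: 3.5)),
    (Array("キ"), (endIndex: .input(2), penalty: 0)),
]
let merged = Dictionary(pairs, uniquingKeysWith: preferred)
```

Here the tie for "カ" resolves to the surface index, while "キ" keeps the zero-penalty entry regardless of the order in which the pairs arrive.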
@ -1,13 +1,12 @@
|
||||
import SwiftUtils
|
||||
|
||||
struct TypoCorrectionGenerator: Sendable {
|
||||
init(inputs: [ComposingText.InputElement], leftIndex left: Int, rightIndexRange: Range<Int>, needTypoCorrection: Bool) {
|
||||
init(inputs: [ComposingText.InputElement], range: ProcessRange, needTypoCorrection: Bool) {
|
||||
self.maxPenalty = needTypoCorrection ? 3.5 * 3 : 0
|
||||
self.inputs = inputs
|
||||
self.left = left
|
||||
self.rightIndexRange = rightIndexRange
|
||||
self.range = range
|
||||
|
||||
let count = rightIndexRange.endIndex - left
|
||||
let count = self.range.rightIndexRange.endIndex - range.leftIndex
|
||||
self.count = count
|
||||
self.nodes = (0..<count).map {(i: Int) in
|
||||
Self.lengths.flatMap {(k: Int) -> [TypoCandidate] in
|
||||
@ -15,7 +14,7 @@ struct TypoCorrectionGenerator: Sendable {
|
||||
if count <= j {
|
||||
return []
|
||||
}
|
||||
return Self.getTypo(inputs[left + i ... left + j], frozen: !needTypoCorrection)
|
||||
return Self.getTypo(inputs[range.leftIndex + i ... range.leftIndex + j], frozen: !needTypoCorrection)
|
||||
}
|
||||
}
|
||||
// Enumerate depth-first
|
||||
@ -23,7 +22,7 @@ struct TypoCorrectionGenerator: Sendable {
|
||||
guard let firstElement = typoCandidate.inputElements.first else {
|
||||
return nil
|
||||
}
|
||||
if ComposingText.isLeftSideValid(first: firstElement, of: inputs, from: left) {
|
||||
if ComposingText.isLeftSideValid(first: firstElement, of: inputs, from: range.leftIndex) {
|
||||
var convertTargetElements = [ComposingText.ConvertTargetElement]()
|
||||
for element in typoCandidate.inputElements {
|
||||
ComposingText.updateConvertTargetElements(currentElements: &convertTargetElements, newElement: element)
|
||||
@ -36,11 +35,15 @@ struct TypoCorrectionGenerator: Sendable {
|
||||
|
||||
let maxPenalty: PValue
|
||||
let inputs: [ComposingText.InputElement]
|
||||
let left: Int
|
||||
let rightIndexRange: Range<Int>
|
||||
let range: ProcessRange
|
||||
let nodes: [[TypoCandidate]]
|
||||
let count: Int
|
||||
|
||||
struct ProcessRange: Sendable, Equatable {
|
||||
var leftIndex: Int
|
||||
var rightIndexRange: Range<Int>
|
||||
}
|
||||
|
||||
var stack: [(convertTargetElements: [ComposingText.ConvertTargetElement], lastElement: ComposingText.InputElement, count: Int, penalty: PValue)]
|
||||
|
||||
/// Signals that paths starting with `target` are unreachable
|
||||
@ -75,12 +78,12 @@ struct TypoCorrectionGenerator: Sendable {
|
||||
}
|
||||
}
|
||||
|
||||
mutating func next() -> ([Character], (endIndex: Int, penalty: PValue))? {
|
||||
mutating func next() -> ([Character], (endIndex: Lattice.LatticeIndex, penalty: PValue))? {
|
||||
while let (convertTargetElements, lastElement, count, penalty) = self.stack.popLast() {
|
||||
var result: ([Character], (endIndex: Int, penalty: PValue))? = nil
|
||||
if rightIndexRange.contains(count + left - 1) {
|
||||
if let convertTarget = ComposingText.getConvertTargetIfRightSideIsValid(lastElement: lastElement, of: inputs, to: count + left, convertTargetElements: convertTargetElements)?.map({$0.toKatakana()}) {
|
||||
result = (convertTarget, (count + left - 1, penalty))
|
||||
var result: ([Character], (endIndex: Lattice.LatticeIndex, penalty: PValue))? = nil
|
||||
if self.range.rightIndexRange.contains(count + self.range.leftIndex - 1) {
|
||||
if let convertTarget = ComposingText.getConvertTargetIfRightSideIsValid(lastElement: lastElement, of: inputs, to: count + self.range.leftIndex, convertTargetElements: convertTargetElements)?.map({$0.toKatakana()}) {
|
||||
result = (convertTarget, (.input(count + self.range.leftIndex - 1), penalty))
|
||||
}
|
||||
}
|
||||
// Early exit
|
||||
@ -94,7 +97,7 @@ struct TypoCorrectionGenerator: Sendable {
|
||||
// Upper limit on the number of corrections (3)
|
||||
if penalty >= maxPenalty {
|
||||
var convertTargetElements = convertTargetElements
|
||||
let correct = [inputs[left + count]].map {ComposingText.InputElement(character: $0.character.toKatakana(), inputStyle: $0.inputStyle)}
|
||||
let correct = [inputs[self.range.leftIndex + count]].map {ComposingText.InputElement(character: $0.character.toKatakana(), inputStyle: $0.inputStyle)}
|
||||
if count + correct.count > self.nodes.endIndex {
|
||||
if let result {
|
||||
return result
|
||||
|
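`ProcessRange` pairs a fixed left index with a range of allowed right indices, and the `SurfaceGenerator` added in `DicdataStore` walks that range to emit ever-longer prefixes of the surface string. The sketch below reproduces that enumeration in isolation; the types are simplified, hypothetical stand-ins (no penalty, a plain `Int` end index instead of `Lattice.LatticeIndex`), so it illustrates the idea rather than the library's implementation:

```swift
// Simplified stand-in for TypoCorrectionGenerator.ProcessRange.
struct ProcessRange {
    var leftIndex: Int
    var rightIndexRange: Range<Int>
}

// Hypothetical, reduced version of the surface-based generator.
struct PrefixGenerator {
    let surface: [Character]
    let range: ProcessRange
    var currentIndex: Int

    init(surface: [Character], range: ProcessRange) {
        self.surface = surface
        self.range = range
        self.currentIndex = range.rightIndexRange.lowerBound
    }

    /// Returns surface[leftIndex ... currentIndex] and its end index, then advances.
    mutating func next() -> (chars: [Character], endIndex: Int)? {
        guard surface.indices.contains(currentIndex),
              currentIndex < range.rightIndexRange.upperBound else {
            return nil
        }
        defer { currentIndex += 1 }
        return (Array(surface[range.leftIndex ... currentIndex]), currentIndex)
    }
}

var generator = PrefixGenerator(
    surface: Array("キョウハ"),
    range: ProcessRange(leftIndex: 0, rightIndexRange: 1 ..< 3)
)
var produced: [String] = []
while let item = generator.next() {
    produced.append(String(item.chars))
}
```

With `rightIndexRange` set to `1 ..< 3`, only the prefixes ending at indices 1 and 2 are produced; narrowing the range is how unreachable paths can be cut off.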
@ -213,31 +213,6 @@ public struct ComposingText: Sendable {
|
||||
return (oldString.count - common.count, String(newString.dropFirst(common.count)))
|
||||
}
|
||||
|
||||
/// Handles special cases when updating `input`.
|
||||
/// TODO: This is an ad hoc workaround and should be generalized somehow.
|
||||
private mutating func updateInput(_ string: String, at inputCursorPosition: Int, inputStyle: InputStyle) {
|
||||
if inputCursorPosition == 0 {
|
||||
self.input.insert(contentsOf: string.map {InputElement(character: $0, inputStyle: inputStyle)}, at: inputCursorPosition)
|
||||
return
|
||||
}
|
||||
let prev = self.input[inputCursorPosition - 1]
|
||||
if inputStyle == .roman2kana && prev.inputStyle == inputStyle, let first = string.first, String(first).onlyRomanAlphabet {
|
||||
if prev.character == first && !["a", "i", "u", "e", "o", "n"].contains(first) {
|
||||
self.input[inputCursorPosition - 1] = InputElement(character: "っ", inputStyle: .direct)
|
||||
self.input.insert(contentsOf: string.map {InputElement(character: $0, inputStyle: inputStyle)}, at: inputCursorPosition)
|
||||
return
|
||||
}
|
||||
let n_prefix = self.input[0 ..< inputCursorPosition].suffix {$0.character == "n" && $0.inputStyle == .roman2kana}
|
||||
if n_prefix.count % 2 == 1 && !["n", "a", "i", "u", "e", "o", "y"].contains(first)
|
||||
&& self.input.dropLast(n_prefix.count).last != .init(character: "x", inputStyle: .roman2kana) {
|
||||
self.input[inputCursorPosition - 1] = InputElement(character: "ん", inputStyle: .direct)
|
||||
self.input.insert(contentsOf: string.map {InputElement(character: $0, inputStyle: inputStyle)}, at: inputCursorPosition)
|
||||
return
|
||||
}
|
||||
}
|
||||
self.input.insert(contentsOf: string.map {InputElement(character: $0, inputStyle: inputStyle)}, at: inputCursorPosition)
|
||||
}
|
||||
|
||||
/// Inserts characters at the current cursor position.
|
||||
public mutating func insertAtCursorPosition(_ string: String, inputStyle: InputStyle) {
|
||||
if string.isEmpty {
|
||||
@ -246,7 +221,7 @@ public struct ComposingText: Sendable {
|
||||
let inputCursorPosition = self.forceGetInputCursorPosition(target: self.convertTarget.prefix(convertTargetCursorPosition))
|
||||
// Update three things: input, convertTarget, and convertTargetCursorPosition
|
||||
// Update input
|
||||
self.updateInput(string, at: inputCursorPosition, inputStyle: inputStyle)
|
||||
self.input.insert(contentsOf: string.map {InputElement(character: $0, inputStyle: inputStyle)}, at: inputCursorPosition)
|
||||
|
||||
let oldConvertTarget = self.convertTarget.prefix(self.convertTargetCursorPosition)
|
||||
let newConvertTarget = Self.getConvertTarget(for: self.input.prefix(inputCursorPosition + string.count))
|
||||
@ -341,18 +316,37 @@ public struct ComposingText: Sendable {
|
||||
/// Commits the leading part of the text.
|
||||
/// - parameters:
|
||||
/// - correspondingCount: The corresponding number of characters in `input`
|
||||
public mutating func prefixComplete(correspondingCount: Int) {
|
||||
let correspondingCount = min(correspondingCount, self.input.count)
|
||||
self.input.removeFirst(correspondingCount)
|
||||
// Update convertTarget
|
||||
let newConvertTarget = Self.getConvertTarget(for: self.input)
|
||||
// Move the cursor back by the number of characters removed
|
||||
let cursorDelta = self.convertTarget.count - newConvertTarget.count
|
||||
self.convertTarget = newConvertTarget
|
||||
self.convertTargetCursorPosition -= cursorDelta
|
||||
// If the cursor ends up at the left edge, move it to the end of the text
|
||||
if self.convertTargetCursorPosition == 0 {
|
||||
self.convertTargetCursorPosition = self.convertTarget.count
|
||||
public mutating func prefixComplete(composingCount: ComposingCount) {
|
||||
switch composingCount {
|
||||
case .inputCount(let correspondingCount):
|
||||
let correspondingCount = min(correspondingCount, self.input.count)
|
||||
self.input.removeFirst(correspondingCount)
|
||||
// Update convertTarget
|
||||
let newConvertTarget = Self.getConvertTarget(for: self.input)
|
||||
// Move the cursor back by the number of characters removed
|
||||
let cursorDelta = self.convertTarget.count - newConvertTarget.count
|
||||
self.convertTarget = newConvertTarget
|
||||
self.convertTargetCursorPosition -= cursorDelta
|
||||
// If the cursor ends up at the left edge, move it to the end of the text
|
||||
if self.convertTargetCursorPosition == 0 {
|
||||
self.convertTargetCursorPosition = self.convertTarget.count
|
||||
}
|
||||
case .surfaceCount(let correspondingCount):
|
||||
// Equivalent to deleting the first correspondingCount characters
|
||||
// Move the cursor
|
||||
let prefix = self.convertTarget.prefix(correspondingCount)
|
||||
let index = self.forceGetInputCursorPosition(target: prefix)
|
||||
self.input = Array(self.input[index...])
|
||||
self.convertTarget = String(self.convertTarget.dropFirst(correspondingCount))
|
||||
self.convertTargetCursorPosition -= correspondingCount
|
||||
// If the cursor ends up at the left edge, move it to the end of the text
|
||||
if self.convertTargetCursorPosition == 0 {
|
||||
self.convertTargetCursorPosition = self.convertTarget.count
|
||||
}
|
||||
|
||||
case .composite(let left, let right):
|
||||
self.prefixComplete(composingCount: left)
|
||||
self.prefixComplete(composingCount: right)
|
||||
}
|
||||
}
|
||||
|
||||
@ -365,6 +359,40 @@ public struct ComposingText: Sendable {
|
||||
return text
|
||||
}
|
||||
|
||||
public func inputIndexToSurfaceIndexMap() -> [Int: Int] {
|
||||
// i2c: map from input index to convert target index
|
||||
// c2i: map from convert target index to input index
|
||||
|
||||
// Example 1.
|
||||
// [k, y, o, u, h, a, i, i, t, e, n, k, i, d, a]
|
||||
// [き, ょ, う, は, い, い, て, ん, き, だ]
|
||||
// i2c: [0: 0, 3: 2(きょ), 4: 3(う), 6: 4(は), 7: 5(い), 8: 6(い), 10: 7(て), 13: 9(んき), 15: 10(だ)]
|
||||
|
||||
var map: [Int: (surfaceIndex: Int, surface: String)] = [0: (0, "")]
|
||||
|
||||
// Buffer for incremental updates
|
||||
var convertTargetElements: [ConvertTargetElement] = []
|
||||
|
||||
for (idx, element) in self.input.enumerated() {
|
||||
// Append the element and update the surface string
|
||||
Self.updateConvertTargetElements(currentElements: &convertTargetElements, newElement: element)
|
||||
// Recompute the length on the surface side
|
||||
let currentSurface = convertTargetElements.reduce(into: "") { $0 += $1.string }
|
||||
// Right after the element at index idx has been processed (i.e. before the next one is processed),
|
||||
// the cursor position is idx + 1
|
||||
map[idx + 1] = (currentSurface.count, currentSurface)
|
||||
}
|
||||
// Keep only the entries whose surface is a prefix of the final surface
|
||||
let finalSurface = convertTargetElements.reduce(into: "") { $0 += $1.string }
|
||||
return map
|
||||
.filter {
|
||||
finalSurface.hasPrefix($0.value.surface)
|
||||
}
|
||||
.mapValues {
|
||||
$0.surfaceIndex
|
||||
}
|
||||
}
|
||||
|
||||
public mutating func stopComposition() {
|
||||
self.input = []
|
||||
self.convertTarget = ""
|
||||
@ -580,17 +608,20 @@ extension ComposingText.ConvertTargetElement: Equatable {}
|
||||
extension ComposingText {
|
||||
/// Compares two `ComposingText` values and computes the difference.
|
||||
/// To stay consistent with `convertTarget`, the comparison is aligned to `convertTarget`.
|
||||
func differenceSuffix(to previousData: ComposingText) -> (deleted: Int, addedCount: Int) {
|
||||
func differenceSuffix(to previousData: ComposingText) -> (deletedInput: Int, addedInput: Int, deletedSurface: Int, addedSurface: Int) {
|
||||
// In cases such as k→か or sh→しゃ, the whole difference falls within x ... last, so the diff computation works without problems
|
||||
// In cases such as かn → かんs, segments like 「かんs、んs、s」 appear, but 「かん」 cannot be generated
|
||||
// This is essentially a policy issue, with the same root cause as 「はし」 not appearing as a partial conversion of 「は|しゃ」.
|
||||
// To resolve it, 「ん」 should be handled as direct at the input stage.
|
||||
|
||||
// Compute the difference
|
||||
let common = self.input.commonPrefix(with: previousData.input)
|
||||
let deleted = previousData.input.count - common.count
|
||||
let added = self.input.dropFirst(common.count).count
|
||||
return (deleted, added)
|
||||
|
||||
let commonSurface = self.convertTarget.commonPrefix(with: previousData.convertTarget)
|
||||
let deletedSurface = previousData.convertTarget.count - commonSurface.count
|
||||
let addedSurface = self.convertTarget.count - commonSurface.count
|
||||
return (deleted, added, deletedSurface, addedSurface)
|
||||
}
|
||||
|
||||
func inputHasSuffix(inputOf suffix: ComposingText) -> Bool {
|
||||
|
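`differenceSuffix(to:)` now reports the change twice: once over `input` and once over `convertTarget`. The two can disagree, because a single added input element may rewrite earlier surface characters. A minimal sketch of the common-prefix computation, applied to both views; `suffixDifference` is a hypothetical helper, and the "k" → "か" surface change is a simplified assumption about romaji conversion rather than output taken from the library:

```swift
/// Counts how many trailing elements were removed from `old` and added in `new`,
/// relative to their longest common prefix.
func suffixDifference<C: Collection>(from old: C, to new: C) -> (deleted: Int, added: Int)
where C.Element: Equatable {
    var common = 0
    for (a, b) in zip(old, new) {
        if a != b { break }
        common += 1
    }
    return (old.count - common, new.count - common)
}

// Input view: the user typed "k", then "a".
let inputDiff = suffixDifference(from: Array("k"), to: Array("ka"))

// Surface view (assumed conversion): "k" is replaced by "か".
let surfaceDiff = suffixDifference(from: "k", to: "か")
```

On the input side nothing is deleted and one element is added, whereas on the surface side one character is deleted and one is added, which is why a lattice indexed by surface position needs its own pair of counts.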
@ -4,6 +4,7 @@ public import Foundation
|
||||
public extension ConvertRequestOptions {
|
||||
static func withDefaultDictionary(
|
||||
N_best: Int = 10,
|
||||
needTypoCorrection: Bool? = nil,
|
||||
requireJapanesePrediction: Bool,
|
||||
requireEnglishPrediction: Bool,
|
||||
keyboardLanguage: KeyboardLanguage,
|
||||
@ -29,13 +30,26 @@ public extension ConvertRequestOptions {
|
||||
#else
|
||||
let dictionaryDirectory = Bundle.module.resourceURL!.appendingPathComponent("Dictionary", isDirectory: true)
|
||||
#endif
|
||||
|
||||
var specialCandidateProviders = [any SpecialCandidateProvider]()
|
||||
if typographyLetterCandidate {
|
||||
specialCandidateProviders.append(.typography)
|
||||
}
|
||||
if unicodeCandidate {
|
||||
specialCandidateProviders.append(.unicode)
|
||||
}
|
||||
specialCandidateProviders.append(.emailAddress)
|
||||
specialCandidateProviders.append(.timeExpression)
|
||||
specialCandidateProviders.append(.calendar)
|
||||
specialCandidateProviders.append(.version)
|
||||
specialCandidateProviders.append(.commaSeparatedNumber)
|
||||
|
||||
return Self(
|
||||
N_best: N_best,
|
||||
needTypoCorrection: needTypoCorrection,
|
||||
requireJapanesePrediction: requireJapanesePrediction,
|
||||
requireEnglishPrediction: requireEnglishPrediction,
|
||||
keyboardLanguage: keyboardLanguage,
|
||||
typographyLetterCandidate: typographyLetterCandidate,
|
||||
unicodeCandidate: unicodeCandidate,
|
||||
englishCandidateInRoman2KanaInput: englishCandidateInRoman2KanaInput,
|
||||
fullWidthRomanCandidate: fullWidthRomanCandidate,
|
||||
halfWidthKanaCandidate: halfWidthKanaCandidate,
|
||||
@ -44,8 +58,9 @@ public extension ConvertRequestOptions {
|
||||
shouldResetMemory: shouldResetMemory,
|
||||
dictionaryResourceURL: dictionaryDirectory,
|
||||
memoryDirectoryURL: memoryDirectoryURL,
|
||||
sharedContainerURL: sharedContainerURL,
|
||||
sharedContainerURL: sharedContainerURL,
|
||||
textReplacer: textReplacer,
|
||||
specialCandidateProviders: specialCandidateProviders,
|
||||
zenzaiMode: zenzaiMode,
|
||||
preloadDictionary: preloadDictionary,
|
||||
metadata: metadata
|
||||
|
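The way the provider list is assembled in this hunk (two providers gated by flags, the rest appended unconditionally) can be sketched as follows. `ProviderSketch` and `makeProviders` are hypothetical names used only for illustration; the real code uses values conforming to `SpecialCandidateProvider`.

```swift
/// Hypothetical stand-in for the library's provider values.
enum ProviderSketch: Equatable {
    case typography, unicode, emailAddress, timeExpression, calendar, version, commaSeparatedNumber
}

/// Optional providers follow their flags; the remaining ones are always appended.
func makeProviders(typographyLetterCandidate: Bool, unicodeCandidate: Bool) -> [ProviderSketch] {
    var providers: [ProviderSketch] = []
    if typographyLetterCandidate {
        providers.append(.typography)
    }
    if unicodeCandidate {
        providers.append(.unicode)
    }
    providers += [.emailAddress, .timeExpression, .calendar, .version, .commaSeparatedNumber]
    return providers
}

let providers = makeProviders(typographyLetterCandidate: false, unicodeCandidate: true)
print(providers.count)
```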
@ -14,19 +14,19 @@ final class ClauseDataUnitTests: XCTestCase {
|
||||
do {
|
||||
let unit1 = ClauseDataUnit()
|
||||
unit1.text = "僕が"
|
||||
unit1.inputRange = 0 ..< 3
|
||||
unit1.ranges = [.input(from: 0, to: 3)]
|
||||
unit1.mid = 0
|
||||
unit1.nextLcid = 0
|
||||
|
||||
let unit2 = ClauseDataUnit()
|
||||
unit2.text = "走る"
|
||||
unit2.inputRange = 3 ..< 6
|
||||
unit2.ranges = [.input(from: 3, to: 6)]
|
||||
unit2.mid = 1
|
||||
unit2.nextLcid = 1
|
||||
|
||||
unit1.merge(with: unit2)
|
||||
XCTAssertEqual(unit1.text, "僕が走る")
|
||||
XCTAssertEqual(unit1.inputRange, 0 ..< 6)
|
||||
XCTAssertEqual(unit1.ranges, [.input(from: 0, to: 3), .input(from: 3, to: 6)])
|
||||
XCTAssertEqual(unit1.nextLcid, 1)
|
||||
XCTAssertEqual(unit1.mid, 0)
|
||||
}
|
||||
@ -34,19 +34,19 @@ final class ClauseDataUnitTests: XCTestCase {
|
||||
do {
|
||||
let unit1 = ClauseDataUnit()
|
||||
unit1.text = "君は"
|
||||
unit1.inputRange = 0 ..< 3
|
||||
unit1.ranges = [.input(from: 0, to: 3)]
|
||||
unit1.mid = 0
|
||||
unit1.nextLcid = 0
|
||||
|
||||
let unit2 = ClauseDataUnit()
|
||||
unit2.text = "笑った"
|
||||
unit2.inputRange = 3 ..< 7
|
||||
unit2.ranges = [.input(from: 3, to: 7)]
|
||||
unit2.mid = 3
|
||||
unit2.nextLcid = 3
|
||||
|
||||
unit1.merge(with: unit2)
|
||||
XCTAssertEqual(unit1.text, "君は笑った")
|
||||
XCTAssertEqual(unit1.inputRange, 0 ..< 7)
|
||||
XCTAssertEqual(unit1.ranges, [.input(from: 0, to: 3), .input(from: 3, to: 7)])
|
||||
XCTAssertEqual(unit1.nextLcid, 3)
|
||||
XCTAssertEqual(unit1.mid, 0)
|
||||
}
|
||||
|
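The merge behaviour these assertions expect can be sketched with a small stand-in type. The semantics below are inferred from the tests only: text and ranges are concatenated, `nextLcid` comes from the following unit, and `mid` stays unchanged. `ClauseSketch` and `InputRangeSketch` are hypothetical names, not the library's types.

```swift
/// Hypothetical stand-in for a lattice range expressed in input indices.
struct InputRangeSketch: Equatable {
    var from: Int
    var to: Int
}

/// Hypothetical stand-in for a clause unit; only the fields the tests touch.
final class ClauseSketch {
    var text = ""
    var ranges: [InputRangeSketch] = []
    var mid = 0
    var nextLcid = 0

    /// Concatenate text and ranges, take `nextLcid` from `next`, keep `mid`.
    func merge(with next: ClauseSketch) {
        text += next.text
        ranges += next.ranges
        nextLcid = next.nextLcid
    }
}

let unit1 = ClauseSketch()
unit1.text = "君は"
unit1.ranges = [InputRangeSketch(from: 0, to: 3)]

let unit2 = ClauseSketch()
unit2.text = "笑った"
unit2.ranges = [InputRangeSketch(from: 3, to: 7)]
unit2.mid = 3
unit2.nextLcid = 3

unit1.merge(with: unit2)
print(unit1.text, unit1.ranges.count)
```

Keeping a list of ranges rather than a single merged range means each constituent keeps its own boundaries, which a single `0 ..< 7` would lose.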
@ -75,7 +75,7 @@ final class ComposingTextTests: XCTestCase {
|
||||
sequentialInput(&c, sequence: "itte", inputStyle: .roman2kana)
|
||||
XCTAssertEqual(c.input, [
|
||||
ComposingText.InputElement(character: "i", inputStyle: .roman2kana),
|
||||
ComposingText.InputElement(character: "っ", inputStyle: .direct),
|
||||
ComposingText.InputElement(character: "t", inputStyle: .roman2kana),
|
||||
ComposingText.InputElement(character: "t", inputStyle: .roman2kana),
|
||||
ComposingText.InputElement(character: "e", inputStyle: .roman2kana)
|
||||
])
|
||||
@ -88,7 +88,7 @@ final class ComposingTextTests: XCTestCase {
|
||||
sequentialInput(&c, sequence: "anta", inputStyle: .roman2kana)
|
||||
XCTAssertEqual(c.input, [
|
||||
ComposingText.InputElement(character: "a", inputStyle: .roman2kana),
|
||||
ComposingText.InputElement(character: "ん", inputStyle: .direct),
|
||||
ComposingText.InputElement(character: "n", inputStyle: .roman2kana),
|
||||
ComposingText.InputElement(character: "t", inputStyle: .roman2kana),
|
||||
ComposingText.InputElement(character: "a", inputStyle: .roman2kana)
|
||||
])
|
||||
@ -202,8 +202,8 @@ final class ComposingTextTests: XCTestCase {
|
||||
var c2 = ComposingText()
|
||||
c2.insertAtCursorPosition("hasiru", inputStyle: .roman2kana)
|
||||
|
||||
XCTAssertEqual(c2.differenceSuffix(to: c1).deleted, 0)
|
||||
XCTAssertEqual(c2.differenceSuffix(to: c1).addedCount, 1)
|
||||
XCTAssertEqual(c2.differenceSuffix(to: c1).deletedInput, 0)
|
||||
XCTAssertEqual(c2.differenceSuffix(to: c1).addedInput, 1)
|
||||
}
|
||||
do {
|
||||
var c1 = ComposingText()
|
||||
@ -212,8 +212,47 @@ final class ComposingTextTests: XCTestCase {
|
||||
var c2 = ComposingText()
|
||||
c2.insertAtCursorPosition("tukatte", inputStyle: .roman2kana)
|
||||
|
||||
XCTAssertEqual(c2.differenceSuffix(to: c1).deleted, 0)
|
||||
XCTAssertEqual(c2.differenceSuffix(to: c1).addedCount, 1)
|
||||
XCTAssertEqual(c2.differenceSuffix(to: c1).deletedInput, 0)
|
||||
XCTAssertEqual(c2.differenceSuffix(to: c1).addedInput, 1)
|
||||
}
|
||||
}
|
||||
|
||||
func testIndexMap() throws {
|
||||
do {
|
||||
var c = ComposingText()
|
||||
sequentialInput(&c, sequence: "kyouhaiitenkida", inputStyle: .roman2kana)
|
||||
let map = c.inputIndexToSurfaceIndexMap()
|
||||
|
||||
XCTAssertEqual(map[0], 0) // ""
|
||||
XCTAssertEqual(map[1], nil) // k
|
||||
XCTAssertEqual(map[2], nil) // y
|
||||
XCTAssertEqual(map[3], 2) // o
|
||||
XCTAssertEqual(map[4], 3) // u
|
||||
XCTAssertEqual(map[5], nil) // h
|
||||
XCTAssertEqual(map[6], 4) // a
|
||||
XCTAssertEqual(map[7], 5) // i
|
||||
XCTAssertEqual(map[8], 6) // i
|
||||
XCTAssertEqual(map[9], nil) // t
|
||||
XCTAssertEqual(map[10], 7) // e
|
||||
XCTAssertEqual(map[11], nil) // n
|
||||
XCTAssertEqual(map[12], nil) // k
|
||||
XCTAssertEqual(map[13], 9) // i
|
||||
XCTAssertEqual(map[14], nil) // d
|
||||
XCTAssertEqual(map[15], 10) // a
|
||||
}
|
||||
do {
|
||||
var c = ComposingText()
|
||||
sequentialInput(&c, sequence: "sakujoshori", inputStyle: .roman2kana)
|
||||
let map = c.inputIndexToSurfaceIndexMap()
|
||||
let reversedMap = (0 ..< c.convertTarget.count + 1).compactMap {
|
||||
if map.values.contains($0) {
|
||||
String(c.convertTarget.prefix($0))
|
||||
} else {
|
||||
nil
|
||||
}
|
||||
}
|
||||
XCTAssertFalse(reversedMap.contains("さくじ"))
|
||||
XCTAssertFalse(reversedMap.contains("さくじょし"))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
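The expectations in `testIndexMap` are consistent with a map that has an entry only where an input boundary coincides with a surface boundary. The sketch below illustrates that idea under a strong assumption: the segmentation of the input into chunks is given by hand rather than derived from a romaji table. `Chunk` and `indexMap(for:)` are hypothetical names and do not reflect how `inputIndexToSurfaceIndexMap()` is implemented.

```swift
/// Hypothetical stand-in: one unit of input together with the surface text it produced.
struct Chunk {
    let input: String
    let surface: String
}

/// Maps "number of input characters consumed" to "number of surface characters produced".
/// Positions inside a chunk get no entry, so looking them up yields nil.
func indexMap(for chunks: [Chunk]) -> [Int: Int] {
    var map: [Int: Int] = [0: 0]
    var inputCount = 0
    var surfaceCount = 0
    for chunk in chunks {
        inputCount += chunk.input.count
        surfaceCount += chunk.surface.count
        map[inputCount] = surfaceCount
    }
    return map
}

// Hand-written segmentation of "kyouhaiitenkida" (assumed, for illustration only).
let chunks = [
    Chunk(input: "kyo", surface: "きょ"),
    Chunk(input: "u", surface: "う"),
    Chunk(input: "ha", surface: "は"),
    Chunk(input: "i", surface: "い"),
    Chunk(input: "i", surface: "い"),
    Chunk(input: "te", surface: "て"),
    Chunk(input: "nki", surface: "んき"),
    Chunk(input: "da", surface: "だ")
]
let map = indexMap(for: chunks)
print(map.count)
```

Treating `nki` as one chunk mirrors the test, where neither `n` nor the following `k` has an entry.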
@ -16,7 +16,7 @@ final class CandidateTests: XCTestCase {
|
||||
let candidate = Candidate(
|
||||
text: text,
|
||||
value: -40,
|
||||
correspondingCount: 4,
|
||||
composingCount: .inputCount(4),
|
||||
lastMid: 5,
|
||||
data: [DicdataElement(word: text, ruby: "サイコロ", cid: 0, mid: 5, value: -40)]
|
||||
)
|
||||
@ -27,7 +27,7 @@ final class CandidateTests: XCTestCase {
|
||||
print(candidate2.text)
|
||||
XCTAssertTrue(Set((1...3).map(String.init)).contains(candidate2.text))
|
||||
XCTAssertEqual(candidate.value, candidate2.value)
|
||||
XCTAssertEqual(candidate.correspondingCount, candidate2.correspondingCount)
|
||||
XCTAssertEqual(candidate.composingCount, candidate2.composingCount)
|
||||
XCTAssertEqual(candidate.lastMid, candidate2.lastMid)
|
||||
XCTAssertEqual(candidate.data, candidate2.data)
|
||||
XCTAssertEqual(candidate.actions, candidate2.actions)
|
||||
@ -38,7 +38,7 @@ final class CandidateTests: XCTestCase {
|
||||
let candidate = Candidate(
|
||||
text: text,
|
||||
value: 0,
|
||||
correspondingCount: 0,
|
||||
composingCount: .inputCount(0),
|
||||
lastMid: 0,
|
||||
data: [DicdataElement(word: text, ruby: "", cid: 0, mid: 0, value: 0)]
|
||||
)
|
||||
|
@ -0,0 +1,41 @@
|
||||
import Foundation
|
||||
@testable import KanaKanjiConverterModule
|
||||
import XCTest
|
||||
|
||||
final class TemplateConversionTests: XCTestCase {
|
||||
func requestOptions() -> ConvertRequestOptions {
|
||||
.default
|
||||
}
|
||||
|
||||
func testTemplateConversion() async throws {
|
||||
let converter = await KanaKanjiConverter()
|
||||
let template = #"<date format="yyyy年MM月dd日" type="western" language="ja_JP" delta="0" deltaunit="1">"#
|
||||
await converter.sendToDicdataStore(.importDynamicUserDict([
|
||||
.init(word: template, ruby: "キョウ", cid: CIDData.一般名詞.cid, mid: MIDData.一般.mid, value: 5)
|
||||
]))
|
||||
let formatter = DateFormatter()
|
||||
formatter.dateFormat = "yyyy年MM月dd日"
|
||||
formatter.calendar = Calendar(identifier: .gregorian)
|
||||
let todayString = formatter.string(from: Date())
|
||||
|
||||
do {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("きょう", inputStyle: .direct)
|
||||
let results = await converter.requestCandidates(c, options: requestOptions())
|
||||
XCTAssertTrue(results.mainResults.contains(where: { $0.text == todayString} ))
|
||||
XCTAssertFalse(results.mainResults.contains(where: { $0.text == template} ))
|
||||
XCTAssertFalse(results.firstClauseResults.contains(where: { $0.text == template} ))
|
||||
await converter.stopComposition()
|
||||
}
|
||||
|
||||
do {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("kyou", inputStyle: .roman2kana)
|
||||
let results = await converter.requestCandidates(c, options: requestOptions())
|
||||
XCTAssertTrue(results.mainResults.contains(where: { $0.text == todayString} ))
|
||||
XCTAssertFalse(results.mainResults.contains(where: { $0.text == template} ))
|
||||
XCTAssertFalse(results.firstClauseResults.contains(where: { $0.text == template} ))
|
||||
await converter.stopComposition()
|
||||
}
|
||||
}
|
||||
}
|
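The expected string in this test is produced with a `DateFormatter` using the same pattern as the template's `format` attribute. A minimal, reproducible sketch of that step follows; the fixed locale and the explicit `timeZone` parameter are additions made here so the output does not depend on the machine, and `formatJapaneseDate` is a hypothetical helper name.

```swift
import Foundation

/// Formats a date with the pattern used in the test above.
/// The fixed locale and explicit time zone are assumptions added for reproducibility.
func formatJapaneseDate(_ date: Date, timeZone: TimeZone) -> String {
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.calendar = Calendar(identifier: .gregorian)
    formatter.timeZone = timeZone
    // Non-ASCII characters such as 年, 月 and 日 are treated as literals in the pattern.
    formatter.dateFormat = "yyyy年MM月dd日"
    return formatter.string(from: date)
}

let utc = TimeZone(secondsFromGMT: 0)!
let epoch = formatJapaneseDate(Date(timeIntervalSince1970: 0), timeZone: utc)
print(epoch)
```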
@ -88,7 +88,7 @@ final class LearningMemoryTests: XCTestCase {
|
||||
Candidate(
|
||||
text: element.word,
|
||||
value: element.value(),
|
||||
correspondingCount: 3,
|
||||
composingCount: .inputCount(3),
|
||||
lastMid: element.mid,
|
||||
data: [element]
|
||||
)
|
||||
@ -128,7 +128,7 @@ final class LearningMemoryTests: XCTestCase {
|
||||
Candidate(
|
||||
text: element.word,
|
||||
value: element.value(),
|
||||
correspondingCount: 3,
|
||||
composingCount: .inputCount(3),
|
||||
lastMid: element.mid,
|
||||
data: [element]
|
||||
)
|
||||
|
@ -12,16 +12,16 @@ import XCTest
|
||||
final class RegisteredNodeTests: XCTestCase {
|
||||
func testBOSNode() throws {
|
||||
let bos = RegisteredNode.BOSNode()
|
||||
XCTAssertEqual(bos.inputRange, 0..<0)
|
||||
XCTAssertEqual(bos.range, Lattice.LatticeRange.zero)
|
||||
XCTAssertNil(bos.prev)
|
||||
XCTAssertEqual(bos.totalValue, 0)
|
||||
XCTAssertEqual(bos.data.rcid, CIDData.BOS.cid)
|
||||
}
|
||||
|
||||
func testFromLastCandidate() throws {
|
||||
let candidate = Candidate(text: "我輩は猫", value: -20, correspondingCount: 7, lastMid: 100, data: [DicdataElement(word: "我輩は猫", ruby: "ワガハイハネコ", cid: CIDData.一般名詞.cid, mid: 100, value: -20)])
|
||||
let candidate = Candidate(text: "我輩は猫", value: -20, composingCount: .inputCount(7), lastMid: 100, data: [DicdataElement(word: "我輩は猫", ruby: "ワガハイハネコ", cid: CIDData.一般名詞.cid, mid: 100, value: -20)])
|
||||
let bos = RegisteredNode.fromLastCandidate(candidate)
|
||||
XCTAssertEqual(bos.inputRange, 0..<0)
|
||||
XCTAssertEqual(bos.range, Lattice.LatticeRange.zero)
|
||||
XCTAssertNil(bos.prev)
|
||||
XCTAssertEqual(bos.totalValue, 0)
|
||||
XCTAssertEqual(bos.data.rcid, CIDData.一般名詞.cid)
|
||||
@ -34,37 +34,37 @@ final class RegisteredNodeTests: XCTestCase {
|
||||
data: DicdataElement(word: "我輩", ruby: "ワガハイ", cid: CIDData.一般名詞.cid, mid: 1, value: -5),
|
||||
registered: bos,
|
||||
totalValue: -10,
|
||||
inputRange: 0..<4
|
||||
range: .input(from: 0, to: 4)
|
||||
)
|
||||
let node2 = RegisteredNode(
|
||||
data: DicdataElement(word: "は", ruby: "ハ", cid: CIDData.係助詞ハ.cid, mid: 2, value: -2),
|
||||
registered: node1,
|
||||
totalValue: -13,
|
||||
inputRange: 4..<5
|
||||
range: .input(from: 4, to: 5)
|
||||
)
|
||||
let node3 = RegisteredNode(
|
||||
data: DicdataElement(word: "猫", ruby: "ネコ", cid: CIDData.一般名詞.cid, mid: 3, value: -4),
|
||||
registered: node2,
|
||||
totalValue: -20,
|
||||
inputRange: 5..<7
|
||||
range: .input(from: 5, to: 7)
|
||||
)
|
||||
let node4 = RegisteredNode(
|
||||
data: DicdataElement(word: "です", ruby: "デス", cid: CIDData.助動詞デス基本形.cid, mid: 4, value: -3),
|
||||
registered: node3,
|
||||
totalValue: -25,
|
||||
inputRange: 7..<9
|
||||
range: .input(from: 7, to: 9)
|
||||
)
|
||||
let result = node4.getCandidateData()
|
||||
let clause1 = ClauseDataUnit()
|
||||
clause1.text = "我輩は"
|
||||
clause1.nextLcid = CIDData.一般名詞.cid
|
||||
clause1.inputRange = 0..<5
|
||||
clause1.ranges = [.input(from: 0, to: 0), .input(from: 0, to: 4), .input(from: 4, to: 5)] // (0, 0) is a dummy entry for BOS
|
||||
clause1.mid = 1
|
||||
|
||||
let clause2 = ClauseDataUnit()
|
||||
clause2.text = "猫です"
|
||||
clause2.nextLcid = CIDData.EOS.cid
|
||||
clause2.inputRange = 5..<9
|
||||
clause2.ranges = [.input(from: 5, to: 7), .input(from: 7, to: 9)]
|
||||
clause2.mid = 3
|
||||
|
||||
let expectedResult: CandidateData = CandidateData(
|
||||
|
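The change from `inputRange: 0..<4` to `range: .input(from: 0, to: 4)` suggests a range type that records which index space it belongs to. The following is a sketch under assumptions, not the definition of `Lattice.LatticeRange`: the `.surface` case, the definition of `zero` as an empty input range, and the `length` property are all hypothetical.

```swift
/// Hypothetical stand-in for a range that records which index space it uses.
enum RangeSketch: Equatable {
    case input(from: Int, to: Int)
    case surface(from: Int, to: Int)  // assumption: a convertTarget-based counterpart

    /// Assumption: the zero range is an empty input range.
    static let zero = RangeSketch.input(from: 0, to: 0)

    /// Number of indices covered, regardless of index space.
    var length: Int {
        switch self {
        case let .input(from, to), let .surface(from, to):
            return to - from
        }
    }
}

// Ranges of the first clause in the test: the BOS dummy, then the two word ranges.
let clauseRanges: [RangeSketch] = [.zero, .input(from: 0, to: 4), .input(from: 4, to: 5)]
let totalLength = clauseRanges.reduce(0) { $0 + $1.length }
print(totalLength)
```

With the index space in the type, a range over input elements never compares equal to a range over surface characters, even when the numbers match.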
@ -7,7 +7,7 @@
|
||||
//
|
||||
|
||||
import Foundation
|
||||
import KanaKanjiConverterModuleWithDefaultDictionary
|
||||
@testable import KanaKanjiConverterModuleWithDefaultDictionary
|
||||
import XCTest
|
||||
|
||||
final class ConverterTests: XCTestCase {
|
||||
@ -17,9 +17,10 @@ final class ConverterTests: XCTestCase {
|
||||
}
|
||||
}
|
||||
|
||||
func requestOptions() -> ConvertRequestOptions {
|
||||
func requestOptions(needTypoCorrection: Bool = false) -> ConvertRequestOptions {
|
||||
.withDefaultDictionary(
|
||||
N_best: 10,
|
||||
needTypoCorrection: needTypoCorrection,
|
||||
requireJapanesePrediction: false,
|
||||
requireEnglishPrediction: false,
|
||||
keyboardLanguage: .ja_JP,
|
||||
@ -56,19 +57,21 @@ final class ConverterTests: XCTestCase {
|
||||
}
|
||||
|
||||
func testRoman2KanaFullConversion() async throws {
|
||||
do {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("azuーkiーhasinjidainokiーboーdoapuridesu", inputStyle: .roman2kana)
|
||||
let results = await converter.requestCandidates(c, options: requestOptions())
|
||||
XCTAssertEqual(results.mainResults.first?.text, "azooKeyは新時代のキーボードアプリです")
|
||||
}
|
||||
do {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("youshoukikaratenisusuieiyakyuushourinjikenpounadosamazamanasupoーtuwokeikennsinagarasodatishougakkouzidaiharosanzerusukinkounitaizaisiteorigoruhuyatenisuwonaratteita", inputStyle: .roman2kana)
|
||||
let results = await converter.requestCandidates(c, options: requestOptions())
|
||||
XCTAssertEqual(results.mainResults.first?.text, "幼少期からテニス水泳野球少林寺拳法など様々なスポーツを経験しながら育ち小学校時代はロサンゼルス近郊に滞在しておりゴルフやテニスを習っていた")
|
||||
for needTypoCorrection in [true, false] {
|
||||
do {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("azuーkiーhasinjidainokiーboーdoapuridesu", inputStyle: .roman2kana)
|
||||
let results = await converter.requestCandidates(c, options: requestOptions(needTypoCorrection: needTypoCorrection))
|
||||
XCTAssertEqual(results.mainResults.first?.text, "azooKeyは新時代のキーボードアプリです")
|
||||
}
|
||||
do {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("youshoukikaratenisusuieiyakyuushourinjikenpounadosamazamanasupoーtuwokeikennsinagarasodatishougakkouzidaiharosanzerusukinkounitaizaisiteorigoruhuyatenisuwonaratteita", inputStyle: .roman2kana)
|
||||
let results = await converter.requestCandidates(c, options: requestOptions(needTypoCorrection: needTypoCorrection))
|
||||
XCTAssertEqual(results.mainResults.first?.text, "幼少期からテニス水泳野球少林寺拳法など様々なスポーツを経験しながら育ち小学校時代はロサンゼルス近郊に滞在しておりゴルフやテニスを習っていた")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -128,6 +131,53 @@ final class ConverterTests: XCTestCase {
|
||||
}
|
||||
}
|
||||
}
|
||||
// memo: there was an issue where results such as single-kanji conversion were not obtained in this case
|
||||
func testKimiAndThenDelete() async throws {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
let text = "kimi"
|
||||
// acceptable conversion results
|
||||
let possibles = [
|
||||
"君",
|
||||
"気味",
|
||||
"黄身"
|
||||
]
|
||||
for char in text {
|
||||
c.insertAtCursorPosition(String(char), inputStyle: .roman2kana)
|
||||
let results = await converter.requestCandidates(c, options: requestOptions())
|
||||
if c.input.count == text.count {
|
||||
XCTAssertTrue(possibles.contains(results.mainResults.first!.text))
|
||||
}
|
||||
}
|
||||
// delete one character
|
||||
c.deleteBackwardFromCursorPosition(count: 1)
|
||||
let results = await converter.requestCandidates(c, options: requestOptions())
|
||||
XCTAssertTrue(results.mainResults.contains { $0.text == "黄" })
|
||||
}
|
||||
|
||||
// memo: this case sometimes triggered a fatalError
|
||||
func testIttaAndThenDelete() async throws {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
let text = "itta"
|
||||
// acceptable conversion results
|
||||
let possibles = [
|
||||
"いった",
|
||||
"行った",
|
||||
"言った"
|
||||
]
|
||||
for char in text {
|
||||
c.insertAtCursorPosition(String(char), inputStyle: .roman2kana)
|
||||
let results = await converter.requestCandidates(c, options: requestOptions())
|
||||
if c.input.count == text.count {
|
||||
XCTAssertTrue(possibles.contains(results.mainResults.first!.text))
|
||||
}
|
||||
}
|
||||
// delete one character
|
||||
c.deleteBackwardFromCursorPosition(count: 1)
|
||||
let results = await converter.requestCandidates(c, options: requestOptions())
|
||||
XCTAssertTrue(results.mainResults.contains { $0.text == "言っ" })
|
||||
}
|
||||
|
||||
// input one character at a time, occasionally deleting
|
||||
// memo: in terms of the internal implementation, this is intended as a test of deleted_last_n
|
||||
@ -171,72 +221,103 @@ final class ConverterTests: XCTestCase {
|
||||
|
||||
// test cases that must always be converted correctly
|
||||
func testMustCases() async throws {
|
||||
// direct input
|
||||
do {
|
||||
let cases: [(input: String, expect: String)] = [
|
||||
("つかっている", "使っている"),
|
||||
("しんだどうぶつ", "死んだ動物"),
|
||||
("けいさん", "計算"),
|
||||
("azooKeyをつかう", "azooKeyを使う"),
|
||||
("じどうAIそうじゅう。", "自動AI操縦。"),
|
||||
("1234567890123456789012", "1234567890123456789012")
|
||||
]
|
||||
// direct input
|
||||
do {
|
||||
let cases: [(input: String, expect: String)] = [
|
||||
("つかっている", "使っている"),
|
||||
("しんだどうぶつ", "死んだ動物"),
|
||||
("けいさん", "計算"),
|
||||
("azooKeyをつかう", "azooKeyを使う"),
|
||||
("じどうAIそうじゅう。", "自動AI操縦。"),
|
||||
("1234567890123456789012", "1234567890123456789012")
|
||||
]
|
||||
|
||||
// full input
|
||||
var options = requestOptions()
|
||||
options.requireJapanesePrediction = false
|
||||
for (input, expect) in cases {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
sequentialInput(&c, sequence: input, inputStyle: .direct)
|
||||
// full input
|
||||
var options = requestOptions()
|
||||
options.requireJapanesePrediction = false
|
||||
for (input, expect) in cases {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
sequentialInput(&c, sequence: input, inputStyle: .direct)
|
||||
let results = await converter.requestCandidates(c, options: options)
|
||||
XCTAssertEqual(results.mainResults.first?.text, expect)
|
||||
}
|
||||
// gradual input
|
||||
for (input, expect) in cases {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
for char in input {
|
||||
c.insertAtCursorPosition(String(char), inputStyle: .direct)
|
||||
let results = await converter.requestCandidates(c, options: options)
|
||||
XCTAssertEqual(results.mainResults.first?.text, expect)
|
||||
}
|
||||
// gradual input
|
||||
for (input, expect) in cases {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
for char in input {
|
||||
c.insertAtCursorPosition(String(char), inputStyle: .direct)
|
||||
let results = await converter.requestCandidates(c, options: options)
|
||||
if c.input.count == input.count {
|
||||
XCTAssertEqual(results.mainResults.first?.text, expect)
|
||||
}
|
||||
if c.input.count == input.count {
|
||||
XCTAssertEqual(results.mainResults.first?.text, expect)
|
||||
}
|
||||
}
|
||||
}
|
||||
// romaji input
|
||||
do {
|
||||
let cases: [(input: String, expect: String)] = [
|
||||
("tukatteiru", "使っている"),
|
||||
("sindadoubutu", "死んだ動物"),
|
||||
("keisann", "計算")
|
||||
]
|
||||
}
|
||||
// romaji input
|
||||
do {
|
||||
let cases: [(input: String, expect: String)] = [
|
||||
("tukatteiru", "使っている"),
|
||||
("sindadoubutu", "死んだ動物"),
|
||||
("keisann", "計算")
|
||||
]
|
||||
|
||||
// full input
|
||||
var options = requestOptions()
|
||||
options.requireJapanesePrediction = false
|
||||
for (input, expect) in cases {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
sequentialInput(&c, sequence: input, inputStyle: .roman2kana)
|
||||
// full input
|
||||
var options = requestOptions()
|
||||
options.requireJapanesePrediction = false
|
||||
for (input, expect) in cases {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
sequentialInput(&c, sequence: input, inputStyle: .roman2kana)
|
||||
let results = await converter.requestCandidates(c, options: options)
|
||||
XCTAssertEqual(results.mainResults.first?.text, expect)
|
||||
}
|
||||
|
||||
// gradual input
|
||||
for (input, expect) in cases {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
for char in input {
|
||||
c.insertAtCursorPosition(String(char), inputStyle: .roman2kana)
|
||||
let results = await converter.requestCandidates(c, options: options)
|
||||
XCTAssertEqual(results.mainResults.first?.text, expect)
|
||||
}
|
||||
|
||||
// gradual input
|
||||
for (input, expect) in cases {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
for char in input {
|
||||
c.insertAtCursorPosition(String(char), inputStyle: .roman2kana)
|
||||
let results = await converter.requestCandidates(c, options: options)
|
||||
if c.input.count == input.count {
|
||||
XCTAssertEqual(results.mainResults.first?.text, expect)
|
||||
}
|
||||
if c.input.count == input.count {
|
||||
XCTAssertEqual(results.mainResults.first?.text, expect)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
// with typo correction
|
||||
do {
|
||||
let cases: [(input: String, expect: String)] = [
|
||||
("たいかくせい", "大学生"),
|
||||
("きみのことかすき", "君のことが好き"),
|
||||
("おへんとうをもつていく", "お弁当を持っていく"),
|
||||
]
|
||||
|
||||
// full input
|
||||
var options = requestOptions(needTypoCorrection: true)
|
||||
options.requireJapanesePrediction = false
|
||||
for (input, expect) in cases {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
sequentialInput(&c, sequence: input, inputStyle: .direct)
|
||||
let results = await converter.requestCandidates(c, options: options)
|
||||
XCTAssertEqual(results.mainResults.first?.text, expect)
|
||||
}
|
||||
// gradual input
|
||||
for (input, expect) in cases {
|
||||
let converter = await KanaKanjiConverter()
|
||||
var c = ComposingText()
|
||||
for char in input {
|
||||
c.insertAtCursorPosition(String(char), inputStyle: .direct)
|
||||
let results = await converter.requestCandidates(c, options: options)
|
||||
if c.input.count == input.count {
|
||||
XCTAssertEqual(results.mainResults.first?.text, expect)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
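// holds a large number of test cases whose conversion result is relatively unambiguous, and requires a certain proportion of them to be correct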
|
||||
|
@ -129,7 +129,7 @@ final class DicdataStoreTests: XCTestCase {
|
||||
for (key, word) in mustWords {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition(key, inputStyle: .direct)
|
||||
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: false)
|
||||
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: false)
|
||||
// This is verbose, but it is written this way because it makes clear which item caused the failure.
|
||||
XCTAssertEqual(result.first(where: {$0.data.word == word})?.data.word, word)
|
||||
}
|
||||
@ -150,7 +150,7 @@ final class DicdataStoreTests: XCTestCase {
|
||||
for (key, word) in mustWords {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition(key, inputStyle: .direct)
|
||||
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: false)
|
||||
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: false)
|
||||
XCTAssertNil(result.first(where: {$0.data.word == word && $0.data.ruby == key}))
|
||||
}
|
||||
}
|
||||
@ -170,17 +170,17 @@ final class DicdataStoreTests: XCTestCase {
|
||||
for (key, word) in mustWords {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition(key, inputStyle: .direct)
|
||||
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: true)
|
||||
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: true)
|
||||
XCTAssertEqual(result.first(where: {$0.data.word == word})?.data.word, word)
|
||||
}
|
||||
}
|
||||
|
||||
func testGetLOUDSDataInRange() throws {
|
||||
func testLookupDicdata() throws {
|
||||
let dicdataStore = DicdataStore(convertRequestOptions: requestOptions())
|
||||
do {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("ヘンカン", inputStyle: .roman2kana)
|
||||
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: 2..<4)
|
||||
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, 2 ..< 4))
|
||||
XCTAssertFalse(result.contains(where: {$0.data.word == "変"}))
|
||||
XCTAssertTrue(result.contains(where: {$0.data.word == "変化"}))
|
||||
XCTAssertTrue(result.contains(where: {$0.data.word == "変換"}))
|
||||
@ -188,7 +188,7 @@ final class DicdataStoreTests: XCTestCase {
|
||||
do {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("ヘンカン", inputStyle: .roman2kana)
|
||||
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: 0..<4)
|
||||
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, 0..<4))
|
||||
XCTAssertTrue(result.contains(where: {$0.data.word == "変"}))
|
||||
XCTAssertTrue(result.contains(where: {$0.data.word == "変化"}))
|
||||
XCTAssertTrue(result.contains(where: {$0.data.word == "変換"}))
|
||||
@ -196,19 +196,19 @@ final class DicdataStoreTests: XCTestCase {
|
||||
do {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("ツカッ", inputStyle: .roman2kana)
|
||||
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: 2..<3)
|
||||
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, 2..<3))
|
||||
XCTAssertTrue(result.contains(where: {$0.data.word == "使っ"}))
|
||||
}
|
||||
do {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("ツカッt", inputStyle: .roman2kana)
|
||||
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: 2..<4)
|
||||
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, 2..<4))
|
||||
XCTAssertTrue(result.contains(where: {$0.data.word == "使っ"}))
|
||||
}
|
||||
do {
|
||||
var c = ComposingText()
|
||||
sequentialInput(&c, sequence: "tukatt", inputStyle: .roman2kana)
|
||||
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: 4..<6)
|
||||
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, 4..<6))
|
||||
XCTAssertTrue(result.contains(where: {$0.data.word == "使っ"}))
|
||||
}
|
||||
}
|
||||
@ -218,7 +218,7 @@ final class DicdataStoreTests: XCTestCase {
|
||||
do {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("999999999999", inputStyle: .roman2kana)
|
||||
let result = dicdataStore.getWiseDicdata(convertTarget: c.convertTarget, inputData: c, inputRange: 0..<12)
|
||||
let result = dicdataStore.getWiseDicdata(convertTarget: c.convertTarget, inputData: c, surfaceRange: 0..<12)
|
||||
XCTAssertTrue(result.contains(where: {$0.word == "999999999999"}))
|
||||
XCTAssertTrue(result.contains(where: {$0.word == "九千九百九十九億九千九百九十九万九千九百九十九"}))
|
||||
}
|
||||
@ -255,7 +255,7 @@ final class DicdataStoreTests: XCTestCase {
|
||||
do {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("テストタンゴ", inputStyle: .direct)
|
||||
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: false)
|
||||
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: false)
|
||||
XCTAssertTrue(result.contains(where: {$0.data.word == "テスト単語"}))
|
||||
}
|
||||
|
||||
@ -263,7 +263,7 @@ final class DicdataStoreTests: XCTestCase {
|
||||
do {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("ドウテキジショ", inputStyle: .direct)
|
||||
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: false)
|
||||
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: false)
|
||||
XCTAssertTrue(result.contains(where: {$0.data.word == "動的辞書"}))
|
||||
}
|
||||
|
||||
@ -288,16 +288,16 @@ final class DicdataStoreTests: XCTestCase {
|
||||
do {
|
||||
var c = ComposingText()
|
||||
sequentialInput(&c, sequence: "tesutowaーdo", inputStyle: .roman2kana)
|
||||
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: false)
|
||||
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: false)
|
||||
XCTAssertTrue(result.contains(where: {$0.data.word == "テストワード"}))
|
||||
XCTAssertEqual(result.first(where: {$0.data.word == "テストワード"})?.inputRange, 0 ..< 11)
|
||||
XCTAssertEqual(result.first(where: {$0.data.word == "テストワード"})?.range, .input(from: 0, to: 11))
|
||||
}
|
||||
|
||||
// test that words in the dynamic user dictionary take priority over the regular dictionary
|
||||
do {
|
||||
var c = ComposingText()
|
||||
c.insertAtCursorPosition("トクシュヨミ", inputStyle: .direct)
|
||||
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: false)
|
||||
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: false)
|
||||
let dynamicUserDictResult = result.first(where: {$0.data.word == "特殊読み"})
|
||||
XCTAssertNotNil(dynamicUserDictResult)
|
||||
XCTAssertEqual(dynamicUserDictResult?.data.metadata, .isFromUserDictionary)
|
||||
|