Merge pull request #223 from azooKey/feat/surface_based_indexing

feat(breaking): Change Lattice operations so that they can work with dual indices, one based on convertTarget and one based on input
Miwa
2025-07-20 19:32:01 -07:00
committed by GitHub
38 changed files with 1141 additions and 432 deletions

View File

@ -1,6 +1,6 @@
# Conversion Algorithms
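This document gives a rough overview of the complex implementations used inside azooKey.
This document gives a rough overview of the complex implementations used inside AzooKeyKanaKanjiConverter.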
## Kana-Kanji Conversion
@ -10,9 +10,9 @@ azooKey内部で用いられている複雑な実装を大まかに説明しま
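A distinctive feature of the algorithm is that, after splitting the text into clauses (bunsetsu), it computes an additional cost that could be called a "content-word bigram". With this cost, a candidate scores higher when words that tend to co-occur appear together, and lower when words that rarely co-occur appear together.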
## Input Management
## 入力管理 (Input Management)
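Input management looks simple but is a very complex problem. In azooKey it is managed inside `ComposingText`.
Input management is the mechanism that keeps the history of the user's key input and applies conversions such as romaji-to-kana accordingly. It looks simple but is a very complex problem. In AzooKeyKanaKanjiConverter it is managed mainly inside `ComposingText`.
A typical edge case is the operation of switching to the English keyboard in the middle of romaji input, typing Latin letters, then returning to the Japanese keyboard and continuing to type. In other words, the following two cases must be distinguishable.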
@ -26,7 +26,7 @@ azooKey内部で用いられている複雑な実装を大まかに説明しま
Input a (Japanese) // → kあ
```
The `ComposingText` of azooKey has the following structure. Holding `input` in this way is how it deals with this problem.
The `ComposingText` of AzooKeyKanaKanjiConverter has the following structure. Holding `input` in this way is how it deals with this problem.
```swift
struct ComposingText {
@ -76,7 +76,7 @@ ComposingText(
1. じゅあ
1. Give up and exit the editing state
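Option 1 is the most intuitive, and azooKey takes this approach. In this case, `input` has to be modified. So in azooKey, when "u" is typed in romaji input, `ComposingText` changes as follows.
Option 1 is the most intuitive, and AzooKeyKanaKanjiConverter takes this approach. In this case, `input` has to be modified. In AzooKeyKanaKanjiConverter, when "u" is typed in romaji input, `ComposingText` changes as follows.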
```swift
ComposingText(
@ -90,7 +90,7 @@ ComposingText(
)
```
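The standard iOS romaji input, on the other hand, chooses option 2. This is in a sense a clean approach: a unit that was entered "at once" during romaji input is treated as inviolable, which eliminates the change described above. If azooKey took this approach, the state would change as follows. However, this behavior is also unintuitive.
The standard iOS romaji input, on the other hand, chooses option 2. This is in a sense a clean approach: a unit that was entered "at once" during romaji input is treated as inviolable, which eliminates the change described above. If AzooKeyKanaKanjiConverter took this approach, the state would change as follows. However, this behavior is also unintuitive.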
```swift
ComposingText(
@ -106,26 +106,31 @@ ComposingText(
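As far as we know, no system chooses option 3, "じゅあ". This approach works when inserting "u" into "ja / じゃ", but when inserting "u" between the "ち" and the "ゃ" of "cha / ちゃ", the question of how to decide the insertion position remains. Would the result be "chua"?
Option 4 is in a sense the straightforward position, and implementations that take a "who cares about that" attitude often end up in this form. It is reasonable. azooKey also gives up on cursor movement during live conversion, so it is implemented this way there.
Option 4 is in a sense the straightforward position, and implementations that take a "who cares about that" attitude often end up in this form. It is reasonable. AzooKeyKanaKanjiConverter also gives up on cursor movement during live conversion, so it is implemented this way there.
As these examples show, input has many edge cases. To handle such complex cases, input management inevitably becomes complex.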
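## Typo Correction
## 誤り訂正 (Typo Correction)
Typo correction is an ad hoc implementation built on the `ComposingText` described above.
Typo correction in AzooKeyKanaKanjiConverter is implemented as substitutions on `ComposingText.input`. For example, when the sequence "ts" appears, it is read as "ta" with a fixed penalty, and when the sequence "た" appears, it is read as "だ" with a fixed penalty. These rules are defined in advance at the source-code level.
Specifically, for each part of `ComposingText`, rules such as
With a naive implementation of typo correction, the combinatorial explosion of correction candidates becomes a problem. For example, when the input is "たたたたたたたたたた", deciding for each "た" whether to apply the rule yields 1024 candidates.
* if "た" is present, also allow "だ"
* if "ts" is present, replace it with "た"
To deal with this problem, AzooKeyKanaKanjiConverter introduces several techniques for efficient typo correction. The penalty accumulates each time a substitution is applied, and an upper limit is set on the accumulated penalty; candidates that exceed it are excluded from enumeration. This greatly reduces the number of patterns.
are enumerated in advance and applied.
In addition, from the v0.9 series onward, AzooKeyKanaKanjiConverter performs typo correction in tandem with dictionary lookup. In a trie-based dictionary lookup, if no node corresponds to a given string, it follows that no string having that string as a prefix is registered in the dictionary. Using this property, typo corrections that would produce impossible candidates are pruned early, which greatly reduces the cost of enumeration.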
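The trie-based pruning can be illustrated with a minimal sketch. `ToyTrie`, the one-entry substitution table, and the two sample words are hypothetical and are not the converter's actual dictionary structure; the sketch only shows how a missing trie node lets a whole branch of corrections be skipped.

```swift
// A toy trie: a hypothetical stand-in for the real dictionary structure.
final class ToyTrie {
    private var children: [Character: ToyTrie] = [:]
    private(set) var isWord = false

    func insert(_ word: String) {
        var node = self
        for ch in word {
            if let next = node.children[ch] {
                node = next
            } else {
                let next = ToyTrie()
                node.children[ch] = next
                node = next
            }
        }
        node.isWord = true
    }

    func child(_ ch: Character) -> ToyTrie? { children[ch] }
}

// Enumerates dictionary words reachable from `input` when each character may
// optionally be replaced according to `rules`. A branch is abandoned as soon
// as the trie has no node for the current prefix.
func correctedWords(_ input: [Character], rules: [Character: Character], trie: ToyTrie) -> (words: [String], visited: Int) {
    var words: [String] = []
    var visited = 0
    func walk(_ index: Int, _ node: ToyTrie, _ prefix: String) {
        visited += 1
        if index == input.count {
            if node.isWord { words.append(prefix) }
            return
        }
        var options = [input[index]]
        if let replaced = rules[input[index]] { options.append(replaced) }
        for ch in options {
            // No node for this prefix: nothing below it can be in the dictionary.
            guard let next = node.child(ch) else { continue }
            walk(index + 1, next, prefix + String(ch))
        }
    }
    walk(0, trie, "")
    return (words, visited)
}

let trie = ToyTrie()
trie.insert("たた")
trie.insert("ただ")
let result = correctedWords(Array("たた"), rules: ["た": "だ"], trie: trie)
print(result.words, result.visited)
```

In this toy setup every candidate that starts with "だ" is dropped at the root, so only four of the seven possible search states are visited.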
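However, if rules could be applied any number of times, then for an input such as "たたたたたたたたたた", deciding for each "た" whether to apply a rule would yield 1024 candidates. Since that is impractical, a constraint such as "rules may be applied at most 3 times" is imposed in practice to prevent combinatorial explosion.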
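The effect of bounding the number of substitutions, or equivalently the accumulated penalty, can be checked with a small counting sketch. The function name `countPatterns` and the unit penalty are assumptions for illustration; the real penalties and limits live in the converter's source.

```swift
// Counts the correction patterns for `length` replaceable characters when each
// substitution costs `penalty` and the accumulated penalty may not exceed `limit`.
func countPatterns(length: Int, penalty: Int, limit: Int) -> Int {
    func walk(_ index: Int, _ accumulated: Int) -> Int {
        if index == length { return 1 }
        // Keep the character as typed.
        var total = walk(index + 1, accumulated)
        // Apply the substitution only while the penalty budget allows it.
        if accumulated + penalty <= limit {
            total += walk(index + 1, accumulated + penalty)
        }
        return total
    }
    return walk(0, 0)
}

let unbounded = countPatterns(length: 10, penalty: 1, limit: 10)  // 1024
let bounded = countPatterns(length: 10, penalty: 1, limit: 3)     // 176
print(unbounded, bounded)
```

With ten replaceable characters, allowing at most three substitutions leaves 1 + 10 + 45 + 120 = 176 patterns instead of 1024.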
## 二重インデックスラティス (Dual-Indexed Lattice)
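In addition, when a rule is applied, a cost is added to the candidate, which expresses the idea that "if a candidate still ranks high despite a certain cost, the input was likely mistyped". These costs are set by hand, with some adjustments such as making them higher for pairs of particles like "か" and "が".
From the v0.9 series onward, AzooKeyKanaKanjiConverter makes a major change to the lattice structure used for kana-kanji conversion and introduces a structure called the "dual-indexed lattice".
## Learning
The previous lattice for kana-kanji conversion was represented as a nested array (an array of node arrays) indexed by positions in `ComposingText.input`. Each inner array stores the nodes that start at the corresponding `input` position and end at some later `input` position. With this method, for the input "ittai", for example, partial input strings are generated in a form like `[[i, it, itt, itta, ittai], [tt, tta, ttai], [ta, tai], [a, ai], [i]]`, and a dictionary lookup is performed for each of them to create the actual lattice nodes.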
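As a rough sketch of the input-indexed scheme, the following enumerates the partial input strings by start position. `partialInputs` is a hypothetical helper; unlike the list in the text, it also emits single-character entries such as "t".

```swift
// Lists, for every start position, the substrings of `input` that begin there.
func partialInputs(_ input: String) -> [[String]] {
    let chars = Array(input)
    return chars.indices.map { start in
        (start ..< chars.count).map { end in String(chars[start ... end]) }
    }
}

let parts = partialInputs("ittai")
print(parts[0])  // ["i", "it", "itt", "itta", "ittai"]
```

Each inner array corresponds to one slot of the input-indexed lattice, and every string in it would be looked up in the dictionary.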
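This implementation works without problems for the most part, but problems arise in exceptional cases. Specifically, it cannot produce an input that corresponds to a string such as "イッ". This is because no substring of the romaji input "ittai" corresponds exactly to "イッ": "it" on its own becomes "イt", and "itt" on its own becomes "イッt". The dictionary, however, contains many words, such as the "行っ" in "行った", that can only be produced by looking up "イッ".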
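The mismatch can be reproduced with a toy converter. The rules in `toyKana` are illustrative assumptions that cover only what "ittai" needs; they are not the converter's real romaji table.

```swift
// A toy romaji-to-katakana converter covering only what "ittai" needs.
func toyKana(_ romaji: String) -> String {
    let chars = Array(romaji)
    var output = ""
    var i = 0
    while i < chars.count {
        let current = chars[i]
        let next: Character? = i + 1 < chars.count ? chars[i + 1] : nil
        if current == "t" && next == "t" {
            output += "ッ"   // doubled consonant: emit ッ and keep the second t
            i += 1
        } else if current == "t" && next == "a" {
            output += "タ"
            i += 2
        } else if current == "i" {
            output += "イ"
            i += 1
        } else if current == "a" {
            output += "ア"
            i += 1
        } else {
            output.append(current)   // unresolved letter stays as typed
            i += 1
        }
    }
    return output
}

let prefixes = ["i", "it", "itt", "itta", "ittai"]
let surfaces = prefixes.map(toyKana)
print(surfaces)  // ["イ", "イt", "イッt", "イッタ", "イッタイ"]
```

None of the converted prefixes equals "イッ", which is why an index over `input` alone cannot address that position.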
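To address this problem, AzooKeyKanaKanjiConverter introduces a structure that mixes a "surface-string-level index", that is, an index based on the string "イッタイ", with an "internal-string-level index", that is, an index based on the input history "ittai". Normal conversion always uses the surface-level index, while typo correction uses the internal-level index, and handling the correspondence between the two appropriately solves the problem described above.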
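The following sketch mirrors the index-ordering logic of `LatticeDualIndexMap.indices(inputCount:surfaceCount:)` from this pull request in a standalone form. The alignment map for "ittai" / "イッタイ" is written by hand here and is an assumption; in the real code it comes from `ComposingText.inputIndexToSurfaceIndexMap()`.

```swift
enum DualIndex: Equatable {
    case inputIndex(Int)
    case surfaceIndex(Int)
    case bothIndex(inputIndex: Int, surfaceIndex: Int)
}

// `map` sends an input position to the surface position it lines up with.
// It is assumed to be increasing in both keys and values.
func dualIndices(map: [Int: Int], inputCount: Int, surfaceCount: Int) -> [DualIndex] {
    var indices: [DualIndex] = []
    var surfacePointer = 0
    for i in 0 ..< inputCount {
        if let s = map[i] {
            // Surface positions skipped so far have no aligned input position.
            for j in surfacePointer ..< s { indices.append(.surfaceIndex(j)) }
            indices.append(.bothIndex(inputIndex: i, surfaceIndex: s))
            surfacePointer = s + 1
        } else {
            indices.append(.inputIndex(i))
        }
    }
    for j in surfacePointer ..< surfaceCount { indices.append(.surfaceIndex(j)) }
    return indices
}

// Assumed alignment: input positions 0, 1, and 4 line up with surface positions 0, 1, and 3.
let alignment = [0: 0, 1: 1, 4: 3]
let order = dualIndices(map: alignment, inputCount: 5, surfaceCount: 4)
print(order)
```

Under this assumed alignment, surface position 2 (the boundary right after "イッ") appears as a surface-only index, while input positions 2 and 3 appear as input-only indices.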
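## 学習 (Learning)
Learning uses two kinds of data: "temporary memory" (from when the keyboard is opened until it is closed) and "long-term memory" (semi-permanent). Temporary memory exists only in volatile memory, while long-term memory is saved as files in non-volatile storage.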
@ -171,11 +176,11 @@ ComposingText(
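If an error occurs during step 3, the `.pause` file remains, so the next time the keyboard is opened, learning is put into a stopped state. Step 3 is then run again at an appropriate time, which makes it possible to update all files safely.
In azooKeyKanaKanjiConverter, if a `.pause` file remains when the converter is opened, it automatically attempts a merge with an empty temporary memory, deletes `.pause`, and restores the learning feature.
In AzooKeyKanaKanjiConverter, if a `.pause` file remains when the converter is opened, it automatically attempts a merge with an empty temporary memory, deletes `.pause`, and restores the learning feature.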
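A minimal sketch of this recovery step is shown below. The function `recoverIfPaused` and its `merge` parameter are hypothetical; only the `.pause` file name comes from the text, and the real converter's merge logic is more involved.

```swift
import Foundation

// Returns true when learning can be used after the call. `merge` stands in for
// the real merge step and is a hypothetical parameter.
func recoverIfPaused(memoryDirectory: URL, merge: () throws -> Void) -> Bool {
    let pauseFile = memoryDirectory.appendingPathComponent(".pause")
    guard FileManager.default.fileExists(atPath: pauseFile.path) else {
        return true   // nothing to recover
    }
    do {
        try merge()                                        // redo the interrupted update
        try FileManager.default.removeItem(at: pauseFile)  // only then clear the marker
        return true
    } catch {
        return false  // keep `.pause` so that a later launch retries
    }
}

// Usage with a throwaway directory and a no-op merge.
let dir = FileManager.default.temporaryDirectory.appendingPathComponent(UUID().uuidString)
try! FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
let marker = dir.appendingPathComponent(".pause")
_ = FileManager.default.createFile(atPath: marker.path, contents: nil)
let recovered = recoverIfPaused(memoryDirectory: dir, merge: {})
print(recovered)
```

Removing the marker only after the merge succeeds keeps the retry behavior described above: a failed merge leaves `.pause` in place.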
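## Candidate Ordering
## 変換候補の並び順 (Candidate Ordering)
Determining the order of conversion candidates is a very hard problem. In azooKey it works roughly as follows. `Converter.swift` decides the order, but the implementation is very complex and we would like to improve it.
Determining the order of conversion candidates is a very hard problem. In AzooKeyKanaKanjiConverter it works roughly as follows. `Converter.swift` decides the order, but the implementation is very complex and we would like to improve it.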
```
First 5 items: exact match, prediction, or romaji-to-English conversion (at least one exact match is included within the top 3)
@ -183,11 +188,11 @@ azooKeyKanaKanjiConverter では、変換器を開いた際に `.pause` ファ
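After that: conversions such as all-hiragana, all-katakana, and all-uppercase, plus dictionary data found by prefix match, shown longest first and highest rated first (around the 5th item, special conversions such as Unicode conversion, Western/Japanese calendar conversion, email address conversion, and decorative characters are inserted)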
```
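## Live Conversion
## ライブ変換 (Live Conversion)
Live conversion is realized with a fairly simple idea. Conversion candidates are requested in the same way as without live conversion, and the highest-ranked candidate among the exact-match conversions (not the predictions) is displayed.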
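The selection rule can be sketched as follows. `SketchCandidate` and its `isExactMatch` flag are hypothetical; the real `Candidate` type has different fields, and how exact matches are detected is not shown here.

```swift
// Hypothetical candidate shape used only for this sketch.
struct SketchCandidate {
    let text: String
    let isExactMatch: Bool   // false for prediction candidates
}

// Picks what live conversion would display: the first exact-match candidate in
// an already ranked list.
func liveConversionText(from ranked: [SketchCandidate]) -> String? {
    ranked.first(where: { $0.isExactMatch })?.text
}

let ranked = [
    SketchCandidate(text: "一体感", isExactMatch: false),
    SketchCandidate(text: "一体", isExactMatch: true),
    SketchCandidate(text: "いったい", isExactMatch: true),
]
print(liveConversionText(from: ranked) ?? "")  // 一体
```

The prediction at the top of the list is skipped, so the displayed text always covers exactly what has been typed.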
## Prediction
## 予測変換 (Prediction)
Prediction is implemented differently for the two cases "while typing" (mid composition) and "after committing" (post composition).

View File

@ -146,15 +146,15 @@ extension Subcommands {
///
var user_dictionary: [InputUserDictionaryItem]? = nil
}
struct InputUserDictionaryItem: Codable {
///
var word: String
///
var reading: String
///
var hint: String? = nil
}
struct InputUserDictionaryItem: Codable {
///
var word: String
///
var reading: String
///
var hint: String? = nil
}
struct EvaluateResult: Codable {

View File

@ -30,6 +30,8 @@ extension Subcommands {
var reportScore = false
@Flag(name: [.customLong("roman2kana")], help: "Use roman2kana input.")
var roman2kana = false
@Option(name: [.customLong("config_user_dictionary")], help: "User Dictionary JSON file path")
var configUserDictionary: String? = nil
@Option(name: [.customLong("config_zenzai_inference_limit")], help: "inference limit for zenzai.")
var configZenzaiInferenceLimit: Int = .max
@Flag(name: [.customLong("config_zenzai_rich_n_best")], help: "enable rich n_best generation for zenzai.")
@ -70,6 +72,15 @@ extension Subcommands {
}
}
private func parseUserDictionaryFile() throws -> [InputUserDictionaryItem] {
guard let configUserDictionary else {
return []
}
let url = URL(fileURLWithPath: configUserDictionary)
let data = try Data(contentsOf: url)
return try JSONDecoder().decode([InputUserDictionaryItem].self, from: data)
}
@MainActor mutating func run() async {
if self.zenzV1 || self.zenzV2 {
print("\(bold: "We strongly recommend to use zenz-v3 models")")
@ -80,6 +91,11 @@ extension Subcommands {
if !self.zenzWeightPath.isEmpty && (!self.zenzV1 && !self.zenzV2 && !self.zenzV3) {
print("zenz version is not specified. By default, zenz-v3 will be used.")
}
let userDictionary = try! self.parseUserDictionaryFile().map {
DicdataElement(word: $0.word, ruby: $0.reading.toKatakana(), cid: CIDData..cid, mid: MIDData..mid, value: -10)
}
let learningType: LearningType = if self.readOnlyMemoryPath != nil {
//
.onlyOutput
@ -107,6 +123,7 @@ extension Subcommands {
converter.sendToDicdataStore(
.setRequestOptions(requestOptions(learningType: learningType, memoryDirectory: memoryDirectory, leftSideContext: nil))
)
converter.sendToDicdataStore(.importDynamicUserDict(userDictionary))
var composingText = ComposingText()
let inputStyle: InputStyle = self.roman2kana ? .roman2kana : .direct
var lastCandidates: [Candidate] = []
@ -220,7 +237,7 @@ extension Subcommands {
print("Submit \(candidate.text)")
converter.setCompletedData(candidate)
converter.updateLearningData(candidate)
composingText.prefixComplete(correspondingCount: candidate.correspondingCount)
composingText.prefixComplete(composingCount: candidate.composingCount)
if composingText.isEmpty {
composingText.stopComposition()
converter.stopComposition()
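For reference, the JSON file accepted by the `--config_user_dictionary` option added above is decoded as an array of `InputUserDictionaryItem`. The sketch below decodes a sample of that shape; the struct is a standalone copy named `UserDictionaryItem`, and the sample entries are made up for illustration.

```swift
import Foundation

// Standalone copy of the fields of `InputUserDictionaryItem` from this diff.
struct UserDictionaryItem: Codable {
    var word: String
    var reading: String
    var hint: String? = nil
}

// Made-up sample data; `hint` may be omitted.
let json = """
[
  {"word": "azooKey", "reading": "あずーきー"},
  {"word": "東京", "reading": "とうきょう", "hint": "place"}
]
"""
let items = try! JSONDecoder().decode([UserDictionaryItem].self, from: Data(json.utf8))
print(items.map(\.word))  // ["azooKey", "東京"]
```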

View File

@ -6,6 +6,7 @@
// Copyright © 2020 ensan. All rights reserved.
//
import Algorithms
import Foundation
import SwiftUtils
@ -28,11 +29,36 @@ extension Kana2Kanji {
/// (4)
func kana2lattice_all(_ inputData: ComposingText, N_best: Int, needTypoCorrection: Bool) -> (result: LatticeNode, lattice: Lattice) {
debug("新規に計算を行います。inputされた文字列は\(inputData.input.count)文字分の\(inputData.convertTarget)")
let count: Int = inputData.input.count
let result: LatticeNode = LatticeNode.EOSNode
let lattice: Lattice = Lattice(nodes: (.zero ..< count).map {dicdataStore.getLOUDSDataInRange(inputData: inputData, from: $0, needTypoCorrection: needTypoCorrection)})
let inputCount: Int = inputData.input.count
let surfaceCount = inputData.convertTarget.count
let indexMap = LatticeDualIndexMap(inputData)
let latticeIndices = indexMap.indices(inputCount: inputCount, surfaceCount: surfaceCount)
let rawNodes = latticeIndices.map { index in
let inputRange: (startIndex: Int, endIndexRange: Range<Int>?)? = if let iIndex = index.inputIndex {
(iIndex, nil)
} else {
nil
}
let surfaceRange: (startIndex: Int, endIndexRange: Range<Int>?)? = if let sIndex = index.surfaceIndex {
(sIndex, nil)
} else {
nil
}
return dicdataStore.lookupDicdata(
composingText: inputData,
inputRange: inputRange,
surfaceRange: surfaceRange,
needTypoCorrection: needTypoCorrection
)
}
let lattice: Lattice = Lattice(
inputCount: inputCount,
surfaceCount: surfaceCount,
rawNodes: rawNodes
)
// inodes
for (i, nodeArray) in lattice.enumerated() {
for (isHead, nodeArray) in lattice.indexedNodes(indices: latticeIndices) {
// node
for node in nodeArray {
if node.prevs.isEmpty {
@ -43,20 +69,20 @@ extension Kana2Kanji {
}
//
let wValue: PValue = node.data.value()
if i == 0 {
if isHead {
// values
node.values = node.prevs.map {$0.totalValue + wValue + self.dicdataStore.getCCValue($0.data.rcid, node.data.lcid)}
} else {
// values
node.values = node.prevs.map {$0.totalValue + wValue}
}
//
let nextIndex: Int = node.inputRange.endIndex
// index
let nextIndex = indexMap.dualIndex(for: node.range.endIndex)
// count
if nextIndex == count {
if nextIndex.inputIndex == inputCount && nextIndex.surfaceIndex == surfaceCount {
self.updateResultNode(with: node, resultNode: result)
} else {
self.updateNextNodes(with: node, nextNodes: lattice[inputIndex: nextIndex], nBest: N_best)
self.updateNextNodes(with: node, nextNodes: lattice[index: nextIndex], nBest: N_best)
}
}
}
@ -70,7 +96,7 @@ extension Kana2Kanji {
}
}
/// N-Best
func updateNextNodes(with node: LatticeNode, nextNodes: [LatticeNode], nBest: Int) {
func updateNextNodes(with node: LatticeNode, nextNodes: some Sequence<LatticeNode>, nBest: Int) {
for nextnode in nextNodes {
if self.dicdataStore.shouldBeRemoved(data: nextnode.data) {
continue

View File

@ -1,3 +1,4 @@
import Algorithms
import Foundation
import SwiftUtils
@ -20,11 +21,36 @@ extension Kana2Kanji {
/// (4)
func kana2lattice_all_with_prefix_constraint(_ inputData: ComposingText, N_best: Int, constraint: PrefixConstraint) -> (result: LatticeNode, lattice: Lattice) {
debug("新規に計算を行います。inputされた文字列は\(inputData.input.count)文字分の\(inputData.convertTarget)。制約は\(constraint)")
let count: Int = inputData.input.count
let result: LatticeNode = LatticeNode.EOSNode
let lattice: Lattice = Lattice(nodes: (.zero ..< count).map {dicdataStore.getLOUDSDataInRange(inputData: inputData, from: $0, needTypoCorrection: false)})
let inputCount: Int = inputData.input.count
let surfaceCount = inputData.convertTarget.count
let indexMap = LatticeDualIndexMap(inputData)
let latticeIndices = indexMap.indices(inputCount: inputCount, surfaceCount: surfaceCount)
let rawNodes = latticeIndices.map { index in
let inputRange: (startIndex: Int, endIndexRange: Range<Int>?)? = if let iIndex = index.inputIndex {
(iIndex, nil)
} else {
nil
}
let surfaceRange: (startIndex: Int, endIndexRange: Range<Int>?)? = if let sIndex = index.surfaceIndex {
(sIndex, nil)
} else {
nil
}
return dicdataStore.lookupDicdata(
composingText: inputData,
inputRange: inputRange,
surfaceRange: surfaceRange,
needTypoCorrection: false
)
}
let lattice: Lattice = Lattice(
inputCount: inputCount,
surfaceCount: surfaceCount,
rawNodes: rawNodes
)
// inodes
for (i, nodeArray) in lattice.enumerated() {
for (isHead, nodeArray) in lattice.indexedNodes(indices: latticeIndices) {
// node
for node in nodeArray {
if node.prevs.isEmpty {
@ -32,7 +58,7 @@ extension Kana2Kanji {
}
//
let wValue: PValue = node.data.value()
if i == 0 {
if isHead {
// values
node.values = node.prevs.map {$0.totalValue + wValue + self.dicdataStore.getCCValue($0.data.rcid, node.data.lcid)}
} else {
@ -40,9 +66,9 @@ extension Kana2Kanji {
node.values = node.prevs.map {$0.totalValue + wValue}
}
//
let nextIndex: Int = node.inputRange.endIndex
let nextIndex = indexMap.dualIndex(for: node.range.endIndex)
// count
if nextIndex == count {
if nextIndex.inputIndex == inputCount && nextIndex.surfaceIndex == surfaceCount {
for index in node.prevs.indices {
let newnode: RegisteredNode = node.getRegisteredNode(index, value: node.values[index])
//
@ -61,7 +87,7 @@ extension Kana2Kanji {
Array(($0.data.reduce(into: "") { $0.append(contentsOf: $1.word)} + node.data.word).utf8)
}
// nodenextnode
for nextnode in lattice[inputIndex: nextIndex] {
for nextnode in lattice[index: nextIndex] {
//
let ccValue: PValue = self.dicdataStore.getCCValue(node.data.rcid, nextnode.data.lcid)
// nodeprevnode

View File

@ -14,7 +14,7 @@ extension Kana2Kanji {
return Candidate(
text: left.text + right.text,
value: left.value + right.value,
correspondingCount: left.correspondingCount + right.correspondingCount,
composingCount: .composite(left.composingCount, right.composingCount),
lastMid: right.lastMid,
data: left.data + right.data
)
@ -26,7 +26,7 @@ extension Kana2Kanji {
return Candidate(
text: left.text + right.text,
value: newValue,
correspondingCount: left.correspondingCount + right.correspondingCount,
composingCount: .composite(left.composingCount, right.composingCount),
lastMid: right.lastMid,
data: left.data + right.data
)
@ -57,7 +57,7 @@ extension Kana2Kanji {
prefixCandidate.data = prefixCandidateData
prefixCandidate.text = prefixCandidateData.reduce(into: "") { $0 += $1.word }
prefixCandidate.correspondingCount = prefixCandidateData.reduce(into: 0) { $0 += $1.ruby.count }
prefixCandidate.composingCount = .surfaceCount(prefixCandidateData.reduce(into: 0) { $0 += $1.ruby.count })
}
totalWord.insert(contentsOf: element.word, at: totalWord.startIndex)

View File

@ -6,6 +6,7 @@
// Copyright © 2020 ensan. All rights reserved.
//
import Algorithms
import Foundation
import SwiftUtils
@ -17,29 +18,32 @@ extension Kana2Kanji {
/// (2)
func kana2lattice_afterComplete(_ inputData: ComposingText, completedData: Candidate, N_best: Int, previousResult: (inputData: ComposingText, lattice: Lattice), needTypoCorrection: Bool) -> (result: LatticeNode, lattice: Lattice) {
debug("確定直後の変換、前は:", previousResult.inputData, "後は:", inputData)
let count = inputData.input.count
let inputCount = inputData.input.count
let surfaceCount = inputData.convertTarget.count
// TODO: input/convertTargetsuffix
let convertedInputCount = previousResult.inputData.input.count - inputCount
let convertedSurfaceCount = previousResult.inputData.convertTarget.count - surfaceCount
// (1)
let start = RegisteredNode.fromLastCandidate(completedData)
let lattice = previousResult.lattice.suffix(count)
for (i, nodeArray) in lattice.enumerated() {
if i == .zero {
for node in nodeArray {
node.prevs = [start]
// inputRange
node.inputRange = node.inputRange.startIndex - completedData.correspondingCount ..< node.inputRange.endIndex - completedData.correspondingCount
}
let indexMap = LatticeDualIndexMap(inputData)
let latticeIndices = indexMap.indices(inputCount: inputCount, surfaceCount: surfaceCount)
let lattice = previousResult.lattice.suffix(inputCount: inputCount, surfaceCount: surfaceCount)
for (isHead, nodeArray) in lattice.indexedNodes(indices: latticeIndices) {
let prevs: [RegisteredNode] = if isHead {
[start]
} else {
for node in nodeArray {
node.prevs = []
// inputRange
node.inputRange = node.inputRange.startIndex - completedData.correspondingCount ..< node.inputRange.endIndex - completedData.correspondingCount
}
[]
}
for node in nodeArray {
node.prevs = prevs
// inputRange
node.range = node.range.offseted(inputOffset: -convertedInputCount, surfaceOffset: -convertedSurfaceCount)
}
}
// (2)
let result = LatticeNode.EOSNode
for (i, nodeArray) in lattice.enumerated() {
for (isHead, nodeArray) in lattice.indexedNodes(indices: latticeIndices) {
for node in nodeArray {
if node.prevs.isEmpty {
continue
@ -49,7 +53,7 @@ extension Kana2Kanji {
}
//
let wValue = node.data.value()
if i == 0 {
if isHead {
// values
node.values = node.prevs.map {$0.totalValue + wValue + self.dicdataStore.getCCValue($0.data.rcid, node.data.lcid)}
} else {
@ -57,11 +61,11 @@ extension Kana2Kanji {
node.values = node.prevs.map {$0.totalValue + wValue}
}
//
let nextIndex = node.inputRange.endIndex
if nextIndex != count {
self.updateNextNodes(with: node, nextNodes: lattice[inputIndex: nextIndex], nBest: N_best)
} else {
let nextIndex = indexMap.dualIndex(for: node.range.endIndex)
if nextIndex.inputIndex == inputCount || nextIndex.surfaceIndex == surfaceCount {
self.updateResultNode(with: node, resultNode: result)
} else {
self.updateNextNodes(with: node, nextNodes: lattice[index: nextIndex], nBest: N_best)
}
}

View File

@ -6,6 +6,7 @@
// Copyright © 2020 ensan. All rights reserved.
//
import Algorithms
import Foundation
import SwiftUtils
@ -24,28 +25,59 @@ extension Kana2Kanji {
///
/// (5)
func kana2lattice_changed(_ inputData: ComposingText, N_best: Int, counts: (deleted: Int, added: Int), previousResult: (inputData: ComposingText, lattice: Lattice), needTypoCorrection: Bool) -> (result: LatticeNode, lattice: Lattice) {
func kana2lattice_changed(
_ inputData: ComposingText,
N_best: Int,
counts: (deletedInput: Int, addedInput: Int, deletedSurface: Int, addedSurface: Int),
previousResult: (inputData: ComposingText, lattice: Lattice),
needTypoCorrection: Bool
) -> (result: LatticeNode, lattice: Lattice) {
// (0)
let count = inputData.input.count
let commonCount = previousResult.inputData.input.count - counts.deleted
debug("kana2lattice_changed", inputData, counts, previousResult.inputData, count, commonCount)
let inputCount = inputData.input.count
let surfaceCount = inputData.convertTarget.count
let commonInputCount = previousResult.inputData.input.count - counts.deletedInput
let commonSurfaceCount = previousResult.inputData.convertTarget.count - counts.deletedSurface
debug("kana2lattice_changed", inputData, counts, previousResult.inputData, inputCount, commonInputCount)
// (1)
var lattice = previousResult.lattice.prefix(commonCount)
let indexMap = LatticeDualIndexMap(inputData)
let latticeIndices = indexMap.indices(inputCount: inputCount, surfaceCount: surfaceCount)
var lattice = previousResult.lattice.prefix(inputCount: commonInputCount, surfaceCount: commonSurfaceCount)
let terminalNodes: Lattice
if counts.added == 0 {
terminalNodes = Lattice(nodes: lattice.map {
var terminalNodes = Lattice(
inputCount: inputCount,
surfaceCount: surfaceCount,
rawNodes: lattice.map {
$0.filter {
$0.inputRange.endIndex == count
$0.range.endIndex == .input(inputCount) || $0.range.endIndex == .surface(surfaceCount)
}
})
} else {
}
)
if !(counts.addedInput == 0 && counts.addedSurface == 0) {
// (2)
let addedNodes: Lattice = Lattice(nodes: (0..<count).map {(i: Int) in
self.dicdataStore.getLOUDSDataInRange(inputData: inputData, from: i, toIndexRange: max(commonCount, i) ..< count, needTypoCorrection: needTypoCorrection)
})
let rawNodes = latticeIndices.map { index in
let inputRange: (startIndex: Int, endIndexRange: Range<Int>?)? = if let iIndex = index.inputIndex, max(commonInputCount, iIndex) < inputCount {
(iIndex, max(commonInputCount, iIndex) ..< inputCount)
} else {
nil
}
let surfaceRange: (startIndex: Int, endIndexRange: Range<Int>?)? = if let sIndex = index.surfaceIndex, max(commonSurfaceCount, sIndex) < surfaceCount {
(sIndex, max(commonSurfaceCount, sIndex) ..< surfaceCount)
} else {
nil
}
return self.dicdataStore.lookupDicdata(
composingText: inputData,
inputRange: inputRange,
surfaceRange: surfaceRange,
needTypoCorrection: needTypoCorrection
)
}
let addedNodes: Lattice = Lattice(
inputCount: inputCount,
surfaceCount: surfaceCount,
rawNodes: rawNodes
)
// (3)
for nodeArray in lattice {
for node in nodeArray {
@ -56,12 +88,14 @@ extension Kana2Kanji {
continue
}
//
let nextIndex = node.inputRange.endIndex
self.updateNextNodes(with: node, nextNodes: addedNodes[inputIndex: nextIndex], nBest: N_best)
let nextIndex = indexMap.dualIndex(for: node.range.endIndex)
if nextIndex != .bothIndex(inputIndex: inputCount, surfaceIndex: surfaceCount) {
self.updateNextNodes(with: node, nextNodes: addedNodes[index: nextIndex], nBest: N_best)
}
}
}
lattice.merge(addedNodes)
terminalNodes = addedNodes
terminalNodes.merge(addedNodes)
}
// (3)
@ -86,11 +120,11 @@ extension Kana2Kanji {
// values
node.values = node.prevs.map {$0.totalValue + wValue}
}
let nextIndex = node.inputRange.endIndex
if count == nextIndex {
let nextIndex = indexMap.dualIndex(for: node.range.endIndex)
if nextIndex.inputIndex == inputCount && nextIndex.surfaceIndex == surfaceCount {
self.updateResultNode(with: node, resultNode: result)
} else {
self.updateNextNodes(with: node, nextNodes: terminalNodes[inputIndex: nextIndex], nBest: N_best)
self.updateNextNodes(with: node, nextNodes: terminalNodes[index: nextIndex], nBest: N_best)
}
}
}

View File

@ -6,6 +6,7 @@
// Copyright © 2022 ensan. All rights reserved.
//
import Algorithms
import Foundation
import SwiftUtils
@ -26,12 +27,13 @@ extension Kana2Kanji {
func kana2lattice_no_change(N_best: Int, previousResult: (inputData: ComposingText, lattice: Lattice)) -> (result: LatticeNode, lattice: Lattice) {
debug("キャッシュから復元、元の文字は:", previousResult.inputData.convertTarget)
let count = previousResult.inputData.input.count
let inputCount = previousResult.inputData.input.count
let surfaceCount = previousResult.inputData.convertTarget.count
// (1)
let result = LatticeNode.EOSNode
for nodeArray in previousResult.lattice {
for node in nodeArray where node.inputRange.endIndex == count {
for node in nodeArray where node.range.endIndex == .input(inputCount) || node.range.endIndex == .surface(surfaceCount) {
if node.prevs.isEmpty {
continue
}

View File

@ -34,11 +34,16 @@ struct Kana2Kanji {
let text = data.clauses.map {$0.clause.text}.joined()
let value = data.clauses.last!.value + mmValue.value
let lastMid = data.clauses.last!.clause.mid
let correspondingCount = data.clauses.reduce(into: 0) {$0 += $1.clause.inputRange.count}
let composingCount: ComposingCount = data.clauses.reduce(into: .inputCount(0)) {
for range in $1.clause.ranges {
$0 = .composite($0, range.count)
}
}
return Candidate(
text: text,
value: value,
correspondingCount: correspondingCount,
composingCount: composingCount,
lastMid: lastMid,
data: data.data
)

View File

@ -1,49 +1,261 @@
struct Lattice: Sequence {
typealias Element = [LatticeNode]
typealias Iterator = IndexingIterator<[[LatticeNode]]>
import Algorithms
import SwiftUtils
init(nodes: [[LatticeNode]] = []) {
self.nodes = nodes
struct LatticeNodeArray: Sequence {
typealias Element = LatticeNode
var inputIndexedNodes: [LatticeNode]
var surfaceIndexedNodes: [LatticeNode]
func makeIterator() -> Chain2Sequence<[LatticeNode], [LatticeNode]>.Iterator {
inputIndexedNodes.chained(surfaceIndexedNodes).makeIterator()
}
}
struct LatticeDualIndexMap: Sendable {
private var inputIndexToSurfaceIndexMap: [Int: Int]
init(_ composingText: ComposingText) {
self.inputIndexToSurfaceIndexMap = composingText.inputIndexToSurfaceIndexMap()
}
private var nodes: [[LatticeNode]]
enum DualIndex: Sendable, Equatable, Hashable {
case inputIndex(Int)
case surfaceIndex(Int)
case bothIndex(inputIndex: Int, surfaceIndex: Int)
func prefix(_ k: Int) -> Lattice {
var lattice = Lattice(nodes: self.nodes.prefix(k).map {(nodes: [LatticeNode]) in
nodes.filter {$0.inputRange.endIndex <= k}
})
while lattice.nodes.last?.isEmpty ?? false {
lattice.nodes.removeLast()
var inputIndex: Int? {
switch self {
case .inputIndex(let index), .bothIndex(let index, _):
index
case .surfaceIndex:
nil
}
}
return lattice
}
func suffix(_ count: Int) -> Lattice {
Lattice(nodes: self.nodes.suffix(count))
}
mutating func merge(_ lattice: Lattice) {
for (index, nodeArray) in lattice.nodes.enumerated() where index < self.nodes.endIndex {
self.nodes[index].append(contentsOf: nodeArray)
}
if self.nodes.endIndex < lattice.nodes.endIndex {
for nodeArray in lattice.nodes[self.nodes.endIndex...] {
self.nodes.append(nodeArray)
var surfaceIndex: Int? {
switch self {
case .inputIndex:
nil
case .surfaceIndex(let index), .bothIndex(_, let index):
index
}
}
}
subscript(inputIndex i: Int) -> [LatticeNode] {
get {
self.nodes[i]
func dualIndex(for latticeIndex: Lattice.LatticeIndex) -> DualIndex {
switch latticeIndex {
case .input(let iIndex):
if let sIndex = self.inputIndexToSurfaceIndexMap[iIndex] {
.bothIndex(inputIndex: iIndex, surfaceIndex: sIndex)
} else {
.inputIndex(iIndex)
}
case .surface(let sIndex):
if let iIndex = self.inputIndexToSurfaceIndexMap.filter({ $0.value == sIndex}).first?.key {
.bothIndex(inputIndex: iIndex, surfaceIndex: sIndex)
} else {
.surfaceIndex(sIndex)
}
}
}
func makeIterator() -> IndexingIterator<[[LatticeNode]]> {
self.nodes.makeIterator()
func indices(inputCount: Int, surfaceCount: Int) -> [DualIndex] {
var indices: [DualIndex] = []
var sIndexPointer = 0
for i in 0 ..< inputCount {
if let sIndex = self.inputIndexToSurfaceIndexMap[i] {
for j in sIndexPointer ..< sIndex {
indices.append(.surfaceIndex(j))
}
indices.append(.bothIndex(inputIndex: i, surfaceIndex: sIndex))
sIndexPointer = sIndex + 1
} else {
indices.append(.inputIndex(i))
}
}
for j in sIndexPointer ..< surfaceCount {
indices.append(.surfaceIndex(j))
}
return indices
}
}
struct Lattice: Sequence {
typealias Element = LatticeNodeArray
init() {
self.inputIndexedNodes = []
self.surfaceIndexedNodes = []
}
init(inputCount: Int, surfaceCount: Int, rawNodes: [[LatticeNode]]) {
self.inputIndexedNodes = .init(repeating: [], count: inputCount)
self.surfaceIndexedNodes = .init(repeating: [], count: surfaceCount)
for nodes in rawNodes {
guard let first = nodes.first else { continue }
switch first.range.startIndex {
case .surface(let i):
self.surfaceIndexedNodes[i].append(contentsOf: nodes)
case .input(let i):
self.inputIndexedNodes[i].append(contentsOf: nodes)
}
}
}
private init(inputIndexedNodes: [[LatticeNode]], surfaceIndexedNodes: [[LatticeNode]]) {
self.inputIndexedNodes = inputIndexedNodes
self.surfaceIndexedNodes = surfaceIndexedNodes
}
private var inputIndexedNodes: [[LatticeNode]]
private var surfaceIndexedNodes: [[LatticeNode]]
func prefix(inputCount: Int, surfaceCount: Int) -> Lattice {
let filterClosure: (LatticeNode) -> Bool = { (node: LatticeNode) -> Bool in
switch node.range.endIndex {
case .input(let value):
value <= inputCount
case .surface(let value):
value <= surfaceCount
}
}
let newInputIndexedNodes = self.inputIndexedNodes.prefix(inputCount).map {(nodes: [LatticeNode]) in
nodes.filter(filterClosure)
}
let newSurfaceIndexedNodes = self.surfaceIndexedNodes.prefix(surfaceCount).map {(nodes: [LatticeNode]) in
nodes.filter(filterClosure)
}
return Lattice(inputIndexedNodes: newInputIndexedNodes, surfaceIndexedNodes: newSurfaceIndexedNodes)
}
func suffix(inputCount: Int, surfaceCount: Int) -> Lattice {
Lattice(
inputIndexedNodes: self.inputIndexedNodes.suffix(inputCount),
surfaceIndexedNodes: self.surfaceIndexedNodes.suffix(surfaceCount)
)
}
mutating func merge(_ lattice: Lattice) {
for (index, nodeArray) in lattice.inputIndexedNodes.enumerated() where index < self.inputIndexedNodes.endIndex {
self.inputIndexedNodes[index].append(contentsOf: nodeArray)
}
if self.inputIndexedNodes.endIndex < lattice.inputIndexedNodes.endIndex {
for nodeArray in lattice.inputIndexedNodes[self.inputIndexedNodes.endIndex...] {
self.inputIndexedNodes.append(nodeArray)
}
}
for (index, nodeArray) in lattice.surfaceIndexedNodes.enumerated() where index < self.surfaceIndexedNodes.endIndex {
self.surfaceIndexedNodes[index].append(contentsOf: nodeArray)
}
if self.surfaceIndexedNodes.endIndex < lattice.surfaceIndexedNodes.endIndex {
for nodeArray in lattice.surfaceIndexedNodes[self.surfaceIndexedNodes.endIndex...] {
self.surfaceIndexedNodes.append(nodeArray)
}
}
}
subscript(index index: LatticeDualIndexMap.DualIndex) -> LatticeNodeArray {
get {
let iNodes: [LatticeNode] = if let iIndex = index.inputIndex { self.inputIndexedNodes[iIndex] } else { [] }
let sNodes: [LatticeNode] = if let sIndex = index.surfaceIndex { self.surfaceIndexedNodes[sIndex] } else { [] }
return LatticeNodeArray(inputIndexedNodes: iNodes, surfaceIndexedNodes: sNodes)
}
}
func indexedNodes(indices: [LatticeDualIndexMap.DualIndex]) -> some Sequence<(isHead: Bool, nodes: LatticeNodeArray)> {
indices.lazy.map { index in
return (index.inputIndex == 0 && index.surfaceIndex == 0, self[index: index])
}
}
struct Iterator: IteratorProtocol {
init(lattice: Lattice) {
self.lattice = lattice
self.indices = (0, lattice.surfaceIndexedNodes.endIndex, 0, lattice.inputIndexedNodes.endIndex)
}
typealias Element = LatticeNodeArray
let lattice: Lattice
var indices: (currentSurfaceIndex: Int, surfaceEndIndex: Int, currentInputIndex: Int, inputEndIndex: Int)
mutating func next() -> LatticeNodeArray? {
if self.indices.currentSurfaceIndex < self.indices.surfaceEndIndex {
defer {
self.indices.currentSurfaceIndex += 1
}
return .init(inputIndexedNodes: [], surfaceIndexedNodes: self.lattice.surfaceIndexedNodes[self.indices.currentSurfaceIndex])
} else if self.indices.currentInputIndex < self.indices.inputEndIndex {
defer {
self.indices.currentInputIndex += 1
}
return .init(inputIndexedNodes: self.lattice.inputIndexedNodes[self.indices.currentInputIndex], surfaceIndexedNodes: [])
} else {
return nil
}
}
}
func makeIterator() -> Iterator {
Iterator(lattice: self)
}
var isEmpty: Bool {
self.nodes.isEmpty
self.inputIndexedNodes.isEmpty && self.surfaceIndexedNodes.isEmpty
}
enum LatticeIndex: Sendable, Equatable, Hashable {
case surface(Int)
case input(Int)
var isZero: Bool {
self == .surface(0) || self == .input(0)
}
}
enum LatticeRange: Sendable, Equatable, Hashable {
static var zero: Self {
.input(from: 0, to: 0)
}
case surface(from: Int, to: Int)
case input(from: Int, to: Int)
var count: ComposingCount {
switch self {
case .surface(let from, let to):
.surfaceCount(to - from)
case .input(let from, let to):
.inputCount(to - from)
}
}
var startIndex: LatticeIndex {
switch self {
case .surface(let from, _):
.surface(from)
case .input(let from, _):
.input(from)
}
}
var endIndex: LatticeIndex {
switch self {
case .surface(_, let to):
.surface(to)
case .input(_, let to):
.input(to)
}
}
func offseted(inputOffset: Int, surfaceOffset: Int) -> Self {
switch self {
case .surface(from: let from, to: let to):
.surface(from: from + surfaceOffset, to: to + surfaceOffset)
case .input(from: let from, to: let to):
.input(from: from + inputOffset, to: to + inputOffset)
}
}
}
}

View File

@ -17,29 +17,29 @@ public final class LatticeNode {
/// `prevs`
var values: [PValue] = []
/// inputData.inputrange
var inputRange: Range<Int>
var range: Lattice.LatticeRange
/// `EOS`
static var EOSNode: LatticeNode {
LatticeNode(data: DicdataElement.EOSData, inputRange: 0..<0)
LatticeNode(data: DicdataElement.EOSData, range: .zero)
}
init(data: DicdataElement, inputRange: Range<Int>) {
init(data: DicdataElement, range: Lattice.LatticeRange) {
self.data = data
self.values = [data.value()]
self.inputRange = inputRange
self.range = range
}
/// `LatticeNode``RegisteredNode`
/// `LatticeNode``RegisteredNode`1
func getRegisteredNode(_ index: Int, value: PValue) -> RegisteredNode {
RegisteredNode(data: self.data, registered: self.prevs[index], totalValue: value, inputRange: self.inputRange)
RegisteredNode(data: self.data, registered: self.prevs[index], totalValue: value, range: self.range)
}
/// `CandidateData`
/// - Returns:
/// - Note: `EOS`API
func getCandidateData() -> [CandidateData] {
self.prevs.map {$0.getCandidateData()}
return self.prevs.map {$0.getCandidateData()}
}
}

@ -36,7 +36,7 @@ public struct PostCompositionPredictionCandidate {
candidate.data.append(data)
}
candidate.value = self.value
candidate.correspondingCount = candidate.data.reduce(into: 0) { $0 += $1.ruby.count }
candidate.composingCount = .surfaceCount(candidate.rubyCount)
candidate.lastMid = data.last(where: DicdataStore.includeMMValueCalculation)?.mid ?? candidate.lastMid
return candidate
case .replacement(let targetData, let replacementData):
@ -45,7 +45,7 @@ public struct PostCompositionPredictionCandidate {
candidate.text = candidate.data.reduce(into: "") {$0 += $1.word}
candidate.value = self.value
candidate.lastMid = candidate.data.last(where: DicdataStore.includeMMValueCalculation)?.mid ?? MIDData.BOS.mid
candidate.correspondingCount = candidate.data.reduce(into: 0) { $0 += $1.ruby.count }
candidate.composingCount = .surfaceCount(candidate.rubyCount)
return candidate
}
}

@ -22,9 +22,17 @@ extension Kana2Kanji {
/// - note:
///
func getPredictionCandidates(composingText: ComposingText, prepart: CandidateData, lastClause: ClauseDataUnit, N_best: Int) -> [Candidate] {
debug("getPredictionCandidates", composingText, lastClause.inputRange, lastClause.text)
let lastRuby = ComposingText.getConvertTarget(for: composingText.input[lastClause.inputRange]).toKatakana()
let lastRubyCount = lastClause.inputRange.count
debug(#function, composingText, lastClause.ranges, lastClause.text)
let lastRuby = lastClause.ranges.reduce(into: "") {
let ruby = switch $1 {
case let .input(left, right):
ComposingText.getConvertTarget(for: composingText.input[left..<right]).toKatakana()
case let .surface(left, right):
String(composingText.convertTarget.dropFirst(left).prefix(right - left)).toKatakana()
}
$0.append(ruby)
}
let lastRubyCount = lastRuby.count
let datas: [DicdataElement]
do {
var _str = ""
@ -42,11 +50,11 @@ extension Kana2Kanji {
let osuserdict: [DicdataElement] = dicdataStore.getPrefixMatchDynamicUserDict(lastRuby)
let lastCandidate: Candidate = prepart.isEmpty ? Candidate(text: "", value: .zero, correspondingCount: 0, lastMid: MIDData.EOS.mid, data: []) : self.processClauseCandidate(prepart)
let lastCandidate: Candidate = prepart.isEmpty ? Candidate(text: "", value: .zero, composingCount: .inputCount(0), lastMid: MIDData.EOS.mid, data: []) : self.processClauseCandidate(prepart)
let lastRcid: Int = lastCandidate.data.last?.rcid ?? CIDData.EOS.cid
let nextLcid: Int = prepart.lastClause?.nextLcid ?? CIDData.EOS.cid
let lastMid: Int = lastCandidate.lastMid
let correspoindingCount: Int = lastCandidate.correspondingCount + lastRubyCount
let composingCount: ComposingCount = .composite(lastCandidate.composingCount, .surfaceCount(lastRubyCount))
let ignoreCCValue: PValue = self.dicdataStore.getCCValue(lastRcid, nextLcid)
let inputStyle = composingText.input.last?.inputStyle ?? .direct
@ -63,10 +71,10 @@ extension Kana2Kanji {
break
}
let possibleNexts: [Substring] = DicdataStore.possibleNexts[String(roman), default: []].map {ruby + $0}
debug("getPredictionCandidates", lastRuby, ruby, roman, possibleNexts, prepart, lastRubyCount)
debug(#function, lastRuby, ruby, roman, possibleNexts, prepart, lastRubyCount)
dicdata = possibleNexts.flatMap { self.dicdataStore.getPredictionLOUDSDicdata(key: $0) }
} else {
debug("getPredicitonCandidates", lastRuby, roman)
debug(#function, lastRuby, "roman == \"\"")
dicdata = self.dicdataStore.getPredictionLOUDSDicdata(key: lastRuby)
}
}
@ -91,7 +99,7 @@ extension Kana2Kanji {
let candidate: Candidate = Candidate(
text: lastCandidate.text + data.word,
value: newValue,
correspondingCount: correspoindingCount,
composingCount: composingCount,
lastMid: includeMMValueCalculation ? data.mid:lastMid,
data: nodedata
)
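
Reviewer note: the `lastRuby` computation above now walks a list of mixed ranges instead of slicing `composingText.input` once. The sketch below illustrates the idea under a simplifying assumption: every input element is already one kana character, so no roman-to-kana conversion is needed (the real code calls `ComposingText.getConvertTarget` and `toKatakana()`). The helper `ruby(for:input:convertTarget:)` is hypothetical.

```swift
enum LatticeRange {
    case surface(from: Int, to: Int)
    case input(from: Int, to: Int)
}

// Hypothetical helper. Assumption: one input element equals one surface character,
// so an input slice can be turned into text directly.
func ruby(for ranges: [LatticeRange], input: [Character], convertTarget: String) -> String {
    ranges.reduce(into: "") { result, range in
        switch range {
        case let .input(left, right):
            // Input-indexed range: slice the input elements.
            result.append(String(input[left..<right]))
        case let .surface(left, right):
            // Surface-indexed range: slice the converted text by character offset.
            result.append(String(convertTarget.dropFirst(left).prefix(right - left)))
        }
    }
}

let convertTarget = "きょうはいいてんき"
let input = Array(convertTarget)
let ranges: [LatticeRange] = [.input(from: 0, to: 3), .surface(from: 3, to: 4)]
let text = ruby(for: ranges, input: input, convertTarget: convertTarget)
```

Because the resulting string is built from both kinds of range, its length is a surface count, which matches the diff's use of `.surfaceCount(lastRubyCount)`.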

@ -14,7 +14,7 @@ protocol RegisteredNodeProtocol {
var data: DicdataElement {get}
var prev: (any RegisteredNodeProtocol)? {get}
var totalValue: PValue {get}
var inputRange: Range<Int> {get}
var range: Lattice.LatticeRange {get}
}
struct RegisteredNode: RegisteredNodeProtocol {
@ -25,19 +25,19 @@ struct RegisteredNode: RegisteredNodeProtocol {
///
let totalValue: PValue
/// `composingText``input`
let inputRange: Range<Int>
let range: Lattice.LatticeRange
init(data: DicdataElement, registered: RegisteredNode?, totalValue: PValue, inputRange: Range<Int>) {
init(data: DicdataElement, registered: RegisteredNode?, totalValue: PValue, range: Lattice.LatticeRange) {
self.data = data
self.prev = registered
self.totalValue = totalValue
self.inputRange = inputRange
self.range = range
}
///
/// - Returns:
static func BOSNode() -> RegisteredNode {
RegisteredNode(data: DicdataElement.BOSData, registered: nil, totalValue: 0, inputRange: 0 ..< 0)
RegisteredNode(data: DicdataElement.BOSData, registered: nil, totalValue: 0, range: .zero)
}
///
@ -47,7 +47,7 @@ struct RegisteredNode: RegisteredNodeProtocol {
data: DicdataElement(word: "", ruby: "", lcid: CIDData.BOS.cid, rcid: candidate.data.last?.rcid ?? CIDData.BOS.cid, mid: candidate.lastMid, value: 0),
registered: nil,
totalValue: 0,
inputRange: 0 ..< 0
range: .zero
)
}
}
@ -59,7 +59,7 @@ extension RegisteredNodeProtocol {
guard let prev else {
let unit = ClauseDataUnit()
unit.mid = self.data.mid
unit.inputRange = self.inputRange
unit.ranges = [self.range]
return CandidateData(clauses: [(clause: unit, value: .zero)], data: [])
}
var lastcandidate = prev.getCandidateData() // registerd
@ -75,7 +75,7 @@ extension RegisteredNodeProtocol {
if lastClause.text.isEmpty || !DicdataStore.isClause(prev.data.rcid, self.data.lcid) {
//
lastClause.text.append(self.data.word)
lastClause.inputRange = lastClause.inputRange.startIndex ..< self.inputRange.endIndex
lastClause.ranges.append(self.range)
//
if (lastClause.mid == 500 && self.data.mid != 500) || DicdataStore.includeMMValueCalculation(self.data) {
lastClause.mid = self.data.mid
@ -88,7 +88,7 @@ extension RegisteredNodeProtocol {
else {
let unit = ClauseDataUnit()
unit.text = self.data.word
unit.inputRange = self.inputRange
unit.ranges.append(self.range)
if DicdataStore.includeMMValueCalculation(self.data) {
unit.mid = self.data.mid
}

@ -65,7 +65,7 @@ extension Kana2Kanji {
var constraint = zenzaiCache?.getNewConstraint(for: inputData) ?? PrefixConstraint([])
debug("initial constraint", constraint)
let eosNode = LatticeNode.EOSNode
var lattice: Lattice = Lattice(nodes: [])
var lattice: Lattice = Lattice()
var constructedCandidates: [(RegisteredNode, Candidate)] = []
var insertedCandidates: [(RegisteredNode, Candidate)] = []
defer {

@ -17,28 +17,28 @@ final class ClauseDataUnit {
/// The text of the unit.
var text: String = ""
/// The range of the unit in input text.
var inputRange: Range<Int> = 0 ..< 0
var ranges: [Lattice.LatticeRange] = []
/// Merge the given unit to this unit.
/// - Parameter:
/// - unit: The unit to merge.
func merge(with unit: ClauseDataUnit) {
self.text.append(unit.text)
self.inputRange = self.inputRange.startIndex ..< unit.inputRange.endIndex
self.ranges.append(contentsOf: unit.ranges)
self.nextLcid = unit.nextLcid
}
}
extension ClauseDataUnit: Equatable {
static func == (lhs: ClauseDataUnit, rhs: ClauseDataUnit) -> Bool {
lhs.mid == rhs.mid && lhs.nextLcid == rhs.nextLcid && lhs.text == rhs.text && lhs.inputRange == rhs.inputRange
lhs.mid == rhs.mid && lhs.nextLcid == rhs.nextLcid && lhs.text == rhs.text && lhs.ranges == rhs.ranges
}
}
#if DEBUG
extension ClauseDataUnit: CustomDebugStringConvertible {
var debugDescription: String {
"ClauseDataUnit(mid: \(mid), nextLcid: \(nextLcid), text: \(text), inputRange: \(inputRange))"
"ClauseDataUnit(mid: \(mid), nextLcid: \(nextLcid), text: \(text), ranges: \(ranges))"
}
}
#endif
@ -67,14 +67,35 @@ public enum CompleteAction: Equatable, Sendable {
case moveCursor(Int)
}
public enum ComposingCount: Equatable, Sendable {
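/// Count measured in elements of `composingText.input`.
case inputCount(Int)
/// Count measured in characters of `composingText.convertTarget`.
case surfaceCount(Int)
/// Combination of two counts that cannot be summed into a single unit.
indirect case composite(lhs: Self, rhs: Self)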
static func composite(_ lhs: Self, _ rhs: Self) -> Self {
switch (lhs, rhs) {
case (.inputCount(let l), .inputCount(let r)):
.inputCount(l + r)
case (.surfaceCount(let l), .surfaceCount(let r)):
.surfaceCount(l + r)
default:
.composite(lhs: lhs, rhs: rhs)
}
}
}
///
public struct Candidate: Sendable {
///
public var text: String
///
public var value: PValue
/// composingText.input
public var correspondingCount: Int
public var composingCount: ComposingCount
/// mid()
public var lastMid: Int
/// DicdataElement
@ -86,14 +107,18 @@ public struct Candidate: Sendable {
/// - note:
public let inputable: Bool
public init(text: String, value: PValue, correspondingCount: Int, lastMid: Int, data: [DicdataElement], actions: [CompleteAction] = [], inputable: Bool = true) {
///
public let rubyCount: Int
public init(text: String, value: PValue, composingCount: ComposingCount, lastMid: Int, data: [DicdataElement], actions: [CompleteAction] = [], inputable: Bool = true) {
self.text = text
self.value = value
self.correspondingCount = correspondingCount
self.composingCount = composingCount
self.lastMid = lastMid
self.data = data
self.actions = actions
self.inputable = inputable
self.rubyCount = self.data.reduce(into: 0) { $0 += $1.ruby.count }
}
/// `action`
/// - parameters:
@ -138,7 +163,7 @@ public struct Candidate: Sendable {
/// prefixCandidate
public static func makePrefixClauseCandidate(data: some Collection<DicdataElement>) -> Candidate {
var text = ""
var correspondingCount = 0
var composingCount = 0
var lastRcid = CIDData.BOS.cid
var lastMid = 501
var candidateData: [DicdataElement] = []
@ -148,7 +173,7 @@ public struct Candidate: Sendable {
break
}
text.append(item.word)
correspondingCount += item.ruby.count
composingCount += item.ruby.count
lastRcid = item.rcid
//
if item.mid != 500 && DicdataStore.includeMMValueCalculation(item) {
@ -159,7 +184,7 @@ public struct Candidate: Sendable {
return Candidate(
text: text,
value: -5,
correspondingCount: correspondingCount,
composingCount: .surfaceCount(composingCount),
lastMid: lastMid,
data: candidateData
)
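
Reviewer note: a small sketch of how `ComposingCount.composite(_:_:)` folds counts, using a copy of the enum from this hunk. Counts of the same unit are summed; mixed units fall back to the `composite(lhs:rhs:)` case and are not simplified further.

```swift
// Copy of the enum introduced in this diff, redeclared so the snippet is self-contained.
enum ComposingCount: Equatable {
    case inputCount(Int)
    case surfaceCount(Int)
    indirect case composite(lhs: Self, rhs: Self)

    // Sums counts of the same unit; otherwise keeps both sides as a composite.
    static func composite(_ lhs: Self, _ rhs: Self) -> Self {
        switch (lhs, rhs) {
        case (.inputCount(let l), .inputCount(let r)):
            return .inputCount(l + r)
        case (.surfaceCount(let l), .surfaceCount(let r)):
            return .surfaceCount(l + r)
        default:
            return .composite(lhs: lhs, rhs: rhs)
        }
    }
}

let sameInput: ComposingCount = .composite(.inputCount(1), .inputCount(2))
let sameSurface: ComposingCount = .composite(.surfaceCount(2), .surfaceCount(3))
let mixed: ComposingCount = .composite(.inputCount(1), .surfaceCount(2))
```

One consequence worth noting: folding from `.inputCount(0)`, as the first-clause code does, yields a `composite` value as soon as a surface count is added, even though the input part is zero.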

@ -28,8 +28,9 @@ public struct ConvertRequestOptions: Sendable {
/// - textReplacer:
/// - specialCandidateProviders:
/// - metadata: `ConvertRequestOptions.Metadata`
public init(N_best: Int = 10, requireJapanesePrediction: Bool, requireEnglishPrediction: Bool, keyboardLanguage: KeyboardLanguage, englishCandidateInRoman2KanaInput: Bool = false, fullWidthRomanCandidate: Bool = false, halfWidthKanaCandidate: Bool = false, learningType: LearningType, maxMemoryCount: Int = 65536, shouldResetMemory: Bool = false, dictionaryResourceURL: URL, memoryDirectoryURL: URL, sharedContainerURL: URL, textReplacer: TextReplacer, specialCandidateProviders: [any SpecialCandidateProvider]?, zenzaiMode: ZenzaiMode = .off, preloadDictionary: Bool = false, metadata: ConvertRequestOptions.Metadata?) {
public init(N_best: Int = 10, needTypoCorrection: Bool? = nil, requireJapanesePrediction: Bool, requireEnglishPrediction: Bool, keyboardLanguage: KeyboardLanguage, englishCandidateInRoman2KanaInput: Bool = false, fullWidthRomanCandidate: Bool = false, halfWidthKanaCandidate: Bool = false, learningType: LearningType, maxMemoryCount: Int = 65536, shouldResetMemory: Bool = false, dictionaryResourceURL: URL, memoryDirectoryURL: URL, sharedContainerURL: URL, textReplacer: TextReplacer, specialCandidateProviders: [any SpecialCandidateProvider]?, zenzaiMode: ZenzaiMode = .off, preloadDictionary: Bool = false, metadata: ConvertRequestOptions.Metadata?) {
self.N_best = N_best
self.needTypoCorrection = needTypoCorrection
self.requireJapanesePrediction = requireJapanesePrediction
self.requireEnglishPrediction = requireEnglishPrediction
self.keyboardLanguage = keyboardLanguage
@ -86,6 +87,7 @@ public struct ConvertRequestOptions: Sendable {
specialCandidateProviders.append(.commaSeparatedNumber)
self.N_best = N_best
self.needTypoCorrection = nil
self.requireJapanesePrediction = requireJapanesePrediction
self.requireEnglishPrediction = requireEnglishPrediction
self.keyboardLanguage = keyboardLanguage
@ -157,6 +159,7 @@ public struct ConvertRequestOptions: Sendable {
public var requireJapanesePrediction: Bool
public var requireEnglishPrediction: Bool
public var keyboardLanguage: KeyboardLanguage
public var needTypoCorrection: Bool?
// KeyboardSettinginjection
public var englishCandidateInRoman2KanaInput: Bool
public var fullWidthRomanCandidate: Bool
@ -183,6 +186,7 @@ public struct ConvertRequestOptions: Sendable {
static var `default`: Self {
Self(
N_best: 10,
needTypoCorrection: nil,
requireJapanesePrediction: true,
requireEnglishPrediction: true,
keyboardLanguage: .ja_JP,

@ -168,7 +168,7 @@ import EfficientNGram
var textIndex = [String: Int]()
for candidate in candidates where !candidate.text.isEmpty && !seenCandidates.contains(candidate.text) {
if let index = textIndex[candidate.text] {
if result[index].value < candidate.value || result[index].correspondingCount < candidate.correspondingCount {
if result[index].value < candidate.value || result[index].rubyCount < candidate.rubyCount {
result[index] = candidate
}
} else {
@ -219,7 +219,7 @@ import EfficientNGram
let candidate: Candidate = Candidate(
text: ruby,
value: penalty,
correspondingCount: inputData.input.count,
composingCount: .inputCount(inputData.input.count),
lastMid: MIDData..mid,
data: data
)
@ -232,7 +232,7 @@ import EfficientNGram
let candidate: Candidate = Candidate(
text: word,
value: value,
correspondingCount: inputData.input.count,
composingCount: .inputCount(inputData.input.count),
lastMid: MIDData..mid,
data: data
)
@ -251,7 +251,7 @@ import EfficientNGram
let candidate: Candidate = Candidate(
text: ruby,
value: penalty,
correspondingCount: inputData.input.count,
composingCount: .inputCount(inputData.input.count),
lastMid: MIDData..mid,
data: data
)
@ -264,7 +264,7 @@ import EfficientNGram
let candidate: Candidate = Candidate(
text: word,
value: value,
correspondingCount: inputData.input.count,
composingCount: .inputCount(inputData.input.count),
lastMid: MIDData..mid,
data: data
)
@ -368,7 +368,7 @@ import EfficientNGram
private func getAdditionalCandidate(_ inputData: ComposingText, options: ConvertRequestOptions) -> [Candidate] {
var candidates: [Candidate] = []
let string = inputData.convertTarget.toKatakana()
let correspondingCount = inputData.input.count
let composingCount: ComposingCount = .inputCount(inputData.input.count)
do {
//
let value = -14 * getKatakanaScore(string)
@ -376,7 +376,7 @@ import EfficientNGram
let katakana = Candidate(
text: string,
value: value,
correspondingCount: correspondingCount,
composingCount: composingCount,
lastMid: MIDData..mid,
data: [data]
)
@ -390,7 +390,7 @@ import EfficientNGram
let hiragana = Candidate(
text: hiraganaString,
value: -14.5,
correspondingCount: correspondingCount,
composingCount: composingCount,
lastMid: MIDData..mid,
data: [data]
)
@ -403,7 +403,7 @@ import EfficientNGram
let uppercasedLetter = Candidate(
text: word,
value: -14.6,
correspondingCount: correspondingCount,
composingCount: composingCount,
lastMid: MIDData..mid,
data: [data]
)
@ -416,7 +416,7 @@ import EfficientNGram
let fullWidthLetter = Candidate(
text: word,
value: -14.7,
correspondingCount: correspondingCount,
composingCount: composingCount,
lastMid: MIDData..mid,
data: [data]
)
@ -429,7 +429,7 @@ import EfficientNGram
let halfWidthKatakana = Candidate(
text: word,
value: -15,
correspondingCount: correspondingCount,
composingCount: composingCount,
lastMid: MIDData..mid,
data: [data]
)
@ -472,7 +472,7 @@ import EfficientNGram
return Candidate(
text: first.clause.text,
value: first.value,
correspondingCount: first.clause.inputRange.count,
composingCount: first.clause.ranges.reduce(into: .inputCount(0)) { $0 = .composite($0, $1.count) },
lastMid: first.clause.mid,
data: Array(candidateData.data[0...count])
)
@ -529,21 +529,21 @@ import EfficientNGram
var seenCandidate: Set<String> = full_candidate.mapSet {$0.text}
// 5
let clause_candidates = self.getUniqueCandidate(clauseCandidates, seenCandidates: seenCandidate).min(count: 5) {
if $0.correspondingCount == $1.correspondingCount {
if $0.rubyCount == $1.rubyCount {
$0.value > $1.value
} else {
$0.correspondingCount > $1.correspondingCount
$0.rubyCount > $1.rubyCount
}
}
seenCandidate.formUnion(clause_candidates.map {$0.text})
//
let dicCandidates: [Candidate] = result.lattice[inputIndex: 0]
let dicCandidates: [Candidate] = result.lattice[index: .bothIndex(inputIndex: 0, surfaceIndex: 0)]
.map {
Candidate(
text: $0.data.word,
value: $0.data.value(),
correspondingCount: $0.inputRange.count,
composingCount: $0.range.count,
lastMid: $0.data.mid,
data: [$0.data]
)
@ -554,8 +554,8 @@ import EfficientNGram
//
var word_candidates: [Candidate] = self.getUniqueCandidate(dicCandidates.chained(additionalCandidates), seenCandidates: seenCandidate)
.sorted {
let count0 = $0.correspondingCount
let count1 = $1.correspondingCount
let count0 = $0.rubyCount
let count1 = $1.rubyCount
return count0 == count1 ? $0.value > $1.value : count0 > count1
}
seenCandidate.formUnion(word_candidates.map {$0.text})
@ -589,13 +589,17 @@ import EfficientNGram
item.parseTemplate()
}
// 5
let firstClauseResults = self.getUniqueCandidate(clauseCandidates).min(count: 5) {
if $0.correspondingCount == $1.correspondingCount {
var firstClauseResults = self.getUniqueCandidate(clauseCandidates).min(count: 5) {
if $0.rubyCount == $1.rubyCount {
$0.value > $1.value
} else {
$0.correspondingCount > $1.correspondingCount
$0.rubyCount > $1.rubyCount
}
}
firstClauseResults.mutatingForEach { item in
item.withActions(self.getAppropriateActions(item))
item.parseTemplate()
}
return ConversionResult(mainResults: result, firstClauseResults: firstClauseResults)
}
@ -605,7 +609,7 @@ import EfficientNGram
/// - N_best:
/// - Returns:
///
private func convertToLattice(_ inputData: ComposingText, N_best: Int, zenzaiMode: ConvertRequestOptions.ZenzaiMode) -> (result: LatticeNode, lattice: Lattice)? {
private func convertToLattice(_ inputData: ComposingText, N_best: Int, zenzaiMode: ConvertRequestOptions.ZenzaiMode, needTypoCorrection: Bool) -> (result: LatticeNode, lattice: Lattice)? {
if inputData.convertTarget.isEmpty {
return nil
}
@ -625,11 +629,6 @@ import EfficientNGram
self.previousInputData = inputData
return (result, nodes)
}
#if os(iOS)
let needTypoCorrection = true
#else
let needTypoCorrection = false
#endif
guard let previousInputData else {
debug("\(#function): 新規計算用の関数を呼びますA")
@ -662,7 +661,7 @@ import EfficientNGram
let diff = inputData.differenceSuffix(to: previousInputData)
debug("\(#function): 最後尾文字置換用の関数を呼びます、差分は\(diff)")
let result = converter.kana2lattice_changed(inputData, N_best: N_best, counts: (diff.deleted, diff.addedCount), previousResult: (inputData: previousInputData, lattice: self.lattice), needTypoCorrection: needTypoCorrection)
let result = converter.kana2lattice_changed(inputData, N_best: N_best, counts: diff, previousResult: (inputData: previousInputData, lattice: self.lattice), needTypoCorrection: needTypoCorrection)
self.previousInputData = inputData
return result
}
@ -698,7 +697,14 @@ import EfficientNGram
// DicdataStoreRequestOption
self.sendToDicdataStore(.setRequestOptions(options))
guard let result = self.convertToLattice(inputData, N_best: options.N_best, zenzaiMode: options.zenzaiMode) else {
#if os(iOS)
let needTypoCorrection = options.needTypoCorrection ?? true
#else
let needTypoCorrection = options.needTypoCorrection ?? false
#endif
guard let result = self.convertToLattice(inputData, N_best: options.N_best, zenzaiMode: options.zenzaiMode, needTypoCorrection: needTypoCorrection) else {
return ConversionResult(mainResults: [], firstClauseResults: [])
}
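
Reviewer note: the change above turns the hard-coded platform switch into an overridable default. A minimal sketch of that pattern, with a hypothetical `Options` struct standing in for `ConvertRequestOptions`:

```swift
// Hypothetical stand-in for ConvertRequestOptions; only the relevant field is modeled.
struct Options {
    var needTypoCorrection: Bool? = nil
}

// An explicit value wins; nil falls back to the per-platform default.
func resolveNeedTypoCorrection(_ options: Options) -> Bool {
    #if os(iOS)
    let platformDefault = true
    #else
    let platformDefault = false
    #endif
    return options.needTypoCorrection ?? platformDefault
}

let forcedOn = resolveNeedTypoCorrection(Options(needTypoCorrection: true))
let forcedOff = resolveNeedTypoCorrection(Options(needTypoCorrection: false))
```

Existing callers that never set the option keep the previous behavior, since `nil` reproduces the old `#if os(iOS)` result.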

@ -21,7 +21,7 @@ extension KanaKanjiConverter {
return result.map {[Candidate(
text: $0,
value: -15,
correspondingCount: inputData.input.count,
composingCount: .inputCount(inputData.input.count),
lastMid: MIDData..mid,
data: [DicdataElement(word: $0, ruby: string, cid: CIDData..cid, mid: MIDData..mid, value: -15)]
)]} ?? []
@ -116,7 +116,7 @@ extension KanaKanjiConverter {
Candidate(
text: $0,
value: -18,
correspondingCount: inputData.input.count,
composingCount: .inputCount(inputData.input.count),
lastMid: MIDData..mid,
data: [DicdataElement(word: $0, ruby: string, cid: CIDData..cid, mid: MIDData..mid, value: -18)]
)
@ -125,7 +125,7 @@ extension KanaKanjiConverter {
Candidate(
text: $0,
value: -19,
correspondingCount: inputData.input.count,
composingCount: .inputCount(inputData.input.count),
lastMid: MIDData..mid,
data: [DicdataElement(word: $0, ruby: string, cid: CIDData..cid, mid: MIDData..mid, value: -19)]
)

@ -38,7 +38,7 @@ extension KanaKanjiConverter {
let candidate = Candidate(
text: result,
value: -10,
correspondingCount: inputData.input.count,
composingCount: .inputCount(inputData.input.count),
lastMid: MIDData..mid,
data: [DicdataElement(word: result, ruby: ruby, cid: CIDData..cid, mid: MIDData..mid, value: -10)]
)

@ -46,7 +46,7 @@ extension KanaKanjiConverter {
Candidate(
text: address,
value: baseValue - PValue(i),
correspondingCount: inputData.input.count,
composingCount: .inputCount(inputData.input.count),
lastMid: MIDData..mid,
data: [DicdataElement(word: address, ruby: string, cid: .zero, mid: MIDData..mid, value: baseValue - PValue(i))]
)

@ -37,7 +37,7 @@ extension KanaKanjiConverter {
Candidate(
text: $0,
value: -15,
correspondingCount: inputData.input.count,
composingCount: .inputCount(inputData.input.count),
lastMid: MIDData..mid,
data: [DicdataElement(word: $0, ruby: string, cid: CIDData..cid, mid: MIDData..mid, value: -15)]
)

@ -17,7 +17,7 @@ extension KanaKanjiConverter {
let candidate = Candidate(
text: timeExpression,
value: -10,
correspondingCount: numberString.count,
composingCount: .surfaceCount(numberString.count),
lastMid: MIDData..mid,
data: [DicdataElement(word: timeExpression, ruby: numberString, cid: CIDData..cid, mid: MIDData..mid, value: -10)]
)
@ -31,7 +31,7 @@ extension KanaKanjiConverter {
let candidate = Candidate(
text: timeExpression,
value: -10,
correspondingCount: numberString.count,
composingCount: .surfaceCount(numberString.count),
lastMid: MIDData..mid,
data: [DicdataElement(word: timeExpression, ruby: numberString, cid: CIDData..cid, mid: MIDData..mid, value: -10)]
)

@ -22,7 +22,7 @@ extension KanaKanjiConverter {
Candidate(
text: char,
value: value0,
correspondingCount: inputData.input.count,
composingCount: .inputCount(inputData.input.count),
lastMid: MIDData..mid,
data: [DicdataElement(word: char, ruby: string, cid: .zero, mid: MIDData..mid, value: value0)]
)

@ -20,7 +20,7 @@ extension KanaKanjiConverter {
return [Candidate(
text: versionString,
value: -30,
correspondingCount: inputData.input.count,
composingCount: .inputCount(inputData.input.count),
lastMid: MIDData..mid,
data: [DicdataElement(word: versionString, ruby: inputData.convertTarget.toKatakana(), cid: CIDData..cid, mid: MIDData..mid, value: -30)]
)]

@ -242,20 +242,93 @@ public final class DicdataStore {
return [louds.searchNodeIndex(chars: charIDs)].compactMap {$0}
}
private struct UnifiedGenerator {
struct SurfaceGenerator {
var surface: [Character] = []
var range: TypoCorrectionGenerator.ProcessRange
var currentIndex: Int
init(surface: [Character], range: TypoCorrectionGenerator.ProcessRange) {
self.surface = surface
self.range = range
self.currentIndex = range.rightIndexRange.lowerBound
}
mutating func setUnreachablePath<C: Collection<Character>>(target: C) where C.Indices == Range<Int> {
if self.surface[self.range.leftIndex...].hasPrefix(target) {
// new upper bound
let currentLowerBound = self.range.rightIndexRange.lowerBound
let currentUpperBound = self.range.rightIndexRange.upperBound
let targetUpperBound = self.range.leftIndex + target.indices.upperBound
self.range.rightIndexRange = min(currentLowerBound, targetUpperBound) ..< min(currentUpperBound, targetUpperBound)
}
}
mutating func next() -> ([Character], (endIndex: Lattice.LatticeIndex, penalty: PValue))? {
if self.surface.indices.contains(self.currentIndex), self.currentIndex < self.range.rightIndexRange.upperBound {
defer {
self.currentIndex += 1
}
let characters = Array(self.surface[self.range.leftIndex ... self.currentIndex])
return (characters, (.surface(self.currentIndex), 0))
}
return nil
}
}
var typoCorrectionGenerator: TypoCorrectionGenerator? = nil
var surfaceGenerator: SurfaceGenerator? = nil
mutating func register(_ generator: TypoCorrectionGenerator) {
self.typoCorrectionGenerator = generator
}
mutating func register(_ generator: SurfaceGenerator) {
self.surfaceGenerator = generator
}
mutating func setUnreachablePath<C: Collection<Character>>(target: C) where C.Indices == Range<Int> {
self.typoCorrectionGenerator?.setUnreachablePath(target: target)
self.surfaceGenerator?.setUnreachablePath(target: target)
}
mutating func next() -> ([Character], (endIndex: Lattice.LatticeIndex, penalty: PValue))? {
if let next = self.surfaceGenerator?.next() {
return next
}
if let next = self.typoCorrectionGenerator?.next() {
return next
}
return nil
}
}
func movingTowardPrefixSearch(
inputs: [ComposingText.InputElement],
leftIndex: Int,
rightIndexRange: Range<Int>,
composingText: ComposingText,
inputProcessRange: TypoCorrectionGenerator.ProcessRange?,
surfaceProcessRange: TypoCorrectionGenerator.ProcessRange?,
useMemory: Bool,
needTypoCorrection: Bool
) -> (
stringToInfo: [[Character]: (endIndex: Int, penalty: PValue)],
stringToInfo: [[Character]: (endIndex: Lattice.LatticeIndex, penalty: PValue)],
indices: [(key: String, indices: [Int])],
temporaryMemoryDicdata: [DicdataElement]
) {
var generator = TypoCorrectionGenerator(inputs: inputs, leftIndex: leftIndex, rightIndexRange: rightIndexRange, needTypoCorrection: needTypoCorrection)
var generator = UnifiedGenerator()
if let surfaceProcessRange {
let surfaceGenerator = UnifiedGenerator.SurfaceGenerator(
surface: Array(composingText.convertTarget.toKatakana()),
range: surfaceProcessRange
)
generator.register(surfaceGenerator)
}
if let inputProcessRange {
let typoCorrectionGenerator = TypoCorrectionGenerator(
inputs: composingText.input,
range: inputProcessRange,
needTypoCorrection: needTypoCorrection
)
generator.register(typoCorrectionGenerator)
}
var targetLOUDS: [String: LOUDS.MovingTowardPrefixSearchHelper] = [:]
var stringToInfo: [([Character], (endIndex: Int, penalty: PValue))] = []
var stringToInfo: [([Character], (endIndex: Lattice.LatticeIndex, penalty: PValue))] = []
//
var dynamicDicdata: [Int: [DicdataElement]] = [:]
//
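
Reviewer note: the `SurfaceGenerator` added above enumerates prefixes of the surface string, one per admissible end index. The sketch below reproduces only that enumeration with a hypothetical `SurfacePrefixGenerator`; it leaves out `setUnreachablePath` and the penalty, and uses plain `Int` end indices instead of `Lattice.LatticeIndex`.

```swift
// Hypothetical, simplified version of the surface generator in this diff.
struct SurfacePrefixGenerator {
    let surface: [Character]
    let leftIndex: Int
    let rightIndexRange: Range<Int>
    var currentIndex: Int

    init(surface: [Character], leftIndex: Int, rightIndexRange: Range<Int>) {
        self.surface = surface
        self.leftIndex = leftIndex
        self.rightIndexRange = rightIndexRange
        self.currentIndex = rightIndexRange.lowerBound
    }

    // Returns surface[leftIndex...currentIndex] and advances, or nil when exhausted.
    mutating func next() -> (characters: [Character], endIndex: Int)? {
        guard surface.indices.contains(currentIndex),
              currentIndex < rightIndexRange.upperBound else {
            return nil
        }
        defer { currentIndex += 1 }
        return (Array(surface[leftIndex...currentIndex]), currentIndex)
    }
}

var generator = SurfacePrefixGenerator(surface: Array("キョウハ"), leftIndex: 0, rightIndexRange: 1..<3)
var prefixes: [String] = []
while let item = generator.next() {
    prefixes.append(String(item.characters))
}
```

Note that the end index is inclusive here, which is why the lookup code later builds ranges with `endIndex + 1`.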
@ -332,8 +405,25 @@ public final class DicdataStore {
}
let minCount = stringToInfo.map {$0.0.count}.min() ?? 0
return (
Dictionary(stringToInfo, uniquingKeysWith: {$0.penalty < $1.penalty ? $1 : $0}),
targetLOUDS.map { ($0.key, $0.value.indicesInDepth(depth: minCount - 1 ..< .max) )},
Dictionary(
stringToInfo,
uniquingKeysWith: { (lhs, rhs) in
if lhs.penalty < rhs.penalty {
return lhs
} else if lhs.penalty == rhs.penalty {
return switch (lhs.endIndex, rhs.endIndex) {
case (.input, .input), (.surface, .surface): lhs // same kind of index: either one works
case (.surface, .input): lhs // prefer the surface index
case (.input, .surface): rhs // prefer the surface index
}
} else {
return rhs
}
}
),
targetLOUDS.map {
($0.key, $0.value.indicesInDepth(depth: minCount - 1 ..< .max))
},
dynamicDicdata.flatMap {
minCount < $0.key + 1 ? $0.value : []
}
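
Reviewer note: the new `uniquingKeysWith` closure keeps the entry whose `penalty` compares smaller and, on a tie, prefers a surface-based end index. A self-contained sketch of that rule, with `Float` standing in for `PValue` and a hypothetical free function `preferred(_:_:)`:

```swift
enum LatticeIndex: Equatable {
    case surface(Int)
    case input(Int)
}

// Float is an assumption standing in for PValue.
typealias Info = (endIndex: LatticeIndex, penalty: Float)

// Mirrors the closure in this diff: smaller penalty value wins, ties prefer .surface.
func preferred(_ lhs: Info, _ rhs: Info) -> Info {
    if lhs.penalty < rhs.penalty { return lhs }
    if lhs.penalty > rhs.penalty { return rhs }
    switch (lhs.endIndex, rhs.endIndex) {
    case (.input, .surface): return rhs
    default: return lhs
    }
}

let pairs: [([Character], Info)] = [
    (Array("カ"), (endIndex: .input(0), penalty: 0)),
    (Array("カ"), (endIndex: .surface(0), penalty: 0)),
    (Array("キ"), (endIndex: .input(1), penalty: 2)),
    (Array("キ"), (endIndex: .input(1), penalty: 0)),
]
let merged = Dictionary(pairs, uniquingKeysWith: preferred)
```

The previous closure kept the entry with the larger penalty value, so the comparison direction changed in this PR; that may deserve a second look during review.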
@ -375,30 +465,69 @@ public final class DicdataStore {
}
return data
}
/// kana2lattice
///
/// - Parameters:
/// - inputData:
/// - from:
/// - toIndexRange: `from ..< (toIndexRange)`
public func getLOUDSDataInRange(inputData: ComposingText, from fromIndex: Int, toIndexRange: Range<Int>? = nil, needTypoCorrection: Bool = true) -> [LatticeNode] {
let toIndexLeft = toIndexRange?.startIndex ?? fromIndex
let toIndexRight = min(toIndexRange?.endIndex ?? inputData.input.count, fromIndex + self.maxlength)
if fromIndex > toIndexLeft || toIndexLeft >= toIndexRight {
debug(#function, "index is wrong")
/// - composingText:
/// - inputRange: `composingText.input`
/// - surfaceRange: `composingText.convertTarget`
/// - needTypoCorrection:
/// - Returns: `LatticeNode`
public func lookupDicdata(
composingText: ComposingText,
inputRange:(startIndex: Int, endIndexRange: Range<Int>?)? = nil,
surfaceRange: (startIndex: Int, endIndexRange: Range<Int>?)? = nil,
needTypoCorrection: Bool = true
) -> [LatticeNode] {
let inputProcessRange: TypoCorrectionGenerator.ProcessRange?
if let inputRange {
let toInputIndexLeft = inputRange.endIndexRange?.startIndex ?? inputRange.startIndex
let toInputIndexRight = min(
inputRange.endIndexRange?.endIndex ?? composingText.input.count,
inputRange.startIndex + self.maxlength
)
if inputRange.startIndex > toInputIndexLeft || toInputIndexLeft >= toInputIndexRight {
debug(#function, "index is wrong", inputRange)
return []
}
inputProcessRange = .init(leftIndex: inputRange.startIndex, rightIndexRange: toInputIndexLeft ..< toInputIndexRight)
} else {
inputProcessRange = nil
}
let surfaceProcessRange: TypoCorrectionGenerator.ProcessRange?
if let surfaceRange {
let toSurfaceIndexLeft = surfaceRange.endIndexRange?.startIndex ?? surfaceRange.startIndex
let toSurfaceIndexRight = min(
surfaceRange.endIndexRange?.endIndex ?? composingText.convertTarget.count,
surfaceRange.startIndex + self.maxlength
)
if surfaceRange.startIndex > toSurfaceIndexLeft || toSurfaceIndexLeft >= toSurfaceIndexRight {
debug(#function, "index is wrong", surfaceRange)
return []
}
surfaceProcessRange = .init(leftIndex: surfaceRange.startIndex, rightIndexRange: toSurfaceIndexLeft ..< toSurfaceIndexRight)
} else {
surfaceProcessRange = nil
}
if inputProcessRange == nil && surfaceProcessRange == nil {
debug(#function, "at least one of inputProcessRange and surfaceProcessRange must be non-nil")
return []
}
let segments = (fromIndex ..< toIndexRight).reduce(into: []) { (segments: inout [String], rightIndex: Int) in
segments.append((segments.last ?? "") + String(inputData.input[rightIndex].character.toKatakana()))
}
// MARK:
var (stringToInfo, indices, dicdata) = self.movingTowardPrefixSearch(inputs: inputData.input, leftIndex: fromIndex, rightIndexRange: toIndexLeft ..< toIndexRight, useMemory: self.learningManager.enabled, needTypoCorrection: needTypoCorrection)
var (stringToInfo, indices, dicdata) = self.movingTowardPrefixSearch(
composingText: composingText,
inputProcessRange: inputProcessRange,
surfaceProcessRange: surfaceProcessRange,
useMemory: self.learningManager.enabled,
needTypoCorrection: needTypoCorrection
)
// MARK: indices
for (identifier, value) in indices {
let result: [DicdataElement] = self.getDicdataFromLoudstxt3(identifier: identifier, indices: value).compactMap { (data) -> DicdataElement? in
let rubyArray = Array(data.ruby)
let penalty = stringToInfo[rubyArray, default: (0, .zero)].penalty
let penalty = stringToInfo[rubyArray]?.penalty ?? 0
if penalty.isZero {
return data
}
@ -413,34 +542,39 @@ public final class DicdataStore {
dicdata.append(contentsOf: result)
}
for i in toIndexLeft ..< toIndexRight {
do {
let result = self.getWiseDicdata(convertTarget: segments[i - fromIndex], inputData: inputData, inputRange: fromIndex ..< i + 1)
//
if let surfaceProcessRange {
let chars = Array(composingText.convertTarget.toKatakana())
var segment = String(chars[surfaceProcessRange.leftIndex ..< surfaceProcessRange.rightIndexRange.lowerBound])
for i in surfaceProcessRange.rightIndexRange {
segment.append(String(chars[i]))
let result = self.getWiseDicdata(
convertTarget: segment,
inputData: composingText,
surfaceRange: surfaceProcessRange.leftIndex ..< i + 1
)
for item in result {
stringToInfo[Array(item.ruby)] = (i, 0)
stringToInfo[Array(item.ruby)] = (.surface(i), 0)
}
dicdata.append(contentsOf: result)
}
}
if fromIndex == .zero {
let result: [LatticeNode] = dicdata.compactMap {
guard let endIndex = stringToInfo[Array($0.ruby)]?.endIndex else {
return nil
}
let node = LatticeNode(data: $0, inputRange: fromIndex ..< endIndex + 1)
let needBOS = inputRange?.startIndex == .zero || surfaceRange?.startIndex == .zero
let result: [LatticeNode] = dicdata.compactMap {
guard let endIndex = stringToInfo[Array($0.ruby)]?.endIndex else {
return nil
}
let range: Lattice.LatticeRange = switch endIndex {
case .input(let endIndex): .input(from: (inputRange?.startIndex)!, to: endIndex + 1)
case .surface(let endIndex): .surface(from: (surfaceRange?.startIndex)!, to: endIndex + 1)
}
let node = LatticeNode(data: $0, range: range)
if needBOS {
node.prevs.append(RegisteredNode.BOSNode())
return node
}
return result
} else {
let result: [LatticeNode] = dicdata.compactMap {
guard let endIndex = stringToInfo[Array($0.ruby)]?.endIndex else {
return nil
}
return LatticeNode(data: $0, inputRange: fromIndex ..< endIndex + 1)
}
return result
return node
}
return result
}
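The `switch` over `endIndex` above pairs an end index with the start index of the same kind (input or surface). Below is a minimal sketch of that pairing. `SketchIndex`, `SketchRange`, and `makeRange` are hypothetical stand-ins, since the real `Lattice.LatticeIndex` and `Lattice.LatticeRange` definitions are not shown in this diff. Unlike the force-unwraps above, the sketch returns `nil` when the matching start index is missing.

```swift
// Hypothetical stand-ins for Lattice.LatticeIndex / Lattice.LatticeRange.
enum SketchIndex: Equatable {
    case input(Int)
    case surface(Int)
}

enum SketchRange: Equatable {
    case input(from: Int, to: Int)
    case surface(from: Int, to: Int)
}

/// Builds a half-open range whose kind follows the kind of `end`.
/// `end` is the index of the last consumed element, so the upper bound is `end + 1`.
func makeRange(inputStart: Int?, surfaceStart: Int?, end: SketchIndex) -> SketchRange? {
    switch end {
    case .input(let endIndex):
        // An input-based end needs an input-based start.
        guard let start = inputStart else { return nil }
        return .input(from: start, to: endIndex + 1)
    case .surface(let endIndex):
        // A surface-based end needs a surface-based start.
        guard let start = surfaceStart else { return nil }
        return .surface(from: start, to: endIndex + 1)
    }
}

let inputBased = makeRange(inputStart: 0, surfaceStart: nil, end: .input(2))
let surfaceBased = makeRange(inputStart: nil, surfaceStart: 1, end: .surface(4))
let mismatched = makeRange(inputStart: nil, surfaceStart: 1, end: .input(2))
```

Whether a mismatch should be dropped silently or treated as a programming error is a design choice. The sketch only shows the non-crashing variant.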
func getZeroHintPredictionDicdata(lastRcid: Int) -> [DicdataElement] {
@ -510,35 +644,27 @@ public final class DicdataStore {
/// - convertTarget:
/// - note
/// - Converter
func getWiseDicdata(convertTarget: String, inputData: ComposingText, inputRange: Range<Int>) -> [DicdataElement] {
func getWiseDicdata(convertTarget: String, inputData: ComposingText, surfaceRange: Range<Int>) -> [DicdataElement] {
print(#function, convertTarget, inputData, surfaceRange)
var result: [DicdataElement] = []
result.append(contentsOf: self.getJapaneseNumberDicdata(head: convertTarget))
if inputData.input[..<inputRange.startIndex].last?.character.isNumber != true && inputData.input[inputRange.endIndex...].first?.character.isNumber != true, let number = Int(convertTarget) {
if inputData.convertTarget.prefix(surfaceRange.lowerBound).last?.isNumber != true,
inputData.convertTarget.dropFirst(surfaceRange.upperBound).first?.isNumber != true,
let number = Int(convertTarget) {
result.append(DicdataElement(ruby: convertTarget, cid: CIDData..cid, mid: MIDData..mid, value: -14))
if Double(number) <= 1E12 && -1E12 <= Double(number), let kansuji = self.numberFormatter.string(from: NSNumber(value: number)) {
result.append(DicdataElement(word: kansuji, ruby: convertTarget, cid: CIDData..cid, mid: MIDData..mid, value: -16))
}
}
// convertTarget
if requestOptions.keyboardLanguage == .en_US && convertTarget.onlyRomanAlphabet {
result.append(DicdataElement(ruby: convertTarget, cid: CIDData..cid, mid: MIDData..mid, value: -14))
}
//
if requestOptions.keyboardLanguage != .en_US && inputData.input[inputRange].allSatisfy({$0.inputStyle == .roman2kana}) {
let roman = String(inputData.input[inputRange].map(\.character))
if let katakana = Roman2Kana.katakanaChanges[roman], let hiragana = Roman2Kana.hiraganaChanges[Array(roman)] {
result.append(DicdataElement(word: String(hiragana), ruby: katakana, cid: CIDData..cid, mid: MIDData..mid, value: -13))
result.append(DicdataElement(ruby: katakana, cid: CIDData..cid, mid: MIDData..mid, value: -14))
}
}
//
// convertTarget1
if convertTarget.count == 1 {
let katakana = convertTarget.toKatakana()
let hiragana = convertTarget.toHiragana()
if convertTarget == katakana && katakana == hiragana {
if katakana == hiragana {
//
let element = DicdataElement(ruby: katakana, cid: CIDData..cid, mid: MIDData..mid, value: -14)
result.append(element)
@ -550,7 +676,6 @@ public final class DicdataStore {
result.append(katakanaElement)
}
}
//
if convertTarget.count == 1, let first = convertTarget.first {
var value: PValue = -14

View File

@ -1,13 +1,12 @@
import SwiftUtils
struct TypoCorrectionGenerator: Sendable {
init(inputs: [ComposingText.InputElement], leftIndex left: Int, rightIndexRange: Range<Int>, needTypoCorrection: Bool) {
init(inputs: [ComposingText.InputElement], range: ProcessRange, needTypoCorrection: Bool) {
self.maxPenalty = needTypoCorrection ? 3.5 * 3 : 0
self.inputs = inputs
self.left = left
self.rightIndexRange = rightIndexRange
self.range = range
let count = rightIndexRange.endIndex - left
let count = self.range.rightIndexRange.endIndex - range.leftIndex
self.count = count
self.nodes = (0..<count).map {(i: Int) in
Self.lengths.flatMap {(k: Int) -> [TypoCandidate] in
@ -15,7 +14,7 @@ struct TypoCorrectionGenerator: Sendable {
if count <= j {
return []
}
return Self.getTypo(inputs[left + i ... left + j], frozen: !needTypoCorrection)
return Self.getTypo(inputs[range.leftIndex + i ... range.leftIndex + j], frozen: !needTypoCorrection)
}
}
//
@ -23,7 +22,7 @@ struct TypoCorrectionGenerator: Sendable {
guard let firstElement = typoCandidate.inputElements.first else {
return nil
}
if ComposingText.isLeftSideValid(first: firstElement, of: inputs, from: left) {
if ComposingText.isLeftSideValid(first: firstElement, of: inputs, from: range.leftIndex) {
var convertTargetElements = [ComposingText.ConvertTargetElement]()
for element in typoCandidate.inputElements {
ComposingText.updateConvertTargetElements(currentElements: &convertTargetElements, newElement: element)
@ -36,11 +35,15 @@ struct TypoCorrectionGenerator: Sendable {
let maxPenalty: PValue
let inputs: [ComposingText.InputElement]
let left: Int
let rightIndexRange: Range<Int>
let range: ProcessRange
let nodes: [[TypoCandidate]]
let count: Int
struct ProcessRange: Sendable, Equatable {
var leftIndex: Int
var rightIndexRange: Range<Int>
}
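A `ProcessRange` fixes the left edge and lets the right edge vary. The sketch below lists the half-open ranges such a value describes, mirroring the `leftIndex ..< i + 1` pattern used when iterating over `rightIndexRange`. `SketchProcessRange` and `candidateRanges` are hypothetical names introduced only for this illustration.

```swift
// Hypothetical mirror of the ProcessRange struct shown above.
struct SketchProcessRange: Equatable {
    var leftIndex: Int
    var rightIndexRange: Range<Int>
}

/// Every element of `rightIndexRange` is the index of the last element
/// included, so each candidate range ends one past it.
func candidateRanges(_ range: SketchProcessRange) -> [Range<Int>] {
    range.rightIndexRange.map { range.leftIndex ..< $0 + 1 }
}

let ranges = candidateRanges(SketchProcessRange(leftIndex: 2, rightIndexRange: 4 ..< 7))
```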
var stack: [(convertTargetElements: [ComposingText.ConvertTargetElement], lastElement: ComposingText.InputElement, count: Int, penalty: PValue)]
/// `target`
@ -75,12 +78,12 @@ struct TypoCorrectionGenerator: Sendable {
}
}
mutating func next() -> ([Character], (endIndex: Int, penalty: PValue))? {
mutating func next() -> ([Character], (endIndex: Lattice.LatticeIndex, penalty: PValue))? {
while let (convertTargetElements, lastElement, count, penalty) = self.stack.popLast() {
var result: ([Character], (endIndex: Int, penalty: PValue))? = nil
if rightIndexRange.contains(count + left - 1) {
if let convertTarget = ComposingText.getConvertTargetIfRightSideIsValid(lastElement: lastElement, of: inputs, to: count + left, convertTargetElements: convertTargetElements)?.map({$0.toKatakana()}) {
result = (convertTarget, (count + left - 1, penalty))
var result: ([Character], (endIndex: Lattice.LatticeIndex, penalty: PValue))? = nil
if self.range.rightIndexRange.contains(count + self.range.leftIndex - 1) {
if let convertTarget = ComposingText.getConvertTargetIfRightSideIsValid(lastElement: lastElement, of: inputs, to: count + self.range.leftIndex, convertTargetElements: convertTargetElements)?.map({$0.toKatakana()}) {
result = (convertTarget, (.input(count + self.range.leftIndex - 1), penalty))
}
}
//
@ -94,7 +97,7 @@ struct TypoCorrectionGenerator: Sendable {
// (3)
if penalty >= maxPenalty {
var convertTargetElements = convertTargetElements
let correct = [inputs[left + count]].map {ComposingText.InputElement(character: $0.character.toKatakana(), inputStyle: $0.inputStyle)}
let correct = [inputs[self.range.leftIndex + count]].map {ComposingText.InputElement(character: $0.character.toKatakana(), inputStyle: $0.inputStyle)}
if count + correct.count > self.nodes.endIndex {
if let result {
return result

View File

@ -213,31 +213,6 @@ public struct ComposingText: Sendable {
return (oldString.count - common.count, String(newString.dropFirst(common.count)))
}
/// input
/// TODO:
private mutating func updateInput(_ string: String, at inputCursorPosition: Int, inputStyle: InputStyle) {
if inputCursorPosition == 0 {
self.input.insert(contentsOf: string.map {InputElement(character: $0, inputStyle: inputStyle)}, at: inputCursorPosition)
return
}
let prev = self.input[inputCursorPosition - 1]
if inputStyle == .roman2kana && prev.inputStyle == inputStyle, let first = string.first, String(first).onlyRomanAlphabet {
if prev.character == first && !["a", "i", "u", "e", "o", "n"].contains(first) {
self.input[inputCursorPosition - 1] = InputElement(character: "っ", inputStyle: .direct)
self.input.insert(contentsOf: string.map {InputElement(character: $0, inputStyle: inputStyle)}, at: inputCursorPosition)
return
}
let n_prefix = self.input[0 ..< inputCursorPosition].suffix {$0.character == "n" && $0.inputStyle == .roman2kana}
if n_prefix.count % 2 == 1 && !["n", "a", "i", "u", "e", "o", "y"].contains(first)
&& self.input.dropLast(n_prefix.count).last != .init(character: "x", inputStyle: .roman2kana) {
self.input[inputCursorPosition - 1] = InputElement(character: "ん", inputStyle: .direct)
self.input.insert(contentsOf: string.map {InputElement(character: $0, inputStyle: inputStyle)}, at: inputCursorPosition)
return
}
}
self.input.insert(contentsOf: string.map {InputElement(character: $0, inputStyle: inputStyle)}, at: inputCursorPosition)
}
///
public mutating func insertAtCursorPosition(_ string: String, inputStyle: InputStyle) {
if string.isEmpty {
@ -246,7 +221,7 @@ public struct ComposingText: Sendable {
let inputCursorPosition = self.forceGetInputCursorPosition(target: self.convertTarget.prefix(convertTargetCursorPosition))
// input, convertTarget, convertTargetCursorPosition3
// input
self.updateInput(string, at: inputCursorPosition, inputStyle: inputStyle)
self.input.insert(contentsOf: string.map {InputElement(character: $0, inputStyle: inputStyle)}, at: inputCursorPosition)
let oldConvertTarget = self.convertTarget.prefix(self.convertTargetCursorPosition)
let newConvertTarget = Self.getConvertTarget(for: self.input.prefix(inputCursorPosition + string.count))
@ -341,18 +316,37 @@ public struct ComposingText: Sendable {
///
/// - parameters:
/// - correspondingCount: `input`
public mutating func prefixComplete(correspondingCount: Int) {
let correspondingCount = min(correspondingCount, self.input.count)
self.input.removeFirst(correspondingCount)
// Recompute convertTarget
let newConvertTarget = Self.getConvertTarget(for: self.input)
//
let cursorDelta = self.convertTarget.count - newConvertTarget.count
self.convertTarget = newConvertTarget
self.convertTargetCursorPosition -= cursorDelta
//
if self.convertTargetCursorPosition == 0 {
self.convertTargetCursorPosition = self.convertTarget.count
public mutating func prefixComplete(composingCount: ComposingCount) {
switch composingCount {
case .inputCount(let correspondingCount):
let correspondingCount = min(correspondingCount, self.input.count)
self.input.removeFirst(correspondingCount)
// Recompute convertTarget
let newConvertTarget = Self.getConvertTarget(for: self.input)
//
let cursorDelta = self.convertTarget.count - newConvertTarget.count
self.convertTarget = newConvertTarget
self.convertTargetCursorPosition -= cursorDelta
//
if self.convertTargetCursorPosition == 0 {
self.convertTargetCursorPosition = self.convertTarget.count
}
case .surfaceCount(let correspondingCount):
// correspondingCount
//
let prefix = self.convertTarget.prefix(correspondingCount)
let index = self.forceGetInputCursorPosition(target: prefix)
self.input = Array(self.input[index...])
self.convertTarget = String(self.convertTarget.dropFirst(correspondingCount))
self.convertTargetCursorPosition -= correspondingCount
//
if self.convertTargetCursorPosition == 0 {
self.convertTargetCursorPosition = self.convertTarget.count
}
case .composite(let left, let right):
self.prefixComplete(composingCount: left)
self.prefixComplete(composingCount: right)
}
}
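The `.composite` case above is handled by completing the left part and then the right part. The recursion can be pictured as flattening a small tree into an ordered list of primitive steps. In the sketch below, `SketchCount` and `steps(of:)` are hypothetical names. The real `ComposingCount` declaration is not shown in this diff, so the `indirect enum` shape is an assumption.

```swift
// Hypothetical stand-in for ComposingCount; the recursive case requires `indirect`.
indirect enum SketchCount: Equatable {
    case inputCount(Int)
    case surfaceCount(Int)
    case composite(SketchCount, SketchCount)
}

/// Flattens a count into primitive steps, left before right,
/// which is the order in which a recursive prefixComplete would apply them.
func steps(of count: SketchCount) -> [SketchCount] {
    switch count {
    case .inputCount, .surfaceCount:
        return [count]
    case .composite(let left, let right):
        return steps(of: left) + steps(of: right)
    }
}

let mixed = SketchCount.composite(
    .inputCount(3),
    .composite(.surfaceCount(2), .inputCount(1))
)
let flattened = steps(of: mixed)
```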
@ -365,6 +359,40 @@ public struct ComposingText: Sendable {
return text
}
public func inputIndexToSurfaceIndexMap() -> [Int: Int] {
// i2c: map from input index to convert target index
// c2i: map from convert target index to input index
// Example 1.
// input:   [k, y, o, u, h, a, i, i, t, e, n, k, i, d, a]
// surface: [き, ょ, う, は, い, い, て, ん, き, だ]
// i2c: [0: 0, 3: 2(きょ), 4: 3(きょう), 6: 4(きょうは), 7: 5(きょうはい), 8: 6(きょうはいい), 10: 7(きょうはいいて), 13: 9(きょうはいいてんき), 15: 10(きょうはいいてんきだ)]
var map: [Int: (surfaceIndex: Int, surface: String)] = [0: (0, "")]
//
var convertTargetElements: [ConvertTargetElement] = []
for (idx, element) in self.input.enumerated() {
//
Self.updateConvertTargetElements(currentElements: &convertTargetElements, newElement: element)
//
let currentSurface = convertTargetElements.reduce(into: "") { $0 += $1.string }
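// After consuming input[idx], idx + 1 input elements have been processed,
// so the current surface is recorded under key idx + 1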
map[idx + 1] = (currentSurface.count, currentSurface)
}
//
let finalSurface = convertTargetElements.reduce(into: "") { $0 += $1.string }
return map
.filter {
finalSurface.hasPrefix($0.value.surface)
}
.mapValues {
$0.surfaceIndex
}
}
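The idea behind the map can be shown without the real romaji conversion. In the simplified sketch below, the input is assumed to be pre-split into (romaji, kana) segments, and only segment boundaries get an entry. `boundaryMap` and the segment list are hypothetical. The actual function instead replays `updateConvertTargetElements` and keeps the entries whose surface is a prefix of the final surface.

```swift
/// Hypothetical helper: maps the input index at each segment boundary
/// to the surface (kana) index at the same boundary.
func boundaryMap(segments: [(roman: String, kana: String)]) -> [Int: Int] {
    var map: [Int: Int] = [0: 0]
    var inputIndex = 0
    var surfaceIndex = 0
    for segment in segments {
        inputIndex += segment.roman.count
        surfaceIndex += segment.kana.count
        // Both indices now point just past this segment.
        map[inputIndex] = surfaceIndex
    }
    return map
}

// "kyouhai" split by hand into segments (an assumption made for this illustration).
let indexMap = boundaryMap(segments: [
    (roman: "kyo", kana: "きょ"),
    (roman: "u", kana: "う"),
    (roman: "ha", kana: "は"),
    (roman: "i", kana: "い")
])
```

Input indices that fall inside a segment, such as 1 (`k`) or 5 (`h`), have no entry.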
public mutating func stopComposition() {
self.input = []
self.convertTarget = ""
@ -580,17 +608,20 @@ extension ComposingText.ConvertTargetElement: Equatable {}
extension ComposingText {
/// Compares this `ComposingText` with `previousData` and returns how many trailing elements were deleted and added.
func differenceSuffix(to previousData: ComposingText) -> (deleted: Int, addedCount: Int) {
func differenceSuffix(to previousData: ComposingText) -> (deletedInput: Int, addedInput: Int, deletedSurface: Int, addedSurface: Int) {
// kshx ... last
// n ssss
// |
// inputdirect
//
let common = self.input.commonPrefix(with: previousData.input)
let deleted = previousData.input.count - common.count
let added = self.input.dropFirst(common.count).count
return (deleted, added)
let commonSurface = self.convertTarget.commonPrefix(with: previousData.convertTarget)
let deletedSurface = previousData.convertTarget.count - commonSurface.count
let addedSurface = self.convertTarget.count - commonSurface.count
return (deleted, added, deletedSurface, addedSurface)
}
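The same common-prefix computation is applied twice above, once to `input` and once to `convertTarget`. A minimal generic sketch follows. `suffixDifference` is a hypothetical helper, and the surface strings in the example are assumed values chosen for illustration.

```swift
/// Hypothetical helper: counts how many trailing elements must be deleted
/// from `old` and appended to reach `new`, based on their common prefix.
func suffixDifference<C: Collection>(from old: C, to new: C) -> (deleted: Int, added: Int)
where C.Element: Equatable {
    var common = 0
    for (a, b) in zip(old, new) {
        if a != b { break }
        common += 1
    }
    return (old.count - common, new.count - common)
}

// Input level: one key was appended, nothing was deleted.
let inputDiff = suffixDifference(from: Array("hasir"), to: Array("hasiru"))
// Surface level (assumed surfaces): the pending "r" is replaced by "る".
let surfaceDiff = suffixDifference(from: Array("はしr"), to: Array("はしる"))
```

The two levels can disagree, as in this example, which is why the new return type reports both.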
func inputHasSuffix(inputOf suffix: ComposingText) -> Bool {

View File

@ -4,6 +4,7 @@ public import Foundation
public extension ConvertRequestOptions {
static func withDefaultDictionary(
N_best: Int = 10,
needTypoCorrection: Bool? = nil,
requireJapanesePrediction: Bool,
requireEnglishPrediction: Bool,
keyboardLanguage: KeyboardLanguage,
@ -29,13 +30,26 @@ public extension ConvertRequestOptions {
#else
let dictionaryDirectory = Bundle.module.resourceURL!.appendingPathComponent("Dictionary", isDirectory: true)
#endif
var specialCandidateProviders = [any SpecialCandidateProvider]()
if typographyLetterCandidate {
specialCandidateProviders.append(.typography)
}
if unicodeCandidate {
specialCandidateProviders.append(.unicode)
}
specialCandidateProviders.append(.emailAddress)
specialCandidateProviders.append(.timeExpression)
specialCandidateProviders.append(.calendar)
specialCandidateProviders.append(.version)
specialCandidateProviders.append(.commaSeparatedNumber)
return Self(
N_best: N_best,
needTypoCorrection: needTypoCorrection,
requireJapanesePrediction: requireJapanesePrediction,
requireEnglishPrediction: requireEnglishPrediction,
keyboardLanguage: keyboardLanguage,
typographyLetterCandidate: typographyLetterCandidate,
unicodeCandidate: unicodeCandidate,
englishCandidateInRoman2KanaInput: englishCandidateInRoman2KanaInput,
fullWidthRomanCandidate: fullWidthRomanCandidate,
halfWidthKanaCandidate: halfWidthKanaCandidate,
@ -44,8 +58,9 @@ public extension ConvertRequestOptions {
shouldResetMemory: shouldResetMemory,
dictionaryResourceURL: dictionaryDirectory,
memoryDirectoryURL: memoryDirectoryURL,
sharedContainerURL: sharedContainerURL,
sharedContainerURL: sharedContainerURL,
textReplacer: textReplacer,
specialCandidateProviders: specialCandidateProviders,
zenzaiMode: zenzaiMode,
preloadDictionary: preloadDictionary,
metadata: metadata

View File

@ -14,19 +14,19 @@ final class ClauseDataUnitTests: XCTestCase {
do {
let unit1 = ClauseDataUnit()
unit1.text = "僕が"
unit1.inputRange = 0 ..< 3
unit1.ranges = [.input(from: 0, to: 3)]
unit1.mid = 0
unit1.nextLcid = 0
let unit2 = ClauseDataUnit()
unit2.text = "走る"
unit2.inputRange = 3 ..< 6
unit2.ranges = [.input(from: 3, to: 6)]
unit2.mid = 1
unit2.nextLcid = 1
unit1.merge(with: unit2)
XCTAssertEqual(unit1.text, "僕が走る")
XCTAssertEqual(unit1.inputRange, 0 ..< 6)
XCTAssertEqual(unit1.ranges, [.input(from: 0, to: 3), .input(from: 3, to: 6)])
XCTAssertEqual(unit1.nextLcid, 1)
XCTAssertEqual(unit1.mid, 0)
}
@ -34,19 +34,19 @@ final class ClauseDataUnitTests: XCTestCase {
do {
let unit1 = ClauseDataUnit()
unit1.text = "君は"
unit1.inputRange = 0 ..< 3
unit1.ranges = [.input(from: 0, to: 3)]
unit1.mid = 0
unit1.nextLcid = 0
let unit2 = ClauseDataUnit()
unit2.text = "笑った"
unit2.inputRange = 3 ..< 7
unit2.ranges = [.input(from: 3, to: 7)]
unit2.mid = 3
unit2.nextLcid = 3
unit1.merge(with: unit2)
XCTAssertEqual(unit1.text, "君は笑った")
XCTAssertEqual(unit1.inputRange, 0 ..< 7)
XCTAssertEqual(unit1.ranges, [.input(from: 0, to: 3), .input(from: 3, to: 7)])
XCTAssertEqual(unit1.nextLcid, 3)
XCTAssertEqual(unit1.mid, 0)
}

View File

@ -75,7 +75,7 @@ final class ComposingTextTests: XCTestCase {
sequentialInput(&c, sequence: "itte", inputStyle: .roman2kana)
XCTAssertEqual(c.input, [
ComposingText.InputElement(character: "i", inputStyle: .roman2kana),
ComposingText.InputElement(character: "っ", inputStyle: .direct),
ComposingText.InputElement(character: "t", inputStyle: .roman2kana),
ComposingText.InputElement(character: "t", inputStyle: .roman2kana),
ComposingText.InputElement(character: "e", inputStyle: .roman2kana)
])
@ -88,7 +88,7 @@ final class ComposingTextTests: XCTestCase {
sequentialInput(&c, sequence: "anta", inputStyle: .roman2kana)
XCTAssertEqual(c.input, [
ComposingText.InputElement(character: "a", inputStyle: .roman2kana),
ComposingText.InputElement(character: "ん", inputStyle: .direct),
ComposingText.InputElement(character: "n", inputStyle: .roman2kana),
ComposingText.InputElement(character: "t", inputStyle: .roman2kana),
ComposingText.InputElement(character: "a", inputStyle: .roman2kana)
])
@ -202,8 +202,8 @@ final class ComposingTextTests: XCTestCase {
var c2 = ComposingText()
c2.insertAtCursorPosition("hasiru", inputStyle: .roman2kana)
XCTAssertEqual(c2.differenceSuffix(to: c1).deleted, 0)
XCTAssertEqual(c2.differenceSuffix(to: c1).addedCount, 1)
XCTAssertEqual(c2.differenceSuffix(to: c1).deletedInput, 0)
XCTAssertEqual(c2.differenceSuffix(to: c1).addedInput, 1)
}
do {
var c1 = ComposingText()
@ -212,8 +212,47 @@ final class ComposingTextTests: XCTestCase {
var c2 = ComposingText()
c2.insertAtCursorPosition("tukatte", inputStyle: .roman2kana)
XCTAssertEqual(c2.differenceSuffix(to: c1).deleted, 0)
XCTAssertEqual(c2.differenceSuffix(to: c1).addedCount, 1)
XCTAssertEqual(c2.differenceSuffix(to: c1).deletedInput, 0)
XCTAssertEqual(c2.differenceSuffix(to: c1).addedInput, 1)
}
}
func testIndexMap() throws {
do {
var c = ComposingText()
sequentialInput(&c, sequence: "kyouhaiitenkida", inputStyle: .roman2kana)
let map = c.inputIndexToSurfaceIndexMap()
XCTAssertEqual(map[0], 0) // ""
XCTAssertEqual(map[1], nil) // k
XCTAssertEqual(map[2], nil) // y
XCTAssertEqual(map[3], 2) // o
XCTAssertEqual(map[4], 3) // u
XCTAssertEqual(map[5], nil) // h
XCTAssertEqual(map[6], 4) // a
XCTAssertEqual(map[7], 5) // i
XCTAssertEqual(map[8], 6) // i
XCTAssertEqual(map[9], nil) // t
XCTAssertEqual(map[10], 7) // e
XCTAssertEqual(map[11], nil) // n
XCTAssertEqual(map[12], nil) // k
XCTAssertEqual(map[13], 9) // i
XCTAssertEqual(map[14], nil) // d
XCTAssertEqual(map[15], 10) // a
}
do {
var c = ComposingText()
sequentialInput(&c, sequence: "sakujoshori", inputStyle: .roman2kana)
let map = c.inputIndexToSurfaceIndexMap()
let reversedMap = (0 ..< c.convertTarget.count + 1).compactMap {
if map.values.contains($0) {
String(c.convertTarget.prefix($0))
} else {
nil
}
}
XCTAssertFalse(reversedMap.contains("さくじ"))
XCTAssertFalse(reversedMap.contains("さくじょし"))
}
}
}

View File

@ -16,7 +16,7 @@ final class CandidateTests: XCTestCase {
let candidate = Candidate(
text: text,
value: -40,
correspondingCount: 4,
composingCount: .inputCount(4),
lastMid: 5,
data: [DicdataElement(word: text, ruby: "サイコロ", cid: 0, mid: 5, value: -40)]
)
@ -27,7 +27,7 @@ final class CandidateTests: XCTestCase {
print(candidate2.text)
XCTAssertTrue(Set((1...3).map(String.init)).contains(candidate2.text))
XCTAssertEqual(candidate.value, candidate2.value)
XCTAssertEqual(candidate.correspondingCount, candidate2.correspondingCount)
XCTAssertEqual(candidate.composingCount, candidate2.composingCount)
XCTAssertEqual(candidate.lastMid, candidate2.lastMid)
XCTAssertEqual(candidate.data, candidate2.data)
XCTAssertEqual(candidate.actions, candidate2.actions)
@ -38,7 +38,7 @@ final class CandidateTests: XCTestCase {
let candidate = Candidate(
text: text,
value: 0,
correspondingCount: 0,
composingCount: .inputCount(0),
lastMid: 0,
data: [DicdataElement(word: text, ruby: "", cid: 0, mid: 0, value: 0)]
)

View File

@ -0,0 +1,41 @@
import Foundation
@testable import KanaKanjiConverterModule
import XCTest
final class TemplateConversionTests: XCTestCase {
func requestOptions() -> ConvertRequestOptions {
.default
}
func testTemplateConversion() async throws {
let converter = await KanaKanjiConverter()
let template = #"<date format="yyyy年MM月dd日" type="western" language="ja_JP" delta="0" deltaunit="1">"#
await converter.sendToDicdataStore(.importDynamicUserDict([
.init(word: template, ruby: "キョウ", cid: CIDData..cid, mid: MIDData..mid, value: 5)
]))
let formatter = DateFormatter()
formatter.dateFormat = "yyyy年MM月dd日"
formatter.calendar = Calendar(identifier: .gregorian)
let todayString = formatter.string(from: Date())
do {
var c = ComposingText()
c.insertAtCursorPosition("きょう", inputStyle: .direct)
let results = await converter.requestCandidates(c, options: requestOptions())
XCTAssertTrue(results.mainResults.contains(where: { $0.text == todayString} ))
XCTAssertFalse(results.mainResults.contains(where: { $0.text == template} ))
XCTAssertFalse(results.firstClauseResults.contains(where: { $0.text == template} ))
await converter.stopComposition()
}
do {
var c = ComposingText()
c.insertAtCursorPosition("kyou", inputStyle: .roman2kana)
let results = await converter.requestCandidates(c, options: requestOptions())
XCTAssertTrue(results.mainResults.contains(where: { $0.text == todayString} ))
XCTAssertFalse(results.mainResults.contains(where: { $0.text == template} ))
XCTAssertFalse(results.firstClauseResults.contains(where: { $0.text == template} ))
await converter.stopComposition()
}
}
}

View File

@ -88,7 +88,7 @@ final class LearningMemoryTests: XCTestCase {
Candidate(
text: element.word,
value: element.value(),
correspondingCount: 3,
composingCount: .inputCount(3),
lastMid: element.mid,
data: [element]
)
@ -128,7 +128,7 @@ final class LearningMemoryTests: XCTestCase {
Candidate(
text: element.word,
value: element.value(),
correspondingCount: 3,
composingCount: .inputCount(3),
lastMid: element.mid,
data: [element]
)

View File

@ -12,16 +12,16 @@ import XCTest
final class RegisteredNodeTests: XCTestCase {
func testBOSNode() throws {
let bos = RegisteredNode.BOSNode()
XCTAssertEqual(bos.inputRange, 0..<0)
XCTAssertEqual(bos.range, Lattice.LatticeRange.zero)
XCTAssertNil(bos.prev)
XCTAssertEqual(bos.totalValue, 0)
XCTAssertEqual(bos.data.rcid, CIDData.BOS.cid)
}
func testFromLastCandidate() throws {
let candidate = Candidate(text: "我輩は猫", value: -20, correspondingCount: 7, lastMid: 100, data: [DicdataElement(word: "我輩は猫", ruby: "ワガハイハネコ", cid: CIDData..cid, mid: 100, value: -20)])
let candidate = Candidate(text: "我輩は猫", value: -20, composingCount: .inputCount(7), lastMid: 100, data: [DicdataElement(word: "我輩は猫", ruby: "ワガハイハネコ", cid: CIDData..cid, mid: 100, value: -20)])
let bos = RegisteredNode.fromLastCandidate(candidate)
XCTAssertEqual(bos.inputRange, 0..<0)
XCTAssertEqual(bos.range, Lattice.LatticeRange.zero)
XCTAssertNil(bos.prev)
XCTAssertEqual(bos.totalValue, 0)
XCTAssertEqual(bos.data.rcid, CIDData..cid)
@ -34,37 +34,37 @@ final class RegisteredNodeTests: XCTestCase {
data: DicdataElement(word: "我輩", ruby: "ワガハイ", cid: CIDData..cid, mid: 1, value: -5),
registered: bos,
totalValue: -10,
inputRange: 0..<4
range: .input(from: 0, to: 4)
)
let node2 = RegisteredNode(
data: DicdataElement(word: "は", ruby: "ハ", cid: CIDData..cid, mid: 2, value: -2),
registered: node1,
totalValue: -13,
inputRange: 4..<5
range: .input(from: 4, to: 5)
)
let node3 = RegisteredNode(
data: DicdataElement(word: "猫", ruby: "ネコ", cid: CIDData..cid, mid: 3, value: -4),
registered: node2,
totalValue: -20,
inputRange: 5..<7
range: .input(from: 5, to: 7)
)
let node4 = RegisteredNode(
data: DicdataElement(word: "です", ruby: "デス", cid: CIDData..cid, mid: 4, value: -3),
registered: node3,
totalValue: -25,
inputRange: 7..<9
range: .input(from: 7, to: 9)
)
let result = node4.getCandidateData()
let clause1 = ClauseDataUnit()
clause1.text = "我輩は"
clause1.nextLcid = CIDData..cid
clause1.inputRange = 0..<5
clause1.ranges = [.input(from: 0, to: 0), .input(from: 0, to: 4), .input(from: 4, to: 5)] // (0, 0) BOS
clause1.mid = 1
let clause2 = ClauseDataUnit()
clause2.text = "猫です"
clause2.nextLcid = CIDData.EOS.cid
clause2.inputRange = 5..<9
clause2.ranges = [.input(from: 5, to: 7), .input(from: 7, to: 9)]
clause2.mid = 3
let expectedResult: CandidateData = CandidateData(

View File

@ -7,7 +7,7 @@
//
import Foundation
import KanaKanjiConverterModuleWithDefaultDictionary
@testable import KanaKanjiConverterModuleWithDefaultDictionary
import XCTest
final class ConverterTests: XCTestCase {
@ -17,9 +17,10 @@ final class ConverterTests: XCTestCase {
}
}
func requestOptions() -> ConvertRequestOptions {
func requestOptions(needTypoCorrection: Bool = false) -> ConvertRequestOptions {
.withDefaultDictionary(
N_best: 10,
needTypoCorrection: needTypoCorrection,
requireJapanesePrediction: false,
requireEnglishPrediction: false,
keyboardLanguage: .ja_JP,
@ -56,19 +57,21 @@ final class ConverterTests: XCTestCase {
}
func testRoman2KanaFullConversion() async throws {
do {
let converter = await KanaKanjiConverter()
var c = ComposingText()
c.insertAtCursorPosition("azuーkiーhasinjidainokiーboーdoapuridesu", inputStyle: .roman2kana)
let results = await converter.requestCandidates(c, options: requestOptions())
XCTAssertEqual(results.mainResults.first?.text, "azooKeyは新時代のキーボードアプリです")
}
do {
let converter = await KanaKanjiConverter()
var c = ComposingText()
c.insertAtCursorPosition("youshoukikaratenisusuieiyakyuushourinjikenpounadosamazamanasupoーtuwokeikennsinagarasodatishougakkouzidaiharosanzerusukinkounitaizaisiteorigoruhuyatenisuwonaratteita", inputStyle: .roman2kana)
let results = await converter.requestCandidates(c, options: requestOptions())
XCTAssertEqual(results.mainResults.first?.text, "幼少期からテニス水泳野球少林寺拳法など様々なスポーツを経験しながら育ち小学校時代はロサンゼルス近郊に滞在しておりゴルフやテニスを習っていた")
for needTypoCorrection in [true, false] {
do {
let converter = await KanaKanjiConverter()
var c = ComposingText()
c.insertAtCursorPosition("azuーkiーhasinjidainokiーboーdoapuridesu", inputStyle: .roman2kana)
let results = await converter.requestCandidates(c, options: requestOptions(needTypoCorrection: needTypoCorrection))
XCTAssertEqual(results.mainResults.first?.text, "azooKeyは新時代のキーボードアプリです")
}
do {
let converter = await KanaKanjiConverter()
var c = ComposingText()
c.insertAtCursorPosition("youshoukikaratenisusuieiyakyuushourinjikenpounadosamazamanasupoーtuwokeikennsinagarasodatishougakkouzidaiharosanzerusukinkounitaizaisiteorigoruhuyatenisuwonaratteita", inputStyle: .roman2kana)
let results = await converter.requestCandidates(c, options: requestOptions(needTypoCorrection: needTypoCorrection))
XCTAssertEqual(results.mainResults.first?.text, "幼少期からテニス水泳野球少林寺拳法など様々なスポーツを経験しながら育ち小学校時代はロサンゼルス近郊に滞在しておりゴルフやテニスを習っていた")
}
}
}
@ -128,6 +131,53 @@ final class ConverterTests: XCTestCase {
}
}
}
// memo:
func testKimiAndThenDelete() async throws {
let converter = await KanaKanjiConverter()
var c = ComposingText()
let text = "kimi"
//
let possibles = [
"",
"気味",
"黄身"
]
for char in text {
c.insertAtCursorPosition(String(char), inputStyle: .roman2kana)
let results = await converter.requestCandidates(c, options: requestOptions())
if c.input.count == text.count {
XCTAssertTrue(possibles.contains(results.mainResults.first!.text))
}
}
// 1
c.deleteBackwardFromCursorPosition(count: 1)
let results = await converter.requestCandidates(c, options: requestOptions())
XCTAssertTrue(results.mainResults.contains { $0.text == "" })
}
// memo: fatalError
func testIttaAndThenDelete() async throws {
let converter = await KanaKanjiConverter()
var c = ComposingText()
let text = "itta"
//
let possibles = [
"いった",
"行った",
"言った"
]
for char in text {
c.insertAtCursorPosition(String(char), inputStyle: .roman2kana)
let results = await converter.requestCandidates(c, options: requestOptions())
if c.input.count == text.count {
XCTAssertTrue(possibles.contains(results.mainResults.first!.text))
}
}
// 1
c.deleteBackwardFromCursorPosition(count: 1)
let results = await converter.requestCandidates(c, options: requestOptions())
XCTAssertTrue(results.mainResults.contains { $0.text == "言っ" })
}
// 1
// memo: deleted_last_n
@ -171,72 +221,103 @@ final class ConverterTests: XCTestCase {
//
func testMustCases() async throws {
//
do {
let cases: [(input: String, expect: String)] = [
("つかっている", "使っている"),
("しんだどうぶつ", "死んだ動物"),
("けいさん", "計算"),
("azooKeyをつかう", "azooKeyを使う"),
("じどうAIそうじゅう。", "自動AI操縦。"),
("1234567890123456789012", "1234567890123456789012")
]
//
do {
let cases: [(input: String, expect: String)] = [
("つかっている", "使っている"),
("しんだどうぶつ", "死んだ動物"),
("けいさん", "計算"),
("azooKeyをつかう", "azooKeyを使う"),
("じどうAIそうじゅう。", "自動AI操縦。"),
("1234567890123456789012", "1234567890123456789012")
]
// full input
var options = requestOptions()
options.requireJapanesePrediction = false
for (input, expect) in cases {
let converter = await KanaKanjiConverter()
var c = ComposingText()
sequentialInput(&c, sequence: input, inputStyle: .direct)
// full input
var options = requestOptions()
options.requireJapanesePrediction = false
for (input, expect) in cases {
let converter = await KanaKanjiConverter()
var c = ComposingText()
sequentialInput(&c, sequence: input, inputStyle: .direct)
let results = await converter.requestCandidates(c, options: options)
XCTAssertEqual(results.mainResults.first?.text, expect)
}
// gradual input
for (input, expect) in cases {
let converter = await KanaKanjiConverter()
var c = ComposingText()
for char in input {
c.insertAtCursorPosition(String(char), inputStyle: .direct)
let results = await converter.requestCandidates(c, options: options)
XCTAssertEqual(results.mainResults.first?.text, expect)
}
// gradual input
for (input, expect) in cases {
let converter = await KanaKanjiConverter()
var c = ComposingText()
for char in input {
c.insertAtCursorPosition(String(char), inputStyle: .direct)
let results = await converter.requestCandidates(c, options: options)
if c.input.count == input.count {
XCTAssertEqual(results.mainResults.first?.text, expect)
}
if c.input.count == input.count {
XCTAssertEqual(results.mainResults.first?.text, expect)
}
}
}
//
do {
let cases: [(input: String, expect: String)] = [
("tukatteiru", "使っている"),
("sindadoubutu", "死んだ動物"),
("keisann", "計算")
]
}
//
do {
let cases: [(input: String, expect: String)] = [
("tukatteiru", "使っている"),
("sindadoubutu", "死んだ動物"),
("keisann", "計算")
]
// full input
var options = requestOptions()
options.requireJapanesePrediction = false
for (input, expect) in cases {
let converter = await KanaKanjiConverter()
var c = ComposingText()
sequentialInput(&c, sequence: input, inputStyle: .roman2kana)
// full input
var options = requestOptions()
options.requireJapanesePrediction = false
for (input, expect) in cases {
let converter = await KanaKanjiConverter()
var c = ComposingText()
sequentialInput(&c, sequence: input, inputStyle: .roman2kana)
let results = await converter.requestCandidates(c, options: options)
XCTAssertEqual(results.mainResults.first?.text, expect)
}
// gradual input
for (input, expect) in cases {
let converter = await KanaKanjiConverter()
var c = ComposingText()
for char in input {
c.insertAtCursorPosition(String(char), inputStyle: .roman2kana)
let results = await converter.requestCandidates(c, options: options)
XCTAssertEqual(results.mainResults.first?.text, expect)
}
// gradual input
for (input, expect) in cases {
let converter = await KanaKanjiConverter()
var c = ComposingText()
for char in input {
c.insertAtCursorPosition(String(char), inputStyle: .roman2kana)
let results = await converter.requestCandidates(c, options: options)
if c.input.count == input.count {
XCTAssertEqual(results.mainResults.first?.text, expect)
}
if c.input.count == input.count {
XCTAssertEqual(results.mainResults.first?.text, expect)
}
}
}
}
// typo
do {
let cases: [(input: String, expect: String)] = [
("たいかくせい", "大学生"),
("きみのことかすき", "君のことが好き"),
("おへんとうをもつていく", "お弁当を持っていく"),
]
// full input
var options = requestOptions(needTypoCorrection: true)
options.requireJapanesePrediction = false
for (input, expect) in cases {
let converter = await KanaKanjiConverter()
var c = ComposingText()
sequentialInput(&c, sequence: input, inputStyle: .direct)
let results = await converter.requestCandidates(c, options: options)
XCTAssertEqual(results.mainResults.first?.text, expect)
}
// gradual input
for (input, expect) in cases {
let converter = await KanaKanjiConverter()
var c = ComposingText()
for char in input {
c.insertAtCursorPosition(String(char), inputStyle: .direct)
let results = await converter.requestCandidates(c, options: options)
if c.input.count == input.count {
XCTAssertEqual(results.mainResults.first?.text, expect)
}
}
}
}
}
//

@ -129,7 +129,7 @@ final class DicdataStoreTests: XCTestCase {
for (key, word) in mustWords {
var c = ComposingText()
c.insertAtCursorPosition(key, inputStyle: .direct)
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: false)
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: false)
//
XCTAssertEqual(result.first(where: {$0.data.word == word})?.data.word, word)
}
@ -150,7 +150,7 @@ final class DicdataStoreTests: XCTestCase {
for (key, word) in mustWords {
var c = ComposingText()
c.insertAtCursorPosition(key, inputStyle: .direct)
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: false)
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: false)
XCTAssertNil(result.first(where: {$0.data.word == word && $0.data.ruby == key}))
}
}
@ -170,17 +170,17 @@ final class DicdataStoreTests: XCTestCase {
for (key, word) in mustWords {
var c = ComposingText()
c.insertAtCursorPosition(key, inputStyle: .direct)
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: true)
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: true)
XCTAssertEqual(result.first(where: {$0.data.word == word})?.data.word, word)
}
}
func testGetLOUDSDataInRange() throws {
func testLookupDicdata() throws {
let dicdataStore = DicdataStore(convertRequestOptions: requestOptions())
do {
var c = ComposingText()
c.insertAtCursorPosition("ヘンカン", inputStyle: .roman2kana)
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: 2..<4)
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, 2 ..< 4))
XCTAssertFalse(result.contains(where: {$0.data.word == ""}))
XCTAssertTrue(result.contains(where: {$0.data.word == "変化"}))
XCTAssertTrue(result.contains(where: {$0.data.word == "変換"}))
@ -188,7 +188,7 @@ final class DicdataStoreTests: XCTestCase {
do {
var c = ComposingText()
c.insertAtCursorPosition("ヘンカン", inputStyle: .roman2kana)
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: 0..<4)
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, 0..<4))
XCTAssertTrue(result.contains(where: {$0.data.word == ""}))
XCTAssertTrue(result.contains(where: {$0.data.word == "変化"}))
XCTAssertTrue(result.contains(where: {$0.data.word == "変換"}))
@ -196,19 +196,19 @@ final class DicdataStoreTests: XCTestCase {
do {
var c = ComposingText()
c.insertAtCursorPosition("ツカッ", inputStyle: .roman2kana)
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: 2..<3)
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, 2..<3))
XCTAssertTrue(result.contains(where: {$0.data.word == "使っ"}))
}
do {
var c = ComposingText()
c.insertAtCursorPosition("ツカッt", inputStyle: .roman2kana)
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: 2..<4)
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, 2..<4))
XCTAssertTrue(result.contains(where: {$0.data.word == "使っ"}))
}
do {
var c = ComposingText()
sequentialInput(&c, sequence: "tukatt", inputStyle: .roman2kana)
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: 4..<6)
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, 4..<6))
XCTAssertTrue(result.contains(where: {$0.data.word == "使っ"}))
}
}
@ -218,7 +218,7 @@ final class DicdataStoreTests: XCTestCase {
do {
var c = ComposingText()
c.insertAtCursorPosition("999999999999", inputStyle: .roman2kana)
let result = dicdataStore.getWiseDicdata(convertTarget: c.convertTarget, inputData: c, inputRange: 0..<12)
let result = dicdataStore.getWiseDicdata(convertTarget: c.convertTarget, inputData: c, surfaceRange: 0..<12)
XCTAssertTrue(result.contains(where: {$0.word == "999999999999"}))
XCTAssertTrue(result.contains(where: {$0.word == "九千九百九十九億九千九百九十九万九千九百九十九"}))
}
@ -255,7 +255,7 @@ final class DicdataStoreTests: XCTestCase {
do {
var c = ComposingText()
c.insertAtCursorPosition("テストタンゴ", inputStyle: .direct)
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: false)
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: false)
XCTAssertTrue(result.contains(where: {$0.data.word == "テスト単語"}))
}
@ -263,7 +263,7 @@ final class DicdataStoreTests: XCTestCase {
do {
var c = ComposingText()
c.insertAtCursorPosition("ドウテキジショ", inputStyle: .direct)
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: false)
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: false)
XCTAssertTrue(result.contains(where: {$0.data.word == "動的辞書"}))
}
@ -288,16 +288,16 @@ final class DicdataStoreTests: XCTestCase {
do {
var c = ComposingText()
sequentialInput(&c, sequence: "tesutowaーdo", inputStyle: .roman2kana)
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: false)
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: false)
XCTAssertTrue(result.contains(where: {$0.data.word == "テストワード"}))
XCTAssertEqual(result.first(where: {$0.data.word == "テストワード"})?.inputRange, 0 ..< 11)
XCTAssertEqual(result.first(where: {$0.data.word == "テストワード"})?.range, .input(from: 0, to: 11))
}
//
do {
var c = ComposingText()
c.insertAtCursorPosition("トクシュヨミ", inputStyle: .direct)
let result = dicdataStore.getLOUDSDataInRange(inputData: c, from: 0, toIndexRange: c.input.endIndex - 1 ..< c.input.endIndex, needTypoCorrection: false)
let result = dicdataStore.lookupDicdata(composingText: c, inputRange: (0, c.input.endIndex - 1 ..< c.input.endIndex), needTypoCorrection: false)
let dynamicUserDictResult = result.first(where: {$0.data.word == "特殊読み"})
XCTAssertNotNil(dynamicUserDictResult)
XCTAssertEqual(dynamicUserDictResult?.data.metadata, .isFromUserDictionary)