Transliteration map fix #14

Open
wants to merge 25 commits into
base: master
25 commits
e4c30c2
Complete Ukrainian transliteration hash with Josh.
Stratus3D Oct 20, 2017
82f33b2
Add rspec.
Stratus3D Oct 20, 2017
2530db8
Add first test.
Stratus3D Oct 20, 2017
364ece6
Add gemspec to Gemfile.
Stratus3D Oct 20, 2017
abdbcfe
More work on the rspec tests.
Stratus3D Oct 20, 2017
abd3fd1
More work on the translit tests. Fix several issues.
Stratus3D Oct 20, 2017
e2825bf
Add links to pages with details on Ukrainian transliteration.
Stratus3D Oct 20, 2017
4f60bd1
Uncomment English => Russian test.
Stratus3D Oct 20, 2017
935702b
Fix English => Russian transliteration test.
Stratus3D Oct 20, 2017
4d1fa6c
Add support for transliteration to and from Ukrainian.
Stratus3D Oct 20, 2017
069723e
Check in updated Gemfile.lock.
Stratus3D Oct 20, 2017
6f4ff4c
Merge pull request #1 from euroteamoutreach/tb/add-ukrainian-support
Stratus3D Oct 20, 2017
ea79ec6
Try sorting transliteration hashes by key length.
Stratus3D Oct 20, 2017
d7c0590
Try another sorting choice.
Stratus3D Oct 20, 2017
472407c
Add tests to assert the ordering of the keys in the transliteration m…
Stratus3D Oct 20, 2017
09bc230
Add another test for english to ukrainian transliteration.
Stratus3D Oct 21, 2017
b8f2bbc
Remove ü from Ukrainian character map.
Stratus3D Oct 21, 2017
9c5608c
Simplify gsub logic so we don't have to capitalize cyrillic characters.
Stratus3D Oct 21, 2017
1ac93d6
Merge pull request #3 from euroteamoutreach/simplify-gsub
Stratus3D Oct 21, 2017
c38f366
Handle empty strings and strings that contain lines without text.
Stratus3D Oct 21, 2017
ca6146f
Merge pull request #4 from euroteamoutreach/handle-empty-strings
Stratus3D Oct 21, 2017
fac196a
Remove rake requirement, FileList.
joshukraine Jan 29, 2018
dbde4d8
Merge pull request #5 from euroteamoutreach/js/require-rake-issue
joshukraine Mar 7, 2018
c63f3e4
Fixed typo in readme.
Epigene Apr 12, 2017
5c509b5
Merge pull request #12 from Epigene/master
tjbladez Jan 26, 2018
1 change: 1 addition & 0 deletions .rspec
@@ -0,0 +1 @@
--color
2 changes: 1 addition & 1 deletion Gemfile
@@ -1,4 +1,4 @@
source "http://rubygems.org"

gem "riot"

gemspec
24 changes: 24 additions & 0 deletions Gemfile.lock
@@ -1,12 +1,36 @@
PATH
remote: .
specs:
translit (0.1.5)

GEM
remote: http://rubygems.org/
specs:
diff-lcs (1.2.5)
riot (0.12.5)
rr
rr (1.0.4)
rspec (3.5.0)
rspec-core (~> 3.5.0)
rspec-expectations (~> 3.5.0)
rspec-mocks (~> 3.5.0)
rspec-core (3.5.3)
rspec-support (~> 3.5.0)
rspec-expectations (3.5.0)
diff-lcs (>= 1.2.0, < 2.0)
rspec-support (~> 3.5.0)
rspec-mocks (3.5.0)
diff-lcs (>= 1.2.0, < 2.0)
rspec-support (~> 3.5.0)
rspec-support (3.5.0)

PLATFORMS
ruby

DEPENDENCIES
riot
rspec
translit!

BUNDLED WITH
1.15.4
16 changes: 8 additions & 8 deletions README.markdown
@@ -14,14 +14,14 @@ Or you can translit stdin now via just: <code>translit</code>. To stop stdin inp
Translit using autodetection:

```
-Translit.convert("Отличный день") #=> Otlichnyj den'
-Translit.convert("Otlichnyj den'") #=> Отличный день
+Translit.convert("Отличный день") #=> "Otlichnyj den'"
+Translit.convert("Otlichnyj den'") #=> "Отличный день"
```
Translit forcing target language:

```
-Translit.convert("Отличный den'", :russian) #=> Отличный деньOtlichnyj den'
-Translit.convert("Otlichnyj день", :english) #=> Otlichnyj den'
+Translit.convert("Отличный den'", :russian) #=> "Отличный день"
+Translit.convert("Otlichnyj день", :english) #=> "Otlichnyj den'"
```

## Русский ##
@@ -40,11 +40,11 @@ Translit.convert("Otlichnyj день", :english) #=> Otlichnyj den'
Транслитирование с автоопределнием языка:

```
-Translit.convert("Отличный день") #=> Otlichnyj den'
-Translit.convert("Otlichnyj den'") #=> Отличный день
+Translit.convert("Отличный день") #=> "Otlichnyj den'"
+Translit.convert("Otlichnyj den'") #=> "Отличный день"
```
Транслитирование на определенный язык:

```
-Translit.convert("Отличный den'", :russian) #=> Отличный деньOtlichnyj den'
-Translit.convert("Otlichnyj день", :english) #=> Otlichnyj den'
+Translit.convert("Отличный den'", :russian) #=> "Отличный день"
+Translit.convert("Otlichnyj день", :english) #=> "Otlichnyj den'"
87 changes: 68 additions & 19 deletions lib/translit.rb
@@ -1,17 +1,24 @@
# coding: utf-8
# Resources for Ukrainian transliteration
# * http://en.ukrlandia.com.ua/cyrillic-alphabets/
# * http://www.ukrainiansintheuk.info/eng/00/translit-e.htm

module Translit
# Ukrainian only chars: Ґ, І, Ї, Є, ‘
UKRAINIAN_ONLY_CHARS = %w(Ґ ґ І і Ї ї Є є ‘)

# Russian only chars: Э
RUSSIAN_ONLY_CHARS = %w(Э э)

def self.convert!(text, enforce_language = nil)
language = if enforce_language
-enforce_input_language(enforce_language)
+enforce_input_language(non_empty_line(text), enforce_language)
else
-detect_input_language(text.split(/\s+/).first)
+detect_input_language(non_empty_line(text))
end

-map = self.send(language.to_s).sort_by {|k,v| v.length <=> k.length}
+map = self.send(language.to_s + "_to_" + enforce_language.to_s).sort_by {|k,v| k.length}.reverse
map.each do |translit_key, translit_value|
-text.gsub!(translit_key.capitalize, translit_value.first)
-text.gsub!(translit_key, translit_value.last)
+text.gsub!(translit_key, translit_value.first)
end
text
end
@@ -21,37 +28,79 @@ def self.convert(text, enforce_language = nil)
end

private
def self.create_russian_map
self.english.inject({}) do |acc, tuple|

def self.non_empty_line(text)
text.split(/\s+/).select { |line| !line.empty? }.first
end

def self.invert_character_map(map)
map.dup.inject({}) do |acc, tuple|
rus = tuple.last.first
eng_value = tuple.first
acc[rus] ? acc[rus] << eng_value : acc[rus] = [eng_value]
acc
end
end

def self.latin_cases(map)
map.dup.inject({}) do |acc, tuple|
rus_up, rus_low = tuple.last
eng_value = tuple.first
acc[rus_up] ? acc[rus_up] << eng_value.capitalize : acc[rus_up] = [eng_value.capitalize]
acc[rus_low] ? acc[rus_low] << eng_value : acc[rus_low] = [eng_value]
acc[eng_value] = [rus_low]
unless eng_value == eng_value.capitalize
acc[eng_value.capitalize] = [rus_up]
end
acc
end
end

def self.detect_input_language(text)
-text.scan(/\w+/).empty? ? :russian : :english
+if text && text.scan(/\w+/).empty?
+slavic_language(text)
+else
+:english
+end
end

-def self.enforce_input_language(language)
+def self.enforce_input_language(text, language)
if language == :english
-:russian
+slavic_language(text)
else
:english
end
end

def self.english
{ "a"=>["А","а"], "b"=>["Б","б"], "v"=>["В","в"], "g"=>["Г","г"], "d"=>["Д","д"], "e"=>["Е","е"], "yo"=>["Ё","ё"], "jo"=>["Ё","ё"], "ö"=>["Ё","ё"], "zh"=>["Ж","ж"],
def self.slavic_language(text)
# If text contains Ukrainian chars we know it is Ukrainian
if UKRAINIAN_ONLY_CHARS.any? { |uk_char| text && text.include?(uk_char) }
:ukrainian
else
:russian
end
end

# Unsupported latin: "ä"=>["Э","э"], "ü"=>["Ю","ю"],
def self.english_to_ukrainian
@english_to_ukrainian ||= latin_cases({ "a"=>["А","а"], "b"=>["Б","б"], "v"=>["В","в"], "h"=>["Г","г"], "g"=>["Ґ","ґ"], "d"=>["Д","д"], "e"=>["Е","е"], "ye"=>["Є","є"], "je"=>["Є","є"], "zh"=>["Ж","ж"],
"z"=>["З","з"], "i"=>["І","і"], "yi"=>["Ї","ї"], "j"=>["Й","й"], "k"=>["К","к"], "l"=>["Л","л"], "m"=>["М","м"], "n"=>["Н","н"], "o"=>["О","о"], "p"=>["П","п"], "r"=>["Р","р"],
"s"=>["С","с"], "t"=>["Т","т"], "u"=>["У","у"], "f"=>["Ф","ф"], "kh"=>["Х","х"], "x"=>["Кс","кс"], "ts"=>["Ц","ц"], "ch"=>["Ч","ч"], "sh"=>["Ш","ш"], "w"=>["В","в"],
"shch"=>["Щ","щ"], "sch"=>["Щ","щ"], "y"=>["И","и"], "'"=>["Ь","ь"], "yu"=>["Ю","ю"], "ju"=>["Ю","ю"],
"ü"=>["Ю","ю"], "ya"=>["Я","я"], "ja"=>["Я","я"], "q"=>["К","к"]})
end

def self.english_to_russian
@english_to_russian ||= latin_cases({ "a"=>["А","а"], "b"=>["Б","б"], "v"=>["В","в"], "g"=>["Г","г"], "d"=>["Д","д"], "e"=>["Е","е"], "yo"=>["Ё","ё"], "jo"=>["Ё","ё"], "ö"=>["Ё","ё"], "zh"=>["Ж","ж"],
"z"=>["З","з"], "i"=>["И","и"], "j"=>["Й","й"], "k"=>["К","к"], "l"=>["Л","л"], "m"=>["М","м"], "n"=>["Н","н"], "o"=>["О","о"], "p"=>["П","п"], "r"=>["Р","р"],
"s"=>["С","с"], "t"=>["Т","т"], "u"=>["У","у"], "f"=>["Ф","ф"], "h"=>["Х","х"], "x"=>["Кс","кс"], "ts"=>["Ц","ц"], "ch"=>["Ч","ч"], "sh"=>["Ш","ш"], "w"=>["В","в"],
"shch"=>["Щ","щ"], "sch"=>["Щ","щ"], "#"=>["Ъ","ъ"], "y"=>["Ы","ы"], ""=>["Ь","ь"], "je"=>["Э","э"], "ä"=>["Э","э"], "yu"=>["Ю","ю"], "ju"=>["Ю","ю"],
"ü"=>["Ю","ю"], "ya"=>["Я","я"], "ja"=>["Я","я"], "q"=>["Я","я"]}
"shch"=>["Щ","щ"], "sch"=>["Щ","щ"], "#"=>["Ъ","ъ"], "y"=>["Ы","ы"], "'"=>["Ь","ь"], "je"=>["Э","э"], "ä"=>["Э","э"], "yu"=>["Ю","ю"], "ju"=>["Ю","ю"],
"ü"=>["Ю","ю"], "ya"=>["Я","я"], "ja"=>["Я","я"], "q"=>["Я","я"]})
end

def self.russian_to_english
@russian ||= invert_character_map(english_to_russian)
end

def self.russian
@russian ||= create_russian_map
def self.ukrainian_to_english
@ukrainian ||= invert_character_map(english_to_ukrainian)
end
end
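The `sort_by {|k,v| k.length}.reverse` call in `convert!` matters because the substitutions run one key at a time: if `s` and `h` were replaced before `shch`, the longer sequence could never match. A minimal sketch of that ordering follows. `SAMPLE_MAP`, `apply_in_order`, and `transliterate` are hypothetical names used only for illustration and are not part of the gem.

```ruby
# Hypothetical, trimmed-down map for illustration only; the gem's real tables
# are much larger and live in english_to_russian / english_to_ukrainian.
SAMPLE_MAP = {
  "s"    => "с",
  "h"    => "х",
  "c"    => "ц",
  "sh"   => "ш",
  "ch"   => "ч",
  "shch" => "щ",
  "i"    => "и",
  "b"    => "б",
  "o"    => "о",
  "r"    => "р"
}.freeze

# Hypothetical helper: apply every substitution in the order given,
# working on a copy so the caller's string is not mutated.
def apply_in_order(text, pairs)
  pairs.reduce(text.dup) { |acc, (key, value)| acc.gsub(key, value) }
end

# Longest keys first, so multi-letter sequences are consumed as a unit.
def transliterate(text, map)
  apply_in_order(text, map.sort_by { |key, _value| -key.length })
end

puts transliterate("borshch", SAMPLE_MAP)   # => борщ
puts apply_in_order("borshch", SAMPLE_MAP)  # insertion order, for comparison
```

With the map in insertion order the single letters win first and the result is `борсхцх`, which is the kind of failure the key-ordering specs are meant to guard against.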
60 changes: 60 additions & 0 deletions spec/translit_spec.rb
@@ -0,0 +1,60 @@
require 'translit'

describe "translit" do
let(:russian) { "Транслитерация между кириллицей и латиницей с коммандной строки или в твоей программе" }
let(:russian_transliteration) { "Transliteratsiya mezhdu kirillitsej i latinitsej s kommandnoj stroki ili v tvoej programme" }
let(:english) { "Transliteration between cyrillic <-> latin from command-line or your program" }
let(:ukrainian) { "Транслітерація між кирилицею <-> Латинська з командного рядка або вашої програми" }
let(:ukrainian_transliteration) { "Transliteratsiya mizh kyrylytseyu <-> Latyns'ka z komandnoho ryadka abo vashoyi prohramy" }
# This is broken
let(:english_transliteration) { "Транслитератион бетвеен cыриллиc <-> латин фром cомманд-лине ор ёур програм" }
let(:ukrainian_english_transliteration) { "Транслітератіон бетвеен cирілліc <-> латін фром cомманд-ліне ор иоур проґрам" }

it "transliterates from russian to english" do
expect(Translit.convert(russian, :english)).to eq(russian_transliteration)
end

it "transliterates from ukrainian to english" do
expect(Translit.convert(ukrainian, :english)).to eq(ukrainian_transliteration)
end

it "transliterates from english to russian" do
expect(Translit.convert(english, :russian)).to eq(english_transliteration)
end

it "transliterates from english to ukrainian" do
expect(Translit.convert(english, :ukrainian)).to eq(ukrainian_english_transliteration)
end

describe "ukrainian to english transliteration" do
it "should put yu ahead of ü" do
#"yu"=>["Ю","ю"], "ju"=>["Ю","ю"], "ü"=>["Ю","ю"]
expect(Translit.convert("Ю", :english)).to eq("Yu")
end

it "should put yu ahead of ü even at the end of a word" do
expect(Translit.convert("Біблію", :english)).to eq("Bibliyu")
end

it "should put ye ahead of je" do
#"ye"=>["Є","є"], "je"=>["Є","є"]
expect(Translit.convert("Є", :english)).to eq("Ye")
end
end

describe "english to ukrainian transliteration" do
it "should transliterate a short english word" do
expect(Translit.convert("Bible", :ukrainian)).to eq("Бібле")
end
end

describe "edge cases" do
it "should handle empty strings" do
expect(Translit.convert("", :english)).to eq("")
end

it "should handle strings with empty lines" do
expect(Translit.convert("\n\n", :english)).to eq("\n\n")
end
end
end
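The specs above always pass a target language, so the autodetect call shown in the README (`Translit.convert(text)` with no second argument) is not exercised. In that case `enforce_language.to_s` is an empty string, so the map name built in `convert!` would end in `_to_`. The following is a rough sketch of how source and target could be resolved for both paths. It assumes Latin input defaults to Russian output when no target is given; `detect_source`, `slavic_source`, and `map_name` are hypothetical helpers, not the gem's API.

```ruby
# Ukrainian-only letters, taken from UKRAINIAN_ONLY_CHARS in the diff
# (the apostrophe entry is left out of this sketch).
UKRAINIAN_ONLY = %w(Ґ ґ І і Ї ї Є є).freeze

# Hypothetical helper: Ukrainian if any Ukrainian-only letter is present.
# Assumption: the whole text is checked, not just the first token.
def slavic_source(text)
  UKRAINIAN_ONLY.any? { |char| text.to_s.include?(char) } ? :ukrainian : :russian
end

# Hypothetical helper: guess the input language from the first non-blank token.
def detect_source(text)
  token = text.to_s.split(/\s+/).reject(&:empty?).first
  # Ruby's \w is ASCII-only, so a Cyrillic token produces no match here.
  return :english if token.nil? || token.match?(/\w/)
  slavic_source(text)
end

# Hypothetical helper: build the "<source>_to_<target>" map name.
def map_name(text, target = nil)
  source = if target.nil?
             detect_source(text)
           elsif target == :english
             slavic_source(text)
           else
             :english
           end
  # Assumed fallback: Latin input goes to Russian, Cyrillic input to English.
  target ||= (source == :english ? :russian : :english)
  "#{source}_to_#{target}"
end

puts map_name("Отличный день")
puts map_name("Bible", :ukrainian)
```

An empty string falls through to `:english` as the source, which mirrors the nil check in `detect_input_language`.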
4 changes: 1 addition & 3 deletions translit.gemspec
@@ -1,7 +1,5 @@
# coding: utf-8

require 'rake'

Gem::Specification.new do |s|
s.name = %q{translit}
s.version = '0.1.5'
@@ -11,12 +9,12 @@ Gem::Specification.new do |s|
s.date = %q{2010-09-28}
s.description = %q{Transliteration between cyrillic <-> latin | Транслитерация между кириллицей и латиницей }
s.email = %q{tjbladez@gmail.com}
s.files = FileList['{bin,lib}/**/*', 'README.markdown'].to_a
s.has_rdoc = false
s.bindir = 'bin'
s.executables = %w{translit}
s.default_executable = 'bin/translit'
s.homepage = %q{http://github.com/tjbladez/translit}
s.summary = %q{Transliteration between cyrillic <-> latin from command-line or your program | Транслитерация между кириллицей и латиницей с коммандной строки или в твоей программе}
s.post_install_message = %q{You are ready to transliterate | Вы готовы к транслитерации}
s.add_development_dependency "rspec"
end
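With `require 'rake'` and the `FileList` line removed, the hunk above appears to leave `s.files` unassigned. If the file list still needs to be set without depending on rake, one option is plain `Dir.glob`. This is a sketch under that assumption; `gem_files` is a hypothetical helper, and filtering out directories is a choice made here rather than something the original `FileList` call did.

```ruby
# Hypothetical rake-free stand-in for
#   FileList['{bin,lib}/**/*', 'README.markdown'].to_a
# Paths are returned relative to the given root, sorted for a stable order.
def gem_files(root = Dir.pwd)
  Dir.chdir(root) do
    code = Dir.glob('{bin,lib}/**/*').select { |path| File.file?(path) }
    docs = Dir.glob('README.markdown')
    (code + docs).sort
  end
end

# Possible use inside the gemspec (assumption, not part of this pull request):
#   s.files = gem_files(File.dirname(__FILE__))
```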