Update to Unicode 15.0.0 #92

Merged
merged 7 commits into from
Oct 11, 2022
2 changes: 1 addition & 1 deletion README.md
@@ -12,7 +12,7 @@ This repository provides packages to use the

The Haskell data structures are generated programmatically from the UCD files.
The latest Unicode version supported by these libraries is
[`14.0.0`](https://www.unicode.org/versions/Unicode14.0.0/).
[`15.0.0`](https://www.unicode.org/versions/Unicode15.0.0/).

### `unicode-data`

@@ -22,6 +22,7 @@ does not match the version of this package.
| 9.0.[1-2] | 4.15.0 | 12.1 |
| 9.2.[1-4] | 4.16.0 | 14.0 |
| 9.4.[1-2] | 4.17.0 | 14.0 |
| 9.6.1 | 4.18.0 | 15.0 |
+-------------+----------------+-----------------+
-}
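
The table rows above can be read as a lookup from compiler release to Unicode version. The following is a minimal sketch of that lookup, assuming the three columns are GHC version, `base` version, and Unicode version (the header row lies outside this hunk). The function name `unicode_version_for_ghc` is hypothetical and only the four visible rows are encoded; earlier rows are elided in the diff and are not guessed here.

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository: map a GHC version string
# to the Unicode version of its bundled `base`, using only the table rows
# visible in this hunk. Assumes the columns are GHC | base | Unicode.
unicode_version_for_ghc() {
    case "$1" in
        9.0.[1-2]) echo "12.1" ;;
        9.2.[1-4]) echo "14.0" ;;
        9.4.[1-2]) echo "14.0" ;;
        9.6.1)     echo "15.0" ;;
        # Anything else is outside the visible rows.
        *)         echo "unknown"; return 1 ;;
    esac
}
```

With GHC installed, it could be called as `unicode_version_for_ghc "$(ghc --numeric-version)"`. The row patterns from the table happen to be valid shell glob patterns, which is why they can be reused verbatim in the `case` arms.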

6 changes: 3 additions & 3 deletions experimental/unicode-data-text/unicode-data-text.cabal
@@ -68,7 +68,7 @@ library
build-depends:
base >= 4.7 && < 4.18,
text >= 1.2.4 && < 2.1,
unicode-data >= 0.3 && < 0.4
unicode-data >= 0.3 && < 0.5

test-suite test
import: default-extensions, compile-options
@@ -85,7 +85,7 @@ test-suite test
unicode-data-text
build-tool-depends:
hspec-discover:hspec-discover >= 2.0 && < 2.11
if impl(ghc >= 9.2.1)
if impl(ghc >= 9.5.1)
cpp-options: -DCOMPATIBLE_GHC_UNICODE
default-language: Haskell2010

@@ -100,6 +100,6 @@ benchmark bench
tasty-bench >= 0.2.5 && < 0.4,
tasty >= 1.4.1,
text >= 1.2.4 && < 2.1,
unicode-data >= 0.3 && < 0.4,
unicode-data >= 0.3 && < 0.5,
unicode-data-text
ghc-options: -O2 -fdicts-strict -rtsopts
3 changes: 3 additions & 0 deletions stack.yaml
@@ -2,6 +2,9 @@ resolver: lts-18.18
packages:
- './unicode-data'
- './unicode-data-names'
- './unicode-data-scripts'
- './unicode-data-security'
- './experimental/unicode-data-text'
extra-deps:
- streamly-0.8.0
flags:
38 changes: 19 additions & 19 deletions ucd.sh
@@ -5,7 +5,7 @@
# we used to generate them earlier are exactly the same as the ones we are
# downloading. To ensure that, verification of the checksum is necessary.

VERSION=14.0.0
VERSION=15.0.0

# When downloading a fresh new version, comment this out
VERIFY_CHECKSUM=y
@@ -14,29 +14,29 @@ VERIFY_CHECKSUM=y
UCD_URL="https://www.unicode.org/Public/$VERSION/ucd"
# Filename:checksum
UCD_FILES="\
Blocks.txt:598870dddef7b34b5a972916528c456aff2765b79cd4f9647fb58ceb767e7f17 \
CaseFolding.txt:a566cd48687b2cd897e02501118b2413c14ae86d318f9abbbba97feb84189f0f \
DerivedCoreProperties.txt:e3eddd7d469cd1b0feed7528defad1a1cc7c6a9ceb0ae4446a6d10921ed2e7bc \
DerivedNormalizationProps.txt:b2c444c20730b097787fdf50bd7d6dd3fc5256ab8084f5b35b11c8776eca674c \
NameAliases.txt:14b3b677d33f95c51423dce6eef4a6a28b4b160451ecedee4b91edb6745cf4a3 \
PropertyValueAliases.txt:eb755757e20b72b330b2948df3cf2ff7adb0e31bb060140dc09dafb132ace2cd \
PropList.txt:6bddfdb850417a5bee6deff19290fd1b138589909afb50f5a049f343bf2c6722 \
Scripts.txt:52db475c4ec445e73b0b16915448c357614946ad7062843c563e00d7535c6510 \
ScriptExtensions.txt:d37eedf63ff9c48bac863d5f76862373d6cf5269fd21253d499e2430d638c01d \
SpecialCasing.txt:c667b45908fd269af25fd55d2fc5bbc157fb1b77675936e25c513ce32e080334 \
UnicodeData.txt:36018e68657fdcb3485f636630ffe8c8532e01c977703d2803f5b89d6c5feafb \
extracted/DerivedCombiningClass.txt:12b0c3af9b600b49488d66545a3e7844ea980809627201bf9afeebe1c9f16f4e \
extracted/DerivedName.txt:fef3e11514ba152f0d38a09f8018c03a825f846dbb912334c1e5c9fb29392a02 \
extracted/DerivedNumericValues.txt:11075771b112e8e7ccf6ffa637c4c91eadc3ef3db0517b24e605df8fd3624239"
Blocks.txt:529dc5d0f6386d52f2f56e004bbfab48ce2d587eea9d38ba546c4052491bd820 \
CaseFolding.txt:cdd49e55eae3bbf1f0a3f6580c974a0263cb86a6a08daa10fbf705b4808a56f7 \
DerivedCoreProperties.txt:d367290bc0867e6b484c68370530bdd1a08b6b32404601b8c7accaf83e05628d \
DerivedNormalizationProps.txt:d5687a48c95c7d6e1ec59cb29c0f2e8b052018eb069a4371b7368d0561e12a29 \
NameAliases.txt:3e39509e8fae3e5d50ba73759d0b97194501d14a9c63107a6372a46b38be18e8 \
PropertyValueAliases.txt:13a7666843abea5c6b7eb8c057c57ab9bb2ba96cfc936e204224dd67d71cafad \
PropList.txt:e05c0a2811d113dae4abd832884199a3ea8d187ee1b872d8240a788a96540bfd \
Scripts.txt:cca85d830f46aece2e7c1459ef1249993dca8f2e46d51e869255be140d7ea4b0 \
ScriptExtensions.txt:7e07313d9d0bee42220c476b64485995130ae30917bbcf7780b602d677d7e33f \
SpecialCasing.txt:78b29c64b5840d25c11a9f31b665ee551b8a499eca6c70d770fcad7dd710f494 \
UnicodeData.txt:806e9aed65037197f1ec85e12be6e8cd870fc5608b4de0fffd990f689f376a73 \
extracted/DerivedCombiningClass.txt:ca54f6360cd288ad92113415bf1f77749015abe11cbd6798d21f7fa81f04205d \
extracted/DerivedName.txt:f76288153e20de185a40f7ee6e0e365f3c6c80e9e3019b5aa0afc8ac2c1b15f2 \
extracted/DerivedNumericValues.txt:6bd30f385f3baf3ab5d5308c111a81de87bea5f494ba0ba69e8ab45263b8c34d"

# Security files (https://www.unicode.org/Public/security/$VERSION/$file)
SECURITY_URL="https://www.unicode.org/Public/security/$VERSION"
# Filename:checksum
SECURITY_FILES="\
IdentifierStatus.txt:3f3f368fccdb37f350ecedc20b37fa71ab31c04e847884c77780d34283539f73 \
IdentifierType.txt:45a150c23961b58d7784704af6c4daccd6517d97b6489e53d13bbdbf9e4f065f \
confusables.txt:f901938af166c3afa471bd10c224b0979cd024340f290649e16b29f779d48bfe \
intentional.txt:42243c12a2e20546e836576e3091a5a5db2c1fc506899b1d8b56f7b6eab77cb3"
IdentifierStatus.txt:fd5c5e510914a2018e092bc51ea653bd2bfcf7daa116a346f09179a0f74704b0 \
IdentifierType.txt:71e95d5811999776a39c33a9149e5bf3c3311217a36b89005c678f34f08debc0 \
confusables.txt:2b10130885c3370b101c52d7baedc452ab7f0e257b86c1e52ee657ecfc29ce64 \
intentional.txt:4550bcc406b5ce3b1a40ff857a3f8b703ea0c868c35f2f7c93d86bfb733215f9"
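
The `Filename:checksum` lists above pair each UCD file with an expected digest. The following is a minimal sketch of how one such entry could be checked, not the code from `ucd.sh` itself. It assumes the digests are SHA-256 (the 64 hexadecimal characters are consistent with that) and that `sha256sum` is available on `PATH`. The function names `verify_entry` and `verify_all` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not taken from ucd.sh: check one "Filename:checksum"
# entry against a file stored under a given directory.
# Assumption: the digest is SHA-256 and `sha256sum` is installed.
verify_entry() {
    dir="$1"
    entry="$2"
    file="${entry%%:*}"      # text before the first ':'
    expected="${entry#*:}"   # text after the first ':'
    # A missing file yields an empty digest, which is reported as a mismatch.
    actual="$(sha256sum "$dir/$file" 2>/dev/null | cut -d' ' -f1)"
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    echo "checksum mismatch for $file" >&2
    echo "  expected: $expected" >&2
    echo "  actual:   $actual" >&2
    return 1
}

# Hypothetical driver: verify every entry passed as a separate argument and
# report failure if any single entry fails.
verify_all() {
    dir="$1"
    shift
    rc=0
    for entry in "$@"; do
        verify_entry "$dir" "$entry" || rc=1
    done
    return "$rc"
}
```

Assuming the downloaded files live in a local directory such as `ucd`, a call like `verify_all ucd $UCD_FILES` (left unquoted on purpose so the list splits on whitespace) would check every entry, including those under `extracted/`. Checking all entries before returning, rather than stopping at the first mismatch, makes it easier to see every file that changed between Unicode versions.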

# Download the files

4 changes: 4 additions & 0 deletions unicode-data-names/Changelog.md
@@ -1,5 +1,9 @@
# Changelog

## 0.2.0 (September 2022)

- Update to [Unicode 15.0.0](https://www.unicode.org/versions/Unicode15.0.0/).

## 0.1.0 (June 2022)

- Initial release
2 changes: 1 addition & 1 deletion unicode-data-names/README.md
@@ -7,7 +7,7 @@ character names and aliases from the
The Haskell data structures are generated programmatically from the
Unicode character database (UCD) files. The latest Unicode version
supported by this library is
[`14.0.0`](https://www.unicode.org/versions/Unicode14.0.0/).
[`15.0.0`](https://www.unicode.org/versions/Unicode15.0.0/).

Please see the
[Haddock documentation](https://hackage.haskell.org/package/unicode-data-names)
4 changes: 2 additions & 2 deletions unicode-data-names/lib/Unicode/Char/General/Names.hs
@@ -6,7 +6,7 @@
-- Stability : experimental
--
-- Unicode character names and name aliases.
-- See Unicode standard 14.0.0, section 4.8.
-- See Unicode standard 15.0.0, section 4.8.
--
-- @since 0.1.0

@@ -84,7 +84,7 @@ nameAliasesWithTypes
= fmap (fmap (fmap unpack))
. NameAliases.nameAliasesWithTypes

-- Note: names are ASCII. See Unicode Standard 14.0.0, section 4.8.
-- Note: names are ASCII. See Unicode Standard 15.0.0, section 4.8.
{-# INLINE unpack #-}
unpack :: CString -> String
unpack = unsafePerformIO . peekCAString

Large diffs are not rendered by default.

@@ -1,4 +1,4 @@
-- autogenerated from https://www.unicode.org/Public/14.0.0/ucd/NameAliases.txt
-- autogenerated from https://www.unicode.org/Public/15.0.0/ucd/NameAliases.txt
-- |
-- Module : Unicode.Internal.Char.UnicodeData.NameAliases
-- Copyright : (c) 2022 Composewell Technologies and Contributors
@@ -17,7 +17,7 @@ import Data.Maybe (fromMaybe)
import Foreign.C.String (CString)
import GHC.Exts (Ptr(..))

-- | Type of name alias. See Unicode Standard 14.0.0, section 4.8.
-- | Type of name alias. See Unicode Standard 15.0.0, section 4.8.
--
-- @since 0.1.0
data NameAliasType
@@ -86,7 +86,7 @@ nameAliasesWithTypes = \case
'\x0016' -> [(Control,[Ptr "SYNCHRONOUS IDLE\0"#]),(Abbreviation,[Ptr "SYN\0"#])]
'\x0017' -> [(Control,[Ptr "END OF TRANSMISSION BLOCK\0"#]),(Abbreviation,[Ptr "ETB\0"#])]
'\x0018' -> [(Control,[Ptr "CANCEL\0"#]),(Abbreviation,[Ptr "CAN\0"#])]
'\x0019' -> [(Control,[Ptr "END OF MEDIUM\0"#]),(Abbreviation,[Ptr "EOM\0"#])]
'\x0019' -> [(Control,[Ptr "END OF MEDIUM\0"#]),(Abbreviation,[Ptr "EOM\0"#,Ptr "EM\0"#])]
'\x001a' -> [(Control,[Ptr "SUBSTITUTE\0"#]),(Abbreviation,[Ptr "SUB\0"#])]
'\x001b' -> [(Control,[Ptr "ESCAPE\0"#]),(Abbreviation,[Ptr "ESC\0"#])]
'\x001c' -> [(Control,[Ptr "INFORMATION SEPARATOR FOUR\0"#,Ptr "FILE SEPARATOR\0"#]),(Abbreviation,[Ptr "FS\0"#])]
@@ -132,6 +132,7 @@ nameAliasesWithTypes = \case
'\x01a2' -> [(Correction,[Ptr "LATIN CAPITAL LETTER GHA\0"#])]
'\x01a3' -> [(Correction,[Ptr "LATIN SMALL LETTER GHA\0"#])]
'\x034f' -> [(Abbreviation,[Ptr "CGJ\0"#])]
'\x0616' -> [(Correction,[Ptr "ARABIC SMALL HIGH LIGATURE ALEF WITH YEH BARREE\0"#])]
'\x061c' -> [(Abbreviation,[Ptr "ALM\0"#])]
'\x0709' -> [(Correction,[Ptr "SYRIAC SUBLINEAR COLON SKEWED LEFT\0"#])]
'\x0cde' -> [(Correction,[Ptr "KANNADA LETTER LLLA\0"#])]
@@ -149,6 +150,7 @@ nameAliasesWithTypes = \case
'\x180d' -> [(Abbreviation,[Ptr "FVS3\0"#])]
'\x180e' -> [(Abbreviation,[Ptr "MVS\0"#])]
'\x180f' -> [(Abbreviation,[Ptr "FVS4\0"#])]
'\x1bbd' -> [(Correction,[Ptr "SUNDANESE LETTER ARCHAIC I\0"#])]
'\x200b' -> [(Abbreviation,[Ptr "ZWSP\0"#])]
'\x200c' -> [(Abbreviation,[Ptr "ZWNJ\0"#])]
'\x200d' -> [(Abbreviation,[Ptr "ZWJ\0"#])]
6 changes: 3 additions & 3 deletions unicode-data-names/test/Unicode/Char/General/NamesSpec.hs
@@ -28,7 +28,7 @@ spec = do
name '\x1f41d' `shouldBe` Just "HONEYBEE"
-- Name generated using pattern (example from UCD file)
name '\x2f89f' `shouldBe` Just "CJK COMPATIBILITY IDEOGRAPH-2F89F"
-- Last name defined, as of Unicode 14.0.0
-- Last name defined, as of Unicode 15.0.0
name '\xe01ef' `shouldBe` Just "VARIATION SELECTOR-256"
name maxBound `shouldBe` Nothing
it "correctedName: Test some characters" do
@@ -48,7 +48,7 @@ spec = do
correctedName '\x1f41d' `shouldBe` Just "HONEYBEE"
-- Name generated using pattern (example from UCD file)
correctedName '\x2f89f' `shouldBe` Just "CJK COMPATIBILITY IDEOGRAPH-2F89F"
-- Last name defined, as of Unicode 14.0.0
-- Last name defined, as of Unicode 15.0.0
correctedName '\xe01ef' `shouldBe` Just "VARIATION SELECTOR-256"
correctedName maxBound `shouldBe` Nothing
it "nameOrAlias: Test some characters" do
@@ -68,7 +68,7 @@ spec = do
nameOrAlias '\x1f41d' `shouldBe` Just "HONEYBEE"
-- Name generated using pattern (example from UCD file)
nameOrAlias '\x2f89f' `shouldBe` Just "CJK COMPATIBILITY IDEOGRAPH-2F89F"
-- Last name defined, as of Unicode 14.0.0
-- Last name defined, as of Unicode 15.0.0
nameOrAlias '\xe01ef' `shouldBe` Just "VARIATION SELECTOR-256"
nameOrAlias maxBound `shouldBe` Nothing
it "Every defined character has at least a name or an alias" do
6 changes: 3 additions & 3 deletions unicode-data-names/unicode-data-names.cabal
@@ -1,6 +1,6 @@
cabal-version: 2.2
name: unicode-data-names
version: 0.1.0
version: 0.2.0
synopsis: Unicode characters names and aliases
description:
@unicode-data-names@ provides Haskell APIs to access the Unicode
@@ -9,7 +9,7 @@ description:
.
The Haskell data structures are generated programmatically from the UCD files.
The latest Unicode version supported by this library is
@<https://www.unicode.org/versions/Unicode14.0.0/ 14.0.0>@.
@<https://www.unicode.org/versions/Unicode15.0.0/ 15.0.0>@.
homepage: http://github.com/composewell/unicode-data
bug-reports: https://github.com/composewell/unicode-data/issues
license: Apache-2.0
@@ -96,7 +96,7 @@ test-suite test
build-depends:
base >= 4.7 && < 4.18
, hspec >= 2.0 && < 2.11
, unicode-data
, unicode-data >= 0.4 && < 0.5
, unicode-data-names
build-tool-depends:
hspec-discover:hspec-discover >= 2.0 && < 2.11
4 changes: 4 additions & 0 deletions unicode-data-scripts/Changelog.md
@@ -1,5 +1,9 @@
# Changelog

## 0.2.0 (September 2022)

- Update to [Unicode 15.0.0](https://www.unicode.org/versions/Unicode15.0.0/).

## 0.1.0 (September 2022)

Initial release
2 changes: 1 addition & 1 deletion unicode-data-scripts/README.md
@@ -7,7 +7,7 @@ character [scripts](https://www.unicode.org/reports/tr24/) from the
The Haskell data structures are generated programmatically from the
Unicode character database (UCD) files. The latest Unicode version
supported by this library is
[`14.0.0`](https://www.unicode.org/versions/Unicode14.0.0/).
[`15.0.0`](https://www.unicode.org/versions/Unicode15.0.0/).

Please see the
[Haddock documentation](https://hackage.haskell.org/package/unicode-data-scripts)