more updates for 15.1 data workflow

unicode-org · Feb 7, 2023 · 5f2992b
1 parent 3ab01ec
Showing 4 changed files with 50 additions and 44 deletions.
diff --git a/docs/build.md b/docs/build.md
@@ -217,9 +217,7 @@ See the top level `pom.xml` under `<properties>`.
 
 The input data files for the Unicode Tools are checked into the repo since
 2012-dec-21:
-
-*   <https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/ucd>
-*   <https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/ucd>
+*   https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/
 
 This is inside the unicodetools file tree, and the Java code has been updated to
 assume that. Any old Eclipse setup needs its path variables checked.
@@ -242,7 +240,9 @@ Starting with Unicode 15, we are developing most of the Unicode data files
 in this Unicode Tools project, and publish them to the Public folder
 only for alpha/beta/final releases.
 That is, we are reversing the flow of files.
-(See [issue #144](https://github.com/unicode-org/unicodetools/issues/144).)
+
+See [data workflow](data-workflow.md). (Based on
+[issue #144](https://github.com/unicode-org/unicodetools/issues/144).)
 
 We are also no longer generating and posting files with version suffixes.
 (We now generate files into an output folder with the Unicode version number.)
@@ -255,7 +255,7 @@ unversioned "dev" folders in this repo.
 
 #### Unicode 15.1+ workflow
 
-See data-workflow.md .
+See [data workflow](data-workflow.md).
 
 ### Unicode 15.0.0 changes
 
@@ -374,10 +374,10 @@ to generate new files). For all the new ones:
 Make a pull request to incorporate these updates, and upload the generated files
 in a way that can be shared with ucd-dev.
 
-Unicode 15 TODO:
-We plan to
+Unicode 15+:
 - make a commit for changes in input data files
 - copy the output files back into the input folders, review, and commit again
+
 ... instead of posting draft files elsewhere and re-ingesting them later.
 
 Ideally, diff the files to check for any discrepancies. The script will do this
@@ -530,13 +530,16 @@ If there are new break rules (or changes), see
     Unicode.
 4.  On Windows you can run these BATs to compare files: TODO??
 
-### Upload for Ken Whistler & editorial committee
+### Upload for Ken Whistler & other reviewers
 
-Unicode 15 TODO: See above; commit new input data, run tools, review output, copy back to input, commit, pull request...
+Unicode 15+: See above; commit new input data, run tools, review output, copy back to input, commit, pull request...
 
 1.  Check diffs for problems
-2.  First drop for a version: Upload **all** files
-3.  Subsequent drop for a version: Upload *only modified* files
+2.  Ask for reviews on the pull request.
+3.  For & during alpha & beta we publish whole snapshots of multiple repo data folders
+    using publication scripts: See [data workflow](data-workflow.md).
+
+We no longer post files to FTP folders, nor publish individual files without consistent changes in others.
 
 ### Invariant Checking
 

diff --git a/docs/inputdata.md b/docs/inputdata.md
@@ -6,7 +6,9 @@ Starting with Unicode 15, we are developing most of the Unicode data files
 in this Unicode Tools project, and publish them to the Public folder
 only for alpha/beta/final releases.
 That is, we are reversing the flow of files.
-(See [issue #144](https://github.com/unicode-org/unicodetools/issues/144).)
+
+See [data workflow](data-workflow.md). (Based on
+[issue #144](https://github.com/unicode-org/unicodetools/issues/144).)
 
 We are also no longer generating and posting files with version suffixes.
 
@@ -15,6 +17,34 @@ and we continue to ingest them as before.
 
 ## Source Files
 
+*Starting with Unicode 15.1, the “source of truth” for most data files is in the repo,
+and most of this section is obsolete. See [data workflow](data-workflow.md).
+The biggest exception is Unihan.zip, which we don't track in the repo; see the Unihan section below.
+Also, it's still useful to delete the BIN files/folders after changing data files.*
+
+### Unihan
+
+You may need to manually change the "Unihan-8.0.0d2 Folder" to "Unihan".
+
+Unzip the Unihan.zip file into a "Unihan" subfolder.
+
+Starting with Unicode 13, we split the Unihan data into single-property files
+and parse those.
+
+Run the script that is checked in at
+[py/splitunihan.py](../py/splitunihan.py)
+with one argument, the path to the Unihan folder.
+
+Ignore or delete the Unihan\*.txt files now. Do not check them into the tools
+any more.
+
+Check for new and no-longer-present files (Unihan properties).
+`git add` and `git rm` as necessary.
+
+### Fetching files from Public
+
+Only for Unicode 15.0 and earlier:
+
 The source files that you will need for a release such as 8.0.0 are in:
 
 *   [ftp://unicode.org/Public/8.0.0/ucd](ftp://unicode.org/Public/8.0.0/ucd)
@@ -68,6 +98,7 @@ files have the version suffix.
 ### Removing Suffixes
 
 Only for Unicode 14 and earlier:
+
 For the ucd and uca files, you will have to remove the suffixes.
 
 Tip: On Linux, you can remove version suffixes on the command line like this:
@@ -134,25 +165,6 @@ $ cd {workspace}/unicodetools/data/ucd/staging
 $ ../../desuffixucd.py .
 ```
 
-### Unihan
-
-You may need to manually change the "Unihan-8.0.0d2 Folder" to "Unihan".
-
-Unzip the Unihan.zip file into a "Unihan" subfolder.
-
-Starting with Unicode 13, we split the Unihan data into single-property files
-and parse those.
-
-Run the script that is checked in at
-[py/splitunihan.py](../py/splitunihan.py)
-with one argument, the path to the Unihan folder.
-
-Ignore or delete the Unihan\*.txt files now. Do not check them into the tools
-any more.
-
-Check for new and no-longer-present files (Unihan properties).
-`git add` and `git rm` as necessary.
-
 ## Original data file setup instructions
 
 ### 2. Download all of the UnicodeData files for each version into UCD_DIR.

diff --git a/docs/security.md b/docs/security.md
@@ -2,13 +2,6 @@
 
 ## Modifying
 
-Create new revision directory, such as .../unicodetools/data/security/6.3.0. The
-folder will match the version of the UCD used (perhaps with an incrementing 3rd
-field).
-
-*   As usual, use `git cp` to copy the previous directory to the new one. Do not
-    just "mkdir" and copy the files!
-
 To add or fix xidmodifications, look at source/removals.txt.
 
 To add or fix confusables, there are multiple source files. Many were

diff --git a/docs/uca/index.md b/docs/uca/index.md
@@ -7,11 +7,12 @@ the character properties are pretty stable (coming up on the beta),
 Ken inserts all of the new characters into the default sort order.
 
 For a few releases, he has documented his incremental progress with valuable notes
-sent to the ucd-dev mailing list.
+sent to the properties mailing list (formerly the ucd-dev list).
 Markus has been taking the incremental file changes, and the notes, into this repo.
 
 See the history of commits that changed decomps.txt and allkeys.txt.
 (We lost some of that history in the Unicode server crash of 2020.)
+-   For UCA 15.1 see https://github.com/unicode-org/unicodetools/pull/403
 -   For UCA 15 see https://github.com/unicode-org/unicodetools/pull/246
 -   For UCA 14 see https://github.com/unicode-org/unicodetools/pull/71
 -   For the collection of notes for UCA 10 see ducet.md.
@@ -34,12 +35,9 @@ for the CLDR/ICU FractionalUCA.txt data.
 2.  We also need the UCA/DUCET files in
     https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/uca/dev
     When they become first available for a new version, or when they are updated:
-    1.  Note that the following steps are probably no longer necessary.
-        Instead, we get the updated files from Ken, or we run the sifter tool, and
+    1.  We get the updated files from Ken, or we run the sifter tool, and
         update the files in .../data/uca/dev.
-    1.  Download UCA files (mostly allkeys.txt) from
-        `https://www.unicode.org/Public/UCA/{beta version}/`
-    1.  Run `desuffixucd.py` (see the [inputdata](../inputdata.md) page)
+    1.  Download Ken's UCA files (allkeys.txt & decomps.txt).
     1.  Update the input files for the UCA tools, at
         {this repo}/unicodetools/data/uca/dev
 3.  You will use `org.unicode.text.UCA.Main` as your main class.