Write a Dart Script Inputs VRI Tipitaka XML Files and Outputs SQL Files For Inserting Into SQLite DB #219
Comments
You have made some great progress all on your own. The simple field in toc is no longer used. I will remove that. It would be good to do a call on Google Meet.
Missing añña books: prioritize a Saṃgāyana pucchā book or a missing Ledi Sayadaw book. I'll start with "Patanudessa". Getting it into our system is the priority; focus on Myanmar paragraph numbers and real pages.
Reorganize tpr_downloads to have a release directory where we put a .zip with the SQL file for importing the añña texts (later the whole VRI Tipitaka).
TODOS Iuliu:
Bhante Subhuti:
I sent a request to Janaka for the VRI codes.
Anudīpanīpāṭha was suggested, and if you want to do a Saṃgāyana pucchā book, you can also do that. You can choose which one. The Ledi Sayadaw book will probably be easier.
The message I got back was this: "I don't think the codes are documented anywhere. At least I haven't seen. Also I don't think the codes are too complicated to understand studying oneself, which is what I did. When you go through looking at the XML file you will intuitively understand what the codes mean. Of course if he has any questions I would be happy to answer as well. The problem with making documentation is that I will have to go through a file and try to understand them again since I have forgotten all of it. So it is best to ask questions when you have and I will be happy to answer." He is on Facebook under the name Path Nirvana, so you can send him a message if you need help. I think this is the link here
Hi @bksubhuti, I read the TPR Downloads repo more carefully and I understand now that all of the SQL files there are for importing extensions, not the main Tipitaka texts. I wonder how the Tipitaka texts currently available in Tipitaka Pali Reader were imported. Are there SQL files, or code used to generate SQL files for them, available anywhere? Or were they imported manually somehow? Or maybe you reused the already loaded database from the Myanmar-only app? In any case, I dumped the current DB data to SQL to get started. I followed up on this issue in this PR draft: bksubhuti/tpr_downloads#2. Let's continue the conversation there.
I will forward a message to @pndaza. He is more familiar with the format. Hopefully he can comment and answer your questions. It is important, even critical, to have the page breaks match his page breaks, especially for the main texts. The main texts are a great tool for a learning exercise rather than the añña books, which don't have links in them. The original design, made several years ago, has the three top-level categories hard-coded. This has caused some issues with searching and we would like to fix it. I thought we had an issue to fix this, but I cannot seem to find it. If you go to book_list_page.dart you will find the correct codes for the topmost level categories. I'm going to breakfast now, but I think you are exceeding my knowledge of the texts now. Great job. I'll try to send Ven. pndaza to your PR and also merge this. You also have direct access to push.
Note to self and update on progress:
Original request from Bhante Subhuti:
Here is what I understand the task to be:
Write a Dart script which processes each XML file of the Roman script version of the Tipitaka provided by VRI and outputs an SQL file for importing the book into the SQLite DB.
Each XML file maps to a book in the `books` table, each book is related to one category in the `category` table, each book is related to many pages in the `pages` table, and each book is related to a toc in the `tocs` table.
You do not mention the `paragraphs` table here, should that be written to as well?
I would add the "simple" field to what table? The `pages` table? It has the "content" text field.
What specific XML codes are you referring to? In the XML files all I see are paragraph numbers like:
I don't see information about pages, alt readings, and other book pages and paragraphs.
Where is the program that made the SQL files for importing the books like the one you shared?
The SQL file for importing the chanting book you shared: iit_chantingbook.sql.txt
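As a rough illustration of the output side of the task described above, here is a minimal sketch that turns a list of pages into an SQL script and checks that SQLite accepts it. It is written in Python only to keep it short and dependency-free; the requested script would be in Dart. The column names (`bookid`, `page`, `content`), the book id `demo_book`, and the sample page text are assumptions for illustration, not the real `pages` schema or real VRI content.

```python
import sqlite3


def sql_quote(text: str) -> str:
    """Quote a string as an SQL literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def page_insert(book_id: str, page: int, content: str) -> str:
    # Hypothetical column names; check the real `pages` schema before use.
    return (
        "INSERT INTO pages (bookid, page, content) VALUES "
        f"({sql_quote(book_id)}, {page}, {sql_quote(content)});"
    )


def book_to_sql(book_id, pages):
    """Build one SQL script from an iterable of (page_number, content) pairs."""
    lines = ["BEGIN TRANSACTION;"]
    lines += [page_insert(book_id, number, text) for number, text in pages]
    lines.append("COMMIT;")
    return "\n".join(lines)


# Made-up sample pages; the second one contains a quote that must be escaped.
sample_pages = [(1, "<p>first sample page</p>"), (2, "<p>it's a test</p>")]
script = book_to_sql("demo_book", sample_pages)

# Load the generated script into a throwaway in-memory database to verify it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (bookid TEXT, page INTEGER, content TEXT)")
conn.executescript(script)
rows = conn.execute("SELECT page, content FROM pages ORDER BY page").fetchall()
print(rows)
```

Wrapping the inserts in a single transaction is a design choice that usually makes large imports into SQLite much faster than committing row by row.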
Log Of What I've Investigated So Far (These are more notes to myself)
I downloaded the app "Tipitaka Pali Reader" from the App Store on my macOS desktop, and found the SQLite .db file at `/Users/iulspop/Library/Containers/org.americanmonk.tpp/Data/Documents/tipitaka_pali.db`.
I also learned I can clone this repo and run a command to download the unsplit `tipitaka_pali.db` file.
I downloaded the DB Browser for SQLite to explore the schema in a GUI.
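The same schema exploration can also be scripted. The sketch below (Python, standard library only) lists every user table and its columns. It runs against a small stand-in database created in memory; the two tables and their columns are made up for the demonstration and are not the real Tipitaka Pali Reader schema.

```python
import sqlite3


def list_tables(conn):
    """Return {table_name: [column names]} for every user table."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    schema = {}
    for name in names:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        schema[name] = [column[1] for column in columns]
    return schema


# Stand-in database with hypothetical tables, so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id TEXT, name TEXT)")
conn.execute("CREATE TABLE pages (id INTEGER, bookid TEXT, content TEXT)")

schema = list_tables(conn)
for table, columns in schema.items():
    print(table, columns)
```

To inspect the real database, replace `":memory:"` with the path to `tipitaka_pali.db` and drop the two `CREATE TABLE` lines.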
I then looked at the structure of the VRI .xml files.
It looks like for each of the seven Abhidhamma Piṭaka books there's a `.att.xml` file for the "aṭṭhakathā" or commentary, a `.tik.xml` file for the "mūlaṭīkā" or sub-commentary, and a `.mul.xml` file for the book.
I don't understand what the `.nrf.xml` files are. Some are `anuṭīkā` texts, which I think means "sub-sub-commentary"? Others are not from the "Abhidhammapiṭake" nikaya but from other nikayas like "Abhidhammāvatāra-purāṇaṭīkā", or don't have a nikaya attribute at all but only a book title like "Abhidhammatthasaṅgaho". I suppose they're additional texts not part of the Pali Canon?
I found this "Essence of the Tipitaka" document by VRI a good reference for understanding what texts these various .xml files refer to: https://www.tipitaka.org/eot
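The suffix convention observed above could be captured in a small lookup, sketched here in Python. The labels simply restate my current understanding from these notes, the `.nrf.xml` label is deliberately vague because its meaning is still unclear, and the file names in the demonstration are made up.

```python
# Suffix -> kind of text, as understood so far (not an authoritative mapping).
KINDS = {
    ".mul.xml": "book (mūla)",
    ".att.xml": "commentary (aṭṭhakathā)",
    ".tik.xml": "sub-commentary (ṭīkā)",
    ".nrf.xml": "other or unclear",
}


def classify(filename: str) -> str:
    """Return the kind of text suggested by the file name's suffix."""
    for suffix, kind in KINDS.items():
        if filename.endswith(suffix):
            return kind
    return "unknown"


# Hypothetical file names, only to exercise the function.
for name in ["example1.mul.xml", "example1.att.xml", "example1.nrf.xml", "notes.txt"]:
    print(name, "->", classify(name))
```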
I'm starting to see a structure.
There's an `abh` series of XML files which contain the Abhidhamma Piṭaka, its commentaries and sub-commentaries, and additional related texts.
There's an `e` series of files that seem to be extra Pali books outside the Tipiṭaka.
There's an `s` series of files that are part of the Sutta Piṭaka and its commentaries and sub-commentaries.
Then there's a `vin` series of files that are part of the Vinaya Piṭaka.
The XML files have these elements (I haven't gotten a comprehensive list yet):
head
dix
p: paragraph
pb: page break
teiHeader
text
hi: highlighted
note
`p` elements often have a `rend` attribute, like:
centre
nikaya
title
book
subsubhead
gatha1
gathalast
subhead
bodytext
indent
gatha2
gatha3
chapter
unindented
hangnum
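Since the lists above were collected by hand and are not comprehensive, a short script could produce complete ones. This is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`; it counts every element name and every `rend` value on `p` elements. The sample XML is a made-up fragment that only borrows element and `rend` names from the lists above, and it is not real VRI markup.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def survey(xml_text: str):
    """Count element names and the rend values found on <p> elements."""
    root = ET.fromstring(xml_text)
    tags = Counter()
    rends = Counter()
    for element in root.iter():  # includes the root element itself
        tags[element.tag] += 1
        if element.tag == "p" and "rend" in element.attrib:
            rends[element.attrib["rend"]] += 1
    return tags, rends


# Made-up fragment for demonstration only.
SAMPLE = """<text><body>
<p rend="centre">sample heading</p>
<p rend="bodytext">sample <hi>text</hi><pb/> continues</p>
<p rend="gatha1">sample verse</p>
<p rend="bodytext">sample <note>variant</note></p>
</body></text>"""

tags, rends = survey(SAMPLE)
print(sorted(tags.items()))
print(sorted(rends.items()))
```

For a real file, `ET.parse(path).getroot()` could replace `ET.fromstring`, and the counters could be summed across all files to get the full lists.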