Convert GEDCOM files to a JSON file or a JavaScript object and vice versa

gedcom.json

Live example: CodeSandbox

Description

This package can be used to parse a file or a string from GEDCOM format into an object. The object can be used for further data processing or just to create a JSON file.

⚠️ Conversion from JSON or JS to GEDCOM is currently under development ⚠️

A predefined yaml configuration file (GEDCOM Version 5.5.1) is already included for parsing the data. It is based on the original long names of the GEDCOM tags and should reflect the structure as closely as possible. It's used by default if no other configuration is given.

This file can also be used as a template if the target object should look different.

Why

This is just a side project to import (and later export) GEDCOM files into another project of mine, Visual Family Tree.

I am aware that there are already several parsers for the GEDCOM format. However, I have found few that parse directly into a JavaScript object, and with those you still have to parse or search the resulting object before you can use it for your own purpose. I want to avoid this extra parsing and searching by parsing directly into the target format.

Because you define yourself how the parsing is done, it is also possible to process files or lines that differ from the original GEDCOM format.

📁 There are 6 example GEDCOM files available, which I found on the internet and used for testing. All files can be found in the "examples" subfolder. Next to the GEDCOM files are the converted JSON files, which were created with this package.

How-To

Use

Via commandline

Just run npx ts-node src/console.ts with the desired flags. For example, "npm run demo:JSON" executes "ts-node src/console.ts --path 'examples/simpsons.ged'" and prints the Simpsons GEDCOM example file as a JSON object to the console. "npm run demoFile:JSON" does the same but writes the JSON object to a 'test.json' file.

| Flag | Description |
| --- | --- |
| --onlyStats | Only print the parsing statistics to the console |
| --opt xxx.yaml | Set the path to the yaml definition file |
| --out xxx.json | File path to print into |
| --path xxx.ged | Set the path to the GEDCOM file |
| --silent | Don't print anything to the console |
| --showProgress | Print the progress while processing the file |

Via Node or JS

In your JS/TS file you can import the parser via

import { JsonParsing, ParsingOptions } from "gedcom.json";

Create a new parsing options object.

let parsingOptions = new ParsingOptions();

Then set either the paths to the GEDCOM file and the config file, or the GEDCOM content and the config as strings.

File content:

 parsingOptions.SetFilePath("examples/simpsons.ged");
 parsingOptions.SetConfigFile("options/version551.yaml"); // optional! uses options/version551.yaml by default

String content:

 parsingOptions.SetText(`
0 HEAD
...
0 TRLR
`);
 parsingOptions.SetConfig(`
Definition:
...
`);

Create a new parsing object.

let parse = new JsonParsing(parsingOptions);

To get the result, there are two methods each for file and string content: one that returns a promise, and one that uses callbacks (file content) or returns the result synchronously (string content).

File content:

// async
parse.ParseFileAsync().then(result => { ... }).catch(e => {...});
// callbacks because the file is parsed line by line
parse.ParseFile(result => {...}, error => {...});

String content:

// async
parse.ParseTextAsync().then(result => { ... }).catch(e => {...});
// sync
parse.ParseText(); // also returns the result object

The result object has two properties: 'Object' and 'Statistics'.

The Object property contains the JavaScript object. It can be used directly or written to a file via

parse.SaveAs(result.Object, "test.json");

Statistics

| Property | Description |
| --- | --- |
| LinesCount | Count of all lines |
| ParsedLinesCount | Count of all lines that have been parsed |
| NotParsedLinesCount | Count of all lines that have NOT been parsed |
| NotParsedLinesList | List of the line numbers of all lines that were not parsed |
| NotParsedLinesWithoutGEDCOMTagCount | Count of all lines that have NOT been parsed because their tag is not defined in the yaml definition file |
| IncorrectLinesCount | Count of all incorrect lines (no tag, too long, etc.) |
| IncorrectLines | Array of objects describing the incorrect lines. Properties: LineNumber, Line and Text |
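
The statistics can be inspected programmatically after a parsing run. The following is only a sketch: the interface is inferred from the property table, the concrete types (for example `number[]` for NotParsedLinesList) are assumptions, and `summarizeStatistics` is a hypothetical helper that is not part of the package.

```typescript
// Shape inferred from the Statistics table; the types are assumptions.
interface IncorrectLineSketch {
  LineNumber: number;
  Line: string;
  Text: string;
}

interface ParsingStatisticsSketch {
  LinesCount: number;
  ParsedLinesCount: number;
  NotParsedLinesCount: number;
  NotParsedLinesList: number[];
  NotParsedLinesWithoutGEDCOMTagCount: number;
  IncorrectLinesCount: number;
  IncorrectLines: IncorrectLineSketch[];
}

// Hypothetical helper: builds a one-line summary of a parsing run.
function summarizeStatistics(stats: ParsingStatisticsSketch): string {
  const percent =
    stats.LinesCount === 0
      ? 100
      : Math.round((stats.ParsedLinesCount / stats.LinesCount) * 100);
  return (
    `${stats.ParsedLinesCount}/${stats.LinesCount} lines parsed (${percent}%), ` +
    `${stats.NotParsedLinesCount} skipped, ${stats.IncorrectLinesCount} incorrect`
  );
}

// Made-up sample values, only for illustration.
const sampleStatistics: ParsingStatisticsSketch = {
  LinesCount: 200,
  ParsedLinesCount: 190,
  NotParsedLinesCount: 10,
  NotParsedLinesList: [12, 13, 40, 41, 42, 77, 78, 120, 121, 199],
  NotParsedLinesWithoutGEDCOMTagCount: 8,
  IncorrectLinesCount: 2,
  IncorrectLines: [
    { LineNumber: 77, Line: "1", Text: "no tag" },
    { LineNumber: 199, Line: "x", Text: "no level" },
  ],
};

console.log(summarizeStatistics(sampleStatistics));
// 190/200 lines parsed (95%), 10 skipped, 2 incorrect
```

A summary like this is a quick way to check whether a custom definition file covers the tags of a given GEDCOM file.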

Create your own definition file

Structure

The configuration file has to begin with the Definition property, followed by multiple Tag definitions. Each Tag can have different properties, including further Tags. It's possible to define a flat list of Tags, or to define a Tag specifically beneath another Tag. 😆

When a line with a specific tag is parsed, the parser searches for the matching Tag definition.

NAME Test /Name/

will search for a matching Tag definition in the yaml definition

- Tag: NAME
  Property: Name

The search is always executed in the given context of the GEDCOM file.

Example:

0 @Abraham_Simpson@ INDI
1 NAME Abraham /Simpson/

Will first look for a definition of the NAME Tag beneath the INDI Tag.

- Tag: INDI
  ...
  Properties:
  - Tag: NAME
    Property: Fullname

If this is not defined, the search continues by going higher in the definition until it ends at the 'global' definition for the NAME Tag.

- Tag: NAME
  Property: Name

In this way, the NAME Tag is parsed by default into a property named Name, but beneath the INDI object it is parsed into the Fullname property.
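
The context-based search can be sketched as follows. This is an illustration of the described behaviour, not the package's actual implementation; `TagDefinitionSketch` and `findTagDefinition` are hypothetical names, and the walk "from the deepest context up to the global list" is an assumption derived from the description.

```typescript
// Minimal model of a Tag definition; only the fields needed for the lookup.
interface TagDefinitionSketch {
  Tag: string;
  Property?: string;
  Properties?: TagDefinitionSketch[];
}

// contextPath lists the enclosing tags from the outermost to the innermost,
// e.g. ["INDI"] for a level-1 line beneath an INDI record.
function findTagDefinition(
  rootDefinitions: TagDefinitionSketch[],
  contextPath: string[],
  tag: string
): TagDefinitionSketch | undefined {
  // Try the deepest context first, then drop one level at a time.
  for (let depth = contextPath.length; depth >= 0; depth--) {
    let scope: TagDefinitionSketch[] | undefined = rootDefinitions;
    for (const contextTag of contextPath.slice(0, depth)) {
      scope = scope?.find((d) => d.Tag === contextTag)?.Properties;
    }
    const match = scope?.find((d) => d.Tag === tag);
    if (match) {
      return match;
    }
  }
  return undefined;
}

const lookupDefinitions: TagDefinitionSketch[] = [
  { Tag: "INDI", Properties: [{ Tag: "NAME", Property: "Fullname" }] },
  { Tag: "NAME", Property: "Name" },
];

console.log(findTagDefinition(lookupDefinitions, ["INDI"], "NAME")?.Property); // Fullname
console.log(findTagDefinition(lookupDefinitions, ["FAM"], "NAME")?.Property); // Name
```

The second call falls back to the global NAME definition because nothing is defined for NAME beneath FAM.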

Tag definition

The simplest definition of a Tag looks like the following

- Tag: NAME			# Tag name matching the tag in the gedcom file
  Property: Name	# Name of the target property

Example:

| GEDCOM Line | JS Property |
| --- | --- |
| 1 NAME Abraham /Simpson/ | Name: "Abraham /Simpson/" |

But mostly there is no single value (e.g. a property) defined in GEDCOM files; the common case is an object in an array of objects.

This is done via the CollectAs property.

- Tag: INDI					# Tag name matching the tag in the gedcom file
  CollectAs: Individuals	# Name of the target collection property
- Tag: NAME					# Tag name matching the tag in the gedcom file
  Property: Name			# Name of the target property

Example:

GEDCOM lines (order matters), each followed by the JS result after that line:

0 @Abraham_Simpson@ INDI

{
  Individuals: []
}

1 NAME Abraham /Simpson/

{
  Individuals: [
    { Name: "Abraham /Simpson/" }
  ]
}

Extended example of the INDI Tag (all properties are explained below the example):

YAML Definition:

- Tag: INDI
  CollectAs: Individuals
  CollectAsArray: true
  Property: Id
  Properties:
  - Tag: NAME
    Property: Fullname
    Properties:
    - Tag: GIVN
      Property: Givenname
      MergeWithLast: INDI
    - Tag: SURN
      Property: Surname
      MergeWithLast: INDI

GEDCOM Lines

0 @Abraham_Simpson@ INDI
1 NAME Abraham /Simpson/
2 GIVN Abraham
2 SURN Simpson

Result

{
  Individuals: [
    {
      Id: "@Abraham_Simpson@",
      Surname: "Simpson",
      Givenname: "Abraham",
      Fullname: "Abraham /Simpson/"
    }
  ]
}

Tag Properties

CollectAs (String)

Name of the collection for all contained sub-objects. This is interpreted as an object path if it contains ".".

CollectAsArray (Boolean)

Default value is false.

If set to false or not set, the collection is an object if it contains only a single item, and an array if more than one object is collected.

If set to true, the collection will always be an array.
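
This rule can be sketched in a few lines of TypeScript. `collectInto` is a hypothetical helper written for this illustration only; it is not how the package is implemented internally.

```typescript
type CollectedItemSketch = Record<string, unknown>;

// Adds `item` to target[collectAs], following the CollectAsArray rule.
function collectInto(
  target: Record<string, unknown>,
  collectAs: string,
  item: CollectedItemSketch,
  collectAsArray = false
): void {
  const existing = target[collectAs];
  if (existing === undefined) {
    // First item: plain object unless an array is forced.
    target[collectAs] = collectAsArray ? [item] : item;
  } else if (Array.isArray(existing)) {
    existing.push(item);
  } else {
    // Second item: the single object is promoted to an array.
    target[collectAs] = [existing, item];
  }
}

const relaxedResult: Record<string, unknown> = {};
collectInto(relaxedResult, "Individuals", { Id: "@Abraham_Simpson@" });
console.log(JSON.stringify(relaxedResult));
// {"Individuals":{"Id":"@Abraham_Simpson@"}}

collectInto(relaxedResult, "Individuals", { Id: "@Homer_Simpson@" });
console.log(JSON.stringify(relaxedResult));
// {"Individuals":[{"Id":"@Abraham_Simpson@"},{"Id":"@Homer_Simpson@"}]}

const strictResult: Record<string, unknown> = {};
collectInto(strictResult, "Individuals", { Id: "@Abraham_Simpson@" }, true);
console.log(JSON.stringify(strictResult));
// {"Individuals":[{"Id":"@Abraham_Simpson@"}]}
```

Setting CollectAsArray to true is usually the safer choice when consuming code iterates over the collection, because the shape no longer depends on the number of records.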

Example:

YAML Config:

- Tag: INDI
  CollectAs: Individuals
  Property: Id

GEDCOM Lines (only one object)

0 @Abraham_Simpson@ INDI

Result

{
  Individuals: 
  {
    Id: "@Abraham_Simpson@"
  }
}

GEDCOM Lines (two objects) with same config

0 @Abraham_Simpson@ INDI
0 @Homer_Simpson@ INDI

Result

{
  Individuals: 
  [
    {
      Id: "@Abraham_Simpson@"
    },
    {
      Id: "@Homer_Simpson@"
    }
  ]
}

Set CollectAsArray to true.

YAML Config:

- Tag: INDI
  CollectAs: Individuals
  CollectAsArray: true
  Property: Id

GEDCOM Lines (only one object)

0 @Abraham_Simpson@ INDI

Result

{
  Individuals: 
  [
    {
      Id: "@Abraham_Simpson@"
    }
  ]
}

ConvertTo

Long form of Type property. Allows input of conversion options.

Array

Optional: Delimiter (default value is ',')

 Definition:
 - Tag: NOTE
   CollectAs: Notes
   Property: 
     Name: Value
     ConvertTo:
       Type: Array
       Delimiter: "#"

0 NOTE A#B#C,D

Result:

{
  Notes:
  {
    Value: [ "A", "B", "C,D" ]
  }
}
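
The effect of the Delimiter option can be illustrated with a plain string split. `convertToArray` is a hypothetical helper; whether the package also trims the individual values is not stated here, so the sketch leaves them untouched.

```typescript
// Mirrors `ConvertTo: { Type: Array, Delimiter: ... }` with ',' as default.
function convertToArray(value: string, delimiter = ","): string[] {
  return value.split(delimiter);
}

console.log(JSON.stringify(convertToArray("A#B#C,D", "#"))); // ["A","B","C,D"]
console.log(JSON.stringify(convertToArray("A#B#C,D"))); // ["A#B#C","D"]
```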

Date

A detailed explanation of Date/Time conversions can be found in the section "Date and Time Parsing".

Optional:

| Property | Default value |
| --- | --- |
| About | About |
| And | And |
| After | After |
| Before | Before |
| Between | Between |
| Calculated | Calculated |
| Calendar | Calendar |
| Estimated | Estimated |
| From | From |
| HasDay | HasDay |
| HasMonth | HasMonth |
| HasYear | HasYear |
| Interpreted | Interpreted |
| Original | Original |
| To | To |
| Value | Value |

Converts a GEDCOM date string to a date. Because multiple variants are possible, the date is parsed into an object with the date(s) as values and different flags, so no information is lost.

Definition:
- Tag: DATES
  Properties:
  - Tag: DATE
    Property: Date
    ConvertTo: 
      Type: Date
      From: Start		# Property will be "Start" instead of the default value "From"
      To: End		    # Property will be "End" instead of the default value "To"
      Original: Initial	# Property will be "Initial" instead of the default value "Original"
      Value: JSDate		# Property will be "JSDate" instead of the default value "Value"

0 @1@ DATES
1 DATE FROM 4 FEB 1980 TO 4 JUN 1999

Result:

{
  Date:
  {
    Start: 
    {
      JSDate: new Date(1980, 1, 4, 0, 0 , 0),
      HasYear: true,
      HasMonth: true,
      HasDay: true
    },
    End: {
      JSDate: new Date(1999, 5, 4, 0, 0 , 0),
      HasYear: true,
      HasMonth: true,
      HasDay: true
    },
    Initial: "FROM 4 FEB 1980 TO 4 JUN 1999",
  }
}

String

Optional:

| Property | Default value | Description |
| --- | --- | --- |
| NewLineCharacter | \n | Defines the new line character |
| NewLineIfEmpty | false | Defines if a new line is added if the object value is empty |

Definition:
- Tag: NOTE
  CollectAs: Notes
  Properties:
  - Tag: CONC
    Property: Text   
    Type: String 
  - Tag: CONT
    Property: Text
    ConvertTo:
      Type: String
      NewLineIfEmpty: true		# all empty CONT values will add a new line
      NewLineCharacter: " | "   # value of new line

0 @N00010@ NOTE
1 CONC 1
1 CONT
1 CONT A
1 CONT B
1 CONT C
1 CONT
1 CONT ...

Result:

{
  Notes:
  {
    Text: `1 | ABC | ...`
  }
}
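
How the result `1 | ABC | ...` comes about can be traced with a small sketch. The rule used here (an empty value adds the new line character only if NewLineIfEmpty is set, a non-empty value is simply appended) is inferred solely from the example output; `appendText` and the option objects are hypothetical and the package may work differently internally.

```typescript
interface StringOptionsSketch {
  NewLineCharacter: string;
  NewLineIfEmpty: boolean;
}

// Appends one CONC/CONT value to the text collected so far.
function appendText(
  current: string,
  value: string,
  options: StringOptionsSketch
): string {
  if (value === "") {
    return options.NewLineIfEmpty ? current + options.NewLineCharacter : current;
  }
  return current + value;
}

// CONC uses the defaults, CONT uses the options from the definition.
const concOptions: StringOptionsSketch = { NewLineCharacter: "\n", NewLineIfEmpty: false };
const contOptions: StringOptionsSketch = { NewLineCharacter: " | ", NewLineIfEmpty: true };

const noteLines: Array<[string, string]> = [
  ["CONC", "1"],
  ["CONT", ""],
  ["CONT", "A"],
  ["CONT", "B"],
  ["CONT", "C"],
  ["CONT", ""],
  ["CONT", "..."],
];

let noteText = "";
for (const [tag, value] of noteLines) {
  noteText = appendText(noteText, value, tag === "CONT" ? contOptions : concOptions);
}

console.log(noteText); // 1 | ABC | ...
```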

Time

A detailed explanation of Date/Time conversions can be found in the section "Date and Time Parsing".

Works only in combination with a previously defined explicit date, e.g. 1 JAN 1999. The time is added to that date.

Definition:
- Tag: DATES
  Properties:
  - Tag: DATE
    Property: Date
    Type: Date
    Properties:
    - Tag: TIME
      Property: Time
      ConvertTo:
        Type: Time

0 @1@ DATES
1 DATE 4 JUN 1999
2 TIME 14:35:22

Result:

{
  Date:
  {
    Value: new Date(1999, 5, 4, 14, 35 , 22),	// Value is date and time combined
    HasYear: true,
    HasMonth: true,
    HasDay: true,
    Original: "4 JUN 1999",
    Time: "14:35:22"  // separate property because the TIME Tag has a Property defined
  }
}
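
Combining the date and the time into one value can be sketched with the standard `Date` API. `combineDateAndTime` is a hypothetical helper for illustration and assumes a time of the form hh:mm or hh:mm:ss.

```typescript
// Returns a copy of `date` with the time of day taken from `time`.
// If `time` cannot be read, an unchanged copy of the date is returned.
function combineDateAndTime(date: Date, time: string): Date {
  const combined = new Date(date.getTime());
  const match = /^(\d{1,2}):(\d{2})(?::(\d{2}))?$/.exec(time.trim());
  if (!match) {
    return combined;
  }
  combined.setHours(Number(match[1]), Number(match[2]), Number(match[3] || "0"), 0);
  return combined;
}

const combinedValue = combineDateAndTime(new Date(1999, 5, 4), "14:35:22");
console.log(
  combinedValue.getHours(),
  combinedValue.getMinutes(),
  combinedValue.getSeconds()
); // 14 35 22
```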

If the TIME Tag has no property of its own:

Definition:
- Tag: DATES
  Properties:
  - Tag: DATE
    Property: Date
    Type: Date
    Properties:
    - Tag: TIME
      ConvertTo:
        Type: Time

Result:

{
  Date:
  {
    Value: new Date(1999, 5, 4, 14, 35 , 22),	// Value is date and time combined
    HasYear: true,
    HasMonth: true,
    HasDay: true,
    Original: "4 JUN 1999 14:35:22"	// date and time combined, because otherwise the original time value would be lost
  }
}

IsSingleValue (Boolean)

Normally if a property value is found twice or more it will be converted to an array of values. With IsSingleValue it is possible to force the last found value to win.

 Definition:
 - Tag: NOTE
   Property: Note

0 NOTE Note1
0 NOTE Note2

Result:

{
    Note: [ "Note1", "Note2" ]
}

Set IsSingleValue to true:

 Definition:
 - Tag: NOTE
   Property: Note
   IsSingleValue: true

Result (last value wins):

{
    Note: "Note2"
}
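
The difference between the two behaviours can be sketched as follows. `assignProperty` is a hypothetical helper used only to illustrate the rule.

```typescript
// Writes `value` to target[property]; repeated values become an array
// unless isSingleValue is set, in which case the last value wins.
function assignProperty(
  target: Record<string, unknown>,
  property: string,
  value: string,
  isSingleValue = false
): void {
  const existing = target[property];
  if (existing === undefined || isSingleValue) {
    target[property] = value;
  } else if (Array.isArray(existing)) {
    existing.push(value);
  } else {
    target[property] = [existing, value];
  }
}

const multiValueNote: Record<string, unknown> = {};
assignProperty(multiValueNote, "Note", "Note1");
assignProperty(multiValueNote, "Note", "Note2");
console.log(JSON.stringify(multiValueNote)); // {"Note":["Note1","Note2"]}

const singleValueNote: Record<string, unknown> = {};
assignProperty(singleValueNote, "Note", "Note1", true);
assignProperty(singleValueNote, "Note", "Note2", true);
console.log(JSON.stringify(singleValueNote)); // {"Note":"Note2"}
```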

MergeWithNext (String)

It's possible to merge a value with the next object that has the given Tag. If no such Tag follows, the value is ignored.

Example:

 Definition:
 - Tag: A
   CollectAs: A
 - Tag: B
   Property: Value_B
 - Tag: C
   Property: Value_C
 - Tag: D
   Property: Value_D
   MergeWithNext: B	# merge result with next Tag B

0 A
1 D Value_Of_D
1 C Value_Of_C
1 B Value_Of_B

Result:

{
  A: 
  [
    // parsed Tag C
    {
      Value_C: "Value_Of_C"
    },
    // parsed Tag B merged with Tag D
    {
      Value_B: "Value_Of_B",
      Value_D: "Value_Of_D"
    }
  ]
}

MergeWithLast (String OR Boolean)

Same as MergeWithNext but in the reverse direction. Can be a string to define a specific Tag, or 'true' to merge with the last defined object before this object.

Example (String)

 Definition:
 - Tag: A
   CollectAs: A
 - Tag: B
   Property: Value_B
 - Tag: C
   Property: Value_C
 - Tag: D
   Property: Value_D
   MergeWithLast: B	# merge result with last Tag B

0 A
1 B Value_Of_B
1 D Value_Of_D
1 C Value_Of_C

Result:

{
  A:
  [
    // parsed Tag B merged with Tag D
    {
      Value_B: "Value_Of_B",
      Value_D: "Value_Of_D"
    },        
    // parsed Tag C
    {
      Value_C: "Value_Of_C"
    },
  ]
}

Example (Boolean)

Definition:
- Tag: NOTE
  Property: Id
  CollectAs: Notes
- Tag: CONC
  MergeWithLast: true
- Tag: CONT
  MergeWithLast: true

0 @N00010@ NOTE
1 CONC [RCKarnes.ged]
1 CONT
1 CONT In Norse mythology, the god Bor, or Borr was the father of Odin, Ve an
1 CONC d Vili by the frost giantess Bestla.  Bor was the son of the giant Buri.
1 CONT
1 CONT ...

Result:

{
  Notes: {
    Id: "@N00010@",
    Text: "[RCKarnes.ged]In Norse mythology, the god Bor, or Borr was the father of Odin, Ve and Vili by the frost giantess Bestla.  Bor was the son of the giant Buri...."
  }
}

⚠️ Because the object has no last property (CONC has no defined property), a property with the name Text is created.

Example (Boolean) WITH defined parent property

Definition:
- Tag: NOTE
  Property: Id
  CollectAs: Notes
- Tag: EVEN
  CollectAs: Events
  Property: Name
- Tag: CONC
  MergeWithLast: true

0 @N00010@ NOTE
1 EVEN RCKarnes-RootsWeb & John D Newport-Ancestry.com (johndnewport@valornet
2 CONC .com)

Result:

{
  Notes: 
  {
    Id: "@N00010@",
    Events: 
    {
      Name: "RCKarnes-RootsWeb & John D Newport-Ancestry.com (johndnewport@valornet.com)",
    }
  }
}

Property (String)

Property name in the object. This is interpreted as an object path if it contains ".".

0 @ID@ WHAT
0 TRLR

Definition:
- Tag: WHAT
  Property: Id

Result:

{
    Id: 'ID'
}

Configuration with 'path':

Definition:
- Tag: WHAT
  Property: What.Id

Result:

{
  What: 
  {
    Id: 'ID'
  }
}
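
Interpreting a property name as an object path can be sketched like this. `setByPath` is a hypothetical helper for illustration; it shows how a name such as What.Id leads to a nested object.

```typescript
// Sets `value` at the dotted `path`, creating intermediate objects as needed.
function setByPath(
  target: Record<string, unknown>,
  path: string,
  value: unknown
): void {
  const keys = path.split(".");
  let node = target;
  for (const key of keys.slice(0, -1)) {
    const next = node[key];
    if (typeof next !== "object" || next === null) {
      node[key] = {};
    }
    node = node[key] as Record<string, unknown>;
  }
  const lastKey = keys[keys.length - 1];
  if (lastKey !== undefined) {
    node[lastKey] = value;
  }
}

const flatTarget: Record<string, unknown> = {};
setByPath(flatTarget, "Id", "ID");
console.log(JSON.stringify(flatTarget)); // {"Id":"ID"}

const nestedTarget: Record<string, unknown> = {};
setByPath(nestedTarget, "What.Id", "ID");
console.log(JSON.stringify(nestedTarget)); // {"What":{"Id":"ID"}}
```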

Properties (Object)

Defines specific definitions for the Tags that follow beneath this Tag.

- Tag: INDI
  CollectAs: Individuals
  CollectAsArray: true
  Property: Id
  Properties:				# specifies the NAME Tag
  - Tag: NAME
    Property: Fullname
    Properties:				# specifies the GIVN and SURN Tag
    - Tag: GIVN
      Property: Givenname
      MergeWithLast: INDI
    - Tag: SURN
      Property: Surname
      MergeWithLast: INDI

Replace (Object)

Can be used to replace a substring in the property value.

| Property | Description |
| --- | --- |
| Value | Text to replace |
| With | Text to replace with |

Definition:
- Tag: INDI
  CollectAs: Persons
  Properties:
  - Tag: RESI
    Properties:
    - Tag: EMAIL
      Property: EMail
      Replace:
        Value: "@@"		# replace '@@'
        With: "@"		# with a single '@'

0 @1@ INDI
1 RESI
2 EMAIL email@@test.com
1 RESI
2 EMAIL anotherEmail@@test.com

Result:

{
  Persons:
  {
    EMail: [ 'email@test.com', 'anotherEmail@test.com' ]
  }
}
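
The replacement itself can be pictured as a simple string operation. `applyReplace` is a hypothetical helper; it replaces every occurrence of Value, which is an assumption, since the example only contains one occurrence per value.

```typescript
interface ReplaceOptionSketch {
  Value: string; // text to replace
  With: string; // text to replace with
}

// Replaces all occurrences of replace.Value in `text` (assumed behaviour).
function applyReplace(text: string, replace: ReplaceOptionSketch): string {
  return text.split(replace.Value).join(replace.With);
}

const atSignFix: ReplaceOptionSketch = { Value: "@@", With: "@" };
console.log(applyReplace("email@@test.com", atSignFix)); // email@test.com
console.log(applyReplace("anotherEmail@@test.com", atSignFix)); // anotherEmail@test.com
```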

StartWith (String)

Adds the given string at the beginning of the value.

Definition:
- Tag: INDI
  CollectAs: Persons
  Properties:
  - Tag: RESI
    Properties:
    - Tag: EMAIL
      Property: EMail
      # '>-' is needed in yaml because the string value 'mail:' ends with ':'
      # adds 'mail:' at the beginning of each EMail value
      StartWith: >-
        mail:

0 @1@ INDI
1 RESI
2 EMAIL email@@test.com
1 RESI
2 EMAIL anotherEmail@@test.com

Result:

{
  Persons:
  {
    EMail: [ 'mail:email@@test.com', 'mail:anotherEmail@@test.com' ]
  }
}

StripHtml (Boolean)

Can be used to remove HTML from property values.

Definition:
- Tag: INDI
  CollectAs: Persons
  Properties:
  - Tag: NOTE
    Property: Note
    StripHtml: true

0 @1@ INDI
1 NOTE <p>Whatever</p>

Result:

{
  Persons: 
  { 
    Note: 'Whatever' 
  }
}

Tag

Reference to GEDCOM format. This activates parsing of this tag. If a tag that has not been defined occurs when a GEDCOM formatted line is run through, this and all subsequent subordinate lines are ignored.

Tags must be redefined for each property within the main tag. Any tag can be used; it does not have to be a defined GEDCOM tag. 🚀 If you define it, the parser can parse it.

Type

Short version of ConvertTo without options. The default values of the options will be used.

- Tag: CONC
  Property: Text   
  Type: String 
- Tag: DATE
  Property: Date
  Type: Date
- Tag: NOTE
  Property: Time
  Type: Array
- Tag: TIME
  Property: Time
  Type: Time

Date and Time Parsing

Only done if the Type or ConvertTo property is defined. All property names of the parsed object can be changed via configuration.

If a single date is not explicit (an explicit date looks like 1 JAN 1999), it is interpreted as a between value. For example, JAN 1999 is a value between 1 JAN 1999 and 31 JAN 1999, and 1999 is a value between 1 JAN 1999 and 31 DEC 1999.
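
The expansion of a partial date into a range can be sketched as follows. `partialDateToRange` is a hypothetical helper that only handles the two forms mentioned here (month plus four-digit year, or four-digit year only) and uses the last day of the month or year as the upper bound, as in the examples of this paragraph.

```typescript
const GEDCOM_MONTHS = [
  "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
  "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
];

interface DateRangeSketch {
  from: Date;
  to: Date;
}

// "JAN 1999" -> 1 JAN 1999 .. 31 JAN 1999, "1999" -> 1 JAN 1999 .. 31 DEC 1999
function partialDateToRange(text: string): DateRangeSketch | undefined {
  const match = /^(?:([A-Za-z]{3})\s+)?(\d{4})$/.exec(text.trim());
  if (!match) {
    return undefined;
  }
  const monthName = match[1];
  const year = Number(match[2]);
  if (monthName === undefined) {
    return { from: new Date(year, 0, 1), to: new Date(year, 11, 31) };
  }
  const month = GEDCOM_MONTHS.indexOf(monthName.toUpperCase());
  if (month < 0) {
    return undefined;
  }
  // Day 0 of the following month is the last day of this month.
  return { from: new Date(year, month, 1), to: new Date(year, month + 1, 0) };
}

const januaryRange = partialDateToRange("JAN 1999");
console.log(januaryRange?.from.getDate(), januaryRange?.to.getDate()); // 1 31
```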

Other calendars are also supported.

@#DGREGORIAN@ 4 FEB 1980	# gregorian calendar -> standard
@#DJULIAN@ 22 JAN 1980		# julian calendar
@#DHEBREW@ 17 SHV 5740		# hebrew calendar

From-To

A date string that defines a range of dates.

FROM 4 FEB 1980 TO 4 JUN 1999

{
    From: {
        Value: new Date(1980, 1, 4, 0, 0 , 0),
        HasYear: true,
        HasMonth: true,
        HasDay: true
    },
    To: {
        Value: new Date(1999, 5, 4, 0, 0 , 0),
        HasYear: true,
        HasMonth: true,
        HasDay: true
    },
    Original: "FROM 4 FEB 1980 TO 4 JUN 1999"
}

Between

A date string that defines a date between a range of dates. Effectively the same as "From-To", but the property names are different.

BETWEEN 4 FEB 1980 AND 4 JUN 1999

Result:

{
    Between: {
        Value: new Date(1980, 1, 4, 0, 0 , 0),
        HasYear: true,
        HasMonth: true,
        HasDay: true
    },
    And: {
        Value: new Date(1999, 5, 4, 0, 0 , 0),
        HasYear: true,
        HasMonth: true,
        HasDay: true
    },
    Original: "BETWEEN 4 FEB 1980 AND 4 JUN 1999"
}

Dates like JAN 1999 will also be converted to a between value. JAN 1999 for example is between 01.01.1999 and 01.02.1999

Single Dates

A single date in the GEDCOM format looks like 20 JAN 1999.

20 JAN 1999

{
    Value: new Date(1999, 0, 20, 0, 0 , 0),
    HasYear: true,
    HasMonth: true,
    HasDay: true,
    Original: "20 JAN 1999"
}

There are different markings for dates. These are optional. For each a new property with the value 'true' is added.

| Marker | Description | Example |
| --- | --- | --- |
| EST | Estimated based on an algorithm using some other event date | EST 20 JAN 1999 |
| ABT | About, meaning the date is not exact. | ABT 20 JAN 1999 |
| CAL | Calculated mathematically, for example, from an event date and age. | CAL 20 JAN 1999 |
| AFT | Event happened after the given date. | AFT 20 JAN 1999 |
| BEF | Event happened before the given date. | BEF 20 JAN 1999 |
| INT | Interpreted from knowledge about the associated date phrase included in parentheses. Returns just the text value. | INT Sometime |

Estimated example (but it's the same for ABT, CAL, AFT and BEF).

EST 20 JAN 1999

{
    Value: new Date(1999, 0, 20, 0, 0 , 0),
    HasYear: true,
    HasMonth: true,
    HasDay: true,
    Original: "20 JAN 1999",
    Estimated: true
}
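
A sketch of how such a marked single date could be turned into the object shown: `parseMarkedDate` and the two lookup tables are hypothetical, the property names are the default names, and only explicit dates of the form "day month year" with an optional EST, ABT, CAL, AFT or BEF marker are handled.

```typescript
const MONTH_ABBREVIATIONS = [
  "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
  "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
];

// Marker -> default name of the flag property.
const MARKER_FLAGS: Record<string, string> = {
  EST: "Estimated",
  ABT: "About",
  CAL: "Calculated",
  AFT: "After",
  BEF: "Before",
};

function parseMarkedDate(text: string): Record<string, unknown> | undefined {
  const match =
    /^(?:(EST|ABT|CAL|AFT|BEF)\s+)?(\d{1,2})\s+([A-Z]{3})\s+(\d{4})$/.exec(text.trim());
  if (!match) {
    return undefined;
  }
  const [, marker, day, monthName, year] = match;
  const month = monthName === undefined ? -1 : MONTH_ABBREVIATIONS.indexOf(monthName);
  if (month < 0) {
    return undefined;
  }
  const parsed: Record<string, unknown> = {
    Value: new Date(Number(year), month, Number(day), 0, 0, 0),
    HasYear: true,
    HasMonth: true,
    HasDay: true,
    Original: `${day} ${monthName} ${year}`,
  };
  // Add the flag for the marker, e.g. Estimated: true for EST.
  const flag = marker === undefined ? undefined : MARKER_FLAGS[marker];
  if (flag !== undefined) {
    parsed[flag] = true;
  }
  return parsed;
}

const estimatedDate = parseMarkedDate("EST 20 JAN 1999");
console.log(estimatedDate?.["Estimated"], estimatedDate?.["Original"]); // true 20 JAN 1999
```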

Interpreted example

INT At the end of 2020

{
    Value: "At the end of 2020",
    Original: "INT At the end of 2020",
    Interpreted: true
}

Time

A subordinate time can be specified for a date. In that case, date and time can be combined into a single value.

This works ONLY when the date is explicit. Dates with only a year, or only a month and a year, don't work.
