Initial commit, version 0.1
Milan Lukac committed Jul 29, 2019
0 parents commit 6409395
Showing 14 changed files with 1,201 additions and 0 deletions.
203 changes: 203 additions & 0 deletions .gitignore
@@ -0,0 +1,203 @@
#### joe made this: http://goel.io/joe

#####=== Linux ===#####
*~

# KDE directory preferences
.directory

# Linux trash folder which might appear on any partition or disk
.Trash-*

#####=== OSX ===#####
.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon


# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk

#####=== Vim ===#####
[._]*.s[a-w][a-z]
[._]s[a-w][a-z]
*.un~
Session.vim
.netrwhist
*~

#####=== Python ===#####

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

# C extensions
*.so

# Distribution / packaging
.Python
env/
venv/
build/
develop-eggs/
dist/
downloads/
eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.cache
nosetests.xml
coverage.xml

# Translations
*.mo
*.pot

# Django stuff:
*.log

# Sphinx documentation
docs/_build/

# PyBuilder
target/

#####=== JetBrains ===#####
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm

*.iml

## Directory-based project format:
.idea/
# if you remove the above rule, at least ignore the following:

# User-specific stuff:
# .idea/workspace.xml
# .idea/tasks.xml
# .idea/dictionaries

# Sensitive or high-churn files:
# .idea/dataSources.ids
# .idea/dataSources.xml
# .idea/sqlDataSources.xml
# .idea/dynamic.xml
# .idea/uiDesigner.xml

# Gradle:
# .idea/gradle.xml
# .idea/libraries

# Mongo Explorer plugin:
# .idea/mongoSettings.xml

## File-based project format:
*.ipr
*.iws

## Plugin-specific files:

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties

#####=== Vim ===#####
[._]*.s[a-w][a-z]
[._]s[a-w][a-z]
*.un~
Session.vim
.netrwhist
*~

#####=== VirtualEnv ===#####
# Virtualenv
# http://iamzed.com/2009/05/07/a-primer-on-virtualenv/
.Python
[Bb]in
[Ii]nclude
[Ll]ib
[Ss]cripts
pyvenv.cfg
pip-selfcheck.json

#####=== OSX ===#####
.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon


# Thumbnails
._*

# Files that might appear on external disk
.Spotlight-V100
.Trashes

####=== Airflow ===####
logs
.env
.env2
.env-*
airflow-webserver.pid
airflow.db

####=== coala ===####
.orig

####=== pytest ===####
.pytest_cache

####=== Docker ===####
docker-compose.override.yml
19 changes: 19 additions & 0 deletions .gitlab-ci.yml
@@ -0,0 +1,19 @@
stages:
  - build
  - test

black:
  stage: build
  image: python:3.6
  before_script:
    - pip install black
  script:
    - black --check . --diff

pytest:
  stage: test
  image: python:3.6
  before_script:
    - pip install -r requirements-test.txt
  script:
    - pytest
92 changes: 92 additions & 0 deletions README.md
@@ -0,0 +1,92 @@
# pg2avro

Postgres to Avro generator.

## Features

- Generate an Avro schema from column definitions.
- Convert row data into a format ready for Avro serialization.

# Usage

## Generating schema

Method: `pg2avro.get_avro_schema`

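```
from pg2avro import get_avro_schema
# SqlAlchemyColumn is assumed here to be an alias for sqlalchemy.Column.
from sqlalchemy import ARRAY, TEXT, Column as SqlAlchemyColumn

get_avro_schema(
    "mytable",
    "public",
    [
        # Dictionary mode
        {
            "name": "column_name_1",
            "type": "int2",
            "nullable": False,
        },
        # SqlAlchemy mode
        SqlAlchemyColumn(ARRAY(TEXT), name="column_name_2"),
        ...
    ]
)
```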

The schema generator needs the following information:
- table name
- namespace (`schema` in SQL, `dataset` in BigQuery etc.)
- columns - iterable of columns, each element with:
  - name
  - type - `_` prefix is used to indicate array types
  - nullable (optional, `True` assumed if not provided)
- column mapping - optional `ColumnMapping` object with column mappings (see below for more info).
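
To make the conventions above concrete, here is a minimal sketch of how one column description could be turned into an Avro field. It is not pg2avro's actual implementation: the function name `sketch_avro_field` and the small `PG_TO_AVRO` type table are assumptions made for illustration, and the table covers only a handful of Postgres types.

```python
# Hypothetical illustration, not pg2avro's real code.
# Assumed mapping of a few Postgres type names to Avro primitive types.
PG_TO_AVRO = {
    "int2": "int",
    "int4": "int",
    "int8": "long",
    "float4": "float",
    "float8": "double",
    "bool": "boolean",
    "text": "string",
    "varchar": "string",
}


def sketch_avro_field(column):
    """Build an Avro field dict from a {"name", "type", "nullable"} dict."""
    pg_type = column["type"]
    # A leading underscore marks an array type, e.g. "_text" is an array of text.
    is_array = pg_type.startswith("_")
    base_type = pg_type[1:] if is_array else pg_type
    avro_type = PG_TO_AVRO[base_type]
    if is_array:
        avro_type = {"type": "array", "items": avro_type}
    # "nullable" is optional; True is assumed when the key is missing.
    if column.get("nullable", True):
        avro_type = ["null", avro_type]
    return {"name": column["name"], "type": avro_type}


print(sketch_avro_field({"name": "column_name_1", "type": "int2", "nullable": False}))
print(sketch_avro_field({"name": "column_name_2", "type": "_text"}))
```

The nullable case is expressed as an Avro union with `"null"`, which is the standard Avro way to mark an optional field. How the real library treats array items or less common types may differ.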

Column data can be passed in multiple formats.

### Supported column formats

- Dictionary with the required keys and data
- SQLAlchemy `Column` object
- Any object with compatible attributes and the required data
- Dictionary or object that holds the required data under different keys or attribute names, passed together with a `ColumnMapping`.

Note: the `ColumnMapping` mode supports **generating schema from raw Postgres data** - `udt_name` can be used to generate the schema.
```
columns = [
    CustomColumn(name="column_name", udt_name="int2", is_nullable=False),
]
get_avro_schema(
    table_name,
    namespace,
    columns,
    ColumnMapping(name="name", type="udt_name", nullable="is_nullable"),
)
```
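
The idea behind a column mapping can be sketched as a lookup that translates the three expected fields into whatever keys or attributes the input actually uses. The names `SketchMapping`, `SketchColumn` and `read_column` below are hypothetical stand-ins for illustration; pg2avro's real `ColumnMapping` may work differently.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration only; not pg2avro's real classes.
SketchMapping = namedtuple("SketchMapping", ["name", "type", "nullable"])
SketchColumn = namedtuple("SketchColumn", ["name", "udt_name", "is_nullable"])


def read_column(column, mapping):
    """Return (name, type, nullable) using the keys/attributes named in mapping."""

    def lookup(key):
        # Dictionaries are read by key, any other object by attribute.
        if isinstance(column, dict):
            return column[key]
        return getattr(column, key)

    return lookup(mapping.name), lookup(mapping.type), lookup(mapping.nullable)


mapping = SketchMapping(name="name", type="udt_name", nullable="is_nullable")
print(read_column(SketchColumn("column_name", "int2", False), mapping))
print(read_column({"name": "other", "udt_name": "_text", "is_nullable": True}, mapping))
```

With this shape, the same mapping works for both objects and dictionaries, which matches the last entry in the list of supported column formats.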

## Generating rows data

Method: `pg2avro.get_avro_row_dict`

This method takes a row and the schema, and returns the row as a dictionary ready for Avro serialization.

### Supported row formats

- Dictionary with keys corresponding to schema field names
- Object with attributes corresponding to schema field names (works the same as a dictionary with the corresponding keys)
- Tuple with data in the same order as fields specified in schema

```
columns = [
    {"name": "name", "type": "varchar", "nullable": False},
    {"name": "number", "type": "float4", "nullable": False},
]
schema = get_avro_schema(table_name, namespace, columns)
rows = [
    {"name": "John", "number": 1.0},
    RowObject(name="Jack", number=2.0),
    ("Jim", 3.0),
]
data = [get_avro_row_dict(row, schema) for row in rows]
```
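
As a rough sketch of what handling the three row formats involves, the function below normalizes a dictionary, a tuple, or an attribute-bearing object into a dictionary keyed by field name. It is not pg2avro's `get_avro_row_dict`: the name `sketch_row_dict` is hypothetical, and the schema is assumed to be a plain dictionary in the standard Avro record layout with a `fields` list.

```python
from types import SimpleNamespace


def sketch_row_dict(row, schema):
    """Return a dict keyed by schema field names for a dict, tuple, or object row.

    Hypothetical illustration; assumes schema is a dict with a "fields" list.
    """
    field_names = [field["name"] for field in schema["fields"]]
    if isinstance(row, dict):
        # Dictionary rows are read by key.
        return {name: row[name] for name in field_names}
    if isinstance(row, tuple):
        # Tuple rows are matched to fields by position.
        return dict(zip(field_names, row))
    # Any other object is read by attribute.
    return {name: getattr(row, name) for name in field_names}


# Assumed schema shape, reduced to the parts this sketch needs.
schema = {
    "name": "mytable",
    "type": "record",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "number", "type": "float"},
    ],
}
rows = [
    {"name": "John", "number": 1.0},
    SimpleNamespace(name="Jack", number=2.0),
    ("Jim", 3.0),
]
data = [sketch_row_dict(row, schema) for row in rows]
print(data)
```

A real converter would also need to coerce values that Avro cannot serialize directly; this sketch only covers how the row shape is unified.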