Commit
Source Greenhouse: migrate to base image and increase test coverage (#32397)

Co-authored-by: pnilan <pnilan@users.noreply.github.com>
Co-authored-by: ChristoGrab <christo.grab@gmail.com>
3 people authored Nov 30, 2023
1 parent ca028bb commit 7aaaa06
Showing 8 changed files with 204 additions and 64 deletions.
16 changes: 0 additions & 16 deletions airbyte-integrations/connectors/source-greenhouse/Dockerfile

This file was deleted.

79 changes: 63 additions & 16 deletions airbyte-integrations/connectors/source-greenhouse/README.md
@@ -1,7 +1,7 @@
# Firebolt Source

This is the repository for the Firebolt source connector, written in Python.
For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.io/integrations/sources/firebolt).
For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.io/integrations/sources/greenhouse).

## Local development

@@ -30,12 +30,12 @@ If this is mumbo jumbo to you, don't worry about it, just put your deps in `setu
should work as you expect.

#### Create credentials
**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.io/integrations/sources/firebolt)
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_firebolt/spec.json` file.
**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.io/integrations/sources/greenhouse)
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_greenhouse/spec.json` file.
Note that the `secrets` directory is gitignored by default, so there is no danger of accidentally checking in sensitive information.
See `integration_tests/sample_config.json` for a sample config file.

**If you are an Airbyte core member**, copy the credentials in Lastpass under the secret name `source firebolt test creds`
**If you are an Airbyte core member**, copy the credentials in Lastpass under the secret name `source greenhouse test creds`
and place them into `secrets/config.json`.

### Locally running the connector
Expand All @@ -48,27 +48,75 @@ python main.py read --config secrets/config.json --catalog integration_tests/con

### Locally running the connector docker image

#### Use `airbyte-ci` to build your connector
The Airbyte way of building this connector is to use our `airbyte-ci` tool.
You can follow install instructions [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md#L1).
Then running the following command will build your connector:

#### Build
**Via [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md) (recommended):**
```bash
airbyte-ci connectors --name source-firebolt build
airbyte-ci connectors --name source-greenhouse build
```
Once the command is done, you will find your connector image in your local docker registry: `airbyte/source-greenhouse:dev`.

An image will be built with the tag `airbyte/source-firebolt:dev`.
##### Customizing our build process
When contributing to our connector you might need to customize the build process, for example to add a system dependency or set an environment variable.
You can customize our build process by adding a `build_customization.py` module to your connector.
This module can define `pre_connector_install` and `post_connector_install` async functions that mutate the base image and the connector container, respectively.
It will be imported at runtime by our build process, and the functions will be called if they exist.

**Via `docker build`:**
```bash
docker build -t airbyte/source-firebolt:dev .
```
Here is an example of a `build_customization.py` module:
```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Feel free to check the dagger documentation for more information on the Container object and its methods.
    # https://dagger-io.readthedocs.io/en/sdk-python-v0.6.4/
    from dagger import Container


async def pre_connector_install(base_image_container: Container) -> Container:
    return await base_image_container.with_env_variable("MY_PRE_BUILD_ENV_VAR", "my_pre_build_env_var_value")


async def post_connector_install(connector_container: Container) -> Container:
    return await connector_container.with_env_variable("MY_POST_BUILD_ENV_VAR", "my_post_build_env_var_value")
```

#### Build your own connector image
This connector is built using our dynamic build process in `airbyte-ci`.
The base image used to build it is defined within the metadata.yaml file under the `connectorBuildOptions`.
The build logic is defined using [Dagger](https://dagger.io/) [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/pipelines/builds/python_connectors.py).
It does not rely on a Dockerfile.

If you would like to patch our connector and build your own, a simple approach would be to:

1. Create your own Dockerfile based on the latest version of the connector image.
```Dockerfile
FROM airbyte/source-greenhouse:latest

COPY . ./airbyte/integration_code
RUN pip install ./airbyte/integration_code

# The entrypoint and default env vars are already set in the base image
# ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
# ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]
```
Please use this only as an example; it is not optimized.

2. Build your image:
```bash
docker build -t airbyte/source-greenhouse:dev .
# Running the spec command against your patched connector
docker run airbyte/source-greenhouse:dev spec
```
#### Run
Then run any of the connector commands as follows:
```
docker run --rm airbyte/source-firebolt:dev spec
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-firebolt:dev check --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-firebolt:dev discover --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-firebolt:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
docker run --rm airbyte/source-greenhouse:dev spec
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-greenhouse:dev check --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-greenhouse:dev discover --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-greenhouse:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
```

## Testing
@@ -96,4 +144,3 @@ You've checked out the repo, implemented a million dollar feature, and you're re
5. Create a Pull Request: use [our PR naming conventions](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#pull-request-title-convention).
6. Pat yourself on the back for being an awesome contributor.
7. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.

Large diffs are not rendered by default.

14 changes: 8 additions & 6 deletions airbyte-integrations/connectors/source-greenhouse/metadata.yaml
@@ -1,12 +1,18 @@
data:
  ab_internal:
    ql: 200
    sl: 200
  allowedHosts:
    hosts:
      - harvest.greenhouse.io
  connectorBuildOptions:
    baseImage: docker.io/airbyte/python-connector-base:1.2.0@sha256:c22a9d97464b69d6ef01898edf3f8612dc11614f05a84984451dde195f337db9
  connectorSubtype: api
  connectorType: source
  definitionId: 59f1e50a-331f-4f09-b3e8-2e8d4d355f44
  dockerImageTag: 0.4.3
  dockerImageTag: 0.4.4
  dockerRepository: airbyte/source-greenhouse
  documentationUrl: https://docs.airbyte.com/integrations/sources/greenhouse
  githubIssueLabel: source-greenhouse
  icon: greenhouse.svg
  license: MIT
Expand All @@ -17,12 +23,8 @@ data:
oss:
enabled: true
releaseStage: generally_available
documentationUrl: https://docs.airbyte.com/integrations/sources/greenhouse
supportLevel: certified
tags:
- language:low-code
- language:python
ab_internal:
sl: 200
ql: 400
supportLevel: certified
metadataSpecVersion: "1.0"
@@ -61,8 +61,8 @@ def is_greater_than_or_equal(self, first: Record, second: Record) -> bool:
"""
Evaluating which record is greater in terms of cursor. This is used to avoid having to capture all the records to close a slice
"""
first_cursor_value = first.get(self.cursor_field)
second_cursor_value = second.get(self.cursor_field)
first_cursor_value = first.get(self.cursor_field, "")
second_cursor_value = second.get(self.cursor_field, "")
if first_cursor_value and second_cursor_value:
return first_cursor_value >= second_cursor_value
elif first_cursor_value:
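The defaulting change above (`first.get(self.cursor_field, "")`) can be illustrated with a standalone sketch. This is a hypothetical minimal reproduction of the comparison logic, not the connector's actual class:

```python
def is_greater_than_or_equal(first: dict, second: dict, cursor_field: str = "cursor_field") -> bool:
    # Defaulting to "" mirrors the change above: a record missing the cursor
    # field is treated as having an empty (falsy) cursor value.
    first_cursor_value = first.get(cursor_field, "")
    second_cursor_value = second.get(cursor_field, "")
    if first_cursor_value and second_cursor_value:
        # ISO-8601 timestamps compare correctly as plain strings.
        return first_cursor_value >= second_cursor_value
    # Only a record that actually has a cursor value can outrank the other.
    return bool(first_cursor_value)
```

Under this sketch, a record with an empty or missing cursor value never outranks one carrying a real timestamp, which matches the parametrized cases in the new unit tests.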
@@ -0,0 +1,21 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

from unittest.mock import MagicMock, Mock

import pytest
from airbyte_cdk.sources.streams import Stream
from source_greenhouse.components import GreenHouseSlicer, GreenHouseSubstreamSlicer


@pytest.fixture
def greenhouse_slicer():
    date_time = "2022-09-05T10:10:10.000000Z"
    return GreenHouseSlicer(cursor_field=date_time, parameters={}, request_cursor_field=None)


@pytest.fixture
def greenhouse_substream_slicer():
    parent_stream = MagicMock(spec=Stream)
    return GreenHouseSubstreamSlicer(cursor_field='cursor_field', stream_slice_field='slice_field', parent_stream=parent_stream, parent_key='parent_key', parameters={}, request_cursor_field=None)
@@ -10,10 +10,10 @@
from source_greenhouse.components import GreenHouseSlicer, GreenHouseSubstreamSlicer


def test_slicer():
def test_slicer(greenhouse_slicer):
    date_time = "2022-09-05T10:10:10.000000Z"
    date_time_dict = {date_time: date_time}
    slicer = GreenHouseSlicer(cursor_field=date_time, parameters={}, request_cursor_field=None)
    slicer = greenhouse_slicer
    slicer.close_slice(date_time_dict, date_time_dict)
    assert slicer.get_stream_state() == {date_time: "2022-09-05T10:10:10.000Z"}
    assert slicer.get_request_headers() == {}
@@ -48,3 +48,90 @@ def test_sub_slicer(last_record, expected, records):
    stream_slice = next(slicer.stream_slices()) if records else {}
    slicer.close_slice(stream_slice, last_record)
    assert slicer.get_stream_state() == expected


@pytest.mark.parametrize(
    "stream_state, cursor_field, expected_state",
    [
        ({'cursor_field_1': '2022-09-05T10:10:10.000Z'}, 'cursor_field_1', {'cursor_field_1': '2022-09-05T10:10:10.000Z'}),
        ({'cursor_field_2': '2022-09-05T10:10:100000Z'}, 'cursor_field_3', {}),
        ({'cursor_field_4': None}, 'cursor_field_4', {}),
        ({'cursor_field_5': ''}, 'cursor_field_5', {}),
    ],
    ids=[
        "cursor_value_present",
        "cursor_value_not_present",
        "cursor_value_is_None",
        "cursor_value_is_empty_string"
    ]
)
def test_slicer_set_initial_state(stream_state, cursor_field, expected_state):
    slicer = GreenHouseSlicer(cursor_field=cursor_field, parameters={}, request_cursor_field=None)
    # Set initial state
    slicer.set_initial_state(stream_state)
    assert slicer.get_stream_state() == expected_state
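The four cases in this test reduce to one rule: incoming state is kept only when the configured cursor field is present with a truthy value. A hypothetical standalone reduction of that rule (not the slicer's actual implementation):

```python
def reduced_initial_state(stream_state: dict, cursor_field: str) -> dict:
    # Keep the incoming state only when the cursor field exists
    # and its value is truthy (not None, not an empty string).
    value = stream_state.get(cursor_field)
    return {cursor_field: value} if value else {}
```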

@pytest.mark.parametrize(
    "stream_state, initial_state, expected_state",
    [
        (
            {'id1': {'cursor_field': '2023-01-01T10:00:00.000Z'}},
            {'id2': {'cursor_field': '2023-01-02T11:00:00.000Z'}},
            {
                'id1': {'cursor_field': '2023-01-01T10:00:00.000Z'},
                'id2': {'cursor_field': '2023-01-02T11:00:00.000Z'}
            }
        ),
        (
            {'id1': {'cursor_field': '2023-01-01T10:00:00.000Z'}},
            {'id1': {'cursor_field': '2023-01-01T09:00:00.000Z'}},
            {'id1': {'cursor_field': '2023-01-01T10:00:00.000Z'}}
        ),
        (
            {},
            {},
            {}
        ),
    ],
    ids=[
        "stream_state and initial_state have different keys",
        "stream_state and initial_state have overlapping keys with different values",
        "stream_state and initial_state are empty"
    ]
)
def test_substream_set_initial_state(greenhouse_substream_slicer, stream_state, initial_state, expected_state):
    slicer = greenhouse_substream_slicer
    # Set initial state
    slicer._state = initial_state
    slicer.set_initial_state(stream_state)
    assert slicer._state == expected_state
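The three expected merges in this test are consistent with a plain dict merge in which `stream_state` takes precedence on overlapping keys. A hypothetical sketch of that rule (the slicer's real code may differ, e.g. by comparing cursor values):

```python
def merged_substream_state(initial_state: dict, stream_state: dict) -> dict:
    # Later keys win in a dict-unpacking merge, so values from
    # stream_state override those from initial_state.
    return {**initial_state, **stream_state}
```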


@pytest.mark.parametrize(
    "first_record, second_record, expected_result",
    [
        (
            {'cursor_field': '2023-01-01T00:00:00.000Z'},
            {'cursor_field': '2023-01-02T00:00:00.000Z'},
            False
        ),
        (
            {'cursor_field': '2023-02-01T00:00:00.000Z'},
            {'cursor_field': '2023-01-01T00:00:00.000Z'},
            True
        ),
        (
            {'cursor_field': '2023-01-02T00:00:00.000Z'},
            {'cursor_field': ''},
            True
        ),
        (
            {'cursor_field': ''},
            {'cursor_field': '2023-01-02T00:00:00.000Z'},
            False
        ),
    ]
)
def test_is_greater_than_or_equal(greenhouse_substream_slicer, first_record, second_record, expected_result):
    slicer = greenhouse_substream_slicer
    assert slicer.is_greater_than_or_equal(first_record, second_record) == expected_result
31 changes: 16 additions & 15 deletions docs/integrations/sources/greenhouse.md
@@ -62,19 +62,20 @@ The Greenhouse connector should not run into Greenhouse API limitations under no

## Changelog

| Version | Date | Pull Request | Subject |
| :------ | :--------- | :------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 0.4.3 | 2023-09-20 | [30648](https://github.com/airbytehq/airbyte/pull/30648) | Update candidates.json |
| 0.4.2 | 2023-08-02 | [28969](https://github.com/airbytehq/airbyte/pull/28969) | Update CDK version |
| 0.4.1 | 2023-06-28 | [27773](https://github.com/airbytehq/airbyte/pull/27773) | Update following state breaking changes |
| Version | Date | Pull Request | Subject |
| :------ | :--------- | :------------------------------------------------------- | :------------------------------------------------------------------------------- |
| 0.4.4 | 2023-11-29 | [32397](https://github.com/airbytehq/airbyte/pull/32397) | Increase test coverage and migrate to base image |
| 0.4.3 | 2023-09-20 | [30648](https://github.com/airbytehq/airbyte/pull/30648) | Update candidates.json |
| 0.4.2 | 2023-08-02 | [28969](https://github.com/airbytehq/airbyte/pull/28969) | Update CDK version |
| 0.4.1 | 2023-06-28 | [27773](https://github.com/airbytehq/airbyte/pull/27773) | Update following state breaking changes |
| 0.4.0 | 2023-04-26 | [25332](https://github.com/airbytehq/airbyte/pull/25332) | Add new streams: `ActivityFeed`, `Approvals`, `Disciplines`, `Eeoc`, `EmailTemplates`, `Offices`, `ProspectPools`, `Schools`, `Tags`, `UserPermissions`, `UserRoles` |
| 0.3.1 | 2023-03-06 | [23231](https://github.com/airbytehq/airbyte/pull/23231) | Publish using low-code CDK Beta version |
| 0.3.0 | 2022-10-19 | [18154](https://github.com/airbytehq/airbyte/pull/18154) | Extend `Users` stream schema |
| 0.2.11 | 2022-09-27 | [17239](https://github.com/airbytehq/airbyte/pull/17239) | Always install the latest version of Airbyte CDK |
| 0.2.10 | 2022-09-05 | [16338](https://github.com/airbytehq/airbyte/pull/16338) | Implement incremental syncs & fix SATs |
| 0.2.9 | 2022-08-22 | [15800](https://github.com/airbytehq/airbyte/pull/15800) | Bugfix to allow reading sentry.yaml and schemas at runtime |
| 0.2.8 | 2022-08-10 | [15344](https://github.com/airbytehq/airbyte/pull/15344) | Migrate connector to config-based framework |
| 0.2.7 | 2022-04-15 | [11941](https://github.com/airbytehq/airbyte/pull/11941) | Correct Schema data type for Applications, Candidates, Scorecards and Users |
| 0.2.6 | 2021-11-08 | [7607](https://github.com/airbytehq/airbyte/pull/7607) | Implement demographics streams support. Update SAT for demographics streams |
| 0.2.5 | 2021-09-22 | [6377](https://github.com/airbytehq/airbyte/pull/6377) | Refactor the connector to use CDK. Implement additional stream support |
| 0.2.4 | 2021-09-15 | [6238](https://github.com/airbytehq/airbyte/pull/6238) | Add identification of accessible streams for API keys with limited permissions |
| 0.3.1 | 2023-03-06 | [23231](https://github.com/airbytehq/airbyte/pull/23231) | Publish using low-code CDK Beta version |
| 0.3.0 | 2022-10-19 | [18154](https://github.com/airbytehq/airbyte/pull/18154) | Extend `Users` stream schema |
| 0.2.11 | 2022-09-27 | [17239](https://github.com/airbytehq/airbyte/pull/17239) | Always install the latest version of Airbyte CDK |
| 0.2.10 | 2022-09-05 | [16338](https://github.com/airbytehq/airbyte/pull/16338) | Implement incremental syncs & fix SATs |
| 0.2.9 | 2022-08-22 | [15800](https://github.com/airbytehq/airbyte/pull/15800) | Bugfix to allow reading sentry.yaml and schemas at runtime |
| 0.2.8 | 2022-08-10 | [15344](https://github.com/airbytehq/airbyte/pull/15344) | Migrate connector to config-based framework |
| 0.2.7 | 2022-04-15 | [11941](https://github.com/airbytehq/airbyte/pull/11941) | Correct Schema data type for Applications, Candidates, Scorecards and Users |
| 0.2.6 | 2021-11-08 | [7607](https://github.com/airbytehq/airbyte/pull/7607) | Implement demographics streams support. Update SAT for demographics streams |
| 0.2.5 | 2021-09-22 | [6377](https://github.com/airbytehq/airbyte/pull/6377) | Refactor the connector to use CDK. Implement additional stream support |
| 0.2.4 | 2021-09-15 | [6238](https://github.com/airbytehq/airbyte/pull/6238) | Add identification of accessible streams for API keys with limited permissions |
