Commit

feat: implements datacollection function
+ Adds proto definition for ConversationBit.
+ Changes ConversationBit.Created field to int64.
+ Adds error handling in client-side JS.
telpirion authored Nov 16, 2024
2 parents aa7a159 + e924ff3 commit 8a6f231
Showing 19 changed files with 1,052 additions and 50 deletions.
16 changes: 15 additions & 1 deletion .github/workflows/go.yml
@@ -11,7 +11,7 @@ on:
 
 jobs:
 
-  build:
+  build_server:
     runs-on: ubuntu-latest
     steps:
     - uses: actions/checkout@v4
@@ -24,3 +24,17 @@ jobs:
     - name: Build
       working-directory: ./server
       run: go build -v ./...
+
+  build_service:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+
+    - name: Set up Go
+      uses: actions/setup-go@v4
+      with:
+        go-version: '1.23.1'
+
+    - name: Build data collection
+      working-directory: ./services/data-collection
+      run: go build -v .
40 changes: 40 additions & 0 deletions docs/Instructions.md
@@ -45,3 +45,43 @@ $ docker build . -t myherodotus -f Dockerfile --build-arg BUILD_VER=Herodotus
$ docker tag myherodotus us-west1-docker.pkg.dev/${PROJECT_ID}/my-herodotus/base-image:${SEMVER}
$ docker push us-west1-docker.pkg.dev/${PROJECT_ID}/my-herodotus/base-image:${SEMVER}
```

## Update protobuf files

This project uses [protocol buffer][protobuf] files to define types shared between the
different services and surfaces of the application. These files must be updated and the output
files regenerated when new fields are added.

**NOTE**: The tool registry used by `buf` limits unauthenticated requests to 10 per hour.
If you reach this quota, [use protoc directly instead](#protoc).

1. Install the [`buf` CLI][buf].

   ```sh
   $ GO111MODULE=on go install github.com/bufbuild/buf/cmd/buf@v1.47.2
   ```

1. Check the installation.

   ```sh
   $ buf --version
   ```

1. Build the protos. From the `protos/` directory, run the following command.

   ```sh
   $ buf build
   ```

1. Generate the protos. From the `protos/` directory, run the following command.

   ```sh
   $ buf generate
   ```

### Generate with protoc-gen-go {#protoc}

1.

[buf]: https://buf.build/docs/tutorials/getting-started-with-buf-cli
[protobuf]: https://protobuf.dev/getting-started/gotutorial/
53 changes: 53 additions & 0 deletions docs/Services.md
@@ -0,0 +1,53 @@
# Services

The MyHerodotus app has several microservices running behind the scenes to assist
with data collection, evaluation, and model tuning.

## Data collection

The [Firestore-to-BigQuery](../services/data-collection/) service updates a
BigQuery table with user data and responses from the MyHerodotus app. The service
is triggered by a specific event: when a document is updated in the Firestore database.
This event is used for data collection because it only occurs when a user has rated
a response provided by the app.

All collected data has personally identifiable information (PII) removed, specifically
first names, last names, ages, and email addresses. This list of de-identified info types
is configurable in the app.
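A minimal sketch of how such a configurable list might be handled in Go. It assumes the de-identification uses Sensitive Data Protection (Cloud DLP) built-in infoType names and that the list arrives as a comma-separated environment variable; the variable name `DEID_INFO_TYPES` and the helper `parseInfoTypes` are hypothetical, and the defaults simply mirror the four info types listed above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultInfoTypes mirrors the PII categories listed in the docs, written as
// Cloud DLP built-in infoType names (assumption: the service uses these names).
var defaultInfoTypes = []string{"FIRST_NAME", "LAST_NAME", "AGE", "EMAIL_ADDRESS"}

// parseInfoTypes turns a comma-separated list such as "email_address, AGE"
// into an upper-cased, de-duplicated slice. An empty list falls back to the defaults.
func parseInfoTypes(raw string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, part := range strings.Split(raw, ",") {
		name := strings.ToUpper(strings.TrimSpace(part))
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, name)
	}
	if len(out) == 0 {
		// Return a copy so callers cannot mutate the package-level defaults.
		return append([]string(nil), defaultInfoTypes...)
	}
	return out
}

func main() {
	// DEID_INFO_TYPES is a hypothetical variable name used only for this sketch.
	fmt.Println(parseInfoTypes(os.Getenv("DEID_INFO_TYPES")))
}
```

Keeping the list as data rather than hard-coding it in the redaction call is what makes it configurable without a redeploy of new code, only a change of environment.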

The following gcloud command is a manual equivalent: it exports the relevant collections from
the `l200` Firestore database to the `gs://myherodotus` Cloud Storage bucket.

```sh
$ gcloud firestore export gs://myherodotus --database=l200 --collection-ids=HerodotusDev,Conversations
```

### Deploy the service to Cloud Functions

To deploy the `data-collection` function as a 2nd gen Cloud Function (2nd gen functions run on
Cloud Run), run the following command from the `services/data-collection/` directory. Be sure to
set the project ID first using `gcloud config set project`.

**IMPORTANT**: Make sure that the `PROJECT_ID` and `DATASET_NAME` environment variables are set
in your shell before deploying; the command passes them to the function with `--set-env-vars`.

```sh
$ gcloud functions deploy data-collection \
  --gen2 \
  --runtime=go121 \
  --region="us-west1" \
  --trigger-location="us-west1" \
  --source=. \
  --entry-point=CollectData \
  --set-env-vars PROJECT_ID=${PROJECT_ID},DATASET_NAME=${DATASET_NAME},BUILD_VER=Herodotus \
  --trigger-event-filters="type=google.cloud.firestore.document.v1.updated" \
  --trigger-event-filters="database=l200" \
  --trigger-event-filters-path-pattern=document='Herodotus/{userId}/Conversations/{conversationId}'
```
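The path pattern above binds two wildcards, `{userId}` and `{conversationId}`. A minimal sketch of recovering them inside the function, assuming the handler has the updated document's path as a string, either bare (`Herodotus/u1/Conversations/c1`) or prefixed with `documents/`. The type `conversationRef` and the helper `parseConversationPath` are hypothetical and not part of the `CollectData` entry point shown in the deploy command.

```go
package main

import (
	"fmt"
	"strings"
)

// conversationRef holds the wildcard values matched by the trigger's path
// pattern Herodotus/{userId}/Conversations/{conversationId}.
type conversationRef struct {
	UserID         string
	ConversationID string
}

// parseConversationPath extracts userId and conversationId from a Firestore
// document path. Surrounding slashes and a leading "documents/" are ignored.
func parseConversationPath(docPath string) (conversationRef, error) {
	p := strings.TrimPrefix(strings.Trim(docPath, "/"), "documents/")
	parts := strings.Split(p, "/")
	if len(parts) != 4 || parts[0] != "Herodotus" || parts[2] != "Conversations" ||
		parts[1] == "" || parts[3] == "" {
		return conversationRef{}, fmt.Errorf(
			"path %q does not match Herodotus/{userId}/Conversations/{conversationId}", docPath)
	}
	return conversationRef{UserID: parts[1], ConversationID: parts[3]}, nil
}

func main() {
	ref, err := parseConversationPath("documents/Herodotus/user-123/Conversations/conv-456")
	if err != nil {
		panic(err)
	}
	fmt.Printf("user=%s conversation=%s\n", ref.UserID, ref.ConversationID)
}
```

Rejecting any other shape up front keeps the handler from writing rows for documents that the trigger filter should never have delivered.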

### Sources

+ https://cloud.google.com/functions/docs/calling/cloud-firestore
+ https://cloud.google.com/functions/docs/tutorials/storage
+ https://cloud.google.com/functions/docs/calling/eventarc
+ https://cloud.google.com/eventarc/docs/reference/supported-events#cloud-firestore
+ https://cloud.google.com/bigquery/docs/loading-data-cloud-firestore#python
+ https://cloud.google.com/firestore/docs/manage-data/export-import#gcloud
Binary file modified docs/architecture.png
18 changes: 18 additions & 0 deletions protos/buf.gen.yaml
@@ -0,0 +1,18 @@
version: v2
managed:
  enabled: false
  #override:
  #  - file_option: go_package_prefix
  #    path: conversation.proto
  #    value: myherodotus.com/datacollection
plugins:
  - remote: buf.build/protocolbuffers/go
    out: ../server
    opt:
      - module=myherodotus.com/main
  - remote: buf.build/protocolbuffers/go
    out: ../services/data-collection
    opt:
      - module=myherodotus.com/datacollection
inputs:
  - directory: .
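The `module=` option controls where `protoc-gen-go` writes its output: the file is named after the Go import path plus the `.proto` base name, with the module prefix stripped, and generating a package outside the module is an error. The hypothetical helper `moduleOutputPath` below reproduces that documented rule to show why each plugin entry yields `conversation.pb.go` directly in its `out` directory. It assumes the Go import paths resolve to `myherodotus.com/main` and `myherodotus.com/datacollection`; `conversation.proto` itself has `go_package` commented out, so how the import path is set is not shown here.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// moduleOutputPath mimics protoc-gen-go's module= rule: join the Go import
// path with the .proto base name, then strip the module prefix.
// Import paths outside the module are rejected, as the plugin does.
func moduleOutputPath(importPath, module, protoFile string) (string, error) {
	base := strings.TrimSuffix(path.Base(protoFile), ".proto") + ".pb.go"
	full := path.Join(importPath, base)
	prefix := strings.TrimSuffix(module, "/") + "/"
	if !strings.HasPrefix(full, prefix) {
		return "", fmt.Errorf("import path %q is outside module %q", importPath, module)
	}
	return strings.TrimPrefix(full, prefix), nil
}

func main() {
	// Assumed import paths, one per plugin entry in buf.gen.yaml.
	targets := []struct{ out, importPath, module string }{
		{"../server", "myherodotus.com/main", "myherodotus.com/main"},
		{"../services/data-collection", "myherodotus.com/datacollection", "myherodotus.com/datacollection"},
	}
	for _, t := range targets {
		rel, err := moduleOutputPath(t.importPath, t.module, "conversation.proto")
		if err != nil {
			panic(err)
		}
		fmt.Println(path.Join(t.out, rel))
	}
}
```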
10 changes: 10 additions & 0 deletions protos/buf.yaml
@@ -0,0 +1,10 @@
# For details on buf.yaml configuration, visit https://buf.build/docs/configuration/v2/buf-yaml
version: v2
modules:
  - path: .
lint:
  use:
    - STANDARD
breaking:
  use:
    - FILE
14 changes: 14 additions & 0 deletions protos/conversation.proto
@@ -0,0 +1,14 @@
syntax = "proto3";
package myherodotus;

//option go_package = "myherodotus.com/main";

message ConversationBit {
  string bot_response = 1;
  string user_query = 2;
  string model = 3;
  string prompt = 4;
  int64 created = 5;
  int32 token_count = 6;
  string rating = 7;
}
189 changes: 189 additions & 0 deletions server/conversation.pb.go
