OC-888: Deploying ARI ingests to prod #692

Merged: 20 commits, Sep 27, 2024
Changes from 14 commits
Commits
4f00f48
explicitly stop scheduled ari ingest on int and prod
finlay-jisc Sep 18, 2024
8f10bfe
rename directory to integration to fit convention
finlay-jisc Sep 18, 2024
e685b79
allow incremental ari ingest to be triggered over HTTP or scheduled
finlay-jisc Sep 18, 2024
c862c9a
send email with details at end of incremental ari ingest
finlay-jisc Sep 18, 2024
8b8b620
remove mostly unused helper
finlay-jisc Sep 18, 2024
9694200
add new environment variable for ingest report recipients
finlay-jisc Sep 18, 2024
e250c06
serverless offline doesnt need ssm parameters defined, it uses env file
finlay-jisc Sep 18, 2024
bbb755a
Merge remote-tracking branch 'origin/main' into OC-888
finlay-jisc Sep 24, 2024
49e1f49
move ari utils and tests into integration component
finlay-jisc Sep 24, 2024
8285f73
use schedule and http trigger in one lambda function
finlay-jisc Sep 24, 2024
1750b75
reject ari ingest trigger if already running
finlay-jisc Sep 24, 2024
ee74804
fix integrations readme link
finlay-jisc Sep 24, 2024
86e9656
fix ingest log tests
finlay-jisc Sep 24, 2024
4e54f55
reintroduce local env vars, but don't use unnecessary plugin
finlay-jisc Sep 24, 2024
6656274
vscode playwright seems to handle paths differently now
finlay-jisc Sep 24, 2024
30c1a75
remove remaining serverless-offline-ssm config
finlay-jisc Sep 25, 2024
522f178
add new env var to dockerfile
finlay-jisc Sep 25, 2024
bfba3b8
add ingest recipients email to local env vars
finlay-jisc Sep 25, 2024
4d61426
Revert "remove mostly unused helper"
finlay-jisc Sep 25, 2024
d41b6f7
rework code not to use ts-ignore
finlay-jisc Sep 25, 2024
8 changes: 7 additions & 1 deletion api/.env.example
@@ -33,4 +33,10 @@ MAIL_SERVER=localhost
LOCALSTACK_SERVER=http://localhost:4566

QUEUE_URL=http://localhost:4566/000000000000/your-queue-name
-SQS_ENDPOINT=http://localhost:4566
+SQS_ENDPOINT=http://localhost:4566
+
+LIST_USERS_API_KEY=12345
+TRIGGER_ARI_INGEST_API_KEY=12345
+
+SLACK_CHANNEL_EMAIL=example@mailinator.com
+INGEST_REPORT_RECIPIENTS=example.one@mailinator.com,example.two@mailinator.com
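The `INGEST_REPORT_RECIPIENTS` value above is a comma-separated list of addresses. A minimal sketch of how such a value might be split into individual recipients is shown here. `parseRecipients` is a hypothetical helper written for illustration, not a function taken from this PR, and the trimming and empty-entry handling are assumptions.

```typescript
// Hypothetical helper (not from the PR): turn a comma-separated env value
// into a clean list of addresses, tolerating spaces and empty entries.
const parseRecipients = (raw: string | undefined): string[] =>
    (raw ?? '')
        .split(',')
        .map((address) => address.trim())
        .filter((address) => address.length > 0);

// Example input mirroring the value in .env.example.
const exampleRecipients = parseRecipients('example.one@mailinator.com,example.two@mailinator.com');
```

Passing the raw string in as a parameter, rather than reading the environment inside the helper, keeps the sketch easy to test in isolation.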
2 changes: 1 addition & 1 deletion api/README.md
@@ -139,7 +139,7 @@ A similar process happens when the database is seeded. After publications are in

## Integrations

-Octopus is built to integrate with some external systems in order to import publications. For more information please read the dedicated [integrations readme](./src/lib/integrations/README.md).
+Octopus is built to integrate with some external systems in order to import publications. For more information please read the dedicated [integrations readme](./src/components/integration/README.md).

---

10 changes: 0 additions & 10 deletions api/package-lock.json

Some generated files are not rendered by default.

1 change: 0 additions & 1 deletion api/package.json
@@ -82,7 +82,6 @@
 "prisma": "^5.11.0",
 "puppeteer": "^22.12.0",
 "serverless-offline": "^12.0.4",
-"serverless-offline-ssm": "^6.2.0",
 "serverless-plugin-split-stacks": "^1.13.0",
 "serverless-prune-plugin": "^2.0.2",
 "serverless-webpack": "^5.13.0",
2 changes: 1 addition & 1 deletion api/scripts/fullAriImport.ts
@@ -5,7 +5,7 @@ import { expand } from 'dotenv-expand';
// Important to do this so that environment variables are treated the same as in deployed code.
expand(dotenv.config());

-import * as ariUtils from 'lib/integrations/ariUtils';
+import * as ariUtils from 'integration/ariUtils';
import * as I from 'interface';

const fullAriImport = async (): Promise<string> => {
6 changes: 5 additions & 1 deletion api/serverless-config-default.yml
@@ -494,9 +494,13 @@ functions:
cors: true
# Integrations
incrementalAriIngest:
-handler: src/components/integrations/service.incrementalAriIngest
+handler: src/components/integration/routes.incrementalAriIngest
 timeout: 900
 events:
 - schedule:
 rate: cron(0 5 ? * TUE *) # Every Tuesday at 5 a.m.
 enabled: ${self:custom.scheduledAriIngestEnabled.${opt:stage}, false}
+- http:
+path: ${self:custom.versions.v1}/integrations/ari/incremental
+method: POST
+cors: true
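With both a `schedule` and an `http` event attached to the same function, the handler has to work out which trigger invoked it. The sketch below shows one way that could be done. The event shapes are deliberately simplified assumptions (real AWS events carry many more fields), and `classifyTrigger` and `isHttpEvent` are hypothetical names, not code from this PR. The API key is read from the query string because that is how the tests in this PR supply it.

```typescript
// Simplified event shapes, assumed for illustration only.
type ScheduledEvent = { source: 'aws.events' };
type HttpEvent = { httpMethod: string; queryStringParameters: Record<string, string> | null };
type IngestEvent = ScheduledEvent | HttpEvent;

type TriggerDecision = { trigger: 'schedule' | 'http'; authorised: boolean };

// Type guard: HTTP invocations carry an httpMethod, scheduled ones do not.
const isHttpEvent = (event: IngestEvent): event is HttpEvent => 'httpMethod' in event;

// Hypothetical helper: decide which trigger fired and whether it may proceed.
const classifyTrigger = (event: IngestEvent, expectedApiKey: string): TriggerDecision => {
    if (!isHttpEvent(event)) {
        // Scheduled invocations come from the platform, so no API key is checked.
        return { trigger: 'schedule', authorised: true };
    }
    const suppliedKey = event.queryStringParameters?.apiKey;
    return { trigger: 'http', authorised: suppliedKey === expectedApiKey };
};
```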
47 changes: 25 additions & 22 deletions api/serverless-offline.yml
@@ -5,7 +5,6 @@ frameworkVersion: ${file(./serverless-config-default.yml):frameworkVersion}
useDotenv: ${file(./serverless-config-default.yml):useDotenv}

plugins:
-- serverless-offline-ssm
 - serverless-offline
 - serverless-webpack
 - serverless-webpack-prisma
@@ -19,29 +18,33 @@ provider:
stage: 'local'
environment:
STAGE: ${self:provider.stage}
-ELASTICSEARCH_USER: ${ssm:/elasticsearch_user_${self:provider.stage}_octopus}
-ELASTICSEARCH_PASSWORD: ${ssm:/elasticsearch_password_${self:provider.stage}_octopus}
-ELASTICSEARCH_ENDPOINT: ${ssm:/elasticsearch_endpoint_${self:provider.stage}_octopus}
-ELASTICSEARCH_PROTOCOL: ${ssm:/elastic_search_protocol_${self:provider.stage}_octopus}
-DATABASE_URL: ${ssm:/db_connection_string_${self:provider.stage}_octopus}
-JWT_SECRET: ${ssm:/jwt_secret_${self:provider.stage}_octopus}
+DATABASE_URL: ${env:DATABASE_URL}
+ORCID_SECRET: ${env:ORCID_SECRET}
+ORCID_ID: ${env:ORCID_ID}
+ORCID_AUTH_URL: ${env:ORCID_AUTH_URL}
+ORCID_MEMBER_API_URL: ${env:ORCID_MEMBER_API_URL}
+JWT_SECRET: ${env:JWT_SECRET}
+EMAIL_SENDER_ADDRESS: ${env:EMAIL_SENDER_ADDRESS}
+BASE_URL: ${env:BASE_URL}
+AUTHORISATION_CALLBACK_URL: ${env:AUTHORISATION_CALLBACK_URL}
+ELASTICSEARCH_PROTOCOL: ${env:ELASTICSEARCH_PROTOCOL}
+ELASTICSEARCH_USER: ${env:ELASTICSEARCH_USER}
+ELASTICSEARCH_PASSWORD: ${env:ELASTICSEARCH_PASSWORD}
+ELASTICSEARCH_ENDPOINT: ${env:ELASTICSEARCH_ENDPOINT}
 VALIDATION_CODE_EXPIRY: 10
 VALIDATION_CODE_ATTEMPTS: 3
-ORCID_ID: ${ssm:/orcid_app_id_${self:provider.stage}_octopus}
-ORCID_SECRET: ${ssm:/orcid_secret_key_${self:provider.stage}_octopus}
-ORCID_AUTH_URL: ${ssm:/orcid_auth_url_${self:provider.stage}_octopus}
-ORCID_MEMBER_API_URL: ${ssm:/orcid_member_api_url_${self:provider.stage}_octopus}
-DOI_PREFIX: ${ssm:/doi_prefix_${self:provider.stage}_octopus}
-DATACITE_ENDPOINT: ${ssm:/datacite_endpoint_${self:provider.stage}_octopus}
-DATACITE_USER: ${ssm:/datacite_user_${self:provider.stage}_octopus}
-DATACITE_PASSWORD: ${ssm:/datacite_password_${self:provider.stage}_octopus}
-EMAIL_SENDER_ADDRESS: ${ssm:/email_sender_address_${self:provider.stage}_octopus}
-BASE_URL: ${ssm:/base_url_${self:provider.stage}_octopus}
-AUTHORISATION_CALLBACK_URL: ${ssm:/authorization_callback_url_${self:provider.stage}_octopus}
-LIST_USERS_API_KEY: ${ssm:/list_users_api_key_${self:provider.stage}_octopus}
-QUEUE_URL: ${ssm:/queue_url_${self:provider.stage}_octopus}
-SQS_ENDPOINT: ${ssm:/sqs_endpoint_${self:provider.stage}_octopus}
-MAIL_SERVER: ${ssm:/mail_server_${self:provider.stage}_octopus}
+DOI_PREFIX: ${env:DOI_PREFIX}
+DATACITE_ENDPOINT: ${env:DATACITE_ENDPOINT}
+DATACITE_USER: ${env:DATACITE_USER}
+DATACITE_PASSWORD: ${env:DATACITE_PASSWORD}
+MAIL_SERVER: ${env:MAIL_SERVER}
+LOCALSTACK_SERVER: ${env:LOCALSTACK_SERVER}
+QUEUE_URL: ${env:QUEUE_URL}
+SQS_ENDPOINT: ${env:SQS_ENDPOINT}
+LIST_USERS_API_KEY: ${env:LIST_USERS_API_KEY}
+TRIGGER_ARI_INGEST_API_KEY: ${env:TRIGGER_ARI_INGEST_API_KEY}
+SLACK_CHANNEL_EMAIL: ${env:SLACK_CHANNEL_EMAIL}
+INGEST_REPORT_RECIPIENTS: ${env:INGEST_REPORT_RECIPIENTS}
deploymentBucket:
tags:
Project: Octopus
5 changes: 4 additions & 1 deletion api/serverless.yml
@@ -50,6 +50,8 @@ provider:
SQS_ENDPOINT: ${ssm:/sqs_endpoint_${self:provider.stage}_octopus}
MAIL_SERVER: ${ssm:/mail_server_${self:provider.stage}_octopus}
SLACK_CHANNEL_EMAIL: ${ssm:/slack_channel_email_${self:provider.stage}_octopus}
+TRIGGER_ARI_INGEST_API_KEY: ${ssm:/trigger_ari_ingest_api_key_${self:provider.stage}_octopus}
+INGEST_REPORT_RECIPIENTS: ${ssm:/ingest_report_recipients_${self:provider.stage}_octopus}
deploymentBucket:
tags:
Project: Octopus
@@ -105,7 +107,8 @@ custom:
automatic: true
number: 3
scheduledAriIngestEnabled:
-int: true
+int: false
+prod: false
 functions:
 - ${file(./serverless-config-default.yml):functions}
 - ${file(./serverless-config-deploy.yml):functions}
8 changes: 4 additions & 4 deletions api/src/components/ingestLog/__tests__/ingestLog.test.ts
@@ -26,14 +26,14 @@ describe('Ingest log functions', () => {
});
});

-test('Get most recent start time', async () => {
-const mostRecentStart = await ingestLogService.getMostRecentStartTime('ARI');
-expect(mostRecentStart).toEqual(new Date('2024-09-11T12:53:00.000Z'));
+test('Get most recent log', async () => {
+const mostRecentLog = await ingestLogService.getMostRecentLog('ARI');
+expect(mostRecentLog?.start).toEqual(new Date('2024-09-11T12:53:00.000Z'));
});

test('Most recent start is null if no run that ended successfully is present', async () => {
await client.prisma.ingestLog.update({ where: { id: 'ingest-log-1' }, data: { end: null } });
-const mostRecentStart = await ingestLogService.getMostRecentStartTime('ARI');
+const mostRecentStart = await ingestLogService.getMostRecentLog('ARI');
expect(mostRecentStart).toBeNull();
});
});
20 changes: 4 additions & 16 deletions api/src/components/ingestLog/service.ts
@@ -13,26 +13,14 @@ export const setEndTime = (id: string, end: Date) =>
}
});

-export const getMostRecentStartTime = async (source: I.PublicationImportSource): Promise<Date | null> => {
-const mostRecentStartQuery = await client.prisma.ingestLog.findFirst({
+export const getMostRecentLog = (source: I.PublicationImportSource, includeOpenLogs?: boolean) =>
+client.prisma.ingestLog.findFirst({
 where: {
 source,
-// Successful runs only.
-end: {
-not: null
-}
+// By default, get successful (having an end time) logs only.
+...(includeOpenLogs ? {} : { end: { not: null } })
 },
 orderBy: {
 start: 'desc'
 },
-select: {
-start: true
-}
 });
-
-if (mostRecentStartQuery) {
-return mostRecentStartQuery.start;
-} else {
-return null;
-}
-};
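To make the effect of the `includeOpenLogs` flag concrete, here is an in-memory analogue of the query above. It is only a sketch: the `IngestLog` shape and the `mostRecentLogInMemory` name are assumptions made for illustration, and the real function runs this logic as a Prisma `findFirst` query rather than over an array.

```typescript
// Assumed, simplified shape of an ingest log row.
type IngestLog = { id: string; source: string; start: Date; end: Date | null };

// Hypothetical in-memory equivalent: newest log for a source, skipping
// open-ended logs (no end time) unless includeOpenLogs is set.
const mostRecentLogInMemory = (logs: IngestLog[], source: string, includeOpenLogs = false): IngestLog | null => {
    const candidates = logs
        .filter((log) => log.source === source)
        .filter((log) => includeOpenLogs || log.end !== null)
        .sort((a, b) => b.start.getTime() - a.start.getTime());

    return candidates[0] ?? null;
};

const sampleLogs: IngestLog[] = [
    { id: 'finished', source: 'ARI', start: new Date('2024-09-11T12:53:00.000Z'), end: new Date('2024-09-11T13:00:00.000Z') },
    { id: 'open', source: 'ARI', start: new Date('2024-09-18T05:00:00.000Z'), end: null }
];
```

With these sample logs, the default call returns the finished run, while passing `true` for `includeOpenLogs` returns the newer open-ended one, which is what a caller needs in order to detect a run that is still in progress.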
@@ -32,7 +32,7 @@ On import, ARIs go through a handling flow:

#### How ARI data is mapped to octopus data

-Various ARI fields are mapped to octpous ones in the `mapAriQuestionToPublicationVersion` function in [ariUtils.ts](./ariUtils.ts).
+Various ARI fields are mapped to octopus ones in the `mapAriQuestionToPublicationVersion` function in [ariUtils.ts](./ariUtils.ts).

Of particular importance is how ARIs are matched to an owning organisational user account. The mapping process expects a UserMapping to exist associating the `department` field value from the ARI (where the title matches, case insensitive, and the mapping source is 'ARI') with the user ID of an organisational account.
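The matching rule described above (title equal to the ARI `department`, case insensitive, with mapping source 'ARI') can be sketched as follows. The `UserMapping` field names and the `findOwnerUserId` helper are assumptions for illustration; the actual lookup lives in `ariUtils.ts` and may be implemented differently, for example as a database query.

```typescript
// Assumed, simplified shape of a user mapping record.
type UserMapping = { title: string; source: string; userId: string };

// Hypothetical helper: find the organisational account that owns ARIs
// from the given department, or null if no mapping exists.
const findOwnerUserId = (mappings: UserMapping[], department: string): string | null => {
    const wanted = department.toLowerCase();
    const match = mappings.find(
        (mapping) => mapping.source === 'ARI' && mapping.title.toLowerCase() === wanted
    );

    return match ? match.userId : null;
};

// Made-up example data.
const exampleMappings: UserMapping[] = [
    { title: 'Example Department', source: 'ARI', userId: 'example-organisation-1' }
];
```

A `null` result corresponds to the "User not found for department" failure exercised in the tests.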

@@ -1,5 +1,6 @@
-import * as ariUtils from 'lib/integrations/ariUtils';
+import * as ariUtils from 'integration/ariUtils';
 import * as I from 'interface';
+import * as ingestLogService from 'ingestLog/service';
import * as testUtils from 'lib/testUtils';

// This ARI will match a publication in the seed data via the questionId.
@@ -120,6 +121,18 @@ describe('ARI Mapping', () => {
});
});

+test('Unrecognised topics are reported', async () => {
+const mappingAttempt = await ariUtils.mapAriQuestionToPublicationVersion({
+...sampleARIQuestion,
+topics: ['unrecognised topic']
+});
+expect(mappingAttempt).toMatchObject({
+success: true,
+message: 'Found unrecognised topic(s).',
+unrecognisedTopics: ['unrecognised topic']
+});
+});
+
test('Department is matched to existing user', async () => {
const mappingAttempt = await ariUtils.mapAriQuestionToPublicationVersion(sampleARIQuestion);
expect(mappingAttempt).toMatchObject({
@@ -135,7 +148,8 @@ describe('ARI Mapping', () => {
expect(mappingAttempt).toMatchObject({
success: false,
mappedData: null,
-message: 'User not found for department: unrecognised department.'
+message: 'User not found for department: unrecognised department.',
+unrecognisedDepartment: 'unrecognised department'
});
});

@@ -221,7 +235,7 @@ describe('ARI handling', () => {
});
});

-test('ARI with unrecognised department is skipped', async () => {
+test('ARI with unrecognised department is skipped and dept name is reported in a field', async () => {
const handleARI = await ariUtils.handleIncomingARI({
...sampleARIQuestion,
department: 'Unrecognised Department name'
@@ -231,7 +245,8 @@ describe('ARI handling', () => {
actionTaken: 'none',
success: false,
message:
-'Failed to map ARI data to octopus data. User not found for department: Unrecognised Department name.'
+'Failed to map ARI data to octopus data. User not found for department: Unrecognised Department name.',
+unrecognisedDepartment: 'Unrecognised Department name'
});
});

@@ -281,6 +296,25 @@ describe('ARI handling', () => {
});
});

+test('Unrecognised topics are reported', async () => {
+const handleARI = await ariUtils.handleIncomingARI({
+...sampleARIQuestion,
+topics: [...sampleARIQuestion.topics, 'unrecognised topic']
+});
+expect(handleARI).toMatchObject({
+actionTaken: 'none',
+success: true,
+publicationVersion: {
+topics: [
+{
+id: 'test-topic-1a'
+}
+]
+},
+unrecognisedTopics: ['unrecognised topic']
+});
+});
+
test('Keywords update when fieldsOfResearch/tags change', async () => {
const handleARI = await ariUtils.handleIncomingARI({
...sampleARIQuestion,
@@ -352,3 +386,32 @@ describe('ARI handling', () => {
});
});
});

+describe('ARI import processes', () => {
+beforeEach(async () => {
+await testUtils.clearDB();
+await testUtils.testSeed();
+});
+
+test('Incremental import endpoint requires API key', async () => {
+const triggerImport = await testUtils.agent.post('/integrations/ari/incremental');
+
+expect(triggerImport.status).toEqual(401);
+expect(triggerImport.body).toMatchObject({
+message: "Please provide a valid 'apiKey'."
+});
+});
+
+test('Incremental ingest cancels if already in progress', async () => {
+// Create an open ended log first.
+await ingestLogService.create('ARI');
+const triggerImport = await testUtils.agent
+.post('/integrations/ari/incremental')
+.query({ apiKey: process.env.TRIGGER_ARI_INGEST_API_KEY });
+
+expect(triggerImport.status).toEqual(202);
+expect(triggerImport.body).toMatchObject({
+message: 'Cancelling ingest. Either an import is already in progress or the last import failed.'
+});
+});
+});
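The two tests above pin down the guard behaviour of the trigger endpoint: a missing or wrong API key yields a 401, and an open-ended ingest log yields a 202 with a cancellation message. A minimal sketch of that decision logic is given below. `decideTrigger` and its types are hypothetical and not the PR's implementation; only the two status codes and messages come from the tests. What the endpoint returns when an ingest actually starts is not shown in this diff, so the sketch only signals that the caller may proceed.

```typescript
// Assumed, simplified view of the most recent ingest log (null if none exists).
type LatestLog = { start: Date; end: Date | null } | null;

type TriggerOutcome =
    | { proceed: true }
    | { proceed: false; status: number; message: string };

// Hypothetical guard: check the API key first, then whether a run is still open.
const decideTrigger = (suppliedKey: string | undefined, expectedKey: string, latestLog: LatestLog): TriggerOutcome => {
    if (!suppliedKey || suppliedKey !== expectedKey) {
        return { proceed: false, status: 401, message: "Please provide a valid 'apiKey'." };
    }

    // A log without an end time means a run is in progress or the last one failed.
    if (latestLog !== null && latestLog.end === null) {
        return {
            proceed: false,
            status: 202,
            message: 'Cancelling ingest. Either an import is already in progress or the last import failed.'
        };
    }

    return { proceed: true };
};
```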