OC-932: Deploy ARI integration for specific departments #701

Merged
merged 2 commits into from
Oct 17, 2024
3 changes: 2 additions & 1 deletion api/.env.example
Original file line number Diff line number Diff line change
@@ -39,4 +39,5 @@ LIST_USERS_API_KEY=12345
 TRIGGER_ARI_INGEST_API_KEY=12345

 SLACK_CHANNEL_EMAIL=example@mailinator.com
-INGEST_REPORT_RECIPIENTS=example.one@mailinator.com,example.two@mailinator.com
+INGEST_REPORT_RECIPIENTS=example.one@mailinator.com,example.two@mailinator.com
+PARTICIPATING_ARI_USER_IDS=abc123,def456,ghi789
1 change: 1 addition & 0 deletions api/docker-compose.yml
@@ -108,6 +108,7 @@ services:
 - LIST_USERS_API_KEY=123456789
 - TRIGGER_ARI_INGEST_API_KEY=123456789
 - INGEST_REPORT_RECIPIENTS=example.jisc@mailinator.com
+- PARTICIPATING_ARI_USER_IDS=

 volumes:
 opensearch-data1:
14 changes: 12 additions & 2 deletions api/scripts/fullAriImport.ts
@@ -20,6 +20,8 @@ const fullAriImport = async (): Promise<string> => {
 // Updates with each loop. Contains total count which we need outside the loop.
 let paginationInfo;

+console.log(`${((performance.now() - startTime) / 1000).toFixed(1)}: Retrieving ARIs from ARI DB...`);
+
 do {
 // Get page.
 const response = await axios.get(pageUrl);
@@ -34,15 +36,23 @@ const fullAriImport = async (): Promise<string> => {
 pageUrl = paginationInfo.links.next;
 } while (pageUrl);

+console.log(`${((performance.now() - startTime) / 1000).toFixed(1)}: Finished retrieving ARIs.`);
+
 // In case something has caused this process to fail, perhaps the API changed, etc...
 if (allAris.length !== paginationInfo.total) {
 throw new Error('Number of ARIs retrieved does not match reported total. Stopping.');
 }

-// Remove archived aris.
-const aris = allAris.filter((ari) => !ari.isArchived);
+// Determine which departments are having their ARIs imported.
+const participatingDepartmentNames = await ariUtils.getParticipatingDepartmentNames();
+
+// Remove archived ARIs and ARIs from departments we are not importing.
+const aris = allAris.filter(
+(ari) => !ari.isArchived && participatingDepartmentNames.includes(ari.department.toLowerCase())
+);

 // Process all the ARIs.
+console.log('Processing ARIs...');
 const failed: I.HandledARI[] = [];
 let createdCount = 0;
 let updatedCount = 0;
1 change: 1 addition & 0 deletions api/serverless-offline.yml
@@ -44,6 +44,7 @@ provider:
 LIST_USERS_API_KEY: ${env:LIST_USERS_API_KEY}
 TRIGGER_ARI_INGEST_API_KEY: ${env:TRIGGER_ARI_INGEST_API_KEY}
 INGEST_REPORT_RECIPIENTS: ${env:INGEST_REPORT_RECIPIENTS}
+PARTICIPATING_ARI_USER_IDS: ${env:PARTICIPATING_ARI_USER_IDS}
 deploymentBucket:
 tags:
 Project: Octopus
1 change: 1 addition & 0 deletions api/serverless.yml
@@ -52,6 +52,7 @@ provider:
 SLACK_CHANNEL_EMAIL: ${ssm:/slack_channel_email_${self:provider.stage}_octopus}
 TRIGGER_ARI_INGEST_API_KEY: ${ssm:/trigger_ari_ingest_api_key_${self:provider.stage}_octopus}
 INGEST_REPORT_RECIPIENTS: ${ssm:/ingest_report_recipients_${self:provider.stage}_octopus}
+PARTICIPATING_ARI_USER_IDS: ${ssm:/participating_ari_user_ids_${self:provider.stage}_octopus}
 deploymentBucket:
 tags:
 Project: Octopus
9 changes: 7 additions & 2 deletions api/src/components/integration/README.md
@@ -23,6 +23,13 @@ The ARI DB is a UK government database storing research questions that governmen

 #### ARI import process

+ARIs can be excluded from the import process for one of these reasons:
+
+- The `isArchived` field is `true`.
+- The `department` field doesn't correspond to one of our participating departments.
+  - These are defined in an environment variable (`PARTICIPATING_ARI_USER_IDS`) as a comma-separated list of octopus user IDs. Each ID corresponds to the organisational account representing an ARI department.
+  - At import time, the user mapping table is consulted to get the expected names the departments have on the ARI DB side. As such, user mappings must also exist in the octopus DB for a department to be included properly.
+
 On import, ARIs go through a handling flow:

 - If no publication exists with the ARI's question ID in its `externalId` field, it is created as a new publication.
@@ -37,5 +44,3 @@ Various ARI fields are mapped to octopus ones in the `mapAriQuestionToPublicatio
 Of particular importance is how ARIs are matched to an owning organisational user account. The mapping process expects a UserMapping to exist associating the `department` field value from the ARI (where the title matches, case insensitive, and the mapping source is 'ARI') with the user ID of an organisational account.

 Topics are mapped similarly. If an ARI has values in its `topics` field, the mapping will check whether octopus has any TopicMappings in the database that match with them and associate the publication it creates/updates with the topic(s) from those mappings. If there are no topics listed on the ARI, the organisational user is expected to have a `defaultTopicId`, which is used as a fallback.
-
-ARIs can be archived (`isArchived` field). These are not imported by Octopus.
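
The exclusion rules described in the README change can be sketched as a single predicate. This is a minimal sketch rather than the code in this PR: `AriLike` and `shouldImportAri` are hypothetical names, the sample department names are made up, and it assumes the mapped department names are stored in lower case (the `toLowerCase()` comparison in the diff suggests this but does not confirm it).

```typescript
// Hypothetical minimal shape; the real ARI type in the codebase has more fields.
interface AriLike {
    isArchived: boolean;
    department: string;
}

// Returns true when an ARI passes both exclusion rules:
// it is not archived, and its department is in the participating list.
// Assumption: participatingDepartmentNames already holds lower-case names.
const shouldImportAri = (ari: AriLike, participatingDepartmentNames: string[]): boolean =>
    !ari.isArchived && participatingDepartmentNames.includes(ari.department.toLowerCase());

// Illustrative data only.
const participating = ['department for transport'];
const sample: AriLike[] = [
    { isArchived: false, department: 'Department for Transport' },
    { isArchived: true, department: 'Department for Transport' },
    { isArchived: false, department: 'Home Office' }
];

const imported = sample.filter((ari) => shouldImportAri(ari, participating));
```

Only the first sample ARI survives the filter: the second is archived and the third belongs to a department that is not in the list. An empty participating list excludes every ARI.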
10 changes: 10 additions & 0 deletions api/src/components/integration/ariUtils.ts
@@ -360,3 +360,13 @@ export const handleIncomingARI = async (question: I.ARIQuestion, dryRun?: boolea
 };

 export const ariEndpoint = 'https://ari.org.uk/api/questions?order_by=dateUpdated';
+
+// Returns the mapped ARI department names for a given set of octopus organisational user IDs.
+export const getParticipatingDepartmentNames = async (): Promise<string[]> => {
+const participatingDepartmentIds = process.env.PARTICIPATING_ARI_USER_IDS?.split(',') ?? [];
+const queryResults = await Promise.all(
+participatingDepartmentIds.map((userId) => client.prisma.userMapping.findMany({ where: { userId } }))
+);
+
+return queryResults.flatMap((userMappings) => userMappings.map((userMapping) => userMapping.value));
+};
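
One detail worth noting about the parsing in `getParticipatingDepartmentNames`: when the variable is set but empty (as in the `docker-compose.yml` change), `''.split(',')` yields `['']`, not `[]`, so a single lookup runs with an empty user ID. That is presumably harmless because no mapping should match, but a stricter parse is easy to sketch. `parseIdList` below is a hypothetical helper, not part of this PR, and it takes the raw string as a parameter instead of reading the environment so that it can be run in isolation.

```typescript
// Hypothetical helper: parses a comma-separated ID list,
// trimming whitespace and dropping empty entries.
const parseIdList = (raw: string | undefined): string[] =>
    (raw ?? '')
        .split(',')
        .map((id) => id.trim())
        .filter((id) => id.length > 0);

// Unset and empty values both produce an empty list.
const fromUnset = parseIdList(undefined);
const fromEmpty = parseIdList('');

// Stray spaces and a trailing comma are tolerated.
const fromMessy = parseIdList('abc123, def456,');
```

With a clean ID list, the per-ID queries could probably be collapsed into one `findMany` call using Prisma's `in` filter, though that has not been checked against this schema.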
16 changes: 10 additions & 6 deletions api/src/components/integration/service.ts
@@ -33,6 +33,9 @@ export const incrementalAriIngest = async (dryRun: boolean): Promise<string> =>
 logId = log.id;
 }

+// Determine which departments are having their ARIs imported.
+const participatingDepartmentNames = await ariUtils.getParticipatingDepartmentNames();
+
 // Count sequential unchanged ARIs so that we can stop when the streak hits MAX_UNCHANGED_STREAK.
 let unchangedStreak = 0;

@@ -55,12 +58,13 @@ export const incrementalAriIngest = async (dryRun: boolean): Promise<string> =>
 const pageAris = response.data.data;

 for (const pageAri of pageAris) {
-if (mostRecentStart && new Date(pageAri.dateUpdated) < new Date(mostRecentStart) && !timeOverlap) {
-timeOverlap = true;
-console.log('Time overlap - reached an ARI older than the most recent ingest start time.');
-}
+// Skip archived ARIs and ones not from one of the participating departments.
+if (!pageAri.isArchived && participatingDepartmentNames.includes(pageAri.department.toLowerCase())) {
+if (mostRecentStart && new Date(pageAri.dateUpdated) < new Date(mostRecentStart) && !timeOverlap) {
+timeOverlap = true;
+console.log('Time overlap - reached an ARI older than the most recent ingest start time.');
+}

-if (!pageAri.isArchived) {
 // Create, update, or skip this ARI as appropriate.
 const handle = await ariUtils.handleIncomingARI(pageAri, dryRun);
 checkedCount++;
@@ -121,7 +125,7 @@ export const incrementalAriIngest = async (dryRun: boolean): Promise<string> =>
 // On the last run this will be undefined but that's fine because we won't need to repeat the loop.
 paginationInfo = response.data.meta.pagination;
 pageUrl = paginationInfo.links.next;
-} while (pageUrl && unchangedStreak < MAX_UNCHANGED_STREAK && !timeOverlap);
+} while (pageUrl && unchangedStreak < MAX_UNCHANGED_STREAK && !timeOverlap && participatingDepartmentNames.length);

 const end = new Date();
 // Get duration in seconds to the nearest 1st decimal place.
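
A note on the new loop condition: because this is a `do...while` loop, the body runs once before `participatingDepartmentNames.length` is checked, so the first page is still fetched even when no departments participate. The sketch below illustrates that behaviour and one possible guard. It is not the PR's code: `Page`, `pages`, `fetchPage`, and `countFetches` are hypothetical stand-ins for the axios call and pagination data.

```typescript
// Hypothetical page shape and in-memory pages, standing in for the ARI API.
interface Page {
    items: string[];
    next?: string;
}

const pages: Record<string, Page> = {
    p1: { items: ['a', 'b'], next: 'p2' },
    p2: { items: ['c'] }
};

let fetchCount = 0;
const fetchPage = (url: string): Page => {
    fetchCount++;
    return pages[url];
};

// Walks the pages with the same do...while shape as the ingest loop and
// returns how many pages were fetched. With `guard` set, an empty
// department list skips the loop entirely.
const countFetches = (participatingDepartmentNames: string[], guard: boolean): number => {
    fetchCount = 0;
    if (guard && participatingDepartmentNames.length === 0) {
        return fetchCount;
    }
    let pageUrl = 'p1';
    do {
        const page = fetchPage(pageUrl);
        pageUrl = page.next ?? '';
    } while (pageUrl !== '' && participatingDepartmentNames.length);
    return fetchCount;
};

const withoutGuard = countFetches([], false); // one page is fetched before the check
const withGuard = countFetches([], true); // no pages are fetched
```

Whether the extra request matters in practice depends on how often the ingest runs with an empty list; the merged condition is correct as far as stopping after the first page goes.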