This repository has been archived by the owner on Aug 18, 2021. It is now read-only.

Auto refresh team nav data; improve image handling #38

Merged
merged 10 commits into from
Nov 15, 2020

Conversation

zephraph
Contributor

@zephraph zephraph commented Nov 15, 2020

This was originally going to be my hackathon entry but I unfortunately didn't finish it on time. Anyway, better late than never.

Problem

The rebuild of team nav was completely static: it generated all the pages at build time but never updated them. To get fresh data from the Google Sheet, a new deployment had to be shipped (which was a whole thing). The root cause is that reading from gsheets is slow and prone to rate limiting. The last version of team nav required an external database to sync to, which I wanted to avoid this time (more on that later). On top of that, the image handling was clumsy and complicated: an amalgamation of simpledb, s3, and duct tape. The S3 bucket never cleaned up after itself, the resizing was problematic... yeah.

Solution

This PR does two major things (though there's a lot to it):

  1. Enables Next.js' incremental static site regeneration.
  2. Leverages Next.js' automatic image optimization to vastly simplify the image processing workflow.

Let's dig into the details a bit

Incremental Static Site Regeneration

The principle here is that we want to statically generate these pages (they don't change often, so there's no need to continually update them), but when they do change we want the changes eventually reflected without human intervention. Essentially, each page gets a revalidation timer. Once a page's revalidation time has elapsed, the next visit kicks off regeneration in the background (that visitor immediately gets the stale page). So if we set the revalidation time to 5 minutes, a page will be regenerated at most once every 5 minutes. If a page isn't getting traffic, it isn't being revalidated.
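Concretely, this is just the `revalidate` value returned from `getStaticProps`. A minimal sketch of the pattern (the data loader here is illustrative; the real pages read from SQLite via Prisma):

```typescript
// Illustrative loader; the real pages query SQLite through Prisma.
async function getMembers(): Promise<{ name: string; slug: string }[]> {
  return [{ name: "Jane Doe", slug: "jane-doe" }];
}

// In the real page this is typed with Next's GetStaticProps; the type
// import is omitted here so the sketch stands alone.
export const getStaticProps = async () => {
  const members = await getMembers();
  return {
    props: { members },
    // Regenerate in the background at most once every 5 minutes,
    // and only if the page actually receives traffic.
    revalidate: 5 * 60,
  };
};
```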

Let's talk constraints. I didn't want to hit gsheets as the source of truth every time pages were updated. While theoretically possible with incremental regeneration (given the time between page updates), pulling down that data and filtering/caching/formatting it isn't something I wanted to do on every revalidation. I also didn't want to depend on an external postgres or mongodb if it could be avoided. We're talking about something like 200 rows of data; it's really not that complicated. Of course, it would be nice if it were relatively quick regardless.

The solution I came up with is to use an ORM called Prisma to interface with a SQLite database that holds the data downloaded from the gsheet. This separates the copy-down process from the data lookup process, which makes things a little easier to reason about (in my mind). You can see the database schema in schema.prisma.

There's a sync api (src/pages/api/sync.ts) which grabs the sheets data, uploads new images to S3, and upserts the data into sqlite using prisma. There's a little cleanup/validation that happens at that stage too, but hopefully that's all relatively self explanatory.
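The upsert step boils down to mapping each parsed sheet row onto Prisma's where/update/create shape. A hypothetical sketch of that mapping (the field names are illustrative, not the real schema):

```typescript
// Hypothetical shape of a parsed gsheet row; the real schema lives in schema.prisma.
interface SheetRow {
  name: string;
  slug: string;
  title?: string;
}

// Build the argument object for prisma.member.upsert(): create the row if the
// slug is new, otherwise update it in place.
export function toUpsertArgs(row: SheetRow) {
  const data = { name: row.name, slug: row.slug, title: row.title ?? null };
  return {
    where: { slug: row.slug },
    update: data,
    create: data,
  };
}
```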

The sqlite database is populated at build time by the script yarn prime-cache. On production there's a cron job that runs yarn prime-cache roughly every work hour during the work week. The sqlite database can also be updated at any time by calling /api/sync on the service. That'll be behind the same auth firewall, so you'll have to be logged in first.

You'll notice a lot of UI shuffling resulting from the move from the original gsheet parsing to the more structured Prisma data access.

Optimizing Image Delivery

My initial design was problematic for many reasons. I built a complicated caching system for remembering which images had been uploaded to S3, but it just ended up producing a lot of duplicates. I simplified that by querying S3 for the images already stored there and using a specific naming scheme (user slug + hash of the image url) to decide whether an image needs uploading. Since automatic image optimization was released with Next.js 10, I get to simplify further: only one image goes up to S3 per user, and Next handles resizing. That means less data in S3 overall, and it's easier to delete a user's old images.
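A sketch of that naming scheme, assuming a sha1 digest (the exact hash function and key format here are illustrative): the S3 key combines the user's slug with a hash of the source image URL, so an unchanged URL maps to an existing key and needs no re-upload.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of the key scheme described above. The salt matches the
// one mentioned later in the diff: bumping it changes every key, which forces
// all images to regenerate.
export function imageKey(userSlug: string, imageUrl: string, salt = "salt2"): string {
  const digest = createHash("sha1")
    .update(imageUrl + salt)
    .digest("hex")
    .slice(0, 12);
  return `${userSlug}-${digest}.jpg`;
}
```

Listing the bucket and comparing keys against this scheme is enough to tell whether a user's image changed.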

Next.js automatic image handling is pretty neat. You give it a range of image sizes, use their image component, and it'll do the resize (and store that locally) automatically for you. We could also use a service like cloudinary, but I just opted to keep it contained for now.


There's a lot here. Happy to answer questions.

cc @damassi, @dblandin, @anandaroop, @erikdstock, @bhoggard as I firmly recognize you all as my fellow Nextians at this point 😉

@@ -133,3 +133,34 @@ spec:
backend:
serviceName: team-web-internal
servicePort: 3000
---
Contributor Author

I shamelessly stole this from horizon

withTM({
reactStrictMode: true,
images: {
domains: [`${process.env.IMAGE_BUCKET}.s3.amazonaws.com`],
Contributor Author

With automatic image optimization you have to allowlist the domains the images come from that you want to cache.

module.exports = withBundleAnalyzer(withTM({}));
module.exports = withBundleAnalyzer(
withTM({
reactStrictMode: true,
Contributor Author

@damassi an interesting flag to automatically apply react strict mode to the whole app.

Comment on lines +63 to 68
"_moduleAliases": {
"lodash-es": "node_modules/lodash",
"utils": "src/utils",
"components": "src/components",
"data": "src/data"
}
Contributor Author

This is required by the module-alias module. I added this so that I could use ts-node to run some of the scripts like prime-cache.

Member

Nice package!

Comment on lines +1 to +2
// This is your Prisma schema file,
// learn more about it in the docs: https://pris.ly/d/prisma-schema
Contributor Author

Using prisma has been a pretty interesting journey. Their tooling (especially the editor tooling to write these schema files) is top notch. Most things just work, but sometimes it's hard to find out how to do certain things (like removing relations from an entity... use set). I've still been seeing some weird things where there are a ton of processes managing database connections and I haven't quite figured out what I'm doing wrong with that. Otherwise, it works really well.

Comment on lines +30 to +31
location Location? @relation(fields: [locationSlug], references: [slug])
locationSlug String?
Contributor Author

This is required for a 1-to-many relationship

Comment on lines +33 to +35
manager Member? @relation("MemberToMember", fields: [managerSlug], references: [slug])
reports Member[] @relation("MemberToMember")
managerSlug String?
Contributor Author

These are required for a self-referencing one-to-many relationship. It looks pretty complicated to grok, but the editor will literally write all this for you after you write the manager Member? part. It's pretty amazing.

(async () => {
const result = await sync();
console.log(result);
await disconnect();
Contributor Author

If you don't disconnect from prisma it'll leave the node process running indefinitely. FYI.

@@ -0,0 +1,20 @@
{
Contributor Author

Had the damnedest time getting ts-node to work right w/ these scripts. This special tsconfig, alongside module-alias and a special way of running the scripts (ts-node --dir ./script <script>), was the crazy specialness I had to bundle together. The --dir flag is important because it changes how ts-node looks for tsconfig files, and without this tsconfig file it definitely doesn't work, ha.

Member

Thoughts on just keeping as .js and using // @ts-check? Have found this pretty sufficient in various force code-paths that don't intersect main asset pipeline.

@@ -0,0 +1,16 @@
/**
* This file writes out the google credentials required to access the google drive api to read our team images.
Contributor Author

This whole thing is a little annoying. So far as I'm aware you can't feed these credentials into google's node sdk. They literally have to be in a file. I'm assuming that's just the standard way they do things, but it's annoying all the same.

Thankfully, there's a hack for that. I'm saying that enough lately that I should print stickers and tshirts.

Comment on lines +9 to 24
interface MemberDetails {
member: Member & {
orgs: Organization[];
locations: Location[];
teams: Team[];
subteams: Subteam[];
manager: {
name: string;
slug: string;
};
reports: {
name: string;
slug: string;
}[];
};
}
Contributor Author

Sometimes it was hard to extract the type of a query result to pass it around to different places. It'd be easier if I centralized all the queries, which I may do... but for now there are a few places w/ this pretty verbose/ugly type redeclaration.
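One way to avoid the hand-written redeclarations would be deriving the type from the query function itself with `Awaited<ReturnType<...>>`. A sketch of the idea (the query function here is a stand-in, not the real code):

```typescript
// Stand-in for a centralized query function; the real one would call Prisma.
async function getMemberDetails(slug: string) {
  return {
    member: { name: "Jane Doe", slug },
    manager: { name: "Boss", slug: "boss" },
    reports: [] as { name: string; slug: string }[],
  };
}

// Derive the result type instead of redeclaring it by hand; it stays in sync
// if the query's shape changes.
type MemberDetails = Awaited<ReturnType<typeof getMemberDetails>>;

export async function example(): Promise<MemberDetails> {
  return getMemberDetails("jane-doe");
}
```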

@@ -63,10 +75,10 @@ export function MemberDetails({ member }: MemberDetailsProps) {
</Serif>
<span>
{orgs.map((org) => (
<Fragment key={`org-${normalizeParam(org)}`}>
<RouterLink href={`/org/${normalizeParam(org)}`} passHref>
<Fragment key={`org-${org.slug}`}>
Contributor Author

Frontloaded all the slug generation. Actually use all the slugs as ids. Makes things much easier.

@@ -125,11 +134,7 @@ export function MemberDetails({ member }: MemberDetailsProps) {
<Serif size="4" weight="semibold">
Manager:
</Serif>
<RouterLink
href={"/member/[member]"}
as={`/member/${normalizeParam(manager.name)}`}
Contributor Author

Next.js 10 got rid of the need to have these as props when you're linking to a dynamic page. Thank goodness.

Member

Yes 😌

Comment on lines -42 to -72
const aggregateMemberLinks = (
members: Member[],
field: keyof Member,
prefix: string
) => {
const isArrayType = Array.isArray(members[0][field]);
let aggregateGroup: [fieldValue: string, group: Member[]][] = [];
if (isArrayType) {
let fieldAggregate: string[] = Array.from(
new Set(members.flatMap((member) => member[field]))
) as string[];
aggregateGroup = fieldAggregate.map((fieldValue) => [
fieldValue,
members.filter((member) =>
(member[field] as string[]).includes(fieldValue)
),
]);
}

return (isArrayType
? aggregateGroup
: Object.entries(groupBy(members, field))
)
.map(([fieldValue, group]) => ({
text: fieldValue,
count: (group as any)?.length,
href: `/${prefix}/${normalizeParam(fieldValue)}`,
}))
.filter(({ text }) => text);
};

Contributor Author

Yikes. Ha, being able to query for this data makes the logic tons easier. You can see what replaced this in data/sidebar.ts.
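For contrast, once the database query returns members already grouped per location/team, the aggregation is roughly a filter-and-map. A sketch of the idea (not the actual data/sidebar.ts code):

```typescript
interface LinkDatum {
  text: string;
  count: number;
  href: string;
}

// With members already grouped by the database query, building sidebar links
// no longer needs the array-vs-scalar branching of the old helper.
export function toSidebarLinks(
  groups: { name: string; slug: string; members: unknown[] }[],
  prefix: string
): LinkDatum[] {
  return groups
    .filter((group) => group.name)
    .map((group) => ({
      text: group.name,
      count: group.members.length,
      href: `/${prefix}/${group.slug}`,
    }));
}
```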

Member

@damassi damassi Nov 15, 2020

Makes me think of...

Screen Shot 2020-11-15 at 10 48 50 AM

(from graphql: a success story at paypal)

Wondering if we should do https://www.prisma.io/docs/concepts/overview/prisma-in-your-stack/graphql at some point?

Comment on lines +21 to +35
const pulse = keyframes`
0% { background-color: ${color("black10")}; }
50% { background-color: ${color("black5")}; }
100% { background-color: ${color("black10")}; }
`;

const AvatarContainer = styled(Box)`
flex-shrink: 0;
position: relative;
width: 100px;
height: 100px;
background-color: ${color("black10")};
animation: ${pulse} 2s ease-in-out infinite;
border-radius: 50%;
`;
Contributor Author

Had to essentially rip this from palette. We need to export that animation somewhere if we're not already.

@@ -2,6 +2,14 @@ import sharp from "sharp";
import S3 from "aws-sdk/clients/s3";
import stream from "stream";
import { google } from "googleapis";
import memoize from "p-memoize";
Contributor Author

I'm using this memoize library because it gives the ability to add a timeout which I find incredibly handy.
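A dependency-free sketch of the behavior being relied on here, i.e. memoizing an async function with a max age (this mirrors p-memoize's timeout option in spirit; the function names are illustrative, not p-memoize's API):

```typescript
// Cache an async function's result and expire it after maxAgeMs.
// Concurrent callers share the same in-flight promise.
export function memoizeWithMaxAge<T>(
  fn: () => Promise<T>,
  maxAgeMs: number
): () => Promise<T> {
  let cached: { value: Promise<T>; expires: number } | undefined;
  return () => {
    const now = Date.now();
    if (!cached || now >= cached.expires) {
      cached = { value: fn(), expires: now + maxAgeMs };
    }
    return cached.value;
  };
}
```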

* S3 bucket. This method is cached for an hour and can be cleared by
* calling `memoize.clear(listS3Images)`.
*/
export const listS3Images = memoize(
Contributor Author

TODO: Clear this image cache if images were uploaded or deleted after a sync. Don't want to do it between every user image upload but at the end.... somehow.

* processing are done you can update the salt value to regenerate all
* the hashes.
*/
const imageUrlHash = hash(imageUrl.toString() + "salt2");
Contributor Author

The salt here is a little hack if you want to regenerate all images for some reason. Like if you changed the base size or did something w/ sharp.

Comment on lines +96 to +98
const [ErrorAccessingS3Images, S3Images] = await to(
listS3Images(bucketPrefix)
);
Contributor Author

await-to-js is probably hands down my most used tool for writing async logic.
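For anyone unfamiliar, the `to` helper wraps a promise so the error comes back as a tuple element instead of a thrown exception. A self-contained sketch of the pattern:

```typescript
// Resolve a promise into an [error, value] tuple instead of throwing,
// in the style of await-to-js's `to` helper.
export async function to<T, E = Error>(
  promise: Promise<T>
): Promise<[E, undefined] | [null, T]> {
  try {
    const data = await promise;
    return [null, data];
  } catch (err) {
    return [err as E, undefined];
  }
}
```

`const [err, images] = await to(listS3Images(prefix));` keeps the happy path flat and makes the error an ordinary value you can branch on.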


log.info(`Uploading image for ${userSlug}`);

const normalizer = sharp().rotate().resize(500, 500);
Contributor Author

There's a lot of magic in this one line. It normalizes image sizes and corrects orientation. Without it, some folks would be sideways and a lot of folks would be squashed. Giving both dimensions is super important, as providing only one still ends up with squashing. Sharp does a decent job of finding a face and cropping appropriately.

@@ -0,0 +1,30 @@
import { PrismaClient } from "@prisma/client";
Contributor Author

Me trying to figure out how the heck to not have so many connection processes flying around.

Member

So did this end up solving the problem you noted above?

some weird things where there are a ton of processes managing database connections

Comment on lines +4 to +8
const results = await prisma.location.findMany({
include: {
members: true,
},
});
Contributor Author

I lost a lot of time to a bug here. I was trying to do a select statement and select only what was needed from the child relation (members), but I kept getting a Prisma panic. The only way around it was to include the whole child. I had similar logic in other places where this didn't happen, so idk what was up.

return values.length > 0 ? values : undefined;
};

export async function sync() {
Contributor Author

✨ Thus begins the magic ✨

await prisma.subteam.upsert(body(slug, subteam));
}

// 3. Delete any orphaned members
Contributor Author

This is a clean-up step for data left over from earlier runs (if any have happened).

["Organizations", "org", await getOrgLinksData()],
["Teams", "team", await getTeamLinksData()],
];
};
Member

Mentioned previously (and obviously non-blocking) but at this point it seems like it would be pretty straightforward to hoist this into some schema resolvers and graphql it on the client.

};
};

export const getStaticProps: GetStaticProps = async ({ params }) => {
const members = await getMembers();
const locationSlug = params?.location as string;
Member

It's funny how elegant it is to mix client / server code in Next... nbd

Member

@damassi damassi left a comment

Amazing work @zephraph 👍 No substantial feedback other than getting this on prisma seems like a good step towards spinning up a light GraphQL api for us to read from, but as it is it all makes sense!

@zephraph
Contributor Author

I'm gonna iterate on this. I'll ship it out to staging and see how it does.

@zephraph zephraph merged commit 5362825 into master Nov 15, 2020
@zephraph zephraph deleted the mirror-data-locally branch November 15, 2020 19:35
@artsy-peril artsy-peril bot mentioned this pull request Nov 16, 2020
Member

@anandaroop anandaroop left a comment

Good stuff 😎


return values.length > 0 ? values : undefined;
};

export default async function sync(req: NowRequest, res: NowResponse) {
Member

I usually have these API handlers typed as NextApiRequest and NextApiResponse — is this just legacy or is there a substantive difference here?

