This repository has been archived by the owner on Aug 18, 2021. It is now read-only.

Auto refresh team nav data; improve image handling #38

Merged
merged 10 commits into from
Nov 15, 2020

Conversation

zephraph
Contributor

@zephraph zephraph commented Nov 15, 2020

This was originally going to be my hackathon entry but I unfortunately didn't finish it on time. Anyway, better late than never.

Problem

The rebuild of team nav was completely static: it generated all the pages at build time but never updated them. To get fresh data from the Google Sheet, a new deployment had to be shipped (which was a whole thing). The root cause is that reading from gsheets is slow and prone to rate limiting. The last version of team nav required an external database to sync to, which I wanted to avoid this time (more on that later). On top of that, the image handling was clumsy and complicated: an amalgamation of simpledb, s3, and duct tape. The S3 bucket never cleaned up after itself, the resizing was problematic... yeah.

Solution

This PR does two major things (though there's a lot to it):

  1. Enables Next.js' incremental static site regeneration.
  2. Leverages Next.js' automatic image optimization to vastly simplify the image processing workflow.

Let's dig into the details a bit

Incremental Static Site Regeneration

The principle here is that we want to statically generate these pages (they don't change often, so there's no need to continually update them), but when they do change we want the changes eventually reflected without human intervention. Essentially, each page gets a revalidation timer. Once a page's revalidation time has elapsed, the next visit kicks off regeneration in the background (that visitor immediately gets the stale page). So if we set the revalidation time to 5 minutes, a page will be regenerated at most once every 5 minutes. If a page isn't getting traffic, it isn't being revalidated.
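Concretely, this is just the `revalidate` value returned from `getStaticProps`. A minimal sketch of the pattern (the data loader here is illustrative; the real pages read from SQLite via Prisma):

```typescript
// Illustrative loader; the real pages query SQLite through Prisma.
async function getMembers(): Promise<{ name: string; slug: string }[]> {
  return [{ name: "Jane Doe", slug: "jane-doe" }];
}

// In the real page this is typed with Next's GetStaticProps; the type
// import is omitted here so the sketch stands alone.
export const getStaticProps = async () => {
  const members = await getMembers();
  return {
    props: { members },
    // Regenerate in the background at most once every 5 minutes,
    // and only if the page actually receives traffic.
    revalidate: 5 * 60,
  };
};
```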

Let's talk constraints. I didn't want to hit gsheets as the source of truth every time pages were updated. While theoretically possible with incremental regeneration (given the time between page updates), pulling down that data and filtering/caching/formatting it isn't something I wanted to do on every revalidation. I also didn't want to depend on an external postgres or mongodb if it could be avoided. We're talking about something like 200 rows of data; it's really not that complicated. Of course, it would be nice if it were relatively quick regardless.

The solution I came up with is to use an ORM called Prisma to interface with a SQLite database that holds the data downloaded from the gsheet. This separates the copy-down process from the data lookup process, which makes things a little easier to reason about (in my mind). You can see the database schema in schema.prisma.

There's a sync api (src/pages/api/sync.ts) which grabs the sheets data, uploads new images to S3, and upserts the data into sqlite using prisma. There's a little cleanup/validation that happens at that stage too, but hopefully that's all relatively self explanatory.
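The upsert step boils down to mapping each parsed sheet row onto Prisma's where/update/create shape. A hypothetical sketch of that mapping (the field names are illustrative, not the real schema):

```typescript
// Hypothetical shape of a parsed gsheet row; the real schema lives in schema.prisma.
interface SheetRow {
  name: string;
  slug: string;
  title?: string;
}

// Build the argument object for prisma.member.upsert(): create the row if the
// slug is new, otherwise update it in place.
export function toUpsertArgs(row: SheetRow) {
  const data = { name: row.name, slug: row.slug, title: row.title ?? null };
  return {
    where: { slug: row.slug },
    update: data,
    create: data,
  };
}
```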

The sqlite database is populated at build time by the script yarn prime-cache. On production there's a cron job that runs yarn prime-cache roughly every work hour during the work week. The sqlite database can also be updated at any time by calling /api/sync on the service. That'll be behind the same auth firewall, so you'll have to be logged in first.

You'll notice a lot of UI shuffling resulting from the move from the original gsheet parsing to the more structured Prisma data access.

Optimizing Image Delivery

My initial design was problematic for many reasons. I built a complicated caching system for remembering which images had been uploaded to S3, but it just ended up producing a lot of duplicates. I simplified that by querying S3 for the images already stored there and using a specific naming scheme (user slug + hash of the image url) to decide whether an image needs uploading. Since automatic image optimization was released with Next.js 10, I get to simplify further: only one image goes up to S3 per user, and Next handles resizing. That means less data in S3 overall, and it's easier to delete a user's old images.
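A sketch of that naming scheme, assuming a sha1 digest (the exact hash function and key format here are illustrative): the S3 key combines the user's slug with a hash of the source image URL, so an unchanged URL maps to an existing key and needs no re-upload.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of the key scheme described above. The salt matches the
// one mentioned later in the diff: bumping it changes every key, which forces
// all images to regenerate.
export function imageKey(userSlug: string, imageUrl: string, salt = "salt2"): string {
  const digest = createHash("sha1")
    .update(imageUrl + salt)
    .digest("hex")
    .slice(0, 12);
  return `${userSlug}-${digest}.jpg`;
}
```

Listing the bucket and comparing keys against this scheme is enough to tell whether a user's image changed.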

Next.js automatic image handling is pretty neat. You give it a range of image sizes, use their image component, and it'll do the resize (and store that locally) automatically for you. We could also use a service like cloudinary, but I just opted to keep it contained for now.


There's a lot here. Happy to answer questions.

cc @damassi, @dblandin, @anandaroop, @erikdstock, @bhoggard as I firmly recognize you all as my fellow Nextians at this point 😉

@@ -133,3 +133,34 @@ spec:
backend:
serviceName: team-web-internal
servicePort: 3000
---
Contributor Author

I shamelessly stole this from horizon

withTM({
reactStrictMode: true,
images: {
domains: [`${process.env.IMAGE_BUCKET}.s3.amazonaws.com`],
Contributor Author

With automatic image optimization you have to allowlist the domains the images come from that you want to cache.

module.exports = withBundleAnalyzer(withTM({}));
module.exports = withBundleAnalyzer(
withTM({
reactStrictMode: true,
Contributor Author

@damassi an interesting flag to automatically apply react strict mode to the whole app.

Comment on lines +63 to 68
"_moduleAliases": {
"lodash-es": "node_modules/lodash",
"utils": "src/utils",
"components": "src/components",
"data": "src/data"
}
Contributor Author

This is required by the module-alias module. I added this so that I could use ts-node to run some of the scripts like prime-cache.

Member

Nice package!

Comment on lines +1 to +2
// This is your Prisma schema file,
// learn more about it in the docs: https://pris.ly/d/prisma-schema
Contributor Author

Using prisma has been a pretty interesting journey. Their tooling (especially the editor tooling to write these schema files) is top notch. Most things just work, but sometimes it's hard to find out how to do certain things (like removing relations from an entity... use set). I've still been seeing some weird things where there are a ton of processes managing database connections and I haven't quite figured out what I'm doing wrong with that. Otherwise, it works really well.

Comment on lines +30 to +31
location Location? @relation(fields: [locationSlug], references: [slug])
locationSlug String?
Contributor Author

This is required for a 1-to-many relationship

Comment on lines +33 to +35
manager Member? @relation("MemberToMember", fields: [managerSlug], references: [slug])
reports Member[] @relation("MemberToMember")
managerSlug String?
Contributor Author

These are required for a self-referencing one-to-many relationship. It looks pretty complicated to grok, but the editor will literally write all this for you after you write the manager Member? part. It's pretty amazing.

(async () => {
const result = await sync();
console.log(result);
await disconnect();
Contributor Author

If you don't disconnect from prisma it'll leave the node process running indefinitely. FYI.

@@ -0,0 +1,20 @@
{
Contributor Author

Had the damnedest time getting ts-node to work right w/ these scripts. This special tsconfig, alongside module-alias and a special way of running the scripts (ts-node --dir ./script <script>), was the crazy specialness I had to bundle together. The --dir flag is important because it changes how ts-node looks for tsconfig files, and without this tsconfig file it definitely doesn't work, ha.

Member

Thoughts on just keeping as .js and using // @ts-check? Have found this pretty sufficient in various force code-paths that don't intersect main asset pipeline.

@@ -0,0 +1,16 @@
/**
* This file writes out the google credentials required to access the google drive api to read our team images.
Contributor Author

This whole thing is a little annoying. So far as I'm aware you can't feed these credentials into google's node sdk. They literally have to be in a file. I'm assuming that's just the standard way they do things, but it's annoying all the same.

Thankfully, there's a hack for that. I'm saying that enough lately that I should print stickers and tshirts.

Comment on lines +9 to 24
interface MemberDetails {
member: Member & {
orgs: Organization[];
locations: Location[];
teams: Team[];
subteams: Subteam[];
manager: {
name: string;
slug: string;
};
reports: {
name: string;
slug: string;
}[];
};
}
Contributor Author

Sometimes it was hard to extract the type of a query result to pass it around to different places. It'd be easier if I centralized all the queries, which I may do... but for now there are a few places w/ this pretty verbose/ugly type redeclaration.
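One way to avoid the hand-written redeclarations would be deriving the type from the query function itself with `Awaited<ReturnType<...>>`. A sketch of the idea (the query function here is a stand-in, not the real code):

```typescript
// Stand-in for a centralized query function; the real one would call Prisma.
async function getMemberDetails(slug: string) {
  return {
    member: { name: "Jane Doe", slug },
    manager: { name: "Boss", slug: "boss" },
    reports: [] as { name: string; slug: string }[],
  };
}

// Derive the result type instead of redeclaring it by hand; it stays in sync
// if the query's shape changes.
type MemberDetails = Awaited<ReturnType<typeof getMemberDetails>>;

export async function example(): Promise<MemberDetails> {
  return getMemberDetails("jane-doe");
}
```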

@@ -63,10 +75,10 @@ export function MemberDetails({ member }: MemberDetailsProps) {
</Serif>
<span>
{orgs.map((org) => (
<Fragment key={`org-${normalizeParam(org)}`}>
<RouterLink href={`/org/${normalizeParam(org)}`} passHref>
<Fragment key={`org-${org.slug}`}>
Contributor Author

Frontloaded all the slug generation. Actually use all the slugs as ids. Makes things much easier.

@@ -125,11 +134,7 @@ export function MemberDetails({ member }: MemberDetailsProps) {
<Serif size="4" weight="semibold">
Manager:
</Serif>
<RouterLink
href={"/member/[member]"}
as={`/member/${normalizeParam(manager.name)}`}
Contributor Author

Next.js 10 got rid of the need to have these as props when you're linking to a dynamic page. Thank goodness.

Member

Yes 😌

Comment on lines -42 to -72
const aggregateMemberLinks = (
members: Member[],
field: keyof Member,
prefix: string
) => {
const isArrayType = Array.isArray(members[0][field]);
let aggregateGroup: [fieldValue: string, group: Member[]][] = [];
if (isArrayType) {
let fieldAggregate: string[] = Array.from(
new Set(members.flatMap((member) => member[field]))
) as string[];
aggregateGroup = fieldAggregate.map((fieldValue) => [
fieldValue,
members.filter((member) =>
(member[field] as string[]).includes(fieldValue)
),
]);
}

return (isArrayType
? aggregateGroup
: Object.entries(groupBy(members, field))
)
.map(([fieldValue, group]) => ({
text: fieldValue,
count: (group as any)?.length,
href: `/${prefix}/${normalizeParam(fieldValue)}`,
}))
.filter(({ text }) => text);
};

Contributor Author

Yikes. Ha, being able to query for this data makes the logic tons easier. You can see what replaced this in data/sidebar.ts.
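For contrast, once the database query returns members already grouped per location/team, the aggregation is roughly a filter-and-map. A sketch of the idea (not the actual data/sidebar.ts code):

```typescript
interface LinkDatum {
  text: string;
  count: number;
  href: string;
}

// With members already grouped by the database query, building sidebar links
// no longer needs the array-vs-scalar branching of the old helper.
export function toSidebarLinks(
  groups: { name: string; slug: string; members: unknown[] }[],
  prefix: string
): LinkDatum[] {
  return groups
    .filter((group) => group.name)
    .map((group) => ({
      text: group.name,
      count: group.members.length,
      href: `/${prefix}/${group.slug}`,
    }));
}
```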

Member

@damassi damassi Nov 15, 2020

Makes me think of...

Screen Shot 2020-11-15 at 10 48 50 AM

(from graphql: a success story at paypal)

Wondering if we should do https://www.prisma.io/docs/concepts/overview/prisma-in-your-stack/graphql at some point?

Comment on lines +21 to +35
const pulse = keyframes`
0% { background-color: ${color("black10")}; }
50% { background-color: ${color("black5")}; }
100% { background-color: ${color("black10")}; }
`;

const AvatarContainer = styled(Box)`
flex-shrink: 0;
position: relative;
width: 100px;
height: 100px;
background-color: ${color("black10")};
animation: ${pulse} 2s ease-in-out infinite;
border-radius: 50%;
`;
Contributor Author

Had to essentially rip this from palette. We need to export that animation somewhere if we're not already.

@@ -2,6 +2,14 @@ import sharp from "sharp";
import S3 from "aws-sdk/clients/s3";
import stream from "stream";
import { google } from "googleapis";
import memoize from "p-memoize";
Contributor Author

I'm using this memoize library because it gives the ability to add a timeout which I find incredibly handy.
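A dependency-free sketch of the behavior being relied on here, i.e. memoizing an async function with a max age (this mirrors p-memoize's timeout option in spirit; the function names are illustrative, not p-memoize's API):

```typescript
// Cache an async function's result and expire it after maxAgeMs.
// Concurrent callers share the same in-flight promise.
export function memoizeWithMaxAge<T>(
  fn: () => Promise<T>,
  maxAgeMs: number
): () => Promise<T> {
  let cached: { value: Promise<T>; expires: number } | undefined;
  return () => {
    const now = Date.now();
    if (!cached || now >= cached.expires) {
      cached = { value: fn(), expires: now + maxAgeMs };
    }
    return cached.value;
  };
}
```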

* S3 bucket. This method is cached for an hour and can be cleared by
* calling `memoize.clear(listS3Images)`.
*/
export const listS3Images = memoize(
Contributor Author

TODO: Clear this image cache if images were uploaded or deleted after a sync. Don't want to do it between every user image upload but at the end.... somehow.

* processing are done you can update the salt value to regenerate all
* the hashes.
*/
const imageUrlHash = hash(imageUrl.toString() + "salt2");
Contributor Author

The salt here is a little hack if you want to regenerate all images for some reason. Like if you changed the base size or did something w/ sharp.

Comment on lines +96 to +98
const [ErrorAccessingS3Images, S3Images] = await to(
listS3Images(bucketPrefix)
);
Contributor Author

await-to-js is probably hands down my most used tool for writing async logic.
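For anyone unfamiliar, the `to` helper wraps a promise so the error comes back as a tuple element instead of a thrown exception. A self-contained sketch of the pattern:

```typescript
// Resolve a promise into an [error, value] tuple instead of throwing,
// in the style of await-to-js's `to` helper.
export async function to<T, E = Error>(
  promise: Promise<T>
): Promise<[E, undefined] | [null, T]> {
  try {
    const data = await promise;
    return [null, data];
  } catch (err) {
    return [err as E, undefined];
  }
}
```

`const [err, images] = await to(listS3Images(prefix));` keeps the happy path flat and makes the error an ordinary value you can branch on.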


log.info(`Uploading image for ${userSlug}`);

const normalizer = sharp().rotate().resize(500, 500);
Contributor Author

There's a lot of magic in this one line. It normalizes image sizes and corrects orientation. Without it, some folks would be sideways and a lot of folks would be squashed. Giving both dimensions is super important, as providing only one still ends up with squashing. Sharp does a decent job of finding a face and cropping appropriately.

@@ -0,0 +1,30 @@
import { PrismaClient } from "@prisma/client";
Contributor Author

Me trying to figure out how the heck to not have so many connection processes flying around.

Member

So did this end up solving the problem you noted above?

some weird things where there are a ton of processes managing database connections

Comment on lines +4 to +8
const results = await prisma.location.findMany({
include: {
members: true,
},
});
Contributor Author

I lost a lot of time to a bug here. I was trying to do a select statement and select only what was needed from the child relation (members), but I kept getting a Prisma panic. The only way around it was to include the whole child. I had similar logic in other places where this didn't happen, so idk what was up.

return values.length > 0 ? values : undefined;
};

export async function sync() {
Contributor Author

✨ Thus begins the magic ✨

await prisma.subteam.upsert(body(slug, subteam));
}

// 3. Delete any orphaned members
Contributor Author

This is a clean-up step for data left over from earlier runs (if any have happened).

["Organizations", "org", await getOrgLinksData()],
["Teams", "team", await getTeamLinksData()],
];
};
Member

Mentioned previously (and obviously non-blocking) but at this point it seems like it would be pretty straightforward to hoist this into some schema resolvers and graphql it on the client.

};
};

export const getStaticProps: GetStaticProps = async ({ params }) => {
const members = await getMembers();
const locationSlug = params?.location as string;
Member

It's funny how elegant it is to mix client / server code in Next... nbd

Member

@damassi damassi left a comment

Amazing work @zephraph 👍 No substantial feedback other than getting this on prisma seems like a good step towards spinning up a light GraphQL api for us to read from, but as it is it all makes sense!

@zephraph
Contributor Author

I'm gonna iterate on this. I'll ship it out to staging and see how it does.

@zephraph zephraph merged commit 5362825 into master Nov 15, 2020
@zephraph zephraph deleted the mirror-data-locally branch November 15, 2020 19:35
@artsy-peril artsy-peril bot mentioned this pull request Nov 16, 2020
Member

@anandaroop anandaroop left a comment

Good stuff 😎


return values.length > 0 ? values : undefined;
};

export default async function sync(req: NowRequest, res: NowResponse) {
Member

I usually have these API handlers typed as NextApiRequest and NextApiResponse — is this just legacy or is there a substantive difference here?

