A curated list of awesome and useful posts, videos, and articles on leading a data team. This includes leadership at the middle-management, Director/VP, or C-suite level, for organizations both big and small. A few relevant engineering management articles are sprinkled in.
- Hiring (17)
- Culture (15)
- Impact (8)
- Strategy (12)
- Diversity Equity and Inclusion (4)
- Project Management (9)
- Code Review (3)
- Organization Structure and Job Titles (19)
- ML and AI Within an Organization (13)
- BI and Analytics Within an Organization (21)
- Management Skills (8)
- Data Platforms (17)
- Data Governance (6)
Author | Title | One-sentence summary | Year |
---|---|---|---|
Eli Goldberg | Hire better data scientists: A field guide for hiring managers new to data science. Part 1. Creating better job descriptions brings in better talent. | When hiring, highlight the "why you", describe opportunities instead of responsibilities, describe the key actions and background experience needed rather than technologies, and proofread! | 2020 |
Eli Goldberg | Hire better data scientists: A field guide for hiring managers new to data science Part 2. Create a clear interviewing process. | Make time for hiring and use your shift in priorities to your advantage, don't "wing it", write your process down and engineer it to be data-driven, and modify the process, not your adherence to it. | 2020 |
Gergely Orosz | Hiring (and Retaining) a Diverse Engineering Team | Stories from six engineering leaders who succeeded in building and growing diverse teams | 2021 |
Reddit | “Are we being too harsh on junior candidates?” | Reddit thread discussing expectations of junior ML job candidates | 2022 |
Hacker News | “When did 7 interviews become normal” | An “Ask HN” forum question about over-interviewing | 2022 |
Farhan Thawar | VP of Engineering hiring cheatsheet | A guide for assessing a candidate for an engineering or data leadership role: provides good and bad responses to questions. | 2022 |
Freaking Rectangle Blog | How to Freaking Find Great Developers By Having Them Read Code | When hiring for data engineering, analytics, data science, or ML engineering roles, it would be better to have candidates read code instead of writing it (the code can be neutral, interview-only code). | 2022 |
Emily Thompson | Hiring Data Scientists With Intention | Gives guidance on writing a focused job description, being strategic in sourcing, and designing a structured interview process so that you can evaluate candidates consistently. | 2022 |
Nate Rosidi | 15 Python Coding Interview Questions You Must Know For Data Science | Provides 15 examples of interview questions that test basic Python data manipulation skills. | 2022 |
Jike Chong, Ben Lorica, Yue Cathy Chang | Top Places to Work for Data Scientists: We identify U.S. organizations that will help you develop your career in data science | Looks at factors that make a data science org attractive to an individual contributor (IC); this also gives hiring managers some insight into how talent thinks. | 2022 |
Randy Au | Let's talk a bit about giving interviews | Gives thoughts on planning and carrying out a technical data science interview. | 2022 |
Michelangelo D'Agostino, Katie Malone | The Care and Feeding of Data Scientists, Chapters 2 and 3 | "How to Win Friends and Recruit Data Scientists" and "Interview with the Data Scientist" have tips on recruiting and interviewing. | 2019 |
Dip Ranjan Chatterjee | The Data Science Interview Book | A very comprehensive set of topics to interview data science candidates with (spans statistics, ML, NLP, etc.). | 2022 |
Tristan Handy | When to hire a data engineer? | Argues that data analysts and scientists are increasingly working on ETL pipelines themselves (with the help of Stitch, Fivetran, dbt, etc.), but that data engineers are still essential for managing core data infrastructure, building and maintaining custom ingestion pipelines, supporting the data team with design and performance optimization, and building non-SQL transformation pipelines. | 2022 |
Jacob Kaplan-Moss | My questions for prospective employers (Director/VP roles) | This post discusses the other side of the hiring table and gives great questions that a candidate for a Director or VP-level engineering leadership role should be asking (it can also help a hiring team think through the scope of a Director or VP-level role). | 2019 |
Chip Huyen | What we look for in a resume | Outlines the resume evaluation process for a small startup looking for data talent and includes topics like looking for examples of persistence, looking for unique perspectives, and looking for metrics around business impact. | 2023 |
Mikhail Popov | Hiring a data scientist | A retrospective from the Wikimedia Foundation, of Wikipedia fame, sharing what they learned in the hiring process and how they discovered a better approach to interviewing for their data team. | 2017 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Emily Thompson | Growing Data Teams from Reactive to Influential | Reactive data teams lead to low impact and attrition, so instead acknowledge if your team is reactive, assess reactivity quantitatively, focus on near-term wins for cultural change, and build longer-term foundational work into the team’s capacity | 2022 |
Prukalpa Sankar | It’s Time for the Modern Data Culture Stack | We need a modern data culture stack: best practices, values, and cultural rituals that will help data people come together and collaborate effectively. | 2021 |
Kuba Niechcial | How to set goals for engineers? | Provides some examples of good personal goals for engineers and things to keep in mind (e.g. KPIs should not be personal goals). | 2021 |
Jacob Kaplan-Moss | “Exit Interviews Are a Trap” | Rethinking the exit interview: there is very little upside (things are unlikely to change) and potentially significant downside (bad blood, retracted references, malicious actions by the employer, etc.). | 2022 |
Christoph Neijenhuis | How to stop shrinkage in engineering teams | The journey to stopping shrinkage in engineering teams is long and rarely straightforward, but there are practical things leaders can do to take control of the chaos, from taking steps to get out of survival mode and tackling problems around culture to involving teams in the development of a solid technical strategy. | 2022 |
Caitlin Moorman | Proficiency v. Creativity | It is critical to find a balance between open-endedness/opportunities for creativity and standardized rigor when leading a data function. | 2020 |
Shimin Zhang | Why a Meeting Costs More than a MacBook Pro – the Business Case for Fewer Developers in Meetings | Describes the opportunity cost of having all developers or data engineers attend meetings, and ways to recoup it. | 2022 |
David Waller | 10 Steps to Creating a Data-Driven Culture | Details some steps for working towards a data-driven culture, from taking care in choosing metrics to quantifying uncertainty. | 2020 |
Michael Kaminsky | A Culture of Partnership | Building a culture of partnership on your analytics team is crucial to maximizing the impact your team can have. | 2019 |
Benn Stancil | Do data-driven companies actually win? | Explores how much a data-driven culture actually contributes to a company's success, using a handful of hypothetical fashion companies as examples. | 2022 |
Michelangelo D'Agostino, Katie Malone | The Care and Feeding of Data Scientists, Chapter 4 | "Fear and Loathing in Data Science" offers concrete tips on culture that help to retain your best people. | 2019 |
Benjamin Rogojan | Onboarding For Data Teams | The costs of poor onboarding (both opportunity costs and retention problems) are high; to help with this, the author writes about 'Onboarding For Context', 'Environment Set-Up', and the concept of 'Commit Something Day One'. | 2022 |
Prukalpa Sankar | The “knowledge-creating” company, a big announcement and other takeaways from dbt Coalesce | Prukalpa shares thoughts on this great quote from an early-1990s HBR article: "...markets shift, technologies proliferate, competitors multiply, and products become obsolete almost overnight, successful companies are those that consistently create new knowledge, disseminate it widely throughout the organization, and quickly embody it in new technologies and products; these activities define the ‘knowledge-creating’ company, whose sole business is continuous innovation." | 2022 |
Christine Garcia | The secrets of a modern data leader: The first 365 days inside a data team | Fantastic video covering how leaders should nurture their data teams, build the right team values, establish governance inside the team, create cadences and rituals, etc. | 2022 |
Claire Carroll | Data education is broken | The post explores the disconnect between data education and real data practice in industry (e.g. analyzing static flat files in R, Pandas, or SPSS versus using SQL along with tools like git, dbt, Airflow, VSCode, etc.), why this occurs, and the effects it has on the data industry. | 2021 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
McKinsey | Ten red flags signaling your analytics program will fail. | A list ranging from the executive team not having a clear vision for its analytics program to nobody knowing the quantitative impact that analytics is providing | 2018 |
Erik Bernhardsson | Building a data team at a mid-stage startup: a short story | A story about a fictional company that became more data-driven and how it was done. | 2021 |
Abinaya Sundarraj | Data Management: How to Stay on Top of Your Customer’s Mind? | Describes the virtues and challenges of achieving a customer-centric data perspective in a business. | 2022 |
Mikkel Dengsøe | How to measure data quality: Practical guidelines for how to measure quality, engagement and productivity in a data team | Provides some thoughts around how to evaluate your data team and suggests three categories of metrics: quality, productivity, and engagement. | 2022 |
Sarah Krasnik | Choosing a Data Catalog | Although not technically about management, this tackles documentation, data dictionaries, knowledge repos, and the like, all of which are critically important for a data org. | 2022 |
Chad Sanderson | The Existential Threat of Data Quality: and Why the Modern Data Stack Can't Solve It | Despite the rapidly-evolving/growing data stack, poor data quality remains an enormous problem; the article breaks it down into "downstream" and "upstream" categories. | 2022 |
Anna Geller | Should You Measure the Value of a Data Team? What to measure and whether you should | A wonderful discussion of the challenges of measuring a data team's impact, with clear examples of good, so-so, and poor metrics for measuring this performance. | 2023 |
Benn Stancil | A method for measuring analytical work: Our only job should be to make people more decisive. | Argues that much of the value of an analytics org is difficult to quantify, but that perhaps these orgs should be valued (and measured) by their ability to reduce the time it takes to make decisions. | 2021 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Prukalpa Sankar | Data Advantage Matrix: A New Way to Think About Data Strategy | Break down your data advantage into four categories (i.e. operational, strategic, product, and business opportunity) and then assess what stage each of these is at (e.g. basic, intermediate, advanced). | 2021 |
Ilan Man | Creating a Data Road Map | Provides suggestions for what factors to consider when thinking about a data roadmap or data strategy (e.g. identifying the audience, setting up the scaffolding, etc.). | 2019 |
Chris Brown | Executing a Data Strategy with OKRs | Outlines how OKRs (Objectives and Key Results) can help with executing on data strategy and provides some examples. | 2022 |
Yali Sassoon | Organizations need to deliberately create data | Organizations spend an incredible amount of time and resources extracting data from various sources, but rarely consider creating their own data to generate inputs for their ML systems. | 2022 |
Leo Polovets | The Value of Data, Part 1: Using Data as a Competitive Advantage | Software and hardware infrastructure are becoming commoditized, so the data you generate gives you the advantage; data helps you make good content recommendations, helps with ad targeting, gives you actionable insights, makes operations more efficient, and more. | 2015 |
Leo Polovets | The Value of Data, Part 2: Building Valuable Datasets | Describes the attributes of high-value datasets, common approaches for capturing this data, and common pitfalls people fall into during this process (e.g. not considering diminishing returns or how clean the data is). | 2015 |
Leo Polovets | The Value of Data, Part 3: Data Business Models | The final post in this series describes the concept of a "Data Business Model" and the realities of how data can be monetized, with examples of companies in each scenario. | 2015 |
Emilie Schario and Taylor A Murphy | Run Your Data Team Like A Product Team | Service-oriented data teams aren’t effective; the authors suggest that running the data team like a product team is ideal, taking a more active role in defining your org's success metrics and pushing the business forward. | 2021 |
Jeremy Salfen | Building a Data Practice from Scratch | Provides a series of suggestions for first data hires at an early stage startup, including the following principles: "don’t worry about making things fancy", "keep an eye on how things will scale, but rein in your impulses to optimize them", and "documentation, transparency, and reproducibility are interrelated and fundamental". | 2021 |
Brittany Bennett | Roadmapping as a Tool for Data Leaders | The author describes how to create a roadmap with your data team and how to use it to push for more team resources (including ideation sessions with sticky notes, voting, generating a timeline, and ultimately packaging it for leadership to secure the resources). | 2023 |
Raymond See | Tools and Techniques to Establish Your Data Team Early | Provides some tips for early-stage startups hoping to develop a data function (e.g. hire a few generalists, bring in the right tools, etc.). | 2023 |
Prukalpa Sankar | A Behind-the-Scenes Look at How Postman’s Data Team Works: How Postman’s data team set up better onboarding, infrastructure, and processes while growing 4–5x in one year | Describes Postman's data team structure (with central, embedded, and distributed members) and how they handle prioritization, sprints, and the like. | 2021 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Sophia (Saeyoon) Baik | Building a Diverse Engineering Team in 2022: The Beginner’s Guide | Provides a great summary, with many links, of the state of DEI in tech engineering, along with why diversity helps boost productivity and a number of suggestions on how to reduce hiring biases. | 2022 |
Sergio Morales | Future-proof your Analytics Efforts in 2020: Hire Diverse Teams | Describes how data team diversity deters bias and encourages curiosity, skepticism, and analytical thinking, attributes any analytics enterprise will highly value. | 2020 |
Swathi Young | How To Make Sure That Diversity In AI Works | Post provides guidance on how management teams can build diverse AI teams, including suggestions like restructuring talent acquisition, thinking through pay parity, and more. | 2021 |
Gergely Orosz | Hiring (and Retaining) a Diverse Engineering Team | Stories from six engineering leaders who succeeded in building and growing diverse teams | 2021 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Erik Bernhardsson | “Why software projects take longer than you think: a statistical model” | Adding up time estimates for many subtasks isn't advised; instead, figure out which tasks have the highest uncertainty, because those tasks will essentially dominate the time to completion. | 2019 |
Erik Bernhardsson | “σ-driven project management: when is the optimal time to give up?” | The post describes an abstract measure, σ (sigma), that captures the risk of a project and, based on that risk, a statistical model that shows when one ought to give up on a project. | 2022 |
Michael Kaminsky | Agile Analytics, Part 1: The Good Stuff | When it comes to data science and analytics, these aspects of the Scrum workflow work well: acceptance criteria, pointing, two-week chunks (sprints), and explicit prioritization. | 2018 |
Michael Kaminsky | Agile Analytics, Part 2: The Bad Stuff | Some aspects of agile don't work so well for data teams, including "the fortuitous finding", exploratory data analysis needs, product ownership / story-writing, and business-as-usual support. | 2018 |
Michael Kaminsky | Agile Analytics, Part 3: The Adjustments | Suggests adjustments that help agile work well on a data team: time-bound spikes for research, built-in slack time for exploration, acceptance criteria that include “write the next story”, and peer review instead of sprint review. | 2018 |
Michelangelo D'Agostino, Katie Malone | The Care and Feeding of Data Scientists, Chapter 5 | "To Agile or Not to Agile". | 2019 |
Oscar Baruffa | Dealing with difficult stakeholders | Presents some approaches for handling difficult stakeholders you need buy-in from, such as taking the path of least resistance, working towards getting stakeholders to think it's their idea, having lots of private conversations beforehand, and more. | 2022 |
Lucas F Costa | Useful engineering metrics and why velocity is not one of them | Covers four useful metrics that are easy to pull from JIRA, hard to game, and helpful for debugging process problems: arrival rate, work in progress, throughput, and cycle time. | 2022 |
Leandro Carvalho | Data Product Canvas — A practical framework for building high-performance data products | Outlines the "Data Product Canvas" framework for building new data products, which is divided into 10 blocks (problem, solution, data, hypotheses, actors, actions, KPIs, values, risks, and performance/impact) grouped into 3 domain areas: the product vision, the strategy vision, and the business vision. | 2022 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Gunnar Morling | The Code Review Pyramid | There should be a hierarchy of effort in reviewing code, where more effort is spent on core concepts, performance, and documentation, and less on test quality (though tests are of course important) and syntax. | 2022 |
Tim Hopper | Code Review Guidelines for Data Science Teams | In the context of a data team, describes what a code review should achieve, lists guidelines for carrying out pull requests, and links to additional reading. | 2020 |
Eric Ma | Practicing Code Review | In the context of data science, the essay briefly describes the purpose of code review, what it should not be, and its value in data work. | 2021 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Rob Dearborn | Organizing and scaling an effective data team | General guidelines on what a properly structured data team should look like, with descriptions ranging from a one-person data team to a team of 32+. | 2022 |
Brittany Bennett | Building Powerful Data Teams: On Investing in Junior Talent | Provides suggestions for developing junior talent: blocking off time for personal development, celebrating this blocked-off time, hiring tutors, and more. | 2021 |
Eric Colson | "Beware the data science pin factory: The power of the full-stack data science generalist and the perils of division of labor through function" | Beware specialization in data science, as it comes with costs (the goal of data science is not to execute; rather, the goal is to learn and develop profound new business capabilities). | 2019 |
Chuong Do | "What is the most effective way to structure a data science team?" | Covers how data scientist roles should be defined (analysis vs building), where data scientists should report (centralized vs decentralized), where the data science function should live (engineering org vs product org vs independent consultancy), and what an organization should do to set data science up for success. | 2017 |
Mikkel Dengsøe | "Data team structure: embedded or centralised?" | There are three common models for structuring data teams, each with its own advantages and drawbacks: centralized, embedded, and hybrid. | 2022 |
Randy Bean | Chief Data Officers Struggle To Make A Business Impact | There is widespread disagreement on what defines a successful Chief Data Officer, so it makes sense that, according to a recent Gartner report, only a fraction of CDOs are poised for success. | 2019 |
Matthew Mayo | Data Scientist, Data Engineer & Other Data Careers, Explained | Explanations of various titles such as Data Architect, Data Engineer, Analyst, ML Engineer, and Data Scientist | 2022 |
Gergely Orosz | What Silicon Valley "Gets" about Software Engineers that Traditional Companies Do Not | Silicon Valley companies treat engineers as smart, autonomous adults, because those are the people they hire and the people who can do the work they need done, while traditional companies tend to keep developers in pure execution roles. | 2021 |
Rifat Majumder | The Data Product Manager | Describes the emerging role of "Data Product Manager" and the benefits it provides an org: better business impact, a deep understanding of customer problems, and more clarity on priorities. | 2021 |
Benn Stancil | The technical pay gap: The culture we build is the culture we buy | Describes the current state of confusion around data titles (using the "analytics engineer" as an example), and describes how the tech industry overvalues technical skills at times. | 2022 |
Ben Darfler | Engineering Levels at Honeycomb: Avoiding the Scope Trap | Describes a nice framework for thinking about job levels, based on scope and level of project complexity. | 2022 |
Mikkel Dengsøe | Data teams are getting larger, faster | There are many problems you can encounter when your data team grows beyond a handful of people; the article provides some tips on working through these problems. | 2022 |
Jorge Fioranelli | A framework for Engineering Managers | Although not directly about data this is relevant: a framework for engineering managers to think through titles and expectations (including domains of technology, systems, people, process, and influence). | 2022 |
Pardis Noorzad | Models for integrating data science teams within companies: A comparative analysis | Compares different models for situating DS teams including the "center-of-excellence model", the "Accounting model", the "consultant model", the "embedded model", and more, and considers factors like "Coordination efficiency", "Employee happiness", and others. | 2019 |
Kurt Cagle | Why You Don’t Need Data Scientists | Early in an organization's data maturity stage, you don't need "data scientists" and machine learning people, you instead need to focus on data quality and ontological engineering problems. | 2018 |
Michelangelo D'Agostino, Katie Malone | The Care and Feeding of Data Scientists, Chapter 6 | "Chutes and Career Ladders" discusses how to write a great career ladder for your team. | 2019 |
Benjamin Rogojan | Different Types Of "Data Engineering" Teams | Gives a nice overview of the various flavors of data engineering roles in organizations (including software engineers, data platform engineers, etc.). | 2022 |
Morgan Krey | Storytellers and System Builders: A New Way to Think About Data Roles | There has been a proliferation of "data X" roles (e.g. data engineer, data scientist, data analyst, etc) but the author argues that there are really just two kinds of data practitioners: system builders (your engineers that build pipelines, schedule jobs, stand up APIs, etc.) and storytellers (looking for actionable insights, visualizing data on dashboards, etc). | 2022 |
Mikkel Dengsøe | Data team as % of workforce: A deep dive into 100 tech scaleups | The author analyzed 100 well-known tech scaleups and found that data team members make up 1-5% of company headcount, with variation by industry (details included). | 2023 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Monica Rogati | The AI Hierarchy of Needs | Before you can fully get value out of ML/AI in an organization, it is critical to have foundational data needs met (i.e. good data collection processes, checks, and analytics). | 2017 |
Mario Perrakis | “The “0 / 1 / Done” Strategy for Data Science” | A description of what a DS org should aspire to: 0-day handovers facilitated by great documentation and code, 1-day prototypes enabled by good tooling and good knowledge, and a clear definition of “done”. | 2022 |
Thomas Redman | “Your Data Initiatives Can’t Just Be for Data Scientists” | Describes the role and importance of non-data experts in DS projects: as collaborators, as customers, and as creators of the data. | 2022 |
Natassha Selvaraj | “Why Are So Many Data Scientists Quitting Their Jobs?” | Two primary factors drive a number of new data scientists out of the profession: a mismatch between employer and employee expectations around data science work, and the general difficulty of getting ML to add clear business value. | 2022 |
Pete Warden | How Should you Protect your Machine Learning Models and IP? | Some thoughts on the importance of protecting IP in a ML org. | 2022 |
Jeff Saltz | Managing Machine Learning Projects | Touches on difficulties of managing ML projects and how the management process differs from standard software development. | 2021 |
Alfred Spector, Peter Norvig, Chris Wiggins, and Jeannette M. Wing | Data Science in Context: Foundations, Challenges, Opportunities | A pre-release of a book that gives a thorough accounting of the history of Data Science, a high-level understanding of its applications, and the ethical and social concerns associated with it. | 2022 |
Brooke Carter, Melissa Barr, and Michael Mui | ML Education at Uber: Frameworks Inspired by Engineering Principles | Provides an overview of the philosophy behind Uber's ML education program. | 2022 |
Eyal Trabelsi | How to build TRUST in Machine Learning, the sane way | Provides suggestions on how teams can improve trust in ML in their org, including defining metrics up front, following some best practices when developing the model, A/B testing the model upon deployment, and more. | 2022 |
Andrew Lukyanenko | Lessons learned after 10 years in IT: What I have learned from my mistakes and successes | A senior data scientist gives general DS career advice (some of which is worth noting as a leader), including topics around interviewing, productivity, communication, time estimation, and more. | 2022 |
Shreya Shankar et al. | Operationalizing Machine Learning: An Interview Study | From the abstract: the authors conducted interviews with 18 MLEs working across many applications, touching on how Velocity, Validation, and Versioning govern project success (in terms of deployment and long-term maintenance), and they also discuss interviewees’ pain points and anti-patterns. | 2022 |
Eugene Yan | Mechanisms for Effective Machine Learning Projects | The author describes a few process-based techniques for increasing ML project success (e.g. establishing project pilots and copilots, literature reviews, methods reviews, etc). | 2023 |
Arthur Turrell | Data science maturity and the cloud | Describes the conditions and infrastructure needed for data scientists to thrive in an organization, and puts them in the context of data maturity. | 2023 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Lenny Rachitsky | Choosing Your North Star Metric | Proposes metrics based on your type of business, recommends having a single north star metric, and advises against using revenue as that metric. | 2021 |
Ron Berman | “The Value of Descriptive Analytics: Evidence from Online Retailers” | The authors estimate an increase of 4%–10% in average weekly revenues post-adoption associated with the adoption of descriptive analytics among online retailers. | 2020 |
Roger M. Stein | "Why Managing Data Scientists Is Different" | Two challenges in managing data scientists: (1) managing a data research effort tends to be a dynamic and self-correcting process in which it is difficult to plan either a project’s timing or final outcomes, and (2) analytics is highly sensitive to time, cost, and quality tradeoffs. | 2015 |
Eric Colson | "The Sobering Truth about the Impact of your Business Ideas" | The vast majority of business ideas fail to generate a positive impact, and this underscores the value of measuring impact, collecting data, and testing. | 2021 |
Joe McFarren | 5 Tips for Managing a Successful Analytics Project | In the context of analytics consulting it is important to: clearly establish project scope, be in constant communication, determine a line of escalation, monitor work with tracking apps, and track finances. | 2022 |
Erik Balodis | A Framework for Embedding Decision Intelligence into your Organization | Provides a high-level overview of how to infuse decision-intelligence into an organization, along with some additional reading sources. | 2022 |
Nelson Auner | Building an Analytics Stack in 2020 | Gives an overview of the modern analytics stack via three buckets: a data-moving tool (ETL), a data warehouse to store the data, and a BI layer to analyze the data. | 2020 |
Mode | The Data Team’s Guide for Marketing Metrics | A good overview of the landscape of marketing metrics that data teams work with (as well as information on the technical side). | 2022 |
SeattleDataGuy | Why Are We Still Struggling To Answer How Many Active Customers We Have? | Surprisingly, metrics are still hard to calculate, at least partly because of developer turnover, ERP and CRM migrations, data producers constantly changing what data they provide, mergers and acquisitions, and more. | 2022 |
Randy Au | We take our units of analysis for granted | Understanding the "unit of analysis" is critical to answering a research question, yet in industry it's something we often handle poorly. | 2022 |
Marie Lefevre | Not All Data Requests Are Urgent, So Start by Asking These 5 Questions | Details five questions the author typically asks of those who request analyses: Why? Why again? Who is it for? When is it due? Is it more of a priority than that other request? | 2022 |
Amplitude | The North Star Playbook: The guide to discovering your product’s North Star | A short book intended for product managers and product designers that describes the value of North Star metrics and how to identify them. | 2018 |
Gergely Orosz | Checklist used at Uber to determine if something is urgent | 1. What is the impact? 2. Do you have a signed spec answering the why and the what? 3. Do you have your estimate of the cost? 4. Make the cost of dropping what you're doing very clear. | 2022 |
Dan Frank | Experimentation Platform in a Day | A short, technical (but very accessible) guide to setting up a simple experimentation "platform" with elements of logging, measurement, assignment, and analysis. | 2022 |
Ron Kohavi, Diane Tang, and Ya Xu | Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing | A fantastic introductory book on A/B testing for program and feature evaluation; covers methods, interpretation, biases that can arise, and culture around experimentation. | 2020 |
W.D. (ryxcommar@gmail.com) | Caveats and Limitations of A/B Testing at Growth Tech Companies | Highlights an issue of A/B tests where over time effect sizes tend to shrink, and growth companies can find themselves in a situation where the statistical power benefits of a growing user base are outweighed by this diminishing returns effect. | 2022 |
Tristan Handy | The Startup Founder’s Guide to Analytics | Although written in 2017, this article still gives a relevant high-level overview of building an analytics competency at your org at different stages of company size. | 2017 |
Rembrand Koning and Aaron Chatterji | Experimentation and Startup Performance: Evidence from A/B Testing | This academic paper provides the first evidence of how digital experimentation affects the performance of a large sample of high-technology startups using data that tracks their growth, technology use, and product launches (they find increased performance on several critical dimensions, including page views and new product features). | 2022 |
Sarah Krasnik | The Analytics Requirements Document: Launch and pray doesn't work when it comes to data | Makes the case that an ARD should be generated by the analytics team in parallel to a Product Requirements Document early in the product evolution lifecycle, outlining metric tracking expectations, data design, data criteria, and more. | |
Erin Gustafson | Meaningful metrics: How data sharpened the focus of product teams | Outlines a thorough growth model that is broadly applicable to most B2C organizations where users subscribe to a service, and discusses how various "levers" of this model were tested. | 2023 |
Kasia Rachuta | Why You Need an Experimentation Template | The author shares a generic version of their company's A/B testing doc and argues that it helps structure tests and ensures stakeholders have thought about the right business questions before asking for something to be launched. | 2023 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
David Loftesness | The Engineer to Manager Transition, by Former Twitter Director of Engineering | Talks about an engineering management "event loop", where you touch base on people, projects, process, and self on a daily, weekly, and monthly basis. | 2015 |
The Institute of Leadership & Management | "Spotlight on Leadership Styles" | Describes a set of leadership/management styles including pace-setting, democratic, laissez-faire, and more. | 2018 |
Andy Johns | How to know when to stop: A guide to avoiding burnout and establishing balance in your life—by guest author Andy Johns | A framework for thinking through burnout: 1) define your personal range of tolerance, 2) pick your career progression, 3) pick your life progression. | 2022 |
Alan Johnson | 11 Principles of Engineering Management | A brief, digestible list of management principles for new engineering managers. | 2022 |
GitLab | Preventing burnout: A manager's toolkit | Provides 12 strategies managers can use to support their team and prevent burnout. | 2022 |
Tanya Reilly | Being glue | Describes the importance of "glue work" (e.g., noticing when other people on the team are blocked and helping them out, reviewing design documents and noticing what's inconsistent, onboarding new people and making them productive faster, or improving processes to make customers happy). | 2019 |
Lindy Greer, Francesca Gino, and Robert I. Sutton | You Need Two Leadership Gears: Know when to take charge and when to get out of the way | Describes research showing that leaders who know when, where, and how to shift gears between a top-down, take-charge persona (“exercise authority” mode) and a flatter mode (in which the leader levels the hierarchy and shares power) tend to be more successful. | 2023 |
Sarah Drasner | Engineering Management for the Rest of Us | Fantastic general engineering management book covering topics such as career laddering, giving and receiving feedback, setting team culture, and more. | 2022 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Elijah Ben Izzy and Stefan Krawczyk | Deployment for Free -- A Machine Learning Platform for Stitch Fix's Data Scientists | The authors describe, at a high-level, the initial design considerations for Stitch Fix's ML platform, present the API data scientists use to interact with it, and detail its capabilities. | 2022 |
Barr Moses and Lior Gavish | What is a Data Platform? And How to Build One | While every organization’s data platform approach will vary based on the industry and the size of their company, this quick and dirty guide lays out a blueprint for a modern data platform. | 2022 |
Jordan Volz | The Modern Data Stack Ecosystem: Spring 2022 Edition | Maps out the various pieces of the modern data stack, including event tracking, a data warehouse, data governance, and more. | 2022 |
Krzysztof Szafranek | Zalando's Machine Learning Platform: Architecture and tooling behind machine learning at Zalando | Provides an overview of Zalando's ML platform (AWS-powered) from the perspective of a machine learning practitioner. | 2022 |
Jean-Georges Perrin | The next generation of Data Platforms is the Data Mesh | The post summarizes Zhamak Dehghani's proposal for transitioning from current breadth-first data platforms (end-to-end data lifecycle) into vertical/depth-first architectures (one business domain at a time). | 2022 |
Gabrielle Davelaar and Jordan Edwards | DevOps for AI - Microsoft | A great talk that outlines how DevOps principles can be applied to AI, then shows in detail how CI/CD, version control, model storage, and more fit into a good MLOps process. | 2018 |
Kevin Hu | The Four Pillars of Data Observability | Defines data observability and explains how, in the context of a data platform, it includes four facets: metrics, lineage, metadata, and logs. | 2022 |
Stefan Krawczyk | What I Learned Building Platforms at Stitch Fix: Five lessons learned while building platforms for Data Scientists. | The author describes five lessons learned in building a data science platform, including not building for all possible users and abstracting away underlying APIs to simplify things for end users. | 2022 |
Lak Lakshmanan | No, you don’t need MLOps: Keep It Simple: the complexity of full MLOps is rarely needed | In counterpoint to all the buzz, the author warns that MLOps is no panacea, and can often automate away important detail or cause a large amount of technical debt that ultimately doesn't save time. | 2022 |
Nishith Agarwal | The Build vs. Buy Guide for the Modern Data Stack | The author claims that the decision to build vs buy comes down to five main considerations: cost, complexity, expertise, time to value, and competitive advantage. | 2022 |
Dominik Kreuzberger, Niklas Kühl, and Sebastian Hirschl | Machine Learning Operations: Overview, Definition, and Architecture | The authors conducted a literature review and interviews with experts to create an aggregated overview of the necessary principles, components, and roles, as well as the associated architecture and workflows surrounding "MLOps" | 2022 |
Indika Kumara et al. | Requirements and Reference Architecture for MLOps: Insights from Industry | The authors conducted a qualitative analysis of the MLOps field from literature, and bucket their findings into categories like "Infrastructure", "Model Deployment and Serving", "Monitoring and Feedback Loops", and more. | 2022 |
Charlie Summers | Demystifying event streams: Transforming events into tables with dbt | Provides an overview of how to convert events from an event-driven microservice architecture into relational tables in a warehouse like Snowflake, the advantages of this architecture, and how you might want to structure your event messages. | 2022 |
Dmitry Kruglov | The Architecture of a Modern Startup: Hype wave, pragmatic evidence vs the need to move fast | Probably more relevant for CTO roles, but with interesting nuggets for Heads of Data, this post gives an overview of the various infrastructure and tools used in the modern startup (languages, infrastructure as code, secrets management, databases, etc.). | 2022 |
Sam Lafontaine | How to Build a Modern Data Stack – The Comprehensive Guide | A light overview of the several components that constitute the modern data stack: a data source, data ingestion tools, data storage, data transformations and modeling, data analytics, and data activation (what used to be called "reverse ETL"). | 2021 |
Jordan Tigani | Big data is dead | Provocative piece arguing that, despite the hype of the last 10 years around the coming "big data" wave and the need for big data tooling and infrastructure, only a tiny fraction of organizations need to concern themselves with it. | 2023 |
Benjamin Rogojan | Why You Should Upgrade Your Data Infrastructure | Gives a high-level summary of the several phases of data infrastructure that organizations mature through (from a tiny startup looking at manually generated spreadsheets to mature organizations with complex ETL DAGs). | 2023 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Sanjana Sen and Stephen Bailey | Locally Optimistic Meetup - Governance and Compliance | A conversation among many data practitioners about how their organizations handle data access control, data tagging, anonymization, and other key compliance activities, and what frameworks they have found helpful. | 2020 |
Bryan Petzold, Matthias Roggendorf, Kayvaun Rowshankish, and Christoph Sporleder | Designing data governance that delivers value | Briefly surveys the problem of poor data governance, describes an ideal data governance model, and provides six ways to drive data-governance excellence. | 2020 |
Ilan Man | People-first Data stacks | Proposes switching from tech- to user-centric data management by i) integrating data into company culture (raising awareness, tracking adoption); ii) making data governance options actionable for stakeholders outside of the data platform; and iii) introducing ownership of tests on data quality. | 2022 |
Yali Sassoon | Why Data Contracts are Obviously a Good Idea. And why there is so much resistance to this idea from the community around the Modern Data Stack | Briefly describes the importance of data contracts, gives an example of a complaint against them, and then argues that such complaints arise because practitioners are stuck in the “data is oil” paradigm, i.e., they assume data is extracted rather than deliberately created. | 2022 |
Crystal Lewis | Using a data dictionary as your roadmap to quality data | Aimed somewhat more at an academic or research audience, this post provides a style guide and suggestions for making an effective data dictionary and nomenclature for data models. | 2022 |
Maggie Hays | Data Governance, but Make It a Team Sport | Outlines an iterative framework (with examples) for introducing data governance within an organization: identify the chief data problem(s) to solve, set clear goals to resolve them, start small before going big, drive incremental action, and then measure progress and iterate. | 2023 |