Welcome to the Cannabis Data Science Meetup Group, a team of data scientists from around the world who are advancing cannabis science, one 🧬 molecule at a time! Here you can find many useful notes, notebooks, and video tutorials to help you get, wrangle, and analyze cannabis data with the best of them. Come join the fun every Wednesday at 1:20pm PST / 2:20pm MT / 3:20pm CT / 4:20pm EST. You are always welcome to use the code, watch the videos, and make contributions of your own! Please dive in:
We would love to see you at an upcoming meetup! Please bring any research that you may like to share, your thoughts, comments, questions, concerns, or anything at all. All are welcome. See you soon!
Event | Day | Time |
---|---|---|
Cannabis Data Science | Wednesdays | 1:20pm PST / 2:20pm MT / 3:20pm CT / 4:20pm EST |
Please peruse the Cannabis Data Science archive and see if you can find anything of value!
Topic | Description | Video | Code |
---|---|---|---|
Get Data | Join the fun, zany bunch on our first Cannabis Data Science meetup as we begin to wrangle the firehose of data that the Washington State traceability system offers to the public. | Video | Code |
Look at the Data | This week we begin to look at the firehose of data that the Washington State traceability system offers to the public. | Video | Code |
Data Wrangling | This week we begin to wrangle the firehose of data that the Washington State traceability system offers to the public. | Video | Code |
API Exploration | This week we build a simple API to access the firehose of data that the Washington State traceability system offers to the public. | Video | Code |
Competitive Wages | This week we estimate competitive wage rates for workers in the Colorado cannabis market with data that the Colorado Marijuana Enforcement Division publishes. | Video | Code |
Competitive Interest Rates | This week we estimate competitive interest rates in the Colorado cannabis market using data published by the Colorado Marijuana Enforcement Division. | Video | Code |
Market Concentration | This week we begin to estimate market concentration using data that the Washington State traceability system offers to the public. | Video | Code |
Introduction to Forecasting | This week we go over the 10 commandments of forecasting and begin to forecast. | Video | Code |
Traceability and Communication | This week we begin to talk traceability and discuss how communication is critical when integrating software systems. | Video | Code |
Inflation Part One | This week we estimate inflation in the Oregon cannabis market and begin to make forecasts for inflation in 2021. | Video | Code |
Inflation Part Two | This week we review our model for inflation in Oregon and go over our forecasts for inflation in 2021. | Video | Code |
Lab Results and Traceability | This week we go over the lab testing process and the steps labs take to stay in compliance with traceability systems. | Video | Code |
Waste Analytics | This week we discuss and analyze the large amount of biomass waste that is generated by cannabis cultivation. | Video | Code |
Track and Trace | This week we begin to wrangle the firehose of data that the Washington State traceability system offers to the public. | Video | Code |
Market Basket Analysis | This week we begin to discuss the breakdown of consumer purchases using data that the Washington State traceability system offers to the public. | Video | Code |
Binary Data | This week we begin to discuss how to use binary models to analyze cannabis data. | Video | Code |
Crunching Numbers in Oklahoma | This week we begin to analyze the data that the Oklahoma Medical Marijuana Authority (OMMA) makes available to the public. | Video | Code |
Better Data, More Forecasts | This week we prepare more forecasts for 2021 using public cannabis data. | Video | Code |
Testing and Analysis | This week we discuss laboratory testing requirements in the cannabis industry, how lab tests are performed, and how labs operate. | Video | Code |
Laboratory Software | This week we talk about software that labs use in their operations. | Video | Code |
Transportation Costs | This week we discuss transportation costs and sales in Michigan using data that the CRA makes available to the public. | Video | Code |
Hemp Analysis Part One | This week we begin to collect and analyze data from the Midwestern Hemp Database published by the University of Illinois. | Video | Code |
Hemp Analysis Part Two | This week we build a simple model to try to predict when hemp may test above the permitted concentration of THC using the Midwestern Hemp Database. | Video | Code |
Cannabinoid Analysis Part One | This week we begin to analyze cannabinoid data that the Washington State traceability system offers to the public. | Video | Code |
Cannabinoid Analysis Part Two | This week we continue our analysis of the cannabinoid data that the Washington State traceability system offers to the public. | Video | Code |
Residual Solvents | This week we discuss residual solvent detections and thresholds using data that the Washington State traceability system offers to the public. | Video | Code |
Cannabis Sales Part One | We begin cannabis sales analysis with groundbreaking research by Paul Kitko, who identifies cannabis dispensary purchase patterns using economics and data science. | Video | Code |
Cannabis Sales Part Two | We continue to analyze cannabis sales, expanding our analysis to all states with permitted recreational and/or medicinal cannabis. | Video | Code |
Looking at Cannabis Types from Las Vegas | After a 'snafu' on Wednesday, we manage to analyze 4 types of cannabis in Washington State, from 'Paris' Las Vegas. | Video | Code |
Mapping Licensees per Capita Part One | This week we begin looking at a measure of market competitiveness, licensees per capita, along geographic lines in Oklahoma. | Video | Code |
Barriers to Entry and Market Competitiveness | Discussions of market competitiveness and scale led us to a fruitful discussion of barriers to entry, including high capital costs, both financial and human. | Video | Code |
Terpene Analysis Part One | Terpene data galore! We discover a treasure trove of public cannabis terpene data published graciously by Connecticut Open Data and calculate the prevalence of various terpenes. | Video | Code |
Terpene Analysis Part Two | An extraordinary day of cannabinoid and terpene data crunching followed by data exploration in Massachusetts. | Video | Code |
Measuring Cannabis GDP | Today we break new ground by estimating GDP from permitted adult-use cannabis in Massachusetts. | Video | Code |
Equilibrium Analysis | Today we conduct a partial equilibrium analysis of the cannabis industry in Massachusetts, estimating prices, wages, and rates of return. | Video | Code |
Model Estimation and Bias | We attempt to fit economic models using Massachusetts cannabis data and explore model pitfalls and bias. | Video | Code |
A Brief History of Cannabis QA | Albeit impromptu, we manage to discuss the history of quality assurance in the cannabis industry and how it was curiously spurred by the hop latent viroid. | Video | Code |
Forecasting Sales and Inflation | Today we apply the 10 commandments of forecasting and utilize a nifty vector autoregressive (VAR) model to forecast cannabis sales in Massachusetts. | Video | Code |
Predicting Market Performance Part One | Today we utilize a number of techniques that we have covered to perform a powerful market analysis of Massachusetts' cannabis market and begin to predict market performance in Massachusetts in the coming year, 2022. | Video | Code |
Predicting Market Performance Part Two | Today we talk about the history of the structure-conduct-performance paradigm in the industrial organization field of economics and how economic models can be used to analyze regulatory policy, the potential for collusion, and market competition and concentration. | Video | Code |
Predicting Market Performance Part Three | We finally complete our market analysis of Massachusetts. We successfully quantify the market, predict its future performance, and discuss the market implications, both past and future. | Video | Code |
Predicting Market Performance Q & A | We discuss the market implications of our analysis of Massachusetts and brainstorm ideas for comparative analysis with additional states. | Video | Code |
Comparative Analysis | Happy Thanksgiving! Today we begin to compare the structure and performance of cannabis dispensaries in various states with adult-use cannabis. We uncover an interesting pattern that warrants further investigation. | Video | Code |
Economic Surplus | The Cannabis Data Science meetup group comes back strong with an impactful discussion of economic surplus in the cannabis market. | Video | Code |
Measuring Market Structure | We see our analysis of cannabis markets through by concretely measuring market structure. We can now confidently classify the competitiveness of cannabis markets! | Video | Code |
Forecasting Sales in 2022 | Join the best meetup to date as we forecast cannabis sales across the U.S. in 2022. It is hard to conceptualize the staggering amount of money spent on cannabis; however, we do just that and concretize the enormous potential social benefit. | Video | Code |
East Coast vs. West Coast Cannabis | We do a deep dive on cannabinoids measured in East Coast and West Coast cannabis and find a structural difference that may stem from differences in how the cannabis is tested. | Video | Code |
Forecasting Models | Now 300 strong, the Cannabis Data Science meetup group delivers the first open source, open data forecast of cannabis sales in 2022. | Video | Code |
Predicting Laboratory Profitability | You're not going to want to miss this meetup, especially if you're a lab owner. This week at the Cannabis Data Science meetup we calculate possibly the most important metric to your bottom line. Whether or not your lab is in business 5 years from now depends on this metric. | Video | Code |
Processing Cannabinoids and Managing Inconsistencies | The lesson of the week: variability matters. We go back in time to discuss the origins of cannabinoid processing, early cannabinoid research, and the development of cannabinoid extraction techniques. | Video | Code |
Data Augmentation and Visualization | It is imperative to have the right tools (and data) for the task at hand. The idea is to merge objects by common factors, retaining the data points that you need in your analysis. Once you have augmented data, then you have created value by facilitating analyses that could not otherwise be performed or visualizations that can only be created with the augmented data. | Video | Code |
Statistics with Big Data | Calculating statistics on large datasets is difficult, but simple statistics, if able to be calculated, can provide enormous value, provide deep insights, spark ideas for future research, and identify aspects that need further magnification. | Video | Code |
Logistics and Transportation Statistics with Big Data | 1,000,000+ more miles this year, easy! Keep on trucking, Washington State couriers! This week we look at the total number of transfers by licensee and by license type as we create various novel maps. Check out these stats and more next week with the Cannabis Data Science meetup group. | Video | Code |
Spatial Analysis | Today we begin to answer your long-standing questions about cannabis prices. We gather powerful spatial analysis techniques pioneered by great data scientists from throughout history. Stay tuned as we answer the question: do prices vary by geography (zip code) in Washington State? | Video | Code |
The Effects of Taxes | This week we extended our analysis to include taxes! Check out the latest and greatest research on the fundamentals of the cannabis industry. Have any good ideas? Extend the discussion in the comments or on Slack. | Video | Code |
Discussing the State of Cannabis Research | From sunny San Diego we talk about the latest and greatest cannabis research and the questions that the Cannabis Data Science team can answer this year with rich, publicly available data that is sitting there like a pile of gold nuggets on a table, free for the taking! | Video | Code |
Natural Language Processing to Extract Data from Human-Written Text | Parsing natural language can be complex, but it can yield valuable data. We use the spaCy Python package to parse human-entered labels to unlock never-before-crunched data that we then readily analyze with the best-known statistical models. | Video | Code |
Exploratory Data Analysis: Correlations, Deviations, and Regressions | Lesson of the day: measurements vary and it is of utmost importance to explore our data to understand how it varies. We learn from the original statisticians how to describe, explore, and then analyze our rich, albeit messy data. | Video | Code |
Study Habit Patterns of Successful Scientists | This is a short but sweet story about how we can learn from and be inspired by history, however bizarre it may be. We explore one of the first analyses of the study habits of successful scientists. | Video | Code |
Brand Analysis: Measuring Marketing | Can we measure marketing performance for a cannabis brand? This week we estimate market share, penetration, customer value, and a myriad of other marketing metrics for the top cannabis-infused beverage brands in Washington State to prove that yes we can! Grab your favorite beverage and enjoy. | Video | Code |
Game Theory to Model Entry and Exit in Cannabis Markets | This week we dip our toes into game theory. We model cannabis production as a game and use it to predict actual entry and exit into cannabis markets in Washington State. It's all fun and games, until someone makes a profit! | Video | Code |
Consumer Choice | How will inflation affect the proportion of people who use cannabis and the quantity of cannabis consumed by people who consume cannabis? These are questions that the Cannabis Data Science group is uniquely poised to answer. Join us in modeling both participation and consumption to make unbiased, consistent predictions about cannabis consumption. | Video | Code |
Data Curation: Helping Consumers Access Pesticide Data | Because Washington State makes cannabis traceability data available to the public, data scientists can calculate statistics to help consumers. As a proof of concept for other states, such as Oregon, we begin to curate public Washington State pesticide data to make the data easily accessible to consumers. | Video | Code |
Artificial Intelligence: Overcoming Asymmetric Information | What is AI? On the holiest of cannabis days, the Cannabis Data Science team cooks up an artificial intelligence to be a curator and custodian of cannabis data to reduce asymmetric information and move one tiny step forwards to a better world for everyone with cannabis. | Video | Code |
Cannabis Consumption: Estimating Consumer Demand | Someone call a plumber! The data dam just burst! The Cannabis Data Science team is all hands on deck serving you up the holy grail of cannabis data: cannabis use rates. Get them while they're hot!!! This is the holy grail folks. Today we curate variables related to cannabis use in the USA by state from a 2019-2020 Census survey. | Video | Code |
Fertilizers: Costs, Benefits, and Plant Hardiness | What are Nitrogen (N), Phosphorus (P), and Potassium (K) and why should a cannabis cultivator care? This is the exact topic of the day. We follow the money and connect the dots from Saskatchewan to Humboldt County. | Video | Code |
Fertilizer Prices and the US Hemp Canopy | Every piece of the puzzle must be filled in, so we diligently collect all public fertilizer price and hemp yield, harvest, and acreage data under the sun. Please enjoy and explore the golden data lying before us. | Video | Code |
Predicting Effects + Aromas Part 1: Preparing & Training Prediction Models | Now's better than never, we release the SkunkFx! This is where things get interesting. The Cannabis Data Science Team makes it abundantly clear with the one-and-only cannabis effects prediction model that open-source statistics yields better prediction models than you can build with hundreds of millions of dollars of funding in legacy systems. Please enjoy and please put the statistics to good use. | Video | Code |
Plant Patents: Classifying Cultivars with Terpene Lab Results | There's been noise that lab results and strains don't mean anything. We push back hard, as lab results are the key mechanism that top cultivators and lawyers are using to file plant patents. Don't let people sell you hype and miss out on a golden opportunity of a lifetime! | Video | Code |
Predicting Effects + Aromas Part 2: Distinguishing Type and Strain Effects | Walk, then run. We clearly outline our theory, the statistical models that we will use, and the intricacies of the data as we prepare to predict effects and aromas of cannabis strains given their lab results, distinguishing different type and strain effects that you may experience in various varieties. | Video | Code |
Enter the Skunk: Using Statistics to Make Predictions | Tell your developer(s) about our free effects and aromas API! If you're paying for cannabinoid and/or terpene tests, then you may as well have the effects and aromas of your products predicted 🔮 for free! Simply input cannabinoid and/or terpene data and you will receive a prediction of probable effects and aromas. The cherry on top is that you can report back the actual effects and aromas that characterize your product and the model becomes that much smarter! So, you can make your predictions better over time if you opt-in to providing feedback. Please explore at your pleasure and, hopefully, you are able to find many clever uses for the statistics. Bon appétit! | Video | Code |
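To give a flavor of the calculations covered in the archive, here is a minimal sketch of one market concentration measure, the Herfindahl-Hirschman Index (HHI), which is the sum of squared market shares in percentage points. This is not the code from the Market Concentration or Measuring Market Structure sessions; the function name and the sales figures below are hypothetical and only illustrate the formula.

```python
def herfindahl_hirschman_index(sales_by_licensee):
    """Return the HHI (0 to 10,000) given a mapping of licensee -> sales."""
    total = sum(sales_by_licensee.values())
    if total <= 0:
        raise ValueError("Total sales must be positive.")
    # Each market share is expressed in percentage points and then squared.
    return sum((100 * sales / total) ** 2 for sales in sales_by_licensee.values())


# Hypothetical sales figures, for illustration only (not real data).
example_sales = {"Licensee A": 50.0, "Licensee B": 30.0, "Licensee C": 20.0}
hhi = herfindahl_hirschman_index(example_sales)
print(f"HHI: {hhi:.0f}")
```

A single licensee with 100% of sales gives the maximum of 10,000, while many small licensees push the index toward 0.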
First things first, you can clone the repository:
```shell
git clone https://github.com/cannlytics/cannabis-data-science.git
```
The majority of examples are written in Python. If you install Anaconda, then you can create a virtual environment with all of the packages that you will need:
```shell
cd ./cannabis-data-science
conda create --name cds python=3.9
conda activate cds
pip install -r requirements.txt
```
You should now be off to the races and able to go through most notebooks, following any notebook-specific instructions to download supplementary datasets.
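If a notebook fails on import, it can help to check which packages from `requirements.txt` are actually installed in the active environment. The script below is a minimal sketch and not part of the repository: the helper names are hypothetical, and the parser only handles plain requirement lines such as `pandas>=1.3`, skipping comments and option lines such as `-r other.txt`.

```python
import re
from importlib import metadata
from pathlib import Path


def parse_requirement_names(lines):
    """Extract package names from simple requirement lines."""
    names = []
    for line in lines:
        # Drop inline comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r" or "--index-url".
        if not line or line.startswith("-"):
            continue
        # The name is the leading run of letters, digits, ".", "_", and "-".
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def find_missing(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    requirements = Path("requirements.txt")
    if requirements.is_file():
        names = parse_requirement_names(requirements.read_text().splitlines())
        missing = find_missing(names)
        print("Missing packages:", ", ".join(missing) if missing else "none")
    else:
        print("No requirements.txt found in the current directory.")
```

Run it from the root of the cloned repository with the `cds` environment activated. Any package it lists can then be installed with `pip install`.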
Contributions are always welcome! Please submit issues, questions, bugs, fixes, improved-upon code, or anything at all that you want to be addressed. Anyone is welcome to contribute anything. You can refer to the Cannlytics contributing guide for more information about contributing to the Cannlytics ecosystem in general. One of the easiest ways that you can help the group is by giving the repository a ⭐
💬 Join the Cannabis Data Science Slack channel to keep the conversation going!
The Cannabis Data Science meetup group and the accompanying source code are made available with ❤️ and your good will. Please consider making a contribution to help us continue crafting useful code and wrangling new datasets for you. Thank you 🙏
Provider | Link |
---|---|
👐 OpenCollective | https://opencollective.com/cannlytics-company/donate |
💸 PayPal Donation | https://cannlytics.page.link/donate |
💵 Venmo Donation | https://www.venmo.com/u/cannlytics |
🪙 Bitcoin donation address | 34CoUcAFprRnLnDTHt6FKMjZyvKvQHb6c6 |
⚡ Ethereum donation address | cannlytics.eth |
Copyright (c) 2021-2022 Cannlytics
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Please cite the following if you use the code examples in your research:
@misc{cannlytics2023,
title={Cannabis Data Science},
author={Skeate, Keegan and O'Sullivan-Sutherland, Candace},
url={https://github.com/cannlytics/cannabis-data-science},
year={2023}
}