…into main
jhudsl-robot committed Dec 18, 2023
2 parents 0a542aa + cd9e863 commit 4e5f34d
Showing 168 changed files with 13,321 additions and 10,921 deletions.
17 changes: 17 additions & 0 deletions docs/no_toc/01a-AI_Possibilities-intro.md
@@ -22,8 +22,25 @@ This course is targeted toward industry and non-profit leaders and decision make

In this course, we'll learn about what Artificial intelligence is, and what it isn't. We'll also learn the basics of how it works, learn about different types of AI, and set some ground rules for minimizing the harms and maximizing the benefits of AI.

This course will cover:

- Framework, or definition, of AI
- Essential AI examples and case studies
- The take-home of how AI works
- Key definitions of types of AI and related technologies
- What is possible with AI
- Ground rules for using AI for good

### Learning Objectives

We will learn how to:

- Determine what AI is and isn't using our three-part framework: the data, algorithm, and interface
- Identify common technologies and whether or not they are AI
- Explain the essential "behind the scenes" technology of how AI works
- Detail ground rules for using AI ethically
- Identify possibilities for using AI while understanding its limitations

<div class = disclaimer>
**Disclaimer:** The thoughts and ideas presented in this course are not to be substituted for legal or ethical advice and are only meant to give you a starting point for gathering information about AI policy and regulations to consider.
</div>
126 changes: 75 additions & 51 deletions docs/no_toc/01b-AI_Possibilities-what_is_ai.md

Large diffs are not rendered by default.

51 changes: 46 additions & 5 deletions docs/no_toc/01c-AI_Possibilities-how_ai_works.md
@@ -7,23 +7,24 @@ TODO: Slides here: https://docs.google.com/presentation/d/1OydUyEv1uEzn8ONPznxH1

# How AI Works

Let's briefly revisit our definition of AI: it must have data, algorithm(s), and an interface. Let's dive into each of these in more detail below.
Let's briefly revisit our definition of AI. It must have some data, an algorithm, and an interface. Let's break these down in more detail below.

## Early Warning for Skin Cancer

Each year in the United States, 6.1 million adults are treated for skin cancer (basal cell and squamous cell carcinomas), totaling nearly $10 billion in costs [@CDC2023]. It is one of the most common forms of cancer in the United States, and mortality from skin cancer is a real concern. Fortunately, early detection through regular screening can increase survival rates to over 95% [@Melarkode2023]. Cost and accessibility of screening providers, however, means that many people aren't getting the preventative care they need.

Increasingly, AI is being used to flag potential skin cancer. AI focused on skin cancer detection could be used by would-be patients to motivate them to seek a professional opinion, or by clinicians to validate their findings or help with continuous learning.

1. **Data**: Images of skin
1. **Data**: Images with and without skin cancer present

1. **Algorithm**: Detection of possible skin cancer

1. **Interface**: Web portal or app where you can submit a new picture

## Collecting Datapoints

Let's say a clinician, *Dr. Derma*, is learning how to screen for skin cancer. When Dr. D sees their first instance of skin cancer, they now have one data point. Dr. D could make future diagnoses based on this one data point, but it might not be very accurate. Over time, as Dr. D does more screenings of skin with and without cancer, they will get a better and better idea of what skin cancer looks like. This is part of what we do best. Human beings are powerhouses when it comes to pattern recognition and processing [@Mattson2014].
Let's say a clinician, *Dr. Derma*, is learning how to screen for skin cancer. Dr. D goes to their first day at the clinic and sees their first instance of skin cancer. Dr. D now has one data point.
Dr. D could make future diagnoses based on this single data point, but these diagnoses probably won't be very accurate. Over time, as Dr. D does more screenings of skin with and without cancer, they will get a better and better idea of what skin cancer looks like. This is part of what we do best. Human beings are powerhouses when it comes to pattern recognition and processing [@Mattson2014].

Like Dr. D, AI will get better at finding the right patterns with more data. In order to train an AI algorithm to detect possible skin cancer, we'll first want to gather as many pictures of normal and cancerous skin as we can. This is the **raw data** [@Leek2017].

@@ -43,7 +44,7 @@ Representative diversity of datasets is crucial for the effectiveness of AI. For
The tech industry's lack of diversity contributes to these issues, often leading to the discovery of failures only after harm has occurred.
</div>

Large Language Models (LLMs), which we will cover later, are great examples of high quantity and quality of data. Think about how much text information is freely available on the internet! Throughout the internet, we're much more likely to see the phrase "cancer is a disease" than "cancer is a computer program". Many LLMs are trained on sources like [Wikipedia](https://www.wikipedia.org/), which are typically grammatically sound and informative, leading to higher quality output.
Large Language Models (LLMs), which we will cover later, are great examples of using high quantity and quality of data. Think about how much text information is freely available on the internet! Throughout the internet, we're much more likely to see the phrase "cancer is a disease" than "cancer is a computer program". Many LLMs are trained on sources like [Wikipedia](https://www.wikipedia.org/), which are typically grammatically sound and informative, leading to higher quality output.


<img src="resources/images/01c-AI_Possibilities-how_ai_works_files/figure-html//1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I_g2a3877ab699_0_79.png" title="CAPTION HERE" alt="CAPTION HERE" width="100%" style="display: block; margin: auto;" />
@@ -59,7 +60,6 @@ It's important to remember that AI systems need specific instructions to start d

Once data is labeled, either "cancer" or "not cancer", we can use it to train the algorithm in the next step. This data is aptly called **training data**.


<img src="resources/images/01c-AI_Possibilities-how_ai_works_files/figure-html//1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I_g263e06ef889_36_318.png" title="CAPTION HERE" alt="CAPTION HERE" width="100%" style="display: block; margin: auto;" />
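
As a concrete illustration, here is a minimal sketch of what labeled training data might look like in Python. The file names are hypothetical, and the held-out "test" images foreshadow the testing step described later in this chapter.

```python
# A minimal sketch of labeled data, assuming images are referenced by
# (hypothetical) file paths. Some labeled images are held back so the
# algorithm can later be tested on data it has never seen.
from sklearn.model_selection import train_test_split

labeled_data = [
    ("images/spot_001.png", "cancer"),      # hypothetical file names
    ("images/spot_002.png", "not cancer"),
    ("images/spot_003.png", "not cancer"),
    ("images/spot_004.png", "cancer"),
]

paths = [path for path, label in labeled_data]
labels = [label for path, label in labeled_data]

# Hold out 25% of the labeled images as test data.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    paths, labels, test_size=0.25, random_state=0
)
```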

## Understanding the Algorithm
Expand All @@ -78,6 +78,8 @@ As an example, here is a very simple algorithm with one feature (spot perimeter)

1. If the perimeter of the spot is not circular, label the image "cancer".

<img src="resources/images/01c-AI_Possibilities-how_ai_works_files/figure-html//1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I_g263e06ef889_36_474.png" title="CAPTION HERE" alt="CAPTION HERE" width="100%" style="display: block; margin: auto;" />
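
To make this rule concrete, here is a minimal sketch in Python. The circularity measure (4πA/P², which equals 1 for a perfect circle) and the 0.85 threshold are illustrative assumptions, not the actual screening algorithm.

```python
import math

def classify_spot(area, perimeter, threshold=0.85):
    """Label a skin spot using a single feature: how circular its outline is.

    Circularity = 4 * pi * area / perimeter**2 is 1.0 for a perfect circle
    and shrinks as the outline becomes more irregular. The threshold is an
    illustrative assumption.
    """
    circularity = 4 * math.pi * area / perimeter ** 2
    if circularity < threshold:
        return "cancer"       # irregular, non-circular outline
    return "not cancer"       # roughly circular outline

# Example: a jagged spot with a long perimeter relative to its area.
print(classify_spot(area=50.0, perimeter=40.0))  # -> "cancer"
```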

### Testing the Algorithm

After setting up and quantifying the features, we want to make sure the AI is actually doing a good job. We'll take some images the AI hasn't seen before, called **test data**. We know the correct answers, but the AI does not. The AI will measure the features within each of the images to provide an educated guess of the proper label. Every time AI gets a label wrong, it will reassess parts of the algorithm. For example, it might make the tweak below:
@@ -102,6 +104,45 @@ Finally, AI would not work without an interface. This is where we can get creati

<img src="resources/images/01c-AI_Possibilities-how_ai_works_files/figure-html//1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I_g263e06ef889_36_397.png" title="CAPTION HERE" alt="CAPTION HERE" width="100%" style="display: block; margin: auto;" />

## Understanding the AI Spring

The "AI Spring" is the period of rapid growth and progress in artificial intelligence starting in the early 2020s. A huge component of the AI Spring is **Generative AI**, which includes text generation, image creation, natural speech generation, computer code production, biological molecule discovery, and more.

In the example above, the AI learns to distinguish between skin conditions based on features and patterns it identifies. Its main goal is to make decisions about someone's skin condition rather than generating new examples. This is called **discriminative AI**.

However, let's imagine we wanted AI to generate examples of skin cancer. If the AI was creating new, realistic images of skin cancer, trying to generate what cancerous lesions might look like, it would be considered **generative**.

<div class = dictionary>
**Generative AI**: Creates new, creative things that look like what it has learned.
**Discriminative AI**: Tells things apart or makes decisions based on what it has learned.
</div>

<img src="resources/images/01c-AI_Possibilities-how_ai_works_files/figure-html//1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I_g2640f36ed31_12_9.png" title="CAPTION HERE" alt="CAPTION HERE" width="100%" style="display: block; margin: auto;" />

We'll talk next about some generative AI models which have made recent breakthroughs possible.

### Transformer Models

Transformers have been especially helpful for text generation. They work like smart readers that can understand context and relationships in language very well. Imagine you're reading a sentence, and at each word, you want to pay attention to other words to understand the context better. The **self-attention mechanism** does this very efficiently. It allows the model to focus on different parts of the input (like words in a sentence) simultaneously, capturing long-range dependencies.
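
Below is a minimal sketch of the scaled dot-product self-attention computation, with random toy numbers standing in for learned weights; real transformers learn the query, key, and value projection matrices during training.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of word vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # how strongly each word attends to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: attention weights sum to 1 per word
    return weights @ V                               # context-aware representation of each word

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                          # 5 "words", each an 8-dimensional vector
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (5, 8): one updated vector per word
```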

Take, for example, this paragraph from the Wikipedia entry for skin cancer. A transformer model would be able to synthesize the information to understand the relationships between UV exposure, risk factors, and the development of different types of skin cancer for different groups of individuals. It can easily distill the information into themes and topics.

> More than 90% of cases are caused by exposure to ultraviolet radiation from the Sun.[4] This exposure increases the risk of all three main types of skin cancer.[4] Exposure has increased, partly due to a thinner ozone layer. Tanning beds are another common source of ultraviolet radiation. For melanomas and basal-cell cancers, exposure during childhood is particularly harmful. For squamous-cell skin cancers, total exposure, irrespective of when it occurs, is more important. Between 20% and 30% of melanomas develop from moles.[6] People with lighter skin are at higher risk as are those with poor immune function such as from medications or HIV/AIDS. Diagnosis is by biopsy.
<img src="resources/images/01c-AI_Possibilities-how_ai_works_files/figure-html//1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I_g2640f36ed31_12_28.png" title="CAPTION HERE" alt="CAPTION HERE" width="100%" style="display: block; margin: auto;" />

### Diffusion Models

Like transformers, diffusion models are useful for generative AI, particularly image generation. The key to diffusion models is that they have a lot of training in how to fill in the blanks. The model starts with many "noisy" images (imagine a photo with lots of holes or black spots) and tries to reproduce the original image. This process is called "denoising score matching". It then uses this training to generate entirely new content.

<img src="resources/images/01c-AI_Possibilities-how_ai_works_files/figure-html//1OydUyEv1uEzn8ONPznxH1mGd4VHC9n88_aUGqkHJX4I_g2640f36ed31_12_41.png" title="CAPTION HERE" alt="CAPTION HERE" width="100%" style="display: block; margin: auto;" />
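
Here is a conceptual sketch of a single diffusion training step. The `dummy_model` is a hypothetical stand-in for a real denoising network: the model sees a corrupted image and is scored on how well it predicts the noise that was added.

```python
import numpy as np

rng = np.random.default_rng(0)

def dummy_model(noisy_image):
    # Hypothetical stand-in: a real model would be a trained neural network.
    return np.zeros_like(noisy_image)

def denoising_training_step(image, model, noise_level=0.5):
    noise = rng.normal(scale=noise_level, size=image.shape)
    noisy_image = image + noise                       # corrupt the image ("holes and spots")
    predicted_noise = model(noisy_image)              # model guesses what was added
    return np.mean((predicted_noise - noise) ** 2)    # loss: how wrong the guess was

image = rng.random((8, 8))                            # toy 8x8 grayscale image
print(denoising_training_step(image, dummy_model))    # in practice, this loss drives weight updates
```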

## Summary

In our skin cancer detection example, an AI system required lots of data labeled with information ("cancer" or "not cancer"). An algorithm looked for patterns between these two groups and then provided the results via an interface. This AI is an example of discriminative AI.

Since the early 2020s, generative AI has exploded in popularity, assisted by transformer and diffusion models, among other advancements. These technologies have allowed AI to excel at creating new content by recognizing deeper context and patterns.

<div class = disclaimer>
**Disclaimer:** The thoughts and ideas presented in this course are not to be substituted for legal or ethical advice and are only meant to give you a starting point for gathering information about AI policy and regulations to consider.
</div>
30 changes: 26 additions & 4 deletions docs/no_toc/01d-AI_Possibilities-ai_types.md
@@ -9,18 +9,40 @@ We've learned a bit about how AI works. However there are many different types o

## Machine Learning

**Machine learning** is broad concept describing how computers to learn from data. It includes traditional methods like decision trees and linear regression, as well as more modern approaches such as deep learning. It involves training models on labeled data to make predictions or uncover patterns or grouping of data. Machine learning is often the "algorithm" part of our data - algorithm - interface framework.
**Machine learning** is a broad concept describing how computers learn by looking at lots of examples. Imagine you are learning to tell the difference between apples and oranges. Someone first has to show you examples and say, "This is an apple, and this is an orange." Similarly, machine learning approaches need examples of input data that are "labeled" with the correct output. The goal of machine learning is to make useful or accurate predictions. Machine learning includes simpler approaches like regression, as well as more complicated approaches like deep learning (see below).
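
Using the apples-and-oranges analogy, here is a minimal sketch of supervised machine learning; the two features (weight in grams and a redness score) are illustrative assumptions.

```python
# A minimal sketch of supervised machine learning: show the computer
# labeled examples, then ask it to predict the label of a new example.
from sklearn.tree import DecisionTreeClassifier

features = [[150, 0.9], [170, 0.8], [140, 0.2], [130, 0.1]]  # [weight_grams, redness]
labels = ["apple", "apple", "orange", "orange"]

model = DecisionTreeClassifier().fit(features, labels)
print(model.predict([[160, 0.85]]))  # -> ['apple']
```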

## Neural Networks

**Neural networks** are a specific class of algorithms within the broader field of machine learning. They organize data into layers, including an input layer for data input and an output layer for results, with intermediate "hidden" layers in between.
**Neural networks** are a specific class of algorithms within machine learning. Neural networks mimic the way signals are passed between neurons in the brain.

You can think of layers like different teams in an organization. The input layer is in charge of scoping and strategy, the output layer is in charge of finalizing deliverables, while the intermediate layers are responsible for piecing together existing and creating new project materials. These layers help neural networks understand hierarchical patterns in data.
Neural networks organize data into layers, starting with an "input layer" of raw data. Data then moves to the next layer, called a "hidden" layer. The hidden layer combines the raw data in many ways to create levels of abstraction. You can think of a very pixelated image gradually coming into focus. Finally, results are produced in an "output layer".

The connections between nodes have weights that the network learns during training. The network can then adjust these weights to minimize errors in predictions. Neural networks often require large amounts of labeled data for training, and their performance may continue to improve with more data.
Neural networks often require large amounts of labeled data for training, and their performance may continue to improve with more data. Google uses a neural network to power its search algorithm [@ibm2023].

<img src="resources/images/01d-AI_Possibilities-ai_types_files/figure-html//1UiYOR_4a68524XsCv-f950n_CfbyNJVez2KdAjq2ltU_g2a694e3cce9_0_0.png" title="CAPTION HERE" alt="CAPTION HERE" width="100%" style="display: block; margin: auto;" />
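
Here is a minimal sketch of one forward pass through such a network, with random (untrained) weights; training would adjust these weights to reduce prediction errors.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)  # a common "activation" between layers

x = rng.normal(size=4)                            # input layer: 4 raw features
W1, b1 = rng.normal(size=(4, 6)), np.zeros(6)     # input -> hidden weights
W2, b2 = rng.normal(size=(6, 2)), np.zeros(2)     # hidden -> output weights

hidden = relu(x @ W1 + b1)                        # hidden layer: combinations/abstractions
output = hidden @ W2 + b2                         # output layer: results
print(output)                                     # e.g., scores for two possible labels
```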

## Deep Learning

**Deep learning** refers to neural networks with multiple intermediate "hidden" layers. A neural network with two or more hidden layers could be considered a deep learning system [@ibm2023]. The advantage of deep learning is that these approaches cluster data automatically and can detect abstractions or patterns that we might not know ahead of time. This is especially useful for complicated data, like unstructured text or images.

## Natural Language Processing

**Natural language processing**, or NLP, deals with interpreting text and extracting relevant information and insights. It can also categorize and organize the documents themselves. For example, NLP could help read the contents of documents online and decide whether they are patents or journal articles. These documents could then be indexed in [Google Scholar](https://scholar.google.com/). Many NLP approaches use deep learning [@wikiNLP].
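
In that spirit, here is a minimal sketch of document classification; the text snippets and labels are toy assumptions, and real systems train on large labeled corpora.

```python
# A minimal sketch of NLP document classification: turn text into
# numeric features, then train a classifier on labeled examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "We claim a method and apparatus for detecting lesions...",
    "Abstract: We present a randomized controlled trial of...",
    "The invention relates to an improved imaging device...",
    "Methods: Patients were recruited from three clinics...",
]
labels = ["patent", "journal article", "patent", "journal article"]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(docs, labels)
print(classifier.predict(["A device comprising a sensor and a processor..."]))
```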

## Generative AI

## Large Language Model

## Transformer Model

In 2017, Google engineers published a paper describing a type of neural network they called a **transformer model**. This model revolutionized the field of natural language processing and led to an explosion in what was possible with AI. Transformer models drive most generative AI models today.

## Variational Autoencoders (VAEs)

## Generative Adversarial Networks (GANs)

## Strengths and Weaknesses

<div class = disclaimer>
**Disclaimer:** The thoughts and ideas presented in this course are not to be substituted for legal or ethical advice and are only meant to give you a starting point for gathering information about AI policy and regulations to consider.
</div>
2 changes: 1 addition & 1 deletion docs/no_toc/04e-AI_Policy-creating_your_ai_policy.md
@@ -16,7 +16,7 @@ Tools like LLMs are increasingly popular, and it is unlikely that your organizat

For example, many AIs have a user agreement. Two possible types of user agreements are commercial agreements (which is what individual users generally have) and enterprise agreements (which is what organizations and institutions might have). The terms and conditions of enterprise agreements tend to be more stable over time and can be negotiated to include terms favorable to your organization, like your organization's data not being used as part of the AI's training data.

If an employee uses an AI system as a single consumer, they will generally sign a consumer use agreement. Under a consumer agreement, your employee may have fewer legal protections than they would if they were operating under an enterprise agreement. Consumer agreements can change unexpectedly, which means you could be operating under a whole new set of circumstances month to month. Additionally, consumers do not have the same sort of negotiating power with consumer agreements, which means a single employee is unlikely to have the same sort of data protections that an institution might have. By negotiating an enterprise agreement, your organization has created a system in which employees and their actions are less likely to result in unintential harm or data misuse.
If an employee uses an AI system as a single consumer, they will generally sign a consumer use agreement. Under a consumer agreement, your employee may have fewer legal protections than they would if they were operating under an enterprise agreement. Consumer agreements can change unexpectedly, which means you could be operating under a whole new set of circumstances month to month. Additionally, consumers do not have the same sort of negotiating power with consumer agreements, which means a single employee is unlikely to have the same sort of data protections that an institution might have. By negotiating an enterprise agreement, your organization has created a system in which employees and their actions are less likely to result in unintentional harm or data misuse.

Thinking about your AI policy as just the beginning, not the entire thing, can be a way to protect your employees, your organization, and the people you serve.

