Visualize: Updated documentation #2

Open
wants to merge 13 commits into
base: master
39 changes: 26 additions & 13 deletions source/widgets/visualize/distributions.md
@@ -13,29 +13,42 @@ Displays value distributions for a single attribute.
- Data: data with an additional column showing whether an instance is selected
- Histogram Data: bins and instance counts from the histogram

The **Distributions** widget displays the [value distribution](https://en.wikipedia.org/wiki/Frequency_distribution) of discrete or continuous attributes. If the data contains a class variable, distributions may be conditioned on the class.
The **Distributions** widget displays the [value distribution](https://en.wikipedia.org/wiki/Frequency_distribution) of discrete or continuous attributes. If the data contains a class variable, distributions may be split by the class.

The graph shows how many times (e.g., in how many instances) each attribute value appears in the data. If the data contains a class variable, class distributions for each of the attribute values will be displayed (like in the snapshot below). To create this graph, we used the *Zoo* dataset.
The graph shows how many times (that is, in how many instances) each attribute value appears in the data. If the data contains a class variable, class distributions for each of the attribute values will be displayed (as in the snapshot below). To create the graph, we used the *heart-disease* dataset.

![](images/Distributions-Discrete.png)
![](images/Distributions-Discrete-stamped.png)

1. A list of variables for display. *Sort categories by frequency* orders displayed values by frequency.
2. Set *Bin width* with the slider. Precision scale is set to sensible intervals. *Fitted distribution* fits selected distribution to the plot. Options are [Normal](https://en.wikipedia.org/wiki/Normal_distribution), [Beta](https://en.wikipedia.org/wiki/Beta_distribution), [Gamma](https://en.wikipedia.org/wiki/Gamma_distribution), [Rayleigh](https://en.wikipedia.org/wiki/Rayleigh_distribution), [Pareto](https://en.wikipedia.org/wiki/Pareto_distribution), [Exponential](https://en.wikipedia.org/wiki/Exponential_distribution), [Kernel density](https://en.wikipedia.org/wiki/Kernel_density_estimation).
2. *Fitted distribution* fits selected distribution to the plot. Options are:
- [Normal](https://en.wikipedia.org/wiki/Normal_distribution)
- [Beta](https://en.wikipedia.org/wiki/Beta_distribution)
- [Gamma](https://en.wikipedia.org/wiki/Gamma_distribution)
- [Rayleigh](https://en.wikipedia.org/wiki/Rayleigh_distribution)
- [Pareto](https://en.wikipedia.org/wiki/Pareto_distribution)
- [Exponential](https://en.wikipedia.org/wiki/Exponential_distribution)
- [Kernel density](https://en.wikipedia.org/wiki/Kernel_density_estimation).

Set *Bin width* with the slider. *Smoothing* is enabled for the Kernel density option. *Hide bars* hides bars and shows only the fitted distribution.
3. Columns:

- *Split by* displays value distributions for instances of a certain class.
- *Stack columns* displays one column per bin, colored by proportions of class values.
- *Show probabilities* shows probabilities of class values at selected variable.
- *Show cumulative distribution* cumulatively stacks frequencies.
- *Split by* displays value distributions for instances of a certain class. Set to class target variable by default.
- *Stack columns* displays one column per bin, colored by proportions of class values.
- *Show probabilities* shows probabilities of class values at selected variable.
- *Show cumulative distribution* cumulatively stacks frequencies.

4. If *Apply Automatically* is ticked, changes are communicated automatically. Alternatively, click *Apply*.

For continuous attributes, the attribute values are also displayed as a histogram. It is possible to fit various distributions to the data, for example, a Gaussian kernel density estimation. *Hide bars* hides histogram bars and shows only distribution (old behavior of Distributions).
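
The binning, class split, and distribution fit described above can be sketched in a few lines of plain Python. This is an illustration only, not the widget's implementation: the helper names `histogram` and `split_by`, the sample values, and the use of `statistics.NormalDist` for the Normal fit are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from statistics import NormalDist


def histogram(values, bin_width):
    """Map each bin's left edge to the number of values that fall into it."""
    start = min(values)
    counts = Counter(int((v - start) // bin_width) for v in values)
    return {start + i * bin_width: counts[i] for i in sorted(counts)}


def split_by(rows, bin_width):
    """Build one histogram per class; all classes share the same bin edges."""
    start = min(value for value, _ in rows)
    per_class = defaultdict(Counter)
    for value, label in rows:
        left_edge = start + int((value - start) // bin_width) * bin_width
        per_class[label][left_edge] += 1
    return {label: dict(sorted(c.items())) for label, c in per_class.items()}


# Made-up (value, class) pairs, for illustration only.
rows = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (2.5, "b"),
        (4.0, "b"), (4.5, "b"), (5.0, "a"), (6.0, "b")]
values = [value for value, _ in rows]

bins = histogram(values, bin_width=1.0)      # what the bars show
by_class = split_by(rows, bin_width=1.0)     # what "Split by" shows
fit = NormalDist.from_samples(values)        # Normal fit: sample mean and stdev
```

Changing `bin_width` plays the role of the *Bin width* slider: a larger width merges neighboring bars, while the fitted curve stays the same because it is estimated from the raw values.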

For this example, we used the *Iris* dataset.

![](images/Distributions-Continuous.png)

In class-less domains, the bars are displayed in blue. We used the *Housing* dataset.
In class-less domains, the bars are displayed in blue. We used the *Housing* dataset for the graph below.

![](images/Distributions-NoClass.png)

Example
-------

A simple example with the *heart-disease* data set is to display **Distributions** after the **File** widget, then select a subset of interest, say female patients. The selection is sent downstream, to **Data Table**, where we can inspect the selection.

![](images/Distributions-NoClass.png)
![](images/Distributions-Example.png)
44 changes: 24 additions & 20 deletions source/widgets/visualize/freeviz.md
@@ -16,40 +16,44 @@ Displays FreeViz projection.

**FreeViz** uses a paradigm borrowed from particle physics: points in the same class attract each other, those from different classes repel each other, and the resulting forces are exerted on the anchors of the attributes, that is, on the unit vectors of each of the dimensional axes. The points cannot move (they are projected into the projection space), but the attribute anchors can, so the optimization is a hill-climbing process that ends with the anchors placed such that the forces are in equilibrium. The button Optimize is used to invoke the optimization process. The result of the optimization may depend on the initial placement of the anchors, which can be set in a circle, arbitrarily, or even manually. The latter also works at any stage of optimization, and we recommend playing with this option to understand how a change of one anchor affects the positions of the data points. In any linear projection, projections of unit vectors that are very short compared to the others indicate that their associated attribute is not very informative for the particular classification task. Those vectors, that is, their corresponding anchors, may be hidden from the visualization using the Radius slider in the Show anchors box.
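
To make the role of the anchors concrete, here is a minimal sketch of a linear projection driven by attribute anchors. It covers only the geometry (circular initial placement, projecting a data instance, hiding short anchors); the attraction and repulsion forces of the optimization are not modeled. The function names and sample numbers are hypothetical and are not taken from the widget's code.

```python
import math


def circular_anchors(n_attributes):
    """Place one unit-length anchor per attribute evenly on a circle."""
    return [(math.cos(2 * math.pi * i / n_attributes),
             math.sin(2 * math.pi * i / n_attributes))
            for i in range(n_attributes)]


def project(row, anchors):
    """Linear projection: sum of attribute values times their anchor vectors."""
    x = sum(value * ax for value, (ax, _) in zip(row, anchors))
    y = sum(value * ay for value, (_, ay) in zip(row, anchors))
    return x, y


def visible_anchors(anchors, hide_radius):
    """Return indices of anchors that are longer than the hide radius."""
    return [i for i, (ax, ay) in enumerate(anchors)
            if math.hypot(ax, ay) > hide_radius]


anchors = circular_anchors(4)
point = project([1.0, 0.5, 0.0, 0.25], anchors)

# Moving an anchor by hand (here: shortening the fourth one) changes where
# every data instance lands, and a short anchor can then be hidden.
anchors[3] = (0.0, -0.1)
moved_point = project([1.0, 0.5, 0.0, 0.25], anchors)
shown = visible_anchors(anchors, hide_radius=0.5)
```

Because each point is a weighted sum of anchor vectors, an attribute with a very short anchor barely shifts any point, which is why such anchors are candidates for hiding.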

![](images/freeviz-zoo-stamped.png)

1. Two initial positions of anchors are possible: random and circular. Optimization moves anchors in an optimal position.
2. Set the color of the displayed points (you will get colors for discrete values and grey-scale points for continuous). Set label, shape and size to differentiate between points. Set symbol size and opacity for all data points.
3. Anchors inside a circle are hidden. Circle radius can be be changed using a slider.
4. Adjust plot properties:
- Set [jittering](https://en.wikipedia.org/wiki/Jitter) to prevent the dots from overlapping (especially for discrete attributes).
![](images/FreeViz-stamped.png)

1. Two initial positions of anchors are possible: random and circular. Optimization moves the anchors into an optimal position. If checked, *Gravity* adjusts the tendency of same-colored points to cluster together. Press *Start* to run the optimization.
2. Set the color of the displayed points (you will get colors for discrete values and BGY gradient points for continuous). Set label, shape and size to differentiate between points. *Label only selection and subset* allows you to select individual data instances and label them.
3. Adjust plot properties:
- *Symbol size* changes the size of the points.
- *Opacity* changes the transparency of the points.
- Set [*jittering*](https://en.wikipedia.org/wiki/Jitter) to prevent the dots from overlapping (especially for discrete attributes).
- Anchors inside a circle are hidden. Circle radius can be changed using *Hide radius*.
- *Show color regions* colors the graph by the *Color* attribute (see the screenshot below).
- *Show legend* displays a legend on the right. Click and drag the legend to move it.
- *Show class density* colors the graph by class (see the screenshot below).
- *Label only selected points* allows you to select individual data instances and label them.
5. *Select, zoom, pan and zoom to fit* are the options for exploring the graph. The manual selection of data instances works as an angular/square selection tool. Double click to move the projection. Scroll in or out for zoom.
6. If *Send automatically* is ticked, changes are communicated automatically. Alternatively, press *Send*.
7. *Save Image* saves the created image to your computer in a .svg or .png format.
8. Produce a report.

4. *Select*, *zoom*, *pan* and *zoom to fit* are the options for exploring the graph. The manual selection of data instances works as an angular/square selection tool. Scroll in or out for zoom.
5. If *Send automatically* is ticked, changes are communicated automatically. Alternatively, press *Send*.

Manually move anchors
---------------------

![](images/freeviz-moveanchor.png)
![](images/FreeViz-anchors.png)

Anchors can be moved manually. Hover the mouse pointer over the end of an anchor, press the left button, and drag the selected anchor wherever you want.

Selection
---------

Selection can be used to manually defined subgroups in the data. Use Shift modifier when selecting data instances to put them into a new group. Shift + Ctrl (or Shift + Cmd on macOs) appends instances to the last group.
Selection can be used to manually define the subgroups in the data. The default tool is *Select*, which selects data instances within the chosen rectangular area. Use the Shift modifier when selecting data instances to put them into a new group. Shift + Ctrl (or Shift + Cmd on macOS) appends instances to the last group. Alt removes the chosen instance from the group.

Signal data outputs a data table with an additional column that contains group indices.
*Pan* enables you to move the plot around the pane. With *Zoom* you can zoom in and out of the pane with a mouse scroll, while *Reset zoom* resets the visualization to its optimal size.

The widget outputs a data table with an additional column that contains group indices.
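
The group-index column can be pictured with the following sketch. It assumes groups are numbered from 0 and that unselected rows get `None`; the actual labels the widget writes may differ, and `group_column` is a hypothetical helper, not part of Orange.

```python
def group_column(n_rows, groups):
    """Return one entry per row: the index of its group, or None if unselected."""
    column = [None] * n_rows
    for group_index, row_indices in enumerate(groups):
        for row in row_indices:
            column[row] = group_index
    return column


groups = [{0, 2}]       # plain selection creates the first group
groups.append({3})      # Shift: start a new group
groups[-1] |= {4}       # Shift + Ctrl (Shift + Cmd): append to the last group
groups[-1] -= {3}       # Alt: remove an instance from the group

column = group_column(5, groups)
```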

![](images/FreeViz-selection.png)

Explorative Data Analysis
-------------------------
Example
-------

In this simple schema, we observe a projection of the *zoo* dataset. We optimized the plot to place similar points together. Thus we can observe that mammals are distinctly different from other animals.

The **FreeViz**, as the rest of Orange widgets, supports zooming-in and out of part of the plot and a manual selection of data instances. These functions are available in the lower left corner of the widget. The default tool is *Select*, which selects data instances within the chosen rectangular area. *Pan* enables you to move the plot around the pane. With *Zoom* you can zoom in and out of the pane with a mouse scroll, while *Reset zoom* resets the visualization to its optimal size. An example of a simple schema, where we selected data instances from a rectangular region and sent them to the [Data Table](../data/datatable.md) widget, is shown below.
There is an area of the plot that places reptiles and amphibians close together. We have selected the points by dragging a rectangle over them. Next, we can observe the selected instances in a [Data Table](../data/datatable.md). It looks like frogs and worms share some characteristics. What could these be? Use Data Table to find out.

![](images/FreeViz-Example-Explorative.png)
![](images/FreeViz-Example.png)
18 changes: 8 additions & 10 deletions source/widgets/visualize/heatmap.md
@@ -16,20 +16,20 @@ Plots a heat map for a pair of attributes.

The widget enables row selection with click and drag. One can zoom in with Ctrl++ (Cmd++) and zoom out with Ctrl+- (Cmd+-). Ctrl+0 (Cmd+0) resets zoom to the extended version, while Ctrl+9 (Cmd+9) resets it to the default.

![](images/HeatMap.png)
![](images/HeatMap-stamped.png)

1. The color pallette. Choose from linear, diverging, color-blind friendly, or other pallettes. **Low** and **High** are thresholds for the color palette (low for attributes with low values and high for attributes with high values). Selecting one of diverging palettes, which have two extreme colors and a neutral (black or white) color at the midpoint, enables an option to set a meaningful mid-point value (default is 0).
1. The color palette. Choose from linear, diverging, color-blind friendly, or other palettes. **Range** defines the low and high thresholds for the color palette (left for attributes with low values and right for attributes with high values). Selecting one of the diverging palettes, which have two extreme colors and a neutral (black or white) color at the midpoint, enables an option to set a meaningful mid-point value (default is 0).
2. Merge rows. If there are too many rows in the visualization, one can merge them with the k-means algorithm into N selected clusters (default 50).
3. Cluster columns and rows:
- **None** (lists attributes and rows as found in the dataset)
- **Clustering** (clusters data by similarity with hierarchical clustering on Euclidean distances and with average linkage)
- **Clustering** (clusters data by similarity with hierarchical clustering on Euclidean distances and with Ward linkage)
- **Clustering with ordered leaves** (same as clustering, but it additionally maximizes the sum of similarities of adjacent elements)
4. Split rows or columns by a categorical variable. If the data contains a class variable, rows will be automatically split by class.
5. Set what is displayed in the plot in **Annotation & Legend**.
5. **Annotation & Legend** sets what is displayed in the plot:
- If *Show legend* is ticked, a color chart will be displayed above the map.
- If *Stripes with averages* is ticked, a new line with attribute averages will be displayed on the left.
**Row Annotations** adds annotations to each instance on the right. Color colors the instances with the corresponding value of the selected categorical variable.
**Column Annotations** adds annotation to each variable at the selected position (default is Top). Color colors the columns with the corresponding value of the selected column annotation.
**Row Annotations** adds annotations to each instance on the right. *Color* colors the instances with the corresponding value of the selected categorical variable.
**Column Annotations** adds annotation to each variable at the selected position (default is Top). *Color* colors the columns with the corresponding value of the selected column annotation.
6. If *Keep aspect ratio* is ticked, each value will be displayed with a square (proportionate to the map).
7. If *Send Automatically* is ticked, changes are communicated automatically. Alternatively, click *Send*.

@@ -39,8 +39,6 @@ Heat map enables some neat plot enhancements. Such options are clustering of row

Row and column clustering is performed independently. Row clustering is computed from Euclidean distances, while column clustering uses Pearson correlation coefficients. Hierarchical clustering is based on the Ward linkage method. Clustering with optimal leaf ordering reorders left and right branches in the dendrogram to minimize the sum of distances between adjacent leaves (Bar-Joseph et al. 2001).
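
The two dissimilarity measures named above can be sketched as follows. This is a hedged illustration: turning a Pearson coefficient into a distance as `1 - r` is an assumption made here (the text only says column clustering uses Pearson correlation coefficients), the table values are made up, and the Ward linkage step is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation coefficient between two columns."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up table: three rows (instances) by three columns (attributes).
table = [[1.0, 2.0, 3.0],
         [2.0, 4.0, 1.0],
         [3.0, 6.0, 2.0]]
columns = [list(col) for col in zip(*table)]

row_distance = euclidean(table[0], table[1])            # used for rows
column_distance = 1 - pearson(columns[0], columns[1])   # assumed form for columns
```

In this sample the second column is exactly twice the first, so the two columns are perfectly correlated and would be placed next to each other, even though their absolute values differ.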



![](images/HeatMap-advanced.png)

Examples
@@ -52,13 +50,13 @@ The **Heat Map** below displays attribute values for the *brown-selected* data s

Heat map shows low expressions in blue and high expressions in yellow and white. For better organization, we added *Clustering (opt. ordering)* to the columns, which puts columns with similar profiles closer together. In this way we can see the conditions that result in low expressions for ribosomal genes in the lower right corner.

Additionally, the plot is enhanced with row color on the right, showing which class the rows belong to.
We have selected some Proteas encoding genes with high expressions under the spo-mid condition. We can observe which genes these are in a [Data Table](../data/datatable.md).

![](images/HeatMap-Example1.png)

### Sentiment Analysis

Heat maps are great for visualizing any kind of comparable numeric variables, for example sentiment in a collection of documents. We will take *book-excerpts* corpus from the **Corpus** widget and pass it to the **Sentiment Analysis** widget, which computes sentiment scores for each document. The output of sentiment analysis are four columns, positive, negative, and neutral sentiment score, and a compound score that aggregates the previous scores into a single number. Positive compound values (white) represent positive documents, while negative (blue) represent negative documents.
Heat maps are great for visualizing any kind of comparable numeric variables, for example sentiment in a collection of documents. We will take the *book-excerpts* corpus from the [Corpus](https://orangedatamining.com/widget-catalog/text-mining/corpus-widget/) widget and pass it to the [Sentiment Analysis](https://orangedatamining.com/widget-catalog/text-mining/sentimentanalysis/) widget, which computes VADER sentiment scores for each document. The output of sentiment analysis is four columns: positive, negative, and neutral sentiment scores, and a compound score that aggregates the previous scores into a single number. Positive compound values (white) represent positive documents, while negative (blue) represent negative documents.

We used row clustering to place similar rows closer together, resulting in clear negative and positive groups. Now we can select negative children's books and explore which they are.

Binary file removed source/widgets/visualize/icons/box-plot.png
Binary file removed source/widgets/visualize/icons/cn2ruleviewer.png
Binary file removed source/widgets/visualize/icons/distributions.png
Binary file removed source/widgets/visualize/icons/freeviz.png
Binary file removed source/widgets/visualize/icons/heat-map.png
Binary file removed source/widgets/visualize/icons/line-plot.png
Binary file removed source/widgets/visualize/icons/linear-projection.png
Binary file removed source/widgets/visualize/icons/mosaic-display.png
Binary file removed source/widgets/visualize/icons/nomogram.png
Binary file not shown.
Binary file removed source/widgets/visualize/icons/pythagorean-tree.png
Binary file removed source/widgets/visualize/icons/radviz.png
Binary file removed source/widgets/visualize/icons/scatter-map.png
Binary file removed source/widgets/visualize/icons/scatter-plot.png
Binary file removed source/widgets/visualize/icons/sieve-diagram.png
Binary file removed source/widgets/visualize/icons/silhouette-plot.png
Binary file removed source/widgets/visualize/icons/tree-viewer.png
Binary file removed source/widgets/visualize/icons/venn-diagram.png
Binary file modified source/widgets/visualize/images/Distributions-Continuous.png
@@ -0,0 +1,4 @@
0 305 29
1 305 195
2 305 356
3 305 501
Binary file not shown.
Binary file modified source/widgets/visualize/images/Distributions-NoClass.png
Binary file not shown.
Binary file modified source/widgets/visualize/images/FreeViz-selection.png
Binary file modified source/widgets/visualize/images/HeatMap-Example1.png
Binary file modified source/widgets/visualize/images/HeatMap-Example2.png
Binary file modified source/widgets/visualize/images/HeatMap-advanced.png
Binary file removed source/widgets/visualize/images/HeatMap.png
Binary file modified source/widgets/visualize/images/LinePlot-Example.png
Binary file modified source/widgets/visualize/images/LinePlot-stamped.png
Binary file removed source/widgets/visualize/images/Mosaic-Display.png
Binary file removed source/widgets/visualize/images/Nomogram-Example.png
Binary file added source/widgets/visualize/images/Nomogram-LR.png
Binary file removed source/widgets/visualize/images/Radviz-Brown-2.png
Binary file removed source/widgets/visualize/images/Radviz-Brown.png
Binary file modified source/widgets/visualize/images/ScatterPlot-selection.png
Binary file modified source/widgets/visualize/images/ScatterPlotExample-Ranking.png
Binary file modified source/widgets/visualize/images/Scatterplot-ClassDensity.png
Binary file modified source/widgets/visualize/images/Scatterplot-Iris-stamped.png
Binary file modified source/widgets/visualize/images/SieveDiagram-Example.png
Binary file modified source/widgets/visualize/images/SieveDiagram-stamped.png
Binary file removed source/widgets/visualize/images/SieveDiagram.png
Binary file modified source/widgets/visualize/images/TreeViewer-classification.png
Binary file modified source/widgets/visualize/images/TreeViewer-regression.png
Binary file modified source/widgets/visualize/images/TreeViewer-selection.png
Binary file modified source/widgets/visualize/images/TreeViewer-stamped.png
13 changes: 6 additions & 7 deletions source/widgets/visualize/lineplot.md
@@ -13,19 +13,18 @@ Visualization of data profiles (e.g., time series).
- Selected Data: instances selected from the plot
- Data: data with an additional column showing whether a point is selected

[Line plot](https://en.wikipedia.org/wiki/Line_chart) a type of plot which displays the data as a series of points, connected by straight line segments. It only works for numerical data, while categorical can be used for grouping of the data points.
[Line plot](https://en.wikipedia.org/wiki/Line_chart) displays the data as a series of points, connected by straight line segments. It only works for numerical data, while categorical attributes can be used for grouping the data points.

![](images/LinePlot-stamped.png)

1. Information on the input data.
2. Select what you wish to display:
1. Select what you wish to display:
- Lines show individual data instances in a plot.
- Range shows the range of data points between 10th and 90th percentile.
- Mean adds the line for mean value. If group by is selected, means will be displayed per each group value.
- Mean adds the line for mean value. If *Group by* is selected, the means will be displayed per each group value.
- Error bars show the standard deviation of each attribute.
3. Select a categorical attribute to use for grouping of data instances. Use None to show ungrouped data.
4. *Select, zoom, pan and zoom to fit* are the options for exploring the graph. The manual selection of data instances works as a line selection, meaning the data under the selected line plots will be sent on the output. Scroll in or out for zoom. When hovering over an individual axis, scrolling will zoom only by the hovered-on axis (vertical or horizontal zoom).
5. If *Send Automatically* is ticked, changes are communicated automatically. Alternatively, click *Send*.
2. Select a categorical attribute to use for grouping of data instances. Use *None* to show ungrouped data.
3. *Select*, *zoom*, *pan* and *zoom to fit* are the options for exploring the graph. The manual selection of data instances works as a line selection, meaning the data under the selected line plots will be sent on the output. Scroll in or out for zoom. When hovering over an individual axis, scrolling will zoom only by the hovered-on axis (vertical or horizontal zoom).
4. If *Send Automatically* is ticked, changes are communicated automatically. Alternatively, click *Send*.
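
The display options in the list above correspond to simple per-attribute statistics. The sketch below computes them with Python's `statistics` module on made-up profiles; the choice of the "inclusive" quantile method and of the sample (rather than population) standard deviation are assumptions, since the text does not say which variants the widget uses.

```python
from statistics import mean, quantiles, stdev

# Made-up profiles: each inner list is one data instance (one line in the plot).
profiles = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0], [4.0, 7.0], [5.0, 8.0]]


def summarize(profiles):
    """Compute mean, 10th-90th percentile range, and deviation per attribute."""
    summary = []
    for column in zip(*profiles):
        deciles = quantiles(column, n=10, method="inclusive")
        summary.append({
            "mean": mean(column),    # the *Mean* line
            "low": deciles[0],       # 10th percentile, lower edge of *Range*
            "high": deciles[-1],     # 90th percentile, upper edge of *Range*
            "std": stdev(column),    # half-length of the *Error bars*
        })
    return summary


summary = summarize(profiles)
```

With a grouping attribute, the same function would be applied separately to the profiles of each group value to obtain one mean line per group.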

Example
-------