Defining the scope: features and target audiences #4

krassowski · 2021-10-04T07:33:44Z

It would be good to start by agreeing on a list of features that we believe a repository exploration tool should include, and on how those may benefit different target audiences. Here is a basic list of features I think are worth considering:

Exploring the directory structure

For exploring studies: with more files and directories than we would like to see
For audience: lost in the forest of folders (everyone)
Context: it's not very common for analysis software to have dozens of nested directories, but it can happen, and it might be more frequent for specific languages. As a rule, I think that more experienced software engineers tend to add more tooling configuration to version control (think .github, .vscode, binder, dist, docker), which may leave novices and non-programmers lost and unable to easily find the information relevant to them.

See https://next.github.com/projects/repo-visualization for GitHub's take on this one.
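To make the directory-structure idea concrete, here is a minimal sketch (not a proposal for the actual implementation) of a summary that lists directories but does not descend into tooling folders. The `TOOLING_DIRS` set simply reuses the examples mentioned above; the function name `summarize_tree` and the `max_depth` parameter are hypothetical.

```python
from pathlib import Path

# Tooling directories named in the context above; treating them as
# noise for novices is an assumption of this sketch.
TOOLING_DIRS = {".github", ".vscode", "binder", "dist", "docker"}


def summarize_tree(root, max_depth=2):
    """Return indented lines describing the directories under root.

    Tooling directories are listed but not descended into.
    """
    lines = []

    def walk(directory, depth):
        for entry in sorted(directory.iterdir(), key=lambda p: p.name):
            if not entry.is_dir():
                continue
            indent = "  " * depth
            if entry.name in TOOLING_DIRS:
                lines.append(f"{indent}{entry.name}/ (tooling)")
                continue
            n_files = sum(1 for child in entry.iterdir() if child.is_file())
            lines.append(f"{indent}{entry.name}/ (files: {n_files})")
            if depth + 1 < max_depth:
                walk(entry, depth + 1)

    walk(Path(root), 0)
    return lines
```

Calling `print("\n".join(summarize_tree(".")))` in a repository root would give a compact overview; a real tool would likely let users choose which directories count as tooling.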

Exploring the flow of the data

For exploring studies: using multi-step processes scattered across multiple files
For audience: interested in finding out more about a specific step in the process
Context: more complex studies usually include a flowchart describing the data wrangling process; however, the flowchart lacks links to the corresponding places in the codebase

Exploring the code structure

For exploring studies: providing code for re-use, or having a substantial chunk of analysis code attached
For audience: end-users = researchers, possibly with good computer literacy but with programming knowledge limited strictly to what is required for their analysis; scientific software developers trying to contribute to or extend someone else's work; students trying to understand or dissect an algorithm or method
Context: code navigation on GitHub became easier in recent years after the introduction of code-jumping and integration with the in-browser IDE; still, there is no tool providing an integrated overview where relevant parts of the code can be annotated with their relevance to specific sections of the associated method/paper, and where a diagram of function/class dependencies could be explored interactively.

Of these, the flow of data is the main focus of the initial proposal. While a general-purpose tool can be developed with similar features, I would very much like to focus on supporting the academic community, and on allowing authors to create output which can be used independently of the tool, e.g. an SVG graph with embedded links which the code author could embed in their repository (like this one generated with https://github.com/krassowski/nbpipeline).
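As an illustration of what "an SVG graph with embedded links" could look like, here is a rough sketch using only the Python standard library. It is not how nbpipeline works; the function name `pipeline_svg`, the step labels, and the base URL in the usage note are all made up for the example, and the layout is deliberately a simple left-to-right chain.

```python
from xml.sax.saxutils import escape, quoteattr


def pipeline_svg(steps, base_url):
    """Render a linear pipeline as an SVG string with one linked box per step.

    steps: list of (label, relative_path) tuples, in execution order.
    base_url: URL prefix under which the files can be browsed.
    """
    box_w, box_h, gap = 160, 40, 30
    width = len(steps) * (box_w + gap) - gap if steps else 0
    mid_y = box_h / 2
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{box_h}">'
    ]
    prefix = base_url.rstrip("/")
    for i, (label, path) in enumerate(steps):
        x = i * (box_w + gap)
        # Plain `href` is SVG 2; older renderers may need `xlink:href` instead.
        href = quoteattr(f"{prefix}/{path}")
        parts.append(
            f"<a href={href}>"
            f'<rect x="{x}" y="0" width="{box_w}" height="{box_h}" '
            f'fill="white" stroke="black"/>'
            f'<text x="{x + box_w / 2}" y="{mid_y}" text-anchor="middle" '
            f'dominant-baseline="middle">{escape(label)}</text>'
            f"</a>"
        )
        if i > 0:
            # Connector from the previous box to this one.
            parts.append(
                f'<line x1="{x - gap}" y1="{mid_y}" x2="{x}" y2="{mid_y}" '
                f'stroke="black"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)
```

For example, `pipeline_svg([("Clean data", "steps/clean.ipynb"), ("Fit model", "steps/fit.ipynb")], "https://example.org/repo")` returns a string that can be written to a `.svg` file and embedded in a README. Since the links live inside the file itself, the graph stays usable without the tool that generated it.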
