Skip to content

List of strings (implementation ideas) #322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ivan-pi opened this issue Feb 15, 2021 · 2 comments
Open

List of strings (implementation ideas) #322

ivan-pi opened this issue Feb 15, 2021 · 2 comments
Labels
topic: strings String processing

Comments

@ivan-pi
Copy link
Member

ivan-pi commented Feb 15, 2021

I thought it might be best to create a new issue to discuss alternative implementations for the list of strings proposed in #268. As @arjenmarkus wrote there:

This proposal considers the features that would be useful for this derived type, a list of strings. It does not prescribe the methods for implementing the type. Given the ease by which arrays are handled in Fortran, it is possible to use allocatable arrays, but perhaps a linked list may prove to be more efficient, at least in certain scenarios of use. Therefore the proposal concentrates on the usage.

I thought it might be helpful to expand on this topic using some analogies with the C++ library. In C++, the standard template library provides the special container std::string as a sequence of characters supporting fast random access, and fast insert/delete at back of the container.

In Fortran we can reach similar behavior/usage using the character(len=:), allocatable type:

character(len=:), allocatable :: str

! Automatic allocation upon initialization
str = "Hello"

! "Insertion" at the back
str = str//", world!"    ! "Hello, world!"

! Fast random access
str(8:8) = "W"           ! "Hello, World!"

! "Deletion"
str = str(1:len(str)-1)  ! "Hello, World"

There are of course also very large differences between the C++ std::string and the Fortran allocatable character sequences in terms of memory management and other things. Some of the differences can be inferred from the numerous member functions of std::string for access, modification, and string operations.

Moving one step further to a "lists of strings" in C++ there are several options. For "fixed-size" lists one has:

// Built-in (fixed) size array
std::string colour[4] = { "Blue", "Red",
                              "Orange", "Yellow" };

// Array of strings (safer, and easier to use than the built-in array)
std::array<std::string, 4> colour { "Blue", "Red", "Orange",
                                     "Yellow" };

For dynamic lists of strings the most commonly used option in C++ is

// Vector of strings (the "go to" container)
std::vector<std::string> colour {"Blue", "Red", "Orange"};
// Strings can be added at any time with push_back
colour.push_back("Yellow")

However, the STL library also contains other specialized containers such as: std::deque, std::list, and std::forward_list. Quoting from the textbook C++ Primer:

The idea of having different containers is they offer different performance trade-offs relative to

  • The costs to add or delete elements to the container
  • The costs to perform non-sequential access to elements of the container

The performance aspects of different sequential containers are summarized neatly in this table copied out of the same book:

Container Description
vector Flexible-size array. Supports fast random access.
Inserting or deleting elements other than at the back may be slow.
deque Double-ended queue. Supports fast random access.
Fast insert/delete at front or back.
list Doubly linked list. Supports only bidirectional sequential access.
Fast insert/delete at any point in the list.
forward_list Singly linked list. Supports only sequential access in one direction.
Fast insert/delete at any point in the list.
array Fixed-size array. Supports fast random access.
Cannot add or remove elements.
string A specialized container, similar to vector, that contains characters.
Fast random access. Fast insert/delete at back.

The beauty of the C++ containers is they share many common operations. If you can avoid random access, and instead use only iterators to access the elements you can switch easily between some containers. A table of the supported operations can be found here.

I believe the current implementation in #311 is most similar to the std::vector and uses contiguous storage for the underlying string elements. The comments from @esterjo (#268 (comment) and #241 (comment)) already address the fact that other implementations might be more suitable in some usage cases in order to avoid memory fragmentation or support other operations.

Given that there is not much precedence for such string containers in Fortran, I find it hard to judge which implementation is most suitable for typical Fortran usage cases. Getting some feedback from the community could help guide further (experimental) implementation efforts.

@ivan-pi ivan-pi added the topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... label Feb 15, 2021
@arjenmarkus
Copy link
Member

arjenmarkus commented Feb 16, 2021 via email

@arjenmarkus
Copy link
Member

arjenmarkus commented Feb 16, 2021 via email

@awvwgk awvwgk added topic: strings String processing and removed topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... labels Sep 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: strings String processing
Projects
None yet
Development

No branches or pull requests

3 participants