@tlrx
Member

@tlrx tlrx commented Apr 13, 2021

Since #16661, it has been possible to get the total size of some Lucene segment files from the Node Stats or Indices Stats API by setting the include_segment_file_sizes parameter, and #71416 extended the list of reported file extensions.
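As a hedged sketch, here is how such a request might be built with the JDK's java.net.http client. It asks the Indices Stats API for the segments metric with include_segment_file_sizes enabled. The base URL http://localhost:9200 and the class name SegmentStatsRequest are assumptions for illustration only; the snippet builds the request without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SegmentStatsRequest {
    // Builds GET /_stats/segments?include_segment_file_sizes=true against a given base URL.
    // The base URL is hypothetical; the path and parameter come from the Indices Stats API.
    static HttpRequest build(String baseUrl) {
        URI uri = URI.create(baseUrl + "/_stats/segments?include_segment_file_sizes=true");
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:9200");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, the request could be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); the per-extension entries should then appear under the segments.file_sizes section of the response.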

This pull request adds more information about the files that share the same extension: the number of files (count) and the minimum, maximum and average file sizes in bytes. Here is a sample:

"cfs" : {
  "description" : "Compound Files",
  "size_in_bytes" : 2260,
  "min_size_in_bytes" : 2260,
  "max_size_in_bytes" : 2260,
  "average_size_in_bytes" : 2260,
  "count" : 1
}
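The per-extension figures in the sample can be derived in a single pass over the segment files. The following is a minimal sketch, not the Elasticsearch implementation: it assumes file names and sizes are already known, and the SegmentFileStats class, FileStats accumulator and aggregate method are hypothetical. Files are grouped by the text after the last dot, and the average is computed as an integer (total divided by count), which is an assumption that matches the whole-number value in the sample.

```java
import java.util.Map;
import java.util.TreeMap;

public class SegmentFileStats {
    // Hypothetical per-extension accumulator; field names mirror the JSON sample.
    static final class FileStats {
        long sizeInBytes;
        long minSizeInBytes = Long.MAX_VALUE;
        long maxSizeInBytes = Long.MIN_VALUE;
        long count;

        void add(long size) {
            sizeInBytes += size;
            minSizeInBytes = Math.min(minSizeInBytes, size);
            maxSizeInBytes = Math.max(maxSizeInBytes, size);
            count++;
        }

        // Integer average; returns 0 when no file was added.
        long averageSizeInBytes() {
            return count == 0 ? 0 : sizeInBytes / count;
        }
    }

    // Groups file sizes by extension, e.g. "_0.cfs" -> "cfs". Names without a dot are skipped.
    static Map<String, FileStats> aggregate(Map<String, Long> fileSizes) {
        Map<String, FileStats> byExtension = new TreeMap<>();
        for (Map.Entry<String, Long> file : fileSizes.entrySet()) {
            String name = file.getKey();
            int dot = name.lastIndexOf('.');
            if (dot < 0) {
                continue;
            }
            String extension = name.substring(dot + 1);
            byExtension.computeIfAbsent(extension, ext -> new FileStats()).add(file.getValue());
        }
        return byExtension;
    }

    public static void main(String[] args) {
        // Illustrative sizes only.
        Map<String, Long> files = Map.of("_0.cfs", 2260L, "_0.cfe", 405L, "_1.cfs", 1000L);
        aggregate(files).forEach((ext, s) -> System.out.printf(
            "%s: size=%d min=%d max=%d avg=%d count=%d%n",
            ext, s.sizeInBytes, s.minSizeInBytes, s.maxSizeInBytes, s.averageSizeInBytes(), s.count));
    }
}
```

Keeping running min/max/total/count per extension avoids storing every file size, so the cost is one map entry per extension regardless of how many segments exist.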

This pull request also simplifies how compound file sizes are computed. Previously, compound segment files were extracted and their sizes were aggregated with the sizes of regular non-compound files (which I find confusing, and which was out of the scope of the original issue #6728); now CFS/CFE files appear as distinct files.
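To make the difference concrete, here is a toy comparison under stated assumptions: one compound segment stored as _0.cfs and _0.cfe, whose .cfs is assumed to bundle inner files _0.fdt and _0.tim, plus one non-compound file _1.fdt. All names, sizes and helper methods are hypothetical, and the "before" method is only a simplified model of the previous behaviour (it assumes both .cfs and .cfe were replaced by the inner entries), not the actual Elasticsearch code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class CompoundFileAccounting {
    // Adds a file's size to the total for its extension (text after the last '.', or the whole name if none).
    static void addTo(Map<String, Long> totals, String fileName, long size) {
        String extension = fileName.substring(fileName.lastIndexOf('.') + 1);
        totals.merge(extension, size, Long::sum);
    }

    // Simplified model of the previous approach: compound files are replaced by their inner entries,
    // whose sizes are merged with those of regular non-compound files.
    static Map<String, Long> extractCompound(Map<String, Long> onDisk, Map<String, Long> insideCfs) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> file : onDisk.entrySet()) {
            if (file.getKey().endsWith(".cfs") || file.getKey().endsWith(".cfe")) {
                continue; // accounted for through the entries packed inside the compound file
            }
            addTo(totals, file.getKey(), file.getValue());
        }
        insideCfs.forEach((name, size) -> addTo(totals, name, size));
        return totals;
    }

    // New approach: every file on disk, including .cfs/.cfe, is reported under its own extension.
    static Map<String, Long> distinctFiles(Map<String, Long> onDisk) {
        Map<String, Long> totals = new TreeMap<>();
        onDisk.forEach((name, size) -> addTo(totals, name, size));
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: one compound segment (_0) and one non-compound file (_1.fdt).
        Map<String, Long> onDisk = new LinkedHashMap<>();
        onDisk.put("_0.cfs", 2260L);
        onDisk.put("_0.cfe", 405L);
        onDisk.put("_1.fdt", 800L);
        Map<String, Long> insideCfs = Map.of("_0.fdt", 1200L, "_0.tim", 1000L);

        System.out.println("before: " + extractCompound(onDisk, insideCfs));
        System.out.println("after:  " + distinctFiles(onDisk));
    }
}
```

In the "before" model the fdt total mixes compound and non-compound data, which is the confusion described above; in the "after" model the reported totals map one-to-one onto files that actually exist on disk.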

I think this information gives a better view of the segment files and is useful in many cases, especially with searchable snapshots, whose segment stats can now be introspected thanks to the include_unloaded_segments parameter.

@tlrx tlrx added >enhancement :Data Management/Stats Statistics tracking and retrieval APIs v8.0.0 v7.13.0 labels Apr 13, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Apr 13, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@tlrx tlrx requested review from jpountz and ywelsch April 13, 2021 17:19
Contributor

@ywelsch ywelsch left a comment

I think this information gives a better view of the segment files and is useful in many cases, especially with searchable snapshots, whose segment stats can now be introspected thanks to the include_unloaded_segments parameter.

Agreed. LGTM.

Contributor

@jpountz jpountz left a comment

+1 to making index stats report on files. I think some users found it useful to extract CFS files because it helped them reason about what takes disk space, which #68508 will hopefully address.

@tlrx tlrx merged commit fe4c3e7 into elastic:master Apr 15, 2021
@tlrx tlrx deleted the enhanced-segment-files-stats branch April 15, 2021 08:15
@tlrx
Member Author

tlrx commented Apr 15, 2021

Thanks Yannick and Adrien!

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Apr 15, 2021
… APIs (elastic#71643)

@jpountz
Contributor

jpountz commented Apr 15, 2021

Is it breaking enough that it should only go in 8.0?

@tlrx
Member Author

tlrx commented Apr 15, 2021

Is it breaking enough that it should only go in 8.0?

I think it can go in 7.13.0. The REST response now contains more information, but the previous fields are unchanged; the set of files was already extended in #71416, but existing extensions and descriptions remain the same (there are just more extensions now); and CFS/CFE sizes are computed differently, but the memory information is still the same.

@tlrx
Member Author

tlrx commented Apr 15, 2021

@ywelsch do you have an opinion?

@ywelsch
Contributor

ywelsch commented Apr 15, 2021

I don't think this is breaking, for the reasons you've outlined, so 7.13 is OK.

@tlrx
Member Author

tlrx commented Apr 15, 2021

Thanks Yannick.

@jpountz unless you disagree I'll move forward with the backport (#71725)

@jpountz
Contributor

jpountz commented Apr 15, 2021

@tlrx I trust your judgement, I only wanted to make sure this question had been considered given the change to how CFS files are treated.

tlrx added a commit that referenced this pull request Apr 15, 2021
… Stats APIs (#71725)

Backport of #71643

Labels

:Data Management/Stats Statistics tracking and retrieval APIs >enhancement Team:Data Management Meta label for data/management team v7.13.0 v8.0.0-alpha1
