Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Garbage Collector: Eliminate double slash in URL #3525

Merged
merged 3 commits into from
Jun 21, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -356,14 +356,17 @@ object GarbageCollector {
println("Expired addresses:")
expiredAddresses.show()

val storageNamespace = new ApiClient(apiURL, accessKey, secretKey).getStorageNamespace(repo)
var storageNamespace = new ApiClient(apiURL, accessKey, secretKey).getStorageNamespace(repo)
if (!storageNamespace.endsWith("/")) {
storageNamespace += "/"
}
arielshaqed marked this conversation as resolved.
Show resolved Hide resolved

val removed =
remove(storageNamespace, gcAddressesLocation, expiredAddresses, runID, region, hcValues)

val commitsDF = getCommitsDF(runID, gcCommitsLocation, spark)
val reportLogsDst = s"${storageNamespace}/_lakefs/logs/gc/summary/"
val reportExpiredDst = s"${storageNamespace}/_lakefs/logs/gc/expired_addresses/"
val reportLogsDst = concatToGCLogsPrefix(storageNamespace, "summary")
val reportExpiredDst = concatToGCLogsPrefix(storageNamespace, "expired_addresses")

val time = DateTimeFormatter.ISO_INSTANT.format(java.time.Clock.systemUTC.instant())
writeParquetReport(commitsDF, reportLogsDst, time, "commits.parquet")
Expand All @@ -375,11 +378,16 @@ object GarbageCollector {
.write
.partitionBy("run_id")
.mode(SaveMode.Overwrite)
.parquet(s"${storageNamespace}/_lakefs/logs/gc/deleted_objects/${time}/deleted.parquet")
.parquet(concatToGCLogsPrefix(storageNamespace, s"deleted_objects/$time/deleted.parquet"))

spark.close()
}

private def concatToGCLogsPrefix(storageNameSpace: String, key: String): String = {
val strippedKey = key.stripPrefix("/")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Why do we need this? No call has a leading slash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method is not aware of the calls, it sanitizes the input...

s"${storageNameSpace}_lakefs/logs/gc/$strippedKey"
}

private def repartitionBySize(df: DataFrame, maxSize: Int, column: String): DataFrame = {
val nRows = df.count()
val nPartitions = math.max(1, math.ceil(nRows / maxSize)).toInt
Expand Down
6 changes: 3 additions & 3 deletions pkg/graveler/retention/garbage_collection_manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,9 @@ import (
)

const (
configFileSuffixTemplate = "/%s/retention/gc/rules/config.json"
addressesFilePrefixTemplate = "/%s/retention/gc/addresses/"
commitsFileSuffixTemplate = "/%s/retention/gc/commits/run_id=%s/commits.csv"
configFileSuffixTemplate = "%s/retention/gc/rules/config.json"
addressesFilePrefixTemplate = "%s/retention/gc/addresses/"
commitsFileSuffixTemplate = "%s/retention/gc/commits/run_id=%s/commits.csv"
arielshaqed marked this conversation as resolved.
Show resolved Hide resolved
Comment on lines +22 to +24
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part fixes another bug we found in the GC-Azure integration! double thanks @Jonathan-Rosenberg :)
Up until this fix, while trying to get gc rules we got the following error:

: invalid namespace
500 Internal Server Error

But removing the superfluous "/" fixed it for the following reason -
Before the fix, "/_lakefs/settings/retention/gc/rules/config.json" was parsed without returning an error to

parsedKey, err := url.ParseRequestURI(key)
, leading to an incorrect URL resolution trying to resolve it as a fully qualified URI (that includes a namespace). After the fix, "_lakefs/settings/retention/gc/rules/config.json" is resolved as a non-fully qualified URI.

)

type GarbageCollectionManager struct {
Expand Down