{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":35964690,"defaultBranch":"main","name":"mongo-spark","ownerLogin":"mongodb","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2015-05-20T17:59:42.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/45120?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1724144662.0","currentOid":""},"activityList":{"items":[{"before":"acd200ed24c9a814414df3e49ac1b77b1ee61314","after":"046251ad785f84085d951fe7636c9af10a15dfe1","ref":"refs/heads/main","pushedAt":"2024-08-20T09:04:28.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Version: bump 10.5.0-SNAPSHOT","shortMessageHtmlLink":"Version: bump 10.5.0-SNAPSHOT"}},{"before":"e2212ae8b156d10a70d62b578c0377b33c738cd0","after":"acd200ed24c9a814414df3e49ac1b77b1ee61314","ref":"refs/heads/main","pushedAt":"2024-08-20T09:04:16.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Version: bump 10.4.0","shortMessageHtmlLink":"Version: bump 10.4.0"}},{"before":"c4043ae4d56c4f671d38778a776b2a09af116054","after":"e2212ae8b156d10a70d62b578c0377b33c738cd0","ref":"refs/heads/main","pushedAt":"2024-08-20T08:54:19.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Spotless and gitignore updates","shortMessageHtmlLink":"Spotless and gitignore 
updates"}},{"before":"9105cf403c4bf16104a1727b0fc31c5621b48ac7","after":"c4043ae4d56c4f671d38778a776b2a09af116054","ref":"refs/heads/main","pushedAt":"2024-07-15T10:58:09.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Added parse mode support when reading data from MongoDB. (#119)\n\nAdds the `mode` configuration allowing for different parsing strategies when handling documents\r\nthat don't match the expected schema during reads.\r\n\r\nThe options are:\r\n\r\n - `FAILFAST` (default) throw an exception when parsing a document that doesn't match the schema.\r\n - `PERMISSIVE` Sets any invalid fields to `null`.\r\n Combine with the `columnNameOfCorruptRecord` configuration if you want to store any invalid documents\r\n as an extended json string.\r\n - `DROPMALFORMED` ignores the whole document.\r\n\r\nAdds the `columnNameOfCorruptRecord` configuration whic extends the `PERMISSIVE` mode. When configured it\r\nsaves the whole invalid document as extended json in that column, as long as its defined in the Schema. Inferred\r\nschemas will add the `columnNameOfCorruptRecord` column if set and the `mode` is `PERMISSIVE`.\r\n\r\nNote: Names derive from existing spark json configurations, from where this feature takes inspiration.\r\n\r\nSPARK-327\r\n\r\nCo-authored-by: Viacheslav Babanin ","shortMessageHtmlLink":"Added parse mode support when reading data from MongoDB. 
(#119)"}},{"before":"a219bced9f472d724f8d62c69968b4a36cc406fc","after":"9105cf403c4bf16104a1727b0fc31c5621b48ac7","ref":"refs/heads/main","pushedAt":"2024-07-11T09:59:31.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Auto Bucket Partitioner (#120)\n\n* Auto Bucket Partitioner\r\n\r\nA new `$sample` based partitioner that provides support for all collection types. Supports partitioning across single or multiple fields, including nested fields.\r\n\r\nThe logic for the partitioner is as follows:\r\n\r\n- Calculate the number of documents per partition. Runs a `$collStats` aggregation to get the average document size.\r\n- Determines the total count of documents. Uses the `$collStats` count or by running a `countDocuments` query if the user supplies their own `aggregation.pipeline` configuration.\r\n- Determines the number of partitions. Calculated as: `count / number of documents per partition`\r\n- Determines the number of documents to $sample. 
Calculated as: `samples per partition * number of partitions`.\r\n- Creates the aggregation pipeline to generate the partitions.\r\n ```\r\n [{$match: },\r\n {$sample: },\r\n {$addFields: {: {<'i': '$fieldList[i]' ...>}} // Only added iff fieldList.size() > 1\r\n {$bucketAuto: {\r\n groupBy: ,\r\n buckets: \r\n }\r\n }\r\n ]\r\n ```\r\n\r\nConfigurations:\r\n\r\n- `fieldList`: The field list to be used for partitioning.\r\n Either a single field name or a list of comma separated fields.\r\n Defaults to: \"_id\".\r\n- `chunkSize`: The average size (MB) for each partition.\r\n Note: Uses the average document size to determine the number of documents per partition so\r\n partitions may not be even.\r\n Defaults to: 64.\r\n- `samplesPerPartition`: The number of samples to take per partition.\r\n Defaults to: 10.\r\n- `partitionKeyProjectionField`: The field name to use for a projected field that contains all the\r\n fields used to partition the collection.\r\n Defaults to: \"__idx\".\r\n Recommended to only change if there already is a \"__idx\" field in the document.\r\n\r\nPartitions are calculated as logical ranges. 
When using sharded clusters these will map closely to ranged chunks.\r\nWhen using with hashed shard keys these logical ranges require broadcast operations.\r\n\r\nSimilar to the SamplePartitioner however uses the $bucketAuto aggregation stage to generate the partition bounds.\r\n\r\nSPARK-356\r\n\r\nCo-authored-by: Valentin Kovalenko ","shortMessageHtmlLink":"Auto Bucket Partitioner (#120)"}},{"before":"7a49664ab7aa9dd98c191c387d4b9fecede18841","after":"a219bced9f472d724f8d62c69968b4a36cc406fc","ref":"refs/heads/main","pushedAt":"2024-07-09T08:47:58.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Ensure that all escaped fields are unescaped correctly (#121)\n\n\r\nSPARK-432","shortMessageHtmlLink":"Ensure that all escaped fields are unescaped correctly (#121)"}},{"before":"a4f590fc043158ec633969b00f288453dd5ba464","after":"7a49664ab7aa9dd98c191c387d4b9fecede18841","ref":"refs/heads/main","pushedAt":"2024-07-01T10:10:10.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Updated MongoDB Java driver version to latest 5.1.x (#122)\n\nSPARK-431","shortMessageHtmlLink":"Updated MongoDB Java driver version to latest 5.1.x (#122)"}},{"before":"de9753b4dbb57ebd94e54fb3fde29120ef2a48ee","after":"a4f590fc043158ec633969b00f288453dd5ba464","ref":"refs/heads/main","pushedAt":"2024-07-01T10:09:52.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Update README.md\n\nFix broken Maven search link\r\n\r\nSPARK-427","shortMessageHtmlLink":"Update 
README.md"}},{"before":"d2d494abfba9d8a82239acc6f41e93349c18f829","after":"de9753b4dbb57ebd94e54fb3fde29120ef2a48ee","ref":"refs/heads/main","pushedAt":"2024-06-25T08:59:57.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Added Schema hints for use when inferring schemas. (#118)\n\n* Added Schema hints for use when inferring schemas.\r\n\r\nAdded a new configuration: `schemaHints`.\r\nUsers can now supply schema to enforce the schema information about known field types when inferring schema.\r\n\r\nSupports the following Spark formats:\r\n - DDL: `value STRING,count INT`\r\n - SQL DDL: `STRUCT`\r\n - JSON:\r\n ```{\"type\":\"struct\",\"fields\":[\r\n {\"name\":\"value\",\"type\":\"string\",\"nullable\":true},\r\n {\"name\":\"count\",\"type\":\"integer\",\"nullable\":true}]}```\r\n\r\nTo create DDL or Json schema strings simply use the Spark shell:\r\n\r\n```\r\nimport org.apache.spark.sql.types._\r\nval mySchema = StructType(Seq(StructField(\"value\", StringType), StructField(\"count\", IntegerType)))\r\n\r\nmySchema.toDDL\r\nmySchema.sql\r\nmySchema.simpleString\r\nmySchema.json\r\n```\r\n\r\nOr in PySpark:\r\n\r\n```\r\nfrom pyspark.sql.types import StructType, StructField, StringType, IntegerType\r\nmySchema = StructType([ StructField('value', StringType(), True), StructField('count', IntegerType(), True)])\r\n\r\nmySchema.simpleString()\r\nmySchema.json()\r\n```\r\n\r\nSPARK-365\r\n\r\n---------\r\n\r\nCo-authored-by: Viacheslav Babanin ","shortMessageHtmlLink":"Added Schema hints for use when inferring schemas. 
(#118)"}},{"before":"66fa1ba205497098e91aa59d1114dd4641013c66","after":"d2d494abfba9d8a82239acc6f41e93349c18f829","ref":"refs/heads/main","pushedAt":"2024-06-12T10:04:34.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Remove duplicate test","shortMessageHtmlLink":"Remove duplicate test"}},{"before":"2bd64f15da89e521077d2e76aa9d02073975f103","after":null,"ref":"refs/tags/r10.3.0","pushedAt":"2024-05-01T15:54:31.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"}},{"before":"2bd64f15da89e521077d2e76aa9d02073975f103","after":"66fa1ba205497098e91aa59d1114dd4641013c66","ref":"refs/heads/main","pushedAt":"2024-05-01T13:50:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Version: bump 10.4.0-SNAPSHOT","shortMessageHtmlLink":"Version: bump 10.4.0-SNAPSHOT"}},{"before":"6e60a0e0721952a3c22c0e32d3f24225454ce32e","after":"d8b296773c8f59322c44cc1bbc1fbdc0d5d86657","ref":"refs/heads/main","pushedAt":"2024-05-01T13:40:54.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"gitignore update","shortMessageHtmlLink":"gitignore update"}},{"before":"382f490988a76d83e24b86b2a61ed2c97dcb0f71","after":"6e60a0e0721952a3c22c0e32d3f24225454ce32e","ref":"refs/heads/main","pushedAt":"2024-04-30T15:14:14.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Ensure 
correct support for data types (#115)\n\nAdd round trip tests for data types\r\n\r\nSPARK-386\r\n\r\n---------\r\n\r\nCo-authored-by: Valentin Kovalenko ","shortMessageHtmlLink":"Ensure correct support for data types (#115)"}},{"before":"f3990a850cef00b2cb0115dcf47c4a3bb54944e7","after":"cc86377b22ab8b31c271e53d2de47d036dcc78cc","ref":"refs/heads/10.2.x","pushedAt":"2024-04-19T08:23:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Build: Version 10.2.4-SNAPSHOT","shortMessageHtmlLink":"Build: Version 10.2.4-SNAPSHOT"}},{"before":"17b8d877a29137acd93081963806ea504af318a4","after":"f3990a850cef00b2cb0115dcf47c4a3bb54944e7","ref":"refs/heads/10.2.x","pushedAt":"2024-04-18T12:30:23.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Build: Version 10.2.3","shortMessageHtmlLink":"Build: Version 10.2.3"}},{"before":"eb7520d4afe7456f39cfb8cfff8e3f09f582f3a9","after":"17b8d877a29137acd93081963806ea504af318a4","ref":"refs/heads/10.2.x","pushedAt":"2024-04-18T10:27:54.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Build: version 10.2.3-SNAPSHOT","shortMessageHtmlLink":"Build: version 10.2.3-SNAPSHOT"}},{"before":"8f76c4ec4f1ab70409de745660457779d5385ac0","after":"eb7520d4afe7456f39cfb8cfff8e3f09f582f3a9","ref":"refs/heads/10.2.x","pushedAt":"2024-04-18T09:53:18.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Fix convertJson configuration parsing 
(#117)\n\n\r\nSPARK-425","shortMessageHtmlLink":"Fix convertJson configuration parsing (#117)"}},{"before":"d8fad9ac4c7af59c3d87875763b91d890bfafc7c","after":"382f490988a76d83e24b86b2a61ed2c97dcb0f71","ref":"refs/heads/main","pushedAt":"2024-04-18T09:50:38.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Fix convertJson configuration parsing (#117)\n\n\r\nSPARK-425","shortMessageHtmlLink":"Fix convertJson configuration parsing (#117)"}},{"before":"6f93bc29dbc1ed4fbeff70d95395c541808ffb41","after":null,"ref":"refs/heads/SPARK-386","pushedAt":"2024-04-18T09:44:44.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"}},{"before":null,"after":"6f93bc29dbc1ed4fbeff70d95395c541808ffb41","ref":"refs/heads/SPARK-386","pushedAt":"2024-04-18T09:44:32.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Fix timezone normalization in test","shortMessageHtmlLink":"Fix timezone normalization in test"}},{"before":"07c4dc488e02ffea02dabb3930f8af5cc68c1da9","after":"d8fad9ac4c7af59c3d87875763b91d890bfafc7c","ref":"refs/heads/main","pushedAt":"2024-04-17T16:44:31.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Fix handling of empty strings when parsing (#116)\n\n\r\nSPARK-421","shortMessageHtmlLink":"Fix handling of empty strings when parsing 
(#116)"}},{"before":"9b513639ab6785664e2813daa68b87587f731b24","after":"07c4dc488e02ffea02dabb3930f8af5cc68c1da9","ref":"refs/heads/main","pushedAt":"2024-04-16T08:33:02.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Provide Spark 3.1 - 3.5 support (#114)\n\nEvergreen: Updated to include Mongo 7.0 and run on ubuntu 20.04\r\nEvergreen: Updated Spark versions to latest patch releases\r\nAdded Spark 3.1 - 3.5 Support\r\n\r\nSPARK-413","shortMessageHtmlLink":"Provide Spark 3.1 - 3.5 support (#114)"}},{"before":"6a8ae2bd7156b86fa76a022f81891bdef6c2fabc","after":"9b513639ab6785664e2813daa68b87587f731b24","ref":"refs/heads/main","pushedAt":"2024-04-12T11:21:27.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Added explict test for RowToBsonDocumentConverter handling of nulls","shortMessageHtmlLink":"Added explict test for RowToBsonDocumentConverter handling of nulls"}},{"before":null,"after":"8f76c4ec4f1ab70409de745660457779d5385ac0","ref":"refs/heads/10.2.x","pushedAt":"2024-04-12T09:22:38.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Build: Version 10.2.2","shortMessageHtmlLink":"Build: Version 10.2.2"}},{"before":"8770ce8e9d548ec5cf638b295f79785b6cd138eb","after":null,"ref":"refs/heads/10.2.x","pushedAt":"2024-04-12T09:20:05.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"rozza","name":"Ross 
Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"}},{"before":null,"after":"8770ce8e9d548ec5cf638b295f79785b6cd138eb","ref":"refs/heads/10.2.x","pushedAt":"2024-04-12T09:19:12.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"temp","shortMessageHtmlLink":"temp"}},{"before":"d60d04bc1dc85ba401f4dbacd154aa88ad536a47","after":"6a8ae2bd7156b86fa76a022f81891bdef6c2fabc","ref":"refs/heads/main","pushedAt":"2024-04-03T13:00:24.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Fix checkstyle import * in SimpleMongoConfigTest","shortMessageHtmlLink":"Fix checkstyle import * in SimpleMongoConfigTest"}},{"before":"d2725457c3144081784142acc4c32033c661b826","after":"d60d04bc1dc85ba401f4dbacd154aa88ad536a47","ref":"refs/heads/main","pushedAt":"2024-04-03T12:56:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Remove test fixed in pr #112","shortMessageHtmlLink":"Remove test fixed in pr #112"}},{"before":"993834f02eae58b900ac6dc01d70be8ffe98891a","after":"d2725457c3144081784142acc4c32033c661b826","ref":"refs/heads/main","pushedAt":"2024-04-03T12:54:35.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"rozza","name":"Ross Lawley","path":"/rozza","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/420?s=80&v=4"},"commit":{"message":"Added license header to SimpleMongoConfigTest","shortMessageHtmlLink":"Added license header to 
SimpleMongoConfigTest"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"startCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOC0yMFQwOTowNDoyOC4wMDAwMDBazwAAAASe4STM","endCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wNC0wM1QxMjo1NDozNS4wMDAwMDBazwAAAAQmqBAz"}},"title":"Activity · mongodb/mongo-spark"}