/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
Pig Change Log
Release 0.12.1 (unreleased changes)
IMPROVEMENTS
PIG-3529: Upgrade HBase dependency from 0.95-SNAPSHOT to 0.96 (jarcec via daijy)
PIG-3552: UriUtil used by reducer estimator should support viewfs (amatsukawa via aniket486)
PIG-3549: Print hadoop jobids for failed, killed job (aniket486)
PIG-3047: Check the size of a relation before adding it to distributed cache in Replicated join (aniket486)
PIG-3480: TFile-based tmpfile compression crashes in some cases (dvryaboy via aniket486)
BUG FIXES
PIG-3827: Custom partitioner is not picked up with secondary sort optimization (daijy)
PIG-3826: Outer join with PushDownForEachFlatten generates wrong result (daijy)
PIG-3820: TestAvroStorage fail on some OS (daijy)
PIG-3818: PIG-2499 is accidentally reverted (daijy)
PIG-3516: pig does not bring in joda-time as dependency in its pig-template.xml (daijy)
PIG-3753: LOGenerate generates null schema (daijy)
PIG-3782: PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing due to incorrect UID assignment (knoguchi via daijy)
PIG-3779: Assert constructs ConstantExpression with null when no comment is given (thedatachef via cheolsoo)
PIG-3777: Pig 12.0 Documentation (karinahauser via daijy)
PIG-3774: Piggybank Over UDF get wrong result (daijy)
PIG-3657: New partition filter extractor fails with NPE (cheolsoo)
PIG-3347: Store invocation brings side effect (daijy)
PIG-3670: Fix assert in Pig script (daijy)
PIG-3741: Utils.setTmpFileCompressionOnConf can cause side effect for SequenceFileInterStorage (aniket486)
PIG-3641: Split "otherwise" producing incorrect output when combined with ColumnPruning (knoguchi)
PIG-3677: ConfigurationUtil.getLocalFSProperties can return an inconsistent property set (rohini)
PIG-3621: Python Avro library can't read Avros made with builtin AvroStorage (rusell.jurney via cheolsoo)
PIG-3592: Should not try to create success file for non-fs schemes like hbase (rohini)
PIG-3572: Fix all unit tests when building Pig with Hadoop 2.X on Windows (ssvinarchukhorton via daijy)
PIG-2629: Wrong Usage of Scalar which is null causes high namenode operation (rohini)
PIG-3593: Import jython standard module fail on cluster (daijy)
PIG-3576: NPE due to PIG-3549 when job never gets submitted (lbendig via cheolsoo)
PIG-3567: LogicalPlanPrinter throws OOM for large scripts (aniket486)
PIG-3579: pig.script's deserialized version does not maintain line numbers (jgzhang via aniket486)
PIG-3570: Rollback PIG-3060 (daijy)
PIG-3530: Some e2e tests are broken due to PIG-3480 (daijy)
PIG-3492: ColumnPrune dropping used column due to LogicalRelationalOperator.fixDuplicateUids changes not propagating (knoguchi via daijy)
PIG-3325: Adding a tuple to a bag is slow (dvryaboy via aniket486)
PIG-3512: Reducer estimator is broken by PIG-3497 (daijy)
PIG-3510: New filter extractor fails with more than one filter statement (aniket486 via cheolsoo)
Release 0.12.0
INCOMPATIBLE CHANGES
PIG-3082: outputSchema of a UDF allows two usages when describing a Tuple schema (jcoveney)
PIG-3191: [piggybank] MultiStorage output filenames are not sortable (Danny Antonelli via jcoveney)
PIG-3174: Remove rpm and deb artifacts from build.xml (gates)
IMPROVEMENTS
PIG-3503: More document for Pig 0.12 new features (daijy)
PIG-3445: Make Parquet format available out of the box in Pig (lbendig via aniket486)
PIG-3483: Document ASSERT keyword (aniket486 via daijy)
PIG-3470: Print configuration variables in grunt (lbendig via daijy)
PIG-3493: Add max/min for datetime (tyro89 via daijy)
PIG-3479: Fix BigInt, BigDec, Date serialization. Improve perf of PigNullableWritable deserialization (dvryaboy)
PIG-3461: Rewrite PartitionFilterOptimizer to make it work for all the cases (aniket486)
PIG-2417: Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation. (jeremykarn via daijy)
PIG-3199: Provide a method to retrieve the name of loader/storer in PigServer (prkommireddi via daijy)
PIG-3367: Add assert keyword (operator) in pig (aniket486)
PIG-3235: Avoid extra byte array copies in streaming (rohini)
PIG-3065: pig output format/committer should support recovery for hadoop 0.23 (daijy)
PIG-3390: Make pig working with HBase 0.95 (jarcec via daijy)
PIG-3431: Return more information for parsing related exceptions. (jeremykarn via daijy)
PIG-3430: Add xml format for explaining MapReduce Plan. (jeremykarn via daijy)
PIG-3048: Add mapreduce workflow information to job configuration (billie.rinaldi via daijy)
PIG-3436: Make pigmix run with Hadoop2 (rohini)
PIG-3424: Package import list should consider class name as is first even if -Dudf.import.list is passed (rohini)
PIG-3204: Change script parsing to parse entire script instead of line by line (rohini)
PIG-3359: Register Statements and Param Substitution in Macros (jpacker via cheolsoo)
PIG-3182: Pig currently lacks functions to trim whitespace on only one side (sarutak via cheolsoo)
PIG-3163: Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix (sriramkrishnan via cheolsoo)
PIG-3015: Rewrite of AvroStorage (jadler via cheolsoo)
PIG-3361: Improve Hadoop version detection logic for Pig unit test (daijy)
PIG-3280: Document IN operator and CASE expression (cheolsoo)
PIG-3342: Allow conditions in case statement (cheolsoo)
PIG-3327: Pig hits OOM when fetching task reports (rohini)
PIG-3336: Change IN operator to use or-expressions instead of EvalFunc (cheolsoo)
PIG-3339: Move pattern compilation in ToDate as a static variable (rohini)
PIG-3332: Upgrade Avro dependency to 1.7.4 (nielsbasjes via cheolsoo)
PIG-3307: Refactor physical operators to remove methods parameters that are always null (julien)
PIG-3317: disable optimizations via pig properties (traviscrawford via billgraham)
PIG-3321: AVRO: Support user specified schema on load (harveyc via rohini)
PIG-2959: Add a pig.cmd for Pig to run under Windows (daijy)
PIG-3311: add pig-withouthadoop-h2 to mvn-jar (julien)
PIG-2873: Converting bin/pig shell script to python (vikram.dixit via daijy)
PIG-3308: Storing data in hive columnar rc format (maczech via daijy)
PIG-3303: add hadoop h2 artifact to publications in ivy.xml (julien)
PIG-3169: Remove intermediate data after a job finishes (mwagner via cheolsoo)
PIG-3173: Partition filter push down does not happen when partition keys condition include a AND and OR construct (rohini)
PIG-2786: enhance Pig launcher script wrt. HBase/HCat integration (ndimiduk via daijy)
PIG-3198: Let users use any function from PigType -> PigType as if it were builtin (jcoveney)
PIG-3268: Case statement support (cheolsoo)
PIG-3269: In operator support (cheolsoo)
PIG-200: Pig Performance Benchmarks (daijy)
PIG-3261: User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended (qwertymaniac via daijy)
PIG-3141: Giving CSVExcelStorage an option to handle header rows (jpacker via cheolsoo)
PIG-3217: Add support for DateTime type in Groovy UDFs (herberts via daijy)
PIG-3218: Add support for biginteger/bigdecimal type in Groovy UDFs (herberts via daijy)
PIG-3248: Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha (daijy)
PIG-3235: Add log4j.properties for unit tests (cheolsoo)
PIG-3236: parametrize snapshot and staging repo id (gkesavan via daijy)
PIG-3244: Make PIG_HOME configurable (robert.schooley@gmail.com via daijy)
PIG-3233: Deploy a Piggybank Jar (njw45 via cheolsoo)
PIG-3245: Documentation about HBaseStorage (Daisuke Kobayashi via cheolsoo)
PIG-3211: Allow default Load/Store funcs to be configurable (prkommireddi via cheolsoo)
PIG-3136: Introduce a syntax making declared aliases optional (jcoveney via cheolsoo)
PIG-3142: [piggybank] Fixed-width load and store functions for the Piggybank (jpacker via cheolsoo)
PIG-3162: PigTest.assertOutput doesn't allow non-default delimiter (dreambird via cheolsoo)
PIG-3002: Pig client should handle CountersExceededException (jarcec via billgraham)
PIG-3189: Remove ivy/pig.pom and improve build mvn targets (billgraham)
PIG-3192: Better call to action to download Pig in docs (rjurney via jcoveney)
PIG-3167: Job stats are printed incorrectly for map-only jobs (Mark Wagner via jcoveney)
PIG-3131: Document PluckTuple UDF (rjurney via jcoveney)
PIG-3098: Add another test for the self join case (jcoveney)
PIG-3129: Document syntax to refer to previous relation (rjurney via jcoveney)
PIG-2553: Pig shouldn't allow attempts to write multiple relations into same directory (prkommireddi via cheolsoo)
PIG-3179: Task Information Header only prints out the first split for each task (knoguchi via rohini)
PIG-3108: HBaseStorage returns empty maps when mixing wildcard with other columns (christoph.bauer via billgraham)
PIG-3178: Print a stacktrace when ExecutableManager hits an OOM (knoguchi via rohini)
PIG-3160: GFCross uses unnecessary loop (sandyr via cheolsoo)
PIG-3138: Decouple PigServer.executeBatch() from compilation of batch (pkommireddi via cheolsoo)
PIG-2878: Pig current releases lack a UDF equalIgnoreCase. This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive. (shami via gates)
PIG-2994: Grunt shortcuts (prasanth_j via cheolsoo)
PIG-3140: Document PigProgressNotificationListener configs (billgraham)
PIG-3139: Document reducer estimation (billgraham)
PIG-2764: Add a biginteger and bigdecimal type to pig (jcoveney)
PIG-3073: POUserFunc creating log spam for large scripts (jcoveney)
PIG-3124: Push FLATTENs After FILTERs If Possible (nwhite via daijy)
PIG-3086: Allow A Prefix To Be Added To URIs In PigUnit Tests (nwhite via gates)
PIG-3091: Make schema, header and stats file configurable in JsonMetadata (pkommireddi via jcoveney)
PIG-3078: Make a UDF that, given a string, returns just the columns prefixed by that string (jcoveney)
PIG-3090: Introduce a syntax to be able to easily refer to the previously defined relation (jcoveney)
PIG-3057: Make PigStorage.readField() protected (pablomar and billgraham via billgraham)
PIG-2788: improved string interpolation of variables (jcoveney)
PIG-2362: Rework Ant build.xml to use macrodef instead of antcall (azaroth via cheolsoo)
PIG-2857: Add a -tagPath option to PigStorage (prkommireddi via cheolsoo)
PIG-2341: Need better documentation on Pig/HBase integration (jthakrar and billgraham via billgraham)
PIG-3075: Allow AvroStorage STORE Operations To Use Schema Specified By URI (nwhite via cheolsoo)
PIG-3062: Change HBaseStorage to permit overriding pushProjection (billgraham)
PIG-3016: Modernize more tests (jcoveney via cheolsoo)
PIG-2582: Store size in bytes (not mbytes) in ResourceStatistics (prkommireddi via billgraham)
PIG-3006: Modernize a chunk of the tests (jcoveney via cheolsoo)
PIG-2997: Provide a convenience constructor on PigServer that accepts Configuration (prkommireddi via rohini)
PIG-2933: HBaseStorage is using setScannerCaching which is deprecated (prkommireddi via rohini)
PIG-2881: Add SUBTRACT eval function (jocosti via cheolsoo)
PIG-3004: Improve exceptions messages when a RuntimeException is raised in Physical Operators (julien)
PIG-2990: the -secretDebugCmd shouldn't be a secret and should just be...a command (jcoveney)
PIG-2941: Ivy resolvers in pig don't have consistent chaining and don't have a kitchen sink option for novices (jgordon via azaroth)
PIG-2778: Add 'matches' operator to predicate pushdown (cheolsoo via jcoveney)
PIG-2966: Test failures on CentOS 6 because MALLOC_ARENA_MAX is not set (cheolsoo via sms)
PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon via gates)
PIG-2910: Add function to read schema from output of Schema.toString() (initialcontext via thejas)
OPTIMIZATIONS
PIG-3395: Large filter expression makes Pig hang (cheolsoo)
PIG-3123: Simplify Logical Plans By Removing Unnecessary Identity Projections (njw45 via cheolsoo)
PIG-3013: BinInterSedes improve chararray sort performance (rohini)
BUG FIXES
PIG-3504: Fix e2e Describe_cmdline_12 (cheolsoo via daijy)
PIG-3128: Document the BigInteger and BigDecimal data type (daijy via cheolsoo)
PIG-3497: JobControlCompiler should only do reducer estimation when the job has a reduce phase (amatsukawa via aniket486)
PIG-3495: Streaming udf e2e tests failures on Windows (daijy)
PIG-3494: Several fixes for e2e tests (daijy)
PIG-3292: Logical plan invalid state: duplicate uid in schema during self-join to get cross product (cheolsoo via daijy)
PIG-3491: Fix e2e failure Jython_Diagnostics_4 (daijy)
PIG-3114: Duplicated macro name error when using pigunit (daijy)
PIG-3370: Add New Reserved Keywords To The Pig Docs (cheolsoo)
PIG-3487: Fix syntax errors in nightly.conf (arpitgupta via daijy)
PIG-3458: ScalarExpression lost with multiquery optimization (knoguchi)
PIG-3360: Some intermittent negative e2e tests fail on hadoop 2 (daijy)
PIG-3468: PIG-3123 breaks e2e test Jython_Diagnostics_2 (daijy)
PIG-3466: Race Conditions in InternalDistinctBag during proactive spill (cheolsoo)
PIG-3454: Update JsonLoader/JsonStorage (tyro89 via daijy)
PIG-3333: Fix remaining Windows core unit test failures (daijy)
PIG-3426: Add support for removing s3 files (jeremykarn via daijy)
PIG-3349: Document ToString(Datetime, String) UDF (cheolsoo)
PIG-3374: CASE and IN fail when expression includes dereferencing operator (cheolsoo)
PIG-2606: union/ join operations are not accepting same alias as multiple inputs (hsubramaniyan via daijy)
PIG-3379: Alias reuse in nested foreach causes PIG script to fail (xuefuz via daijy)
PIG-3432: typo in log message in SchemaTupleFrontend (epishkin via cheolsoo)
PIG-3410: LimitOptimizer is applied before PartitionFilterOptimizer (aniket486)
PIG-3405: Top UDF documentation indicates improper use (aniket486 via cheolsoo)
PIG-3425: Hive jdo api jar referenced in pig script throws error (deepesh via cheolsoo)
PIG-3422: AvroStorage failed to read paths separated by commas (yuanlid via rohini)
PIG-3420: Failed to retrieve map values from data loaded by AvroStorage (yuanlid via rohini)
PIG-3414: QueryParserDriver.parseSchema(String) silently returns a wrong result when a comma is missing in the schema definition (cheolsoo)
PIG-3412: jsonstorage breaks when tuple does not have as many columns as schema (aesilberstein via cheolsoo)
PIG-3243: Documentation error (sarutak via cheolsoo)
PIG-3210: Pig fails to start when it cannot write log to log files (mengsungwu via cheolsoo)
PIG-3392: Document STARTSWITH and ENDSWITH UDFs (sriramkrishnan via cheolsoo)
PIG-3393: STARTSWITH udf doesn't override outputSchema method (sriramkrishnan via cheolsoo)
PIG-3389: "Set job.name" does not work with dump command (cheolsoo)
PIG-3387: Misspelling in test code "TestBuiltin.java" (sarutak via cheolsoo)
PIG-3384: Missing negation in UDF doc sample code (ddamours via cheolsoo)
PIG-3369: unit test TestImplicitSplitOnTuple.testImplicitSplitterOnTuple failed when using hadoopversion=23 (dreambird via cheolsoo)
PIG-3375: CASE does not preserve the order of when branches (cheolsoo)
PIG-3364: Case expression fails with an even number of when branches (cheolsoo)
PIG-3354: UDF example does not handle nulls (patc888 via daijy)
PIG-3355: ColumnMapKeyPrune bug with distinct operator (jeremykarn via aniket486)
PIG-3318: AVRO: 'default value' not honored when merging schemas on load with AvroStorage (viraj via rohini)
PIG-3250: Pig dryrun generates wrong output in .expanded file for 'SPLIT....OTHERWISE...' command (dreambird via cheolsoo)
PIG-3331: Default values not stored in avro file when using specific schemas during store in AvroStorage (viraj via rohini)
PIG-3322: AvroStorage give NPE on reading file with union as top level schema (viraj via rohini)
PIG-2828: Handle nulls in DataType.compare (aniket486)
PIG-3335: TestErrorHandling.tesNegative7 fails on MR2 (xuefuz)
PIG-3316: Pig failed to interpret DateTime values in some special cases (xuefuz)
PIG-2956: Invalid cache specification for some streaming statement (daijy)
PIG-3310: ImplicitSplitInserter does not generate new uids for nested schema fields, leading to miscomputations (cstenac via daijy)
PIG-3334: Fix Windows piggybank unit test failures (daijy)
PIG-3337: Fix remaining Windows e2e tests (daijy)
PIG-3328: DataBags created with an initial list of tuples don't get registered as spillable (mwagner via daijy)
PIG-3313: pig job hang if the job tracker is bounced during execution (yu.chenjie via daijy)
PIG-3297: Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc (nielsbasjes via cheolsoo)
PIG-3069: Native Windows Compatibility for Pig E2E Tests and Harness (anthony.murphy via daijy)
PIG-3291: TestExampleGenerator fails on Windows because of lack of file name escaping (dwann via daijy)
PIG-3026: Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences (dwann via daijy)
PIG-3025: TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification (dwann via daijy)
PIG-2955: Fix bunch of Pig e2e tests on Windows (daijy)
PIG-3302: JSONStorage throws NPE if map has null values (rohini)
PIG-3309: TestJsonLoaderStorage fails with IBM JDK 6/7 (lrangel via daijy)
PIG-3097: HiveColumnarLoader doesn't correctly load partitioned Hive table (maczech via daijy)
PIG-3305: Infinite loop when input path contains empty partition directory (maczech via daijy)
PIG-3286: TestPigContext.testImportList fails in trunk (cheolsoo)
PIG-2970: Nested foreach getting incorrect schema when having unrelated inner query (daijy)
PIG-3304: XMLLoader in piggybank does not work with inline closed tags (aseldawy via daijy)
PIG-3028: testGrunt dev test needs some command filters to run correctly without cygwin (jgordon via gates)
PIG-3290: TestLogicalPlanBuilder.testQuery85 fail in trunk (daijy)
PIG-3027: pigTest unit test needs a newline filter for comparisons of golden multi-line (jgordon via gates)
PIG-2767: Pig creates wrong schema after dereferencing nested tuple fields (daijy)
PIG-3276: change the default value for hcat.bin to hcat instead of /usr/local/hcat/bin/hcat (arpitgupta via daijy)
PIG-3277: fix the path to the benchmarks file in the print statement (arpitgupta via daijy)
PIG-3122: Operators should not implicitly become reserved keywords (jcoveney via cheolsoo)
PIG-3193: Fix "ant docs" warnings (cheolsoo)
PIG-3186: tar/deb/pkg ant targets should depend on piggybank (lbendig via gates)
PIG-3270: Union onschema failing at runtime when merging incompatible types (knoguchi via daijy)
PIG-3271: POSplit ignoring error from input processing giving empty results (knoguchi via daijy)
PIG-2265: Test case TestSecondarySort failure (daijy)
PIG-3060: FLATTEN in nested foreach fails when the input contains an empty bag (daijy)
PIG-3249: Pig startup script prints out a wrong version of hadoop when using fat jar (prkommireddi via daijy)
PIG-3110: pig corrupts chararrays with trailing whitespace when converting them to long (prkommireddi via daijy)
PIG-3253: Misleading comment w.r.t getSplitIndex() method in PigSplit.java (cheolsoo)
PIG-3208: [zebra] TFile should not set io.compression.codec.lzo.buffersize (ekoontz via daijy)
PIG-3172: Partition filter push down does not happen when there is a non partition key map column filter (rohini)
PIG-3205: Passing arguments to python script does not work with -f option (rohini)
PIG-3239: Unable to return multiple values from a macro using SPLIT (dreambird via cheolsoo)
PIG-3077: TestMultiQueryLocal should not write in /tmp (dreambird via cheolsoo)
PIG-3081: Pig progress stays at 0% for the first job in hadoop 23 (rohini)
PIG-3150: e2e Scripting_5 fails in trunk (dreambird via cheolsoo)
PIG-3153: TestScriptUDF.testJavascriptExampleScript fails in trunk (cheolsoo)
PIG-3145: Parameters in core-site.xml and mapred-site.xml are not correctly substituted (cheolsoo)
PIG-3135: HExecutionEngine should look for resources in user passed Properties (prkommireddi via cheolsoo)
PIG-3200: MiniCluster should delete hadoop-site.xml on shutDown (prkommireddi via cheolsoo)
PIG-3158: Errors in the document "Control Structures" (miyakawataku via cheolsoo)
PIG-3161: Update reserved keywords in Pig docs (russell.jurney via cheolsoo)
PIG-3156: TestSchemaTuple fails in trunk (cheolsoo)
PIG-3155: TestTypeCheckingValidatorNewLP.testSortWithInnerPlan3 fails in trunk (cheolsoo)
PIG-3154: TestPackage.testOperator fails in trunk (dreambird via cheolsoo)
PIG-3168: TestMultiQueryBasic.testMultiQueryWithSplitInMapAndMultiMerge fails in trunk (cheolsoo)
PIG-3137: Fix Piggybank test to not use /tmp dir (dreambird via cheolsoo)
PIG-3149: e2e build.xml still refers to jython 2.5.0 jar even though it's replaced by jython standalone 2.5.2 jar (cheolsoo)
PIG-2266: Bug with input file joining optimization in Pig (jadler via cheolsoo)
PIG-2645: PigSplit does not handle the case where SerializationFactory returns null (shami via gates)
PIG-3031: Update Pig to use a newer version of joda-time (zjshen via cheolsoo)
PIG-3071: Update hcatalog jar and path to hbase storage handler jar in pig script (arpitgupta via cheolsoo)
PIG-3029: TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution (jgordon via gates)
PIG-3120: setStoreFuncUDFContextSignature called with null signature (jdler via cheolsoo)
PIG-3115: Distinct Built-in Function Doesn't Handle Null Bags (njw45 via daijy)
PIG-2433: Jython import module not working if module path is in classpath (rohini)
PIG-2769: a simple logic causes very long compiling time on pig 0.10.0 (nwhite via gates)
PIG-2251: PIG leaks Zookeeper connections when using HBaseStorage (jamarkha via cheolsoo)
PIG-3112: Errors and omissions in document "User Defined Functions" (miyakawataku via cheolsoo)
PIG-3050: Fix FindBugs multithreading warnings (cheolsoo)
PIG-3066: Fix TestPigRunner in trunk (cheolsoo)
PIG-3101: Increase io.sort.mb in YARN MiniCluster (cheolsoo)
PIG-3100: If a .pig_schema file is present, can get an index out of bounds error (jcoveney)
PIG-3096: Make PigUnit thread safe (cheolsoo)
PIG-3095: "which" is called many, many times for each Pig STREAM statement (nwhite via cheolsoo)
PIG-3085: Errors and omissions in document "Built In Functions" (miyakawataku via cheolsoo)
PIG-3084: Improve exceptions messages in POPackage (julien)
PIG-3072: Pig job reporting negative progress (knoguchi via rohini)
PIG-3014: CurrentTime() UDF has undesirable characteristics (jcoveney via cheolsoo)
PIG-2924: PigStats should not be assuming all Storage classes to be file-based storage (cheolsoo)
PIG-3046: An empty file name in -Dpig.additional.jars throws an error (prkommireddi via cheolsoo)
PIG-2989: Illustrate for Rank Operator (xalan via gates)
PIG-2885: TestJobSubmission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3 (cheolsoo via sms)
PIG-2928: Fix e2e test failures in trunk: FilterBoolean_23/24 (cheolsoo via dvryaboy)
Release 0.11.2 (Unreleased)
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-3380: Fix e2e test float precision related test failures when run with -Dpig.exec.mapPartAgg=true (anlilin via rohini)
OPTIMIZATIONS
PIG-2769: a simple logic causes very long compiling time on pig 0.10.0 (njw45 via dvryaboy) (prev. applied to 0.12)
BUG FIXES
PIG-3455: Pig 0.11.1 OutOfMemory error (rohini)
PIG-3435: Custom Partitioner not working with MultiQueryOptimizer (knoguchi via daijy)
PIG-3385: DISTINCT no longer uses custom partitioner (knoguchi via daijy)
PIG-2507: Semicolon in parameters for UDF results in parsing error (tnachen via daijy)
PIG-3341: Strict datetime parsing and improve performance of loading datetime values (rohini)
PIG-3329: RANK operator failed when working with SPLIT (xalan via cheolsoo)
PIG-3345: Handle null in DateTime functions (rohini)
PIG-3223: AvroStorage does not handle comma separated input paths (dreambird via rohini)
PIG-3262: Pig contrib 0.11 doesn't compile on certain rpm systems (mgrover via cheolsoo)
PIG-3264: mvn signanddeploy target broken for pigunit, pigsmoke and piggybank (billgraham)
Release 0.11.1
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-3256: Upgrade jython to 2.5.3 (legal concern) (daijy)
PIG-2988: start deploying pigunit maven artifact part of Pig release process (njw45 via rohini)
PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag. Extra option to gc() before spilling large bag. (knoguchi via rohini)
PIG-3216: Groovy UDFs documentation has minor typos (herberts via rohini)
PIG-3202: CUBE operator not documented in user docs (prasanth_j via billgraham)
OPTIMIZATIONS
BUG FIXES
PIG-3267: HCatStorer fail in limit query (daijy)
PIG-3252: AvroStorage gives wrong schema for schemas with named records (mwagner via cheolsoo)
PIG-3132: NPE when illustrating a relation with HCatLoader (daijy)
PIG-3194: Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2 (prkommireddi via dvryaboy)
PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy)
PIG-3144: Erroneous map entry alias resolution leading to "Duplicate schema alias" errors (jcoveney via cheolsoo)
PIG-3212: Race Conditions in POSort and (Internal)SortedBag during Proactive Spill (kadeng via dvryaboy)
PIG-3206: HBaseStorage does not work with Oozie pig action and secure HBase (rohini)
Release 0.11.0
INCOMPATIBLE CHANGES
PIG-3034: Remove Penny code from Pig repository (gates via cheolsoo)
PIG-2931: $ signs in the replacement string make parameter substitution fail (cheolsoo via jcoveney)
PIG-1891: Enable StoreFunc to make intelligent decision based on job success or failure (initialcontext via gates)
IMPROVEMENTS
PIG-3044: Trigger POPartialAgg compaction under GC pressure (dvryaboy)
PIG-2907: Publish pig jars for Hadoop2/23 to maven (rohini)
PIG-2934: HBaseStorage filter optimizations (billgraham)
PIG-2980: documentation for DateTime datatype (zjshen via thejas)
PIG-2982: add unit tests for DateTime type that test setting timezone (zjshen via thejas)
PIG-2937: generated field in nested foreach does not inherit the variable name as the field name (jcoveney)
PIG-3019: Need a target in build.xml for source releases (gates)
PIG-2832: org.apache.pig.pigunit.pig.PigServer does not initialize udf.import.list of PigContext (prkommireddi via rohini)
PIG-2898: Parallel execution of e2e tests (iveselovsky via rohini)
PIG-2913: org.apache.pig.test.TestPigServerWithMacros fails sometimes because it picks up previous minicluster configuration file (cheolsoo via julien)
PIG-2976: Reduce HBaseStorage logging (billgraham)
PIG-2947: Documentation for Rank operator (xalan via azaroth)
PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy)
PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney)
PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend visibility of couple methods on same class (prkommireddi via billgraham)
PIG-2579: Support for multiple input schemas in AvroStorage (cheolsoo via sms)
PIG-2946: Documentation of "history" and "clear" commands (xalan via azaroth)
PIG-2877: Make SchemaTuple work in foreach (and thus, in loads) (jcoveney)
PIG-2923: Lazily register bags with SpillableMemoryManager (dvryaboy)
PIG-2929: Improve documentation around AVG, CONCAT, MIN, MAX (cheolsoo via billgraham)
PIG-2852: Update documentation regarding parallel local mode execution (cheolsoo via jcoveney)
PIG-2879: Pig current releases lack a UDF startsWith. This UDF tests if a given string starts with the specified prefix. (initialcontext via azaroth)
PIG-2712: Pig does not call OutputCommitter.abortJob() on the underlying OutputFormat (rohini via gates)
PIG-2918: Avoid Spillable bag overhead where possible (dvryaboy)
PIG-2900: Streaming should provide conf settings in the environment (dvryaboy)
PIG-2353: RANK function like in SQL (xalan via azaroth)
PIG-2915: Builtin TOP udf is sensitive to null input bags (hazen via dvryaboy)
PIG-2901: Errors and lacks in document "Pig Latin Basics" (miyakawataku via billgraham)
PIG-2905: Improve documentation around REPLACE (cheolsoo via billgraham)
PIG-2882: Use Deque instead of Stack (mkhadikov via dvryaboy)
PIG-2781: LOSort isEqual method (xalan via dvryaboy)
PIG-2835: Optimizing the conversion from bytes to Integer/Long (jay23jack via dvryaboy)
PIG-2886: Add Scan TimeRange to HBaseStorage (ted.m via dvryaboy)
PIG-2895: jodatime jar missing in pig-withouthadoop.jar (thejas)
PIG-2888: Improve performance of POPartialAgg (dvryaboy)
PIG-2708: split MiniCluster based tests out of org.apache.pig.test.TestInputOutputFileValidator (analog.sony via daijy)
PIG-2890: Revert PIG-2578 (dvryaboy)
PIG-2850: Pig should support loading macro files as resources stored in JAR files (matterhayes via dvryaboy)
PIG-1314: Add DateTime Support to Pig (zjshen via thejas)
PIG-2785: NoClassDefFoundError after upgrading to pig 0.10.0 from 0.9.0 (matterhayes via sms)
PIG-2556: CSVExcelStorage load: quoted field with newline as first character sees newline as record end (tivv via dvryaboy)
PIG-2875: Add recursive record support to AvroStorage (cheolsoo via sms)
PIG-2662: skew join does not honor its config parameters (rajesh.balamohan via thejas)
PIG-2871: Refactor signature for PigReducerEstimator (billgraham)
PIG-2851: Add flag to ant to run tests with a debugger port (billgraham)
PIG-2862: Hardcode certain tuple lengths into the TUPLE BinInterSedes byte identifier (jcoveney)
PIG-2855: Provide a method to measure time spent in UDFs (dvryaboy)
PIG-2837: AvroStorage throws StackOverFlowError (cheolsoo via sms)
PIG-2856: AvroStorage doesn't load files in the directories when a glob pattern matches both files and directories. (cheolsoo via sms)
PIG-2569: Fix org.apache.pig.test.TestInvoker.testSpeed (aklochkov via dvryaboy)
PIG-2858: Improve PlanHelper to allow finding any PhysicalOperator in a plan (dvryaboy)
PIG-2854: AvroStorage doesn't work with Avro 1.7.1 (cheolsoo via sms)
PIG-2779: Refactoring the code for setting number of reducers (jay23jack via billgraham)
PIG-2765: Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator (prasanth_j via dvryaboy)
PIG-2814: Fix issues with Sample operator documentation (prasanth_j via dvryaboy)
PIG-2817: Documentation for Groovy UDFs (herberts via julien)
PIG-2492: AvroStorage should recognize globs and commas (cheolsoo via sms)
PIG-2706: Add clear to list of grunt commands (xalan via azaroth)
PIG-2823: TestPigContext.testImportList() does not pass if another javac is on the PATH (julien)
PIG-2800: pig.additional.jars path separator should align with File.pathSeparator instead of being hard-coded to ":" (jgordon via azaroth)
PIG-2797: Tests should not create their own file URIs through string concatenation, should use Util.generateURI instead (jgordon via azaroth)
PIG-2820: relToAbsolutePath is not replayed properly when Grunt reparses the script after PIG-2699 (julien)
PIG-2763: Groovy UDFs (herberts via julien)
PIG-2780: MapReduceLauncher should break early when one of the jobs throws an exception (jay23jack via daijy)
PIG-2804: Remove "PIG" exec type (dvryaboy)
PIG-2726: Handling legitimate NULL values in Cube operator (prasanth_j via dvryaboy)
PIG-2808: Add *.project to .gitignore (azaroth)
PIG-2787: change the module name in ivy to lowercase to match the maven repo (julien)
PIG-2632: Create a SchemaTuple which generates efficient Tuples via code gen (jcoveney)
PIG-2750: add artifacts to the ivy.xml for other jars Pig generates (julien)
PIG-2748: Change the names of the jar produced in the build folder to match maven conventions (julien)
PIG-2770: Allow easy inclusion of custom build targets (julien)
PIG-2697: pretty print schema via pig.pretty.print.schema (rangadi via jcoveney)
PIG-2673: Allow Merge join to follow an ORDER statement (dvryaboy)
PIG-2699: Reduce the number of instances of Load and Store Funcs down to 2+1 (julien)
PIG-2166: UDFs to join a bag (hluu via daijy)
PIG-2651: Provide a much easier to use accumulator interface (jcoveney via daijy)
PIG-2658: Add pig.script.submitted.timestamp and pig.job.submitted.timestamp in generated Map-Reduce job conf (billgraham)
PIG-2735: Add a pig.version.suffix property in build.xml to easily override with a build number (julien)
PIG-2705: outputSchema modification from scripting UDFs (levyjoshua via julien)
PIG-2724: Make Tuple Iterable (jcoveney)
PIG-2733: Add *.patch, *.log, *.orig, *.rej, *.class to gitignore (jcoveney)
PIG-2732: Let's get rid of the deprecated Tuple methods (jcoveney)
PIG-2638: Optimize BinInterSedes treatment of longs (jcoveney)
PIG-2727: PigStorage Source tagging does not need pig.splitCombination to be turned off (prkommireddi via dvryaboy)
PIG-2710: Implement Naive CUBE operator (prasanth_j via dvryaboy)
PIG-2714: Pig documentation on TOP function has issues (daijy)
PIG-2066: Accumulators should be able to early-terminate (jcoveney)
PIG-2600: Better Map support (prkommireddi via jcoveney)
PIG-2711: e2e harness: cache benchmark results between test runs (thw via daijy)
PIG-2702: Make Pig local mode (and tests) faster by working around the hard coded sleep(5000) in hadoop's JobControl (julien)
PIG-2659: add source location of the aliases in the physical plan (julien)
PIG-2547: Easier UDFs: Convenient EvalFunc super-classes (billgraham, dvryaboy)
PIG-2639: Utils.getSchemaFromString should automatically give name to all types, but fails on boolean (jcoveney)
PIG-2696: Enhance Job Stat to print out median map and reduce time (hluu via daijy)
PIG-2583: Add Grunt command to list the statements in cache (xalan via daijy)
PIG-2688: Log the aliases being processed for the current job (ddaniels888 via azaroth)
PIG-2680: TOBAG output schema reporting (andy schlaikjer via jcoveney)
PIG-2685: Fix error in EvalFunc ctor when implementing Algebraic UDF whose return type is parameterized (andy schlaikjer via jcoveney)
PIG-2664: Allow PPNL impls to get more job info during the run (billgraham)
PIG-2663: Expose helpful ScriptState methods (billgraham)
PIG-2660: PPNL notified of plan before it gets executed (billgraham)
PIG-2574: Make reducer estimator pluggable (billgraham)
PIG-2677: Add target to build.xml to generate clover summary reports (gates)
PIG-2650: Convenience mock Loader and Storer to simplify unit testing of Pig scripts (julien)
PIG-2257: AvroStorage doesn't recognize schema_file field when JSON isn't used in the constructor (billgraham)
PIG-2587: Compute LogicalPlan signature and store in job conf (billgraham)
PIG-2619: HBaseStorage constructs a Scan with cacheBlocks = false
PIG-2604: Pig should print its build info at runtime (traviscrawford via dvryaboy)
PIG-2573: Automagically setting parallelism based on input file size does not work with HCatalog (traviscrawford via julien)
PIG-2538: Add helper wrapper classes for StoreFunc (billgraham via dvryaboy)
PIG-2010: registered jars on distributed cache (traviscrawford and julienledem via dvryaboy)
PIG-2533: Pig MR job exceptions masked on frontend (traviscrawford via dvryaboy)
PIG-2525: Support pluggable PigProgressNotificationListeners on the command line (dvryaboy)
PIG-2515: [piggybank] Make CustomFormatToISO return null on Exception in parsing dates (rjurney via dvryaboy)
PIG-2503: Make @MonitoredUDF inherited (dvryaboy)
PIG-2488: Move Python unit tests to e2e tests (alangates via daijy)
PIG-2456: Pig should have a pigrc to specify default script cache (prkommireddi via daijy)
PIG-2496: Cache resolved classes in PigContext (dvryaboy)
PIG-2482: Integrate HCat DDL command into Pig (daijy)
PIG-2479: changingPattern should be used with checkmodified in ivysettings.xml (abayer via azaroth)
PIG-2349: Ant build repeats ivy-buildJar several times (azaroth)
PIG-2359: Support more efficient Tuples when schemas are known (dvryaboy)
PIG-2282: Automatically update Eclipse .classpath file when new libs are added to the classpath through Ivy (azaroth via daijy)
PIG-2468: Speed up TestBuiltin (dvryaboy)
PIG-2467: Speed up TestCommit (dvryaboy)
PIG-2460: Use guava 11 instead of r06 (dvryaboy)
PIG-2267: Make the name of the columns in schema optional (jcoveney via daijy)
PIG-2453: Fetching schema can be very slow for multi-thousand file LOADs (dvryaboy)
PIG-2443: [Piggybank] Add UDFs to check if a String is an Integer And if a String is Numeric (prkommireddi via daijy)
PIG-2437: Use Ivy to get automaton.jar (azaroth)
PIG-2448: Convert more tests to use LOCAL mode (dvryaboy)
PIG-2438: Do not hardcode commons-lang version in build.xml (azaroth)
PIG-2422: Add log messages for Jython schema definitions (vivekp via gates)
PIG-2403: Reduce code duplication in SUM, MAX, MIN udfs (dvryaboy)
PIG-2245: Add end to end test for tokenize (markroddy via gates)
PIG-2327: bin/pig doesn't have any hooks for picking up ZK installation deployed from tarballs (rvs via hashutosh)
PIG-2382: Modify .gitignore to ignore pig-withouthadoop.jar (azaroth via hashutosh)
PIG-2380: Expose version information more cleanly (jcoveney via azaroth)
PIG-2311: STRSPLIT needs to allow bytearray arguments (xuting via olgan)
PIG-2365: Current TOP implementation needlessly results in a null bag name (jcoveney via dvryaboy)
PIG-2151: Add annotation to specify output schema in Java UDFs (dvryaboy)
PIG-2230: Improved error message for invalid parameter format (xuitingz via olgan)
PIG-2328: Add builtin UDFs for building and using bloom filters (gates)
PIG-2338: Need signature for EvalFunc (daijy)
PIG-2337: Provide UDF with input schema (xutingz via daijy)
OPTIMIZATIONS
BUG FIXES
PIG-3147: Spill failing with "java.lang.RuntimeException: InternalCachedBag.spill() should not be called" (knoguchi via dvryaboy)
PIG-3109: Missing license headers (jarcec via cheolsoo)
PIG-3022: TestRegisteredJarVisibility.testRegisteredJarVisibility fails with hadoop-2.0.x (rohini via cheolsoo)
PIG-3125: Fix zebra compilation error (cheolsoo)
PIG-3051: java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning (knoguchi via rohini)
PIG-3076: make TestScalarAliases more reliable (julien)
PIG-3020: "Duplicate uid in schema" error when joining two relations derived from the same load statement (jcoveney)
PIG-3044: hotfix to remove divide by 0 error (jcoveney)
PIG-3033: test-patch failed with javadoc warnings (fang fang chen via cheolsoo)
PIG-3058: Upgrade junit to at least 4.8 (fang fang chen via cheolsoo)
PIG-2978: TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x (cheolsoo)
PIG-3039: Not possible to use custom version of jackson jars (rohini)
PIG-3045: Specifying sorting field(s) at nightly.conf - fix sortArgs (rohini via cheolsoo)
PIG-2979: Pig.jar doesn't work with hadoop-2.0.x (cheolsoo)
PIG-3035: With latest version of hadoop23 pig does not return the correct exception stack trace from backend (rohini)
PIG-2405: Some unit test cases fail with OpenJDK (fang fang chen via cheolsoo)
PIG-3018: Refactor TestScriptLanguage to remove duplication and write script in different files (julien)