@yhuai
Contributor

@yhuai yhuai commented Aug 26, 2015

https://issues.apache.org/jira/browse/SPARK-10287

After porting JSON to HadoopFsRelation, it seems hard to keep JSON's behavior of automatically picking up new files. This PR removes that behavior so that JSON is consistent with the other data sources (ORC and Parquet).

@yhuai
Contributor Author

yhuai commented Aug 26, 2015

@liancheng Maybe it is better to make JSON, Parquet, and ORC consistent instead of fixing JSON's refresh problem.

@liancheng
Contributor

LGTM. We should mention this in the release note and migration guide.

@SparkQA

SparkQA commented Aug 27, 2015

Test build #41650 has finished for PR 8469 at commit acec3ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor Author

yhuai commented Aug 27, 2015

I will test it with my partitioned JSON table.

@yhuai
Contributor Author

yhuai commented Aug 27, 2015

It works. I will update the doc.

Contributor Author

In the release notes, we need to add that the JSON data source will not automatically load new files that are created by other applications (i.e., files that are not inserted into the dataset through Spark SQL). [SPARK-10287]
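
To illustrate the behavior this note describes, here is a minimal, self-contained Python sketch. It is a toy model, not Spark code: `CachedJsonRelation`, `refresh`, and `demo` are hypothetical names, and the sketch only assumes that a relation lists its files once and keeps reusing that listing until it is refreshed.

```python
import json
import os
import tempfile


class CachedJsonRelation:
    """Toy model (not Spark's implementation): lists files once and reuses the listing."""

    def __init__(self, directory):
        self.directory = directory
        self._files = self._list_files()

    def _list_files(self):
        # Collect every .json file currently present in the directory.
        return sorted(
            os.path.join(self.directory, name)
            for name in os.listdir(self.directory)
            if name.endswith(".json")
        )

    def refresh(self):
        """Re-list the directory so files written by other applications become visible."""
        self._files = self._list_files()

    def rows(self):
        """Parse one JSON object per line from every file in the cached listing."""
        result = []
        for path in self._files:
            with open(path) as handle:
                result.extend(json.loads(line) for line in handle if line.strip())
        return result


def demo():
    with tempfile.TemporaryDirectory() as directory:
        with open(os.path.join(directory, "part-0.json"), "w") as handle:
            handle.write('{"id": 1}\n')
        relation = CachedJsonRelation(directory)
        before = len(relation.rows())

        # Simulate another application dropping a new file into the directory.
        with open(os.path.join(directory, "part-1.json"), "w") as handle:
            handle.write('{"id": 2}\n')
        stale = len(relation.rows())  # still uses the old file listing

        relation.refresh()
        fresh = len(relation.rows())  # new file is visible after the refresh
        return before, stale, fresh
```

In this sketch, `demo()` returns `(1, 1, 2)`: the file written after the relation was created is ignored until `refresh()` re-lists the directory.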

@SparkQA

SparkQA commented Aug 27, 2015

Test build #41705 has finished for PR 8469 at commit dead685.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 27, 2015

Test build #1698 has finished for PR 8469 at commit dead685.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LogisticRegressionModel @Since("1.3.0") (
    • class SVMModel @Since("1.1.0") (
    • class GaussianMixtureModel @Since("1.3.0") (
    • class KMeansModel @Since("1.1.0") (@Since("1.0.0") val clusterCenters: Array[Vector])
    • class PowerIterationClusteringModel @Since("1.3.0") (
    • class StreamingKMeansModel @Since("1.2.0") (
    • class StreamingKMeans @Since("1.2.0") (
    • class BinaryClassificationMetrics @Since("1.3.0") (
    • class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Double)])
    • class MultilabelMetrics @Since("1.2.0") (predictionAndLabels: RDD[(Array[Double], Array[Double])])
    • class RegressionMetrics @Since("1.2.0") (
    • class ChiSqSelectorModel @Since("1.3.0") (
    • class ChiSqSelector @Since("1.3.0") (
    • class ElementwiseProduct @Since("1.4.0") (
    • class IDF @Since("1.2.0") (@Since("1.2.0") val minDocFreq: Int)
    • class Normalizer @Since("1.1.0") (p: Double) extends VectorTransformer
    • class PCA @Since("1.4.0") (@Since("1.4.0") val k: Int)
    • class StandardScaler @Since("1.1.0") (withMean: Boolean, withStd: Boolean) extends Logging
    • class StandardScalerModel @Since("1.3.0") (
    • class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
    • class FreqItemset[Item] @Since("1.3.0") (
    • class FreqSequence[Item] @Since("1.5.0") (
    • class PrefixSpanModel[Item] @Since("1.5.0") (
    • class DenseMatrix @Since("1.3.0") (
    • class SparseMatrix @Since("1.3.0") (
    • class DenseVector @Since("1.0.0") (
    • class SparseVector @Since("1.0.0") (
    • class BlockMatrix @Since("1.3.0") (
    • class CoordinateMatrix @Since("1.0.0") (
    • class IndexedRowMatrix @Since("1.0.0") (
    • class RowMatrix @Since("1.0.0") (
    • class PoissonGenerator @Since("1.1.0") (
    • class ExponentialGenerator @Since("1.3.0") (
    • class GammaGenerator @Since("1.3.0") (
    • class LogNormalGenerator @Since("1.3.0") (
    • case class Rating @Since("0.8.0") (
    • class MatrixFactorizationModel @Since("0.8.0") (
    • abstract class GeneralizedLinearModel @Since("1.0.0") (
    • class IsotonicRegressionModel @Since("1.3.0") (
    • case class LabeledPoint @Since("1.0.0") (
    • class LassoModel @Since("1.1.0") (
    • class LinearRegressionModel @Since("1.1.0") (
    • class RidgeRegressionModel @Since("1.1.0") (
    • class MultivariateGaussian @Since("1.3.0") (
    • case class BoostingStrategy @Since("1.4.0") (
    • class Strategy @Since("1.3.0") (
    • class DecisionTreeModel @Since("1.0.0") (
    • class Node @Since("1.2.0") (
    • class Predict @Since("1.2.0") (
    • class RandomForestModel @Since("1.2.0") (
    • class GradientBoostedTreesModel @Since("1.2.0") (
    • abstract class SetOperation(left: LogicalPlan, right: LogicalPlan) extends BinaryNode
    • case class Union(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)
    • case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)
    • case class Except(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)

@yhuai
Contributor Author

yhuai commented Aug 27, 2015

I am merging it to master and branch 1.5.

asfgit pushed a commit that referenced this pull request Aug 27, 2015

Author: Yin Huai <yhuai@databricks.com>

Closes #8469 from yhuai/jsonRefresh.

(cherry picked from commit b3dd569)
Signed-off-by: Yin Huai <yhuai@databricks.com>
@asfgit asfgit closed this in b3dd569 Aug 27, 2015