python/pyspark/sql/context.py (10 changes: 4 additions & 6 deletions)

@@ -226,9 +226,8 @@ def createDataFrame(self, data, schema=None, samplingRatio=None):
         from ``data``, which should be an RDD of :class:`Row`,
         or :class:`namedtuple`, or :class:`dict`.
 
-        When ``schema`` is :class:`pyspark.sql.types.DataType` or
-        :class:`pyspark.sql.types.StringType`, it must match the
-        real data, or an exception will be thrown at runtime. If the given schema is not
+        When ``schema`` is :class:`pyspark.sql.types.DataType` or a datatype string, it must match
+        the real data, or an exception will be thrown at runtime. If the given schema is not
         :class:`pyspark.sql.types.StructType`, it will be wrapped into a
         :class:`pyspark.sql.types.StructType` as its only field, and the field name will be "value",
         each record will also be wrapped into a tuple, which can be converted to row later.
@@ -239,8 +238,7 @@ def createDataFrame(self, data, schema=None, samplingRatio=None):
         :param data: an RDD of any kind of SQL data representation(e.g. :class:`Row`,
             :class:`tuple`, ``int``, ``boolean``, etc.), or :class:`list`, or
             :class:`pandas.DataFrame`.
-        :param schema: a :class:`pyspark.sql.types.DataType` or a
-            :class:`pyspark.sql.types.StringType` or a list of
+        :param schema: a :class:`pyspark.sql.types.DataType` or a datatype string or a list of
             column names, default is None. The data type string format equals to
             :class:`pyspark.sql.types.DataType.simpleString`, except that top level struct type can
             omit the ``struct<>`` and atomic types use ``typeName()`` as their format, e.g. use
@@ -251,7 +249,7 @@
 
         .. versionchanged:: 2.0
            The ``schema`` parameter can be a :class:`pyspark.sql.types.DataType` or a
-           :class:`pyspark.sql.types.StringType` after 2.0.
+           datatype string after 2.0.
            If it's not a :class:`pyspark.sql.types.StructType`, it will be wrapped into a
            :class:`pyspark.sql.types.StructType` and each record will also be wrapped into a tuple.
 
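For reference, a minimal sketch of the behavior the reworded docstring describes, against the SQLContext API. This is illustrative only, not part of the patch; it assumes a local Spark and made-up values:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# "int" is a datatype string, not a StructType, so it is wrapped into a
# StructType with a single field named "value", and each record is
# wrapped into a tuple, as the docstring says.
df = sqlContext.createDataFrame([1, 2, 3], "int")
df.collect()  # [Row(value=1), Row(value=2), Row(value=3)]

# A schema that does not match the real data raises at runtime,
# as the docstring warns:
# sqlContext.createDataFrame(["x"], "int")  # TypeError at runtime
```
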
python/pyspark/sql/session.py (10 changes: 4 additions & 6 deletions)

@@ -414,9 +414,8 @@ def createDataFrame(self, data, schema=None, samplingRatio=None):
         from ``data``, which should be an RDD of :class:`Row`,
         or :class:`namedtuple`, or :class:`dict`.
 
-        When ``schema`` is :class:`pyspark.sql.types.DataType` or
-        :class:`pyspark.sql.types.StringType`, it must match the
-        real data, or an exception will be thrown at runtime. If the given schema is not
+        When ``schema`` is :class:`pyspark.sql.types.DataType` or a datatype string, it must match
+        the real data, or an exception will be thrown at runtime. If the given schema is not
         :class:`pyspark.sql.types.StructType`, it will be wrapped into a
         :class:`pyspark.sql.types.StructType` as its only field, and the field name will be "value",
         each record will also be wrapped into a tuple, which can be converted to row later.
@@ -426,8 +425,7 @@ def createDataFrame(self, data, schema=None, samplingRatio=None):
 
         :param data: an RDD of any kind of SQL data representation(e.g. row, tuple, int, boolean,
             etc.), or :class:`list`, or :class:`pandas.DataFrame`.
-        :param schema: a :class:`pyspark.sql.types.DataType` or a
-            :class:`pyspark.sql.types.StringType` or a list of
+        :param schema: a :class:`pyspark.sql.types.DataType` or a datatype string or a list of
             column names, default is ``None``. The data type string format equals to
             :class:`pyspark.sql.types.DataType.simpleString`, except that top level struct type can
             omit the ``struct<>`` and atomic types use ``typeName()`` as their format, e.g. use
@@ -438,7 +436,7 @@
 
         .. versionchanged:: 2.0
            The ``schema`` parameter can be a :class:`pyspark.sql.types.DataType` or a
-           :class:`pyspark.sql.types.StringType` after 2.0. If it's not a
+           datatype string after 2.0. If it's not a
            :class:`pyspark.sql.types.StructType`, it will be wrapped into a
            :class:`pyspark.sql.types.StructType` and each record will also be wrapped into a tuple.
 
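And the SparkSession counterpart, illustrating the datatype-string format that the :param schema: text refers to. Again a sketch; the session setup and data are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("demo").getOrCreate()

# The top-level struct<> may be omitted: "a: int, b: string" is
# shorthand for struct<a:int,b:string> (DataType.simpleString format).
df = spark.createDataFrame([(1, "x"), (2, "y")], "a: int, b: string")
df.collect()  # [Row(a=1, b='x'), Row(a=2, b='y')]

# Atomic types use typeName(): a bare "double" also works, wrapped
# into a single "value" field as described above.
spark.createDataFrame([1.0, 2.0], "double").collect()
# [Row(value=1.0), Row(value=2.0)]
```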