public final class DataFrameNaFunctions
extends Object
Functionality for working with missing data in DataFrames.

| Modifier and Type | Method and Description |
|---|---|
| `Dataset<Row>` | `drop()`<br>Returns a new DataFrame that drops rows containing any null or NaN values. |
| `Dataset<Row>` | `drop(int minNonNulls)`<br>Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values. |
| `Dataset<Row>` | `drop(int minNonNulls, scala.collection.Seq<String> cols)`<br>(Scala-specific) Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns. |
| `Dataset<Row>` | `drop(int minNonNulls, String[] cols)`<br>Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns. |
| `Dataset<Row>` | `drop(scala.collection.Seq<String> cols)`<br>(Scala-specific) Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns. |
| `Dataset<Row>` | `drop(String how)`<br>Returns a new DataFrame that drops rows containing null or NaN values. |
| `Dataset<Row>` | `drop(String[] cols)`<br>Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns. |
| `Dataset<Row>` | `drop(String how, scala.collection.Seq<String> cols)`<br>(Scala-specific) Returns a new DataFrame that drops rows containing null or NaN values in the specified columns. |
| `Dataset<Row>` | `drop(String how, String[] cols)`<br>Returns a new DataFrame that drops rows containing null or NaN values in the specified columns. |
| `Dataset<Row>` | `fill(boolean value)`<br>Returns a new DataFrame that replaces null values in boolean columns with value. |
| `Dataset<Row>` | `fill(boolean value, scala.collection.Seq<String> cols)`<br>(Scala-specific) Returns a new DataFrame that replaces null values in the specified boolean columns. |
| `Dataset<Row>` | `fill(boolean value, String[] cols)`<br>Returns a new DataFrame that replaces null values in the specified boolean columns. |
| `Dataset<Row>` | `fill(double value)`<br>Returns a new DataFrame that replaces null or NaN values in numeric columns with value. |
| `Dataset<Row>` | `fill(double value, scala.collection.Seq<String> cols)`<br>(Scala-specific) Returns a new DataFrame that replaces null or NaN values in the specified numeric columns. |
| `Dataset<Row>` | `fill(double value, String[] cols)`<br>Returns a new DataFrame that replaces null or NaN values in the specified numeric columns. |
| `Dataset<Row>` | `fill(long value)`<br>Returns a new DataFrame that replaces null or NaN values in numeric columns with value. |
| `Dataset<Row>` | `fill(long value, scala.collection.Seq<String> cols)`<br>(Scala-specific) Returns a new DataFrame that replaces null or NaN values in the specified numeric columns. |
| `Dataset<Row>` | `fill(long value, String[] cols)`<br>Returns a new DataFrame that replaces null or NaN values in the specified numeric columns. |
| `Dataset<Row>` | `fill(java.util.Map<String,Object> valueMap)`<br>Returns a new DataFrame that replaces null values. |
| `Dataset<Row>` | `fill(scala.collection.immutable.Map<String,Object> valueMap)`<br>(Scala-specific) Returns a new DataFrame that replaces null values. |
| `Dataset<Row>` | `fill(String value)`<br>Returns a new DataFrame that replaces null values in string columns with value. |
| `Dataset<Row>` | `fill(String value, scala.collection.Seq<String> cols)`<br>(Scala-specific) Returns a new DataFrame that replaces null values in the specified string columns. |
| `Dataset<Row>` | `fill(String value, String[] cols)`<br>Returns a new DataFrame that replaces null values in the specified string columns. |
| `<T> Dataset<Row>` | `replace(scala.collection.Seq<String> cols, scala.collection.immutable.Map<T,T> replacement)`<br>(Scala-specific) Replaces values matching keys in the replacement map with the corresponding values. |
| `<T> Dataset<Row>` | `replace(String[] cols, java.util.Map<T,T> replacement)`<br>Replaces values matching keys in the replacement map with the corresponding values. |
| `<T> Dataset<Row>` | `replace(String col, java.util.Map<T,T> replacement)`<br>Replaces values matching keys in the replacement map with the corresponding values. |
| `<T> Dataset<Row>` | `replace(String col, scala.collection.immutable.Map<T,T> replacement)`<br>(Scala-specific) Replaces values matching keys in the replacement map with the corresponding values. |

public Dataset<Row> drop()
Returns a new DataFrame that drops rows containing any null or NaN values.

public Dataset<Row> drop(String how)
Returns a new DataFrame that drops rows containing null or NaN values.
 
 If how is "any", then drop rows containing any null or NaN values.
 If how is "all", then drop rows only if every column is null or NaN for that row.
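To make the "any"/"all" distinction concrete, here is a minimal plain-Java model of the row filter. It assumes a row is an Object[] and that a value counts as missing when it is null or a Double/Float NaN; the class NaDropModel and its methods are hypothetical illustrations, not Spark's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of drop(String how); not Spark code.
// Each row is an Object[]; null and floating-point NaN both count as missing.
final class NaDropModel {
    private NaDropModel() {}

    static boolean isMissing(Object v) {
        return v == null
                || (v instanceof Double && ((Double) v).isNaN())
                || (v instanceof Float && ((Float) v).isNaN());
    }

    static Object[][] drop(Object[][] rows, String how) {
        List<Object[]> kept = new ArrayList<>();
        for (Object[] row : rows) {
            int missing = 0;
            for (Object v : row) {
                if (isMissing(v)) missing++;
            }
            boolean dropRow;
            switch (how) {
                case "any": dropRow = missing > 0; break;           // one missing value is enough
                case "all": dropRow = missing == row.length; break; // every value must be missing
                default: throw new IllegalArgumentException("how must be \"any\" or \"all\": " + how);
            }
            if (!dropRow) kept.add(row);
        }
        return kept.toArray(new Object[0][]);
    }
}
```

For the rows {1.0, "a"}, {null, "b"} and {NaN, null}, "any" keeps only the first row, while "all" drops only the third. The corresponding Spark calls would be df.na().drop("any") and df.na().drop("all").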
 
how - "any" or "all", as described above.

public Dataset<Row> drop(String[] cols)
Returns a new DataFrame that drops rows containing any null or NaN values
in the specified columns.
cols - names of the columns to check for null or NaN values.

public Dataset<Row> drop(scala.collection.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that drops rows containing any null or NaN values
in the specified columns.
cols - names of the columns to check for null or NaN values.

public Dataset<Row> drop(String how, String[] cols)
Returns a new DataFrame that drops rows containing null or NaN values
in the specified columns.
 
 If how is "any", then drop rows containing any null or NaN values in the specified columns.
 If how is "all", then drop rows only if every specified column is null or NaN for that row.
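Restricting the check to a subset of columns changes which rows survive. The plain-Java sketch below is a hypothetical model (not Spark's code) of drop(how, cols): it resolves the column names against a header array and counts missing values only in those columns. Rejecting unknown names with IllegalArgumentException is a choice of this model, not documented Spark behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of drop(String how, String[] cols); not Spark code.
// Only the named columns are inspected; all other columns are ignored.
final class NaDropColsModel {
    private NaDropColsModel() {}

    static boolean isMissing(Object v) {
        return v == null
                || (v instanceof Double && ((Double) v).isNaN())
                || (v instanceof Float && ((Float) v).isNaN());
    }

    static Object[][] drop(String[] header, Object[][] rows, String how, String[] cols) {
        // Translate column names into positions within each row.
        List<String> names = Arrays.asList(header);
        int[] idx = new int[cols.length];
        for (int i = 0; i < cols.length; i++) {
            idx[i] = names.indexOf(cols[i]);
            if (idx[i] < 0) throw new IllegalArgumentException("unknown column: " + cols[i]);
        }
        List<Object[]> kept = new ArrayList<>();
        for (Object[] row : rows) {
            int missing = 0;
            for (int j : idx) {
                if (isMissing(row[j])) missing++;
            }
            boolean dropRow;
            switch (how) {
                case "any": dropRow = missing > 0; break;
                case "all": dropRow = missing == idx.length; break;
                default: throw new IllegalArgumentException("how must be \"any\" or \"all\": " + how);
            }
            if (!dropRow) kept.add(row);
        }
        return kept.toArray(new Object[0][]);
    }
}
```

With header {"name", "age", "height"} and cols {"age", "height"}, a row such as {null, 40, 1.6} survives both modes, because "name" is not among the inspected columns.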
 
how - "any" or "all", as described above.
cols - names of the columns to check for null or NaN values.

public Dataset<Row> drop(String how, scala.collection.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that drops rows containing null or NaN values
in the specified columns.
 
 If how is "any", then drop rows containing any null or NaN values in the specified columns.
 If how is "all", then drop rows only if every specified column is null or NaN for that row.
 
how - "any" or "all", as described above.
cols - names of the columns to check for null or NaN values.

public Dataset<Row> drop(int minNonNulls)
Returns a new DataFrame that drops rows containing
fewer than minNonNulls non-null and non-NaN values.
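The threshold rule can be stated as "keep a row when it has at least minNonNulls usable values". The plain-Java model below is a hypothetical sketch of that rule over Object[] rows (null and Double/Float NaN count as unusable); it is not Spark's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of drop(int minNonNulls); not Spark code.
// A row is kept when it has at least minNonNulls values that are neither null nor NaN.
final class NaThresholdModel {
    private NaThresholdModel() {}

    static boolean isMissing(Object v) {
        return v == null
                || (v instanceof Double && ((Double) v).isNaN())
                || (v instanceof Float && ((Float) v).isNaN());
    }

    static int presentCount(Object[] row) {
        int n = 0;
        for (Object v : row) {
            if (!isMissing(v)) n++;
        }
        return n;
    }

    static Object[][] drop(Object[][] rows, int minNonNulls) {
        List<Object[]> kept = new ArrayList<>();
        for (Object[] row : rows) {
            if (presentCount(row) >= minNonNulls) kept.add(row);
        }
        return kept.toArray(new Object[0][]);
    }
}
```

For the rows {1.0, 2.0, 3.0}, {1.0, null, NaN} and {null, null, null}, a threshold of 2 keeps only the first row, a threshold of 1 drops only the all-missing row, and a threshold of 0 keeps every row under this model.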
minNonNulls - the minimum number of non-null, non-NaN values a row must contain to be kept.

public Dataset<Row> drop(int minNonNulls, String[] cols)
Returns a new DataFrame that drops rows containing
fewer than minNonNulls non-null and non-NaN values in the specified columns.
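Comparing the descriptions, "any" over k columns keeps exactly the rows with k present values, and "all" keeps the rows with at least one, so both modes can be read as thresholds over the same columns. The hypothetical plain-Java sketch below encodes that reading; it illustrates the documented semantics and does not claim this is how Spark implements drop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: drop(minNonNulls, cols), plus "any"/"all" expressed as thresholds.
final class NaSubsetThresholdModel {
    private NaSubsetThresholdModel() {}

    static boolean isMissing(Object v) {
        return v == null
                || (v instanceof Double && ((Double) v).isNaN())
                || (v instanceof Float && ((Float) v).isNaN());
    }

    static Object[][] drop(String[] header, Object[][] rows, int minNonNulls, String[] cols) {
        List<String> names = Arrays.asList(header);
        List<Object[]> kept = new ArrayList<>();
        for (Object[] row : rows) {
            int present = 0;
            for (String c : cols) {
                int j = names.indexOf(c);
                if (j < 0) throw new IllegalArgumentException("unknown column: " + c);
                if (!isMissing(row[j])) present++;  // count only the named columns
            }
            if (present >= minNonNulls) kept.add(row);
        }
        return kept.toArray(new Object[0][]);
    }

    // "any": every named column must be present; "all": at least one must be.
    static Object[][] dropHow(String[] header, Object[][] rows, String how, String[] cols) {
        switch (how) {
            case "any": return drop(header, rows, cols.length, cols);
            case "all": return drop(header, rows, 1, cols);
            default: throw new IllegalArgumentException("how must be \"any\" or \"all\": " + how);
        }
    }
}
```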
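minNonNulls - the minimum number of non-null, non-NaN values, counted over the specified columns, a row must contain to be kept.
cols - names of the columns to check for null or NaN values.

public Dataset<Row> drop(int minNonNulls, scala.collection.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that drops rows containing fewer than
minNonNulls non-null and non-NaN values in the specified columns.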
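minNonNulls - the minimum number of non-null, non-NaN values, counted over the specified columns, a row must contain to be kept.
cols - names of the columns to check for null or NaN values.

public Dataset<Row> fill(long value)
Returns a new DataFrame that replaces null or NaN values in numeric columns with value.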
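A numeric fill touches only numeric columns and leaves everything else alone. The plain-Java sketch below is a hypothetical model of fill(long value) over a tiny typed table, where a Class<?>[] array stands in for the schema and only Long and Double columns count as numeric; widening the long to a double for Double columns is an assumption of this sketch, not a statement about Spark's casting.

```java
// Hypothetical model of fill(long value) over a tiny typed table; not Spark code.
// types[j] gives column j's type; only Long and Double columns count as numeric here.
final class NaFillNumericModel {
    private NaFillNumericModel() {}

    static Object[][] fill(Class<?>[] types, Object[][] rows, long value) {
        Object[][] out = new Object[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            out[i] = rows[i].clone();  // leave the input untouched, like returning a new DataFrame
            for (int j = 0; j < types.length; j++) {
                Object v = out[i][j];
                if (types[j] == Long.class && v == null) {
                    out[i][j] = value;
                } else if (types[j] == Double.class && (v == null || ((Double) v).isNaN())) {
                    out[i][j] = (double) value;  // assumption: widen the long for Double columns
                }
                // Non-numeric columns (for example String) are left as they are.
            }
        }
        return out;
    }
}
```

The string column keeps its null here: null values in string columns are the job of fill(String value). In Spark the analogous call would be df.na().fill(0L).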
value - replacement for null or NaN values.

public Dataset<Row> fill(double value)
Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
value - replacement for null or NaN values.

public Dataset<Row> fill(String value)
Returns a new DataFrame that replaces null values in string columns with value.
value - replacement for null values.

public Dataset<Row> fill(long value, String[] cols)
Returns a new DataFrame that replaces null or NaN values in the specified numeric columns.
If a specified column is not a numeric column, it is ignored.
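The "ignored if not numeric" rule can be modelled as a type check on each named column before filling. The plain-Java sketch below is a hypothetical model (not Spark's code) of fill(value, cols): it fills only named columns whose declared Class matches the replacement's class and silently skips the rest. Matching on the exact Java class is a simplification; Spark's notion of a numeric column is broader, and the same skip rule is documented for the string and boolean overloads.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of fill(value, cols) with the "ignore non-matching columns" rule.
final class NaFillColsModel {
    private NaFillColsModel() {}

    static boolean isMissing(Object v) {
        return v == null || (v instanceof Double && ((Double) v).isNaN());
    }

    static Object[][] fill(String[] header, Class<?>[] types, Object[][] rows,
                           Object value, String[] cols) {
        List<String> names = Arrays.asList(header);
        Object[][] out = new Object[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            out[i] = rows[i].clone();
        }
        for (String c : cols) {
            int j = names.indexOf(c);
            // This model rejects unknown column names outright.
            if (j < 0) throw new IllegalArgumentException("unknown column: " + c);
            // A named column whose type does not match the value is ignored.
            if (types[j] != value.getClass()) continue;
            for (Object[] row : out) {
                if (isMissing(row[j])) row[j] = value;
            }
        }
        return out;
    }
}
```

Filling 0.0 into cols {"name", "score"} of a table typed {String, Double, Double} leaves "name" untouched and replaces the NaN in "score"; the analogous Spark call would be df.na().fill(0.0, new String[] {"name", "score"}).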
value - replacement for null or NaN values.
cols - names of the columns to fill.

public Dataset<Row> fill(double value, String[] cols)
Returns a new DataFrame that replaces null or NaN values in the specified numeric columns.
If a specified column is not a numeric column, it is ignored.
value - replacement for null or NaN values.
cols - names of the columns to fill.

public Dataset<Row> fill(long value, scala.collection.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that replaces null or NaN values in the specified
numeric columns. If a specified column is not a numeric column, it is ignored.
value - replacement for null or NaN values.
cols - names of the columns to fill.

public Dataset<Row> fill(double value, scala.collection.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that replaces null or NaN values in the specified
numeric columns. If a specified column is not a numeric column, it is ignored.
value - replacement for null or NaN values.
cols - names of the columns to fill.

public Dataset<Row> fill(String value, String[] cols)
Returns a new DataFrame that replaces null values in the specified string columns.
If a specified column is not a string column, it is ignored.
value - replacement for null values.
cols - names of the columns to fill.

public Dataset<Row> fill(String value, scala.collection.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that replaces null values in the
specified string columns. If a specified column is not a string column, it is ignored.
value - replacement for null values.
cols - names of the columns to fill.

public Dataset<Row> fill(boolean value)
Returns a new DataFrame that replaces null values in boolean columns with value.
value - replacement for null values.

public Dataset<Row> fill(boolean value, scala.collection.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that replaces null values in the specified
boolean columns. If a specified column is not a boolean column, it is ignored.
value - replacement for null values.
cols - names of the columns to fill.

public Dataset<Row> fill(boolean value, String[] cols)
Returns a new DataFrame that replaces null values in the specified boolean columns.
If a specified column is not a boolean column, it is ignored.
value - replacement for null values.
cols - names of the columns to fill.

public Dataset<Row> fill(java.util.Map<String,Object> valueMap)
Returns a new DataFrame that replaces null values.

The key of the map is the column name, and the value of the map is the replacement value.
The value must be one of the following types:
Integer, Long, Float, Double, String, Boolean.
Replacement values are cast to the column data type.

For example, the following replaces null values in column "A" with the string "unknown", and null values in column "B" with the numeric value 1.0.
   import com.google.common.collect.ImmutableMap;
   df.na().fill(ImmutableMap.of("A", "unknown", "B", 1.0));
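
A per-column map fill can be modelled as a lookup from column name to replacement, followed by a conversion to the column's type. The plain-Java sketch below is hypothetical and not Spark's code: it supports only Long, Double, String and Boolean columns, replaces only nulls, and uses simple Java numeric conversions (castTo, a hypothetical helper) as a stand-in for Spark's cast rules.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical model of fill(java.util.Map<String, Object> valueMap); not Spark code.
final class NaFillMapModel {
    private NaFillMapModel() {}

    // Simplified stand-in for "replacement values are cast to the column data type".
    static Object castTo(Class<?> type, Object v) {
        if (type == Double.class && v instanceof Number) return ((Number) v).doubleValue();
        if (type == Long.class && v instanceof Number) return ((Number) v).longValue();
        if (type == String.class && v instanceof String) return v;
        if (type == Boolean.class && v instanceof Boolean) return v;
        throw new IllegalArgumentException("cannot use " + v + " for a " + type.getSimpleName() + " column");
    }

    static Object[][] fill(String[] header, Class<?>[] types, Object[][] rows,
                           Map<String, Object> valueMap) {
        List<String> names = Arrays.asList(header);
        Object[][] out = new Object[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            out[i] = rows[i].clone();
        }
        for (Map.Entry<String, Object> e : valueMap.entrySet()) {
            int j = names.indexOf(e.getKey());
            if (j < 0) throw new IllegalArgumentException("unknown column: " + e.getKey());
            Object replacement = castTo(types[j], e.getValue());
            for (Object[] row : out) {
                if (row[j] == null) row[j] = replacement;  // only nulls are replaced in this model
            }
        }
        return out;
    }
}
```

With columns "A" (String) and "B" (Double) and the map {"A": "unknown", "B": 1}, the Integer 1 becomes 1.0 in column "B", a simplified echo of the documented casting rule.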
 
valueMap - map from column name to replacement value.

public Dataset<Row> fill(scala.collection.immutable.Map<String,Object> valueMap)
(Scala-specific) Returns a new DataFrame that replaces null values.

The key of the map is the column name, and the value of the map is the replacement value.
The value must be one of the following types: Int, Long, Float, Double, String, Boolean.
Replacement values are cast to the column data type.

For example, the following replaces null values in column "A" with the string "unknown", and null values in column "B" with the numeric value 1.0.
   df.na.fill(Map(
     "A" -> "unknown",
     "B" -> 1.0
   ))
 
valueMap - map from column name to replacement value.

public <T> Dataset<Row> replace(String col, java.util.Map<T,T> replacement)
Replaces values matching keys in the replacement map with the corresponding values.

   import com.google.common.collect.ImmutableMap;
   // Replaces all occurrences of 1.0 with 2.0 in column "height".
   df.na().replace("height", ImmutableMap.of(1.0, 2.0));
   // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
   df.na().replace("name", ImmutableMap.of("UNKNOWN", "unnamed"));
   // Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
   df.na().replace("*", ImmutableMap.of("UNKNOWN", "unnamed"));

col - name of the column to apply the value replacement. If col is "*",
      replacement is applied on all string, numeric or boolean columns.
replacement - value replacement map. Key and value of replacement map must have
      the same type, and can only be doubles, strings or booleans.
      The map value can have nulls.
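
The replacement semantics amount to a per-cell map lookup: a cell whose value equals a key is swapped for the mapped value, which may be null, and every other cell is left alone. Below is a hypothetical plain-Java model (not Spark's implementation) over an Object[][] table that treats col = "*" as "every column". It relies on Java equals, so a Double key only matches Double cells, a simplification of Spark's typed matching.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical model of replace(String col, Map<T, T> replacement); not Spark code.
final class NaReplaceModel {
    private NaReplaceModel() {}

    static <T> Object[][] replace(String[] header, Object[][] rows, String col, Map<T, T> replacement) {
        boolean allColumns = col.equals("*");
        int target = allColumns ? -1 : Arrays.asList(header).indexOf(col);
        if (!allColumns && target < 0) throw new IllegalArgumentException("unknown column: " + col);
        Object[][] out = new Object[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            out[i] = rows[i].clone();
            for (int j = 0; j < out[i].length; j++) {
                if (!allColumns && j != target) continue;
                Object v = out[i][j];
                // containsKey (rather than get != null) so that null replacement values are honoured.
                if (v != null && replacement.containsKey(v)) out[i][j] = replacement.get(v);
            }
        }
        return out;
    }
}
```

The containsKey check is what lets a mapping such as "UNKNOWN" to null turn matching cells into nulls, consistent with "The map value can have nulls."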

public <T> Dataset<Row> replace(String[] cols, java.util.Map<T,T> replacement)
Replaces values matching keys in the replacement map with the corresponding values.

   import com.google.common.collect.ImmutableMap;
   // Replaces all occurrences of 1.0 with 2.0 in columns "height" and "weight".
   df.na().replace(new String[] {"height", "weight"}, ImmutableMap.of(1.0, 2.0));
   // Replaces all occurrences of "UNKNOWN" with "unnamed" in columns "firstname" and "lastname".
   df.na().replace(new String[] {"firstname", "lastname"}, ImmutableMap.of("UNKNOWN", "unnamed"));

cols - list of columns to apply the value replacement. If col is "*",
      replacement is applied on all string, numeric or boolean columns.
replacement - value replacement map. Key and value of replacement map must have
      the same type, and can only be doubles, strings or booleans.
      The map value can have nulls.

public <T> Dataset<Row> replace(String col, scala.collection.immutable.Map<T,T> replacement)
(Scala-specific) Replaces values matching keys in the replacement map with the corresponding values.
 
   // Replaces all occurrences of 1.0 with 2.0 in column "height".
   df.na.replace("height", Map(1.0 -> 2.0));
   // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
   df.na.replace("name", Map("UNKNOWN" -> "unnamed"));
   // Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
   df.na.replace("*", Map("UNKNOWN" -> "unnamed"));
 
col - name of the column to apply the value replacement. If col is "*",
      replacement is applied on all string, numeric or boolean columns.
replacement - value replacement map. Key and value of replacement map must have
      the same type, and can only be doubles, strings or booleans.
      The map value can have nulls.

public <T> Dataset<Row> replace(scala.collection.Seq<String> cols, scala.collection.immutable.Map<T,T> replacement)
(Scala-specific) Replaces values matching keys in the replacement map with the corresponding values.
 
   // Replaces all occurrences of 1.0 with 2.0 in columns "height" and "weight".
   df.na.replace("height" :: "weight" :: Nil, Map(1.0 -> 2.0));
   // Replaces all occurrences of "UNKNOWN" with "unnamed" in columns "firstname" and "lastname".
   df.na.replace("firstname" :: "lastname" :: Nil, Map("UNKNOWN" -> "unnamed"));
 
cols - list of columns to apply the value replacement. If col is "*",
      replacement is applied on all string, numeric or boolean columns.
replacement - value replacement map. Key and value of replacement map must have
      the same type, and can only be doubles, strings or booleans.
      The map value can have nulls.