In the last post we showed how to apply a function to multiple columns. If you have done that, you probably now have several columns holding the desired data under new names, and you may want to rename them back to the original names.
Let’s say you have the following DataFrame and you want to rename all of its columns.
>>> df.printSchema()
root
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- joining_dt: date (nullable = true)
The first thing you need is a map from old names to new names, plus a little functional programming.
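The name-mapping step can be tried on its own, without a Spark session. The sketch below uses a plain Python list as a stand-in for `df.columns`; the list and the extra `salary` column are assumptions added for illustration. It shows that `dict.get(c, c)` returns the new name when one is mapped and falls back to the old name otherwise.

```python
# Stand-in for df.columns; "salary" is a hypothetical column with no entry in the map.
columns = ["name", "age", "joining_dt", "salary"]

col_rename = {"age": "new_age", "name": "new_name", "joining_dt": "new_joining_dt"}

# dict.get(c, c) returns the mapped name if present, otherwise the original name.
new_names = [col_rename.get(c, c) for c in columns]
print(new_names)
```

Because of the fallback, the map only needs entries for the columns you actually want to rename.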
How to rename multiple columns in PySpark
from pyspark.sql.functions import col

col_rename = {"age": "new_age", "name": "new_name", "joining_dt": "new_joining_dt"}

# Alias each column to its new name; columns missing from the map keep their name
df_with_col_renamed = df.select([col(c).alias(col_rename.get(c, c)) for c in df.columns])
>>> df_with_col_renamed.printSchema()
root
 |-- new_name: string (nullable = true)
 |-- new_age: integer (nullable = true)
 |-- new_joining_dt: date (nullable = true)
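If you later want the original names back, one option is to invert the map and reuse the same lookup. This is a minimal sketch, not part of the original recipe: `col_restore` and `renamed_columns` are hypothetical names, the list stands in for `df_with_col_renamed.columns`, and the inversion assumes every new name is unique.

```python
col_rename = {"age": "new_age", "name": "new_name", "joining_dt": "new_joining_dt"}

# Invert the mapping: new name -> original name (assumes the new names are unique).
col_restore = {new: old for old, new in col_rename.items()}

# Stand-in for df_with_col_renamed.columns.
renamed_columns = ["new_name", "new_age", "new_joining_dt"]

# Same lookup-with-fallback as before, now going in the other direction.
restored = [col_restore.get(c, c) for c in renamed_columns]
print(restored)
```

With a real DataFrame, the inverted map could be plugged into the same `select` expression, for example `df_with_col_renamed.select([col(c).alias(col_restore.get(c, c)) for c in df_with_col_renamed.columns])`.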
How to rename multiple columns in Spark using Scala
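import org.apache.spark.sql.functions.col

val colToRename = Map("age" -> "new_age", "name" -> "new_name", "joining_dt" -> "new_joining_dt")

// Alias each column to its new name; columns missing from the map keep their name
val newDf = df.select(
  df.columns.map { oldName =>
    col(oldName).alias(colToRename.getOrElse(oldName, oldName))
  }: _*
)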