Spark – How to rename multiple columns in DataFrame

In the last post we showed how to apply a function to multiple columns. If you have done that, you may now have several columns holding the desired data, and you might want to rename them back to their original names.

Let’s say you have the following DataFrame and you want to give all of its columns different names.

>>> df.printSchema()
root
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- joining_dt: date (nullable = true)
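
If you want to follow along, here is a minimal sketch that builds a DataFrame with this schema; the sample rows and the session name are made up purely for illustration.

from datetime import date
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rename-columns-demo").getOrCreate()

# sample rows are placeholders, only the schema matters here
df = spark.createDataFrame(
    [("alice", 30, date(2020, 1, 15)), ("bob", 25, date(2021, 6, 1))],
    schema="name string, age int, joining_dt date",
)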

The first thing you need is a map from the old names to the new names, plus a little functional programming.

How to rename multiple columns in PySpark

from pyspark.sql.functions import col

# map old column names to new ones; columns not in the dict keep their original name
col_rename = {"age": "new_age", "name": "new_name", "joining_dt": "new_joining_dt"}
df_with_col_renamed = df.select([col(c).alias(col_rename.get(c, c)) for c in df.columns])
>>> df_with_col_renamed.printSchema()
root
 |-- new_name: string (nullable = true)
 |-- new_age: integer (nullable = true)
 |-- new_joining_dt: date (nullable = true)
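
If you would rather not rebuild the projection with select, the same mapping can be applied by chaining withColumnRenamed, which is a no-op for names that do not exist in the DataFrame. A minimal sketch, reusing the col_rename dict from above:

df_with_col_renamed = df
for old_name, new_name in col_rename.items():
    # each call returns a new DataFrame; unknown old names are silently ignored
    df_with_col_renamed = df_with_col_renamed.withColumnRenamed(old_name, new_name)

The select/alias version above does all the renaming in a single projection, which tends to keep the query plan tidier when many columns are renamed at once.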

How to rename multiple columns in Spark using Scala

import org.apache.spark.sql.functions.col

val colToRename = Map(
  "age" -> "new_age",
  "name" -> "new_name",
  "joining_dt" -> "new_joining_dt")

// alias every column; names missing from the Map keep their original name
val newDf = df.select(
  df.columns.map { oldName =>
    col(oldName).alias(colToRename.getOrElse(oldName, oldName))
  }: _*)