Background
In the last post, we saw how to filter data in a DataFrame provided by the pandas library in Python. In this post, we will see how to manipulate string and date columns in a DataFrame.
String and Date manipulation in DataFrame with Pandas
String manipulation
You can use df.dtypes to check the data types of the columns in a DataFrame, and df.astype to convert them. In this section, we will see how to manipulate string columns in a DataFrame. You do this through the .str accessor, which applies string operations to every value in the column.
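Before turning to .str, here is a minimal sketch of the dtype check and conversion mentioned above. The two-column frame and the "Employee Id" column are hypothetical and used only for illustration; they are not part of the example that follows.

```python
import pandas as pd

# Hypothetical frame, used only to illustrate dtypes and astype
df = pd.DataFrame({"Name": ["Aniket", "Anvi"], "Employee Id": [101, 102]})

# df.dtypes returns a Series mapping each column name to its dtype
print(df.dtypes)

# astype returns a new DataFrame; the original df is left unchanged
converted = df.astype({"Employee Id": "float64"})
print(converted.dtypes)
```

Passing a dictionary to astype converts only the listed columns, so the "Name" column keeps its original type.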
Code:
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aniket", "abhijit", "awantika", "Anvi"],
    "Role": ["IT Dev", "Finance Analyst", "IT QA", "Fun Play"],
    "Joining Date": [20190101, 20200101, 20210101, 20220101],
})
print(df)
print(df.dtypes)

# .str accessor to operate on string data type
df["Name_initials"] = df["Name"].str[0:3]
df["Department"] = df["Role"].str.split(" ", expand=True)[1]

# Use + operator to concatenate data
df["Name_Department_Combined"] = df["Name"] + "_" + df["Department"]

# Chain operations to get results in one line
df["Capitalized_Initials"] = df["Name"].str.capitalize().str[0:3]
print(df.to_string())
Output:
       Name             Role  Joining Date
0    Aniket           IT Dev      20190101
1   abhijit  Finance Analyst      20200101
2  awantika            IT QA      20210101
3      Anvi         Fun Play      20220101
Name            object
Role            object
Joining Date     int64
dtype: object
       Name             Role  Joining Date Name_initials Department Name_Department_Combined Capitalized_Initials
0    Aniket           IT Dev      20190101           Ani        Dev               Aniket_Dev                  Ani
1   abhijit  Finance Analyst      20200101           abh    Analyst          abhijit_Analyst                  Abh
2  awantika            IT QA      20210101           awa         QA              awantika_QA                  Awa
3      Anvi         Fun Play      20220101           Anv       Play                Anvi_Play                  Anv
The above example shows several ways to manipulate a string column in a DataFrame: slicing, splitting, concatenating, and chaining operations.
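The .str accessor offers many more methods than the ones used above. As a small sketch, assuming the same names held in a standalone Series (the variable names here are illustrative only):

```python
import pandas as pd

names = pd.Series(["Aniket", "abhijit", "awantika", "Anvi"])

# Upper-case every value
upper = names.str.upper()

# Length of each string
lengths = names.str.len()

# Case-sensitive prefix check: only names starting with a capital "A" match
starts_with_a = names.str.startswith("A")

# Case-insensitive substring search
contains_an = names.str.contains("an", case=False)

print(pd.DataFrame({
    "Name": names,
    "Upper": upper,
    "Length": lengths,
    "Starts with A": starts_with_a,
    "Contains an": contains_an,
}).to_string())
```

The boolean results, such as starts_with_a, can be used directly as a filter, for example names[starts_with_a].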
Date manipulation
Similar to strings, we can manipulate date columns in pandas using the .dt accessor. We will use the same example as above for date manipulation. Date columns use a different data type called "datetime64[ns]". This data type represents a timestamp, meaning a date together with a time.
You can try executing the code below to see how this data type works:
- print(pd.to_datetime("20240701"))
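As a slightly longer sketch of the same call: when given a single string, pd.to_datetime returns a pandas Timestamp, and its date parts can be read as attributes. The variable name ts is illustrative only.

```python
import pandas as pd

ts = pd.to_datetime("20240701")

# A single string is parsed into a pandas Timestamp
print(type(ts).__name__)

# Individual parts of the timestamp are available as attributes
print(ts.year, ts.month, ts.day)

# The time part defaults to midnight when only a date is given
print(ts.hour, ts.minute, ts.second)
```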
In the DataFrame example from the string section, we have a "Joining Date" column, and as you can see in the output of that code, its dtype is currently int64. We need to convert it to a datetime type before doing any date manipulation. As mentioned above, df.astype() can convert column types, but because these dates are stored as integers in YYYYMMDD form, the code below uses pd.to_datetime() with an explicit format instead.
Code:
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aniket", "abhijit", "awantika", "Anvi"],
    "Role": ["IT Dev", "Finance Analyst", "IT QA", "Fun Play"],
    "Joining Date": [20190101, 20200202, 20210303, 20220404],
})

# Parse the YYYYMMDD integers into datetime values
df['Joining Date'] = pd.to_datetime(df['Joining Date'], format='%Y%m%d')
# If the date were already in the standard YYYY-MM-DD format, you could use the line below instead
# df = df.astype({"Joining Date": "datetime64[ns]"})
print(df.dtypes)

# .dt accessor to extract parts of the date
df["Joining Year"] = df["Joining Date"].dt.year
df["Joining Month"] = df["Joining Date"].dt.month
df["Joining Day"] = df["Joining Date"].dt.day
print(df.to_string())
Output:
Name                    object
Role                    object
Joining Date    datetime64[ns]
dtype: object
       Name             Role Joining Date  Joining Year  Joining Month  Joining Day
0    Aniket           IT Dev   2019-01-01          2019              1            1
1   abhijit  Finance Analyst   2020-02-02          2020              2            2
2  awantika            IT QA   2021-03-03          2021              3            3
3      Anvi         Fun Play   2022-04-04          2022              4            4
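Beyond year, month, and day, the .dt accessor exposes other date operations. The following is a minimal sketch using the same joining dates in a standalone Series; the variable names and the chosen output format are illustrative assumptions.

```python
import pandas as pd

# Same YYYYMMDD integers as in the example above
dates = pd.to_datetime(
    pd.Series([20190101, 20200202, 20210303, 20220404]), format="%Y%m%d"
)

# Name of the weekday for each date
day_names = dates.dt.day_name()

# Quarter of the year (1 to 4)
quarters = dates.dt.quarter

# Format each date as a DD/MM/YYYY string
formatted = dates.dt.strftime("%d/%m/%Y")

print(pd.DataFrame({
    "Date": dates,
    "Day name": day_names,
    "Quarter": quarters,
    "Formatted": formatted,
}).to_string())
```

Note that dt.strftime returns strings, so the resulting column no longer supports the .dt accessor.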