Python has a wide range of applications, and data analysis and manipulation is one of them.

In Python, we can use the Pandas library to analyze and manipulate data and to uncover useful patterns in it.

What is Pandas?

Pandas is a Python library for working with sequential and tabular data. It provides functionality to manage, analyze, and manipulate data in a simple and efficient way.

We can think of its data structures as analogous to database tables or spreadsheets.

Pandas is built on top of the NumPy library. Its two primary data structures are the Series (1D data) and the DataFrame (2D data). It can work with homogeneous as well as heterogeneous data.
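As a small illustration of the two structures, the sketch below builds one Series and one DataFrame. The column names and values are made up for the example.

```python
import pandas as pd

# A Series is one-dimensional: a single column of values with an index
s = pd.Series([10, 20, 30], name='qty')

# A DataFrame is two-dimensional: several columns sharing one index
df = pd.DataFrame({'qty': [10, 20, 30], 'item': ['A', 'B', 'C']})

print(s.ndim)    # 1
print(df.ndim)   # 2

# Selecting a single column of a DataFrame gives back a Series
col = df['qty']
print(type(col).__name__)   # Series
```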

Features of Pandas

  • Time-series manipulation tools
  • Works with missing data (NaN)
  • Works with different data files (xls, db, csv, psv, hdf5, etc.)
  • ETL tools (Extract, Transform and Load)
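To make two of these features concrete, here is a minimal sketch of missing-data handling and time-series resampling. The dates and readings are invented for illustration, and filling gaps with 0.0 is only one possible choice.

```python
import numpy as np
import pandas as pd

# Daily readings with one missing value (NaN); the data is made up
dates = pd.date_range('2022-01-01', periods=4, freq='D')
readings = pd.Series([1.0, np.nan, 3.0, 4.0], index=dates)

# Missing data: count the gaps, then fill them with a chosen default
n_missing = readings.isna().sum()
filled = readings.fillna(0.0)

# Time series: aggregate the daily values into 2-day sums
two_day = filled.resample('2D').sum()

print(n_missing)
print(two_day)
```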

What is a DataFrame?

A Pandas DataFrame is a heterogeneous 2D object: the data within each column share one type, but different columns can have different types. Its rows are labeled with an index (implicit or explicit).
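The per-column typing can be checked with the dtypes attribute. The sketch below uses a made-up three-column table; the exact dtype names printed may vary between pandas versions.

```python
import pandas as pd

# Each column holds one kind of data: integers, strings, floats
df = pd.DataFrame({
    'Sr. no': [1, 2, 3],
    'Item': ['A', 'B', 'C'],
    'Price': [9.5, 12.0, 7.25],
})

# dtypes lists the data type pandas chose for every column
print(df.dtypes)
```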

In simple words, a DataFrame is like a table in a database.

The index can be implicit, starting at 0, or we can supply our own. An index can even consist of dates and times.
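The difference between an implicit and an explicit index might look like this. The labels 'x', 'y' and 'z' are arbitrary choices for the example.

```python
import pandas as pd

items = ['A', 'B', 'C']

# Implicit index: pandas numbers the rows 0, 1, 2
df_default = pd.DataFrame({'Item': items})

# Explicit index: row labels of our own choosing
df_labeled = pd.DataFrame({'Item': items}, index=['x', 'y', 'z'])

print(df_default.index.tolist())     # [0, 1, 2]
print(df_labeled.loc['y', 'Item'])   # B
```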

Let us now work with DataFrames.

Creating an empty DataFrame

import pandas as pd
df1 = pd.DataFrame()
print(df1)

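Creating an empty DataFrame with a defined structure

import pandas as pd
# columns only: the DataFrame has a header but no rows
df1 = pd.DataFrame(columns=['Sr. no','Item','Desc'])
print(df1)

# columns and index: rows 1 to 9 exist and every cell is NaN
df2 = pd.DataFrame(columns=['Sr. no','Item','Desc'],index=range(1,10))
print(df2)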

Creating a DataFrame passing NumPy array

import numpy as np
import pandas as pd
# dictionary of NumPy arrays: each key becomes a column
arr = {'Sr. no' : np.array([1,2,3,4]),
      'Items'  : np.array(['A','B','C','D'])}
df1 = pd.DataFrame(arr)
print(df1)

Creating a DataFrame passing a Dictionary

import pandas as pd
dict1 = {1:'A',2:'B',3:'C',4:'D'}
# wrapping the dictionary in a list gives one row; the keys become column labels
df1 = pd.DataFrame([dict1])
print(df1)
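If the keys should label rows instead of columns, one option is DataFrame.from_dict with orient='index'. This is a sketch of an alternative layout, not part of the example above, and the column name 'Item' is an assumption made for illustration.

```python
import pandas as pd

dict1 = {1: 'A', 2: 'B', 3: 'C', 4: 'D'}

# Keys become the row index; the values fill a single column named 'Item'
df_rows = pd.DataFrame.from_dict(dict1, orient='index', columns=['Item'])

print(df_rows)
print(df_rows.shape)   # (4, 1)
```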

Creating a DataFrame with datetime index

import pandas as pd
arr = {'Sr. no' : [1,2,3,4],
      'Items'  : ['A','B','C','D']}
indx = pd.DatetimeIndex(['2021-12-30','2021-12-31','2022-01-01','2022-01-02'])
df1 = pd.DataFrame(arr,index=indx)
print(df1)
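A datetime index also allows rows to be selected by date. The sketch below rebuilds the same four dates with pd.date_range and then selects rows with .loc, once by a partial date string and once by a date range.

```python
import pandas as pd

arr = {'Sr. no': [1, 2, 3, 4],
       'Items': ['A', 'B', 'C', 'D']}

# date_range generates four consecutive days starting on 2021-12-30
indx = pd.date_range('2021-12-30', periods=4, freq='D')
df1 = pd.DataFrame(arr, index=indx)

# Select every row that falls in 2022 using a partial date string
rows_2022 = df1.loc['2022']
print(rows_2022)

# Select a range of dates (both ends are included)
year_end = df1.loc['2021-12-30':'2021-12-31']
print(year_end)
```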

Viewing DataFrame

import pandas as pd
arr = {'Sr. no' : [1,2,3,4],
      'Items'  : ['A','B','C','D']}
indx = pd.DatetimeIndex(['2021-12-30','2021-12-31','2022-01-01','2022-01-02'])
df1 = pd.DataFrame(arr,index=indx)
print(df1)   # print the whole DataFrame

# get first two rows
print(df1.head(2))

# get last two rows
print(df1.tail(2))

# get DataFrame's index
print(df1.index)

# get DataFrame's columns
print(df1.columns)

# get DataFrame's values
print(df1.values)
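A few other attributes and methods are often useful when inspecting a DataFrame. The selection below is a sketch, not a complete list; to_numpy() returns the same underlying data as the values attribute.

```python
import pandas as pd

df1 = pd.DataFrame({'Sr. no': [1, 2, 3, 4],
                    'Items': ['A', 'B', 'C', 'D']})

# number of rows and columns
print(df1.shape)        # (4, 2)

# data type of each column
print(df1.dtypes)

# summary statistics for the numeric columns
summary = df1.describe()
print(summary)

# the data as a NumPy array
values = df1.to_numpy()
print(values)
```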

Importing Data in Pandas

A Pandas DataFrame can be loaded from several data formats; the most common include csv, psv, xls, json, sql, and hdf5.

We will look at a few examples of importing data from different formats, along with some customizations.

import pandas as pd
# read a file whose fields are separated by a single blank space
df1 = pd.read_csv('file1.csv',sep=' ')
# read only columns 0 to 3 and only the first 100 rows
df1 = pd.read_csv('file1.csv',usecols=[0,1,2,3],nrows=100)
# read the sheet named 'User' from an Excel file
df1 = pd.read_excel('file1.xls',sheet_name='User')
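To try these options without a file on disk, the text can be wrapped in io.StringIO. The sketch below reads a small pipe-separated (psv) sample; the contents and column names are made up for the example.

```python
import io
import pandas as pd

# Stand-in for a pipe-separated file on disk; the contents are invented
psv_text = "id|item|qty\n1|A|10\n2|B|20\n3|C|30\n"

# sep sets the delimiter, usecols picks columns by name, nrows limits the rows read
df = pd.read_csv(io.StringIO(psv_text), sep='|', usecols=['id', 'qty'], nrows=2)
print(df)
```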

More on Pandas DataFrame in an upcoming post.

Hope it helps!

Happy Learning 🙂
