NumPy: First step towards data analysis using Python. We will cover the following topics:
- Why use NumPy: motivating examples
- NumPy arrays – creation, methods, and attributes
- Basic math with arrays
- Manipulation with arrays
- Using NumPy for simulations
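As a rough preview of the NumPy topics listed above, here is a minimal sketch. The array contents, the variable names, and the coin-flip simulation are illustrative assumptions, not material taken from the course itself.

```python
import numpy as np

# Creation and attributes: a 2x3 array built from a range of integers.
a = np.arange(6).reshape(2, 3)
print(a.shape, a.ndim, a.dtype)

# Basic math: arithmetic is applied element by element, with no explicit loop.
b = a * 2 + 1

# Manipulation: flatten one array and stack two arrays vertically.
flat = a.ravel()
stacked = np.vstack([a, b])

# Simulation: 10,000 fair coin flips (0 = tails, 1 = heads) with a seeded generator.
rng = np.random.default_rng(seed=0)
flips = rng.integers(0, 2, size=10_000)
heads_fraction = flips.mean()
print(heads_fraction)
```

The element-wise expression `a * 2 + 1` is the kind of motivating example usually given for NumPy: the same computation on nested Python lists would need explicit loops.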
Data Analysis with Pandas
- Pandas Series and the operations on them.
- Pandas DataFrames and the operations on them.
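To give a feel for the two Pandas objects named above, the following sketch builds a small Series and a small DataFrame. All values, labels, and column names here are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labelled array.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
total = s.sum()
doubled = s * 2          # arithmetic keeps the index labels

# A DataFrame is a table whose columns are Series sharing one index.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "age": [31, 25, 40]})
df["age_next_year"] = df["age"] + 1   # derive a new column
over_30 = df[df["age"] > 30]          # boolean filtering of rows
print(over_30)
```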
Matplotlib – Needed for visualizing data.
In this course we will not plot with Matplotlib directly, because we will use higher-level libraries for plotting: Seaborn and Pandas. However, since both of these libraries are built on top of Matplotlib, we need to learn its basic terminology and concepts, because we will frequently need to modify the objects and plots that those higher-level libraries produce. This lesson is therefore not a complete introduction to Matplotlib; we will learn just enough to get started visualizing data.
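The idea of modifying an object produced by a higher-level library can be sketched as follows. Pandas' `DataFrame.plot` returns a Matplotlib `Axes`, which can then be adjusted with ordinary Matplotlib methods. The data, labels, and figure size are illustrative assumptions, and the non-interactive `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.axes
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 4, 9, 16]})

# Pandas draws the plot and hands back the Matplotlib Axes it drew on.
ax = df.plot(x="x", y="y", kind="line")

# From here on, everything is plain Matplotlib terminology: Axes and Figure.
ax.set_title("y = x squared")
ax.set_xlabel("x value")
ax.set_ylabel("y value")

fig = ax.get_figure()          # the Figure that contains the Axes
fig.set_size_inches(6, 4)
```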
Exploratory Data Analysis with Seaborn and Pandas
Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. It is used to understand the data, get context about it, understand the variables and the relationships between them, and formulate hypotheses that could be useful when building predictive models.
Assignment_1: EDA with Python Pandas (30-45 mins)
In this assignment, you will learn by doing:
- importing datasets, dealing with missing values, changing data types.
- filtering, sorting, selecting specific column(s).
- dealing with duplicate values, dropping and adding rows and columns.
- counting values, counting unique values.
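The steps listed above might look roughly like this in Pandas. This is a sketch on a tiny made-up dataset (the columns `id`, `city`, `age`, and `income` are hypothetical), not the dataset or the solution used in the assignment.

```python
import io
import pandas as pd

# Importing a dataset: a small inline CSV stands in for a real file.
csv_text = """id,city,age,income
1,Pune,34,52000
2,Delhi,,61000
3,Pune,29,
3,Pune,29,
4,Mumbai,41,75000
"""
df = pd.read_csv(io.StringIO(csv_text))
missing_before = df.isna().sum()          # missing values per column

# Duplicates, missing values, and data types.
df = df.drop_duplicates().reset_index(drop=True)
df["age"] = df["age"].fillna(df["age"].median()).astype(int)
df["income"] = df["income"].fillna(df["income"].median())

# Filtering, sorting, and selecting specific columns.
over_30 = df[df["age"] > 30].sort_values("income", ascending=False)
selected = over_30[["id", "income"]]

# Adding and dropping columns, and dropping a row.
df["income_k"] = df["income"] / 1000
df = df.drop(columns=["income_k"])
without_first = df.drop(index=0)

# Counting values and unique values.
city_counts = df["city"].value_counts()
n_cities = df["city"].nunique()
```

Filling missing values with the median is only one possible choice; dropping the affected rows with `dropna` is another, and which is appropriate depends on the data.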
Assignment_3: Data Analysis of Customer Churn Rate at a Telecom Co. (60-75 mins)
Analyzing a dataset on the churn rate of a telecom operator's clients.
Note: This dataset is different from the ones used in the earlier assignment.
Doing Statistics Using SciPy
The SciPy package contains various toolboxes dedicated to common issues in scientific computing. Its different submodules correspond to different applications, such as interpolation, integration, optimization, image processing, statistics, special functions, etc. SciPy is the core package for scientific routines in Python; it is meant to operate efficiently on NumPy arrays, so that NumPy and SciPy work hand in hand. We will cover only the scipy.stats submodule.
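As a small illustration of how `scipy.stats` operates on NumPy arrays, the sketch below summarizes one sample and compares two samples with a t-test. The two groups are synthetic data generated for this example; their means, spread, and sizes are assumptions.

```python
import numpy as np
from scipy import stats

# Two synthetic samples drawn from normal distributions (illustrative only).
rng = np.random.default_rng(seed=42)
group_a = rng.normal(loc=50, scale=5, size=200)
group_b = rng.normal(loc=52, scale=5, size=200)

# Descriptive statistics: count, min/max, mean, variance, skewness, kurtosis.
summary = stats.describe(group_a)

# Independent two-sample t-test on the group means.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Working with a distribution object: P(Z <= 1.96) for a standard normal.
prob = stats.norm.cdf(1.96)
print(summary.mean, t_stat, p_value, prob)
```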
Module 6_A: Time Series Analysis – Introduction & terminology
Module 6_B: Time Series Analysis – Application in real-time
Module 6_C: Time Series Forecasting – Concepts & Problem Definition
Module 6_D: Time Series Forecasting – Problem Solving
Module_6 deals with problems such as predicting the electricity consumption of a household for the next three months, estimating traffic on roads at certain periods, and predicting the price at which a stock will trade on the BSE or NSE.
They all fall under the concept of time series data. You cannot accurately predict any of these results without the ‘time’ component. And as more and more data is generated in the world around us, time series forecasting is becoming an ever more critical technique for a data analyst or business analyst to master.
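To make the role of the time component concrete, here is a minimal sketch using a synthetic daily electricity-consumption series. The values, the date range, and the naive "tomorrow equals today" forecast are illustrative assumptions, not the methods taught in Module 6.

```python
import numpy as np
import pandas as pd

# Synthetic daily consumption in kWh, indexed by date (made-up data).
dates = pd.date_range("2023-01-01", periods=90, freq="D")
rng = np.random.default_rng(seed=1)
consumption = pd.Series(10 + rng.normal(0, 1, size=90), index=dates, name="kwh")

# The datetime index is what enables time-aware operations.
weekly_mean = consumption.resample("W").mean()    # aggregate by calendar week
rolling_7d = consumption.rolling(window=7).mean() # 7-day moving average

# A naive baseline forecast: predict each day with the previous day's value.
naive_forecast = consumption.shift(1)
mae = (consumption - naive_forecast).abs().mean() # mean absolute error
print(mae)
```

A baseline like this is mainly useful as a reference point: a forecasting model is only worth its complexity if it beats the naive error.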
Compulsory projects to be completed as part of the course.