chaptor-2 Data Analysis Using Python in Google Colab
Data Analysis Using Python in Google Colab
What is Google Colab?
Google Colaboratory is a free online platform where you can write and run Python code directly in your browser without installing software.
Features
- Free Python environment
- Cloud storage support
- GPU/TPU support
- Easy sharing
- Best for Data Analysis & Machine Learning
Step 1: Open Google Colab
Open: Google Colab
Click:
- New Notebook
Step 2: Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Step 3: Create Sample Dataset
data = {
'Student': ['Amit', 'Rahul', 'Sneha', 'Pooja', 'Karan'],
'Marks': [85, 72, 90, 67, 78],
'City': ['Nagpur', 'Pune', 'Mumbai', 'Delhi', 'Nagpur']
}
df = pd.DataFrame(data)
print(df)
Output
| Student | Marks | City |
|---|---|---|
| Amit | 85 | Nagpur |
| Rahul | 72 | Pune |
| Sneha | 90 | Mumbai |
| Pooja | 67 | Delhi |
| Karan | 78 | Nagpur |
Step 4: Basic Data Analysis
Display First Rows
print(df.head())
Dataset Information
print(df.info())
Statistical Summary
print(df.describe())
Step 5: Filter Data
Students with Marks Greater Than 75
high_marks = df[df['Marks'] > 75]
print(high_marks)
Output
| Student | Marks | City |
|---|---|---|
| Amit | 85 | Nagpur |
| Sneha | 90 | Mumbai |
| Karan | 78 | Nagpur |
Step 6: Calculate Average Marks
average = df['Marks'].mean()
print("Average Marks:", average)
Step 7: Group By Example
City Wise Average Marks
city_avg = df.groupby('City')['Marks'].mean()
print(city_avg)
Step 8: Data Visualization
Bar Chart
plt.bar(df['Student'], df['Marks'])
plt.title("Student Marks Analysis")
plt.xlabel("Students")
plt.ylabel("Marks")
plt.show()
Step 9: Upload Excel/CSV File in Colab
from google.colab import files
uploaded = files.upload()
After upload:
df = pd.read_csv('filename.csv')
print(df.head())
Step 10: Real-Time Project Ideas
Student Result Analysis
- Highest Marks
- Lowest Marks
- Average Result
- Subject-wise Analysis
Sales Data Analysis
- Monthly Sales
- Profit Analysis
- Product Performance
Employee Analysis
- Salary Analysis
- Department-wise Report
- Attendance Report
Popular Python Libraries for Data Analysis
| Library | Use |
|---|---|
| pandas | Data handling |
| numpy | Numerical operations |
| matplotlib | Charts |
| plotly | Interactive charts |
| sklearn | Machine learning |
Simple Interview Questions
What is Pandas?
Pandas is a Python library used for data analysis and data manipulation.
What is DataFrame?
A DataFrame is a table-like structure in pandas.
What is Google Colab?
Google Colab is an online Python notebook platform.
Mini Practice Task
Create a dataset of:
- Employee Name
- Salary
- Department
Then perform:
- Average Salary
- Highest Salary
- Department-wise grouping
- Bar chart visualization
Comments
Post a Comment