Chapter 2: Pandas Library

1. Why Pandas?

Think of Pandas as the “Excel-plus” of Python: it lets you load data from many sources, rearrange it with a few commands, and save it back out—while keeping everything in regular Python code so it’s easy to automate or share.

2. The Two Core Building Blocks

Object	What it represents	Quick mental picture
`Series`	One-dimensional labelled array	A single column from a spreadsheet
`DataFrame`	Two-dimensional labelled table (rows × columns)	A whole spreadsheet tab

You’ll spend most of your time with `DataFrame`, but it helps to know that each column inside it is itself a `Series`.
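To make the relationship concrete, here is a minimal sketch using a small hand-made table (the names and scores are illustrative, not from a real file). It shows that selecting one column of a `DataFrame` gives back a `Series` that carries the same row labels.

```python
import pandas as pd

# Hypothetical sample data, typed in by hand for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [85, 90]})

scores = df["Score"]                   # selecting one column yields a Series

print(type(df).__name__)               # DataFrame
print(type(scores).__name__)           # Series
print(scores.index.equals(df.index))   # True: the column shares the table's row labels
```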

3. Reading Data In

You have …	Use this function	What you get back
A CSV file	`pd.read_csv("file.csv")`	A `DataFrame` whose column types are inferred from the data
An Excel file	`pd.read_excel("file.xlsx")`	Same idea—column names come from the sheet
Many other formats (JSON, SQL, HTML tables, Parquet…)	`pd.read_*()` family	The pattern is consistent

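Under the hood: by default, `read_csv` uses a parser written in C, so CSV loading is fast. Other readers rely on separate libraries and can be slower, and in every case the whole file must fit in memory unless you read it in pieces (see chunksize in the tips section).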
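As a small sketch of the reading step, the example below parses CSV text held in memory instead of a file on disk, so it runs without any external data. The column names and values are made up for illustration; `io.StringIO` stands in for a file path.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would be a path such as "file.csv".
csv_text = "Name,Age,Score\nAlice,25,85\nBob,30,90\nCharlie,35,88\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)     # (3, 3): three rows, three columns
print(df.dtypes)    # Age and Score are inferred as integer columns

# A dtype hint overrides the inference for the named column.
typed = pd.read_csv(io.StringIO(csv_text), dtype={"Score": "float64"})
print(typed["Score"].dtype)   # float64
```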

4. Writing Data Out

After you finish cleaning or analysing, one line puts the result where colleagues can open it:

df.to_excel("results.xlsx", index=False)   # or df.to_csv(...)

Tip: index=False prevents the row numbers from becoming an extra column.
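The effect of index=False is easiest to see side by side. In this sketch (hand-made data, for illustration only) `to_csv` is called without a path, in which case it returns the CSV text as a string instead of writing a file.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [85, 90]})

with_index = df.to_csv()                 # no path given: returns the CSV as a string
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])        # ,Name,Score   (empty header for the index column)
print(without_index.splitlines()[0])     # Name,Score
```

The same index argument applies to `to_excel`; writing .xlsx files additionally requires an Excel writer package such as openpyxl to be installed.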

5. Grabbing the Data You Need

Situation	Use …	Explanation
“Give me row 0, column 'Score' by its label.”	`df.loc[0, "Score"]`	`loc` = location by label (row index / column name)
“Give me the second row (position 1) by number.”	`df.iloc[1]`	`iloc` = integer location
“I want the raw NumPy matrix for machine-learning.”	`df.to_numpy()` (older code uses `df.values`)	Returns a 2-D `ndarray`; mixed column types give an `object` array

Mnemonic: loc - label, iloc - integer.
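The difference between the two only becomes visible when row labels are not simply 0, 1, 2. The sketch below assumes a hypothetical table whose rows are labelled 10, 20 and 30, and also shows that label slices include their end point while position slices do not.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Score": [85, 90, 88]},
    index=[10, 20, 30],               # hypothetical row labels
)

by_label = df.loc[20, "Score"]        # row whose label is 20
by_position = df.iloc[1, 1]           # second row, second column
print(by_label, by_position)          # 90 90

print(len(df.loc[10:20]))             # 2: label slices include the end label
print(len(df.iloc[0:1]))              # 1: position slices exclude the end, like Python lists
```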

6. A Mini-Walk-through (ties it all together)

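import pandas as pd

df = pd.read_csv("sample.csv")          # 1. Load
print("Read CSV:")
print(df)

df.to_excel("sample.xlsx", index=False) # 2. Save elsewhere (needs the openpyxl package)

alice_score = df.loc[0, "Score"]        # 3a. Label-based lookup
print("Alice's Score:", alice_score)

second_row  = df.iloc[1]                # 3b. Position-based lookup
print("Second row:")
print(second_row)

as_array    = df.to_numpy()             # 4. NumPy array (dtype object here: mixed column types)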

What you would see in the notebook/console:

Read CSV:
      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     88

Alice's Score: 85
Second row:
Name     Bob
Age       30
Score     90
Name: 1, dtype: object

7. Practical Tips & Gotchas

Row labels matter. If your CSV has its own unique ID column (say “EmployeeID”), pass index_col="EmployeeID" to read_csv—then loc feels natural.
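Large files? Pass dtype=... so Pandas does not have to infer column types, or chunksize=... to read the file in pieces instead of all at once.
Copy vs view. Chained indexing such as df[df["Age"] > 30]["Score"] = 0 may assign to a temporary copy, so the change can be silently lost; Pandas flags this with SettingWithCopyWarning. Select and assign in a single .loc call instead: df.loc[df["Age"] > 30, "Score"] = 0.
NumPy interop. Whether df.values shares memory with the DataFrame depends on the column types and the Pandas version, so do not rely on it either way. Call df.to_numpy(copy=True) when you need an independent array.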
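The tips above can be sketched in one short script. The data is invented for illustration ("EmployeeID" is the example column name from the first tip), and `io.StringIO` stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content with its own ID column.
csv_text = "EmployeeID,Name,Age\nE1,Alice,25\nE2,Bob,30\nE3,Charlie,35\n"

# Row labels: use the ID column as the index, so loc takes an employee ID.
df = pd.read_csv(io.StringIO(csv_text), index_col="EmployeeID")
print(df.loc["E2", "Name"])          # Bob

# Large files: read in chunks of two rows; each chunk is an ordinary DataFrame.
total_age = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_age += chunk["Age"].sum()
print(total_age)                     # 90

# Explicit update: select and assign in a single .loc call.
df.loc[df["Age"] > 28, "Age"] += 1
print(df["Age"].tolist())            # [25, 31, 36]

# NumPy interop: an explicit copy is independent of the DataFrame.
arr = df[["Age"]].to_numpy(copy=True)
arr[0, 0] = 0
print(df.loc["E1", "Age"])           # 25, the DataFrame is unchanged
```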

8. Where to Go Next

Filtering: Boolean masks (df[df["Score"] > 90]).
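Grouping + summarising: df.groupby("Age")["Score"].mean(). Select the numeric column first; recent Pandas versions raise an error when asked to average a text column such as Name.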
Merging: SQL-style joins with pd.merge().
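As a brief preview, the three techniques might look like this on the walk-through data. The second table and its "Team" column are hypothetical, added only so there is something to merge and group by.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Score": [85, 90, 88],
})

# Filtering: keep the rows where the Boolean mask is True.
high = df[df["Score"] > 86]
print(high["Name"].tolist())         # ['Bob', 'Charlie']

# Merging: SQL-style inner join on the shared "Name" column.
teams = pd.DataFrame({               # hypothetical lookup table
    "Name": ["Alice", "Bob", "Charlie"],
    "Team": ["A", "B", "A"],
})
merged = pd.merge(df, teams, on="Name", how="inner")

# Grouping + summarising: mean score per team.
mean_by_team = merged.groupby("Team")["Score"].mean()
print(mean_by_team["A"], mean_by_team["B"])   # 86.5 90.0
```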

Master these three techniques alongside today’s basics, and you’ll cover most everyday data tasks.

Key takeaway: With pd.read_csv → DataFrame → loc/iloc → to_excel, you can already build a complete data pipeline—read, manipulate, and export. Everything else in Pandas extends or refines these fundamentals.

Data Structures and Object-Oriented Programming