Pandas: .loc vs .iloc
CONCEPTUAL
Explain the difference between .loc and .iloc indexers in Pandas DataFrames. Provide examples of how and when you would use each.
Explanation: .loc vs .iloc
In Pandas, .loc and .iloc are the primary accessors used for selecting data from DataFrames (and Series). While both are used for slicing and selecting, they differ fundamentally in how they interpret the input used for selection.
Core Distinction:
.loc (label-based selection):
- Selects data based on the actual labels of the rows and columns.
- The row and column identifiers you provide to .loc are interpreted as labels from the DataFrame's index and column names.
- When slicing with labels (e.g., df.loc['start_label':'end_label']), both the start and end labels are inclusive.
.iloc (integer position-based selection):
- Selects data based on the integer positions (0-based index) of the rows and columns, similar to how you would index a Python list or a NumPy array.
- The row and column identifiers you provide to .iloc are interpreted as integer positions.
- When slicing with integer positions (e.g., df.iloc[start_pos:end_pos]), the start position is inclusive and the end position is exclusive (standard Python slicing behavior).
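The difference is easiest to see when integer labels do not line up with positions. The following is a minimal sketch; the Series `s` and its values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical Series whose integer labels (10, 20, 30) differ from positions (0, 1, 2)
s = pd.Series(["a", "b", "c"], index=[10, 20, 30])

by_label = s.loc[10]     # interprets 10 as a label
by_position = s.iloc[0]  # interprets 0 as a position

print(by_label, by_position)  # both refer to the first element, "a"

# The same integer means different things to each indexer:
# s.loc[0] would fail because no label 0 exists,
# s.iloc[10] would fail because there is no 11th position.
```

Here `s.loc[10]` and `s.iloc[0]` reach the same element by two different routes, which is why swapping the indexers silently changes what a number means.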
Detailed Breakdown & Examples:
Let's consider a sample DataFrame:
import pandas as pd
import numpy as np
data = {'col_A': [10, 20, 30, 40, 50],
        'col_B': ['p', 'q', 'r', 's', 't'],
        'col_C': np.random.rand(5)}
df = pd.DataFrame(data, index=['row1', 'row2', 'row3', 'row4', 'row5'])
print("Original DataFrame:\n", df)
# Output:
# Original DataFrame:
#       col_A col_B     col_C
# row1     10     p  0.123...
# row2     20     q  0.456...
# row3     30     r  0.789...
# row4     40     s  0.987...
# row5     50     t  0.654...
Using .loc (Label-based):
- Selecting a single row by label:
    print("\nRow 'row2' using .loc:\n", df.loc['row2'])
- Selecting multiple rows by a list of labels:
    print("\nRows 'row1' and 'row3' using .loc:\n", df.loc[['row1', 'row3']])
- Selecting a range of rows by label (inclusive):
    print("\nRows 'row2' to 'row4' using .loc:\n", df.loc['row2':'row4'])
- Selecting a single cell by row and column labels, df.loc[row_label, col_label]:
    print("\nCell at ('row3', 'col_B') using .loc:", df.loc['row3', 'col_B'])
- Selecting specific rows and specific columns by labels:
    print("\nRows 'row1','row4' and Cols 'col_A','col_C' using .loc:\n", df.loc[['row1', 'row4'], ['col_A', 'col_C']])
- Selecting rows based on a boolean condition (powerful!):
    print("\nRows where col_A > 25 using .loc:\n", df.loc[df['col_A'] > 25])
- Setting values using labels:
    df_copy = df.copy()
    df_copy.loc['row1', 'col_A'] = 100
    print("\nDataFrame after setting ('row1', 'col_A') to 100 with .loc:\n", df_copy)
Using .iloc (Integer position-based):
- Selecting a single row by integer position:
    print("\nRow at position 1 (second row) using .iloc:\n", df.iloc[1])
- Selecting multiple rows by a list of integer positions:
    print("\nRows at positions 0 and 2 using .iloc:\n", df.iloc[[0, 2]])
- Selecting a range of rows by integer position (exclusive end):
    print("\nRows from position 1 up to (not including) 4 using .iloc:\n", df.iloc[1:4])
- Selecting a single cell by row and column integer positions, df.iloc[row_pos, col_pos]:
    print("\nCell at (pos 2, pos 1) using .iloc:", df.iloc[2, 1])  # row3, col_B
- Selecting specific rows and specific columns by integer positions:
    print("\nRows at pos 0,3 and Cols at pos 0,2 using .iloc:\n", df.iloc[[0, 3], [0, 2]])
- Selecting all rows for specific columns by integer position:
    print("\nAll rows for columns at pos 0 and 2 using .iloc:\n", df.iloc[:, [0, 2]])
- Setting values using integer positions:
    df_copy_iloc = df.copy()
    df_copy_iloc.iloc[0, 0] = 1000
    print("\nDataFrame after setting (pos 0, pos 0) to 1000 with .iloc:\n", df_copy_iloc)
Important Note on Slicing:
- df.loc['label1':'label3'] is **inclusive** of 'label3'.
- df.iloc[0:3] is **exclusive** of position 3 (i.e., it gets positions 0, 1, 2).
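The endpoint rule is easiest to trip over on a default integer index, where labels and positions coincide and the two slices look identical. A small sketch, using a hypothetical `df_demo` frame that is not part of the example above:

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex: labels 0..4 equal positions 0..4
df_demo = pd.DataFrame({"x": [10, 20, 30, 40, 50]})

label_rows = df_demo.loc[0:2]      # labels 0, 1, 2 (end label included)
position_rows = df_demo.iloc[0:2]  # positions 0, 1 (end position excluded)

print(len(label_rows))     # 3
print(len(position_rows))  # 2
```

The slice text `0:2` is the same in both calls, yet `.loc` returns one more row than `.iloc`.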
When to Use Which:
| Scenario | Use .loc | Use .iloc |
|---|---|---|
| You know the row/column labels (names). | ✔️ Yes (e.g., df.loc['row_name', 'column_name']) | ❌ No (unless labels happen to be integers that match positions) |
| You want to select data by its position, regardless of labels. | ❌ No | ✔️ Yes (e.g., df.iloc[0, 1] for first row, second column) |
| Row/column labels are not integers (e.g., strings, datetimes). | ✔️ Yes | ❌ Not by label (.iloc takes integer positions, never labels) |
| Row/column labels *are* integers, but you want to refer to them as *labels*. | ✔️ Yes (e.g., if index is [10, 20, 30], df.loc[10] uses label 10) | Use with caution (df.iloc[10] would try to get the 11th row by position, which might be a different row or out of bounds) |
| You need to select rows based on a boolean condition. | ✔️ Yes (e.g., df.loc[df['col'] > 5]) | Possible, but less direct. The boolean Series usually has to be converted to a NumPy array first: df.iloc[df['col'].values > 5] or df.iloc[(df['col'] > 5).to_numpy()]. .loc is more natural for this. |
| Slicing behavior for the end point. | Inclusive ('start':'end' includes 'end') | Exclusive (start:end excludes end) |
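The boolean-condition row can be sketched as follows. The frame `df_demo` and the column name `col` are hypothetical, and the sketch assumes a pandas version that provides `Series.to_numpy()`.

```python
import pandas as pd

# Hypothetical frame used only to compare the two filtering styles
df_demo = pd.DataFrame({"col": [3, 7, 9]}, index=["a", "b", "c"])

mask = df_demo["col"] > 5  # boolean Series aligned on the index labels

via_loc = df_demo.loc[mask]               # .loc accepts the boolean Series directly
via_iloc = df_demo.iloc[mask.to_numpy()]  # .iloc is given a plain boolean array instead

print(via_loc.equals(via_iloc))  # True: both keep rows "b" and "c"
```

Both calls return the same rows; the `.loc` form simply avoids the extra conversion step.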
Key Considerations:
- Clarity and Readability: Use .loc when your selection logic is based on meaningful labels. This often makes code easier to understand. Use .iloc when the numerical position is what matters.
- Avoiding Ambiguity: If your DataFrame has an integer index (e.g., 0, 1, 2, ...), using [] directly for selection (e.g., df[0:2] or df['col_name']) can sometimes be ambiguous or lead to unexpected behavior depending on whether it's interpreted as label or position. .loc and .iloc are explicit and therefore preferred to avoid this ambiguity.
- Potential for Errors: Mixing up .loc and .iloc results in errors. For example, providing a string label to .iloc raises a TypeError, and asking .loc for a label that is not in the index raises a KeyError.
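The failure modes mentioned above can be checked with a short sketch. The helper `error_name` and the frame `df_demo` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({"col_A": [10, 20, 30]}, index=["row1", "row2", "row3"])

def error_name(action):
    """Run a zero-argument callable and return the name of the exception it raises."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None  # no exception was raised

label_to_iloc = error_name(lambda: df_demo.iloc["row1"])  # label passed to .iloc
missing_label = error_name(lambda: df_demo.loc["row9"])   # label not in the index
out_of_bounds = error_name(lambda: df_demo.iloc[10])      # position past the last row

print(label_to_iloc, missing_label, out_of_bounds)
```

On current pandas releases this should report a TypeError, a KeyError, and an IndexError respectively, so the exception type is often a quick hint about which indexer was misused.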
In summary: Use .loc for label-based indexing and .iloc for purely integer-based positional indexing. Being explicit with these accessors leads to more robust and readable Pandas code.