How to select data with pandas.DataFrame.loc in Python

When working with pandas, one of the most powerful features at your disposal is the ability to select data from a DataFrame using the loc indexer. This allows you to access rows and columns by labels, which can be incredibly useful when you’re dealing with large datasets where you want to slice and dice your data based on specific criteria.

To get started, let’s consider a simple DataFrame. Here’s how you can create one:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)

Now that you have a DataFrame, you can use loc to select rows by their index labels. Because this DataFrame uses the default integer index, the label of Alice’s row is 0, so you can access it like this:

alice_row = df.loc[0]

This returns a Series containing every value in the row whose index label is 0. You can also select multiple rows by passing a list of labels:

selected_rows = df.loc[[0, 2]]

This returns a new DataFrame containing only the rows for ‘Alice’ and ‘Charlie’. loc isn’t limited to row selection, though; you can also select specific columns. For example, to get the names and cities of those rows:

names_and_cities = df.loc[[0, 2], ['Name', 'City']]

Here, you’re selecting rows and columns in a single step. This is particularly useful when you need to focus on certain attributes while ignoring others.
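Because the DataFrame above uses the default integer index, its labels happen to match the row positions, which hides the difference between loc (labels) and iloc (positions). The sketch below rebuilds the same data with a deliberately shuffled index (the labels 30, 10, 40, 20 are illustrative assumptions) and also shows how the type of the result depends on whether you pass a scalar or a list.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
}
# Illustrative, non-positional labels: Alice=30, Bob=10, Charlie=40, David=20
df = pd.DataFrame(data, index=[30, 10, 40, 20])

by_label = df.loc[10]      # row whose label is 10 -> Bob
by_position = df.iloc[0]   # first row by position -> Alice

# The return type depends on what you pass
row_series = df.loc[30]              # scalar row label -> Series
row_frame = df.loc[[30]]             # list of labels -> one-row DataFrame
single_value = df.loc[30, 'Name']    # row label + column label -> scalar

try:
    df.loc[0]                        # 0 is a position here, not a label
except KeyError:
    print("0 is not a label in this index")

print(by_label['Name'], by_position['Name'])  # Bob Alice
print(type(row_series).__name__, type(row_frame).__name__, single_value)  # Series DataFrame Alice
```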

One common task is filtering data based on conditions. For example, to select all rows where the age is greater than 25, you can pass a boolean condition to loc:

age_filter = df.loc[df['Age'] > 25]

This returns a DataFrame with only the rows where the age is greater than 25. Filtering like this lets you home in on the relevant subsets of your data. You can also build more complex queries by combining conditions with the element-wise operators & (and), | (or), and ~ (not), wrapping each condition in parentheses:

complex_filter = df.loc[(df['Age'] > 20) & (df['City'] == 'New York')]

This returns rows where the age is above 20 and the city is New York. The ability to combine these conditions gives you a lot of flexibility in how you explore and analyze your datasets.
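A short sketch of all three operators on the same sample data. The parentheses matter because & and | bind more tightly than comparison operators such as > and ==; Python’s own and/or keywords don’t work here, since they try to reduce a whole Series to a single True or False and pandas raises a ValueError instead.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# Each comparison gets its own parentheses
and_rows = df.loc[(df['Age'] > 20) & (df['City'] == 'New York')]
or_rows = df.loc[(df['Age'] > 30) | (df['City'] == 'Chicago')]
not_rows = df.loc[~(df['City'] == 'New York')]

# The 'and' keyword asks for the truth value of a whole Series, which is ambiguous
keyword_failed = False
try:
    df.loc[(df['Age'] > 20) and (df['City'] == 'New York')]
except ValueError:
    keyword_failed = True

print(and_rows['Name'].tolist())  # ['Alice']
print(or_rows['Name'].tolist())   # ['Charlie', 'David']
print(not_rows['Name'].tolist())  # ['Bob', 'Charlie', 'David']
```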

Another useful aspect of loc is label-based slicing. If your DataFrame is indexed by meaningful labels rather than the default integers, you can select a range of rows by specifying the starting and ending labels. Here, the Name column is first turned into the index:

df = df.set_index('Name')  # Name values become the row labels
sliced_df = df.loc['Alice':'Charlie']

This will return all rows in the DataFrame from ‘Alice’ to ‘Charlie’, inclusive. Such label-based slicing can make your code cleaner and easier to understand, especially when working with datasets that have meaningful indices.
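To see the inclusive stop label next to ordinary positional slicing, the sketch below compares loc with iloc on the Name-indexed data. iloc follows Python’s usual convention and stops before the end position.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
}).set_index('Name')

by_label = df.loc['Alice':'Charlie']  # label slice: includes 'Charlie'
by_position = df.iloc[0:2]            # positional slice: excludes position 2

print(by_label.index.tolist())     # ['Alice', 'Bob', 'Charlie']
print(by_position.index.tolist())  # ['Alice', 'Bob']
```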

As you dive deeper into pandas, mastering the loc indexer will be crucial for efficient data manipulation. Its ability to handle both row and column selection based on labels empowers you to perform a wide range of data extraction tasks. The more comfortable you become with loc, the more intuitive your data analysis process will be, which will allow you to focus on deriving insights rather than getting bogged down in the mechanics of data retrieval.

Mastering advanced indexing techniques with loc

Beyond simple selection and slicing, loc supports some advanced indexing techniques that can significantly extend what you can do with your data. One of these is passing boolean arrays directly to loc to filter rows dynamically, combined with column selection to retrieve only the relevant data.

For example, suppose you want to get the names of people who live in either New York or Chicago and are older than 23. You can combine multiple conditions using loc like this:

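cities = ['New York', 'Chicago']
# Name is the index at this point (after set_index), so the names are the row labels
filtered = df.loc[(df['City'].isin(cities)) & (df['Age'] > 23)].index

This returns an Index containing only the names that meet both conditions. Because Name became the index in the earlier set_index step, the names are the row labels of the filtered result rather than a column. If Name were still an ordinary column, loc would handle the boolean mask and the column selection in a single call, as in df.loc[mask, 'Name'], and return a Series.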
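The sketch below runs the same filter against both layouts so the difference is visible side by side: with Name as a regular column, the mask and the column label go into one loc call; with Name as the index, the names come from the index of the filtered rows. The variable names are illustrative.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})
cities = ['New York', 'Chicago']

# Name as an ordinary column: mask and column label in one loc call -> Series
mask = people['City'].isin(cities) & (people['Age'] > 23)
names_series = people.loc[mask, 'Name']

# Name as the index: the names are the index labels of the matching rows -> Index
indexed = people.set_index('Name')
indexed_mask = indexed['City'].isin(cities) & (indexed['Age'] > 23)
names_index = indexed.loc[indexed_mask].index

print(names_series.tolist())  # ['Alice']
print(names_index.tolist())   # ['Alice']
```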

Another powerful technique is using callable functions inside loc. Instead of creating intermediate variables for filters, you can pass a function that takes the DataFrame and returns a boolean mask or labels. This can make chains of transformations more concise and readable:

young_ny = df.loc[lambda d: (d['Age'] <= 25) & (d['City'] == 'New York')]

Callables are especially handy in method chains: the function receives the intermediate DataFrame produced by the previous step, so you can filter on it without first assigning it to a variable.
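Here is a minimal chain showing why that matters. The AgeNextYear column (a hypothetical derived column used only for illustration) exists only inside the chain, so a lambda is the straightforward way for loc to filter on it. Each position in the loc tuple can take its own callable or label; here the rows use a callable and the columns a list.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

result = (
    df
    .assign(AgeNextYear=lambda d: d['Age'] + 1)  # hypothetical derived column
    .loc[lambda d: d['AgeNextYear'] > 25, ['Name', 'AgeNextYear']]
    .sort_values('AgeNextYear')
)

print(result)
```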

You can also use loc for setting values conditionally. Suppose you want to add a new column, Senior, marking those over 30 years old as True. Here’s how you would do that:

df.loc[:, 'Senior'] = df['Age'] > 30

This creates the Senior column (or overwrites it, if it already exists) with True/False values. If you want to update values only for a subset of rows, say changing the city for ‘David’, you can do:

df.loc[df.index == 'David', 'City'] = 'Austin'

Notice how loc allows precise targeting for updates, avoiding the need for cumbersome looping or intermediate DataFrames.
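The same pattern, a boolean mask for the rows plus a column label for the target, also handles updates that depend on existing values. In this sketch the two updates from the text run first, then the ages of a masked subset are incremented; the choice of cities and the +1 are purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
}).set_index('Name')

# The two updates described above
df.loc[:, 'Senior'] = df['Age'] > 30
df.loc[df.index == 'David', 'City'] = 'Austin'

# An update that depends on existing values: add a year for the masked rows only.
# The right-hand side is aligned on the index, so each row gets its own Age + 1.
selected = df['City'].isin(['Los Angeles', 'Austin'])
df.loc[selected, 'Age'] = df.loc[selected, 'Age'] + 1

print(df)
```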

With MultiIndex DataFrames, loc becomes even more flexible. You can pass tuples to select rows by several index levels at once, or use slice(None) to select all entries along one level while filtering on another:

arrays = [
    ['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
    ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']
]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
multi_df = pd.DataFrame({'A': range(8), 'B': range(8, 16)}, index=index)

# Select all rows where 'first' == 'baz'
selected = multi_df.loc[('baz', slice(None)), :]

This snippet selects all rows where the first level of the index is ‘baz’, regardless of the second level. The result keeps both index levels, so each row is still identified by its full (first, second) pair, which is what makes loc convenient for hierarchical data.
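A few more selections on the same multi_df, rebuilt here so the snippet runs on its own. pd.IndexSlice is a pandas helper that expresses the slice(None) tuples more compactly; slicing a MultiIndex this way generally expects a sorted index, which this one already is.

```python
import pandas as pd

arrays = [
    ['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
    ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
multi_df = pd.DataFrame({'A': range(8), 'B': range(8, 16)}, index=index)

# A full tuple addresses one row; adding a column label returns a scalar
one_value = multi_df.loc[('baz', 'two'), 'A']

# Filter on the second level only, keeping every value of the first level
idx = pd.IndexSlice
second_one = multi_df.loc[idx[:, 'one'], 'A']

# Partial indexing by the first level drops that level from the result
baz_only = multi_df.loc['baz']

print(one_value)                # 3
print(second_one.tolist())      # [0, 2, 4, 6]
print(baz_only.index.tolist())  # ['one', 'two']
```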

Finally, remember that loc supports label-based slicing that’s inclusive of the stop label. This contrasts with Python’s usual exclusive slice behavior and can be leveraged to your advantage:

subset = df.loc['Alice':'David', 'Age':'City']

This grabs all rows from ‘Alice’ through ‘David’ and all columns from ‘Age’ through ‘City’, including both endpoints. This inclusive slicing makes it easier to specify ranges without guesswork.
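A related detail: when the index is sorted, loc can also slice between labels that are not in the index at all, selecting every label that sorts between the two bounds. In the sketch below, the bounds 'B' and 'D' are arbitrary strings chosen for illustration; 'David' is excluded because, as a string, it sorts after 'D'. On an unsorted index, a slice with a missing label raises a KeyError instead, so calling sort_index() first is a reasonable precaution if you rely on this.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
}).set_index('Name').sort_index()

# Neither 'B' nor 'D' is a label; the sorted index makes the slice well defined
between = df.loc['B':'D', 'Age':'City']

print(between.index.tolist())    # ['Bob', 'Charlie']
print(between.columns.tolist())  # ['Age', 'City']
```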