The fundamentals of pandas


Pandas has two primary data structures: Series and DataFrame.

  • Series: A Series is a one-dimensional labeled array that can hold any data type. It’s similar to a column in a spreadsheet or a one-dimensional NumPy array. Each element in a Series has an associated label, and together these labels form the index. The index makes data manipulation more efficient and intuitive because you can reference specific elements of your data by label.
  • DataFrame: A dataframe is a two-dimensional labeled data structure, essentially a table or spreadsheet, in which each column and each row can be represented as a Series.
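
To make the relationship between the two structures concrete, here is a minimal sketch. The city names and values are made up for illustration and are not part of any real dataset.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels.
# The labels and numbers are hypothetical.
temps = pd.Series([21.5, 18.0, 25.3], index=['Lisbon', 'Oslo', 'Cairo'])

# Reference an element by its label rather than by its position.
oslo_temp = temps['Oslo']

# A DataFrame: a table whose columns share one row index.
df = pd.DataFrame({'temp_c': temps, 'rainy': [False, True, False]})

# Selecting a single column gives back a Series.
temp_column = df['temp_c']

# Selecting a single row by label also gives back a Series.
oslo_row = df.loc['Oslo']

print(type(temp_column).__name__)
print(type(oslo_row).__name__)
```

Both print statements report Series, which is what "each column and each row can be represented as a Series" means in practice.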

Create a DataFrame

To use pandas in your notebook, first import it. Similar to NumPy, pandas has its own standard alias, pd, that’s used by data professionals around the world. Once you’ve imported pandas into your working environment, create a dataframe. Here are some of the ways to create a DataFrame object in a Jupyter Notebook. 

From a dictionary:

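import pandas as pd
# Each dictionary key becomes a column label; each list holds that column's values.
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)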

From a NumPy array:

import numpy as np
# Each inner list is one row; the columns argument supplies the column labels.
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])

From a comma-separated values (CSV) file:

df3 = pd.read_csv('/file_path/file_name.csv')
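
The file path above is a placeholder, so that line will not run as written. As a rough sketch of what read_csv does, the example below reads CSV text from memory instead of from disk. The column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,score\nAda,90\nGrace,85\n"

# read_csv accepts a file path or a file-like object. By default,
# the first line is treated as the header row.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv)
```

With a real file, you would pass its path as a string in place of the StringIO object.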

Attributes and methods
The DataFrame class is powerful and convenient because it comes with a suite of built-in features that simplify common data analysis tasks. These features are known as attributes and methods. An attribute is a value associated with an object or class that is referenced by name using dotted expressions. A method is a function that is defined inside a class body and typically performs an action. A simpler way to think about the distinction is that attributes are characteristics of the object, while methods are actions or operations that the object can perform.
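
The difference shows up in the syntax. The sketch below uses a small dataframe and picks shape and mean() as one example of each; any other attribute or method would illustrate the same point.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Attribute: no parentheses. It reports a characteristic of the object.
dimensions = df.shape        # (number of rows, number of columns)

# Method: called with parentheses. It performs an operation.
column_means = df.mean()     # mean of each column, returned as a Series

print(dimensions)
print(column_means)
```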

Common DataFrame attributes
Data professionals use attributes and methods constantly. Some of the most-used DataFrame attributes include:

Attribute   Description
columns     Returns the column labels of the dataframe
dtypes      Returns the data types in the dataframe
iloc        Accesses a group of rows and columns using integer-based indexing
loc         Accesses a group of rows and columns by label(s) or a Boolean array
shape       Returns a tuple representing the dimensionality of the dataframe
values      Returns a NumPy representation of the dataframe
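
Here is a short sketch that touches each of the attributes listed above. The data and the row labels 'x', 'y', and 'z' are hypothetical; string labels are used so that the difference between loc and iloc is easy to see.

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]},
    index=['x', 'y', 'z'],
)

col_labels = list(df.columns)      # ['a', 'b']
col_types = df.dtypes              # Series mapping each column to its dtype
n_rows, n_cols = df.shape          # 3 rows, 2 columns

by_position = df.iloc[0, 1]        # first row, second column
by_label = df.loc['y', 'a']        # row labeled 'y', column labeled 'a'
mask_rows = df.loc[df['a'] > 1]    # Boolean array keeps rows 'y' and 'z'

as_array = df.values               # NumPy array holding the same data

print(col_types)
print(mask_rows)
```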

Common DataFrame methods

Some of the most-used DataFrame methods include:

Method          Description
apply()         Applies a function over an axis of the dataframe
copy()          Makes a copy of the dataframe’s indices and data
describe()      Returns descriptive statistics of the dataframe; by default, the count of non-null values, mean, standard deviation, minimum, maximum, and percentile values of its numeric columns
drop()          Drops specified labels from rows or columns
groupby()       Splits the dataframe, applies a function, and combines the results
head(n=5)       Returns the first n rows of the dataframe (default=5)
info()          Prints a concise summary of the dataframe, including column data types, non-null counts, and memory usage
isna()          Returns a same-sized Boolean dataframe indicating whether each value is null (can also use isnull() as an alias)
sort_values()   Sorts by the values across a given axis
value_counts()  Returns a series containing counts of unique rows in the dataframe
where()         Replaces values in the dataframe where a given condition is false
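
The sketch below calls several of these methods on a small dataframe. The team names and point values are hypothetical, and one value is deliberately missing to show how nulls are handled.

```python
import pandas as pd

# Hypothetical data, including one missing value.
df = pd.DataFrame({
    'team': ['red', 'blue', 'red', 'blue'],
    'points': [10, 7, None, 3],
})

first_two = df.head(2)                          # first 2 rows
missing = df.isna()                             # True where a value is null
summary = df.describe()                         # statistics for numeric columns
ranked = df.sort_values('points', ascending=False)
totals = df.groupby('team')['points'].sum()     # split by team, sum, combine
no_team = df.drop(columns=['team'])             # returns a new dataframe

# apply() runs a function on each column (the default axis).
spread = no_team.apply(lambda col: col.max() - col.min())

# where() keeps values that meet the condition and replaces the rest.
low_zeroed = no_team.where(no_team >= 5, 0)

print(summary)
print(totals)
```

None of these calls changes df itself. Each one returns a new object, which is why the results are assigned to new names.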

These are only some of the most commonly used attributes and methods; there are many more. Some of them can also be used on pandas Series objects. For a more detailed list, refer to the pandas DataFrame documentation, which includes helpful examples of how to use each tool.

Pandas dataframes are a convenient way to work with tabular data. Each row and each column can be represented by a pandas Series, which is similar to a one-dimensional array. Both dataframes and series have a large collection of methods and attributes to perform common tasks and retrieve information. Pandas also has its own special notation to select data. As you work more with pandas, you’ll become more comfortable with this notation and its many applications in data science.