NumPy Arrays: The Complete Guide

March 14, 2026 · 8 min read · Updated March 14, 2026 · intermediate

NumPy is the foundation of numerical computing in Python. At its core is the array (ndarray): a homogeneous, fixed-size data structure that is far more memory- and time-efficient than a Python list for numerical operations. This guide covers creating, inspecting, indexing, reshaping, and computing with NumPy arrays.

Installing and Importing NumPy

First, install NumPy if you haven’t already:

pip install numpy

Then import it in your Python code:

import numpy as np

By convention, numpy is aliased as np; the official NumPy documentation and most tutorials use this alias.

Creating Arrays

The simplest way to create an array is from a Python list:

import numpy as np

# 1D array (vector)
arr = np.array([1, 2, 3, 4, 5])
print(arr)  # [1 2 3 4 5]

# 2D array (matrix)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix)
# [[1 2 3]
#  [4 5 6]]

# 3D array (tensor)
tensor = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(tensor.shape)  # (2, 2, 2)

NumPy arrays are homogeneous: all elements share one data type. When the inputs have mixed types, NumPy promotes them to a common type that can represent all of them:

arr = np.array([1, 2.5, 3])  # All converted to float64
print(arr.dtype)  # float64
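
To make the promotion rule more concrete, here is a small sketch using np.promote_types, which reports the common type NumPy would pick for two dtypes. The variable names are illustrative only, and the string case is included as a caution rather than a recommendation.

```python
import numpy as np

# Mixed int and float inputs are promoted to float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# promote_types reports the smallest dtype that can safely hold both inputs
print(np.promote_types(np.int32, np.float32))  # float64
print(np.promote_types(np.int8, np.float32))   # float32

# Mixing numbers and strings silently produces a string array
as_text = np.array([1, "two", 3.0])
print(as_text.dtype.kind)  # U (Unicode string)
```

If an array unexpectedly has a string or object dtype, a stray non-numeric value in the input is a likely cause.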

Specialized Array Creation Functions

NumPy provides functions for creating common array patterns:

import numpy as np

# Arrays of zeros
np.zeros(5)              # 1D array of zeros
np.zeros((3, 4))         # 3x4 matrix of zeros

# Arrays of ones
np.ones(5)                # 1D array of ones
np.ones((2, 3), dtype=int)

# Arrays with a constant value
np.full((2, 2), 7)       # 2x2 array filled with 7

# Identity matrix
np.eye(3)                # 3x3 identity matrix
np.eye(4, k=1)           # 4x4 with 1s on first diagonal above main

# Sequences
np.arange(0, 10, 2)      # [0, 2, 4, 6, 8] — like range()
np.linspace(0, 1, 5)    # [0., 0.25, 0.5, 0.75, 1.] — 5 evenly spaced points

# Random arrays
np.random.rand(3, 3)     # Uniform distribution [0, 1)
np.random.randn(3, 3)    # Standard normal distribution
np.random.randint(0, 10, (3, 3))  # Random integers [0, 10)
np.random.choice([1, 2, 3], size=5)  # Random selection
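
The np.random functions above belong to NumPy's older, global-state interface. The NumPy documentation recommends the Generator interface for new code; the sketch below shows rough equivalents of the four calls, assuming a seed of 42 chosen purely for illustration.

```python
import numpy as np

# Seeded generator: the same seed reproduces the same sequence
rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 3))                 # uniform on [0, 1)
normal = rng.standard_normal((3, 3))         # standard normal
integers = rng.integers(0, 10, size=(3, 3))  # integers in [0, 10)
picks = rng.choice([1, 2, 3], size=5)        # sampled with replacement

# A second generator with the same seed yields identical draws
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((3, 3))))  # True
```

Passing a generator object around, instead of relying on global state, keeps random results reproducible per function or per experiment.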

The dtype System

NumPy’s data type system controls memory usage and numerical precision:

import numpy as np

# Common dtypes
arr_int = np.array([1, 2, 3], dtype=np.int32)   # 32-bit integer
arr_float = np.array([1, 2, 3], dtype=np.float32)  # 32-bit float
arr_float64 = np.array([1, 2, 3], dtype=np.float64)  # 64-bit float

# Converting dtypes
arr = np.array([1, 2, 3])
print(arr.astype(np.float64))  # [1. 2. 3.]

# Checking dtype
arr = np.array([1, 2, 3])
print(arr.dtype)  # int64 (platform default; int32 on Windows with NumPy 1.x)

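float64 is NumPy's default floating-point type, with roughly 15 to 16 significant decimal digits of precision. float32 uses half the memory but keeps only about 6 to 7 digits, so choose it when memory matters more than precision.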
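
The trade-off between the two float types can be checked directly. The following sketch compares memory use and precision; the array size of one million elements is an arbitrary choice for illustration.

```python
import numpy as np

values64 = np.ones(1_000_000, dtype=np.float64)
values32 = values64.astype(np.float32)

print(values64.nbytes)  # 8000000
print(values32.nbytes)  # 4000000

# Approximate number of reliable decimal digits for each type
print(np.finfo(np.float64).precision)  # 15
print(np.finfo(np.float32).precision)  # 6

# The cost of float32: 0.1 is stored less accurately
print(np.float32(0.1) == np.float64(0.1))  # False
```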

Array Properties

Every NumPy array has useful attributes:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.ndim)       # 2 — number of dimensions
print(arr.shape)      # (2, 3) — size of each dimension
print(arr.size)       # 6 — total number of elements
print(arr.dtype)     # int64 — data type
print(arr.itemsize)  # 8 — bytes per element
print(arr.nbytes)    # 48 — total bytes (size * itemsize)
print(arr.strides)   # (24, 8) — bytes to step in each dimension

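The strides attribute shows how NumPy maps indices to memory: it stores the number of bytes to skip to reach the next element along each dimension. In this example, moving down one row skips 24 bytes (3 elements of 8 bytes each) and moving right one column skips 8 bytes. This is why operations such as slicing and transposing can reinterpret the same memory instead of moving data.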
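
As a rough illustration of strides in action, the sketch below shows that transposing and step-slicing only change the strides while the underlying buffer stays the same. The dtype is fixed to int64 so the byte counts do not depend on the platform.

```python
import numpy as np

arr = np.arange(6, dtype=np.int64).reshape(2, 3)
print(arr.strides)      # (24, 8)

# Transposing swaps the strides; no data is moved
print(arr.T.strides)    # (8, 24)

# Taking every second column doubles the column stride
every_other = arr[:, ::2]
print(every_other.strides)  # (24, 16)

# Both results still point at the original buffer
print(np.shares_memory(arr, arr.T))        # True
print(np.shares_memory(arr, every_other))  # True
```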

Indexing

NumPy supports several indexing methods:

Basic Indexing

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Single element — returns a scalar
print(arr[0, 0])  # 1
print(arr[1, 2])  # 6

# Negative indexing works like Python lists
print(arr[-1, -1])  # 6 — last row, last column

# Slicing rows and columns
print(arr[0, :])    # [1, 2, 3] — first row
print(arr[:, 1])    # [2, 5] — second column
print(arr[0, 1:3])  # [2, 3] — first row, columns 1 and 2

Boolean Indexing

Filter arrays using boolean conditions:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Boolean mask
mask = arr > 3
print(arr[mask])  # [4, 5, 6]

# Inline boolean indexing
print(arr[arr % 2 == 0])  # [2, 4, 6] — even numbers only

# Using np.where for conditional selection
arr = np.array([1, 2, 3, 4, 5])
result = np.where(arr > 2, arr * 2, arr)
print(result)  # [1, 2, 6, 8, 10]
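
Masks can also be combined and used on the left side of an assignment. This is a minimal sketch of both patterns; note that NumPy uses the operators &, | and ~ here, not the Python keywords and, or and not.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Combine conditions with & (and), | (or), ~ (not); parentheses are required
print(arr[(arr > 2) & (arr % 2 == 0)])  # [4 6]
print(arr[(arr < 2) | (arr > 5)])       # [1 6]
print(arr[~(arr > 3)])                  # [1 2 3]

# Assigning through a mask modifies the array in place
clipped = arr.copy()
clipped[clipped > 4] = 4
print(clipped)  # [1 2 3 4 4 4]
```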

Fancy Indexing

Use arrays as indices to select specific elements:

import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Integer array indexing
print(arr[[0, 2, 4]])  # [10, 30, 50] — elements at indices 0, 2, 4

# 2D example
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows = [0, 1, 2]
cols = [2, 1, 0]
print(matrix[rows, cols])  # [3, 5, 7]

# Using np.ix_ for 2D indexing
print(matrix[np.ix_([0, 2], [1, 2])])
# [[2 3]
#  [8 9]]
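
One practical difference from slicing is worth a sketch: reading with an integer-array index produces a copy, while a basic slice produces a view. Assigning through an integer-array index still writes to the original array. The variable names below are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

sliced = arr[1:4]        # basic slice: a view
picked = arr[[1, 2, 3]]  # integer-array index: a copy

sliced[0] = -1   # writes through to arr
picked[1] = -2   # changes only the copy

print(arr)     # [10 -1 30 40 50]
print(picked)  # [20 -2 40]

# Assignment with an integer-array index modifies arr in place
arr[[0, 4]] = 0
print(arr)     # [ 0 -1 30 40  0]
```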

Array Operations

NumPy excels at vectorized operations — applying operations to entire arrays without explicit loops:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Arithmetic operations apply element-wise
print(arr + 1)   # [2, 3, 4, 5, 6]
print(arr * 2)   # [2, 4, 6, 8, 10]
print(arr ** 2)  # [1, 4, 9, 16, 25]
print(arr / 2)   # [0.5, 1., 1.5, 2., 2.5]

# Array-to-array operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)   # [5, 7, 9]
print(a * b)   # [4, 10, 18]

# Comparison returns boolean arrays
print(arr > 3)  # [False, False, False, True, True]
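
To show what "without explicit loops" means in practice, this sketch computes the same expression twice: once with a Python loop and once as a single array expression. Timing is left out on purpose, since it varies by machine.

```python
import numpy as np

values = np.arange(1, 6)  # [1 2 3 4 5]

# Explicit Python loop over the elements
loop_result = [v * v + 1 for v in values.tolist()]

# Vectorized equivalent: one expression over the whole array
vector_result = values * values + 1

print(loop_result)    # [2, 5, 10, 17, 26]
print(vector_result)  # [ 2  5 10 17 26]
```

The vectorized form runs its loop in compiled code, which is where the speed advantage over lists typically comes from on large arrays.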

Universal Functions (ufuncs)

NumPy provides fast, vectorized mathematical functions:

import numpy as np

arr = np.array([0, np.pi/2, np.pi, 3*np.pi/2])

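# Trigonometric (results are floating-point approximations)
print(np.sin(arr))  # approximately [0., 1., 0., -1.]
print(np.cos(arr))  # approximately [1., 0., -1., 0.]
print(np.tan(arr))  # approximately [0., 1.6e16, 0., 5.4e15]: huge but finite, not inf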

# Exponential and logarithm
print(np.exp(arr))  # e^arr
print(np.log(arr))  # natural log; log(0) gives -inf with a RuntimeWarning, negatives give nan
print(np.log10(arr))  # base 10
print(np.log2(arr))   # base 2

# Rounding
arr = np.array([1.4, 1.6, 2.5])
print(np.floor(arr))  # [1., 1., 2.]
print(np.ceil(arr))   # [2., 2., 3.]
print(np.round(arr))  # [1., 2., 2.]: halves round to the nearest even value
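
Two floating-point details come up often with ufuncs: results that should be zero are only close to zero, and invalid inputs produce warnings rather than exceptions. The sketch below shows one way to handle each, using np.allclose for comparisons and np.errstate to control warnings locally.

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])

# sin(pi) is tiny but not exactly zero in floating point
sines = np.sin(angles)
print(sines[2] == 0.0)                      # False
print(np.allclose(sines, [0.0, 1.0, 0.0]))  # True

# log(0) is -inf; errstate silences the warning inside this block only
with np.errstate(divide="ignore"):
    logs = np.log(np.array([0.0, 1.0, np.e]))
print(logs)  # -inf, then 0 and 1 up to rounding
```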

Aggregation Functions

Sum, mean, min, max, and more:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Whole array
print(np.sum(arr))    # 21
print(np.mean(arr))   # 3.5
print(np.min(arr))    # 1
print(np.max(arr))    # 6
print(np.std(arr))    # standard deviation
print(np.var(arr))    # variance

# Along an axis
print(np.sum(arr, axis=0))  # [5, 7, 9] — column sums
print(np.sum(arr, axis=1))  # [6, 15] — row sums

# Find position of min/max
print(np.argmin(arr))  # 0 — flat index
print(np.argmax(arr))  # 5

# Cumulative operations
print(np.cumsum(arr))  # [1, 3, 6, 10, 15, 21]
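
The axis argument names the dimension that gets collapsed. A common follow-up, sketched here, is to pass keepdims=True so the result can be combined with the original array, for example to subtract each row's mean.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# axis=1 collapses the columns, leaving one value per row
row_means = arr.mean(axis=1)
print(row_means.shape)  # (2,)

# keepdims=True keeps the collapsed axis with length 1
row_means_2d = arr.mean(axis=1, keepdims=True)
print(row_means_2d.shape)  # (2, 1)

# The (2, 1) result lines up against the (2, 3) input
centered = arr - row_means_2d
print(centered)
# [[-1.  0.  1.]
#  [-1.  0.  1.]]
```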

Reshaping Arrays

reshape changes an array's dimensions and returns a view of the same data whenever the memory layout allows:

import numpy as np

arr = np.arange(12)  # [0, 1, 2, ..., 11]

# Reshape to different dimensions
print(arr.reshape(3, 4))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(arr.reshape(2, 3, 2))
# [[[ 0  1]
#   [ 2  3]
#   [ 4  5]]
#  [[ 6  7]
#   [ 8  9]
#   [10 11]]]

# -1 means "figure it out automatically"
print(arr.reshape(3, -1))  # 3 rows, 4 columns (12/3=4)
print(arr.reshape(-1))    # Flatten to 1D

# Transpose
matrix = np.arange(6).reshape(2, 3)
print(matrix.T)  # 3x2 matrix
print(matrix.swapaxes(0, 1))  # same as transpose
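
Whether reshape avoids a copy depends on the memory layout. This sketch checks both cases with np.shares_memory and also shows the error raised when the requested shape does not match the number of elements.

```python
import numpy as np

arr = np.arange(6)
grid = arr.reshape(2, 3)

# A contiguous array reshapes to a view: writes reach the original
grid[0, 0] = 99
print(arr[0])                        # 99
print(np.shares_memory(arr, grid))   # True

# Flattening the transpose needs a new memory layout, so NumPy copies
flat_t = grid.T.reshape(-1)
print(np.shares_memory(arr, flat_t))  # False

# The total number of elements must be preserved
try:
    arr.reshape(4, 2)
except ValueError as exc:
    print("reshape failed:", exc)
```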

Flattening and Ravelling

import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

# flatten() returns a copy
flat = matrix.flatten()
print(flat)  # [1, 2, 3, 4, 5, 6]

# ravel() returns a view (if possible)
flat_view = matrix.ravel()
print(flat_view)  # [1, 2, 3, 4, 5, 6]

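Prefer ravel() when you only need to read the flattened data: it avoids a copy whenever the memory layout allows. Because the result may be a view, writing to it can modify the original array; use flatten() when you need an independent copy.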
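
The difference between the two becomes visible as soon as you write to the result. A minimal sketch:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = matrix.flatten()
flat_view = matrix.ravel()

flat_copy[0] = -1   # touches only the copy
flat_view[1] = -2   # writes through to matrix

print(matrix)
# [[ 1 -2  3]
#  [ 4  5  6]]
```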

Broadcasting

Broadcasting lets NumPy perform operations on arrays with different shapes:

import numpy as np

# Basic broadcasting — scalar expands to array
arr = np.array([1, 2, 3])
print(arr + 10)  # [11, 12, 13] — 10 is broadcast to match arr's shape

# 1D + 2D broadcasting
a = np.array([[1], [2], [3]])  # shape (3, 1)
b = np.array([10, 20, 30])     # shape (3,)
print(a + b)
# [[11, 21, 31]
#  [12, 22, 32]
#  [13, 23, 33]]

# Broadcasting rules: dimensions are compared from right to left
# Dimensions must be equal or one must be 1

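Broadcasting compares shapes starting from the trailing (rightmost) dimension. Two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension is treated as 1. In the example above, shapes (3, 1) and (3,) broadcast to (3, 3).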
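
The rules can be explored without building any arrays by using np.broadcast_shapes, which returns the resulting shape or raises an error. This sketch assumes NumPy 1.20 or later, where that function is available.

```python
import numpy as np

# Shapes are right-aligned, then compared dimension by dimension
print(np.broadcast_shapes((3, 1), (3,)))       # (3, 3)
print(np.broadcast_shapes((2, 3), (3,)))       # (2, 3)
print(np.broadcast_shapes((4, 1, 5), (3, 1)))  # (4, 3, 5)

# (2, 3) and (2,) fail: the trailing dimensions 3 and 2 differ
try:
    np.broadcast_shapes((2, 3), (2,))
except ValueError:
    print("shapes are not broadcast-compatible")

# Adding an axis makes the intent explicit: (2, 3) + (2, 1)
a = np.ones((2, 3))
b = np.array([10, 20])
print((a + b[:, np.newaxis]).shape)  # (2, 3)
```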

Copying Arrays

Knowing whether an operation returns a view or a copy matters for both performance and correctness, because writing to a view changes the original array:

import numpy as np

arr = np.array([1, 2, 3])

# Slicing creates a view, not a copy
view = arr[0:2]
print(view.base is arr)  # True — shares memory

# Explicit copy
copy = arr.copy()
print(copy.base is arr)  # False — independent memory

# Some operations return views, others always copy
transposed = arr.T         # View of the same data (for 1D the shape is unchanged)
flattened = arr.flatten()  # Always creates a new array

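# base points at the array that owns the memory, which may not be the
# array you sliced: here arr is itself a view of np.arange(6)
arr = np.arange(6).reshape(2, 3)
view = arr[0]
print(view.base is arr)             # False: both are views of the arange buffer
print(np.shares_memory(view, arr))  # True: the reliable check for shared memory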

Working with Dates

NumPy has basic datetime support:

import numpy as np

# Create datetime64 arrays
dates = np.array('2024-01-01', dtype='datetime64') + np.arange(5)
print(dates)
# ['2024-01-01' '2024-01-02' '2024-01-03' '2024-01-04' '2024-01-05']

# Datetime arithmetic
start = np.datetime64('2024-01-01')
end = np.datetime64('2024-01-10')
print(end - start)  # 9 days

# Extract components: days elapsed since the first of each month, plus 1
print((dates - dates.astype('datetime64[M]')).astype(int) + 1)  # Day of month: [1 2 3 4 5]
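
datetime64 has no attribute access such as .year or .day, so components are usually derived by truncating to a coarser unit and doing integer arithmetic. The sketch below is one way to do this; the date range is arbitrary, and the weekday formula relies on 1970-01-01 having been a Thursday.

```python
import numpy as np

dates = np.arange("2024-01-30", "2024-02-03", dtype="datetime64[D]")
print(dates)
# ['2024-01-30' '2024-01-31' '2024-02-01' '2024-02-02']

# Truncate to a coarser unit, then count units since 1970
years = dates.astype("datetime64[Y]").astype(np.int64) + 1970
months = dates.astype("datetime64[M]").astype(np.int64) % 12 + 1

# Day of month: distance in days from the first of the month, plus 1
days = (dates - dates.astype("datetime64[M]")).astype(np.int64) + 1

# Weekday with Monday=0; day 0 of the epoch was a Thursday (3)
weekdays = (dates.astype(np.int64) + 3) % 7

print(years)     # [2024 2024 2024 2024]
print(months)    # [1 1 2 2]
print(days)      # [30 31  1  2]
print(weekdays)  # [1 2 3 4]
```

For heavier date handling, such as time zones or calendar-aware offsets, pandas builds on these types and is usually more convenient.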

Summary

NumPy arrays are the backbone of numerical computing in Python. They’re faster than lists, support vectorized operations, and integrate with virtually every scientific Python library.

The key concepts to remember:

Create arrays using np.array(), np.zeros(), np.arange(), and so on
Index using brackets, boolean masks, or fancy indexing
Operations apply element-wise by default
Broadcasting combines arrays of different but compatible shapes automatically
Use .copy() when you need an independent array

From here, explore NumPy’s linear algebra module (np.linalg), random number generation (np.random), and integration with pandas for data analysis.