NumPy Arrays: The Complete Guide
NumPy is the foundation of numerical computing in Python. At its core are arrays: homogeneous, fixed-size data structures that are typically far faster and more memory-efficient than Python lists for numerical operations. This guide covers the essentials of creating, indexing, reshaping, and operating on NumPy arrays.
Installing and Importing NumPy
First, install NumPy if you haven’t already:
pip install numpy
Then import it in your Python code:
import numpy as np
The convention is to alias numpy as np — you’ll see this throughout all NumPy documentation and tutorials.
Creating Arrays
The simplest way to create an array is from a Python list:
import numpy as np
# 1D array (vector)
arr = np.array([1, 2, 3, 4, 5])
print(arr) # [1 2 3 4 5]
# 2D array (matrix)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix)
# [[1 2 3]
# [4 5 6]]
# 3D array (tensor)
tensor = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(tensor.shape) # (2, 2, 2)
NumPy arrays are homogeneous — all elements must have the same data type. NumPy will automatically convert to a common type:
arr = np.array([1, 2.5, 3]) # All converted to float64
print(arr.dtype) # float64
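As a small illustration of the promotion rule, the sketch below shows which common type NumPy picks for a couple of mixed inputs, and what happens when a float is assigned into an existing integer array. The variable names are illustrative and not part of the examples above.

```python
import numpy as np

# int + float promotes to float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)            # float64

# bool + int promotes to the default integer type
with_bool = np.array([True, 2, 3])
print(with_bool.dtype.kind)   # i (a signed integer type)

# Promotion happens only at creation time. Assigning into an existing
# integer array casts the value to the array's dtype instead.
ints = np.array([1, 2, 3])
ints[0] = 9.7
print(ints[0])                # 9 (the fractional part is dropped)
```

The last case is worth remembering: an array never changes its dtype in place, so convert with astype() first if fractional values must be preserved.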
Specialized Array Creation Functions
NumPy provides functions for creating common array patterns:
import numpy as np
# Arrays of zeros
np.zeros(5) # 1D array of zeros
np.zeros((3, 4)) # 3x4 matrix of zeros
# Arrays of ones
np.ones(5) # 1D array of ones
np.ones((2, 3), dtype=int)
# Arrays with a constant value
np.full((2, 2), 7) # 2x2 array filled with 7
# Identity matrix
np.eye(3) # 3x3 identity matrix
np.eye(4, k=1) # 4x4 with 1s on first diagonal above main
# Sequences
np.arange(0, 10, 2) # [0, 2, 4, 6, 8] — like range()
np.linspace(0, 1, 5) # [0., 0.25, 0.5, 0.75, 1.] — 5 evenly spaced points
# Random arrays
np.random.rand(3, 3) # Uniform distribution [0, 1)
np.random.randn(3, 3) # Standard normal distribution
np.random.randint(0, 10, (3, 3)) # Random integers [0, 10)
np.random.choice([1, 2, 3], size=5) # Random selection
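The functions above belong to NumPy's legacy random interface. NumPy's documentation recommends the Generator API for new code; a minimal sketch of the equivalent calls follows, assuming an arbitrary seed of 42 so the results are reproducible.

```python
import numpy as np

rng = np.random.default_rng(42)   # seed chosen arbitrarily for this sketch

uniform = rng.random((3, 3))                  # uniform on [0, 1)
normal = rng.standard_normal((3, 3))          # standard normal distribution
integers = rng.integers(0, 10, size=(3, 3))   # integers in [0, 10)
picks = rng.choice([1, 2, 3], size=5)         # random selection with replacement

print(uniform.shape, normal.shape, integers.shape, picks.shape)
```

Passing the generator object around explicitly, rather than relying on global state, makes it easier to reproduce results in tests.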
The dtype System
NumPy’s data type system controls memory usage and numerical precision:
import numpy as np
# Common dtypes
arr_int = np.array([1, 2, 3], dtype=np.int32) # 32-bit integer
arr_float = np.array([1, 2, 3], dtype=np.float32) # 32-bit float
arr_float64 = np.array([1, 2, 3], dtype=np.float64) # 64-bit float
# Converting dtypes
arr = np.array([1, 2, 3])
print(arr.astype(np.float64)) # [1. 2. 3.]
# Checking dtype
arr = np.array([1, 2, 3])
print(arr.dtype) # int64 on most platforms (int32 on Windows with NumPy < 2.0)
For numerical computing, float64 is NumPy's default floating-point type and carries roughly 15 to 16 significant decimal digits. float32 uses half the memory but carries only about 7 digits.
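The memory and precision trade-off can be checked directly. The sketch below compares the two float types using nbytes and np.finfo; the array size of 1,000 is arbitrary.

```python
import numpy as np

a64 = np.ones(1_000, dtype=np.float64)
a32 = a64.astype(np.float32)

print(a64.nbytes)   # 8000
print(a32.nbytes)   # 4000

# Machine epsilon: the gap between 1.0 and the next representable value
eps32 = np.finfo(np.float32).eps   # about 1.19e-07
eps64 = np.finfo(np.float64).eps   # about 2.22e-16

# 0.1 rounded to float32 is not the same number as 0.1 in float64
print(float(np.float32(0.1)) == 0.1)   # False
```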
Array Properties
Every NumPy array has useful attributes:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim) # 2 — number of dimensions
print(arr.shape) # (2, 3) — size of each dimension
print(arr.size) # 6 — total number of elements
print(arr.dtype) # int64 — data type
print(arr.itemsize) # 8 — bytes per element
print(arr.nbytes) # 48 — total bytes (size * itemsize)
print(arr.strides) # (24, 8) — bytes to step in each dimension
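The strides attribute describes the array's memory layout: the number of bytes to step to reach the next element along each dimension. Here, moving down one row skips 24 bytes (3 elements of 8 bytes each) and moving across one column skips 8 bytes. Because of this, operations such as transposing or slicing can return views that only change the strides instead of copying data.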
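To see strides at work, the sketch below fixes the dtype to int64 (so the byte counts do not depend on the platform) and inspects how a transpose and a stepped slice change the strides while still sharing memory.

```python
import numpy as np

arr = np.arange(6, dtype=np.int64).reshape(2, 3)
print(arr.strides)            # (24, 8)

# Transposing swaps the strides; no data is moved
print(arr.T.strides)          # (8, 24)

# Taking every second column doubles the column stride
print(arr[:, ::2].strides)    # (24, 16)

print(np.shares_memory(arr, arr.T))   # True
```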
Indexing
NumPy supports several indexing methods:
Basic Indexing
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Single element — returns a scalar
print(arr[0, 0]) # 1
print(arr[1, 2]) # 6
# Negative indexing works like Python lists
print(arr[-1, -1]) # 6 — last row, last column
# Slicing rows and columns
print(arr[0, :]) # [1, 2, 3] — first row
print(arr[:, 1]) # [2, 5] — second column
print(arr[0, 1:3]) # [2, 3] — first row, columns 1 and 2
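One detail worth illustrating: an integer index removes a dimension, while a slice keeps it. A short sketch using the same array, plus a stepped and a reversed slice (the names row_1d and row_2d are illustrative):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

row_1d = arr[0, :]      # integer index drops the row dimension
row_2d = arr[0:1, :]    # slice keeps it
print(row_1d.shape)     # (3,)
print(row_2d.shape)     # (1, 3)

# Slices accept a step, including a negative one
every_other = arr[:, ::2]    # columns 0 and 2
reversed_cols = arr[:, ::-1] # columns in reverse order
print(every_other.tolist())    # [[1, 3], [4, 6]]
print(reversed_cols.tolist())  # [[3, 2, 1], [6, 5, 4]]
```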
Boolean Indexing
Filter arrays using boolean conditions:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
# Boolean mask
mask = arr > 3
print(arr[mask]) # [4, 5, 6]
# Inline boolean indexing
print(arr[arr % 2 == 0]) # [2, 4, 6] — even numbers only
# Using np.where for conditional selection
arr = np.array([1, 2, 3, 4, 5])
result = np.where(arr > 2, arr * 2, arr)
print(result) # [1, 2, 6, 8, 10]
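Masks can also be combined and used for assignment. The following is a minimal sketch; note that NumPy uses the operators &, |, and ~ here rather than Python's and, or, and not.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Each condition needs its own parentheses because & and | bind
# more tightly than comparison operators
both = arr[(arr > 2) & (arr % 2 == 0)]
either = arr[(arr < 2) | (arr > 5)]
print(both)     # [4 6]
print(either)   # [1 6]

# A mask can also appear on the left-hand side of an assignment
capped = arr.copy()
capped[capped > 4] = 4
print(capped)   # [1 2 3 4 4 4]
```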
Fancy Indexing
Use arrays as indices to select specific elements:
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
# Integer array indexing
print(arr[[0, 2, 4]]) # [10, 30, 50] — elements at indices 0, 2, 4
# 2D example
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows = [0, 1, 2]
cols = [2, 1, 0]
print(matrix[rows, cols]) # [3, 5, 7]
# Using np.ix_ for 2D indexing
print(matrix[np.ix_([0, 2], [1, 2])])
# [[2 3]
# [8 9]]
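Unlike slicing, fancy indexing returns a copy of the selected elements. The sketch below shows the consequence: changing the result leaves the original untouched, whereas assigning through a fancy index does modify it.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

picked = arr[[0, 2, 4]]   # fancy indexing returns a copy
picked[0] = -1
print(arr[0])             # 10 (the original is unchanged)

# Assigning through a fancy index does modify the original
arr[[1, 3]] = 0
print(arr)                # [10  0 30  0 50]
```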
Array Operations
NumPy excels at vectorized operations — applying operations to entire arrays without explicit loops:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Arithmetic operations apply element-wise
print(arr + 1) # [2, 3, 4, 5, 6]
print(arr * 2) # [2, 4, 6, 8, 10]
print(arr ** 2) # [1, 4, 9, 16, 25]
print(arr / 2) # [0.5, 1., 1.5, 2., 2.5]
# Array-to-array operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # [5, 7, 9]
print(a * b) # [4, 10, 18]
# Comparison returns boolean arrays
print(arr > 3) # [False, False, False, True, True]
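To make "without explicit loops" concrete, here is the same computation written both ways. This is only a sketch of the equivalence; it makes no timing claims.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# Explicit Python loop over the elements
loop_result = [v * v + 1 for v in values.tolist()]

# Vectorized equivalent: one expression over the whole array
vec_result = values ** 2 + 1

print(loop_result)   # [2, 5, 10, 17, 26]
print(vec_result)    # [ 2  5 10 17 26]
```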
Universal Functions (ufuncs)
NumPy provides fast, vectorized mathematical functions:
import numpy as np
arr = np.array([0, np.pi/2, np.pi, 3*np.pi/2])
# Trigonometric
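print(np.sin(arr)) # approximately [0., 1., 0., -1.]
print(np.cos(arr)) # approximately [1., 0., -1., 0.]
print(np.tan(arr)) # approximately [0., 1.6e16, 0., 5.4e15]: huge but finite, because pi/2 is not exactly representable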
# Exponential and logarithm
print(np.exp(arr)) # e^arr
print(np.log(arr)) # natural log; emits a RuntimeWarning and gives -inf for 0 (nan for negative values)
print(np.log10(arr)) # base 10
print(np.log2(arr)) # base 2
# Rounding
arr = np.array([1.4, 1.6, 2.5])
print(np.floor(arr)) # [1., 1., 2.]
print(np.ceil(arr)) # [2., 2., 3.]
print(np.round(arr)) # [1., 2., 2.]; halves round to the nearest even value
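Because ufunc results are floating-point approximations, exact equality checks can surprise. The sketch below compares with np.allclose instead, and uses np.errstate to silence the divide-by-zero warning from np.log in a controlled way.

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])
sines = np.sin(angles)

print(sines[2] == 0)                    # False: about 1.2e-16, not exactly 0
print(np.allclose(sines, [0, 1, 0]))    # True within the default tolerances

# np.errstate temporarily changes how floating-point warnings are handled
with np.errstate(divide="ignore"):
    logs = np.log(np.array([0.0, 1.0]))   # -inf for 0.0, 0.0 for 1.0
print(logs)
```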
Aggregation Functions
Sum, mean, min, max, and more:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Whole array
print(np.sum(arr)) # 21
print(np.mean(arr)) # 3.5
print(np.min(arr)) # 1
print(np.max(arr)) # 6
print(np.std(arr)) # standard deviation
print(np.var(arr)) # variance
# Along an axis
print(np.sum(arr, axis=0)) # [5, 7, 9] — column sums
print(np.sum(arr, axis=1)) # [6, 15] — row sums
# Find position of min/max
print(np.argmin(arr)) # 0 — flat index
print(np.argmax(arr)) # 5
# Cumulative operations
print(np.cumsum(arr)) # [1, 3, 6, 10, 15, 21]
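Two companions to the aggregations above are often useful: keepdims, which keeps the reduced axis so the result can be combined with the original array, and np.unravel_index, which converts a flat index back to coordinates. A minimal sketch on the same array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# keepdims=True keeps the reduced axis with length 1, so the result
# lines up with the original array in later arithmetic
row_means = arr.mean(axis=1, keepdims=True)   # shape (2, 1)
centered = arr - row_means
print(centered.tolist())   # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]

# Convert the flat argmax index back into (row, column)
flat_index = np.argmax(arr)                        # 5
row, col = np.unravel_index(flat_index, arr.shape)
print(row, col)                                    # 1 2
```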
Reshaping Arrays
Change an array's shape. reshape returns a view of the same data whenever the memory layout allows it, and a copy otherwise:
import numpy as np
arr = np.arange(12) # [0, 1, 2, ..., 11]
# Reshape to different dimensions
print(arr.reshape(3, 4))
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
print(arr.reshape(2, 3, 2))
# [[[ 0 1]
# [ 2 3]
# [ 4 5]]
# [[ 6 7]
# [ 8 9]
# [10 11]]]
# -1 means "figure it out automatically"
print(arr.reshape(3, -1)) # 3 rows, 4 columns (12/3=4)
print(arr.reshape(-1)) # Flatten to 1D
# Transpose
matrix = np.arange(6).reshape(2, 3)
print(matrix.T) # 3x2 matrix
print(matrix.swapaxes(0, 1)) # same as transpose
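Whether reshape returns a view or a copy depends on the memory layout, and np.shares_memory can tell the two apart. The sketch below shows one case of each, along with the error raised when the requested shape does not fit the number of elements.

```python
import numpy as np

arr = np.arange(12)
grid = arr.reshape(3, 4)
print(np.shares_memory(arr, grid))        # True: reshape returned a view

# A transposed array is not laid out in row-major order, so flattening
# it forces NumPy to copy
flat_t = grid.T.reshape(-1)
print(np.shares_memory(grid, flat_t))     # False

# The total number of elements must stay the same
try:
    arr.reshape(5, -1)
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("reshape failed:", exc)
```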
Flattening and Ravelling
import numpy as np
matrix = np.array([[1, 2, 3], [4, 5, 6]])
# flatten() returns a copy
flat = matrix.flatten()
print(flat) # [1, 2, 3, 4, 5, 6]
# ravel() returns a view (if possible)
flat_view = matrix.ravel()
print(flat_view) # [1, 2, 3, 4, 5, 6]
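Prefer ravel() when you only need to read the data, since it avoids a copy whenever possible. Use flatten() when you plan to modify the result and must leave the original array untouched.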
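The practical difference shows up when the flattened result is modified. A short sketch, using a contiguous array so that ravel() can return a view:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = matrix.flatten()
flat_copy[0] = 100
print(matrix[0, 0])    # 1: the copy is independent

flat_view = matrix.ravel()
flat_view[0] = 100
print(matrix[0, 0])    # 100: the view writes through to matrix
```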
Broadcasting
Broadcasting lets NumPy perform operations on arrays with different shapes:
import numpy as np
# Basic broadcasting — scalar expands to array
arr = np.array([1, 2, 3])
print(arr + 10) # [11, 12, 13] — 10 is broadcast to match arr's shape
# 1D + 2D broadcasting
a = np.array([[1], [2], [3]]) # shape (3, 1)
b = np.array([10, 20, 30]) # shape (3,)
print(a + b)
# [[11, 21, 31]
# [12, 22, 32]
# [13, 23, 33]]
# Broadcasting rules: dimensions are compared from right to left
# Dimensions must be equal or one must be 1
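Broadcasting compares shapes dimension by dimension, starting from the rightmost. Two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension is treated as 1. In the example above, shapes (3, 1) and (3,) broadcast to (3, 3).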
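The rules can be checked without building large arrays. The sketch below uses np.broadcast_shapes, which assumes NumPy 1.20 or newer, and then shows an incompatible pair and one way to fix it with np.newaxis. The names m and v are illustrative.

```python
import numpy as np

# Report the broadcast result shape for a pair of shapes
print(np.broadcast_shapes((3, 1), (3,)))    # (3, 3)
print(np.broadcast_shapes((2, 3), (3,)))    # (2, 3)

# Shapes (2, 3) and (2,) do not line up from the right: 3 vs 2
m = np.ones((2, 3))
v = np.array([10.0, 20.0])
try:
    m + v
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print("broadcast failed:", exc)

# Adding a trailing axis gives v the shape (2, 1), which is compatible
result = m + v[:, np.newaxis]
print(result.shape)                          # (2, 3)
```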
Copying Arrays
Understanding when NumPy copies data and when it returns a view matters for both performance and correctness:
import numpy as np
arr = np.array([1, 2, 3])
# Slicing creates a view, not a copy
view = arr[0:2]
print(view.base is arr) # True — shares memory
# Explicit copy
copy = arr.copy()
print(copy.base is arr) # False — independent memory
# Some operations return views, others always return copies
transposed = arr.T # A view: a no-op for 1D arrays, swapped axes for 2D
flattened = arr.flatten() # Always creates a new array
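# base points to the array that owns the memory, which may not be the array you sliced
arr = np.arange(6).reshape(2, 3) # arr is itself a view of the arange result
view = arr[0]
print(view.base is arr) # False: base is the original arange array
print(np.shares_memory(view, arr)) # True: view shares memory with arr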
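The three cases (plain assignment, a slice, and an explicit copy) behave differently once something is modified. A minimal sketch, with illustrative variable names:

```python
import numpy as np

original = np.array([1, 2, 3, 4])

alias = original              # plain assignment: same object, no new array
view = original[1:3]          # slice: new array object, shared memory
independent = original.copy() # explicit copy: separate memory

view[0] = 20
print(original)               # [ 1 20  3  4]
print(alias is original)      # True
print(independent)            # [1 2 3 4]
print(np.shares_memory(original, view))          # True
print(np.shares_memory(original, independent))   # False
```

np.shares_memory is a more direct check than comparing base, because base always refers to the array that owns the underlying buffer.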
Working with Dates
NumPy has basic datetime support:
import numpy as np
# Create datetime64 arrays
dates = np.array('2024-01-01', dtype='datetime64') + np.arange(5)
print(dates)
# ['2024-01-01' '2024-01-02' '2024-01-03' '2024-01-04' '2024-01-05']
# Datetime arithmetic
start = np.datetime64('2024-01-01')
end = np.datetime64('2024-01-10')
print(end - start) # 9 days
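# Extract components
print(dates.astype('datetime64[D]').astype(int)) # Days since 1970-01-01 (19723 for 2024-01-01), not day of month
print((dates - dates.astype('datetime64[M]')).astype(int) + 1) # Day of month: [1 2 3 4 5]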
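datetime64 arrays have no attribute-style accessors for year, month, or day, so one common approach is to convert between units and do integer arithmetic. The sketch below assumes a short date range that crosses a month boundary; the variable names are illustrative.

```python
import numpy as np

# Four consecutive days: 2024-01-30 through 2024-02-02
dates = np.arange('2024-01-30', '2024-02-03', dtype='datetime64[D]')

# Truncating to month precision gives the first day of each month
months = dates.astype('datetime64[M]')

# Day of month: offset in days from the first of the month, plus 1
day_of_month = (dates - months).astype(int) + 1
print(day_of_month)     # [30 31  1  2]

# Month number: months are counted from 1970-01, so take the remainder mod 12
month_of_year = months.astype(int) % 12 + 1
print(month_of_year)    # [1 1 2 2]

# Year: years are counted from 1970
years = dates.astype('datetime64[Y]').astype(int) + 1970
print(years)            # [2024 2024 2024 2024]
```

For heavier date handling, pandas builds a richer interface on top of these same datetime64 values.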
Summary
NumPy arrays are the backbone of numerical computing in Python. They’re faster than lists, support vectorized operations, and integrate with virtually every scientific Python library.
The key concepts to remember:
- Create arrays using np.array(), np.zeros(), np.arange(), and so on
- Index using brackets, boolean masks, or fancy indexing
- Operations apply element-wise by default
- Broadcasting handles different-shaped arrays automatically
- Use .copy() when you need an independent array
From here, explore NumPy’s linear algebra module (np.linalg), random number generation (np.random), and integration with pandas for data analysis.