NumPy Basics: Arrays and Vectorized Math
NumPy is the foundation of numerical computing in Python. If you’re doing anything with data analysis, machine learning, or scientific computing, you’ll be working with NumPy arrays from the first line of code. This tutorial covers the core concepts you need: creating arrays, vectorized operations, and broadcasting.
What is a NumPy Array?
A NumPy array is a grid of values, all of the same type, indexed by a tuple of integers. Unlike a Python list, a NumPy array is stored in contiguous memory and supports vectorized operations — you apply an operation to the entire array at once, without writing a loop.
import numpy as np
# A 1D array
a = np.array([1, 2, 3, 4, 5])
print(a)
# [1 2 3 4 5]
print(type(a))
# <class 'numpy.ndarray'>
The key difference from a list: evaluating a * 2 on a NumPy array doubles every element, whereas * 2 on a Python list repeats the list.
# NumPy: element-wise
a = np.array([1, 2, 3])
print(a * 2)
# [2 4 6]
# Python list: replication
b = [1, 2, 3]
print(b * 2)
# [1, 2, 3, 1, 2, 3]
Creating Arrays
NumPy provides many ways to create arrays:
# From a Python list
a = np.array([1, 2, 3])
# Range of values (like range(), but returns an array)
b = np.arange(0, 10, 2) # start, stop, step
print(b)
# [0 2 4 6 8]
# Evenly spaced numbers
c = np.linspace(0, 1, 5) # start, stop, number of points
print(c)
# [0. 0.25 0.5 0.75 1. ]
# Arrays of zeros or ones
z = np.zeros(5)
print(z)
# [0. 0. 0. 0. 0.]
m = np.ones((3, 3)) # tuple for shape of multi-dimensional array
print(m)
# [[1. 1. 1.]
# [1. 1. 1.]
# [1. 1. 1.]]
# Identity matrix
i = np.eye(4)
print(i)
# [[1. 0. 0. 0.]
# [0. 1. 0. 0.]
# [0. 0. 1. 0.]
# [0. 0. 0. 1.]]
The dtype attribute tells you the data type of an array:
a = np.array([1, 2, 3])
print(a.dtype)
# int64 (int32 on Windows with NumPy older than 2.0)
b = np.array([1.0, 2.0, 3.0])
print(b.dtype)
# float64
# Specify dtype explicitly
c = np.array([1, 2, 3], dtype=np.float32)
print(c.dtype)
# float32
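The dtype also governs how values are converted and how much memory each element takes. The following is a minimal sketch, assuming NumPy's default type promotion; the variable names are illustrative only.

```python
import numpy as np

# Mixing integers and floats in one list upcasts everything to float64.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)          # float64

# astype() returns a converted copy; float-to-int conversion truncates toward zero.
truncated = np.array([1.7, -1.7, 2.2]).astype(np.int64)
print(truncated)            # [ 1 -1  2]

# itemsize reports the bytes per element, which follows from the dtype.
small = np.array([1, 2, 3], dtype=np.float32)
print(small.itemsize)       # 4
```

Choosing a smaller dtype such as float32 halves memory use compared with float64, at the cost of precision.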
Array Shape and Dimensions
Every array has a shape — a tuple describing its dimensions:
a = np.array([1, 2, 3])
print(a.shape) # (3,)
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b.shape) # (2, 3)
c = np.zeros((2, 3, 4)) # 2 layers, 3 rows, 4 columns
print(c.shape) # (2, 3, 4)
Use reshape() to change the shape without changing the data:
a = np.arange(12)
print(a)
# [ 0 1 2 3 4 5 6 7 8 9 10 11]
print(a.shape) # (12,)
b = a.reshape(3, 4)
print(b)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
print(b.shape) # (3, 4)
# Or use -1 to infer one dimension
c = a.reshape(3, -1) # -1 means "whatever fits"
print(c.shape) # (3, 4)
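One consequence of reshaping "without changing the data" is that, for a contiguous array like the one above, the reshaped array usually shares memory with the original. A short sketch that checks this with np.shares_memory, and shows what happens when the element counts do not match:

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)

# For a contiguous array like this one, reshape returns a view on the same buffer.
print(np.shares_memory(a, b))   # True

# Writing through the view is therefore visible in the original.
b[0, 0] = 100
print(a[0])                     # 100

# The total number of elements must match, otherwise reshape raises ValueError.
reshape_failed = False
try:
    a.reshape(5, 3)             # 15 slots for 12 elements
except ValueError as exc:
    reshape_failed = True
    print("reshape failed:", exc)
```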
Vectorized Operations
The real power of NumPy: operations apply to every element automatically.
a = np.array([1, 2, 3, 4])
# Arithmetic
print(a + 10) # [11 12 13 14]
print(a * 2) # [2 4 6 8]
print(a ** 2) # [1 4 9 16]
print(a % 2) # [1 0 1 0]
# Comparison returns a boolean array
print(a > 2) # [False False True True]
Mathematical functions operate element-wise:
a = np.array([0, np.pi/2, np.pi])
print(np.sin(a))
# [0.0000000e+00 1.0000000e+00 1.2246468e-16]
print(np.sqrt(a))
# [0. 1.25331414 1.77245385]
Aggregation functions:
a = np.array([1, 2, 3, 4, 5])
print(a.sum()) # 15
print(a.mean()) # 3.0
print(a.std()) # 1.4142135623730951
print(a.min()) # 1
print(a.max()) # 5
print(a.cumsum()) # [1 3 6 10 15]
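Note that std() computes the population standard deviation by default (dividing by N). If you want the sample standard deviation (dividing by N - 1), pass ddof=1. A small sketch of the difference:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# Default: population standard deviation, divides by N.
pop_std = a.std()

# ddof=1: sample standard deviation, divides by N - 1.
sample_std = a.std(ddof=1)

print(pop_std)      # equals sqrt(2)
print(sample_std)   # equals sqrt(2.5)
```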
For 2D arrays, specify the axis:
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b.sum()) # 21 — all elements
print(b.sum(axis=0)) # [5 7 9] — sum each column
print(b.sum(axis=1)) # [ 6 15] — sum each row
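The axis you name is the one that gets collapsed. The sketch below uses the same array and adds the keepdims=True option, which keeps the collapsed axis with length 1 so the result can be combined with the original array; the variable names are illustrative.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 collapses the rows, leaving one value per column.
col_means = b.mean(axis=0)
print(col_means)            # [2.5 3.5 4.5]

# keepdims=True keeps the reduced axis with length 1: shape (2, 1), not (2,).
row_sums = b.sum(axis=1, keepdims=True)
print(row_sums.shape)       # (2, 1)

# Divide each row by its own sum; every row of the result then sums to 1.
row_fractions = b / row_sums
print(row_fractions.sum(axis=1))
```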
Broadcasting
Broadcasting is how NumPy handles operations between arrays of different shapes. It stretches the smaller array to match the larger one — without actually copying data.
# Add a scalar to an array
a = np.array([1, 2, 3])
print(a + 10) # [11 12 13]
# The scalar 10 is "broadcast" to [10, 10, 10]
# Add a 1D array to a 2D array (row-wise)
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 20, 30])
print(a + b)
# [[11 22 33]
# [14 25 36]]
# Add a column vector to a 2D array (one value per row)
c = np.array([[10], [20]]) # shape (2, 1) matches the 2 rows of a
print(a + c)
# [[11 12 13]
# [24 25 26]]
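Broadcasting rules: NumPy compares the two shapes dimension by dimension, starting from the rightmost. Two dimensions are compatible when they are equal or when one of them is 1; a missing leading dimension is treated as 1. If any pair is incompatible, NumPy raises a ValueError.
# This works: shape (3, 1) + shape (3,) → (3, 3)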
a = np.arange(3).reshape(3, 1)
b = np.arange(3)
print((a + b).shape) # (3, 3)
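You can check whether shapes are compatible without building any arrays. The sketch below uses np.broadcast_shapes, which is available in NumPy 1.20 and later, and then shows an incompatible pair failing at runtime.

```python
import numpy as np

# np.broadcast_shapes applies the broadcasting rule to shapes alone.
print(np.broadcast_shapes((3, 1), (3,)))      # (3, 3)
print(np.broadcast_shapes((2, 3), (3,)))      # (2, 3)
print(np.broadcast_shapes((2, 3), (2, 1)))    # (2, 3)

# (2, 3) and (3, 1): the trailing pair 3 and 1 is fine, but 2 vs 3 is not.
incompatible = False
try:
    np.ones((2, 3)) + np.ones((3, 1))
except ValueError as exc:
    incompatible = True
    print("cannot broadcast:", exc)
```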
Indexing and Slicing
NumPy arrays support familiar Python slicing syntax:
a = np.arange(10)
print(a[2:7]) # [2 3 4 5 6] — slice from index 2 to 6
print(a[:5]) # [0 1 2 3 4] — from start to index 4
print(a[::2]) # [0 2 4 6 8] — every other element
print(a[::-1]) # [9 8 7 6 5 4 3 2 1 0] — reversed
For 2D arrays:
b = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(b[1, 2]) # 6 — row 1, column 2
print(b[:, 0]) # [1 4 7] — entire first column
print(b[0:2, 1:3]) # [[2 3], [5 6]] — submatrix
Boolean indexing — filter by condition:
a = np.array([10, 20, 30, 40, 50])
mask = a > 25
print(mask) # [False False True True True]
print(a[mask]) # [30 40 50]
# Or inline
print(a[a % 20 == 0]) # [20 40]
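Masks can be combined and used for assignment as well as selection. A brief sketch using the same array; note that NumPy uses &, | and ~ rather than Python's and, or and not for element-wise logic.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

# Combine conditions with & (and), | (or), ~ (not). The parentheses are
# required because these operators bind more tightly than comparisons.
middle = a[(a > 15) & (a < 45)]
print(middle)               # [20 30 40]

# A boolean mask also works on the left-hand side of an assignment.
capped = a.copy()
capped[capped > 30] = 30
print(capped)               # [10 20 30 30 30]

# np.where chooses element-wise between two alternatives.
labels = np.where(a >= 30, "high", "low")
print(labels)               # ['low' 'low' 'high' 'high' 'high']
```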
Array Math vs Python Loops
NumPy operations are implemented in C and operate on entire arrays at once. The difference matters enormously with large data:
# Python loop — slow for large arrays
a = list(range(1000000))
b = [x * 2 + 1 for x in a]
# NumPy: typically many times faster
a = np.arange(1000000)
b = a * 2 + 1
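The NumPy version avoids Python’s per-element interpreter overhead and runs compiled loops that can use SIMD instructions. Element-wise operations like this one generally run on a single core; multiple cores mainly come into play for linear algebra routines backed by a multithreaded BLAS/LAPACK library.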
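To measure the difference on your own machine, you can time both versions with the standard-library timeit module. This is a rough sketch, not a rigorous benchmark; the helper names with_loop and with_numpy are made up for illustration, and the actual speedup depends on hardware, array size, and NumPy build.

```python
import timeit
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n)

def with_loop():
    # Pure-Python list comprehension: one interpreter step per element.
    return [x * 2 + 1 for x in py_list]

def with_numpy():
    # Vectorized: the loop runs in compiled code.
    return np_arr * 2 + 1

# Both approaches produce the same values.
same = with_loop() == with_numpy().tolist()
print(same)

# Total time for 3 runs of each version.
loop_time = timeit.timeit(with_loop, number=3)
numpy_time = timeit.timeit(with_numpy, number=3)
print(f"loop:  {loop_time:.3f} s")
print(f"numpy: {numpy_time:.3f} s")
```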
Installing and Importing
NumPy is a third-party package — install it with pip or conda:
pip install numpy
Or use it through Anaconda:
conda install numpy
In any Python script, import it as np — this is the universal convention:
import numpy as np
a = np.array([1, 2, 3])
Common Gotchas
Float precision in comparisons. Due to floating-point representation, exact equality can fail:
a = np.array([0.1 + 0.2])
print(a[0] == 0.3) # False
# Use np.isclose() instead
print(np.isclose(a[0], 0.3)) # True
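np.isclose treats two values as equal when |a - b| <= atol + rtol * |b|, with defaults rtol=1e-05 and atol=1e-08. The sketch below shows the element-wise form, the all-at-once np.allclose, and how the absolute tolerance matters when comparing against zero; the sample values are arbitrary.

```python
import numpy as np

computed = np.array([0.1 + 0.2, 1.0 / 3.0 * 3.0, 1e-10])
expected = np.array([0.3, 1.0, 0.0])

# Element-wise comparison with the default tolerances.
close = np.isclose(computed, expected)
print(close)                             # [ True  True  True]

# allclose reduces the element-wise result to a single bool.
all_close = np.allclose(computed, expected)
print(all_close)                         # True

# With atol=0 only the relative tolerance applies, so a tiny
# non-zero value no longer counts as close to exactly 0.
strict = np.isclose(1e-10, 0.0, atol=0.0)
print(strict)                            # False
```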
Views vs copies. Basic slicing returns a view of the original data, not a copy, so writing through the slice modifies the original array:
a = np.array([1, 2, 3, 4])
b = a[:2]
b[0] = 99
print(a) # [99 2 3 4] — a was modified!
Use .copy() to create an independent copy:
a = np.array([1, 2, 3, 4])
b = a[:2].copy()
b[0] = 99
print(a) # [1 2 3 4] — a is unchanged
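If you are unsure whether two arrays share data, np.shares_memory and the .base attribute can tell you. A small sketch comparing a slice, an explicit copy, and integer-array ("fancy") indexing, which always returns a copy:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

view = a[:2]            # basic slicing: a view
copied = a[:2].copy()   # explicit copy
fancy = a[[0, 1]]       # integer-array indexing: a copy

print(np.shares_memory(a, view))     # True
print(np.shares_memory(a, copied))   # False
print(np.shares_memory(a, fancy))    # False

# A view records the array it was sliced from in .base.
print(view.base is a)                # True
```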
Array vs matrix. NumPy’s np.matrix class is always 2D and uses * for matrix multiplication. It is no longer recommended, and the NumPy documentation notes that it may be removed in the future. Use regular ndarrays with the @ operator (or np.matmul()) for matrix multiplication; on ndarrays, * is element-wise.
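A short sketch of the difference between * and @ on plain ndarrays, using two small example matrices chosen for illustration:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# On ndarrays, * multiplies element by element.
elementwise = A * B
print(elementwise)
# [[ 5 12]
#  [21 32]]

# @ performs matrix multiplication.
product = A @ B
print(product)
# [[19 22]
#  [43 50]]

# np.matmul is the function form of @.
print(np.array_equal(product, np.matmul(A, B)))   # True
```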
See Also
- /tutorials/numpy-getting-started/ — installation and first steps with NumPy
- /tutorials/numpy-array-operations/ — element-wise ops, aggregations, array manipulation
- /guides/numpy-arrays-guide/ — deeper look at ndarray internals and performance