NumPy Basics: Arrays and Vectorized Math

April 21, 2026 · 7 min read · Updated April 22, 2026 · beginner

Tags: numpy, arrays, vectorized, broadcasting, data-science, beginner

NumPy is the foundation of numerical computing in Python. If you’re doing anything with data analysis, machine learning, or scientific computing, you’ll be working with NumPy arrays from the first line of code. This tutorial covers the core concepts you need: creating arrays, vectorized operations, and broadcasting.

What is a NumPy Array?

A NumPy array is a grid of values, all of the same type, indexed by a tuple of integers. Unlike a Python list, a NumPy array is stored in contiguous memory and supports vectorized operations — you apply an operation to the entire array at once, without writing a loop.

import numpy as np

# A 1D array
a = np.array([1, 2, 3, 4, 5])
print(a)
# [1 2 3 4 5]
print(type(a))
# <class 'numpy.ndarray'>

The key difference from a list: when you apply a * 2 to a NumPy array, every element doubles. With a Python list, * 2 duplicates the list itself.

# NumPy: element-wise
a = np.array([1, 2, 3])
print(a * 2)
# [2 4 6]

# Python list: replication
b = [1, 2, 3]
print(b * 2)
# [1, 2, 3, 1, 2, 3]
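
The "all of the same type" rule has a visible consequence: when the input mixes types, NumPy picks one common dtype for the whole array. A minimal sketch of that behavior (the variable names are illustrative only):

```python
import numpy as np

# A Python list can hold mixed types; each element keeps its own type.
mixed_list = [1, 2.5, 3]

# np.array converts every element to one common dtype.
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)  # float64

# The integers 1 and 3 are now stored as floats alongside 2.5.
print(mixed_array[0], type(mixed_array[0]))
```

This is why a single stray float in otherwise integer data turns the whole array into floats.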

Creating Arrays

NumPy provides many ways to create arrays:

# From a Python list
a = np.array([1, 2, 3])

# Range of values (like range(), but returns an array)
b = np.arange(0, 10, 2)  # start, stop, step
print(b)
# [0 2 4 6 8]

# Evenly spaced numbers
c = np.linspace(0, 1, 5)  # start, stop, number of points
print(c)
# [0.   0.25 0.5  0.75 1.  ]

# Arrays of zeros or ones
z = np.zeros(5)
print(z)
# [0. 0. 0. 0. 0.]

m = np.ones((3, 3))  # tuple for shape of multi-dimensional array
print(m)
# [[1. 1. 1.]
#  [1. 1. 1.]
#  [1. 1. 1.]]

# Identity matrix
i = np.eye(4)
print(i)
# [[1. 0. 0. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]]
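
A few more constructors follow the same pattern as the ones above. This sketch uses np.full, np.zeros_like, and the endpoint parameter of np.linspace; the array names are made up for illustration:

```python
import numpy as np

# np.full fills a given shape with one constant value.
sevens = np.full((2, 3), 7)
print(sevens.shape)  # (2, 3)

# np.zeros_like copies the shape and dtype of an existing array.
template = np.array([[1, 2, 3], [4, 5, 6]])
blank = np.zeros_like(template)
print(blank.shape, blank.dtype == template.dtype)  # (2, 3) True

# endpoint=False leaves out the stop value, so 4 points step by 1/4.
quarters = np.linspace(0, 1, 4, endpoint=False)
print(quarters)  # [0.   0.25 0.5  0.75]
```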

The dtype attribute tells you the data type of an array:

a = np.array([1, 2, 3])
print(a.dtype)
# int64 (int32 on some platforms, such as Windows with NumPy 1.x)

b = np.array([1.0, 2.0, 3.0])
print(b.dtype)
# float64

# Specify dtype explicitly
c = np.array([1, 2, 3], dtype=np.float32)
print(c.dtype)
# float32
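
To change the dtype of an existing array, astype() is the usual tool. The sketch below assumes a small float array named prices (a made-up example) and shows that converting to an integer type drops the fractional part rather than rounding:

```python
import numpy as np

prices = np.array([1.9, 2.4, -3.7])

# astype returns a new array; the original keeps its dtype.
as_int = prices.astype(np.int64)
print(as_int)        # [ 1  2 -3]  (fraction dropped, not rounded)
print(prices.dtype)  # float64

# Round first if you want the nearest integer instead.
rounded = np.round(prices).astype(np.int64)
print(rounded)       # [ 2  2 -4]
```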

Array Shape and Dimensions

Every array has a shape — a tuple describing its dimensions:

a = np.array([1, 2, 3])
print(a.shape)  # (3,)

b = np.array([[1, 2, 3], [4, 5, 6]])
print(b.shape)  # (2, 3)

c = np.zeros((2, 3, 4))  # 2 layers, 3 rows, 4 columns
print(c.shape)  # (2, 3, 4)

Use reshape() to change the shape without changing the data:

a = np.arange(12)
print(a)
# [ 0  1  2  3  4  5  6  7  8  9 10 11]
print(a.shape)  # (12,)

b = a.reshape(3, 4)
print(b)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
print(b.shape)  # (3, 4)

# Or use -1 to infer one dimension
c = a.reshape(3, -1)  # -1 means "whatever fits"
print(c.shape)  # (3, 4)
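
Two related attributes, ndim and size, are handy when checking a reshape. The sketch below also shows that, for a contiguous array like this one, reshape() hands back a view on the same data, and that an incompatible shape raises ValueError:

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)

# ndim is the number of axes; size is the total element count.
print(b.ndim, b.size)  # 2 12

# Here the reshaped array shares its data with the original.
print(np.shares_memory(a, b))  # True

# The new shape must hold exactly the same number of elements.
try:
    a.reshape(5, 3)
    reshape_ok = True
except ValueError as err:
    reshape_ok = False
    print("reshape failed:", err)
```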

Vectorized Operations

The real power of NumPy: operations apply to every element automatically.

a = np.array([1, 2, 3, 4])

# Arithmetic
print(a + 10)   # [11 12 13 14]
print(a * 2)    # [2 4 6 8]
print(a ** 2)   # [ 1  4  9 16]
print(a % 2)    # [1 0 1 0]

# Comparison returns a boolean array
print(a > 2)    # [False False  True  True]

Mathematical functions operate element-wise:

a = np.array([0, np.pi/2, np.pi])

print(np.sin(a))
# [0.0000000e+00 1.0000000e+00 1.2246468e-16]

print(np.sqrt(a))
# [0.         1.25331414 1.77245385]

Aggregation functions:

a = np.array([1, 2, 3, 4, 5])

print(a.sum())    # 15
print(a.mean())   # 3.0
print(a.std())    # 1.4142135623730951
print(a.min())    # 1
print(a.max())    # 5
print(a.cumsum()) # [ 1  3  6 10 15]
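
One detail worth knowing about std(): by default it computes the population standard deviation (dividing by n). Passing ddof=1 gives the sample estimate (dividing by n - 1). A short sketch on the same array:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# Default (ddof=0): divide the summed squared deviations by n.
population_std = a.std()

# ddof=1: divide by n - 1, the usual sample estimate.
sample_std = a.std(ddof=1)

print(round(population_std, 3))  # 1.414
print(round(sample_std, 3))      # 1.581
```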

For 2D arrays, specify the axis:

b = np.array([[1, 2, 3], [4, 5, 6]])

print(b.sum())           # 21 — all elements
print(b.sum(axis=0))     # [5 7 9] — sum each column
print(b.sum(axis=1))     # [ 6 15] — sum each row
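
The same axis argument works for other aggregations such as mean(). The sketch below adds keepdims=True, which keeps the collapsed axis with length 1 so the result can be combined with the original array again:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 collapses the rows, leaving one value per column.
col_means = b.mean(axis=0)
print(col_means)  # [2.5 3.5 4.5]

# keepdims=True keeps the collapsed axis: shape (2, 1) instead of (2,).
row_means = b.mean(axis=1, keepdims=True)
print(row_means.shape)  # (2, 1)

# That shape lines up with b, so each row can be centered on its own mean.
centered = b - row_means
print(centered)
# [[-1.  0.  1.]
#  [-1.  0.  1.]]
```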

Broadcasting

Broadcasting is how NumPy handles operations between arrays of different shapes. Conceptually, it stretches the smaller array to match the larger one, without actually copying data.

# Add a scalar to an array
a = np.array([1, 2, 3])
print(a + 10)  # [11 12 13]
# The scalar 10 is "broadcast" to [10, 10, 10]

# Add a 1D array to a 2D array (row-wise)
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 20, 30])

print(a + b)
# [[11 22 33]
#  [14 25 36]]

# Add a column vector to a 2D array (column-wise)
c = np.array([[10], [20]])  # shape (2, 1)
print(a + c)
# [[11 12 13]
#  [24 25 26]]

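Broadcasting rules: NumPy compares shapes dimension by dimension, starting from the rightmost. Two dimensions are compatible when they are equal or when one of them is 1. If one array has fewer dimensions, its shape is treated as if it were padded with 1s on the left.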

# This works: shape (3, 1) + shape (3,) → (3, 3)
a = np.arange(3).reshape(3, 1)
b = np.arange(3)
print((a + b).shape)  # (3, 3)
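
You can check whether two shapes are compatible without building any arrays. This sketch uses np.broadcast_shapes, which is available in NumPy 1.20 and later (an assumption about your installed version), and includes one pair of shapes that fails:

```python
import numpy as np

# Shapes are aligned from the right; a missing leading dimension counts as 1.
print(np.broadcast_shapes((2, 3), (3,)))    # (2, 3)
print(np.broadcast_shapes((3, 1), (3,)))    # (3, 3)
print(np.broadcast_shapes((2, 1), (2, 3)))  # (2, 3)

# (2, 3) and (3, 1): the trailing 3 and 1 are fine, but 2 vs 3 is not.
try:
    np.broadcast_shapes((2, 3), (3, 1))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```

Adding arrays with incompatible shapes raises the same ValueError at the point of the operation.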

Indexing and Slicing

NumPy arrays support familiar Python slicing syntax:

a = np.arange(10)
print(a[2:7])   # [2 3 4 5 6] — slice from index 2 to 6
print(a[:5])    # [0 1 2 3 4] — from start to index 4
print(a[::2])   # [0 2 4 6 8] — every other element
print(a[::-1])  # [9 8 7 6 5 4 3 2 1 0] — reversed

For 2D arrays:

b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(b[1, 2])        # 6 — row 1, column 2
print(b[:, 0])        # [1 4 7] — entire first column
print(b[0:2, 1:3])    # [[2 3], [5 6]] — submatrix

Boolean indexing — filter by condition:

a = np.array([10, 20, 30, 40, 50])

mask = a > 25
print(mask)  # [False False  True  True  True]
print(a[mask])  # [30 40 50]

# Or inline
print(a[a % 20 == 0])  # [20 40]
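
Masks can be combined. The sketch below uses & (and), | (or), and ~ (not) on boolean arrays, plus np.where to replace values instead of filtering them out. Note that Python's and/or keywords do not work element-wise, and each condition needs its own parentheses:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

# Combine conditions with &; the parentheses are required.
in_range = a[(a > 15) & (a < 45)]
print(in_range)  # [20 30 40]

# ~ negates a mask.
outside = a[~((a > 15) & (a < 45))]
print(outside)   # [10 50]

# np.where chooses element by element: keep values above 25, else use 0.
replaced = np.where(a > 25, a, 0)
print(replaced)  # [ 0  0 30 40 50]
```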

Array Math vs Python Loops

NumPy operations are implemented in C and operate on entire arrays at once. The difference matters enormously with large data:

# Python loop — slow for large arrays
a = list(range(1000000))
b = [x * 2 + 1 for x in a]

# NumPy — orders of magnitude faster
a = np.arange(1000000)
b = a * 2 + 1

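The NumPy version avoids Python’s per-element overhead: the loop runs in compiled C code, which can use SIMD instructions. Element-wise operations like this one generally run on a single core; multi-core speedups come mainly from linear algebra routines backed by a multithreaded BLAS library.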
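
To see the difference on your own machine, you can time both versions with the standard library's timeit module. This is a rough sketch rather than a rigorous benchmark; the helper names with_loop and with_numpy are made up here, and the numbers depend on your hardware and NumPy version:

```python
import timeit

import numpy as np

n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n)

def with_loop():
    # Pure Python: one interpreter step per element.
    return [x * 2 + 1 for x in py_list]

def with_numpy():
    # NumPy: the whole array is processed in compiled code.
    return np_array * 2 + 1

# Both approaches produce the same values.
same_result = with_loop() == with_numpy().tolist()
print(same_result)  # True

# Total time for 5 runs of each.
loop_time = timeit.timeit(with_loop, number=5)
numpy_time = timeit.timeit(with_numpy, number=5)
print(f"loop:  {loop_time:.3f} s")
print(f"numpy: {numpy_time:.3f} s")
```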

Installing and Importing

NumPy is a third-party package — install it with pip or conda:

pip install numpy

Or use it through Anaconda:

conda install numpy

In any Python script, import it as np — this is the universal convention:

import numpy as np

a = np.array([1, 2, 3])

Common Gotchas

Float precision in comparisons. Due to floating-point representation, exact equality can fail:

a = np.array([0.1 + 0.2])
print(a[0] == 0.3)  # False

# Use np.isclose() instead
print(np.isclose(a[0], 0.3))  # True
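
For whole arrays, np.isclose() compares element by element and np.allclose() reduces the result to a single True or False. A small sketch with made-up values:

```python
import numpy as np

computed = np.array([0.1 + 0.2, 0.1 * 3, 0.5 + 0.5])
expected = np.array([0.3, 0.3, 1.0])

# Exact comparison fails wherever rounding error crept in.
exact_match = (computed == expected).all()
print(exact_match)  # False

# np.isclose compares each pair within a tolerance.
print(np.isclose(computed, expected))  # [ True  True  True]

# np.allclose gives one answer for the whole array.
print(np.allclose(computed, expected))  # True
```

Both functions accept rtol and atol arguments if the default tolerances do not suit your data.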

Views vs copies. Basic slicing returns a view of the original data, not a copy, so writing through a slice modifies the original array:

a = np.array([1, 2, 3, 4])
b = a[:2]
b[0] = 99
print(a)  # [99  2  3  4] — a was modified!

Use .copy() to create an independent copy:

a = np.array([1, 2, 3, 4])
b = a[:2].copy()
b[0] = 99
print(a)  # [1 2 3 4] — a is unchanged
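
If you are unsure whether two arrays share data, np.shares_memory() can tell you. The sketch below compares a basic slice, an explicit copy, and integer-array indexing (which returns a copy); the variable names are illustrative:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

view = a[:2]                # basic slice: a view
independent = a[:2].copy()  # explicit copy
picked = a[[0, 1]]          # integer-array indexing: a copy

print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False
print(np.shares_memory(a, picked))       # False

# .base is the array a view borrows its data from.
print(view.base is a)  # True
```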

Array vs matrix. NumPy’s np.matrix class always treats arrays as 2D and uses * for matrix multiplication. The NumPy documentation no longer recommends np.matrix and notes that it may be removed in the future. Use regular ndarrays instead, where * is element-wise and @ (or np.matmul()) performs matrix multiplication.
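
A quick sketch of the difference between * and @ on ordinary ndarrays, using two small made-up matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

# On ndarrays, * multiplies matching elements.
elementwise = A * B
print(elementwise)
# [[0 2]
#  [3 0]]

# @ is matrix multiplication (equivalent to np.matmul(A, B)).
product = A @ B
print(product)
# [[2 1]
#  [4 3]]
```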