Array programming

1. Array programming#

NumPy is the fundamental package for scientific computing in Python. It provides a powerful multidimensional array, along with an assortment of fast routines for performing mathematical, logical, sorting, selecting, linear algebra, and statistical operations. A growing plethora of scientific Python packages use NumPy arrays. If you are going to work on signal/image processing, data analysis, or machine learning projects, it is nearly mandatory to have a solid understanding of NumPy.

Python: As a simple example, consider the case of multiplying each element in a 1-D sequence with the corresponding element in another sequence of the same length. If the data are stored in two Python lists, we could iterate over each element.

a = [1,2,3]
b = [4,5,6]
c = []
for i in range(len(a)):
    c.append(a[i] * b[i])
    
print(c)
[4, 10, 18]

This produces the correct answer, but if a and b each contain millions of numbers, we will pay the price for the inefficiencies of looping in Python. We could accomplish the same task much more quickly in C. This saves all the overhead involved in interpreting the Python code and manipulating Python objects, but at the expense of the benefits gained from coding in Python. NumPy offers a compromise between these two approaches.

Numpy: To start working with Numpy, we need to import the numpy package first.

import numpy as np

After the import, we can refer to NumPy’s functions through the prefix np. At the core of NumPy package is a powerful array that allows us to perform mathematical operations in a “element-by-element” fashion via pre-compiled C code. This gives us the best of both worlds: the simplicity of Python, with the efficiency of C. The code below does what the earlier example does, at near-C speed, but with the code simplicity we expect from something based on Python.

a = np.array([1,2,3])
b = np.array([4,5,6])
c = a * b
>>> c
array([4, 10, 18])

This example illustrates two of NumPy’s powerful features: vectorization and broadcasting.

  • Vectorization describes the absence of any explicit looping in the code. These things are taking place just “behind the scenes” in optimized, pre-compiled C code. Vectorized code has the advantage of being more concise and easier to read.

  • Broadcasting is the term used to describe the implicit element-by-element behavior of operations. Most NumPy operations behave in this way, i.e., they broadcast. This is a powerful mechanism that allows NumPy operations to work with arrays of “compatible” shapes.