v1 - Spring 2019: Dominic Carrano, Sukrit Arora, and Babak Ayazifar
This semester, we're adding a new component to EE 120: iPython assignments, or "labs". Really, though, they're just homework assignments.
To better prepare you for future endeavors in these areas, we want to give you exposure to real-world applications of the material via Python (specifically, the Python scientific computing stack of numpy, scipy, and matplotlib, all together in iPython notebooks), a popular tool in both industry and academia.
The labs showcase algorithms and topics that are not explicitly covered in lecture, but that are fully comprehensible using the knowledge you've gained from class. Typically, we provide a "Background" section (and/or information within the actual questions) at the start of the lab explaining the topic in sufficient detail for you to complete the lab. In addition, there will typically be a "References" section at the end containing relevant links for exploring the topic further, although these aren't required reading by any means.
We have invested much time and effort into making these labs interesting and application-driven, and hope that you find them both informative and fun. We welcome and encourage any feedback you have on them. Don't hesitate to reach out to a member of the course staff if you have suggestions for how they could be improved!
In terms of what we expect you to do with these: every place you're required to answer a question, whether in the form of writing code or interpreting plots/results that you generate, will be marked with "TODO".
When done, go to the notebook menu (under the Jupyter logo) and click File -> Download as -> PDF via LaTeX (.pdf). This is what you'll submit to Gradescope.
For this problem, you're not required to write any code except for one place where you tab-autocomplete a variable name. Simply run all the cells and acquaint yourself with the functions and features that they showcase.
Modified and adapted from Berkeley Python Bootcamp 2013, Python for Signal Processing, EE123, and EE126 iPython Notebook Tutorials.
The iPython notebook is divided into cells. Each cell can contain text, code, or HTML scripts. Running a non-code cell simply advances to the next cell. To run a code cell, press Shift-Enter or the play button in the toolbar above:
1+2
For debugging, we often want to interrupt the currently running process. This can be done by pressing the stop button. When a process is running, the circle in the upper right corner is filled. When idle, the circle is empty.
import time
while True:
    print("error")
    time.sleep(1)
Interrupting sometimes does not work. You can reset the state by restarting the kernel. This is done by clicking Kernel/Restart or the Refresh button in the toolbar above.
To save your notebook, either select "File->Save and Checkpoint" or hit Command-s on Mac / Ctrl-s on Windows.
To undo changes in each cell, hit Command-z on Mac / Ctrl-z on Windows.
To undo Delete Cell, select Edit->Undo Delete Cell.
One useful feature of iPython is tab completion:
one_plus_one = 1+1
# TODO type `one_` then hit TAB to auto-complete the variable
print(one_
Another useful feature is the help command. Typing any function followed by ? brings up a help window. Hit the x button to close it.
abs?
To insert a new cell, select "Insert->Insert New Cell Below" or click the white plus button.
Markdown cells are used for text. You can edit Markdown cells by double-clicking them.
Help->Keyboard Shortcuts has a list of keyboard shortcuts.
These are the libraries that we will be using in this class:
Numpy
NumPy is the fundamental package for scientific computing with Python.
Scipy
The SciPy library is a collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics and much more.
matplotlib
matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
numpy and matplotlib are both very systematic, in the sense that you're usually not calling one-off functions from them, but rather using them throughout your code, almost like a sort of sub-programming language inside Python. By the end of this semester, you'll be fairly well-seasoned in both. scipy, on the other hand, is more of a collection of various useful functions that you'll look up when you need to do something really specific that numpy doesn't have, like digital filter design (see EE 123 for more) or some more complicated algorithms like k-means clustering (as you saw in EE 16B).
To import a specific library x, simply type import x.
To access a library function f, type x.f.
If you want to change the library name to y, type import x as y.
# CONVENTION: "import numpy as np" when importing numpy, it lets us use "np" as shorthand!
# You could just as easily do "import numpy as derp", but then you'd have all this weird code
# like "derp.convolve(x, h, mode='full')" - stick to np.
import numpy as np
np.ones((3,1))
Unlike MATLAB, Python distinguishes between int and float. In Python 2, dividing two integers with / returns the floor; in Python 3, / always performs true division (use // if you want the floor). It is always good to check this when debugging!
1 / 4
1 / 4.0
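As a quick sketch of the Python 3 behavior described above (the specific values here are just illustrative):

```python
# In Python 3, / is always true division; // is floor division.
print(1 / 4)    # 0.25 (true division, even for two ints)
print(1 // 4)   # 0    (floor division)
print(1 / 4.0)  # 0.25 (floats behave the same under /)
```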
Unlike MATLAB/C, double quotes and single quotes are the same thing. Both represent strings. '+' concatenates strings.
# This is a comment
"EE " + '120'
A list is a mutable array of data. That is, we can change it after we create it. Lists can be created using square brackets [].
Important functions:
'+' appends lists. len(x) returns the length of x.
x = ["EE"] + [1, 2, 0]
print(x)
print(len(x))
A tuple is an immutable list. They can be created using round brackets (). They are usually used as inputs and outputs to functions.
t = ("E", "E") + (1, 2, 0)
print(t)
# cannot do assignment to a tuple after creation - it's immutable
t[4] = 3 # will error
# note: errors in ipython notebook appear inline
The numpy array, aka an "ndarray", is like a list with multidimensional support and more functions. This will be the primary data structure in our class.
Arithmetic operations on NumPy arrays correspond to elementwise operations.
Important NumPy Array functions:
.shape returns the dimensions of the array.
.ndim returns the number of dimensions.
.size returns the number of entries in the array.
len() returns the first dimension.
To use functions in NumPy, we have to import NumPy to our workspace. This is done by the command import numpy. By convention, we rename numpy as np for convenience.
x = np.array([[1, 2], [3 , 4]])
print(x)
x.shape # returns the dimensions of the numpy array
np.shape(x) # equivalent to x.shape
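To see the other attributes from the list above in action, here's a quick sketch using the same 2x2 array:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
print(x.ndim)   # number of dimensions: 2
print(x.size)   # total number of entries: 4
print(len(x))   # size of the first dimension: 2
```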
One major advantage of using numpy arrays is that arithmetic operations on numpy arrays correspond to elementwise operations.
print(x)
print()
print(x + 2) # numpy is smart and assumes you want this to be done to all elements!
You can use np.matrix
with the multiplication operator or np.dot
to do matrix multiplication.
print(np.matrix(x) * np.matrix(x))
print() # newline for formatting
# Or
print(np.dot(x,x))
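On Python 3.5 or newer, the @ operator is an equivalent way to write matrix multiplication on plain ndarrays, without wrapping them in np.matrix:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
print(x @ x)  # same result as np.dot(x, x)
```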
NumPy uses pass-by-reference semantics: slicing creates views into the existing array, without implicit copying. This is particularly helpful with very large arrays because copying can be slow.
x = np.array([1,2,3,4,5,6])
print(x)
We slice an array from a to b-1 with [a:b].
y = x[0:4]
print(y)
Because slicing does not copy the array, changing y changes x.
y[0] = 7
print(x)
print(y)
To actually copy x, we should use .copy().
x = np.array([1,2,3,4,5,6])
y = x.copy()
y[0] = 7
print(x)
print(y)
We use arange to create integer sequences in numpy arrays. It's exactly like the normal range function in Python, except that it automatically returns the result as a numpy array rather than a plain vanilla Python list.
arange(0,N) creates an array listing every integer from 0 to N-1.
arange(0,N,m) creates an array listing every m-th integer from 0 to N-1.
print(np.arange(-5,5)) # every integer from -5 ... 4
print(np.arange(0,5,2)) # every other integer from 0 ... 4
In this class, we will use matplotlib.pyplot to plot signals and images. By convention, we import matplotlib.pyplot as plt.
To display the plots inside the browser, we use the command %matplotlib inline - do not forget this line whenever you start a new notebook. We'll always include it for you, but in case your plots aren't showing up in the notebook when you're playing around on your own, it's probably because you forgot this. If you don't include it, Python will default to displaying plots in another window on your computer, which is normally fine, but here we need the plots in the notebook so they show up in your submission PDF.
import matplotlib.pyplot as plt # by convention, we import pyplot as plt
# plot in browser instead of opening new windows
%matplotlib inline
# Generate signals
x = np.arange(0, 1, 0.001)
y1 = np.exp(-x) # decaying exponential
y2 = np.sin(2 * np.pi * 10.0 * x)/4.0 + 0.5 # 10 Hz sine wave
plt.figure()
plt.plot(x, y1)
plt.show()
plt.figure()
plt.plot(x, y1)
plt.plot(x, y2)
plt.show()
# figsize is the dimensions of the figure:
# - the first argument is the width
# - the second is height
# it's useful when you want to adjust the figure's dimensions, e.g. you
# need a huge x-axis for data taken over a long time period
plt.figure(figsize=(16, 4))
plt.plot(x, y1)
# fancy formatting stuff - try playing with it!
plt.title("Decaying Exponential")
plt.xlabel("Time")
plt.ylabel("Amplitude")
plt.legend(["$e^{-x}$"]) # LaTeX fancy formatting
plt.xlim([0, 1]) # zoom in on x-axis
# asking plt for a new figure before plotting will put the next call to plt.plot
# on that new figure
plt.figure(figsize=(16, 4))
plt.plot(x, y2)
plt.title("10 Hz Sine Wave")
# ALWAYS make sure to call plt.show() *ONCE* after all your plotting code so your plots are displayed!
# You only need to call it once per code cell, even if you have multiple figures.
plt.show()
Make no mistake - the data points used for plotting on a computer are always discrete, but matplotlib's plt.plot() function interpolates between them, giving us the nice continuous-looking curves you see above.
# The figsize parameter can help you shape your figure
plt.figure(figsize=(10,7))
plt.plot(x, y1)
plt.plot(x, y2)
plt.xlabel("x axis")
plt.ylabel("y axis")
plt.title("Title")
# You can also change the legend font size by passing in the fontsize= parameter
plt.legend((r'$e^{-x}$', r'$\frac{\sin(2\pi \cdot 10\cdot x)}{4}+0.5$'), fontsize=14)
plt.show()
You can also specify more options in plot(), such as color and linewidth. You can also change the axis limits using plt.axis.
plt.figure(figsize=(10,7))
plt.plot(x, y1, ":r", linewidth=10)
plt.plot(x, y2, "--k")
plt.xlabel("x axis")
plt.ylabel("y axis")
plt.title("Title")
plt.legend((r'$e^{-x}$', r'$\frac{\sin(2\pi \cdot 10\cdot x)}{4}+0.5$'), fontsize=14)
# plt.axis takes in a list of the form [x_lower, x_upper, y_lower, y_upper]
plt.axis([0, 0.5, -0.5, 1.5])
plt.show()
# xkcd: the Comic sans of plot styles
with plt.xkcd():
    plt.figure()
    plt.plot(x, y1, 'b')
    plt.plot(x, y2, color='orange')
    plt.xlabel("x axis")
    plt.ylabel("y axis")
    plt.title("Title")
    plt.legend(("blue", "orange"))
    plt.show()
There are many other plotting functions. For example, we will use plt.imshow() for showing images.
image = np.outer(y1, y2) # plotting the outer product of y1 and y2
plt.figure()
plt.imshow(image)
Similarly, we use plt.stem() for plotting discretized signals (technically, on a computer, everything is discretized, but it's often the result of sampling something continuous, in which case it's often more informative to plot it as a continuous signal with plt.plot()).
# decaying exponential
n = np.arange(0, 10)
signal = np.exp(-n / 2)
# stem plot on an 8-by-4 inch figure
plt.figure(figsize=(8, 4))
plt.stem(n, signal)
# zoom out a little on both axes to get cleaner looking plot
plt.xlim([-.5, 9.5])
plt.ylim([0, 1.2])
plt.show()
As you know, LTI systems act by convolving an input with the system's impulse response. This is why LTI systems are so nice - an LTI system is completely characterized by its impulse response, which is typically measurable to a high degree of precision in practice - just send in an impulse!
First, we'll get acquainted with performing convolution in numpy, using some familiar signals.
The numpy function for convolution is np.convolve, which takes in three arguments: two numpy arrays (your signals) $x(n)$, $h(n)$ and a string, mode, telling the function how to treat the boundaries. It returns $y(n) = (x * h)(n)$, with some subtle differences in how the boundaries are handled based on what you pass in for mode.
full, same, valid
There are three different options for mode, and understanding the differences is important:
- full: Returns the "entire" convolution; this is exactly what you've seen in class. The resultant signal's length will be the sum of the two input signals' lengths, minus 1 (Discussion 1, problem 3 anyone?).
- same: Truncates the convolution (i.e., cuts off the extra) to be of length max(len(x), len(h)). This is often used when doing analysis with measured signals, as the measurements will have a lot of zeros padding the start/end of the signals, so this truncation doesn't actually remove any information. Additionally, if len(x) == len(h), then the convolution (with "same") will also have that length. This is what the name same is meant to suggest: the output is the same length as the (longer) input.
- valid: Returns the part of the convolution where the signals overlap completely. For example, when doing an $N$-point moving average of a signal, your output isn't really meaningful until you've hit the first $N$ data points, and isn't meaningful after you've hit the last $N$ of them.

In practice, (2), with enough zero padding that the result doesn't have any nonzero signal values cut out, is most common, but (1) and (3) do come up.
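A quick way to internalize the length conventions is to check them directly. This sketch (with arbitrary small signals) convolves two arrays and verifies the output length for each mode:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
h = np.array([1.0, 1.0, 1.0])

y_full = np.convolve(x, h, mode='full')    # length len(x) + len(h) - 1 = 7
y_same = np.convolve(x, h, mode='same')    # length max(len(x), len(h)) = 5
y_valid = np.convolve(x, h, mode='valid')  # length where they fully overlap = 3

print(len(y_full), len(y_same), len(y_valid))  # 7 5 3
```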
Here's an example of performing the convolution of $x[n] = \frac{1}{2}\delta[n] + \frac{1}{2}\delta[n - 1]$ with $h[n] = 3\delta[n+1]$. You should try doing it by hand and see if your results match Python's!
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Signals
# Time vals: -4 -3 -2 -1 0 1 2 3 4
x = np.array([0, 0, 0, 0, .5, .5, 0, 0, 0]) # .5delta(n) + .5delta(n - 1)
h = np.array([0, 0, 0, 3, 0, 0, 0, 0, 0]) # 3delta(n + 1)
# Time indices; the "full" result is longer, so it gets a wider range
n_same = np.arange(-4, 5)
n_full = np.arange(-8, 9)
# Convolve the signals; get the "full" and "same" convolutions
y_same = np.convolve(x, h, 'same')
y_full = np.convolve(x, h, 'full')
# Plot the results on a 2x2 grid with the first row for the signals
# and the second row for the convolution results
plt.figure(figsize=(20, 8))
ylimits = [0, 3.5]
plt.subplot(2, 2, 1)
plt.stem(n_same, x, linefmt='b-', markerfmt='bo') # make first signal blue
plt.title("Signal 1: $x[n]$")
plt.ylim(ylimits)
plt.subplot(2, 2, 2)
plt.stem(n_same, h, linefmt='g-', markerfmt='go') # make second signal green
plt.title("Signal 2: $h[n]$")
plt.ylim(ylimits)
plt.subplot(2, 2, 3)
plt.stem(n_full, y_full, linefmt='r-', markerfmt='ro') # convolution results are red
plt.title("\"Full\" Convolution Result: $(x * h)[n]$")
plt.ylim(ylimits)
plt.subplot(2, 2, 4)
plt.stem(n_same, y_same, linefmt='r-', markerfmt='ro') # convolution results are red
plt.title("\"Same\" Convolution Result: $(x * h)[n]$")
plt.ylim(ylimits)
plt.show()
Observations:
- The convolution result has length len(x) + len(h) - 1 = 2 + 1 - 1 = 2, which fits within the original time indices, and so we see the same results for "full" as "same" (just with more padding on "full", since "same" cut some of it out).
- We had to manually define the time index arrays n_same and n_full so that the results showed up as expected in our plots. This problem of tracking the correct time indices can indeed be annoying when dealing with convolution on a computer, since data is often given without an associated range of time or even a notion of "zero". In practice, you must interpret what "zero" is based on the context.

Now, run the next cell to see a case where the outputs are different. It's a classic - the convolution of two boxes, which produces a triangle.
# Signals
# Time vals: -4 -3 -2 -1 0 1 2 3 4
box1 = np.array([0, 1, 1, 1, 1, 1, 1, 1, 0])
box2 = np.array([0, 1, 1, 1, 1, 1, 1, 1, 0])
# Time indices; the "full" result is longer, so it gets a wider range
n_same = np.arange(-4, 5)
n_full = np.arange(-8, 9)
# Convolve the signals; get the "full" and "same" convolutions
tri_same = np.convolve(box1, box2, 'same')
tri_full = np.convolve(box1, box2, 'full')
# Plot the results on a 2x2 grid with the first row for the signals
# and the second row for the convolution results
plt.figure(figsize=(20, 8))
ylimits = [0, 8]
plt.subplot(2, 2, 1)
plt.stem(n_same, box1, linefmt='b-', markerfmt='bo') # make first signal blue
plt.title("Signal 1: Length 9 Rectangular Box")
plt.ylim(ylimits)
plt.subplot(2, 2, 2)
plt.stem(n_same, box2, linefmt='g-', markerfmt='go') # make second signal green
plt.title("Signal 2: Length 9 Rectangular Box")
plt.ylim(ylimits)
plt.subplot(2, 2, 3)
plt.stem(n_full, tri_full, linefmt='r-', markerfmt='ro') # convolution results are red
plt.title("\"Full\" Convolution Result: Triangle")
plt.ylim(ylimits)
plt.subplot(2, 2, 4)
plt.stem(n_same, tri_same, linefmt='r-', markerfmt='ro') # convolution results are red
plt.title("\"Same\" Convolution Result: *Partially cut-off* Triangle")
plt.ylim(ylimits)
plt.xlim([-4.5, 4.5])
plt.show()
Observations:
- same works by chopping down the convolved result to the length of the longer of the two inputs. Here, our two input signals are the same length, 9, so our output from "same" is also length 9, and we get some of the resultant signal cut out.
- To see the entire result, we passed full for the mode argument. This is the way to go if you don't care about the output having the same length as the inputs.
- In an ideal world, we wouldn't need mode at all, and we could simply carry out full convolution for everything. Alas, we have finite memory to work with.

Choosing mode
We don't mean to turn this into a bigger problem than it should be. In practice, if you have to convolve on a computer, you'll pick one of the two approaches listed above and probably not have to think about it again. Approach (2) is the most common in practice.
We want you to be aware of these subtle differences, as they can be confusing at first. It's worth paying attention to such nuances so you don't waste time trying to fix some boundary issue.
Now that you've seen some examples, you'll get to compute a convolution on your own.
First, using $\{0, 1, 2, ..., 9, 10\}$ as your time indices, create two signals, $x_1$ and $x_2$:
The signal $x_1$ is a right-sided decaying exponential with rate 0.8. That is,
$$x_1[n] = \begin{cases}e^{-0.8n},& n \geq 0 \\ 0,& n < 0\end{cases}$$
Keep in mind that this signal is actually of infinite duration, but we're truncating it at $n=10$ (for a total length of 11). The function np.exp, which computes the exponential function, will be useful for this.
The signal $x_2$ is a six-point weighted sum of deltas. Specifically,
$$x_2[n] = 0.1\delta[n] + 0.7\delta[n - 1] + 1.1\delta[n - 2] + 1.3\delta[n - 3] + 0.9\delta[n - 4] + 0.3\delta[n - 5]$$
Fill in the requested lines, and run the cells to plot your signals and verify they're correct. The first should decay to values that look like zero after roughly $n=6$; if you're unsure of the second, try plotting it by hand.
# Useful definitions
n = np.arange(11)
rate = .8
# Your signals!
x1 = # TODO define me as the decaying exponential with "rate" as my rate
x2 = # TODO define me as the signal x2 from above
# Sanity check
plt.figure(figsize=(16, 4))
plt.subplot(1, 2, 1)
plt.stem(n, x1)
plt.xlim([-.5, 10.5])
plt.ylim([0, 1.5])
plt.title("Decaying Exponential with Rate {0}".format(rate))
plt.subplot(1, 2, 2)
plt.stem(n, x2)
plt.xlim([-.5, 10.5])
plt.ylim([0, 1.5])
plt.title("Six-Point Weighted Sum of Deltas")
plt.show()
Signals looking good? Nice. Now, finish the next cell by adding code to convolve the signals using "full"; generate the appropriate time indices; and plot the convolved signal using your generated time indices.
n = # TODO add time indices here
y = # TODO make y the convolution of x1, x2 in "full" mode
# TODO add code to plot the signal y as a lollipop plot
Comments: Note how the convolution "stretched" out the decaying exponential! In practice, we often don't have access to a closed-form expression describing our impulse response. Instead, we have a file with a measured version of our impulse response, which we can use for simulating the action of an LTI system that we want to study.
For example, suppose you are in the business of obtaining high-precision measurements. In theory, the perfect measurement system (suppose it's an LTI system) would have a delta as its impulse response, since convolving with a delta does nothing. In practice, you may be using an oscilloscope that has a delta-like impulse response, such as $x_2$. To study what features of a signal of interest are modified by the imperfections, you can model the oscilloscope as an LTI system, record its impulse response, and then simulate the measurement process in Python by convolving the signal of interest with your measured impulse response, studying all the relevant details offline.
This provides a nice interpretation of this question in the context of taking measurements — an oscilloscope doesn't have a perfectly narrow delta as its impulse response, it has something more like $x_2$, so whenever we measure something, we're really getting a more "stretched out" version of it! In some cases, this idea of convolution as "smearing" one signal across time with another is a very useful interpretation.
Now that you're acquainted with Python and numpy/matplotlib, as well as computing convolution in numpy, we can put our skills to doing some more interesting things!
In this question, we'll explore two classic filters: the 1D edge detector, and the moving average. You've seen both of these on HW 1, so in this problem, we'll move beyond mere conceptual understanding and focus on exploring what they do in the context of actual data, especially data that is cumbersome to analyze with pencil and paper. These are both LTI systems, so we'll speak of them in terms of their impulse responses.
The 1D edge detector is used in signal processing to, as you may have guessed from the name, detect edges, or jumps in a signal's amplitude. It's referred to as 1D to distinguish it from the 2D edge-detecting filters used in image processing: basic signals containing only amplitude-versus-time information are referred to as "1D", whereas images are often considered "2D" signals. Other than the Deconvolution lab, where we'll briefly touch on some basic image processing, all signals you'll see in EE 120 will be 1D.
The impulse response of a 1D edge detector is defined as:
$$h_{ED}[n] = \delta[n] - \delta[n - 1]$$
The filter works by taking the difference between every pair of adjacent signal values. If you have a sequence of constant values, the filter will repeatedly output zeros, as there's no "edge".
Similarly, if the signal has a zero (at time $n-1$) followed by a 1 (at time $n$), the filter will output a 1 (at time $n$), indicating an edge of "size" 1. The filter also encodes information about the edge's "direction" - if you instead had a 1 (at time $n-1$) followed by a 0 (at time $n$), the output (at time $n$) would be -1.
Because of this, the 1D edge detector can be thought of as the discrete-time equivalent of taking a derivative.
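As a tiny sanity check of this "discrete derivative" interpretation, here's a sketch (with a made-up step signal) convolving $h_{ED}$ with a step: the output is zero everywhere except at the jump.

```python
import numpy as np

step = np.array([0, 0, 1, 1, 1])  # a unit step whose jump is at index 2
h_ed = np.array([1, -1])          # delta[n] - delta[n - 1]

y = np.convolve(step, h_ed, mode='full')
# The lone 1 marks the rising edge; the trailing -1 is just a boundary
# artifact from truncating the step at the end of the array.
print(y)  # [ 0  0  1  0  0 -1]
```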
We'll go through two examples with this filter. First, we'll try it out on a piecewise constant signal. We call a DT signal "piecewise constant" if it consists only of constant-height segments each spread over more than one sample.
In the cell below, using the time indices $n \in \{-10, -9, ..., -1, 0, 1, ..., 9, 10\}$, define your 1D edge detector's impulse response as ED. Additionally, define the piecewise constant signal $x$ as follows:
$$x[n] = \sum_{k = -5}^{-1} \delta[n - k] + 3\sum_{k = 0}^{3} \delta[n - k] + 2\sum_{k = 4}^{8} \delta[n - k]$$
Finally, apply your edge detector ED to $x$ by convolving them, using the "same" mode (note we have ample zero padding here, so this is fine). Plotting code has been provided for you. Simply run the cell when done. We'll call the filtered result y.
Make sure to answer the questions below the cell. You can do this by double-clicking the cell with text in it, and typing in your answer. Press Shift+Enter
while focused on that cell after you're done typing to return it to normal.
# TODO SIGNAL AND TIME INDICES DEFINITIONS HERE
n = # time indices
ED = # edge detector impulse response
x = # test signal
# TODO FILTERING CODE HERE
y = # filtered version of x
# Plot results
plt.figure(figsize=(16, 8))
plt.subplot(2, 1, 1)
plt.stem(n, x)
plt.ylim([0, 3.5])
plt.title("Piecewise Constant Signal $x[n]$")
plt.subplot(2, 1, 2)
plt.stem(n, y)
plt.ylim([-2.5, 2.5])
plt.title("1D Edge Detector Output for $x[n]$")
plt.show()
Question 1a: $x$ is a piecewise-constant signal. How many times does the actual constant change? Don't count the change from not existing to zero at the start (or at the end).
A: (TODO)
Question 1b: How many points of the edge detector's output are nonzero (that is, how many edges are detected)?
A: (TODO)
Question 1c (OPTIONAL): One of the hottest areas of signal processing in the past ~15 years has been the study of sparse (mostly zero) signals, including both acquisition and representation. Suppose you have a piecewise-constant signal, which you want to sparsify (i.e., make mostly zero) into another signal, from which you can still recover the original. How would you do so by only saving one of the original signal values and applying an LTI filter? You can ignore noise that would be present for real world signals — assume the signal truly is piecewise-constant like the ones above.
A: (TODO)
Now that we've tried our edge detector on a piecewise constant signal, let's try it on a less well-behaved signal: some noise! In this part, as well as the moving average filter, we'll use what's commonly referred to as "Gaussian noise" as our test signal.
In many real-world applications, when using some data that has been collected (i.e., a signal), you don't have access to the original signal (call it $x$). Instead, you have access to $\tilde{x} = x + z$, where $z$ is "noise" that corrupts the signal. A large part of signal processing, machine learning, statistics, data science, and many other application areas involves extracting as much meaning as possible from the measured signal $\tilde{x}$. A popular choice for modeling $z$, which we'll explore more in the lab on Orthogonal Signaling later in the semester, is what's known as "Gaussian noise". The name comes from the fact that the noise itself is random, and so it has to be drawn from some probability distribution, which in this case is the Gaussian distribution.
Don't worry about the specifics of this — you don't need to know probability theory for this class. But due to the popularity of the Gaussian noise model in signal processing, it's worth experimenting with as a test signal.
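To make the $\tilde{x} = x + z$ model concrete, here's a minimal sketch; the clean signal, the noise standard deviation of 0.3, and the variable names are all just illustrative choices (np.random.normal takes the mean, the standard deviation, and the number of samples):

```python
import numpy as np

n = np.arange(0, 100)
x = np.sin(2 * np.pi * n / 20)        # a clean signal of interest
z = np.random.normal(0, 0.3, len(x))  # Gaussian noise: mean 0, std 0.3
x_tilde = x + z                       # the noisy measurement we actually observe
```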
We'll use the time indices $\{-100, ..., -1, 0, 1, ..., 100\}$. The signal noise is defined for you already for these indices. Fill in the code to generate the time indices, regenerate your new impulse response for these time indices, and finally apply your edge detector (again, use the "same" mode for convolution; don't worry about any padding issues). Run the cell when done to plot the result.
Hint: You should use numpy functions, rather than writing out the 201 values by hand, for defining the edge detector impulse response. The functions np.ones, np.zeros, and np.concatenate may be of use.
We'll plot in continuous time as it better illustrates the concept.
noise = np.random.normal(0, 1, 201)
# TODO: TIME INDICES
n =
# TODO: NEW IMPULSE RESPONSE
ED =
# TODO: FILTERING CODE
noise_filt =
# Plot results
plt.figure(figsize=(12, 5))
plt.plot(n, noise)
plt.plot(n, noise_filt)
plt.xlim([n[0], n[-1]]) # n[-1] is the last element of n
plt.ylim([-5, 5])
plt.ylabel("Amplitude")
plt.legend(("Gaussian Noise", "1D Edge Detector Output for Gaussian Noise"))
plt.show()
Question: Qualitatively, does the filter amplify or suppress the "strength" of the noise? What implications might this have if we want to detect edges on a signal that's noisy?
A: (TODO)
The moving average filter is specified in terms of a single integer parameter, $L$, which represents the length of the filter. The filter takes the average of the $L$ points before, and including, the current point of a signal, outputting that average. Formally, the filter's impulse response is:
$$h_{MA}[n] = \frac{\delta[n] + \delta[n - 1]\ +\ ...\ +\ \delta[n - (L - 1)]}{L}$$
It is very popular in signal processing and time series analysis for smoothing data. For example, if you have a signal that is "spiky" — that is, it jumps around a lot (in the real world, this often arises due to noise), a moving average filter will smooth out the data. Whenever you need a quick and easy way to implement a filter to smooth your data out, and you don't need anything super fancy (and in most cases, you don't), a moving average is a great choice.
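As a small sketch of the definition above (with a made-up input signal), an $L = 3$ moving average is just convolution with a box of height $1/L$; in "valid" mode, each output point is exactly the average of 3 input points:

```python
import numpy as np

L = 3
h_ma = np.ones(L) / L  # box of height 1/L: the moving average impulse response
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Each output point is the average of 3 consecutive input points
y = np.convolve(x, h_ma, mode='valid')
print(y)  # approximately [1. 2. 3.]
```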
Here, we'll analyze one of the most common datasets in time series analysis: stock price data! Run the cell below to load it. We will be taking a look at Apple's stock data over the past year and a half.
Note: We got this data from Yahoo! Finance. If you are interested in playing around with stock data on your own, you can click on a stock, navigate to the historical data tab, download a CSV file, and use our code below to parse it. Acquiring data often plays second fiddle to all the fancy algorithms used on it, despite being just as important a part of any engineering field that relies on it. Here, however, we want to focus on the algorithms, not the minutiae of Yahoo! CSV file formats, which is why we provide the code to read the data in.
# CSV hocus pocus
import csv
stock_dates = []
stock_prices = []
with open('AAPL.csv', mode='r') as raw_data:
    csv_reader = csv.DictReader(raw_data)
    for row in csv_reader:
        data = row['Close']
        if data != 'null':
            stock_prices.append(float(data))
            stock_dates.append(row['Date'])
stock_prices = np.array(stock_prices)
# 400 most recent days
start = -400
end = -1
x = np.arange(len(stock_prices[start:end]))
# Plot roughly one data point per week (x[::7]) so we can display dates
# without matplotlib going crazy about the labels overlapping
plt.figure(figsize=(20, 5))
plt.xticks(x[::7], stock_dates[start:end:7], fontsize=10, rotation=75)
plt.plot(x, stock_prices[start:end])
plt.title("AAPL Daily Closing Prices, June 2017 to January 2019")
plt.ylabel("US Dollars")
plt.show()
Fill in the missing parts of the cell below to filter the stock price data with moving average filters of length 5, 25, and 75, and be sure to answer the question about interpreting the results below the cell.
Fill in the cell below to define these impulse responses as MA5, MA25, and MA75 respectively, and filter your test data (data) with each of them using convolution with the "same" mode to avoid any irrelevant issues with signal length. This means you only have to define your impulse response where it's nonzero, rather than padding it with zeros like we did in previous examples. We'll call the outputs y5, y25, and y75, respectively.
Hint: Don't forget to divide by $L$ when defining your impulse responses, or your results will be scaled incorrectly! (This won't change the general shape; it will only scale the signal's amplitude, which could be a serious problem in a real-world context: if you predict some stock price and you're off by a factor of 75, that's a big issue!)
Plotting code has been provided for you. To generate the results, simply run the cell after adding your own code.
data = stock_prices[start:end]
## TODO IMPULSE RESPONSE DEFINITIONS HERE
MA5 =
MA25 =
MA75 =
## TODO FILTERING CODE HERE
y5 =
y25 =
y75 =
# Overlay stock prices
plt.figure(figsize=(16, 4))
plt.plot(np.arange(len(data)), data)
plt.plot(np.arange(len(data)), y5)
plt.plot(np.arange(len(data)), y25)
plt.plot(np.arange(len(data)), y75)
# Formatting mumbo jumbo to abstract away boundary issues
plt.xlim([40, 360])
plt.ylim([140, 240])
plt.legend(('Original Data', '5-Point Average', '25-Point Average', '75-Point Average'), bbox_to_anchor=(1.1, 1.05))
plt.ylabel("US Dollars")
plt.xlabel("Days Since June 23, 2017")
plt.title("Moving Averaged AAPL Stock Prices")
plt.show()
Question: As we increase $L$, does the filter smooth the signal out more, or less? Explain in terms of both the filter's impulse response and the plot.
A: (TODO)
The Moving Average Convergence Divergence (MACD) Indicator is a trend-following momentum indicator that shows the relationship between two moving averages of a stock’s price.
To calculate the MACD of a stock, we first have to understand a new kind of moving average, called the Exponential Moving Average (EMA). What we did in Question 3 is known as a Simple Moving Average (SMA): all points are given equal weight. An EMA, on the other hand, places a greater weight, and therefore significance, on the most recent data points. The benefit of the EMA over the SMA is that the EMA reacts faster to recent price changes. Let's take a look at how the EMA works.
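To see this faster reaction concretely, here's a toy comparison on a hypothetical step change in price. The EMA weights here use one common choice, $\alpha = 2/(L+1)$, which this lab adopts below; the data is made up purely for illustration.

```python
import numpy as np

L = 10
alpha = 2 / (L + 1)
sma = np.ones(L) / L                       # SMA: equal weights
ema = alpha * (1 - alpha) ** np.arange(L)  # EMA: exponentially decaying weights
ema = ema / ema.sum()                      # renormalize the truncated EMA

# Price jumps from 100 to 110 at n = 50
x = np.concatenate([np.full(50, 100.0), np.full(50, 110.0)])
y_sma = np.convolve(x, sma, mode='valid')
y_ema = np.convolve(x, ema, mode='valid')

# At the 'valid' output whose window ends right at the jump, the EMA has
# already moved farther toward the new price than the SMA
i = 50 - (L - 1)
assert y_ema[i] > y_sma[i] > 100.0
```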
We can express the EMA using the following LCCDE:
$$y[n]=\alpha\cdot x[n] + (1-\alpha)\cdot y[n-1]$$
where $y[n]$ is the EMA at day $n$, and $x[n]$ is the stock price at day $n$. Now, why is this an "exponential" moving average? It might not be obvious in this form, because the LCCDE is recursive.
Let's write it out step by step to unravel the recursion. Assuming that $y[n]$ is zero $\forall n<0$,
\begin{align} y[0] &= \alpha\cdot x[0] + (1-\alpha)\cdot y[-1] = \alpha\cdot x[0] \\ y[1] &= \alpha\cdot x[1] + (1-\alpha)\cdot y[0] = \alpha\cdot x[1] + (1-\alpha)\cdot \alpha\cdot x[0] \\ y[2] &= \alpha\cdot x[2] + (1-\alpha)\cdot y[1] = \alpha\cdot x[2] + (1-\alpha)\cdot (\alpha\cdot x[1] + (1-\alpha)\cdot \alpha\cdot x[0]) = \alpha\cdot x[2] + (1-\alpha)\cdot \alpha\cdot x[1] + (1-\alpha)^2\cdot \alpha\cdot x[0] \\ &\vdots \\ y[n] &= \alpha\sum_{k=0}^{n}(1-\alpha)^k\cdot x[n-k] \end{align}
Ah! Now that we have rewritten it, it's clear why we call this an EMA: each previous data point is weighted by an exponentially smaller value.
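As a quick numerical sanity check (toy data, $\alpha$ chosen arbitrarily), the recursive LCCDE and the unrolled closed-form sum produce identical outputs:

```python
import numpy as np

alpha = 0.4
x = np.array([3.0, 1.0, 4.0, 1.5, 5.0])

# Recursive form: y[n] = alpha*x[n] + (1 - alpha)*y[n-1], with y[n] = 0 for n < 0
y_rec = np.zeros_like(x)
prev = 0.0
for n in range(len(x)):
    prev = alpha * x[n] + (1 - alpha) * prev
    y_rec[n] = prev

# Closed form: y[n] = alpha * sum_{k=0}^{n} (1 - alpha)^k x[n-k]
y_sum = np.array([alpha * sum((1 - alpha)**k * x[n - k] for k in range(n + 1))
                  for n in range(len(x))])

assert np.allclose(y_rec, y_sum)
```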
Now, let's find the impulse response of this system by applying a Kronecker delta to our system:
$$h[n]=\alpha\sum_{k=0}^{n}(1-\alpha)^k\cdot \delta[n-k]=\alpha\cdot(1-\alpha)^n$$
Plotting this with a length of $5$, we can see the exponential shape emerge!
def ema_filter(length):
    alpha = 2/(length+1)
    # TODO: Create and return an EMA filter with length "length"
    print("Someone forgot to implement me!")  # TODO: delete this print when done
h = ema_filter(5)
plt.figure()
plt.xlim([-.5, 4.5])
plt.stem(h)
plt.show()
Two quick things to note:
1. The LCCDE we originally gave describes an IIR filter, but we are using an FIR filter; after all, we can't store infinitely many values on a computer. To account for the truncation, we renormalize so that the sum of the impulse response's coefficients, known as the DC gain, is 1.
2. The value of $\alpha$ we have picked is motivated by reducing the output noise variance. We have provided a reference below if you wish to read more about this.
We can now convolve data with the impulse response to calculate the EMA!
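As a concrete illustration of the renormalization note above (this sketch parallels, but is not a substitute for, the `ema_filter` you implement in this lab): a truncated, renormalized EMA has DC gain 1, so convolving a constant signal with it returns that same constant away from the boundaries.

```python
import numpy as np

length = 5
alpha = 2 / (length + 1)
h = alpha * (1 - alpha) ** np.arange(length)  # truncated IIR impulse response
h = h / h.sum()                               # renormalize: DC gain is now 1

# A constant "price" passes through unchanged (away from the boundaries)
x = np.full(30, 50.0)
y = np.convolve(x, h, mode='same')
assert np.isclose(h.sum(), 1.0)
assert np.allclose(y[10:20], 50.0)
```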
The MACD line is calculated by subtracting the stock's 26-day EMA from its 12-day EMA. Let's try calculating the MACD line for some sample stock data.
# MACD Line Calculation
data = stock_prices[start:end]
# Create the filters of the correct lengths using the function you wrote above
EMA26 = ema_filter(26)
EMA12 = ema_filter(12)
l_26 = len(EMA26)
l_12 = len(EMA12)
# Slice each full-mode convolution so both EMAs align over the same date range
MACD = np.convolve(data, EMA12, mode='full')[l_26:-l_12]-np.convolve(data, EMA26, mode='full')[l_26:-l_26]
# Plotting Code
x = np.arange(len(MACD))
plt.figure(figsize=(16, 6))
plt.xticks(x[::15], stock_dates[start+l_26:end:15], fontsize=10, rotation=75)
plt.plot(MACD, label="MACD")
plt.legend()
plt.show()
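As a sanity check on the 12-minus-26 ordering (toy data; the kernel below is a sketch parallel to, not a replacement for, the lab's `ema_filter`): for a steadily rising price, the faster 12-day EMA sits above the slower 26-day EMA, so the MACD line is positive throughout.

```python
import numpy as np

def ema_kernel(length):
    # Truncated, renormalized EMA weights (illustrative sketch)
    alpha = 2 / (length + 1)
    h = alpha * (1 - alpha) ** np.arange(length)
    return h / h.sum()

price = np.linspace(100, 200, 300)   # steadily rising toy price
ema12 = np.convolve(price, ema_kernel(12), mode='valid')
ema26 = np.convolve(price, ema_kernel(26), mode='valid')

# Align both EMAs at the end of the price series before subtracting
n = min(len(ema12), len(ema26))
macd = ema12[-n:] - ema26[-n:]       # 12-day EMA minus 26-day EMA

assert np.all(macd > 0)              # uptrend: the faster EMA leads
```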
Now that we have the MACD line, we take its 9-day EMA to create the signal line. Equipped with both the MACD line and the signal line, we are ready to analyze the stock data.
# Signal line calculation and plotting alongside the MACD line
EMA9 = ema_filter(9)
signal = np.convolve(MACD, EMA9, mode='same')
# Plotting Code
colors = ['red' if s > m else 'green' for s, m in zip(signal, MACD)]
x = np.arange(len(signal))
plt.figure(figsize=(16,10))
plt.subplot(2,1,1)
plt.title("Stock Price")
plt.plot(data)
plt.ylabel("Price")
plt.subplot(2,1,2)
plt.title("MACD, Signal, and Histogram")
plt.xticks(x[::15], stock_dates[start+l_26:end:15], fontsize=10, rotation=75)
plt.plot(MACD, label='MACD')
plt.plot(signal, label='Signal')
plt.bar(range(len(signal)),(MACD-signal), color=colors, label="Difference Histogram")
plt.legend()
plt.show()
In signal processing terms, the MACD is a filtered measure of velocity. The velocity has been passed through two first-order linear low-pass filters. The signal line is that resulting velocity, filtered again. The difference between those two, the histogram, is a measure of the acceleration, with all three filters applied. A MACD crossover of the signal line indicates that the direction of the acceleration is changing. The MACD line crossing zero suggests that the average velocity is changing direction.
Question: Using this analogy of velocity and acceleration, we can think of a stock's price as the position of a car. Under this assumption, describe the plot above. Specifically:
**Answer:** (TODO)
- Position: (TODO)
- Velocity: (TODO)
- Acceleration: (TODO)
- Date Range + Explanation: (TODO)
- How to invest (OPTIONAL): (TODO)
Special thanks to the Berkeley Python Bootcamp 2013, Python for Signal Processing, EE123, and EECS126 for providing a great starting point for Q1, Introduction to the Python scientific computing stack.