Data Classes
Last week, we were introduced to the notion of data types. Recall that a “data type” can be thought of as the category (or type) of a piece of data, e.g. integer, float, character, etc.
In Python, however, we often need to aggregate data into larger structures, often referred to as data classes.
Lists
Perhaps the most fundamental data structure in Python is the list. Just like lists in real life or in mathematics, Python lists are just collections of items enclosed in square brackets:
[<item 1>, <item 2>, ..., <item n>]
Again, the items in a list can be of any data type; we can even mix and match data types!
Just as we were able to use a Python function (type()) to check the type of a particular piece of data, we can also use Python to check the structure or class of a piece of data. It turns out that we use the same function as before, namely type()!
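As a small sketch of the two ideas above (lists can mix data types, and type() reports the class of a structure), consider the following. The variable names fruits and mixed are made up for illustration.

```python
# A list of strings, and a list that mixes several data types.
fruits = ["apple", "banana", "cherry"]
mixed = [1, 2.5, "three", True]

# type() reports the class of the whole structure...
print(type(fruits))   # <class 'list'>
print(type(mixed))    # <class 'list'>

# ...just as it reports the data type of a single value.
print(type(2.5))      # <class 'float'>
```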
Indexing
Alright, now that we can store data in lists, how can we access elements in a list? The answer is to use what is known as indexing.
Given a list x, we access the ith element using the code
x[i]
We call this “indexing” because the number that goes between the brackets is the index of the element that we want.
What does this mean? Well, let’s see by way of an example.
So, what we would colloquially call the first element of a list, Python calls the zeroth element.
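Here is one way to see zero-based indexing in action. The list seasons is just an illustrative example.

```python
seasons = ["Spring", "Summer", "Autumn", "Winter"]

print(seasons[0])    # Spring  (index 0 is the first element)
print(seasons[1])    # Summer
print(seasons[3])    # Winter  (the last of four elements has index 3)

# len() counts the elements, so the valid indices run from 0 to len(seasons) - 1.
print(len(seasons))  # 4
```

Asking for seasons[4] would raise an IndexError, since there is no element with index 4.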
Alright, let’s put together some of the concepts we just learned.
Tables
Another very useful data structure in Python is the table. Python tables behave pretty much the same as the tables we’ve used in, say, math: they are a grid of values arranged in rows and columns.
Tables can be created using the Table() function in Python, which itself comes from the datascience module. The general syntax of creating a table with the Table() function is:
Table().with_columns("<col 1 name>", [<col 1, val 1>, <col 1, val 2>, ... ],
"<col 2 name>", [<col 2, val 1>, <col 2, val 2>, ... ],
... )
For example,
Table().with_columns("Name", ["Ethan", "Morgan", "Amy"],
"ID", [12345, 10394, 20343],
"Office", ["South Hall", "South Hall", "North Hall"]
)
Name | ID | Office |
---|---|---|
Ethan | 12345 | South Hall |
Morgan | 10394 | South Hall |
Amy | 20343 | North Hall |
There is nothing stopping us from assigning a table to a variable! For example, after running
table1 = Table().with_columns(
    "Name", ["Ethan", "Morgan", "Amy"],
    "ID", [12345, 10394, 20343],
    "Fav_Drink", ["Iced Tea", "Coffee", "Sprite"]
)
the variable table1 holds the resulting table, which we can display by evaluating its name:
table1
Name | ID | Fav_Drink |
---|---|---|
Ethan | 12345 | Iced Tea |
Morgan | 10394 | Coffee |
Amy | 20343 | Sprite |
The datascience module contains a plethora of methods we can use to manage tables. For example, the select() method can be used to select columns by name:
table1.select("ID")
ID |
---|
12345 |
10394 |
20343 |
Suppose we want to select rows of a table that satisfy a given condition. For example, if we wanted to find the information of only the people who like Sprite in the table1 table above, we would call
table1.where("Fav_Drink", "Sprite")
Name | ID | Fav_Drink |
---|---|---|
Amy | 20343 | Sprite |
What would happen if we tried to select the rows of table1 with Coke in the Fav_Drink column? Well, since there is nobody in table1 that has Coke as their favorite drink, we should hope that Python returns an empty table.
table1.where("Fav_Drink", "Coke")
Name | ID | Fav_Drink |
---|
Sure enough, Python has returned an empty table!
Arrays
The final data structure we will examine in this class is the array. Arrays behave very similarly to lists, with a few differences. For one, the syntax used to create an array is slightly different:
make_array(<item 1>, <item 2>, <item 3>, ...)
For example,
make_array("Spring", "Summer", "Autumn", "Winter")
array(['Spring', 'Summer', 'Autumn', 'Winter'],
dtype='<U6')
You may ask: what’s that dtype='<U6' symbol at the end of the output? For now, don’t worry about it, as we will revisit this later.
Lists vs. Arrays
So, we now know about three different data classes in Python: lists, tables, and arrays. At first glance, lists and arrays may seem somewhat similar. However, there are a few key differences between them:
What the previous Task illustrates is the fact that arrays lend themselves to element-wise operations, whereas lists do not. One important limitation of arrays, though, is that the elements in an array must all be of the same data type. If you try to make an array from elements of different data types, Python will not raise an error, but the result may not be what you expect!
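A minimal sketch of both points, assuming NumPy is installed. It calls np.array() directly as a stand-in for make_array(), on the assumption that make_array() produces an ordinary NumPy array; the variable names are illustrative.

```python
import numpy as np

my_list = [1, 2, 3]
my_array = np.array([1, 2, 3])   # stand-in for make_array(1, 2, 3)

# Multiplying a list by 2 repeats the list...
print(my_list * 2)    # [1, 2, 3, 1, 2, 3]
# ...whereas multiplying an array by 2 doubles every element.
print(my_array * 2)   # [2 4 6]

# Mixing data types does not raise an error, but every element is
# silently converted to a common type (here, strings).
mixed = np.array([1, "two"])
print(mixed[0] == "1")   # True: the integer 1 was stored as the string '1'
```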
Comparisons
Here’s a question: is 2 less than 3? Well, yes it is! If we wanted to confirm this, we could simply ask Python whether 2 is less than 3 by running
2 < 3
True
Notice, however, how Python answered this question: it simply returned True. Let’s see what the data type of True is:
type(True)
bool
True is of the type bool, which is short for boolean. There are only two boolean quantities in Python: True and False. Let’s see how we can generate a False value:
3 < 2
False
Here is a list of comparison operators, taken from the Inferential Thinking textbook:
Comparison | Operator | True Example | False Example |
---|---|---|---|
Less than | < | 2 < 3 | 2 < 2 |
Greater than | > | 3 > 2 | 3 > 3 |
Less than or equal | <= | 2 <= 2 | 3 <= 2 |
Greater than or equal | >= | 3 >= 3 | 2 >= 3 |
Equal | == | 3 == 3 | 3 == 2 |
Not equal | != | 3 != 2 | 2 != 2 |
One nice thing about Python is that it allows for multiple simultaneous comparisons. For example,
2 < 3 < 4
True
In contrast, 2 < 3 < 1 would return False, because even though 2 is less than 3, it is not true that 3 is less than 1.
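One way to read a chained comparison is as two separate comparisons that must both hold. The sketch below spells this out using Python's and keyword, which this lab has not introduced; the variable names are illustrative.

```python
x = 3

# A chained comparison...
chained = 2 < x < 4
# ...gives the same answer as joining the two comparisons with `and`.
spelled_out = (2 < x) and (x < 4)

print(chained, spelled_out)   # True True
print(2 < 3 < 1)              # False, because 3 < 1 is False
```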
Believe it or not, you can compare strings as well! Python compares strings alphabetically; that is, letters at the beginning of the alphabet are considered to have smaller ordinal value than letters at the end of the alphabet. For example:
"apple" < "banana"
True
"zebra" < "zanzibar"
False
"cat" <= "catenary"
True
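To peek at the "ordinal values" mentioned above, we can use Python's built-in ord() function (not covered in this lab). The sketch below also shows one caveat worth knowing: every uppercase letter has a smaller ordinal value than every lowercase letter, so the ordering is only truly alphabetical when the letters share the same case.

```python
# Ordinal values of a few lowercase letters.
print(ord("a"), ord("b"), ord("z"))   # 97 98 122

print("apple" < "banana")    # True, since "a" comes before "b"
print("cat" <= "catenary")   # True, since "cat" is the start of "catenary"

# Uppercase letters come before all lowercase letters.
print(ord("Z"))              # 90
print("Zebra" < "apple")     # True, even though z comes after a in the alphabet
```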
Finally, we discuss how comparisons work in the context of lists and arrays. The way Python compares lists is by what is known as lexicographical order. From the official Python help documentation, this means
first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted.
For instance, [1, 2, 3] < [2, 1, 1] would return True since 1 (the first element of the first list) is less than 2 (the first element of the second list).
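A few more list comparisons, chosen to illustrate each part of the rule quoted above:

```python
# The first items differ (1 < 2), so the remaining items are never examined.
print([1, 2, 3] < [2, 1, 1])   # True

# The first items tie, so later items decide the comparison.
print([1, 2, 3] < [1, 2, 4])   # True, since 3 < 4
print([1, 2, 9] < [1, 3, 0])   # True, since 2 < 3

# If one list runs out while the items are still tied, the shorter list is smaller.
print([1, 2] < [1, 2, 3])      # True
```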
The comparison of arrays is a little more straightforward.
To see exactly how comparison of arrays works, let’s work through a Task:
What the previous task illustrates is that Python compares arrays element-wise.
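A minimal sketch of element-wise comparison, assuming NumPy is installed. As before, np.array() is used as a stand-in for make_array(), on the assumption that both produce the same kind of array; the arrays a and b are made up for illustration.

```python
import numpy as np

a = np.array([1, 5, 3])   # stand-in for make_array(1, 5, 3)
b = np.array([2, 4, 3])

# Each element of a is compared with the element of b in the same position,
# so the result is an array of booleans rather than a single True or False.
print(a < b)     # [ True False False]
print(a == b)    # [False False  True]

# Comparing with a single number compares every element against that number.
print(a > 2)     # [False  True  True]
```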
Conditionals
Now, we can use comparisons for much more than verifying simple arithmetic relationships. One of the main areas in which comparisons arise is the area of conditional expressions.
Simply put, conditional expressions are how we can convey a set of choices to Python. As an example, let’s consider finding someone’s city based on their zip code. To simplify, let’s assume the only zip codes we consider are 93117, 93120, and 93150. From postal data, we know that:
- a zip code of 93117 corresponds to Goleta
- a zip code of 93120 corresponds to Santa Barbara
- a zip code of 93150 corresponds to Montecito
We can rephrase this information in terms of “if” statements:
- If a person has a zip code of 93117, then they are in Goleta
- Otherwise, if they have a zip code of 93120, then they are in Santa Barbara
- Otherwise, if they have a zip code of 93150, then they are in Montecito
This is precisely the structure we would use when translating these statements into Python:
if zip_code == 93117:
    location = "Goleta"
elif zip_code == 93120:
    location = "Santa Barbara"
elif zip_code == 93150:
    location = "Montecito"
By the way: elif is an abbreviation for else if, which itself can be thought of as equivalent to otherwise, if.
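To actually run the zip code example, zip_code needs a value first. The sketch below picks 93120 arbitrarily and adds a final else branch, which is an addition for illustration: without it, a zip code outside our three would leave location undefined.

```python
zip_code = 93120   # try changing this to 93117 or 93150

if zip_code == 93117:
    location = "Goleta"
elif zip_code == 93120:
    location = "Santa Barbara"
elif zip_code == 93150:
    location = "Montecito"
else:
    # Assumption for this sketch: any other zip code is labeled "Unknown".
    location = "Unknown"

print(location)   # Santa Barbara
```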
Here’s the general syntax of a conditional expression in Python:
if <condition 1>:
    <task 1>
elif <condition 2>:
    <task 2>
...
else:
    <final task>
When executing the above conditional statement, Python first checks whether <condition 1> returns a value of True or False. If it returns a value of True, then <task 1> is executed and the statement ends. Otherwise, Python checks whether <condition 2> is True or False; if it is True then <task 2> is executed, etc.
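The fact that the statement ends as soon as one condition is True can be seen in a small sketch. The score and the grade cutoffs below are hypothetical values chosen for illustration.

```python
score = 85   # hypothetical exam score

if score >= 90:
    grade = "A"
elif score >= 80:
    grade = "B"   # this branch runs: 85 >= 90 is False, but 85 >= 80 is True
elif score >= 70:
    grade = "C"   # never checked, even though 85 >= 70 is also True
else:
    grade = "F"

print(grade)   # B
```

Because the conditions are checked from top to bottom, the order in which we write them matters.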
For example, if we had replaced the conditional expression in Task 2 with
x = 2

if x < 2:
    x = "hello"
elif x < 3:
    x = "goodbye"
else:
    x = "take care"
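then x < 2 would be False and x < 3 would be True, so only the elif branch would run and x would end up as "goodbye". Because the statement ends at the first condition that is True, the new string value of x is never compared against a number; such a comparison (for example, "goodbye" < 3) is what would produce an error.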
Functions
Finally, let’s quickly discuss Python functions. We’ve already been using quite a few functions:
If you recall, the general syntax for calling a function is:
<function name>(<arg1>, <arg2>, ... )
where <function name> denotes the function name and <arg1>, <arg2>, etc. denote the arguments of the function.
Creating your own function in Python is actually fairly simple! Here is the syntax we use:
def <function name>(<list out the argument names>):
    """include a 'docstring' here"""
    <body of the function>
    return <what you want the function to output>
For example,
def f(x, y):
    """returns x^2 + y^2"""
    return x**2 + y**2
creates a function f that can be called on two arguments, x and y, and returns the sum of squares of the arguments; e.g.
f(3, 4) # should return 3^2 + 4^2 = 25
25
By the way, the docstring referenced above is a verbal description of what the function does. (Recall from Lab01 that it is just a multi-line comment, since it is enclosed in triple quotation marks!) All functions should include a docstring to convey to the user what the function does.
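One reason docstrings are useful is that Python stores them alongside the function, so a user can read the description back. The sketch below uses the __doc__ attribute and mentions the built-in help() function, neither of which is covered in this lab.

```python
def f(x, y):
    """returns x^2 + y^2"""
    return x**2 + y**2

# The docstring is stored with the function and can be read back.
print(f.__doc__)   # returns x^2 + y^2

# Calling help(f) displays the same text along with the function's name and arguments.
```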
For instance, the following function is missing its return statement:
def g(x, y):
    """should return x^2 + y^2"""
    x**2 + y**2

g(3, 4)
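The function g above has no return statement. In Python, such a function still runs, but it hands back the special value None instead of the result we computed. The sketch below shows this, along with a corrected version; the name g_fixed is made up for illustration.

```python
def g(x, y):
    """should return x^2 + y^2"""
    x**2 + y**2   # the sum is computed, but never returned

result = g(3, 4)
print(result)     # None

def g_fixed(x, y):
    """returns x^2 + y^2"""
    return x**2 + y**2

print(g_fixed(3, 4))   # 25
```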
Finally, let’s combine some things by way of a concluding Task:
What to Turn In
Congrats on finishing Lab 02! Download the .ipynb version of your notebook and upload it to Gradescope!