Claire Cahoon, Matt Jansen, Nathan Kelber, and Kristina Bush
Thursday, May 23, 2019
If you would like to save your work and continue to use Python on your computer, we recommend downloading Anaconda.
Goals:
Python is a general-purpose programming language that has become popular for data and text analysis. Python can be easier to learn than other languages because it emphasizes (human) readability and flexibility. Python is the third most used language on GitHub; this means you'll find packages (sets of functionality developed by other users) to use Python for a wide variety of problems and tasks.
If you haven't worked with a programming language before, learning Python will introduce you to approaches and thought processes common to many similar languages, making it easier to learn other languages like Java and R.
Scripting: writing code to automate repeptitive tasks. For example, extracting text from thousands of pdf files and sorting them into directories based on the information in the texts.
Quantitative analysis: specialized tools and functions for everything from fitting deep learning models with tensorflow
to creating interactive visualizations with bokeh
.
Natural Language Processing: extracting text from different file formats and parsing and analyzing sentiment using tools from the NLTK
package.
Web development(Django
), Image processing (pillow
), Web scraping (BeautifulSoup
), Acessing APIs, and many more.
However, there are few things you can do in Python that can't also be done elsewhere! If you already know one or more programming language, you'll have to decide where Python best fits or helps in your own workflows.
If you're already comfortable with basic programming concepts, you may find this workshop very simple. Python II later in the day may be more helpful if you're already familiar with the concepts below and just need to learn new syntax, but both workshops will be focused on the basics. You're welcome to stay, review, and help other attendees!
Both Python 2 and Python 3 are widely used in research. Unfortunately, while both Python 2 and 3 are very similar, there are a few key syntax differences. Python 3 was released in 2008; nearly all important tools have been re-written or updated for Python 3. Python 2 will not be maintained after January 1, 2020. This workshop will focus on Python 3.
An IDE is software that groups together the many tools to ease development in a programming language, such as a code editor, compiler, and debugger in a convenient interface. There are many different IDEs to choose from. IDEs are not necessary, but often good for beginners.
As you gain experience, you can choose whether an IDE is right for your uses and which one works best for you. For the purposes of this workshop, we will use Spyder, which comes packaged with Anaconda.
Open Spyder:
Spyder's default interface provides three panes:
We'll begin by using Python as a simple calculator.
For the purposes of this workshop, Python code will be presented in numbered grey cells as below. Any output generated will also be displayed.
2+2
To execute this in Spyder, copy or type the code yourself into the Ipython console pane. Press Enter
to execute.
Alternatively, you can enter code into the Editor pane. This is particularly useful when writing more complicated or reusable code. The code you write in the Editor pane will be saved as a .py file to run or revise later.
To use the Editor pane to save and execute code, type the code in the Editor pane, highlight the line(s) you want to execute and click:
The code will then execute in the Console pane.
Standard arithmetic operations are available in Python.
3*3
Note: We can annotate our code with comments. Python uses #
to denote comments. Anything typed after a #
will be ignored on execution of the code.
#1+2
5/2 #division
What is 126544 squared? (Exponentiation is denoted **
)
What is 5 divided by 0?
We often need Python to store values for future re-use. Python has many default data object types available; we will focus on a few common examples.
We assign a value to a variable using =
. We do not need to declare a type or give any other information.
text="Hello, World"
number=42
String objects, like text
above, contain textual values. These are identified to Python by quotes; you can use either ' or " as long as you use the same type to begin and end your string. You can't do mathmatical operations on strings.
Python uses several different numeric data types for storing different values. Examples include integers, long integers, and floating point numbers (decimals). Numbers can also be stored in string values using quotes.
For example:
notnumber="42" #This is stored as text, not a number
Once we have defined an object, we can view it again by printing it.
print(text)
We can also modify the contents of an object: for example, including redefining them or changing their type. The object's type determines what Python can do with the object.
print(number+58)
#print(number+notnumber)
So we can add a value, 58, to our number object, but we can't add our notnumber object. Let's double check what notnumber contains:
print(notnumber)
print(number)
Even those these appear the same, Python treats them very differently. Remember how we defined notnumber? Let's check what data type Python is using with type
.
type(notnumber)
Fortunately Python provides a set of functions to convert objects between different data types. A function stores prewritten commands to accomplish a particular task. Most functions take one or more objects as 'arguments' (i.e. inputs) and produce some new object as output.
We've already seen two functions: type
and print
.
The int
function takes an object as an argument and converts it to an int
(integer) numeric object. The usage is as follows:
newnumber=int(notnumber)
print(newnumber)
type(newnumber)
int
objects can only hold integer values (whole numbers). If you have decimal values, use the float
(floating decimal) type instead.
myfloat=float(newnumber)+0.5
print(myfloat)
Now we can try adding objects again.
print(number+newnumber)
You can access documentation for functions in Python with help
, for example help(sum)
. Base Python functions and those provided by packages also usually have online documentation that may be easier to read.
j
and k
, with values 37 and 52 respectively.j
and k
? The product? Write code for each of these in the editor window, and run with the keyboard shortcut (refer back to Section 1. Entering Code).j
and k
to be 8 and 3. Re-use your code from the editor to determine their sum and product.Lists store multiple objects together. For example, we can save numbers and characters:
my_list=[1,2,3,"four"]
print(my_list)
We can also directly place previously defined objects into a list (even other lists!):
obj0=12
obj1="cat"
obj2=["a","b","c"]
my_list1=[obj0,obj1,obj2]
print(my_list1)
Once we've defined a list, we can add more elements to the end with the .append
function.
my_list1.append("dog")
my_list1.append(42)
print(my_list1)
Create a list that includes:
your two favorite holidays (as a list within the list)
For example:
["red",["Halloween","New Years"]]
Then append the number of pets you have as a new list item
Python retains the order of elements in our list and uses that order to retrieve objects in lists. These numbered positions are called indices.
We use [
and ]
to provide indices to Python.
Most importantly, Python starts counting at zero: The first element in your list is denoted [0]
, the second [1]
, the third [2]
and so on. This can take some getting used to!
my_list2 = ["cat","dog","parrot"]
print(my_list2[2])
We can use multiple indices for lists within lists, one after the other:
#recall
my_list1=[12, 'cat', ['a', 'b', 'c']]
print(my_list1[2][1]) #i.e. the second element of the list held in the third element of my_list1
The len
function provides the length of an object in Python.
print(len(my_list1))
If len(my_new_list)=10
that means there are ten elements in the list. Remember that Python starts counting at 0, so the indices are 0 through 9.
We can use the range
function with len
to generate a list of indices.
my_indices1 = list(range(len(my_list1)))
print(my_indices1)
Indexes can also be used with any sequential data type, including strings.
If you want to retrieve a range of indexes, you can use two numbers with :
in between. Note that Python stops at the index before the one you note as the end of the range.
For example:
my_str="The quick brown fox jumps over the lazy dog."
print(my_str[4])
print(my_str[4:9]) #4:9 indicates characters 4-8
Using :
ranges with one end blank will automatically go to the end of the object.
We can also work from right to left using negative numbers.
print(my_str[:4])
print(my_str[-4:])
We can still use multiple nested indices across different sequential data types.
["home","away"][0][0:3]
Unfortunately, not all data types are sequential - indices will not work on numeric values.
my_pi
as defined below. Does it work as defined? If not, how could we make it work?
my_pi=3.141592653589793
Below is a list of lists of strings containing the NATO phonetic codes for each letter of the alphabet. Each list within nato
contains a letter of the alphabet and its corresponding code.
nato=[["A","Alfa"],
["B","Bravo"],
["C","Charlie"],
["D","Delta"],
["E","Echo"],
["F","Foxtrot"],
["G","Golf"],
["H","Hotel"],
["I","India"],
["J","Juliett"],
["K","Kilo"],
["L","Lima"],
["M","Mike"],
["N","November"],
["O","Oscar"],
["P","Papa"],
["Q","Quebec"],
["R","Romeo"],
["S","Sierra"],
["T","Tango"],
["U","Uniform"],
["V","Victor"],
["W","Whiskey"],
["X","X-ray"],
["Y","Yankee"],
["Z","Zulu"]]
What is the fifteenth letter of the alphabet?
There are other important data types we won't cover today. We will cover Python's data type for mapping between values, dictionaries, in the Python I Extra.
Read more about Python's built-in data types here.
Conditions let us check if a certain statement is true.
Conditions often arise from comparisons:
< strictly less than
<= less than or equal
> strictly greater than
>= greater than or equal
== equal
!= not equal
is object identity
is not negated object identity
in sequence membership
not in sequence non-membership
Each condition considered evaluates to a Boolean value - True
or False
.
num=5
num<3
letter="a"
letter in ["a","b","c"]
A conditional statement allows your code to branch and behave differently based on defined conditions.
A simple conditional statement takes the form:
if <condition>:
<do something only if condition1 is true>
Your instructions can be as long as necessary, but they have to remain indented. Indentation is very important in Python as it groups lines of code without using explict characters like {
and }
as in many other language. You can indent with spaces or tabs, but you must be consistent.
We can supply alternate instructions if the condition is false with else
, or even consider multiple conditions with elif
(i.e. else if).
if <condition1>:
<do something if condition1 is true>
elif <condition2>:
<do a different thing if condition1 is false and condition2 is true>
else
<do a third thing if neither condition is true>
num=5
if num>4:
print("This number is greater than four")
num=3
if num>4:
print("This number is greater than four")
Adding else
lets us give instructions if our condition is False
.
num=3
if num>4:
print("This number is greater than than four")
else:
print("This number is less than or equal to four")
Finally, the elif
command lets us split the possible values of num
into more groups.
num=8
if num<3:
print("This number is less than three")
elif num<10:
print("This number is greater than or equal to three and less than ten")
else:
print("This number is greater than ten")
testword=<your choice of word>
if <condition using testword>:
print(testword)
A "for loop" allows us to apply the same steps to each element in a list or other sequence of objects. In essence, loops let us automate tasks relative to some sequence that we might otherwise write like this:
sales=[5,2,7,9,3]
total_sales=0
total_sales=total_sales+sales[0]
total_sales=total_sales+sales[1]
total_sales=total_sales+sales[2]
total_sales=total_sales+sales[3]
total_sales=total_sales+sales[4]
print(total_sales)
Loops take the form:
for <name> in <list>:
do something based on name
<name>
is completely arbitrary, though i,j,k, and n are relatively common<list>
is a pre-defined list or other iterable object.my_nums=list(range(6))
for n in my_nums:
print(n)
We can also loop within loops. Like conditionals, indentation is key to control which blocks of code are executed within which loop.
#Nesting loops - indentation is key!
listOfWords = ["blue", "yellow", "red", "green"]
newList = [] #initialize an empty list
for color in listOfWords:
numLetters = 0 #resets to zero each time the loop runs
for letter in color:
numLetters += 1
temporaryList = [color, numLetters]
newList.append(temporaryList)
print(newList)
Notice that before the loop begins we create an empty list. This is a common stragegy to collect outputs from some or all of the loops iterations. This can generalize to numbers by defining a zero-valued variable before the loop and modifying it with each iteration.
How could we write the code above with fewer lines? Is there a simpler way to find the length of each word?
Loops become even more useful when combined with conditionals, to perform different steps based on each value in the loop.
for number in range(10):
if number % 2 == 0: # % denotes the modulo operation - the result is the remainder after dividing by 2
print(number)
Recall that we can combine multiple conditions with and
.
scores=[95,90,66,83,71,78,93,81,87,81]
grades=[]
for score in scores:
if score>=90:
grade="A"
elif score>=80:
grade="B"
elif score>=70 and score<80:
grade="C"
elif score>=60 and score<70:
grade="D"
else:
grade="F"
grades.append([score,grade])
print(grades)
Exercise:
score>=80
etc. in the elif
statements? Can these conditionals be simplified?We can use the break statement with a conditional to stop the loop if a certain condition occurs.
First, lets get some new functions from a package called random
.
from random import choices,seed
makes the functions choices
and seed
from the random
package available in our Python session.choices(population=range(100),k=50)
will sample 50 random numbers (with replacement) from the numbers 0-99. seed(1234)
locks the pseudo-random number generator so your results should match mine - try running this again without seed
!You can read more about the functions in the random
package here. We'll revisit packages next week!
from random import choices,seed
seed(1234)
test=choices(population=range(100),k=50)
print(test)
total=0
for number in test:
if number>10:
total=total+number
else:
print("This number is too low:",number)
break
print(total)
What does the above loop do? How would this run differently if we disabled the break by commenting (i.e. #break
)?
choices
function above to generate a random list of 50 numbers in 0-99. Write a loop that will find the sum of only the first 6 even numbers.If you are taking Python II later today and do not have Anaconda downloaded yet, please begin the process as soon as possible (see Setup) since some of the material in Python II cannot be completed using pyfiddle. If you are having trouble with installing or need to borrow a computer for the workshop, see one of the instructors!
If you'd like more practice, you can attempt our Python I: Extra.
Thanks for coming!