Python Data Science Toolbox (Part 1)

Published

October 5, 2023

Writing your own functions

Strings in Python

In the video, you learned of another standard Python datatype, strings. Recall that these represent textual data. To assign the string ‘DataCamp’ to a variable company, you execute:

company = 'DataCamp'

You’ve also learned to use the operations + and * with strings. Unlike with numeric types such as ints and floats, the + operator concatenates strings together, while the * concatenates multiple copies of a string together. In this exercise, you will use the + and * operations on strings to answer the question below. Execute the following code in the shell:

object1 = "data" + "analysis" + "visualization"
object2 = 1 * 3
object3 = "1" * 3

What are the values in object1, object2, and object3, respectively?

object1 = "data" + "analysis" + "visualization"
object2 = 1 * 3
object3 = "1" * 3
print("object1:", object1)
object1: dataanalysisvisualization
print("object2:", object2)
object2: 3
print("object3:", object3)
object3: 111

Recapping built-in functions

In the video, Hugo briefly examined the return behavior of the built-in functions print() and str(). Here, you will use both functions and examine their return values. A variable x has been preloaded for this exercise. Run the code below in the console. Pay close attention to the results to answer the question that follows.

  • Assign str(x) to a variable y1: y1 = str(x)
  • Assign print(x) to a variable y2: y2 = print(x)
  • Check the types of the variables x, y1, and y2. What are the types of x, y1, and y2?
x = 4.89
y1 = str(x)
y2 = print(x)
4.89
print(type(y1))
<class 'str'>
print(type(y2))
<class 'NoneType'>

Write a simple function

In the last video, Hugo described the basics of how to define a function. You will now write your own function!

Define a function, shout(), which simply prints out a string with three exclamation marks ‘!!!’ at the end. The code for the square() function that we wrote earlier is found below. You can use it as a pattern to define shout().

def square():
    new_value = 4 ** 2
    return new_value
    

Note that the function body is indented 4 spaces already for you. Function bodies need to be indented by a consistent number of spaces and the choice of 4 is common.

This course touches on a lot of concepts you may have forgotten, so if you ever need a quick refresher, download the Python for Data Science Cheat Sheet and keep it handy!

# Define the function shout
def shout():
    """Print a string with three exclamation marks"""
    # Concatenate the strings: shout_word
    shout_word = 'congratulations' +  '!!!'

    # Print shout_word
    print(shout_word)

shout()
congratulations!!!

Single-parameter functions

Congratulations! You have successfully defined and called your own function! That’s pretty cool.

In the previous exercise, you defined and called the function shout(), which printed out a string concatenated with ‘!!!’. You will now update shout() by adding a parameter so that it can accept and process any string argument passed to it. Also note that shout(word), the part of the header that specifies the function name and parameter(s), is known as the signature of the function. You may encounter this term in the wild!

# Define shout with the parameter, word
def shout(word):
    """Print a string with three exclamation marks"""
    # Concatenate the strings: shout_word
    shout_word = word + '!!!'

    # Print shout_word
    print(shout_word)

# Call shout with the string 'congratulations'
shout('congratulations')
congratulations!!!

Functions that return single values

You’re getting very good at this! Try your hand at another modification to the shout() function so that it now returns a single value instead of printing within the function. Recall that the return keyword lets you return values from functions. Parts of the function shout(), which you wrote earlier, are shown. Returning values is generally more desirable than printing them out because, as you saw earlier, a print() call assigned to a variable has type NoneType.

# Define shout with the parameter, word
def shout(word):
    """Return a string with three exclamation marks"""
    # Concatenate the strings: shout_word
    shout_word = word + "!!!"

    # Replace print with return
    return(shout_word)

# Pass 'congratulations' to shout: yell

yell = shout("congratulations")

# Print yell

print(yell)
congratulations!!!

Functions with multiple parameters

Hugo discussed the use of multiple parameters in defining functions in the last lecture. You are now going to use what you’ve learned to modify the shout() function further. Here, you will modify shout() to accept two arguments. Parts of the function shout(), which you wrote earlier, are shown.

# Define shout with parameters word1 and word2
def shout(word1, word2):
    """Concatenate strings with three exclamation marks"""
    # Concatenate word1 with '!!!': shout1
    shout1 = word1 + "!!!"
    
    # Concatenate word2 with '!!!': shout2
    shout2 = word2 + "!!!"
    
    # Concatenate shout1 with shout2: new_shout
    
    new_shout = shout1 + shout2

    # Return new_shout
    return new_shout

# Pass 'congratulations' and 'you' to shout(): yell

yell = shout("congratulations", "you")

# Print yell
print(yell)
congratulations!!!you!!!

A brief introduction to tuples

Alongside learning about functions, you’ve also learned about tuples! Here, you will practice what you’ve learned about tuples: how to construct, unpack, and access tuple elements. Recall how Hugo unpacked the tuple even_nums in the video:

a, b, c = even_nums

A three-element tuple named nums has been preloaded for this exercise. Before completing the script, perform the following:

Print out the value of nums in the IPython shell. Note the elements in the tuple. In the IPython shell, try to change the first element of nums to the value 2 by doing an assignment: nums[0] = 2. What happens?


nums =  (3, 4, 6)

# Unpack nums into num1, num2, and num3
num1, num2, num3 = nums

# Construct even_nums

even_nums = (2, num2, num3)

Functions that return multiple values

In the previous exercise, you constructed tuples, assigned tuples to variables, and unpacked tuples. Here you will return multiple values from a function using tuples. Let’s now update our shout() function to return multiple values. Instead of returning just one string, we will return two strings with the string !!! concatenated to each.

Note that the return statement return x, y has the same result as return (x, y): the former actually packs x and y into a tuple under the hood!

# Define shout_all with parameters word1 and word2
def shout_all(word1, word2):
    
    # Concatenate word1 with '!!!': shout1
    shout1 = word1 + "!!!"
    
    # Concatenate word2 with '!!!': shout2
    shout2 = word2 + "!!!"
    
    # Construct a tuple with shout1 and shout2: shout_words
    shout_words = (shout1, shout2)

    # Return shout_words
    return shout_words

# Pass 'congratulations' and 'you' to shout_all(): yell1, yell2

yell1, yell2 = shout_all("congratulations", "you")
# Print yell1 and yell2
print(yell1)
congratulations!!!
print(yell2)
you!!!

Bringing it all together (1)

You’ve got your first taste of writing your own functions in the previous exercises. You’ve learned how to add parameters to your own function definitions, return a value or multiple values with tuples, and how to call the functions you’ve defined.

In this and the following exercise, you will bring together all these concepts and apply them to a simple data science problem. You will load a dataset and develop functionalities to extract simple insights from the data.

For this exercise, your goal is to recall how to load a dataset into a DataFrame. The dataset contains Twitter data and you will iterate over entries in a column to build a dictionary in which the keys are the names of languages and the values are the number of tweets in the given language. The file tweets.csv is available in your current directory.

Be aware that this is real data from Twitter and as such there is always a risk that it may contain profanity or other offensive content (in this exercise, and any following exercises that also use real Twitter data).

# Import pandas

import pandas as pd 
# Import Twitter data as DataFrame: df
df = pd.read_csv("data/tweets.csv")

# Initialize an empty dictionary: langs_count
langs_count = {}

# Extract column from DataFrame: col
col = df['lang']

# Iterate over lang column in DataFrame
for entry in col:

    # If the language is in langs_count, add 1 
    if entry in langs_count.keys():
        langs_count[entry] += 1
    # Else add the language to langs_count, set the value to 1
    else:
        langs_count[entry] = 1

# Print the populated dictionary
print(langs_count)
{'en': 97, 'et': 1, 'und': 2}

Bringing it all together (2)

Great job! You’ve now defined the functionality for iterating over entries in a column and building a dictionary with keys the names of languages and values the number of tweets in the given language.

In this exercise, you will define a function with the functionality you developed in the previous exercise, return the resulting dictionary from within the function, and call the function with the appropriate arguments.

For your convenience, the pandas package has been imported as pd and the ‘tweets.csv’ file has been imported into the tweets_df variable.

# Define count_entries()
def count_entries(df, col_name):
    """Return a dictionary with counts of 
    occurrences as value for each key."""

    # Initialize an empty dictionary: langs_count
    langs_count = {}
    
    # Extract column from DataFrame: col
    col = df[col_name]
    
    # Iterate over lang column in DataFrame
    for entry in col:

        # If the language is in langs_count, add 1
        if entry in langs_count.keys():
            langs_count[entry] += 1
        # Else add the language to langs_count, set the value to 1
        else:
            langs_count[entry] = 1

    # Return the langs_count dictionary
    return langs_count
    

# Call count_entries(): result

result = count_entries(df, "lang")

# Print the result
print(result)
{'en': 97, 'et': 1, 'und': 2}

Pop quiz on understanding scope

In this exercise, you will practice what you’ve learned about scope in functions. The variable num has been predefined as 5, alongside the following function definitions:

num = 5
def func1():
    num = 3
    print(num)

def func2():
    global num
    double_num = num * 2
    num = 6
    print(double_num)
    
func1()
3
func2()
10
print(num)
6
  • Try calling func1() and func2() in the shell, then answer the following questions:

The keyword global

Let’s work more on your mastery of scope. In this exercise, you will use the keyword global within a function to alter the value of a variable defined in the global scope.

# Create a string: team
team = "teen titans"

# Define change_team()
def change_team():
    """Change the value of the global variable team."""

    # Use team in global scope
    global team

    # Change the value of team in global: team
    team = "justice league"
# Print team
print(team)
teen titans
# Call change_team()
change_team()

# Print team
print(team)
justice league

Python’s built-in scope

Here you’re going to check out Python’s built-in scope, which is really just a built-in module called builtins. However, to query builtins, you’ll need to import builtins ‘because the name builtins is not itself built in…No, I’m serious!’ (Learning Python, 5th edition, Mark Lutz). After executing import builtins in the IPython Shell, execute dir(builtins) to print a list of all the names in the module builtins. Have a look and you’ll see a bunch of names that you’ll recognize! Which of the following names is NOT in the module builtins?

# Define three_shouts
def three_shouts(word1, word2, word3):
    """Returns a tuple of strings
    concatenated with '!!!'."""

    # Define inner
    def inner(word):
        """Returns a string concatenated with '!!!'."""
        return word + '!!!'

    # Return a tuple of strings
    return (inner(word1), inner(word2), inner(word3))

# Call three_shouts() and print
print(three_shouts('a', 'b', 'c'))
('a!!!', 'b!!!', 'c!!!')