Where math mixes with code: novembre 2015

venerdì 20 novembre 2015

Sorting algorithms in C++ - A small introduction for newbies and wannabe programmers

Today I finished the preparation of my set of slides on sorting algorithms.
This presentation is meant to introduce sorting algorithms to the ones that:

do not know what a sorting algorithm is
want to know more details about bubble sort algorithm (less interesting...) and merge sort algorithm (...definitely interesting!)
want to know how to implement such algorithms in C++
are a little bit rusty on the sorting process

So, drink up!

domenica 8 novembre 2015

A crash course in Python - Lesson 7: Functional programming

In this post, I don't want to discuss functional programming at all, but I would like to list some of the functional tools I encountered while using Python.

Curry functions

From Wikipedia:
"Currying is the technique of translating the evaluation of a function that takes multiple arguments (or a tuple of arguments) into evaluating a sequence of functions, each with a single argument"

Suppose we have the following function:

def exp(base, power):
    return base ** power

If we want to create the function two_to_the with a single input (which is the power) and fixed base (which is 2), we may do it by simply doing the following:

def two_to_the(power):
    return exp(2, power)

The same thing can be done as follows:

from functools import partial

two_to_the = partial(exp,2)  # is a function of one variable
print two_to_the(3)  # 8

square_of = partial(exp, power=2)
print square_of(3)  # 9

Map, Filter and Reduce

We occasionally use map, reduce and filter, which provide functional alternatives to list comprehension.

Map

The map function can be used with multiple-argument function if you provide multiple lists:

def multiply(x, y):
    return x * y

products = map(multiply, [1,2], [4,5])
# [1*4, 2*5] = [4, 10]

Filter

The filter function does the work of a list-comprehension if:

def isEven(x):
    return x % 2 == 0

xs = [1,2,3,4]
x_evens = [x for x in xs if isEven(x)]  # [2,4]
x_evens = filter(isEven, xs)  # same as above
list_evener = partial(filter, isEven)  # function that filters a list
x_evens = list_evener(xs)  # again [2,4]

Reduce

The reduce function combines the first two elements of a list, then that result with the third, that result with the fourth, and so on.

xProduct = reduce(multiply, xs)  # = 1*2*3*4 = 24
listProduct = partial(reduce, multiply)  # function that reduces a list
xProduct = listProduct(xs)  # again = 24

A crash course in Python - Lesson 6: Object-oriented programming

Python allows the programmer to define classes that encapsulate:

attributes (i.e., data that describe the status of the instantiated object)
methods (i.e., functions that describe the behavior of the instantiated object)

Suppose we want to create a class that describes a set of elements (although, as we saw in a previous lesson, Set is already a built-in Python data structure.

The behavior that we may apply on such class is as follows:

adding items
removing items from it
check whether it contains a certain value

Here is a possible implementation of such class:

class Set:
    def __init__(self, values=None):  # constructor
        self.dict = {}
        if values is not None:
            for value in values:
                self.add(value)

    def __repr__(self):  # toString() method
        return "Set: " + str(self.dict.keys())

    def add(self, value):
        self.dict[value] = True

    def contains(self, value):
        return value in self.dict

    def remove(self, value):
        del self.dict[value]

Then we can use such class as follows:

s = Set([1,2,3])

s.add(4)
print s.contains(4)  # True

s.remove(3)
print s..contains(3)  # False

A crash course in Python - Lesson 5: some extra functionalities

Generators

A huge problem with lists is that they can grow really big. By using the following instruction:

L = range(1000000)

we actually create a list containing one million elements! If you only need to deal with one of them at a time, this results in a very inefficient code (supposing that you are able to fit the list in memory... if not, you are in trouble).

To overcome this problem, one can use generators. A generator is something that you can iterate over (e.g., by using a for loop) so that values are produced in a lazy manner, i.e., only when needed.

The creation of a generator resembles the creation of a list, with the only difference that elements are not materialized: we only suggest to the generator how to create the next elements when one need to use them.
To create a generator, one can use the following method:

def lazy_range(N):
    i = 0
    while i < N
        yield i
        i += 1

In this way, the yielded values are consumed one at a time, until none are left. The iteration over the generator is identical to the one we use over lists:

for i in lazy_range(10):
    do_something(i)

Actually, Python comes with a "lazy range function" as the one reported above, called xrange. Moreover, in Python 3 the function range is lazy itself.

By using generators, one could create even an infinite sequence! See the following:

def natural_numbers():
    n = 1
    while True:
        yield n
        n += 1

Note: the flip side of laziness is that you can only iterate through a generator once! If you need to iterate through something multiple times:

either you recreate the generator
or you use a list

Another way of creating a generator is by using list comprehension (see Lesson 2):

lazy_evens_below_20 = (i for i in lazy_range(20) if i % 2 == 0)

Randomness

The Python random module allows us to draw random numbers in the interval [0,1].
To draw a random number we do as follows:

import random
aRandomNumber = random.random()

Reproducibility

As usual, the random generator produces pseudorandom numbers (i.e., deterministic) based on an internal state. Such state can be initialize with random.seed. It is suggested to apply such approach, since you may want to get reproducible results in your experiments!
Every time you reset the seed to a specific value, the extraction of random numbers restarts, and redraws the same values it drew the first time you reset the seed to that value.

mySeed = 10

random.seed(mySeed)
print random.random()  # 0.5714

random.seed(mySeed)
print random.random()  # again 0.5714

Extracting values from a range

You can use random.randrange(), which takes:

a single arguments B, when you want to extract numbers from the range [0, 1, ..., B-1]
two arguments A and B, when you want to extract numbers from the range [A, A+1, ..., B-1]

Lists vs. random

The random module can be used to randomize the order of list elements, too. This is done via the random.shuffle() function:

upToTen = range(10)
random.shuffle(upToTen)
print upToTen
# [2,5,1,9,7,3,8,6,4,0]

With a similar mechanism, it is possible to pick up a random element from the list.

myBestFriend = random.choice(["Alice","Bob","Charlie"])  # 'Bob'

If you need to draw a sample of elements without replacement (i.e., without duplicates):

lotteryNumbers = range(60)
winningNumbers = random.sample(lotteryNumbers, 6)
# [16, 36, 10, 6, 25, 9]

To choose a sample of elements with replacement (i.e., with duplicates):

fourWithReplacement = [random.choice(range(10))
                          for _ in range(4)]

Regular expressions

Regular expressions provide a way of searching text.
The Python module that allows one to apply regular expressions to text is re.
We are not going to discuss regular expressions syntax in this article! Feel free to read some documentation.

args and kwargs

Python allows the programmer to declare function whose parameters are not specified. By using args and kwargs, one is able to read:

the parameter values (via args)
the parameter names (via kwargs)

def magic(*args, **kwargs):
    print "unnamed args: ", args
    print "keyword args:", kwargs

magic(1, 2, key="word", key2="word2")

# prints:
#   unnamed args: (1,2)
#   keyword args: {'key2': 'word2', 'key': 'word'}

sabato 7 novembre 2015

A crash course in Python - Lesson 4: other data structures

Tuple

A tuple is an immutable list. Technically, you can do on/with tuples whatever you can do on a list that does not involve modifying it.

A tuple can be specified by using parentheses (or nothing) instead of square brackets:

my_tuple = (1, 2)
other_tuple = 3, 4

try:
    my_tuple[1] = 3
except TypeError:
    print "cannot modify a tuple"

A tuple is a convenient way of returning multiple values from functions:

def sum_and_product(x, y):
    return (x + y), (x * y)

sp = sum_and_product(2, 3)  # equals (5,6)
s, p = sum_and_product(5, 10) # s = 15, p = 50

Consequently, tuples can be used for multiple assignments too:

x, y = 1, 2  # x = 1, y = 2
x, y = y, x  # swap: x = 2, y = 1

Set

A set represents a collection of distinct elements. It can be declared, modified and manipulated as follows:

s = set()
s.add(1)  # s = {1}
s.add(2)  # s = {1, 2}
s.add(2)  # s = {1, 2}
x = len(s)  # x = 2
y = 2 in s  # y = True
z = 3 in s  # z = False

Sets perform very well when one has to check if they contain a specific value: in is a very fast operation! So, if we have a large collection of items that we want to use for a membership test, a set is more appropriate than a list.
Moreover, if one builds a set starting from a list, what he obtains is the set of distinct values in that list:

my_list = [1, 2, 3, 2, 1, 3]
item_set = set(my_list)  # {1, 2, 3}
distinct_elements = list(item_set) # [1, 2, 3]

A crash course in Python - Lesson 3: Dictionaries

Data structure: the map

A fundamental data structure in computer science is the map. A map associates a key (usually a string) with a value (which usually can be any object).

Here is an example of a map construction and usage.
Suppose we would like to count the number of words in a document, in order to build a histogram of frequencies. Let's assume we use the following map to store the histogram:

"word1" -> frequency1
"word2" -> frequency2
....
"wordN" -> frequencyN

On the left side, I listed the words of the documents, without duplicates. These represent the keys of the map. Whenever a key is provided (say, word1), the corresponding value (in our example, frequency1) is retrieved.
In order to build the aforementioned map, we can use the following algorithm (written in pseudo-code):

for (aWord in document)
    map[aWord] = map[aWord] + 1

In this pseudo-code, map[aWord] retrieves the value associated with the word aWord. In this way, every time the word aWord is encountered in the document, the frequency associated with that word and stored in the map is incremented.
Where is the pro in all this?

Accessing the map given a word takes O(1)
With an algorithm of time complexity O(N) we built the histogram of frequencies

Almost every programming language has its own implementation of such data structure (e.g., HashMap is the Java implementation).

Moreover, there are many products based on it. If you are fond of NodeJS, for instance, you could know Redis, an in-memory data structure stored, which can be used as a cache, and is fully integrated with NodeJS. In this case, keys and values are both stored as strings.

From map to Python's dictionary

Python has its own "map-like" data structure, called dictionary. A dictionary associates values with keys, and is declared as follows:

empty_dict1 = {}
empty_dict2 = dict()
grades = {"Joel": 80, "Tim": 95}

Properties of keys

Dictionary keys must be immutable. In particular, you cannot use lists as keys! If you need a multipart key, then you should use a tuple, or figure out a way to turn the key into a string.

Retrieving a value

You can look up the value for a specific key using square brackets:

joel_grade = grades["Joel"]

Obviously, it could be the case that a key does not exist in the map. Thus, you may want to cast an exception that cover such case:

try:
    kates_grade = grades["Kate"]
except KeyError:
    print "No grade for Kate"

Instead of casting an exception, one may want to check if a key exists by doing the following operation:

joel_has_grade = "Joel" in grades   # True

In this case, we are using the operator in to check if the specified value is contained in the dictionary. This reminds lists and the membership check on an element... (check my previous Python lesson on lists).

However, to avoid the whole fuss and muss that could be created by the absence of a key, you can use the get() method, which:

in case the key exists, returns the associated value
in case the key does not exist, returns a default value

Here are some examples:

joel_grade = grades.get("Joel",0)  # equals 80, default is 0
no_one_grade = grades.get("No one")  # default is None

Adding a new value

It is possible to assign key-value pairs using the square brackets notation:

grades["Tim"] = 99

Going back to our word histogram example

We are going to create a dictionary where the keys are words and the values are word counts.

word_counts = {}
for word in document:
    if word in word_counts:
        word_counts [word] += 1
    else:
        word_counts[word] = 1

A second (and equivalent) approach is the following, which uses exceptions to handle the special case in which the word is not in the dictionary:

word_counts = {}
for word in document:
    try:
        word_counts [word] += 1
    except KeyError:
        word_counts[word] = 1

A third approach uses the get() method:

word_counts = {}
for word in document:
    previous_count = word_counts.get(word, 0)
    word_counts[word] = previous_count + 1

defaultdict

A defaultdict (i.e., "default dictionary") is a regular dictionary that handles particularly the case in which someone tries to look up a key it does not contain. In this case, the dictionary adds a value for it, using a zero-argument function you provide when you create it.

To use a default dictionary, you have to import it:

from collections import defaultdict

Going back to our example, we can use default dictionaries to store the histogram of words:

word_counts = defaultdict(int)    # int() produces 0
for word in document:
    word_counts[word] += 1

You can use a default dictionary to initialize several objects:

def_list_dict = defaultdict(list)  # list() produces an empty list
def_list_dict[2]. append(1)  # {2:[1]}

default_dict = defaultdict(dict) # dict() produces an empty dict
default_dict["Joel"]["City"] = "Seattle"  # {"Joel":{"City":"Seattle"}}

default_pair = defaultdict(lambda: [0,0])
default_pair[2][1] = 1  # {2:[0,1]}

Counter

A counter turns a sequence of values into a defaultdict(int)-like object, mapping keys to counts. It can be used for histograms.

from collections import Counter
c = Counter([0, 1, 2, 0])  # {0:2, 1:1, 2:1}

The output shows that:

the number "0" appeared 2 times in the sequence
the number "1" appeared 1 time in the sequence
the number "2" appeared 1 time in the sequence

We can use a Counter to solve the problem of counting words in a document, too:

word_counts = Counter(document)

In the Counter object there is the possibility of using the most_common method:

# print the 10 most common words and their counts
for word, count in word_counts.most_common(10)
    print word, count

mercoledì 4 novembre 2015

A crash course in Python - Lesson 2: Lists

A fundamental data structure is the list. A list is an ordered collection of data. This does NOT mean that values are ordered in a specific order (i.e., either ascending or descending). Instead, it means that elements can be accessed in an ordered way, starting from index 0 to index (length-1).

A list can be seen as an array with some added functionality, and a great (and very important!) difference: while an array stores homogeneous values (i.e., values having the same type), a list can store heterogeneous values.

Instantiating a list

The following two lines show how to declare:

a homogeneous list (i.e., a list containing values having the same data type);
a heterogeneous list (i.e., a list containing values having different data types).

homogeneous_list_of_integers = [1, 2, 3]
heterogeneous_list = ["string", 0.1, True]

A list can also be generated via the range function, which includes all the values in a range:

list = range(10) # [0,1,...,9]

Element types

Note: list elements can be lists too:

list_of_lists = [homogenous_list, heterogeneous_list, []]

Accessing to elements

It is possible to access to list elements in the following way:

first = list[0]  # first element in the list
last = list[-1]  # last element of the list
next_to_last = list[-2]  # next_to_last element of the list

Check if a list contains an element

Python has an in operator to check for list membership:

1 in [0, 1, 2]  # True
3 in [0, 1, 2]  # False

Slicing lists

You can use square brackets to slice lists:

first_three_elements = list[:3]  #index 3 is EXCLUDED
three_to_end = list[3:]  #index 3 is INCLUDED
one_to_four = list[1:5]  #index 1 is included, 
                         # index 5 is excluded
without_first_and_last = list[1:-1] # 0 and -1 are excluded
last_three = list[-3:]
copy_of_x = list[:]

Concatenate lists

It is easy to concatenate lists together:

x = [1, 2, 3]
x.extend([4, 5, 6])  # Now x is [1,2,3,4,5,6]

However, one can decide to avoid the modification of the original list, by using list addition:

x = [1, 2, 3]
y = x + [4, 5, 6]
# Now y = [1, 2, 3, 4, 5, 6]
# while x is unchanged

It is also possible to append lists one item at a time:

x = [1, 2, 3]
x.append(0)

Store the list content in separate variables

It is often convenient to unpack lists if you know exactly how many elements they contain:

x, y = [1, 2]  # Now x = 1 and y = 2

Notice that if you are not interested in extracting and storing all the values in the list, you can decide to ignore some values:

_, y = [1, 2]  #Now y = 2 and we did not care about the first element

Some special functionalities

Lists have specific functionalities that allow one to manipulate easily their data.

For instance, the following computes the length of a list:

list_length = len(list)

The following, instead, computes the sum of the elements of the list:

list_sum = sum(list)

Creating lists via for loops

You may want to transform a list into another list, by choosing only certain elements, or by transforming elements, or both. This operation is called list comprehension. You may do this by doing the following:

even_numbers = [x for x in range(5) id x % 2 == 0]
squares = [x * x for x in range(5)]
even_squares = [x * x for x in even_numbers]

The same thing applies for dictionaries or sets: you can transform a list into one of them.

If you do not need the value from the list, you can use an underscore as the variable:

zeroes = [0 for _ in even_numbers]

Finally, you may use multiple for to populate the list:

pairs = [(x, y)
           for x in range(10)
           for y in range(10)]
# 100 pairs: (0,0) (0,1)...(9,8) (9,9)

Sorting lists

Don't implement your own (bubble sort??) sorting algorithm: Python already provides a sorting method for lists!

the sort() method is applied on a list x via dot notation (i.e., x.sort()) and sorts the list in place meaning that x will be modified so that its elements will be ordered in ascending order
the sorted(L) method takes a list L as parameter and returns a new list, which is a copy of list L, only with ordered elements (in ascending ordered)

x = [4, 1, 2, 3]
y = sorted(x)  # y = [1, 2, 3, 4], x unchanged
x.sort()  # x = [1, 2, 3, 4]

There are a couple of things that you can apply to the sorted function:

to sort elements in descending order, you can specify a reverse=True parameter
you can compare the results of a function that you specify with the key parameter; the sorted function will order such results instead of the actual elements of the list

x = sorted([-4,1,-2,3], key=abs, reverse=True)  
# result: [-4, 3, -2, 1]

# sort the words and counts from highest count to lowest
wc = sorted(word_counts.items(),
                   key = lambda (word, count): count,
                   reverse=True)

Iterating over elements and indexes: enumerate

You may want to iterate over a list and use both its elements and their indexes. The solution is the enumerate() function which produces tuples of the form

(index, element)

Note: to have further details about what is a tuple and how to use it, see Lesson 4.

To extract such tuples:

for i, anElement in enumerate(myCollection):
    do_something(i, anElement)

Instead, if you just want the indexes:

for i, _ in enumerate(myCollection): do_something(i)

Zipping and unzipping lists

Zipping lists

One may need to zip lists together, i.e., transform multiple lists into a single list of tuples, where each tuple contains elements from the lists having the same index.

list1 = ['a', 'b', 'c']
list2 = [1, 2, 3]
zip(list1, list2)  # [('a',1), ('b',2), ('c',3)]

Unzipping lists

You can also unzip a list in multiple lists:

pairs = [('a',1), ('b',2), ('c',3)]
letters, numbers = zip(*pairs)

Here, the asterisk performs an argument unpacking, meaning that we will use the elements of pairs as individual arguments to the zip function. This results in the same outcome that you'd obtain by calling:

zip(('a',1), ('b',2), ('c',3))
# result: [('a','b','c'), (1,2,3)]

You can decide to use argument unpacking with any function, as follows:

def add(a,b):
    return a + b

add(1,2)  # returns 3
add([1,2])  # TypeError
add(*[1,2])  # returns 3

domenica 1 novembre 2015

A crash course in Python - Lesson 1: Basic principles

Recently I found myself in learning two or three things in Python. Why Python? Well, every data scientist knows how important is to use programming languages for which data science libraries are available, and actually Python is one of these languages.

I would like to summarize all the concepts I have learned in a set of articles. Such concepts are really simple, but help one in getting started with this programming language.

A note: In this small tutorial, I am using Python 2.7, since Python 3 is not backward-compatible with Python 2.

Readable code? Commented code

Comments in Python are inserted as follows:

# this is a comment
a = b + c # this is a comment too

Indenting code is the only option!

Many programming languages (e.g., Java, C++) use curly braces to delimit blocks of code. Python replaces curly braces with indentation.

  for i in [1, 2, 3]:
    print i
    for j in [1, 2, 3]:
      print j
      print i + j

Notice, moreover, that semicolon are not needed in this language, so, do not use them.

Adding functionalities via modules

A module in Python is a third-party feature collector that, if included in your code, allows you to use such features.

Importing a module:

import re as reg_expr
my_regular_expression = reg_expr.compile("[0-9]", reg_expr.I)

Importing only a set of features from a module:

# module: collections
# functions: defaultdict, Counter
from collections import defaultdict, Counter
lookup = defaultdict(int)
my_counter = Counter()

Arithmetic: dividing double values made easy

Python executes integer divisions by default, so that, for instance, when we execute $5 / 2$, this equals $2$. Since usually we do not want to obtain this, there is a tweak that allows us to do proper divisions resulting in non-integer numbers:

from __future__ import division

From now on, $5/2 = 2.5$.

If you want to perform integer division, then, you will write:

5 // 2

Python functions

A function, as usual, is a portion of code that takes zero or more inputs and returns an output. In Python functions are defined as follows:

def double(x):
  """ this is the description of the function"""
  return x * 2

Functions as parameters

As in JavaScript, it is possible to pass functions as parameters to other functions:

def apply_to_one(myFunction):
  return myFunction(1)

# double is a function
my_double = double
x = apply_to_one(my_double)

Lambda functions

Lambda functions are short anonymous functions, declared in Python as follows:

y = apply_to_one(lambda x: x + 4)

Here, I am passing as a function that requires to add $4$ to any provided input. Then, I am passing such function as a parameter to a function that adds $1$ to the input. As a result, the function apply_to_one calls the lambda function passing the value $x = 1$, which results in $x = 1$ plus $x = x + 4$, which in turn returns $x = 5$.

Default arguments for functions

Default arguments need to be specified when you want a value other than the default.

def my_print(message="my default message"):
  print message

my_print("hello") #prints "hello"
my_print() #prints "my default message"

Default arguments can be specified by name, if needed:

def subtract(a=0,b=0):
  return a - b

subtract(0,5) #returns -5
subtract(b=5) # returns -5

Strings

Strings, as in JavaScript,can be delimited by single or double quotation marks.

my_string = "hello"
your_string = 'hello to you'

Python uses the len() function to return the length of the string (i.e., the number of characters it contains).

Exceptions

Why try-catching exceptions, if you can try-except them? Python does it! Forget about the catch clause. You can handle an exception as follows:

try:
  print 0 / 0
except ZeroDivisionError:
  print "cannot divide by zero"

Control flow

If

Obviously, Python allows to alter the execution flow by performing actions conditionally using an if:

if 1 > 2:
    print "1 is smaller than 2"
elif 1 > 3:
    print "1 is smaller than 3"
else:
    print "all other conditions failed."

Python has also the ternary if-then-else line:

parity = "even" if z % 2 == 0 else "odd"

While

This is the construct for the while loop:

x = 0
while x < 10:
    x += 1

For

This is the construct for the for loop:

for x in range(10)
    print x

If you need more complex logic, you can use continue and break:

for x in range(10):
    if x == 3:
        continue
    if x == 5:
        break
    print x

Boolean values

Booleans in Python work as in other languages, except they are capitalized:

True is the true value
False is the false value
None indicates a nonexistent value (similar to other languages' NULL value)

The following values are treated as false: False, None, [] (i.e., an empty list), {} (i.e., an empty dictionary), "", set(), 0, 0.0

Anything else gets treated as True.

Python has:

an all function, which takes a list and returns True precisely when every element is truthy
an any function, which returns True when at least one element is truthy

Wrap-up

Well, that's all for basic principles. They are kind of common with other languages.

A little thing that I did not cover is how Python handles data types. Here is a nice explanation about why Python is both a dynamic language and also a strongly typed language. To summarize the whole thing:

Strong typing means that the type of a value cannot change. For instance, a variable that is born as string remains a string, cannot be converted to numbers. Python is strongly typed: variable types stay as they are.
Dynamic typing means that runtime objects have a type, as opposed to static typing where variables have a type. Python is a dynamic language.

A blog that mixes concepts from Mathematics and Software Engineering

Hello to everyone!
Today I open this blog, to share ideas and concepts that I learn over time regarding math and software engineering.

Well well, before starting with the real content, I would like to thank the author of this answer for pointing to nice solutions that allows bloggers to add equations to their blogs.