Sequential Data Types


A string can be seen as a sequence of characters. A string literal can be written in several ways, for example in single, double or triple quotes.

Indexing strings

Let's look at the string "Hello World". The characters of a string are numbered from left to right, starting with 0. If you start from the right side, the numbering starts with -1.

Every character of a string can be accessed by putting its index in square brackets after the string name, as the following example shows:

>>> txt = "Hello World"
>>> txt[0]
'H'
>>> txt[4]
'o'
Negative indices can be used as well. In this case we count from the right, starting with -1:
>>> txt[-1]
'd'
>>> txt[-5]
'W'
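
The relation between the two numbering schemes can be checked directly. The following is a minimal sketch; the loop and the variable names n and out_of_range are illustrative and not part of the original example. It also shows that an index outside the valid range raises an IndexError.

```python
txt = "Hello World"
n = len(txt)

# A negative index -k refers to the same character as the index n - k.
for k in range(1, n + 1):
    assert txt[-k] == txt[n - k]

# An index outside the range -n .. n-1 raises an IndexError.
try:
    txt[n]
    out_of_range = False
except IndexError:
    out_of_range = True

print(out_of_range)  # True
```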

Python Lists

The list is one of the most versatile data types in Python. It is written as a sequence of comma-separated items (values) between square brackets. Lists are related to the arrays of programming languages like C, C++ or Java, but Python lists are far more flexible than "classical" arrays. For example, the items in a list need not all have the same type. Furthermore, lists can grow while the program runs, whereas in C the size of a statically declared array has to be fixed at compile time.

An example of a list:
languages = ["Python", "C", "C++", "Java", "Perl"]
There are different ways of accessing the elements of a list. The easiest way for C programmers will probably be through indices, i.e. the elements of the list are numbered starting with 0:
>>> languages = ["Python", "C", "C++", "Java", "Perl"]
>>> languages[0]
'Python'
>>> languages[1]
'C'
>>> languages[2]
'C++'
>>> languages[3]
'Java'
All elements of the previous list have the same data type. But as mentioned before, the elements of a list can have different data types. The next example shows this:
group = ["Bob", 23, "George", 72, "Myriam", 29]
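
Both properties mentioned above, mixed element types and growth at run time, can be sketched in a few lines. The use of append and the two added values are illustrative assumptions, not part of the original example.

```python
group = ["Bob", 23, "George", 72, "Myriam", 29]

# The elements of one list may have different types.
type_names = [type(item).__name__ for item in group]
print(type_names)  # ['str', 'int', 'str', 'int', 'str', 'int']

# A list can grow while the program runs.
group.append("Anna")   # hypothetical additional entry
group.append(41)       # hypothetical additional entry
print(len(group))      # 8
```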


Lists can have sublists as elements. These sublists may contain sublists as well, i.e. lists can be nested to any depth.
>>> person = [["Marc","Mayer"],["17, Oxford Str", "12345","London"],"07876-7876"]
>>> name = person[0]
>>> print(name)
['Marc', 'Mayer']
>>> first_name = person[0][0]
>>> print(first_name)
Marc
>>> last_name = person[0][1]
>>> print(last_name)
Mayer
>>> address = person[1]
>>> street = person[1][0]
>>> print(street)
17, Oxford Str
The next example shows a more deeply nested list:
>>> complex_list = [["a",["b",["c","x"]]],42]
>>> complex_list[0][1]
['b', ['c', 'x']]
>>> complex_list[0][1][1][0]
'c'
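
Chained indices are applied from left to right, one nesting level per index. The helper function get_nested in the following sketch is hypothetical; it is not a built-in and only serves to make this step-by-step descent visible.

```python
def get_nested(data, path):
    """Follow a sequence of indices into a nested list, one level per index."""
    current = data
    for index in path:
        current = current[index]
    return current

complex_list = [["a", ["b", ["c", "x"]]], 42]

# Equivalent to complex_list[0][1][1][0]
print(get_nested(complex_list, [0, 1, 1, 0]))  # c
# Equivalent to complex_list[1]
print(get_nested(complex_list, [1]))           # 42
```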


A tuple is an immutable list, i.e. a tuple cannot be changed in any way once it has been created: you can neither add, remove nor replace elements. A tuple is defined analogously to a list, except that the elements are enclosed in parentheses instead of square brackets. The rules for indices are the same as for lists.

Where is the benefit of tuples? A tuple protects its data from accidental changes and, unlike a list, it can be used as a key in a dictionary. The following example shows how to define a tuple and how to access its elements. Furthermore we can see that Python raises an error if we try to assign a new value to an element of a tuple:
>>> t = ("tuples", "are", "immutable")
>>> t[0]
'tuples'
>>> t[0]="assignments to elements are not possible"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
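
Two consequences of immutability can be sketched as follows, assuming the standard behaviour of tuples. A tuple of hashable elements can serve as a dictionary key, which a list cannot. And immutability covers only the tuple itself, not a mutable object stored inside it. The names points and list_as_key and all values are illustrative.

```python
# A tuple can be used as a dictionary key, a list cannot.
points = {(0, 0): "origin"}
try:
    points[[1, 1]] = "not possible"
    list_as_key = True
except TypeError:
    list_as_key = False
print(list_as_key)  # False

# The tuple itself is fixed, but a list stored inside it can still change.
t = ("tuples", ["are", "immutable"])
t[1].append("but not their mutable elements")
print(len(t[1]))  # 3
```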


Lists and strings have many properties in common, e.g. the elements of a list or the characters of a string appear in a defined order and can be accessed through indices. There are other data types with similar properties, such as tuple, buffer and xrange (the latter two are Python 2 types; Python 3 provides range in place of xrange). In Python these data types are called "sequence data types" or "sequential data types".

Many operators and functions work in the same way for all sequence data types, as we will see in the following text.


In many programming languages it can be quite tough to extract a part of a string, and even tougher to address a "subarray". Python makes this very easy with its slice operator. For strings, this operation is known in other languages as substring or substr.

To extract a part of a string or a list, you use the slice operator. The syntax is simple: it looks like accessing a single element with an index, but instead of one number there are two, a start index and an end index, separated by a colon ":". The element at the start index is included, the element at the end index is not. One or both of the indices may be omitted; a missing start index means the beginning and a missing end index means the end of the sequence. It's best to study how slicing works by looking at examples:

>>> txt = "Python is great"
>>> first_six = txt[0:6]
>>> first_six
'Python'
>>> starting_at_five = txt[5:]
>>> starting_at_five
'n is great'
>>> a_copy = txt[:]
>>> without_last_five = txt[0:-5]
>>> without_last_five
'Python is '
Slicing works the same way on lists:
>>> languages = ["Python", "C", "C++", "Java", "Perl"]
>>> some_languages = languages[2:4]
>>> some_languages
['C++', 'Java']
>>> without_perl = languages[0:-1]
>>> without_perl
['Python', 'C', 'C++', 'Java']
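
Two further properties of slices are worth a short sketch: a slice is a new object, and slice bounds that lie outside the sequence are clipped instead of raising an error. The replacement value "Ruby" and the variable i are illustrative.

```python
languages = ["Python", "C", "C++", "Java", "Perl"]

# A slice is a new list: changing it leaves the original untouched.
a_copy = languages[:]
a_copy[0] = "Ruby"   # illustrative replacement
print(languages[0])  # Python

# Slice bounds beyond the end are clipped instead of raising an error.
print(languages[3:100])  # ['Java', 'Perl']
print(languages[10:])    # []

# For 0 <= i <= len(languages), the two parts add up to the original.
i = 2
print(languages[:i] + languages[i:] == languages)  # True
```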

Slicing works with three arguments as well. If the third argument is, for example, 3, only every third element of the list, string or tuple within the range given by the first two arguments is taken.

If s is a sequential data type, it works like this:
s[begin:end:step]
For a positive step, the resulting sequence consists of the following elements:
s[begin], s[begin + 1 * step], ..., s[begin + i * step] for all (begin + i * step) < end.
In the following example we define a string and extract every third character of it:
>>> txt = "Python under Linux is great"
>>> txt[::3]
'Ph d n  e'
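
The rule for the selected elements can be verified by building the slice by hand. This is a minimal sketch for a positive step; the names begin, end, step and by_hand are illustrative. The last line shows a negative step, which walks through the sequence backwards.

```python
txt = "Python under Linux is great"
begin, end, step = 0, len(txt), 3

# Collect txt[begin + i * step] for all begin + i * step < end.
by_hand = "".join(txt[i] for i in range(begin, end, step))
print(by_hand == txt[begin:end:step])  # True

# A negative step walks backwards; [::-1] reverses the sequence.
print("Python"[::-1])  # nohtyP
```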


Length of a sequence

The length of a sequence, i.e. a list, a string or a tuple, can be determined with the function len(). For strings it counts the number of characters, and for lists or tuples it counts the number of elements, where a sublist counts as one element.

>>> txt = "Hello World"
>>> len(txt)
11
>>> a = ["Swen", 45, 3.54, "Basel"]
>>> len(a)
4
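
That a sublist counts as a single element can be seen with the nested person list used earlier. The following sketch applies len() on each nesting level.

```python
person = [["Marc", "Mayer"], ["17, Oxford Str", "12345", "London"], "07876-7876"]

print(len(person))        # 3: two sublists and one string
print(len(person[1]))     # 3: the elements of the address sublist
print(len(person[0][0]))  # 4: the characters of "Marc"
```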

Concatenation of Sequences

Combining two sequences like strings or lists is as easy as adding two numbers. Even the operator sign is the same.
The following example shows how to concatenate two strings into one:
>>> firstname = "Homer"
>>> surname = "Simpson"
>>> name = firstname + " " + surname
>>> print(name)
Homer Simpson
It's just as simple for lists:
>>> colours1 = ["red", "green","blue"]
>>> colours2 = ["black", "white"]
>>> colours = colours1 + colours2
>>> print(colours)
['red', 'green', 'blue', 'black', 'white']
The augmented assignment "+=", which is well known from arithmetic, works for sequences as well.
s += t
gives s the same value as:
s = s + t
But the two statements are not implemented in the same way. In the first case the left side has to be evaluated only once. Furthermore, for mutable objects such as lists the augmented assignment can work in place as an optimization, i.e. it changes the existing object instead of creating a new one.
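
The difference becomes visible as soon as a second name refers to the same object. The following is a small sketch with illustrative variable names (alias, old_u, old_word).

```python
# "+=" on a list changes the existing object in place.
s = [1, 2]
alias = s             # second name for the same list object
s += [3]
print(alias)          # [1, 2, 3]
print(alias is s)     # True

# "u = u + t" builds a new list and rebinds the name u.
u = [1, 2]
old_u = u
u = u + [3]
print(old_u)          # [1, 2]
print(old_u is u)     # False

# Strings are immutable, so "+=" has to create a new string.
word = "ab"
old_word = word
word += "c"
print(old_word)       # ab
```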

Checking if an Element is Contained in a List

It's easy to check whether an item is contained in a sequence. We can use the "in" or the "not in" operator for this purpose.
The following example shows how these operators can be applied:

>>> abc = ["a","b","c","d","e"]
>>> "a" in abc
True
>>> "a" not in abc
False
>>> "e" not in abc
False
>>> "f" not in abc
True
>>> txt = "Python is easy!"
>>> "y" in txt
True
>>> "x" in txt
False
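
Strings and lists differ in one detail of the membership test. For a string, "in" looks for substrings of any length, while for a list it compares against whole elements only and does not search inside sublists. A short sketch with illustrative data:

```python
txt = "Python is easy!"

# For strings, "in" tests for substrings of any length.
print("easy" in txt)     # True
print("hard" in txt)     # False

# For lists, "in" compares against whole elements only.
words = ["Python", "is", "easy!"]
print("easy!" in words)  # True
print("easy" in words)   # False, no element equals "easy"

# Sublists are not searched.
nested = [["a", "b"], "c"]
print("a" in nested)         # False
print(["a", "b"] in nested)  # True
```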


So far we have used the "+" operator for sequences. There is a "*" operator available as well. Of course, there is no "multiplication" of two sequences. "*" is defined for a sequence and an integer, i.e. s * n or n * s.
It's a kind of abbreviation for an n-times concatenation, i.e.

s * 4
is the same as
s + s + s + s
Further examples:
>>> 3 * "xyz-"
'xyz-xyz-xyz-'
>>> "xyz-" * 3
'xyz-xyz-xyz-'
>>> 3 * ["a","b","c"]
['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c']
The augmented assignment for "*" can be used as well:
s *= n gives s the same value as s = s * n.
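
Two typical uses of the repetition operator are sketched below: a separator line and a list with a fixed number of identical start values. The names separator and counters are illustrative. The sketch also shows that a factor of zero or less yields an empty sequence.

```python
# A separator line of a given width.
separator = "-" * 20
print(separator)

# A list with a fixed number of identical start values.
counters = [0] * 5
counters[2] += 1
print(counters)  # [0, 0, 1, 0, 0]

# A factor of zero or less gives an empty sequence.
print("xyz-" * 0 == "")  # True
print([1, 2] * -1)       # []
```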

The Pitfalls of Repetitions

In our previous examples we applied the repetition operator on strings and flat lists. We can apply it to nested lists as well:
>>> x = ["a","b","c"]
>>> y = [x] * 4
>>> y
[['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
>>> y[0][0] = "p"
>>> y
[['p', 'b', 'c'], ['p', 'b', 'c'], ['p', 'b', 'c'], ['p', 'b', 'c']]
This result is quite astonishing for beginners of Python programming. We have assigned a new value to the first element of the first sublist of y, i.e. y[0][0], and we have "automatically" changed the first elements of all the sublists in y, i.e. y[1][0], y[2][0] and y[3][0].
The reason is that the repetition operator "* 4" does not copy the list x but creates four references to it. All four elements of y are the same list object, so a value assigned to y[0][0] is visible through every element of y, and through x as well.
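
One common way to avoid this pitfall is to create a separate copy of the inner list for every position, for example with a list comprehension and a slice. This is a sketch of one possible approach, not the only one; the names shared and independent are illustrative.

```python
x = ["a", "b", "c"]

# Four references to the same list object.
shared = [x] * 4
print(shared[0] is shared[1])  # True

# Four independent copies, built with a list comprehension.
independent = [x[:] for _ in range(4)]
independent[0][0] = "p"
print(independent)
# [['p', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
print(x)  # ['a', 'b', 'c']
```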