python-course.eu

38. Expenses and income example with Pandas and Python

By Bernd Klein. Last modified: 26 Apr 2023.

In this chapter of our Pandas tutorial we will deal with simple Expense and Income tables for private usage. Some say that if you want to manage your money successfully you will have to track your income and expenses. Monitoring the flow of money is important to understand your financial situation. You need to know exactly how much is coming in and going out. This article is not meant to convince you of the necessity of doing this. The main focus is rather on the possibilities offered by Python and Pandas to program the necessary tools.

money bags

We will start with a small example, suitable for private purposes and the following chapter of our tutorial continues with a more extensive example suitable for small businesses.

Private Budgeting with Python and Pandas

Let us assume that you have already a csv file containing your expenses and income over a certain period of time. This journal may look like this:

Date;Description;Category;Out;In
2020-06-02;Salary Frank;Income;0;4896.44
2020-06-03;supermarket;food and beverages;132.40;0
2020-06-04;Salary Laura;Income;0;4910.14
2929-06-04;GreenEnergy Corp., (electricity);utility;87.34;0
2020-06-09;water and sewage;utility;60.56;0
2020-06-10;Fitness studio, Jane;health and sports;19.00;0
2020-06-11;payment to bank;monthly redemption payment;1287.43;0
2020-06-12;LeGourmet Restaurant;restaurants and hotels;145.00;0
2020-06-13;supermarket;food and beverages;197.42;0
2020-06-13;Pizzeria da Pulcinella;restaurants and hotels;60.00;0
2020-06-26;supermarket;food and beverages;155.42;0
2020-06-27;theatre tickets;education and culture;125;0
2020-07-02;Salary Frank;Income;0;4896.44
2020-07-03;supermarket;food and beverages;147.90;0
2020-07-05;Salary Laura;Income;0;4910.14
2020-07-08;Golf Club, yearly payment;health and sports;612.18;0
2020-07-09;house insurance;insurances and taxes;167.89;0
2020-07-10;Fitness studio, Jane;health and sports;19.00;0
2020-07-10;supermarket;food and beverages;144.12;0
2020-07-11;payment to bank;monthly redemption payment;1287.43;0
2020-07-18;supermarket;food and beverages;211.24;0
2020-07-13;Pizzeria da Pulcinella;restaurants and hotels;33.00;0
2020-07-23;Cinema;education and culture;19;0
2020-07-25;supermarket;food and beverages;186.11;0

The above mentioned csv file is saved under the name expenses_and_income.csv in the folder data. It is easy to read it in with Pandas as we can see in our chapter Pandas Data Files:

import pandas as pd

exp_inc = pd.read_csv("/data1/expenses_and_income.csv", sep=";")
exp_inc
  Date Description Category Out In
0 2020-06-02 Salary Frank Income 0.00 4896.44
1 2020-06-03 supermarket food and beverages 132.40 0.00
2 2020-06-04 Salary Laura Income 0.00 4910.14
3 2929-06-04 GreenEnergy Corp., (electricity) utility 87.34 0.00
4 2020-06-09 water and sewage utility 60.56 0.00
5 2020-06-10 Fitness studio, Jane health and sports 19.00 0.00
6 2020-06-11 payment to bank monthly redemption payment 1287.43 0.00
7 2020-06-12 LeGourmet Restaurant restaurants and hotels 145.00 0.00
8 2020-06-13 supermarket food and beverages 197.42 0.00
9 2020-06-13 Pizzeria da Pulcinella restaurants and hotels 60.00 0.00
10 2020-06-26 supermarket food and beverages 155.42 0.00
11 2020-06-27 theatre tickets education and culture 125.00 0.00
12 2020-07-02 Salary Frank Income 0.00 4896.44
13 2020-07-03 supermarket food and beverages 147.90 0.00
14 2020-07-05 Salary Laura Income 0.00 4910.14
15 2020-07-08 Golf Club, yearly payment health and sports 612.18 0.00
16 2020-07-09 house insurance insurances and taxes 167.89 0.00
17 2020-07-10 Fitness studio, Jane health and sports 19.00 0.00
18 2020-07-10 supermarket food and beverages 144.12 0.00
19 2020-07-11 payment to bank monthly redemption payment 1287.43 0.00
20 2020-07-18 supermarket food and beverages 211.24 0.00
21 2020-07-13 Pizzeria da Pulcinella restaurants and hotels 33.00 0.00
22 2020-07-23 Cinema education and culture 19.00 0.00
23 2020-07-25 supermarket food and beverages 186.11 0.00

By reading the CSV file, we created a DataFrame object. What can we do with it, or in other words: what information interests Frank and Laura? Of course they are interested in the account balance. They want to know what the total income was and they want to see the total of all expenses.

The balances of their expenses and incomes can be easily calculated by applying the sum on the DataFrame exp_inc[['Out', 'In']]:

exp_inc[['Out', 'In']].sum()

OUTPUT:

Out     5097.44
In     19613.16
dtype: float64

What other information do they want to gain from the data? They might be interested in seeing the expenses summed up according to the different categories. This can be done using groupby and sum:

category_sums = exp_inc.groupby("Category").sum()
category_sums
  Out In
Category    
Income 0.00 19613.16
education and culture 144.00 0.00
food and beverages 1174.61 0.00
health and sports 650.18 0.00
insurances and taxes 167.89 0.00
monthly redemption payment 2574.86 0.00
restaurants and hotels 238.00 0.00
utility 147.90 0.00
category_sums.index

OUTPUT:

Index(['Income', 'education and culture', 'food and beverages',
       'health and sports', 'insurances and taxes',
       'monthly redemption payment', 'restaurants and hotels', 'utility'],
      dtype='object', name='Category')
import matplotlib.pyplot as plt

ax = category_sums.plot.bar(y="Out")
plt.xticks(rotation=45)

OUTPUT:  

data image

(array([0, 1, 2, 3, 4, 5, 6, 7]), <a list of 8 Text xticklabel objects>)

We can also display this as a pie chart:

ax = category_sums.plot.pie(y="Out")
ax.legend(loc="upper left", bbox_to_anchor=(1.5, 1))

OUTPUT:  

data image

<matplotlib.legend.Legend at 0x7fe20d953fd0>

Alternatively, we can create the same pie plot with the following code:

ax = category_sums["Out"].plot.pie()
ax.legend(loc="upper left", bbox_to_anchor=(1.5, 1))

OUTPUT:

data image

<matplotlib.legend.Legend at 0x7fe20d90c810>

If you imagine that you will have to type in all the time category names like "household goods and service" or "rent and mortgage interest", you will agree that it is very likely to have typos in your journal of expenses and income.

So it will be a good idea to use numbers (account numbers) for your categories. The following categories are available in our example.

The following categories are provided:

Category Account Number
rent and mortgage interest 200
insurances and taxes 201
food and beverages 202
education and culture 203
transport 204
health and sports 205
household goods and services 206
clothing 207
communications 208
restaurants and hotels 209
utility ( heating, electricity, water, and garbage) 210
other expenses 211
income 400

We can implement this as a dictionary mapping categories into account numbers:

category2account = {'monthly redemption payment': '200',
                    'insurances and taxes': '201',
                    'food and beverages': '202',
                    'education and culture': '203',
                    'transport': '204',
                    'health and sports': '205',
                    'household goods and services': '206',
                    'clothing': '207',
                    'communications': '208',
                    'restaurants and hotels': '209',
                    'utility': '210',
                    'other expenses': '211',
                    'Income': '400'}

The next step is to replace our "clumsy" category names with the account numbers. The replace method of DataFrame is ideal for this purpose. We can replace all the occurrences of the category names in our DataFrame by the corresponding account names:

exp_inc.replace(category2account, inplace=True)
exp_inc.rename(columns={"Category": "Accounts"}, inplace=True)
exp_inc[:5]
  Date Description Accounts Out In
0 2020-06-02 Salary Frank 400 0.00 4896.44
1 2020-06-03 supermarket 202 132.40 0.00
2 2020-06-04 Salary Laura 400 0.00 4910.14
3 2929-06-04 GreenEnergy Corp., (electricity) 210 87.34 0.00
4 2020-06-09 water and sewage 210 60.56 0.00

We will save this DataFrame object now in an excel file. This excel file will have two sheets: One with the "expenses and income" journal and the other one with the mapping of account numbers to category names. We will turn the category2account dictionary into a Series object for this purpose. The account numbers serve as the index:

account_numbers = pd.Series(list(category2account.keys()), index=category2account.values())
account_numbers.name = "Description"
account_numbers.rename("Accounts")

OUTPUT:

200      monthly redemption payment
201            insurances and taxes
202              food and beverages
203           education and culture
204                       transport
205               health and sports
206    household goods and services
207                        clothing
208                  communications
209          restaurants and hotels
210                         utility
211                  other expenses
400                          Income
Name: Accounts, dtype: object
exp_inc.insert(1, "accounts", account_numbers)
with pd.ExcelWriter('/data1/expenses_and_income_2020.xlsx') as writer:
    account_numbers.to_excel(writer, "account numbers")
    exp_inc.to_excel(writer, "journal")
    writer.save()

Live Python training

instructor-led training course

Enjoying this page? We offer live Python training courses covering the content of this site.

See: Live Python courses overview

Upcoming online Courses

Enrol here