39. Net Income Method Example with Numpy, Matplotlib and Scipy
By Bernd Klein. Last modified: 26 Apr 2023.
The Net Income Method (in Germany known as Einnahmeüberschussrechnung, EÜR) is a simplified profit determination method. Under German law, self-employed persons such as doctors, lawyers, architects and others have the choice of submitting annual accounts or using the simple net income method. What we present in this chapter of our Pandas tutorial will not only be interesting for small companies, especially in Germany and EU countries, but also for other countries in the world as well.
We are mainly interested in introducing the different ways in which pandas and python can solve such problems. We show what it is all about to monitor the flow of money and in this way to better understand the financial situation. The algorithms can also be used for tax purposes. But be warned, this is a very general treatment of the matter and needs to be adjusted to the actual tax situation in your country. Although it is mainly based on German law, we cannot guarantee its accuracy! So, before you actually use it, you have to make sure that it is suitable for your situation and the tax laws of your country.
Very often a Microsoft Excel file is used for mainting a journal file of the financial transactions. In the data1
folder there is a file that contains an Excel file net_income_method_2020.xlsx
with the accounting data of a fictitious company. We could have used simple texts files like csv as well.
Journal File
This excel document contains two data sheets: One with the actual data "journal" and one with the name "account numbers", which contains the mapping from the account numbers to the description.
We will read this excel file into two DataFrame objects:
import pandas as pd
with pd.ExcelFile("/data1/net_income_method_2020.xlsx") as xl:
accounts2descr = xl.parse("account numbers",
index_col=0)
journal = xl.parse("journal",
index_col=0,
)
journal.index = pd.to_datetime(journal.index)
journal.index
OUTPUT:
DatetimeIndex(['2020-04-02', '2020-04-02', '2020-04-02', '2020-04-02', '2020-04-02', '2020-04-02', '2020-04-05', '2020-04-05', '2020-04-05', '2020-04-05', '2020-04-09', '2020-04-09', '2020-04-10', '2020-04-10', '2020-04-10', '2020-04-10', '2020-04-10', '2020-04-10', '2020-04-13', '2020-04-13', '2020-04-13', '2020-04-26', '2020-04-26', '2020-04-26', '2020-04-26', '2020-04-27', '2020-05-03', '2020-05-03', '2020-05-03', '2020-05-03', '2020-05-05', '2020-05-05', '2020-05-08', '2020-05-09', '2020-05-10', '2020-05-11', '2020-05-11', '2020-05-11', '2020-05-11', '2020-05-11', '2020-05-13', '2020-05-18', '2020-05-25', '2020-05-25', '2020-06-01', '2020-06-02', '2020-06-03', '2020-06-03', '2020-06-04', '2020-06-04', '2020-06-09', '2020-06-10', '2020-06-10', '2020-06-11', '2020-06-11', '2020-06-11', '2020-06-11', '2020-06-11', '2020-06-12', '2020-06-13', '2020-06-13', '2020-06-26', '2020-06-26', '2020-06-27', '2020-07-02', '2020-07-03', '2020-07-05', '2020-07-05', '2020-07-08', '2020-07-09', '2020-07-10', '2020-07-10', '2020-07-10', '2020-07-10', '2020-07-10', '2020-07-10', '2020-07-11', '2020-07-11', '2020-07-13', '2020-07-18', '2020-07-23', '2020-07-23', '2020-07-25', '2020-07-25', '2020-07-27', '2020-07-26', '2020-07-28'], dtype='datetime64[ns]', name='date', freq=None)
The first one is the tab "account numbers" which contains the mapping from the account numbers to the description of the accounts:
accounts2descr
description | |
---|---|
account | |
4400 | revenue plant Munich |
4401 | revenue plant Frankfurt |
4402 | revenue plant Berlin |
2010 | souvenirs |
2020 | clothes |
2030 | other articles |
2050 | books |
2100 | insurances |
2200 | wages |
2300 | loans |
2400 | hotels |
2500 | petrol |
2600 | telecommunication |
2610 | internet |
The second data sheet "journal" contains the actual journal entries:
journal[:10]
account number | document number | description | tax rate | gross amount | |
---|---|---|---|---|---|
date | |||||
2020-04-02 | 4402 | 8983233038 | Zurkan, Köln | 19 | 4105.98 |
2020-04-02 | 2010 | 57550799 | Birmann, Souvenirs | 19 | -1890.00 |
2020-04-02 | 2200 | 14989004 | wages | 0 | -17478.23 |
2020-04-02 | 2500 | 12766279 | Filling Station, Petrol | 19 | -89.40 |
2020-04-02 | 4400 | 3733462359 | EnergyCom, Hamburg | 19 | 4663.54 |
2020-04-02 | 4402 | 7526058231 | Enoigo, Strasbourg | 19 | 2412.82 |
2020-04-05 | 4402 | 1157284466 | Qbooks, Frankfurt | 7 | 2631.42 |
2020-04-05 | 4402 | 7009463592 | Qbooks, Köln | 7 | 3628.45 |
2020-04-05 | 2020 | 68433353 | Jamdon, Clothes | 19 | -1900.00 |
2020-04-05 | 2010 | 53353169 | Outleg, Souvenirs | 19 | -2200.00 |
There are many ways to analyze this data. We can for example sum up all the accounts:
account_sums = journal[["account number", "gross amount"]].groupby("account number").sum()
account_sums
gross amount | |
---|---|
account number | |
2010 | -4090.00 |
2020 | -10500.80 |
2030 | -1350.00 |
2050 | -900.00 |
2100 | -612.00 |
2200 | -69912.92 |
2300 | -18791.92 |
2400 | -1597.10 |
2500 | -89.40 |
2600 | -492.48 |
2610 | -561.00 |
4400 | 37771.84 |
4401 | 69610.35 |
4402 | 61593.99 |
Live Python training
Enjoying this page? We offer live Python training courses covering the content of this site.
Account Charts
What about showing a pie chart of these sums? We encounter one problem: Pie charts cannot contain negative values. However this is not a real problem. We can split the accounts into income and expense accounts. Of course, this corresponds more to what we really want to see.
Charts for the Income Accounts
We create a DataFrame with the income accounts:
income_accounts = account_sums[account_sums["gross amount"] > 0]
income_accounts
gross amount | |
---|---|
account number | |
4400 | 37771.84 |
4401 | 69610.35 |
4402 | 61593.99 |
We can now visualize these values in a pie chart.
plot = income_accounts.plot(y='gross amount', figsize=(5, 5), kind="pie")
You probably don't like the position of the legend? With the parameter bbox_to_anchor we can position it on a desired position. So, if we want to, we can even move it outside of the plot. With the relative coordinate values (0.5, 0.5)
we position the legend in the center of the plot. The legend is a box structure. So the question is what does the postion (0.5, 0.5) mean? We can define this by using the parameter loc
additionally:
loc value | meaning |
---|---|
upper left | bbox_to_anchor describes the position of the left upper corner of the legend box |
upper right | correspondingly the upper right corner |
lower left | the lower left corner |
lower left | the lower right corner |
We use this to position the legend with its left upper corner positioned in the middle of the plot:
plot = income_accounts.plot(y='gross amount',
figsize=(5, 5),
kind="pie")
plot.legend(bbox_to_anchor=(0.5, 0.5),
loc="upper left")
OUTPUT:
<matplotlib.legend.Legend at 0x7f172ca03a90>
Now we position the lower right corner of the legend into the center of the plot:
plot = income_accounts.plot(y='gross amount', figsize=(5, 5), kind="pie")
plot.legend(bbox_to_anchor=(0.5, 0.5),
loc="lower right")
OUTPUT:
<matplotlib.legend.Legend at 0x7f172c9c5590>
There is another thing we can improve. We see the labels 4400, 4401, and 4402 beside of each pie segment. In addition, we see them in the legend. This is ugly and redundant information. In the following we will turn the labels off, i.e. set them to an empty string, in the plot and we explicitly set them in the legend method:
plot = income_accounts.plot(y='gross amount',
figsize=(5, 5),
kind="pie",
labels=['', '', ''])
plot.legend(bbox_to_anchor=(0.5, 0.5),
labels=income_accounts.index)
OUTPUT:
<matplotlib.legend.Legend at 0x7f172c956550>
Now, we are close to perfection. Just one more tiny thing. Some might prefer to see the actual description text rather than an account number. We will cut out this information from the DataFrame accounts2descr
by using loc
and the list of desired numbers [4400, 4401, 4402]
. The result of this operation will be the argument of the set_index
method. (Atention: reindex
is not giving the wanted results!)
descriptions = accounts2descr["description"].loc[[4400, 4401, 4402]]
plot = income_accounts.plot(kind="pie",
y='gross amount',
figsize=(5, 5),
labels=['', '', ''])
plot.legend(bbox_to_anchor=(0.5, 0.5),
loc="lower left",
labels=descriptions)
OUTPUT:
<matplotlib.legend.Legend at 0x7f172c8ba8d0>
Would you prefer a bar chart? No problem, we just have to set the parameter kind
to bar
instead of pie
:
plot = income_accounts.plot(y='gross amount',
figsize=(5, 5),
kind="bar",
legend=False)
For bar charts we have to set kind
to barh
. So that it doesn't get too boring, we can also color the bars by passing a list of colors to the color parameter:
plot = income_accounts.plot(y='gross amount',
figsize=(5, 5),
kind="barh",
legend=False,
color=['green', 'orange', 'blue'])
Charts for the Expenses Accounts
we can do the same now with our debitors (expenses accounts):
expenses_accounts = account_sums[account_sums["gross amount"] < 0]
expenses_accounts
gross amount | |
---|---|
account number | |
2010 | -4090.00 |
2020 | -10500.80 |
2030 | -1350.00 |
2050 | -900.00 |
2100 | -612.00 |
2200 | -69912.92 |
2300 | -18791.92 |
2400 | -1597.10 |
2500 | -89.40 |
2600 | -492.48 |
2610 | -561.00 |
acc2descr_expenses = accounts2descr["description"].loc[expenses_accounts.index]
acc2descr_expenses
OUTPUT:
account number 2010 souvenirs 2020 clothes 2030 other articles 2050 books 2100 insurances 2200 wages 2300 loans 2400 hotels 2500 petrol 2600 telecommunication 2610 internet Name: description, dtype: object
expenses_accounts.set_index(acc2descr_expenses.values, inplace=True)
expenses_accounts *= -1
labels = [''] * len(expenses_accounts)
plot = expenses_accounts.plot(kind="pie",
y='gross amount',
figsize=(5, 5),
labels=labels)
plot.legend(bbox_to_anchor=(0.5, 0.5),
labels=expenses_accounts.index)
OUTPUT:
<matplotlib.legend.Legend at 0x7f172868f0d0>
Tax Sums
We will sum up the amount according to their tax rate.
journal.drop(columns=["account number"])
document number | description | tax rate | gross amount | |
---|---|---|---|---|
date | ||||
2020-04-02 | 8983233038 | Zurkan, Köln | 19 | 4105.98 |
2020-04-02 | 57550799 | Birmann, Souvenirs | 19 | -1890.00 |
2020-04-02 | 14989004 | wages | 0 | -17478.23 |
2020-04-02 | 12766279 | Filling Station, Petrol | 19 | -89.40 |
2020-04-02 | 3733462359 | EnergyCom, Hamburg | 19 | 4663.54 |
... | ... | ... | ... | ... |
2020-07-25 | 5204418668 | BoKoData, Bodensee, Konstanz | 19 | 3678.38 |
2020-07-25 | 85241331 | Hotel, Konstanz | 7 | -583.00 |
2020-07-27 | 26865618 | Hotel, Franfurt | 7 | -450.00 |
2020-07-26 | 5892708524 | Oscar Zopvar KG, Luxemburg | 19 | 2589.80 |
2020-07-28 | 1633775505 | Oscar Zopvar KG, Luxemburg | 19 | 3578.46 |
87 rows × 4 columns
In the following we will define a function tax_sums
that calculates the VAT sums according to tax rates from a journal DataFrame:
def tax_sums(journal_df, months=None):
""" Returns a DataFrame with sales and tax rates -
If a number or list is passed to 'months', only the sales
of the corresponding months will be used.
Example: tax_sums(df, months=[3, 6]) will only use the months
3 (March) and 6 (June)"""
if months:
if isinstance(months, int):
month_cond = journal_df.index.month == months
elif isinstance(months, (list, tuple)):
month_cond = journal_df.index.month.isin(months)
positive = journal_df["gross amount"] > 0
# sales_taxes eq. umsatzsteuer
sales_taxes = journal_df[positive & month_cond]
negative = journal_df["gross amount"] < 0
# input_taxes equivalent to German Vorsteuer
input_taxes = journal_df[negative & month_cond]
else:
sales_taxes = journal_df[journal_df["gross amount"] > 0]
input_taxes = journal_df[journal_df["gross amount"] < 0]
sales_taxes = sales_taxes[["tax rate", "gross amount"]].groupby("tax rate").sum()
sales_taxes.rename(columns={"gross amount": "Sales Gross"},
inplace=True)
sales_taxes.index.name = 'Tax Rate'
input_taxes = input_taxes[["tax rate", "gross amount"]].groupby("tax rate").sum()
input_taxes.rename(columns={"gross amount": "Expenses Gross"},
inplace=True)
input_taxes.index.name = 'Tax Rate'
taxes = pd.concat([input_taxes, sales_taxes], axis=1)
taxes.insert(1,
column="Input Taxes",
value=(taxes["Sales Gross"] * taxes.index / 100).round(2))
taxes.insert(3,
column="Sales Taxes",
value=(taxes["Expenses Gross"] * taxes.index / 100).round(2))
return taxes.fillna(0)
tax_sums(journal)
Expenses Gross | Input Taxes | Sales Gross | Sales Taxes | |
---|---|---|---|---|
Tax Rate | ||||
0 | -90102.20 | 0.00 | 8334.43 | -0.00 |
7 | -3847.10 | 786.85 | 11240.71 | -269.30 |
19 | -14948.32 | 28386.20 | 149401.04 | -2840.18 |
stsum_5 = tax_sums(journal, months=5)
stsum_6 = tax_sums(journal, months=6)
stsum_5
Expenses Gross | Input Taxes | Sales Gross | Sales Taxes | |
---|---|---|---|---|
Tax Rate | ||||
0 | -22411.53 | 0.00 | 0.00 | -0.00 |
7 | -900.00 | 0.00 | 0.00 | -63.00 |
19 | -145.00 | 5952.51 | 31328.98 | -27.55 |
tax_sums(journal, months=[5, 6])
Expenses Gross | Input Taxes | Sales Gross | Sales Taxes | |
---|---|---|---|---|
Tax Rate | ||||
0 | -44812.14 | 0.00 | 6479.47 | -0.00 |
7 | -900.00 | 348.66 | 4980.84 | -63.00 |
19 | -268.12 | 12520.85 | 65899.23 | -50.94 |
Live Python training
Enjoying this page? We offer live Python training courses covering the content of this site.
Upcoming online Courses