
Dominik Pataky e8073013c1 Rename classes in v1, v5 and v9 according to version Until now, every NetFlow version file used similar names for their classes, e.g. "Header". These are now prefixed with their respective version, e.g. "V1Header", to avoid confusion in imports etc.		2020-03-29 19:49:57 +02:00
netflow	Rename classes in v1, v5 and v9 according to version	2020-03-29 19:49:57 +02:00
.gitignore	Add JSON export and analyzing example script	2017-10-28 19:00:18 +02:00
analyzer.py	Extend analyzer with --no-dns and --match-host; fixes	2020-03-19 18:16:03 +01:00
LICENSE	Update to 2020 in file headers; update the analyzer file name in README	2020-01-20 16:59:36 +01:00
main.py	Update README/setup.py; add .json property to v9 header for export	2020-03-29 18:06:52 +02:00
README.md	Update README/setup.py; add .json property to v9 header for export	2020-03-29 18:06:52 +02:00
setup.py	Update README/setup.py; add .json property to v9 header for export	2020-03-29 18:06:52 +02:00
tests.py	Add ParsedPacket named tuple for queue; extend tests	2020-03-19 18:16:03 +01:00

README.md

Python NetFlow library

This package contains libraries for NetFlow versions 1, 5 and 9. Use the additional scripts in the repo to collect and parse incoming UDP NetFlow packets.
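Every NetFlow packet starts with a 16-bit version field, so a collector can look at the first two bytes and then pick the matching header layout. The sketch below shows this dispatch using the header layouts from the Cisco v1/v5 documentation and RFC 3954 (v9). `peek_header`, `PeekedHeader` and `HEADER_FORMATS` are hypothetical names for illustration, not the API of this package.

```python
import struct
from typing import NamedTuple

# Fixed-size packet headers in network byte order ("!" = big-endian, no padding).
# v1: version, count, sys_uptime, unix_secs, unix_nsecs                     -> 16 bytes
# v5: v1 fields + flow_sequence, engine_type, engine_id, sampling_interval  -> 24 bytes
# v9: version, count, sys_uptime, unix_secs, package_sequence, source_id    -> 20 bytes
HEADER_FORMATS = {
    1: "!HHIII",
    5: "!HHIIIIBBH",
    9: "!HHIIII",
}


class PeekedHeader(NamedTuple):
    version: int
    count: int
    fields: tuple  # all unpacked header fields, in wire order


def peek_header(data: bytes) -> PeekedHeader:
    """Read the version field first, then unpack the matching header layout."""
    if len(data) < 2:
        raise ValueError("packet too short to contain a version field")
    (version,) = struct.unpack_from("!H", data, 0)
    fmt = HEADER_FORMATS.get(version)
    if fmt is None:
        raise ValueError(f"unsupported NetFlow version {version}")
    if len(data) < struct.calcsize(fmt):
        raise ValueError("packet shorter than its header")
    fields = struct.unpack_from(fmt, data, 0)
    return PeekedHeader(version=fields[0], count=fields[1], fields=fields)
```

The remaining bytes after the header are the flow records (v1/v5) or FlowSets (v9), which each version then parses differently.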

Version 9 is the first NetFlow version to use templates. Templates make dynamically sized and configured NetFlow data flowsets possible, which makes the collector's job harder: a data flowset can only be decoded once the template it references has been received.
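To illustrate why templates complicate collection, here is a minimal sketch of template caching, following the FlowSet layout in RFC 3954: FlowSet ID 0 carries templates, and data FlowSets use the template ID (256 or higher) as their FlowSet ID. The function names are hypothetical, options templates (FlowSet ID 1) are skipped, and fields are decoded as plain unsigned integers; this is not how the `netflow` package is implemented internally.

```python
import struct
from typing import Dict, List, Tuple

TEMPLATE_FLOWSET_ID = 0     # RFC 3954: FlowSet ID 0 holds template records
MIN_DATA_FLOWSET_ID = 256   # data FlowSet IDs equal the template ID they use

Templates = Dict[int, List[Tuple[int, int]]]  # template_id -> [(type, length)]


def parse_template_flowset(body: bytes, templates: Templates) -> None:
    """Cache each template as a list of (field_type, field_length) pairs."""
    offset = 0
    while offset + 4 <= len(body):
        template_id, field_count = struct.unpack_from("!HH", body, offset)
        if template_id < MIN_DATA_FLOWSET_ID:
            break  # zero padding at the end of the FlowSet
        offset += 4
        fields = []
        for _ in range(field_count):
            fields.append(struct.unpack_from("!HH", body, offset))
            offset += 4
        templates[template_id] = fields


def parse_data_flowset(template_id: int, body: bytes,
                       templates: Templates) -> List[Dict[int, int]]:
    """Decode the records of a data FlowSet with a previously cached template."""
    if template_id not in templates:
        # The record layout is unknown until the exporter (re)sends the template.
        raise KeyError(f"template {template_id} not received yet")
    fields = templates[template_id]
    record_length = sum(length for _, length in fields)
    records = []
    offset = 0
    # Trailing bytes shorter than one full record are padding.
    while record_length and offset + record_length <= len(body):
        record = {}
        for field_type, length in fields:
            record[field_type] = int.from_bytes(body[offset:offset + length], "big")
            offset += length
        records.append(record)
    return records


def parse_flowsets(payload: bytes, templates: Templates) -> List[Dict[int, int]]:
    """Walk the FlowSets that follow the 20-byte v9 packet header."""
    records = []
    offset = 0
    while offset + 4 <= len(payload):
        flowset_id, length = struct.unpack_from("!HH", payload, offset)
        if length < 4:
            break  # malformed length; stop instead of looping forever
        body = payload[offset + 4:offset + length]
        if flowset_id == TEMPLATE_FLOWSET_ID:
            parse_template_flowset(body, templates)
        elif flowset_id >= MIN_DATA_FLOWSET_ID:
            records.extend(parse_data_flowset(flowset_id, body, templates))
        offset += length  # FlowSet ID 1 (options templates) is skipped here
    return records
```

Because `templates` lives outside a single packet, a collector has to keep it per exporter across packets; a data FlowSet that arrives before its template cannot be decoded and has to be dropped or buffered.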

Licensed under MIT License. See LICENSE.

Using the collector and analyzer

This repo also contains main.py and analyzer.py.

To start an example collector, run python3 main.py -p 9000 -D. This runs a collector on port 9000 in debug mode. Point your flow exporter to this port on your host; after some time the first ExportPackets should appear (flows are only exported once they expire).
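At its core, a NetFlow collector is a UDP socket that receives datagrams from the exporter. The following is a minimal sketch using only the standard library `socket` module; `open_collector`, `receive` and `RawPacket` are hypothetical names and do not reflect how main.py is structured.

```python
import socket
import time
from typing import List, NamedTuple, Tuple


class RawPacket(NamedTuple):
    ts: float                 # arrival time (Unix seconds)
    client: Tuple[str, int]   # exporter address and source port
    data: bytes               # undecoded NetFlow export packet


def open_collector(host: str = "0.0.0.0", port: int = 9000) -> socket.socket:
    """Bind a UDP socket that a flow exporter (e.g. softflowd) can send to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive(sock: socket.socket, max_packets: int,
            timeout: float = 5.0) -> List[RawPacket]:
    """Collect up to max_packets datagrams, stopping early if the socket stays idle."""
    sock.settimeout(timeout)
    packets = []
    while len(packets) < max_packets:
        try:
            data, client = sock.recvfrom(65535)  # max UDP payload size
        except socket.timeout:
            break  # nothing yet: exporters only send flows after they expire
        packets.append(RawPacket(time.time(), client, data))
    return packets
```

Each `data` payload would then be handed to the version-specific parser; decoding is kept out of the receive loop so that slow parsing does not delay reading the socket.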

After collecting some data, main.py exports it into GZIP files, simply named <timestamp>.gz.
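The README does not describe what is stored inside the .gz files. The sketch below assumes one JSON document per line (a common layout for JSON exports, not confirmed here) and uses only the standard `gzip` and `json` modules; the function names and the batch dictionary shape are hypothetical.

```python
import gzip
import json
import os
import time


def output_path(directory: str = ".") -> str:
    """Name the export file after the current Unix timestamp, e.g. 1585500000.gz."""
    return os.path.join(directory, f"{int(time.time())}.gz")


def export_batches(path: str, batches: list) -> None:
    """Append one JSON document per line; each append adds a new gzip member."""
    with gzip.open(path, "at", encoding="utf-8") as fh:
        for batch in batches:
            fh.write(json.dumps(batch) + "\n")


def load_batches(path: str) -> list:
    """Read every JSON line back; gzip.open transparently reads all members."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Line-delimited JSON keeps appends cheap: the collector never has to rewrite earlier data, and a reader can process the file line by line.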

To analyze the saved traffic, run analyzer.py -f <gzip file>. With my example data, the output looks like the following, showing resolved hostnames and services, transferred bytes and connection duration:

2017-10-28 23:17.01: SSH     | 4.25M    | 15:27 min | localmachine-2 (<IPv4>) to localmachine-1 (<IPv4>)
2017-10-28 23:17.01: SSH     | 4.29M    | 16:22 min | remotemachine (<IPv4>) to localmachine-2 (<IPv4>)
2017-10-28 23:19.01: HTTP    | 22.79M   | 47:32 min | uwstream3.somafm.com (173.239.76.148) to localmachine-1 (<IPv4>)
2017-10-28 23:22.01: HTTPS   | 1.21M    | 3 sec     | fra16s12-in-x0e.1e100.net (2a00:1450:4001:818::200e) to localmachine-1 (<IPv6>)
2017-10-28 23:23.01: SSH     | 93.79M   | 21 sec    | remotemachine (<IPv4>) to localmachine-2 (<IPv4>)
2017-10-28 23:51.01: SSH     | 14.08M   | 1:23.09 hours | remotemachine (<IPv4>) to localmachine-2 (<IPv4>)
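The byte and duration columns above are human-readable conversions of raw flow counters. Below is a sketch of such formatting, not the analyzer's code: it assumes binary units (1 K = 1024 bytes), and it prints hours as `H:MM:SS hours`, which differs slightly from the sample output. `human_bytes`, `human_duration` and `format_line` are hypothetical helpers.

```python
def human_bytes(size: int) -> str:
    """Scale a byte count to B/K/M/G/T with two decimals (binary units assumed)."""
    units = ["B", "K", "M", "G", "T"]
    value = float(size)
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.2f}{unit}"
        value /= 1024
    return f"{value:.2f}{units[-1]}"


def human_duration(seconds: float) -> str:
    """Render a flow duration as 'N sec', 'M:SS min' or 'H:MM:SS hours'."""
    seconds = int(seconds)
    if seconds < 60:
        return f"{seconds} sec"
    if seconds < 3600:
        minutes, secs = divmod(seconds, 60)
        return f"{minutes}:{secs:02d} min"
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d} hours"


def format_line(service: str, num_bytes: int, seconds: float, src: str, dst: str) -> str:
    """Build one column-aligned output line (timestamp prefix omitted)."""
    return (f"{service:<7} | {human_bytes(num_bytes):<8} | "
            f"{human_duration(seconds)} | {src} to {dst}")
```

For example, `format_line("SSH", 4456448, 927, "a", "b")` yields `SSH     | 4.25M    | 15:27 min | a to b`.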

Feel free to customize the analyzer script, e.g. make it print some nice graphs or calculate broader statistics.
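As a starting point for broader statistics, flows can be aggregated per key with `collections.Counter`. This sketch assumes each flow is a dictionary keyed by RFC 3954 field names such as `IPV4_SRC_ADDR`, `L4_DST_PORT` and `IN_BYTES`; the actual structure produced by this package may differ, and `totals_by` and `shares` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def totals_by(flows: Iterable[Dict], key: str, value: str = "IN_BYTES") -> Counter:
    """Sum one numeric field (bytes by default) per distinct value of `key`."""
    totals: Counter = Counter()
    for flow in flows:
        totals[flow[key]] += flow[value]
    return totals


def shares(totals: Counter) -> Dict[Hashable, float]:
    """Convert absolute totals into fractions of the overall sum."""
    overall = sum(totals.values())
    return {k: v / overall for k, v in totals.items()} if overall else {}
```

`totals_by(flows, "IPV4_SRC_ADDR").most_common(10)` then lists the top talkers, and `totals_by(flows, "L4_DST_PORT")` gives traffic per destination port, which could feed a plotting library for graphs.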

Please note that the analyzer is experimental and has some rough edges. Do not rely on it in monitoring use cases!

Resources

Development environment

I wrote this script specifically for NetFlow exports from softflowd v0.9.9, but it should work with any correct NetFlow v9 implementation.

Running and creating tests

The file tests.py contains some tests based on real softflowd export packets. To create test packets, try the following:

1. Run tcpdump/Wireshark on your interface.
2. Produce some sample flows, e.g. surf the web and refresh your mail client.
3. Save the pcap file to disk.
4. Run tcpdump/Wireshark again on an interface.
5. Run softflowd with the -r <pcap_file> flag. softflowd reads the captured traffic, produces the flows and exports them. Use the interface you are capturing packets on to send the exports.
6. Examine the captured traffic. In Wireshark, use "Decode As" to apply the CFLOW dissector to the export packets (e.g. based on the port). The data fields should then be shown correctly as NetFlow payload.
7. Extract this payload as a hex stream. Anonymize the IP addresses with a hex editor if necessary; bless is a recommended hex editor.
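Once the payload has been copied as a hex stream, it can be turned back into bytes with `bytes.fromhex` and used as a test fixture. The sketch below is hypothetical: `load_hex_stream`, `SAMPLE_V9_HEX` and `TestCapturedPacket` are not taken from tests.py, and the placeholder is a hand-built 20-byte v9 header without FlowSets, not a real softflowd export.

```python
import struct
import unittest


def load_hex_stream(stream: str) -> bytes:
    """Turn a copied hex stream (whitespace allowed) into raw bytes."""
    return bytes.fromhex("".join(stream.split()))


# Placeholder: version 9, count 0, sys_uptime, unix_secs, package_sequence,
# source_id. Replace with the (anonymized) payload extracted from Wireshark.
SAMPLE_V9_HEX = """
0009 0000 00000000 5e80d0a5 00000001 00000000
"""


class TestCapturedPacket(unittest.TestCase):
    def test_header(self):
        data = load_hex_stream(SAMPLE_V9_HEX)
        version, count = struct.unpack_from("!HH", data)
        self.assertEqual(version, 9)
        self.assertEqual(count, 0)
        self.assertEqual(len(data), 20)  # v9 header only, no FlowSets
```

A test case like this can be picked up by `python3 -m unittest`; with a real capture, the assertions would check decoded flow fields instead of just the header.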

The collector runs in a background thread. Differences in the transmission speed of the exporting client can lead to different results, possibly caused by race conditions while the GZIP output file is in use.
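The README does not show how the collector thread and the output file interact. One common way to avoid such races, sketched here under that assumption, is to let the background thread only enqueue packets while a single consumer owns the file, and to join the thread before reading the results. `collector` and `drain` are hypothetical names, and `source` stands in for a real UDP receive loop.

```python
import queue
import threading
from typing import Iterable, List


def collector(source: Iterable[bytes], out_queue: queue.Queue,
              stop: threading.Event) -> None:
    """Background thread: push packets into the queue, never touch the file."""
    for packet in source:
        if stop.is_set():
            break
        out_queue.put(packet)
    out_queue.put(None)  # sentinel: no more packets will follow


def drain(out_queue: queue.Queue) -> List[bytes]:
    """Single consumer: the only place that would write to the GZIP file."""
    items = []
    while True:
        item = out_queue.get()
        if item is None:
            break
        items.append(item)
    return items
```

Because `queue.Queue` is thread-safe and only one thread writes output, the result no longer depends on how fast the exporter sends; a test reads the file only after `drain` has returned and the thread has been joined.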