# Python NetFlow library
This package contains libraries and tools for **NetFlow versions 1, 5 and 9**.

Version 9 is the first NetFlow version to use templates. Templates make dynamically sized and configured NetFlow data flowsets possible, which makes the collector's job harder: it has to receive and store a template before it can decode the matching data records. Importing `netflow.v1`, `netflow.v5` or `netflow.v9` gives you direct access to the respective parsing objects, but when starting out you will probably have more success running the reference collector.
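
To illustrate why templates add work for a collector, here is a minimal sketch using only the standard library. It is not this library's implementation: `decode_record`, `TEMPLATES` and the sample template ID 1024 are made up for illustration. The field type numbers 1 (`IN_BYTES`), 8 (`IPV4_SRC_ADDR`) and 12 (`IPV4_DST_ADDR`) are taken from RFC 3954.

```python
import ipaddress

# Hypothetical cache as a collector might keep it after receiving a template
# flowset: template ID -> ordered list of (field type, length in bytes).
TEMPLATES = {
    1024: [(8, 4), (12, 4), (1, 4)],  # IPV4_SRC_ADDR, IPV4_DST_ADDR, IN_BYTES
}

# Only the three field types used in this sketch.
FIELD_NAMES = {1: "IN_BYTES", 8: "IPV4_SRC_ADDR", 12: "IPV4_DST_ADDR"}


def decode_record(template_id, payload, templates=TEMPLATES):
    """Decode one data record using a previously stored template."""
    if template_id not in templates:
        # Without the template the byte layout of the record is unknown.
        raise KeyError(f"template {template_id} not received yet")
    record = {}
    offset = 0
    for field_type, length in templates[template_id]:
        raw = payload[offset:offset + length]
        if len(raw) < length:
            raise ValueError("payload shorter than the template describes")
        value = int.from_bytes(raw, "big")  # NetFlow uses network byte order
        if field_type in (8, 12):
            value = str(ipaddress.IPv4Address(value))
        record[FIELD_NAMES.get(field_type, field_type)] = value
        offset += length
    return record


payload = bytes.fromhex("c0a80001" "0a000002" "000004d2")
print(decode_record(1024, payload))
# {'IPV4_SRC_ADDR': '192.168.0.1', 'IPV4_DST_ADDR': '10.0.0.2', 'IN_BYTES': 1234}
```

A data record that arrives before its template cannot be decoded, which is why the sketch raises `KeyError` in that case.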

Copyright 2016-2020 Dominik Pataky <dev@bitkeks.eu>

Licensed under MIT License. See LICENSE.


## Using the collector and analyzer
Since v0.9.0 the `netflow` library also includes reference implementations of a collector and an analyzer as CLI tools.
Run them with `python3 -m netflow.collector` and `python3 -m netflow.analyzer`. Use the `-h` flag to print the respective help output with all available CLI flags.

Example: to start the collector, run `python3 -m netflow.collector -p 9000 -D`. This starts a collector instance on port 9000 in debug mode. Point your flow exporter to this port on your host and after some time the first ExportPackets should appear (the flows need to expire first). Once some data has been collected, the collector exports it into GZIP files, simply named `<timestamp>.gz` (or the filename you specified with `--file`/`-o`).
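
If you want to peek into an export file without the analyzer, a sketch like the following may help. It rests on an assumption that is not stated in this README: that the GZIP file contains one JSON document per line. Check `netflow/collector.py` for the actual format of your version. The helper name `read_export` and the demo entry are made up.

```python
import gzip
import json
import os
import tempfile


def read_export(path):
    """Yield one decoded JSON object per non-empty line of a gzip file.

    Assumes a line-delimited JSON layout, which may not match every
    version of the collector.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo with a throwaway file. The entry below is made up and does not
# reflect the collector's real output structure.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as fh:
    fh.write(json.dumps({"example": "entry"}) + "\n")

for entry in read_export(demo_path):
    print(entry)
```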

To analyze the saved traffic, run `python3 -m netflow.analyzer -f <gzip file>`. The output will look similar to the following snippet, with resolved hostnames and services, transferred bytes and connection duration:

    2017-10-28 23:17.01: SSH     | 4.25M    | 15:27 min | localmachine-2 (<IPv4>) to localmachine-1 (<IPv4>)
    2017-10-28 23:17.01: SSH     | 4.29M    | 16:22 min | remotemachine (<IPv4>) to localmachine-2 (<IPv4>)
    2017-10-28 23:19.01: HTTP    | 22.79M   | 47:32 min | uwstream3.somafm.com (173.239.76.148) to localmachine-1 (<IPv4>)
    2017-10-28 23:22.01: HTTPS   | 1.21M    | 3 sec     | fra16s12-in-x0e.1e100.net (2a00:1450:4001:818::200e) to localmachine-1 (<IPv6>)
    2017-10-28 23:23.01: SSH     | 93.79M   | 21 sec    | remotemachine (<IPv4>) to localmachine-2 (<IPv4>)
    2017-10-28 23:51.01: SSH     | 14.08M   | 1:23.09 hours | remotemachine (<IPv4>) to localmachine-2 (<IPv4>)

**Please note that the collector and analyzer are experimental reference implementations. Do not rely on them in production monitoring use cases!** In any case I recommend looking into the `netflow/collector.py` and `netflow/analyzer.py` scripts for customization. Feel free to use the code and extend it in your own tool set - that's what the MIT license is for!


## Resources
* [Cisco NetFlow v9 paper](http://www.cisco.com/en/US/technologies/tk648/tk362/technologies_white_paper09186a00800a3db9.html)
* [RFC "Cisco Systems NetFlow Services Export Version 9"](https://tools.ietf.org/html/rfc3954)

## Development environment
The library was written specifically against NetFlow exports from [softflowd](https://github.com/djmdjm/softflowd) v0.9.9, but it should work with every correct NetFlow v9 implementation. If you stumble upon new custom template fields, please let me know; they will make a fine addition to the `netflow.v9.V9_FIELD_TYPES` collection.

### Running and creating tests
The test file contains some tests based on real softflowd export packets. During the development of this library, two ways of gathering these hex dumps were used. First, the tcpdump/Wireshark export way:

  1. Run tcpdump/Wireshark on your public-facing interface (with tcpdump, save the pcap to disk).
  2. Produce some sample flows, e.g. surf the web and refresh your mail client. With Wireshark, save the captured packets to disk.
  3. Run tcpdump/Wireshark again on a local interface.
  4. Run softflowd with the `-r <pcap_file>` flag. softflowd reads the captured traffic, produces the flows and exports them. Send the exports to the interface you are capturing packets on. E.g. capture on the localhost interface (with `-i lo` or on loopback) and then let softflowd export to `127.0.0.1:1337`.
  5. Examine the captured traffic. Use Wireshark and set the `CFLOW` "decode as" dissector on the export packets (e.g. based on the port). The `data` fields should then be shown correctly as NetFlow payload.
  6. Extract this payload as hex stream. Anonymize the IP addresses with a hex editor if necessary. A recommended hex editor is [bless](https://github.com/afrantzis/bless).
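
As an alternative to a hex editor, the anonymization in the last step can be sketched in a few lines of Python. This is only an illustration: `anonymize_hex` is a made-up helper, it handles IPv4 only, and it replaces every matching 4-byte sequence regardless of field boundaries, so a counter or timestamp that happens to contain the same bytes would be changed as well. Review the result before using it in a test.

```python
import ipaddress


def anonymize_hex(hex_stream, mapping):
    """Replace packed IPv4 addresses in a hex stream.

    mapping: dict of real address -> replacement address, both given as
    dotted-quad strings. Replacements are applied one after another, so a
    replacement address should not also appear as a key.
    """
    data = bytes.fromhex(hex_stream)
    for real, fake in mapping.items():
        real_bytes = ipaddress.IPv4Address(real).packed
        fake_bytes = ipaddress.IPv4Address(fake).packed
        # Same length on both sides, so offsets and lengths stay valid.
        data = data.replace(real_bytes, fake_bytes)
    return data.hex()


# Made-up fragment: 0xc0a80001 is 192.168.0.1 in network byte order.
print(anonymize_hex("0009c0a80001ffff", {"192.168.0.1": "10.0.0.1"}))
# 00090a000001ffff
```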

Second, a Docker way:

  1. Run a Docker container, e.g. Alpine Linux, and install `softflowd` in it.
  2. Run a softflowd daemon in the background inside the container, listening on `eth0` and exporting to e.g. `172.17.0.1:1337`.
  3. On your host start Wireshark to listen on the Docker bridge.
  4. Create some traffic from inside the container.
  5. Check the softflowd daemon with `softflowctl dump-flows`.
  6. If some flows are shown, export them with `softflowctl expire-all`.
  7. Wireshark should have picked up the export packets (it does not matter if there's a port unreachable error).
  8. Set the decoder for the packets to `CFLOW` and copy the hex value from the NetFlow packet.

Your exported hex string should begin with `0001`, `0005` or `0009`, depending on the NetFlow version.
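
A quick way to check this before pasting a hex string into a test is to read its first two bytes as an unsigned 16-bit integer in network byte order. The sketch below uses only the standard library; `netflow_version` is a made-up helper and not part of this package.

```python
import struct

SUPPORTED_VERSIONS = (1, 5, 9)


def netflow_version(hex_stream):
    """Return the version number stored in the first two bytes of a hex dump."""
    data = bytes.fromhex(hex_stream)
    if len(data) < 2:
        raise ValueError("need at least two bytes to read the version")
    (version,) = struct.unpack("!H", data[:2])  # "!H": big-endian unsigned short
    return version


# Made-up fragment: only the leading version field matters here.
version = netflow_version("0009000a")
print(version, version in SUPPORTED_VERSIONS)
# 9 True
```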

In the tests, the collector runs in a background thread. Differences in transmission speed from the exporting client can lead to differing results, possibly caused by race conditions while the GZIP output file is in use.