Commit graph

121 commits

Author SHA1 Message Date
Dominik Pataky 405f9c6a67 IPFIX: replace IPFIX_FIELD_TYPES with class; handle signed
In IPFIX, template fields can be signed or unsigned, or even pure
bytes or a Unicode string. This differentiation was extended in this
commit.

Additionally, the IPFIX_FIELD_TYPES dict mapping from int->str was
replaced by a more verbose version, which also includes the standardized
IANA data types. The class's methods provide access to the fixed data
set. This is then used in the IPFIXDataRecord parser.

Refs #25
2020-04-04 15:21:53 +02:00
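
A minimal sketch of what such a field-type mapping class can look like; the field IDs below follow the IANA registry, but the class and method names are hypothetical, not the library's actual API:

```python
class FieldType:
    # small excerpt of the IANA "ipfix-information-elements" registry
    registry = {
        1: ("octetDeltaCount", "unsigned64"),
        8: ("sourceIPv4Address", "ipv4Address"),
        27: ("sourceIPv6Address", "ipv6Address"),
    }

    @classmethod
    def by_id(cls, field_id):
        """Return (name, data type) for a template field ID, or None."""
        return cls.registry.get(field_id)

FieldType.by_id(27)  # -> ("sourceIPv6Address", "ipv6Address")
```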
Dominik Pataky f7a44852c3 Tests: add memory performance test for v1 and v5; bump version to 0.10.1 2020-04-04 10:58:06 +02:00
Dominik Pataky 959f8d3c2c Tests: add parameter store_packets to send_recv_packets
The function send_recv_packets in tests stored all processed
ExportPackets by default in a list. Memory usage tests were therefore
based on this high amount of stored objects, since no instance of any
ExportPacket was deleted until exit.
With the new parameter store_packets the caller can define how many
packets should be stored during receiving, so as to test multiple
scenarios.

Three such scenarios are implemented: store no packets, store a
maximum of 500 at a time, and store all packets. This comes much closer
to the real-world scenario of the collector, which uses a "for export in
listener.get" loop, dumping any new ExportPacket to file immediately
and then deleting the object.

Still, the case where all packets are stored must be covered as well,
because the collector might not be the only implementation that uses
listener.get, so memory leaks in that scenario should also be found.
2020-04-03 17:28:16 +02:00
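
A rough sketch of the bounded-storage idea behind store_packets (illustrative only; the real test helper parses actual export packets):

```python
def send_recv_packets(packets, store_packets=-1):
    """Sketch: process packets, keeping at most `store_packets` of them
    in memory (-1 keeps everything, 0 keeps nothing)."""
    stored, processed = [], 0
    for packet in packets:
        processed += 1                      # stand-in for the real parsing step
        if store_packets != 0:
            stored.append(packet)
            if 0 < store_packets < len(stored):
                stored.pop(0)               # drop the oldest to respect the bound
    return processed, stored

send_recv_packets(range(1000), store_packets=500)  # keeps only 500 objects
```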
Dominik Pataky 53f8ca764e Tests: add memory performance tests
A new test file is added which contains memory and CPU tests. For now,
only the memory usage tests work (threading!). They print out tables of
memory usage grouped by file path and by function. Additionally, they check
some basic measurements: whether all packets were processed, and whether
collecting version 9/10 packets called any functions of the 10/9
implementation.

Refs #24
2020-04-03 15:36:09 +02:00
Dominik Pataky 258b7c1e0b Tests: move packets into lib again, add packet generator
The static packets in the tests are back in lib.py to avoid circular
imports. A new packet generator function was added.
2020-04-03 15:20:41 +02:00
Dominik Pataky 55272e8a0a Fix analyzer test; IPFIX: change handling of 16 bytes fields
The analyzer test was missing imports.

IPFIX templates with 16 byte fields were handled as a special case, since
struct does not natively support converting them to int. The new
implementation still treats them separately, but now uses struct's "s"
unpack format descriptor.
2020-04-03 10:29:38 +02:00
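
For illustration, unpacking a 16 byte field with the "s" descriptor and converting it to an integer could look like this (a sketch, not the exact parser code):

```python
import struct

data = bytes(range(16))                       # 16 raw bytes from a data record
(raw,) = struct.unpack("!16s", data)          # "16s" returns the bytes unchanged
value = int.from_bytes(raw, byteorder="big")  # convert to an integer if needed
```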
Dominik Pataky 27525887bd Update README to reflect IPFIX implementation; bump version to v0.10.0
Resolves #20
2020-04-01 14:40:21 +02:00
Dominik Pataky 547792c5c2 Tests: move packets into each version test file; add tests for IPFIX
The previously introduced tests/lib.py contained the NetFlow v9 packets
and then the IPFIX packets; those were split and put into their
respective test files again. The lib now contains shared objects only.

Tests for IPFIX were added. Two new packets were added, one with
templates and one without (again, real exports from softflowd).
Different cases are checked: no template, template present, and template
arriving later. Fields of flows are also checked, especially IPv6 addresses.

Note: exports made with softflowd were created by softflowd 1.0.0,
compiled from https://github.com/irino/softflowd
2020-04-01 14:15:53 +02:00
Dominik Pataky dfe0ffdcc7 IPFIX: adapt templates attribute handling to IPFIX as well 2020-04-01 14:14:47 +02:00
Dominik Pataky 143986c38d Fix multi-exception catch in collector; make templates @property in v9
The collector should catch both v9 and IPFIX template errors; the syntax
error in the multi-exception catch was corrected. The v9
ExportPacket.templates attribute is now a @property and read-only.
2020-04-01 14:12:27 +02:00
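
Making templates read-only via @property might look roughly like this (a hypothetical sketch, not the exact class):

```python
class ExportPacket:
    def __init__(self, templates):
        self._templates = dict(templates)

    @property
    def templates(self):
        # no setter is defined, so `packet.templates = {}` raises AttributeError
        return self._templates
```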
Dominik Pataky 56d443aa2a Refactor tests, moved into tests/
The tests are now located in tests/. They are also split into multiple
files, beginning with test_netflow and test_analyzer. The tests for
IPFIX will be added to test_ipfix.
2020-04-01 11:55:45 +02:00
Dominik Pataky 4b8cbf92bc IPFIX: implement field types of 16 bytes in parser
Python's struct does not natively support unpacking 16 byte fields to
integers. But since IPFIX uses fields of 16 bytes in length for at least
IPv6 addresses, they must be processed in the IPFIX parser. This commit
adds support for 16 byte fields by handling them as special struct.unpack
cases.
2020-04-01 11:34:34 +02:00
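
One possible way to special-case a 16 byte field with struct (shown only as an assumption of how such handling can look) is to unpack two unsigned 64 bit halves and recombine them:

```python
import struct

data = bytes(range(16))                 # a 16 byte field, e.g. an IPv6 address
high, low = struct.unpack("!QQ", data)  # two 8 byte unsigned integers
value = (high << 64) | low              # recombined into one 128 bit integer
```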
Dominik Pataky d2e1bc8c83 IPFIX: reformat IANA field types dict (adding the data type) 2020-04-01 09:46:32 +02:00
Dominik Pataky c3da0b2096 Adapt utils, collector, analyzer to IPFIX
At different points in the tool set, NetFlow (v9) is set as the default
case. Now that IPFIX is on its way to being supported as well, all
occurrences where a differentiation must be made are adapted.
2020-03-31 22:47:23 +02:00
Dominik Pataky 937e640198 IPFIX: implement data records and template handling; add IANA types
Second half of the IPFIX implementation now adds the support for data
records. The templates are also extracted, allowing the collector to use
them across exports.

The field types were extracted from the IANA assignment list at
https://www.iana.org/assignments/ipfix/ipfix-information-elements.csv

Please note that the IPFIX implementation was made from scratch and
differs from the NetFlow v9 implementation, as there was little
copy/paste.
2020-03-31 22:45:58 +02:00
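
Conceptually, the collector keeps the extracted templates in a cache that survives across exports; a self-contained sketch of that idea (names and structures are illustrative):

```python
templates = {}  # template ID -> list of (field name, field length) pairs

def handle_export(template_sets, data_sets):
    """Sketch: merge templates from this export, then resolve data sets
    against everything seen so far, including earlier exports."""
    templates.update(template_sets)
    resolved = []
    for template_id, raw in data_sets:
        fields = templates.get(template_id)
        if fields is None:
            continue  # template unknown; a real collector may buffer or drop
        resolved.append((fields, raw))
    return resolved

# Export 1 carries only a template, export 2 only data referencing it.
handle_export({256: [("sourceIPv6Address", 16)]}, [])
handle_export({}, [(256, b"\x00" * 16)])
```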
Dominik Pataky 524e411850 Add first approach of IPFIX implementation
Adds a new module, IPFIX. The collector already recognizes version 10 in
the header, meaning IPFIX. The parser is able to dissect the export
packet and all sets with their headers.

Missing is the handling of the templates in the data sets - a feature
needed for the whole parsing process to complete.
2020-03-31 20:58:15 +02:00
Dominik Pataky 0358c3416c Fix logger in collector; fix header dates 2020-03-31 16:28:33 +02:00
Dominik Pataky cd07885d28 Improve handling of mixed template/data exports; add test
The collector is able to parse templates in an export and then use these
templates to parse data flows inside the same export packet. But the test
implementation was based on the assumption that the templates always
arrive first in the packet. Now, a mixed order is also processed
successfully. Test included.
2020-03-30 16:42:48 +02:00
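
One way to cope with a mixed order inside a single packet, sketched under the assumption of a simple two-pass approach (not necessarily the collector's exact strategy):

```python
def parse_export(sets):
    """Sketch: handle template sets before data sets, regardless of their
    order inside the export packet."""
    templates = {}
    for kind, payload in sets:              # first pass: collect templates
        if kind == "template":
            template_id, fields = payload
            templates[template_id] = fields
    flows = []
    for kind, payload in sets:              # second pass: resolve data sets
        if kind == "data":
            template_id, raw = payload
            flows.append((templates[template_id], raw))
    return flows

# Data set first, its template second; it still parses.
parse_export([("data", (256, b"\x01\x02")), ("template", (256, ["IN_BYTES"]))])
```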
Dominik Pataky d4d6d59713 Provide parse_packet as API; fix parse_packet input handling; README
To get closer to a stable package, netflow now offers the parse_packet
function in its top-level __init__ file. This function was also enhanced
to handle multiple input formats (str, bytes, hex bytes).

Updated README accordingly.
2020-03-30 13:04:25 +02:00
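
A sketch of how such input normalization can be done (assuming str means a hex string; parse_packet's actual handling may differ):

```python
def normalize_packet_input(data):
    """Sketch: accept a hex string, raw bytes, or hex-encoded bytes."""
    if isinstance(data, str):
        return bytes.fromhex(data)                      # hex string, e.g. "000a0040"
    if isinstance(data, (bytes, bytearray)):
        try:
            return bytes.fromhex(data.decode("ascii"))  # hex passed as bytes
        except (UnicodeDecodeError, ValueError):
            return bytes(data)                          # already raw packet bytes
    raise ValueError("unsupported input type")

normalize_packet_input("000a0040") == normalize_packet_input(b"\x00\x0a\x00\x40")
```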
Dominik Pataky 7ae179cb33 Reformat data flow attributes and unpacking; adapt tests
The V1DataFlow and V5DataFlow classes used a verbose way of unpacking
the hex byte stream to the specific fields until now. With this commit,
both use a list of field names, one struct.unpack call and then a
mapping for-loop for each field.

Additionally, the upper boundary of the passed data slice was added.

With the self.__dict__.update() call all fields are now also accessible
as direct attributes of the corresponding instance, e.g. flow.PROTO to
access flow.data["PROTO"]. This works for flows of all three versions.

The tests were adapted to reflect this new implementation.
2020-03-30 12:29:50 +02:00
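
The pattern described above as a minimal sketch (field names and the format string are an illustrative subset, not the full v1/v5 record layout):

```python
import struct

class DataFlow:
    FIELDS = ["SRC_ADDR", "DST_ADDR", "IN_PACKETS", "IN_OCTETS", "PROTO"]
    FORMAT = "!IIIIB"   # illustrative subset of a flow record layout

    def __init__(self, data):
        length = struct.calcsize(self.FORMAT)
        values = struct.unpack(self.FORMAT, data[:length])  # upper boundary applied
        self.data = dict(zip(self.FIELDS, values))
        self.__dict__.update(self.data)   # flow.PROTO == flow.data["PROTO"]

flow = DataFlow(struct.pack("!IIIIB", 0x0A000001, 0x0A000002, 10, 840, 6))
assert flow.PROTO == flow.data["PROTO"] == 6
```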
Dominik Pataky 8b70fb1058 Fix to_dict() in headers; formatting
The collector uses the .to_dict() function to persist the header in its
gzipped output file. Now all headers implement this function.
2020-03-29 23:17:05 +02:00
Dominik Pataky 4a90e0ce34 Update README, bump minor version to v0.9.0 2020-03-29 22:34:30 +02:00
Dominik Pataky 5765fa31cf Rename test file; fix analyzer test
Tests are now all running, not skipping the analyzer test.
Adapted to the new CLI calling method for the subprocess.
2020-03-29 22:33:26 +02:00
Dominik Pataky abce1f57dd Move collector and analyzer into the package, callable via CLI
Beginning with this commit, the reference implementations of the
collector and analyzer are now included in the package. They are
callable by running `python3 -m netflow.collector` or `.analyzer`, with
the same flags as before. Use `-h` to list them.

Additional fixes are contained in this commit as well, e.g. adding more
version prefixes and moving parts of code from __init__ to utils, to fix
circular imports.
2020-03-29 22:14:45 +02:00
Dominik Pataky 9d2bc21ae2 Extend and reformat tests, add tests for v1 and v5, bump version
The tests are now also parsing export packets for version 1 and 5.
Version 9 received an additional test, inspecting the data inside the
export.

All new packet hex dumps were created by using a Docker container with
Alpine Linux, running a softflowd daemon inside and then pinging the
Docker host IP. After reviewing the flows with "softflowctl dump-flows",
issuing "softflowctl expire-all" sends the packets to the collector
(which should be an IP address outside of the Docker bridge). The export
network packets are then collected with Wireshark running in the host
namespace, capturing on the Docker bridge.

Bump version to v0.8.3

Resolves #13
Resolves #14
Refs #18
2020-03-29 19:57:13 +02:00
Dominik Pataky e8073013c1 Rename classes in v1, v5 and v9 according to version
Until now, every NetFlow version file used similar names for their
classes, e.g. "Header". These are now prefixed with their respective
version, e.g. "V1Header", to avoid confusion in imports etc.
2020-03-29 19:49:57 +02:00
Dominik Pataky 5fd4e9bd24 Update README/setup.py; add .json property to v9 header for export
The README and setup.py were adapted to the current state, preparing for
PyPI upload and package info.

In v9, the header received an additional .json property, which exports
the header as a dict to allow JSON serialization in the export file.
This export is used in main.py
2020-03-29 18:06:52 +02:00
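
A sketch of such a property (the real header has more fields; names here are simplified):

```python
import json

class V9Header:
    def __init__(self, version, count):
        self.version = version
        self.count = count

    @property
    def json(self):
        # dict view of the header, ready for JSON serialization in the export file
        return {"version": self.version, "count": self.count}

json.dumps(V9Header(9, 2).json)  # -> '{"version": 9, "count": 2}'
```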
Dominik Pataky f8c5717002 Extend analyzer with --no-dns and --match-host; fixes
This commit extends the analyzer script with two new flags:
* Adding --no-dns disables hostname DNS resolution, improving speed
* Adding --match-host <IP address> filters all flows not matching the IP

Additional small things were changed; the script is still a work in
progress. In particular, the "pairing" of two flows will be removed in
future versions.
2020-03-19 18:16:03 +01:00
Dominik Pataky 4639601798 Extend logging in collector (with --verbose) 2020-03-19 18:16:03 +01:00
Dominik Pataky 0d8f1a2ecb Add ParsedPacket named tuple for queue; extend tests
Before, the output queue of the collector received unnamed tuples with
three fields. This broke the tests and was less understandable. The new
version uses a named tuple for clarity.

The tests were adapted to the new type in the queue and are fixed.

For backwards compatibility, a check of the Python version was added and
the subprocess stdout/stderr arguments are passed depending on this
version. See #18.
2020-03-19 18:16:03 +01:00
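
The named tuple approach, sketched with hypothetical field names (the library's actual tuple may differ):

```python
from collections import namedtuple

ParsedPacket = namedtuple("ParsedPacket", ["ts", "client", "export"])

item = ParsedPacket(ts=1584632163.0, client=("10.0.0.1", 2055), export=None)
item.client  # clearer than indexing an anonymous tuple with item[1]
```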
Dominik Pataky 290e822176 Add IPv6 local interface address handling
With the '--host' flag, a local interface IP address can be set on which
the collector listens for incoming flows. Until now, this only worked
with IPv4 addresses (using the default 0.0.0.0 interface). The commit
adds handling of passed-in IPv6 addresses by identifying ":" and then
switching to the AF_INET6 socket family.
2020-03-19 18:12:50 +01:00
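
A minimal sketch of that address-family switch (not the collector's exact code):

```python
import socket

def make_listening_socket(host, port):
    """Pick AF_INET6 when the --host value looks like an IPv6 address."""
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

make_listening_socket("::1", 2055)  # listens on the IPv6 loopback interface
```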
cookie 647f4b3748
Merge pull request #19 from grafolean/fix/tests
Fix failing tests (wrong index when accessing netflow records)
2020-03-19 15:38:27 +00:00
Anze 096c7d6f4f Fix failing tests (wrong index when accessing netflow records) 2020-02-22 23:20:36 +01:00
Dominik Pataky 565f829945 Add verbose flag to analyzer
Adds a new flag, '-v' or '--verbose', to the analyzer.py script. It uses
a new print method and also skips some parts of the script if not passed
on the CLI.
2020-01-20 17:01:50 +01:00
Dominik Pataky adb02eab24 Update to 2020 in file headers; update the analyzer file name in README
The analyzer is now found in analyzer.py and uses the '-f' flag for
gzipped input files. Bundled with the previous PR commit, this update
should make things clearer.
2020-01-20 16:59:36 +01:00
cookie 52d357b111
Merge pull request #12 from kaysiz/patch-1
Update README.md to match new file format (GZIP instead of JSON).
Thanks for the PR!
2020-01-20 15:13:14 +00:00
kudakwashe siziva 59652f7d2f
Update README.md
Changed file extension from json to gz
2020-01-17 10:43:21 +02:00
Dominik Pataky 61439ec6ef Improve analyzer (handling of pairs, dropping noise)
Previously, the analyzer assumed that two consecutive flows would be a
pair. This proved unreliable, therefore a new comparison algorithm is
used. It utilizes the IP addresses and the 'first_switched' parameter
to identify two flows of the same connection.

More improvements can be made, especially in filtering and in the
identification of the initiating peer.

Tests still fail, have to be adapted to the new dicts and gzip.
2019-11-03 15:58:40 +01:00
Dominik Pataky eff99fc6e3 Add client info to stored data
Until now, packets arriving at the collector's interface were stored by
timestamp, with the exported flows in the payload. This format is now
extended to also store the client's IP address and port, allowing
multiple clients to export flows to the same collector instance.
2019-11-03 13:57:06 +01:00
Dominik Pataky 1646a52f17 Store IP addresses (v4 + v6) as strings rather than ints
As mentioned by @pR0Ps in 6b9d20c8a6/analyze_json.py (L83),
IP addresses, especially IPv6 ones, are better stored as parsed
strings instead of their raw integer values. Implemented.
2019-11-03 13:35:32 +01:00
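
The conversion itself is straightforward with Python's ipaddress module (shown as the general technique, not necessarily the exact code used):

```python
import ipaddress

str(ipaddress.ip_address(3232235777))    # '192.168.1.1'
str(ipaddress.ip_address(2 ** 128 - 1))  # 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff'
```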
Dominik Pataky 6b9d20c8a6 Refactor storing data and writing to disk - using gzip and lines
In previous versions, collected flows (parsed data) were stored in
memory by the collector. In regular intervals, or at shutdown, this one
single dict was dumped as JSON onto disk.

With this commit, the behaviour is changed to line-based JSON dumps for
each flow, gzipped onto disk for storage efficiency. The analyze_json
script is updated as well to handle the new gzipped files in the new format.

See the comments in main.py for more details.

Fixes #10
2019-11-03 12:02:05 +01:00
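
A sketch of the line-based gzipped JSON approach (field names and file name are illustrative):

```python
import gzip
import json
import time

def append_export(filename, client, flows):
    """Append one export as a single JSON line to a gzipped file."""
    entry = {"ts": time.time(), "client": client, "flows": flows}
    with gzip.open(filename, "at") as fh:  # "at" appends in text mode
        fh.write(json.dumps(entry) + "\n")

append_export("flows.gz", "10.0.0.1", [{"PROTO": 6, "IN_BYTES": 840}])
```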
Dominik Pataky 3dee135a22 Merge branch 'props-master'
Merging pull request #9 by @pR0Ps
https://github.com/bitkeks/python-netflow-v9-softflowd/pull/9

Thanks for the contribution!

Resolves #9
2019-10-31 18:02:06 +01:00
Dominik Pataky 9f16d246a5 Add v1, v5 to README; change fallback; add timeout parameter
Updated the README to reference NetFlow v1 and v5 as well.

The fallback(key, dict) method used exception-based testing of the
key's existence. Switched to an 'if x in' check.

The NetFlowListener is based on threading.Thread, which supports the
'timeout' parameter in .join(). This parameter was added.
2019-10-31 17:55:48 +01:00
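
The switch from exception-based key testing to a membership check, illustrated with example keys (the real fallback helper may look different):

```python
record = {"IPV4_SRC_ADDR": "10.0.0.1"}

# old style: exception-based test of the key's existence
try:
    value = record["IPV6_SRC_ADDR"]
except KeyError:
    value = record["IPV4_SRC_ADDR"]

# new style: 'if x in' membership check
key = "IPV6_SRC_ADDR" if "IPV6_SRC_ADDR" in record else "IPV4_SRC_ADDR"
value = record[key]
```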
Dominik Pataky bfec3953e6 Bump version, fix small errors, decrease packet num in tests 2019-10-31 17:35:15 +01:00
Carey Metcalfe 345a5b08ff Fix setup.py file 2019-10-16 23:46:32 -04:00
Carey Metcalfe bf92f24669 Add test for invalid packets 2019-10-16 23:46:32 -04:00
Carey Metcalfe 96817f1f8d Add support for v1 and v5 NetFlow packets
Thanks to @alieissa for the initial v1 and v5 code
2019-10-16 23:46:32 -04:00
Carey Metcalfe 186b648c4d Fix tests
Uses the analyzer's new stdin-reading capabilities to test the analysis
without having to write temporary files. Also removes most of the delays
because the listener can keep up now.
2019-10-16 23:44:28 -04:00
Carey Metcalfe 8e6d0c54e8 Allow analyze_json.py to accept input via stdin
This will make testing much cleaner in the future (no temp files needed)

Also increase performance by memoizing the hostname lookup
2019-10-16 23:44:19 -04:00
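
Memoizing the hostname lookup is commonly done with functools.lru_cache; a sketch of the idea (the analyzer's actual helper may differ):

```python
import functools
import socket

@functools.lru_cache(maxsize=None)
def resolve_hostname(ip):
    """Cache reverse lookups so repeated IPs cost only one DNS query."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return ip
```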
Carey Metcalfe 11dc92269c Refactor code to make programmatic access to flows easier
This commit splits the packet collecting and processing out into a
thread that provides a queue-like `get(block=True, timeout=None)`
function for getting processed `ExportPackets`.

This makes it much easier to use than starting a generator and
sending a value to it when you want to stop. The `get_export_packets`
generator is an example of using it - it just starts the thread and
yields values from it.
2019-10-16 23:33:22 -04:00
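
A condensed sketch of that structure (the real listener receives UDP packets and parses them; this stand-in only shows the thread, queue, and generator shape):

```python
import queue
import threading

class Listener(threading.Thread):
    """Sketch: a thread that produces parsed packets and exposes them via get()."""
    def __init__(self):
        super().__init__(daemon=True)
        self._output = queue.Queue()

    def run(self):
        for packet in ["pkt1", "pkt2"]:   # stand-in for receiving and parsing
            self._output.put(packet)

    def get(self, block=True, timeout=None):
        return self._output.get(block, timeout)

def get_export_packets():
    """Start the listener thread and yield whatever it produces."""
    listener = Listener()
    listener.start()
    while True:
        try:
            yield listener.get(timeout=1)
        except queue.Empty:
            return

list(get_export_packets())  # ['pkt1', 'pkt2'] once the thread has finished
```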