netflow

Author	SHA1	Message	Date
Dominik Pataky	405f9c6a67	IPFIX: replace IPFIX_FIELD_TYPES with class; handle signed In IPFIX, template fields can be signed or unsigned, or even be pure bytes or Unicode strings. This differentiation was extended in this commit. Additionally, the IPFIX_FIELD_TYPES dict mapping from int->str was replaced by a more verbose version, which also includes the standardized IANA data types. The class's methods provide access to the fixed data set. This is then used in the IPFIXDataRecord parser. Refs #25	2020-04-04 15:21:53 +02:00
Dominik Pataky	f7a44852c3	Tests: add memory performance test for v1 and v5; bump version to 0.10.1	2020-04-04 10:58:06 +02:00
Dominik Pataky	959f8d3c2c	Tests: add parameter store_packets to send_recv_packets The function send_recv_packets in tests stored all processed ExportPackets by default in a list. Memory usage tests were therefore based on this large number of stored objects, since no instance of any ExportPacket was deleted until exit. With the new parameter store_packets the caller can define how many packets should be stored during receiving, to test multiple scenarios. Three such scenarios are implemented: don't store any packet, store a maximum of 500 at a time and store all packets. This comes much closer to the real world scenario of the collector, which uses a "for export in listener.get" loop, dumping any new ExportPacket to file immediately and then deleting the object. Yet, the case where all packets are stored must still be covered as well, because the collector might not be the only implementation which uses listener.get, so finding memory leaks should be covered.	2020-04-03 17:28:16 +02:00
Dominik Pataky	53f8ca764e	Tests: add memory performance tests A new test file is added which contains memory and CPU tests. For now, only the memory usage tests work (threading!). They print out tables of memory usage based on file path and on function. Additionally, they check some basic measurements: if all packets were processed and if a collection of version 9/10 called any functions in 10/9. Refs #24	2020-04-03 15:36:09 +02:00
Dominik Pataky	258b7c1e0b	Tests: move packets into lib again, add packet generator The static packets in the tests are back in lib.py to avoid circular imports. A new packet generator function was added.	2020-04-03 15:20:41 +02:00
Dominik Pataky	55272e8a0a	Fix analyzer test; IPFIX: change handling of 16 bytes fields Analyzer test was missing imports. IPFIX templates with 16 bytes fields were processed extra, since struct does not natively support conversion to int. The new implementation still handles it extra, but uses struct's "s" unpack format descriptor now.	2020-04-03 10:29:38 +02:00
Dominik Pataky	27525887bd	Update README to reflect IPFIX implementation; bump version to v0.10.0 Resolves #20	2020-04-01 14:40:21 +02:00
Dominik Pataky	547792c5c2	Tests: move packets into each version test file; add tests for IPFIX The previously introduced tests/lib.py contained the NetFlow v9 packets and then the IPFIX packets, those were split and put into their respective test files again. The lib now contains shared objects only. For IPFIX tests were added. Two new packets were added, one with templates and one without (again, real exports from softflowd). Different cases are checked: no template, template and later template. Fields of flows are also checked, especially IPv6 addresses. Note: exports made with softflowd were created by softflowd 1.0.0, compiled from https://github.com/irino/softflowd	2020-04-01 14:15:53 +02:00
Dominik Pataky	dfe0ffdcc7	IPFIX: adapt templates attribute handling to IPFIX as well	2020-04-01 14:14:47 +02:00
Dominik Pataky	143986c38d	Fix multi-exception catch in collector; make templates @property in v9 The collector should catch both v9 and IPFIX template errors - syntax error corrected. The v9 ExportPacket.templates attribute is now @property and read-only.	2020-04-01 14:12:27 +02:00
Dominik Pataky	56d443aa2a	Refactor tests, moved into tests/ The tests are now located in tests/. They are also split into multiple files, beginning with test_netflow and test_analyzer. The tests for IPFIX will be added to test_ipfix.	2020-04-01 11:55:45 +02:00
Dominik Pataky	4b8cbf92bc	IPFIX: implement field types of 16 bytes in parser Python struct does not natively support 16 byte fields. But since IPFIX uses fields of length 16 bytes for at least IPv6 addresses, they must be processed in the IPFIX parser. This commit adds support for 16 byte fields by handling them as special struct.unpack cases.	2020-04-01 11:34:34 +02:00
Dominik Pataky	d2e1bc8c83	IPFIX: reformat IANA field types dict (adding the data type)	2020-04-01 09:46:32 +02:00
Dominik Pataky	c3da0b2096	Adapt utils, collector, analyzer to IPFIX At different points in the tool set, NetFlow (v9) is set as the default case. Now that IPFIX is on its way to be supported as well, adapt all occurrences where a differentiation must be done.	2020-03-31 22:47:23 +02:00
Dominik Pataky	937e640198	IPFIX: implement data records and template handling; add IANA types Second half of the IPFIX implementation now adds the support for data records. The templates are also extracted, allowing the collector to use them across exports. The field types were extracted from the IANA assignment list at https://www.iana.org/assignments/ipfix/ipfix-information-elements.csv Please note that the IPFIX implementation was made from scratch and differs from the NetFlow v9 implementation, as there was little copy/paste.	2020-03-31 22:45:58 +02:00
Dominik Pataky	524e411850	Add first approach of IPFIX implementation Adds a new module, IPFIX. The collector already recognizes version 10 in the header, meaning IPFIX. The parser is able to dissect the export package and all sets with their headers. Missing is the handling of the templates in the data sets - a feature needed for the whole parsing process to complete.	2020-03-31 20:58:15 +02:00
Dominik Pataky	0358c3416c	Fix logger in collector; fix header dates	2020-03-31 16:28:33 +02:00
Dominik Pataky	cd07885d28	Improve handling of mixed template/data exports; add test The collector is able to parse templates in an export and then use these templates to parse dataflows inside the same export packet. But the test implementation was based on the assumption, that the templates always arrive first in the packet. Now, a mixed order is also processed successfully. Test included.	2020-03-30 16:42:48 +02:00
Dominik Pataky	d4d6d59713	Provide parse_packet as API; fix parse_packet input handling; README To get closer to a stable package, netflow now offers the parse_packet function in its top-level __init__ file. This function was also enhanced to handle multiple input formats (str, bytes, hex bytes). Updated README accordingly.	2020-03-30 13:04:25 +02:00
Dominik Pataky	7ae179cb33	Reformat data flow attributes and unpacking; adapt tests The V1DataFlow and V5DataFlow classes used a verbose way of unpacking the hex byte stream to the specific fields until now. With this commit, both use a list of field names, one struct.unpack call and then a mapping for-loop for each field. Additionally the upper boundary of the passed data slice was added. With the self.__dict__.update() call all fields are now also accessible as direct attributes of the corresponding instance, e.g. flow.PROTO to access flow.data["PROTO"]. This works for flows of all three versions. The tests were adapted to reflect this new implementation.	2020-03-30 12:29:50 +02:00
Dominik Pataky	8b70fb1058	Fix to_dict() in headers; formatting The collector uses the .to_dict() function to persist the header in its gzipped output file. Now all headers implement this function.	2020-03-29 23:17:05 +02:00
Dominik Pataky	4a90e0ce34	Update README, bump minor version to v0.9.0	2020-03-29 22:34:30 +02:00
Dominik Pataky	5765fa31cf	Rename test file; fix analyzer test Tests are now all running, not skipping the analyzer test. Adapted to the new CLI calling method for the subprocess.	2020-03-29 22:33:26 +02:00
Dominik Pataky	abce1f57dd	Move collector and analyzer into the package, callable via CLI Beginning with this commit, the reference implementations of the collector and analyzer are now included in the package. They are callable by running `python3 -m netflow.collector` or `.analyzer`, with the same flags as before. Use `-h` to list them. Additional fixes are contained in this commit as well, e.g. adding more version prefixes and moving parts of code from __init__ to utils, to fix circular imports.	2020-03-29 22:14:45 +02:00
Dominik Pataky	9d2bc21ae2	Extend and reformat tests, add tests for v1 and v5, bump version The tests are now also parsing export packets for version 1 and 5. Version 9 received an additional test, inspecting the data inside the export. All new packet hex dumps were created by using a Docker container with alpine Linux, running a softflowd daemon inside and then pinging the Docker host IP. After review with "softflowctl dump-flows" issuing "softflowctl expire-all" sends the packets away to the collector (should be an IP address outside of the Docker bridge). The export network packets are then collected with Wireshark running in the host namespace, capturing on the Docker bridge. Bump version to v0.8.3 Resolves #13 Resolves #14 Refs #18	2020-03-29 19:57:13 +02:00
Dominik Pataky	e8073013c1	Rename classes in v1, v5 and v9 according to version Until now, every NetFlow version file used similar names for their classes, e.g. "Header". These are now prefixed with their respective version, e.g. "V1Header", to avoid confusion in imports etc.	2020-03-29 19:49:57 +02:00
Dominik Pataky	5fd4e9bd24	Update README/setup.py; add .json property to v9 header for export The README and setup.py were adapted to the current state, preparing for PyPI upload and package info. In v9, the header received an additional .json property, which exports the header as a dict to allow JSON serialization in the export file. This export is used in main.py	2020-03-29 18:06:52 +02:00
Dominik Pataky	f8c5717002	Extend analyzer with --no-dns and --match-host; fixes This commit extends the analyzer script with two new flags: * Adding --no-dns disables hostname DNS resolution, improving speed * Adding --match-host <IP address> filters all flows not matching the IP Additional small things were changed, the script is still work in progress. Especially the "pairing" of two flows will be removed in future versions.	2020-03-19 18:16:03 +01:00
Dominik Pataky	4639601798	Extend logging in collector (with --verbose)	2020-03-19 18:16:03 +01:00
Dominik Pataky	0d8f1a2ecb	Add ParsedPacket named tuple for queue; extend tests Before, the output queue of the collector received unnamed tuples with three fields. This broke the tests and was less understandable. The new version uses a named tuple for clarity. The tests were adapted to the new type in the queue and are fixed. For backwards compatibility a check of the Python version is added and the subprocess stdout/stderr arguments are passed depending on this version. See #18.	2020-03-19 18:16:03 +01:00
Dominik Pataky	290e822176	Add IPv6 local interface address handling With the '--host' flag, a local interface IP address can be set on which the collector listens for incoming flows. Until now, this only worked with IPv4 addresses (using the default 0.0.0.0 interface). The commit adds handling of passed-in IPv6 addresses by identifying ":" and then switching to the AF_INET6 socket family.	2020-03-19 18:12:50 +01:00
cookie	647f4b3748	Merge pull request #19 from grafolean/fix/tests Fix failing tests (wrong index when accessing netflow records)	2020-03-19 15:38:27 +00:00
Anze	096c7d6f4f	Fix failing tests (wrong index when accessing netflow records)	2020-02-22 23:20:36 +01:00
Dominik Pataky	565f829945	Add verbose flag to analyzer Adds a new flag, '-v' or '--verbose', to the analyzer.py script. It uses a new print method and also skips some parts of the script if not passed on the CLI.	2020-01-20 17:01:50 +01:00
Dominik Pataky	adb02eab24	Update to 2020 in file headers; update the analyzer file name in README The analyzer is now found in analyzer.py and uses the '-f' flag for GZIPed input files. Bundled with the previous PR commit, this update should now be clearer.	2020-01-20 16:59:36 +01:00
cookie	52d357b111	Merge pull request #12 from kaysiz/patch-1 Update README.md to match new file format (GZIP instead of JSON). Thanks for the PR!	2020-01-20 15:13:14 +00:00
kudakwashe siziva	59652f7d2f	Update README.md Changed file extension from json to gz	2020-01-17 10:43:21 +02:00
Dominik Pataky	61439ec6ef	Improve analyzer (handling of pairs, dropping noise) Previously, the analyzer assumed that two consecutive flows would be a pair. This proved unreliable, therefore a new comparison algorithm is used. It utilizes the IP addresses and the 'first_switched' parameter to identify two flows of the same connection. More improvements can be done, especially filtering and in the identification of the initiating peer. Tests still fail, have to be adapted to the new dicts and gzip.	2019-11-03 15:58:40 +01:00
Dominik Pataky	eff99fc6e3	Add client info to stored data Until now, packets arriving at the collector's interface were stored by timestamp, with the exported flows in the payload. This format is now extended to also store the client's IP address and port, allowing multiple clients to export flows to the same collector instance.	2019-11-03 13:57:06 +01:00
Dominik Pataky	1646a52f17	Store IP addresses (v4 + v6) as strings rather than ints As mentioned by @pR0Ps in `6b9d20c8a6/analyze_json.py (L83)` IP addresses, especially in IPv6, should better be stored as parsed strings instead of their raw integer values. Implemented.	2019-11-03 13:35:32 +01:00
Dominik Pataky	6b9d20c8a6	Refactor storing data and writing to disk - using gzip and lines In previous versions, collected flows (parsed data) were stored in memory by the collector. In regular intervals, or at shutdown, this one single dict was dumped as JSON onto disk. With this commit, the behaviour is changed to line-based JSON dumps for each flow, gzipped onto disk for storage efficiency. The analyze_json is updated as well to handle the new gzipped files in the new format. See the comments in main.py for more details. Fixes #10	2019-11-03 12:02:05 +01:00
Dominik Pataky	3dee135a22	Merge branch 'props-master' Merging pull request #9 by @pR0Ps https://github.com/bitkeks/python-netflow-v9-softflowd/pull/9 Thanks for the contribution! Resolves #9	2019-10-31 18:02:06 +01:00
Dominik Pataky	9f16d246a5	Add v1, v5 to README; change fallback; add timeout parameter Updated the README to reference NetFlow v1 and v5 as well. The fallback(key, dict) method used an exception-based testing of the keys existence. Switched to 'if x in'. The NetFlowListener is based on threading.Thread, which uses the 'timeout' parameter in .join(). Added.	2019-10-31 17:55:48 +01:00
Dominik Pataky	bfec3953e6	Bump version, fix small errors, decrease packet num in tests	2019-10-31 17:35:15 +01:00
Carey Metcalfe	345a5b08ff	Fix setup.py file	2019-10-16 23:46:32 -04:00
Carey Metcalfe	bf92f24669	Add test for invalid packets	2019-10-16 23:46:32 -04:00
Carey Metcalfe	96817f1f8d	Add support for v1 and v5 NetFlow packets Thanks to @alieissa for the initial v1 and v5 code	2019-10-16 23:46:32 -04:00
Carey Metcalfe	186b648c4d	Fix tests Uses the analyzer's new stdin-reading capabilities to test the analysis without having to write temporary files. Also removes most of the delays because the listener can keep up now.	2019-10-16 23:44:28 -04:00
Carey Metcalfe	8e6d0c54e8	Allow analyze_json.py to accept input via stdin This will make testing much cleaner in the future (no temp files needed) Also increase performance by memoizing the hostname lookup	2019-10-16 23:44:19 -04:00
Carey Metcalfe	11dc92269c	Refactor code to make programmatic access to flows easier This commit splits the packet collecting and processing out into a thread that provides a queue-like `get(block=True, timeout=None)` function for getting processed `ExportPackets`. This makes it much easier to use rather than starting a generator and sending a value to it when you want to stop. The `get_export_packets` generator is an example of using it - it just starts the thread and yields values from it.	2019-10-16 23:46:32 -04:00


71 commits