At different points in the tool set, NetFlow (v9) is treated as the default
case. Now that IPFIX is on its way to being supported as well, adapt all
occurrences where the versions must be differentiated.
The second half of the IPFIX implementation adds support for data
records. The templates are also extracted, allowing the collector to use
them across exports.
The field types were extracted from the IANA assignment list at
https://www.iana.org/assignments/ipfix/ipfix-information-elements.csv
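For illustration, a hedged sketch of how that CSV might be loaded into an
ID-to-name mapping; the column names follow the published CSV header, and
the function name is hypothetical:

```python
import csv

def load_ipfix_field_types(path="ipfix-information-elements.csv"):
    # Map numeric ElementID -> field name, e.g. 8 -> "sourceIPv4Address"
    field_types = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                field_types[int(row["ElementID"])] = row["Name"]
            except ValueError:
                continue  # skip reserved ranges given as "X-Y" entries
    return field_types
```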
Please note that the IPFIX implementation was written from scratch and
differs from the NetFlow v9 implementation; little code was copied over.
Adds a new module, IPFIX. The collector already recognizes version 10 in
the header, meaning IPFIX. The parser is able to dissect the export
packet and all sets with their headers.
Still missing is the handling of the templates for the data sets - a
feature needed for the whole parsing process to complete.
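As a sketch of that dissection: per RFC 7011, the message header is 16
bytes and each set starts with a 4-byte header. The class and function
names below are illustrative, not the module's actual API:

```python
import struct

class IPFIXHeader:
    """16-byte IPFIX message header (RFC 7011)."""
    def __init__(self, data: bytes):
        (self.version, self.length, self.export_time,
         self.sequence_number,
         self.observation_domain_id) = struct.unpack("!HHIII", data[:16])
        assert self.version == 10  # version 10 identifies IPFIX

def iter_sets(data: bytes):
    # Walk all sets in the message body; each set header carries the
    # set ID and the total set length including the header itself.
    offset = 16
    while offset < len(data):
        set_id, set_length = struct.unpack("!HH", data[offset:offset + 4])
        if set_length < 4:
            break  # malformed set, avoid an endless loop
        yield set_id, data[offset + 4:offset + set_length]
        offset += set_length
```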
The collector is able to parse templates in an export and then use these
templates to parse dataflows inside the same export packet. But the test
implementation was based on the assumption that the templates always
arrive first in the packet. Now, a mixed order is also processed
successfully. Test included.
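One way to sketch that mixed-order handling, with parse_template_set and
parse_data_set as hypothetical helpers: collect templates as they appear
and defer data sets whose template is still unknown.

```python
def parse_export(sets, templates):
    flows, deferred = [], []
    for set_id, payload in sets:
        if set_id == 2:          # template set (IPFIX)
            templates.update(parse_template_set(payload))   # hypothetical
        elif set_id >= 256:      # data set, its ID references a template
            if set_id in templates:
                flows.extend(parse_data_set(payload, templates[set_id]))
            else:
                deferred.append((set_id, payload))
    # Templates that arrived later in the packet are now known,
    # so the deferred data sets can be parsed in a second pass.
    for set_id, payload in deferred:
        flows.extend(parse_data_set(payload, templates[set_id]))
    return flows
```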
To get closer to a stable package, netflow now offers the parse_packet
function in its top-level __init__ file. This function was also enhanced
to handle multiple input formats (str, bytes, hex bytes).
Updated README accordingly.
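A minimal sketch of that normalization, assuming hex input arrives either
as a str or as ASCII-hex bytes; the real function likely differs in
detail:

```python
def to_bytes(data):
    # Hypothetical helper mirroring the behaviour described above:
    # accept a hex string, ASCII-hex bytes, or raw bytes.
    if isinstance(data, str):
        return bytes.fromhex(data)                   # hex string
    try:
        # Note: raw bytes that happen to decode as valid hex are
        # ambiguous; this sketch treats them as hex input.
        return bytes.fromhex(data.decode("ascii"))   # ASCII-hex bytes
    except (UnicodeDecodeError, ValueError):
        return data                                  # already raw bytes
```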
Until now, the V1DataFlow and V5DataFlow classes used a verbose way of
unpacking the byte stream into the specific fields. With this commit,
both use a list of field names, a single struct.unpack call, and a
mapping for-loop over the fields.
Additionally, an upper boundary for the passed data slice was added.
With the self.__dict__.update() call, all fields are now also accessible
as direct attributes of the corresponding instance, e.g. flow.PROTO to
access flow.data["PROTO"]. This works for flows of all three versions.
The tests were adapted to reflect this new implementation.
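An illustrative toy version of that pattern; the real v1/v5 records have
more fields and a different struct format:

```python
import struct

FIELDS = ["IP_SRC", "IP_DST", "PROTO"]  # toy field-name list

class DataFlow:
    LENGTH = 9  # upper boundary: bytes consumed by this toy layout

    def __init__(self, data: bytes):
        unpacked = struct.unpack("!IIB", data[:self.LENGTH])
        self.data = {}
        for name, value in zip(FIELDS, unpacked):
            self.data[name] = value
        # Make every field available as a direct attribute as well,
        # so flow.PROTO returns the same value as flow.data["PROTO"]
        self.__dict__.update(self.data)
```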
Beginning with this commit, the reference implementations of the
collector and analyzer are now included in the package. They are
callable by running `python3 -m netflow.collector` or `.analyzer`, with
the same flags as before. Use `-h` to list them.
Additional fixes are contained in this commit as well, e.g. adding more
version prefixes and moving parts of the code from __init__ to utils to
fix circular imports.
Until now, every NetFlow version file used similar names for their
classes, e.g. "Header". These are now prefixed with their respective
version, e.g. "V1Header", to avoid confusion in imports etc.
The README and setup.py were adapted to the current state, preparing for
PyPI upload and package info.
In v9, the header received an additional .json property, which exports
the header as a dict to allow JSON serialization in the export file.
This export is used in main.py.
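A hedged sketch of what such a property can look like; the real v9
header carries more fields than this stub:

```python
import json

class Header:
    def __init__(self, version: int, count: int):
        self.version = version
        self.count = count

    @property
    def json(self):
        # Plain dict of the header fields, ready for json.dumps()
        return {"version": self.version, "count": self.count}

json.dumps(Header(9, 2).json)  # -> '{"version": 9, "count": 2}'
```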
Previously, the analyzer assumed that two consecutive flows would be a
pair. This proved unreliable, so a new comparison algorithm is used. It
utilizes the IP addresses and the 'first_switched' parameter to identify
two flows belonging to the same connection.
More improvements are possible, especially in filtering and in
identifying the initiating peer.
Tests still fail and have to be adapted to the new dicts and gzip.
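A minimal sketch of the pairing idea, assuming v9 field names such as
IPV4_SRC_ADDR and a hypothetical timestamp tolerance; the analyzer's
actual logic may differ:

```python
def match_pairs(flows, tolerance=1000):
    # Two flows pair up if their addresses mirror each other and their
    # 'first_switched' timestamps lie within the (assumed) tolerance.
    pairs, pending = [], []
    for flow in flows:
        for i, other in enumerate(pending):
            if (flow["IPV4_SRC_ADDR"] == other["IPV4_DST_ADDR"]
                    and flow["IPV4_DST_ADDR"] == other["IPV4_SRC_ADDR"]
                    and abs(flow["FIRST_SWITCHED"]
                            - other["FIRST_SWITCHED"]) <= tolerance):
                pairs.append((pending.pop(i), flow))
                break
        else:
            pending.append(flow)  # no partner seen yet
    return pairs
```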
As mentioned by @pR0Ps in 6b9d20c8a6/analyze_json.py (L83), IP
addresses, especially IPv6 ones, are better stored as parsed strings
than as their raw integer values. Implemented.
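For illustration, Python's standard ipaddress module handles this
conversion for both address families, assuming the raw values are
available as packed bytes; the helper name is hypothetical:

```python
import ipaddress

def to_ip_string(raw: bytes) -> str:
    # ip_address() accepts packed 4-byte (IPv4) and 16-byte (IPv6) values
    return str(ipaddress.ip_address(raw))

to_ip_string(b"\xc0\xa8\x00\x01")  # -> "192.168.0.1"
```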
- Moved the netflow library out of the src directory
- The UDP listener was restructured so that multiple threads can receive
  packets and push them into a queue. The main thread then pulls the
  packets off the queue one at a time and processes them. This means
  that the collector will never drop a packet because it was blocked on
  processing the previous one (see the sketch after this list).
- Adds a property to the ExportPacket class to expose whether any new
  templates are contained in it.
- The collector will now only retry parsing past packets when a new
template is found. Also refactored the retry logic a bit to remove
duplicate code (retrying just pushes the packets back into the main
queue to be processed again like all the other packets).
- The collector no longer continually reads from and writes to disk. It
  just caches the data in memory until it exits instead.
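A compact sketch of the listener/worker split described above, using
Python's standard socketserver and queue modules; the class and function
names are illustrative, not the package's actual API:

```python
import queue
import socketserver
import threading

packet_queue = queue.Queue()

class QueuingHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) tuple.
        data, _sock = self.request
        # Receiving threads only enqueue; parsing happens in the main
        # thread, so no packet is dropped while another is processed.
        packet_queue.put((self.client_address, data))

def serve(host="0.0.0.0", port=2055):
    server = socketserver.ThreadingUDPServer((host, port), QueuingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    while True:  # main thread pulls packets off the queue one at a time
        address, data = packet_queue.get()
        process(data)  # hypothetical parse/retry logic lives here
```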