At different points in the tool set, NetFlow (v9) is treated as the default
case. Now that IPFIX is on its way to being supported as well, adapt all
occurrences where the versions must be differentiated.
The second half of the IPFIX implementation adds support for data
records. The templates are also extracted, allowing the collector to use
them across exports.
The field types were extracted from the IANA assignment list at
https://www.iana.org/assignments/ipfix/ipfix-information-elements.csv
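For illustration, a hedged sketch of how that CSV might be loaded into an
ID-to-name mapping; the column names follow the published CSV header, and
the function name is hypothetical:

```python
import csv

def load_ipfix_field_types(path="ipfix-information-elements.csv"):
    # Map numeric ElementID -> field name, e.g. 8 -> "sourceIPv4Address"
    field_types = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                field_types[int(row["ElementID"])] = row["Name"]
            except ValueError:
                continue  # skip reserved ranges given as "X-Y" entries
    return field_types
```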
Please note that the IPFIX implementation was written from scratch and
differs from the NetFlow v9 implementation; little code was copied over.
Adds a new module, IPFIX. The collector already recognizes version 10 in
the header, meaning IPFIX. The parser is able to dissect the export
packet and all sets with their headers.
Still missing is the handling of the templates for the data sets - a
feature needed for the whole parsing process to complete.
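As a sketch of that dissection: per RFC 7011, the message header is 16
bytes and each set starts with a 4-byte header. The class and function
names below are illustrative, not the module's actual API:

```python
import struct

class IPFIXHeader:
    """16-byte IPFIX message header (RFC 7011)."""
    def __init__(self, data: bytes):
        (self.version, self.length, self.export_time,
         self.sequence_number,
         self.observation_domain_id) = struct.unpack("!HHIII", data[:16])
        assert self.version == 10  # version 10 identifies IPFIX

def iter_sets(data: bytes):
    # Walk all sets in the message body; each set header carries the
    # set ID and the total set length including the header itself.
    offset = 16
    while offset < len(data):
        set_id, set_length = struct.unpack("!HH", data[offset:offset + 4])
        if set_length < 4:
            break  # malformed set, avoid an endless loop
        yield set_id, data[offset + 4:offset + set_length]
        offset += set_length
```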
The collector is able to parse templates in an export and then use these
templates to parse dataflows inside the same export packet. But the test
implementation was based on the assumption that the templates always
arrive first in the packet. Now, a mixed order is also processed
successfully. Test included.
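One way to sketch that mixed-order handling, with parse_template_set and
parse_data_set as hypothetical helpers: collect templates as they appear
and defer data sets whose template is still unknown.

```python
def parse_export(sets, templates):
    flows, deferred = [], []
    for set_id, payload in sets:
        if set_id == 2:          # template set (IPFIX)
            templates.update(parse_template_set(payload))   # hypothetical
        elif set_id >= 256:      # data set, its ID references a template
            if set_id in templates:
                flows.extend(parse_data_set(payload, templates[set_id]))
            else:
                deferred.append((set_id, payload))
    # Templates that arrived later in the packet are now known,
    # so the deferred data sets can be parsed in a second pass.
    for set_id, payload in deferred:
        flows.extend(parse_data_set(payload, templates[set_id]))
    return flows
```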
To get closer to a stable package, netflow now offers the parse_packet
function in its top-level __init__ file. This function was also enhanced
to handle multiple input formats (str, bytes, hex bytes).
Updated README accordingly.
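A minimal sketch of that normalization, assuming hex input arrives either
as a str or as ASCII-hex bytes; the real function likely differs in
detail:

```python
def to_bytes(data):
    # Hypothetical helper mirroring the behaviour described above:
    # accept a hex string, ASCII-hex bytes, or raw bytes.
    if isinstance(data, str):
        return bytes.fromhex(data)                   # hex string
    try:
        # Note: raw bytes that happen to decode as valid hex are
        # ambiguous; this sketch treats them as hex input.
        return bytes.fromhex(data.decode("ascii"))   # ASCII-hex bytes
    except (UnicodeDecodeError, ValueError):
        return data                                  # already raw bytes
```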
Until now, the V1DataFlow and V5DataFlow classes used a verbose way of
unpacking the byte stream into the specific fields. With this commit,
both use a list of field names, a single struct.unpack call, and a
mapping for-loop over the fields.
Additionally, an upper boundary for the passed data slice was added.
With the self.__dict__.update() call, all fields are now also accessible
as direct attributes of the corresponding instance, e.g. flow.PROTO to
access flow.data["PROTO"]. This works for flows of all three versions.
The tests were adapted to reflect this new implementation.
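An illustrative toy version of that pattern; the real v1/v5 records have
more fields and a different struct format:

```python
import struct

FIELDS = ["IP_SRC", "IP_DST", "PROTO"]  # toy field-name list

class DataFlow:
    LENGTH = 9  # upper boundary: bytes consumed by this toy layout

    def __init__(self, data: bytes):
        unpacked = struct.unpack("!IIB", data[:self.LENGTH])
        self.data = {}
        for name, value in zip(FIELDS, unpacked):
            self.data[name] = value
        # Make every field available as a direct attribute as well,
        # so flow.PROTO returns the same value as flow.data["PROTO"]
        self.__dict__.update(self.data)
```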
Beginning with this commit, the reference implementations of the
collector and analyzer are now included in the package. They are
callable by running `python3 -m netflow.collector` or `.analyzer`, with
the same flags as before. Use `-h` to list them.
Additional fixes are contained in this commit as well, e.g. adding more
version prefixes and moving parts of the code from __init__ to utils to
fix circular imports.
Until now, every NetFlow version file used similar names for their
classes, e.g. "Header". These are now prefixed with their respective
version, e.g. "V1Header", to avoid confusion in imports etc.
The README and setup.py were adapted to the current state, preparing for
PyPI upload and package info.
In v9, the header received an additional .json property, which exports
the header as a dict to allow JSON serialization in the export file.
This export is used in main.py.
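A hedged sketch of what such a property can look like; the real v9
header carries more fields than this stub:

```python
import json

class Header:
    def __init__(self, version: int, count: int):
        self.version = version
        self.count = count

    @property
    def json(self):
        # Plain dict of the header fields, ready for json.dumps()
        return {"version": self.version, "count": self.count}

json.dumps(Header(9, 2).json)  # -> '{"version": 9, "count": 2}'
```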
Previously, the analyzer assumed that two consecutive flows would be a
pair. This proved unreliable, so a new comparison algorithm is used. It
utilizes the IP addresses and the 'first_switched' parameter to identify
two flows belonging to the same connection.
More improvements are possible, especially in filtering and in
identifying the initiating peer.
Tests still fail and have to be adapted to the new dicts and gzip.
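A minimal sketch of the pairing idea, assuming v9 field names such as
IPV4_SRC_ADDR and a hypothetical timestamp tolerance; the analyzer's
actual logic may differ:

```python
def match_pairs(flows, tolerance=1000):
    # Two flows pair up if their addresses mirror each other and their
    # 'first_switched' timestamps lie within the (assumed) tolerance.
    pairs, pending = [], []
    for flow in flows:
        for i, other in enumerate(pending):
            if (flow["IPV4_SRC_ADDR"] == other["IPV4_DST_ADDR"]
                    and flow["IPV4_DST_ADDR"] == other["IPV4_SRC_ADDR"]
                    and abs(flow["FIRST_SWITCHED"]
                            - other["FIRST_SWITCHED"]) <= tolerance):
                pairs.append((pending.pop(i), flow))
                break
        else:
            pending.append(flow)  # no partner seen yet
    return pairs
```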
As mentioned by @pR0Ps in 6b9d20c8a6/analyze_json.py (L83), IP
addresses, especially IPv6 ones, are better stored as parsed strings
than as their raw integer values. Implemented.
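For illustration, Python's standard ipaddress module handles this
conversion for both address families, assuming the raw values are
available as packed bytes; the helper name is hypothetical:

```python
import ipaddress

def to_ip_string(raw: bytes) -> str:
    # ip_address() accepts packed 4-byte (IPv4) and 16-byte (IPv6) values
    return str(ipaddress.ip_address(raw))

to_ip_string(b"\xc0\xa8\x00\x01")  # -> "192.168.0.1"
```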
- Moved the netflow library out of the src directory
- The UDP listener was restructured so that multiple threads can receive
  packets and push them into a queue. The main thread then pulls the
  packets off the queue one at a time and processes them. This means
  that the collector will never drop a packet because it was blocked on
  processing the previous one (see the sketch after this list).
- Adds a property to the ExportPacket class to expose whether any new
  templates are contained in it.
- The collector will now only retry parsing past packets when a new
template is found. Also refactored the retry logic a bit to remove
duplicate code (retrying just pushes the packets back into the main
queue to be processed again like all the other packets).
- The collector no longer continually reads from and writes to disk. It
  just caches the data in memory until it exits instead.
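A compact sketch of the listener/worker split described above, using
Python's standard socketserver and queue modules; the class and function
names are illustrative, not the package's actual API:

```python
import queue
import socketserver
import threading

packet_queue = queue.Queue()

class QueuingHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) tuple.
        data, _sock = self.request
        # Receiving threads only enqueue; parsing happens in the main
        # thread, so no packet is dropped while another is processed.
        packet_queue.put((self.client_address, data))

def serve(host="0.0.0.0", port=2055):
    server = socketserver.ThreadingUDPServer((host, port), QueuingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    while True:  # main thread pulls packets off the queue one at a time
        address, data = packet_queue.get()
        process(data)  # hypothetical parse/retry logic lives here
```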