Previously, option templates and their data records were not correctly
recognized. This is fixed now. Collectors can now use the
V9ExportPacket.options field to get a list of V9OptionsDataRecord, with
scopes and data fields.
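A minimal usage sketch, assuming an already parsed V9ExportPacket instance and that each V9OptionsDataRecord exposes its scopes and data as dicts:

```python
def print_options(export):
    """Sketch only: dump the options data records of a parsed V9ExportPacket."""
    for record in export.options:
        print("scopes:", record.scopes)  # scope fields of the options record
        print("data:  ", record.data)    # the actual options data fields
```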
Data templates and option templates are now mixed together in the same
templates dict. Let's hope exporters do not mix them up (re-use the same
IDs for both template types).
During development, the search for the correct template was refactored.
The templates dict is no longer passed into the V9DataFlowSet; only the
single matching template is passed into V9DataFlowSet and
V9OptionsDataFlowset, as it should be.
Refs #30
This is a hacky workaround to handle V9 options templates without
implementing the full corresponding spec. It resolves cases where missing
templates raise a V9TemplateNotRecognized exception even though the
exporter might be doing everything correctly.
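A rough sketch of how a collector can deal with that exception by buffering the packet until a matching template arrives (the import path and constructor signature are assumptions; the buffering mirrors the retry logic described further below):

```python
# Sketch only: import path and V9ExportPacket signature are assumptions.
from netflow.v9 import V9ExportPacket, V9TemplateNotRecognized

templates = {}   # template cache shared across packets
to_retry = []    # raw packets whose template has not been seen yet

def handle(data):
    try:
        return V9ExportPacket(data, templates)
    except V9TemplateNotRecognized:
        # Keep the packet and retry it after the next template set arrives
        to_retry.append(data)
        return None
```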
Refs #29
Refs #30
This commit replaces multiple occurrences of newer language features that
are not available in Python 3.5.3, the reference backwards-compatibility
version for this package. That version is based on the current Python
version in Debian Stretch (oldstable). According to pkgs.org, all other
distros ship 3.6+, so 3.5.3 is the lower boundary.
Changes:
* Add maxsize argument to functools.lru_cache decorator
* Replace f"" with .format()
* Replace variable type hints "var: type = val" with "# type:" comments
* Replace pstats.SortKey enum with strings in performance tests
Additionally, various styling fixes were applied.
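A minimal sketch of the kinds of replacements listed above (illustrative snippets, not the actual diff):

```python
import functools

# lru_cache is called with an explicit maxsize; the bare @functools.lru_cache
# form (without parentheses) only works on Python 3.8+
@functools.lru_cache(maxsize=128)
def field_name(field_type):
    return "field-{}".format(field_type)

# f"" strings (3.6+) are replaced with str.format()
version = 9
header_line = "NetFlow v{} header".format(version)

# variable annotations "var: type = val" (3.6+) become type comments
packet_count = 0  # type: int
```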
The version compatibility was tested with tox, pyenv and Python 3.5.3,
but there is no tox.ini yet which automates this test.
Bump patch version number to 0.10.3
Update author's email address.
Resolves #27
The collector should catch both v9 and IPFIX template errors; a syntax
error was corrected. The v9 ExportPacket.templates attribute is now a
read-only @property.
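A minimal sketch of what such a read-only property can look like (attribute names are assumptions):

```python
class ExportPacket:
    def __init__(self, data, templates):
        self._templates = dict(templates)
        # ... header and flowset parsing would update self._templates here ...

    @property
    def templates(self):
        # No setter is defined, so assigning to packet.templates raises AttributeError
        return self._templates
```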
At different points in the tool set, NetFlow v9 was assumed as the default
case. Now that IPFIX is on its way to being supported as well, all
occurrences where the two must be differentiated were adapted.
The collector is able to parse templates in an export packet and then use
these templates to parse data flowsets inside the same packet. But the
test implementation was based on the assumption that the templates always
arrive first in the packet. Now a mixed order is also processed
successfully. A test is included.
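One way to handle the mixed ordering, sketched here as a two-pass approach (names are illustrative, not the library's API):

```python
def parse_packet(flowsets, templates):
    """Register all templates of a packet first, then decode its data flowsets."""
    data_flowsets = []
    for kind, payload in flowsets:        # e.g. ("template", {...}) or ("data", b"...")
        if kind == "template":
            templates.update(payload)     # payload maps template IDs to field lists
        else:
            data_flowsets.append(payload)
    # All templates from this packet are known now, so decoding cannot fail
    # just because a template arrived after its data records.
    return [decode(payload, templates) for payload in data_flowsets]

def decode(payload, templates):
    return {"raw": payload, "known_templates": sorted(templates)}  # placeholder
```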
Until now, the V1DataFlow and V5DataFlow classes used a verbose way of
unpacking the byte stream into their specific fields. With this commit,
both use a list of field names, a single struct.unpack call and a mapping
for-loop over the fields.
Additionally, the upper boundary of the passed data slice was added.
With the self.__dict__.update() call, all fields are now also accessible
as direct attributes of the corresponding instance, e.g. flow.PROTO to
access flow.data["PROTO"]. This works for flows of all three versions.
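A minimal sketch of that pattern (field list shortened and the struct format illustrative, not the real v5 record layout):

```python
import struct

class V5DataFlow:
    FIELDS = ["IPV4_SRC_ADDR", "IPV4_DST_ADDR", "IN_PKTS", "IN_OCTETS", "PROTO"]

    def __init__(self, data):
        self.data = {}
        # One struct.unpack call over the sliced record, then map names to values
        values = struct.unpack("!IIIIB", data[:17])
        for name, value in zip(self.FIELDS, values):
            self.data[name] = value
        # Expose every field as a direct attribute, e.g. flow.PROTO
        self.__dict__.update(self.data)

flow = V5DataFlow(bytes(17))
assert flow.PROTO == flow.data["PROTO"]
```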
The tests were adapted to reflect this new implementation.
Beginning with this commit, the reference implementations of the
collector and analyzer are now included in the package. They are
callable by running `python3 -m netflow.collector` or `.analyzer`, with
the same flags as before. Use `-h` to list them.
Additional fixes are contained in this commit as well, e.g. adding more
version prefixes and moving parts of code from __init__ to utils, to fix
circular imports.
Until now, every NetFlow version file used similar names for their
classes, e.g. "Header". These are now prefixed with their respective
version, e.g. "V1Header", to avoid confusion in imports etc.
The README and setup.py were adapted to the current state, preparing for
PyPI upload and package info.
In v9, the header received an additional .json property, which exports
the header as a dict to allow JSON serialization in the export file.
This export is used in main.py.
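A minimal sketch of such a property (header fields abbreviated, names are assumptions):

```python
import json

class V9Header:
    def __init__(self, version, count, uptime, timestamp):
        self.version = version
        self.count = count
        self.uptime = uptime
        self.timestamp = timestamp

    @property
    def json(self):
        # Plain dict, so the header can be fed straight into json.dumps()
        return {"version": self.version, "count": self.count,
                "uptime": self.uptime, "timestamp": self.timestamp}

print(json.dumps(V9Header(9, 2, 100, 1570000000).json))
```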
Previously, the analyzer assumed that two consecutive flows would be a
pair. This proved unreliable, therefore a new comparison algorithm is
used. It utilizes the IP addresses and the 'first_switched' parameter
to identify two flows of the same connection.
More improvements can be made, especially in filtering and in identifying
the initiating peer.
Tests still fail and have to be adapted to the new dicts and gzip.
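A rough sketch of the pairing idea (field keys and the time tolerance are assumptions, not the analyzer's exact logic):

```python
def are_paired(flow_a, flow_b, tolerance=1):
    """Treat two flows as one connection if they mirror each other's
    addresses and started at (almost) the same time."""
    mirrored = (flow_a["IPV4_SRC_ADDR"] == flow_b["IPV4_DST_ADDR"]
                and flow_a["IPV4_DST_ADDR"] == flow_b["IPV4_SRC_ADDR"])
    close_in_time = abs(flow_a["FIRST_SWITCHED"] - flow_b["FIRST_SWITCHED"]) <= tolerance
    return mirrored and close_in_time
```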
As mentioned by @pR0Ps in 6b9d20c8a6/analyze_json.py (L83),
IP addresses, especially IPv6 ones, are better stored as parsed strings
instead of their raw integer values. Implemented.
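A minimal sketch of that conversion using the standard library (the raw values are just examples):

```python
import ipaddress

src_v4 = ipaddress.ip_address(3232235777)                          # 192.168.1.1
src_v6 = ipaddress.ip_address(0x20010DB8000000000000000000000001)  # 2001:db8::1
print(str(src_v4), str(src_v6))
```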
- Moved the netflow library out of the src directory
- The UDP listener was restructured so that multiple threads can receive
packets and push them into a queue. The main thread then pulls the
packets off the queue one at a time and processes them. This means
that the collector will never drop a packet because it was blocked on
  processing the previous one (a minimal sketch of this pattern follows the list).
- Adds a property to the ExportPacket class to expose whether any new
  templates are contained in it.
- The collector will now only retry parsing past packets when a new
template is found. Also refactored the retry logic a bit to remove
duplicate code (retrying just pushes the packets back into the main
queue to be processed again like all the other packets).
- The collector no longer continually reads and writes to/from the disk.
It just caches the data in memory until it exits instead.
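A minimal sketch of the queue-based listener described in the bullets above (port, thread count and the processing step are illustrative, not the collector's exact code):

```python
import queue
import socket
import threading

packet_queue = queue.Queue()

def listen(sock):
    # Receiver threads only read from the socket and enqueue raw packets,
    # so slow parsing never blocks reception.
    while True:
        data, addr = sock.recvfrom(4096)
        packet_queue.put((addr, data))

def process(data):
    print("received {} bytes".format(len(data)))  # placeholder for parsing

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 2055))
for _ in range(2):
    threading.Thread(target=listen, args=(sock,), daemon=True).start()

# The main thread pulls packets off the queue one at a time and parses them
while True:
    addr, data = packet_queue.get()
    process(data)
```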