commit 63abf52ec640a019f8c45c1208f0dfb585641781
Padding: add offset!=length check to reduce safety check calls
Adds another check when parsing a set. The check "offset !=
self.header.length" allows skipping the padding checks when the offset
already equals the set length, avoiding a needless call to
rest_is_padding_zeroes and the wasted CPU time.
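A minimal sketch of the guard, assuming a standalone
rest_is_padding_zeroes helper (in the library this lives on the parser
class; everything beyond the names mentioned above is illustrative):

    def rest_is_padding_zeroes(data: bytes, offset: int, length: int) -> bool:
        """Return True if every byte from offset up to length is zero."""
        return all(byte == 0 for byte in data[offset:length])

    def finish_set(data: bytes, offset: int, length: int) -> None:
        # Fast path: offset == length means no bytes are left in the set,
        # so the padding scan can be skipped entirely.
        if offset != length:
            if not rest_is_padding_zeroes(data, offset, length):
                raise ValueError("Trailing set bytes are not zero padding")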
commit 8d1cf9cac12c45c0af70591b646d898ba5c923fc
Finish IPFIX padding handling
Tested implementation of IPFIX set padding handling. Uses TK-Khaw's
proposed no_padding_last_offset calculation, extended into a modulo
calculation so it matches sets containing multiple data records.
Tests were conducted by capturing live traffic on a test machine with
tcpdump; the capture file was then read by softflowd 1.1.0, with
collector.py as the export target. The exported IPFIX (v10) packets
contained both padded and unpadded sets, so both cases could be
validated.
Closes #34
Signed-off-by: Dominik Pataky <software+pynetflow@dpataky.eu>
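A sketch of the calculation under the assumption of fixed-length data
records; the 4-byte set header size comes from RFC 7011, the function
shape itself is illustrative:

    IPFIX_SET_HEADER_LENGTH = 4  # set ID (2 bytes) + set length (2 bytes)

    def no_padding_last_offset(set_length: int, record_length: int) -> int:
        # Offset (relative to the set start) where the last complete data
        # record ends; everything between it and set_length is padding.
        # The modulo makes this work for any number of records in the set.
        payload = set_length - IPFIX_SET_HEADER_LENGTH
        return IPFIX_SET_HEADER_LENGTH + payload - (payload % record_length)

    # Example: a 40-byte set holding 11-byte records -> records end at
    # offset 37, leaving 3 bytes of padding.
    assert no_padding_last_offset(40, 11) == 37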
commit 51ce4eaa268e4bda5be89e1d430477d12fc8a72c
Fix and optimize padding calculation for IPFIX sets.
Refs #34
commit 9d3c4135385ca9714b7631a0c5af46feb891a9fb
Author: Khaw Teng Kang <tk.khaw@attrelogix.com>
Date: Tue Jul 5 16:29:12 2022 +0800
Reverted changes to template_record; data_length is now computed using the field lengths in the template.
Signed-off-by: Khaw Teng Kang <tk.khaw@attrelogix.com>
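A sketch of that computation; the TemplateField shape is an assumption,
and the field IDs in the example are the IANA sourceIPv4Address (8) and
sourceTransportPort (7):

    from collections import namedtuple

    TemplateField = namedtuple("TemplateField", ["type", "length"])  # assumed shape

    def data_record_length(fields) -> int:
        # data_length is the sum of the field lengths from the template
        return sum(field.length for field in fields)

    assert data_record_length([TemplateField(8, 4), TemplateField(7, 2)]) == 6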
commit 3c4f8e62892876d4a2d42288843890b97244df55
IPFIX: handle padding (zero bytes) in sets
Adds a check to each IPFIX set ID branch, testing whether the remaining
bytes in the set are padding (zeroes).
Refs #34
Signed-off-by: Dominik Pataky <software+pynetflow@dpataky.eu>
Previously, option templates and their data records were not correctly
recognized. This is fixed now. Collectors can now use the
V9ExportPacket.options field to get a list of V9OptionsDataRecord, with
scopes and data fields.
Data templates and option templates are mixed in the same templates
dict. Let's hope exporters do not mix them up (re-use the same IDs for
both template types).
During development, the lookup of the matching template was refactored.
The whole templates dict is no longer passed into V9DataFlowSet. Only
the single matching template is passed into V9DataFlowSet and
V9OptionsDataFlowset, as it should be.
Refs #30
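A sketch of the refactored lookup; V9TemplateNotRecognized is the
exception mentioned below, the helper itself is illustrative:

    class V9TemplateNotRecognized(KeyError):
        """Raised when a FlowSet references a template ID not seen yet."""

    def single_matching_template(templates: dict, flowset_id: int):
        # Instead of the whole templates dict, only the one matching
        # template is handed to V9DataFlowSet / V9OptionsDataFlowset.
        if flowset_id not in templates:
            raise V9TemplateNotRecognized(flowset_id)
        return templates[flowset_id]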
This is a hacky workaround to handle V9 options templates without
implementing the full corresponding spec. It fixes cases where missing
templates raised a V9TemplateNotRecognized exception even though the
exporter was doing everything correctly.
Refs #29
Refs #30
The parse_packet function is one of the main entry points for using
this library in other scripts. It works, but was under-documented until
now. Especially the 'templates' parameter might cause confusion for new
users who have not yet worked with templates. This commit should make
things clearer.
Refs #28
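A sketch of the intended usage, inferred from the parameter described
above (the handling around the call is illustrative):

    import netflow

    templates = {}  # filled as template records arrive, shared across packets

    def handle(payload: bytes):
        # parse_packet consults and updates the templates dict, so data
        # records can be decoded even when their template arrived in an
        # earlier packet.
        export = netflow.parse_packet(payload, templates)
        return export.header.version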
Signals INT and TERM were not correctly handled in the 'while True' loop
of the yielding listener function. Now, the loop breaks as expected,
terminating the listener thread and the application.
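One way such a loop can be made interruptible; this is a sketch of the
idea, not necessarily the exact fix (the queue, event and timeout are
assumptions):

    import queue
    import signal
    import threading

    shutdown = threading.Event()

    def _stop(signum, frame):
        shutdown.set()

    signal.signal(signal.SIGINT, _stop)
    signal.signal(signal.SIGTERM, _stop)

    def get_exports(q: queue.Queue):
        # The timeout lets the loop re-check the event instead of blocking
        # forever in q.get(), so INT/TERM actually break the loop and the
        # listener thread can terminate.
        while not shutdown.is_set():
            try:
                yield q.get(timeout=0.5)
            except queue.Empty:
                continue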
This commit replaces multiple occurrences of newer features that are
not available in Python 3.5.3, the reference backwards-compatibility
version for this package. That version is based on the current Python
version in Debian Stretch (oldstable). According to pkgs.org, all other
distros ship 3.6+, so 3.5.3 is the lower boundary.
Changes:
* Add maxsize argument to functools.lru_cache decorator
* Replace f"" with .format()
* Replace variable type hints "var: type = val" with "# type:" comments
* Replace pstats.SortKey enum with strings in performance tests
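A sketch of the replaced spellings (names are illustrative):

    from functools import lru_cache

    # functools.lru_cache needs an explicit maxsize argument on 3.5:
    @lru_cache(maxsize=None)
    def field_name(field_type):
        # str.format() instead of an f-string:
        return "field_{}".format(field_type)

    # Type comment instead of the 3.6+ annotated assignment "count: int = 0":
    count = 0  # type: int

    # In the performance tests, plain strings replace the pstats.SortKey
    # enum, e.g. stats.sort_stats("cumulative").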
Additionally, various styling fixes were applied.
The version compatibility was tested with tox, pyenv and Python 3.5.3,
but there is no tox.ini yet which automates this test.
Bump patch version number to 0.10.3
Update author's email address.
Resolves #27
Templates may be withdrawn as per RFC 7011. Receiving a template with an
existing template_id and a field_count of 0 now triggers deletion of
this template.
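A sketch of the withdrawal handling, assuming templates are stored in a
plain dict keyed by template_id:

    def handle_template(templates: dict, template_id: int, field_count: int,
                        fields=None):
        # RFC 7011: a template record with a field count of 0 withdraws
        # the template with that ID.
        if field_count == 0:
            templates.pop(template_id, None)
        else:
            templates[template_id] = fields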
Parts of the IPFIXFieldTypes class were extracted into the new
IPFIXDataTypes class, to increase readability and stability.
The IPFIXDataRecord class and its field parser are now more in tune
with the specification, handling signed and unsigned integers as well as
floats, booleans, UTF-8 strings and more.
Corresponding tests were extended with softflowd packets (level
"ethernet") and value checks (e.g. MAC address).
Resolves #25
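A sketch of how such type-aware decoding can look; the table shape and
helper are illustrative, the boolean encoding follows RFC 7011
(1 = true, 2 = false):

    import struct

    DATA_TYPE_FORMATS = {
        ("unsigned8", 1): "B", ("signed8", 1): "b",
        ("unsigned16", 2): "H", ("signed16", 2): "h",
        ("unsigned32", 4): "I", ("signed32", 4): "i",
        ("unsigned64", 8): "Q", ("signed64", 8): "q",
        ("float32", 4): "f", ("float64", 8): "d",
        ("boolean", 1): "B",
    }

    def decode_field(data_type: str, raw: bytes):
        if data_type == "string":
            return raw.decode("utf-8")      # UTF-8 string fields
        fmt = DATA_TYPE_FORMATS.get((data_type, len(raw)))
        if fmt is None:
            return raw                      # fall back to raw bytes
        value = struct.unpack("!" + fmt, raw)[0]
        if data_type == "boolean":
            return value == 1               # 1 = true, 2 = false
        return value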
In IPFIX, template fields can be signed or unsigned, or even pure bytes
or unicode strings. This differentiation was extended in this commit.
Additionally, the IPFIX_FIELD_TYPES dict mapping int -> str was replaced
by a more verbose version which also includes the standardized IANA data
types. The class's methods provide access to this fixed data set, which
is then used in the IPFIXDataRecord parser.
Refs #25
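A sketch of the shape of the more verbose table (the entries are taken
from the IANA registry, the access helper is illustrative):

    # int -> (name, IANA abstract data type) instead of a bare int -> str:
    IPFIX_FIELDS = {
        1: ("octetDeltaCount", "unsigned64"),
        4: ("protocolIdentifier", "unsigned8"),
        8: ("sourceIPv4Address", "ipv4Address"),
        27: ("sourceIPv6Address", "ipv6Address"),
    }

    def by_id(field_id: int):
        # Fixed data set, exposed through class methods in the library.
        return IPFIX_FIELDS.get(field_id)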
The function send_recv_packets in tests stored all processed
ExportPackets by default in a list. Memory usage tests were therefore
based on this high amount of stored objects, since no instance of any
ExportPacket was deleted until exit.
With the new parameter store_packets the caller can define how many
packets should be stored during receiving, to allow testing multiple
scenarios.
Three such scenarios are implemented: store no packets, store a maximum
of 500 at a time, and store all packets. This comes much closer to the
real-world scenario of the collector, which uses a "for export in
listener.get" loop, dumping each new ExportPacket to file immediately
and then deleting the object.
Still, the case where all packets are stored must also be covered,
because the collector might not be the only implementation which uses
listener.get, so memory leaks should remain detectable.
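A sketch of the storing behaviour; the eviction policy when the cap is
reached is an assumption:

    def send_recv_packets(packets, store_packets=-1):
        # -1 stores every packet, 0 stores none, N keeps at most N at a time.
        stored = []
        for export in packets:          # stands in for receive-and-parse
            if store_packets != 0:
                if 0 < store_packets <= len(stored):
                    stored.clear()      # assumed: drop the batch at the cap
                stored.append(export)
        return stored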
Analyzer test was missing imports.
IPFIX template fields of 16 bytes were handled as a special case, since
struct does not natively support converting them to int. The new
implementation still treats them separately, but now uses struct's "s"
unpack format descriptor.
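A sketch of the unpacking, with the "s" descriptor for the 16-byte
case (the surrounding helper is illustrative):

    import struct

    def unpack_int_field(raw: bytes) -> int:
        # struct has no native 16-byte integer format, so such fields
        # (e.g. IPv6 addresses) go through "16s" and int.from_bytes.
        if len(raw) == 16:
            (chunk,) = struct.unpack("!16s", raw)
            return int.from_bytes(chunk, byteorder="big")
        fmt = {1: "!B", 2: "!H", 4: "!I", 8: "!Q"}[len(raw)]
        return struct.unpack(fmt, raw)[0]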
The collector should catch both v9 and IPFIX template errors - a syntax
error was corrected. The v9 ExportPacket.templates attribute is now a
@property and read-only.
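A sketch of the read-only property (the class internals are
illustrative):

    class V9ExportPacket:
        def __init__(self, templates):
            self._templates = templates

        @property
        def templates(self):
            # No setter is defined, so the attribute cannot be reassigned.
            return self._templates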
The tests are now located in tests/. They are also split into multiple
files, beginning with test_netflow and test_analyzer. The tests for
IPFIX will be added to test_ipfix.
Python struct does not natively support 16-byte fields. But since IPFIX
uses 16-byte fields for at least IPv6 addresses, the IPFIX parser must
be able to process them. This commit adds support for 16-byte fields by
handling them as special struct.unpack cases.
At different points in the tool set, NetFlow (v9) was set as the
default case. Now that IPFIX is on its way to being supported as well,
all occurrences where a differentiation must be made were adapted.
Second half of the IPFIX implementation now adds the support for data
records. The templates are also extracted, allowing the collector to use
them across exports.
The field types were extracted from the IANA assignment list at
https://www.iana.org/assignments/ipfix/ipfix-information-elements.csv
Please note that the IPFIX implementation was made from scratch and
differs from the NetFlow v9 implementation, as there was little
copy/paste.
Adds a new module, IPFIX. The collector already recognizes version 10
in the header, meaning IPFIX. The parser is able to dissect the export
packet and all sets with their headers.
Still missing is the handling of templates in the data sets, a feature
needed for the whole parsing process to complete.
The collector is able to parse templates in an export and then use
these templates to parse dataflows inside the same export packet. But
the test implementation was based on the assumption that the templates
always arrive first in the packet. Now, a mixed order is also processed
successfully. Test included.
To get closer to a stable package, netflow now offers the parse_packet
function in its top-level __init__ file. This function was also enhanced
to handle multiple input formats (str, bytes, hex bytes).
Updated README accordingly.
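A sketch of such input normalization; the helper name is illustrative:

    def _to_bytes(data):
        # Raw bytes pass through; str input is interpreted as hex.
        if isinstance(data, bytes):
            return data
        if isinstance(data, str):
            return bytes.fromhex(data)
        raise ValueError("Cannot handle data of type {}".format(type(data)))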
Until now, the V1DataFlow and V5DataFlow classes unpacked the byte
stream into their specific fields in a verbose way. With this commit,
both use a list of field names, a single struct.unpack call and a
mapping for-loop for each field.
Additionally, the upper boundary of the passed data slice was added.
With the self.__dict__.update() call all fields are now also accessible
as direct attributes of the corresponding instance, e.g. flow.PROTO to
access flow.data["PROTO"]. This works for flows of all three versions.
The tests were adapted to reflect this new implementation.
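A condensed sketch of the pattern (field list truncated, format string
illustrative):

    import struct

    V1_FIELDS = ["IPV4_SRC_ADDR", "IPV4_DST_ADDR", "NEXT_HOP"]  # truncated

    class V1DataFlow:
        LENGTH = 12  # upper boundary of the data slice for these fields

        def __init__(self, data):
            # One struct.unpack call over a bounded slice, then a mapping
            # loop from field names to the unpacked values.
            values = struct.unpack("!III", data[:self.LENGTH])
            self.data = dict(zip(V1_FIELDS, values))
            # Fields become direct attributes too, e.g. flow.NEXT_HOP
            # instead of flow.data["NEXT_HOP"]:
            self.__dict__.update(self.data)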
Beginning with this commit, the reference implementations of the
collector and analyzer are now included in the package. They are
callable by running `python3 -m netflow.collector` or `.analyzer`, with
the same flags as before. Use `-h` to list them.
Additional fixes are contained in this commit as well, e.g. adding more
version prefixes and moving parts of code from __init__ to utils, to fix
circular imports.
Until now, every NetFlow version file used similar names for their
classes, e.g. "Header". These are now prefixed with their respective
version, e.g. "V1Header", to avoid confusion in imports etc.
The README and setup.py were adapted to the current state, preparing for
PyPI upload and package info.
In v9, the header received an additional .json property, which exports
the header as a dict to allow JSON serialization in the export file.
This export is used in main.py
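A sketch of such a property (the header fields shown are illustrative):

    class V9Header:
        def __init__(self, version, count):
            self.version = version
            self.count = count

        @property
        def json(self):
            # Dict form of the header, ready for json.dumps() when
            # writing the export file.
            return {"version": self.version, "count": self.count}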
Previously, the analyzer assumed that two consecutive flows form a
pair. This proved unreliable, so a new comparison algorithm is used. It
utilizes the IP addresses and the 'first_switched' parameter to identify
two flows of the same connection.
More improvements can be done, especially filtering and in the
identification of the initiating peer.
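A sketch of such a pairing key; the record field names follow common
NetFlow v9 naming and are assumptions here:

    def pairing_key(flow):
        # Two flows of one connection share their endpoints (in either
        # direction) and the same 'first_switched' timestamp, so the
        # unordered address pair plus that timestamp identifies the pair.
        addrs = frozenset((flow["IPV4_SRC_ADDR"], flow["IPV4_DST_ADDR"]))
        return (addrs, flow["FIRST_SWITCHED"])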
Tests still fail; they have to be adapted to the new dicts and gzip.