28 commits

Author SHA1 Message Date
Dominik Pataky 7ea24a900c Small addition to grafolean fix (comments, endianness hint) 2022-07-02 11:50:18 +02:00
Anze 92b221aa10 Fix: f-strings might not be supported 2022-05-08 22:01:34 +02:00
Anze 1bffe3a2a3 Performance improvement: rearrange netflow v9 packet parsing (use struct.unpack to extract all of the values at once) 2022-05-08 18:31:06 +02:00
Anze c12507343b Performance improvement: no need to copy a part of the buffer when using struct.unpack_from() 2022-05-08 18:30:11 +02:00
Anze 77da7b16b6 Performance improvement: use struct.unpack instead of manually constructing bytes when possible 2022-05-08 17:54:05 +02:00
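The struct-related commits above can be illustrated with a minimal sketch. The record layout, the `RECORD` name and both helper functions are assumptions made for illustration and are not taken from the package; the sketch only contrasts slicing a buffer before unpacking with reading in place via `unpack_from()`.

```python
import struct

# Hypothetical 8-byte record in network byte order:
# two unsigned 16-bit fields followed by one unsigned 32-bit field.
RECORD = struct.Struct("!HHI")


def parse_with_copy(buf, offset):
    # Slicing a bytes object creates a new bytes object before unpacking.
    chunk = buf[offset:offset + RECORD.size]
    return RECORD.unpack(chunk)


def parse_without_copy(buf, offset):
    # unpack_from() reads directly from the buffer at the given offset.
    return RECORD.unpack_from(buf, offset)


# Two padding bytes followed by one record.
packet = b"\x00\x00" + struct.pack("!HHI", 9, 2, 1000)
```

Both helpers return the same tuple for the same input; the second one simply skips the intermediate slice.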
Anze b10dc5faef Performance improvement: rearrange code so that instead of converting IP addresses to integers first, we construct them from bytes directly 2022-05-08 17:52:51 +02:00
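As a rough sketch of the idea in the commit above, the standard library's `ipaddress.ip_address()` accepts packed bytes as well as integers, so the integer detour can be skipped. The sample address and variable names are illustrative assumptions; the package's actual code is not shown in this log.

```python
import ipaddress
import struct

raw = bytes([192, 0, 2, 1])  # four packed bytes of an IPv4 address

# Detour: unpack to an integer first, then build the address from it.
as_int = struct.unpack("!I", raw)[0]
via_int = str(ipaddress.ip_address(as_int))

# Direct: build the address from the packed bytes.
direct = str(ipaddress.ip_address(raw))
```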
Anze ef99464fc5 Performance improvement: when checking if a field contains an IP address, compare the keys (which are integers) instead of values (strings) 2022-05-08 17:51:56 +02:00
Dominik Pataky 8b5675913d Small changes to PR #37 preventing infinite loops; bump version
Closes #37
2022-04-25 20:26:04 +02:00
Vitali Sepetnitsky b8e911a40a avoid infinite loop in V9ExportPacket's constructor 2022-02-16 18:39:15 +02:00
Dominik Pataky ab32ce93b5 Fix counters in options templates
Counters in 4-packs used '/ 4' instead of '// 4', passing a float into
range(), instead of int.

Refs #30
2021-05-02 15:48:20 +02:00
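The difference between the two operators described in the commit above can be reproduced in isolation. The `length` value is a made-up example and not taken from the package.

```python
length = 12  # hypothetical byte length of a section made of 4-byte entries

count_float = length / 4   # true division always yields a float
count_int = length // 4    # floor division of two ints yields an int

try:
    range(count_float)
    error = None
except TypeError as exc:
    # range() refuses float arguments
    error = str(exc)

entries = list(range(count_int))
```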
Dominik Pataky 5adde00aec Implement options templates/records handling for V9
Previously, option templates and their data records were not correctly
recognized. This is fixed now. Collectors can now use the
V9ExportPacket.options field to get a list of V9OptionsDataRecord, with
scopes and data fields.

Templates are mixed in the templates dict. They will have both data
templates and option templates. Let's hope exporters do not mix them
(re-use the same IDs for both template types).

During development, the search for the correct template was refactored.
The templates are not passed into the V9DataFlowSet any more. Only the
one single matching template is passed into V9DataFlowSet and
V9OptionsDataFlowset, as should be.

Refs #30
2021-04-05 13:07:32 +02:00
Dominik Pataky e43980fe4a Add stub implementation to store V9 options templates
This is a hacky workaround to handle V9 options templates, without
implementing the full corresponding spec. This solves missing templates
which raise a V9TemplateNotRecognized exception, even though an exporter
might do everything correctly.

Refs #29
Refs #30
2021-04-04 20:42:49 +02:00
Dominik Pataky 54e19af8c2 Adapt new V9OptionsTemplateFlowSet stub
Resolves #29
2021-04-04 10:35:08 +02:00
Jonas Licht 5b823052f1 Stub parsing of option templates so that option datasets can be ignored 2021-03-26 16:46:27 +01:00
Dominik Pataky 5cdb514ffc Ensure compatibility with Python 3.5.3
This commit replaces multiple occurrences of newer language features
which are not yet available in Python 3.5.3, which is the reference
backwards compatibility version for this package. The version is based
on the current Python version in Debian Stretch (oldstable). According
to pkgs.org, all other distros use 3.6+, so 3.5.3 is the lower boundary.

Changes:
  * Add maxsize argument to functools.lru_cache decorator
  * Replace f"" with .format()
  * Replace variable type hints "var: type = val" with "# type:" comments
  * Replace pstats.SortKey enum with strings in performance tests

Additionally, various styling fixes were applied.
The version compatibility was tested with tox, pyenv and Python 3.5.3,
but there is no tox.ini yet which automates this test.

Bump patch version number to 0.10.3
Update author's email address.

Resolves #27
2020-04-24 16:52:25 +02:00
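The first three items of the change list above can be sketched as follows. The function and variable names are hypothetical and the package's original code is not shown in this log; the comments rely on f-strings and variable annotations having arrived in Python 3.6, and on the bare `@lru_cache` form (without parentheses) being accepted from Python 3.8.

```python
import functools


# Calling the decorator with an explicit maxsize works on old versions;
# the bare "@functools.lru_cache" form needs Python 3.8 or later.
@functools.lru_cache(maxsize=128)
def field_label(field_id):
    # str.format() instead of f"field {field_id}"
    return "field {}".format(field_id)


# Type comment instead of the annotation syntax "counter: int = 0"
counter = 0  # type: int
```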
Dominik Pataky 143986c38d Fix multi-exception catch in collector; make templates @property in v9
The collector should catch both v9 and IPFIX template errors - syntax
error corrected. The v9 ExportPacket.templates attribute is now
@property and read-only.
2020-04-01 14:12:27 +02:00
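The log does not show the faulty syntax, so the following is only a sketch of the two standard patterns the commit above refers to: catching several exception types with a parenthesized tuple, and exposing an attribute as a read-only `@property`. All class and function names here are hypothetical stand-ins, not the package's own.

```python
class DemoV9TemplateError(Exception):
    """Hypothetical stand-in for a v9 template error."""


class DemoIPFIXTemplateError(Exception):
    """Hypothetical stand-in for an IPFIX template error."""


def try_parse(exc_cls):
    try:
        raise exc_cls("missing template")
    except (DemoV9TemplateError, DemoIPFIXTemplateError):
        # Several exception types are caught with a tuple.
        return "deferred"


class DemoPacket:
    def __init__(self, templates):
        self._templates = templates

    @property
    def templates(self):
        # No setter is defined, so assigning to .templates raises AttributeError.
        return self._templates


packet = DemoPacket({256: "template"})
try:
    packet.templates = {}
    assignment_failed = False
except AttributeError:
    assignment_failed = True
```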
Dominik Pataky c3da0b2096 Adapt utils, collector, analyzer to IPFIX
At different points in the tool set, NetFlow (v9) is set as the default
case. Now that IPFIX is on its way to be supported as well, adapt all
occurrences where a differentiation must be done.
2020-03-31 22:47:23 +02:00
Dominik Pataky 0358c3416c Fix logger in collector; fix header dates 2020-03-31 16:28:33 +02:00
Dominik Pataky cd07885d28 Improve handling of mixed template/data exports; add test
The collector is able to parse templates in an export and then use these
templates to parse dataflows inside the same export packet. But the test
implementation assumed that the templates always arrive first in the
packet. Now, a mixed order is also processed successfully. Test
included.
2020-03-30 16:42:48 +02:00
Dominik Pataky 7ae179cb33 Reformat data flow attributes and unpacking; adapt tests
The V1DataFlow and V5DataFlow classes used a verbose way of unpacking
the hex byte stream to the specific fields until now. With this commit,
both use a list of field names, one struct.unpack call and then a
mapping for-loop for each field.

Additionally the upper boundary of the passed data slice was added.

With the self.__dict__.update() call all fields are now also accessible
as direct attributes of the corresponding instance, e.g. flow.PROTO to
access flow.data["PROTO"]. This works for flows of all three versions.

The tests were adapted to reflect this new implementation.
2020-03-30 12:29:50 +02:00
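The pattern described in the commit above (a list of field names, one `struct.unpack` call, a mapping loop and `self.__dict__.update()`) might look roughly like this. `DemoFlow`, its three fields and the format string are assumptions for illustration and do not reproduce the real V1DataFlow or V5DataFlow layouts.

```python
import struct


class DemoFlow:
    # Hypothetical field list and layout, network byte order.
    FIELDS = ["SRC_PORT", "DST_PORT", "PROTO"]
    FORMAT = "!HHB"
    LENGTH = struct.calcsize(FORMAT)

    def __init__(self, data):
        # One unpack call over a slice with an explicit upper boundary.
        values = struct.unpack(self.FORMAT, data[:self.LENGTH])
        self.data = {}
        for name, value in zip(self.FIELDS, values):
            self.data[name] = value
        # Expose every field as a direct attribute as well.
        self.__dict__.update(self.data)


flow = DemoFlow(struct.pack("!HHB", 443, 51000, 6) + b"trailing bytes")
```

With this, `flow.PROTO` and `flow.data["PROTO"]` refer to the same value.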
Dominik Pataky 8b70fb1058 Fix to_dict() in headers; formatting
The collector uses the .to_dict() function to persist the header in its
gzipped output file. Now all headers implement this function.
2020-03-29 23:17:05 +02:00
Dominik Pataky abce1f57dd Move collector and analyzer into the package, callable via CLI
Beginning with this commit, the reference implementations of the
collector and analyzer are now included in the package. They are
callable by running `python3 -m netflow.collector` or `.analyzer`, with
the same flags as before. Use `-h` to list them.

Additional fixes are contained in this commit as well, e.g. adding more
version prefixes and moving parts of code from __init__ to utils, to fix
circular imports.
2020-03-29 22:14:45 +02:00
Dominik Pataky e8073013c1 Rename classes in v1, v5 and v9 according to version
Until now, every NetFlow version file used similar names for their
classes, e.g. "Header". These are now prefixed with their respective
version, e.g. "V1Header", to avoid confusion in imports etc.
2020-03-29 19:49:57 +02:00
Dominik Pataky 5fd4e9bd24 Update README/setup.py; add .json property to v9 header for export
The README and setup.py were adapted to the current state, preparing for
PyPI upload and package info.

In v9, the header received an additional .json property, which exports
the header as a dict to allow JSON serialization in the export file.
This export is used in main.py
2020-03-29 18:06:52 +02:00
Dominik Pataky 61439ec6ef Improve analyzer (handling of pairs, dropping noise)
Previously, the analyzer assumed that two consecutive flows would be a
pair. This proved unreliable, therefore a new comparison algorithm is
used. It utilizes the IP addresses and the 'first_switched' parameter
to identify two flows of the same connection.

More improvements can be made, especially in filtering and in the
identification of the initiating peer.

Tests still fail, have to be adapted to the new dicts and gzip.
2019-11-03 15:58:40 +01:00
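The analyzer's actual comparison is not shown in this log. As a loose sketch of the idea only, two flows could be treated as a pair when their source and destination addresses are mirrored and their `first_switched` values are close. The dict keys `src` and `dst`, the `tolerance` parameter and the function name are assumptions made for this example.

```python
def find_pairs(flows, tolerance=50):
    """Return (pairs, unmatched) for a list of flow dicts."""
    pairs = []
    pending = []
    for flow in flows:
        for other in pending:
            mirrored = (flow["src"] == other["dst"]
                        and flow["dst"] == other["src"])
            close = abs(flow["first_switched"]
                        - other["first_switched"]) <= tolerance
            if mirrored and close:
                pending.remove(other)  # safe: the loop is left right after
                pairs.append((other, flow))
                break
        else:
            # No partner seen so far; keep the flow for later candidates.
            pending.append(flow)
    return pairs, pending


flows = [
    {"src": "192.0.2.1", "dst": "198.51.100.7", "first_switched": 1000},
    {"src": "203.0.113.5", "dst": "192.0.2.1", "first_switched": 1005},
    {"src": "198.51.100.7", "dst": "192.0.2.1", "first_switched": 1010},
]
pairs, unmatched = find_pairs(flows)
```

Here the first and third flow form a pair even though they are not consecutive, and the second one stays unmatched.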
Dominik Pataky 1646a52f17 Store IP addresses (v4 + v6) as strings rather than ints
As mentioned by @pR0Ps in 6b9d20c8a6/analyze_json.py (L83)
IP addresses, especially in IPv6, should better be stored as parsed
strings instead of their raw integer values. Implemented.
2019-11-03 13:35:32 +01:00
Carey Metcalfe 96817f1f8d Add support for v1 and v5 NetFlow packets
Thanks to @alieissa for the initial v1 and v5 code
2019-10-16 23:46:32 -04:00
Carey Metcalfe ef151f8d28 Improve collector script and restructure code
- Moved the netflow library out of the src directory
- The UDP listener was restructured so that multiple threads can receive
  packets and push them into a queue. The main thread then pulls the
  packets off the queue one at a time and processes them. This means
  that the collector will never drop a packet because it was blocked on
  processing the previous one.
- Adds a property to the ExportPacket class to expose if any new
  templates are contained in it.
- The collector will now only retry parsing past packets when a new
  template is found. Also refactored the retry logic a bit to remove
  duplicate code (retrying just pushes the packets back into the main
  queue to be processed again like all the other packets).
- The collector no longer continually reads and writes to/from the disk.
  It just caches the data in memory until it exits instead.
2019-10-16 23:31:39 -04:00
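The listener structure described in the commit above can be sketched with `threading` and `queue.Queue`. This is not the collector's code: plain lists of bytes stand in for UDP sockets so that the example runs without a network, and the `DONE` sentinel, `receiver` and `collect` names are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of one receiver's input


def receiver(source, packets):
    # In a real listener this loop would block on a socket read instead.
    for datagram in source:
        packets.put(datagram)
    packets.put(DONE)


def collect(sources):
    packets = queue.Queue()
    threads = [threading.Thread(target=receiver, args=(source, packets))
               for source in sources]
    for thread in threads:
        thread.start()

    processed = []
    finished = 0
    # The main thread pulls items off the queue one at a time.
    while finished < len(threads):
        item = packets.get()
        if item is DONE:
            finished += 1
            continue
        processed.append(item)  # placeholder for the actual parsing

    for thread in threads:
        thread.join()
    return processed


result = collect([[b"a", b"b"], [b"c"]])
```

Because the receivers only enqueue, slow processing in the main thread does not stop them from accepting further input; the order of items across receivers is not guaranteed.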
Renamed from src/netflow/collector_v9.py