Commit graph

17 commits

Dominik Pataky c3da0b2096 Adapt utils, collector, analyzer to IPFIX
At different points in the tool set, NetFlow (v9) is set as the default
case. Now that IPFIX is on its way to being supported as well, adapt all
occurrences where the two formats must be differentiated.
2020-03-31 22:47:23 +02:00
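The differentiation itself hinges on the first two bytes of every export
packet, which carry the version number. A minimal sketch of such a
dispatch, purely illustrative and not the tool set's actual code:

```python
import struct

def dispatch_export(data: bytes):
    """Dispatch on the version field shared by all supported export headers."""
    (version,) = struct.unpack("!H", data[:2])  # first two bytes of the header
    if version in (1, 5, 9):
        ...  # hand off to the NetFlow v1/v5/v9 parsers
    elif version == 10:  # version 10 means IPFIX (RFC 7011)
        ...  # hand off to the IPFIX parser
    else:
        raise ValueError(f"Unsupported export version {version}")
```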
Dominik Pataky 937e640198 IPFIX: implement data records and template handling; add IANA types
The second half of the IPFIX implementation adds support for data
records. The templates are also extracted, allowing the collector to
reuse them across exports.

The field types were extracted from the IANA assignment list at
https://www.iana.org/assignments/ipfix/ipfix-information-elements.csv

Please note that the IPFIX implementation was written from scratch and
differs from the NetFlow v9 implementation; there was little
copy/paste.
2020-03-31 22:45:58 +02:00
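Conceptually, the template handling works like this: template sets are
parsed into a cache keyed by template ID, and data sets (set IDs 256 and
up) are sliced into records using the cached field lengths. A simplified
sketch that ignores enterprise-specific and variable-length fields (the
names are illustrative, not the package's API):

```python
import struct

# template ID -> list of (field_type, field_length), kept across exports
templates: dict = {}

def parse_template_set(payload: bytes) -> None:
    """Cache template records (set ID 2) by their template ID."""
    offset = 0
    while offset + 4 <= len(payload):
        template_id, field_count = struct.unpack_from("!HH", payload, offset)
        offset += 4
        fields = []
        for _ in range(field_count):
            field_type, field_length = struct.unpack_from("!HH", payload, offset)
            fields.append((field_type, field_length))
            offset += 4
        templates[template_id] = fields  # reused for data sets in later exports

def parse_data_set(set_id: int, payload: bytes) -> list:
    """Slice a data set into records using a previously cached template."""
    fields = templates[set_id]  # raises KeyError if the template is unknown
    record_length = sum(length for _, length in fields)
    records, offset = [], 0
    while offset + record_length <= len(payload):
        record, pos = {}, offset
        for field_type, length in fields:
            record[field_type] = payload[pos:pos + length]
            pos += length
        records.append(record)
        offset += record_length
    return records
```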
Dominik Pataky 524e411850 Add first approach of IPFIX implementation
Adds a new module, IPFIX. The collector already recognizes version 10
in the header, which identifies IPFIX. The parser is able to dissect the
export packet and all sets with their headers.

Still missing is the handling of templates, which is needed to parse
the data sets and complete the whole parsing process.
2020-03-31 20:58:15 +02:00
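For reference, the message header is a fixed 16 bytes and each set
starts with a 4-byte header, so the dissection boils down to a few
struct.unpack calls. A minimal sketch along these lines (layout per
RFC 7011, code illustrative):

```python
import struct

def dissect_ipfix(data: bytes):
    """Split an IPFIX message into its header fields and raw sets."""
    version, length, export_time, sequence, domain_id = \
        struct.unpack("!HHIII", data[:16])
    assert version == 10  # version 10 in the header means IPFIX
    sets, offset = [], 16
    while offset < length:
        set_id, set_length = struct.unpack("!HH", data[offset:offset + 4])
        sets.append((set_id, data[offset + 4:offset + set_length]))
        offset += set_length
    header = {"export_time": export_time, "sequence": sequence,
              "observation_domain_id": domain_id}
    return header, sets
```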
Dominik Pataky 0358c3416c Fix logger in collector; fix header dates 2020-03-31 16:28:33 +02:00
Dominik Pataky cd07885d28 Improve handling of mixed template/data exports; add test
The collector is able to parse templates in an export and then use
these templates to parse data flows inside the same export packet. But
the implementation was based on the assumption that the templates always
arrive first in the packet. Now a mixed order is also processed
successfully. Test included.
2020-03-30 16:42:48 +02:00
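One way to process a mixed order is a two-pass approach: parse template
sets as they appear, defer data sets whose template is still unknown,
and parse the deferred ones at the end of the packet. A sketch of the
idea (the two parser callbacks stand in for the real parsers; not the
collector's actual code):

```python
def parse_export(sets, templates, parse_template_set, parse_data_set):
    """Two-pass parse so data FlowSets may precede their templates."""
    deferred = []
    for set_id, payload in sets:
        if set_id in (0, 1):          # NetFlow v9 (options) template FlowSets
            parse_template_set(payload, templates)
        elif set_id in templates:     # template already known: parse directly
            parse_data_set(set_id, payload, templates)
        else:                         # template comes later in this packet
            deferred.append((set_id, payload))
    for set_id, payload in deferred:  # second pass, templates now known
        parse_data_set(set_id, payload, templates)
```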
Dominik Pataky d4d6d59713 Provide parse_packet as API; fix parse_packet input handling; README
To get closer to a stable package, netflow now offers the parse_packet
function in its top-level __init__ file. This function was also enhanced
to handle multiple input formats (str, bytes, hex bytes).

Updated README accordingly.
2020-03-30 13:04:25 +02:00
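Usage of the top-level function then looks roughly like this; the exact
signature and the attributes of the returned object may have changed
since this commit, so treat this as a sketch:

```python
import netflow

def handle_export(payload):
    """`payload` may be a bytes object or a hex string of one export packet."""
    export = netflow.parse_packet(payload)  # top-level API added in this commit
    return export.header.version            # attribute names are assumptions
```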
Dominik Pataky 7ae179cb33 Reformat data flow attributes and unpacking; adapt tests
Until now, the V1DataFlow and V5DataFlow classes used a verbose way of
unpacking the byte stream into the specific fields. With this commit,
both use a list of field names, a single struct.unpack call and a
mapping for-loop over the fields.

Additionally, an upper boundary was added to the passed data slice.

With the self.__dict__.update() call all fields are now also accessible
as direct attributes of the corresponding instance, e.g. flow.PROTO to
access flow.data["PROTO"]. This works for flows of all three versions.

The tests were adapted to reflect this new implementation.
2020-03-30 12:29:50 +02:00
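The described pattern, sketched with an illustrative subset of the v5
fields (not the exact wire layout):

```python
import struct

class V5DataFlow:
    # Field names in wire order, paired with one struct format string
    FIELDS = ["IP_SRC_ADDR", "IP_DST_ADDR", "NEXT_HOP", "INPUT", "OUTPUT",
              "IN_PACKETS", "IN_OCTETS", "PROTO"]
    FORMAT = "!IIIHHIIB"  # one struct.unpack call instead of per-field parsing

    def __init__(self, data: bytes):
        self.length = struct.calcsize(self.FORMAT)
        # Upper boundary on the slice, as mentioned in the commit message
        values = struct.unpack(self.FORMAT, data[:self.length])
        self.data = {}
        for key, value in zip(self.FIELDS, values):  # mapping for-loop
            self.data[key] = value
        # Fields become direct attributes, e.g. flow.PROTO == flow.data["PROTO"]
        self.__dict__.update(self.data)
```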
Dominik Pataky 8b70fb1058 Fix to_dict() in headers; formatting
The collector uses the .to_dict() function to persist the header in its
gzipped output file. Now all headers implement this function.
2020-03-29 23:17:05 +02:00
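A sketch of why every header needs to_dict(): the collector serializes
header and flows as one JSON line per export into a gzipped file
(illustrative, not the exact collector code):

```python
import gzip
import json
import time

def persist(header, flows, filename="export.gz"):
    """Append one export as a JSON line to a gzipped output file."""
    entry = {
        "time": time.time(),
        "header": header.to_dict(),        # now works for v1/v5/v9 headers
        "flows": [flow.data for flow in flows],
    }
    with gzip.open(filename, "at") as fh:  # "at" = append in text mode
        fh.write(json.dumps(entry) + "\n")
```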
Dominik Pataky 4a90e0ce34 Update README, bump minor version to v0.9.0 2020-03-29 22:34:30 +02:00
Dominik Pataky abce1f57dd Move collector and analyzer into the package, callable via CLI
Beginning with this commit, the reference implementations of the
collector and analyzer are now included in the package. They are
callable by running `python3 -m netflow.collector` or `.analyzer`, with
the same flags as before. Use `-h` to list them.

This commit also contains additional fixes, e.g. adding more version
prefixes and moving parts of the code from __init__ to utils to fix
circular imports.
2020-03-29 22:14:45 +02:00
Dominik Pataky e8073013c1 Rename classes in v1, v5 and v9 according to version
Until now, every NetFlow version file used similar names for its
classes, e.g. "Header". These are now prefixed with their respective
version, e.g. "V1Header", to avoid confusion in imports etc.
2020-03-29 19:49:57 +02:00
Dominik Pataky 5fd4e9bd24 Update README/setup.py; add .json property to v9 header for export
The README and setup.py were adapted to the current state, preparing for
PyPI upload and package info.

In v9, the header received an additional .json property, which exports
the header as a dict to allow JSON serialization in the export file.
This export is used in main.py.
2020-03-29 18:06:52 +02:00
Dominik Pataky 61439ec6ef Improve analyzer (handling of pairs, dropping noise)
Previously, the analyzer assumed that two consecutive flows would be a
pair. This proved unreliable, therefore a new comparison algorithm is
used. It utilizes the IP addresses and the 'first_switched' parameter
to identify two flows of the same connection.

More improvements can be made, especially in filtering and in the
identification of the initiating peer.

Tests still fail; they have to be adapted to the new dicts and gzip
output.
2019-11-03 15:58:40 +01:00
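A sketch of such a pairing heuristic: two flows belong to the same
connection if their address pairs mirror each other and their
'first_switched' values are close. Field names and the threshold below
are assumptions, not the analyzer's exact code:

```python
def find_pair(flow: dict, candidates: list, max_delta: int = 1000):
    """Return the reverse flow of `flow`, or None if no candidate matches.

    FIRST_SWITCHED is the system uptime in milliseconds at flow start.
    """
    for other in candidates:
        if (other["IPV4_SRC_ADDR"] == flow["IPV4_DST_ADDR"]
                and other["IPV4_DST_ADDR"] == flow["IPV4_SRC_ADDR"]
                and abs(other["FIRST_SWITCHED"] - flow["FIRST_SWITCHED"]) <= max_delta):
            return other
    return None
```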
Dominik Pataky 1646a52f17 Store IP addresses (v4 + v6) as strings rather than ints
As mentioned by @pR0Ps in 6b9d20c8a6/analyze_json.py (L83), IP
addresses, especially IPv6 ones, are better stored as parsed strings
than as their raw integer values. Implemented accordingly.
2019-11-03 13:35:32 +01:00
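The conversion is a one-liner with the standard library's ipaddress
module, which handles both address families:

```python
import ipaddress

# Raw integer values as carried in the flow records ...
v4_int = int(ipaddress.IPv4Address("10.0.0.1"))
v6_int = int(ipaddress.IPv6Address("2001:db8::1"))

# ... are stored as human-readable, parseable strings instead
print(ipaddress.ip_address(v4_int))  # 10.0.0.1
print(ipaddress.ip_address(v6_int))  # 2001:db8::1
```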
Dominik Pataky bfec3953e6 Bump version, fix small errors, decrease packet num in tests 2019-10-31 17:35:15 +01:00
Carey Metcalfe 96817f1f8d Add support for v1 and v5 NetFlow packets
Thanks to @alieissa for the initial v1 and v5 code
2019-10-16 23:46:32 -04:00
Carey Metcalfe ef151f8d28 Improve collector script and restructure code
- Moved the netflow library out of the src directory
- The UDP listener was restructured so that multiple threads can receive
  packets and push them into a queue. The main thread then pulls the
  packets off the queue one at a time and processes them (see the sketch
  below). This means that the collector will never drop a packet because
  it was blocked on processing the previous one.
- Adds a property to the ExportPacket class that exposes whether any new
  templates are contained in it.
- The collector will now only retry parsing past packets when a new
  template is found. Also refactored the retry logic a bit to remove
  duplicate code (retrying just pushes the packets back into the main
  queue to be processed again like all the other packets).
- The collector no longer continually reads and writes to/from the disk.
  It just caches the data in memory until it exits instead.
2019-10-16 23:31:39 -04:00
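A condensed sketch of the receiver/worker split and the retry behaviour
described above; `parse` and the `contains_new_templates` attribute
stand in for the real parser and the new ExportPacket property:

```python
import queue
import socket
import threading

def run_collector(parse, host="0.0.0.0", port=2055, receivers=2):
    """Receiver threads only enqueue datagrams; the main loop parses them.

    `parse(payload, templates)` is a stand-in that raises KeyError while
    a needed template is still unknown.
    """
    packet_queue: queue.Queue = queue.Queue()
    templates: dict = {}
    to_retry: list = []

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))

    def receiver():
        while True:  # receive and enqueue, never block on parsing
            payload, _addr = sock.recvfrom(4096)
            packet_queue.put(payload)

    for _ in range(receivers):  # multiple threads can feed the queue
        threading.Thread(target=receiver, daemon=True).start()

    while True:  # main thread pulls packets off the queue one at a time
        payload = packet_queue.get()
        try:
            export = parse(payload, templates)
        except KeyError:        # template not yet known, keep for later
            to_retry.append(payload)
            continue
        if export.contains_new_templates and to_retry:
            for old in to_retry:  # re-queued like all the other packets
                packet_queue.put(old)
            to_retry.clear()
```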