#!/usr/bin/env python3
"""
Netflow V5 collector and parser implementation in Python 3.

This file belongs to https://github.com/bitkeks/python-netflow-v9-softflowd.

Created purely for fun. Not battled tested nor will it be.

Reference: https://www.cisco.com/c/en/us/td/docs/net_mgmt/netflow_collection_engine/3-6/user/guide/format.html

This script is specifically implemented in combination with softflowd. See https://github.com/djmdjm/softflowd
"""
import struct
__all__ = ["V5DataFlow", "V5ExportPacket", "V5Header"]
class V5DataFlow:
    """A single NetFlow v5 flow record (DataRecord).

    Unpacks one fixed-size 48-byte record into ``self.data`` (a dict keyed
    by field name, in wire order) and mirrors every field as an instance
    attribute for convenient access.
    """

    # Size of one v5 record in bytes, fixed by the protocol.
    length = 48

    # Field names in wire order. The 'x' pad bytes in the struct format
    # (byte 36 and the trailing word) carry no data, so they have no name.
    _FIELDS = (
        'IPV4_SRC_ADDR',
        'IPV4_DST_ADDR',
        'NEXT_HOP',
        'INPUT',
        'OUTPUT',
        'IN_PACKETS',
        'IN_OCTETS',
        'FIRST_SWITCHED',
        'LAST_SWITCHED',
        'SRC_PORT',
        'DST_PORT',
        'TCP_FLAGS',
        'PROTO',
        'TOS',
        'SRC_AS',
        'DST_AS',
        'SRC_MASK',
        'DST_MASK',
    )

    def __init__(self, data):
        # Network byte order ('!'); pad bytes skip the unused positions.
        unpacked = struct.unpack("!IIIHHIIIIHHxBBBHHBBxx", data)
        self.data = dict(zip(self._FIELDS, unpacked))
        # Make the data dict entries accessible as object attributes too.
        self.__dict__.update(self.data)

    def __repr__(self):
        return "<DataRecord with data {}>".format(self.data)
class V5Header:
    """The fixed 24-byte header at the start of a V5ExportPacket."""

    # Header size in bytes, fixed by the protocol.
    length = 24

    def __init__(self, data):
        # Only the first `length` bytes belong to the header; anything
        # after that is flow-record payload. Network byte order.
        (self.version,
         self.count,
         self.uptime,
         self.timestamp,
         self.timestamp_nano,
         self.sequence,
         self.engine_type,
         self.engine_id,
         self.sampling_interval) = struct.unpack('!HHIIIIBBH', data[:self.length])

    def to_dict(self):
        """Return all header fields as a plain dict."""
        return self.__dict__
class V5ExportPacket:
    """A complete v5 export packet: the header plus its flow records."""

    def __init__(self, data):
        self.header = V5Header(data)
        self.flows = []

        # Flow records follow the header back-to-back; the header's
        # `count` field says how many 48-byte records to expect.
        offset = self.header.length
        for _ in range(self.header.count):
            record = V5DataFlow(data[offset:offset + V5DataFlow.length])
            self.flows.append(record)
            offset += record.length

    def __repr__(self):
        return "<ExportPacket v{} with {} records>".format(
            self.header.version, self.header.count)