Until now, the file for each NetFlow version used similar names for its
classes, e.g. "Header". These are now prefixed with their respective
version, e.g. "V1Header", to avoid confusion in imports and elsewhere.
The README and setup.py were adapted to the current state, preparing for
PyPI upload and package info.
In v9, the header received an additional .json property, which exports
the header as a dict to allow JSON serialization in the export file.
This export is used in main.py.
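A minimal sketch of such a property, assuming hypothetical field names (the real header class carries the actual NetFlow v9 header fields):

```python
import json

class V9Header:
    """Sketch of a v9 header with a JSON-friendly export property."""
    def __init__(self, version, count, uptime):
        self.version = version
        self.count = count
        self.uptime = uptime

    @property
    def json(self):
        # Export the header fields as a plain dict so that
        # json.dumps() can serialize it directly
        return {"version": self.version,
                "count": self.count,
                "uptime": self.uptime}

header = V9Header(9, 2, 123456)
print(json.dumps(header.json))
```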
This commit extends the analyzer script with two new flags:
* Adding --no-dns disables hostname DNS resolution, improving speed
* Adding --match-host <IP address> drops all flows not involving the IP
Several other small things were changed; the script is still a work in
progress. In particular, the "pairing" of two flows will be removed in
future versions.
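The two flags could be wired up with argparse roughly like this (a sketch; the actual analyzer's parser may differ):

```python
import argparse

parser = argparse.ArgumentParser(description="Hypothetical analyzer CLI sketch")
parser.add_argument("--no-dns", action="store_true",
                    help="disable hostname DNS resolution for speed")
parser.add_argument("--match-host", metavar="IP",
                    help="only show flows involving this IP address")

# Dashes in flag names become underscores in the resulting namespace
args = parser.parse_args(["--no-dns", "--match-host", "192.168.1.1"])
print(args.no_dns, args.match_host)
```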
Before, the output queue of the collector received unnamed tuples with
three fields. This broke the tests and was hard to understand. The new
version uses a named tuple for clarity.
The tests were adapted to the new type in the queue and pass again.
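A named tuple along these lines makes the queue items self-describing (the field names here are illustrative, not necessarily the ones used):

```python
from collections import namedtuple

# Replaces the anonymous three-field tuple on the output queue
PacketInfo = namedtuple("PacketInfo", ["ts", "client", "export"])

item = PacketInfo(ts=1609459200.0,
                  client=("127.0.0.1", 2055),
                  export={"flows": []})

# Fields can now be accessed by name instead of by index
print(item.ts, item.client)
```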
For backwards compatibility, a check of the Python version was added,
and the subprocess stdout/stderr arguments are passed depending on that
version. See #18.
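Such a check can be sketched like this: `subprocess.run` only gained `capture_output` in Python 3.7, so older versions need explicit pipes instead:

```python
import subprocess
import sys

cmd = [sys.executable, "-c", "print('ok')"]

if sys.version_info >= (3, 7):
    # capture_output=True is available from Python 3.7 on
    result = subprocess.run(cmd, capture_output=True)
else:
    # Older versions: pass the pipes explicitly
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)

print(result.stdout)
```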
With the '--host' flag, a local interface IP address can be set on which
the collector listens for incoming flows. Until now, this only worked
with IPv4 addresses (using the default 0.0.0.0 interface). This commit
adds handling of passed-in IPv6 addresses by identifying a ":" in the
address and then switching to the AF_INET6 socket family.
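The family switch can be illustrated as follows (`socket_family` and `make_listener` are hypothetical helpers, not the collector's actual code):

```python
import socket

def socket_family(host):
    # A ":" in the host string indicates an IPv6 address,
    # so pick the matching socket family
    return socket.AF_INET6 if ":" in host else socket.AF_INET

def make_listener(host, port):
    sock = socket.socket(socket_family(host), socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

print(socket_family("0.0.0.0"), socket_family("::"))
```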
Adds a new flag, '-v' or '--verbose', to the analyzer.py script. It uses
a new print helper and skips some parts of the script if the flag is not
passed on the CLI.
The analyzer is now found in analyzer.py and uses the '-f' flag for
GZIPed input files. Bundled with the previous PR commit, this update
should now be clearer.
Previously, the analyzer assumed that two consecutive flows form a
pair. This proved unreliable, so a new comparison algorithm is
used. It utilizes the IP addresses and the 'first_switched' parameter
to identify two flows of the same connection.
More improvements can be done, especially filtering and in the
identification of the initiating peer.
Tests still fail and have to be adapted to the new dicts and gzip handling.
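A sketch of such a comparison, assuming flows are dicts with hypothetical 'src', 'dst' and 'first_switched' keys: two flows belong to the same connection if their endpoints mirror each other and their start times lie within a small window.

```python
def match_flows(flows, window=1):
    """Pair flows by mirrored endpoints and close 'first_switched' times.

    Returns pairs of indices into `flows`; each flow is used at most once.
    """
    pairs = []
    used = set()
    for i, a in enumerate(flows):
        if i in used:
            continue
        for j in range(i + 1, len(flows)):
            if j in used:
                continue
            b = flows[j]
            mirrored = (a["src"], a["dst"]) == (b["dst"], b["src"])
            close = abs(a["first_switched"] - b["first_switched"]) <= window
            if mirrored and close:
                pairs.append((i, j))
                used.update({i, j})
                break
    return pairs

flows = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "first_switched": 100},
    {"src": "10.0.0.3", "dst": "10.0.0.4", "first_switched": 500},
    {"src": "10.0.0.2", "dst": "10.0.0.1", "first_switched": 100},
]
print(match_flows(flows))
```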
Until now, packets arriving at the collector's interface were stored by
timestamp, with the exported flows in the payload. This format is now
extended to also store the client's IP address and port, allowing
multiple clients to export flows to the same collector instance.
As mentioned by @pR0Ps in 6b9d20c8a6/analyze_json.py (L83),
IP addresses, especially in IPv6, are better stored as parsed
strings instead of their raw integer values. This is now implemented.
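Python's `ipaddress` module handles this conversion for both address families:

```python
import ipaddress

# ip_address() accepts a raw integer and returns the parsed address;
# values that fit in 32 bits become IPv4, larger ones become IPv6
v4 = ipaddress.ip_address(3232235777)        # 192.168.1.1
v6 = ipaddress.ip_address(0x20010db8 << 96)  # 2001:db8::

# str() gives the canonical, human-readable form for storage
print(str(v4), str(v6))
```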
In previous versions, collected flows (parsed data) were stored in
memory by the collector. At regular intervals, or at shutdown, this
single dict was dumped as JSON onto disk.
With this commit, the behaviour changes to line-based JSON dumps for
each flow, gzipped onto disk for storage efficiency. The analyze_json
script is updated as well to handle the gzipped files in the new format.
See the comments in main.py for more details.
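The line-based, gzipped format can be sketched like this (the file layout shown is illustrative): one JSON document per line, appended to a gzip file, which the analyzer can then stream back line by line.

```python
import gzip
import json
import os
import tempfile

flows = [{"ts": 1, "src": "10.0.0.1"}, {"ts": 2, "src": "10.0.0.2"}]
path = os.path.join(tempfile.mkdtemp(), "flows.gz")

# Append one JSON document per line ("JSON Lines"), gzip-compressed;
# "at" opens the gzip file in text append mode
with gzip.open(path, "at") as fh:
    for flow in flows:
        fh.write(json.dumps(flow) + "\n")

# Reading back: decompress and parse each line independently
with gzip.open(path, "rt") as fh:
    loaded = [json.loads(line) for line in fh]

print(loaded)
```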
Fixes #10
Updated the README to reference NetFlow v1 and v5 as well.
The fallback(key, dict) method used exception-based testing for the
key's existence. Switched to an 'if x in' check.
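As a sketch of the change, assuming `fallback` returns the value of the first key present in a record (the real signature may differ):

```python
def fallback(record, keys):
    """Return the value of the first key in `keys` present in `record`."""
    for key in keys:
        if key in record:  # membership test instead of try/except KeyError
            return record[key]
    raise KeyError("none of {} found".format(keys))

rec = {"IPV4_SRC_ADDR": "10.0.0.1"}
print(fallback(rec, ["IPV6_SRC_ADDR", "IPV4_SRC_ADDR"]))
```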
The NetFlowListener is based on threading.Thread, whose .join() accepts
a 'timeout' parameter. This parameter is now passed through.
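For reference, `threading.Thread.join()` with a timeout returns after at most that many seconds, whether or not the thread has finished:

```python
import threading
import time

worker = threading.Thread(target=time.sleep, args=(10,), daemon=True)
worker.start()

# join(timeout) returns after at most `timeout` seconds; it does NOT
# raise on timeout, so check is_alive() to see whether the thread ended
worker.join(timeout=0.1)
print(worker.is_alive())
```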
Uses the analyzer's new stdin-reading capabilities to test the analysis
without having to write temporary files. Also removes most of the delays
because the listener can keep up now.
This commit splits the packet collecting and processing out into a
thread that provides a queue-like `get(block=True, timeout=None)`
function for getting processed `ExportPackets`.
This makes it much easier to use rather than starting a generator and
sending a value to it when you want to stop. The `get_export_packets`
generator is an example of using it - it just starts the thread and
yields values from it.
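A condensed sketch of this pattern (the real `ExportPacket` parsing is omitted; the `Listener` class here is illustrative):

```python
import queue
import threading

class Listener(threading.Thread):
    """Collecting/processing thread with a queue-like get() interface."""
    def __init__(self, packets):
        super().__init__(daemon=True)
        self._input = packets
        self._output = queue.Queue()

    def run(self):
        for pkt in self._input:
            # ... parsing into ExportPackets would happen here ...
            self._output.put(pkt)

    def get(self, block=True, timeout=None):
        # Delegate to the internal queue, mirroring Queue.get()
        return self._output.get(block, timeout)

def get_export_packets(packets):
    # Convenience generator: start the thread and yield values from it
    listener = Listener(packets)
    listener.start()
    while True:
        yield listener.get(timeout=1)

gen = get_export_packets(["pkt1", "pkt2"])
print(next(gen), next(gen))
```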
- Moved the netflow library out of the src directory
- The UDP listener was restructured so that multiple threads can receive
packets and push them into a queue. The main thread then pulls the
packets off the queue one at a time and processes them. This means
that the collector will never drop a packet because it was blocked on
processing the previous one.
- Adds a property to the ExportPacket class to expose whether any new
  templates are contained in it.
- The collector will now only retry parsing past packets when a new
template is found. Also refactored the retry logic a bit to remove
duplicate code (retrying just pushes the packets back into the main
queue to be processed again like all the other packets).
- The collector no longer continually reads and writes to/from the disk.
It just caches the data in memory until it exits instead.
Until now, exports that were received before their template was known
resulted in KeyError exceptions due to a missing key in the template dict.
With this release, these exports are buffered until a template export
updates this dict, and all buffered exports are then examined again.
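The buffering described above can be sketched as follows (class and key names are hypothetical):

```python
class TemplateBuffer:
    """Park exports with unknown templates instead of raising KeyError."""
    def __init__(self):
        self.templates = {}
        self.buffered = []
        self.parsed = []

    def handle(self, export):
        tid = export["template_id"]
        if tid not in self.templates:
            # Unknown template: buffer the export for later re-examination
            self.buffered.append(export)
            return
        self.parsed.append(export)

    def add_template(self, tid, template):
        self.templates[tid] = template
        # A new template arrived: re-examine everything that was waiting
        pending, self.buffered = self.buffered, []
        for export in pending:
            self.handle(export)

buf = TemplateBuffer()
buf.handle({"template_id": 256, "data": b"\x01"})
print(len(buf.parsed), len(buf.buffered))   # export is buffered
buf.add_template(256, {"fields": []})
print(len(buf.parsed), len(buf.buffered))   # export is now parsed
```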
Release v0.7.0
Fixes #4, fixes #5