Commit graph

45 commits

Author SHA1 Message Date
Dominik Pataky 5fd4e9bd24 Update README/setup.py; add .json property to v9 header for export
The README and setup.py were adapted to the current state, preparing for
PyPI upload and package info.

In v9, the header received an additional .json property, which exports
the header as a dict to allow JSON serialization in the export file.
This export is used in main.py
2020-03-29 18:06:52 +02:00
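A minimal sketch of what such a `.json` property might look like. The class name `V9HeaderSketch` and its field names are assumptions made for illustration, not the library's actual definitions.

```python
import json


class V9HeaderSketch:
    """Hypothetical stand-in for the v9 header; field names are assumed."""

    def __init__(self, version, count, uptime, timestamp, sequence, source_id):
        self.version = version
        self.count = count
        self.uptime = uptime
        self.timestamp = timestamp
        self.sequence = sequence
        self.source_id = source_id

    @property
    def json(self):
        # Return a plain dict copy so json.dumps() can serialize the header.
        return dict(self.__dict__)


header = V9HeaderSketch(9, 2, 1000, 1585497612, 42, 0)
serialized = json.dumps(header.json)
```

Returning a dict rather than a JSON string leaves the caller free to embed the header in a larger structure before serializing.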
Dominik Pataky f8c5717002 Extend analyzer with --no-dns and --match-host; fixes
This commit extends the analyzer script with two new flags:
* Adding --no-dns disables hostname DNS resolution, improving speed
* Adding --match-host <IP address> filters out all flows not matching the IP

Additional small things were changed; the script is still a work in
progress. In particular, the "pairing" of two flows will be removed in
future versions.
2020-03-19 18:16:03 +01:00
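The two flags from the commit above could be wired up roughly as follows. Only the flag names come from the commit message; the helper `keep_flow` and the `src`/`dst` keys are hypothetical, and the plain string comparison is a simplification.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Analyzer flag sketch")
    parser.add_argument("--no-dns", action="store_true",
                        help="skip hostname DNS resolution")
    parser.add_argument("--match-host", metavar="IP", default=None,
                        help="only keep flows involving this IP address")
    return parser


def keep_flow(flow, match_host):
    """Return True if no filter is set or either endpoint equals the filter."""
    if match_host is None:
        return True
    return match_host in (flow["src"], flow["dst"])


args = build_parser().parse_args(["--no-dns", "--match-host", "10.0.0.1"])
flows = [
    {"src": "10.0.0.1", "dst": "192.0.2.7"},
    {"src": "10.0.0.9", "dst": "192.0.2.7"},
]
kept = [flow for flow in flows if keep_flow(flow, args.match_host)]
```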
Dominik Pataky 4639601798 Extend logging in collector (with --verbose) 2020-03-19 18:16:03 +01:00
Dominik Pataky 0d8f1a2ecb Add ParsedPacket named tuple for queue; extend tests
Before, the output queue of the collector received unnamed tuples with
three fields. This broke the tests and was less understandable. The new
version uses a named tuple for clarity.

The tests were adapted to the new type in the queue and are fixed.

For backwards compatibility a check of the Python version is added and
the subprocess stdout/stderr arguments are passed depending on this
version. See #18.
2020-03-19 18:16:03 +01:00
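To illustrate the change, here is a small sketch of a named tuple travelling through a queue. The commit only states that the tuple has three fields; the field names `ts`, `client` and `export` below are assumptions.

```python
import queue
from collections import namedtuple

# Field names are assumed for illustration; the source only says "three fields".
ParsedPacket = namedtuple("ParsedPacket", ["ts", "client", "export"])

output = queue.Queue()
output.put(ParsedPacket(ts=1584637000.0,
                        client=("192.0.2.1", 2055),
                        export={"flows": []}))

pkt = output.get()
# Attribute access documents intent; index access still works for old callers.
client_ip = pkt.client[0]
```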
Dominik Pataky 290e822176 Add IPv6 local interface address handling
With the '--host' flag, a local interface IP address can be set on which
the collector listens for incoming flows. Until now, this only worked
with IPv4 addresses (using the default 0.0.0.0 interface). The commit
adds handling of passed-in IPv6 addresses by identifying ":" and then
switching to the AF_INET6 socket family.
2020-03-19 18:12:50 +01:00
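The address-family switch described above can be sketched with the standard `socket` module. The function names `family_for` and `make_listener` are hypothetical; only the ":" check and the AF_INET6 family come from the commit message.

```python
import socket


def family_for(host):
    """Choose the socket family by looking for ':' in the address string."""
    return socket.AF_INET6 if ":" in host else socket.AF_INET


def make_listener(host, port):
    """Create a UDP socket bound to the given local interface address."""
    sock = socket.socket(family_for(host), socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock
```

For example, `make_listener("::1", 2055)` would bind to the IPv6 loopback on systems where IPv6 is available, while `make_listener("0.0.0.0", 2055)` keeps the IPv4 behaviour.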
cookie 647f4b3748
Merge pull request #19 from grafolean/fix/tests
Fix failing tests (wrong index when accessing netflow records)
2020-03-19 15:38:27 +00:00
Anze 096c7d6f4f Fix failing tests (wrong index when accessing netflow records) 2020-02-22 23:20:36 +01:00
Dominik Pataky 565f829945 Add verbose flag to analyzer
Adds a new flag, '-v' or '--verbose', to the analyzer.py script. It uses
a new print method and also skips some parts of the script if not passed
on the CLI.
2020-01-20 17:01:50 +01:00
Dominik Pataky adb02eab24 Update to 2020 in file headers; update the analyzer file name in README
The analyzer is now found in analyzer.py and uses the '-f' flag for
GZIPed input files. Bundled with the previous PR commit, this update
should now be clearer.
2020-01-20 16:59:36 +01:00
cookie 52d357b111
Merge pull request #12 from kaysiz/patch-1
Update README.md to match new file format (GZIP instead of JSON).
Thanks for the PR!
2020-01-20 15:13:14 +00:00
kudakwashe siziva 59652f7d2f
Update README.md
Changed file extension from json to gz
2020-01-17 10:43:21 +02:00
Dominik Pataky 61439ec6ef Improve analyzer (handling of pairs, dropping noise)
Previously, the analyzer assumed that two consecutive flows would be a
pair. This proved unreliable, therefore a new comparison algorithm is
used. It utilizes the IP addresses and the 'first_switched' parameter
to identify two flows of the same connection.

More improvements can be made, especially in filtering and in the
identification of the initiating peer.

Tests still fail; they have to be adapted to the new dicts and gzip.
2019-11-03 15:58:40 +01:00
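One possible reading of the comparison described above, as a sketch only: flows are grouped by their unordered pair of IP addresses plus `first_switched`. The dict keys and the exact-match rule on `first_switched` are assumptions; the analyzer's real algorithm may differ.

```python
from collections import defaultdict


def pair_flows(flows):
    """Pair flows with mirrored endpoints and an identical first_switched."""
    buckets = defaultdict(list)
    for flow in flows:
        # frozenset makes (A, B) and (B, A) land in the same bucket.
        key = (frozenset((flow["src"], flow["dst"])), flow["first_switched"])
        buckets[key].append(flow)
    return [tuple(group) for group in buckets.values() if len(group) == 2]


flows = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "first_switched": 100},
    {"src": "10.0.0.2", "dst": "10.0.0.1", "first_switched": 100},
    {"src": "10.0.0.3", "dst": "10.0.0.1", "first_switched": 100},
]
pairs = pair_flows(flows)
```

Unlike the "two consecutive flows" assumption, this grouping does not depend on arrival order.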
Dominik Pataky eff99fc6e3 Add client info to stored data
Until now, packets arriving at the collector's interface were stored by
timestamp, with the exported flows in the payload. This format is now
extended to also store the client's IP address and port, allowing
multiple clients to export flows to the same collector instance.
2019-11-03 13:57:06 +01:00
Dominik Pataky 1646a52f17 Store IP addresses (v4 + v6) as strings rather than ints
As mentioned by @pR0Ps in 6b9d20c8a6/analyze_json.py (L83),
IP addresses, especially in IPv6, are better stored as parsed
strings than as their raw integer values. Implemented.
2019-11-03 13:35:32 +01:00
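The conversion from raw integers to strings can be done with the standard `ipaddress` module. The helper name `ip_to_str` and its `version` parameter are hypothetical; this is a sketch, not the code from the commit.

```python
import ipaddress


def ip_to_str(raw, version):
    """Convert a raw integer address to its canonical string form."""
    if version == 6:
        return str(ipaddress.IPv6Address(raw))
    return str(ipaddress.IPv4Address(raw))


v4 = ip_to_str(3232235777, 4)
v6 = ip_to_str(1, 6)
```

Here `v4` is `"192.168.1.1"` and `v6` is `"::1"`, both of which are directly readable in an export file, whereas a 128-bit integer is not.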
Dominik Pataky 6b9d20c8a6 Refactor storing data and writing to disk - using gzip and lines
In previous versions, collected flows (parsed data) were stored in
memory by the collector. At regular intervals, or at shutdown, this
single dict was dumped as JSON onto disk.

With this commit, the behaviour is changed to line-based JSON dumps for
each flow, gzipped onto disk for storage efficiency. The analyze_json is
updated as well to handle the new gzipped files in the new format.

See the comments in main.py for more details.

Fixes #10
2019-11-03 12:02:05 +01:00
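A rough sketch of line-based JSON in a gzip file, using only the standard library. The function names, the file name and the entry layout are assumptions; see the comments in main.py for the format actually used.

```python
import gzip
import json
import os
import tempfile


def append_entry(path, entry):
    """Append one JSON object as a single line to a gzip file."""
    # Mode "at" appends text; each call adds a new gzip member to the file.
    with gzip.open(path, "at", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def read_entries(path):
    """Read all JSON lines back; gzip handles concatenated members."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flows.gz")
    append_entry(path, {"1572778925": [{"PROTOCOL": 6}]})
    append_entry(path, {"1572778926": [{"PROTOCOL": 17}]})
    entries = read_entries(path)
```

Because every line is a complete JSON document, a reader can process the file one flow at a time instead of loading a single large dict.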
Dominik Pataky 3dee135a22 Merge branch 'props-master'
Merging pull request #9 by @pR0Ps
https://github.com/bitkeks/python-netflow-v9-softflowd/pull/9

Thanks for the contribution!

Resolves #9
2019-10-31 18:02:06 +01:00
Dominik Pataky 9f16d246a5 Add v1, v5 to README; change fallback; add timeout parameter
Updated the README to reference NetFlow v1 and v5 as well.

The fallback(key, dict) method used exception-based testing of the
key's existence. Switched to 'if x in'.

The NetFlowListener is based on threading.Thread, which uses the
'timeout' parameter in .join(). Added.
2019-10-31 17:55:48 +01:00
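The two styles of key testing mentioned above can be compared in a small sketch. Both functions are hypothetical; in particular the `default` parameter and the return behaviour are assumptions, since the commit does not show the real `fallback` body.

```python
def fallback_try(key, mapping, default=None):
    """Exception-based variant: attempt the lookup and catch the failure."""
    try:
        return mapping[key]
    except KeyError:
        return default


def fallback_in(key, mapping, default=None):
    """Membership-based variant: test with 'in' before reading the value."""
    if key in mapping:
        return mapping[key]
    return default


record = {"IPV4_SRC_ADDR": "10.0.0.1"}
found = fallback_in("IPV4_SRC_ADDR", record)
missing = fallback_in("IPV6_SRC_ADDR", record, default="n/a")
```

Both variants return the same results; the membership test simply makes the "key may be absent" case explicit in the control flow.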
Dominik Pataky bfec3953e6 Bump version, fix small errors, decrease packet num in tests 2019-10-31 17:35:15 +01:00
Carey Metcalfe 345a5b08ff Fix setup.py file 2019-10-16 23:46:32 -04:00
Carey Metcalfe bf92f24669 Add test for invalid packets 2019-10-16 23:46:32 -04:00
Carey Metcalfe 96817f1f8d Add support for v1 and v5 NetFlow packets
Thanks to @alieissa for the initial v1 and v5 code
2019-10-16 23:46:32 -04:00
Carey Metcalfe 186b648c4d Fix tests
Uses the analyzer's new stdin-reading capabilities to test the analysis
without having to write temporary files. Also removes most of the delays
because the listener can keep up now.
2019-10-16 23:44:28 -04:00
Carey Metcalfe 8e6d0c54e8 Allow analyze_json.py to accept input via stdin
This will make testing much cleaner in the future (no temp files needed)

Also increase performance by memoizing the hostname lookup
2019-10-16 23:44:19 -04:00
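Memoizing a hostname lookup can be sketched with `functools.lru_cache`. The factory `make_cached_resolver` is hypothetical, and using `socket.getfqdn` as the default lookup is an assumption; the commit does not say which resolver call or caching mechanism it uses.

```python
import functools
import socket


def make_cached_resolver(lookup=socket.getfqdn):
    """Wrap a lookup function so each address is resolved at most once."""

    @functools.lru_cache(maxsize=None)
    def resolve(ip):
        return lookup(ip)

    return resolve


# Demonstration with a fake lookup so that no network access is needed.
calls = []


def fake_lookup(ip):
    calls.append(ip)
    return "host-" + ip


resolve = make_cached_resolver(fake_lookup)
names = [resolve("10.0.0.1"), resolve("10.0.0.1"), resolve("10.0.0.2")]
```

Since the same few peers tend to appear in many flows, caching avoids repeating a slow DNS round trip for every record.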
Carey Metcalfe 11dc92269c Refactor code to make programmatic access to flows easier
This commit splits the packet collecting and processing out into a
thread that provides a queue-like `get(block=True, timeout=None)`
function for getting processed `ExportPackets`.

This makes it much easier to use rather than starting a generator and
sending a value to it when you want to stop. The `get_export_packets`
generator is an example of using it - it just starts the thread and
yields values from it.
2019-10-16 23:33:22 -04:00
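The pattern described above, a worker thread exposing a queue-like `get(block=True, timeout=None)` plus a generator built on top of it, might look roughly like this. `ListenerSketch` and `get_packets` are hypothetical names, and an in-memory iterable stands in for the UDP socket.

```python
import queue
import threading


class ListenerSketch(threading.Thread):
    """Reads items from a source in the background and queues them."""

    def __init__(self, source):
        super().__init__(daemon=True)
        self._source = source
        self._output = queue.Queue()
        self._shutdown = threading.Event()

    def get(self, block=True, timeout=None):
        # Same signature as queue.Queue.get; raises queue.Empty on timeout.
        return self._output.get(block, timeout)

    def drained(self):
        return self._output.empty()

    def run(self):
        for item in self._source:
            if self._shutdown.is_set():
                break
            self._output.put(item)

    def stop(self):
        self._shutdown.set()


def get_packets(source):
    """Generator wrapper: start the thread and yield whatever it produces."""
    listener = ListenerSketch(source)
    listener.start()
    try:
        while True:
            try:
                yield listener.get(timeout=0.1)
            except queue.Empty:
                # Stop only once the thread has ended and nothing is left.
                if not listener.is_alive() and listener.drained():
                    return
    finally:
        listener.stop()
        listener.join(timeout=1)


received = list(get_packets(iter(["pkt-1", "pkt-2", "pkt-3"])))
```

A caller that wants finer control can use `ListenerSketch` directly and call `stop()` and `join()` itself, instead of sending a value into a generator.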
Carey Metcalfe ef151f8d28 Improve collector script and restructure code
- Moved the netflow library out of the src directory
- The UDP listener was restructured so that multiple threads can receive
  packets and push them into a queue. The main thread then pulls the
  packets off the queue one at a time and processes them. This means
  that the collector will never drop a packet because it was blocked on
  processing the previous one.
- Adds a property to the ExportPacket class to expose if any new
  templates are contained in it.
- The collector will now only retry parsing past packets when a new
  template is found. Also refactored the retry logic a bit to remove
  duplicate code (retrying just pushes the packets back into the main
  queue to be processed again like all the other packets).
- The collector no longer continually reads and writes to/from the disk.
  It just caches the data in memory until it exits instead.
2019-10-16 23:31:39 -04:00
Dominik Pataky ce2be709d6 Update README + LICENSE 2019-03-31 21:37:13 +02:00
Dominik Pataky 8de110980c Add tests for the collector (main.py). 2019-03-31 21:23:24 +02:00
Dominik Pataky 85e6af4bd2 Add buffering of exports with unknown template
Until now, exports which were received, but their template was not known,
resulted in KeyError exceptions due to a missing key in the template dict.
With this release, these exports are buffered until a template export
updates this dict, and all buffered exports are again examined.

Release v0.7.0

Fixes #4
Fixes #5
2019-03-31 20:51:34 +02:00
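The buffering behaviour can be sketched as follows. The export layout (`kind`, `template_id`, `fields`, `values`) and the function names are assumptions for illustration, not the collector's real data structures.

```python
def decode(export, templates):
    """Label raw values with the field names of the matching template."""
    return dict(zip(templates[export["template_id"]], export["values"]))


def process(exports):
    """Parse exports in order, buffering those whose template is unknown."""
    templates, pending, parsed = {}, [], []
    for export in exports:
        if export["kind"] == "template":
            templates[export["template_id"]] = export["fields"]
            # A template update arrived: re-examine every buffered export.
            retry, pending = pending, []
            for data in retry:
                if data["template_id"] in templates:
                    parsed.append(decode(data, templates))
                else:
                    pending.append(data)
        elif export["template_id"] in templates:
            parsed.append(decode(export, templates))
        else:
            # Template not seen yet: buffer instead of raising KeyError.
            pending.append(export)
    return parsed, pending


exports = [
    {"kind": "data", "template_id": 256, "values": [6, 443]},
    {"kind": "template", "template_id": 256,
     "fields": ["PROTOCOL", "L4_DST_PORT"]},
    {"kind": "data", "template_id": 257, "values": [17]},
]
parsed, pending = process(exports)
```

In this run the first data export is parsed once its template arrives, while the export for template 257 stays buffered.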
Dominik Pataky 5c7ec0aef8 Add additional field types (ASA, PANOS) and set fallback type to 0
refs #4 @ GitHub
2018-06-15 13:48:17 +02:00
Dominik Pataky 9395aafa71 Fix missing IP_PROTOCOL_VERSION field in analyzer
Checks for the key first and handles non-existence.
Update to Copyright notices.

Fixes #3
2018-02-20 12:09:54 +01:00
Dominik Pataky 691a3480fd Add duration to Connection 2017-10-29 19:38:33 +01:00
Dominik Pataky 6c267c8c77 Bump to 0.6; expand analyzer 2017-10-29 11:53:32 +01:00
Dominik Pataky 898d220a91 Add JSON export and analyzing example script 2017-10-28 19:00:18 +02:00
Dominik Pataky 92d8e724bf Fix merge for Python3 2017-10-28 17:34:55 +02:00
cookie 9df5bd426e
Merge pull request #2 from deeso/master
Created an installable Python Package
2017-10-28 17:19:29 +02:00
Adam Pridgen 23bc00a316 typo in logging message 2017-09-16 14:15:34 -05:00
Adam Pridgen e11105e950 added setup main file 2017-09-16 14:11:44 -05:00
Doм 7b24ae51e0 Merge pull request #1 from randerzander/master
Thanks for contributing @randerzander !
2016-12-12 18:46:06 +01:00
Randy Gelhausen bd22551669 converted hardcoded host/port to arg driven, switched int.from_bytes to Python2 friendly routine 2016-11-29 22:50:09 -05:00
Dominik Pataky 8fa999b877 Remove namedtuples import (old version) 2016-08-10 23:10:11 +02:00
Dominik Pataky aa2a8d8458 Add LICENSE and README.md 2016-08-10 22:47:35 +02:00
Dominik Pataky 546f96122f Fix datarecord saving bug; cleanup; license 2016-08-10 22:33:57 +02:00
Dominik Pataky 2d7c905d41 Parsing finished, bug in datarecord lists 2016-08-10 20:38:07 +02:00
Dominik Pataky 1be7552e06 Add classes 2016-08-10 18:55:38 +02:00
Dominik Pataky 6cf8356456 Basic implementation of udp socket listener and FlowRecord 2016-08-10 16:28:29 +02:00