A Splunk MLTK prototype for detecting anomalous VPN logins.
This project is a prototype submitted for the HackerEarth Splunk Build-a-thon (Track 4: AI/ML). It uses Splunk’s Machine Learning Toolkit (MLTK) to detect anomalous VPN logins based on geography and time.
As described in the “VPN Login from an Unusual Location per User Model” use case, traditional security tools struggle to detect credential theft when valid credentials are used. This model establishes a baseline of normal login behavior for each user (typical locations and hours) and flags any significant deviations as potential security threats.
The input is a lookup file, vpn_logins.csv, which contains timestamp, user, and src_ip.
The iplocation command derives the latitude and longitude from the src_ip.
login_hour is extracted from the timestamp and converted to a number.
The KMeans algorithm from MLTK is used to cluster user behavior.
The model is trained on the lat, lon, and login_hour features.
The by user clause creates a unique behavioral baseline (a cluster center) for every single user.
The trained model is saved as vpn_model_final.
The apply command runs new login events against the model. It calculates the cluster_distance for each event, a numerical score of how far that login is from the user’s normal behavior. A high score indicates a likely anomaly.
Model Training:
| inputlookup vpn_logins.csv
| eval _time = strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
| iplocation src_ip
| eval login_hour = tonumber(strftime(_time, "%H"))
| where isnotnull(lat) AND isnotnull(lon) AND isnotnull(login_hour)
| fit KMeans k=1 from "lat", "lon", "login_hour" by "user" into vpn_model_final
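To make the training step easier to reason about outside Splunk, here is a minimal Python sketch of the baseline the query above is meant to produce. It assumes that KMeans with k=1 reduces to the per-user mean of each feature, which holds for K-means in general. MLTK's internals and any preprocessing it applies are not covered here. The sample rows and the names LOGINS, login_hour, and user_baselines are hypothetical and not part of this repository.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows standing in for vpn_logins.csv after iplocation has added lat/lon.
LOGINS = [
    {"timestamp": "2024-01-08T09:00:00Z", "user": "alice", "lat": 40.0, "lon": -74.0},
    {"timestamp": "2024-01-09T11:00:00Z", "user": "alice", "lat": 42.0, "lon": -72.0},
    {"timestamp": "2024-01-08T22:00:00Z", "user": "bob", "lat": 51.5, "lon": -0.1},
]


def login_hour(timestamp):
    """Parse the timestamp with the same format string as the query and return the hour."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").hour


def user_baselines(rows):
    """Return {user: (mean_lat, mean_lon, mean_hour)}.

    Assumption: with k=1, the single cluster center is the mean of the points.
    """
    grouped = defaultdict(list)
    for row in rows:
        point = (row["lat"], row["lon"], float(login_hour(row["timestamp"])))
        grouped[row["user"]].append(point)
    return {
        user: tuple(sum(column) / len(column) for column in zip(*points))
        for user, points in grouped.items()
    }


baselines = user_baselines(LOGINS)
```

With these sample rows, alice's baseline is the midpoint of her two logins and bob's baseline is his single login.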
Detection & Visualization (for the anomaly table):
| inputlookup vpn_logins.csv
| eval _time = strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
| iplocation src_ip
| eval login_hour = tonumber(strftime(_time, "%H"))
| where isnotnull(lat) AND isnotnull(lon) AND isnotnull(login_hour)
| apply vpn_model_final
| sort - cluster_distance
| table _time, user, City, Country, cluster_distance
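The scoring and sorting in the detection query can be sketched in Python as follows. This is an illustration under assumptions, not MLTK's implementation: it uses plain Euclidean distance to the user's cluster center, whereas the exact metric behind cluster_distance may differ (for example, it may be squared). BASELINES, EVENTS, distance_to_baseline, and rank_events are hypothetical names with made-up sample values.

```python
import math

# Hypothetical per-user baselines (lat, lon, login_hour), e.g. from a training step.
BASELINES = {"alice": (41.0, -73.0, 10.0)}

# Hypothetical new login events, already enriched with lat/lon and login_hour.
EVENTS = [
    {"user": "alice", "lat": 41.0, "lon": -73.0, "login_hour": 10.0},
    {"user": "alice", "lat": 44.0, "lon": -69.0, "login_hour": 10.0},
]


def distance_to_baseline(event, baselines):
    """Euclidean distance from an event to its user's cluster center (assumed metric)."""
    center = baselines[event["user"]]
    point = (event["lat"], event["lon"], event["login_hour"])
    return math.dist(point, center)


def rank_events(events, baselines):
    """Attach a cluster_distance score to each event and sort highest first."""
    scored = [
        dict(event, cluster_distance=distance_to_baseline(event, baselines))
        for event in events
    ]
    return sorted(scored, key=lambda e: e["cluster_distance"], reverse=True)


ranked = rank_events(EVENTS, BASELINES)
```

The login far from alice's usual location sorts to the top, mirroring sort - cluster_distance. In this sketch degrees and hours are combined unscaled, so a one-degree shift and a one-hour shift contribute equally to the score.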
Prerequisites: A Splunk Enterprise instance with the Splunk Machine Learning Toolkit (MLTK) and the Python for Scientific Computing Add-on installed.
Upload vpn_logins.csv from this repository as a lookup file in Splunk (Settings > Lookups > Lookup table files). Make it globally shared.
The dashboard is provided as vpn_anomaly_detection.xml. You can view its source and use it as a template to rebuild the panels using the queries above.