Making a hot spare for the CrowdStrike TA

So in my environment, I have different tiers of forwarders that perform different tasks. I have the usual universal forwarders, and then I have my heavy forwarders (HFs). I run two HFs in a cluster with a load balancer in front of them. I do this for overall reliability, but it also allows for server downtime when something goes wrong or a server simply needs a reboot.

However, not all apps run the same way in Splunk. Some add-ons are configured to pull data from a server and forward that data to the indexers. So while my main HFs are designed as receivers that data is pushed to, I have another set of HFs that are in charge of pulling data from remote servers. I refer to these servers as “pullers”.

But pulling data in a cluster is a problem because, unlike a search head cluster, there is no captain that coordinates which data gets pulled and where the add-on left off. So we can only have one puller active at any time. We keep a second puller on standby as a warm spare. But what happens when the active node goes down?

Well, for most apps/add-ons on the second puller, I can rely on the inputs.conf parameter “ignoreOlderThan” to ignore data older than X days/hours/etc. For some apps, like Rapid7, the bookmark noting the last event pulled is stored in that server’s KV store.
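
For example, a monitor input on the standby puller might look something like this (the path here is hypothetical; ignoreOlderThan tells Splunk to skip files whose modification time is older than the window):

# Hypothetical stanza on the standby puller
[monitor:///var/log/example/app.log]
ignoreOlderThan = 7d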

But CrowdStrike wasn’t so convenient. When I failed over to the second puller, it started pulling data from the first event ever recorded for our account. This led to a lot of duplicate events and false alarms. I asked CS how to avoid this, and they said the offset value stored in the inputs.conf file is the marker where the forwarder starts collecting data. So a zero-value offset starts at the beginning.

I then asked where this offset value could be obtained, and they were not able to locate that information. Fortunately, that info was in the add-on’s logs.

So I took this info and used it to narrow down all the logs from that sourcetype containing the key “consuming”:

index=_internal sourcetype=ta-crowdstrike_ucc_lib-2 "consuming" 

Now that these logs are identified, I need to create a search-time field extraction in props.conf to capture the offset into a field named “placeholder”:
[ta-crowdstrike_ucc_lib-2]
EXTRACT-placeholder = ^.*for\s'\w+-\w+-\w+-\w+-\w+'\sfrom\s(?P<placeholder>.+)
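
To sanity-check the extraction, a quick table of the new field should show the offsets climbing over time:

index=_internal sourcetype=ta-crowdstrike_ucc_lib-2 "consuming" | table _time, placeholder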

Now that the offset is extracted into the field “placeholder”, I can use it in my Splunk query to locate the latest/highest value:
index=_internal sourcetype=ta-crowdstrike_ucc_lib-2 "start consuming" | stats max(placeholder)
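
One caveat worth hedging against: if the extracted values are ever treated as strings, max() falls back to lexicographic comparison. Forcing a numeric conversion avoids that (a defensive variant of the same search, not from the original setup):

index=_internal sourcetype=ta-crowdstrike_ucc_lib-2 "start consuming" | stats max(eval(tonumber(placeholder))) AS offset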

Once I found the highest value, I took that offset and used it as the offset value on an independent Splunk instance sending to a test index. When tested on this new server, it honored the new offset value by starting on the next integer.
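
The test input looked roughly like this (the stanza scheme and attribute names below are assumptions based on the TA’s crowdstrike_falcon_host_inputs.conf; check your version of the add-on):

# Hypothetical test stanza; only the offset attribute matters here
[crowdstrike_falcon_host://test_input]
offset = 6951741
index = cs_test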

I checked the search results, and it had only indexed events created after that offset value. So I now have the offset value I need. Now I just need to extract it from Splunk using automation.

I leveraged the API to run the search and extract the placeholder.

$ curl -u USERNAME:PASSWORD https://SPLUNK:8089/services/search/jobs -d search="search index=_internal sourcetype=ta-crowdstrike_ucc_lib-2 \"start consuming\" host=forwarder_name | stats max(placeholder)"

Caution: do not use single quotes in API SPL queries. Use double quotes and escape them with backslashes as needed, as shown above.

This returns the SID of the search job, which I can feed back into the API to obtain the results of the search:

[splunk@SERVER ~]$ curl -u USERNAME:PASSWORD https://SPLUNK:8089/services/search/jobs/1564006340.36894_A17B22CE-90D3-4B82-976E-169244223C1E/results

Here we can see the returned value is 6951741. That’s the offset value I need to extract.

Now that I am getting the data I need from the API, I can move these calls into a script to do the rest. So I wrote a script that performs the above calls to grab the offset, then places that value in the correct app file (listed below) on the deployment server; a sketch of it follows below.
/opt/splunk/etc/deployment-apps/TA-crowdstrike/local/crowdstrike_falcon_host_inputs.conf
This means that the deployment server always has the latest offset value stored in the configuration that is pushed out to the clients.  
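
Here is a minimal sketch of what that script might look like, assuming JSON output from the API and a literal “offset = …” line in the conf file (both assumptions; the parsing is deliberately naive, and jq would be more robust if it’s available):

#!/bin/bash
# Sketch only: no error handling, and credentials should come from a
# secrets store rather than being hardcoded.
SPLUNK="https://SPLUNK:8089"
AUTH="USERNAME:PASSWORD"
CONF="/opt/splunk/etc/deployment-apps/TA-crowdstrike/local/crowdstrike_falcon_host_inputs.conf"

# Kick off the search; output_mode=json makes the SID easy to grab.
SID=$(curl -sk -u "$AUTH" "$SPLUNK/services/search/jobs" \
  -d output_mode=json \
  -d search="search index=_internal sourcetype=ta-crowdstrike_ucc_lib-2 \"start consuming\" host=forwarder_name | stats max(placeholder) AS offset" \
  | sed -n 's/.*"sid":"\([^"]*\)".*/\1/p')

# Wait for the job to finish (a real script would also handle FAILED).
until curl -sk -u "$AUTH" "$SPLUNK/services/search/jobs/$SID?output_mode=json" \
  | grep -q '"dispatchState":"DONE"'; do
  sleep 5
done

# Pull the offset field out of the results.
OFFSET=$(curl -sk -u "$AUTH" "$SPLUNK/services/search/jobs/$SID/results?output_mode=json" \
  | sed -n 's/.*"offset":"\([^"]*\)".*/\1/p')

# Only rewrite the conf file if we actually got a value back.
if [ -n "$OFFSET" ]; then
  sed -i "s/^offset = .*/offset = $OFFSET/" "$CONF"
fi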
However, ensure your deployment server has this app set to enable only; do not set it to restart splunkd. Leaving restart enabled would bounce Splunk on the client every time the offset value changes.
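
In serverclass.conf terms, that looks something like this (“pullers” is a hypothetical server class name):

[serverClass:pullers:app:TA-crowdstrike]
stateOnClient = enabled
restartSplunkd = false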

All that’s left is to add the script to the crontab so that value stays current.
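
Something like this in the splunk user’s crontab works (the script path and schedule are just examples):

*/15 * * * * /opt/scripts/update_cs_offset.sh >/dev/null 2>&1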

Now if I activate the standby puller, it will grab the latest offset value and start from there without creating duplicate data on our indexers. 
