Creating a Virtual Splunk Deployment – Part 2 Splunk

This next section focuses on creating a distributed enterprise Splunk environment. I will be using the latest version of Splunk, which is currently 8.0.5. While this will age out over time, most commands should continue to work as time goes on. If something fails, Google is your friend and Splunk's documentation should be able to bridge the gap.

You don't have to do this in a strict order, so you are welcome to change things up. But if you want to learn as you go, I suggest you follow this plan.

Indexers

The indexers are the storage servers, so they need more storage space. Even if this is just for testing and you never use all of it, it's best to provision it the way it would be needed.

Edit the VM and add 2 more hard disks: one 500GB and the other 2000GB. You may not have that much physical space, but since this is a VM it won't allocate the space unless you tell it to do so. So you can just use the space you have and let the VM pretend it has that much space available.

Each indexer should get two more hard disks (totaling 4 disks). One disk will emulate HOT storage and the other COLD storage. This is not proper hot/cold storage; the structure is only intended to demonstrate how it is used. If you really want to go hardcore, you can create WARM and FROZEN disks as well, but for this demo it's overkill.

Restart the VMs to load the new disks into the OS. They should appear in the /dev folder.

Verify the volumes using command:
# lsblk

/dev/sdc should show 500GB and /dev/sdd should show 2000GB. If they are swapped, you will need to adjust the fstab info below.

Use the steps from the previous chapter to initialize and partition the new disks. In the /etc/fstab file, name the mount points to reflect the storage destination (see the example below).
# nano /etc/fstab
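For reference, assuming /dev/sdc is the 500GB disk and /dev/sdd is the 2000GB disk (verify with lsblk), the new fstab entries would look something like this:

/dev/sdc1  /hot   xfs  defaults  0 0
/dev/sdd1  /cold  xfs  defaults  0 0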

Once fstab is updated, we can delete the mount point folders we created earlier. Stop splunk and run these commands:
# /opt/splunk/bin/splunk stop
rm -rf /hot
rm -rf /cold

We only want to delete these mount points on the indexers.

Now reboot to activate the mount points. Be sure the fstab entries are perfect; a bad fstab will prevent the system from booting.

License Manager

Since all our servers will be looking for a license, let’s configure the license server first. After this server is configured it will need to be running at all times.

Go to SETTINGS > LICENSING

Click ADD LICENSE. If you are able to upload your DEV license file, click BROWSE.

But in most cases you will need to rely on VMware Tools so you can copy/paste the XML. Click the link to activate the text box so you can paste in the XML data from the license file.

Click INSTALL. It should prompt for a restart, so go ahead and click RESTART NOW.

When it returns, you should see your license listed with a max of 10GB per day and a 6 month expiration.

Click the EDIT link for that license.

Ensure that any indexer that connects can use this license. Change this value if needed and click SUBMIT.

Activate license clients

Now let's activate this license on all other splunk servers. As with most things, the CLI method is fastest, but I'll show the GUI method as well.

GUI METHOD:

Login to the GUI of each server. Go to SETTINGS > LICENSING.
At the top, you will see the option to change the server to a slave device. Click this button.

Use the 2nd option and add the IP address of the license manager, then append :8089 (for example, https://192.168.11.15:8089).

Click SAVE then RESTART NOW

CLI METHOD:

SSH to each server. Use this command to designate the license manager:
# /opt/splunk/bin/splunk edit licenser-localslave -master_uri https://192.168.11.15:8089 -auth admin:'splunkpass'

Change info in the above command as needed.

Then restart splunk to take effect.
To restart splunk, use command:
# /opt/splunk/bin/splunk restart
(keep this command handy; you will need it every time I say to restart splunk)

Indexer Cluster

First we will configure the master node, and then the indexer (peer) nodes.

Use the client machine and open the terminal. You can use one terminal window with multiple tabs by right-clicking the terminal and selecting NEW TAB.

SSH to all indexers and the master node.

Run this command on all of these servers:
# nano /opt/splunk/etc/system/local/server.conf

On the Master Node, paste the below text at the end of the file:

[clustering]
mode = master
replication_factor = 3
search_factor = 1
pass4SymmKey = thispassword
cluster_label = indexerCluster1

A search factor of 2 is ideal, but you need [repFactor] x [searchFactor] nodes to allow that. Since we have 3 indexers, we can do a 3x1. If you want to do a 2x2 you will need to create one additional indexer. But since search redundancy is not a big deal in dev, this will work.

thispassword can be whatever password you want; for this demo, we are keeping it simple. This is the indexer cluster password. Write it down; there will be other passwords.

MASTERNODE in the peer stanza below is the IP address of the master node server. In prod you will want to use DNS, but since we didn't set that up, the IP address will work just fine.
Be sure to use the INTERNAL IP address (.11).

On the 3 indexer peers, paste the following:

[clustering]
master_uri = https://MASTERNODE:8089
mode = slave
pass4SymmKey = thispassword

[replication_port://9887]

Save the server.conf file and exit.

Restart SPLUNK on the master node, and then restart splunk on the peer nodes.

Login to the Master Node web GUI and go to SETTINGS > INDEXER CLUSTERING and it should reveal that clustering is enabled and list each peer node.

Keep in mind this is the first time running things, so restarting splunk on everything at the same time, or even the whole cluster at once, isn't a big deal right now. But normally, when everything is working correctly, only a limited number of machines can be "down" at any single point.

indexes.conf

Now that the indexers and master node are working together, we need to make the indexes.conf file that will tell the indexers how to manage each index. But since we are using a distributed environment (as opposed to standalone), we need to distribute the indexes.conf file the right way.

So we are going to use the Master Node to push "apps" to the Indexer nodes. All splunk instances have the /opt/splunk/etc/master-apps folder, but that folder is only used by the Master Node. Anything found inside the master-apps folder will be pushed to all members of the indexer cluster, landing in each member's /opt/splunk/etc/slave-apps folder. While we are at it, we will also create a file to add some settings for all indexers.

SSH to the master node and go to the master-apps folder:
# cd /opt/splunk/etc/master-apps

Here we will create two app folders:
# mkdir -p indexers_indexes_list/local
mkdir -p indexers_global_settings/local

Make the indexes.conf file.
# nano indexers_indexes_list/local/indexes.conf

Paste in this text and save

[volume:hot1]
path = /hot/splunk
maxVolumeDataSizeMB = 450000

[volume:cold1]
path = /cold/splunk
maxVolumeDataSizeMB = 1950000

[default]
repFactor = auto
homePath = volume:hot1/$_index_name/db
# warm located in hot volume
coldPath = volume:cold1/$_index_name/colddb
thawedPath = $SPLUNK_DB/$_index_name/thaweddb
# 1 year max
frozenTimePeriodInSecs = 31536000

[test]
maxTotalDataSizeMB = 1500
maxWarmDBCount = 20
## 1 week max for TEST index
frozenTimePeriodInSecs = 604800

[linux]
maxTotalDataSizeMB = 30000
maxDataSize = auto_high_volume
maxWarmDBCount = 200
maxMemMB = 20

Let's have a look at this file while we're here.

The first part defines the paths where splunk should store the data. This is the reason we made the /hot and /cold mount points for the indexers. Splunk has 100GB of room to run the application, but we reserved 500GB for hot/warm and 2000GB for cold storage. The maxVolumeDataSizeMB setting tells splunk the max size of the volume so it doesn't go over and potentially crash splunk, so you want to keep this number close to the actual max value but just a bit under.

The [default] stanza has the parameters ALL indexes will get by default unless you specify settings for an index with an individual stanza. Think of this as inheritance: all indexes inherit these settings first, then any specific definitions found later overwrite them.
Here we are saying all indexes use the same paths, and we are using the built-in $_index_name variable to make life easy; it is a placeholder for the index name, so for an index like [firewall] it resolves to firewall.
The hot1 and cold1 volumes were already defined in the previous stanzas.
We set frozenTimePeriodInSecs here because it is often a global parameter. We are saying the data stays in cold storage for a max of 1 year; after that it's frozen, and since we don't have frozen storage, the data is deleted.
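For example, with the volumes and defaults above, the test index ends up with paths like these (illustrative only; the thawedPath assumes splunk's default SPLUNK_DB location of /opt/splunk/var/lib/splunk):

homePath   = volume:hot1/test/db        ->  /hot/splunk/test/db
coldPath   = volume:cold1/test/colddb   ->  /cold/splunk/test/colddb
thawedPath = $SPLUNK_DB/test/thaweddb   ->  /opt/splunk/var/lib/splunk/test/thaweddb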

The test index is very important to have as a starter index. Since it's for testing, we are defining it to use less storage than the more important indexes. The 1500 (MB) limit is per indexer.
maxWarmDBCount limits the number of hot/warm buckets in storage. This is helpful because if data is not rolled to cold quickly enough, it will fill up the hot volume; this matters most for the large indexes. For test, we set frozenTimePeriodInSecs so the data expires after 1 week, since it is not meant to be kept like other indexes. We have to define these items here because otherwise they would be inherited from the [default] settings.

Now we need to make an inputs.conf file that tells the indexers to listen on port 9997.
# nano indexers_global_settings/local/inputs.conf

Paste in this text and save

[splunktcp://9997]
disabled = 0

This setting goes in the "indexers_global_settings" app because that app is intended for all of the indexers, so any general settings that need to be pushed to every indexer should go in this app/folder. If we wanted to update the web GUI, we could add a web.conf file and place additional settings there, as sketched below.
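As a purely illustrative example (not something this build needs), a minimal web.conf that turns off Splunk Web on the indexer peers would look like this:

# nano indexers_global_settings/local/web.conf

[settings]
startwebserver = 0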

Pushing indexer bundles

Also known as "configuration bundle actions", this process pushes the apps/folders stored in the master-apps folder on the master node to the slave-apps folder on all of the indexers attached to the cluster.

Login to the web GUI of the master node and go to SETTINGS > INDEXER CLUSTERING.

Click the EDIT button and select CONFIGURATION BUNDLE ACTIONS.

Click VALIDATE AND CHECK RESTART.

It will then load all the apps in the master-apps folder, run the configuration as if it were live, and look for problems. If there is a problem it will return a FAILURE status, in which case you likely have a typo somewhere to resolve. If it returns SUCCESS, you are OK to push.

Click PUSH when ready to push the bundle. If a restart is required, it will perform a rolling restart, restarting each member one by one. If you have enough members in the cluster, it may restart multiple members at the same time, but since we have 3, it will do one at a time.

For sanity, let's see what happens when we have a bad validation. In the above indexes.conf file, I've changed the path of the disk.

Since the path does not exist it should fail at validation.

Validation returned UNSUCCESSFUL. To find out why, you can check the log on the Master Node at /opt/splunk/var/log/splunk/splunkd.log

The errors should also appear in the MESSAGES area of the GUI, but since these are low-power VMs it might take a few minutes for them to arrive.
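One quick way to pull the relevant lines out of that log (just a sketch; adjust the search string as needed):

# grep -i error /opt/splunk/var/log/splunk/splunkd.log | tail -n 20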

Search Head Deployer

The deployer pushes apps to the members in the search head cluster. This needs to be built before the search head cluster is created as you will see below.

Edit server.conf on the Deployer and add the following stanza:
# nano /opt/splunk/etc/system/local/server.conf

[shclustering]
pass4SymmKey = mySecretKey
shcluster_label = shcluster1

Save and exit.

We also need to add this server to the indexer cluster as a search head. This will allow the Deployer to search if needed. Paste the below text into server.conf as well:

[clustering]
master_uri = https://192.168.11.21:8089
mode = searchhead
pass4SymmKey = thispassword

Save, exit, and restart splunk.

Search Head Cluster

Now let's build the search head cluster. For search heads, it's best to have an ODD number of members, and you need a minimum of 3 for a cluster, so we have 3.

The search head cluster is not managed like the indexer cluster. The apps are distributed to the members much like the master node distributes its apps, but the cluster itself is formed by the members alone. There is a captain member that performs additional tasks the other members do not. The captain is the "manager" of sorts, so we will designate SH1 as the captain and build accordingly.

SSH to all members and the Deployer

Review this command to run on ALL MEMBERS. This will initialize each member to prepare the member to be joined to a cluster.
# /opt/splunk/bin/splunk init shcluster-config -auth admin:'ADMINPASS' -mgmt_uri https://NEW_MEMBER_IP:8089 -replication_port PORT -conf_deploy_fetch_url https://DEPLOYER_IP:8089 -secret SECRET_KEY -shcluster_label LABEL_NAME

Where ADMINPASS is the splunk admin password you set earlier (the one you wrote down)
NEW_MEMBER_IP is the IP address (or DNS hostname) of this search head member being added
PORT is an unused TCP port; use a unique one per server, such as 9101 for server 1, 9102 for server 2, etc.
DEPLOYER_IP is the IP address or DNS hostname of the Search Head Deployer
SECRET_KEY is the search head secret key
LABEL_NAME is the label for the search head cluster (used for monitoring) and needs to be the same for each member of that cluster.

These are the commands for all of our search heads
SH1
# /opt/splunk/bin/splunk init shcluster-config -auth admin:'splunkpass' -mgmt_uri https://192.168.11.12:8089 -replication_port 9101 -replication_factor 2 -conf_deploy_fetch_url https://192.168.11.15:8089 -secret mySecretKey -shcluster_label shcluster1

SH2
# /opt/splunk/bin/splunk init shcluster-config -auth admin:'splunkpass' -mgmt_uri https://192.168.11.13:8089 -replication_port 9102 -replication_factor 2 -conf_deploy_fetch_url https://192.168.11.15:8089 -secret mySecretKey -shcluster_label shcluster1

SH3
# /opt/splunk/bin/splunk init shcluster-config -auth admin:'splunkpass' -mgmt_uri https://192.168.11.14:8089 -replication_port 9103 -replication_factor 2 -conf_deploy_fetch_url https://192.168.11.15:8089 -secret mySecretKey -shcluster_label shcluster1

Run the above 3 commands on the CORRECT search heads and restart splunk on each member.

On the member you wish to make captain, here is the command syntax:
# /opt/splunk/bin/splunk bootstrap shcluster-captain -servers_list "https://SH1_IP:8089,https://SH2_IP:8089,https://SH3_IP:8089" -auth admin:'ADMINPASS'
Where SH1_IP, SH2_IP, and SH3_IP are the IPs of each member
ADMINPASS is the admin password for the splunk server you are using

The command should look like this:
# /opt/splunk/bin/splunk bootstrap shcluster-captain -servers_list "https://192.168.11.12:8089,https://192.168.11.13:8089,https://192.168.11.14:8089" -auth admin:'splunkpass'

The captain is now ready. You can view cluster status with this command:
# /opt/splunk/bin/splunk show shcluster-status -auth admin:'splunkpass'

Finally, run this command on all search head cluster members and the deployer. This command tells the splunk instance to associate with the indexer cluster as a search head, which allows it to pull specific data from the indexers, most importantly the list of available indexes.
# /opt/splunk/bin/splunk edit cluster-config -mode searchhead -master_uri https://MASTER_NODE:8089 -secret SECRET -auth admin:'ADMINPASS'
Where MASTER_NODE is the IP address or DNS hostname of the Master Node (aka Indexer Cluster Master)
SECRET is the Indexer Cluster password
ADMINPASS is the admin password for the splunk server you are using

Restart splunk on each member, one at a time. From this point on, only one search head should be down at any point; with a 3-member cluster, losing more than one will break the cluster.

Login to the GUI of one of the search heads and go to SETTINGS > SEARCH HEAD CLUSTERING. You should see each member listed. This may take a few minutes since these are low-power VMs.

We will finish with the forwarding tier in Part 3.

Creating a Virtual Splunk Deployment – Part 3 More Splunk

We only have a few more items to setup before we can start using the deployment.

Indexer Discovery

Before we can use any forwarders, we need an easy way for them to know where to send the data. There are two ways, each with advantages and disadvantages; the Splunk-recommended way is Indexer Discovery. To enable Indexer Discovery, SSH to the master node.

Edit server.conf
# nano /opt/splunk/etc/system/local/server.conf

Paste in the below text at the end of the file:

[indexer_discovery]
pass4SymmKey = discovery

Restart splunk on the Master Node

Now the MN is ready to receive polling requests from forwarders using that set password.

Index paths

In some cases, we need to create the storage paths defined in indexes.conf. They won't actually be used, so they won't occupy disk space, but splunk needs to be able to access them. So run these commands on ALL SERVERS EXCEPT THE INDEXERS:
# sudo mkdir /hot && sudo chown splunk /hot
sudo mkdir /cold && sudo chown splunk /cold

This step must be performed or splunk will break when it restarts, because it will be looking for these folder locations. The indexers already have these paths mounted, which is why they don't need this step.

Deployment Client

A deployment client simply pings the deployment server for apps and takes whatever the DS says it should receive. So we need to tell the client devices where the DS is located.

SSH to the following devices:

  • Deployment Server
  • Heavy Forwarder
  • Monitoring Server
  • Deployer
  • Master Node

And run the following command on each server:
# /opt/splunk/bin/splunk set deploy-poll 192.168.11.11:8089 -auth admin:'splunkpass'

No need to restart splunk after this change.
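For reference, the CLI command above writes the equivalent of the following into /opt/splunk/etc/system/local/deploymentclient.conf, so you can also set this by editing the file directly (shown here as a sketch):

[target-broker:deploymentServer]
targetUri = 192.168.11.11:8089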

NOTE: while the Master Node and Deployer are clients of the deployment server, this is not how we will be pushing apps to the search head cluster and indexer cluster. Apps deployed this way are only meant for the master node and deployer themselves. More on this later.

Deployment Server

To activate a splunk instance as a Deployment Server, all you need to do is place an app (or folder) in the deployment-apps folder. But we need to create an app anyway, so let’s do that now.

Run these commands:
# mkdir -p /opt/splunk/etc/deployment-apps/global_forwarders_outputs/local/
nano /opt/splunk/etc/deployment-apps/global_forwarders_outputs/local/outputs.conf

Paste the below text into the new file and save.

[indexer_discovery:indexerCluster1]
pass4SymmKey = discovery
master_uri = https://192.168.11.21:8089

[tcpout:groupName]
indexerDiscovery = indexerCluster1
autoLBFrequency = 30
forceTimebasedAutoLB = true
useACK=true

[tcpout]
defaultGroup = groupName

Ensure the IP address used above is the correct internal IP for the master node.
Save the file and exit.

Login to the web GUI for the deployment server (DS). Go to SETTINGS > FORWARDER MANAGEMENT.

Click APPS and you should see the app we just created.

We also need to make an app that is a clone of the indexes.conf app on the master node. Use these commands:
# mkdir -p /opt/splunk/etc/deployment-apps/global_indexes_list/local/
nano /opt/splunk/etc/deployment-apps/global_indexes_list/local/indexes.conf

Now go back to the indexes.conf file in the Master Node's master-apps configuration and paste the same data into this new app. Don't worry about having two copies of the same file; we can address that in another session. For now, we just need the Forwarders to know what indexes are available.

In the CLIENTS tab, you should see all of the clients that have phoned home to the deployment server.

Now click the SERVER CLASSES tab and click the link to create a new class.

Name the new server class global_forwarders_outputs.

Click ADD APPS, then click the global_forwarders_outputs app on the left side to move it over to the right side. Do not add the indexes app yet.
Click SAVE.

You will see the app listed as part of the server class. Under ACTIONS, select EDIT.

By default, RESTART SPLUNKD will not be checked. In many cases, like this one, splunk must be restarted in order for these apps to take effect on the clients to which they are distributed. So in this case we need to check this box and click SAVE.

Now click ADD CLIENTS. Using the list at the bottom of the page, copy the hostname, DNS name, or internal IP address of the client and paste it into the INCLUDE section at the top. Click SAVE.

To verify the heavy forwarder is outputting as expected, run this command on the heavy forwarder:
# sudo tcpdump -nnei any port 9997
If you see data flowing, you are sending data to the indexers on port 9997.

Now let’s finish the indexers app.

At the FORWARDER MANAGEMENT page, click NEW SERVER CLASS to create a new server class and name it "global_indexes_list".

Add the global_indexes_list app. Then edit the app to RESTART SPLUNKD when installed, and SAVE.

For clients, add the Heavy Forwarder and SAVE the class.

Forwarders Server Class

We just made the server class that adds indexer discovery to the heavy forwarder, so now we can add some additional splunk instances to that class.

Go to the deployment server and open the global_forwarders_outputs server class.
Under CLIENTS, click the EDIT button.
Add all of the available clients to the class. These are the clients that NEED to receive this app.

SAVE the class.

Do the above steps for the global_indexes_list server class as well.

Heavy Forwarder

Most of the splunk configuration for this server has already been done. But we should still add the forwarder to the indexer cluster as a search head. It's not critical, but you will likely want to search your data after onboarding it via the HF.

SSH to the heavy forwarder
# ssh hf

edit server.conf
# nano /opt/splunk/etc/system/local/server.conf

[clustering]
master_uri = https://192.168.11.21:8089
mode = searchhead
pass4SymmKey = thispassword

Save the file and exit.

Since we gave all the servers the two extra disks that only the indexers really need, we can use the 3rd disk on the HF as the log storage disk and leave the 4th disk alone. So we need to partition the disk and edit fstab again.

Run these commands:
# sudo parted /dev/sdc
(parted) mklabel gpt
(parted) mkpart logs 0% 100%
(parted) quit
# sudo mkfs.xfs /dev/sdc1

Now add this volume to fstab:
# sudo nano /etc/fstab

Add the text below to the end of the file

/dev/sdc1  /logs  xfs  defaults  0 0

Save the file and reboot. When it comes back up, set permissions with this command:
# sudo chown -R splunk:splunk /logs

Syslog Service

On the Heavy Forwarder, you may want to install a syslog service to collect logs from other servers. This is critical for a splunk deployment because MOST appliances will only have the option to send a syslog feed to a remote collection server. The Heavy Forwarder will act as our main receiver for all log files before it even processes them for splunk. One way or another you have to get the logs to this server, whether it's syslog, FTP, SFTP/SCP, HEC, etc. However it arrives, the HF needs to be able to receive it. So we will cover the most common method, which is a syslog service.

To enable syslog-ng on our HF, ssh to the HF and run these commands:

sudo -s
cd /etc/yum.repos.d/
wget https://copr.fedorainfracloud.org/coprs/czanik/syslog-ng328/repo/epel-7/czanik-syslog-ng328-epel-7.repo
cd /tmp
wget https://download-ib01.fedoraproject.org/pub/epel/testing/7/x86_64/Packages/i/ivykis-0.36.3-1.el7.x86_64.rpm
rpm -ivh ivykis-0.36.3-1.el7.x86_64.rpm 
yum -y install syslog-ng
yum -y remove rsyslog
systemctl enable syslog-ng
systemctl start syslog-ng
exit

Now that syslog-ng is installed, you can make your own config file or you can download mine. While on the HF, run these commands:
# sudo mv /etc/syslog-ng/syslog-ng.conf /etc/syslog-ng/syslog-ng.conf.orig
sudo nano /etc/syslog-ng/syslog-ng.conf

Paste in the contents of the syslog-ng.conf file from my github repo, then save and exit.
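If you would rather write your own config, the general shape in syslog-ng is a source listening on a port, a destination file, and a log statement tying them together. Here is a minimal sketch for one feed (it assumes UDP port 5000 for the IIS feed and the /logs path used in this guide; my actual config in the repo covers more feeds and options):

source s_iis { udp(port(5000)); };
destination d_iis { file("/logs/IIS/IIS.log" owner("splunk") group("splunk")); };
log { source(s_iis); destination(d_iis); };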
Now restart syslog-ng
# sudo systemctl restart syslog-ng.service

Check the ports are listening:
# netstat -an | grep ":500"

Your Heavy Forwarder is now ready to receive logs using those ports.

For a quick test before we move on, use this command from the CLIENT machine:
# logger -d -n 192.168.11.17 -P 5000 "this is a test message"

Now go back to the HF and check the log path in use for port 5000:
# cat /logs/IIS/IIS.log

Go ahead and clear that log:
# echo -n > /logs/IIS/IIS.log

Also, the eventgen addon does not create the folders, so you have to create them manually. Use this command:
# cd /logs && mkdir -p apache1 checkpoint cisco.ise dhcp dns IIS o365 proxySG

Monitoring Server

The monitoring server is essentially just a search head, but we can install additional apps on it that work well for a monitoring server. Keeping these on a separate server also keeps that CPU and memory load off the search head cluster, so you can monitor your clusters without affecting the search head cluster's performance.

If you've completed all the steps to this point, then you've got a working deployment, so we will cover the monitoring server in a separate session.

If you wish to do this now, you can add this server to the indexer cluster as a search head.

EventGen Addon

Use a browser to go to the Deployment Server web gui.
Click the APPS menu and select FIND MORE APPS
In the search bar, type in "eventgen", then click INSTALL for that app.
It will prompt you for your splunkbase credentials; sign up for a free account if you don't have one yet.
It will prompt for a restart when finished. No need; click RESTART LATER. We just want the app downloaded.

For this tutorial I've done A LOT of work preparing the eventgen addon to ensure it works and to provide sample data to enrich the indexers with usable data. I am using my own github repo, but you can use whatever data you like.

SSH to the deployment server.
# ssh ds
Now use this command to move the eventgen folder to the deployment-apps folder:
# mv /opt/splunk/etc/apps/SA-Eventgen /opt/splunk/etc/deployment-apps/

Now use these commands to download my customized files:
# cd /tmp
yum -y install git

git clone https://github.com/bramuno/SplunkEnv.git
cd SplunkEnv/SA-Eventgen/

mkdir -p /opt/splunk/etc/deployment-apps/SA-Eventgen/samples/
cp samples/* /opt/splunk/etc/deployment-apps/SA-Eventgen/samples/

mkdir -p /opt/splunk/etc/deployment-apps/SA-Eventgen/local/
cp local/* /opt/splunk/etc/deployment-apps/SA-Eventgen/local/

cp metadata/local.meta /opt/splunk/etc/deployment-apps/SA-Eventgen/metadata/

Once the files are copied, restart splunk on the deployment server.

Now go to the DS web GUI and go to FORWARDER MANAGEMENT.
Create a new server class called "heavy_forwarders".
Add the SA-Eventgen app to the class and SAVE.
Add the HF client to the clients list and SAVE.
In the APPS section, click EDIT for the app, ensure the RESTART SPLUNKD box is checked, then SAVE.

Wait a few minutes for the HF to restart, then run this command on the HF to check that logs are flowing from the eventgen app:
# tail -f /logs/*/*.log

You should see multiple logs streaming. These will be our artificially generated source of logs, which splunk will be monitoring.

Logrotate

These new logs above need to be rotated automatically to prevent the OS or splunk from crashing. To do this, we need to tell linux what to do using logrotate.

SSH to the HF and run this command:
# nano /etc/logrotate.d/splunk

Now paste the data below into this new file, then save and exit:

/logs/*/*.log {
	daily
	rotate 2
	compress
	missingok
	create 0660 splunk splunk
}

This file tells logrotate to rotate all logs under /logs and keep only 2 days worth of logs per file. It also creates the new log files under the splunk user and group so splunk has no issues reading/writing them. The old logs are compressed after rotation, so we need to ensure splunk is not trying to read the compressed files; more on that later.
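If you want to sanity-check the file before the nightly run, logrotate has a debug mode that prints what it would do without actually rotating anything:

# logrotate -d /etc/logrotate.d/splunk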

Conclusion

That should be enough to get started on testing in your own environment. Feel free to leave a comment if I messed something up or forgot something completely and I will try to correct it. Thanks for coming to my TED talk.

Creating a virtual Splunk deployment – Part 1 Infrastructure

This guide is meant to prepare the training environment needed for most Splunk training. This is not intended for production environments because I am skipping a lot of security hardening, so build in prod at your own risk. While some steps have been eliminated because of my efforts here, there is only so much I can do; some things you will have to figure out based on my instructions and what you can find on Google.

The goal is to create a fully functional distributed Splunk environment using VMware and a Splunk DEV license. There are alternatives to everything, so you do not have to use what I suggest if you know a better way.

Please note this is A LOT of work. If you have a faster way to build the hardware components of the environment, you should use that method. This guide will require a decent amount of hard disk space on your host machine. I recommend one Terabyte but you can size the below machines based on your disk limitations. Just note the Indexers need the most space.

Also, this will require a host machine that has enough CPU and memory to run all of these VMs at once. I recommend an 8 or 16 core CPU and 64GB of memory. If you don't have that, I suggest you look into ESXi or cloud solutions, which can be expensive. In the deployment I created, I allocated more than 1TB of disk space, and the virtual disks combined would require at least 10TB if fully provisioned. An affordable disk solution for this would be an external USB 3.0 drive with 3TB of space or more. I have tested this myself and it does work, but it is slow.

Like all guides, this will eventually get outdated, so you may have to figure things out based on changes to splunk and linux. Fortunately, you have Google to help.

And finally, this is all in Linux because I don't like windows for many reasons. You are welcome to use the windows server image instead if you have access to that. Also, I am using the GUI linux desktop to keep things simple and universal. All the steps below can be done on the command line if you know where to go.

Dev License

Before doing anything, request a new DEV license from Splunk. This will take at least 24 hours so don’t wait. Go to dev.splunk.com and look for the request dev license option.

VirtualBox/VMware

Unfortunately you will need a professional version of VMware or other virtualization software that has similar features. VirtualBox is close enough as far as I can tell, but if your host machine is windows it may not be able to handle the number of VMs needed for this project. I normally use VirtualBox on a linux host machine, where it performs OK and supports headless start to save memory/CPU. But for this demo I am using an incredibly powerful windows machine with VMware Workstation.

An ESXi server would be great, or you could look into cloud hosting and find a cheap provider to spin up machines (which would save you a lot of time). You (technically) can use VMware Player (freeware), but it does not have a virtual network editor. If you have access to a university, there may be an educational discount available for the VMware software. If you don't use multiple interfaces per VM, then you don't really need a virtual network editor.

Virtual Network

First you need to create your virtual network. You CAN do this with one network adapter, but that has its own set of difficulties. So I am using one internal interface and one external interface. If you are using cloud hosting, you only need one network adapter.

VMWARE:

In VMware, click EDIT > VIRTUAL NETWORK EDITOR
(which in newer versions may have a slightly different label)

Create new networks using vmnet10 and vmnet11.
Assign 192.168.10.x subnet to vmnet10
Assign 192.168.11.x subnet to vmnet11
Change vmnet10 to NAT. this will be the external adapter.
Keep vmnet11 as host-only. this will be the internal adapter.
Set the DHCP range on both networks to start at .100.

VIRTUALBOX:

Click FILE > HOST NETWORK MANAGER
Click CREATE and create vboxnet0 and vboxnet1
Give vboxnet0 subnet 192.168.10.0/24
Give vboxnet1 subnet 192.168.11.0/24
Enable the DHCP server on both subnets, starting at .100 for both.

APPLY changes and close.

Host Machine

On your host machine, make sure you have the screen saver and all sleep/hibernate options completely DISABLED. You do not want the machine to sleep while running your VM clusters. It will likely cause them to have problems responding, and they will all need a reboot; sometimes your host machine will need a reboot as well.

Create new VM

In your VM software create a new VM with the following specifications:
CPUs: 2
Memory: 2gb
Disk1: 40gb
Disk2: 100gb
Disk3: 500gb
Disk4: 2000gb
Network Interface1: vnet10/vboxnet0
Network Interface2: vnet11/vboxnet1

Keep in mind the disks will not allocate all that space unless you tell the VM software to allocate the disk space in advance. Leave the default option to NOT pre-allocate the space, so these sizes are just theoretical for now.

CentOS

At the end of this guide is a method to CLONE your VMs. I suggest you go through this guide once and then clone that first machine several times to create your deployment; this will save a lot of time. But if you want to get a better understanding of your operating system, it would be better to run through this guide multiple times to familiarize yourself with CentOS.

Download the latest CentOS ISO from centos.org. Get the FULL version just to avoid problems (also referred to as the DVD version). The minimal version will require far less CPU and disk space but will be more difficult to manage for those lacking experience. Regardless of what you use, you need to ensure the client VM is running an OS that has a desktop GUI installed; a minimal version of the OS will NOT come with a GUI.

Once downloaded, use it to install CentOS on all of your splunk boxes. Load the CentOS ISO into your guest CD-ROM and ensure it's connected.

Before starting the VM, edit the VM settings, click the ADD button, and add a 2nd hard drive. Make the drive 100GB.

Click OK and start the VM. At boot, choose INSTALL CENTOS
Helpful TIP: If you are using VirtualBox, you can start the machine with the HEADLESS START option. This will not open a console window for that machine and thus uses less of the host's resources.

Install the OS with the defaults. Click the NETWORK CONNECTIONS icon, enable the network adapters by switching them on, then at the bottom rename the host to the appropriate name based on its function (see the Structure list). Click DONE to go back.

Click the INSTALLATION DESTINATION icon. Check the box I WOULD LIKE TO MAKE ADDITIONAL SPACE AVAILABLE. Ensure ONLY THE FIRST DISK IS CHECKED; we will configure the 2nd disk later.

Click FULL DISK AND SUMMARY to verify only one disk will be formatted. Click CLOSE.
Click DONE.

In the RECLAIM DISK SPACE window, click RECLAIM SPACE button.

Now click the BEGIN INSTALLATION button

Click the ROOT PASSWORD icon and set the root user password. Click DONE to go back.
Click USER CREATION to create the Splunk user while we are here.

Set the password and write it down. Since this is a non-prod instance, I suggest you make this an easy password as you will need it often. Click the ADVANCED button.

Add the extra groups the user needs (on CentOS, the splunk user will want the wheel group so it can sudo), then click SAVE CHANGES.

Click DONE to return.

The system will install and ask to reboot when complete.

The Splunk Disk

The 2nd disk that was attached to the VM now needs to be used by the OS. This will be the disk that splunk uses and ensures we don’t run out of disk space too easily.

To format the disk for use, enter parted on that disk
# parted /dev/sdb

Now in parted run these commands:
(parted)> mklabel gpt
(parted)> mkpart opt 0% 100%
(parted)> quit

Now format the volume:
# mkfs.xfs /dev/sdb1

Use this command to edit the fstab file
# sudo nano /etc/fstab

add this line and save the file

/dev/sdb1  /opt  xfs  defaults  0 0

reboot

If you run
# df -h
you will see that the /opt volume is now mounted on /dev/sdb1.

Disable ipv6, SElinux, firewall

SELinux is enhanced security for production servers. We are not going to need/want this for DEV.

Run these commands:
# echo 'SELINUX=disabled' > /etc/selinux/config
echo 'net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1' >> /etc/sysctl.conf

iptables -F
iptables-save

systemctl disable firewalld.service

This file may not exist but if it does, edit it
# nano /etc/sysconfig/iptables-config
Change IPTABLES_SAVE_ON_STOP="no" to "yes".
Save, exit, and reboot.
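After the reboot, you can confirm SELinux is actually off with a quick check:

# sestatus

It should report the SELinux status as disabled.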

Static IP

DHCP is not a good idea with splunk. You can use DHCP reservations, but that is still not a best practice.

Go to each VM, click the start menu, look for SETTINGS, then click NETWORK.

Click the configure icon next to each adapter and find the internal (.11) adapter.
Click the IPv4 tab and change the method from DHCP to MANUAL.
Now add the IP info into the provided fields.

Assign a unique IP to each VM, but be sure to use the 192.168.11.x subnet. Remember you set the DHCP starting address at .100, so be sure to use IPs below that range. I did not do that in this example, so only do what I did if you understand what you are doing.

You won't need to assign a gateway because this interface doesn't need one, but if you want you can use .1.

Click APPLY to save changes.

Update hostname

On each VM, open the terminal and run this command:
# sudo nano /etc/hostname

Edit the hostname (if needed) to reflect the role of the machine

Update packages

Run command
# sudo yum -y update

then reboot

Verify routes

There should only be one default route, using the .10 interface.
Use this command:
# route -n

If you see the .11 interface listed with 0.0.0.0 as the Destination, you need to remove that route.

To remove the route, first find the interface name
# ifconfig

In this example, that interface is named ens34.

Go to the network scripts folder
# cd /etc/sysconfig/network-scripts/

Edit the ifcfg file with that name
# sudo nano ifcfg-ens34

Change DEFROUTE to NO

Save and exit, reboot to verify the route is gone.

SSH keys

One of your VMs is meant to be the "client" machine. This will act as your base of operations, where you can manage all the other VMs. In my example I used Ubuntu, but if you only used CentOS that's also fine. However, the client machine needs to run the full version of CentOS, as it comes with the GUI needed for this next step.

Open the VM console and open the terminal.
First create the ssh folder
# mkdir ~/.ssh
# chmod 700 ~/.ssh
# cd ~/.ssh

Now use command:
# ssh-keygen
and use the name splunk to name the key pair
Just hit ENTER when asked for a passphrase; you don't want a passphrase.

The keys will be placed in your home folder at /home/splunk/.ssh/

Run these commands:
# cd ~/.ssh
cat splunk.pub > authorized_keys
chmod 640 authorized_keys
chown -R splunk:splunk /home/splunk/

Now set that ssh key as the default key. Edit the ssh file:
# sudo nano /etc/ssh/ssh_config

Under Host * you need to make these changes:
Uncomment StrictHostKeyChecking and ensure it says "no"
Uncomment IdentityFile and change it to "~/.ssh/splunk"
Save and exit.

Now test by SSHing to one of your boxes, or you can SSH to the box you are using. It should not prompt for a password, for example:
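# ssh splunk@localhost

(localhost is just an illustration here; once the other VMs are built, you can test against their IPs the same way)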

Python 3

Newer versions of CentOS/RHEL ship with python 3 as the default version. Older CentOS/RHEL (like the CentOS 7 we are using) comes with python 2, so we need to install python 3 ourselves.

Go to https://www.python.org/ftp/python/ and find the version you want to install. We have CentOS 7, so we want to install python 3.7 since that's what splunk expects.

Run these commands:
# sudo -s
yum -y install gcc openssl-devel bzip2-devel libffi-devel
cd /tmp/
wget https://www.python.org/ftp/python/3.7.7/Python-3.7.7.tgz
tar xzf Python-3.7.7.tgz
cd Python-3.7.7
./configure --enable-optimizations
make altinstall

Now test with
# python3.7 -V

If that fails, try adding these two symlinks:
# ln -s /usr/local/bin/python3.7 /usr/bin/python3.7
ln -s /usr/local/bin/python3.7 /usr/bin/python3

Exit root
# exit

Splunk Install

You are now ready to install Splunk. Get the latest version (or the version you want to install) at splunk.com. Since we are using CentOS, we want the redhat/centos version. Like everything, you will have to login to splunk.com, but it's just a free account so no need to buy anything.

Use the Client machine and open a Firefox browser. Go to splunk.com and look for the "download splunk enterprise" link.

Here we will download the RPM, as it's much easier to install/uninstall.

When you attempt to download the file, it will take you to another screen that contains the WGET method. This is helpful, provided your external network interface is working.

Click the WGET link and copy the text provided. Save it in a text editor window.

Now you can ssh to each splunk VM and run the commands remotely from the client machine.

Paste in the wget command to download the file:
# wget -O splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm 'https://www.splunk.com/bin/splunk/DownloadActivityServlet?architecture=x86_64&platform=linux&version=8.0.5&product=splunk&filename=splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm&wget=true'

This will download an rpm file to the current folder.

Now install the package using:
# sudo rpm -ivh <splunk file>
In this case the file name is splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm, so the command becomes:
# sudo rpm -ivh splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm

Repeat the install for all splunk VMs.

Auto-start at boot

First use this command to START splunk AND set the admin password you will need at first login.
# /opt/splunk/bin/splunk start --answer-yes --accept-license --no-prompt --seed-passwd "splunkpass"
You can change "splunkpass" to your own password if you prefer. Just write it down for later.

I strongly advise you use the same password for every VM until you better understand what you are doing. But since this is a non-prod instance, it’s your call.

Now set splunk to start at boot.
# sudo /opt/splunk/bin/splunk enable boot-start -user splunk --answer-yes --accept-license

First login

Use your client machine browser to visit http://<IPADDRESS>:8000, where IPADDRESS is the internal (.11) address of the VM.
Use the admin password you set above. The username is admin.

If you cannot get it to load from the client machine, either splunk is not running or something is blocking the connection like a firewall or bad IP address.

Go to SETTINGS > SERVER SETTINGS > GENERAL SETTINGS


Change the server name to reflect its purpose, but leave the management port as it is.

Enable HTTPS and change the port to 8443

The default hostname should be the true hostname of the server (FQDN), but for DEV needs you can use the same name you used above.

Click SAVE

Click the MESSAGES menu and click the restart link to jump straight to the restart area. Restart the splunk service.

It will take about 30 seconds, then you can try logging in again, but you need to change the URL:
use HTTPS instead of HTTP
use port 8443
which becomes
https://<IPADDRESS>:8443
However, if you click the CLICK HERE TO CONTINUE link, it should update the URL for you.

Most browsers should warn you it’s not secure now, but this is normal. Add the exception to the browser to proceed.

While in Firefox, create a bookmark for each server and name each one for its role.

If you are able to login using HTTPS, you are done installing splunk to its base configuration.

Hot & Cold

To allow us to use the indexes.conf file later on, we need to create the mount point folders that file will reference, even though these folders will never actually be used by most splunk servers.

Run these commands:
# sudo -s
mkdir -p /cold
mkdir -p /hot
chown -R splunk /cold
chown -R splunk /hot

VMware Cloning

Now that we have at least one working splunk instance, we can either start at the top and repeat this entire process manually, or we can use VMware’s CLONE option to make a copy of this VM in its current state. Cloning will create new unique MAC addresses for each network adapter but it will not assign new IP addresses, so some additional configuration after cloning will be required.

You may choose how to proceed.

VMWARE:

To perform a clone, power off the VM. In VMware, locate the machine in the left side menu. Right-click the machine name and select MANAGE > CLONE.

Click NEXT
Select THE CURRENT STATE IN THE VIRTUAL MACHINE
Select CREATE A FULL CLONE
Name the VM for the role it's going to play (e.g. Heavy Forwarder).
Ensure the LOCATION path is correct
click FINISH
You can clone all the machines you need now, so repeat as many times as needed.
I recommend making an extra clone to be used as a base in the event you want to add more splunk servers.

VIRTUALBOX:

Ensure the VM is stopped and select the VM in the list of machines.
Click MACHINE > CLONE
Name the new machine based on its role
Provide the correct path as needed
Change MAC policy to GENERATE
check the box to keep disk names
click NEXT
select FULL CLONE
click CLONE

Update Cloned Interfaces

Now that the machines are cloned, we need to ensure all devices have a unique IP address on both interfaces. Since the .10 subnet is DHCP, we can ignore that, but since .11 needs to be static, we need to go in and assign an IP to each device.

One at a time, start up a VM.
When the machine console boots up, login and go to SETTINGS > NETWORK.
Modify the INTERNAL network adapter and change the last octet of the IP address to a unique address on the internal network.
The EXTERNAL adapter should still be using DHCP, so there is no need to change anything there.

Ensure the CONNECT AUTOMATICALLY option is checked

Also ensure the adapter is enabled.

Now open a terminal window and run this command:
# sudo nano /etc/hostname

Update the hostname to reflect the role of the server; this is helpful for when you SSH to this server.
Save and exit.

Also be sure to update the splunk instance names for the new clone.

Delete GUID

Each Splunk instance has a unique GUID. Since we cloned the machine, we have replicated the GUID and all of the GUIDs are now identical, so we need to delete the GUID and restart splunk (it will generate a new one). Run this command on each clone:
# rm -f /opt/splunk/etc/instance.cfg && /opt/splunk/bin/splunk restart

Collect Inventory

Go to each of your VMs and collect the IP addresses of all internal interfaces.
Use this command:
# ifconfig

Locate the .11 address and write it down on a notepad at the client machine. You will need this a lot.

Once you have your list of IPs and hosts, you can now add host aliases to make your life easier.
For example, the search heads below use alias SH1, SH2, SH3.
So if I want to ssh to search head 3, I only need to use
# ssh sh3
instead of
# ssh 192.168.11.14

Now use this command on the client machine
# sudo nano /etc/hosts

Now paste all that host info into this file (paste by right-clicking on the terminal). Save and exit.
This tells your client OS the IPs of all your VMs.
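As an example, using the internal IPs from this guide, the entries might look something like this (a sketch; adjust the names and addresses to match your own inventory):

192.168.11.11  ds
192.168.11.12  sh1
192.168.11.13  sh2
192.168.11.14  sh3
192.168.11.15  deployer
192.168.11.17  hf
192.168.11.21  mn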

Structure

The Distributed Environment will be based on what Splunk recommends in their training courses. The list of devices as follows:

  • Client Machine / Syslog Source Forwarder
  • Search Head 1
  • Search Head 2
  • Search Head 3
  • Indexer 1
  • Indexer 2
  • Indexer 3
  • Search Head Deployer & License Manager
  • Monitoring Console
  • Master Node (Index Cluster Mgr)
  • Deployment Server
  • Heavy Forwarder

This tutorial is quite large and wordpress is struggling, so I will continue this in part 2.

The Master Deployment Server

If you’re a splunk administrator, or if you’ve taken the splunk administrator classes, you may have heard of a concept whereby you can use the Deployment Server to push apps to the Search Head Deployer (aka the Deployer) and the Master Node (aka the Indexer Cluster Master). It sounds nice in theory and if you listen to Splunk’s official take on the matter, it sounds quite simple. But what happens if you actually do it? Well, I tried it and it turned out to be a lot more complicated than I thought.

For reference, the information presented here is accurate up to Splunk version 7.2.6.

Background

A Deployment Server (DS) is an instance of Splunk that pushes Splunk apps to other Splunk servers or instances.  Apps are similar to apps on your phone in that they perform a specific function.  A deployment server comes in handy when your Splunk environment starts growing in size as it saves the average Splunk admin a lot of time.  If you have one app that is installed on 100 splunk servers, you need to push those changes to all those servers.  If you do this manually, you will have to SCP the new files to the server and then restart all Splunk instances.   But if you have a DS, you can push these changes to all 100 servers at the same time and automatically restart each instance of Splunk.   The DS has a list of servers and another list of apps that should go to those servers, and then copies those apps (exactly as they are found on the DS) to each of those servers.  

The Search Head Deployer (SHD) is like a DS in that it pushes apps to all of the Splunk search heads that are included in the search head cluster.   This process will also trigger a rolling restart of all the search heads. A rolling restart is a safer restart as it does not restart each member at the same time. It restarts each member one-by-one to keep availability as high as possible since the search heads are customer facing. The SHD pushes apps in a different way because Splunk users need the option to save their own knowledge objects (KOs). So the SHD will merge the local/ and default/ folders together and push that merged folder to the default/ folder at the destination client (ie. search head). This way the search head users can save their own KOs to that app’s local/ folder and it takes precedence over the pushed default/ folder. Each recipient is known as a client.

The Master Node (MN) is sometimes referred to as the Indexer Cluster Master. Like the SHD and DS, the MN pushes apps to members of the Index Cluster. The MN also controls bucket replication between members and a lot more. But for apps, it pushes them the same way the DS does and does not merge anything like the SHD does. If a restart is required, the MN will perform and monitor a rolling restart of each cluster member.

All splunk apps to be used by that splunk instance are located at $SPLUNK_HOME/etc/apps
The DS apps to be pushed are located at $SPLUNK_HOME/etc/deployment-apps
The MN apps to be pushed are located at $SPLUNK_HOME/etc/master-apps/
The SHD apps to be pushed are located at $SPLUNK_HOME/etc/shcluster/apps/

The Concept

As Splunk introduces it, the concept is simply that the DS can send apps to the MN and SHD just as it does for all other clients (Splunk instances receiving apps from the DS).

Now, it's important to note that the MN, the SHD, and even the DS itself can all be clients of the DS. Yes, the DS can send apps to itself. This is because all of these servers are their own instance of Splunk: they run apps and need to send data to the indexers like all the other Splunk instances. The MN and SHD push apps to their cluster members, but they do not actually use those apps on their own instance of Splunk. So apps can be pushed from the DS to the MN and SHD so those servers can pass them along for further distribution, without ever using the apps themselves.

So if we want to push deployment apps to the MN and SHD, then we need to update $SPLUNK_HOME/etc/system/local/deploymentclient.conf.

By adding
[deployment-client]
repositoryLocation = /new/folder/location

we are telling the client to place any apps it receives from the DS into this new location.

So, if we want to push apps to the MN, we need to change this repositoryLocation value on the MN to $SPLUNK_HOME/etc/master-apps/
If we want to push apps to the SHD, we need to change the repositoryLocation value on the SHD to $SPLUNK_HOME/etc/shcluster/apps
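For example, on the MN the relevant part of deploymentclient.conf would end up looking something like this (a sketch; the SHD version just swaps in the shcluster/apps path):

[deployment-client]
repositoryLocation = $SPLUNK_HOME/etc/master-apps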

After that just restart the Splunk instance and the client will start negotiating with the DS.

There is one more step we need to perform on the DS before this will work. We have to mark these apps meant for the SHD and MN as “noop” or non-operational. This is because when the DS pushes the apps, it tells the client to start using them immediately. But these apps are not meant for the MN and SHD to use, they are meant for the MN and SHD to deploy to their cluster members. And there is no parameter on the clients where we can prevent this, so this change must be done on the DS.

So we need to edit file $SPLUNK_HOME/etc/system/local/serverclass.conf.

Here each app is labeled with the server class. The stateOnClient parameter is what we need.
[serverClass:ServerClassName:app:AppName]
restartSplunkWeb = 0
restartSplunkd = 0
stateOnClient = enabled

The default state is enabled. In the Splunk web GUI you can change this to disabled. But the web GUI does not provide the third option we need here, which is noop.
[serverClass:ServerClassName:app:AppName]
restartSplunkWeb = 0
restartSplunkd = 0
stateOnClient = noop

Now restart the DS and those apps will be in a noop state.

Should be fine, right? It’s not….not really.

The Problem

Changing the stateOnClient value to noop is critical because if it's not in that state, the receiving client will attempt to install the apps, even if they are in the "disabled" state (disabled means the app is still installed, just not actively used by the client). Installing an app happens when the splunk instance starts, and it only checks the default app path, which is $SPLUNK_HOME/etc/apps. Since we changed the receiving path, the received apps are not going to the etc/apps/ folder, so splunk cannot find them there. The client is still communicating with the deployment server, so splunk knows those apps exist and the DS keeps telling the client to install them from the new repository location. This is not possible, and splunk will start flooding errors. Setting stateOnClient to noop prevents this problem.

The real problem here is the app on the deployment server. If you look at the app on the DS, it is now listed as “Unchanged from state on deployment server”. This is the GUI translation of “noop”.

The problem here is that this state does NOT change between server classes. For example, let’s say you have two (or more) server classes. Both server classes contain the same app. Say you change the state of the app on class A to noop and leave the state for the same app on class B as enabled. When you restart Splunk and check the GUI, both class A and class B will show that shared app as “Unchanged from state on deployment server”, aka noop. So if you change the state of the app to “noop” for one server class, it changes the state in all other server classes.

The only way to get around this is to create a clone of the app using a different name. But this is not fun because any changes made to the app have to be made to both apps. However, a workaround for this is to create a symlink to the real app to be used as the cloned app. The DS will not know the difference and treat it like a new app. If you have multiple administrators in charge of your DS, this can be problematic. Also, some apps simply cannot be renamed/cloned. The best example of this are TA addons.

Many TAs have scripts included which are set to run directly from $SPLUNK_HOME/etc/apps/<app_name>. The TA knows the app's default name, so it includes that name in the script path. So if you clone the app or change the name, the TA won't work because it can't find the path it's looking for, and you will see a ton of script errors in the logs. This may not be the case for all TAs, but it is certainly the case for many I've come across.

But if you don’t want to clone/rename an app, and we shouldn’t have to (hey Splunk maybe fix this), you simply cannot use those apps anywhere else. So if your app should go to the SHD and a forwarder (for example), the app will not work on the forwarder because it was set to “noop” for the SHD.

Another major issue you may run into is server load. Combining all of your MN and SHD apps into the DS repository will increase the total number of apps. For small environments, this likely won’t be a problem. But the larger your environment gets, the more apps you will be using and thus your repository will fill up quickly. The more apps you have, the more CPU your server will use to process them.

If your machine is not very powerful (eg. a virtual machine with limited resources), you will notice a hit on your DS’s overall performance. If there are too many apps, a force reload using command
# /opt/splunk/bin/splunk reload deploy-server
may cause a major slowdown. Under these conditions, a force reload caused my DS to completely stop responding for ~90 seconds. This is because the reload forces a re-deployment of all the server classes. This means that all apps are now getting sent to all of their assigned clients, even if they don’t really need it.

A decent workaround for this problem is to use the server class name as part of the reload command to instruct the DS to only reload that specific server class.
# /opt/splunk/bin/splunk reload deploy-server -class <class_name>

The solution

After considering all of this I determined it would be easier to just send the apps to a temporary client that can forward the apps to the correct destination.  So I created some virtual machines and installed the Splunk Universal Forwarder.  I added those machines to the DS as new clients and created two new server classes:
indexer_cluster
searchHead_cluster

Each proxy machine is the sole client of its respective server class, and the apps assigned to that class are the apps that need to be installed on the corresponding cluster. So the DS sends the apps in “enabled” state to the proxy machine. On the proxy machine, the command below is run via cron every minute.
# rsync -rqup --delete --ignore-errors --exclude-from=/home/splunk/excludeUF.txt /opt/splunkforwarder/etc/apps/ splunk@SPLUNK_IP:/opt/splunk/etc/shcluster/apps/

--delete allows the proxy to remove apps from the SHD/MN destination as required.
--ignore-errors is critical to prevent rsync from halting on unimportant errors.
--exclude-from loads the list of folder names contained in the referenced file. In this example, these are apps I don’t need to include because I’m not pushing changes to them, so be sure your list is accurate.
The rsync command copies everything in the apps folder of the UF client and sends it to the shcluster/apps/ folder of the SHD, or the master-apps/ folder of the MN.
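
If you wrap the rsync command in a small script, the crontab entry on the proxy machine could look something like this (the script and log paths are just examples):

* * * * * /home/splunk/sync_apps.sh >> /home/splunk/sync_apps.log 2>&1

Note that this assumes SSH key authentication is already set up between the proxy and the destination server, since cron cannot answer a password prompt.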

Now that the apps are reaching their final destination (SHD/MN), those clusters can be pushed from the SHD and MN as needed. The proxy machines are technically running the apps, but since they are universal forwarders and not heavy forwarders, they won’t do much of anything.

But what about apps meant to run on the SHD and MN themselves? Because we didn’t need to change the deploymentclient.conf file on those machines, we can send them their own apps using another server class. So to keep things simple, I created two new server classes for these specific machines.
Deployer_local
MasterNode_local

So any app that the SHD needs to use for itself is added to the Deployer_local class, and the SHD client is added to that class as well. Only these apps are pushed to the main $SPLUNK_HOME/etc/apps folder, so I don’t need to worry about the search head cluster apps. We now have two server classes pushing different apps to the same destination server.
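
A rough sketch of how this might look in serverclass.conf: one class sends the cluster apps to the proxy UF, and the other sends the deployer’s own apps straight to the SHD. All host and app names below are placeholders:

[serverClass:searchHead_cluster]
whitelist.0 = proxy-shd-uf

[serverClass:searchHead_cluster:app:example_cluster_app]
stateOnClient = enabled

[serverClass:Deployer_local]
whitelist.0 = shd-deployer

[serverClass:Deployer_local:app:example_deployer_app]
stateOnClient = enabled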

This method has worked very well so far and allows me to keep the apps in enabled state. If I remove an app from the server class, it is removed from the apps folder of the proxy machine, and then the proxy machine uses rsync to delete that app from the SHD/MN. I don’t have to clone or rename apps and I am able to use the DS to push apps to every single device in my environment. I hope this was helpful to others.

Making a hot spare for the Crowdstrike TA

So in my environment, I have different tiers of forwarders that perform different tasks. I have the usual universal forwarders, and then I have my Heavy Forwarders. I have two HFs which are actually running in a cluster with a load balancer in front of them. I do this because I want reliability overall, but it also allows for server downtime should something go wrong or a server simply need a reboot.

However, not all apps run the same in Splunk. Some addons are configured to pull data from a server and forward that data to the indexers. So my main HFs are designed as receivers where data is pushed to them, while I have another set of HFs that are in charge of pulling data from remote servers. I refer to these servers as “pullers”.

But pulling data in a cluster is a problem because, unlike a search head cluster, there is no captain that controls what data is to be pulled and where the addon left off. So we can only have one puller active at any time, with a spare puller on standby as a warm spare. But what happens when the active node goes down?

Well, for most apps/addons, the 2nd puller can rely on the inputs.conf parameter “ignoreOlderThan” to ignore events older than X days/hours/etc. For some apps, like Rapid7, the bookmark used to note the last event pulled is stored in that server’s KVstore.
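
For example, a file monitor stanza in inputs.conf can use the parameter like this (the path here is just an example):

[monitor:///var/log/example_app/*.log]
ignoreOlderThan = 7d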

But Crowdstrike wasn’t so handy. When I failed over to the 2nd puller, it started pulling data from the very first event recorded for our account. This led to a lot of duplicate events and false alarms. I asked CS how to avoid this, and they said the offset value stored in the inputs.conf file is the marker where the forwarder starts collecting data. So an offset of zero starts at the beginning.

I then asked where this offset value can be obtained, and they were not able to locate that information. Fortunately, that info was in the addon’s logs.

So I took this info and used it to narrow down all the logs from that sourcetype with the key “consuming”.

index=_internal sourcetype=ta-crowdstrike_ucc_lib-2 "consuming" 

Now that these logs are identified, I need to make a field extraction (props.conf) to define the placeholder value.
[ta-crowdstrike_ucc_lib-2]
EXTRACT-placeholder = ^.*for\s'\w+-\w+-\w+-\w+-\w+'\sfrom\s(?P<placeholder>.+)

Now that the offset is getting extracted to field name “placeholder”, I can use this in my splunk query to locate the latest/highest value. 
index=_internal sourcetype=ta-crowdstrike_ucc_lib-2 "start consuming" | stats max(placeholder)

Once I found the highest value, I took this offset and used it as the offset value on an independent Splunk instance sending to a test index. When tested on this new server, it honored the new offset and started collecting at the next integer.

I checked the search results, and it had only indexed events created after that offset value. So I now have the offset value I need. Now I just need to extract it from Splunk using automation.

I leveraged the API to run the search and extract the placeholder.

$ curl -u USERNAME:PASSWORD https://SPLUNK:8089/services/search/jobs -d search="search index=_internal sourcetype=ta-crowdstrike_ucc_lib-2 \"start consuming\" host=forwarder_name | stats max(placeholder)"

Caution: do not use single quotes in API SPL queries. Use double quotes and escape the inner ones with a backslash, as shown above.

This returns the SID of the search, which I can pipe back into the API to obtain the results of the search.

[splunk@SERVER~]$ curl -u USERNAME:PASSWORD https://SPLUNK:8089/services/search/jobs/1564006340.36894_A17B22CE-90D3-4B82-976E-169244223C1E/results

Here we can see the returned value is 6951741.  That’s the offset value I need to extract.  
Now that I am getting the data I need from the API, I can move these calls into a script to do the rest. The script performs the above calls to grab the offset, then writes that value into the correct app file (listed below) on the deployment server.
/opt/splunk/etc/deployment-apps/TA-crowdstrike/local/crowdstrike_falcon_host_inputs.conf
This means that the deployment server always has the latest offset value stored in the configuration that is pushed out to the clients.  
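
This isn’t the exact script, but a minimal sketch of the idea looks something like this. The credentials, host names, the oneshot/CSV output options, and the “offset” key name are all assumptions; check how the TA actually stores the offset in your own inputs.conf:

#!/bin/bash
# Sketch only: pull the latest Crowdstrike offset out of the _internal logs via the
# Splunk REST API and write it into the deployment-apps copy of the TA.
USER="USERNAME"
PASS="PASSWORD"
SH="https://SPLUNK:8089"
CONF="/opt/splunk/etc/deployment-apps/TA-crowdstrike/local/crowdstrike_falcon_host_inputs.conf"

# A oneshot search returns the results directly instead of a SID (CSV: header row, then the value)
OFFSET=$(curl -sk -u "$USER:$PASS" "$SH/services/search/jobs?output_mode=csv" \
  -d exec_mode=oneshot \
  -d search="search index=_internal sourcetype=ta-crowdstrike_ucc_lib-2 \"start consuming\" host=forwarder_name | stats max(placeholder) as offset" \
  | tail -1)

# Only touch the conf file if we actually got a number back
if [[ "$OFFSET" =~ ^[0-9]+$ ]]; then
    # "offset" here is a placeholder key name; use whatever key the TA writes to inputs.conf
    sed -i "s/^offset = .*/offset = $OFFSET/" "$CONF"
fi

Note that the DS only notices changes to deployment-apps when the deploy-server is reloaded, so the script (or a separate cron job) may also need to run the targeted reload command shown earlier.
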
However, ensure your deployment server has this app set to ENABLE only.  Do not set it to restart splunkd.  Keeping restart enabled could cause a restart of splunk every time the offset value is changed. 

All that’s left is to add the script to the crontab so it’s always updating that value.  
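
For example, a crontab entry running the sketch above every 15 minutes (the schedule and paths are up to you) might look like:

*/15 * * * * /home/splunk/update_cs_offset.sh >> /home/splunk/update_cs_offset.log 2>&1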

Now if I activate the standby puller, it will grab the latest offset value and start from there without creating duplicate data on our indexers.