Creating a virtual Splunk deployment – Part 1 Infrastructure

This guide prepares the training environment needed for most Splunk training. It is not intended for production environments because I am skipping a lot of security hardening, so build in prod at your own risk. I have eliminated some steps through my efforts here, but there is only so much I can do, so you will have to figure some things out from my instructions and what you can find on Google.

The goal is to create a fully functional distributed Splunk environment using VMware and a Splunk DEV license. There are alternatives to everything, so you do not have to use what I suggest if you know a better way.

Please note this is A LOT of work. If you have a faster way to build the hardware components of the environment, you should use that method. This guide will require a decent amount of hard disk space on your host machine. I recommend one Terabyte but you can size the below machines based on your disk limitations. Just note the Indexers need the most space.

This will also require a host machine with enough CPU and memory to run all of these VMs at once. I recommend an 8 or 16 core CPU and 64GB of memory. If you don’t have that, I suggest you look into ESXi or cloud solutions, which are expensive. In the deployment I created, I allocated more than 1TB of disk space, and the virtual disks combined would require at least 10TB if fully allocated. An affordable disk solution would be an external USB 3.0 drive with 3TB of space or more. I have tested this myself and it does work, but it is slow.

Like all guides, this will soon get outdated, so you will have to figure things out based on changes to Splunk and Linux. Fortunately, you have Google to help.

And finally, this is all in Linux because I don’t like Windows for many reasons. You are welcome to use a Windows Server image instead if you have access to one. I am also using the GUI Linux desktop to keep things simple and universal. All the steps below can be done on the command line if you know where to go.

Dev License

Before doing anything, request a new DEV license from Splunk. This will take at least 24 hours so don’t wait. Go to dev.splunk.com and look for the request dev license option.

VirtualBox/VMware

Unfortunately you will need a professional version of VMware or other virtualization software that has all the features of VMware. VirtualBox is close enough as far as I can tell, but if your host machine is Windows it may not be able to handle the number of VMs needed for this project. I use VirtualBox on a Linux host machine, where it tends to perform OK and can also do a headless start to save memory/CPU. But for this demo I am using an incredibly powerful machine with Windows and VMware Workstation.

An ESXi server would be great, or you could look into cloud hosting and find a cheap provider to spin up machines (which would save you a lot of time). You (technically) can use VMware Player (freeware), but it does not have a virtual network editor. If you have access to a university, there may be an educational discount available for the VMware software. If you don’t use multiple interfaces per VM, then you don’t really need a virtual network editor.

Virtual Network

First you need to create your virtual network. You CAN do this with one network adapter, but that has its own set of difficulties. So I am using one internal interface and one external interface. If you are using cloud hosting, you only need one network adapter.

VMWARE:

In VMware, click EDIT > VIRTUAL NETWORK EDITOR
(this may have a different label in newer versions).

Create new networks using vmnet10 and vmnet11.
Assign 192.168.10.x subnet to vmnet10
Assign 192.168.11.x subnet to vmnet11
Change vmnet10 to NAT. This will be the external adapter.
Keep vmnet11 as host-only. This will be the internal adapter.
Set the DHCP range starting at 100.

VIRTUALBOX:

Click FILE > HOST NETWORK MANAGER
Click CREATE and create vboxnet0 and vboxnet1
Give vboxnet0 subnet 192.168.10.0/24
Give vboxnet1 subnet 192.168.11.0/24
Enable the DHCP server on both subnets, starting at .100 for both

APPLY changes and close.

Host Machine

On your host machine, make sure the screen saver and all sleep/hibernate options are completely DISABLED. You do not want the machine to sleep while running your VM clusters. It will likely cause them to stop responding, and they will all need a reboot. Sometimes your host machine will need a reboot too.

Create new VM

In your VM software create a new VM with the following specifications:
CPUs: 2
Memory: 2gb
Disk1: 40gb
Disk2: 100gb
Disk3: 500gb
Disk4: 2000gb
Network Interface1: vmnet10/vboxnet0
Network Interface2: vmnet11/vboxnet1

Keep in mind the disks will not take up all that space unless you choose to allocate the disk space in advance. Leave the default option to NOT allocate the space, so these sizes are just theoretical for now.

CentOS

At the end of this guide is a method to CLONE your VMs. I suggest you go through this guide once and then clone that first machine several times to create your deployment. This will save a lot of time. But if you want a better understanding of your operating systems, it would be better to run through this guide multiple times to familiarize yourself with CentOS.

Download the latest CentOS ISO from centos.org. Get the FULL version just to avoid problems (also referred to as the DVD version). The minimal version will require far less CPU and disk space but will be more difficult to manage for those lacking experience. Regardless of what you use, you need to ensure the client VM is running an OS that has a desktop GUI installed. A minimal version of the OS will NOT come with a GUI.

Once downloaded, use it to install CentOS on all of your Splunk boxes. Load the CentOS ISO into your guest CD-ROM and ensure it’s connected.

Before starting the VM, edit the VM settings, click the ADD button, and add a 2nd hard drive. Make the drive 100gb.

Click OK and start the VM. At boot, choose INSTALL CENTOS
Helpful TIP: If you are using VirtualBox, you can start the machine with the HEADLESS START option. This starts the VM without a screen for that machine and thus uses less of the host’s resources.

Install the OS with the defaults. Click the NETWORK CONNECTIONS icon, enable the network adapters by switching them on, then at the bottom rename the host to the appropriate name based on its function (see the list under Structure). Click DONE to go back.

Click the INSTALLATION DESTINATION icon. Check the box I WOULD LIKE TO MAKE ADDITIONAL SPACE AVAILABLE. Ensure ONLY THE FIRST DISK IS CHECKED as shown below. We will configure the 2nd disk later.

Click FULL DISK AND SUMMARY to verify only one disk will be formatted. Click CLOSE
Click DONE.

In the RECLAIM DISK SPACE window, click RECLAIM SPACE button.

Now click the BEGIN INSTALLATION button

Click the ROOT PASSWORD icon and set the root user password. Click DONE to go back.
Click USER CREATION to create the Splunk user while we are here.

Set the password and write it down. Since this is a non-prod instance, I suggest you make this an easy password as you will need it often. Click the ADVANCED button.

Add these extra groups as shown below. Click SAVE CHANGES

Click DONE to return

The system will install and ask to reboot when complete.

The Splunk Disk

The 2nd disk that was attached to the VM now needs to be used by the OS. This will be the disk that Splunk uses, which ensures we don’t run out of disk space too easily.

To partition the disk for use, open it in parted
# parted /dev/sdb

Now in parted run these commands:
(parted)> mklabel gpt
(parted)> mkpart opt 0% 100%
(parted)> quit

Now format the volume
# mkfs.xfs /dev/sdb1

Use this command to edit the fstab file
# sudo nano /etc/fstab

Add this line and save the file:

/dev/sdb1  /opt  xfs  defaults  0 0

Then reboot.

If you run
# df -h
you will see that /dev/sdb1 is now mounted at /opt.
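
If you would rather script the fstab edit than type it by hand, it can be made repeatable. The following is a minimal sketch, not a tested procedure for your system: the add_fstab_entry helper and the FSTAB variable are hypothetical names, and FSTAB defaults to a scratch file so the example is safe to run as-is. Point it at /etc/fstab (as root) to apply it for real.

```shell
#!/bin/sh
# Sketch: add the /opt mount line to an fstab-style file only when it is missing.
# FSTAB defaults to a scratch file; set FSTAB=/etc/fstab (as root) to apply it.
FSTAB="${FSTAB:-$(mktemp)}"
ENTRY='/dev/sdb1  /opt  xfs  defaults  0 0'

# add_fstab_entry FILE LINE  (hypothetical helper)
add_fstab_entry() {
    # -x matches the whole line, -F treats LINE as a fixed string
    if ! grep -qxF "$2" "$1" 2>/dev/null; then
        printf '%s\n' "$2" >> "$1"
    fi
}

add_fstab_entry "$FSTAB" "$ENTRY"
add_fstab_entry "$FSTAB" "$ENTRY"   # second call changes nothing
cat "$FSTAB"
```

Running sudo mount -a afterwards mounts everything listed in fstab without a reboot, which is a handy way to catch a typo before the next boot.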

Disable ipv6, SElinux, firewall

SELinux is enhanced security for production servers. We are not going to need/want this for DEV.

Run these commands:
# echo 'SELINUX=disabled' > /etc/selinux/config
echo 'net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1' >> /etc/sysctl.conf

iptables -F
iptables-save

systemctl disable firewalld.service

This file may not exist, but if it does, edit it
# nano /etc/sysconfig/iptables-config
Change IPTABLES_SAVE_ON_STOP="no" to "yes"
Save, exit, and reboot.
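
Note that the echo command above replaces the whole SELinux config file. A gentler alternative, sketched here, changes only the SELINUX= line and leaves the other settings alone. CONF is a hypothetical variable that defaults to a scratch file seeded with sample content; set CONF=/etc/selinux/config (as root) to apply it for real.

```shell
#!/bin/sh
# Sketch: switch SELINUX= to disabled in place, keeping the rest of the file.
CONF="${CONF:-$(mktemp)}"

# Seed sample content only when the file is empty (i.e. the scratch file).
if [ ! -s "$CONF" ]; then
    printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$CONF"
fi

# -i.bak edits in place and keeps a backup copy next to the file.
sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$CONF"
cat "$CONF"
```

The pattern is anchored on SELINUX= so the SELINUXTYPE line is not touched.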

Static IP

DHCP is not a good idea with Splunk. You can use DHCP reservations, but that is still not a best practice.

Go to each VM, click the start menu, and look for SETTINGS, then click NETWORK

Click the configure icon next to each adapter and find the internal (.11) adapter.
Click the IPv4 tab and change from DHCP to MANUAL
Now add the IP info into the provided fields

Assign a unique IP to each VM, but be sure to use this subnet. Remember you set the DHCP starting address at .100, so be sure to use IPs that are below that range. I did not do that in this example, so only do what I did if you understand what you are doing.

You won’t need to assign a gateway because the internal network doesn’t need one, but if you want you can use .1.

Click APPLY to save changes.
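
For those who prefer the command line, roughly the same change can be made with nmcli, the NetworkManager command-line tool. This is a sketch under assumptions: the connection name ens34 and the address 192.168.11.14/24 are examples only (list your own connections with nmcli connection show), and the run helper is a hypothetical wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

CON="${CON:-ens34}"                # assumed connection name for the internal adapter
ADDR="${ADDR:-192.168.11.14/24}"   # example static address on the internal subnet

# Switch the connection from DHCP to a manual address, then re-activate it.
run nmcli connection modify "$CON" ipv4.method manual ipv4.addresses "$ADDR"
run nmcli connection up "$CON"
```

To apply it for real, run it with sudo and DRY_RUN=0 once the printed commands look right.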

Update hostname

On each VM, open the terminal and run the command
# sudo nano /etc/hostname

Edit the hostname (if needed) to reflect the role of the machine

Update packages

Run command
# sudo yum -y update

Then reboot.

Verify routes

There should only be one default route using .10
Use command
# route -n

If you see the .11 listed with 0.0.0.0 as the Destination, you need to remove that route

To remove the route, first find the interface name
# ifconfig

In this example, the interface is named ens34.

Go to the network scripts folder
# cd /etc/sysconfig/network-scripts/

Edit the ifcfg file with that name
# sudo nano ifcfg-ens34

Change the DEFROUTE line to DEFROUTE=no

Save and exit, reboot to verify the route is gone.
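
A quick way to see which interfaces currently hold a default route is to filter the output of ip route. The sketch below is illustrative only: list_default_ifaces is a hypothetical helper, and the sample text (including the ens33 name and the addresses) is made up to look like ip route output so the example runs anywhere.

```shell
#!/bin/sh
# list_default_ifaces: read `ip route` style text on stdin and print the
# interface name of every default route, one per line.
list_default_ifaces() {
    awk '$1 == "default" { for (i = 1; i < NF; i++) if ($i == "dev") print $(i + 1) }'
}

# Sample text shaped like `ip route` output (addresses and ens33 are made up).
SAMPLE='default via 192.168.10.2 dev ens33 proto dhcp metric 100
default via 192.168.11.1 dev ens34 proto static metric 101
192.168.10.0/24 dev ens33 proto kernel scope link src 192.168.10.128 metric 100
192.168.11.0/24 dev ens34 proto kernel scope link src 192.168.11.14 metric 101'

printf '%s\n' "$SAMPLE" | list_default_ifaces
# On a real VM: ip route | list_default_ifaces
```

If more than one interface is printed, the internal (.11) one is the route to remove.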

SSH keys

One of your VMs is meant to be the “client” machine. It will act as your base of operations, from which you can manage all the other VMs. In my example I used Ubuntu, but if you only used CentOS that’s also fine. However, the client machine needs to run the full version of CentOS, as it comes with the GUI needed to do this next step.

Open the VM console and open the terminal.
First create the ssh folder
# mkdir ~/.ssh
# chmod 700 ~/.ssh
# cd ~/.ssh

Now use the command:
# ssh-keygen
and use the name splunk to name the key pair.
Just hit ENTER when asked for a passphrase. You don’t want a passphrase.

The keys will be placed in your home folder at /home/splunk/.ssh/

Run these commands:
# cd ~/.ssh
cat splunk.pub > authorized_keys
chmod 640 authorized_keys
chown -R splunk:splunk /home/splunk/

Now set that ssh key as the default key. Edit the ssh file:
# sudo nano /etc/ssh/ssh_config

Under Host * you need to make these changes:
Uncomment StrictHostKeyChecking and ensure it says “no”
Uncomment IdentityFile and change it to ~/.ssh/splunk
Save and exit

Now test by ssh’ing to one of your boxes, or to the box you are using. It should not prompt for a password.
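
An alternative to editing the system-wide /etc/ssh/ssh_config is a per-user ~/.ssh/config file, which accepts the same options and needs no sudo. A minimal sketch, assuming the key pair is named splunk as above: SSH_CONFIG is a hypothetical variable that defaults to a scratch file, so set SSH_CONFIG=~/.ssh/config to apply it for real.

```shell
#!/bin/sh
# Sketch: write the two client options into a per-user ssh config file.
SSH_CONFIG="${SSH_CONFIG:-$(mktemp)}"

# Append the stanza only once, so re-running the script is harmless.
if ! grep -q 'IdentityFile ~/.ssh/splunk' "$SSH_CONFIG" 2>/dev/null; then
    cat >> "$SSH_CONFIG" <<'EOF'
Host *
    StrictHostKeyChecking no
    IdentityFile ~/.ssh/splunk
EOF
fi

# ssh refuses config files that other users can write to.
chmod 600 "$SSH_CONFIG"
cat "$SSH_CONFIG"
```

The key pair itself can also be created without prompts using ssh-keygen -t rsa -N '' -f ~/.ssh/splunk, where -N '' sets an empty passphrase and -f names the key file.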

Python 3

Newer versions of CentOS/RHEL will eventually ship with Python 3 as the default version, and you will need to install an older version for script compatibility. On older CentOS/RHEL, you will have Python 2 and need to install 3.

Go to https://www.python.org/ftp/python/, find the version you want to install, and download the package. We have CentOS 7, so we want to install Python 3.7 since that’s what Splunk requires.

Run these commands:
# sudo -s
yum -y install gcc openssl-devel bzip2-devel libffi-devel
cd /tmp/
wget https://www.python.org/ftp/python/3.7.7/Python-3.7.7.tgz
tar xzf Python-3.7.7.tgz
cd Python-3.7.7
./configure --enable-optimizations
make altinstall

Now test with
# python3.7 -V

If that fails, try adding these two commands
# ln -s /usr/local/bin/python3.7 /usr/bin/python3.7
ln -s /usr/local/bin/python3.7 /usr/bin/python3

Exit root
# exit

Splunk Install

You are now ready to install Splunk. Get the latest version, or the version you want to install, at splunk.com. Since we are using CentOS, we want the Red Hat/CentOS version. Like everything, you will have to log in to Splunk, but it’s just a free account so no need to buy anything.

Use the Client machine and open a Firefox browser. Go to splunk.com and look for the “download splunk enterprise” link.

Here we will download the RPM as it’s much easier to install/uninstall.

When you attempt to download the file, it will take you to another screen that contains the WGET method. This is helpful provided your external network interface is working.

Click the WGET link and copy the text provided. Save it in a text editor window.

Now you can ssh to each Splunk VM and run the commands remotely from the client machine.

Paste in the wget command to download the file:
# wget -O splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm 'https://www.splunk.com/bin/splunk/DownloadActivityServlet?architecture=x86_64&platform=linux&version=8.0.5&product=splunk&filename=splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm&wget=true'

This will download an RPM file to the current folder.

Now install using
# sudo rpm -ivh <splunk file>
so in this case, the file name is splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm
so we convert our command to:
# sudo rpm -ivh splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm

Repeat the install for all splunk VMs.
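
Since the same RPM goes onto every Splunk VM, the copy-and-install step can be looped from the client machine. This is a rough sketch, assuming the RPM has already been downloaded into the current folder on the client, that the SSH key from the earlier section is accepted on every VM, and that the splunk user is allowed to use sudo there. The two addresses in VMS are hypothetical placeholders, and the run helper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"
RPM="splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm"
VMS="${VMS:-192.168.11.11 192.168.11.12}"   # hypothetical internal IPs

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

for vm in $VMS; do
    # Copy the package over, then install it remotely.
    run scp "$RPM" "splunk@$vm:/tmp/"
    # -t gives sudo a terminal in case it asks for a password
    run ssh -t "splunk@$vm" "sudo rpm -ivh /tmp/$RPM"
done
```

Downloading once and copying over the internal network also avoids repeating the wget on every VM.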

Auto-start at boot

First use this command to START splunk AND set the admin password you will need at first login.
# /opt/splunk/bin/splunk start --answer-yes --accept-license --no-prompt --seed-passwd "splunkpass"
You can change "splunkpass" to your own password if you prefer. Just write it down for later.

I strongly advise you use the same password for every VM until you better understand what you are doing. But since this is a non-prod instance, it’s your call.

Now set splunk to start at boot.
# sudo /opt/splunk/bin/splunk enable boot-start -user splunk --answer-yes --accept-license

First login

Use your client machine browser to visit http://<IPADDRESS>:8000 where IPADDRESS is the internal (.11) address of the VM.
Use the admin password you set above. Username is admin

If you cannot get it to load from the client machine, either splunk is not running or something is blocking the connection like a firewall or bad IP address.

Go to SETTINGS > SERVER SETTINGS > GENERAL SETTINGS


Change the server name to reflect its purpose, but leave the management port as it is.

Enable HTTPS and change the port to 8443

The default hostname should be the true hostname of the server (FQDN), but for DEV needs you can use the same name as you used above.

Click SAVE

Click the MESSAGES menu and click the restart link to jump to the restart area. Restart the Splunk service.

It will take about 30 seconds, then you can try logging in again, but you need to change the URL:
use HTTPS instead of HTTP
use port 8443
which becomes
https://<IPADDRESS>:8443
However, if you click the CLICK HERE TO CONTINUE link, it should update it for you.

Most browsers should warn you it’s not secure now, but this is normal. Add the exception to the browser to proceed.

While in Firefox, create a bookmark for each server and name them all for their role.

If you are able to log in using HTTPS, you are done installing Splunk to its base configuration.

Hot & Cold

To allow us to use the indexes.conf file later on, we need to create the mount points that the file will reference. Most Splunk servers will never actually use these mount points.

Run these commands:
# sudo -s
mkdir -p /cold
mkdir -p /hot
chown -R splunk /cold
chown -R splunk /hot

VMware Cloning

Now that we have at least one working splunk instance, we can either start at the top and repeat this entire process manually, or we can use VMware’s CLONE option to make a copy of this VM in its current state. Cloning will create new unique MAC addresses for each network adapter but it will not assign new IP addresses, so some additional configuration after cloning will be required.

You may choose how to proceed.

VMWARE:

To perform a clone, power off the VM. In VMware, locate the machine in the left side menu. Right-click the machine name and select MANAGE > CLONE…

Click NEXT
Select THE CURRENT STATE IN THE VIRTUAL MACHINE
Select CREATE A FULL CLONE
Name the VM for the role it’s going to play (e.g. Heavy Forwarder)
Ensure the LOCATION path is correct
Click FINISH
You can clone all the machines you need now, so repeat as many times as needed.
I recommend making an extra clone to be used as a base in the event you want to add more splunk servers.

VIRTUALBOX:

Ensure the VM is stopped. Select the VM in the list of machines.
Click MACHINE > CLONE
Name the new machine based on its role
Provide the correct path as needed
Change MAC policy to GENERATE
Check the box to keep disk names
Click NEXT
Select FULL CLONE
Click CLONE

Update Cloned Interfaces

Now that the machines are cloned, we need to ensure all devices have a unique IP address on both interfaces. Since the .10 subnet is DHCP, we can ignore that. But since .11 needs to be static, we need to go in and assign an IP to each device.

One at a time, start up a VM.
When the machine console boots up, log in and go to SETTINGS > NETWORK
Modify the INTERNAL network adapter and change the last octet of the IP address to a unique value for the internal network.
The EXTERNAL adapter should still be using DHCP, so there is no need to change anything.

Ensure the CONNECT AUTOMATICALLY option is checked

Also ensure the adapter is enabled.

Now open a terminal window and run this command:
# sudo nano /etc/hostname

Update the hostname to reflect the role of the server. This is helpful for when you SSH to this server.
Save and exit.

Also be sure to update the Splunk instance names for the new clone.

Delete GUID

Each Splunk instance has a unique GUID. Since we cloned the machine, we have replicated the GUID, and thus all GUIDs are identical. So we need to delete the GUID and restart Splunk, which generates a new one. Run this command on each clone:
# rm -f /opt/splunk/etc/instance.cfg && /opt/splunk/bin/splunk restart

Collect Inventory

Go to each of your VMs and collect the ip addresses for all internal interfaces.
Use command
# ifconfig

Locate the .11 address and write it down on a notepad at the client machine. You will need this a lot.

Once you have your list of IPs and hosts, you can now add host aliases to make your life easier.
For example, the search heads below use alias SH1, SH2, SH3.
So if I want to ssh to search head 3, I only need to use
# ssh sh3
instead of
# ssh 192.168.11.14

Now use this command on the client machine
# sudo nano /etc/hosts

Now paste all that host info into this file (paste by right-clicking on the terminal). Save and exit.
This will tell your client OS the IPs of all your VMs.
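
The entries themselves are one IP and one alias per line. As an illustration, the sketch below appends a few aliases and skips any alias that is already present. Only the sh3 address comes from the ssh example above; the sh1 and sh2 addresses are hypothetical placeholders, add_host is a made-up helper, and HOSTS_FILE defaults to a scratch file (set HOSTS_FILE=/etc/hosts and run as root to apply it for real).

```shell
#!/bin/sh
# Sketch: append "IP<TAB>alias" lines to a hosts-style file, skipping duplicates.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# add_host IP ALIAS  (hypothetical helper)
add_host() {
    # Skip the alias if a line already ends with it.
    if ! grep -qE "[[:space:]]$2\$" "$HOSTS_FILE"; then
        printf '%s\t%s\n' "$1" "$2" >> "$HOSTS_FILE"
    fi
}

add_host 192.168.11.12 sh1   # hypothetical address
add_host 192.168.11.13 sh2   # hypothetical address
add_host 192.168.11.14 sh3   # address from the ssh example above
add_host 192.168.11.14 sh3   # repeated alias is ignored
cat "$HOSTS_FILE"
```

Once the real file is updated, ssh sh3 and ping sh3 should both resolve to the internal address.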

Structure

The Distributed Environment will be based on what Splunk recommends in their training courses. The list of devices is as follows:

  • Client Machine / Syslog Source Forwarder
  • Search Head 1
  • Search Head 2
  • Search Head 3
  • Indexer 1
  • Indexer 2
  • Indexer 3
  • Search Head Deployer & License Manager
  • Monitoring Console
  • Master Node (Index Cluster Mgr)
  • Deployment Server
  • Heavy Forwarder

This tutorial is quite large and WordPress is hurting, so I will continue this in Part 2.
