This guide prepares the training environment needed for most Splunk training. It is not intended for production environments because I am skipping a lot of security hardening, so build in prod at your own risk. I have eliminated some steps where I could, but there is only so much I can do, so you will have to figure some things out based on my instructions and what you can find on Google.
The goal is to create a fully functional distributed Splunk environment using VMware and a Splunk DEV license. There are alternatives to everything, so you do not have to use what I suggest if you know a better way.
Please note this is A LOT of work. If you have a faster way to build the hardware components of the environment, you should use that method. This guide will require a decent amount of hard disk space on your host machine. I recommend one terabyte, but you can size the machines below based on your disk limitations. Just note the indexers need the most space.
This will also require a host machine with enough CPU and memory to run all of these VMs at once. I recommend an 8 or 16 core CPU and 64GB of memory. If you don’t have that, I suggest you look into ESXi or cloud solutions, which are expensive. In the deployment I created, I allocated more than 1TB of disk space, and the virtual disks combined would require at least 10TB if fully allocated. An affordable disk solution for this would be an external USB 3.0 drive with 3TB of space or more. I have tested this myself and it does work, but it is slow.
Like all guides, this one will soon be outdated, so you will have to figure things out based on changes to Splunk and Linux. Fortunately, you have Google to help.
And finally, this is all in Linux because I don’t like Windows for many reasons. You are welcome to use a Windows Server image instead if you have access to one. I am also using the GUI Linux desktop to keep things simple and universal. All the steps below can be done on the command line if you know where to go.
Dev License
Before doing anything, request a new DEV license from Splunk. This will take at least 24 hours so don’t wait. Go to dev.splunk.com and look for the request dev license option.
VirtualBox/VMware
Unfortunately you will need a professional version of VMware or other virtualization software that has all the features of VMware. VirtualBox is close enough as far as I can tell, but if your host machine is Windows it may not be able to handle the number of VMs needed for this project. I use VirtualBox on a Linux host machine, where it tends to perform OK and also supports headless start to save memory/CPU. But for this demo I am using an incredibly powerful machine with Windows and VMware Workstation.
An ESXi server would be great, or you could look into cloud hosting and find a cheap provider to spin up machines (which would save you a lot of time). You (technically) can use VMware Player (freeware), but it does not have a virtual network editor. If you have access to a university, there may be an educational discount available for the VMware software. If you don’t use multiple interfaces per VM then you don’t really need a virtual network editor.
Virtual Network
First you need to create your virtual network. You CAN do this with one network adapter, but that has its own set of difficulties. So I am using one internal interface and one external interface. If you are using cloud hosting, you only need one network adapter.
VMWARE:
In VMware, click EDIT > VIRTUAL NETWORK EDITOR (which in newer versions may have a different label)
Create new networks using vmnet10 and vmnet11.
Assign 192.168.10.x subnet to vmnet10
Assign 192.168.11.x subnet to vmnet11
Change vmnet10 to NAT. This will be the external adapter.
Keep vmnet11 as host-only. This will be the internal adapter.
Set the DHCP range starting at 100.
VIRTUALBOX:
Click FILE > HOST NETWORK MANAGER
Click CREATE and create vboxnet0 and vboxnet1
Give vboxnet0 subnet 192.168.10.0/24
Give vboxnet1 subnet 192.168.11.0/24
Enable the DHCP server on both subnets, starting at .100 for both
APPLY changes and close.
Host Machine
On your host machine, make sure the screen saver and all sleep/hibernate options are completely DISABLED. You do not want the machine to sleep while running your VM clusters. Sleep will likely cause the VMs to stop responding until they are all rebooted, and sometimes the host machine will need a reboot as well.
Create new VM
In your VM software create a new VM with the following specifications:
CPUs: 2
Memory: 2GB
Disk1: 40GB
Disk2: 100GB
Disk3: 500GB
Disk4: 2000GB
Network Interface1: vmnet10/vboxnet0
Network Interface2: vmnet11/vboxnet1
Keep in mind the disks will not actually consume all that space unless you choose to allocate the disk space in advance. Leave the default option (do NOT pre-allocate) so these sizes are just theoretical for now.
CentOS
At the end of this guide is a method to CLONE your VMs. I suggest you go through this guide once and then clone that first machine several times to create your deployment. This will save a lot of time. But if you want a better understanding of your operating system, run through this guide multiple times to familiarize yourself with CentOS.
Download the latest CentOS ISO from centos.org. Get the FULL version (also referred to as the DVD version) just to avoid problems. The minimal version requires far less CPU and disk space but is more difficult to manage for those lacking experience. Regardless of what you use, ensure the client VM runs an OS that has a desktop GUI installed. A minimal version of the OS will NOT come with a GUI.
Once downloaded, use it to install CentOS on all of your Splunk boxes. Load the CentOS ISO into your guest CD-ROM and ensure it’s connected.
Before starting the VM, edit the VM settings, click the ADD button, and add a 2nd hard drive. Make the drive 100GB.
Click OK and start the VM. At boot, choose INSTALL CENTOS.
Helpful TIP: If you are using VirtualBox, you can start the machine with the HEADLESS START option. This starts the VM without a screen for that machine and thus uses less of the host’s resources.
Install the OS with the defaults. Click the NETWORK CONNECTIONS icon, enable the network adapters by switching them on, then at the bottom rename the host to the appropriate name based on its function (see the list under Structure). Click DONE to go back.
Click the INSTALLATION DESTINATION icon. Check the box I WOULD LIKE TO MAKE ADDITIONAL SPACE AVAILABLE. Ensure ONLY THE FIRST DISK IS CHECKED as shown below. We will configure the 2nd disk later.
Click FULL DISK AND SUMMARY to verify only one disk will be formatted. Click CLOSE.
Click DONE.
In the RECLAIM DISK SPACE window, click the RECLAIM SPACE button.
Now click the BEGIN INSTALLATION button.
Click the ROOT PASSWORD icon and set the root user password. Click DONE to go back.
Click USER CREATION to create the Splunk user while we are here.
Set the password and write it down. Since this is a non-prod instance, I suggest you make this an easy password as you will need it often. Click the ADVANCED button.
Add these extra groups as shown below. Click SAVE CHANGES.
Click DONE to return.
The system will install and ask to reboot when complete.
The Splunk Disk
The 2nd disk that was attached to the VM now needs to be made available to the OS. This will be the disk that Splunk uses, and it ensures we don’t run out of disk space too easily.
To partition the disk, run parted on it:
# parted /dev/sdb
Now in parted run these commands:
(parted)> mklabel gpt
(parted)> mkpart opt 0% 100%
(parted)> quit
Now format the volume
# mkfs.xfs /dev/sdb1
Use this command to edit the fstab file
# sudo nano /etc/fstab
Add this line and save the file:
/dev/sdb1 /opt xfs defaults 0 0
Reboot.
After the reboot, run
# df -h
and you will see /dev/sdb1 mounted on /opt.
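If you repeat this step on several machines, it is easy to paste the fstab line twice. Here is a minimal sketch of a guard against that. The function name add_opt_mount is my own (hypothetical), and it assumes the same /dev/sdb1 and /opt names used above.

```shell
# Append the /opt mount entry to an fstab file only if it is not already there.
# Usage: add_opt_mount [fstab_path]   (defaults to /etc/fstab; run as root)
add_opt_mount() {
    local fstab="${1:-/etc/fstab}"
    local entry="/dev/sdb1 /opt xfs defaults 0 0"
    # -F: fixed string, -x: match the whole line, -q: no output
    if grep -Fxq "$entry" "$fstab"; then
        echo "entry already present in $fstab"
    else
        echo "$entry" >> "$fstab"
        echo "entry added to $fstab"
    fi
}
```

After calling add_opt_mount as root, running mount -a should mount everything listed in fstab without waiting for the reboot, and df -h /opt shows the result.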
Disable IPv6, SELinux, firewall
SELinux is enhanced security for production servers. We are not going to need/want this for DEV.
Run these commands:
# echo 'SELINUX=disabled' > /etc/selinux/config
echo 'net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1' >> /etc/sysctl.conf
iptables -F
iptables-save
systemctl disable firewalld.service
This file may not exist but if it does, edit it
# nano /etc/sysconfig/iptables-config
Change IPTABLES_SAVE_ON_STOP="no" to "yes"
Save, exit, and reboot.
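The echo command above replaces the whole /etc/selinux/config file. If you would rather keep the other settings in that file (such as SELINUXTYPE), an in-place edit with sed is one alternative. This is a swapped-in technique, not the method used above, and the function name disable_selinux is hypothetical.

```shell
# Set SELINUX=disabled in place, leaving every other line of the file untouched.
# Usage: disable_selinux [config_path]   (defaults to /etc/selinux/config; run as root)
disable_selinux() {
    local cfg="${1:-/etc/selinux/config}"
    # The "=" right after SELINUX keeps this from matching SELINUXTYPE=...
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

Either way, after the reboot getenforce should print Disabled and sysctl net.ipv6.conf.all.disable_ipv6 should report a value of 1.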
Static IP
DHCP is not a good idea with Splunk. You can use DHCP reservations, but that is still not a best practice.
Go to each VM, click the start menu, look for SETTINGS, then click NETWORK.
Click the configure icon next to each adapter and find the internal (.11) adapter.
Click the IPv4 tab and change from DHCP to MANUAL.
Now add the IP info into the provided fields.
Assign a unique IP to each VM, but be sure to use the internal (192.168.11.x) subnet. Remember you set the DHCP starting address at .100, so use IPs below that range. I did not do that in this example, so only do what I did if you understand what you are doing.
You won’t need to assign a gateway because the internal network doesn’t need one, but if you want you can use .1.
Click APPLY to save changes.
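Before typing an address into each VM, it can help to sanity check it against the rules above. This is a minimal sketch with a hypothetical function name (is_valid_static_ip). Treating .0 and .1 as off limits is my own assumption, since .1 is the optional gateway address.

```shell
# Succeed only for 192.168.11.2 through 192.168.11.99: on the internal subnet,
# below the DHCP pool that starts at .100, and not .0 or .1 (assumed reserved).
is_valid_static_ip() {
    local ip="$1" last
    case "$ip" in
        192.168.11.*) last="${ip#192.168.11.}" ;;
        *) return 1 ;;
    esac
    # What is left after the prefix must be a plain number (no dots, no letters).
    case "$last" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$last" -ge 2 ] && [ "$last" -le 99 ]
}
```

For example, is_valid_static_ip 192.168.11.14 && echo ok prints ok, while 192.168.11.100 is rejected. If you prefer the command line to the GUI, something like nmcli con mod ens34 ipv4.method manual ipv4.addresses 192.168.11.14/24 should apply the address, assuming your connection is named after the interface.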
Update hostname
On each VM, open the terminal and run:
# sudo nano /etc/hostname
Edit the hostname (if needed) to reflect the role of the machine
Update packages
Run command
# sudo yum -y update
then reboot
Verify routes
There should be only one default route, and it should use the external (.10) network.
Use this command to check:
# route -n
If you see a .11 gateway on a line with 0.0.0.0 as the Destination, you need to remove that route.
To remove the route, first find the interface name:
# ifconfig
In this example the interface is named ens34.
Go to the network scripts folder:
# cd /etc/sysconfig/network-scripts/
Edit the ifcfg file with that name:
# sudo nano ifcfg-ens34
Change DEFROUTE to no.
Save and exit, then reboot to verify the route is gone.
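The same check and fix can be scripted. This is a minimal sketch, and both function names are hypothetical. The first one reads route -n output and lists the interfaces that carry a default route. The second forces DEFROUTE=no in whichever ifcfg file you pass it.

```shell
# Print the interface of every default route found in `route -n` output (stdin).
# In that output, column 1 is Destination and column 8 is Iface.
default_route_ifaces() {
    awk '$1 == "0.0.0.0" { print $8 }'
}

# Force DEFROUTE=no in an ifcfg file: rewrite the line if present, append it if not.
set_defroute_no() {
    local ifcfg="$1"
    if grep -q '^DEFROUTE=' "$ifcfg"; then
        sed -i 's/^DEFROUTE=.*/DEFROUTE=no/' "$ifcfg"
    else
        echo 'DEFROUTE=no' >> "$ifcfg"
    fi
}
```

Usage would be route -n | default_route_ifaces to see the offenders, then (as root) set_defroute_no /etc/sysconfig/network-scripts/ifcfg-ens34 for the internal adapter.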
SSH keys
One of your VMs is meant to be the “client” machine. It acts as your base of operations, from which you manage all the other VMs. In my example I used Ubuntu, but if you only used CentOS that’s also fine. However, a CentOS client machine needs to run the full version of CentOS, as it comes with the GUI needed for this next step.
Open the VM console and open the terminal.
First create the ssh folder
# mkdir ~/.ssh
# chmod 700 ~/.ssh
# cd ~/.ssh
Now use command:
# ssh-keygen
When asked for a file name, enter splunk to name the key pair.
Just hit ENTER when asked for a passphrase. You don’t want a passphrase.
The keys will be placed in your home folder at /home/splunk/.ssh/
Run these commands:
# cd ~/.ssh
cat splunk.pub > authorized_keys
chmod 640 authorized_keys
chown -R splunk:splunk /home/splunk/
Now set that SSH key as the default key. Edit the SSH client config file:
# sudo nano /etc/ssh/ssh_config
Under Host * you need to make these changes:
Uncomment StrictHostKeyChecking and ensure it says “no”
Uncomment IdentityFile and change it to ~/.ssh/splunk (the key created above)
Save and exit.
Now test by SSHing to one of your boxes, or to the box you are using. It should not prompt for a password.
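If you would rather not touch the system-wide file, the same two settings can live in a per-user config file instead. This is a sketch of that alternative, not the method above. The function name write_client_ssh_config is hypothetical, and it assumes the key file is ~/.ssh/splunk.

```shell
# Write a per-user SSH client config into the given directory (default: ~/.ssh).
# Note: this overwrites any existing file named "config" in that directory.
write_client_ssh_config() {
    local dir="${1:-$HOME/.ssh}"
    mkdir -p "$dir" && chmod 700 "$dir"
    cat > "$dir/config" <<'EOF'
Host *
    StrictHostKeyChecking no
    IdentityFile ~/.ssh/splunk
EOF
    # ssh refuses config files that other users can write to
    chmod 600 "$dir/config"
}
```

Run write_client_ssh_config with no arguments as the splunk user on the client machine. Turning off host key checking is only reasonable here because this is an isolated lab network.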
Python 3
Newer versions of CentOS/RHEL will eventually ship with Python 3 as the default version, and you will then need to install an older version for script compatibility. On older CentOS/RHEL, you will have Python 2 and need to install Python 3.
Go to https://www.python.org/ftp/python/, find the version you want to install, and download the package. We have CentOS 7, so we want to install Python 3.7 since that’s what Splunk requires.
Run these commands:
# sudo -s
yum -y install gcc openssl-devel bzip2-devel libffi-devel
cd /tmp/
wget https://www.python.org/ftp/python/3.7.7/Python-3.7.7.tgz
tar xzf Python-3.7.7.tgz
cd Python-3.7.7
./configure --enable-optimizations
make altinstall
Now test with
# python3.7 -V
If that fails, try adding these two commands
# ln -s /usr/local/bin/python3.7 /usr/bin/python3.7
ln -s /usr/local/bin/python3.7 /usr/bin/python3
Exit root
# exit
Splunk Install
You are now ready to install Splunk. Get the latest version, or the version you want to install, at splunk.com. Since we are using CentOS we want the Red Hat/CentOS version. Like everything, you will have to log in to Splunk, but it’s just a free account so no need to buy anything.
Use the client machine and open a Firefox browser. Go to splunk.com and look for the “download splunk enterprise” link.
Here we will download the RPM as it’s much easier to install/uninstall.
When you attempt to download the file, it will take you to another screen that contains the WGET method. This is helpful provided your external network interface is working.
Click the WGET link and copy the text provided. Save it in a text editor window.
Now you can SSH to each Splunk VM and run the commands remotely from the client machine.
Paste in the wget command to download the file:
# wget -O splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm 'https://www.splunk.com/bin/splunk/DownloadActivityServlet?architecture=x86_64&platform=linux&version=8.0.5&product=splunk&filename=splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm&wget=true'
This will download an RPM file to the current folder.
Now install using
# sudo rpm -ivh <splunk file>
In this case, the file name is splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm,
so the command becomes:
# sudo rpm -ivh splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm
Repeat the install for all splunk VMs.
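Repeating the install by hand gets tedious, so here is a small sketch that prints the install command for each VM. Nothing is executed by it. The function name print_install_cmds and the host aliases in the usage example are hypothetical, and it assumes the RPM has already been downloaded on each VM.

```shell
# Print one ssh command per host that installs the given RPM on that host.
# Nothing is run here; copy the lines and run them one at a time.
print_install_cmds() {
    local rpm="$1"; shift
    local host
    for host in "$@"; do
        # -t gives sudo a terminal so it can ask for the password
        echo "ssh -t $host 'sudo rpm -ivh $rpm'"
    done
}
```

For example, print_install_cmds splunk-8.0.5-a1a6394cc5ae-linux-2.6-x86_64.rpm idx1 idx2 idx3 prints three lines, one per indexer.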
Auto-start at boot
First use this command to START splunk AND set the admin password you will need at first login.
# /opt/splunk/bin/splunk start --answer-yes --accept-license --no-prompt --seed-passwd "splunkpass"
You can change “splunkpass” to your own password if you prefer. Just write it down for later.
I strongly advise you use the same password for every VM until you better understand what you are doing. But since this is a non-prod instance, it’s your call.
Now set splunk to start at boot.
# sudo /opt/splunk/bin/splunk enable boot-start -user splunk --answer-yes --accept-license
First login
Use your client machine browser to visit http://<IPADDRESS>:8000 where IPADDRESS is the internal (.11) address of the VM.
Use the admin password you set above. Username is admin
If you cannot get it to load from the client machine, either splunk is not running or something is blocking the connection like a firewall or bad IP address.
Go to SETTINGS > SERVER SETTINGS > GENERAL SETTINGS
Change the server name to reflect its purpose, but leave the management port as it is.
Enable HTTPS and change the port to 8443
The default hostname should be the true hostname of the server (FQDN), but for DEV needs you can use the same name as you used above.
Click SAVE
Click the MESSAGES menu and click the restart link to jump to the restart area. Restart the Splunk service.
It will take about 30 seconds, then you can try logging in again, but you need to change the URL:
use HTTPS instead of HTTP
use port 8443
which becomes
https://<IPADDRESS>:8443
However, if you click the CLICK HERE TO CONTINUE link, it should update it for you
Most browsers should warn you it’s not secure now, but this is normal. Add the exception to the browser to proceed.
While in Firefox, create a bookmark for each server and name them all for their role
If you are able to log in using HTTPS, you are done installing Splunk to its base configuration.
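For reference, the HTTPS and port changes made in the GUI correspond to two settings in Splunk's web.conf. This is a sketch that only prints the stanza so you can compare it with your own file. The function name splunk_web_ssl_stanza is hypothetical, and I am assuming the file lives at /opt/splunk/etc/system/local/web.conf on a default install.

```shell
# Print the web.conf stanza that matches the GUI changes: HTTPS on, port 8443.
splunk_web_ssl_stanza() {
    cat <<'EOF'
[settings]
enableSplunkWebSSL = true
httpport = 8443
EOF
}
```

If you ever script this instead of clicking through the GUI, check first whether the file already has a [settings] stanza so you do not end up with two, and restart Splunk afterwards.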
Hot & Cold
To allow us to use the indexes.conf file later on, we need to create the mount points that the file will reference. Most Splunk servers will never actually use these mount points.
Run these commands:
# sudo -s
mkdir -p /cold
mkdir -p /hot
chown -R splunk /cold
chown -R splunk /hot
VMware Cloning
Now that we have at least one working splunk instance, we can either start at the top and repeat this entire process manually, or we can use VMware’s CLONE option to make a copy of this VM in its current state. Cloning will create new unique MAC addresses for each network adapter but it will not assign new IP addresses, so some additional configuration after cloning will be required.
You may choose how to proceed.
VMWARE:
To perform a clone, power off the VM. In VMware, locate the machine on the left side menu. Right-click the machine name and select MANAGE > CLONE.
Click NEXT
Select THE CURRENT STATE IN THE VIRTUAL MACHINE
Select CREATE A FULL CLONE
Name the VM for the role it’s going to play (e.g. Heavy Forwarder)
Ensure the LOCATION path is correct
Click FINISH
You can clone all the machines you need now, so repeat as many times as needed.
I recommend making an extra clone to be used as a base in the event you want to add more splunk servers.
VIRTUALBOX:
Ensure the VM is stopped. Select the VM in the list of machines.
Click MACHINE > CLONE
Name the new machine based on its role
Provide the correct path as needed
Change MAC policy to GENERATE
Check the box to keep disk names
Click NEXT
Select FULL CLONE
Click CLONE
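VirtualBox also has a command line tool, VBoxManage, that can do the cloning. The sketch below only prints the commands so you can review them first. The function name print_clone_cmds and the VM names in the example are hypothetical. As far as I know, clonevm makes a full clone and generates new MAC addresses by default, but verify that against your VirtualBox version.

```shell
# Print a VBoxManage command that makes a registered clone of the base VM
# for each role name given. Nothing is executed; review the output first.
print_clone_cmds() {
    local base="$1"; shift
    local role
    for role in "$@"; do
        echo "VBoxManage clonevm \"$base\" --name \"$role\" --register"
    done
}
```

For example, print_clone_cmds splunk-base sh1 sh2 sh3 prints three clonevm commands. The base VM must be powered off before you run them.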
Update Cloned Interfaces
Now that the machines are cloned, we need to ensure all devices have a unique IP address on both interfaces. Since the .10 subnet uses DHCP, we can ignore it. But since .11 needs to be static, we need to go in and assign an IP to each device.
One at a time, start up a VM.
When the machine console boots up, log in and go to SETTINGS > NETWORK.
Modify the INTERNAL network adapter and change the last octet of the IP address to a unique value for the internal network.
The EXTERNAL adapter should still be using DHCP so there is no need to change anything.
Ensure the CONNECT AUTOMATICALLY option is checked
Also ensure the adapter is enabled.
Now open a terminal window and run this command:
# sudo nano /etc/hostname
Update the hostname to reflect the role of the server. This is helpful for when you SSH to this server.
Save and exit.
Also be sure to update the Splunk instance names for the new clone.
Delete GUID
Each Splunk instance has a unique GUID. Since we cloned the machine, we have replicated the GUID and thus all GUIDs are identical. So we need to delete the GUID and restart Splunk, which generates a new one. Run this command on each clone:
# rm -f /opt/splunk/etc/instance.cfg && /opt/splunk/bin/splunk restart
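To confirm the clones really ended up with different GUIDs, you can read the value back from each instance.cfg and look for repeats. This is a minimal sketch with hypothetical function names. It assumes the file contains a line of the form guid = VALUE.

```shell
# Extract the guid value from an instance.cfg file.
read_guid() {
    awk -F' *= *' '$1 == "guid" { print $2 }' "$1"
}

# Read "host guid" pairs on stdin and print any guid that appears more than once.
duplicate_guids() {
    awk '{ print $2 }' | sort | uniq -d
}
```

Run read_guid /opt/splunk/etc/instance.cfg on each VM, collect the results as lines like sh1 GUID in a text file, and pipe that file into duplicate_guids. No output means every GUID is unique.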
Collect Inventory
Go to each of your VMs and collect the ip addresses for all internal interfaces.
Use command
# ifconfig
Locate the .11 address and write it down on a notepad at the client machine. You will need this a lot.
Once you have your list of IPs and hosts, you can now add host aliases to make your life easier.
For example, the search heads below use alias SH1, SH2, SH3.
So if I want to ssh to search head 3, I only need to use
# ssh sh3
instead of
# ssh 192.168.11.14
Now use this command on the client machine:
# sudo nano /etc/hosts
Now paste all that host info into this file (paste by right-clicking on the terminal). Save and exit.
This will tell your client OS the IPs of all your VMs.
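If you would rather add the aliases one at a time from the terminal, here is a minimal sketch. The function name add_host_alias is hypothetical. It appends an entry only when the alias is not already in the file, so running it twice does no harm.

```shell
# Append "ip<TAB>name" to a hosts file unless the name is already listed.
# Usage: add_host_alias <ip> <name> [hosts_path]   (defaults to /etc/hosts; run as root)
add_host_alias() {
    local ip="$1" name="$2" hosts="${3:-/etc/hosts}"
    # Match the name as a whole word: whitespace before it, whitespace or end of line after
    if grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$hosts"; then
        echo "$name already present in $hosts"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts"
    fi
}
```

For example, add_host_alias 192.168.11.14 sh3 adds the search head 3 entry used in the ssh example above.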
Structure
The Distributed Environment will be based on what Splunk recommends in their training courses. The list of devices is as follows:
- Client Machine / Syslog Source Forwarder
- Search Head 1
- Search Head 2
- Search Head 3
- Indexer 1
- Indexer 2
- Indexer 3
- Search Head Deployer & License Manager
- Monitoring Console
- Master Node (Index Cluster Mgr)
- Deployment Server
- Heavy Forwarder
This tutorial is quite large and WordPress is hurting, so I will continue this in part 2.