How to: Introduction to Splunk Indexing
Introduction
Splunk is IT search [1]: it lets you search and navigate information from network devices, including logs, configurations, messages, traps and alerts, scripts, code, and metrics. [2] We use it at my place of employment for our system logs, among other things. All servers, some devices, and now some applications send their logs to Splunk. This gives us a unified search across our network, which is useful when diagnosing issues. Another benefit of Splunk is fine-grained control over who can view which logs. This is handy when you want specific users or groups to view some logs without having direct access to the host machines. For example, a help desk might benefit from seeing a log, but you wouldn’t want its staff logging in to the machines themselves.
Recently, I was tasked with sending a new log to Splunk—the log from our custom web application that issues null routes. The idea was that this would allow an extra layer of accountability. I had a hard time finding easy, explicit instructions, so I thought the community would benefit from such documentation. So here it is: Logan Leger’s quick and easy five-step introduction to Splunk indexing.
Step 1: Identify Log
The first step to indexing in Splunk is to identify the log that you want indexed. This might be a system log or, in my case, an application log. On Unix, system logs are usually in /var/log; on some versions of Windows, they are in %SystemRoot%\system32\config. [3] For sending Windows Event Logs to a syslog daemon, Ross Brown recommended evt2sys on Twitter. For a custom application, the log will be wherever the programmer specified; review the code to find it. On Unix, C programs usually send syslog data with the syslog(3) family of functions [4], which are included in libc. In Perl, you can use the Sys::Syslog module. [5]
Step 2: Send Log to Syslog
Once you have identified the log you want to send to Splunk, have the application send it to the syslog daemon. In Perl, you can use the Sys::Syslog module. Once the application is sending the log to the daemon, open up /etc/syslog.conf and add the following:
local0.* /var/log/log.log
Change local0.* to whichever facility is available to you; just make sure it isn’t already in use. Also, change log.log to the actual name of your log.
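If your application isn’t talking to syslog yet, the sending side could look roughly like the sketch below. My application was Perl, so this is only a Python analogue using the standard library’s syslog module, not the code we actually ran. The identifier "nullroute", the message layout, and the example user and prefix are all made up for illustration; the one thing that matters is that the facility (LOG_LOCAL0 here) matches the one you put in syslog.conf.

```python
import syslog

# Hypothetical identifier; use your own application's name.
APP_IDENT = "nullroute"


def format_event(user: str, prefix: str) -> str:
    """Build the one-line message that will end up in the log file."""
    return f"user={user} action=null-route prefix={prefix}"


def log_event(user: str, prefix: str) -> str:
    """Send one event to the local syslog daemon on facility local0."""
    message = format_event(user, prefix)
    # LOG_PID adds the process ID to each line; LOG_LOCAL0 must match
    # the facility named in syslog.conf.
    syslog.openlog(APP_IDENT, syslog.LOG_PID, syslog.LOG_LOCAL0)
    syslog.syslog(syslog.LOG_INFO, message)
    syslog.closelog()
    return message


if __name__ == "__main__":
    # Example values only (192.0.2.0/24 is a documentation prefix).
    log_event("alice", "192.0.2.0/24")
```

After running it, the line should appear in whatever file you mapped the facility to.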
Step 3: Send Log to Splunk Daemon
The next step is to send the log to the Splunk daemon. To do this, append the following to /etc/syslog.conf:
local0.* @splunk.example.org
Again, make sure to change local0.* to whichever facility you chose; it should be the same as above. Change splunk.example.org to the address of your Splunk installation, keeping the leading @, which tells syslogd to forward to a remote host. Now, restart the syslog daemon (/etc/rc.d/syslogd restart).
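It may help to see why the same facility has to appear on every line. Each syslog message carries a priority number that combines its facility and severity (facility times 8 plus severity, per the BSD syslog convention in RFC 3164), and the selector on the left of a syslog.conf line is matched against it. The sketch below is a deliberately simplified model of that matching, not syslogd’s actual code: it handles a single "facility.level" selector only, and the function names are my own.

```python
# Facility and severity numbers from the BSD syslog convention (RFC 3164).
FACILITIES = {
    "kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4, "syslog": 5,
    "lpr": 6, "news": 7, "uucp": 8, "cron": 9,
    **{f"local{n}": 16 + n for n in range(8)},
}
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def encode_pri(facility: str, severity: str) -> int:
    """Combine a facility and severity name into the numeric priority."""
    return FACILITIES[facility] * 8 + SEVERITIES.index(severity)


def selector_matches(selector: str, pri: int) -> bool:
    """Simplified check of one 'facility.level' selector against a priority."""
    fac_name, _, level = selector.partition(".")
    facility, severity = divmod(pri, 8)
    if fac_name != "*" and FACILITIES[fac_name] != facility:
        return False
    # '*' matches every level; a named level matches that level and
    # anything more severe (a lower number).
    return level == "*" or severity <= SEVERITIES.index(level)


if __name__ == "__main__":
    pri = encode_pri("local0", "info")
    print(pri, selector_matches("local0.*", pri), selector_matches("local1.*", pri))
```

So a message logged on local0 is picked up by local0.* and ignored by a line written for local1.*, which is why a mismatched facility quietly sends nothing.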
Step 4: Add Log to Splunk Configuration
Now that you’re sending the log to the local syslog daemon and forwarding it to the Splunk server via that daemon, it’s time to add it to the configuration on the Splunk side. Open up /etc/syslog.conf on your Splunk server (this is the syslogd configuration and not Splunk-specific) and add the following line:
local0.* /var/log/remote/log.log
Once again, use the same facility from above and change log.log to the actual name of the log. Now, restart the syslog daemon (/etc/rc.d/syslogd restart).
Step 5: Add Log to Splunk Web Interface
First, log in to the web interface as an administrator, and click on “Admin” in the top right-hand corner. The Splunk logo in the top left should now say “Splunk>Admin.” Click on “Data Inputs” and then “Files & Directories” under “Data Inputs” in the sidebar on the left. Then, click the “New Input” button near the top-center. Fill out the details in the form. These will vary, but more than likely you will want to click the “Monitor a directory” radio button; Splunk then works similarly to the Unix tail -f command. The “Full path on server” is the path to the remote log on the Splunk installation, taken from Step 4: /var/log/remote/log.log.
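To make the tail -f comparison concrete, here is a small sketch of the idea: remember how far into the file you have read, and on each pass pick up only what was appended since. This is my own illustration of the concept, not how Splunk is implemented. The class name is made up, the path comes from Step 4, and unlike tail -f it starts from the beginning of the file.

```python
import os
import time


class FileFollower:
    """Remember a read offset so each poll returns only newly appended lines."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0

    def poll(self) -> list:
        # If the file shrank, assume it was rotated and start from the top.
        if os.path.getsize(self.path) < self.offset:
            self.offset = 0
        with open(self.path, "rb") as fh:
            fh.seek(self.offset)
            chunk = fh.read()
        # Consume only up to the last newline; a half-written line
        # waits for the next poll.
        end = chunk.rfind(b"\n") + 1
        self.offset += end
        return chunk[:end].decode("utf-8", errors="replace").splitlines()


def follow(path: str, interval: float = 1.0) -> None:
    """Print new lines forever, polling every `interval` seconds."""
    follower = FileFollower(path)
    while True:
        for line in follower.poll():
            print(line)
        time.sleep(interval)


# Example (runs until interrupted):
# follow("/var/log/remote/log.log")
```

Running something like this on the Splunk server is also a quick way to confirm that lines are arriving in the file before you go looking for them in the web interface.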
Troubleshooting
I initially ran into some issues when sending the log to Splunk. To test whether Splunk was actually receiving the data, Anthony, a coworker, recommended using the logger(1) command. [6] Basically, we used logger to add our own data to the logs. The data was absolutely bogus and only for testing, but we knew exactly what we had said and where it would be. We saw this bogus data show up in Splunk, so we knew the logs were being sent. It turned out that we had internal problems in our application. This might come in handy if you run into issues.
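If you would rather script the same kind of test, something like the following should work. It is a sketch, not what we ran: we used logger(1), which writes to the local syslog daemon, whereas this uses Python’s logging.handlers.SysLogHandler to send one uniquely tagged line over UDP straight to a syslog listener. The function name and the marker format are my own inventions, and port 514 is the conventional syslog UDP port, assumed here rather than taken from our setup.

```python
import logging
import logging.handlers
import uuid


def send_marker(host: str, port: int = 514) -> str:
    """Send one uniquely tagged test line to a syslog listener over UDP."""
    marker = f"splunk-test-{uuid.uuid4().hex[:8]}"
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        # Must match the facility used in syslog.conf.
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    logger = logging.getLogger("splunk_marker")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the test line out of other handlers
    logger.addHandler(handler)
    try:
        logger.info("bogus test entry %s", marker)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return marker


# Example, using the placeholder host from Step 3:
# print(send_marker("splunk.example.org"))
```

Search Splunk for the marker the function returns. If it shows up, the syslog side is working and the problem is probably in the application, as it was for us. Because UDP gives no delivery confirmation, a missing marker only tells you to check the firewall and syslogd settings next.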
Conclusion
That’s it! Your log should now show up in Splunk; run a search to confirm that it does. Keep in mind that you must allow the IP addresses of any remote boxes through any firewalls. Also, these instructions were written with Unix/FreeBSD in mind. Most of this should carry over to other systems, but some details will differ, e.g. syslogd and rc.d. On the server side, the syslog daemon receives messages relayed by the remote syslog daemon. This is completely independent of Splunk. (There is an actual syslog protocol which was just recently standardized and extended, but it will be some time before the older implementations give way to the new protocol.) If your setup has a central log server, this method should still work.
I hope this spares you the headache of trudging through the Splunk documentation. It’s actually quite easy!
[Many thanks to Anthony Illiopoulos for his significant input on this article.]

How do you incorporate Arcsight with this product?
I have no idea. I’ve never dealt with Arcsight, just Splunk, but I imagine the two would work well together.