The source of the data can be event logs, web logs, live application logs, network feeds, system metrics, change monitoring, message queues, archive files, and so on.
In general, data sources are grouped into the following categories.
[Files and directories]
Most data that you might be interested in comes directly from files and directories.
[Network events]
The Splunk software can index remote data from any network port and SNMP events from remote devices.
[Windows sources]
The Windows version of Splunk software accepts a wide range of Windows-specific inputs, including Windows Event Log, Windows Registry, WMI, Active Directory, and performance monitoring.
[Other sources]
Other input sources are also supported, such as [FIFO queues] and scripted inputs that get data from [APIs] and other [remote data] interfaces.
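As an illustration of the scripted-input category, the following is a minimal sketch of a script that writes one event per line to standard output, which is the stream a scripted input is generally expected to read. The function names `format_event` and `collect_metrics`, the field names, and the timestamp layout are all assumptions made for this example, not part of the product.

```python
import sys
import time
from datetime import datetime, timezone


def format_event(epoch_seconds, fields):
    """Render one event as a single line: a UTC timestamp, then key="value" pairs."""
    stamp = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )
    pairs = " ".join(f'{key}="{value}"' for key, value in fields.items())
    return f"{stamp} {pairs}"


def collect_metrics():
    """Hypothetical stand-in for a call to a remote API; returns fixed sample values."""
    return {"source_api": "example", "queue_depth": 3}


if __name__ == "__main__":
    # Emit a single event on standard output, terminated by a newline.
    sys.stdout.write(format_event(time.time(), collect_metrics()) + "\n")
```

Putting the timestamp first and using key="value" pairs keeps each line easy to parse, but any consistent line-oriented layout would serve the same purpose in this sketch.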
Events are stored in the index as a group of files that fall into two categories:
Raw data, which is the data that you add to the Splunk deployment. The raw data is stored in a compressed format.
Index files, which include some metadata files that point to the raw data.
These files reside in sets of directories, called buckets, that are organized by age.
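The split between compressed raw data and index files that point into it can be pictured with a toy model. The sketch below is only an analogy under simplifying assumptions: it does not reproduce the actual on-disk format, and the names `build_toy_index` and `search`, the use of `zlib`, and the whitespace tokenization are all choices made for this example.

```python
import zlib
from collections import defaultdict


def build_toy_index(events):
    """Return (compressed raw data, event offsets, term index) for a list of event strings."""
    raw = bytearray()
    offsets = []                    # (start, length) of each event in the raw bytes
    term_index = defaultdict(set)   # term -> ids of the events that contain it
    for event_id, event in enumerate(events):
        encoded = event.encode("utf-8")
        offsets.append((len(raw), len(encoded)))
        raw.extend(encoded)
        for term in event.lower().split():
            term_index[term].add(event_id)
    return zlib.compress(bytes(raw)), offsets, dict(term_index)


def search(compressed_raw, offsets, term_index, term):
    """Look a term up in the index, then read matching events back out of the raw data."""
    raw = zlib.decompress(compressed_raw)
    hits = sorted(term_index.get(term.lower(), ()))
    return [
        raw[start:start + length].decode("utf-8")
        for start, length in (offsets[i] for i in hits)
    ]


if __name__ == "__main__":
    sample = ["ERROR disk full", "INFO login ok", "ERROR login failed"]
    compressed, offsets, index = build_toy_index(sample)
    for line in search(compressed, offsets, index, "error"):
        print(line)
```

In this model the term index and offsets play the role of the index files, and the compressed byte string plays the role of the raw data: a lookup consults the index first and touches the raw data only to retrieve the matching events.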