Phasing out ElasticSearch data with Index Lifecycle Management

Out with the (c)old, in with the new

Recently I needed to look into time-to-live (TTL) capabilities within ElasticSearch. It has been several years since I last used this feature (version 1.5, to be precise – aeons ago!). Unsurprisingly, things have changed significantly since then. Instead of enabling _ttl in the index mapping and defining a value for indices.ttl.interval, we now have a completely new implementation: ILM, or Index Lifecycle Management. In this article I will take a look at how it works.

Index Lifecycle Management

The ILM feature was first introduced in version 6.6.0 and is out of beta as of version 6.7.0.

Index Lifecycle Management offers more than just a time-to-live. Besides letting the user set up a policy that takes care of deleting old data, it provides these additional features:

  • ILM allows the user to move indices through as many as five discrete phases: hot, warm, cold, frozen and delete.
  • You can apply different actions and constraints to each of these phases, such as read-only access, or a maximum size or age before rollover.
  • An ILM policy can be attached to an index template via the index.lifecycle.name setting. As such it can apply to either an index or a data stream.
  • The policy creates new indices transparently as needed, automatically updating the index alias. This keeps the implementation details hidden from clients, as they only write to the alias.

There are some differences when using a data stream or an index. I will look at these with a step-by-step example below.

Walk-through: ILM with a data stream

The first thing to do is to define our ILM policy. The policy exists independently of indices and index templates; it only takes effect once they reference it. We can define the policy either from within Kibana or using the API. I’ll be using the API. We create (PUT) a policy defining at least a hot phase, and optionally any of the other four.

We can define our policy to trigger a rollover by size and/or age (whichever condition is met first). The min_age setting of a phase indicates how old an index must be before the policy moves it into that phase; the age is measured from the rollover, or from index creation if the index has never been rolled over. In this simple example the hot phase rolls the index over when either its size reaches 50 GB or 1 minute has passed, after which the index moves on to warm. The default value of min_age is zero, meaning that the index moves into the next phase as soon as the previous one has completed. I’ve set the aging times purposefully low so we can check that the policy works as expected.
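The policy described above could look roughly like the following sketch. Only the hot-phase rollover values (50 GB, 1 minute) come from the text; the policy name aken-policy and the warm, cold and delete ages and actions are hypothetical choices for illustration. The request is built as a (method, path, body) tuple that maps directly onto a Kibana Dev Tools request.

```python
import json

# Hypothetical policy name; the index template must reference the same name.
POLICY_NAME = "aken-policy"


def create_policy_request(max_size="50gb", max_age="1m"):
    """Build the 'PUT _ilm/policy/<name>' request as a (method, path, body) tuple."""
    body = {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over as soon as EITHER condition is met.
                        "rollover": {"max_size": max_size, "max_age": max_age}
                    }
                },
                "warm": {
                    # min_age counts from the rollover (or creation) of the index.
                    "min_age": "1m",
                    "actions": {"readonly": {}},
                },
                "cold": {
                    "min_age": "2m",
                    "actions": {"set_priority": {"priority": 0}},
                },
                "delete": {
                    "min_age": "5m",
                    "actions": {"delete": {}},
                },
            }
        }
    }
    return "PUT", f"_ilm/policy/{POLICY_NAME}", body


if __name__ == "__main__":
    method, path, body = create_policy_request()
    # Print the request in the form the Kibana Dev Tools console accepts.
    print(method, path)
    print(json.dumps(body, indent=2))
```

Running the script prints a request that can be pasted into Kibana Dev Tools as-is, or sent with any HTTP client.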

Index Template

We define an index template with an index pattern (in our example: “aken”) and reference our policy via the index.lifecycle.name setting. We can add any other settings that the indices created on rollover will require. Notice that the data_stream object is empty: its presence alone is enough to have a data stream created with a name matching the index pattern. When we post the first document to this data stream, Elasticsearch creates the data stream together with its first backing index, which the ILM policy then manages. Backing indices are named using the pattern .ds-{data stream name}-{generation}, e.g. .ds-aken-000002 (newer versions also include the creation date in the name).
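A minimal sketch of these steps, assuming Elasticsearch 7.9 or later (composable index templates and data streams). The template name aken-template and the message field are hypothetical; the policy name must match whatever policy was created before.

```python
from datetime import datetime, timezone

DATA_STREAM = "aken"
POLICY_NAME = "aken-policy"       # hypothetical; must match the ILM policy name
TEMPLATE_NAME = "aken-template"   # hypothetical template name


def index_template_request():
    """Composable index template that turns 'aken*' names into data streams."""
    body = {
        "index_patterns": [f"{DATA_STREAM}*"],
        # An empty object is all it takes to enable data streams for the pattern.
        "data_stream": {},
        "template": {
            # Applied to every backing index, including those created on rollover.
            "settings": {"index.lifecycle.name": POLICY_NAME}
        },
    }
    return "PUT", f"_index_template/{TEMPLATE_NAME}", body


def add_document_request(message, timestamp=None):
    """Append one document; data streams require an @timestamp field."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).isoformat()
    doc = {"@timestamp": timestamp, "message": message}
    return "POST", f"{DATA_STREAM}/_doc", doc
```

The first add_document_request sent after the template exists is what creates the data stream and its first backing index.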

After a few minutes we can check on the backing indices matching the .ds-* pattern, and sure enough, we now have two of them. Each rollover performed by the policy adds a new backing index with the next generation number. The one with the highest number is the most recent and, therefore, the data stream’s current write index.
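The "highest number wins" rule can be expressed in a few lines. This sketch assumes the 7.x response shape of GET _data_stream/aken (a data_streams list whose entries carry an indices list of objects with an index_name key); the generation is parsed from the suffix after the last hyphen, which works with or without a date in the name.

```python
DATA_STREAM = "aken"


def data_stream_request():
    """'GET _data_stream/<name>' lists the backing indices of a data stream."""
    return "GET", f"_data_stream/{DATA_STREAM}", None


def backing_indices(response):
    """Extract backing index names from a _data_stream response (7.x shape assumed)."""
    return [
        index["index_name"]
        for stream in response["data_streams"]
        for index in stream["indices"]
    ]


def generation(index_name):
    """The generation is the number after the last '-', e.g. '.ds-aken-000002' -> 2."""
    return int(index_name.rsplit("-", 1)[1])


def current_write_index(index_names):
    """The backing index with the highest generation receives new documents."""
    return max(index_names, key=generation)
```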

Updates

However, the documentation tells us that data streams are designed for append-only use. What this means is that an attempt to update a document by its _id via the data stream name will result in an exception. The way around this is to update the document by addressing its backing index (e.g. .ds-aken-000002) directly, but this requires us to know the document’s _id, sequence number and primary term, so implementation details leak out to the client.
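A sketch of that two-step workaround: first search through the data stream with seq_no_primary_term enabled so each hit carries _index, _seq_no and _primary_term, then write the new version of the document to that backing index with optimistic concurrency control. The function names are hypothetical; the hit passed in is assumed to be one entry of the search response’s hits.hits list.

```python
DATA_STREAM = "aken"


def find_document_request(doc_id):
    """Search via the data stream, asking for the concurrency-control metadata."""
    body = {
        "seq_no_primary_term": True,  # adds _seq_no and _primary_term to each hit
        "query": {"ids": {"values": [doc_id]}},
    }
    return "GET", f"{DATA_STREAM}/_search", body


def update_in_backing_index_request(hit, new_source):
    """Overwrite the document in its backing index (taken from the search hit).

    if_seq_no/if_primary_term make the write fail if the document changed
    since it was read. new_source replaces the whole document, so it must
    still contain @timestamp.
    """
    path = (
        f"{hit['_index']}/_doc/{hit['_id']}"
        f"?if_seq_no={hit['_seq_no']}&if_primary_term={hit['_primary_term']}"
    )
    return "PUT", path, new_source
```

Note how the backing index name, sequence number and primary term all have to travel through client code here, which is exactly the leak described above.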

Walk-through: ILM with an index

As an alternative to data streams we can use regular indices behind an alias. Instead of having a rollover implicit in the data stream, we have a few extra steps. In the template we define an index pattern (as opposed to a data stream name) and provide the rollover alias using the index.lifecycle.rollover_alias setting. We then have to create an initial index, assign it the same alias as used for the rollover and declare it the “write index”, i.e. the index that receives the writes sent to the alias. When we post a document to the index alias, elastic writes it to whichever index is backing the alias at that time.
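A minimal sketch of the index-based setup. The names are hypothetical, and because alias aken and pattern aken-* overlap with the data stream from the previous walk-through, this assumes that data stream and its template have been deleted first. The -000001 suffix on the first index is what lets rollover derive the following names.

```python
ALIAS = "aken"                          # hypothetical alias that clients write to
POLICY_NAME = "aken-policy"             # hypothetical; must match the ILM policy
TEMPLATE_NAME = "aken-index-template"   # hypothetical template name


def index_template_request():
    """Template for plain indices: no data_stream object, but a rollover alias."""
    body = {
        "index_patterns": [f"{ALIAS}-*"],
        "template": {
            "settings": {
                "index.lifecycle.name": POLICY_NAME,
                "index.lifecycle.rollover_alias": ALIAS,
            }
        },
    }
    return "PUT", f"_index_template/{TEMPLATE_NAME}", body


def bootstrap_index_request():
    """Create the first index and mark it as the alias's write index.

    The numeric suffix lets rollover derive the next name (aken-000002, ...).
    """
    body = {"aliases": {ALIAS: {"is_write_index": True}}}
    return "PUT", f"{ALIAS}-000001", body


def add_document_request(doc_id, doc):
    """Index a document through the alias; it lands in the current write index."""
    return "PUT", f"{ALIAS}/_doc/{doc_id}", doc
```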

In contrast to data streams, we can update via the alias as normal.
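With an alias, a partial update can go through the regular update API, as in this sketch (the function name is hypothetical). As far as I understand the alias semantics, write requests such as this one are routed to the alias’s current write index, so it is worth checking how this behaves for a document that has already been rolled over into an older index.

```python
ALIAS = "aken"  # hypothetical alias from the index-based setup


def update_via_alias_request(doc_id, changes):
    """Partial update through the alias using the regular update API."""
    return "POST", f"{ALIAS}/_update/{doc_id}", {"doc": changes}
```

Unlike the data stream variant, no backing index name, sequence number or primary term is needed on the client side.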

We can track the current state of our ILM policy by using this call:

GET aken/_ilm/explain

This gives us a summary of all indices with their phases, their lifetime, pending actions and other information. Note that the documentation states that ILM expects a cluster status of green for things to work smoothly.
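When there are many indices, it helps to condense the explain output. This sketch assumes the response has an indices object keyed by index name, with managed, phase, action and step fields per index, and that a failing index reports step ERROR together with a failed_step field; the helper names are hypothetical.

```python
def explain_request(target="aken"):
    """'GET <target>/_ilm/explain' reports the ILM state of every matching index."""
    return "GET", f"{target}/_ilm/explain", None


def summarize_explain(response):
    """Return (index, phase, action, step) per index, sorted by index name.

    Indices not managed by ILM are reported with None for phase, action and step.
    """
    rows = []
    for name, info in sorted(response.get("indices", {}).items()):
        if info.get("managed"):
            rows.append((name, info.get("phase"), info.get("action"), info.get("step")))
        else:
            rows.append((name, None, None, None))
    return rows


def failed_indices(response):
    """Indices whose current step is ERROR, mapped to the step that failed."""
    return {
        name: info.get("failed_step")
        for name, info in response.get("indices", {}).items()
        if info.get("step") == "ERROR"
    }
```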

Single node clusters

That last point has a clear bearing on ILM usage with a single-node cluster. With the default of one replica per primary shard, a single-node cluster never reaches green status: the replicas cannot be allocated, as there is no second node to hold them and thus no failover capability. In my tests the indices never reached the cold phase, despite the very short aging time. In addition, rollover from hot to warm took place at 20-minute intervals rather than at the shorter interval defined in the policy. It’s not clear why this is the case. The default setting for indices.lifecycle.poll_interval is 10 minutes, and ILM only evaluates its conditions at that interval, so aging times set lower than the poll interval are effectively redundant. The unassigned replica shards may, in addition, be what stops elastic from moving indices on from warm.
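For experimenting on a single node anyway, two settings look relevant, though I have not verified that they change the behaviour described above. indices.lifecycle.poll_interval is a dynamic cluster setting, so ILM can be made to check more often, and a template with index.number_of_replicas set to 0 leaves no replica unassigned, which gives the cluster a chance to report green. The names and the 1m interval are hypothetical, and running without replicas means running without redundancy.

```python
def poll_interval_request(interval="1m"):
    """Shorten how often ILM evaluates its conditions (default: 10 minutes)."""
    body = {"persistent": {"indices.lifecycle.poll_interval": interval}}
    return "PUT", "_cluster/settings", body


def single_node_template_request(policy="aken-policy"):
    """Data stream template without replicas, so one node can hold every shard copy."""
    body = {
        "index_patterns": ["aken*"],
        "data_stream": {},
        "template": {
            "settings": {
                "index.lifecycle.name": policy,
                # No replica shards means nothing stays unassigned on a single node.
                "index.number_of_replicas": 0,
            }
        },
    }
    return "PUT", "_index_template/aken-template", body
```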

In other words: test this thoroughly if you are planning to use ILM in a single-node cluster. In that case it may be easier to regularly run a delete-by-query followed by a force merge that expunges the deleted documents.
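The manual alternative could be sketched as two requests, run on whatever schedule suits the data. This assumes the documents carry an @timestamp date field; the target name and the 30-day retention are hypothetical.

```python
def delete_older_than_request(target="aken", age="30d"):
    """Delete every document whose @timestamp is older than now minus `age`."""
    body = {"query": {"range": {"@timestamp": {"lt": f"now-{age}"}}}}
    return "POST", f"{target}/_delete_by_query", body


def expunge_deletes_request(target="aken"):
    """Force merge that only rewrites segments containing deleted documents."""
    return "POST", f"{target}/_forcemerge?only_expunge_deletes=true", None
```

The delete-by-query only marks documents as deleted; the force merge is what reclaims the disk space, so the two are meant to run in that order.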

Wrap-up

Depending on our scenario, we have a fairly easy way to manage our time-series data. We can use either data streams (append-only) or standard indices, both with automated rollover. Apart from the initial configuration, elastic does all the heavy lifting for us and keeps the implementation details abstracted away behind the data stream or index alias, which Index Lifecycle Management also manages for us. ILM is available in the basic version of the elastic stack, which also means that it is available for free.