Caching is a very useful concept, especially if you have relatively heavy computations that don't need to be repeated every single time. Among the many types of caches, LRU (least recently used) caches are a popular choice. The basic idea is to retain the N most recently used outcomes that fit inside the memory allocated for the cache, so that subsequent queries can be served from the cache without recomputing them. This is a good fit for most practical use cases where:
- the same query is likely to recur within a reasonably small time window, and
- the computed outcomes are time sensitive, i.e. an outcome depends on the time at which it is computed.
I was recently asked to implement an LRU cache during a coding interview, and I was quite happy that interviewers do ask non-hypothetical questions too. Anyway, this article is not about interviews; it's about an LRU cache implementation for a new programming language called Ballerina.
The current cache implementation in the Ballerina Standard Library
I had wanted to give Ballerina a try for a long time, but I never got the chance. I finally found some time to dig into the codebase and try it out. I was quite impressed that an LRU cache implementation is included in the standard library and shipped together with the compiler. That caching implementation is also consumed in a few other places in the standard libraries, like the HTTP module.
I looked at their caching implementation and found these issues with it:
- Its runtime complexity is not optimal for an LRU cache. If the size of the buffer is N and there are M cache queries (assuming the GET/SET ratio is constant), then its run time complexity is O(NM). An LRU cache can be implemented to run in O(M).
- It has a parameter called the eviction factor f: once the cache is full, it removes the N*f least recently used items. With this behavior the memory allocated for the cache is never 100% utilized; the average utilization is roughly (1-f)*100 + (f*100/2) percent (a short derivation follows this list).
- It compares the timestamps of all items to compute the N*f least recently used items to evict. This depends on the precision of the timestamp, which is currently stored in milliseconds. So if all items in the cache are accessed within the same millisecond, the computation above becomes inaccurate.
- It has a separate task running every 5 seconds to clean up expired cache entries. The way it's implemented, this also introduces a shared lock between all cache instances. Altogether this cleanup is unnecessary work, since the expiry check can be done during the GET operation alone. It also doesn't make much sense to do this regular cleanup just to save memory: LRU caches are usually expected to be full all the time, and if they aren't, the consumer should consider reducing the amount of memory (i.e. N) allocated for the cache.
- It seems to be written in a non-thread-safe manner. See here the comment I made in an issue.
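To spell out that average-utilization figure: right after an eviction the cache holds (1-f)*N items, and it fills back up to N items roughly linearly before the next eviction, so the average occupancy over a cycle is about

$$
\frac{(1-f)N + N}{2} \;=\; \left(1 - \frac{f}{2}\right)N,
$$

which as a percentage is exactly (1-f)*100 + (f*100/2). For example, with f = 0.25 the cache is on average only about 87.5% full.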
Here is how the current implementation ends up taking O(NM) time:
Once the cache reaches its capacity, it computes the N*f least recently used entries. That's done in O(N * N*f) using two nested loops. (This could be done much faster using a heap, but that approach still wouldn't bring the overall time complexity down to O(M).)
Out of M cache accesses, if we assume the ratio between GET and SET operations is constant, this eviction happens O(M/(N*f)) times. Note that the "Big O" eliminates the constant factor.
So the total run time complexity is O((M/(N*f)) * (N * N*f)) = O(MN).
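Treating the eviction factor f as a constant, the counting can be written in one line:

$$
\underbrace{O\!\left(\frac{M}{Nf}\right)}_{\text{number of evictions}} \times \underbrace{O\!\left(N \cdot Nf\right)}_{\text{work per eviction}} \;=\; O\!\left(\frac{M}{Nf}\cdot N^{2} f\right) \;=\; O(MN).
$$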
We should be able to implement an LRU cache so that its time complexity is independent of the cache size. The key idea is to use a separate doubly linked list that keeps entries ordered by their access time. Each entry in the map holds a pointer to its corresponding node in the linked list, so that whenever an item is accessed, its node can be moved to the front of the list in constant time. That way the least recently accessed item is always the tail of the linked list.
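To make that concrete, here is a minimal sketch of the map-plus-doubly-linked-list bookkeeping in Ballerina. This is not the code from my module: Node, SimpleLRUCache, linkAtFront and unlink are illustrative names, and expiry handling and locking are left out. It only shows why GET and PUT become constant-time operations.

```ballerina
// Illustrative only: a node of the doubly linked list that keeps entries
// ordered by access time (head = most recently used, tail = least recently used).
type Node object {
    string key;
    anydata value;
    Node? prev = ();
    Node? next = ();

    function __init(string key, anydata value) {
        self.key = key;
        self.value = value;
    }
};

type SimpleLRUCache object {
    int capacity;
    map<Node> entries = {};
    Node? head = ();
    Node? tail = ();

    function __init(int capacity) {
        self.capacity = capacity;
    }

    function get(string key) returns anydata {
        Node? node = self.entries[key];
        if (node is Node) {
            self.moveToFront(node);   // O(1): unlink and re-link at the head
            return node.value;
        }
        return ();
    }

    function put(string key, anydata value) {
        Node? existing = self.entries[key];
        if (existing is Node) {
            existing.value = value;
            self.moveToFront(existing);
            return;
        }
        if (self.entries.length() >= self.capacity) {
            // Evict exactly one item: the tail, i.e. the least recently used.
            Node? lru = self.tail;
            if (lru is Node) {
                self.unlink(lru);
                _ = self.entries.remove(lru.key);
            }
        }
        Node node = new(key, value);
        self.entries[key] = node;
        self.linkAtFront(node);
    }

    function moveToFront(Node node) {
        self.unlink(node);
        self.linkAtFront(node);
    }

    function linkAtFront(Node node) {
        node.prev = ();
        node.next = self.head;
        Node? oldHead = self.head;
        if (oldHead is Node) {
            oldHead.prev = node;
        } else {
            self.tail = node;         // list was empty
        }
        self.head = node;
    }

    function unlink(Node node) {
        Node? prev = node.prev;
        Node? next = node.next;
        if (prev is Node) {
            prev.next = next;
        } else {
            self.head = next;         // node was the head
        }
        if (next is Node) {
            next.prev = prev;
        } else {
            self.tail = prev;         // node was the tail
        }
        node.prev = ();
        node.next = ();
    }
};
```

Every operation touches only a constant number of map entries and list nodes, so M cache accesses cost O(M) in total, independent of N.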
Here is a general explanation of how LRU caches are implemented.
You can find my LRU cache implementation in Ballerina here.
A few important points regarding this specific implementation:
- It uses time:nanoTime(), not because we need nanosecond precision; in fact there is no guarantee that the underlying hardware clock has nanosecond accuracy. I'm using it simply because time:currentTime().time seems to be about 10x slower than time:nanoTime(). By the way, time:nanoTime() does not return the current timestamp since the epoch; it's just a wrapper around System.nanoTime in Java, which returns the time since some arbitrary reference point. It's safe to use here as long as the program is not going to run for 100 years continuously without a restart.
- Right now Ballerina maps are thread-safe, so the synchronization on individual map operations is redundant: all LRU cache operations need to be wrapped in a lock block anyway.
- I initially implemented CacheItem as a Ballerina closed record. But looking at the Ballerina code, it appears that records are implemented as hashtables. In the caching implementation the attributes of CacheItems are accessed all the time, so paying a hashtable lookup for every attribute access doesn't make much sense; ideally it should be something like a C struct. Therefore I changed the CacheItem implementation to use a Ballerina object, which is similar to a Java class (a rough sketch follows this list).
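For illustration only, a CacheItem modeled as an object might look roughly like this (the exact fields in my module and how they are used may differ):

```ballerina
import ballerina/time;

// Illustrative sketch: an object's fields behave like plain struct members,
// whereas record fields are (currently) backed by a hashtable.
type CacheItem object {
    string key;
    anydata value;
    int lastAccessTime;   // captured with time:nanoTime(), which is much faster
                          // than time:currentTime() as noted above

    function __init(string key, anydata value) {
        self.key = key;
        self.value = value;
        self.lastAccessTime = time:nanoTime();
    }
};
```

Since these fields are read and written on every cache operation, the cost of each field access matters more here than it would elsewhere.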
Here's a comparison of the run time of the O(M) implementation and the O(NM) implementation in the Ballerina Standard Library:
| Capacity (N) | O(M) Cache run-time (s) | O(NM) Cache run-time (s) |
|---|---|---|
| 5 | 11.876 | 21.133 |
| 10 | 15.760 | 17.318 |
| 20 | 13.688 | 18.829 |
| 40 | 16.180 | 26.507 |
| 80 | 17.271 | 46.618 |
| 160 | 17.502 | 78.205 |
| 320 | 17.851 | 142.091 |
| 640 | 18.129 | 275.218 |
| 1000 | 18.435 | 423.605 |
You can find the program used to measure these numbers here.
Measured times are real elapsed time, not CPU time, which I couldn't find a way to measure in Ballerina.
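The linked program is the authoritative benchmark; purely as a sketch of how elapsed time can be measured with time:nanoTime(), something along these lines works (the operation count, key distribution and GET/PUT mix here are made up for illustration):

```ballerina
import ballerina/io;
import ballerina/time;
import chethiya/cache;

public function main() {
    int n = 1000;                              // cache capacity (N)
    int m = 1000000;                           // number of cache operations (M)
    cache:LRUCache c = new(n, 60000, false);   // same constructor arguments as in the usage example below

    int startTime = time:nanoTime();
    int i = 0;
    while (i < m) {
        // Made-up access pattern: keys drawn from a range larger than the
        // capacity so that evictions actually happen.
        string key = "key-" + (i % (n * 2)).toString();
        var cached = c.get(key);
        if (cached is ()) {
            c.put(key, i);                     // stands in for computing the result
        }
        i = i + 1;
    }
    int elapsedNanos = time:nanoTime() - startTime;
    io:println("Elapsed seconds: ", <float>elapsedNanos / 1000000000.0);
}
```

Because this measures wall-clock time, GC pauses and other activity on the machine are included, so the interesting signal is how the numbers change as N grows rather than their absolute values.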
As you can see, the run-time of the cache implementation in the Ballerina Standard Library grows linearly with N, but stays fairly stable for smaller N, especially for values below 20 or so. I suspect this is due to CPU caching: when computing the entries to evict, all operations happen within a small array that fits in the CPU cache.
It also looks like there's a slight increase in run-time for the O(M) implementation as N grows. To verify this I tried a few different values of N and got the following results:
| Capacity (N) | O(M) Cache run-time (s) |
|---|---|
| 10² | 10.470 |
| 10³ | 10.545 |
| 10⁴ | 13.518 |
| 10⁵ | 19.555 |
| 10⁶ (with default heap size) | GC overhead limit exceeded |
| 10⁶ (with 256MB) | 44.177 |
| 10⁶ (with 512MB) | 21.131 |
| 10⁶ (with 1024MB) | 18.156 |
For N = 10⁶ with the default heap size, it crashes with the following error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
I increased the Java heap size and got the rest of the results after the failed test. Here we observe that as the heap size increases, the run-time performance also improves. I think this is due to the additional work needed to allocate space for objects within a small heap compared to doing that in a larger heap.
So the slight increase in run-time as N grows comes from the additional memory allocation work that has to happen within the same heap size.
In my opinion it would have been nice if Ballerina could handle 1 million records (60 bytes per record) in a map within a 128MB heap. It would be interesting to implement the same cache in Java and see whether it can handle 1 million items within 128MB. Anyway, it's not a big issue, as the language doesn't seem to worry about those aspects too much right now.
You can pull it into your code by adding it as a dependency in your project's .toml file, e.g.:
[dependencies]
"chethiya/cache" = "0.2.1"
Run the following command to pull the module:
ballerina pull chethiya/cache
Now you are ready to use the LRUCache in your code. Simply import the module and create an instance of it in your .bal file, e.g.:
import ballerina/http;
import ballerina/log;
import chethiya/cache;

cache:LRUCache cachedResults = new(1000, 60000, false);

service hello on new http:Listener(9090) {
    resource function sayHello(http:Caller caller, http:Request req) {
        // getQuery, computeResult and SearchResult are assumed to be
        // defined elsewhere in this module.
        var query = getQuery(req);
        SearchResult? result = cachedResults.get(query);
        if (result is ()) {
            result = computeResult(query);
            cachedResults.put(query, result);
        }
        var res = caller->respond(result);
        if (res is error) {
            log:printError("Error sending response", res);
        }
    }
}
If you have any issues with this implementation, please report them here.