Tuesday, 20 March 2012

Hibernate Caching

Hibernate provides three caching mechanisms: the first-level cache, the second-level cache and the query cache. Understanding how they work is important for improving performance, and configuring them incorrectly can degrade it. This post gives a conceptual understanding of how Hibernate caching works.

Configuration details are covered in the Hibernate documentation.

First-level Cache
The Hibernate Session represents a unit of work, which typically corresponds to a transaction at the database level. The first-level cache belongs to the Session and is always enabled. When Hibernate entities are modified within a session, Hibernate does not update the underlying database tables immediately. Instead it keeps track of the changes and issues a reduced number of SQL statements when the session is flushed, which normally happens when the transaction commits. For example, if an entity is modified several times within the same session, Hibernate generates only one SQL update statement containing all the changes.
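The write-behind behaviour described above can be sketched in plain Java. This is only a conceptual model: SketchSession, set, flush and executedSql are hypothetical names invented for illustration and are not part of the Hibernate API, and the SQL is built naively as a string rather than executed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only: SketchSession is a hypothetical class, not Hibernate's Session.
class SketchSession {
    // Pending column changes per entity id, collected in memory until flush().
    private final Map<Long, Map<String, Object>> dirty = new LinkedHashMap<>();
    // SQL that would be sent to the database; a real session executes it over JDBC.
    final List<String> executedSql = new ArrayList<>();

    // Record a change in memory; nothing is sent to the database yet.
    void set(long id, String column, Object value) {
        dirty.computeIfAbsent(id, k -> new LinkedHashMap<>()).put(column, value);
    }

    // Emit one UPDATE per modified entity, covering all of its changed columns.
    void flush() {
        for (Map.Entry<Long, Map<String, Object>> entity : dirty.entrySet()) {
            StringBuilder sql = new StringBuilder("update Employee set ");
            String separator = "";
            for (Map.Entry<String, Object> change : entity.getValue().entrySet()) {
                sql.append(separator)
                   .append(change.getKey()).append("='").append(change.getValue()).append("'");
                separator = ", ";
            }
            sql.append(" where id=").append(entity.getKey());
            executedSql.add(sql.toString());
        }
        dirty.clear();
    }
}
```

Calling set three times on the same employee and then flush once leaves a single update statement in executedSql, which mirrors the way several modifications collapse into one SQL update.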

Second-level Cache 
The second-level cache is associated with the SessionFactory rather than with each Session, and it is not enabled by default. Its purpose is to avoid trips to the database when an entity is requested. It doesn't store instances of an entity; rather, it stores a dehydrated state of the entity. Conceptually, this can be thought of as a Map which contains the entity's id as the key and an array of the property values as the value.

As an example, let's assume we have the following Employee entity:


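import java.util.Set;

public class Employee {
  private Long id;              // entity identifier (type assumed), used as the cache key
  private Employee manager;     // the employee this employee reports to
  private String forename;
  private String surname;
  private Set<Employee> staff;  // employees managed by this employee
  //setters and getters
}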

Hibernate will cache the records as follows:

Conceptual Employee Data Cache

Id  [ forename, surname, manager, [staff] ]
1   [ "John", "Smith", null, [2, 3] ]
2   [ "Sarah", "Brown", 1, [] ]
3   [ "Gavin", "Adams", 1, [] ]
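The conceptual table above can be written down as a plain Java Map. This is a rough sketch of the idea rather than Hibernate's actual internal cache structure; SecondLevelCacheSketch, EMPLOYEE_CACHE and staffForenames are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only; Hibernate's real cache entries are internal structures.
class SecondLevelCacheSketch {
    // Entity id -> dehydrated state: [forename, surname, manager id, staff ids].
    static final Map<Long, Object[]> EMPLOYEE_CACHE = new HashMap<>();

    static {
        EMPLOYEE_CACHE.put(1L, new Object[] {"John", "Smith", null, Arrays.asList(2L, 3L)});
        EMPLOYEE_CACHE.put(2L, new Object[] {"Sarah", "Brown", 1L, Collections.emptyList()});
        EMPLOYEE_CACHE.put(3L, new Object[] {"Gavin", "Adams", 1L, Collections.emptyList()});
    }

    // Resolve an employee's staff to forenames using cache lookups only.
    @SuppressWarnings("unchecked")
    static List<String> staffForenames(long id) {
        // Associations are stored as ids, not as entity instances.
        List<Long> staffIds = (List<Long>) EMPLOYEE_CACHE.get(id)[3];
        List<String> forenames = new ArrayList<>();
        for (Long staffId : staffIds) {
            // Each id is looked up in the same map, so no SQL is needed.
            forenames.add((String) EMPLOYEE_CACHE.get(staffId)[0]);
        }
        return forenames;
    }
}
```

Note that the manager and staff entries hold identifiers only. Each associated employee is resolved with a further lookup by id, which is why caching the associations matters as much as caching the entity itself.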


So if the Employee with id 1 is loaded from the database without the cache, it results in the following queries:


select * from Employee where id=1;          -- load the employee with id 1
select * from Employee where manager_id=1;  -- load the staff of 1 (will return 2, 3)
select * from Employee where manager_id=2;  -- load any potential staff of 2 (will return none)
select * from Employee where manager_id=3;  -- load any potential staff of 3 (will return none)


With the cache enabled, no SQL select statements would be executed. If, however, the associations were not cached, all of the queries except the first would still be executed. Therefore, it is best to cache associations whenever possible.

The above queries load entities by their identifier. If the query is more complex, such as a lookup by forename, then Hibernate must still issue a select statement to retrieve the identifier of the entity before the cache can be consulted for the entity and its associations.
 
//Complex query
Query query = session.createQuery("from Employee as e where e.forename=?");
query.setString(0, "John");
List l = query.list();

//single SQL select statement to retrieve the id
select * from Employee where forename='John'

This mandatory select statement to retrieve the id is where the query cache can be used.

Query Cache
The query cache is responsible for caching queries and their results. It is only useful for queries that are run frequently with the same parameters. Conceptually, the query cache works similarly to the caching of associations in the second-level cache: the query and its parameters are stored as the key, and the value is the list of identifiers returned by that query. These identifiers are then used to look up each entity in the second-level cache, where its dehydrated state is hydrated into an instance.
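The interplay between the two caches can be sketched in plain Java as follows. This is a simplified model under stated assumptions: QueryCacheSketch, findByForename, selectFromTable and sqlSelects are hypothetical names, the table map stands in for the Employee table, and cache invalidation is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only: none of these names exist in Hibernate.
class QueryCacheSketch {
    // Stand-in for the Employee table: id -> [forename, surname].
    private final Map<Long, String[]> table = new LinkedHashMap<>();
    // Query cache: [query text, parameters...] -> ids of the matching entities.
    private final Map<List<Object>, List<Long>> queryCache = new HashMap<>();
    // Second-level cache: id -> dehydrated state.
    private final Map<Long, String[]> entityCache = new HashMap<>();
    // Number of simulated SQL select statements executed so far.
    int sqlSelects = 0;

    QueryCacheSketch() {
        table.put(1L, new String[] {"John", "Smith"});
        table.put(2L, new String[] {"Sarah", "Brown"});
        table.put(3L, new String[] {"Gavin", "Adams"});
    }

    List<String[]> findByForename(String forename) {
        // The key combines the query text with its parameter values.
        List<Object> key = Arrays.<Object>asList("from Employee as e where e.forename=?", forename);
        List<Long> ids = queryCache.get(key);
        if (ids == null) {
            // Cache miss: run the select and remember the resulting ids.
            ids = selectFromTable(forename);
            queryCache.put(key, ids);
        }
        // Resolve every id through the second-level cache.
        List<String[]> employees = new ArrayList<>();
        for (Long id : ids) {
            employees.add(entityCache.get(id));
        }
        return employees;
    }

    // Simulates the SQL select and populates the second-level cache with the rows found.
    private List<Long> selectFromTable(String forename) {
        sqlSelects++;
        List<Long> ids = new ArrayList<>();
        for (Map.Entry<Long, String[]> row : table.entrySet()) {
            if (row.getValue()[0].equals(forename)) {
                ids.add(row.getKey());
                entityCache.put(row.getKey(), row.getValue());
            }
        }
        return ids;
    }
}
```

Running findByForename("John") twice executes the simulated select only once. A call with a different parameter produces a different key and therefore a new select, which is why the query cache only pays off for queries repeated with the same parameters.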