This article is about improving JPA's performance when working with large/thick hierarchies.
Let's suppose that for some reason you have a similar (or larger) hierarchy to one I encountered in one of my projects:
- InheritanceType.JOINED also using DiscriminatorColumn
- 4 levels hierarchy (including root entity)
- 24 entities actually making the hierarchy
- 7 properties as an average for each entity (excluding @Id)
- other entities are referenced by those actually making the hierarchy
- using hibernate JPA 2.1 api (though this might not be important)
When retrieving an entity I face the situation of about 70 tables being joined (24 from the hierarchy and 46 other referenced entities) and the performance is pathetic. Of course I have the appropriate indexes painfully chosen by testing every query with EXPLAIN ANALYZE.
Part of the problem is that I also have a large table for the hierarchy's root (26 million of rows). The other part is the way JPA is working: I mean when I try to get the root entities filtered by few discriminator types then JPA creates a 70 tables SELECT instead of using only the tables making sense for those specific discriminators.
Let's name RootEntity the root entity of my JPA hierarchy. Let's also suppose I have Level4Entity (with 4a as discriminator) which inherits from Level3Entity which inherits from Level2Entity which inherits RootEntity. When querying for RootEntity using 4a and 4b discriminator types as criteria then JPA is using the same query as when querying for RootEntity's entire hierarchy (70 tables as a total). Should JPA only use the tables required by Level4Entity's specific sub-hierarchy (e.g. 12 tables: 4 for hierarchy and other 8 referenced) then the performance would be much better. Even when simply retrieving a RootEntity entity by its @Id the timing is 12 seconds!
The client cares only for the performance (while using same hardware) asks: can you do better? Well, let's now just imagine the totally crazy idea of using InheritanceType.SINGLE_TABLE instead of InheritanceType.JOINED. For me this would mean an 177 columns table for RootEntity's and liquibase scripts in order to copy the columns and data from previous tables to the RootEntity's table, refactor FKs and indexes. But would this effort worth? Well, if your database supports materialized views you could test this approach before changing your code and database. You could simply create a materialized view based on the actual 70 tables SELECT generated by JPA then you could test the retrieving of one row by pk (somehow equivalent to the previously retrieving by @Id which took 12 seconds). If you really want the best comparison than you should include in the materialized view only the tables used directly by the hierarchy (24 tables) which is not so straight forward but still easy to do. For me this approach allowed for a 450 ms when retrieving an entity instead of 12 s as before - quite a huge difference.
So yes, changing the hierarchy to InheritanceType.SINGLE_TABLE and using a many-columns table with many possible null columns was by far faster then using InheritanceType.JOINED approach. And not only the retrieve was faster but also the INSERT (I don't remember exactly by how much but at least 5 times faster).
So, what are the pro and cons of the approach?
- speed: both for query and insert
- sql simplicity: I find easier to read a many columns SELECT instead of many JOIN SELECT
- database structure: you'll have 1 many columns table versus many tables
- database consistency: columns which were NOT NULL in their dedicated table should now accept NULL in order to account for the situation when their row might not exists for a specific sub-hierarchy
Observation: with Oracle table partitioning on the join columns of the hierarchy I guess (but never tested) you should gain same performance benefit while still using InheritanceType.JOINED