My dataset looks like this:
Task-1, Priority1, (SkillA, SkillB)
Task-2, Priority2, (SkillA)
Task-3, Priority3, (SkillB, SkillC)
The calling application (client) will send in a list of skills, say (SkillD, SkillA).
Lookup:
- Search through the dataset for SkillD first, and find nothing.
- Search for SkillA. We will find two entries: Task-1 with Priority1 and Task-2 with Priority2.
- Identify the task with the highest priority (in this case, Task-1).
- Remove Task-1 from the dataset and return Task-1 to the client.
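For illustration, the lookup steps above could be sketched with plain JDK collections as a map from skill to a priority-sorted set of tasks. This is only a sketch under assumptions: the names Task and TaskIndex are hypothetical, priorities are modeled as ints where a lower number means a higher priority, and nothing here is thread-safe.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical task type; a lower priority number means a higher priority (assumption). */
final class Task {
    final String id;
    final int priority;
    final Set<String> skills;

    Task(String id, int priority, Set<String> skills) {
        this.id = id;
        this.priority = priority;
        this.skills = skills;
    }
}

/** Sketch of a skill -> priority-sorted tasks index. Not thread-safe. */
final class TaskIndex {
    // Order by priority, then by id, so two distinct tasks with the same
    // priority are both kept in the TreeSet.
    private static final Comparator<Task> BY_PRIORITY = (a, b) -> {
        int c = Integer.compare(a.priority, b.priority);
        return c != 0 ? c : a.id.compareTo(b.id);
    };

    private final Map<String, TreeSet<Task>> bySkill = new HashMap<>();

    /** Registers the task under every skill it lists. */
    void add(Task task) {
        for (String skill : task.skills) {
            bySkill.computeIfAbsent(skill, k -> new TreeSet<>(BY_PRIORITY)).add(task);
        }
    }

    /** Removes the task from every skill bucket it belongs to. */
    void remove(Task task) {
        for (String skill : task.skills) {
            TreeSet<Task> bucket = bySkill.get(skill);
            if (bucket != null) {
                bucket.remove(task);
                if (bucket.isEmpty()) {
                    bySkill.remove(skill);
                }
            }
        }
    }

    /**
     * Tries the requested skills in order. For the first skill that has any
     * tasks, removes and returns its highest-priority task; otherwise null.
     */
    Task take(List<String> requestedSkills) {
        for (String skill : requestedSkills) {
            TreeSet<Task> bucket = bySkill.get(skill);
            if (bucket != null && !bucket.isEmpty()) {
                Task best = bucket.first();
                remove(best);
                return best;
            }
        }
        return null;
    }
}
```

With the three sample tasks loaded, take(List.of("SkillD", "SkillA")) would skip SkillD, pick Task-1 from the SkillA bucket, and also drop it from the SkillB bucket. Each add or remove costs O(log n) per skill the task lists.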
Design considerations:
- There will be a lot of adds/updates/deletes to the dataset when the website goes live.
- There are only a few skills (about 10), though the list is not static, but each skill can have thousands of tasks. So the lookup/retrieval will have to be extremely fast.
I have considered a simple List with binarySearch(comparator), or a Map from skill to a SortedSet of tasks, but I am looking for more ideas.
What is the best way to design a data structure for this kind of dataset, one that allows a complex key and a sorted collection of tasks associated with that key?
How about changing the approach a bit? You can use Guava, and its Multimap in particular.
Every experienced Java programmer has, at one point or another, implemented a Map<K, List<V>> or Map<K, Set<V>>, and dealt with the awkwardness of that structure. For example, Map<K, Set<V>> is a typical way to represent an unlabeled directed graph. Guava's Multimap framework makes it easy to handle a mapping from keys to multiple values. A Multimap is a general way to associate keys with arbitrarily many values.
There are two ways to think of a Multimap conceptually: as a collection of mappings from single keys to single values, or as a mapping from unique keys to collections of values.
I would suggest using a Multimap for this; the answer to your problem lies in a powerful Multimap feature called views.
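As a rough sketch of how that could look, the example below uses Guava's TreeMultimap, whose get(key) returns a live, sorted view of the values for that key. The choice of TreeMultimap is an assumption (the suggestion above does not name an implementation), the class names PrioritizedTask and MultimapTaskStore are hypothetical, a lower priority number is assumed to mean a higher priority, and Guava must be on the classpath.

```java
import com.google.common.collect.TreeMultimap;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;

/** Hypothetical task type; a lower priority number means a higher priority (assumption). */
final class PrioritizedTask {
    final String id;
    final int priority;
    final Set<String> skills;

    PrioritizedTask(String id, int priority, Set<String> skills) {
        this.id = id;
        this.priority = priority;
        this.skills = skills;
    }
}

/** Sketch of a skill -> tasks store backed by a TreeMultimap. Not thread-safe. */
final class MultimapTaskStore {
    // Order values by priority, then by id, so equal-priority tasks are not collapsed.
    private static final Comparator<PrioritizedTask> BY_PRIORITY = (a, b) -> {
        int c = Integer.compare(a.priority, b.priority);
        return c != 0 ? c : a.id.compareTo(b.id);
    };

    // Keys (skills) in natural String order; values sorted by BY_PRIORITY.
    private final TreeMultimap<String, PrioritizedTask> bySkill =
            TreeMultimap.create(Comparator.<String>naturalOrder(), BY_PRIORITY);

    /** Registers the task under every skill it lists. */
    void add(PrioritizedTask task) {
        for (String skill : task.skills) {
            bySkill.put(skill, task);
        }
    }

    /**
     * Tries the requested skills in order. For the first skill that has any
     * tasks, removes that skill's highest-priority task from every skill it
     * is registered under and returns it; otherwise returns null.
     */
    PrioritizedTask take(List<String> requestedSkills) {
        for (String skill : requestedSkills) {
            // get() returns a sorted view; it is empty when the skill is unknown.
            SortedSet<PrioritizedTask> view = bySkill.get(skill);
            if (!view.isEmpty()) {
                PrioritizedTask best = view.first();
                for (String s : best.skills) {
                    bySkill.remove(s, best);
                }
                return best;
            }
        }
        return null;
    }
}
```

The multimap removes the need to create and clean up the per-skill sets by hand: put creates the bucket on demand, and a key disappears once its last value is removed.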
Good luck!