Linked List versus Hashed Linked List

Tue Aug 19 07:45:48 EDT 2014

On August 18, 2014 10:53:12 PM EDT, Nick Krause <xerofoify at gmail.com> wrote:
>On Mon, Aug 18, 2014 at 7:01 PM, Greg Freemyer
><greg.freemyer at gmail.com> wrote:
>> On Mon, Aug 18, 2014 at 4:20 PM, Nick Krause <xerofoify at gmail.com> wrote:
>>> What are the advantages of the hashed linked list version over the
>>> standard one and does it increase the memory usage and overhead of
>>> the linked list more if I use a hashed version?
>>
>> Seriously?  Do you know what a hash is?
>>
>> A hash is a well-defined many to one algorithm.
>>
>> If I have a universe of a million items that hash down to 100 unique
>> hashes, then I can group those million items by hash and have 100
>> groups of roughly 10,000 items each.
>>
>> The better the hashing algorithm versus my original universe of 1
>> million items, the more even the distribution.
>>
>> Now that I have 100 segregated groups I can build an array of 100
>> linked lists all maintained separately.
>>
>> Thus:
>>
>> hash_index = my_hash(item)
>>
>> add_item(linked_list[hash_index], item) is how I add my item to the
>> hashed linked list.
>>
>> is_in_list(linked_list[hash_index], item) is how I check to see if my
>> item is already in the list.
>>
>> So in my example I have to have 100 linked lists, but each list is on
>> average 100x smaller than a simple linked list would be.
>>
>> Is adding an item to the hashed linked list faster?
>>
>> Absolutely not, I have to hash the item first then do a normal linked
>> list insertion.  That will always be slower.
>>
>> Is finding the item faster?
>>
>> That is the whole point of the exercise.  The theory is you ONLY use
>> a hashed linked list if the overhead of hashing the item is less than
>> the amount of time saved by traversing shorter lists when you search.
>>
>> It is the job of the programmer to make the determination if a hashed
>> list is a better choice or not on a case by case basis.  It depends
>> on the length of the list without breaking it into pieces and how well
>> the hash algorithm can do at generating roughly similar segregated
>> groups.
>>
>> For the size question, write yourself a userspace app and test it.
>> Obviously that is more work than asking here, but it is ASSUMED you
>> are doing research on your OWN before you post questions here.
>>
>> fyi: this question has little to do with the linux kernel.  It is
>> part of what people mean when they say you need to go learn c before
>> you start on the kernel.  Using linked lists and hashed linked lists
>> is stuff you can fully explore in userspace.
>>
>> Greg
>No, I know what the advantages are for user space; I was wondering if
>there were any issues that differ in kernel space.
>Nick

1) Your original question needs either a highly generic answer like the one I gave, or a highly specific one that depends on the exact nature of the data, the number of items tracked, the ratio of searches to adds, and how evenly the hash distributes items across groups.  Since you didn't provide the exact use case, only the generic answer is possible.  In fact, your question implies that the answer is relatively straightforward.  A much better question would have been "for a specific use case, how is the choice between a normal linked list and a hashed linked list made?"

Note the answer to that has nothing to do with user space vs kernel space.
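
To get a feel for how even a hash's grouping is, something along these lines can be run in user space.  It is only a sketch under stated assumptions: my_hash here is a hypothetical multiplicative hash picked for illustration, NBUCKETS and bucket_spread are names made up for this example, and the keys are simply 0..nkeys-1 rather than real data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 100u  /* matches the 100-group example; an assumption */

/* Hypothetical hash: multiply by a large odd constant, keep the upper
 * bits, then reduce to a bucket index.  Swap in the hash under test. */
static unsigned my_hash(uint32_t key)
{
    uint32_t mixed = key * 2654435761u;  /* unsigned math wraps mod 2^32 */
    return (unsigned)((mixed >> 16) % NBUCKETS);
}

/* Hash the keys 0..nkeys-1, count how many land in each bucket, and
 * report the sizes of the smallest and largest bucket. */
static void bucket_spread(uint32_t nkeys, size_t *min_out, size_t *max_out)
{
    size_t counts[NBUCKETS];
    size_t lo, hi;

    assert(min_out != NULL && max_out != NULL);
    memset(counts, 0, sizeof(counts));

    for (uint32_t k = 0; k < nkeys; k++)
        counts[my_hash(k)]++;

    lo = hi = counts[0];
    for (unsigned b = 1; b < NBUCKETS; b++) {
        if (counts[b] < lo)
            lo = counts[b];
        if (counts[b] > hi)
            hi = counts[b];
    }
    *min_out = lo;
    *max_out = hi;
}
```

Calling bucket_spread(1000000, &min, &max) and comparing min and max against the average of 10,000 items per bucket shows how far the grouping is from even.  Real keys may behave very differently from sequential integers, so the test data should resemble the actual use case.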

2) The kernel is not a magic place.  Sure, there are issues like locking and interrupts that make the kernel more complex than user space, but for data algorithms the only real difference is that the quality of the code is almost universally excellent.  It is excellent because it has been open for 20+ years and some great developers have worked on it during that time.  Poorly written code in any of the core areas was eradicated long ago.

You can take that excellent code into your user space app and test it to your heart's content.  Not only can you do that, for something like a linked list evaluation, you should do that.  You have implied "tested" code is code that compiles.  If a developer wanted to replace the hashed linked list implementation, it would be expected that they had done significant testing of the new code in user space with highly varied loads, to show which loads it handles well and which it handles less well.  They would then need to analyze the existing kernel data structures that use hashed linked lists and prove that the new method is an improvement for the actual kernel use cases.  It would be months of work, but that is what it takes if you actually want to improve the kernel in a meaningful way.
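
As a starting point for that kind of user space experiment, here is a minimal sketch of the hashed linked list from my earlier mail.  It is not the kernel's implementation: struct node, struct hashed_list, hashed_add, hashed_contains and the trivial modulo my_hash are all names and choices made up for illustration.  The lookup reports how many nodes it visited, so a flat list and a hashed one can be compared directly.

```c
#include <assert.h>
#include <stdlib.h>

#define NBUCKETS 100u  /* 100 separate lists, as in the example */

struct node {
    int value;
    struct node *next;
};

struct hashed_list {
    struct node *bucket[NBUCKETS];  /* one list head per hash value */
};

/* Hypothetical hash: plain modulo, good enough for sequential ints. */
static unsigned my_hash(int item)
{
    return (unsigned)item % NBUCKETS;
}

/* Push item onto the front of a list; 0 on success, -1 if malloc fails. */
static int add_item(struct node **head, int item)
{
    struct node *n = malloc(sizeof(*n));

    if (n == NULL)
        return -1;
    n->value = item;
    n->next = *head;
    *head = n;
    return 0;
}

/* Linear search of one list; *visited gets the number of nodes examined. */
static int is_in_list(const struct node *head, int item, size_t *visited)
{
    size_t steps = 0;

    assert(visited != NULL);
    for (; head != NULL; head = head->next) {
        steps++;
        if (head->value == item) {
            *visited = steps;
            return 1;
        }
    }
    *visited = steps;
    return 0;
}

/* Hash first, then do a normal insertion into the selected list. */
static int hashed_add(struct hashed_list *h, int item)
{
    return add_item(&h->bucket[my_hash(item)], item);
}

/* Hash first, then search only the selected (shorter) list. */
static int hashed_contains(const struct hashed_list *h, int item,
                           size_t *visited)
{
    return is_in_list(h->bucket[my_hash(item)], item, visited);
}

/* Release every node of one list. */
static void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

With the values 0..9999 inserted into both a flat list and a hashed list, a search for a missing value such as 10000 visits all 10,000 nodes of the flat list but only the 100 nodes of a single bucket.  Whether that saving outweighs the cost of hashing is exactly what has to be measured for the real data.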

Greg
