KE
  • dotNet Web 3.0
  • Engineering Management
    • Process Planning (SDLC)
      • Software development process
      • Basics of SDLC models
      • Scrum
      • Kanban
      • Scrum vs Kanban: applicability
      • Scrumban
    • Estimation
      • Scope Concept
      • Estimates, Targets, and Commitments
      • Overestimate vs Underestimate
      • Decomposition and Recomposition
      • Analogy-based estimations
      • Estimating in Agile
  • Requirements
    • Software Requirements Engineering
      • Requirement definition
      • Levels of Requirements
      • Most common requirements risks
      • Characteristics of Excellent Requirements
      • Benefits from a High-Quality Requirements Process
      • Root Causes of Project Success and Failure
  • Design
    • OOD
      • Abstraction
      • Encapsulation
      • Inheritance vs Aggregation
      • Modularity
      • Polymorphism
      • Abstraction Qualities (cohesion, coupling, etc)
      • Types vs. Classes
      • Separation of concerns principle
      • SOLID
      • Design Patterns
        • Structural patterns
        • Creational patterns
        • Behavioral patterns
      • Most often used design patterns
      • Software Architecture Patterns (structure, pros & cons)
      • Inversion of Control Containers and the Dependency Injection pattern
      • Domain-Driven Design patterns
      • Anti-patterns
    • DB Design
      • Relational Terminology: Entities
      • Relational terminology: Attributes
      • Relational terminology: Records (Tuples)
      • Relationships (One-to-One, One-to-Many)
      • Understanding ER notation
      • Understanding normalization concept
      • Data Integrity
    • Modeling
      • UML: Basic Diagram Types
      • UML: Use Case Diagram (Essentials)
      • UML: Class Diagram (Essentials)
      • Entity Relationship Diagrams
      • Data Flow Diagrams
    • Security
      • Information security concepts
      • Access Control Lists (ACLs)
      • Access Control Models
      • .NET Cryptography Model
      • ASP.NET Identity
      • OWASP Top 10
      • Cross-Site Request Forgery (XSRF)
      • Protecting against cross-site scripting attacks (XSS)
      • Protecting against buffer overrun attacks
      • Protecting against SQL-injection attacks
      • CSRF/XSRF protection
    • Algorithms
      • Algorithms complexity (understanding, big O notation, complexity of common algorithms)
      • Array sorting methods (bubble sort, quick sort, merge sort)
      • Tree structure (construction, traversal)
      • Binary search algorithm
      • Hash table (creating, collisions)
      • Stack, queue, linked list (construction, understanding, usage)
  • Construction Core
    • Programming language
      • Declare namespaces, classes, interfaces, static and instance class members
      • Types casting
      • Value and reference types. Class vs Struct usage.
      • Properties and automatic properties
      • Structured Exception Handling, Exception filters
      • Collections and Generics
      • Dictionaries. Comparison of Dictionaries
      • Building enumerable types
      • Building cloneable objects
      • Building comparable types
      • Nullable types
      • Delegates, events and lambdas
      • Indexers and operator overloading
      • Anonymous types
      • Extension methods. Practices.
      • Custom Type Conversions (implicit/explicit keywords)
      • Strings and StringBuilder. String concatenation practices. String Interpolation
      • Serialization
      • System.IO namespace
      • LINQ to Objects
      • General Coding conventions for C#
      • Static Using Statement
      • Type Reflection
      • Custom attributes
      • Dispose and Finalizable patterns
      • Garbage collection
      • .Net Diagnostics
      • Implementing logging
      • Exception handling guidelines
      • Regular Expressions
      • Span<T> struct
      • C# - What's new?
      • .NET Standard overview
    • Concurrency
      • Understand differences between Concurrency vs Multi-threading vs Asynchronous
      • Concurrency: An Overview
      • Async basics
      • Task Parallelism
      • Basic Synchronization in C#
      • Deadlock problem
      • QueueBackgroundWorkItem or IHostedService for .NET Core
      • How to run Background Tasks in ASP.NET
    • Refactoring
      • Refactoring Concept (what/when/why)
      • Smells Catalog and possible re-factorings
      • Moving Features Between Objects (basic)
      • Organizing Data (basic)
      • Composing Methods (basic)
      • Simplifying Conditional Expressions (basic)
      • Making Method Calls Simpler
      • Dealing with Generalization
    • Product deploying, software installation
      • Create, configure, and publish a web package (.NET Web Profile)
      • Publishing Web Services
      • Manage packages by using NuGet, NPM and Bower
    • Networking
      • Understanding networks: layers and protocols
      • Basic understanding of TCP/IP model and protocols
      • Defining internet, intranet and VPN
      • Basics of Firewalls and DMZ
      • Application layer protocols basics (HTTP, FTP, Telnet)
      • Understanding HTTP and WWW
      • Basic troubleshooting tools (ICMP, ping, traceroute)
      • Client/Server model
      • Sockets, IP and port addressing
      • Using proxy server
      • File transfer services: FTP, TFTP
      • Name resolution services: DNS, whois
      • Remote access services: Telnet, SSH, rdesktop, VNC
      • The basic difference between HTTP and HTTPS protocols
  • Construction Web
    • Web server applications
      • ASP.NET Core
        • Application startup
        • Middleware
        • Working with Static Files
        • Routing
        • Error Handling
        • Globalization and localization
        • Configuration
        • Logging
        • File Providers
        • Dependency Injection
        • Working with Multiple Environments
        • Hosting
        • Managing Application State
        • Request Features
      • ASP.NET Core MVC
        • MVC basics (Model, View, Controller, DI)
        • Model binding and validation
        • View (Razor compilation, Layout, Tag Helpers, Partial Views, DI, View components)
        • Controllers (Route to actions, File uploads)
      • Security and Identity (concepts understanding)
        • Authentication
        • Using identity
        • Authorization with roles
      • Bundle and Minify assets
      • Develop ASP.NET Core MVC apps
      • Advanced topics for ASP.NET Core MVC
        • Application model
        • Filters
        • Areas
        • Application Parts
        • Custom Model Building
        • IActionConstraint
      • Host and deploy ASP.NET Core
      • Migrate from ASP.NET to ASP.NET Core
      • Troubleshoot ASP.NET Core projects
      • Open Web Interface for .NET (OWIN)
      • Web server implementations in ASP.NET Core
    • Web Services
      • REST
      • ASP.NET Web API
        • Routing
        • Configuration
        • Basic error handling
      • Web API-based services
      • Web API Security
      • Token based security
      • SingalR
      • Serialization Frameworks
      • Implement caching
      • gRPC on ASP.NET Core
      • API versioning
      • API documentation
    • Microservices and Cloud
      • Microservices architecture
      • Dockerize a .NET Core application
      • Development workflow for Docker apps
    • JavaScript, HTML, CSS
      • JavaScript: Variables
      • JavaScript: Data types and types conversion
      • JavaScript: Operators
      • JavaScript: Control and Loop constructions
      • JavaScript: Functions, Execution Context and Variables scopes
      • JavaScript: Arrays
      • JavaScript: JS in WebBrowser and basic DOM manipulations
      • HTML: Basic elements
      • CSS: Simple Style rules
      • CSS: selectors
      • Box model
      • HTML: Standards and Browser compatibility
      • HTML: Page Layouts with divs
      • HTML: Frames
      • CSS: Elements positioning and layering
      • CSS: Tables properties
      • CSS: Flexbox
      • Different storage
      • JavaScript: Event Understanding (propagation, capturing, attach/detach)
      • JavaScript: Closure
      • AJAX/JSON
      • Ecma script 6: OOP
      • Promise
      • Strict mode of javascript
    • JavaScript Frameworks
      • Selecting elements
      • Operating on collection
      • Manipulating with elements, working with properties, attributes and data
      • Events
      • animation and effects
      • utilities and Ajax
      • SPA (SINGLE PAGE APPLICATIONS)
      • EcmaScript 6
      • UI frameworks basics:
      • NPM basics:
      • React basics
  • Construction DB
    • SQL
      • Tables, relationships, keys, constraints understanding
      • DDL, DML, DCL understanding
      • SQL data types
      • SQL operators, functions
      • Data manipulation (insert, update, delete)
      • Retrieving data (simple select statement)
      • Joins understanding
      • Creating, modifying, removing database objects
      • Aggregations (ORDER BY, GROUP BY, HAVING, SUM, COUNT, AVG, etc)
      • Combining the results of multiple queries (UNION, EXCEPT, INTERSECT, MINUS, subqueries)
      • Sessions, transactions, locks
      • Isolation levels understanding
      • Implementing stored procedures, user-defined functions, triggers
      • Cursors
    • Data Access Layer
      • Manage connection strings and objects
      • Working with data providers
      • Connect to a data source by using a generic data access interface
      • Handle and diagnose database connection exceptions
      • Manage exceptions when selecting, modifying data
      • Build command objects and query data from data sources
      • Retrieve data source by using the DataReader
      • Manage data by using the DataAdapter and TableAdapter
      • Updating data
      • Entity Framework
        • Query data sources by using EF
        • Code First to existing DB
        • Entity Data Modeling Fundamentals
        • Querying Data
        • Data modification
  • Verification
    • Code Quality
      • MSDN: Guidelines for Names
      • SDO Best Practices Catalog - Coding Standards
      • SDO Best Practices Catalog - Code Review Process
      • SDO Best Practices Catalog - Automatic Code Inspection
      • Automated coding standards enforcement (StyleCop, Resharper)
      • Code Reviews and Toolset
      • Use Work Items (TODO, BUG etc.)
      • Preemptive Error Detection
      • Desirable characteristics of a design (minimal complexity, ease of maintenance, minimal connectednes
      • Creating high quality classes
      • Creating high quality methods
      • Guidelines for initializing variables
      • Exceptions and error handling techniques
      • Best practices of working with data types
      • Code commenting practices
    • Automated Testing (principles, patterns, and practices)
      • Software testing basic concepts
      • Software testing concept
      • Test Case
      • Test Suite
      • Test Plan
      • Testing Levels
      • Naming standards for unit tests
      • Types of test doubles (Stub, Mock, Spy, Fake, Dummy)
      • Basic coverage criteria
      • Testing concepts (Unit vs Functional vs Integration)
      • Goals of Unit Testing, What Makes a Test Valuable?
      • Styles of Unit Testing (Output / State / Collaboration)
      • Good unit test properties
      • F.I.R.S.T Principles of unit testing
      • Test Pyramid concept
      • Testing Pyramid, Agile Testing Pyramid, Diamond
      • Breaking the dependency, Interaction testing
      • Strategies for isolating the database in tests
      • Test smells and how to avoid
      • Test Organization patterns
      • Fixture setup patterns
      • Test double patterns
      • Feature-driven development (FDD)
      • Behavior-driven development (BDD)
      • Test-driven development (TDD)
      • Acceptance testing, Acceptance Test Driven Development (ATDD)
      • Continuous testing
    • Automated Testing (Frameworks, Tools, Libraries)
      • .NET unit test frameworks overview
      • .NET Mocking Frameworks, a comparison
      • xUnit
        • Primary test framework attributes
        • Asserts
        • Exception Handling in Unit Tests
        • Skipping Tests
        • Initialization and Cleanup (Assembly, Class, Test)
        • Data-driven Tests
      • NSubstitute
        • Mocking Method Calls (Using Mock Object, Return Values, Argument Matching)
        • Behavior Verification (Method Was/Not Called, a Specific Number of Times, Getter/Setter Was Called)
        • Throwing exceptions
        • Raising Events from Mock Objects
        • Returning Different Results for Sequential Calls
      • AutoFixture
      • EF Core InMemory test
      • Integration tests in ASP.NET Core
      • Isolating database data in integration tests
      • Test ASP.NET Core MVC apps
  • Configuration Management
    • Product builds and Continuous Integration
      • Automated build concept
      • Dotnet cli
      • CI/CD Basic concepts
    • Managing Versions
      • Fundamental concepts: revisions, working copy, repository, branch, baseline, trunk
      • Versioning Models
      • Distributed Version Control basics
      • Distributed systems advantages and weak sides
      • VCS Management life-cycle on (one of) major tools (clone, commit, update, revert, merge, resolve, et
      • Branching/Merging strategies
      • Blaming (annotate)
      • Revision graph/log actions (Git)
      • Integrating with Issue Tracking Systems
      • Source control Best Practices
Powered by GitBook
On this page
  1. Design
  2. Algorithms

Hash table (creating, collisions)

Hash Table is a data structure which stores data in an associative manner. In a hash table, data is stored in an array format, where each data value has its own unique index value. Access of data becomes very fast if we know the index of the desired data.

Thus, it becomes a data structure in which insertion and search operations are very fast irrespective of the size of the data. Hash Table uses an array as a storage medium and uses hash technique to generate an index where an element is to be inserted or is to be located from.

Hashing is a technique that is used to uniquely identify a specific object from a group of similar objects. Some examples of how hashing is used in our lives include:

  • In universities, each student is assigned a unique roll number that can be used to retrieve information about them.

  • In libraries, each book is assigned a unique number that can be used to determine information about the book, such as its exact position in the library or the users it has been issued to etc.

In both these examples the students and books were hashed to a unique number.

Assume that you have an object and you want to assign a key to it to make searching easy. To store the key/value pair, you can use a simple array like a data structure where keys (integers) can be used directly as an index to store values. However, in cases where the keys are large and cannot be used directly as an index, you should use hashing.

In hashing, large keys are converted into small keys by using hash functions. The values are then stored in a data structure called hash table. The idea of hashing is to distribute entries (key/value pairs) uniformly across an array. Each element is assigned a key (converted key). By using that key you can access the element in O(1) time. Using the key, the algorithm (hash function) computes an index that suggests where an entry can be found or inserted.

Hashing is implemented in two steps:

  • An element is converted into an integer by using a hash function. This element can be used as an index to store the original element, which falls into the hash table.

  • The element is stored in the hash table where it can be quickly retrieved using hashed key.

    hash = hashfunc(key)
    index = hash % array_size

In this method, the hash is independent of the array size and it is then reduced to an index (a number between 0 and array_size − 1) by using the modulo operator (%).

Hash function

A hash function is any function that can be used to map a data set of an arbitrary size to a data set of a fixed size, which falls into the hash table. The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.

To achieve a good hashing mechanism, It is important to have a good hash function with the following basic requirements:

  • Easy to compute: It should be easy to compute and must not become an algorithm in itself.

  • Uniform distribution: It should provide a uniform distribution across the hash table and should not result in clustering.

  • Less collisions: Collisions occur when pairs of elements are mapped to the same hash value. These should be avoided.

Irrespective of how good a hash function is, collisions are bound to occur. Therefore, to maintain the performance of a hash table, it is important to manage collisions through various collision resolution techniques.

Hash table A hash table is a data structure that is used to store keys/value pairs. It uses a hash function to compute an index into an array in which an element will be inserted or searched. By using a good hash function, hashing can work well. Under reasonable assumptions, the average time required to search for an element in a hash table is O(1).

Collisions

What is Collision?

Since a hash function gets us a small number for a key which is a big integer or string, there is a possibility that two keys result in the same value. The situation where a newly inserted key maps to an already occupied slot in the hash table is called collision and must be handled using some collision handling technique.

What are the chances of collisions with large table?

Collisions are very likely even if we have big table to store keys. An important observation is Birthday Paradox. With only 23 persons, the probability that two people have the same birthday is 50%.

How to handle Collisions?

There are mainly two methods to handle collision:

  1. Separate Chaining

  2. Open Addressing

Separate Chaining

The idea is to make each cell of hash table point to a linked list of records that have same hash function value.

Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92, 73, 101.

  • Advantages

    1. Simple to implement.

    2. Hash table never fills up, we can always add more elements to the chain.

    3. Less sensitive to the hash function or load factors.

    4. It is mostly used when it is unknown how many and how frequently keys may be inserted or deleted.

  • Disadvantages

    1. Cache performance of chaining is not good as keys are stored using a linked list. Open addressing provides better cache performance as everything is stored in the same table.

    2. Wastage of Space (Some Parts of hash table are never used)

    3. If the chain becomes long, then search time can become O(n) in the worst case.

    4. Uses extra space for links.

As you might be knowing that hash table data structure works on key value pairing. The idea behind using of Hash table is it would work with O(1) time complexity for insertion, deletion and search operations in Hash Table for any given value. O(1) time complexity can be achieved if we have a key for each value. Calculation of key for a given value is done by hash functions. eg.- If we want to do some operations on characters used in a string we can use ASCII value of characters as key. This way each character (value) would be mapped to different ascii (key). Suppose if you also want to keep track of number of occurrences of characters or you want to store repeated characters also in hash table than collision will occur. Because repeated characters will collide with already stored characters in has table for the keys. In this example it may not look a big deal as characters are identical but in some other cases it would affect the desired result of O(1). In worst case operations can cost O(n). So collisions occur when more than one value link to single key in hash table. We can add values in linked list for that key in such cases.

To avoid the collision hash function should be properly picked. It can also depend on the size of hash table. If you have wide range of inputs its quite possible that for a particular set of input you will always get worst case O(n) time complexity. To overcome that we can have a set of hash functions and we can randomly pic a hash function on execution. This would avoid the situations where a given input always leads to worst case performance. There are some other techniques also like Open Addressing. You can refer Cormen Book to know more about it.

PreviousBinary search algorithmNextStack, queue, linked list (construction, understanding, usage)

Last updated 5 years ago

hashChaining