C语言如何实现HashMap？-C语言-盈思创科技

核心概念：理解哈希表的基本原理。
数据结构设计：定义 C 语言中的哈希表结构。
核心函数实现：包括初始化、哈希函数、插入、查找、删除等。
完整代码示例：提供一个可以直接运行的 C 语言代码。
优化与进阶：讨论如何改进这个基础实现。

核心概念

哈希表是一种通过哈希函数将键映射到存储桶（数组索引）来实现快速数据访问的数据结构，它的核心思想是利用数组 O(1) 时间复杂度的随机访问特性。

（图片来源网络，侵删）

主要组成部分：

数组：作为哈希表的底层存储，我们称之为“桶”或“槽”（Bucket/Slot）。
键：用于查找和唯一标识元素的值（字符串 "name"）。
值：与键关联的数据（字符串 "Alice"）。
哈希函数：将任意类型的键转换成数组索引（一个整数）的函数，一个好的哈希函数应尽量减少哈希冲突。
冲突解决：当两个不同的键通过哈希函数计算出相同的索引时，就会发生冲突，常见的解决方法有：
- 链地址法：每个桶不直接存储元素，而是存储一个链表（或动态数组），所有映射到同一个桶的元素都挂在这个链表上,这是最常用和最易于实现的方法。
- 开放地址法：当发生冲突时，寻找下一个“空”的桶来存储元素（线性探测、二次探测等）。

我们将采用链地址法来实现，因为它在 C 语言中实现起来更直观。

数据结构设计

我们需要定义两个主要的结构体：HashMapEntry 和 HashMap。

HashMapEntry：表示哈希表中的一个条目，它包含键、值，以及一个指向下一个条目的指针（用于链表）。
HashMap：表示整个哈希表，它包含一个条目指针数组、桶的数量和当前元素的数量。

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
// 定义一个键值对条目
typedef struct HashMapEntry {
    char* key;             // 键，这里我们使用字符串作为键
    char* value;           // 值
    struct HashMapEntry* next; // 指向下一个条目的指针，用于解决冲突
} HashMapEntry;
// 定义哈希表
typedef struct {
    int capacity;          // 桶的数量
    int size;              // 当前存储的元素数量
    HashMapEntry** buckets; // 桶数组，每个桶是一个指向条目链表头节点的指针
} HashMap;

核心函数实现

1 初始化哈希表

我们需要为哈希表分配内存，并初始化所有桶为 NULL。

（图片来源网络，侵删）

// 创建一个新的哈希表
HashMap* hashmap_create(int capacity) {
    HashMap* map = (HashMap*)malloc(sizeof(HashMap));
    if (!map) return NULL;
    map->capacity = capacity;
    map->size = 0;
    // 为桶数组分配内存
    map->buckets = (HashMapEntry**)calloc(capacity, sizeof(HashMapEntry*));
    if (!map->buckets) {
        free(map);
        return NULL;
    }
    return map;
}

2 哈希函数

哈希函数是哈希表的灵魂，对于字符串键，一个简单但有效的哈希函数是djb2算法。

// 一个简单的字符串哈希函数 (djb2算法)
unsigned long hash(const char* str) {
    unsigned long hash = 5381;
    int c;
    while ((c = *str++)) {
        hash = ((hash << 5) + hash) + c; // hash * 33 + c
    }
    return hash;
}

3 插入/更新元素

插入操作是哈希表最核心的操作之一。

计算键的哈希值，并对桶的数量取模,得到桶的索引。
检查该桶的链表中是否已存在相同的键。
- 如果存在,则更新对应的值。
- 如果不存在，则创建一个新条目，并将其插入到链表的头部（或尾部）。

// 向哈希表中插入或更新一个键值对
void hashmap_put(HashMap* map, const char* key, const char* value) {
    // 1. 计算桶的索引
    unsigned long index = hash(key) % map->capacity;
    HashMapEntry* entry = map->buckets[index];
    HashMapEntry* prev = NULL;
    // 2. 遍历链表，查找是否已存在该键
    while (entry != NULL) {
        if (strcmp(entry->key, key) == 0) {
            // 键已存在，更新值
            free(entry->value); // 释放旧的值
            entry->value = strdup(value); // 复制新的值
            return; // 更新完成，直接返回
        }
        prev = entry;
        entry = entry->next;
    }
    // 3. 键不存在，创建新条目
    HashMapEntry* new_entry = (HashMapEntry*)malloc(sizeof(HashMapEntry));
    if (!new_entry) return; // 内存分配失败
    new_entry->key = strdup(key);
    new_entry->value = strdup(value);
    new_entry->next = NULL;
    // 4. 将新条目插入到链表头部
    if (prev == NULL) {
        // 如果桶为空，新条目成为链表头
        map->buckets[index] = new_entry;
    } else {
        // 否则，插入到链表末尾
        prev->next = new_entry;
    }
    map->size++;
}

4 查找元素

计算键的哈希值,得到桶的索引。
遍历该桶的链表,查找匹配的键。
如果找到，返回对应的值；否则返回 NULL。

// 根据键查找值
char* hashmap_get(HashMap* map, const char* key) {
    unsigned long index = hash(key) % map->capacity;
    HashMapEntry* entry = map->buckets[index];
    while (entry != NULL) {
        if (strcmp(entry->key, key) == 0) {
            return entry->value; // 找到，返回值
        }
        entry = entry->next;
    }
    return NULL; // 未找到
}

5 删除元素

计算键的哈希值,得到桶的索引。
遍历链表,找到要删除的条目及其前一个条目。
如果找到，则调整指针以“绕过”该条目,并释放其内存。

// 从哈希表中删除一个键值对
bool hashmap_remove(HashMap* map, const char* key) {
    unsigned long index = hash(key) % map->capacity;
    HashMapEntry* entry = map->buckets[index];
    HashMapEntry* prev = NULL;
    while (entry != NULL) {
        if (strcmp(entry->key, key) == 0) {
            // 找到了要删除的条目
            if (prev == NULL) {
                // 删除的是链表头节点
                map->buckets[index] = entry->next;
            } else {
                // 删除的是中间或尾节点
                prev->next = entry->next;
            }
            free(entry->key);
            free(entry->value);
            free(entry);
            map->size--;
            return true; // 删除成功
        }
        prev = entry;
        entry = entry->next;
    }
    return false; // 未找到该键
}

6 释放哈希表内存

遍历所有桶，释放每个链表的所有节点,最后释放桶数组和哈希表本身。

// 释放哈希表的所有内存
void hashmap_free(HashMap* map) {
    for (int i = 0; i < map->capacity; i++) {
        HashMapEntry* entry = map->buckets[i];
        while (entry != NULL) {
            HashMapEntry* temp = entry;
            entry = entry->next;
            free(temp->key);
            free(temp->value);
            free(temp);
        }
    }
    free(map->buckets);
    free(map);
}

完整代码示例

下面是一个完整的、可运行的 C 程序,演示了如何使用上面实现的哈希表。

（图片来源网络，侵删）

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
// --- 结构体定义 ---
typedef struct HashMapEntry {
    char* key;
    char* value;
    struct HashMapEntry* next;
} HashMapEntry;
typedef struct {
    int capacity;
    int size;
    HashMapEntry** buckets;
} HashMap;
// --- 函数声明 ---
HashMap* hashmap_create(int capacity);
unsigned long hash(const char* str);
void hashmap_put(HashMap* map, const char* key, const char* value);
char* hashmap_get(HashMap* map, const char* key);
bool hashmap_remove(HashMap* map, const char* key);
void hashmap_free(HashMap* map);
void hashmap_print(HashMap* map);
// --- 主函数 ---
int main() {
    // 创建一个容量为 10 的哈希表
    HashMap* map = hashmap_create(10);
    if (!map) {
        printf("Failed to create hashmap.\n");
        return 1;
    }
    // 插入一些数据
    hashmap_put(map, "name", "Alice");
    hashmap_put(map, "age", "30");
    hashmap_put(map, "city", "New York");
    hashmap_put(map, "job", "Engineer");
    // 更新数据
    hashmap_put(map, "age", "31");
    printf("--- Initial Data ---\n");
    hashmap_print(map);
    // 查找数据
    printf("\n--- Lookup ---\n");
    printf("Name: %s\n", hashmap_get(map, "name"));
    printf("Age: %s\n", hashmap_get(map, "age"));
    printf("Country: %s\n", hashmap_get(map, "country")); // 应该输出 (null)
    // 删除数据
    printf("\n--- Removing 'city' ---\n");
    if (hashmap_remove(map, "city")) {
        printf("Removed 'city' successfully.\n");
    } else {
        printf("Failed to remove 'city'.\n");
    }
    printf("\n--- Final Data ---\n");
    hashmap_print(map);
    // 释放内存
    hashmap_free(map);
    return 0;
}
// --- 函数实现 (已在上面详细说明) ---
HashMap* hashmap_create(int capacity) {
    HashMap* map = (HashMap*)malloc(sizeof(HashMap));
    if (!map) return NULL;
    map->capacity = capacity;
    map->size = 0;
    map->buckets = (HashMapEntry**)calloc(capacity, sizeof(HashMapEntry*));
    if (!map->buckets) {
        free(map);
        return NULL;
    }
    return map;
}
unsigned long hash(const char* str) {
    unsigned long hash = 5381;
    int c;
    while ((c = *str++)) {
        hash = ((hash << 5) + hash) + c;
    }
    return hash;
}
void hashmap_put(HashMap* map, const char* key, const char* value) {
    unsigned long index = hash(key) % map->capacity;
    HashMapEntry* entry = map->buckets[index];
    HashMapEntry* prev = NULL;
    while (entry != NULL) {
        if (strcmp(entry->key, key) == 0) {
            free(entry->value);
            entry->value = strdup(value);
            return;
        }
        prev = entry;
        entry = entry->next;
    }
    HashMapEntry* new_entry = (HashMapEntry*)malloc(sizeof(HashMapEntry));
    if (!new_entry) return;
    new_entry->key = strdup(key);
    new_entry->value = strdup(value);
    new_entry->next = NULL;
    if (prev == NULL) {
        map->buckets[index] = new_entry;
    } else {
        prev->next = new_entry;
    }
    map->size++;
}
char* hashmap_get(HashMap* map, const char* key) {
    unsigned long index = hash(key) % map->capacity;
    HashMapEntry* entry = map->buckets[index];
    while (entry != NULL) {
        if (strcmp(entry->key, key) == 0) {
            return entry->value;
        }
        entry = entry->next;
    }
    return NULL;
}
bool hashmap_remove(HashMap* map, const char* key) {
    unsigned long index = hash(key) % map->capacity;
    HashMapEntry* entry = map->buckets[index];
    HashMapEntry* prev = NULL;
    while (entry != NULL) {
        if (strcmp(entry->key, key) == 0) {
            if (prev == NULL) {
                map->buckets[index] = entry->next;
            } else {
                prev->next = entry->next;
            }
            free(entry->key);
            free(entry->value);
            free(entry);
            map->size--;
            return true;
        }
        prev = entry;
        entry = entry->next;
    }
    return false;
}
void hashmap_free(HashMap* map) {
    for (int i = 0; i < map->capacity; i++) {
        HashMapEntry* entry = map->buckets[i];
        while (entry != NULL) {
            HashMapEntry* temp = entry;
            entry = entry->next;
            free(temp->key);
            free(temp->value);
            free(temp);
        }
    }
    free(map->buckets);
    free(map);
}
// 辅助函数：打印哈希表内容
void hashmap_print(HashMap* map) {
    printf("HashMap (Capacity: %d, Size: %d):\n", map->capacity, map->size);
    for (int i = 0; i < map->capacity; i++) {
        printf("Bucket %d: ", i);
        HashMapEntry* entry = map->buckets[i];
        if (entry == NULL) {
            printf("NULL");
        } else {
            while (entry != NULL) {
                printf("[%s:%s] -> ", entry->key, entry->value);
                entry = entry->next;
            }
            printf("NULL");
        }
        printf("\n");
    }
}

编译和运行:

gcc hashmap.c -o hashmap
./hashmap

预期输出:

--- Initial Data ---
HashMap (Capacity: 10, Size: 4):
Bucket 0: NULL
Bucket 1: NULL
Bucket 2: NULL
Bucket 3: NULL
Bucket 4: [name:Alice] -> NULL
Bucket 5: NULL
Bucket 6: NULL
Bucket 7: [age:31] -> [city:New York] -> NULL
Bucket 8: [job:Engineer] -> NULL
Bucket 9: NULL
--- Lookup ---
Name: Alice
Age: 31
Country: (null)
--- Removing 'city' ---
Removed 'city' successfully.
--- Final Data ---
HashMap (Capacity: 10, Size: 3):
Bucket 0: NULL
Bucket 1: NULL
Bucket 2: NULL
Bucket 3: NULL
Bucket 4: [name:Alice] -> NULL
Bucket 5: NULL
Bucket 6: NULL
Bucket 7: [age:31] -> NULL
Bucket 8: [job:Engineer] -> NULL
Bucket 9: NULL

优化与进阶

这个基础实现已经涵盖了哈希表的核心功能,但在实际应用中还可以进行很多优化：

动态扩容：
- 问题：当哈希表中的元素越来越多（size 接近 capacity），哈希冲突的概率会急剧上升，导致性能下降（链表变长，查找时间从 O(1) 退化为 O(n)）。
- 解决方案：实现一个负载因子（load factor = size / capacity），当负载因子超过某个阈值（如 0.75）时，创建一个更大的新桶数组（通常是原来的 2 倍），然后将所有旧条目重新哈希（rehash）到新桶中，最后替换掉旧的桶数组,这是哈希表保持高性能的关键。
更通用的键类型：
- 问题：当前实现只支持 char* 作为键。
- 解决方案：可以将 key 定义为 void*，并让用户在创建哈希表时提供一个自定义的比较函数（int (*compare_func)(const void*, const void*)）和一个自定义的哈希函数，这样就能支持整数、结构体等任何类型的数据。
更高效的内存管理：
- 问题：频繁的 malloc 和 free 会带来性能开销。
- 解决方案：可以引入一个内存池，预先分配一大块内存，然后从中分配和回收条目，而不是直接调用系统级的 malloc。
使用开放地址法：

对于某些场景，开放地址法可能比链地址法更节省内存（因为它不需要存储指针），并且在缓存友好性方面可能更好，但实现起来比链地址法更复杂,尤其是在删除元素时。
线程安全：
- 当前实现不是线程安全的，如果多个线程同时读写哈希表，会导致数据竞争和崩溃，可以通过互斥锁（pthread_mutex_t）来保护对哈希表的访问,但这会带来一定的性能开销。

这个 C 语言实现的哈希表是一个非常好的学习项目,它帮助你深入理解了哈希表的工作原理和底层实现细节。

C语言如何实现HashMap？

核心概念

数据结构设计

核心函数实现

1 初始化哈希表

2 哈希函数

3 插入/更新元素

4 查找元素

5 删除元素

6 释放哈希表内存

完整代码示例

优化与进阶

相关文章

目录[+]