C语言如何实现dictionary功能？-C语言-盈思创科技

在 C 语言中，并没有像 Python 或 JavaScript 那样内置的、原生的 Dictionary 或 Map 数据结构，C 语言的核心库只提供了基本的数据类型，如数组、结构体、链表等。

（图片来源网络，侵删）

当我们在 C 语言中需要实现一个“字典”（即键值对 Key-Value Store）时，通常需要我们自己动手实现,或者使用第三方库。

"字典"的核心思想是通过一个唯一的键来快速查找、插入、删除一个对应的值，这种“快速查找”的特性，通常通过哈希表 来实现。

下面，我将从三个层面来解释 C 语言中的字典：

核心思想：为什么需要哈希表？
手动实现：一个简单的哈希表示例
使用第三方库：更高效、更可靠的选择

核心思想：为什么需要哈希表？

想象一下，如果你用数组来存储键值对，struct pair { char *key; int value; } arr[100];，那么查找一个键 key 的唯一方法就是遍历整个数组，直到找到匹配的项，这个操作的时间复杂度是 O(n)，当数据量很大时,效率会非常低。

（图片来源网络，侵删）

哈希表的目标是实现接近 O(1)（常数时间）的查找、插入和删除效率。

哈希表的工作原理：

哈希函数：这是一个将任意大小的“键”映射成一个固定大小整数（通常用作数组索引）的函数，一个好的哈希函数应该能将键均匀地分布到整个数组中,减少冲突。
- 对于字符串键 "apple"，一个简单的哈希函数可能是：遍历每个字符，将它们的 ASCII 码值相加,然后对数组大小取模。
- hash("apple") = ('a' + 'p' + 'p' + 'l' + 'e') % array_size
处理冲突：不同的键可能会通过哈希函数计算出相同的索引（这称为“哈希冲突”），处理冲突是哈希表实现的关键,常见的方法有两种：
- 链地址法：这是最常用的方法，哈希表的底层实现是一个“指针数组”，数组的每个元素都是一个指向链表头节点的指针，当发生冲突时,新的键值对会被添加到对应索引的链表中。
- 开放地址法：当发生冲突时，会按照一定的规则（如线性探测、二次探测）在数组中寻找下一个“空位”来插入元素。

一个典型的 C 语言字典（哈希表）由以下部分组成：

一个结构体 HashTable，它包含：
- 一个指向“桶”数组的指针。
- 当前存储的元素数量。
- 哈希表的大小（桶的数量）。
一个“桶” Bucket，它本身是一个结构体，包含：
- 指向链表头节点的指针。
- （可选）其他元数据。
一个链表节点 Node，它包含：
- 键。
- 值。
- 指向下一个节点的指针。

手动实现：一个简单的哈希表示例

下面我们将使用 链地址法 来实现一个简单的、只能存储 int 值的哈希表，这个例子虽然不完美,但能清晰地展示哈希表的核心构造和操作。

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// 定义链表节点
typedef struct Node {
    char *key;          // 键
    int value;          // 值
    struct Node *next;  // 指向下一个节点的指针
} Node;
// 定义哈希表
typedef struct {
    int size;           // 哈希表的大小（桶的数量）
    Node **buckets;     // 桶数组，每个桶是一个指向链表头节点的指针
} HashTable;
// 哈希函数 (简单的字符串哈希)
unsigned int hash(const char *key, int table_size) {
    unsigned long int hash_value = 0;
    for (int i = 0; key[i] != '\0'; i++) {
        hash_value = (hash_value << 5) + key[i];
    }
    return hash_value % table_size;
}
// 创建一个新的哈希表
HashTable *create_hash_table(int size) {
    HashTable *ht = (HashTable *)malloc(sizeof(HashTable));
    if (ht == NULL) {
        return NULL;
    }
    ht->size = size;
    // 为桶数组分配内存
    ht->buckets = (Node **)calloc(size, sizeof(Node *));
    if (ht->buckets == NULL) {
        free(ht);
        return NULL;
    }
    return ht;
}
// 向哈希表中插入或更新一个键值对
void insert(HashTable *ht, const char *key, int value) {
    unsigned int index = hash(key, ht->size);
    // 1. 检查键是否已存在
    Node *current = ht->buckets[index];
    while (current != NULL) {
        if (strcmp(current->key, key) == 0) {
            // 键已存在，更新值
            current->value = value;
            return;
        }
        current = current->next;
    }
    // 2. 键不存在，创建新节点并插入到链表头部
    Node *new_node = (Node *)malloc(sizeof(Node));
    if (new_node == NULL) {
        fprintf(stderr, "Memory allocation failed\n");
        return;
    }
    new_node->key = strdup(key); // 复制键字符串
    new_node->value = value;
    new_node->next = ht->buckets[index]; // 新节点指向旧的链表头
    ht->buckets[index] = new_node;       // 将新节点设为新的链表头
}
// 根据键查找值
int get(HashTable *ht, const char *key) {
    unsigned int index = hash(key, ht->size);
    Node *current = ht->buckets[index];
    while (current != NULL) {
        if (strcmp(current->key, key) == 0) {
            return current->value; // 找到，返回值
        }
        current = current->next;
    }
    return -1; // 未找到，返回一个特殊值（1）
}
// 从哈希表中删除一个键值对
void delete(HashTable *ht, const char *key) {
    unsigned int index = hash(key, ht->size);
    Node *current = ht->buckets[index];
    Node *prev = NULL;
    while (current != NULL) {
        if (strcmp(current->key, key) == 0) {
            if (prev == NULL) {
                // 要删除的是链表的头节点
                ht->buckets[index] = current->next;
            } else {
                // 要删除的是中间或尾节点
                prev->next = current->next;
            }
            free(current->key); // 释放键的内存
            free(current);      // 释放节点的内存
            return;
        }
        prev = current;
        current = current->next;
    }
}
// 释放哈希表的所有内存
void free_hash_table(HashTable *ht) {
    for (int i = 0; i < ht->size; i++) {
        Node *current = ht->buckets[i];
        while (current != NULL) {
            Node *temp = current;
            current = current->next;
            free(temp->key);
            free(temp);
        }
    }
    free(ht->buckets);
    free(ht);
}
// 测试代码
int main() {
    HashTable *ht = create_hash_table(10); // 创建一个大小为10的哈希表
    insert(ht, "apple", 10);
    insert(ht, "banana", 20);
    insert(ht, "orange", 30);
    insert(ht, "apple", 100); // 更新 "apple" 的值
    printf("Value of 'apple': %d\n", get(ht, "apple"));    // 输出: 100
    printf("Value of 'banana': %d\n", get(ht, "banana"));  // 输出: 20
    printf("Value of 'grape': %d\n", get(ht, "grape"));    // 输出: -1 (未找到)
    delete(ht, "banana");
    printf("Value of 'banana' after deletion: %d\n", get(ht, "banana")); // 输出: -1
    free_hash_table(ht); // 释放所有内存
    return 0;
}

使用第三方库：更高效、更可靠的选择

手动实现一个功能完善、性能优异的哈希表是非常复杂的,需要考虑：

更好的哈希算法（如 djb2, FNV-1a）来减少冲突。
动态扩容：当元素数量超过某个阈值时，自动增大哈希表的大小，并重新哈希所有元素,以保持效率。
内存管理细节。
线程安全（如果需要）。

在实际项目中，强烈建议使用成熟的第三方库，它们经过充分测试，性能更好,功能更全。

Glib (GLib)

Glib 是一个非常流行的 C 语言库，是 GNOME 项目的基础，它提供了一个功能强大的 GHashTable。

优点：

高性能,自动处理扩容。
API 设计优秀,易于使用。
支持多种键和值的数据类型。

示例：

#include <glib.h>
#include <stdio.h>
int main() {
    // 创建一个新的哈希表，键和值都是指针
    GHashTable *hash_table = g_hash_table_new(g_str_hash, g_str_equal);
    // 插入键值对
    g_hash_table_insert(hash_table, "apple", GINT_TO_POINTER(10));
    g_hash_table_insert(hash_table, "banana", GINT_TO_POINTER(20));
    // 查找值
    gpointer value = g_hash_table_lookup(hash_table, "apple");
    if (value) {
        printf("Value of 'apple': %d\n", GPOINTER_TO_INT(value)); // 输出: 10
    }
    // 删除
    g_hash_table_remove(hash_table, "banana");
    // 销毁哈希表（会自动释放所有键和值）
    g_hash_table_destroy(hash_table);
    return 0;
}

编译时需要链接 Glib 库： gcc -o example example.cpkg-config --cflags --libs glib-2.0``

uthash

这是一个非常轻量级的、单文件的 C 语言哈希表库，使用起来极其方便，你只需要将 uthash.h 文件包含到你的项目中即可。

优点：

单文件,易于集成。
语法非常简洁，宏定义使得代码看起来像原生 C 语言。
性能很好。

示例：

#include "uthash.h" // 引入 uthash.h
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// 定义你的结构体
struct my_struct {
    const char *key;      /* 键 */
    int id;               /* 值 */
    UT_hash_handle hh;    /* uthash 需要的句柄 */
};
int main() {
    struct my_struct *users = NULL; // 哈希表必须初始化为 NULL
    // 添加元素
    struct my_struct *s;
    s = (struct my_struct *)malloc(sizeof *s);
    s->key = "alice";
    s->id = 1;
    HASH_ADD_KEYPTR(hh, users, s->key, strlen(s->key), s);
    s = (struct my_struct *)malloc(sizeof *s);
    s->key = "bob";
    s->id = 2;
    HASH_ADD_KEYPTR(hh, users, s->key, strlen(s->key), s);
    // 查找元素
    struct my_struct *found = NULL;
    HASH_FIND_STR(users, "alice", found);
    if (found) {
        printf("Found user: %s, id: %d\n", found->key, found->id); // 输出: Found user: alice, id: 1
    }
    // 删除元素
    HASH_DEL(users, found);
    free(found);
    // 遍历哈希表
    for (struct my_struct *tmp; users != NULL; users = (struct my_struct *)(users->hh.next)) {
        tmp = users;
        printf("User: %s, id: %d\n", tmp->key, tmp->id);
    }
    return 0;
}

编译： gcc -o example example.c

方法	优点	缺点	适用场景
手动实现	深入理解哈希表原理	代码复杂，容易出错，性能可能不佳，功能不完善	学习数据结构，或项目有极特殊需求且无法使用第三方库
Glib	功能强大，性能优异，API友好	增加外部依赖	大型项目，特别是 GNOME 生态相关的应用
uthash	轻量级，单文件，易于集成，语法简洁	功能相对 Glib 简单	中小型项目，嵌入式系统，或需要快速添加哈希表功能的场景

对于绝大多数 C 语言开发者来说，推荐直接使用 uthash 或 Glib，它们能让你从繁琐的实现细节中解放出来,专注于业务逻辑本身。

C语言如何实现dictionary功能？

核心思想：为什么需要哈希表？

手动实现：一个简单的哈希表示例

使用第三方库：更高效、更可靠的选择

Glib (GLib)

uthash

相关文章

目录[+]