[Unfinished] #Creation Plan# String Hashing (Basics)

cjdstttttt

2025-10-27 23:29:32

Posted from: Guangdong

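Hmm.

String hashing is incredibly handy, you know.

Some basic string hashing techniques stop being adequate in more advanced string algorithms (for example, the concatenation shown here only returns a hash value, which is not enough for advanced algorithms), so this Creation Plan is split into a "Basics" part and an "Advanced" part.

If you already know basic string hashing very well, head over to the Advanced part.

String hashing is a simple, widely applicable string technique that can handle a large share of string problems.

First we need to know how string hashing works.

The idea is simply to compress a string into a single number using multiplication. That number is called the hash value.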

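For example, with $k$ as the multiplier, a string $S$ of length $5$ is compressed into $h(S)=S_1\times k^4+S_2\times k^3+S_3\times k^2+S_4\times k^1+S_5\times k^0$ .

From this we get a recurrence for the hash value of every prefix of $S$ : $h([S_1,S_2,...,S_i])=k\times h([S_1,S_2,...,S_{i-1}])+S_i$ , where the empty prefix has hash value $0$ .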
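To see that the recurrence really matches the definition, here is a minimal sketch that computes a prefix hash both ways. The names `direct_hash`, `prefix_hashes`, and `K` are made up for this illustration, the string is 0-based (unlike the 1-based listings in this post), and characters are read as `unsigned char`, which agrees with plain `char` for ASCII input.

```cpp
#include <string>
#include <vector>

typedef unsigned long long ull;
const ull K = 131; // multiplier k (the value this post uses in its listings)

// Hash of the first len characters, straight from the definition:
// the sum of S_i * k^(len - i) for i = 1..len.
ull direct_hash(const std::string &s, int len) {
    ull sum = 0;
    for (int i = 1; i <= len; i++) {
        ull power = 1;
        for (int j = 0; j < len - i; j++) power *= K; // k^(len - i)
        sum += (ull)(unsigned char)s[i - 1] * power;  // S_i is s[i - 1] in 0-based storage
    }
    return sum;
}

// All prefix hashes via h(prefix i) = k * h(prefix i-1) + S_i.
// result[i] is the hash of the first i characters, result[0] = 0.
std::vector<ull> prefix_hashes(const std::string &s) {
    std::vector<ull> h(s.size() + 1, 0);
    for (size_t i = 1; i <= s.size(); i++)
        h[i] = h[i - 1] * K + (unsigned char)s[i - 1];
    return h;
}
```

For the string "ab" with $k=131$ both routes give $97\times 131+98=12805$ . The direct version is quadratic and only there for comparison; the recurrence is what you would use in practice.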

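In this way every string gets a corresponding hash value, and as long as $k$ is larger than every character code and nothing overflows, different strings get different hash values.

But the value grows exponentially with the length, and before long even unsigned long long cannot hold it. What do we do?

We can take the hash value modulo a large number. Two different strings may then end up with the same hash value, but the error rate is tiny: with modulus $p$ , two different random strings collide with probability about $\frac{1}{p}$ . (For single-modulus hashing $p$ must not be too small, because by the birthday paradox about $\sqrt p$ random strings are already very likely to contain a colliding pair.) In particular, you can let unsigned long long overflow naturally, which can be viewed as using the modulus $p=2^{64}$ .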
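The listings in this post all use natural overflow, so as a complement here is a rough sketch of the explicit-modulus variant described above. The modulus `P = 1000000007` and the names `mod_hash`, `K`, `P` are assumptions picked for illustration, not values taken from the post.

```cpp
#include <string>

typedef unsigned long long ull;
const ull K = 131;        // multiplier k
const ull P = 1000000007; // assumed prime modulus p, chosen only for this example

// Hash of a 0-based std::string, reduced mod P after every step.
// The running value stays below P < 2^30, so ans * K + c cannot overflow 64 bits.
ull mod_hash(const std::string &s) {
    ull ans = 0;
    for (unsigned char c : s)
        ans = (ans * K + c) % P;
    return ans;
}
```

With a modulus of this size, the birthday bound mentioned above kicks in at roughly $\sqrt p\approx 3\times 10^4$ strings, which is why a single modulus near $10^9$ is usually considered too small when many hashes are compared against each other.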

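The code for hashing a string looks like this (natural overflow, multiplier $k=131$ ):

#define ull unsigned long long
const ull mul = 131; // multiplier k
// s is 1-indexed: s[0] is a padding character and s[1..n] holds the string.
ull get_hash(const string &s, int n){
	ull ans = 0;
	for(int i = 1; i <= n; i++){
		ans = ans * mul + s[i]; // unsigned overflow wraps around, i.e. mod 2^64
	}
	return ans;
}

Time complexity: $O(n)$ .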

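So what can string hashing do? Take a look at what follows. (scared to tears)

Checking whether two strings are equal

Problem
Given two strings $A,B$ of length $n$ , decide whether they are equal.
That is, decide whether $A_i=B_i$ holds for all $1\le i\le n$ .

This one is easy: just check whether the hash values are equal.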

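#include <iostream>
#include <cstdio>
#include <string>
#define ull unsigned long long
using namespace std;
const ull mul = 131; // multiplier k
string s1, s2;
int n;
// Hash of s[1..n]; s[0] is a padding character. Overflow wraps mod 2^64.
ull get_hash(const string &s, int n){
	ull ans = 0;
	for(int i = 1; i <= n; i++){
		ans = ans * mul + s[i];
	}
	return ans;
}
// Equal hashes mean the strings are equal with high probability, not with certainty.
bool equal(ull hsh1, ull hsh2){
	return (hsh1 == hsh2);
}
int main(){
	ios::sync_with_stdio(0);
	cin.tie(0), cout.tie(0);
	cin >> n;
	cin >> s1 >> s2;
	s1 = ' ' + s1; // shift to 1-based indexing
	s2 = ' ' + s2;
	cout << equal(get_hash(s1, n), get_hash(s2, n)) << '\n'; // 1 if the hashes match, 0 otherwise

	return 0;
}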

Time complexity of computing the hash values: $O(n)$ .

Time complexity of the equality check: $O(1)$ .
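Since equal hash values only make equality very likely, one cautious pattern is to use the hash as a cheap filter and confirm a match directly. The sketch below is one way to write that, not something the post itself prescribes; `hash_of` and `same_string` are hypothetical names, and the string is 0-based here.

```cpp
#include <string>

typedef unsigned long long ull;
const ull K = 131; // multiplier k

// Natural-overflow hash of a 0-based string (all arithmetic is mod 2^64).
ull hash_of(const std::string &s) {
    ull ans = 0;
    for (unsigned char c : s) ans = ans * K + c;
    return ans;
}

// Different hashes prove the strings differ. Equal hashes only make equality
// very likely, so this version confirms a match with a direct comparison.
bool same_string(const std::string &a, const std::string &b) {
    if (a.size() != b.size()) return false;
    if (hash_of(a) != hash_of(b)) return false;
    return a == b; // O(n) confirmation; drop it to accept the small error rate
}
```

The hashes are recomputed on every call only to keep the example short. The filter pays off when hash values are stored alongside the strings, so that most unequal pairs are rejected in $O(1)$ .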

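Concatenating strings

Problem
Given two strings $A,B$ , output their hash values and the hash value of their concatenation.

Suppose we need to concatenate two strings $A,B$ whose lengths are $n,m$ respectively.

Their hash values are $h(A)=A_1\times k^{n-1}+A_2\times k^{n-2}+...+A_n\times k^0$ and $h(B)=B_1\times k^{m-1}+B_2\times k^{m-2}+...+B_m\times k^0$ .

Let the concatenated string be $C=A+B$ . Its hash value is $h(C)=A_1\times k^{n+m-1}+A_2\times k^{n+m-2}+...+A_n\times k^{m}+B_1\times k^{m-1}+B_2\times k^{m-2}+...+B_m\times k^0=h(A)\times k^m+h(B)$ .

So precomputing the powers $k^i$ is all we need.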

#include <iostream>
#include <cstdio>
#include <string>
#define ull unsigned long long
using namespace std;
const ull mul = 131; // multiplier k
ull ksm[200005]; // ksm[i] = k^i (mod 2^64)
string s1, s2;
int n, m;
// Hash of the whole 1-indexed string s (s[0] is padding); n is kept only for the call signature.
ull get_hash(const string &s, int n){
	ull ans = 0;
	for(size_t i = 1; i < s.size(); i++){
		ans = ans * mul + s[i];
	}
	return ans;
}
// h(A+B) = h(A) * k^{|B|} + h(B); size1 is not needed by the formula.
ull merge(ull hsh1, ull hsh2, ull size1, ull size2){
	return (hsh1 * ksm[size2]) + (hsh2);
}
int main(){
	ios::sync_with_stdio(0);
	cin.tie(0), cout.tie(0);
	ksm[0] = 1;
	for(int i = 1; i <= 200000; i++){
		ksm[i] = ksm[i - 1] * mul;
	}
	cin >> n >> m;
	cin >> s1 >> s2;
	s1 = ' ' + s1; // shift to 1-based indexing
	s2 = ' ' + s2;
	ull h1 = get_hash(s1, n), h2 = get_hash(s2, m); // hash each string once
	cout << h1 << ' ' << h2 << ' ' << merge(h1, h2, n, m) << '\n';

	return 0;
}

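Time complexity of the precomputation and of computing the hash values: $O(n+m)$ (the listing fills the power table up to a fixed bound of 200000; only $k^m$ is actually used).

Time complexity of the concatenation: $O(1)$ .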
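If a power table is not available, $k^m$ can also be computed on demand. The following is a small sketch under that assumption, using binary exponentiation; `power`, `merge_hash`, and `hash_of` are names invented for this example, and the strings are 0-based.

```cpp
#include <string>

typedef unsigned long long ull;
const ull K = 131; // multiplier k

// Natural-overflow hash of a 0-based string (arithmetic is mod 2^64).
ull hash_of(const std::string &s) {
    ull ans = 0;
    for (unsigned char c : s) ans = ans * K + c;
    return ans;
}

// base^e mod 2^64 by binary exponentiation, O(log e) multiplications.
ull power(ull base, ull e) {
    ull result = 1;
    while (e > 0) {
        if (e & 1) result *= base; // this bit of e contributes the current power
        base *= base;              // square for the next bit
        e >>= 1;
    }
    return result;
}

// h(A+B) = h(A) * k^{|B|} + h(B)
ull merge_hash(ull ha, ull hb, ull len_b) {
    return ha * power(K, len_b) + hb;
}
```

For instance, `merge_hash(hash_of("hello"), hash_of("world"), 5)` gives the same value as `hash_of("helloworld")`. This makes one concatenation cost $O(\log m)$ instead of $O(1)$ , so the table is still the better choice when many merges are needed.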

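Extracting a substring

Problem
Given a string $A$ and $q$ queries, first output the hash value of $A$ , then for each query output the hash value of the substring it asks for.

Following the method above, we first precompute the prefix hash values of the string.

The code below records the prefix hash values of a string: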

#include <iostream>
#include <cstdio>
#include <string>
#define ull unsigned long long
using namespace std;
const ull mul = 131; // multiplier k
string s;
int n;
ull HASH[1000005]; // HASH[i] = hash of s[1..i], HASH[0] = 0
int main(){
	ios::sync_with_stdio(0);
	cin.tie(0), cout.tie(0);
	cin >> n;
	cin >> s;
	s = ' ' + s; // shift to 1-based indexing
	for(int i = 1; i <= n; i++){
		HASH[i] = HASH[i - 1] * mul + s[i];
	}

	return 0;
}

Let the substring to extract be $S'=[S_l,S_{l+1},S_{l+2},...,S_{l+len-1}]$ .

Then $h(S')\\ =S_l\times k^{len-1}+S_{l+1}\times k^{len-2}+...+S_{l+len-1}\times k^0\\ =(S_1\times k^{l+len-2}+S_2\times k^{l+len-3}+...+S_{l+len-1}\times k^0)-(S_1\times k^{l+len-2}+S_2\times k^{l+len-3}+...+S_{l-1}\times k^{len})\\ =h([S_1,S_2,...,S_{l+len-1}])-k^{len}h([S_1,S_2,...,S_{l-1}])$ .
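As a sanity check on this formula, the sketch below compares it against a hash computed from scratch for every substring of a given string. It is only a brute-force test harness; `formula_holds`, `H`, and `pw` are names chosen for this example, and the input string is stored 0-based while the positions $l$ stay 1-based as in the text.

```cpp
#include <string>
#include <vector>

typedef unsigned long long ull;
const ull K = 131; // multiplier k

// True if, for every substring of s, the prefix-hash formula
// h(S') = H[l+len-1] - k^len * H[l-1] matches a hash computed from scratch.
bool formula_holds(const std::string &s) {
    int n = (int)s.size();
    std::vector<ull> H(n + 1, 0), pw(n + 1, 1);
    for (int i = 1; i <= n; i++) {
        H[i] = H[i - 1] * K + (unsigned char)s[i - 1]; // s is 0-based, H is 1-based
        pw[i] = pw[i - 1] * K;                         // pw[i] = k^i mod 2^64
    }
    for (int l = 1; l <= n; l++) {
        ull direct = 0; // hash of S_l..S_{l+len-1}, built one character at a time
        for (int len = 1; l + len - 1 <= n; len++) {
            direct = direct * K + (unsigned char)s[l + len - 2]; // append S_{l+len-1}
            if (H[l + len - 1] - pw[len] * H[l - 1] != direct) return false;
        }
    }
    return true;
}
```

The subtraction is done on unsigned 64-bit values, so a "negative" intermediate result simply wraps around modulo $2^{64}$ and the identity still holds.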

#include <iostream>
#include <cstdio>
#include <string>
#define ull unsigned long long
using namespace std;
const ull mul = 131; // multiplier k
string s;
int n;
ull HASH[1000005]; // HASH[i] = hash of s[1..i], HASH[0] = 0
ull ksm[1000005]; // ksm[i] = k^i (mod 2^64)
// Hash of the substring s[l..l+len-1]; unsigned subtraction wraps mod 2^64.
ull substr(int l, int len){
	return HASH[l + len - 1] - ksm[len] * HASH[l - 1];
}
int main(){
	ios::sync_with_stdio(0);
	cin.tie(0), cout.tie(0);
	cin >> n;
	cin >> s;
	s = ' ' + s; // shift to 1-based indexing
	ksm[0] = 1;
	for(int i = 1; i <= n; i++){
		HASH[i] = HASH[i - 1] * mul + s[i];
		ksm[i] = ksm[i - 1] * mul;
	}
	cout << HASH[n] << '\n';
	int q;
	cin >> q;
	while(q--){
		int l, len;
		cin >> l >> len; // 1-based start position and length
		cout << substr(l, len) << '\n';
	}

	return 0;
}

Time complexity of the precomputation: $O(n)$
Time complexity of the queries: $O(1)$ per query, $O(q)$ in total
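The listing above leans on natural overflow, where the subtraction wraps around by itself. With an explicit modulus $p$ , as discussed earlier in the post, the subtraction needs a little care. Here is a minimal sketch assuming `P = 1000000007`; the struct name `ModPrefixHash` and its members are hypothetical and not part of the original code.

```cpp
#include <string>
#include <vector>

typedef unsigned long long ull;
const ull K = 131;        // multiplier k
const ull P = 1000000007; // assumed prime modulus p, chosen only for this example

struct ModPrefixHash {
    std::vector<ull> H;  // H[i]: hash of the first i characters, mod P
    std::vector<ull> pw; // pw[i]: K^i mod P

    explicit ModPrefixHash(const std::string &s)
        : H(s.size() + 1, 0), pw(s.size() + 1, 1) {
        for (size_t i = 1; i <= s.size(); i++) {
            H[i] = (H[i - 1] * K + (unsigned char)s[i - 1]) % P;
            pw[i] = pw[i - 1] * K % P;
        }
    }

    // Hash of the substring starting at 1-based position l with length len.
    ull substr(int l, int len) const {
        ull cut = pw[len] * H[l - 1] % P;      // both factors are below P < 2^30
        return (H[l + len - 1] + P - cut) % P; // add P first so the result never goes negative
    }
};
```

A typical use is comparing two substrings in $O(1)$ : for `ModPrefixHash h("abcabc")`, the calls `h.substr(1, 3)` and `h.substr(4, 3)` return the same value because both ranges spell "abc".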

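All comments (6)

cjdstttttt
Do you think I could split this into a part one and part two and squeeze two Creation Plan posts out of it (((

5 days ago, from Guangdong
2
退1年，偶尔会回私信，别@了（
qp

6 days ago, from Chongqing
1
cjdstttttt
d

6 days ago, from Guangdong
1
- 退1年，偶尔会回私信，别@了（
  Replying to cjdstttttt
  Caught you (
  
  6 days ago, from Chongqing
  0
- 退1年，偶尔会回私信，别@了（
  Replying to cjdstttttt
  Tong-ge, still grinding OI this late?
  
  6 days ago, from Chongqing
  0
- cjdstttttt
  Replying to 退1年，偶尔会回私信，别@了（
  Just scribbling something to pad out a Creation Plan post (
  
  6 days ago, from Guangdong
  1
cjdstttttt
d

4 days ago, from Guangdong
0
ppl 大帝
Please cover how to pass the KMP template problem with string hashing + [data deleted] (kachang, i.e. constant-factor tuning), otherwise I'll downvote like crazy

5 days ago, from Zhejiang
0
- cjdstttttt
  Replying to ppl 大帝
  Will do, bro, will do
  
  5 days ago, from Guangdong
  0
- cjdstttttt
  Replying to ppl 大帝
  1e6 shouldn't need constant-factor tuning though
  
  5 days ago, from Guangdong
  0
- cjdstttttt
  Replying to ppl 大帝
  Finding the border is a bit tricky
  
  5 days ago, from Guangdong
  0
退1年，偶尔会回私信，别@了（
d

6 days ago, from Chongqing
0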
