|
|
вернуться в форумHelp with aho corasick MLE 7,please ! #include <iostream> #include <queue> #include <map> using namespace std; #define MAX 2000000 #define K 128 map<int,int> Next; short link[MAX]; bool leaf[MAX]; short p[MAX]; char I[MAX];
int sz,n,m,vv=0; unsigned char ch; char chh; queue<short> q; int main(){ p[0] = 0; sz = 1; scanf("%d",&n);getchar(); while(n--){ int v=0; while((ch=getchar())!='\n') { if (Next[v*1000+ch] == 0){ p[sz] = v; I[sz] = ch; Next[v*1000+ch] = sz++; } v = Next[v*1000+ch]; } leaf[v] = true; } ~8400Kb Wrong bor ?? Re: Help with aho corasick MLE 7,please ! I can't understand your Aho-Corasick realistation. Why v * 1000, is ch only 32 thru 255 ? then, why max 2000000? Are you sure, that your FSM will never have more than 2000 states? Re: Help with aho corasick MLE 7,please ! map<int,int> Next; I use 'int' 4-8 digit number under the peaks, and 1-3 digit number under the symbol (for which there is a transition) {я использую в 'int' 4-8 разряд под номер вершины, а 1-3 разряд под номер символа (по которому есть переход) } So looks like the transition to a tree: int go (int g,unsigned char ch) { /// /// return Next[g*1000+ch]; } Edited by author 09.05.2011 15:50 Re: Help with aho corasick MLE 7,please ! That wrong way: imagine dictonary ablahblahblahblah... (10000 long) bblahblahblahblah... (10000 long) ... jblahblahblahblah....(10000 long) so you will have 100000 nodes in your tree. If you multiply it by 1000 (or even by 256-32=224), you will have 100000000 (or 22400000) entries, that cann't fit in memory limit. so you have to store link-symbols in other way. For example, you can imagine, that evry node can be reached only by one symbol, so you can store symbol in your node and make something like linked list of child-nodes root | node(a) -- node(b) -- ... -- node(n) | nodelevel2(a) -- .... that will require only, letter, child-link, sibling-link, and fault-function value. So, may be you will wish to store queue-link field in each note (that will help you to build Aho-Corasick FSM) - that total less than 24 bytes in each of 100000 nodes Other way: to store for each node dynamic array of used letters and corresonded child-nodes. |
|
|