NDK-049: 哈夫曼编码

1. 哈夫曼编码

编码是信息从一种形式或格式，转换成另一种形式的过程，
用预先规定的方法，将文字、数组或其它对象编成数码，或将信息、数据转换成规定的电脉冲信号。
编码：
解码：

特点：所有元素，都挂在叶子节点上。不会出现在中间的情况。

2.哈夫曼树(霍夫曼树，赫夫曼树)：主要用于压缩，变长的编码，【最优的前缀码二叉树】。

定长编码：1字节8位，代表一个字符。Ascii码。
变长编码：utf编码，哈夫曼编码，长度根据不同的情况来确定。

思想：根据字符出现的频率，来做压缩编码。

路径长度：根节点到目标节点边的的数量（节点个数-1）。对于n长度是4.
节点的权：出现的次数，n出现了2次.
带权路径长度：对于n来讲，2*4.
树的带权路径长度（WLP）：所有节点带全路径长度之和，

怎样才是最优的？【就是树的带权路径长度最小】。
思想：出现频率最高的，在树的最上层；出现频率第的，在树的下层。能让编码长度更短一些。

怎样构建一棵最优的霍夫曼树？

待传输字符：aaaabbcccccccd

ASCII编码结果：
01100001 01100001 01100001 01100001
01100010 01100010
01100011 01100011 01100011 01100011 01100011 01100011 01100011
01100100
占据字节的总位数：8*14=112位

1.统计字符出现的个数
a-> 4次，b-> 2次，c-> 7次，d-> 1次
2.构建霍夫曼树
首先拿2个出现频率最低的出来 bd作为节点：把最小d的放左边，大的b放右边，他们重新构建一棵树。他们俩出现的次数和=3（bd出现的次数之和）.
a->4, (d,b)->3, c->7
再拿2个最小的（3,4），小的左/大的右，继续构建树；
(d,b,a)->7, c->7
再拿最后2个最小的（7,7），继续构建树（相等的情况，新节点放左边；右边放子树）。最终成为霍夫曼树。
c的带权路径：17; d：3=13; b：6=23; a：8=24
总的带权路径长度： 7+3+6+8 = 24

3.编码
树往左边走是0，往右边走是1.
c的编码: 0
d的编码: 100
b的编码: 101
a的编码: 11
霍夫曼编码结果：11111111 101 101 0000000 100 ，占用位数8+6+7+3=24位。

jpeg 压缩比率75%（20-90%）之间。

出现次数越多的，用越少的次数来代替。

前缀码的引出：是否能直接对字符编码，编码成0000000 1111 1010 11，占用位数更少？0代替c，1代替a，10代替b，11代替d。

前缀码：任意字符的编码的前缀，都不可能是另外一个字符的编码。
保证能准确还原数据。不会出现二义性或乱码。
左右子树都为null，就立即执行解码操作。

解码过程：

找到1，往右边走；找到1继续往右边走，然后发现节点的左右叶子都为空，就得到了a；
往右边找1，再得到3个a；
1往右，0往左，1往右，左右子树都空，得到b；
再次得到b；
0往左走，左右子树都空，得到c；
再次得到6个c；
1往右，0往左，0往左，左右子树都空，得到d。

ndk图片，音视频，很多地方用到了霍夫曼编码解码。jpeg压缩图片，就用的霍夫曼。
字符出现频率不同，压缩比率不一样 20~90%。
对于比较频率比较接近的，使用霍夫曼还不如不压缩。直接传效率可能更高。

代码实现霍夫曼的编码和解码。

编码的思想：

统计字符出现的次数
构建出这棵最优的前缀码二叉树
统计字符的编码，abcd分别用什么编码，生成映射表(字符：编码)
根据字符去映射表查找，一个个使用编码替代，生成霍夫曼编码。

解码的思想：

根据编码的结果，从树里面去查找，查到某个节点左右子树都是空，恢复对应的字符。
查到什么字符，就是什么字符（因为是前缀码，不会出现二义性）。

代码实现霍夫曼编码和解码。
《算法4》. 买了做做题目。

// huffman.h
#pragma once
#ifndef _HUFFMAN_H
#define _HUFFMAN_H

//The Huffman tree node definition
typedef struct _htNode { // 树的结构
    char symbol;
    struct _htNode *left, *right;
}htNode;

/*
    We "encapsulate" the entire tree in a structure
    because in the future we might add fields like "size"
    if we need to. This way we don't have to modify the entire
    source code.
*/
typedef struct _htTree {
    htNode *root; // 根节点
}htTree;

//The Huffman table nodes (linked list implementation)
typedef struct _hlNode {
    char symbol; // 字符
    char *code;  // 编码结果
    struct _hlNode *next;
}hlNode;

//Incapsularea listei
typedef struct _hlTable {
    hlNode *first;
    hlNode *last;
}hlTable;

htTree * buildTree(char *inputString);
hlTable * buildTable(htTree *huffmanTree);
void encode(hlTable *table, char *stringToEncode);
void decode(htTree *tree, char *stringToDecode);
#endif

// huffman.cpp
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#include "huffman.h"
#include "pQueue.h"
#include <android/log.h>

#define TAG "TAG"
#define LOGE(...) __android_log_print(ANDROID_LOG_ERROR,TAG,__VA_ARGS__)


// 遍历树，生成映射表table（链表）：c:0-> d:100-> b:101-> a:11.
void traverseTree(htNode *treeNode, hlTable **table, int k, char code[256]) {
    //If we reach the end we introduce the code in the table
    // 1.左右子树都是空
    if (treeNode->left == NULL && treeNode->right == NULL) {
        code[k] = '\0'; // 左右都为空，标记它查完了。
        hlNode *aux = (hlNode *) malloc(sizeof(hlNode));
        aux->code = (char *) malloc(sizeof(char) * (strlen(code) + 1));
        strcpy(aux->code, code);
        aux->symbol = treeNode->symbol;
        aux->next = NULL;
        if ((*table)->first == NULL) {
            // 映射表为 null，链表连起来
            (*table)->first = aux;
            (*table)->last = aux;
        } else {
            // 映射表不为空，aux插入到队尾
            (*table)->last->next = aux;
            (*table)->last = aux;
        }
        return; // 左右子树全空，递归结束。
    }

    // 2.左子树非空，递归处理左子树。
    //We concatenate连接 a 0 for each step to the left
    if (treeNode->left != NULL) { // 左子树非空，递归循环。
        code[k] = '0'; // 左子树中，节点置0.
        traverseTree(treeNode->left, table, k + 1, code);
    }
    //We concatenate a 1 for each step to the right
    if (treeNode->right != NULL) {
        code[k] = '1'; // 右子树中，节点置1.
        traverseTree(treeNode->right, table, k + 1, code);
    }
}

// 二、根据霍夫曼树生成映射表
hlTable *buildTable(htTree *huffmanTree) {
    //We initialize the table，单链表初始化
    hlTable *table = (hlTable *) malloc(sizeof(hlTable));
    table->first = NULL;
    // 这里为什么要有 last，降低插入的时间复杂度。将O(N)变成O(1)
    table->last = NULL;

    //Auxiliary variables
    char code[256]; // 针对某个字符要生成的code，默认开辟256长度。
    //k will memories the level on which the traversal is
    int k = 0; // 标识遍历的层级

    //We traverse the tree and calculate the codes
    // 传入哈夫曼树，单链表，k层级，生成的code数组，
    traverseTree(huffmanTree->root, &table, k, code);
    return table;
}

// 一、统计字符出现的次数，由于只统计英文字母，用256去接就可以。a转Ascii码十进制是97，
htTree *buildTree(char *inputString) {
    //The array in which we calculate the frequency of the symbols
    //Knowing that there are only 256 posibilities of combining 8 bits
    //(256 ASCII characters)
    int *probability = (int *) malloc(sizeof(int) * 256);
    //We initialize the array
    // 数组初始化为 0
    for (int i = 0; i < 256; i++)
        probability[i] = 0;

    //We consider the symbol as an array index and we calculate how many times each symbol appears
    // 数组不是'\n', a对应97，修改index=97里面的值为 字符a出现的次数。 数组比map集合更方便。
    for (int i = 0; inputString[i] != '\0'; i++) // a  p[97] ++   2
        probability[(unsigned char) inputString[i]]++;

    // The queue which will hold the tree nodes
    // 1.初始化队列
    pQueue *huffmanQueue;
    initPQueue(&huffmanQueue); // 二级指针，构建结果在huffmanQueue返回。

    // 2.构建一个队列 ， 不断的从队列中取两个出现次数最小的，生成二叉树后又重新加入队列
    //We create nodes for each symbol in the string
    for (int i = 0; i < 256; i++)
        if (probability[i] != 0) { // !=0, 表示该字符出现过。
            htNode *aux = (htNode *) malloc(sizeof(htNode));
            aux->left = NULL;
            aux->right = NULL;
            aux->symbol = (char) i; // 当前字符
            addPQueue(&huffmanQueue, aux, probability[i]); // 加入队列中
        }

    //We free the array because we don't need it anymore，释放数组（数据已存储到队列中）
    free(probability);
    //We apply the steps described in the article to build the tree

    // 3.构建一棵霍夫曼树，数量不为1，继续构建树。size=1时，霍夫曼树构建完成。
        // 继续取前面2个元素(最小的)，去加入队列。相当于递归。
    while (huffmanQueue->size != 1) {
        // 3.1 第一个的出现频率
        int priority = huffmanQueue->first->priority;
        // 拿第二个与第一个相加（每次拿前面两个），出现次数相加
        priority += huffmanQueue->first->next->priority;

        // 3.2 创建两个树的子节点节点（拿最前面的2个节点）
        htNode *left = getPQueue(&huffmanQueue); // 获取队首元素
        htNode *right = getPQueue(&huffmanQueue); // 继续获取队首元素
        // 3.3 创建一个新的节点
        htNode *newNode = (htNode *) malloc(sizeof(htNode));
        newNode->left = left;
        newNode->right = right;

        // 3.4 新节点，重新插入原来的队列中
        addPQueue(&huffmanQueue, newNode, priority);
    }

    // 4.We create the tree 创建一棵树，拿到它的root。
    htTree *tree = (htTree *) malloc(sizeof(htTree));
    tree->root = getPQueue(&huffmanQueue);

    return tree;
}

// 根据编码的映射表 编码：aaaabbcccccccd编码成
// stringToEncode待压缩的字符串， std::string *str
void encode(hlTable *table, char *stringToEncode) { // 可以+参数 char** result, 用于返回编码的结果。
    hlNode *traversal;
    printf("\nEncoding\nInput string : %s\nEncoded string : \n", stringToEncode);

    //For each element of the string traverse the table
    //and once we find the symbol we output the code for it
    LOGE("len = %d",strlen(stringToEncode));
    for (int i = 0; i < strlen(stringToEncode); i++) { // 将所有待编码字符，一个个处理。
        traversal = table->first; // 每个字符，都从编码的映射表头开始。
        __android_log_print(ANDROID_LOG_ERROR, TAG, "%c", traversal->symbol);
        while (traversal->symbol != stringToEncode[i]){ // 循环遍历单链表，找到映射表中的目标字符。
            traversal = traversal->next;
        }
        // 找到了，打印出来，完成编码动作。编码可通过上面的 *result 参数返回给调用者。
        __android_log_print(ANDROID_LOG_ERROR, TAG, "%c,%s", traversal->symbol, traversal->code);
        // str->append(traversal->code);
    }
    // __android_log_print(ANDROID_LOG_ERROR,TAG,"\n");
}

void decode(htTree *tree, char *stringToDecode) {
    htNode *traversal = tree->root;

    printf("\nDecoding\nInput string : %s\nDecoded string : \n", stringToDecode);

    //For each "bit" of the string to decode
    //we take a step to the left for 0
    //or ont to the right for 1
    for (int i = 0; stringToDecode[i] != '\0'; i++) {
        if (traversal->left == NULL && traversal->right == NULL) {
            LOGE("%c", traversal->symbol);
            traversal = tree->root;
        }

        if (stringToDecode[i] == '0')
            traversal = traversal->left;

        if (stringToDecode[i] == '1')
            traversal = traversal->right;

        if (stringToDecode[i] != '0' && stringToDecode[i] != '1') {
            printf("The input string is not coded correctly!\n");
            return;
        }
    }

    if (traversal->left == NULL && traversal->right == NULL) {
        LOGE("%c", traversal->symbol);
        traversal = tree->root;
    }
}

// qQueue.h
#pragma once
#ifndef _PQUEUE_H
#define _PQUEUE_H

#include "huffman.h"
//We modify the data type to hold pointers to Huffman tree nodes
#define TYPE htNode *
#define MAX_SZ 256

typedef struct _pQueueNode {
    TYPE val;
    unsigned int priority; // 优先级，出现的次数
    struct _pQueueNode *next; // 指向下一个（链表结构）
}pQueueNode;

typedef struct _pQueue {
    unsigned int size;
    pQueueNode *first; // 队列首元素
}pQueue;

void initPQueue(pQueue **queue);
void addPQueue(pQueue **queue, TYPE val, unsigned int priority);
TYPE getPQueue(pQueue **queue);

#endif

// pQueue.cpp
#include "pQueue.h"
#include <stdlib.h>
#include <stdio.h>

// 队列，使用链表来做
void initPQueue(pQueue **queue) // c 进阶
{
    //We allocate memory for the priority queue type
    //and we initialize the values of the fields
    (*queue) = (pQueue *) malloc(sizeof(pQueue));
    (*queue)->first = NULL;
    (*queue)->size = 0;
    return;
}

// 加入队列，出现次数最小的在最前面，出现次数最大的在队尾
void addPQueue(pQueue **queue, TYPE val, unsigned int priority)
{
    //If the queue is full we don't have to add the specified value.
    //We output an error message to the console and return.
    if((*queue)->size == MAX_SZ)
    {
        printf("\nQueue is full.\n");
        return;
    }

    pQueueNode *aux = (pQueueNode *)malloc(sizeof(pQueueNode));
    aux->priority = priority; // 出现的次数
    aux->val = val;           // 值：htNode

    //If the queue is empty we add the first value.
    //We validate twice in case the structure was modified abnormally at runtime (rarely happens).
    if((*queue)->size == 0 || (*queue)->first == NULL)
    {   // 1.队列是空，做第一个元素
        aux->next = NULL;
        (*queue)->first = aux;
        (*queue)->size = 1;
        return;
    }
    else
    {
        //If there are already elements in the queue and the priority of the element
        //that we want to add is greater than the priority of the first element,
        //we'll add it in front of the first element.

        //Be careful, here we need the priorities to be in descending order
        if(priority <= (*queue)->first->priority)
        {   // 2.优先级比第一个元素小，插入到队列的最前面。***** 决定优先级相等时，放左还是右。
            aux->next = (*queue)->first;
            (*queue)->first = aux;
            (*queue)->size++;
            return;
        }
        else
        {
            //We're looking for a place to fit the element depending on it's priority
            pQueueNode * iterator = (*queue)->first;
            while(iterator->next!=NULL)
            {
                //Same as before, descending, we place the element at the begining of the
                //sequence with the same priority for efficiency even if
                //it defeats the idea of a queue.
                // 3.队列中间找到比待插入元素大的item，插入到这个item的前面
                if(priority <= iterator->next->priority)
                {
                    aux->next = iterator->next;
                    iterator->next = aux;
                    (*queue)->size++;
                    return;
                }
                iterator = iterator->next;
            }
            //If we reached the end and we haven't added the element,
            //we'll add it at the end of the queue.
            // 4.遍历完整个队列，仍然没发现出现频率比我大的（只有我最大），放最后
            if(iterator->next == NULL)
            {
                aux->next = NULL;
                iterator->next = aux;
                (*queue)->size++;
                return;
            }
        }
    }
}

// 获取节点
TYPE getPQueue(pQueue **queue)
{
    TYPE returnValue; // 待返回的节点
    //We get elements from the queue as long as it isn't empty
    if((*queue)->size > 0)
    {
        returnValue = (*queue)->first->val; // 队首元素
        // 将 first 指向下一个 ，断开队首的节点
        (*queue)->first = (*queue)->first->next; // 队首指针后移
        (*queue)->size--; // 数量-1
    }
    else
    {
        //If the queue is empty we show an error message.
        //The function will return whatever is in the memory at that time as returnValue.
        //Or you can define an error value depeding on what you choose to store in the queue.
        printf("\nQueue is empty.\n");
    }
    return returnValue;
}

// native-lib.cpp
#include <jni.h>
#include <string>
#include "huffman.h"
#include <android/log.h>

extern "C"
JNIEXPORT jstring
JNICALL
Java_com_ndk_day49_MainActivity_stringFromJNI(
        JNIEnv *env,
        jobject /* this */) {
    //We build the tree depending on the string
    // 1.生成一棵最优的前缀码霍夫曼树
    htTree *codeTree = buildTree("aaaabbcccccccd");
    //We build the table depending on the Huffman tree
    // 2.根据霍夫曼树生成映射表
    hlTable *codeTable = buildTable(codeTree);

    // 3.编码 We encode using the Huffman table
    // std::string *res = new std::string();
    encode(codeTable,"aaaabbcccccccd");

    // encode(codeTable,"aaaabbcccccccd", res);
    // 打印输出 res->c_str() // 编码结果010101010010011111111000
    // __android_log_print(ANDROID_LOG_ERROR, TAG, "code = %s", res->c_str());
    delete(res);
    //We decode using the Huffman tree
    //We can decode string that only use symbols from the initial string
    // 4.解码
    decode(codeTree,"0011111000111");
    //Output : 0011 1110 1011 0001 0010 1010 1100 1111 1000 1001

    std::string hello = "Hello from C++";
    return env->NewStringUTF(hello.c_str());
}

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 218,284评论 6赞 506
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 93,115评论 3赞 395
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 164,614评论 0赞 354
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 58,671评论 1赞 293
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 67,699评论 6赞 392
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 51,562评论 1赞 305
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 40,309评论 3赞 418
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 39,223评论 0赞 276
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 45,668评论 1赞 314
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 37,859评论 3赞 336
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 39,981评论 1赞 348
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 35,705评论 5赞 347
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 41,310评论 3赞 330
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 31,904评论 0赞 22
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 33,023评论 1赞 270
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 48,146评论 3赞 370
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 44,933评论 2赞 355