Concepts

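HBase stores one record per row, and the columns of a row are organized into column families and columns. Every row has a row key, which is stored as a byte array, so in principle it can be any data that can be serialized. Rows are stored sorted by row key, in byte order. HBase does not generate row keys itself: the application chooses the key for every new row, whether that is an increasing value it generates or any other key it constructs.
Columns are first grouped into column families. All columns in the same family share the same prefix. The column family prefix must consist of printable characters, while the column suffix (the qualifier) can be arbitrary bytes; the family and the qualifier are joined by ":".
Column families must be declared in the schema, but columns can be created dynamically. Physically, all members of a column family are stored together, and tuning and storage specifications are also set at the column family level. (It is recommended that the members of one column family have similar data sizes and access patterns, so that family-level tuning can take advantage of those shared characteristics.)
The intersection of a row and a column is usually called a cell. A cell stores multiple versions of its value, which is the biggest difference from a conventional database.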

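My own understanding: a column family corresponds to a column in a traditional database, and the contents of a column family can be thought of as a dictionary (a map from qualifier to value).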
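
To make the "dictionary" view concrete, here is a minimal sketch of the data model as nested sorted maps: row key, then column, then timestamp, then value. It is a toy in-memory model, not the HBase API; the class name `CellModelSketch` and its methods are made up for illustration, and values are plain strings instead of byte arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy in-memory model of the HBase data model (hypothetical, not the real API). */
class CellModelSketch {
    /** Unsigned lexicographic byte order, the order in which row keys are sorted. */
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // row key -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<byte[], NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>(BYTE_ORDER);

    /** Stores one version of a cell; older versions of the same cell are kept. */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(bytes(row), k -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest version of a cell, or null if the cell does not exist. */
    String getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(bytes(row));
        if (columns == null || !columns.containsKey(column)) {
            return null;
        }
        return columns.get(column).firstEntry().getValue();
    }

    /** Returns the first row key in sorted order, decoded as UTF-8. */
    String firstRow() {
        return new String(rows.firstKey(), StandardCharsets.UTF_8);
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        CellModelSketch table = new CellModelSketch();
        table.put("row2", "info:name", 1L, "b");
        table.put("row1", "info:name", 1L, "old");
        table.put("row1", "info:name", 2L, "new");
        System.out.println(table.firstRow());
        System.out.println(table.getLatest("row1", "info:name"));
    }
}
```

The inner map is keyed by timestamp in descending order, so reading the first entry gives the most recent version of a cell.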

Regions

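A table is partitioned horizontally, by row, into multiple regions. Each region holds a contiguous subset of the table's rows. HBase gets its parallelism by splitting tables into regions and distributing them.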
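
Because rows are sorted and each region covers a contiguous key range, finding the region for a row key amounts to finding the greatest region start key that is not larger than the row key. The sketch below shows that idea with a `TreeMap`. The class and region names are hypothetical, and `String` keys are used for readability (real row keys are byte arrays compared in byte order).

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy lookup of the region that covers a row key (hypothetical, not the HBase API). */
class RegionLookupSketch {
    // Region start key (inclusive) -> region name. The first region starts at the empty key.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    /** Returns the region whose start key is the greatest one not above rowKey. */
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row key " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "region-1");   // covers keys below "m"
        table.addRegion("m", "region-2");  // covers "m" and everything after it
        System.out.println(table.regionFor("apple"));
        System.out.println(table.regionFor("zebra"));
    }
}
```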

Implementation

Like other systems in the Hadoop ecosystem, HBase is made up of clients, workers, and a coordinating master.

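The HBase master is mainly responsible for assigning regions to registered regionservers and for recovering from regionserver failures.
A regionserver serves read and write requests for zero or more regions and also manages region splits. After a split it informs the master, which then assigns the new regions to servers.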
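
As a rough illustration of what a split produces, the sketch below cuts one sorted region into two daughter regions at its middle row key. This is a simplification under stated assumptions: it splits by row count, whereas HBase decides when and where to split based on store size, and the class name is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy region split at the middle row key (hypothetical, not the HBase API). */
class RegionSplitSketch {
    /** Returns two daughter regions: rows before the split key, and rows from it onward. */
    static List<TreeMap<String, String>> split(TreeMap<String, String> region) {
        if (region.size() < 2) {
            throw new IllegalArgumentException("need at least two rows to split");
        }
        List<String> keys = new ArrayList<>(region.keySet());
        String splitKey = keys.get(keys.size() / 2);

        List<TreeMap<String, String>> daughters = new ArrayList<>();
        daughters.add(new TreeMap<String, String>(region.headMap(splitKey, false)));
        daughters.add(new TreeMap<String, String>(region.tailMap(splitKey, true)));
        return daughters;
    }

    public static void main(String[] args) {
        TreeMap<String, String> region = new TreeMap<>();
        region.put("a", "1");
        region.put("b", "2");
        region.put("c", "3");
        region.put("d", "4");
        for (TreeMap<String, String> daughter : split(region)) {
            System.out.println(daughter.keySet());
        }
    }
}
```

Each daughter is again a contiguous, sorted key range, which is what lets the master hand the two halves to different regionservers.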

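HBase has two special tables, -ROOT- and .META. (this describes older HBase releases; from 0.96 on, -ROOT- was removed and .META. was renamed hbase:meta, whose location is kept in ZooKeeper directly).

.META.: records the region information of user tables; .META. itself can have multiple regions
-ROOT-: records the region information of the .META. table; -ROOT- has exactly one region
ZooKeeper records the location of the -ROOT- table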

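Each HRegionServer also registers itself in ZooKeeper as an ephemeral node, so the HMaster can detect the health of every HRegionServer at any time. In addition, ZooKeeper removes the HMaster as a single point of failure.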

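On a write, HBase appends the edit to a commit log (the HLog), which is used for recovery after a failure, and puts the data into the memstore (an in-memory buffer). When the memstore fills up, its contents are flushed to disk.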
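
The write path can be sketched in a few lines: log the edit, buffer it in a sorted in-memory map, flush the buffer when it is full, and replay the log to rebuild the buffer after a crash. This is only an illustration under simplifying assumptions: the class `WritePathSketch` is hypothetical, the flush threshold counts entries rather than bytes, and the "log" and "files" are in-memory collections standing in for the HLog and the files on HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy memstore plus commit log (hypothetical, not the HBase implementation). */
class WritePathSketch {
    private final int flushThreshold;
    private final List<String[]> log = new ArrayList<>();        // stand-in for the HLog
    private TreeMap<String, String> memstore = new TreeMap<>();  // sorted in-memory buffer
    private final List<TreeMap<String, String>> flushedFiles = new ArrayList<>();

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String rowKey, String value) {
        log.add(new String[] {rowKey, value});  // 1. record the edit for recovery
        memstore.put(rowKey, value);            // 2. buffer it in memory
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    /** Turns the current memstore into an immutable "file" and starts a new one. */
    private void flush() {
        flushedFiles.add(memstore);
        memstore = new TreeMap<>();
        log.clear();  // simplification: flushed edits never need to be replayed
    }

    /** Simulates losing the memstore in a crash, then rebuilding it from the log. */
    void crashAndRecover() {
        memstore = new TreeMap<>();
        for (String[] edit : log) {
            memstore.put(edit[0], edit[1]);
        }
    }

    /** Reads the memstore first, then flushed files from newest to oldest. */
    String get(String rowKey) {
        String value = memstore.get(rowKey);
        for (int i = flushedFiles.size() - 1; value == null && i >= 0; i--) {
            value = flushedFiles.get(i).get(rowKey);
        }
        return value;
    }

    int flushedFileCount() {
        return flushedFiles.size();
    }

    public static void main(String[] args) {
        WritePathSketch store = new WritePathSketch(2);
        store.put("a", "1");
        store.put("b", "2");  // reaches the threshold and triggers a flush
        store.put("c", "3");  // only in the memstore and the log
        store.crashAndRecover();
        System.out.println(store.get("c"));
    }
}
```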

MR utils

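The MapReduce API that HBase provides lives in the org.apache.hadoop.hbase.mapreduce package. In the job's run function, a Scan is used to read the table contents. The map input key is an ImmutableBytesWritable and the value is a Result.
A Scan supports filters, which restrict which rows are read.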

// Keep only rows where the column with family "d" and qualifier "s:1E" equals "filterStr".
// kgSchema, D_BYTES, MapTask, tableName and job are defined elsewhere in the project.
Scan scan = new Scan();
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
        Bytes.toBytes("d"),
        Bytes.toBytes("s:1E"),
        CompareFilter.CompareOp.EQUAL,
        Bytes.toBytes("filterStr"));
filterList.addFilter(filter1);
scan.setFilter(filterList);

// Fetch only selected columns
byte[] rawData = Bytes.toBytes(kgSchema.longNameToShort("m:common.topic.raw_data"));
scan.addColumn(D_BYTES, rawData);
scan.addColumn(D_BYTES, Bytes.toBytes("s:1E"));

TableMapReduceUtil.initTableMapperJob(tableName,
        scan,
        MapTask.class,
        LongWritable.class,
        BytesRefArrayWritable.class,
        job);

Creating an HBase table

# arguments: table name, then column family name
hbase> create 'stations', {NAME => 'info'}

Importing data into an HBase table

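import java.io.IOException;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// NcdcRecordParser and RowKeyConverter come from the book's example code.
public class HBaseTemperatureImporter extends Configured implements Tool {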
    static class HBaseTemperatureMapper<K> extends Mapper<LongWritable, Text, K, Put> {
        private NcdcRecordParser parser = new NcdcRecordParser();
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            parser.parse(value.toString());
            if (parser.isValidTemperature()) {
                byte[] rowKey = RowKeyConverter.makeObservationRowKey(parser.getStationId(),
                        parser.getObservationDate().getTime());
                // The key part: build a Put for the row key and add the cell to it
                Put p = new Put(rowKey);
                p.add(HBaseTemperatureQuery.DATA_COLUMNFAMILY,
                        HBaseTemperatureQuery.AIRTEMP_QUALIFIER,
                        Bytes.toBytes(parser.getAirTemperature()));
                context.write(null, p);
            }
        }
    }
    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: HBaseTemperatureImporter <input>");
            return -1;
        }
        Job job = new Job(getConf(), getClass().getSimpleName());
        job.setJarByClass(getClass());
        FileInputFormat.addInputPath(job, new Path(args[0]));
        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "observations");
        job.setMapperClass(HBaseTemperatureMapper.class);
        job.setNumReduceTasks(0);
        // Map-only job: on the output side, this class writes the Puts straight to the table
        job.setOutputFormatClass(TableOutputFormat.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }
    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(HBaseConfiguration.create(),
                new HBaseTemperatureImporter(), args);
        System.exit(exitCode);
    }
}

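Reading data from an HTable (the main things to look at are how the HTable is created and how rows are read back, which this listing does with a Scan. Also, because the class extends Configured and implements Tool, it can be run directly with the hbase command, for example: hbase HBaseTemperatureQuery 011990-99999)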

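import java.io.IOException;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HBaseTemperatureQuery extends Configured implements Tool {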
    static final byte[] DATA_COLUMNFAMILY = Bytes.toBytes("data");
    static final byte[] AIRTEMP_QUALIFIER = Bytes.toBytes("airtemp");
    public NavigableMap<Long, Integer> getStationObservations(HTable table, String stationId, long maxStamp, int maxCount) throws IOException {
        byte[] startRow = RowKeyConverter.makeObservationRowKey(stationId, maxStamp);
        NavigableMap<Long, Integer> resultMap = new TreeMap<Long, Integer>();
        Scan scan = new Scan(startRow);
        scan.addColumn(DATA_COLUMNFAMILY, AIRTEMP_QUALIFIER);
        ResultScanner scanner = table.getScanner(scan);
        try {
            Result res;
            int count = 0;
            while ((res = scanner.next()) != null && count++ < maxCount) {
                byte[] row = res.getRow();
                byte[] value = res.getValue(DATA_COLUMNFAMILY, AIRTEMP_QUALIFIER);
                Long stamp = Long.MAX_VALUE -
                        Bytes.toLong(row, row.length - Bytes.SIZEOF_LONG, Bytes.SIZEOF_LONG);
                Integer temp = Bytes.toInt(value);
                resultMap.put(stamp, temp);
            }
        } finally {
            scanner.close();
        }
        return resultMap;
    }
    public int run(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: HBaseTemperatureQuery <station_id>");
            return -1;
        }
        HTable table = new HTable(HBaseConfiguration.create(getConf()), "observations");
        try {
            NavigableMap<Long, Integer> observations =
                    getStationObservations(table, args[0], Long.MAX_VALUE, 10).descendingMap();
            for (Map.Entry<Long, Integer> observation : observations.entrySet()) { // Print the date, time, and temperature
                System.out.printf("%1$tF %1$tR\t%2$s\n", observation.getKey(),
                        observation.getValue());
            }
            return 0;
        } finally {
            table.close();
        }
    }
    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(HBaseConfiguration.create(),
                new HBaseTemperatureQuery(), args);
        System.exit(exitCode);
    }
}
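
The query above recovers the observation time as `Long.MAX_VALUE` minus the last eight bytes of the row key, which implies the row key ends in a reversed timestamp so that the newest observation of a station sorts first. `RowKeyConverter` itself is not shown in these notes, so the following is only a sketch of a key layout consistent with that decoding, written with plain JDK classes. The class name, the fixed 12-character station ID width, and the helper methods are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch of a "station ID + reversed timestamp" row key (assumed layout). */
class ReverseTimestampKeySketch {
    static final int STATION_ID_LENGTH = 12;  // assumed fixed width, e.g. "011990-99999"

    /** Builds the key: station ID bytes followed by (Long.MAX_VALUE - time), big-endian. */
    static byte[] makeRowKey(String stationId, long observationTime) {
        byte[] id = stationId.getBytes(StandardCharsets.US_ASCII);
        if (id.length != STATION_ID_LENGTH) {
            throw new IllegalArgumentException("station ID must be 12 characters");
        }
        return ByteBuffer.allocate(STATION_ID_LENGTH + Long.BYTES)
                .put(id)
                .putLong(Long.MAX_VALUE - observationTime)
                .array();
    }

    /** Recovers the observation time from the last eight bytes of the key. */
    static long timestampOf(byte[] rowKey) {
        long reversed = ByteBuffer.wrap(rowKey, rowKey.length - Long.BYTES, Long.BYTES).getLong();
        return Long.MAX_VALUE - reversed;
    }

    /** Unsigned lexicographic comparison, matching how row keys are ordered. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] older = makeRowKey("011990-99999", 1000L);
        byte[] newer = makeRowKey("011990-99999", 2000L);
        // A negative result means the newer observation sorts before the older one.
        System.out.println(compareUnsigned(newer, older) < 0);
        System.out.println(timestampOf(newer));
    }
}
```

With this layout, a scan that starts at the key for `Long.MAX_VALUE` returns a station's observations from newest to oldest, which matches how `getStationObservations` reads the first `maxCount` rows.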

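While it is running, HBase keeps its HDFS files open the whole time, so the regionserver process may need a higher open-file limit (ulimit -n).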

http://www.searchtb.com/2011/01/understanding-hbase.html

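This article is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license. Reposting is welcome, but please credit http://thousandhu.github.io as the source and keep the reposted content complete. The author reserves all copyright-related rights.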

Permalink: http://thousandhu.github.io/2016/02/04/Hadoop-The-Definitive-Guide-4th读书笔记-chapter-20-hbase-md/