This chapter explains how to download, install, and set up Apache Pig on your system.

Prerequisites
Hadoop and Java must be installed on your system before you run Apache Pig. Before installing Apache Pig, install Hadoop and Java by following the steps at this link: ://www.cdcxhl.com/hadoop/hadoop_enviornment_setup.htm
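Before moving on, it can help to confirm that both prerequisites are reachable from the shell. The following is a minimal sketch, assuming the `java` and `hadoop` launchers are expected on `PATH`; the helper name `check_tool` is hypothetical and not part of Pig or Hadoop.

```shell
# check_tool NAME: report whether the command NAME is available on PATH.
# (Hypothetical helper for illustration only.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT found on PATH"
        return 1
    fi
}

# Hadoop and Java are the two prerequisites named above.
missing=0
for tool in java hadoop; do
    check_tool "$tool" || missing=1
done

if [ "$missing" -eq 0 ]; then
    echo "prerequisites look OK"
else
    echo "install the missing tools before installing Pig"
fi
```

The script only reports what it finds; it does not install or modify anything.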
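Download Apache Pig
First, download the latest version of Apache Pig from the following website: https://pig.apache.org/
Step 1
Open the home page of the Apache Pig website. Under the News section, click the release page link.
Step 2
Clicking that link takes you to the Apache Pig Releases page. Under the Download section of this page, click the link; you will be redirected to a page listing a set of mirrors.
Step 3
Choose any one of these mirrors and click it.
Step 4
The mirror takes you to the Pig Releases page, which lists the available versions of Apache Pig. Click the latest version.
Step 5
These folders contain the source and binary files of the Apache Pig release. Download the source and binary tar files for Apache Pig 0.16: pig-0.16.0-src.tar.gz and pig-0.16.0.tar.gz.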
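The same two files can also be fetched from the command line instead of clicking through the mirror pages. The sketch below only prints the URLs it would download. The `archive.apache.org` base URL and the helper name `pig_url` are assumptions that do not come from this tutorial; substitute the mirror you selected in Step 3 if you prefer.

```shell
# Assumption: the Apache archive hosts the release under this base URL.
PIG_VERSION="0.16.0"
BASE_URL="https://archive.apache.org/dist/pig"

# pig_url FILE: print the download URL for FILE (hypothetical helper).
pig_url() {
    echo "${BASE_URL}/pig-${PIG_VERSION}/$1"
}

for f in "pig-${PIG_VERSION}-src.tar.gz" "pig-${PIG_VERSION}.tar.gz"; do
    echo "would download: $(pig_url "$f")"
    # Uncomment to actually download (needs network access and wget):
    # wget -P "$HOME/Downloads" "$(pig_url "$f")"
done
```

Keeping the version in one variable makes it easy to switch to a different release later.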
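Install Apache Pig
After downloading the Apache Pig software, install it in your Linux environment by following the steps below.
Step 1
Create a directory named Pig in the same directory where Hadoop, Java, and other software are installed. (In this tutorial, we created the Pig directory in the home directory of the user named Hadoop.)
$ mkdir Pig
Step 2
Extract the downloaded tar files as shown below.
$ cd Downloads/
$ tar zxvf pig-0.16.0-src.tar.gz
$ tar zxvf pig-0.16.0.tar.gz
Step 3
Move the contents extracted from pig-0.16.0-src.tar.gz (the pig-0.16.0-src directory) to the Pig directory created earlier, as shown below.
$ mv pig-0.16.0-src/* /home/Hadoop/Pig/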
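The three installation steps can be wrapped into one small function. This is a sketch under stated assumptions rather than an official installer: the function name `install_pig` is hypothetical, and the tarball is assumed to unpack into a single top-level directory.

```shell
# install_pig TARBALL DEST: extract TARBALL and move the contents of its
# top-level directory into DEST. (Hypothetical helper for illustration.)
install_pig() {
    tarball=$1
    dest=$2
    workdir=$(mktemp -d) || return 1
    mkdir -p "$dest" || return 1
    # Extract into a scratch directory so a failed run leaves DEST untouched.
    tar -xzf "$tarball" -C "$workdir" || return 1
    # Assumption: the archive unpacks into exactly one top-level directory.
    top=$(ls "$workdir" | head -n 1)
    mv "$workdir/$top"/* "$dest"/ || return 1
    rm -rf "$workdir"
}

# Example call using the tutorial's paths (commented out so that merely
# sourcing this file changes nothing):
# install_pig "$HOME/Downloads/pig-0.16.0-src.tar.gz" /home/Hadoop/Pig
```

Note that `mv dir/*` skips hidden files (names starting with a dot), just as the manual `mv` command does.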
Configure Apache Pig
After installing Apache Pig, we have to configure it. To do so, we need to edit two files: .bashrc and pig.properties.
.bashrc file
In the .bashrc file, set the following variables:
- PIG_HOME: the Apache Pig installation folder
- PATH: extended with Pig's bin folder
- PIG_CLASSPATH: the etc (configuration) folder of the Hadoop installation, that is, the directory containing the core-site.xml, hdfs-site.xml, and mapred-site.xml files
export PIG_HOME=/home/Hadoop/Pig
export PATH=$PATH:/home/Hadoop/Pig/bin
export PIG_CLASSPATH=$HADOOP_HOME/conf
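If you script this step, it is worth making it idempotent so that running it twice does not duplicate lines in .bashrc. The sketch below demonstrates the idea on a scratch file; `append_once` and `RC_FILE` are hypothetical names, and the paths follow the tutorial's layout.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line exists.
# (Hypothetical helper for illustration.)
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines, -F treats LINE as a fixed string, -q is silent.
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file; set RC_FILE="$HOME/.bashrc" to apply for real.
RC_FILE=$(mktemp)
append_once "$RC_FILE" 'export PIG_HOME=/home/Hadoop/Pig'
append_once "$RC_FILE" 'export PATH=$PATH:/home/Hadoop/Pig/bin'
append_once "$RC_FILE" 'export PIG_CLASSPATH=$HADOOP_HOME/conf'
# Repeating a line has no effect:
append_once "$RC_FILE" 'export PIG_HOME=/home/Hadoop/Pig'
cat "$RC_FILE"
```

The single quotes keep `$PATH` and `$HADOOP_HOME` unexpanded, so they are evaluated each time the shell reads .bashrc rather than once when the line is written. After editing the real file, run `source ~/.bashrc` or open a new terminal for the variables to take effect.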
pig.properties file
The conf folder of Pig contains a file named pig.properties, in which you can set various parameters. To list them, run:
$ pig -h properties
The following properties are supported:
Logging:
    verbose=true|false; default is false. This property is the same as -v switch
    brief=true|false; default is false. This property is the same as -b switch
    debug=OFF|ERROR|WARN|INFO|DEBUG; default is INFO. This property is the same as -d switch
    aggregate.warning=true|false; default is true.
        If true, prints count of warnings of each type rather than logging each warning.
Performance tuning:
    pig.cachedbag.memusage=; default is 0.2 (20% of all memory).
        Note that this memory is shared across all large bags used by the application.
    pig.skewedjoin.reduce.memusagea=; default is 0.3 (30% of all memory).
        Specifies the fraction of heap available for the reducer to perform the join.
    pig.exec.nocombiner=true|false; default is false.
        Only disable combiner as a temporary workaround for problems.
    opt.multiquery=true|false; multiquery is on by default.
        Only disable multiquery as a temporary workaround for problems.
    opt.fetch=true|false; fetch is on by default.
        Scripts containing Filter, Foreach, Limit, Stream, and Union can be dumped without MR jobs.
    pig.tmpfilecompression=true|false; compression is off by default.
        Determines whether output of intermediate jobs is compressed.
    pig.tmpfilecompression.codec=lzo|gzip; default is gzip.
        Used in conjunction with pig.tmpfilecompression. Defines compression type.
    pig.noSplitCombination=true|false. Split combination is on by default.
        Determines if multiple small files are combined into a single map.
    pig.exec.mapPartAgg=true|false. Default is false.
        Determines if partial aggregation is done within map phase, before records are sent to combiner.
    pig.exec.mapPartAgg.minReduction=. Default is 10.
        If the in-map partial aggregation does not reduce the output num records by this factor, it gets disabled.
Miscellaneous:
    exectype=mapreduce|tez|local; default is mapreduce. This property is the same as -x switch
    pig.additional.jars.uris=. Used in place of register command.
    udf.import.list=. Used to avoid package names in UDF.
    stop.on.failure=true|false; default is false. Set to true to terminate on the first error.
    pig.datetime.default.tz=. e.g. +08:00. Default is the default timezone of the host.
        Determines the timezone used to handle datetime datatype and UDFs.
Additionally, any Hadoop property can be specified.
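To make the format concrete, here is a sketch that writes a small sample pig.properties into a scratch directory and reads one value back. The property names come from the list above; the chosen values are purely illustrative, and the `get_prop` helper is hypothetical.

```shell
# Write a sample pig.properties to a scratch directory (not Pig's real conf folder).
CONF_DIR=$(mktemp -d)
cat > "$CONF_DIR/pig.properties" <<'EOF'
# Run locally instead of on the cluster (same as the -x switch)
exectype=local
# Compress the output of intermediate jobs with gzip
pig.tmpfilecompression=true
pig.tmpfilecompression.codec=gzip
# Terminate on the first error
stop.on.failure=true
EOF

# get_prop FILE KEY: print the value of KEY. (Hypothetical helper; it handles
# only plain key=value lines, not the full Java properties syntax, and treats
# dots in KEY as regex wildcards.)
get_prop() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

get_prop "$CONF_DIR/pig.properties" exectype   # prints: local
```

In a real installation the same key=value lines would go into the pig.properties file in Pig's conf folder.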
Verify the installation
Verify the Apache Pig installation by running the version command. If the installation succeeded, it prints the Apache Pig version, as shown below.
$ pig -version
Apache Pig version 0.16.0 (r1682971) compiled Jun 01 2015, 11:44:35
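In a setup script you may want to extract just the version number from that output. The sketch below parses the sample line shown in this tutorial; the function name `pig_version` is hypothetical, and the output format is assumed to match that sample.

```shell
# pig_version: read version output on stdin and print only the version number.
# (Hypothetical helper; assumes the "Apache Pig version X.Y.Z ..." format.)
pig_version() {
    sed -n 's/^Apache Pig version \([0-9][0-9.]*\).*/\1/p'
}

# Sample line copied from the tutorial. On a machine with Pig installed,
# replace the echo with the real version command.
echo 'Apache Pig version 0.16.0 (r1682971) compiled Jun 01 2015, 11:44:35' | pig_version
# prints: 0.16.0
```

If the input does not match the expected format, the function prints nothing, which a calling script can treat as a failed check.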