
Yarn临时目录不足导致Hive任务失败


The following problem occurred while creating a new partitioned table from an existing Hive table:

  1. The existing Hive table holds 160 GB of data (three months of user access records across all applications and servers)
  2. The new table keeps only the required fields and is partitioned by application / server IP / access date
  3. -- create the table
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    
    CREATE TABLE IF NOT EXISTS app_trace(
          trace_id string,
          client_ip string,
          user_device string,
          user_id string,
          user_account string,
          org_id string,
          org_name string,
          org_path string,
          org_parent_id string,
          url string,
          completed boolean,
          cost int,
          create_time bigint,
          parameters map<string,string>,
          subtrace array<string>
    )
    PARTITIONED BY (app_id int,server_ip string,create_date string)
    ROW FORMAT DELIMITED
          FIELDS TERMINATED BY '|'
          COLLECTION ITEMS TERMINATED BY '$'
          MAP KEYS TERMINATED BY ':'
    STORED AS SEQUENCEFILE;
    
    -- load the data
    INSERT OVERWRITE TABLE app_trace PARTITION (app_id, server_ip, create_date)
      select
          trace_id,
          client_ip,
          user_device,
          user_id,
          user_account,
          org_id,
          org_name,
          org_path,
          org_parent_id,
          url,
          completed,
          cost,
          create_time,
          parameters,
          subtrace,
          app_id,
          server_ip,
          create_date
      from user_trace;
  4. Hive reported the following error:
    Task with the most failures(4):
    -----
    Task ID:
    task_1418272031284_0203_r_000071

    URL:
    http://HADOOP-5-101:8088/taskdetails.jsp?jobid=job_1418272031284_0203&tipid=task_1418272031284_0203_r_000071
    -----
    Diagnostic Messages for this Task:
    Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in InMemoryMerger - Thread to merge in-memory shuffled map-outputs
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
    Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:221)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:250)
    at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:208)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$InMemoryMerger.merge(MergeManagerImpl.java:476)
    at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
    Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:219)
    ... 11 more


    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    MapReduce Jobs Launched:
    Job 0: Map: 282 Reduce: 80 Cumulative CPU: 12030.1 sec HDFS Read: 79178863622 HDFS Write: 15785449373 FAIL
    Total MapReduce CPU Time Spent: 0 days 3 hours 20 minutes 30 seconds 100 msec
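The error snippet above is truncated; the full stack traces for every failed attempt live in the aggregated container logs, which the standard `yarn logs` CLI can fetch. A small sketch, using the job id from the error above (the actual `yarn logs` call needs a cluster node with log aggregation enabled, so it is shown commented out):

```shell
# Derive the YARN application id from the MapReduce job id
# (same numeric part, different prefix).
JOB_ID=job_1418272031284_0203
APP_ID="application_${JOB_ID#job_}"
echo "$APP_ID"
# On a cluster node, pull the aggregated container logs for the failed job:
# yarn logs -applicationId "$APP_ID"
```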

Investigation revealed the following:

  1. HDFS storage is healthy
    [jyzx@HADOOP-5-101 main_disk]$ hdfs dfs -df -h
    Filesystem Size Used Available Use%
    hdfs://HADOOP-5-101:8020 8.9 T 625.9 G 7.8 T 7%
  2. The DataNode's local storage is not: the root partition is 99% full
    [jyzx@HADOOP-5-101 main_disk]$ df -h
    Filesystem Size Used Avail Use% Mounted on
    /dev/mapper/VolGroup-lv_root
    50G 46G 837M 99% /
    tmpfs 7.8G 56K 7.8G 1% /dev/shm
    /dev/cciss/c0d0p1 485M 32M 428M 7% /boot 
  3. The directory actually causing the problem:
    /hadoop/yarn/local/usercache

    [root@HADOOP-6-199 local]# du -h --max-depth=1
    4.0K ./usercache_DEL_1411698127772
    4.0K ./usercache_DEL_1411700964513
    4.0K ./usercache_DEL_1411713191383
    4.0K ./usercache_DEL_1418272057670
    4.0K ./usercache_DEL_1411699568217
    628K ./filecache
    4.0K ./usercache_DEL_1411713338641
    7.2G ./usercache
    4.0K ./usercache_DEL_1411698079868
    4.0K ./usercache_DEL_1411713240205
    104K ./nmPrivate
    7.2G .
  4. /hadoop/yarn/local/usercache
    is the YARN NodeManager's local working directory, configured as
    yarn.nodemanager.local-dirs=/hadoop/yarn/local/usercache
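The checks above can be scripted: find which filesystem backs the NodeManager local dir and what is consuming it. The path below is this cluster's value of yarn.nodemanager.local-dirs (substitute your own); the fallback to `/` is only so the sketch runs anywhere:

```shell
# Assumed local dir; substitute your cluster's yarn.nodemanager.local-dirs value.
YARN_LOCAL=${YARN_LOCAL:-/hadoop/yarn/local}
# Fall back to / so the sketch is runnable outside the cluster.
[ -d "$YARN_LOCAL" ] || YARN_LOCAL=/
# Which mount backs the local dir, and how full is it?
df -h "$YARN_LOCAL"
# Largest entries under it (usercache was the culprit here).
du -sh "$YARN_LOCAL"/* 2>/dev/null | sort -h | tail -5
```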
     

Solution

  • Simply point the YARN setting yarn.nodemanager.local-dirs at a larger volume
  • yarn.nodemanager.local-dirs=/mnt/disk1/hadoop/yarn/local/usercache
  • Restart the YARN cluster
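In yarn-site.xml the change corresponds to a property like the one below (a config fragment; the paths are illustrative). Note that yarn.nodemanager.local-dirs also accepts a comma-separated list, so spreading it over several disks adds both capacity and shuffle I/O bandwidth:

```xml
<!-- yarn-site.xml: point NodeManager local dirs at larger disks.
     Illustrative paths; a comma-separated list spreads load across disks. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/mnt/disk1/hadoop/yarn/local,/mnt/disk2/hadoop/yarn/local</value>
</property>
```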

 
