The table data is as follows:

user_id    interest_array
tom        [a,b,c,d,g,w]
bob        [e,d,s,d,g,w,s]
cat        [a]
harry      []
peter      NULL
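The goal is to take, in order, the first 3 elements of "interest_array" in each row and return them as an array. The expected output is:

user_id    output_array
tom        [a,b,c]
bob        [e,d,s]
cat        [a]
harry      []
peter      NULL

Implementation

1. A simple approach. It does not work correctly when the source array may hold fewer than 3 elements, because the missing positions come back as NULL in the result array.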
with mydata as (
    select array('a','b','c','d','g','w') as original_array
)
select original_array,
       array(original_array[0], original_array[1], original_array[2]) as first_3_array
from mydata
Result:

original_array               first_3_array
["a","b","c","d","g","w"]    ["a","b","c"]

2. Another approach, using explode, which handles arrays of any length:
Explode the array with posexplode, keep only positions <= 2, then collect the elements back into an array:
with mydata as (
    select array('a','b','c','d','g','w') as original_array
)
select original_array,
       collect_list(e.element) as first_3_array
from mydata
     lateral view outer posexplode(original_array) e as pos, element
-- without the NULL check, the row that OUTER emits for an empty or NULL array would be filtered out
where e.pos <= 2 or e.pos is null
group by original_array
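Result:

original_array               first_3_array
["a","b","c","d","g","w"]    ["a","b","c"]

3. A more efficient approach that avoids explode: join the array with a comma delimiter, use a regexp to extract the substring holding at most the first 3 elements, then split it again: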
with mydata as (
    select array('a') as original_array
)
select original_array,
       split(
           regexp_replace(
               regexp_extract(concat_ws(',', original_array), '^(([^,]*,?){1,3})', 1),
               ',$', ''    -- remove the trailing delimiter
           ),
           ','
       ) as first_3_array
from mydata
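
Returning to method 1: its problem with short arrays can likely be avoided by checking the array length first. The query below is a minimal sketch of that idea, not something from the original write-up. It assumes that `size()` is available and that `case` accepts array-typed branches, which may depend on the engine and version. The two sample rows reuse tom and cat from the table at the top.

```sql
-- Sketch: guard the index-based method with a length check (assumption, see above).
with mydata as (
    select 'tom' as user_id, array('a','b','c','d','g','w') as interest_array
    union all
    select 'cat' as user_id, array('a') as interest_array
)
select user_id,
       case
           -- at least 3 elements: indexes 0..2 all exist, so no NULL padding appears
           when size(interest_array) >= 3
               then array(interest_array[0], interest_array[1], interest_array[2])
           -- fewer than 3 elements: the array is already its own "first 3"
           else interest_array
       end as output_array
from mydata
```

For a NULL array the `when` condition is not true, so the `else` branch returns the NULL unchanged. Unlike method 3, this variant never joins the elements into a string, so it does not depend on the elements being comma-free.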