文章目录
- 0. 数据准备
- 1. polygon的坐标系转换
- 1.1 polygon生成
- 1.1.1 输入数据是shapefile
- 1.1.2 输入数据是polygon
- 1.2 搞清楚遥感的坐标系和polygon的坐标系(重点)
- 1.3 开始转换
- 2. 基于polygon的遥感影像裁剪
- 2.1 基础裁剪方法
- 2.1.1 使用rasterio保存
- 2.1.2 使用numpy保存
- 2.2 多线程裁剪
- 3. 成品代码
传统裁剪遥感影像的方式是使用gdal裁剪。但是gdal需要安装相应的库和依赖非常麻烦。rasterio就不需要安装gdal,是非常好的替代品。
附带一些同样不需要gdal就能够python安装的地理库
pyproj
shapely
# GDAL # 咱就是说不要你了
# osgeo # 这个库就是对gdal的封装,本质还是gdal
Fiona
geopandas
rasterio
pyshp
0. 数据准备
- 原始遥感影像
- polygon数据
- python库:rasterio
1. polygon的坐标系转换
坐标系概念讲的很好的文章:https://www.cnblogs.com/onsummer/p/12081889.html
这里需要介绍坐标系的几个概念
- 地理坐标系统,英文简写GCS,Geographical Coordinate System。说白了就是经纬度
- 投影坐标系,英文简写PCS,Projection Coordinate System。说白了就是米为单位的坐标系
遥感影像裁剪的时候,需要在投影坐标系的基础上进行裁剪。因此需要进行坐标系转换。坐标系的转换分成三步
1.1 polygon生成
这一步,我们需要得到shapely.geometry.Polygon
的对象,有两种数据输入格式
1.1.1 输入数据是shapefile
shapefile中可以包含多个polygon,因此需要一个for循环遍历得到所有的polygon。此时获得的polygon自动就是shapely.geometry.Polygon
对象
shpfile = 'beijing_test_2_wgs84.shp'
shpdata = gpd.read_file(shpfile)# 其实这里就可以执行1.2的坐标系转换了,可以看了1.2后回来再看这里
# shpdata = shpdata.to_crs(arcgis_crs)for j in range(0, len(shpdata)):# 此时获得的polygon自动就是`shapely.geometry.Polygon`对象polygon = shpdata.geometry[j]
1.1.2 输入数据是polygon
通过以下代码生成shapely.geometry.Polygon
对象
def create_polygon(polygon_str):"""生成polygon对象:param polygon_str: polygon字符串,形如"x1,y1;x2,y2;x3,y3":return: shapely.geometry.Polygon对象"""coords = []for coord in polygon_str.split(';'):x, y = coord.split(',')coords.append((float(x), float(y)))polygon = Polygon(coords)return polygon
1.2 搞清楚遥感的坐标系和polygon的坐标系(重点)
转换的时候需要用到crs
模块,表示坐标系。由于polygon的坐标系需要向遥感的坐标系靠近,因此需要获得遥感的坐标系,如下代码所示
import rasterio as rio
with rio.open(rasterPath) as rasterdata:out_crs = rasterdata.crs
我们处理数据的时候遇到一个问题,就是遥感的坐标系是arcgis自定义的一种坐标系,因此需要自己生成crs。生成crs的方法有以下几种
from rasterio import crs
crs.CRS.from_epsg # 常用
crs.CRS.from_wkt # 常用
crs.CRS.from_authority
crs.CRS.from_dict
crs.CRS.from_proj4
crs.CRS.from_string
crs.CRS.from_user_input
我们用到的crs是这样的,是arcgis自己的一套坐标系。如果不用相同的方式改变polygon的坐标系,就会导致裁剪歪掉
out_crs = arcgis_crs = crs.CRS.from_wkt("""PROJCRS["WGS_1984_Web_Mercator_Auxiliary_Sphere",BASEGEOGCRS["WGS 84",DATUM["World Geodetic System 1984",ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],ID["EPSG",4326]],CONVERSION["Mercator (variant A)",METHOD["Mercator (variant A)",ID["EPSG",9804]],PARAMETER["Latitude of natural origin",0,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8801]],PARAMETER["Longitude of natural origin",0,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8802]],PARAMETER["Scale factor at natural origin",1,SCALEUNIT["unity",1],ID["EPSG",8805]],PARAMETER["False easting",0,LENGTHUNIT["metre",1],ID["EPSG",8806]],PARAMETER["False northing",0,LENGTHUNIT["metre",1],ID["EPSG",8807]]],CS[Cartesian,2],AXIS["easting",east,ORDER[1],LENGTHUNIT["metre",1,ID["EPSG",9001]]],AXIS["northing",north,ORDER[2],LENGTHUNIT["metre",1,ID["EPSG",9001]]]]""")
1.3 开始转换
将polygon对象转换成gdf(geodataframe)对象,然后指定目标crs进行转换。
import geopandas as gpd
from rasterio import crspolygon = "x1,y1;x2,y2;x3,y3"
# 1.首先将polygon转换成gdf
raw_crs = crs.CRS.from_epsg(4326)
polygon = create_polygon(polygon) # 1.1.2定义的函数
gdf = gpd.GeoDataFrame(geometry=[polygon], crs=raw_crs) # 这里的raw_crs# 2.其次将gdf进行坐标系转换,并再次获得polygon
gdf = gdf.to_crs(out_crs) # 1.2定义的out_crs
polygon = gdf.geometry[0]
2. 基于polygon的遥感影像裁剪
得到polygon后就可以进行裁剪了,有下面两种方法
2.1 基础裁剪方法
直接用polygon得到的信息得到mask遮罩,就可以获取裁剪后的结果了。
import rasterio as rio
from rasterio.mask import mask
with rio.open(rasterPath) as rasterdata:out_crs = rasterdata.crsfeature = [polygon.__geo_interface__] # 1.3中得到的polygonout_image, out_transform = mask(rasterdata, feature, all_touched=True, crop=True, nodata=255)
2.1.1 使用rasterio保存
使用rasterio可以保存更多的地理信息
import rasterio as rio
from rasterio.mask import mask
with rio.open(rasterPath) as rasterdata:# 1.获取基本信息out_meta = rasterdata.meta.copy() # 2.裁剪out_crs = rasterdata.crsfeature = [polygon.__geo_interface__] # 1.3中得到的polygonout_image, out_transform = mask(rasterdata, feature, all_touched=True, crop=True, nodata=255)# 3.更新信息并保存out_meta.update(height=out_image.shape[1],width=out_image.shape[2],shape=(out_image.shape[1], out_image.shape[2]),nodata=255,crs=rasterdata.crs,bounds=[],transform=out_transform,)with rio.open(f"{文件输出路径}", mode='w', **out_meta) as dst:dst.write(out_image)
2.1.2 使用numpy保存
使用numpy就是纯纯保存裁剪结果了
import rasterio as rio
from rasterio.mask import mask
with rio.open(rasterPath) as rasterdata:# 1.获取基本信息out_meta = rasterdata.meta.copy() # 2.裁剪out_crs = rasterdata.crsfeature = [polygon.__geo_interface__] # 1.3中得到的polygonout_image, out_transform = mask(rasterdata, feature, all_touched=True, crop=True, nodata=255)# 3.保存out_image_PIL = Image.fromarray(np.transpose(out_image, (1, 2, 0))).convert('RGB')out_image_PIL.save(f"{文件输出路径}", "PNG", quality=95, optimize=True, progressive=True)
2.2 多线程裁剪
如果直接用Pool
, ProcessPoolExecutor
等库进行裁剪,会遇到如下问题:
TypeError: self._hds cannot be converted to a Python object for pickling
这是由于 https://github.com/rasterio/rasterio/issues/1731#issuecomment-514781590
栅格数据集(rasterio DatasetReader类型)不能被pickle,也不能在进程或线程之间共享。解决方法是分发数据集标识符(路径或 URI),然后在新线程中打开它们。
解决方法-参照官方文档:https://rasterio.readthedocs.io/en/latest/topics/concurrency.html
"""thread_pool_executor.pyOperate on a raster dataset window-by-window using a ThreadPoolExecutor.Simulates a CPU-bound thread situation where multiple threads can improve
performance.With -j 4, the program returns in about 1/4 the time as with -j 1.
"""import concurrent.futures
import multiprocessing
import threadingimport rasterio
from rasterio._example import computedef main(infile, outfile, num_workers=4):"""Process infile block-by-block and write to a new fileThe output is the same as the input, but with band orderreversed."""with rasterio.open(infile) as src:# Create a destination dataset based on source params. The# destination will be tiled, and we'll process the tiles# concurrently.profile = src.profileprofile.update(blockxsize=128, blockysize=128, tiled=True)with rasterio.open(outfile, "w", **src.profile) as dst:windows = [window for ij, window in dst.block_windows()]# We cannot write to the same file from multiple threads# without causing race conditions. To safely read/write# from multiple threads, we use a lock to protect the# DatasetReader/Writerread_lock = threading.Lock()write_lock = threading.Lock()def process(window):with read_lock:src_array = src.read(window=window)# The computation can be performed concurrentlyresult = compute(src_array)with write_lock:dst.write(result, window=window)# We map the process() function over the list of# windows.with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:executor.map(process, windows)
3. 成品代码
多线程方式裁剪
import rasterio as rio
from rasterio.mask import mask
import geopandas as gpdimport concurrent.futures
import threading
from rasterio import crs
import numpy as np
from PIL import Image
import timedef save_result(out_image, name):out_image_PIL = Image.fromarray(np.transpose(out_image, (1, 2, 0))).convert('RGB')out_image_PIL.save(name, "PNG", quality=95, optimize=True, progressive=True)print(name, "保存完毕")return nameif __name__ == '__main__':rasterPath = '/Users/donganning/Desktop/dan_ali_work_space/各种文件/临时实验数据/乌海市/Level16/乌海市.tif'shpfile = '/Users/donganning/Desktop/dan_ali_work_space/Air_Pro/air_project/测试遥感影像剪切/测试/乌海市shp/wuhai_test_1_wgs84.shp'"""1. polygon坐标系转换"""shpdata = gpd.read_file(shpfile)raw_crs = crs.CRS.from_epsg(4326)shpdata = shpdata.to_crs(raw_crs)polygon = shpdata.geometry[0]polygons = [polygon]*5"""2. 多线程处理"""result_list = []"""2.1 读取数据"""with rio.open(rasterPath) as rasterdata:# 定义读取锁read_lock = threading.Lock()"""2.2 定义处理方法"""def process(polygon):"""使用锁锁住rasterio DatasetReader对象,执行裁剪操作"""with read_lock:feature = [polygon.__geo_interface__]# 通过feature裁剪栅格影像out_image, out_transform = mask(rasterdata, feature, all_touched=True, crop=True,nodata=255)# 裁剪完毕,可以写入result = save_result(out_image, f"out-{time.time()}.png")result_list.append(result)"""2.3 执行多线程操作"""with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:executor.map(process, polygons)print(result_list)