A Python package for downloading thousands of files concurrently
BlackFeed
BlackFeed is a tiny Python library that lets you download and upload files concurrently. You can download files locally, or upload them to the cloud without ever writing them to disk.
Required packages
Installed automatically by pip:
- requests
- boto3
Installation
pip install blackfeed
Usage
Downloading files and uploading them to AWS S3. For this to work, the AWS CLI must be configured with valid credentials.
from blackfeed import Downloader
from blackfeed.adapter import S3Adapter
queue = [
    {
        'url': 'https://www.example.com/path/to/image.jpg',  # Required
        'destination': 'some/key/image.jpg'  # S3 key - Required
    },
    {
        'url': 'https://www.example.com/path/to/image2.jpg',
        'destination': 'some/key/image2.jpg'
    }
]
downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,        # If True, uploads files to S3 with multithreading
    stateless=False,   # If False, generates and stores md5 hashes of files in a state file
    state_id='flux_states',  # Name of the file where hashes are stored (flux_states.txt) - not required
    bulksize=200       # Number of concurrent downloads
)
downloader.process(queue)
stats = downloader.get_stats() # Returns a dict with information about the process
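The bulksize/multi behaviour above can be sketched independently of blackfeed: split the queue into bulks and hand each bulk's items to worker threads. The fetch() function below is a hypothetical stand-in for the real download/upload step, not part of blackfeed's API.

```python
# Minimal sketch of bulk multithreaded processing, assuming the queue
# format shown above. fetch() is a hypothetical placeholder for the
# actual download-and-upload of one item.
from concurrent.futures import ThreadPoolExecutor

def fetch(item):
    # Placeholder: would download item['url'] and upload it to
    # item['destination'] in a real implementation.
    return {'url': item['url'], 'destination': item['destination'], 'status': True}

def process(queue, bulksize=200):
    responses = []
    # Split the queue into bulks of at most `bulksize` items.
    for i in range(0, len(queue), bulksize):
        bulk = queue[i:i + bulksize]
        # Each item in the bulk is processed in its own worker thread.
        with ThreadPoolExecutor(max_workers=len(bulk)) as pool:
            responses.extend(pool.map(fetch, bulk))
    return responses

queue = [{'url': f'https://www.example.com/img{n}.jpg',
          'destination': f'some/key/img{n}.jpg'} for n in range(5)]
results = process(queue, bulksize=2)
print(len(results))  # 5
```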
Downloading files with states
Loading states is useful if you don't want to download the same file twice.
from blackfeed import Downloader
from blackfeed.adapter import S3Adapter
queue = [
    ...
]
downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,
    stateless=False,
    state_id='filename'
)
# You can add a callback function if needed.
# It will be called after each bulk is processed.
def callback(responses):
    # responses is a list of response dicts:
    # {
    #     'destination': destination of the file (local path or S3 key),
    #     'url': URL the file was downloaded from,
    #     'httpcode': HTTP status code returned by the server,
    #     'status': True|False,
    #     'content-type': MIME type of the downloaded resource, e.g. image/jpeg
    # }
    pass  # Your logic
downloader.set_callback(callback)
downloader.load_states('filename') # This will load states from "filename.txt"
downloader.process(queue)
stats = downloader.get_stats() # Statistics
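The state mechanism (stateless=False) can be illustrated with a self-contained sketch: remember an md5 hash per destination and skip entries whose content hash is unchanged. The `dest;md5` line format and helper names here are assumptions for illustration, not blackfeed's actual on-disk format.

```python
# Hedged sketch of a state file: one "destination;md5" line per entry.
# This is an assumed format, not blackfeed's internal one.
import hashlib
import os
import tempfile

def load_states(path):
    # Read previously stored destination -> md5 pairs, if the file exists.
    states = {}
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                dest, digest = line.strip().split(';')
                states[dest] = digest
    return states

def save_states(states, path):
    with open(path, 'w') as f:
        for dest, digest in states.items():
            f.write(f'{dest};{digest}\n')

def should_process(states, dest, content):
    # Skip the item when its content hash matches the stored one.
    digest = hashlib.md5(content).hexdigest()
    if states.get(dest) == digest:
        return False  # unchanged: already downloaded
    states[dest] = digest
    return True

state_file = os.path.join(tempfile.mkdtemp(), 'flux_states.txt')
states = load_states(state_file)
first = should_process(states, 'some/key/image.jpg', b'payload')   # True
second = should_process(states, 'some/key/image.jpg', b'payload')  # False
save_states(states, state_file)
reloaded = load_states(state_file)
```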
ElasticDownloader
Lets you easily download/retrieve files from FTP, SFTP, and HTTP/S servers.
Examples
Downloading a file from FTP
from blackfeed import ElasticDownloader
uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'
retriever = ElasticDownloader()
res = retriever.download(uri, localpath='/tmp/myfile.csv')  # localpath is optional
# download() returns the local path of the downloaded file on success, or False on error.
print(res)
# /tmp/myfile.csv
Retrieving the binary content of a file from FTP
from blackfeed import ElasticDownloader
uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'
retriever = ElasticDownloader()
res = retriever.retrieve(uri) # Return type: io.BytesIO | False
with open('/tmp/myfile.csv', 'wb') as f:
    f.write(res.getvalue())
ElasticDownloader handles FTP, SFTP, and HTTP URIs automatically. Use the download method to save a file locally, and the retrieve method to get a file's binary content.
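One common way such scheme-based routing is implemented is a dispatch table keyed on the parsed URI scheme. The sketch below illustrates the idea; the handler names are hypothetical and not part of ElasticDownloader's API.

```python
# Sketch of routing a URI to a protocol handler by its scheme.
# Handler names are placeholders, not ElasticDownloader internals.
from urllib.parse import urlparse

HANDLERS = {
    'ftp': 'ftp_handler',
    'sftp': 'sftp_handler',
    'http': 'http_handler',
    'https': 'http_handler',
}

def pick_handler(uri):
    scheme = urlparse(uri).scheme.lower()
    handler = HANDLERS.get(scheme)
    if handler is None:
        return False  # mirrors download()/retrieve() returning False on error
    return handler

print(pick_handler('ftp://user:password@ftp.server.com/path/to/file.csv'))  # ftp_handler
```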