A Python package that lets you download thousands of files concurrently

Project description

BlackFeed

BlackFeed is a tiny Python library that lets you download and upload files concurrently. You can download files locally, but you can also upload them to the cloud without writing them to disk.

Required packages

Installed automatically with pip:

  • requests
  • boto3

Installation

pip install blackfeed

Usage

Downloading files and uploading them to AWS S3. For this to work, the AWS CLI must be configured.

from blackfeed import Downloader
from blackfeed.adapter import S3Adapter

queue = [
    {
        'url': 'https://www.example.com/path/to/image.jpg', # Required
        'destination': 'some/key/image.jpg' # S3 key - Required 
    },{
        'url': 'https://www.example.com/path/to/image2.jpg',
        'destination': 'some/key/image2.jpg' 
    }
]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True, # If True, downloads and uploads files to S3 with multithreading
    stateless=False, # If set to False, generates and stores MD5 hashes of files in a file
    state_id='flux_states', # Name of the file where hashes will be stored (here flux_states.txt) - optional
    bulksize=200 # Number of concurrent downloads
)
downloader.process(queue)
stats = downloader.get_stats() # Returns a dict with information about the process
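Since the queue is a plain list of dicts with 'url' and 'destination' keys, it can also be built programmatically. A minimal sketch (the build_queue helper and the some/key prefix are illustrative, not part of the library):

```python
from urllib.parse import urlparse
import posixpath

def build_queue(urls, prefix='some/key'):
    """Build a Downloader queue from a list of URLs, deriving each
    destination key from the file name in the URL path."""
    queue = []
    for url in urls:
        filename = posixpath.basename(urlparse(url).path)
        queue.append({'url': url, 'destination': f'{prefix}/{filename}'})
    return queue
```

The resulting list can be passed directly to downloader.process().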

Downloading files with states

Loading states is useful if you do not want to download the same file twice.

from blackfeed import Downloader
from blackfeed.adapter import S3Adapter

queue = [
...
]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,
    stateless=False,
    state_id='filename'
)

# You can add a callback function if needed
# This function will be called after each bulk is processed
def callback(responses):
    # Each response is a dict:
    # {
    #    'destination': destination of the file (local path or S3 key),
    #    'url': URL from which the file was downloaded,
    #    'httpcode': HTTP status code returned by the server,
    #    'status': True|False,
    #    'content-type': MIME type of the downloaded resource, e.g. image/jpeg
    # }
    # responses: a list of such response dicts

    pass # Your logic

downloader.set_callback(callback)

downloader.load_states('filename') # This will load states from "filename.txt"
downloader.process(queue)
stats = downloader.get_stats() # Statistics 
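A concrete callback might report the downloads that failed in each bulk, using only the response fields documented above. A minimal sketch (log_failures is a hypothetical name, not part of the library):

```python
def log_failures(responses):
    """Collect the failed responses from a processed bulk and
    print a short diagnostic line for each one."""
    failed = [r for r in responses if not r['status']]
    for r in failed:
        print(f"Failed: {r['url']} (HTTP {r['httpcode']})")
    return failed
```

It would be registered the same way: downloader.set_callback(log_failures).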

ElasticDownloader

Lets you easily download/retrieve files from FTP, SFTP, and HTTP/S servers.

Examples

Downloading a file from FTP

from blackfeed import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'

retriever = ElasticDownloader()
res = retriever.download(uri, localpath='/tmp/myfile.csv') # localpath is optional
# .download() returns False if there was an error, or the local path of the downloaded file on success.
print(res) # /tmp/myfile.csv

Retrieving the binary content of a file from FTP

from blackfeed import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'

retriever = ElasticDownloader()
res = retriever.retrieve(uri) # Return type: io.BytesIO | False

if res is not False:
    with open('/tmp/myfile.csv', 'wb') as f:
        f.write(res.getvalue())

ElasticDownloader handles FTP, SFTP, and HTTP URIs automatically. Use the download method to download a file locally, and the retrieve method to get the binary content of a file.
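Because retrieve() returns an io.BytesIO, the content can be processed entirely in memory instead of being written to disk first. A minimal sketch of such a consumer, assuming the file is UTF-8 CSV (rows_from_buffer is a hypothetical helper, not part of the library):

```python
import csv
import io

def rows_from_buffer(buf):
    """Parse CSV rows from an in-memory buffer such as the one
    returned by ElasticDownloader.retrieve(). Returns an empty
    list for a falsy buffer (retrieve() returns False on error)."""
    if not buf:
        return []
    text = io.TextIOWrapper(buf, encoding='utf-8')
    return list(csv.reader(text))
```

For example, rows_from_buffer(retriever.retrieve(uri)) would yield the CSV rows without creating a temporary file.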

Download files

Built distribution

blackfeed-0.0.19-py3-none-any.whl (11.2 kB)