For our undergraduate innovation project we originally planned to build a dataset ourselves, so I wrote a script for putting together a face image dataset. At a later meeting we chose a different approach, the dataset no longer had to be prepared by us, and the script ended up being of no use to our project.
Still, I want to record it here. This was my first time using this kind of API service on my own, and I was genuinely amazed by its speed and accuracy: far faster than any web scraper, and with much higher quality results. That really is the power of performance.
The script uses the Bing Image Search API. On Microsoft Azure you can use student verification to apply for a free resource of 1,000 calls per month; once you have the key, fill it into API_KEY in the script below.
Apply and create the resource here: https://www.microsoft.com/en-us/bing/apis/bing-image-search-api
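One aside before the full script: instead of hardcoding the key, it could also be read from an environment variable so it never ends up in version control. A minimal sketch of that idea; the variable name BING_API_KEY is just a convention I picked here, not something the API requires:

import os

# read the subscription key from an environment variable instead of hardcoding it;
# BING_API_KEY is a name chosen for this sketch
API_KEY = os.environ.get("BING_API_KEY")
if not API_KEY:
    raise RuntimeError("set the BING_API_KEY environment variable before running")

The full script follows.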
# -*- coding: utf-8 -*-
# @Author : XFishalways
# @Time : 2022/9/20 12:56 AM
# @Function: use the Bing Image Search API to create a celebrity dataset
# import the necessary packages
from requests import exceptions
import argparse
import requests
import cv2
import os
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-q", "--query", required=True,
help="search query to search Bing Image API for")
ap.add_argument("-o", "--output", required=True,
help="path to output directory of images")
args = vars(ap.parse_args())
# set your Microsoft Cognitive Services API key along with (1) the
# maximum number of results for a given search and (2) the group size
# for results (maximum of 3 per request)
API_KEY = "2e80ad1dd6fd48acb2f07a241877d590"
MAX_RESULTS = 6
GROUP_SIZE = 3
# set the endpoint API URL
URL = "https://api.bing.microsoft.com/v7.0/images/search"
# when attempting to download images from the web, both the Python
# programming language and the requests library have a number of
# exceptions that can be thrown, so let's build a list of them now
# so we can filter on them
EXCEPTIONS = {IOError, FileNotFoundError, exceptions.RequestException,
    exceptions.HTTPError, exceptions.ConnectionError, exceptions.Timeout}
# store the search term in a convenience variable then set the
# headers and search parameters
term = args["query"]
headers = {"Ocp-Apim-Subscription-Key": API_KEY}
params = {"q": term, "offset": 0, "count": GROUP_SIZE}
# make the search
print("[INFO] searching Bing API for '{}'".format(term))
search = requests.get(URL, headers=headers, params=params)
search.raise_for_status()
# grab the results from the search, including the total number of
# estimated results returned by the Bing API
results = search.json()
estNumResults = min(results["totalEstimatedMatches"], MAX_RESULTS)
print("[INFO] {} total results for '{}'".format(estNumResults,
term))
# initialize the total number of images downloaded thus far
total = 0
# loop over the estimated number of results in `GROUP_SIZE` groups
for offset in range(0, estNumResults, GROUP_SIZE):
    # update the search parameters using the current offset, then
    # make the request to fetch the results
    print("[INFO] making request for group {}-{} of {}...".format(
        offset, offset + GROUP_SIZE, estNumResults))
    params["offset"] = offset
    search = requests.get(URL, headers=headers, params=params)
    search.raise_for_status()
    results = search.json()
    print("[INFO] saving images for group {}-{} of {}...".format(
        offset, offset + GROUP_SIZE, estNumResults))
    # loop over the results
    for v in results["value"]:
        # try to download the image
        try:
            # make a request to download the image
            print("[INFO] fetching: {}".format(v["contentUrl"]))
            r = requests.get(v["contentUrl"], timeout=30)
            # build the path to the output image
            ext = v["contentUrl"][v["contentUrl"].rfind("."):]
            p = os.path.sep.join([args["output"], "{}{}".format(
                str(total).zfill(8), ext)])
            # write the image to disk
            f = open(p, "wb")
            f.write(r.content)
            f.close()
        # catch any errors that would prevent us from downloading the
        # image
        except Exception as e:
            # check to see if our exception is in our list of
            # exceptions to check for
            if type(e) in EXCEPTIONS:
                print("[INFO] skipping: {}".format(v["contentUrl"]))
                continue
        # try to load the image from disk
        image = cv2.imread(p)
        # if the image is `None` then we could not properly load the
        # image from disk (so it should be ignored)
        if image is None:
            print("[INFO] deleting: {}".format(p))
            os.remove(p)
            continue
        # update the counter
        total += 1
You need to have the OpenCV environment set up first; on a Mac this is easy, a brew install is all it takes.
MAX_RESULTS and GROUP_SIZE can both be adjusted to suit your needs.
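If you do not want to edit the script every time, the two constants could also be exposed as optional command-line flags. A minimal sketch of that idea; the flag names --max-results and --group-size are hypothetical additions of mine, not part of the script above:

import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-q", "--query", required=True,
    help="search query to search Bing Image API for")
ap.add_argument("-o", "--output", required=True,
    help="path to output directory of images")
# hypothetical extra flags for the two tunable constants
ap.add_argument("-m", "--max-results", type=int, default=6,
    help="maximum total number of images to download")
ap.add_argument("-g", "--group-size", type=int, default=3,
    help="number of results to request per API call")
args = vars(ap.parse_args())

MAX_RESULTS = args["max_results"]
GROUP_SIZE = args["group_size"]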
The overall flow is roughly: loop over the results in batches, send a request for each batch, receive the JSON data, download and write the images, and at the same time use OpenCV to check whether each written image can actually be loaded. If it cannot, delete the file, leave the total counter unchanged and move on to the next result; if it loads, the download counts as a success and total += 1.
Another thing to note: list the exceptions that may occur in advance and wrap the handling of each JSON result in a try/except (a small variant of this pattern is sketched after the list). The possible exceptions include:
- path errors => problems reading or writing the file
- request errors => problems with the response or the returned data
- network errors => connection problems
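As an aside, the same filtering can also be written by handing the exception classes to except directly as a tuple, instead of catching Exception and checking type(e) against a set as the script above does. This is just an alternative sketch of the same idea, with a placeholder URL:

from requests import exceptions
import requests

# the same exception classes as in the script, but as a tuple so they can
# be passed straight to `except`
DOWNLOAD_ERRORS = (IOError, FileNotFoundError, exceptions.RequestException,
    exceptions.HTTPError, exceptions.ConnectionError, exceptions.Timeout)

url = "https://example.com/image.jpg"  # placeholder URL for illustration
try:
    r = requests.get(url, timeout=30)
except DOWNLOAD_ERRORS:
    # any of the listed errors just means "skip this image"
    print("[INFO] skipping: {}".format(url))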
Run it from the command line with two arguments, query and output, which are the search term and the output directory respectively; the output directory has to be created beforehand.
python3 search_bing_api.py --query "keyword" --output dataset/keyword
The path here is of course relative to where you run the script; in the example above the dataset folder sits next to the script file.
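If you would rather not create the output directory by hand every time, the script could create it itself right after parsing the arguments. A minimal sketch, assuming the same args dictionary as in the script above:

import os

# create the output directory (and any missing parents) if it does not exist yet;
# assumes `args` has already been parsed as in the script above
os.makedirs(args["output"], exist_ok=True)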
Overall the experience was very good. In my tests, a few dozen images were fetched almost instantly without any stalling, and the quality was high: the face images were all accurate and covered a variety of expressions and angles. Although I have no use for it at the moment, if I later need a dataset of some specific kind of image, not just faces but also all sorts of animals or objects, this feels like a very practical and efficient way to build it.
References:
https://learn.microsoft.com/en-us/azure/cognitive-services/bing-image-search/quickstarts/python
https://cloud.tencent.com/developer/article/1109410?utm_source=pocket_mylist