midway的使用教程-技术圈

一、写在前面

先说下本文的背景，这是一道笔者遇到的Node后端面试题，遂记录下，通过本文的阅读，你将对楼下知识点有所了解：

midway项目的创建与使用
typescript在Node项目中的应用
如何基于Node自身API封装请求
cheerio在项目中的应用
正则表达式在项目中的应用
单元测试

二、midway项目的创建和使用

第一步：输入命令**npm init midway**初始化midway项目

第二步：选择**koa-v3 - A web application boilerplate with midway v3(koa)**,按下回车

➜  www npm init midway
npx: installed 1 in 4.755s
? Hello, traveller.
  Which template do you like? …

 ⊙ v3
▸ koa-v3 - A web application boilerplate with midway v3(koa)
  egg-v3 - A web application boilerplate with midway v3(egg)
  faas-v3 - A serverless application boilerplate with midway v3(faas)
  component-v3 - A midway component boilerplate for v3

 ⊙ v2
  web - A web application boilerplate with midway and Egg.js
  koa - A web application boilerplate with midway and koa

第三步：输入你要创建的项目名称,例如**“midway-project”****， ****What name would you like to use for the new project? ‣ midway-project**

第四步：跟着提示走就好了,分别执行**cd midway-project**和**npm run dev**, 这个时候如果你没有特别设置的话，打开**http://localhost:7001**就可以看到效果了

➜  www npm init midway
npx: installed 1 in 4.755s
✔ Hello, traveller.
  Which template do you like? · koa-v3 - A web application boilerplate with midway v3(koa)
✔ What name would you like to use for the new project? · midway-project
Successfully created project midway-project
Get started with the following commands:

$ cd midway-project
$ npm run dev


Thanks for using Midway

Document ❤ Star: https://github.com/midwayjs/midway



   ╭────────────────────────────────────────────────────────────────╮
   │                                                                │
   │      New major version of npm available! 6.14.15 → 8.12.1      │
   │   Changelog: https://github.com/npm/cli/releases/tag/v8.12.1   │
   │               Run npm install -g npm to update!                │
   │                                                                │
   ╰────────────────────────────────────────────────────────────────╯

➜  www

具体的官网已经写的很详细了，不再赘述，参见：

三、如何抓取百度首页的内容

3.1、基于node自身API封装请求

在node.js的https模块有相关的get请求方法可以获取页面元素，具体的如下请参见：，我把它封装了一下

import { get } from 'https';

async function getPage(url = 'https://www.baidu.com/'): Promise {
  let data = '';
  return new Promise((resolve, reject) => {
    get(url, res => {
      res.on('data', chunk => {
        data += chunk;
      });

      res.on('error', err => reject(err));

      res.on('end', () => {
        resolve(data);
      });
    });
  });
}

额，你要测试这个方法，在node环境的话，其实也很简单的，这样写

(async () => {
  const ret = await getPage();
  console.log('ret:', ret);
})();

四、如何获取对应标签元素的属性

题目是，从获取的HTML源代码文本里，解析出id=lg的div标签里面的img标签，并返回此img标签上的src属性值

4.1、cheerio一把梭

如果你没赶上JQuery时代，那么其实你可以学下cheerio这个库，它有这个JQuery类似的API ------为服务器特别定制的，快速、灵活、实施的jQuery核心实现.具体的参见：，github地址是：

在了解了楼上的知识点以后呢，那其实就很简单了，调调API出结果。下文代码块的意思是，获取id为lg的div标签，获取它的子标签的img标签，然后调用了ES6中数组的高阶函数map，这是一个幂等函数，会返回与输入相同的数据结构的数据，最后调用get获取一下并字符串一下。

 @Get('/useCheerio')
  async useCheerio(): Promise> {
    const ret = await getPage();
    const $ = load(ret);
    const imgSrc = $('div[id=lg]')
      .children('img')
      .map(function () {
        return $(this).attr('src');
      })
      .get()
      .join(',');

    return packResp({ func: 'useCheerio', imgSrc });
  }

4.2、正则一把梭

看到一大坨字符串，嗯，正则也是应该要想到的答案。笔者正则不太好，这里写不出一步到位的正则，先写出匹配id为lg的div的正则，然后进一步匹配对应的img标签的src属性，是的，一步不行，那咱就走两步，最终结果和走一步是一样的。

 @Get('/useRegExp')
  async useRegExp(): Promise> {
    const ret = await getPage();
    // 匹配id为lg的div正则
    const reDivLg = /(?<=)(.*?)(?=<\/div>)/gi;
    // 匹配img标签的src属性
    const reSrc = //i;
    const imgSrc = ret.match(reDivLg)[0].match(reSrc)[1];

    return packResp({ func: 'useRegExp', imgSrc });
  }

五、单元测试

这里要实现两个测试点是，1、如果接口请求时间超过1秒钟，则Assert断言失败， 2、如果接口返回值不等于"//www.baidu.com/img/bd_logo1.png"，则Assert断言失败 midway集成了jest的单元测试, 官网已经写的很详细了，具体的参见：

关于1秒钟这事，我们可以计算下请求的时间戳，具体的如下：

const startTime = Date.now();
// make request
const result: any = await createHttpRequest(app).get('/useRegExp');
const cost = Date.now() - startTime;

最后再断言下就好了 expect(cost).toBeLessThanOrEqual(1000);

最终的代码如下：

  it.only('should GET /useRegExp', async () => {
    const startTime = Date.now();
    // make request
    const result: any = await createHttpRequest(app).get('/useRegExp');
    const cost = Date.now() - startTime;

    // 2. 如果接口请求时间超过1秒钟，则Assert断言失败
    const {
      data: { imgSrc },
    } = result.body as IPackResp;

    expect(imgSrc).not.toBe('//www.baidu.com/img/bd_logo1.png');
    notDeepStrictEqual(imgSrc, '//www.baidu.com/img/bd_logo1.png');
    expect(cost).toBeLessThanOrEqual(1000);
    expect(imgSrc).toBe('//www.baidu.com/img/flexible/logo/pc/index.png');
    deepStrictEqual(imgSrc, '//www.baidu.com/img/flexible/logo/pc/index.png');
  });

  it.only('should GET /useCheerio', async () => {
    const startTime = Date.now();
    // make request
    const result: any = await createHttpRequest(app).get('/useCheerio');
    const cost = Date.now() - startTime;

    const {
      data: { imgSrc },
    } = result.body as IPackResp;

    expect(imgSrc).not.toBe('//www.baidu.com/img/bd_logo1.png');
    notDeepStrictEqual(imgSrc, '//www.baidu.com/img/bd_logo1.png');
    expect(cost).toBeLessThanOrEqual(1000);
    expect(imgSrc).toBe('//www.baidu.com/img/flexible/logo/pc/index.png');
    deepStrictEqual(imgSrc, '//www.baidu.com/img/flexible/logo/pc/index.png');
  });

六、写在后面

这里，如果你眼睛够细，你会发现一个很有意思的现象，你从浏览器打开百度首页，然后控制台输出楼上的需求是这样的

const lg = document.getElementById('lg');
undefined
lg.childNodes.forEach((node) => { if(node.nodeName.toLowerCase() === 'img') { console.log(node.src) } })
2VM618:1 https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/logo/logo_white-d0c9fe2af5.png
VM618:1 https://www.baidu.com/img/PCfb_5bf082d29588c07f842ccde3f97243ea.png
undefined

然而，通过Node自带的https库，你会发现//www.baidu.com/img/flexible/logo/pc/index.png这个咦，震惊.jpg. 发生了什么？莫不是度度做了什么处理？于是乎，我用wget测试了下wget -O baidu.html [https://www.baidu.com](https://www.baidu.com), 发现正常发请求是这样的

➜  tmp wget -O baidu.html https://www.baidu.com
--2022-06-10 00:36:17--  https://www.baidu.com/
Resolving www.baidu.com (www.baidu.com)... 182.61.200.6, 182.61.200.7
Connecting to www.baidu.com (www.baidu.com)|182.61.200.6|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2443 (2.4K) [text/html]
Saving to: ‘baidu.html’

baidu.html                                                      100%[=====================================================================================================================================================>]   2.39K  --.-KB/s    in 0s

2022-06-10 00:36:18 (48.3 MB/s) - ‘baidu.html’ saved [2443/2443]

➜  tmp cat baidu.html

 百度一下，你就知道