在里面 在这个关于如何构建 WebAssembly 编程语言的系列文章中，我们构建了一个词法分析器。在这篇文章中，我们将介绍编译器的下一阶段，解析。解析是我们编译器的一部分，它获取词法分析器生成的令牌流并将其转换为 （AST）。 最后发帖 抽象语法树  AST 是一种树状数据结构，它将标记组织成一个逻辑层次结构，可以更容易地转换为机器代码。值得庆幸的是，因为 wispy 是一种 S 表达式语言，所以我们的代码本质上 是 AST。采用以下令牌流： 已经   “(“, “add”, “3”, “(“, “sub”, “2”, “1”, “)”, “)” 每组括号代表一个子树，其中第一个标记是操作符节点，后面的标记是它的操作数。如果我们在当前集合关闭之前遇到另一个左括号，我们知道它表示一个操作数，它本身就是一个子树。上面的令牌流将被组织成一个看起来像这样的树： 如果您有兴趣为更复杂的类 C 语法编写解析器，请参阅我以前的 系列。 构建编程语言 更多关于 AST 正如我们对词法分析器所做的那样，我们将从定义我们的类型开始。这些类型将定义我们的 AST 的结构。每种类型都代表一个“节点”，即简介中我们图表中的圆圈。这是基本节点。我们将忽略它们，因为它们与我们在词法分析器中定义的标记没有太大区别：   // src/types/ast.mts export type IntNode = { type: "int"; value: number; }; export type FloatNode = { type: "float"; value: number; }; export type IdentifierNode = { type: "identifier"; identifier: string; }; export type TypedIdentifierNode = { type: "typed-identifier"; // Note that here, we break down the identifier into its components identifier: string; typeIdentifier: string; };  AST 的一个新概念是  。  是由一组其他节点组成的表达式。 BlockNode BlockNode 例如，  是一个包含三个节点的块： (add 1 2) 计算结果为函数的标识符  。 add 一个简单地计算为数字 的 Int 。 1 一个简单地计算为数字 的 Int 。 2 如何评估块本身取决于编译器。我们将在下一篇文章中讨论。 这是定义：   // src/types/ast.mts export type BlockNode = { type: "block"; expressions: AstNode[]; }; 最后，我们定义  。与词法分析器中的 类型一样，  是一个可区分的联合，可以是我们之前定义的任何其他节点之一： AstNode Token AstNode   export type AstNode = IntNode | FloatNode | IdentifierNode | TypedIdentifierNode | BlockNode; 您可能已经注意到 有一个 数组，由于  ，  可以包含子  。换句话说，  是一种递归类型。这种递归表示可以生孩子的孩子的能力是我们 AST 的基础。这是允许 AST 中的树形成的地方。 BlockNode AstNode AstNode 可以是 BlockNode BlockNode BlockNodes BlockNode 至此 完成，看起来应该像 . src/types/ast.mts 这个文件 现在从 导出类型，就像我们对标记类型所做的那样： src/types/index.mts   // src/types/index.mts export * from "./token.mjs"; export * from "./ast.mjs"; 构建 AST 现在我们已经定义了 AST，是时候构建一个了。 创建一个新的 文件并添加我们将使用的所有导入： src/parser.mts   // src/parser.mts import { Token, IdentifierNode, TypedIdentifierNode, IdentifierToken, TypedIdentifierToken, FloatToken, FloatNode, IntToken, IntNode, AstNode, BlockNode, } from "./types/index.mjs"; 现在我们可以定义我们的顶级解析函数。 parse 函数获取词法分析器生成的标记并返回一个 作为我们树的根。 BlockNode   // src/parser.mts export const parse = (tokens: Token[]): BlockNode => { const blocks: BlockNode[] = []; // This loop is run as long as there are tokens to consume while (tokens.length) { // consumeTokenTree converts an array of tokens into a tree of tokens, more on that later. const tree = consumeTokenTree(tokens); // parseBlock turns our new tree of tokens into an actual BlockNode, recursively. More on that later as well. blocks.push(parseBlock(tree)); } // Finally we return the top level BlockNode return { type: "block", expressions: blocks, }; }; 接下来，我们定义 函数。  将一个扁平的令牌数组转换为一个令牌树。 consumeTokenTree consumeTokenTree 鉴于这个纤细的表达：   (add (sub 3 1) (sub 5 2)) 词法分析器将产生这个标记数组：   // Note: I've simplified the Token format to just be strings to keep things short ["(", "add", "(", "sub", "3", "1", ")", "(", "sub", "5", "2", ")", ")"];   将采用该平面数组并将其变成一棵树。这就像将一组括号标记 之间的每个标记放入一个数组中一样简单。所以我们上面的令牌数组变成了这个令牌树： consumeTokenTree ()   ["add", [, "sub", "3", "1"], ["sub", "5", "2"]]; 这是 的实际定义： consumeTokenTree   // src/parser.mts // This is token besides for the bracket tokens export type NonBracketToken = Exclude<Token, "parenthesis" | "square-bracket">; // The token tree is made of NonBracketTokens and other TokenTrees export type TokenTree = (NonBracketToken | TokenTree)[]; const consumeTokenTree = (tokens: Token[]): TokenTree => { const tree: TokenTree = []; // Ensures the first token is a left bracket and then discards it, defined below this function. consumeLeftBracket(tokens); while (tokens.length) { // Preview the next token const token = tokens[0]; // Check to see if the next token is a left bracket. if (token.type === "bracket" && getBracketDirection(token) === "left") { // If it is, we just ran into a sub-TokenTree. So we can simply call this function within // itself. Gotta love recursion. tree.push(consumeTokenTree(tokens)); continue; } // Check to see if the next token is a right bracket if (token.type === "bracket" && getBracketDirection(token) === "right") { // If it is, we just found the end of the tree on our current level tree.shift(); // Discard the right bracket break; // Break the loop } // If the token isn't a bracket, it can simply be added to the tree on this level tree.push(token); // Consume / discard the token from the main tokens array tokens.shift(); } // Return the tree. Don't forget to check out the helper functions below! return tree; }; const consumeLeftBracket = (tokens: Token[]) => { const bracketDirection = getBracketDirection(tokens[0]); if (bracketDirection !== "left") { throw new Error("Expected left bracket"); } return tokens.shift(); }; const getBracketDirection = (token: Token): "left" | "right" => { if (token.type !== "bracket") { throw new Error(`Expected bracket, got ${token.type}`); } // If we match a left bracket return left if (/[\(\[]/.test(token.value)) return "left"; // Otherwise return right return "right"; }; 现在我们有了一个令牌树，我们需要把它变成一个块。为此，我们创建了一个 函数，该函数将树作为其输入并返回一个  ： parseBlock BlockNode   1const parseBlock = (block?: TokenTree): BlockNode => { return { type: "block", // This is where the recursive magic happens expressions: block.map(parseExpression), }; }; 您可能已经注意到，  使用尚未编写的 函数映射树的每个项目。  采用 或 并将其转换为相应的 类型。 parseBlock parseExpression parseExpression TokenTree NonBracketToken AstNode 这是定义：   const parseExpression = (expression?: TokenTree | NonBracketToken): AstNode => { // If the expression is an Array, we were passed another TokenTree, so we can // pass the expression back to the parseBlock function if (expression instanceof Array) { return parseBlock(expression); } // The mapping here is pretty straight forward. Match the token type and pass the // expression on to a more specific expression parser. if (isTokenType(expression, "identifier")) return parseIdentifier(expression); if (isTokenType(expression, "typed-identifier")) return parseTypedIdentifier(expression); if (isTokenType(expression, "float")) return parseFloatToken(expression); if (isTokenType(expression, "int")) return parseIntToken(expression); throw new Error(`Unrecognized expression ${JSON.stringify(expression)}`); }; 让我们定义 函数。这个函数非常简洁，展示了 TypeScript 最强大的特性之一， .简单地说，  测试表达式并将类型缩小到特定的  。这允许 TypeScript 确定我们正在将正确的标记传递给它们相应的解析器函数。 isTokenType 自定义类型守卫 isTokenType TokenType 这是定义：   export const isTokenType = <T extends Token["type"]>( item: TokenTree | NonBracketToken | undefined, type: T ): item is Extract<Token, { type: T }> => { return isToken(item) && item.type === type; }; const isToken = (item?: TokenTree | NonBracketToken): item is NonBracketToken => { return !(item instanceof Array); }; 那里发生了很多事情，所以让我们来看看它。首先，我们有一个通用定义，   。这本质上是说 T 必须是 字段的可能值之一。 Typescript 足够聪明，知道这意味着 T 必须是  。 <T extends Token["type"]> Token.type "int" | "float" | "identifier" | "typed-identifier" | "bracket 下一段有趣的代码是返回类型谓词  。该谓词告诉 TypeScript，如果 的返回值为 true，则 必须是其类型与作为 参数传递的字符串匹配的  。 item is Extract<Token, { type: T }> isTokenType item type Token 在实践中，这意味着如果我们将未知的 传递给  ，打字稿将能够正确地将值缩小为更具体的标记，例如  。 Token isTokenType IntToken 现在我们已经定义了自定义类型保护，我们可以定义实际的令牌解析器。前三个很简单；他们基本上只是返回令牌的副本或稍作修改的副本：   const parseFloatToken = (float: FloatToken): FloatNode => ({ ...float }); const parseIntToken = (int: IntToken): IntNode => ({ ...int }); const parseIdentifier = (identifier: IdentifierToken): IdentifierNode => { return { type: "identifier", identifier: identifier.value, }; }; 最后的解析器是  。请记住，类型化标识符的形式为  。解析它就像用冒号分割字符串一样简单。返回数组的第一个值是  ，第二个是  。这是定义： parseTypedIdentifier identifier:type identifier type   const parseTypedIdentifier = (identifier: TypedIdentifierToken): TypedIdentifierNode => { const vals = identifier.value.split(":"); return { type: "typed-identifier", identifier: vals[0], typeIdentifier: vals[1], }; }; 这是完成的文件 这就是一个工作解析器所需的所有代码。在继续之前，让我们更新主 文件以查看解析器的输出： src/index.mts   // src/index.mts #!/usr/bin/env node import { readFileSync } from "fs"; import { lex } from "./lexer.mjs"; import { parse } from "./parser.mjs"; const file = process.argv[2]; const input = readFileSync(file, "utf8"); const tokens = lex(input); const ast = parse(tokens); console.log(JSON.stringify(tokens, undefined, 2)); 构建并运行项目：   npx tsc wispy example.wispy 如果一切顺利，输出应该如下所示 . 这个 这样，解析器就完成了。我们现在可以将来自词法分析器的标记流转换为 AST。在下一篇文章中，我们可以进入有趣的部分：生成和运行机器可读的代码。 立即 ，了解 Courier 如何使产品通知变得令人愉悦！ 申请演示

Lexer

Orchestrate Unlimited Notifications From 1 API | Sign-up today!

Courier simplifies triggering and sending notifications from your app with one API and drag and drop UI.


2021 - HackerNoon Contributor of the Year - SOFTWARE-ARCHITECTURE

2022 - HackerNoon Contributor of the Year - Business Strategy

2022 - HackerNoon Contributor of the Year - Digital Transformation

2022 - HackerNoon Contributor of the Year - Graphql

2022 - HackerNoon Contributor of the Year - Rest Api

2022 - HackerNoon Contributor of the Year - Software Architecture

2022 - Tech Ceo of the Year

2021 - HackerNoon Contributor of the Year - APP-DEVELOPMENT

Nominated for 2022 - HackerNoon Contributor of the Year - Rest Api

Nominated for 2022 - Tech Ceo of the Year

Nominated for 2022 - HackerNoon Contributor of the Year - Digital Transformation

如何为乐趣和利润构建 WebAssembly 语言：解析

About Author

註釋

標籤

这篇文章刊登在

Related Stories

点击赚钱：Telegram 可能会在 Solana 之前吸引下一个 100 亿加密用户

成功云迁移的完整指南：策略和最佳实践

看不见的层面：为什么用户访谈是不可替代的资产

使用这 18 种开发工具来提高你的工作效率 🚀🔥

点击赚钱：Telegram 可能会在 Solana 之前吸引下一个 100 亿加密用户

成功云迁移的完整指南：策略和最佳实践

看不见的层面：为什么用户访谈是不可替代的资产

使用这 18 种开发工具来提高你的工作效率 🚀🔥

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps