Understanding Chrome V8 - Chapter 20: How Compilers and Parsers Work
by @huidou

Too Long; Didn't Read

"Let's Understand Chrome V8" is a series of technical articles that explains the V8 source code, covering many of V8's kernel functions and fundamentals.
Welcome to other chapters of Let’s Understand Chrome V8

In previous articles, we talked a lot about V8's JavaScript compiler. We covered the parser, the scanner, and the bytecode, along with their fundamentals, kernel code, and key data structures.

In the next articles, we will walk through the compiler workflow, and watch how V8 transforms the JavaScript code into bytecode step by step.

[Figure 1: The compiler workflow, from JavaScript source through the scanner and parser to bytecode]

Figure 1 shows the workflow: it starts from the JavaScript code you wrote, goes through the scanner and parser, and finally generates bytecode.

Note: In this article, I use d8.exe instead of v8.exe, because d8 can print directly to the terminal and is much lighter than v8.

1. Read JavaScript code

Below is our test case:

function ignition(s) {
    this.slogan = s;
    this.start = function () { eval('console.log(this.slogan);'); };
}
worker = new ignition("here we go!");
worker.start();

Below is SourceGroup::Execute(), which is responsible for executing the JavaScript code. Quite simply, it reads the JavaScript code from a file, then compiles and executes it.

1.  bool SourceGroup::Execute(Isolate* isolate) {
2.  //............omit..................
3.      // Use all other arguments as names of files to load and run.
4.      HandleScope handle_scope(isolate);
5.      Local<String> file_name =
6.          String::NewFromUtf8(isolate, arg, NewStringType::kNormal)
7.              .ToLocalChecked();
8.      Local<String> source = ReadFile(isolate, arg);
9.      if (source.IsEmpty()) {
10.        printf("Error reading '%s'\n", arg);
11.        base::OS::ExitProcess(1);
12.      }
13.      Shell::set_script_executed();
14.      if (!Shell::ExecuteString(isolate, source, file_name, Shell::kNoPrintResult,
15.                                Shell::kReportExceptions,
16.                                Shell::kProcessMessageQueue)) {
17.        success = false;
18.        break;
19.      }
20.    }
21.    return success;
22.  }

In line 5, NewFromUtf8 creates a V8 string from the file name of our test case. In line 8, ReadFile reads the content of that file, which is the source code of our test case.

1.  Local<String> Shell::ReadFile(Isolate* isolate, const char* name) {
2.  //only the most important parts...............................
3.    char* chars = static_cast<char*>(file->memory());
4.    Local<String> result;
5.    if (i::FLAG_use_external_strings && i::String::IsAscii(chars, size)) {
6.      String::ExternalOneByteStringResource* resource =
7.          new ExternalOwningOneByteStringResource(std::move(file));
8.      result = String::NewExternalOneByte(isolate, resource).ToLocalChecked();
9.    } else {
10.      result = String::NewFromUtf8(isolate, chars, NewStringType::kNormal, size)
11.                   .ToLocalChecked();
12.    }
13.    return result;
14.  }

Lines 8 and 10 return the JavaScript code either as an external one-byte string or as a string created from UTF-8, depending on the condition in line 5. Our case takes the UTF-8 branch. We will talk about String::ExternalOneByteStringResource in a future article.
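The branch in ReadFile can be sketched without any V8 dependency. The following is only a toy model: StringPath, IsAscii, and ChoosePath are hypothetical names of mine, not V8's, and I assume that an "ASCII" check simply means every byte has its high bit clear.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two branches in Shell::ReadFile.
enum class StringPath { kExternalOneByte, kUtf8 };

// True when every byte is 7-bit ASCII (high bit clear).
bool IsAscii(const std::string& bytes) {
  for (unsigned char c : bytes) {
    if (c > 0x7F) return false;
  }
  return true;
}

// Mirrors the shape of the condition at line 5: both the flag and the
// ASCII check must hold to take the external one-byte branch.
StringPath ChoosePath(bool use_external_strings, const std::string& bytes) {
  if (use_external_strings && IsAscii(bytes)) {
    return StringPath::kExternalOneByte;
  }
  return StringPath::kUtf8;
}
```

With the flag off, even pure ASCII source takes the UTF-8 branch in this model, which would be consistent with what we observe for our test case.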

Back in SourceGroup::Execute(), step into Shell::ExecuteString at line 14. Its source is given below:

1.  bool Shell::ExecuteString(Isolate* isolate, Local<String> source,
2.                      Local<Value> name, PrintResult print_result,
3.                      ReportExceptions report_exceptions,
4.                      ProcessMessageQueue process_message_queue) {
5.  	//omit............................
6.  bool success = true;
7.  {
8.    if (options.compile_options == ScriptCompiler::kConsumeCodeCache) {
9.  	//omit............................
10.       } else if (options.stress_background_compile) {
11.  	//omit............................
12.       } else {
13.         ScriptCompiler::Source script_source(source, origin);
14.         maybe_script = ScriptCompiler::Compile(context, &script_source,
15.                                                options.compile_options);
16.       }
17.       Local<Script> script;
18.       if (!maybe_script.ToLocal(&script)) {
19.         // Print errors that happened during compilation.
20.         if (report_exceptions) ReportException(isolate, &try_catch);
21.         return false;
22.       }
23.       if (options.code_cache_options ==
24.           ShellOptions::CodeCacheOptions::kProduceCache) {
25.  	//omit............................
26.       }
27.       maybe_result = script->Run(realm);  // The bytecode is executed here.
28.  }
29.  }

In line 13, our JavaScript source is wrapped in the variable script_source together with its origin, which includes the line and column offsets. V8 compiles only the JavaScript functions that are actually executed, not the whole script up front, so script_source helps the compiler keep track of this compilation information.

Line 14 starts compiling the JavaScript.

2. Parser initialization

In the workflow, the scanner comes first and the parser second. In practice, though, the scanner is passive and the parser is active: the parser actively takes tokens from a token cache, and only on a cache miss does it wake up the scanner to generate more tokens and refill the cache.
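The pull-based relationship between parser and scanner can be sketched in a few lines. This is a toy model under my own assumptions, not V8's scanner: ToyScanner and ToyParser are hypothetical names, tokens are just whitespace-separated words, and the cache is a simple queue.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Toy scanner: it only produces a token when asked (passive).
class ToyScanner {
 public:
  explicit ToyScanner(std::string source) : source_(std::move(source)) {}

  // Scans one whitespace-separated token; returns "" at end of input.
  std::string ScanOne() {
    auto is_space = [this](std::size_t i) {
      return std::isspace(static_cast<unsigned char>(source_[i])) != 0;
    };
    while (pos_ < source_.size() && is_space(pos_)) ++pos_;
    std::size_t start = pos_;
    while (pos_ < source_.size() && !is_space(pos_)) ++pos_;
    ++scan_calls_;
    return source_.substr(start, pos_ - start);
  }

  int scan_calls() const { return scan_calls_; }

 private:
  std::string source_;
  std::size_t pos_ = 0;
  int scan_calls_ = 0;
};

// Toy parser: drives the process (active) through a small token cache.
class ToyParser {
 public:
  explicit ToyParser(ToyScanner* scanner) : scanner_(scanner) {}

  // Look at the next token without consuming it.
  const std::string& Peek() {
    if (cache_.empty()) cache_.push_back(scanner_->ScanOne());  // cache miss
    return cache_.front();
  }

  // Consume and return the next token.
  std::string Next() {
    std::string token = Peek();
    cache_.pop_front();
    return token;
  }

  // Pull tokens until the scanner reports end of input.
  std::vector<std::string> ParseAll() {
    std::vector<std::string> tokens;
    while (!Peek().empty()) tokens.push_back(Next());
    return tokens;
  }

 private:
  ToyScanner* scanner_;
  std::deque<std::string> cache_;
};
```

Constructing the scanner does no work at all; it scans only when the parser's Peek() finds the cache empty, and a repeated Peek() is served from the cache.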

Below is CompileUnboundInternal(), which is called by ScriptCompiler::Compile at line 14 above.

1.  MaybeLocal<UnboundScript> ScriptCompiler::CompileUnboundInternal(
2.      Isolate* v8_isolate, Source* source, CompileOptions options,
3.      NoCacheReason no_cache_reason) {
4.  //omit...............
5.    i::Handle<i::String> str = Utils::OpenHandle(*(source->source_string));
6.    i::Handle<i::SharedFunctionInfo> result;
7.    i::Compiler::ScriptDetails script_details = GetScriptDetails(
8.        isolate, source->resource_name, source->resource_line_offset,
9.        source->resource_column_offset, source->source_map_url,
10.        source->host_defined_options);
11.    i::MaybeHandle<i::SharedFunctionInfo> maybe_function_info =
12.        i::Compiler::GetSharedFunctionInfoForScript(
13.            isolate, str, script_details, source->resource_options, nullptr,
14.            script_data, options, no_cache_reason, i::NOT_NATIVES_CODE);
15.    if (options == kConsumeCodeCache) {
16.      source->cached_data->rejected = script_data->rejected();
17.    }
18.    delete script_data;
19.    has_pending_exception = !maybe_function_info.ToHandle(&result);
20.    RETURN_ON_FAILED_EXECUTION(UnboundScript);
21.    RETURN_ESCAPED(ToApiHandle<UnboundScript>(result));
22.  }

The function above produces an UnboundScript, and line 11 tells us that this is really just a SharedFunctionInfo. But what does "bound" mean? A SharedFunctionInfo cannot be executed directly: V8 first needs to match it with a context, and that match is the binding.
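The idea of binding can be illustrated with a small model. Everything here is hypothetical (ToyContext, ToyUnboundScript, ToyBoundScript, and Bind are my names, and a context is reduced to a table of integer globals); in the public V8 API the corresponding step is UnboundScript::BindToCurrentContext().

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical context: just a table of global variables.
using ToyContext = std::map<std::string, int>;

// Context-independent "compiled code": it needs a context to run.
struct ToyUnboundScript {
  std::function<int(const ToyContext&)> code;
};

// Binding pairs the shared code with one particular context.
struct ToyBoundScript {
  const ToyUnboundScript* unbound;
  const ToyContext* context;
  int Run() const { return unbound->code(*context); }
};

// Does not copy anything: both arguments must outlive the result.
ToyBoundScript Bind(const ToyUnboundScript& unbound,
                    const ToyContext& context) {
  return ToyBoundScript{&unbound, &context};
}
```

The same unbound code can be bound to several contexts and gives a different result in each, which is why the compiled result is kept separate from any one context.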

Let’s step into GetSharedFunctionInfoForScript.

1.  MaybeHandle<SharedFunctionInfo> Compiler::GetSharedFunctionInfoForScript(
2.      Isolate* isolate, Handle<String> source,
3.      const Compiler::ScriptDetails& script_details,
4.   .................) {
5.  //omit.........................
6.  		{
7.      maybe_result = compilation_cache->LookupScript(
8.          source, script_details.name_obj, script_details.line_offset,
9.          script_details.column_offset, origin_options, isolate->native_context(),
10.          language_mode);
11.    }
12.    if (maybe_result.is_null()) {
13.      ParseInfo parse_info(isolate);
14.      // No cache entry found compile the script.
15.      NewScript(isolate, &parse_info, source, script_details, origin_options,
16.                natives);
17.      // Compile the function and add it to the isolate cache.
18.      if (origin_options.IsModule()) parse_info.set_module();
19.      parse_info.set_extension(extension);
20.      parse_info.set_eager(compile_options == ScriptCompiler::kEagerCompile);
21.      parse_info.set_language_mode(
22.          stricter_language_mode(parse_info.language_mode(), language_mode));
23.      maybe_result = CompileToplevel(&parse_info, isolate, &is_compiled_scope);
24.      Handle<SharedFunctionInfo> result;
25.      if (extension == nullptr && maybe_result.ToHandle(&result)) {
26.        DCHECK(is_compiled_scope.is_compiled());
27.        compilation_cache->PutScript(source, isolate->native_context(),
28.                                     language_mode, result);
29.      } else if (maybe_result.is_null() && natives != EXTENSION_CODE) {
30.        isolate->ReportPendingMessages();
31.      }
32.    }
33.    return maybe_result;
34.  }

Line 7 looks up the compilation cache, which I mentioned in the last article.
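The lookup-then-compile pattern of lines 7 to 28 can be sketched as follows. This is a simplified model with hypothetical names (ToyCompiled, ToyCompiler): it keys the cache on the source text alone, whereas the real LookupScript call above also takes the script name, the offsets, the origin options, the native context, and the language mode.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical compiled result; real V8 stores a SharedFunctionInfo.
struct ToyCompiled {
  std::string source;
};

class ToyCompiler {
 public:
  // Look up the cache first; compile only on a miss, then store the result.
  const ToyCompiled& GetOrCompile(const std::string& source) {
    auto it = cache_.find(source);
    if (it != cache_.end()) return it->second;  // cache hit
    ++compile_count_;                           // cache miss: "compile"
    return cache_.emplace(source, ToyCompiled{source}).first->second;
  }

  int compile_count() const { return compile_count_; }

 private:
  std::unordered_map<std::string, ToyCompiled> cache_;
  int compile_count_ = 0;
};
```

Asking for the same source twice returns the same cached object and compiles only once.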

In line 13, parse_info holds the information the Parser needs, much like the variable script_source mentioned earlier.

Line 15 calls NewScript(), which initializes parse_info. Its code is given below:

1.  Handle<Script> NewScript(Isolate* isolate, ParseInfo* parse_info,
2.                           Handle<String> source,
3.                           Compiler::ScriptDetails script_details,
4.                           ScriptOriginOptions origin_options,
5.                           NativesFlag natives) {
6.    Handle<Script> script =
7.        parse_info->CreateScript(isolate, source, origin_options, natives);
8.    Handle<Object> script_name;
9.    if (script_details.name_obj.ToHandle(&script_name)) {
10.      script->set_name(*script_name);
11.      script->set_line_offset(script_details.line_offset);
12.      script->set_column_offset(script_details.column_offset);
13.    }
14.    Handle<Object> source_map_url;
15.    if (script_details.source_map_url.ToHandle(&source_map_url)) {
16.      script->set_source_mapping_url(*source_map_url);
17.    }
18.    Handle<FixedArray> host_defined_options;
19.    if (script_details.host_defined_options.ToHandle(&host_defined_options)) {
20.      script->set_host_defined_options(*host_defined_options);
21.    }
22.    return script;
23.  }

Lines 6 to 12 wrap the JavaScript source of our test case in the variable script, then set its name, line_offset, and column_offset. As I said, it is much like script_source.

Back in Compiler::GetSharedFunctionInfoForScript(), step into CompileToplevel() at line 23.

1.  MaybeHandle<SharedFunctionInfo> CompileToplevel(
2.      ParseInfo* parse_info, Isolate* isolate,
3.      IsCompiledScope* is_compiled_scope) {
4.  //omit.......................
5.    if (parse_info->literal() == nullptr &&
6.        !parsing::ParseProgram(parse_info, isolate)) {
7.      return MaybeHandle<SharedFunctionInfo>();
8.    }
9.  //omit........................
10.    MaybeHandle<SharedFunctionInfo> shared_info =
11.        GenerateUnoptimizedCodeForToplevel(
12.            isolate, parse_info, isolate->allocator(), is_compiled_scope);
13.    if (shared_info.is_null()) {
14.      FailWithPendingException(isolate, parse_info,
15.                               Compiler::ClearExceptionFlag::KEEP_EXCEPTION);
16.      return MaybeHandle<SharedFunctionInfo>();
17.    }
18.    FinalizeScriptCompilation(isolate, parse_info);
19.    return shared_info;
20.  }

In line 5, literal() returns the abstract syntax tree (AST). If it is nullptr, V8 starts the parser to generate the AST.

The first time through, the AST is null, so execution steps into ParseProgram.

1.  bool ParseProgram(ParseInfo* info, Isolate* isolate,
2.                    ReportErrorsAndStatisticsMode mode) {
3.  //omit............................
4.    Parser parser(info);
5.    FunctionLiteral* result = nullptr;
6.    result = parser.ParseProgram(isolate, info);
7.    info->set_literal(result);
8.    if (result) {
9.      info->set_language_mode(info->literal()->language_mode());
10.      if (info->is_eval()) {
11.        info->set_allow_eval_cache(parser.allow_eval_cache());
12.      }
13.    }
14.    if (mode == ReportErrorsAndStatisticsMode::kYes) {
15.  //omit.............................
16.    }
17.    return (result != nullptr);
18.  }

In line 4, the parser is created. This is the parser initialization.

1.  Parser::Parser(ParseInfo* info)
2.      : ParserBase<Parser>(info->zone(), &scanner_, info->stack_limit(),
3.                           info->extension(), info->GetOrCreateAstValueFactory(),
4.                           info->pending_error_handler(),
5.                           info->runtime_call_stats(), info->logger(),
6.                           info->script().is_null() ? -1 : info->script()->id(),
7.                           info->is_module(), true),
8.        info_(info),
9.        scanner_(info->character_stream(), info->is_module()),
10.        preparser_zone_(info->zone()->allocator(), ZONE_NAME),
11.        reusable_preparser_(nullptr),
12.        mode_(PARSE_EAGERLY),  // Lazy mode must be set explicitly.
13.        source_range_map_(info->source_range_map()),
14.        target_stack_(nullptr),
15.        total_preparse_skipped_(0),
16.        consumed_preparse_data_(info->consumed_preparse_data()),
17.        preparse_data_buffer_(),
18.        parameters_end_pos_(info->parameters_end_pos()) {
19.    bool can_compile_lazily = info->allow_lazy_compile() && !info->is_eager();
20.    set_default_eager_compile_hint(can_compile_lazily
21.                                       ? FunctionLiteral::kShouldLazyCompile
22.                                       : FunctionLiteral::kShouldEagerCompile);
23.    allow_lazy_ = info->allow_lazy_compile() && info->allow_lazy_parsing() &&
24.                  info->extension() == nullptr && can_compile_lazily;
25.    set_allow_natives(info->allow_natives_syntax());
26.    set_allow_harmony_dynamic_import(info->allow_harmony_dynamic_import());
27.    set_allow_harmony_import_meta(info->allow_harmony_import_meta());
28.    set_allow_harmony_nullish(info->allow_harmony_nullish());
29.    set_allow_harmony_optional_chaining(info->allow_harmony_optional_chaining());
30.    set_allow_harmony_private_methods(info->allow_harmony_private_methods());
31.    for (int feature = 0; feature < v8::Isolate::kUseCounterFeatureCount;
32.         ++feature) {
33.      use_counts_[feature] = 0;
34.    }
35.  }

Lines 8 to 18 initialize the parser's members from the compilation information, such as the scanner and the compilation mode.

In lines 19 to 24, the important part is can_compile_lazily, which decides whether functions are compiled lazily by default.

Line 25 sets whether the native commands that start with % are allowed.
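The lazy-compilation decision in lines 19 to 24 of the constructor is plain boolean logic, so it can be modeled on its own. The struct and function names below are hypothetical, and the extension pointer is reduced to a has_extension flag; the two expressions themselves follow the constructor code shown above.

```cpp
#include <cassert>

// Hypothetical subset of the ParseInfo flags read by the constructor.
struct ToyParseInfo {
  bool allow_lazy_compile;
  bool allow_lazy_parsing;
  bool is_eager;
  bool has_extension;  // stands in for info->extension() != nullptr
};

enum class EagerCompileHint { kShouldLazyCompile, kShouldEagerCompile };

struct ToyParserFlags {
  EagerCompileHint default_hint;
  bool allow_lazy;
};

ToyParserFlags ComputeFlags(const ToyParseInfo& info) {
  // Line 19: lazy compilation is possible unless eager mode was requested.
  bool can_compile_lazily = info.allow_lazy_compile && !info.is_eager;
  ToyParserFlags flags;
  // Lines 20 to 22: pick the default hint for function literals.
  flags.default_hint = can_compile_lazily
                           ? EagerCompileHint::kShouldLazyCompile
                           : EagerCompileHint::kShouldEagerCompile;
  // Lines 23 to 24: lazy parsing has additional preconditions.
  flags.allow_lazy = info.allow_lazy_compile && info.allow_lazy_parsing &&
                     !info.has_extension && can_compile_lazily;
  return flags;
}
```

Note that the two results can differ: with an extension present, the default hint can still be lazy while allow_lazy is false.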

Summary

  • V8 uses UTF-16 to encode JavaScript source code;
  • V8 uses ScriptCompiler::Source to manage our JavaScript code;
  • V8 first looks up the compilation cache, and starts the compiler only on a cache miss;
  • The scanner is passive and the parser is active.

Okay, that wraps it up for this share. I’ll see you guys next time, take care!

Please reach out to me if you have any issues. WeChat: qq9123013
