Understanding Chrome V8 - Chapter 22: Compiler Workflow Bytecode, Constant Pool

Written by huidou | Published 2022/10/16
Tech Story Tags: understanding-chrome-v8 | javascript-development | web-development | google-chrome | software-development | programming | compiler | workflow

TLDR"Let's Understand Chrome V8" are serial technology articles that explain the V8 code, it covers many V8 kernel functions and fundamentals.via the TL;DR App

In this article, we’ll dive into bytecode, and examine the bytecode generation and constant pool in detail.

1. Bytecode Generation

The GenerateUnoptimizedCodeForToplevel is responsible for transforming an AST tree into bytecodes.

1.  MaybeHandle<SharedFunctionInfo> GenerateUnoptimizedCodeForToplevel(
2.      Isolate* isolate, ParseInfo* parse_info, AccountingAllocator* allocator,
3.      IsCompiledScope* is_compiled_scope) {
4.  //omit............
5.     while (!functions_to_compile.empty()) {
6.  //omit............
7.       if (job->ExecuteJob() == CompilationJob::FAILED ||
8.           FinalizeUnoptimizedCompilationJob(job.get(), shared_info, isolate) ==
9.               CompilationJob::FAILED) {
10.           return MaybeHandle<SharedFunctionInfo>();
11.         }
12.         if (FLAG_stress_lazy_source_positions) {
13.           SharedFunctionInfo::EnsureSourcePositionsAvailable(isolate, shared_info);
14.         }
15.         if (shared_info.is_identical_to(top_level)) {
16.           *is_compiled_scope = shared_info->is_compiled_scope();
17.           DCHECK(is_compiled_scope->is_compiled());
18.         }
19.       }
20.       parse_info->ResetCharacterStream();
21.       return top_level;
22.     }

Line 7 job->ExecuteJob() reads an AST tree from parse_info and generates bytecode. By debugging it, we step into the following function which analyses every AST node and generates a corresponding bytecode.

1.  void BytecodeGenerator::GenerateBytecodeBody() {
2.    VisitArgumentsObject(closure_scope()->arguments());
3.    VisitNewTargetVariable(closure_scope()->new_target_var());
4.    FunctionLiteral* literal = info()->literal();
5.    if (IsResumableFunction(literal->kind())) {
6.      BuildGeneratorObjectVariableInitialization();
7.    }
8.    if (FLAG_trace) builder()->CallRuntime(Runtime::kTraceEnter);
9.  //省略.........................
10.    BuildIncrementBlockCoverageCounterIfEnabled(literal, SourceRangeKind::kBody);
11.    VisitDeclarations(closure_scope()->declarations());
12.    VisitModuleNamespaceImports();
13.    builder()->StackCheck(literal->start_position());
14.    if (IsBaseConstructor(function_kind())) {
15.      if (literal->requires_brand_initialization()) {
16.        BuildPrivateBrandInitialization(builder()->Receiver());
17.      }
18.      if (literal->requires_instance_members_initializer()) {
19.        BuildInstanceMemberInitialization(Register::function_closure(),
20.                                          builder()->Receiver());
21.      }
22.    }
23.    VisitStatements(literal->body());
24.    if (!builder()->RemainderOfBlockIsDead()) {
25.      builder()->LoadUndefined();
26.      BuildReturn();
27.    }
28.  }

In the above function, lines 11, 12, and 23 first analyze the declaration, import modules, and statements respectively, and then generate corresponding bytecodes.

In our case, we will examine line 11 VisitDeclarations() in more depth. In VisitDeclarations(), the following function is called.

1.  BytecodeArrayBuilder& BytecodeArrayBuilder::LoadConstantPoolEntry(
2.      size_t entry) {
3.    OutputLdaConstant(entry);
4.    return *this;
5.  }
6.  //============================
7.  #define DEFINE_BYTECODE_OUTPUT(name, ...) 

In line 3, OutputLdaConstant stores one global constant into the constant-pool and generates bytecode LdaGlobal for loading this constant, so that we can get out the constant through the LdaGlobal during execution. About the constant pool, I will talk later.

By debugging OutputLdaConstant(), we step into the EmitBytecode() which generates one bytecode for a corresponding statement and also is the function that generates the LdaGlobal for our global constant.

1.  void BytecodeArrayWriter::EmitBytecode(const BytecodeNode* const node) {
2.  //omit..........
3.     for (int i = 0; i < operand_count; ++i) {
4.       switch (operand_sizes[i]) {
5.         case OperandSize::kNone:
6.           UNREACHABLE();
7.           break;
8.         case OperandSize::kByte:
9.           bytecodes()->push_back(static_cast<uint8_t>(operands[i]));
10.           break;
11.         case OperandSize::kShort: {
12.           uint16_t operand = static_cast<uint16_t>(operands[i]);
13.           const uint8_t* raw_operand = reinterpret_cast<const uint8_t*>(&operand);
14.           bytecodes()->push_back(raw_operand[0]);
15.           bytecodes()->push_back(raw_operand[1]);
16.           break;
17.         }
18.         case OperandSize::kQuad: {
19.           const uint8_t* raw_operand =
20.               reinterpret_cast<const uint8_t*>(&operands[i]);
21.           bytecodes()->push_back(raw_operand[0]);
22.           bytecodes()->push_back(raw_operand[1]);
23.           bytecodes()->push_back(raw_operand[2]);
24.           bytecodes()->push_back(raw_operand[3]);
25.           break;
26.         }
27.       }
28.     }
29.   }

From lines 4 to 25, calculate the operand size for our LdaGlobal, because in execution, a bytecode needs the operand size to operate the operand correctly.

Back to GenerateBytecodeBody(), in line 23, the VisitStatements() reads one AST node that represents a corresponding JavaScript statement, and generate one bytecode.

Note: If one AST node represents a JavaScript statement block, the VisitStatements() will recursively call itself to generate many bytecodes.

void BytecodeGenerator::VisitStatements(
    const ZonePtrList<Statement>* statements) {
  for (int i = 0; i < statements->length(); i++) {
    // Allocate an outer register allocations scope for the statement.
    RegisterAllocationScope allocation_scope(this);
    Statement* stmt = statements->at(i);
    Visit(stmt);
    if (builder()->RemainderOfBlockIsDead()) break;
  }
}

Figure 1 shows the VisitStatements’s call stack.

2. Constant Pool

function test(x)
{
 if(x>0)
 add3(x);//x=x+3
 dosomething........
}

When compiling the function test(), the compiler infers that the add3 will be called, and the add3 address should be pushed into the constant pool.

But, there’s a question here. We have no address if the add3() is not compiled yet. In this case, the compiler reserves a slot for the add3 in the constant pool and fills it once the add3() is compiled, namely DeferredConstants.

Let’s give a glance at FinalizeBytecode().

1.  Handle<BytecodeArray> BytecodeGenerator::FinalizeBytecode(
2.      Isolate* isolate, Handle<Script> script) {
3.    DCHECK_EQ(ThreadId::Current(), isolate->thread_id());
4.    AllocateDeferredConstants(isolate, script);
5.    if (block_coverage_builder_) {
6.      info()->set_coverage_info(
7.          isolate->factory()->NewCoverageInfo(block_coverage_builder_->slots()));
8.      if (FLAG_trace_block_coverage) {
9.        info()->coverage_info()->Print(info()->literal()->GetDebugName());
10.      }
11.    }
12.    if (HasStackOverflow()) return Handle<BytecodeArray>();
13.    Handle<BytecodeArray> bytecode_array = builder()->ToBytecodeArray(isolate);
14.    if (incoming_new_target_or_generator_.is_valid()) {
15.      bytecode_array->set_incoming_new_target_or_generator_register(
16.          incoming_new_target_or_generator_);
17.    }
18.    return bytecode_array;
19.  }

In line 4, AllocateDeferredConstants creates the DeferredConstants.

1.  void BytecodeGenerator::AllocateDeferredConstants(Isolate* isolate,
2.                                                    Handle<Script> script) {
3.    // Build global declaration pair arrays.
4.    for (GlobalDeclarationsBuilder* globals_builder : global_declarations_) {
5.      Handle<FixedArray> declarations =
6.          globals_builder->AllocateDeclarations(info(), script, isolate);
7.      if (declarations.is_null()) return SetStackOverflow();
8.      builder()->SetDeferredConstantPoolEntry(
9.          globals_builder->constant_pool_entry(), declarations);
10.     }
11.     // Find or build shared function infos.
12.     for (std::pair<FunctionLiteral*, size_t> literal : function_literals_) {
13.    //omit..................
14.     }
15.     // Find or build shared function infos for the native function templates.
16.     for (std::pair<NativeFunctionLiteral*, size_t> literal :
17.          native_function_literals_) {
18.   //omit..................
19.     }
20.     // Build object literal constant properties
21.     for (std::pair<ObjectLiteral*, size_t> literal : object_literals_) {
22.       ObjectLiteral* object_literal = literal.first;
23.   //omit..................
24.       }
25.     }
26.     // Build array literal constant elements
27.     for (std::pair<ArrayLiteral*, size_t> literal : array_literals_) {
28.   //omit..................
29.     }
30.     // Build class literal boilerplates.
31.     for (std::pair<ClassLiteral*, size_t> literal : class_literals_) {
32.   //omit..................
33.     }
34.     // Build template literals.
35.     for (std::pair<GetTemplateObject*, size_t> literal : template_objects_) {
36.   //omit..................
37.     }
38.   }

From lines 4 to 10, the loop-for store global variables in the constant pool.

From lines 11 to 35 store the sharedfunction, constant property, and other stuff in the constant pool.

3. FinalizeBytecode

The FinalizeBytecode() is the last step for generating bytecode. Let’s give a glance at line 13 of the FinalizeBytecode, the builder()->ToBytecodeArray reorganizes the bytecodes and returns a bytecode array that holds a bytecode array representing your JavaScript code. What is “reorganize”? It is that just put bytecode, constant pool, etc. together because all the stuff is already done well before ToBytecodeArray.

1.  Handle<BytecodeArray> BytecodeArrayBuilder::ToBytecodeArray(Isolate* isolate) {
2.    DCHECK(RemainderOfBlockIsDead());
3.    DCHECK(!bytecode_generated_);
4.    bytecode_generated_ = true;
5.    int register_count = total_register_count();
6.    if (register_optimizer_) {
7.  //omit
8.    }
9.    Handle<ByteArray> handler_table =
10.        handler_table_builder()->ToHandlerTable(isolate);
11.    return bytecode_array_writer_.ToBytecodeArray(
12.        isolate, register_count, parameter_count(), handler_table);
13.  }

Line 9 gets out the handler_table, namely the dispatch_table that I have talked about.

Line 11 stores the handler_table address into a bytecode array. Remember that each bytecode has only one handler that is the actual worker and is held by the dispatch_table.

Let’s give a glance at line 11 bytecode_array_writer_.ToBytecodeArray.

1.  Handle<BytecodeArray> BytecodeArrayWriter::ToBytecodeArray(
2.      Isolate* isolate, int register_count, int parameter_count,
3.      Handle<ByteArray> handler_table) {
4.    DCHECK_EQ(0, unbound_jumps_);
5.    int bytecode_size = static_cast<int>(bytecodes()->size());
6.    int frame_size = register_count * kSystemPointerSize;
7.    Handle<FixedArray> constant_pool =
8.        constant_array_builder()->ToFixedArray(isolate);
9.    Handle<BytecodeArray> bytecode_array = isolate->factory()->NewBytecodeArray(
10.        bytecode_size, &bytecodes()->front(), frame_size, parameter_count,
11.        constant_pool);
12.    bytecode_array->set_handler_table(*handler_table);
13.    return bytecode_array;
14.  }

Line 7 generates constant_pool, and line 9 generates a bytecode array that the source code is below:

1.  Handle<BytecodeArray> Factory::NewBytecodeArray(
2.      int length, const byte* raw_bytecodes, int frame_size, int parameter_count,
3.      Handle<FixedArray> constant_pool) {
4.    DCHECK_LE(0, length);
5.    if (length > BytecodeArray::kMaxLength) {
6.      isolate()->heap()->FatalProcessOutOfMemory("invalid array length");
7.    }
8.    // Bytecode array is AllocationType::kOld, so constant pool array should be
9.    // too.
10.    DCHECK(!Heap::InYoungGeneration(*constant_pool));
11.    int size = BytecodeArray::SizeFor(length);
12.    HeapObject result = AllocateRawWithImmortalMap(size, AllocationType::kOld,
13.                                                   *bytecode_array_map());
14.    Handle<BytecodeArray> instance(BytecodeArray::cast(result), isolate());
15.    instance->set_length(length);
16.    instance->set_frame_size(frame_size);
17.    instance->set_parameter_count(parameter_count);
18.    instance->set_incoming_new_target_or_generator_register(
19.        interpreter::Register::invalid_value());
20.    instance->set_osr_loop_nesting_level(0);
21.    instance->set_bytecode_age(BytecodeArray::kNoAgeBytecodeAge);
22.    instance->set_constant_pool(*constant_pool);
23.    instance->set_handler_table(*empty_byte_array());
24.    instance->set_source_position_table(*undefined_value());
25.    CopyBytes(reinterpret_cast<byte*>(instance->GetFirstBytecodeAddress()),
26.              raw_bytecodes, length);
27.    instance->clear_padding();
28.    return instance;
29.  }

Line 12 allocates memory from the V8 heap, and then lines 15–25 copy the bytecode array into the memory, and line 28 returns the bytecode finally.

Takeaways

  • Constant pool holds constants, function addresses, etc. Note: If a variable name is known, it as a constant is stored in the constant pool;

  • Constant pool always is at bytecode array[0]. In other words, given a bytecode array, the first element of the array is always the constant pool;

  • A bytecode array is a heap object and its type is FixedArray.

Okay, that wraps it up for this share. I’ll see you guys next time, take care!

Please reach out to me if you have any issues. WeChat: qq9123013 Email: [email protected]

Also Published here


Written by huidou | a big fan of chrome V8
Published by HackerNoon on 2022/10/16