Understanding Chrome V8: Chapter 19: Compilation Cache — Make the Compiler Faster

Written by huidou | Published 2022/09/30

TL;DR: "Let's Understand Chrome V8" is a series of technical articles that explain the V8 source code, covering many V8 kernel functions and fundamentals.

Welcome to other chapters of Let’s Understand Chrome V8

For performance reasons, V8 uses a compilation cache to hold the SharedFunctionInfo, which is the compiler's result. When the same JavaScript code is sent to the compiler again, the cache returns that result directly.

You may ask why the same code would be compiled again. V8's pipeline is compile first, then execute. Even the exact same code, the same function, is still sent to the compiler, but it is not always recompiled. JavaScript code may change during execution, so the first step is always to look up the source in the compilation cache by its hash value: if the cache hits, the cached SharedFunctionInfo is returned; otherwise the JavaScript is compiled.
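To make the lookup-before-compile idea concrete, here is a minimal sketch. It is not V8 code: ToyCompilationCache, CompiledResult and GetOrCompile are hypothetical names I made up for illustration, and the "compiler" is just a placeholder.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for V8's SharedFunctionInfo (the compiler's result).
struct CompiledResult {
  std::string source;
};

// Hypothetical toy cache; V8's real CompilationCache is far more involved.
class ToyCompilationCache {
 public:
  // Returns the cached result for `source` and compiles only on a miss.
  std::shared_ptr<CompiledResult> GetOrCompile(const std::string& source) {
    auto it = table_.find(source);  // the map hashes the source text
    if (it != table_.end()) {
      ++hits_;
      return it->second;
    }
    ++misses_;
    std::shared_ptr<CompiledResult> result = Compile(source);
    table_.emplace(source, result);
    return result;
  }

  int hits() const { return hits_; }
  int misses() const { return misses_; }

 private:
  // Placeholder for the real compiler pipeline.
  static std::shared_ptr<CompiledResult> Compile(const std::string& source) {
    return std::make_shared<CompiledResult>(CompiledResult{source});
  }

  std::unordered_map<std::string, std::shared_ptr<CompiledResult>> table_;
  int hits_ = 0;
  int misses_ = 0;
};
```

Asking for the same source twice returns the same object and counts one hit; a different source counts as a second miss.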

1. Initialization and Filling

The Isolate, which is the V8 virtual machine instance, is responsible for initializing the CompilationCache. This happens in Isolate::Init, as shown below:

bool Isolate::Init(ReadOnlyDeserializer* read_only_deserializer,
                   StartupDeserializer* startup_deserializer) {
//omit...................
#define ASSIGN_ELEMENT(CamelName, hacker_name)                  \
  isolate_addresses_[IsolateAddressId::k##CamelName##Address] = \
      reinterpret_cast<Address>(hacker_name##_address());
  FOR_EACH_ISOLATE_ADDRESS_NAME(ASSIGN_ELEMENT)
#undef ASSIGN_ELEMENT

  compilation_cache_ = new CompilationCache(this);

The last line creates a new CompilationCache; that is the initialization. Let’s look at the class in more depth.

1.  class V8_EXPORT_PRIVATE CompilationCache {
2.   public:
3.    MaybeHandle<SharedFunctionInfo> LookupScript(
4.        Handle<String> source, MaybeHandle<Object> name, int line_offset,
5.        int column_offset, ScriptOriginOptions resource_options,
6.        Handle<Context> native_context, LanguageMode language_mode);
7.    InfoCellPair LookupEval(Handle<String> source,
8.                            Handle<SharedFunctionInfo> outer_info,
9.                            Handle<Context> context, LanguageMode language_mode,
10.                            int position);
11.    MaybeHandle<FixedArray> LookupRegExp(Handle<String> source,
12.                                         JSRegExp::Flags flags);
13.    void PutScript(Handle<String> source, Handle<Context> native_context,
14.                   LanguageMode language_mode,
15.                   Handle<SharedFunctionInfo> function_info);
16.    void PutEval(Handle<String> source, Handle<SharedFunctionInfo> outer_info,
17.                 Handle<Context> context,
18.                 Handle<SharedFunctionInfo> function_info,
19.                 Handle<FeedbackCell> feedback_cell, int position);
20.    void PutRegExp(Handle<String> source, JSRegExp::Flags flags,
21.                   Handle<FixedArray> data);
22.    void Clear();
23.    void Remove(Handle<SharedFunctionInfo> function_info);
24.    void Iterate(RootVisitor* v);
25.    void MarkCompactPrologue();
26.    void Enable();
27.    void Disable();
28.   private:
29.    explicit CompilationCache(Isolate* isolate);
30.    ~CompilationCache() = default;
31.    base::HashMap* EagerOptimizingSet();
32.    static const int kSubCacheCount = 4;
33.    bool IsEnabled() const { return FLAG_compilation_cache && enabled_; }
34.    Isolate* isolate() const { return isolate_; }
35.    Isolate* isolate_;
36.    CompilationCacheScript script_;
37.    CompilationCacheEval eval_global_;
38.    CompilationCacheEval eval_contextual_;
39.    CompilationCacheRegExp reg_exp_;
40.    CompilationSubCache* subcaches_[kSubCacheCount];
41.    bool enabled_;
42.    friend class Isolate;
43.    DISALLOW_COPY_AND_ASSIGN(CompilationCache);
44.  };

Three members of the above class are important:

1. LookupScript() at line 3. It looks up the cache and takes out the SharedFunctionInfo that holds the bytecode corresponding to your JavaScript code. On a hit, compilation is skipped and the code goes straight to execution.

2. PutScript() at line 13. If LookupScript misses, PutScript() puts the hash corresponding to your JavaScript code and the SharedFunctionInfo that the compiler generates into the cache before execution.

3. LookupEval and PutEval at lines 7 and 16. They work like LookupScript and PutScript but are specialized for the JavaScript eval method, because eval needs not only the SharedFunctionInfo but also the context.
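Why does eval need its own lookup? Judging from the LookupEval parameters above (source, outer_info, language_mode, position), the source text alone is not enough to identify an entry. The sketch below shows a composite key in that spirit. ToyEvalKey, ToyEvalKeyHash and ToyEvalCache are hypothetical names, the outer function is reduced to an integer id, and this is not how V8 lays out its keys.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical composite key: the same eval source may map to different
// results depending on where and how it is evaluated.
struct ToyEvalKey {
  std::string source;
  int outer_function_id;  // stand-in for the outer SharedFunctionInfo
  bool strict_mode;       // stand-in for LanguageMode
  int position;           // position of the eval call in the outer function

  bool operator==(const ToyEvalKey& other) const {
    return source == other.source &&
           outer_function_id == other.outer_function_id &&
           strict_mode == other.strict_mode && position == other.position;
  }
};

// Combines the hashes of all key parts.
struct ToyEvalKeyHash {
  std::size_t operator()(const ToyEvalKey& key) const {
    std::size_t h = std::hash<std::string>()(key.source);
    h = h * 31 + std::hash<int>()(key.outer_function_id);
    h = h * 31 + (key.strict_mode ? 1u : 0u);
    h = h * 31 + std::hash<int>()(key.position);
    return h;
  }
};

// Maps a key to a string that stands in for the cached compiler result.
using ToyEvalCache =
    std::unordered_map<ToyEvalKey, std::string, ToyEvalKeyHash>;
```

With such a key, the same eval string called from a different function, at a different position, or in a different language mode is a separate cache entry.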

Let’s examine LookupScript in a little more depth:

1.  MaybeHandle<SharedFunctionInfo> CompilationCache::LookupScript(
2.      Handle<String> source, MaybeHandle<Object> name, int line_offset,
3.      int column_offset, ScriptOriginOptions resource_options,
4.      Handle<Context> native_context, LanguageMode language_mode) {
5.    if (!IsEnabled()) return MaybeHandle<SharedFunctionInfo>();
6.    return script_.Lookup(source, name, line_offset, column_offset,
7.                          resource_options, native_context, language_mode);
8.  }
9.  //..................separation..........................
10.  //..................separation..........................
11.  MaybeHandle<SharedFunctionInfo> CompilationCacheScript::Lookup(
12.      Handle<String> source, MaybeHandle<Object> name, int line_offset,
13.      int column_offset, ScriptOriginOptions resource_options,
14.      Handle<Context> native_context, LanguageMode language_mode) {
15.    MaybeHandle<SharedFunctionInfo> result;
16.    {
17.      HandleScope scope(isolate());
18.      const int generation = 0;
19.      DCHECK_EQ(generations(), 1);
20.      Handle<CompilationCacheTable> table = GetTable(generation);
21.      MaybeHandle<SharedFunctionInfo> probe = CompilationCacheTable::LookupScript(
22.          table, source, native_context, language_mode);
23.      Handle<SharedFunctionInfo> function_info;
24.      if (probe.ToHandle(&function_info)) {
25.        if (HasOrigin(function_info, name, line_offset, column_offset,
26.                      resource_options)) {
27.          result = scope.CloseAndEscape(function_info);
28.        }
29.      }
30.    }
31.    Handle<SharedFunctionInfo> function_info;
32.    if (result.ToHandle(&function_info)) {
33.  #ifdef DEBUG
34.      DCHECK(HasOrigin(function_info, name, line_offset, column_offset,
35.                       resource_options));
36.  #endif
37.      isolate()->counters()->compilation_cache_hits()->Increment();
38.      LOG(isolate(), CompilationCacheEvent("hit", "script", *function_info));
39.    } else {
40.      isolate()->counters()->compilation_cache_misses()->Increment();
41.    }
42.    return result;
43.  }

We can see that CompilationCache::LookupScript is only the entry point; the key logic is all in CompilationCacheScript::Lookup().

1. In Lookup(), the argument source is the JavaScript that you, as a developer, actually wrote and that is going to be compiled.

2. GetTable at line 20. It returns the actual table that holds the data that PutScript stored.

3. CompilationCacheTable::LookupScript at line 21. It looks up the table: a hit yields a SharedFunctionInfo, a miss yields an empty handle.

4. Line 25 checks whether the SharedFunctionInfo is the correct one, based on its origin.

Let’s take a glance at HasOrigin.

1.  // We only re-use a cached function for some script source code if the
2.  // script originates from the same place. This is to avoid issues
3.  // when reporting errors, etc.
4.  bool CompilationCacheScript::HasOrigin(Handle<SharedFunctionInfo> function_info,
5.                                         MaybeHandle<Object> maybe_name,
6.                                         int line_offset, int column_offset,
7.                                         ScriptOriginOptions resource_options) {
8.    Handle<Script> script =
9.        Handle<Script>(Script::cast(function_info->script()), isolate());
10.    Handle<Object> name;
11.    if (!maybe_name.ToHandle(&name)) {
12.      return script->name().IsUndefined(isolate());
13.    }
14.    if (line_offset != script->line_offset()) return false;
15.    if (column_offset != script->column_offset()) return false;
16.    if (!name->IsString() || !script->name().IsString()) return false;
17.    if (resource_options.Flags() != script->origin_options().Flags())
18.      return false;
19.    return String::Equals(
20.        isolate(), Handle<String>::cast(name),
21.        Handle<String>(String::cast(script->name()), isolate()));
22.  }

From line 1 to line 3, the comments give the origin check rule. The implementation runs from line 11 to line 21: the line offset, the column offset, the origin options and the script name must all match.
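The origin check can be restated without any V8 types. The following is a simplified sketch of the same comparison order as HasOrigin above; ToyOrigin and HasSameOrigin are hypothetical names, and a has_name flag models a script name that is undefined or not a string.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified description of where a script came from.
struct ToyOrigin {
  bool has_name;     // false models a name that is undefined or not a string
  std::string name;  // ignored when has_name is false
  int line_offset;
  int column_offset;
  int option_flags;  // stand-in for ScriptOriginOptions::Flags()
};

// Mirrors the order of checks in HasOrigin: a nameless request only matches
// a nameless cached script; otherwise offsets, flags and names must agree.
bool HasSameOrigin(const ToyOrigin& cached, const ToyOrigin& requested) {
  if (!requested.has_name) return !cached.has_name;
  if (requested.line_offset != cached.line_offset) return false;
  if (requested.column_offset != cached.column_offset) return false;
  if (!cached.has_name) return false;
  if (requested.option_flags != cached.option_flags) return false;
  return requested.name == cached.name;
}
```

Note that, as in the original function, a request without a name never reaches the offset comparisons.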

Next, let’s go into PutScript.

1.  void CompilationCache::PutScript(Handle<String> source,
2.                                   Handle<Context> native_context,
3.                                   LanguageMode language_mode,
4.                                   Handle<SharedFunctionInfo> function_info) {
5.    if (!IsEnabled()) return;
6.    LOG(isolate(), CompilationCacheEvent("put", "script", *function_info));
7.    script_.Put(source, native_context, language_mode, function_info);
8.  }
9.  //.......................separation............................
10.  void CompilationCacheScript::Put(Handle<String> source,
11.                                   Handle<Context> native_context,
12.                                   LanguageMode language_mode,
13.                                   Handle<SharedFunctionInfo> function_info) {
14.    HandleScope scope(isolate());
15.    Handle<CompilationCacheTable> table = GetFirstTable();
16.    SetFirstTable(CompilationCacheTable::PutScript(table, source, native_context,
17.                                                   language_mode, function_info));
18.  }

As with LookupScript(), CompilationCache::PutScript is only the entry point. It delegates to CompilationCacheScript::Put, and the important work happens in CompilationCacheTable::PutScript.

PutScript is where the cache table gets filled. Below is the CompilationCacheTable source code:

1.  // This cache is used in two different variants. For regexp caching, it simply
2.  // maps identifying info of the regexp to the cached regexp object. Scripts and
3.  // eval code only gets cached after a second probe for the code object. To do
4.  // so, on first "put" only a hash identifying the source is entered into the
5.  // cache, mapping it to a lifetime count of the hash. On each call to Age all
6.  // such lifetimes get reduced, and removed once they reach zero. If a second put
7.  // is called while such a hash is live in the cache, the hash gets replaced by
8.  // an actual cache entry. Age also removes stale live entries from the cache.
9.  // Such entries are identified by SharedFunctionInfos pointing to either the
10.  // recompilation stub, or to "old" code. This avoids memory leaks due to
11.  // premature caching of scripts and eval strings that are never needed later.
12.  class CompilationCacheTable
13.      : public HashTable<CompilationCacheTable, CompilationCacheShape> {
14.   public:
15.    NEVER_READ_ONLY_SPACE
16.    static MaybeHandle<SharedFunctionInfo> LookupScript(
17.        Handle<CompilationCacheTable> table, Handle<String> src,
18.        Handle<Context> native_context, LanguageMode language_mode);
19.    static InfoCellPair LookupEval(Handle<CompilationCacheTable> table,
20.                                   Handle<String> src,
21.                                   Handle<SharedFunctionInfo> shared,
22.                                   Handle<Context> native_context,
23.                                   LanguageMode language_mode, int position);
24.    Handle<Object> LookupRegExp(Handle<String> source, JSRegExp::Flags flags);
25.    static Handle<CompilationCacheTable> PutScript(
26.        Handle<CompilationCacheTable> cache, Handle<String> src,
27.        Handle<Context> native_context, LanguageMode language_mode,
28.        Handle<SharedFunctionInfo> value);
29.//omit......................
30.    };

The comments from line 1 to line 11 explain the design of the table well enough.

Let’s go back to CompilationCacheScript::Put() and move on to line 16, which calls CompilationCacheTable::PutScript().
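The "cache only after a second probe" scheme described in those comments can be sketched roughly as follows. This only illustrates the comment text, not V8's implementation: TwoProbeCache and its members are hypothetical, the initial lifetime is a constructor parameter I chose, and the removal of stale live entries mentioned in the comment is left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical sketch: a source is only really cached on its second Put,
// and only if the first Put has not aged out in the meantime.
class TwoProbeCache {
 public:
  explicit TwoProbeCache(int initial_lifetime)
      : initial_lifetime_(initial_lifetime) {}

  void Put(const std::string& source, const std::string& value) {
    if (entries_.count(source) != 0) {
      entries_[source] = value;  // already cached: refresh the value
      return;
    }
    auto it = pending_.find(source);
    if (it == pending_.end()) {
      pending_[source] = initial_lifetime_;  // first put: remember only
      return;
    }
    pending_.erase(it);  // second put while still alive: promote
    entries_[source] = value;
  }

  // Only promoted entries can be found.
  bool Lookup(const std::string& source, std::string* out) const {
    auto it = entries_.find(source);
    if (it == entries_.end()) return false;
    *out = it->second;
    return true;
  }

  // Reduces every pending lifetime and drops those that reach zero.
  void Age() {
    for (auto it = pending_.begin(); it != pending_.end();) {
      if (--it->second <= 0) {
        it = pending_.erase(it);
      } else {
        ++it;
      }
    }
  }

 private:
  int initial_lifetime_;
  std::unordered_map<std::string, int> pending_;          // source -> lifetime
  std::unordered_map<std::string, std::string> entries_;  // real entries
};
```

The point of the design, as the comment says, is to avoid keeping results for scripts and eval strings that are never needed a second time.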

1.  Handle<CompilationCacheTable> CompilationCacheTable::PutScript(
2.      Handle<CompilationCacheTable> cache, Handle<String> src,
3.      Handle<Context> native_context, LanguageMode language_mode,
4.      Handle<SharedFunctionInfo> value) {
5.    Isolate* isolate = native_context->GetIsolate();
6.    Handle<SharedFunctionInfo> shared(native_context->empty_function().shared(),
7.                                      isolate);
8.    src = String::Flatten(isolate, src);
9.    StringSharedKey key(src, shared, language_mode, kNoSourcePosition);
10.    Handle<Object> k = key.AsHandle(isolate);
11.    cache = EnsureCapacity(isolate, cache, 1);
12.    int entry = cache->FindInsertionEntry(key.Hash());
13.    cache->set(EntryToIndex(entry), *k);
14.    cache->set(EntryToIndex(entry) + 1, *value);
15.    cache->ElementAdded();
16.    return cache;
17.  }

I think the important thing is that PutScript binds the JavaScript source and the SharedFunctionInfo together as a key-value pair and puts the pair into the cache. The implementation is from line 9 to line 15.

2. Lookup and Update
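Lines 12 to 14 store the key and the value in two adjacent slots of the table. Here is a minimal sketch of that layout, assuming a fixed capacity, linear probing, and two slots per entry; FlatPairTable is a hypothetical name, and V8's real EntryToIndex and probing differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy table; an empty string marks an unused key slot.
class FlatPairTable {
 public:
  explicit FlatPairTable(std::size_t capacity)
      : slots_(capacity * 2), capacity_(capacity), count_(0) {}

  // Stores the pair without checking for an existing key, like the PutScript
  // excerpt. Returns false when the table is full (this toy never grows).
  bool Put(const std::string& key, const std::string& value) {
    assert(!key.empty());
    if (count_ == capacity_) return false;
    std::size_t entry = FindInsertionEntry(std::hash<std::string>()(key));
    slots_[EntryToIndex(entry)] = key;        // key slot
    slots_[EntryToIndex(entry) + 1] = value;  // value slot right after it
    ++count_;
    return true;
  }

  bool Lookup(const std::string& key, std::string* out) const {
    if (capacity_ == 0) return false;
    std::size_t start = std::hash<std::string>()(key) % capacity_;
    for (std::size_t i = 0; i < capacity_; ++i) {
      std::size_t index = EntryToIndex((start + i) % capacity_);
      if (slots_[index].empty()) return false;  // unused slot ends the probe
      if (slots_[index] == key) {
        *out = slots_[index + 1];
        return true;
      }
    }
    return false;
  }

  std::size_t size() const { return count_; }

 private:
  // Two slots per entry: key, then value.
  static std::size_t EntryToIndex(std::size_t entry) { return entry * 2; }

  // Linear probing; the caller guarantees at least one free entry.
  std::size_t FindInsertionEntry(std::size_t hash) const {
    std::size_t entry = hash % capacity_;
    while (!slots_[EntryToIndex(entry)].empty()) {
      entry = (entry + 1) % capacity_;
    }
    return entry;
  }

  std::vector<std::string> slots_;
  std::size_t capacity_;
  std::size_t count_;
};
```

The real table calls EnsureCapacity before inserting, so it can grow; the toy simply refuses the insert when it is full.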

Let’s go into Compiler::GetSharedFunctionInfoForScript.

1.   MaybeHandle<SharedFunctionInfo> Compiler::GetSharedFunctionInfoForScript(
2.       Isolate* isolate, Handle<String> source,
3.       const Compiler::ScriptDetails& script_details,
4.       ScriptOriginOptions origin_options, v8::Extension* extension,
5.       ScriptData* cached_data, ScriptCompiler::CompileOptions compile_options,
6.       ScriptCompiler::NoCacheReason no_cache_reason, NativesFlag natives) {
7.  //omit.....................
8.       if (extension == nullptr) {
9.         bool can_consume_code_cache =
10.             compile_options == ScriptCompiler::kConsumeCodeCache;
11.         if (can_consume_code_cache) {
12.           compile_timer.set_consuming_code_cache();
13.         }
14.         maybe_result = compilation_cache->LookupScript(//lookup is here.
15.             source, script_details.name_obj, script_details.line_offset,
16.             script_details.column_offset, origin_options, isolate->native_context(),
17.             language_mode);
18.         if (!maybe_result.is_null()) {
19.           compile_timer.set_hit_isolate_cache();
20.         } else if (can_consume_code_cache) {
21.  //omit
22.           if (CodeSerializer::Deserialize(isolate, cached_data, source,
23.                                           origin_options)
24.                   .ToHandle(&inner_result) &&
25.               inner_result->is_compiled()) {
26.             is_compiled_scope = inner_result->is_compiled_scope();
27.             DCHECK(is_compiled_scope.is_compiled());
28.     		//filling is here
29.             compilation_cache->PutScript(source, isolate->native_context(),
30.                                          language_mode, inner_result);
31.             Handle<Script> script(Script::cast(inner_result->script()), isolate);
32.             maybe_result = inner_result;
33.           } else {
34.             compile_timer.set_consuming_code_cache_failed();
35.           }
36.         }
37.       }
38.       return maybe_result;
39.     }

Line 14 performs the lookup. If LookupScript misses, the result has to come from somewhere else. In this excerpt, when a code cache can be consumed, lines 22 to 25 deserialize it into inner_result; the plain compilation path is not shown here. Line 29 then puts inner_result into the compilation cache, and line 32 shows that inner_result becomes maybe_result, which is returned.
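The control flow of this excerpt can be condensed into a small sketch: look up the in-memory cache first, fall back to deserializing cached data, and fill the cache on success. Everything here is hypothetical (ToyIsolateCache, ToyDeserialize, GetForScript, and the "ok:" prefix that marks valid data); it only mirrors the branching shown above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

enum class Outcome { kHitIsolateCache, kConsumedCodeCache, kMiss };

struct ToyResult {
  std::string code;  // stand-in for the SharedFunctionInfo
  Outcome outcome;
};

// Hypothetical in-memory cache: source text -> compiled code.
using ToyIsolateCache = std::unordered_map<std::string, std::string>;

// Hypothetical deserializer: accepts only data that starts with "ok:".
bool ToyDeserialize(const std::string& cached_data, std::string* code) {
  const std::string prefix = "ok:";
  if (cached_data.compare(0, prefix.size(), prefix) != 0) return false;
  *code = cached_data.substr(prefix.size());
  return true;
}

// Lookup first; on a miss, try the code cache and fill the in-memory cache.
ToyResult GetForScript(ToyIsolateCache& cache, const std::string& source,
                       const std::string* cached_data) {
  auto it = cache.find(source);
  if (it != cache.end()) return {it->second, Outcome::kHitIsolateCache};

  std::string code;
  if (cached_data != nullptr && ToyDeserialize(*cached_data, &code)) {
    cache[source] = code;  // "filling is here"
    return {code, Outcome::kConsumedCodeCache};
  }
  return {"", Outcome::kMiss};  // the real function would compile here
}
```

A first call with valid cached data fills the cache; a second call for the same source hits it without needing the data again.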

Figure 1 shows the call stack.

Besides LookupScript and PutScript, V8 also has LookupEval and PutEval as well as LookupRegExp and PutRegExp. Their workflow is the same as that of LookupScript and PutScript, so I leave them for you to explore on your own.

Okay, that wraps it up for this share. I’ll see you guys next time, take care!

Please reach out to me if you have any issues. WeChat: qq9123013 Email: [email protected]

Written by huidou | a big fan of chrome V8
Published by HackerNoon on 2022/09/30