
GoLang: Working with Inconsistent Data Types in DynamoDB (Part 1) — Manual Conversion

by Konstantin Telnoi, September 6th, 2024

Too Long; Didn't Read

Go is a strongly typed, performant language designed to accomplish 80 percent of the work with 20 percent of the effort. It can be used to parse diverse data types into a single, consistent format. In this article, I'll cover several approaches for manually decoding inconsistencies in a DynamoDB database.


One challenge I frequently encounter when working with NoSQL databases is eventual type divergence and the need to parse diverse data types into a single, consistent format. This scenario often arises when developing services that store user data.


As business requirements evolve, we might need to store the same data in different formats. Using the same field for multiple types is convenient, especially in dynamically typed languages like JavaScript (on Node.js) and Python. However, as the project grows, we eventually face the task of working with multiple data types in our database and converting them to a single type to meet business requirements.


GoLang has emerged as a popular choice for addressing such business cases. It's strongly typed, performant, and designed to accomplish 80 percent of the work with 20 percent of the effort. So, how do we handle situations where the data in the database can be of any type, and we need to parse it into a single type in our strongly typed language?


We need to write type converters to resolve type inconsistencies in the code.

Generally, we have two options for custom data conversions:


  • Parse our data manually from the most abstract type our language provides
  • Use extension points provided by the database driver and utilize the driver's tooling


In this article, I'll cover several approaches for manually decoding inconsistencies in a DynamoDB database. I've placed all the code and tests I'll be discussing in a separate repository, which you're welcome to explore and run locally if needed.

Manual Type Conversion

To manually parse the data, we need the most abstract type that can represent the majority of cases. In Go, this is interface{}, or its alias any. If we used a specific type instead of any, the driver would fail with an unmarshalling error at runtime whenever it encountered a different type in the database, forcing us to skip those records during parsing.


For example, imagine we have two ways of storing someone's favorite food:

[
  {
    "user_id": 1,
    "favorite_food": "apples"
  },
  {
    "user_id": 2,
    "favorite_food": [
      "apples",
      "strawberries"
    ]
  }
]
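The two shapes above can already be decoded into a single Go struct once the inconsistent field is typed as any. Here is a minimal sketch using encoding/json as a stand-in for the DynamoDB driver (the record struct and decodeRecords helper are illustrative names, not from the article's repository); note that the decoder picks a different concrete type for each document:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// record mirrors the UserData struct discussed below: the field with
// diverged shapes is typed as `any`, so either document decodes
// without an error.
type record struct {
	UserID       int `json:"user_id"`
	FavoriteFood any `json:"favorite_food"`
}

// decodeRecords unmarshals the mixed-shape documents and reports the
// concrete Go type the decoder chose for each favorite_food value.
func decodeRecords(doc []byte) ([]string, error) {
	var records []record
	if err := json.Unmarshal(doc, &records); err != nil {
		return nil, err
	}
	kinds := make([]string, 0, len(records))
	for _, r := range records {
		kinds = append(kinds, fmt.Sprintf("%T", r.FavoriteFood))
	}
	return kinds, nil
}
```

For the two items above, the first favorite_food arrives as a string and the second as []interface{}, which is exactly the divergence the converter below has to resolve.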


Because favorite_food is represented by two types, string and []string, we can't use either of them directly in our code: whichever we pick, the other variant will fail to decode. However, in our business logic we know the target type we need for that particular data, and how to convert all other stored variants into it. Let's represent this in code:


    type UserData struct {
        ID           string `dynamodbav:"user_id"`
        FavoriteFood any    `dynamodbav:"favorite_food"`
    }

    type UserDataTarget struct {
        FavoriteFood []string
    }


The FavoriteFood field of type any is marked with the dynamodbav:"favorite_food" tag, which tells the DynamoDB driver to map the favorite_food field from storage into our field. Since our field type is any, the driver will use the Go type matching the DynamoDB type and assign it to our field. We can then retrieve the real type information using type comparison.


    // ConvertToArray is a manual conversion function that maps the two
    // possible shapes of the field in DynamoDB - a string and an array
    // of strings - into a slice of strings.
    func ConvertToArray(field any) ([]string, error) {
        if field == nil {
           return nil, nil
        }

        switch v := field.(type) {
        case []string:
           return v, nil
        case string:
           return []string{v}, nil
        case []any:
           casted := make([]string, 0, len(v))
           for _, value := range v {
              castedValue, ok := value.(string)
              if !ok {
                 continue
              }
              casted = append(casted, castedValue)
           }

           return casted, nil
        default:
           return nil, fmt.Errorf("unsupported type '%T' for the field '%v'", field, field)
        }
    }


Note that, by default, DynamoDB stores any array you pass to it as an untyped list (type L), even though it has a dedicated string-set type (SS) and you are passing an array of strings. When such a list is unmarshalled into an any field, it comes back as []any, so we have to handle the []any case even if we never store anything other than strings.


The use-case code for our app will look like this:

     e := echo.New()
     e.POST("/user-data", ucase.CreateUserData)
     e.GET("/user-data-manual/:id", ucase.GetUserData)

     type userDataRepository interface {
         SaveUserDataAbstract(userData *types.UserDataRequest) (id string, err error)
         GetUserDataAbstract(id uuid.UUID) (*types.UserDataRequest, error)
     }

	 // CreateUserData accepts any type for the 'favorite_food' field and saves it to the database  
    func (d *Usecase) CreateUserData(ctx echo.Context) error {  
        ctx.Logger().Info("CreateUserData")  
        var request types.UserDataRequest  
        err := ctx.Bind(&request)  
        if err != nil {  
           return ctx.String(http.StatusBadRequest, fmt.Sprintf("couldn't parse body: %v", err))  
        }  
      
        id, err := d.repo.SaveUserDataAbstract(&request)  
        if err != nil {  
           return ctx.String(http.StatusInternalServerError, fmt.Sprintf("couldn't save item: %v", err))  
        }  
      
        return ctx.JSON(http.StatusOK, types.UserDataRequest{ID: id})  
    }  
      
    // GetUserData parses the 'favorite_food' field with manual parser  
    func (d *Usecase) GetUserData(ctx echo.Context) error {  
        ctx.Logger().Info("GetUserData")  
        id := ctx.Param("id")  
        uid, err := uuid.Parse(id)  
        if err != nil {  
           return ctx.String(http.StatusBadRequest, fmt.Sprintf("couldn't parse id: %v", err))  
        }  
      
        rawUserData, err := d.repo.GetUserDataAbstract(uid)  
        if err != nil {  
           return ctx.String(http.StatusInternalServerError, fmt.Sprintf("couldn't get item: %v", err))  
        }  
      
        // Convert manually  
        userData, err := parseAbstractRequest(rawUserData)  
        if err != nil {  
           return ctx.String(http.StatusInternalServerError, fmt.Sprintf("couldn't parse data: %v", err))  
        }  
      
        return ctx.JSON(http.StatusOK, userData)  
    }

	 func parseAbstractRequest(request *types.UserDataRequest) (manual.UserDataTarget, error) {  
        target := manual.UserDataTarget{}  
        parsed, err := manual.ConvertToArray(request.FavoriteFood)  
        if err != nil {  
           return target, err  
        }  
        target.FavoriteFood = parsed  
        return target, nil  
    }


The code demonstrates a simple web application using the Echo framework. It defines two endpoints: one for creating user data (CreateUserData) and another for retrieving user data (GetUserData). The application uses a repository interface to interact with the database. The CreateUserData function accepts any type for the favorite_food field and saves it to the database. The GetUserData function retrieves the data and then uses a manual parser (parseAbstractRequest) to convert the abstract data into a specific target type. The parseAbstractRequest function demonstrates how manual conversion is applied to transform the abstract UserDataRequest into a concrete UserDataTarget structure, specifically handling the favorite_food field using the ConvertToArray function we discussed earlier.


The main advantage of this manual conversion approach lies in its simplicity. The conversion logic can be encapsulated in a separate mapper function, like parseAbstractRequest in our example. This mapper serves as a single point of conversion, which needs to be updated whenever new fields are added to the structure or when data types change. This centralization of conversion logic offers some level of maintainability, as developers know exactly where to make changes when the data structure evolves.


However, while this approach is straightforward to implement, it still comes with several drawbacks:

  1. Maintainability Challenges: Although the conversion logic is centralized, the code can become complex and harder to maintain as the number of fields and type variations increases. Each new field or type change requires manual updates to the mapper function.
  2. Limited Scalability: As the application grows, the mapper function may become a bottleneck, potentially requiring frequent updates and increasing the risk of errors.
  3. Reduced Reusability: While the mapper function itself is reusable, the manual conversion approach may lead to similar conversion logic being implemented in different parts of the application for various data structures.
  4. Increased Cognitive Load: Developers need to remember to update the mapper function whenever data structures change, which can be error-prone in fast-paced development environments.


These issues often arise from the combination of NoSQL databases and weakly-typed languages in rapid development scenarios. While this approach can accelerate initial development and provide a quick solution for handling inconsistent data types, it often results in accumulated technical debt that must be addressed as the project evolves.


The trade-off between development speed, code quality, and long-term maintainability remains a common challenge in software development. While the manual conversion with a centralized mapper can be an effective temporary solution, it's crucial to consider more robust and scalable approaches for handling data type inconsistencies in larger or long-term projects.


In future articles, we'll explore more sophisticated methods for managing inconsistent data types in DynamoDB with GoLang, aiming to balance development speed with code quality and maintainability while addressing the limitations of this simple manual conversion approach.