
GoLang: Working with Inconsistent Data Types in DynamoDB (Part 1) — Manual Conversion

by Konstantin Telnoi, September 6th, 2024

Too Long; Didn't Read

Go is a strongly typed, performant language designed to accomplish 80 percent of the work with 20 percent of the effort. It can be used to parse diverse data types into a single, consistent format. In this article, I'll cover several approaches for manually decoding inconsistencies in a DynamoDB database.


One challenge I frequently encounter when working with NoSQL databases is eventual type divergence and the need to parse diverse data types into a single, consistent format. This scenario often arises when developing services that store user data.


As business requirements evolve, we might need to store the same data in different formats. Using the same field for multiple types is convenient, especially in dynamically typed languages like JavaScript (on Node.js) and Python. However, as the project grows, we eventually face the task of working with multiple data types in our database and converting them to a single type to meet business requirements.


GoLang has emerged as a popular choice for addressing such business cases. It's strongly typed, performant, and designed to accomplish 80 percent of the work with 20 percent of the effort. So, how do we handle situations where the data in the database can be of any type, and we need to parse it into a single type in our strongly typed language?


We need to write type converters to resolve type inconsistencies in the code.

Generally, we have two options for custom data conversions:


  • Parse our data manually from the most abstract type our language provides
  • Use extension points provided by the database driver and utilize the driver's tooling


In this article, I'll cover several approaches for manually decoding inconsistencies in a DynamoDB database. I've placed all the code and tests I'll be discussing in a separate repository, which you're welcome to explore and run locally if needed.

Manual Type Conversion

To manually parse the data, we need the most abstract type that can represent the majority of cases. In Go, this is interface{}, or its alias any. If we used a specific type instead of any, the driver would fail with an unmarshalling error at runtime whenever it encountered a different type in the database, forcing us to skip those records during parsing.


For example, imagine we have two ways of storing someone's favorite food:

[
  {
    "user_id": 1,
    "favorite_food": "apples"
  },
  {
    "user_id": 2,
    "favorite_food": [
      "apples",
      "strawberries"
    ]
  }
]
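The two shapes above can already be decoded into a single Go struct once the inconsistent field is typed as any. Here is a minimal sketch using encoding/json as a stand-in for the DynamoDB driver (the record struct and decodeRecords helper are illustrative names, not from the article's repository); note that the decoder picks a different concrete type for each document:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// record mirrors the UserData struct discussed below: the field with
// diverged shapes is typed as `any`, so either document decodes
// without an error.
type record struct {
	UserID       int `json:"user_id"`
	FavoriteFood any `json:"favorite_food"`
}

// decodeRecords unmarshals the mixed-shape documents and reports the
// concrete Go type the decoder chose for each favorite_food value.
func decodeRecords(doc []byte) ([]string, error) {
	var records []record
	if err := json.Unmarshal(doc, &records); err != nil {
		return nil, err
	}
	kinds := make([]string, 0, len(records))
	for _, r := range records {
		kinds = append(kinds, fmt.Sprintf("%T", r.FavoriteFood))
	}
	return kinds, nil
}
```

For the two items above, the first favorite_food arrives as a string and the second as []interface{}, which is exactly the divergence the converter below has to resolve.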


Because favorite_food is represented by two types, string and []string, we can't use either of them directly in our code: whichever we pick, the other variant will fail to decode. However, in our business logic we know the target type we need for that particular data, and how to convert all other stored variants into it. Let's represent this in code:


    type UserData struct {
        ID           string `dynamodbav:"user_id"`
        FavoriteFood any    `dynamodbav:"favorite_food"`
    }

    type UserDataTarget struct {
        FavoriteFood []string
    }


The FavoriteFood field of type any is marked with the dynamodbav:"favorite_food" tag, which tells the DynamoDB driver to map the favorite_food field from storage into our field. Since our field type is any, the driver will use the Go type matching the DynamoDB type and assign it to our field. We can then retrieve the real type information using type comparison.


    // ConvertToArray is a manual conversion function that maps the two
    // possible shapes of the field in DynamoDB - a string and an array
    // of strings - into a slice of strings.
    func ConvertToArray(field any) ([]string, error) {
        if field == nil {
           return nil, nil
        }

        switch v := field.(type) {
        case []string:
           return v, nil
        case string:
           return []string{v}, nil
        case []any:
           casted := make([]string, 0, len(v))
           for _, value := range v {
              castedValue, ok := value.(string)
              if !ok {
                 continue
              }
              casted = append(casted, castedValue)
           }

           return casted, nil
        default:
           return nil, fmt.Errorf("unsupported type '%T' for the field '%v'", field, field)
        }
    }


Note that, by default, DynamoDB stores any array you pass to it as an untyped list (type L), even though it has a dedicated string-set type (SS) and you are passing an array of strings. When such a list is unmarshalled into an any field, it comes back as []any, so we have to handle the []any case even if we never store anything other than strings.


The use-case code for our app will look like this:

     e := echo.New()
     e.POST("/user-data", ucase.CreateUserData)
     e.GET("/user-data-manual/:id", ucase.GetUserData)

     type userDataRepository interface {
         SaveUserDataAbstract(userData *types.UserDataRequest) (id string, err error)
         GetUserDataAbstract(id uuid.UUID) (*types.UserDataRequest, error)
     }

	 // CreateUserData accepts any type for the 'favorite_food' field and saves it to the database  
    func (d *Usecase) CreateUserData(ctx echo.Context) error {  
        ctx.Logger().Info("CreateUserData")  
        var request types.UserDataRequest  
        err := ctx.Bind(&request)  
        if err != nil {  
           return ctx.String(http.StatusBadRequest, fmt.Sprintf("couldn't parse body: %v", err))  
        }  
      
        id, err := d.repo.SaveUserDataAbstract(&request)  
        if err != nil {  
           return ctx.String(http.StatusInternalServerError, fmt.Sprintf("couldn't save item: %v", err))  
        }  
      
        return ctx.JSON(http.StatusOK, types.UserDataRequest{ID: id})  
    }  
      
    // GetUserData parses the 'favorite_food' field with manual parser  
    func (d *Usecase) GetUserData(ctx echo.Context) error {  
        ctx.Logger().Info("GetUserData")  
        id := ctx.Param("id")  
        uid, err := uuid.Parse(id)  
        if err != nil {  
           return ctx.String(http.StatusBadRequest, fmt.Sprintf("couldn't parse id: %v", err))  
        }  
      
        rawUserData, err := d.repo.GetUserDataAbstract(uid)  
        if err != nil {  
           return ctx.String(http.StatusInternalServerError, fmt.Sprintf("couldn't get item: %v", err))  
        }  
      
        // Convert manually  
        userData, err := parseAbstractRequest(rawUserData)  
        if err != nil {  
           return ctx.String(http.StatusInternalServerError, fmt.Sprintf("couldn't parse data: %v", err))  
        }  
      
        return ctx.JSON(http.StatusOK, userData)  
    }

	 func parseAbstractRequest(request *types.UserDataRequest) (manual.UserDataTarget, error) {  
        target := manual.UserDataTarget{}  
        parsed, err := manual.ConvertToArray(request.FavoriteFood)  
        if err != nil {  
           return target, err  
        }  
        target.FavoriteFood = parsed  
        return target, nil  
    }


The code demonstrates a simple web application using the Echo framework. It defines two endpoints: one for creating user data (CreateUserData) and another for retrieving user data (GetUserData). The application uses a repository interface to interact with the database. The CreateUserData function accepts any type for the favorite_food field and saves it to the database. The GetUserData function retrieves the data and then uses a manual parser (parseAbstractRequest) to convert the abstract data into a specific target type. The parseAbstractRequest function demonstrates how manual conversion is applied to transform the abstract UserDataRequest into a concrete UserDataTarget structure, specifically handling the favorite_food field using the ConvertToArray function we discussed earlier.


The main advantage of this manual conversion approach lies in its simplicity. The conversion logic can be encapsulated in a separate mapper function, like parseAbstractRequest in our example. This mapper serves as a single point of conversion, which needs to be updated whenever new fields are added to the structure or when data types change. This centralization of conversion logic offers some level of maintainability, as developers know exactly where to make changes when the data structure evolves.


However, while this approach is straightforward to implement, it still comes with several drawbacks:

  1. Maintainability Challenges: Although the conversion logic is centralized, the code can become complex and harder to maintain as the number of fields and type variations increases. Each new field or type change requires manual updates to the mapper function.
  2. Limited Scalability: As the application grows, the mapper function may become a bottleneck, potentially requiring frequent updates and increasing the risk of errors.
  3. Reduced Reusability: While the mapper function itself is reusable, the manual conversion approach may lead to similar conversion logic being implemented in different parts of the application for various data structures.
  4. Increased Cognitive Load: Developers need to remember to update the mapper function whenever data structures change, which can be error-prone in fast-paced development environments.


These issues often arise from the combination of NoSQL databases and weakly-typed languages in rapid development scenarios. While this approach can accelerate initial development and provide a quick solution for handling inconsistent data types, it often results in accumulated technical debt that must be addressed as the project evolves.


The trade-off between development speed, code quality, and long-term maintainability remains a common challenge in software development. While the manual conversion with a centralized mapper can be an effective temporary solution, it's crucial to consider more robust and scalable approaches for handling data type inconsistencies in larger or long-term projects.


In future articles, we'll explore more sophisticated methods for managing inconsistent data types in DynamoDB with GoLang, aiming to balance development speed with code quality and maintainability while addressing the limitations of this simple manual conversion approach.