How to Generate Large Datasets in .NET for Excel With OpenXML

by Artem Rudiakov, June 21st, 2024

Too Long; Didn't Read

Generating Excel reports is essential for managing extensive datasets in large enterprises, aiding in strategic decision-making. The common approach using OpenXML is straightforward for small datasets but slows significantly with larger ones. Transitioning to the SAX method improves processing speed but can lead to memory issues. The unexpected memory leaks stem from a flaw in the .NET System.IO.Packaging package, and a workaround using a custom Package object mitigates the issue and optimizes performance. For practical use, consider chunk-based processing or a dedicated NuGet package for generating office documents efficiently.


  • Importance of Excel Reporting
  • Common Approach to Generating Excel Files
  • Passing Large Datasets in Excel
  • Unexpected Memory Leaks: Unraveling the Enigma
  • Final Thoughts

Importance of Excel Reporting

In large enterprise companies, generating Excel reports has become an indispensable process for managing and analyzing extensive datasets efficiently. These reports are crucial for tracking performance metrics, financial records, and operational statistics, offering valuable insights that drive strategic decision-making.


In such environments, automation tools that generate these files play a pivotal role in streamlining report creation and ensuring accuracy. As we advance into 2024, generating Excel files should be an easy and common task, right?

Common Approach to Generating Excel Files

To generate an Excel file with your own dataset, we will use the OpenXML library. The first thing you should do is install this library into your project:

dotnet add package DocumentFormat.OpenXml


After installing the necessary library and creating our template Excel file named “Test.xlsx,” we added this code to our application:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// this custom type is for your input data
public class DataSet
{
    public List<DataRow> Rows { get; set; }
}
// this row will contain number of our row and info about each cell
public class DataRow
{
    public int Index { get; set; }

    public Dictionary<string, string> Cells { get; set; }
}

private void SetValuesToExcel(string filePath, DataSet dataSet)
{
    if (string.IsNullOrWhiteSpace(filePath))
    {
        throw new FileNotFoundException($"File not found at this path: {filePath}");
    }

    using (SpreadsheetDocument document = SpreadsheetDocument.Open(filePath, true))
    {
        //each excel document has XML-structure, 
        //so we need to go deeper to our sheet
        WorkbookPart wbPart = document.WorkbookPart;
        //feel free to pass sheet name as parameter. 
        //here we'll just use the default one
        Sheet theSheet = wbPart.Workbook
                            .Descendants<Sheet>()
                            .FirstOrDefault(s => s.Name.Value.Trim() == "Sheet1");
        //next element in hierarchy is worksheetpart
        //we need to dive deeper to SheetData object                    
        WorksheetPart wsPart = (WorksheetPart)(wbPart.GetPartById(theSheet.Id));
        Worksheet worksheet = wsPart.Worksheet;
        SheetData sheetData = worksheet.GetFirstChild<SheetData>();
        
        //iterating through our data
        foreach (var dataRow in dataSet.Rows)
        {
            //getting Row element from Excel's DOM
            var rowIndex = dataRow.Index;
            var row = sheetData
                        .Elements<Row>()
                        .FirstOrDefault(r => r.RowIndex == rowIndex);
            //if there is no row - we'll create new one
            if (row == null)
            {
                row = new Row { RowIndex = (uint)rowIndex };
                sheetData.Append(row);
            }
            
            //now we need to iterate though each cell in the row
            foreach (var dataCell in dataRow.Cells)
            {
                var cell = row.Elements<Cell>()
                .FirstOrDefault(c => c.CellReference.Value == dataCell.Key);
        
                if (cell == null)
                {
                    cell = new Cell 
                    { 
                      CellReference = dataCell.Key, 
                      DataType = CellValues.String 
                    };
                    row.AppendChild(cell);
                }
        
                cell.CellValue = new CellValue(dataCell.Value);
            }
        }
        //after all changes in Excel DOM we need to save it
        wbPart.Workbook.Save();
    }
}


And here is how to use the code above:

var filePath = "Test.xlsx";
// number of rows that we want to add to our Excel file
var testRowsCounter = 100;
// creating some data for it
var dataSet = new DataSet();
dataSet.Rows = new List<DataRow>();
string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
for (int i = 0; i < testRowsCounter; i++)
{
    var row = new DataRow 
    { 
      Cells = new Dictionary<string, string>(), Index = i + 1 
    };
    for (int j = 0; j < 10; j++)
    {
        row.Cells.Add($"{alphabet[j]}{i+1}", Guid.NewGuid().ToString());
    }
    dataSet.Rows.Add(row);
}
//passing path to our file and data object
SetValuesToExcel(filePath, dataSet);
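
Below are the metrics for different row counts. As a minimal sketch of one way to collect such numbers (assuming Stopwatch for timing and GC.GetTotalMemory for the memory delta; the actual methodology behind the tables may differ), the call can be wrapped like this:

using System;
using System.Diagnostics;

// rough harness around the call from the usage example above:
// elapsed wall-clock time plus managed-heap growth
var memoryBefore = GC.GetTotalMemory(forceFullCollection: true);
var stopwatch = Stopwatch.StartNew();

SetValuesToExcel(filePath, dataSet); // filePath and dataSet built as shown above

stopwatch.Stop();
var memoryAfter = GC.GetTotalMemory(forceFullCollection: false);

Console.WriteLine($"Time to process: {stopwatch.Elapsed}");
Console.WriteLine($"Memory gained: {(memoryAfter - memoryBefore) / (1024.0 * 1024.0):F0} MB");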


Metrics

Count of rows      Time to process      Memory gained
100                454ms                21 MB
10 000             2.92s                132 MB
100 000            10min 47s 270ms      333 MB

In this table, we tested our function with various numbers of rows. As expected, increasing the number of rows leads to decreasing performance. To fix that, we can try another approach.

Passing Large Datasets in Excel

The approach demonstrated above is straightforward and sufficient for small datasets. However, as illustrated in the table, processing large datasets can be significantly slow. This method involves DOM manipulations, which are inherently slow. In such cases, the SAX (Simple API for XML) approach becomes invaluable. As the name suggests, SAX allows us to work directly with the XML of the Excel document, providing a more efficient solution for handling large datasets.


Change the code from the first example to this:

using (SpreadsheetDocument document = SpreadsheetDocument.Open(filePath, true))
{
    WorkbookPart workbookPart = document.WorkbookPart;
    //we taking the original worksheetpart of our template
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    //adding the new one
    WorksheetPart replacementPart = workbookPart.AddNewPart<WorksheetPart>();

    string originalSheetId = workbookPart.GetIdOfPart(worksheetPart);
    string replacementPartId = workbookPart.GetIdOfPart(replacementPart);
    
    //the main idea is read through XML of original sheet object
    OpenXmlReader openXmlReader = OpenXmlReader.Create(worksheetPart);
    //and write it to the new one with some injection of our custom data
    OpenXmlWriter openXmlWriter = OpenXmlWriter.Create(replacementPart);

    while (openXmlReader.Read())
    {
        if (openXmlReader.ElementType == typeof(SheetData))
        {
            if (openXmlReader.IsEndElement)
                continue;

            // write sheet element
            openXmlWriter.WriteStartElement(new SheetData());

            // write data rows
            foreach (var row in dataSet.Rows)
            {
                Row r = new Row
                {
                    RowIndex = (uint)row.Index
                };

                // start row
                openXmlWriter.WriteStartElement(r);

                foreach (var rowCell in row.Cells)
                {
                    Cell c = new Cell
                    {
                        DataType = CellValues.String,
                        CellReference = rowCell.Key,
                        CellValue = new CellValue(rowCell.Value)
                    };

                    // cell
                    openXmlWriter.WriteElement(c);
                }

                // end row
                openXmlWriter.WriteEndElement();
            }

            // end sheet
            openXmlWriter.WriteEndElement();
        }
        else
        {
            //skip rows, cells, and cell values that came from the template sheet,
            //because we have already written our own data above
            if (openXmlReader.ElementType == typeof(Row)
                || openXmlReader.ElementType == typeof(Cell)
                || openXmlReader.ElementType == typeof(CellValue))
            {
                openXmlReader.ReadNextSibling();
                continue;
            }

            //copy all the other parts of the XML; they are not so interesting,
            //but they are still necessary

            if (openXmlReader.IsStartElement)
            {
                openXmlWriter.WriteStartElement(openXmlReader);
            }
            else if (openXmlReader.IsEndElement)
            {
                openXmlWriter.WriteEndElement();
            }
        }
    }

    openXmlReader.Close();
    openXmlWriter.Close();
    //after all modifications we switch sheets inserting 
    //the new one to the original file
    Sheet sheet = workbookPart.Workbook
        .Descendants<Sheet>()
        .First(c => c.Id == originalSheetId);

    sheet.Id.Value = replacementPartId;
    
    //deleting the original worksheet
    workbookPart.DeletePart(worksheetPart);
}

Explanation: This code reads the XML elements of the source worksheet one by one and copies them to a new worksheet part, injecting our custom rows and cells along the way. It then points the sheet to the new part and deletes the original worksheet.
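
To sanity-check the result, the same streaming pattern works for reading. The helper below is a small sketch that is not part of the original code: it counts the rows of the first worksheet with OpenXmlReader, without loading the sheet into the DOM:

using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// counts row elements of the first worksheet in a streaming fashion
static int CountRows(string filePath)
{
    using var document = SpreadsheetDocument.Open(filePath, false);
    var worksheetPart = document.WorkbookPart.WorksheetParts.First();

    var rowCount = 0;
    using var reader = OpenXmlReader.Create(worksheetPart);
    while (reader.Read())
    {
        if (reader.ElementType == typeof(Row) && reader.IsStartElement)
            rowCount++;
    }
    return rowCount;
}

Console.WriteLine(CountRows("Test.xlsx")); // should print the number of rows we wrote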


Metrics

Count of rows      Time to process      Memory gained
100                414ms                22 MB
10 000             961ms                87 MB
100 000            3s 488ms             492 MB
1 000 000          30s 224ms            over 4.5 GB

As you can see, the speed of processing a large number of rows has significantly increased. However, we now have a memory issue that we need to address.

Unexpected Memory Leaks: Unraveling the Enigma

A discerning observer might have noticed an unexpected surge in memory consumption when processing 10 million cells in Excel. Although that many string values weigh a fair amount on their own, they shouldn't account for such a substantial increase. After a meticulous investigation with memory profilers, the culprit was identified within the OpenXML library.


Specifically, the root cause can be traced to a flaw in the .NET package System.IO.Packaging, affecting both the .NET Standard and .NET Core versions. Interestingly, the issue seems absent in the classic .NET Framework, likely due to differences in the underlying WindowsBase code. In short, the OpenXML library uses ZipArchive under the hood, and ZipArchive copies the data into a MemoryStream every time you update the file.


This happens only if you open the file in update mode, but there is no other way to do it because this is the behavior of .NET itself.


For those interested in delving deeper into this issue, further details can be found at GitHub Issue #23750.


Subsequently, after poring over the .NET source code and consulting peers facing similar challenges, I devised a workaround. If we can't use the SpreadsheetDocument object to work with our Excel file in Open mode, let's use it in Create mode with our own Package object. It will not use the buggy ZipArchive under the hood and will work as it should.
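
In API terms, the difference boils down to how the document is opened. Here is a minimal sketch of the two modes (the file names are illustrative, and this applies to OpenXML 2.x, as noted below):

using System.IO;
using System.IO.Packaging;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;

// update mode: every modified part goes through System.IO.Packaging's ZipArchive,
// which buffers the data in MemoryStreams until the package is saved
using var editable = SpreadsheetDocument.Open("Test.xlsx", isEditable: true);

// create mode with our own Package: the problematic update path is never hit
using var fileStream = File.Create("New.xlsx");
using var package = Package.Open(fileStream, FileMode.Create, FileAccess.Write);
using var created = SpreadsheetDocument.Create(package, SpreadsheetDocumentType.Workbook);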


(Warning: this code currently works only with OpenXML v2.19.0 and earlier.)


Change our code to this:

using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Threading.Tasks;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public class Builder
{
    public async Task Build(string filePath, string sheetName, DataSet dataSet)
    {
        var workbookId = await FillData(filePath, sheetName, dataSet);
        await WriteAdditionalElements(filePath, sheetName, workbookId);
    }


    public async Task<string> FillData(string filePath, 
                                       string sheetName, DataSet excelDataRows)
    {
        //opening our file in create mode
        await using var fileStream = File.Create(filePath);
        using var package = Package.Open(fileStream, FileMode.Create, FileAccess.Write);
        using var excel = SpreadsheetDocument.Create(package, SpreadsheetDocumentType.Workbook);
        
        //adding new workbookpart
        excel.AddWorkbookPart();
        var worksheetPart = excel.WorkbookPart.AddNewPart<WorksheetPart>();
        var workbookId = excel.WorkbookPart.GetIdOfPart(worksheetPart);
        
        //creating necessary worksheet and sheetdata
        OpenXmlWriter openXmlWriter = OpenXmlWriter.Create(worksheetPart);
        openXmlWriter.WriteStartElement(new Worksheet());
        openXmlWriter.WriteStartElement(new SheetData());

        // write data rows
        foreach (var row in excelDataRows.Rows.OrderBy(r => r.Index))
        {
            Row r = new Row
            {
                RowIndex = (uint)row.Index
            };

            openXmlWriter.WriteStartElement(r);

            foreach (var rowCell in row.Cells)
            {
                Cell c = new Cell
                {
                    DataType = CellValues.String,
                    CellReference = rowCell.Key
                };
                //cell
                openXmlWriter.WriteStartElement(c);

                CellValue v = new CellValue(rowCell.Value);
                openXmlWriter.WriteElement(v);
                
                //cell end
                openXmlWriter.WriteEndElement();
            }

            // end row
            openXmlWriter.WriteEndElement();
        }
        //sheetdata end
        openXmlWriter.WriteEndElement();
        //worksheet end
        openXmlWriter.WriteEndElement();

        openXmlWriter.Close();

        return workbookId;
    }

    public async Task WriteAdditionalElements(string filePath, string sheetName, string worksheetPartId)
    {
        //here we should add our workbook to the file
        //without this - our document will be incomplete
        await using var fileStream = File.Open(filePath, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
        using var package = Package.Open(fileStream, FileMode.Open, FileAccess.ReadWrite);
        using var excel = SpreadsheetDocument.Open(package);

        if (excel.WorkbookPart is null)
            throw new InvalidOperationException("Workbook part cannot be null!");

        var xmlWriter = OpenXmlWriter.Create(excel.WorkbookPart);
        xmlWriter.WriteStartElement(new Workbook());
        xmlWriter.WriteStartElement(new Sheets());

        xmlWriter.WriteElement(new Sheet { Id = worksheetPartId, Name = sheetName, SheetId = 1 });
        xmlWriter.WriteEndElement();
        xmlWriter.WriteEndElement();

        xmlWriter.Close();
        xmlWriter.Dispose();
    }
}


And use it like this:

var builder = new Builder();
await builder.Build(filePath, "Sheet1", dataSet);


Metrics

Count of rows      Time to process      Memory gained
100                291ms                18 MB
10 000             940ms                62 MB
100 000            3s 767ms             297 MB
1 000 000          31s 354ms            2.7 GB

Now, our measurements look satisfactory compared to the initial ones.

Final Thoughts

First of all, the showcased code serves purely demonstrative purposes. In practical applications, additional features such as support for various cell types or the replication of cell styles should be considered. Despite the significant optimizations demonstrated in the previous example, its direct application in real-world scenarios may not be feasible. Typically, for handling large Excel files, a chunk-based approach is more suitable, as sketched below.
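
A minimal sketch of that idea, built on the same OpenXmlWriter streaming used above (the IEnumerable<DataRow> source and the chunk size are illustrative assumptions, not part of the original code):

using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// streams rows into an already created WorksheetPart in fixed-size chunks,
// so only one chunk of source data has to be materialized at a time
static void WriteRowsInChunks(WorksheetPart worksheetPart,
                              IEnumerable<DataRow> rows, // e.g. read lazily from a database
                              int chunkSize = 10_000)
{
    using var writer = OpenXmlWriter.Create(worksheetPart);
    writer.WriteStartElement(new Worksheet());
    writer.WriteStartElement(new SheetData());

    foreach (var chunk in rows.Chunk(chunkSize)) // Enumerable.Chunk, .NET 6+
    {
        foreach (var row in chunk)
        {
            writer.WriteStartElement(new Row { RowIndex = (uint)row.Index });
            foreach (var cell in row.Cells)
            {
                writer.WriteElement(new Cell
                {
                    DataType = CellValues.String,
                    CellReference = cell.Key,
                    CellValue = new CellValue(cell.Value)
                });
            }
            writer.WriteEndElement(); // row
        }
        // the previous chunk becomes collectible before the next one is fetched
    }

    writer.WriteEndElement(); // sheet data
    writer.WriteEndElement(); // worksheet
}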


P.S.: If you prefer to avoid delving into the intricacies of generating office documents, you're welcome to explore my NuGet package, which simplifies and integrates all these functionalities seamlessly.


Feature Image by vecstock on Freepik