Read a large 50 GB CSV file with Python code, only with chunks
Certainly, here’s example Python code that reads a large 50 GB CSV file in chunks:
```python
import pandas as pd

# Set the path to the CSV file
file_path = "path/to/csv/file.csv"

# Define the chunk size (number of rows per chunk)
chunk_size = 100000

# Create an empty list to store the chunks
chunks = []

# Iterate through the CSV file using pandas' chunksize
for chunk in pd.read_csv(file_path, chunksize=chunk_size):
    # Process the chunk here as needed
    chunks.append(chunk)

# Concatenate the chunks into a single pandas DataFrame
df = pd.concat(chunks, ignore_index=True)
```
In this example code, we use the pandas library to read the CSV file in chunks. We set the file path and name of the CSV file, and then define a chunk size of 100,000 rows. We create an empty list called chunks to store each chunk of data that is read in, and we iterate through the CSV file using the pandas read_csv function with the specified chunk size, which returns an iterator of DataFrames instead of loading the whole file at once.
For each chunk, we append it to the chunks list. You can then process each chunk as needed, such as performing data cleaning, transformation, or analysis.
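As a concrete sketch of per-chunk processing, the snippet below filters each chunk before storing it, so only the rows you actually need are kept. It uses a small in-memory CSV with a hypothetical "value" column so it runs as-is; with a real file you would pass the file path to read_csv instead:

```python
import io

import pandas as pd

# Small in-memory stand-in for a large CSV file (hypothetical "value" column);
# with a real file, pass its path to pd.read_csv instead of the StringIO object.
csv_data = io.StringIO("value\n1\n2\n3\n4\n5\n6\n")

processed = []
for chunk in pd.read_csv(csv_data, chunksize=2):
    # Example per-chunk processing: keep only rows where value > 2
    filtered = chunk[chunk["value"] > 2]
    processed.append(filtered)

# Only the filtered rows are concatenated, not the full file
df = pd.concat(processed, ignore_index=True)
print(df["value"].tolist())  # → [3, 4, 5, 6]
```

Filtering inside the loop like this means the discarded rows never accumulate in memory, which is the main point of chunked reading.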
Finally, we use the pandas concat function to concatenate all of the chunks into a single pandas DataFrame called df. The ignore_index=True parameter ensures that the resulting DataFrame gets a fresh, continuous index rather than repeating each chunk's original row numbers.
Using this approach, you can read a large CSV file chunk by chunk and process each chunk as needed, without having to load the entire file into memory at once. One caveat: concatenating every chunk back into a single DataFrame ultimately rebuilds the full 50 GB dataset in memory, so for truly large files you should filter or aggregate each chunk inside the loop and keep only the reduced results.
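To illustrate the memory-friendly alternative, here is a minimal sketch that reduces each chunk to running totals instead of storing it, so peak memory stays at roughly one chunk. The in-memory CSV and its "value" column are assumptions made so the example is self-contained:

```python
import io

import pandas as pd

# Self-contained stand-in for a large CSV (hypothetical "value" column);
# with a 50 GB file you would pass the file path and a larger chunksize.
csv_data = io.StringIO("value\n10\n20\n30\n40\n")

total = 0
count = 0
for chunk in pd.read_csv(csv_data, chunksize=2):
    # Reduce each chunk to scalars instead of keeping the chunk around
    total += chunk["value"].sum()
    count += len(chunk)

mean = total / count
print(mean)  # → 25.0
```

Because each chunk is discarded after its contribution is added to the totals, this computes statistics over the whole file while never holding more than one chunk in memory.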