Tech
Read a large 50 GB CSV file with Python code, using chunks only

Certainly, here’s example Python code that reads a large 50 GB CSV file in chunks:

import pandas as pd

# set the file path and name of the CSV file
file_path = "path/to/csv/file.csv"

# define the chunk size
chunk_size = 100000

# create an empty list to store the chunks
chunks = []

# iterate through the CSV file using pandas chunksize
for chunk in pd.read_csv(file_path, chunksize=chunk_size):
    chunks.append(chunk)
    # process the chunk here as needed

# concatenate the chunks into a single pandas dataframe
# (note: this step holds the full, processed dataset in memory at once)
df = pd.concat(chunks, ignore_index=True)

In this example, we use the pandas library to read the CSV file in chunks. We set the path to the CSV file, and then define a chunk size of 100,000 rows, which controls how many rows are loaded into memory at a time.

We create an empty list called chunks to store each chunk of data as it is read in. We then iterate through the CSV file using the pandas read_csv function with the chunksize parameter, which returns an iterator that yields one DataFrame per chunk instead of loading the whole file at once.
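
If each chunk is still larger than you need, read_csv can also limit what is loaded per chunk: the usecols and dtype parameters restrict which columns are read and what types they are stored as. Here is a minimal sketch that reuses file_path and chunk_size from above; the column names "id" and "value" are placeholders for your actual schema:

reader = pd.read_csv(
    file_path,
    chunksize=chunk_size,
    usecols=["id", "value"],                    # placeholder column names
    dtype={"id": "int64", "value": "float64"},  # explicit dtypes shrink each chunk
)

# iterate over reader exactly as in the loop above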

For each chunk, we append it to the chunks list. You can also process each chunk before storing it, such as performing data cleaning, transformation, or analysis, as in the sketch below.
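
For example, a minimal per-chunk cleaning step might look like the following; it reuses the loop from the example above and assumes a hypothetical numeric column named value:

for chunk in pd.read_csv(file_path, chunksize=chunk_size):
    # example cleaning step: drop rows with missing values
    chunk = chunk.dropna()
    # example filter on a hypothetical column named "value"
    chunk = chunk[chunk["value"] > 0]
    chunks.append(chunk)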

Finally, we use the pandas concat function to concatenate all of the chunks into a single pandas dataframe called df. The ignore_index=True parameter ensures that the index is reset after concatenation.

Using this approach, you can read a large CSV file in chunks and process each chunk as needed, rather than loading the entire file in a single read. This helps conserve system resources. Keep in mind, however, that the final concat step rebuilds the full dataset in memory, so it only makes sense when the processed chunks are small enough to fit.
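
When they are not, a common alternative is to reduce each chunk to a running summary instead of keeping it. Here is a minimal, self-contained sketch of that pattern; it assumes a hypothetical numeric column named value whose overall mean we want:

import pandas as pd

file_path = "path/to/csv/file.csv"
chunk_size = 100000

total = 0.0
row_count = 0

for chunk in pd.read_csv(file_path, chunksize=chunk_size):
    # keep only a small summary of each chunk, never the chunk itself
    total += chunk["value"].sum()
    row_count += len(chunk)

print(f"rows: {row_count}, mean value: {total / row_count}")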