Pickling & CSV
Preservation through Serialization and Tabulation
Pickle
Standard-library module for (de)serialization: storing complete Python objects in
files and loading them back later.
● Supports almost all data types – good.
● Works only with Python – bad.
import pickle
pickle.dump(obj, open_binary_file) # Save obj to a file opened in binary mode ("wb")
obj = pickle.load(open_binary_file) # Restore an object from a file opened in binary mode ("rb")
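A minimal round-trip sketch of the two calls above. The `record` dictionary is a made-up object, and an in-memory `io.BytesIO` buffer stands in for a real file opened in binary mode, so the example leaves nothing on disk.

```python
import io
import pickle

# Hypothetical object to preserve: nested built-in types
record = {"name": "example", "scores": [90, 85.5], "tags": {"a", "b"}}

# A BytesIO buffer stands in for a file opened with open(path, "wb")
buffer = io.BytesIO()
pickle.dump(record, buffer)

buffer.seek(0)  # rewind, as if reopening the file with open(path, "rb")
restored = pickle.load(buffer)

print(restored == record)  # the restored object is equal in value
print(restored is record)  # but it is a separate object
```

Only unpickle data from sources you trust: loading a pickle can execute arbitrary code.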
What Is CSV?
● “Comma Separated Values”
● Tabular file with rows and columns.
● All rows have the same number of fields.
● Fields separated by commas.
○ “Commas” do not have to be commas. Any other character can be used, such as TAB (TSV,
“tab separated values”), vertical bar, space...
● The first row often serves as headers.
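To illustrate the point about separators, here is a small sketch that parses the same table written once with TABs and once with vertical bars. The data is invented, and `io.StringIO` stands in for an open file.

```python
import csv
import io

# Hypothetical data: the same two rows with two different separators
tsv_text = "name\tlevel\nAda\tSenior\n"
bar_text = "name|level\nAda|Senior\n"

# The delimiter argument tells the reader which character separates fields
tsv_rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
bar_rows = list(csv.reader(io.StringIO(bar_text), delimiter="|"))

print(tsv_rows)
print(tsv_rows == bar_rows)  # same table, regardless of the separator
```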
CSV Example
Student, ID, E-mail Address, Phone Number, Class, Academic Level
"Almarar, Hassan A", 16897**, halmarar2@suffolk.edu, Junior, UG
"Arakelyan, Artur", 17577**, aarakelyan@suffolk.edu, Sophomore, UG
"Batista, Christopher A", 16357**, cbatista@suffolk.edu, Senior, UG
Complete file...
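The quoted names in the example contain commas, which is why splitting each line on "," is not enough. The sketch below uses a made-up line in the same style (not real student data) and compares a naive split with `csv.reader`. The `skipinitialspace=True` option is only needed on the assumption that the file really has a space after each comma, as the slide shows.

```python
import csv
import io

# Hypothetical line in the style of the example above (not real student data)
line = '"Doe, Jane", 12345, jdoe@example.edu, Junior, UG\n'

naive = line.strip().split(",")  # also splits inside the quoted name
strict = next(csv.reader(io.StringIO(line)))
relaxed = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(len(naive))  # 6 pieces: the name was cut in two
print(strict)      # 5 fields, leading spaces kept
print(relaxed)     # 5 fields, leading spaces dropped
```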
Reading CSV
import csv
with open("path-words.csv", newline="") as csvfileIn:
    reader = csv.reader(csvfileIn, delimiter=',', quotechar='"')
    # next() returns the next row parsed as a list; use it to read the header row, if there is one
    headers = next(reader)
    # Process the rest of the file, one row at a time (do_something is a placeholder)
    for row in reader:
        do_something(row)
    # Or, instead of the loop (reader is an iterator and can be consumed only once):
    all_rows = list(reader)
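A runnable sketch of the reading pattern, using invented in-memory data in place of a file. It shows two details that are easy to miss: every field comes back as a string, and the reader is used up after one pass.

```python
import csv
import io

# Hypothetical file contents; io.StringIO stands in for an open file
text = "word,length\npath,4\nwords,5\n"

reader = csv.reader(io.StringIO(text))
headers = next(reader)       # the first row, consumed as headers
first_pass = list(reader)    # all remaining rows
second_pass = list(reader)   # nothing left: the reader is exhausted

print(headers)
print(first_pass)            # numbers arrive as strings
print(second_pass)
```

Convert fields explicitly, for example with `int(row[1])`, when numeric values are needed.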
Writing CSV
import csv
# newline="" lets the csv module control line endings (avoids blank lines on Windows)
with open("path-words.csv", "w", newline="") as csvfileOut:
    writer = csv.writer(csvfileOut, delimiter=',', quotechar='"')
    writer.writerow([..., ..., ...]) # Write headers
    # Write the rest of the file; each row is a list of strings or numbers
    writer.writerows([row1, row2, row3 ...])
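A concrete sketch of the writing pattern with made-up rows. An `io.StringIO` buffer stands in for the output file so the result can be inspected directly. With the default settings the writer quotes only the fields that need it and doubles any embedded quote characters.

```python
import csv
import io

out = io.StringIO()  # stands in for open(path, "w", newline="")
writer = csv.writer(out)

writer.writerow(["name", "note", "count"])  # headers
writer.writerows([                          # hypothetical rows
    ["Ada", "likes commas, a lot", 3],
    ["Bob", 'says "hi"', 1.5],
])

text = out.getvalue()
print(text)

# Reading the text back returns every field as a string
round_trip = list(csv.reader(io.StringIO(text)))
print(round_trip)
```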
Example: Who Are the Students? (students.py)
import csv
import collections

with open("class-2017.csv", newline="") as mystudents:
    reader = csv.reader(mystudents)
    headers = next(reader)
    class_position = headers.index("Class") # Where is the Class column?
    class_levels = [row[class_position] for row in reader]

who_s_who = collections.Counter(class_levels) # Summary: class level -> number of students

with open("class-summary.csv", "w", newline="") as levels:
    writer = csv.writer(levels)
    writer.writerow(['Class', 'count']) # New headers
    writer.writerows(who_s_who.items()) # New content: one (class, count) pair per row
Whenever possible, use Pandas
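For comparison, a rough sketch of how the class summary from students.py might look with Pandas. It assumes the third-party pandas package is installed, and the four-student table is invented for illustration.

```python
import io

import pandas as pd  # third-party package, assumed to be installed

# Hypothetical data; io.StringIO stands in for a file such as class-2017.csv
csv_text = "Student,Class\nA,Junior\nB,Senior\nC,Junior\nD,Junior\n"

df = pd.read_csv(io.StringIO(csv_text))  # the first row becomes the column names
counts = df["Class"].value_counts()      # students per class level
print(counts)

# Turn the counts into a two-column table and serialize it as CSV text
summary = counts.rename_axis("Class").reset_index(name="count")
summary_csv = summary.to_csv(index=False)
print(summary_csv)
```

Passing a file path instead of a buffer to `read_csv` and `to_csv` reads from and writes to disk.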