CS50: Week 5.
Published 19 July 2020 at Yours, Kewbish. Also published on Dev.to. 1,264 words. Subscribe via RSS.
This post is unlisted and has been archived. This doesn't represent my best work; please check out the posts listed here instead.
Introduction
Finally, we’re over the worst of CS50 (in my opinion, at least). Week 5 was a bit of a difficult lesson and problem set, but in the end, it actually wasn’t as hard as I thought it’d be.1 Week 5 covers data structures - detailing hash tables, linked lists, and tries, which are a combination of both! Speller was a decent challenge, and for once, I really felt that I was applying what I learned in lecture to the problem. This is the week I was looking the most forward to (my original purpose for taking CS50 was for data structures and algorithms after all), and dreading as well.
Notes
Alright, here are my notes:
- you can’t reassign something if it doesn’t exist yet
- remember to initialize to a chunk of memory
- also, you should remember malloc’s effects if you reassign, and free if you reassign a malloc
- arrays are difficult to resize, because they’re initialized to a certain amount of memory
- could move a copy of the array to a larger, free area
- then can delete old copy
- we could also use realloc
- as its name implies, it reallocates memory
- give it the pointer of the old array
- will return address of new array
- remember to free variable
- data structures are custom structures to store information
- made with structs
- builds off included data types
- linked lists
- basically an array, but each element points to the next one
- elements are not together in memory
- each element includes a pointer to the next
- can’t access the middle of the list with just [x] notation
- there isn’t a ‘middle’
- need to navigate through the entire list first
- also takes twice as much memory per element
- needs to store the next pointer
- usually constructed of a struct
- one part to store the actual data, and the same struct pointing to the next struct
- initialize first to NULL, so you can assign
- introduce a new notation, -> notation
- similar to dot notation of a pointer
node->next = x;
- need to use a while loop to iterate through the properties
- check if not NULL
- set the variable
- if you want to add to the beginning
- set a pointer to point to the beginning
- then set the list to the last pointer
- inserts a node at the beginning
- to insert in the middle
- do something similar
- need to create a temporary variable for the swap
- linked lists are O(n) time, need to follow each node pointer to find the next
- also introduces a tree
- each node points to two nodes, like the famous binary search tree
- makes binary search very easy, only compare two nodes
- makes insertion easy as well, only rearrange a small subset
- search is O(log n)
- need to balance these though, or else may become reweighted
- also memory-expensive, but can search faster
- hashtable combines arrays and linked lists
- each element in the array is a linked list
- can add elements quickly, and the initial searching time is decreased
- however, they might all end up in the same element, in which case the time efficiency is negated
- get as close as possible to O(1) when the number of elements equals the possible values
- retrieval tree provides O(1) searching, but at cost of space
- stores each level of element (here, letters) in a separate array
- in this example, 26x as much memory
- more data structures
- stacks -> last in, first out, like email inbox
- queue -> first in, first out, like line in a store
- dictionary -> map keys to values, like Python!
- these data structures can be implemented with arrays, linked lists, hashtables, and other structures
Problem Set
This week wasn’t much of a problem set - we only had Speller to work through. But don’t underestimate it either - it took two consecutive days of work to finish out. The logic wasn’t too hard to implement, actually. First, I split up the problem in its subparts
hash
-> I decided to use the simplest hash function - just the first character. Could this be optimized? Yes, but I just wanted to try the data structure out first, and not have to worry about copying a hash function that I didn’t completely understand either.load
-> Made an array, and I’d put each word into its appropriate element. I just appended the current word to the end of the linked list, and lowercased it as well.size
-> In load, I’d created a line to increment a global variable, which made size just areturn count;
statement.check
-> I hashed the current word to compare, and then used a while loop to iterate over the linked list and checked if it matched the current targeted word.unload
-> I iterated over each element in the array, and again iterated over the linked list to free each pointer in the list.
The first time I wrote the program, it worked as intended, so check. But, upon check50-ing, I got a bunch of valgrind errors. I had forgotten to free a bunch of malloc’ed variables, and to fix this, I tried to use a character array instead. Also, I finally learned to use valgrind properly - I’d kind of ignored it in the past week, given that there wasn’t a check for a memory leak, just a reminder that it could exist. I also realized that my unload function was logically incorrect, and would always return true right away. After fixing this, I attempted to test it - but now, it didn’t produce the intended output. Oops.
I stripped out the entire program, and rewrote it from scratch, including what I’d learned about valgrind and memory allocation in the first runthrough. (Now that I think about it, I did have some confusion about >
vs >>
in bash. They overwrite a file while redirecting output and append to a file while redirecting, respectively. I probably mixed >
up with >>
, and that’s actually probably why it didn’t work as intended, meaning the logic was solid initially, meaning I rewrote it from scratch for essentially nothing. Big facepalm.)
Now, I was memory leak free, and working with the correct output. Nice! In the process of scrolling dozens of Stack Overflow pages, I really learned to appreciate the full power of valgrind. It’s a great resource for checking memory leaks, and despite its rather scary, confusing interface, it’s an essential tool, really. I never had to keep memory management in mind while Pythoning, but C made me more cognizant of the lower-level management that goes into the easy-to-learn Python flow.
Conclusion
Am I going to continue with C? I doubt it, unless I’m just making a toy thing, or really want performance for something. Would I have ignored C if I were to take some version of CS50 where I didn’t need CS50? Probably not. I’ve learned a lot about lower level things, like memory management and bytes, as well as getting a glimpse into how easy-to-use features in Python were really implemented. As other people have said, working with C makes one really appreciate how nice higher level languages like Python are to work with, and it was a great experience.
That said, I still can’t wait to get into the later half of CS50 - Python, SQL, and web? Yes, thank you very much. I guess this would be the more application side of CS50, and I’m excited to get learning.
-
The CS50 Reddit and my initial lack of experience with data structures were pretty scary. But I’d rate this problem set lower than Tideman in difficulty, honestly. ↩︎