CS50: Week 5.

Published 19 July 2020 at Yours, Kewbish. Also published on Dev.to. 1,264 words. Subscribe via RSS.

This post is unlisted and has been archived. This doesn't represent my best work; please check out the posts listed here instead.

Introduction

Finally, we’re over the worst of CS50 (in my opinion, at least). Week 5 was a bit of a difficult lesson and problem set, but in the end, it actually wasn’t as hard as I thought it’d be.¹ Week 5 covers data structures - detailing hash tables, linked lists, and tries, which are a combination of both! Speller was a decent challenge, and for once, I really felt that I was applying what I learned in lecture to the problem. This is the week I was looking the most forward to (my original purpose for taking CS50 was for data structures and algorithms after all), and dreading as well.

Notes

Alright, here are my notes:

you can’t reassign something if it doesn’t exist yet
- remember to initialize to a chunk of memory
also, you should remember malloc’s effects if you reassign, and free if you reassign a malloc
arrays are difficult to resize, because they’re initialized to a certain amount of memory
- could move a copy of the array to a larger, free area
- then can delete old copy
we could also use realloc
- as its name implies, it reallocates memory
- give it the pointer of the old array
- will return address of new array
- remember to free variable
data structures are custom structures to store information
- made with structs
- builds off included data types
linked lists
- basically an array, but each element points to the next one
- elements are not together in memory
- each element includes a pointer to the next
can’t access the middle of the list with just [x] notation
- there isn’t a ‘middle’
- need to navigate through the entire list first
also takes twice as much memory per element
- needs to store the next pointer
usually constructed of a struct
- one part to store the actual data, and the same struct pointing to the next struct
- initialize first to NULL, so you can assign
introduce a new notation, -> notation
- similar to dot notation of a pointer
- node->next = x;
need to use a while loop to iterate through the properties
- check if not NULL
- set the variable
if you want to add to the beginning
- set a pointer to point to the beginning
- then set the list to the last pointer
- inserts a node at the beginning
to insert in the middle
- do something similar
- need to create a temporary variable for the swap
linked lists are O(n) time, need to follow each node pointer to find the next
also introduces a tree
- each node points to two nodes, like the famous binary search tree
- makes binary search very easy, only compare two nodes
- makes insertion easy as well, only rearrange a small subset
- search is O(log n)
need to balance these though, or else may become reweighted
- also memory-expensive, but can search faster
hashtable combines arrays and linked lists
- each element in the array is a linked list
- can add elements quickly, and the initial searching time is decreased
- however, they might all end up in the same element, in which case the time efficiency is negated
- get as close as possible to O(1) when the number of elements equals the possible values
retrieval tree provides O(1) searching, but at cost of space
- stores each level of element (here, letters) in a separate array
- in this example, 26x as much memory
more data structures
- stacks -> last in, first out, like email inbox
- queue -> first in, first out, like line in a store
- dictionary -> map keys to values, like Python!
these data structures can be implemented with arrays, linked lists, hashtables, and other structures

Problem Set

This week wasn’t much of a problem set - we only had Speller to work through. But don’t underestimate it either - it took two consecutive days of work to finish out. The logic wasn’t too hard to implement, actually. First, I split up the problem in its subparts

hash -> I decided to use the simplest hash function - just the first character. Could this be optimized? Yes, but I just wanted to try the data structure out first, and not have to worry about copying a hash function that I didn’t completely understand either.
load -> Made an array, and I’d put each word into its appropriate element. I just appended the current word to the end of the linked list, and lowercased it as well.
size -> In load, I’d created a line to increment a global variable, which made size just a return count; statement.
check -> I hashed the current word to compare, and then used a while loop to iterate over the linked list and checked if it matched the current targeted word.
unload -> I iterated over each element in the array, and again iterated over the linked list to free each pointer in the list.

The first time I wrote the program, it worked as intended, so check. But, upon check50-ing, I got a bunch of valgrind errors. I had forgotten to free a bunch of malloc’ed variables, and to fix this, I tried to use a character array instead. Also, I finally learned to use valgrind properly - I’d kind of ignored it in the past week, given that there wasn’t a check for a memory leak, just a reminder that it could exist. I also realized that my unload function was logically incorrect, and would always return true right away. After fixing this, I attempted to test it - but now, it didn’t produce the intended output. Oops.

I stripped out the entire program, and rewrote it from scratch, including what I’d learned about valgrind and memory allocation in the first runthrough. (Now that I think about it, I did have some confusion about > vs >> in bash. They overwrite a file while redirecting output and append to a file while redirecting, respectively. I probably mixed > up with >>, and that’s actually probably why it didn’t work as intended, meaning the logic was solid initially, meaning I rewrote it from scratch for essentially nothing. Big facepalm.)

Now, I was memory leak free, and working with the correct output. Nice! In the process of scrolling dozens of Stack Overflow pages, I really learned to appreciate the full power of valgrind. It’s a great resource for checking memory leaks, and despite its rather scary, confusing interface, it’s an essential tool, really. I never had to keep memory management in mind while Pythoning, but C made me more cognizant of the lower-level management that goes into the easy-to-learn Python flow.

Conclusion

Am I going to continue with C? I doubt it, unless I’m just making a toy thing, or really want performance for something. Would I have ignored C if I were to take some version of CS50 where I didn’t need CS50? Probably not. I’ve learned a lot about lower level things, like memory management and bytes, as well as getting a glimpse into how easy-to-use features in Python were really implemented. As other people have said, working with C makes one really appreciate how nice higher level languages like Python are to work with, and it was a great experience.

That said, I still can’t wait to get into the later half of CS50 - Python, SQL, and web? Yes, thank you very much. I guess this would be the more application side of CS50, and I’m excited to get learning.

The CS50 Reddit and my initial lack of experience with data structures were pretty scary. But I’d rate this problem set lower than Tideman in difficulty, honestly. ↩︎

‹ go back