Reading Club: The Book Ch 4 "Understanding Ownership" [PROJECT]

maegul@lemmy.ml · 14 days ago

maegul@lemmy.ml · 5 days ago

Related comment in a separate post: https://lemmy.ml/post/16197939

Provides IMO some really helpful perspective on what references are in rust and how they should be seen and used (in short, they are much more restrictive and constraining than The Book would tell you and should be used conservatively for this reason).

maegul@lemmy.ml · 14 days ago

If I had to explain ownership in rust (based on The Book, Ch 4)

I had a crack at this and found myself writing for a while. I thought I’d pitch at a basic level and try to provide a sort of core, essential conceptual basis (something which I think The Book might be lacking a little??)

Dunno if this will be useful or interesting to anyone, but I found it useful to write. If anyone does find any significant errors, please let me know!

General Idea or Purpose

Generally, the whole point is to prevent memory mismanagement.
IE “undefined behaviour”: whenever memory can be read/written when it is no longer controlled by a variable in the program.
- Rust leans a little cautious in preventing this. It will raise compilation errors for some code that won’t actually cause undefined behaviour. This is in large part, AFAICT, because its means of detecting how long a variable “lives” can be somewhat coarse/incomplete (see the Rustonomicon). Thus, rust enforces relatively clean variable management, and simply copying data will probably be worth it at times.

Ownership

Variables live in, or are “owned by”, a particular scope (or stack frame, eg a function).
Data, memory, or “values” are owned by variables, and only one at a time.
Variables are stuck in their scopes (they live and die in a single scope and can’t be moved out).
Data or memory can be moved from one owning variable to another. In doing so they can also move from one scope to another (eg, by passing a variable into a function).
Once a variable has its data/memory moved to another, that variable is dead.
If data/memory is not moved away from its variable by the completion of its scope, that data/memory “dies” along with the variable (IE, the memory is deallocated).

// > ON THE HEAP

// Ownership will be "moved" into this function's scope
fn take_ownership_heap(_: Vec<i32>) {}

let a = vec![1, 2, 3];
take_ownership_heap(a);

// ERROR
let b = a[0];
// CAN'T DO: value of `a` is borrowed/used after move
// `a` is now "dead", it died in `take_ownership_heap()`;
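
For anyone who wants to watch a value "die", here is a compiling sketch of the same rules. `Tracked`, `consume`, `pass_through` and `drop_order` are made-up names for illustration, and the `Drop` impl is only there to record the moment at which deallocation would happen.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type that writes to a shared log when its value is dropped.
struct Tracked {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

// Takes ownership: the value dies when this function's scope ends.
fn consume(_t: Tracked) {}

// Takes ownership and hands it straight back to the caller.
fn pass_through(t: Tracked) -> Tracked {
    t
}

// Returns the order of events as recorded in the log.
fn drop_order() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let a = Tracked { name: "a", log: Rc::clone(&log) };
        let b = Tracked { name: "b", log: Rc::clone(&log) };

        consume(a); // `a` is moved in and dropped inside `consume`
        log.borrow_mut().push("after consume".to_string());

        let _c = pass_through(b); // ownership moves: b -> t -> _c
        log.borrow_mut().push("after pass_through".to_string());
    } // `_c` falls out of scope here, so the value first named "b" is dropped

    let events = log.borrow().clone();
    events
}
```

Calling `drop_order()` should show the "a" value dying inside `consume`, while the "b" value survives the round trip through `pass_through` and only dies at the end of the inner scope.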

Variables of data on the stack (eg integers) are implicitly copied (as copying basic data types like integers is cheap and unproblematic), so ownership isn’t so much of an issue. Strictly, this applies to types that implement the Copy trait.
Copying (or cloning) data/memory on the heap is not trivial and so must be done explicitly (eg, with my_variable.clone()). In the case of custom types (eg structs), cloning has to be added to or implemented for that particular type (which isn’t necessarily difficult).

// > ON THE STACK

// An integer will be copied into `_`, and no ownership will be moved
fn take_ownership_stack(_: i32) {}

let x = 11;
take_ownership_stack(x);

let y = x * 10;
// NOT A PROBLEM, as x was copied into take_ownership_stack
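
To make the explicit cloning mentioned above concrete, here is a minimal sketch. `Playlist` and `Point` are hypothetical types invented for this example; deriving `Clone` (and, for plain stack-only data, `Copy`) is one common way of adding the behaviour to a custom type.

```rust
// Hypothetical heap-owning type: cloning must be requested explicitly.
#[derive(Debug, Clone, PartialEq)]
struct Playlist {
    name: String,
    tracks: Vec<u32>,
}

// Hypothetical stack-only type: `Copy` makes it behave like an integer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn take_playlist(p: Playlist) -> usize {
    p.tracks.len()
}

fn take_point(p: Point) -> i32 {
    p.x + p.y
}

fn clone_then_move() -> (usize, Playlist) {
    let original = Playlist {
        name: String::from("mix"),
        tracks: vec![1, 2, 3],
    };
    // Only the clone is moved into the function; `original` stays alive.
    let n = take_playlist(original.clone());
    (n, original)
}

fn copy_is_implicit() -> (i32, Point) {
    let p = Point { x: 1, y: 2 };
    // `p` is copied into the function, so it can still be used afterwards.
    let sum = take_point(p);
    (sum, p)
}
```

Removing `.clone()` from `clone_then_move` should bring back the "use after move" error, while `copy_is_implicit` needs no such call because `Point` is `Copy`.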

Borrowing (with references)

Data can be “borrowed” without taking ownership.
This kind of variable is a “reference” (AKA a “non-owning pointer”).
As the variable doesn’t “own” the data, the data can “outlive” the reference.
- Useful for passing a variable’s data into a function without it “dying” in that function.

fn borrow_heap(_: &Vec<i32>) {}

let e = vec![1, 2, 3];
// pass in a reference
borrow_heap(&e);

let f = e[0];
// NOT A PROBLEM, as the data survived `borrow_heap`
// because `e` retained ownership.
// &e, a reference, only "borrowed" the data

But it also means that the abilities or “permissions” of the reference with respect to the data are limited and more closely managed in order to prevent undefined behaviour.
The chief limitation is that two references cannot exist at the same time where one can mutate the data it points to and another can read the same data.
Multiple references can exist that only have permission to read the same data, that’s fine.
The basic idea is to prevent data from being altered/mutated while something else is reading the same data, as this is a common cause of problems.
Commonly expressed as the Pointer Safety Principle: data should never be aliased and mutated at the same time.
For this reason, shared references are “read only” references, while unique references (AKA mutable references) enable their underlying data to be mutated.
- A minor confusion that can arise here is between mutable or unique references and reference variables that are mutable. A unique reference is able to mutate the data pointed to, while a mutable variable that holds a reference can have its pointer changed, ie be reassigned to point at different data/memory. These are independent aspects and can be freely combined.
- Perhaps most easily understood by recognising that a reference is just another variable whose data is a pointer or memory address.
Additionally, while variables of data on the stack typically don’t run into ownership issues because whenever ownership would be moved the data is implicitly copied, references to such variables can exist and they are subject to the same rules and monitoring by the compiler.
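
The difference between a mutable reference and a mutable variable that holds a reference can be sketched as below. The function names are made up for this example; each one returns the values involved so the effect is visible.

```rust
// Mutable VARIABLE holding a shared reference: the pointer can be
// re-aimed, but the data it points to cannot be changed through it.
fn rebind_shared_reference() -> i32 {
    let a = 1;
    let b = 2;
    let mut r = &a;
    let first = *r;
    r = &b; // now points at `b`; `a` and `b` themselves are untouched
    first + *r
}

// Immutable variable holding a unique (mutable) REFERENCE: the data can
// be changed through it, but `r` itself cannot be re-aimed.
fn mutate_through_unique_reference() -> i32 {
    let mut a = 1;
    let r = &mut a;
    *r += 10;
    a
}

// Both combined: a mutable variable holding a unique reference.
fn rebind_and_mutate() -> (i32, i32) {
    let mut a = 1;
    let mut b = 2;
    let mut r = &mut a;
    *r += 10;
    r = &mut b;
    *r += 20;
    (a, b)
}
```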

// >>> Can have multiple "shared references"

let e_ref1 = &e;
let e_ref2 = &e;

let e1 = e_ref1[0];
let e2 = e_ref2[0];

// >>> CANNOT have shared and mutable/unique references

let mut j = vec![1, 2, 3];

// A single mutable or "unique" reference
let j_mut_ref = &mut j;
// can mutate the actual vector
j_mut_ref[0] = 11;

// ERROR
let j_ref = &j;
// CANNOT now have another shared/read-only reference while also having a mutable one (j_mut_ref)
// mutation actually needs to occur after the shared reference is created
// in order for rust to care, otherwise it can recognise that the mutable
// reference is no longer used and so doesn't matter any more
j_mut_ref[1] = 22;

// same as above but for stack data
let mut j_int = 11;
let j_int_mut_ref = &mut j_int;
// ERROR
let j_int_ref = &j_int;
// CANNOT assign another reference as mutable reference already exists

*j_int_mut_ref = 22;
// dereference to mutate here and force rust to think the mutable reference is still "alive"
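
For contrast, here is a sketch of two arrangements that should compile, reusing the variable names from the snippets above (the function names are made up). In both, the mutable reference is finished with before the shared one is created.

```rust
// The mutable reference is last used before the shared reference exists,
// so the two never overlap.
fn mutate_then_share() -> (i32, i32) {
    let mut j = vec![1, 2, 3];

    let j_mut_ref = &mut j;
    j_mut_ref[0] = 11;
    j_mut_ref[1] = 22; // last use of `j_mut_ref`

    let j_ref = &j; // fine: no live mutable reference any more
    (j_ref[0], j_ref[1])
}

// Same idea for stack data, using an inner scope to make the end of the
// mutable reference's life explicit.
fn mutate_in_inner_scope() -> i32 {
    let mut j_int = 11;
    {
        let j_int_mut_ref = &mut j_int;
        *j_int_mut_ref = 22;
    } // `j_int_mut_ref` is gone here

    let j_int_ref = &j_int;
    *j_int_ref
}
```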

Ownership and read/write permissions are altered when references are created

The state of a variable’s ownership and read-only or mutable permissions is not really static.
Instead, they are altered as variables and references are created, used, left unused and then “die” (ie, depending on their “life times”).
This is because the problem being averted is multiple variables mangling the same data. So what a variable or reference can or cannot do depends on what other related variables exist and what they are able to do.
Generally, these “abilities” can be thought of as “permissions”.
- “Ownership”: the permission a variable has to move its ownership to another variable or “kill” the “owned” data/memory when the variable falls out of scope.
- “Read”: permission to read the data
- “Write”: permission to mutate the data or write to the referenced heap memory
As an example of permissions changing: a variable loses “ownership” of its data when a reference to it is created. This prevents the variable from taking its data into another scope where it could “die” and be deallocated while the reference is still around. Using that reference afterwards would read or write random/arbitrary data in the now deallocated memory.
Similarly, a variable that owns its data/memory/value will lose all permissions if a mutable reference is made to the same data/variable. This is why a mutable reference is also known as a unique reference.
Permissions are returned when the reference(s) that altered permissions are no longer used, or “die” (IE, their lifetime comes to an end).

// >>> References remove ownership permissions

fn take_ownership_heap(_: Vec<i32>) {}

let k = vec![1, 2, 3];

let k_ref = &k;

// ERROR
take_ownership_heap(k);
// Cannot move out of `k` into `take_ownership_heap()` as it is currently borrowed
let k1 = k_ref[0];
// if the shared reference weren't used here, rust would be happy...
// as the reference's lifetime would be considered over

// >>> Mutable references remove read permissions

let mut m = 13;

let m_mut_ref = &mut m;

// ERROR
let n = m * 10;
// CANNOT read or use `m` as it's mutably borrowed
*m_mut_ref += 1;
// again, must use the mutable reference here to "keep it alive"
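
The "permissions are returned" half of the story can be sketched by reordering the two snippets above so that each reference is last used before the owner is needed again. `take_ownership_heap_len` is a hypothetical variant of `take_ownership_heap` that returns the length, purely so there is something to check.

```rust
// Hypothetical variant of `take_ownership_heap` that reports the length.
fn take_ownership_heap_len(v: Vec<i32>) -> usize {
    v.len()
}

fn borrow_then_move() -> (i32, usize) {
    let k = vec![1, 2, 3];

    let k_ref = &k;
    let k1 = k_ref[0]; // last use of `k_ref`: its lifetime ends here

    // `k` has its ownership permission back, so it can be moved.
    let len = take_ownership_heap_len(k);
    (k1, len)
}

fn mutable_borrow_then_read() -> i32 {
    let mut m = 13;

    let m_mut_ref = &mut m;
    *m_mut_ref += 1; // last use of `m_mut_ref`

    // `m` can be read again now that the mutable reference is dead.
    m * 10
}
```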

Lifetimes are coming

fn first_or(strings: &Vec<String>, default: &String) -> &String {
    if strings.len() > 0 {
        &strings[0]
    } else {
        default
    }
}

// Does not compile
error[E0106]: missing lifetime specifier
 --> test.rs:1:57
  |
1 | fn first_or(strings: &Vec<String>, default: &String) -> &String {
  |                      ------------           -------     ^ expected named lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `strings` or `default`

In all of the above, the dynamics of what permissions are available depends on how long a variable is used for, or its “lifetime”.
Lifetimes are something that rust detects by inspecting the code. As stated above, it can be a bit cautious or coarse in this detection.
This can get to the point where you will need to explicitly provide information as to the length of a variable’s lifetime in the code base. This is done with lifetime annotations, which are the 'a markers in the following code: fn longest<'a>(x: &'a str, y: &'a str) -> &'a str.
They won’t be covered here … but they’re coming.
Suffice it to appreciate why this is a problem needing a solution, with the code above as an example:
- the function first_or takes two references but returns only one reference which, depending entirely on runtime logic, borrows from one of the two input references. IE, only at runtime is it known which input the returned reference depends on, and so which input must live at least as long as it does. As Rust cannot be sure how the lifetimes of all three references relate, the programmer has to provide that information. A topic for later.
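
As a preview only, here is one possible way to annotate `first_or` so that it compiles. Giving both inputs and the output the same lifetime `'a` is an assumption made for this sketch (it tells the compiler the result may borrow from either input), and `first_or_demo` is a made-up helper to exercise it.

```rust
// One possible annotation: the returned reference may come from either
// input, so all three references share the lifetime `'a`.
fn first_or<'a>(strings: &'a Vec<String>, default: &'a String) -> &'a String {
    if strings.len() > 0 {
        &strings[0]
    } else {
        default
    }
}

// Hypothetical helper exercising both branches.
fn first_or_demo() -> (String, String) {
    let names = vec![String::from("Ferris")];
    let empty: Vec<String> = Vec::new();
    let fallback = String::from("nobody");

    // Clone the results so owned values can leave this function.
    let from_list = first_or(&names, &fallback).clone();
    let from_default = first_or(&empty, &fallback).clone();
    (from_list, from_default)
}
```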

Jayjader@jlai.lu · 11 days ago

I said this during my stream of 4.2 (I think): reading about the explicit “Flow” permission was a wonderful validation of my own internalized representation of how variables/lifetimes behave with regards to function calls. Things “flow” into functions, and the only things that “flow out” are what is part of the explicit return value. Deriving this base set of assumptions gives you why you can’t just return, from a function, a reference/borrow of data created / memory allocated inside the function call: you need to have the referenced data “flow out” as well.
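
A small sketch of that "flow" idea, with made-up function names. The commented-out function is the case that should be rejected; the other two show data flowing out as an owned value, and a reference flowing out only because the data it points to flowed in.

```rust
// Does NOT compile if uncommented: `s` is created inside the function and
// dies when it returns, so the reference would point at nothing.
// fn make_greeting_ref() -> &String {
//     let s = String::from("hello");
//     &s
// }

// The owned value itself flows out through the return value.
fn make_greeting() -> String {
    String::from("hello")
}

// A reference can flow out when the data it points into flowed in:
// the returned slice borrows from `text`, which the caller owns.
fn longest_word(text: &str) -> &str {
    text.split_whitespace()
        .max_by_key(|w| w.len())
        .unwrap_or("")
}
```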

Persistent gripes

So much time is spent talking about double frees, use-after-frees, and pointers in general yet we never stop to acquire or review what they definitively look like in practice. It feels to me like The Book ends up specifically assuming you have some prior knowledge of low-level/assembly and/or experience implementing a compiler(s), despite it claiming to be agnostic as to your prior programming language in its intro:

Who This Book Is For

This book assumes that you’ve written code in another programming language but doesn’t make any assumptions about which one. We’ve tried to make the material broadly accessible to those from a wide variety of programming backgrounds. We don’t spend a lot of time talking about what programming is or how to think about it. If you’re entirely new to programming, you would be better served by reading a book that specifically provides an introduction to programming.

I understand if the rust internals are too complex to serve as support for introducing lifetimes, but I wish we got equivalent C code or maybe were shown the compiled output for examples illustrating each part of the chapter. For example, if we could be shown how function calls result in stack frames being pushed & popped beyond just the (more abstract) diagrams we already have, or see some malloc() and more importantly free() calls. Or, at least, see some example memory addresses used in the diagrams so that we can figure out for ourselves which pointers are invalid when, instead of having the arrows in the diagrams keep track of the addresses for us.
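
Short of C or compiled output, raw addresses can at least be inspected from within Rust. The sketch below uses made-up function names and relies on `Vec::as_ptr` and the `{:p}` format specifier to show where the heap buffer lives relative to the variable that owns it.

```rust
// Moving a Vec moves the (pointer, length, capacity) handle on the stack;
// the heap buffer it points to stays where it is.
fn heap_pointer_survives_move() -> bool {
    let a = vec![1, 2, 3];
    let before = a.as_ptr();
    let b = a; // move: `a` is dead from here on
    before == b.as_ptr()
}

// Cloning allocates a second buffer, so the two Vecs point to different
// heap addresses while both are alive.
fn clone_gets_its_own_allocation() -> bool {
    let a = vec![1, 2, 3];
    let b = a.clone();
    a.as_ptr() != b.as_ptr()
}

// Formats the address of the Vec handle and of its heap buffer, for
// printing side by side.
fn describe_addresses(v: &Vec<i32>) -> (String, String) {
    let handle_address = format!("{:p}", v);
    let buffer_address = format!("{:p}", v.as_ptr());
    (handle_address, buffer_address)
}
```

Printing the pair returned by `describe_addresses` before and after passing the vector around is a cheap way to check which pointers would be valid at which point.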

tried and true tips

when you’re designing an algorithm to solve a given problem, start with doing as many repeated linear passes over your data collections as you need. Copy/clone/recreate separately anytime you would reach for mutating the original data set. Never [have your code] do more than 1 thing at a time. Only once you have a working implementation of your entire algorithm should you then think about reducing the amount of work your code makes the CPU do to arrive at the same result.

^ this really seems to keep ownership problems to a minimum, and non-existent during exploratory coding / brainstorming phases.
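
A sketch of what that style can look like, using a made-up word-cleaning task. Each function does one thing, takes its input by shared reference, and returns a fresh collection rather than mutating the original.

```rust
// Pass 1: trim and lowercase every word into a new Vec.
fn normalise(words: &[String]) -> Vec<String> {
    words.iter().map(|w| w.trim().to_lowercase()).collect()
}

// Pass 2: copy over only the non-empty words.
fn drop_empty(words: &[String]) -> Vec<String> {
    words.iter().filter(|w| !w.is_empty()).cloned().collect()
}

// Pass 3: sort and deduplicate a private copy.
fn dedup_sorted(words: &[String]) -> Vec<String> {
    let mut out = words.to_vec();
    out.sort();
    out.dedup();
    out
}

// The whole "algorithm" is three separate linear passes; the input is
// never mutated, so no borrow conflicts can arise.
fn clean(words: &[String]) -> Vec<String> {
    let normalised = normalise(words);
    let non_empty = drop_empty(&normalised);
    dedup_sorted(&non_empty)
}
```

This does more copying than strictly necessary, which is the point: the passes can be fused or made in-place later, once the result is known to be right.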

hard learnt lessons

I am not smart enough to expect to be able to write, first try, rust code that does 0 superfluous copies of data. Attempting to do so always results in going in circles fighting the borrow checker for up to an entire day, before I give up and take the approach I mention above [and often enough end up solving the problem in under half an hour].

maegul@lemmy.ml · 11 days ago

Yep. I’m with you on all of that!

The pitching of The Book is definitely off (this is my attempt to write a basic intro to the borrow checker, just to see where my own brain was at but also out of a somewhat fanciful interest in what a better version could look like).

I wonder if the lack of C or assembly equivalents is because the internals aren’t stable??

And yea, optimising data copies on the first go seems to be a trap (for me too!)

Do you know if there are any good tools for analysing the hot spots of data copying?

maegul@lemmy.ml · 14 days ago

2. Any persistent gripes, difficulties or confusions?

I’m not entirely sure why, but the whole Double-Free issue never quite sank in from chapter 4. It’s first covered, I think here, section 4.3: Fixing an Unsafe Program: Copying vs. Moving Out of a Collection

I think it was because the description of the issue kinda conflated ownership and the presence or absence of the Copy trait, which isn’t covered until way after chapter 4. Additionally, it seems that the issue mechanically comes down to whether the value of a variable is actually a pointer to a heap allocation or not (??)

It was also a behaviour/issue that tripped me up in a later quiz, in an ownership recap quiz in chapter 6 where I didn’t pick it up correctly.

Here’s the first quiz question that touches on it (see Q2 in The Book here, by scrolling down).

Which of the following best describes the undefined behavior that could occur if this program were allowed to execute?

let s = String::from("Hello world");
let s_ref = &s;
let s2 = *s_ref;
println!("{s2}");

For those not clear, the issue, if this code were permitted to execute, is that s2 would be a pointer to the same String data that s points to. Which means that when deallocations occur as the scope ends, both s and s2 would be dropped, each freeing the same memory allocation on the heap. The second such deallocation (a double free) would then be of undefined content.

I find this simple enough, but I feel like the issue can catch me whenever the code or syntax obscures that a pointer would be copied, not some other value, like in the re-cap quiz in chapter 6 that I got wrong and linked above.
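
For comparison, here are two variants of the quiz snippet that should compile (function names made up). The first clones through the reference so that each variable owns its own heap allocation; the second shows why the same `*` dereference is fine for an integer, which is simply copied.

```rust
// Heap data: clone through the reference so `s` and `s2` each own a
// separate allocation, and each is freed exactly once.
fn clone_through_reference() -> (String, String) {
    let s = String::from("Hello world");
    let s_ref = &s;
    let s2 = s_ref.clone();
    (s, s2)
}

// Copy data: dereferencing copies the integer, so no pointer is
// duplicated and there is nothing to free twice.
fn copy_through_reference() -> (i32, i32) {
    let n = 42;
    let n_ref = &n;
    let n2 = *n_ref;
    (n, n2)
}
```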

maegul@lemmy.ml · edit-2 13 days ago

4. Any hard learnt lessons? Or tried and true tips?

A basic lesson or tip from a discussion in this community (link here):

PS: Abso-fucking-lutely just clone and don’t feel bad about it. Cloning is fine if you’re not doing it in a hot loop or something. It’s not a big deal. The only thing you need to consider is whether cloning is correct - i.e. is it okay for the original and the clone to diverge in the future and not be equal any more? Is it okay for there to be two of this value? If yes, then it’s fine.

IE, using copy/clone as an escape hatch for ownership issues is perfectly fine.
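
The "is it okay for them to diverge" question from the quote can be seen in a few lines. `clone_and_diverge` is a made-up name for this sketch.

```rust
// After cloning, the two values are independent: changing the clone
// leaves the original untouched.
fn clone_and_diverge() -> (Vec<i32>, Vec<i32>) {
    let original = vec![1, 2, 3];
    let mut copy = original.clone();
    copy.push(4); // only the clone changes
    (original, copy)
}
```

If the rest of the program needs both values to stay in sync, cloning is the wrong tool; otherwise it is a perfectly good way out.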

Another one that helps put ownership into perspective I think is this section in the Rustonomicon on unsafe rust, and the section that follows:

There are two kinds of reference:

Shared reference: &
Mutable reference: &mut

Which obey the following rules:

A reference cannot outlive its referent
A mutable reference cannot be aliased

That’s it. That’s the whole model references follow.

Of course, we should probably define what aliased means.

error[E0425]: cannot find value `aliased` in this scope
 --> <rust.rs>:2:20
  |
2 |     println!("{}", aliased);
  |                    ^^^^^^^ not found in this scope

error: aborting due to previous error

Unfortunately, Rust hasn’t actually defined its aliasing model. 🙀

While we wait for the Rust devs to specify the semantics of their language, let’s use the next section to discuss what aliasing is in general, and why it matters.

Basically it highlights that rust’s inferential understanding of the lifetimes of variables is a bit coarse (and maybe a work in progress?) … so when the compiler raises an error about ownership, it’s being cautious (as The Book stresses, unsafe code may not have any undefined behaviour).

It helps, I think, to reframe the whole thing as not being exclusively about correctness but just making sure memory bugs don’t happen.

Last lesson I think I’ve gained after chapter 4 was that the implementation and details of any particular method or object matter. The quiz in chapter 6 (question 5) I’ve mentioned is I think a good example of this. What exactly the Copy and Clone traits are all about too … where I found looking into those made me comfortable with the general problem space I was navigating in working with ownership in rust. Obviously the compiler is the safeguard, but you don’t always want to get beaten over the head with ownership problems.
