The Ups and Downs of Rust Adoption - Draft
By Bevan Mardiros
Rust is easily my favourite programming language, but it has some difficulties that may slow its adoption in industry. For the purposes of this article, I'll assume that the reader understands the basics of Rust, so I need only outline the peculiarities of its differences from other languages, not explain what the differences themselves are.
Learning curve
Rust has a well-earned reputation for complexity, though I've found that in practice the features that reduce complexity more than make up for those that seem to increase it. In my opinion the reputation comes from the fact that Rust's complexity is mostly up front: new Rustaceans have a steep learning curve ahead of them, with a number of unfamiliar concepts to learn before they reach the complexity-reducing benefits that come later. The features that create up-front complexity are the ownership rules, traits, lifetimes, and futures/async. I don't include unsafe Rust in this list because new Rustaceans should generally avoid it entirely. If they don't, the complexity added is generally the same complexity that comes with using C: they open themselves to catastrophic memory safety errors in exchange for simplifying some other aspect of their program.
In comparison, C is a very simple language, in that its syntax is straightforward and it has only a small number of keywords. Coming from higher-level languages, the idea of memory pointers and memory allocation can be complex, since the standard approach in those languages is to eliminate the memory-management interface entirely through costly garbage collection. The malloc/free interface itself, however, is very simple: allocate X amount of memory, get pointers to it, and use those pointers in your program logic. The complexity comes from the lack of safeguards in this approach. Nothing prevents undefined behaviour such as dereferencing null pointers, applying arithmetic to an array pointer to read or overwrite data beyond the actual length of the array, or reading and writing data after it has been freed and possibly reallocated. In this way the simple interface, easily abused, creates the potential for enormous complexity in the form of bugs that are hard to diagnose.
Rust, on the other hand, avoids the possibility of these errors by introducing ownership rules and reference lifetimes. These rules are extremely strict about which data can be used where, and how it can be stored and passed around. This requires Rust developers to create far more rigorous data structures and control flow than they are used to. However, if they manage to get their program to compile (and use some best practices, which I'll get into), then it's very likely that the program is correct, or nearly so. So C is simple to write and compile, then complex to debug, a process which may take years for a complex program. Rust is complex to design (and design occupies a larger part of development), write, and compile, but far simpler to run and debug.
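To make the ownership rules concrete, here is a minimal sketch (all function names are hypothetical, chosen only for illustration). It shows a value being borrowed, then moved, with the line that the compiler would reject left in as a comment.

```rust
/// Takes ownership of the vector; the caller can no longer use it afterwards.
fn consume(values: Vec<i32>) -> i32 {
    values.iter().sum()
    // `values` is dropped (freed) here, exactly once.
}

/// Borrows the data; the caller keeps ownership.
fn inspect(values: &[i32]) -> usize {
    values.len()
}

/// Hypothetical demo tying the two together.
fn ownership_demo() -> (usize, i32) {
    let data = vec![1, 2, 3];
    let len = inspect(&data); // shared borrow ends when `inspect` returns
    let total = consume(data); // ownership moves into `consume`
    // let again = inspect(&data); // compile error: borrow of moved value `data`
    (len, total)
}
```

The C equivalent of the commented-out line would be a use-after-free that compiles without complaint; here it never reaches a running program.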
It is useful to distinguish between those aspects of Rust which contribute to its learning curve but pay off later by reducing complexity, and those that don't. The borrow checking mentioned above pays off as described. The leaky abstractions common in proc-macro based libraries pay off less. For example, diesel provides compile-time query checks, migrations, and code reuse between queries, but the implementation depends on applying proc-macros to data structures to connect the model structs to the database tables. As a result, when diesel is used incorrectly, the compiler produces unclear error messages that seem specific to diesel's internal code. The standard "compiler-driven development" approach doesn't work here; instead you must study the documentation to determine what you're doing wrong.
Tips for ownership rules
- try using an explicit drop on &mut values to avoid double mutable borrow errors.
- avoid returning references to fields from a method of a struct, since holding such a reference will prevent mutable calls. Instead, return a copied value, a cloned (suboptimal) value, or pass a ref of the struct into whatever block requires the field.
- avoid using RefCell to get around ownership rules. It simply applies the same rules at runtime, so the complexity is still there, but now you're seeing it as a program crash. I ran into this in an earlier version of bramble-sync, by using RefCells to contain the program state required by the various async functions. I would get a reference to the inner value, then hold that reference during an async operation. That op would suspend, another would start and request the same inner value, violating the borrowing rules and crashing the program.
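The tips above can be sketched as follows. The `Inventory` type and both function names are hypothetical. The second function uses `try_borrow_mut` to observe the conflict that a plain `borrow_mut` would turn into a panic, and shows an explicit `drop` releasing the borrow.

```rust
use std::cell::RefCell;

struct Inventory {
    count: u32,
}

impl Inventory {
    /// Returns a copy of the field rather than `&u32`, so the caller
    /// holds no borrow of `self` afterwards.
    fn count(&self) -> u32 {
        self.count
    }

    fn add(&mut self, n: u32) {
        self.count += n;
    }
}

fn copy_instead_of_reference() -> u32 {
    let mut inv = Inventory { count: 1 };
    let before = inv.count(); // plain u32, not tied to `inv`
    inv.add(before); // the mutable call is fine
    inv.count()
}

fn refcell_checks_at_runtime() -> bool {
    let state = RefCell::new(vec![1, 2, 3]);
    let first = state.borrow_mut(); // first mutable borrow, still held
    // A second `borrow_mut()` here would panic; `try_borrow_mut` reports
    // the same conflict as an `Err` instead.
    let conflict = state.try_borrow_mut().is_err();
    drop(first); // explicit drop releases the borrow
    conflict && state.try_borrow_mut().is_ok()
}
```

Holding `first` across an `.await` point is the async version of the same mistake: any other task that touches the cell while the first is suspended hits the conflict.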
Complexity-Reducing Benefits
Memory Safety
NullPtr vs Option
int* vs [i32; 8], &[i32], and Vec
- in Rust, out-of-bounds access is either rejected at compile time (arrays indexed with a constant) or panics the thread (slices and vectors, and arrays indexed with a runtime value). Both of these are preferable to C arrays, where it could result in transmutation of random data into a runtime value
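A minimal sketch of both points, using hypothetical helper names: absence is expressed with `Option` instead of a null pointer, and checked access through `get` returns `None` where plain indexing (`values[index]`) would panic.

```rust
/// Returns the first even number, or `None` if there is none.
/// The caller must handle the `None` case before using the value.
fn first_even(values: &[i32]) -> Option<i32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

/// Checked element access: out-of-range indices fall back to 0
/// instead of reading whatever happens to sit past the end.
fn element_or_default(values: &[i32], index: usize) -> i32 {
    match values.get(index) {
        Some(v) => *v,
        None => 0,
    }
}
```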
Unsafe
- confine UB diagnostics to unsafe {} blocks
- miri to detect runtime UB
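As a sketch of what confining unsafe code looks like, the hypothetical function below wraps a single unchecked read in a safe API. If undefined behaviour ever shows up, the unsafe block and the check guarding it are the only places to audit.

```rust
/// Safe wrapper: callers cannot trigger undefined behaviour through this API.
fn byte_at_or_zero(bytes: &[u8], index: usize) -> u8 {
    if index < bytes.len() {
        // SAFETY: `index` was checked against `bytes.len()` just above.
        unsafe { *bytes.get_unchecked(index) }
    } else {
        0
    }
}
```

Running the test suite under Miri (`cargo +nightly miri test`, with the miri component installed) would flag the read if the bounds check were wrong and a test exercised it.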
Type system
Error handling
- addresses the suggestion in A Philosophy of Software Design (APoSD) of reducing error-handling complexity by confining it to a small number of areas, either by masking or collating errors. In Rust, errors are handled by the Result enum (and panics, but those are mostly for application-ending errors), and with pattern matching the dev can ensure that all errors are handled in code, unlike languages which use exceptions.
- the ? operator allows errors to propagate up so that they can be collated at the top.
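A small sketch of this pattern, with hypothetical names (`ConfigError`, `parse_port`, `describe`): the inner function propagates two different failures with `?`, and the single `match` at the top must cover every variant or the program will not compile.

```rust
use std::num::ParseIntError;

#[derive(Debug, PartialEq)]
enum ConfigError {
    MissingField(&'static str),
    BadNumber(ParseIntError),
}

// Lets `?` convert a `ParseIntError` into our error type automatically.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadNumber(e)
    }
}

fn parse_port(input: Option<&str>) -> Result<u16, ConfigError> {
    let raw = input.ok_or(ConfigError::MissingField("port"))?;
    let port = raw.trim().parse::<u16>()?; // converted via `From`
    Ok(port)
}

/// All errors are collated here, in one exhaustive match.
fn describe(input: Option<&str>) -> String {
    match parse_port(input) {
        Ok(port) => format!("listening on {}", port),
        Err(ConfigError::MissingField(name)) => format!("missing field: {}", name),
        Err(ConfigError::BadNumber(e)) => format!("bad number: {}", e),
    }
}
```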
Typestates
- another way to handle null-safety, among other problems.
- defines errors out of existence, by defining types and controlling how one can be destroyed to become another. If the entire application state can be defined in terms of typestates, then invalid states cannot occur.
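A minimal typestate sketch, using a hypothetical connection with two states. Each state is its own type, and each transition consumes the old state, so "send on a closed connection" is not an error to handle but a program that does not compile.

```rust
/// A connection that has not been opened yet. It has no `send` method.
struct Closed {
    address: String,
}

/// An open connection. The only way to obtain one is `Closed::open`.
struct Open {
    address: String,
    sent: usize,
}

impl Closed {
    fn new(address: &str) -> Self {
        Closed { address: address.to_string() }
    }

    /// Consumes the closed connection, so the old state cannot be reused.
    fn open(self) -> Open {
        Open { address: self.address, sent: 0 }
    }
}

impl Open {
    /// Pretends to send a payload and returns the total bytes sent so far.
    fn send(&mut self, payload: &[u8]) -> usize {
        self.sent += payload.len();
        self.sent
    }

    /// Consumes the open connection and returns to the closed state.
    fn close(self) -> Closed {
        Closed { address: self.address }
    }
}

fn typestate_demo() -> usize {
    let closed = Closed::new("peer.example");
    // closed.send(b"hi"); // compile error: no method `send` on `Closed`
    let mut open = closed.open();
    open.send(b"hello");
    let total = open.send(b"!");
    let _closed_again = open.close();
    // open.send(b"late"); // compile error: use of moved value `open`
    total
}
```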
Tooling
- miri
- clippy
- cargo test
- rust-analyzer
An overview of my Rust Projects
bramble-sync
My first and largest Rust project. The bramble-sync protocol allows groups consisting of directed acyclic graphs of messages to be synchronized between devices communicating over some transport protocol. I wrote and rewrote this project, which contains about 6000 lines of code, about 3 times as I learned progressively more about Rust. Some of the changes between rewrites include:
- design of the storage layer, which previously included a loop that read from and wrote to channels to the sync layer, assigning each op an ID, and handled each op in turn
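For readers unfamiliar with this style of design, the following is a rough sketch of a channel-driven loop of that shape. It is not the bramble-sync code: it assumes plain threads and `std::sync::mpsc` rather than async channels, and the `Op` and `Reply` types are invented for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical operation requested by the sync layer.
enum Op {
    Store(String),
    Count,
}

/// Hypothetical reply sent back to the sync layer, tagged with the op's ID.
#[derive(Debug, PartialEq)]
struct Reply {
    op_id: u64,
    stored: usize,
}

/// Runs the storage loop until the request channel closes.
fn storage_loop(requests: mpsc::Receiver<Op>, replies: mpsc::Sender<Reply>) {
    let mut messages: Vec<String> = Vec::new();
    let mut next_id = 0u64;
    for op in requests {
        let op_id = next_id; // assign each op an ID in arrival order
        next_id += 1;
        match op {
            Op::Store(msg) => messages.push(msg),
            Op::Count => {}
        }
        // Stop if the sync layer has gone away.
        if replies.send(Reply { op_id, stored: messages.len() }).is_err() {
            break;
        }
    }
}

fn run_demo() -> Vec<Reply> {
    let (req_tx, req_rx) = mpsc::channel();
    let (rep_tx, rep_rx) = mpsc::channel();
    let worker = thread::spawn(move || storage_loop(req_rx, rep_tx));
    req_tx.send(Op::Store("a".into())).unwrap();
    req_tx.send(Op::Count).unwrap();
    drop(req_tx); // closing the channel ends the loop
    worker.join().unwrap();
    rep_rx.iter().collect()
}
```

Every op passes through one loop in order, which is simple to reason about but serializes all storage access behind a single consumer.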
Storm
This project includes several interesting components:
- a wrapper around the Tor C library, making a local instance of Tor available for the program to use
- this could probably be removed considering that Arti 1.2.0 came out in May 2024
- an implementation of the bramble-rendezvous library's trait, allowing a consumer to (eventually) connect two peers over Tor, given a private and public key
- a Connector class that uses the lower modules to wrap rendezvous in transport, and provide transport to sync, which is then provided to the controller as a stream of messages.
- a controller, the stand-in for a consumer of bramble-sync, and which coordinates the application-specific
Mapped Futures
Overview of Lessons
- tradeoffs of using an async runtime
- not much of a tradeoff: without a runtime you'll need to manually handle future storage and polling, and decide on the distribution of tasks between threads; Tokio's multi-threaded scheduler distributes tasks across its worker threads automatically.
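To give a sense of what "manually" means here, this is a minimal single-threaded executor built only on the standard library. It is a sketch, not a replacement for a runtime: `block_on`, `ThreadWaker`, and `YieldOnce` are names made up for this example, and it handles one future with no timers, I/O, or task distribution.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread when the future signals progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives a single future to completion on the current thread.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future); // we own the future's storage
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(), // sleep until woken
        }
    }
}

/// A hypothetical future that yields once before completing.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

async fn add_after_yield(a: u32, b: u32) -> u32 {
    YieldOnce { yielded: false }.await;
    a + b
}
```

Calling `block_on(add_after_yield(2, 3))` polls the future twice and returns the sum. Everything beyond this, such as running many tasks, balancing them across threads, and integrating with I/O, is the work a runtime takes off your hands.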
Useful readings
- https://github.com/rust-unofficial/patterns
- https://users.rust-lang.org/t/rust-koans/2408/4
- https://thenewwazoo.github.io/whining.html