Warning
|
This is a work-in-progress document which is being made by the author while learning the language from the official documentation and other references on the net. The author hopes it could be useful for somebody, but please make sure to verify its contents with authoritative sources. For error reporting please write to contactos at americati.com. |
Intro
We’ll implement a trivial task: get the average of some student scores. The usual average is the arithmetic one, given by the usual formula:
PA = (a + b + ...)/n
To add a little twist, we’ll also calculate the harmonic average, given by
PH = n/[1/a + 1/b + ...]
which for two elements reduces to 2ab/(a+b).
We’ll use this "problem" in order to implement several possible solutions with Rust.
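As a reference point for the solutions that follow, here is a minimal sketch of both formulas for any number of scores. It is not part of the original examples and uses features the tutorial has not introduced yet (slices, iterators and closures); the function names arithmetic_mean and harmonic_mean are made up for this illustration.

```rust
// Arithmetic mean: (a + b + ...)/n
fn arithmetic_mean(xs: &[f64]) -> f64 {
    let sum: f64 = xs.iter().sum();
    sum / xs.len() as f64
}

// Harmonic mean: n/(1/a + 1/b + ...)
fn harmonic_mean(xs: &[f64]) -> f64 {
    let inv_sum: f64 = xs.iter().map(|x| 1.0 / x).sum();
    xs.len() as f64 / inv_sum
}

fn main() {
    let scores = [16.0, 12.0];
    println!(
        "PA={} PH={}",
        arithmetic_mean(&scores),
        harmonic_mean(&scores)
    );
}
```

Note that an empty slice makes both functions divide zero by zero, which with f64 values yields NaN instead of a crash.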
The highly recommended Rust installation method is via rustup
(see
https://www.rust-lang.org/tools/install.)
After installation, a Cargo binary project must be set up:
diego@dataone:~/devel/RUST$ cargo new --vcs none xtut
In the src subdirectory there is a main.rs program which may be used as a starting point. But for this tutorial we’ll create many small standalone examples, which is accomplished by storing the code files in the src/bin subdirectory, which must be manually created:
diego@dataone:~/devel/RUST$ cargo new --vcs none xtut
diego@dataone:~/devel/RUST$ cd xtut
diego@dataone:~/devel/RUST/xtut$ mkdir src/bin
From now on, we’ll work from the xtut project’s subdirectory. I’m working in a Linux Xubuntu environment with Rust 1.51, using the VSCode IDE (some instructions for the VSCode setup are available at https://www.youtube.com/watch?v=f6tizikEMTk).
Note
|
Originally I used "Corrosion" (Rust plugin for the Eclipse IDE), but found "VSCode + Rust Plugin" to be more responsive, lightweight and less buggy. |
The Solutions
Trivial implementation
We start using 32-bit-signed-integers (i32) to store the student scores:
fn main() {
let x1 = 16;
let x2 = 12;
println!("AVG({},{}) PA={}", x1, x2, pa(x1, x2));
}
fn pa(a: i32, b: i32) -> i32 {
(a + b) / 2
}
We store this listing as src/bin/p0010.rs, then build and run it with:
diego@dataone:~/devel/RUST/xtut$ cargo run --bin p0010
Compiling xtut v0.1.0 (/home/diego/devel/RUST/xtut)
Finished dev [unoptimized + debuginfo] target(s) in 0.50s
Running `target/debug/p0010`
AVG(16,12) PA=14
Obviously, main is the starting function (fn) of the program, and we defined two integer variables (their type is inferred as i32 from the signature of the called function). The println! is a macro (recognized by the trailing bang) which sends output to the standard output (like C’s printf) using {} as placeholders for the arguments.
Note that the returned value is the last expression in the function; in the C language we would have to employ:
return (a + b) / 2;
Which is also valid in Rust, and mandatory for returning from any non-final location in the function.
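A small sketch of both ways of returning a value may help here. The pa_checked() function and its rule of answering -1 for negative scores are invented for this illustration only.

```rust
// Hypothetical variant of pa(): negative scores are treated as invalid.
fn pa_checked(a: i32, b: i32) -> i32 {
    if a < 0 || b < 0 {
        // leaving the function from a non-final location needs `return`
        return -1;
    }
    // final expression without semicolon: this is the returned value
    (a + b) / 2
}

fn main() {
    println!("{} {}", pa_checked(16, 12), pa_checked(-3, 12));
}
```

Adding a semicolon after the final expression would turn it into a statement, and the compiler would then complain that the function does not return an i32.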
Note
|
This trivial program produced (on my computer) an executable of about 3.3 megabytes, but there are several (good) reasons for it. Please see https://github.com/johnthagen/min-sized-rust for some ideas to mitigate this situation. |
Note
|
The println!() placeholders support a similar syntax
as C’s printf() , but it is not the same. For more information please
see https://doc.rust-lang.org/std/fmt/ .
|
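To give an idea of those placeholders, the following sketch shows three common format specifications next to their closest C equivalents. The report() function is just an illustrative name; it uses format!(), which takes the same placeholders as println!() but returns the text instead of printing it.

```rust
// Builds one report line from both averages.
fn report(pa: i32, ph: f64) -> String {
    // {:>5}  right-aligned in 5 columns   (C: %5d)
    // {:05}  zero-padded to 5 digits      (C: %05d)
    // {:.2}  two decimal places           (C: %.2f)
    format!("PA=[{:>5}] [{:05}] PH={:.2}", pa, pa, ph)
}

fn main() {
    // prints: PA=[   14] [00014] PH=13.71
    println!("{}", report(14, 13.714285714285714));
}
```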
Harmonic Average
As shown in the introduction, the harmonic average is the inverse of the arithmetic average of the inverted values; here we blindly apply this concept by reusing the previous arithmetic average function:
fn main() {
let x1 = 16;
let x2 = 12;
println!("AVG({},{}) PA={} PH={}", x1, x2, pa(x1, x2), ph(x1, x2));
}
fn pa(a: i32, b: i32) -> i32 {
(a + b) / 2
}
fn ph(a: i32, b: i32) -> i32 {
1 / pa(1 / a, 1 / b)
}
The execution obviously crashes (panics in Rust jargon) because the integer divisions 1 / a and 1 / b both return zero (as in a C language version), so pa() also returns zero and the final division by that result fails:
diego@dataone:~/devel/RUST/xtut$ cargo run --bin p0020
Compiling xtut v0.1.0 (/home/diego/devel/RUST/xtut)
Finished dev [unoptimized + debuginfo] target(s) in 0.47s
Running `target/debug/p0020`
thread 'main' panicked at 'attempt to divide by zero', src/bin/p0020.rs:12:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
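As a side note, the panic can be turned into an ordinary value. The sketch below keeps the same "blind" formula but performs every division with the standard checked_div() method, which answers None for a zero divisor. The ph_checked() name is hypothetical, and the Option type and the ? operator are not covered in this part of the tutorial.

```rust
fn pa(a: i32, b: i32) -> i32 {
    (a + b) / 2
}

// Same formula as ph(), but a zero divisor produces None instead of a panic.
fn ph_checked(a: i32, b: i32) -> Option<i32> {
    // `?` returns None from ph_checked() right away if the division fails
    let inv_a = 1i32.checked_div(a)?;
    let inv_b = 1i32.checked_div(b)?;
    1i32.checked_div(pa(inv_a, inv_b))
}

fn main() {
    println!("{:?}", ph_checked(16, 12)); // the truncated inverses are zero
    println!("{:?}", ph_checked(1, 1));
}
```

This only reports the problem in a more civilized way; the result for the scores 16 and 12 is still useless because of the integer truncation.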
A "solution" would be to apply the "reduced" formula for two values, 2ab/(a+b); here we’ll also introduce floating point values in order to avoid the "truncating" behavior of the integer division:
fn main() {
let x1 = 16;
let x2 = 12;
println!("AVG({},{}) PA={} PH={}", x1, x2, pa(x1, x2), ph(x1, x2));
}
fn pa(a: i32, b: i32) -> i32 {
(a + b) / 2
}
fn ph(a: i32, b: i32) -> i32 {
let af = a as f64;
let bf = b as f64;
let ans = 2.0 * af * bf / (af + bf);
println!("PH (f64) is {}", ans);
// cast to integer: rounds toward zero
ans as i32
}
The ph() function converts the scores to double precision floating point values (type f64) and performs the division without truncation. We can’t leverage the pa() function since it requires integer parameters.
Finally, the result is converted to i32 using the as keyword. It works like a C language cast, return (int)ans; :
diego@dataone:~/devel/RUST/xtut$ cargo run --bin p0030
Compiling xtut v0.1.0 (/home/diego/devel/RUST/xtut)
Finished dev [unoptimized + debuginfo] target(s) in 0.47s
Running `target/debug/p0030`
PH (f64) is 13.714285714285714
AVG(16,12) PA=14 PH=13
As shown, the final cast truncates the harmonic average from 13.71.. to 13. With scores, it is common practice to round the decimal quantities:
fn main() {
let x1 = 16;
let x2 = 12;
println!("AVG({},{}) PA={} PH={}", x1, x2, pa(x1, x2), ph(x1, x2));
}
fn pa(a: i32, b: i32) -> i32 {
(a + b) / 2
}
fn ph(a: i32, b: i32) -> i32 {
let af = a as f64;
let bf = b as f64;
let ans = 2.0 * af * bf / (af + bf);
println!("PH (f64) is {}", ans);
ans.round() as i32
}
Now we get:
diego@dataone:~/devel/RUST/xtut$ cargo run --bin p0040
Compiling xtut v0.1.0 (/home/diego/devel/RUST/xtut)
Finished dev [unoptimized + debuginfo] target(s) in 0.48s
Running `target/debug/p0040`
PH (f64) is 13.714285714285714
AVG(16,12) PA=14 PH=14
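The difference between the plain cast and round() can be observed in isolation with the following sketch; truncated() and rounded() are illustrative names. Unlike a C cast, the Rust cast is also defined for values that do not fit: in current Rust versions (1.45 and later, to my knowledge) out-of-range values saturate and NaN becomes zero.

```rust
// Plain cast: drops the fractional part (rounds toward zero).
fn truncated(x: f64) -> i32 {
    x as i32
}

// round() picks the nearest integer; halves go away from zero.
fn rounded(x: f64) -> i32 {
    x.round() as i32
}

fn main() {
    for x in [13.71, 13.5, -13.5, -13.71].iter() {
        println!("{} -> cast: {} rounded: {}", x, truncated(*x), rounded(*x));
    }
    // values that do not fit into an i32 saturate instead of wrapping
    println!("{} {}", truncated(1e10), truncated(f64::NAN));
}
```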
-
Avoid the truncation of integer division in the arithmetic average function.
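One possible answer to this exercise, shown only as a sketch: either return the exact value as f64, or keep the i32 result but round it the same way ph() does. The names pa_exact() and pa_rounded() are made up for this illustration.

```rust
// Option 1: return the exact average as a floating point value.
fn pa_exact(a: i32, b: i32) -> f64 {
    (a as f64 + b as f64) / 2.0
}

// Option 2: keep the integer result, but round instead of truncating.
fn pa_rounded(a: i32, b: i32) -> i32 {
    pa_exact(a, b).round() as i32
}

fn main() {
    // the integer version (a + b) / 2 would answer 14 for these scores
    println!("exact={} rounded={}", pa_exact(16, 13), pa_rounded(16, 13));
}
```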
A single call: Tuples
Here we want to extract both averages in a single call. The function parameters are the same (the scores), but how to return two values together?
One way is using a tuple:
fn main() {
let x1 = 16;
let x2 = 12;
let proms = pah(x1, x2);
println!("AVG({},{}) PA={} PH={}", x1, x2, proms.0, proms.1);
}
fn pah(a: i32, b: i32) -> (i32, i32) {
let pa = (a + b) / 2;
let af = a as f64;
let bf = b as f64;
let ans = 2.0 * af * bf / (af + bf);
println!("PH (f64) is {}", ans);
let ph = ans.round() as i32;
(pa, ph)
}
Note that the tuple components are extracted with the
syntax value.0
, value.1
, etc.
Instead of a couple of arguments, we could use a single tuple-argument:
fn main() {
let x1 = 16;
let x2 = 12;
let args = (x1, x2);
let proms = pah(args);
println!("AVG({},{}) PA={} PH={}", x1, x2, proms.0, proms.1);
}
fn pah(ab: (i32, i32)) -> (i32, i32) {
let pa = (ab.0 + ab.1) / 2;
let af = ab.0 as f64;
let bf = ab.1 as f64;
let ans = 2.0 * af * bf / (af + bf);
println!("PH (f64) is {}", ans);
let ph = ans.round() as i32;
(pa, ph)
}
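The numbered accessors can get hard to read. As a hedged alternative, this variant of the same program names the tuple components right in the function signature and in the let statement (the "pattern matching" feature that shows up again later in this tutorial):

```rust
// The tuple parameter is split into `a` and `b` in the signature itself.
fn pah((a, b): (i32, i32)) -> (i32, i32) {
    let pa = (a + b) / 2;
    let af = a as f64;
    let bf = b as f64;
    let ans = 2.0 * af * bf / (af + bf);
    (pa, ans.round() as i32)
}

fn main() {
    let args = (16, 12);
    // the returned tuple is split into two variables
    let (pa, ph) = pah(args);
    println!("AVG({},{}) PA={} PH={}", args.0, args.1, pa, ph);
}
```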
A more complex data type
The scores will be associated to some course phase which will be
named "period". To model this idea, we’ll define a new data type
named SchoolScore
which contains both the period and the
achieved score:
pub struct SchoolScore {
#[allow(dead_code)]
period: u8,
score: i32,
}
fn main() {
let s1 = SchoolScore {
period: 1,
score: 16,
};
let s2 = SchoolScore {
period: 2,
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
// the first average works!
println!("PA={}", pa(s1, s2));
// but with the second one fails
//println!("PH={}", ph(s1, s2));
}
fn pa(a: SchoolScore, b: SchoolScore) -> i32 {
(a.score + b.score) / 2
}
#[allow(dead_code)]
fn ph(a: SchoolScore, b: SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
Structs are similar to their C language counterparts. As shown, their values are built with the same syntax as their declaration, but replacing the component types with the component values.
Also, the instance components are accessed with the object.attribute syntax (like C). Since the period is not used at all, we avoid some ugly warnings at compilation time by adding the optional #[allow(dead_code)] directive.
diego@dataone:~/devel/RUST/xtut$ cargo run --bin p0060
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/p0060`
AVG(16,12)
PA=14
As shown, the first call (to the pa()
function) works as expected; but if we uncomment
the second one (to the ph()
function) we get the following compilation error:
18 |     println!("PA={}", pa(s1, s2));
   |                          -- value moved here
19 |     // but with the second one fails
20 |     println!("PH={}", ph(s1, s2));
   |                          ^^ value used here after move
At this point we run into the first interesting Rust feature: ownership.
Ownership
Let’s explain this from the C language perspective: assume we have an object (maybe a struct) dynamically created:
struct Dummy *ptr = calloc(1, sizeof(struct Dummy));
some_call(ptr);
....
free(ptr);
What if some_call() (or any other function called later) frees the pointer? Clearly the second free() may crash the program.
Some (most?) programming languages avoid this issue altogether by providing an automatic facility for object removal (freeing the memory) called garbage collection. Rust instead provides the concept of "ownership", which designates the program code that is "in charge" of the object removal: it (usually) happens when the containing function or block finishes its execution (the variable goes out of scope). Again: all the objects created during the program execution have an owning code block (usually a function). At first, the owner is the block where the object is created, but the ownership may be transferred later.
The problem with the previous program (with the ph() call) is that the first call (to the pa() function) effectively transfers the ownership of the objects handled by s1 and s2 to the function. The ownership passes to the variables a and b. Those variables are discarded upon function termination (like any local variable), and the handled objects are destroyed.
So, when the main() function resumes, both s1 and s2 are invalid handles since the objects they pointed to are no longer in memory.
Such "ownership transfer" is known as a "move" operation.
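To watch the destruction happen, the following sketch (not part of the original examples) counts how many objects have been destroyed so far. It relies on things this tutorial has not covered yet: the standard Drop trait, whose drop() method Rust calls when an object is destroyed, a lifetime parameter, and std::cell::Cell. The Tracked type and the consume() and demo() functions are hypothetical names.

```rust
use std::cell::Cell;

// Hypothetical type: every destroyed instance increments a shared counter.
struct Tracked<'a> {
    drops: &'a Cell<u32>,
}

impl<'a> Drop for Tracked<'a> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Takes ownership; `_t` is destroyed when this function returns.
fn consume(_t: Tracked) {}

// Returns the counter after the call and after the inner block.
fn demo() -> (u32, u32) {
    let drops = Cell::new(0);
    let t = Tracked { drops: &drops };
    consume(t); // move: from here on `t` cannot be used
    let after_call = drops.get();
    {
        let _inner = Tracked { drops: &drops };
    } // `_inner` goes out of scope here
    let after_block = drops.get();
    (after_call, after_block)
}

fn main() {
    println!("{:?}", demo()); // prints (1, 2)
}
```

The first object is destroyed inside consume(), not at the end of demo(), which is exactly why the caller is not allowed to touch it afterwards.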
Please re-read this section until enlightenment :). Next, we’ll see some ways to call ph().
The Copy trait
But why did the previous examples work? If we look carefully, the function arguments were of type i32, and all the primitive numeric types like i32 implement the so-called Copy "trait".
Note
|
A trait is like a Java interface or a C++ abstract class: the instances have a set of methods available. Sometimes, no method is present at all: the trait is used to specify (mark) some behavior of the type. |
The Copy
trait is used by Rust to know when an object is "simple" enough so:
-
When calling a function, the parameters receive a complete copy of the object, with its own ownership (no "move" operation is performed). So the caller’s variable remains valid
-
Each of the object copies is freed from memory when its handling variable goes out of scope
-
In practice, just a "shallow copy" (bitwise copy) is needed to have a complete object copy of the same data type
Note
|
For tuples, since the two components were i32 , i.e. of a Copy type,
then the full tuple is considered of Copy type:
https://doc.rust-lang.org/std/primitive.tuple.html#trait-implementations-1
|
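A tiny sketch of this behavior with the types used so far (the sum() function is only an illustration):

```rust
fn sum(t: (i32, i32)) -> i32 {
    t.0 + t.1
}

fn main() {
    let a = 16;
    let b = a; // bitwise copy: `a` stays valid
    let pair = (a, b);
    let first = sum(pair); // `pair` is copied into the call...
    let second = sum(pair); // ...so it may be used again
    println!("{} {} {}", a, first, second);
    // A tuple like (String, i32) is not Copy, so the second call
    // would be rejected with the "use of moved value" error.
}
```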
Now, our struct is composed of primitive types, but Rust requires us to explicitly mark it as Copy. A "fast way" to achieve it is shown in the following listing:
// first solution: Copy trait
#[derive(Copy, Clone)]
pub struct SchoolScore {
#[allow(dead_code)]
period: u8,
score: i32,
}
fn main() {
let s1 = SchoolScore {
period: 1,
score: 16,
};
let s2 = SchoolScore {
period: 2,
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", pa(s1, s2));
println!("PH={}", ph(s1, s2));
}
fn pa(a: SchoolScore, b: SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: SchoolScore, b: SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
The standard trait implementation syntax will be deferred
to further examples. Here we use an "abbreviated form" provided by Rust for some
common cases in the #[derive(…)]
directive: it instructs the compiler to
silently generate a "default" trait implementation which is enough for the
simplest cases.
The Clone
trait is a "supertrait" of Copy
, so types which implement Copy
must also implement Clone
; fortunately, Rust does provide default implementations
for both traits, so we are able to implement them with #[derive] directives:
#[derive(Copy, Clone)]
pub struct SchoolScore {...}
Note
|
For more information, please see: https://doc.rust-lang.org/core/marker/trait.Copy.html#how-can-i-implement-copy and https://doc.rust-lang.org/stable/rust-by-example/trait/derive.html. |
Recapitulating: we have a complex (struct
) data type which is composed
of primitive (Copy) types, so we are signaling the compiler that such
complex type is also a Copy type; then we use it as in the previous examples: the
object is fully copied when passed between functions.
The following solution will work even if the struct components are not Copy types.
References
For any type T
, the type described as &T
is known as a "shared reference to T". It is
an alias for the object which does not entail the move operation, so we
may solve our problem with the following listing:
// second solution: Immutable (shared) references
pub struct SchoolScore {
#[allow(dead_code)]
period: u8,
score: i32,
}
fn main() {
let s1 = SchoolScore {
period: 1,
score: 16,
};
let s2 = SchoolScore {
period: 2,
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", pa(&s1, &s2));
println!("PH={}", ph(&s1, &s2));
}
fn pa(a: &SchoolScore, b: &SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: &SchoolScore, b: &SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
Since pa() and ph() just get an alias but do not get ownership, the objects are not destroyed until main() (the only owner) terminates.
We should understand that this solution and the previous one are not opposed but complementary. Some ideas:
-
The Copy (and Clone) trait is usually implemented for types for which we’re sure no non-Copy component will be added in the future
-
Passing a reference to a function entails passing a memory location (pointer), which implies filling the stack with 4 or 8 bytes
-
Passing a Copy-type object fills the stack with a number of bytes equal to the object size (with some additional padding), so could be expensive for bigger types
The passing of a reference as parameter is known as a "borrow", as opposed to a transfer of the ownership.
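The sizes involved can be checked with the standard std::mem::size_of function. In this sketch the Transcript type is invented just to have something bigger to compare against; the exact size printed for SchoolScore depends on the padding chosen by the compiler.

```rust
use std::mem::size_of;

pub struct SchoolScore {
    #[allow(dead_code)]
    period: u8,
    #[allow(dead_code)]
    score: i32,
}

// Hypothetical bigger type, only to make the difference visible.
pub struct Transcript {
    #[allow(dead_code)]
    scores: [i32; 256],
}

fn main() {
    println!("SchoolScore:  {} bytes", size_of::<SchoolScore>());
    println!("&SchoolScore: {} bytes", size_of::<&SchoolScore>());
    println!("Transcript:   {} bytes", size_of::<Transcript>());
    println!("&Transcript:  {} bytes", size_of::<&Transcript>());
}
```

Both references have the size of a pointer regardless of the size of the object they point to.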
Move again
The following "solution" is shown only for illustrative purposes: the "move" operation happens when calling pa() with both objects, but another "move" happens when that function returns a tuple which contains the same objects:
// third solution: moving in and out
pub struct SchoolScore {
#[allow(dead_code)]
period: u8,
score: i32,
}
fn main() {
let s1 = SchoolScore {
period: 1,
score: 16,
};
let s2 = SchoolScore {
period: 2,
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
let (s1, s2, xpa) = pa(s1, s2); // reusing the s1/s2 identifiers (not mandatory)
println!("PA={}", xpa);
println!("PH={}", ph(s1, s2));
}
// weird implementation: get ownership and return the ownership
fn pa(a: SchoolScore, b: SchoolScore) -> (SchoolScore, SchoolScore, i32) {
let ans = (a.score + b.score) / 2;
(a, b, ans)
}
fn ph(a: SchoolScore, b: SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
The returned tuple components are individually assigned to variables using the following syntax, which leverages Rust’s "pattern matching" feature:
let (s1, s2, xpa) = pa(s1, s2);
We reused the names s1
and s2
, effectively shadowing the
previous variables: it is a feature in Rust that the
variable identifiers can be reused as needed.
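Shadowing even allows the new variable to have a different type. A minimal sketch, assuming a hypothetical parse_score() helper that reads a score from text (the &str type used here is explained later in this tutorial):

```rust
fn parse_score(text: &str) -> i32 {
    // first `score`: the text without surrounding whitespace
    let score = text.trim();
    // second `score`: shadows the first one and has another type;
    // unparsable text falls back to 0 (an arbitrary choice for this sketch)
    let score: i32 = score.parse().unwrap_or(0);
    score
}

fn main() {
    println!("{} {}", parse_score(" 16 "), parse_score("n/a"));
}
```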
The ownership of the objects returns to main()
, allowing the
final call to ph()
with a final "move".
Again, this is only for conceptual demonstration purposes.
Text Strings
Now we introduce the String
data type, which is a
container for lists of characters, also known as
text strings. We’ll change the "period" component’s data type
of the SchoolScore
structure to be a String:
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
fn main() {
let s1 = SchoolScore {
period: String::from("January"),
score: 16,
};
let s2 = SchoolScore {
period: String::from("February"),
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", pa(s1, s2));
// println!("PH={}", ph(s1, s2));
}
fn pa(a: SchoolScore, b: SchoolScore) -> i32 {
(a.score + b.score) / 2
}
#[allow(dead_code)]
fn ph(a: SchoolScore, b: SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
The structure is initialized with a String built with the String::from() function, which is "associated" to the String type, so it is known as an "associated function".
Note that a String object allows the mutation of the contained text; in this respect it is similar to Java’s StringBuilder.
As in previous examples, if we uncomment the call to ph()
we get
the familiar "moved" error:
Compiling xtut v0.1.0 (/home/diego/devel/RUST/xtut)
error[E0382]: use of moved value: `s1`
  --> src/bin/p0100.rs:18:26
   |
8  |     let s1 = SchoolScore {
   |         -- move occurs because `s1` has type `SchoolScore`, which does not implement the `Copy` trait
...
17 |     println!("PA={}", pa(s1, s2));
   |                          -- value moved here
18 |     println!("PH={}", ph(s1, s2));
   |                          ^^ value used here after move
As before, we’ll analyze some possible solutions.
The Clone trait
A first approach would be to implement the Copy
trait as in a previous
example. This fails because the period: String
component does not implement
the Copy
trait.
An interesting question is why String does not implement such trait. The reason is that Strings are essentially pointers to some memory region where the associated text resides; and as the Copy trait signals Rust to make copies of such pointer (a shallow copy), the result would be a new variable pointing to the same memory region. But a new String does need a new associated memory region since it must be able to store different text contents.
So the String copy process involves more than the "shallow copy" or "bitwise copy" provided by the Copy trait. This scenario is known as a "deep copy", and the Clone trait is the associated one.
Unlike the "shallow copy" implied in a function call or variable assignment, the cloning operation needs to be explicitly requested by the invocation of the trait’s clone() method.
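A minimal sketch of the deep copy at work; the demo() function is an illustrative name. The clone gets its own memory region, so changing it leaves the original text untouched:

```rust
// Returns the original text, the modified clone, and whether
// both strings had separate text buffers right after cloning.
fn demo() -> (String, String, bool) {
    let original = String::from("January");
    let mut copy = original.clone(); // explicit deep copy
    let separate_buffers = original.as_ptr() != copy.as_ptr();
    copy.push_str(" and February");
    (original, copy, separate_buffers)
}

fn main() {
    let (original, copy, separate_buffers) = demo();
    println!("{} / {} / {}", original, copy, separate_buffers);
}
```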
Fortunately, String already implements the Clone trait, so we may use the #[derive(Clone)] directive for the structure, which in turn clones the components:
// first solution with String: Clone trait
#[derive(Clone)]
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
fn main() {
let s1 = SchoolScore {
period: String::from("January"),
score: 16,
};
let s2 = SchoolScore {
period: String::from("February"),
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", pa(s1.clone(), s2.clone()));
println!("PH={}", ph(s1.clone(), s2.clone()));
}
fn pa(a: SchoolScore, b: SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: SchoolScore, b: SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
Explicit implementation of the Clone trait
The following alternative version replaces the #[derive(Clone)] directive with an explicit implementation of the trait. The result is the same, but we introduce it to show the syntax for trait implementations:
// first solution with String: Clone trait (explicit implementation)
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
impl Clone for SchoolScore {
fn clone(&self) -> Self {
SchoolScore {
period: self.period.clone(),
score: self.score,
}
}
}
fn main() {
let s1 = SchoolScore {
period: String::from("January"),
score: 16,
};
let s2 = SchoolScore {
period: String::from("February"),
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", pa(s1.clone(), s2.clone()));
println!("PH={}", ph(s1.clone(), s2.clone()));
}
fn pa(a: SchoolScore, b: SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: SchoolScore, b: SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
The impl Clone for SchoolScore {…}
is fairly obvious. As the Clone
trait has only the clone()
method, let’s explain its signature:
fn clone(&self) -> Self {...}
The clone()
method does receive a reference to the called
object with the self
keyword (lower case) and its output
is an object (not a reference) of the same type of the
called object: such type is described by the Self
keyword
(first letter in upper case.)
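The same self and Self keywords can be used in methods which do not belong to any trait. As a sketch only, here is a hypothetical with_score() method declared in a plain impl block (something this tutorial has not introduced at this point):

```rust
pub struct SchoolScore {
    period: String,
    score: i32,
}

impl SchoolScore {
    // `&self` borrows the called object; `Self` stands for SchoolScore.
    fn with_score(&self, score: i32) -> Self {
        Self {
            period: self.period.clone(),
            score,
        }
    }
}

fn main() {
    let s1 = SchoolScore {
        period: String::from("January"),
        score: 16,
    };
    let s2 = s1.with_score(12);
    // s1 is still valid: the method only borrowed it
    println!("{}={} {}={}", s1.period, s1.score, s2.period, s2.score);
}
```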
Alternative solutions
The introduction of references is straightforward with the
addition of the &
operator:
// using references
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
fn main() {
let s1 = SchoolScore {
period: String::from("January"),
score: 16,
};
let s2 = SchoolScore {
period: String::from("February"),
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", pa(&s1, &s2));
println!("PH={}", ph(&s1, &s2));
}
fn pa(a: &SchoolScore, b: &SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: &SchoolScore, b: &SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
And as in a previous example, we may adapt the "move in, move out" pattern:
// moving in and out
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
fn main() {
let s1 = SchoolScore {
period: String::from("January"),
score: 16,
};
let s2 = SchoolScore {
period: String::from("February"),
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
let (s1, s2, xpa) = pa(s1, s2); // reusing the s1/s2 identifiers (not mandatory)
println!("PA={}", xpa);
println!("PH={}", ph(s1, s2));
}
// weird implementation: get ownership and return the ownership
fn pa(a: SchoolScore, b: SchoolScore) -> (SchoolScore, SchoolScore, i32) {
let ans = (a.score + b.score) / 2;
(a, b, ans)
}
fn ph(a: SchoolScore, b: SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
Text Constants
The following dummy program shows a dummy handling of a
text constant extracted from a String
; we explicitly
added the variable’s data types for better understanding:
fn main() {
let start: String = String::from("some text");
let k: &str;
k = start.as_str();
xtest(k);
}
// no return value
fn xtest(s: &str) {
println!("text is {}", s);
}
The expected output:
diego@dataone:~/devel/RUST/xtut$ cargo run --bin p0100
Compiling xtut v0.1.0 (/home/diego/devel/RUST/xtut)
Finished dev [unoptimized + debuginfo] target(s) in 0.49s
Running `target/debug/p0100`
text is some text
The str
(String slice) is a Rust built-in data type used to reference a
range of characters (like a text constant.) In database terminology, it
may be considered a "view" to some (section of) a text. In general,
the str’s text length is only known at runtime, so Rust forces us
to always employ a reference (a pointer) to it. For example, changing the
type declaration in the previous program:
let k: str;
Generates the message:
error[E0277]: the size for values of type `str` cannot be known at compilation time
A simpler version of the previous example involves discarding the String altogether:
let k = "some text";
That is, literal text constants in the program must also be handled by the &str
type.
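The "view" idea can be sketched as follows: a &str may cover a whole String, a part of it, or a literal. The first_word() function is a made-up example; note that the positions in a slice range are byte offsets, which match the character positions only for plain ASCII text like this one.

```rust
// Returns a view of the text up to the first space (or all of it).
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        Some(pos) => &s[..pos],
        None => s,
    }
}

fn main() {
    let owned = String::from("some text");
    // a reference to a String is accepted where a &str is expected
    println!("{}", first_word(&owned));
    // a literal is already a &str
    println!("{}", first_word("literal text"));
    // a view of bytes 5 to 8 of the String: "text"
    let view: &str = &owned[5..9];
    println!("{}", view);
}
```

No text is copied in any of these calls: every &str just points into memory owned by someone else.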
A previous example involved the SchoolScore containing a full period: String component which is totally copied when the structure is cloned. But if our domain model allows us to assume that such text never changes after the structure object is created (i.e. it is immutable), then we could employ a &str reference:
pub struct SchoolScore {
period: &str,
score: i32,
}
Since period
is now simply a pointer to an immutable memory region, we could
create copies of the structure object by the "shallow copy" mechanism: it’s okay
that more than one object’s period
points to the same region, since it is
immutable.
Copy Trait for a pointer to immutable text
So we add the component of type &str
, but the compiler rejects it with:
error[E0106]: missing lifetime specifier
 --> src/bin/p0110.rs:7:13
  |
7 |     period: &str,
  |             ^ expected named lifetime parameter
Lifetimes
To illustrate the problem we’ll analyze a hypothetical version of the program:
pub struct SchoolScore {
period: &str,
score: i32,
}
fn main() {
let s1;
// a new block scope
{
let k = String::from("January");
s1 = SchoolScore {
period: k.as_str(),
score: 16,
};
}
println!("S1: {} ", s1.score);
}
For illustrative purposes, we created a new "block scope" in which the object owned
by k
is created and destroyed at termination.
The s1 structure object contains a reference to the memory region pointed to by the k object (captured by the period: k.as_str() line) inside the internal block. But as soon as the execution exits from it, the period component becomes invalid as the mentioned memory region is freed (and maybe overwritten).
In Rust terms, we say that k does not live long enough for its usage in s1. In simple cases (like a single function body), the compiler has all the information needed to discover this problem, but when function calls are involved the compiler may ask for more information about the objects’ lifetimes.
So, from https://rust-unofficial.github.io/too-many-lists/second-iter.html: "Quite simply, a lifetime is the name of a region (~block/scope) of code somewhere in a program. That’s it. When a reference is tagged with a lifetime, we’re saying that it has to be valid for that entire region."
Returning to the original example, following the compiler suggestion we add a "lifetime parameter" and the compiler is satisfied:
// first solution with &str: Copy trait
#[derive(Copy, Clone)]
pub struct SchoolScore<'a> {
#[allow(dead_code)]
period: &'a str,
score: i32,
}
fn main() {
let s1 = SchoolScore {
period: "January",
score: 16,
};
let s2 = SchoolScore {
period: "February",
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", pa(s1, s2));
println!("PH={}", ph(s1, s2));
}
fn pa(a: SchoolScore, b: SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: SchoolScore, b: SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
The SchoolScore objects have a lifetime named 'a (indicated by the <'a> parameter of the struct declaration), and the period references have the same lifetime (indicated by the &'a str type declaration). This means that the object pointed to by period must live at least as long as the struct container object.
Note that this is only needed for references. For example, the previous String
object was
"owned" by the struct
, so its lifetime is automatically attached to its container.
Note
|
The 'a , 'b , etc. is the Rust convention for lifetime specifications.
|
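With the lifetime parameter in place, the hypothetical block scope program from the beginning of this section can be repaired, as a sketch, by declaring the String in the outer scope so that it lives longer than the structure which points to it. The describe() function is added only for illustration.

```rust
pub struct SchoolScore<'a> {
    period: &'a str,
    score: i32,
}

// Hypothetical helper to show both components.
fn describe(s: &SchoolScore) -> String {
    format!("{} {}", s.period, s.score)
}

fn main() {
    // `k` now belongs to the outer scope, so it outlives `s1`
    let k = String::from("January");
    let s1;
    {
        s1 = SchoolScore {
            period: k.as_str(),
            score: 16,
        };
    }
    println!("S1: {}", describe(&s1));
}
```

Moving the let k line back inside the inner block makes the compiler reject the program, since the reference would outlive the text it points to.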
Again passing references
Nothing to comment in the following example:
// second solution with &str: Immutable (shared) references
pub struct SchoolScore<'a> {
#[allow(dead_code)]
period: &'a str,
score: i32,
}
fn main() {
let s1 = SchoolScore {
period: "January",
score: 16,
};
let s2 = SchoolScore {
period: "February",
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", pa(&s1, &s2));
println!("PH={}", ph(&s1, &s2));
}
fn pa(a: &SchoolScore, b: &SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: &SchoolScore, b: &SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
What if instead of returning a primitive i32 average score, we want to return a full SchoolScore containing the average? For example, a function like:
fn pa(a: &SchoolScore, b: &SchoolScore) -> SchoolScore {...}
The compiler rejects with:
   |
24 | fn pa(a: &SchoolScore, b: &SchoolScore) -> SchoolScore {
   |          ------------     ------------     ^^^^^^^^^^^ expected named lifetime parameter
   = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from one of `a`'s 2 lifetimes or one of `b`'s 2 lifetimes
help: consider introducing a named lifetime parameter
This is because the returned value contains a reference whose lifetime is unknown to Rust: it might be borrowed from either of the input parameters, and the compiler will not guess which one, so it complains.
See also https://doc.rust-lang.org/stable/book/ch10-03-lifetime-syntax.html for more information.
So we declare a lifetime parameter in the functions pa()
and
ph()
used to annotate the return value:
// returning an object
pub struct SchoolScore<'a> {
#[allow(dead_code)]
period: &'a str,
score: i32,
}
fn main() {
let s1 = SchoolScore {
period: "January",
score: 16,
};
let s2 = SchoolScore {
period: "February",
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
let xpa = pa(&s1, &s2);
println!("PA={}", xpa.score);
let xph = ph(&s1, &s2);
println!("PH={}", xph.score);
}
fn pa<'a>(a: &SchoolScore, b: &SchoolScore) -> SchoolScore<'a> {
let ans = (a.score + b.score) / 2;
SchoolScore {
period: "PA",
score: ans,
}
}
fn ph<'a>(a: &SchoolScore, b: &SchoolScore) -> SchoolScore<'a> {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
SchoolScore {
period: "PH",
score: ans.round() as i32,
}
}
This way the lifetime of the return value is tied to the lifetime parameter 'a declared by the function, and is not linked to any of the input parameters since they were not annotated with 'a.
Note that the returned references (pointing to the texts "PA" and "PH") have a lifetime corresponding to the total program execution since they are literal constants in the program text (see below about the static lifetime). To understand this better, note that Rust rejects the following version of pa():
fn pa<'a>(a: &SchoolScore, b: &SchoolScore) -> SchoolScore<'a> {
let ans = (a.score + b.score) / 2;
let x = String::from("PA");
SchoolScore {
period: x.as_str(),
score: ans,
}
}
with the following message:
error[E0515]: cannot return value referencing local variable `x`
   |
35 | /     SchoolScore {
36 | |         period: x.as_str(),
   | |                 - `x` is borrowed here
37 | |         score: ans,
38 | |     }
   | |_____^ returns a value referencing data owned by the current function
The returned reference inside SchoolScore
is pointing to the text region
of variable x
, which is destroyed upon pa()
termination.
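For contrast, here is a sketch of a function whose result really is borrowed from its inputs, so the annotation must say so. The best() function and the lifetime names 'r and 'p are invented for this illustration: 'r describes how long the references to the structures are valid, and 'p how long the period texts are.

```rust
pub struct SchoolScore<'a> {
    period: &'a str,
    score: i32,
}

// The returned reference points to one of the two inputs, so it
// shares their lifetime 'r (it is not a newly created object).
fn best<'r, 'p>(a: &'r SchoolScore<'p>, b: &'r SchoolScore<'p>) -> &'r SchoolScore<'p> {
    if a.score >= b.score {
        a
    } else {
        b
    }
}

fn main() {
    let s1 = SchoolScore {
        period: "January",
        score: 16,
    };
    let s2 = SchoolScore {
        period: "February",
        score: 12,
    };
    let top = best(&s1, &s2);
    println!("best: {} ({})", top.period, top.score);
}
```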
Mutability
The following program fails in several ways:
pub struct SchoolScore {
period: String,
score: i32,
}
fn main() {
fail0();
fail1();
fail2();
}
fn fail0() {
let n = 45;
n += 1;
println!("number={}", n);
}
fn fail1() {
let k = String::from("April");
k.push_str(" and May");
println!("another period={}", k);
}
fn fail2() {
let s = SchoolScore {
period: String::from("January"),
score: 31,
};
s.period = String::from("February");
s.period.push_str(" and March");
println!("period={} score={}", s.period, s.score);
}
The compiler errors were:
error[E0384]: cannot assign twice to immutable variable `n`
  --> src/bin/p0300.rs:14:5
   |
13 |     let n = 45;
   |         |
   |         first assignment to `n`
   |         help: make this binding mutable: `mut n`
14 |     n += 1;
   |     ^^^^^^ cannot assign twice to immutable variable

error[E0596]: cannot borrow `k` as mutable, as it is not declared as mutable
  --> src/bin/p0300.rs:20:5
   |
19 |     let k = String::from("April");
   |         - help: consider changing this to be mutable: `mut k`
20 |     k.push_str(" and May");
   |     ^ cannot borrow as mutable

error[E0594]: cannot assign to `s.period`, as `s` is not declared as mutable
  --> src/bin/p0300.rs:29:5
   |
25 |     let s = SchoolScore {
   |         - help: consider changing this to be mutable: `mut s`
...
29 |     s.period = String::from("February");
   |     ^^^^^^^^ cannot assign

error[E0596]: cannot borrow `s.period` as mutable, as `s` is not declared as mutable
  --> src/bin/p0300.rs:30:5
   |
25 |     let s = SchoolScore {
   |         - help: consider changing this to be mutable: `mut s`
...
30 |     s.period.push_str(" and March");
   |     ^^^^^^^^ cannot borrow as mutable
The root problem is that Rust variables are immutable by default. For example, since n is an immutable integer, incrementing its value is not allowed. Note that a new n variable may be declared with let (even with a different type), which shadows the old one, but the existing binding itself cannot be changed.
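The remark about declaring a new n with let can be tried in isolation. The following is a minimal sketch; the shadowed() helper is a name made up for this illustration and is not part of the tutorial's programs:

```rust
// Shadowing: every `let` introduces a brand-new binding named `n`;
// no existing value is ever modified, so `mut` is not needed.
fn shadowed() -> String {
    let n = 45; // immutable i32
    let n = n + 1; // a new immutable i32, computed from the old one
    let n = format!("number={}", n); // a new binding of a different type (String)
    n
}

fn main() {
    println!("{}", shadowed());
}
```

Each let creates a separate variable that merely reuses the name, which is why this compiles while n += 1 on an immutable binding does not.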
The String object k is immutable, so calling its push_str() method fails because it needs a "mutable reference" (more on this later.)
Finally, structure members are as immutable as the enclosing object, which disallows both the reassignment of period and the mutation of that String via push_str(), as noted above.
But all is happiness with mutable variables:
pub struct SchoolScore {
period: String,
score: i32,
}
fn main() {
fail0();
fail1();
fail2();
}
fn fail0() {
let mut n = 45;
n += 1;
println!("number={}", n);
}
fn fail1() {
let mut k = String::from("April");
k.push_str(" and May");
println!("another period={}", k);
}
fn fail2() {
let mut s = SchoolScore {
period: String::from("January"),
score: 31,
};
s.period = String::from("February");
s.period.push_str(" and March");
println!("period={} score={}", s.period, s.score);
}
The expected results:
diego@dataone:~/devel/RUST/xtut$ cargo run --bin p0300
   Compiling xtut v0.1.0 (/home/diego/devel/RUST/xtut)
    Finished dev [unoptimized + debuginfo] target(s) in 0.73s
     Running `target/debug/p0300`
number=46
another period=April and May
period=February and March score=31
Mutable References
Since the students were nice people, the teacher wanted to avoid any score below 14. So, before the average calculation, the scores are to be "promoted" to at least 14.
We’ll employ references in our new promotion()
function, but
a shared (immutable) reference will not allow the structure
modification, so we need a "mutable reference":
fn promotion(a: &mut SchoolScore) -> bool {...}
The function returns whether the score was "promoted" or not; we’ll ignore this return value, but it is used here to illustrate the built-in data type for boolean values.
In order to call such a function with a mutable reference, we need to start from mutable objects (on reflection, immutability would be pointless if mutable references could be created from immutable objects.) The call then uses a syntax like:
promotion(&mut s1);
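Before the complete program, here is a reduced sketch of how mutable and shared borrows interact. It works on a bare i32 instead of the SchoolScore structure, and the promote() and demo() names are simplifications assumed only for this illustration:

```rust
// Raises the score to 14 if needed; reports whether it changed anything.
fn promote(score: &mut i32) -> bool {
    if *score < 14 {
        *score = 14;
        true
    } else {
        false
    }
}

fn demo() -> (bool, i32) {
    let mut score = 12; // must be `mut` to allow `&mut score`
    let changed = promote(&mut score); // the mutable borrow ends when the call returns
    let a = &score; // any number of shared borrows may coexist...
    let b = &score;
    // promote(&mut score); // ...but a mutable borrow here would be rejected (E0502),
    //                      // because `a` and `b` are still used below
    (changed, *a + *b)
}

fn main() {
    let (changed, sum) = demo();
    println!("changed={} sum={}", changed, sum);
}
```

The commented-out call shows the kind of overlap the compiler refuses: a mutable borrow while shared borrows are still alive.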
The complete listing goes now:
pub struct SchoolScore<'a> {
#[allow(dead_code)]
period: &'a str,
score: i32,
}
fn main() {
let mut s1 = SchoolScore {
period: "January",
score: 16,
};
let mut s2 = SchoolScore {
period: "February",
score: 12,
};
println!("AVG({},{}) ", s1.score, s2.score);
promotion(&mut s1);
promotion(&mut s2);
println!("PA={}", pa(&s1, &s2));
println!("PH={}", ph(&s1, &s2));
}
fn promotion(a: &mut SchoolScore) -> bool {
if a.score < 14 {
a.score = 14;
true
} else {
false
}
}
fn pa(a: &SchoolScore, b: &SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: &SchoolScore, b: &SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
The results:
AVG(16,12)
PA=15
PH (f64) is 14.933333333333334
PH=15
More information from https://doc.rust-lang.org/stable/book/ch04-02-references-and-borrowing.html#the-rules-of-references:
"At any given time, you can have either one mutable reference or any number of immutable references [to a single object]. References must always be valid."
Traits
Like Java interfaces or C++ abstract classes, Rust’s traits specify
some behavior of a type (usually a struct
) by declaring a set of
functions (methods) the complying type must implement. In the following
example, we’ll define two types: our already known SchoolScore
and
the new WorkerScore
; both represent entities containing the
notion of an integer score. So it would be useful to specify a
common trait for both types:
pub trait Scored {
fn get_score(&self) -> i32;
}
So objects of SchoolScore
and WorkerScore
will be able to
provide a score by a call with the form object.get_score()
. Note
that the &self
argument represents the instance the method is called on, and is
used by the implementations as shown below.
To make the example a bit more interesting, we’ll define another trait for our types, in order to implement the "promotion" behavior previously discussed. So we need a trait able to mutate the object, which is achieved by requiring a mutable reference:
pub trait Promotable {
fn promote(&mut self) -> bool;
}
The following partial listing shows the types, the trait declarations and
their implementations. Also, a main()
method is added for demonstration
purposes:
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
pub struct WorkerScore {
#[allow(dead_code)]
name: String,
points: i32,
}
pub trait Scored {
fn get_score(&self) -> i32;
}
pub trait Promotable {
fn promote(&mut self) -> bool;
}
impl Scored for SchoolScore {
fn get_score(&self) -> i32 {
self.score
}
}
impl Scored for WorkerScore {
fn get_score(&self) -> i32 {
self.points
}
}
impl Promotable for SchoolScore {
fn promote(&mut self) -> bool {
if self.score < 14 {
self.score = 14;
true
} else {
false
}
}
}
impl Promotable for WorkerScore {
fn promote(&mut self) -> bool {
if self.points < 14 {
self.points = 14;
true
} else {
false
}
}
}
fn main() {
let mut s1 = SchoolScore {
period: String::from("January"),
score: 16,
};
let mut s2 = SchoolScore {
period: String::from("February"),
score: 12,
};
println!("Student averages:");
s1.promote();
s2.promote();
println!("PA={}", pa(&s1, &s2));
println!("PH={}", ph(&s1, &s2));
let mut w1 = WorkerScore {
name: String::from("Mike"),
points: 18,
};
let mut w2 = WorkerScore {
name: String::from("Lance"),
points: 8,
};
println!("Worker averages:");
w1.promote();
w2.promote();
println!("PA={}", pa(&w1, &w2));
println!("PH={}", ph(&w1, &w2));
}
The pa() and ph() implementations are not shown (yet.) But we already see that the promote() method is applied to the struct objects thanks to the Promotable trait.
Trait Bounds
Now we’ll complete the previous example. The following listing provides the implementation of the missing functions:
fn pa<T: Scored>(a: &T, b: &T) -> i32 {
(a.get_score() + b.get_score()) / 2
}
fn ph<T: Scored>(a: &T, b: &T) -> i32 {
let af = a.get_score() as f64;
let bf = b.get_score() as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
Both functions have a generic type parameter T, so they are able to receive two objects of some unknown type. But the syntax <T: Scored> specifies a restriction on that type: it must implement the Scored trait. This means we are able to call the get_score() method on any object of type T.
As in previous examples, the functions receive shared references, since get_score() only requires &self in its trait definition.
This way pa() and ph() are implemented generically. As an important implementation detail, the compiler silently generates a specialized copy of the code for each concrete type the functions are called with (two types here), a process known as monomorphization, so there is no performance penalty at execution time.
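To see how T gets its concrete type at each call, consider this reduced sketch. The structures are stripped down to their scores and the demo() function is a name assumed for the illustration:

```rust
pub trait Scored {
    fn get_score(&self) -> i32;
}

pub struct SchoolScore {
    score: i32,
}

pub struct WorkerScore {
    points: i32,
}

impl Scored for SchoolScore {
    fn get_score(&self) -> i32 {
        self.score
    }
}

impl Scored for WorkerScore {
    fn get_score(&self) -> i32 {
        self.points
    }
}

fn pa<T: Scored>(a: &T, b: &T) -> i32 {
    (a.get_score() + b.get_score()) / 2
}

fn demo() -> (i32, i32) {
    let (s1, s2) = (SchoolScore { score: 16 }, SchoolScore { score: 14 });
    let (w1, w2) = (WorkerScore { points: 18 }, WorkerScore { points: 14 });
    let students = pa::<SchoolScore>(&s1, &s2); // T spelled out explicitly
    let workers = pa(&w1, &w2); // T inferred as WorkerScore
    // pa(&s1, &w1); // rejected: both arguments must have the same type T
    (students, workers)
}

fn main() {
    println!("{:?}", demo());
}
```

Since both parameters share the single type parameter T, mixing a student with a worker in one call does not compile.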
Another equivalent syntax for trait bounds:
fn pa<T>(a: &T, b: &T) -> i32
where
T: Scored,
{
(a.get_score() + b.get_score()) / 2
}
fn ph<T>(a: &T, b: &T) -> i32
where
T: Scored,
{
let af = a.get_score() as f64;
let bf = b.get_score() as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
Note
|
There is a less used syntax for trait bounds using the impl keyword,
where the function parameter itself specifies the trait its argument
must implement: fn function-name(param: impl trait-name) {…} .
|
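A short sketch of the impl syntax follows, again with structures reduced to their scores (an assumption made to keep the example small). One difference worth noticing: each impl parameter gets its own hidden type parameter, so the two arguments may be of different types:

```rust
pub trait Scored {
    fn get_score(&self) -> i32;
}

pub struct SchoolScore {
    score: i32,
}

pub struct WorkerScore {
    points: i32,
}

impl Scored for SchoolScore {
    fn get_score(&self) -> i32 {
        self.score
    }
}

impl Scored for WorkerScore {
    fn get_score(&self) -> i32 {
        self.points
    }
}

// Each `impl Scored` stands for its own anonymous generic parameter.
fn pa(a: &impl Scored, b: &impl Scored) -> i32 {
    (a.get_score() + b.get_score()) / 2
}

fn main() {
    let s = SchoolScore { score: 16 };
    let w = WorkerScore { points: 12 };
    // Mixing types is accepted here, unlike with a single <T: Scored>.
    println!("PA={}", pa(&s, &w));
}
```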
A related implementation, less efficient in terms of code reuse, is presented in the following listing. Here we define a field-less structure ImplAveraging which implements the Averaging trait with the previous functions. But this trait is generic in T, so we implement it for T=SchoolScore and T=WorkerScore:
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
pub struct WorkerScore {
#[allow(dead_code)]
name: String,
points: i32,
}
trait Averaging<T> {
fn pa(a: &T, b: &T) -> i32;
fn ph(a: &T, b: &T) -> i32;
fn promote(a: &mut T);
}
struct ImplAveraging;
impl Averaging<SchoolScore> for ImplAveraging {
fn pa(a: &SchoolScore, b: &SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: &SchoolScore, b: &SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
fn promote(s: &mut SchoolScore) {
if s.score < 14 {
s.score = 14;
}
}
}
impl Averaging<WorkerScore> for ImplAveraging {
fn pa(a: &WorkerScore, b: &WorkerScore) -> i32 {
(a.points + b.points) / 2
}
fn ph(a: &WorkerScore, b: &WorkerScore) -> i32 {
let af = a.points as f64;
let bf = b.points as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
fn promote(w: &mut WorkerScore) {
if w.points < 14 {
w.points = 14;
}
}
}
fn main() {
let mut s1 = SchoolScore {
period: String::from("January"),
score: 16,
};
let mut s2 = SchoolScore {
period: String::from("February"),
score: 12,
};
println!("Student averages:");
ImplAveraging::promote(&mut s1);
ImplAveraging::promote(&mut s2);
println!("PA={}", ImplAveraging::pa(&s1, &s2));
println!("PH={}", ImplAveraging::ph(&s1, &s2));
let mut w1 = WorkerScore {
name: String::from("Mike"),
points: 18,
};
let mut w2 = WorkerScore {
name: String::from("Lance"),
points: 8,
};
println!("Worker averages:");
ImplAveraging::promote(&mut w1);
ImplAveraging::promote(&mut w2);
println!("PA={}", ImplAveraging::pa(&w1, &w2));
println!("PH={}", ImplAveraging::ph(&w1, &w2));
}
Trait Objects
In the previous example, the averaging function implementations for our two test types were resolved at compilation time by selecting the appropriate function implementation.
In some situations it is not convenient or possible to resolve the call at compilation time, but only at runtime; this "late binding" is implemented by an additional indirection to be resolved at invocation time, adding a (small) time penalty. This is the same situation which happens with C++ virtual methods, and is the default behavior in Java.
In Rust, it is possible to borrow objects with references tied to
a trait; these are known as "trait objects" and the &dyn
syntax
is used to distinguish them from plain references:
fn pa(a: &dyn Scored, b: &dyn Scored) -> i32 {
(a.get_score() + b.get_score()) / 2
}
fn ph(a: &dyn Scored, b: &dyn Scored) -> i32 {
let af = a.get_score() as f64;
let bf = b.get_score() as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
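Trait objects become really useful when objects of different types must be handled together. The following sketch stores boxed trait objects in a vector; the average() and demo() helpers and the reduced structures are assumptions made for this illustration:

```rust
pub trait Scored {
    fn get_score(&self) -> i32;
}

pub struct SchoolScore {
    score: i32,
}

pub struct WorkerScore {
    points: i32,
}

impl Scored for SchoolScore {
    fn get_score(&self) -> i32 {
        self.score
    }
}

impl Scored for WorkerScore {
    fn get_score(&self) -> i32 {
        self.points
    }
}

// Each call to get_score() is dispatched at runtime through the trait object.
fn average(items: &[Box<dyn Scored>]) -> i32 {
    if items.is_empty() {
        return 0; // arbitrary choice for the empty case
    }
    let total: i32 = items.iter().map(|item| item.get_score()).sum();
    total / items.len() as i32
}

fn demo() -> i32 {
    // A single vector holding objects of two different concrete types.
    let items: Vec<Box<dyn Scored>> = vec![
        Box::new(SchoolScore { score: 16 }),
        Box::new(WorkerScore { points: 12 }),
        Box::new(WorkerScore { points: 8 }),
    ];
    average(&items)
}

fn main() {
    println!("average={}", demo());
}
```

A generic Vec<T> could not hold both types at once, since T must be a single concrete type.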
Supertraits
It is a common pattern to specialize type requirements by specifying a hierarchy of interfaces or classes. In Rust, traits may depend on other traits (their supertraits.)
In the previous example, the Promotable
trait may be
implemented in terms of an improved Scored
trait, able to
read and update the score:
pub trait Scored {
fn get_score(&self) -> i32;
fn set_score(&mut self, s: i32);
}
The new Promotable
trait relies on the
object being scored:
pub trait Promotable: Scored {
fn promote(&mut self) -> bool;
}
With this in place, both Promotable
implementations may be
exactly the same. To avoid code duplication, we’ll write
a single "default" trait implementation inside the trait
declaration:
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
pub struct WorkerScore {
#[allow(dead_code)]
name: String,
points: i32,
}
pub trait Scored {
fn get_score(&self) -> i32;
fn set_score(&mut self, s: i32);
}
pub trait Promotable: Scored {
fn promote(&mut self) -> bool {
if self.get_score() < 14 {
self.set_score(14);
true
} else {
false
}
}
}
impl Scored for SchoolScore {
fn get_score(&self) -> i32 {
self.score
}
fn set_score(&mut self, score: i32) {
self.score = score
}
}
impl Scored for WorkerScore {
fn get_score(&self) -> i32 {
self.points
}
fn set_score(&mut self, points: i32) {
self.points = points
}
}
impl Promotable for SchoolScore {}
impl Promotable for WorkerScore {}
fn main() {
let mut s1 = SchoolScore {
period: String::from("January"),
score: 16,
};
let mut s2 = SchoolScore {
period: String::from("February"),
score: 12,
};
println!("Student averages:");
s1.promote();
s2.promote();
println!("PA={}", pa(&s1, &s2));
println!("PH={}", ph(&s1, &s2));
let mut w1 = WorkerScore {
name: String::from("Mike"),
points: 18,
};
let mut w2 = WorkerScore {
name: String::from("Lance"),
points: 8,
};
println!("Worker averages:");
w1.promote();
w2.promote();
println!("PA={}", pa(&w1, &w2));
println!("PH={}", ph(&w1, &w2));
}
fn pa<T: Scored>(a: &T, b: &T) -> i32 {
(a.get_score() + b.get_score()) / 2
}
fn ph<T: Scored>(a: &T, b: &T) -> i32 {
let af = a.get_score() as f64;
let bf = b.get_score() as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
As shown, empty trait implementations of Promotable
simply
inherit the default implementation.
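A default implementation is only a fallback: an implementing type may still supply its own version. In this sketch, ExamScore is a hypothetical type (not part of the tutorial) whose scores are never promoted:

```rust
pub trait Scored {
    fn get_score(&self) -> i32;
    fn set_score(&mut self, s: i32);
}

pub trait Promotable: Scored {
    // Default implementation, written in terms of the supertrait.
    fn promote(&mut self) -> bool {
        if self.get_score() < 14 {
            self.set_score(14);
            true
        } else {
            false
        }
    }
}

pub struct SchoolScore {
    score: i32,
}

// Hypothetical type, introduced only for this example.
pub struct ExamScore {
    score: i32,
}

impl Scored for SchoolScore {
    fn get_score(&self) -> i32 {
        self.score
    }
    fn set_score(&mut self, s: i32) {
        self.score = s
    }
}

impl Scored for ExamScore {
    fn get_score(&self) -> i32 {
        self.score
    }
    fn set_score(&mut self, s: i32) {
        self.score = s
    }
}

impl Promotable for SchoolScore {} // inherits the default promote()

impl Promotable for ExamScore {
    // Overrides the default: exam scores are left untouched.
    fn promote(&mut self) -> bool {
        false
    }
}

fn demo() -> (bool, i32, bool, i32) {
    let mut s = SchoolScore { score: 12 };
    let mut e = ExamScore { score: 12 };
    let ps = s.promote();
    let pe = e.promote();
    (ps, s.get_score(), pe, e.get_score())
}

fn main() {
    println!("{:?}", demo());
}
```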
Associated functions
As in C++/Java, static methods are associated with a type but not with a particular instance. In Rust, the corresponding "associated functions" don’t have the self parameter.
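Associated functions are not exclusive to traits; they may also be written in a plain impl block of a type. A minimal sketch, where the new() and describe() names are choices made for this illustration:

```rust
pub struct SchoolScore {
    period: String,
    score: i32,
}

impl SchoolScore {
    // Associated function: no `self` parameter, so it is called on the type.
    pub fn new(period: &str, score: i32) -> Self {
        SchoolScore {
            period: String::from(period),
            score,
        }
    }

    // Method: receives `&self`, so it is called on an instance.
    pub fn describe(&self) -> String {
        format!("{}={}", self.period, self.score)
    }
}

fn main() {
    let s = SchoolScore::new("January", 16); // Type::function(...)
    println!("{}", s.describe()); // instance.method()
}
```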
The following program adds an associated function to the Scored
trait. That
method is used to build a new instance of a concrete type:
fn build(text: &str, value: i32) -> Self;
Note the usage of the Self keyword (upper case) signaling that the return value is of the type implementing the trait. It will be invoked in a generic way, for example:
T::build("PA", val)
The so built objects will be the new return value for the averaging functions:
fn pa<T: Scored>(a: &T, b: &T) -> T {...}
As before, the concrete type for T is resolved at compilation time from the types of the arguments at each call site.
We also added the duo_promoter()
function to illustrate
the passing of mutable references:
pub struct SchoolScore {
period: String,
score: i32,
}
pub struct WorkerScore {
name: String,
points: i32,
}
pub trait Scored {
fn get_score(&self) -> i32;
fn set_score(&mut self, s: i32);
fn build(text: &str, value: i32) -> Self;
}
pub trait Promotable: Scored {
fn promote(&mut self) -> bool {
if self.get_score() < 14 {
self.set_score(14);
true
} else {
false
}
}
}
impl Scored for SchoolScore {
fn get_score(&self) -> i32 {
self.score
}
fn set_score(&mut self, score: i32) {
self.score = score
}
fn build(text: &str, value: i32) -> Self {
SchoolScore {
period: String::from(text),
score: value,
}
}
}
impl Scored for WorkerScore {
fn get_score(&self) -> i32 {
self.points
}
fn set_score(&mut self, points: i32) {
self.points = points
}
fn build(text: &str, value: i32) -> Self {
WorkerScore {
name: String::from(text),
points: value,
}
}
}
impl Promotable for SchoolScore {}
impl Promotable for WorkerScore {}
fn main() {
let mut s1 = SchoolScore {
period: String::from("January"),
score: 16,
};
let mut s2 = SchoolScore {
period: String::from("February"),
score: 12,
};
println!("Student averages:");
duo_promoter(&mut s1, &mut s2);
let xpa = pa(&s1, &s2);
println!("{}={}", xpa.period, xpa.get_score());
let xph = ph(&s1, &s2);
println!("{}={}", xph.period, xph.get_score());
let mut w1 = WorkerScore {
name: String::from("Mike"),
points: 18,
};
let mut w2 = WorkerScore {
name: String::from("Lance"),
points: 8,
};
println!("Worker averages:");
duo_promoter(&mut w1, &mut w2);
let xpa = pa(&w1, &w2);
println!("{}={}", xpa.name, xpa.get_score());
let xph = ph(&w1, &w2);
println!("{}={}", xph.name, xph.get_score());
}
fn duo_promoter<T: Promotable>(a: &mut T, b: &mut T) {
a.promote();
b.promote();
}
fn pa<T: Scored>(a: &T, b: &T) -> T {
let val = (a.get_score() + b.get_score()) / 2;
T::build("PA", val)
}
fn ph<T: Scored>(a: &T, b: &T) -> T {
let af = a.get_score() as f64;
let bf = b.get_score() as f64;
let ans = 2.0 * af * bf / (af + bf);
let val = ans.round() as i32;
T::build("PH", val)
}
Let’s retry a previous variant: set the SchoolScore’s period
attribute to a string slice:
period: &str
This forced us to introduce lifetimes:
period: &str -> period: &'a str -> SchoolScore<'a> ...
Now this will have an impact on the traits. Specifically, the build() method receives a &str which is directly used for the period attribute, but there is no guarantee that the referenced string outlives the newly returned SchoolScore object. Note that this was not an issue in the previous implementation, since the &str was only used (copied) for the creation of a brand new String with its own managed memory.
So the compiler will ask us to introduce lifetime specifiers, and by just following its suggestions we arrive at this listing:
pub struct SchoolScore<'a> {
//#[allow(dead_code)]
period: &'a str,
score: i32,
}
pub struct WorkerScore {
#[allow(dead_code)]
name: String,
points: i32,
}
pub trait Scored<'a> {
fn get_score(&self) -> i32;
fn set_score(&mut self, s: i32);
fn build(text: &'a str, value: i32) -> Self;
}
pub trait Promotable<'a>: Scored<'a> {
fn promote(&mut self) -> bool {
if self.get_score() < 14 {
self.set_score(14);
true
} else {
false
}
}
}
impl<'a> Scored<'a> for SchoolScore<'a> {
fn get_score(&self) -> i32 {
self.score
}
fn set_score(&mut self, score: i32) {
self.score = score
}
fn build(text: &'a str, value: i32) -> Self {
SchoolScore {
period: text,
score: value,
}
}
}
impl<'a> Scored<'a> for WorkerScore {
fn get_score(&self) -> i32 {
self.points
}
fn set_score(&mut self, points: i32) {
self.points = points
}
fn build(text: &'a str, value: i32) -> Self {
WorkerScore {
name: String::from(text),
points: value,
}
}
}
impl<'a> Promotable<'a> for SchoolScore<'a> {}
impl Promotable<'_> for WorkerScore {}
fn main() {
let mut s1 = SchoolScore {
period: "January",
score: 16,
};
let mut s2 = SchoolScore {
period: "February",
score: 12,
};
println!("Student averages:");
duo_promoter(&mut s1, &mut s2);
let xpa = pa(&s1, &s2);
println!("{}={}", xpa.period, xpa.get_score());
let xph = ph(&s1, &s2);
println!("{}={}", xph.period, xph.get_score());
let mut w1 = WorkerScore {
name: String::from("Mike"),
points: 18,
};
let mut w2 = WorkerScore {
name: String::from("Lance"),
points: 8,
};
println!("Worker averages:");
duo_promoter(&mut w1, &mut w2);
let xpa = pa(&w1, &w2);
println!("{}={}", xpa.name, xpa.get_score());
let xph = ph(&w1, &w2);
println!("{}={}", xph.name, xph.get_score());
}
fn duo_promoter<'a, T: Promotable<'a>>(a: &mut T, b: &mut T) {
a.promote();
b.promote();
}
fn pa<'a, T: Scored<'a>>(a: &T, b: &T) -> T {
let val = (a.get_score() + b.get_score()) / 2;
T::build("PA", val)
}
fn ph<'a, T: Scored<'a>>(a: &T, b: &T) -> T {
let af = a.get_score() as f64;
let bf = b.get_score() as f64;
let ans = 2.0 * af * bf / (af + bf);
let val = ans.round() as i32;
T::build("PH", val)
}
It could be interesting to compare the previous situation with a simplified C program:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct SchoolScore {
const char *period;
int score;
};
struct SchoolScore build(const char *ptr, int score) {
struct SchoolScore ans;
ans.period = ptr;
ans.score = score;
return ans;
}
int main() {
char *ptr = malloc(100);
printf("period? ");
fgets(ptr, 100, stdin);
ptr[strlen(ptr)-1]='\0';
struct SchoolScore s = build(ptr, 15);
/*free(ptr);
char *xptr = malloc(100);
strcpy(xptr, "hehe");*/
printf("%s -> %d\n", s.period, s.score);
return 0;
}
Its execution works as expected:
period? March
March -> 15
But uncommenting the commented lines leads to undefined behavior. On my computer the previous memory block was reused:
period? April
hehe -> 15
Rust protects us from destroying the allocated buffer while the structure object that refers to it is still in use.
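For comparison, this is a rough Rust counterpart of the C program, reduced to a free build() function and a fixed text instead of user input (both simplifications are assumptions of this sketch). The commented line plays the role of free(ptr):

```rust
pub struct SchoolScore<'a> {
    period: &'a str,
    score: i32,
}

// The returned structure borrows `text`, so it cannot outlive it.
fn build<'a>(text: &'a str, score: i32) -> SchoolScore<'a> {
    SchoolScore {
        period: text,
        score,
    }
}

fn describe() -> String {
    let buffer = String::from("March"); // plays the role of the malloc'ed buffer
    let s = build(&buffer, 15);
    // drop(buffer); // counterpart of free(ptr): rejected with error E0505,
    //               // because `s` still borrows `buffer` and is used below
    format!("{} -> {}", s.period, s.score)
}

fn main() {
    println!("{}", describe());
}
```

With the drop() call uncommented the program does not compile, so the dangling pointer of the C version cannot be produced.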
The static lifetime
Despite its name, the "static lifetime" is not related to static methods. The 'static lifetime tells Rust that the associated objects live for the entire program execution. The best known case is that of string literals, which are stored in the program itself: their type may be annotated as &'static str.
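A tiny sketch of the annotation in use; the DEFAULT_PERIOD constant and the label() function are names invented for this illustration:

```rust
// String literals live for the whole program, so this annotation is accepted.
const DEFAULT_PERIOD: &'static str = "January";

// A 'static reference may be returned without borrowing from any argument.
fn label(harmonic: bool) -> &'static str {
    if harmonic {
        "PH"
    } else {
        "PA"
    }
}

fn main() {
    let period: &'static str = DEFAULT_PERIOD;
    println!("{} {}", period, label(true));
}
```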
In the previous examples, all the text strings come from literal constants in the source code, so a simpler (yet more restricted) version may be obtained by defining the SchoolScore structure as:
pub struct SchoolScore {
period: &'static str,
score: i32,
}
Note that this is not a general solution but a very restricted one: for example, the period
text
can no longer stem from user input, disk files, etc. For more
emphasis, from https://doc.rust-lang.org/stable/book/ch10-03-lifetime-syntax.html#the-static-lifetime:
"[…] before specifying 'static
as the lifetime for a reference, think about whether
the reference you have actually lives the entire lifetime of your program or not […]. Most of
the time, the problem results from attempting to create a dangling reference or a mismatch of
the available lifetimes. In such cases, the solution is fixing those problems, not specifying
the 'static
lifetime."
pub struct SchoolScore {
period: &'static str,
score: i32,
}
pub struct WorkerScore {
name: String,
points: i32,
}
pub trait Scored {
fn get_score(&self) -> i32;
fn set_score(&mut self, s: i32);
fn build(text: &'static str, value: i32) -> Self;
}
pub trait Promotable: Scored {
fn promote(&mut self) -> bool {
if self.get_score() < 14 {
self.set_score(14);
true
} else {
false
}
}
}
impl Scored for SchoolScore {
fn get_score(&self) -> i32 {
self.score
}
fn set_score(&mut self, score: i32) {
self.score = score
}
fn build(text: &'static str, value: i32) -> Self {
SchoolScore {
period: text,
score: value,
}
}
}
impl Scored for WorkerScore {
fn get_score(&self) -> i32 {
self.points
}
fn set_score(&mut self, points: i32) {
self.points = points
}
fn build(text: &str, value: i32) -> Self {
WorkerScore {
name: String::from(text),
points: value,
}
}
}
impl Promotable for SchoolScore {}
impl Promotable for WorkerScore {}
fn main() {
let mut s1 = SchoolScore {
period: "January",
score: 16,
};
let mut s2 = SchoolScore {
period: "February",
score: 12,
};
println!("Student averages:");
duo_promoter(&mut s1, &mut s2);
let xpa = pa(&s1, &s2);
println!("{}={}", xpa.period, xpa.get_score());
let xph = ph(&s1, &s2);
println!("{}={}", xph.period, xph.get_score());
let mut w1 = WorkerScore {
name: String::from("Mike"),
points: 18,
};
let mut w2 = WorkerScore {
name: String::from("Lance"),
points: 8,
};
println!("Worker averages:");
duo_promoter(&mut w1, &mut w2);
let xpa = pa(&w1, &w2);
println!("{}={}", xpa.name, xpa.get_score());
let xph = ph(&w1, &w2);
println!("{}={}", xph.name, xph.get_score());
}
fn duo_promoter<T: Promotable>(a: &mut T, b: &mut T) {
a.promote();
b.promote();
}
fn pa<T: Scored>(a: &T, b: &T) -> T {
let val = (a.get_score() + b.get_score()) / 2;
T::build("PA", val)
}
fn ph<T: Scored>(a: &T, b: &T) -> T {
let af = a.get_score() as f64;
let bf = b.get_score() as f64;
let ans = 2.0 * af * bf / (af + bf);
let val = ans.round() as i32;
T::build("PH", val)
}
This behavior of literal text constants is similar in the C programming language; for example, the following listing works as expected (compare with the C example in the previous section):
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct SchoolScore {
const char *period;
int score;
};
struct SchoolScore build(const char *ptr, int score) {
struct SchoolScore ans;
ans.period = ptr;
ans.score = score;
return ans;
}
int main() {
char *ptr = "June";
struct SchoolScore s = build(ptr, 15);
char *xptr = malloc(100);
strcpy(xptr, "hehe");
printf("%s -> %d\n", s.period, s.score);
return 0;
}
More on generic types
Our "library" of averaging functions is restricted to i32
quantities. We were
lucky that our scores were given in that unit. Let’s assume our WorkerScore
struct has
a u32
unit for the score. To make things more interesting, let’s also assume
that in the future some (not yet written) type will need to deal
with f64
scores.
A way to implement these requirements is to define a new trait to be satisfied by the scores, involving conversions to/from f64, since we’ll implement all the calculations in double precision floating point values:
pub trait ScoreUnit {
fn to_f64(&self) -> f64;
fn from_f64(val: f64) -> Self;
}
We will provide implementations of the trait for the types we are interested in: i32
,
u32
and f64
. It is interesting that Rust allows the implementation of
user traits for any built-in type. For example:
impl ScoreUnit for i32 {
fn to_f64(&self) -> f64 {
*self as f64
}
fn from_f64(val: f64) -> Self {
val.round() as i32
}
}
The only strange item is the *self
expression, which means "the value pointed
by the reference", i.e. the ScoreUnit
object which in this context is an i32
.
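Once a user trait is implemented for a built-in type, its methods can be called directly on values and on the type itself. The sketch below repeats the trait with i32 and u32 implementations and tries a few conversions; the sample values are arbitrary:

```rust
pub trait ScoreUnit {
    fn to_f64(&self) -> f64;
    fn from_f64(val: f64) -> Self;
}

impl ScoreUnit for i32 {
    fn to_f64(&self) -> f64 {
        *self as f64
    }
    fn from_f64(val: f64) -> Self {
        val.round() as i32
    }
}

impl ScoreUnit for u32 {
    fn to_f64(&self) -> f64 {
        *self as f64
    }
    fn from_f64(val: f64) -> Self {
        // Float-to-integer `as` casts saturate, so negative values become 0.
        val.round() as u32
    }
}

fn main() {
    let a = 16_i32.to_f64(); // method call on a built-in value
    let b = i32::from_f64(14.6); // associated function call on a built-in type
    let c = u32::from_f64(-3.0); // out-of-range input for an unsigned type
    println!("{} {} {}", a, b, c);
}
```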
Now the Scored
trait may be generic:
pub trait Scored<T> where T: ScoreUnit{...}
The averaging functions now will be generic in two parameters: the
scoring unit (K
) and the scored entity (T
), so both types
must be declared. For example:
fn pa<K: ScoreUnit, T: Scored<K>>(a: &T, b: &T) -> T {...}
Now the full listing. Observe the implementation of the Promotable
trait:
pub struct SchoolScore {
period: String,
score: i32,
}
pub struct WorkerScore {
name: String,
points: u32,
}
pub trait ScoreUnit {
fn to_f64(&self) -> f64;
fn from_f64(val: f64) -> Self;
}
impl ScoreUnit for i32 {
fn to_f64(&self) -> f64 {
*self as f64
}
fn from_f64(val: f64) -> Self {
val.round() as i32
}
}
impl ScoreUnit for u32 {
fn to_f64(&self) -> f64 {
*self as f64
}
fn from_f64(val: f64) -> Self {
val.round() as u32
}
}
impl ScoreUnit for f64 {
fn to_f64(&self) -> f64 {
*self as f64
}
fn from_f64(val: f64) -> Self {
val
}
}
pub trait Scored<T>
where
T: ScoreUnit,
{
fn get_score(&self) -> T;
fn set_score(&mut self, s: T);
fn build(text: &str, value: T) -> Self;
}
pub trait Promotable<T>: Scored<T>
where
T: ScoreUnit,
{
fn promote(&mut self) -> bool {
if self.get_score().to_f64() < 14.0 {
self.set_score(ScoreUnit::from_f64(14.0));
true
} else {
false
}
}
}
impl Scored<i32> for SchoolScore {
fn get_score(&self) -> i32 {
self.score
}
fn set_score(&mut self, score: i32) {
self.score = score
}
fn build(text: &str, value: i32) -> Self {
SchoolScore {
period: String::from(text),
score: value,
}
}
}
impl Scored<u32> for WorkerScore {
fn get_score(&self) -> u32 {
self.points
}
fn set_score(&mut self, points: u32) {
self.points = points
}
fn build(text: &str, value: u32) -> Self {
WorkerScore {
name: String::from(text),
points: value,
}
}
}
impl Promotable<i32> for SchoolScore {}
impl Promotable<u32> for WorkerScore {}
fn main() {
let mut s1 = SchoolScore {
period: String::from("January"),
score: 16,
};
let mut s2 = SchoolScore {
period: String::from("February"),
score: 12,
};
println!("Student averages:");
duo_promoter(&mut s1, &mut s2);
let xpa = pa(&s1, &s2);
println!("{}={}", xpa.period, xpa.get_score());
let xph = ph(&s1, &s2);
println!("{}={}", xph.period, xph.get_score());
let mut w1 = WorkerScore {
name: String::from("Mike"),
points: 18,
};
let mut w2 = WorkerScore {
name: String::from("Lance"),
points: 8,
};
println!("Worker averages:");
duo_promoter(&mut w1, &mut w2);
let xpa = pa(&w1, &w2);
println!("{}={}", xpa.name, xpa.get_score());
let xph = ph(&w1, &w2);
println!("{}={}", xph.name, xph.get_score());
}
fn duo_promoter<K: ScoreUnit, T: Promotable<K>>(a: &mut T, b: &mut T) {
a.promote();
b.promote();
}
fn pa<K: ScoreUnit, T: Scored<K>>(a: &T, b: &T) -> T {
let val = (a.get_score().to_f64() + b.get_score().to_f64()) / 2.0;
T::build("PA", ScoreUnit::from_f64(val))
}
fn ph<K: ScoreUnit, T: Scored<K>>(a: &T, b: &T) -> T {
let af = a.get_score().to_f64();
let bf = b.get_score().to_f64();
let ans = 2.0 * af * bf / (af + bf);
let val = ScoreUnit::from_f64(ans);
T::build("PH", val)
}
Associated types
Returning to the averaging functions like:
fn pa<K: ScoreUnit, T: Scored<K>>(a: &T, b: &T) -> T {...}
The complexity of this declaration stems from the degree of generalization
of the Scored
trait. But the structure types only implement Scored
once
for its corresponding scoring type:
impl Scored<i32> for SchoolScore {...}
impl Scored<u32> for WorkerScore {...}
Given the current requirements, we may assume that there is no sense in providing additional trait implementations; i.e. the following would be pointless:
impl Scored<SomeOtherType> for SchoolScore {...}
...
impl Scored<AnotherType> for WorkerScore {...}
...
If a single trait implementation is required for a type, then we may employ a simpler option using "associated types":
pub trait Scored {
type A: ScoreUnit;
fn get_score(&self) -> Self::A;
fn set_score(&mut self, s: Self::A);
fn build(text: &str, value: Self::A) -> Self;
}
The type
keyword replaces the <T>
generic parameter. The values
corresponding to this type are specified with the syntax Self::A
.
The implementations must resolve the associated type by associating a concrete type (like in this example) or with the help of a generic parameter:
impl Scored for SchoolScore {
type A = i32;
fn get_score(&self) -> i32 {
self.score
}
...
}
As shown, associated types make sense when there is a unique relation between an implementing type and some component type of the trait. Now the modified program:
pub struct SchoolScore {
period: String,
score: i32,
}
pub struct WorkerScore {
name: String,
points: u32,
}
pub trait ScoreUnit {
fn to_f64(&self) -> f64;
fn from_f64(val: f64) -> Self;
}
impl ScoreUnit for i32 {
fn to_f64(&self) -> f64 {
*self as f64
}
fn from_f64(val: f64) -> Self {
val.round() as i32
}
}
impl ScoreUnit for u32 {
fn to_f64(&self) -> f64 {
*self as f64
}
fn from_f64(val: f64) -> Self {
val.round() as u32
}
}
impl ScoreUnit for f64 {
fn to_f64(&self) -> f64 {
*self as f64
}
fn from_f64(val: f64) -> Self {
val
}
}
pub trait Scored {
type A: ScoreUnit;
fn get_score(&self) -> Self::A;
fn set_score(&mut self, s: Self::A);
fn build(text: &str, value: Self::A) -> Self;
}
pub trait Promotable: Scored {
fn promote(&mut self) -> bool {
if self.get_score().to_f64() < 14.0 {
self.set_score(ScoreUnit::from_f64(14.0));
true
} else {
false
}
}
}
impl Scored for SchoolScore {
type A = i32;
fn get_score(&self) -> i32 {
self.score
}
fn set_score(&mut self, score: i32) {
self.score = score
}
fn build(text: &str, value: i32) -> Self {
SchoolScore {
period: String::from(text),
score: value,
}
}
}
impl Scored for WorkerScore {
type A = u32;
fn get_score(&self) -> u32 {
self.points
}
fn set_score(&mut self, points: u32) {
self.points = points
}
fn build(text: &str, value: u32) -> Self {
WorkerScore {
name: String::from(text),
points: value,
}
}
}
impl Promotable for SchoolScore {}
impl Promotable for WorkerScore {}
fn main() {
let mut s1 = SchoolScore {
period: String::from("January"),
score: 16,
};
let mut s2 = SchoolScore {
period: String::from("February"),
score: 12,
};
println!("Student averages:");
duo_promoter(&mut s1, &mut s2);
let xpa = pa(&s1, &s2);
println!("{}={}", xpa.period, xpa.get_score());
let xph = ph(&s1, &s2);
println!("{}={}", xph.period, xph.get_score());
let mut w1 = WorkerScore {
name: String::from("Mike"),
points: 18,
};
let mut w2 = WorkerScore {
name: String::from("Lance"),
points: 8,
};
println!("Worker averages:");
duo_promoter(&mut w1, &mut w2);
let xpa = pa(&w1, &w2);
println!("{}={}", xpa.name, xpa.get_score());
let xph = ph(&w1, &w2);
println!("{}={}", xph.name, xph.get_score());
}
fn duo_promoter<T: Promotable>(a: &mut T, b: &mut T) {
a.promote();
b.promote();
}
fn pa<T: Scored>(a: &T, b: &T) -> T {
let val = (a.get_score().to_f64() + b.get_score().to_f64()) / 2.0;
T::build("PA", ScoreUnit::from_f64(val))
}
fn ph<T: Scored>(a: &T, b: &T) -> T {
let af = a.get_score().to_f64();
let bf = b.get_score().to_f64();
let ans = 2.0 * af * bf / (af + bf);
let val = ScoreUnit::from_f64(ans);
T::build("PH", val)
}
As shown, the averaging functions are now back to a single generic parameter, reducing unnecessary complexity:
fn pa<T: Scored>(a: &T, b: &T) -> T {...}
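Generic code can also name the associated type of its parameter with the T::A syntax. A reduced sketch: the trait is cut down to get_score(), the PartialOrd bound replaces ScoreUnit, and the best() function is invented for the illustration:

```rust
pub trait Scored {
    type A: PartialOrd; // simplified bound, enough for comparisons
    fn get_score(&self) -> Self::A;
}

pub struct SchoolScore {
    score: i32,
}

pub struct WorkerScore {
    points: u32,
}

impl Scored for SchoolScore {
    type A = i32;
    fn get_score(&self) -> i32 {
        self.score
    }
}

impl Scored for WorkerScore {
    type A = u32;
    fn get_score(&self) -> u32 {
        self.points
    }
}

// The return type depends on T: i32 for SchoolScore, u32 for WorkerScore.
fn best<T: Scored>(a: &T, b: &T) -> T::A {
    let (x, y) = (a.get_score(), b.get_score());
    if x >= y {
        x
    } else {
        y
    }
}

fn main() {
    let s = best(&SchoolScore { score: 16 }, &SchoolScore { score: 12 });
    let w = best(&WorkerScore { points: 8 }, &WorkerScore { points: 18 });
    println!("{} {}", s, w);
}
```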
The From
trait
The last example could be made in a more "idiomatic" way by leveraging the
From
trait. We may define a new type named ScoreValue
containing a
f64
, and conversion facilities to translate to/from this new type and
the required i32
, u32
and f64
. For example:
pub struct ScoreValue {
value: f64,
}
impl From<i32> for ScoreValue {
fn from(item: i32) -> Self {
ScoreValue { value: item as f64 }
}
}
// then we may do:
let i : i32 = 12;
let sval = ScoreValue::from(i); // sval is of ScoreValue type
Previously we used this trait extensively when creating String
objects from
literal text constants using String::from("some text")
.
The From
trait implies the Into
trait, so in the previous example, an
alternative syntax would be:
let i : i32 = 12;
let sval : ScoreValue = i.into(); // sval's type required for disambiguation
Note that `From` and `Into` are part of the automatically imported names (the prelude), so the `use std::convert::From` directive at the beginning of the listing is not strictly required; it is kept only to make the origin of the trait explicit (the `use` directive is similar to C++'s `using`, or Java's `import`):
use std::convert::From;
pub struct SchoolScore {
period: String,
score: i32,
}
pub struct WorkerScore {
name: String,
points: u32,
}
pub struct ScoreValue {
value: f64,
}
impl From<i32> for ScoreValue {
fn from(item: i32) -> Self {
ScoreValue { value: item as f64 }
}
}
impl From<u32> for ScoreValue {
fn from(item: u32) -> Self {
ScoreValue { value: item as f64 }
}
}
impl From<f64> for ScoreValue {
fn from(item: f64) -> Self {
ScoreValue { value: item }
}
}
impl From<ScoreValue> for i32 {
fn from(item: ScoreValue) -> Self {
item.value.round() as i32
}
}
impl From<ScoreValue> for u32 {
fn from(item: ScoreValue) -> Self {
item.value.round() as u32
}
}
impl From<ScoreValue> for f64 {
fn from(item: ScoreValue) -> Self {
item.value
}
}
pub trait Scored {
fn get_score(&self) -> ScoreValue;
fn set_score(&mut self, s: ScoreValue);
fn build(text: &str, value: ScoreValue) -> Self;
}
pub trait Promotable: Scored {
fn promote(&mut self) -> bool {
// let v64: f64 = self.get_score().into();
let v64 = f64::from(self.get_score());
if v64 < 14.0 {
self.set_score(ScoreValue::from(14.0));
true
} else {
false
}
}
}
impl Scored for SchoolScore {
fn get_score(&self) -> ScoreValue {
ScoreValue::from(self.score)
}
fn set_score(&mut self, score: ScoreValue) {
self.score = score.into()
}
fn build(text: &str, value: ScoreValue) -> Self {
SchoolScore {
period: String::from(text),
score: value.into(),
}
}
}
impl Scored for WorkerScore {
fn get_score(&self) -> ScoreValue {
ScoreValue::from(self.points)
}
fn set_score(&mut self, points: ScoreValue) {
self.points = points.into()
}
fn build(text: &str, value: ScoreValue) -> Self {
WorkerScore {
name: String::from(text),
points: value.into(),
}
}
}
impl Promotable for SchoolScore {}
impl Promotable for WorkerScore {}
fn main() {
let mut s1 = SchoolScore {
period: String::from("January"),
score: 16,
};
let mut s2 = SchoolScore {
period: String::from("February"),
score: 12,
};
println!("Student averages:");
duo_promoter(&mut s1, &mut s2);
let xpa = pa(&s1, &s2);
println!("{}={}", xpa.period, i32::from(xpa.get_score()));
let xph = ph(&s1, &s2);
println!("{}={}", xph.period, i32::from(xph.get_score()));
let mut w1 = WorkerScore {
name: String::from("Mike"),
points: 18,
};
let mut w2 = WorkerScore {
name: String::from("Lance"),
points: 8,
};
println!("Worker averages:");
duo_promoter(&mut w1, &mut w2);
let xpa = pa(&w1, &w2);
println!("{}={}", xpa.name, i32::from(xpa.get_score()));
let xph = ph(&w1, &w2);
println!("{}={}", xph.name, i32::from(xph.get_score()));
}
fn duo_promoter<T: Promotable>(a: &mut T, b: &mut T) {
a.promote();
b.promote();
}
fn pa<T: Scored>(a: &T, b: &T) -> T {
let val = (f64::from(a.get_score()) + f64::from(b.get_score())) / 2.0;
T::build("PA", ScoreValue::from(val))
}
fn ph<T: Scored>(a: &T, b: &T) -> T {
let af: f64 = a.get_score().into();
let bf: f64 = b.get_score().into();
let ans = 2.0 * af * bf / (af + bf);
let val = ScoreValue::from(ans);
T::build("PH", val)
}
- If the school scores have a range of `0-20`, then the `u8` type would be enough for their storage. Implement the required changes for its support
- The `promote()` function implementation contains the comparison `if v64 < 14.0 {…}`. It is bad style to have a parameter value like `14.0` interspersed in the code expressions. Following https://doc.rust-lang.org/stable/rust-by-example/custom_types/constants.html, replace it with a constant defined at the beginning of the program like: `const PROM_THRESHOLD: f64 = 14.0;`
- The `PROM_THRESHOLD` is a global constant in the program. It is possible to define type-specific constants; see https://doc.rust-lang.org/reference/items/associated-items.html#associated-constants for more information
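The associated-constant idea from the last item may be sketched as follows. This is only one possible arrangement, not the tutorial's solution: the trait is simplified to plain `f64` scores, and the worker threshold of `10.0` is an assumed value chosen for illustration.

```rust
// Hypothetical sketch: the threshold lives in the trait as an associated
// constant, so every implementing type may keep or override the default.
trait Promotable {
    const PROM_THRESHOLD: f64 = 14.0; // default shared by all implementors

    fn score(&self) -> f64;
    fn set_score(&mut self, value: f64);

    // Raises the score to the type's threshold; returns true if it changed.
    fn promote(&mut self) -> bool {
        if self.score() < Self::PROM_THRESHOLD {
            self.set_score(Self::PROM_THRESHOLD);
            true
        } else {
            false
        }
    }
}

struct SchoolScore {
    score: f64,
}

struct WorkerScore {
    points: f64,
}

impl Promotable for SchoolScore {
    fn score(&self) -> f64 {
        self.score
    }
    fn set_score(&mut self, value: f64) {
        self.score = value;
    }
}

impl Promotable for WorkerScore {
    const PROM_THRESHOLD: f64 = 10.0; // assumed value, only for illustration

    fn score(&self) -> f64 {
        self.points
    }
    fn set_score(&mut self, value: f64) {
        self.points = value;
    }
}

fn main() {
    let mut s = SchoolScore { score: 12.0 };
    let mut w = WorkerScore { points: 12.0 };
    println!("school promoted: {} -> {}", s.promote(), s.score);
    println!("worker promoted: {} -> {}", w.promote(), w.points);
}
```

With this layout the same `promote()` body serves both types, while each type decides its own threshold.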
Dynamic allocation
The `Box` data type allows the storage of some information in the heap. It is a pointer to some (user provided) data, which is placed in a dynamically allocated memory region. The allocated memory is automatically freed when the pointer goes out of scope, so it behaves much like a C++ smart pointer.
The following program makes use of boxed variables. To make things a bit more interesting, `pa()` will return a numeric score, but `ph()` a boxed object.
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
fn main() {
let s1 = Box::new(SchoolScore {
period: String::from("January"),
score: 16,
});
let s2 = Box::new(SchoolScore {
period: String::from("February"),
score: 12,
});
println!("AVG({},{}) ", (*s1).score, (*s2).score);
println!("PA={}", pa(&(*s1), &(*s2)));
let xph = ph(&s1, &s2);
println!("PH={}", (*xph).score);
}
fn pa(a: &SchoolScore, b: &SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: &Box<SchoolScore>, b: &Box<SchoolScore>) -> Box<SchoolScore> {
let af = (*a).score as f64;
let bf = (*b).score as f64;
let ans = 2.0 * af * bf / (af + bf);
let ans = ans.round() as i32;
Box::new(SchoolScore {
period: String::from("PH"),
score: ans,
})
}
The boxed object is accessed with the "de-reference" operator (as in `(*s1).score`). But Rust is smart enough to deduce that we want the de-referenced version, so the program may be simplified in the following way (as if no boxing happened at all):
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
fn main() {
let s1 = Box::new(SchoolScore {
period: String::from("January"),
score: 16,
});
let s2 = Box::new(SchoolScore {
period: String::from("February"),
score: 12,
});
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", pa(&s1, &s2));
let xph = ph(&s1, &s2);
println!("PH={}", xph.score);
}
fn pa(a: &SchoolScore, b: &SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: &SchoolScore, b: &SchoolScore) -> Box<SchoolScore> {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
Box::new(SchoolScore {
period: String::from("PH"),
score: ans.round() as i32,
})
}
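The automatic conversion of `&s1` (a `&Box<SchoolScore>`) into the `&SchoolScore` expected by `pa()` is called "Deref coercion"; the field access `s1.score` relies on the related automatic de-referencing performed by the `.` operator. From https://doc.rust-lang.org/stable/book/ch15-02-deref.html#implicit-deref-coercions-with-functions-and-methods:
"Deref coercion works only on types that implement the `Deref` trait. Deref coercion converts such a type into a reference to another type."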
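To see that the coercion is driven by the `Deref` trait and is not special to `Box`, here is a minimal sketch with a made-up wrapper type. The names `Wrapper` and `double()` are hypothetical and exist only for this illustration.

```rust
use std::ops::Deref;

// Hypothetical wrapper type, used only to illustrate the mechanism.
struct Wrapper<T>(T);

impl<T> Deref for Wrapper<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

struct SchoolScore {
    score: i32,
}

fn double(s: &SchoolScore) -> i32 {
    s.score * 2
}

fn main() {
    let w = Wrapper(SchoolScore { score: 16 });
    let b = Box::new(SchoolScore { score: 12 });
    // &Wrapper<SchoolScore> and &Box<SchoolScore> are both coerced to &SchoolScore
    println!("{} {}", double(&w), double(&b));
    // Explicit form of what the compiler inserts for us
    println!("{}", double(&*w));
    // The `.` operator also de-references automatically
    println!("{}", w.score);
}
```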
Reading from command line
The following program shows how to read the contents of the `SchoolScore` line by line from standard input:
use std::io;
use std::io::Write;
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
fn input(prompt: &str, line: &mut String) {
print!("{}", prompt);
io::stdout().flush().unwrap();
line.clear();
io::stdin().read_line(line).unwrap();
}
fn main() {
let mut line = String::new();
println!("First score");
input("Enter period> ", &mut line);
let period1 = String::from(line.trim());
input("Enter score> ", &mut line);
let score1: i32 = line.trim().parse().unwrap();
let s1 = SchoolScore {
period: period1,
score: score1,
};
println!("Second score");
input("Enter period> ", &mut line);
let period2 = String::from(line.trim());
input("Enter score> ", &mut line);
let score2: i32 = line.trim().parse().unwrap();
let s2 = SchoolScore {
period: period2,
score: score2,
};
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", pa(&s1, &s2));
println!("PH={}", ph(&s1, &s2));
}
fn pa(a: &SchoolScore, b: &SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: &SchoolScore, b: &SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
Notable points:
- The `input()` function displays the prompt message before reading text; we use the `print!` macro (instead of `println!`) to avoid sending a new line after the text
- We force the flushing of the text from the `print!` macro with `Stdout::flush()`
- The text line is read with `Stdin::read_line()`, which requires a mutable reference to a `String`
- The `flush()`, `read_line()` and `parse()` functions return a `Result` value, which is an enumeration expressing whether the operation was successful or not (`Ok` or `Err` variants); see below for more information
- The `Result` type is marked with the `#[must_use]` attribute, so the compiler will generate a warning if the value is not handled by the programmer
- The `line.clear()` call simply removes the previous contents of the `String` (otherwise `read_line()` would append the input to the previously stored text)
- The trimmed text is converted to an `i32` by the `parse()` function
We "resolve" the `#[must_use]` attribute in the `Result` value by calling the `unwrap()` method, which simply panics in the error case. See also the `expect(msg)` method, which also panics on error but displays a user provided error message.
Note that the standard library exposes `Result` under several names. The I/O related functions return `std::io::Result<T>`, which is a type alias for `std::result::Result<T, std::io::Error>`, while the `parse()` function returns a plain `std::result::Result` whose error type depends on the target type (`std::num::ParseIntError` for `i32`).
- Replace the `unwrap()` calls with `expect()` and try to force their failure cases
- Test what happens when parsing if we avoid the `trim()` call
- The `input()` function could be simpler if it just returned a brand new `String` instead of reusing a single instance; try it!
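As a small companion to the points above, the following sketch isolates the parsing step so its `Result` can be inspected without typing anything. The helper `parse_score()` and its `trim` flag are hypothetical names introduced only for this illustration.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parses a score, with or without trimming first.
fn parse_score(text: &str, trim: bool) -> Result<i32, ParseIntError> {
    if trim {
        text.trim().parse::<i32>()
    } else {
        text.parse::<i32>()
    }
}

fn main() {
    let line = "16\n"; // what read_line() leaves in the buffer
    println!("{:?}", parse_score(line, true));
    println!("{:?}", parse_score(line, false));
    // expect() panics like unwrap(), but with our own message
    let score = parse_score(line, true).expect("the score must be an integer");
    println!("score={}", score);
}
```

Printing the `Result` with `{:?}` shows which variant was produced, which is handy before deciding between `unwrap()`, `expect()` or a `match`.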
Matching variants
The `Result` type allows a more intelligent handling of the error condition; a usual pattern is to request the user to re-enter an invalid text. This implies a loop in the execution flow, which may be provided by Rust's `loop` keyword.
The `loop` keyword has the form `loop { … }` and is in principle an "infinite loop", since it requires an explicit interruption for termination: here we simply `return` from the containing function, ending the loop. Another way is the `break` keyword, which immediately jumps to the first statement after the loop. As in C or Java, there is also a `continue` keyword which forces an immediate new loop iteration; it is used in the present example.
Returning to the processing of the `Result` value, it is usually done by pattern matching on its variants, with the `match` facility:
match result_value {
    Ok(value) => some_expression,
    Err(error) => other_expression,
}
The last variants may be abbreviated with the `_` wildcard, which matches anything not handled by the previous arms. This is similar to the "default case" of the `switch` block in the C programming language. Also, note that the variants `Ok` and `Err` are imported by the prelude, so they may be used without any prefix:
use std::io;
use std::io::Write;
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
fn input_string(prompt: &str) -> String {
loop {
print!("{}", prompt);
io::stdout().flush().unwrap();
let mut line = String::new();
match io::stdin().read_line(&mut line) {
Ok(_) => return String::from(line.trim()),
_ => continue,
}
}
}
fn input_number(prompt: &str) -> i32 {
loop {
print!("{}", prompt);
io::stdout().flush().unwrap();
let mut line = String::new();
match io::stdin().read_line(&mut line) {
Ok(_) => (),
_ => continue,
}
match line.trim().parse() {
Ok(val) => return val,
_ => continue,
}
}
}
fn input() -> SchoolScore {
let period = input_string("Enter period> ");
let score = input_number("Enter score> ");
SchoolScore {
period: period,
score: score,
}
}
fn main() {
println!("First score");
let s1 = input();
println!("Second score");
let s2 = input();
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", pa(&s1, &s2));
println!("PH={}", ph(&s1, &s2));
}
fn pa(a: &SchoolScore, b: &SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: &SchoolScore, b: &SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
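The interplay of `loop`, `break`, `continue` and `match` can also be tried without any console input. The sketch below is an assumption-laden variant of the retry pattern: the hypothetical helper `first_valid()` walks over a fixed list of candidate texts instead of asking the user again.

```rust
// Hypothetical helper: returns the first text that parses as a score,
// skipping invalid candidates with `continue` and stopping with `break`.
fn first_valid(candidates: &[&str]) -> Option<i32> {
    let mut index = 0;
    loop {
        if index >= candidates.len() {
            break None; // `break` may carry the value of the whole loop
        }
        let text = candidates[index];
        index += 1;
        match text.trim().parse::<i32>() {
            Ok(val) => break Some(val),
            Err(_) => continue, // try the next candidate
        }
    }
}

fn main() {
    println!("{:?}", first_valid(&["abc", " 12 ", "7"]));
    println!("{:?}", first_valid(&["x", "y"]));
}
```

Here the loop is the last expression of the function, so the value given to `break` becomes the function result.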
Error handling with Result
The `Result` enumeration is used as the return value of user functions which may fail for some interesting reason. As shown, its variants are `Ok` for holding a successful result, and `Err` for holding a failure reason.
As a further illustration, let's adapt this earlier version of the averaging program:
fn main() {
let x1 = 16;
let x2 = 12;
println!("AVG({},{}) PA={} PH={}", x1, x2, pa(x1, x2), ph(x1, x2));
}
fn pa(a: i32, b: i32) -> i32 {
(a + b) / 2
}
fn ph(a: i32, b: i32) -> i32 {
let af = a as f64;
let bf = b as f64;
let ans = 2.0 * af * bf / (af + bf);
println!("PH (f64) is {}", ans);
ans.round() as i32
}
Here we'll introduce `Result` for the corresponding return values and a user defined error type `ScoreError`, considering that the scores must be in the range 0-20, and that our harmonic average formula fails when both scores are zero:
enum ScoreError {
OutOfRange,
BothZero,
}
fn main() {
let x1 = 16;
//let x2 = 12;
let x2 = -10;
let arith = match pa(x1, x2) {
Ok(x) => x,
_ => panic!("Invalid arguments for pa()"),
};
let harm = match ph(x1, x2) {
Ok(x) => x,
_ => panic!("Invalid arguments for ph()"),
};
println!("AVG({},{}) PA={} PH={}", x1, x2, arith, harm);
}
fn pa(a: i32, b: i32) -> Result<i32, ScoreError> {
if a < 0 || a > 20 || b < 0 || b > 20 {
return Err(ScoreError::OutOfRange);
}
Ok((a + b) / 2)
}
fn ph(a: i32, b: i32) -> Result<i32, ScoreError> {
if a == 0 && b == 0 {
return Err(ScoreError::BothZero);
}
if a < 0 || a > 20 || b < 0 || b > 20 {
return Err(ScoreError::OutOfRange);
}
let af = a as f64;
let bf = b as f64;
let ans = 2.0 * af * bf / (af + bf);
Ok(ans.round() as i32)
}
Which purposely fails this way:
$ cargo run --bin p0470
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/p0470`
thread 'main' panicked at 'Invalid arguments for pa()', src/bin/p0470.rs:11:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
We may obtain basic information about the error cause by deriving `Debug`:
#[derive(Debug)]
enum ScoreError {
OutOfRange,
BothZero,
}
fn main() {
let x1 = 16;
//let x2 = 12;
let x2 = -10;
let arith = match pa(x1, x2) {
Ok(x) => x,
Err(e) => panic!("Error calling pa(): {:?}", e),
};
let harm = match ph(x1, x2) {
Ok(x) => x,
Err(e) => panic!("Error calling ph(): {:?}", e),
};
println!("AVG({},{}) PA={} PH={}", x1, x2, arith, harm);
}
which produces:
$ cargo run --bin p0472
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/p0472`
thread 'main' panicked at 'Error calling pa(): OutOfRange', src/bin/p0472.rs:12:19
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
A more ergonomic way to handle those errors is the `?` operator, which extracts the value of the `Ok()` variant, or simply forwards the error (like an exception re-throw) for the caller to handle. Here the error is passed to `main()`, whose return type is adapted appropriately:
#[derive(Debug)]
enum ScoreError {
OutOfRange,
BothZero,
}
fn main() -> Result<(), ScoreError> {
let x1 = 16;
//let x2 = 12;
let x2 = -10;
println!("AVG({},{}) PA={} PH={}", x1, x2, pa(x1, x2)?, ph(x1, x2)?);
Ok(())
}
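To make the behaviour of `?` more tangible, the sketch below writes the same function twice: once with the operator and once spelled out with `match`. The helper names `check()`, `sum_short()` and `sum_long()` are hypothetical, and the long version is only an approximation of what the compiler generates.

```rust
#[derive(Debug, PartialEq)]
enum ScoreError {
    OutOfRange,
}

fn check(x: i32) -> Result<i32, ScoreError> {
    if x < 0 || x > 20 {
        return Err(ScoreError::OutOfRange);
    }
    Ok(x)
}

// Version using the `?` operator
fn sum_short(a: i32, b: i32) -> Result<i32, ScoreError> {
    Ok(check(a)? + check(b)?)
}

// Roughly equivalent version spelled out with `match`
fn sum_long(a: i32, b: i32) -> Result<i32, ScoreError> {
    let x = match check(a) {
        Ok(v) => v,
        // `?` also passes the error through From::from(), which is
        // what allows error type conversions
        Err(e) => return Err(ScoreError::from(e)),
    };
    let y = match check(b) {
        Ok(v) => v,
        Err(e) => return Err(ScoreError::from(e)),
    };
    Ok(x + y)
}

fn main() {
    println!("{:?}", sum_short(16, 12));
    println!("{:?}", sum_short(16, -10));
    println!("{:?}", sum_long(16, -10));
}
```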
The standard library provides the `std::error::Error` trait, which is the conventional base trait for the error types in the language. The following version of the program defines a `ScoreError` type implementing the trait: it has `Debug` and `Display` as supertraits, so we "derive" the first one and implement the second one.
The program asks for two numbers, which are read from standard input. Here two kinds of error may happen: an I/O related error when reading from the standard input, and a "parsing" error when translating the user text into a number. Both are considered as special values of `ScoreError`, and we preserve their original value as arguments of the enum variants.
Now let's see the function:
fn get_number() -> Result<i32, ScoreError> {...}
It may "fail" because of an I/O error or a parsing error; in both cases we employ the `?` operator to "re-throw" the error to the caller, but with the `ScoreError` type.
That means that the I/O errors and the parsing errors must be "converted" into the corresponding variants of `ScoreError` when employing the `?` operator.
In order to achieve this conversion we simply implement the `From` trait for those error types:
impl From<io::Error> for ScoreError {...}
impl From<num::ParseIntError> for ScoreError {...}
Finally, for demonstration purposes we explicitly display the generated error (if any) in the `main()` function.
use std::error;
use std::fmt;
use std::io;
use std::num;
#[derive(Debug)]
enum ScoreError {
OutOfRange(usize),
BothZero,
IO(io::Error),
Parse(num::ParseIntError),
}
impl error::Error for ScoreError {}
impl fmt::Display for ScoreError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
ScoreError::BothZero => write!(f, "Both scores are zero"),
ScoreError::Parse(err) => write!(f, "Invalid number can't be parsed: {}", err),
ScoreError::OutOfRange(par) => write!(f, "Out of range in parameter {}", par),
ScoreError::IO(err) => write!(f, "I/O related error: {}", err),
}
}
}
impl From<io::Error> for ScoreError {
fn from(err: io::Error) -> ScoreError {
ScoreError::IO(err)
}
}
impl From<num::ParseIntError> for ScoreError {
fn from(err: num::ParseIntError) -> ScoreError {
ScoreError::Parse(err)
}
}
fn main() {
if let Err(e) = run() {
println!("Error: {}", e);
}
}
fn run() -> Result<(), ScoreError> {
let x1 = get_number()?;
let x2 = get_number()?;
println!("AVG({},{}) PA={} PH={}", x1, x2, pa(x1, x2)?, ph(x1, x2)?);
Ok(())
}
fn get_number() -> Result<i32, ScoreError> {
let mut buf: String = String::new();
io::stdin().read_line(&mut buf)?;
let number = buf.trim().parse::<i32>()?;
Ok(number)
}
fn pa(a: i32, b: i32) -> Result<i32, ScoreError> {
if a < 0 || a > 20 {
return Err(ScoreError::OutOfRange(1));
}
if b < 0 || b > 20 {
return Err(ScoreError::OutOfRange(2));
}
Ok((a + b) / 2)
}
fn ph(a: i32, b: i32) -> Result<i32, ScoreError> {
if a == 0 && b == 0 {
return Err(ScoreError::BothZero);
}
if a < 0 || a > 20 {
return Err(ScoreError::OutOfRange(1));
}
if b < 0 || b > 20 {
return Err(ScoreError::OutOfRange(2));
}
let af = a as f64;
let bf = b as f64;
let ans = 2.0 * af * bf / (af + bf);
Ok(ans.round() as i32)
}
The topic of error handling in Rust is not trivial. Please see https://blog.burntsushi.net/rust-error-handling/ for a comprehensive explanation. More ideas may be read in https://doc.rust-lang.org/rust-by-example/error/multiple_error_types.html .
Regular expressions
The Rust standard library does not provide support for regular expressions, so we need to add the `regex` crate to `Cargo.toml`:
[dependencies]
regex = "1.5"
As an illustration, from https://turreta.com/2019/09/14/rust-validate-email-address-using-regular-expressions/ we'll take a regular expression for basic email validation (if interested in this topic, please see the discussion in https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression). In the following listing, a regular expression matcher is built with the `Regex::new()` method (which compiles the regular expression) and is then used to verify an email address provided by the user. Note that this program is a small evolution of the previous one:
use regex::Regex;
use std::io;
use std::io::Write;
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
#[allow(dead_code)]
email: String,
score: i32,
}
// same as previous listing ...
fn input() -> SchoolScore {
let re = Regex::new(
r"^([a-z0-9_+]([a-z0-9_+.]*[a-z0-9_+])?)@([a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6})",
)
.unwrap();
let period = input_string("Enter period> ");
let email;
loop {
let t_email = input_string("Enter email> ");
if re.is_match(&t_email) {
email = t_email;
break;
}
println!("Bad email address - please retry");
}
let score = input_number("Enter score> ");
SchoolScore {
period: period,
score: score,
email: email,
}
}
Note
|
For serious email address validation you should avoid regular expressions altogether and employ a special purpose library like https://crates.io/crates/validator . |
- On invalid input, the example just retries the input, but does not explain what's happening: provide some information for the user in the error cases
- Modify the regular expression in order to accept email addresses of a specific domain (like `university.edu`)
- Remove the regular expression support from `Cargo.toml` and add the `validator` crate for email address verification
Random scores
The usual way of generating random numbers is with the help of a generator implementing the `rand::Rng` trait, which is obtained with the `rand::thread_rng()` function. In the following example we get values in the range `[0,21)`; that is, from zero to 20:
use rand::{thread_rng, Rng};
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
fn main() {
let mut rndgen = thread_rng();
let s1 = SchoolScore {
period: String::from("January"),
score: rndgen.gen_range(0, 21),
};
let s2 = SchoolScore {
period: String::from("February"),
score: rndgen.gen_range(0, 21),
};
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", pa(&s1, &s2));
println!("PH={}", ph(&s1, &s2));
}
fn pa(a: &SchoolScore, b: &SchoolScore) -> i32 {
(a.score + b.score) / 2
}
fn ph(a: &SchoolScore, b: &SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
This random number generator is not part of the Rust standard library, but an additional library: a Rust "crate". In order to make it available in the project, the crate must be listed as a dependency in the `Cargo.toml` configuration file:
[dependencies]
rand = "0.5.5"
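On the next project build, the library will be downloaded by Cargo and cached for further use. Note that the two-argument form `gen_range(low, high)` used in these listings matches this older version of the crate; from version 0.8 on, the method takes a single range argument, as in `gen_range(0..21)`.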
- Search for the `rand` crate at https://crates.io/; what is its "current" version?
- Read about the multiple ways to specify the dependency's crate versions at https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html
- See more ideas on random number generation in https://rust-lang-nursery.github.io/rust-cookbook/algorithms/randomness.html
- From the previous link, investigate a more reasonable way to model random scores using the normal distribution (https://docs.rs/rand_distr/0.4.0/rand_distr/struct.Normal.html), and improve the current simulation
More on looping
With our random number generator, we'll now create a disk file with any number of random scores. Observe the use of the `loop` keyword for the iteration and how it is terminated with an explicit `break`:
use rand::{thread_rng, Rng};
use std::fs::File;
use std::io;
use std::io::Write;
fn input_number(prompt: &str) -> u32 {
loop {
print!("{}", prompt);
io::stdout().flush().unwrap();
let mut line = String::new();
match io::stdin().read_line(&mut line) {
Ok(_) => (),
_ => continue,
}
match line.trim().parse() {
Ok(val) => return val,
_ => continue,
}
}
}
fn main() {
let num = input_number("How many records? ");
let mut rndgen = thread_rng();
let mut file = File::create("/tmp/test-output.txt").expect("Can't create file");
let mut z = 0;
loop {
if z >= num {
break;
}
let score = rndgen.gen_range(0, 21);
writeln!(&mut file, "{}->{}", z, score).unwrap();
z += 1;
}
}
In order to open the file in "append" mode, replace the `File::create()` call with something like the following (which also requires a `use std::fs::OpenOptions;` directive):
let mut file = OpenOptions::new().append(true).
open("/tmp/test-output.txt").expect("Can't open file for append");
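A self-contained sketch of the append mode follows. It is not the tutorial's program: the helper `append_line()`, the file name `xtut-append-demo.txt` and the use of the system temporary directory are assumptions made for the example, and `create(true)` is added on the assumption that a missing file should be created.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

// Hypothetical helper: appends one line to the file, creating it if missing.
fn append_line(path: &Path, text: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true) // assumption: a missing file should be created
        .append(true)
        .open(path)?;
    writeln!(file, "{}", text)
}

fn main() -> io::Result<()> {
    // Assumed file name, placed in the system temporary directory
    let path = std::env::temp_dir().join("xtut-append-demo.txt");
    let _ = fs::remove_file(&path); // start from a clean state
    append_line(&path, "0->17")?;
    append_line(&path, "1->2")?;
    print!("{}", fs::read_to_string(&path)?);
    fs::remove_file(&path)
}
```

Each call opens the file again, so the second line lands after the first one instead of overwriting it.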
Note
|
There is no `close()` method for the file handle: the file is
closed when its handle object goes out of scope. The closing may be
forced by destroying the handle with `drop(file)` (passing the handle itself, not a reference to it).
|
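The release-on-scope-exit behaviour can be observed with a stand-in type, since a real file gives no visible sign when it is closed. In this sketch the `Handle` type and its counter are hypothetical; only `Drop` and `drop()` come from the standard library.

```rust
use std::cell::Cell;

// Hypothetical stand-in for a file handle: it counts how many times
// the "resource" was released.
struct Handle<'a> {
    closed: &'a Cell<u32>,
}

impl<'a> Drop for Handle<'a> {
    fn drop(&mut self) {
        self.closed.set(self.closed.get() + 1);
    }
}

// Returns the counter value after an explicit drop and after a scope end.
fn demo() -> (u32, u32) {
    let closed = Cell::new(0);
    let first = Handle { closed: &closed };
    drop(first); // moves the handle into drop(), releasing it right away
    let after_drop = closed.get();
    {
        let _second = Handle { closed: &closed };
    } // _second goes out of scope here and is released
    (after_drop, closed.get())
}

fn main() {
    println!("{:?}", demo());
}
```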
Execution test:
diego@dataone:~/devel/RUST/xtut$ cargo run --bin p0580
   Compiling xtut v0.1.0 (/home/diego/devel/RUST/xtut)
    Finished dev [unoptimized + debuginfo] target(s) in 0.73s
     Running `target/debug/p0580`
How many records? 4
diego@dataone:~/devel/RUST/xtut$ cat /tmp/test-output.txt
0->17
1->2
2->18
3->13
Since this loop pattern is very common, Rust provides the classic `while condition {…}` construct:
use rand::{thread_rng, Rng};
use std::fs::File;
use std::io;
use std::io::Write;
fn input_number(prompt: &str) -> u32 {
loop {
print!("{}", prompt);
io::stdout().flush().unwrap();
let mut line = String::new();
match io::stdin().read_line(&mut line) {
Ok(_) => (),
_ => continue,
}
match line.trim().parse() {
Ok(val) => return val,
_ => continue,
}
}
}
fn main() {
let num = input_number("How many records? ");
let mut rndgen = thread_rng();
let mut file = File::create("/tmp/test-output.txt").expect("Can't create file");
let mut z = 0;
while z < num {
let score = rndgen.gen_range(0, 21);
writeln!(&mut file, "{}->{}", z, score).unwrap();
z += 1;
}
}
There is also a `for variable in range {…}` construct, where the range is specified with the syntax `a..b`, corresponding to the values `a`, `a+1`, …, `b-1`, but not including `b`. There is an alternative `a..=b` syntax which includes the final extreme `b`.
use rand::{thread_rng, Rng};
use std::fs::File;
use std::io;
use std::io::Write;
fn input_number(prompt: &str) -> u32 {
loop {
print!("{}", prompt);
io::stdout().flush().unwrap();
let mut line = String::new();
match io::stdin().read_line(&mut line) {
Ok(_) => (),
_ => continue,
}
match line.trim().parse() {
Ok(val) => return val,
_ => continue,
}
}
}
fn main() {
let num = input_number("How many records? ");
let mut rndgen = thread_rng();
let mut file = File::create("/tmp/test-output.txt").expect("Can't create file");
for z in 0..num {
let score = rndgen.gen_range(0, 21);
writeln!(&mut file, "{}->{}", z, score).unwrap();
}
}
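The difference between the two range forms is easy to check by collecting the ranges into vectors. The helper names `exclusive()` and `inclusive()` in this sketch are hypothetical.

```rust
// Collects the values produced by the exclusive range a..b
fn exclusive(a: u32, b: u32) -> Vec<u32> {
    (a..b).collect()
}

// Collects the values produced by the inclusive range a..=b
fn inclusive(a: u32, b: u32) -> Vec<u32> {
    (a..=b).collect()
}

fn main() {
    println!("{:?}", exclusive(0, 4));
    println!("{:?}", inclusive(0, 4));
    // Ranges are iterators, so adapters like rev() are available
    for z in (0..4).rev() {
        print!("{} ", z);
    }
    println!();
}
```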
Note
|
The for range extremes are evaluated only once before the first loop iteration,
and remain fixed until its termination, even if they were the result of some variable
expression whose value would change during the loop.
|
- Try to verify the last note regarding the `for` range immutability: make `num` a mutable variable, and inside the loop reset it to a small value (like zero)
- Do the same test with the `while` loop example
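One possible way to set up both experiments without file output is sketched below; the function names are hypothetical and each function returns the number of iterations together with the final value of `num`.

```rust
// `for`: the range 0..num is built once, before the first iteration
fn for_iterations(mut num: u32) -> (u32, u32) {
    let mut count = 0;
    for _ in 0..num {
        num = 0; // has no effect on the already built range
        count += 1;
    }
    (count, num)
}

// `while`: the condition is evaluated again before every iteration
fn while_iterations(mut num: u32) -> (u32, u32) {
    let mut count = 0;
    let mut z = 0;
    while z < num {
        num = 0; // the next check of `z < num` fails
        count += 1;
        z += 1;
    }
    (count, num)
}

fn main() {
    println!("for:   {:?}", for_iterations(4));
    println!("while: {:?}", while_iterations(4));
}
```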
Modules
Modules introduce a controlled access scope and namespacing for types and functions. As a trivial example, the following listing includes a module containing the averaging functions:
use rand::{thread_rng, Rng};
pub struct SchoolScore {
#[allow(dead_code)]
period: String,
score: i32,
}
fn main() {
let mut rndgen = thread_rng();
let s1 = SchoolScore {
period: String::from("January"),
score: rndgen.gen_range(0, 21),
};
let s2 = SchoolScore {
period: String::from("February"),
score: rndgen.gen_range(0, 21),
};
println!("AVG({},{}) ", s1.score, s2.score);
println!("PA={}", averaging::pa(&s1, &s2));
println!("PH={}", averaging::ph(&s1, &s2));
}
pub mod averaging {
pub fn pa(a: &super::SchoolScore, b: &super::SchoolScore) -> i32 {
(a.score + b.score) / 2
}
pub fn ph(a: &super::SchoolScore, b: &super::SchoolScore) -> i32 {
let af = a.score as f64;
let bf = b.score as f64;
let ans = 2.0 * af * bf / (af + bf);
ans.round() as i32
}
}
Here the `averaging` module is specified with the `mod name {…}` syntax, but usually modules are stored in their own file with no need for this enclosing syntax; in this case, the calling code needs a `mod name;` declaration.
The `SchoolScore` type is outside the `averaging` module, in what technically is the parent (default) module. For this reason, from inside the module it needs to be referenced with the `super::SchoolScore` syntax.
To make things a bit confusing, a module file may be created inside the `src/` directory with a file name `name.rs`, which defines the module name, or inside a new subdirectory under `src/` with the predefined file name `mod.rs`; in this case, the subdirectory name defines the module name.
The code which is not under a module belongs to the root module of the crate, which may be referenced with the `crate` keyword.
- Create a new project with the example code in `main.rs`, but put the `averaging` module in its own `averaging.rs` file (beside `main.rs`)
- In `averaging.rs`, make sure to remove the enclosing `mod averaging {…}` since it is no longer needed (leaving it would define an `averaging::averaging` submodule)
- Add the `mod averaging;` declaration at the beginning of `main.rs` in order to access the module
- The module functions are private by default. Remove the `pub` keyword from the module functions to verify that the parent module is unable to access them
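The visibility rules may be condensed into a small inline-module sketch. The private helper `half()` is a hypothetical addition; the `use super::SchoolScore;` line is just a shorter way of writing the `super::` path on every use.

```rust
pub struct SchoolScore {
    score: i32,
}

pub mod averaging {
    // Brings the parent's type into the module scope once
    use super::SchoolScore;

    pub fn pa(a: &SchoolScore, b: &SchoolScore) -> i32 {
        half(a.score + b.score)
    }

    // Private helper: visible inside `averaging` only
    fn half(total: i32) -> i32 {
        total / 2
    }
}

fn main() {
    let s1 = SchoolScore { score: 16 };
    let s2 = SchoolScore { score: 12 };
    println!("PA={}", averaging::pa(&s1, &s2));
    // averaging::half(28); // would not compile: `half` is private
}
```

Note that the child module may read the private `score` field, because privacy in Rust is granted to the defining module and all of its descendants.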
Vectors
Let’s create a number of scores with the help of our random number generator:
use rand::distributions::Alphanumeric;
use rand::{thread_rng, Rng};
use std::io;
use std::io::Write;
#[derive(Debug)]
pub struct SchoolScore {
period: String,
score: i32,
}
fn input_number(prompt: &str) -> u32 {
loop {
print!("{}", prompt);
io::stdout().flush().unwrap();
let mut line = String::new();
match io::stdin().read_line(&mut line) {
Ok(_) => (),
_ => continue,
}
match line.trim().parse() {
Ok(val) => return val,
_ => continue,
}
}
}
fn build_record<R: Rng>(rng: &mut R) -> SchoolScore {
let sz = rng.gen_range(5, 16);
let period: String = rng.sample_iter(&Alphanumeric).take(sz).collect();
let score = rng.gen_range(0, 21);
SchoolScore { period, score }
}
fn main() {
let num = input_number("How many records? ");
let mut rndgen = thread_rng();
let mut records: Vec<SchoolScore> = Vec::new();
for _ in 0..num {
let rec = build_record(&mut rndgen);
records.push(rec);
}
for rec in records.iter() {
println!("Record is {:?}", rec);
}
}
The `Vec<T>` is a "vector" container similar to the corresponding C++ type or Java's `ArrayList`: it allows the insertion of an arbitrary number of same-type elements, preserving insertion order, and providing very fast access to the elements by position. We create a new vector with its `new()` associated function (a "static method"), assigning it to a mutable and explicitly typed variable (like `records` in the example).
Since the `SchoolScore` type is composed of basic Rust types (which already implement `Debug`), a "debug view" may be automatically generated with the `#[derive(Debug)]` attribute, so its instances may be printed with the `{:?}` placeholder notation.
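A minimal sketch of the derived debug view, independent of the random generator, is shown next; the helper `compact()` is a hypothetical name used to capture the formatted text.

```rust
#[derive(Debug)]
pub struct SchoolScore {
    period: String,
    score: i32,
}

// Renders the record with the compact `{:?}` placeholder
fn compact(rec: &SchoolScore) -> String {
    format!("{:?}", rec)
}

fn main() {
    let rec = SchoolScore {
        period: String::from("January"),
        score: 16,
    };
    println!("{}", compact(&rec));
    println!("{:#?}", rec); // "pretty" multi-line variant
    println!("{}={}", rec.period, rec.score);
}
```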
Also, `rand::Rng` is a trait with some unknown implementation behind it, so we employ a trait bound in order to pass the generator to the `build_record()` function. Note the use of the `sample_iter()` method to generate random strings containing 5-15 characters; those characters are letters (upper and lower case) and numeric digits. Compare with the following implementation, which generates random strings containing characters in the set `A-Z` (ASCII codes 65 to 90):
// sample_iter() imitation
let mut period = String::new();
for _ in 0..rng.gen_range(5, 16) {
let ch = char::from(rng.gen_range(65, 91));
period.push(ch);
}
Returning to the vector, the `push()` method is used to append elements to it. An important feature is shown in the display of the vector contents:
for rec in records.iter() {
println!("Record is {:?}", rec);
}
Here the `for` keyword runs over a vector iterator obtained with `iter()`. This is a "read only" iterator which does not allow any kind of modification of the elements. There are two other kinds of iterator, obtained by:
for rec in records.iter_mut() {...}
This iterator allows the modification of the elements while being iterated; and:
for rec in records.into_iter() {...}
This iterator "consumes" the elements of the vector and transfers the vector's ownership into the loop, so after the iteration the vector variable can no longer be used. This last iteration mode may be abbreviated with the following syntax:
for rec in records {...} // like into_iter()
For example, with the following modifications the last sample does not compile, since the `records` variable was "moved" by the first loop:
// ... previous lines same as before ...
for rec in records {
println!("1st run, record is {:?}", rec);
}
for rec in records {
println!("2nd run, record is {:?}", rec);
}
Of course, with `records.iter()` (or the equivalent `&records`) it would be okay.
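The three iteration modes can be compared side by side on a plain vector of integers. The function names in this sketch are hypothetical, and `Vec<i32>` stands in for the vector of records to keep the example free of external crates.

```rust
// iter(): read-only access to the elements
fn total(scores: &Vec<i32>) -> i32 {
    let mut sum = 0;
    for s in scores.iter() {
        sum += *s;
    }
    sum
}

// iter_mut(): the elements may be modified in place
fn promote_all(scores: &mut Vec<i32>, minimum: i32) {
    for s in scores.iter_mut() {
        if *s < minimum {
            *s = minimum;
        }
    }
}

// into_iter(): the vector is consumed; each element is moved out
fn labels(scores: Vec<i32>) -> Vec<String> {
    let mut out = Vec::new();
    for s in scores.into_iter() {
        out.push(format!("score={}", s));
    }
    out
}

fn main() {
    let mut scores = vec![16, 12, 8];
    println!("{}", total(&scores));
    promote_all(&mut scores, 14);
    println!("{:?}", scores);
    let text = labels(scores);
    // `scores` can no longer be used here: it was moved into labels()
    println!("{:?}", text);
}
```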
- Complete the `sample_iter()` imitation shown above in order to include lower case letters and digits
- The vector contents are stored in memory. Set a limit for the size of the generation (for example, 100000 records)
- Make the program pause by requesting some text from the console before the final display of the records. Run the program and, while it is paused, check its memory consumption with the operating system utilities. Do it for several sizes of the vector
Averaging the vector
From the previous example, we’ll reinsert the calculation of the averaging functions, but now they will work for any number of values. This version also shows how to capture command line arguments: the number of elements must be specified in the command line. Following the C/Unix convention, the first argument (position zero) is the executable program name, and the second one (position 1) will be the user provided number.
The arguments are provided in a Rust specific type named env::Args
which is an iterator. It is converted into a Vec
for
convenience.
use rand::distributions::Alphanumeric;
use rand::{thread_rng, Rng};
use std::env;
#[derive(Debug)]
pub struct SchoolScore {
period: String,
score: i32,
}
fn build_record<R: Rng>(rng: &mut R) -> SchoolScore {
let period: String = rng.sample_iter(&Alphanumeric).take(10).collect();
let score = rng.gen_range(0, 21);
SchoolScore { period, score }
}
fn main() {
let args: Vec<String> = env::args().collect();
if args.len() != 2 {
panic!("Must provide the number of records");
}
let num = match args[1].parse() {
Ok(val) => val,
_ => panic!("Must provide the number of records"),
};
let mut rndgen = thread_rng();
let mut records: Vec<SchoolScore> = Vec::new();
for _ in 0..num {
let rec = build_record(&mut rndgen);
records.push(rec);
}
for rec in records.iter() {
println!("Record is {:?}", rec);
}
println!("PA={}", pa(&records));
println!("PH={}", ph(&records));
}
fn pa(data: &Vec<SchoolScore>) -> i32 {
let mut total: f64 = 0.0;
for rec in data.iter() {
total = total + f64::from(rec.score);
}
let n = data.len() as f64;
(total / n).round() as i32
}
fn ph(data: &Vec<SchoolScore>) -> i32 {
let mut total: f64 = 0.0;
for rec in data.iter() {
total = total + 1.0 / f64::from(rec.score);
}
let n = data.len() as f64;
(n / total).round() as i32
}
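The argument handling may also be isolated in a helper which reports a problem through an Option instead of calling panic!(). The following is only a sketch: the parse_count() name and the choice of usize for the count are assumptions, not part of the listing above.

```rust
use std::env;

// Hypothetical helper: returns the record count when exactly one
// valid non-negative integer argument was provided.
fn parse_count(args: &Vec<String>) -> Option<usize> {
    if args.len() != 2 {
        return None;
    }
    match args[1].parse::<usize>() {
        Ok(val) => Some(val),
        Err(_) => None,
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    match parse_count(&args) {
        Some(num) => println!("Will generate {} records", num),
        None => println!("Must provide the number of records"),
    }
}
```

The explicit parse::<usize>() makes the target type visible, instead of leaving it to type inference.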
Besides iterators, vector elements may be accessed with
the var[position]
syntax (as in the args
variable). Usually, it is
better to take a reference with &var[position]
, since the
former tries to move the element out of the vector, which is not allowed for types that do not implement Copy. For example,
the following program works as expected:
fn main() {
let mut something: Vec<String> = Vec::new();
something.push(String::from("one"));
something.push(String::from("two"));
something.push(String::from("three"));
let k = &something[1];
println!("element[1] is {}", k);
}
But after changing the assignment of k
into:
let k = something[1];
then we get the following error:
error[E0507]: cannot move out of index of `Vec<String>`
 --> src/bin/p0665.rs:6:13
  |
6 |     let k = something[1];
  |             ^^^^^^^^^^^^
  |             |
  |             move occurs because value has type `String`, which does not implement the `Copy` trait
  |             help: consider borrowing here: `&something[1]`
Another way to extract elements is with the get(index)
method: it returns
an Option<&T>
where None
is returned when the index is out of range.
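A minimal sketch of get() follows; the describe() helper is a hypothetical name introduced only for this illustration.

```rust
// Hypothetical helper: describes the element at `index`, if any.
fn describe(items: &Vec<String>, index: usize) -> String {
    match items.get(index) {
        Some(text) => format!("element[{}] is {}", index, text),
        None => format!("element[{}] does not exist", index),
    }
}

fn main() {
    let mut something: Vec<String> = Vec::new();
    something.push(String::from("one"));
    something.push(String::from("two"));
    something.push(String::from("three"));
    println!("{}", describe(&something, 1));
    // something[7] would panic; get(7) simply returns None.
    println!("{}", describe(&something, 7));
}
```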
-
If no command line arguments are provided, or if an invalid value is provided, then ask the user to provide the required value using the
input_number()
function shown in a previous example
A vector application
This section is a recapitulation of the previous concepts. The following example shows a
new SchoolClass
type containing a vector of Student
elements. The SchoolClass
does not implement
any trait, but has a number of methods: it behaves like a classic C++ or Java class.
The averaging functions are now methods which operate on the internal vector. The best()
method
is interesting in its return value: the Option
may be None
if no students were
added to the SchoolClass
instance.
#[derive(Debug)]
pub struct Student {
name: String,
age: u8,
sex: Sex,
score: u8,
}
#[derive(Debug)]
pub enum Sex {
MALE,
FEMALE,
}
pub struct SchoolClass {
name: String,
people: Vec<Student>,
}
impl SchoolClass {
fn new(name: &str) -> SchoolClass {
SchoolClass {
name: String::from(name),
people: Vec::new(),
}
}
fn new_student(&mut self, name: &str, age: u8, sex: Sex, score: u8) {
let s = Student {
name: String::from(name),
age: age,
sex: sex,
score: score,
};
self.people.push(s);
}
fn pa(&self) -> u8 {
let mut total: f64 = 0.0;
for rec in self.people.iter() {
total = total + f64::from(rec.score);
}
let n = self.people.len() as f64;
(total / n).round() as u8
}
fn ph(&self) -> u8 {
let mut total: f64 = 0.0;
for rec in self.people.iter() {
total = total + 1.0 / f64::from(rec.score);
}
let n = self.people.len() as f64;
(n / total).round() as u8
}
fn len(&self) -> usize {
self.people.len()
}
fn best(&self) -> Option<&Student> {
let mut idx: i32 = -1;
let mut cur_max: u8 = 0;
for (i, rec) in self.people.iter().enumerate() {
// idx < 0 means "no candidate yet", so a student scoring 0 may still be selected
if idx < 0 || rec.score > cur_max {
idx = i as i32;
cur_max = rec.score;
}
}
if idx < 0 {
return None;
}
Some(&self.people[idx as usize])
}
}
fn main() {
let mut school_class = SchoolClass::new("Rust 101");
school_class.new_student("Mike Schumacher", 21, Sex::MALE, 17);
school_class.new_student("Max Verstappen", 25, Sex::MALE, 19);
school_class.new_student("Checo Perez", 31, Sex::MALE, 18);
school_class.new_student("Marcus Mazepin", 22, Sex::MALE, 9);
println!("There are {} students", school_class.len());
println!("PA={}", school_class.pa());
println!("PH={}", school_class.ph());
let best_student = school_class.best();
println!("Best of class {}: {:?}", school_class.name, best_student);
}
See also the enumerate()
method, which returns tuples containing
the iteration index (from zero) and the iterated element.
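As a smaller illustration of enumerate(), the following sketch finds the position of the highest score in a plain vector of numbers. The index_of_max() name is an assumption made for this example; it is not part of the SchoolClass listing.

```rust
// Returns the position of the first occurrence of the highest score,
// or None for an empty vector.
fn index_of_max(scores: &Vec<u8>) -> Option<usize> {
    let mut best: Option<usize> = None;
    for (i, score) in scores.iter().enumerate() {
        let is_better = match best {
            Some(j) => *score > scores[j],
            None => true,
        };
        if is_better {
            best = Some(i);
        }
    }
    best
}

fn main() {
    let mut scores: Vec<u8> = Vec::new();
    scores.push(17);
    scores.push(19);
    scores.push(18);
    scores.push(9);
    for (i, score) in scores.iter().enumerate() {
        println!("position {} holds score {}", i, score);
    }
    println!("index of max: {:?}", index_of_max(&scores));
}
```

Storing the position in an Option<usize> avoids the need for a negative "not found" marker.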
-
The averages will fail when there are no students in the vector; an
Option<u8>
would be a better return value for them
-
Implement a function to find the "worst" student
Initializing vectors
Rust provides the vec!
macro to help in the initialization of vector values. The following
example illustrates its usage by randomly combining names and last names. Observe the use of
literal constants, so the resulting vector type is Vec<&str>
:
use rand::{thread_rng, Rng};
#[derive(Debug)]
pub struct SchoolScore {
driver: String,
score: i32,
}
fn build_record<R: Rng>(rng: &mut R, fnames: &Vec<&str>, lnames: &Vec<&str>) -> SchoolScore {
let idx_fname = rng.gen_range(0, fnames.len());
let idx_lname = rng.gen_range(0, lnames.len());
let mut driver = String::new();
driver.push_str(fnames[idx_fname]);
driver.push_str(" ");
driver.push_str(lnames[idx_lname]);
let score = rng.gen_range(0, 21);
SchoolScore { driver, score }
}
fn main() {
let num = 5;
let fnames = vec![
"Mike",
"Niki",
"Ayrton",
"Max",
"Lewis",
"Kimi",
"Sebastian",
"Sergio",
"Jenson",
"Daniel",
];
let lnames = vec![
"Schumacher",
"Lauda",
"Senna",
"Verstappen",
"Hamilton",
"Raikkonen",
"Vettel",
"Perez",
"Button",
"Ricciardo",
"Russel",
];
let mut rndgen = thread_rng();
let mut records: Vec<SchoolScore> = Vec::new();
for _ in 0..num {
let rec = build_record(&mut rndgen, &fnames, &lnames);
records.push(rec);
}
for rec in records.iter() {
println!("Record is {:?}", rec);
}
}
A run example:
diego@dataone:~/devel/RUST/xtut$ cargo run --bin p0685
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/p0685`
Record is SchoolScore { driver: "Sergio Ricciardo", score: 16 }
Record is SchoolScore { driver: "Max Schumacher", score: 13 }
Record is SchoolScore { driver: "Mike Ricciardo", score: 11 }
Record is SchoolScore { driver: "Max Raikkonen", score: 0 }
Record is SchoolScore { driver: "Lewis Vettel", score: 19 }
Slices
The Slice types provide a "view" for a region of a vector, and other
types containing collections of elements. If the contained elements
are of type T
, then the slice type is denoted by [T]
.
It is not possible to transfer the ownership of "some contents" of
a vector (or any related type), and the size of a slice is not known at compile time, so slices will normally be used as
references: &[T]
.
Slices may be obtained explicitly with methods
like as_slice()
and as_mut_slice()
of Vec
, but usually they are automatically generated by
"coercion". The previous example may be translated to slice notation
simply by changing:
fn build_record<R: Rng>(rng: &mut R, fnames: &Vec<&str>, lnames: &Vec<&str>) -> SchoolScore {...}
into
fn build_record<R: Rng>(rng: &mut R, fnames: &[&str], lnames: &[&str]) -> SchoolScore {...}
The rest of the code remains exactly as before. That is, slices provide element access by
the var[position]
syntax, the len()
method, etc.
The key is the automatic coercion by the syntax &fnames
or &lnames
: it generates a
slice pointing to the full vector contents.
But there are more slice expressions; for example, changing the build_record()
invocation to:
let rec = build_record(&mut rndgen, &fnames[0..2], &lnames[0..3]);
This generates slices which are views of only the first two and the first three elements of the mentioned vectors, respectively. A run of this program follows:
diego@dataone:~/devel/RUST/xtut$ cargo run --bin p0691
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/p0691`
Record is SchoolScore { driver: "Mike Schumacher", score: 17 }
Record is SchoolScore { driver: "Niki Schumacher", score: 11 }
Record is SchoolScore { driver: "Mike Schumacher", score: 12 }
Record is SchoolScore { driver: "Niki Senna", score: 14 }
Record is SchoolScore { driver: "Mike Schumacher", score: 9 }
See https://rust-lang.github.io/rfcs/0198-slice-notation.html for more information.
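The most common range forms can be tried with a short standalone program. This is just a sketch; the total() function and the sample values are assumptions made for the illustration.

```rust
// Sums the elements of any slice of scores.
fn total(scores: &[i32]) -> i32 {
    let mut sum = 0;
    for s in scores.iter() {
        sum = sum + *s;
    }
    sum
}

fn main() {
    let scores = vec![16, 12, 9, 20, 5];
    println!("all     = {}", total(&scores)); // coerced to the full slice
    println!("[0..2]  = {}", total(&scores[0..2])); // first two elements
    println!("[..2]   = {}", total(&scores[..2])); // same as 0..2
    println!("[3..]   = {}", total(&scores[3..])); // from position 3 to the end
    println!("[1..=2] = {}", total(&scores[1..=2])); // inclusive upper bound
    // A range which exceeds the vector length panics at run time.
}
```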
Note that the text contained in a String
may also be coerced to a slice reference. Its type is
&str
. See https://doc.rust-lang.org/stable/book/ch04-03-slices.html#string-slices
for more information.
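A minimal sketch of string slices follows. The first_word() function is an assumption written for this illustration (the linked chapter of the Rust book develops a similar idea).

```rust
// Returns the text before the first space (or the whole text).
fn first_word(text: &str) -> &str {
    match text.find(' ') {
        Some(pos) => &text[..pos],
        None => text,
    }
}

fn main() {
    let driver = String::from("Ayrton Senna");
    // &driver has type &String, which is coerced to &str.
    println!("first word: {}", first_word(&driver));
    // A string literal already has type &str.
    println!("first word: {}", first_word("Lauda"));
}
```

Note that string slice ranges are expressed in bytes, so they must fall on character boundaries.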
Arrays
As in many programming languages, arrays are compile-time fixed-length ordered lists
of same-type elements. For example, the following defines a five element
array containing &str
elements. This array’s type is denoted by [&str;5]
:
let some_array = ["ha", "he", "hi", "ho", "hu"];
So the type [T;n]
(where n
is a non-negative integer constant) denotes an array
containing exactly n
elements, each of type T
.
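The fact that the length is part of the type can be seen in the following sketch; the sum3() function and the zeros array are assumptions made only for this illustration.

```rust
// Accepts only arrays of exactly three i32 elements.
fn sum3(values: [i32; 3]) -> i32 {
    values[0] + values[1] + values[2]
}

fn main() {
    let some_array: [&str; 5] = ["ha", "he", "hi", "ho", "hu"];
    let zeros = [0u8; 4]; // four elements, all initialized to 0
    println!("{} elements, first is {}", some_array.len(), some_array[0]);
    println!("zeros = {:?}", zeros);
    println!("sum3 = {}", sum3([16, 12, 9]));
    // sum3([16, 12]) would not compile: the type [i32; 2] is not [i32; 3].
}
```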
The previous examples include two vectors which are never resized, so they may be
replaced by arrays. And like vectors, arrays of type [T;n]
(where n
is the array
size) are coerced into slices of type [T]
.
So, continuing with the previous examples, the only modification required to replace the vectors with arrays is to change these lines:
let fnames = vec![
...
];
let lnames = vec![
...
];
by the following:
let fnames = [
...
];
let lnames = [
...
];
That's all! Note that the types of both arrays are [&str;10]
and [&str;11]
; when
passing &fnames
or &lnames
, the arguments are coerced into slice references of
type &[&str]
. Now the full listing for the new version:
use rand::{thread_rng, Rng};
#[derive(Debug)]
pub struct SchoolScore {
driver: String,
score: i32,
}
fn build_record<R: Rng>(rng: &mut R, fnames: &[&str], lnames: &[&str]) -> SchoolScore {
let idx_fname = rng.gen_range(0, fnames.len());
let idx_lname = rng.gen_range(0, lnames.len());
let mut driver = String::new();
driver.push_str(fnames[idx_fname]);
driver.push_str(" ");
driver.push_str(lnames[idx_lname]);
let score = rng.gen_range(0, 21);
SchoolScore { driver, score }
}
fn pa(records: &[SchoolScore]) -> u8 {
let mut total: f64 = 0.0;
for rec in records.iter() {
total = total + f64::from(rec.score);
}
let n = records.len() as f64;
(total / n).round() as u8
}
fn ph(records: &[SchoolScore]) -> u8 {
let mut total: f64 = 0.0;
for rec in records.iter() {
total = total + 1.0 / f64::from(rec.score);
}
let n = records.len() as f64;
(n / total).round() as u8
}
fn main() {
let num = 5;
let fnames = [
"Mike",
"Niki",
"Ayrton",
"Max",
"Lewis",
"Kimi",
"Sebastian",
"Sergio",
"Jenson",
"Daniel",
];
let lnames = [
"Schumacher",
"Lauda",
"Senna",
"Verstappen",
"Hamilton",
"Raikkonen",
"Vettel",
"Perez",
"Button",
"Ricciardo",
"Russel",
];
let mut rndgen = thread_rng();
let mut records: Vec<SchoolScore> = Vec::new();
for _ in 0..num {
let rec = build_record(&mut rndgen, &fnames, &lnames);
records.push(rec);
}
for rec in records.iter() {
println!("Record is {:?}", rec);
}
println!("PA={}", pa(&records));
println!("PH={}", ph(&records));
}
See more examples in https://doc.rust-lang.org/rust-by-example/primitives/array.html.
The HashMap
container
Also known as a dictionary or associative array, it allows the unordered storage of values of a single type, each one associated with (and referenced by) a unique key, where all the keys also share a single type.
For example, the students in a school usually have a unique code for clear identification. From our
previous student class example, we’ll add the id
property for the students:
#[derive(Debug)]
pub struct Student {
id: String,
name: String,
age: u8,
sex: Sex,
score: u8,
}
The id
is a String
which (in this example) will contain an incrementing
numeric value: this "counter" will be stored in the SchoolClass
instance.
The HashMap
is created by its new()
static method and is parametric in
two types: one for the keys and one for the values.
As shown in the next example, its iterator yields
(key,value)
pairs:
use std::collections::HashMap;
#[derive(Debug)]
pub struct Student {
id: String,
name: String,
age: u8,
sex: Sex,
score: u8,
}
#[derive(Debug)]
pub enum Sex {
MALE,
FEMALE,
}
pub struct SchoolClass {
name: String,
people: HashMap<String, Student>,
last_id: u64,
}
impl SchoolClass {
fn new(name: &str) -> SchoolClass {
SchoolClass {
name: String::from(name),
people: HashMap::new(),
last_id: 1000000,
}
}
fn new_student(&mut self, name: &str, age: u8, sex: Sex, score: u8) {
let id = self.last_id.to_string();
let idmap = id.clone();
self.last_id = self.last_id + 1;
let s = Student {
id: id,
name: String::from(name),
age: age,
sex: sex,
score: score,
};
self.people.insert(idmap, s);
}
fn pa(&self) -> u8 {
let mut total: f64 = 0.0;
for (_, rec) in self.people.iter() {
total = total + f64::from(rec.score);
}
let n = self.people.len() as f64;
(total / n).round() as u8
}
fn ph(&self) -> u8 {
let mut total: f64 = 0.0;
for (_, rec) in self.people.iter() {
total = total + 1.0 / f64::from(rec.score);
}
let n = self.people.len() as f64;
(n / total).round() as u8
}
fn len(&self) -> usize {
self.people.len()
}
fn best(&self) -> Option<&Student> {
let mut max_score: i32 = -1;
let mut the_best: Option<&Student> = Option::None;
for (_, rec) in self.people.iter() {
if rec.score as i32 > max_score {
max_score = rec.score as i32;
the_best = Some(rec);
}
}
the_best
}
}
fn main() {
let mut school_class = SchoolClass::new("Rust F1");
school_class.new_student("Mike Schumacher", 21, Sex::MALE, 17);
school_class.new_student("Max Verstappen", 25, Sex::MALE, 19);
school_class.new_student("Checo Perez", 31, Sex::MALE, 18);
school_class.new_student("Marcus Mazepin", 22, Sex::MALE, 9);
println!("There are {} students", school_class.len());
println!("PA={}", school_class.pa());
println!("PH={}", school_class.ph());
let best_student = school_class.best();
println!("Best of class {}: {:?}", school_class.name, best_student);
}
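The listing above only inserts and iterates. Lookups by key are done with get(), which returns an Option. The following sketch uses a simpler map from id to score; the score_of() helper and the sample ids are assumptions made for this illustration.

```rust
use std::collections::HashMap;

// Looks up a score by student id; None when the id is unknown.
fn score_of(scores: &HashMap<String, u8>, id: &str) -> Option<u8> {
    match scores.get(id) {
        Some(score) => Some(*score),
        None => None,
    }
}

fn main() {
    let mut scores: HashMap<String, u8> = HashMap::new();
    scores.insert(String::from("1000000"), 17);
    scores.insert(String::from("1000001"), 19);
    // Inserting with an existing key replaces the previous value.
    scores.insert(String::from("1000000"), 18);

    println!("{:?}", score_of(&scores, "1000000"));
    println!("{:?}", score_of(&scores, "9999999"));
    for (id, score) in scores.iter() {
        // The iteration order is not specified.
        println!("{} -> {}", id, score);
    }
}
```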
Sorting values
Ordered containers like vectors provide sorting facilities. We want to produce a listing of the students of the previous example sorted by age.
Since HashMap
is not an ordered container, as a first step we will create
a vector with references to the elements. Remember that the other options
are:
-
A vector with clones from the elements (maybe too heavy)
-
Transfer the values from the
HashMap
(leaving it empty) into the vector
Note that this explicit distinction is one of the best selling points for Rust as compared to C/C++.
Our sorted vector will be generated from a SchoolClass
method
named by_age()
:
use std::cmp::Ordering;
impl SchoolClass {
// ... same as before ...
fn age_sorter(age1: &&Student, age2: &&Student) -> Ordering {
if age1.age > age2.age {
Ordering::Less
} else if age1.age < age2.age {
Ordering::Greater
} else {
Ordering::Equal
}
}
fn by_age(&self) -> Vec<&Student> {
let mut drivers: Vec<&Student> = Vec::new();
for (_, rec) in self.people.iter() {
drivers.push(rec);
}
drivers.sort_by(Self::age_sorter);
drivers
}
}
fn main() {
// ... same as before ...
let students_by_age = school_class.by_age();
for student in students_by_age.iter() {
println!("By age -> {:?}", student);
}
}
Here we use the sort_by()
method which requires a comparator
function, here named age_sorter()
. The sort_by()
method calls
the comparator with references to the vector elements. Since
the elements have type &Student
, the calls will send
values of type &&Student
. This explains the age_sorter()
signature. Note that age_sorter()
returns Ordering::Less
when the first student is older, so the listing goes from the oldest to the youngest student.
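A three-branch comparator may be written more compactly with the cmp() method, which integer types already provide. The following sketch sorts plain ages in both directions; the ascending() and descending() names are assumptions made for this illustration.

```rust
use std::cmp::Ordering;

// Ascending comparator built on the cmp() method of the Ord trait.
fn ascending(a: &u8, b: &u8) -> Ordering {
    a.cmp(b)
}

// Descending comparator: the same comparison, reversed.
fn descending(a: &u8, b: &u8) -> Ordering {
    a.cmp(b).reverse()
}

fn main() {
    let mut ages: Vec<u8> = vec![21, 25, 31, 22];
    ages.sort_by(ascending);
    println!("ascending:  {:?}", ages);
    ages.sort_by(descending);
    println!("descending: {:?}", ages);
}
```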
Ordered types
There are many criteria for ordering a vector of complex data types as in the
previous example. But some data types have a "natural" or "preferred" ordering
which depends on the domain model. Let’s assume that the natural ordering for
the students is based on their age. This situation in Rust is described by
the implementation of the Ord
trait.
As shown in https://doc.rust-lang.org/std/cmp/trait.Ord.html, this trait
has Eq
and PartialOrd
as supertraits, so both traits must be
implemented. They in turn have PartialEq
as supertrait, so it must also
be implemented.
When the Ord
trait is implemented, then the sort()
method is available
as shown in the following listing. We removed the averaging methods for clarity:
use std::cmp::Ord;
use std::cmp::Ordering;
use std::collections::HashMap;
#[derive(Debug)]
pub struct Student {
id: String,
name: String,
age: u8,
sex: Sex,
score: u8,
}
#[derive(Debug)]
pub enum Sex {
MALE,
FEMALE,
}
pub struct SchoolClass {
#[allow(dead_code)]
name: String,
people: HashMap<String, Student>,
last_id: u64,
}
impl Ord for Student {
fn cmp(&self, other: &Self) -> Ordering {
self.age.cmp(&other.age)
}
}
impl Eq for Student {}
impl PartialOrd for Student {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}
impl PartialEq for Student {
fn eq(&self, other: &Self) -> bool {
self.age == other.age
}
}
impl SchoolClass {
fn new(name: &str) -> SchoolClass {
SchoolClass {
name: String::from(name),
people: HashMap::new(),
last_id: 1000000,
}
}
fn new_student(&mut self, name: &str, age: u8, sex: Sex, score: u8) {
let id = self.last_id.to_string();
let idmap = id.clone();
self.last_id = self.last_id + 1;
let s = Student {
id: id,
name: String::from(name),
age: age,
sex: sex,
score: score,
};
self.people.insert(idmap, s);
}
fn by_age(&self) -> Vec<&Student> {
let mut drivers: Vec<&Student> = self.people.values().collect();
drivers.sort();
drivers
}
}
fn main() {
let mut school_class = SchoolClass::new("Rust F1");
school_class.new_student("Mike Schumacher", 21, Sex::MALE, 17);
school_class.new_student("Max Verstappen", 25, Sex::MALE, 19);
school_class.new_student("Checo Perez", 31, Sex::MALE, 18);
school_class.new_student("Marcus Mazepin", 22, Sex::MALE, 9);
let students_by_age = school_class.by_age();
for student in students_by_age.iter() {
println!("By age -> {:?}", student);
}
}
Note also that we created the vector by collecting the HashMap
values,
which in turn were obtained from the values()
iterator.
Introducing closures
The sort_by()
method requires a comparator function, which previously
was implemented as a regular associated function named age_sorter
. This may be better served
with an anonymous function, which provides the "closure" functionality of some
programming languages. From https://doc.rust-lang.org/stable/book/ch13-01-closures.html:
"Rust’s closures are anonymous functions you can save in a variable or pass as arguments to other functions. You can create the closure in one place and then call the closure to evaluate it in a different context. Unlike functions, closures can capture values from the scope in which they’re defined."
The following version is an evolution from the sort_by()
call. Here we replaced
it with sort_unstable_by()
, which may be a little faster but does not preserve the relative order of equal elements:
use std::collections::HashMap;
#[derive(Debug)]
pub struct Student {
id: String,
name: String,
age: u8,
sex: Sex,
score: u8,
}
#[derive(Debug)]
pub enum Sex {
MALE,
FEMALE,
}
pub struct SchoolClass {
#[allow(dead_code)]
name: String,
people: HashMap<String, Student>,
last_id: u64,
}
impl SchoolClass {
fn new(name: &str) -> SchoolClass {
SchoolClass {
name: String::from(name),
people: HashMap::new(),
last_id: 1000000,
}
}
fn new_student(&mut self, name: &str, age: u8, sex: Sex, score: u8) {
let id = self.last_id.to_string();
let idmap = id.clone();
self.last_id = self.last_id + 1;
let s = Student {
id: id,
name: String::from(name),
age: age,
sex: sex,
score: score,
};
self.people.insert(idmap, s);
}
fn by_age(&self) -> Vec<&Student> {
let mut drivers: Vec<&Student> = self.people.values().collect();
drivers.sort_unstable_by(|a, b| a.age.cmp(&b.age));
drivers
}
}
fn main() {
let mut school_class = SchoolClass::new("Rust F1");
school_class.new_student("Mike Schumacher", 21, Sex::MALE, 17);
school_class.new_student("Max Verstappen", 25, Sex::MALE, 19);
school_class.new_student("Checo Perez", 31, Sex::MALE, 18);
school_class.new_student("Marcus Mazepin", 22, Sex::MALE, 9);
let students_by_age = school_class.by_age();
students_by_age
.iter()
.for_each(|s| println!("By age -> {:?}", s));
}
This example illustrates the anonymous function syntax:
|parameters…| body
. Also, the final display was done with the help of
the for_each()
function, which executes a closure
as many times as elements are provided by the iterator (as returned by the
iter()
method.)
We may further simplify the program by employing the
sort_by_key()
method, which requires a closure that provides Ord
-implementing
values to be used as sorting keys: for our students, their ages have type u8
and
this type already implements the Ord
trait
(see https://doc.rust-lang.org/std/primitive.u8.html#trait-implementations for
more information):
fn by_age(&self) -> Vec<&Student> {
let mut drivers: Vec<&Student> = self.people.values().collect();
drivers.sort_by_key(|st| st.age);
drivers
}
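The closures shown so far only use their own parameters. As the quoted definition says, a closure may also capture values from the scope where it is defined. A minimal sketch follows; the count_passing() function and its min_score parameter are assumptions made for this illustration.

```rust
// Counts the scores that reach the given threshold.
fn count_passing(scores: &Vec<u8>, min_score: u8) -> usize {
    // The closure captures `min_score` from the enclosing scope;
    // a plain `fn` item could not refer to it.
    let passes = |s: &u8| *s >= min_score;
    let mut count = 0;
    for s in scores.iter() {
        if passes(s) {
            count = count + 1;
        }
    }
    count
}

fn main() {
    let scores: Vec<u8> = vec![17, 19, 18, 9];
    println!("passing with 11: {}", count_passing(&scores, 11));
    println!("passing with 18: {}", count_passing(&scores, 18));
}
```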
Containing References
Imagine we have many Student
objects, all of them included in
the "total set", and some of them included in the
"top set". We may model this situation with the containers:
let mut vec_total: Vec<Student>;
let mut vec_top: Vec<Student>;
For this solution to work, we will need to duplicate each student belonging to both sets, which may be a waste of memory. Also, when updating a student, we will need to modify one or two objects, with the corresponding complexity of this tracking.
In some cases, a better solution may be to store references to the students:
let mut vec_total: Vec<&Student>;
let mut vec_top: Vec<&Student>;
This will work as long as the referenced objects are alive while the vectors are used. Let’s review an example:
#[derive(Debug)]
pub struct Student {
id: u32,
name: String,
age: u8,
score: u8,
}
impl Student {
fn new(id: u32, name: &str, age: u8, score: u8) -> Student {
Student {
id: id,
name: String::from(name),
age: age,
score: score,
}
}
}
fn pa(data: &Vec<&Student>) -> u8 {
let mut total: f64 = 0.0;
for st in data.iter() {
total = total + f64::from(st.score);
}
let n = data.len() as f64;
(total / n).round() as u8
}
fn ph(data: &Vec<&Student>) -> u8 {
let mut total: f64 = 0.0;
for st in data.iter() {
total = total + 1.0 / f64::from(st.score);
}
let n = data.len() as f64;
(n / total).round() as u8
}
/* fn ac(vec_total: &mut Vec<&Student>) {
let st = Student::new(1001, "Andres Calamaro", 50, 12);
vec_total.push(&st);
} */
fn main() {
let mut vec_total: Vec<&Student> = Vec::new();
let mut vec_top: Vec<&Student> = Vec::new();
let st = Student::new(1001, "Mike Schumacher", 21, 17);
vec_total.push(&st);
vec_top.push(&st);
let st = Student::new(1002, "Max Verstappen", 25, 19);
vec_total.push(&st);
vec_top.push(&st);
let st = Student::new(1003, "Checo Perez", 31, 18);
vec_total.push(&st);
vec_top.push(&st);
let st = Student::new(1004, "Marcus Mazepin", 22, 9);
vec_total.push(&st);
/* ac(&mut vec_total); */
println!("TOTAL PA={}", pa(&vec_total));
println!("TOTAL PH={}", ph(&vec_total));
println!("TOP PA={}", pa(&vec_top));
println!("TOP PH={}", ph(&vec_top));
}
The most important issue here is what happens when the commented lines are enabled:
40 |     vec_total.push(&st);
   |     ---------------^^^-
   |     |              |
   |     |              borrowed value does not live long enough
The st
object is dropped when the ac()
function returns, so &st
would become
a dangling reference, and Rust rejects the program.
A third way to implement this scenario is the usage of "reference counting": here we will transfer the ownership of each object into one or many "smart pointers" to be stored in the vectors as needed. When all the smart pointers referring to an object are destroyed, then the object is destroyed:
use std::rc::Rc;
#[derive(Debug)]
pub struct Student {
id: u32,
name: String,
age: u8,
score: u8,
}
impl Student {
fn new(id: u32, name: &str, age: u8, score: u8) -> Student {
Student {
id: id,
name: String::from(name),
age: age,
score: score,
}
}
}
fn pa(data: &Vec<Rc<Student>>) -> u8 {
let mut total: f64 = 0.0;
for st in data.iter() {
total = total + f64::from(st.score);
}
let n = data.len() as f64;
(total / n).round() as u8
}
fn ph(data: &Vec<Rc<Student>>) -> u8 {
let mut total: f64 = 0.0;
for st in data.iter() {
total = total + 1.0 / f64::from(st.score);
}
let n = data.len() as f64;
(n / total).round() as u8
}
fn ac(vec_total: &mut Vec<Rc<Student>>) {
let st = Rc::new(Student::new(1001, "Andres Calamaro", 50, 12));
vec_total.push(st);
}
fn main() {
let mut vec_total: Vec<Rc<Student>> = Vec::new();
let mut vec_top: Vec<Rc<Student>> = Vec::new();
let st_total: Rc<Student> = Rc::new(Student::new(1001, "Mike Schumacher", 21, 17));
let st_top: Rc<Student> = Rc::clone(&st_total);
vec_total.push(st_total);
vec_top.push(st_top);
let st_total: Rc<Student> = Rc::new(Student::new(1002, "Max Verstappen", 25, 19));
let st_top: Rc<Student> = Rc::clone(&st_total);
vec_total.push(st_total);
vec_top.push(st_top);
let st_total: Rc<Student> = Rc::new(Student::new(1003, "Checo Perez", 31, 18));
let st_top: Rc<Student> = Rc::clone(&st_total);
vec_total.push(st_total);
vec_top.push(st_top);
let st_total: Rc<Student> = Rc::new(Student::new(1004, "Marcus Mazepin", 22, 9));
vec_total.push(st_total);
ac(&mut vec_total);
println!("TOTAL PA={}", pa(&vec_total));
println!("TOTAL PH={}", ph(&vec_total));
println!("TOP PA={}", pa(&vec_top));
println!("TOP PH={}", ph(&vec_top));
}
Now the ac()
function works as expected, since the ownership of the new student is transferred into the
Rc
object, which is then moved into the vector.
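The reference counter may be inspected with Rc::strong_count(). The following sketch stores plain strings instead of Student values to keep it short; the owners() helper is an assumption made for this illustration.

```rust
use std::rc::Rc;

// Hypothetical helper: number of Rc pointers sharing the same object.
fn owners(item: &Rc<String>) -> usize {
    Rc::strong_count(item)
}

fn main() {
    let mut vec_total: Vec<Rc<String>> = Vec::new();
    let mut vec_top: Vec<Rc<String>> = Vec::new();

    let st_total = Rc::new(String::from("Max Verstappen"));
    let st_top = Rc::clone(&st_total); // same object, the counter goes to 2
    vec_total.push(st_total);
    vec_top.push(st_top);
    println!("owners: {}", owners(&vec_total[0]));

    // Dropping one of the vectors destroys one Rc, not the String.
    drop(vec_top);
    println!("owners: {}", owners(&vec_total[0]));
    println!("still alive: {}", vec_total[0]);
}
```

Rc::clone() copies only the pointer and increments the counter; the underlying object is not duplicated.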
Cells
A limitation with Rc
references is that the underlying objects are immutable. Following
our example, now we want to "improve" the scores lower than 14 using Rc
. For
this situation we want to "force" a mutable borrow from an immutable reference: this is
accomplished by the RefCell
type (and its partner Cell
.) Our vector will store
elements of type Rc<RefCell<Student>>
:
let mut vec_total: Vec<Rc<RefCell<Student>>>;
In order to simplify the code, we’ll create an alias for this type:
type V = Vec<Rc<RefCell<Student>>>;
Our new program follows:
use std::cell::RefCell;
use std::rc::Rc;
#[derive(Debug)]
pub struct Student {
id: u32,
name: String,
age: u8,
score: u8,
}
impl Student {
fn new(id: u32, name: &str, age: u8, score: u8) -> Student {
Student {
id: id,
name: String::from(name),
age: age,
score: score,
}
}
}
type V = Vec<Rc<RefCell<Student>>>;
fn improve(data: &V) {
for st in data.iter() {
let mut refer = st.borrow_mut();
if refer.score < 14 {
refer.score = refer.score + 1;
}
}
}
fn pa(data: &V) -> u8 {
let mut total: f64 = 0.0;
for st in data.iter() {
total = total + f64::from(st.borrow().score);
}
let n = data.len() as f64;
(total / n).round() as u8
}
fn ph(data: &V) -> u8 {
let mut total: f64 = 0.0;
for st in data.iter() {
total = total + 1.0 / f64::from(st.borrow().score);
}
let n = data.len() as f64;
(n / total).round() as u8
}
fn ac(vec_total: &mut V) {
let st = Rc::new(RefCell::new(Student::new(1001, "Andres Calamaro", 50, 12)));
vec_total.push(st);
}
fn main() {
let mut vec_total: V = Vec::new();
let mut vec_top: V = Vec::new();
let st_total = Rc::new(RefCell::new(Student::new(1001, "Mike Schumacher", 21, 17)));
let st_top = Rc::clone(&st_total);
vec_total.push(st_total);
vec_top.push(st_top);
let st_total = Rc::new(RefCell::new(Student::new(1002, "Max Verstappen", 25, 19)));
let st_top = Rc::clone(&st_total);
vec_total.push(st_total);
vec_top.push(st_top);
let st_total = Rc::new(RefCell::new(Student::new(1003, "Checo Perez", 31, 18)));
let st_top = Rc::clone(&st_total);
vec_total.push(st_total);
vec_top.push(st_top);
let st_total = Rc::new(RefCell::new(Student::new(1004, "Marcus Mazepin", 22, 9)));
vec_total.push(st_total);
ac(&mut vec_total);
improve(&vec_total);
println!("TOTAL PA={}", pa(&vec_total));
println!("TOTAL PH={}", ph(&vec_total));
println!("TOP PA={}", pa(&vec_top));
println!("TOP PH={}", ph(&vec_top));
}
The key line is in the improve()
function:
let mut refer = st.borrow_mut();
Here we are calling borrow_mut()
to obtain a mutable reference
(of RefMut
type). It allows the modification of the underlying
object as desired.
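With RefCell the borrowing rules are still enforced, but at run time: asking for a second mutable borrow while the first one is alive makes borrow_mut() panic. The following sketch shows this with a shared u8 score instead of a full Student; the bump() helper is an assumption made for this illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Adds one point to the shared score and returns the new value.
fn bump(score: &Rc<RefCell<u8>>) -> u8 {
    let mut refer = score.borrow_mut();
    *refer = *refer + 1;
    *refer
} // `refer` is dropped here, releasing the mutable borrow

fn main() {
    let total = Rc::new(RefCell::new(12u8));
    let top = Rc::clone(&total);
    println!("after bump: {}", bump(&total));
    println!("seen through the other Rc: {}", top.borrow());

    // While a mutable borrow is alive, a second one is rejected at
    // run time: borrow_mut() would panic, try_borrow_mut() returns Err.
    let guard = total.borrow_mut();
    println!("second borrow fails: {}", top.try_borrow_mut().is_err());
    drop(guard);
    println!("after drop it works: {}", top.try_borrow_mut().is_ok());
}
```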
A direct generalization is to allow the cell to contain a trait object. The
following partial listing shows the required changes in order to support
the type V = Vec<Rc<RefCell<dyn MayImprove>>>
:
use std::cell::RefCell;
use std::rc::Rc;
#[derive(Debug)]
pub struct Student {
id: u32,
name: String,
age: u8,
score: u8,
}
impl Student {
fn new(id: u32, name: &str, age: u8, score: u8) -> Student {
Student {
id: id,
name: String::from(name),
age: age,
score: score,
}
}
}
type V = Vec<Rc<RefCell<dyn MayImprove>>>;
trait MayImprove {
fn get_name(&self) -> &String;
fn get_score(&self) -> u8;
fn improve(&mut self);
}
impl MayImprove for Student {
fn get_name(&self) -> &String {
&self.name
}
fn get_score(&self) -> u8 {
self.score
}
fn improve(&mut self) {
if self.score < 14 {
self.score = self.score + 1;
}
}
}
fn improve(data: &V) {
for st in data.iter() {
let mut refer = st.borrow_mut();
refer.improve();
}
}
Storing function pointers
A struct may store "function pointers" as a way to implement dynamic dispatching. For example, the following program implements a "machine" in charge of the calculation of the class average, where the scores are dynamically added, and the averages are updated in turn:
struct AveragingMachine {
accumulated: f64,
counter: u16,
}
impl AveragingMachine {
fn add_score(&mut self, score: u8) {
self.counter = self.counter + 1;
self.accumulated = self.accumulated + score as f64;
}
fn average(&self) -> f64 {
self.accumulated / self.counter as f64
}
fn new() -> Self {
Self {
accumulated: 0.0,
counter: 0,
}
}
}
fn main() {
let mut machine = AveragingMachine::new();
machine.add_score(12);
machine.add_score(16);
machine.add_score(8);
println!("Current average: {}", machine.average());
machine.add_score(13);
machine.add_score(12);
println!("New average: {}", machine.average());
}
Let’s leverage and generalize this "machine" in order to process harmonic averages. Recalling the introduction of this text, the harmonic average may be defined as an arithmetic average of the inverse of the quantities, but with an additional inversion at the end:
PH = 1 / [ (1/a + 1/b + ...)/n ] = 1 / PA{1/x}
So, in order to calculate the harmonic average by leveraging the arithmetic average algorithm, we may consider the "pre-processing" of the elements to be accumulated; in terms of an anonymous function:
|x| 1.0 / x
Also, the answer needs a final "post-processing", which may be expressed with exactly the same previous anonymous function.
Note that the arithmetic average may be calculated as always by providing identity functions:
struct AveragingMachine {
accumulated: f64,
counter: u16,
pre_adder: fn(u8) -> f64,
post_answer: fn(f64) -> f64,
}
impl AveragingMachine {
fn add_score(&mut self, score: u8) {
self.counter = self.counter + 1;
self.accumulated = self.accumulated + (self.pre_adder)(score);
}
fn average(&self) -> f64 {
(self.post_answer)(self.accumulated / self.counter as f64)
}
fn new(pre_adder: fn(u8) -> f64, post_answer: fn(f64) -> f64) -> Self {
Self {
accumulated: 0.0,
counter: 0,
pre_adder: pre_adder,
post_answer: post_answer,
}
}
}
fn main() {
let mut machine = AveragingMachine::new(|x| x as f64, |x| x);
machine.add_score(12);
machine.add_score(16);
machine.add_score(8);
println!("Current average: {}", machine.average());
machine.add_score(13);
machine.add_score(12);
println!("New average: {}", machine.average());
}
Now, the harmonic average is calculated by modifying the anonymous functions:
// same as above
fn main() {
let mut machine_harmonic = AveragingMachine::new(|x| 1.0 / x as f64, |x| 1.0 / x);
machine_harmonic.add_score(12);
machine_harmonic.add_score(16);
machine_harmonic.add_score(8);
println!("Current average: {}", machine_harmonic.average());
machine_harmonic.add_score(13);
machine_harmonic.add_score(12);
println!("New average: {}", machine_harmonic.average());
}
Finally, the quadratic average, better known as "root mean square", is defined by:
RMS = SQRT[ (a^2 + b^2 + ...)/n ]
Again, only the anonymous functions need to be updated:
// same as above
fn main() {
let mut machine_rms = AveragingMachine::new(|x| (x as f64) * (x as f64), |x| f64::sqrt(x));
machine_rms.add_score(12);
machine_rms.add_score(16);
machine_rms.add_score(8);
println!("Current average: {}", machine_rms.average());
machine_rms.add_score(13);
machine_rms.add_score(12);
println!("New average: {}", machine_rms.average());
}
In the previous examples we avoided calling the anonymous functions "closures", since they do not "close" over any outside variable. When a real closure is needed, a plain fn pointer type no longer works and the traits Fn, FnMut and FnOnce are employed. In the following example both closures capture the variable n:
struct AveragingMachine<T, S>
where
T: Fn(u8) -> f64,
S: Fn(f64) -> f64,
{
accumulated: f64,
counter: u16,
pre_adder: T,
post_answer: S,
}
impl<T, S> AveragingMachine<T, S>
where
T: Fn(u8) -> f64,
S: Fn(f64) -> f64,
{
fn add_score(&mut self, score: u8) {
self.counter = self.counter + 1;
self.accumulated = self.accumulated + (self.pre_adder)(score);
}
fn average(&self) -> f64 {
(self.post_answer)(self.accumulated / self.counter as f64)
}
fn new(pre_adder: T, post_answer: S) -> Self {
Self {
accumulated: 0.0,
counter: 0,
pre_adder: pre_adder,
post_answer: post_answer,
}
}
}
fn main() {
let n = 2;
let mut machine_pow =
AveragingMachine::new(|x| (x as f64).powf(n as f64), |x| x.powf(1.0 / n as f64));
machine_pow.add_score(12);
machine_pow.add_score(16);
machine_pow.add_score(8);
println!("Current average: {}", machine_pow.average());
machine_pow.add_score(13);
machine_pow.add_score(12);
println!("New average: {}", machine_pow.average());
}
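The example above only needs Fn. As a rough illustration of how the three closure traits differ, here is a minimal sketch; the helper names call_twice, feed_all, call_once and average_with_fn_mut are made up for this illustration and are not part of the previous programs:

```rust
/// Applies `f` twice; `Fn` closures only read their captured variables.
fn call_twice<F: Fn(f64) -> f64>(f: F, x: f64) -> f64 {
    f(f(x))
}

/// Feeds every score to `f`; `FnMut` closures may update captured state.
fn feed_all<F: FnMut(u8)>(mut f: F, scores: &[u8]) {
    for &s in scores {
        f(s);
    }
}

/// Calls `f` exactly once; `FnOnce` closures may give away captured values.
fn call_once<F: FnOnce() -> Vec<u8>>(f: F) -> Vec<u8> {
    f()
}

/// Arithmetic average computed through an `FnMut` accumulator closure.
fn average_with_fn_mut(scores: &[u8]) -> f64 {
    let mut total = 0.0;
    let mut count = 0u32;
    feed_all(
        |s| {
            // Both captured variables are modified: this closure is FnMut.
            total += f64::from(s);
            count += 1;
        },
        scores,
    );
    total / f64::from(count)
}

fn main() {
    let n = 2.0;
    // Captures `n` by reference and only reads it: implements Fn.
    let power = |x: f64| x.powf(n);
    println!("call_twice: {}", call_twice(power, 3.0));

    println!("average: {}", average_with_fn_mut(&[12, 16, 8]));

    let scores = vec![12u8, 16, 8];
    // Returning the captured vector consumes it: this closure is only FnOnce.
    let give_back = move || scores;
    println!("call_once: {:?}", call_once(give_back));
}
```

Every Fn closure is also usable where FnMut or FnOnce is expected, so asking for the least demanding trait (FnOnce when one call is enough) keeps a function as flexible as possible for its callers.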
Firing threads
Plenty of information about the usefulness of threads is available elsewhere. The Rust Programming Language provides details about the threading tradeoffs faced by the Rust creators: https://doc.rust-lang.org/book/ch16-01-threads.html and some basic examples may be found here: https://doc.rust-lang.org/rust-by-example/std_misc/threads.html .
Threads are started by passing a no-arguments closure to the std::thread::spawn() function. This function returns a "join handle", which is used to wait for the thread termination and also (if needed) to get the thread’s return value, which is the value returned by the passed closure.
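As a minimal sketch of the spawn/join cycle just described (the functions average and average_in_thread are illustrative names, and the scores are hard-coded inside the closure so that nothing needs to be captured):

```rust
use std::thread;

/// Arithmetic average of a slice of scores.
fn average(scores: &[u8]) -> f64 {
    let total: f64 = scores.iter().map(|&s| f64::from(s)).sum();
    total / scores.len() as f64
}

/// Computes an average in a separate thread and waits for the result.
fn average_in_thread() -> f64 {
    // The closure takes no arguments and captures nothing.
    let handle: thread::JoinHandle<f64> = thread::spawn(|| {
        let scores = [12u8, 16, 8];
        average(&scores)
    });
    // join() blocks until the thread ends and yields the closure's return
    // value; it returns Err if the thread panicked.
    handle.join().expect("the worker thread panicked")
}

fn main() {
    println!("Average from thread: {}", average_in_thread());
}
```

The type parameter of JoinHandle<T> is the closure's return type, here f64.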
Following our theme, we will try to calculate averages for students. Now we have several school classes, each with some number of students, and we want the total averages for the complete school. The averages will be calculated by combining the "weighted" averages for each class. The following program uses the random student data generator from previous examples, and calculates the total averages without employing threads:
use rand::{thread_rng, Rng};
#[derive(Debug)]
pub struct SchoolScore {
driver: String,
score: i32,
}
fn build_record<R: Rng>(rng: &mut R, fnames: &[&str], lnames: &[&str]) -> SchoolScore {
let idx_fname = rng.gen_range(0, fnames.len());
let idx_lname = rng.gen_range(0, lnames.len());
let mut driver = String::new();
driver.push_str(fnames[idx_fname]);
driver.push_str(" ");
driver.push_str(lnames[idx_lname]);
let score = rng.gen_range(1, 21);
SchoolScore { driver, score }
}
fn pa(records: &[SchoolScore]) -> u8 {
let mut total: f64 = 0.0;
for rec in records.iter() {
total = total + f64::from(rec.score);
}
let n = records.len() as f64;
(total / n).round() as u8
}
fn ph(records: &[SchoolScore]) -> u8 {
let mut total: f64 = 0.0;
for rec in records.iter() {
total = total + 1.0 / f64::from(rec.score);
}
let n = records.len() as f64;
(n / total).round() as u8
}
fn main() {
let fnames = [
"Mike",
"Niki",
"Ayrton",
"Max",
"Lewis",
"Kimi",
"Sebastian",
"Sergio",
"Jenson",
"Daniel",
];
let lnames = [
"Schumacher",
"Lauda",
"Senna",
"Verstappen",
"Hamilton",
"Raikkonen",
"Vettel",
"Perez",
"Button",
"Ricciardo",
"Russel",
];
let nclasses = 8;
let mut rndgen = thread_rng();
let mut records: Vec<Vec<SchoolScore>> = Vec::new();
for _ in 0..nclasses {
let nitems = rndgen.gen_range(5, 12);
let mut claz: Vec<SchoolScore> = Vec::new();
for _ in 0..nitems {
let rec = build_record(&mut rndgen, &fnames, &lnames);
claz.push(rec);
}
records.push(claz);
}
// find the averages
let mut results_vector: Vec<(usize, u8, u8)> = Vec::new();
for subvec in records.iter() {
let result = (subvec.len(), pa(subvec), ph(subvec));
results_vector.push(result);
}
let mut pa_sum: f64 = 0.0;
let mut ph_sum: f64 = 0.0;
let mut nsum: f64 = 0.0;
for (n, pa_val, ph_val) in results_vector {
let nf: f64 = f64::from(n as u32);
pa_sum = pa_sum + f64::from(pa_val) * nf;
ph_sum = ph_sum + nf / f64::from(ph_val);
nsum = nsum + nf;
}
println!("TOTAL PA={}, PH={}", pa_sum / nsum, nsum / ph_sum);
}
Now we want to run each class' average calculation in an independent thread, and later combine those averages in the main thread to get the final answer. A first attempt may include the following code. Note that the thread results will be collected through a vector of thread::JoinHandle<T> values:
// find the averages
let mut join_vector: Vec<thread::JoinHandle<(usize, u8, u8)>> = Vec::new();
for subvec in records.iter() {
let jh = thread::spawn(|| (subvec.len(), pa(subvec), ph(subvec)));
join_vector.push(jh);
}
But this does not compile; here we show an extract from the compiler error messages:
error[E0597]: `records` does not live long enough
  --> src/bin/p0905.rs:79:19
   |
79 |     for subvec in records.iter() {
   |                   ^^^^^^^-------
   |                   |
   |                   borrowed value does not live long enough
   |                   argument requires that `records` is borrowed for `'static`
...
94 | }
   | - `records` dropped here while still borrowed
The problem is the variable records, which is dropped when the main function terminates: the threads use references to it (obtained from records.iter()), and the compiler cannot prove that no thread will use them after main finishes. Rust alerts us that records "does not live long enough" and mentions a 'static lifetime, which is not useful advice here.
Moving the vector
Now we have a solution which does compile (we show only the final section):
use std::thread;
...
// find the averages
let mut join_vector = Vec::new();
for subvec in records {
let jh = thread::spawn(move || (subvec.len(), pa(&subvec), ph(&subvec)));
join_vector.push(jh);
}
let mut pa_sum: f64 = 0.0;
let mut ph_sum: f64 = 0.0;
let mut nsum: f64 = 0.0;
for child in join_vector {
let (n, pa_val, ph_val) = child.join().unwrap();
let nf: f64 = f64::from(n as u32);
pa_sum = pa_sum + f64::from(pa_val) * nf;
ph_sum = ph_sum + nf / f64::from(ph_val);
nsum = nsum + nf;
}
println!("TOTAL PA={}, PH={}", pa_sum / nsum, nsum / ph_sum);
}
Here we applied two concepts. First, the iteration did not use in records.iter() but in records.into_iter(), which in Rust may be abbreviated simply as in records. This means that the records vector is moved into the iteration and each subvector is handed over by value, so no thread depends on something that is dropped at the end of main():
for subvec in records { ... }
The second concept is new: the closure is now a "move" closure, distinguished by the move keyword before the parameter list. In this case the captured variables are forcibly "moved" into the closure, so the "subvectors" cannot be accidentally dropped before the closure finishes its execution. Note that subvec is a vector object (not a reference to it), so we pass &subvec to pa() and ph():
move || (subvec.len(), pa(&subvec), ph(&subvec))
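The two concepts can be tried in isolation with the following self-contained sketch. It replaces the random SchoolScore records with plain Vec<i32> classes, and the names class_average and school_average are illustrative only:

```rust
use std::thread;

/// Arithmetic average of one class.
fn class_average(scores: &[i32]) -> f64 {
    let total: f64 = scores.iter().map(|&s| f64::from(s)).sum();
    total / scores.len() as f64
}

/// Spawns one thread per class and combines the weighted averages.
fn school_average(classes: Vec<Vec<i32>>) -> f64 {
    let mut handles = Vec::new();
    // `in classes` means `in classes.into_iter()`: each `class` is an owned Vec.
    for class in classes {
        // `move` transfers ownership of `class` into the thread's closure.
        handles.push(thread::spawn(move || (class.len(), class_average(&class))));
    }
    // `classes` was consumed by the loop and cannot be used from here on.
    let mut weighted_sum = 0.0;
    let mut total = 0.0;
    for handle in handles {
        let (n, avg) = handle.join().unwrap();
        weighted_sum += avg * n as f64;
        total += n as f64;
    }
    weighted_sum / total
}

fn main() {
    let classes = vec![vec![10, 20], vec![12, 14, 16]];
    println!("School average: {}", school_average(classes));
}
```

Removing the move keyword from this sketch should bring back a lifetime error similar to the one of the first attempt, since the closure would then only borrow class.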
Reference counting
As shown above, a "move closure" takes ownership of the vector. Sometimes this is undesirable (for example, the vector may be needed for further processing). Since plain references don’t work because of unfulfilled lifetimes, reference counting may provide the solution. We have already seen the Rc type, which is intended for single-threaded code; for multithreaded reference counting Rust provides the Arc type. The following solution employs Arc references:
use std::sync::Arc;
use std::thread;
...
// find the averages
let mut join_vector = Vec::new();
let wrecords: Arc<_> = Arc::new(records);
for i in 0..wrecords.len() {
let records_clone = Arc::clone(&wrecords);
let jh = thread::spawn(move || {
let subvec = records_clone.get(i).unwrap();
(subvec.len(), pa(subvec), ph(subvec))
});
join_vector.push(jh);
}
Now the records_clone variable is moved into the closure, and the internal reference counter ensures that the vector is dropped only when its last owner is gone, which avoids the lifetime problem at the end of main().
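A reduced, self-contained version of the same idea follows, again with plain Vec<i32> classes instead of SchoolScore records (class_totals is an illustrative name). It also shows that the data remains reachable from the spawning thread once the workers are done:

```rust
use std::sync::Arc;
use std::thread;

/// Sums each class in its own thread; the data is shared through an Arc.
/// Returns the per-class totals and the number of classes, the latter read
/// from the shared vector after all the threads have finished.
fn class_totals(classes: Vec<Vec<i32>>) -> (Vec<i32>, usize) {
    let shared = Arc::new(classes);
    let mut handles = Vec::new();
    for i in 0..shared.len() {
        // Cloning the Arc only increments the reference counter;
        // the vectors themselves are not copied.
        let local = Arc::clone(&shared);
        handles.push(thread::spawn(move || local[i].iter().sum::<i32>()));
    }
    let totals: Vec<i32> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    // Unlike the "move the vector" solution, the data is still available here.
    (totals, shared.len())
}

fn main() {
    let (totals, n) = class_totals(vec![vec![1, 2, 3], vec![10, 20]]);
    println!("Totals {:?} from {} classes", totals, n);
}
```

Arc only gives shared, read-only access to its contents; modifying the shared data requires an additional synchronization type.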
Mutable data: Mutex
Our next goal is to allow the threads to "promote" the students (increase the low scores) in the vectors before calculating the averages. This calls for exclusive access to the vector through mutable references. The usual solution involves the combination Arc<Mutex<T>>:
use std::sync::Arc;
use std::sync::Mutex;
use std::thread;
fn promote(records: &mut [SchoolScore]) {
for rec in records.iter_mut() {
if rec.score < 14 {
rec.score = 14;
}
}
}
...
// find the averages
let mut join_vector = Vec::new();
let rlen = records.len();
let wrecords: Arc<Mutex<_>> = Arc::new(Mutex::new(records));
for i in 0..rlen {
let wrec_clone = Arc::clone(&wrecords);
let jh = thread::spawn(move || {
let mut vec = wrec_clone.lock().unwrap();
let subvec = vec.get_mut(i).unwrap();
promote(subvec);
(subvec.len(), pa(subvec), ph(subvec))
});
join_vector.push(jh);
}
The promote()
function does require a mutable
slice reference in order to modify the scores. The
key lines are:
let mut vec = wrec_clone.lock().unwrap();
let subvec = vec.get_mut(i).unwrap();
The lock() method ensures that only one thread at a time executes this section. From its documentation:
"This function will block the local thread until it is available to acquire the mutex. Upon returning, the thread is the only thread with the lock held."
Then we get a mutable reference to the i-th school class vector, suitable for calling promote(). The lock is released when the vec guard goes out of scope at the end of the closure.
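The following self-contained sketch reproduces the pattern with plain Vec<i32> classes. The floor parameter and the promote_all function are additions made for this illustration; the original promote() uses a fixed limit of 14:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Raises every score below `floor` up to `floor`.
fn promote(scores: &mut [i32], floor: i32) {
    for s in scores.iter_mut() {
        if *s < floor {
            *s = floor;
        }
    }
}

/// Promotes every class in its own thread and returns the modified data.
fn promote_all(classes: Vec<Vec<i32>>, floor: i32) -> Vec<Vec<i32>> {
    let n = classes.len();
    let shared = Arc::new(Mutex::new(classes));
    let mut handles = Vec::new();
    for i in 0..n {
        let local = Arc::clone(&shared);
        handles.push(thread::spawn(move || {
            // The guard grants exclusive access to the whole vector;
            // the lock is released when `guard` is dropped.
            let mut guard = local.lock().unwrap();
            promote(&mut guard[i], floor);
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    // Take the data back out of the mutex without cloning it.
    let result = std::mem::take(&mut *shared.lock().unwrap());
    result
}

fn main() {
    let classes = vec![vec![10, 15], vec![3, 20, 14]];
    println!("{:?}", promote_all(classes, 14));
}
```

Since a single mutex protects the whole vector of classes, the threads effectively do their work one after another. Wrapping each class in its own Mutex would be one way to allow more parallelism, at the cost of a slightly more involved type.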
Using "mpsc" channels
The mpsc (multi-producer, single-consumer) module provides a mechanism for message (object) exchange between threads. The channel() function creates a channel where multiple threads may asynchronously send messages (Sender type), while a single receiver (Receiver type) collects those messages. There is an internal unbounded buffer for the messages pending to be read. The following example is an evolution of the first working example of this section; here we collect the messages from a channel, while the previous version used the threads' "return values":
use rand::{thread_rng, Rng};
use std::sync::mpsc::channel;
use std::thread;
...
// find the averages
let (tx, rx) = channel();
for subvec in records {
let tx = tx.clone();
thread::spawn(move || {
let ans = (subvec.len(), pa(&subvec), ph(&subvec));
tx.send(ans).unwrap();
});
}
let mut pa_sum: f64 = 0.0;
let mut ph_sum: f64 = 0.0;
let mut nsum: f64 = 0.0;
for _ in 0..nclasses {
let (n, pa_val, ph_val) = rx.recv().unwrap();
let nf: f64 = f64::from(n as u32);
pa_sum = pa_sum + f64::from(pa_val) * nf;
ph_sum = ph_sum + nf / f64::from(ph_val);
nsum = nsum + nf;
}
println!("TOTAL PA={}, PH={}", pa_sum / nsum, nsum / ph_sum);
}
As shown, the "transmitter" part of the channel is cloned as needed, while the receiver is a single instance. While the transmission is asynchronous, the reception blocks in the recv() method. There are also try_recv() for non-blocking reception attempts, and recv_timeout() for timeout control.
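A small self-contained sketch of these reception variants follows (collect_totals is an illustrative name, and the classes are plain Vec<i32> values). Instead of counting the expected messages, it drops the original sender and iterates over the receiver until every sender is gone:

```rust
use std::sync::mpsc::{channel, TryRecvError};
use std::thread;
use std::time::Duration;

/// Sums each class in its own thread and collects the totals from a channel.
fn collect_totals(classes: Vec<Vec<i32>>) -> Vec<i32> {
    let (tx, rx) = channel();
    for class in classes {
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(class.iter().sum::<i32>()).unwrap();
        });
    }
    // Drop the original sender: iteration over `rx` ends when all the
    // clones held by the threads have been dropped too.
    drop(tx);
    let mut totals: Vec<i32> = rx.iter().collect();
    totals.sort(); // the arrival order is not deterministic
    totals
}

fn main() {
    println!("{:?}", collect_totals(vec![vec![1, 2], vec![10, 20, 30]]));

    let (tx, rx) = channel::<i32>();
    // try_recv() never blocks: with no pending message it reports Empty.
    assert_eq!(rx.try_recv(), Err(TryRecvError::Empty));
    // recv_timeout() blocks for at most the given duration.
    assert!(rx.recv_timeout(Duration::from_millis(10)).is_err());
    tx.send(7).unwrap();
    assert_eq!(rx.try_recv(), Ok(7));
}
```

Iterating over the receiver avoids keeping the number of expected messages (nclasses in the program above) in sync with the number of spawned threads.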
More Examples
Some "interesting" examples to review the concepts.
Calculating PI by brute force
Consider the set of "increasing" odd numbers with alternating signs
A={1, -3, 5, -7, 9, -11, …}
and its number of elements n(A). Since PH(A) = n(A)/[1/1 - 1/3 + 1/5 - ...], the quotient n(A)/PH(A) is the sum of the inverses of the elements. Then
we have an approximation for PI as 4*n(A)/PH(A)
which improves as the
number of terms goes to infinity.
This is one of the simplest (but rather slow) methods for the calculation of the mathematical constant PI, also known as the Leibniz formula:
PI = 4 * (1 - 1/3 + 1/5 - 1/7 + 1/9 ...)
It’s very easy to write a program to implement the Leibniz formula; in this section we are interested in the time required to achieve a high precision PI approximation with it.
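As a minimal sketch, the partial sums may be computed for a fixed number of terms, both directly and through the harmonic average of the set A (the function names leibniz_pi and leibniz_pi_harmonic are illustrative):

```rust
/// Four times the sum of the first `terms` terms of 1 - 1/3 + 1/5 - ...
fn leibniz_pi(terms: u64) -> f64 {
    let mut sum = 0.0;
    let mut sign = 1.0;
    for k in 0..terms {
        sum += sign / (2 * k + 1) as f64;
        sign = -sign;
    }
    4.0 * sum
}

/// The same approximation written as 4*n(A)/PH(A), where PH(A) is the
/// harmonic average of the first `terms` elements of A = {1, -3, 5, ...}.
fn leibniz_pi_harmonic(terms: u64) -> f64 {
    let n = terms as f64;
    let mut inverse_sum = 0.0;
    for k in 0..terms {
        let odd = (2 * k + 1) as f64;
        let a = if k % 2 == 0 { odd } else { -odd };
        inverse_sum += 1.0 / a;
    }
    let ph = n / inverse_sum;
    4.0 * n / ph
}

fn main() {
    for &terms in &[10u64, 1_000, 100_000] {
        let approx = leibniz_pi(terms);
        let error = (approx - std::f64::consts::PI).abs();
        println!("{} terms: {} (error {:e})", terms, approx, error);
    }
    println!("via harmonic average: {}", leibniz_pi_harmonic(1_000));
}
```

The error shrinks roughly in proportion to the inverse of the number of terms, which explains the very large step counts reported below.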
Basic solution
The following implementation shows the approximate number of steps required by the Leibniz formula to approximate PI/4 with an error below 10^-9:
fn main() {
let mut n: u64 = 0;
let mut den: f64 = 1.0;
let mut sum: f64 = 0.0;
let pi4 = std::f64::consts::PI / 4.0;
loop {
sum = sum + 1.0 / den;
n = n + 1;
if den > 0.0 {
den = -(den + 2.0);
} else {
den = -(den - 2.0);
}
if f64::abs(pi4 - sum) < 1.0e-9 {
break;
}
}
println!("Steps: {}", n);
}
The unoptimized result:
diego@dataone:~/devel/RUST/xtut$ cargo build
   Compiling xtut v0.1.0 (/home/diego/devel/RUST/xtut)
    Finished dev [unoptimized + debuginfo] target(s) in 19.31s
diego@dataone:~/devel/RUST/xtut$ time target/debug/p5010
Steps: 250036592

real    0m3.102s
user    0m3.022s
sys     0m0.021s
Now we try with a "release" version, and the time is reduced by about 37%:
diego@dataone:~/devel/RUST/xtut$ cargo build --release
   Compiling xtut v0.1.0 (/home/diego/devel/RUST/xtut)
    Finished release [optimized] target(s) in 32.47s
diego@dataone:~/devel/RUST/xtut$ time target/release/p5010
Steps: 249963605

real    0m1.955s
user    0m1.954s
sys     0m0.000s
Native CPU model
Now we’ll build a version specific for the CPU where the Rust compiler is running. The resulting executable may not work on a different CPU model:
diego@dataone:~/devel/RUST/xtut$ RUSTFLAGS="-C target-cpu=native" cargo build --release
   Compiling xtut v0.1.0 (/home/diego/devel/RUST/xtut)
    Finished release [optimized] target(s) in 35.06s
diego@dataone:~/devel/RUST/xtut$ time target/release/p5010
Steps: 250036592

real    0m1.046s
user    0m1.039s
sys     0m0.001s
This time the run time was reduced to about half. Please note that the time reduction depends strongly on the CPU model.
Using SIMD extensions
The following version employs the module std::arch::x86_64 (also available as core::arch::x86_64), which provides access to Intel/AMD intrinsics for 64-bit CPUs of the x86 family. We’ll try to leverage the potential parallelism in the Leibniz formula by computing four terms at the same time: this is achieved by the SIMD operations on the 256-bit YMM* registers introduced with the AVX (Advanced Vector Extensions) extension around 2011.
Note
|
Please see https://rust-lang.github.io/packed_simd/perf-guide/target-feature/rustflags.html for more information regarding SIMD optimizations in Rust. |
For example, the _mm256_div_pd() function will use the corresponding Intel intrinsic to simultaneously divide the four double precision values stored in one 256-bit register by the corresponding values stored in another:
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
#[repr(align(32))]
struct Buf {
darray: [f64; 4],
}
fn main() {
let mut n: u64 = 0;
let mut sum: f64 = 0.0;
let pi4 = std::f64::consts::PI / 4.0;
unsafe {
let mut dens = _mm256_set_pd(1.0f64, -3.0f64, 5.0f64, -7.0f64);
let adder = _mm256_set_pd(8.0f64, -8.0f64, 8.0f64, -8.0f64);
let ones = _mm256_set1_pd(1.0f64);
let mut buf = Buf { darray: [0.0; 4] };
let data_ptr: *mut f64 = &mut buf.darray[0] as *mut f64;
loop {
let quotients = _mm256_div_pd(ones, dens);
_mm256_store_pd(data_ptr, quotients);
sum = sum + buf.darray[0] + buf.darray[1] + buf.darray[2] + buf.darray[3];
dens = _mm256_add_pd(dens, adder);
n = n + 1;
if f64::abs(pi4 - sum) < 1.0e-9 {
break;
}
}
}
println!("Steps: {}", 4 * n);
}
To instruct the compiler to employ the AVX optimizations, we provide the "avx feature":
diego@dataone:~/devel/RUST/xtut$ RUSTFLAGS="-C target-feature=+avx" cargo build --release
    Finished release [optimized] target(s) in 0.02s
diego@dataone:~/devel/RUST/xtut$ time target/release/p5020
Steps: 250036408

real    0m0.415s
user    0m0.410s
sys     0m0.005s
The time improved dramatically as compared to the previous version. Note that we collect the added terms in an array buffer:
_mm256_store_pd(data_ptr, quotients);
sum = sum + buf.darray[0] + buf.darray[1] + buf.darray[2] + buf.darray[3];
This example illustrates the usage of a "raw pointer", pointing
to the first element of the darray
array:
let mut buf = Buf { darray: [0.0; 4] };
let data_ptr: *mut f64 = &mut buf.darray[0] as *mut f64;
The array is defined as a member of the Buf structure. This was done in order to satisfy the 32-byte memory alignment requirement of the _mm256_store_pd() intrinsic:
#[repr(align(32))]
struct Buf {
darray: [f64; 4],
}
The following version (hopefully) avoids the usage of the memory buffer, which gives a further time improvement:
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
fn main() {
let mut n: u64 = 0;
let pi4 = std::f64::consts::PI / 4.0;
unsafe {
let mut dens = _mm256_set_pd(1.0f64, -3.0f64, 5.0f64, -7.0f64);
let adder = _mm256_set_pd(8.0f64, -8.0f64, 8.0f64, -8.0f64);
let ones = _mm256_set1_pd(1.0f64);
let mut rsum = _mm256_set1_pd(0.0f64);
let mut quotients: __m256d;
loop {
quotients = _mm256_div_pd(ones, dens);
rsum = _mm256_add_pd(rsum, quotients);
dens = _mm256_add_pd(dens, adder);
n = n + 1;
// https://stackoverflow.com/a/49943540/4876728
// let vlow = _mm256_castpd256_pd128(quotients);
let vlow = _mm256_extractf128_pd(rsum, 0);
let vhigh = _mm256_extractf128_pd(rsum, 1);
let add_partial = _mm_add_pd(vlow, vhigh);
let sum = _mm_cvtsd_f64(add_partial)
+ _mm_cvtsd_f64(_mm_unpackhi_pd(add_partial, add_partial));
if f64::abs(pi4 - sum) < 1.0e-9 {
break;
}
}
}
println!("Steps: {}", 4 * n);
}
A test:
Steps: 249807812

real    0m0.357s
user    0m0.353s
sys     0m0.004s
Note
|
In order to see the generated assembly, the cargo rustc command may be used (for example, by passing the --emit asm option to rustc). The assembly listing may be found in the target/release/deps/p5030-<hash>.s file.
|
A corresponding C version follows:
#include <immintrin.h>
#include <math.h>
#include <stdio.h>
int main() {
long n = 0;
double pi4 = M_PI / 4.0;
__m256d dens = _mm256_set_pd(1.0, -3.0, 5.0, -7.0);
__m256d adder = _mm256_set_pd(8.0, -8.0, 8.0, -8.0);
__m256d ones = _mm256_set1_pd(1.0);
__m256d rsum = _mm256_set1_pd(0.0);
__m256d quotients;
for(;;) {
quotients = _mm256_div_pd(ones, dens);
rsum = _mm256_add_pd(rsum, quotients);
dens = _mm256_add_pd(dens, adder);
n = n + 1;
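// Written as `n & 0xf != 0` this would parse as `n & (0xf != 0)`, that
// is `n & 1`: the convergence test below runs only when n is even.
if((n & 1) != 0) {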
continue;
}
// https://stackoverflow.com/a/49943540/4876728
// let vlow = _mm256_castpd256_pd128(quotients);
__m128d vlow = _mm256_extractf128_pd(rsum, 0);
__m128d vhigh = _mm256_extractf128_pd(rsum, 1);
__m128d add_partial = _mm_add_pd(vlow, vhigh);
double sum = _mm_cvtsd_f64(add_partial)
+ _mm_cvtsd_f64(_mm_unpackhi_pd(add_partial, add_partial));
if(fabs(pi4 - sum) < 1.0e-9) {
break;
}
}
printf("Steps: %ld\n", 4 * n);
}
Which is executed achieving similar times:
diego@dataone:~/devel/RUST/xtut$ gcc -O3 -mavx -o /tmp/p5030-i src/bin/p5030-i.c
diego@dataone:~/devel/RUST/xtut$ time /tmp/p5030-i
Steps: 249807816

real    0m0.355s
user    0m0.354s
sys     0m0.001s
Deterministic Password Generator
This program illustrates a strategy which may be employed in the construction of a password manager. A user has several website accounts and is tired of remembering their passwords. A simple solution is to take note of them in a text file:
- name@hotmail.com → password 1
- name@yahoo.com → password 2
- name@latinmail.com → password 3
- name@ebay.com → password 4
But now this file is a single point of risk. One strategy is to store an encrypted version of the passwords, with the encryption key derived from a single "master password", which is the only piece of the scheme that must be remembered:
- name@hotmail.com → ENC(password 1)
- name@yahoo.com → ENC(password 2)
- name@latinmail.com → ENC(password 3)
- name@ebay.com → ENC(password 4)
This method relies on the quality of the encryption algorithm. An attacker with a copy of this file may be able to try a brute force attack using all the included information.
Another strategy is the derivation of the passwords from the account names and the already mentioned "master password", so the file no longer needs to contain passwords nor their encrypted versions: the passwords are always recalculated when needed. A simple method for this derivation is the usage of a hashing function, like:
Derived-password = EXTRAS(SHA512("account id" + "master password"))
Here SHA512 refers to the standard SHA-512 hashing algorithm, and EXTRAS() is a custom function we will employ to derive a textual representation which satisfies some password policy requirements.
Password policy
The following function will be employed below; it simply checks that a string does contain at least one upper case, one lower case, and one numeric digit:
fn has_all_needed(probe: &str) -> bool {
probe.chars().any(|cc| cc >= 'A' && cc <= 'Z')
&& probe.chars().any(|cc| cc >= 'a' && cc <= 'z')
&& probe.chars().any(|cc| cc >= '0' && cc <= '9')
}
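As a hedged alternative, the same check may be written with the ASCII classification methods of the char type from the standard library. This variant is only a sketch and is not the one used in the full program below:

```rust
/// True if `probe` has at least one ASCII upper case letter, one ASCII
/// lower case letter and one ASCII digit.
fn has_all_needed(probe: &str) -> bool {
    probe.chars().any(|c| c.is_ascii_uppercase())
        && probe.chars().any(|c| c.is_ascii_lowercase())
        && probe.chars().any(|c| c.is_ascii_digit())
}

fn main() {
    for probe in ["3cNNj5Tq", "abcdefgh", "ABC12345", "a+B/3xyz"] {
        println!("{} -> {}", probe, has_all_needed(probe));
    }
}
```

Other characters in the probe (such as the + and / of the Base64 alphabet) are simply ignored by the three tests.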
Password generation
This is the biggest function in our program. It creates a text string
concatenating the "account id" with the "master password", separated
by a colon character. The Sha512
struct (from the sha2
crate) is
used to generate the SHA-512 value from the mentioned text. It generates
a GenericArray
object, which in practice may be used as a byte array.
It is provided to base64::encode()
which in turn generates a String
representation of the binary data using the famous Base64 encoding.
Now, the tactic is to extract substrings (variable probe
) from the
Base64 String and check if any of them satisfy the previous
has_all_needed()
function:
for i in 0..max {
let k = i * PASSWD_LEN;
let probe = &enc_data[k..(k + PASSWD_LEN)];
if has_all_needed(probe) {
...
return Some(ans);
}
}
There is a (very) small probability for all the Base64 substrings
not satisfying our has_all_needed()
criteria. In that case the
function simply returns None
.
We also want to insert a symbol in the password. The symbol comes from a char array containing the values !, @, #, $, and %. To "randomly" select one of these characters, and the position in the password where it is inserted, we use the last two bytes of the SHA-512 value (scaled to the required range by a modulo operation):
let rnd1 = result[result.len() - 2] as usize;
let rnd2 = result[result.len() - 1] as usize;
let sym_pos = rnd1 % (password_length - 1);
let sym_val = SYMBOLS[rnd2 % SYMBOLS.len()];
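The symbol insertion may be tried in isolation with the following sketch. The function insert_symbol is an illustrative name; it receives the two "random" bytes as parameters instead of taking them from a SHA-512 value, and it assumes an ASCII-only base string of at least two characters (which holds for Base64 text):

```rust
const SYMBOLS: [char; 5] = ['!', '@', '#', '$', '%'];

/// Builds a password with the same length as `base`: the last character of
/// `base` is dropped and one symbol is inserted at a position derived from
/// `rnd1`; the symbol itself is selected by `rnd2`.
fn insert_symbol(base: &str, rnd1: u8, rnd2: u8) -> String {
    let len = base.len();
    let sym_pos = rnd1 as usize % (len - 1);
    let sym_val = SYMBOLS[rnd2 as usize % SYMBOLS.len()];
    let mut ans = String::with_capacity(len);
    ans.push_str(&base[0..sym_pos]);
    ans.push(sym_val);
    ans.push_str(&base[sym_pos..len - 1]);
    ans
}

fn main() {
    // 9 % 7 = 2 selects the position, 7 % 5 = 2 selects the '#' symbol.
    println!("{}", insert_symbol("abcdefgh", 9, 7));
    println!("{}", insert_symbol("abcdefgh", 0, 0));
}
```

Note that with this scheme the last character of the probe is discarded to make room for the symbol. If that character happened to be the only digit (or the only letter of one case) in the probe, the final password would no longer satisfy the policy, so a stricter version might validate the shortened string instead.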
Reading the master password
The password entry should avoid the terminal echo in order to prevent accidental reading by unauthorized people. Disabling the terminal echo is a system-dependent feature, so here we’ll employ the termion crate, which provides it for Linux environments.
The read_secret()
function is in charge of this operation. Please
see the termion
documentation for more information.
Displaying the generated password
The generated password must be printed on the screen as the final step of the process, so it can be entered in the associated website.
At this stage the password could be read by unauthorized nearby people. In order to minimize the probability of this event, we’ll ask the user to press some key when done with the password, in order to immediately overwrite it:
Enter account id> diego@latinmail.com
Enter master password>
 - - - press any key when done - - -
PASSWORD: $3cNNj5T
After key press:
Enter account id> diego@latinmail.com
Enter master password>
 - - - press any key when done - - -
PASSWORD: ########
The key line includes a "carriage return" character used
to replace the password with a series of #
characters:
println!("\x0dPASSWORD: {} ", "#".repeat(clear_password.len()));
Now the full program:
use sha2::{Digest, Sha512};
use std::io;
use std::io::Read;
use std::io::Write;
use termion::input::TermRead;
use termion::raw::IntoRawMode;
const SYMBOLS: [char; 5] = ['!', '@', '#', '$', '%'];
const PASSWD_LEN: usize = 8;
fn has_all_needed(probe: &str) -> bool {
probe.chars().any(|cc| cc >= 'A' && cc <= 'Z')
&& probe.chars().any(|cc| cc >= 'a' && cc <= 'z')
&& probe.chars().any(|cc| cc >= '0' && cc <= '9')
}
fn build_password(
account_id: &str,
master_password: &str,
password_length: usize,
) -> Option<String> {
let data = format!("{}:{}", account_id, master_password);
let mut hasher = Sha512::new();
hasher.update(data);
let result = hasher.finalize();
let enc_data = base64::encode(result);
let rnd1 = result[result.len() - 2] as usize;
let rnd2 = result[result.len() - 1] as usize;
let sym_pos = rnd1 % (password_length - 1);
let sym_val = SYMBOLS[rnd2 % SYMBOLS.len()];
let max = enc_data.len() / PASSWD_LEN - 1;
for i in 0..max {
let k = i * PASSWD_LEN;
let probe = &enc_data[k..(k + PASSWD_LEN)];
if has_all_needed(probe) {
let p1 = &enc_data[k..(k + password_length)];
let mut ans = String::new();
ans.push_str(&p1[0..sym_pos]);
ans.push(sym_val);
ans.push_str(&p1[sym_pos..(password_length - 1)]);
return Some(ans);
}
}
None
}
fn read_secret(prompt: &str) -> String {
print!("{}", prompt);
io::stdout().flush().unwrap();
let stdout = std::io::stdout();
let mut stdout = stdout.lock();
let stdin = std::io::stdin();
let mut stdin = stdin.lock();
let pass = stdin.read_passwd(&mut stdout);
println!();
if let Ok(Some(pass)) = pass {
return pass.trim().to_string();
}
panic!("Error extracting user input");
}
fn wait_for_key() {
let stdout = std::io::stdout();
let mut stdout = stdout.lock().into_raw_mode().unwrap();
let mut stdin = std::io::stdin().bytes();
stdin.next();
stdout.flush().unwrap();
}
fn read_string(prompt: &str) -> String {
print!("{}", prompt);
io::stdout().flush().unwrap();
let mut line = String::new();
io::stdin()
.read_line(&mut line)
.expect("Error extracting user input");
line.trim().to_string()
}
fn show_password(clear_password: &str) {
println!(" - - - press any key when done - - -");
print!("PASSWORD: {} ", clear_password);
std::io::stdout().flush().unwrap();
wait_for_key();
println!("\x0dPASSWORD: {} ", "#".repeat(clear_password.len()));
}
fn main() {
let account_id = read_string("Enter account id> ");
let master_password = read_secret("Enter master password> ");
match build_password(&account_id, &master_password, PASSWD_LEN) {
Some(clear_password) => show_password(&clear_password),
None => println!("Sorry, could not generate password"),
}
}
- The master password should be asked at least twice, since any typing error would be (by definition) undetected.
- Make different password requirements: only digits, only letters, etc.
- The password "probes" of our program may be too weak, even containing letters and digits. Use the zxcvbn crate to include a stronger validation.
- For better security, we may give a timeout of 60 seconds for the password display:
// additional imports needed by this version
use std::{thread, time};
fn wait_for_key() {
let mut counter: u64 = 0;
let stdout = std::io::stdout();
let mut stdout = stdout.lock().into_raw_mode().unwrap();
let mut stdin = termion::async_stdin().bytes();
loop {
let b = stdin.next();
if let Some(Ok(_)) = b {
break;
}
thread::sleep(time::Duration::from_millis(10));
counter = counter + 1;
if counter > 60 * 100 {
break;
}
}
stdout.flush().unwrap();
}
National students scores
The government annually applies a national standard test in all the schools of the country. The total (arithmetic) average must be calculated in order to assess the educational policy.
Since the number of samples is pretty big, the government is interested in partial averages as the samples are collected. So, the averages will be calculated several (many) times as the samples are reported from the schools.
In order to provide a timely average, our program needs to store just two (mutable) values: the total sum of all the collected scores, and the number of collected scores. Both values will be updated as new scores are reported, so the new average may be calculated:
Average(n) = TOTAL(up to n) / n
(got new Score(n+1))
TOTAL(up to n+1) = TOTAL(up to n) + Score(n+1)
Average(n+1) = TOTAL(up to n+1) / (n+1)
The next program is a trivial one, where only one operator is in charge of entering all the scores of the nation. Note that several scores may be entered at once, reflecting the fact that the schools report scores in batches:
use std::io;
use std::io::Write;
fn read_string(prompt: &str) -> Option<String> {
print!("{}", prompt);
io::stdout().flush().unwrap();
let mut line = String::new();
match io::stdin().read_line(&mut line) {
Ok(_) => Some(line.trim().to_string()),
Err(_) => {
println!("Error extracting user input");
None
}
}
}
fn read_numbers(prompt: &str) -> Option<Vec<u8>> {
let snumbers = read_string(prompt);
let line = match snumbers {
None => return None,
Some(line) => line,
};
if line.is_empty() {
return None;
}
let str_vec: Vec<&str> = line.split(',').map(|x| x.trim()).collect();
let mut ans: Vec<u8> = Vec::new();
let mut counter: usize = 1;
for v in str_vec.iter() {
let val: u8 = match v.parse() {
Ok(x) => x,
Err(_) => {
println!("Parse error at element 1-position {}", counter);
return None;
}
};
if val > 20 {
println!("Overflow at element 1-position {}", counter);
return None;
}
ans.push(val);
counter += 1;
}
Some(ans)
}
struct TheAverage {
samples: usize,
average: Option<f32>,
}
fn display(avg: &TheAverage) {
match avg.average {
None => {
println!("There are no samples yet!");
}
Some(x) => {
println!("Current average is {} from {} samples", x, avg.samples);
}
};
}
fn main() {
let mut n_total: usize = 0;
let mut s_total: u64 = 0;
loop {
let average: TheAverage;
if n_total == 0 {
average = TheAverage {
samples: 0,
average: None,
};
} else {
let num = s_total as f64;
let den = n_total as f64;
let q = num / den;
average = TheAverage {
samples: n_total,
average: Some(q as f32),
}
}
display(&average);
let more_scores = read_numbers("Enter scores separated by commas> ");
if let Some(scores) = more_scores {
n_total += scores.len();
scores.iter().for_each(|&score| s_total += score as u64);
}
}
}
A sample run:
$ cargo run --bin p7010
   Compiling p70 v0.1.0 (/home/diego/devel/RUST/tutorial21/p70)
    Finished dev [unoptimized + debuginfo] target(s) in 0.68s
     Running `/home/diego/devel/RUST/tutorial21/p70/target/debug/p7010`
There are no samples yet!
Enter scores separated by commas> 12
Current average is 12 from 1 samples
Enter scores separated by commas> 13,17
Current average is 14 from 3 samples
Enter scores separated by commas> 11,45
Overflow at element 1-position 2
Current average is 14 from 3 samples
Enter scores separated by commas> 18
Current average is 15 from 4 samples
Enter scores separated by commas> 18
Current average is 15.6 from 5 samples
Enter scores separated by commas>
Many data-entry operators
Since the number of scores to be entered is big, the government hires many operators for data-entry. The developer in charge considers splitting the previous program into a data-entry executable (to be run in several terminal stations, one for every contracted operator), and a "displaying" executable which collects the data from the stations.
So, we now have two programs, where the first one will be executed several times at once. Its output must be transferred to the second one for further processing.
As this developer is a bit old-fashioned, he considers a data transfer by disk files. All the operators will be logged in a single computer, and all the running programs (operating system processes) will share a disk directory where the data-entry programs will write the collected scores. The "displaying" program will read these files and then remove them (to avoid re-processing the same scores.)
Here the new data-entry program:
use rand::{thread_rng, Rng};
use std::fs::File;
use std::fs::OpenOptions;
use std::io;
use std::io::Write;
...
fn new_data_file<R: Rng>(rng: &mut R) -> File {
let random: u64 = rng.gen_range(100000, 1000000);
let filename = format!("/tmp/{}.scores", random);
println!("Creating scores file at {}", filename);
OpenOptions::new()
.create_new(true)
.write(true)
.open(filename)
.expect("Failed to create scores file")
}
fn main() {
let mut rndgen = thread_rng();
loop {
let more_scores = read_numbers("Enter scores separated by commas> ");
if let Some(scores) = more_scores {
let mut file = new_data_file(&mut rndgen);
let mut first = true;
for score in scores {
if first {
write!(&mut file, "{}", score).unwrap();
first = false;
} else {
write!(&mut file, ",{}", score).unwrap();
}
}
}
}
}
For simplicity, all the "scoring files" will be stored in /tmp
using the format #.scores
(a numeric random prefix) which
will be scanned by the "displaying" program. A sample run:
$ cargo run --bin p7020
Compiling p70 v0.1.0 (/home/diego/devel/RUST/tutorial21/p70)
Finished dev [unoptimized + debuginfo] target(s) in 0.77s
Running `/home/diego/devel/RUST/tutorial21/p70/target/debug/p7020`
Enter scores separated by commas> 12,13,16
Creating scores file at /tmp/265818.scores
Enter scores separated by commas> 18,7,19
Creating scores file at /tmp/773157.scores
Enter scores separated by commas> ^C
Note the create_new()
method, which makes the open operation fail if the file
already exists. This avoids the overwriting of a file when two
operators, by bad luck, get to generate the same random number.
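As a minimal sketch of how such a collision could be handled without a panic, the following standalone program retries with another number when the file already exists. The create_unique() helper and the fixed list of candidate numbers (standing in for the random generator) are illustrative assumptions and not part of the tutorial's programs.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::ErrorKind;
use std::path::{Path, PathBuf};

/// Tries each candidate number in turn until a file can be created
/// exclusively. `candidates` stands in for the tutorial's random generator.
fn create_unique(dir: &Path, candidates: &[u64]) -> Option<(PathBuf, File)> {
    for n in candidates {
        let path = dir.join(format!("{}.scores", n));
        match OpenOptions::new().write(true).create_new(true).open(&path) {
            Ok(file) => return Some((path, file)),
            // The name is already taken: try the next candidate.
            Err(e) if e.kind() == ErrorKind::AlreadyExists => continue,
            // Any other error (permissions, missing directory) ends the search.
            Err(_) => return None,
        }
    }
    None
}

fn main() {
    // A private demo directory, so /tmp itself is left alone.
    let dir = std::env::temp_dir().join(format!("scores_demo_{}", std::process::id()));
    fs::create_dir_all(&dir).expect("Can't create demo directory");
    // The first candidate is repeated on purpose to simulate a collision.
    let candidates: [u64; 3] = [265818, 265818, 773157];
    let first = create_unique(&dir, &candidates).map(|(p, _)| p);
    let second = create_unique(&dir, &candidates).map(|(p, _)| p);
    println!("first:  {:?}", first);
    println!("second: {:?}", second);
    fs::remove_dir_all(&dir).ok();
}
```

The second call skips the name which is already taken and ends up with the next free one.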
Now the displaying program:
use std::fs;
use std::path::Path;
struct TheAverage {
samples: usize,
average: Option<f32>,
}
fn display(avg: &TheAverage) {
match avg.average {
None => {
println!("There are no samples yet!");
}
Some(x) => {
println!("Current average is {} from {} samples", x, avg.samples);
}
};
}
fn read_scores(filename: &str) -> Option<Vec<u8>> {
let line = match fs::read_to_string(filename) {
Ok(text) => text,
Err(_) => return None,
};
fs::remove_file(filename).ok()?;
if line.is_empty() {
return None;
}
let str_vec: Vec<&str> = line.split(',').map(|x| x.trim()).collect();
let mut ans: Vec<u8> = Vec::new();
let mut counter: usize = 1;
for v in str_vec.iter() {
let val: u8 = match v.parse() {
Ok(x) => x,
Err(_) => {
println!("Parse error at element 1-position {}", counter);
return None;
}
};
if val > 20 {
println!("Overflow at element 1-position {}", counter);
return None;
}
ans.push(val);
counter += 1;
}
Some(ans)
}
fn scan_directory(path: &Path) -> Result<Vec<String>, &str> {
let entries = match fs::read_dir(path) {
Ok(x) => x,
Err(_) => {
return Err("Problem reading directory");
}
};
let ans: Vec<String> = entries
.map(|e| e.unwrap().path().to_str().unwrap().to_string())
.filter(|x| x.ends_with(".scores"))
.collect();
// println!("Scan dir result: {:?}", ans);
Ok(ans)
}
fn main() {
let mut n_total: usize = 0;
let mut s_total: u64 = 0;
let one_second = std::time::Duration::from_millis(1000);
loop {
std::thread::sleep(one_second);
let average: TheAverage;
if n_total == 0 {
average = TheAverage {
samples: 0,
average: None,
};
} else {
let num = s_total as f64;
let den = n_total as f64;
let q = num / den;
average = TheAverage {
samples: n_total,
average: Some(q as f32),
}
}
display(&average);
let score_files = match scan_directory(Path::new("/tmp")) {
Ok(x) => x,
Err(msg) => {
println!("Problem scanning scores directory: {}", msg);
continue;
}
};
for score_file in score_files.iter() {
if let Some(scores) = read_scores(score_file) {
n_total += scores.len();
scores.iter().for_each(|&score| s_total += score as u64);
}
}
}
}
Economic storage
The old-fashioned developer soon discovers that each score may be stored in a single byte, so he modifies both programs in order to employ byte buffers and binary files. The programs get simplified a bit:
fn main() {
let mut rndgen = thread_rng();
loop {
let more_scores = read_numbers("Enter scores separated by commas> ");
if let Some(scores) = more_scores {
let mut buffer: Vec<u8> = Vec::new();
for score in scores {
buffer.push(score);
}
let mut file = new_data_file(&mut rndgen);
file.write_all(&buffer).unwrap();
}
}
}
and:
use std::io::Read;
...
fn read_scores(filename: &str) -> Option<Vec<u8>> {
let mut file = fs::File::open(filename).ok()?;
let mut ans: Vec<u8> = Vec::new();
file.read_to_end(&mut ans).ok()?;
drop(file);
fs::remove_file(filename).ok()?;
Some(ans)
}
...
-
There is a race condition in the score-files scheme: the displayer may read a file which was just created but whose contents are not yet written. There are several ways to avoid this situation; for example, an impossible score (like 99) may be added at the end of the data in order to signal completion; this way the displayer program may skip (for a later retry) those files which do not (yet) have their completion mark.
-
Another option consists of writing the score files in a different directory. After the score information is written, each file is moved to the final directory. The "move" operation is usually atomic, provided the involved directories live in the same filesystem.
-
A third option involves a probabilistic approach: check the write time of the files and skip those which are "too recent" (by some reasonable measure.)
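The second option may be sketched as follows, assuming a "staging" and a "final" directory which live in the same filesystem. The publish_scores() helper and the directory names are hypothetical; only std::fs::rename() and the OpenOptions calls come from the standard library.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `scores` into `staging/name` and then moves the finished file
/// into `final_dir/name`, returning the final path.
fn publish_scores(
    staging: &Path,
    final_dir: &Path,
    name: &str,
    scores: &[u8],
) -> io::Result<PathBuf> {
    let tmp_path = staging.join(name);
    let final_path = final_dir.join(name);
    {
        let mut file = fs::OpenOptions::new()
            .write(true)
            .create_new(true)
            .open(&tmp_path)?;
        file.write_all(scores)?;
    } // the file is closed here, before the move
    // A reader scanning `final_dir` sees either no file or the complete one.
    fs::rename(&tmp_path, &final_path)?;
    Ok(final_path)
}

fn main() -> io::Result<()> {
    let base = std::env::temp_dir().join(format!("publish_demo_{}", std::process::id()));
    let staging = base.join("staging");
    let final_dir = base.join("final");
    fs::create_dir_all(&staging)?;
    fs::create_dir_all(&final_dir)?;
    let path = publish_scores(&staging, &final_dir, "265818.scores", &[12, 13, 16])?;
    println!("Published {:?} with {} bytes", path, fs::read(&path)?.len());
    fs::remove_dir_all(&base)?;
    Ok(())
}
```

With this scheme the displayer would scan only the final directory, so it never opens a half-written file.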
Unix queues
Our conservative developer wants to upgrade the system to
employ Unix kernel message queues. There are two flavors of
queue interfaces (POSIX and System V), but the developer
wants to try the nix
crate, which is a nice wrapper around
the most commonly used Unix system calls; this crate currently
only supports the POSIX interface.
The "displayer" process will be in charge of creating
a queue (identified by the path /scores
), while
the data-entry processes will die if they can’t access
it.
Let’s start with the data-entry program:
use nix::mqueue;
use nix::mqueue::MQ_OFlag;
use nix::sys::stat::Mode;
use std::ffi::CString;
use std::io;
use std::io::Write;
...
fn main() {
let qpath = CString::new("/scores").unwrap();
let queue_id = mqueue::mq_open(&qpath, MQ_OFlag::O_WRONLY, Mode::empty(), None)
.expect("Can't access system queue - check displayer");
loop {
let more_scores = read_numbers("Enter scores separated by commas> ");
if let Some(scores) = more_scores {
let mut buffer: Vec<u8> = Vec::new();
for score in scores {
buffer.push(score);
}
mqueue::mq_send(queue_id, &buffer, 0)
.expect("Can't write in system queue - check displayer");
}
}
}
The queue creation currently requires the specification
of some attributes for mq_open
provided in the
MqAttr
structure. For Linux, please see the mq_overview
man page for pointers about the limits in the queue which
are related to the MqAttr
attributes.
use nix::mqueue;
use nix::mqueue::{MQ_OFlag, MqAttr};
use nix::sys::stat::Mode;
use std::ffi::CString;
...
fn main() {
let qpath = CString::new("/scores").unwrap();
let flags = MQ_OFlag::O_RDONLY | MQ_OFlag::O_CREAT | MQ_OFlag::O_NONBLOCK;
let mode = Mode::from_bits(0o600).unwrap();
let attrs = MqAttr::new(0, 5, 8192, 0);
let queue_id = mqueue::mq_open(&qpath, flags, mode, Some(&attrs))
.expect("Can't create or open system queue");
let mut read_buffer = [0u8; 8192];
let mut n_total: usize = 0;
let mut s_total: u64 = 0;
let one_second = std::time::Duration::from_millis(1000);
loop {
std::thread::sleep(one_second);
let average: TheAverage;
if n_total == 0 {
average = TheAverage {
samples: 0,
average: None,
};
} else {
let num = s_total as f64;
let den = n_total as f64;
let q = num / den;
average = TheAverage {
samples: n_total,
average: Some(q as f32),
}
}
display(&average);
loop {
let mut pri = 0u32;
match mqueue::mq_receive(queue_id, &mut read_buffer, &mut pri) {
Ok(n) => {
n_total += n;
for z in 0..n {
s_total += read_buffer[z] as u64;
}
}
Err(_) => {
break;
}
}
}
}
}
More Unix queues
The nix
crate is a convenient wrapper around the libc
library
which in several systems (like Linux) provides access to the kernel
system calls. Here we want to employ the traditional "System V" IPC
queues, which are (currently) not provided by nix
, so we’ll use the
libc
crate. To this effect we’ll specify the crate in Cargo.toml
:
[dependencies]
libc = "0.2.98"
Here is the data-entry program:
use libc::c_long;
use libc::c_void;
use std::io;
use std::io::Write;
...
const MSGKEY: i32 = 0x4415512;
#[repr(C)]
struct Msgbuf {
#[allow(dead_code)]
mtype: c_long,
mtext: [u8; 1024],
}
fn main() {
let msgid = unsafe { libc::msgget(MSGKEY, 0) };
if msgid == -1 {
panic!("Can't access queue with key {}", MSGKEY);
}
loop {
let more_scores = read_numbers("Enter scores separated by commas> ");
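if let Some(scores) = more_scores {
// the message text buffer holds at most 1024 scores
if scores.len() > 1024 {
println!("Too many values!");
continue;
}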
let mut msgbuf = Msgbuf {
mtype: 1,
mtext: [0u8; 1024],
};
for i in 0..scores.len() {
msgbuf.mtext[i] = scores[i];
}
let res = unsafe {
libc::msgsnd(
msgid,
&msgbuf as *const Msgbuf as *const c_void,
scores.len(),
0,
)
};
if res != 0 {
panic!("Can't write in system queue - check displayer");
}
}
}
}
System V IPC relies on numeric key constants which are shared
by the programs which inter-communicate. Here we employ an
arbitrary constant 0x4415512
in order to access
the queue with libc::msgget()
. Note that the libc
functions
are considered unsafe
code, so their calls must be enclosed in such blocks.
The libc::msgget()
function is documented in the Unix msgget
man page (type something like man msgget
in order to read it.) The same goes
for any libc
function.
The program sends messages into the queue with the libc::msgsnd()
function, whose arguments are the "queue descriptor" returned by
libc::msgget()
, a pointer to a C-language structure, the length of
the contained data (in bytes), and optional flags (here zero.)
The mentioned structure is composed of two members: a numeric "message
type" of C-type "long" (corresponding to the libc::c_long
type), and
a raw buffer of arbitrary size (with system defined limits.) We mimic
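this structure with struct Msgbuf
. Such a structure should carry the #[repr(C)]
attribute so that its memory layout matches the C one.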
When sending the message, the libc::msgsnd()
function does
require a const pointer to the structure; this is achieved
by taking a reference of our structure value &msgbuf
which
is converted to a const pointer: &msgbuf as *const Msgbuf
, then converted
to a C-language void pointer: &msgbuf as *const Msgbuf as *const c_void
.
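The pointer conversion can be tried without any queue. The following sketch assumes that std::os::raw::c_long matches libc::c_long on the target platform, and uses a hypothetical peek_mtype() function as a stand-in for a C function receiving a const void pointer; the #[repr(C)] attribute requests the C field layout.

```rust
use std::mem::size_of;
use std::os::raw::{c_long, c_void};

// C-compatible layout: `mtype` first, followed by the raw text buffer.
#[repr(C)]
struct Msgbuf {
    mtype: c_long,
    mtext: [u8; 1024],
}

/// Hypothetical stand-in for a C function receiving `const void *`:
/// it reads back the message type through the raw pointer.
fn peek_mtype(ptr: *const c_void) -> c_long {
    // The caller must pass a pointer to a live Msgbuf value.
    unsafe { (*(ptr as *const Msgbuf)).mtype }
}

/// Byte offset of `mtext` from the start of the structure.
fn text_offset(m: &Msgbuf) -> usize {
    (m.mtext.as_ptr() as usize) - (m as *const Msgbuf as usize)
}

fn main() {
    let msgbuf = Msgbuf {
        mtype: 1,
        mtext: [0u8; 1024],
    };
    // reference -> typed const pointer -> untyped (void) const pointer
    let ptr = &msgbuf as *const Msgbuf as *const c_void;
    println!("mtype read through the pointer: {}", peek_mtype(ptr));
    println!(
        "mtext offset: {}, struct size: {}",
        text_offset(&msgbuf),
        size_of::<Msgbuf>()
    );
}
```

With the C layout, the text buffer starts right after the message type, which is what msgsnd() expects to find behind the pointer.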
The displayer now:
use libc::c_long;
use libc::c_void;
...
const MSGKEY: i32 = 0x4415512;
#[repr(C)]
struct Msgbuf {
#[allow(dead_code)]
mtype: c_long,
mtext: [u8; 1024],
}
fn main() {
let msgid = unsafe { libc::msgget(MSGKEY, libc::IPC_CREAT | 0o600) };
if msgid == -1 {
panic!("Can't create queue with key {}", MSGKEY);
}
let mut msgbuf = Msgbuf {
mtype: 1,
mtext: [0u8; 1024],
};
let mut n_total: usize = 0;
let mut s_total: u64 = 0;
let one_second = std::time::Duration::from_millis(1000);
loop {
std::thread::sleep(one_second);
let average: TheAverage;
if n_total == 0 {
average = TheAverage {
samples: 0,
average: None,
};
} else {
let num = s_total as f64;
let den = n_total as f64;
let q = num / den;
average = TheAverage {
samples: n_total,
average: Some(q as f32),
}
}
display(&average);
loop {
let mtype = 0i64;
let mflgs = libc::IPC_NOWAIT;
let res = unsafe {
libc::msgrcv(
msgid,
&mut msgbuf as *mut Msgbuf as *mut c_void,
msgbuf.mtext.len(),
mtype,
mflgs,
)
};
if res == -1 {
break;
}
let ures = res as usize;
n_total += ures;
for z in 0..ures {
s_total += msgbuf.mtext[z] as u64;
}
}
}
}
The displayer is in charge of creating the queue, so we
provide libc::IPC_CREAT
for libc::msgget()
; also
providing Unix access mode 0600
(user read/write.)
The reception is done with libc::msgrcv()
which requires
a non-constant (mutable) pointer to the Msgbuf
structure; here
the corresponding expression is &mut msgbuf as *mut Msgbuf as *mut c_void
.
The reception is done in a non-blocking way (libc::IPC_NOWAIT
)
without message type discrimination (let mtype = 0i64
.) Please
consult additional details in the manual page
for msgsnd()
and msgrcv()
.
TCP Sockets
Our aging developer is admonished because of the obvious limitations of his centralized solution, so he introduces plain TCP sockets in order to allow remote connections from data-entry stations across the country.
The data-entry program will work as a TCP client which tries to connect to the displayer program (which acts as TCP server.)
use std::io;
use std::io::Write;
use std::net::TcpStream;
...
fn main() {
let mut stream = TcpStream::connect("127.0.0.1:9999").expect("Can't connect to server");
loop {
let more_scores = read_numbers("Enter scores separated by commas> ");
if let Some(scores) = more_scores {
if scores.len() > 65000 {
println!("Too many values!");
continue;
}
if scores.len() == 0 {
continue;
}
let mut buffer: Vec<u8> = Vec::new();
let hi = scores.len() / 256;
let lo = scores.len() % 256;
buffer.push(hi as u8);
buffer.push(lo as u8);
for score in scores {
buffer.push(score);
}
let total = buffer.len();
let mut written = 0;
while written < total {
written += stream
.write(&buffer[written..total])
.expect("Can't write to socket - check displayer");
}
stream
.flush()
.expect("Can't flush socket - check displayer");
}
}
}
Note that messages are being prefixed with the data-length encoded as two bytes (hi+low), which imposes a limit of 65535 bytes.
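The framing may be exercised in isolation, without any socket. The following sketch uses two hypothetical helpers, encode_frame() and decode_frames(), which are not part of the tutorial's programs; the two-byte big-endian prefix produced by u16::to_be_bytes() is the same hi+low encoding.

```rust
/// Builds a frame: a two-byte big-endian length followed by the payload.
/// Returns None for an empty payload or one longer than 65535 bytes.
fn encode_frame(scores: &[u8]) -> Option<Vec<u8>> {
    if scores.is_empty() || scores.len() > u16::MAX as usize {
        return None;
    }
    let len = scores.len() as u16;
    let mut frame = Vec::with_capacity(2 + scores.len());
    frame.extend_from_slice(&len.to_be_bytes()); // hi byte, then low byte
    frame.extend_from_slice(scores);
    Some(frame)
}

/// Extracts every complete frame from `input`; also returns the number of
/// trailing bytes which belong to a frame that is still incomplete.
fn decode_frames(mut input: &[u8]) -> (Vec<Vec<u8>>, usize) {
    let mut payloads = Vec::new();
    while input.len() >= 2 {
        let len = u16::from_be_bytes([input[0], input[1]]) as usize;
        if input.len() < 2 + len {
            break; // the rest of this frame has not arrived yet
        }
        payloads.push(input[2..2 + len].to_vec());
        input = &input[2 + len..];
    }
    (payloads, input.len())
}

fn main() {
    let frame = encode_frame(&[12, 13, 16]).expect("payload must hold 1..=65535 bytes");
    println!("frame: {:?}", frame);
    // Two complete frames followed by the start of a third one.
    let mut wire = frame.clone();
    wire.extend_from_slice(&encode_frame(&[18]).unwrap());
    wire.extend_from_slice(&[0, 5, 7]);
    let (payloads, pending) = decode_frames(&wire);
    println!("payloads: {:?}, pending bytes: {}", payloads, pending);
}
```

Since TCP is a byte stream, a single read may return part of a frame or several frames at once; that is why the receiver must keep track of the bytes still pending.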
For our server we’ll implement a classic multithreaded TCP
server which accepts inbound connections on a server socket
(see the accepter()
function) and spawns a processing thread
(the net_reader()
function.)
From each thread, each packet is used to build a vector of bytes, which in turn is sent into a channel.
The main thread simply collects the pending messages from the channel’s receiving end, and updates the counters.
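The thread-plus-channel arrangement may be sketched without any networking. In this illustration the producer() and collect() functions are hypothetical: each producer thread plays the role of a connection thread, and the collector uses the blocking rx.iter() (instead of the try_iter() of the real server) so that the example terminates once all the senders are gone.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Plays the role of a connection thread: sends one vector of scores.
fn producer(id: u8, tx: Sender<Vec<u8>>) {
    tx.send(vec![10 + id, 12 + id]).unwrap();
}

/// Spawns `workers` producers and accumulates everything they send.
/// Returns (number of samples, sum of samples).
fn collect(workers: u8) -> (usize, u64) {
    let (tx, rx) = channel();
    let mut handles = Vec::new();
    for id in 0..workers {
        // Each thread owns its own clone of the sending end.
        let t_tx = tx.clone();
        handles.push(thread::spawn(move || producer(id, t_tx)));
    }
    // Drop the original sender so that rx.iter() ends with the last producer.
    drop(tx);
    let mut n_total: usize = 0;
    let mut s_total: u64 = 0;
    for data in rx.iter() {
        n_total += data.len();
        data.iter().for_each(|&score| s_total += score as u64);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    (n_total, s_total)
}

fn main() {
    let (n_total, s_total) = collect(3);
    println!("{} samples, sum {}", n_total, s_total);
}
```

The order in which the vectors arrive is not deterministic, but the totals do not depend on it.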
use std::io::Read;
use std::net::{TcpListener, TcpStream};
use std::sync::mpsc::channel;
use std::sync::mpsc::Sender;
use std::thread;
struct TheAverage {
samples: usize,
average: Option<f32>,
}
fn display(avg: &TheAverage) {
match avg.average {
None => {
println!("There are no samples yet!");
}
Some(x) => {
println!("Current average is {} from {} samples", x, avg.samples);
}
};
}
fn net_reader(mut stream: TcpStream, tx: Sender<Vec<u8>>) {
let mut read_buffer = [0u8; 65536];
let mut on_header = true;
let mut target = 2;
let mut offset = 0;
loop {
match stream.read(&mut read_buffer[offset..target]) {
Ok(n) => {
if n == 0 {
// client closed, so the thread terminates
println!("Closed connection from {}", stream.peer_addr().unwrap());
break;
}
offset += n;
if offset == target {
if on_header {
// done with the header
let hi = read_buffer[0] as usize;
let lo = read_buffer[1] as usize;
target = hi * 256 + lo;
offset = 0;
on_header = false;
} else {
// done with the data
let data: Vec<u8> = Vec::from(&read_buffer[0..target]);
tx.send(data).unwrap();
target = 2;
offset = 0;
on_header = true;
}
}
}
Err(_) => {
// read error, so the thread terminates
println!("Error in connection from {}", stream.peer_addr().unwrap());
break;
}
}
}
}
fn accepter(listener: TcpListener, tx: Sender<Vec<u8>>) {
for stream in listener.incoming() {
let t_tx = tx.clone();
thread::spawn(move || {
let stream = stream.unwrap();
println!("Accepted connection from {}", stream.peer_addr().unwrap());
net_reader(stream, t_tx);
});
}
}
fn main() {
let listener = TcpListener::bind("127.0.0.1:9999").expect("Can't bind server socket");
let (tx, rx) = channel();
thread::spawn(move || {
accepter(listener, tx);
});
let mut n_total: usize = 0;
let mut s_total: u64 = 0;
let one_second = std::time::Duration::from_millis(1000);
loop {
std::thread::sleep(one_second);
let average: TheAverage;
if n_total == 0 {
average = TheAverage {
samples: 0,
average: None,
};
} else {
let num = s_total as f64;
let den = n_total as f64;
let q = num / den;
average = TheAverage {
samples: n_total,
average: Some(q as f32),
}
}
display(&average);
for data in rx.try_iter() {
let n = data.len();
n_total += n;
data.iter().for_each(|&score| s_total += score as u64);
}
}
}
UDP Sockets
The developer looks for improvements in the network response time since some remotely located data-entry operators reported a sluggish behavior of the program.
As an experiment, a rewrite of the network code using UDP datagrams is tested. The data-entry program follows:
use std::io;
use std::io::Write;
use std::net::UdpSocket;
...
fn main() {
let server = "127.0.0.1:7171";
let socket = UdpSocket::bind("0.0.0.0:0").expect("Can't bind local address");
socket.connect(server).expect("Can't connect to server");
loop {
let more_scores = read_numbers("Enter scores separated by commas> ");
if let Some(scores) = more_scores {
if scores.len() > 65000 {
println!("Too many values!");
continue;
}
if scores.len() == 0 {
continue;
}
// the socket is already connected, so send() is enough
socket
.send(&scores)
.expect("Can't write to socket - check displayer");
}
}
}
The best thing is that length headers are no longer needed: UDP preserves message boundaries, so each datagram is received as a whole.
A trivial UDP server implementation follows:
use std::net::UdpSocket;
struct TheAverage {
samples: usize,
average: Option<f32>,
}
fn display(avg: &TheAverage) {
match avg.average {
None => {
println!("There are no samples yet!");
}
Some(x) => {
println!("Current average is {} from {} samples", x, avg.samples);
}
};
}
fn main() {
let server = "127.0.0.1:7171";
let socket = UdpSocket::bind(server).expect("Can't bind local address");
socket
.set_nonblocking(true)
.expect("Can't set non-blocking mode");
let mut read_buffer = [0u8; 65536];
let mut n_total: usize = 0;
let mut s_total: u64 = 0;
let one_second = std::time::Duration::from_millis(1000);
loop {
std::thread::sleep(one_second);
let average: TheAverage;
if n_total == 0 {
average = TheAverage {
samples: 0,
average: None,
};
} else {
let num = s_total as f64;
let den = n_total as f64;
let q = num / den;
average = TheAverage {
samples: n_total,
average: Some(q as f32),
}
}
display(&average);
loop {
if let Ok((n, _addr)) = socket.recv_from(&mut read_buffer) {
n_total += n;
read_buffer[0..n]
.iter()
.for_each(|&score| s_total += score as u64);
} else {
break;
}
}
}
}
This server is single-threaded. Note that the socket is set
in "non-blocking" mode in order to avoid blocking on the
recv_from()
method.
Its main drawback is that the read from the socket happens only once per second: if the traffic is high, the inbound messages will be enqueued in the kernel buffers. When those buffers get filled, further packets will be discarded until the next read cycle which empties them.
The next server version uses a dedicated thread to read the inbound messages and forwards them into an unbounded channel for further processing (similarly to the TCP version previously shown.) Here the socket remains in its default "blocking" mode.
use std::net::UdpSocket;
use std::sync::mpsc::channel;
use std::sync::mpsc::Sender;
use std::thread;
struct TheAverage {
samples: usize,
average: Option<f32>,
}
fn display(avg: &TheAverage) {
match avg.average {
None => {
println!("There are no samples yet!");
}
Some(x) => {
println!("Current average is {} from {} samples", x, avg.samples);
}
};
}
fn net_reader(socket: UdpSocket, tx: Sender<Vec<u8>>) {
let mut read_buffer = [0u8; 65536];
loop {
let (n, _addr) = socket
.recv_from(&mut read_buffer)
.expect("Network error receiving information");
let data: Vec<u8> = Vec::from(&read_buffer[0..n]);
tx.send(data).unwrap();
}
}
fn main() {
let server = "127.0.0.1:7171";
let socket = UdpSocket::bind(server).expect("Can't bind local address");
let (tx, rx) = channel();
thread::spawn(move || {
net_reader(socket, tx);
});
let mut n_total: usize = 0;
let mut s_total: u64 = 0;
let one_second = std::time::Duration::from_millis(1000);
loop {
std::thread::sleep(one_second);
let average: TheAverage;
if n_total == 0 {
average = TheAverage {
samples: 0,
average: None,
};
} else {
let num = s_total as f64;
let den = n_total as f64;
let q = num / den;
average = TheAverage {
samples: n_total,
average: Some(q as f32),
}
}
display(&average);
for data in rx.try_iter() {
let n = data.len();
n_total += n;
data.iter().for_each(|&score| s_total += score as u64);
}
}
}
Sharing state between threads
Our developer notes that the channel must store all the pending messages from the previous second, which may be a waste of resources: it is better to update the counters as soon as each message is received. To this effect he devises a structure for the counters:
struct Counters {
n_total: usize,
s_total: u64,
}
A single instance will be shared by both threads: one updates the counters and the other calculates the average. This entails a locking mechanism as provided by a mutex; also, a reference-counting smart pointer will be employed for this sharing, so the shared value will have the type:
type MtCounters = Arc<Mutex<Counters>>;
Note that Arc
is similar to Rc
, but it is required when the
references are shared between different threads (its reference count is updated atomically.) Now the updated program,
which no longer employs a channel:
use std::net::UdpSocket;
use std::sync::{Arc, Mutex};
use std::thread;
struct TheAverage {
samples: usize,
average: Option<f32>,
}
struct Counters {
n_total: usize,
s_total: u64,
}
type MtCounters = Arc<Mutex<Counters>>;
fn display(avg: &TheAverage) {
match avg.average {
None => {
println!("There are no samples yet!");
}
Some(x) => {
println!("Current average is {} from {} samples", x, avg.samples);
}
};
}
fn net_reader(counters: MtCounters, socket: UdpSocket) {
let mut read_buffer = [0u8; 65536];
loop {
let (n, _addr) = socket
.recv_from(&mut read_buffer)
.expect("Network error receiving information");
let mut the_counters = counters.lock().unwrap();
for i in 0..n {
the_counters.n_total += 1;
the_counters.s_total += read_buffer[i] as u64;
}
}
}
fn main() {
let counters_mutex: MtCounters = Arc::new(Mutex::new(Counters {
n_total: 0,
s_total: 0,
}));
let server = "127.0.0.1:7171";
let socket = UdpSocket::bind(server).expect("Can't bind local address");
let clone = Arc::clone(&counters_mutex);
thread::spawn(move || {
net_reader(clone, socket);
});
let one_second = std::time::Duration::from_millis(1000);
loop {
std::thread::sleep(one_second);
let average: TheAverage;
let counters = counters_mutex.lock().unwrap();
if counters.n_total == 0 {
average = TheAverage {
samples: 0,
average: None,
};
} else {
let num = counters.s_total as f64;
let den = counters.n_total as f64;
let q = num / den;
average = TheAverage {
samples: counters.n_total,
average: Some(q as f32),
}
}
display(&average);
}
}
The lock()
method returns a mutex "guard", which wraps the
internal object but also is in charge of unlocking the mutex
when it goes out of scope (or is explicitly dropped.)
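The two ways of releasing the guard may be seen in the following sketch, which reuses the Counters structure and the MtCounters alias; the add_scores(), snapshot() and run_writers() helpers are hypothetical and only illustrate the locking pattern.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

struct Counters {
    n_total: usize,
    s_total: u64,
}
type MtCounters = Arc<Mutex<Counters>>;

/// Updates the shared counters; the mutex is unlocked when `guard`
/// goes out of scope at the end of the function.
fn add_scores(counters: &MtCounters, scores: &[u8]) {
    let mut guard = counters.lock().unwrap();
    guard.n_total += scores.len();
    guard.s_total += scores.iter().map(|&s| s as u64).sum::<u64>();
}

/// Copies the counters and unlocks the mutex explicitly with drop().
fn snapshot(counters: &MtCounters) -> (usize, u64) {
    let guard = counters.lock().unwrap();
    let copy = (guard.n_total, guard.s_total);
    drop(guard); // unlocked here, before returning
    copy
}

/// Starts `threads` writers, each one adding the scores 10 and 20.
fn run_writers(threads: usize) -> (usize, u64) {
    let counters: MtCounters = Arc::new(Mutex::new(Counters {
        n_total: 0,
        s_total: 0,
    }));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let clone = Arc::clone(&counters);
            thread::spawn(move || add_scores(&clone, &[10, 20]))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    snapshot(&counters)
}

fn main() {
    let (n_total, s_total) = run_writers(4);
    println!("{} samples, sum {}", n_total, s_total);
}
```

Keeping the guard alive only for the few statements that need it reduces the time the other threads spend waiting for the lock.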
-
UDP does not guarantee message delivery as TCP does, so the server should send an acknowledgement which the client waits for. Implement the response and its reception in the programs.
-
The data-entry program should retry the delivery (several times) if no acknowledgement is received in a "reasonable" time. Check the
set_read_timeout()
method to implement the retry mechanism.
-
Implement the multithreaded TCP server without channels, using the mutex and reference-counting mechanism presented in the last example.
Using HTTP
Our developer is then instructed to upgrade the system in
order to support a client to be developed by a third-party
contractor, which in turn insists on employing a "web service"
based solution. So after checking the available frameworks,
the developer selects warp
for the server
program (https://github.com/seanmonstar/warp), and
as a proof of concept he also decides to develop a simple data-entry
client using the reqwest
library (https://github.com/seanmonstar/reqwest).
The web service will support just a POST request which receives the scores as a sequence of bytes, in the address http://ip-address:8888/scores, which for testing purposes, will be http://127.0.0.1:8888/scores.
The client program is pretty trivial:
use std::io;
use std::io::Write;
...
fn main() {
let url = "http://127.0.0.1:8888/scores";
let client = reqwest::blocking::Client::new();
loop {
let more_scores = read_numbers("Enter scores separated by commas> ");
if let Some(scores) = more_scores {
if scores.len() > 65000 {
println!("Too many values!");
continue;
}
if scores.len() == 0 {
continue;
}
let data = Vec::from(scores);
let res = client
.post(url)
.body(data)
.send()
.expect("Can't post to server - check displayer");
if !res.status().is_success() {
println!("Http error: {:?}", res.status());
break;
}
}
}
}
This requires the reqwest
library in Cargo.toml
:
reqwest = { version = "0.11", features = ["blocking"] }
The web server is adapted from the examples of warp
:
use std::sync::{Arc, Mutex};
use std::thread;
use warp::Filter;
struct TheAverage {
samples: usize,
average: Option<f32>,
}
struct Counters {
n_total: usize,
s_total: u64,
}
type MtCounters = Arc<Mutex<Counters>>;
fn display(avg: &TheAverage) {
match avg.average {
None => {
println!("There are no samples yet!");
}
Some(x) => {
println!("Current average is {} from {} samples", x, avg.samples);
}
};
}
fn show_counters(counters_clone_dsp: MtCounters) {
let one_second = std::time::Duration::from_millis(1000);
loop {
std::thread::sleep(one_second);
let average: TheAverage;
let counters = counters_clone_dsp.lock().unwrap();
if counters.n_total == 0 {
average = TheAverage {
samples: 0,
average: None,
};
} else {
let num = counters.s_total as f64;
let den = counters.n_total as f64;
let q = num / den;
average = TheAverage {
samples: counters.n_total,
average: Some(q as f32),
}
}
display(&average);
}
}
fn update_counters(counters: MtCounters, data: &[u8]) {
let n = data.len();
let mut the_counters = counters.lock().unwrap();
for i in 0..n {
the_counters.n_total += 1;
the_counters.s_total += data[i] as u64;
}
}
#[tokio::main]
async fn main() {
let counters_mutex: MtCounters = Arc::new(Mutex::new(Counters {
n_total: 0,
s_total: 0,
}));
let counters_clone_dsp = Arc::clone(&counters_mutex);
thread::spawn(move || {
show_counters(counters_clone_dsp);
});
let counters_clone = warp::any().map(move || Arc::clone(&counters_mutex));
let routes = warp::post()
.and(warp::path("scores"))
.and(warp::body::content_length_limit(1024 * 64))
.and(warp::body::bytes())
.and(counters_clone)
.map(|data: warp::hyper::body::Bytes, counters| {
update_counters(counters, &data);
warp::http::Response::builder().body("ok")
});
warp::serve(routes).run(([127, 0, 0, 1], 8888)).await;
}
which in turn does require the addition of:
tokio = { version = "1", features = ["full"] }
warp = "0.3"
Note that the web server is started in main()
, which is also declared as an asynchronous function.
The warp::serve(…).run(…)
call builds and runs the server,
returning a "future" which is awaited. That means that the
main()
thread only ends when the web server terminates (that
is, never.)
For that reason we started the displaying loop in an auxiliary
thread (the show_counters()
function.)
Warp defines the routing and processing of requests around
the composable warp::Filter
type. The last filter is
a "processing closure" which is in charge of updating the
counters; this closure does get the request bytes and
the "counters reference" as arguments, since the
warp::body::bytes()
and counters_clone
filters
were also provided. The first obviously is in charge
of extracting the request body as bytes (the Bytes
type allows dereferencing into &[u8]
), and the second
one is a closure which allows the generation of a
new reference clone of the counters singleton for each
request.
Note that asynchronous code is not a trivial thing to learn; an extensive reference is available at https://rust-lang.github.io/async-book/
*