Part 2: Optimising the code

Last updated:


Where we left off…

When I finished writing up the last post, I had managed to get the code into RTIC, but the code had grown from ~350 lines to ~620 lines. A lot of this was lengthy and repetitive bits of code that spanned several lines.

Even though this code worked fine, it would be a nightmare to maintain. Each time I want to add another current measurement, or even a voltage source, I would have to duplicate large chunks of the existing code. And then hope I don’t mess something up when I change some variable’s names.

What needs to be optimised?

The repetition in the variables

At the moment, all measurement variables are defined in a public struct at the start of the code, and assigned values later in the code. Let’s start by mapping out what is common to each measurement.

What is common to all measurements?

In the original code, all measurements had:

  • Descriptive name
  • Assigned pin
  • Calibration value
  • Buffer (where we store the last 10 ADC measurements)
  • Simple moving average (the average of the buffered measurements)
  • Sum (of buffered measurements)
  • ADC offset value
  • Corrected value (simple moving average – ADC offset)
  • Squared corrected value
  • Sum of squared values
  • Correction Ration (calibration value * (voltage / ADC resolution))
  • RMS value

For the revised code, I want to include the type of measurement (voltage or current). I wanted to see if listing the type of measurement means that I can use a loop to multiply each of the current measurements by the voltage measurement.

What is common to current measurements only?

In addition to the above, current measurements have:

  • Instantaneous power
  • Sum of instantaneous power
  • Real Power
  • Apparent Power
  • Power Factor

At first glance, I may be able to use rust macros to generate a unique struct for each of the measurements that I define before compiling the code.

The repetition in the calculations

As well as repetitive variable names, there was a lot of code that was being repeated.

Repetition within each group of measurements

For each set of calculations, the calculations are repeated for each variable. For example:

// Repetitive code obtaining an ADC value and pushing it
// onto the end of the buffer

shared_adc_resources
    .buffer_voltage
    .push(shared_adc_resources
        .adc
        .convert(&shared_adc_resources.pin_voltage,
            SampleTime::Cycles_3),).unwrap();

shared_adc_resources
    .buffer_current_1
    .push(shared_adc_resources
        .adc
        .convert(&shared_adc_resources.pin_current_1,
            SampleTime::Cycles_3),).unwrap();

shared_adc_resources
    .buffer_current_2
    .push(shared_adc_resources
        .adc
        .convert(&shared_adc_resources.pin_current_2,
            SampleTime::Cycles_3),).unwrap();


// Repetitive code summing the simple moving average

shared_adc_resources.sum_voltage = shared_adc_resources.sum_voltage + shared_adc_resources.simple_moving_average_voltage;

shared_adc_resources.sum_current_1 = shared_adc_resources.sum_current_1 + shared_adc_resources.simple_moving_average_current_1;

shared_adc_resources.sum_current_2 = shared_adc_resources.sum_current_2 + shared_adc_resources.simple_moving_average_current_2;

Repetition of blocks of code

There is a large section of code that is is used to obtain the values from the ADC and apply a simple moving average to remove some of the noise.

// Calculate the ADC offset value
    // Get measurements from the ADC                         ---|
        // Take measurement for each pin                        |
        // Put measurement onto the end of the buffer           |
        // If buffer length > 9                                 |---|
            // Take the average of all values in the buffer     |   |
            // Sum the averaged values                          |   |
            // Remove the first value from the buffer           |   |
            // Increment the measurement counter             ---|   |
    // If measurements > 2000                                       |
        // Get the average for each measurement                     |
                                                                    |
// Get the main measurements                                        |
    // Get measurements from the ADC                         ---|   |
        // Take measurement for each pin                        |   |
        // Put measurement onto the end of the buffer           |   |
        // If buffer length > 9                                 |   |
            // Take the average of all values in the buffer     |---|
            // Sum the averaged values                          |
            // Remove the first value from the buffer           |
            // Increment the measurement counter             ---|
            // Remove the ADC offset
            // Square the values with ADC offset removed
            // Sum the squared values
            // Get the instantaneous power
            // Sum the instantaneous power
    // If measurements > 2000
        // Calculate correction ratio
        // Calculate RMS values
        // Calculate real power
        // Calculate apparent power
        // Calculate power factor

At first glance I should be able to make use of functions to reduce the repetitive calculations.

But, there’s a catch…

What are my priorities for this code?

The way the code is written at the moment, I am deliberately calling all three ADC measurements one after the other (as shown in the repetitive code section). This was written so that the measurements of the voltage and current(s) are taken as close together as possible.

I had a hesitation about writing a generic function for taking a pin’s ADC measurements, storing the result in each struct’s buffer, and then performing a series of calculations. Each time I run the function, I will have to lock access to the struct, take the measurement, store the measurement, and then release access to the struct before repeating the process with the next struct. This may introduce some unnecessary delay in between taking the ADC measurements.

The workaround here was to make use of direct memory access (DMA).

How I optimised the code

Using DMA to obtain the ADC measurements

It turned out to be easier to use direct memory access than I had anticipated. There is a good write up about how DMA works and how to use it by Omar Hiari and a great example in the stm32f4xx-hal github repository.

In short, I reworked the code to obtain ADC values for eight pins (A0 to A7). Have a read of Omar Hiarai’s post to get an idea on how ADC works with DMA.

let gpioa = device.GPIOA.split();
let pin_7 = gpioa.pa7.into_analog();
let pin_6 = gpioa.pa6.into_analog();
let pin_5 = gpioa.pa5.into_analog();
let pin_4 = gpioa.pa4.into_analog();
let pin_3 = gpioa.pa3.into_analog();
let pin_2 = gpioa.pa2.into_analog();
let pin_1 = gpioa.pa1.into_analog();
let pin_0 = gpioa.pa0.into_analog();

let dma = StreamsTuple::new(device.DMA2);

let config_dma = DmaConfig::default()
    .transfer_complete_interrupt(true)
    .memory_increment(true)
    .double_buffer(false);

let config_adc = AdcConfig::default()
    .dma(Dma::Continuous)
    .scan(Scan::Enabled);

let mut adc = Adc::adc1(device.ADC1, true, config_adc);
adc.configure_channel(&pin_0, Sequence::One, 
    SampleTime::Cycles_480);
adc.configure_channel(&pin_1, Sequence::Two, 
    SampleTime::Cycles_480);
adc.configure_channel(&pin_2, Sequence::Three, 
    SampleTime::Cycles_480);
adc.configure_channel(&pin_3, Sequence::Four, 
    SampleTime::Cycles_480);
adc.configure_channel(&pin_4, Sequence::Five, 
    SampleTime::Cycles_480);
adc.configure_channel(&pin_5, Sequence::Six, 
    SampleTime::Cycles_480);
adc.configure_channel(&pin_6, Sequence::Seven, 
    SampleTime::Cycles_480);
adc.configure_channel(&pin_7, Sequence::Eight, 
    SampleTime::Cycles_480);

let first_buffer = cortex_m::singleton!(: 
    [u16; 8] = [0; 8]).unwrap();
let second_buffer = Some(cortex_m::singleton!(: 
    [u16; 8] = [0; 8]).unwrap());
let transfer = Transfer::init_peripheral_to_memory(
    dma.0, adc, first_buffer, None, config_dma);

Using DMA to obtain the ADC values means that I don’t need to have code for obtaining each value one after the other. In this revised code, I define the pins, set up the DMA, tell the ADC the order in which to measure the pins and then…

adc.start_conversion();

It’s (almost) that simple. This puts eight values onto the buffer that I use later in the code.

Reducing repetitive code

Each measurement is now in its own struct

Originally, all measurements were outlined in one large struct which would be unwieldy to maintain. Now, each measurement is given its own struct.

#[derive(Debug, Default, PartialEq)]
enum MeasurementTypes {
    Voltage,
    #[default]
    Current,
}

#[derive(Debug, Default)]
pub struct MeasurementParameters {
    name: &'static str,
    pin: u16,
    measurement_type: MeasurementTypes,
    calibration_value: f64,
    buffer: u16,
    sum_raw_adc_input: i128,
    adc_offset: i128,
    corrected_value: i128,
    sum_squared_corrected_values: i128,
    sum_power: i128,
    rms_value: f64,
    correction_ratio: f64,
}

All measurements are then contained in a single vector

Each of the structs (one for each measurement) are now placed onto a vector.

let mut measurements: Vec<MeasurementParameters, 8> = Vec::new();

let current_7 = MeasurementParameters {
    name: "Current_7",
    pin: 7,
    measurement_type: MeasurementTypes::Current,
    calibration_value: 1.0,
    ..Default::default()
};

//I've removed some measurements from this example 

let voltage = MeasurementParameters {
    name: "Voltage",
    pin: 0,
    measurement_type: MeasurementTypes::Voltage,
    calibration_value: 266.67,
    ..Default::default()
};

measurements.push(voltage).unwrap();
measurements.push(current_1).unwrap();
// And so on...
measurements.push(current_7).unwrap();

The advantage of having each measurement in its own struct and all the structs on a single vector will become apparent very shortly…

Repeated calculations are performed by iterating over the structs within the vector

We can perform repeated calculations by iterating over the structs within the ‘measurements’ vector.

// For example:
// Code to take the ADC value from the DMA buffer 
// (imaginatively named 'dma_buffer') and put it in the 
// variable named 'buffer' in each struct.

// First we lock access to the 'measurements' vector
cx.shared.measurements.lock(|m| {
// For each struct in the vector...
    for i in 0..m.len() {
// ...retrieve the pin number (assigned in advance)...
        let pin = m[i].pin as usize; 
// ...and take the relevant dma_buffer value.  
        m[i].buffer = dma_buffer[pin];
    }
});

The repeated blocks of code just runs until it is no longer needed.

Using DMA to obtain the ADC measurements and having the measurements in structs which are contained on a single vector meant that it was easy to organise the flow of the code.

  1. init
    • Starts a timer which counts down every 10 seconds
  2. task_1_start_adc
    • Triggered by the timer from the init task
    • Starts a timer which counts down every 1 millisecond (i.e. at a rate of 1000Hz)
  3. task_2_start_adc_conversion
    • Triggered by the timer from task_1_start_adc
    • Starts the ADC conversion sequence via DMA
  4. task_3_retrieve_adc_data
    • Triggered once the ADC conversion sequence is finished
    • If less than 2000 measurements have been taken, spawns task_4
    • If 2000 or more measurements have been taken, spawns task_5
  5. task_4_calculate_adc_offset
    • Just runs repeatedly unless 2000 or more measurements have been taken
  6. task_5_obtain_energy_flow
    • Runs repeatedly when 2000 to 4000 measurements have been taken
    • Stops the timer started in task_1 once 4000 measurements have been taken
  7. Wait for the timer in the init task to restart the loop

The optimisations significantly reduced the code length

    These optimizations to the code meant that the code is now 410 lines (13.9kB) compared to 627 lines (23.7KB source), even though we are now measuring all eight ADC inputs instead of three ADC inputs in the original code.

    Where to from here?

    So far, I have reduced the code by containing the measurements in a vector of structs that the code can iterate over and by using DMA to obtain the ADC values. There is still some repetition in the initialisation code, mainly in the way that each pin is set up. This may be a good area where I could use a macro to automatically generate some of the code and reduce the length of this code base.


    Leave a Reply

    Your email address will not be published. Required fields are marked *