Karl's Code

and other code related stuff

Refactoring Ugly Scala to Idiomatic Scala

This Blog is my (#Scalasyd)[https://github.com/scalasyd/scalasyd] talk about refactoring Scala code to be more idiomatic

It is also a slide deck for the talk given at the ScalaSyd meetup May 2018.
To switch between modes press a number as follows :

  • ‘1’ -> Doc mode:
    • shows the document as intended.
  • ‘2’ -> Deck mode, see the slides
    • see the slides
  • ‘4’ -> Lecture Mode
    • enter zooms current navigated to section
    • click zooms div or block clicked

Arrow keys navigate to next or previous section or slide

Refactor

Ugly Scala Code

9th May 2018

 

Code Weaver

 

By

Karl Roberts
@MrK4rl
To present this document press 2. Press Esc to get back to document view. Left and Right arrow keys to navigate. See suited.js

In this blog I’ll be talking about working with Scala to build applications.

Demo code

Confession

  • First I need to admit to a dirty little secret I’ve been keeping…

The Ugly Truth

  •  
  •  
  •  

The Ugly Truth

  • It can be easier to hack away in an imperative style
  •  
  •  

The Ugly Truth

  • It can be easier to hack away in an imperative style
  • Especially when hacking on a new Java API
  •  

The Ugly Truth

  • It can be easier to hack away in an imperative style
  • Especially when hacking on a new Java API
  • Or noodling around to get an understanding

The Excuses?

  •  
  •  
    •  
    •  
  •  

The Excuses?

  • Thinking in Java
  •    
    •  
    •  
  •  

The Excuses?

  • Thinking in Java
  • looping in loops
    • helps walk or build a data structure
    • while I’m developing it
  •  

I have a bunch of justifications for this behaviour they may not be good excuses but here they are.

The Excuses?

  • Thinking in Java
  • looping in loops
    • helps walk or build a data structure
    • while I’m developing it
  • Types get in the way less

The Problem?

  • Sneaky Bugs can exist
    •  
  • Hard to test pieces in isolation
  • Hard to understand later or refactor?
    •  
  •  

The Problem?

  • Sneaky Bugs can exist
    • especially in the edge cases
  • Hard to test pieces in isolation
  • Hard to understand later or refactor?
    •  
  •  

The Problem?

  • Sneaky Bugs can exist
    • especially in the edge cases
  • Hard to test pieces in isolation
  • Hard to understand later or refactor?
    • you may forget to take account of some out of view mutation
  •  

The Problem?

  • Sneaky Bugs can exist
    • especially in the edge cases
  • Hard to test pieces in isolation
  • Hard to understand later or refactor?
    • you may forget to take account of some out of view mutation
  • Gives me the creeps.

The code written in the style, that you are going to see, runs ok (I think) but I am always wary of it, because I know how hard I had to work to get it running. Coding it was a trial and error affair, with contant compiling and running and attaching debuggers to test what was happening to my mutable placeholders.

The trick is to refactor it to a more imutable functional style as soon as it is “working”.

It is always best to refactor runnable code because we can see that the refactor caused no disernable difference to the outcome by simple running it.

Tests help of course, but the problem with testing mutable sections of code is that you have to test from the outside, as alluded to above it is hard to test an isolated fragment to see if it works.

The Solution?

  • Once Running Refactor
    • while you still understand the problem
  • Remove mutation
  • Remove loops
  • Learn the API you are working in
    • it may well have realy helpful methods/functions for you to use.
  • Discover nice general patterns

My Example?

source code location

api doc for List

This code is extracted from a Crypto currency tool I wrote to help me keep track of my position on coinspot.

Unfortunatly at the time coinspot api did not allow me to get all the data from them so I needed to extract my trade history as csv, parse it and build up a data structure to allow me to query my position and other useful info.

Using my Position data structure I can see what currencies I’m making money in and ask questions ike “how much bitcoin did I buy below this price?”

The big problem solved above is that the trade history showed prices for cross trades in terms of each other eg bought 1.2 ETH for .05 BTC but I want to know my position in AUD for each currency, so I was forced to look to the internet for historical prices of the coins in AUD at the price of the time of the trade to work backwards.

Here you can see I have a list of previously known prices from which I build up a PriceTable which is a Map of PriceTableForCurrency.

This is really a big map of maps. The caller of this function (in this case the main function) saves the newly discovered prices to a file that I read and parse on start-up to contruct my known prices List.

In short what I need to do is:-

PriceTable

  • I need to build up a PriceTable for Currencies.
    •  
  •  
    •  
    •  

PriceTable

  • I need to build up a PriceTable for Currencies.
    • A Map of Maps
  •  
    •  
    •  

PriceTable

  • I need to build up a PriceTable for Currencies.
    • A Map of Maps
  • So I can
    •  
    •  

PriceTable

  • I need to build up a PriceTable for Currencies.
    • A Map of Maps
  • So I can
    • look up a currency and
    •  

Build a PriceTable

  • I need to build up a PriceTable for Currencies.
    • A Map of Maps
  • So I can
    • look up a currency and
    • from there look up a price a date I care about

For context you should know that to help me work around the Map of Maps in the imperative style I created a few type aliases and helper classes to make it easier to reason about, the main structures are as follows:-

PriceTable

// a type alias
type PriceTable = Map[Currency, PriceTableForCurrency]

Here PriceTableForCurrency is just a case class that wraps a Map but has a helpful API that allows me to augment it and build it up from HistoricPrice objects which I get from parsing the CSV file of previously discovered historic prices.

PriceTableForCurrency

source code location
case class PriceTableForCurrency(currency: Currency, prices: Map[LocalDateTime,Double]) {
  /**
    * Will overwrite if we alredy have a price at this time
    * @param historicPrice
    * @return
    */
  def +=(historicPrice: HistoricPrice) = {
    val hp = (historicPrice.localDateTime, historicPrice.quotePrice)
    this.copy(prices = prices + hp)
  }
  def ++=(other:PriceTableForCurrency) = if(other.currency == currency) this.copy(prices = prices ++ other.prices) else this

  def asHistoricPrices: List[HistoricPrice] = {
    import scala.collection.mutable.ListBuffer
    val lb = ListBuffer[HistoricPrice]()
    prices.foreach(tup => {
      lb += HistoricPrice(tup._1,currency,tup._2)
    })
    lb.toList
  }
}

My Example - quick re-read for context

source code location

api doc for List

The first part I want to look at is building up the initial PriceTable (effectivly a Map of Maps) from the known prices list, which I read in from a CSV file.

Building a Map of Maps is something that I do a lot while coding, and it is a pain every time.

Walking a list and examining the Map to see if I have the Key in the map already and adding it and a value if I don’t have it or modifying the value if I do have it is tedious for a single Map. It is exponentially more tedious and error prone for each level of nested Map.

If only there was a better way.

Lets create a better Map of Maps

Api docs for Map
  • We can contruct a Map like this:

    val myMap = Map(“a” -> “little A”, “b” -> “little B”)

  •  

  •  
    •  

 

Lets create a better Map of Maps

Api docs for Map
  • We can contruct a Map like this:

    val myMap = Map(“a” -> “little A”, “b” -> “little B”)

  • The -> is a function that create a Tuple

  •  
    •  

 

Lets create a better Map of Maps

Api docs for Map
  • We can contruct a Map like this:

    val myMap = Map(“a” -> “little A”, “b” -> “little B”)

  • The -> is a function that create a Tuple

  • and so this is like building a Map from a list of Tuples
    • ie (Key, Value) pairs

 

Lets create a better Map of Maps

Api docs for Map
  • We can contruct a Map like this:

    val myMap = Map(“a” -> “little A”, “b” -> “little B”)

  • The -> is a function that create a Tuple

  • and so this is like building a Map from a list of Tuples
    • ie (Key, Value) pairs

api doc for map

This makes sense. A Map can be thought of as a list of Key and Value pairs.

So if I can convert my List into a list of Key/Value pairs, ie a Tuple2 or (Key,Value), I should be able to turn it into a Map.

Lets create a better Map of Maps

Api docs for List
  • If only there was a way to turn a list of tuples into a Map

  •  

Lets create a better Map of Maps

Api docs for List
  • If only there was a way to turn a list of tuples into a Map

  • api doc for List

Awesome. I can do it using the standard library.

Lets create a better Map of Maps

Api docs for List
  • So we can build a map from a list but what about a Map of Maps?
  •  

  •  

Lets create a better Map of Maps

Api docs for List
  • So we can build a Map from a list but what about a Map of Maps?
  • If only there was a way to group list items by some attribute

  •  

Lets create a better Map of Maps

Api docs for List
  • So we can build a Map from a list but what about a Map of Maps?
  • If only there was a way to group list items by some attribute

  • api doc for List

Mmm. This is giving me a clue. Maybe I can refactor this.

Lets create a better Map of Maps

Api docs for List
  • So this is cool we an turn a List of Tuple2 into a Map
  • We can also group a List into a Map with a List as its values

  • So if the value List could be transformed into a Tuple

    • we have th building blocks for a Map of Maps

This is good I’m getting a raging clue now.

Lets do it - fix Map of Maps trial 1
  • git checkout start_1

Lets do it - fix Map of Maps trail 2
  • git checkout start_2

Lets do it - fixed Map of Maps
  • git checkout start_3

That was great. Now I have a functional tool to construct maps and maps of maps.

List.groupBy

Now fix the ugly foreach

Ok now that is done lets look at the ugle walk of the xtrades (cross trades) list using foreach.

While its better than a forloop (which could adds another possible iteration bug) the body of the foreach is convoluted. The problem is that it is doing too many things at once.

I need to break it down into smaller problems which will not only make it easier to test and debug but will structure the code around clearer thinking.

Lets fix that ugly foreach

  • Instead of trying to do 3 things
    •  
    •  
    •  
      •  
      •  
      •  

Lets fix that ugly foreach

  • Instead of trying to do 3 things
    • walk trades looking for a historic price miss
    •  
    •  
      •  
      •  
      •  

Lets fix that ugly foreach:

  • Instead of trying to do 3 things
    • walk trades looking for a historic price miss
    • fetch the new historic price
    •  
      •  
      •  
      •  

Lets fix that ugly foreach:

  • Instead of trying to do 3 things
    • walk trades looking for a historic price miss
    • fetch the new historic price
    • add it to Map of Maps
      •  
      •  
      •  

Lets fix that ugly foreach:

  • Instead of trying to do 3 things
    • walk trades looking for a historic price miss
    • fetch the new historic price
    • add it to Map of Maps
      • looking out to see if we have the key or not first
      •  
      •  

Lets fix that ugly foreach:

  • Instead of trying to do 3 things
    • walk trades looking for a historic price miss
    • fetch the new historic price
    • add it to Map of Maps
      • looking out to see if we have the key or not first
      • and if not adding else just adding
      •  

Lets fix that ugly foreach:

  • Instead of trying to do 3 things
    • walk trades looking for a historic price miss
    • fetch the new historic price
    • add it to Map of Maps
      • looking out to see if we have the key or not first
      • and if not adding else just adding
      • again looking to see if we have the nested key etc..

Lets fix that ugly foreach - why not this?

  •  
    •  
  •  
  •  

Lets fix that ugly foreach - why not this?

  • Filter the list of trades
    •  
  •  
  •  

Lets fix that ugly foreach - why not this?

  • Filter the list of trades
    • to create a List of trades that dont have a price
  •  
  •  

Lets fix that ugly foreach - why not this?

  • Filter the list of trades
    • to create a List of trades that dont have a price
  • Then just get them
  • I can worry about how to stuff the new prices in the PriceTable later

Lets do it

  • git checkout start_4

That was simple. As a side benefit, now I am not iterativley Building the lists and Maps, I can get rid of all the var placeholder Maps and Buffers

A close look at the the code shows that I was able to stuff the new prices into the imutable PriceTable by folding down the new prices.

Add to imutable PriceTable

  • use a fold
  • have the starting accumulator be the current PriceTable

api doc for List

Ok

so …?

  • We’ve tidied most of it up
    • by mapping and filtering and folding we dont need var placeholders anymore
  •  

  •  

  •  
    •  

so …?

  • We’ve tidied most of it up
    • by mapping and filtering and folding we dont need var placeholders anymore
  • Can I get onto the fun stuff?

  •  

  •  
    •  

so can I prematurely optimize yet?

  • We’ve tidied most of it up
    • by mapping and filtering and folding we dont need var placeholders anymore
  • Can I get onto the fun stuff?

  • so can i tune it up a bit?

  •  
    •  

So now the code is tidy and manageble, is it time for me to speed up that slow price fetch from the internet?

so can I prematurely optimize yet?

  • We’ve tidied most of it up
    • by mapping and filtering and folding we dont need var placeholders anymore
  • Can I get onto the fun stuff?

  • Can i tune it up a bit?

  • Yes by fetching in parallel.
    • which is now trivial as I have simple collections to work with.

so can I prematurely optimize yet?

  • I will use scalaz Task
  • to wrap the get historic prices function
  • and return a List[Task[PriceTableForCurrency]]
  • rather than a List[PriceTableForCurrency]
  • then gather all the tasks in parallel
  • to give me back a List[PriceTableForCurrency]

Lets do it
$ git checkout start_5
$ # clear out the existing prefetched prices
$ cat /dev/null >  $HOME/uglycode/coinspotpos/historicPrices.csv
$ sbt clean run

So lets have a look at how much tidier the code is now.

Nice.

The End

Comments