Karl's Code

and other code related stuff

ScalaJs Markdown Combinator Parser

This Blog is about Scala Combinator Parsers but excitingly about compiling the Scala to JavaScript using Scala.js.

It is also a slide deck for a talk given at the ScalaSyd meetup Sept 2017.
To switch between modes press a number as follows :

  • ‘1’ -> Doc mode:
    • shows the document as intended.
  • ‘2’ -> Deck mode, see the slides
    • see the slides
  • ‘4’ -> Lecture Mode
    • enter zooms current navigated to section
    • click zooms div or block clicked

Arrow keys navigate to next or previous section or slide

A Markdown combinator parser in

Scala

13th September 2017
To present this document press 2. Press Esc to get back to document view. Left and Right arrow keys to navigate. See suited.js

A Markdown combinator parser in

Scala.js

13th September 2017
To present this document press 2. Press Esc to get back to document view. See suited.js

This Blog is my talk for ScalaSyd meetup on September 13th 2017. It is about Scala combinator parsers and Scala.js

Intro

  • This talk is about Scala
    •  
    •  
    •  

Intro

  • This talk is about Scala.js
    •  
    •  
    •  

Intro

  • This talk is about Scala.js
    • Combinator Parsing
    •  
    •  

Intro

  • This talk is about Scala.js
    • Combinator Parsing
    • Markdown
    •  

Intro

  • This talk is about Scala.js
    • Combinator Parsing
    • Markdown
    • suited.js

Before diving into nitty-gritty details it’s helpful to explain my motivation for this.

Why?

  •  
    •  
    •  
  •  
  •  
    •  
  •  
    •  

Why?

  • LambdaJam 2017
    •  
    •  
  •  
  •  
    •  
  •  
    •  

Why?

  • LambdaJam 2017
    • compile all the things to other languages
    •  
  •  
  •  
    •  
  •  
    •  

Why?

  • LambdaJam 2017
    • compile all the things to other languages
    • treat javascript like assembly lang for the web
  •  
  •  
    •  
  •  
    •  

Why?

  • LambdaJam 2017
    • compile all the things to other languages
    • treat javascript like assembly lang for the web
  • Wanted to explore Scala.js
  •  
    •  
  •  
    •  

Why?

  • LambdaJam 2017
    • compile all the things to other languages
    • treat javascript like assembly lang for the web
  • Wanted to explore Scala.js
  • I needed yet another partial project
    •  
  •  
    •  

Why?

  • LambdaJam 2017
    • compile all the things to other languages
    • treat javascript like assembly lang for the web
  • Wanted to explore Scala.js
  • I needed yet another partial project
    • devs seem to like starting projects
  •  
    •  

Why?

  • LambdaJam 2017
    • compile all the things to other languages
    • treat javascript like assembly lang for the web
  • Wanted to explore Scala.js
  • I needed yet another partial project
    • devs seem to like starting projects
  • I had too much time on my hands one day.
    •  

Why?

  • LambdaJam 2017
    • compile all the things to other languages
    • treat javascript like assembly lang for the web
  • Wanted to explore Scala.js
  • I needed yet another partial project
    • devs seem to like starting projects
  • I had too much time on my hands one day.
    • solved!

So apart from getting down with all the transpiling kool kids I actually have a need to do some JavaScript jiggery-pokery.

What?

  • This talk is presented using suited.js
    • a JavaScript library
    • Allows a single document to render as page or slide deck
      •  
      •  

What?

  • This talk is presented using suited.js
    • a JavaScript library
    • Allows a single document to render as page or slide deck
      • this slide deck is actually my in blog
      •  

What?

  • This talk is presented using suited.js
    • a JavaScript library
    • Allows a single document to render as page or slide deck
      • this slide deck is actually my in blog
      • hint: hit key 1,2 or 4 for fun
        • mode 4 will zoom on enter or click

suited.js

  • was written by Dirk and myself
    • a small library
    • uses no other js lib
  • event driven
  • plugin architecture

suited.js

  • but then we wanted markdown

Suited is cool and I’ve given many talks using it and the same document can be a blog or article too. But we always want it to do more for instance we wanted to write out talks in markdown.

So we needed a markdown plugin, but we also wanted a magic syntax to add slides and figures in markdown, Suited.js uses <section data-figure> and <section data-slide> to mark out sections of the contents to appear as navigable sections. Slides are just visible in the slide show whereas figures are visible in doc mode and also in the slide show.

markdown slides

~~* **your bold stuff** *~~

markdown figures

~~: **your bold and _italic_ stuff** :~~

So we implemented a markdown plugin, but delegated to markdown-it.

But suited has a fatal flaw. It has no parser to implement fragments

suited.js

  • Fragments are no fun

fragments

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
<section data-slide>
### Why?
* &nbsp;
  - &nbsp;
</section>

<section data-slide>
### Why?
* because
  - &nbsp;
</section>

<section data-figure>
### Why?
* because
  - it's like that!
</section>

markdown-it plugin?

  • I dont want to write another markdownit plugin
  • especially a complex one
    • it will probably add javascript to the markup

so I wrote my own markdown parser for Javascript?

  • Using Scala.js
  • how hard can it be?

Because I want to rely on a few libraries as possible while experimenting with Scala.js I decided to use Mark Hibberd’s Combinator Parser demo code from his previous Scalasyd talk “Dont fear the parser” as a starting point.

Parser recap

case class Parser[A](run: String => ParseState[A])

Parser recap

1
2
3
4
sealed trait ParseState[A]

case class ParseOk[A](input: String, value: A) extends ParseState[A]
case class ParseKo[A](message: String) extends ParseState[A]

Parsing is OK but I also want to render the parse tree into HTML, this is what Transformers are for.

Transformer typeclass

1
2
3
4
5
6
trait Transformer[T] {
  type IN
  type OUT

  def run(t: T, in: IN): OUT
}

Syntax pimp and Aux Pattern

1
2
3
4
5
6
7
8
9
10
11
// syntax pimps
implicit class TransformerOps[T0](foo: T0) {

  /**
   * magic wand. pimp alias of Transformer.run eg a transform function
   */
  def ---*[A,B](bar: A)(implicit aux: Transformer.Aux[T0,A,B]) : B = {
    aux.run(foo, bar)
  }
  ...
}

See my previous talk on the Aux Pattern

Now lets look at a specific member of the Transformer typeclass. The MarkdownToHtml transformer.

I wanted to split the transform function into two parts, one to parse, and of course I want to use my markdown parser’s run function in this place, and one to render the ParseResult into HTML.

Not only does this make it easier to reason about but also (I Hope) easier to build a plugin interface where you can supply a function to parse modified markdown as long as it still produces a ParseResult[Markdown] and one to enrich the HTML or even transform it completely, say into rich text or PDF

MarkdownToHtml

1
2
3
case class MarkdownToHtml(
    p: String => ParseState[MarkdownDoc],
    r: ParseState[MarkdownDoc] => Html)
  • notice I’ve split it into 2 functions
    • one to parse, p
    • and one to render, r

MarkdownToHtml

use instance to join the Transformer typeclass
1
2
3
4
5
6
7
8
9
10
object MarkdownToHtml {
  import Transformer._
  import ast._

  // Use `instance` typeclass constructor to add MarkdownToHtml to the Transformer typeclass
  implicit val m2hTransformer:
     Transformer.Aux[MarkdownToHtml,String, parser.Html] =
       instance( (t, in) => t.r(t.p(in)) )
  ...
}

see MarkdownToHtml

MarkdownToHtml ---* function

  • Now MarkdownToHtml is in the Transformer typeclass
1
simple ---* """## this is h2 i presume"""

Demolition time

  • Scala parser and render in the sbt console

So let’s demonstrate the parser and renderer in the console.

Demo parser: start console

me@host $ sbt
sbt:Markdownem> console
scala> :paste consoleimports.txt
Pasting file consoleimports.txt...
import parser._
import parser.markdownParser._
import transformers._
import transformers.Transformer._
import transformers.MarkdownToHtml._

Notice that

1
[warn] Scala REPL doesn't work with Scala.js. You are running a JVM REPL.

The Scala REPL is only available to the Scala code. So only code built without depending on JavaScript libraries will work in this REPL demo.

But that’s OK, this demo has no JavaScript dependencies, it just generate JavaScript output.

Demo headerParser

1
scala> headerParser.run("""## this is h2 i presume""")
1
2
res0: parser.ParseState[ast.Markdown] =
  ParseOk(,H2(List(this is h2 i presume)))

Demo headerParser

1
2
scala> headerParser.run("""## this is h2 i presume
     | more input not in header""")

The header parser just parses a single header.

1
2
3
res3: parser.ParseState[ast.Markdown] =
  ParseOk(more input not in header,
    H2(List(this is h2 i presume)))

Notice that this is a successful parse, and that the ParseOK case class also contains the remaining input for further processing

Also notice that the H2 contains a List of more Markdown, in this case rawHtml. This is because the link text could contain more inline markdown, like so…

Demo headerParser

1
scala> headerParser.run("""## this _is_ h2 i **presume**""")
1
2
3
4
res1: parser.ParseState[ast.Markdown] =
  ParseOk(,H2(List(this ,
                Italic(List(is)),
                h2 i , Bold(List(presume)))))

Demo headerParser failing

1
scala> headerParser.run("""this _is_ NOT h2 i presume""")
1
2
3
4
scala> headerParser.run("""this _is_ NOT h2 i presume""")
headerParser.run("""this _is_ NOT h2 i presume""")
res0: parser.ParseState[ast.Markdown] =
  ParseKo(Input failed to match predicate.: instead saw char 't')

Admittedly, this is not the best error message (for better errors use the built in Scala combinator parser library rather than this demo). It failed to see a # as the first char in a header and so produced a ParseKo instead of ParseOk.

The important thing to see is that a parser stops as soon as it can go no further.

In order to parse a whole document we can use the combinator methods, such as ||| i.e. or on Parser and some of the parsers such a list to compose more powerful parsers that continue parsing input by trying one parser after another.

An example is the markdownParser which will try all blockParsers followed by the inlineParsers continuously until it can parse no more.

However that will only occur when there is no more input because the last parser it tries is always the rawHtml parser which will always succeed because any input that is not parsed as markdown must be is just treated as RawHtml.

Demo markdownParser

1
2
scala> markdownParser.run("""this _is_ NOT h2 i presume
     | ## but this _is **h2**_""")
1
2
3
4
5
6
7
res4: parser.ParseState[List[ast.Markdown]] =
  ParseOk(,List(this ,
             Italic(List(is)),
              NOT h2 i presume,
             Hardwrap,
             H2(List(but this ,
               Italic(List(is , Bold(List(h2))))))))

Now we can see that the input parses into a List of Markdown. This is all very well but it’d be nice to see the rendered HTML.

This is where simple MarkdownToHtml Transformer comes in.

I can now use the ---* magic wand function

Demo simple render

1
2
scala> simple ---* """this _is_ NOT h2 i presume
     | ## but this _is **h2**_"""
1
2
3
4
res1: transformers.MarkdownToHtml.m2hTransformer.OUT =
"this <em>is</em> NOT h2 i presume
<h2>but this <em>is <strong>h2</strong></em></h2>
"

Demo simple render a complex list

1
2
3
4
5
6
7
8
scala> simple ---* """* unordered list item
     |   1. _nested_ list item
     |   2. **another _nested_** list item
     | * unorderd [link](http://foo.bar) list item
     | 
     | [a ref link][1] whose url detail is at the end
     | 
     | [1] http://the.reflink.com"""

Demo simple render a complex list result

1
2
3
4
5
6
7
8
9
10
11
12
13
14
res0: transformers.MarkdownToHtml.m2hTransformer.OUT =
"<ul>
    <li>unordered list item</li>
    <ol>
        <li><em>nested</em> list item</li>
        <li><strong>another <em>nested</em></strong> list item</li>
    </ol>
    <li>unorderd <a href="http://foo.bar">link</a> list item</li>
</ul>

<a href="http://the.reflink.com">a ref link</a> whose url detail is at the end
<p>
<!-- [1] http://the.reflink.com --></p>
"

So… Performance?

What about Performance?

I haven’t run proper benchmarks yet, but as a quick and dirty test I have added timing output to the main function.

remember this in build.sbt

1
scalaJSUseMainModuleInitializer := true
  • After compiling to JavaScript
    • adds a call to the main function in the MainClass
    • at the end of the JavaScript.
  • This makes the main function run when the JS is loaded

We can get sbt run task to do the same for us

1
2
3
4
5
6
7
8
9
sbt
[info] Loading settings from idea.sbt ...
[info] Loading global plugins from /home/robertk/.sbt/1.0/plugins
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from /home/robertk/projects/skunk/markdownem_js/project
[info] Loading settings from build.sbt ...
[info] Set current project to Markdownem (in build file:/home/robertk/projects/skunk/markdownem_js/)
[info] sbt server started at 127.0.0.1:5300
sbt:Markdownem>

We can get sbt run task to do the same for us

1
2
3
4
sbt:Markdownem> run
[info] Running tutorial.webapp.TestApp
[error] java.io.IOException: Cannot run program "node": error=2, No such file or directory
...

What happened?

It is an error from sbt as it is trying to run the default JavaScript runtime Node.js to execute the generated JavaScript.

sbt can also run using other JavaScript runners such as PhantomJS, Selenium or Rhino see the docs

I had better load NodeJS onto my PATH, i use nvm for this.

 

WTF!

#

WTF!

  • It was trying to use node to run the JavaScript.
  •  

#

WTF!

  • It was trying to use node to run the JavaScript.
  • I better load it onto my PATH

1
2
3
4
5
6
7
8
9
sbt:Markdownem> exit
$ nvm use v7.9.0
Now using node v7.9.0 (npm v4.2.0)
$ sbt
[info] Loading settings from idea.sbt ...
...
[info] Set current project to Markdownem (in build file:/home/robertk/projects/skunk/markdownem_js/)
[info] sbt server started at 127.0.0.1:5300
sbt:Markdownem>

We can get sbt running NodeJs

1
2
3
4
5
6
7
8
9
10
11
sbt:Markdownem> run
[info] Running tutorial.webapp.TestApp
>>>>>>>>>>> ms: -> 690
>>>>>>>>>>> ms: -> 443
>>>>>>>>>>> ms: -> 434
>>>>>>>>>>> ms: -> 447
>>>>>>>>>>> ms: -> 423
>>>>>>>>>>> ms: -> 430
[success] Total time: 4 s, completed 21/09/2017 9:53:01 AM
sbt:Markdownem>
...

The TestApp just renders 112 lines of markdown into HTML a few times.

We can see that after the initial run where it starts the node environment it ends up taking about 430ms.

That seems slow!

seems a bit slow!

  • 112 lines of markdown in 430ms

  •  

seems a bit slow!

  • 112 lines of markdown in 430ms

  • lets compare the JS to a scala/JVM run

If we comment the JS lines in build.sbt we can build and run the main app in standard Scala on the JVM

comment Scala.js from build.sbt

1
2
3
4
5
6
7
8
9
// enablePlugins(ScalaJSPlugin)

// This is an application with a main method
// change this to true if you want the The TestApp main class to be a JS "Application"
// scalaJSUseMainModuleInitializer := true

name := "Markdownem"
scalaVersion := "2.12.2"
...

now run as a ScalaJVM app

1
2
3
4
5
6
7
8
9
10
11
12
13
sbt:Markdownem> reload
sbt:Markdownem> run
[info] Compiling 8 Scala sources to /home/markdownem_js/target/scala-2.12/classes ...
[info] Done compiling.
[info] Packaging /home/markdownem_js/target/scala-2.12/markdownem_2.12-0.1-SNAPSHOT.jar ...
[info] Done packaging.
[info] Running tutorial.webapp.TestApp
>>>>>>>>>>> ms: -> 592
>>>>>>>>>>> ms: -> 263
>>>>>>>>>>> ms: -> 234
>>>>>>>>>>> ms: -> 229
>>>>>>>>>>> ms: -> 225
>>>>>>>>>>> ms: -> 235

On the JVM same code runs 200ms faster

  • 112 lines of markdown in ~ 230ms

  •  

  •  

  •  

On the JVM same code runs 200ms faster

  • 112 lines of markdown in ~ 230ms

  • But this is not really a fair comparison with NodeJS

  •  

  •  

On the JVM same code runs 200ms faster

  • 112 lines of markdown in ~ 230ms

  • But this is not really a fair comparison with NodeJS

  • The Javascript was not fully optomised because it would take longer to compile

  •  

On the JVM same code runs 200ms faster

  • 112 lines of markdown in ~ 230ms

  • But this is not really a fair comparison with NodeJS

  • The Javascript was not fully optomised because it would take longer to compile

  • So lets optomise it.

use the optimising compiler

  • The Javascript was compiled with the FastOptStage compiler
    • meaning it was fast to compile
  • We can tell sbt to run with the FullOptStage compiler
    • which will optimise much more

un-comment Scala.js, put it back into build.sbt

1
2
3
4
5
6
7
8
9
enablePlugins(ScalaJSPlugin)

// This is an application with a main method
// change this to true if you want the The TestApp main class to be a JS "Application"
scalaJSUseMainModuleInitializer := true

name := "Markdownem"
scalaVersion := "2.12.2"
...

tell sbt to use Full optimisation

1
2
sbt:Markdownem> reload
sbt:Markdownem> set scalaJSStage in Global := FullOptStage
  • N.B. we can set it back with
1
sbt:Markdownem> set scalaJSStage in Global := FastOptStage

now run optimised Javascript

1
2
3
4
5
6
7
8
9
10
sbt:Markdownem> run
[info] Full optimizing ./target/scala-2.12/markdownem-opt.js
[info] Closure: 0 error(s), 0 warning(s)
[info] Running tutorial.webapp.TestApp
>>>>>>>>>>> ms: -> 590
>>>>>>>>>>> ms: -> 298
>>>>>>>>>>> ms: -> 281
>>>>>>>>>>> ms: -> 296
>>>>>>>>>>> ms: -> 270
>>>>>>>>>>> ms: -> 259

Javascript in NodeJS almost as quick as JVM

  • 112 lines of markdown in 260ms

But still it seems slow

  • Combinator parsers are recursive decent parsers
  • every or ||| branch backtracks on the input
    • and re-parses of the first parser fails
  •  
 

But still it seems slow

  • Combinator parsers are recursive decent parsers
  • every or ||| branch backtracks on the input
    • and re-parses of the first parser fails
  • Packrat parsing’s memoization can massivly improve it.
 

But still it seems slow

  • Combinator parsers are recursive decent parsers
  • every or ||| branch backtracks on the input
    • and re-parses of the first parser fails
  • Packrat parsing’s memoization can massivly improve it.
caveat: also I’ve done no code optimisation yet

Runing in the sbt prompt is all very well.

But how can I run this code in my browser?

So… how do I run it in my browser?

export a javascript function to the “toplevel”

  • this allows other javascript ion a page to call it.
  • add @JSExportTopLevel annotation to the method to export
1
import scala.scalajs.js.annotation.JSExportTopLevel

export a javascript function to the “toplevel”

  • add @JSExportTopLevel annotation to the method to export
1
2
@JSExportTopLevel("mdmagic")
def transform(md: String): String = simple ---* md

compile the optimised JavaScript

1
2
3
4
5
6
sbt:Markdownem> fullOptJS
[info] Compiling 8 Scala sources to ./target/scala-2.12/classes ...
[info] Done compiling.
[info] Full optimizing /home/robertk/projects/skunk/markdownem_js/target/scala-2.12/markdownem-opt.js
[info] Closure: 0 error(s), 0 warning(s)
[success] Total time: 16 s, completed 21/09/2017 11:26:04 AM

import the markdownem-opt.js in a html page

1
<script type="text/javascript" src="./target/scala-2.12/markdownem-opt.js"></script>
NB you probably want to set scalaJSUseMainModuleInitializer := false in build.sbt
to prevent the slow main app test from running when the page loads

simply use the mdmagic function in some javascript
1
2
3
element = document.getElementById(elementId)
theContent = element.innerHTML
element.innerHTML = mdmagic(theContent)

* Open test.html in your browser - to see it render the markdown as html

For Remaining topics eg

Javascript interop

TODO….

  • just use scala lib PackratParsers or sbt-rats
    • better error messages
    • faster
  • add the plugin interface
  • write the fragments plugin
  • re-write suited.js in scala.js
  • look at a recursion scheme for render
  • Free Monad/Applicative for renderer/interpreter

The End

Comments