Recap: some of the things I tried, did, and lessons learned along the way.

My exposure to modern JavaScript web frameworks started with EmberJS back in 2013, on a project developing an inventory system.

I didn't consider myself a fullstack developer back then. I was dealing more with database design, workflow (using the workflow management system Activiti), and REST middleware (using Scala's Play framework at the time), but I had to chip in to help speed up the work of the frontend developer.

The experience with EmberJS was enough to draw my interest to modern webapp development. Coming from a Java background with JSP, I used to consider the web frontend of lesser importance (my bad), merely a simple window onto what we have in the middleware. But with a modern web UI framework like EmberJS, things got a lot more interesting (from an object-oriented programmer's point of view).

I already knew patterns like Presentation Model before then, but thought they were only applicable in desktop settings (with the Swing framework, for example), where we can manage the state in the frontend. Doing that with Servlet-JSP-HTML was a bit tricky because of the split between the UI (HTML) and the underlying model (state), which resides in the middleware.

Back then (before I knew EmberJS) the closest thing to the ideas in Presentation Model for me was the Tapestry framework (I tried it and was wowed by how it let us write server-side UI components with all the bindings to the model). But I moved on to the Stripes framework because I found Tapestry too esoteric at the time.

Fast forward to AngularJS. The switch from EmberJS to AngularJS was for two reasons: first, I found AngularJS simpler and more pragmatic than EmberJS; second, it's Google (and the popularity and developer community that entails). I used it for the first time to develop a dashboard for displaying and monitoring election results and simulations: https://www.youtube.com/watch?v=_EwkqRivmiY and https://www.youtube.com/watch?v=0xGo_JyhfMs .

Along with it, I developed an interest in data visualization (I'd been in the data space ever since I started working with my friend Edmundo at Baktun). I came across the D3JS library and liked it a lot (a simple programming model: data coming in, data going out, data at the intersection -- and you define a handler for each of those cases).
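That programming model is D3's enter / update / exit pattern. A minimal sketch (not code from any of the visualizations below; `points` is a hypothetical array of {id, x, y} objects):

```javascript
// Join an array of points to <circle> elements; D3 splits the selection into
// three cases -- enter, update, exit -- and we define a handler for each.
function render(svgSelector, points) {
  const circles = d3.select(svgSelector)
    .selectAll('circle')
    .data(points, d => d.id);          // key function matches data items to elements

  circles.enter()                      // data items with no matching element yet
    .append('circle')
    .attr('r', 5)
    .attr('cx', d => d.x)
    .attr('cy', d => d.y);

  circles                              // data items that already have an element
    .attr('cx', d => d.x)
    .attr('cy', d => d.y);

  circles.exit()                       // elements whose data item is gone
    .remove();
}
```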

Here are some of the visualizations I wrote: https://www.youtube.com/watch?v=SsUtlhKkSHo , https://www.youtube.com/watch?v=GXDUsm7BUQ0 , https://www.youtube.com/watch?v=k_ybN8SDhnw , and here is a little framework to ease working with charts (you just provide data in JSON that complies with a certain format, along with the metadata): https://www.youtube.com/watch?v=GZdvCQIvSA4

Then I had a project with a friend developing a pothole-reporting app. This was my first experience developing a complete mobile app (not just a few form-type screens). I took the path I deemed easier, which was to leverage my experience with AngularJS: I developed it as a hybrid mobile app using the Ionic framework.

The video: https://vimeo.com/175320858 . The mobile app uses the camera, offline storage, and has account linking (with Facebook) through OAuth 2.0, which allows sharing the pothole report to a Facebook page and to your personal newsfeed. The company that gave me the project is a road maintenance company; they make money from government assignments to fix the potholes, so it's in their best interest to make the reports go viral, to add some pressure.

Along with it I developed the dashboard web application (also in Angular) that shows all the reports on a map, along with their statuses, and lets you assign workers to fix them.

I also used the same AngularJS knowledge to develop a mobile app used by the person conducting interviews (in a survey project). At that time I was working with a friend (Edmundo) whose family runs a survey business. The mobile app connects to a middleware to download the questionnaires (JSON documents) associated with the projects the interviewer is assigned to. The application has an engine that interprets the questionnaire and guides her through its list of questions. We can have JavaScript expressions inside the JSON document that serve as rules for deciding where to go next (which question) based on the answer to the current question.
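To give an idea of the mechanism, here is a minimal sketch; the actual questionnaire format was richer than this, and the field names below are made up:

```javascript
// Hypothetical questionnaire: each question carries a "next" JavaScript
// expression, evaluated against the answers collected so far.
const questionnaire = {
  start: 'q1',
  questions: {
    q1: { text: 'Do you own a car?',   next: "answers.q1 === 'yes' ? 'q2' : 'q3'" },
    q2: { text: 'Which brand?',        next: "'q3'" },
    q3: { text: 'How do you commute?', next: "null" }   // null = end of the interview
  }
};

// The engine evaluates the expression with the answers in scope to pick the next question.
function nextQuestionId(questionId, answers) {
  const expression = questionnaire.questions[questionId].next;
  return new Function('answers', `return (${expression});`)(answers);
}

console.log(nextQuestionId('q1', { q1: 'yes' })); // -> 'q2'
console.log(nextQuestionId('q1', { q1: 'no'  })); // -> 'q3'
```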

The app was specifically written to deal with poor internet connections, so it has an offline storage feature. The challenge was getting proper, robust syncing between offline and online data. Video: https://vimeo.com/175312345
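The gist of the approach is an outbox-style queue: write locally first, flush to the server when connectivity returns. A rough sketch only (the endpoint and storage details here are made up, and the real app also had to deal with conflicts and retries):

```javascript
// Answers are appended to a local outbox, then flushed when the device is online.
const outbox = JSON.parse(localStorage.getItem('outbox') || '[]');

function saveAnswer(answer) {
  outbox.push({ ...answer, savedAt: Date.now() });
  localStorage.setItem('outbox', JSON.stringify(outbox));
  if (navigator.onLine) flushOutbox();
}

async function flushOutbox() {
  while (outbox.length > 0) {
    // hypothetical endpoint; send the oldest pending answer first
    await fetch('/api/answers', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(outbox[0])
    });
    outbox.shift();
    localStorage.setItem('outbox', JSON.stringify(outbox));
  }
}

window.addEventListener('online', flushOutbox);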

Encouraged by the positive feedback from clients and users, we decided to build a survey platform, on the premise that we could make a living by renting this platform to other companies running surveys. We wanted to make it a SaaS, where everything is in the cloud, but... we ended up with a kind of hybrid (ASP and SaaS), where every single project is assigned its own stack (from DB instances, to REST API, to dashboard app).

To help a bit with deployments, we went with containerization: Docker. That was also my first time taking on DevOps tasks. With some Bash scripting along with docker-compose (later replaced by a more visual solution, Rancher), we were able to achieve "push-button" deployment of those containers. We added an API gateway into the mix (Kong) to automate routing configuration, so we could establish routing schemes like this for each project: dashboard-web.[project-name].[organization-name].baktun.net, dashboard-api.[project-name].[organization-name].baktun.net, mobile-api.[project-name].[organization-name].baktun.net, etc.
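Automating that routing essentially boils down to registering, for each project, a service and a host-based route in Kong. A minimal sketch against Kong's admin API as it looks in recent versions (the /services and /routes endpoints); the names, ports, and upstream URL below are made up, and the Kong admin API has changed across versions, so take this as the idea rather than the exact calls we made back then:

```javascript
const fetch = require('node-fetch');

const KONG_ADMIN = 'http://localhost:8001';   // Kong admin API, assumed default port

// Register one project's dashboard frontend so that
// dashboard-web.<project>.<organization>.baktun.net routes to its container.
async function registerRoute(project, organization, upstreamUrl) {
  const name = `dashboard-web-${project}-${organization}`;

  // 1. a service pointing at the project's dashboard container
  await fetch(`${KONG_ADMIN}/services`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ name, url: upstreamUrl })
  });

  // 2. a route that matches the per-project host name
  await fetch(`${KONG_ADMIN}/services/${name}/routes`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ hosts: [`dashboard-web.${project}.${organization}.baktun.net`] })
  });
}

registerRoute('election-2018', 'acme', 'http://dashboard-web-election-2018:80');
```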

We learned that this deployment model was not the most efficient in terms of resource usage. But that was because the system itself was at a very early stage, maybe prematurely launched into the wild, and each survey project that came to us had different reporting requirements (thus different aggregations in the REST API and different pages / elements in the dashboard). So we had to modify the REST API and the dashboard frontend to cope with the requirements of each project. The design of our system was not flexible enough to handle the different requirements presented to us (we were still learning).

I realized that situation was far from ideal, and it drained us: we had to write aggregation queries and dashboard frontends for each project. A better solution would have allowed users themselves to "write" the queries and "construct" the dashboard. We explored various options, including incorporating an off-the-shelf solution similar to Tableau..., namely Superset from Airbnb. It was compelling to us because it can pull data from Druid, the same database we used for analytics of survey results.

In the end we didn't use Superset, although I still think it could have been the best route, if only we had known how to sort out the integration challenges (especially access control: how to bridge the ACL model of our survey realm with the ACL model of Superset itself). In the interim, I developed a simple charts library, which is just a thin layer on top of a commercial charting library called Highcharts. The extra abstraction aims to simplify the use of the charting library, providing a uniform data & metadata structure for different types of charts. Video: https://www.youtube.com/watch?v=GZdvCQIvSA4
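The idea, roughly: the caller hands over a uniform { metadata, data } structure and the wrapper translates it into a Highcharts configuration. A sketch with made-up field names, not the library's actual format:

```javascript
// Translate a uniform { metadata, data } structure into a Highcharts config.
function renderChart(containerId, { metadata, data }) {
  return Highcharts.chart(containerId, {
    chart:  { type: metadata.type },                       // 'column', 'line', 'pie', ...
    title:  { text: metadata.title },
    xAxis:  { categories: data.map(row => row[metadata.xField]) },
    series: [{
      name: metadata.title,
      data: data.map(row => row[metadata.yField])
    }]
  });
}

renderChart('chart-container', {
  metadata: { type: 'column', title: 'Responses per state', xField: 'state', yField: 'count' },
  data: [
    { state: 'Jalisco', count: 120 },
    { state: 'Nayarit', count:  45 }
  ]
});
```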

For the report builder, we experimented with code generation using Blockly (https://developers.google.com/blockly). The idea was to provide a set of primitive (and not-so-primitive) blocks that enable end users to shape the report. The blocks can be combined like Lego pieces, and together they essentially perform a series of operations that transform the results of queries against the interview database into the JSON structure the users want, which can then be fed into the charts mentioned above. I made a video presentation showing the use of Blockly in the survey platform here: https://vimeo.com/205063500 . I also gave a talk at the JavaScript user group in Guadalajara about Blockly; the slides are here: https://app.box.com/s/w9z9mitvd2kovmnruxzon891cxf4lkgs
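To illustrate what such a block looks like, here is a made-up "filter rows" block defined against the classic Blockly block / generator API (not one of the actual blocks from the platform):

```javascript
// Block definition: "filter rows where <predicate>", operating on an input list.
Blockly.Blocks['filter_rows'] = {
  init: function () {
    this.appendValueInput('ROWS')
        .appendField('filter rows where');
    this.appendDummyInput()
        .appendField(new Blockly.FieldTextInput('age > 30'), 'PREDICATE');
    this.setOutput(true, 'Array');
    this.setColour(210);
  }
};

// Code generator: turns the block into a JavaScript .filter(...) expression,
// which becomes one step in the report's transformation pipeline.
Blockly.JavaScript['filter_rows'] = function (block) {
  const rows = Blockly.JavaScript.valueToCode(block, 'ROWS', Blockly.JavaScript.ORDER_NONE) || '[]';
  const predicate = block.getFieldValue('PREDICATE');
  const code = `${rows}.filter(row => (${predicate}))`;
  return [code, Blockly.JavaScript.ORDER_FUNCTION_CALL];
};
```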

Out of the need for a modern / future-proof mechanism for securing the services that make up the survey platform, I did some learning on OAuth 2.0. One of the articles I liked most -- and which helped me in my attempt to implement the mechanism -- was this: https://nordicapis.com/how-to-control-user-identity-within-microservices/ . It helped me build my own (poor man's) implementation of single sign-on, which I explain here, https://www.youtube.com/watch?v=r7FAuAlKIqY , using a combination of the OAuth 2.0 plugin provided by the Kong API gateway and a custom authentication server implemented with ExpressJS.

At some later point, I came across an Identity & Access Management system from Red Hat called Keycloak. I got hooked on it because of its features (from basic OAuth 2.0 and OpenID Connect, to ease of administration, to User-Managed Access -- the most interesting, allowing scenarios like lending access to co-workers / other people), etc. I did a proof of concept of using Keycloak for single sign-on, and along the way I came across a bug for which I participated in providing a fix: https://issues.jboss.org/browse/KEYCLOAK-8828
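On the Node.js side of such a proof of concept, protecting a service with Keycloak is fairly small thanks to the keycloak-connect adapter. A minimal sketch (it assumes a keycloak.json file, exported from the Keycloak admin console, sits next to the script; the route and session secret are made up):

```javascript
const express = require('express');
const session = require('express-session');
const Keycloak = require('keycloak-connect');

const app = express();
const memoryStore = new session.MemoryStore();

// keycloak-connect reads the realm / client settings from keycloak.json by default.
const keycloak = new Keycloak({ store: memoryStore });

app.use(session({ secret: 'change-me', resave: false, saveUninitialized: true, store: memoryStore }));
app.use(keycloak.middleware());

// Unauthenticated requests get redirected to the Keycloak login page (OpenID Connect).
app.get('/reports', keycloak.protect(), (req, res) => {
  res.json({ user: req.kauth.grant.access_token.content.preferred_username });
});

app.listen(3000);
```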

I'm scheduled to give a talk at the upcoming Java User Group meetup in Guadalajara this November 27, and I'm going to be presenting Keycloak.

In short, I had the opportunity to work with / put into use different types of shiny stuff during the development of this survey platform. I also learned, maybe the hard way, that you should start simple. Probably all those ambitions to play with big data -- through the use of Druid, Hadoop, the SaaS model, containerization along with its automation, etc. -- weren't really necessary. Using heavy stuff like Druid and Hadoop comes with administration costs, which take engineering time away. I shared some of the issues, based on my experience, here: http://jananuraga.blogspot.com/2017/04/what-should-we-expect-from-devops.html

We probably should have focused more on the actual challenges (like providing a simple UI for administering survey plans, or a simple UI for constructing the questionnaire JSON, or making the design of the dashboard frontend pluggable so that we could plug in custom report-generation scripts for each project, etc.).

That, along with limited team power (only 5 programmers, including myself, the only senior) and a lack of focus, eventually killed the team (programmers resigned, wanting to be in a healthier work environment where the software engineering process is more appreciated). Among the things we had to juggle were:

1. Having to deal with many survey projects.
2. At the same time, trying to build version 2.0, with all the concerns we had learned along the way taken into account in the design.
3. Side projects imposed upon our team that had nothing to do with the survey platform (data cleansing at an insurance company and a school-bus fleet management system).

In the end I lost the team I had spent almost a year building, because of the lack of focus and of proper handling of the engineering process (constantly cornered in "urgent" mode)..., and that put a halt to the realization of those ideas.

Nevertheless, I was able to keep some lessons learned from it, and some encouragement from my ex-teammates: https://app.box.com/s/t1i8zqpdeervxffo8tqwnrhcwbtdpl77 and https://app.box.com/s/c936cj737mravbd3n6587w13d3oc16zw

On Scala programming..., I picked up that skill about 4-5 years ago (at that time it was considered the hottest and shiniest programming language). Especially with the big data hype, everybody was talking about Spark, and Scala is at the heart of it. That was my primary motivation to learn Scala (apart from the fact that Java was losing steam for a moment at that time).

I picked up some new perspectives on writing software: functional programming, reactive programming (with the reactive manifesto's "let it crash", as opposed to the defensive programming I was used to in Java), actor-based programming, and ES/CQRS (Event Sourcing / Command-Query Responsibility Segregation).

I gave a talk at the Java User Group Guadalajara about 4 years ago; the slides are here: https://vimeo.com/163235306
I made a presentation on ES/CQRS using the Lagom framework here: https://vimeo.com/166771346

The frameworks I used when working with Scala were:

1. Akka (I used it to build a Twitter crawler daemon)
2. Play framework (I used it in the inventory-system project, along with a BPMN (business process management) engine named Activiti: https://www.youtube.com/watch?v=yKLaIGPmUn0 )
3. Lagom

I got really hooked on the ES/CQRS model, because experience has taught me at least two things:

(1) a serious system must have an audit trail;
(2) CRUD systems kept leading me to the same problems:
    1. an anemic domain model
    2. a losing battle of trying to have an entity-relationship model that works well for both write operations and read operations
    3. concurrency issues (with transactions, which have a lot to do with the way our domain is modeled).

In an ES architecture, the audit trail is at the heart of the design (no longer an afterthought / peripheral): the events essentially are the audit trail. CQRS promotes separating the write side from the read side (by having separate applications), which basically circumvents the losing battle mentioned above. Concurrency issues are tackled by serializing the operations (events are processed serially).
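Stripped of any framework, the write side boils down to something like this. A toy sketch of the idea, using a survey interview as the aggregate (this is not the cqrs.js.org API, just the concept):

```javascript
// Rebuild the current state by folding the event log -- the log doubles as the audit trail.
function applyEvent(state, event) {
  switch (event.type) {
    case 'InterviewStarted':  return { ...state, status: 'started', interviewer: event.interviewer };
    case 'AnswerRecorded':    return { ...state, answers: { ...state.answers, [event.questionId]: event.value } };
    case 'InterviewFinished': return { ...state, status: 'finished' };
    default:                  return state;
  }
}

// A command is validated against the current state and, if accepted, appends new events.
// Events are never updated in place, and processing commands one at a time per aggregate
// sidesteps most of the concurrency issues mentioned above.
function handleRecordAnswer(events, command) {
  const state = events.reduce(applyEvent, { status: 'new', answers: {} });
  if (state.status !== 'started') throw new Error('interview is not in progress');
  return events.concat({
    type: 'AnswerRecorded',
    questionId: command.questionId,
    value: command.value,
    recordedAt: Date.now()
  });
}
```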

In the NodeJS world, I did quite a lot of exploration to find an open source framework I could build my ES/CQRS solution upon. At the time of writing we have Wolkenkit, NestJS, and an unnamed / too-generically-named framework written by Adriano Raiano (http://cqrs.js.org/).

When I started looking for an ES/CQRS solution in NodeJS, there was only http://cqrs.js.org/ . Actually there were others, but they didn't address production-level concerns the way cqrs.js.org does, such as aggregate locking, command deduplication, etc. So I stuck with it. I made a presentation about it: https://vimeo.com/235661687

Last but not least, I have also implemented a solution that uses a lot of messaging (RabbitMQ). I used that technique in the renewed Twitter crawler (previously I had written it with Akka, using actor-based programming, which is basically also built on messaging, but.... with its own mechanism, not a message broker like RabbitMQ).
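In amqplib terms, the pattern looks roughly like this. A sketch only: the queue names are made up, and the URL-expanding step stands in for any of the crawler's workers:

```javascript
const amqp = require('amqplib');
const fetch = require('node-fetch');

// Follow redirects to resolve a short URL to its final destination.
async function expandShortUrl(shortUrl) {
  const response = await fetch(shortUrl, { method: 'HEAD', redirect: 'follow' });
  return response.url;
}

// One microservice: consume short URLs from its work queue, expand them,
// and publish the result for the next service in the chain.
async function startUrlExpander() {
  const connection = await amqp.connect('amqp://localhost');
  const channel = await connection.createChannel();

  await channel.assertQueue('urls.to-expand', { durable: true });
  await channel.assertQueue('articles.to-analyze', { durable: true });

  channel.consume('urls.to-expand', async (msg) => {
    const { shortUrl } = JSON.parse(msg.content.toString());
    const fullUrl = await expandShortUrl(shortUrl);
    channel.sendToQueue('articles.to-analyze',
      Buffer.from(JSON.stringify({ fullUrl })), { persistent: true });
    channel.ack(msg);
  });
}

startUrlExpander().catch(console.error);
```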

The NodeJS + RabbitMQ version is based on microservice ideas: a federation of individual services, each doing a specific task (following the UNIX principle: do one thing and do it well), which communicate among themselves using messages that run through a message broker (RabbitMQ). The Twitter-crawling business is not just about querying the Twitter API for specific keywords. It also involves other things, like:

1. Finding out who shares the tweets that match the query, who retweets, who replies, etc. (in the hope of finding patterns like "cliques" among Twitter users).
2. Expanding the short URLs of articles mentioned in the tweets (because we are interested in knowing how popular the articles are).
3. Going into the articles, inspecting their content, and extracting relevant information: who the author is, what their Twitter account is, which tweets they included in the article, what the related articles are, what the relevant keywords in the article are (and based on that we organically expand our search).
4. Building a graph out of that data (in order to be able to do graph analysis later; see the sketch right after this list). I made a video presentation on Neo4J here: https://www.youtube.com/watch?v=pUPgY3piqaQ
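Writing into the graph from Node.js looks roughly like this, using recent versions of the official neo4j-driver (the data model and credentials below are made up for illustration):

```javascript
const neo4j = require('neo4j-driver');

const driver = neo4j.driver('bolt://localhost:7687', neo4j.auth.basic('neo4j', 'secret'));

// Record "user -[:SHARED]-> tweet" edges so we can look for cliques and
// influence patterns later with Cypher queries.
async function recordShare(userHandle, tweetId) {
  const session = driver.session();
  try {
    await session.run(
      `MERGE (u:User  {handle: $handle})
       MERGE (t:Tweet {id: $tweetId})
       MERGE (u)-[:SHARED]->(t)`,
      { handle: userHandle, tweetId }
    );
  } finally {
    await session.close();
  }
}

recordShare('@someuser', '1060000000000000000').then(() => driver.close());
```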

I wrote a blog post and made a video presentation about the Twitter crawler (microservice architecture) here: http://jananuraga.blogspot.com/2018/03/introducing-glazplatova-10.html

I also used RxJS (a reactive programming library for NodeJS), because I needed a somewhat fancy signaling scheme for querying the Twitter API (the crawler has separate channels for querying tweets and users; I needed to give the tweet stream higher priority while there were hits..., and only ramp up the users stream once the end of the tweet search results was reached).
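One way to express that priority scheme in RxJS (a simplified sketch, not the actual crawler code; the exhaustion flag and the commented-out user-lookup call are made up for illustration):

```javascript
const { interval, BehaviorSubject } = require('rxjs');
const { withLatestFrom, filter, map } = require('rxjs/operators');

// Flips to true once the crawler reaches the last page of tweet search results.
const tweetsExhausted$ = new BehaviorSubject(false);

// The lower-priority user-lookup channel only ticks while the tweet channel is drained.
const userLookupTicks$ = interval(5000).pipe(
  withLatestFrom(tweetsExhausted$),
  filter(([, exhausted]) => exhausted),
  map(([tick]) => tick)
);

userLookupTicks$.subscribe(() => {
  console.log('tweet stream drained, fetching the next batch of user profiles...');
  // lookupNextUserBatch();  // hypothetical call into the users channel
});

// Elsewhere, the tweet-search pipeline signals when it hits the end of the results:
// tweetsExhausted$.next(true);
```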

I also made a fancy infix expression evaluator to demonstrate how to do stream programming with RxJS: https://github.com/rakamoviz/infixCalc



Comparison: twitter-crawler in Akka vs. in RxJS

It's been two years since I wrote my first Twitter crawler, at that time using the Akka framework in the Scala programming language. Last month I wrote another (better, cleaner) Twitter crawler, using a microservice architecture (complete with message queuing and caching), this time using the RxJS library in JavaScript (see the blog post about Glazplatova).

... and I felt the need to articulate the difference between the two..., or maybe the reason why. OK, spoiler alert: my motive wasn't actually purely technical. I can't really tell whether RxJS is better than Akka for this particular use case (or vice versa) -- more on that later. I was just having some difficulty understanding the complex Scala/Akka code I had written more than two years ago.

It was becoming hard for me to add new features to that code (yeah, I broke the "don't code today what you can't debug tomorrow" principle). Besides, these days -- for about a year now -- I've been making heavy use of RxJS in NodeJS scripts, mostly for ETL purposes, where I have to do transformations over data that comes in as streams.

Precisely because of this "freshness of RxJS in my head", I now feel the need to refresh my knowledge of Scala/Akka. Making this video serves that purpose for me, and I hope the audience can benefit from watching it somehow :) As usual, I made this video without a script, so my apologies for getting off track at several points. Without further ado, here's the video:

Now, as I got to the end of the recording, curiosity crept in, primarily because even after all that explaining and demoing, I don't think I managed to make a point. What exactly is the difference? Or maybe a better question would be: when to use Akka? When to use Rx? Are they equivalent? Are they alternatives to (competing with) each other? I googled it and found this: What are the main differences between Akka and RxJava?

Well..., that was it: if you care about distributing the crawling activities across several nodes, then use Akka, as it handles the distribution of actors across several nodes automatically and transparently.

In light of that..., was writing the crawler in RxJS a bad decision (because there is no automatic replication and distribution)? Not really. In my case, there is no need to have multiple instances of the crawler. Why? Because of the rate limit in the Twitter API (we can only make a certain number of requests, during a certain period of time, to a certain endpoint). If you have more than one instance of the crawler, all of them using the same Twitter account, and each one running on a separate node..., it would be very difficult to control the (total amount of) requests per minute, and you would hit the ceiling very (or too) quickly.... But... if each crawler uses a distinct Twitter account, then it would be a different story. In that case, I would simply run another instance of the crawler (another NodeJS process, possibly on a separate node), which would run independently from the first instance.

Besides, the crawling (talking to the Twitter API) is only part of the bigger picture. There are other activities, as I showed in my video, such as: resolving the short URLs of the embedded articles, extracting metadata from articles, analyzing the content of articles, storing in different types of databases (Mongo, SQL Server, Neo4j), plus whatever else you can come up with, such as sentiment analysis in "real time". None of them suffers the same constraint as the crawler (that rate limiting). Moreover, the tasks they carry out are stateless..., so any of them can easily be scaled, simply by spawning another container instance of the service and making it listen to the same work queue as the existing instance. This sketch of the architecture of the crawler (RxJS version) can clarify what I just stated. It is arguably easier to understand and explain than my Akka version of the Twitter crawler. I insist, that's not Akka's fault. It's mea culpa... probably if I had known more about best practices in Akka, better tooling, etc. etc.... probably.

Now..., for the sake of discussion, let's assume I do care about replicating the crawler. The answer would be "use Akka". But then the question: why not Spark? It's also based on Scala, it has list transformation operators, and it also distributes the handling of items in the list (much like several actors in Akka running on different nodes, working in parallel, emptying their respective mailboxes, which are partitions of the complete stream, by means of a router).

Well..., this Twitter crawler is not so much about transforming things (as opposed to a Spark job, which is about a data-processing pipeline). It's more of a daemon process: a process that interacts with the outside world (pulling data from Twitter in this case), listens for external events (notifications from Redis in this case), and adjusts its interaction with the outside world accordingly, around the clock. Code in Spark is a __job__ (one-off); this Twitter crawler is a __daemon__. The data transformation / processing itself resides in other microservice(s)..., sentiment analysis for example..., which receive the stream of tweets emitted by the crawler and channeled through a message queue (I use RabbitMQ here). For that, indeed, we can consider using Spark -- spark-streaming -- especially if you think of using Spark's ML libraries for big data.

Now the question: why not Apache Flink (instead of spark-streaming)? :D Ooo... kay..., it's getting late now, maybe that's a topic for another blogpost. See ya!

Oh..., and here are some nice links about RxJS. They might be handy for future reference:

Playing around with my sweet "Bienvenida"

She is Bienvenida, my sweet little Aztec horse (a mare), now turning 5 years old.

I guess now I should start looking for a good horse so she can have a baby horse :)

Attn. Programmers: know some DevOps stuff. It's good for you

A somewhat lengthy video, an hour and a half. The original topic was DevOps for programmers; I wanted to make the point of why I think DevOps knowledge is important and beneficial to programmers (so yeah, learn container stuff, guys). It's like an expansion or continuation of the blog post about my view on DevOps, posted here: "What should we expect from DevOps."

I used the Twitter crawler I made (Glazplatova) as a vehicle for this video. I also have a blog post about that crawler, here: Introducing Glazplatova. Why that thing? Because it's quite a complex system, composed of several little microservices. Lots of different kinds of servers need to be brought up to get this entire crawler system running. That's where knowledge of Docker shines.

(Embedded video: "Attn. Programmers: know some DevOps stuff. It's good for you," from Cokorda Raka Angga Jananuraga on Vimeo.)

Along the way I also explain a little about the architecture of this crawler system: how I use messaging (RabbitMQ) and caching (Redis) for it. I also give a glimpse of reactive programming using RxJS (for processing the stream of tweets in a memory-efficient way, and elegantly..., while still being able to react to external events, like changes in the queries). I also happen to have a blog post about RxJS, here: Introduction to reactive programming

So yeah, without further ado, here's the video "Attn. Programmers: know some DevOps stuff. It's good for you." I hope you enjoy it and can take some value out of it. Please comment below or message me.

Oh..., and here's a little piece I forgot to include in the video above, about Neo4J (the graph database) I use in the crawler: