Las Vegas 2020

Monitoring NodeJS with Open Source

Keeping an eye on your application isn't simply pushing the logs somewhere to go nuts with regex later; it's much more than that in modern application stacks. We'll quickly cover the concept of observability and monitoring with open source software that can be used with NodeJS apps.


This session is presented by logz.io.

ME

Mike Elsmore

Developer Advocate, Logz.io

Transcript

00:00:13

Good morning, good afternoon, good evening. Coming to you from sunny Birmingham, even though you can't see it because of the cloud. So I'm here to talk about monitoring Node.js with open source, which is quite important in reality. I'm Mike Elsmore, I'm a developer advocate for Logz.io. If you wish to get hold of me, there's my email and my Twitter handle, which also happens to be my GitHub as well, so feel free to send any queries during or after this session. So what is the problem this is covering? Well, it's Node.js, or in reality it's just JavaScript, because some of this can apply to the front end as well. If you'd told me 10 years ago that you'd be seeing JavaScript everywhere, I'd probably have laughed, but now you do. And we're not just shipping one huge chunk of it every so often; in reality we're shipping hundreds of smaller components, be this in Docker containers with full Express stacks or hapi stacks. We're shipping Lambdas, we're shipping Azure Functions, we're shipping components, pieces, different bits to different places. We've got backend workers, we've got front end code, and all of this is producing data that we need to keep an eye on.

00:01:38

So we've got all these hundreds of components. There's a lovely little meme that I've been using recently to describe it, which is just "Lambdas, Lambdas everywhere", because when you start shipping one Lambda, you end up shipping hundreds of them. I personally really like Lambdas because they allow me to play with tech quite quickly and quite nicely. So when it comes to monitoring our systems, well, we're all quite used to just dumping stuff out to the console with JavaScript, just going console.log, or attaching it to a log file and leaving it be, just having it dump out. A lot of people are still in the earlier part of the 2010s, where we were just throwing stuff somewhere else to think about later. But what's the next step? How can we monitor our systems more effectively and

00:02:33

with more

00:02:34

insight into what's going on? That would be, in my opinion, observability. It's not a new term, it's not an old term, but for those who aren't aware, here's what it is. It's made of three components: metrics, logs, and traces. Logs are the historical information, specifically what has happened within your system; metrics are more of a "what is going on right now"; and traces, which we're not really going to cover too much in this session because 20 minutes is not long enough to cover everything, unfortunately, are the end-to-end flow of data and information, how your system is interconnected. This is hugely important in the large distributed systems we see these days, everything from the wonderful graphic of Netflix and all of its microservices and things like that. It's quite important for us to have knowledge of not only what an individual component is doing, but what the side effects of all of them are. So a lot of people are going: I don't need this, the JavaScript I write doesn't warrant it, or it's only the front end, or it's a little Lambda doing validation on an S3 bucket, or it's just a connector for some APIs to connect to something else. So we

00:04:01

don't need it, do we?

00:04:02

Well, you probably should, because even if it's a small component or piece that you don't think of every day, it's still within your system, and you should probably be keeping an eye on what your overall system is doing. So, easy mode first. This is the one that everybody's pretty much always aware of, has access to, and has played with before, especially when things have gone bang: event logs. It's fully easy mode. This is just immutable data with a timestamp attached to it. The common structures are plain text, so just strings in the usual formatted version, say the Apache log format or something like that. We then have structured logs, which are usually based on objects; we're all used to seeing these as well. In some systems you even see them as XML, which I always find a little odd, because that's a lot of extra gumpf to be sent over the wire.
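To make the plain-text versus structured distinction concrete, here is a minimal sketch using only Node.js built-ins; the field names are illustrative rather than taken from the demo.

```js
// Sketch: the same event as a plain-text line and as a structured JSON line.
// Field names (requestId, durationMs, ...) are hypothetical examples.
const event = {
  timestamp: new Date().toISOString(),
  level: 'info',
  message: 'gif downloaded',
  requestId: 'abc-123',   // hypothetical correlation id
  durationMs: 142,
};

// Plain text: easy to read, painful to query later without regex.
console.log(`${event.timestamp} [${event.level}] ${event.message} (${event.durationMs}ms)`);

// Structured: one JSON object per line, trivially indexable by Elasticsearch.
console.log(JSON.stringify(event));
```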

00:04:56

And then there's binary, most commonly protobufs. I haven't seen it recently in a lot of systems, but I did used to see programs that would just dump out protocol information, which, oddly enough, I never actually debugged against; I just saw it in the output of some of the applications I was playing with. As we're talking about this in an open source context, the best open source we have available to us, most of us, and especially me at Logz.io, is the ELK stack, which is Elasticsearch, Logstash and Kibana. Logstash isn't used that much anymore, especially with the advent of the Beats system, in particular Filebeat on the logging side, which just grabs a log file and streams it to Elasticsearch. So Elasticsearch is the storage engine and the language to do indexes and querying at the different levels; Logstash is the transformation and shipping component, which the Beats are as well; and Kibana is the visualization engine.
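As a rough sketch of where those structured lines end up, this is approximately what indexing one document into Elasticsearch could look like with the official @elastic/elasticsearch client; the index name, document shape and cluster address are assumptions, and in practice a shipper like Filebeat or a logging transport usually does this for you.

```js
// Sketch only: assumes the @elastic/elasticsearch client (v7-style API) and a
// locally reachable cluster; index name and document shape are made up.
const { Client } = require('@elastic/elasticsearch');

const client = new Client({ node: 'http://localhost:9200' });

async function shipLog(entry) {
  // Each log line becomes one document; Kibana can then search and
  // visualize on the indexed fields (level, message, @timestamp, ...).
  await client.index({
    index: 'node-app-logs',
    body: {
      '@timestamp': new Date().toISOString(),
      level: entry.level,
      message: entry.message,
    },
  });
}

shipLog({ level: 'info', message: 'thumbnail generated' }).catch(console.error);
```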

00:06:08

It's the way you can look at your data and get more insight into it, rather than just seeing lots of logs you need to search through. Right, let's go to system metrics now. These are not the whole of monitoring, but in my eyes they're still a very important part of it, because they tell you what is going on in your system right now. These are, once again, immutable data points, but they're more numeric information. Log data is what has happened, and most of the time it's strings of information in long formats: failed files, memory over-usage, stuff in a string format, something you end up having to search for. Metrics are more numeric, more defined values that you can use, usually consisting of a timestamp, a label and the data points associated with them. The common ones, especially at the system level, are CPU, memory, IO and network, and all of these can be displayed as numeric values.
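A minimal sketch of emitting that kind of numeric data point (timestamp, label, value) from a Node.js process, using only built-in modules; the metric names and the console output are illustrative, and a real setup would ship the samples to a time-series store instead.

```js
// Sketch: sample a few process- and system-level metrics on an interval.
const os = require('os');

function collectMetrics() {
  const now = Date.now();
  const mem = process.memoryUsage();
  return [
    { timestamp: now, name: 'process_heap_used_bytes', value: mem.heapUsed },
    { timestamp: now, name: 'process_rss_bytes', value: mem.rss },
    { timestamp: now, name: 'system_load_1m', value: os.loadavg()[0] },
    { timestamp: now, name: 'system_free_mem_bytes', value: os.freemem() },
  ];
}

// Emit one sample every 10 seconds; in practice these would go to
// Prometheus, Elasticsearch or another time-series backend.
setInterval(() => {
  for (const point of collectMetrics()) {
    console.log(JSON.stringify(point));
  }
}, 10000);
```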

00:07:14

So you can graph them and see, in real time, how an application is behaving. In the open source world, the best tools for this are Grafana, which is the UI and control layer on top of a data source; internally we have this as a product on top of the Elastic stack, so we have Grafana connected to Elasticsearch with some custom code in between to improve time-series performance. The most common, though, is actually Prometheus, which is a complete stack and ecosystem in itself, using PromQL as its querying language, and it has a UI of its own to help you interrogate what the data looks like. We are actually working on a PromQL-compliant version; however, don't ask me about any of that, that's the product roadmap side, you can talk to somebody else about that.
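For the Prometheus side, a minimal sketch of exposing Node.js metrics for scraping, assuming the prom-client and express packages; the port, route and counter name are placeholders, not anything from the demo.

```js
// Sketch only: prom-client exposes metrics in the format Prometheus scrapes,
// and Grafana then graphs them with PromQL.
const express = require('express');
const client = require('prom-client');

// Default metrics: CPU, memory, event loop lag, GC, etc.
client.collectDefaultMetrics();

// A hypothetical custom counter, e.g. gifs processed by the demo pipeline.
const processedGifs = new client.Counter({
  name: 'gifs_processed_total',
  help: 'Number of gifs processed by the worker',
});

const app = express();

app.get('/work', (req, res) => {
  processedGifs.inc(); // increment on each unit of work
  res.send('done');
});

// Prometheus scrapes this endpoint on its own schedule.
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});

app.listen(3000);
```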

00:08:09

And then there's tracing, which is hard mode. It's the thing that most people won't have used or come across, but it's still important. It is the end-to-end flow of an application and it consists of traces and spans. The trace is the actual path the information is taking: entering an API gateway or an appliance at the front end, then hitting the next object, the next object, the next one. Sometimes those are different services inside your application; sometimes, if it's a very large service, they'll be individual parts of the code base as well. And those individual blocks are spans, so you can see where things are taking their time. I'm going to go back to AWS because that's the one I most commonly use: say you have API Gateway, you go into a Lambda, you then have it firing off to store in S3, and you have an S3 trigger; the S3 trigger then grabs the information and throws it through a Lambda, which is programmed to take part of it and use the Rekognition service to process it, and bring the information back in. You can then see, in that flow, which one of the components is taking too long, analyze that component independently, and see if there is a performance issue, an issue with some bad code, or an issue with the actual data going through it. That tracing system gives you an overall view of where the bottleneck may be and how the individual components are impacting each other.
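To illustrate the trace-and-span model without any tracing library at all, here is a toy sketch in plain Node.js: one trace id shared across the request and one timed span per step. The step names mirror the gif pipeline described above, but the whole thing is illustrative rather than how the demo is actually instrumented.

```js
// Toy illustration only; real systems use OpenTelemetry, Jaeger clients, etc.
const { randomUUID } = require('crypto'); // Node 14.17+

function startSpan(traceId, name) {
  const start = Date.now();
  return {
    end() {
      console.log(JSON.stringify({
        traceId,                        // links spans from the same request
        span: name,
        durationMs: Date.now() - start, // where the time was spent
      }));
    },
  };
}

async function handleRequest() {
  const traceId = randomUUID();

  const fetchSpan = startSpan(traceId, 'fetch-gif');
  await new Promise((r) => setTimeout(r, 50)); // stand-in for the Giphy call
  fetchSpan.end();

  const storeSpan = startSpan(traceId, 'store-s3');
  await new Promise((r) => setTimeout(r, 20)); // stand-in for the S3 upload
  storeSpan.end();
}

handleRequest();
```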

00:09:46

This is, as I've written here, best used within microservices, but it can be used with service designs in general. So, service-oriented architectures, where you have different services but they may be monoliths or anything like that: you can still use it and it will still provide you value. Essentially you can think of it more like profiling and debugging, but with a complete system view rather than just a single application pipeline. So, the best tools for that: there are actually three, but I never put one in because I've never used it and I don't know enough to be able to answer questions about it. There's Jaeger, there's Zipkin, and there's also the SkyWalking thing, but I've never used that one, so I try not to mention it and get caught up on it. Jaeger and Zipkin are the big boys within the distributed tracing space. Jaeger is a project that was spun out as an open source project from Uber; I don't know if they're still using it, but they were using it at huge scale to monitor their internal infrastructure, and they had a network graph in one of their talks which looks just as bonkers as the Netflix one. And Zipkin, if I remember correctly, was spun out of Expedia or something like that. Same principle, really, just

00:11:13

the tooling they used to keep an eye on their huge internal infrastructures, which they then just open-sourced. Jaeger is getting a huge amount of traction, and considering it's so easy to implement, because you can just roll it as a complete stack in itself, there's no reason not to try it. But we're going to skip over that with the next part. Anyway, here we go: demo time, the thing that proves what I've been talking about, I hope. So, we're all used to logs. Oh God, demos, I'm always worried. We're used to logs, so I have a quick Lambda app that's just doing some stuff to produce some data to prove my point. So we have here, using the Serverless Framework because it makes my life easier, a bucket that things get dumped into.

00:12:09

We are grabbing... yes, I'm using the hello world function as my initial input, because I couldn't be bothered to change the name. This one is going to be grabbing information from Giphy and throwing it into S3, and then the S3 one is going to be grabbing that file and producing a single-frame thumbnail for me, because how else do you produce randomly meaningless data? We also have some more bits down here, which I'm going to show in a minute, but we'll start with the logs. So in the handler, we can go one step above console.log while keeping a well-formatted structure. With JavaScript, and the application stack it's using and all the different places it can be shipped to, you can intercept the logs at so many different layers. For example, if you're using something like this, it's a Lambda, so you don't have access to the host.

00:13:05

So you can't stick Filebeat on the host and stream off a log file; you can't just intercept it. You don't have access to the host, so you can't stream the Docker tail to be able to read from a Docker container. You don't have access to any of that in the PaaS layer; you may in the IaaS layer. So with something more like EC2, for lack of a better word, you can actually get hold of the host and stream the information from it, so you can use Filebeat or the other components. Or, if you're using bare metal, you just control everything and it's fine. You can intercept at different layers, but for me, I try building my code as if I don't know where it's going to be shipped to.

00:13:56

So here we go: the common logging libraries come with a nice concept called transports. I'm using Winston because it's one of the more common ones and it produces a very nice JSON format that we can use inside of Kibana. And we have our logger. Now, this logger I've actually configured to do two things. One is throwing all statements at debug level and above to the console; since it's in a Lambda, that means it's going to end up in CloudWatch, so I have access to all the information in CloudWatch, regardless of how important or unimportant it is. I'm also sending it via the Winston transport for Logz.io to the Logz.io service, so we get access to the inside of the ELK stack to play with, but we're only sending info-level logs, and we're going to be able to look those logs up based on the application name, which is handily called node-lambda-observability, with its version number, just to make my life easier.
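A sketch of that two-transport setup, assuming the winston and winston-logzio packages; the token, listener host and application name are placeholders rather than the values used in the demo.

```js
// Sketch only: console transport for everything, Logz.io transport for info+.
const winston = require('winston');
const LogzioWinstonTransport = require('winston-logzio');

const logger = winston.createLogger({
  format: winston.format.json(),
  transports: [
    // Debug and above to the console; inside a Lambda this lands in CloudWatch.
    new winston.transports.Console({ level: 'debug' }),

    // Only info and above is shipped to Logz.io / the ELK stack.
    new LogzioWinstonTransport({
      level: 'info',
      name: 'node-lambda-observability', // placeholder application name
      token: process.env.LOGZIO_TOKEN,   // shipping token from the Logz.io UI
      host: 'listener.logz.io',
    }),
  ],
});

logger.debug('fetched gif metadata');       // console/CloudWatch only
logger.info('gif stored in S3');            // console/CloudWatch and Logz.io
logger.error('failed to create thumbnail'); // console/CloudWatch and Logz.io
```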

00:15:01

But hopefully we won't need to worry about that. And it's only at the info level, so we only get information that's more pertinent. We don't want the random debug statements of how large the gif is and so on; we want the success messages, the status of the process, and the errors, because those are more important. We want to know when something has gone bang. So we've got the Lambda doing that, we've got a bunch of info statements and debug information all being thrown through, and it's the same with the post-process one as well; it just produces more data. So if we go to Kibana, we can see we are now getting lots of lovely information. I have got some other experiments running on here as well, so I apologize for the extra junk, but we can see here, sorry, once I finally click the right thing, node-lambda-observability at that time.

00:15:55

And now we're only getting the logs that are coming directly out of the application itself. We can see that it's coming through with its log level and it's coming over with the timestamps, so we can do things with that information, we can work with it. That's the quickest way to share that. Just as an FYI, if you don't want to deal with this internally, because you don't want the extra overhead that the transports may incur, which they can incur, there is a way of sending data directly from CloudWatch if you go to our docs. I'm not the biggest fan of how it does it, but it can be done. That's the Logz.io CloudWatch shipper: a Lambda that comes in, grabs the information from CloudWatch every minute, and sends it through to the Elasticsearch instance we're running against. I do actually have this one installed, because I wanted to play with it myself and make sure it's working.

00:17:02

So I actually do have it here running, and it's just a little Python Lambda grabbing stuff from the CloudWatch logs. If I close that filter... yeah, there you go, that CloudWatch Lambda is all the S3 messages. So if we didn't want to collect it from within the Lambda, if we didn't want to have to do the extra work, we can grab it straight off CloudWatch in the background. We can also do some quick visualizations, because I like visualizations; we can get some decent value out of that information with the Elastic stack. Where is my... let's do a heat map. Heat maps are nice ones: start with a count, which shows you everything, and we want the X value to be... let's make it a term, so it just comes out with the log level, if I can find it quickly. There we go, pop that in, and a metric count, and it should just drop it all into... Nope. No, I've done it onto the wrong axis. Yes, yes I have. Anyway, I only have a few minutes to quickly show this, so let's go back up a level: metrics and sub-buckets, Y-axis aggregation, date histogram, and play. And... "too many series defined".

00:18:41

Let's try this again. X-axis: date histogram, play, last 15 minutes. Excellent. And a sub-bucket, Y-axis aggregation, terms, on the log level. And now we can see where the logs are being produced, at the info, error and warning levels, in different parts of the application. It's not really insightful yet because unfortunately we can only see the first few, so if we just quickly switch the options, increase that to a maximum of 10 and play again, we get a slightly better indication of where our errors are appearing, at what time and so on. So we can see where an application is producing too much error information, which is something we should definitely investigate. And just as an aside, there are the metrics.
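The visualization being built here is, roughly, a date histogram with a terms sub-aggregation on the log level. Here is a hedged sketch of the equivalent query run directly against Elasticsearch with the official client; the index name, field names and interval are assumptions.

```js
// Sketch only: counts log documents per minute, split by log level.
const { Client } = require('@elastic/elasticsearch');

const client = new Client({ node: 'http://localhost:9200' });

async function logLevelsOverTime() {
  const result = await client.search({
    index: 'node-app-logs',
    body: {
      size: 0, // we only care about the aggregations, not the hits
      aggs: {
        over_time: {
          date_histogram: { field: '@timestamp', fixed_interval: '1m' },
          aggs: {
            by_level: {
              terms: { field: 'level.keyword', size: 10 }, // info, warn, error, ...
            },
          },
        },
      },
    },
  });
  console.log(JSON.stringify(result.body.aggregations, null, 2));
}

logLevelsOverTime().catch(console.error);
```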

00:19:51

We can do the same thing on our Lambdas, so we can see where all the information is, how performance is and how things are playing out, and tell if something's performing badly. Right, let's get back to the slides. If you do want to work with this in more of an open source approach, there is OpenTelemetry. That has tracing and metrics going into GA in the not too distant future, with logs coming along after that. It's a standard specification which will allow you, regardless of vendor, to ship information from A to B and do all of your system monitoring. So go to opentelemetry.io and have a look into the wonderful open source project that's driving all this forward. And if you have any questions, feel free to fire them at me and I will do what I can to help, or if I don't know, I definitely know somebody who will. So thank you very much for listening, and I hope to see you on the internet. Goodbye.
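As a minimal sketch of what manual tracing with the OpenTelemetry JavaScript API can look like, assuming the @opentelemetry/api and @opentelemetry/sdk-trace-node packages; exporter configuration (Jaeger, Zipkin, an OTLP collector) is omitted, so this only creates spans without shipping them anywhere.

```js
// Sketch only: one manually created span around a unit of work.
const { trace } = require('@opentelemetry/api');
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');

// Register a tracer provider so getTracer() hands back real spans.
new NodeTracerProvider().register();

const tracer = trace.getTracer('gif-pipeline'); // hypothetical tracer name

async function processGif(url) {
  // startActiveSpan sets the span as "current", so nested spans created
  // inside the callback are automatically parented to it.
  return tracer.startActiveSpan('process-gif', async (span) => {
    span.setAttribute('gif.url', url);
    try {
      await new Promise((r) => setTimeout(r, 30)); // stand-in for real work
    } finally {
      span.end();
    }
  });
}

processGif('https://example.com/cat.gif');
```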