What is going on, everybody? Welcome back to my YouTube channel, Richard on Data. If this is your first time here, my name is Richard, and this is the channel where we talk about all things data: data science, statistics, and programming. So subscribe for all kinds of content just like this if you haven't already, and make sure you hit the notification bell so YouTube notifies you whenever I upload a video. This is another video in my R tutorial series. In the first videos of the series, I covered the packages dplyr, ggplot2, and tidyr. These are some key packages out of the tidyverse that help you wrangle a dataset into a nice clean format, and with the ggplot2 package it's super easy to take that data and make clean, pretty-looking visualizations based on it. A couple of the tutorials I did after that covered base R functionality, as well as the various data types in R. Now, if you've seen my R tutorial number four on all the data types, I mention a lot that dates and date-times can be a little bit tricky to work with. That's where the lubridate package is going to come in handy. This is another package out of the tidyverse, and anytime you're dealing with dates and date-times, it's going to make your life a whole lot easier. We're going to cover several pieces of its functionality: creating date and date-time objects in the first place, isolating the various components that make up a date-time, and the various time span classes, that is, durations, periods, and intervals. This script is going to be available on my GitHub repo, which will be linked in the description of this video. Just please note that less of the code here is borrowed from other sources than in some of my earlier tutorials.
But still, a little bit is adapted from the book R for Data Science by Hadley Wickham and Garrett Grolemund. I highly recommend checking that book out; the link to it will be in this script as well as in the description of this video. I'll also have a link to a cheat sheet for lubridate functions, as well as a couple of helpful resources that break down various examples of the lubridate functions. So without further ado, let's get started. The two key functions we're going to start with are today() and now(). Let's start with today(): I'm going to create an object called day using the today() function, and then look at its structure. This object, day, is 2020-09-22, the day I'm recording this video, and it's an object of class Date. Not much more to it than that. Next, I'm going to create a date-time object using the now() function. If we look at its structure, we have the full date-time at which I ran this chunk of code, 2020-09-22 at 17:52:46 local time, and this is an object of class POSIXct. If you've seen my tutorial on the data types in R, I mentioned that POSIXct is one of the most common classes for date-times in R. Basically, a POSIXct encodes the number of seconds since an origin point, specifically 1970-01-01 at midnight in the UTC time zone. So those are our two key classes for dates and date-times in R: Date and POSIXct. What I'm going to do next is create some character objects and then convert those into dates and date-times. Down here, I'm going to create two objects, string1 and string2: one is 2020-09-22, something that just looks like a date, and the other is 2020-09-22 at 17:00.
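To make the two key functions concrete, here's a minimal sketch, assuming lubridate is installed. The object names day and date_time are just illustrative, and the printed values will of course depend on when you run it. The last lines illustrate the point about POSIXct counting seconds from the 1970-01-01 UTC origin; the ten-seconds-after-midnight timestamp is my own example, not one from the video.

```r
library(lubridate)

# today() returns the current date as an object of class Date
day <- today()
str(day)

# now() returns the current date-time as a POSIXct (session time zone)
date_time <- now()
str(date_time)

# A POSIXct value is stored as seconds since 1970-01-01 00:00:00 UTC,
# so ten seconds after the origin is stored as the number 10
ten_secs <- ymd_hms("1970-01-01 00:00:10")   # ymd_hms() parses as UTC by default
as.numeric(ten_secs)   # 10
```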
That's something that looks like a date-time, but when I run this code, R defaults to treating these things as plain character objects, and that's fine. Now we're going to convert them into dates and date-times. There are a couple of different ways to do this. In base R, you have the functions as.Date() and as.POSIXct(). For some people, remembering as.POSIXct(), and remembering that you capitalize the POSIX but not the ct, is not something they necessarily like to do, so lubridate makes this a little easier and more consistent with its helper functions as_date() and as_datetime(). When I run all this code, you can see that the base R function as.Date() and the lubridate function as_date() return exactly the same thing, Date objects. Then as.POSIXct() and as_datetime() both return 2020-09-22 17:00:00, and both are objects of class POSIXct (although by default as.POSIXct() uses your local time zone while as_datetime() uses UTC). So again, it's ultimately somewhat a matter of personal preference, but lubridate makes this a bit more consistent and easier with as_date() and as_datetime(). Now, as anybody who's worked with real-world date and date-time data can tell you, these things are not necessarily going to be clean, and they're not necessarily going to be in consistent formats. Luckily, lubridate comes with more helper functions for this: no matter what format the data comes in as a character string, if it's really some kind of date or date-time, there are functions for getting it into POSIXct or Date format. So let's look at some examples. I'm going to store the same date in a variety of formats: I've got 2020-09-22, and I've got 09-22-2020.
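Here's a rough sketch of the base R versus lubridate conversions, assuming lubridate is installed. The strings mirror string1 and string2, with string2 written with explicit seconds, which is my assumption about its exact format. Note the time zone detail: as.POSIXct() interprets the string in the session's local time zone by default, whereas as_datetime() uses UTC, so both print 17:00 but they are not necessarily the same instant.

```r
library(lubridate)

string1 <- "2020-09-22"
string2 <- "2020-09-22 17:00:00"

# Base R and lubridate both produce Date objects
base_date <- as.Date(string1)
lub_date  <- as_date(string1)

# Base R and lubridate both produce POSIXct objects
base_dt <- as.POSIXct(string2)    # interpreted in the local time zone
lub_dt  <- as_datetime(string2)   # interpreted as UTC by default

class(base_date); class(lub_date)
class(base_dt); class(lub_dt)
```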
Then there's the way a lot of people in Europe would style it: the day, 22, slash the month, 09, slash the year, 2020, as well as various ways of styling an actual date-time. So we've got these helper functions in lubridate like ymd(), that's year-month-day, and mdy(), that's month-day-year, and so on, that tell R: look at this string, and look for the year first, then the month, then the day, and so on; you get the idea. There are similar functions that include date-time components: year-month-day plus the hour (ymd_h()), plus the hour and the minute (ymd_hm()), or plus the hour, the minute, and the second (ymd_hms()), which hopefully you'll have most of the time. Let's look at all these examples, because together they make it very clear. I used the ymd() function for date format one: it found the year, the month, and the day. Bam. For date format two, I had the month, then the day, then the year, so I used mdy(). Bam. For date format three, I had the day, followed by the month, followed by the year, so I used my handy dmy() function. Bam, I got 2020-09-22. For the next date-time, I used mdy_hms(), that's month-day-year followed by hour-minute-second. Bam. The same goes for the next example, year-month-day: even though the time is formatted military-style, like 1700, ymd_hms() gave me just the date-time I wanted. Now, these functions aren't 100% perfect; there are some inputs out there that can throw them off, but by and large they do a pretty good job. So remember these functions whenever you've got some weirdly formatted data and you need to get it into a consistent date or date-time format. Also, look up the help documentation for any of them with a question-mark search, like ?ymd_hms. The documentation is fantastic and covers the majority of these.
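A minimal sketch of the parsing helpers, assuming lubridate is installed. The input strings are my own stand-ins for the formats described, not the exact strings from the script, and I've used the standard hh:mm:ss style rather than the military-style 1700 example.

```r
library(lubridate)

# The same calendar date written several ways (example strings)
d1 <- ymd("2020-09-22")   # year, month, day
d2 <- mdy("09-22-2020")   # month, day, year
d3 <- dmy("22/09/2020")   # day, month, year (common in Europe)
d4 <- ymd("20200922")     # no separators at all

# Date-time parsers append _h, _hm, or _hms for the time components
dt1 <- mdy_hms("09/22/2020 17:00:00")
dt2 <- ymd_hm("2020-09-22 17:00")
dt3 <- ymd_hms("2020-09-22 17:00:00")

c(d1, d2, d3, d4)   # all "2020-09-22"
```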
Of course, once you actually get your dates and date-times into the format you want, it'll happen all the time that you want to compute some kind of summary or analytic by the components of that date-time: by year, by month, by hour, or whatever the case may be. For any of these components, lubridate has a function, so let's take a look at all of them. We're going to start by creating an object called today_date, which is 2020-09-22 at 17:15, and look at what each of these functions returns. The year() function gives us back 2020, obviously. The month() function gives us back the number 9. mday(), which stands for day of the month, gives us back 22. The hour() function gives us 17, minute() gives us 15, and second() gives us 0. The yday() function is pretty interesting: it gives us the number of the day within the year, and it turns out this is the 266th day of the year. Who would have known, right? Then there's the wday() function, which returns the day of the week. Today is a Tuesday, so that's day number three of the week, because by default the week begins on Sunday: Sunday, Monday, Tuesday. These are pretty straightforward, but with a couple of these functions, like month() and wday(), you probably want the actual name of the month or the day. In those instances, just specify label = TRUE as an argument, and we get September and Tuesday, respectively (printed in abbreviated form, Sep and Tue, unless you also set abbr = FALSE). Next, I'm going to show a couple of more practical examples of where these lubridate helpers for isolating date-time components are useful, and to do this, I'm going to use the flights data.
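Here's a sketch of the component extractors on the same 2020-09-22 17:15 timestamp, assuming lubridate is installed and its default week start of Sunday. The names in the label comments assume an English locale; with label = TRUE, month() and wday() return ordered factors, abbreviated unless abbr = FALSE.

```r
library(lubridate)

today_date <- ymd_hm("2020-09-22 17:15")   # parsed as UTC

year(today_date)     # 2020
month(today_date)    # 9
mday(today_date)     # 22  (day of the month)
hour(today_date)     # 17
minute(today_date)   # 15
second(today_date)   # 0
yday(today_date)     # 266 (day of the year; 2020 is a leap year)
wday(today_date)     # 3   (Sunday = 1 by default)

# Names instead of numbers (English locale assumed for these comments)
month(today_date, label = TRUE)                # Sep
wday(today_date, label = TRUE, abbr = FALSE)   # Tuesday
```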
If you've seen my first R tutorial, you're familiar with all this. The flights data frame comes in the nycflights13 package, and it's freely available for you to access. All I'm going to do is create a data frame called data, selecting three columns: flight, carrier, and, most importantly, time_hour, which is a POSIXct variable. In my next chunk of code, I'm going to create a new variable called month using the month() function with label = TRUE. All I want to do is create a bar chart of the count of flights by month. And then, bam! That was super easy: on the x-axis I've got the months January through December, and the count is on the y-axis. The next thing I'm going to do is create a line plot of flights by hour of the day. I do that by creating a new variable called hour with the hour() function, grouping by that new hour variable, and using the tally() function to get the count of flights for each hour. Bam. Now I have a line plot showing the number of flights for each hour of the day. It turns out a lot of people don't like flying between midnight and 5 a.m. I don't blame them; neither do I. The last thing from the lubridate package that you should be familiar with is the time span classes, and there are three of them: durations, periods, and intervals. To give some definitions: durations measure the exact number of seconds that occur between two instants in time, while periods measure the change in clock time that occurs between two instants. These two seem very similar, but you're going to see where they differ in a minute. Intervals are kind of different from the other two.
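Below is a sketch of the two flights plots, assuming the nycflights13, dplyr, ggplot2, and lubridate packages are installed. The names p_month, p_hour, and flights_by_hour are my own; the code builds the plot objects without drawing them, so print them to see the charts.

```r
library(nycflights13)
library(dplyr)
library(ggplot2)
library(lubridate)

# Keep three columns; time_hour is a POSIXct in the America/New_York zone
data <- flights %>%
  select(flight, carrier, time_hour)

# Bar chart: number of flights in each month
p_month <- data %>%
  mutate(month = month(time_hour, label = TRUE)) %>%
  ggplot(aes(x = month)) +
  geom_bar()

# Line plot: number of flights in each hour of the day
flights_by_hour <- data %>%
  mutate(hour = hour(time_hour)) %>%
  group_by(hour) %>%
  tally()   # adds a column n with the row count per hour

p_hour <- ggplot(flights_by_hour, aes(x = hour, y = n)) +
  geom_line()

# print(p_month); print(p_hour)   # draw the plots
```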
They are full time spans, each with a beginning and an end point. Let's look at some examples, because that's really going to illustrate these things. I'm going to create an object called start_date as well as end_date, and those are the beginning and end points of the month of March 2020. If I create a new object called diff_time as end_date minus start_date, that gives me an object of class difftime: specifically, it tells me it's a time difference of 30.99999 days. I'm going to turn this difftime into a duration as well as into a period. The duration object tells us it's 2,678,399 seconds, or about 4.43 weeks, whereas the period object breaks it down as 30 days, 23 hours, 59 minutes, and 59 seconds. There's an important distinction here, and you're going to see it in a little bit. The interval object, though, has nothing more to it than the beginning point and the end point, encapsulated into one single object. Now, both durations and periods come with helper functions, and these are very useful whenever you need to modify existing date-times by adding some unit of time to them. Say you need to add so many seconds or minutes to a date-time, whatever the case may be: we've got the helper functions dseconds(), dminutes(), dhours(), and so on for durations, and periods have similar helper functions, just seconds(), minutes(), hours(), and so on. Those will all work too. So let's take our start date, which again was the beginning of March 2020, and add, for instance, dhours(5), which adds five hours, or minutes(300), the period helper function for minutes, which adds 300 minutes to our start date. Notice that both the duration helper function and the period helper function give us the exact same result.
Now I'm going to show you a case where they don't necessarily give you the same result. One thing that can really trip R up with date-times is daylight saving time. Take noon on 2020-03-07, and let's say we add one day to that using the two different approaches. In this instance, we actually get two different results, and the reason is that March 8, 2020 was the first day of daylight saving time in the US, and the way that works is that you just skip over one of the hours of that day. So March 8, 2020 is a day that only has 23 hours in it. The duration function ddays() just adds 24 hours, one day's worth of seconds, to the original date-time, so you end up all the way at 1 p.m. the next day, 2020-03-08. The days() function, on the other hand, works in human units of days, and when humans like you and I think of a day around daylight saving time, we're not thinking of literally 24 hours' worth of seconds; we're just thinking of the next day at the same time. So this is the key instance in which the duration class and the period class can give you different results. Again, daylight saving time is a common way to get tripped up: if you're dealing with dates and date-times in R and you end up with strange, bizarre results, check that you're not running into a daylight saving time issue.
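Here's a sketch pulling the time span examples together, assuming lubridate is installed. The March 2020 endpoints are parsed as UTC (ymd_hms()'s default), which is what yields the clean 2,678,399-second figure; for the daylight saving example I've assumed the America/New_York time zone, since the US switched on 2020-03-08. I've also included the interval helpers int_start(), int_end(), int_flip(), and int_length().

```r
library(lubridate)

# Beginning and end of March 2020, parsed as UTC
start_date <- ymd_hms("2020-03-01 00:00:00")
end_date   <- ymd_hms("2020-03-31 23:59:59")

diff_time <- end_date - start_date   # a difftime of about 31 days
dur  <- as.duration(diff_time)       # 2678399s (~4.43 weeks)
per  <- as.period(diff_time)         # 30d 23H 59M 59S
span <- interval(start_date, end_date)   # start and end kept together

# Interval helpers
int_start(span)    # 2020-03-01 UTC
int_end(span)      # 2020-03-31 23:59:59 UTC
int_flip(span)     # same span with start and end swapped
int_length(span)   # length in seconds

# Duration helpers (d prefix) and period helpers agree here
start_date + dhours(5)      # 2020-03-01 05:00:00 UTC
start_date + minutes(300)   # 2020-03-01 05:00:00 UTC

# Across the US daylight saving switch, they differ
noon <- ymd_hms("2020-03-07 12:00:00", tz = "America/New_York")
noon + ddays(1)   # exactly 86400 seconds later: 2020-03-08 13:00:00 EDT
noon + days(1)    # same clock time next day:   2020-03-08 12:00:00 EDT
```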
Now, I'm sort of glossing over intervals here because, based on my own personal experience, I don't use them a whole lot, but they do have their own set of helper functions, like int_start(), int_end(), and int_flip(), so if you prefer to work with or store your data in that class, you're able to do so. If you search the help documentation, for example with ?int_start, it's pretty thorough, and again, these functions are pretty intuitive if you choose to use them. So that covers the lubridate package. Now, as a member of the tidyverse, this package probably doesn't have as much of a learning curve, and there's not as much functionality to it as packages like dplyr or ggplot2. But mark my words: if a time comes when you're dealing with some messy data frame that's got weird-looking date-times in it, this package is going to make your life much easier, so get familiar with it before you have to. Thanks, all of you, for watching this tutorial. If you found it helpful, please consider sharing it, or at least consider smashing the like button. I'll see you all in the not-so-distant future. Until then, Richard on Data.