Steve

Tuesday, July 8, 2014

Swift vs Object-c

After I got a Macbook air, I have a lot of interests in developing iOS applications. However, you know, in this year's WWDC there was a notice about Swift which will be an alternative programming language for iOS and MacOS X. Because of that, I am reluctant to start studying Object-c. Obviously studying several programming languages would be helpful for me, but nowadays I am busy for applying my transfer to UC Berkeley. In your case, what would you do?

Saturday, April 5, 2014

CFS_vruntie

CFS(Complete Fair Scheduling)은 linux 2.6.23에 들어간 이미 오래된 기능이다.

리눅스 커널 2.6의 가장 큰 변화 중 하나가 O(1) Scheduler라고 그렇게 외쳐됐던 것이 엇그제 일인데 벌써 다른 스케줄러로 교체되어 버렸다.

요즘 너무 메모리쪽으로만 집중을 하다보니 스케줄러 쪽으로 많이 소원해진 것이 사실이다.

아무리 그래도 기본적인 Design Concept정도는 알고 있어야 할 듯 해서 이 페이지를 정리해 본다.

이글은 아래의 페이지를 내 마음대로 해석하여 정리한 글임을 밝혀둔다.

http://gustavus.edu/+max/os-book/updates/CFS.html

2008/08/15 11:40:46 - barrios -

크게 봐서는 CFS는 이전의 스케줄러와 같은 원리로 동작한다.

CFS도 이전의 스케줄러와 마찬가지로 niceness가 각 쓰레드가 할당받는 CPU time을 제어하기 위한 주요 수단이다.

O(1) 스케줄러는 동적으로 priority를 조절하여 waking thread가 조만간 얼마나 빨리 수행될지를 조절하였지 runnable한 쓰레드들 사이에서의 CPU sharing을 조절하진 않았다.

CFS에서는 runtime을 측정하는 방법과 waking thread가 빨리 수행될 수 있기를 보장하기 위한 방법이 새롭게 사용되었다.

CFS로 들어가기에 앞서 이전 O(1) 스케줄러의 문제점을 몇가지 살펴보자.

nice값으로 CPU sharing을 조절하게 되는 O(1)에 있어서 같은 nice값을 갖는 두개의 쓰레드가 있다고 가정하자. 두쓰레는 공평하게 CPU time을 반씩 나누어 가질 것이다. 하지만 스위칭은 얼마나 자주 일어나게 될까? nice 0이 가지는timeslice가 100ms이고 두 쓰레드가 nice0이라면 매 100ms마다 스위칭이 발생할 것이다. 반면에 nice19라면 그리고 timeslice가 5ms라면 5ms단위로 스위칭이 발생할 것이다. 이것은 이상적인 throughput과response time의 trace off는 아니다. 하지만 시스템에 여러 nice값을 가진 쓰레드들이 존재한다면 다소 완화될수는 있다. 어쨋든, nice의 변경으로 인하여 스위칭 rate가 변경되는 것은 바람직한 관계는 아니다. 굳이 관계를 지어본다면nice값이 높을 수록 CPU intensive한 프로그램이다. 즉 long-running 쓰레드들이라는 것이다. 이런쓰레드들은 throughput이 response time보다 더 중요하다. 그러므로 적은 timeslice로 인하여 빈번한스위칭은 바람직하지 않다. 그러므로 nice값과 스위칭 rate 관계는 기존과는 반대로 되야 할 것이다.

niceness와 timeslice의 관계도 이상하다. 1 정도의 nice값의 차이가 나는 쓰레드가 2개 있다고 가정해보자. CPU를 어떻게나누어 가지게 될까? 대답은 nice의 값의 차이보다는 abosulte niceness level의 차이에 따라 다르다. 예를들어보자. niceness를 0과 1을 가지는 쓰레드들은 100과 95ms의 timeslice를 갖게 될 것이다. 다음niceness 18과 19를 가지는 경우를 살펴보자. 그때는 10과 5ms의 time slice를 갖게 될 것이다. -_-;;

wakingthread들을 더 우선하는 시스템은 큰 문제를 가질 소지가 있다. 리눅스의 waking thread들은 thread의priroity와 관련된 위치의 active array에 들어가게 된다. 게다가 얼마동안 잠든 채로 시간을 보낸runnable한 쓰레드는 timeslice를 다 소진했을 지라도 다시 active array로 들어오는 bonus를 받게된다. 그래서 active array가 쓰레드들을 포함하고 있는 한, expired array에 어떤 쓰레드들도 CPUtime을 얻지 못하게 된다. waking 쓰레드들이 꾸준히 active array로 들어오게 되면 expired array에있는 runnable thread들은 예측할 수 없는 임의의 시간동안 지연되게 된다. 단 2개의 쓰레드들만으로도 sleep과wakeup을 조작하여 runabble한 쓰레드를 몇초 동안 지연시킬 수 있다. 왜냐하면 time slice를 주로 잠으로써 다소진한 쓰레드 는 expired array로 가지 않고 active array로 가기 때문이다.

CFS는 각 niceness level을 timeslice로 할당하기 보다는 weight로 할당하고 runnable한 쓰레드들의total weight에 기반하여 timeslice를 계산한다. 그로므로 각 쓰레드들은 total weight에 기반하여비례적으로 timslice를 받게 된다. CFS는 runabble한 쓰레드들이 한 바퀴 round-robin을 하는 시간을정하여 그 시간을 기준으로 CPU를 분배한다. 그 target time이 100ms라고 가정하면, 두개의 같은priority의 쓰레드들은 같은 weight을 가지게 됨으로 각 쓰레드는 50ms를 얻게 된다. 이는 두 쓰레드가 nice값0을 갖던, 19를 갖던 상관없이 50ms씩을 얻게 된다. 만일 4개의 쓰레드가 같은 nice값을 갖는다면 25ms씩 갖게된다. 스위칭 rate는 이전의 스케줄러와는 달리 nice 값의 level과는 무관하게 된 것이다.

재미있는 것은 스위칭 rate가 전체 시스템 부하와 관련있어 졌다는 것이다. 즉 시스템에 부하가 많아지면 응답성을 보장하기 위해서throughput을 약간 희생하게 될 것이다. 응답성의 level은 target time에 따라 조절되어 진다. 이 값은시스템 어드민에 의해 조절될 수 있다. 100ms의 값은 서버와 같이 throughput이 주요하고 응답성은 단지graphical한 interaction이 아니라 web page reload의 scalability를 중요시 여기는 시스템에서적절하다. CFS의 default target time은 20ms이다. 이것은 응답성이 중요한 desktopmachine에 적절하다.

하지만 시스템 부하가 정말 높아지게 되면 CFS는 응답성을 위하여 throughput을계속해서 희생하지 않는다. 왜냐하면 각 쓰레드가 얼마나 적은 timeslice를 받게 되느냐에 하한선이 있기 때문이다. 그지점에 도달하게 되었을 경우 새로운 쓰레드가 시스템에 들어가게 되면 per-thread time을 감소시키기보다는 쓰레드들을순환하기 위한 전체 시간을 증가시킬 것이다.

niceness0과 niceness5을 가지는 두개의쓰레드가 CPU를 공유하는 경우를 가정해보자. CFS는 이 쓰레듣르에게 1024와 335의 weight을 주게 된다. 그러므로쓰레드가 얻는 시간은 1024/(1024+355), 335/(1024+335)가 된다. 1024는 대략 335의 3배이기 때문에niceness0을 가진 쓰레드는 대략 100ms중에 75ms, niceness5를 가진 쓰레드는 100ms중에 15ms를 갖게될 것을 예측할 수 있다. niceness 5, 10 또는 niceness 0, 5를 갖던 상관 없이 같은 결과를 얻게 될것이다. 왜냐하면 어떻게 하여도 weight의 비율이 3:1이 될 것이기 때문이다. 즉, CPU proportion은niceness의 상대적 차이로 얻게 되는 것이지 이전과 같이 절대 차이로 인하여 얻게 되는 것이 아니다.

CFS의 big idea는 각 쓰레드들이 얼마나 많은 시간동안 실행을 했는지를 쓰레드의 weight으로 scale하여 유지하는것이다. 즉, niceness0 쓰레드는 수행시간을 1ns단위로 scale한다. 반면, niceness5 를 가진 쓰레드는실행시간을 대략 3ns단위로 scale하게 된다.

이렇게 weigh을 이용하여 scale하는 이유는 어떤 쓰레드가가장 뒤쳐져 있는지를 판별하기 위해서이다. 가장 뒤쳐져 있다는 것은 자신에게 주어진 timeslice를 가장 채우지 못한쓰레드라는 것이다. CFS가 timeslice를 채우지 못한 쓰레드들에게 계속해서 제어권을 넘겨 균형을 유지하려고 한다면끊임없이 가장 뒤쳐져 있는 쓰레드들로 스위칭을 해야하며 이는 매우 비효율적일 것이다. 얘기가 어려워진다. 쉽게 예를 들어설명하여 보자.

두 쓰레드가 같은 niceness를 가지고 있다면 스케줄러는 두 쓰레드가 항상같은 timeslice을 수행하고 있음을 보장하려고 애쓸 것이다. 즉, X ns 후에 살펴보니 정확하게 두 쓰레드가 X/2ns씩 수행을 해야 할 것이다. 이것을 만족시키기 위해서는 스케줄러는 끊임없이 쓰레드들 사이를 스위칭 해야 할 것이며 그것은너무 비효율적일 것이다. 그래서 스케줄러는 그렇게 자주 스위칭을 하는 것보다는 한 쓰레드를 그 쓰레드의 timeslice동안지켜보게 된다.(스케줄러는 쓰레드들을 20ms 주기로 스위칭을 한다고 가정하자) 그 결과, 60msec 후에 두 쓰레드가30ms씩 수행된 것이 아니라 40ms, 20ms 씩 수행된 것을 볼수도 있게 될 것이다. 하지만 스케줄러가 다음 실행할쓰레드를 결정할 때 20ms를 수행한 쓰레드 B를 선택하여 쓰레드 A를 빨리 따라잡게 만들 것이다. 이런 방법으로 쓰레드들이서로 제어권을 주고 받으며 수행하게 되면 어느 시점에서도 두 쓰레드들 사이의 차이가 그렇게 크게 되지 않게 만들 것이다.(이예제에서는 20ms 이내가 될 것이다.)

두 쓰레드가 다른 niceness를 가지고 있을 경우를고려해보자. 예를 들어 쓰레드 A는 niceness0을 가지고 있고 쓰레드 B는 niceness5를 가지고 있다고 가정하자.쉽게하기 위해 1024/335는 3이라고 하자. 즉 쓰레드 A는 쓰레드 B보다 3배 더 크다. 60ms이후에 이상적인 경우쓰레드 A는 45ms, 쓰레드 B는 15ms가 되야 한다. 하지만 스케줄러가 20ms마다 쓰레드들을 스위칭한다면 그 이상적인상황은 발생하지 않을 것이다. 쓰레드 A는 40ms, 쓰레드 B는 20ms가 수행된 것을 알 수 있다. 이 시점에서 다음 어떤쓰레드를 스케줄해야 할까?

B는 이미 자신의 할당받은 timeslice(15ms)를 넘어선 것을 알 수 있다.스케줄러가 이러한 상황을 판단하기 위해서는 각 쓰레드의 time을 scale fator를 가지고 곱하는 것이다. 예를 들어쓰레드 A는 scale factor가 1이다. 반면 쓰레드 B는 3이다. 그러므로 실제 run time이 40ms, 20ms인경우, virtual run time은 40ms, 60ms가 되는 것이다. 이제 명확해졌다. 쓰레드 A는 더 뒤쳐져 있다.그러므로 스케줄러는 쓰레드 A를 실행해야 할 것을 알게 된다.

위의 설명은 모든 쓰레드들이system이 처음 bootup됐을 때부터 시작되었고 계속해서 runnable하게 존재한다면 잘 동작할 것이다. 하지만 실제시스템은 그렇지않기 때문에 새로 만들어지는 쓰레드들과 잠에서 깨어나느 쓰레드들에 대한 고려가 반드시 필요하다. 이런 고려가 되지않을 경우, 새로 만들어지는 쓰레드나 잠들었다 깨어난 쓰레드들은 이미 존재하고 있던 쓰레드들을 따라잡을 때가지 수행되게 될것이다. 이는 바람직하지 못하다. 그러므로 하나의 쓰레드가 runqueue에 들어가게 될 경우 그 쓰레드의 virtualruntime은 존재하는 runnable한 쓰레드들의 가장 작은 virtual runtime으로 지정되고 약간 조절되게 된다.(새롭게 만들어지는 쓰레드 or 깨어나는 쓰레드냐에 따라). 이러한 방법으로 우선적으로 실행될 수 있는 구너한을 부여받고비정상적으로 오랫동안 수행되지 않을 것이다.

run queueu는 runnable thread들의virtual runtime으로 정렬한 red-black tree로 유지된다. CFS가 쓰레듣르을 스위칭하려고 할 때 가장왼쪽에 놓여 있는 쓰레드들 선택하게 된다. 즉 가장 작은 virtual runtime을 가진 쓰레드이다.

스케줄러는 2가지 상황에서 이러한 쓰레드들을 스위칭 한다. 첫째는 timeslice를 다 소진했을 경우이다. 다른 상황은 현재쓰레드가 수행된지 어느 정도 시간이 됐는데(이 값은 configurable하다.) 새로운 쓰레드가 runqueue에 들어오게되는 경우이다.

runnable 한 쓰레드들을 virtual runtime의 timeline에위치시키는 장점중 하나는 waking thread들이 다른 쓰레드들을 굶주리게 하는 경우를 막을 수 있다는 것이다.(이미 언급한바와 같이 이전 O(1) 스케줄러의 큰 문제점이기도 했다.) 시간이 가면서 깨어나는 쓰레드들은 virtual runtime의 timeline에 점점 더 뒤쪽으로 위치하게 된다. 즉, 현재 time을 기준으로 상대적으로 더 뒤쪽으로 위치하게 된다. 반면인내심을 가지고 CPU를 기다려온 쓰레드들은 고정된 virtual runtime을 갖게 된다. 즉, 이들은 이미 오래전 fix한 virtual runtime을 가지고 있다. 그러므로 마침내 가장 작은 virtual runtime을 갖게 될 것이며 실행되게될 것이다. sleeping과 wakeup을 가진 어떤 음모도 이제 통하지 않게 된다.

가장 이상적인 CFS slice는 다음과 같이 계산된다.

slice = (se->load.weight / cfs_rq->load.weight) * period

period는 스케줄러가 모든 task들을 수행하는데 걸리는 시간이다. default로 20ms로 되어 있으나 이 값은 고정은 아니다.

시스템에 runnable한 task들의 수가 sched_nr_latency보다 커질 수록 이 값은 증가하게 된다. 그렇지 않게 되면 각 task들에게 분배되는 slice가 너무 작아 context switching 오버헤드가 너무 커져 시스템이 비효율적일 수 있기 때문이다.

se는 sched_entity를 의미하는 것이다. 따로 설명은 안했지만 CFS에서의 sched_entity는 단지 task만 될 수 있는 것이 아니라, user, container등 보다 다양한 hierachy를 제공한다. 여기서는 se가 task로 봐도 무방하다.

즉, slice는 한 period 중 태스크의 weight이 cfs_rq의 전체 weight중에 차지하는 비율이다.

또한 실행중 특정 task의 vruntime은 다음과 같이 계산된다.

vruntime += (NICE_0_LOAD / se->load.weight) * delta_exec

여기서 delta_exec는 그 cpu를 할당받아 수행된 시간이다. NICE_0_LOAD는 상수값 1024이다.

즉 vruntime은 위에서 이미 설명한 것과 같이 task의 weight으로 수행시간을 scale한 결과이다.

위의 식 이전의 식과 결합하면 다음과 같이 될 수 있다.

vruntime += (period/cfs_rq->load.weight) * NICE_0_LOAD

위의 식이 증명하는 것과 같이 period와 cfs_rq->load.weight은 모든 task들에게서 같은 값을 가지기 때문에 모든 task들의 vruntime은 같은 값 갖게 된는 것이다. 아주 이상적인 경우에 말이다.

그러므로 vruntime 값이 적다는 것은 그 task가 자신의 slice를 충분히 사용하지 못했다는 것이고 그 task가 rbtree의 leftmost에 있게 되며 가장 긴급히 수행시켜줘야 할 task가 되는 것이다.

Friday, April 4, 2014

Today

금요일이라 학원일찍 끝나고 근처 영화관에 영화보러 갔다.

날씨가 참 좋다. 여기는 샌프란시스코 다운타운.

공원에 누워서 자는 사람도 있다.

학원 선생님이 추천해줬던 SUPER DUPER.

지나가다가 봤는데 갑자기 생각나서 점심으로 먹기로 했다.

줄서서 먹는 햄버거 가게.

메뉴는 여러가지 있는데 나는 SUPER BURGER을 시켰다.

종업원이 치즈 먹을꺼냐고 물어봐서 yes라고 했는데, 나는 extra 치즈가 아니고 기본 재료인줄 알았다. 알고보니 extra 치즈였다. 진작에 말해주지 그냥 치즈 먹을래? 라고 물어 봤으면서. 나는 우리나라로 치면 세트메뉴(버거,감자튀김,음료)인 meal을 안하고 버거랑 soda만 주문했다. soda는 fountain형식으로 무한 리필가능.

인상적이었던 것은 패티가 두장인데, 무지 맛있다는 것이다. 두꺼운 패티 두장이 식감도 너무 좋았다. 또 먹고 싶다.

SUPER과 SODA만 주문한 가격. 맛은 있지만 비싸다. 가격과 맛을 따지면

버거는 in and out가 최고인 것 같다.

SUPER DUPER먹고나서 같은 건물에 있는 amc영화관에 갔다,

12시 30분 캡틴아메리카:윈터솔져 이고 가격은 10달러 정도. 한국이랑 별차이 없다.

특이한 점은 지정좌석제가 아니라 자유좌석제 라는 것이다. 선착순으로 앉고 싶은데 앉으면 된다. 평일 낮이라 그런지 사람이 별로 없다. 앉고 싶은데 앉으면 된다. 그리고 영화 티켓을 상영관으로 통하는 입구에서만 검사하기 때문에, 영화 하나값으로 하루종일

다른 영화도 감상할 수 있다. 자유좌석이기 때문에 아무데나 들어가서 앉으면 된다.

물론 불법이다.

LUCY 트레일러에서 최민식씨가 나온다.

꽤 비중있어 보이는데 등장인물 소개에는 안나오더라.

캡틴 아메리카 보니깐 포스터 준다길래 받아왔다. 멋있네.

Thursday, April 3, 2014

Today

수업 하나 끝내고 점심시간에 먹었던 CHIPOTLE 부리또.
특징으로는 부르또안에 들어가는 재료를 일일히 선택해야한다. 종업원이 주로 히스패닉계 아줌마 였는데, 내가 말하는 걸 못알아 들어서 부리또 하나 주문하는데 힘들었다.

하숙집 와서 조금 쉬다가 집근처 GYM에서 운동하고 난 뒤에 근처 쇼핑몰에 있는

Apple Store에 갔다. 요즘 ipad air에 관심이 많아져서 그냥 한번 보러갔다. 항상 느끼는 건데 Apple Store는 하나부터 열까지 전부 내맘에 든다. 인테리어며, 종업원들의 친절함과 여러가지 Apple의 디바이스들을 체험해볼 수 있다. 한국에 Apple Store가 없다는게 너무 아쉽고, 이렇게 좋은 서비스와 깔끔한 Store가 있기에 미국인들 대부분이 Apple의 제품을 사용하고 있음을 알 수 있다. 실제로 지하철이나 거리에서 100명중 99명의 미국인이 아이폰을 사용한다. 가끔 삼성의 갤럭시가 보이긴 하는데 여태껏 두번 정도 본 것 같다. 왜 아이폰을 고집하는지 Apple Store에 가본다면 단번에 알 것이다.

아이패드 구경하고 집에와서 먹은 저녁.

스파게티랑 샐러드 그리고 바나나.

Tuesday, September 10, 2013

CFS

The Linux scheduler is an interesting study in competing pressures. On one side are the use models in which Linux is applied. Although Linux was originally developed as a desktop operating system experiment, you'll now find it on servers, tiny embedded devices, mainframes, and supercomputers. Not surprisingly, the scheduling loads for these domains differ. On the other side are the technological advances made in the platform, including architectures (multiprocessing, symmetric multithreading, non-uniform memory access [NUMA]) and virtualization. Also embedded here is the balance between interactivity (user responsiveness) and overall fairness. From this perspective, it's easy to see how difficult the scheduling problem can be within Linux.

A short history of Linux schedulers

Early Linux schedulers used minimal designs, obviously not focused on massive architectures with many processors or even hyperthreading. The 1.2 Linux scheduler used a circular queue for runnable task management that operated with a round-robin scheduling policy. This scheduler was efficient for adding and removing processes (with a lock to protect the structure). In short, the scheduler wasn't complex but was simple and fast.

Linux version 2.2 introduced the idea of scheduling classes, permitting scheduling policies for real-time tasks, non-preemptible tasks, and non-real-time tasks. The 2.2 scheduler also included support for symmetric multiprocessing (SMP).

The 2.4 kernel included a relatively simple scheduler that operated in O(N) time (as it iterated over every task during a scheduling event). The 2.4 scheduler divided time into epochs, and within each epoch, every task was allowed to execute up to its time slice. If a task did not use all of its time slice, then half of the remaining time slice was added to the new time slice to allow it to execute longer in the next epoch. The scheduler would simply iterate over the tasks, applying a goodness function (metric) to determine which task to execute next. Although this approach was relatively simple, it was relatively inefficient, lacked scalability, and was weak for real-time systems. It also lacked features to exploit new hardware architectures such as multi-core processors.

The early 2.6 scheduler, called the O(1) scheduler, was designed to solve many of the problems with the 2.4 scheduler—namely, the scheduler was not required to iterate the entire task list to identify the next task to schedule (resulting in its name, O(1), which meant that it was much more efficient and much more scalable). The O(1) scheduler kept track of runnable tasks in a run queue (actually, two run queues for each priority level—one for active and one for expired tasks), which meant that to identify the task to execute next, the scheduler simply needed to dequeue the next task off the specific active per-priority run queue. The O(1) scheduler was much more scalable and incorporated interactivity metrics with numerous heuristics to determine whether tasks were I/O-bound or processor-bound. But the O(1) scheduler became unwieldy in the kernel. The large mass of code needed to calculate heuristics was fundamentally difficult to manage and, for the purist, lacked algorithmic substance.

Processes vs. threads

Linux incorporates process and thread scheduling by treating them as one in the same. A process can be viewed as a single thread, but a process can contain multiple threads that share some number of resources (code and/or data).

Given the issues facing the O(1) scheduler and other external pressures, something needed to change. That change came in the way of a kernel patch from Con Kolivas, with his Rotating Staircase Deadline Scheduler (RSDL), which included his earlier work on the staircase scheduler. The result of this work was a simply designed scheduler that incorporated fairness with bounded latency. Kolivas' scheduler impressed many (with calls to incorporate it into the current 2.6.21 mainline kernel), so it was clear that a scheduler change was on the way. Ingo Molnar, the creator of the O(1) scheduler, then developed the CFS based around some of the ideas from Kolivas' work. Let's dig into the CFS to see how it operates at a high level.

An overview of CFS

The main idea behind the CFS is to maintain balance (fairness) in providing processor time to tasks. This means processes should be given a fair amount of the processor. When the time for tasks is out of balance (meaning that one or more tasks are not given a fair amount of time relative to others), then those out-of-balance tasks should be given time to execute.

To determine the balance, the CFS maintains the amount of time provided to a given task in what's called the virtual runtime. The smaller a task's virtual runtime—meaning the smaller amount of time a task has been permitted access to the processor—the higher its need for the processor. The CFS also includes the concept of sleeper fairness to ensure that tasks that are not currently runnable (for example, waiting for I/O) receive a comparable share of the processor when they eventually need it.

But rather than maintain the tasks in a run queue, as has been done in prior Linux schedulers, the CFS maintains a time-ordered red-black tree (see Figure 1). A red-black tree is a tree with a couple of interesting and useful properties. First, it's self-balancing, which means that no path in the tree will ever be more than twice as long as any other. Second, operations on the tree occur in O(log n) time (where n is the number of nodes in the tree). This means that you can insert or delete a task quickly and efficiently.

Figure 1. Example of a red-black tree
Figure 2. Structure hierarchy for tasks and the red-black tree Figure 3. Graphical view of scheduling classes

static inline void check_preempt( struct rq *rq, struct task_struct *p )
{
  rq->curr->sched_class->check_preempt_curr( rq, p );
}

referenece by http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

Example of a red-black tree

With tasks (represented by sched_entity objects) stored in the time-ordered red-black tree, tasks with the gravest need for the processor (lowest virtual runtime) are stored toward the left side of the tree, and tasks with the least need of the processor (highest virtual runtimes) are stored toward the right side of the tree. The scheduler then, to be fair, picks the left-most node of the red-black tree to schedule next to maintain fairness. The task accounts for its time with the CPU by adding its execution time to the virtual runtime and is then inserted back into the tree if runnable. In this way, tasks on the left side of the tree are given time to execute, and the contents of the tree migrate from the right to the left to maintain fairness. Therefore, each runnable task chases the other to maintain a balance of execution across the set of runnable tasks.

CFS internals

All tasks within Linux are represented by a task structure called task_struct. This structure (along with others associated with it) fully describes the task and includes the task's current state, its stack, process flags, priority (both static and dynamic), and much more. You can find this and many of the related structures in ./linux/include/linux/sched.h. But because not all tasks are runnable, you won't find any CFS-related fields in task_struct. Instead, a new structure called sched_entity was created to track scheduling information (see Figure 2).

Structure hierarchy for tasks and the red-black tree

The relationships of the various structures are shown in Figure 2. The root of the tree is referenced via the rb_root element from the cfs_rq structure (in ./kernel/sched.c). Leaves in a red-black tree contain no information, but internal nodes represent one or more tasks that are runnable. Each node in the red-black tree is represented by an rb_node, which contains nothing more than the child references and the color of the parent. The rb_node is contained within the sched_entity structure, which includes therb_node reference, load weight, and a variety of statistics data. Most importantly, the sched_entity contains the vruntime (64-bit field), which indicates the amount of time the task has run and serves as the index for the red-black tree. Finally, the task_structsits at the top, which fully describes the task and includes the sched_entity structure.

The scheduling function is quite simple when it comes to the CFS portion. In ./kernel/sched.c, you'll find the generic schedule()function, which preempts the currently running task (unless it preempts itself with yield()). Note that CFS has no real notion of time slices for preemption, because the preemption time is variable. The currently running task (now preempted) is returned to the red-black tree through a call to put_prev_task (via the scheduling class). When the schedule function comes to identifying the next task to schedule, it calls the pick_next_task function. This function is also generic (within ./kernel/sched.c), but it calls the CFS scheduler through the scheduler class. The pick_next_task function in CFS can be found in ./kernel/sched_fair.c (calledpick_next_task_fair()). This function simply picks the left-most task from the red-black tree and returns the associatedsched_entity. With this reference, a simple call to task_of() identifies the task_struct reference returned. The generic scheduler finally provides the processor to this task.

Priorities and CFS

CFS doesn't use priorities directly but instead uses them as a decay factor for the time a task is permitted to execute. Lower-priority tasks have higher factors of decay, where higher-priority tasks have lower factors of delay. This means that the time a task is permitted to execute dissipates more quickly for a lower-priority task than for a higher-priority task. That's an elegant solution to avoid maintaining run queues per priority.

CFS group scheduling

Another interesting aspect of CFS is the concept of group scheduling (introduced with the 2.6.24 kernel). Group scheduling is another way to bring fairness to scheduling, particularly in the face of tasks that spawn many other tasks. Consider a server that spawns many tasks to parallelize incoming connections (a typical architecture for HTTP servers). Instead of all tasks being treated fairly uniformly, CFS introduces groups to account for this behavior. The server process that spawns tasks share their virtual runtimes across the group (in a hierarchy), while the single task maintains its own independent virtual runtime. In this way, the single task receives roughly the same scheduling time as the group. You'll find a /proc interface to manage the process hierarchies, giving you full control over how groups are formed. Using this configuration, you can assign fairness across users, across processes, or a variation of each.

Scheduling classes and domains

Also introduced with CFS is the idea of scheduling classes (recall from Figure 2). Each task belongs to a scheduling class, which determines how a task will be scheduled. A scheduling class defines a common set of functions (via sched_class) that define the behavior of the scheduler. For example, each scheduler provides a way to add a task to be scheduled, pull the next task to be run, yield to the scheduler, and so on. Each scheduler class is linked with one another in a singly linked list, allowing the classes to be iterated (for example, for the purposes of enablement of disablement on a given processor). The general structure is shown in Figure 3. Note here that enqueue and dequeue task functions simply add or remove a task from the particular scheduling structures. The function pick_next_task chooses the next task to execute (depending upon the particular policy of the scheduling class).

But recall that the scheduling classes are part of the task structure itself (see Figure 2). This simplifies operations on tasks, regardless of their scheduling class. For example, the following function preempts the currently running task with a new task (wherecurr defines the currently running task, rq represents the red-black tree for CFS, and p is the next task to schedule) from ./kernel/sched.c:

If this task were using the fair scheduling class, check_preempt_curr() would resolve to check_preempt_wakeup(). You can see these relationships in ./kernel/sched_rt.c, ./kernel/sched_fair.c and ./kernel/sched_idle.c.

Scheduling classes are yet another interesting aspect of the scheduling changes, but the functionality grows with the addition of scheduling domains. These domains allow you to group one or more processors hierarchically for purposes load balancing and segregation. One or more processors can share scheduling policies (and load balance between them) or implement independent scheduling policies to intentionally segregate tasks.

Other schedulers

Work on scheduling continues, and you'll find schedulers under development that push the boundaries of performance and scaling. Con Kolivas was not deterred by his Linux experience and has developed another scheduler for Linux with a provocative acronym: BFS. The scheduler was reported to have better performance on NUMA systems as well as mobile devices and was introduced into a derivative of the Android operating system.

Wednesday, September 4, 2013

[reference] EFI 윈도우7 과 우분투 듀얼부팅

[Ubuntu] Thinkpad E320 1298-rk9 윈도우7 과 우분투 듀얼부팅 설치 부터 셋팅까지 -(1)

reference : http://sungliky.blogspot.kr/2012/10/ubuntu-thinkpad-e320-1298-rk9-7-1.html

TP(Thinkpad) Edge E320

2G intel core i5-2450(2.5ghz)2core 4thread / LED backlight /13.3' /1366x768 /320GB /4GB DDR3

/ Free-DOS / AMD Radeon HD6630M 1GB DDR3 / 1.86Kg / 6cell / 1Gbps Ethernet / 802.11n Wlan /Bluetooth 3.0 / HDMI / D-SUB / Web CAM / USB2.0 x2 / e-SATA / Multi-Reader

씽크패드 E320 모델은 뒤늦게 바이오스 롬업을 통해 uEFI 바이오스 부팅모드를 지원하게됩니다.

이때문에 E320노트북에서 윈도우7 과 우분투의 멀티부팅(듀얼부팅)설치에 애로사항을 겪을수 있습니다.

윈도우 7 & 우분투 듀얼 OS 멀티부트 설치에서 우분투 설치후 셋팅까지 그 과정을 포스팅하겠습니다.

*꼭 E320이 아니더라도 UEFI 메인보드를 사용하는 기타 PC에도 해당되는 내용입니다.

용어정리

MBR(Master Boot Record) = MBR[엠비알]은 운영체계가 어디에, 어떻게 위치해 있는지를 식별하여 컴퓨터의 주기억장치에 적재될 수 있도록 하기 위한 정보로서 하드디스크나 디스켓의 첫 번째 섹터에 저장되어 있다. MBR은 또한 "파티션 섹터" 또는 "마스터 파티션 테이블"이라고도 불리는데, 그 이유는 하드디스크가 포맷될 때 나뉘어지는 각 파티션의 위치에 관한 정보를 가지고 있기 때문이다. 그외에도, MBR은 메모리에 적재될 운영체계가 저장되어 있는 파티션의 부트 섹터 레코드를 읽을 수 있는 프로그램을 포함하고 있는데, 부트 섹터 레코드에는 다시 운영체계의 나머지 부분들을 메모리에 적재시키는 프로그램을 담고 있다.

*출처 = http://www.terms.co.kr/MBR.htm

3~4개의 프라이머리 파티션과 확장파티션을 지원합니다.

GPT(Guid Partition Table)= MBR 과 마찬가지로 디스크 정보를 담고있는 영역이며 EFI 시스템에서 사용됩니다. MBR에 단점을 극복하기위해 만들어졌습니다.(MBR = 3~4개의 파티션 GPT = 128개 지원) 주로 외장하드의 자료저장용으로 사용됩니다.

*출처 = 자료가 부족하여 직접 적었습니다.

UEFI(Unified Extensible Firmware Interface) = 바이오스는 펌웨어(롬) <-> 운영체제 에서 부팅에 필요한 하드웨어 정보를 넘겨주는 역할을합니다.(간단하게 말하자면)

EFI 는 펌웨어<-> 펌웨어 인터페이스 <-> 운영체제 로써 한단계 발전된 형태입니다.

EFI 가 규격이 되고 일반 IBM PC 이를 통합하여 한단계더 규격화 시킨것이 UEFI 입니다.(흔히들 UEFI 하면 GUI형태의 bios 를 생각하시는분들이많은데 모든 PC가 GUI로 된것은아닙니다.)

*출처 = 이것역시 자료가 부족해서....

-방법론-

1. UEFI 를 지원하는 메인보드에서는 MBR로 윈도우7과 우분투를 멀티부트 사용할수없다.

(우분투가 설치는 되지만 부트로더(Grub2)를 할당하지 못합니다.)

2.우분투는 기존 OS 를 탐지했을때 기존OS 의 하드 방식을 따라갑니다.

(예: 윈도7이 MBR이면 우분투도 MBR , 윈도7이 GPT이면 우분투도 GPT)

3.MBR방식이 안되기때문에 GPT방식으로 설치해야하는데 GPT방식은 EFI 모드만 지원합니다.

4.윈도7을 EFI로 부팅할수 있도록 설치(GPT방식)한후 우분투를 설치하여 멀티부팅 합니다.

5. 윈도7 은 반드시 64 bit여야 한다. *우분투 64 bit(32bit 는 시동이안됩니다. 저만 그런걸수도있으니 확실하진 않습니다.)

자 이제 시작해 보겠습니다.

1. 우선 윈도우7을 EFI 로 부팅할수있도록 설치하기위해 윈도우7설치 화일을 수정할 필요가있습니다.

자 지금은 윈도우입니다. 윈도우키 + R 를 눌러 cmd를 입력하여 명령프롬프트를 실행합니다.(또는 윈도->실행->cmd)

> diskpart (리눅스에서 df와 비슷한 명령어입니다. 또는 fdisk)

> list disk (현재 마운트된 디스크를 확인합니다. usb를 찾습니다. 용량을 보면 쉽게 찾을 수 있습니다.)

> select disk N (디스크를 선택합니다. N은 디스크 번호입니다. 유에스비가 3번이면 3을 넣어줍니다.)

> list disk (다시 리스트를 봅니다. 선택된 디스크는 * 표시됩니다. usb가 선택되었는지 확인합니다.)

> clean (디스크를 정리합니다. 확 날리는거죠)

> create partition primary (프라이머리 파티션을 생성해줍니다.)

> format quick fs=fat32 ( fat32로 빠른 포맷 해줍니다.)

> active ( 부팅을 위해 드라이브를 활성화 시켜줍니다.)

> exit (diskpart 를 종료합니다.)

이제 usb 초기화가 끝났습니다.

이제 윈도7 dvd를 usb로 옮기겠습니다.

명령 프롬프트가 열려있죠? 자 이제 윈도7 dvd를 삽입합니다.

> xcopy e: f: /s /e

(e: 는 dvd롬 드라이브 구요 f:는 usb드라이브입니다. 사용자에따라서 다를수있습니다. 옵션 /s 는 내용이있는 폴더를 복사하라는겁니다. /e 는 내용이 없는 폴더도 복사하라는겁니다. 따라서 모!!!든 내용을 복사하라는 옵션입니다.)

이제 윈도7 usb가 만들어졌습니다.

https://docs.google.com/open?id=0ByBboPE-bfeOc1RfRGxQMC1CWUU

파일을 다운받고 usb드라이브에 복사합니다.

usb드라이브에는 ?:\efi\boot 이런 경로 이면 됩니다.

만약 기존에 efi폴더가 있다면 덮어씌우시면됩니다.

이제 설치 usb를 수정하는데 끝났습니다. 이제 설치해봅니다.

2. 설치해봅시다.

cmos셋업에서 부트모드를 uefi only로 변경합니다. 저장하고 나오시구요

usb로 부팅합니다.

데이터는 미리 백업하셨다고 생각하구요 모든 파티션을 지웁니다.

그리고 필요한 파티션을 생성해줍니다.

제 경우는 다음그림과 같습니다.

다음 그림과같이

1번 파티션에는 EFI 부트파티션이 생성되었습니다.(fat32)

2번은 MSR파티션이 생성됩니다.

(msr은 볼륨과볼륨사이 메타데이터를 저장하는건데 윈도7을 위해 생성되는 파티션입니다. 신경안쓰셔도됩니다. 실제로 사용하는 공간은아닙니다.)

3번은 윈도우를 설치하기위해 4번은 우분투를위해 생성하였습니다.

실제로는 두개의 파티션을 나눴지만 자동으로 EFI,MSR 파티션이 생성됩니다.

EFI,MSR이 생성되었다면 정상입니다.

4번 파티션은 굳이 포맷하진 않겠습니다.(어차피 우분투설치할때 포맷할것이기때문)

설치를 진행합니다. 설치와 함께 윈도우부트메니져는 efi파티션에 설치될것이고 앞으로 부팅은 이파티션을 통해서 진행될것입니다.

설치가 완료되었다면 이제 올바르게 efi부트로 부팅되었는지 확인해봅시다.

우선 파티션을 확인하구요..

다음과같이 EFI 파티션이 보이는지 확인합니다.

c:\windows\Panther\setupact 을 메모장으로 열어주시고

callback_boot로 검색해주세요.

다음과같이 부팅 환경이 EFI 로 표시되는지 확인 해 주세요

(* MBR모드는 bios로 표시됩니다.)

3.이제 우분투를 설치해봅시다. 우분투 라이브 CD또는 우분투 USB로 부팅합니다.

( usb 만드는방법은 생략합니다.)

우분투는 기존 OS의 방식을 따르기때문에 특별한 설치법은 없습니다. 사용자 기호에 맞게 설치하시면 됩니다.

제가 설치한방식은 사진 나갑니다.

기존 os를 인식못하네요 쌩까고 '기타'를 누릅니다. 어차피 윈도7을 인식하더라도 '기타'를 누를꺼니까요!

윈도7을 설치할 당시에 우분투를 설치할공간을 포맷하지 않기때문에 '남은공간'에 우분투를 설치하겠습니다.

모든 공간을 다 할당해 주겠습니다.

[스왑(SWAP) 영역은 따로 설정하지 않겠습니다. 스왑영역은 메모리 할당의 부담을 줄이기위해 리눅스운영체제에서 사용되는데요 램메모리 빵빵하기때문에 생략하겠습니다. 일부에서는 스왑을 해야된다 안해야된다 라는 말이 분분한데요 요즘같이 고성능 PC가 나오는 시대에선 생략해도 무관하다고 생각하기때문에 생략하겠습니다. '저는요' 이는 사용자 기호에 맞게 설정하시면 되겠습니다.]

자동으로 리부팅됩니다. 리부팅되면 기본적으로 우분투로 부팅되게 됩니다.

위 그림과같이 그래픽카드가 안잡힌것을 볼수있습니다.

여기서 중요한것은 '추가드라이버' 잡기를 통해서 그래픽카드를 잡으면 안된다는것입니다.

추가드라이버를 통해서 그래픽카드를 잡지마시구요 일단 밀린 업데이트를 시켜주세요

위 설치과정을 촬영할 당시에는 12.04가 최신이었는데 일주일사이에 12.10이 정식으로 릴리즈되었네요.

===================================================================

24시간동안 씨름한 끝에 겨우 성공했네요.

처음에 윈도우7과 우분투 12.04 듀얼부팅 하려고 했는데, 우분투가 설치되었는데 grub실행 안되고 바로 윈도우7로 부팅이 되는겁니다.

처음에는 grub문제 인줄 알고, grub 복구 쪽에만 신경 쓴다고 별 짓을 다했네요.

그러나 문제는, EFI에 있었다는 것...

EFI를 지원하는 보드의 경우를 찾아가 여기 reference한 블로그를 찾게 되었고,

결국 우분투, 윈7 듀얼부팅에 성공했습니다.

윈7 부팅 usb만드실때는 반드시 순정 iso이용 하세요. MSDN에서 다운받으실 수있습니다.

Wednesday, August 21, 2013

[reference] list_head

이런 offset 매크로 정의 하나만 놓고 본다면 별 쓸모가 없을 것 같네요. 이런
표현을 조금 응용하면 꽤 유용한 것이 나옵니다. Linux 커널에서 사용하는list_head라는게 있는데, 이런 표현과 유사한 방법을 사용하지요. 아시는 분도많이 계시겠지만 간단히 소개를 드리겠습니다. 우선 구조체 정의는 다음과같습니다

(linux/list.h):

struct list_head
{ struct list_head *next, *prev;};

뭐.. 별거 없습니다. 그리고 offset을 구하는 식을 다음과 같이 사용하고있습니다

(linux/list.h):

#define list_entry(ptr, type, member) \ ((type *)((char *)(ptr) - (unsigned long)(&((type *)0)->member)))

자.. 이제 모든 리스트를 관리하는데 있어서, list_head 구조체 한가지만 있으면되는 세상이 열리게 되는거지요. 태스크리스트, 소켓버퍼리스트, 버스리스트 등몽땅 list_head 포인터로 선언해 버리고 list_head 관련 함수로 삽입, 삭제, 정렬등의 리스트 관리를 합니다. 그리고 해당 리스트 엔트리의 내용을 참조, 혹은업데이트할 필요가 있을때는 list_entry 매크로로 언제든지 엔트리의 포인터를다시 만들어 낼 수 있는 것이지요. list_head를 사용하는 다음과 같은 구조체가있다고 할때

(linux/timer.h):

struct timer_list { struct list_head list; unsigned long expires; unsigned long data; void (*function)(unsigned long); long sub_expires;};

어떤 struct list_head *p가 struct timer_list의 멤버 list의 포인터라는 것을알고 있는 함수는 list_entry(p, struct timer_list, list)로 언제든지 p가대표하는 리스트 엔트리를 뽑아서 사용할 수 있습니다. 거의 C에서 C++의 STL을구현한 듯한.. (물론 차이가 많습니다만)reference : http://blog.naver.com/PostView.nhn?blogId=popjunior&logNo=80021648631

[출처] list_head 사용 원리????|작성자 포비